Deploying GeoMesa Spark with Zeppelin ===================================== Apache `Zeppelin`_ is a web-based notebook server for interactive data analytics, which includes built-in support for `Spark`_. .. note:: The instructions below have been tested with Zeppelin version |zeppelin_version|. Installing Zeppelin ------------------- Follow the `Zeppelin installation instructions`_, as well as the instructions for `configuring the Zeppelin Spark interpreter`_. Configuring Zeppelin with GeoMesa --------------------------------- The GeoMesa Accumulo Spark runtime JAR may be found either in the ``dist/spark`` directory of the GeoMesa Accumulo binary distribution, or (after building) in the ``geomesa-accumulo/geomesa-accumulo-spark-runtime-accumulo2/target`` directory of the GeoMesa source distribution. .. note:: See :ref:`spatial_rdd_providers` for details on choosing the correct GeoMesa Spark runtime JAR. #. In the Zeppelin web UI, click on the downward-pointing triangle next to the username in the upper-right hand corner and select "Interpreter". #. Scroll to the bottom where the "Spark" interpreter configuration appears. #. Click on the "edit" button next to the interpreter name (on the right-hand side of the UI). #. In the "Dependencies" section, add the GeoMesa JAR, either as a. the full local path to the ``geomesa-accumulo-spark-runtime-accumulo2_${VERSION}.jar`` described above, or b. the Maven groupId:artifactId:version coordinates (``org.locationtech.geomesa:geomesa-accumulo-spark-runtime-accumulo2_2.12:$VERSION``) #. Click "Save". When prompted by the pop-up, click to restart the Spark interpreter. It is not necessary to restart Zeppelin. Data Plotting ------------- Zeppelin provides built-in tools for visualizing quantitative data, which can be invoked by prepending "%table\\n" to tab-separated output (see the `Zeppelin table display system`_). For example, the following method may be used to print a ``DataFrame`` via this display system: .. code-block:: scala def printDF(df: DataFrame) = { val dfc = df.collect println("%table") println(df.columns.mkString("\t")) dfc.foreach(r => println(r.mkString("\t"))) } It is also possible to use third-party libraries such as `Vegas`_, as described in :ref:`jupyter_vegas`. For Zeppelin, the following implicit ``displayer`` method should be used: .. code-block:: scala implicit val displayer: String => Unit = s => println("%html\n"+s) .. |zeppelin_version| replace:: 0.7.0 .. _configuring the Zeppelin Spark interpreter: https://zeppelin.apache.org/docs/0.7.0/interpreter/spark.html .. _Spark: http://spark.apache.org/ .. _Vegas: https://github.com/vegas-viz/Vegas/ .. _Zeppelin: https://zeppelin.apache.org/ .. _Zeppelin installation instructions: https://zeppelin.apache.org/docs/0.7.0/install/install.html .. _Zeppelin table display system: https://zeppelin.apache.org/docs/0.7.0/displaysystem/basicdisplaysystem.html#table