Architecture ------------ GeoMesa Spark provides capabilities to run geospatial analysis jobs on the distributed, large-scale data processing engine `Apache Spark`_. This provides interfaces for Spark to ingest and analyze geospatial data stored in GeoMesa Accumulo and other data stores. GeoMesa Spark is divided into two modules. GeoMesa :doc:`./core` (``geomesa-spark-core``) is an extension for Spark that takes `GeoTools`_ ``Query`` objects as input and produces resilient distributed datasets (``RDD``\ s) containing serialized versions of geometry objects. Multiple backends that target different types of feature stores are available, including ones for GeoMesa Accumulo, other GeoTools ``DataStore``\ s, or files readable by the :ref:`converters` library. GeoMesa :doc:`./sparksql` (``geomesa-spark-sql``), in turn, stacks on GeoMesa Spark Core to convert between ``RDD``\ s and ``DataFrame``\ s. GeoMesa SparkSQL pushes down filtering logic from SQL queries and converts them into GeoTools ``Query`` objects, which are then passed to the ``GeoMesaSpark`` object provided by GeoMesa Spark Core. GeoMesa SparkSQL also provides a number of user-defined types and functions for working with geometry objects. .. image:: /user/_static/img/geomesa-spark-stack.png :align: center A stack composed of a distributed data store such as Accumulo, GeoMesa, the GeoMesa Spark libraries, Spark, and the `Jupyter`_ interactive notebook application (see above) provides a complete large-scale geospatial data analysis platform. See :doc:`/tutorials/spark` for a tutorial on analyzing data with GeoMesa Spark. .. _Apache Spark: https://spark.apache.org/ .. _Jupyter: http://jupyter.org/