GeoMesa Spark provides capabilities to run geospatial analysis jobs on the distributed, large-scale data processing engine Apache Spark. It provides interfaces for Spark to ingest and analyze geospatial data stored in GeoMesa Accumulo and other data stores.
GeoMesa Spark is divided into two modules.
GeoMesa Spark Core (geomesa-spark-core) is an extension for Spark that takes GeoTools Query objects as input and produces resilient distributed datasets (RDDs) containing serialized versions of geometry objects. Multiple backends that target different types of feature stores are available, including ones for GeoMesa Accumulo, other GeoTools DataStores, or files readable by the GeoMesa Convert library.
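As a rough illustration of the Query-in, RDD-out flow described above, the following sketch assumes a GeoMesa Accumulo backend, the matching GeoMesa Spark runtime JAR on the classpath, and a live data store. The connection values, the feature type name "chicago", and the geometry attribute "geom" are hypothetical placeholders, and parameter keys and the package of Query vary between GeoMesa and GeoTools releases, so treat this as a sketch rather than a reference.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}
import org.geotools.data.Query // assumption: newer GeoTools releases use org.geotools.api.data.Query
import org.geotools.filter.text.ecql.ECQL
import org.locationtech.geomesa.spark.GeoMesaSpark

import scala.collection.JavaConverters._

object SparkCoreSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection parameters; key names depend on the GeoMesa version
    val dsParams = Map(
      "accumulo.instance.id" -> "instance",
      "accumulo.zookeepers"  -> "zoo1,zoo2,zoo3",
      "accumulo.user"        -> "user",
      "accumulo.password"    -> "password",
      "accumulo.catalog"     -> "geomesa_catalog"
    )

    val conf = new SparkConf().setMaster("local[*]").setAppName("geomesa-spark-core-sketch")
    val sc = SparkContext.getOrCreate(conf)

    // Look up the backend (a SpatialRDDProvider) that can handle these parameters
    val provider = GeoMesaSpark(dsParams.asJava)

    // A GeoTools Query: hypothetical feature type "chicago", filtered by bounding box
    val filter = ECQL.toFilter("BBOX(geom, 0, 0, 90, 90)")
    val query = new Query("chicago", filter)

    // The provider turns the Query into an RDD of features
    val rdd = provider.rdd(new Configuration, sc, dsParams, query)
    println(s"features matched: ${rdd.count()}")

    sc.stop()
  }
}
```

Because the backend is looked up from the parameters, the same code shape should apply to other feature stores if the parameter map is changed accordingly.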
GeoMesa SparkSQL (geomesa-spark-sql), in turn, stacks on GeoMesa Spark Core to convert between RDDs and DataFrames. GeoMesa SparkSQL pushes down filtering logic from SQL queries and converts it into GeoTools Query objects, which are then passed to the GeoMesaSpark object provided by GeoMesa Spark Core. GeoMesa SparkSQL also provides a number of user-defined types and functions for working with geometry objects.
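A minimal sketch of the SparkSQL path might look like the following. It assumes the same hypothetical Accumulo parameters and "chicago" feature type with a "geom" attribute, the GeoMesa Spark runtime JAR on the classpath, and a live data store. The WHERE clause is the kind of spatial predicate that the pushdown described above is meant to handle, and st_contains and st_makeBBOX are among the geometry functions GeoMesa SparkSQL provides.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection parameters; key names depend on the GeoMesa version
    val dsParams = Map(
      "accumulo.instance.id" -> "instance",
      "accumulo.zookeepers"  -> "zoo1,zoo2,zoo3",
      "accumulo.user"        -> "user",
      "accumulo.password"    -> "password",
      "accumulo.catalog"     -> "geomesa_catalog"
    )

    val spark = SparkSession.builder()
      .appName("geomesa-sparksql-sketch")
      .master("local[*]")
      .getOrCreate()

    // Load the hypothetical "chicago" feature type as a DataFrame
    val df = spark.read
      .format("geomesa")
      .options(dsParams)
      .option("geomesa.feature", "chicago")
      .load()
    df.createOrReplaceTempView("chicago")

    // Spatial predicate expressed in SQL with GeoMesa's geometry functions
    val result = spark.sql(
      "SELECT * FROM chicago WHERE st_contains(st_makeBBOX(0.0, 0.0, 90.0, 90.0), geom)")
    result.show()

    spark.stop()
  }
}
```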
A stack composed of a distributed data store such as Accumulo, GeoMesa, the GeoMesa Spark libraries, Spark, and the Jupyter interactive notebook application (see above) provides a complete large-scale geospatial data analysis platform.
See Apache Spark Analysis for a tutorial on analyzing data with GeoMesa Spark.