Store, index, query, and transform spatio-temporal data
at scale in Accumulo, HBase, Cassandra, and Kafka

Frequently Asked Questions

What versions of Accumulo are compatible with GeoMesa?

The tip of the master code branch supports Accumulo 1.5; the team is working on concurrent support for multiple versions of Accumulo now.

What is the best way to ingest data into GeoMesa?

There are at least two reasonable ways to get data loaded into GeoMesa efficiently:

  • batch ingest: the GDELT tutorial describes how to batch-load data using a map-reduce job
  • streaming ingest: the OSM GPS tutorial describes how to stream data into GeoMesa using Kafka and Storm

How can I tell if my ingest did anything?

The best way is to inspect your tables. This is easiest if the tables did not exist before your ingest job. There are at least three ways you might do this:

  • Point your browser to the Accumulo web monitor, by default running on port 50095 on the master, click on the "tables" link, and browse down to the records table that corresponds to your ingest job. The "Entries" column will tell you how many total records are present.
  • Use the Accumulo shell on the command line to scan the records table that corresponds to your ingest job.
  • Use the GeoMesa command-line tools to issue a test export query, and see if any results come back.

One common issue that can prevent ingest jobs from loading any real data is when the FeatureType does not match the format of the data being read.

Why do my queries hang (never finish)?

Although it is possible to request such a large number of records that queries timeout or appear never to finish, the far more frequent cause of this behavior is a mis-configured cloud. Check to be sure that ports are open on all nodes; that clock-time is reasonably consistent among the nodes; and that all software versions are the same across your cluster.

Why are there multiple tables for my one feature?

GeoMesa now supports secondary indexes, and we do this by using multiple support tables per feature (or group of features). Assume that we used "GeomesaTest" as the base table-name in the DataStore connection parameters for a feature named "PointObservation". The tables we use include the following:

  • GeomesaTest: This is the catalog table, where metadata concerning all of the features that share this table base name will be stored.
  • GeomesaTest_PointObservation_st_idx: This is the spatio-temporal index, successor to the original, single GeoMesa table that was created for a feature type.
  • GeomesaTest_PointObservation_attr_idx: This table indexes record IDs by secondary attributes.
  • GeomesaTest_PointObservation_records: This table stores every record exactly once, indexed by ID.
  • GeomesaTest_PointObservation_queries: If you create the feature with query-tracking enabled, then this table will exist, and will contain metadata concerning the queries you have run against this feature.

Assuming that your features share tables, which is the default, then you would expect the total number of tables to follow approximately 1 + 4F, where F is the number of feature types you ingest: one catalog table, and four feature-specific data tables.

Does each data type need its own table?

No. So long as the index-schema formats are distinct — and by default this will always be the case for any two FeatureTypes that have different feature names — then any number of features can share the same table. It is possible, by using custom index-schema formats, to introduce confusion between feature types within the same table; this is one reason why using those advanced features is not recommended for developers who are new to GeoMesa.

There is now a user-data option available to specify whether a new FeatureType should share tables or not: sft.getUserData.put("geomesa_table_sharing", true), for example, will enable table-sharing for this SimpleFeatureType. As a convenience, the utility function org.locationtech.geomesa.core.index.setTableSharing(sft, _) will do this for you, where "_" is where you would specify a boolean value.

Why is my CRS not being read?

If the GeoTools packages for reading EPSG data are on your classpath, review the permissions on your temporary directory. On multiple occasions, we have seen issues when the temporary directory containing EPSG data has been owned by a different user.

Another option is to ensure that you are using your own, specific directory. To do that, include -Djava.io.tmpdir=/tmp/YourUsername (substituting your username) when running a JVM.

What do I need to do to rebundle GeoMesa?

If you rebundle GeoMesa as part of a larger project's JAR file, it is important to ensure that the manifest includes the proper entries for the service-provider interfaces (SPIs); otherwise, the various factories will be unable to find the implementations you expect.