7.3. Ingest Commands

These commands are used to insert and delete simple features. Required parameters are indicated with a *.

7.3.1. delete-features

Delete specific features from a schema. If you need to delete all features, it may be faster to delete the schema and re-create it.

Argument                Description
-c, --catalog *         The catalog table containing schema metadata
-f, --feature-name *    The name of the schema
-q, --cql               CQL filter to select features to delete
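
As a minimal sketch of how these arguments combine, the snippet below assembles a delete-features call. The catalog name example_catalog, the schema name example, and the filter are hypothetical placeholders. Connection arguments for the data store are not covered in this section and are left out, so the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute your own catalog and schema.
CATALOG="example_catalog"
FEATURE="example"
# CQL string literals use single quotes, so the filter as a whole is
# wrapped in double quotes when it is passed on the command line.
FILTER="name = 'foo'"

# Assemble the command as text. Connection arguments for the data store
# are intentionally omitted and must be added before running it.
CMD="geomesa-accumulo delete-features -c $CATALOG -f $FEATURE -q \"$FILTER\""
echo "$CMD"
```

Quoting the whole filter keeps the shell from splitting it into separate arguments at the spaces.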

7.3.2. ingest

The ingest command takes files in various formats and ingests them as SimpleFeatures in GeoMesa. Generally, a GeoMesa ‘converter’ definition is required to map input data to SimpleFeatures. GeoMesa supports common input formats such as delimited text (TSV, CSV), fixed-width files, JSON, XML, and Avro. The converter framework is extensible via Java SPI to support custom formats. See GeoMesa Convert for more information on converters.

See Moving and Migrating Data for details on how the export/import commands can be used to move data between clusters.

Argument                Description
-c, --catalog *         The catalog table containing schema metadata
-C, --converter         The GeoMesa converter used to create SimpleFeatures
-s, --spec              The SimpleFeatureType specification to create
-f, --feature-name      The name of the schema
-t, --threads           Number of parallel threads used
--input-format          Format of input files (csv, tsv, avro, shp, json, etc)
--run-mode              Must be one of distributed or local
<files>...              Input files to ingest

The --converter argument may be any of the following:

  • The name of a GeoMesa converter already available on the classpath
  • A converter configuration string
  • The name of a file containing a converter configuration

See Defining Simple Feature Converters for more details on specifying the converter.

If a converter is not specified, GeoMesa will attempt to extract a schema from the input files. Currently, this works only in limited cases: the input must be a Shapefile, self-describing Avro, or delimited text (TSV, CSV) with a header row that defines the attribute name and type of each column. For example:

id,name:String,dtg:Date,*geom:Point:srid=4326
1,foo,2017-01-01T00:00:00.000Z,POINT(45 55)
2,bar,2017-01-02T00:00:00.000Z,POINT(55 65)
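
To try this out, the header and rows above can be written to a file and passed to ingest without --converter or --spec. This is a sketch under assumptions: the file name example.csv, the catalog example_catalog, and the feature name example are hypothetical, and connection arguments for the data store are omitted, so the command is printed rather than run.

```shell
#!/bin/sh
# Write a self-describing CSV: the header row names each attribute and its type.
cat > example.csv <<'EOF'
id,name:String,dtg:Date,*geom:Point:srid=4326
1,foo,2017-01-01T00:00:00.000Z,POINT(45 55)
2,bar,2017-01-02T00:00:00.000Z,POINT(55 65)
EOF

# No --converter or --spec is given, so the schema must come from the header row.
# example_catalog and example are placeholder names; connection arguments
# for the data store are omitted and must be added before running.
CMD="geomesa-accumulo ingest -c example_catalog -f example example.csv"
echo "$CMD"
```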

If --feature-name is specified and the schema already exists, then --spec is not required. Otherwise, --spec may be any of the following:

  • A string of attributes, for example name:String,dtg:Date,*geom:Point:srid=4326
  • The name of a SimpleFeatureType already available on the classpath
  • A string of attributes, defined as a TypeSafe configuration
  • The name of a file containing one of the above

See Defining Simple Feature Types for more details on specifying the SimpleFeatureType.

If the schema does not exist, --feature-name is required unless the feature name is implied by the specification string. It may also be used to override the implied feature name.
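
A sketch of passing an attribute string on the command line follows. The spec is single-quoted so that the shell does not treat the * that marks the default geometry as a glob pattern. The catalog, the feature name, the converter name example-converter, and the input file data.csv are all hypothetical; connection arguments are omitted, so the command is only printed.

```shell
#!/bin/sh
# Single quotes keep the shell from globbing the '*' in the attribute string.
SPEC='name:String,dtg:Date,*geom:Point:srid=4326'

# --feature-name names the schema, since a plain attribute string does not include one.
# example_catalog, example, example-converter and data.csv are placeholders;
# connection arguments for the data store must be added before running.
CMD="geomesa-accumulo ingest -c example_catalog -s '$SPEC' -f example -C example-converter data.csv"
echo "$CMD"
```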

The --input-format argument can be used to specify the type of files being ingested. Currently GeoMesa supports Avro, CSV, TSV, JSON/GeoJSON, GML, and SHP. If not specified, the input file extensions will be used to determine the file type.

The --run-mode argument can be used to run ingestion in local or distributed mode (the latter using MapReduce). Note that in order to run in distributed mode, the input files must be in HDFS. By default, input files on the local filesystem will be ingested in local mode, and input files in HDFS will be ingested in distributed mode.

The --threads argument can be used to increase local ingest speed. However, there cannot be more threads than there are input files. The --threads argument is ignored for distributed ingest.
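
Because the thread count cannot exceed the number of input files, a wrapper script might cap the requested value before calling ingest. The following is an illustrative sketch and not part of GeoMesa: the scratch directory, file names, requested thread count, catalog, and feature name are made up for the example, and the final command is printed rather than run because connection arguments are omitted.

```shell
#!/bin/sh
# Create a scratch directory with three empty placeholder input files.
DIR=$(mktemp -d)
touch "$DIR/a.csv" "$DIR/b.csv" "$DIR/c.csv"

REQUESTED=8              # threads we would like to use
set -- "$DIR"/*.csv      # expand the wildcard into positional parameters
NFILES=$#                # number of input files matched

# Cap the thread count at the number of input files.
if [ "$REQUESTED" -gt "$NFILES" ]; then
  THREADS=$NFILES
else
  THREADS=$REQUESTED
fi

# Placeholder catalog and feature names; connection arguments omitted.
echo "geomesa-accumulo ingest -c example_catalog -f example --run-mode local -t $THREADS $*"

# Remove the scratch directory.
rm -rf "$DIR"
```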

The <files>... argument specifies the files to be ingested. The * character may be used as a wildcard in file paths. GeoMesa can handle gzip, bzip and xz file compression as long as the file extensions match the compression type. GeoMesa supports ingesting files from local disks or HDFS. In addition, Amazon’s S3 and Microsoft’s Azure file systems are supported with a few configuration changes. See Remote File System Support for details.
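
For example, a compressed copy of an input file can be produced with an extension that matches its compression type. This sketch assumes the standard gzip tool is installed; the file names, catalog, and feature name are hypothetical, and the ingest command is printed rather than run because connection arguments are omitted.

```shell
#!/bin/sh
# Write a small placeholder input file.
cat > sample.csv <<'EOF'
id,name:String,dtg:Date,*geom:Point:srid=4326
1,foo,2017-01-01T00:00:00.000Z,POINT(45 55)
EOF

# Compress it while keeping the original; the .gz extension matches the gzip format.
gzip -c sample.csv > sample.csv.gz

# The wildcard sits inside double quotes, so it is printed literally here
# rather than expanded by the shell. Placeholder catalog and feature names.
echo "geomesa-accumulo ingest -c example_catalog -f example sample*.csv.gz"
```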

Instead of specifying files, input data may be piped directly to the ingest command through stdin. Note that this works only in local mode and uses a single thread for ingestion. Progress indicators may not be entirely accurate, as the total size isn’t known up front. For example:

cat foo.csv | geomesa-accumulo ingest ...