21.5. FileSystem Command-Line Tools
The GeoMesa FileSystem distribution includes a set of command-line tools for feature management, ingest, export and debugging.
To install the tools, see Setting up the FileSystem Command Line Tools.
Once installed, the tools should be available through the command geomesa-fs:
$ geomesa-fs
INFO Usage: geomesa-fs [command] [command options]
Commands:
...
Commands that are common to multiple back ends are described in Command-Line Tools. The commands here are FileSystem-specific.
21.5.1. General Arguments
Most commands require the --path argument, which specifies the root storage path. Configuration properties can
be passed in using --config or --config-file; these can be used to specify, for example, S3-related properties.
The --auths argument corresponds to the data store parameter geomesa.security.auths. See
Data Security for more information.
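CQL predicates and configuration values often contain spaces and quotes. When the tools are driven from a script, it can be simpler to build the argument list directly instead of assembling a shell string. The sketch below is a hypothetical Python helper, not part of GeoMesa. It only builds the list and never runs the tool. The only flags it uses from this page are --path and --auths (plus --cql in the usage example). The schema-name argument is left out because its flag is not named here, so pass it through `extra_args`. The bucket, the `dtg` attribute and the auths value are placeholders.

```python
import shlex


def fs_command(command, root_path, *extra_args, auths=None):
    """Build (but do not run) an argv list for a geomesa-fs command.

    Hypothetical helper: only --path and --auths come from this page;
    any other arguments are passed through unchanged via extra_args.
    """
    argv = ["geomesa-fs", command, "--path", root_path]
    if auths:
        # Corresponds to the geomesa.security.auths data store parameter.
        argv += ["--auths", auths]
    argv += list(extra_args)
    return argv


# Placeholder values: the bucket, the 'dtg' attribute and the auths are examples.
argv = fs_command(
    "generate-partition-filters",
    "s3a://example-bucket/geomesa",
    "--cql", "dtg DURING 2024-01-01T00:00:00Z/2024-01-02T00:00:00Z",
    auths="user",
)
# shlex.join renders the equivalent, correctly quoted shell command line.
print(shlex.join(argv))
```

To execute the command, pass the list to `subprocess.run(argv, check=True)`. No shell is involved, so the CQL predicate needs no extra quoting.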
21.5.2. Commands
21.5.2.1. compact
Compact one or more filesystem partitions. This will merge multiple files into fewer, larger files, which may provide better query performance.
| Argument | Description |
|---|---|
| --path | The filesystem root path used to store data |
|  | The name of the schema |
|  | Partitions to compact (omit to compact all partitions) |
|  | One of |
| --temp-path | Path to a temp directory used for working files |
The --temp-path argument may be useful when working with S3 data, as S3 is slow for incremental writes.
21.5.2.2. generate-partition-filters
Calculate filters that exactly match partitions. This can be used to facilitate exports from another system directly into the appropriate partition directory.
| Argument | Description |
|---|---|
| --path | The filesystem root path used to store data |
|  | The name of the schema |
| --cql | CQL predicate to determine the partitions to operate on |
| --partitions | Partitions to operate on |
|  | Suppress the column headers in the output |
At least one of --cql or --partitions must be specified, to select the partitions being operated on.
The results will be output in tab-delimited text, containing the partition name and the associated filter.
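The tab-delimited output is easy to consume from a script, for example to drive an export from another system one partition at a time. Below is a minimal Python sketch. It assumes the first line is a header row unless headers were suppressed. The helper name, the sample header text and the sample filter are all made up for illustration.

```python
def parse_partition_filters(output, has_header=True):
    """Parse tab-delimited 'partition<TAB>filter' lines into a dict.

    has_header=True drops the first non-blank line (the column headers);
    use has_header=False if the headers were suppressed.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    if has_header and lines:
        lines = lines[1:]
    filters = {}
    for line in lines:
        # Split on the first tab only; everything after it is the filter.
        name, _, cql = line.partition("\t")
        filters[name] = cql
    return filters


# Hypothetical output: the header text and partition name are placeholders.
sample = (
    "partition\tfilter\n"
    "2024/01/01\tdtg >= '2024-01-01T00:00:00Z' AND dtg < '2024-01-02T00:00:00Z'\n"
)
for partition, cql in parse_partition_filters(sample).items():
    print(partition, "->", cql)
```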
21.5.2.3. get-files
Displays the files for one or more filesystem partitions.
| Argument | Description |
|---|---|
| --path | The filesystem root path used to store data |
|  | The name of the schema |
|  | Partitions to list (omit to list all partitions) |
|  | A file containing partitions to list, one per line (omit to list all partitions) |
|  | CQL predicate to determine the partitions to operate on |
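Partition files for get-files use a one-partition-per-line layout. The check-consistency and register-iceberg-files commands use the same layout. The hypothetical Python helper below writes such a file from a list of names and skips blank entries. The partition names are placeholders, since real names depend on the partition scheme.

```python
import tempfile
from pathlib import Path


def write_partition_file(path, partitions):
    """Write partition names one per line; returns the number written."""
    names = [p.strip() for p in partitions if p.strip()]
    Path(path).write_text("".join(name + "\n" for name in names), encoding="utf-8")
    return len(names)


with tempfile.TemporaryDirectory() as tmp:
    partition_file = Path(tmp) / "partitions.txt"
    # Placeholder partition names; the empty entry is skipped.
    count = write_partition_file(partition_file, ["2024/01/01", "2024/01/02", ""])
    print(count, partition_file.read_text(encoding="utf-8").splitlines())
```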
21.5.2.4. get-partitions
Displays the partitions for a given filesystem store.
| Argument | Description |
|---|---|
| --path | The filesystem root path used to store data |
|  | The name of the schema |
|  | Do not output a header line with column names |
21.5.2.5. ingest
For an overview of ingestion options, see ingest.
This command ingests files into a GeoMesa FS Datastore. Note that a “datastore” is simply a path in the filesystem. All data and metadata will be stored in the filesystem under the hierarchy of the root path.
| Argument | Description |
|---|---|
| --path | The filesystem root path used to store data |
| --partition-scheme | Partition schemes |
| --num-reducers | Number of reducers to use (required for distributed ingest) |
| --temp-path | Path to a temp directory used for working files |
|  | Additional storage options to set as SimpleFeatureType user data, in the form |
If the schema does not already exist, then --partition-scheme is required, otherwise it may be omitted.
The --partition-scheme argument should be the well-known name of a provided partition scheme. See
Partition Schemes for more information.
The --num-reducers argument should generally be set to half the number of partitions.
The --temp-path argument may be useful when working with S3 data, as S3 is slow to write to.
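The --num-reducers guideline above is simple arithmetic. For example, input spanning 10 partitions suggests 5 reducers. The sketch below is a hypothetical Python helper. Rounding up for odd counts and never returning fewer than one reducer are assumptions of this sketch, not documented GeoMesa behavior.

```python
def suggested_num_reducers(num_partitions):
    """About half the number of partitions, per the guideline above.

    Rounding up and the minimum of one reducer are assumptions of this sketch.
    """
    if num_partitions < 0:
        raise ValueError("num_partitions must be non-negative")
    return max(1, (num_partitions + 1) // 2)


for n in (1, 7, 10, 365):
    print(n, "partitions ->", suggested_num_reducers(n), "reducers")
```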
21.5.2.6. manage-metadata
This command manages the metadata entries in a file system storage instance. It has five sub-commands:
- register - create a new metadata entry for an existing data file
- unregister - remove a metadata entry for an existing data file
- configure - set or unset metadata configuration values
- migrate - migrate metadata from one type to another
- check-consistency - check consistency between the metadata and data files
To invoke the command, use the command name followed by the sub-command, then any arguments. For example:
$ geomesa-fs manage-metadata check-consistency -p /tmp/geomesa ...
| Argument | Description |
|---|---|
| --path | The filesystem root path used to store data |
|  | The name of the schema |
21.5.2.6.1. register
The register sub-command will add metadata associated with a particular file. When new data files are created through an
external bulk process, they must be registered using this command before they are queryable. Note that files generally
must already be in the same filesystem as the storage root path in order to be registered.
| Argument | Description |
|---|---|
|  | The path of the file(s) to register |
|  | Delete files after copying them into the storage root path |
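Files generally need to be on the same filesystem as the storage root before they can be registered, so a bulk-loading script might check this first. Below is a minimal Python sketch. It assumes that "same filesystem" means the same URI scheme and authority (bucket or host), which is roughly how Hadoop's FileSystem API tells filesystems apart. The helper is hypothetical; the register sub-command does not run it.

```python
from urllib.parse import urlparse


def same_filesystem(root_path, file_path):
    """Return True if both paths share a URI scheme and authority.

    Paths without a scheme are treated as local ('file') paths. This is a
    hypothetical pre-check, not part of the register sub-command.
    """
    root, target = urlparse(root_path), urlparse(file_path)
    return (root.scheme or "file", root.netloc) == (target.scheme or "file", target.netloc)


print(same_filesystem("s3a://example-bucket/geomesa", "s3a://example-bucket/staging/data.parquet"))
print(same_filesystem("s3a://example-bucket/geomesa", "hdfs://namenode:8020/staging/data.parquet"))
```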
21.5.2.6.2. unregister
The unregister sub-command will delete the metadata associated with a particular file.
| Argument | Description |
|---|---|
|  | The path of the file to unregister, relative to the storage root path |
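The file to unregister is given relative to the storage root path, so an absolute URI must be converted first. Below is a hypothetical Python sketch of that conversion. It rejects files on another filesystem or outside the root. The bucket and directory layout are placeholders and do not describe GeoMesa's actual on-disk layout.

```python
import posixpath
from urllib.parse import urlparse


def relative_to_root(root_path, file_path):
    """Return file_path relative to root_path (hypothetical helper).

    Raises ValueError if the file is on a different filesystem or is not
    located under the root.
    """
    root, target = urlparse(root_path), urlparse(file_path)
    if (root.scheme, root.netloc) != (target.scheme, target.netloc):
        raise ValueError(f"{file_path} is not on the same filesystem as {root_path}")
    rel = posixpath.relpath(target.path, root.path or "/")
    if rel in (".", "..") or rel.startswith("../"):
        raise ValueError(f"{file_path} is not under {root_path}")
    return rel


print(relative_to_root(
    "s3a://example-bucket/geomesa",
    "s3a://example-bucket/geomesa/2024/01/01/data.parquet",
))
```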
21.5.2.6.3. configure
The configure sub-command lets you set storage-level configuration options.
| Argument | Description |
|---|---|
|  | The configuration to set, in the form |
21.5.2.6.4. migrate
The migrate sub-command will move the metadata storage from one type (file or jdbc) to another.
| Argument | Description |
|---|---|
|  | Metadata type to migrate to |
|  | Metadata configuration properties for the type to migrate to, in the form k=v |
|  | Name of a metadata configuration file for the type to migrate to, in Java properties format |
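A configuration file in Java properties format can be generated from a script. Below is a minimal Python sketch. The key names are placeholders, not real GeoMesa property names. It assumes ASCII-only content and simple keys without whitespace, `=` or `:`. It escapes backslashes, line breaks, tabs and a leading space in values so that `java.util.Properties.load()` reads them back unchanged. The same format applies to the Iceberg configuration file accepted by register-iceberg-files.

```python
import tempfile
from pathlib import Path


def _escape_value(value):
    out = []
    for i, ch in enumerate(value):
        if ch == "\\":
            out.append("\\\\")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif ch == " " and i == 0:
            # A leading space would otherwise be stripped by Properties.load().
            out.append("\\ ")
        else:
            out.append(ch)
    return "".join(out)


def to_properties(props):
    """Render a dict as Java properties text, one key=value per line (sorted)."""
    lines = []
    for key in sorted(props):
        value = str(props[key])
        if not key or not (key + value).isascii():
            raise ValueError(f"unsupported (empty or non-ASCII) entry: {key!r}")
        if any(ch in key for ch in " \t\f=:") or key[0] in "#!":
            raise ValueError(f"unsupported property key: {key!r}")
        lines.append(f"{key}={_escape_value(value)}\n")
    return "".join(lines)


# Placeholder keys: substitute the properties documented for your metadata type.
text = to_properties({
    "example.url": "jdbc:postgresql://db.example.com/geomesa",
    "example.user": "geomesa",
})
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "metadata.properties"
    config_file.write_text(text, encoding="ascii")
    print(config_file.read_text(encoding="ascii"))
```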
21.5.2.6.5. check-consistency
The check-consistency sub-command will check the metadata against the data files. It will
find data files that are not referenced in the metadata, and metadata entries that do not
correspond to data files.
| Argument | Description |
|---|---|
|  | The names of the partitions to check (omit to check all partitions) |
|  | A file containing partitions to check, one per line (omit to check all partitions) |
|  | Number of threads to use when listing data files |
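Conceptually, the two checks described above are set differences between the files referenced in the metadata and the files found on disk. The directory listing is the part that benefits from multiple threads. Below is an illustrative Python sketch on a local filesystem. It is not GeoMesa's implementation, and the function names and directory layout are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def list_data_files(partition_dirs, threads=4):
    """List regular files in each partition directory using a thread pool."""
    def list_dir(directory):
        return [
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))
        ]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return {path for files in pool.map(list_dir, partition_dirs) for path in files}


def compare(metadata_files, data_files):
    """Return (data files not in the metadata, metadata entries with no data file)."""
    unreferenced = sorted(set(data_files) - set(metadata_files))
    missing = sorted(set(metadata_files) - set(data_files))
    return unreferenced, missing


with tempfile.TemporaryDirectory() as tmp:
    part = os.path.join(tmp, "2024", "01", "01")
    os.makedirs(part)
    for name in ("a.parquet", "b.parquet"):
        with open(os.path.join(part, name), "w"):
            pass
    on_disk = list_data_files([part])
    # Pretend the metadata knows about a.parquet plus a file that no longer exists.
    in_metadata = {os.path.join(part, "a.parquet"), os.path.join(part, "c.parquet")}
    unreferenced, missing = compare(in_metadata, on_disk)
    print("not in metadata:", [os.path.basename(p) for p in unreferenced])
    print("no data file:", [os.path.basename(p) for p in missing])
```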
21.5.2.7. register-iceberg-files
This command will register GeoMesa files into an Apache Iceberg table, allowing the data to be queried by any Iceberg-compatible query engine such as Apache Spark or Apache Trino.
| Argument | Description |
|---|---|
| --path | The filesystem root path used to store data |
|  | The name of the schema containing the files to register |
| --partition | The partitions to register |
| --partition-file | A file containing partitions to register, one per line |
| --iceberg-config | Configuration properties for connecting to Iceberg, in the form |
| --iceberg-config-file | Name of a configuration file for connecting to Iceberg, in Java properties format |
|  | Iceberg namespace to use for tables |
|  | Do not check the table for existing files. If the partitions have not been registered before, checking for duplicates is unnecessary work; however, use this option with care, as duplicate files will cause duplicate results when querying |
Note: at least one of --partition or --partition-file must be specified. At least one of --iceberg-config or
--iceberg-config-file must be specified.
See the Apache Iceberg documentation for details on configuration properties.
Note
You may need to add additional dependencies to the GeoMesa classpath, depending on the Iceberg catalog implementation being used.
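The option that skips the check for existing files involves a trade-off, which a simple filter can illustrate. Checking costs a lookup against the files already in the table. Skipping the check is only safe for partitions that have never been registered, because otherwise a file is added twice and appears twice in query results. Below is a hypothetical Python sketch. It is not how GeoMesa or Iceberg implement the check, and the file names are placeholders.

```python
def files_to_register(candidate_files, already_registered, check_existing=True):
    """Return the files to add to the table, in input order.

    With check_existing=False the candidates are returned unchanged, mirroring
    a skipped duplicate check: any file already in the table would be added
    again and produce duplicate query results.
    """
    if not check_existing:
        return list(candidate_files)
    seen = set(already_registered)
    result = []
    for path in candidate_files:
        if path not in seen:
            seen.add(path)
            result.append(path)
    return result


existing = ["2024/01/01/a.parquet"]
new = ["2024/01/01/a.parquet", "2024/01/02/b.parquet"]
print(files_to_register(new, existing))
print(files_to_register(new, existing, check_existing=False))
```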