21.11. Converter Mode

The normal use case for the FileSystem data store is to ingest data into it, in the same way as any other database. However, the data store can also read arbitrary data files produced by some other process, using GeoMesa Converters, as long as they meet a few criteria. To use this mode, specify fs.encoding as converter when creating a data store.

Note that converter mode is read-only.

21.11.1. Configuration

Converter mode requires several properties to be specified in the data store configuration. These can be set using the FileSystem Data Store Parameters fs.config.properties and fs.config.file.

21.11.1.1. `fs.options.converter.path`

This property must point to the root path containing the files to read.

21.11.1.2. `fs.options.sft.name`

This property may contain a well-known feature type name, to be loaded from the classpath.

21.11.1.3. `fs.options.sft.conf`

This property may contain a full feature type definition.

21.11.1.4. `fs.options.converter.name`

This property may contain a well-known converter name, to be loaded from the classpath.

21.11.1.5. `fs.options.converter.conf`

This property may contain a full converter definition.

21.11.1.6. `fs.options.leaf-storage`

Leaf storage controls the final layout of files and folders. When using leaf storage, the last component of the partition path is used as a prefix to the data file name, instead of as a separate folder. This can result in less directory overhead for filesystems such as S3.

As an example, a partition scheme of yyyy/MM/dd would produce a partition path like 2016/01/01. With leaf storage, the data files for that partition would be 2016/01/01_<datafile>.parquet. If leaf storage is disabled, the data files would be 2016/01/01/<datafile>.parquet, creating an extra level of directories.
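The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the naming rule described above, not the data store's implementation; the function name `data_file_path` and the file name `data.parquet` are hypothetical.

```python
from pathlib import PurePosixPath


def data_file_path(partition: str, file_name: str, leaf_storage: bool) -> str:
    """Return the path of a data file relative to the store root."""
    parts = PurePosixPath(partition).parts
    if leaf_storage:
        # The last partition component becomes a prefix of the file name.
        *folders, leaf = parts
        return str(PurePosixPath(*folders, f"{leaf}_{file_name}"))
    # Otherwise every partition component is its own folder.
    return str(PurePosixPath(*parts, file_name))


print(data_file_path("2016/01/01", "data.parquet", leaf_storage=True))
# 2016/01/01_data.parquet
print(data_file_path("2016/01/01", "data.parquet", leaf_storage=False))
# 2016/01/01/data.parquet
```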

21.11.1.7. `fs.partition-scheme.name`

This property contains a comma-delimited list of the partition schemes used by the files. Additional partition scheme options can be configured by prefixing them with `fs.partition-scheme.opts.` (including the trailing period).
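As a rough sketch of how these properties fit together, the following Python snippet assembles a properties string suitable for fs.config.properties. All values (the paths and the names `example-sft` and `example-converter`) are hypothetical, the value `true` for leaf storage is an assumption, and the `fs.path` parameter is assumed to be the store's usual root path parameter; check the FileSystem Data Store Parameters for the exact keys supported by your version.

```python
def to_properties(options: dict) -> str:
    """Render a dict as Java-properties style key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


# Converter-mode options described in this section (all values are hypothetical).
converter_options = {
    "fs.options.converter.path": "/data/incoming",
    "fs.options.sft.name": "example-sft",
    "fs.options.converter.name": "example-converter",
    "fs.options.leaf-storage": "true",  # assumed value format
    "fs.partition-scheme.name": "datetime",
    # Scheme options use the fs.partition-scheme.opts. prefix.
    "fs.partition-scheme.opts.datetime-format": "yyyy/MM/dd",
    "fs.partition-scheme.opts.step-unit": "DAYS",
}

# Data store parameters; fs.path is an assumption, not taken from this section.
params = {
    "fs.path": "/data/store",
    "fs.encoding": "converter",
    "fs.config.properties": to_properties(converter_options),
}

print(params["fs.config.properties"])
```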

21.11.2. Path Filters

The FSDS can filter paths within a partition for more granular control of queries. Path filtering is configured in the feature type through the user data key geomesa.fs.path-filter.name.

Currently, the only implementation is the dtg path filter, which parses a datetime from the file path and compares it to the query filter in order to include or exclude the file from the query. The following options are required for the dtg path filter, and are configured through the user data key geomesa.fs.path-filter.opts:

attribute - The Date attribute in the query to compare against.
pattern - The regular expression, with a single capturing group, to extract a datetime string from the path.
format - The datetime formatting pattern to parse a date from the regex capture.
buffer - The duration to buffer the bounds of the parsed datetime by within the current partition. To buffer time across partitions, see the receipt-time partition scheme.

Custom path filters can be loaded via SPI.
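The logic of the dtg path filter can be approximated as follows. This is a minimal sketch, not GeoMesa's implementation: it uses Python `strptime` directives in place of the Java datetime pattern, the query bounds on the configured attribute are passed in directly as `start` and `end`, and keeping files whose path cannot be parsed is an assumption. The function name and file names are hypothetical.

```python
import re
from datetime import datetime, timedelta


def dtg_path_filter(pattern: str, fmt: str, buffer: timedelta):
    """Build a predicate that decides whether a file may match a time range."""
    regex = re.compile(pattern)

    def accept(path: str, start: datetime, end: datetime) -> bool:
        match = regex.search(path)
        if match is None:
            # Assumption: files that cannot be parsed are kept, not skipped.
            return True
        parsed = datetime.strptime(match.group(1), fmt)
        # Widen the parsed instant by the buffer, then test for overlap
        # with the query bounds.
        return parsed - buffer <= end and parsed + buffer >= start

    return accept


accept = dtg_path_filter(r"data-(\d{12})\.csv", "%Y%m%d%H%M", timedelta(minutes=10))
start = datetime(2024, 1, 1, 12, 0)
end = datetime(2024, 1, 1, 13, 0)

print(accept("2024/01/01/data-202401011155.csv", start, end))  # True
print(accept("2024/01/01/data-202401010900.csv", start, end))  # False
```

Here the first file is stamped five minutes before the query window, but the ten-minute buffer brings it into range.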

21.11.3. Hierarchical Temporal Partitioning

The standard temporal partition schemes supported by the FileSystem data store are somewhat opaque, as they correspond to a number of days (or weeks, years, etc.) since the Unix epoch (1970/01/01). Converter mode supports additional temporal schemes that use standard date formatting, which may be easier to use with external processes. Note that where the names overlap, the hierarchical schemes take precedence over the standard partition schemes (when using the converter store).

21.11.3.1. Custom Scheme

Name: datetime

Configuration:

datetime-format - A Java DateTime format string, separated by forward slashes, which will be used to build a directory structure. For example, yyyy/MM/dd.
step-unit - A java.time.temporal.ChronoUnit defining how to increment the leaf of the partition scheme.
step - The amount to increment the leaf of the partition scheme. If not specified, defaults to 1.

The datetime scheme provides a fully customizable temporal scheme.
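One way to picture how these three options interact is to enumerate the partitions that cover a time range. The sketch below is an interpretation, not the data store's code: the Java pattern letters are translated to Python `strftime` directives through a small hypothetical lookup table, only three ChronoUnit values are handled, and `start` is assumed to be aligned to a partition boundary.

```python
from datetime import datetime, timedelta

# Hypothetical translation of a few Java pattern letters to strftime directives.
JAVA_TO_STRFTIME = {
    "yyyy": "%Y", "MM": "%m", "dd": "%d", "DDD": "%j", "HH": "%H", "mm": "%M",
}

# A subset of java.time.temporal.ChronoUnit names.
STEP_UNITS = {
    "MINUTES": timedelta(minutes=1),
    "HOURS": timedelta(hours=1),
    "DAYS": timedelta(days=1),
}


def partitions(datetime_format, step_unit, start, end, step=1):
    """List the partition paths between start and end, inclusive."""
    fmt = "/".join(JAVA_TO_STRFTIME[part] for part in datetime_format.split("/"))
    delta = STEP_UNITS[step_unit] * step
    result, current = [], start
    while current <= end:
        result.append(current.strftime(fmt))
        current += delta  # increment the leaf by `step` units
    return result


print(partitions("yyyy/MM/dd", "DAYS", datetime(2016, 1, 30), datetime(2016, 2, 2)))
# ['2016/01/30', '2016/01/31', '2016/02/01', '2016/02/02']
```

With `step=2` the same call would yield every second day, starting from `2016/01/30`.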

21.11.3.2. Hourly

Name: hourly

The hourly scheme partitions data by the hour, using the layout yyyy/MM/dd/HH.

21.11.3.3. Minute

Name: minute

The minute scheme partitions data by the minute, using the layout yyyy/MM/dd/HH/mm.

21.11.3.4. Daily

Name: daily

The daily scheme partitions data by the day, using the layout yyyy/MM/dd.

21.11.3.5. Weekly

Name: weekly

The weekly scheme partitions data by the week, using the layout yyyy/ww.

21.11.3.6. Monthly

Name: monthly

The monthly scheme partitions data by the month, using the layout yyyy/MM.

21.11.3.7. Julian

Names: julian-minute, julian-hourly, julian-daily

Julian schemes partition data by Julian day (the day of the year), instead of month/day. They use the patterns yyyy/DDD/HH/mm, yyyy/DDD/HH, and yyyy/DDD, respectively.
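To compare the named schemes side by side, the sketch below formats a single timestamp with each layout listed in this section. The layouts are translated by hand to Python `strftime` directives, which is an assumption of this example. The weekly scheme is left out because week numbering rules differ between Java's `ww` and Python's week directives.

```python
from datetime import datetime

# Layouts from this section, translated by hand to strftime directives.
LAYOUTS = {
    "minute": "%Y/%m/%d/%H/%M",         # yyyy/MM/dd/HH/mm
    "hourly": "%Y/%m/%d/%H",            # yyyy/MM/dd/HH
    "daily": "%Y/%m/%d",                # yyyy/MM/dd
    "monthly": "%Y/%m",                 # yyyy/MM
    "julian-minute": "%Y/%j/%H/%M",     # yyyy/DDD/HH/mm
    "julian-hourly": "%Y/%j/%H",        # yyyy/DDD/HH
    "julian-daily": "%Y/%j",            # yyyy/DDD
}


def partition_for(scheme: str, timestamp: datetime) -> str:
    """Return the partition path of a timestamp under a named scheme."""
    return timestamp.strftime(LAYOUTS[scheme])


ts = datetime(2016, 2, 1, 13, 5)
for name in LAYOUTS:
    print(f"{name:14} {partition_for(name, ts)}")
```

For 1 February 2016 the Julian schemes use day 032, so `julian-daily` gives `2016/032` where `daily` gives `2016/02/01`.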

21.11.3.8. Receipt Time

Name: receipt-time