21.6. FileSystem DataStore Configuration¶
21.6.1. System Properties¶
This section details configuration properties specific to the FileSystem data store. For general properties, see Runtime Configuration.
21.6.1.1. geomesa.fs.file.cache.duration¶
To avoid repeated reads from disk, GeoMesa will cache the results of disk operations for a certain period of time. This has the side effect that files modified by external processes may not be visible until after the cache timeout.
The property is defined as a duration, e.g. 60 seconds or 100 millis. By default it is 10 minutes.
21.6.1.2. geomesa.fs.size.threshold¶
When specifying a target size for data files, this property controls the error margin that is considered acceptable. Files which are outside of the margin may be merged or split during compactions. See Configuring Target File Size for more information.
The threshold is specified as a float greater than 0 and less than 1, with a default value of 0.05.
For example, if the target file size is 100 bytes, then an error threshold of 0.05 means that files will not
be compacted if they are between 95 and 105 bytes.
21.6.1.3. geomesa.fs.validate.file¶
When this property is set, files will be checked for data corruption after closing a file writer.
21.6.1.4. geomesa.fs.writer.partition.timeout¶
When writing to multiple partitions, each partition writer is kept open until the overall feature writer is closed. When writing to many partitions at once, this may cause memory problems due to the large number of writers. To mitigate this, idle partitions can be closed after a configurable timeout.
The timeout is defined as a duration, e.g. 60 seconds or 100 millis.
21.6.2. Storage Configuration Properties¶
Storage-specific configuration properties are configured through the FileSystem Data Store Parameters fs.config.properties and
fs.config.file. Additional properties related to metadata storage are outlined in FileSystem Metadata.
21.6.2.1. geomesa.fs.visibilities¶
This property can be used to skip writing visibility labels, by setting it to false. If the data being written is known to
not have any labels, the visibility column can be removed. See Data Security for an overview of security labels.
21.6.2.2. geomesa.parquet.bounding-boxes¶
This property can be used to skip writing bounding boxes for geometry-type columns when using Parquet, by setting
it to false. By default, each geometry will include an array-type column that includes the minimum and maximum extents
of the geometry. This can be used to accelerate queries through push-down filtering.
21.6.2.3. geomesa.parquet.geometries¶
This property can be used to control the encoding schema used for geometry-type columns when using Parquet. The available options are:
GeoParquetWkb(default) - This schema uses GeoParquet 1.1.0 with geometries encoded as WKB. This format is supported by most 3rd party libraries that can read GeoParquet.GeoParquetNative- This schema uses GeoParquet 1.1.0 with geometries encoded “natively”. This format doesn’t require special libraries to read, but isn’t as widely supported as WKB.
21.6.3. AWS S3 Configuration¶
The following properties are specific to S3, and are also specified through the FileSystem Data Store Parameters fs.config.properties
and fs.config.file. Additionally, the AWS Java SDK (v2) will load
configuration from various places, such as ~/.aws/config, ~/.aws/credentials or various environment variables. Many of
these parameters map directly to the underlying S3 client configuration. See the AWS documentation for details.
21.6.3.1. fs.s3.region¶
Override the default S3 region. Can also be specified through the environment variable AWS_REGION.
21.6.3.2. fs.s3.access-key-id¶
Authenticate with this AWS access key. Can also be specified through the environment variable AWS_ACCESS_KEY_ID.
Warning
AWS credentials are valuable - make sure to safeguard them appropriately.
21.6.3.3. fs.s3.secret-access-key¶
Authenticate with this AWS secret access key. Can also be specified through the environment variable AWS_SECRET_ACCESS_KEY.
Warning
AWS credentials are valuable - make sure to safeguard them appropriately.
21.6.3.4. fs.s3.endpoint¶
Override the default S3 endpoint url. Can also be specified through the environment variable FS_S3_ENDPOINT.
21.6.3.5. fs.s3.force-path-style¶
Force “path-style” access, useful for connecting to non-AWS stores such as Minio. Can also be specified through the environment
variable FS_S3_FORCE_PATH_STYLE.
21.6.3.6. fs.s3.write-buffering¶
Specify the buffering strategy for writes to S3. Must be one of disk (default) or memory. Data will be written to the
specified location until it hits a certain threshold, at which point it will be asynchronously uploaded to S3.
Can also be specified through the environment variable FS_S3_WRITE_BUFFERING.
21.6.3.7. fs.s3.write-buffer-dir¶
When using disk buffering, specify the directory to use for intermediate writes, default ${java.io.tmpdir}/s3/.
Can also be specified through the environment variable FS_S3_WRITE_BUFFER_DIR.
21.6.3.8. fs.s3.write-buffer-in-bytes¶
Specify the amount of data to buffer before uploading to S3, default 64MB. Can also be specified through the environment
variable FS_S3_WRITE_BUFFER_IN_BYTES.
21.6.3.9. fs.s3.num-retries¶
Specify the number of retries in the S3 client. Can also be specified through the environment variable FS_S3_NUM_RETRIES.
21.6.3.10. fs.s3.target-throughput-in-gbps¶
Specify the target throughput in the S3 client. Can also be specified through the environment variable
FS_S3_TARGET_THROUGHPUT_IN_GBPS.
21.6.3.11. fs.s3.minimum-part-size-in-bytes¶
Specify the minimum part size in the S3 client. Can also be specified through the environment variable
FS_S3_MINIMUM_PART_SIZE_IN_BYTES.
21.6.3.12. fs.s3.max-concurrency¶
Specify the maximum concurrency in the S3 client. GeoMesa overrides the default to be 600. Can also be specified through the
environment variable FS_S3_MAX_CONCURRENCY.
21.6.3.13. fs.s3.connection-timeout¶
Specify the connection timeout in the S3 client. Can also be specified through the environment variable
FS_S3_CONNECTION_TIMEOUT.
21.6.3.14. fs.s3.max-native-memory-limit-in-bytes¶
Specify the maximum native memory limit in the S3 client. Can also be specified through the environment variable
FS_S3_MAX_NATIVE_MEMORY_LIMIT_IN_BYTES.
21.6.3.15. fs.s3.request-checksum-calculation¶
Specify the checksum calculation in the S3 client. Valid values are WHEN_SUPPORTED or WHEN_REQUIRED. Can also be
specified through the environment variable FS_S3_REQUEST_CHECKSUM_CALCULATION.
21.6.3.16. fs.s3.response-checksum-validation¶
Specify the checksum calculation in the S3 client. Valid values are WHEN_SUPPORTED or WHEN_REQUIRED. Can also be
specified through the environment variable FS_S3_RESPONSE_CHECKSUM_VALIDATION.
21.6.3.17. fs.s3.initial-read-buffer-size-in-bytes¶
Specify the initial read buffer size in the S3 client. Can also be specified through the environment variable
FS_S3_INITIAL_READ_BUFFER_SIZE_IN_BYTES.
21.6.3.18. fs.s3.accelerate¶
Enable the S3 client to use S3 Transfer Acceleration endpoints, default false. Can also be specified through the environment
variable FS_S3_ACCELERATE.
21.6.3.19. fs.s3.threshold-in-bytes¶
Specify the threshold in the S3 client. Can also be specified through the environment variable FS_S3_THRESHOLD_IN_BYTES.