21.7. Index Configuration

When creating a new feature type using the GeoMesa FileSystem data store (FSDS), there are several required options that must be specified. Through the command line tools, these options are exposed as flags. If using the GeoTools data store API, the options must be set as user data before calling createSchema.

21.7.1. Configuring the Partition Scheme

Partition schemes define how data is partitioned (grouped) into different files. Schemes are defined by a well-known name, along with optional configuration flags. See Partition Schemes for more details.

Partition schemes can be specified through the user data key geomesa.fs.scheme:

import org.locationtech.geomesa.fs.storage.common.interop.ConfigurationUtils;

SimpleFeatureType sft = ...
// use the utility method
ConfigurationUtils.setScheme(sft, "daily,z2:bits=2");
// or set directly in the user data as a string
sft.getUserData().put("geomesa.fs.scheme", "daily,z2:bits=2");

Different schemes are separated with a comma (,), while scheme options are separated with a colon (:), and the value for each option is separated by an equals sign (=). For example: first_scheme_name:option.1.key=option1value:option.2.key=option2value,second_scheme_name
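To make the delimiter rules concrete, the following is a minimal sketch that splits a scheme string into scheme names and their options. It is purely illustrative: the class SchemeStringExample and its parse method are hypothetical names, not part of GeoMesa, and GeoMesa's own parsing may handle edge cases differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the GeoMesa API
public class SchemeStringExample {

    /** One parsed scheme: its well-known name plus any options. */
    public static final class Scheme {
        public final String name;
        public final Map<String, String> options;

        Scheme(String name, Map<String, String> options) {
            this.name = name;
            this.options = options;
        }
    }

    /** Splits on ',' for schemes, ':' for options and '=' for each key/value pair. */
    public static List<Scheme> parse(String spec) {
        List<Scheme> schemes = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] tokens = part.trim().split(":");
            Map<String, String> options = new LinkedHashMap<>();
            // tokens[0] is the scheme name, the remaining tokens are key=value options
            for (int i = 1; i < tokens.length; i++) {
                int eq = tokens[i].indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("Expected key=value but got: " + tokens[i]);
                }
                options.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
            }
            schemes.add(new Scheme(tokens[0], options));
        }
        return schemes;
    }

    public static void main(String[] args) {
        for (Scheme s : parse("daily,z2:bits=2")) {
            System.out.println(s.name + " " + s.options);
        }
    }
}
```

Under these assumptions, "daily,z2:bits=2" yields two schemes: daily with no options, and z2 with the single option bits=2.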

21.7.2. Configuring Target File Size

By default, data files can grow without limit as more data is written and files are compacted. Very large files may lead to poor performance. To manage this, a target file size can be configured through the user data key geomesa.fs.file-size:

import org.locationtech.geomesa.fs.storage.common.interop.ConfigurationUtils;

SimpleFeatureType sft = ...
// use the utility method
ConfigurationUtils.setTargetFileSize(sft, "1GB");
// or set directly in the user data as a string
sft.getUserData().put("geomesa.fs.file-size", "1GB");

Note that target file size can also be specified in some operations, which will override any default configured in the feature type. See compact and ingest for details. See geomesa.fs.size.threshold for controlling the file size error margin.

21.7.3. Configuring Visibility Persistence

By default, GeoMesa persists feature visibility flags as a column in the FSDS files. If feature visibilities are not being used, this may be disabled by setting geomesa.fs.visibilities to false, either in the feature type user data or in the fs.config.properties or fs.config.file data store parameters.

21.7.4. Configuring Custom Observer Callbacks

The FSDS provides a mechanism to add custom handling during file writing. Users can implement observer factories, which will be invoked for each new file that is created. Observer factories must extend the trait org.locationtech.geomesa.fs.storage.core.observer.FileSystemObserverFactory:

package org.locationtech.geomesa.fs.storage.core.observer

import org.locationtech.geomesa.fs.storage.core.FileSystemStorage

import java.io.Closeable
import java.net.URI

/**
 * Factory for observing file writes
 */
trait FileSystemObserverFactory extends Closeable {

  /**
   * Called once after instantiating the factory
   *
   * @param storage the file system storage being observed
   */
  def init(storage: FileSystemStorage): Unit

  /**
   * Create an observer for the given path
   *
   * @param path file path being written
   * @return an observer for the file at the given path
   */
  def apply(path: URI): FileSystemObserver
}

Note

Observer factories must have a default no-arg constructor in order to be instantiated by the framework.

Observers can be specified through the user data key geomesa.fs.observers:

import java.util.Arrays;
import java.util.List;

SimpleFeatureType sft = ...
List<String> factories =
  Arrays.asList("com.example.MyCustomObserverFactory", "com.example.MySecondObserverFactory");
// set directly in the user data as a comma-delimited string
sft.getUserData().put("geomesa.fs.observers", String.join(",", factories));
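As a recap, the user data keys described in this section can be assembled in one place before calling createSchema. The sketch below uses a plain java.util.Map so that it runs without GeoMesa on the classpath. The class FsdsUserDataExample and its userData method are hypothetical names, and storing the visibility flag as the string "false" is an assumption; with a real feature type the entries would be copied in with sft.getUserData().putAll(...).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the GeoMesa API
public class FsdsUserDataExample {

    /** Builds the user data entries described in this section as plain strings. */
    public static Map<String, String> userData(String scheme, String fileSize,
                                               boolean visibilities, List<String> observers) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("geomesa.fs.scheme", scheme);
        data.put("geomesa.fs.file-size", fileSize);
        // assumption: the flag is stored as a string value
        data.put("geomesa.fs.visibilities", String.valueOf(visibilities));
        if (!observers.isEmpty()) {
            // observers are stored as a comma-delimited list of factory class names
            data.put("geomesa.fs.observers", String.join(",", observers));
        }
        return data;
    }

    public static void main(String[] args) {
        Map<String, String> data = userData("daily,z2:bits=2", "1GB", false,
            Arrays.asList("com.example.MyCustomObserverFactory"));
        // with a real feature type: sft.getUserData().putAll(data);
        data.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```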