22.7. Advanced Topics

22.7.1. Parallelism

The Lambda store has two kinds of parallelism: the number of simultaneous writers for long-term persistence, and the number of Kafka consumers per data store.

The number of writers is determined by the number of data store instances with a valid lambda.expiry parameter. However, write parallelism is limited by the number of partitions in the Kafka topic, because writes are synchronized across instances by topic and partition. The lambda.kafka.partitions parameter controls the number of partitions, but note that once a topic has been created, you will need to use the Kafka scripts (e.g. kafka-topics.sh) to modify its partitions.

The number of consumers used to load features into the in-memory cache is controlled by the lambda.kafka.consumers parameter. This is the number of consumers per data store instance per simple feature type accessed. Note that having more consumers than topic partitions is not recommended and will cause some consumers to be idle.
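The relationship between consumers and partitions can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical and not part of GeoMesa, the numeric values are assumed, and the counts are taken to apply to a single data store instance and a single simple feature type. Only the two parameter names come from the text above; connection parameters are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of GeoMesa.
final class LambdaParallelismSketch {

    // Consumers beyond the partition count have no partition to read, so they idle.
    static int idleConsumers(int consumers, int partitions) {
        return Math.max(0, consumers - partitions);
    }

    // Returns only the two parallelism entries named in the text;
    // connection parameters would have to be added by the caller.
    static Map<String, String> parallelismParams(int partitions, int consumers) {
        if (partitions < 1 || consumers < 1) {
            throw new IllegalArgumentException("partitions and consumers must be positive");
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("lambda.kafka.partitions", Integer.toString(partitions));
        params.put("lambda.kafka.consumers", Integer.toString(consumers));
        return params;
    }

    public static void main(String[] args) {
        int partitions = 4; // assumed value
        int consumers = 6;  // assumed value, deliberately higher than partitions
        System.out.println(parallelismParams(partitions, consumers));
        System.out.println("idle consumers: " + idleConsumers(consumers, partitions));
    }
}
```

A check like idleConsumers can be run before creating the data store, so that a consumer count higher than the partition count is caught early rather than silently wasting threads.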

22.7.2. Installation Tips

The typical use case for a Lambda data store is to be fed by an analytic streaming process. The analytic will update summary SimpleFeatures with derived attributes containing analytic results. For example, when aggregating GPS tracks, there could be a dynamically updated field for current estimated heading.

Typical streaming analytics will run in many concurrent threads, for example using Apache Storm. The Lambda store achieves parallelism based on the lambda.kafka.partitions parameter. Generally you should start with this set to the number of writer threads being used, and adjust up or down as needed.

Any data store instances used only for reads (e.g. GeoServer) should generally disable writing by setting the lambda.expiry parameter to Inf. Note that at least one data store instance must have a valid expiry, or features will never be removed from memory and persisted to long-term storage.
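As a rough sketch of the reader/writer split described above, the following Java derives two parameter maps from a shared base and checks that at least one instance will persist features. The class and method names are hypothetical, the expiry value "1h" is an assumed example of a valid duration, and the connection parameters are omitted. The sketch does not validate the duration format, and it treats a missing expiry as non-persisting, which is an assumption rather than documented behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of GeoMesa.
final class LambdaExpirySketch {

    static final String EXPIRY = "lambda.expiry";
    static final String NEVER = "Inf"; // disables writing for read-only instances

    // Copies the shared parameters and sets the expiry for one instance.
    static Map<String, String> withExpiry(Map<String, String> base, String expiry) {
        Map<String, String> copy = new HashMap<>(base);
        copy.put(EXPIRY, expiry);
        return copy;
    }

    // Assumption: a missing expiry is treated as non-persisting in this sketch.
    static boolean persists(Map<String, String> params) {
        String expiry = params.get(EXPIRY);
        return expiry != null && !NEVER.equalsIgnoreCase(expiry.trim());
    }

    // True if at least one instance will move features to long-term storage.
    static boolean anyPersists(List<Map<String, String>> instances) {
        return instances.stream().anyMatch(LambdaExpirySketch::persists);
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>(); // connection parameters omitted
        Map<String, String> writer = withExpiry(base, "1h");  // assumed duration value
        Map<String, String> reader = withExpiry(base, NEVER); // e.g. GeoServer
        System.out.println("safe deployment: " + anyPersists(List.of(writer, reader)));
    }
}
```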

22.7.3. Manual Persistence

Instead of allowing features to be persisted to long-term storage automatically, you can control exactly when features are written. To do this, set the lambda.persist parameter to false on all data store instances. Use SimpleFeatureWriters as usual to add features to and delete features from the in-memory cache. With lambda.persist disabled, these writes will not affect long-term storage. Note that data stores will still expire features from memory, which is required to clean up internal state. Make sure that the lambda.expiry parameter is set high enough that it won’t remove features that you still want available in memory.
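A minimal sketch of the manual-persistence settings, again in Java with hypothetical class and method names: it sets lambda.persist to false together with an expiry, and verifies that every instance in a deployment has automatic persistence disabled. The expiry value "24h" is an assumed example of a "high enough" duration, and connection parameters are omitted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of GeoMesa.
final class ManualPersistenceSketch {

    // Copies the shared parameters, disables automatic persistence,
    // and sets the expiry used to clean up in-memory state.
    static Map<String, String> manual(Map<String, String> base, String expiry) {
        Map<String, String> copy = new HashMap<>(base);
        copy.put("lambda.persist", "false");
        copy.put("lambda.expiry", expiry);
        return copy;
    }

    // The text requires lambda.persist=false on all instances, not just one.
    // Note that an empty list trivially passes this check.
    static boolean allManual(List<Map<String, String>> instances) {
        return instances.stream()
                .allMatch(p -> "false".equalsIgnoreCase(p.get("lambda.persist")));
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>(); // connection parameters omitted
        Map<String, String> a = manual(base, "24h"); // assumed duration value
        Map<String, String> b = manual(base, "24h");
        System.out.println("manual persistence everywhere: " + allManual(List.of(a, b)));
    }
}
```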

To write features to long-term storage, create an instance of the delegate data store (AccumuloDataStore) using the same connection parameters as for the Lambda store. Any features written to the delegate store will then be queryable through the Lambda store, where they are merged with the in-memory cache.

22.7.4. Monitoring State

The state of a Lambda data store can be monitored by enabling logging on the following classes:

Class                                                              Level  Info
org.locationtech.geomesa.lambda.stream.kafka.DataStorePersistence  debug  Count of features written to long-term storage
org.locationtech.geomesa.lambda.stream.kafka.DataStorePersistence  trace  All features written to long-term storage
org.locationtech.geomesa.lambda.stream.kafka.KafkaStore            trace  All features written to Kafka
org.locationtech.geomesa.lambda.stream.kafka.KafkaCacheLoader      trace  All features read from Kafka
org.locationtech.geomesa.lambda.stream.kafka.KafkaFeatureCache     debug  Size of in-memory cache
org.locationtech.geomesa.lambda.stream.kafka.KafkaFeatureCache     trace  All features added/removed from in-memory cache