GeoMesa Storm Quick Start
=========================
Apache Storm is "a free and open source distributed realtime computation
system." You can leverage Storm to analyze and ingest data into GeoMesa
in near real time. In this tutorial, we will:

1. Use Apache Kafka to send messages to a Storm topology.
2. Use Storm to parse Open Street Map (OSM) data files and ingest them
   into Accumulo.
3. Leverage GeoServer to query and visualize the data.

Prerequisites
-------------

You will need access to:

- an instance of Accumulo |accumulo_required_version|,
- an Accumulo user with create-table and write permissions,
- an installation of Kafka |kafka_version|,
- an installation of Storm 0.9+, and
- an instance of GeoServer |geoserver_version| with the GeoMesa Accumulo
  plugin installed.

To install the GeoMesa Accumulo GeoServer plugin, see
:ref:`install_accumulo_geoserver`.

You will also need:

- the ``xz`` data compression tool,
- Java JDK 8,
- Apache Maven |maven_version|, and
- a git client.

Download and Build the Tutorial
-------------------------------

Pick a reasonable directory on your machine, and run:

.. code-block:: bash

    $ git clone https://github.com/geomesa/geomesa-tutorials.git
    $ cd geomesa-tutorials

.. note::

    You may need to download a particular release of the
    tutorials project to target a particular GeoMesa release.

To build, run:

.. code-block:: bash

    $ mvn clean install -pl geomesa-quickstart-storm

.. note::

    Ensure that the versions of Accumulo, Hadoop, Storm,
    etc. in the root ``pom.xml`` match your environment.

.. note::

    Depending on the version, you may also need to build
    GeoMesa locally. Instructions can be found in
    :ref:`installation`.

Obtaining OSM Data
------------------

In this demonstration, we will use the ``simple-gps-points`` OSM data,
which contains only the location of each observation. Download the OSM
data file ``simple-gps-points-120312.txt.xz``.

.. note::

    The file is approximately 7 GB.

Use the following command to unpack the data:

.. code-block:: bash

    $ xz -d simple-gps-points-120312.txt.xz

Deploy the Ingest Topology
--------------------------

The quickstart topology will read messages off of a Kafka topic, parse
them into ``SimpleFeature``\ s, and write them to Accumulo.
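As a rough illustration of the parsing step only: the sketch below turns a comma-separated longitude/latitude line into a WKT point string. The ``OsmPointParser`` class, the ``toWkt`` method, and the assumed line format are hypothetical names invented for this sketch; the real topology builds GeoTools ``SimpleFeature`` objects rather than WKT strings, and the actual OSM file layout may differ.

```java
// Hypothetical sketch of the parse step: the names and the assumed
// "lon,lat" line format are illustrative, not the tutorial's real classes.
public class OsmPointParser {

    /** Parses an assumed "lon,lat" line into a WKT point, or null if malformed. */
    public static String toWkt(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 2) {
            return null;
        }
        try {
            double lon = Double.parseDouble(parts[0].trim());
            double lat = Double.parseDouble(parts[1].trim());
            return String.format("POINT(%s %s)", lon, lat);
        } catch (NumberFormatException e) {
            return null; // skip malformed input lines rather than failing the bolt
        }
    }

    public static void main(String[] args) {
        System.out.println(toWkt("-8.61,41.15")); // POINT(-8.61 41.15)
    }
}
```

In a real bolt, malformed lines would typically be skipped or counted rather than allowed to throw and kill the worker.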

Use ``storm jar`` to submit the topology to your Storm instance, filling
in your connection parameters:

.. code-block:: bash

    $ storm jar geomesa-quickstart-storm/target/geomesa-quickstart-storm-$VERSION.jar \
        com.example.geomesa.storm.OSMIngest \
        -instanceId <instance> \
        -zookeepers <zookeepers> \
        -user <user> \
        -password <password> \
        -tableName OSM \
        -featureName event \
        -topic OSM

Run Data through the System
---------------------------

We use Kafka as the input to our Storm topology. First, create a topic
to send data to.

For Kafka 0.8, use the following command:

.. code-block:: bash

    $ kafka-create-topic.sh \
        --zookeeper <zookeepers> \
        --replica 3 \
        --partition 10 \
        --topic OSM

For Kafka 0.9+, use the following command:

.. code-block:: bash

    $ kafka-topics.sh \
        --create \
        --zookeeper localhost \
        --replication-factor 3 \
        --partitions 10 \
        --topic OSM

Note that we create a topic with several partitions in order to
parallelize the ingest from the producer side as well as from the
consumer (Storm) side.

Next, use the tutorial code to send the OSM file as a series of Kafka
messages:

.. code-block:: bash

    $ java -cp geomesa-quickstart-storm/target/geomesa-quickstart-storm-$VERSION.jar \
        com.example.geomesa.storm.OSMIngestProducer \
        -ingestFile simple-gps-points-120312.txt \
        -topic OSM \
        -brokers <brokers>

Note that Kafka's default partitioner class assigns a message to a
partition based on a hash of the provided key; if no key is provided, all
messages are assigned the same partition. To spread the load across all
partitions, the producer sends each message with a random key:

.. code-block:: java

    for (String x = bufferedReader.readLine(); x != null; x = bufferedReader.readLine()) {
        producer.send(new KeyedMessage<String, String>(topic, String.valueOf(rnd.nextInt()), x));
    }
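The hash-based assignment described above can be sketched as follows. This is a simplified stand-in, not Kafka's actual default partitioner implementation, and ``PartitionSketch`` is a hypothetical name; the point is only that identical keys always land on the same partition, so random keys are needed to spread messages out.

```java
// Simplified sketch of hash-based partition assignment. Kafka's real
// default partitioner uses a different hash; this only illustrates the
// key -> partition mapping behavior described in the text.
public class PartitionSketch {

    /** Maps a key to a partition in [0, numPartitions). */
    public static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit instead of Math.abs: Math.abs(Integer.MIN_VALUE)
        // is still negative, which would yield an invalid partition.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("way-42", 10);
        int p2 = partitionFor("way-42", 10);
        System.out.println(p1 == p2); // true: identical keys map to one partition
    }
}
```

This is why the producer above keys each message with ``rnd.nextInt()``: distinct random keys hash to different partitions, letting all ten partitions receive data in parallel.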

Storm Spouts and Bolts
----------------------

In the quick start code, the Storm ``Spout``\ s consume messages from a
Kafka topic and send them through the ingest topology:

.. code-block:: java

    public void nextTuple() {
        if (kafkaIterator.hasNext()) {
            List