GeoMesa NiFi Quick Start ======================== This tutorial provides an example implementation for using GeoMesa with NiFi. This walk-through will guide you in setting up the components required for ingesting GDELT files into GeoMesa. Prerequisites ------------- This tutorial uses `Docker `_, and assumes a Linux OS. About this Tutorial ------------------- This quick start operates by reading CSV files from the local filesystem, and writing them to GeoMesa Parquet files using the PutGeoMesa processor. Download the GeoMesa NiFi NARs ------------------------------ First, we will download the appropriate NARs. Full instructions are available under :ref:`nifi_install`, but the relevant sections are reproduced here. For this tutorial, we will be using three NARs: * ``geomesa-datastore-services-nar`` * ``geomesa-datastore-services-api-nar`` * ``geomesa-fs-nar`` This tutorial will use the GeoMesa FileSystem data store to avoid external dependencies, but any other back-end store can be used instead by changing the ``DataStoreService`` used. First, set the version to use: .. parsed-literal:: export TAG="|release_version|" export VERSION="|scala_binary_version|-${TAG}" # note: |scala_binary_version| is the Scala build version .. code-block:: bash mkdir -p ~/gm-nifi-quickstart/extensions cd ~/gm-nifi-quickstart export NARS="geomesa-fs-nar geomesa-datastore-services-api-nar geomesa-datastore-services-nar" for nar in $NARS; do wget -O "extensions/$nar_$VERSION.nar" "https://github.com/geomesa/geomesa-nifi/releases/download/geomesa-nifi-$TAG/$nar_$VERSION.nar"; done Obtain GDELT data ----------------- The `GDELT Event database `__ provides a comprehensive time- and location-indexed archive of events reported in broadcast, print, and web news media worldwide from 1979 to today. GeoMesa ships with the ability to parse GDELT data, so it's a good data format for this tutorial. For more details, see :ref:`gdelt_converter`. Run the following commands to download a recent GDELT file: .. code-block:: bash cd ~/gm-nifi-quickstart mkdir gdelt export GDELT_URL="$(wget -O - 'http://data.gdeltproject.org/gdeltv2/masterfilelist.txt' | head -n 1 | awk '{ print $3 }')" wget "$GDELT_URL" -O "gdelt/$(basename $GDELT_URL)" unzip -d gdelt gdelt/*.zip rm gdelt/*.zip Run NiFi with Docker -------------------- Next, we will run NiFi through Docker, mounting in our NARs and a directory for writing out data: .. code-block:: bash cd ~/gm-nifi-quickstart mkdir fs docker run --rm \ -p 8443:8443 -e SINGLE_USER_CREDENTIALS_USERNAME=nifi \ -e SINGLE_USER_CREDENTIALS_PASSWORD=nifipassword \ -v "$(pwd)/extensions:/opt/nifi/nifi-current/extensions:ro" \ -v "$(pwd)/fs:/fs:rw" \ -v "$(pwd)/gdelt:/gdelt:ro" \ apache/nifi:1.19.1 Once NiFi has finished starting up, it will be available at ``https://localhost:8443/nifi``. You will likely have to click through a certificate warning due to the default self-signed cert being used. Once in the NiFi UI, you can log in with the credentials we specified in the run command; i.e. ``nifi``/``nifipassword``. Create the NiFi Flow -------------------- If you are not familiar with NiFi, follow the `Getting Started `__ guide to familiarize yourself. The rest of this tutorial assumes a basic understanding of NiFi. Add the ingest processor by dragging a new processor to your flow, and selecting ``PutGeoMesa``. Select the processor and click the 'configure' button to configure it. On the properties tab, select ``DataStoreService`` and click on "Create new service". There should be only one option, the ``FileSystemDataStoreService``, so click the "Create" button. Next, click the small arrow next to the ``FileSystemDataStoreService`` entry, and select "Yes" when prompted to save changes. This should bring you to the Controller Services screen. Click the small gear next to the ``FileSystemDataStoreService`` to configure it. On the properties tab, enter the following configuration: * ``fs.path`` - ``/fs`` * ``fs.encoding`` - ``parquet`` .. image:: /tutorials/_static/img/nifi-qs-fs-controller-config.png :align: center Click "Apply", and the service should show as "validating". Click the "refresh" button in the bottom left of the screen, and the service should show as "disabled". Click the small lightning bolt next to the configure gear, and the click the "Enable" button to enable it. Once enabled, close the dialog, then close the controller services page by clicking the ``X`` in the top right. This should bring you back to the main flow. Now we will add two more processors to read our GDELT data. First, add a ``ListFile`` processor, and configure the ``Input Directory`` to be ``/gdelt`` (the location of our mounted GDELT data). Next, add a ``FetchFile`` processor, and connect the output of ``ListFile`` to it. Now we will create a process to set the attributes GeoMesa needs to ingest the data. Add an ``UpdateAttribute`` processor, and use the ``+`` button on the properties tab to add four dynamic properties: * ``geomesa.converter`` - ``gdelt2`` * ``geomesa.sft.name`` - ``gdelt`` * ``geomesa.sft.spec`` - ``gdelt2`` * ``geomesa.sft.user-data`` - ``geomesa.fs.scheme={"name":"daily","options":{"dtg-attribute":"dtg"}}`` .. image:: /tutorials/_static/img/nifi-qs-update-attributes.png :align: center The first three properties define the format of the input data. The last property is used by the GeoMesa File System data store to partition the data on disk. See :ref:`fsds_partition_schemes` for more information. Next, connect the output of the ``FetchFile`` processor to the ``UpdateAttribute`` processor, and the output of the ``UpdateAttribute`` processor to the ``PutGeoMesa`` processor. Auto-terminate any other relationships that are still undefined (in a production system, we'd want to handle failures instead of ignoring them). Now our flow is complete. It should look like the following: .. image:: /tutorials/_static/img/nifi-qs-flow.png :align: center Ingest the Data --------------- We can start the flow by clicking on the background to de-select any processors, then clicking the "Play" button on the left side of the NiFi UI. You should see the data pass through the NiFi flow and be ingested. Visualize the Data ------------------ Once the data has been ingested, you can use GeoServer to visualize it on a map. Follow the instructions in the File System data store quick-start tutorial, :ref:`fsds_quickstart_visualize`. Note that due to Docker file permissions, you may need to run something like the following to make the data accessible: .. code-block:: bash cd ~/gm-nifi-quickstart docker run --rm \ -v "$(pwd)/fs:/fs:rw" \ --entrypoint bash \ apache/nifi:1.19.1 \ -c "chmod -R 777 /fs"