Command Line Examples
=====================

This chapter provides hands-on examples of some common tasks in GeoMesa,
including the management of registered feature types in a data store,
ingest of data, and export of data in a variety of formats.

Feature Type Management
-----------------------

Creating a feature type
^^^^^^^^^^^^^^^^^^^^^^^

To begin, create a new feature type in GeoMesa with the ``create-schema``
command. The ``create-schema`` command takes three required flags and one
optional flag:

**Required**

- ``-c`` or ``--catalog``: the name of the catalog table
- ``-f`` or ``--feature-name``: the name of the feature
- ``-s`` or ``--spec``: the ``SimpleFeatureType`` specification

**Optional**

- ``--dtg``: the default date attribute of the ``SimpleFeatureType``

Run the command:

.. code:: bash

    $ geomesa-accumulo create-schema -u -p \
      -c cmd_tutorial \
      -f feature \
      -s fid:String:index=true,dtg:Date,geom:Point:srid=4326 \
      --dtg dtg

This will create a new feature type, named "feature", on the GeoMesa
catalog table "cmd\_tutorial". The catalog table stores metadata about
each feature type, and its name is used as the prefix for each table name
in Accumulo.

If the above command was successful, you should see output similar to
the following:

.. code:: bash

    Creating 'cmd_tutorial_feature' with spec 'fid:String:index=true,dtg:Date,geom:Point:srid=4326'. Just a few moments...
    Feature 'cmd_tutorial_feature' with spec 'fid:String:index=true,dtg:Date,geom:Point:srid=4326' successfully created.

Now that you've seen how to create feature types, create another feature
type on catalog table "cmd\_tutorial" using your own first name for the
``--feature-name`` and the above schema for the ``--spec``.

Listing known feature types
^^^^^^^^^^^^^^^^^^^^^^^^^^^

You should have two feature types on catalog table "cmd\_tutorial". To
verify, we'll use the ``get-type-names`` command.
The ``get-type-names`` command takes one flag:

- ``-c`` or ``--catalog``: the name of the catalog table

Run the following command:

.. code:: bash

    $ geomesa-accumulo get-type-names -u -p -c cmd_tutorial

The output text should be something like the following, with the name
you chose for your second feature type in place of ``gdelt``:

.. code:: bash

    Listing features on 'cmd_tutorial'. Just a few moments...
    2 features exist on 'cmd_tutorial'. They are:
    feature
    gdelt

Finding the attributes of a feature type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To find out more about the attributes of a feature type, we'll use the
``describe-schema`` command. This command takes two flags:

- ``-c`` or ``--catalog``: the name of the catalog table
- ``-f`` or ``--feature-name``: the name of the feature type

Let's find out more about the attributes on our first feature type. Run
the command:

.. code:: bash

    $ geomesa-accumulo describe-schema -u -p -c cmd_tutorial -f feature

The output should look like:

.. code:: bash

    Describing attributes of feature 'cmd_tutorial_feature'. Just a few moments...
    fid: String (Indexed)
    dtg: Date (Time-index)
    geom: Point (Geo-index)

Deleting a feature type
^^^^^^^^^^^^^^^^^^^^^^^

Continuing on, let's delete the first feature type we created with the
``remove-schema`` command. The ``remove-schema`` command takes two flags:

- ``-c`` or ``--catalog``: the name of the catalog table
- ``-f`` or ``--feature-name``: the name of the feature to delete

Run the following command:

.. code:: bash

    $ geomesa-accumulo remove-schema -u -p -c cmd_tutorial -f feature

.. note::

    Running this command will take a bit longer than the previous two, as
    it will delete three tables in Accumulo, as well as remove the metadata
    rows in the catalog table associated with the feature.

The output should resemble the following:

.. code:: bash

    Remove schema feature from catalog cmd_tutorial? (yes/no): yes
    Starting
    State change: CONNECTED
    Removed feature

Ingesting Data
--------------

GeoMesa Tools is a set of command line tools that provide feature
management, query planning and explanation, ingest, and export. In this
tutorial, we'll cover how to ingest and export features using GeoMesa
Tools.

Getting Data
^^^^^^^^^^^^

For this tutorial we will be using the GDELT data set, available here:
http://data.gdeltproject.org/events/index.html. Download any daily data
file, for example::

    20160119.export.CSV.zip

and unzip the file on your computer.

.. note::

    The unpacked files have ``*.CSV`` extensions but the data within
    them are actually *tab* separated.

Ingesting Features
^^^^^^^^^^^^^^^^^^

The ``ingest`` command currently supports three formats: CSV, TSV, and
SHP. It has the following required flags:

- ``-u`` or ``--user``: the Accumulo user
- ``-c`` or ``--catalog``: the name of the GeoMesa catalog table
- ``-f`` or ``--feature-name``: the name of the feature to ingest

One (not both) of the following flags must also be specified:

- ``-p`` or ``--password``: the Accumulo password
- ``--keytab``: path to a Kerberos keytab file

If ``-p`` (or ``--password``) and ``--keytab`` are both omitted, then
password authentication is assumed and the user is prompted for a
password.

If ``$ACCUMULO_HOME`` does not contain the configuration of the Accumulo
instance you wish to connect to, you must also specify the connection
parameters for Accumulo:

- ``-i`` or ``--instance``: the Accumulo instance
- ``-z`` or ``--zookeepers``: a comma-separated list of Zookeeper hosts

The optional ``-C`` switch lets you specify a converter, defined in a
JSON-based instruction file, that tells GeoMesa how to convert the data
as it reads it. The converter library handles many of the data
transformations necessary to fit a raw data set into a simple feature
type suitable for use in GeoMesa applications.
Conversions can take advantage of a variety of features such as the
``concatenate()`` and ``stringToInteger()`` functions, as well as regular
expressions. For more information see :ref:`setting_up_ingest_converter`
below.

The last argument, required for all ingest commands, is the path to the
file to ingest. When ingesting CSV/TSV data this can be an HDFS path,
specified by prefixing it with ``hdfs://``.

.. _setting_up_ingest_converter:

Setting up an Ingest Converter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To use the ``-C`` switch, create (or edit) the file
``$GEOMESA_ACCUMULO_HOME/conf/application.conf``, which serves as the
converter configuration file, to add the ``gdelt`` SimpleFeatureType and
a converter ``gdelt_tsv`` for reading the data from tab-separated value
files:

.. code::

    geomesa {
      sfts {
        gdelt = {
          fields = [
            { name = globalEventId, type = String, index = false }
            { name = eventCode,     type = String }
            { name = actor1,        type = String }
            { name = actor2,        type = String }
            { name = dtg,           type = Date, index = true }
            { name = geom,          type = Point, srid = 4326 }
          ]
        }
      }
      converters {
        gdelt_tsv = {
          type     = delimited-text
          format   = TDF
          id-field = "$1" // global event id
          fields = [
            { name = globalEventId, transform = "$1" }
            { name = eventCode,     transform = "$27" }
            { name = actor1,        transform = "$7" }
            { name = actor2,        transform = "$17" }
            { name = dtg,           transform = "date('yyyyMMdd', $2)" }
            { name = geom,          transform = "point(stringToDouble($41, 0.0), $40::double)" }
          ]
        }
      }
    }

The config file needs to have a ``SimpleFeatureType`` defined along with
a converter that specifies how to turn the raw data file into that simple
feature type. See :doc:`/user/convert/index` for more details on
converters, including a full list of the transformation functions
available (:doc:`/user/convert/function_overview`).

This example uses the ``date()`` function to tell the parser what format
the date column is in.
The ``stringToDouble()`` function and the ``::double`` cast show two
different methods of type casting. Functions in the ``stringTo*()``
family, such as ``stringToDouble()``, take the value to be cast as well
as a prespecified default that is returned if there is an exception,
whereas the ``::double`` cast will fail (and drop the record) if the
casting fails.

To confirm that GeoMesa can properly parse your edited
``$GEOMESA_ACCUMULO_HOME/conf/application.conf`` file, use
``geomesa-accumulo env``:

.. code::

    $ geomesa-accumulo env -s gdelt --format spec
    Using GEOMESA_ACCUMULO_HOME = /opt/geomesa/tools
    Simple Feature Types:
    gdelt = globalEventId:String,eventCode:String,actor1:String,actor2:String,dtg:Date:index=join,*geom:Point:srid=4326;geomesa.index.dtg='dtg'

    $ geomesa-accumulo env -c gdelt_tsv
    Using GEOMESA_ACCUMULO_HOME = /opt/geomesa/tools
    Simple Feature Type Converters:
    converter-name=gdelt_tsv
    fields=[
        {
            name=globalEventId
            transform="$1"
        },
        {
            name=eventCode
            transform="$27"
        },
        {
            name=actor1
            transform="$7"
        },
        {
            name=actor2
            transform="$17"
        },
        {
            name=dtg
            transform="date('yyyyMMdd', $2)"
        },
        {
            name=geom
            transform="point(stringToDouble($41, 0.0), $40::double)"
        }
    ]
    format=TDF
    # global event id
    id-field="$1"
    type=delimited-text

Downloading sample data
^^^^^^^^^^^^^^^^^^^^^^^

GeoMesa is packaged with a script for downloading publicly available
data sets, along with a set of corresponding configuration files. The
currently available data sets are GDELT_, GeoLife_, OSM-GPX_, T-Drive_,
GeoNames_, NYCTaxi_, GTD_, and Twitter_. The first five of these sets
can be downloaded with the provided script.

.. _GDELT: https://gdeltproject.org/
.. _GeoLife: https://research.microsoft.com/en-us/projects/geolife/
.. _OSM-GPX: https://wiki.openstreetmap.org/wiki/Planet.gpx
.. _T-Drive: https://research.microsoft.com/apps/pubs/?id=152883
.. _GeoNames: https://www.geonames.org/
.. _NYCTaxi: https://publish.illinois.edu/dbwork/open-data/
.. _GTD: https://www.start.umd.edu/gtd/
.. _Twitter: https://dev.twitter.com/rest/public

To download these sets, run the download script found in
``geomesa-tools/bin`` and provide the name of the desired data set. This
can be one of ``gdelt``, ``geolife``, ``osm-gpx``, ``tdrive``, or
``geonames``.

Example usage:

.. code:: bash

    $ ./download-data.sh geolife

Depending on the desired data, you may be prompted for further
information, such as the desired dates or locations. The resulting data
will then be downloaded to ``$GEOMESA_ACCUMULO_HOME/data``.

Configuration files for these data sets are found under
``$GEOMESA_ACCUMULO_HOME/conf/sfts``. Modifications to them can be seen
by running ``geomesa-accumulo env`` and will be reflected in the next
ingest run.

Running an Ingest
^^^^^^^^^^^^^^^^^

Now that everything is ready, we will combine the various parameters
into the following complete ingest command:

.. code-block:: bash

    $ geomesa-accumulo ingest \
      -u -p -i -z \
      -c gdelt -s gdelt -C gdelt_tsv --threads 1 \
      /path/to/.csv

The values given for ``-u`` and ``-p`` are the credentials associated
with the Accumulo instance. The values given for ``-i`` and ``-z`` are
the connection parameters for Accumulo, needed only if they are not
specified in the configuration files in ``$ACCUMULO_HOME``.

Exporting Features
------------------

Let's export your newly ingested features in a couple of file formats.
Currently, the ``export`` command supports exports to CSV, TSV,
Shapefile, GeoJSON, and GML. We'll do one of each format in this next
section.
The ``export`` command has three required flags:

- ``-c`` or ``--catalog``: the name of the catalog table
- ``-f`` or ``--feature-name``: the name of the feature to export
- ``-F`` or ``--format``: the output format (``csv``, ``tsv``, ``shp``,
  ``geojson``, or ``gml``)

Additionally, you can specify more details about the kind of export you
would like to perform with optional flags for ``export``:

- ``-a`` or ``--attributes``: the attributes of the feature to return
- ``-m`` or ``--max-features``: the maximum number of features to return
  in an export
- ``-q`` or ``--query``: a CQL query to run against the features, so that
  only the subset of features matching the query is returned

We'll use the ``--max-features`` flag to ensure our dataset is small and
quick to export. First, we'll export to CSV with the following command:

.. code-block:: bash

    $ geomesa-accumulo export -u -p -c gdelt -f gdelt -F csv -m 50
    # or specifying Accumulo configuration explicitly:
    $ geomesa-accumulo export \
      -u -p -i -z \
      -c gdelt -f gdelt -F csv -m 50

This command will output the relevant rows to the console. Inspect the
rows now, or pipe the output into a file for later review.

Now, run the above command four additional times, changing the
``--format`` flag to ``tsv``, ``shp``, ``geojson``, and ``gml``. The
``shp`` format also requires the ``-o`` option to specify the name of an
output file.
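
The four remaining exports can also be scripted. The following is a
minimal sketch rather than an official GeoMesa utility: it only prints
the commands (a dry run) so that they can be reviewed before anything is
executed. It assumes the ``gdelt`` catalog and feature name from the
ingest step and uses only the ``-c``, ``-f``, ``-F``, ``-m``, and ``-o``
flags described in this section. The function name
``print_export_commands`` and the output file name ``gdelt.shp`` are
hypothetical choices made for illustration.

.. code-block:: bash

    #!/usr/bin/env bash
    # Dry run: print (do not execute) one export command per remaining format.
    # Assumption: catalog and feature name are both "gdelt", as in the ingest step.
    print_export_commands() {
        local fmt cmd
        for fmt in tsv shp geojson gml; do
            cmd="geomesa-accumulo export -c gdelt -f gdelt -F ${fmt} -m 50"
            # The shp format requires -o; gdelt.shp is a hypothetical file name.
            if [ "${fmt}" = "shp" ]; then
                cmd="${cmd} -o gdelt.shp"
            fi
            echo "${cmd}"
        done
    }

    print_export_commands

Before running any of the printed commands, add the same credential and
connection options that were used for the CSV export.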