GeoMesa Transformations

GeoMesa allows users to perform relational projections on query results. We call these “transformations” to distinguish them from the overloaded term “projection” which has a different meaning in a spatial context. These transformations have the following uses and advantages:

  1. Subset to specified columns - reduces network overhead of returning results
  2. Rename specified columns - alters the schema of data on the fly
  3. Compute new attributes from one or more original attributes - adds derived fields to results

Transformations are applied in parallel across the cluster, which makes them very fast. They are analogous to the map tasks in a map-reduce job. Transformations are also extensible: developers can implement new functions and plug them into the system using standard mechanisms from GeoTools.
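
To make the three uses listed above concrete, here is a minimal plain-Java sketch of what subsetting, renaming, and deriving do to a single record. It is a conceptual illustration only: the record is modelled as a map, and the TransformSketch class, its transform method, and the output names "actor" and "label" are hypothetical, not part of GeoMesa or GeoTools. In GeoMesa the equivalent work is requested through query properties, as shown later in this tutorial.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch only: models a feature as a map of attribute name -> value.
final class TransformSketch {

    // Returns a new record with a subset column, a renamed column, and a derived column.
    static Map<String, Object> transform(Map<String, Object> in) {
        Map<String, Object> out = new LinkedHashMap<>();
        // 1. Subset: keep the geometry, drop everything not listed here
        out.put("geom", in.get("geom"));
        // 2. Rename: expose Actor1Name under the (hypothetical) name "actor"
        out.put("actor", in.get("Actor1Name"));
        // 3. Derive: compute a new attribute from two original attributes
        out.put("label", in.get("Actor1Name") + " - " + in.get("Actor1Geo_FullName"));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("GLOBALEVENTID", 284466704);
        record.put("Actor1Name", "UNITED STATES");
        record.put("Actor1Geo_FullName", "Ukraine");
        record.put("geom", "POINT (32 49)");
        System.out.println(transform(record));
    }
}
```

The returned map carries three attributes instead of the original four, which is the same effect a transformation has on the schema of the query results.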

Note

When this tutorial refers to “projections”, it means projections in the relational sense (see Projection - Relational Algebra). Projection has several other meanings in spatial discussions; none of them are used in this tutorial. Although projections can also modify an attribute’s value, in this tutorial we will refer to such modifications as “transformations” to keep things clearer.

This tutorial will show you how to write custom Java code using GeoMesa to do the following:

  1. Query previously-ingested data.
  2. Apply relational projections to your query results.
  3. Apply transformations to your query results.

Prerequisites

You will need:

  • an instance of Accumulo 1.7 or 1.8 running on Hadoop 2.2 or better,
  • an Accumulo user that has appropriate permissions to query your data,
  • Java JDK 8,
  • Apache Maven 3.2.2 or better, and
  • a git client.

This tutorial queries the GDELT data set. Instructions on ingesting GDELT data are available in the Map-Reduce Ingest of GDELT tutorial.

Warning

Before continuing, ingest the GDELT data set as described in the Map-Reduce Ingest of GDELT tutorial.

Download and Build the Tutorial

Pick a reasonable directory on your machine, and run:

$ git clone https://github.com/geomesa/geomesa-tutorials.git
$ cd geomesa-tutorials

Note

You may need to download a particular release of the tutorials project to target a particular GeoMesa release. See About Tutorial Versions.

To build, run:

$ mvn clean install -pl geomesa-examples-transformations

Note

Ensure that the versions of Accumulo, Hadoop, etc. in the root pom.xml match your environment.

Note

Depending on the version, you may also need to build GeoMesa locally. Instructions can be found in Installation.

Run the Tutorial

Warning

Before continuing, ensure that you have ingested the GDELT data set described in the Map-Reduce Ingest of GDELT tutorial. If your GDELT data covers a different time period from the one used in the GDELT tutorial, change the date range in the createBaseFilter method of QueryTutorial and recompile.

On the command line, run:

$ java -cp geomesa-examples-transformations/target/geomesa-examples-transformations-<version>.jar \
    com.example.geomesa.transformations.QueryTutorial \
    -instanceId <instance>                            \
    -zookeepers <zoos>                                \
    -user <user>                                      \
    -password <pwd>                                   \
    -tableName <table>                                \
    -featureName <feature>

where you provide the following arguments:

  • <instance> the name of your Accumulo instance
  • <zoos> a comma-separated list of your ZooKeeper nodes, e.g. zoo1:2181,zoo2:2181,zoo3:2181
  • <user> the name of an Accumulo user that will execute the scans, e.g. root
  • <pwd> the password for the previously-mentioned Accumulo user
  • <table> the name of the Accumulo table that has the GeoMesa GDELT dataset, e.g. gdelt if you followed the GDELT tutorial
  • <feature> the feature name used to ingest the GeoMesa GDELT dataset, e.g. event if you followed the GDELT tutorial

You should see several queries run and the results printed out to your console.
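
The flags above are simple name/value pairs, so they map naturally onto the kind of parameter map that GeoTools' DataStoreFinder.getDataStore accepts. The following is a minimal sketch of that mapping using only the standard library. The ArgsSketch class and its parse method are hypothetical, the tutorial's own argument handling may differ, and the parameter keys that a GeoMesa data store expects can vary between GeoMesa versions, so treat the key names as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns "-name value" pairs into a parameter map.
final class ArgsSketch {

    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            // Every flag must start with '-' and be followed by a value
            if (!flag.startsWith("-") || i + 1 >= args.length) {
                throw new IllegalArgumentException("Expected '-name value' pairs, got: " + flag);
            }
            params.put(flag.substring(1), args[i + 1]);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse(args);
        // Avoid echoing the password to the console
        params.forEach((k, v) ->
                System.out.println(k + " = " + ("password".equals(k) ? "****" : v)));
    }
}
```

Running the sketch with the same flags as the command above prints one line per parameter, which is a quick way to check for a missing or misspelled argument before connecting.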

Insight into How the Tutorial Works

The code for querying and projections is available in the class com.example.geomesa.transformations.QueryTutorial. The source code is meant to be accessible, but the following is a high-level breakdown of the relevant methods:

  • basicQuery executes a base filter without any further options. All attributes are returned in the data set.
  • basicProjectionQuery executes a base filter but specifies a subset of attributes to return.
  • basicTransformationQuery executes a base filter and transforms one of the attributes that is returned.
  • renamedTransformationQuery executes a base filter and transforms one of the attributes, returning it in a separate derived attribute.
  • mutliFieldTransformationQuery executes a base filter and transforms two attributes into a single derived attribute.
  • geometricTransformationQuery executes a base filter and transforms the geometry returned from a point into a polygon by buffering it.

Additional transformation functions are listed here.

Please note that currently not all functions are supported by GeoMesa.

Sample Code and Output

The following code snippets show the basic aspects of creating queries for GeoMesa.

Create a basic query with no projections

This query does not use any projections or transformations. Note that all attributes are returned in the results.

Query query = new Query(simpleFeatureTypeName, cqlFilter);

Output

Result GLOBALEVENTID SQLDATE MonthYear Year FractionDate Actor1Code Actor1Name Actor1CountryCode Actor1KnownGroupCode Actor1EthnicCode Actor1Religion1Code Actor1Religion2Code Actor1Type1Code Actor1Type2Code Actor1Type3Code Actor2Code Actor2Name Actor2CountryCode Actor2KnownGroupCode Actor2EthnicCode Actor2Religion1Code Actor2Religion2Code Actor2Type1Code Actor2Type2Code Actor2Type3Code IsRootEvent EventCode EventBaseCode EventRootCode QuadClass GoldsteinScale NumMentions NumSources NumArticles AvgTone Actor1Geo_Type Actor1Geo_FullName Actor1Geo_CountryCode Actor1Geo_ADM1Code Actor1Geo_Lat Actor1Geo_Long Actor1Geo_FeatureID Actor2Geo_Type Actor2Geo_FullName Actor2Geo_CountryCode Actor2Geo_ADM1Code Actor2Geo_Lat Actor2Geo_Long Actor2Geo_FeatureID ActionGeo_Type ActionGeo_FullName ActionGeo_CountryCode ActionGeo_ADM1Code ActionGeo_Lat ActionGeo_Long ActionGeo_FeatureID DATEADDED geom
1 284464526 Sun Feb 02 00:00:00 EST 2014 201402 2014 2014.0876 USA UNITED STATES USA               USAGOV UNITED STATES USA         GOV     0 010 010 01 1 0.0 2 1 2 2.6362038 4 Kyiv, Kyyiv, Misto, Ukraine UP UP12 50.4333 30.5167 -1044367 1 United States US US 38.0 -97.0 null 1 United States US US 38.0 -97.0 null 20140202 POINT (30.5167 50.4333)
2 284466704 Sun Feb 02 00:00:00 EST 2014 201402 2014 2014.0876 USAGOV UNITED STATES USA         GOV     USA UNITED STATES USA               1 036 036 03 1 4.0 4 1 4 1.5810276 1 Ukraine UP UP 49.0 32.0 null 1 Ukraine UP UP 49.0 32.0 null 1 Ukraine UP UP 49.0 32.0 null 20140202 POINT (32 49)
3 284427971 Sun Feb 02 00:00:00 EST 2014 201402 2014 2014.0876 IGOUNO UNITED NATIONS   UNO       IGO     USA UNITED STATES USA               0 012 012 01 1 -0.4 27 3 27 1.0064903 4 Kiev, Ukraine (general), Ukraine UP UP00 50.4333 30.5167 -1044367 4 Kiev, Ukraine (general), Ukraine UP UP00 50.4333 30.5167 -1044367 4 Kiev, Ukraine (general), Ukraine UP UP00 50.4333 30.5167 -1044367 20140202 POINT (30.5167 50.4333)
4 284466607 Sun Feb 02 00:00:00 EST 2014 201402 2014 2014.0876 USAGOV UNITED STATES USA         GOV     UKR UKRAINE UKR               1 100 100 10 3 -5.0 2 1 2 7.826087 1 Ukraine UP UP 49.0 32.0 null 1 Ukraine UP UP 49.0 32.0 null 1 Ukraine UP UP 49.0 32.0 null 20140202 POINT (32 49)
5 284464187 Sun Feb 02 00:00:00 EST 2014 201402 2014 2014.0876 USA UNITED STATES USA               UKR UKRAINE UKR               0 111 111 11 3 -2.0 5 1 5 1.4492754 4 Kiev, Ukraine (general), Ukraine UP UP00 50.4333 30.5167 -1044367 4 Kiev, Ukraine (general), Ukraine UP UP00 50.4333 30.5167 -1044367 4 Kiev, Ukraine (general), Ukraine UP UP00 50.4333 30.5167 -1044367 20140202 POINT (30.5167 50.4333)
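
The cqlFilter passed to the query is built by the tutorial's createBaseFilter method. As a rough idea of what such a base filter can look like, the sketch below assembles an ECQL string that combines a bounding box, a date range, and a name pattern. This is an assumption-laden illustration, not the tutorial's code: BaseFilterSketch and baseFilter are hypothetical names, the coordinate and date values are illustrative, and the attribute names (geom, SQLDATE, Actor1Name) are taken from the output header above. A string like this could then be turned into a filter with GeoTools' ECQL.toFilter.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, not the tutorial's createBaseFilter: builds an ECQL string.
final class BaseFilterSketch {

    static String baseFilter(double minX, double minY, double maxX, double maxY,
                             LocalDate start, LocalDate end, String namePattern) {
        // ECQL temporal predicates take ISO-8601 instants separated by '/'
        String from = start.atStartOfDay(ZoneOffset.UTC).toInstant().toString();
        String to = end.atStartOfDay(ZoneOffset.UTC).toInstant().toString();
        // Single quotes inside a CQL string literal are escaped by doubling them
        String pattern = namePattern.replace("'", "''");
        return "BBOX(geom, " + minX + ", " + minY + ", " + maxX + ", " + maxY + ")"
                + " AND SQLDATE DURING " + from + "/" + to
                + " AND Actor1Name LIKE '" + pattern + "'";
    }

    public static void main(String[] args) {
        // Illustrative values: a box roughly around Ukraine and a window covering 2 Feb 2014
        System.out.println(baseFilter(22.0, 44.0, 41.0, 53.0,
                LocalDate.of(2014, 2, 1), LocalDate.of(2014, 2, 3), "UNITED%"));
    }
}
```

If your ingested data covers a different period, the two dates are the only values that need to change in a filter of this shape.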

Create a query with a projection for two attributes

This query uses a projection to only return the ‘Actor1Name’ and ‘geom’ attributes.

String[] properties = new String[] {"Actor1Name", "geom"};
Query query = new Query(simpleFeatureTypeName, cqlFilter, properties);

Output

Result Actor1Name geom
1 UNITED STATES POINT (32 49)
2 UNITED STATES POINT (30.5167 50.4333)
3 UNITED STATES POINT (30.5167 50.4333)
4 UNITED STATES POINT (30.5167 50.4333)
5 UNITED STATES POINT (30.5167 50.4333)

Create a query with an attribute transformation

This query performs a transformation on the ‘Actor1Name’ attribute, to print it in a more user-friendly format.

String[] properties = new String[] {"Actor1Name=strCapitalize(Actor1Name)", "geom"};
Query query = new Query(simpleFeatureTypeName, cqlFilter, properties);

Output

Result geom Actor1Name
1 POINT (30.5167 50.4333) United States
2 POINT (32 49) United States
3 POINT (32 49) United States
4 POINT (30.5167 50.4333) United States
5 POINT (30.5167 50.4333) United States
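
To show what this transformation does to each value, here is a plain-Java approximation of the effect seen in the output above: the first letter of every word is upper-cased and the rest is lower-cased. The CapitalizeSketch class is hypothetical and is not the GeoTools implementation of strCapitalize, whose handling of punctuation and other edge cases may differ.

```java
// Plain-Java approximation of the effect shown above; not the GeoTools implementation.
final class CapitalizeSketch {

    static String titleCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean startOfWord = true;
        for (char c : s.toCharArray()) {
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current word and is copied unchanged
                startOfWord = true;
                sb.append(c);
            } else if (startOfWord) {
                sb.append(Character.toUpperCase(c));
                startOfWord = false;
            } else {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(titleCase("UNITED STATES")); // prints "United States"
    }
}
```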

Create a query with a derived attribute

This query creates a new attribute called ‘derived’ by concatenating the ‘Actor1Name’ and ‘Actor1Geo_FullName’ attributes. This could be used to show the actor and location of the event, for example.

String property = "derived=strConcat(Actor1Name,strConcat(' - ',Actor1Geo_FullName)),geom";
String[] properties = new String[] { property };
Query query = new Query(simpleFeatureTypeName, cqlFilter, properties);

Output

Result geom derived
1 POINT (30.5167 50.4333) UNITED STATES - Kyiv, Kyyiv, Misto, Ukraine
2 POINT (32 49) UNITED STATES - Ukraine
3 POINT (30.5167 50.4333) UNITED STATES - Kiev, Ukraine (general), Ukraine
4 POINT (32 49) UNITED STATES - Ukraine
5 POINT (30.5167 50.4333) UNITED NATIONS - Kiev, Ukraine (general), Ukraine
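
The expression above nests one strConcat call inside another because each call joins two arguments. When more pieces need to be joined, writing the nesting by hand gets error-prone, so a small helper can generate it. The sketch below is one way to do that; ConcatSketch and nestConcat are hypothetical names and are not part of the tutorial or of GeoTools.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: generates nested strConcat expressions for a transformation.
final class ConcatSketch {

    // Builds right-nested calls: [a, b, c] -> strConcat(a,strConcat(b,c))
    static String nestConcat(List<String> parts) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("need at least one part");
        }
        // Start from the last part and wrap it with each preceding part
        String expr = parts.get(parts.size() - 1);
        for (int i = parts.size() - 2; i >= 0; i--) {
            expr = "strConcat(" + parts.get(i) + "," + expr + ")";
        }
        return expr;
    }

    public static void main(String[] args) {
        // Attribute names are passed bare; string literals keep their single quotes
        String expr = nestConcat(Arrays.asList("Actor1Name", "' - '", "Actor1Geo_FullName"));
        System.out.println("derived=" + expr);
    }
}
```

With these three parts, the helper reproduces the transformation expression used in the query above.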

Create a query with a geometric transformation

This query performs a geometric transformation on the points returned, buffering them by a fixed distance (here 2, expressed in the units of the geometry’s coordinates). This could be used to estimate an area of impact around a particular event, for example.

String[] properties = new String[] {"geom,derived=buffer(geom, 2)"};
Query query = new Query(simpleFeatureTypeName, cqlFilter, properties);

Output

Result geom derived
1 POINT (30.5167 50.4333) POLYGON ((32.5167 50.4333, 32.478270560806465 50.04311935596775, 32.36445906502257 49.66793313526982, 32.17963922460509 49.3221595339608, 31.930913562373096 49.01908643762691, 31.627840466039206 48.77036077539491, 31.28206686473018 48.58554093497743, 30.906880644032256 48.47172943919354, 30.5167 48.4333, 30.126519355967744 48.47172943919354, 29.75133313526982 48.58554093497743, 29.405559533960798 48.77036077539491, 29.102486437626904 49.01908643762691, 28.85376077539491 49.3221595339608, 28.668940934977428 49.66793313526983, 28.55512943919354 50.04311935596775, 28.5167 50.4333, 28.55512943919354 50.82348064403226, 28.668940934977428 51.198666864730185, 28.85376077539491 51.54444046603921, 29.102486437626908 51.8475135623731, 29.405559533960798 52.09623922460509, 29.751333135269824 52.281059065022575, 30.126519355967748 52.39487056080647, 30.516700000000004 52.4333, 30.906880644032263 52.39487056080646, 31.282066864730186 52.281059065022575, 31.62784046603921 52.09623922460509, 31.9309135623731 51.847513562373095, 32.1796392246051 51.5444404660392, 32.36445906502258 51.19866686473018, 32.478270560806465 50.82348064403225, 32.5167 50.4333))
2 POINT (30.5167 50.4333) POLYGON ((32.5167 50.4333, 32.478270560806465 50.04311935596775, 32.36445906502257 49.66793313526982, 32.17963922460509 49.3221595339608, 31.930913562373096 49.01908643762691, 31.627840466039206 48.77036077539491, 31.28206686473018 48.58554093497743, 30.906880644032256 48.47172943919354, 30.5167 48.4333, 30.126519355967744 48.47172943919354, 29.75133313526982 48.58554093497743, 29.405559533960798 48.77036077539491, 29.102486437626904 49.01908643762691, 28.85376077539491 49.3221595339608, 28.668940934977428 49.66793313526983, 28.55512943919354 50.04311935596775, 28.5167 50.4333, 28.55512943919354 50.82348064403226, 28.668940934977428 51.198666864730185, 28.85376077539491 51.54444046603921, 29.102486437626908 51.8475135623731, 29.405559533960798 52.09623922460509, 29.751333135269824 52.281059065022575, 30.126519355967748 52.39487056080647, 30.516700000000004 52.4333, 30.906880644032263 52.39487056080646, 31.282066864730186 52.281059065022575, 31.62784046603921 52.09623922460509, 31.9309135623731 51.847513562373095, 32.1796392246051 51.5444404660392, 32.36445906502258 51.19866686473018, 32.478270560806465 50.82348064403225, 32.5167 50.4333))
3 POINT (32 49) POLYGON ((34 49, 33.961570560806464 48.609819355967744, 33.84775906502257 48.23463313526982, 33.66293922460509 47.8888595339608, 33.41421356237309 47.58578643762691, 33.1111404660392 47.33706077539491, 32.76536686473018 47.15224093497743, 32.390180644032256 47.038429439193536, 32 47, 31.609819355967744 47.038429439193536, 31.23463313526982 47.15224093497743, 30.888859533960797 47.33706077539491, 30.585786437626904 47.58578643762691, 30.33706077539491 47.8888595339608, 30.152240934977428 48.234633135269824, 30.03842943919354 48.609819355967744, 30 49, 30.03842943919354 49.390180644032256, 30.152240934977428 49.76536686473018, 30.33706077539491 50.11114046603921, 30.585786437626908 50.4142135623731, 30.888859533960797 50.66293922460509, 31.234633135269824 50.84775906502257, 31.609819355967748 50.961570560806464, 32.00000000000001 51, 32.39018064403226 50.96157056080646, 32.76536686473018 50.84775906502257, 33.11114046603921 50.66293922460509, 33.4142135623731 50.41421356237309, 33.6629392246051 50.111140466039195, 33.84775906502258 49.765366864730176, 33.961570560806464 49.39018064403225, 34 49))
4 POINT (30.5167 50.4333) POLYGON ((32.5167 50.4333, 32.478270560806465 50.04311935596775, 32.36445906502257 49.66793313526982, 32.17963922460509 49.3221595339608, 31.930913562373096 49.01908643762691, 31.627840466039206 48.77036077539491, 31.28206686473018 48.58554093497743, 30.906880644032256 48.47172943919354, 30.5167 48.4333, 30.126519355967744 48.47172943919354, 29.75133313526982 48.58554093497743, 29.405559533960798 48.77036077539491, 29.102486437626904 49.01908643762691, 28.85376077539491 49.3221595339608, 28.668940934977428 49.66793313526983, 28.55512943919354 50.04311935596775, 28.5167 50.4333, 28.55512943919354 50.82348064403226, 28.668940934977428 51.198666864730185, 28.85376077539491 51.54444046603921, 29.102486437626908 51.8475135623731, 29.405559533960798 52.09623922460509, 29.751333135269824 52.281059065022575, 30.126519355967748 52.39487056080647, 30.516700000000004 52.4333, 30.906880644032263 52.39487056080646, 31.282066864730186 52.281059065022575, 31.62784046603921 52.09623922460509, 31.9309135623731 51.847513562373095, 32.1796392246051 51.5444404660392, 32.36445906502258 51.19866686473018, 32.478270560806465 50.82348064403225, 32.5167 50.4333))
5 POINT (30.5167 50.4333) POLYGON ((32.5167 50.4333, 32.478270560806465 50.04311935596775, 32.36445906502257 49.66793313526982, 32.17963922460509 49.3221595339608, 31.930913562373096 49.01908643762691, 31.627840466039206 48.77036077539491, 31.28206686473018 48.58554093497743, 30.906880644032256 48.47172943919354, 30.5167 48.4333, 30.126519355967744 48.47172943919354, 29.75133313526982 48.58554093497743, 29.405559533960798 48.77036077539491, 29.102486437626904 49.01908643762691, 28.85376077539491 49.3221595339608, 28.668940934977428 49.66793313526983, 28.55512943919354 50.04311935596775, 28.5167 50.4333, 28.55512943919354 50.82348064403226, 28.668940934977428 51.198666864730185, 28.85376077539491 51.54444046603921, 29.102486437626908 51.8475135623731, 29.405559533960798 52.09623922460509, 29.751333135269824 52.281059065022575, 30.126519355967748 52.39487056080647, 30.516700000000004 52.4333, 30.906880644032263 52.39487056080646, 31.282066864730186 52.281059065022575, 31.62784046603921 52.09623922460509, 31.9309135623731 51.847513562373095, 32.1796392246051 51.5444404660392, 32.36445906502258 51.19866686473018, 32.478270560806465 50.82348064403225, 32.5167 50.4333))
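
Each polygon above is a circle around the original point, approximated by straight segments. The sketch below reproduces that shape with plain trigonometry so the numbers in the output are easier to read. It is a geometric illustration only, not the buffer algorithm that runs inside the query. BufferSketch and bufferPoint are hypothetical names, and the choice of 8 segments per quadrant and clockwise vertex order is inferred from the 33 vertices listed in the output.

```java
// Approximates a point buffer as a closed ring of vertices on a circle.
final class BufferSketch {

    // Returns 4 * quadrantSegments + 1 coordinates {x, y}; the last repeats the first.
    static double[][] bufferPoint(double x, double y, double distance, int quadrantSegments) {
        int segments = 4 * quadrantSegments;
        double[][] ring = new double[segments + 1][2];
        for (int i = 0; i < segments; i++) {
            // Clockwise from the positive x-axis, matching the vertex order in the output
            double angle = -2.0 * Math.PI * i / segments;
            ring[i][0] = x + distance * Math.cos(angle);
            ring[i][1] = y + distance * Math.sin(angle);
        }
        // Close the ring by repeating the first vertex
        ring[segments][0] = ring[0][0];
        ring[segments][1] = ring[0][1];
        return ring;
    }

    public static void main(String[] args) {
        double[][] ring = bufferPoint(30.5167, 50.4333, 2.0, 8);
        System.out.println(ring.length + " vertices, second vertex: "
                + ring[1][0] + " " + ring[1][1]);
    }
}
```

Because the distance is expressed in coordinate units, a buffer of 2 on longitude/latitude data spans two degrees in every direction, which does not correspond to a constant distance on the ground.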