19.11.8. Marine Cadastre AIS

The Automatic Identification System (AIS) is a global system for tracking ships, in which each vessel broadcasts its position to other nearby vessels. While various commercial data providers exist, this converter is intended for data collected by the U.S. Coast Guard around the USA and openly published by Marine Cadastre.

The data is disseminated as zipped ESRI File Geodatabases, each containing a number of tables. Only the Broadcast table, which contains vessel locations over time, is considered here. Note that “Ship name and call sign fields have been removed, and the MMSI (Maritime Mobile Service Identity) field has been encrypted for the 2010 through 2014 data at the request of the U.S. Coast Guard.”

19.11.8.1. Getting the Data

Machine-friendly links for bulk data download can be found here. Replace the four-digit year in the link with the desired year (2009-2016); within each year, the data is split by month and UTM zone. Once the data is downloaded, it must be unzipped, e.g.

find . -name "*.zip" -exec unzip {} \;

19.11.8.2. Converting the Data

GeoMesa does not currently support ESRI File Geodatabases (FileGDB), so an external tool is required to convert the data into a suitable format. ogr2ogr from GDAL can convert from FileGDB into Comma Separated Value (CSV) format e.g.

find . -name "*.gdb" -exec sh -c 'ogr2ogr -f CSV /dev/stdout "$1" Broadcast -lco GEOMETRY=AS_XY | tail -n +2 > "$1.csv"' _ {} \;

Note that the tail command removes the header row from the output, which makes the files easier to ingest using HDFS or similar. Also note that the resulting CSV files may be very large (264 GB for the 2009 & 2010 data).
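As a quick sanity check after conversion, the following sketch counts the records in the generated CSV files and lists any file that still begins with a header row. It assumes that an unstripped header written with GEOMETRY=AS_XY starts with X,Y and that no field contains an embedded newline. The function names are illustrative and are not part of GeoMesa or GDAL.

```shell
#!/bin/sh
# Hypothetical helpers; not part of GeoMesa or GDAL.

# Print the total number of lines (records) in all *.csv files under a directory.
count_records() {
    find "$1" -name "*.csv" -exec cat {} + | wc -l | tr -d ' '
}

# Print the path of every CSV whose first line still looks like a header.
# Assumption: an unstripped header begins with "X,Y".
find_headers() {
    find "$1" -name "*.csv" | while IFS= read -r f; do
        case "$(head -n 1 "$f")" in
            X,Y*) printf '%s\n' "$f" ;;
        esac
    done
}
```

For example, count_records . prints a single total for the current directory tree, and find_headers . prints nothing if every header was removed.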

19.11.8.3. Sample Ingest Command

Check that the marinecadastre-ais simple feature type and converter are available on the GeoMesa tools classpath; they are by default. Note that you will need to use the command specific to your back-end, e.g. geomesa-accumulo.

geomesa env | grep 'marinecadastre-ais'

If they are not, merge the contents of reference.conf with $GEOMESA_HOME/conf/application.conf, or ensure that reference.conf is in $GEOMESA_HOME/conf/sfts/marinecadastre-ais.
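The check above can be wrapped in a small function so that a script stops early when the type and converter are missing. This is only a sketch: check_marinecadastre is a hypothetical name, and the back-end specific command is passed in as the first argument.

```shell
#!/bin/sh
# Hypothetical wrapper around the classpath check.
# $1 is the back-end specific command, e.g. geomesa-accumulo.
check_marinecadastre() {
    if "$1" env | grep -q 'marinecadastre-ais'; then
        echo "marinecadastre-ais found"
    else
        # Report on stderr and signal failure to the caller.
        echo "marinecadastre-ais missing: update the configuration" >&2
        return 1
    fi
}
```

A script could then run check_marinecadastre geomesa-accumulo || exit 1 before starting an ingest.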

To ingest using the GeoMesa command line interface:

geomesa ingest -u username -c catalogName -s marinecadastre-ais -C marinecadastre-ais -t 8 /path/to/data/*.csv
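Before committing to a full ingest, it may be worth trying the converter on a small sample. The sketch below copies the first lines of one CSV file into a sample file. The make_sample name, the paths and the scratch catalog name are illustrative; the ingest flags in the comment are those used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: write the first N lines (default 1000) of a CSV to a sample file.
# Because the header row was stripped during conversion, every line is a record.
make_sample() {
    src=$1
    dest=$2
    n=${3:-1000}
    head -n "$n" "$src" > "$dest"
}

# Example (paths and catalog name are placeholders):
#   make_sample /path/to/data/file.csv /tmp/ais-sample.csv 1000
#   geomesa ingest -u username -c scratchCatalog -s marinecadastre-ais \
#       -C marinecadastre-ais /tmp/ais-sample.csv
```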

Note that this example uses 8 threads. Ingesting all of the 2009 & 2010 data (approx. 3.5 billion records) in this way took 15 hours on a 5-node cluster.
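The figures quoted above can be turned into a rough throughput estimate, which may help when sizing an ingest of a different volume. The arithmetic below is a back-of-the-envelope sketch: it assumes a shell with 64-bit integer arithmetic, and the per-thread figure is a simple division that assumes the work was spread evenly across threads.

```shell
#!/bin/sh
# Figures from the text: approx. 3.5 billion records, 15 hours, 8 threads.
records=3500000000
hours=15
threads=8

# Integer division, so results are rounded down.
per_second=$(( records / (hours * 3600) ))
per_thread=$(( per_second / threads ))

echo "approx. $per_second records/s overall"
echo "approx. $per_thread records/s per thread"
```

This works out to roughly 65,000 records per second overall, or about 8,000 per thread; actual rates will depend on the cluster and back-end.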