Remote File System Support
==========================
Through Hadoop's file system support, GeoMesa supports ingesting files directly from remote file systems, including
Amazon's S3 and Microsoft's Azure.

Note: the examples below use the Accumulo tools, but the same arguments should work with the tools of any other
GeoMesa distribution as well.
Enabling S3 Ingest
------------------
Hadoop ships with implementations of S3-based filesystems, which can be enabled in the Hadoop configuration used with
GeoMesa tools. Specifically, GeoMesa tools can perform ingests using both the second-generation (``s3n``) and
third-generation (``s3a``) filesystems. Edit the ``$HADOOP_CONF_DIR/core-site.xml`` file in your Hadoop installation
as shown below (these instructions apply to Hadoop 2.5.0 and higher). The environment variable
``$HADOOP_MAPRED_HOME`` must be set correctly, because the classpath values below reference it. Some configurations
can substitute ``$HADOOP_PREFIX`` for it in those classpath values.
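Because the classpath values reference ``$HADOOP_MAPRED_HOME``, it can help to confirm that the variable is set and
that the referenced directories exist before editing ``core-site.xml``. The following is a minimal sketch;
``check_mapred_home`` is a hypothetical helper name introduced here for illustration, not part of Hadoop or GeoMesa,
and the directory list is taken from the classpath value used in this section.

.. code-block:: bash

    # Sketch: verify the directories used in mapreduce.application.classpath.
    # check_mapred_home is a hypothetical helper, not a Hadoop or GeoMesa command.
    check_mapred_home() {
        # Use the first argument if given, otherwise fall back to $HADOOP_MAPRED_HOME
        local home="${1:-${HADOOP_MAPRED_HOME:-}}"
        if [ -z "$home" ]; then
            echo "HADOOP_MAPRED_HOME is not set" >&2
            return 1
        fi
        local dir
        for dir in share/hadoop/mapreduce share/hadoop/mapreduce/lib share/hadoop/tools/lib; do
            if [ ! -d "$home/$dir" ]; then
                echo "missing directory: $home/$dir" >&2
                return 1
            fi
        done
        echo "ok: $home"
    }

Call ``check_mapred_home`` with no arguments to test the current environment, or pass a directory (for example the
value of ``$HADOOP_PREFIX``) to test an alternative root.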
.. warning::

    AWS credentials are valuable! They pay for services and control read and write access to data. If you are
    running GeoMesa on AWS EC2 instances, it is recommended to use the ``s3a`` filesystem. With ``s3a``, you can omit
    the Access Key ID and Secret Access Key from ``core-site.xml`` and rely on IAM roles.
Configuration
^^^^^^^^^^^^^
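For ``s3a``:

.. code-block:: xml

    <!-- core-site.xml -->
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
        <description>The classpath specifically for MapReduce jobs. This override is needed so that s3 URLs work on Hadoop 2.6.0+</description>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>XXXX YOURS HERE</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>XXXX YOURS HERE</value>
        <description>Valuable credential - do not commit to CM</description>
    </property>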
After you have enabled S3 in your Hadoop configuration, you can ingest with GeoMesa tools. Note that you can still
use the Kleene star (``*``) with S3:

.. code-block:: bash

    $ geomesa-accumulo ingest -u username -p password \
      -c geomesa_catalog -i instance -s yourspec \
      -C convert s3a://bucket/path/file*
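Before starting an ingest, it can be useful to confirm that Hadoop itself can list the remote path with the edited
configuration. The following is a minimal sketch, assuming the ``hadoop`` command is on the ``PATH`` and reads the
same ``core-site.xml``; ``remote_path_visible`` is a hypothetical helper name, and the bucket and path are
placeholders.

.. code-block:: bash

    # Sketch: check that a remote path can be listed through Hadoop's file system shell.
    # remote_path_visible is a hypothetical helper, not a Hadoop or GeoMesa command.
    remote_path_visible() {
        local uri="$1"
        # `hadoop fs -ls` exits with a non-zero status when the path cannot be listed
        if hadoop fs -ls "$uri" > /dev/null 2>&1; then
            echo "visible: $uri"
        else
            echo "cannot list: $uri" >&2
            return 1
        fi
    }

For example, ``remote_path_visible s3a://bucket/path/`` prints ``visible: s3a://bucket/path/`` when the listing
succeeds and returns a non-zero status otherwise. A failure here usually points at the Hadoop configuration or the
credentials rather than at GeoMesa.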
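For ``s3n``:

.. code-block:: xml

    <!-- core-site.xml -->
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
        <description>The classpath specifically for MapReduce jobs. This override is needed so that s3 URLs work on Hadoop 2.6.0+</description>
    </property>
    <property>
        <name>fs.s3n.impl</name>
        <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
        <description>Tell Hadoop which class to use to access s3 URLs. This change became necessary in Hadoop 2.6.0</description>
    </property>
    <property>
        <name>fs.s3n.awsAccessKeyId</name>
        <value>XXXX YOURS HERE</value>
    </property>
    <property>
        <name>fs.s3n.awsSecretAccessKey</name>
        <value>XXXX YOURS HERE</value>
    </property>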
``s3n`` paths are prefixed in Hadoop with ``s3n://``, as shown below::

    $ geomesa-accumulo ingest -u username -p password \
      -c geomesa_catalog -i instance -s yourspec \
      -C convert s3n://bucket/path/file s3n://bucket/path/*
Enabling Azure Ingest
---------------------
Hadoop ships with implementations of Azure-based filesystems, which can be enabled in the Hadoop configuration used with
GeoMesa tools. Specifically, GeoMesa tools can perform ingests using the ``wasb`` and ``wasbs`` filesystems.
Edit the ``$HADOOP_CONF_DIR/core-site.xml`` file in your Hadoop installation as shown below
(these instructions apply to Hadoop 2.5.0 and higher). In addition, the ``hadoop-azure`` and ``azure-storage`` JARs
need to be available on the classpath.
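To check that both JARs are present, a directory can be scanned for them. The following is a minimal sketch;
``find_azure_jars`` is a hypothetical helper name, and the directory to search is an assumption (many Hadoop
distributions place these JARs under ``share/hadoop/tools/lib``, but your installation may differ).

.. code-block:: bash

    # Sketch: look for the hadoop-azure and azure-storage JARs in a directory.
    # find_azure_jars is a hypothetical helper; pass the directory to search.
    find_azure_jars() {
        local lib_dir="$1"
        local missing=0
        local prefix
        for prefix in hadoop-azure azure-storage; do
            # Match any version of the JAR, e.g. hadoop-azure-<version>.jar
            if ls "$lib_dir"/"$prefix"-*.jar > /dev/null 2>&1; then
                echo "found: $prefix"
            else
                echo "missing: $prefix" >&2
                missing=1
            fi
        done
        return "$missing"
    }

The function returns a non-zero status if either JAR is missing, so it can be used as a guard in a setup script.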
.. warning::

    Azure credentials are valuable! They pay for services and control read and write access to data. Be sure to keep
    your ``core-site.xml`` configuration file safe. It is recommended that you use ``wasbs``, the SSL-enabled variant
    of Azure's file protocol, where possible.
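One way to keep the configuration file safe is to restrict its permissions to the owning user. The following is a
minimal sketch, assuming a POSIX system and that every Hadoop and GeoMesa process that needs the file runs as its
owner; ``restrict_core_site`` is a hypothetical helper name.

.. code-block:: bash

    # Sketch: make core-site.xml readable and writable by its owner only.
    # restrict_core_site is a hypothetical helper, not a Hadoop or GeoMesa command.
    restrict_core_site() {
        # Default to the file under $HADOOP_CONF_DIR when no path is given
        local conf="${1:-${HADOOP_CONF_DIR:-}/core-site.xml}"
        if [ ! -f "$conf" ]; then
            echo "not found: $conf" >&2
            return 1
        fi
        # 600 = read/write for the owner, no access for group or others
        chmod 600 "$conf"
    }

If services running as other users must read the file, a group-readable mode may be more appropriate than ``600``.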
Configuration
^^^^^^^^^^^^^
To enable Azure access, place the following in your Hadoop installation's ``core-site.xml``:

.. code-block:: xml

    <!-- core-site.xml -->
    <property>
        <name>fs.azure.account.key.ACCOUNTNAME.blob.core.windows.net</name>
        <value>XXXX YOUR ACCOUNT KEY</value>
    </property>
After you have enabled Azure in your Hadoop configuration, you can ingest with GeoMesa tools. Note that you can still
use the Kleene star (``*``) with Azure:

.. code-block:: bash

    $ geomesa-accumulo ingest -u username -p password \
      -c geomesa_catalog -i instance -s yourspec \
      -C convert wasb://CONTAINER@ACCOUNTNAME.blob.core.windows.net/files/*