Getting Twitter Data
Unlike other example data sets, the Twitter data set does not have an immediately downloadable link. Instead, one must use the Twitter API to receive data as JSON.
Cleaning the Data
The converter expects JSON input with newlines removed from within each record, so that there is exactly one record per line. Files may be compressed.
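The one-record-per-line requirement can be met with a small preprocessing step. The following is a minimal sketch, not part of GeoMesa: the function name `to_json_lines` is hypothetical, and it assumes the collected file contains a sequence of JSON objects separated only by whitespace, some of which may be pretty-printed across several lines.

```python
import json


def to_json_lines(text):
    """Yield each JSON record found in `text` as a single compact line.

    Assumes `text` holds zero or more JSON values separated by whitespace.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        record, pos = decoder.raw_decode(text, pos)
        # json.dumps escapes newlines inside string values as \n,
        # so each emitted record occupies exactly one line.
        yield json.dumps(record, separators=(",", ":"))


if __name__ == "__main__":
    raw = '{\n  "id": 1,\n  "text": "hello"\n}\n{"id": 2}\n'
    for line in to_json_lines(raw):
        print(line)
```

For very large files this sketch would need to read incrementally rather than hold the whole text in memory.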
Twitter data collected with location data may have a point location if
the user posts with precise location. This is stored in the
coordinates field. Otherwise the tweet is associated with a named
place, which has a bounding box.
The bounding box provided by the Twitter API is not a properly formed GeoJSON polygon: its array of points does not close, so it does not form a linear ring. The converter therefore takes the bounds of the box and builds a polygon from them. The centroid of this box is then used as the point geometry for the tweet.
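The bounding-box handling described above can be illustrated with a short sketch. This is not the converter's actual implementation. It assumes the bounding box arrives as a list of four `[longitude, latitude]` corner points that does not repeat the first point, and the function name `bbox_polygon_and_centroid` is hypothetical.

```python
def bbox_polygon_and_centroid(points):
    """Build a closed rectangular ring and its centroid from corner points.

    `points` is assumed to be a list of [lon, lat] pairs describing the
    corners of a bounding box, without the closing point.
    """
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)

    # A valid linear ring repeats its first point as its last point.
    ring = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),
    ]
    # For an axis-aligned rectangle the centroid is the midpoint of the bounds.
    centroid = ((min_lon + max_lon) / 2, (min_lat + max_lat) / 2)
    return ring, centroid


if __name__ == "__main__":
    box = [[-74.0, 40.0], [-74.0, 41.0], [-73.0, 41.0], [-73.0, 40.0]]
    ring, centroid = bbox_polygon_and_centroid(box)
    print(ring)
    print(centroid)
```

Working from the minimum and maximum bounds rather than the raw point order means the result does not depend on which corner the input lists first.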
Ingest Procedure
Check that the
geomesa env | grep twitter
If it is not, merge the contents of
$GEOMESA_ACCUMULO_HOME/conf/application.conf, or ensure that
reference.conf is in
The recommended procedure is to ingest in two passes. The first pass picks up the bounding boxes; tweets with point geometry may fail this ingest. The second pass picks up the points; tweets without points will fail this ingest, and their geometries will remain as set in the first pass.
geomesa ingest -u USERNAME -c CATALOG -s twitter -C twitter-place-centroid hdfs://namenode:port/path/to/twitter/*
geomesa ingest -u USERNAME -c CATALOG -s twitter -C twitter hdfs://namenode:port/path/to/twitter/*