9.8. Delimited Text Converter
The delimited text converter handles plain delimited text files such as CSV or TSV.
9.8.1. Configuration
The delimited text converter supports the following configuration keys:
| Key | Required | Type | Description |
|---|---|---|---|
| type | yes | String | Must be the string delimited-text. |
| format | yes | String | The delimited text format (see below). |
| options.delimiter | no | Char | Override the delimiter character. |
| options.quote | no | Char | Override the quote character. Can be disabled by setting to an empty string. |
| options.escape | no | Char | Override the escape character. Can be disabled by setting to an empty string. |
| options.skip-lines | no | Integer | Number of header lines to skip at the start of the input. |
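To make the effect of the delimiter, quote, escape, and header-skipping settings concrete, here is a minimal sketch that uses Python's standard csv module as a stand-in for Commons CSV. The function read_rows and its parameter names are hypothetical and are not part of the converter; they only mirror the behaviour described in the table, including disabling quoting with an empty string.

```python
import csv

def read_rows(text, delimiter=",", quote='"', escape=None, skip_lines=0):
    """Parse delimited text with overridable delimiter, quote and escape characters."""
    kwargs = {"delimiter": delimiter}
    if quote == "":
        # An empty quote string turns quote handling off entirely.
        kwargs["quoting"] = csv.QUOTE_NONE
    else:
        kwargs["quotechar"] = quote
    if escape:
        kwargs["escapechar"] = escape
    # Drop the leading header lines before parsing.
    lines = text.splitlines()[skip_lines:]
    return list(csv.reader(lines, **kwargs))

# Pipe-delimited input with single-quote quoting and one header line.
rows = read_rows("id|name\n1|'a|b'\n", delimiter="|", quote="'", skip_lines=1)

# The same kind of input with quoting disabled keeps the quote characters as data.
unquoted = read_rows("1|'a'", delimiter="|", quote="")
```

With quoting enabled, the delimiter inside the quoted value is kept as part of the field; with quoting disabled, the quote characters are treated as ordinary data.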
9.8.1.1. format
The format key specifies an instance of org.apache.commons.csv.CSVFormat that will be used for parsing. The available
formats are:
Name |
Format |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
See Apache Commons CSV for additional details on each format.
9.8.2. Transform Functions
The transform element supports referencing each field in the record by its column number using $. $0
refers to the whole line, $1 to the first column, $2 to the second, and so on. Each column is initially a string, so
further transforms may be necessary to create the correct type. See Transformation Functions for more details.
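The numbering scheme can be illustrated with a short sketch. This is plain Python, not the converter's implementation; column_refs is a hypothetical helper that maps 0 to the whole line and 1..n to the parsed columns, all of which start out as strings.

```python
import csv
import io

def column_refs(line, delimiter=","):
    """Map column numbers to values: 0 is the whole line, 1 is the first column."""
    columns = next(csv.reader(io.StringIO(line), delimiter=delimiter))
    refs = {0: line}
    for number, value in enumerate(columns, start=1):
        refs[number] = value
    return refs

refs = column_refs("first,hello,2015-01-01T00:00:00.000Z,45.0,45.0")

# Every column is a string at this point, so numeric use needs an explicit cast,
# which is the role a transform such as "$4::double" plays in the converter.
lat = float(refs[4])
```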
9.8.3. Example Usage
Suppose you have a SimpleFeatureType with the following schema:
phrase:String,dtg:Date,*geom:Point:srid=4326
And you have the following comma-separated data:
number,word,date,lat,lon
first,hello,2015-01-01T00:00:00.000Z,45.0,45.0
second,world,2015-01-01T00:00:00.000Z,45.0,45.0
We want to concatenate the first two fields together to form the phrase, parse the third field as a date, and
use the last two fields as coordinates for a Point geometry. The following configuration defines an appropriate
converter for taking this CSV data and transforming it into our SimpleFeatureType:
geomesa.converters.example = {
  type     = "delimited-text"
  format   = "CSV"
  options  = {
    skip-lines = 1
  }
  id-field = "murmurHash3($0)"
  fields = [
    { name = "phrase", transform = "concatenate($1, $2)" },
    { name = "dtg",    transform = "dateHourMinuteSecondMillis($3)" },
    { name = "lat",    transform = "$4::double" },
    { name = "lon",    transform = "$5::double" },
    { name = "geom",   transform = "point($lon, $lat)" }
  ]
}
The id of the SimpleFeature is formed from a hash of the entire record ($0 is the whole row). The simple feature
attributes are created from the fields list with appropriate transforms. Note the intermediate fields 'lat' and 'lon':
they are not part of the schema and exist only so that the geom transform can reference them by name.
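To show what the configuration produces for the sample data, the following sketch emulates the conversion in plain Python. It does not call GeoMesa. The convert function is hypothetical, hashlib.md5 is only a stand-in for murmurHash3 (so the ids will not match real converter output), and concatenate is assumed to join its arguments without a separator.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone

DATA = """number,word,date,lat,lon
first,hello,2015-01-01T00:00:00.000Z,45.0,45.0
second,world,2015-01-01T00:00:00.000Z,45.0,45.0
"""

def convert(text, skip_lines=1):
    """Emulate the example converter: skip the header, then build one feature per row."""
    features = []
    for line in text.splitlines()[skip_lines:]:
        cols = next(csv.reader(io.StringIO(line)))
        # Intermediate values, like the 'lat' and 'lon' fields in the config.
        lat, lon = float(cols[3]), float(cols[4])
        dtg = datetime.strptime(cols[2], "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
        features.append({
            "id": hashlib.md5(line.encode("utf-8")).hexdigest(),  # stand-in for murmurHash3($0)
            "phrase": cols[0] + cols[1],                          # concatenate($1, $2)
            "dtg": dtg,                                           # parsed from $3
            "geom": (lon, lat),                                   # point($lon, $lat): x is lon, y is lat
        })
    return features

features = convert(DATA)
```

Each resulting feature carries only phrase, dtg, and geom as attributes, matching the schema; lat and lon are used along the way but are not stored.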