9.11. Avro Converter
The Avro converter handles data written by Apache Avro.
9.11.1. Configuration
The Avro converter supports the following configuration keys:
| Key | Required | Type | Description |
|---|---|---|---|
| type | yes | String | Must be the string "avro" |
| schema | yes | String | The Avro schema used for parsing (may be omitted if using schema-file) |
| schema-file | yes | String | A pointer to an Avro schema on the classpath (may be omitted if using schema) |
9.11.1.1. schema/schema-file
The Avro converter supports parsing whole Avro files, with the schema embedded, or Avro IPC messages with
the schema omitted. For an embedded schema, set schema = "embedded" in your converter definition.
For IPC messages, specify the schema in one of two ways: to use an inline schema string, set
schema = "<schema string>"; or to use a schema defined in a separate file, set schema-file = "<path to file>"
(the schema file must be available on the classpath).
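A minimal sketch of the three options follows. The definition names (embedded-example, inline-example, file-example) and the one-field Example schema are hypothetical, and the id-field and fields entries that a real converter needs are left out to keep the focus on the schema keys. The triple-quoted string is standard HOCON syntax that avoids escaping the quotes inside the schema JSON.

```hocon
# Whole Avro files: the schema is read from the file itself
embedded-example = {
  type   = "avro"
  schema = "embedded"
}

# IPC messages, inline schema string (hypothetical one-field record)
inline-example = {
  type   = "avro"
  schema = """{"type":"record","name":"Example","fields":[{"name":"id","type":"int"}]}"""
}

# IPC messages, schema loaded from a file on the classpath (assumed file name)
file-example = {
  type        = "avro"
  schema-file = "schema.avsc"
}
```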
9.11.2. Transform Functions
The current Avro record being parsed is available to field transforms as $1. The original message bytes are available
as $0, which may be useful for generating consistent feature IDs.
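As an illustration of the two references, the fragment below hashes the raw message bytes for the feature ID and reads one value from the parsed record. It is a sketch under two assumptions: that md5 is among the standard transformation functions and accepts a byte array, and that the records follow the schema shown under Example Usage. The field name otherId is hypothetical.

```hocon
{
  type        = "avro"
  schema-file = "schema.avsc"
  # $0 is the original message bytes: hashing them should give the same ID
  # every time the same message is ingested (assumes md5 takes a byte array)
  id-field    = "md5($0)"
  fields = [
    # $1 is the parsed Avro record
    { name = "otherId", transform = "avroPath($1, '/content$type=OtherObject/id')" }
  ]
}
```

By contrast, an ID generated with uuid() would differ on each run, so re-ingesting the same message would produce a new feature rather than the same one.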
In addition to the standard Transformation Functions, the Avro converter provides the following Avro-specific functions:
9.11.2.1. avroPath
Description: Extract values from nested Avro structures.
Usage: avroPath($ref, $pathString)
$ref - a reference object (Avro root or extracted object)
$pathString - a forward-slash delimited path string
Avro paths are defined similarly to JSONPath or XPath, and allow you to extract specific fields out of an Avro record. An Avro path consists of forward-slash delimited strings. Each part of the path defines a field name with an optional predicate:
$type=<typename> - match the Avro schema type name on the selected element
[$<field>=<value>] - match elements with a field named "field" and a value equal to "value"
For example: /foo$type=bar/baz[$qux=quux]. See Example Usage below for a concrete case.
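The generic path above can be read part by part, as annotated in the following sketch. The names foo, bar, baz, qux and quux are the placeholders from the text, and the field name example is hypothetical.

```hocon
fields = [
  # /foo             select the field named "foo" on the root record ($1)
  # $type=bar        match only if the selected element's Avro schema type is named "bar"
  # /baz             then select the field named "baz" on that element
  # [$qux=quux]      match elements with a field named "qux" and a value equal to "quux"
  { name = "example", transform = "avroPath($1, '/foo$type=bar/baz[$qux=quux]')" }
]
```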
9.11.2.2. avroToJson
Description: Converts Avro objects to JSON strings.
Usage: avroToJson($ref)
$ref - a reference object (Avro root or extracted object)
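A short sketch of both kinds of reference, assuming records that follow the schema shown under Example Usage. The field names tobjJson and rawJson are hypothetical.

```hocon
fields = [
  # Extract a nested object first, then serialize it
  { name = "tobj",     transform = "avroPath($1, '/content$type=DataObj')" },
  { name = "tobjJson", transform = "avroToJson($tobj)" },
  # Or serialize the whole record by passing the Avro root
  { name = "rawJson",  transform = "avroToJson($1)" }
]
```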
9.11.2.3. avroBinaryList
GeoMesa has a custom Avro schema for writing SimpleFeatures. List, map and UUID attributes are serialized as binary Avro fields. This function can read a serialized list-type attribute.
Description: Parses a binary Avro value as a list
Usage: avroBinaryList($ref)
9.11.2.4. avroBinaryMap
GeoMesa has a custom Avro schema for writing SimpleFeatures. List, map and UUID attributes are serialized as binary Avro fields. This function can read a serialized map-type attribute.
Description: Parses a binary Avro value as a map
Usage: avroBinaryMap($ref)
9.11.2.5. avroBinaryUuid
GeoMesa has a custom Avro schema for writing SimpleFeatures. List, map and UUID attributes are serialized as binary Avro fields. This function can read a serialized UUID-type attribute.
Description: Parses a binary Avro value as a UUID
Usage: avroBinaryUuid($ref)
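The three binary functions might be combined as in the sketch below. It assumes a record written with GeoMesa's Avro schema that has top-level attributes named tags (list), props (map) and uid (UUID). Those attribute names are hypothetical, and the sketch also assumes that avroPath returns the serialized binary value for such fields and that function calls may be nested.

```hocon
fields = [
  # Each avroPath call selects the binary field; the outer call decodes it
  { name = "tags",  transform = "avroBinaryList(avroPath($1, '/tags'))" },
  { name = "props", transform = "avroBinaryMap(avroPath($1, '/props'))" },
  { name = "uid",   transform = "avroBinaryUuid(avroPath($1, '/uid'))" }
]
```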
9.11.3. Example Usage
For this example we’ll use the following Avro schema in a classpath file named schema.avsc:
{
  "namespace": "org.locationtech",
  "type": "record",
  "name": "CompositeMessage",
  "fields": [
    {
      "name": "content",
      "type": [
        {
          "name": "DataObj",
          "type": "record",
          "fields": [
            {
              "name": "kvmap",
              "type": {
                "type": "array",
                "items": {
                  "name": "kvpair",
                  "type": "record",
                  "fields": [
                    { "name": "k", "type": "string" },
                    { "name": "v", "type": ["string", "double", "int", "null"] }
                  ]
                }
              }
            }
          ]
        },
        {
          "name": "OtherObject",
          "type": "record",
          "fields": [{ "name": "id", "type": "int"}]
        }
      ]
    }
  ]
}
This schema defines an Avro record with a single field named content,
whose value is a nested record of type either DataObj or OtherObject.
As an exercise, we can use avro-tools to generate some test data and
view it:
$ java -jar avro-tools-1.11.4.jar random --schema-file schema.avsc --count 5 /tmp/avro
$ java -jar avro-tools-1.11.4.jar tojson /tmp/avro
{"content":{"org.locationtech.DataObj":{"kvmap":[{"k":"thhxhumkykubls","v":{"double":0.8793488185997134}},{"k":"mlungpiegrlof","v":{"double":0.45718223406586045}},{"k":"mtslijkjdt","v":null}]}}}
{"content":{"org.locationtech.OtherObject":{"id":-86025408}}}
{"content":{"org.locationtech.DataObj":{"kvmap":[]}}}
{"content":{"org.locationtech.DataObj":{"kvmap":[{"k":"aeqfvfhokutpovl","v":{"string":"kykfkitoqk"}},{"k":"omoeoo","v":{"string":"f"}}]}}}
{"content":{"org.locationtech.DataObj":{"kvmap":[{"k":"jdfpnxtleoh","v":{"double":0.7748286862915655}},{"k":"bueqwtmesmeesthinscnreqamlwdxprseejpkrrljfhdkijosnogusomvmjkvbljrfjafhrbytrfayxhptfpcropkfjcgs","v":{"int":-1787843080}},{"k":"nmopnvrcjyar","v":null},{"k":"i","v":{"string":"hcslpunas"}}]}}}
Here’s a more relevant sample record:
{
  "content" : {
    "org.locationtech.DataObj" : {
      "kvmap" : [ {
        "k" : "lat",
        "v" : {
          "double" : 45.0
        }
      }, {
        "k" : "lon",
        "v" : {
          "double" : 45.0
        }
      }, {
        "k" : "prop3",
        "v" : {
          "string" : " foo "
        }
      }, {
        "k" : "prop4",
        "v" : {
          "double" : 1.0
        }
      } ]
    }
  }
}
Let’s say we want to convert this Avro array of kvpair records into a simple feature. The array contains four attributes:
lat
lon
prop3
prop4
The following converter config would be sufficient to parse the Avro:
{
  type = "avro"
  schema-file = "schema.avsc"
  id-field = "uuid()"
  fields = [
    { name = "tobj", transform = "avroPath($1, '/content$type=DataObj')" },
    { name = "lat", transform = "avroPath($tobj, '/kvmap[$k=lat]/v')" },
    { name = "lon", transform = "avroPath($tobj, '/kvmap[$k=lon]/v')" },
    { name = "geom", transform = "point($lon, $lat)" }
  ]
}
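The configuration above reads only lat and lon. As a sketch, the remaining two attributes could be pulled out with the same path pattern. The field names prop3 and prop4 simply mirror the keys in the sample record. Whether the extracted values need further conversion (for example trimming the whitespace in " foo " or casting numeric types) depends on the target feature type and is not shown here.

```hocon
{
  type        = "avro"
  schema-file = "schema.avsc"
  id-field    = "uuid()"
  fields = [
    # Select the content field only when it holds a DataObj record
    { name = "tobj",  transform = "avroPath($1, '/content$type=DataObj')" },
    # Look up each kvpair by its key and take its value
    { name = "lat",   transform = "avroPath($tobj, '/kvmap[$k=lat]/v')" },
    { name = "lon",   transform = "avroPath($tobj, '/kvmap[$k=lon]/v')" },
    { name = "prop3", transform = "avroPath($tobj, '/kvmap[$k=prop3]/v')" },
    { name = "prop4", transform = "avroPath($tobj, '/kvmap[$k=prop4]/v')" },
    # Build the geometry from the two extracted coordinates
    { name = "geom",  transform = "point($lon, $lat)" }
  ]
}
```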