.. _avro_converter: Avro Converter ============== The Avro converter handles data written by `Apache Avro `__. To use the Avro converter, specify ``type = "avro"`` in your converter definition. Configuration ------------- The Avro converter supports parsing whole Avro files, with the schema embedded, or Avro IPC messages with the schema omitted. For an embedded schema, set ``schema = "embedded"`` in your converter definition. For IPC messages, specify the schema in one of two ways: to use an inline schema string, set ``schema = ""``; to use a schema defined in a separate file, set ``schema-file = ""``. The Avro record being parsed is available to field transforms as ``$1``. Avro Paths ---------- Avro paths are defined similarly to JSONPath or XPath, and allow you to extract specific fields out of an Avro record. An Avro path consists of forward-slash delimited strings. Each part of the path defines a field name with an optional predicate: * ``$type=`` - match the Avro schema type name on the selected element * ``[$=]`` - match elements with a field named "field" and a value equal to "value" For example, ``/foo$type=bar/baz[$qux=quux]``. See `Example Usage`, below, for a concrete example. Avro paths are available through the ``avroPath`` transform function, as described below. .. _avro_converter_functions: Avro Transform Functions ------------------------ GeoMesa defines several Avro-specific transform functions. avroPath ^^^^^^^^ Description: Extract values from nested Avro structures. Usage: ``avroPath($ref, $pathString)`` * ``$ref`` - a reference object (avro root or extracted object) * ``pathString`` - forward-slash delimited path strings. See `Avro Paths`, above avroBinaryList ^^^^^^^^^^^^^^ GeoMesa has a custom Avro schema for writing SimpleFeatures. List, map and UUID attributes are serialized as binary Avro fields. This function can read a serialized list-type attribute. Description: Parses a binary Avro value as a list Usage: ``avroBinaryList($ref)`` avroBinaryMap ^^^^^^^^^^^^^ GeoMesa has a custom Avro schema for writing SimpleFeatures. List, map and UUID attributes are serialized as binary Avro fields. This function can read a serialized map-type attribute. Description: Parses a binary Avro value as a map Usage: ``avroBinaryMap($ref)`` avroBinaryUuid ^^^^^^^^^^^^^^ GeoMesa has a custom Avro schema for writing SimpleFeatures. List, map and UUID attributes are serialized as binary Avro fields. This function can read a serialized UUID-type attribute. Description: Parses a binary Avro value as a UUID Usage: ``avroBinaryUuid($ref)`` Example Usage ------------- For this example we'll use the following Avro schema in a file named ``/tmp/schema.avsc``: :: { "namespace": "org.locationtech", "type": "record", "name": "CompositeMessage", "fields": [ { "name": "content", "type": [ { "name": "DataObj", "type": "record", "fields": [ { "name": "kvmap", "type": { "type": "array", "items": { "name": "kvpair", "type": "record", "fields": [ { "name": "k", "type": "string" }, { "name": "v", "type": ["string", "double", "int", "null"] } ] } } } ] }, { "name": "OtherObject", "type": "record", "fields": [{ "name": "id", "type": "int"}] } ] } ] } This schema defines an avro file that has a field named ``content`` which has a nested object which is either of type ``DataObj`` or ``OtherObject``. As an exercise, we can use avro tools to generate some test data and view it:: java -jar /tmp/avro-tools-1.7.7.jar random --schema-file /tmp/schema -count 5 /tmp/avro $ java -jar /tmp/avro-tools-1.7.7.jar tojson /tmp/avro {"content":{"org.locationtech.DataObj":{"kvmap":[{"k":"thhxhumkykubls","v":{"double":0.8793488185997134}},{"k":"mlungpiegrlof","v":{"double":0.45718223406586045}},{"k":"mtslijkjdt","v":null}]}}} {"content":{"org.locationtech.OtherObject":{"id":-86025408}}} {"content":{"org.locationtech.DataObj":{"kvmap":[]}}} {"content":{"org.locationtech.DataObj":{"kvmap":[{"k":"aeqfvfhokutpovl","v":{"string":"kykfkitoqk"}},{"k":"omoeoo","v":{"string":"f"}}]}}} {"content":{"org.locationtech.DataObj":{"kvmap":[{"k":"jdfpnxtleoh","v":{"double":0.7748286862915655}},{"k":"bueqwtmesmeesthinscnreqamlwdxprseejpkrrljfhdkijosnogusomvmjkvbljrfjafhrbytrfayxhptfpcropkfjcgs","v":{"int":-1787843080}},{"k":"nmopnvrcjyar","v":null},{"k":"i","v":{"string":"hcslpunas"}}]}}} Here's a more relevant sample record:: { "content" : { "org.locationtech.DataObj" : { "kvmap" : [ { "k" : "lat", "v" : { "double" : 45.0 } }, { "k" : "lon", "v" : { "double" : 45.0 } }, { "k" : "prop3", "v" : { "string" : " foo " } }, { "k" : "prop4", "v" : { "double" : 1.0 } } ] } } } Let's say we want to convert our Avro array of kvpairs into a simple feature. We notice that there are 4 attributes: - lat - lon - prop3 - prop4 The following converter config would be sufficient to parse the Avro:: { type = "avro" schema-file = "/tmp/schema.avsc" sft = "testsft" id-field = "uuid()" fields = [ { name = "tobj", transform = "avroPath($1, '/content$type=DataObj')" }, { name = "lat", transform = "avroPath($tobj, '/kvmap[$k=lat]/v')" }, { name = "lon", transform = "avroPath($tobj, '/kvmap[$k=lon]/v')" }, { name = "geom", transform = "point($lon, $lat)" } ] }