9.11. Avro Converter

The Avro converter handles data written by Apache Avro.

9.11.1. Configuration

The Avro converter supports the following configuration keys:

Key          Required  Type    Description
-----------  --------  ------  ------------------------------------------------------------------------------
type         yes       String  Must be the string avro.
schema       yes       String  The Avro schema used for parsing (may be omitted if using schema-file).
schema-file  yes       String  A pointer to an Avro schema on the classpath (may be omitted if using schema).

9.11.1.1. schema/schema-file

The Avro converter supports parsing whole Avro files, with the schema embedded, or Avro IPC messages with the schema omitted. For an embedded schema, set schema = "embedded" in your converter definition. For IPC messages, specify the schema in one of two ways: to use an inline schema string, set schema = "<schema string>"; or to use a schema defined in a separate file, set schema-file = "<path to file>" (the schema file must be available on the classpath).
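The three variants described above can be sketched as converter fragments. The top-level key names (embedded-example, inline-example, file-example) and the inline schema, a minimal single-field record, are hypothetical and chosen only for illustration; the triple-quoted string is standard HOCON syntax that avoids escaping the quotes in the schema.

```
# Whole Avro files that carry their own schema
embedded-example = {
  type   = "avro"
  schema = "embedded"
}

# IPC messages, schema supplied inline (hypothetical minimal schema)
inline-example = {
  type   = "avro"
  schema = """{"type":"record","name":"Example","fields":[{"name":"id","type":"int"}]}"""
}

# IPC messages, schema loaded from a file on the classpath
file-example = {
  type        = "avro"
  schema-file = "schema.avsc"
}
```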

9.11.2. Transform Functions

The current Avro record being parsed is available to field transforms as $1. The original message bytes are available as $0, which may be useful for generating consistent feature IDs.
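As a rough sketch of the feature ID use case, the fragment below hashes the raw message bytes so that re-ingesting the same message produces the same ID. It assumes that an md5 function is available among the standard transformation functions and that it accepts the byte array in $0; check the transformation function reference before relying on it.

```
{
  type        = "avro"
  schema-file = "schema.avsc"
  # Assumption: md5 accepts the original message bytes held in $0
  id-field    = "md5($0)"
  fields      = [
    # $1 is the parsed Avro record
    { name = "record", transform = "$1" }
  ]
}
```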

In addition to the standard Transformation Functions, the Avro converter provides the following Avro-specific functions:

9.11.2.1. avroPath

Description: Extract values from nested Avro structures.

Usage: avroPath($ref, $pathString)

  • $ref - a reference object (avro root or extracted object)

  • $pathString - a forward-slash delimited path string

Avro paths are defined similarly to JSONPath or XPath, and allow you to extract specific fields out of an Avro record. An Avro path consists of forward-slash delimited strings. Each part of the path defines a field name with an optional predicate:

  • $type=<typename> - match the Avro schema type name on the selected element

  • [$<field>=<value>] - match elements with a field named “field” and a value equal to “value”

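For example, the path /foo$type=bar/baz[$qux=quux] selects the field foo, requiring its Avro schema type name to be bar, and then selects the elements of baz whose qux field equals quux. See Example Usage (section 9.11.3) for a concrete example.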

9.11.2.2. avroToJson

Description: Converts Avro objects to JSON strings.

Usage: avroToJson($ref)

  • $ref - a reference object (avro root or extracted object)
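A minimal sketch of how avroToJson might appear in a field definition is shown below. The field names json and contentJson are hypothetical; the second field assumes a record with a content field, as in the schema used in the Example Usage section.

```
fields = [
  # Serialize the whole record ($1) to a JSON string
  { name = "json",        transform = "avroToJson($1)" },
  # Serialize only an extracted sub-object
  { name = "contentJson", transform = "avroToJson(avroPath($1, '/content'))" }
]
```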

9.11.2.3. avroBinaryList

GeoMesa has a custom Avro schema for writing SimpleFeatures. List, map and UUID attributes are serialized as binary Avro fields. This function can read a serialized list-type attribute.

Description: Parses a binary Avro value as a list

Usage: avroBinaryList($ref)

9.11.2.4. avroBinaryMap

GeoMesa has a custom Avro schema for writing SimpleFeatures. List, map and UUID attributes are serialized as binary Avro fields. This function can read a serialized map-type attribute.

Description: Parses a binary Avro value as a map

Usage: avroBinaryMap($ref)

9.11.2.5. avroBinaryUuid

GeoMesa has a custom Avro schema for writing SimpleFeatures. List, map and UUID attributes are serialized as binary Avro fields. This function can read a serialized UUID-type attribute.

Description: Parses a binary Avro value as a UUID

Usage: avroBinaryUuid($ref)
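The three binary functions could be combined with avroPath roughly as follows. This is a sketch under assumptions: the Avro field names tags, props, and uid are hypothetical, and it assumes that the value returned by avroPath can be passed directly as the $ref argument.

```
fields = [
  # Hypothetical binary field holding a serialized list attribute
  { name = "tags",  transform = "avroBinaryList(avroPath($1, '/tags'))" },
  # Hypothetical binary field holding a serialized map attribute
  { name = "props", transform = "avroBinaryMap(avroPath($1, '/props'))" },
  # Hypothetical binary field holding a serialized UUID attribute
  { name = "uid",   transform = "avroBinaryUuid(avroPath($1, '/uid'))" }
]
```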

9.11.3. Example Usage

For this example we’ll use the following Avro schema in a classpath file named schema.avsc:

{
  "namespace": "org.locationtech",
  "type": "record",
  "name": "CompositeMessage",
  "fields": [
    {
      "name": "content",
      "type": [
         {
           "name": "DataObj",
           "type": "record",
           "fields": [
             {
               "name": "kvmap",
               "type": {
                  "type": "array",
                  "items": {
                    "name": "kvpair",
                    "type": "record",
                    "fields": [
                      { "name": "k", "type": "string" },
                      { "name": "v", "type": ["string", "double", "int", "null"] }
                    ]
                  }
               }
             }
           ]
         },
         {
            "name": "OtherObject",
            "type": "record",
            "fields": [{ "name": "id", "type": "int"}]
         }
      ]
    }
  ]
}

This schema defines an Avro record with a single field named content, whose value is a nested object of type either DataObj or OtherObject. As an exercise, we can use avro-tools to generate some test data and view it:

$ java -jar /tmp/avro-tools-1.11.4.jar random --schema-file schema.avsc --count 5 /tmp/avro

$ java -jar /tmp/avro-tools-1.11.4.jar tojson /tmp/avro
{"content":{"org.locationtech.DataObj":{"kvmap":[{"k":"thhxhumkykubls","v":{"double":0.8793488185997134}},{"k":"mlungpiegrlof","v":{"double":0.45718223406586045}},{"k":"mtslijkjdt","v":null}]}}}
{"content":{"org.locationtech.OtherObject":{"id":-86025408}}}
{"content":{"org.locationtech.DataObj":{"kvmap":[]}}}
{"content":{"org.locationtech.DataObj":{"kvmap":[{"k":"aeqfvfhokutpovl","v":{"string":"kykfkitoqk"}},{"k":"omoeoo","v":{"string":"f"}}]}}}
{"content":{"org.locationtech.DataObj":{"kvmap":[{"k":"jdfpnxtleoh","v":{"double":0.7748286862915655}},{"k":"bueqwtmesmeesthinscnreqamlwdxprseejpkrrljfhdkijosnogusomvmjkvbljrfjafhrbytrfayxhptfpcropkfjcgs","v":{"int":-1787843080}},{"k":"nmopnvrcjyar","v":null},{"k":"i","v":{"string":"hcslpunas"}}]}}}

Here’s a more relevant sample record:

{
  "content" : {
    "org.locationtech.DataObj" : {
      "kvmap" : [ {
        "k" : "lat",
        "v" : {
          "double" : 45.0
        }
      }, {
        "k" : "lon",
        "v" : {
          "double" : 45.0
        }
      }, {
        "k" : "prop3",
        "v" : {
          "string" : " foo "
        }
      }, {
        "k" : "prop4",
        "v" : {
          "double" : 1.0
        }
      } ]
    }
  }
}

Let’s say we want to convert our Avro array of kvpairs into a simple feature. The sample record contains four key-value pairs, which we can treat as attributes:

  • lat

  • lon

  • prop3

  • prop4

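The following converter config is sufficient to parse the Avro record. It selects the DataObj, extracts lat and lon from the kvmap array, and builds a point geometry from them: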

{
  type        = "avro"
  schema-file = "schema.avsc"
  id-field    = "uuid()"
  fields = [
    { name = "tobj", transform = "avroPath($1, '/content$type=DataObj')" },
    { name = "lat",  transform = "avroPath($tobj, '/kvmap[$k=lat]/v')" },
    { name = "lon",  transform = "avroPath($tobj, '/kvmap[$k=lon]/v')" },
    { name = "geom", transform = "point($lon, $lat)" }
  ]
}
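The remaining two attributes, prop3 and prop4, could be extracted with the same path pattern. The following is a sketch rather than a tested configuration: it reuses only the functions shown above and assumes that the target feature type declares attributes with matching names and compatible types (a string for prop3 and a double for prop4).

```
{
  type        = "avro"
  schema-file = "schema.avsc"
  id-field    = "uuid()"
  fields = [
    # Select the DataObj branch of the content union
    { name = "tobj",  transform = "avroPath($1, '/content$type=DataObj')" },
    # Pick the kvpair whose k matches, then take its v
    { name = "lat",   transform = "avroPath($tobj, '/kvmap[$k=lat]/v')" },
    { name = "lon",   transform = "avroPath($tobj, '/kvmap[$k=lon]/v')" },
    { name = "prop3", transform = "avroPath($tobj, '/kvmap[$k=prop3]/v')" },
    { name = "prop4", transform = "avroPath($tobj, '/kvmap[$k=prop4]/v')" },
    { name = "geom",  transform = "point($lon, $lat)" }
  ]
}
```

Records whose content is an OtherObject have no kvmap, so how such records are handled depends on the converter's error handling settings, which are outside the scope of this example.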