Parsing XML
-----------

The XML converter defines each field using an XPath expression. For XML documents that contain multiple features,
the ``feature-path`` element can be used to select the feature elements. In this case, the attribute paths are
evaluated relative to each feature element. The optional ``xsd`` element can be used to validate input files against
an XML schema.

By default, the XML converter treats each line of input as a separate XML document. The ``line-mode`` option
can be set to ``multi`` to parse the entire input as a single document instead of line by line. Note that multi-line
parsing reads the entire input into memory, so it should not be used with large files.
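
The difference between the two modes can be sketched in a few lines of Python. This is only an illustration, not
GeoMesa code: the function name ``parse_documents`` is hypothetical, and ``xml.etree.ElementTree`` stands in for the
converter's own parser.

.. code-block:: python

    import xml.etree.ElementTree as ET


    def parse_documents(text, line_mode="single"):
        """Return the root element of every document found in ``text``."""
        if line_mode == "multi":
            # The whole input is held in memory and parsed as one document.
            return [ET.fromstring(text)]
        # Otherwise every non-blank line must be a complete document by itself.
        return [ET.fromstring(line) for line in text.splitlines() if line.strip()]


    # One complete document per line: suitable for the default mode.
    single_input = (
        "<Feature><number>123</number></Feature>\n"
        "<Feature><number>456</number></Feature>\n"
    )

    # One document spread over several lines: needs multi-line parsing.
    multi_input = (
        "<doc>\n"
        "  <Feature><number>123</number></Feature>\n"
        "  <Feature><number>456</number></Feature>\n"
        "</doc>\n"
    )

    per_line = parse_documents(single_input, "single")
    whole = parse_documents(multi_input, "multi")

In this sketch, passing ``multi_input`` with ``line_mode="single"`` fails, because a line such as ``<doc>`` is not a
well-formed document by itself.
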

The XML converter will attempt to use the Saxon XPath factory if it is available. The GeoMesa tools provide a
script to download Saxon, ``bin/install-saxon.sh``. To specify an alternate XPath factory, use the ``xpath-factory``
option. If the factory cannot be loaded, the default Java factory is used instead. Note that the default factory
can be significantly slower.
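
Example XML:

.. code-block:: xml

    <doc>
        <DataSource>
            <name>myxml</name>
        </DataSource>
        <Feature>
            <number>123</number>
            <geom>
                <lat>12.23</lat>
                <lon>44.3</lon>
            </geom>
            <color>red</color>
        </Feature>
        <Feature>
            <number>456</number>
            <geom>
                <lat>20.3</lat>
                <lon>33.2</lon>
            </geom>
            <color>blue</color>
        </Feature>
    </doc>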

Config:

::

    {
      type          = "xml"
      id-field      = "uuid()"
      feature-path  = "Feature" // optional path to feature elements
      xsd           = "example.xsd" // optional xsd file to validate input
      xpath-factory = "net.sf.saxon.xpath.XPathFactoryImpl"
      options = {
        line-mode = "multi" // or "single"
      }
      fields = [
        { name = "number", path = "number",                      transform = "$0::integer" }
        { name = "color",  path = "color",                       transform = "trim($0)" }
        { name = "weight", path = "physical/@weight",            transform = "$0::double" }
        { name = "source", path = "/doc/DataSource/name/text()" }
        { name = "lat",    path = "geom/lat",                    transform = "$0::double" }
        { name = "lon",    path = "geom/lon",                    transform = "$0::double" }
        { name = "geom",                                         transform = "point($lon, $lat)" }
      ]
    }
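
To show how ``feature-path`` and the field paths interact, here is a minimal Python sketch of the same extraction.
It is not how GeoMesa is implemented: the ``extract`` function and the ``weight`` values are hypothetical, and
``xml.etree.ElementTree`` supports only a subset of XPath, so the attribute is read with ``get()`` rather than
``physical/@weight``.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Sample document; the weight values are made up for illustration.
    XML = """
    <doc>
      <DataSource><name>myxml</name></DataSource>
      <Feature>
        <number>123</number>
        <geom><lat>12.23</lat><lon>44.3</lon></geom>
        <color> red </color>
        <physical weight="1.5"/>
      </Feature>
      <Feature>
        <number>456</number>
        <geom><lat>20.3</lat><lon>33.2</lon></geom>
        <color>blue</color>
        <physical weight="2.5"/>
      </Feature>
    </doc>
    """


    def extract(xml_text, feature_path="Feature"):
        """Build one record per element matched by ``feature_path``."""
        root = ET.fromstring(xml_text.strip())
        # Document-level value, resolved once against the root element.
        source = root.findtext("DataSource/name")
        rows = []
        for feature in root.findall(feature_path):
            # Every path below is evaluated relative to the current feature.
            lat = float(feature.findtext("geom/lat"))
            lon = float(feature.findtext("geom/lon"))
            rows.append({
                "number": int(feature.findtext("number")),
                "color": feature.findtext("color").strip(),
                "weight": float(feature.find("physical").get("weight")),
                "source": source,
                "geom": (lon, lat),  # same argument order as point($lon, $lat)
            })
        return rows


    rows = extract(XML)

Each ``Feature`` element yields one record, while ``source`` comes from the single ``DataSource`` element and is
therefore the same for every record.
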

Handling Namespaces with Saxon
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With the default XPath factory, XML namespaces can generally be ignored. The Saxon factory, however, requires
namespaces to be declared. Declare them with the ``xml-namespaces`` configuration element, which maps each prefix
used in the paths to its namespace URI.
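
Example XML:

.. code-block:: xml

    <foo:doc xmlns:foo="http://example.com/foo" xmlns:bar="http://example.com/bar">
        <foo:DataSource>
            <foo:name>myxml</foo:name>
        </foo:DataSource>
        <foo:Feature>
            <foo:number>123</foo:number>
            <bar:geom>
                <bar:lat>12.23</bar:lat>
                <bar:lon>44.3</bar:lon>
            </bar:geom>
            <foo:color>red</foo:color>
        </foo:Feature>
    </foo:doc>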

Config:

::

    {
      type          = "xml"
      id-field      = "uuid()"
      feature-path  = "foo:Feature" // optional path to feature elements
      xsd           = "example.xsd" // optional xsd file to validate input
      xpath-factory = "net.sf.saxon.xpath.XPathFactoryImpl"
      options = {
        line-mode = "multi" // or "single"
      }
      xml-namespaces = {
        foo = "http://example.com/foo"
        bar = "http://example.com/bar"
      }
      fields = [
        { name = "number", path = "foo:number",                              transform = "$0::integer" }
        { name = "color",  path = "foo:color",                               transform = "trim($0)" }
        { name = "weight", path = "foo:physical/@weight",                    transform = "$0::double" }
        { name = "source", path = "/foo:doc/foo:DataSource/foo:name/text()" }
        { name = "lat",    path = "bar:geom/bar:lat",                        transform = "$0::double" }
        { name = "lon",    path = "bar:geom/bar:lon",                        transform = "$0::double" }
        { name = "geom",                                                     transform = "point($lon, $lat)" }
      ]
    }
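
The effect of a namespace-aware evaluator can be reproduced with a small Python sketch. It is an analogy rather than
Saxon or GeoMesa code: ``xml.etree.ElementTree`` also matches elements by namespace, and the ``NAMESPACES`` dictionary
plays the role assumed here for ``xml-namespaces``, mapping each prefix to its URI.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Prefix-to-URI mapping, mirroring the xml-namespaces block in the config.
    NAMESPACES = {
        "foo": "http://example.com/foo",
        "bar": "http://example.com/bar",
    }

    XML = """
    <foo:doc xmlns:foo="http://example.com/foo" xmlns:bar="http://example.com/bar">
      <foo:Feature>
        <foo:number>123</foo:number>
        <bar:geom><bar:lat>12.23</bar:lat><bar:lon>44.3</bar:lon></bar:geom>
        <foo:color>red</foo:color>
      </foo:Feature>
    </foo:doc>
    """

    root = ET.fromstring(XML.strip())

    # An unqualified path matches nothing, because the elements are namespaced.
    unqualified = root.findall("Feature")

    # Prefixed paths match once the prefixes are declared.
    features = root.findall("foo:Feature", NAMESPACES)
    number = features[0].findtext("foo:number", namespaces=NAMESPACES)
    lat = features[0].findtext("bar:geom/bar:lat", namespaces=NAMESPACES)

The prefixes in the paths only need to agree with the declared mapping. What identifies an element is the namespace
URI, not the prefix used in the source document.
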