Tips & Tricks for Spatial Data Harmonization
Dr. Christine Giger
Dr. Jan Schulze Althoff
Overview
➞Why is Spatial Data Harmonization still important or necessary?
➞Tools & Methods
➞Tips & Tricks
➞Conclusions
25.06.2013 INSPIRE Conference 2013, Florence - Dr. Christine Giger 2
Provision of INSPIRE-compliant data
➞All commercial software vendors and many open source products offer “off-the-shelf” solutions for working with INSPIRE-compliant services
➞All data providers deliver their data in INSPIRE-compliant form
➞Everything should be interoperable when using INSPIRE-compliant data
BUT: Observed Problems (Software)
➞Still various interoperability issues between different software systems
– “data exported by system abc as INSPIRE compliant data cannot directly be used with system xyz”
– some examples:
– The data is not valid against the schema
– The data and/or the schema cannot be imported into the favoured spatial ETL-Tool or GIS ABC
– The data cannot be visualized in the ETL-Tool or GIS ABC
– Data on different themes created by the same tool cannot be integrated or migrated
– The data cannot be migrated with other (non-INSPIRE) GML/XML data
– Etc.
Causes for the problems
➞Many GIS and ETL-Tools...:
– ... operate “data-centred” and are not “schema-aware”
• Data is not validated in the production process (errors like missing attributes, non-declared elements)
• Implicit restrictions (e.g. specific geometry representation required, <pos> vs. <posList>, or specific position of the “srs” attribute)
– ... use hard-coded namespaces and schema locations
– ... use a vendor-specific GML 3.2.1 core schema
– ... use deprecated types
– …
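One of the implicit restrictions named above, a tool that accepts only <posList> where the data carries a sequence of <pos> elements, can be worked around with a small script. The following is a minimal sketch in Python (standard library only), not a method from the talk. It assumes GML 3.2.1 data, the sample LineString is made up for illustration, and it does not deal with srsDimension or other attributes.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # GML 3.2.1 namespace
ET.register_namespace("gml", GML_NS)


def pos_to_poslist(root):
    """Replace runs of <gml:pos> children by one <gml:posList>, in place.

    Returns the number of parent elements that were rewritten.
    """
    pos_tag = f"{{{GML_NS}}}pos"
    converted = 0
    # Snapshot the element list first, because the tree is modified below.
    for parent in list(root.iter()):
        positions = [child for child in parent if child.tag == pos_tag]
        if len(positions) < 2:
            continue  # a single <gml:pos> (e.g. inside gml:Point) is left alone
        index = list(parent).index(positions[0])
        # Normalise whitespace inside each position, then join all positions.
        coords = " ".join(" ".join((p.text or "").split()) for p in positions)
        for p in positions:
            parent.remove(p)
        pos_list = ET.Element(f"{{{GML_NS}}}posList")
        pos_list.text = coords
        parent.insert(index, pos_list)
        converted += 1
    return converted


# Illustrative sample data (not taken from the talk).
sample = (
    '<gml:LineString xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="l1">'
    "<gml:pos>0 0</gml:pos><gml:pos>10 5</gml:pos><gml:pos>20 5</gml:pos>"
    "</gml:LineString>"
)
line = ET.fromstring(sample)
count = pos_to_poslist(line)
print(ET.tostring(line, encoding="unicode"))
```

The reverse direction (posList to pos) would need the coordinate dimension, which is why a fixed direction is used here.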
Observed Problems (Data provision)
➞Often data is still delivered in different formats/structures
– Several XML-based formats, e.g.
• proprietary XML formats
• GML 2.1
• GML 3.2.1 (in different flavors)
– Schemas are huge (up to 1.5 million lines) and (partly) complex (e.g. 580 complex types and over 80 referenced schemas)
Example: Overview of the structure of German topographic data (here: Streets)
[Diagram: type hierarchy for street features spanning the schemas GML.xsd, AAA-Basisschema.xsd and AAA-Fachschema.xsd. Elements and types shown: AX_Strasse (AX_StrasseType), AX_Strassenachse (AX_StrassenachseType), AX_Fahrbahnachse (AX_FahrbahnachseType), AA_ZUSOType, AA_REOType, AA_ObjektType, AbstractFeatureType, TA_CurveComponentType, AG_ObjekteMitGemeinsamerGeometrie. Relations shown: hatDirektUnten, istTeilVon.]
Requirements for data harmonization
➞Interoperability between software
– “minor” adaptions (adding/removing attributes, changing namespaces, validating…)
-> “Scripting tasks”
➞Data transformation of delivered/provided data
– “major” structural changes (extraction of elements, reclassification, grouping, …)
-> “Complex transformations”
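A typical “scripting task” from the list above is changing a namespace. The following minimal Python sketch (standard library only) moves all element and attribute names from one namespace URI to another. Both URIs and the sample document are hypothetical, and the sketch deliberately ignores namespace prefixes that occur inside text or attribute values (e.g. QName values or xsi:schemaLocation entries).

```python
import xml.etree.ElementTree as ET


def change_namespace(root, old_uri, new_uri):
    """Move every element and attribute name from old_uri to new_uri, in place."""
    old_prefix = f"{{{old_uri}}}"
    new_prefix = f"{{{new_uri}}}"
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag.
        if isinstance(elem.tag, str) and elem.tag.startswith(old_prefix):
            elem.tag = new_prefix + elem.tag[len(old_prefix):]
        # Iterate over a copy of the keys, since the dict is changed in the loop.
        for name in list(elem.attrib):
            if name.startswith(old_prefix):
                elem.attrib[new_prefix + name[len(old_prefix):]] = elem.attrib.pop(name)
    return root


# Hypothetical namespaces and data, for illustration only.
OLD = "http://example.org/roads/1.0"
NEW = "http://example.org/roads/2.0"
doc = ET.fromstring(
    f'<r:Road xmlns:r="{OLD}" r:status="open"><r:name>A1</r:name></r:Road>'
)
change_namespace(doc, OLD, NEW)
ET.register_namespace("r", NEW)  # keep the familiar prefix when serialising
print(ET.tostring(doc, encoding="unicode"))
```

Validation against the target schema would still be a separate step; the standard library does not provide XML Schema validation.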
Requirements for tools to support data transformation
Technical requirements (grouped/simplified)
➞Read & Write XML Data
➞Support namespaces & schema validation
➞Support filtering on values, types, structures
➞Support conditional statements
➞Support group functions
➞Support simple spatial operations
➞Support GML 3.2.1 types directly
Technology used in different projects
➞Spatial ETL
– Safe FME
– Talend Spatial Data Integrator
– GeoKettle
– Humboldt Alignment Editor
– …
➞Combinations of open toolsets
– XSLT
– Python
– GDAL
– XQuery (incl. ExPath Geo Module)
➞Observation: No or very few spatial transformations are needed
XQuery - Overview
➞Functional language to query and create XML
➞Official W3C standard aligned with
– XPath for addressing XML
– XSLT as template language
➞Increasing relevance and maturity
– esp. XML Databases (eXist DB, MarkLogic, Oracle, MS, ..)
➞Several Tools
– Execution Environments (Saxon, Zorba, Altova, …)
– Development Support (Eclipse XQDT, XML Spy, Oxygen, Stylus Studio, …)
XQuery - Technically
➞Functional language
– Execution as chain of functions
– Variety of predefined functions
• Standard functions for Strings, Numerics, Paths, …
• Extended functions for fulltext, geo operations, …
• External functions in C, Java, …
➞XML oriented
– XPath based selection and filtering
– Native XML types & Schema aware
– Loops, conditional statements and grouping on XML Collections
– Static & dynamic creation of XML elements
XQuery for Spatial Data Harmonization
➞Pros
– Open standard; several implementations and tools
– Optimized for XML Processing (Schema aware; collections/sequences and XML types)
– Modularization and external libraries (e.g. ExPath Geo)
➞Cons
– “Programming” language with specific syntax (steep learning curve)
– No direct geospatial support
Main “Tip & Trick”: use XQuery
➞Trick: for data/schema reduction
– Simplifying data analysis and understanding of structures
– Speed up processing
➞Tip: for data/schema transformation
– Encapsulate repetitive tasks in functions
– Build modules for common structures
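The advice to encapsulate repetitive tasks in functions applies regardless of the tool. As an illustration only, here is a minimal Python sketch (standard library) of such a helper: one function that creates a GML 3.2.1 point, so the element and attribute names are written once. The function name, the id, the coordinates and the CRS identifier are assumptions made for this example; the coordinate order must match the axis order of the CRS, which the caller has to ensure.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # GML 3.2.1 namespace
ET.register_namespace("gml", GML_NS)


def gml_point(gml_id, coords, srs_name):
    """Build a <gml:Point> with a single <gml:pos> child.

    coords is a sequence of numbers in the axis order of srs_name.
    """
    point = ET.Element(
        f"{{{GML_NS}}}Point",
        {f"{{{GML_NS}}}id": gml_id, "srsName": srs_name},
    )
    pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
    pos.text = " ".join(str(c) for c in coords)
    return point


# Illustrative values (hypothetical id and coordinates).
point = gml_point("p1", (405000.0, 5750000.0), "urn:ogc:def:crs:EPSG::25832")
print(ET.tostring(point, encoding="unicode"))
```

Helpers for curves and surfaces could follow the same pattern and be collected in one module per common structure.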
Example 1: Data reduction by filtering of Featuretypes

(: Prolog declarations (namespaces for gid, wfs and gml, schema import) are not shown on the slide :)

(: 1. Define and validate an external file as data source :)
declare variable $input as element() :=
  validate strict { doc('file:///C:/data/dataset.xml')/gid:AX_Bestandsdatenauszug };

(: 2. Select XML elements by using an XPath expression :)
let $featureSet := $input/enthaelt/wfs:FeatureCollection/gml:featureMember
  [name(child::*) = 'AX_Strasse' or
   name(child::*) = 'AX_Strassenachse' or
   name(child::*) = 'AX_Fahrbahnachse']
(: 3. Iterate over the result and return the data :)
for $feature in $featureSet
return $feature
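For readers who work with the Python toolset mentioned earlier, the same feature-type filter can be sketched with the standard library. This is an illustrative analogue, not the authors' implementation: it performs no schema validation, it compares local names only, and the sample document, its wrapper element and the "ax" namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # GML 3.2.1 namespace
WANTED = {"AX_Strasse", "AX_Strassenachse", "AX_Fahrbahnachse"}


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def select_feature_members(root, wanted=WANTED):
    """Return the gml:featureMember elements whose feature type is wanted."""
    members = root.iter(f"{{{GML_NS}}}featureMember")
    return [m for m in members if len(m) and local_name(m[0].tag) in wanted]


# Hypothetical sample data; a real dataset would be read with ET.parse(path).
sample = f"""
<data xmlns:gml="{GML_NS}" xmlns:ax="http://example.org/aaa">
  <gml:featureMember><ax:AX_Strasse gml:id="s1"/></gml:featureMember>
  <gml:featureMember><ax:AX_Gebaeude gml:id="g1"/></gml:featureMember>
  <gml:featureMember><ax:AX_Fahrbahnachse gml:id="f1"/></gml:featureMember>
</data>
""".strip()
root = ET.fromstring(sample)
selected = select_feature_members(root)
names = [local_name(m[0].tag) for m in selected]
print(names)
```

For very large files, ET.iterparse would avoid loading the whole document into memory, at the cost of slightly more code.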
Example 2: Reclassification of values

(: 1. Select the criterion :)
switch ($strasse/aaa:widmung)
  (: 2. Decode the values and create the corresponding attributes :)
  case "1301" return attribute {"abc:roadType"} {"highway"}
  case "1303" return attribute {"abc:roadType"} {"road"}
  (: 3. Return the default attribute value :)
  default return attribute {"abc:roadType"} {"unknown"}
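The reclassification can also be expressed as a lookup table, which keeps the mapping separate from the code. Below is a minimal Python sketch (standard library) of that variant. The code values and target labels are the ones from the example above; the two namespace URIs, the function name and the sample elements are hypothetical.

```python
import xml.etree.ElementTree as ET

AAA_NS = "http://example.org/aaa"  # hypothetical URI for the "aaa" prefix
ABC_NS = "http://example.org/abc"  # hypothetical URI for the "abc" prefix

# Mapping from the source code list to the target classification.
ROAD_TYPES = {"1301": "highway", "1303": "road"}


def reclassify(strasse):
    """Set abc:roadType on the element, derived from its aaa:widmung child."""
    node = strasse.find(f"{{{AAA_NS}}}widmung")
    code = node.text.strip() if node is not None and node.text else None
    # Unknown or missing codes fall back to the default value.
    strasse.set(f"{{{ABC_NS}}}roadType", ROAD_TYPES.get(code, "unknown"))
    return strasse


def make_strasse(code):
    """Build a small sample element (illustrative data only)."""
    xml = (
        f'<aaa:AX_Strasse xmlns:aaa="{AAA_NS}">'
        f"<aaa:widmung>{code}</aaa:widmung></aaa:AX_Strasse>"
    )
    return ET.fromstring(xml)


highway = reclassify(make_strasse("1301"))
other = reclassify(make_strasse("9999"))
print(highway.get(f"{{{ABC_NS}}}roadType"), other.get(f"{{{ABC_NS}}}roadType"))
```

Keeping the table as data makes it easy to load the mapping from a file when the code lists grow.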
Tips: Get started with XQuery
➞Info:
– XQuery Spec: http://www.w3.org/TR/xquery-30/
– XQuery Tutorial: http://www.w3schools.com/xquery/
➞Environment:
– Zorba (XQuery Processor): http://www.zorba.io/
– Eclipse XQDT: http://wiki.eclipse.org/XQDT/
XQuery Tools - XQDT
Tip: Use the older Eclipse “Indigo” release (Eclipse “Juno” and “Kepler” fail on big XML Schemas)
Conclusions
➞XQuery is an excellent, easy-to-use method to specify and execute transformations for XML/GML data
➞Further possibilities for simplification, dependent on input and output schemas
Next Steps - XQuery Modules
GML Base Modules:
• GML321_basicTypes.xq
• GML321_geometryAggregates.xq
• GML321_geometryBasic0d1d.xq
• GML321_geometryBasic2d.xq
• GML321_geometryPrimitives.xq
• GML321_gmlBase.xq
• GML321_feature.xq
• XLink10.xq (XLink Schema)
Helper:
• Tools (Type extension, UUID, …)
• External Calls
Simplifying Modules:
• GML321_Simple.xq (Creation of Feature, Point, Curve, Surface)
• GML321_GeometryTools.xq (Harmonising Geometry, Simple Transf.)
Schema Modules:
• MySchema.xq (Creation of CoreElements)
Conclusions
➞We are very much interested in exchanging experiences on the usage of XQuery for the transformation of spatial data!
➞Thank you for your attention!