Importing and Using diverse Schemas and Data with the TopBraid Suite
Data Transformation using Semantic Web Standards

Jan 19, 2015

Irene Polikoff

This presentation explains the benefits of using Semantic Web standards for integration and transformation of data. Step-by-step examples are included.
Transcript
Page 1: Data Transformation using Semantic Web Standards

Importing and Using diverse Schemas and Data with the TopBraid Suite

Page 2: Data Transformation using Semantic Web Standards

© Copyright 2007-2009 TopQuadrant Inc. Slide 2

The enterprise data integration problem

XML

RDB

Spreadsheet

How does government spending in certain sectors relate to my company’s earnings?

How does the historic spending relate to the current figures?

Give me a report about all of my customers across the whole organization

Page 3: Data Transformation using Semantic Web Standards

Merging data with RDF

“Rote” syntactic transformation into RDF (the mathematically simplest way to denote linked data)

XML

RDB

Spreadsheet

Once in RDF:

Merges happen as part of the infrastructure

Concepts can be mapped to one another

For example, to say that one notion of “Customer” is more general than another

Without needing to reference the syntactic type of the source!

Mapping is also captured in RDF

Data transformations (on merged data) make no reference to the syntax of the source; they can be written in a single language (SPARQL)
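A minimal sketch of why merging comes "for free" once data is in RDF, assuming a graph is modeled as a plain Python set of (subject, predicate, object) tuples. All identifiers (crm:Customer, sales:Customer, ex:person1, ex:row7) are hypothetical placeholders, not taken from the presentation. The mapping between the two notions of "Customer" is itself just one more triple.

```python
# Hypothetical triples from two rote imports; names are illustrative only.
from_xml = {
    ("ex:person1", "rdf:type", "crm:Customer"),
    ("ex:person1", "crm:name", "Ada"),
}
from_sheet = {
    ("ex:row7", "rdf:type", "sales:Customer"),
    ("ex:row7", "sales:region", "East"),
}

# The mapping is captured as data: sales:Customer is the narrower notion.
mapping = {("sales:Customer", "rdfs:subClassOf", "crm:Customer")}

# Merging graphs is set union; no source-specific merge logic is needed.
merged = from_xml | from_sheet | mapping


def instances_of(graph, cls):
    """Return subjects typed as cls or as a direct subclass of cls."""
    classes = {cls} | {
        s for (s, p, o) in graph if p == "rdfs:subClassOf" and o == cls
    }
    return sorted(s for (s, p, o) in graph if p == "rdf:type" and o in classes)
```

Calling instances_of(merged, "crm:Customer") then returns customers from both sources without any reference to whether they came from XML or a spreadsheet.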

Page 4: Data Transformation using Semantic Web Standards

Semantic Mappings

Page 5: Data Transformation using Semantic Web Standards

Benefits of separating Syntactic details from Semantic mapping - 1

Rote import provides a name (URI) for every entity in every data source, so that they can be referenced

It is easier to discuss how "my" use of the word "Customer" relates to "your" use than to agree who gets to define "Customer"

By translating into a simple, common language, all mappings and transforms can be of the same form (i.e., SPARQL)

In contrast to several transforms for each pair of languages

Page 6: Data Transformation using Semantic Web Standards

Benefits of separating Syntactic details from Semantic mapping - 2

Each new kind of source only needs one new importer

In contrast to needing one for each old syntax

Import modules don’t need to implement the merge functionality

The underlying data representation supports merge as a primitive operation

No need to worry about a number of information types and when they can be merged; there is just one.
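The scaling argument above can be made concrete with a little arithmetic. This is an illustrative sketch only; the function name and the choice to count one translator per direction are assumptions, not something the slides specify.

```python
def translators_needed(n_formats: int, via_common_model: bool) -> int:
    """Count converters needed to make n_formats interoperable.

    via_common_model=True: one importer per format into the shared model (RDF).
    via_common_model=False: one translator per ordered pair of formats.
    """
    if via_common_model:
        return n_formats
    return n_formats * (n_formats - 1)


# Cost of adding a fifth format to four existing ones, under each approach.
extra_with_rdf = translators_needed(5, True) - translators_needed(4, True)
extra_pairwise = translators_needed(5, False) - translators_needed(4, False)
```

With the common model, the fifth format costs one new importer; with pairwise translation it costs one translator to and from each of the four old syntaxes.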

Page 7: Data Transformation using Semantic Web Standards

TopBraid Suite’s Implementation - 1

Built-in default converters transform information from a variety of sources into RDF (rote import):

Arbitrary XML, XML Schema, Spreadsheets, Databases, etc.

Depending on the complexity, conversion logic is either encoded in an ontology or in a Java module

If round-tripping is supported, all information from the original is preserved (sometimes in annotations)

Page 8: Data Transformation using Semantic Web Standards

TopBraid Suite’s Implementation - 2

Once in RDF, SPARQL is used to transform and map as needed

Imported RDF “as-is” may not be what a particular application requires

Transformation steps are represented using mapping ontologies and/or SPIN (http://www.topquadrant.com/spin/) rules/templates

The entire transformation process is saved as a SPARQLMotion (http://www.topquadrant.com/sparqlmotion/) script for repeated executions

Page 9: Data Transformation using Semantic Web Standards

Semantic XML

Page 10: Data Transformation using Semantic Web Standards

Built-in Converter Example: Semantic XML

Select an XML file and open it in TopBraid Composer (you may need to right-click on a file and select Open With > TopBraid)

Each element name becomes a class

Each attribute becomes a datatype property

Nesting is mapped into a dedicated object property (composite:child)

(we are using a simple file describing people and jobs)
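The three mapping rules above can be sketched in a few lines of Python. This is not TopBraid's Semantic XML converter, only an illustration of the same idea using the standard library; the node identifiers (_:n1, _:n2, ...) and the sample document are made up, and text content, namespaces, and sibling order are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_triples(xml_text):
    """Rote conversion: element -> typed node, attribute -> property,
    nesting -> composite:child link from parent to child."""
    ids = count(1)
    triples = []

    def visit(element, parent_id):
        node_id = f"_:n{next(ids)}"
        # Each element occurrence is an instance of a class named after the tag.
        triples.append((node_id, "rdf:type", element.tag))
        # Each attribute becomes a datatype property with a literal value.
        for name, value in element.attrib.items():
            triples.append((node_id, name, value))
        # Nesting is recorded with a dedicated object property.
        if parent_id is not None:
            triples.append((parent_id, "composite:child", node_id))
        for child in element:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)
    return triples


# Hypothetical sample in the spirit of the "people and jobs" file.
sample = '<people><person name="Ada"><job title="Engineer"/></person></people>'
triples = xml_to_triples(sample)
```

Because every element occurrence gets its own node, nothing about the original document structure is lost; the hierarchy can be walked again by following composite:child.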

Page 11: Data Transformation using Semantic Web Standards

Built-in Converter Example: Semantic XML

Converted to RDF

Page 12: Data Transformation using Semantic Web Standards

Built-in Converter Example: Semantic XML

composite:child property captures the hierarchical nesting in the XML document

Each element becomes a class with instances for each occurrence of the element in the document

Page 13: Data Transformation using Semantic Web Standards

Semantic Tables

Page 14: Data Transformation using Semantic Web Standards

Built-in Converter Example: Semantic Tables *

Select an Excel file and simply open it in TopBraid Composer

Each sheet becomes a class

Columns become datatype properties

Rows become instances

Cells will be converted into triples, where the subject is the row instance, the predicate is the column property, and the object is a literal with the value of the cell

*Assumes that the spreadsheet is structured as a table. Not all spreadsheets are designed this way. To support different design patterns, TopBraid Suite offers more than one spreadsheet importer.
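A rough sketch of the sheet/column/row/cell mapping, using CSV text as a stand-in for an Excel sheet so that only the standard library is needed. The sheet name, column names, and the row-naming scheme are hypothetical, and skipping empty cells is a choice made here, not a documented behavior of the TopBraid importer.

```python
import csv
import io


def table_to_triples(sheet_name, csv_text):
    """Sheet -> class, column -> property, row -> instance, cell -> triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{sheet_name}_row{row_number}"
        triples.append((subject, "rdf:type", sheet_name))
        for column, cell in row.items():
            if cell:  # assumption: empty cells produce no triple
                triples.append((subject, column, cell))
    return triples


# Hypothetical two-row sheet; the second row has an empty "sector" cell.
sample = "company,sector\nAcme,Energy\nGlobex,\n"
triples = table_to_triples("Companies", sample)
```

This only works when the first row holds column headers and each later row is one record, which is the table-shaped assumption the footnote above describes.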

Page 15: Data Transformation using Semantic Web Standards

Built-in Converter Example: Semantic Tables

Converted to RDF

Page 16: Data Transformation using Semantic Web Standards

Other default importers

Relational Databases

Uses a simple mapping of tables to classes, and of columns and foreign keys to properties

XML profiles

Extends Semantic XML with pre-built profiles such as one for XHTML

XML Schema

Complex logic provided in a specialized Java module

UML, RDFa, RSS, e-Mail, additional spreadsheet importers, …
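For the relational case, the "tables to classes, columns and foreign keys to properties" idea can be illustrated against an in-memory SQLite database. This is a sketch under several assumptions, not the TopBraid importer: the schema is invented, the first column of each table is assumed to be its primary key, and instance names are formed as table_key.

```python
import sqlite3


def db_to_triples(conn):
    """Table -> class, row -> instance, column -> datatype property,
    foreign-key column -> object property pointing at the referenced row."""
    triples = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        # PRAGMA rows: (id, seq, referenced_table, from_column, to_column, ...)
        fks = {fk[3]: fk[2]
               for fk in conn.execute(f"PRAGMA foreign_key_list({table})")}
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            subject = f"{table}_{row[0]}"  # assumes first column is the key
            triples.append((subject, "rdf:type", table))
            for column, value in zip(columns, row):
                if value is None:
                    continue
                if column in fks:
                    triples.append((subject, column, f"{fks[column]}_{value}"))
                else:
                    triples.append((subject, column, value))
    return triples


# Hypothetical schema and data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                     employer INTEGER REFERENCES company(id));
INSERT INTO company VALUES (1, 'Acme');
INSERT INTO person VALUES (1, 'Ada', 1);
""")
triples = db_to_triples(conn)
```

The foreign key is the only place where the result differs from the spreadsheet case: its value becomes a link to another instance rather than a literal.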

Page 17: Data Transformation using Semantic Web Standards

Merging Data

Page 18: Data Transformation using Semantic Web Standards

Next Steps

RDF converted from the XML file and RDF from the spreadsheet can now be merged:

Open one, switch to the Import tab, and drag and drop the second one, or

Create a mapping/aggregation file and import both the XML and the spreadsheet

Creating connections

Conceptually, the XML and Excel examples are linked:

• XML lists different people including their jobs and the organizations they work for

• Excel has company information organized by industry sectors

But there are no connections in the raw data

SPARQL queries (CONSTRUCT), including query templates (to generalize query patterns), can be used to establish connections

• Mappings are recorded in the mapping ontologies and scripts for repeat execution
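To show what "establishing connections" with a CONSTRUCT query amounts to, here is a small Python emulation over a set of triples. The property names (xml:organization, sheet:company, ex:worksFor) and the sample values are assumptions made for this sketch; the actual names depend on the imported files. In SPARQL the same rule would read roughly: CONSTRUCT { ?p ex:worksFor ?c } WHERE { ?p xml:organization ?n . ?c sheet:company ?n }.

```python
def construct_works_for(graph):
    """Link each person node to the spreadsheet row whose company name
    equals the organization name recorded for that person."""
    # Index spreadsheet rows by company name (the shared join value).
    rows_by_name = {}
    for s, p, o in graph:
        if p == "sheet:company":
            rows_by_name.setdefault(o, []).append(s)
    # Emit one new triple per person/company match.
    return {
        (s, "ex:worksFor", row)
        for s, p, o in graph
        if p == "xml:organization"
        for row in rows_by_name.get(o, [])
    }


# Hypothetical merged graph: two people from XML, one company row from Excel.
graph = {
    ("_:p1", "xml:organization", "Acme"),
    ("_:p2", "xml:organization", "Initech"),
    ("_:c1", "sheet:company", "Acme"),
    ("_:c1", "sheet:sector", "Energy"),
}
links = construct_works_for(graph)
merged = graph | links
```

The constructed triples are added to the merged graph like any other data, so later queries can go from a person to an industry sector in one path.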

Page 19: Data Transformation using Semantic Web Standards

Scripting Data Transformations

Page 20: Data Transformation using Semantic Web Standards

Step by Step Example

Extract and convert data from a real XML file

Publish the result as a web page

Combine SPARQLMotion, Web Service, Semantic XML, and XSD to accomplish the result.

Step-by-step instructions are provided; requires TopBraid Composer Maestro Edition

Also requires some familiarity with SPARQL and SPARQLMotion

Recommended first step is to go through the SPARQLMotion tutorial and examples at: http://www.topquadrant.com/sparqlmotion/

Page 21: Data Transformation using Semantic Web Standards

Open XML file

We will use an XML file from the US Federal Government about the FEA. Download from:

http://www.whitehouse.gov/omb/assets/fea_docs/FEA_XML_Doc_Rev_2_3.xml

Open it with Semantic XML

Page 22: Data Transformation using Semantic Web Standards

Explore converted RDF

There are 42 BusinessLines in this XML file. Each one has a Name, Definition, and SubFunction detail. Click on one and explore it in the graph view

Page 23: Data Transformation using Semantic Web Standards

Extract some information using SPARQL

Looking at the graph, write a SPARQL query that will determine the name of the business line and the BusinessLineID

Check that the business line in the graph appears in the solution

Page 24: Data Transformation using Semantic Web Standards

Extract correlated information with SPARQL

Extend your query to find the corresponding BusinessLineDefinitionText.

Display just the names and descriptions of the business lines.

Save this query in a safe place – we’ll use it later
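The query described on these two slides is a join: several triple patterns share the same business-line variable, so the name and the definition text stay correlated. The sketch below emulates that with a tiny pattern matcher in Python. The graph shape is an assumption (the real converted FEA file may nest these values under child nodes), BusinessLineName is a guessed property name, and the sample values are placeholders rather than real FEA data.

```python
def match(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern.
    Terms starting with '?' are variables; everything else must match exactly."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(graph, rest, candidate)


# Placeholder data, not taken from the FEA file.
graph = [
    ("_:b1", "rdf:type", "BusinessLine"),
    ("_:b1", "BusinessLineID", "001"),
    ("_:b1", "BusinessLineName", "Example Line A"),
    ("_:b1", "BusinessLineDefinitionText", "Placeholder definition A"),
    ("_:b2", "rdf:type", "BusinessLine"),
    ("_:b2", "BusinessLineID", "002"),
    ("_:b2", "BusinessLineName", "Example Line B"),
    ("_:b2", "BusinessLineDefinitionText", "Placeholder definition B"),
]

# Equivalent in spirit to SELECT ?name ?definition WHERE { ... }.
query = [
    ("?line", "rdf:type", "BusinessLine"),
    ("?line", "BusinessLineName", "?name"),
    ("?line", "BusinessLineDefinitionText", "?definition"),
]
rows = sorted((b["?name"], b["?definition"]) for b in match(graph, query))
```

Reusing ?line in all three patterns is what keeps each name paired with its own definition; projecting only ?name and ?definition corresponds to listing just those variables in the SELECT clause.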

Page 25: Data Transformation using Semantic Web Standards

Shortcut: SPARQL by EXAMPLE - 1

Complete queries (or, for more complex queries, a starting point) can be generated directly from a graph

We call this generation capability “SPARQL by Example”; it saves a lot of tedious work and helps to prevent mistakes

To get started, display the graph pattern for a single business line

Click to “pin down” all the classes in the diagram; the rest will be treated as variables

Click on the star icon to generate a query

Run it in the usual way

Looks good, but we are not getting the text fields

Page 26: Data Transformation using Semantic Web Standards

Shortcut: SPARQL by EXAMPLE - 2

Click to “pin down” text fields so that they are included in the query

Modify the query by hand to turn the name and description into variables and to include only these variables in the SELECT list

We get one result

But we need names and descriptions for all business lines, not just the one we pinned down!

Page 27: Data Transformation using Semantic Web Standards

Encode the process in SPARQLMotion

Create a new SPARQLMotion file. Click “Yes”, it will declare web services.

Create a new SPARQLMotion script. Start with a CreateSpreadsheet; call it findLOB

Page 28: Data Transformation using Semantic Web Standards

Encode the process in SPARQLMotion

Bring the XML file into the SPARQLMotion script by dragging it onto the canvas.

This automatically makes a SXML import module.

Page 29: Data Transformation using Semantic Web Standards

Encode the process in SPARQLMotion

Connect these two modules together

Page 30: Data Transformation using Semantic Web Standards

Encode the process in SPARQLMotion

Add your query to findLOB (double-click to edit)

Page 31: Data Transformation using Semantic Web Standards

Encode the process in SPARQLMotion

Add a “ModifyPrefixes” module to specify the namespace for the query you just pasted.

Connect it via “next” to the findLOB module

Copy and paste the base URI of the XML file, with a space before and a # after
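The reason the trailing # matters on the ModifyPrefixes step can be shown with a small prefix-expansion sketch. It assumes the base URI is being registered as the default (empty) prefix, which is one reading of "a space before"; the base URI below is a made-up placeholder, not the actual base URI of the FEA file.

```python
def expand(qname, prefixes):
    """Expand a prefixed name such as ':BusinessLine' into a full URI."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local


# Hypothetical base URI; substitute the one shown for your imported file.
base_uri = "http://example.org/FEA_XML_Doc"

# With the '#', local names resolve to the same URIs the import produced.
with_hash = {"": base_uri + "#"}
# Without it, the local name is glued directly onto the base URI.
without_hash = {"": base_uri}

good = expand(":BusinessLine", with_hash)
bad = expand(":BusinessLine", without_hash)
```

If the namespace in the script does not match the one used by the imported data exactly, the query still runs but matches nothing, which is an easy mistake to overlook.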

Page 32: Data Transformation using Semantic Web Standards

Test the script

Run the whole script with the debug button; select the last step

Results appear in the Console tab

Results are in tab-delimited form

Page 33: Data Transformation using Semantic Web Standards

Exposing Results with Web Services

Page 34: Data Transformation using Semantic Web Standards

Serve as a web page

Add a Return Text module to the script. Call it showLOB. Make it the last module, right after findLOB

Page 35: Data Transformation using Semantic Web Standards

View as a web page

Point your browser to: http://localhost:8083/tbl/actions?action=sparqlmotion&id=showLOB
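The URL follows a simple pattern: a fixed action parameter plus the name of the script's final module as id. A small helper, sketched here with only the standard library, builds it for any module name; the function name and defaults are illustrative, and the host and port simply mirror the ones shown on the slide.

```python
from urllib.parse import urlencode


def script_url(script_id, host="localhost", port=8083):
    """Build the URL that invokes a SPARQLMotion script by its module id."""
    query = urlencode({"action": "sparqlmotion", "id": script_id})
    return f"http://{host}:{port}/tbl/actions?{query}"


show_url = script_url("showLOB")
```

Fetching that URL (for example with urllib.request.urlopen) only works while TopBraid Composer is running locally with the script registered, so the sketch stops at constructing the address.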

Page 36: Data Transformation using Semantic Web Standards

Extend the script to create an HTML file

These first two modules can be re-used from the initial script

xhtml.owl can be found in your TBC folder, just drag and drop it

ApplyConstruct: see the Copy and Paste file for details

ConvertRDFtoXML: no configuration needed

ReturnXML: Mimetype text/html

Page 37: Data Transformation using Semantic Web Standards

Viewing in a Web Browser

http://localhost:8083/tbl/actions?action=sparqlmotion&id=tabulateLOB

Page 38: Data Transformation using Semantic Web Standards

To Learn More

Attend one of TopQuadrant’s Semantic Web Technology Trainings:

Semantic Web Technology & Introduction to TopBraid Suite

TopBraid Suite Advanced Product Training Series

For scheduled dates, locations and other information, visit: http://www.topquadrant.com/training/training_overview.html

Private, on-site trainings are also available

Call (703) 299-9330 or write to [email protected].