Importing and Using Diverse Schemas and Data with the TopBraid Suite
Jan 19, 2015
© Copyright 2007-2009 TopQuadrant Inc. Slide 2
The enterprise data integration problem
XML
RDB
Spreadsheet
How does government spending in certain sectors relate to my company’s earnings?
How does historic spending relate to the current figures?
Give me a report about all of my customers across the whole organization
Merging data with RDF
“Rote” syntactic transformation into RDF (the mathematically simplest way to denote linked data)
XML
RDB
Spreadsheet
Once in RDF:
• Merges happen as part of the infrastructure
• Concepts can be mapped to one another
For example, to say that one notion of “Customer” is more general than another, without needing to reference the syntactic type of the source!
The mapping is also captured in RDF
Data transformations (on merged data) make no reference to the syntax of the source; they can be written in a single language (SPARQL)
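As a rough illustration of why merging comes for free once everything is a triple, the sketch below models two rote-imported graphs as plain Python sets of (subject, predicate, object) tuples. This is not TopBraid code: the ex: names and both sample graphs are invented for illustration, and only rdf:type and rdfs:subClassOf are standard vocabulary.

```python
# Illustrative only: RDF graphs modeled as sets of (subject, predicate, object) tuples.
# All ex:... names below are hypothetical; they do not come from the slides.

crm_graph = {
    ("ex:crm/Customer", "rdf:type", "rdfs:Class"),
    ("ex:crm/cust42", "rdf:type", "ex:crm/Customer"),
}
billing_graph = {
    ("ex:billing/Customer", "rdf:type", "rdfs:Class"),
    ("ex:billing/acct7", "rdf:type", "ex:billing/Customer"),
}

# Merging two graphs is just set union; no per-format merge logic is needed.
merged = crm_graph | billing_graph

# A mapping is itself a triple: one notion of Customer is more general than the other.
merged.add(("ex:billing/Customer", "rdfs:subClassOf", "ex:crm/Customer"))


def instances_of(graph, cls):
    """Return instances of cls, following rdfs:subClassOf links transitively."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == "rdfs:subClassOf" and o == current and s not in classes:
                classes.add(s)
                frontier.append(s)
    return {s for s, p, o in graph if p == "rdf:type" and o in classes}


all_customers = instances_of(merged, "ex:crm/Customer")
```

Asking for instances of the more general class returns customers from both sources, and nothing in the lookup refers to where either graph originally came from.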
Semantic Mappings
Benefits of separating Syntactic details from Semantic mapping - 1
Rote import provides a name (URI) for every entity in every data source, so that they can be referenced
• It is easier to discuss how "my" use of the word "Customer" relates to "your" use than to agree who gets to define "Customer"
By translating into a simple, common language, all mappings and transforms can be of the same form (i.e., SPARQL)
• In contrast to several transforms for each pair of languages
Benefits of separating Syntactic details from Semantic mapping - 2
Each new kind of source only needs one new importer
• In contrast to needing one for each old syntax
Import modules don’t need to implement the merge functionality
• The underlying data representation supports merge as a primitive operation
• No need to worry about a number of information types and when they can be merged; there is just one.
TopBraid Suite’s Implementation - 1
Built-in default converters transform information from a variety of sources into RDF (rote import):
• Arbitrary XML, XML Schema, spreadsheets, databases, etc.
• Depending on the complexity, conversion logic is either encoded in an ontology or in a Java module
• If round-tripping is supported, all information from the original is preserved (sometimes in annotations)
TopBraid Suite’s Implementation - 2
Once in RDF, SPARQL is used to transform and map as needed
• Imported RDF “as-is” may not be what a particular application requires
• Transformation steps are represented using mapping ontologies and/or SPIN (http://www.topquadrant.com/spin/) rules/templates
The entire transformation process is saved as a SPARQLMotion (http://www.topquadrant.com/sparqlmotion/) script for repeated executions
Semantic XML
Built-in Converter Example: Semantic XML
Select an XML file and open it in TopBraid Composer (you may need to right-click on the file and select Open With > TopBraid)
• Each element name becomes a class
• Each attribute becomes a datatype property
• Nesting is mapped into a dedicated object property (composite:child)
(We are using a simple file describing people and jobs.)
Built-in Converter Example: Semantic XML
Converted to RDF
Built-in Converter Example: Semantic XML
composite:child property captures the hierarchical nesting in the XML document
Each element becomes a class with instances for each occurrence of the element in the document
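The Semantic XML rules can be approximated in a few lines of Python. The sketch below is not the TopBraid converter: the sample document, the `_:n1`-style node identifiers, and the function name `xml_to_triples` are all assumptions made for illustration, and element text content is ignored for brevity.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical sample document; the slides only say the file describes people and jobs.
SAMPLE_XML = """
<people>
  <person name="Ann">
    <job title="Engineer"/>
  </person>
</people>
"""


def xml_to_triples(xml_text):
    """Rote conversion: element -> instance of a class named after the element,
    attribute -> datatype property, nesting -> composite:child.
    Node identifiers are made up here."""
    triples = []
    ids = count(1)

    def visit(element):
        node = f"_:n{next(ids)}"
        triples.append((node, "rdf:type", element.tag))
        for attr, value in element.attrib.items():
            triples.append((node, attr, value))
        for child in element:
            child_node = visit(child)
            triples.append((node, "composite:child", child_node))
        return node

    visit(ET.fromstring(xml_text))
    return triples


xml_triples = xml_to_triples(SAMPLE_XML)
```

Each occurrence of an element gets its own node, so a document with many `person` elements would yield many instances of the `person` class, all linked to their parent through composite:child.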
Semantic Tables
Built-in Converter Example: Semantic Tables *
Select an Excel file and simply open it in TopBraid Composer
• Each sheet becomes a class
• Columns become datatype properties
• Rows become instances
• Cells will be converted into triples, where the subject is the row instance, the predicate is the column property, and the object is a literal with the value of the cell
*Assumes that the spreadsheet is structured as a table. Not all spreadsheets are designed this way. To support different design patterns TopBraid Suite offers more than one spreadsheet importer.
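The same table-to-triples rules can be sketched in Python for a sheet exported as CSV. This is an illustration under stated assumptions, not the Semantic Tables importer: the sheet name, the sample rows, the row naming scheme, and the choice to skip empty cells are all invented here.

```python
import csv
import io

# Hypothetical sheet contents, exported as CSV; not taken from the slides.
SHEET_NAME = "Companies"
SHEET_CSV = "name,sector\nAcme,Manufacturing\nGlobex,Energy\n"


def table_to_triples(sheet_name, csv_text):
    """Sheet -> class, column -> property, row -> instance, cell -> triple."""
    triples = [(sheet_name, "rdf:type", "rdfs:Class")]
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        instance = f"{sheet_name}-row{row_number}"  # made-up naming scheme
        triples.append((instance, "rdf:type", sheet_name))
        for column, cell in row.items():
            if cell:  # assumption: empty cells produce no triple
                triples.append((instance, column, cell))
    return triples


table_triples = table_to_triples(SHEET_NAME, SHEET_CSV)
```

The first row of the table supplies the property names, which is exactly the "structured as a table" assumption noted above.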
Built-in Converter Example: Semantic Tables
Converted to RDF
Other default importers
Relational Databases
• Uses a simple mapping of tables to classes, and of columns and foreign keys to properties
XML profiles
• Extends Semantic XML with pre-built profiles such as one for XHTML
XML Schema
• Complex logic provided in a specialized Java module
UML, RDFa, RSS, e-mail, additional spreadsheet importers, …
Merging Data
Next Steps
RDF converted from the XML file and RDF from the spreadsheet can now be merged:
• Open one, switch to the Import tab, and drag and drop the second one, or
• Create a mapping/aggregation file and import both the XML and the spreadsheet
Creating connections
Conceptually, the XML and Excel examples are linked:
• XML lists different people, including their jobs and the organizations they work for
• Excel has company information organized by industry sectors
But there are no connections in the raw data
SPARQL queries (CONSTRUCT), including query templates (to generalize query patterns), can be used to establish connections
• Mappings are recorded in the mapping ontologies and scripts for repeat execution
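To make the idea of "establishing connections" concrete, the sketch below mimics what a CONSTRUCT rule does, using plain Python over a set of triples instead of SPARQL. All data, the property names `organization` and `ex:worksFor`, and the function name are hypothetical; a real mapping would be written as a SPARQL CONSTRUCT query against the imported graphs.

```python
# Hypothetical merged data: a person from the XML import and a company row
# from the spreadsheet import. Nothing links them yet except a shared name.
merged = {
    ("_:person1", "name", "Ann"),
    ("_:person1", "organization", "Acme"),
    ("Companies-row1", "rdf:type", "Companies"),
    ("Companies-row1", "name", "Acme"),
    ("Companies-row1", "sector", "Manufacturing"),
}


def construct_employer_links(graph):
    """Mimic a CONSTRUCT rule: where a person's organization literal equals a
    company row's name, emit a new (person, ex:worksFor, company) triple."""
    companies_by_name = {
        o: s
        for s, p, o in graph
        if p == "name" and (s, "rdf:type", "Companies") in graph
    }
    return {
        (s, "ex:worksFor", companies_by_name[o])
        for s, p, o in graph
        if p == "organization" and o in companies_by_name
    }


links = construct_employer_links(merged)
connected = merged | links  # the new triples are merged like any others
```

Matching on a literal name is the simplest possible join; it is used here only to show that the connection is new data derived from a pattern, which can be recomputed whenever the sources are re-imported.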
Scripting Data Transformations
Step by Step Example
Extract and convert data from a real XML file
Publish the result as a web page
Combine SPARQLMotion, Web Service, Semantic XML, and XSD to accomplish the result.
Step-by-step instructions are provided; requires TopBraid Composer Maestro Edition
Also requires some familiarity with SPARQL and SPARQLMotion
• The recommended first step is to go through the SPARQLMotion tutorial and examples at: http://www.topquadrant.com/sparqlmotion/
Open XML file
We will use an XML file from the US Federal Government about the FEA. Download from:
http://www.whitehouse.gov/omb/assets/fea_docs/FEA_XML_Doc_Rev_2_3.xml
Open it with Semantic XML
Explore converted RDF
There are 42 BusinessLines in this XML file. Each one has a Name, Definition, and SubFunction detail. Click on one and explore it in the graph view
Extract some information using SPARQL
Looking at the graph, write a SPARQL query that will determine the name of the business line and the BusinessLineID
Check that the business line in the graph appears in the solution
Extract correlated information with SPARQL
Extend your query to find the corresponding BusinessLineDefinitionText.
Display just the names and descriptions of the business lines.
Save this query in a safe place – we’ll use it later
Shortcut: SPARQL by EXAMPLE - 1
Complete queries (or, for more complex queries, a starting point) can be generated directly from a graph
We call this generation capability “SPARQL by Example”; it saves a lot of tedious work and helps to prevent mistakes
To get started, display the graph pattern for a single business line
Click to “pin down” all the classes in the diagram; the rest will be treated as variables
Click on the star icon to generate a query
Run it in the usual way
Looks good, but we are not getting the text fields
Shortcut: SPARQL by EXAMPLE - 2
Click to “pin down” text fields so that they are included in the query
Modify the query by hand to turn the name and description into variables and to include only these variables in the SELECT list
We get one result
But we need names and descriptions for all business lines, not just the one we pinned down!
Encode the process in SPARQLMotion
Create a new SPARQLMotion file. Click “Yes”, it will declare web services.
Create a new SPARQLMotion script. Start with a CreateSpreadsheet; call it findLOB
Encode the process in SPARQLMotion
Bring the XML file into the SPARQLMotion script by dragging it onto the canvas.
This automatically makes a SXML import module.
Encode the process in SPARQLMotion
Connect these two modules together
Encode the process in SPARQLMotion
Add your query to findLOB (double-click to edit)
Encode the process in SPARQLMotion
Add a “ModifyPrefixes” module to specify the namespace for the query you just pasted.
Connect it with next to the findLOB module
Copy and paste the base URI of the XML file, with a space before and a # after
Test the script
Run the whole script with the debug button; select the last step
Results appear in the Console tab
Results are in tab-delimited form
Exposing Results with Web Services
Serve as a web page
Add a Return Text module to the script. Call it showLOB. Make it the last module, right after findLOB
View as a web page
Point your browser to: http://localhost:8083/tbl/actions?action=sparqlmotion&id=showLOB
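The address has a fixed shape: an `action` parameter set to `sparqlmotion` and an `id` parameter naming the script's final module. A small Python helper can assemble it. The host, port, and path are copied from the slide; the helper name `script_url` is invented here, and the sketch only builds the string rather than contacting a server.

```python
from urllib.parse import urlencode

# Host, port, and path as shown on the slide; adjust if your server differs.
BASE_URL = "http://localhost:8083/tbl/actions"


def script_url(module_id):
    """Build the URL that invokes a SPARQLMotion script by its final module's name."""
    query = urlencode({"action": "sparqlmotion", "id": module_id})
    return f"{BASE_URL}?{query}"


show_lob_url = script_url("showLOB")
```

Using `urlencode` rather than string concatenation keeps the address valid if a module name ever contains characters that need escaping.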
Extend the script to create an HTML file
These first two modules can be re-used from the initial script
xhtml.owl can be found in your TBC folder; just drag and drop it
ConvertRDFtoXML: no configuration needed
ReturnXML: Mimetype text/html
ApplyConstruct: see Copy and Paste file for details
Viewing in a Web Browser
http://localhost:8083/tbl/actions?action=sparqlmotion&id=tabulateLOB
To Learn More
Attend one of TopQuadrant’s Semantic Web Technology Trainings:
• Semantic Web Technology & Introduction to TopBraid Suite
• TopBraid Suite Advanced Product Training Series
For scheduled dates, locations and other information, visit: http://www.topquadrant.com/training/training_overview.html
Private, on-site trainings are also available
Call (703) 299-9330 or write to [email protected].