Linked Energy Data Generation
Post on 22-Nov-2014
Linked Energy Data Generation Tutorial
Filip Radulovic, María Poveda Villalón, Raúl García-Castro
{fradulovic,mpoveda,rgarcia}@fi.upm.es
ETSI Informaticos
Universidad Politécnica de Madrid
Campus de Montegancedo s/n
28660 Boadilla del Monte, Madrid, Spain
Twitter: @LD4SC
02.10.2014. Sustainable Places 2014, Nice, France
License
• This work is licensed under the Creative Commons Attribution – Non Commercial – Share Alike License
• You are free:
  • to Share: to copy, distribute and transmit the work
  • to Remix: to adapt the work
• Under the following conditions
  • Attribution: You must attribute the work by inserting
    • “[source http://www.oeg-upm.net/]” at the footer of each reused slide
    • a credits slide stating: “These slides are partially based on “Linked Energy Data Generation” by F. Radulovic, M. Poveda-Villalón, R. García-Castro”
  • Non-commercial
  • Share-Alike
Table of Contents
1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
5. Discussion and Conclusions
Classic Web
CIA WorldFactBook, Wikipedia
Data exposed to the Web via HTML
Slide adapted from “5min Introduction to Linked Data” - Olaf Hartig
Classic Web
• Typical web page markup consists of:
  • Rendering information (e.g., font size and colour)
  • Hyper-links to related content
• Semantic content is accessible to humans but not (easily) to computers…
Classic Web
Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Information from single pages can be found via search engines
CIA WorldFactBook, MovieDB
Classic Web
Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
What about complex queries over multiple pages / data sources?
Show me a picture of the tallest building in the country with the highest CO2 emission rate in 2013
Impossible
What do we actually want?
• Use the Web like a single global database
• Move from a Web of documents to a Web of Data
Slide adapted from Boris Villazón Terrazas and “5min Introduction to Linked Data”- Olaf Hartig
Sources: Wikipedia, CIA WorldFactBook
Image: Shanghai Tower 2013-8-3, CC BY-SA 3.0
Linked Data enables such a Web of Data
Slide adapted from Boris Villazón Terrazas and “5min Introduction to Linked Data”- Olaf Hartig
Global Identifier: URI (Uniform Resource Identifier), which identifies a resource on the Internet.
Data Model: RDF (Resource Description Framework), a standard model for data interchange on the Web.
Access Mechanism: HTTP
Connection: Typed Links
Figure (RDF graph): http://cia.../China and http://...wikipedia.../data/shangaiTower connected by typed links http://.../co2emission, http://.../co2emissionPerYear, http://.../year, http://.../location, http://.../depiction and http://…#sameAs, with the literal values 10000… and 2013.
Sources: Wikipedia, CIA WorldFactBook. Image: Shanghai Tower 2013-8-3, CC BY-SA 3.0
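As a rough illustration of the data model described on this slide, the sketch below writes a few subject-predicate-object statements as plain Python tuples and prints them as N-Triples. Every URI under http://example.org/ is a hypothetical stand-in for the elided URIs in the figure, and the serializer skips literal escaping; a real project would more likely rely on an RDF library.

```python
# Hypothetical namespace: the slide's real URIs are elided, so example.org is used.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
# Objects starting with "http" are treated as URIs, everything else as a literal.
triples = [
    (EX + "China", EX + "co2emissionPerYear", EX + "emission/China/2013"),
    (EX + "emission/China/2013", EX + "year", "2013"),
    (EX + "ShanghaiTower", EX + "location", EX + "China"),
]

def to_ntriples(statements):
    """Serialize triples as N-Triples lines (URIs in <>, literals in quotes)."""
    lines = []
    for s, p, o in statements:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(triples))
```

The typed link (the predicate URI) is what lets a consumer follow the statement from one data source to another.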
The four principles (Tim Berners-Lee, 2006)
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
Semantic Web definition
“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. It is based on the idea of having data on the Web defined and linked such that it can be used for more effective discovery, automation, integration, and reuse across various applications.”
Hendler, J., Berners-Lee, T., and Miller, E. Integrating Applications on the Semantic Web, 2002, http://www.w3.org/2002/07/swint.html
Benefits + Cases of success
• Provide semantics: meaningful data and common understanding
  • Interoperability
• Reasoning power
  • Infer more data
  • Find mistakes in the original data?
• Enrich your data (with what is already out there)
• Search engines are indexing some schemas
  • Increase visibility
• Multilingual information
In this tutorial
“D4.1 Requirements and guidelines for energy data generation” from the READY4SmartCities project, available at http://goo.gl/IWDmYy
Table of Contents
1. Introduction
2. Data preparation
   1. Select data source
   2. Obtain access to data source
   3. Analyse licensing of the data source
   4. Analyse data source
3. Ontology development
4. Data generation
5. Discussion and Conclusions
Select data source
• Selecting the data source that will be transformed into Linked Data
• Steps
  1. To define the requirements
  2. To select one or several data sources
• Alternatives:
  • Data set from your own organization
  • Data sources not owned by your organization (external data sources)
Select data source – LCC example
• Limitation to external data sources (search)
1. Requirements
   • Real-world scenario in the energy domain
   • Available for use
   • Available in machine-processable format (the more structured the data are, the better)
   • Can be linked with generic entities (e.g., location)
2. Leeds City Council – energy consumption (http://data.gov.uk/dataset/council-energy-consumption)
Obtain access to data source
• Data access means
  • technical means to retrieve the data
  • legal rights to use the data
• In some cases, the data source might not be accessible
• Steps
  1. To identify the person to contact
  2. To request the access
  3. To obtain access and to retrieve the data
• Access alternatives: files, programming interface, database, data streams, etc.
Obtain access to data source – LCC example
• Data set already available for download
• Available in a CSV file
Analysing licensing of the data source
• Licenses specify the legal terms under which a data set can be used and exploited
• Steps
  1. To identify the publisher
  2. To find the applicable license
     • Web page, data set metadata, data itself
     • Contact the publisher
  3. To read the license and determine legal terms
• Tips
  • Analysis should be performed upon all available copies of the data
  • Ensure compatible licenses between several data sources
Analyse licensing – LCC example
Analyse data source
• Getting insight into data structure and organization
• Steps
  1. To analyse the characteristics of the data
     • Data values, data ranges, etc.
  2. To obtain the schema of the data
     • Description of concepts and their relationships
• Data format alternatives:
  • Structured data
  • Unstructured data
• Tip: Use a standard modeling language for the data schema (e.g., UML)
Analyse data source – LCC example
• Electricity, gas and oil consumptions as decimal values
• 1-year intervals: 2010/11, 2011/12, 2012/13
• Different types of council sites (mostly buildings)
• Full address provided (street, city, district)
• Correspondence with people from LCC open data
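The first analysis step (data values and data ranges) can be sketched with the Python standard library alone. The CSV excerpt and its column names below are hypothetical and only stand in for the downloaded file; the function reports, per column, how many cells are empty and the numeric range when every filled cell is a number.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded CSV; column names are assumptions.
SAMPLE = """site,district,year,electricity_kwh
Library A,Leeds,2010/11,1200.5
Library A,Leeds,2011/12,1100.0
Sports Hall B,Wakefield,2010/11,
"""

def profile(csv_text):
    """Return, per column, the count of empty cells and (min, max) for numeric columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        filled = [v for v in values if v != ""]
        try:
            numbers = [float(v) for v in filled]
            value_range = (min(numbers), max(numbers))
        except ValueError:
            value_range = None  # not a numeric column (or no filled cells)
        report[column] = {"empty": len(values) - len(filled), "range": value_range}
    return report

report = profile(SAMPLE)
for column, stats in report.items():
    print(column, stats)
```

A profile like this makes gaps and odd values visible before any schema is drawn up.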
Ontology development - Preparation
• RDF – Resource Description Framework
  • Data model
  • (subject-predicate-object)
• Resource naming strategy
  • For terms
    • Pattern: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#myterm
    • Example: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#hasQuantitiveValue
  • For individuals
    • Pattern: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/myIndividual
    • Example: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/LeisureCentreWetJohnCharlesCentreforSport
• RDF syntaxes
  • RDF/XML, Turtle (ttl), N3, N-Quads
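The naming strategy can be captured in two small helper functions. This is a sketch under assumptions: the base URIs come from the patterns on this slide, but the rule for deriving the local name (dropping every character that is not a letter or digit) and the source label used in the example are guesses inferred from the example URI, not something the tutorial states.

```python
import re

# Base URIs taken from the patterns on the slide.
ONTOLOGY_BASE = "http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#"
RESOURCE_BASE = "http://smartcity.linkeddata.es/lcc/resource/"

def term_uri(term):
    """URI of an ontology term (class or property)."""
    return ONTOLOGY_BASE + term

def individual_uri(class_name, label):
    """URI of an individual; assumed rule: drop every non-alphanumeric character."""
    local_name = re.sub(r"[^A-Za-z0-9]", "", label)
    return f"{RESOURCE_BASE}{class_name}/{local_name}"

print(term_uri("hasQuantitiveValue"))
# The label below is a hypothetical source value reverse-engineered from the example URI.
print(individual_uri("LeisureCentre", "Leisure Centre Wet John Charles Centre for Sport"))
```

Keeping URI construction in one place helps every generated resource follow the same pattern.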
Ontology development
[1] Suárez-Figueroa, M.C. PhD Thesis: NeOn Methodology for Building Ontology Networks: Specification, Scheduling and Reuse. Spain. June 2010.
Activity definition taken from [1]
Focus of each activity
Existing tools to carry out the activity
Tips, alternatives and references
Ontology development
Ontology Requirements: refers to the activity of collecting the requirements that the ontology should fulfil (for example, reasons to build the ontology, identification of target groups and intended uses). (NeOn)
Proposed references:
- NeOn Guidelines for non-functional requirements
- Competency Questions technique [1]
Tools: mind map, text editor, etc.
[1] Gruninger, M., Fox, M. S. The role of competency questions in enterprise engineering. In Proceedings of the IFIP WG5.7 Workshop on Benchmarking - Theory and Practice, Trondheim, Norway, 1994.
Ontology development – LCC example
(Data from….)
Non-functional requirements specified:
• The ontology will try to adopt concepts and design patterns from other ontologies where possible
• The ontology should be implemented in OWL 2 DL
Ontology development
Ontology term extraction: to extract a glossary of terms that may be developed.
Tools for terminology extraction:
• Identify nouns, verbs, etc.
• Tools: Freeling for free text
Focus:
• Extract terminology from Competency Questions (NeOn)
• Extract terminology directly from the data
• Expert advice, or done by experts
Complete the list with synonyms
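A very rough sketch of extracting candidate terms from competency questions follows. The questions and the stop-word list are hypothetical, since the tutorial does not list the real ones, and the code only counts words; it does not identify nouns or verbs the way a linguistic tool such as Freeling would.

```python
import re
from collections import Counter

# Hypothetical competency questions; the tutorial does not list the real ones.
QUESTIONS = [
    "What is the electricity consumption of a site in a given year?",
    "What is the address of a site?",
]

# Small hand-made stop-word list (an assumption, not a linguistic resource).
STOP_WORDS = {"what", "is", "the", "of", "a", "in", "given"}

def candidate_terms(questions):
    """Count the non-stop-word tokens across all questions."""
    counts = Counter()
    for question in questions:
        for token in re.findall(r"[a-z]+", question.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts

terms = candidate_terms(QUESTIONS)
print(terms.most_common())
```

The resulting list is only a starting glossary; synonyms and expert review still have to be added by hand.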
Ontology development – LCC example
Site, place
Address
PostCode
Electricity
Consumption, utilization
years, time
Ontology development
Ontology conceptualization refers to the activity of organizing and structuring the information (data, knowledge, etc.), obtained during the acquisition process, into meaningful models at the knowledge level and according to the ontology requirements specification document. (NeOn)
Drawing tools, including paper and pencil
Focus drafting (optional):
• Identify main domains and top concept
• Establish relations between concepts and domains
Focus detail model:
• Establish hierarchies
• Establish specific relationships among defined elements, rules, axioms, etc.
Do not try to define everything. You might change your mind during the implementation.
Ontology development – LCC example
Ontology development
Ontology search refers to the activity of finding candidate ontologies or ontology modules to be reused. (NeOn)
Search tools:
• General purpose:
  • LOV: http://lov.okfn.org
  • LOD2Stats: http://stats.lod2.eu/vocabularies
  • Google
  • Others: ODP Portal http://ontologydesignpatterns.org
• Domain-based:
  • Smart cities: http://smartcity.linkeddata.es/
Focus:
• Terms already used in LOD
• Save time and resources
• Increase interoperability
Use domain terms and synonyms
Do not spend too much time trying to find terms for everything. You might need to create them.
Ontology development – LCC example
Terms and synonyms
Ontology development
Ontology Selection refers to the activity of choosing the most suitable ontologies or ontology modules among those available in an ontology repository or library, for a concrete domain of interest and associated tasks. (NeOn)
Evaluation tools:
• OOPS! – OntOlogy Pitfall Scanner [1] http://www.oeg-upm.net/oops/
• Triple checker http://graphite.ecs.soton.ac.uk/checker/ (already included in OOPS!)
• Vapour http://validator.linkeddata.org/vapour (to be included in OOPS!)
Also to be considered:
• Modelling issues (OOPS!, reasoners, manual review, etc.)
• Domain coverage (based on the data to be represented)
• Used in Linked Data (LOD2Stats, Sindice, etc.)
Focus:
• Assessment by Linked Data principles
• Modelling issues
• Domain coverage: data driven
[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. (2012). Validating ontologies with OOPS!. In Knowledge Engineering and Knowledge Management (pp. 267-281). Springer Berlin Heidelberg.
Further reference: NeOn Guidelines
Ontology development – LCC example
• Domain coverage
  • Schema.org covers public places and provides some additional terms and properties that can be used (e.g., PostalAddress and City)
  • It is also a widely known and accepted vocabulary, which favours interoperability
• Closer semantics
  • The ero:FinalEnergy class from the Energy Resource ontology and the ssn:Property class from the SSN ontology, used to represent the specific indicator to which the consumption relates
Ontology development
Ontology Integration refers to the activity of including one ontology in another ontology. (NeOn)
Tools:
• Ontology editors: Protégé, NeOn Toolkit, etc.
  • Plug-ins: Ontology Module Extraction and Partition
• Text editors for manual approach
Focus:
• How much information should I reuse?
• How to reuse the elements or vocabs? Preliminary analysis [1]
  • Should I import another ontology?
  • Should I reference other ontology element URIs?
    • ... replicating manually the URI?
    • ... merging ontologies?
• How to link them?
Techniques:
• Import the ontology as a whole
• Reuse some parts of the ontology (or ontology module)
• Reuse statements
[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. The Landscape of Ontology Reuse in Linked Data. 1st Ontology Engineering in a Data-driven World (OEDW 2012) Workshop at the 18th International Conference on Knowledge Engineering and Knowledge Management. Galway, Ireland, 9th October 2012. http://www.slideshare.net/MariaPovedaVillalon/mpoveda-oedw2012v1
Ontology development
Ontology Enrichment refers to the activity of extending an ontology with new conceptual structures (e.g., concepts, roles and axioms). (NeOn)
Focus:
• How should I create terms according to ontological foundations and Linked Data principles?
Ontology development:
• Ontology Development 101: A Guide to Creating Your First Ontology [1]
• Ontology Engineering Patterns http://www.w3.org/2001/sw/BestPractices/
• Extracting ontology conceptualization, formalization techniques from existing methodologies
Recommendation:
• Link to existing entities
• Provide human readable documentation
• Keep the semantics of the reused elements
Tools:
• Ontology editors: Protégé, NeOn Toolkit, etc.
[1] Natalya F. Noy and Deborah L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, March 2001.
Ontology development – LCC example
Ontology development
Ontology Evaluation refers to the activity of checking the technical quality of an ontology against a frame of reference. (NeOn)
Evaluation tools related to Linked Data principles:
• OOPS! – OntOlogy Pitfall Scanner [1] http://www.oeg-upm.net/oops/
• Triple checker http://graphite.ecs.soton.ac.uk/checker/ (already included in OOPS!)
Evaluation tools/techniques for other aspects:
• Modelling issues (OOPS!, reasoners, manual review, etc.)
• Domain coverage (based on the data to be represented)
• Application based (queries)
• Syntax issues: validators
Focus:
• Assessment by Linked Data principles
• Modelling issues
• Domain coverage: data driven
[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. (2012). Validating ontologies with OOPS!. In Knowledge Engineering and Knowledge Management (pp. 267-281). Springer Berlin Heidelberg.
Ontology development – LCC example
Minor issues, mostly lack of annotations in reused terms.
Table of Contents
1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
   1. Data transformation
   2. Data linking
5. Discussion and Conclusions
Data transformation
• Transformation of the data to RDF
• Steps
  1. To select the RDF serialization
     • RDF/XML, Turtle, N-Triples, JSON-LD
  2. To select a tool
  3. To transform the data
  4. To evaluate the obtained RDF data
     • Syntax evaluation
     • Accuracy
     • Usage
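To make the transformation step concrete, here is a minimal sketch that turns one CSV row into RDF statements using only the Python standard library. It is not the tool chain used in the tutorial. The CSV columns, the `Site/` and `EnergyConsumption/` path segments and the `isConsumptionOf` and `period` property names are hypothetical; only the base URIs and `hasQuantitiveValue` appear in the slides. The output is N-Triples, which is also valid Turtle.

```python
import csv
import io
import re

# Hypothetical CSV excerpt; the real column names in the data set may differ.
SAMPLE = """site,year,electricity_kwh
Central Library,2012/13,1500.0
"""

RESOURCE = "http://smartcity.linkeddata.es/lcc/resource/"
ONTOLOGY = "http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

def local_name(text):
    """Keep only letters and digits so the text can be used inside a URI."""
    return re.sub(r"[^A-Za-z0-9]", "", text)

def row_to_triples(row):
    """Turn one CSV row into N-Triples lines (which are also valid Turtle)."""
    site = f"<{RESOURCE}Site/{local_name(row['site'])}>"
    obs = f"<{RESOURCE}EnergyConsumption/{local_name(row['site'] + row['year'])}>"
    return [
        # Property names are illustrative, except hasQuantitiveValue.
        f"{obs} <{ONTOLOGY}isConsumptionOf> {site} .",
        f'{obs} <{ONTOLOGY}period> "{row["year"]}" .',
        f'{obs} <{ONTOLOGY}hasQuantitiveValue> "{row["electricity_kwh"]}"^^<{XSD_DECIMAL}> .',
    ]

lines = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    lines.extend(row_to_triples(row))
print("\n".join(lines))
```

Modelling each consumption value as its own resource keeps the site, the period and the value separately queryable.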
Data transformation - Tools
Database to RDF:
• morph-RDB
• D2R Server
• TopBraid Composer
Data streams to RDF:
• morph-streams
• D2R Server
Spreadsheets to RDF:
• TopBraid Composer
• Excel2RDF
• RDF123
• XLWrap
• OpenRefine
XML to RDF:
• XML2RDF
• TopBraid Composer
• OpenRefine (GoogleRefine, LODRefine)
Data transformation – LCC example
1. Turtle syntax
2. OpenRefine + RDF extension
Data transformation – LCC example: OpenRefine creating project
Data transformation – LCC example: OpenRefine adding columns
Data transformation – LCC example: OpenRefine column transformations
Data transformation – LCC example: OpenRefine RDF extension
Data transformation – LCC example: OpenRefine RDF generation
Data transformation – LCC example: Evaluation
• Syntax evaluation
• Consistency with the ontologies
• Usage evaluation by running SPARQL queries
  • show all electricity consumptions and related time periods for all council sites related to culture
  • show all energy consumptions and related time periods of council sites from Wakefield district
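The first usage query can be imitated in plain Python to show what such a check verifies. This is only a stand-in for a SPARQL engine: the triples, the `lcc:` prefix and the property names are hypothetical and heavily abbreviated, and the function hard-codes one query shape instead of parsing SPARQL.

```python
# Hypothetical, abbreviated triples; prefixes such as "lcc:" stand for full URIs.
TRIPLES = [
    ("lcc:obs1", "lcc:site", "lcc:CityMuseum"),
    ("lcc:obs1", "lcc:period", "2011/12"),
    ("lcc:obs1", "lcc:electricity", 900.0),
    ("lcc:obs2", "lcc:site", "lcc:DepotA"),
    ("lcc:obs2", "lcc:period", "2011/12"),
    ("lcc:obs2", "lcc:electricity", 400.0),
    ("lcc:CityMuseum", "lcc:category", "culture"),
    ("lcc:DepotA", "lcc:category", "operations"),
]

def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def electricity_for_category(category):
    """Roughly: SELECT ?site ?period ?value WHERE the site has the given category."""
    results = []
    for obs, predicate, site in TRIPLES:
        if predicate != "lcc:site" or category not in objects(site, "lcc:category"):
            continue
        for period in objects(obs, "lcc:period"):
            for value in objects(obs, "lcc:electricity"):
                results.append((site, period, value))
    return results

rows = electricity_for_category("culture")
print(rows)
```

If a query like this returns nothing on data known to contain cultural sites, the generated RDF or the ontology mapping probably needs another look.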
Data linking
• Ensuring that data are not just “isolated islands”
• Steps
  1. To identify classes whose instances can be the subject of linking
  2. To identify data sets that may contain instances for the previously-identified classes
  3. To select the tools for performing the task
  4. To use the tool in order to obtain links
• Tools: LN2R, LD mapper, Silk, LIMES, RDF-AI, Serimi, OpenRefine
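The last step, obtaining links, often comes down to comparing labels. The sketch below uses `difflib.SequenceMatcher` from the Python standard library to propose `owl:sameAs` links; it is not how any of the listed tools work internally. The local URIs, the labels and the 0.9 threshold are assumptions chosen for illustration, and the DBpedia URIs are shown only as example targets.

```python
from difflib import SequenceMatcher

# Hypothetical local resources and target labels; DBpedia URIs are for illustration.
LOCAL = {
    "http://smartcity.linkeddata.es/lcc/resource/City/Leeds": "Leeds",
    "http://smartcity.linkeddata.es/lcc/resource/District/Wakefield": "wakefield ",
}
TARGET = {
    "http://dbpedia.org/resource/Leeds": "Leeds",
    "http://dbpedia.org/resource/Wakefield": "Wakefield",
}
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def similarity(a, b):
    """String similarity in [0, 1] after trimming and lower-casing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def link(local, target, threshold=0.9):
    """Return (local URI, owl:sameAs, target URI) for the best match above the threshold."""
    links = []
    for local_uri, local_label in local.items():
        best_uri, best_label = max(
            target.items(), key=lambda item: similarity(local_label, item[1])
        )
        if similarity(local_label, best_label) >= threshold:
            links.append((local_uri, OWL_SAME_AS, best_uri))
    return links

links = link(LOCAL, TARGET)
for triple in links:
    print(triple)
```

Label matching only yields candidate links; they should still be reviewed before publication, since identical names can refer to different places.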
Data linking – LCC example
1. Classes: City, District
2. Data sets: DBpedia
3. Tool: OpenRefine
Data linking – LCC example: OpenRefine reconciliation
Discussion and Conclusions
• The guidelines are based on requirements from smart city stakeholders
• They address a broad scope of scenarios
  • Different data formats (databases, CSV, Excel, XML, etc.)
  • Update frequencies (static and dynamic data)
  • Legal and licensing issues
• They introduce a complete example
More information
Radulovic, F., García-Castro, R., Poveda-Villalón, M., Weise, M., Tryferdis, T.: D4.1: Requirements and guidelines for energy data generation. Technical report, READY4SmartCities Consortium, May 2014
Linked Data is just data
Benefits of linking data
• Total electric consumption (original data + geolocation)
• Total electric consumption in locations with population > 20,000 (original data + geolocation + population)
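The population filter only becomes possible once the consumption data are linked to a source that knows the population of each location. A small sketch of that idea, with entirely hypothetical sites, cities and figures, could look like this:

```python
# Hypothetical figures; in practice the population would come from a linked data set.
CONSUMPTION = [
    {"site": "Library A", "city": "CityX", "electricity_kwh": 1200.0},
    {"site": "Depot B", "city": "CityX", "electricity_kwh": 300.0},
    {"site": "Hall C", "city": "VillageY", "electricity_kwh": 500.0},
]
POPULATION = {"CityX": 150000, "VillageY": 8000}  # values reached through the links

def total_consumption(rows, min_population=0):
    """Sum electricity over sites whose city has more than min_population inhabitants."""
    return sum(
        row["electricity_kwh"]
        for row in rows
        if POPULATION.get(row["city"], 0) > min_population
    )

print(total_consumption(CONSUMPTION))         # all sites
print(total_consumption(CONSUMPTION, 20000))  # only cities above the threshold
```

The original data set alone could answer the first question but not the second, which is the point of adding links.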
Benefits of reasoning
• Total electric consumption in cultural buildings
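One way to read this example: sites are typed only with their most specific class, and a reasoner infers that they are also cultural buildings. The sketch below mimics that inference with a transitive walk over a subclass hierarchy. The class names, hierarchy and figures are hypothetical, and a real setup would leave this work to an OWL or RDFS reasoner.

```python
# Hypothetical class hierarchy (child -> parent), as rdfs:subClassOf statements would give.
SUBCLASS_OF = {
    "Museum": "CulturalBuilding",
    "Library": "CulturalBuilding",
    "CulturalBuilding": "Building",
    "Depot": "Building",
}
# Hypothetical data: each site is typed only with its most specific class.
SITES = [
    {"name": "City Museum", "type": "Museum", "electricity_kwh": 900.0},
    {"name": "Central Library", "type": "Library", "electricity_kwh": 600.0},
    {"name": "Depot A", "type": "Depot", "electricity_kwh": 400.0},
]

def is_a(class_name, ancestor):
    """True if class_name equals ancestor or is a (transitive) subclass of it."""
    while class_name is not None:
        if class_name == ancestor:
            return True
        class_name = SUBCLASS_OF.get(class_name)
    return False

cultural_total = sum(
    site["electricity_kwh"] for site in SITES if is_a(site["type"], "CulturalBuilding")
)
print(cultural_total)
```

No site is explicitly labelled as a cultural building here; the total is available only because the hierarchy is taken into account.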
Discussion and Conclusions – Future work
• Development of services for facilitating the usage of Linked Data technology
• Support in adopting Linked Data technology
• Guidelines for publication and exploitation of Linked Data
• Summer school for 2015
• Other training?