Linked Energy Data Generation

Post on 22-Nov-2014

DESCRIPTION

Slides from our tutorial on Linked Data generation in the energy domain, presented at the Sustainable Places 2014 conference on October 2nd in Nice, France

Transcript

Linked Energy Data Generation Tutorial

Filip Radulovic, María Poveda Villalón, Raúl García-Castro

{fradulovic,mpoveda,rgarcia}@fi.upm.es

ETSI Informaticos

Universidad Politécnica de Madrid

Campus de Montegancedo s/n

28660 Boadilla del Monte, Madrid, Spain

Twitter: @LD4SC

02.10.2014. Sustainable Places 2014, Nice, France

License

• This work is licensed under the Creative Commons Attribution – Non Commercial – Share Alike License

• You are free:
  • to Share: to copy, distribute and transmit the work
  • to Remix: to adapt the work

• Under the following conditions
  • Attribution: You must attribute the work by inserting
    • “[source http://www.oeg-upm.net/]” at the footer of each reused slide
    • a credits slide stating: “These slides are partially based on “Linked Energy Data Generation” by F. Radulovic, M. Poveda-Villalón, R. García-Castro”
  • Non-commercial
  • Share-Alike

Table of Contents

1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
5. Discussion and Conclusions

Classic Web

[Figure labels: CIA WorldFactBook, Wikipedia]

Data exposed to the Web via HTML

Slide adapted from “5min Introduction to Linked Data” - Olaf Hartig

Classic Web

• Typical web page markup consists of:
  • Rendering information (e.g., font size and colour)
  • Hyper-links to related content

• Semantic content is accessible to humans but not (easily) to computers…

Classic Web

Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig

Information from single pages can be found via search engines

[Figure labels: CIA WorldFactBook, MovieDB]

Classic Web

Slide adapted from “5min Introduction to Linked Data” - Olaf Hartig

What about complex queries over multiple pages / data sources?

“Show me a picture of the tallest building in the country with the highest CO2 emission rate in 2013?”

Impossible

What do we actually want?

• Use the Web like a single global database

• Move from a Web of documents to a Web of Data

Slide adapted from Boris Villazón Terrazas and “5min Introduction to Linked Data” - Olaf Hartig

[Figure labels: Wikipedia, CIA WorldFactBook. Photo credit: Shanghai Tower, 2013-8-3, CC BY-SA 3.0]

Linked Data enables such a Web of Data

Slide adapted from Boris Villazón Terrazas and “5min Introduction to Linked Data”- Olaf Hartig

• Global Identifier: URI (Uniform Resource Identifier), identifies a resource on the Internet
• Data Model: RDF (Resource Description Framework), standard model for data interchange on the Web
• Access Mechanism: HTTP
• Connection: Typed Links

[Figure: graph connecting http://cia.../China and http://...wikipedia.../data/shangaiTower through typed links (http://.../co2emission, http://.../co2emissionPerYear, http://.../location, http://.../year, http://.../depiction, http://…#sameAs), with the values 10000… and 2013. Figure labels: Wikipedia, CIA WorldFactBook. Photo credit: Shanghai Tower, 2013-8-3, CC BY-SA 3.0]
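
The combination of global identifiers, a triple-based data model and typed links can be sketched in a few lines of Python. This is only an illustration: every URI below is a placeholder under example.org, the property names and numbers are made up (they are not Wikipedia or CIA WorldFactBook data), and a real setup would rely on an RDF store and SPARQL rather than on tuples.

```python
# Minimal sketch: RDF-style triples as (subject, predicate, object) tuples.
# All URIs and values are hypothetical placeholders, not real data.
EX = "http://example.org/"

triples = [
    (EX + "countryA", EX + "co2emission", 100),
    (EX + "countryB", EX + "co2emission", 40),
    (EX + "towerA", EX + "location", EX + "countryA"),
    (EX + "towerA", EX + "height", 600),
    (EX + "towerA", EX + "depiction", EX + "img/towerA.jpg"),
    (EX + "lowRiseA", EX + "location", EX + "countryA"),
    (EX + "lowRiseA", EX + "height", 90),
    (EX + "towerB", EX + "location", EX + "countryB"),
    (EX + "towerB", EX + "height", 800),
]


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return all subjects of triples matching (?, predicate, obj)."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def picture_of_tallest_building_in_top_emitter():
    # 1. Country with the highest CO2 emission value.
    emissions = [(o, s) for s, p, o in triples if p == EX + "co2emission"]
    _, country = max(emissions)
    # 2. Buildings located in that country, found by following the typed link.
    buildings = subjects(EX + "location", country)
    # 3. Tallest of those buildings.
    tallest = max(buildings, key=lambda b: objects(b, EX + "height")[0])
    # 4. Its picture, if one is described.
    pictures = objects(tallest, EX + "depiction")
    return pictures[0] if pictures else None


print(picture_of_tallest_building_in_top_emitter())
```

Because both toy data sets use the same identifier for the country, the query that was impossible over separate HTML pages becomes a simple join.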

The four principles (Tim Berners-Lee, 2006)

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html
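
Principles 2 and 3 are usually met through HTTP content negotiation: a client looks up the URI and asks for an RDF representation. The sketch below only prepares such a request and does not send it. The URI is a placeholder, and whether a given server answers `text/turtle` requests is an assumption that has to be checked per data source.

```python
from urllib.request import Request


def build_rdf_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET request that asks for RDF."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Hypothetical resource URI, used only for illustration.
req = build_rdf_request("http://example.org/resource/site1")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# To actually dereference the URI one could call:
#   urllib.request.urlopen(req).read()
```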

Semantic Web definition

“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

It is based on the idea of having data on the Web defined and linked such that it can be used for more effective discovery, automation, integration, and reuse across various applications.”

Hendler, J., Berners-Lee, T., and Miller, E. Integrating Applications on the Semantic Web, 2002, http://www.w3.org/2002/07/swint.html

Benefits + Cases of success

• Provide semantics: meaningful data & common understanding
  • Interoperability
• Reasoning power
  • Infer more data
  • Find mistakes in the original data?
• Enrich your data (with what is already out there)
• Search engines are indexing some schemas
  • Increase visibility
• Multilingual information

In this tutorial

“D4.1 Requirements and guidelines for energy data generation” from the READY4SmartCities project, available at http://goo.gl/IWDmYy

Table of Contents

1. Introduction
2. Data preparation
   1. Select data source
   2. Obtain access to data source
   3. Analyse licensing of the data source
   4. Analyse data source
3. Ontology development
4. Data generation
5. Discussion and Conclusions

Select data source

• Selecting the data source that will be transformed into Linked Data

• Steps
  1. To define the requirements
  2. To select one or several data sources

• Alternatives:
  • Data set from your own organization
  • Data sources not owned by your organization (external data sources)

Select data source – LCC example

• Limitation to external data sources (search)
  1. Requirements
     • Real-world scenario in the energy domain
     • Available for use
     • Available in machine-processable format (the more structured the data are, the better)
     • Can be linked with generic entities (e.g., location)
  2. Leeds City Council – energy consumption (http://data.gov.uk/dataset/council-energy-consumption)

Obtain access to data source

• Data access means
  • technical means to retrieve the data
  • legal rights to use the data

• In some cases, the data source might not be accessible

• Steps
  1. To identify the person to contact
  2. To request the access
  3. To obtain access and to retrieve the data

• Access alternatives: files, programming interface, database, data streams, etc.

Obtain access to data source – LCC example

• Data set already available for download

• Available in a CSV file

Analysing licensing of the data source

• Licenses specify the legal terms under which a data set can be used and exploited

• Steps
  1. To identify the publisher
  2. To find the applicable license
     • Web page, data set metadata, data itself
     • Contact the publisher
  3. To read the license and determine legal terms

• Tips
  • Analysis should be performed upon all available copies of the data
  • Ensure compatible licences between several data sources

Analyse licensing – LCC example

Analyse data source

• Getting insight into data structure and organization

• Steps
  1. To analyse the characteristics of the data
     • Data values, data ranges, etc.
  2. To obtain the schema of the data
     • Description of concepts and their relationships

• Data format alternatives:
  • Structured data
  • Unstructured data

• Tip: Use a standard modeling language for the data schema (e.g., UML)

Analyse data source – LCC example

• Electricity, gas and oil consumptions as decimal values
• 1-year intervals: 2010/11, 2011/12, 2012/13
• Different types of council sites (mostly buildings)
• Full address provided (street, city, district)
• Correspondence with people from LCC open data
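
A first pass over the characteristics of a CSV source (empty cells, numeric columns, value ranges) might look like the sketch below. The sample rows and column names are invented for illustration; the real LCC file has its own headers and values.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Made-up sample with hypothetical column names; the real LCC file differs.
SAMPLE = """Site,Type,City,Electricity 2010/11,Gas 2010/11
Site A,Leisure Centre,Leeds,1200.5,300.0
Site B,Library,Leeds,80.25,
Site C,Museum,Leeds,410.0,95.5
"""


def profile(csv_text):
    """Return, per column, the number of empty cells and the numeric range."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for column in rows[0].keys():
        values = [row[column].strip() for row in rows]
        non_empty = [v for v in values if v]
        numbers = []
        for value in non_empty:
            try:
                numbers.append(Decimal(value))
            except InvalidOperation:
                pass  # the cell holds text, not a decimal value
        report[column] = {
            "empty": len(values) - len(non_empty),
            # A column counts as numeric only if every non-empty cell parsed.
            "numeric": bool(numbers) and len(numbers) == len(non_empty),
            "min": min(numbers) if numbers else None,
            "max": max(numbers) if numbers else None,
        }
    return report


for column, stats in profile(SAMPLE).items():
    print(column, stats)
```

The report gives a quick view of which columns hold decimal consumption values and where data are missing, which feeds directly into the schema description.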

Ontology development - Preparation

• RDF – Resource Description Framework
  • Data model (subject-predicate-object)

• Resource naming strategy
  • For terms
    • Pattern: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#myterm
    • Example: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#hasQuantitiveValue
  • For individuals
    • Pattern: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/myIndividual
    • Example: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/LeisureCentreWetJohnCharlesCentreforSport

• RDF syntaxes
  • RDF/XML, Turtle (ttl), N3, N-Quads
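
The naming strategy above can be captured in two small helper functions. This is a sketch: the function names are illustrative, and the rule for building the local name of an individual (dropping every non-alphanumeric character from a label) is an assumption inferred from the single example on the slide.

```python
import re

BASE = "http://smartcity.linkeddata.es/lcc/"


def term_uri(ontology, term):
    """URI of an ontology term: <base>ontology/<ontology>#<term>."""
    return f"{BASE}ontology/{ontology}#{term}"


def individual_uri(class_name, label):
    """URI of an individual: <base>resource/<class>/<local name>.

    Assumption: the local name is the label without spaces or punctuation.
    """
    local_name = re.sub(r"[^A-Za-z0-9]", "", label)
    return f"{BASE}resource/{class_name}/{local_name}"


print(term_uri("EnergyConsumption", "hasQuantitiveValue"))
print(individual_uri("LeisureCentre",
                     "Leisure Centre Wet John Charles Centre for Sport"))
```

Keeping URI construction in one place makes it easier to apply the same pattern consistently during data transformation.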

Ontology development

[1] Suárez-Figueroa, M.C. PhD Thesis: NeOn Methodology for Building Ontology Networks: Specification, Scheduling and Reuse. Spain. June 2010.

Activity definition taken from [1]

Focus of each activity

Existing tools to carry out the activity

Tips, alternatives and references

Ontology development

Ontology Requirements: refers to the activity of collecting the requirements that the ontology should fulfil (for example, reasons to build the ontology, identification of target groups and intended uses). (NeOn)

Proposed references:
• NeOn Guidelines for non-functional requirements
• Competency Questions technique [1]

Tools: mind map, text editor, etc.

[1] Gruninger, M., Fox, M. S. The role of competency questions in enterprise engineering. In Proceedings of the IFIP WG5.7 Workshop on Benchmarking - Theory and Practice, Trondheim, Norway, 1994.

Ontology development – LCC example

LCC example (Data from….)

Non-functional requirements specified:
• The ontology will try to adopt concepts and design patterns in other ontologies where possible
• The ontology should be implemented in OWL 2 DL

Ontology development

Ontology term extraction: to extract a glossary of terms that may be developed.

Tools for terminology extraction:
• Identify nouns, verbs, etc.
• Tools: Freeling for free text

Focus:
• Extract terminology from Competency Questions (NeOn)
• Extract terminology directly from the data
• Expert advice || Done by experts

Complete the list with synonyms

Ontology development – LCC example

• Site, place
• Address
• PostCode
• Electricity
• Consumption, utilization
• years, time

Ontology development

Ontology conceptualization: refers to the activity of organizing and structuring the information (data, knowledge, etc.), obtained during the acquisition process, into meaningful models at the knowledge level and according to the ontology requirements specification document. (NeOn)

Drawing tools, including paper and pencil

Focus drafting (optional):
• Identify main domains and top concept
• Establish relations between concepts and domains

Focus detail model:
• Establish hierarchies
• Establish specific relationships among defined elements, rules, axioms, etc.

Do not try to define everything. You might change your mind during the implementation.

Ontology development – LCC example

Ontology development

Ontology search: refers to the activity of finding candidate ontologies or ontology modules to be reused. (NeOn)

Search tools:
• General purpose:
  • LOV: http://lov.okfn.org
  • LOD2Stats: http://stats.lod2.eu/vocabularies
  • Google
  • Others: ODP Portal http://ontologydesignpatterns.org
• Domain based:
  • Smart cities: http://smartcity.linkeddata.es/

Focus:
• Terms already used in LOD
• Save time and resources
• Increase interoperability

Use domain terms and synonyms

Do not spend too much time trying to find terms for everything. You might need to create them.

Ontology development – LCC example

Terms and synonyms

Ontology development

Ontology Selection: refers to the activity of choosing the most suitable ontologies or ontology modules among those available in an ontology repository or library, for a concrete domain of interest and associated tasks. (NeOn)

Evaluation tools:
• OOPS! – OntOlogy pitfalls scanner [1] http://www.oeg-upm.net/oops/
• Triple checker http://graphite.ecs.soton.ac.uk/checker/ (already included in OOPS!)
• Vapour http://validator.linkeddata.org/vapour (to be included in OOPS!)

Also it should be considered:
• Modelling issues (OOPS!, reasoners, manual review, etc.)
• Domain coverage (based on the data to be represented)
• Used in Linked Data (LOD2Stats, Sindice, etc.)

Focus:
• Assessment by Linked Data principles
• Modelling issues
• Domain coverage: data driven

[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. (2012). Validating ontologies with OOPS!. In Knowledge Engineering and Knowledge Management (pp. 267-281). Springer Berlin Heidelberg.

Further reference: NeOn Guidelines

Ontology development – LCC example

• Domain coverage
  • Schema.org covers public places and provides some additional terms and properties that can be used (e.g., PostalAddress and City)
  • It is also a widely-known and accepted vocabulary: interoperability

• Closer semantics
  • The ero:FinalEnergy class from the Energy Resource ontology and the ssn:Property class from the SSN ontology, in order to represent the specific indicator to which the consumption is related

Ontology development

Ontology Integration: refers to the activity of including one ontology in another ontology. (NeOn)

Tools:
• Ontology editors: Protégé, NeOn Toolkit, etc.
  • Plug-ins: Ontology Module Extraction and Partition
• Text editors for manual approach

Focus:
• How much information should I reuse?
• How to reuse the elements or vocabs? Preliminary analysis [1]
  • Should I import another ontology?
  • Should I reference other ontology element URIs?
    • ... replicating manually the URI?
    • ... merging ontologies?
• How to link them?

Techniques:
• Import the ontology as a whole
• Reuse some parts of the ontology (or ontology module)
• Reuse statements

[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. The Landscape of Ontology Reuse in Linked Data. 1st Ontology Engineering in a Data-driven World (OEDW 2012) Workshop at the 18th International Conference on Knowledge Engineering and Knowledge Management. Galway, Ireland, 9th October 2012. http://www.slideshare.net/MariaPovedaVillalon/mpoveda-oedw2012v1

Ontology development

Ontology Enrichment: refers to the activity of extending an ontology with new conceptual structures (e.g., concepts, roles and axioms). (NeOn)

Focus:
• How should I create terms according to ontological foundations and Linked Data principles?

Ontology development:
• Ontology Development 101: A Guide to Creating Your First Ontology [1]
• Ontology Engineering Patterns http://www.w3.org/2001/sw/BestPractices/
• Extracting ontology conceptualization, formalization techniques from existing methodologies

Recommendation:
• Link to existing entities
• Provide human readable documentation
• Keep the semantics of the reused elements

Tools:
• Ontology editors: Protégé, NeOn Toolkit, etc.

[1] Natalya F. Noy and Deborah L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, March 2001.

Ontology development – LCC example

Ontology development

Ontology Evaluation: refers to the activity of checking the technical quality of an ontology against a frame of reference. (NeOn)

Evaluation tools related to Linked Data principles:
• OOPS! – OntOlogy pitfalls scanner [1] http://www.oeg-upm.net/oops/
• Triple checker http://graphite.ecs.soton.ac.uk/checker/ (already included in OOPS!)

Evaluation tools/techniques for other aspects:
• Modelling issues (OOPS!, reasoners, manual review, etc.)
• Domain coverage (based on the data to be represented)
• Application based (queries)
• Syntax issues: validators

Focus:
• Assessment by Linked Data principles
• Modelling issues
• Domain coverage: data driven

[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. (2012). Validating ontologies with OOPS!. In Knowledge Engineering and Knowledge Management (pp. 267-281). Springer Berlin Heidelberg.

Ontology development – LCC example

Minor, mostly lack of annotations in reused terms.

Table of Contents

1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
   1. Data transformation
   2. Data linking
5. Discussion and Conclusions

Data transformation

• Transformation of the data to RDF

• Steps
  1. To select the RDF serialization
     • RDF/XML, Turtle, N-Triples, JSON-LD
  2. To select a tool
  3. To transform the data
  4. To evaluate the obtained RDF data
     • Syntax evaluation
     • Accuracy
     • Usage
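
To make the transformation step concrete, the sketch below turns CSV rows into N-Triples lines by hand. It is not the approach used in the tutorial, which relies on OpenRefine with the RDF extension and Turtle output; N-Triples is chosen here only because it is the simplest serialization to write (and every N-Triples document is also valid Turtle). The sample rows, the `Site` path segment and the `electricityConsumption` property are hypothetical.

```python
import csv
import io
import re

BASE = "http://smartcity.linkeddata.es/lcc/"
ONT = BASE + "ontology/EnergyConsumption#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Made-up rows with hypothetical column names.
SAMPLE = """Site,Electricity
John Charles Centre for Sport,1200.5
Central Library,80.25
"""


def literal(value, datatype=None):
    """Serialize a string as an N-Triples literal, optionally typed."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


def to_ntriples(csv_text):
    """Emit two triples per CSV row: a label and a consumption value."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        local_name = re.sub(r"[^A-Za-z0-9]", "", row["Site"])
        site = f"<{BASE}resource/Site/{local_name}>"  # hypothetical class path
        lines.append(f"{site} <{RDFS_LABEL}> {literal(row['Site'])} .")
        value = literal(row["Electricity"], XSD + "decimal")
        # 'electricityConsumption' is an illustrative property name.
        lines.append(f"{site} <{ONT}electricityConsumption> {value} .")
    return "\n".join(lines)


print(to_ntriples(SAMPLE))
```

The output can then be passed to any RDF validator for the syntax evaluation mentioned in step 4.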

Data transformation - Tools

Database to RDF:
• morph-RDB
• D2R Server
• TopBraid Composer

Data streams to RDF:
• morph-streams
• D2R Server

Spreadsheets to RDF:
• TopBraid Composer
• Excel2RDF
• RDF123
• XLWrap
• OpenRefine

XML to RDF:
• XML2RDF
• TopBraid Composer
• OpenRefine (GoogleRefine, LODRefine)

Data transformation – LCC example

1. Turtle syntax

2. OpenRefine + RDF extension

Data transformation – LCC example: OpenRefine creating project

Data transformation – LCC example: OpenRefine adding columns

Data transformation – LCC example: OpenRefine column transformations

Data transformation – LCC example: OpenRefine RDF extension

Data transformation – LCC example: OpenRefine RDF generation

Data transformation – LCC example: Evaluation

• Syntax evaluation

• Consistency with the ontologies

• Usage evaluation by running SPARQL queries
  • show all electricity consumptions and related time periods for all council sites related to culture
  • show all energy consumptions and related time periods of council sites from Wakefield district
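
The first usage query can be imitated without a SPARQL engine by matching triple patterns against a list of triples, which is what a SPARQL basic graph pattern does conceptually. The sketch below is a stand-in for the real evaluation, not a replacement for it: the prefixed names (`ex:`, `site:`, `cons:`) and all values are hypothetical and do not come from the LCC ontology or data.

```python
# Toy triples with hypothetical prefixed names; all values are made up.
TRIPLES = [
    ("site:Museum1", "rdf:type", "ex:CulturalSite"),
    ("site:Pool1", "rdf:type", "ex:LeisureCentre"),
    ("cons:1", "ex:consumedBy", "site:Museum1"),
    ("cons:1", "ex:energyType", "ex:Electricity"),
    ("cons:1", "ex:period", "2010/11"),
    ("cons:1", "ex:value", "410.0"),
    ("cons:2", "ex:consumedBy", "site:Pool1"),
    ("cons:2", "ex:energyType", "ex:Electricity"),
    ("cons:2", "ex:period", "2010/11"),
    ("cons:2", "ex:value", "1200.5"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# "Electricity consumptions and time periods for sites related to culture"
QUERY = [
    ("?site", "rdf:type", "ex:CulturalSite"),
    ("?c", "ex:consumedBy", "?site"),
    ("?c", "ex:energyType", "ex:Electricity"),
    ("?c", "ex:period", "?period"),
    ("?c", "ex:value", "?value"),
]

results = [(b["?site"], b["?period"], b["?value"])
           for b in match(QUERY, TRIPLES)]
print(results)
```

If a query like this returns nothing on the generated data, either the data or the ontology mapping is missing something the intended applications need.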

Data linking

• Ensuring that data are not just “isolated islands”

• Steps
  1. To identify classes whose instances can be the subject of linking
  2. To identify data sets that may contain instances for the previously-identified classes
  3. To select the tools for performing the task
  4. To use the tool in order to obtain links

• Tools: LN2R, LD mapper, Silk, LIMES, RDF-AI, Serimi, OpenRefine

Data linking – LCC example

1. Classes: City, District
2. Data sets: DBpedia
3. Tool: OpenRefine
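
The linking step boils down to matching local labels against labels in the target data set and emitting a link for each confident match. The sketch below uses Python's `difflib` as a simple stand-in for OpenRefine's reconciliation service. The candidate table is a hypothetical local excerpt, the local City URI merely follows the naming pattern shown earlier, and the 0.85 cutoff is an arbitrary choice.

```python
from difflib import get_close_matches

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical local excerpt: normalized label -> DBpedia resource URI.
CANDIDATES = {
    "leeds": "http://dbpedia.org/resource/Leeds",
    "wakefield": "http://dbpedia.org/resource/Wakefield",
    "bradford": "http://dbpedia.org/resource/Bradford",
}


def link(local_uri, label, cutoff=0.85):
    """Return an owl:sameAs N-Triples line, or None if nothing is close."""
    normalized = label.strip().lower()
    hits = get_close_matches(normalized, list(CANDIDATES), n=1, cutoff=cutoff)
    if not hits:
        return None
    return f"<{local_uri}> <{OWL_SAME_AS}> <{CANDIDATES[hits[0]]}> ."


# Illustrative local URI, not taken from the published data set.
print(link("http://smartcity.linkeddata.es/lcc/resource/City/Leeds", " Leeds "))
print(link("http://smartcity.linkeddata.es/lcc/resource/City/York", "York"))
```

Labels that fall below the cutoff are left unlinked, so they can be reviewed manually, much like unmatched cells in a reconciliation tool.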

Data linking – LCC example: OpenRefine reconciliation

Discussion and Conclusions

• The guidelines are based on requirements from smart city stakeholders

• They address a broad scope of scenarios
  • Different data formats (databases, CSV, Excel, XML, etc.)
  • Update frequencies (static and dynamic data)
  • Legal and licensing issues

• They introduce a complete example

More information

Radulovic, F., García-Castro, R., Poveda-Villalón, M., Weise, M., Tryferdis, T.: D4.1: Requirements and guidelines for energy data generation. Technical report, READY4SmartCities Consortium, May 2014

Linked Data is just data

Benefits of linking data

• Original data + geolocation: total electric consumption
• Original data + geolocation + population: total electric consumption in locations with population > 20.000

Benefits of reasoning

• Total electric consumption in cultural buildings
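
The three aggregations above can be mimicked on toy data to show where linking and reasoning come in. All figures, site names and the small class hierarchy below are invented for illustration; they are neither LCC data nor real population numbers.

```python
# All figures are made up for illustration; not LCC or census data.
consumption = [  # (site, site type, location, electricity)
    ("Site A", "Museum", "Town X", 400.0),
    ("Site B", "Library", "Town X", 100.0),
    ("Site C", "LeisureCentre", "Village Y", 900.0),
]

# Population becomes available once locations are linked to an external set.
population = {"Town X": 50000, "Village Y": 8000}

# Hypothetical class hierarchy, as an ontology might state it.
subclass_of = {"Museum": "CulturalBuilding", "Library": "CulturalBuilding"}


def total(rows):
    return sum(value for _, _, _, value in rows)


def in_populous_locations(rows, threshold=20000):
    """Keep rows whose linked location has more inhabitants than threshold."""
    return [r for r in rows if population.get(r[2], 0) > threshold]


def is_a(site_type, target):
    """Follow subclass links upwards (a very small form of reasoning)."""
    while site_type is not None:
        if site_type == target:
            return True
        site_type = subclass_of.get(site_type)
    return False


def cultural(rows):
    return [r for r in rows if is_a(r[1], "CulturalBuilding")]


print(total(consumption))
print(total(in_populous_locations(consumption)))
print(total(cultural(consumption)))
```

The second total is only computable because of the link to population data, and the third only because the hierarchy states that museums and libraries are cultural buildings.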

Discussion and Conclusions

Discussion and Conclusions – Future work

• Development of services for facilitating the usage of Linked Data technology

• Support in adopting Linked Data technology

• Guidelines for publication and exploitation of Linked Data

• Summer school for 2015
• Other training?
