Feb 17, 2017
Smart data for smart meters
Wouter Beek
www.wouterbeek.com
Context
• Energy label / Energy Performance of Buildings Directive (EPBD)
• Possible values: A - G
• Measurements are valid for 10 years.
• Requirement when buying or renting a house.
EnergyLabels dataset in numbers
• 2,354,560 entries
• Energy index
• Electricity consumption
• Gas consumption
• License: Creative Commons 0
• Dissemination date: 2012-11-05
• Updated on a daily basis
• Issued by: Energielabels Agentschap NL
• Related dataset?: Liander Open Data, approx. 1,250,000 entries.
Linked Open Data
• Connect to existing datasets.
• Connect to services.
• Run queries across datasets.
• Perform inference across datasets.
• Easy to create mash-ups / new applications.
If it is cheap to do all of this,
only then will Linked Data be an enabler for large-scale innovation.
(disclaimer: this is a subjective claim)
Relational DB: domain knowledge
RDF files
Text files: ambiguous
XML files: depends on structure
domain knowledge
Link to external sources (linksets): domain knowledge needed
Domain-independent data conversions: fully automated
Simple RDF
Domain-dependent data conversions: domain knowledge needed
Connect to services (e.g. query interface, maps): high level of reuse
Fixing bad data: origin inconsistencies & inaccuracies
Technological contribution
• From 3-star (published, open format) to 5-star (Linked Data, URI identifiers, linked to BAG).
• Stored in 2.6 GB XML document containing one (1!) line :-)
• DOM is too big to hold in RAM.
• Convert to multi-line XML document.
• XML2RDF conversion infrastructure:
  • Create a resource using primary/rigid properties.
  • Create triples for a resource
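The two conversion steps above could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the infrastructure used in the talk: the function names, the `record_tag` / `key_tag` parameters, and the `resource/` and `def/` IRI layout are all hypothetical. The first function breaks a one-line XML file into lines chunk by chunk, so the file never has to fit in RAM; the second streams over records and emits N-Triples-style lines.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import quote


def split_tags(src, dst, chunk_size=1 << 20):
    """Copy text from src to dst, inserting a newline between adjacent tags."""
    carry = ""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        text = (carry + chunk).replace("><", ">\n<")
        # Hold back a trailing ">" so a "><" pair that straddles
        # two chunks is still split on the next iteration.
        if text.endswith(">"):
            text, carry = text[:-1], ">"
        else:
            carry = ""
        dst.write(text)
    dst.write(carry)


def record_triples(xml_file, record_tag, key_tag, base):
    """Stream over records and yield one triple per non-empty child element.

    The subject IRI is minted from the key element (the 'primary/rigid'
    property); all tag names and the IRI layout are assumptions.
    """
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != record_tag:
            continue
        key = elem.findtext(key_tag)
        if key is not None and key.strip():
            subject = f"<{base}resource/{quote(key.strip(), safe='')}>"
            for child in elem:
                value = (child.text or "").strip()
                if child.tag == key_tag or not value:
                    continue
                literal = (value.replace("\\", "\\\\")
                                .replace('"', '\\"')
                                .replace("\n", "\\n"))
                yield f'{subject} <{base}def/{child.tag}> "{literal}" .'
        # Drop the record's children; the root still keeps an empty shell.
        elem.clear()


if __name__ == "__main__":
    one_line = io.StringIO("<records><rec><id>42</id><label>A</label></rec></records>")
    multi_line = io.StringIO()
    split_tags(one_line, multi_line, chunk_size=8)
    print(multi_line.getvalue())
    for triple in record_triples(io.BytesIO(multi_line.getvalue().encode()),
                                 "rec", "id", "http://example.org/"):
        print(triple)
```

Caveat: the plain `><` replacement would also alter CDATA sections or comments that contain that character pair, so it only suits data without them.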
Application based on 5-star dataset
• [4-star] How energy-efficient is your house relative to your neighbourhood?
  • SPARQL query (30 min.)
  • HTML table (30 min.)
  • CSS heat-bar with JS tooltips (120 min.)
  • Deploy Web app (15 min.)
• [5-star] Calculate energy-efficiency relative to house surface.
  • Add a SPARQL query that retrieves data from the BAG.
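Once the SPARQL results are in hand, the two comparisons reduce to small calculations. The sketch below is only an illustration: the function names are hypothetical, the input values are made up, and it assumes that a lower energy index means a more efficient house.

```python
def share_less_efficient(own_index, neighbour_indices):
    """Fraction of neighbouring houses with a worse (higher) energy index.

    Assumption: lower energy index = more efficient.
    """
    if not neighbour_indices:
        raise ValueError("no neighbours to compare against")
    worse = sum(1 for value in neighbour_indices if value > own_index)
    return worse / len(neighbour_indices)


def consumption_per_m2(consumption, surface_m2):
    """Consumption normalised by house surface (surface as looked up in the BAG)."""
    if surface_m2 <= 0:
        raise ValueError("surface must be positive")
    return consumption / surface_m2


if __name__ == "__main__":
    # Made-up values standing in for SPARQL query results.
    print(share_less_efficient(1.5, [1.0, 2.0, 2.5, 1.5]))
    print(consumption_per_m2(1200, 100))
```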
Using Linked Data (Wouter’s Inbox)
Dear Wouter,
we gave the students of our Semantic Web class the link to the Kadaster information, and made them enthusiastic to use it. As a result, several have now built their apps around this data. But now it has been offline for several days.
Cheers,
Stefan.
Main difficulties (1/3)
Technical difficulties due to arbitrary data formatting.
• Publishing data in a sane way decreases the conversion costs considerably.
• In this use case: half of all the effort went into the one-line XML...
Main difficulties (2/3)
Institutional difficulties:
• Data publication is a short-duration visible event.
• Data maintenance is a long-duration invisible event.
“You can fool all the people some of the time, and some of the people all the time, but you cannot fool all the people all the time.”
Abraham Lincoln
Let's make some substitutions here...
“All LOD datasets are offline some of the time, and some of the LOD datasets are offline all of the time, but not all LOD datasets are offline all of the time.”
Wouter Beek
Main difficulties (3/3)
Infrastructural difficulties:
• Assuming that some LOD data is online some of the time, we must explicitly represent the network of interconnected LOD datasets, institutions, and maintainers (DC, FOAF, VoID).
• Anticipating malfunctioning datasets should be a standard part of the development API.
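One way to picture "anticipating malfunctioning datasets" at the API level is a query helper that treats an offline source as a normal outcome rather than an exception. The sketch below is a hypothetical illustration, not an existing library: `Outcome`, `query_with_fallback`, and the source names are invented for this example. Each source is a name plus a zero-argument callable, so any client (SPARQL endpoint, file dump, mirror) can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class Outcome:
    data: Optional[List[dict]]   # rows from the first source that answered
    source: Optional[str]        # name of that source, or None if all failed
    failures: Dict[str, str]     # source name -> error description


def query_with_fallback(
    sources: Sequence[Tuple[str, Callable[[], List[dict]]]]
) -> Outcome:
    """Try each source in order and record every failure along the way."""
    failures: Dict[str, str] = {}
    for name, fetch in sources:
        try:
            return Outcome(fetch(), name, failures)
        except Exception as exc:  # an offline dataset is an expected case here
            failures[name] = f"{type(exc).__name__}: {exc}"
    return Outcome(None, None, failures)


if __name__ == "__main__":
    def primary():
        raise ConnectionError("offline")

    def mirror():
        return [{"label": "B"}]

    outcome = query_with_fallback([("primary", primary), ("mirror", mirror)])
    print(outcome)
```

Returning the failures alongside the data lets an application show which dataset was down, which ties in with explicitly representing datasets and their maintainers.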
Conclusion
Only when the technical, institutional, and infrastructural problems are solved will Linked Data become an enabler for large-scale innovation.