EarthBiAs2014 Global NEST University of the Aegean Dealing with Semantic Heterogeneity in Real-Time Information Dr. Edward Curry Insight Centre for Data Analytics, National University of Ireland Galway Tuesday 8th July 2014 7-11 July 2014, Rhodes, Greece EarthBiAs2014
Dealing with Semantic Heterogeneity in Real-Time Information
Tutorial at the EarthBiAs 2014 Summer School on Dealing with Semantic Heterogeneity in Real-Time Information
Part I: Large Scale Open Environments Part II: Computational Paradigms Part III: RDF Event Processing Part IV: Theory of Event Exchange Part V: Approaches to Semantic Decoupling Part VI: Example Application: Linked Energy Intelligence
EarthBiAs2014
Global NEST
University of the Aegean
Dealing with Semantic Heterogeneity in Real-Time Information
Dr. Edward Curry
Insight Centre for Data Analytics, National University of Ireland Galway
Tuesday 8th July 2014
7-11 July 2014, Rhodes, Greece EarthBiAs2014
Talk Overview
• Part I: Large Scale Open Environments
• Part II: Computational Paradigms
• Part III: RDF Event Processing
• Part IV: Theory of Event Exchange
• Part V: Approaches to Semantic Decoupling
• Part VI: Example Application: Linked Energy Intelligence
About Me
• PhD in Computer Science (NUI Galway)
• Green and Sustainable IT Research Group Leader in DERI/Insight NUI Galway
• Researcher in both Computer Science and Information Systems
Overall Objective WATERNOMICS will provide personalised and actionable
information about water consumption and water availability to individual households, companies and cities in an intuitive
and effective manner at a time-scale relevant for decision making.
Project-Sense
Non-Technical Users
• Targets Occupants of the Building
• Non-Technical Office Workers
• No experience in Energy Management
• Low cost installation
Self-Configuration
• Collaborative system configuration
• Crowdsourced contextual data from building occupants
• Transtheoretical Model
• Gamification
• User Personalisation
• Simple non-technical user interfaces
Self-configuring smart energy management systems for small commercial buildings
European Data Forum 2014 BIG 318062
BIG Big Data Public Private Forum
The BIG Project
BIG aims to promote a well-developed EU industrial landscape in Big Data:
▶ Providing a clear picture of existing technology trends and their maturity
▶ Acquiring a sharp understanding of how Big Data can be applied to concrete environments / use cases
▶ Pushing European Big Data research and innovation to contribute to increasing European competitiveness
▶ Building a self-sustainable, industry-led initiative
Overall Objective
Work at technical, business and policy levels, shaping the future through the positioning of IIM and Big Data specifically in Horizon 2020.
Bringing the necessary stakeholders into a self-sustainable industry-led initiative, which will greatly contribute to enhancing EU competitiveness by taking full advantage of Big Data technologies.
@BYTE_EU www.byte-project.eu
Big data roadmap and cross-disciplinarY community for addressing socieTal Externalities
• The effects of a decision by stakeholders (e.g., governments, industry, scientists, policy-makers) that have an impact on a third party (especially members of the public).
• May be positive or negative
Economic
• Boost to the economy
• Innovation
• Increase efficiency
• Smaller actors left behind
• Shrink economies
Legal
• Privacy
• Data protection
• Data ownership
• Copyright
• Risks associated with inclusion & exclusion
Lots of Data
“90% of the data in the world today has been created in the last two years alone” – IBM
“The bringing together of a vast amount of data from public and private sources […] is what Big Data is all about” – IDC
“Over the next few years we’ll see the adoption of scalable frameworks and platforms for handling streaming, or near real-time, analysis and processing.” – O’Reilly
“Big Data represents a number of developments in technology that have been brewing for years and are coming to a boil. They include an explosion of data and new kinds of data, like from the Web and sensor streams; [...].” – IDC
From Rigid Schemas to Schema-less
• Heterogeneous, complex and large-scale data
• Very large and dynamic “schemas”
• Open environments: distributed, decoupled data sources, anonymous users, multi-domain, lack of global order of information flow
circa 2000: 10s-100s attributes
circa 2014: 1,000s-1,000,000s attributes
Fundamental Decentralization
• Multiple perspectives (conceptualizations) of the reality.
• Ambiguity, vagueness, inconsistency.
Current Trends
Aspect | Small scale, controlled environments | Large scale, open environments
Information sources | 10s to 100s | 1000s to millions
Data heterogeneity | Small number of schemas | High number of schemas
Users | Small number; know the environment | Large number; do not fully know the environment
User organization | Users know each other; top-down hierarchies (e.g. enterprises) | Decoupled and distributed
Dynamism | Low | High (sources and users join and leave often)
Domain | Domain specific | User interests range from domain specific to domain agnostic
COMPUTATIONAL PARADIGMS
PART II
Information Flow Processing (IFP)
• Users need to collect information
  – Produced by multiple distributed sources
  – For timely processing
  – To extract knowledge as soon as possible
Examples: Financial Continuous Analytics, RFID Inventory Management, Environmental Monitoring
Information Flow Processing (IFP)
• Processing information as it flows
  – No intermediate storage
  – New information produced
  – Raw information can be discarded
Information Flow Processing Engine
Producers Consumers
Rule managers
CUGOLA, G. AND MARGARA, A., 2011. Processing flows of information: From data stream to complex event processing. ACM Computing Surveys Journal.
Information Flow Processing (IFP)
• Requirements
  – Real-time or near real-time processing
  – Expressive language for rules
  – Scalability to large number of producers and consumers
Computational Paradigm
• Event Processing
  – Event: object representing a happening.
  – Deals with events and relations of events (e.g. inter-event sequencing, causality, etc.)
• Stream Processing
  – Stream: homogeneous and totally ordered set of data items.
  – Deals with streams and operations on streams (e.g. joins).
• Event “cloud” may contain streams of events as well as partially ordered sets of events.
  – (Cugola & Margara, 2012)
• Event processing agents, network, and rules.
Event Processing Architecture
[Figure: Producers, events E1, E2, E3, Rule, Event Processing Engine, Consumer]
Event Processing is Decoupled for Scalability
[Figure: Event Processing between event source and event consumer, with Space, Time, and Synchronization dimensions]
Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. 2003. The many faces of publish/subscribe. ACM Comput. Surv. 35, 2 (June 2003), 114-131.
Active Databases
• Traditional database systems
  – Passive
  – Store data and wait for user’s interaction
  – Reactive behaviour in the application layer
– DAYAL, U., BLAUSTEIN, B., BUCHMANN, A., CHAKRAVARTHY, U., HSU, M., LEDIN, R., MCCARTHY, D., ROSENTHAL, A., SARIN, S., CAREY, M. J., LIVNY, M., AND JAUHARI, R. 1988. The hipac project: Combining active databases and timing constraints. SIGMOD Rec. 17, 1, 51–70.
– LIEUWEN, D. F., GEHANI, N. H., AND ARLEIN, R. M. 1996. The ode active database: Trigger semantics and implementation. In Proceedings of the 12th International Conference on Data Engineering (ICDE’96). IEEE Computer Society, Los Alamitos, CA, 412–420.
– GATZIU, S. AND DITTRICH, K. 1993. Events in an active object-oriented database system. In Proceedings of the International Workshop on Rules in Database Systems (RIDS), N. Paton and H. Williams, Eds. Workshops in Computing, Springer-Verlag, Edinburgh, U.K.
– CHAKRAVARTHY, S. AND ADAIKKALAVAN, R. 2008. Events and streams: Harnessing and unleashing their synergy! In Proceedings of the 2nd International Conference on Distributed Event-Based Systems (DEBS’08). ACM, New York, NY, 1–12.
Active Databases
• Reactive behaviour moved to the database layer
• Event-Condition-Action (ECA) rules
  – Event: source. E.g. tuple inserted
  – Condition: post event. E.g. inserted.value > 5
  – Action: what to do. E.g. modify the DB
• Cons
  – Persistent storage model
  – Suitable only when updates are infrequent and rules are few
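The Event-Condition-Action structure can be illustrated with a minimal sketch. It assumes a toy in-memory table; the names `EcaRule`, `ActiveTable`, and `audit_log` are hypothetical and do not come from any of the cited systems.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class EcaRule:
    event_type: str                      # Event: what triggers the rule
    condition: Callable[[Event], bool]   # Condition: checked once the event occurs
    action: Callable[[Event], None]      # Action: what to do when the condition holds


class ActiveTable:
    """Toy in-memory table that fires ECA rules on insert (hypothetical)."""

    def __init__(self) -> None:
        self.rows: List[Event] = []
        self.rules: List[EcaRule] = []

    def insert(self, row: Event) -> None:
        self.rows.append(row)  # the data is stored first (persistent storage model)
        event = {"type": "tuple_inserted", "inserted": row}
        for rule in self.rules:
            if rule.event_type == event["type"] and rule.condition(event):
                rule.action(event)


audit_log: List[Event] = []
table = ActiveTable()
table.rules.append(EcaRule(
    event_type="tuple_inserted",
    condition=lambda e: e["inserted"]["value"] > 5,   # inserted.value > 5
    action=lambda e: audit_log.append(e["inserted"]),  # stand-in for "modify the DB"
))
table.insert({"value": 3})  # condition fails, no action
table.insert({"value": 9})  # condition holds, action runs
```

Every insert is checked against every rule, which hints at why the model fits settings with infrequent updates and few rules.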
Data Stream Management Systems
• Streams unbounded (not like tables)
• No arrival order assumptions
• Typically no storage
• Use continuous, or standing, queries
• Reactive in nature
• CHANDRASEKARAN, S., COOPER, O., DESHPANDE, A., FRANKLIN, M. J., HELLERSTEIN, J. M., HONG, W., KRISHNAMURTHY, S., MADDEN, S. R., REISS, F., AND SHAH, M. A. 2003. TelegraphCQ: Continuous dataflow processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’03). ACM, New York, NY, 668–668.
• CHEN, J., DEWITT, D. J., TIAN, F., AND WANG, Y. 2000. NiagaraCQ: A scalable continuous query system for Internet databases. SIGMOD Rec. 29, 2, 379–390.
• LIU, L., PU, C., AND TANG, W. 1999. Continual queries for internet scale event-driven information delivery. IEEE Trans. Knowl. Data Eng. 11, 4, 610–628.
• ARASU, A., BABU, S., AND WIDOM, J. 2006. The CQL continuous query language: Semantic foundations and query execution. VLDB J. 15, 2, 121–142.
Data Stream Management Systems
• Continuous queries semantics
  – Answer: append-only stream or updated store
  – Exact or approximate answer
• Cons
  – Atomic item is the stream
  – Not possible to detect sequencing or causal patterns
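A continuous (standing) query can be sketched as a generator that emits a new answer on every arrival, producing an append-only answer stream. This is only an illustration; `windowed_average` and the sample readings are hypothetical and not taken from the cited systems.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def windowed_average(stream: Iterable[float], size: int) -> Iterator[float]:
    """Standing query: on each arrival, emit the mean of the last `size` items."""
    window: Deque[float] = deque(maxlen=size)  # only the window is kept, not the stream
    for item in stream:
        window.append(item)
        yield sum(window) / len(window)


readings = [40.0, 70.0, 10.0, 30.0]          # hypothetical sensor readings
answers = list(windowed_average(readings, size=2))
```

The query treats the stream as a whole and aggregates over it; it has no notion of one item following or causing another, which is the limitation noted under Cons.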
Publish/Subscribe Systems
• Information items are notifications
• Indirect addressing-based communication scheme
• Ancestors
  – Message Passing
  – Remote Procedure Call (RPC)
  – Shared spaces
  – Message Queueing
EUGSTER, P.T., FELBER, P.A., GUERRAOUI, R. AND KERMARREC, A.M., 2003. The many faces of publish/subscribe. ACM Computing Surveys (CSUR), 35(2), pp.114–131.
MUHL, G., FIEGE, L., AND PIETZUCH, P. 2006. Distributed Event-Based Systems. Springer
Publish/Subscribe Systems
• One-to-many and many-to-many distribution mechanism
  – Allows a single producer to send a message to one consumer or to potentially hundreds of thousands of consumers
E. Curry, “Message-Oriented Middleware,” in Middleware for Communications, Q. H. Mahmoud, Ed. Chichester, England: John Wiley and Sons, 2004, pp. 1–28.
Publish/Subscribe Systems
• Topic-based pub/sub
  – Topics are groups or channels
  – Events of a topic are sent to the topic’s subscribers
  ALTHERR, M., ERZBERGER, M., AND MAFFEIS, S. 1999. iBus - a software bus middleware for the Java platform. In Proceedings of the International Workshop on Reliable Middleware Systems. 43–53.
• Content-based pub/sub
  – Matching by message filters
  – Publisher and subscriber channels are defined by the content and the subscriptions
  David S. Rosenblum and Alexander L. Wolf. 1997. A design framework for Internet-scale event observation and notification. SIGSOFT Softw. Eng. Notes 22, 6 (November 1997), 344-360. DOI=10.1145/267896.267920 http://doi.acm.org/10.1145/267896.267920
• Type-based pub/sub
  – Matching on type hierarchy
  EUGSTER, P. AND GUERRAOUI, R. 2001. Content based publish/subscribe with structural reflection. In Proceedings of the 6th Usenix Conference on Object-Oriented Technologies and Systems (COOTS’01).
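The difference between topic-based and content-based subscriptions can be sketched with a toy broker. The `Broker` class, subscriber names, and the `amount_kwh` attribute are assumptions made for illustration; type-based matching is left out for brevity.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]


class Broker:
    """Toy broker supporting topic-based and content-based subscriptions."""

    def __init__(self) -> None:
        self.topic_subs: Dict[str, List[str]] = {}
        self.content_subs: List[Tuple[str, Callable[[Event], bool]]] = []

    def subscribe_topic(self, subscriber: str, topic: str) -> None:
        self.topic_subs.setdefault(topic, []).append(subscriber)

    def subscribe_content(self, subscriber: str,
                          message_filter: Callable[[Event], bool]) -> None:
        self.content_subs.append((subscriber, message_filter))

    def publish(self, topic: str, event: Event) -> List[str]:
        # Topic-based: everyone subscribed to the channel receives the event.
        receivers = list(self.topic_subs.get(topic, []))
        # Content-based: the filter inspects the event itself, whatever the topic.
        receivers += [s for s, f in self.content_subs
                      if f(event) and s not in receivers]
        return receivers


broker = Broker()
broker.subscribe_topic("bob", "energy")
broker.subscribe_content("dan", lambda e: e.get("amount_kwh", 0) > 50)
energy_receivers = broker.publish("energy", {"amount_kwh": 70})
water_receivers = broker.publish("water", {"amount_kwh": 10})
```

In both cases the producer never addresses a consumer directly; the broker resolves the receivers, which is the indirect addressing described earlier.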
Complex Event Processing Systems
• Detection of complex patterns
  – Sequencing
  – Causal
  – Ordering in general
  – Of multiple events
  – And generate complex, or derived, events
LUCKHAM, D., 2002. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems, Addison-Wesley Professional.
Complex Event Processing Systems
Adapted from CUGOLA, G. AND MARGARA, A., 2011. Processing flows of information: From data stream to complex event processing. ACM Computing Surveys Journal.
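A sequencing pattern over multiple events that generates a derived event might look like the following sketch. It assumes events arrive ordered by a numeric `time` field; the function name and event types are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

Event = Dict[str, Any]


def detect_sequence(events: Iterable[Event], first: str, then: str,
                    within: float) -> Iterator[Event]:
    """Emit a derived event whenever `then` follows `first` within `within` time units."""
    last_first: Optional[Event] = None
    for ev in events:  # assumed to be ordered by ev["time"]
        if ev["type"] == first:
            last_first = ev
        elif ev["type"] == then and last_first is not None:
            if ev["time"] - last_first["time"] <= within:
                # The derived (complex) event keeps references to its causes.
                yield {"type": f"{first}_then_{then}", "time": ev["time"],
                       "causes": [last_first, ev]}
            last_first = None


stream = [
    {"type": "door_open", "time": 0},
    {"type": "overheating", "time": 5},
    {"type": "door_open", "time": 100},
    {"type": "overheating", "time": 500},
]
derived = list(detect_sequence(stream, "door_open", "overheating", within=60))
```

Only the first pair falls inside the 60-unit window, so a single derived event is produced.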
RDF EVENT PROCESSING
PART III
Why Linked Data for the IoT?
• Many communities struggle with closed approaches
  – E.g., pervasive computing, embedded systems, IoT, ...
• Cyber-Physical Systems are inherently “open world”
  – Prof. David Karger (MIT) in his ESWC 2013 keynote: “Semantic Web technologies support an open world assumption where millions of unforeseeable schemas may have to be integrated.”
• Simple integration with existing LOD data sets
  – Geo-spatial, governmental, media, ...
• Manageable integration effort with other graph data, e.g., Google Knowledge Graph, Facebook Graph, etc.
EU ICT OpenIoT Project
Knowledge-Based Future Internet
Step 1: Sensing-as-a-Service Request
Step 2: Sensor/Cloud Formulation
Step 3: Service Provisioning (Utility Metrics)
Infrastructure’s provider(s) (e.g., Smart City)
OpenIoT User (Citizen, Corporate)
Domain #1 to Domain #N
Middleware Core features: Open Source, Linked Data, Cloud Computing, Internet of Things, IoT Management, Data Privacy and Security, Mobility and Quality of Service
www.openiot.eu
EU ICT-2011.1.3 Contract No.: 287305
An Open Source Cloud Solution for the Internet of Things!
Open Source blueprint for large scale self-organizing cloud environments for IoT applications
Sensor Networks
• OpenIoT leverages the SoA on Internet of Things (IoT) RFID/WSN middleware frameworks.
• OpenIoT provides baseline service functionalities associated with registering and looking up internet-connected objects (ICOs) named things.
Projects using Linked Data for IoT
• Open Source IoT Architectural Blueprint: http://www.openiot.eu/ and https://github.com/OpenIotOrg/openiot
• Real-Time IoT Stream Processing and Large-scale Data Analytics for Smart Cities: http://www.ict-citypulse.eu/
• Smart, secure and cost-effective integrated IoT deployments in smart cities: http://vital-project.eu/
• Behaviour-driven Autonomous Services for smart transportation in smart cities: http://gambas-ict.eu/
THEORY OF EVENT EXCHANGE PART IV
Problem
• Event producers and consumers are semantically coupled
  – Consumers need prior knowledge of event types, attributes and values.
  – Limits scalability in heterogeneous and dynamic environments due to explicit dependencies
  – Makes it difficult to develop event processing subscriptions/rules in heterogeneous and dynamic environments.
[Figure: Producer and Consumer across the Space, Time, Synch, and Semantic dimensions]
Event Producers e.g. Sensors
• Type: Energy Consumption; Place: Room 202e; Amount: 40 kWh
• Type: Electricity Consumption; Location: Room 202e; Amount: 70 kWh
• Type: Electricity Utilized; Venue: Room 202e; Amount: 600 kWh
Traditional Event Processing: Exact Matching Model
Consumer rules:
• Type =“Energy Consumption” Place =“Room 202e”
• Type =“Electricity Consumption” Location =“Room 202e”
• Type =“Electricity Utilized” Venue =“Room 202e”
[Figure: events e1, e2, e3 flow from the event producers through the engine to the consumer]
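The cost of exact matching can be seen in a small sketch that uses the three sensor events from the slide. The helper `exact_match` is a hypothetical illustration, not code from any particular engine.

```python
from typing import Dict, List

Event = Dict[str, str]

events: List[Event] = [
    {"Type": "Energy Consumption", "Place": "Room 202e", "Amount": "40 kWh"},
    {"Type": "Electricity Consumption", "Location": "Room 202e", "Amount": "70 kWh"},
    {"Type": "Electricity Utilized", "Venue": "Room 202e", "Amount": "600 kWh"},
]


def exact_match(rule: Dict[str, str], event: Event) -> bool:
    """A rule matches only if every attribute and value is string-identical."""
    return all(event.get(attr) == value for attr, value in rule.items())


# One rule catches only the producer that happens to use the same vocabulary.
one_rule = [{"Type": "Energy Consumption", "Place": "Room 202e"}]
matched_one = [e for e in events if any(exact_match(r, e) for r in one_rule)]

# Catching all three events takes one rule per vocabulary.
three_rules = one_rule + [
    {"Type": "Electricity Consumption", "Location": "Room 202e"},
    {"Type": "Electricity Utilized", "Venue": "Room 202e"},
]
matched_all = [e for e in events if any(exact_match(r, e) for r in three_rules)]
```

The number of rules grows with the number of vocabularies in use, and each rule requires the consumer to know that vocabulary in advance.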
Event Producers e.g. Sensors
• Type: Energy Consumption; Place: Room 202e; Amount: 40 kWh
• Type: Electricity Consumption; Location: Room 202e; Amount: 70 kWh
• Type: Electricity Utilized; Venue: Room 202e; Amount: 600 kWh
Semantic Event Processing: Semantic Matching
Consumer rule:
• Type =“Energy Consumption”~ Location =“Room 202e”
[Figure: events e1, e2, e3 flow from the event producers through the engine to the consumer]
How Good are Our Paradigms?
• Scale
  – Big Volume
  – Big Velocity
  – Big Variety
• Distributed sources and consumers
• The big challenge is now in the exchange of knowledge at a very large scale
Shannon-Weaver Model
C. Shannon and W. Weaver. The mathematical theory of communication. University of Illinois Press, 1949.
Cross-Boundaries Exchange
[Figure: Syntactic, Semantic, and Pragmatic boundaries between Producer and Consumer, ranging from known environment to open environment]
P. R. Carlile. Transferring, translating, and transforming: An integrative framework for managing knowledge across boundaries. Organization Science, 15(5):555-568, 2004.
Syntactic Boundary
• Transfer is the most common type of information movement across this boundary
• A common lexicon exists
  – Move and process syntax (0’s and 1’s)
  – Dominant form of Shannon-Weaver’s theory
• E.g. Different data models of events
• E.g. Transfer RDF events over HTTP
Semantic Boundary
• Common lexicon doesn’t exist
• Lexicons evolve
• Ambiguities exist
• Translation is the process to cross this boundary
• E.g. Different ontologies for sensors
• E.g. Ontology alignment for RDF events
Pragmatic Boundary
• Actors on the sides of the boundary have:
  – Different contexts
  – Different perspectives
  – Different interests
• Transformation is the process to cross this boundary
• E.g. Temp sensor reading of 35 Celsius is acceptable from outdoor sensors but not from indoor
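The temperature example can be made concrete with a short sketch in which the same reading is interpreted against the consumer's context. The acceptable ranges below are invented for illustration and are not taken from the talk.

```python
from typing import Dict, Tuple

# Hypothetical acceptable ranges in degrees Celsius, keyed by sensor context.
ACCEPTABLE_RANGE_C: Dict[str, Tuple[float, float]] = {
    "outdoor": (-20.0, 45.0),
    "indoor": (15.0, 28.0),
}


def is_acceptable(reading_c: float, context: str) -> bool:
    """Interpret the same syntactic value differently depending on context."""
    low, high = ACCEPTABLE_RANGE_C[context]
    return low <= reading_c <= high


outdoor_ok = is_acceptable(35.0, "outdoor")
indoor_ok = is_acceptable(35.0, "indoor")
```

The value and its meaning (a temperature in Celsius) are identical in both calls; only the context changes the outcome, which is what separates the pragmatic boundary from the semantic one.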
Transfer-Translate-Transform
• Current approaches in event processing
• Transfer
  – Common event/language models
    • E.g. RDF over HTTP
• Translate
  – Agreements on schemas/thesauri/ontologies
    • E.g. DERI Energy ontology for building energy events
    • Curry, Edward, et al. "Linking building data in the cloud: Integrating cross-domain building data using linked
• Transform
  – Dedicated enrichers, joins in event languages
    • CQELS language for Linked Stream Data mashups
Decoupling for Scalability
[Figure: Event Processing between event source and event consumer, with Space, Time, and Synchronization dimensions]
Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. 2003. The many faces of publish/subscribe. ACM Comput. Surv. 35, 2 (June 2003), 114-131.
Semantic Coupling
[Figure: Event Processing between event source and event consumer, with Space, Time, and Synchronization dimensions plus Semantic Coupling on type, attributes, values]
APPROACHES TO SEMANTIC COUPLING Part V
Loosening the Semantic Coupling
• Approach 1: Content-Based with Semantic Decoupling
  – A. Carzaniga, D. S. Rosenblum, and A. L. Wolf. Achieving scalability and expressiveness in an internet-scale event notification service. In Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing, pages 219-227. ACM, 2000.
• Approach 2: Content-Based with Implicit Shared Agreements
  – David S. Rosenblum and Alexander L. Wolf. 1997. A design framework for Internet-scale event observation and notification. SIGSOFT Softw. Eng. Notes 22, 6
• Approach 3: Concept-Based
  – M. Petrovic, I. Burcea, and H.-A. Jacobsen. S-ToPSS: semantic Toronto publish/subscribe system. In Proceedings of the 29th international conference on Very large data bases - Volume 29, VLDB '03, pages 1101-1104. VLDB Endowment, 2003.
• Approach 4: Loose Semantic Coupling + Approximation
  – Hasan, S. and Curry, E., 2014. Approximate Semantic Matching of Events for The Internet of Things. ACM Transactions on Internet Technology (TOIT). In Press
• Approach 5: Theme-Based
Current Approaches
[Figure: Content-based, Concept-based, and Bottom-up Semantics approaches plotted by Semantic Decoupling and Effectiveness & Efficiency]
Approach 1: Content-Based with Semantic Decoupling
• Very low detection rate
  – High false positives/negatives
  – Low precision/recall
[Figure: Semantic De-Coupling. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in B]
Approach 1: Content-Based with Semantic Decoupling
• Use many rules to improve detection
  – Time and effort
  – Affects scalability to heterogeneous environments
[Figure: Semantic De-Coupling. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in A, Interested in B, Interested in C]
Approach 2: Content-Based with Implicit Shared Agreements
[Figure: Semantic Coupling via Implicit Agreements. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in A. Agreement reached face-to-face, or via documentation: use symbol A to describe the event]
Approach 2: Content-Based with Implicit Shared Agreements
• Implicit semantics
  – Top-down approach to semantics
  – Granular on the level of concepts
[Figure: Semantic Coupling via Implicit Agreements. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in A]
Approach 2: Content-Based with Implicit Shared Agreements
• Need for shared agreements
  – Time and effort
  – Affects scalability to heterogeneous environments
[Figure: Semantic Coupling via Implicit Agreements. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in A]
Approach 3: Concept-Based
[Figure: Semantic Coupling via Ontologies. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in B. An ontology relates concepts A, B, C, D, E, F through subClassOf]
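Concept-based matching can be sketched as a walk up a subClassOf hierarchy. The exact edges of the slide's ontology are not recoverable from the transcript, so the hierarchy below is an assumption made purely for illustration.

```python
from typing import Dict, Optional

# Hypothetical subClassOf edges (child -> parent); not the ontology from the slide.
SUBCLASS_OF: Dict[str, str] = {"A": "B", "B": "C", "F": "E", "E": "D"}


def is_a(concept: str, ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or is a transitive subclass of it."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


# A subscription to B matches an event about A because A subClassOf B.
a_matches_b = is_a("A", "B")
b_matches_a = is_a("B", "A")   # the relation is not symmetric
f_matches_b = is_a("F", "B")   # unrelated branch
```

The match is Boolean: a concept is either below the subscribed concept in the shared ontology or it is not, and both parties must agree on that ontology beforehand.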
Approach 3: Concept-Based
• Explicit semantics
  – Top-down approach to semantics
  – Granular on the level of concepts
[Figure: Semantic Coupling via Ontologies. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in B]
Approach 3: Concept-Based
• Need for shared agreements
  – Time and effort
  – Affects scalability to heterogeneous environments
[Figure: Semantic Coupling via Ontologies. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in B]
Semantics for a Complex World
• Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.
• If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.
Sahlgren, 2013
[Figure: Formal World versus Real World. Baroni et al. 2013]
Distributional Semantic Model
• Distributional hypothesis: the context surrounding a given word in a text provides relevant information about its meaning.
• Simplified semantic model.
  – Associational and quantitative.
• Explicit Semantic Analysis (ESA) is the primary distributional model used in this work.
Example text: A wife is a female partner in a marriage. The term "wife" seems to be a close term to bride, the latter is a female participant in a wedding ceremony, while a wife is a married woman during her marriage. ...
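Distributional Semantic Model
[Figure: vector space with context dimensions c1, c2, ..., cn; the words child, husband, and spouse are vectors whose coordinates (e.g. 0.7, 0.5) are a function of the number of times the words occur in each context. "Commonsense is here". (Freitas, 2012)]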
Semantic Relatedness
[Figure: angle θ between word vectors (child, husband, spouse) in the space of contexts c1, c2, ..., cn]
Works as a semantic ranking function
E.g. esa(room, building)= 0.099
E.g. esa(room, car)= 0.009
(Freitas, 2012)
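The angle θ between two word vectors is commonly turned into a relatedness score with the cosine. The sketch below assumes sparse vectors keyed by context; the contexts and weights are toy values, not real ESA vectors, so the scores will not reproduce the esa(...) figures above.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # context name -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse context vectors."""
    dot = sum(w * v.get(ctx, 0.0) for ctx, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors over hypothetical contexts.
room: Vector = {"architecture": 0.8, "house": 0.6}
building: Vector = {"architecture": 0.9, "city": 0.4}
car: Vector = {"transport": 0.9, "city": 0.3}

# Used as a ranking function: higher score means more related.
ranking = sorted(["building", "car"],
                 key=lambda w: cosine(room, {"building": building, "car": car}[w]),
                 reverse=True)
```

As on the slide, what matters is the relative order of the scores rather than their absolute values.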
Approach 4: Loose Semantic Coupling + Approximation
[Figure: Loose Semantic Coupling via Large Text Corpora. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in B ~. Terms are represented by the documents they occur in: A: d1 d2 d3 d4 d5 d6 d7 d8 ...; B: d1 d3 d4 d17 d25 d26 d77 d78 ...]
(Hasan et al., 2004)
Approach 4: Loose Semantic Coupling + Approximation
• Bottom-up model of semantics
• Global semantics: distribution vs. granular
[Figure: Loose Semantic Coupling via Large Text Corpora. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in B ~]
Approach 4: Loose Semantic Coupling + Approximation
• Low cost to scale to heterogeneous environments
• Slightly lower detection rate
[Figure: Loose Semantic Coupling via Large Text Corpora. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in B ~]
Approach 5: Theme-Based
• Can we exchange better approximations of meanings, rather than mere symbols, to improve the detection rate?
[Figure: Loose Semantic Coupling via Large Text Corpora. Producer: event happened, Publish: A Happened. Consumer: interested in event, Subscribe: Interested in B ~]
(Hasan and Curry, 2014)
Approach 5: Theme-Based
[Figure: Loose Semantic Coupling via Large Text Corpora. Producer: event happened, Publish: (A+T1) Happened. Consumer: interested in event, Subscribe: Interested in (B+T2) ~. Terms are represented by the documents they occur in: A: d1 d2 d3 d4 d5 d6 d7 d8 ...; B: d1 d3 d4 d17 d25 d26 d77 d78 ...; Theme T2]
The Thematic Approach
• Exchange approximations of meanings
[Figure: Publisher Alice sends an Event (Theme th_e, Payload); Consumer Bob registers a Subscription (Theme th_s, Expression); an Approximate matcher with Parameterization connects them]
Loose coupling mode: lightweight agreement on themes
No coupling mode: free use of well-representative themes
Hasan, S. and Curry, E., 2014. Thematic Event Processing. Middleware 2014. Under review.
Event Representation
Event
Theme: energy, appliances, building
Payload: type: increased energy consumption event, measurement unit: kilowatt per hour, device: computer, office: room 112
Subscription Representation
Subscription
Theme: power, computers
Expression: type= increased energy usage event~, device~= laptop~, office= room 112
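How the `~` operator relaxes matching can be sketched as follows. This is a simplified illustration, not the matcher from the cited papers: themes are left out, the relatedness table is a hypothetical stand-in for a distributional measure such as ESA, and combining scores by multiplication is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical relatedness scores standing in for a corpus-based measure.
TOY_RELATEDNESS = {
    frozenset(["increased energy usage event",
               "increased energy consumption event"]): 0.9,
    frozenset(["laptop", "computer"]): 0.8,
}


def relatedness(a: str, b: str) -> float:
    return 1.0 if a == b else TOY_RELATEDNESS.get(frozenset([a, b]), 0.0)


# (attribute, attribute is approximate, value, value is approximate)
Predicate = Tuple[str, bool, str, bool]


def term_score(sub_term: str, approximate: bool, event_term: str) -> float:
    """Terms marked with ~ are scored by relatedness, the rest by equality."""
    if approximate:
        return relatedness(sub_term, event_term)
    return 1.0 if sub_term == event_term else 0.0


def match_score(subscription: List[Predicate], event: Dict[str, str]) -> float:
    score = 1.0
    for attr, attr_approx, value, value_approx in subscription:
        # Pick the event tuple that best satisfies this predicate.
        best = max((term_score(attr, attr_approx, ev_attr)
                    * term_score(value, value_approx, ev_value)
                    for ev_attr, ev_value in event.items()), default=0.0)
        score *= best
    return score


event = {"type": "increased energy consumption event",
         "measurement unit": "kilowatt per hour",
         "device": "computer", "office": "room 112"}
# type= increased energy usage event~, device~= laptop~, office= room 112
subscription = [("type", False, "increased energy usage event", True),
                ("device", True, "laptop", True),
                ("office", False, "room 112", False)]
score = match_score(subscription, event)
exact_only = match_score([("type", False, "increased energy usage event", False)], event)
```

With the toy scores the subscription matches the event with a score of about 0.72, whereas the same type predicate without `~` scores 0 because the strings differ.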
Probabilistic Approximate Matcher
• Top-1 and Top-k mappings between an event and a subscription
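One way to picture top-1 and top-k mappings is to enumerate the assignments of subscription predicates to event tuples and rank them. The sketch below is a brute-force illustration under assumed similarity scores; it uses a plain product of scores rather than the probabilistic model of the cited work, and all names are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple

Mapping = Tuple[int, ...]  # mapping[i] = index of the event tuple assigned to predicate i


def top_k_mappings(scores: List[List[float]], k: int) -> List[Tuple[float, Mapping]]:
    """Rank one-to-one mappings of predicates to event tuples by the product of scores.

    scores[i][j] is the (assumed) similarity of subscription predicate i to event tuple j.
    """
    n_predicates, n_tuples = len(scores), len(scores[0])
    ranked: List[Tuple[float, Mapping]] = []
    for mapping in permutations(range(n_tuples), n_predicates):
        combined = 1.0
        for i, j in enumerate(mapping):
            combined *= scores[i][j]
        ranked.append((combined, mapping))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


# Two predicates, three event tuples, hypothetical similarity scores.
scores = [[0.9, 0.1, 0.0],
          [0.2, 0.8, 0.1]]
top = top_k_mappings(scores, k=2)
```

The top-1 mapping is the first element; keeping k alternatives lets a consumer see lower-ranked interpretations of the same event. Enumerating all permutations is only practical for the handful of tuples found in a single event.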
Building IoT Software
[Figure: architecture. IoT sensors feed Collectors for Publishers Alice, Carol, and Dave, who publish heterogeneous IoT events with thematic tags to the thematic event processing engine(s), which perform approximate single event matching. Consumers Bob (user), Dan (application developer), and Erin (application developer) subscribe with thematic tags and receive relevant events normalized for each of them. The engine sends terms + themes pairs to a semantic relatedness web service and receives a relatedness score; the service uses a vector space index built by indexing a textual corpus.]
Summary
Criterion | Simple Content-based | Content-based + Many Rules | Concept-based | Simple Distributional + Approximation | Thematic
Matching | exact string matching | exact string matching | Boolean semantic matching | approximate semantic matching | approximate semantic matching
Semantic Coupling | term-level full agreement | term-level full agreement | concept-level shared agreement | loose agreement | loose agreement
Semantics | not explicit | not explicit | top-down ontology-based | statistical model based on distributional semantics | statistical model based on distributional semantics + themes
Effectiveness | very low | 100% | depends on the domains and number of concept models | depends on the corpus | depends on the corpus + theme representatives
Cost | defining a small number of rules | defining a large number of rules | establishing shared agreement on ontologies | minimal agreement on a large textual corpus | minimal agreement on a large textual corpus + good theme representatives
Efficiency | high | high | medium to high | medium to high | medium to high
Evaluation Dataset
• Seed events synthesized from IoT sensors
• SmartSantander smart city project
  – Luis Sanchez, José Antonio Galache, Veronica Gutierrez, JM Hernandez, J Bernat, Alex Gluhak, and Tomás Garcia. 2011. SmartSantander: The meeting point between Future Internet research and experimentation and the smart cities. In Future Network & Mobile Summit (FutureNetw), 2011. IEEE, 1–8.
• Sensor Capabilities
  – solar radiation, particles, speed, wind direction, wind speed, temperature, water flow, atmospheric pressure, noise, ozone, rainfall, parking, radiation par, co, ground temperature, light, no2, soil moisture tension, relative humidity, energy consumption, cpu usage, memory usage
Hasan, S. and Curry, E., 2014. Approximate Semantic Matching of Events for The Internet of Things. ACM Transactions on Internet Technology (TOIT). In Press
Evaluation Dataset
• Seed events synthesized from IoT sensors
• Linked Energy Intelligence platform
  – Edward Curry, Souleiman Hasan, and Sean O’Riain. 2012. Enterprise energy management using a linked dataspace for Energy Intelligence. In Sustainable Internet and ICT for Sustainability (SustainIT), 2012. IEEE, 1–6.
• Car brands from the Yahoo directory
  – Yahoo! 2013. Yahoo! Directory: Automotive - Makes and Models. (2013). http://dir.yahoo.com/recreation/automotive/makes and models/
• Home based appliances from BLUED dataset
  – Kyle Anderson, Adrian Ocneanu, Diego Benitez, Derrick Carlson, Anthony Rowe, and Mario Berges. 2012. BLUED: A Fully Labeled Public Dataset for Event-Based Non-Intrusive Load Monitoring Research. In Proc. SustKDD.
• Rooms from DERI Building
  – Richard Cyganiak. 2013. Rooms in the DERI building. (2013). http://lab.linkeddata.deri.ie/2010/deri-rooms
Hasan, S. and Curry, E., 2014. Approximate Semantic Matching of Events for The Internet of Things. ACM Transactions on Internet Technology (TOIT). In Press
Evaluation
• F-score up to 95% and throughput of 1000s of events/sec
Hasan, S. and Curry, E., 2014. Approximate Semantic Matching of Events for The Internet of Things. ACM Transactions on Internet Technology (TOIT). In Press
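For readers unfamiliar with the metric, the F-score combines precision and recall of the matcher. The sketch below shows the standard definition; the slide does not say which variant was used, so the balanced F1 default (`beta=1.0`) is an assumption, and the counts are made up.

```python
def f_score(true_pos: int, false_pos: int, false_neg: int, beta: float = 1.0) -> float:
    """F-beta score from match counts; beta=1.0 gives the balanced F1."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 95 relevant events delivered, 5 irrelevant delivered, 5 relevant missed.
example = f_score(true_pos=95, false_pos=5, false_neg=5)
```

With these illustrative counts, precision and recall are both 0.95, so the F1 score is also 0.95.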
EXAMPLE APPLICATION: LINKED ENERGY INTELLIGENCE PART VI
usage of business processes, business line or group
Key Challenges
• Technology and Data Interoperability
  • Data scattered among different systems
  • Multiple incompatible technologies make it difficult to use
• Interpreting Dynamic and Static Data
  • Sensors, ERP, BMS, assets databases, …
  • Need to proactively identify efficiency opportunities
• Empowering Actions and Including Users in the Loop
  • Understanding of direct and indirect impacts of activities
  • Embedding impacts within business processes
  • Engaging Users
[Figure: Linked dataspace for Energy Intelligence covering Building, Data Center, Office IT, Logistics, and Corporate data at Organisation-level, Business Process, and Personal-level]
Linked Energy Intelligence
[Figure: Linked Energy Intelligence architecture, by layer]
Applications: Energy Analysis Model, Complex Events, Situation Awareness Apps, Energy and Sustainability Dashboards, Decision Support Systems
Linked Data
Support Services: Entity Management Service, Data Catalog, Complex Event Processing Engine, Provenance, Search & Query
Sources: Adapters
• Cloud of Energy Data
• Linked Sensor Middleware
• Resource Description Framework (RDF)
• Semantic Sensor Networks
• Constrained Application Protocol (CoAP)
• Semantic Event Processing
• Collaborative Data Mgmt.
• Energy Saving Applications
• Energy Awareness
Curry E. et al, Enterprise Energy Management using a Linked dataspace for Energy Intelligence. In: The Second IFIP Conference on Sustainable Internet and ICT for Sustainability (SustainIT) 2012.
Energy Saving ApplicaKons
Enterprise Energy Observatory
Smart Buildings Green Cloud Computing
Office IT Energy Mgmt. Personal Energy Mgmt.
Building Energy Explorer
99 of 26
1. Data from Enterprise Linked Data Cloud
2. Sensor Data
3. Building Energy SituaKon Awareness
Energy Analysis by Group
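One way to picture how sensor data and enterprise linked data combine into an energy analysis by group is an enrichment step: each reading is joined with static context before aggregation. This is a minimal sketch with made-up data; the `ROOM_TO_GROUP` table stands in for facts that would come from the enterprise linked data cloud, and all names are assumptions.

```python
from collections import defaultdict

# Hypothetical static context, standing in for the enterprise linked data cloud.
ROOM_TO_GROUP = {
    "room-101": "Group A",
    "room-102": "Group A",
    "room-201": "Group B",
}


def energy_by_group(readings):
    """Sum energy (kWh) per group after enriching each reading with its room's group."""
    totals = defaultdict(float)
    for reading in readings:
        # Readings from rooms with no known group are kept under "unknown".
        group = ROOM_TO_GROUP.get(reading["room"], "unknown")
        totals[group] += reading["kwh"]
    return dict(totals)


# Made-up sensor readings.
readings = [
    {"room": "room-101", "kwh": 1.5},
    {"room": "room-102", "kwh": 2.0},
    {"room": "room-201", "kwh": 0.5},
    {"room": "room-999", "kwh": 1.0},
]
totals = energy_by_group(readings)
print(totals)
```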
iEnergy – Personal
@WATERNOMICS_EU www.waternomics.eu
Concrete Objectives
• To introduce demand response and accountability principles (water footprint) in the water sector
• To engage consumers in new interactive and personalized ways that bring water efficiency to the forefront and lead to changes in water behaviours
• To empower corporate decision makers and municipal area managers with a water information platform together with relevant tools and methodologies to enact ICT-enabled water management programs
• To promote ICT enabled water awareness using airports and water utilities as pilot examples
• To make possible new water pricing options and policy actions by combining water availability and consumption data
WATERNOMICS will provide personalised and actionable information on water consumption and water availability to individual households, companies and cities in an intuitive & effective manner at relevant time-scales for decision making
WATERNOMICS PLATFORM ARCHITECTURE
[Figure: WATERNOMICS platform architecture]
Layers: Applications; Support Services; Linked Water Data; Sources (five Adapters)
Components: Water Analysis Model; Complex Events; Usage Model; Water Dashboards; Decision Support Systems; Entity Management Service; Data Catalog; Complex Event Processing Engine; Prediction; Search & Query
▶ Water Management Apps
▶ Water Data Analysis and Prediction
▶ Semantic Sensor Networks and Complex Event Processing to aid Decision Making
▶ Linking of data from different Water Management Systems using Linked Data / RDF
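To make the role of complex event processing in decision making more concrete, here is a minimal sliding-window sketch: it emits a complex event when the mean flow over the last few readings exceeds a threshold. The `WindowDetector` class, the event name, the window size, and the threshold are illustrative assumptions and not part of the WATERNOMICS platform.

```python
from collections import deque


class WindowDetector:
    """Emit a complex event when the mean of a full sliding window exceeds a threshold."""

    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)  # oldest reading drops out automatically
        self.threshold = threshold

    def push(self, litres_per_min):
        """Add one reading; return an event dict, or None if no event is detected."""
        self.window.append(litres_per_min)
        if len(self.window) < self.window.maxlen:
            return None  # not enough readings yet
        mean = sum(self.window) / len(self.window)
        if mean > self.threshold:
            return {"type": "HighWaterUsage", "mean_flow": mean}
        return None


# Made-up flow readings in litres per minute.
detector = WindowDetector(size=3, threshold=10.0)
events = [detector.push(v) for v in [5.0, 8.0, 12.0, 15.0, 9.0]]
print(events)
```

Here the first three readings produce no event (the window is either not full or its mean is below the threshold), while the last two windows both exceed it.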
PILOT OVERVIEW
(Columns: #, Focus, Location, Intent, Partner)
1. Focus: Water utility for domestic users. Location: Thermi. Intent: To demonstrate, validate, and assess the WATERNOMICS Platform for domestic water users.
2. Focus: Water Management Cycle in an airport. Location: Milan Linate. Intent: To demonstrate, validate, and assess the WATERNOMICS methodology and hardware innovations, and software/analysis results via the deployment of WATERNOMICS ICT.
3. Focus: Water distribution in a Municipality. Location: Sochaczew. Intent: To validate and showcase the WATERNOMICS Platform at a municipal level (i.e. mixed use consumers supplied by a water utility).
Conclusions
• Coupling necessary for crossing boundaries
• Decoupling necessary for scalable software
• Event-based systems do not address the coupling/decoupling tradeoff for semantics
• Approximate and thematic event processing exchange approximations of meaning with loose semantic coupling
• Collider – Souleiman Hasan, Kalpa Gunaratna, Yongrui Qin, and Edward Curry. 2013. Demo: approximate semantic matching in the collider event processing engine. In Proceedings of the 7th ACM international conference on Distributed event-based systems (DEBS '13). ACM, New York, NY, USA, 337-338. DOI=10.1145/2488222.2489277 http://doi.acm.org/10.1145/2488222.2489277
• Easy ESA – EasyESA is an implementation of Explicit Semantic Analysis (ESA) – http://treo.deri.ie/easyesa/
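The idea of loose semantic coupling through approximate matching can be sketched as follows: a subscription matches an event when their terms are sufficiently related, rather than identical strings. The relatedness scores below are hand-written placeholders standing in for a distributional measure such as ESA; the function names, scores, and threshold are assumptions and do not reproduce the Collider engine.

```python
# Hypothetical relatedness scores in [0, 1]; a real system would compute these
# with a semantic relatedness measure instead of a lookup table.
RELATEDNESS = {
    frozenset(["energy", "power"]): 0.8,
    frozenset(["laptop", "computer"]): 0.9,
    frozenset(["energy", "water"]): 0.2,
}


def relatedness(a, b):
    """Return 1.0 for identical terms, a table score if known, else 0.0."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset([a, b]), 0.0)


def match_score(subscription, event):
    """Average, over subscription terms, of the best relatedness to any event term."""
    if not subscription or not event:
        return 0.0
    best = [max(relatedness(s, e) for e in event) for s in subscription]
    return sum(best) / len(best)


def matches(subscription, event, threshold=0.7):
    """Approximate match: accept when the score reaches the threshold."""
    return match_score(subscription, event) >= threshold


event = ["power", "computer"]
subscription = ["energy", "laptop"]
score = match_score(subscription, event)
print(score, matches(subscription, event))
```

An exact-match engine would reject this pair because no term is shared, whereas the approximate matcher accepts it; the threshold controls how much uncertainty of meaning the subscriber is willing to tolerate.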
References
• CUGOLA, G. AND MARGARA, A., 2011. Processing flows of information: From data stream to complex event processing. ACM Computing Surveys Journal.
• EUGSTER, P.T., FELBER, P.A., GUERRAOUI, R. AND KERMARREC, A.M., 2003. The many faces of publish/subscribe. ACM Computing Surveys (CSUR), 35(2), pp.114–131.
• CARLILE, P.R., 2004. Transferring, translating, and transforming: An integrative framework for managing knowledge across boundaries. Organization Science, 15(5), pp.555–568.
• HASAN, S. AND CURRY, E., 2014. Approximate Semantic Matching of Events for The Internet of Things. ACM Transactions on Internet Technology (TOIT). In Press
• HASAN, S., O’RIAIN, S. AND CURRY, E., 2013. Towards Unified and Native Enrichment in Event Processing Systems. In the 7th ACM International Conference on Distributed Event-Based Systems (DEBS 2013). Arlington, Texas, USA: ACM.
• HASAN, S., O’RIAIN, S. AND CURRY, E., 2012. Approximate Semantic Matching of Heterogeneous Events. In 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012). Berlin, Germany: ACM, pp. 252–263.
• HASAN, S. AND CURRY, E., 2014. Thematic Event Processing. Middleware 2014. Under review.
• HASAN, S., CURRY, E., BANDUK, M., AND O’RIAIN, S., 2011. Toward Situation Awareness for the Semantic Sensor Web: Complex Event Processing with Dynamic Linked Data Enrichment. The 4th International Workshop on Semantic Sensor Networks 2011 (SSN11), pp. 60–72.
• CURRY, E., 2004. Message-Oriented Middleware. In Middleware for Communications, Q. H. Mahmoud, Ed. Chichester, England: John Wiley and Sons, pp. 1–28.
More References
• P. McFedries, The coming data deluge, IEEE Spectrum, 2011.
• LUCKHAM, D., 2002. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems, Addison-Wesley Professional.
• DAYAL, U., BLAUSTEIN, B., BUCHMANN, A., CHAKRAVARTHY, U., HSU, M., LEDIN, R., MCCARTHY, D., ROSENTHAL, A., SARIN, S., CAREY, M. J., LIVNY, M., AND JAUHARI, R. 1988. The HiPAC project: Combining active databases and timing constraints. SIGMOD Rec. 17, 1, 51–70.
• LIEUWEN, D. F., GEHANI, N. H., AND ARLEIN, R. M. 1996. The Ode active database: Trigger semantics and implementation. In Proceedings of the 12th International Conference on Data Engineering (ICDE’96). IEEE Computer Society, Los Alamitos, CA, 412–420.
• GATZIU, S. AND DITTRICH, K. 1993. Events in an active object-oriented database system. In Proceedings of the International Workshop on Rules in Database Systems (RIDS), N. Paton and H. Williams, Eds. Workshops in Computing, Springer-Verlag, Edinburgh, U.K.
• CHAKRAVARTHY, S. AND ADAIKKALAVAN, R. 2008. Events and streams: Harnessing and unleashing their synergy! In Proceedings of the 2nd International Conference on Distributed Event-Based Systems (DEBS’08). ACM, New York, NY, 1–12.
• CHANDRASEKARAN, S., COOPER, O., DESHPANDE, A., FRANKLIN, M. J., HELLERSTEIN, J. M., HONG, W., KRISHNAMURTHY, S., MADDEN, S. R., REISS, F., AND SHAH, M. A. 2003. TelegraphCQ: Continuous dataflow processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’03). ACM, New York, NY, 668–668.
• CHEN, J., DEWITT, D. J., TIAN, F., AND WANG, Y. 2000. NiagaraCQ: A scalable continuous query system for Internet databases. SIGMOD Rec. 29, 2, 379–390.
• LIU, L., PU, C., AND TANG, W. 1999. Continual queries for internet scale event-driven information delivery. IEEE Trans. Knowl. Data Eng. 11, 4, 610–628.
• ARASU, A., BABU, S., AND WIDOM, J. 2006. The CQL continuous query language: Semantic foundations and query execution. VLDB J. 15, 2, 121–142.
• MÜHL, G., FIEGE, L., AND PIETZUCH, P. 2006. Distributed Event-Based Systems. Springer.
• ALTHERR, M., ERZBERGER, M., AND MAFFEIS, S. 1999. iBus—a software bus middleware for the Java platform. In Proceedings of the International Workshop on Reliable Middleware Systems. 43–53.
More References
• David S. Rosenblum and Alexander L. Wolf. 1997. A design framework for Internet-scale event observation and notification. SIGSOFT Softw. Eng. Notes 22, 6 (November 1997), 344-360. DOI=10.1145/267896.267920 http://doi.acm.org/10.1145/267896.267920
• EUGSTER, P. AND GUERRAOUI, R. 2001. Content based publish/subscribe with structural reflection. In Proceedings of the 6th Usenix Conference on Object-Oriented Technologies and Systems (COOTS’01).
• C. Shannon and W. Weaver. The mathematical theory of communication. University of Illinois Press, 1949.
• Curry, Edward, et al. "Linking building data in the cloud: Integrating cross-domain building data using linked data." Advanced Engineering Informatics 27.2 (2013): 206-219.
• A. Carzaniga, D. S. Rosenblum, and A. L. Wolf. Achieving scalability and expressiveness in an internet-scale event notification service. In Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing, pages 219–227. ACM, 2000.
• M. Petrovic, I. Burcea, and H.-A. Jacobsen. S-ToPSS: semantic Toronto publish/subscribe system. In Proceedings of the 29th international conference on Very large data bases - Volume 29, VLDB '03, pages 1101-1104. VLDB Endowment, 2003.
• Luis Sanchez, José Antonio Galache, Veronica Gutierrez, JM Hernandez, J Bernat, Alex Gluhak, and Tomás Garcia. 2011. SmartSantander: The meeting point between Future Internet research and experimentation and the smart cities. In Future Network & Mobile Summit (FutureNetw), 2011. IEEE, 1–8.
• Edward Curry, Souleiman Hasan, and Sean O’Riain. 2012. Enterprise energy management using a linked dataspace for Energy Intelligence. In Sustainable Internet and ICT for Sustainability (SustainIT), 2012. IEEE, 1–6.
• Yahoo! 2013. Yahoo! Directory: Automotive - Makes and Models. (2013). http://dir.yahoo.com/recreation/automotive/makes and models/
• Kyle Anderson, Adrian Ocneanu, Diego Benitez, Derrick Carlson, Anthony Rowe, and Mario Berges. 2012. BLUED: A Fully Labeled Public Dataset for Event-Based Non-Intrusive Load Monitoring Research. In Proc. SustKDD.
• Richard Cyganiak. 2013. Rooms in the DERI building. (2013). http://lab.linkeddata.deri.ie/2010/deri-rooms
Credits
Green and Sustainable IT Group at Insight Galway for all their hard work.
Special thanks to Souleiman Hasan for his assistance with the Tutorial
Andre Freitas – Slides on Distributional Semantics
Prof. Manfred Hauswirth and USM at Insight Galway (LSM, OpenIoT, etc.)