Nuxeo & Semantic Technologies Stefane Fermigier - Nuxeo The Web - May 2011 Wednesday, May 25, 2011
May 11, 2015
Nuxeo & Semantic Technologies
Stefane Fermigier - Nuxeo
The Web - May 2011
Wednesday, May 25, 2011
Agenda
• A pragmatic introduction to the Semantic Web
• Experience report and demos from Nuxeo
1. Introduction to the Semantic Web
Prelude
Source: Mills Davis, “Semantic Social Computing”, Sept. 2007
History
Invented the web in 1989 (yeah!)
Invented the semantic web in 1994 (duh?)
Historical perspective
• From web 1.0: web of sites and pages, aka the World Wide Web
• To web 2.0: web of people and of participation, aka the Social Web (Blogs, RSS, tags, Facebook, Wikipedia, etc.)
• To web 3.0: web of data, of meaning and connected knowledge, aka the Semantic Web
Semantics & Ontologies
Some examples
• FOAF: relationships between people (social network)
• SIOC: relationships between websites, articles, blogs, comments
• Rich Snippets: syndicate RDFa content for SEO by Google, Yahoo
• GoodRelations: e-commerce (eBay...)
• rNews: metadata for news agencies (AFP, Reuters...)
How is it related to the Web?
The traditional Web
• A principle: hypertext
• A protocol: HTTP
• An identification scheme: URNs/URIs
• A language: HTML
“To a computer, then, the web is a flat, boring world devoid of meaning”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“This is a pity, as in fact documents on the web describe real objects and imaginary
concepts, and give particular relationships between them”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“Adding semantics to the web involves two things: allowing documents which have information in
machine-readable forms, and allowing links to be created with relationship values.”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“The Semantic Web is not a separate Web but an extension of the current one, in which information
is given well-defined meaning, better enabling computers and people to work in cooperation.”
Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web”, Scientific American, May 2001
The semantic Web
• A principle: hypertext
• A protocol: HTTP
• An identification scheme: URNs/URIs
• A language: RDF (in place of HTML)
The W3C “Layer Cake”
Already standardized
URIs and the Web of Things
• URIs (Uniform Resource Identifiers) are used to identify things (also called entities) in the real world
• For instance: people, places, events, companies, products, movies, etc.
The RDF model
Subject → Predicate → Object
RDF is used to describe relationships between objects, identified by their URIs
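A minimal sketch of the triple model in plain Python, with no RDF library. The `example.org` URIs and the `Triple` and `objects` names are hypothetical and only serve the illustration; the FOAF namespace and its `name` and `knows` terms are the real ones.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the relationship
    obj: str        # URI of another thing, or a literal value


FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical URIs, used only for illustration.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

# A graph is simply a collection of triples.
graph = [
    Triple(ALICE, FOAF + "name", "Alice"),
    Triple(ALICE, FOAF + "knows", BOB),
    Triple(BOB, FOAF + "name", "Bob"),
]


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


print(objects(graph, ALICE, FOAF + "knows"))
```

Because every node and every relationship is named by a URI, two graphs published by different sites can be merged by concatenating their triples.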
Example
Source: http://www.slideshare.net/AntidotNet/web-smantique-web-de-donnes-web-30-linked-data-quelques-repres-pour-sy-retrouver
RDF serialization
As XML:
Others, ex: N3:
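The serialized examples on the original slide are not reproduced here. As a stand-in, the sketch below writes a few triples in N-Triples, a line-based syntax that is a subset of N3. The `to_ntriples` helper and the `example.org` URIs are hypothetical, and the escaping is deliberately minimal.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_uri) tuples.

    Minimal sketch: URIs are wrapped in angle brackets, literals are
    double-quoted with only backslash, quote and newline escaped.
    """
    lines = []
    for subject, predicate, obj, obj_is_uri in triples:
        if obj_is_uri:
            obj_term = f"<{obj}>"
        else:
            escaped = (obj.replace("\\", "\\\\")
                          .replace('"', '\\"')
                          .replace("\n", "\\n"))
            obj_term = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines)


# Hypothetical data, used only for illustration.
data = [
    ("http://example.org/people/alice",
     "http://xmlns.com/foaf/0.1/name", "Alice", False),
    ("http://example.org/people/alice",
     "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/people/bob", True),
]

serialized = to_ntriples(data)
print(serialized)
```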
SPARQL
• Query language for RDF databases
• Several implementations
• OSS: Apache Jena, Sesame, 4Store, Virtuoso, Mulgara, Redland, Open Anzo...
• Proprietary: 5Store, AllegroGraph RDFStore, Stardog, Dydra, OWLIM...
• More expressive than SQL, scalability is still an open question
SPARQL Sample
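The sample from the original slide is not reproduced here. The sketch below shows an illustrative query (not the one from the talk) and a toy matcher that evaluates the same graph pattern over an in-memory list of triples. The `match` function and the `example.org` URIs are hypothetical; a real deployment would send the query to one of the stores listed above.

```python
# An illustrative SPARQL query: names of the people that someone knows.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?person foaf:knows ?friend .
  ?friend foaf:name ?name .
}
"""

FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people/alice"  # hypothetical URI
BOB = "http://example.org/people/bob"      # hypothetical URI

graph = [
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", BOB),
    (BOB, FOAF + "name", "Bob"),
]


def match(graph, patterns, binding=None):
    """Yield variable bindings satisfying every triple pattern.

    Terms starting with "?" are variables; all other terms must be equal
    to the corresponding term of the triple. This is a toy evaluator for
    basic graph patterns only, not a SPARQL engine.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(graph, rest, candidate)


# The WHERE clause of QUERY, written as triple patterns.
patterns = [
    ("?person", FOAF + "knows", "?friend"),
    ("?friend", FOAF + "name", "?name"),
]
names = [b["?name"] for b in match(graph, patterns)]
print(names)
```

The shared variable `?friend` is what joins the two patterns, much like a join condition in SQL.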
Where and how to find these data?
Solution 1: “Lift”
• One can use HTML scraping and natural language processing (NLP) techniques to extract semantic information from existing content / sites
• Generic solutions: OpenCalais, Zemanta, Apache Stanbol
• Pro: no need to change existing content
• Con: error prone, needs human checks
Example: DBpedia
Solution 2: export
• RDFa and microformats are used to embed semantic information (expressed using the RDF model) into regular web pages
• RDFa does it using existing (rel) and additional (about, property, typeof) attributes
• Microformats only use usual HTML attributes (class)
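A rough sketch of how the RDFa attributes above turn into triples, using only Python's standard `html.parser`. It is not a conforming RDFa processor: it assumes well-nested, non-void elements and keeps prefixed names such as `foaf:name` unexpanded. The `TinyRDFaParser` class and the `example.org` page are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page fragment annotated with RDFa attributes.
PAGE = """
<div about="http://example.org/people/alice" typeof="foaf:Person">
  <span property="foaf:name">Alice</span>
  <a rel="foaf:homepage" href="http://example.org/alice/">home page</a>
</div>
"""


class TinyRDFaParser(HTMLParser):
    """Collect (subject, predicate, object) tuples from a few RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subjects = []   # stack: current subject for each open element
        self.pending = None  # (subject, property) waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.subjects[-1] if self.subjects else None
        # "about" sets a new subject; otherwise inherit the parent's subject.
        subject = attrs.get("about", parent)
        self.subjects.append(subject)
        if "typeof" in attrs:
            self.triples.append((subject, "rdf:type", attrs["typeof"]))
        if "rel" in attrs and "href" in attrs:
            self.triples.append((subject, attrs["rel"], attrs["href"]))
        if "property" in attrs:
            self.pending = (subject, attrs["property"])

    def handle_data(self, data):
        # The text content of an element with "property" is the literal value.
        if self.pending and data.strip():
            subject, prop = self.pending
            self.triples.append((subject, prop, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None
        if self.subjects:
            self.subjects.pop()


parser = TinyRDFaParser()
parser.feed(PAGE)
for triple in parser.triples:
    print(triple)
```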
Solution 3: reuse
• Linked Open Data: (usually large) data repositories available on the web (for free or not), expressed using the RDF model
• Interoperability between these repositories (their ontologies) must be defined
Linked Open Data in 2007, 2008, 2009 and 2010 (one cloud diagram per year)
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Good for Enterprise apps too!
Diagram source: http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/
Why now?
Key Enablers
Open Data and Linked Online Data
Advances in automatic content analysis (linguistics, image processing) and machine learning
Classical logic and classical AI
Computing power (Moore’s law + MapReduce)
The technologies and data are available,
Let’s put them to use!
2. Nuxeo & Semantic ECM
Nuxeo
Nuxeo: an open source ECM vendor
Our Focus is Enterprise Content Management
ECM as a Platform for Content Applications
Open Source as Efficient Development Model
Modern architecture for 21st Century business
“Lean, mobile, social, interoperable”
A Social Marketplace in action
Innovation driven by community of customers, partners, and our core developers
Nuxeo ECM - From Platform to Products
• Layers: Platform, Content Infrastructure
• Nuxeo Enterprise Platform: complete set of components covering all aspects of ECM
• Nuxeo Core: lightweight, scalable, embeddable content repository
• Horizontal Packages: Document Management, Digital Asset Management, Case Management Framework, Structured Document Server, Content Aggregator
• Business Solutions: Correspondence Management, Contracts Management, Invoice Processing, Records Management
• Construction, Media, Government, Life Sciences
Major Customers
Goals
Goals for Semantic ECM
Repurpose existing content
Improve search and collaboration
Make information contextual
Extract and use information from your content
Make your content smarter!
Semantic ECM
• Content: Text, Image, Sound, Video
• Meaning: Metadata, Relations, Entities, Tags
• Reasoning
Content Stack vs. Knowledge Cake
Architectural Challenge
Business value from semantic ECM
Efficiency gains: 20% to 90% (ex: in search, collaboration)
Effectiveness gains: better returns from your assets (ex: news and images from AFP)
Strategic edge: growth, value capture, new services, gain unfair strategic advantage (ex: vertical ontologies for CEVAs / CCAs)
Demo
How does it work?
IKS project
• European project under FP7, with 13 partners (6 SMEs) and an 8.5 MEUR budget
• Goal: create a semantic software “stack” that will be used by CMS vendors to add semantic features to their products
• Started in Jan. 2009, will last until Dec. 2012
• First tangible result: Apache Stanbol, already integrated in a Nuxeo plugin
Stanbol: a semantic engine
• From unstructured content to Knowledge
• Language guessing
• Topic classification (Business, Sports, Media, ...)
• Named Entities extraction and linking
• Relationships and properties extraction
• Pluggable with proprietary engines (ex: Temis)
RESTful is Beautiful
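In a RESTful design, asking a semantic engine to analyze a text amounts to one HTTP POST. The sketch below only builds such a request with the standard library and does not send it. The host, the `/engines` path and the media types are assumptions about a typical Stanbol deployment, not details taken from these slides; check them against the server you actually run.

```python
from urllib.request import Request

# Assumption: a Stanbol instance on localhost exposing its enhancer at
# "/engines". Both the host and the path are hypothetical here.
STANBOL_ENHANCER_URL = "http://localhost:8080/engines"


def build_enhancement_request(text, url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST carrying plain text to analyze."""
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={
            # The body is raw text; the response is requested as RDF/XML.
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


request = build_enhancement_request("Tim Berners-Lee invented the web in 1989.")
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)`; the response body would then hold the extracted entities as RDF.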
Apache Stanbol = Semantic Engines (Apache OpenNLP) + Fast Linked Data local index (Apache Solr) + Semantic Rule Engine (Apache Jena)
Local IT infrastructure (LAN)
Diagram: Nuxeo DM addon, Apache Stanbol (Engine 1, Engine 2, Engine 3) and data sources (DBpedia, Freebase, Geonames, LDAP), connected by steps numbered 1 to 3
How to try it?
https://connect.nuxeo.com/nuxeo/site/marketplace/category/semantic
Notes
• Nuxeo EP 5.4.2 (next week) will have significant improvements to enable new features of the semantic plugins
• Source code here: http://hg.nuxeo.org/addons/nuxeo-platform-semantic-entities/
• Join us at the IKS Paris Workshop on July 5-6 to learn much more about Nuxeo and semantic technologies!
Resources
• http://iks-project.eu
• http://stanbol.demo.nuxeo.com
• http://incubator.apache.org/stanbol
• http://blogs.nuxeo.com/dev
• http://hadoop.apache.org/
• http://incubator.apache.org/opennlp/
Questions?
Up Next!
Live Demo - Nuxeo Studio: June 1, 2011
Building Packages for the Nuxeo Marketplace: June 8, 2011