ECM and Semantic Web: En route to Semantic ECM
Roland Benedetti, Olivier Grisel, Stefane Fermigier
Agenda
• Nuxeo in short
• Nuxeo platform overview
• Why integrate the Semantic Web with Nuxeo?
• The Scribo & IKS projects
• Demo & work delivered
• What's next?
Nuxeo, in short
• Open Source Enterprise Content Management provider with a global install base
• Our focus is Enterprise Content Management, not an on-ramp to another core offering
• ECM as a platform for content applications: technically superior, extensible, plug-in-friendly architecture
• Current ECM platform fully based on the Java environment (technology refresh in 2005)
• Open source as an efficient development model: open development process; ECM platform and components freely available to download, deploy and extend; trick-free, business-friendly licensing, etc. Innovation is driven by a community of customers, partners and our core developers
• 10 years old, Paris, Boston, San Francisco, 50+ employees
Nuxeo ECM: Our Approach
• Platform / Content Infrastructure: Nuxeo Enterprise Platform, a complete set of components covering all aspects of ECM, and Nuxeo Core, a lightweight, scalable, embeddable content repository
• Horizontal Packages: Document Management, Digital Asset Management, Case Management Framework, Structured Document Server, Content Aggregator
• Business Solutions: Correspondence Management, Contracts Management, Invoice Processing, Records Management
• Sectors: Construction, Media, Government, Life Sciences
Nuxeo, from ECM ...
... to Semantic ECM
In 2010, the technologies and data are available, but Enterprise Content Management providers hardly use them.
Let’s put them to use!
Goals for Semantic ECM
• Repurpose existing content better
• Improve search and collaboration
• Make information more contextual
• Extract and use information from content
• Leverage Open and Linked Data, and contribute to it
• Make ECM users' content smarter!
• → Gain efficiency, effectiveness and a strategic position in the ECM market
Demo
Scribo project
• Project under the French FUI program, with 9 partners and a budget of 4.7 MEUR
• Goal: develop algorithms and collaborative tools for extracting knowledge from unstructured documents and images
• Started in 2008, ending in Dec. 2010; results are already integrated as a Nuxeo plugin
IKS project
• European project under FP7, with 13 partners (6 SMEs) and an 8.5 MEUR budget
• Goal: create a semantic software “stack” that CMS vendors can use to add semantic features to their products
• Started in Jan. 2009, runs until Dec. 2012
• First tangible result: FISE, already integrated into a Nuxeo plugin
The Semantic Engine
• From unstructured content to knowledge
• Language guessing
• Topic classification (Business, Sports, Media, ...)
• Named Entities extraction and linking
• Relationships and properties extraction
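To make two of these steps concrete, here is a toy sketch of language guessing and named-entity extraction with linking. It is not FISE's implementation: it simply counts stopwords to guess the language and does a greedy longest match against a gazetteer that maps surface forms to DBpedia resource URIs. The stopword lists and the gazetteer entries are tiny hypothetical samples chosen for illustration.

```python
"""Toy sketch of two semantic-engine steps: language guessing and
dictionary-based named-entity linking. NOT FISE's actual engines; the
stopword lists and gazetteer below are tiny hypothetical samples."""
import re

# Hypothetical, deliberately tiny stopword lists used to guess the language.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in", "to"},
    "fr": {"le", "la", "et", "de", "est", "les", "dans"},
}

# Hypothetical gazetteer: lowercase surface form -> (entity type, DBpedia URI).
GAZETTEER = {
    "paris": ("Place", "http://dbpedia.org/resource/Paris"),
    "boston": ("Place", "http://dbpedia.org/resource/Boston"),
    "san francisco": ("Place", "http://dbpedia.org/resource/San_Francisco"),
}


def tokenize(text):
    """Lowercase word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def guess_language(text):
    """Return the language whose stopwords occur most often in the text."""
    tokens = tokenize(text)
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)


def link_entities(text, max_len=3):
    """Greedy longest match of token n-grams against the gazetteer.

    Returns a list of (surface form, entity type, DBpedia URI) tuples."""
    tokens = tokenize(text)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest n-gram first so "san francisco" beats "san".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n])
            if surface in GAZETTEER:
                etype, uri = GAZETTEER[surface]
                found.append((surface, etype, uri))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return found
```

For example, `link_entities("The conference is in Paris and the team is in San Francisco")` yields the Paris and San Francisco entries with their DBpedia URIs. A real engine would replace the gazetteer with a large Linked Data index and add disambiguation when one surface form maps to several resources.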
RESTful is Beautiful
Apache Stanbol = FISE + fast Linked Data local index + semantic rule engine + more?
Figure: Local IT infrastructure (LAN); (1) Nuxeo DM addon; (2) Apache Stanbol with Engine 1, Engine 2, Engine 3; (3) DBpedia, Freebase, Geonames, LDAP
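The RESTful interaction between the Nuxeo DM addon and Apache Stanbol can be sketched as a plain HTTP POST of the document text, with the enhancements coming back as RDF. This is a minimal sketch under stated assumptions: the base URL, the `/engines` path (early FISE/Stanbol builds exposed the enhancement engines there; later versions moved to a different path) and the requested RDF serialization are all assumptions to check against the REST documentation of the deployed Stanbol version. The actual addon is Java code inside Nuxeo; Python is used here only for brevity.

```python
"""Sketch of a client (standing in for the Nuxeo DM addon) that sends plain
text to an Apache Stanbol server and reads back RDF enhancements.
ASSUMPTIONS: base URL, the "/engines" path and the Accept format."""
import urllib.request

STANBOL_URL = "http://localhost:8080"  # hypothetical deployment on the LAN
ENHANCE_PATH = "/engines"              # assumed endpoint path; varies by version


def build_enhancement_request(text, base_url=STANBOL_URL, accept="application/rdf+xml"):
    """Build (but do not send) a POST request carrying the raw text."""
    return urllib.request.Request(
        base_url.rstrip("/") + ENHANCE_PATH,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=UTF-8", "Accept": accept},
        method="POST",
    )


def enhance(text, base_url=STANBOL_URL, timeout=10.0):
    """Send the request and return the serialized RDF response as a string."""
    request = build_enhancement_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes the client easy to test without a running server. Against a live instance, `enhance("Nuxeo is based in Paris")` would return the RDF graph of enhancements, which the addon can then map onto document metadata.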
What's next?
Mining Wikipedia in the cloud with Hadoop and Pig to make Natural Language Processing more efficient and improve named entity extraction
• http://blogs.nuxeo.com/dev/2011/01/mining-wikipedia-with-hadoop-and-pig-for-natural-language-processing.html
• Wikipedia as a knowledge base for training our NLP system
• DBpedia to locate and identify the entities mentioned in Wikipedia articles
• Apache OpenNLP to turn this into training material
• Apache Hadoop for distributed processing
• Apache Pig for the processing scripts, and Apache Whirr to deploy and manage the cluster on Amazon EC2
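The core idea of this pipeline can be shown on a single sentence: Wikipedia links already mark entity mentions, and a type lookup (DBpedia's role here) tells us whether each link points to a person, a location, etc. The sketch below turns a sentence with `[[wikilinks]]` into a line in the Apache OpenNLP name-finder training format, where entities are wrapped as `<START:type> ... <END>`. The `DBPEDIA_TYPES` dictionary is a hypothetical stand-in for the real DBpedia lookup, and the punctuation handling is deliberately basic; at Wikipedia scale, such a step would run as a distributed job, which is where Hadoop and Pig come in.

```python
"""Sketch: convert a Wikipedia sentence with [[wikilinks]] into an Apache
OpenNLP name-finder training line. DBPEDIA_TYPES is a hypothetical stand-in
for a DBpedia type lookup; tokenization is deliberately basic."""
import re

# Hypothetical stand-in for DBpedia: article title -> OpenNLP entity type.
DBPEDIA_TYPES = {
    "Paris": "location",
    "Victor Hugo": "person",
}

# Matches [[Target]] or [[Target|anchor text]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def to_opennlp(sentence, types=DBPEDIA_TYPES):
    """Replace typed wikilinks by <START:type> anchor <END> spans."""
    def replace(match):
        target = match.group(1).strip()
        anchor = (match.group(2) or target).strip()
        etype = types.get(target)
        if etype is None:
            return anchor  # untyped link: keep only its visible text
        return f"<START:{etype}> {anchor} <END>"

    text = WIKILINK.sub(replace, sentence)
    # OpenNLP expects whitespace-separated tokens: detach basic punctuation.
    # ":" is excluded so the "<START:type>" markers stay intact.
    text = re.sub(r"([.,;!?])", r" \1", text)
    return " ".join(text.split())
```

For instance, `to_opennlp("Visit [[Paris]], then rest.")` gives `Visit <START:location> Paris <END> , then rest .`, ready to be appended to a training file.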
Resources
• http://iks-project.eu
• http://stanbol.demo.nuxeo.com
• http://incubator.apache.org/stanbol
• http://blogs.nuxeo.com/dev
• http://hadoop.apache.org/
• http://incubator.apache.org/opennlp/
• http://incubator.apache.org/projects/whirr.html