Apr 01, 2015
12/03/2013 1
Second International Workshop on New Generation Enterprise and Business Innovation (NGEBIS 2013)
Cross Domain Crawling for Innovation
Pierluigi Assogna, Francesco Taglino
CNR-IASI (Italy)
Outline
Motivations & Objectives
Methodological approach
Technological approach
Conclusions
Motivations and Objectives
In any kind of organization, creativity and innovation come from people.
Tools aiming at supporting creativity need to be based on the most accredited theories of how people use their knowledge to act on the environment, adapt to new situations, and invent.
The method proposed here aims at providing knowledge “raw material” capable of triggering out-of-the-box ideas.
Constructivism
According to Constructivism, a person’s culture is an integrated network of concepts and models.
This network guides the person’s activity and is consolidated, enriched, and modified by each new experience.
Apart from pathological situations (e.g., schizophrenia), each person’s conceptual network remains connected.
New Paths
The connections between concepts create paths that, over time, our mind travels more or less automatically.
In new situations we have to “take the lead” and try new paths, possibly linking different and distant clusters.
This is, for instance, what “lateral thinking” methods encourage.
Knowledge Base
In general, a domain Knowledge Base (KB) is a tool for maintaining and enriching its users’ focused knowledge.
In particular, the KB’s ontology mimics their focused conceptual structure.
When users face new issues, a search on the KB or on the Net (based on the domain ontology) typically keeps them within this focused ground.
The Methodology
We propose a way to extend a focused knowledge domain so as to support departures from usual thinking paths.
We use the domain ontology to search the Net for documents that address key topics of the domain together with topics belonging to other domains.
These documents are likely to contain considerations, theories, and metaphors that link the person’s knowledge clusters with “exotic” ones, and can thus trigger out-of-the-box ideas.
Semantics-based cross-domain crawling
[Figure: architecture of the Semantic Crawler. Labels: Documental Resources Space (websites, RSS feeds, document repository); Keywords Extraction; Topic Classification; Semantic Filter; Reference Ontology; Linked Open Data cloud; relations reads, supported_by, linked_to; output: candidate interesting documents, to be manually validated.]
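The crawling pipeline in the figure can be pictured as a composition of stages. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the stage functions `extract`, `is_domain_concept` and `is_candidate` are hypothetical parameters supplied by the caller, Topic Classification is omitted, and the toy extractor simply treats every word as a concept.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stage signatures (assumptions, not an existing API).
Extractor = Callable[[str], List[Tuple[str, float]]]         # text -> [(concept, relevance)]
ConceptFilter = Callable[[str], bool]                          # Semantic Filter, step 1
DocumentFilter = Callable[[List[float], List[float]], bool]   # Semantic Filter, step 2


def crawl(documents: Iterable[Tuple[str, str]],
          extract: Extractor,
          is_domain_concept: ConceptFilter,
          is_candidate: DocumentFilter) -> List[str]:
    """Return URLs of candidate interesting documents (to be manually validated)."""
    candidates = []
    for url, text in documents:
        concepts = extract(text)
        # Split relevance values into in-domain (S1) and out-of-domain (S2) concepts.
        s1 = [rel for concept, rel in concepts if is_domain_concept(concept)]
        s2 = [rel for concept, rel in concepts if not is_domain_concept(concept)]
        if is_candidate(s1, s2):
            candidates.append(url)
    return candidates


def toy_extract(text: str) -> List[Tuple[str, float]]:
    """Toy extractor (hypothetical): every word is a concept with equal relevance."""
    words = text.split()
    return [(w, 1 / len(words)) for w in words]


docs = [("doc-a", "robot camera garden"), ("doc-b", "soup recipe")]
domain = {"robot", "camera"}
result = crawl(docs, toy_extract,
               is_domain_concept=lambda c: c in domain,
               is_candidate=lambda s1, s2: sum(s1) > 0 and sum(s2) > 0)
```

Injecting the stages keeps the sketch independent of any particular extraction service or ontology store; the toy criterion passed as `is_candidate` is only a placeholder for the thresholds of the Semantic Filter.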
Documental Resources Space
The space where we search for interesting documents: websites (e.g., the MIT website on innovations), RSS feeds, and public document repositories (e.g., BBC News).
In our example we focus on the Robotics and Machine Vision (R&MV) domain.
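Among these sources, RSS feeds are the simplest to consume. A minimal sketch, assuming RSS 2.0 feeds, of turning a feed into (title, link) pairs with Python's standard `xml.etree.ElementTree`; the sample feed and its example.org URLs are hypothetical, and fetching the feed over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def rss_items(feed_xml: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every <item> of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link:  # an item without a link gives nothing to crawl
            items.append((title, link))
    return items


# Hypothetical sample feed, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>First story</title><link>http://example.org/story-1</link></item>
<item><title>No link here</title></item>
</channel></rss>"""

links = rss_items(SAMPLE_FEED)
```

Each returned link would then be downloaded and handed to the term-extraction step.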
Linked Data
A set of principles enabling:
Standard description of data (RDF-based)
Standard way of accessing data (HTTP)
Linking resources/data to one another
Linking Open Data is a project for publishing datasets (e.g., DBpedia) as Linked Data.
The Linking Open Data cloud
[Figure: the Linking Open Data cloud diagram; label: DBpedia]
Reference ontology and bridge to the LOD cloud
Within the BIVEE project we have built a glossary of 600 concepts on R&MV.
We enriched these concepts with DBpedia entries (owl:sameAs).
Example (R&MV reference ontology concept owl:sameAs DBpedia entry):
Photodiodes owl:sameAs Photodiode (http://dbpedia.org/page/Photodiode)
Camera owl:sameAs Camera (http://dbpedia.org/page/Camera)
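The owl:sameAs bridge can be pictured as a handful of RDF triples. A minimal sketch using plain Python tuples rather than an RDF library; the `rmv:` prefix is a hypothetical name for the R&MV reference ontology, while the DBpedia URIs are those shown on the slide. The reverse index is what would let a DBpedia link returned by an extractor be traced back to reference concepts.

```python
from typing import Dict, List, Optional, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
Triple = Tuple[str, str, str]

# Bridge triples; "rmv:" is a hypothetical prefix for the R&MV reference ontology.
BRIDGE: List[Triple] = [
    ("rmv:Photodiodes", OWL_SAME_AS, "http://dbpedia.org/page/Photodiode"),
    ("rmv:Camera", OWL_SAME_AS, "http://dbpedia.org/page/Camera"),
]


def dbpedia_entry(concept: str, triples: List[Triple]) -> Optional[str]:
    """Return the DBpedia entry a reference concept is declared owl:sameAs, if any."""
    for s, p, o in triples:
        if s == concept and p == OWL_SAME_AS:
            return o
    return None


def concepts_by_entry(triples: List[Triple]) -> Dict[str, List[str]]:
    """Index reference concepts by DBpedia entry (the reverse of owl:sameAs)."""
    index: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            index.setdefault(o, []).append(s)
    return index
```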
Term extraction from the analyzed document
Extracted terms/concepts are representative of the document and, to some extent, synthesize its content.
We analyzed different tools for extracting knowledge from documents: Zemanta, AlchemyAPI, OpenCalais, FISE.
AlchemyAPI extracts concepts from a text, each with a relevance value and a link to DBpedia and other LOD datasets.
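Whatever tool is used, the Semantic Filter needs, for each extracted concept, its text, its relevance value and its DBpedia link. A minimal sketch of such a record, parsing a JSON payload whose shape (a `concepts` list with `text`, `relevance` and `dbpedia` fields) is an assumption for illustration, not a documented AlchemyAPI response; the two sample concepts and relevance values come from example 1, and the DBpedia link is illustrative.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractedConcept:
    text: str               # concept label returned by the extractor
    relevance: float        # relevance value assigned by the extractor
    dbpedia: Optional[str]  # link to the DBpedia entry, if provided


def parse_concepts(payload: str) -> List[ExtractedConcept]:
    """Parse a JSON payload of the ASSUMED shape
    {"concepts": [{"text": ..., "relevance": ..., "dbpedia": ...}, ...]}."""
    data = json.loads(payload)
    return [
        ExtractedConcept(
            text=item["text"],
            relevance=float(item["relevance"]),  # accepts "0.83" as well as 0.83
            dbpedia=item.get("dbpedia"),         # None when no link is given
        )
        for item in data.get("concepts", [])
    ]


sample = json.dumps({"concepts": [
    {"text": "Camera", "relevance": "0.831103",
     "dbpedia": "http://dbpedia.org/page/Camera"},
    {"text": "Soil", "relevance": 0.830792},
]})
concepts = parse_concepts(sample)
```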
Semantic Filter over a doc
Two steps:
1. Identify the extracted concepts related to our domain of interest
2. Identify good candidates and discard non-interesting documents
Semantic Filter over a doc: step 1
Identify the extracted concepts related to our domain of interest (e.g., R&MV).
An extracted concept ec, referring to the DBpedia entry $r_1$, is related to the domain if there exists at least one reference ontology concept rc, referring to the DBpedia entry $r_2$, such that
$$r_1 = r_2 \;\lor\; \exists\, r:\ (r_1\ \texttt{dc:subject}\ r) \land (r_2\ \texttt{dc:subject}\ r)$$
where $r$ is a resource.
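The step 1 test translates directly into code. A minimal sketch, assuming each concept's DBpedia entry and its dc:subject values have already been retrieved into a dictionary; the `cat:` subject labels below are hypothetical placeholders, not actual DBpedia data.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical dc:subject values per DBpedia entry (illustration only).
SUBJECTS: Dict[str, Set[str]] = {
    "http://dbpedia.org/page/Photodiode": {"cat:Optoelectronics"},
    "http://dbpedia.org/page/Light-emitting_diode": {"cat:Optoelectronics"},
    "http://dbpedia.org/page/Soil": {"cat:Pedology"},
}
# DBpedia entries (r2) linked to reference ontology concepts via owl:sameAs.
REFERENCE_ENTRIES = ["http://dbpedia.org/page/Photodiode",
                     "http://dbpedia.org/page/Camera"]


def is_domain_concept(r1: Optional[str],
                      reference_entries: Iterable[str],
                      subjects: Dict[str, Set[str]]) -> bool:
    """True if some reference entry r2 equals r1 or shares a dc:subject r with it."""
    if r1 is None:  # no DBpedia link: the concept cannot be matched
        return False
    r1_subjects = subjects.get(r1, set())
    for r2 in reference_entries:
        if r1 == r2 or r1_subjects & subjects.get(r2, set()):
            return True
    return False
```

Sharing any single dc:subject is a permissive criterion: with very generic subjects, many concepts may be marked as related, so in practice the set of subjects considered may need to be restricted.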
Semantic Filter over a doc: step 2
Let $S_1$ be the set of extracted concepts related to our domain, and $S_2$ the set of extracted concepts NOT related to our domain.
A document is a good candidate if both of the following hold:
$$\text{(a)}\quad t_1 < \sum_{c \in S_1} \mathrm{relVal}(c) < t_2, \qquad t_1 = 0.1,\ t_2 = 0.4$$
$$\text{(b)}\quad \sum_{c \in S_2} \mathrm{relVal}(c) > t_3, \qquad t_3 = 0.4$$
Condition (a) ensures that the analyzed document deals with our reference domain, but only to a small extent; condition (b) ensures that it deals with other topics to a considerable extent.
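The step 2 rule is a pair of threshold checks. A minimal sketch, assuming that the R&MV and Other scores printed on the example slides are the already-aggregated values of Sum(relVal(S1)) and Sum(relVal(S2)); the function name `is_good_candidate` is hypothetical.

```python
from typing import Iterable

T1, T2, T3 = 0.1, 0.4, 0.4  # thresholds given on the slide


def is_good_candidate(s1_relevance: Iterable[float],
                      s2_relevance: Iterable[float],
                      t1: float = T1, t2: float = T2, t3: float = T3) -> bool:
    """(a) t1 < Sum(relVal(S1)) < t2  AND  (b) Sum(relVal(S2)) > t3."""
    s1_total = sum(s1_relevance)
    s2_total = sum(s2_relevance)
    return t1 < s1_total < t2 and s2_total > t3


# (R&MV, Other) scores from example slides 1, 3 and 4, treated here as the
# already-aggregated sums for S1 and S2 (an assumption).
examples = {"document 1": (0.37, 0.48),
            "document 3": (0.0, 0.54),
            "document 4": (0.42, 0.14)}
verdicts = {name: is_good_candidate([rmv], [other])
            for name, (rmv, other) in examples.items()}
```

Document 2 is left out of the sample: with its printed Other score of 0.40, the strict condition (b) fails, so the slide's verdict presumably rests on an unrounded value slightly above 0.4 or on a non-strict comparison.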
Filtering: example 1
Document 1: http://news.bbc.co.uk/2/hi/science/nature/1542588.stm
Classification: science_technology | R&MV: 0.37 | Other: 0.48
Extracted concepts and relevance ((*) marks concepts related to R&MV):
Light-emitting diode (*) 0.913929
Slug 0.858322
Power (*) 0.855041
Foot-and-mouth disease 0.854825
Mucus 0.832959
Camera (*) 0.831103
Soil 0.830792
The document is about extracting energy from insects
SUGGESTED AS INTERESTING
Filtering: example 2
Document 2: http://www.bbc.co.uk/news/technology-10687701
Classification: computer_internet | R&MV: 0.11 | Other: 0.40
Extracted concepts and relevance:
Female body shape 0.967518
Body shape 0.635835
Clothing 0.476781
Human body 0.467204
Robotics (*) 0.447413
Robot (*) 0.441914
Fashion 0.342003
Physical attractiveness 0.331898
The document is about helping shoppers get the right fit when buying clothes online
SUGGESTED AS INTERESTING
Filtering: example 3
Document 3: http://www.bbc.co.uk/news/health-21965092
Classification: Health | R&MV: 0 | Other: 0.54
Extracted concepts and relevance:
Obesity 0.976225
Bariatric surgery 0.597715
Gastric bypass surgery 0.535516
Weight loss 0.479716
Bacteria 0.464552
Nutrition 0.45946
Bariatrics 0.45838
Medicine 0.374071
The document does not consider Robotics and Machine Vision at all
NOT INTERESTING document
Filtering: example 4
Document 4: http://www.bbc.co.uk/news/business-20800118
Classification: arts_entertainment | R&MV: 0.42 | Other: 0.14
Extracted concepts and relevance:
Robot (*) 0.971036
Robotics (*) 0.691485
White-collar worker 0.615792
Industrial robot (*) 0.509681
Humanoid robot (*) 0.418013
Manufacturing 0.37688
Automaton (*) 0.347331
The document is too Robotics-oriented: it can certainly be useful for experts in the Robotics field, but it does not appear to inspire lateral thinking
NOT INTERESTING document
Conclusions and Outlook
Very preliminary work on supporting lateral thinking activities
More experimentation
Using the LOD cloud as much as possible
Questions & Answers