12/03/2013 Second International Workshop on New Generation Enterprise and Business Innovation NGEBIS 2013 Cross Domain Crawling for Innovation Pierluigi Assogna, Francesco Taglino CNR-IASI (Italy)


Apr 01, 2015


Second International Workshop on New Generation Enterprise and Business Innovation
NGEBIS 2013

Cross Domain Crawling for Innovation

Pierluigi Assogna, Francesco Taglino
CNR-IASI (Italy)


Outline

Motivations & Objectives
Methodological approach
Technological approach
Conclusions


Motivations and Objectives

In any kind of organization, creativity and innovation come from people.
Tools aiming at supporting creativity need to be based on the most accredited theories of how people use their knowledge to act on the environment, adapt to new situations, and invent.

The method proposed here aims at providing knowledge “raw material” capable of triggering out-of-the-box ideas.


Constructivism

According to Constructivism, a person’s culture is an integrated network of concepts and models.
This network guides the person’s activity, and is consolidated, enriched, and modified by each new experience.
Apart from pathological situations (schizophrenia), each person’s structure is in any case connected.


New Paths

The connections between concepts create paths that, with time, our mind travels more or less automatically.
In new situations we have to “take the lead” and try new paths, possibly linking different and distant clusters.
This is, for instance, what “lateral thinking” methods encourage.


Knowledge Base

In general, a domain Knowledge Base (KB) is a tool for maintaining and enriching its users’ focused knowledge.
In particular, the KB’s ontology mimics their focused conceptual structure.
When the users are confronted with new issues, a search on the KB or on the Net (based on the domain ontology) typically keeps them within this focused ground.


The Methodology

We propose a way to extend a focused knowledge domain to support diversions from usual thinking paths.
We use the domain ontology to search the Net for documents that address key topics of the domain together with topics belonging to different domains.
These documents have a good probability of containing considerations, theories, and metaphors that link the person’s knowledge clusters with “exotic” ones, and so can trigger out-of-the-box ideas.


Semantics-based cross-domain crawling

[Figure: architecture diagram. Components: Semantic Crawler, Keywords Extraction, Topic Classification, Semantic Filter, Reference Ontology, Linked Open Data cloud, and the Documental Resources Space (websites, RSS feed, document repository). Edge labels: reads, supported_by, linked_to. Output: candidate interesting documents, to be manually validated.]


Documental Resources Space

The space where we search for interesting documents:
websites (e.g., the MIT website on innovations), RSS feeds, and public document repositories (e.g., BBC news).
In our example we focus on the Robotics and Machine Vision (R&MV) domain.


Linked Data

A set of principles to allow:
  a standard description of data (RDF-based)
  a standard way of accessing data (HTTP)
  linking resources/data to one another

Linking Open Data is a project for publishing datasets (e.g., DBpedia) in a Linked Data fashion.


The Linking Open Data cloud

[Figure: Linking Open Data cloud diagram, with DBpedia labelled]


Reference ontology and bridge to the LOD cloud

Within the BIVEE project we have built a glossary of 600 concepts on R&MV.
We enriched these concepts with DBpedia entries (owl:sameAs).

R&MV reference ontology    Relation      DBpedia
Photodiodes                owl:sameAs    Photodiode (http://dbpedia.org/page/Photodiode)
Camera                     owl:sameAs    Camera (http://dbpedia.org/page/Camera)
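
A minimal sketch of how the owl:sameAs bridge could be held in memory, assuming plain (subject, predicate, object) tuples rather than any particular RDF library. The namespace `RMV` and the helpers `page_to_resource` and `dbpedia_refs` are hypothetical names introduced for illustration; the slides do not say how the glossary is stored. The sketch also assumes the usual DBpedia convention that `/page/` URLs are the HTML view of the corresponding `/resource/` URIs.

```python
# Sketch only: the glossary namespace and helper names are assumptions,
# not part of the BIVEE reference ontology.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RMV = "http://example.org/rmv#"  # hypothetical namespace for glossary concepts


def page_to_resource(url: str) -> str:
    """Turn a DBpedia HTML page URL into the matching resource URI."""
    return url.replace(
        "http://dbpedia.org/page/", "http://dbpedia.org/resource/", 1
    )


# The two links shown on the slide, stored as triples.
triples = [
    (RMV + "Photodiodes", OWL_SAME_AS,
     page_to_resource("http://dbpedia.org/page/Photodiode")),
    (RMV + "Camera", OWL_SAME_AS,
     page_to_resource("http://dbpedia.org/page/Camera")),
]


def dbpedia_refs(concept: str) -> set[str]:
    """Return the DBpedia URIs that a glossary concept is declared sameAs."""
    return {o for s, p, o in triples if s == concept and p == OWL_SAME_AS}


if __name__ == "__main__":
    print(dbpedia_refs(RMV + "Camera"))
```

Keeping the links as explicit triples means the same lookup serves any further glossary concept without changing the code.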


Terms extraction from the analyzed document

Extracted terms/concepts are representative of the document’s content and, to some extent, synthesize it.
We analyzed different tools for extracting knowledge from documents:
  Zemanta, Alchemy, OpenCalais, FISE

AlchemyAPI extracts concepts from a text, each with:
  a relevance value
  a link to DBpedia and other LOD datasets


Semantic Filter over a doc

Two steps:
1. Identify the extracted concepts related to our domain of interest.
2. Identify good candidates and discard uninteresting documents.


Semantic Filter over a doc: step 1

Identify the extracted concepts related to our domain of interest (e.g., R&MV).
An extracted concept ec is related to the domain if there exists at least one reference ontology concept rc such that

r1 = r2
OR
(r1 dc:subject r) AND (r2 dc:subject r), where r is a resource

Here r1 is the reference to the DBpedia entry of the extracted concept ec, and r2 is the reference to the DBpedia entry of the reference ontology concept rc.
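
The step 1 condition can be sketched as a small predicate. This is an illustration under stated assumptions, not the authors' implementation: the function name `is_related` is hypothetical, the dc:subject values are passed in as a precomputed dictionary instead of being queried from DBpedia, and the category names in the sample data are made up for the example.

```python
def is_related(r1, reference_refs, subjects):
    """Check the step 1 condition for one extracted concept.

    r1             -- DBpedia reference of the extracted concept
    reference_refs -- DBpedia references of the reference ontology concepts
    subjects       -- dict: DBpedia reference -> set of dc:subject resources
    """
    subjects_r1 = subjects.get(r1, set())
    for r2 in reference_refs:
        if r1 == r2:
            return True  # same DBpedia entry
        if subjects_r1 & subjects.get(r2, set()):
            return True  # at least one shared dc:subject resource r
    return False


# Illustrative data only: the categories below are placeholders.
reference_refs = {"dbpedia:Photodiode", "dbpedia:Camera"}
subjects = {
    "dbpedia:Photodiode": {"category:Optical_diodes"},
    "dbpedia:Light-emitting_diode": {"category:Optical_diodes"},
    "dbpedia:Camera": {"category:Cameras"},
    "dbpedia:Slug": {"category:Gastropods"},
}

if __name__ == "__main__":
    for ref in ("dbpedia:Camera", "dbpedia:Light-emitting_diode", "dbpedia:Slug"):
        print(ref, is_related(ref, reference_refs, subjects))
```

With this placeholder data, Camera matches by identity, Light-emitting diode matches through a shared subject, and Slug matches nothing.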


Semantic Filter over a doc: step 2

Let S1 be the set of extracted concepts related to our domain.

Let S2 be the set of extracted concepts NOT related to our domain.

A document is a good candidate if
(a) t1 < Sum(relVal(S1)) < t2, with t1 = 0.1 and t2 = 0.4
AND
(b) Sum(relVal(S2)) > t3, with t3 = 0.4

Constraint (a) ensures that the analyzed document deals with our reference domain, but only to a small extent; constraint (b) ensures that the analyzed document deals with other topics to a considerable extent.
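
The step 2 test can be sketched as follows. The slides do not state how Sum(relVal(...)) is scaled to the R&MV and Other scores reported in the filtering examples, so this sketch takes the two aggregated scores as inputs instead of computing them; the function name `is_good_candidate` is hypothetical. The thresholds are the ones given above.

```python
T1, T2, T3 = 0.1, 0.4, 0.4  # thresholds t1, t2, t3 from the slide


def is_good_candidate(domain_score, other_score, t1=T1, t2=T2, t3=T3):
    """Apply constraints (a) and (b) to the two aggregated scores.

    domain_score -- aggregated relevance of concepts related to the domain (S1)
    other_score  -- aggregated relevance of concepts not related to it (S2)
    """
    in_domain_but_marginal = t1 < domain_score < t2  # constraint (a)
    mostly_other_topics = other_score > t3           # constraint (b)
    return in_domain_but_marginal and mostly_other_topics


if __name__ == "__main__":
    # (R&MV, Other) scores as reported for filtering examples 1, 3 and 4.
    for name, scores in {
        "example 1": (0.37, 0.48),
        "example 3": (0.0, 0.54),
        "example 4": (0.42, 0.14),
    }.items():
        print(name, is_good_candidate(*scores))
```

Example 2 is left out on purpose: its reported Other score is 0.40, which sits exactly on t3, so the strict comparison would reject it unless the unrounded value is slightly higher.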


Filtering: example 1

Document 1: http://news.bbc.co.uk/2/hi/science/nature/1542588.stm

Classification: science_technology
R&MV: 0.37    Other: 0.48

Extracted concepts and relevance:
Light-emitting diode (*)    0.913929
Slug                        0.858322
Power (*)                   0.855041
Foot-and-mouth disease      0.854825
Mucus                       0.832959
Camera (*)                  0.831103
Soil                        0.830792

The document is about extracting energy from insects.

SUGGESTED AS INTERESTING


Filtering: example 2

Document 2: http://www.bbc.co.uk/news/technology-10687701

Classification: computer_internet
R&MV: 0.11    Other: 0.40

Extracted concepts and relevance:
Female body shape          0.967518
Body shape                 0.635835
Clothing                   0.476781
Human body                 0.467204
Robotics (*)               0.447413
Robot (*)                  0.441914
Fashion                    0.342003
Physical attractiveness    0.331898

The document is about helping shoppers get the right fit when buying clothes online.

SUGGESTED AS INTERESTING


Filtering: example 3

Document 3: http://www.bbc.co.uk/news/health-21965092

Classification: Health
R&MV: 0    Other: 0.54

Extracted concepts and relevance:
Obesity                   0.976225
Bariatric surgery         0.597715
Gastric bypass surgery    0.535516
Weight loss               0.479716
Bacteria                  0.464552
Nutrition                 0.45946
Bariatrics                0.45838
Medicine                  0.374071

The document does not consider Robotics and Machine Vision at all.

NOT INTERESTING document


Filtering: example 4

Document 4: http://www.bbc.co.uk/news/business-20800118

Classification: arts_entertainment
R&MV: 0.42    Other: 0.14

Extracted concepts and relevance:
Robot (*)               0.971036
Robotics (*)            0.691485
White-collar worker     0.615792
Industrial robot (*)    0.509681
Humanoid robot (*)      0.418013
Manufacturing           0.37688
Automaton (*)           0.347331

The document is too Robotics-oriented: it can surely be useful for experts in the Robotics field, but it does not appear inspiring for lateral thinking.

NOT INTERESTING document


Conclusions and Outlook

Very preliminary work on supporting lateral thinking activities
More experimentation
Using the LOD cloud as much as possible


Questions & Answers