Goal 2
Activities 4, 6, 7
Letizia Tanca
Politecnico di Milano
Mar 26, 2015
Goal 2: Knowledge Management (PoliMI)
Activity 4: Knowledge extraction from natural language actions (PoliMI + IBM + Bari)
Activity 6: Knowledge extraction, modeling and integration from semi-structured information sources, driven by domain ontologies (PoliMI)
Activity 7: Knowledge fusion, “tailoring” and dissemination for business model redesign (PoliMI)
Context-aware Web Portal
Contextual data analysis
At GialloRosso the oenologist and the agronomist interact with the data related to harvesting and to wine ageing
– The information they interact with depends on their role and on the workflow phase
– The agronomist inserts information related to the nature of the natural phenomena
– The agronomist and the oenologist ask for information related to the phase
At BiancoRosso the sales manager:
– analyzes sales data
– at a different moment analyzes the market trends, then
– reads similar information in natural language from the web
GialloRosso performs market analyses by accessing its own information combined with market information collected by its ally BiancoRosso
GialloRosso Logical Schema
BiancoRosso Logical Schema
VINO(ID_Vino, nome, vinificazione, invecchiamento, denominazione, temperatura, min_temp, note)
EVENTO(ID_Evento, nome, tipo, data, luogo)
TRENDSETTER(ID_Trend, nome, professione)
FONTE(ID_Fonte, nome, uri, tipo, rilevanza, provenienza, descrizione)
DOCUMENTO(ID_Doc, riassunto, url, data, autore, titolo, argomento, descrittore, ID_Fonte)
VALUTAZIONEMERITO(ID_valutazione, descrizione, giudizio, lingua)
RISULTATORICERCA(ID_risultato, ID_vino, ID_evento, ID_trend, ID_fonte, ID_doc, posizione, ID_valutazione)
Context-aware data tailoring
Data tailoring via view composition
Context Dimension Tree
Some relevant areas
AT GIALLOROSSO THE OENOLOGIST AND THE AGRONOMIST INTERACT WITH THE DATA RELATED TO CULTIVATION AND TO THE CELLAR
A PORTION OF THE CDT OF OUR SCENARIO
[Figure: portion of the Context Dimension Tree, including the node "oenologist"]
Some contextual views
C1 = <role=agronomist, *, phase=harvesting>
C2 = <role=agronomist, *, phase=ageing>
C3 = <role=oenologist, *, phase=harvesting>
C4 = <role=oenologist, *, phase=ageing>
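The four contexts can be read as assignments of values to CDT dimensions, with `*` leaving a dimension unconstrained. The following is a minimal sketch of that reading. The name `other` for the unnamed middle dimension, the sample value `vineyard-7`, and the equality-based matching are assumptions made for illustration, not something stated on the slides.

```python
from typing import NamedTuple

WILDCARD = "*"

class Context(NamedTuple):
    role: str
    other: str   # hypothetical name for the unnamed middle dimension
    phase: str

def matches(pattern: Context, actual: Context) -> bool:
    """True when every non-wildcard value of `pattern` equals the value in `actual`."""
    return all(p == WILDCARD or p == a for p, a in zip(pattern, actual))

CONTEXTS = {
    "C1": Context("agronomist", WILDCARD, "harvesting"),
    "C2": Context("agronomist", WILDCARD, "ageing"),
    "C3": Context("oenologist", WILDCARD, "harvesting"),
    "C4": Context("oenologist", WILDCARD, "ageing"),
}

# Hypothetical current situation of a user of the portal.
current = Context("agronomist", "vineyard-7", "harvesting")
active = [name for name, pattern in CONTEXTS.items() if matches(pattern, current)]
print(active)
```

Keeping the wildcard explicit makes it possible to add further dimensions later without touching the matching function.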
Some contextual queries
The agronomist during the harvesting phase (context C1) wants to collect all the available information coming from sensors:
SELECT m.date_time, m.value, s.s_id, s.meas_unit
FROM sensor s, measure_data m
WHERE s.s_id = m.s_id;
S/he obtains only the information from sensors placed in the vineyards (see Rel(C1))
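One way to obtain this behaviour is to run the unchanged query against a context-specific view instead of the base table. The sketch below uses an in-memory SQLite database. The `sensor_base` table, its `location` column, the sample rows, and the view standing in for Rel(C1) are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensor_base (s_id INTEGER PRIMARY KEY, meas_unit TEXT, location TEXT);
CREATE TABLE measure_data (s_id INTEGER, date_time TEXT, value REAL);
INSERT INTO sensor_base VALUES (1, 'C', 'vineyard'), (2, '%RH', 'cellar');
INSERT INTO measure_data VALUES (1, '2009-09-15 10:00', 24.5),
                                (2, '2009-09-15 10:00', 70.0);
-- Assumed stand-in for Rel(C1): only vineyard sensors are visible.
CREATE VIEW sensor AS
    SELECT s_id, meas_unit FROM sensor_base WHERE location = 'vineyard';
""")

# The agronomist's query, unchanged: it now sees only the tailored view.
rows = conn.execute(
    "SELECT m.date_time, m.value, s.s_id, s.meas_unit "
    "FROM sensor s, measure_data m WHERE s.s_id = m.s_id"
).fetchall()
print(rows)
```

The cellar sensor never reaches the result because it is filtered out by the view, not by the user's query.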
Some contextual queries
The oenologist during the harvesting phase (context C3) wants to collect all the available information about bottles of “Aglianico” wine:
SELECT * FROM bottle b WHERE b.appellation = 'aglianico';
But the query is out of context: in context C3 only information about vineyards and grapevines is available to the oenologist.
Some more contextual queries
The previous query makes sense in context C4, where the oenologist is in the ageing phase:
SELECT * FROM bottle b WHERE b.appellation = 'aglianico';
It produces a non-empty result.
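The difference between C3 and C4 can be sketched as a check of the tables a query touches against the relations visible in each context. The `REL` mapping below is an assumption pieced together from the wording of the slides (vineyard and grapevine data in C3, bottle data in C4), and the regex-based table extraction is a deliberate simplification, not a SQL parser.

```python
import re
from typing import Set

# Assumed relevant areas per context; the real Rel(C) sets are not listed on the slides.
REL = {
    "C3": {"vineyard", "grapevine"},
    "C4": {"bottle"},
}

def tables_in(query: str) -> Set[str]:
    """Naively collect table names from a comma-separated FROM clause."""
    match = re.search(r"\bFROM\b(.*?)(?:\bWHERE\b|;|$)", query,
                      flags=re.IGNORECASE | re.DOTALL)
    if not match:
        return set()
    return {part.split()[0].lower()
            for part in match.group(1).split(",") if part.strip()}

def in_context(query: str, context: str) -> bool:
    """A query is in context when every table it uses is visible in that context."""
    return tables_in(query) <= REL[context]

query = "SELECT * FROM bottle b WHERE b.appellation = 'aglianico';"
print(in_context(query, "C3"), in_context(query, "C4"))
```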
AT BIANCOROSSO:
1. THE SALES MANAGER ANALYZES SALES DATA
2. THE OENOLOGIST ANALYZES WINE FEATURES TO DESIGN A NEW WINE
3. THEN S/HE READS SIMILAR INFORMATION IN NATURAL LANGUAGE FROM THE WEB
4. ALSO INTENSIONAL QUERIES ARE PERFORMED
Sales and promotions planning (Q1)
Sales and promotions planning for events and festivals
The sales manager of BiancoRosso wants to select the wines to promote for each event or festival
– For each event or type of event he/she needs to identify the most related wines
– Interesting wines for each event can be obtained by analyzing frequent rules in the form
  • EventType=value → Wine=value
  • E.g., EventType=“Summer party” → Wine=“White wine” support=20%, confidence=36%
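Support and confidence of a rule such as EventType=“Summer party” → Wine=“White wine” can be computed by counting matching records. The sketch below does this on invented toy records; the numbers are not BiancoRosso data, and the dict-per-record layout is an assumption.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of all records matching both sides
    confidence = fraction of antecedent-matching records that also match the consequent
    """
    n_antecedent = sum(1 for r in records if antecedent.items() <= r.items())
    n_both = sum(1 for r in records
                 if antecedent.items() <= r.items() and consequent.items() <= r.items())
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Invented toy data: 10 records pairing an event type with a wine.
records = (
    [{"EventType": "Summer party", "Wine": "White wine"}] * 2
    + [{"EventType": "Summer party", "Wine": "Red wine"}] * 3
    + [{"EventType": "Wine fair", "Wine": "Red wine"}] * 5
)

support, confidence = rule_metrics(
    records, {"EventType": "Summer party"}, {"Wine": "White wine"}
)
print(f"support={support:.0%}, confidence={confidence:.0%}")
```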
Sales and promotions planning (Q2)
Sales and promotions planning depending on time periods
The sales manager wants to plan specific promotions for each time period of the year
– For each time period (e.g., month) the manager needs to select the most related wines
– Interesting wines can be obtained by analyzing frequent rules in the form
  • Month=value → Wine=value
  • E.g., Month=“June” → Wine=“White wine” support=20%, confidence=36%
Design of wine (Q3)
Analysis of the main characteristics of wines
The oenologist of BiancoRosso wants to produce new wines
He/she needs to know the main characteristics of each wine to select the most interesting wines to produce
– He/she obtains the characteristics of each wine by exploiting rules in the form
  • Wine=value → Characteristic=value
  • E.g., Wine=“White wine” → Characteristic=“Mainly drunk in a specific time period” support=6%, confidence=100%
Design of wine (Q4)
Identification of correlations between wines and time periods
The time period in which each wine is mainly consumed is useful to select the wines to produce
For each wine the oenologist wants to obtain the time period (e.g., month) in which the wine is mainly consumed
– This allows selecting wines related to time periods not already covered by the wines currently produced by BiancoRosso
– He/she uses rules in the form
  • Wine=value → Month=value
  • E.g., Wine=“White wine” → Month=“June” support=20%, confidence=100%
Design of wine (Q5)
Identification of correlations between wines and information sources
Once the oenologist has selected the new wines to be produced, he/she needs to identify the sources containing documents related to the selected wines
– The oenologist identifies the sources containing information about the wines of his/her interest by exploiting the following rules
  • Wine=value → Source=value
  • E.g., Wine=“Montello e colli asolani cabernet superiore” → Source=“Gambero Rosso” support=11%, confidence=100%
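The rules used in Q1 to Q5 all have the shape A=value → B=value over a pair of attributes, so they can be enumerated with two counters and filtered by thresholds. The sketch below is a plain counting approach on invented (wine, source) pairs, not the mining tool used in the project. The wine names "Wine A", "Wine B" and "Wine C" and the thresholds are hypothetical.

```python
from collections import Counter

def mine_pair_rules(pairs, min_support, min_confidence):
    """Enumerate rules left -> right over (left, right) pairs above both thresholds."""
    total = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(left for left, _ in pairs)
    rules = []
    for (left, right), count in pair_counts.items():
        support = count / total
        confidence = count / left_counts[left]
        if support >= min_support and confidence >= min_confidence:
            rules.append((left, right, support, confidence))
    return sorted(rules)

# Invented toy data: which source mentions which wine.
pairs = [
    ("Wine A", "Gambero Rosso"),
    ("Wine A", "Gambero Rosso"),
    ("Wine B", "Gambero Rosso"),
    ("Wine B", "Percorsi di Vino"),
    ("Wine C", "Percorsi di Vino"),
]

rules = mine_pair_rules(pairs, min_support=0.2, min_confidence=1.0)
for wine, source, support, confidence in rules:
    print(f"Wine={wine!r} -> Source={source!r} "
          f"support={support:.0%}, confidence={confidence:.0%}")
```

Swapping the order of the pair elements gives the reverse rule direction (for example Month → Wine instead of Wine → Month) with the same support but a different confidence.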
DIESIRAE
A semantic search engine based on Natural Language Processing
Knowledge Management
Knowledge Indexing & Extraction: Goals
Domain model: Ontology (W3C OWL standard)
– Describes the concepts of the domain
Domain vocabulary: Semantic Network
– Describes the lemmas of the domain
Mapping model: Stochastic model
– 2nd-order HMM-inspired model
– Transition probabilities approximated by means of MaxEnt models
– Solves mapping ambiguities
Queries:
– Keyword-based (AND/OR; max probability/exhaustive)
– Phrase-based (Disambiguated Word queries and Ontological queries)
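To give a feel for how a stochastic mapping model can resolve ambiguities between lemmas and concepts, here is a sketch of standard first-order Viterbi decoding. It is a simplification, not the model described above: the slides refer to a 2nd-order, HMM-inspired model with MaxEnt-estimated transitions. The concept names and all probabilities below are invented.

```python
def viterbi(lemmas, concepts, start_p, trans_p, emit_p):
    """Return the most probable concept sequence for a lemma sequence."""
    # best[i][c] = (probability of the best path ending in concept c at position i,
    #               previous concept on that path)
    best = [{c: (start_p[c] * emit_p[c].get(lemmas[0], 0.0), None) for c in concepts}]
    for lemma in lemmas[1:]:
        column = {}
        for c in concepts:
            column[c] = max(
                (best[-1][p][0] * trans_p[p][c] * emit_p[c].get(lemma, 0.0), p)
                for p in concepts
            )
        best.append(column)
    # Backtrack from the most probable final concept.
    path = [max(concepts, key=lambda c: best[-1][c][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))

# Hypothetical concepts and hand-made probabilities.
concepts = ["LightBody", "Illumination", "Wine"]
start_p = {"LightBody": 0.3, "Illumination": 0.5, "Wine": 0.2}
trans_p = {
    "LightBody": {"LightBody": 0.1, "Illumination": 0.1, "Wine": 0.8},
    "Illumination": {"LightBody": 0.1, "Illumination": 0.8, "Wine": 0.1},
    "Wine": {"LightBody": 0.4, "Illumination": 0.4, "Wine": 0.2},
}
emit_p = {
    "LightBody": {"light": 1.0},
    "Illumination": {"light": 1.0},
    "Wine": {"wine": 1.0},
}

decoded = viterbi(["light", "wine"], concepts, start_p, trans_p, emit_p)
print(decoded)
```

Although "Illumination" has the higher a-priori probability for the lemma "light", the following lemma "wine" makes the "LightBody" reading more probable overall.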
Knowledge indexing & extraction: Functionalities
Training; indexing, querying, and extending
Knowledge indexing & extraction: Information Extraction Engine
Linguistic Context Extractor:
– Calls linguistic tools (Stanford Parser, FreeLing, JavaRAP, …)
– words Wi → (lemmas Li, linguistic context information Ii)
MaxEnt Models:
– Calculate HMM transition probabilities (taking into account the linguistic context info)
Extended Viterbi:
– (Li, Ii) → concepts Ci
TF-IDF:
– Document ranking, based on concept frequencies
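The final ranking step can be illustrated with a textbook TF-IDF score computed over concepts rather than words. The weighting variant (relative term frequency times log inverse document frequency), the document identifiers and the concept lists are assumptions; the slides do not say which TF-IDF variant the engine uses.

```python
import math
from collections import Counter

def tfidf_rank(doc_concepts, query_concepts):
    """Rank document ids by summed TF-IDF of the query concepts."""
    n_docs = len(doc_concepts)
    # Document frequency: in how many documents each concept occurs.
    df = Counter(c for concepts in doc_concepts.values() for c in set(concepts))
    scores = {}
    for doc_id, concepts in doc_concepts.items():
        tf = Counter(concepts)
        scores[doc_id] = sum(
            (tf[c] / len(concepts)) * math.log(n_docs / df[c])
            for c in query_concepts
            if tf[c]
        )
    return sorted(scores, key=scores.get, reverse=True), scores

# Invented index: concepts already extracted from three documents.
doc_concepts = {
    "d1": ["Tannin", "Tannin", "Wine"],
    "d2": ["Wine", "Grape"],
    "d3": ["Grape", "Vineyard", "Wine"],
}

ranking, scores = tfidf_rank(doc_concepts, ["Tannin", "Wine"])
print(ranking)
```

"Wine" occurs in every document, so its inverse document frequency is zero and it does not influence the ranking; only the rarer concept "Tannin" does.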
Art deco Wine Domain Ontology
Keyword-based queries
Sequence of isolated words
– No linguistic structure
Exhaustive AND/OR keywords
– No concept disambiguation
– Searches for multiple tuples
– Example: light wine → several meanings found…
  country wine → search for instances…
  taste wine → search for subclasses…
Max probability AND/OR keywords
– Searches for a single tuple
– Exploits the a-priori concept probabilities
– Example: [light wine] → max probability meaning
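The two keyword modes differ in how many concept tuples they produce. The sketch below assumes that each keyword maps to candidate concepts with a-priori probabilities; the candidate table and its numbers are invented, and AND/OR combination of the resulting tuples is left out.

```python
from itertools import product

# Hypothetical vocabulary: keyword -> {candidate concept: a-priori probability}.
CANDIDATES = {
    "light": {"LightBody": 0.6, "Illumination": 0.4},
    "wine": {"Wine": 1.0},
}

def exhaustive(keywords):
    """No disambiguation: every combination of candidate concepts is searched."""
    return list(product(*(CANDIDATES[k] for k in keywords)))

def max_probability(keywords):
    """Single tuple: the most probable concept for each keyword."""
    return tuple(max(CANDIDATES[k], key=CANDIDATES[k].get) for k in keywords)

all_tuples = exhaustive(["light", "wine"])
best_tuple = max_probability(["light", "wine"])
print(all_tuples)
print(best_tuple)
```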
Phrase-based queries
Phrase
– Linguistic structure
– Context-based disambiguation
Disambiguated Word queries
– Context used for concept disambiguation
  • Index the phrase (→ extract concepts)
  • Search for AND-ed concepts
– Example: (fruit taste) → disambiguates fruit
Ontological queries
– Context used to select the request to the ontology
  • Index the sentences
  • Select the request; search the ontology for the mapped concepts
– Example: “type of tannins in wine” → instance list
GIALLOROSSO PERFORMS MARKET ANALYSES BY ACCESSING ITS OWN INFORMATION COMBINED WITH MARKET INFORMATION COLLECTED BY ITS ALLY BIANCOROSSO
The Integration problem from the user point of view
[Figure: the User and the APPLICATION exchange queries and answers with the GLOBAL KNOWLEDGE INTERFACE, which accesses DATA SOURCE 1 (RDBMS), DATA SOURCE 2 (XML), DATA SOURCE 3 (WWW) and DATA SOURCE 4 (Base station)]
Information integration in ART DECO
Knowledge retrieval from the sources
In order to integrate the two original sources, we define the following query to populate the ontology:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
SELECT ?w1 ?w2 ?wn1 ?wn2 ?wb ?bq ?dse ?dso ?sn
FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
WHERE {
  ?w1 rdf:type do:WineInFarm .
  ?wb rdf:type do:WineBottle .
  ?wb do:containsWine ?w1 .
  ?wb do:bottleQuantity ?bq .
  ?w1 do:appellationInFarm ?wn1 .
  ?w2 do:appellationInDocument ?wn2 .
  ?w2 rdf:type do:WineInDocument .
  ?dse rdf:type do:DocSearch .
  ?dso rdf:type do:DocSource .
  ?dse do:searchWineID ?w2 .
  ?dse do:searchSrcID ?dso .
  ?dso do:docSrcName ?sn .
}
Query 1
Quantity of bottles (in the GialloRosso DB) available for each wine cited by the web source “Percorsi di Vino” (stored in the BiancoRosso DB):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
SELECT ?wine_name (SUM(?bottle_quantity) AS ?total_quantity)
FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
WHERE {
  ?w1 rdf:type do:WineInFarm .
  ?wb rdf:type do:WineBottle .
  ?wb do:containsWine ?w1 .
  ?wb do:bottleQuantity ?bottle_quantity .
  ?w1 do:appellationInFarm ?wn1 .
  ?w2 do:appellationInDocument ?wine_name .
  ?w2 rdf:type do:WineInDocument .
  ?dse rdf:type do:DocSearch .
  ?dso rdf:type do:DocSource .
  ?dse do:searchWineID ?w2 .
  ?dse do:searchSrcID ?dso .
  ?dso do:docSrcName ?source_name .
  FILTER regex(?source_name, "PercorsiDiVino")
  FILTER fn:contains(?wine_name, ?wn1)
}
GROUP BY ?wine_name ?source_name
Query 2
Which sources (from BiancoRosso) cite wines of which we (GialloRosso) have at least one bottle available?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
SELECT ?wine_name ?source_name
FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
WHERE {
  ?w1 rdf:type do:WineInFarm .
  ?wb rdf:type do:WineBottle .
  ?wb do:containsWine ?w1 .
  ?wb do:bottleQuantity ?bottle_quantity .
  ?w1 do:appellationInFarm ?wn1 .
  ?w2 do:appellationInDocument ?wine_name .
  ?w2 rdf:type do:WineInDocument .
  ?dse rdf:type do:DocSearch .
  ?dso rdf:type do:DocSource .
  ?dse do:searchWineID ?w2 .
  ?dse do:searchSrcID ?dso .
  ?dso do:docSrcName ?source_name .
  FILTER (?bottle_quantity > 0)
  FILTER fn:contains(?wine_name, ?wn1)
}
GROUP BY ?wine_name ?source_name
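The join performed by Query 2 can be mimicked on plain in-memory records to make its two filters explicit: a positive bottle quantity on the GialloRosso side, and a substring match between the appellation found in a document and the appellation used in the farm. The record layout and the sample rows are invented; only the filter logic mirrors the query.

```python
# Invented stand-ins for the GialloRosso side (bottles per wine)...
farm_wines = [
    {"appellation": "Aglianico", "bottles": 120},
    {"appellation": "Fiano", "bottles": 0},
]
# ...and for the BiancoRosso side (wines cited by web sources).
doc_mentions = [
    {"wine": "Aglianico del Vulture", "source": "Percorsi di Vino"},
    {"wine": "Fiano di Avellino", "source": "Gambero Rosso"},
]

def sources_citing_available_wines(farm, mentions):
    """Distinct (wine_name, source_name) pairs, as in the GROUP BY of Query 2."""
    return sorted({
        (m["wine"], m["source"])
        for m in mentions
        for f in farm
        if f["bottles"] > 0                 # FILTER (?bottle_quantity > 0)
        and f["appellation"] in m["wine"]   # FILTER fn:contains(?wine_name, ?wn1)
    })

result = sources_citing_available_wines(farm_wines, doc_mentions)
print(result)
```

The substring match is what bridges the two sources: the document appellation may be longer than the one stored in the farm database.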
Q & A
(If you see this slide we’ve not run out of time)
Part 3 of the book
Ontology-based knowledge elicitation: an architecture (Chapter editors Licia Sbattella, Roberto Tedesco, Giorgio Orsi, Politecnico di Milano, Marcello Montedoro, IBM Italia)
Knowledge extraction from Natural Language (Chapter editors Licia Sbattella, Roberto Tedesco, Politecnico di Milano)
Knowledge extraction from event flows (Chapter editor Alberto Sillitti, Università di Bolzano)
Context-aware knowledge querying in a networked enterprise (Chapter editors Cristiana Bolchini, Elisa Quintarelli, Fabio A. Schreiber, Politecnico di Milano, Teresa Baldassare, Università di Bari)
On-the-fly and Context-Aware Integration of Heterogeneous Data Sources (Chapter editors Giorgio Orsi, Letizia Tanca, Politecnico di Milano)
A methodology for context-driven data-warehouse design (Chapter editors Cristiana Bolchini, Elisa Quintarelli, Letizia Tanca, Politecnico di Milano)