Special Topics in Computer Science
Advanced Topics in Information Retrieval
Lecture 11: Natural Language Processing and IR. Semantics and Semantically-rich representations
Alexander Gelbukh
www.Gelbukh.com
Tasks beyond IR: Question Answering, Summarization, Information Extraction, Cross-language IR
5
Syntactic representation
A sequence of syntactic trees.
[Figure: syntactic trees with the nodes BE, SCIENCE, IMPORTANT, COUNTRY, WE, of, PAY, GOVERNMENT, ATTENTION, IT, MUCH]
6
Semantic analysis
[Figure: linguistic processor with three components: morphological analyzer, syntactic parser, semantic analyzer]
7
Semantic representation
Complex structure of whole text
[Figure: semantic network over the nodes SCIENCE, IMPORTANT, COUNTRY, WE, GOVERNMENT, ATTENTION, with arcs labeled is, of, gives, for]
8
Semantic representation
Expresses the (direct) meaning of the text
  Not what is implied
Free of the means of communication
  Morphological cases (transformed to semantic links)
  Word order, passive/active
  Sentences and paragraphs
  Pronouns (resolved)
Free of the means of expression
  Synonyms (reduced to a common ID)
  Lexical functions
9
Lexical Functions
The same meaning expressed by different words
The choice of the word is a function of other words
Few standard meanings
Example: Magn = “much”, “very”
  Strong wind, tea, desire
  Thick soup
  High temperature, potential, sea; highly expensive
  Hard work; hardcore porno
  Deep understanding, knowledge, appreciation
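A lexical function such as Magn can be sketched as a lookup from the keyword to the word that expresses the standard meaning. The table below only copies a few pairs from the slide; the names `MAGN`, `magn` and `normalize` are hypothetical, and a real dictionary of lexical functions would be far larger.

```python
# Hypothetical lookup table built from the slide's examples:
# the keyword decides which word realises the meaning Magn ("much", "very").
MAGN = {
    "wind": "strong",
    "tea": "strong",
    "desire": "strong",
    "soup": "thick",
    "temperature": "high",
    "work": "hard",
    "understanding": "deep",
    "knowledge": "deep",
}


def magn(keyword):
    """Return the intensifier that expresses Magn for the given keyword."""
    return MAGN[keyword]


def normalize(phrase):
    """Rewrite a known 'intensifier keyword' pair as the abstract form Magn(keyword)."""
    words = phrase.lower().split()
    if len(words) == 2 and MAGN.get(words[1]) == words[0]:
        return f"Magn({words[1]})"
    return phrase  # unknown combination: leave the phrase untouched


print(magn("soup"))
print(normalize("Strong tea"))
print(normalize("Deep knowledge"))
```

With such a table, "strong tea" and "thick soup" reduce to the same standard meaning applied to different keywords, which is the sense in which the representation is free of the means of expression.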
10
...Lexical Functions
“give”
  pay attention
  provide help
  adjudge a prize
  yield the floor
  confer a degree
  deliver a lecture
Used for synonymic rephrasing
Need to reduce the meaning to a standard form
Example: Syn, hyponyms, hypernyms
  W                  Syn(W)
  complex apparatus  complex mechanism
Example: Conv31, Conv24, ...
  A V B C  ->  C Conv31(V) B A
  John sold the book to Mary for $5
  Mary bought the book from John for $5
  The book cost Mary $5
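The conversive example can be sketched as a permutation of a predicate's participants. This is a minimal illustration that reads Conv31 as "swap participants 1 and 3", following the schema on the slide; the dictionary `CONVERSIVES` and the function names are hypothetical.

```python
def conv(args, i, j):
    """Swap participants i and j (1-based) of a predicate's argument tuple."""
    out = list(args)
    out[i - 1], out[j - 1] = out[j - 1], out[i - 1]
    return tuple(out)


# Hypothetical dictionary: verb -> (standard verb, pair of participants to swap).
CONVERSIVES = {"buy": ("sell", (3, 1))}


def to_standard(verb, args):
    """Reduce a predicate to its standard verb, permuting the participants."""
    if verb in CONVERSIVES:
        standard, (i, j) = CONVERSIVES[verb]
        return standard, conv(args, i, j)
    return verb, tuple(args)


sold = to_standard("sell", ("John", "book", "Mary", "$5"))
bought = to_standard("buy", ("Mary", "book", "John", "$5"))
print(sold == bought)  # both sentences reduce to the same standard form
```

Once both sentences map to the same predicate and argument order, a query phrased with "bought" can match a document phrased with "sold".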
14
Semantic network
Representation of the text as a directed graph
Nodes are situations and entities
Edges are participation of an entity in a situation
  Also a situation in a situation: begin reading a book, John died yesterday
A situation can be expressed with a noun:
  Professor delivered a lecture to students
  Professor “*lectured” to students
  Lecture on history, memorial to heroes
A node can participate in many situations!
No division into sentences
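The graph described above could be stored roughly as follows. This is only a sketch: the class `SemanticNetwork` and its methods are hypothetical names, and node names double as node identifiers for brevity.

```python
from collections import defaultdict


class SemanticNetwork:
    """Directed graph whose arcs go from a situation node to its participants."""

    def __init__(self):
        self.kind = {}                  # node -> "situation" or "entity"
        self.arcs = defaultdict(list)   # situation -> ordered list of participants

    def add_entity(self, name):
        self.kind[name] = "entity"

    def add_situation(self, name, *participants):
        self.kind[name] = "situation"
        self.arcs[name].extend(participants)

    def situations_of(self, node):
        """All situations in which the given node participates."""
        return [s for s, parts in self.arcs.items() if node in parts]


net = SemanticNetwork()
net.add_entity("John")
net.add_entity("book")
net.add_situation("read", "John", "book")
net.add_situation("begin", "read")       # a situation inside a situation
net.add_situation("die", "John")
net.add_situation("yesterday", "die")    # "John died yesterday"

print(net.situations_of("John"))   # one node, several situations
print(net.situations_of("read"))
```

There is no sentence boundary anywhere in the structure: the node for John is shared by every situation he takes part in.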
15
Situations
Situations with different participants are different situations
  John reads a book and Mary reads a newspaper. He asks her whether the newspaper is interesting.
  Here there are two different situations of reading!
  But the same entities (John, Mary, newspaper) participate in different situations
Tense and number are described as situations
  John reads a book: Now(reading(John, book)) & quantity(book, one)
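One way to keep the two readings apart is to give every situation its own identity. The sketch below assumes a hypothetical `Situation` record with a running id; tense and number are encoded as further situations, as on the slide.

```python
from dataclasses import dataclass
from itertools import count

_ids = count(1)


@dataclass(frozen=True)
class Situation:
    predicate: str
    args: tuple   # entities (strings) or nested Situation objects
    uid: int      # distinguishes situations that share a predicate


def situation(predicate, *args):
    """Create a new situation node; every call yields a distinct node."""
    return Situation(predicate, args, next(_ids))


def mentions(s, entity):
    """True if the entity participates in s, directly or in a nested situation."""
    return any(
        a == entity or (isinstance(a, Situation) and mentions(a, entity))
        for a in s.args
    )


r1 = situation("reading", "John", "book")
r2 = situation("reading", "Mary", "newspaper")
facts = [
    situation("Now", r1),
    situation("Now", r2),
    situation("quantity", "book", "one"),
    situation("asking", "John", "Mary", situation("interesting", "newspaper")),
]

print(r1 != r2)  # two different situations of reading
print([f.predicate for f in facts if mentions(f, "newspaper")])
```

The newspaper is a single entity string, so it is the same participant in the reading situation and in the question John asks.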
16
Semantic valencies
A situation can have few participants (up to ~5)
Their meaning is usually very general
They are usually “naturally” ordered:
  Who (agent)
  What (patient, object)
  To whom (receiver)
  With what (instrument, ...)
  John sold the book to Mary for $5
So, in the network the outgoing arcs of a node are numbered
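Numbered arcs can be sketched by pairing each participant with its position. The role labels below are taken from the slide; placing "$5" in the fourth slot is an assumption made for this illustration, and `ROLES` and `numbered_arcs` are hypothetical names.

```python
# Slot number -> general role, in the "natural" order given on the slide.
ROLES = {
    1: "who (agent)",
    2: "what (patient, object)",
    3: "to whom (receiver)",
    4: "with what (instrument, ...)",
}


def numbered_arcs(predicate, *participants):
    """Return the outgoing arcs of a situation node as (predicate, number, participant)."""
    return [(predicate, n, p) for n, p in enumerate(participants, start=1)]


arcs = numbered_arcs("sell", "John", "book", "Mary", "$5")
for _, n, participant in arcs:
    print(n, ROLES[n], "=", participant)
```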
17
Semantic representation
Complex structure of whole text
[Figure: semantic network over the nodes SCIENCE, IMPORTANT, COUNTRY, WE, GOVERNMENT, ATTENTION, with situation nodes Give, Possess, Now, Quantity and outgoing arcs numbered 1, 2]
18
Reasoning and common-sense info
One can reason on the network
  If John sold a book, he does not have it
For this, additional knowledge is needed!
A huge amount of knowledge is needed to reason
  A 9-year-old child knows some 10,000,000 simple facts
  Probably some of them can be inferred, but not (yet) automatically
  There were attempts to compile such knowledge manually
  There is a hope to compile it automatically...
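The selling example can be written as one hand-coded rule applied to a set of facts. This is a toy sketch, not a reasoning engine: the rule itself is the piece of common-sense knowledge that would have to be supplied, and the tuple encoding and the name `apply_rules` are assumptions.

```python
def apply_rules(facts):
    """One pass of forward chaining with a single hand-written rule.

    Rule (assumed for illustration): sell(X, T, Y) implies
    possess(Y, T) and not possess(X, T).
    """
    derived = set(facts)
    for fact in facts:
        if fact[0] == "sell":
            _, seller, thing, buyer = fact
            derived.add(("possess", buyer, thing))
            derived.add(("not possess", seller, thing))
    return derived


facts = {("sell", "John", "book", "Mary")}
kb = apply_rules(facts)
print(("not possess", "John", "book") in kb)
print(("possess", "Mary", "book") in kb)
```

Each such rule has to come from somewhere, which is why the amount of knowledge needed becomes the bottleneck.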
19
Semantic representation
... and common-sense knowledge
[Figure: the semantic network over SCIENCE, IMPORTANT, COUNTRY, WE, GOVERNMENT, ATTENTION (arcs: is, of, gives, for), extended with the common-sense nodes Funding, Organization, Sector, Money and links labeled "is a", "is a main form", "needs", "gives", "implies"]
20
Computer representation
Logical predicates
  Arcs are arguments
In AI, allows reasoning
In IR, can allow comparison even without reasoning
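Comparison without reasoning can be as simple as counting shared predicates. The sketch below uses Jaccard overlap between two sets of predicate tuples; the measure is chosen here only for illustration and is not prescribed by the lecture, and the predicates are a rough, hypothetical encoding of the running example.

```python
def overlap(query, document):
    """Jaccard overlap between two sets of predicate tuples."""
    if not query and not document:
        return 0.0
    return len(query & document) / len(query | document)


# Hypothetical encoding: each predicate is a tuple (name, argument 1, argument 2, ...).
document = {
    ("important", "science", "country"),
    ("give", "government", "attention", "science"),
}
query = {("important", "science", "country")}

print(overlap(query, document))
```

Because the arguments are ordered, a document stating that the country is important for science would not count as a match, although it contains the same words.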
21
Conceptual Graphs
• A CG is a bipartite graph.
Concept nodes represent entities, attributes, or events (actions).
Relation nodes denote the kinds of relationships between the concept nodes.
[John]<-(agnt)<-[love]->(ptnt)->[Mary]
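The John/love/Mary graph can be stored as two node sets plus arcs that always join a concept to a relation. This is a minimal sketch; the node ids and function names are hypothetical, and the arc directions (from the act, through the relation, to its participant) are an assumption based on the usual conceptual graph convention.

```python
concepts = {"c1": "John", "c2": "love", "c3": "Mary"}
relations = {"r1": "agnt", "r2": "ptnt"}

# Every arc joins one concept node and one relation node, never two of a kind.
arcs = [("c2", "r1"), ("r1", "c1"), ("c2", "r2"), ("r2", "c3")]


def is_bipartite(concepts, relations, arcs):
    """Check that no arc links two concepts or two relations."""
    return all(
        (a in concepts and b in relations) or (a in relations and b in concepts)
        for a, b in arcs
    )


def triples(concepts, relations, arcs):
    """Read each relation node as (source concept, relation, target concept)."""
    out = []
    for r, label in relations.items():
        src = next(a for a, b in arcs if b == r)
        dst = next(b for a, b in arcs if a == r)
        out.append((concepts[src], label, concepts[dst]))
    return out


print(is_bipartite(concepts, relations, arcs))
print(triples(concepts, relations, arcs))
```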
22
[Figure: example conceptual graph with the concept nodes program:{*}, analyze, logically, criteria, provide, use, Invariant:{*}, Implication:{*}, examine, approach, diagnosis, automatic, correction, error, logical and relation nodes labeled ptn, mnr, for, of, attr]
23
Use in IR
Restrict the search to specific situations
  Where John loves Mary, but not vice versa
or
Soften the comparison
  Approximate search
  Look for John loves Mary, get someone loves Mary
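Both uses can be sketched with one matcher over predicate tuples: an exact pattern restricts the search, and a wildcard softens it. The indexed facts, including the extra person Peter, are made up for the illustration, and `ANY`, `matches` and `search` are hypothetical names.

```python
ANY = None  # wildcard: matches any participant


def matches(pattern, fact):
    """True if the fact agrees with the pattern in every non-wildcard position."""
    return len(pattern) == len(fact) and all(
        p is ANY or p == f for p, f in zip(pattern, fact)
    )


def search(pattern, facts):
    return [f for f in facts if matches(pattern, f)]


# Made-up index of situations extracted from three documents.
facts = [
    ("love", "John", "Mary"),
    ("love", "Mary", "John"),
    ("love", "Peter", "Mary"),
]

print(search(("love", "John", "Mary"), facts))  # restricted: not vice versa
print(search(("love", ANY, "Mary"), facts))     # softened: someone loves Mary
```

A bag-of-words search could not separate the first two facts, since both contain exactly the same words.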
And what if the answer is not in a short passage?
Summarize: say the same (without unimportant details) but in fewer words
Now: statistical methods
Reasoning can help
44
Tasks beyond IR: Information Extraction
Question answering on a massive basis
Fill a database with the answers
Example: what company bought what company and when?
  A database of three columns
Now: (statistical) patterns
Reasoning can help
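Filling the three-column table with a surface pattern might look like the sketch below. It uses one hand-written regular expression rather than a statistically learned pattern, and the company names and sentences are invented for the illustration.

```python
import re

# Hand-written surface pattern (an assumption, not a learned one):
# "<Buyer> bought|acquired <Target> in <year>".
PATTERN = re.compile(
    r"(?P<buyer>[A-Z]\w+) (?:bought|acquired) (?P<target>[A-Z]\w+) in (?P<year>\d{4})"
)


def extract(text):
    """Return rows (buyer, target, year) for every match of the pattern."""
    return [(m["buyer"], m["target"], m["year"]) for m in PATTERN.finditer(text)]


# Invented sentences with fictional company names.
text = "Alpha bought Beta in 1999. Later, Gamma acquired Delta in 2001."

for row in extract(text):
    print(row)
```

A pattern like this misses any rephrasing, such as "Beta was sold to Alpha", which is where a semantic representation or reasoning could help.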
45
Cross-lingual IR
Question in one language, answer in another language
Or: question and summary of the answer in English, over a database in Chinese
Is a kind of translation, but simpler
  Thus can be done more reliably
A transformation into a semantic network can greatly help
46
Research topics
Recognition of the semantic structure
  Convert text to conceptual graphs
  All kinds of disambiguation
Shallow semantic representations
Application of semantic representations to specific tasks
Similarity measures on semantic representations
Reasoning and IR
47
Conclusions
Semantic representation gives meaning
  Language-specific constructions used only in the process of communication are removed
  Network of entities / situations and predicates
  Allows for translation and logical reasoning
Can improve IR:
  Compare the query with the doc by meaning, not words
  Search for a specific situation
  Search for an approximate situation
  QA, summarization, IE
  Cross-lingual IR