Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Post on 16-Apr-2017

How to compute semantic relationships between entities and facts out of natural texts

Michael Fuchs, Technology Evangelist, ABBYY

fuchs@abbyy.com

Agenda

1. How machines read pixels

2. Documents, words, layout & semantics

3. Syntactic & semantic text parsing

4. Live demo

5. Q&A

How machines read pixels

Pixel analysis

Find text/image blocks

Separate pixels to characters

How machines read pixels

Recognize individual characters

Build proper words as editable text

Requirements to make a machine read text:

-> Linguistics: Alphabets & Morphology Dictionaries

-> Math, AI, Statistics, Experience, and…

What is needed to make a machine understand the meaning of words, sentences, texts?

Documents & Words

What is a document?

a) Bag of words?

Statistics can give basic insights

-> No real semantic understanding

b) Words in order?

Layouts generate visual patterns

-> Semantics can be derived from layout
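The bag-of-words view can be illustrated with a minimal sketch. It assumes a naive whitespace tokenizer and two invented sentences; neither comes from the talk. The point is only that word counts alone cannot distinguish who did what to whom.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens, ignoring word order and basic punctuation."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


# Two invented sentences with opposite meanings but identical word counts.
a = bag_of_words("The dog bites the man.")
b = bag_of_words("The man bites the dog.")

print(a == b)  # True: the statistics are identical, the meaning is not
```

Word statistics of this kind are still useful for coarse tasks such as topic grouping, which matches the "basic insights" remark on the slide.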

Documents, Words and Layout

Document with layout

Text document with “simulated” layout

Text with line breaks

Text only

-> Rules can extract data out of (semi-)structured texts and documents

-> Layout helps to identify the semantic meaning of data
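As a rough illustration of rule-based extraction from text with line breaks, the sketch below applies a single "Label: value" rule. The sample document, the field names, and the rule itself are invented for this example and are not taken from any ABBYY product.

```python
import re

# Invented semi-structured sample: one "Label: value" pair per line.
SAMPLE = """Invoice No: 4711
Date: 2017-04-16
Total: 129.90 EUR"""

# Hypothetical rule: a label made of letters and spaces, a colon, then the value.
FIELD_RULE = re.compile(r"^(?P<label>[A-Za-z ]+):\s*(?P<value>.+)$", re.MULTILINE)


def extract_fields(text: str) -> dict:
    """Return a label -> value mapping for every line that matches the rule."""
    return {
        m.group("label").strip(): m.group("value").strip()
        for m in FIELD_RULE.finditer(text)
    }


fields = extract_fields(SAMPLE)
print(fields["Total"])

# The same rule finds nothing in a plain sentence carrying similar information.
print(extract_fields("Mary paid the invoice yesterday."))
```

The rule works only because the line layout tells it where a label ends and a value begins; once that layout is gone, the rule has nothing to hold on to.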

Text and Structure

Is “plain” natural language text unstructured?

-> yes, at least for almost all IT systems

-> not for humans who can read and speak the language

-> Facts and their relations can’t be reliably detected with “simple” rules

Text, Structure & Translation

Is a word by word translation enough?

-> … well – not really…

-> Semantic understanding of the words and their relationship in sentences is needed!

-> That is true for humans and machines

Text & Structure

Why is natural language text understanding difficult for machines?

-> Languages are not logical and are context-dependent

-> One word – different variants, e.g. go, went, gone

– different usage, e.g. as verb, noun, adjective

– different meanings, e.g. run, plant, apple …

-> Different words – the same concept, e.g. to buy/sell something
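The variant and ambiguity problems listed here can be made concrete with a toy lookup. Both tables below are hand-written for illustration and are assumptions, not a real dictionary: the first maps word variants to a base form, the second lists several senses for one surface form.

```python
# Toy tables; every entry is illustrative and hand-written.
LEMMAS = {"go": "go", "goes": "go", "went": "go", "gone": "go"}

SENSES = {
    "run": ["move fast on foot", "operate a machine", "a sequence of events"],
    "plant": ["living organism", "factory", "put into the ground"],
    "apple": ["fruit", "company"],
}


def lemma(word: str) -> str:
    """Map a variant to its base form; unknown words are returned lowercased."""
    return LEMMAS.get(word.lower(), word.lower())


def is_ambiguous(word: str) -> bool:
    """True if the toy sense table lists more than one meaning for the word."""
    return len(SENSES.get(lemma(word), [])) > 1


print({w: lemma(w) for w in ["go", "went", "gone"]})
print(is_ambiguous("plant"))
```

A table lookup can normalise variants, but it cannot choose between the senses of "plant"; that choice depends on the surrounding sentence.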

Basic Language Structure

-> Morphology = rules for how words are formed and inflected

-> Syntax = rules used to build correct sentences

-> Semantics = the meaning and usage of words

-> Semantic Relations = reflect/organise the meaning and relations of words and sentences

How to get at the inner structure of a sentence?

Compreno System Architecture

Diagram labels: Parser; Morphological analyzer; Syntactic and semantic analysis; Disambiguation; Anaphora resolution; Semantic representation of text; Information Extraction Module; Identification rules; Extraction rules; Interpretation rules; RDF Graph
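The architecture names an RDF graph as its output. As a minimal sketch of what such a graph holds, the snippet below stores facts as subject–predicate–object triples and prints them in N-Triples syntax. The `example.org` namespace, the predicate names, and the triples themselves are assumptions chosen for illustration; they are not Compreno's actual schema or output.

```python
# Hypothetical namespace; real systems would use their own vocabulary.
BASE = "http://example.org/"

# Illustrative triples: (subject, predicate, object).
TRIPLES = [
    ("ABBYY", "develops", "Compreno"),
    ("ABBYY", "type", "Organization"),
    ("Compreno", "type", "Technology"),
]


def uri(name: str) -> str:
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{BASE}{name}>"


def to_ntriples(triples) -> str:
    """Serialize triples, one statement per line, each terminated by ' .'."""
    return "\n".join(f"{uri(s)} {uri(p)} {uri(o)} ." for s, p, o in triples)


def objects_of(triples, subject: str, predicate: str) -> list:
    """Look up all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(to_ntriples(TRIPLES))
print(objects_of(TRIPLES, "ABBYY", "develops"))
```

Storing extracted facts as triples is what makes them queryable: the relation between two entities becomes an explicit edge rather than something implied by word order.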

Morphology Analysis

Sentence Analysis with Semantic Info

How to get the correct semantic meaning of words?

ABBYY’s answer: Universal Semantic Hierarchy

= language independent semantic concepts

ABBYY’s Universal Semantic Hierarchy

Table columns: Semantic Meaning | “Vocabulary” EN | “Vocabulary” DE
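The idea of one language-independent concept with separate English and German vocabularies can be sketched as a small lookup table. The concept identifiers and the word lists below are invented for this example; they do not reproduce the actual Universal Semantic Hierarchy.

```python
# Hypothetical lexicon: (language, word) -> language-independent concept IDs.
LEXICON = {
    ("en", "buy"): ["ACQUIRE_BY_PAYMENT"],
    ("de", "kaufen"): ["ACQUIRE_BY_PAYMENT"],
    ("en", "plant"): ["PLANT_ORGANISM", "FACTORY"],
    ("de", "pflanze"): ["PLANT_ORGANISM"],
    ("de", "fabrik"): ["FACTORY"],
}


def concepts(lang: str, word: str) -> list:
    """Return the concept IDs a word can express in the given language."""
    return LEXICON.get((lang, word.lower()), [])


def share_concept(a: tuple, b: tuple) -> bool:
    """True if two (language, word) pairs can express a common concept."""
    return bool(set(concepts(*a)) & set(concepts(*b)))


print(share_concept(("en", "buy"), ("de", "kaufen")))
print(concepts("en", "plant"))
print(share_concept(("de", "Pflanze"), ("de", "Fabrik")))
```

In this toy table the English word "plant" points at two concepts while German uses two different words, which is the kind of mismatch a shared concept layer is meant to absorb.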

Handling Lexical Ambiguity

Recovering Omitted Words and Links (Ellipsis)

Figure labels: Recovered Node; Ellipsis

Identifying Pronoun Referents (Anaphora)

Mary saw her students. They were wearing masks. She was surprised. (Mary → her, Mary → she, students → they).
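The example above can be reproduced with a deliberately naive heuristic: pick the most recent earlier mention that agrees with the pronoun in number and gender. This is a simplified stand-in, not how Compreno resolves anaphora; the `Mention` type, the pronoun table, and the hand-assigned features are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    text: str
    gender: Optional[str]  # "f", "m", or None when unknown/irrelevant
    number: str            # "sg" or "pl"


# Hypothetical pronoun features: (gender, number); None matches any gender.
PRONOUNS = {
    "she": ("f", "sg"), "her": ("f", "sg"),
    "he": ("m", "sg"), "his": ("m", "sg"),
    "they": (None, "pl"),
}


def resolve(pronoun: str, candidates: list) -> Optional[str]:
    """Return the most recent candidate agreeing in number and gender."""
    gender, number = PRONOUNS[pronoun.lower()]
    for mention in reversed(candidates):
        if mention.number == number and (gender is None or mention.gender == gender):
            return mention.text
    return None


mary = Mention("Mary", "f", "sg")
students = Mention("students", None, "pl")

print(resolve("her", [mary]))             # Mary
print(resolve("They", [mary, students]))  # students
print(resolve("She", [mary, students]))   # Mary
```

Agreement features are enough for this sentence, but they fail as soon as two candidates share the same gender and number, which is where syntactic and semantic analysis has to take over.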

From Text to Semantic with Compreno

DEMO

Summary: What is ABBYY Compreno?

● … NLP technology featuring a unique model-based approach that employs universal language models and identifies language structures.

● … combines both syntactic and semantic analysis, as well as machine learning on untagged text corpora.

● … allows the creation of a semantic representation of text

● … able to resolve complex language phenomena:

− lexical ambiguity

− recovering omitted words and links (ellipsis)

− identifying pronoun referents (anaphora)

− coreference

− coordination and more

● … support of English, Russian, German in progress

QUESTIONS?

Thank you for your attention!
