Natural language processing (NLP) Presented By : Mohamed El-Serngawy
Transcript
Page 1: NLP

Natural language processing (NLP)

Presented By : Mohamed El-Serngawy

Page 2: NLP

Agenda
• Definition & Introduction
• Steps in NLP
• Statistical NLP
• Real World Application
• Demos with free NLP Application

Page 3: NLP

Definition & Introduction
• Natural language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages.
• Why natural language processing? Huge amounts of data:
◦ Internet: at least 20 billion pages
◦ Intranet
• Applications that process large amounts of text require NLP expertise.

Page 4: NLP

Definition & Introduction
• We look at how we can exploit knowledge about the world, in combination with linguistic facts, to build computational natural language systems.
• Natural language generation systems convert information from computer databases into readable human language.
• Natural language understanding systems convert samples of human language into more formal representations, such as parse trees or first-order logic structures, that are easier for computer programs to manipulate.

Page 5: NLP

Steps in NLP
• Phonetics, Phonology: How words are pronounced in terms of sequences of sounds.
• Morphological Analysis: Individual words are analyzed into their components, and non-word tokens such as punctuation are separated from the words.
• Syntactic Analysis: Linear sequences of words are transformed into structures that show how the words relate to each other.
• Semantic Analysis: The structures created by the syntactic analyzer are assigned meanings.
• Discourse Integration: The meaning of an individual sentence may depend on the sentences that precede it and may influence the meanings of the sentences that follow it.
• Pragmatic Analysis: The structure representing what was said is reinterpreted to determine what was actually meant.

Page 6: NLP

Phonetics
• Study of the physical sounds of human speech
◦ /i:/, /ɜ:/, /ɔ:/, /ɑ:/ and /u:/
◦ 'there' => /ðeə/
◦ 'there on the table' => /ðeər ɒn ðə teɪbl/

• Transcription of sounds (IPA)

Page 7: NLP

Phonetics
• Articulatory phonetics: speech production
• Auditory phonetics: speech perception
◦ McGurk effect
• Acoustic phonetics: properties of sound waves (frequency and harmonics)

Page 8: NLP

Morphological Analysis
• Suppose we have an English interface to an operating system and the following sentence is typed:
◦ I want to print Bill’s .init file.
• Morphological analysis must do the following things:
◦ Pull apart the word “Bill’s” into the proper noun “Bill” and the possessive suffix “’s”.
◦ Recognize the sequence “.init” as a file extension that is functioning as an adjective in the sentence.
• This process will usually assign syntactic categories to all the words in the sentence.
• Consider the word “prints”. This word is either a plural noun or a third person singular verb (he prints).
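The two tasks on this slide (splitting off the possessive suffix and recognizing ".init" as a file extension) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expression, the category labels, and the tiny `LEXICON` are hypothetical and not taken from any real morphological analyzer.

```python
import re

# Hypothetical token pattern: extension-like tokens (".init"), plain words,
# the possessive suffix, and any other single non-space character.
TOKEN_RE = re.compile(r"\.[A-Za-z]+|[A-Za-z]+|['’]s|[^\sA-Za-z]")

# Tiny hand-written lexicon (an assumption for this sketch). Ambiguous words
# such as "prints" simply list every candidate category.
LEXICON = {
    "i": ["PRO"],
    "want": ["V"],
    "to": ["TO"],
    "print": ["V", "N"],
    "prints": ["N-PLURAL", "V-3SG"],
    "bill": ["PN"],
    "file": ["N", "V"],
}

def analyze(sentence):
    """Split a sentence into tokens and attach candidate categories to each."""
    result = []
    for token in TOKEN_RE.findall(sentence):
        if token in ("'s", "’s"):
            categories = ["POSSESSIVE"]
        elif token.startswith(".") and len(token) > 1:
            categories = ["FILE-EXTENSION"]
        elif token.isalpha():
            categories = LEXICON.get(token.lower(), ["UNKNOWN"])
        else:
            categories = ["PUNCT"]
        result.append((token, categories))
    return result

for token, categories in analyze("I want to print Bill's .init file."):
    print(token, categories)
```

The sketch leaves the ambiguity of "prints" unresolved on purpose: choosing between the noun and verb reading needs the syntactic context handled in the next step.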

Page 9: NLP

Syntactic Analysis
• Syntactic analysis must exploit the results of morphological analysis to build a structural description of the sentence.
• The goal of this process, called parsing, is to convert the flat list of words that forms the sentence into a structure that defines the units that are represented by that flat list.
• The important thing here is that a flat sentence has been converted into a hierarchical structure and that the structure corresponds to meaning units when semantic analysis is performed.
• Reference markers are shown in parentheses in the parse tree.
• Each one corresponds to some entity that has been mentioned in the sentence.

Page 10: NLP

Syntactic Analysis
• Syntactic processing: Almost all the systems that are actually used have two main components:
◦ A declarative representation, called a grammar, of the syntactic facts about the language.
◦ A procedure, called a parser, that compares the grammar against input sentences to produce parsed structures.

Page 11: NLP

Syntactic Analysis
Grammars and Parsers:
• The most common way to represent grammars is as a set of production rules.
• A simple context-free phrase structure grammar for English:

S → NP VP
NP → the NP1
NP → PRO
NP → PN
NP → NP1
NP1 → ADJS N
ADJS → ε | ADJ ADJS
VP → V
VP → V NP
N → file | printer
PN → Bill
PRO → I
ADJ → short | long | fast
V → printed | created | want

• The first rule can be read as “A sentence is composed of a noun phrase followed by a verb phrase”; the vertical bar means OR; ε represents the empty string.
• Symbols that are further expanded by rules are called nonterminal symbols.
• Symbols that correspond directly to strings that must be found in an input sentence are called terminal symbols.
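As a rough illustration of the grammar/parser split, the production rules above can be stored as plain data and checked against a sentence by a small backtracking recognizer. The representation (a dict of tuples, with an empty tuple standing in for ε) and the function names are assumptions made for this sketch, not a standard API.

```python
# The slide's grammar as data: nonterminal -> list of alternative right-hand sides.
# An empty tuple stands in for the empty string ε.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("the", "NP1"), ("PRO",), ("PN",), ("NP1",)],
    "NP1": [("ADJS", "N")],
    "ADJS": [(), ("ADJ", "ADJS")],
    "VP": [("V",), ("V", "NP")],
    "N": [("file",), ("printer",)],
    "PN": [("Bill",)],
    "PRO": [("I",)],
    "ADJ": [("short",), ("long",), ("fast",)],
    "V": [("printed",), ("created",), ("want",)],
}

# Nonterminals are expanded by some rule; terminals only appear on right-hand sides.
NONTERMINALS = set(GRAMMAR)
TERMINALS = {
    symbol
    for alternatives in GRAMMAR.values()
    for rhs in alternatives
    for symbol in rhs
} - NONTERMINALS

def match(symbol, words, start):
    """Yield every position at which a derivation of `symbol` from `start` can end."""
    if symbol in TERMINALS:
        if start < len(words) and words[start] == symbol:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from match_sequence(rhs, words, start)

def match_sequence(symbols, words, start):
    """Match a sequence of symbols one after another, backtracking over alternatives."""
    if not symbols:
        yield start
        return
    for middle in match(symbols[0], words, start):
        yield from match_sequence(symbols[1:], words, middle)

def accepts(sentence):
    """True if the whole sentence can be derived from S."""
    words = sentence.split()
    return any(end == len(words) for end in match("S", words, 0))

print(sorted(TERMINALS))
print(accepts("Bill printed the file"))
```

This naive top-down strategy works here because the grammar has no left-recursive rules; a grammar with left recursion would need a different parsing method.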

Page 12: NLP

Syntactic Analysis

A parse tree for the sentence "Bill printed the file":

S
  NP
    PN
      Bill
  VP
    V
      printed
    NP
      the
      NP1
        ADJS
          ε
        N
          file

Page 13: NLP

Syntactic Analysis
A parse tree for "John ate the apple.":

1. S -> NP VP
2. VP -> V NP
3. NP -> NAME
4. NP -> ART N
5. NAME -> John
6. V -> ate
7. ART -> the
8. N -> apple

S
  NP
    NAME
      John
  VP
    V
      ate
    NP
      ART
        the
      N
        apple
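The tree for "John ate the apple." can be produced mechanically from the eight rules. The following is a minimal sketch of a top-down backtracking parser; the tuple-based tree representation and all function names are assumptions chosen for illustration.

```python
# The eight rules from the slide: nonterminal -> list of alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "NAME": [["John"]],
    "V": [["ate"]],
    "ART": [["the"]],
    "N": [["apple"]],
}

def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` can cover words from `start`."""
    if symbol not in RULES:  # terminal: must equal the next input word
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in RULES[symbol]:
        for children, end in parse_sequence(rhs, words, start):
            yield (symbol, children), end

def parse_sequence(symbols, words, start):
    """Parse a right-hand side symbol by symbol, collecting the child trees."""
    if not symbols:
        yield [], start
        return
    for tree, middle in parse(symbols[0], words, start):
        for rest, end in parse_sequence(symbols[1:], words, middle):
            yield [tree] + rest, end

def parse_sentence(sentence):
    """Return the first parse tree that covers the whole sentence, or None."""
    words = sentence.rstrip(".").split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None

def show(tree, depth=0):
    """Print a tree with two spaces of indentation per level."""
    if isinstance(tree, str):
        print("  " * depth + tree)
        return
    label, children = tree
    print("  " * depth + label)
    for child in children:
        show(child, depth + 1)

show(parse_sentence("John ate the apple."))
```

The flat word list goes in and a nested structure comes out, which is the hierarchical form that semantic analysis later works on.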

Page 14: NLP

Semantic Analysis
• Semantic analysis must do two important things:
◦ It must map individual words into appropriate objects in the knowledge base or database.
◦ It must create the correct structures to correspond to the way the meanings of the individual words combine with each other.

Page 15: NLP

Semantic Analysis
Lexical processing:
• The first step in any semantic processing system is to look up the individual words in a dictionary (or lexicon) and extract their meanings.
• Many words have several meanings, and it may not be possible to choose the correct one just by looking at the word itself.
• The process of determining the correct meaning of an individual word is called word sense disambiguation or lexical disambiguation.
• It is done by associating, with each word in the lexicon, information about the contexts in which each of the word’s senses may appear.
• Sometimes only very straightforward information about each word sense is necessary. For example, the baseball field interpretation of “diamond” could be marked as a LOCATION.
• Some useful semantic markers are:
◦ PHYSICAL-OBJECT
◦ ANIMATE-OBJECT
◦ ABSTRACT-OBJECT
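One simple way to picture lexical disambiguation is a lexicon whose entries list, for each sense, a semantic marker and a few context words that tend to appear with that sense. The sketch below picks the sense whose cue words overlap most with the sentence. The cue lists and the overlap scoring are assumptions for illustration only; the slides do not specify how contexts are stored or compared.

```python
import re

# Hypothetical lexicon entry: each sense has a semantic marker and context cues.
LEXICON = {
    "diamond": [
        {"sense": "gemstone", "marker": "PHYSICAL-OBJECT",
         "cues": {"ring", "necklace", "carat", "jewel", "wore"}},
        {"sense": "baseball field", "marker": "LOCATION",
         "cues": {"baseball", "game", "pitcher", "field", "played"}},
    ],
}

def disambiguate(word, sentence):
    """Return the sense of `word` whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    senses = LEXICON[word]
    # On a tie (including no overlap at all) max() keeps the first listed sense.
    return max(senses, key=lambda sense: len(sense["cues"] & context))

print(disambiguate("diamond", "The pitcher walked onto the diamond")["marker"])
print(disambiguate("diamond", "She wore a diamond ring")["marker"])
```

Falling back to the first listed sense is a design shortcut in this sketch; a practical system would need a better default, such as the most frequent sense.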

Page 16: NLP

Semantic Analysis
WordNet (common-sense knowledge base):
• A database of lexical relations.
• Inspired by current psycholinguistic theories of human lexical memory.
• Synset: a set of synonyms, representing one underlying lexical concept
◦ Example: fool {chump, fish, fool, gull, mark, patsy, fall guy, sucker, schlemiel, shlemiel, soft touch, mug}
• Relations link the synsets: hypernym, Has-Member, Member-Of, Antonym, etc.


Page 17: NLP

Semantic Analysis
Example

pu-erh.cs.utexas.edu$ wn bike -partn

Part Meronyms of noun bike

2 senses of bike

Sense 1

motorcycle, bike

HAS PART: mudguard, splashguard

Sense 2

bicycle, bike, wheel

HAS PART: bicycle seat, saddle

HAS PART: bicycle wheel

HAS PART: chain

HAS PART: coaster brake

HAS PART: handlebar

HAS PART: mudguard, splashguard

HAS PART: pedal, treadle, foot lever

HAS PART: sprocket, sprocket wheel
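To make the structure of this output concrete, the listing above can be copied by hand into a small Python table of senses, synsets, and HAS PART links. This is a toy stand-in built only from the output shown here; it is not the WordNet database or any WordNet API, and the names `LEXICAL_DB`, `part_meronyms`, and `senses_with_part` are hypothetical.

```python
# Hand-copied from the `wn bike -partn` output above: each sense is a synset
# plus the synsets of its parts.
LEXICAL_DB = {
    "bike": [
        {"synset": ["motorcycle", "bike"],
         "has_part": [["mudguard", "splashguard"]]},
        {"synset": ["bicycle", "bike", "wheel"],
         "has_part": [
             ["bicycle seat", "saddle"],
             ["bicycle wheel"],
             ["chain"],
             ["coaster brake"],
             ["handlebar"],
             ["mudguard", "splashguard"],
             ["pedal", "treadle", "foot lever"],
             ["sprocket", "sprocket wheel"],
         ]},
    ],
}

def part_meronyms(word):
    """Return {sense number: list of part synsets} for a noun in the toy table."""
    senses = LEXICAL_DB.get(word, [])
    return {number: sense["has_part"] for number, sense in enumerate(senses, start=1)}

def senses_with_part(word, part):
    """Return the sense numbers of `word` that list `part` among their parts."""
    return [
        number
        for number, parts in part_meronyms(word).items()
        if any(part in synset for synset in parts)
    ]

print(senses_with_part("bike", "chain"))
print(senses_with_part("bike", "mudguard"))
```

A lookup like this hints at how part-whole relations could help separate senses: a sentence mentioning a chain points toward the bicycle sense rather than the motorcycle sense, at least in this table.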


• Example

pu-erh.cs.utexas.edu$ wn bike

Information available for noun bike
  -hypen            Hypernyms
  -hypon, -treen    Hyponyms & Hyponym Tree
  -synsn            Synonyms (ordered by frequency)
  -partn            Has Part Meronyms
  -meron            All Meronyms
  -famln            Familiarity & Polysemy Count
  -coorn            Coordinate Sisters
  -simsn            Synonyms (grouped by similarity of meaning)
  -hmern            Hierarchical Meronyms
  -grepn            List of Compound Words
  -over             Overview of Senses

Information available for verb bike
  -hypev            Hypernyms
  -hypov, -treev    Hyponyms & Hyponym Tree
  -synsv            Synonyms (ordered by frequency)
  -famlv            Familiarity & Polysemy Count
  -framv            Verb Frames
  -simsv            Synonyms (grouped by similarity of meaning)
  -grepv            List of Compound Words
  -over             Overview of Senses

Page 18: NLP

Discourse Integration
• Specifically, in the sentence “I want to print Bill’s .init file.”, we do not know whom the pronoun “I” or the proper noun “Bill” refers to.
• To pin down these references requires an appeal to a model of the current discourse context, from which we can learn that the current user is USER068 and that the only person named “Bill” about whom we could be talking is USER073.
• Once the correct referent for Bill is known, we can also determine exactly which file is being referred to.
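A discourse context model can be pictured as a small record of who is speaking and which people are known by which names. The sketch below resolves "I" and "Bill" against such a record. The user IDs come from the slide; the dictionary layout and the `resolve` function are assumptions for illustration.

```python
# Hypothetical discourse context: the current speaker and the people that a
# given name could refer to in this session.
DISCOURSE_CONTEXT = {
    "current_user": "USER068",
    "known_people": {"Bill": ["USER073"]},
}

def resolve(reference, context):
    """Map a pronoun or proper noun to a user ID, or None if unknown or ambiguous."""
    if reference == "I":
        return context["current_user"]
    candidates = context["known_people"].get(reference, [])
    if len(candidates) == 1:
        return candidates[0]
    return None  # no candidate, or several: more context would be needed

print(resolve("I", DISCOURSE_CONTEXT))
print(resolve("Bill", DISCOURSE_CONTEXT))
```

Returning None for ambiguous names keeps the sketch honest: when two users share a name, the system would have to ask or look further back in the dialogue.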

Page 19: NLP

Pragmatic Analysis
• The final step toward effective understanding is to decide what to do as a result.
• One possible thing to do is to record what was said as a fact and be done with it.
• For some sentences, whose intended effect is clearly declarative, that is precisely the correct thing to do.
• But for other sentences, including this one, the intended effect is different.
• We can discover this intended effect by applying a set of rules that characterize cooperative dialogues.
• The final step in pragmatic processing is to translate from the knowledge-based representation to a command to be executed by the system.

The results of the understanding process is

Page 20: NLP

Pragmatic Analysis
• Knowledge about the kind of actions that speakers intend by their use of sentences
◦ REQUEST: HAL, open the pod bay door.
◦ STATEMENT: HAL, the pod bay door is open.
◦ INFORMATION QUESTION: HAL, is the pod bay door open?
• Speech act analysis (politeness, irony, greeting, apologizing...)
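The three HAL examples differ mainly in surface form, so a very rough classifier can separate them with two checks: a final question mark and a leading imperative verb. The sketch below is a heuristic illustration only; the verb list and the rule order are assumptions, and real speech act analysis needs far more than surface cues (an indirect request such as "Can you open the door?" would be mislabeled here).

```python
# Hypothetical list of verbs treated as imperatives when they start the sentence.
IMPERATIVE_VERBS = {"open", "close", "print", "show", "stop"}

def classify_speech_act(utterance):
    """Label an utterance as REQUEST, STATEMENT, or INFORMATION QUESTION."""
    text = utterance.strip()
    # Drop a one-word form of address such as "HAL," before looking at the sentence.
    head, comma, rest = text.partition(",")
    body = rest.strip() if comma and len(head.split()) == 1 else text
    if body.endswith("?"):
        return "INFORMATION QUESTION"
    words = body.rstrip(".!").lower().split()
    if words and words[0] in IMPERATIVE_VERBS:
        return "REQUEST"
    return "STATEMENT"

for sentence in (
    "HAL, open the pod bay door.",
    "HAL, the pod bay door is open.",
    "HAL, is the pod bay door open?",
):
    print(classify_speech_act(sentence), "<-", sentence)
```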

Page 21: NLP

Statistical NLP
• Statistical NLP aims to perform statistical inference for the field of NLP.
• Statistical inference consists of taking some data generated in accordance with some unknown probability distribution and making inferences about that distribution.
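A small, standard instance of this idea is a bigram language model: the data are sentences, the unknown distribution is "which word follows which", and the inference is a relative-frequency (maximum likelihood) estimate, P(w | prev) = count(prev w) / count(prev). The sketch below uses a made-up three-sentence corpus; the corpus and function names are assumptions for illustration.

```python
from collections import Counter

# Toy corpus, invented for this sketch.
CORPUS = [
    "bill printed the file",
    "bill created the file",
    "i want the printer",
]

def train_bigrams(sentences):
    """Count how often each word occurs as a context and each word pair occurs."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent pairs
    return contexts, bigrams

def probability(prev, word, contexts, bigrams):
    """Maximum likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

contexts, bigrams = train_bigrams(CORPUS)
print(probability("the", "file", contexts, bigrams))
print(probability("bill", "printed", contexts, bigrams))
```

With such a tiny corpus most word pairs get probability zero, which is why practical models add smoothing; that refinement is outside this sketch.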

Page 22: NLP

Motivations for Statistical NLP

• Cognitive modeling of human language processing has not reached a stage where we can have a complete mapping between the language signal and the information content.
• A complete mapping is not always required.
• A statistical approach provides the flexibility required for making the modeling of a language more accurate.

Page 23: NLP

Real World Application

• Automatic summarization
• Foreign language reading aid
• Foreign language writing aid
• Information extraction
• Information retrieval (IR) - IR is concerned with storing, searching and retrieving information. It is a separate field within computer science (closer to databases), but IR relies on some NLP methods (for example, stemming). Some current research and applications seek to bridge the gap between IR and NLP.
• Machine translation - Automatically translating from one human language to another.
• Named entity recognition (NER) - Given a stream of text, determining which items in the text map to proper names, such as people or places. Although named entities in English are usually capitalized, many other languages do not use capitalization to distinguish named entities.
• Natural language generation
• Natural language search
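The remark about capitalization in the NER entry suggests the simplest possible English baseline: collect runs of capitalized words, ignoring the first word of each sentence because it is capitalized anyway. The sketch below does exactly that. It is a naive heuristic written for illustration, not a real NER system: it misses names at the start of a sentence, flags the pronoun "I", and would not work for languages that do not mark names with capitals.

```python
import re

def candidate_entities(text):
    """Return runs of capitalized words that do not start a sentence."""
    tokens = re.findall(r"[A-Za-z]+|[.!?]", text)
    candidates, run, sentence_start = [], [], True
    for token in tokens:
        if token in ".!?":
            # Sentence boundary: close any open run and reset the start flag.
            if run:
                candidates.append(" ".join(run))
                run = []
            sentence_start = True
            continue
        if token[0].isupper() and not sentence_start:
            run.append(token)
        elif run:
            candidates.append(" ".join(run))
            run = []
        sentence_start = False
    if run:
        candidates.append(" ".join(run))
    return candidates

print(candidate_entities("Yesterday Bill flew to New York. He met John."))
```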

Page 24: NLP

Real World Application

• Natural language understanding
• Optical character recognition
• Anaphora resolution
• Query expansion
• Question answering - Given a human-language question, the task of producing a human-language answer. The question may be closed-ended (such as "What is the capital of Canada?") or open-ended (such as "What is the meaning of life?").
• Speech recognition - Given a sound clip of a person or people speaking, the task of producing a text dictation of the speaker(s). (The opposite of text-to-speech.)
• Spoken dialogue system
• Stemming
• Text simplification
• Text-to-speech
• Text-proofing

Page 25: NLP

Demos with free NLP Application

DEMO

Page 26: NLP

THANKS