Page 1

Kocaeli University, Izmit, June 12, 2014

Man-Machine Communication
Introduction: Language Technologies

Zygmunt Vetulani
Adam Mickiewicz University in Poznań
Dept of Computer Linguistics and Artificial Intelligence
[email protected]

Page 2

I. PLAN OF THE LECTURE

Introduction to Language Technologies
- a historical perspective
- review of problems, resources and tools

Page 3

Natural Language; Technology - definitions

Definition 1
Natural language: the "everyday language" spontaneously spoken by some community. It is typically acquired by young members of the community as their mother tongue. It may also be learned by adult speakers, typically from outside this community. Its primary role is to serve inter-human communication. Natural language is a result of evolution.

Definition 2
By a technology we mean an organized set of tools, methods and techniques that constitute the "know-how" used to solve problems, to perform specific functions, or to produce artifacts and/or information. Human Language Technologies are closely connected with information processing.

Page 4

Some history

Beginnings

- The Neanderthal man was (probably) able to speak.
- Humans started speaking some 50,000 years ago or earlier.
- Humans invented writing some 7000 (?) years ago (the invention of writing marks the beginning of historical times).
- Now: most of the some 7000 languages spoken on Earth have a writing system.

Page 5

The first language resources

Dispilio Tablet (Greece)

Dictionaries

- Egypt - VII c. BC
- India - Amarakosha, the first lexicon of Sanskrit (V c. BC)
- Ancient Greece - a glossary to Homer (V c. BC)
- Ancient Rome - Onomasticon - II c. BC

(Wiki)

Grammars

- India - VI c. BC - Yaska; IV c. BC - Panini (Sanskrit)
- Ancient Greece - III c. BC
- Ancient Rome - I c. BC (Latin)
- Arabic grammar - VIII c.

Page 6

Text reproduction technologies

Middle Ages: writing serves the "global" culture of the Christian and Muslim worlds

Movable type: known in China since the XI c. (Bi Sheng)

(pictures from Wiki)

Renaissance: the Gutenberg Revolution (1452-1455), the first serial reproduction of the Bible

XVIII-XIX c.: Industrial Revolution, high-speed printing machines

Page 7

Recent revolutions

The human environment has changed over the last 7000 years: virgin natural environment -> rural -> urban -> artificial (saturated with artifacts and technologies)

- XX c.: Computer revolution -> text processing and the Internet. Public information becomes accessible to (almost) everybody in the form of text, sound, images, inscriptions, messages, advertising, instructions, ...

Computers make automatic text processing (spoken and written) easy and cheap -> computer technologies of natural language appear -> invention of the Internet

New phenomenon: in the XX century the environment becomes full of information-bearing artifacts

Page 8

Present-day revolutions

XXI c.: Technologies of the Information Society Epoch

A new phenomenon of the XX/XXI century: the (information-rich) environment becomes interactive with respect to humans.

Some examples:
◦ dialogue between (human) users and machines (robots),
◦ automatic voice command recognition,
◦ virtual reality,
◦ sociable (social) robots.

(Many science-fiction ideas have been implemented or are close to being implemented.)

Page 9

Some history - present-day revolutions

XXI c.: A NEW GLOBALISATION EPOCH

In the past: the Roman, Christian and Islamic civilizations claimed to be global.

Characteristic features (typical phenomena):
- common practice of a "lingua franca": LATIN -> English
- general knowledge of writing as a result of the industrial revolution (XVIII-XIX c.)
- common access to the industrial infrastructure (XX c.)
- common access to information: Internet (XX/XXI c.)

Now: information is the driving force of the new globalization.

Globalization needs technological support, in particular in the domain of information (language) technologies.

TECHNOLOGICAL EXCLUSION PROBLEM for people using "less-resourced languages"

Page 10

Review of problems, resources and tools

HIGH LEVEL TECHNOLOGIES (AI)

Systems with linguistic competence. Human-machine NL interfaces

Software that organizes communication between the user and the computer system on the basis of natural language, i.e. software equipped with components that emulate human language competence (cf. A. M. Turing, "Computing Machinery and Intelligence", 1950).

Machine translation (MT): automatic translation performed solely by software, without any human intervention or assistance.

MT is historically one of the first language technologies. It was anticipated by the ideas of René Descartes. Couturat and Leau also noted a lost text by Wilhelm L. Rieger: "Zifferngrammatik, welche mit Hilfe der Wörterbücher ein mechanisches Übersetzen aus einer Sprache in alle anderen ermöglicht" ("A numerical grammar which, with the help of dictionaries, enables mechanical translation from one language into all others"; Prague, 1903) (see also Bernd Spillner, in Übersetzung im Umbruch, 1996, p. 209). Warren Weaver's "Translation" memorandum followed in 1949, and the Georgetown experiment in 1954 (60 sentences).

Automatic summarization

Automatic generation of a summary (accepting the loss of secondary information).
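The slide only names automatic summarization. As a hedged illustration, here is a minimal sketch of one common extractive approach (frequency-based sentence scoring, a technique swapped in here, not one the lecture describes): each sentence is scored by the average frequency of its content words in the whole text, and the best-scoring sentences are kept in their original order. The function `summarize`, its `max_sentences` parameter and the tiny stop-word list are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption of this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "was", "by"}


def content_words(text: str) -> list[str]:
    """Lower-cased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, max_sentences: int = 2) -> list[str]:
    """Extractive summary: keep the sentences whose words are most frequent overall."""
    # Naive sentence split: a break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average (not total) frequency, so long sentences are not automatically favoured.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentences by score, keep the best ones, then restore their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Dropping the low-scoring sentences is the "loss of secondary information" the definition accepts; practical summarizers use much richer features, and abstractive systems generate new sentences instead of copying existing ones.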

Page 11

Review of problems, resources and tools

MID LEVEL TECHNOLOGIES: NLP

Parsing

Identification of the structure of text units (typically sentences)

Natural language generation

Natural language understanding

Discourse analysis

Description of the structure of (spoken or written) discourse (structural, statistical)

Information Processing

Information retrieval

Search (in a corpus of documents) for documents containing the required information
Search for information about documents (metadata extraction)

Information extraction

Extraction of structured information from (unstructured) text documents
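To make the information retrieval definition concrete, the sketch below assumes the simplest possible setting: a small in-memory corpus, an inverted index mapping each word to the documents that contain it, and Boolean AND queries. The names `tokenize`, `build_index` and `search` are hypothetical, and result ranking (e.g. by term weighting) is deliberately left out.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; a crude stand-in for real text segmentation."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: maps each word to the set of ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND retrieval: ids of documents that contain every query word."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

For example, with documents `{"d1": "Machine translation without human intervention", "d3": "Statistical machine translation"}`, the query `"machine translation"` returns both ids, while `"statistical"` returns only `d3`. The index is built once, so each query only intersects a few precomputed sets instead of rescanning the corpus.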

Page 12

Text Processing

Extraction of relations, e.g. temporal (relationship extraction)

Named-entity recognition

Terminology extraction from corpora

Part-of-speech tagging

Tagging of words in the text with labels containing information connected with these words
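A toy illustration of part-of-speech tagging, under strong simplifying assumptions: tags come from a tiny hand-written lexicon (the `LEXICON` contents and the Universal-Dependencies-style tag names are assumptions), and unknown words fall back to a default tag. Practical taggers are trained on annotated corpora and use context to choose between a word's possible tags, which this lookup cannot do.

```python
# Hypothetical miniature lexicon (an assumption of this sketch); real taggers learn
# word/tag associations from annotated corpora and disambiguate using context.
LEXICON = {
    "the": "DET", "a": "DET",
    "machine": "NOUN", "text": "NOUN", "user": "NOUN",
    "translates": "VERB", "reads": "VERB",
    "quickly": "ADV", "long": "ADJ",
}


def pos_tag(sentence: str, default: str = "NOUN") -> list[tuple[str, str]]:
    """Attach a part-of-speech label to every whitespace-separated word.

    Words missing from LEXICON receive the `default` tag.
    """
    return [(word, LEXICON.get(word.lower(), default)) for word in sentence.split()]
```

Calling `pos_tag("The machine translates a long text")` labels each word with DET, NOUN, VERB, DET, ADJ and NOUN respectively; an out-of-lexicon name such as "Vetulani" simply receives the default NOUN tag.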

Text annotation

Tagging text with metadata

Text segmentation

Segmentation into morphemes, words, sentences, ...
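Sentence and word segmentation can be sketched with two regular expressions. This is a naive sketch under stated assumptions (the helper names `split_sentences` and `split_words` are hypothetical): a sentence boundary is assumed after ., ! or ? followed by whitespace, and words are separated from punctuation marks. Morpheme segmentation is not covered.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence segmentation: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence: str) -> list[str]:
    """Word segmentation: runs of word characters; each punctuation mark is its own token."""
    return re.findall(r"\w+|[^\w\s]", sentence)
```

Abbreviations such as "Dept." or "c. BC" would be wrongly treated as sentence ends by this rule, which is why practical segmenters rely on abbreviation lists or trained models.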

Co-reference processing, anaphora 

Page 13

Processing of words

Lemmatisation, morphological analysis

Word sense disambiguation

Sentiment analysis

Study and detection of positive/negative connotations of text elements
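A minimal lexicon-based sketch of sentiment analysis: each word found in a small polarity lexicon contributes +1 or -1, and a preceding negation word flips the sign of the next opinion word. The lexicon `POLARITY`, the `NEGATIONS` set and the three-way labels are assumptions for illustration; this is one simple technique among many, not the lecture's method.

```python
import re

# Hypothetical polarity lexicon (+1 positive, -1 negative), an assumption of this
# sketch; real systems use much larger resources or trained classifiers.
POLARITY = {"good": 1, "great": 1, "useful": 1, "bad": -1, "poor": -1, "useless": -1}
NEGATIONS = {"not", "never", "no"}


def sentiment(text: str) -> str:
    """Label text as positive, negative or neutral by summing word polarities."""
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATIONS:
            negate = True  # flip the polarity of the next opinion word
            continue
        value = POLARITY.get(word, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this sketch, "The translation is not good" comes out negative and "not bad" comes out positive; sarcasm or domain-specific vocabulary would defeat such a simple scorer.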

Optical character recognition (OCR)

Text as picture -> text file

T9

Fast text entry technology for the 9-key keypad (typical of cellular phones)
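T9 lets the user press each key only once per letter and resolves the resulting ambiguity with a dictionary: every word is stored under its key sequence, and the typed digits select the candidate words. The sketch below assumes the standard keypad letter layout and a hypothetical word list; the helper names are illustrative, and whereas the real product ranks candidates by frequency, here they simply come back in word-list order.

```python
from collections import defaultdict

# Standard letter layout of a phone keypad (keys 2-9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def key_sequence(word: str) -> str:
    """Digits pressed (one press per letter) to type `word`.

    Assumption of this sketch: words contain only the letters a-z.
    """
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_t9_dictionary(words: list[str]) -> dict[str, list[str]]:
    """Group dictionary words under their key sequence."""
    table: dict[str, list[str]] = defaultdict(list)
    for word in words:
        table[key_sequence(word)].append(word)
    return table


def predict(table: dict[str, list[str]], keys: str) -> list[str]:
    """Candidate words for a typed key sequence, in word-list order."""
    return table.get(keys, [])
```

For instance, "good", "home" and "gone" all map to 4663, so typing those four digits yields all three candidates and the user picks one; this shared sequence is exactly the ambiguity the dictionary is there to resolve.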

 

Page 14

BASIC TECHNOLOGIES

Corpora
- speech
- text
- dialogue

Grammars

Lexicons, thesauri

Ontologies (wordnets)

SPEECH PROCESSING

Speech recognition: recognition of speech captured with a microphone

Speech-to-text

Speaker identification

Speech segmentation (phone identification): segmentation of the voice signal

Text-to-speech, speech generation
