The Rôle of Linguistics for the Future of Language Processing

Hans Uszkoreit, German Research Center for Artificial Intelligence and Saarland University at Saarbruecken

Mar 26, 2015

Page 1:

Hans Uszkoreit
German Research Center for Artificial Intelligence
and Saarland University at Saarbruecken

The Rôle of Linguistics for the Future of Language Processing

Page 2:

LEITLINIEN FÜR DIE HEIDELBERGER COMPUTERLINGUISTIK (Guidelines for Computational Linguistics in Heidelberg) © 2003 H. Uszkoreit

Outline

• The development of linguistics
• Linguistics and the computer
• The relevance of CL for theoretical linguistics
• The role of linguistics for language technology
• Current trends and outlook

Page 3:

IT in Science

Data gathering and maintenance
• automatic handling of large volumes of data

Scientific computing
• data and model visualization
• data exploitation
• simulation
• modelling

Electronic scientific information
• data on research (centers, people, resources, projects, literature)

Electronic scientific content
• reports, articles, books, e-journals, e-print archives

Page 4:

Development of Linguistics

First half of the 20th century: linguistics becomes concrete
• structuralist linguistics: ontological concepts (entities and structures)

Second half of the 20th century: linguistics becomes formal
• generative linguistics: formalisms for syntax and semantics

First half of the 21st century: linguistics becomes empirical
• empirical linguistics: quantitative models, graded grammaticality
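Graded grammaticality under a quantitative model can be illustrated with a minimal sketch, assuming a toy bigram language model with add-one smoothing trained on a hypothetical four-sentence corpus. The corpus, the smoothing method, and the function names are illustrative assumptions, not the author's proposal. Instead of a binary grammatical/ungrammatical verdict, every word sequence receives an average log-probability, so acceptability becomes a matter of degree.

```python
import math
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration only).
CORPUS = [
    "sue gave paul an old penny",
    "paul gave sue a penny",
    "sue saw an old man",
    "paul saw a penny",
]

def train_bigrams(sentences):
    """Count bigram histories and bigrams, adding sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens[:-1])  # each token that serves as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams

def score(sentence, unigrams, bigrams, vocab_size):
    """Average log-probability per bigram under add-one (Laplace) smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        logp += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return logp / (len(tokens) - 1)

unigrams, bigrams = train_bigrams(CORPUS)
# Possible next tokens: all corpus words plus the end marker.
vocab = {w for s in CORPUS for w in s.split()} | {"</s>"}
V = len(vocab)

for s in ["sue gave paul a penny", "penny a paul gave sue"]:
    print(f"{score(s, unigrams, bigrams, V):7.3f}  {s}")
```

The unseen but well-formed sentence scores higher than its scrambled counterpart, although neither occurs in the corpus; a real model would of course be trained on far more data.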

Page 5:

The Rôle of Computation

• formalization has led to highly complex systems of formal rules, principles, or constraints that cannot be tested, validated, and modified without sophisticated information processing
• language data collections of sufficient size can no longer be gathered, searched, and maintained without powerful computing
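Testing a formal rule system against data can be illustrated with a minimal sketch, assuming a hypothetical toy grammar in Chomsky normal form and a small hand-made test suite of positive and negative examples. A CYK recognizer checks every item and lists those where the grammar over- or undergenerates. Real grammar engineering environments are far richer; this only shows the principle of regression-testing a grammar.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (CNF).
BINARY = {  # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}
LEXICON = {  # word -> set of preterminal categories
    "sue": {"NP"}, "paul": {"NP"},
    "saw": {"V"}, "a": {"Det"}, "penny": {"N"},
}

def recognizes(sentence, start="S"):
    """CYK recognition: chart[i][j] holds categories spanning words i..j-1."""
    words = sentence.split()
    n = len(words)
    if n == 0:
        return False
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(w, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY.get((b, c), set())
    return start in chart[0][n]

# Test suite: (sentence, expected grammaticality judgement).
SUITE = [
    ("sue saw paul", True),
    ("sue saw a penny", True),
    ("saw sue paul", False),
    ("sue a penny saw", False),
]

failures = [(s, exp) for s, exp in SUITE if recognizes(s) != exp]
print("failures:", failures)
```

Every change to the grammar can be rerun against the suite, which is what makes large rule systems maintainable at all.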

Page 6:

Empirical Linguistics

discrete findings

statistical findings

replicability

shared interpretations of data

connection with data and results

Page 7:

EMPIRICAL LINGUISTICS

corpus data

experimental psycholinguistic data

introspective data

DB of relevant data

research

Page 8:

Driving Forces of CL

Cognition
• models of human language processing

Engineering
• language technology applications

Linguistics
• linguistic theory

Page 9:

Role of Computing in Linguistics

theoretical linguistics

applied linguistics

linguistics w/o the computer

linguistics with the computer

Page 10:

Until 1980

Linguistics

Computational Linguistics

Page 11:

1980-1990

Linguistics

Computational Linguistics

Page 12:

1990-2000

Linguistics

Computational Linguistics

Page 13:

LT METHODS

[Chart axes: discrete / non-discrete / hybrid; shallow / deep]

HMM-based POS tagger
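An HMM-based POS tagger chooses the most probable tag sequence with the Viterbi algorithm. The following is a minimal sketch, assuming a hypothetical three-tag model with hand-set start, transition, and emission probabilities (a real tagger estimates them from an annotated corpus) and an assumed floor probability `UNSEEN` for unseen word/tag pairs. It works in log space to avoid numerical underflow.

```python
import math

TAGS = ["NOUN", "VERB", "DET"]
# Hypothetical hand-set parameters (assumptions, not estimated from data).
START = {"NOUN": 0.6, "VERB": 0.1, "DET": 0.3}
TRANS = {
    "NOUN": {"NOUN": 0.1, "VERB": 0.8, "DET": 0.1},
    "VERB": {"NOUN": 0.4, "VERB": 0.1, "DET": 0.5},
    "DET": {"NOUN": 0.9, "VERB": 0.05, "DET": 0.05},
}
EMIT = {
    "NOUN": {"sue": 0.3, "paul": 0.3, "penny": 0.3, "saw": 0.1},
    "VERB": {"saw": 0.9, "gave": 0.1},
    "DET": {"a": 0.5, "an": 0.5},
}
UNSEEN = 1e-6  # assumed floor probability for unseen word/tag pairs

def viterbi(words):
    """Return the most probable tag sequence for a non-empty list of words."""
    def emit(tag, word):
        return math.log(EMIT[tag].get(word, UNSEEN))

    # best[t] = (log-prob of the best path ending in tag t, that path)
    best = {t: (math.log(START[t]) + emit(t, words[0]), [t]) for t in TAGS}
    for word in words[1:]:
        new = {}
        for t in TAGS:
            # Best predecessor p for tag t at this position.
            score, path = max(
                (best[p][0] + math.log(TRANS[p][t]), best[p][1]) for p in TAGS
            )
            new[t] = (score + emit(t, word), path + [t])
        best = new
    return max(best.values())[1]

print(viterbi("sue saw a penny".split()))
```

Note that "saw" is lexically ambiguous between NOUN and VERB here; the transition probabilities resolve it in context.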

Page 14:

LT METHODS

[Chart axes: discrete / non-discrete / hybrid; shallow / deep]

HPSG parser with MRS (Minimal Recursion Semantics)

Page 15:

LT METHODS

[Chart axes: discrete / non-discrete / hybrid; shallow / deep]

PCF (probabilistic context-free) parser
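A probabilistic context-free parser attaches a probability to every rule and returns the most probable tree. A minimal sketch is the Viterbi variant of CKY, assuming a hypothetical toy grammar in Chomsky normal form with made-up rule probabilities (those for each left-hand side sum to 1). The example sentence has the classic PP-attachment ambiguity, and the rule probabilities decide between the two readings.

```python
from collections import defaultdict

# Hypothetical toy PCFG in CNF; probabilities per left-hand side sum to 1.
BINARY_RULES = [  # (lhs, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),
    ("NP", "NP", "PP", 0.2),
    ("NP", "Det", "N", 0.5),
    ("PP", "P", "NP", 1.0),
]
LEXICAL_RULES = {  # word -> list of (lhs, probability)
    "sue": [("NP", 0.15)], "paul": [("NP", 0.15)],
    "saw": [("V", 1.0)], "with": [("P", 1.0)], "a": [("Det", 1.0)],
    "telescope": [("N", 0.5)], "penny": [("N", 0.5)],
}

def pcky(words, start="S"):
    """Viterbi CKY: return (probability, tree) of the best parse, or None."""
    n = len(words)
    # best[(i, j)][A] = (prob, tree) of the best A spanning words[i:j]
    best = defaultdict(dict)
    for i, w in enumerate(words):
        for lhs, p in LEXICAL_RULES.get(w, []):
            best[(i, i + 1)][lhs] = (p, (lhs, w))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, left, right, p in BINARY_RULES:
                    if left in best[(i, k)] and right in best[(k, j)]:
                        pl, tl = best[(i, k)][left]
                        pr, tr = best[(k, j)][right]
                        prob = p * pl * pr
                        # Keep only the most probable analysis per category.
                        if prob > best[(i, j)].get(lhs, (0.0, None))[0]:
                            best[(i, j)][lhs] = (prob, (lhs, tl, tr))
    return best[(0, n)].get(start)

prob, tree = pcky("sue saw paul with a telescope".split())
print(prob)
print(tree)
```

With these invented numbers the verb attachment of the PP (probability about 0.00135) beats the noun attachment (about 0.000675); different training data would yield different preferences.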

Page 16:

LT METHODS

[Chart axes: discrete / non-discrete / hybrid; shallow / deep]

syntactic LFG parser with ME (maximum entropy) selection
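Maximum-entropy (log-linear) selection ranks the competing analyses that a discrete grammar produces: each parse is described by a feature vector, and its conditional probability is proportional to the exponential of the weighted feature sum. A minimal sketch, assuming hypothetical feature names, feature counts, and hand-set weights (in practice the weights are estimated from a treebank); it only shows how scores become a distribution over readings.

```python
import math

# Hypothetical feature weights (assumed; normally estimated from annotated data).
WEIGHTS = {"pp_attach_verb": 0.8, "pp_attach_noun": 0.2, "num_nodes": -0.05}

def feature_score(features):
    """Weighted sum of feature counts for one candidate parse."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())

def select(candidates):
    """Return (best parse id, conditional distribution) under a log-linear model."""
    scores = {pid: feature_score(f) for pid, f in candidates.items()}
    m = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {pid: math.exp(s - m) for pid, s in scores.items()}
    z = sum(exp_scores.values())  # normalizing constant
    dist = {pid: e / z for pid, e in exp_scores.items()}
    return max(dist, key=dist.get), dist

# Two hypothetical readings of "Sue saw Paul with a telescope".
candidates = {
    "vp_attachment": {"pp_attach_verb": 1, "num_nodes": 11},
    "np_attachment": {"pp_attach_noun": 1, "num_nodes": 12},
}
best, dist = select(candidates)
print(best, {k: round(v, 3) for k, v in dist.items()})
```

The grammar stays discrete; only the choice among its outputs is statistical, which is one form of the hybrid discrete/non-discrete combination discussed on these slides.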

Page 17:

LT METHODS (Trends)

[Chart axes: discrete / non-discrete / hybrid; shallow / deep]

Page 18:

Simulation and Modelling

[Figure: phrase-structure tree (S, NP, VP, V, Det, A, N) for "Sue gave Paul an old penny."]

Feature structure for the NP "an old penny":

NP
[PHON /an old penny/,
 SYN [CAT NP, HEAD [CASE objective, NUMBER sing, PERSON third], VALENCE vstruc],
 SEM [QUANT exist, VAR X1, RESTR [REL old', VAR X1, ARG penny']]]
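Feature structures like this one are combined by unification. A minimal sketch, assuming feature structures are represented as nested Python dicts with atomic string values; it omits reentrancy (structure sharing such as the repeated VAR X1), types, and valence handling, all of which real HPSG systems support. The two requirements on the object NP are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts or atoms); return None on a clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, bval in b.items():
            if feat in result:
                merged = unify(result[feat], bval)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feat] = merged
            else:
                result[feat] = bval  # feature only present in b
        return result
    # Atoms unify only if they are identical.
    return a if a == b else None

# NP "an old penny", simplified from the feature structure above (SEM omitted).
np_old_penny = {
    "PHON": "an old penny",
    "SYN": {"CAT": "NP", "HEAD": {"CASE": "objective", "NUMBER": "sing", "PERSON": "third"}},
}
# Hypothetical constraints a verb might impose on its object.
wants_objective = {"SYN": {"CAT": "NP", "HEAD": {"CASE": "objective"}}}
wants_plural = {"SYN": {"HEAD": {"NUMBER": "plur"}}}

print(unify(np_old_penny, wants_objective) is not None)  # compatible
print(unify(np_old_penny, wants_plural))                 # clash on NUMBER
```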

Page 19:

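[Figure: phrase-structure trees for the German sentence "Sue gab Paul einen alten Pfennig." ('Sue gave Paul an old penny.'), analysed with an S/NP node, and for the English sentence "Sue gave Paul an old penny."]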

$\exists x\,[(\mathit{old}'(\mathit{penny}'))(x) \wedge \mathit{Past}(\mathit{give}'(\mathit{sue}', \mathit{paul}', x))]$
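The logical form for "Sue gave Paul an old penny." (there is an x such that x is an old penny and Sue gave x to Paul in the past) can be given a model-theoretic reading. A minimal sketch, assuming a hypothetical toy model: a four-element domain, an intersective treatment of old' (so (old'(penny'))(x) holds iff x is old and a penny), and Past(give'(...)) approximated by a set of past giving events. The existential quantifier is checked by searching the domain for a witness.

```python
# Hypothetical toy model (all facts are assumptions for illustration).
DOMAIN = {"sue", "paul", "p1", "p2"}
PENNY = {"p1", "p2"}
OLD = {"p1", "sue"}
PAST_GIVE = {("sue", "paul", "p1")}  # (giver, recipient, theme) of past events

def old_penny(x):
    """(old'(penny'))(x), assuming an intersective reading of old'."""
    return x in OLD and x in PENNY

def past_give(giver, recipient, theme):
    """Past(give'(giver, recipient, theme)) as membership in past events."""
    return (giver, recipient, theme) in PAST_GIVE

def sentence_true():
    """Exists x [ (old'(penny'))(x) and Past(give'(sue', paul', x)) ]."""
    return any(old_penny(x) and past_give("sue", "paul", x) for x in DOMAIN)

print(sentence_true())
```

Non-intersective adjectives (e.g. "former") would need old' to be a genuine function on predicates, which is why the formula writes it as applied to penny'.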

Page 20:

APPLICATIONS

Machine Translation
• e.g. Systran, Logos, METAL-Comprendium, IBM PT

Access to Databases
• e.g. Core Language Engine

New: Information Extraction and Text Enrichment
• e.g. WHITEBOARD, DEEP THOUGHT

Page 21:

Problems with Deep Analysis

• Coverage (Development Time)
• Robustness (Coping with Out-of-Grammar Input)
• Efficiency (Runtime and Space Efficiency)
• Specificity (Selection among Readings)
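One common response to the robustness problem is a hybrid pipeline that tries deep analysis first and falls back to shallow processing for out-of-grammar input. A minimal sketch, assuming hypothetical stand-in functions: `deep_parse` (a lookup table of the few sentences the "grammar" covers) and `shallow_chunk` (a naive lexicon-based NP chunker). Neither corresponds to a real system; only the control flow is the point.

```python
from typing import Optional

# Hypothetical stand-ins: a "deep grammar" covering one sentence,
# and a tag lexicon for shallow chunking. Neither models a real system.
COVERED = {"sue gave paul an old penny": "give'(sue', paul', x) with x an old penny"}
LEXICON_TAGS = {"sue": "N", "paul": "N", "penny": "N", "gave": "V", "an": "DET", "old": "ADJ"}

def deep_parse(sentence: str) -> Optional[str]:
    """Return a semantic representation, or None if the input is out of grammar."""
    return COVERED.get(sentence)

def shallow_chunk(sentence: str) -> list:
    """Close an NP chunk at each noun (with preceding DET/ADJ words);
    other words become single-token chunks, unknown words are tagged UNK."""
    chunks, current = [], []
    for word in sentence.split():
        tag = LEXICON_TAGS.get(word, "UNK")
        if tag in ("DET", "ADJ", "N"):
            current.append(word)
            if tag == "N":
                chunks.append(("NP", " ".join(current)))
                current = []
        else:
            if current:  # DET/ADJ run without a noun
                chunks.append(("FRAG", " ".join(current)))
                current = []
            chunks.append((tag, word))
    if current:
        chunks.append(("FRAG", " ".join(current)))
    return chunks

def analyse(sentence: str):
    """Prefer the deep result; fall back to shallow chunks for robustness."""
    deep = deep_parse(sentence)
    return ("deep", deep) if deep is not None else ("shallow", shallow_chunk(sentence))

print(analyse("sue gave paul an old penny"))
print(analyse("sue gave paul uh an old penny"))
```

The fallback trades depth for robustness; the other problems on the slide (coverage, efficiency, specificity) need further measures such as statistical selection among readings.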

Page 22:

Outlook

• Linguistics will develop hybrid discrete and non-discrete models of language
• More subareas of linguistics will employ computational modelling
• Computational linguistics will play a central role in the empirical branch of linguistic research
• Computational linguistics methods and results do have a future in language technology
• Language technology will have to go more deeply into semantics
• The field poses some grand challenges

Page 23:

Grand Challenges

• hybrid models of language processing and learning
• models of language change
• empirical methodology of language science: large multilevel linguistically interpreted data collections
• ambient computing: ubiquitous natural access to information and assistance
• turning the WWW as well as personal and collective digital information repositories into digital memories and knowledge bases