Transcript
Page 1: Machine Learning for Language Learning and Processing

Machine Learning for Language Learning and Processing

Sharon Goldwater, Frank Keller, Mirella Lapata, Victor Lavrenko, Mark Steedman

School of Informatics
University of Edinburgh
[email protected]

October 15, 2008

Page 2: Machine Learning for Language Learning and Processing

1 Machine Learning and NLP
  Latent Variables
  Multi-class and Structured Variables
  Discrete, Sparse Data
  Other Problems

2 Research Interests
  Parsing
  Language Acquisition
  Language Generation
  Information Retrieval

Page 3: Machine Learning for Language Learning and Processing

1 Machine Learning and NLP
  Latent Variables
  Multi-class and Structured Variables
  Discrete, Sparse Data
  Other Problems

2 Research Interests
  Parsing
  Language Acquisition
  Language Generation
  Information Retrieval

Page 4: Machine Learning for Language Learning and Processing

Latent Variables

Natural language processing (NLP) problems typically involve inferring latent (unobserved) variables, for example:

given a bilingual text, infer an alignment;

given a string of words, infer a parse tree.
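To make the first case concrete, here is a minimal sketch of expectation-maximisation for a toy IBM-Model-1-style word aligner, where the alignment of each target word to a source word is the latent variable; the miniature corpus and all names are invented for illustration.

```python
# Latent-variable inference in NLP: EM for a toy IBM-Model-1-style word
# aligner. The alignment of each target word is the latent variable; the
# corpus below is made up for illustration.
from collections import defaultdict

# Toy parallel corpus: (source sentence, target sentence) pairs.
corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]

# Initialise translation probabilities t(target | source) uniformly.
src_vocab = {s for src, _ in corpus for s in src}
tgt_vocab = {t for _, tgt in corpus for t in tgt}
t = {(tg, s): 1.0 / len(tgt_vocab) for s in src_vocab for tg in tgt_vocab}

for _ in range(10):                 # a few EM iterations
    count = defaultdict(float)      # expected counts c(target, source)
    total = defaultdict(float)      # expected counts c(source)
    # E-step: posterior over the latent alignment of each target word.
    for src, tgt in corpus:
        for tg in tgt:
            norm = sum(t[(tg, s)] for s in src)
            for s in src:
                p = t[(tg, s)] / norm
                count[(tg, s)] += p
                total[s] += p
    # M-step: re-estimate t(target | source) from expected counts.
    for (tg, s), c in count.items():
        t[(tg, s)] = c / total[s]

print(round(t[("haus", "house")], 2))   # converges towards 1.0
```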

Page 5: Machine Learning for Language Learning and Processing

Multi-class and Structured Variables

The learning targets in NLP are often multi-class, e.g., in part-of-speech tagging:

standard POS tag sets for English have around 60 classes; more elaborate ones around 150 (CLAWS6);

morphological annotation often increases the size of the tag set (e.g., Bulgarian: around 680 tags).

Everything/PNI in/PRP the/AT0 sale/NN1 will/VM0 have/VHI been/VBN used/VVN in/PRP films/NN2 ./PUN

Page 6: Machine Learning for Language Learning and Processing

Multi-class and Structured Variables

NLP tasks are often sequencing tasks rather than simple classification (see the decoding sketch after this list):

PNI → PRP → AT0 → NN1 → VM0 → VHI → VBN → VVN . . .

Many NLP tasks are not classification at all, but involve hierarchical structure, e.g., in parsing:

Most structures in the Penn Treebank are unique.
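A small sketch of what sequence decoding looks like in practice: Viterbi search for the most probable tag sequence under a toy HMM tagger. The tag set, probabilities, and sentence are invented for illustration, not taken from the slides.

```python
# Structured prediction over a sequence: Viterbi decoding for a toy HMM
# POS tagger with made-up probabilities.

tags = ["DET", "NOUN", "VERB"]
trans = {  # P(next_tag | tag), with "START" as the initial state
    "START": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
    "DET":   {"DET": 0.1, "NOUN": 0.8, "VERB": 0.1},
    "NOUN":  {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB":  {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit = {  # P(word | tag)
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "walks": 0.2, "park": 0.4},
    "VERB": {"walks": 0.7, "dog": 0.3},
}

def viterbi(words):
    """Return the most probable tag sequence for `words`."""
    # best[i][t] = (probability, backpointer) of the best path ending in tag t at position i.
    best = [{t: (trans["START"].get(t, 0) * emit[t].get(words[0], 1e-6), None)
             for t in tags}]
    for i in range(1, len(words)):
        col = {}
        for t in tags:
            prob, prev = max(
                (best[i - 1][p][0] * trans[p].get(t, 0) * emit[t].get(words[i], 1e-6), p)
                for p in tags)
            col[t] = (prob, prev)
        best.append(col)
    # Trace back from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi("the dog walks".split()))   # ['DET', 'NOUN', 'VERB']
```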

Page 7: Machine Learning for Language Learning and Processing

Discrete, Sparse Data

Linguistic data is different from standard ML data (speech, vision):

typically discrete (characters, words, texts);

follows a Zipfian distribution.
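A quick way to see the Zipfian shape on any sizeable text: rank word types by frequency, and the product of rank and frequency stays roughly constant. The file name below is a placeholder.

```python
# Word-frequency counts from a plain-text corpus follow Zipf's law:
# frequency of the r-th most frequent type falls off roughly as 1/r.
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:   # placeholder: any large text file
    counts = Counter(f.read().lower().split())

freqs = [c for _, c in counts.most_common()]
for rank in (1, 10, 100, 1000):
    if rank <= len(freqs):
        # Under Zipf's law, rank * frequency is roughly constant.
        print(rank, freqs[rank - 1], rank * freqs[rank - 1])
```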

Page 8: Machine Learning for Language Learning and Processing

Discrete, Sparse Data

The Zipfian distribution leads to ubiquitous data sparseness:

standard maximum likelihood estimation doesn't work well for linguistic data;

a large number of smoothing techniques have been developed to deal with this problem;

most of them are ad hoc; Bayesian methods are a principled alternative.

G ∼ PY(d, θ, G0)
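As an illustration of the Pitman-Yor prior G ∼ PY(d, θ, G0) shown above, here is a minimal sketch of sampling from it via its Chinese-restaurant-style predictive rule; the discount, concentration, and base distribution below are invented for illustration.

```python
# Sampling from a Pitman-Yor process G ~ PY(d, theta, G0): a new draw joins
# an existing "table" k with probability proportional to (n_k - d), and
# opens a new table (labelled by a draw from G0) with probability
# proportional to (theta + d * K).
import random

def sample_pitman_yor(n_draws, d=0.5, theta=1.0,
                      base=lambda: random.choice("abcdefgh")):
    tables = []   # customer counts per table
    labels = []   # label (draw from G0) for each table
    draws = []
    for _ in range(n_draws):
        weights = [c - d for c in tables] + [theta + d * len(tables)]
        k = random.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):      # open a new table
            tables.append(1)
            labels.append(base())
        else:                     # join an existing table
            tables[k] += 1
        draws.append(labels[k])
    return draws

print(sample_pitman_yor(20))   # shows the characteristic rich-get-richer reuse
```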

Page 9: Machine Learning for Language Learning and Processing

Other Problems

NLP typically uses pipeline models; errors propagate;

models are often highly domain-dependent (models for broadcast news will not work well for biomedical text, etc.);

there is no single error function to optimize; evaluation metrics differ from task to task (BLEU for MT, ROUGE for summarization, PARSEVAL for parsing).

Page 10: Machine Learning for Language Learning and Processing

1 Machine Learning and NLP
  Latent Variables
  Multi-class and Structured Variables
  Discrete, Sparse Data
  Other Problems

2 Research Interests
  Parsing
  Language Acquisition
  Language Generation
  Information Retrieval

Page 11: Machine Learning for Language Learning and Processing

Parsing (Keller, Steedman)

Current focus of research in probabilistic parsing:

models for more expressive syntactic representations (CCG, TAG, dependency grammar);

semi-supervised induction of grammars and parsing models;

cognitive modeling:

incrementality; limited parallelism, limited memory; evaluation against behavioral data.

[Figure: eye-tracking record with numbered fixations over the example sentence below]

The pilot embarrassed John and put himself in a very awkward situation.

Reading measures defined over the numbered fixations in the figure:

First fixation time = 5

Gaze duration = 5 + 6

Second pass time = 8 + 10

Total time = 5 + 6 + 8 + 10

Skipping rate: e.g., put
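A minimal sketch of how such reading measures can be computed from an eye-movement record, using one common operationalisation; fixations are (region, duration) pairs in temporal order, and the data below are invented.

```python
# Standard eye-tracking reading measures for one word (region) from a
# fixation sequence, following one common operationalisation.

def reading_measures(fixations, region):
    """First fixation time, gaze duration, total time, second pass time, skipped."""
    durs = [d for r, d in fixations if r == region]
    total_time = sum(durs)
    first_fixation = durs[0] if durs else 0
    # Gaze duration: fixations on the region before it is first exited.
    gaze = 0
    entered = False
    for r, d in fixations:
        if r == region:
            entered = True
            gaze += d
        elif entered:
            break
    second_pass = total_time - gaze     # rereading after leaving the region
    skipped = not durs
    return first_fixation, gaze, total_time, second_pass, skipped

# Toy record: the reader fixates region 3 twice, moves on, then regresses to it.
fixations = [(1, 210), (2, 180), (3, 240), (3, 200), (4, 230), (3, 260)]
print(reading_measures(fixations, 3))   # (240, 440, 700, 260, False)
```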

Page 12: Machine Learning for Language Learning and Processing

Language Acquisition (Goldwater, Steedman)

Research focuses on Bayesian models for improving unsupervised NLP and understanding of human language acquisition:

What constraints/biases are needed for effective generalization?

How can different sources of information be successfully combined?

ML methods and problems:

infinite models, esp. those for sequences/hierarchies;

incremental, memory-limited inference methods;

joint inference of different kinds of linguistic information (e.g., morphology and syntax).

Page 13: Machine Learning for Language Learning and Processing

Language Generation (Lapata)

Research focuses on data-driven models for language generation:

fluent and coherent text that resembles human writing;

general modeling framework for different input types (time series data, pictures, logical forms).

ML methods and problems:

mathematical programming for sentence compression and summarization (see the sketch after this list);

latent variable models for image caption generation;

models have to integrate conflicting constraints and varying linguistic representations.
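Echoing the mathematical-programming bullet above, here is a toy sketch of compression as constrained optimisation: a 0/1 variable per word is chosen to maximise word-importance scores under a length budget. Real systems solve an ILP with linguistic constraints; this sketch brute-forces a tiny instance, and the sentence and scores are invented for illustration.

```python
# Sentence compression as a tiny 0/1 optimisation problem: keep the words
# that maximise total importance subject to a length budget. Brute force
# stands in for the ILP solvers used in practice.
from itertools import product

words  = ["officials", "said", "yesterday", "that", "the", "bridge", "collapsed"]
scores = [0.9,          0.4,    0.2,         0.1,    0.3,   0.8,      0.9]
budget = 4   # keep at most 4 words

best, best_score = None, -1.0
for keep in product([0, 1], repeat=len(words)):      # all 0/1 assignments
    if sum(keep) <= budget:
        s = sum(sc for k, sc in zip(keep, scores) if k)
        if s > best_score:
            best, best_score = keep, s

print([w for k, w in zip(best, words) if k])
# ['officials', 'said', 'bridge', 'collapsed']
```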

Page 14: Machine Learning for Language Learning and Processing

Language Generation (Lapata)

Example: image captioning beyond keywords.

Keywords: troop, Israel, force, ceasefire, soldiers

Caption: Thousands of Israeli troops are in Lebanon as the ceasefire begins.

Page 15: Machine Learning for Language Learning and Processing

Information Retrieval (Lavrenko)

Universal search:

learn to relate relevant text/images/products/DB records;

data: high-dimensional, extremely sparse, but dimensionality reduction is a bad idea;

targets: focused information needs, not broad categories;

semi-supervised: lots of unlabeled data, few judgments.

Learning to rank:

partial preferences → ranking function;

objective: non-smooth, can be very expensive to evaluate.
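A minimal sketch of learning a linear ranking function from partial preferences, using perceptron-style updates on pairs of the form "document a should rank above document b"; the features and preference pairs are invented for illustration, and this is not the group's actual method.

```python
# Learning to rank from partial preferences: a perceptron-style update on
# violated preference pairs, learning a linear scoring function.

def score(w, x):
    """Linear score of a feature vector x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_ranker(preference_pairs, dim, epochs=20, lr=0.1):
    """preference_pairs: list of (preferred_features, other_features) tuples."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in preference_pairs:
            if score(w, better) <= score(w, worse):   # preference violated
                for i in range(dim):                  # move w toward the preferred item
                    w[i] += lr * (better[i] - worse[i])
    return w

# Hypothetical features, e.g. (term overlap, click rate, staleness).
pairs = [
    ((0.9, 0.2, 0.1), (0.3, 0.1, 0.8)),
    ((0.7, 0.5, 0.0), (0.2, 0.4, 0.9)),
]
w = train_ranker(pairs, dim=3)
print(w)   # weights that score the preferred document higher in each pair
```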

Novelty detection:

example: identify first reports of events in the news;

supervised task, but hard to learn anything from labels;

best approaches unsupervised, performance very low.
