1 Unsupervised and Knowledge-Free Natural Language Processing in the Structure Discovery Paradigm Chris Biemann University of Leipzig, Germany Talk at University of Wolverhampton September 7, 2007

Dec 22, 2015


1

Unsupervised and Knowledge-Free Natural Language

Processing in the Structure Discovery Paradigm

Chris Biemann

University of Leipzig, Germany

Talk at University of Wolverhampton

September 7, 2007


2

Outline

Review of traditional approaches
• Knowledge-intensive vs. knowledge-free
• Degrees of Supervision

A new approach
• The Structure Discovery Paradigm

Graph-based SD procedures
• Graph models for language processing
• Chinese Whispers Graph Clustering

Structure Discovery Processes
• Language Separation
• Unsupervised POS tagging
• Word Sense Induction


3

Knowledge-Intensive vs. Knowledge-Free

In traditional automated language processing, knowledge is involved in all cases where humans manually tell machines

• How to process language by explicit knowledge

• How a task should be solved by implicit knowledge

Knowledge can be provided by means of:

• Dictionaries, e.g. thesauri, WordNet, ontologies, …

• (grammar) rules

• Annotation


4

Degrees of Supervision

Supervision is providing positive and negative training examples to Machine Learning algorithms, which use this as a basis for building a model that reproduces the classification on unseen data

Degrees:
• Fully supervised (Classification): Learning is only carried out on a fully labeled training set
• Semi-supervised: Unlabeled examples are also used for building a data model
• Weakly-supervised (Bootstrapping): A small set of labeled examples is grown and classifications are used for re-training
• Unsupervised (Clustering): No labeled examples are provided


5

Structure Discovery Paradigm

SD:
• Analyze raw data and identify regularities
• Statistical methods, clustering
• Knowledge-free, unsupervised
• Structures: as many as can be discovered
• Language-independent, domain-independent, encoding-independent
• Goal: Discover structure in language data and mark it in the data


6

Example: Discovered Structures

• Annotation on various levels

• Similar labels denote similar properties as found by the SD algorithms

• Similar structures in corpus are annotated in a similar way

„ Increased interest rates lead to investments in banks .“

<sentence lang=12, subj=34.11>
  <chunk id=c25>
    <word POS=p3 m=0.0 s=s14>In=creas-ed</word>
    <MWU POS=p1 s=s33>
      <word POS=p1 m=5.1 s=s44>interest</word>
      <word POS=p1 m=2.12 s=s106>rate-s</word>
    </MWU>
  </chunk>
  <chunk id=c13>
    <MWU POS=p2>
      <word POS=p2 m=17.3 s=74>lead</word>
      <word POS=p117 m=11.98>to</word>
    </MWU>
  </chunk>
  <chunk id=c31>
    <word POS=p1 m=1.3 s=33>investment-s</word>
    <word POS=p118 m=11.36>in</word>
    <word POS=p1 m=1.12 s=33>bank-s</word>
  </chunk>
  <word POS=298> . </word>
</sentence>


7

Consequences of Working in SD

• Only input allowed is raw text data
• Machines are told how to algorithmically discover structure
• Self-annotation process by marking regularities in the data
• Structure Discovery process is iterated

[Figure: iteration loop. Text Data is fed to an SD algorithm, which finds regularities by analysis and annotates the data with regularities; the annotated data is fed to further SD algorithms]


8

Pros and Cons of Structure Discovery

Advantages:

• Cheap: only raw data needed
• Alleviation of acquisition bottleneck
• Language and domain independent
• No data-resource mismatch (all resources leak)

Disadvantages:

• No control over self-annotation labels
• Congruence to linguistic concepts not guaranteed
• Much computing time needed


9

What is it good for? How do I know?

• Many structural regularities can be thought of, some are interesting, some are not.

• Structures discovered by SD algorithms will not necessarily match the concepts of linguists

• Working in the SD paradigm means over-generating structure acquisition methods and checking whether they are helpful

Methods for telling helpful from useless SD procedures:

• „Look at my nice clusters“ approach: Examine data by hand. While good in the initial phase of testing, this is inconclusive: choice of clusters, coverage…

• Task-based evaluation: Use the labels obtained as features in a Machine Learning scenario and measure the contribution of each label type. This involves supervision and is indirect.


10

Graph models for SD procedures

Motivation for graph representation
• Graphs are an intuitive and natural way to encode language units as nodes and their similarities as edges, but also other representations are possible
• Graph clustering can efficiently perform abstraction by grouping units into homogeneous sets with Chinese Whispers

Some graphs on basic units
• Word co-occurrence (neighbour/sentence), significance, higher orders
• Word context similarity based on global context vectors
• Sentence/document similarity on common words
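The first graph type in the list, neighbour co-occurrence, could be built roughly as follows. This is only a sketch: it assumes whitespace tokenization and uses raw co-occurrence counts as edge weights, so the significance weighting mentioned above is not implemented. The function name `neighbour_cooccurrence_graph` is hypothetical.

```python
from collections import Counter


def neighbour_cooccurrence_graph(sentences):
    """Return an undirected weighted graph as {(word_a, word_b): count}.

    Two words are connected if they appear next to each other in a sentence.
    Edge keys are sorted word pairs, so (a, b) and (b, a) share one edge.
    Assumption: raw counts serve as weights (no significance test).
    """
    edges = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for left, right in zip(tokens, tokens[1:]):
            if left != right:  # ignore self-loops
                edges[tuple(sorted((left, right)))] += 1
    return edges


graph = neighbour_cooccurrence_graph(["the cat sat", "the cat ran"])
for (a, b), weight in sorted(graph.items()):
    print(a, b, weight)
```

Sentence-level co-occurrence would pair all words of a sentence instead of only direct neighbours.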


11

Graph Clustering

• Find groups of nodes in undirected, weighted graphs
• Hierarchical Clustering vs. Flat Partitioning

[Figure: example graph]


12

Desired outcomes?

• Colors symbolise partitions

[Figure: example graph, nodes colored by partition]


13

Chinese Whispers Algorithm

• Nodes have a class and communicate it to their adjacent nodes

• A node adopts one of the majority classes in its neighbourhood

• Nodes are processed in random order for some iterations

Algorithm:

initialize:
    forall vi in V: class(vi) = i;

while changes:
    forall v in V, randomized order:
        class(v) = highest ranked class in neighborhood of v;
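The pseudocode above can be sketched in Python as follows. Several details are assumptions rather than part of the slide: classes are ranked by the summed edge weight of the neighbours carrying them, a node keeps its current class when that class is among the top-ranked ones, remaining ties are broken at random, and an iteration cap (`max_iterations`) guards against the non-converging cases. The function name and parameters are hypothetical.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, max_iterations=20, seed=0):
    """Cluster an undirected weighted graph given as (u, v, weight) triples.

    Returns a dict mapping each node to its class label.
    """
    rng = random.Random(seed)
    neighbours = defaultdict(dict)
    for u, v, weight in edges:
        neighbours[u][v] = weight
        neighbours[v][u] = weight

    nodes = sorted(neighbours)
    # initialize: every node starts in its own class
    label = {node: i for i, node in enumerate(nodes)}

    for _ in range(max_iterations):
        changed = False
        rng.shuffle(nodes)  # randomized processing order
        for node in nodes:
            # rank classes by the summed weight of neighbours carrying them
            score = defaultdict(float)
            for other, weight in neighbours[node].items():
                score[label[other]] += weight
            best = max(score.values())
            candidates = [c for c, s in score.items() if s == best]
            if label[node] in candidates:
                continue  # assumption: keep the current class on ties
            label[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break
    return label


# Two triangles joined by one weak edge end up in two classes.
toy_edges = [("a", "b", 5), ("b", "c", 5), ("a", "c", 5),
             ("d", "e", 5), ("e", "f", 5), ("d", "f", 5),
             ("c", "d", 1)]
print(chinese_whispers(toy_edges))
```

The class IDs in the result are arbitrary; only the grouping of nodes is meaningful.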

[Figure: example graph with nodes A (class L1), B (L4), C (L3), D (L2) and E (L3), edge weights 5, 8, 6 and 3, and node degrees 1 to 5]


14

Properties of CW

PRO:
• Efficiency: CW is time-linear in the number of edges. This is bounded by n² with n = number of nodes, but in real-world data, graphs are much sparser
• Parameter-free: this includes the number of clusters

CON:
• Non-deterministic: due to random order processing and possible ties w.r.t. the majority
• Does not converge: see tie example
• Formally hard to analyse

However, the CONs are not severe for real-world data...


15

Application: Language Separation

• Cluster the co-occurrence graph of a multilingual corpus

• Use words of the same class in a language identifier as lexicon

• Almost perfect performance
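The second bullet can be illustrated with a small sketch. It assumes that clustering the co-occurrence graph has already produced a mapping from words to class IDs, and it assigns a sentence to the class that most of its known words belong to. The function name `identify_language` and the toy lexicon are hypothetical, and majority voting is an assumption about how the lexicon is applied.

```python
from collections import Counter


def identify_language(sentence, word_class):
    """Return the class ID most frequent among the sentence's known words.

    word_class maps lower-cased words to cluster IDs (one cluster per
    language, if the clustering worked). Returns None if no word is known.
    """
    votes = Counter(
        word_class[token]
        for token in sentence.lower().split()
        if token in word_class
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy lexicon: class 0 holds English words, class 1 German words.
toy_lexicon = {"the": 0, "and": 0, "of": 0, "der": 1, "und": 1, "die": 1}
print(identify_language("der Hund und die Katze", toy_lexicon))
```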

[Figure: Precision, Recall and F-value for 7-lingual corpora, plotted against the number of sentences per language (100 to 100000); the P/R/F axis ranges from 0.96 to 1]


16

“Look at my nice languages!” Cleaning CUCWeb

Latin:
In expeditionibus tessellata et sectilia pauimenta circumferebat.
Britanniam petiuit spe margaritarum: earum amplitudinem conferebat et interdum sua manu exigebat ..

Scripting:
@echo @cd $(TLSFDIR);$(CC) $(RTLFLAGS) $(RTL_LWIPFLAGS) -c $(TLSFSRC) …
@echo @cd $(TOOLSDIR);$(CC) $(RTLFLAGS) $(RTL_LWIPFLAGS) -c $(TOOLSSRC) ..

Hungarian:
A külügyminiszter a diplomáciai és konzuli képviseletek címjegyzékét és konzuli …
Köztestületek, jogi személyiséggel és helyi jogalkotási jogkörrel.

Esperanto:
Por vidi ghin kun internacia kodigho kaj kun kelkaj bildoj kliku tie chi ) La Hispana..
Ne nur pro tio, ke ghi perdigis la vivon de kelk-centmil hispanoj, sed ankau pro ghia efiko..

Human Genome:
1 atgacgatga gtacaaacaa ctgcgagagc atgacctcgt acttcaccaa ctcgtacatg
61 ggggcggaca tgcatcatgg gcactacccg ggcaacgggg tcaccgacct ggacgcccag
121 cagatgcacc …

Isoko (Nigeria):
(1) Ko Ileleikristi a re rowo ino Oghene yo Esanerovo?
(5) Ko Jesu o whu evao uruwhere?


17

Unsupervised POS tagging

Given: Unstructured monolingual text corpus

Goal: Induction of POS tags for all words.

Motivation:
• POS information is a processing step in a variety of NLP applications such as parsing, IE, indexing
• POS taggers need a considerable amount of hand-tagged training data, which is expensive and only available for major languages
• Even for major languages, POS taggers are suited for well-formed texts and do not cope well with domain-dependent issues as found e.g. in email or spoken corpora


18

Steps in unsupervised POS tagging

... , sagte der Sprecher bei der Sitzung .
... , rief der Vorsitzende in der Sitzung .
... , warf in die Tasche aus der Ecke .

C1: sagte, warf, rief
C2: Sprecher, Vorsitzende, Tasche
C3: in
C4: der, die

... , sagte|C1 der|C4 Sprecher|C2 bei der|C4 Sitzung .
... , rief|C1 der|C4 Vorsitzende|C2 in|C3 der|C4 Sitzung .
... , warf|C1 in|C3 die|C4 Tasche|C2 aus der|C4 Ecke .

P(C3 | word, C4, C2)

... , sagte|C1 der|C4 Sprecher|C2 bei|C3 der|C4 Sitzung|C2 .
... , rief|C1 der|C4 Vorsitzende|C2 in|C3 der|C4 Sitzung|C2 .
... , warf|C1 in|C3 die|C4 Tasche|C2 aus|C3 der|C4 Ecke|C2 .

[Figure: processing pipeline with the boxes Unlabelled Text; Distributional Vectors (high frequency words) and NB-cooccurrences (medium frequency words); Graph 1 and Graph 2; Chinese Whispers Graph Clustering; Partitioning 1 and Partitioning 2; Maxtag Lexicon; Partially Labelled Text; Trigram Viterbi Tagger; Fully Labelled Text]
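The step from unlabelled to partially labelled text on this slide can be sketched as a plain lexicon lookup. This is an illustration under the assumption that every word belongs to at most one cluster, as in the toy clusters C1 to C4; the function name `tag_with_lexicon` is hypothetical, and words outside the lexicon are left untagged for a later tagger.

```python
def tag_with_lexicon(sentence, clusters):
    """Attach cluster labels to all words that occur in the lexicon.

    clusters maps a label such as "C1" to its member words.
    Unknown words are passed through unchanged.
    """
    lexicon = {word: label
               for label, words in clusters.items()
               for word in words}
    return " ".join(
        f"{token}|{lexicon[token]}" if token in lexicon else token
        for token in sentence.split()
    )


toy_clusters = {
    "C1": ["sagte", "warf", "rief"],
    "C2": ["Sprecher", "Vorsitzende", "Tasche"],
    "C3": ["in"],
    "C4": ["der", "die"],
}
print(tag_with_lexicon(", sagte der Sprecher bei der Sitzung .", toy_clusters))
# , sagte|C1 der|C4 Sprecher|C2 bei der|C4 Sitzung .
```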


19

Partitioning 1 Example

• Collect global contexts for top 10K words
• Cosine between vectors = similarity; matrix not sparse!
• Construct graph, cluster with Chinese Whispers
  – no predefined number of clusters
  – efficient
  – parameter: threshold on sim. value
  – excludes singletons

... _KOM_ sagte der Sprecher bei der Sitzung _ESENT_
... _KOM_ rief der Vorsitzende in der Sitzung _ESENT_
... _KOM_ warf in die Tasche aus der Ecke _ESENT_

Features: der(1), die(2), bei(3), in(4), _ESENT_(5), _KOM_(6)

[Table: binary context vectors per word over the six features at the positions -2, -1, +1 and +2; sagte and rief have two non-zero entries each, warf, Sprecher, Vorsitzende and Tasche have three each]

Cos(x,y)     sagte  rief   warf   Sprecher  Vorsitzende  Tasche
sagte        1
rief         1      1
warf         0.408  0.408  1
Sprecher     0      0      0      1
Vorsitzende  0      0      0.333  0.666     1
Tasche       0      0      0      0.408     0.333        1
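The context vectors and cosine values of this example can be recomputed with a short sketch. It assumes binary vector entries indexed by (position, feature word) with the four positions -2, -1, +1 and +2, which matches the table layout; all names in the code are hypothetical.

```python
import math

FEATURES = {"der", "die", "bei", "in", "_ESENT_", "_KOM_"}
OFFSETS = (-2, -1, 1, 2)

SENTENCES = [
    "_KOM_ sagte der Sprecher bei der Sitzung _ESENT_",
    "_KOM_ rief der Vorsitzende in der Sitzung _ESENT_",
    "_KOM_ warf in die Tasche aus der Ecke _ESENT_",
]


def context_vectors(sentences, targets):
    """Collect, per target word, the set of (offset, feature) pairs seen.

    A set stands in for a binary vector: a pair is present or absent.
    """
    vectors = {target: set() for target in targets}
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token not in vectors:
                continue
            for offset in OFFSETS:
                j = i + offset
                if 0 <= j < len(tokens) and tokens[j] in FEATURES:
                    vectors[token].add((offset, tokens[j]))
    return vectors


def cosine(a, b):
    """Cosine similarity of two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


vectors = context_vectors(
    SENTENCES, ["sagte", "rief", "warf", "Sprecher", "Vorsitzende"])
print(round(cosine(vectors["sagte"], vectors["rief"]), 3))  # 1.0
print(round(cosine(vectors["sagte"], vectors["warf"]), 3))  # 0.408
```

Word pairs whose cosine exceeds the similarity threshold would then become edges of the graph that is clustered.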


20

Partitioning 2 Example

• Clustering more than 10K words based on context vectors is not viable (time-quadratic). Vectors tend to get sparser for lower frequency words

• Use neighbouring cooccurrences to compute similarity between medium- and low-frequency words (without top 2000):

• Clustering with CW: Medium- and low-frequency words are arranged in POS-motivated clusters


21

unsuPOS: Ambiguity Example

Word  cluster ID  cluster members (size)
I     166         I (1)
saw   2           past tense verbs (3818)
the   73          a, an, the (3)
man   1           nouns (17418)
with  13          prepositions (143)
a     73          a, an, the (3)
saw   1           nouns (17418)
.     116         . ! ? (3)


22

unsuPOS: Medline tagset

1 (13721) recombinogenic, chemoprophylaxis, stereoscopic, MMP2, NIPPV, Lp, biosensor, bradykinin, issue, S-100beta, iopromide, expenditures, dwelling, emissions, implementation, detoxification, amperometric, appliance, rotation, diagonal,
2 (1687) self-reporting, hematology, age-adjusted, perioperative, gynaecology, antitrust, instructional, beta-thalassemia, interrater, postoperatively, verbal, up-to-date, multicultural, nonsurgical, vowel, narcissistic, offender, interrelated,
3 (1383) proven, supplied, engineered, distinguished, constrained, omitted, counted, declared, reanalysed, coexpressed, wait,
4 (957) mediates, relieves, longest, favor, address, complicate, substituting, ensures, advise, share, employ, separating, allowing,
5 (1207) peritubular, maxillary, lumbar, abductor, gray, rhabdoid, tympanic, malar, adrenal, low-pressure, mediastinal,
6 (653) trophoblasts, paws, perfusions, cerebrum, pons, somites, supernatant, Kingdom, extra-embryonic, Britain, endocardium,
7 (1282) acyl-CoAs, conformations, isoenzymes, STSs, autacoids, surfaces, crystallins, sweeteners, TREs, biocides, pyrethroids,
8 (1613) colds, apnea, aspergilloma, ACS, breathlessness, perforations, hemangiomas, lesions, psychoses, coinfection, terminals, headache, hepatolithiasis, hypercholesterolemia, leiomyosarcomas, hypercoagulability, xerostomia, granulomata, pericarditis,
9 (674) dysregulated, nearest, longest, satisfying, unplanned, unrealistic, fair, appreciable, separable, enigmatic, striking, i
10 (509) differentiative, ARV, pleiotropic, endothermic, tolerogenic, teratogenic, oxidizing, intraovarian, anaesthetic, laxative,
13 (177) ewe, nymphs, dams, fetuses, marmosets, bats, triplets, camels, SHR, husband, siblings, seedlings, ponies, foxes, neighbor, sisters, mosquitoes, hamsters, hypertensives, neonates, proband, anthers, brother, broilers, woman, eggs,
14 (103) considers, comprises, secretes, possesses, sees, undergoes, outlines, reviews, span, uncovered, defines, shares, s
15 (87) feline, chimpanzee, pigeon, quail, guinea-pig, chicken, grower, mammal, toad, simian, rat, human-derived, piglet, ovum,
16 (589) dually, rarely, spectrally, circumferentially, satisfactorily, dramatically, chronically, therapeutically, beneficially, already,
18 (124) 1-min, two-week, 4-min, 8-week, 6-hour, 2-day, 3-minute, 20-year, 15-minute, 5-h, 24-h, 8-h, ten-year, overnight, 120-
21 (12) July, January, May, February, December, October, April, September, June, August, March, November
23 (13) acetic, retinoic, uric, oleic, arachidonic, nucleic, sialic, linoleic, lactic, glutamic, fatty, ascorbic, folic
25 (28) route, angle, phase, rim, state, region, arm, site, branch, dimension, configuration, area, Clinic, zone, atom, isoform,
247 (6) P<0.001, P<0.01, p<0.001, p<0.01, P<.001, P<0.0001
391 (119) alcohol, ethanol, heparin, cocaine, morphine, cisplatin, dexamethasone, estradiol, melatonin, nicotine, fibronectin,


23

Task-based unsuPOS evaluation

UnsuPOS tags are used as features, performance is compared to no POS and supervised POS. Tagger was induced in one-CPU-day from BNC

• Kernel-based WSD: better than noPOS, equal to suPOS
• POS-tagging: better than noPOS
• Named Entity Recognition: no significant differences
• Chunking: better than noPOS, worse than suPOS


24

Application: Word Sense Induction

• Co-occurrence graphs of ambiguous words can be partitioned: Leave out focus word

• Clusters contain context words for disambiguation
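The idea of the two bullets can be sketched on a toy graph. Note the swapped-in technique: to keep the example short, connected components stand in for Chinese Whispers clustering, which is a much cruder partitioning and only separates senses whose context words are not linked at all. The edges around the focus word "hip" are invented for illustration, and the function name `sense_clusters` is hypothetical.

```python
def sense_clusters(edges, focus):
    """Partition the co-occurrence graph around a focus word.

    edges is an iterable of (word_a, word_b) pairs. All edges touching the
    focus word are left out; the remaining graph is split into connected
    components (a stand-in for Chinese Whispers clustering).
    """
    neighbours = {}
    for u, v in edges:
        if focus in (u, v):
            continue  # leave out the focus word
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


toy_edges = [
    ("hip", "hop"), ("hip", "rap"), ("hop", "rap"),
    ("hip", "joint"), ("hip", "replacement"), ("joint", "replacement"),
]
print(sense_clusters(toy_edges, "hip"))
```

Each resulting cluster holds context words that can serve as indicators for one sense of the focus word.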


25

[Figure: co-occurrence graph for the focus word "hip"]



29

Future work

• Grammar induction: Use unsupervised POS labels and sequences for inducing tree-like structures

• Word Sense Induction and Disambiguation with hierarchical sense models, evaluation in SEMEVAL framework

• Structure Discovery Machine: putting many SD procedures together, including morphology induction, multiword expression identification, relation finding etc.

• Semisupervised SD: Framework for connecting SD processes to application-based settings


30

Summary

• Structure Discovery Paradigm contrasted to traditional approaches:
  – no manual annotation, no resources (cheaper)
  – language- and domain-independent
  – iteratively enriching structural information by finding and annotating regularities

• Graph-based SD procedures
  – Chinese Whispers for Language Separation, unsupervised POS tagging and Word Sense Induction


31

Questions?

THANKS FOR YOUR ATTENTION!


32

Emergent Language Generation Models

From SD to NLG
• Structure Discovery allows us to process as yet undescribed languages
• Generation models only capture certain aspects of real natural language
• SD processes can identify what is common and what is different between generated and real language
• Generation models can be improved to meet the criteria of SD processes

Understanding the emergence of natural language is a third way of approaching the understanding of natural language structure (besides classical linguistics and bottom-up NLP)


33

Graph-based Generation Model

• Word Generator
• Sentence Generator


34

Comparison: Generated and Real

[Figure: four log-scale plots comparing the generation model with English]
• rank-frequency (frequency vs. rank): sentence generator, English, power law z=1.5
• word length (frequency vs. length in letters): sentence generator, English, gamma distribution
• sentence length (number of sentences vs. length in words): sentence generator, English
• degree distribution (nr of vertices vs. degree interval): sentence generator, English, word generator, power law z=2


35

Computational Linguistics and Statistical NLP

CL:
• Implementing linguistic theories with computers
• Rule-based approaches
• Rules found by introspection, not data-driven
• Explicit knowledge
• Goal: understanding language itself

Statistical NLP:
• Building systems that perform language processing tasks
• Machine Learning approaches
• Models are built by training on annotated datasets
• Implicit knowledge
• Goal: Build robust systems with high performance

There is a continuum rather than a sharp dividing line


36

Building Blocks in SD

Hierarchical levels of basic units in text data:
• Letters
• Words
• Sentences
• Documents
These are assumed to be recognizable in the remainder.

SD allows for
• arbitrary numbers of intermediate levels
• grouping of basic into complex units,
but these have to be found by SD procedures.


37

Similarity and Homogeneity

For determining which units share structure, a similarity measure for units is needed. Two kinds of features are possible:

• Internal features: compare units based on the lower level units they contain

• Context features: compare units based on other units of same or other level that surround them

A clustering based on unit similarity yields sets of units that are homogeneous w.r.t. structure

This is an abstraction process: Units are subsumed under the same label.
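As one possible illustration of internal features, words can be compared by the letter sequences they contain. The sketch below assumes character bigrams as the lower-level units and the Jaccard coefficient as the similarity measure; both choices, and the function names, are assumptions made for this example and are not prescribed by the slides.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams contained in a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def internal_similarity(a, b, n=2):
    """Jaccard overlap of the character n-grams of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(internal_similarity("rates", "rate"))  # 0.75
print(internal_similarity("bank", "lead"))   # 0.0
```

Context features would instead compare the units surrounding each word, as in the cosine example for Partitioning 1.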