Annotating Dependency Relations in Non-standard Varieties

Marc Reznicek, Stefanie Dipper, Anke Lüdeling, Burkhard Dietterle

Clarin-D F-AG 7 Curation Project II

5th Arbeitstagung (working meeting), 25.04.2013, Hamburg

Overview

Annotation of non-standard varieties

NoSta-D corpus

Dependencies

Normalisation

Chat-specific linguistic structures

Coordination (general)

Outlook

Clarin-D Curation Project II

Clarin F-AG 7 curation project (KP2): linguistic annotation of non-standard varieties (guidelines and "best practices")

Annotation categories, guidelines and automatic tools are based on newspaper texts.

There is growing demand for describing other (i.e. non-standard) varieties. Pilot project: extension of the existing resources to 5 non-standard varieties.

NoSta-D corpus

Non-standard varieties and corpora:

Variety          Corpus               Tokens
learner          Falko                6,762
historical       DDB & Anselm         7,503
chat             DCC: UB & Plauder    6,664
dialogues        Bematac              6,731
literary prose   Kafka: der Prozeß    7,294

Creation of a pilot corpus of non-standard varieties of German (Dipper et al. to appear)
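The slide gives its token counts with the German thousands separator, so 6.762 means 6,762. Read that way, the five subcorpora add up to a pilot corpus of roughly 35,000 tokens. Here is a minimal Python check of that arithmetic; the dictionary keys are simply the corpus names as listed.

```python
# Token counts of the five NoSta-D subcorpora as listed on the slide
# (the slide uses "." as thousands separator, e.g. 6.762 = 6,762).
TOKENS = {
    "Falko": 6762,
    "DDB & Anselm": 7503,
    "DCC: UB & Plauder": 6664,
    "Bematac": 6731,
    "Kafka: der Prozeß": 7294,
}

total = sum(TOKENS.values())
print(f"{len(TOKENS)} subcorpora, {total:,} tokens in total")
```

This prints `5 subcorpora, 34,954 tokens in total`.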

Three types of annotation

NER (spans)

Coreference (pointing relations)

Dependencies (trees)

Dependency parsing for German "reaches an accuracy […] better than the best constituent analysis including grammatical functions."

(Kübler & Prokic 2006)
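The three annotation types differ in shape: labelled spans, directed links between spans, and head-dependent trees. Below is a minimal Python sketch of one way to model them. The class and field names are hypothetical and do not describe NoSta-D's actual data format, and the root label "S" in the example is only illustrative. The sketch returns roots as a list rather than assuming a single root, because unlinked fragments can leave several roots in one posting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """NER: a labelled stretch of tokens, start inclusive, end exclusive."""
    start: int
    end: int
    label: str


@dataclass
class PointingRelation:
    """Coreference: a directed link from one span (e.g. an anaphor) to another."""
    source: Span
    target: Span


@dataclass
class DependencyTree:
    """Dependencies: one head index per token (None for a root) plus one label per token."""
    heads: List[Optional[int]]
    labels: List[str]

    def roots(self) -> List[int]:
        return [i for i, head in enumerate(self.heads) if head is None]

    def is_acyclic(self) -> bool:
        # Follow the head chain from every token; revisiting a token means a cycle.
        for start in range(len(self.heads)):
            seen = set()
            node: Optional[int] = start
            while node is not None:
                if node in seen:
                    return False
                seen.add(node)
                node = self.heads[node]
        return True


# Illustrative example: "Ich liebe Kaffee" with "liebe" as the root.
tree = DependencyTree(heads=[1, None, 1], labels=["SUBJ", "S", "OBJA"])
print(tree.roots(), tree.is_acyclic())
```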

Non-standard dependencies

How to define non-standard dependency structures?

1) Take guidelines that fully describe structures in a large newspaper corpus of German:

TiGer (Albert et al. 2003)

Problem: TiGer is annotated with constituent structures, not dependencies.

(Albert et al. 2003: 9)

2) Give human annotators rules for converting TiGer constituent trees into dependencies, e.g. for noun phrases:

PIS → HEAD

NN → HEAD, unless the head is a PIS

ADJA → HEAD, unless the head is a PIS or NN; if not HEAD, then ATTR

ART → DET

3) Structures that aren't covered by the TiGer guidelines are considered non-standard.
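The conversion rules under point 2) amount to a head-priority list for noun phrases: PIS before NN before ADJA. Below is a hedged Python sketch of that idea. It assumes an NP is given as a flat list of STTS tags and that the leftmost daughter carrying the winning tag is the head. Tags the listed rules do not cover get the placeholder label "?". The function name is hypothetical.

```python
from typing import List, Optional

# Head priority read off the rules: PIS > NN > ADJA.
HEAD_PRIORITY = ["PIS", "NN", "ADJA"]


def np_to_dependencies(tags: List[str]) -> List[str]:
    """Label each daughter of a TiGer NP as HEAD, DET, ATTR, or '?' (no rule)."""
    head: Optional[int] = None
    for candidate in HEAD_PRIORITY:
        if candidate in tags:
            head = tags.index(candidate)  # assumption: leftmost daughter with this tag
            break
    labels = []
    for i, tag in enumerate(tags):
        if i == head:
            labels.append("HEAD")
        elif tag == "ART":
            labels.append("DET")
        elif tag == "ADJA":
            labels.append("ATTR")  # an ADJA that lost the head competition
        else:
            labels.append("?")     # not covered by the listed rules
    return labels


print(np_to_dependencies(["ART", "ADJA", "NN"]))  # e.g. "ein toller Neuzugang"
print(np_to_dependencies(["ART", "ADJA"]))        # NP without a noun
```

The first call yields `['DET', 'ATTR', 'HEAD']`. The second yields `['DET', 'HEAD']`, because the adjective becomes the head when no PIS or NN is present.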

Extension of the label set

Some structures need new labels:

SF: sentence fragment

CHU: chunk (non-hierarchical multi-word unit)

Normalisation & Context

1 system JustChat 4.0r0.204 (55.204) developed by Medium.net.

2 system Du betrittst den Raum.

3 quaki was echt zori?

4 system little15 betritt den Raum.

5 quaki das küssen??

Start of chat: without context, posting 3 has several possible parses.

Was/PWS echt/ADJD ,/$, Zora/NE ?/$. (labels: SF, PRED, VOK)

Was/PWS ,/$, echt/ADJD ,/$, Zora/NE ?/$. (labels: SFDM, SFDM, VOK, ROOT)

Was/PWS ,/$, echt/ADJD Zora/NE ?/$. (labels: SFDM, MOD, SFSUBJ)

Reconstructing context: each preceding posting selects a different parse of posting 3.

Context 1
2 Zora: Yesterday I kissed someone.
3 quaki: Was, ECHT, Zora?
Parse: Was/PWS ,/$, echt/ADJD ,/$, Zora/NE ?/$. (labels: SFDM, SFDM, VOK, ROOT)

Context 2
2 Pharao: Did you know that Zora kissed someone yesterday?
3 quaki: Was, ECHT Zora (hat das gemacht)?
Parse: Was/PWS ,/$, echt/ADJD Zora/NE ?/$. (labels: SFDM, MOD, SFSUBJ)

Context 3
2 Zora: Did you really (echt) kiss someone yesterday?
3 quaki: Was (heißt) ECHT, Zora?
Parse: Was/PWS echt/ADJD ,/$, Zora/NE ?/$. (labels: SF, PRED, VOK)

Alternative: do not annotate the first n postings.

1 system JustChat 4.0r0.204 (55.204) developed by Medium.net.

2 system Du betrittst den Raum.

3 quaki was echt zori?

4 system little15 betritt den Raum.

5 quaki das küssen??

6 Pharao na gut marc. kein servicepaket nr.1 für dich :)

7 zora was?

8 system TomcatMJ kommt aus dem Raum Go-Rin-No-Sho herein.

9 TomcatMJ hi

10 system TomcatMJ ist wieder da.
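The alternative can be expressed as a simple filter. Below is a minimal Python sketch built on three assumptions. First, system messages are recognisable by the speaker name "system", as in the log. Second, they are skipped entirely. Third, n counts only user postings. Choosing n is exactly the open guideline question, so it stays a parameter. The names `Posting`, `LOG` and `postings_to_annotate` are hypothetical.

```python
from typing import List, Tuple

Posting = Tuple[int, str, str]  # (posting number, speaker, text)

LOG: List[Posting] = [
    (1, "system", "JustChat 4.0r0.204 (55.204) developed by Medium.net."),
    (2, "system", "Du betrittst den Raum."),
    (3, "quaki", "was echt zori?"),
    (4, "system", "little15 betritt den Raum."),
    (5, "quaki", "das küssen??"),
    (6, "Pharao", "na gut marc. kein servicepaket nr.1 für dich :)"),
    (7, "zora", "was?"),
    (8, "system", "TomcatMJ kommt aus dem Raum Go-Rin-No-Sho herein."),
    (9, "TomcatMJ", "hi"),
    (10, "system", "TomcatMJ ist wieder da."),
]


def postings_to_annotate(log: List[Posting], n: int) -> List[Posting]:
    """Skip system messages, then drop the first n user postings (they lack context)."""
    user_postings = [p for p in log if p[1] != "system"]
    return user_postings[n:]


print([number for number, _, _ in postings_to_annotate(LOG, n=2)])
```

With n = 2, the two context-less postings by quaki (3 and 5) are dropped, and annotation starts at posting 6.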

Fragments and dependencies

The root of a (German) dependency structure is the verb. Fragments without a verb are therefore difficult to model.

TiGer Guidelines: Bei verblosen Sätzen, die v.a. in Überschriften und Titeln erscheinen, sollte man den Satz in Gedanken sinnvoll ergänzen und ihn dann ganz normal annotieren.

(Albert et al. 2003: 72)

Verbless sentences, which occur mainly in headlines and titles, should be mentally completed in a sensible way and then annotated as usual.

Fragments and normalisation

Normalisations in NoSta-D are made explicit in the corpus and documented in the manual (for detailed discussion, see e.g. Lüdeling et al. 2005, Lüdeling 2008, Reznicek et al. to appear).

Speaker: Pharao
tok:  kein   servicepaket   nr.   1      _           für    dich   _
norm: Kein   Servicepaket   Nr.   1      existiert   für    dich   .
POS:  PIAT   NN             NN    CARD   VVFIN       APPR   PPER   $.
Dependency labels: DET, SUBJ, APP, APP, SF, MOD, PN
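Keeping tok and norm as separate, token-aligned layers means the normalisation's interventions stay visible and queryable. Below is a minimal Python sketch for Pharao's posting. It assumes a simple list-per-layer representation with "_" for empty cells; this is an illustration, not the NoSta-D storage format. The sketch recovers the reconstructed verb existiert as an insertion and lists the tokens whose spelling was changed.

```python
# Token-aligned layers for Pharao's posting; "_" marks an empty cell.
TOK = ["kein", "servicepaket", "nr.", "1", "_", "für", "dich", "_"]
NORM = ["Kein", "Servicepaket", "Nr.", "1", "existiert", "für", "dich", "."]
POS = ["PIAT", "NN", "NN", "CARD", "VVFIN", "APPR", "PPER", "$."]


def insertions(tok, norm):
    """(position, word) for every token that exists only in the normalisation."""
    return [(i, n) for i, (t, n) in enumerate(zip(tok, norm)) if t == "_" and n != "_"]


def changes(tok, norm):
    """(original, normalised) pairs where the normalisation alters an existing token."""
    return [(t, n) for t, n in zip(tok, norm) if "_" not in (t, n) and t != n]


# Inserted tokens with their POS tags: the reconstructed verb and the final period.
print([(NORM[i], POS[i]) for i, _ in insertions(TOK, NORM)])
print(changes(TOK, NORM))
```

The first line prints `[('existiert', 'VVFIN'), ('.', '$.')]`. The second prints the three capitalisation changes (kein, servicepaket, nr.).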

Fragments

For the normalisation, a verb is reconstructed (norm layer).

Grammatical functions are annotated; where possible, the preference is subject > acc. obj. > dat. obj.

How to deal with missing verbs in the original data?

Three possible ways of modelling the attachment:

a) A dummy verb is the highest head (Seeker & Kuhn 2012).

norm: Kein/PIAT Servicepaket/NN Nr./NN 1/CARD existiert/VVFIN für/APPR dich/PPER ./$. (existiert and . have no counterpart in tok)
Dependency labels: DET, SUBJ, APP, APP, SF, MOD, PN

b) The verb is not annotated (Foth 2006); fragments are not linked.

norm: Kein/PIAT Servicepaket/NN Nr./NN 1/CARD existiert/VVFIN für/APPR dich/PPER ./$.
Dependency labels: DET, SFSUBJ, APP, APP, SFMOD, PN

c) All fragments are linked to the highest head (current NoSta-D approach).

norm: Kein/PIAT Servicepaket/NN Nr./NN 1/CARD existiert/VVFIN für/APPR dich/PPER ./$.
Dependency labels: DET, SFSUBJ, APP, APP, MOD, PN
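The three options differ only in where the fragment roots attach. Below is a hedged Python sketch that reproduces the fragment-root labels of the three analyses:

- a) SUBJ and MOD under an SF-labelled dummy verb;
- b) SFSUBJ and SFMOD left unlinked;
- c) MOD attached to the SFSUBJ root.

Two parts are assumptions: the names and data layout, and the idea that the "highest head" in c) is chosen by the preference subject > acc. obj. > dat. obj. stated for the normalisation. Arcs inside a fragment (DET, APP, PN) are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed ranking, read off "subject > acc. obj. > dat. obj.";
# all other functions rank below these three.
PREFERENCE = ["SUBJ", "OBJA", "OBJD"]


@dataclass
class Fragment:
    root: int        # token index of the fragment's root
    function: str    # grammatical function the root would have in a full clause


def rank(fragment: Fragment) -> int:
    f = fragment.function
    return PREFERENCE.index(f) if f in PREFERENCE else len(PREFERENCE)


def attach_fragments(fragments: List[Fragment], strategy: str,
                     verb: Optional[int] = None) -> Dict[int, Tuple[Optional[int], str]]:
    """Map each root (and the dummy verb, if any) to (head, label); head None = unattached."""
    if strategy == "dummy_verb":        # a) reconstructed verb is the highest head
        if verb is None:
            raise ValueError("dummy_verb needs the index of the reconstructed verb")
        result = {verb: (None, "SF")}
        for f in fragments:
            result[f.root] = (verb, f.function)
        return result
    if strategy == "unlinked":          # b) every fragment root stays a root
        return {f.root: (None, "SF" + f.function) for f in fragments}
    if strategy == "highest_head":      # c) link all fragments to the highest head
        top, *rest = sorted(fragments, key=rank)
        result = {top.root: (None, "SF" + top.function)}
        for f in rest:
            result[f.root] = (top.root, f.function)
        return result
    raise ValueError(f"unknown strategy: {strategy}")


# Verbless original "Kein Servicepaket Nr. 1 für dich .": roots Servicepaket (1) and für (4).
roots = [Fragment(1, "SUBJ"), Fragment(4, "MOD")]
print(attach_fragments(roots, "unlinked"))
print(attach_fragments(roots, "highest_head"))
# Normalised text with the reconstructed verb "existiert" at index 4.
print(attach_fragments([Fragment(1, "SUBJ"), Fragment(5, "MOD")], "dummy_verb", verb=4))
```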

Fragment roots are assigned grammatical functions where possible: SF + grammatical function (e.g. SFSUBJ, SFMOD).

The redundant SF annotation is helpful in current query tools, e.g. TIGERSearch (König et al. 2003) and ANNIS3 (Zeldes et al. 2009; http://www.sfb632.uni-potsdam.de/annis/annis3.html).
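Because fragment status is encoded redundantly in the label, one query can ask for all fragments and another for all subjects, whether or not they are fragments. Below is a minimal Python sketch of that decomposition. It is a plain-Python stand-in, not TIGERSearch or ANNIS query syntax. It assumes that only fragment labels begin with "SF". The (word, label) pairs are taken from examples on these slides.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (word, dependency label)


def split_label(label: str) -> Tuple[bool, Optional[str]]:
    """Split a label: 'SFSUBJ' -> (True, 'SUBJ'), 'SF' -> (True, None), 'SUBJ' -> (False, 'SUBJ')."""
    if label.startswith("SF"):          # assumption: only fragment labels start with "SF"
        return True, label[2:] or None
    return False, label


def fragment_words(pairs: List[Pair]) -> List[str]:
    """All fragment roots, whatever their grammatical function."""
    return [word for word, label in pairs if split_label(label)[0]]


def words_with_function(pairs: List[Pair], function: str) -> List[str]:
    """All words with a given function, whether in a full clause or a fragment."""
    return [word for word, label in pairs if split_label(label)[1] == function]


ANALYSED = [("Servicepaket", "SFSUBJ"), ("Tom", "SUBJ"), ("Was", "SFDM"), ("Echt", "SF")]
print(fragment_words(ANALYSED))
print(words_with_function(ANALYSED, "SUBJ"))
```

The subject query returns both the fragmentary subject Servicepaket and the clausal subject Tom.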

Interjections and friends

Linked classic interjections: discourse marker (DM)

tok:  juhuuu  _   tom  is     dada  _
norm: Juhu    ,   Tom  ist    da    .
POS:  ITJ     $,  NE   VAFIN  ADV   $.
Dependency labels: DM, SUBJ, S, MOD

TiGer sentence s31718 (excerpt): Ach/ITJ ,/$, wenn/KOUS ich/PPER
Dependency labels: DM, SB, S, CP

Non-linked classic interjections: discourse marker fragments (SFDM)
28 Lantonie Hallo. :)

29 zora LANTOOO :)))

30 TomcatMJ *mal guck wo quaki sich nu hinstelt*G*

31 quaki freu

32 zora juhuuu

33 Lantonie Hallo quaki.

34 marc30 Lantöööö :o)

35 TomcatMJ hi lanto

Emoticons and *-expressions are tagged as interjections. When peripheral, they are always treated as non-linked fragments of an underspecified kind.
111 Emon mann, habe tatsächlich was verdauliches gegessen... :)

112 TomcatMJ da is sone etwa 25 m hohe pappel wo die drinsitzen und rumzetern*G*

Underspecification

Grammatical functions are only assigned if unambiguous.
44 Lantonie Ich finde den quaki klasse, ein toller Neuzugang, der sich echt bewährt.

45 Lantonie :)))

46 Lantonie Na, zori? :))

47 marc30 den?

12 quaki juhuuu tom is dada

13 zora echt?

Two possible normalisations of posting 13 (only echt and ? occur in the original):

Meinst/VAFIN du/PPER das/PDS echt/ADJD ?/$. (label: SFMOD)

Ist/VAFIN das/PDS echt/ADJD ?/$. (label: SFPRED)

Since the two readings assign different functions (MOD vs. PRED), echt is left underspecified:

Echt/ADJD ?/$. (label: SF)
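The underspecification rule can be written as follows: keep SF + function only when every plausible reading agrees on the function, and otherwise fall back to plain SF. Below is a minimal Python sketch with a hypothetical function name, applied to the two readings of posting 13.

```python
from typing import Iterable


def fragment_label(candidate_functions: Iterable[str]) -> str:
    """Return 'SF' + function if all readings agree on it, else the underspecified 'SF'."""
    distinct = set(candidate_functions)
    if len(distinct) == 1:
        return "SF" + distinct.pop()
    return "SF"


# Posting 13 "echt?": "Meinst du das echt?" (MOD) vs. "Ist das echt?" (PRED)
print(fragment_label(["MOD", "PRED"]))
print(fragment_label(["SUBJ"]))
```

This prints `SF` for the ambiguous case and `SFSUBJ` when only one reading exists.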

Sentence equivalents

Response particles are treated as full sentences.
Inflectives I

Inflectives are labeled "SI".
tok:  _     mal  guck   _   wo    quaki  sich  nu   hinstelt   _
norm: Ich   mal  gucke  ,   wo    Quaki  sich  nun  hinstellt  .
POS:  PPER  ADV  VVFIN  $,  PWAV  NE     PRF   ADV  VVFIN      $.
Dependency labels: MOD, SI, MOD, SUBJ, OBJA, MOD, OBJC

Inflectives II

For the normalisation, inflectives are rendered as verb-final (Vend) sentences.

tok:  _     _         1     an-   -ne  stirn  bapp   _
norm: Ich   jemandem  eins  an    die  Stirn  bappe  .
POS:  PPER  PIS       PIS   APPR  ART  NN     VVFIN  $.
Dependency labels: OBJA, OBJP, DET, PN, SI

tok:  _     nicht   festgebunden  sein   mag    _
norm: Ich   nicht   festgebunden  sein   mag    .
POS:  PPER  PTKNEG  VVPP          VAINF  VMFIN  $.
Dependency labels: MOD, PREDAUX, SI

tok:  _     _     aufpluster   _
norm: Ich   mich  aufplustere  .
POS:  PPER  PPER  VVFIN        $.
Dependency label: SI

tok:  _     _     freu   _
norm: Ich   mich  freue  .
POS:  PPER  PPER  VVFIN  $.
Dependency label: SI
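For the Inflectives II examples, the Vend normalisations follow a regular pattern: add the subject Ich, keep the material in front of the stem, and inflect the stem as 1st person singular. Below is a deliberately naive Python sketch under those assumptions; the function name is hypothetical. Some material cannot be derived from the inflective itself, such as the reflexive mich or the inserted jemandem, so the caller must supply it. Irregular forms and V2-derived inflectives are out of scope.

```python
from typing import List


def normalise_inflective(words: List[str], reflexive: bool = False) -> List[str]:
    """Naive Vend normalisation of an inflective whose last word is the bare verb stem."""
    *material, stem = words
    subject = ["Ich"]                          # assumption: the writer is the subject
    pronoun = ["mich"] if reflexive else []    # reflexive must be supplied by the caller
    finite = stem + "e"                        # assumption: regular 1st sg. present = stem + -e
    return subject + pronoun + material + [finite, "."]


print(" ".join(normalise_inflective(["freu"], reflexive=True)))
print(" ".join(normalise_inflective(["erleichtert", "guck"])))
print(" ".join(normalise_inflective(["jemandem", "eins", "an", "die", "Stirn", "bapp"])))
```

This prints `Ich mich freue .`, `Ich erleichtert gucke .` and `Ich jemandem eins an die Stirn bappe .`. These match the norm layers shown for these inflectives.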

Inflectives III: retokenization

Concatenations are retokenized into separate words.

In this annotation pilot, we do not worry about automatic performance.

299 marc30 *fettbin*
tok:  _     fett  bin    _
norm: Ich   fett  bin    .
POS:  PPER  ADJD  VVFIN  $.
Dependency labels: PRED, SI

11 marc30 Danke Pharao *erleichtertguck*
tok:  _     erleichtert  guck   _
norm: Ich   erleichtert  gucke  .
POS:  PPER  ADJD         VVFIN  $.
Dependency labels: MOD, SI

Asterisk expressions (retokenized)

165 quaki *nagut50cmlauflaufleine*

Not all concatenated tokens are inflectives.

tok (retokenized): na gut 50 cm lauflaufleine
norm: Na/ITJ gut/ITJ ,/$, 50/CARD cm/NN Lauflaufleine/NN
Dependency labels: MOD, SFDM, ATTRGRAD, SF
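Retokenizing a concatenation such as *erleichtertguck* requires some word list to find the word boundaries. Below is a minimal Python sketch using greedy longest-match segmentation against a tiny hypothetical lexicon, with digit runs split off as separate tokens. A real setup would need a full lexicon, and, as noted above, the pilot does not aim at automatic performance. Capitalisation and inserted punctuation belong to the norm layer and are not handled here.

```python
import re
from typing import List, Set

# Hypothetical mini-lexicon covering only the slide examples.
LEXICON: Set[str] = {"fett", "bin", "erleichtert", "guck", "na", "gut", "cm", "lauflaufleine"}


def retokenize(expression: str, lexicon: Set[str] = LEXICON) -> List[str]:
    """Split a concatenated *-expression into words by greedy longest match."""
    text = expression.strip("*").lower()
    tokens: List[str] = []
    for chunk in re.findall(r"\d+|\D+", text):  # digit runs and non-digit runs alternate
        if chunk.isdigit():
            tokens.append(chunk)
            continue
        i = 0
        while i < len(chunk):
            # try the longest lexicon entry starting at position i
            for j in range(len(chunk), i, -1):
                if chunk[i:j] in lexicon:
                    tokens.append(chunk[i:j])
                    i = j
                    break
            else:
                tokens.append(chunk[i:])  # no entry fits: keep the remainder unsplit
                break
    return tokens


for expr in ["*fettbin*", "*erleichtertguck*", "*nagut50cmlauflaufleine*"]:
    print(expr, "->", retokenize(expr))
```

For posting 165, this yields na, gut, 50, cm, lauflaufleine, which matches the retokenized tok layer.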

V2-derived inflectives

Not all inflectives are Vend.

The normalisation requires inserting an object (mich) here.

560 Happy lachwech@bochum

tok:  Ich   lachwech  _     _      @     bochum  _
norm: Ich   lache     mich  weg    @     Bochum  .
POS:  PPER  VVFIN     PPER  PTKVZ  APPR  NE      $.
Dependency labels: SI, AVZ, MOD, PN

@ expressions as prepositions

@ may replace subcategorized prepositions.

323 zora wos? *eifersüchtel*@lanto

tok:  *eifersüchtel* @ lanto
norm: Ich/PPER bin/VAFIN eifersüchtig/ADJD auf/APPR Lanto/NE ./$.
Dependency labels: SI, OBJP, PN

@ expressions as address

@ may direct the content of a whole utterance at an addressee.

560 Happy lachwech@bochum

269 TomcatMJ naja,dann heissts wohl mal rumtelefonieren und nachfragen da umzugsfiormen selten im i8nternet stehen@zora

tok:  naja , dann heisst -s wohl _ mal rumtelefonieren und nachfragen , da unzugsfiormenselten im I8nternet stehen @ zora .

norm: Naja/PTKANT ,/$, dann/ADV heißt/VVFIN -s/PPER wohl/ADV ,/$, mal/ADV rumzutelefonieren/VVIZU und/KON nachzufragen/VVIZU ,/$, da/KOUS Umzugsfirmen/NN selten/ADV im/APPRART Internet/NN stehen/VVFIN @/APPR Zora/NE ./$.

Dependency labels in this analysis include DM, S, SUBJ, OBJI, KONJ, NEB, MOD and PN.
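In all three @ postings (lachwech@bochum, *eifersüchtel*@lanto, stehen@zora), the tok layer splits the @ off as a token of its own. Below is a minimal Python sketch of that tokenization step, plus a helper that returns the word after each @. The function names are hypothetical. The string alone cannot tell you whether that word is a prepositional object or an addressee, so the decision stays with the annotator. This naive split would also break e-mail addresses.

```python
import re
from typing import List


def split_at(posting: str) -> List[str]:
    """Whitespace-tokenize a posting, making every '@' a token of its own."""
    return re.sub(r"\s*@\s*", " @ ", posting).split()


def at_targets(tokens: List[str]) -> List[str]:
    """Return the token following each '@' (addressee or prepositional object)."""
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "@"]


for posting in ["wos? *eifersüchtel*@lanto", "lachwech@bochum",
                "im i8nternet stehen@zora"]:
    tokens = split_at(posting)
    print(tokens, at_targets(tokens))
```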

extra: Coordination

A classic problem for dependencies

Solution 1) KON & CJ (Foth 2006)

Problem: What category does CJ have?

Ich liebe Kaffee , Zucker und in der Sonne sitzen

I love coffee , sugar and in the sun sitting

('I love coffee, sugar and sitting in the sun.')

Dependency labels: OBJA, KON, KON, CJ

Solution 2) Loose CC & grammatical functions (Kübler et al. 2012)

Problem: Which daughters are coordinated?

Ich liebe Kaffee , Zucker und in der Sonne sitzen

I love coffee , sugar and in the sun sitting

Dependency labels: OA, CC, OC, OA

Solution 3) KON & GF (NoSta-D)

Problem: Obj-Obj chains

Ich liebe Kaffee , Zucker und in der Sonne sitzen

I love coffee , sugar and in the sun sitting

Dependency labels: OBJA, KON, OBJI, OBJA

Solution 4) KON & GF with ',' as KON

Problems: the comma is annotated only in coordinations; the approach only works for the annotation of the normalisation.

Ich liebe Kaffee , Zucker und in der Sonne sitzen

I love coffee , sugar and in the sun sitting

Dependency labels: OBJA, KON, OBJA, OBJI, KON

Solution 5) Loose CC & new grammatical functions

Ich nenne ihn einen Hund und Ehebrecher

I call him a dog and adulterer

Dependency labels: OBJA, KON, OBJAC, OBJAC

Summary

Tasks:

We need guidelines for dealing with the first postings of a chat, which come without context.

We need a new POS tag for inflectives.

Questions:

What would be a preferable attachment of fragments?

Are response particles full sentences?

Is retokenizing concatenations a helpful analysis?

Should we analyse concatenations as Vend?

Outlook

Annotation of coreference and named entities (NER), including crowd-sourcing of a reference annotation

Publication of the NoSta-D pilot corpus by the end of the summer

82 stoeps :-)

83 TomcatMJ hi stoeps

84 Emon unf tom : (

85 Emon *g*

86 Thor... über meine

87 TomcatMJ hi emon

88 quaki 200 krähen?

89 TomcatMJ jo..

90 Lantonie Die eins Minus hast du

91 marc30 was ist Benehmen?

92 quaki die vögel

93 system B67 betritt den Raum.

94 system Lantonie ist jetzt weg Pissen

Thanks ;0)

Bibliography

Albert, Stefanie; Anderssen, Jan; Bader, Regine; Becker, Stefanie; Bracht, Tobias; Brants, Thorsten et al. (2003): TIGER Annotationsschema. Online: http://www.ims.uni-stuttgart.de/projekte/TIGER/

Beißwenger, Michael (2013): Das Dortmunder Chat-Korpus: ein annotiertes Korpus zur Sprachverwendung und sprachlichen Variation in der deutschsprachigen Chat-Kommunikation. Online publication, LINSE (Linguistik Server Essen). Online: http://www.linse.uni-due.de/tl_files/PDFs/Publikationen-Rezensionen/Chatkorpus_Beisswenger_2013.pdf

Dipper, Stefanie; Lüdeling, Anke; Reznicek, Marc (to appear): NoSta-D. A Corpus of German Non-Standard Varieties. In: Marcos Zampieri (ed.): Non-Standard Data Sources in Corpus-Based Research. Shaker.

Foth, Kilian A. (2006): Eine umfassende Constraint-Dependenz-Grammatik des Deutschen. Technical report. Universität Hamburg.

König, Esther; Lezius, Wolfgang; Voormann, Holger (2003): TIGERSearch User’s Manual. Stuttgart. Online: http://www.tigersearch.de

Kübler, Sandra; Prokic, Jelena (2006): Why is German Dependency Parsing more Reliable than Constituent Parsing? In: Proceedings of the 5th International Workshop on Treebanks and Linguistic Theories (TLT 2006). Prague, Czech Republic.

Lüdeling, Anke (2008): Mehrdeutigkeiten und Kategorisierung. Probleme bei der Annotation von Lernerkorpora. In: Maik Walter and Patrick Grommes (eds.): Fortgeschrittene Lernervarietäten. Korpuslinguistik und Zweitspracherwerbsforschung. Tübingen: Max Niemeyer Verlag (Linguistische Arbeiten, 520), pp. 119–140.

Lüdeling, Anke; Walter, Maik; Kroymann, Emil; Adolphs, Peter (2005): Multi-level Error Annotation in Learner Corpora. In: Proceedings of Corpus Linguistics 2005. Birmingham.

Reznicek, Marc; Lüdeling, Anke; Hirschmann, Hagen (to appear): Competing Target Hypotheses in the Falko Corpus. A Flexible Multi-Layer Corpus Architecture. In: Ana Díaz-Negrillo (ed.): Automatic Treatment and Analysis of Learner Corpus Data. John Benjamins.

Seeker, Wolfgang; Kuhn, Jonas (2012): Making Ellipses Explicit in Dependency Conversion for a German Treebank. In: Proceedings of the 8th International Conference on Language Resources and Evaluation. Istanbul, Turkey: European Language Resources Association (ELRA), pp. 3132–3139. Online: http://www.lrec-conf.org/proceedings/lrec2012/pdf/235_Paper.pdf

Tenfjord, Kari; Hagen, Jon Erik; Johansen, Hilde (2006): The "Hows" and the "Whys" of Coding Categories in a Learner Corpus. (or "How and Why an Error-Tagged Learner Corpus is not 'ipso facto' One Big Comparative Fallacy"). In: Rivista di psicolinguistica applicata (3), pp. 93–108.

Zeldes, Amir; Ritz, Julia; Lüdeling, Anke; Chiarcos, Christian (2009): ANNIS. A Search Tool for Multi-Layer Annotated Corpora. In: Michaela Mahlberg, Victorina González-Díaz and Catherine Smith (eds.): Proceedings of Corpus Linguistics 2009, Liverpool, 20–23 July 2009. University of Liverpool.