Posted on 03-Jan-2016

Annotation: the scope

er so Steven said it was not a property of um annotated corpora

verb phrase

anaphoric reference

noun phrase

named entity

passing truck

speech act

disfluency

intonation pattern

context

participant

Some XML annotations

<phon addr="1:10">st</phon><phon addr="12:5">i</phon><phon addr="23:5">v</phon><phon addr="30:2">n</phon>

<w pos="NP1">Steven</w>

<person ident="SB01" gender="M"><birthDate>12/03/1956</birthDate>...</person> <name persKey="SB01">steven</name>

<u who="SB01" start="0:1">er so steven said it was <emph>not</emph> a property of annotated corpora</u>

Transcribing speech

normalization issues

ease of reading vs accuracy

interpretation vs prosody

analogous to problems of handling digitized images

The Spoken base tagset

components: <u> <event> <kinesic> <vocal> <pause> <shift>

contextual information in the header: <settingDesc> <particDesc>

facilities for synchronization and timing

Features of speech

lexical <u>

non-lexical <vocal>

anthropophonic / non-anthropophonic <kinesic>

communicative / non-communicative <event>

transcribed events

Utterances

Basic unit of discourse, corresponding to speaker turns

Optionally grouped into higher-level divisions (<div>s), e.g. to mark discourse function

Linked by the who attribute to a <person> description in the header

Vocals and events

Empty elements are used to mark paralinguistic phenomena

<u who="Jan">This is just delicious</u>
<event desc='telephone rings'/>
<u who="Kim">I'll get it</u>
<u who="Tom">I used to <vocal desc="cough"/> smoke a lot</u>
<u who="Bob"><vocal desc="sniff"/>He thinks he's tough</u>
<vocal who="Ann" desc="snorts"/>
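The sketch below uses Python's standard `xml.etree.ElementTree`. It shows how a consumer of this markup might read utterances and paralinguistic elements in document order and produce a plain reading transcript. The `<div>` wrapper and the helper names `render` and `transcript` are hypothetical. Namespaces are left out.

```python
import xml.etree.ElementTree as ET

# The slide's example with <event> closed as an empty element, wrapped in a
# hypothetical <div> so that it parses as a single XML document.
SAMPLE = """<div>
<u who="Jan">This is just delicious</u>
<event desc="telephone rings"/>
<u who="Kim">I'll get it</u>
<u who="Tom">I used to <vocal desc="cough"/> smoke a lot</u>
<u who="Bob"><vocal desc="sniff"/>He thinks he's tough</u>
<vocal who="Ann" desc="snorts"/>
</div>"""


def render(u):
    """Flatten an utterance, showing embedded empty elements as (desc)."""
    parts = [u.text or ""]
    for child in u:
        parts.append(f"({child.get('desc')})")
        parts.append(child.tail or "")
    # Normalise whitespace so "(sniff)" and the following words are spaced.
    return " ".join(" ".join(parts).split())


def transcript(root):
    """One line per top-level <u>, <vocal> or <event>, in document order."""
    lines = []
    for child in root:
        if child.tag == "u":
            lines.append(f"{child.get('who')}: {render(child)}")
        elif child.tag == "vocal":
            lines.append(f"{child.get('who')}: ({child.get('desc')})")
        elif child.tag == "event":
            lines.append(f"[{child.get('desc')}]")
    return lines


for line in transcript(ET.fromstring(SAMPLE)):
    print(line)
```

The empty elements never need closing tags and carry all their information in attributes. This is why a simple attribute read is enough here.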

Voice quality and prosody

The <shift> element is used to mark changes in voice quality

Other prosodic features may be marked using specific kinds of <seg> or entity refs

<u who="LB"><shift feature="loud" new="f"/>Elizabeth</u><u who="EB">Yes</u><u who="LB"><shift/>Come and try this <pause/><shift feature="loud" new="ff"/>come on</u>

Another example

<u who="MAR">you never <pause/> take this cat for show and tell <pause dur='5'/> meow meow</u>
<u who="ROS">yeah well I dont want to</u>
<event><desc>toy cat has bell in tail which continues to make a tinkling sound</desc></event>
<vocal who="MAR"><desc>miaows</desc></vocal>
<u who="ROS">because it is so old</u>
<u who="MAR">how <reg>about</reg> your cat <pause/> yours is new<kinesic><desc>shows Father the cat</desc></kinesic></u>
<u who="FAT" trans="pause">that<pause/> darling</u>
<u who="MAR"><s>no mine isnt old</s> <s>mine is just um a little dirty</s></u>

Participant Description

<person xml:id="P1" sex='2' age='mid'> <p>Female informant, well-educated, born in Shropshire UK, 12 Jan 1950, of unknown occupation. Speaks French fluently. Socio-Economic status B2 in the PEP classification scheme.</p> </person>

<person xml:id="P1" sex="2" age='mid'> <birth date='1950-01-12'> <date>12 January 1950</date> <name type="place">Shropshire, UK</name></birth><firstLang>English</firstLang><langKnown>French</langKnown><residence>Long term resident of Hull</residence><education>University postgraduate</education><occupation>Unknown</occupation><socecstatus source="PEP" code="B2"/></person>
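The who attribute on an utterance is what ties it to a participant description like the one above. Here is a minimal Python sketch of that lookup, written against a hypothetical miniature document. A real TEI file would use the TEI namespace and nest `<particDesc>` inside the header, and both are omitted here. The sketch accepts who values written either as `#P1` (as in the setting example) or as a bare `P1`.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# A hypothetical miniature document: two participants in a <particDesc>
# and two utterances that point at them through who.
DOC = """<TEI>
<particDesc>
  <person xml:id="P1" age="mid"><firstLang>English</firstLang></person>
  <person xml:id="P2" age="mid"><firstLang>French</firstLang></person>
</particDesc>
<u who="#P1">I'll get it</u>
<u who="P2">merci</u>
</TEI>"""


def participants(root):
    """Index every <person> by its xml:id."""
    return {p.get(XML_ID): p for p in root.iter("person")}


def speaker_of(u, people):
    """Resolve a who value, written either '#P1' or bare 'P1', to its <person>."""
    return people.get(u.get("who", "").lstrip("#"))


root = ET.fromstring(DOC)
people = participants(root)
for u in root.iter("u"):
    print(speaker_of(u, people).findtext("firstLang"), "|", u.text)
```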

Setting Description

e.g. from P2:

<settingDesc>
<setting who="#P1 #P2"><name type="city">Bedford</name><name type="region">UK: South East</name><date value="1989">early spring, 1989</date><locale>rug of a suburban home</locale><activity>playing</activity></setting>
<setting who="#P3"><name type="city">Bedford</name><name type="region">UK: South East</name><date value="1989">early spring, 1989</date><locale>at the sink</locale> <activity>washing-up</activity></setting>
<setting who="#P4"><name type="place">London, UK</name> <time>unknown</time><locale>broadcasting studio</locale><activity>radio performance</activity></setting>
</settingDesc>

Timing

Pausing: use the <pause> element

Duration: use the dur attribute

Overlap: use the trans attribute

Overlap

A: Have you heard the

B: the election results?

A: its a disaster

B: its a miracle

<u xml:id="A1" who="A">Have you heard the</u> <u xml:id="B1" who="B" trans="latching">the election results? </u><u xml:id="A2" who="A" trans="pause">its a disaster</u><u xml:id="B2" who="B" trans="overlap">its a miracle </u>
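Because trans describes the transition into each utterance from the one before it, the turn-taking pattern can be read off in a single pass. Below is a minimal Python sketch. The `<div>` wrapper and the function name are hypothetical. A missing trans attribute is simply reported as `None` rather than mapped to any default.

```python
import xml.etree.ElementTree as ET

# The slide's overlap example, wrapped in a hypothetical <div> root.
SAMPLE = """<div>
<u xml:id="A1" who="A">Have you heard the</u>
<u xml:id="B1" who="B" trans="latching">the election results? </u>
<u xml:id="A2" who="A" trans="pause">its a disaster</u>
<u xml:id="B2" who="B" trans="overlap">its a miracle </u>
</div>"""


def transitions(root):
    """Return (previous speaker, next speaker, trans value) for each pair of
    consecutive utterances; a missing trans attribute is reported as None."""
    us = root.findall("u")
    return [(prev.get("who"), cur.get("who"), cur.get("trans"))
            for prev, cur in zip(us, us[1:])]


for prev, cur, kind in transitions(ET.fromstring(SAMPLE)):
    print(f"{prev} -> {cur}: {kind}")
```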

Linking, segmentation, alignment

Provides generic segmentation elements

Provides an extensive set of attributes for linkage, correspondence, synchronization, aggregation, alternation, etc.

Documents a generic pointing mechanism

Generic segmentation elements

<seg> for arbitrary (nesting) segmentation

<s> for end-to-end segmentation

use type attribute to subcategorise

<anchor> for points

Segmentation is the key to successful linking and analysis

Clustering

(Difficulty (is being expressed) with ((the method) (to be used)))

<s>Difficulty <seg>is being expressed</seg> with <seg><seg>the method</seg> <seg>to be used</seg></seg></s>
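The bracketed notation and the nested `<seg>` markup above carry the same information, so one can be generated from the other mechanically. In this minimal Python sketch, the outermost bracket pair becomes `<s>` and every inner pair becomes `<seg>`. The function name is hypothetical, and the input is assumed to contain no other use of parentheses.

```python
from xml.sax.saxutils import escape


def brackets_to_segs(text: str) -> str:
    """Convert a fully bracketed string into nested <s>/<seg> markup.

    The outermost bracket pair becomes <s>; every inner pair becomes <seg>.
    Raises ValueError if the brackets are unbalanced.
    """
    out, depth = [], 0
    for ch in text:
        if ch == "(":
            out.append("<s>" if depth == 0 else "<seg>")
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
            out.append("</s>" if depth == 0 else "</seg>")
        else:
            out.append(escape(ch))  # protect &, < and > in the text itself
    if depth != 0:
        raise ValueError("unbalanced '('")
    return "".join(out)


print(brackets_to_segs(
    "(Difficulty (is being expressed) with ((the method) (to be used)))"))
```

Applied to the bracketed sentence above, this reproduces the `<s>`/`<seg>` line exactly. Nesting of this kind is easy. Discontinuous segments are the case where plain containment runs out.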

discontinuous segments

fundamental problem

first segment, then link, using stand-off

“You put it,” Quill reminded him, “in the safe.”

<s xml:id="s1">"You put it,"</s> <s xml:id="s2">Quill reminded him,</s> <s xml:id="s3">"in the safe."</s>

discontinuous segments

can also use the part attribute to indicate that segments are incomplete, or next and prev to chain them

“You put it,” Quill reminded him, “in the safe.”

<s xml:id="s1" next="#s3">"You put it,"</s> <s xml:id="s2">Quill reminded him,</s> <s xml:id="s3" prev="#s1">"in the safe."</s>
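To recover the quotation as one unit, an application follows the next pointers from each chain's first segment. Below is a minimal Python sketch. The `<p>` wrapper and the function name are hypothetical. A segment counts as the start of a chain when no other segment points at it with next, and a seen-set guards against accidental cycles.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slide's next/prev example inside a hypothetical <p> wrapper.
SAMPLE = ('<p><s xml:id="s1" next="#s3">"You put it,"</s> '
          '<s xml:id="s2">Quill reminded him,</s> '
          '<s xml:id="s3" prev="#s1">"in the safe."</s></p>')


def virtual_segments(root):
    """Reassemble discontinuous segments by following next="#id" pointers.

    Each element that no other element points at with next starts a chain;
    the texts along the chain are joined with single spaces.
    """
    by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}
    targets = {e.get("next").lstrip("#") for e in by_id.values() if e.get("next")}
    chains = []
    for seg_id, elem in by_id.items():
        if seg_id in targets:
            continue  # reached from an earlier segment, not a chain start
        parts, seen = [], set()
        while elem is not None and elem.get(XML_ID) not in seen:
            seen.add(elem.get(XML_ID))
            parts.append(elem.text or "")
            nxt = elem.get("next")
            elem = by_id.get(nxt.lstrip("#")) if nxt else None
        chains.append(" ".join(parts))
    return chains


print(virtual_segments(ET.fromstring(SAMPLE)))
```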

discontinuous segments

“You put it,” Quill reminded him, “in the safe.”

<s xml:id="s1">“You put it,”</s> <s xml:id="s2">Quill reminded him,</s> <s xml:id="s3">“in the safe.”</s>

<join targets="#s1 #s3" result="s"/>
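With stand-off `<join>`, the segments stay untouched and the aggregate is built only when needed. This minimal Python sketch builds one virtual element per `<join>`. The element is named by the result attribute and holds copies of the targets in the order listed. The `<p>` wrapper, the function name and the `seg` fallback name are assumptions of this sketch, not TEI defaults.

```python
import copy
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slide's <join> example inside a hypothetical <p> wrapper.
SAMPLE = ('<p><s xml:id="s1">“You put it,”</s> <s xml:id="s2">Quill reminded him,</s> '
          '<s xml:id="s3">“in the safe.”</s>'
          '<join targets="#s1 #s3" result="s"/></p>')


def resolve_joins(root):
    """Build one virtual element per <join>: its tag comes from the result
    attribute and it contains copies of the targets in the order listed."""
    by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}
    virtual = []
    for join in root.iter("join"):
        # "seg" is an arbitrary fallback chosen for this sketch.
        whole = ET.Element(join.get("result", "seg"))
        for ref in join.get("targets", "").split():
            part = copy.deepcopy(by_id[ref.lstrip("#")])
            part.tail = None  # drop whitespace that followed the original
            whole.append(part)
        virtual.append(whole)
    return virtual


for whole in resolve_joins(ET.fromstring(SAMPLE)):
    print(whole.tag, " ".join(part.text for part in whole))
```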

Translation pairs

<s xml:id="s1" corresp="#s2" xml:lang="EN">For a long time I used to go to bed early</s><s xml:id="s2" corresp="#s1" xml:lang="FR">Longtemps je me couchais de bonne heure</s>

<correspGrp type="trans"><link targets="#s1 #s2"/></correspGrp>

and/or....
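Reciprocal corresp pointers let an aligner pull out sentence pairs directly. Below is a minimal Python sketch that reads the corresp version of the example. The `<div>` wrapper and the function name are hypothetical. Each pair is collected once even though it is linked from both ends. A `<link targets>` list could be read the same way by splitting its targets attribute.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# The slide's sentence pair under a hypothetical <div> root.
SAMPLE = ('<div>'
          '<s xml:id="s1" corresp="#s2" xml:lang="EN">For a long time I used to go to bed early</s>'
          '<s xml:id="s2" corresp="#s1" xml:lang="FR">Longtemps je me couchais de bonne heure</s>'
          '</div>')


def translation_pairs(root):
    """Collect each corresp-linked pair once (the links are reciprocal) and
    return it as a {language: text} dict."""
    by_id = {e.get(XML_NS + "id"): e for e in root.iter() if e.get(XML_NS + "id")}
    pairs = set()
    for own_id, elem in by_id.items():
        for ref in elem.get("corresp", "").split():
            if ref.lstrip("#") in by_id:
                pairs.add(frozenset({own_id, ref.lstrip("#")}))
    return [{by_id[i].get(XML_NS + "lang"): by_id[i].text for i in pair}
            for pair in sorted(sorted(p) for p in pairs)]


print(translation_pairs(ET.fromstring(SAMPLE)))
```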

Synchronization

of whole elements

of points in time

<u xml:id="A2" who="A" synch="#B2">its a disaster</u><u xml:id="B2" who="B">its a miracle</u>

<u xml:id="A1" who="A">Have you heard <anchor xml:id="A0"/>the</u> <u xml:id="B1" who="B" start="#A0"><anchor xml:id="B0"/>the election results? yes</u>

XML semantics are limited

<s id="S1" head="V1"> <np id="N1">annotated corpora</np> <vp id="V1">rule</vp> <tq id="T1">okay</tq></s>

The containment relation is implicit, so we do not need to say

<vp id="V1" partOf="S1">rule</vp>

though we may wish to say

<vp id="V1" role="head">rule</vp>
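The claim that partOf is redundant can be checked directly: any XML parser hands the containment relation back for free. Relations such as head, however, exist only as attributes. Below is a minimal Python sketch over the example sentence, using the slide's plain `id` attributes. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's example, with straight quotes.
SAMPLE = ('<s id="S1" head="V1"> <np id="N1">annotated corpora</np> '
          '<vp id="V1">rule</vp> <tq id="T1">okay</tq></s>')


def implicit_part_of(root):
    """Map each id-bearing element to the id of its nearest id-bearing
    ancestor: the partOf relation that XML nesting already encodes."""
    result = {}

    def walk(elem, ancestor_id):
        own = elem.get("id")
        if own is not None and ancestor_id is not None:
            result[own] = ancestor_id
        for child in elem:
            walk(child, own if own is not None else ancestor_id)

    walk(root, None)
    return result


def heads(root):
    """Relations other than containment, such as head, live in attributes."""
    return {e.get("id"): e.get("head") for e in root.iter() if e.get("head")}


root = ET.fromstring(SAMPLE)
print(implicit_part_of(root))
print(heads(root))
```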

Analytic mechanisms

Specific kinds of segment for linguistic analyses

Why is there no tag for noun?

Specialized interpretive pointers (<span> and <spanGrp>)

The ana attribute and its possible targets

<interp> and <interpGrp>

feature systems <fs> and <fsd>

Arbitrary characterizations

The <span> points into a stretch of a text and characterizes it in some way

The target may be anything you can reach with an XPath expression

<spanGrp resp="#LB" type="thematic"> <span value="ships" from="#P1" to="#P2"/> <span value="shoes" from="#P4" to="#P8"/> <span value="sealing wax" from="http://www.somewhere.com/waxinit.xml#P45"/></spanGrp>

<w ana="#VVD">annotated</w>

More detailed analysis

the ana attribute is of type IDREFS

what does VVD identify?

a prose description

an <interp> element

a feature structure

using interp...

<w ana="#VVD">annotated</w><w ana="#NN2">corpora</w>

<interp xml:id="VVD" type="lexicalClass" value="verbPastTense"/><interp xml:id="NN2" type="lexicalClass" value="nounPlural"/>
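Following an ana pointer is a plain ID lookup, which is what makes `<interp>` a convenient home for a tagset. Below is a minimal Python sketch that pairs each word with the value of the `<interp>` it points to. The `<div>` wrapper and the function name are hypothetical. The sketch assumes every pointer resolves within the same document.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slide's words and <interp>s, gathered under a hypothetical <div> root.
SAMPLE = ('<div><w ana="#VVD">annotated</w><w ana="#NN2">corpora</w>'
          '<interp xml:id="VVD" type="lexicalClass" value="verbPastTense"/>'
          '<interp xml:id="NN2" type="lexicalClass" value="nounPlural"/></div>')


def tag_words(root):
    """Pair each <w> with the value of every <interp> its ana points to.

    ana may hold several space-separated pointers; each must resolve to an
    <interp> in the same document (a KeyError is raised otherwise)."""
    interps = {i.get(XML_ID): i for i in root.iter("interp")}
    return [(w.text, [interps[ref.lstrip("#")].get("value")
                      for ref in w.get("ana", "").split()])
            for w in root.iter("w")]


print(tag_words(ET.fromstring(SAMPLE)))
```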

hierarchic grouping of interps

nouns can be common or proper

nouns can be singular or plural

<interpGrp value="nominal"> <interpGrp value="common"> <interp value="singular"/> <interp value="plural"/> </interpGrp></interpGrp>

for example...

<interp xml:id="VVD"> <desc>verb past tense</desc></interp><interp xml:id="NN2"> <desc>plural common noun</desc></interp>

<w ana="#VVD">annotated</w>

<w ana="#NN2">corpora</w>

Encoding analyses

Linguistic Annotation Frameworks and standards

the philosopher's stone

Generic feature structure system: any analysis can be represented by bundles of named feature-value pairs

embedded within text or indirectly linked

Ancillary feature system declaration

Theoretically neutral (?) pragmatic solution to the real-world problem of inter-machine communication

Feature structures

a feature structure consists of a bundle of features

a feature has a name and a value

values may be binary switches, symbols, strings, feature structures, or operations on them

bundling may be constrained in various (not necessarily hierarchic) ways

... or, in XML:

The <fs> element represents a (typed) feature structure, which contains...

one or more <f> elements, each of which has

a name

a value

Feature values may be

atomic: <binary> <string> <numeric> <symbol>

complex: <fs> <coll>

expressions: <vNot> <vAlt> <vColl> ... or <var>

Using a feature structure...

<w ana="#NN2">corpora</w>

<fs xml:id="NN2"> <f name="class"> <symbol value="noun"/></f> <f name="number"> <symbol value="plural"/></f> <f name="proper"> <binary value="false"/></f></fs>

Features: simple values

binary, numeric, symbol or string

constraints may be declared in an FSD

<fs type='word structure'> <f name='lemma'><string>goose</string></f> <f name='category'><symbol value='noun'/></f> <f name='barLevel'><numeric value='0'/></f> <f name='number'><symbol value='plural'/></f></fs>

[lemma: goose, category: noun, number: plural, bar level: 0]

Features: plus or minus

<fs type='phonetic segment'>
 <f name='segment'><binary value="true"/></f>
 <f name='consonantal'><binary value="true"/></f>
 <f name='vocalic'><binary value="false"/></f>
 <f name='nasal'><binary value="false"/></f>
 <!-- .... -->
 <f name='coronal'><binary value="true"/></f>
 <f name='continuant'><binary value="true"/></f>
 <f name='delayedRelease'><binary value="true"/></f>
 <f name='strident'><binary value="true"/></f>
</fs>

[segment +, consonantal +, vocalic -, nasal -, low -, high -, back -, round -, anterior +, coronal +, continuant +, delayed release +, strident +]
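The XML encodings and the bracketed renderings above are two views of the same data. This minimal Python sketch converts an `<fs>` with simple values into a dictionary and prints it in the bracket notation: "name +/-" for binary features and "name: value" for the rest. It assumes binary values are written `true`/`false`, as in the `proper` feature of the NN2 example, and that each `<f>` holds exactly one value. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

GOOSE = ("<fs type='word structure'>"
         "<f name='lemma'><string>goose</string></f>"
         "<f name='category'><symbol value='noun'/></f>"
         "<f name='barLevel'><numeric value='0'/></f>"
         "<f name='number'><symbol value='plural'/></f></fs>")


def value_of(v):
    """Convert one atomic value element (or nested <fs>) to a Python value."""
    if v.tag == "string":
        return v.text or ""
    if v.tag == "symbol":
        return v.get("value")
    if v.tag == "numeric":
        n = float(v.get("value"))
        return int(n) if n.is_integer() else n
    if v.tag == "binary":
        return v.get("value") in ("true", "1")
    if v.tag == "fs":
        return fs_to_dict(v)
    raise ValueError(f"unsupported value element <{v.tag}>")


def fs_to_dict(fs):
    """Map each <f name=...> to its single value (an empty <f> raises IndexError)."""
    return {f.get("name"): value_of(f[0]) for f in fs.findall("f")}


def avm(features):
    """Render in the bracket notation used on the slides:
    binary features as 'name +' / 'name -', others as 'name: value'."""
    items = []
    for name, value in features.items():
        if isinstance(value, bool):  # test bool before int: bool is an int subtype
            items.append(f"{name} {'+' if value else '-'}")
        elif isinstance(value, dict):
            items.append(f"{name}: {avm(value)}")
        else:
            items.append(f"{name}: {value}")
    return "[" + ", ".join(items) + "]"


print(avm(fs_to_dict(ET.fromstring(GOOSE))))
```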

Alternate values

<w ana="#VVD">annotated</w>

<fs xml:id="VVD" type="lexical"> <f name="class"> <vAlt mutExcl="Y"> <symbol value="verb"/> <symbol value="adj"/> </vAlt> </f>...</fs>

for example, “mange”:

<fs> <f name="cat"> <symbol value="verb"/></f> <f name="aux"> <string>avoir</string></f> <f name="mode"> <symbol value="indicatif"/></f> <f name="tense"> <symbol value="present"/> </f> <f name="pers"> <vAlt> <symbol value="1"/> <symbol value="3"/> </vAlt> </f> <f name="num"> <symbol value="sing"/></f></fs>
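A `<vAlt>` packs several readings into one structure. An application that needs them separately can expand every alternation and take the cross product. Below is a minimal Python sketch over the “mange” structure. It handles only atomic values sitting directly inside `<f>` or `<vAlt>`. The function names are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

# The "mange" structure, with aux given as <string> content.
MANGE = """<fs>
 <f name="cat"><symbol value="verb"/></f>
 <f name="aux"><string>avoir</string></f>
 <f name="mode"><symbol value="indicatif"/></f>
 <f name="tense"><symbol value="present"/></f>
 <f name="pers"><vAlt><symbol value="1"/><symbol value="3"/></vAlt></f>
 <f name="num"><symbol value="sing"/></f>
</fs>"""


def atom(v):
    """Read an atomic value: <string> content, otherwise the value attribute."""
    return v.text if v.tag == "string" else v.get("value")


def readings(fs):
    """Expand each <vAlt> into its alternatives and return one flat dict per
    combination. Only atomic values directly inside <f> or <vAlt> are handled."""
    names, choices = [], []
    for f in fs.findall("f"):
        v = f[0]
        names.append(f.get("name"))
        choices.append([atom(a) for a in v] if v.tag == "vAlt" else [atom(v)])
    return [dict(zip(names, combo)) for combo in itertools.product(*choices)]


for reading in readings(ET.fromstring(MANGE)):
    print(reading)
```

For “mange”, this yields two readings that differ only in person, 1 and 3.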

Value libraries

Collections of re-usable feature-structure components, each with a unique key

May be referenced from an <fs> (using feats attribute) or an <f> (using fVal attribute)

NB: the effect is to transclude (embed a copy of) the referenced item

Not to be confused with....

for example

<fLib type="agreement features">
 <f xml:id="p1" name="person"> <symbol value="first"/></f>
 <f xml:id="p2" name="person"> <symbol value="second"/></f>
 <!-- ... -->
 <f xml:id="ns" name="number"> <symbol value="singular"/></f>
 <f xml:id="np" name="number"> <symbol value="plural"/></f>
 <!-- ... -->
</fLib>

<fs feats="#p2 #ns"/>
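Since a feats reference transcludes the referenced features, an application can expand it into an ordinary `<fs>` before further processing. This minimal Python sketch does that for the agreement library. The copies drop their xml:id so that the expanded structure does not duplicate the library's identifiers. The function name is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slide's library of agreement features (elided entries left out).
LIB = """<fLib type="agreement features">
 <f xml:id="p1" name="person"><symbol value="first"/></f>
 <f xml:id="p2" name="person"><symbol value="second"/></f>
 <f xml:id="ns" name="number"><symbol value="singular"/></f>
 <f xml:id="np" name="number"><symbol value="plural"/></f>
</fLib>"""


def expand_feats(fs, library):
    """Return a new <fs> in which each feats="#id" reference is replaced by a
    copy of the referenced <f>, followed by any <f>s written out in full."""
    by_id = {f.get(XML_ID): f for f in library.iter("f")}
    out = ET.Element("fs", {k: v for k, v in fs.attrib.items() if k != "feats"})
    for ref in fs.get("feats", "").split():
        f = copy.deepcopy(by_id[ref.lstrip("#")])
        f.attrib.pop(XML_ID, None)  # a copy must not repeat the library's id
        f.tail = None
        out.append(f)
    out.extend(copy.deepcopy(list(fs)))  # keep explicit <f> children too
    return out


expanded = expand_feats(ET.fromstring('<fs feats="#p2 #ns"/>'), ET.fromstring(LIB))
print(ET.tostring(expanded, encoding="unicode"))
```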

Structure sharing

Some <fs> are not trees but DAGs: nodes may have multiple parents

We represent this by labelling each re-entrancy point, using a <var> element

All <var>s with the same label are held to be the same node: any contents found are to be unified

for example

<fs>
 <f name="nominal">
  <fs>
   <f name="nm-num"> <var label="L1"> <symbol value="singular"/></var> </f>
   <!-- other nominal features -->
  </fs>
 </f>
 <f name="verbal">
  <fs>
   <f name="vb-num"><var label="L1"/></f>
  </fs>
  <!-- other verbal features -->
 </f>
</fs>
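The rule that all `<var>`s with the same label denote one node can be applied in two passes. The first pass collects one value per label and fails if two different values meet. The second pass fills every occurrence from that table. Below is a minimal Python sketch, restricted to `<symbol>` contents. It is a simplification: it copies a shared atomic value rather than building a true shared node, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's structure-sharing example (comments omitted).
SAMPLE = """<fs>
 <f name="nominal"><fs>
   <f name="nm-num"><var label="L1"><symbol value="singular"/></var></f>
 </fs></f>
 <f name="verbal"><fs>
   <f name="vb-num"><var label="L1"/></f>
 </fs></f>
</fs>"""


def unify_vars(root):
    """Collect one value per <var> label. Only <symbol> contents are handled;
    two different symbols under the same label cannot unify (ValueError)."""
    values = {}
    for var in root.iter("var"):
        sym = var.find("symbol")
        if sym is None:
            continue  # an empty <var> just marks the shared node
        label, val = var.get("label"), sym.get("value")
        if values.setdefault(label, val) != val:
            raise ValueError(f"cannot unify {values[label]!r} with {val!r} at {label}")
    return values


def to_dict(fs, values):
    """Flatten an <fs> into nested dicts, reading every <var> from the table."""
    out = {}
    for f in fs.findall("f"):
        v = f[0]
        if v.tag == "fs":
            out[f.get("name")] = to_dict(v, values)
        elif v.tag == "var":
            out[f.get("name")] = values.get(v.get("label"))
        else:
            out[f.get("name")] = v.get("value")
    return out


root = ET.fromstring(SAMPLE)
print(to_dict(root, unify_vars(root)))
```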

Collections and other multiples

The value of a feature may be an aggregate of atomic values organized as a set, list, or bag

We represent this as a <coll> with a distinguishing org attribute

The value of a feature may (more usually) be a feature structure

... or the value of a feature may be given by a feature expression

For example

<fs>
 <f name="lexicalForm"> <symbol value="auxquels"/></f>
 <f name="analyses">
  <coll org="list">
   <fs>
    <f name="cat"><symbol value="prep"/></f>
   </fs>
   <fs>
    <f name="cat"><symbol value="pronoun"/></f>
    <f name="kind"><symbol value="rel"/></f>
    <f name="num"><symbol value="pl"/></f>
    <f name="gender"><symbol value="masc"/></f>
   </fs>
  </coll>
 </f>
</fs>

Feature expressions

We provide the following operators

Negation <vNot>, i.e. complement

Alternation <vAlt>

“Flattening” collection <vColl>

We also provide a <default> element

... but some of these are not very useful in the absence of a feature system declaration

Validation of Feature Structures

Constraints can be applied at three levels

in the XML schema (e.g. empty <f> is not allowed)

by supplying additional rules in an established XML constraint language (e.g. Schematron)

by defining a complete FSD or equivalent
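As a stand-in for the second level, here are a couple of additional rules of the kind one might express in Schematron, written as a plain Python check rather than as Schematron itself. The rules are that every `<f>` must have a name and must not be empty. A value supplied through an fVal reference counts as non-empty. The rule set and function name are illustrative assumptions, not a published constraint set.

```python
import xml.etree.ElementTree as ET

# Value elements named on the slides; anything else inside <f> is not counted.
VALUE_TAGS = {"binary", "string", "numeric", "symbol", "fs", "coll",
              "vNot", "vAlt", "vColl", "var", "default"}


def check_features(root):
    """Apply two extra rules to every <f>: it must have a name, and it must
    carry a value, either as a child element or through an fVal reference."""
    problems = []
    for i, f in enumerate(root.iter("f"), start=1):
        label = f.get("name") or f"<f> no. {i}"
        if not f.get("name"):
            problems.append(f"{label}: missing name attribute")
        if not any(c.tag in VALUE_TAGS for c in f) and not f.get("fVal"):
            problems.append(f"{label}: empty feature (no value and no fVal)")
    return problems


DOC = ('<fs><f name="number"><symbol value="plural"/></f>'
       '<f name="person" fVal="#p2"/>'
       '<f name="case"/></fs>')
print(check_features(ET.fromstring(DOC)))
```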

Or, a given set of <fs> could be “de-abstracted” to form a structure for which a specific schema could be written

Essential to support “typing” and “sub-typing” of feature structures

“de-abstractification”

A generic XML representation can be automatically converted to a specific one...

<fs type="ABC"> <f name="xyz"> <symbol value="zzz"/></f> <f name="foo"> <numeric value="42"/></f></fs>

<ABC> <xyz>zzz</xyz> <foo>42</foo></ABC>

<!ELEMENT ABC (xyz,foo)><!ELEMENT xyz (#PCDATA)><!ELEMENT foo (#PCDATA)>
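The conversion just described is mechanical: the fs type becomes the element name, each feature name becomes a child element, and atomic values become text. This minimal Python sketch performs it and emits the matching DTD for the flat case. It assumes every `<fs>` has a type, that type and feature names are legal XML names (a type such as `word structure`, with a space, is not), and, for the DTD, that feature names are distinct. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

GENERIC = ('<fs type="ABC"><f name="xyz"><symbol value="zzz"/></f>'
           '<f name="foo"><numeric value="42"/></f></fs>')


def deabstract(fs):
    """Turn <fs type="T"><f name="n">VALUE</f>...</fs> into <T><n>value</n>...</T>.

    Atomic values become text (<string> content or the value attribute);
    a nested <fs> becomes a nested specific element inside <n>.
    """
    out = ET.Element(fs.get("type"))
    for f in fs.findall("f"):
        v = f[0]
        if v.tag == "fs":
            ET.SubElement(out, f.get("name")).append(deabstract(v))
        else:
            ET.SubElement(out, f.get("name")).text = (
                v.text if v.tag == "string" else v.get("value"))
    return out


def dtd_for(specific):
    """Emit element declarations for a flat de-abstracted element."""
    children = [c.tag for c in specific]
    decls = [f"<!ELEMENT {specific.tag} ({','.join(children)})>"]
    decls += [f"<!ELEMENT {t} (#PCDATA)>" for t in children]
    return "".join(decls)


specific = deabstract(ET.fromstring(GENERIC))
print(ET.tostring(specific, encoding="unicode"))
print(dtd_for(specific))
```

For the ABC example, this reproduces the specific element and the three declarations shown above. The trade-off is that every new type needs its own schema, while the generic form is validated once.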
