LIRICS Deliverable D4.4
Multilingual test suites for semantically annotated data
Project reference number e-Content-22236-LIRICS
Project acronym LIRICS
Project full title Linguistic Infrastructure for Interoperable
Resource and Systems
Project contact point Laurent Romary, INRIA-Loria 615, rue du
jardin botanique BP101. 54602 Villers lès Nancy (France)
[email protected]
Project web site http://lirics.loria.fr
EC project officer Erwin Valentini
Document title Multilingual test suites for semantically
annotated data
Deliverable ID D4.4
Document type Report
Dissemination level Public
Contractual date of delivery M30
Actual date of delivery 30th June 2007
Status & version Final, version 3.0
Work package, task & deliverable responsible UtiL
Author(s) & affiliation(s) Harry Bunt, Olga Petukhova, and
Amanda Schiffrin, UtiL
Additional contributor(s)
Keywords Annotation, Semantic Representation, Test suites
Document evolution
Version Date
1.0 26th June 2007
2.0 28th August 2007
3.0 13th September 2007
1 Introduction
This document forms the last in the series of deliverables for
Work Package 4 in the LIRICS project. The first, D4.1 (Bunt and
Schiffrin, 2006), introduced the methodological factors which
should be taken into consideration when isolating appropriate
semantic concepts for representation; the second, D4.2 (Schiffrin
and Bunt, 2007), discussed some of the problems encountered in identifying commonalities in alternative approaches, suggested ways in which these problems might be solved, and presented a preliminary set of data categories for semantic annotation. The third, D4.3 (Schiffrin and Bunt, 2007), presented the final stage in the evolution of the preliminary data categories after these had been extensively discussed, applied and tested. This final document, D4.4 (Petukhova, Schiffrin and Bunt, 2007), is intended as a companion guide to the set of test suites and annotation guidelines which will be made available online.
We will present here the methods used in the production of the
annotation, as well as concrete examples of each concept, in pseudo
XML format for three different areas of semantic annotation:
Dialogue Act, Semantic Role and Reference annotation. Temporal
annotation was originally one of the areas under consideration, but since it has meanwhile been nominated by ISO as an area of particular interest in a new ISO work item, it was dropped in LIRICS: it would have been redundant to duplicate work that is being carried out elsewhere.
The rest of the document will be divided into the following
sections:
(Section 2) Dialogue Act Annotation.
(Section 3) Semantic Role Annotation.
(Section 4) Reference Annotation.
(Section 5) Concluding remarks.
(Appendices) - Extended examples of XML annotations;
- Guidelines for annotating dialogue acts, semantic roles, and reference relations.
Each of the sections concerning a specific semantic area of
annotation will be further divided into the following subsections
of information:
• Description of the annotation task.
• Description of the corpora used, with references and
statistics.
• Description of the annotation tool used; rationale behind the
choice of annotation tool; screenshots.
• Example XML annotation for each data category.
• Summary and discussion of the issues arising from the
annotation task, including occurrence figures.
This deliverable constitutes a report on the final state of play
at the 30-month stage of the project.
2 Dialogue Act Annotation
2.1 Annotation Task
The dialogue act annotation task involved two main activities:
- identification of the boundaries of functional segments with
at least one communicative function (segmentation task);
- assigning dialogue act tags (possibly multiple tags) to the
identified segments in multiple dimensions (classification
task).
Each dialogue was annotated by at least two different trained annotators, with the aim of estimating inter-annotator agreement and constructing a so-called ‘gold-standard’ annotation.
Annotators were provided with Annotation Guidelines for dialogue act annotation (see Appendix I.A).
2.2 Corpora
Dialogue act annotation was performed for three languages:
English, Dutch, and Italian.
For English selected dialogues from two dialogue corpora were
annotated: TRAINS 1 (5 dialogues; 349 utterances) and MapTask2 (2
dialogues; 386 utterances). Dialogues from both corpora are
two-agent human-human dialogues. TRAINS dialogues are
information-seeking dialogues where an information office assistant
is supposed to help a client in choosing the optimal transport
train connection. MapTask dialogues are so-called instructional dialogues, in which one participant plays the role of an instruction giver who navigates another participant, the instruction follower, through a route on a map. For both corpora
orthographic transcriptions for each individual speaker, including
word-level timings, were used.
For Dutch selected dialogues from two dialogue corpora were
annotated: DIAMOND3 (one extended dialogue; 301 utterances) and
Schiphol (Amsterdam Airport) Information Office (6 dialogues; 202
utterances). Dialogues from both corpora are two-agent human-human
dialogues. DIAMOND dialogues have an assistance-seeking nature with
one participant playing the role of an instructor explaining to the
user how to configure and operate a fax-machine. Schiphol
Information Office dialogues are information-seeking dialogues in which an assistant is asked to provide a client with information about airport activities and facilities (e.g. timetables, security, etc.). The original DIAMOND dialogue is pre-segmented per
dialogue utterance for each speaker with indication of utterance
start and end time. The original Schiphol dialogues are
pre-segmented per speaker turn without authentic turn timings.
For Italian 6 selected dialogues (393 utterances) from the SITAL
corpus were annotated. All dialogues are two-agent human-human
information-seeking dialogues. The SITAL corpus contains dialogues
between a travel agency operator and a person seeking travel information or wishing to book a ticket, a hotel room or a flight.
1 For more information about the TRAINS corpus please visit
http://www.cs.rochester.edu/research/speech/trains.html
2 Detailed information about the MapTask project can be found at
http://www.hcrc.ed.ac.uk/maptask/
3 See Geertzen et al. 2004
2.3 Annotation Tool
For the dialogue act annotation the ANVIL tool was used
(http://www.dfki.de/~kipp/ANVIL). The tool allows the
multidimensional segmentation of dialogue units into functional
segments and their annotation (labelling) in multiple dimensions
simultaneously. ANVIL also allows the annotator to mark up discontinuous segments and to re-segment pre-segmented dialogue units: some dialogues were presented in pre-segmented form, either per turn, as in the Dutch Schiphol Information Office corpus, or per utterance, as in the Dutch DIAMOND corpus, and using ANVIL annotators were able to cut such larger units into smaller functional segments.
Figure 1 shows the annotator’s interface of the ANVIL tool and
how it organizes the annotation work.
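To make the notion of a functional segment annotated in multiple dimensions more concrete, the following Python sketch represents one segment from the Dutch Schiphol corpus together with its communicative functions in three dimensions (cf. the /turnGive/ example in section 2.4). The representation and field names are illustrative assumptions only; the actual annotations are stored as ANVIL XML files (see Appendix I.B).

# Illustrative sketch only: a simplified in-memory view of one functional
# segment carrying communicative functions in several dimensions at once.
from dataclasses import dataclass, field

@dataclass
class FunctionalSegment:
    speaker: str
    start: float      # start time in seconds
    end: float        # end time in seconds
    text: str
    functions: dict = field(default_factory=dict)  # dimension -> communicative function

segment = FunctionalSegment(
    speaker="A1",
    start=0.0,
    end=0.50049,
    text="Schiphol Inlichtingen",
    functions={
        "Contact Management": "ContactIndication",
        "Turn Management": "TurnGive",
        "Social Obligation Management": "initialSelfIntroduction",
    },
)
print(segment.functions["Turn Management"])   # -> TurnGive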
2.4 Examples
In this section we illustrate the data categories defined for
dialogue acts with examples from the annotated corpora. Some of
these examples are also shown in the XML-representation extracted
from the original ANVIL-files in Appendix I.B.
/setQuestion/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label B1
58.05684 60.52592 How far is it from Avon
to Bath? Task: SetQuestion
/propositionalQuestion/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label A1
15.41509 16.4828 Have you got a
graveyard in the middle? Task: PropositionalQuestion
/alternativesQuestion/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
60.02543 62.99501 Do you wanna take the
boxcars with you or do you want to leave them in Elmira?
Task: AlternativesQuestion
/checkQuestion/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label B1
22.68888 24.25708 Due south and then back
again? AutoFeedback: CheckQuestion
/indirectSetQuestion/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label B1
128.15881 129.46008 Then from Dansville to
Corning? (full form is: How far is it from Dansville to
Corning?)
Task: IndirectSetQuestion
/indirectPropositionalQuestion/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label
B1 67.29922 72.53768 I was wondering if we could actually pick up those two boxcars which are in Bath? Task: IndirectPropositionalQuestion
/indirectAlternativesQuestion/ Language Italian
Corpus SITAL
Example Speaker Start time End time Utterance DA-label A1
136.96742 138.73582 o viaggiamo la
mattina oppure il e pomeriggio VOC puff
Task: indirectAlternativesQuestion
/inform/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
38.97149 38.97149 We can unload any
amount of cargo onto a train in one hour
Task: Inform
/agreement/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label B1
57.22269 58.95772 It’s five hours now that
we are back in Corning Task: Inform
A1 59.29138 59.55831 Yes Task: agreement
/disagreement/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label A1 124.4
126.84 That’s a very
unnatural motion to
Task: disagreement
/correction/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label
B1 139.50325 140.53759 Just a straight line along Partner Communication Management: Completion; Turn Management: TurnGrab
A1 141.50521 142.60628 No, this is a curve Task: Correction
/setAnswer/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label B1
15.68202 16.4828 What do we have here? Task: SetQuestion B2 16.4828
18.95189 We have three tankers
available in Corning Task: SetAnswer
/propositionalAnswer/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label B1
39.43861 39.43861 Will either one take me
any quicker? Task: PropositionalQuestion
A1 41.95354 42.17462 No Task: PropositionalAnswer
/confirm/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label B1
43.10887 44.17658 Slightly northeast AutoFeedback:
CheckQuestion A1 44.74381 45.64469 Yeah, very slightly
AlloFeedback:
Confirm
/disconfirm/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1 50.049
52.18442 If you are trying to fill
how many tankers three?
Task: CheckQuestion
B1 52.78501 53.85272 No, just one transport Task: Disconfirm
/instruct/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label A1
72.43759 75.50726 You should be avoiding
that by quite a distance Task: Instruct
/suggest/ Language Dutch
Corpus DIAMOND
Example Speaker Start time End time Utterance DA-label A1
566.98844 571.42612 maar ik denk dat ik
anders misschien opdracht 3 alvast daarbij mee moet nemen en dat
ik op het einde terugga
Discourse Structuring: Suggest
/request/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label B1
167.69753 169.73283 So, did you wanna
repeat that plan? AlloFeedback: Request
/acceptRequest/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label B1
167.69753 169.73283 So, did you wanna
repeat that plan? AlloFeedback: Request
A1 170.43353 170.7672 okay AutoFeedback: AcceptRequest
/declineRequest/ Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label A1
44.97737 48.97835 kunt u nog eens
zeggen van waar naar waar u wilt reizen
Task: request
B1 49.18148 50.049 Nee Task: declineRequest
/promise/
Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label A1
29.16188 30.1295 Ik zal hem om laten
roepen Task: Promise
/offer/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
0.43376 1.23454 Can I help you? Task: Offer
/acceptOffer/
Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
0.43376 1.23454 Can I help you? Task: Offer B1 1.6683 1.96859 yeah
Task: AcceptOffer
/declineOffer/
Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label A1
16.57476 18.34549 Wilt u nog een andere
verbinding weten
Task: Offer
B1 18.34549 19.16448 nee Task: declineOffer
/positiveAutoFeedback/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label A1
254.18219 256.2175 And above that
there’s an east lake Task: Inform
B1 256.68463 257.21848 Oh, right AutoFeedback:
positiveAutoFeedback
/negativeAutoFeedback/ Language Italian
Corpus SITAL
Example Speaker Start time End time Utterance DA-label
A1 37.13636 38.00387 mi scusi, mi ha detto alle dieci o alle
dodici e cinquantacinque
Auto Feedback: negativeAutoFeedback
Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label
A1 29.067310 30.471672 I'm sorry what was the next question
Auto Feedback: negativeAutoFeedback
/feedbackElicitation/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label A1 503.76
504.72 Does that make sense Allo Feedback:
feedbackElicitation
/negativeAlloFeedback/ Language Dutch
Corpus DIAMOND
Example Speaker Start time End time Utterance DA-label B1
741.19232 750.86846 je kunt als
alternatief voor het faxnummer gewoon intypen met de
cijfertoetsen kun je dus ook naamtoetsen gebruiken... of verkorte
kiescodes
Task: Instruct
A1 753.30418 754.10497 maar dat wil ik niet AlloFeedback:
negativeAlloFeedback
/positiveAlloFeedback/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label
A1 170.7672 187.31673 engine E one goes to Dansville picks up three boxcars goes to Corning is loaded with oranges and goes to Bath engine E two picks takes the two boxcars at Elmira to Corning where they're loaded with oranges and then takes them to Bath AutoFeedback: Inform
B1 187.88396 188.11751 Okay AlloFeedback: positiveAlloFeedback
/turnKeep/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label A1
122.88698 123.05381 So Turn Management:
TurnTake/TurnKeep
/turnGive/ Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label A1 0
0.50049 Schiphol Inlichtingen Contact
Management: ContactIndication; Turn Management: TurnGive; Social
Obligation Management: initialSelfIntroduction
/turnAccept/ Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label
A1 0 0.50049 Schiphol Inlichtingen Contact Management: ContactIndication; Turn Management: TurnGive; Social Obligation Management: initialSelfIntroduction
B1 0.50049 0.56722 Ja Contact Management: ContactIndication; Turn Management: TurnAccept
/turnTake/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label
A1 122.88698 123.05381 So Turn Management: TurnTake/TurnKeep
/turnGrab/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label A1
148.41197 148.41197 Which we’re going to
pass on the south Task: Instruct; Turn Management: TurnGrab
/turnRelease/
Language
Corpus
Example In two-agent dialogues, turn transitions from one participant to another differ from those in, for example, multi-party interaction, and are as a rule smoother. One participant normally either gives the turn to his/her partner or keeps the turn. When the speaker wants to give the partner the opportunity to take the turn, he/she typically does so implicitly, by simply stopping talking.
/stalling/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
81.44641 87.28546 Then six a.m. at
Dansville , nine a.m. at Avon
Task: Inform
A2 84.8831 86.11765 um … Time Management: stalling; Turn
Management: TurnKeep
/pausing/
Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label B1
94.39241 94.75944 Wait a second Time Management:
pausing; Turn Management: TurnKeep
/completion/
Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
55.98815 59.8586 That’ll get there at four
into Corning Task: Inform
B1 59.09119 60.92632 And load up Partner Communication
Management: completion
/correctMisspeaking / Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
42.80858 44.97737 second engine E3 is
going to uhh city H to pick up the bananas, back to A, dro…
Task: Inform
B1 44.97737 45.97835 H to pick up the oranges
Partner Communication Management: correctMisspeaking
/signalSpeakingError/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label B1
93.49153 93.82519 Oh oh Own Communication
Management: signalSpeakingError
/selfCorrection/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
58.05684 60.52592 How far is it from
Avon to Bath? Task: SetQuestion
A2 60.65939 61.22661 to Corning Own Communication Management:
SelfCorrection
/contactIndication/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
0.33366 0.36703 Hi Contact
Management: contactIndication; Social Obligation Management:
initialGreeting
/contactCheck/ Language Dutch
Corpus DIAMOND
Example Speaker Start time End time Utterance DA-label A1
1108.92004 1109.57996 eh jeroen? Contact
Management: contactCheck
/interactionStructuring/ Language English
Corpus Map Task
Example Speaker Start time End time Utterance DA-label A1
0.33366 1.36801 Starting off Discourse Structure
Management: interactionStructuring
Language Dutch
Corpus DIAMOND
Example Speaker Start time End time Utterance DA-label A1
306.48999 307.94 he Jeroen ik heb nog
een vraag Discourse Structure Management:
interactionStructuring
/initialGreeting/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
0.33366 0.36703 Hi Contact
Management: contactIndication; Social Obligation Management:
initialGreeting
Language Italian
Corpus SITAL
Example Speaker Start time End time Utterance DA-label A1
0.36703 0.66732 buongiorno Social Obligation
Management: initialGreeting
/returnGreeting/
Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label A1
0.33097 0.37163 Hi Contact
Management: contactIndication; Social Obligation Management:
initialGreeting
B1 1.99187 2.09128 Hi Social Obligation Management:
returnGreeting
/initialSelfIntroduction/ Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label A1 0.07
1.00098 Inlichtingen Schiphol Contact
Management: contactIndication; Turn Management: turnGive; Social
Obligation Management: initialSelfIntroduction;
/returnSelfIntroduction/ Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label A1 0.07
1.00098 Inlichtingen Schiphol Contact
Management: contactIndication; Turn Management: turnGive;
Social Obligation Management: initialSelfIntroduction
B1 1.10108 2.03533 met mevrouw van der Wilde
Social Obligation Management: returnSelfIntroduction
/initialGoodbye/ Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label B1
15.48182 16.04905 Goedemiddag Social Obligation
Management: initialGoodbye
/returnGoodbye/ Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label B1
15.48182 16.04905 Goedemiddag Social Obligation
Management: initialGoodbye
A1 16.04905 16.31597 Dag Social Obligation Management:
returnGoodbye
/apology/ Language Italian
Corpus SITAL
Example Speaker Start time End time Utterance DA-label A1
37.13636 38.00387 mi scusi Social Obligation
Management: apology
/acceptApology/ Language English
Corpus TRAINS
Example Speaker Start time End time Utterance DA-label
B1 365.620000 367.804207 to Avon I'm + sorry Social Obligation
Management: apology
A1 365.446201 366.193304 +okay Social Obligation Management:
acceptApology
/thanking/ Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label B1
14.14718 14.98133 hartelijk bedankt Social Obligation
Management: thanking
/acceptThanking/ Language Dutch
Corpus Schiphol
Example Speaker Start time End time Utterance DA-label
B1 14.14718 14.98133 hartelijk bedankt Social Obligation
Management: thanking
A1 14.98133 15.48182 Tot uw dienst Social Obligation Management:
acceptThanking
Data categories coverage: tag occurrences in the annotated test
suites
The following table gives an overview of the tag occurrences and the coverage of the data categories (percentages between brackets) in the annotated test suites for each data category and each language.
Data category English Dutch Italian
/setQuestion/ 22 (2.99%) 9 (1.79%) 16 (4.07%)
/propositionalQuestion/ 18 (2.45%) 21 (4.17%) 19 (4.8%)
/alternativesQuestion/ 1 (0.13%) 1 (0.2%) 3 (0.8%)
/checkQuestion/ 35 (4.76%) 30 (5.96%) 8 (2.04%)
/indirectSetQuestion/ 2 (0.27%) 4 (0.8%) 12 (3.1%)
/indirectPropositionalQuestion/ 1 (0.13%) 5 (1%) 5 (1.3%)
/indirectAlternativesQuestion/ 0 1 (0.2%) 2 (0.5%)
/inform/ 122 (16.6%) 89 (17.7%) 56 (14.25%)
/agreement/ 2 (0.27%) 3 (0.6%) 30 (7.6%)
/disagreement/ 1 (0.13%) 0 0
/correction/ 1 (0.13%) 4 (0.8%) 0
/setAnswer/ 26 (3.54%) 12 (2.39%) 33 (8.4%)
/propositionalAnswer/ 12 (1.63%) 21 (4.17%) 17 (4.33%)
/confirm/ 29 (3.95%) 26 (5.17%) 14 (3.6%)
/disconfirm/ 1 (0.13%) 1 (0.2%) 1 (0.3%)
/instruct/ 56 (7.62%) 17 (3.4%) 0
/suggest/ 0 1 (0.2%) 0
/request/ 1 (0.13%) 2 (0.4%) 9 (2.3%)
/acceptRequest/ 1 (0.13%) 1 (0.2%) 2 (0.5%)
/declineRequest/ 0 1 (0.2%) 0
/promise/ 0 3 (0.6%) 2 (0.5%)
/offer/ 5 (0.68%) 3 (0.6%) 6 (1.5%)
/acceptOffer/ 5 (0.68%) 2 (0.4%) 1 (0.3%)
/declineOffer/ 0 1 (0.2%) 3 (0.8%)
/positiveAutoFeedback/ 174 (23.67%) 96 (19.1%) 73 (18.6%)
/positiveAlloFeedback/ 2 (0.27%) 1 (0.2%) 19 (4.8%)
/negativeAutoFeedback/ 2 (0.27%) 0 1 (0.3%)
/negativeAlloFeedback/ 0 1 (0.2%) 2 (0.5%)
/feedbackElicitation/ 1 (0.13%) 0 0
/turnAccept/ 2 (0.27%) 7 (1.4%) 0
/turnGive/ 3 (0.41%) 13 (2.6%) 0
/turnGrab/ 17 (2.31%) 11 (2.2%) 0
/turnKeep/ 138 (18.78%) 87 (17.3%) 5 (1.3%)
/turnRelease/ 0 0 3 (0.8%)
/turnTake/ 69 (9.39%) 43 (8.5%) 46 (11.7%)
/stalling/ 74 (10.07%) 72 (14.3%) 27 (6.9%)
/pausing/ 5 (0.68%) 9 (1.79%) 11 (2.8%)
/completion/ 4 (0.54%) 0 0
/correctMisspeaking / 1 (0.13%) 0 0
/signalSpeakingError/ 4 (0.54%) 0 0
/selfCorrection/ 20 (2.72%) 6 (1.2%) 3 (0.8%)
/contactIndication/ 5 (0.68%) 12 (2.39%) 7 (1.8%)
/contactCheck/ 1 (0.13%) 2 (0.4%) 4 (1.02%)
/interactionStructuring/ 12 (1.63%) 9 (1.79%) 4 (1.02%)
/initialGreeting/ 5 (0.68%) 5 (1%) 6 (1.5%)
/returnGreeting/ 1 (0.13%) 1 (0.2%) 8 (2.04%)
/initialSelfIntroduction/ 0 6 (1.2%) 6 (1.5%)
/returnSelfIntroduction/ 0 4 (0.8%) 4 (1.02%)
/initialGoodbye/ 0 6 (1.2%) 4 (1.02%)
/returnGoodbye/ 0 5 (1%) 2 (0.5%)
/apology/ 0 0 1 (0.3%)
/acceptApology/ 1 (0.13%) 0 0
/thanking/ 1 (0.13%) 7 (1.4%) 6 (1.5%)
/acceptThanking/ 0 5 (1%) 0
Table 1: Tag occurrences and data category coverage (in %)
in the tested corpora for each language
From Table 1 we may observe that all the data categories defined
for the communicative functions of dialogue acts occurred in the
test suites for at least one of the languages. No utterance was
labeled by the annotators as UNCODED.
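The coverage percentages in Table 1 (and in the corresponding tables for semantic roles and reference below) amount to simple relative frequencies. As a minimal sketch, assuming the tags assigned in one language's test suite are available as a flat list, they could be computed as follows; the function name and toy data are illustrative only.

# Count each data category and express it as a percentage of all tags.
from collections import Counter

def coverage(tags):
    counts = Counter(tags)
    total = sum(counts.values())
    return {cat: (n, round(100.0 * n / total, 2)) for cat, n in counts.items()}

# Hypothetical miniature tag list, for illustration only:
print(coverage(["inform", "inform", "setQuestion", "turnTake"]))
# -> {'inform': (2, 50.0), 'setQuestion': (1, 25.0), 'turnTake': (1, 25.0)}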
Inter-annotator agreement and discussion of issues arising
For the purpose of qualitative evaluation of the proposed data
categories for the dialogue act annotation the inter-annotator
agreement was calculated using the standard kappa statistic (see
Cohen, 1960, Carletta, 1996). This measure is given by:
κ = (P(A) – P(E)) / (1 – P(E))
where P(A) is the proportion of times that the k annotators agree and P(E) is the proportion of agreement expected if the annotators were to agree by chance. The agreement was measured on both
annotation tasks (segmentation and classification). The analysis
was made on 2 Map Task dialogues (386 utterances) and 3 TRAINS
dialogues (187 utterances) for English and 1 DIAMOND dialogue (301
utterances) and 5 Schiphol dialogues (152 utterances) for Dutch.
Each utterance was tagged by two trained annotators independently.
For the classification task the agreement was calculated for each
data category in isolation and for the annotation task in general.
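As a minimal illustration of the kappa computation for two annotators, assuming two equally long lists of labels assigned to the same segments (see Geertzen & Bunt, 2006, cited below, for more elaborate weighted variants relevant to multidimensional annotation):

# Sketch following kappa = (P(A) - P(E)) / (1 - P(E)) (Cohen, 1960).
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # P(A): observed proportion of identically labelled items.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # P(E): agreement expected by chance, from the marginal distributions.
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((dist_a[l] / n) * (dist_b[l] / n) for l in set(dist_a) | set(dist_b))
    return (p_a - p_e) / (1 - p_e)

# Toy example with three segments:
print(cohen_kappa(["Inform", "SetQuestion", "Inform"],
                  ["Inform", "CheckQuestion", "Inform"]))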
The results for the segmentation task are as follows:
Corpus P(A) P(E) Kappa
Map Task 0.99 0.96 0.74
TRAINS 1.00 1.00 nav
DIAMOND 1.00 1.00 nav
Schiphol 0.99 0.94 0.83
Table 2: Inter-annotator agreement (kappa statistic) on the segmentation task.
According to the scale proposed by Rietveld and van Hout (1993)
these kappa values reflect substantial and almost perfect agreement
between the annotators on the segmentation task. According to the
scale proposed by Landis and Koch (1977), these values show good
and high agreement (scores are higher than 0.6).
Table 3 presents the agreement statistics on the dialogue act
classification task after segmentation. The table shows
near-perfect agreement between the annotators, both for all separate classes
of dialogue act functions and for the corpus data as a whole
(evaluation according to Rietveld & van Hout, 1993). (See also
Geertzen & Bunt, 2006 for more complex but also more accurate
forms of weighted agreement calculation.)
Data category Kappa
Information Seeking functions 0.983
Information Providing functions 0.989
Action Discussion functions 0.994
Auto-Feedback functions 0.994
Allo-Feedback functions 1.00
Turn final functions 0.958
Turn initial functions 0.954
Time Management functions 0.971
Partner Communication Management functions 1.00
Own Communication Management functions 0.929
Contact Management functions 0.956
Discourse Structuring functions 0.982
Social Obligation Management functions 0.938
Whole corpus of annotated data 0.93
Table 3: Inter-annotator agreement on the classification task,
measured with kappa statistic.
Further analysis of the confusion matrix that can be constructed
from the data (see Table 4 below) shows in which cases the
annotators experienced some difficulties in reaching agreement. The
following cases can be identified:
- CheckQuestions vs Inform: CheckQuestions and Informs often
have the same surface structures and observable features, such as
word order, declarative intonation and pitch contour.
Discrimination between these two communicative functions often
requires knowledge of the context, in particular of the dialogue
history and of the distribution of information among the dialogue
participants (for instance, whether a participant is an expert on
the content of the dialogue act).
- Inform vs SetAnswer: SetAnswers can be confused with Informs
if an annotator only takes the dialogue history into account,
namely noticing that the previous utterance was a Question. Replies
to Questions are often Answers, but this is not always the case.
For example:
A1 Ik zou graag willen weten wanneer het toestel met het
vluchtnummer KL
678 of dat morgenochtend of zaterdagochtend aankomt. (I would
like to know when the plane with flight number KL 678 whether it arrives tomorrow morning or Saturday morning)
B1 Moment hoor. (Just a moment)
B2 Hij komt morgen en zaterdag. (It arrives tomorrow and on
Saturday)
The utterance (B2) is not a SetAnswer but rather an Inform,
telling the other participant that the presupposition of the AlternativesQuestion was false.
- Turn Management function vs no Turn Management function: see
Table 4. The table shows that annotators failed to reach 100%
agreement on assigning Turn Management functions (this applies to
all functions except TurnGrab), where one annotator decided not to assign any Turn Management function. This was invariably caused
by a lack of evidence in the communicative behavior of dialogue
participants in the form of observable features which would reflect
such functions. Such features may be linguistic cues (e.g. ‘uhm’)
and intonation properties (e.g. pauses, rising intonation, word
lengthening, etc.).
Label none TurnGive TurnKeep TurnAccept TurnGrab TurnTake
none 1677
TurnGive 10 26
TurnKeep 5 190
TurnAccept 3 5
TurnGrab 21
TurnTake 7 90
Table 4: Confusion matrix for Turn Management function
assignment
Difficult cases for annotators were so-called backchannels,
signals by which a participant may indicate his understanding of
what is said without necessarily accepting or agreeing with what is
said. The producer of a backchannel does not take the turn and does
not wish to interfere with what the partner is saying, nor does he
wish to show the intention to interrupt and obtain the turn, but he
wants to show an active listening attitude and/or encourage the
partner to continue. Such phenomena often occur in hesitation phases, when one of the dialogue partners signals difficulty in completing his/her utterance by pausing, stalling, or producing other vocal signs such as heavy breathing or puffing.
For example:
A1 and then we're going to turn east
A2 turn ... VOC_inbreath
B1 mmhmm
A3 not ... straight east ... slightly sort of northeast
Whether a Feedback act has a Turn Giving function or not is
sometimes difficult to decide for a human annotator, because the
differences between Turn Giving vs no Turn Giving can be very
subtle (voicing, pitch contour, energy, initial pauses, etc.),
making the annotator’s decision rather subjective.
A similar scenario was observed for assigning Turn Keep
functions (did hesitation take place) and Turn Take functions (did the speaker perform a separate act to that effect). Here also, prosodic rather
than lexical cues indicate the speaker’s intentions to manage this
aspect of the interaction.
The other observed disagreements were accidental in nature and
can be disregarded.
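Confusion matrices such as Table 4 (and Tables 7 and 8 in section 3) can be derived directly from the paired annotations. A minimal sketch, assuming two aligned label lists with "none" standing for no function assigned, is given below; the toy data are illustrative only.

# Count, for every pair of labels, how often annotator 1 chose the first
# while annotator 2 chose the second for the same segment.
from collections import defaultdict

def confusion_matrix(labels_a, labels_b):
    matrix = defaultdict(int)
    for a, b in zip(labels_a, labels_b):
        matrix[(a, b)] += 1
    return dict(matrix)

print(confusion_matrix(["TurnKeep", "none", "TurnGive"],
                       ["TurnKeep", "TurnGive", "TurnGive"]))
# -> {('TurnKeep', 'TurnKeep'): 1, ('none', 'TurnGive'): 1, ('TurnGive', 'TurnGive'): 1}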
3 Semantic Role Annotation
3.1 Annotation Task
We define a semantic role as the type of relation that a participant has to some real or imagined situation; the semantic role annotation task therefore involved two main activities:
• Identification and labeling of markables: expressions that
represent the entities involved in semantic role relations.
Markables come in two varieties:
• anchors, which correspond to one of three situation (or
‘eventuality’) types: events, states and facts (every semantic role
must be ‘anchored’ to a situation of one of these types). Anchors
are realised mainly by verbs but sometimes also by nouns.
• situation participants. These are realised mainly by nouns, noun
phrases and pronouns (ignoring event coreference, temporal
coreference, etc.).
• Identification and labeling of links: semantic role relations between participant and anchor markables.
Annotators were provided with Annotation Guidelines for semantic
role annotation (see Appendix II.A).
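As an illustration of the two kinds of markables and the links between them, the following Python sketch encodes the /theme/ example from section 3.4 ('One man wrapped several diamonds in the knot of his tie'). The classes and identifiers are illustrative assumptions only; the actual annotations were made with GATE (see section 3.3).

# Markables are either anchors (events, states, facts) or participants;
# links tie a participant to an anchor under a LIRICS semantic role.
from dataclasses import dataclass

@dataclass
class Markable:
    mid: str     # markable id, e.g. "e1" for an anchor
    span: str    # text span covered by the markable
    kind: str    # "anchor" or "participant"

@dataclass
class RoleLink:
    participant: str   # participant markable id
    anchor: str        # anchor markable id
    role: str          # LIRICS semantic role label

anchor = Markable("e1", "wrapped", "anchor")
participants = [Markable("p1", "One man", "participant"),
                Markable("p2", "several diamonds", "participant"),
                Markable("p3", "in the knot of his tie", "participant")]
links = [RoleLink("p1", "e1", "Agent"),
         RoleLink("p2", "e1", "Theme"),
         RoleLink("p3", "e1", "Final_Location")]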
In order to have a reasonable coverage of different types of
semantic roles, it was decided that a minimum of 500 sentences per
language should be annotated. Test suites with semantic role
annotations were constructed for four languages: English, Dutch,
Italian, and Spanish.
For Dutch and English all test suite material was annotated
independently by at least three different annotators, in order to
investigate the usability of the tagset in terms of inter-annotator
agreement.
3.2 Corpora
For English FrameNet and PropBank data was used. We selected
three unbroken FrameNet texts (120 sentences) and separate
sentences (83 sentences). PropBank data consists of isolated
sentences (355 sentences).
For Dutch 15 unbroken texts were selected from news articles,
with a total of 260 sentences.
News articles were also selected to construct Italian test
suites (101 sentences). All files were taken from the Italian
Treebank corpus.
For Spanish, the LIRICS test suite consists of 189 sentences
taken from the Spanish FrameNet corpus.
3.3 Annotation Tool
The annotations were made using the GATE annotation tool from the University of Sheffield4. GATE provides annotators with a graphical interface for indicating which pieces of text denote relevant concepts (the ‘markables’). For the LIRICS annotation task two types of annotation label were added to GATE: SemanticAnchor and SemanticRole (an updated gate.jar file was provided by UtiL).
Figure 2 shows the GATE interface for annotators and how it organizes an annotator's activities.
4 See: http://gate.ac.uk for further details and
http://gate.ac.uk/documentation.html for documentation.
3.4 Examples from the annotated test suites
This section contains examples from the annotated corpora, which
illustrate the data categories defined for semantic roles.
/agent/
Language Spanish
Corpus Spanish FrameNet
Example [El partido popular Agent,e1] ha planteadoe1 [las
elecciones Theme,e1] [para llegar al Gobierno de la Nacion Reason,
e1].
Language English
Corpus FrameNet
Example [Libya Agent,e1&e2] has showne1 [interest Theme,e1]
in and takene2 [steps Theme, e2] [to acquiree3 [weapons of mass
destruction (WMD) Theme, e3] and their delivery systems Purpose,
e1&e2].
/partner/
Language English
Corpus FrameNet
Example [On 19 December 2003 Time, e1], [Libyan leader col.
Muammar Gadhafi Agent,e1] [publicly Manner, e1] confirmede1 [his
commitmente2 [to disclosee3 and dismantlee4 [WMD programs Patient,
e3&e4] [in his country Location, e3&e4] Theme,e1] Purpose,
e2] [following [a nine-month period Duration, e5] of negotiationse5
[with US and UK authorities Partner,e5] Reason, e1].
/cause/
Language English
Corpus FrameNet
Example [Signing the protocol Cause, e1] would ensuree1 [IAEA
Beneficiary, e1] [oversight over Libya's nuclear transition from
weapons creation to peaceful purposes Reason, e1].
/instrument/
Language English
Corpus FrameNet
Example [In 2003 Time, e1], Libya admittede1 [its previous
intentions to acquiree2 [equipment Theme, e2; Instrument, e4]
needede3 [to producee4 [biological weapons (BW) Result, e4]
Purpose, e3] Theme, e1]
/patient/
Language English
Corpus PropBank
Example [White women Agent, e1&e2] servee1 [tea and coffee
Theme, e1] , and then washe2 [the cups and saucers Patient, e2]
[afterwards Time, e2] .
/pivot/
Language English
Corpus PropBank
Example [Vicar Marshall Agent, e1; Pivot, e2] admitse1 [to mixed
feelingse2 [about this issue Theme, e2] Theme, e1].
/theme/
Language Spanish
Corpus Spanish FrameNet
Example [China Pivot, e1] no representae1 [un peligro militar
Theme, e1].
Language English
Example [One man Agent, e1] wrappede1 [several diamonds Theme,
e1] [in the knot of his tie Final_Location, e1].
/beneficiary/
Language English
Corpus PropBank
Example [U.S. Trust Agent, e1] [recently Time, e1] introducede1
[certain mutual-fund products Theme,e1] , which allowe2 [it
Beneficiary, e2] [to servee3 [customers Beneficiary, e3] Purpose,
e2].
/source/
Language English
Corpus PropBank
Example [Eaton Beneficiary, e1] earnede1 [from continuing
operations Source, e1]
/goal/
Language English
Corpus PropBank
Example [The executive Agent, e1] recallse1 [[Mr. Corry Agent,
e2] whisperinge2 [to him and others Goal, e2] Theme, e1].
/result/
Language English
Corpus PropBank
Example [Within the past two months Duration, e1] [a bomb
Patient, e1; Cause, e2] explodede1 [in the offices of the El
Espectador in Bogota Location, e1], [destroyinge2 [a major part of
its installations and equipment Patient, e2] Result, e1]
/reason/
Language English
Corpus PropBank
Example [Elisa Hollis Agent, e1] launchede1 [a diaper service
Result, e1] [last year Time, e1] [because [State College , Pa.
Pivot, e2] didn't havee2 [one Theme, e2] Reason, e1].
/purpose/
Language English
Corpus PropBank
Example [Two steps Theme, s1] ares1 [necessary Attribute, s1]
[to translatee1 [this idea Patient, e1] [into action Result, e1]
Purpose, s1]
/time/
Language English
Corpus PropBank
Example [Right now Time, e1] [[about a dozen Amount, e1]
laboratories Agent, e1&e2] , [in the U.S. , Canada and Britain
Location, e1] , are racinge1 [to unmaske2 [other suspected
tumor-suppressing genes Theme, e2] Purpose, e1].
/manner/
Language English
Corpus PropBank
Example [These rate indications Theme, s1] ares1 n't [directly
Manner, s1] comparables1.
/medium/
Language English
Corpus PropBank
Example [They Pivot, s1; Agent, e1] coulds1 seee1 [the 23 pairs
of chromosomes Theme, e1] [in the cells Location, e1] [under a
microscope Medium, e1].
/means/
Language English
Corpus FrameNet
Example [Sears Agent, e1] blanketede1 [the airwaves Patient, e1]
[with ads about its new pricing strategy Means, e1]
/setting/
Language English
Corpus FrameNet
Example [A number of medical and agricultural research centers
Pivot, s1; Instrument, e1] hads1 [the potential Attribute, s1] to
be usede1 [in BW research Setting, e1].
Here comee1 [the ringers Agent, e1&e2] [from above
Initial_Location, e1], makinge2 [a very obvious exit Theme, e2]
[while [the congregation Pivot, s1] iss1 [at prayer Setting, s1]
Time, e1&e2]
/location/
Language English
Corpus FrameNet
Example [Here Location, s1] iss1 [an example Theme, s1].
[They Patient, e1] aren't acceptede1 [everywhere Location,
e1]
[The stairs Theme, s1] are locateds1 [next to the altar
Location,s1]
/initialLocation/
Language English
Corpus FrameNet
Example Here comee1 [the ringers Agent, e1&e2] [from above
Initial_Location, e1], makinge2 [a very obvious exit Theme, e2]
[while [the congregation Pivot, s1] iss1 [at prayer Setting, s1]
Time, e1&e2]
/finalLocation/
Language English
Corpus PropBank
Example [One man Agent, e1] wrappede1 [several diamonds Theme,
e1] [in the knot of his tie Final_Location, e1].
/path/
Language English
Corpus PropBank
Example [Father McKenna Agent, e1] movese1 [through the house
Path, e1] [praying in Latin Manner, e1]
/distance/
Language English
Corpus FrameNet
Example [Libya Agent, e1] pledgede1 [to eliminatee2 [[ballistic
missiles Pivot, s1] capables1 of travelinge3 [more than 300km
Distance, e3] Patient, e2] Theme, e1].
/amount/
Language English
Corpus PropBank
Example [The ruble Theme, s1] iss1 n't worths1 [much Amount,
s1].
/attribute/
Language English
Corpus
Example [A number of medical and agricultural research centers
Pivot, s1; Instrument, e1] hads1 [the potential Attribute, s1] to
be usede1 [in BW research Setting, e1].
/frequency/
Language English
Corpus PropBank
Example [President Zia of Pakistan Agent, e1] [repeatedly
Frequency, e1] statede1 [that [fresh Soviet troops Patient, e2]
were being insertede2 [into Afghanistan Final_Location, e2] Theme,
e1]
3.5 Data category coverage
Table 5 gives an overview of the tag occurrences and the
coverage of the data categories by the test suites (percentages
between brackets) for each defined data category and language.
Data category English Dutch Italian Spanish
Total number of objects 1795 1326 454 1356
/agent/ 311 (17.3%) 186 (14%) 60 (13.2%) 258 (19%)
/partner/ 5(0.3%) 9 (0.7%) 2 (0.4%) 3 (0.2%)
/cause/ 39 (2.2%) 33 (2.5%) 2 (0.4%) 43 (3.2%)
/instrument/ 10 (0.56%) 7 (0.5%) 7 (1.5%) 4 (0.3%)
/patient/ 186 (10.4%) 137 (10.3%) 51 (11.2%) 119 (8.8%)
/pivot/ 104 (5.8%) 85 (6.4%) 51 (11.2%) 154 (11.4%)
/theme/ 501 (27.9%) 331 (25%) 117 (25.6%) 315 (23.2%)
/beneficiary/ 40 (2.02%) 19 (1.4%) 7 (1.5%) 63 (4.7%)
/source/ 16 (0.9%) 31 (2.3%) 7 (1.5%) 2 (0.1%)
/goal/ 18 (1%) 13 (1%) 13 (2.9%) 5 (0.4%)
/result/ 66 (3.7%) 54 (4.1%) 14 (3.1%) 24 (1.8%)
/reason/ 36 (2%) 14 (1.1%) 9 (2%) 43 (3.2%)
/purpose/ 49 (2.7%) 18 (1.4%) 7 (1.5%) 24 (1.8%)
/time/ 135 (7.5%) 106 (8%) 13 (2.9%) 65 (4.8%)
/manner/ 39 (2.2%) 27 (2%) 18 (4%) 44 (3.2%)
/medium/ 4 (0.2%) 1 (0.1%) 2 (0.4%) 8 (0.6%)
/means/ 8 (0.4%) 6 (0.5%) 0 2 (0.1%)
/setting/ 47 (2.6%) 48 (3.6%) 16 (3.5%) 28 (2.1%)
/location/ 41 (2.3%) 66 (5%) 24 (5.3%) 34 (2.5%)
/initial_location/ 2 (0.1%) 1 (0.1%) 2 (0.4%) 5 (0.4%)
/final_location/ 6 (0.3%) 10 (0.8%) 7 (1.5%) 43 (3.2%)
/path/ 20 (1.1%) 9 (0.7%) 0 0
/distance/ 1 (0.06%) 0 1 (0.2%) 0
/amount/ 27 (1.5%) 19 (1.4%) 11 (2.4%) 17 (1.3%)
/attribute/ 72 (4%) 88 (6.6%) 6 (1.3%) 45 (3.3%)
/frequency/ 12 (0.7%) 8 (0.6%) 0 9 (0.7%)
unclassified 0 0 6 (1.3%) 0
Table 5: Tag occurrences and data categories coverage (in %)
in the tested corpora for each language in isolation
3.6 Inter-annotator agreement and discussion of issues
arising
Three annotators annotated the test suites for English and Dutch independently. The annotators were students of linguistics and native speakers of Dutch, and their level of English was evaluated as proficient. The annotators had no previous annotation experience; they received one afternoon of training in
annotation using LIRICS data categories and the annotation tool.
They also received a short (7 pages) document with annotation
guidelines (see Appendix II.A). This allowed an evaluation of the
usability of the LIRICS data categories for semantic role
annotation by determining the agreement among the annotators. This
was done in the usual way by calculating the standard kappa
statistic (see Cohen, 1960, Carletta, 1996, and above, section
2.5).
The obtained Kappa scores were evaluated according to Rietveld
& van Hout (1993) and interpreted as all annotators having
reached substantial agreement on all annotation tasks (scores
between 0.61 and 0.8), except for one annotator pair (A2&amp;A3)
whose agreement on labelling Dutch anchors and semantic roles was
moderate (less than 0.61). The results are shown in Table 6 for
each pair of annotators.
Annotators’ pairs A1&A2 A1&A3 A2&A3
Anchors (English) 0.66 0.66 0.61
Semantic roles (English) 0.64 0.68 0.62
Anchors (Dutch) 0.73 0.77 0.54
Semantic roles (Dutch) 0.6 0.65 0.56
Table 6: Inter-annotator agreement on the two labeling tasks
based on kappa statistic
for English and Dutch corpus data
A closer look at the confusion matrices for both corpora (see
Tables 7 and 8 below) shows the disagreement cases. We found that
the following data categories were a source of confusion for
annotators:
• The role of adjectives in descriptions of states or facts by
means of constructions like Copula + Adjective was not labelled
consistently, as in the following example (where (b) is the correct
annotation):
a. Roses are red
b. Roses are red
• Theme vs Patient: these roles have one distinguishing property: a Theme is distinguished from a Patient by whether the participant is affected by the event or not; if it is not, then it is a Theme; if it is, then it is a Patient. Sometimes, however, it was difficult for annotators to decide whether the participant is affected/changed by the event or not, for example:
An ancient stone church stands amid the fields, the sound of
bells cascading from its tower, calling the faithful to
Evensong.
Any question .. is answered by reading this book about sticky
fingers and sweaty scammers .
Individuals close to the situation believe Ford officials will
seek a meeting this week with Sir John to outline their proposal
for a full bid.
Jayark, New York, distributes and rents audio-visual equipment
and prints promotional ads for retailers.
• Theme vs Pivot: Theme is distinguished from Pivot by whether
it is a participant that has the most central role or not; if it is
not, then it is a Theme; if it is, then it is a Pivot. Again, this
can be difficult for annotators to decide.
U.S. officials say they aren't satisfied (Annotator1 labelled as
Theme; other two – as Pivot of the state ‘to be satisfied’)
They may be offshoots of the intifadah, the Palestinian
rebellion in the occupied territories, which the U.S. doesn't
classify as terrorism.
• Theme vs Result: Theme is distinguished from Result by whether
it is a participant that exists independently of the event or not;
if it is, then it is a Theme; if not, then it is a Result.
Together with the 3.6 million shares currently controlled by
management, subsidiaries and directors, the completed tender offer
would give Sea Containers a controlling stake.
Delegates from 91 nations endorsed a ban on world ivory trade in
an attempt to rescue the endangered elephant from extinction
(potential Result).
These are the last words Abbie Hoffman ever uttered.
• Location vs Setting: Setting is distinguished from Location by whether the expression defines a physical location or not; if it does not, then it is a Setting; if it does, then it is a Location. Some cases, however, can be ambiguous, for example:
It hopes to speak to students at theological colleges about the
joys of bell ringing.
They settle back into their traditional role of making tea at
meetings.
• Beneficiary vs Goal: Goal is distinguished from Beneficiary by
whether it is a participant that is clearly advantaged or
disadvantaged by the event; if it is, then it is a Beneficiary; if
not, then it may be a Goal. For example:
Libya employed Iranian-supplied mustard gas bombs against Chad,
its southern neighbour, in 1987.
When their changes are completed, and after they have worked up
a sweat, ringers often skip off to the local pub, leaving worship
for others below.
• It was pointed out by annotators that Pivot seems to be a rather general, abstract role which subsumes more fine-grained distinctions such as the experiencer of psychological events/states, the theme of some states like “owning”, etc. On the other hand, there are examples like ‘John has a dog’, where ‘John’ obviously plays a more central role than ‘a dog’, and labelling both participants as Themes would be unsatisfactory. This was the main reason for introducing the Pivot role.
• It was suggested that it would be more efficient to organize
the roles into a taxonomy, exploiting, for instance, semantic
features like [+/- agentivity] and similar, so that in their
application to real texts, annotators can be presented with
different levels of granularity and perform a case-by-case decision
without being forced to choose a highly specific role, e.g.
Location (general role) and Initial_Location, Final_Location, Path
as sub-roles.
• The issue also arose whether separate roles like Initial_Time, Final_Time and Duration should be defined, since these overlap with temporal information. These data categories are defined for the domain of temporal annotation, and as semantic roles they seem to be superfluous. On the other hand, if someone were interested only in semantic roles, the proposed set of tags should be complete and also cover temporal roles. The same could be said about spatial roles.
Agent Amount Attribute Beneficiary Cause Duration F_location I_location Frequency Goal I_time Instrument Location Manner Means Medium Partner Path Patient Pivot Purpose Reason Result Setting Source Theme Time
Agent 503
Amount 40
Attribute 41
Benefic 62
Cause 22 53
Duration 26
F_locatio 10
I_locatio 18
Frequen 23
Goal 1 23
I_time 16
Instrum 7
Locatio 63
Manner 20 66
Means 2 9
Medium 2 4
Partner 9
Path 31
Patient 19 4 5 5 1 2 207
Pivot 40 3 3 145
Purpose 1 1 62
Reason 4 78
Result 9 80
Setting 1 15 3 2 1 1 1 44
Source 1 1 20
Theme 28 4 46 3 7 1 5 1 3 1 103 59 4 3 16 644
time 1 1 196
Table 7: Confusion matrix for semantic roles for English
corpus
Agent Amount Attribute Beneficiary Cause Duration F_location I_location Frequency Goal I_time Instrument Location Manner Means Medium Partner Path Patient Pivot Purpose Reason Result Setting Source Theme Time
Agent 133
Amount 1 9
Attribute 1
Benefic 1 7
Cause 8 7
Duration 1 8
F_locatio 9
I_locatio 2
Frequen 6
Goal 11 16
I_time 1 9
Instrum 1 1 1
Locatio 56
Manner 1 11 1 1 15
Means 1 2
Medium 1 1
Partner 1 2 9
Path 2 2 7
Patient 5 13 1 6 1 27
Pivot 39 68
Purpose 9
Reason 1 3 9
Result 2 1 1 13 15
Setting 1 3 1 8 4 29
Source 2 1 2 1 1 3 22
Theme 23 1 14 1 1 1 1 4 4 90 37 11 12 173
time 90
Table 8: Confusion matrix for semantic roles for Dutch
corpus
4 Reference Annotation
4.1 Annotation Task
The reference annotation tasks involved two main activities:
• Identification and labeling of markables: referential
expressions realised by nouns, noun phrases and pronouns (ignoring
event coreference, temporal coreference, etc.).
• Identification and labeling of links: referential relations
between markables.
Annotators were provided with Annotation Guidelines (see
Appendix III.A).
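Analogously to the semantic role sketch in section 3.1, the following minimal Python sketch encodes referential markables and one link for the /acronymy/ example given in section 4.4; the classes and identifiers are illustrative assumptions only, the actual annotations being PALinkA XML files (see section 4.3 and Appendix III.B).

# Referential markables and a typed link between them.
from dataclasses import dataclass

@dataclass
class ReferentialMarkable:
    mid: str
    span: str

@dataclass
class ReferenceLink:
    source: str    # id of the referring markable
    target: str    # id of the antecedent markable
    relation: str  # e.g. "objectalIdentity", "acronymy", "partOf"

markables = [ReferentialMarkable("m1", "weapons of mass destruction"),
             ReferentialMarkable("m2", "WMD")]
links = [ReferenceLink("m2", "m1", "acronymy")]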
4.2 Corpora
The annotations were performed on corpus material for four
languages, English, Dutch, Italian, and German:
• For English 177 sentences were selected from the FrameNet
corpus 5 . In their annotation with respect to referential
relations, 375 markables and 233 links were identified. In
addition, 142 sentences were selected from the MUC-6 891102-0148
corpus. In the annotation of these sentences 331 markables and 221
links were identified and labeled.
• For Dutch 274 sentences from news articles were selected for
reference annotation. Annotators identified and labeled 494
markables and 327 coreferential links.
• For Italian 137 sentences from Italian newspaper articles were
annotated, where 736 markables and 265 coreferential links were
identified and labeled.
• The German test suite consisted of 232 sentences from
newspaper articles (Handelsblatt, financial news), in which 98 markables and 175 coreferential links were identified.
4.3 Annotation Tool
The annotations were performed using the PALinkA annotation
tool6, an XML-based tool that was originally designed for the
purpose of referential relation annotation. This tool has
considerable advantages:
• It is language/platform/task-independent. The specifications
relevant to the annotation task were provided in an external file
(see Appendix III.B);
• It allows easy identification and labeling of all markables
and links, with point and click actions, and has the possibility to
perform undo/redo/delete operations;
• It has a user-friendly interface.
5 See http://framenet.icsi.berkeley.edu/ for more
information.
6 Visit the Palinka site http://clg.wlv.ac.uk/projects/PALinkA/
for more information and downloads.
Figure 3 shows the PALinkA annotation interface and the
organization of the annotation work.
Figure 3: Screenshot of annotation work using PALinkA
4.4 Examples
In this section we provide some examples from the test suites,
to illustrate each of the data categories for reference annotation
that were defined in LIRICS. The same examples are shown in the
XML-format extracted from the PALinkA files in Appendix III.B.
/synonymy/ Language English
Corpus FrameNet
Example There is a significant amount of open-source literature
concerning Libya 's acquisition and use of [chemical weapons ]( [CW
]) ; [it ] is well documented [that [Libya ]employed
Iranian-supplied mustard gas bombs against [Chad ], [[its ]southern
neighbor ], in 1987 ].
/hyponymy/
Language English
Corpus FrameNet
Example Housing is scarce and [public services ]-- [the court
system , schools , mail service , telephone network and the
highways ]-- are in disgraceful condition
/acronymy/ Language English
Corpus FrameNet
Example [Libya ]has shown interest in and taken steps to acquire
[weapons of mass destruction ]( [WMD ])
/compatibility/
Language English
Corpus FrameNet
Example In 2003 , [Libya ]admitted [its ]previous intentions to
acquire equipment needed to produce [biological weapons ]( [BW ]) .
In October and December 2003 , Libyan officials took US and UK
experts to a number of [medical and agricultural research centers
][that ]had the potential to be used in [BW ]research . [The
country ]acceded to the biological and toxin weapons convention on
19 January 1982 . There are allegations that the alleged [chemical
weapon ]( [CW ]) plants at Rabta and Tarhunah could contain [BW
]research facilities as well .
/meronymy/
Language Dutch
Corpus
Example [Alexandra Polier ], [die ]eerder werkte als redacteur
op het kantoor in NewYork van het Amerikaanse persagentschap
Associated Press , is op dit moment in Nairobi op bezoek bij de
ouders van [[haar ]verloofde ], [Yaron Schwartzman ], een Israëliër
die in Kenia opgroeide .
/metonymy/
Language Italian
Corpus
Example E [ la Bimex ][ si ] era affrettata ad aprire [ le [ sue
] porte ] sia [[ alla commissione ministeriale ] che [[ ai
carabinieri ][ del nucleo antisofisticazioni ]] e [ all' Usl ]] :
[[ ispezioni ] e [ controlli ]] avevano trovato tutto regolare .
> , dice tranquillo [ Ugo De Bei ] .
/partOf/ Language English
Corpus FrameNet
Example [Libya ]'s motivation to acquire [WMD ], and [ballistic
missiles ]in particular , appears in part to be a response to
Israel 's clandestine nuclear program and a desire to become a more
active player in Middle Eastern and African politics [The others
]here today live elsewhere . [They ]belong to a group of [15
ringers ]--
/subsetOf/
Language English
Corpus MUC-6 891102-0148
Example " [The group ]says standardized achievement test scores
are greatly inflated because [teachers ]often " teach the test " as
Mrs. Yeargin did , although [most ]are never caught .
/memberOf /
Language English
Corpus MUC-6 891102-0148
Example [Friends of Education ]rates [South Carolina ]one of
[the worst seven states ]in [its ]study on academic cheating .
/abstract/
Language English
Corpus MUC-6 891102-0148
Example [Mrs. Yeargin ]was fired and prosecuted under an unusual
South Carolina law that makes [it ]a crime [to breach test security
]
/concrete/
Language English
Corpus MUC-6 891102-0148
Example [she ]spotted [a student ]looking at [crib sheets ]
/animate/ Language English
Corpus MUC-6 891102-0148
Example And most disturbing , [it ]is [[educators ], not
students ], [who ]are blamed for much of the wrongdoing .
/inanimate/
Language English
Corpus MUC-6 891102-0148
Example [She ]had seen cheating before , but [these notes ]were
uncanny .
/alienable/ Language English
Corpus MUC-6 891102-0148
Example And sales of [test-coaching booklets ]for classroom
instruction are booming .
/inalienable/
Language English
Corpus FrameNet
Example Libya then invited the [IAEA ]to verify the elimination
of nuclear weapon related activities .
/naturalGender/
Language English
Corpus FrameNet
Example [Mr. Gonzalez ]is not quite a closet supply-side
revolutionary , however
Language English
Corpus MUC-6 891102-0148
Example [Cathryn Rice ]could hardly believe [her ]eyes .
/cardinality/ Language English
Corpus MUC-6 891102-0148
Example Standing on a shaded hill in a run-down area of [this
old textile city ], [the school ]has educated many of [South
Carolina ]'s best and brightest , including [[the state ]'s last
two governors ], [Nobel Prize winning physicist ][Charles Townes
]and [actress ][Joanne Woodward ].
/collective/
Language English
Corpus MUC-6 891102-0148
Example South Carolina 's reforms were designed for schools like
[[Greenville ]High School ]. And [South Carolina ]says [it ]is
getting results .
/nonCollective/
Language English
Corpus MUC-6 891102-0148
Example [There ]may be [others ]doing what [she ]did .
/countable/ Language English
Corpus MUC-6 891102-0148
Example [The school-board hearing ]at [which ][she ]was
dismissed was crowded
with [students , teachers and parents ][who ]came to testify on
[her ]behalf .
/nonCountable/
Language English
Corpus MUC-6 891102-0148
Example Says [[the organization ]'s founder ], [John Cannell ],
prosecuting Mrs. Yeargin is " a way for [administrators ]to protect
[themselves ]and look like [they ]take [cheating ]seriously , when
in fact [they ]do n't take [it ]seriously at all . "
/definiteIdentifiableTerm/
Language English
Corpus FrameNet
Example A strong challenge from [the far left ], [the communist
coalition Izquierda Unida ], failed to topple [him ].
/genericTerm/
Language English
Corpus FrameNet
Example [Unemployment ]still is officially recorded at 16.5 % ,
the highest rate in Europe , although actual [joblessness ]may be
lower .
/indefiniteTerm/
Language English
Corpus FrameNet
Example [The far left ]had [some good issues ] even if [it ]did
not have good programs for dealing with [them ].
/nonSpecificTerm/
Language English
Corpus FrameNet
Example The result is a generation of [young people ][whose
]ignorance and intellectual incompetence is matched only by [their
]good opinion of [themselves ].
/specificTerm/
Language English
Corpus FrameNet
Example [These beliefs ] so dominate our educational
establishment , our media , our politicians , and even our parents
that [it ]seems almost blasphemous [to challenge [them ]].
Data category coverage: tag occurrences in the annotated
corpora
The annotation results were evaluated quantitatively with respect to the frequencies of the LIRICS data categories. The following table gives an overview of tag occurrences and data category coverage (percentages given in brackets) in the
annotated corpora for each data category and for each language.
Data category English Dutch Italian German
/synonymy/ 4 (0.9%) 18 (5.5%) 15(5.7%) 15(8.6%)
/hyponymy/ 3 (0.7%) 0 9(3.4%) 7(4%)
/acronymy/ 9 (2%) 5 (1.5%) 7(2.6%) 3(1.7%)
/compatibility/ 37 (8%) 0 23(8.7%) 0
/meronymy/ 0 2 (0.6%) 3(1.1%) 0
/metonymy/ 0 0 7(2.6%) 0
LINGUISTIC="NA" (not applicable) 401 (88.4%) 271 (82.9%)
192(72.5%) 138(78.9%)
LINGUISTIC="unclassified" 0 31(9.5%) 9(3.4%) 12(6.8%)
/objectalIdentity/ 429 (94.5%) 300 (91.7%) 225(84.9%) 117(66.9%)
/partOf/ 3 (0.7%) 12 (3.7%) 6(2.3%) 0
/subsetOf/ 4 (0.9%) 9 (2.8%) 7(2.6%) 5(2.8%)
/memberOf / 18 (3.9%) 3 (0.9%) 25(9.4%) 24(13.7%)
OBJECT="NA" 0 2 (0.6%) 0 29(16.6%)
OBJECT=”unclassified” 0 1(0.3%) 2(0.8%) 0
___________________________ __________ ________ ________
_________
/abstract/ 134 (19%) 108 (21.9%) 209(28.4%) 16(16.3%)
/concrete/ 572(81%) 386(78.1%) 495(67.3%) 79(80.6%)
ABSTRACTNESS="unclassified" 0 0 32(4.3%) 3(3.1%)
/animate/ 420(59.5%) 259(52.4%) 144(19.6%) 40(40.8%)
/inanimate/ 286(40.5%) 235(47.6%) 564(76.6%) 58(59.2%)
ANIMACY="unclassified" 0 0 28(3.8%) 0
/alienable/ 686(97.2%) 391(79.1%) 29(4%) 49(50%)
/inalienable/ 20(2.8%) 103(20.9%) 12(1.6%) 1(1%)
ALIENABILITY="unclassified" 0 0 695(94.4%) 48(49%)
/naturalGender/ 94(male)+ 183(female)
(39.2%)
120(male) +
18(female) (27.9%)
323(male) + 197(female)
(70.7%)
22(male) + 3(female) (25.5%)
GENDER=”NA” 429(60.8%) 345(69.8%) 21(2.8%) 57(58.2%)
GENDER="unclassified" 0 11(2.3%) 195(26.5%) 16 (16.3%)
/cardinality/ 11(1.6%) 33(6.7%) 59(8%) Not annotated
CARDINALITY="NA" 695(98.4%) 461(93.3%) 677(92%) Not
annotated
/collective/ 245(34.7%) 294(59.5%) 69(9.4%) 21(21.4%)
/nonCollective/ 295(41.8%) 172(34.8%) 7(0.9%) 74(75.5%)
-
47
COLLECTIVENESS="NA" 166(23.5%) 28(5.7%) 651(88.5%) 3(3.1%)
COLLECTIVENESS="unclassified" 0 0 9(1.2%) 0
/countable/ 545(77.2%) 395(80%) 330(44.8%) 65(66.4%)
/nonCountable/ 66(9.3%) 69(14%) 108(14.7%) 7(7.1%)
COUNTABILITY="NA" 95(13.5%) 30(6%) 271(36.8%) 19(19.4%)
COUNTABILITY=”unclassified” 0 0 27(3.7%) 7(7.1%)
/definiteIdentifiableTerm/ 329(46.6%) 255(51.6%) 559(76%)
8(8.2%)
/genericTerm/ 74(10.5%) 52(10.5%) 47(6.4%) 15(15.3%)
/indefiniteTerm/ 1(0.1%) 11(2.3%) 124(16.8%) 15(15.3%)
/nonSpecificTerm/ 79(11.2%) 19(3.8%) 0 1(1%)
/specificTerm/ 219(31%) 128(25.9%) 0 9(9.2%)
DEFINITENESS="unclassified" 2(0.3%) 1(0.2%) 5(0.7%) 0
DEFINITENESS="NA" 2(0.3%) 28(5.7%) 1(0.1%) 50(51%)
Table 9: Tag occurrences and data categories coverage (in %) in
the tested corpora for each language in isolation
It may be observed from Table 9 that all the LIRICS data
categories were covered by the test suites. The percentages given
in brackets indicate that their frequencies are comparable for the
various corpora, except for the following differences:
• the category GENDER was ‘unclassified’ for Italian in 26.5% of
the cases, and for German in 16.3%. Annotators noticed that the
application of /naturalGender/ is not clear, in particular for
Italian and German. With the exception of human beings, most
objects and concepts have a gender due to their grammatical
classification. Moreover, a high percentage of ‘not applicable’
values was observed for all languages (English: 60.8%; Dutch:
69.8%; German: 58.2%). Annotators noticed that NPs such as "controller" and "Menschen" are difficult to tag, since they can refer to either natural gender.
• there are some discrepancies in labelling the ‘DEFINITENESS’
category across languages. In the annotators’ opinion, the
definitions of the definiteness data categories need to be
tightened (see the suggestions of the Italian LIRICS partners in
Section 5).
• there are high proportions of unclassified ‘ALIENABILITY’ for
Italian and German. German annotators noticed that it is difficult
to select an appropriate value when dealing with proper names,
since the value "NA" is lacking.
• ‘CARDINALITY’ for German was not annotated.
• a high percentage of cases was labelled with the ‘not applicable’ value for ‘COLLECTIVENESS’ (annotators provided no comments on this point; these have been requested).
4.5 Discussion
For a qualitative evaluation of the performed annotation work,
each annotator (project partner) was asked to comment on the
following three points:
1. the definition of the annotation task
2. the definitions of data categories
3. the use of the annotation tool
With respect to the annotation task, as defined above, it was noticed at an early stage that the main purpose of this LIRICS task is to illustrate the use of the data categories that were defined, which means that it would not be necessary to identify all possible markables as described in the Annotation Guidelines (all NPs and embedded structures), since not all of them enter into coreferential relations. Therefore, it was agreed with all project partners that the identification of markables and links can be performed in parallel; in other words, that only those NPs would be marked up which participate in coreferential relations.
With respect to the definitions of data categories the following
issues were discussed:
• the category ‘GENDER’: not applicable in a significant number of cases, e.g. ‘people’, ‘a student’, since these refer to both genders. The Italian partners noticed that most objects and concepts have a gender due to their grammatical classification. Since /naturalGender/ is a semantic rather than a grammatical notion, it does not apply to many types of referent, and such referents should be labelled as GENDER=‘not applicable’.
• the category ‘DEFINITENESS’: annotators proposed to consider ‘DEFINITENESS’ as a grammatical category. It was suggested to change the /definiteIdentifiableTerm/ category into /definiteTerm/, in parallel with /indefiniteTerm/. These two categories refer to the surface form of an NP in a document. Since these categories are purely syntactic in nature, it was agreed to leave /definiteTerm/ and /indefiniteTerm/ completely out of consideration.
On the other hand, it was proposed to introduce a new category, /specificity/, whose values would be /specificTerm/, /nonSpecificTerm/ and /genericTerm/. The definition of /genericTerm/ should be reformulated along the lines of those for /specificTerm/ and /nonSpecificTerm/, i.e. the speaker refers to any member of a class as a representative of that class. We would then have cases like the following (a sketch of a possible annotation is given after the examples):
• The lions are noble beasts. GENERIC/DEFINITE
• I want to meet a Norwegian. GENERIC/INDEFINITE
• I want to meet the Norwegian. SPECIFIC/DEFINITE
• I saw a Norwegian. NONSPECIFIC/INDEFINITE
• She met a Norwegian. NONSPECIFIC/INDEFINITE
• I met a guy, he is a Norwegian. SPECIFIC/INDEFINITE
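Purely for illustration, the fragment below sketches how the last example might be annotated under this proposal, in the same assumed pseudo-XML style as the earlier sketch; SPECIFICITY is the proposed new attribute, not part of the current data category set, and the element and attribute names are stand-ins chosen here for readability (definiteness would be left to the syntactic level).
  <!-- hypothetical sketch: SPECIFICITY is the proposed new category -->
  <MARKABLE id="m1" words="a guy" SPECIFICITY="specificTerm"/>
  <MARKABLE id="m2" words="he" SPECIFICITY="specificTerm"/>
  <!-- anaphoric link: both markables refer to the same individual -->
  <LINK source="m2" target="m1" OBJECT="objectalIdentity"/>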
It was suggested that the /hyperonym/ linguistic relation should be reintroduced as a data category, although it was considered redundant at an early stage of the annotation enterprise. The reason lies in the way coreference annotation is conducted, i.e. from the second (determined) NP back towards the probable anchor (usually an indefinite NP); it is unnatural to mark the relation the other way round.
These suggestions have all been taken into account in the final version of the proposed set of LIRICS semantic data categories, as documented in the final version of Deliverable D4.3.
As a final comment, annotators noticed that the number of attributes and values is extremely high for obtaining coherent tagging, even for a single annotator: for each MARKABLE, an annotator has to keep seven categories in mind, each of which is subdivided into at least four values.
The following issues were pointed out with respect to the annotators’ experiences in using the PALinkA tool. A disadvantage when using the tool to perform annotations according to the LIRICS specifications was the difficulty of assigning values to categories in one step. (Comment from one annotator: ‘It is so much clicking and thinking and clicking... and clicking again!’) This increased the annotation time considerably. Instead of a separate window with a drop-down menu for each category, one window with a list of all categories and a drop-down list of values for each of them would be preferred.
5 Concluding remarks
In this report we have documented Task 4.3 in Work Package 4:
Test suites for semantic content annotation and representation. The
task was defined to involve at least four European languages;
indeed, it has involved English, Dutch, Italian, German and
Spanish, and it has been carried out for the intended domains of
semantic information: semantic roles, dialogue acts, and
coreference relations. The data categories that were defined for
these domains, and that are documented in Deliverable D4.3, have
been marginally updated as a result of the test suite construction
and annotation effort, and have been endorsed by the Thematic Domain Group 3, Semantic Content, of division TC 37/SC 4 of the International Organization for Standardization (ISO).
For semantic role annotation, the state of the art in
computational linguistics is such that there are widely diverging
views on what may constitute a useful set of semantic roles, with
the FrameNet and PropBank initiatives as two opposite extremes. We
have proposed a set of data categories that corresponds roughly to
the upper levels of the FrameNet hierarchy, but with a more
strictly semantic orientation. In view of these circumstances, we have carried out an investigation into the usability of the proposed set of descriptors by having material in English and Dutch (partly taken from FrameNet and PropBank data) annotated independently by three annotators. This is reported in section 3.5 of this document. It turns out that even previously untrained annotators, with no specific background in the area, were able to reach substantial agreement on the use of the LIRICS data categories. This is a welcome and very encouraging result. Outside of and after the LIRICS project, this will be investigated further, also by systematically relating LIRICS annotations to FrameNet and PropBank annotations, and will be reported at conferences and in the literature on semantic annotation.
For coreference annotation the situation is rather different.
The computational linguistics community is less divided in this
area, and the LIRICS data categories for reference annotation build
on several related efforts in reference annotation. This part of
the annotation work presented relatively little difficulty and did
not warrant a separate investigation into the usability of the
proposed data categories. However, annotators were asked to comment
on a number of aspects of their work, and this has resulted in some
suggestions for improving the set of data categories for reference,
which have been taken into account in the final proposal of this
set, as documented in Deliverable D4.3.
For dialogue act annotation the state of the art is such that
different annotation schemes use a number of common core
descriptors, but vary widely in the number of additional tags, as
well as in their granularity, their naming, and the strictness of
their definitions. The LIRICS proposal for this domain is based on
taking the common core of a range of existing approaches and
extending this core in a principled way, with the help of a
formalized notion of ‘multidimensionality’ in dialogue act
annotation, which has been around informally in this domain for
some time. The usability of the LIRICS tagset was evaluated by having two experienced annotators independently annotate the test suites for English and Dutch. The results, described in section 2.5 of this report, show near-perfect annotator agreement.
6 Bibliography
Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2):249-254.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37-46.
Geertzen, J., Y. Girard, and R. Morante (2004). The DIAMOND project. Poster at the 8th Workshop on the Semantics and Pragmatics of Dialogue (CATALOG 2004), Barcelona, Spain, May 2004.
Geertzen, J. and H. Bunt (2006). Measuring annotator agreement in a complex, hierarchical dialogue act scheme. In Proceedings of the 7th Workshop on Discourse and Dialogue (SIGdial 2006), Sydney, Australia, July 2006, pp. 126-133.
Landis, J. and G. Koch (1977). A one-way components of variance model for categorical data. Biometrics, 33:671-679.
Rietveld, T. and R. van Hout (1993). Statistical techniques for the study of language and language behavior. Berlin: Mouton de Gruyter, p. 219.
Appendix I.A Annotation Guidelines for Dialogue Acts
Dialogue act annotation is about indicating the kind of intention that the speaker had: what was he or she trying to achieve? This is what agents participating in a dialogue are trying to establish.
1. First and most important guideline: “Do as the Addressee would do!”
When assigning annotation tags to a dialogue utterance, put
yourself in the position of the participant at whom the utterance
was addressed, and imagine that you try to understand what the
speaker is trying to do. Why does (s)he say what (s)he says? What
are the purposes of the utterance? What assumptions does the
speaker express about the addressee? Answering such questions
should guide you in deciding which annotation tags to assign,
regardless of how exactly the speaker has expressed himself. Use
all the information that you could have if you were the actual
addressee, and like the addressee, try to interpret the speaker’s
communicative behaviour as best as you can.
2. Second and equally important guideline: “Think functionally, not formally!”
The linguistic form of an utterance often provides vital clues
for choosing an annotation, but such clues may also be misleading;
in making your choice of annotation tags you should of course use
the linguistic clues to your advantage, but don’t let them fool you
- the true question is not what the speaker says but what he
means.
For example, WH-QUESTIONS are questions where the speaker wants
to know which elements of a certain domain have a certain property.
In English, such questions often contain a word beginning with “wh”, such as which as in Which books did you read on your holidays? or where in Where do your parents live? But in other languages this is not the case; moreover, even in English not all sentences of this form express a WH-QUESTION: Why don’t you go ahead, for instance, is typically a SUGGESTION rather than a question.
Similarly, YN-QUESTIONS are questions where the speaker wants to know whether a certain statement is true or false. Such questions typically have the form of an interrogative sentence, such as Is The Hague the capital of the Netherlands? or Do you like peanut butter? But not all sentences of this form express a YN-QUESTION; for example, Do you know what time it is? functions most often as an INDIRECT WH-QUESTION (What time is it?), Would you like some coffee? is an OFFER, and Shall we go? is a SUGGESTION.
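As a reminder that the annotation should follow the function rather than the surface form, the fragment below sketches what this might look like in pseudo-XML; the element name and the speaker and function attributes are assumptions made here for readability, while the communicative function labels are the ones discussed above.
  <!-- illustrative only: tag by function, not by surface form -->
  <utterance id="u1" speaker="A" function="SUGGESTION">
    Why don't you go ahead
  </utterance>
  <utterance id="u2" speaker="A" function="OFFER">
    Would you like some coffee?
  </utterance>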
3. Another important general guideline is: “Be specific!”
Among the communicative functions that you can choose from,
there are differences in specificity, corresponding with their
relative positions in hierarchical subsystems. For instance, a
CHECK is more specific than a YES/NO-QUESTION, in that it
additionally carries the expectation that the answer will be
positive. Similarly, a CONFIRMATION is more specific than a YES/NO-ANSWER, in that it additionally carries the speaker’s assumption that the addressee expects the answer to be positive.
In general, try all the time to be as specific as you can. But
if you’re in serious doubt about specific functions, then simply
use a less specific function tag that covers the more specific
functions.
4. On indirect speech acts: “Code indirect speech acts just like direct ones.”
Standard speech act theory regards indirect speech acts, such as
indirect questions, as just an indirect form of the same
illocutionary acts. By contrast, the DIT++ taxonomy incorporates
the idea that indirect dialogue acts signal subtly different
packages of beliefs and intentions than direct ones. For example,
the direct question What time is it? carries the assumption that
the addressee knows what time it is, whereas the indirect question
Do you know what time it is? does not carry that assumption (it
does at least not express that assumption; in fact it questions
it).
5. On implicit functions: “Do not code implicit communicative functions that can be deduced from functions that you have already assigned.”
Implicit communicative functions occur in particular for
positive feedback.
For example, someone answering a question may be assumed to
(believe to) have understood the question. So any time you annotate
an utterance as an ANSWER (of some sort), you might consider
annotating it also as providing positive feedback on the
interpretation of the question that is answered. Don’t! It would be
redundant.
Notice also that the definition of a positive (auto-) feedback
act concerning interpretation stipulates that the speaker wants the
addressee to know that he (the speaker) has understood the question. A speaker who answers a question does not so much want to tell the addressee that his question was understood; that is just a side-effect of giving an answer, one that no speaker can avoid.
Similarly for reacting to an offer, a request, a suggestion,
etc.
6. Guidelines for the annotation of feedback functions.
Negative feedback, where the speaker wants to indicate that there was a problem in processing a dialogue utterance, is always explicit and as such mostly easy to annotate.
6.1 Implicit and explicit positive feedback.
Positive feedback is sometimes given explicitly, and very often
implicitly.
Examples of explicit positive auto-feedback are the following
utterances by B, where he repeats part of the question by A:
A: What time does the KLM flight from Jakarta on Friday, October
13 arrive?
B: The KLM flight from Jakarta on Friday, October 13 has
scheduled arrival time 08.50.
B: The flight from Jakarta on Friday has scheduled arrival time
08.50.
B: The KLM flight from Jakarta on October 13 has scheduled
arrival time 08.50.
B: The flight from Jakarta on October 13 has scheduled arrival time 08.50.
In such cases, the utterance by B should be annotated as having,
besides the general-purpose function WH-ANSWER in the Task/Domain
dimension, also a function in the Auto-Feedback dimension (see
below).
By contrast, the short answer “At 08.50” would carry only implicit feedback information, and should therefore, following Guideline 5, not be coded in the Auto-Feedback dimension.
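To make the contrast concrete, the sketch below shows, in the same assumed pseudo-XML notation as before, how the full repeat and the short answer would differ; the attribute names task and autoFeedback are illustrative stand-ins for the Task/Domain and Auto-Feedback dimensions, not the normative format.
  <!-- full repeat: WH-ANSWER plus explicit positive auto-feedback -->
  <utterance id="u2" speaker="B" task="WH-ANSWER" autoFeedback="POSITIVE">
    The KLM flight from Jakarta on Friday, October 13 has scheduled arrival time 08.50.
  </utterance>
  <!-- short answer: feedback is only implicit, so no Auto-Feedback tag (Guideline 5) -->
  <utterance id="u3" speaker="B" task="WH-ANSWER">
    At 08.50.
  </utterance>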
6.2 Levels of feedback.
The DIT++ taxonomy distinguishes 5 levels of feedback:
1. participant A pays attention to participant B’s
utterance.
2. A perceives B’s utterance, i.e. A recognizes the words and
nonverbal elements in B’s contribution.
3. A understands B’s utterance, i.e. A assigns an interpretation
to B’s utterance, including what A believes B is trying to achieve
with this utterance (what are his goals and associated beliefs
about the task/domain and about A).
4. A evaluates B’s utterance, i.e. A decides whether the beliefs about B that characterize his understanding of B’s utterance can be added to A’s model of the dialogue context, updating his context model without arriving at inconsistencies.
5. A ’executes’ B’s utterance, i.e. A performs actions which are
appropriate for achieving a goal that he had identified and added
to his context model. (For instance, executing a request
is to perform the requested action; executing an answer is to
add the content of the answer to one’s information; executing a
question is to look for the information that was asked for.)
There are certain relations between these levels: in order to execute a dialogue act one must have evaluated it positively (“accepted” it); this is only possible if one has (or believes to have) understood the corresponding utterance; which presupposes that one perceived the utterance in the first place, which, finally, requires paying attention to what is said. So, for instance, positive auto-feedback about the acceptance of the addressee’s previous utterance implies positive feedback at the “lower” levels of understanding, perception, and attention.
For positive feedback functions a higher-level function is more
specific than the lower-level functions. (Remember that a function
is more specific if it implies other functions.)
For negative feedback the reverse holds: when a speaker signals the impossibility of perceiving an utterance, he implies the impossibility of interpreting, evaluating and executing it. So negative feedback at a lower level implies negative feedback at higher levels.
Since, following Guideline 3, you should always be as specific
as possible, you should observe the following guideline for
annotating feedback functions:
Guideline 6: When assigning a feedback function, in the case of positive feedback choose the highest level of feedback that you feel to be appropriate, and in the case of negative feedback choose the lowest level.
While this guideline instructs you to be as specific as
possible, sometimes you’ll be in serious doubt. You may for
instance find yourself in a situation where you have no clue
whether a feedback signal (such as OK) should be interpreted at the
level of interpretation or that of evaluation. In such a case you
should use the less specific of the two, since the more specific level would mean that you “read” more into this utterance than you can justify.
In practice, it is often difficult to decide the level of
feedback that should be chosen. One of the reasons for this is that
the same verbal and nonverbal expressions may be used at most of
the levels (with a tendency to signal feedback (positively or
negatively) with more emphasis as higher levels of processing are
involved). It may happen that you encounter a feedback signal and
you have no clue at all at which level you should interpret that
signal. In this situation the annotation scheme allows you to use
the labels POSITIVE and NEGATIVE, which leave the level of feedback
unspecified.
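For instance, under the same assumed notation as in the earlier sketches, a bare “OK” whose feedback level cannot be determined could be tagged with the level left open:
  <!-- level of positive auto-feedback deliberately left unspecified -->
  <utterance id="u7" speaker="A" autoFeedback="POSITIVE">
    OK
  </utterance>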
7. Guidelines for the annotation of Interaction Management
functions
7.1 Turn Management.
General guideline:
“Code Turn Management functions only when these are not just implied.”
In a spoken dialogue, the participants take turns to speak.
(Their nonverbal behaviour is not organised in turns; both
participants use facial expressions and gestures more or less all
the time.) A turn, that is a stretch of speech by one of the
participants, in general consists of smaller parts that have a
meaning as a dialogue act; these parts we call “utterances”. Turn Management acts are the actions that participants perform in order to manage this aspect of the interaction. These acts are subdivided into acts for taking the turn (utterance-initial acts) and those for keeping the turn or giving it away (utterance-final acts).
Usually only the first utterance in a turn has an utterance-initial
function and only the last an utterance-final one. The non-final
utterances in a turn do not have an utterance-final function,
except when the speaker signals (typically in the form of a rising
intonation at the end of the utterance) that the utterance is not
going to be the last one in the turn, i.e. that he wants to continue. In that case the utterance has a TURN KEEP function. Except for the
first one, the utterances in the turn do not have an
utterance-initial function; the speaker does not have to perform a
separate act in order to continue; all he has to do is to continue
speaking.
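A minimal sketch, again with the assumed notation used above (the turnManagement attribute, the utterance ids and the dialogue content are illustrative; task functions are omitted for brevity): a non-final utterance with rising intonation carries TURN KEEP, while a mid-turn utterance without such a signal carries no Turn Management function at all.
  <!-- non-final utterance, rising intonation: speaker signals he wants to keep the turn -->
  <utterance id="u4" speaker="B" turnManagement="TURN KEEP">
    there are two flights on Friday ...
  </utterance>
  <!-- another non-initial, non-final utterance in the same turn: no Turn Management tag -->
  <utterance id="u5" speaker="B">
    one in the morning and one in the evening
  </utterance>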
When a speaker accepts a turn that the addressee has assigned to
him through a TURN ASSIGN act, the utterance should be annotated as
having the utterance-initial function TURN
ACCEPT only when the speaker performs