Designing an Elicitation Corpus with Semantic Representations
Simon Fung
Advisor: Lori Levin
November 2006
Corpus example
Was there an apple?
Wasn't there an apple?
Will there be an apple?
Won't there be an apple?
There was an apple.
There was not an apple.
There will be an apple.
There will not be an apple.
...
Chinese translation (same sentences, in order):
那裡曾經有一個蘋果嗎？
那裡不是曾經有一個蘋果嗎？
那裡會有一個蘋果嗎？
那裡不是會有一個蘋果嗎？
那裡曾經有一個蘋果。
那裡曾經沒有一個蘋果。
那裡會有一個蘋果。
那裡不會有一個蘋果。
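The eight sentences above vary three binary distinctions at once: question vs. statement, positive vs. negative, and past vs. future. A minimal sketch of how such a block can be checked for complete coverage follows. The feature labels are hypothetical shorthand, not the corpus's own value names (the feature structures use names such as c-polarity and c-v-absolute-tense), and the corpus sentences themselves are hand-written, not generated.

```python
from itertools import product

# Hypothetical shorthand labels: (clause type, polarity, tense) -> sentence.
APPLE_SENTENCES = {
    ("question", "positive", "past"): "Was there an apple?",
    ("question", "negative", "past"): "Wasn't there an apple?",
    ("question", "positive", "future"): "Will there be an apple?",
    ("question", "negative", "future"): "Won't there be an apple?",
    ("statement", "positive", "past"): "There was an apple.",
    ("statement", "negative", "past"): "There was not an apple.",
    ("statement", "positive", "future"): "There will be an apple.",
    ("statement", "negative", "future"): "There will not be an apple.",
}

CLAUSE_TYPES = ("question", "statement")
POLARITIES = ("positive", "negative")
TENSES = ("past", "future")


def missing_cells(sentences):
    """Return every feature combination that has no sentence yet."""
    return [combo for combo in product(CLAUSE_TYPES, POLARITIES, TENSES)
            if combo not in sentences]


print(len(APPLE_SENTENCES), missing_cells(APPLE_SENTENCES))  # 8 []
```

Keying sentences by their feature values makes a gap in a block of minimal pairs show up as a missing dictionary key.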
Uses for a parallel corpus
• training data for statistical MT
• learning about the grammar of a new language
Motivation
how do languages form various constructions (e.g. relative clauses)?
1. The student whom I saw
2. 我見過的學生。 (Chinese for "the student whom I saw"; the relative clause 我見過的 comes before the noun 學生)
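what semantic distinctions are important in different languages?
English | Mandarin | French
He is talking. | Tā zài jiǎng huà. | Il parle.
They are talking. | Tāmen zài jiǎng huà. | Ils parlent.
He talks. {habitually} | Tā jiǎng huà. | Il parle.
Mandarin marks the progressive (zài) but not number on the verb; French marks number on the verb (parle/parlent) but uses the same form for progressive and habitual.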
The MILE (MInor Language Elicitation) Corpus
• sentences covering various semantic categories and constructions, e.g. number, gender, relative clauses
• to be translated into the language under study
• a semantic representation for each sentence
• 10,000-20,000 words
• translations done by one person
• 7 languages per year for the next 5 years
• e.g. Thai, Bengali, Punjabi: these may have many speakers but relatively few electronic resources
Constraints
maximize range of semantic categories and constructions
minimize corpus size
• different languages are complex in different areas
• only one corpus for this project
• ultimate goal: navigate dynamically through the features, e.g. no singular/plural distinction → no need to elicit the dual
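The dynamic-navigation goal can be pictured as skipping items whose prerequisite distinction is absent from the language under study. This is a hypothetical sketch: `Item`, `PREREQUISITES`, `next_items`, and the feature labels are illustrative names, not part of the MILE corpus, and only the single rule from the slide (no singular/plural distinction → no dual) is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One elicitation item: an English sentence plus the feature value it targets."""
    sentence: str
    feature: str  # hypothetical label, e.g. "number"
    value: str    # hypothetical label, e.g. "dual"


# Hypothetical rule from the slide: eliciting a dual only makes sense
# if the language distinguishes plural from singular at all.
PREREQUISITES = {("number", "dual"): ("number", "plural")}


def next_items(items, observed):
    """Yield the items that are still worth eliciting.

    `observed` maps (feature, value) -> bool, recording whether the
    translations so far show that distinction. An unknown prerequisite
    is treated as present, so nothing is skipped without evidence.
    """
    for item in items:
        prereq = PREREQUISITES.get((item.feature, item.value))
        if prereq is None or observed.get(prereq, True):
            yield item


items = [
    Item("The man left.", "number", "singular"),
    Item("The men left.", "number", "plural"),
    Item("The two men left.", "number", "dual"),
]
no_plural = {("number", "plural"): False}
print([i.sentence for i in next_items(items, no_plural)])
# ['The man left.', 'The men left.']
```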
Method
1. create semantic representations first (instead of starting with English)
2. write English sentences based on them
3. translate sentences into various languages
Example: feature structure
srcsent: Who will break windows?
context: "Who" refers to two men; spoken to a co-worker
((actor ((np-function fn-actor)
         (np-general-type interrogative-type)
         (np-person person-third)
         (np-number num-dual)
         (np-biological-gender bio-gender-male)
         (np-animacy anim-human)
         (np-pronoun-antecedent antecedent-n/a)
         (np-specificity specificity-neutral)
         (np-identifiability identifiability-neutral)
         (np-distance distance-neutral)
         (np-pronoun-exclusivity inclusivity-n/a)))
 (undergoer ((np-person person-third)
             (np-identifiability unidentifiable)
             (np-number num-pl)
             (np-specificity non-specific)
             (np-animacy anim-inanimate)
             (np-biological-gender bio-gender-n/a)
             (np-function fn-undergoer)
             (np-general-type common-noun-type)
             (np-pronoun-exclusivity inclusivity-n/a)
             (np-pronoun-antecedent antecedent-n/a)
             (np-distance distance-neutral)))
 (c-polarity polarity-positive)
 (c-v-absolute-tense future)
 (c-general-type open-question)
 (c-question-gap gap-actor)
 (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a)
 (c-relative-tense relative-n/a)
 (c-our-boundary boundary-n/a)
 (c-comparator-function comparator-n/a)
 (c-causee-control control-n/a)
 (c-our-situations situations-n/a)
 (c-comparand-type comparand-n/a)
 (c-causation-directness directness-n/a)
 (c-source source-neutral)
 (c-causee-volitionality volition-n/a)
 (c-assertiveness assertiveness-neutral)
 (c-solidarity solidarity-neutral)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-adjunct-clause-type adjunct-clause-type-n/a)
 (c-v-phase-aspect phase-aspect-neutral)
 (c-v-lexical-aspect activity-accomplishment)
 (c-secondary-type secondary-neutral)
 (c-event-modality event-modality-none)
 (c-function fn-main-clause)
 (c-minor-type minor-n/a)
 (c-copula-type copula-n/a)
 (c-power-relationship power-peer)
 (c-our-shared-subject shared-subject-n/a))
Abbreviated view of the same structure (main features only):
((ACTOR ((NP-FUNCTION FN-ACTOR)
         (NP-GENERAL-TYPE INTERROGATIVE-TYPE)
         (NP-PERSON PERSON-THIRD)
         (NP-NUMBER NUM-DUAL)
         (NP-BIOLOGICAL-GENDER BIO-GENDER-MALE)))
 (UNDERGOER ((NP-PERSON PERSON-THIRD)
             (NP-IDENTIFIABILITY UNIDENTIFIABLE)
             (NP-NUMBER NUM-PL)
             (NP-SPECIFICITY NON-SPECIFIC)))
 (C-POLARITY POLARITY-POSITIVE)
 (C-V-ABSOLUTE-TENSE FUTURE))
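Each innermost pair has the form (feature-name value): in (NP-NUMBER NUM-DUAL), NP-NUMBER is the feature name and NUM-DUAL is its value.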
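The slides do not describe any tooling for reading these structures. As a hypothetical sketch, a short s-expression reader can turn the abbreviated structure into nested dictionaries so that values can be looked up by feature name; `parse`, `to_dict`, and the lower-casing convention are illustrative choices, not the project's code.

```python
import re

# A token is "(", ")", or any run of characters that is neither whitespace nor a parenthesis.
TOKEN = re.compile(r"[()]|[^\s()]+")


def parse(text):
    """Parse one s-expression into nested lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # skip the closing ")"
        return items

    expr = read()
    if pos != len(tokens):
        raise ValueError("unexpected text after the feature structure")
    return expr


def to_dict(pairs):
    """Turn a list of (name value) pairs into a dict, recursing into nested
    structures such as ACTOR. Names and values are lower-cased so that the
    upper- and lower-case notations compare equal."""
    result = {}
    for name, value in pairs:
        result[name.lower()] = to_dict(value) if isinstance(value, list) else value.lower()
    return result


FS = """((ACTOR ((NP-FUNCTION FN-ACTOR)(NP-GENERAL-TYPE INTERROGATIVE-TYPE)
  (NP-PERSON PERSON-THIRD)(NP-NUMBER NUM-DUAL)(NP-BIOLOGICAL-GENDER BIO-GENDER-MALE)))
 (UNDERGOER ((NP-PERSON PERSON-THIRD)(NP-IDENTIFIABILITY UNIDENTIFIABLE)
  (NP-NUMBER NUM-PL)(NP-SPECIFICITY NON-SPECIFIC)))
 (C-POLARITY POLARITY-POSITIVE)(C-V-ABSOLUTE-TENSE FUTURE))"""

fs = to_dict(parse(FS))
print(fs["actor"]["np-number"])  # num-dual
print(fs["c-v-absolute-tense"])  # future
```

The same reader accepts the full lower-case structure, since values such as antecedent-n/a contain no whitespace or parentheses.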
Using semantic representations
Advantages:
• more precise
• more complete
• encode the actual linguistic features to elicit
1. Naturalness
naturalness of sentences vs. holding lexical items constant
• minimal pairs are ideal (A tree fell / The tree fell)
• but we also want natural sentences
• natural → easier to translate → fewer mistakes
• changing one feature can turn a natural sentence into an unnatural one: She hurt herself. / *It hurt itself.
• sentences are hand-written, as opposed to generated with a natural language generator (GenKit)
2. Restrictions
• need to find restrictions on combinations of features
• some combinations invalid/unnatural
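• e.g. inclusive and third person (the inclusive/exclusive distinction applies only to first-person non-singular pronouns, i.e. "we" including or excluding the hearer)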
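A restriction can be expressed as a predicate that filters the cross product of feature values before any sentences are written. A minimal sketch under stated assumptions: the value labels and `is_valid` are hypothetical, and only the person/exclusivity restriction above is encoded.

```python
from itertools import product

PERSONS = ("first", "second", "third")
EXCLUSIVITY = ("inclusive", "exclusive", "n/a")  # hypothetical value labels


def is_valid(person, exclusivity):
    """Hypothetical restriction: inclusive/exclusive only combines with first person."""
    if exclusivity in ("inclusive", "exclusive"):
        return person == "first"
    return True


valid = [combo for combo in product(PERSONS, EXCLUSIVITY) if is_valid(*combo)]
print(valid)
# [('first', 'inclusive'), ('first', 'exclusive'), ('first', 'n/a'),
#  ('second', 'n/a'), ('third', 'n/a')]
```

Of the nine raw combinations, four are ruled out, so no sentences need to be written for them.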
3. Definition of values
• use language-independent semantic categories
• make definitions precise, e.g. specificity works better than definiteness
• reach agreement on definitions
  • intercoder agreement (informal experiment)
  • writers agreed on which English forms to use
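The slides do not say how intercoder agreement was measured in the informal experiment. Cohen's kappa, a standard statistic for two coders that corrects raw agreement for agreement expected by chance, is swapped in here purely for illustration, and the labels in the usage example are made up.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    # Proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected if each coder labelled at random with their own label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical label for every item
    return (observed - expected) / (1 - expected)


# Made-up np-specificity judgements for four sentences.
a = ["specific", "non-specific", "specific", "neutral"]
b = ["specific", "specific", "specific", "neutral"]
print(round(cohens_kappa(a, b), 3))  # 0.556
```

Raw agreement here is 3/4, but kappa drops to 5/9 because both coders favour "specific", which inflates chance agreement.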
Avoiding language-specificity
many-to-many translations of determiners:
English | French
I have a cat. | J’ai un chat.
The cat is fat. | Le chat est gros.
I like chocolate. | J’aime le chocolat.
I eat chocolates. | Je mange des chocolats.
Communism failed. | Le communisme a échoué.
He has (some) money. | Il a de l’argent.
I am a teacher. | Je suis professeur.
England | L’Angleterre
I don’t have a/any cat(s). | Je n’ai pas de chat.
Determiners have to be broken down by function:
• indefinite quantity (some water)
• generic (the moose is a noble animal)
• predicate nominal (I am a doctor)
• definite noun phrase (the dog is sick)
• etc.
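The many-to-many problem becomes concrete if each row of the English/French determiner table is annotated with its determiners and the function of the noun phrase, then indexed in both directions. This is an illustrative sketch: the function labels "proper name" and "negated indefinite" are assumptions standing in for the "etc.", the assignment of "Communism failed." to generic is a judgement call, and ∅ marks a bare noun.

```python
from collections import defaultdict

# (function, English, English determiner, French, French determiner)
# Sentences come from the table; function labels are illustrative.
ROWS = [
    ("indefinite", "I have a cat.", "a", "J'ai un chat.", "un"),
    ("definite", "The cat is fat.", "the", "Le chat est gros.", "le"),
    ("generic", "I like chocolate.", "∅", "J'aime le chocolat.", "le"),
    ("indefinite quantity", "I eat chocolates.", "∅", "Je mange des chocolats.", "des"),
    ("generic", "Communism failed.", "∅", "Le communisme a échoué.", "le"),
    ("indefinite quantity", "He has (some) money.", "∅", "Il a de l'argent.", "de l'"),
    ("predicate nominal", "I am a teacher.", "a", "Je suis professeur.", "∅"),
    ("proper name", "England", "∅", "L'Angleterre", "l'"),
    ("negated indefinite", "I don't have a cat.", "a", "Je n'ai pas de chat.", "de"),
]


def index(rows, key_col, value_col):
    """Map each value in column key_col to the set of values seen in column value_col."""
    table = defaultdict(set)
    for row in rows:
        table[row[key_col]].add(row[value_col])
    return dict(table)


en_to_fr = index(ROWS, 2, 4)  # English determiner -> French determiners
fn_to_fr = index(ROWS, 0, 4)  # NP function -> French determiners
print(sorted(en_to_fr["a"]))   # ['de', 'un', '∅']
print(fn_to_fr["generic"])     # {'le'}
```

Keyed by English determiner, "a" scatters over three French forms; keyed by function, the choices narrow (generic → le), although form still varies with number and gender (des vs. de l').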
Definiteness
an example of a problem in designing features and values:
how do we define definiteness without relying on English definiteness categories?
Criteria for definiteness
Lyons (1999):
• uniqueness
• familiarity
• identifiability
• specificity
• inclusiveness
chose the most important criteria:
• identifiability
• specificity
Definiteness
You and I are in a room. I say
“The chair is on fire!”
Why did I say “the chair”?
• identifiability: I know that you know which chair I’m talking about
• specificity: I’m referring to a particular chair
Grammatical feature: specificity
John wants to marry a Norwegian. (ambiguous in English: a particular Norwegian, or any Norwegian)
Feature: np-specificity
Values:
• specific: John wants to marry a (specific) Norwegian.
• non-specific: John wants to marry some Norwegian.
• specificity-neutral: She is a Norwegian.
Turkish direct objects: accusative case marks the object as specific.
Ali bir kitap okudu.
Ali one book read
“Ali read a book.”
Ali bir kitab-ı okudu.
Ali one book-ACC read
“Ali read a (specific) book.”
Layout of Corpus
1. Clause types, negation, and formality
2. Discourse setting/Speaker-hearer features
3. Basic NP features
4. Verbal Tense and Aspect
5. Evidentiality and Modality
6. Causatives
7. Comparatives
8. Modifiers
9. Conjunctions
10. Clause-combining
combine feature values systematically
why combine?
• some features interact, e.g. Will the woman be happy? (interrogative, future tense)
what to combine?
• features known to interact, e.g. person and number (I am, we are, he is)
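Crossing only the features known to interact keeps the corpus small, which matters given the size constraint. A minimal sketch, assuming a hypothetical inventory built from the slide's two examples (person × number; interrogative × future tense); the value lists and grouping are illustrative, not the corpus's actual design.

```python
from itertools import product
from math import prod

# Hypothetical feature inventory; value lists are illustrative only.
FEATURES = {
    "person": ["first", "second", "third"],
    "number": ["singular", "dual", "plural"],
    "clause type": ["statement", "question"],
    "tense": ["past", "future"],
}

# Features assumed to interact are crossed within a group;
# different groups are elicited independently of one another.
GROUPS = [["person", "number"], ["clause type", "tense"]]


def combinations(group):
    """All value combinations for one group of interacting features."""
    return [dict(zip(group, values))
            for values in product(*(FEATURES[f] for f in group))]


full_cross = prod(len(values) for values in FEATURES.values())  # 3*3*2*2
grouped = sum(len(combinations(g)) for g in GROUPS)             # 9 + 4
print(full_cross, grouped)  # 36 13
```

The saving grows quickly with more features, since the full cross product multiplies every feature's value count while grouping only adds up the groups.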
Status
• delivered 21,133 words (sampled version)
• translated into Thai and Bengali
• Spanish → Guarani