Thematic Roles and Semantic Space: Insights from Distributional Semantic Models
Gabriella Lapesa¹ & Stefan Evert²
¹ Institute of Cognitive Science, University of Osnabrück
² Corpus Linguistics Group, FAU Erlangen-Nürnberg
Quantitative Investigations in Theoretical Linguistics, 12–14 September 2013
Distributional Semantics · Data · Models · Evaluation · Conclusion
Outline
1 Distributional Semantics
2 Data: Framework, Datasets, Motivation
3 Models: General Features, Parameters
4 Evaluation: Step 1: Range and Mean Performance; Step 2: Evaluation of DSM Parameters; Step 3: Thematic Roles and DSM Performance
5 Conclusion
Gabriella Lapesa & Stefan Evert Thematic Roles and Semantic Space 2/43
Distributional Semantic Models
Distributional semantic models (DSMs) implement the Distributional Hypothesis (Harris 1954): difference in meaning → difference in distribution
The distributional meaning of a word is usually operationalized in terms of its co-occurrence patterns with other words
        get   see   use  hear   eat  kill
knife    51    20    84     0     3     0
cat      52    58     4     4     6    26
dog     115    83    10    42    33    17
pig      12    17     3     2     9    27
Distance between word vectors ⇐⇒ semantic similarity
an empirical correlate of the amount of shared meaning
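The table above can be made concrete in a few lines of code. The sketch below recomputes cosine similarities for the toy counts on this slide; it is purely illustrative, not the pipeline used in the talk.

```python
import numpy as np

# Co-occurrence counts from the toy table above
# (rows: target words; columns: contexts get, see, use, hear, eat, kill).
words = ["knife", "cat", "dog", "pig"]
M = np.array([
    [ 51, 20, 84,  0,  3,  0],  # knife
    [ 52, 58,  4,  4,  6, 26],  # cat
    [115, 83, 10, 42, 33, 17],  # dog
    [ 12, 17,  3,  2,  9, 27],  # pig
], dtype=float)

def cosine(u, v):
    """Cosine similarity: 1 = identical direction, 0 = orthogonal."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Pairwise similarities for all word pairs
sim = {(a, b): cosine(M[i], M[j])
       for i, a in enumerate(words)
       for j, b in enumerate(words) if i < j}

closest = max(sim, key=sim.get)   # the most similar pair
```

On this table the most similar pair is (cat, dog), reflecting their shared contexts (see, eat, kill), while knife and pig are furthest apart.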
Two important (and open) research questions
1 Lots of tasks, lots of parameters: how do different parameters affect DSM performance in a particular task?
2 Are distributional meaning representations comparable to those of human speakers?
Framework · Datasets · Motivation
The Generalized Event Knowledge Framework
"Speakers use their knowledge of common events to understand language, and they do so as quickly as possible" (McRae and Matsuki 2009)
Why model these experiments?
To contribute to a wider debate concerning the way human semantic representations are built and handled, through the integration of experiential and language-based distributional data
A practical reason: a (quite) large amount of data is available, from different experimental paradigms and with different types of information for each experimental item (norming, reaction times, etc.)
Why do we expect DSMs to be successful? Distributional similarity as relatedness
Verbs and prototypical fillers co-occur; therefore they tend to occur in the same contexts
The shared meaning relevant to these experiments is understood in terms of a shared topic (the event), rather than in terms of synonymy
General Features · Parameters
Overview of the models
Term-term distributional semantic models (bag-of-words): no syntax, no word order
Target terms (rows): vocabulary from Baroni and Lenci (2010) plus GEK tasks; 27,688 lemmas / 31,713 tagged lemmas
Feature terms (columns): filtered by part of speech (nouns, verbs, adjectives, adverbs) and by frequency thresholds (same for all corpora)
Distributional models were compiled and evaluated using the IMS Corpus Workbench, the UCS toolkit and the wordspace package for R.
Part-of-speech information reduces ambiguity (light/A vs. light/N) but results in sparser representations.
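The bag-of-words counting step described above can be sketched in a few lines. This is a toy version only: the token list, target and feature sets are invented for illustration, and the real models are built with the corpus tools named on the slide.

```python
from collections import Counter

def cooc_matrix(tokens, targets, features, window=5):
    """Count bag-of-words co-occurrences of target terms with feature
    terms within a symmetric window of +/- `window` tokens
    (no syntax, no word order)."""
    counts = Counter()
    for i, w in enumerate(tokens):
        if w not in targets:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in features:
                counts[(w, tokens[j])] += 1
    return counts

# Hypothetical mini-corpus (lemmatized):
tokens = "the cat chase the mouse and the dog chase the cat".split()
counts = cooc_matrix(tokens, targets={"cat", "dog"},
                     features={"chase", "mouse"}, window=2)
```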
DSM parameters: Association score for feature weighting
co-occurrence frequency (freq)
Dice coefficient (Dice)
Mutual Information (MI)
simple log-likelihood (s-ll)
t-score (t-sc)
z-score (z-sc)
Q: Are association measures cognitively plausible? E.g., do they allow for incremental updates?
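The measures above can all be computed from a word pair's contingency counts. The sketch below uses the standard formulas based on observed (O) and expected (E) co-occurrence frequency; the function name and the toy counts are mine, and the toolkit's exact implementations may differ in detail.

```python
import math

def association_scores(O, f1, f2, N):
    """Association measures from observed co-occurrence frequency O,
    marginal frequencies f1 and f2, and sample size N.
    Illustrative sketch using common textbook formulas."""
    E = f1 * f2 / N                  # expected frequency under independence
    return {
        "freq": O,
        "Dice": 2 * O / (f1 + f2),
        "MI":   math.log2(O / E),
        "s-ll": 2 * (O * math.log(O / E) - (O - E)),
        "t-sc": (O - E) / math.sqrt(O),
        "z-sc": (O - E) / math.sqrt(E),
    }

# Hypothetical counts for one (target, feature) pair:
scores = association_scores(O=30, f1=100, f2=200, N=100_000)
```

Note that every value here depends only on running counts (O, f1, f2, N), so in principle such measures do allow incremental updates, which bears on the cognitive-plausibility question above.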
DSM parameters: Transformation function
no transformation
logarithmic
square root
sigmoid
Transformations reduce the Zipfian skew of co-occurrence frequencies.
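As a sketch, the four transformation values might look as follows. The signed variants and the choice of tanh for the sigmoid are my assumptions; the slide does not spell out the exact functions used.

```python
import numpy as np

def transform(x, method="log"):
    """Score transformations that compress the Zipfian skew of
    co-occurrence scores. Signed variants keep negative association
    scores meaningful; tanh is one common sigmoid choice."""
    x = np.asarray(x, dtype=float)
    if method == "none":
        return x
    if method == "log":
        return np.sign(x) * np.log1p(np.abs(x))   # signed log(1 + |x|)
    if method == "root":
        return np.sign(x) * np.sqrt(np.abs(x))    # signed square root
    if method == "sigmoid":
        return np.tanh(x)                         # squashes into (-1, 1)
    raise ValueError(method)

scores = np.array([0.0, 1.0, 100.0, 10000.0])    # strongly skewed toy scores
```

A score of 10,000 shrinks to under 10 with the log transform and saturates near 1 with the sigmoid, so large raw frequencies no longer dominate the vector.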
DSM parameters: Dimensionality reduction
no dimensionality reduction
Random Indexing to 1000 dimensions (ri)
(randomized) Singular Value Decomposition to 300 dimensions (rsvd)
Dimensionality reduction is expected to improve the semantic representation (SVD) and/or make computations more efficient (SVD, RI), but some researchers also report detrimental effects (e.g. for composition by pointwise multiplication).
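A minimal sketch of the SVD variant, using plain truncated SVD in numpy on a random toy matrix; the talk uses a randomized algorithm, which approximates the same projection far more efficiently on large matrices.

```python
import numpy as np

def svd_reduce(M, k=300):
    """Truncated SVD: project the row vectors of a (weighted) DSM
    matrix onto the first k latent dimensions."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * S[:k]   # reduced rows, shape (n_rows, k)

# Toy stand-in for a weighted co-occurrence matrix:
rng = np.random.default_rng(42)
M = rng.poisson(2.0, size=(50, 200)).astype(float)
R = svd_reduce(M, k=10)
```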
DSM parameters: Distance measure
cosine similarity → angular distance
Euclidean distance
Manhattan distance
Problem: all these distance measures are symmetric, whereas cognitive processes (such as priming; Hare et al. 2009) are often asymmetric
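The three measures, and the symmetry that causes the problem, in a short sketch (vectors borrowed from the earlier toy table):

```python
import numpy as np

def angular(u, v):
    """Angle between vectors in radians, derived from cosine similarity."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def euclidean(u, v):
    return np.linalg.norm(u - v)

def manhattan(u, v):
    return np.abs(u - v).sum()

u = np.array([51., 20., 84., 0., 3., 0.])   # "knife" from the toy table
v = np.array([52., 58., 4., 4., 6., 26.])   # "cat"

# All three measures are symmetric: d(u, v) == d(v, u).
# This is exactly the mismatch with asymmetric priming effects.
for d in (angular, euclidean, manhattan):
    assert d(u, v) == d(v, u)
```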
DSM parameters: Relatedness index
distance between prime and target (dist)
rank of prime among nearest neighbors of target (back rank)
rank of target among nearest neighbors of prime (forw rank)
average rank = mean of back rank and forw rank (rank avg)
Michelbacher et al. (2011) use rank-based measures to predict asymmetric syntagmatic association. Hare et al. (2009) apply them to their noun-noun priming data.
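A toy sketch of the rank indices. The 2-D vocabulary and coordinates are invented purely to show that forward and backward rank can differ even though the underlying distance is symmetric (Euclidean here, for simplicity):

```python
import numpy as np

def neighbour_rank(space, words, query, other):
    """Rank of `other` among the nearest neighbours of `query`
    (1 = nearest). Unlike the distance itself, this is asymmetric:
    forw_rank and back_rank need not agree."""
    q = words.index(query)
    dists = np.linalg.norm(space - space[q], axis=1)
    order = [w for _, w in sorted(zip(dists, words)) if w != query]
    return order.index(other) + 1

# Hypothetical 2-D semantic space:
words = ["prime", "target", "a", "b"]
space = np.array([[0., 0.], [3., 0.], [1., 0.], [3.5, 0.]])

forw = neighbour_rank(space, words, "prime", "target")  # forw_rank
back = neighbour_rank(space, words, "target", "prime")  # back_rank
avg = (forw + back) / 2                                 # avg_rank
```

Here the target is the prime's 2nd-nearest neighbour, but the prime is only the target's 3rd-nearest, because other words crowd the target's neighbourhood.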
Step 1: Range and Mean Performance · Step 2: Evaluation of DSM Parameters · Step 3: Thematic Roles and DSM Performance
Step 2: Verb-Noun, Patient
Best Parameter Values: Corpus and Window
[Boxplots: accuracy (%) by corpus (bnc, wp500, wacky, ukwac, joint) and by window size (2, 5, 15); y-axis 78–92]
Higher accuracy for models trained on bigger corpora and with medium context windows
Step 2: Verb-Noun, Patient
Best Parameter Values: Part of Speech and Distance
[Boxplots: accuracy (%) by part of speech (no_pos, pos_t, pos_t+f) and by distance measure (cosine, euclidean, manhattan); y-axis 78–92]
Models with no part-of-speech information, or with pos information only on the target, perform better (a trade-off between the disambiguating effect and sparseness?). Cosine and Euclidean distance are the best values for the distance measure parameter.
Step 2: Verb-Noun, Patient
Best Parameter Values: Score and Transformation
[Boxplots: accuracy (%) by score (freq, Dice, MI, s_ll, z_sc, t_sc) and by transformation (none, log, root, sigmoid); y-axis 78–92]
Best performance for association measures vs. raw frequency; worse performance for the sigmoid vs. the other values of the transformation parameter
Step 2: Verb-Noun, Patient
Best Parameter Values: Interaction of Score and Transformation
[Interaction plot: accuracy (%) by score (freq, Dice, MI, s-ll, z-sc, t-sc), one line per transformation (none, log, root, sigmoid); y-axis 78–92]
Step 2: Verb-Noun, Patient
Best Parameter Values: Relatedness Index and Dimensionality Reduction
[Boxplots: accuracy (%) by dimensionality reduction (none, ri, rsvd) and by relatedness index (dist, back_rank, forw_rank, avg_rank); y-axis 78–92]
Non-reduced models perform significantly better than the reduced ones. Forward rank is the best relatedness index.
Step 2: How about other relations?
The same analysis on all relations shows no significant difference in terms of either explained variance or best parameter values
Most explanatory parameters: distance and dimensionality reduction (followed by relatedness index and by corpus, with more fluctuation); the score:transformation interaction is always very explanatory
Best parameter values:
bigger corpora
medium-to-big context windows
no part of speech, or part of speech only on the target
association measures better than raw frequency
better accuracy without vector transformation (or with log and root)
negative effect of dimensionality reduction
cosine as the best distance measure
forward rank as the best relatedness index
Step 2: Window
Verb-Noun, Agent
[Boxplot: accuracy (%) by window size (2, 5, 15); y-axis 74–88]
Step 2: Window
Verb-Noun, Instrument
[Boxplot: accuracy (%) by window size (2, 5, 15); y-axis 74–88]
Step 2: Window
Verb-Noun, Location
[Boxplot: accuracy (%) by window size (2, 5, 15); y-axis 70–84]
Step 2: Window
Noun-Verb, Agent
[Boxplot: accuracy (%) by window size (2, 5, 15); y-axis 74–88]
Step 2: Window
Noun-Verb, Patient
[Boxplot: accuracy (%) by window size (2, 5, 15); y-axis 80–94]
Step 2: Window
Noun-Verb, Instrument
[Boxplot: accuracy (%) by window size (2, 5, 15); y-axis 74–88]
Step 2: Window
Noun-Verb, Location
[Boxplot: accuracy (%) by window size (2, 5, 15); y-axis 74–88]
Step 3: Thematic Roles and DSM Performance
Model Parameters and Interactions (R² = 0.72)
Summary
DSMs that make no use of syntax show good performance in a task related to selectional preferences
The representation responsible for the effects is stable across relations
The distribution of DSM performance across thematic relations shows patterns compatible with some general assumptions in theoretical linguistics
Some relations are more salient than others in the semantic space, and more subject to typicality effects (prototypical fillers are closer than non-prototypical ones)
Work in progress
We are currently evaluating syntax-based models (dependency-filtered/structured, prototype-based)
Test additional parameters and parameter values
Include standard tasks in the evaluation (TOEFL, . . . )
Evaluate other types of DSMs (term-context)
Item-based prediction of RTs, based on different types of corpus-based information (first order, DSMs)
Context-dependent priming for agent-verb-patient triples (Bicknell et al. 2008) and verb-instrument-patient triples (McRae and Matsuki 2009)
Any other ideas?
References I
Baroni, Marco and Lenci, Alessandro (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4), 1–49.
Bicknell, Klinton; Elman, Jeffrey L.; Hare, Mary; McRae, Ken; Kutas, Marta (2008). Online expectations for verbal arguments conditional on event knowledge. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, Volume 1, pages 2220–2225.
Erk, Katrin; Padó, Sebastian; Padó, Ulrike (2010). A flexible, corpus-driven model of regular and inverse selectional preferences. Computational Linguistics, 36(4), 723–763.
Ferretti, Todd; McRae, Ken; Hatherell, Ann (2001). Integrating verbs, situation schemas, and thematic role concepts. Journal of Memory and Language, 44(4), 516–547.
Harris, Zellig (1954). Distributional structure. Word, 10(23), 146–162.
References II
McRae, Ken and Matsuki, Kazunaga (2009). People use their knowledge of common events to understand language, and do so as quickly as possible. Language and Linguistics Compass, 3(6), 1417–1429.
McRae, Ken; Hare, Mary; Elman, Jeffrey L.; Ferretti, Todd (2005). A basis for generating expectancies for verbs from nouns. Memory & Cognition, 33(7), 1174–1184.
Michelbacher, Lukas; Evert, Stefan; Schütze, Hinrich (2011). Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory, 7(2), 245–276.
Sahlgren, Magnus (2006). The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Ph.D. thesis, University of Stockholm.