A Differential Semantics Approach to the Annotation of Synsets in WordNet Dan Tufiş, Dan Ştefănescu {tufis danstef}@racai.ro Research Institute for Artificial Intelligence Romanian Academy

Dec 30, 2015


Vivien Adams


Connotation Versus Subjectivity

• There are various definitions of these terms, and most of the time they are treated as synonymous (although they are not).

• The connotation of a word (as contrasted with its denotation, i.e. its dictionary meaning) is intrinsically subjective: it refers to the emotional responses commonly associated with the word’s referent (that to which it refers).


• Subjectivity refers to the capacity of words to be used in expressing opinions, particular feelings, beliefs, and desires.

• Connotation is more about word meanings, while subjectivity is more about the meaning of phrases and sentences; subjectivity sits on an upper layer and builds on the connotations of the constituents.


– the word home means “the place where one lives” but, by connotation, also suggests something good, e.g. security, family, love and comfort…

– the word murder means “unlawful premeditated killing of a human being by a human being” but, by connotation, also suggests something bad, e.g. fear, disgust, repulsion…

– the word criminal means “someone who has committed (or been legally convicted of) a crime” but, by connotation, also suggests a very bad person, fear, disgust, repulsion…


Theory of Semantic Differentiation

• (Osgood et al., 1957) “The Measurement of Meaning”.

• The semantic differentiation technique uses pairs of antonyms (factors) as semantic dimensions along which most adjectives can be differentiated. Lots of subjects were asked to scale the meanings of words along several factors.

• Osgood and his colleagues showed that most of the variance in judgment could be explained by only three factors:
– <good-bad> (evaluative)
– <strong-weak> (potency)
– <active-passive> (activity)


Kamps and Marx Approach to PWN Adjectives Annotation

Jaap Kamps and Maarten Marx, “Words with Attitude” (2004), is based on (Osgood et al., 1957) “The Measurement of Meaning”.

– they gave an algorithmic interpretation to the semantic differentiation techniques based on the synonymy relations as encoded in WordNet 1.7

– deals only with adjectives in WordNet and ignores the sense distinctions


Some Definitions

• Two words wα and wβ are related if there exists a sequence of words (wα w1 w2 … wi … wβ) so that any two adjacent words in the sequence belong to the same synset. If the length of such a sequence is n+1, one says that wα and wβ are n-related.

• good and proper are 2-related:
– synset 0161119-a: (good:14 right:13 ripe:3)
– synset 00140845-a: (right:6 proper:3 suitable:3)


Some Other Definitions

• Let MPL(wi, wj) be the partial function:

– MPL(wi, wj) =

• n, if n is the smallest number such that wi and wj are n-related, and

• undefined, if wi and wj are not related

• MPL(wi, wj) has the following properties:

– MPL(wi, wj) = 0 iff wi = wj

– MPL(wi, wj) = MPL(wj, wi)

– MPL(wi, wj) + MPL(wj, wk) ≥ MPL(wi, wk)

• MPL is a distance measure that can be used as a metric for the semantic relatedness of two words.
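
Read this way, MPL is a shortest-path length in a graph whose nodes are words and whose edges link words that share a synset. The following is a minimal sketch under that reading, using only the two synsets from the good/proper example; the function name `mpl` and the data layout are illustrative assumptions, not the authors' implementation.

```python
from collections import deque

# Toy data: the two adjective synsets from the good/proper example.
SYNSETS = [
    {"good", "right", "ripe"},
    {"right", "proper", "suitable"},
]

def mpl(source, target, synsets):
    """Return the smallest n such that source and target are n-related.

    Returns None when the words are not related (MPL is undefined).
    A chain of n + 1 words corresponds to n steps, so n is the number
    of edges on the shortest path.
    """
    if source == target:
        return 0
    # Index: word -> synsets that contain it.
    by_word = {}
    for synset in synsets:
        for word in synset:
            by_word.setdefault(word, []).append(synset)
    if source not in by_word or target not in by_word:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:  # breadth-first search visits words in order of distance
        word, dist = queue.popleft()
        for synset in by_word[word]:
            for neighbour in synset:
                if neighbour == target:
                    return dist + 1
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, dist + 1))
    return None

print(mpl("good", "proper", SYNSETS))  # 2, via good -> right -> proper
```

Breadth-first search is enough here because every step has the same cost; on the full WordNet graph the same idea applies, only with a larger index.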


Some More Definitions

• Let <wα, wβ> be a differential semantic factor.

• The partial function:
– TRI(wi, wα, wβ) = (MPL(wi, wα) - MPL(wi, wβ)) / MPL(wα, wβ), when all MPL values are defined
– undefined otherwise

• TRI(wi, wα, wβ) ∊ [-1,1] measures the closeness of wi to the factor poles:
– a negative value means wi is closer to wα, while a positive one indicates closeness to wβ
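
As a small worked illustration of the TRI formula, the sketch below computes the score from three MPL values that are supplied directly. The distances in the usage line are hypothetical and chosen only to show the sign convention; the helper name `tri` is likewise an assumption.

```python
def tri(d_first, d_second, d_poles):
    """TRI from three MPL values.

    d_first  : MPL from the word to the first pole of the factor
    d_second : MPL from the word to the second pole of the factor
    d_poles  : MPL between the two poles
    Returns None when any distance is undefined (None) or when the
    pole distance is zero, since the quotient is then meaningless.
    """
    if d_first is None or d_second is None or not d_poles:
        return None
    return (d_first - d_second) / d_poles

# Hypothetical distances: 1 step to the first pole, 3 to the second, 4 between poles.
print(tri(1, 3, 4))  # -0.5, i.e. closer to the first pole
```

Because MPL satisfies the triangle inequality, the numerator can never exceed the pole distance in absolute value, which is why the result stays within [-1, 1].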


Our Approach: Some Definitions

• Let <wα, wβ> be a differential semantic factor.

• Any word wi in WordNet that can be reached on a path from wα to wβ is given a score which is a function of the distances from wi to wα and to wβ (TRI).

• The set of these words defines the coverage of the <wα, wβ> factor – COV(<wα, wβ>).

• The set of all synsets containing these words defines the semantic coverage of the corresponding S-factor – SCOV(<Sα, Sβ>).


[Figure: a graph of words w that contains the poles of the factors <Aα, Aβ> and <Bα, Bβ>]

The COVERAGE of the factors <Aα, Aβ>, <Bα, Bβ>, … is the same.


Moving towards synsets

• S-factor = a pair of synsets (Sα, Sβ) for which there exist wiα:siα ∊ Sα and wiβ:siβ ∊ Sβ so that wiα:siα and wiβ:siβ are antonyms and MPL(wiα, wiβ) is defined. Sα and Sβ have opposite meanings, but only wiα and wiβ are antonyms.

• MPLS(Sα, Sβ) = MPL(wiα, wiβ)

• A synset or a literal of a certain synset is n-scoped relative to an S-factor <Sα, Sβ> if the synset’s SUMO/MILO label L is a node in the tree-like structure having as root the n-th parent of the lowest common ancestor of the SUMO/MILO labels corresponding to Sα and Sβ.

• n defines the depth of the scope coverage SCOVn(<Sα, Sβ>) and every synset in this coverage is n-scoped. If the root synset is Sγ we will also use the notation SCOV(<Sα, Sβ>)Sγ
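
The n-scoping test can be sketched on a toy label tree: find the lowest common ancestor of the two pole labels, climb n parents, and check whether a candidate label sits under that node. The tree below is hypothetical (it is not the SUMO/MILO hierarchy), and clamping at the tree root when n exceeds the available depth is an assumption of this sketch.

```python
# Hypothetical label tree (child -> parent); NOT the real SUMO/MILO hierarchy.
PARENT = {
    "X": "Root", "Y": "Root",
    "X1": "X", "X2": "X",
    "X1a": "X1", "X1b": "X1",
}

def ancestors(label):
    """Return [label, parent, grandparent, ..., root]."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def scope_root(label_a, label_b, n):
    """n-th parent of the lowest common ancestor (clamped at the tree root)."""
    seen = set(ancestors(label_a))
    # First ancestor of label_b that is also an ancestor of label_a.
    lca_chain = next(ancestors(x) for x in ancestors(label_b) if x in seen)
    return lca_chain[min(n, len(lca_chain) - 1)]

def is_n_scoped(label, label_a, label_b, n):
    """True if label lies in the subtree rooted at the scope root."""
    return scope_root(label_a, label_b, n) in ancestors(label)

print(scope_root("X1a", "X1b", 0))         # X1
print(is_n_scoped("X2", "X1a", "X1b", 0))  # False: X2 is outside the X1 subtree
print(is_n_scoped("X2", "X1a", "X1b", 1))  # True: the scope root is now X
```

Increasing n widens the scope, so a larger n admits more synsets into the coverage of the same S-factor.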


Scope


Moving towards synsets

• For an S-factor <Sα, Sβ>, TRIS(Si, Sα, Sβ) is defined as the average of the TRI values associated with the literals making up the synset:

$$\mathrm{TRIS}(S_i, S^{\alpha}, S^{\beta}) = \frac{1}{m} \sum_{j=1}^{m} \mathrm{TRI}(w_j, w_k^{\alpha}, w_l^{\beta})$$

• where m is the number of literals in Si, wj are the literals in Si, and <wkα, wlβ> is the factor determining the <Sα, Sβ> S-factor

• TRIS has values in the [-1,1] interval. If it is not defined, we assign TRIS a value outside the considered interval (2, for example).
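
A minimal sketch of the synset-level score, taking the per-literal TRI values as input. Treating the whole synset as undefined as soon as one literal has no TRI value is an assumption of this sketch; the slides only say that an undefined TRIS receives a value outside [-1, 1].

```python
UNDEFINED = 2  # sentinel outside [-1, 1], following the slide's suggestion

def tris(literal_tri_values):
    """Average the TRI values of the literals of one synset.

    literal_tri_values: one entry per literal, None where TRI is undefined.
    Assumption: any undefined literal makes the synset score undefined.
    """
    if not literal_tri_values or any(v is None for v in literal_tri_values):
        return UNDEFINED
    return sum(literal_tri_values) / len(literal_tri_values)

print(tris([-0.5, 0.0, -0.25]))  # -0.25
print(tris([0.5, None]))         # 2 (undefined)
```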


Some examples: scoped S-factors

• ({fairness:1 …} <-> {unfairness:2 …}) NormativeAttribute

• ({comfort:1 …} <-> {discomfort:1 …}) StateOfMind

• ({trust:3 …} <-> {distrust:2 …}) TraitAttribute

• ({increase:3 …} <-> {decrease:2 …}) QuantityChange

• ({demand:1 …} <-> {supply:2 …}) Entity

• ({good:1 …} <-> {bad:1 …}) SubjectiveAssessmentAttribute

• ({strong:1 …} <-> {weak:1 …}) SubjectiveAssessmentAttribute

• ({active:1 …} <-> {inactive:2}) BiologicalAttribute


Factors & Scoped S-factors

Word Class    Factors    Scoped S-Factors    Coverage (literals)    Coverage (synsets)
Adjectives    335        332                 5,307 (24.68%)         5,291 (28.50%)
Adverbs*      335        332                 1,943 (41.69%)         1,571 (42.87%)
Nouns         85         78                  11,109 (9.59%)         11,007 (13.81%)
Verbs         254        247                 6,467 (57.19%)         8,589 (64.58%)

* Adverbs: factors & scores imported from adjectives


The Annotations

• Each connotative synset has an attached vector, the size of which depends on the POS of the synset:
– Noun synsets => a vector of 78 values
– Verb synsets => a vector of 247 values
– Adj and Adv synsets => a vector of 332 values

• Although all factors for a POS cover the same synsets, the vectors for different synsets of the same POS may be very different.

• The selection of the scoped S-factors has tremendous relevance with respect to the domain for which a text analysis is performed.


S-Factors & Connotation Scores


A tool for connotation scoring


Some examples

• Assume we selected the following factors

– for nouns:
• ({comfort:1 …} – {discomfort:1 …}) StateOfMind
• ({pleasure:1 …} – {pain:2 …}) EmotionalState
• ({trust:3 …} – {distrust:2 …}) TraitAttribute

– for verbs:
• ({get well:1 …} – {get worse:1 …}) OrganismProcess
• ({enjoy:4 …} – {suffer:1 …}) AsymmetricRelation, IrreflexiveRelation
• ({believe:1 …} – {disbelieve:1 …}) Entity

“His lies will be dealt with in the court and his immorality will be proved.”


The mark-up of the example

His lies:1 <comfort:-0.11 pleasure:-0.23 trust:-0.11> will be dealt:2 <get well:0.42 enjoy:0 believe:0.62> with in the court:1 <comfort:-0.11 pleasure:-0.07 trust:-0.11> and his immorality:2 <comfort:-0.22 pleasure:-0.07 trust:-0.11> will be proved:3 <get well:0.14 enjoy:-0.2 believe:0.25>.

A rough analysis suggests that we have a subjectively loaded sentence which expresses:
– lack of comfort, i.e. discomfort (average score: -0.14)
– lack of pleasure, i.e. pain (average score: -0.12)
– lack of trust, i.e. distrust (average score: -0.11)
– getting well (average score: 0.28)
– not enjoying, i.e. suffering (average score: -0.1)
– believing (average score: 0.43)
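
The averages quoted in this analysis can be re-derived from the mark-up. The sketch below copies the per-word scores from the annotated sentence and averages each factor over the nouns and over the verbs; the dictionary layout and the function name `factor_averages` are illustrative choices.

```python
from statistics import mean

# Per-word factor scores copied from the marked-up sentence.
NOUN_SCORES = {
    "lies:1": {"comfort": -0.11, "pleasure": -0.23, "trust": -0.11},
    "court:1": {"comfort": -0.11, "pleasure": -0.07, "trust": -0.11},
    "immorality:2": {"comfort": -0.22, "pleasure": -0.07, "trust": -0.11},
}
VERB_SCORES = {
    "dealt:2": {"get well": 0.42, "enjoy": 0.0, "believe": 0.62},
    "proved:3": {"get well": 0.14, "enjoy": -0.2, "believe": 0.25},
}

def factor_averages(word_scores):
    """Average each factor's score over all annotated words of one POS."""
    factors = next(iter(word_scores.values()))
    return {f: mean(s[f] for s in word_scores.values()) for f in factors}

for scores in (NOUN_SCORES, VERB_SCORES):
    for factor, avg in factor_averages(scores).items():
        print(f"{factor}: {avg:.3f}")
```

The computed means are about -0.147 (comfort), -0.123 (pleasure), -0.11 (trust), 0.28 (get well), -0.1 (enjoy) and 0.435 (believe), so the figures on the slide appear to be truncated to two decimals rather than rounded.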


An alternative mark-up of the example

In SentiWordNet (Esuli & Sebastiani, 2006) each synset is annotated by a triplet <P:α N:β O:γ>, where P denotes the positive load of the synset, N its negative load, and O the objectivity of the considered meaning:

His lies:1 <P:0 N:0 O:1> will be dealt:2 <P:0.125 N:0 O:0.875> with in the court:1 <P:0 N:0 O:1> and his immorality:2 <P:0.75 N:0 O:0.25> will be proved:3 <P:0 N:0 O:1>.

In terms of the <P, N, O> triad, one would eventually obtain an almost objective statement (average score 0.825) with a significant load of positivity (average score 0.175) and no negativity at all.
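
The two averages can be checked by taking the component-wise mean of the five triplets in the marked-up sentence. This is a sketch over values copied from the slide, not a call into SentiWordNet itself; the names `TRIPLETS` and `average_triplet` are illustrative.

```python
from statistics import mean

# (P, N, O) triplets copied from the marked-up sentence.
TRIPLETS = {
    "lies:1": (0.0, 0.0, 1.0),
    "dealt:2": (0.125, 0.0, 0.875),
    "court:1": (0.0, 0.0, 1.0),
    "immorality:2": (0.75, 0.0, 0.25),
    "proved:3": (0.0, 0.0, 1.0),
}

def average_triplet(triplets):
    """Component-wise mean of the (P, N, O) triplets."""
    p, n, o = zip(*triplets.values())
    return mean(p), mean(n), mean(o)

p_avg, n_avg, o_avg = average_triplet(TRIPLETS)
print(p_avg, n_avg, o_avg)
```

Most of the positive load comes from a single word, immorality:2 with P = 0.75, which is what makes this mark-up disagree with the connotation-based reading of the same sentence.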


Using the S-factors to Compute Binary Judgments

$$\mathrm{Score}(synset_K) = \frac{1}{|SF|} \sum_{i \in SF} \mathrm{Vector}(i, synset_K)$$

– SF is the set of S-factors for the POS of synsetK, while Vector(i, synsetK) is the value of the i-th cell of the S-factor vector associated with synsetK

– A negative Score means that the aggregated connotational value for synsetK is closer to the connotations induced by the first words in the antonymic pairs making up the S-factors, while a positive value means closeness to the second words of the antonymic pairs.
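
A minimal sketch of the aggregation, assuming the score is the arithmetic mean of the cells of a synset's S-factor vector and that the binary judgment is read off its sign. Returning no judgment for a score of exactly zero is an assumption of this sketch, and the example vector reuses the three noun scores shown earlier for lies:1 rather than a full 78-value vector.

```python
def score(vector):
    """Mean of the S-factor vector cells of one synset."""
    if not vector:
        raise ValueError("empty S-factor vector")
    return sum(vector) / len(vector)

def judgment(vector):
    """Binary judgment from the sign of the score.

    Negative -> closer to the first words of the antonymic pairs,
    positive -> closer to the second words, zero -> None (assumption).
    """
    value = score(vector)
    if value < 0:
        return "first"
    if value > 0:
        return "second"
    return None

lies_vector = [-0.11, -0.23, -0.11]  # comfort, pleasure, trust cells of lies:1
print(round(score(lies_vector), 2), judgment(lies_vector))
```

With only three cells the result is easy to follow by hand; with the full vectors the same two functions apply unchanged, although cells holding the out-of-range marker for undefined values would need to be filtered out first.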


Data and tools: http://www.racai.ro/differentialsemantics/


0_0

• N: 38 / 78 (48.71%)

• V: 73 / 247 (29.55%)

• A: 252 / 332 (75.90%)
