INF2820 Computational Linguistics, 2013 Jan Tore Lønning 11 March
Today, with recommended (order of) reading
• Grammatical features (last week): NLTK book sec. 9.1
• Feature structures: J&M, sec. 15.1
• Unification and subsumption: J&M, sec. 15.2
• Feature structures in NLTK: NLTK book sec. 9.2
• Feature-based grammars/unification grammars: partly J&M, sec. 15.3, NLTK book sec. 9.3
Towards a formalization
• Formally:
  • Can a category have more than one feature?
  • What are the possible values of features?
  • What are the grammar rules?
  • How should the grammar rules be interpreted?
• Applicability:
  • What should a grammar with features for natural language look like?
  • What more can features be used for?
    • Semantic representations
• Computationally:
  • How can feature structure grammars be parsed?
3 March 12, 2013
More than one feature, ex: German
S -> NP[CASE=nom, NUM=?x, PERS=?y] VP[NUM=?x, PERS=?y]
NP[CASE=?z, NUM=?x, PERS=3rd] -> Det[CASE=?z, NUM=?x, GEN=?u] N[CASE=?z, NUM=?x, GEN=?u]
VP[NUM=?x] -> V[SUBC=dtv, NUM=?x] NP[CASE=dat] NP[CASE=acc]
Det[NUM=sg, CASE=nom, GEN=mask] -> 'der'
Feature structures
• Long tradition in linguistics
  • E.g. phonology
• A set of features and values:
  • Each value is appropriate for that feature
• Take it one step further:
  • Allow feature structures as values
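As a rough illustration, a feature structure whose values may themselves be feature structures can be modelled as a nested Python dict. This is only a sketch: the example features and the helper name `follow_path` are assumptions made for the illustration, not part of the course material.

```python
# A feature structure sketched as a nested dict: atomic values are strings,
# complex values are dicts (feature structures as values).
fs = {
    "CAT": "np",
    "AGR": {"NUM": "sg", "PERS": "3rd"},
}


def follow_path(fs, path):
    """Return the value reached by following a sequence of features, or None."""
    value = fs
    for feature in path:
        if not isinstance(value, dict) or feature not in value:
            return None
        value = value[feature]
    return value


print(follow_path(fs, ["AGR", "NUM"]))  # sg
print(follow_path(fs, ["AGR", "GEN"]))  # None
```

A path such as AGR NUM simply walks down through the nested values, which is why allowing feature structures as values gives tree-like (and, with sharing, graph-like) objects.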
Feature structures as graphs
• Two alternative notations:
  • Directed Acyclic Graphs (DAGs)
  • Attribute Value Matrices (AVMs)
Reentrancies
Reentrancies and programming
• Reentrancies in feature structures resemble the difference in programming between
• two variables pointing to the same object (identity)
• and two variables having similar values
>>> a = [3,4,5]
>>> b = [6,7,a,9]
>>> c = a[:]
>>> a.pop()
5
>>> a
?
>>> c
?
>>> b
?
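The session above can be worked through as a small script. It is a sketch of plain Python list behaviour, with `is` testing identity and `==` testing equal values:

```python
a = [3, 4, 5]
b = [6, 7, a, 9]   # b[2] is the very same list object as a (identity)
c = a[:]           # c is a copy: equal values, but a different object

assert b[2] is a
assert c == a and c is not a

a.pop()            # removes and returns 5, changing the shared object

print(a)  # [3, 4]
print(c)  # [3, 4, 5]
print(b)  # [6, 7, [3, 4], 9]
```

The change to `a` is visible through `b`, because `b` holds a reference to the same object, but not through `c`. A reentrancy in a feature structure behaves like `b[2]` and `a`: two paths lead to one shared value.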
Unification of feature structures
Subsumption and unification
Subsumption
• F subsumes G
• "F is at least as general as G"
• Notation: F ⊑ G
• If and only if:
  • F is atomic and F = G, or
  • F is complex and
    • for each feature x in F: G(x) is defined and F(x) subsumes G(x)
    • for any paths p, q in F: if F(p) = F(q) (the two paths share one value), then G(p) = G(q)
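The recursive part of this definition can be sketched in Python over nested dicts. This is a simplification under stated assumptions: atoms are strings, complex structures are dicts, and the reentrancy condition on shared paths is not checked, so the sketch only covers tree-shaped structures. The function name `subsumes` is chosen for the illustration.

```python
def subsumes(f, g):
    """True if f is at least as general as g (reentrancies are ignored)."""
    if not isinstance(f, dict):
        # Atomic case: f subsumes g only if they are the same atom.
        return f == g
    # Complex case: every feature of f must be defined in g, with a value
    # that is subsumed by f's value. An atomic g defines no features, so
    # only the empty structure subsumes it.
    g_features = g if isinstance(g, dict) else {}
    return all(x in g_features and subsumes(f[x], g_features[x]) for x in f)


general = {"AGR": {"NUM": "sg"}}
specific = {"CAT": "np", "AGR": {"NUM": "sg", "PERS": "3rd"}}

print(subsumes(general, specific))  # True
print(subsumes(specific, general))  # False
print(subsumes({}, specific))       # True
```

The empty structure subsumes everything, and adding information to a structure can only move it downwards in the subsumption ordering.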
Unification
• H is the unification of F and G
• H = F ⊔ G
• If and only if
  • F subsumes H
  • G subsumes H
  • and H is the most general f.s. with these properties
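A minimal sketch of unification over nested dicts, under the same simplifying assumptions as before: atoms are strings, complex structures are dicts, and reentrancies and variables are left out. The function name `unify` and the use of `None` to signal failure are choices made for this illustration and are not NLTK's implementation.

```python
def unify(f, g):
    """Return the most general structure that both f and g subsume, or None."""
    if f == {}:
        return g            # the empty structure adds no information
    if g == {}:
        return f
    if isinstance(f, dict) and isinstance(g, dict):
        result = dict(f)    # start from f's features
        for feature, g_value in g.items():
            if feature in result:
                merged = unify(result[feature], g_value)
                if merged is None:
                    return None     # conflicting values: unification fails
                result[feature] = merged
            else:
                result[feature] = g_value
        return result
    # At least one side is atomic: they must be the same atom.
    return f if f == g else None


fs1 = {"AGR": {"NUM": "sg"}, "CAT": "vp"}
fs2 = {"AGR": {"PERS": "3rd"}}

print(unify(fs1, fs2))
# {'AGR': {'NUM': 'sg', 'PERS': '3rd'}, 'CAT': 'vp'}
print(unify({"NUM": "sg"}, {"NUM": "pl"}))  # None
```

The result contains exactly the information from both inputs and nothing more, which is the "most general" requirement in the definition.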
NLTK - implementation
>>> import nltk
>>> fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
>>> fs1
[NUM='sg', TENSE='past']
>>> print(fs1)
[ NUM   = 'sg'   ]
[ TENSE = 'past' ]
>>> from nltk import FeatStruct
>>> fs2 = FeatStruct(CAT='vp', AGR=fs1)
>>> print(fs2)
[ AGR = [ NUM   = 'sg'   ] ]
[       [ TENSE = 'past' ] ]
[                          ]
[ CAT = 'vp'               ]
NLTK - implementation
>>> fs3 = fs2.unify(FeatStruct("[AGR = ?x, SUBJ = [AGR = ?x]]"))
>>> print(fs3)
[ AGR  = (1) [ NUM   = 'sg'   ] ]
[            [ TENSE = 'past' ] ]
[                               ]
[ CAT  = 'vp'                   ]
[                               ]
[ SUBJ = [ AGR -> (1) ]         ]
Two formats for grammar rules
NLTK
• S -> NP[AGR=?x] VP[AGR=?x]
• NP[AGR=?x] -> Det[AGR=?x] Nom[AGR=?x]
J&M
Two formats for grammar rules 2
NLTK
• V[AGR=[NUM=PL]] -> 'serve'
• V[AGR=[NUM=SG, PERS=3rd]] -> 'serves'
• VP[AGR=?x] -> V[AGR=?x] NP
J&M
Comparing the formats
NLTK
• Extend non-terminals with partial feature structures
• The feature structures may contain variables for coindexing
• Used in e.g. (early) Head-driven Phrase Structure Grammars
Jurafsky & Martin
• Add equations to CFG rules
• An equation equates
  • two paths, or
  • a path and an atomic value
• Inspired by
  • PATR
  • Lexical-Functional Grammar
The two formats amount to the same (before extensions)
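To make the equation format concrete, here is a hedged sketch of checking whether the feature structures of a local tree respect a list of path equations. The representation is an assumption for the illustration: `nodes` maps a category label to a nested dict, a path is a tuple starting with the category label, and an equation is a pair of two paths or of a path and an atom. Value equality stands in for the shared (token-identical) value that a real implementation would require; `resolve` and `satisfies` are hypothetical helper names.

```python
def resolve(nodes, path):
    """Follow a path such as ("NP", "AGR", "NUM") through the node structures."""
    value = nodes
    for step in path:
        if not isinstance(value, dict) or step not in value:
            return None
        value = value[step]
    return value


def satisfies(nodes, equations):
    """Check equations of the form (path, path) or (path, atom)."""
    for left, right in equations:
        left_value = resolve(nodes, left)
        right_value = resolve(nodes, right) if isinstance(right, tuple) else right
        if left_value is None or left_value != right_value:
            return False
    return True


# S -> NP VP with the equation <NP AGR> = <VP AGR>
agreement = [(("NP", "AGR"), ("VP", "AGR"))]
good = {"NP": {"AGR": {"NUM": "sg", "PERS": "3rd"}},
        "VP": {"AGR": {"NUM": "sg", "PERS": "3rd"}}}
bad = {"NP": {"AGR": {"NUM": "sg", "PERS": "3rd"}},
       "VP": {"AGR": {"NUM": "pl", "PERS": "3rd"}}}

# DET -> the with the equation <DET AGR PERS> = 3rd
lexical = [(("DET", "AGR", "PERS"), "3rd")]

print(satisfies(good, agreement))                              # True
print(satisfies(bad, agreement))                               # False
print(satisfies({"DET": {"AGR": {"PERS": "3rd"}}}, lexical))   # True
```

An NLTK-style rule such as S -> NP[AGR=?x] VP[AGR=?x] expresses the same constraint as the first equation, with the variable ?x doing the work of the two paths.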
Interpretation of feature-based grammars
• We have defined:
  • feature structures and unification
  • grammar rules with feature structures (in two formats)
• We should also make clear exactly what a feature structure grammar defines
  • (missing from both J&M and the NLTK book)
• We will give a semi-formal definition
Remember: CFG & Trees
• A local tree:
  • A node which is not a leaf
  • All the daughters
  • The order between the daughters
• A rule B -> s1 s2 … sn licenses a local tree if and only if the tree has the form:

        B
     /  |  \
   s1  s2 … sn
Trees
• A CFG G generates a tree t iff
  • The top of t is labelled S
  • The leaves are labelled with terminals
  • Each local tree is licensed by a rule
• T(G) = the set of trees generated by G
• The yield of the tree t is the sequence of symbols on the leaves, in order
• A string w may be derived from G iff w is the yield of a tree in T(G).
Abbreviation: ”iff” for ”if and only if”
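These definitions translate fairly directly into code. The sketch below assumes a toy grammar built around the lecture's example sentence; the rule set, the tuple-based tree representation and the function names are all assumptions for the illustration.

```python
# A tree is (label, [children]) for non-leaf nodes and a plain string for leaves.
RULES = {
    ("S", ("NP", "VP")),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("DET", ("the",)), ("DET", ("many",)),
    ("N", ("restaurant",)), ("N", ("fish",)),
    ("V", ("serves",)),
}


def label(node):
    return node if isinstance(node, str) else node[0]


def local_trees(tree):
    """Yield (mother, daughters in order) for every non-leaf node."""
    if isinstance(tree, str):
        return
    mother, children = tree
    yield mother, tuple(label(child) for child in children)
    for child in children:
        yield from local_trees(child)


def tree_yield(tree):
    """The symbols on the leaves, in order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1] for leaf in tree_yield(child)]


def generates(rules, tree, start="S"):
    """Top labelled S, leaves are terminals, every local tree licensed."""
    nonterminals = {lhs for lhs, _ in rules}
    leaves_ok = all(leaf not in nonterminals for leaf in tree_yield(tree))
    return (label(tree) == start and leaves_ok
            and all(local in rules for local in local_trees(tree)))


tree = ("S", [("NP", [("DET", ["the"]), ("N", ["restaurant"])]),
              ("VP", [("V", ["serves"]),
                      ("NP", [("DET", ["many"]), ("N", ["fish"])])])])

print(generates(RULES, tree))       # True
print(" ".join(tree_yield(tree)))   # the restaurant serves many fish
```

A string is then derivable from the grammar exactly when some tree accepted by `generates` has that string as its yield.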
Trees with feature structures
Figure: tree [S [NP [DET the] [N restaurant]] [VP [V serves] [NP [DET many] [N fish]]]]
Each non-terminal node contains a feature structure
Conditions on grammaticality
Figure: tree [S [NP [DET the] [N restaurant]] [VP [V serves] [NP [DET many] [N fish]]]]
Each local tree must be licensed by a grammar rule
Local tree licensed by rule –ex 1
• J&M-format:
  • The local tree respects all the equations
• NLTK-format: S -> NP[AGR=?x] VP[AGR=?x]
  • The rule corresponds to a partial local tree
  • The actual local tree extends this
Figure: local tree with mother S and daughters NP and VP, each with a feature structure
Each local tree must be licensed by a grammar rule
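The idea that a rule is a partial local tree which the actual local tree extends can be sketched as follows. The representation is assumed for the illustration: a rule and a local tree are both lists of (category, feature structure) pairs with the mother first, feature structures are nested dicts, and strings starting with "?" are variables. Consistent variable binding is checked by value equality, which stands in for the shared value a real unification-based implementation would use; `extends` and `licenses` are hypothetical names.

```python
def extends(pattern, actual, bindings):
    """True if `actual` extends `pattern`, binding ?variables consistently."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings[pattern] == actual
        bindings[pattern] = actual   # first occurrence: remember the value
        return True
    if isinstance(pattern, dict):
        return isinstance(actual, dict) and all(
            feature in actual and extends(value, actual[feature], bindings)
            for feature, value in pattern.items()
        )
    return pattern == actual         # atomic values must match exactly


def licenses(rule, local_tree):
    """Both arguments are lists of (category, features), mother first."""
    if len(rule) != len(local_tree):
        return False
    bindings = {}
    return all(
        r_cat == t_cat and extends(r_fs, t_fs, bindings)
        for (r_cat, r_fs), (t_cat, t_fs) in zip(rule, local_tree)
    )


# S -> NP[AGR=?x] VP[AGR=?x]
rule = [("S", {}), ("NP", {"AGR": "?x"}), ("VP", {"AGR": "?x"})]

agreeing = [("S", {}),
            ("NP", {"AGR": {"NUM": "sg", "PERS": "3rd"}}),
            ("VP", {"AGR": {"NUM": "sg", "PERS": "3rd"}})]
clashing = [("S", {}),
            ("NP", {"AGR": {"NUM": "sg", "PERS": "3rd"}}),
            ("VP", {"AGR": {"NUM": "pl", "PERS": "3rd"}})]

print(licenses(rule, agreeing))  # True
print(licenses(rule, clashing))  # False
```

The actual tree may carry more features than the rule mentions, but wherever the rule uses the same variable twice, the tree must have the same value in both places.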
Local tree licensed by rule –ex 2
• J&M-format:
  • The local tree respects all the equations
  DET -> the
  <DET AGR PERS> = 3rd
• NLTK-format:
  DET[AGR=[PERS='3rd']] -> 'the'
Figure: local tree with mother DET (with a feature structure) and daughter "the"
Each local tree must be licensed by a grammar rule
Conditions on grammaticality
A tree T with feature structures is licensed by a feature-structure grammar G if and only if:
• If t1, t2, …, tn are all the local trees in T
• Then there are some corresponding rules in G, say g1, g2, …, gn, such that:
  • Tree ti is licensed by rule gi for i = 1, 2, …, n
  • T is a minimal structure which satisfies these gi-s
• T is minimal:
  • If fs_i is the feature structure at the mother of local tree ti for i = 1, 2, …, n
  • Then we cannot find a structurally similar tree for the same sentence with feature structures fs'_i such that
    • fs'_i subsumes fs_i for i = 1, 2, …, n
    • fs_i does not subsume fs'_i for at least one i