Sentence Realisation from Bag-of-Words with Dependency Constraints
Thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science (by Research) in
Computer Science
by
Karthik Kumar G
200402018
karthikg@research.iiit.ac.in
Language Technologies Research Centre
International Institute of Information Technology
Hyderabad - 500 032, INDIA
May 2010
Copyright © Karthik Kumar G, 2010
All Rights Reserved
International Institute of Information Technology
Hyderabad, India
CERTIFICATE
It is certified that the work contained in this thesis, titled "Sentence Realisation from Bag-of-Words with Dependency Constraints" by Karthik Kumar G, has been carried out under my supervision and is not submitted elsewhere for a degree.
Date Principal Co-Adviser: Prof. Rajeev Sangal
Principal Co-Adviser: Mr. Sriram Venkatapathy
To My Parents
Acknowledgments
First and foremost, I would like to thank my advisers Prof. Rajeev Sangal and Mr. Sriram Venkatapathy for their guidance and support during my research. They provided me with the opportunities to pursue my research interests and helped me develop the thinking process required for research.
I would like to thank Dr. Dipti Misra Sharma, Prof. Laxmi Bai and Dr. Soma Paul for the valuable discussions on a variety of topics over the years on various research works at the Language Technologies Research Centre (LTRC). I would also like to thank all my LTRC seniors, including Prashanth Mannem, Rafiya Begum, Anil Kumar Singh, Himanshu Agarwal, Jagadeesh Gorla and Samar Hussain, from whom I learnt NLP. I would also like to thank all my LTRC colleagues Rohini, Avinesh, Manohar, Ravi Kiran, Taraka Rama, Sudheer K, Sandhya, Shilpi, Sapna, Ritu, Viswanth, Suman, Praneeth and all my juniors for their support during my stay at LTRC.
My acknowledgment would not be complete without a mention of my classmates of the UG2k4 batch at IIIT Hyderabad.
Abstract
Sentence realisation involves generating a well-formed sentence from a bag-of-words. It has a wide range of applications, from machine translation to dialogue systems. In this thesis, we present five models for the task of sentence realisation: a sentential language model, subtree-type based language models (STLM), a head-word STLM, a part-of-speech (POS) based STLM, and a marked head-POS based LM. The proposed models employ simple and efficient techniques based on N-gram language modeling and use only minimal syntactic information, in the form of dependency constraints among the bag-of-words. This enables the models to be used in a wide range of applications that require sentence realisation.
We have evaluated the proposed models on two different types of input (gold and noisy). In the first input type, the dependency constraints among the bag-of-words are extracted from a manually annotated treebank; in the second, the dependency constraints are noisy because they are automatically extracted by a parser. We evaluated the models on noisy input in order to test their robustness, and we observed that the models performed well even on the noisy input. We achieve state-of-the-art results for the task of sentence realisation.
We have also successfully tested a graph based nearest neighbour algorithm for the task of sentence realisation. We show that, using the graph-based algorithm, the computational complexity can be reduced from factorial to quadratic at a cost of a 2% reduction in overall accuracy. This makes our module suitable for several practical applications.
Contents
1 Introduction
  1.1 Introduction: Natural language generation
  1.2 Stages of NLG
  1.3 Dependency constraints
  1.4 Significance of sentence realisation in machine translation
    1.4.1 Transfer based Machine Translation
    1.4.2 Two-stage statistical machine translation
  1.5 Summary of contributions
  1.6 Outline
2 Sentence realisation: A review
  2.1 Issues analysed in this thesis
3 Statistical Language Modeling
  3.1 Introduction
  3.2 N-gram parameter estimation
  3.3 Smoothing
4 Sentence realisation experiments
  4.1 Experimental setup
  4.2 Experiments
    4.2.1 Model 1: Sentential Language Model
    4.2.2 Model 2: Subtree-type based Language Models (STLM)
    4.2.3 Model 3: Head-word STLM
    4.2.4 Model 4: POS based STLM
    4.2.5 Model 5: Marked Head-POS based LM
  4.3 Nearest Neighbour Algorithm
5 Results and Discussion
  5.1 Results
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future work
Bibliography
List of Figures
1.1 Tasks of Natural Language Generation
1.2 Bag-of-words with dependency constraints and head marked
4.1 Unordered dependency tree
4.2 Unordered dependency tree with partial order calculated
4.3 Dependency tree of an English sentence
4.4 Dependency tree of an English sentence with POS tags
4.5 Unordered dependency tree with POS tags attached to words
4.6 A treelet having 3 nodes
4.7 Two different treelets which would have the same best POS tag sequence
4.8 Complete directed graph using the edges was, ., market, illiquid
4.9 Complete directed graph with the best path marked
5.1 Graph showing the BLEU scores of the Nearest Neighbour Algorithm for different values of K
List of Tables
4.1 The number of nodes having a particular number of children and cumulative frequencies in the test data
5.1 The results of Models 1-5
5.2 Results of the Nearest Neighbour Algorithm for different values of K
5.3 Comparison of results for English WSJ section 23
Chapter 1
Introduction
1.1 Introduction: Natural language generation
Natural language generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. This process can be viewed as the inverse of natural language understanding (NLU). While NLU concerns mapping from natural language to a computer-interpretable representation, NLG concerns mapping from a computer-interpretable representation into natural language [22]. In NLU, given an utterance, the task is to analyze its syntactic structure and interpret its semantic content, whereas in NLG, the goal is to produce a well-formed natural language utterance. NLG is a critical component in many natural language processing (NLP) applications where the expected output is well-formed natural language text. These applications include machine translation, human-computer dialogue, summarization, item and event descriptions, question answering, tutorials and stories [11]. Since the task of NLG is not straightforward, it is generally divided into different modules. In the next section, we describe the different stages involved in the task of NLG.
1.2 Stages of NLG
As mentioned in the earlier section, NLG is generally decomposed into a number of steps. Fig. 1.1 shows the different steps involved in an NLG system.
The six steps involved in NLG are:
1. Content determination: The goal of content determination is to decide the type of information to
be included.
(a) It is based on creating messages (data objects used in subsequent language generation).
Messages consist of entities, concepts and relations.
(b) The choice of messages is often based on an analysis of a corpus of human-generated texts
covering the same field (the corpus-based approach).
Figure 1.1 Tasks of Natural Language Generation (tasks: content determination, discourse planning, sentence aggregation, lexicalization, referring expression generation, syntax and morphology, orthographic realization; modules: text planning, sentence planning, linguistic realisation)
2. Discourse planning: The process of imposing order and structure over the set of messages determined by the content determination step. It is based either on discourse relations (such as elaboration and exemplification) or on schemas (general patterns according to which a text is constructed).
3. Sentence aggregation: It involves the process of grouping messages together into sentences. How-
ever, it is not always necessary. Each message may be presented as a separate sentence, but a good
aggregation improves the quality of the text. There are several types of aggregation:
(a) Simple conjunctions
(b) Ellipsis (e.g. “John went to the bank and John deposited $100” may be combined into “John
went to the bank and deposited $100”)
(c) Set formation (combining messages that are identical except for a single constituent, e.g.
“John bought a car, John bought a house and John bought a computer” may be combined
into "John bought a car, a house and a computer". Set formation also covers instances like replacing "Monday, Tuesday, ..., Sunday" with "the whole week")
(d) Embedding (usage of relative clauses)
(e) Problems arise when the system has to make choices between different possibilities of aggregation. The system builder has to supply it with rules that govern such choice-making.
4. Lexicalization: The process of deciding which words and phrases should be used in order to transform the underlying messages into a readable text. This is the point at which pragmatic issues are taken into consideration (e.g. should the text be formal or informal). As with aggregation, a poorly lexicalized text may still be understood, but good lexicalization improves the quality and fluency of the text. Similarly to aggregation, problems emerge when the system has to make choices between particular words.
5. Referring expression generation: Selecting words and phrases to identify entities (e.g. "Caledonian Express" or "it" or "this train"), and generating deictic expressions.
6. Linguistic realization: The process of applying rules of grammar in order to produce a text which is syntactically, morphologically and orthographically correct. This process may be realized by means of the inverse parsing model, different kinds of grammars, or templates.
Sometimes, one or more subtasks of NLG might be combined to form a module. In general, there
are three modules:
1. Text planning: This stage combines the content determination and discourse planning tasks.
2. Sentence planning: This stage combines sentence aggregation, lexicalization, and referring ex-
pression generation.
3. Linguistic realisation: As described above, this task involves syntactic, morphological, and ortho-
graphic processing.
Linguistic realisation is the last step in NLG, and sentence realisation is a major step within it. Sentence realisation involves generating a well-formed sentence from a bag-of-words. These bag-of-words may be syntactically related to each other, and the level of syntactic information attached to the bag-of-words might vary with the application. In this thesis, we present sentence realisation models which have basic syntactic information in the form of dependency constraints between the bag-of-words. In the next section, we briefly explain dependency constraints.
1.3 Dependency constraints
As mentioned in the previous section, we present sentence realisation models which take a bag-of-words with dependency constraints as input and produce a well-formed sentence. Dependency constraints are basic modifier-modified relationships between the bag-of-words [2]. The main reason for choosing dependency constraints is that they are used in most applications, and hence the proposed models can be easily reused.
Fig. 1.2 shows an example of bag-of-words with dependency constraints for the sentence “Ram is
going to school”.
This bag-of-words with dependency constraints can also be seen as an unordered dependency tree: a dependency tree in which the linear order of the lexical items is not given.
is going
Ram to school
Figure 1.2 Bag-of-words with dependency constraints and head marked
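As an illustrative sketch (not from the thesis), such a bag-of-words with dependency constraints could be encoded as a mapping from each head to an unordered set of its modifiers, with no linear order stored. The exact attachments below are our reading of the figure, not ground truth:

```python
# Hypothetical encoding of the unordered dependency tree in Fig. 1.2:
# each head maps to an unordered set of its modifiers; no word order is stored.
# The attachments shown are an assumption about the figure's structure.
unordered_tree = {
    "going": {"Ram", "is", "to"},  # assumed root of the sentence
    "to": {"school"},
}

# The "bag-of-words" is simply the set of all nodes:
bag = set(unordered_tree) | {m for mods in unordered_tree.values() for m in mods}
print(sorted(bag))  # five words, in no meaningful order
```

A sentence realiser must recover the linear order "Ram is going to school" from this unordered structure.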
1.4 Significance of sentence realisation in machine translation
Our sentence realiser models can be applied in various natural language processing applications such as transfer based machine translation and two-step statistical machine translation, and in natural language generation applications such as dialogue systems. We briefly describe the first two applications and show how our realisation models can be applied to them.
1.4.1 Transfer based Machine Translation
We now present the role of a sentence realiser in the task of transfer based MT. In transfer-based approaches for MT¹ [12], the source sentence is first analyzed by a parser (a phrase-structure or a dependency-based parser). Then the source lexical items are transferred to the target language using a bilingual dictionary. The target language sentence is finally realised by applying transfer rules that map between the grammars of the two languages. Generally, these transfer rules make use of rich analysis on the source side, such as dependency labels. The accuracy of such rich analysis (dependency labeling) is low and hence might affect the performance of the sentence realiser. Also, the approach of manually constructing transfer rules is costly, especially for divergent language pairs such as English and Hindi or English and Japanese. In this scenario, our models can be used as a robust alternative to the transfer rules. Once the source lexical items are transferred to the target language using a bilingual dictionary, we apply the realisation models trained on the target side (say Hindi or Japanese) to realise the sentence. There is one more advantage of using realisation models instead of a transfer grammar. Transfer grammar rules depend on the syntactic properties of both the source language and the target language, whereas the realisation models depend only on the target language, which makes them more general. For example, if we want to build machine translation systems from English to Hindi and from Japanese to Hindi, the transfer based method requires transfer grammar rules for both English to Hindi and Japanese to Hindi. But if we use realisation models, then we need to train only the Hindi models for both machine translation systems. In this way the burden is reduced.
¹ http://www.isi.edu/natural-language/mteval/html/412.html
1.4.2 Two-stage statistical machine translation
A sentence realiser can also be used in the framework of two-step statistical machine translation. In the two-step framework, semantic transfer and sentence realisation are decoupled into independent modules. Semantic transfer involves selection of appropriate target language lexical items given the source language sentence, and sentence realisation involves the production of a well-formed target sentence from the selected target language lexical items. This provides an opportunity to develop simple and efficient modules for each of the steps. The model for global lexical selection and sentence reconstruction [1] is one such approach. In this approach, discriminative techniques are used to first transfer the semantic information of the source sentence by looking at the source sentence globally, thus obtaining an accurate bag-of-words in the target language. The words in the bag might be attached with mild syntactic information (i.e., the words they modify) [23]. We propose models that take this information as input and produce a well-formed target sentence.
We can also use our sentence realiser as an ordering module in other approaches such as [21], where the goal is to order an unordered bag (of treelets in this case) with dependency links. In this thesis, we do not test our models within any of the applications mentioned above; we evaluate our models independently. Evaluating the models by plugging them into these applications is a direction for potential future work.
1.5 Summary of contributions
1. We propose five models for the task of sentence realisation from a bag-of-words with dependency constraints. We achieve state-of-the-art results for this task.
2. We successfully use graph based algorithms for the task of sentence realisation, which to the best of our knowledge is the first attempt in this direction.
1.6 Outline
• Chapter 2 describes the related work on sentence realisation.
• Chapter 3 presents the language modeling techniques. It starts with an introduction to general
language modeling techniques and then describes the maximum likelihood estimation. Finally, it
identifies the problem of data sparsity and gives some smoothing techniques to solve the problem.
• Chapter 4 lists the experiments conducted for the task of sentence realisation. It starts with a description of the experimental setup and then explains in detail the five proposed models of sentence realisation. It also describes the application of graph based models (the nearest neighbour algorithm) to the task of sentence realisation.
• Chapter 5 contains the results and discussion of the experiments. It tabulates the results of models 1-5 on the standard test data. The results on the same test data using the graph based models are also given, and all results are compared with previous methods.
• Chapter 6 contains concluding remarks and suggestions for future work on sentence realisation.
Chapter 2
Sentence realisation: A review
Most general purpose sentence realisation systems developed to date transform the input into a well-formed sentence either by statistical language modeling techniques or by the application of a set of grammar rules based on particular linguistic theories, e.g. Lexical Functional Grammar (LFG), Head-Driven Phrase Structure Grammar (HPSG), Combinatory Categorial Grammar (CCG), Tree Adjoining Grammar (TAG), etc. The grammar rules can be obtained in different ways: hand-crafted, semi-automatically extracted, or automatically extracted (from treebanks), while language modeling involves the estimation of probabilities of natural language sentences. We give a brief description of each of the above:
1. Hand-crafted rules:
In this method, the set of grammar rules for sentence realisation is manually written. These rules are based on a particular linguistic theory, as mentioned above. For example, FUF/SURGE [8], LKB [5], OpenCCG [24] and XLE [7] perform the task of sentence realisation with manually constructed grammar rules. There are some problems with this approach. If we want to build a sentence realiser for a new domain, then the rules have to be rewritten for that domain. The same reasoning applies to building a sentence realiser for new languages, which is a time-consuming process. Despite these drawbacks, this method is useful for languages such as Hindi and Telugu which do not have treebanks from which to learn the grammar rules automatically.
2. Automatically generated rules using statistical methods:
The grammar rules for sentence realisation can also be learned/extracted directly from treebanks using statistical methods. For languages like English and Chinese which have large treebanks, the grammar rules can be created automatically. The systems based on HPSG [18], LFG [4, 10] and CCG [25] extract the grammar rules automatically from the treebank for the task of sentence realisation. The advantage of extracting the grammar rules automatically from the treebank is that it reduces the manual effort of writing the rules. The other advantage is that these methods can be easily adapted to other domains and languages. The problem with this approach is that it requires large treebanks to learn the grammar rules automatically. For languages such as English which have large treebanks, this method can be used directly.
3. Semi-automatically extracted rules: In this method, the task of sentence realisation is done using both automatically extracted rules and hand-crafted rules. In the first step, the hand-crafted rules are used to get the list of possible realisations for a given input. Then statistical methods are used to choose the best sentence. The sentence realiser presented in Belz (2007) [3] is an example of this method. This method carries the advantages and disadvantages of both the methods presented above.
4. Statistical language modeling: Here, the task of sentence realisation is performed using language modeling techniques. Language modeling enables us to estimate the probability of occurrence of natural language sentences. A detailed explanation of language modeling is given in chapter 3. Generally, language modeling is used to rank the probable candidates in sentence realisation.
2.1 Issues analysed in this thesis
This thesis mainly focuses on language modeling for the task of sentence realisation in English. One of the major issues with the previous models that use language modeling is that they do not use the treelet structure of the dependency tree while building the language models. We believe that using the treelet structure while building the language models will improve the accuracy of sentence realisation. We propose five sentence realisation models which use the treelet structure while building the language models. The five proposed models are:
1. Sentential LM: This is a generic language model trained on the entire sentences of the training data.
2. Phrase-based LM: We train different language models for subtrees of different types, differentiated by the POS tag of the head of the subtree.
3. Head-based LM: Different LMs for different subtrees are trained only on the heads of the subtrees.
4. Head-POS based LM: This is a modification of the head based LM, trained on POS tags instead of words.
5. Marked Head-POS based LM: This is an extension of the fourth model in which the head of the subtree is explicitly marked.
A detailed explanation of the above models is given in section 4.2.
Chapter 3
Statistical Language Modeling
In our models for sentence realisation, we traverse the given unordered dependency tree in a bottom-up fashion. At each node we find the best phrase representing the subtree. This is achieved by scoring all the possible phrases and extracting the phrase with the best score. Scoring of the phrases is done using language modeling techniques. The best scored phrase is fixed to the root of the subtree, and the best phrase obtained at the root of the unordered dependency tree is the realised sentence. In this chapter, we discuss the basics of language modeling and the smoothing techniques which are used in our models for sentence realisation.
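The bottom-up procedure just described can be sketched in a few lines. This is our illustrative reconstruction, not the thesis's actual implementation: `score` stands in for any language-model scorer (higher is better), and the toy tree and scorer in the usage example are hypothetical.

```python
from itertools import permutations

def best_phrase(head, child_phrases, score):
    """Score every permutation of the head word and its children's best
    phrases, and return the highest-scoring candidate phrase."""
    best, best_score = None, float("-inf")
    for perm in permutations([head] + child_phrases):
        candidate = " ".join(perm)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best

def realise(tree, node, score):
    """Bottom-up traversal: fix the best phrase at each subtree's root;
    the phrase fixed at the overall root is the realised sentence."""
    children = tree.get(node, [])
    child_phrases = [realise(tree, c, score) for c in children]
    return best_phrase(node, child_phrases, score) if child_phrases else node

# Toy usage: a scorer that simply prefers the reference sentence.
tree = {"going": ["Ram", "is", "to"], "to": ["school"]}
score = lambda c: 1.0 if c == "Ram is going to school" else 0.0
print(realise(tree, "going", score))  # Ram is going to school
```

Note that a node with N children explores (N+1)! permutations, which is why chapter 4 also considers a nearest neighbour approximation that reduces this factorial cost.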
3.1 Introduction
The goal of statistical language modeling is to build a statistical model that can estimate the distribution of natural language sentences as accurately as possible. A statistical language model (LM) is a probability distribution p(S) over strings S that attempts to reflect the likelihood of a sentence S in a language.
Let the string for which we want to find the probability be S = w_1 w_2 ... w_n. The probability p(S) of S can be expressed as
p(S) = p(w_1, w_2, ..., w_n) = p(w_1) p(w_2|w_1) ... p(w_n|w_1, ..., w_{n-1})    (3.1)
However, it is not possible to reliably estimate all the conditional probabilities required. So, in
practice, the approximation presented in equation 3.2 is made.
p(w_i | w_1, w_2, ..., w_{i-1}) ≈ p(w_i | w_{i-N+1}, ..., w_{i-1})    (3.2)
giving

p(S) = ∏_{i=1}^{n} p(w_i | w_{i-N+1}, ..., w_{i-1})    (3.3)
In equation 3.3, a word depends only on the N-1 preceding words rather than on all preceding words in the sequence.
The models based on the approximation in equation 3.2 are known as N-gram models. Even N-gram probabilities are difficult to estimate reliably, so N is usually limited to N = 1, N = 2 or N = 3, giving unigram, bigram and trigram models respectively. The simplest model is the unigram, where
p(S) ≈ ∏_{i=1}^{n} p(w_i)    (3.4)

The unigram model uses only the frequencies of occurrence of words in the corpus to calculate p(S).
The bigram and trigram models are richer approximations. A bigram model uses the conditional probabilities of word pairs and is defined as in equation 3.5:

p(S) ≈ p(w_1) [ ∏_{i=2}^{n} p(w_i | w_{i-1}) ]    (3.5)
A trigram model is based on the conditional probabilities of word triples and is defined as in equation 3.6:

p(S) ≈ p(w_1) p(w_2|w_1) [ ∏_{i=3}^{n} p(w_i | w_{i-2}, w_{i-1}) ]    (3.6)
A special marker "$" is used to label the beginning and the end of word sequences. Introducing this marker allows statistics about the likelihood of certain words starting or ending a sentence to be incorporated into the models. So sentence S becomes

S = $, w_1, w_2, ..., w_n, $    (3.7)
and the bigram model computes the probability of S as given in equation 3.8:

p(S) ≈ p(w_1|$) [ ∏_{i=2}^{n} p(w_i | w_{i-1}) ] p($|w_n)    (3.8)
Similarly, the trigram model computes the sequence probability as shown in equation 3.9:

p(S) ≈ p(w_1|$) p(w_2|$, w_1) [ ∏_{i=3}^{n} p(w_i | w_{i-2}, w_{i-1}) ] p($|w_{n-1}, w_n)    (3.9)
3.2 N-gram parameter estimation
The parameters in a traditional N-gram model can be calculated from the frequencies of occurrence of the N-grams:

p(w_i | w_{i-N+1}, ..., w_{i-1}) = c(w_{i-N+1}, ..., w_{i-1}, w_i) / c(w_{i-N+1}, ..., w_{i-1})    (3.10)
In equation 3.10, c(w_{i-N+1}, ..., w_{i-1}, w_i) is the number of times the word sequence w_{i-N+1}, ..., w_{i-1}, w_i occurred in the training corpus, and c(w_{i-N+1}, ..., w_{i-1}) is the number of times the sequence w_{i-N+1}, ..., w_{i-1} occurred in the training corpus.
For the bigram approximation we get the probabilities

p(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})    (3.11)

and for the trigram approximation we get

p(w_i | w_{i-2}, w_{i-1}) = c(w_{i-2}, w_{i-1}, w_i) / c(w_{i-2}, w_{i-1})    (3.12)
Let us consider a small example. Let our training data D be composed of the three sentences:
• John read Moby Dick
• Mary read a new book
• She read a book by Cher
Let us calculate p(John read a book) for the maximum likelihood bigram model. We have

p(John|$) = c($, John) / c($) = 1/3    (3.13)

p(read|John) = c(John, read) / c(John) = 1/1    (3.14)

p(a|read) = c(read, a) / c(read) = 2/3    (3.15)

p(book|a) = c(a, book) / c(a) = 1/2    (3.16)

p($|book) = c(book, $) / c(book) = 1/2    (3.17)

giving us

p(John read a book) = p(John|$) p(read|John) p(a|read) p(book|a) p($|book)
                    = 1/3 × 1/1 × 2/3 × 1/2 × 1/2
                    ≈ 0.06
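The worked example above can be reproduced in a few lines of code. This is an illustrative sketch, not the thesis's implementation; for clarity it uses distinct start/end markers `<s>` and `</s>` where the text uses a single "$" for both roles:

```python
from collections import Counter

def train_bigram_mle(sentences):
    """Count unigram histories and bigrams, with sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])           # histories only; "</s>" is never a history
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_prob(unigrams, bigrams, sentence):
    """Maximum-likelihood bigram probability, as in equations 3.8 and 3.11."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

corpus = ["John read Moby Dick", "Mary read a new book", "She read a book by Cher"]
uni, bi = train_bigram_mle(corpus)
print(sentence_prob(uni, bi, "John read a book"))  # 1/3 * 1 * 2/3 * 1/2 * 1/2 ≈ 0.056
```

Note that `sentence_prob` would raise a division-by-zero error for a sentence containing an unseen word such as "Sam", which is exactly the sparsity problem that motivates the smoothing techniques of the next section.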
3.3 Smoothing
Sometimes, however, the probability p(w_n|w_{n-1}) might be zero, because c(w_{n-1}, w_n) can be zero. Then the probability of a valid sentence S containing the word sequence w_{n-1}, w_n is zero. The problem often arises when one of the words is a proper noun: the training data from which the parameters are learned might not contain all proper nouns. For example, the probability of the valid English sentence "Sam read a book" is zero, since the probability p(read|Sam) is zero. This is because "Sam" is a proper noun which is not present in the training data. A technique called smoothing is used to address this problem. The term smoothing describes techniques for adjusting the maximum likelihood estimate of probabilities to produce more accurate probabilities. The name comes from the fact that these techniques tend to make distributions more uniform, adjusting low probabilities such as zero probabilities upward and high probabilities downward. Not only do smoothing methods generally prevent zero probabilities, but they also attempt to improve the accuracy of the model as a whole. Whenever a probability is estimated from few counts, smoothing has the potential to significantly improve estimation.
There are many smoothing techniques available. Some of the techniques which we tried are:
1. Add-One smoothing
2. Witten-Bell smoothing
3. Good-Turing estimate
4. Kneser-Ney smoothing
5. Katz smoothing
A detailed explanation of the above techniques is given in Chen and Goodman [6]. We experimented with all the above smoothing techniques and found that Kneser-Ney smoothing works well. In all our experiments we used the same smoothing algorithm.
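To make the idea concrete, here is a sketch of the simplest technique in the list, Add-One (Laplace) smoothing, applied to the toy corpus above: every bigram receives a pseudo-count of one, so unseen word pairs no longer get zero probability. This is only an illustration; the thesis's own experiments use Kneser-Ney smoothing.

```python
from collections import Counter

corpus = ["John read Moby Dick", "Mary read a new book", "She read a book by Cher"]
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))
V = len(unigrams)  # vocabulary size (11 distinct words here)

def add_one_prob(w1, w2):
    # Add-One (Laplace): (c(w1, w2) + 1) / (c(w1) + V)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(add_one_prob("Sam", "read"))  # unseen word "Sam": non-zero, equals 1/V
print(add_one_prob("read", "a"))    # seen bigram: (2+1)/(3+11), discounted from MLE 2/3
```

The probability mass given to unseen events is taken from the seen events, which is why p(a|read) drops below its maximum likelihood estimate.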
The N-gram models can also be applied to word classes instead of words, where a class may be a POS tag, a supertag, or some other category to which words can be assigned. As mentioned in the previous section, the problem with word based N-gram models is that they cannot handle new words (bigrams or trigrams that are not in the training data). In this case, we can use a part-of-speech based N-gram model, i.e., train the models on part-of-speech tags instead of words. Since the number of part-of-speech tags is limited, the problem of sparsity is avoided. In this thesis, we also explore whether language models built on POS tags improve the performance of our models, and whether language models built on a combination of both words and part-of-speech tags improve sentence realisation.
Chapter 4
Sentence realisation experiments
In this chapter, we present the experiments conducted for the task of sentence realisation. We start with a description of the experimental setup and then explain in detail the five proposed models for sentence realisation. The chapter concludes with an application of the graph based models (the nearest neighbour algorithm) to the task of sentence realisation.
4.1 Experimental setup
For the experiments, we use the Wall Street Journal (WSJ) portion of the Penn treebank [14], with the standard train/development/test splits: 39,832 sentences from sections 2-21 for training, 2,416 sentences from section 23 for testing, and 1,700 sentences from section 22 for development. The input to our sentence realiser is a bag-of-words with dependency constraints, automatically extracted from the Penn treebank using the head percolation rules of [13]. We also use the provided part-of-speech tags in some experiments.
In most practical applications, the input to the sentence realiser is noisy. To test the robustness
of our models in such scenarios, we also conducted experiments on noisy input data. The input test data
is first parsed with an unlabeled projective dependency parser [19] to obtain the dependency constraints,
and the order information is then dropped to obtain the input to our sentence realiser. However, we
still use the correct bag-of-words.
Table 4.1 shows the number of nodes having a particular number of children in the test data.
From Table 4.1, we can see that 97.9% of the internal nodes of the trees have five or fewer
children.
Children count   Nodes   Cumulative frequency %
0                30219   53.31
1                13649   77.39
2                 5887   87.77
3                 3207   93.42
4                 1526   96.11
5                 1017   97.9
6                  685   99.1
7                  269   99.57
8                  106   99.75
> 8                119   100

Table 4.1 The number of nodes having a particular number of children and cumulative frequencies in the test data
4.2 Experiments
4.2.1 Model 1 : Sentential Language Model
In this model, we traverse the tree in a bottom-up manner and find the best phrase at each subtree. The
best phrase corresponding to the subtree is assigned to the root node of the subtree during the traversal. If
the subtree contains only one node, i.e., the head itself, then that node is the best phrase corresponding
to the subtree. Let a node n have N children represented as c_i (1 ≤ i ≤ N). During the bottom-up
traversal, the children c_i are assigned best phrases before node n is processed. Let the best phrase
corresponding to the i-th child be α_{c_i}. The best phrase corresponding to the node n is computed by
exploring the permutations of n and the best phrases of all the children. The total number of
permutations explored is (N + 1)!. A sentential language model, trained on the complete sentences of
the training corpus, is applied to each candidate phrase to score it, and the phrase with the maximum
score is selected as the best phrase.
α_n = bestPhrase( perm(n, ∀i p(c_i)) ∘ LM )   (4.1)
For example, Fig. 4.1 is an unordered dependency tree and the expected output is “The equity
market was illiquid .”. When we traverse the tree in a bottom-up manner, we first reach the subtree
with “illiquid” as head, which contains only one node, so the best phrase corresponding to the subtree
is “illiquid”. We then reach “.”, “The” and “equity”, and their best phrases are “.”, “The” and “equity”,
respectively (since all these subtrees contain only one node). Then we reach the subtree with “market”
as head, which contains the three nodes “The”, “equity” and “market”, and there will be 3! (6) phrases formed
Figure 4.1 Unordered dependency tree (root “was” with children “illiquid”, “market” and “.”; “market” has children “The” and “equity”)
Figure 4.2 Unordered dependency tree with partial order calculated (root “was” with children “illiquid”, “The_equity_market” and “.”)
and here we can apply the language model to each of the 6 phrases and assign the best phrase (say “The
equity market”) to the node “market”, as seen in Fig. 4.2. We next go to the subtree with “was” as head.
It contains four nodes: “was”, “The equity market”, “illiquid” and “.” (note that here we consider “The
equity market” as a single unit when permuting). We get 4! (24) phrases, apply the language model to each
of them and identify the best phrase. The best phrase identified at the tree root is the
generated sentence. For this example, we should ideally get “The equity market was illiquid .”.
The worst case complexity of the model is O(n!). But this is a rare case, in which all (n − 1)
words are attached to a single parent. If the dependency tree has subtrees with at most
k children, then the complexity is of the order of n × (k + 1)!, i.e., O(n × k!). For a binary tree with n
words, the complexity is n × 3!, i.e., O(n).
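The bottom-up search of Model 1 can be sketched as follows. This is a minimal sketch: the toy bigram scorer and the example tree stand in for the sentential language model trained on our corpus, and all names here are illustrative.

```python
import itertools

def bigram_logscore(words, logprob):
    """Score a phrase with a bigram model; `logprob` maps a
    (w1, w2) pair to a log-probability."""
    return sum(logprob(a, b) for a, b in zip(words, words[1:]))

def best_phrase(head, children, logprob):
    """Bottom-up best phrase for the subtree rooted at `head`.

    Each child's best phrase is computed first, then all (N+1)!
    orderings of the head and the children's best phrases are
    scored with the sentential LM and the best one is kept.
    """
    child_phrases = [best_phrase(h, c, logprob) for h, c in children]
    units = [[head]] + child_phrases
    candidates = (sum(p, []) for p in itertools.permutations(units))
    return max(candidates, key=lambda w: bigram_logscore(w, logprob))

# Toy LM: favour a handful of bigrams from the running example.
GOOD = {("The", "equity"), ("equity", "market"), ("market", "was"),
        ("was", "illiquid"), ("illiquid", ".")}
lp = lambda a, b: 0.0 if (a, b) in GOOD else -10.0

tree = ("was", [("market", [("The", []), ("equity", [])]),
                ("illiquid", []), (".", [])])
print(" ".join(best_phrase(*tree, lp)))  # -> The equity market was illiquid .
```

Note how “The equity market”, once chosen at the node “market”, is permuted as a single unit at the node “was”, exactly as in the worked example above.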
4.2.2 Model 2 : Subtree-type based Language Models (STLM)
Figure 4.3 Dependency tree of an English sentence (words “teaches”, “friend”, “my”, “in”, “school”; “in” heads “school”)
Consider the tree given in Fig. 4.3. When we traverse the dependency tree in a bottom-up manner as
in model 1, we first reach the subtree with “school” as head, which contains only one node,
so the best phrase corresponding to the subtree is “school”. Then we reach the subtree with “in” as
head. There are two units, “school” and “in”, in the subtree, which are to be ordered to assign the
best phrase to the node “in”. The two possible phrases are “in school” and “school in”. Both
phrases are valid strings in English: “Ram worked in school .” and “The school
in Gachibowli is famous .” are two valid English sentences containing the phrases “in school” and
“school in”, respectively. The probabilities of “school in” and “in school” might even be equal, as both
phrases have a good chance of occurring in complete sentences. But we know that, given the
fact that the node represents a prepositional phrase (which has a preposition as root), the string “in
school” should be more probable than the string “school in”. This objective can be achieved by training
different language models for subtrees of different types. A prepositional phrasal language model is
then expected to give a much higher probability to “in school”. POS tags are used to represent the
subtrees of various kinds, and hence different language models are built for different POS tags. So in this
model we synthesize a sentence from an unordered dependency tree by traversing the dependency tree
in a bottom-up manner as in model 1, but while scoring the permuted phrases we use different language
models for subtrees headed by words of various POS tags.
α_n = bestPhrase( perm(n, ∀i p(c_i)) ∘ LM_POS(n) )   (4.2)

Here, LM_POS(n) represents the language model associated with the part-of-speech tag of the node n.
To build STLMs, the training data is parsed first. Each subtree in the parse structure is represented
by the part-of-speech tag of its head. Different language models are created for each of the POS tags.
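The STLM training procedure can be sketched as follows. This is a minimal sketch under the assumption that subtree phrases have already been extracted from the parsed corpus; the function name and the toy phrase list are illustrative, not our actual implementation.

```python
from collections import Counter, defaultdict

def train_stlms(subtree_phrases):
    """Build one bigram count table per head-POS tag.

    `subtree_phrases` is a list of (head_pos, ordered_phrase) pairs
    extracted from the parsed training corpus, e.g. the subtree
    headed by "date"/NN contributes ("NN", ["A", "record", "date"]).
    """
    models = defaultdict(Counter)
    for pos, phrase in subtree_phrases:
        models[pos].update(zip(phrase, phrase[1:]))
    return models

# Toy training data drawn from the examples in this section.
training = [
    ("NN", ["A", "record", "date"]),
    ("VBN", ["been", "set"]),
    ("VBZ", ["A", "record", "date", "has", "not", "been", "set", "."]),
    ("IN", ["in", "school"]),
]
stlms = train_stlms(training)
# The IN model knows "in school" but assigns no count to "school in".
assert stlms["IN"][("in", "school")] == 1
assert stlms["IN"][("school", "in")] == 0
```

In a real system, the raw counts per POS model would of course be smoothed with one of the techniques from the previous chapter before being used for scoring.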
Figure 4.4 Ordered dependency tree of an English sentence with POS tags (root “has(VBZ)” with children “date(NN)”, “not(RB)”, “been(VBN)” and “.(.)”; “date” has children “A(DT)” and “record(NN)”; “been” has child “set(VBN)”)
Fig. 4.4 shows an ordered dependency tree of the sentence “A record date has not been set .”, with words
carrying their POS tags. Consider the subtree with “date” as head, whose gold phrase is “A record date”.
We use this phrase to build an “NN” language model. Similarly, we use the gold phrase “been set”
to build a “VBN” language model, and “A record date has not been set .” to build a “VBZ” language
model. Note that we don't use leaf nodes of the dependency tree for training language models.
We have trained 44 different language models, each corresponding to a particular POS tag. For
example, an “IN” language model contains phrases like “in hour”, “of chaos”, “after crash” and “in
futures”, and a “VBD” language model contains phrases like “were criticized” and “never resumed”.
Figure 4.5 Unordered dependency tree with POS tags attached to words (root “was(VBD)” with children “market(NN)”, “illiquid(JJ)” and “.(.)”; “market” has children “The(DT)” and “equity(NN)”)
While decoding, the input is an unordered dependency tree with POS tags attached to the
words. For example, Fig. 4.5 is the given unordered dependency tree with POS tags attached to words,
and the expected output is “The equity market was illiquid .”. When we traverse the tree in a bottom-up
manner, we first reach the subtree with “illiquid” as head, which contains only one node, so the
best phrase corresponding to the subtree is “illiquid”. We then reach “.”, “The” and “equity” in
order, and their best phrases are “.”, “The” and “equity” respectively (since all these subtrees contain only
one node). Then we reach the subtree with “market” as head, which contains the three nodes “The”,
“equity” and “market”; there can be 3! (6) phrases formed, and here we apply the “NN” language model to
these 6 phrases and assign the best phrase (say “The equity market”) to the node “market” (see Fig. 4.2).
Next we go to the subtree with “was” as head, which contains the 4 nodes “was”, “The equity market”,
“illiquid” and “.” (note that here we consider “The equity market” as a single unit when permuting).
We get 4! (24) phrases, apply the “VBD” language model to them and get the best phrase. The best
phrase obtained at the tree's root is the generated sentence (in this example, we should ideally get
“The equity market was illiquid .”).
4.2.3 Model 3 : Head-word STLM
In the models presented earlier, a node and its children are ordered using the best phrases of the
children. For example, the best phrase assigned to the node “was” is computed by taking the permutations
of “was” and its children's best phrases “The equity market”, “illiquid” and “.” and then applying the
language model. In this model, instead of considering best phrases while ordering, the heads of the
children c_i are considered. For example, the best phrase assigned to the node “was” is computed in
three steps.
1. The best phrases of the children are computed. The best phrase of the child c_i is α_{c_i}.

2. Permute the node “was” and the children's heads “market”, “illiquid”, “.” and then apply a language
model trained to order the heads of subtrees.

α_n = bestPhrase( perm(n, ∀i c_i) ∘ LM_POS(n) )   (4.3)

For example, the ordered heads will be “market was illiquid .”.

3. The ordered heads are replaced with the corresponding best phrases to get the best phrase associated
with “was”, which is “The equity market was illiquid .”.
Similarly, we train head-word STLM language models for each subtree type by using only the
heads of the children instead of the full phrases. For example, consider the subtree with “has” as head in
Fig. 4.4: we add the heads-based phrase “date has not been .” instead of “A record date has not been set .”
to build the “VBZ” language model.
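The three steps above can be sketched as follows. This is a minimal sketch in which the head-ordering scorer is a hypothetical stand-in for the head-word STLM; all function and variable names are illustrative.

```python
import itertools

def order_heads_then_substitute(head, child_heads, best_phrases, score):
    """Head-word STLM ordering: permute only the head words, pick the
    best head sequence, then splice in each child's best phrase.

    `best_phrases` maps a child head to its already-computed best
    phrase; `score` ranks a sequence of head words.
    """
    units = [head] + child_heads
    best_heads = max(itertools.permutations(units), key=score)
    return [w for h in best_heads for w in best_phrases.get(h, [h])]

# Toy scorer that prefers the target head order from the example.
target = ("market", "was", "illiquid", ".")
score = lambda seq: 1 if seq == target else 0

phrases = {"market": ["The", "equity", "market"]}
out = order_heads_then_substitute("was", ["market", "illiquid", "."],
                                  phrases, score)
print(" ".join(out))  # -> The equity market was illiquid .
```

Only the four head words are permuted here (4! candidates), independently of how long the children's best phrases are, which is what isolates the ordering decision from the descendants' phrases.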
The major advantages of using this model are:

1. The order at a node is independent of the best phrases of its descendants. This prevents redundant
words in the descendants' best phrases from being considered by the language models that score
strings at the higher level.

2. Errors in the computation of the best phrases of descendants do not affect the reordering
decision at a particular node. For example, the computation of the best phrase at the node “was” is
not affected by whether the best phrase of node “market” is “The equity market”, “The market
equity” or any other possibility. This reduces the propagation of errors to some extent.

3. The head words represent the entire subtree. In this manner, language modelling can also be done
across non-contiguous units which are connected by a dependency relation.

4. For subtrees which have a preposition as their root, the preposition represents the grammatical
information of the entire subtree during scoring with a language model.
4.2.4 Model 4 : POS based STLM
We now experiment by using Part-Of-Speech (POS) tags of words for ordering the nodes. In the
previous approaches, the language models were trained on the words which were then used to compute
the best strings associated with various nodes. Here, we order the node and its children using a language
model trained on POS tag sequences.
For example, in Fig. 4.4, consider the subtree with “date” as head, whose gold POS phrase is “DT
NN NN”. We use this phrase to build an “NN” POS-based STLM. Similarly, we use the gold phrase
“VBN VBN” to build a “VBN” language model and “NN VBZ RB VBN .” to build a “VBZ” language
model. Note that here also we don't use leaf nodes of the dependency tree for training language models.
Decoding involves three steps:

1. Obtain the best string of POS tags for a node

2. Substitute words in the place of POS tags (of the best POS string)

3. Apply a word-based language model to choose the best phrase
The third step is important because a single POS tag might be substituted by several words, resulting
in several candidates for the best phrase. A word-based language model needs to be applied
to these candidates in order to obtain the best phrase.
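These decoding steps can be sketched as follows, using the treelet of Fig. 4.6. This is a minimal sketch: the word-LM scorer and all names are hypothetical stand-ins, and for clarity candidates are generated by filtering word permutations against the best POS string.

```python
import itertools

def decode_pos_model(pos_of, words, best_pos_seq, word_lm_score):
    """Substitute words back into the best POS sequence (step 2) and
    rescore the resulting candidates with a word LM (step 3).

    `pos_of` maps each word to its POS tag; several words may share a
    tag, so one POS string can yield several word orderings.
    """
    candidates = []
    for perm in itertools.permutations(words):
        if tuple(pos_of[w] for w in perm) == tuple(best_pos_seq):
            candidates.append(list(perm))
    return max(candidates, key=word_lm_score)

# The treelet of Fig. 4.6: a "VB" head with two "NN" children.
pos_of = {"w1": "VB", "w2": "NN", "w3": "NN"}
wscore = lambda c: 1 if c == ["w2", "w3", "w1"] else 0

best = decode_pos_model(pos_of, ["w1", "w2", "w3"], ["NN", "NN", "VB"], wscore)
print(best)  # -> ['w2', 'w3', 'w1']
```

Here the best POS string “NN NN VB” yields exactly the two candidates “w2 w3 w1” and “w3 w2 w1” discussed below, and the word LM breaks the tie.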
Figure 4.6 A treelet having 3 nodes (a “VB” head with two “NN” children)
For example, let the node shown in Fig. 4.6 have the best POS phrase “NN NN VB”, where “VB” is
the POS tag of “w1” and “NN” is the POS tag of “w2” and “w3”. The candidates for the best phrase
are then “w2 w3 w1” and “w3 w2 w1”, and a word-based language model is applied to choose the best
phrase among the two.
There are two primary advantages of this model:

1. It is more general and deals with unseen words effectively

2. It is much faster than the earlier models, as the head-POS language model is smaller
4.2.5 Model 5: Marked Head-POS based LM
In the POS-based STLM, the head of a particular node isn't marked while applying the language model.
Hence, all the nodes of a treelet are treated equally when the LM is applied. For example, in Fig. 4.7,
the structures of the treelets are not taken into account while applying the head-POS based language
model; both are treated in the same manner. In this model, we experiment with marking the head
information on the POS tag of the head word, which makes the two treelets in Fig. 4.7 be treated
differently when obtaining the best phrase. As the best POS tag sequence might correspond to several
orderings of the treelet, we test various word-based approaches to choose the best ordering among the
many possibilities.
The best phrase is generated by one of the following methods:

1. Apply a word-based language model

2. Apply a language model trained on a corpus where the POS tags are attached to the words (of the
form wordPOS)

3. Apply a language model trained on a corpus where only the head of a particular node has an
associated POS tag
Figure 4.7 Two different treelets which would have the same best POS tag sequence (a “VB” head with “VBP” and “NN” children, and a “VBP” head with “VB” and “NN” children)
4.3 Nearest Neighbour Algorithm
A major bottleneck of the models presented in section 4.2 is that their worst case complexity is
O(N!), where N is the number of nodes in a subtree. For the models to be used in practical systems,
there is a need to decrease their computational complexity. The major reason the complexity is O(N!)
is that we do an exhaustive search to find the best phrase for each subtree. In order to reduce the
computational complexity, we have to modify the search technique. In this regard, we plan to use a
graph-based algorithm for searching for the best phrase for a subtree. This will reduce the computational
complexity from factorial to quadratic. Graph-based algorithms for natural language applications such
as parsing [15], summarization [17] and word sense disambiguation [16] have been well explored.
We map the problem of finding the best phrase for a subtree to the graph problem of finding a
least-cost Hamiltonian path in a complete graph. The problem is formulated as follows. The nodes in
the graph represent the words (“$” is added to the bag-of-words to represent the beginning of the
phrase), whereas the edges carry the bigram probabilities. A complete directed graph is then constructed
from these vertices.
A complete directed graph is a simple graph in which every pair of distinct vertices is connected
by a directed edge. The complete directed graph on n vertices has n × (n − 1) edges. Every edge
in the complete directed graph is associated with a score Score(x, y) that maps the edge between x and
y to a real number. These scores are the negative of the conditional probability p(y|x), which is given
by −(count(x, y)/count(x)). Here, we take the negative of the conditional probability only for the sake
of implementation simplicity. We can see from this formulation that the path which visits each
vertex exactly once and has the least score spells out the required phrase of the subtree. Such a path,
which visits each vertex exactly once, is called a Hamiltonian path. Now, our task of finding the best
phrase reduces to building a complete directed graph with the nodes of the subtree as vertices and
finding the least-cost Hamiltonian path.
But finding the least-cost Hamiltonian path is an NP-complete problem, i.e., the search space is N! for N
nodes. However, there are approximation algorithms¹ which give an approximate solution with less
computational complexity. The solution might not always be optimal, but the primary advantage is the
reduction of the search space from factorial to quadratic. A familiar approximation algorithm is the
nearest neighbour algorithm.
In the nearest neighbour algorithm, we start with the “$” node and mark it as visited. Then, we
repeatedly follow the lightest edge from the current vertex to an unvisited vertex and mark that vertex as
visited. We repeat this until all the nodes of the graph are visited, which gives us an approximation of
the least-cost Hamiltonian path.
For the example discussed in section 4.2.3, the best phrase assigned to the node “was” was computed
by taking the permutations of “was” and its children's heads “market”, “illiquid” and “.”. In this model,
instead of searching all the permutations, we use the algorithm mentioned above.
¹http://en.wikipedia.org/wiki/Approximation_algorithm
Figure 4.8 Complete directed graph over the words “$”, “was”, “.”, “market”, “illiquid”
First we add “$” to the list of words (“was”, “.”, “market”, “illiquid”) and treat these words as
vertices. Then we build the complete directed graph on these vertices. Fig. 4.8 shows the complete
directed graph over the words “$”, “was”, “.”, “market”, “illiquid”.
The edges are scored with the negatives of the conditional probabilities mentioned above. For example,
the edge from “illiquid” to “market” has the score −P(market|illiquid). Similarly, all the edges are
scored using the conditional probabilities.
Figure 4.9 Complete directed graph with the best path marked
Once the scored complete directed graph is built, the task is to find the Hamiltonian path in the
graph with the minimum score. We use the nearest neighbour algorithm for this. We start at the “$”
vertex, find the least-scored edge leaving “$” and follow it, marking the new vertex as visited. The
process is repeated until all the nodes are marked visited. Fig. 4.9 shows the state of the graph after
all the nodes are visited. From the figure, we get the best path “$ market was illiquid .”.
Once we get the best path, we follow the same steps as in model 3.
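The nearest neighbour search can be sketched as follows. This is a minimal sketch: the edge scores are illustrative stand-ins for the negated bigram probabilities estimated from the corpus, and all names are hypothetical.

```python
def nearest_neighbour_path(words, score):
    """Greedy nearest-neighbour search for a low-cost Hamiltonian path.

    Starts at "$" and repeatedly follows the cheapest outgoing edge to
    an unvisited vertex; `score(x, y)` is the negated bigram
    probability -p(y|x), so cheaper edges are more probable transitions.
    """
    current, visited, path = "$", {"$"}, []
    while len(visited) <= len(words):
        nxt = min((w for w in words if w not in visited),
                  key=lambda w: score(current, w))
        visited.add(nxt)
        path.append(nxt)
        current = nxt
    return path

# Toy scores: likely bigrams from the example get cost -1, others 0.
LIKELY = {("$", "market"), ("market", "was"),
          ("was", "illiquid"), ("illiquid", ".")}
score = lambda x, y: -1 if (x, y) in LIKELY else 0

print(nearest_neighbour_path(["was", ".", "market", "illiquid"], score))
# -> ['market', 'was', 'illiquid', '.']
```

Each step scans the remaining vertices once, giving the quadratic behaviour discussed next; the K-best variant would keep the K cheapest partial paths at each step instead of a single one.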
Since there are N² possible bigram probabilities, the run time complexity of the nearest neighbour
algorithm is of the order of N². This decreases the search space and in turn affects the system output,
since the best output may no longer be explored.

The method described takes the local best at each step, so the output might not be the global best.
To come closer to the global best, we store the K best candidates instead of the top one at each stage.
In the end, when we have the K best phrases for the subtree, we choose the phrase with the highest
global probability. A higher value of K allows more phrases to be considered. Hence, the search space
for the K-best nearest neighbour algorithm is of the order of K × N².
Chapter 5
Results and Discussion
In the previous chapter, we proposed models for sentence realisation which take a bag-of-words
with dependency constraints and produce a well-formed sentence. This chapter presents the results of
the proposed models on the standard test data using two different metrics. The results are also compared
with previous methods.
5.1 Results
Similar to most previous work on sentence realisation, we use the Bilingual Evaluation
Understudy (BLEU) score [20] and the percentage of exactly matched sentences as evaluation metrics.
We compare the system-generated sentences with reference sentences to compute these metrics. As
our system guarantees that every input bag-of-words realises a sentence, the special coverage-dependent
evaluation adopted in most grammar-based generation systems is not necessary in our
experiments.
As mentioned in section 4.1, we evaluate our models on two types of input, which differ in their
dependency constraints. In the first input type, the dependency constraints among the bag-of-words are
extracted from the manually annotated treebank; in the second, the dependency constraints among the
bag-of-words are noisy, as they are automatically extracted by a parser.
Table 5.1 shows the results of models 1-5 for both input extracted from the treebank and parser output.
We can observe that for model 1, the BLEU score on parser input is slightly higher than on treebank
input. This might be because the parser input is projective (as we used projective parsing), whereas
the treebank input may contain some non-projective cases. In general, for all the models, the results
           Treebank (gold)          Parser (noisy)
Model      BLEU score   ExMatch    BLEU score   ExMatch
Model 1      0.5472      12.62%      0.5514      12.78%
Model 2      0.6886      18.45%      0.6870      18.29%
Model 3      0.7284      21.86%      0.7227      21.52%
Model 4      0.7890      28.52%      0.7783      27.92%
Model 5      0.8156      29.47%      0.8027      28.78%

Table 5.1 The results of Models 1-5
with noisy dependency links are comparable to the cases where gold dependency links are used, which
is encouraging.

From Table 5.1, we can observe that model 5 gives the best BLEU scores for both input types,
and that the difference between the BLEU scores for the two input types is only 0.0129. From this
small difference, we can infer that our models work well even when the dependency constraints
between the bag-of-words are noisy.
As mentioned in section 4.3, we applied the graph-based nearest neighbour algorithm for searching
for the best phrase for each subtree. We applied the search algorithm directly to the best performing
Model 5. We also suggested storing the K best candidates instead of the single best. Table 5.2 shows
the results of the K-best nearest neighbour algorithm for different values of K.
K-best   BLEU score
1          0.4513
2          0.6211
3          0.6885
5          0.7365
10         0.7716
20         0.7868
30         0.7968

Table 5.2 Results of the nearest neighbour algorithm for different values of K
We can see from Table 5.2 that for K = 30, the BLEU score obtained on the standard test set is
0.7968. Model 5, which does an exhaustive search over N! orderings, achieved a BLEU score of 0.8156
on the same test data. The results show that there is only a slight decrease of 0.0188 BLEU with the
decrease in computational complexity from N! to K × N² (K = 30).
Fig. 5.1 shows the results of the nearest neighbour algorithm with K on the x-axis and BLEU score
on the y-axis. We observe that the BLEU score increases quickly up to K = 10 and gradually stabilises
after that.
Figure 5.1 Graph showing the BLEU scores of the nearest neighbour algorithm for different values of K
Paper               BLEU score
Langkilde (2002)      0.757
Nakanishi (2005)      0.705
Cahill (2006)         0.6651
Hogan (2007)          0.6882
White (2007)          0.5768
Guo (2008)            0.7440
Our Model             0.8156

Table 5.3 Comparison of results for English WSJ section 23
The results given in Table 5.3 are taken from Guo et al., 2008 [9], which reports the BLEU scores
of different systems on section 23 of the PTB. It is difficult to compare sentence realisers, as the
information contained in the input varies greatly between systems. Nevertheless, we can clearly see
that our system performs better than all the other systems. The main observations from the results are
summarised below:
1. Searching the entire search space of O(n!) gives the best performance.

2. Treelet LMs capture the characteristics of phrases headed by various POS tags, in contrast to the
sentential LM, which is a general LM.

3. POS tags play an important role in ordering the nodes of a dependency structure.

4. The head models performed better than the models that used all the nodes of the subtree.
5. Marking the head of a treelet provides vital clues to the language model for reordering.
Chapter 6
Conclusion and Future Work
6.1 Conclusion
In this thesis, we have addressed the problem of sentence realisation. The input to sentence
realisation is a bag-of-words with dependency constraints; the expected output is a well-formed sentence
built from the bag-of-words. We have presented five n-gram based models for sentence realisation. They
are:
1. Sentential Language Model
2. Subtree-type based Language Models (STLM)
3. Head-word STLM
4. POS based STLM
5. Marked Head-POS based LM
We have evaluated our models on two different types of input (gold and noisy). In the first input
type, we have a bag-of-words with dependency constraints extracted from the treebank; in the second,
the dependency constraints among the bag-of-words are extracted by a parser and are therefore noisy.
From the results, we can conclude that the ‘Marked Head-POS based LM’ model works best, with a
0.8156 BLEU score on dependency constraints extracted from the treebank and a 0.8027 BLEU score
on dependency constraints extracted from parser output. This is the best result for the task of sentence
realisation on the standard test data. We also observe that there is only a small decrease in the results
when the dependency constraints among the bag-of-words are noisy; from this we can conclude that the
models are fairly robust.
We have also tested the graph-based nearest neighbour algorithm on the task of sentence realisation.
We have shown that using the graph-based algorithm can reduce the computational complexity from
factorial to quadratic at the cost of a 2% reduction in the overall BLEU score. This decrease in
computational complexity at a very low cost makes our module suitable for use in practical
applications. We have also examined the importance of storing the K best solutions at each stage and
choosing the sentence with the highest global probability at the end: the BLEU score improved from
0.4513 for K = 1 to 0.7968 for K = 30.
6.2 Future work
There are several possible directions for further research as extensions to this work.

The models proposed in this thesis consider only the locally best phrases (local to the subtree) at every
step. In order to retain the globally best possibilities at every step, we plan to use beam search, where
we retain the K best phrases for every subtree.

We also aim to test the approach on morphologically rich languages such as Hindi, which would
require us to expand our feature set. We plan to test factored models in this regard.

The most important experiment that we plan to perform is to test our system in the context of MT,
where the input is more realistic and noisy.

To train more robust language models, we plan to use much larger, web-scale data.
Publications
1. Karthik Gali and Sriram Venkatapathy, Sentence Realisation from Bag of Words with Dependency
Constraints. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the
North American Chapter of the Association for Computational Linguistics.

2. Karthik Gali, Sriram Venkatapathy and Taraka Rama, From Factorial to Quadratic Time Complexity
for Sentence Realization using Nearest Neighbour Algorithm. In Proceedings of The 7th Brazilian
Symposium in Information and Human Language Technology.
Bibliography
[1] S. Bangalore, P. Haffner, and S. Kanthak. Statistical machine translation through global lexical selection and sentence reconstruction. In Annual Meeting-Association for Computational Linguistics, volume 45, page 152, 2007.

[2] R. Begum, S. Husain, A. Dhwaj, D. Sharma, L. Bai, and R. Sangal. Dependency annotation scheme for Indian languages. Proceedings of IJCNLP-2008, 2008.

[3] A. Belz. Probabilistic generation of weather forecast texts. In Proceedings of NAACL HLT, 2007.

[4] A. Cahill and J. van Genabith. Robust PCFG-based generation using automatically acquired LFG approximations. In Annual Meeting-Association for Computational Linguistics, volume 44, 2006.

[5] J. Carroll, A. Copestake, D. Flickinger, and V. Poznanski. An efficient chart generator for (semi-)lexicalist grammars. In Proceedings of the 7th European Workshop on Natural Language Generation (EWNLG99), pages 86-95, 1999.

[6] S. Chen and J. Goodman. An empirical study of smoothing techniques for language modeling. Computer Speech and Language, 13(4):359-394, 1999.

[7] D. Crouch, M. Dalrymple, R. Kaplan, T. King, J. Maxwell, and P. Newman. XLE documentation. Available on-line, 2007.

[8] M. Elhadad. FUF: The universal unifier user manual version 5.0. Department of Computer Science, Columbia University, New York, 1991.

[9] Y. Guo, J. van Genabith, and H. Wang. Dependency-based N-gram models for general purpose sentence realisation. Proceedings of the 22nd Conference on Computational Linguistics, 2008.

[10] D. Hogan, C. Cafferkey, A. Cahill, and J. van Genabith. Exploiting multi-word units in history-based probabilistic generation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.

[11] I. Langkilde-Geary. An empirical verification of coverage and correctness for a general-purpose sentence generator. In Proceedings of the 12th International Natural Language Generation Workshop, pages 17-24. Citeseer, 2002.

[12] A. Lavie, S. Vogel, L. Levin, E. Peterson, K. Probst, A. Llitjos, R. Reynolds, J. Carbonell, and R. Cohen. Experiments with a Hindi-to-English transfer-based MT system under a miserly data scenario. ACM Transactions on Asian Language Information Processing (TALIP), 2(2):143-163, 2003.

[13] D. Magerman. Statistical decision-tree models for parsing. In Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, pages 276-283. Association for Computational Linguistics, Morristown, NJ, USA, 1995.

[14] M. Marcus, M. Marcinkiewicz, and B. Santorini. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2), 1993.

[15] R. McDonald, F. Pereira, K. Ribarov, and J. Hajic. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 523-530, 2005.

[16] R. Mihalcea. Unsupervised large-vocabulary word sense disambiguation with graph-based algorithms for sequence data labeling. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 411-418. Association for Computational Linguistics, Morristown, NJ, USA, 2005.

[17] R. Mihalcea and P. Tarau. Multi-document summarization with iterative graph-based algorithms. In Proceedings of the First International Conference on Intelligent Analysis Methods and Tools (IA 2005), McLean, VA, 2005.

[18] H. Nakanishi and Y. Miyao. Probabilistic models for disambiguation of an HPSG-based chart generator. In Proceedings of the International Workshop on Parsing Technology, 2005.

[19] J. Nivre, J. Hall, J. Nilsson, G. Eryigit, and S. Marinov. Labeled pseudo-projective dependency parsing with support vector machines. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 221-225, 2006.

[20] K. Papineni, S. Roukos, T. Ward, and W. Zhu. BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on ACL, 2001.

[21] C. Quirk, A. Menezes, and C. Cherry. Dependency treelet translation: Syntactically informed phrasal SMT. Pages 271-279, 2005.

[22] E. Reiter and R. Dale. Building applied natural language generation systems. Natural Language Engineering, 3(01):57-87, 1997.

[23] S. Venkatapathy and S. Bangalore. Three models for discriminative machine translation using global lexical selection and sentence reconstruction. In Proceedings of SSST, NAACL-HLT/AMTA Workshop on Syntax and Structure in Statistical Translation, pages 152-159, 2007.

[24] M. White. Reining in CCG chart realization. Lecture Notes in Computer Science, 2004.

[25] M. White, R. Rajkumar, and S. Martin. Towards broad coverage surface realization with CCG. In Proceedings of the Workshop on Using Corpora for NLG: Language Generation and Machine Translation (UCNLG+MT), 2007.
CURRICULUM VITAE
1. NAME : Karthik Kumar G
2. DATE OF BIRTH : 08-02-1987
3. PERMANENT ADDRESS:
11-9-78/2, Laxmi nagar colony,
Kothapet, Hyderabad 50032
Andhra Pradesh, India
4. EDUCATIONAL QUALIFICATIONS :
May 2010: Master of Science (by Research) and Bachelor of Technology in Computer Science
and Engineering, IIIT Hyderabad
THESIS COMMITTEE
1. GUIDES:
• Prof. Rajeev Sangal
• Mr. Sriram Venkatapathy
2. MEMBERS :
• Dr. Bruhadeshwar Bezawada
• Prof. B. Yegnanarayana