A DECISION THEORETIC APPROACH TO NATURAL LANGUAGE GENERATION
by
NATHAN MCKINLEY
Submitted in partial fulfillment of the requirements
For the degree of Master of Science
Thesis Advisor: Dr. Soumya Ray
Electrical Engineering and Computer Science
CASE WESTERN RESERVE UNIVERSITY
January, 2014
CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES
We hereby approve the thesis of
NATHAN MCKINLEY
candidate for the Master of Science degree*.
Dr. Soumya Ray
Dr. Michael Lewicki
Dr. Gregory Lee
Dr. Vincenzo Liberatore
Date: December 2, 2013
*We also certify that written approval has been obtained for any proprietary material contained therein.
In this chapter, we compare STRUCT to a state-of-the-art NLG system, CRISP,¹ and evaluate three hypotheses: (i) STRUCT is comparable in speed and generation quality to CRISP as it generates increasingly large referring expressions, (ii) STRUCT is comparable in speed and generation quality to CRISP as the size of the grammar they use increases, and (iii) STRUCT is capable of communicating complex propositions, including multiple concurrent goals, negated goals, and nested subclauses. Finally, we evaluate the effect of varying key parameters, including grammar size, on STRUCT's performance.
We compare CRISP to two different versions of STRUCT. As mentioned in the previous chapter, we have written two reward functions and found both to be useful in this domain. We compare against both in order to demonstrate the performance tradeoffs that follow from the choice of reward function. STRUCT was implemented in Python 2.7, whereas CRISP was implemented in Java. All of our experiments were run on a 4-core AMD Phenom II X4 995 processor clocked at 3.2 GHz. Both systems were given access to 8 GB of RAM.
5.1 Comparison to CRISP
We begin by describing experiments comparing STRUCT to CRISP. We used a 2010 version of CRISP which uses a Java-based GraphPlan implementation.
¹ We considered using the PCRISP system as a baseline [22]. However, we could not get the system to compile, and we did not receive a response to our queries, so we were unable to use it.
Figure 5.1 Experimental comparison between STRUCT and CRISP: time to generate (seconds) vs. length of referring expression, for CRISP, STRUCT (initial solution), and STRUCT (final solution).
In these experiments, we use a deterministic grammar. Because the reward signal is fine-grained, a myopic action selection strategy is sufficient for these experiments, and the d parameter is set to zero. The number of simulations for STRUCT varies between 20 and 150. In most cases, a small n, under 100, is sufficient to guarantee generation success. The exploration constant c in Equation 2.1 is irrelevant when n is less than the size of the set of actions, since it applies only to actions selected after all open actions have already been tried once.
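To make the role of the exploration constant concrete, the following sketch (illustrative Python, not the STRUCT implementation; the node dictionary fields are assumptions) shows a UCB1-style selection rule in the spirit of Equation 2.1. Because every untried action is expanded before the UCB formula is ever applied, c has no effect until n exceeds the number of available actions.

    import math
    import random

    def ucb_select(children, c=1.0):
        # UCB1-style child selection (a sketch; the 'visits' and
        # 'total_reward' fields are assumptions, not the STRUCT code).
        untried = [ch for ch in children if ch["visits"] == 0]
        if untried:
            # Every untried action is expanded before the exploration term
            # matters, so c is irrelevant while the simulation count is
            # below the number of available actions.
            return random.choice(untried)
        parent_visits = sum(ch["visits"] for ch in children)
        return max(
            children,
            key=lambda ch: ch["total_reward"] / ch["visits"]
            + c * math.sqrt(math.log(parent_visits) / ch["visits"]),
        )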
5.1.1 Referring Expressions
We first evaluate CRISP and STRUCT on their ability to generate referring expressions. We follow prior work ([20]) in our initial experiment design. We consider a series of sentence generation problems which require the planner to generate a sentence like "The Adj1 Adj2 ... Adjk dog chased the cat.", where the string of adjectives distinguishes one dog (whose identity is specified in the problem description) from all other entities in the world. The experiment has two parameters: j, the number of adjectives in the grammar, and k, the number of adjectives necessary to distinguish the entity in question from all other entities.
Figure 5.2 Experimental comparison between STRUCT and CRISP: score of best solution vs. time (seconds).
We set j = k and show the results in Figure 5.1. We observe that CRISP achieved sub-second or near-sub-second times for all expressions of length less than 5, but its generation times increase exponentially past that point, exceeding 100 seconds for some plans at length 10. At length 15, CRISP failed to generate a referring expression; after 90 minutes the Java garbage collector terminated the process. STRUCTb performs much better and is able to generate much longer referring expressions without failing. In later experiments, referring expressions as long as 25 adjectives were generated successfully. STRUCTa performs similarly to CRISP asymptotically.
We can also observe the anytime nature of STRUCT from this experiment, as shown in Figure 5.2. Here we look at the score of the best solution found as a function of time, for k = 8, a mid-range scenario which both generators are able to solve relatively quickly (< 5 s). As expected, CRISP produces nothing until the end of its run, at which point it returns the solution. STRUCT (both versions) quickly produces a reasonable solution, "The dog chased the cat." This is then improved upon by adjoining until the referring expression is unambiguous. If the generation process were interrupted at any point, STRUCT would be able to return a solution that at least partially satisfies the communicative goal.
In this experiment, d was set equal to 1, since each action taken improved the sentence in a way measurable by our reward function. n was set equal to k(k + 1), since this is the number of adjoining sites available in the final step of generation, times the number of potential words to adjoin. This allows us to ensure successful generation in a single loop of the STRUCT algorithm.
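To make this experimental setup concrete, the sketch below shows one way a k-adjective referring-expression problem could be encoded in the world/goal format of Appendix A. The entity and predicate names are illustrative, not the actual benchmark files used in these experiments.

    def referring_expression_problem(k):
        # Sketch of a world (hypothetical encoding in the Appendix A
        # world/goal style) in which dog d1 needs all k adjectives to be
        # distinguished from the other dogs.
        world = ["chased(d1,c)", "dog(d1)", "cat(c)"]
        world += ["adj%d(d1)" % i for i in range(1, k + 1)]
        # One distractor dog per adjective: distractor i carries every
        # adjective except adj_i, so adj_i is required to rule it out.
        for i in range(1, k + 1):
            name = "e%d" % i
            world.append("dog(%s)" % name)
            world += ["adj%d(%s)" % (j, name)
                      for j in range(1, k + 1) if j != i]
        return {"world": world, "goal": "chased(d1,c)"}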
34
0
5
10
15
20
25
30
35
10 20 30 40 50 60
Tim
e to
Gen
erat
e (s
econ
ds)
Adjoining Grammar Size
CRISPSTRUCT
STRUCT (pruning)
Figure 5.3 Effect of grammar size
Figure 5.4 Effect of multiple and negated goals: time to generate (seconds) vs. number of goals, for positive goals and negative goals.
5.1.2 Grammar Size
We next evaluate STRUCT's and CRISP's ability to handle larger grammars. This experiment is set up in the same way as the one above, with the addition of l "distracting" words, words which are not useful in the sentence to be generated; l is defined as j − k. In these experiments, we vary l between 0 and 50. Figure 5.3 shows the results of these experiments. We observe that CRISP using GraphPlan, as previously reported in [20], handles an increase in the number of unused actions very well. Prior work reported a difference on the order of single milliseconds when moving from j = 1 to j = 10. We report similar variations in CRISP runtime as j increases from 10 to 60: runtime increases by approximately 10% over that range.
5.1.2.1 Without Pruning
STRUCT's performance with large grammars is similar to that of CRISP using the FF planner [27], also profiled in [20], which increased from 27 ms to 4.4 seconds over the interval from j = 1 to j = 10. STRUCT's performance is less sensitive to larger grammars than this, but over the same interval where CRISP increases from 22 seconds of runtime to 27 seconds, STRUCT increases from 4 seconds to 32 seconds. This is due almost entirely to the required increase in the value of n (the number of samples) as the grammar size increases. At the low end, we can use n = 20, but at l = 50, we must use n = 160 in order to ensure perfect generation as quickly as possible. Fortunately, since STRUCT is an anytime algorithm, valid sentences are available very early in the generation process regardless of the size of the set of adjoining trees (the "STRUCT Initial" curve in Figure 5.3). The time to find this initial solution does not change substantially with increases in grammar size; however, the time to improve it to the final solution does.
5.1.2.2 With Pruning
STRUCT's performance with large grammars improves dramatically if we allow for pruning (described in Chapter 4). This experiment involving distracting words is a perfect example of a case where pruning performs well. When we apply pruning, we find that STRUCT is able to completely ignore the effect of additional distracting words. Experiments showed roughly constant generation times for j = 1 through j = 5000. Although pruning is O(n) in the grammar size, repeated experiments failed to show any significant difference in runtime, even on very large grammars. The time taken by this algorithm depends on the complexity of the world and of the goals, and is independent of grammar size.
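The following is a minimal sketch of the pruning idea, assuming a lexicon in the Appendix A format and goals given as a list of literals; it is illustrative rather than the Chapter 4 implementation. A single pass removes words whose semantics mention predicates absent from both the world and the goals, which is why distracting words add only linear preprocessing cost.

    def prune_lexicon(lexicon, world, goals):
        # Sketch of pruning (illustrative only): drop lexicon entries whose
        # meaning mentions a predicate that appears in neither the world nor
        # the communicative goals, since such words can never help satisfy
        # the goal.  'goals' is assumed to be a list such as ["chased(d1,c)"].
        def predicate(literal):
            # "chased(subj,obj)" -> "chased"
            return literal.split("(")[0].replace("not ", "")

        relevant = set(predicate(f) for f in world) | set(predicate(g) for g in goals)
        pruned = {}
        for tree, entries in lexicon.items():
            pruned[tree] = [e for e in entries
                            if e["meaning"] is None
                            or predicate(e["meaning"]) in relevant]
        return pruned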
5.2 Evaluation of Complex Communicative Goals
In the next set of experiments, we illustrate that STRUCT can solve conjunctions of communicative goals as well as negated communicative goals.
5.2.1 Multiple Goals
We next evaluate STRUCT's ability to accomplish multiple communicative goals when generating a single sentence. In this experiment, we modify the problem from the previous section. There, the referred-to dog was unique, so it was possible to produce a referring expression which identified it unambiguously. In this experiment, we remove that condition by creating a situation in which the generator is forced to refer ambiguously to several dogs. We then add to the world a number of adjectives which are common to all of these possible referents. Since these adjectives do not further disambiguate their subject, our generator should not use them in its output. We then encode these adjectives as communicative goals, so that they will be included in the output of the generator despite not contributing to disambiguation. For example, assume we have two black cats, and we want to say that one of them is sleeping, but we want to emphasize that it is a black cat. We would have as our goals both "sleeps(c)" and "black(c)". We want the generator to say "the black cat sleeps", instead of simply "the cat sleeps".
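A possible encoding of this scenario, in the style of the Appendix A input format (the list-valued goal field is an assumption; the appendix example shows a single goal string), is:

    problem = {
        # Two black cats make any reference ambiguous; "black(c1)" is added
        # as a goal purely for emphasis, not for disambiguation.
        "world": ["cat(c1)", "black(c1)", "sleeps(c1)",
                  "cat(c2)", "black(c2)"],
        "goal": ["sleeps(c1)", "black(c1)"],
    }
    # Desired output: "the black cat sleeps" rather than "the cat sleeps".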
We find that, universally, these otherwise useless adjectives are included in the output of our generator, demonstrating that STRUCT successfully balances multiple communicative goals. As we show in Figure 5.4 (the "Positive Goals" curve), the presence of additional satisfiable semantic goals does not substantially affect the time required for generation. We accomplish this task with the same very high success rate as in the CRISP comparisons, as we use the same parameters.
5.2.2 Negated Goals
We now evaluate STRUCT's ability to generate sentences given negated communicative goals. We again modify the problem used earlier by adding to our lexicon several new adjectives, each applicable only to the target of our referring expression. Since our target can now be referred to unambiguously using only one adjective, our generator should simply select one of these new adjectives (this has been experimentally confirmed). We then encode these adjectives as negated communicative goals, so that they will not be included in the output of the generator, despite allowing a much shorter referring expression. For example, assume we have a tall spotted black cat, a tall solid-colored white cat, and a short spotted brown cat, but we want to refer to the first one without using the word "black".
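A possible encoding of this scenario, again in the style of Appendix A (the list-valued goal field and the "not ..." syntax for negated goals are assumptions for illustration), is:

    problem = {
        "world": ["cat(c1)", "tall(c1)", "spotted(c1)", "black(c1)",
                  "cat(c2)", "tall(c2)", "white(c2)",
                  "cat(c3)", "spotted(c3)", "brown(c3)",
                  "chased(c1,d)", "dog(d)"],
        # "black(c1)" alone would disambiguate c1, but the negated goal
        # forbids using it, forcing a longer referring expression such as
        # "the tall spotted cat".
        "goal": ["chased(c1,d)", "not black(c1)"],
    }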
We find that these adjectives, which would otherwise have been selected immediately, are omitted from the output, and that the sentence generated is the best possible under the constraints. This demonstrates that STRUCT balances these negated communicative goals against its positive goals. Figure 5.4 (the "Negative Goals" curve) shows the impact of negated goals on the time to generation. Since this experiment alters the grammar size, we see the time to final generation growing linearly with grammar size. The increased time to generate can be traced directly to this increase in grammar size. This is a case where pruning does not help us reduce the grammar size: we cannot optimistically prune out words that we do not plan to use, since doing so might reduce STRUCT's ability to produce a sentence which partially fulfills its goals.
5.2.3 Nested Subclauses
Here, we evaluate STRUCTa's ability to generate sentences with nested subclauses. An example of such a sentence is "The dog which ate the treat chased the cat". This is a difficult sentence to generate for several reasons. The first, and clearest, is that there are words in the sentence which do not help to increase the score assigned to the partial sentence. Notably, we must adjoin the word "which" to "the dog" during the portion of generation where the sentence reads "the dog chased the cat". This decision requires us to plan deeper than one level in the MDP, which increases the number of simulations STRUCT requires to find the best possible result by O(N^d).
38
0
200
400
600
800
1000
1200
2 4 6 8 10 12 14 16 18
Sco
re
Time (seconds)
Generated ScoreBest Available Score
Figure 5.5 Reward function output over time as STRUCT generates a sentence with a nestedsubclause. Decreases in reward after t=4 and t=12 are the STRUCT algorithm choosing a planwhich is locally suboptimal but which it suspects is globally optimal (in this case, adding the
word ‘which’ to the sentence).
must adjoin the word “which” to “the dog” during the portion of generation where the sentence
reads “the dog chased the cat”. This decision requires us to do planning deeper than one level
in the MDP, which increases by O(Nd) the number of simulations STRUCT requires in order
to get the best possible result.
Despite this issue, STRUCT is capable of generating these sentences. As we can see in
Figure 5.5, STRUCT’s time to generate increases with the number of nested clauses. To the best
of our knowledge, CRISP is not able to generate sentences of this form due to an insufficiency
in the way it handles TAGs, and consequently we present our results without baselines. We
present results only for STRUCTa here, since STRUCTb is not capable of generating sentences
using indirection.
In this case, we require lookahead further into the tree than depth 1. We need to know that using "which" will allow us to further specify which dog chased the cat; in order to do this we must use at least d = 3, since our reward function can only register the improvement once, at a minimum, the actions corresponding to "which", "ate", and "treat" have all been applied.
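The following sketch (illustrative Python, not the STRUCT code; the actions, apply_action, and reward callables are assumptions) shows why a depth-limited lookahead is needed here: with d = 1 the adjunction of "which" appears worthless, and only a rollout of depth 3 or more can reach the higher-reward state containing "which ... ate ... treat".

    import random

    def rollout_value(state, depth, actions, apply_action, reward):
        # Depth-limited lookahead: apply up to 'depth' random actions and
        # return the best reward seen along the way.
        best = reward(state)
        for _ in range(depth):
            applicable = actions(state)
            if not applicable:
                break
            state = apply_action(state, random.choice(applicable))
            best = max(best, reward(state))
        return best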
39
0
1
2
3
4
5
6
1 2 3 4 5
Tim
e to
Gen
erat
e (s
econ
ds)
Number of Sentences
STRUCT (1 entity)CRISP (1 entity)
Figure 5.6 Time to generate sentences with conjunctions with one entity (“The man sat andthe girl sat and ....”). Note that STRUCT performs slightly worse than CRISP with one entity.
Figure 5.7 Time to generate (seconds) vs. number of conjoined sentences, with two entities ("The dog chased the cat and ..."), for STRUCT and CRISP. Note that STRUCT performs much better than CRISP with two entities.
5.2.4 Conjunctions
Here, we evaluate STRUCTb's ability to generate sentences including conjunctions. We introduce the conjunction "and", which allows the root nonterminal of a new sentence ('S') to be adjoined to any other sentence. We then provide STRUCT with multiple goals. Given sufficient depth for the search (d = 3 was determined to be sufficient, as our reward signal is fine-grained), STRUCT will produce two sentences joined by the conjunction "and". Here, we follow prior work in our experiment design [20].
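One plausible way to express such a conjunction rule in the Appendix A grammar/lexicon format is sketched below; the exact tree string and the name "a.conj" are hypothetical, since the appendix grammar shown there does not include the conjunction entry.

    # Hypothetical auxiliary tree that adjoins a fresh S onto an existing
    # sentence, joining the two clauses with "and".
    conjunction_grammar_entry = {
        "a.conj": "(S (S*-self) (Conj+) (S-second))"
    }
    conjunction_lexicon_entry = {
        "a.conj": [{"word": "and", "meaning": None}]
    }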
40
0
10
20
30
40
50
60
70
1 2 3 4 5
Tim
e to
Gen
erat
e (s
econ
ds)
Number of Sentences
STRUCT, 3 entities
Figure 5.8 Time to generate sentences with conjunctions with three entities (“The man gavethe girl the book and ....”). CRISP cannot generate sentences with conjunctions and threeentities; STRUCT is much less sensitive to a change in the number of entities than CRISP.
As we can see in Figures 5.6, 5.7, and 5.8, STRUCT successfully generates results for conjunctions of up to five sentences. This is not a hard upper bound, but generation times begin to be impractically large at that point. Fortunately, human language tends toward shorter discourse units than these unwieldy (but technically grammatical) sentences. STRUCT's generation time increases both as the number of sentences increases and as the number of objects per sentence increases. We show results for STRUCTa here, as our output should contain only simple sentences without nesting, and because STRUCTb is exponential in the number of entities in the sentence, which would cause impractically large generation times for this experiment.
We also compare our results to those presented in [20] for CRISP with the FF planner. Koller et al. attempted to generate sentences with three entities and failed to find a result within their 4 GB memory limit. As we can see, CRISP generates a result slightly faster than STRUCT when we are working with a single entity, but is much slower with two entities and cannot generate results at all with three. According to Koller's findings, this is because the search space grows by a factor of the universe size with the addition of another entity [20].
5.3 Summary
In summary, we have implemented the STRUCT algorithm in Python 2.7 and tested it in a variety of situations, including several which are difficult for CRISP and which challenge our generator's ability to prune, to generate complex referring expressions, to generate nested clauses, and to generate long and very complex sentences. In all these cases, STRUCT was able to accomplish the communicative goals without trouble, in a reasonable amount of time. We found that, in general, STRUCT's asymptotic behavior is comparable to CRISP's, but in the region where most language generation takes place, STRUCT performs approximately as well as or better than CRISP.

We have also shown that STRUCT is capable of generating language that CRISP simply cannot (for example, very long referring expressions and nested subclauses), and that STRUCT is capable of anytime generation. These properties make STRUCT a desirable natural language generator, even compared to the state-of-the-art generator we used as a baseline.
Chapter 6
Conclusion and Future Work
In this chapter, we discuss some of the strengths and weaknesses of STRUCT and describe directions for future work.

We believe that we have produced an algorithm which unifies the two approaches to NLG discussed in Chapter 3. We have fused the probabilistic reasoning and use of domain knowledge from the probabilistic ranking method with the conversion to a planning problem and structured approach to semantics from the classical planning approach. We thereby gain the advantage of partial goal satisfaction from "overgenerate and rank" and the advantage of explicit specification and output of semantic meaning from classical planning.
We have avoided the all-or-nothing weakness of classical planning, where a classical planner cannot emit a sentence which does not optimally satisfy the goal. At the same time, STRUCT allows generation for complex goals, which would be intractable with a probabilistic ranking generator, and we have avoided the requirement of specifying a large percentage of our output as input, which is a problem with probabilistic ranking.

In short, we have constructed an algorithm with many of the strengths of both classical planning and probabilistic ranking, and few of the weaknesses of either.
STRUCT's prime strength is its ability to partially satisfy goals in cases where perfectly correct generation is impossible, either because of conflicting goals or because of a grammar which cannot express all of the goals. This allows STRUCT to be used even without perfect knowledge of the domain: STRUCT can recover from some of the issues that come with unknown domains (such as a grammar which is too small, or insufficient knowledge of the world) and continue attempting to generate close-to-optimal output, where other generators would either fail after churning on the problem for a long time or emit something nonsensical.
Another important strength of STRUCT is its anytime nature, something we have not seen in prior generators. STRUCT is capable of creating an approximate solution very quickly (usually in less than 0.5 seconds), and that approximate solution can be emitted if the user desires an immediate response. Otherwise, the solution is iteratively improved until it reaches a sentence which fulfills all of the communicative goals given.
One important weakness of STRUCT is the rapid increase in generation time past certain
limits. STRUCT has trouble generating referring expressions of length 10 or more, and has
trouble generating sentences which adjoin more than about 5 verbs.
STRUCT is also limited by its grammar. It is difficult to produce an LTAG grammar for English, and any such grammar will, by nature, overgenerate (the language specified by the grammar will contain some constructs not acceptable in English). It is best to build a grammar which undergenerates (the language specified by the grammar is a subset of acceptable English), but that can be difficult without knowing exactly which constructs will be necessary.
Due to some peculiarities of the Python interpreter, we were unable to efficiently parallelize UCT during the search phase of generation. This would be relatively straightforward using a worker pool, and it could drastically shrink the amount of time spent in that search.
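As a sketch of what such a worker pool might look like (root parallelization; the uct_search function, its seed parameter, and its return format mapping root actions to (visits, total reward) pairs are all assumptions), consider:

    from multiprocessing import Pool

    def run_uct(args):
        # Worker: run an independent batch of simulations and return
        # per-action statistics (uct_search is assumed, not defined here).
        state, num_simulations, seed = args
        return uct_search(state, num_simulations, seed=seed)

    def parallel_uct(state, num_simulations, workers=4):
        # Split the simulation budget across a process pool (sidestepping
        # the interpreter lock), then merge the per-action statistics.
        pool = Pool(workers)
        try:
            results = pool.map(run_uct,
                               [(state, num_simulations // workers, s)
                                for s in range(workers)])
        finally:
            pool.close()
            pool.join()
        merged = {}
        for stats in results:
            for action, (visits, total_reward) in stats.items():
                v, r = merged.get(action, (0, 0.0))
                merged[action] = (v + visits, r + total_reward)
        # Choose the root action with the highest mean reward.
        return max(merged, key=lambda a: merged[a][1] / merged[a][0])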
We considered incremental computation of the reward functions in order to address STRUCTa being relatively slow. Since most of the computation done in the reward function is repeated between runs, we expected that storing the shared computation might increase speed. In our experiments, we discovered that constant factors (the time to hash the entire partial tree) dominated the time saved.

We also considered caching of reward function values. Due to the way STRUCT performs its search, it often runs simulations which result in the same partial sentence. We expected that we could save computation by storing the results and looking them up later. Again, we found that constant factors dominated the computation saved. Both of these failures could likely be overcome by a different choice of implementation language or methodology.
Due to the nature of STRUCT's simulations, it often reaches the same state from two different parents. Because of the way that UCT works, this occurrence is treated as reaching two distinct states. Although there are reasonable gains to be made in avoiding duplicated work by saving those states and backing up their rewards to all of their parent states, we again found that the hashing and lookup time dominated the computation saved, and that no notable quality gains were made by using this backup methodology.
We could consider using an alternative approach to adapting UCT to a DAG. Instead of treating states reached from differing parents as different states, we could keep track of every parent through which a state was reached and update those parents accordingly. This would allow us to reduce the number of simulations needed for high-depth experiments, especially experiments where adjoin operations are performed repeatedly.
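A minimal sketch of this idea, assuming hashable state keys and ignoring the rest of the UCT machinery, might look like the following; it records every parent of a shared node and backs rewards up through all of them.

    class SharedNode(object):
        # Transposition-table node (illustrative, not the STRUCT code).
        def __init__(self, state_key):
            self.state_key = state_key
            self.parents = set()
            self.visits = 0
            self.total_reward = 0.0

    transposition_table = {}

    def get_node(state_key, parent=None):
        # Reuse the existing node when a state is reached from a new
        # parent, instead of creating a duplicate subtree.
        node = transposition_table.setdefault(state_key, SharedNode(state_key))
        if parent is not None:
            node.parents.add(parent)
        return node

    def backup(node, reward):
        # Back the reward up through every recorded ancestor, not just the
        # path taken by the current simulation.
        frontier, seen = [node], set()
        while frontier:
            n = frontier.pop()
            if n in seen:
                continue
            seen.add(n)
            n.visits += 1
            n.total_reward += reward
            frontier.extend(n.parents)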
We may also be able to use STRUCT as the output generator of a dialog system, similar to
NJFun [3], instead of the template-based generation that most such systems employ. STRUCT
would be substantially more flexible in its output than a template-based system, which can
usually only respond to preprogrammed error cases. STRUCT’s ability to partially accomplish
communicative goals would be valuable in this application.
We could also consider using a different semantic language. Our choice of first-order logic
predicates was made following Koller’s work, and allowed us to easily compare to CRISP.
However, other work has suggested a lambda-calculus semantic model [28] which we feel
holds promise. Initial experimentation has suggested that drastic speed increases may result
from this choice of semantic representation.
We are also interested in further work on reward functions. The reward functions we have created are good for generating language based on facts about the world, and would serve for goal-directed communication, but those are not the only possible goals of natural language generation. We could, for example, learn a reward function from a corpus and attempt to emulate that corpus in our output, or build a hierarchical reward function which attempts to accomplish secondary or tertiary goals in addition to a primary one. We should explore the possibilities that the generality of our architecture affords us.
In conclusion, we have presented an algorithm which performs natural language generation in a well-principled way, unifying two popular schools of thought regarding NLG. Experimental evaluation shows that it performs as well as the state of the art in the field, and its nature allows for a substantially greater set of use cases than either of the popular schools it synthesizes.
APPENDIX
Example Grammars
A.1 Basic Experiment
Figure A.1 Grammar for the basic experiment
"grammar":
{
    "i.nvn": "(S (NP-subj) (VP (V+-self) (NP-obj)))",
    "i.np": "(NP (D) (N+-self))",
    "i.d": "(D+-self)",
    "i.cv": "(V+-self)",
    "a.ad": "(N (A+) (N*-self))",
    "a.sub": "(N (N*-self) (PP (P+-clause) (VP (V-clauseverb|self,clauseobj) (NP-clauseobj))))"
}
Figure A.2 Lexicon for the basic experiment
"lexicon":
{
    "i.nvn": [{"word": "chased", "meaning": "chased(subj,obj)"}, {"word": "ate", "meaning": "ate(subj,obj)"}],
    "i.d": [{"word": "the", "meaning": None}, {"word": "a", "meaning": None}],
    "i.np": [{"word": "cat", "meaning": "cat(self)"}, {"word": "dog", "meaning": "dog(self)"}, {"word": "treat", "meaning": "treat(self)"}],
    "a.ad": [],
    "i.cv": [{"word": "chased", "meaning": "chased(self)"}],
    "a.sub": [{"word": "which", "meaning": None}]
}
Figure A.3 World and Goal for the basic experiment
{
    "world": ["chased(d1,c)", "dog(d1)", "cat(c)"],
    "goal": "chased(d1,c)"
}
Figure A.4 Example input for Halogen; appears in [1]
(A / |have the quality of being|
    :DOMAIN (P / |procure|
        :AGENT (A2 / |American|)
        :PATIENT (G / |gun, arm|))
    :RANGE (E / |easy, effortless|))
LIST OF REFERENCES
[1] K. Knight and V. Hatzivassiloglou, "Two-level, many-paths generation," in Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, pp. 252–260, Association for Computational Linguistics, 1995.
[2] L. Goasduff and C. Pettey, "Gartner says worldwide smartphone sales soared in fourth quarter of 2011 with 47 percent growth," Gartner, Inc, vol. 15, 2012.
[3] D. Litman, S. Singh, M. Kearns, and M. Walker, "NJFun: a reinforcement learning spoken dialogue system," in Proceedings of the 2000 ANLP/NAACL Workshop on Conversational Systems - Volume 3, pp. 17–20, Association for Computational Linguistics, 2000.
[4] L. Kocsis and C. Szepesvari, "Bandit based Monte-Carlo planning," in ECML-06, no. 4212 in LNCS, pp. 282–293, Springer, 2006.
[6] R. Bellman, Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, vol. 1. Cambridge Univ Press, 1998.
[8] M. Ghallab, D. Nau, and P. Traverso, Automated Planning: Theory & Practice. Access Online via Elsevier, 2004.
[9] A. L. Blum and M. L. Furst, "Fast planning through planning graph analysis," Artificial Intelligence, vol. 90, no. 1, pp. 281–300, 1997.
[10] R. Brafman and M. Tennenholtz, "R-max: a general polynomial time algorithm for near-optimal reinforcement learning," The Journal of Machine Learning Research, vol. 3, pp. 213–231, 2003.
[11] M. Kearns, Y. Mansour, and A. Ng, "A sparse sampling algorithm for near-optimal planning in large Markov decision processes," in International Joint Conference on Artificial Intelligence, vol. 16, pp. 1324–1331, Lawrence Erlbaum Associates Ltd, 1999.
[12] S. Gelly and Y. Wang, "Exploration exploitation in Go: UCT for Monte-Carlo Go," 2006.
[13] J. W. Cowan, The Complete Lojban Language. Logical Language Group, 1997.
[14] "Context free grammars."
[15] I. Langkilde-Geary, "An empirical verification of coverage and correctness for a general-purpose sentence generator," in Proceedings of the 12th International Natural Language Generation Workshop, pp. 17–24, Citeseer, 2002.
[16] A. Koller and M. Stone, "Sentence generation as a planning problem," in Annual Meeting - Association for Computational Linguistics, vol. 45, p. 336, 2007.
[17] S. M. Shieber, "A uniform architecture for parsing and generation," in Proceedings of the 12th Conference on Computational Linguistics - Volume 2, COLING '88, (Stroudsburg, PA, USA), pp. 614–619, Association for Computational Linguistics, 1988.
[18] M. Kay, "Chart generation," in Proceedings of the 34th Annual Meeting on Association for Computational Linguistics, ACL '96, (Stroudsburg, PA, USA), pp. 200–204, Association for Computational Linguistics, 1996.
[19] M. Stone, C. Doran, B. Webber, T. Bleam, and M. Palmer, "Microplanning with communicative intentions: The SPUD system," Computational Intelligence, vol. 19, no. 4, pp. 311–381, 2003.
[20] A. Koller and R. P. A. Petrick, "Experiences with planning for natural language generation," Computational Intelligence, vol. 27, no. 1, pp. 23–40, 2011.
[21] E. Reiter and R. Dale, Building Natural Language Generation Systems. Cambridge University Press, Jan. 2000.
[22] D. Bauer and A. Koller, "Sentence generation as planning with probabilistic LTAG," in Proceedings of the 10th International Workshop on Tree Adjoining Grammar and Related Formalisms, New Haven, CT, 2010.
[23] M. Fox and D. Long, "PDDL2.1: An extension to PDDL for expressing temporal planning domains," Journal of Artificial Intelligence Research (JAIR), vol. 20, pp. 61–124, 2003.
[24] A. Blum and M. Furst, "Fast planning through planning graph analysis," Artificial Intelligence, vol. 90, no. 1, pp. 281–300, 1997.
[25] I. Langkilde and K. Knight, "Generation that exploits corpus-based statistical knowledge," in Proceedings of the 17th International Conference on Computational Linguistics - Volume 1, pp. 704–710, Association for Computational Linguistics, 1998.
[26] D. Bauer, "Statistical natural language generation as planning," Proceedings of the 44th Annual Meeting on Association for Computational Linguistics, 2009.
[27] J. Hoffmann and B. Nebel, "The FF planning system: fast plan generation through heuristic search," J. Artif. Int. Res., vol. 14, pp. 253–302, May 2001.
[28] Y. W. Wong and R. J. Mooney, "Learning synchronous grammars for semantic parsing with lambda calculus," in Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007.