Honours Project 2 - Carleton University (people.scs.carleton.ca/~arpwhite/documents/honours...)
Natural Language Processing Chatbot for Stock Quotes Honours Project Report Matt Austin - 264681
Abstract
This project involves an application that runs in the background, waits for
incoming messages, and responds with an answer. This application is called a
“chatbot”, and it responds to English questions about stock quotes. A chatbot is a
computer program that runs without human interaction and replies to messages that are
sent to it. The word “chatbot” is short for “chatting robot”. This chatbot combines the
functionality of the Jabber protocol for messaging, Alice for rule-based natural language
processing, and Cocoa for the user interface. It runs under Mac OS X and has been
verified to run on version 10.2.2 of the OS.
Acknowledgements
The author would like to thank Dr. Tony White for all of his advice and guidance
throughout this project. The author would also like to thank all the people who worked on
the Alice tool, especially the ones who created J-Alice.
Table of Contents:
1) Introduction
   1.1) Chatbot
   1.2) Jabber
   1.3) Alice
   1.4) Cocoa
2) Alice
   2.1) How Alice Works
   2.2) Rules
   2.3) AIML
3) Jabber
   3.1) Architecture
   3.2) Message Example
4) Chatbot
   4.1) Purpose
   4.2) Choice of Technology
5) User Interface
6) Program Flow
   6.1) Receiving a Question
      6.1.1) Connecting to the Jabber Server
      6.1.2) Receiving a Message
      6.1.3) Parsing the Message
   6.2) Processing the Question
      6.2.1) Interacting with Alice
      6.2.2) Stock Handler
      6.2.3) Rules for Stock Handler
   6.3) Replying With an Answer
7) Testing
   7.1) What was expected
   7.2) Connecting to the Jabber Server
   7.3) Responding to Messages
   7.4) Results
8) Conclusion
   8.1) Future Work
   8.2) Bugs
9) References
10) Licenses
11) Appendix A
List of Figures:
FIGURE 1 - Screen shot of the chatbot user interface.
FIGURE 2 - Screen shot of the chatbot after connecting to the server.
FIGURE 3 - Two messages received while connecting to the server.
FIGURE 4 - Screen shot of the chat session within Fire.
FIGURE 5 - A screen shot showing the received messages.
FIGURE B-1 - Simple system overview.
FIGURE B-2 - UML overview of the chatbot architecture.
FIGURE B-3 - Message Sequence Chart for Chatbot Connection
FIGURE B-4 - Message Sequence Chart for Incoming Messages
FIGURE B-5 - Message Sequence Chart for Alice Response
1) Introduction
This project is a chatbot that responds to queries about stock prices and returns
the current price of the requested stock. It contains a User Interface (UI) that was written
in Cocoa using Objective-C. In addition to the UI, the chatbot contains code that works with
the Jabber Instant Messaging (IM) protocol, and Alice for natural language processing. A
simple overview of the system can be seen in Fig. B-1.
1.1) Chatbot
A chatbot is a program that runs in the background on a networked computer and
waits for messages to be sent to it. Once a message is received from a user,
the chatbot decides what to respond with, and sends a message back to the user. This
way, the program can run unattended, choosing its own responses without human
interaction or supervision.
1.2) Jabber
Jabber is an open-source Instant Messaging protocol that is based on XML.
Jabber has other attractive features, such as: the server is free, it has transports
that allow it to work with other IM schemes, and the protocol is simple. The
transports for MSN, ICQ, etc. are not used in this project and therefore will not be
discussed.
1.3) Alice
Alice (Artificial Linguistic Internet Computer Entity) is an open-source program
that does natural language processing using rules. It is mostly used as a robot for
chatting. It uses AIML (Artificial Intelligence Markup Language), which is an XML-
compliant language for the rules. There is a version called J-Alice that is written in C++,
which is the version used for this project.
1.4) Cocoa
Cocoa is a framework from Apple that runs under Mac OS X. Cocoa was
developed from OpenStep and is therefore tied in with Objective-C. Objective-C is an
object-oriented language that is very similar to C++. The other language that works with
Cocoa is Java, but since the Jabber and Alice code is in C++, Objective-C is a good fit
because it can work in unison with C++. In addition, the development tools for Cocoa
are free and included with Mac OS X.
2) Alice
The next three sections describe how Alice works, AIML files, and the rules that
are inside these files.
2.1) How Alice Works
The internal workings of Alice are composed of two main items. The first is
the Kernel; the second is the set of Handlers that work with the additional XML tags. The
kernel is responsible for loading the AIML files, which contain the rules, and for processing
statements. In the chatbot, a new kernel is created when the application is launched. This
new kernel reads in the specified AIML files. Once the files are read in, Alice is ready to
match up statements to rules, and provide an answer. When a statement is passed into
Alice, it will match it up to a rule, and based on the information in the rule, provide a
logical response.
2.2) Rules
Once the AIML files are read in, each rule is “learned”. When Alice learns a rule,
it is able to match the rule up to a message that was received from an outside
user. A rule is matched by checking whether the pattern in the rule fits the
message. For example, if the rule is “_ school”, which matches up the word
“school” after the beginning of a sentence, Alice would match it up to the message “I am
at school”. This is true since the word “school” is after the beginning of the sentence. By
having multiple rules, Alice can give a meaningful answer to a wide range of questions
about the same topic.
There are two main symbols that are used for most of the rules for the chatbot: the
‘*’ and ‘_’ symbols. The ‘*’, or wildcard symbol, tells Alice to match up any word
for this symbol. So the rule “I like *” would match up to “I like computers” or “I like
skiing”. The ‘_’ symbol can be used at the beginning or the end of a rule. If it is used at
the beginning, like the rule “_ dogs”, Alice will match up any sentence that has the word
“dogs” after the beginning of the sentence. If it is used at the end of a rule, like “_
school”, Alice will match up any sentence that ends with the word “school”. The
‘*’ and ‘_’ symbols can be used together in a rule, but Alice only supports one ‘*’ per
rule.
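The matching behaviour described in this section can be illustrated with a short Python sketch. It is not the J-Alice implementation; it is a minimal approximation that assumes each ‘*’ or ‘_’ stands for one or more whole words, ignores case and punctuation, and does not model rule priority or the one-‘*’ limit. The function names pattern_to_regex and matches are hypothetical and chosen only for this illustration.

```python
import re

def pattern_to_regex(pattern):
    """Translate an AIML-style pattern into a compiled regular expression."""
    parts = []
    for token in pattern.split():
        if token in ("*", "_"):
            # Assumption: a wildcard stands for one or more whole words.
            parts.append(r"\S+(?:\s+\S+)*")
        else:
            # Literal words must appear exactly (case is ignored below).
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts), re.IGNORECASE)

def matches(pattern, message):
    """Return True if the whole message fits the pattern."""
    # Replace punctuation with spaces and collapse runs of whitespace.
    words = re.sub(r"[^\w\s]", " ", message).split()
    return pattern_to_regex(pattern).fullmatch(" ".join(words)) is not None

if __name__ == "__main__":
    print(matches("_ school", "I am at school"))   # word "school" after the beginning
    print(matches("I like *", "I like computers"))
    print(matches("I like *", "I like"))           # nothing left for the wildcard
```

Under these assumptions, “_ school” fits “I am at school” but not the one-word message “school”, because the wildcard must consume at least one word.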
2.3) AIML
The files that contain the rules are called AIML files. The files are
based on XML and contain information about how to handle specific questions. For
the chatbot, there are four important tags that are used in the AIML files. An example
of an AIML file containing one rule, and the meaning of each tag, are as follows:
<aiml version="1.0">
<category>
<pattern>_ happy *</pattern>
<template>Why are you happy?</template>
</category>
</aiml>
<aiml> - This tag indicates that the file is of type AIML and the version is 1.0.
<category> - This tag tells Alice that everything inside of it is a unit of knowledge which
is also referred to as a rule.
<pattern> - This tag holds the pattern that must be matched for the rule to apply. In this
case, if the question passed in contains the word “happy” somewhere other than the beginning
of the sentence, the result “Why are you happy?” will be passed back to the user.
<template> - This tag contains the answer that Alice will pass back to the user if the rule
is matched up with the question.
There are many more tags available to use, such as the <srai> tag for recursive pattern
matching, but since they were not in the scope of this project, they will not be discussed
here. More information can be found on the web at http://www.alicebot.org.
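As an illustration of how a file with these four tags could be read, the following Python sketch parses a one-rule AIML document with the standard library XML parser and collects each pattern and template pair. It is not how the J-Alice kernel loads files; the function name load_categories and the constant SAMPLE_AIML are hypothetical, and the sample is written with straight quotes and a closed <template> element so that it parses as XML.

```python
import xml.etree.ElementTree as ET

# One-rule AIML document used only for this illustration.
SAMPLE_AIML = """<aiml version="1.0">
  <category>
    <pattern>_ happy *</pattern>
    <template>Why are you happy?</template>
  </category>
</aiml>"""

def load_categories(aiml_text):
    """Return a list of (pattern, template) pairs, one per <category>."""
    root = ET.fromstring(aiml_text)
    rules = []
    for category in root.iter("category"):
        # findtext returns the text of the first matching child element.
        pattern = category.findtext("pattern", default="").strip()
        template = category.findtext("template", default="").strip()
        rules.append((pattern, template))
    return rules

if __name__ == "__main__":
    for pattern, template in load_categories(SAMPLE_AIML):
        print(pattern, "->", template)
```

The sketch reads only the plain text of each <template>; templates that contain further tags, such as <srai>, would need additional handling.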
3) Jabber
The next two sections describe how Jabber works and give an example of an instant
message.
3.1) Architecture
Jabber uses a client-server architecture as opposed to the client-client architecture
that some other IM systems use. This enables a Jabber user to message other users who are
not on the same Jabber server. When a user is ready to log in to a server, a TCP/IP
connection is made on port 5222. This connection will stay alive until the user logs off.
When a message arrives on the server for a user, the server sends the message to the
user’s client. This means that the client does not have to poll the server to see if there
are messages waiting.
This reduces the amount of traffic flowing to the clients. Because each user could be on a
different server, the user’s address is their username@servername. For example, it
could be [email protected]. All messages that are exchanged between the server
and clients, including connection, registration, and instant messages, use Jabber’s XML
protocol.
If two users are communicating with each other, but they are on different servers, Jabber
ensures that the message goes to the right person. Suppose there are two users, user1 and
user2, and they are on server1.com and server2.com respectively. If user1@server1.com
sends a message to user2@server2.com, server1 will connect with server2 and deliver the
message. server2 will then forward the message on to user2.
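The addressing and forwarding decision described above can be sketched in Python as follows. This is only an illustration of the routing idea, not code from a Jabber server: the names Address, parse_address, and next_hop are hypothetical, and the sketch ignores the optional resource part that a full Jabber address may carry.

```python
from typing import NamedTuple

class Address(NamedTuple):
    user: str
    server: str

def parse_address(address):
    """Split a username@servername address into its two parts."""
    user, sep, server = address.partition("@")
    if not sep or not user or not server:
        raise ValueError(f"not a username@servername address: {address!r}")
    return Address(user, server)

def next_hop(local_server, recipient):
    """Describe what a server would do with a message for the recipient."""
    target = parse_address(recipient)
    if target.server.lower() == local_server.lower():
        # The recipient is registered here: push the message to the client.
        return f"deliver to client of {target.user}"
    # Otherwise hand the message to the recipient's own server.
    return f"forward to server {target.server}"

if __name__ == "__main__":
    # server1.com receives a message addressed to user2@server2.com.
    print(next_hop("server1.com", "user2@server2.com"))
    # server2.com then receives the same message.
    print(next_hop("server2.com", "user2@server2.com"))
```

In the two-server example from the text, the first call corresponds to server1 connecting to server2, and the second to server2 passing the message on to user2.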
3.2) Message Example
In the Jabber protocol, there are two different types of instant messages that can
be passed between the server and client once the client has logged in. The first is a single
instant message, and the second is a chat message. A single message is exactly what its
name suggests: one message that is not part of a group of messages. The message is sent
and the client waits for a response. A chat message, on the other hand, is part of a chat
session, where each message is part of a group of messages that each user can see. The
advantage of chatting is that you don’t need to fill out a new message each time you want
to say something. The chat window stays open and all communications between users stay
visible until the session is over.
For the example, a single instant message will be used since it is very similar to a
chat message but contains a little more information. Here is a sample message, and a