Word Vectorisation for Long-Short-Term-Memory (LSTM) model ...

NATIONAL UNIVERSITY OF SINGAPORE

Department of Physics

Word Vectorisation for Long-Short-Term-Memory

(LSTM) model on Chatbot and Analysis of Model’s

Dynamical Patterns

Lim Yanxiang Louis

A0140537L

Supervisors:

Orkan Arkan (Director of EY Data & Analytics)

Dr Hong Cao (Head of Data Science, EY)

Dr Feng Ling (Assistant Professor in NUS)

5th April 2019

Page i

Abstract Word Vectorization for Long-Short-Term-Memory (LSTM) model on Chatbot and Analysis of Model’s

Dynamical Patterns

Lim Yanxiang Louis, National University of Singapore (Singapore)

Chat services are needed in almost every business. Industries are using rule based chatbots to automate

chat services, however, they are faced with limitations. In this study, we build a generative based

chatbot using the Ubuntu Dialogue Corpus. This type of chatbot have the potential to answer all

technical questions about the Ubuntu operating system. We first analysed the corpus and sent it

through a Natural Language Processing (NLP) pipeline. We applied 3 different pre-trained word

embedding models (word2vec, GloVe and fastText), which vectorizes the words corpus and trained it on

a Long Short-Term Memory (LSTM) model. We studied the results of the embeddings and found that the

weight distribution became more heterogeneous during the training process. GloVe performed the best

in terms of accuracy for both fundamental and technical analysis. We tried to draw analogies between

neural activations and Ising spins by analysing the distribution of the model's activation.

Page ii

Acknowledgements An end to my formal education leads to the start to of my career. This is at this moment when life

begins.

It was an insane final year.

Before the semester started, I had to go through a lot of administrative work to coordinate between

NUS and EY, to understand the criteria of the final year project since the scope of this project is

unconventional to the physics department. All these were done when I was still in Munich, Germany

doing my exchange program. I had to consider the time difference when making a call to back to

Singapore to discuss about the project.

In the first semester, more than 12 hours were spent almost every weekday in the science library to

learn more about data science. I went from thinking that “pandas” referred to an adorable animal in

China to using this library almost every day.

In the second semester, I was fortunate enough find an internship in e-commerce company, Castlery,

where I worked 3 days a week in the business intelligence/ data analyst team, while balancing school

and this final year project.

The decision to learn more about data science put me out of my comfort zone and made life a lot

tougher than how it already was. However, I am very thankful that I have done it, broadening my

knowledge on this field.

To my supervisors at EY,

Orkan, thank you for introducing the world of data science to me, advising me on the initial steps and

planning such an interesting and relevant industrial project for me.

Dr Hong Cao, thank you for voluntarily joining the project. You got me to think about the implications of

data science from a business perspective and how to make my project valuable to both the academic

and industrial setting.

To my co-supervisor, Dr Feng Ling, thank you for spending so much time and effort on top of your busy

schedule to make sure that the project was on track. Really enjoyed the times when we brain-stormed

on ideas to make the project more interesting by including physics elements to the project.

A special thanks to my physics and Sheares hall senior, Jia Hui, although you have graduated even

before I joined NUS, you still came back to help the juniors. This includes imparting your knowledge on

data science, patiently guiding me to understand the theories I applied in this project and assisting me

whenever I had any problem with my codes.

To my mentor at Castlery, Manuel, thank you for making the internship such a valuable one, imparting

relevant knowledge to me. Appreciate the times we had small talks during lunch and after work,

discussing about the project and giving me some of your thoughts and feedbacks. You have been an

inspiration for me to learn more about data science.

Page iii

My friends in physics, notably Sherman, Abby and Jasper (semi physics), thank you for camping in the

library with me for crazy hours and finishing off the day with Bishan chicken rice. Not to forget Waxin

and Shouzan who were always there to entertain my nonsense.

I would like to thank all my friends and family who have been understanding as I disappeared to work

on what is important to me. All these would not have been be possible without all of you. Special

mention to Edward. You were always so willing to understand more about my project even though you

did not really know what was going on, just so that you can push me on to achieve more.

Finally, thank you Min Jee for taking care me during this period. You looked out for my well-being,

planning exercise classes and ensuring I was well nourished during the busiest months. The thoughts of

going out with you during the weekend was what kept me motivated to complete the tasks at hand.

Page iv

Contents Abstract .......................................................................................................................................................... i

Acknowledgements ....................................................................................................................................... ii

1 Introduction .......................................................................................................................................... 1

1.1 Motivation ..................................................................................................................................... 1

1.2 Introduction to Artificial Intelligence and Machine Learning ....................................................... 4

1.3 Chatbots ........................................................................................................................................ 7

1.3.1 Chatbot types ........................................................................................................................ 7

1.3.2 Chatbot Workflow ................................................................................................................. 9

2 Data Preprocessing ............................................................................................................................. 10

2.1 Ubuntu Dialogue Corpus ............................................................................................................. 10

2.2 Preprocessing Computations ...................................................................................................... 10

2.3 Natural Language Processing (NLP) pipeline .............................................................................. 12

2.3.1 Morphological Analysis ....................................................................................................... 12

2.3.2 Syntactic Analysis ................................................................................................................ 13

2.3.3 Semantic Analysis ................................................................................................................ 14

2.3.4 Natural Language Processing in this study ......................................................................... 14

3 Vectorising of text ............................................................................................................................... 15

3.1 Introduction ................................................................................................................................ 15

3.2 Word Embeddings ....................................................................................................................... 15

3.2.1 Word2vec ............................................................................................................................ 15

3.2.2 GloVe ................................................................................................................................... 16

3.2.3 fastText................................................................................................................................ 17

3.2.4 Embedding Comparison ...................................................................................................... 17

3.3 Implementation .......................................................................................................................... 20

4 Deep Learning Framework .................................................................................................................. 23

4.1 Sequence-to-sequence (Seq2Seq) .............................................................................................. 23

4.2 Recurrent Neural Network (RNN) ............................................................................................... 23

4.3 Long Short-Term Memory (LSTM) .............................................................................................. 24

4.4 Training ....................................................................................................................................... 29

5 Results ................................................................................................................................................. 31

5.1 Fundamental Analysis ................................................................................................................. 31

Page v

5.1.1 Accuracy and loss ................................................................................................................ 31

5.1.2 Weights and Biases ............................................................................................................. 33

5.1.3 Fundamental Insights .......................................................................................................... 35

5.2 Technical Analysis ....................................................................................................................... 35

5.2.1 Technical Insights ................................................................................................................ 35

6 Neuron Activation and Ising Spins ...................................................................................................... 38

6.1 Phase Transition and Ising Model ............................................................................................... 38

6.2 Model and Results ...................................................................................................................... 38

7 Future work and Conclusion ............................................................................................................... 43

8 Bibliography ........................................................................................................................................ 45

9 Appendix ............................................................................................................................................. 48

9.1 Weight and Bias Results .............................................................................................................. 48

9.1.1 Evolution over 1000 epochs ................................................................................................ 48

9.1.2 Model Comparison .............................................................................................................. 60

9.2 Technical Results ......................................................................................................................... 63

9.3 Activation Analysis ...................................................................................................................... 88

9.3.1 Reverse-CDF ........................................................................................................................ 88

9.3.2 Activation Variance ............................................................................................................. 94

9.4 Codes ......................................................................................................................................... 102

9.4.1 Chatbot Building ................................................................................................................ 102

9.4.2 Analysis of results.............................................................................................................. 108

9.4.3 Analysis of model parameters .......................................................................................... 109

Page 1

1 Introduction In this paper, we seek to understand the work flow of building a chatbot and apply the concepts of

physics to study the learning behavior of artificial intelligence in a chatbot.

1.1 Motivation In the last decade, machine learning has been trending. It is common for almost all industries to adopt

this technology to avoid falling behind their competitors. This is no different in the professional services

industry. EY’s data and analytics team has been established to leverage on big data and advanced

technologies to disrupt everything, providing new insights, value and action for the company. These

gives the company a competitive advantage to day-to-day business processes [1].

One of the most exciting artificial intelligence technology to date is the development of chatbots. A

chatbot, or question-answering system has been around since the 1970s. However, most of these

systems are limited to short and factual answers with no memory of the previous question asked [2].

Recently, due to advancement in technologies, we can generate more sensible answers to complex

questions. A break through in the chatbot technology will impact the business environment and the lives

of technology adopters. Hence, we study the current state of chatbot technology and study its relevance

to statistical physics, a field that has a much longer history as compared to data science. It will be a

scientific breakthrough if we are able apply our knowledge on statistical physics and have a better

understanding on chatbot technology or other machine learning technologies

This study is motivated by 3 main domains: business, technology and physics

The most clear-cut business application would be to improve the customer services sector. According to

IBM, during a 6 minutes customer service call, the customer service agent spends 75% of that time doing

manual research [3]. Businesses are turning to more cost and time effective alternatives to improve

their customer services. Chatbot software products can service customers 24/7 without much waiting

time. Scottish Bank, Royal Bank of Scotland (RBS) is using a chatbot trained on 1,000 responses to more

than 200 customer queries [4]. IBM predicted that 85% of all customer interactions will be handled by a

chatbot before 2020 [3]. Customer service will improve when businesses opt to adopt the chatbot

technology

Chatbot is not just beneficial to the customer service department, staffs within the company can also

utilize this service. Real business data is often stored in many sources, unfiltered and difficult for

humans to interpret. This is a problem that many companies are facing as they have different suppliers

and many different enterprise systems. A good chatbot software will understand what the staff requires

from his or her question and automatically extract, transform and load (ETL) the data, making it

interpretable to the staff. For instance, staffs from the operations department can keep track of their

operations simply by verbally asking their computer “where is my cargo now?” and get a reply almost

immediately as it searches through the database. This is not limited just to operation but any form of

data from marketing to finances. A good chatbot can easily join all their data and give them a real time

update of their business. Brazil’s biggest bank, Banco Bradesco has built a chatbot not only for

customers, but also agents to find 283,000 questions a month with an accuracy of 95% [4]. Adopting

chatbots will give data driven companies an edge over their competitors.

Page 2

Another business implication would be for the chatbot to replace applications or apps for short. Mobile

apps are very handy, we have apps for almost everything. Apps allow us to find out information on the

weather, order ourselves food, clothes or even call a taxi with a few clicks on our phone. However, to

perform each of these tasks requires a different app and this takes up a lot of storage space, something

which is getting more valuable each day as data is getting more important. With improved chatbot

technology, these tasks can be further simplified to an all in one platform. We can ask one chatbot to

perform all the tasks mentioned above and more, serving as an artificial butler. This idea is often seen in

science fiction movies such as in Iron Man where Tony Stark/Iron Man builds a chatbot called J.A.R.V.I.S

(Just A Rather Very Intelligent System) to assist him in almost everything he needs. Leading chatbot

technology services such as Google assistant is working towards this goal. A good chatbot technology

could do away with multiple apps, integrating companies in the backend while users have an all in one

artificial butler to serve them. When chatbots become more advanced, what used to science fiction will

become a reality and businesses need to adopt this technology to remain relevant.

The potential a chatbot can bring to businesses are limitless if companies know how to leverage on this

technology. However, its limitation could hinder business adoption.

Chatbots that the industries use is only capable of responding to frequently asked questions that it is

programmed to answer. If you have a more specific question, you will still require a human to intervene.

If chatbots have the ability to think and understand questions just like a human, they will be able to give

an answer similar or possibly even better than what a human is capable of at a fraction of the time.

In this study, we aim tackle this problem by researching on a type of chatbot that can respond to any

question, not just limited to the ones it was trained on. With this, we will move on to the technological

motivation.

We can read text from this report with ease. The same cannot be said about a computer. It is only

capable of making computations from numbers and makes no sense of alphabets, words and sentences.

We will understand how computers tries to understand the human language, English for this study. The

ability for AI to understand human languages, also known as natural language processing, is not only

relevant to building chatbots, but also in many other developing AI technologies. For instance, natural

language processing is also used to analyze the sentiment of texts, understanding the meaning of

sentences and can execute many tasks such as summarizing a long paragraph or clustering similar text

together to give recommendations to users based on articles they have been reading.

We will study different vectorization models commonly used in the industry, understand how the

computer convert words into numbers to process what humans are trying to tell them. The different

vectorization models are then applied in the context of a chatbot and evaluated. We will also be

exploring a state-of-the-art technology still not used by industries. Generative based chatbots (chatbot

that can respond to questions it has never been exposed to) are currently too inaccurate and unreliable

for industries. They use rule based (chatbot that respond based on what it is being taught) which is

easily implemented and reliable, however is limited to answer only questions that it was trained on.

Generative based chatbots, on the other hand, have is no limit to the types of questions it can answer

[6]. Hence, we study the implementation of a generative based chatbot.

We move on to the last motivation of this project and explore how physics is relevant.

Page 3

I consider a physicist to be a data analyst of the physical world. We have abundant data about how the

universe works and have developed theories and equations to model the universe. Some of these

equations are so complicated, advanced mathematical tools are required for these models. This results

in a long computation time whenever a variable is changed. With the same data, after training, neural

networks have the ability to match the inputs and outputs, constructing its own model. These models

might be able to achieve results similar to the complicated models using simple arithmetic operations,

significantly reducing computation cost. Thus, physicist working on projects that requires quick

production might find this a viable alternative after validating the trained model with their complex

model.

In this study, we are using a type recurrent neural network (RNN) model called Long-Short Term

Memory (LSTM). More details on what this is and why this model is implemented will be explain in

chapter 4.

LSTM is a model that computational physicists dealing with time series data are exploring. Due to the

complexity of certain problems, conventional equations are limiting physicist from solving them. There

are some physics problems that have been giving generations of physicists nightmares. They are now

looking at more advanced computational tools such as LSTM to solve these problems. For example, in

condensed matter physics, the Navier-Stokes equation governs the flow of fluids. However, this

equation is listed as one of the Millennium Prize Problems in mathematics as there is no proof that such

a solution even exists [7]. There is a paper titled “A Deep Learning based Approach to Reduced Order

Modeling for Turbulent Flow Control using LSTM Neural Networks” [8] that aims to model turbulent

flow without computing the full Navier-Stokes equation with the help of LSTM .

Another application of LSTM would be in the Large Hadron Collider (LHC) to model the voltage time

series of the magnets. This is possible from the data provided from an electronic monitoring system, in

hopes to detect misbehavior of the magnets and avoid costly damages in the LHC. More on this can be

found on a paper “Using LSTM recurrent neural networks for monitoring the LHC superconducting

magnets” [9].

On top of understanding how data science can be used to solve physics problems, we seek to discover a

mutually beneficial relationship between these 2 fields. Physics has been around for a very long time

and serves as a foundation to many other sciences such as Quantum Chemistry. Data science is a

relatively new field of science and there are still many areas where scientists are still debating on. For

example, many advanced models are often described as a “black box” as we are not sure of what is

going on inside them. We aim to marry physics theorems into data science by drawing analogies with

physical phenomenon. With this inspiration, we might be able to catalyse the advancement of data

science.

Now that we have understood the motivations behind this project, we will give a brief introduction on

artificial intelligence and machine learning defining some commonly used jargons in this field.

Page 4

1.2 Introduction to Artificial Intelligence and Machine Learning What is Artificial Intelligence (AI)?

“Artificial intelligence refers to the simulation of human intelligence in machines which are programmed

to think like humans and mimic their actions. The term may also be applied to any machine that exhibits

traits associated with a human mind such as learning and problem solving.”

– Investopedia, New York City based website that focuses

on investing and finance education and analysis. [10]

The term ‘AI’ was first coined by American computer scientist John McCarthy in 1956 during the first

academic conference on the subject. AI was then founded as an academic discipline on the same year. In

fact, AI was around years before this. The famous Turing test, by Alan Turing, was developed in 1950 to

test a machine’s ability to exhibit intelligent behavior [5]. The earliest form of AI used in games was a

checkers-playing program written by Christopher Strachey and a chess-playing program written

by Dietrich Prinz in 1951. In October 2015, AlphaGo by Alphabet Inc developed the first program to beat

a human professional in the board game Go (commonly known in Singapore by its Chinese name, Weiqi)

[11]. In August 2017, inventor Elon Musk’s startup, OpenAI, released an AI program during an eSport

(Dota 2) tournament [12]. This shocked many as it became the first ever AI to defeat professional

players in a complex game.

You may be wondering, how did AI evolve from learning to defeat professional humans in board games

like Go, to defeating professionals in a complex eSport like Dota 2 over just 2 years. This is because

instead of feeding algorithm that tells the AI what to do, scientists have written programs that allows

machine to learn from past data, generating new output and solutions to solve the problem. With the

concept of artificial intelligence, I will be introducing to you the next buzz word after AI: Machine

learning (ML).

“Machine learning is a method of data analysis that automates analytical model building. It is a branch

of artificial intelligence based on the idea that systems can learn from data, identify patterns and make

decisions with minimal human intervention.”

-SAS Institute, American multinational developer of analytics software [13]

Machine learning has been around since the 1980s. It is classified into 3 main categories: supervised

learning (learning from known answers), unsupervised learning (learning from internal structure of data)

and reinforcement learning (learning from experience) [13]. Computers use mathematical models to

study data and make their own predictions. The key point about machine learning is that it requires

minimal intervention and learns directly from the data provided. Example of such models include linear

regression, clustering and classification. Before machine learning, scientist had to fit input data together

with a function written into a program code for the model to generate an output, for instance, scientist

had to calculate the coefficients when they wanted to fit the data into a linear regression model. With

machine learning, all they need is the input data and output data, feed it into the computer and the

computer will generate its own coefficients for the model, making modelling more efficient and

accurate. Differences between machine learning and the traditional mathematical computation are

illustrated in Fig 1.2.1 and 1.2.2 respectively.

Page 5

Figure 1.2.1 Traditional mathematical computation method Figure 1.2.2 Computation with Machine Learning

Traditional mathematical computations are still commonly used today for less complex problems.

However, in a VUCA (short for volatility, uncertainty, complexity and ambiguity) world, simple solutions

are often not enough to solve complex problems. Hence, we seek for a new approach, which brings us

to the next buzz word: artificial neural networks (ANN).

While scientists were trying to improve on artificial intelligence, they went to study the human

intelligence and try to implement it on machines. They studied biological neural networks, getting

inspiration from how the neurons and synapses in the brain works, modelling the architecture on

computers. Neural networks (indicating to artificial neural networks rather than biological neural

networks for the rest of this paper), a subset of machine learning, is based on a collection of nodes

called artificial neurons and are connected to each other, sending electrical signals, similar to synapses

in the brain. Some of these neurons contain more critical features than others, hence, the neural

network must be able to differentiate the importance of each neuron. The idea of a neural networks

(also called “perceptrons”) have been around from as early as the 1940s [14]. However, early neural

networks do not have the capability to learn. Every single neuron’s importance (or weights) do not

change automatically when you feed in more data. These machines are not learning from the data and is

a form of artificial intelligence but cannot be classified as machine learning. It was only in 1969 where

the idea of backpropagation was first proposed and became a mainstream part of machine learning in

the mid-1980s. Backpropagation refers to the ability for weights in the hidden layer to adjust based on

the accuracy of its output. It calculates a loss function, which computes the deviation between the

predicted output and the output fed into the training data, through various methods such as cosine

proximity or cross entropy. The network will try to optimise the system, minimising the loss function and

maximising its accuracy, through adjusting the weights. This made neural networks a lot more advanced

than what it used to be.

What made AI, ML and ANN so successful these days? This will be answered by the final buzz word that

will introduced in this chapter: deep learning.

‘Deep learning is a collection of algorithms used in machine learning, used to model high-level

abstractions in data through the use of model architectures, which are composed of multiple nonlinear

transformations. It is part of a broad family of methods used for machine learning that are based on

learning representations of data.’

-Techopedia, IT education website that provides insight and inspiration [15]

Computer

Input Data

Program Code/

Function

Computer

Input Data

Output

Page 6

Deep learning has been introduced to the machine learning community by Rina Dechter in 1986 and to

artificial neural network community by Igor Aizenberg in 2000 [16].

In general, you could think of deep learning as an architecture which sends the input through multiple

layers of machine learning models before churning out an output. In the case of ANN, deep learning

refers to stacking the number of hidden layers before giving an output. Although it is still unclear on

what each layer does, some scientist believes that each layer takes care of a certain feature of the input.

For instance, if you fed the machine a picture of a cat, the first layer might be looking at the eyes, while

the second layer, the ears and so on. However, there are conflicting arguments with regards to this

hypothesis and there is no significant evidence to support or reject this theory.

The accuracy provided from deep learning sky rocketed as compared to a single layer neural network.

This is partly because we are in an age of where there is an inflation of data. Some people believe that

90% of the data in the world was created in the last 2 years. With more data, deep learning performs

better as compared to its traditional counterparts. In figure 1.2.3, we can see that with small amount of

data, older learning algorithm may perform better than deep learning algorithms. However, as we have

more data, older learning algorithms plateaus while deep learning models continue improving its

performance [17].

Figure 1.2.3 Graph of performance against amount of data, comparing deep learning and older learning algorithms [17]

“The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the

huge amounts of data we can feed to these algorithms.”

- Andrew Ng, the chief scientist of China’s major search engine

Baidu and one of the leaders of the Google Brain Project [17]

Page 7

Figure 1.2.4 Summary of Artificial Intelligence, Machine Learning and Deep Learning [18]

Today, artificial intelligence and machine learning is used in almost every industry to improve workflow.

Most prevalent application of AI is used in predictive modeling, computer vision and time series analysis.

Predictive modeling refers to predicting an unknown event from the data provided. For instance,

predicting the species of a butterfly from the features such as its wing span length, or identifying a

criminal based on evidence provided. Computer vision refers to classifying image data based on the

features in the photo. For instance, the computer learns to differentiate between a photo of a dog and a

cat. Time series, as its name suggests, involve predicting data with a time element. An example would be

predicting how the stock market price fluctuates. A chatbot will be considered a time series prediction

as the position of words matters and these word positions can be treated as different time points.

In this study, we look at how we use deep learning in time series analysis and implanting it into a

chatbot.

1.3 Chatbots A chatbot is simply a conversation agent that interacts with human turn by turn using natural language.

Some if the more popular chatbots in the industry are Amazon Echo, Google Assistant and Siri.

1.3.1 Chatbot types In this section, we will discuss the different types of chatbots, its functionalities, followed by the

workflow of building a chatbot.

There are various types of chatbots. This is represented in Figure 1.3.1.3 shown below. Chatbots are

categorized into the domains they operate in and based on the kind of response they return.

The conversation content can either come from an open or closed domain. For an open domain, the

chatbot get its training data from an open source such as the internet, giving it the capability to answer

a vast variety of topics. A close domain refers to getting training data from a closed source of data with a

specialized area of expertise.

The kind of reply a chatbot gives can be categorized into either retrieval-based or generative-based.

Even the most advanced chatbots in the industry we have today are retrieval-based. This means that a

fixed reply is already encoded to the chatbot. It can be built simply by using rule-based sentence

matching or an ensemble of machine learning techniques to return an existing answer that the machine

previously learned. Generative-based on the other hand, are capable of generating any kind of

responses. Its ability in terms of the kind of question they can reply is limitless. However, at the present

Page 8

state of technology, this kind of chatbots are not capable of giving responses that is up to the industrial

standards [6].

Figure 1.3.1.1 Generative-based chatbot model [6] Figure 1.3.1.2 Retrieval-based chatbot model [6]

Figure 1.3.1.3. Chatbot conversation framework [18]

1. Open domain with retrieval-based responses

To have a fixed set of responses from an open domain means that there is a fixed response for any

possible question anyone can think of. This is illogical and hence, impossible to create.

2. Open domain with generative-based responses

We can ask the chatbot any possible question and it is able to generate a reply. This is the most

complete form of a chatbot and the solution to this problem is called Artificial General Intelligence

(AGI). However, we are nowhere near this technology yet.

3. Closed domain with retrieval-based responses

This is the most common type of chatbot where a specified answer has been crafted for a specific

domain. This is the most basic type of chatbot, giving the most reliable answers at present.

However, this type of chatbot is limited to what it has been taught.

4. Closed domain with generative-based responses

This type of chatbot can handle questions from the underlying dataset it was trained on and new

questions outside the underlying dataset, but within the domain. Tend to be more human like and

have their own personality. However, the generated answers are full of grammatical errors. This

approach is still not widely used by chatbot developers and is mostly found in labs [18].

Page 9

In this study, we focus on a closed domain with generative-based responses chatbot. Despite it being

harder to train and giving more inaccurate results, it is the future of chatbots. It will provide us with

more learning opportunities and gather more valuable insights.

1.3.2 Chatbot Workflow In this section, we provide an overview of the steps to build a chatbot. More details on each step can be

found in further chapters.

Firstly, we need to define the objective of the problem. This was elaborated in the motivation section

above. Following which, we import our dataset into an interpreter and start with data exploration where

we have a better understanding of the data we are dealing with.

The most obvious difference between a chatbot problem and most data science problems is the nature

of the data. Most data science problem involves dealing numbers, pictures and texts. Since a computer

has been built to handle numbers, the first kind of problem can be easily dealt with. For pictures, each

pixel in the picture can be represented by a number on the RGB colour model, hence converting it to

numbers easily. For texts, this is something more challenging. We need to vectorize these words into

vectors before we can make any predictions. Hence, this additional vectorization process is needed for a

chatbot problem.

Next, we to send our vectorized input into a model that will give us an output. In this study, we use a

model called Long-Short Term Memory (LSTM). Following which we convert our vectors back to words.

This gives us the reply of the chatbot.

Finally, we integrate the model into a user interface. This is a step which will not be dealt with in this

project due to time constraint and its ability to generate insights relevant to a Physics project.

Figure 1.3.2.1 Chatbot work flow. Last stage will not be implemented, hence labelled in red.

Clean the data with Natural Language Processing (NLP)

Vectorizing with various machine techniques

Run machine learning model LSTM and convert vectors back to words

Integration into chatbot

Page 10

2 Data Preprocessing We can now proceed to discuss the processes for building the chatbot.

2.1 Ubuntu Dialogue Corpus The dataset that is required to build a chatbot is a question and answer text data. In this study, we chose

the Ubuntu Dialogue Corpus (UDC) [21]. A corpus is a large and structured set of texts. The UDC is a

collection of logs from Freenode’s Internet Relay Chat (IRC) network. Freenode IRC is a platform that

facilitates communication in the form of text, used to discuss peer-directed projects. A new user joins

the channel and asks a general question about a problem they have with Ubuntu. A more experienced

user replies with a potential solution, after first addressing the ’username’ of the first user. This is done

to avoid confusion in the channel. At any given time during the day, there can be between 1 and 20

simultaneous conversations happening in some channels. The UDC collates the history of chat from

Ubuntu-related chat rooms [22].

Ubuntu is a free and open source operating system based on a Linux kernel as its foundation. Since it is

an open source product, support from the developers is limited. Most queries on technical support are

directed to the chat room on freenode. We extracted the dialogue corpus from freenote in this study.

The UDC consist of logs from 2004 till today. In this study, we extracted data from the start till 2015. The

dataset contains almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100

million words. A turn refers to a change of user giving an input during the dialogue. An utterance refers

to a text message sent. The reason why there is a large greater number of utterances as compared to

turns is that multiple messages can be sent before a reply is received. The conversations have an

average of 8 turns each, with a minimum of 3 turns.

2.2 Preprocessing Computations All the computations in this study is implemented in the Python programming language. Most of it is

done on the Pycharm interface with a small part of visualization done on Jupyter Notebook and

JupyterLab.

The data is split into 3 different folders, a training folder, validation folder and a test folder. The test set

is used to adjust the weights in the network, the validation set is used to ensure that there is no

overfitting and the test set is used to confirm the predictive power of the neural network. The training

dataset downloaded is 1,757,751KB with 16,587,830 rows and 6 columns. The row comprises of:

‘folder’, ‘dialogueID’, ‘date’, ‘from’, ‘to’, ‘text’.

● folder: The folder that a dialogue originated from.

● dialogueID: An ID number given to a specific dialogue.

● date: A timestamp of the time that the particular dialogue was sent.

● from: The username of the user whom sent the line of dialogue.

● to: The username of the user whom they were replying. On the first turn of a dialogue, this field

is blank.

● text: The text of that turn of dialogue [23]

Page 11

Table 2.2.1 illustrates the corpus.

Due to a high computation power required to process such a large dataset, we sliced the data and ran

50,000 lines.

As mentioned earlier, there were many inputs before a turn. The first step was to combine the multiple

question input into one input and repeat this for the responses. The outcome gives us a list of question

and answer pair, each contained in strings. For questions that does not have a reply, such as a thank you

statement, where the conversation comes to an end, it is paired with an empty string. All the question

and answer pairs in a dialogue with the same dialogue ID are put into a list. Individual list of each

dialogue is appended into a bigger list which contains all the information. For easy visualization, the list

looks like this:

[[[Question, Answer], [Question, Answer], … [Question, Answer],[Question, ‘ ’]],[[Question, Answer],

[Question, Answer],…]… …]

Table 2.2.1 Table of first 15 inputs from the Ubuntu Dialogue Corpus visualised in a pandas dataframe

Page 12

2.3 Natural Language Processing (NLP) pipeline Natural Language Processing (NLP) is a set of techniques that enables computer to “talk” to human and

understand the human language. NLP accounts for the hierarchical structure of a language, such as

letters forming words and words forming a sentence. This is a big challenge in computer science because

of the complexity and ambiguity found in languages. For instance, a computer needs to know both the

meaning of the word and how these words are linked together to create the meaning of the text. NLP

techniques are broken down into 3 stages [2]:

1. Morphological analysis

2. Syntactic analysis

3. Semantic analysis

2.3.1 Morphological Analysis In this stage, we study the elements within words. The key processes are as follows:

• Tokenization

Texts are broken down into symbols, words, phrases or other text elements called tokens. For

example, a sentence (“I am Louis”) is broken down into (“Hello”, “I”, “am”, “Louis”)

• Stop-word removal

Most common words in a language that gives little meaning to a text such as ‘a ‘, ’the’, ‘is’ and

‘are’ are removed. The removal of these words will not change the meaning of the entire text.

• Special character removal

Similar to stop-word removal, we will remove special characters which does not help a

computer to comprehend the sentence. This includes symbols like (“&”, ”@”, “,”)

• Stemming

Reducing conjugated words to the same word stem. For instance, words like eating and eaten

can all be stemmed to ‘eat’. These 3 words hold the same meaning. The simplest method of

stemming is the affix removal stemmers. This method removes pairs of letters from the end of

the word. For instance, “eating” becomes “eat”. But “aging” becomes “ag”. Hence, stemming

often lead to problems such as over-stemming and under-stemming.

• Lemmatization

Lemmatization serves the same purpose as stemming of reducing conjugated words back to its

root word. The difference is that instead of slicing off the end words to get the root word,

lemmatization uses a vocabulary to transform words properly into its root form, called the

lemma. To do so, it is necessary to have detailed dictionaries which the algorithm can look

through to link the form back to its lemma. For instance, lemmatization process “ate” and

transform it into “eat”

Page 13

• Automatic Query expansion

Reformulating a query to encourage matching between texts

Some techniques used to expand queries are using synonyms of words, stemming all words and

fixing spelling errors

• Part-of-speech (POS) tagging

Words can be tagged as a noun, verb, pronoun or article. 2 approaches are:

1. Statistical approach (Markov models)

2. Rule based approach

o Rules, such as word position in a sentence, are place into an algorithm to tag the words

2.3.2 Syntactic Analysis In this stage, we study the elements within sentences. The key processes are as follows:

• Parsing

Convert sentences into its formal grammar structure. The output is a parse tree that illustrates the

syntactic relation between words in the input sentence.

Figure 2.3.1 Illustration of a parse tree [25]

• Bag-of-words

Bag-of-words is an orderless representation of text. Text is portrayed as the bag of words, without

considering the relationship between words and grammar. This method represents the word in large

text and is most frequently used for document classification

• N-grams

N-gram models are used to store spatial information. It models the probability of the words from

the sequence of words and estimates the next word. The probability that a word in used next in the

sequence is estimated by the frequency that this word follows the same sequence in the training

corpus, divided by the frequency that the sequence is present in the training corpus, N stands for

the number of words considered in the sequence. For example, the sentence: “I study in the

National University of Singapore”, when N=2, the N-gram of the sentence becomes: “I study”,” study

in”, “in the”, “the National”, “National University”, “University of”, “of Singapore”.

Page 14

• Term Frequency-Inverse Document Frequency (TF-IDF)

Uses statistic to reflects the importance of a word to a text in a sentence or text. The TF-IDF statistic

increases when a word is more frequently found in a text, however, loses its importance if it is

frequently found in all text, suggesting that it may be a stop word.

(𝑇𝐹 − 𝐼𝐷𝐹) = 𝑡 𝑙𝑜𝑔 𝐷

𝑑

➢ t - number of times the word appears in that particular document (term frequency in the input) ➢ d - number of text documents the term appears in ➢ D - total number of text documents

2.3.3 Semantic Analysis Assigning meaning to words, sentences and texts. Structures are created to represent the meaning of

words and phrases, however, there is no optimal solution to automatically derive the meaning from text

despite intensive research by scientists.

2.3.4 Natural Language Processing in this study Natural Language Processing itself is a field that many data scientists spend their career research on. To

keep the project manageable, we only did morphological analysis to our text data. Due to the

complexity of many of these techniques we performed the following NLP processes:

We tokenized the text (“Hello”, “I”, “am”, “Louis”), remove stop words (“a”, “the”, “and”) and special

characters (“&”, “@”, “,”) since we only want to keep the context of the sentence.

We import a Python library called Natural Language Toolkit (NLTK) and fit our data into these processes

A snapshot of sentences after going through the NLP pipeline can be seen in Figure 2.3.4.1

Figure 2.3.4.1 Section of the Ubuntu Dialogue Corpus after tokenising, removing of stop words and special characters.

Page 15

3 Vectorizing of text We need to transform our text data into numbers for our computer to process.

3.1 Introduction Words are not naturally understood by computers. By transforming words into a numerical form, we can

apply mathematical rules and do matrix operations on them to obtain an output.

The most basic way to numerically represent words is through one-hot encoding. This means that every

unique word in the dataset is represented by a one vector in the vector space, with 0s everywhere else.

The dimension of the vector will then be the number of unique words. This results in an enormous

vector that captures no relational information [24][26][28].

Figure 3.1.1 Visualisation of one-hot encoded vector

As seen in the diagram, every word is of the same distance, hence, synonyms and antonyms are treated

to be the same.

3.2 Word Embeddings Word embeddings is a real number, vector representation of a word. Ideally, words with similar

meaning will be close together when being represented on the vector space. The goal is to capture the

relationship in that space. With words in densely populated space, we can represent word vector in a

much smaller space as compared to one hot encoded vector that could go up to millions of dimensions

[26][27][28].

In this study we will be implementing 3 popular word embedding techniques, namely, word2vec, GloVe

and fastText.

3.2.1 Word2vec Word2vec was created by a team of researchers at Google, led by Tomáš Mikolov. It is the most popular

method for training embedding [26][27][28][29[[30][32][33].

This model involves a statistical computation to learn from a text corpus. It is a predictive model that

learns their vectors to improve their predictive ability by reducing its loss function. It is also the first

model that considers the closeness of word meaning in a vector space.

Page 16

There are 2 methods that this model takes during training.

1. Continuous Bag-of-Words (CBOW)

As briefly explained in 2.3.2 what a bag-of-words is, this method determines the context of a word

by the surrounding words, or continuous-bag-of-words. It then learns an embedding by predicting

the current words based on the context.

2. Continuous Skip-Gram

This method also learns an embedding by predicting the surrounding words given the context. In

continuous skip-gram, the model uses the current word to predict the surrounding window of

context words.

According to the google team, CBOW is faster than skip-gram. However, skip-gram performs better at

infrequent words

In this study, we used the Google’s pre-trained model. Its word vectors embed a vocabulary of 3 million

words and phrases trained on approximately 100 billion words from the Google News dataset. There

was no explicit detail on whether Google used CBOW or skip-gram to train the model. Each vector has

300 dimensions.

3.2.2 GloVe GloVe, derived from Global Vectors, is another model for word embedding. This model is created by a

team of researchers from the Stanford Artificial Intelligence Laboratory in the computer science

department of Stanford University [26][27][28][31][33].

An extension of word2vec, instead of being a predictive model, GloVe is a count-based model.

Initially, a sparse matrix (large matrix with mostly zero terms) of words × context with the count of word

frequency in the corpus is constructed. Context refers to the word next to (before, or after) the word of

interest. For example, the sentence “The boy ate at the table” with a window size of 2 would become a

co-occurrence matrix as seen in table 3.2.2.1

Table 3.2.2.3.2.1 Word context co-occurrence matrix for the sentence “The boy ate at the table”

the boy ate at table

the 2 1 2 1 1

boy 1 1 1 1 0

ate 2 1 1 1 0

at 1 1 1 1 1

table 1 0 0 1 1

When many sentences are added together, this matrix will grow into a sparse matrix with many 0

entries. This matrix is manipulated based on the hyperparameters set to shift the weights on certain

words. The word context co-occurrence matrix is then deconstructed into a word feature matrix and

feature context matrix as shown in figure 3.2.2.1

Page 17

Figure 3.2.2.1 Matrix illustration for the construction of a GloVe embedding [28]

The row of the word feature matrix is the representation of the GloVe vector of each word.

GloVe vectors are very good at global information, but do not perform that well when capturing

meanings of words.

In this study, we used the GloVe’s pre-trained model. Its word vectors embed a vocabulary of 2.2 million

words and phrases trained on approximately 840 billion words from the Common Crawl dataset. Each

vector has 300 dimensions.

The Common Crawl dataset from text all over the web from a non-profit organization, Common Crawl.

3.2.3 fastText fastText is another word embedding method created by Facebook's AI Research (FAIR) lab. Like GloVe, it

is another extension of the word2vec model [26][27][28][32].

Unlike the previous 2 models which treat words as the smallest unit to train on, fastText treats each

word as composed of character N-gram as explained in 2.3.2, but for word level. For example, the word

vector for “hello” is a sum of the vectors: “he”, “hel”, “hell”, “hello”, “ello”, “llo”, “lo”, “ell”, “el”, “ll”.

This feature allows fastText to support words that are not trained in its vocabulary. For instance, the if

the model has been trained on the word ‘apple’ but has not been trained on the word ‘pineapple’, It will

be able to form a relationship between those 2 words and give a meaning to the word. Hence, fastText

is known to best handle rare words.

In this study, we used the fastText’s pre-trained model. Its word vectors embed a vocabulary of 2 million

words and phrases trained on approximately 600 billion words from the Common Crawl dataset. Each

vector has 300 dimensions.

3.2.4 Embedding Comparison Cosine similarity or cosine proximity is a measure of closeness between 2 non-zeros vectors. It is the

result of an inner product space, measuring the cosine angle between the 2 vectors.

𝑐𝑜𝑠𝑖𝑛𝑒 𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦 = 𝑐𝑜𝑠 𝜃 = |𝐴 ∙ 𝐵

‖𝐴‖‖𝐵‖| = ||

∑ 𝐴𝑖𝐵𝑖𝑛𝑖=1

√∑ 𝐴𝑖2𝑛

𝑖=1 √∑ 𝐵𝑖2𝑛

𝑖=1

||

➢ A and B – first and second vector respectively

Page 18

➢ 𝜃 – cosine angle

For each word embedding model, we searched for words that are closest to the word ‘hello’ in terms of

cosine similarity. The similar words and cosine similarity value for each model are presented below.

We mapped the similarity between the word ‘Hello’ and ‘Bye’. These 2 words should be almost parallel

but in opposite direction because of the meaning of the words, hence should have a cosine similarity

value that is close to 1.

Cosine similarity values will be close to 0 when 2 vectors are orthogonal and have no relationship with

each other.

Hence, cosine similarity is a measure of how related 2 words are rather than how similar 2 words are.

Word2vec

Words most related to: ‘Hello’

[('Hi', 0.6188793182373047), ('Hello_hello', 0.6035354137420654), ('Goodbye', 0.6006848812103271),

('Hiya', 0.5998829007148743), ('Hey', 0.5955354571342468), ('Bruce_Springsteen_bellowed',

0.5867887139320374), ('hello', 0.5842196345329285), ('Welcome', 0.570493221282959), ('Hullo',

0.5693444609642029), ('@_ESPN_Michelle', 0.5587143898010254)]

Figure 3.2.4.1 word2vec top 10 most related words to ‘Hello’

Similarity between ‘Hello’ and ‘Bye’ (cosine similarity)

0.34806252

00.10.20.30.40.50.60.70.80.9

1

CO

SIN

E SI

MIL

AR

ITY

RELATED WORDS

word2vec 10 most closely related words to 'Hello'

Page 19

GloVe


[('Hi', 0.8000067472457886), ('hello', 0.7661874294281006), ('Hey', 0.7338173985481262), ('Dear',

0.7021666169166565), ('Greetings', 0.6533131003379822), ('Thank', 0.6320098638534546), ('Thanks',

0.6301470994949341), ('Welcome', 0.6017478704452515), ('Howdy', 0.5910314321517944), ('Happy',

0.5842403769493103)]

Figure 3.2.4.2 GloVe top 10 most related words to ‘Hello’


0.53188723

fastText


[('Hi', 0.8838053345680237), ('Greetings', 0.7670567035675049), ('Hellow', 0.7657904624938965),

('Helllo', 0.7620762586593628), ('Hallo', 0.7522884607315063), ('HEllo', 0.7484337687492371), ('Hiya',

0.7310601472854614), ('Howdy', 0.7185547947883606), ('Helloo', 0.7155404090881348), ('Hey',

0.7117010354995728)]

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

CO

SIN

E SI

MIL

AR

ITY

RELATED WORDS

GloVe 10 most closely related words to 'Hello'

Page 20

Figure 3.2.4.3 fastText top 10 most related words to ‘Hello’


0.5045497

Table 3.2.4.1 summarizes these results and compares the cosine similarity word representation between

the 3 word embedding models

Table 3.2.4.1 Cosine similarity between the vector representation of the word ‘Hello’ against the words ‘Bye’ and ‘Hi’ across the 3 word embedding models

3.3 Implementation In the preprocessing stage, the list was grouped into a question and answer format. This data feed into

the 3 word embedding models mentioned. Since the models were already trained on other sources, this

process of using a trained data for another dataset is referred to as transfer learning. For all 3 models

each word is vectorized into a 300-dimension vector. Each question and answer were truncated to a

length of 14 words. Reason to why this is done will be explained in section 4.2 on vanishing and

exploding gradient problem. Finally, to indicate to the model that the sentence has ended, we filled the

last vector with a sentend (short for sentence end) vector. This vector is a 300-dimension vector filled

with value 1. For those vectors with less than 14 words, we filled the shortage with the sentend vector

such that every question and answer input is of length 15.

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

CO

SIN

E SI

MIL

AR

ITY

RELATED WORDS

fastText 10 most closely related words to 'Hello'

word2vec GloVe fastText

'Hello' against 'Bye' 0.34806252 0.53188723 0.5045497

'Hello' against 'Hi' 0.618879318 0.800006747 0.883805335

Page 21

An example of how the word “thanks” look like through a 300-dimension GloVe embedding:

array ([ 6.1785e-01, -4.4125e-01, -1.2995e-01, 3.0474e-01, -1.6964e-01, 5.8525e-01, 1.7816e-01, -2.4445e-01, 4.6609e-01, -5.5441e-01, -1.9651e-01, 1.2467e-01, 2.3402e-01, 4.3042e-01, -6.9528e-02, 3.6921e-01, -4.1056e-01, 8.0052e-01, -4.0739e-02, -5.5692e-01, -7.4094e-01, 3.2417e-01, -6.6123e-02, 1.2387e-01, -2.4245e-01, -4.9114e-01, -7.7270e-02, -4.1978e-01, -1.9696e-01, 1.5016e-01, 8.0221e-01, -5.1428e-01, 2.9590e-01, 1.4012e-01, -5.6856e-01, -1.0086e-01, -1.2372e-01, 5.6630e-01, 7.3632e-01, -4.3627e-01, -3.4570e-01, -2.4257e-01, 5.0708e-01, 2.2338e-02, 3.2261e-01, 1.2905e-01, 8.9789e-01, 4.7432e-01, -3.3794e-01, 2.7993e-01, 1.5664e-01, 2.0991e-01, 8.7210e-02, -1.1773e+00, -1.4127e-02, -8.5633e-02, -1.7679e-01, -3.6956e-01, 6.3628e-01, 9.3964e-02, -3.5051e-01, 4.4953e-01, 1.1335e-01, -6.9076e-01, 9.5427e-01, 3.8231e-02, 2.8848e-01, -3.1629e-01, -5.3073e-01, 4.6703e-01, 2.8946e-01, 6.0747e-02, 7.8110e-01, 9.7159e-02, 3.5411e-01, 3.5429e-01, 4.5753e-01, 2.1665e-01, 7.7750e-02, 1.5379e-02, 1.1258e-01, 2.4216e-01, 2.7584e-01, 2.0799e-02, 2.5160e-03, -7.7169e-02, 1.0423e+00, -8.2542e-01, 5.8036e-01, 1.4357e-01, -2.2185e-01, -7.8225e-01, 2.0596e-01, -2.8564e-03, 2.3707e-01,5.8756e-01, -6.2610e-01, -3.7070e-01, 1.8357e-01, 3.7607e-01, -1.6075e-01, 5.4994e-01, -2.1851e-02, -1.1604e-01, -3.5818e-01, -1.0260e+00, -9.4983e-03, 2.1102e-01, 1.7862e-01, -2.7145e-01, -2.4182e-01, -2.9118e-01, 5.1136e-02, -5.6239e-01, -1.0261e-01, 3.6359e-01, 3.4954e-02, -2.3305e-01, -8.0760e-01, -6.5115e-02, -6.5976e-02, -7.2082e-02, -5.8931e-01, -5.1587e-01, 1.5050e-01, -2.0810e-01, 2.3459e-01, -4.1443e-02, 4.8942e-01, -4.6887e-01, -3.1591e-01, 1.1605e-01, -5.1222e-01, -3.3649e-01, -2.3522e-01, 2.1687e-01, 2.5116e-02, -4.7138e-01, -3.8184e-01, -2.4276e-01, -5.7618e-01, -3.4715e-01, -3.4020e-01, -2.3749e-01, 3.6804e-01, -2.2816e-01, 1.4530e-01, 3.6457e-01, -1.1285e+00, -1.9550e-02, 4.6852e-01, 4.3800e-01, -2.1323e-01, 4.0452e-01, 3.0908e-01, 3.1752e-01, -1.6075e-01, 2.2468e-01, -1.4104e-01, -2.1498e-01, -9.0856e-02, -1.1202e-01, 4.0014e-01, 7.7541e-03, -6.0925e-01, -5.3467e-01, -3.3437e-01, -2.9339e-01, -3.6834e-01, 1.4113e-01, 4.4871e-01, -1.6597e-01, 3.6154e-01, 3.0655e-01, 1.8177e-01, 4.8663e-01, 7.5956e-02, 2.7227e-01, -2.9530e-01, 3.7055e-01, 1.4913e-01, -4.4195e-01, 3.6298e-01, 6.8432e-02, -3.2926e-04, 6.1724e-01, 4.8816e-02, -1.6167e-01, -5.6976e-01, -4.9326e-01, 3.5325e-02, -1.9735e-01, -4.8631e-01, 4.3613e-02, 9.1704e-01, -1.8168e-01, 3.3793e-02, -1.9161e-01, 3.8046e-01, 5.3991e-01, -1.9965e-01, -4.2029e-01, -4.4788e-01, 3.2937e-03, 1.2840e-01, -4.0793e-01, 2.1293e-01, 1.3234e-01, 2.3004e-01, 1.5066e-02, 4.9688e-01, -1.9250e-01, 1.8341e-01, -8.0931e-02, -7.9795e-01, -6.0971e-01, 3.5563e-01, -2.0999e-01, -7.3456e-01, 1.6439e-01, 4.4013e-01, 3.7708e-01, -1.6677e-01, 1.3012e-01, 2.5617e-01, -3.3465e-01, 3.6778e-01, -2.8760e-01, 4.8935e-01, 1.2266e-01, 2.7119e-01, -4.0542e-01, 4.8436e-01, -3.3753e-01, 5.8466e-02, -6.1795e-01, -1.7105e-01, -1.2160e-01, 8.3938e-01, -7.3814e-02, 2.3217e-01, 2.3231e-01, 4.2904e-01, -3.0190e-01, 4.3600e-01, -8.6186e-01, -5.4164e-01, 1.5913e-01, 2.0317e-01, -6.8615e-01, -1.4824e-01, -2.2892e-01, 5.0858e-01, -6.0777e-01, 5.9970e-01, -4.3597e-01, -2.9010e-01, -4.4896e-01, -9.1547e-02, -2.4183e-01, 5.4375e-01, 1.7586e-01, 6.5346e-01, -7.7777e-01, -2.3320e-01, 2.6135e-02, -3.8351e-01, 9.5452e-02, 8.0358e-01, -4.6460e-01, -4.3092e-01, 5.3605e-01, 1.1099e-01, 3.9731e-01, 1.4128e-01, 1.1530e-01, 8.0781e-01, 1.3066e-01, 5.7696e-01, -6.0291e-01, -3.6552e-01, -4.2259e-01, 1.2372e-02, 4.6832e-01, 1.0419e+00, -3.3080e-01, -3.4654e-01, 4.8942e-01, 6.6817e-01, -4.4043e-01, 1.9420e-01, -1.2774e-01, -2.7415e-01, 7.7054e-02, 3.7732e-01, 3.4198e-02, 3.6308e-01, -7.6669e-01, 3.7364e-01, 3.6009e-01])

Page 22

On a histogram,

Figure 3.3.1 Histogram for the GloVe embeddings of the word 'thanks'

This is how the sentend vector look like:

Figure 3.3.2 The sentend vector visualised on JupyerLab

Page 23

4 Deep Learning Framework Before we will now dive into the deep learning framework used in this model, we need to understand

the problem setting. The chatbot is a sequence-to-sequence problem. We will be using a recurrent

neural network (RNN) or more specifically long-short term memory (LSTM) model in this problem.

4.1 Sequence-to-sequence (Seq2Seq) We first look at the problem setting of this project. In language study, the placement of words affects

the meaning of a sentence. For instance, the sentence “you are happy” and “are you happy” may

contain the same words however they have different meanings. Even though we have removed the

special character ‘?’ in the latter statement, we can still interpret that the second statement is a

question to understand if the second person is feeling happy while the first statement is indicating that

the second person is feeling happy. From this example, we need to construct a model that accounts for

where a token is inserted into the sentence. Hence this is a time series problem that requires a model to

accept a time series input and return a time series output. This kind of problem is referred to as a

sequence-to-sequence learning model. The initial sequence is fed into the encoder which considers the

order of the input is then sent into a model before delivering an output which is returned at the decoder

with the correct word order. This is illustrated in Figure 4.1.1 where the sentence “how is nus” followed

by the sentend vector mentioned in chapter 3.3 is used to indicate the end of the sentence is loaded

into the encoder. The decoder then gives the sequential output “it is awesome” and SENTEND after

passing through the black box which represents the model we will be using. This is a simplified model

and, in our study, as mentioned in chapter 3, we are using an encoder and decoder that handles 15

vectors for its input and output [34].

Figure 4.1.1 An illustration of the sequence-to-sequence problem setting, starting with an encoder and ending with a decoder

4.2 Recurrent Neural Network (RNN) We will now look at the model which was depicted by the black box in section 4.1.

Before heading to the specific model that we will be using, we will first understand the kind of network

we will be dealing with. The most common way of modeling a seq2seq problem is with a recurrent

neural network model. Deep learning and neural networks were already introduced in section 1.2. We

now look at a type of neural network called the recurrent neural network (RNN). As most networks are

feed forward network, this kind of neural network are recurrent. This means that there are loops in the

network and the output of one unit may go back to one of the already visit units. This is illustrated in

Figure 4.2.1 where there is an arrow at the hidden layer which loops back into the hidden layer. This

loop is not present in a typical feed forward neural network. This loop can be “unfolded” into what we

see in Figure 4.2.1. “unfolded” is in quotation marks because we cannot unfold the network and this

Page 24

unfolding is only illustrated for visualization purpose. This allows us to draw a closer comparison with

the seq2seq problem statement we had before. The initial black box we described is now replace by a

hidden layer. We call each of this hidden layer a cell [34][35].

The biggest problem that exists with RNN is the vanishing and exploding gradient problem. This problem

was first discovered by scientist Sepp Hochreiter in 1991. Following which, there were many papers

written targeted for this problem. We will generally cover this problem now.

To simplify the equation of an RNN, we have the equation:

𝑂 = 𝑊𝑛𝐼

➢ I - the input

➢ O - the output

➢ W - the weight

➢ n - the number of sequence input.

If we do not limit the input and output sequence, the variable n will be a number, 0 < 𝑛 < ∞ . When

the number of inputs from the training becomes a large number, the value of Wn will tend towards

infinity if W is greater than 1 or tend towards 0 if w is less than 1. Hence, the output value will be either

infinity or 0. Due to this limitation of the model and to reduce the computational cost of our model, we

have limited the input to 15 as mentioned in section 3.3. However, we found a specific type of RNN

which addresses our first consideration. Hence, the result of our truncation is mainly to deal with

computational cost.

4.3 Long Short-Term Memory (LSTM) We will now explore the specific model we use in this study. Long short-term memory is a special type of

RNN where you connect the units in a specific way such that it avoids the vanishing and exploding

gradient problem that arise initially in a typical RNN [36].

LSTM was discovered by Hochreiter (the same scientist who discovered the vanishing and exploding

gradient problem in RNNs) and Schmidhuber in 1997. Over the years many other scientists refined and

popularized this model. It is model that works on a wide variety of problems, some of which were

mentioned earlier in chapter 1.

To understand how LSTM is different from a typical RNN, we study the architecture of each individual

cell for both models.

Figure 4.2.1 An illustration of a recurrent neural network (RNN) model, “unfolded” to demonstrate the recurrent process

Page 25

Figure 4.3.1 Illustration of a repeating module in a standard RNN model, containing a single tanh layer [36].

Figure 4.3.2 Illustration of a repeating module in an LSTM model, containing 3 sigmoid and 1 tanh layer [36].

In an artificial neural network like this LSTM, all the “memory” of the networks is in the form of vectors

that we have created in chapter 3. To “remember”, “learn” or “forget” is analogous to when a

mathematical operator acts on the “memory” vector to retain, alter or remove the values.

Figure 4.3.1 represents the cell of a standard RNN while Figure 4.3.2 represents the cell of a LSTM. In a

standard RNN, it consists only of a single tanh neural network layer. In a LSTM, there are four neural

network layers interreacting in a unique way that allows the model to “remember” in its long-term

memory and “forget” in its short-term memory, hence the name long short-term memory. The long

term “memory” is embedded in the cell state represented by C while the short-term memory, memory

of the previous (t-1) output, is embedded in the hidden state represented by h in the figures.

Page 26

Each line in the diagram refers to a “memory” and in our study is represented by a 300-dimension

vector. The pink circles with an operator inside are pointwise operators. These occurs at intersections

between 2 vector line and forces the 2 vectors to do an operation. The pointwise operator with a ‘+’

performs a vector addition while those with ‘x’ performs a pointwise product (ie, [u1,…,un] x [v1,…,vn]

=[u1v1,…,unvn]. These pointwise functions are also referred to as gates as they decide on what

information is retained, added or remove from the system. The yellow boxes represent the neural

network layers. These layers comprise of a weight and bias which are being updated from

backpropagation during the training. Merging lines without the pink circles refer to concatenation of

vectors while splitting lines represents vectors being copied and going on separate paths.

We will explore the key idea behind the LSTM model.

In Figure 4.3.3 we have the cell state. This contains the “memory” from the previous sequence. We see

that the cell state goes through the cell without going through any neural network layer, it only passes

through 2 pointwise operation, hence the information through the system can only be altered linearly at

the pointwise operation. These structures are called gates. The vector through this stream are often

unchanged and is thus responsible for the long-term memory.

Figure 4.3.3 Illustration of the path of a “cell” through a repeating unit, responsible for keeping the “memory” of the model [36].

Before we explore the function at each neural network, we will introduce the two functions that are

governing the neural network. The first function is a sigmoid function. This function takes in any value

and compresses the value to produce an output with a value between 0 and 1. The second function is a

hyperbolic function of tangent, also referred to as a tanh function. This function takes in any values and

squeezes the values between -1 to 1.

The sigmoid and tanh functions are defined by the following formulas respectively

𝑆(𝑥) =1

1 + 𝑒−𝑥=

𝑒𝑥

𝑒𝑥 + 1 , tanh 𝑥 =

sinh 𝑥

cosh 𝑥=

𝑒𝑥 − 𝑒−𝑥

𝑒𝑥 + 𝑒−𝑥=

𝑒2𝑥 − 1

𝑒2𝑥 + 1

The first layer we will be looking at is called the “forget gate layer” as shown in Figure 4.3.4. The input

through this layer is a concatenated vector from the hidden state ht-1 and from the new input vector xt.

This concatenated vector consists of 2 300-dimension vectors where xt is fed into the kernel while ht-1 is

fed into the recurrent kernel. More about the kernel and recurrent kernel will be discussed in chapter 4.

The output is governed by the sigmoid function and produces a vector with values between 0 and 1. This

vector undergoes a pointwise multiplication with the cell state as shown in Figure 4.3.3. This allows the

Page 27

cell states to either “remember”, “learn” or “forget” information from the previous cell. For instance,

when the word fed in is a new subject, we want the model to forget the gender of the old subject.

Figure 4.3.4 Illustration of the vectors path through the "forget gate" [36].

In the next step, the model needs to decide if there will be any new information that needs to be added

to the “memory” of the cell state.

There are 2-part process as shown in Figure 4.3.5. The same concatenated vector that went through the

first neural network is being duplicated and one vector goes through a sigmoid layer to decide which

values to update, producing a vector it. The second vector goes through a tanh layer to create a new

vector referred to as candidate, �̃�𝑡. A pointwise product is performed on the 2 new vectors it and �̃�𝑡 to

create a new vector to be added into the cell state. This is shown in Figure 4.3.3.

An application of this will be when the forget gates causes the system to “forget” the gender of the old

subject, the gender of the new subject is updated into the cell state.

Figure 4.3.5 Illustration of the vectors path through the "input gate" [36].

Figure 4.3.6 Illustration of the vectors through the "forget gate" and "input gate" affecting the cell state [36].

Page 28

In the final process, the same concatenated vector through a sigmoid layer. This output vector decides

that parts of the cell state that will be output. The cell state goes through a tanh function to have its

values compressed between -1 to 1. Note that this is not a neural network layer but is simply a tanh

operator acting on the vector without the influence of weights and biases. These 2 vectors undergo a

pointwise multiplication to produce the new hidden state ht.

Figure 4.3.7 Illustration of the vectors path through the "output gate" [36].

In our setup, we stacked 4 LSTM on top of each other forming 4 layers. Note that there are 4 LSTM layer

and within each layer, there are 4 neural network layers. Figure 4.3.8 illustrates the set up used in this

study.

Figure 4.3.8 Illustration of a 4-layer LSTM that we use in this study.

Page 29

4.4 Training Upon understanding the model used, we trained out data on the proposed model.

The duration which we set to train our model is called an epoch. One Epoch is defined to be when an

entire dataset is passed forward and backward through the neural network only once.

We sent each of our vectorized data into the LSTM network. At this point, out 50,000 lines that had

been combined earlier in the preprocessing stage mentioned in chapter 2, our training dataset now

contains 14938 pairs of question and answer. Our training is set to 1000 epochs through 4 LSTM layers

with 300 neurons in each neural network layer.

In general, there is no rule on how many networks and epoch to run the dataset, too many might result

in overfitting and too little might not represent the data well enough. This is based on experience,

considering processing power of the computer and the data size. The parameters chosen in this study

were taken with reference from chatbot models that experts used.

Figure 4.4.1 Snapshot of the LSTM training process, where the first mini batch for the first epoch, the estimated time of completion/arrival (ETA) is 6.33min, loss function (cosine proximity) is -0.4829, accuracy us 0.0083. For the 14938 epochs, it is split into mini batches of size 32.

Page 30

We now look at the parameters in the training.

In each neural network layer in each cell, there are 300x300 weights in the kernel and 300x300 weight in

the recurrent kernel (300-dimension input on 300 neurons). All the weights in each LSTM layer kernel

and recurrent kernel are concatenated. We can see how this is broken down in Figure 4.4.2. The total

number of weights and recurrent weights in the LSTM is 2,880,000. Each neural network layer contains

300 biases and each LSTM layer contains 300x4 = 1200 biases. There is a total of 4,800 as seen in Figure

4.4.2. More details on weights and biases can be found in section 5.2.

An activation is the output at each neuron given a set of input, defined by an activation function.

In general, at each neural network layer in the LSTM as described in 4.3, it follows the equation:

𝑎𝑐𝑡𝑖𝑣𝑎𝑡𝑖𝑜𝑛 = function (𝑊𝑘 × 𝑥𝑡 + 𝑊𝑟 × ℎ𝑡−1 + b)

➢ activation – output, (300, 1) matrix

➢ function – the activation function, given by a sigmoid or tanh function

➢ Wk – weight of kernel, (300, 300) matrix

➢ xt – input at time t, (300, 1) matrix

➢ Wr – weight of recurrent kernel, (300, 300) matrix

➢ ht-1 – hidden layer from time t-1, (300, 1) matrix

➢ b – bias, (300, 1) matrix

The size of the matrices is specific to this study, where (m, n) represents m number of rows and n

number of columns.

Figure 4.4.2 Distribution of weights in kernel and recurrent kernel and its biases in each layer

Page 31

5 Results In this chapter, we will discuss the results obtain from the 3 different models we built. We broke down

the analysis into fundamental analysis, studying the intrinsic results such as the theoretical accuracy,

loss functions, weights and biases of the model, and technical analysis, studying the actual response

from the chatbot models.

5.1 Fundamental Analysis We study the values of the accuracy and the loss function at each epoch obtained from training the

model and the weights and biases at every 100 epochs from a mathematical point of view.

5.1.1 Accuracy and loss In this section, we will discuss about the accuracy and loss recorded during the training of each model.

Accuracy in this section refers to the theoretical accuracy obtained during the training. It refers to the

accuracy of the words matching with the theoretical “correct” answer that was given as the output. This

is not to be confused with the experimental accuracy give in section 5.2.1. The accuracy has a maximum

score of 1 when all the words in the validation is equals to the expected output. In this study, since our

input and output is of size 15 words, if it manage to predict 9 words in the correct position of the

sentence, it will achieve a score of (9 ÷ 15) = 0.6. We will now analyse the results we achieved.

Figure 5.1.1 Graph of accuracy against epoch across the 3 models during the training

From the accuracy graph, we gathered that GloVe obtained the best accuracy. Initially, the accuracy was

an almost an exponential increase until around the 150th epoch. The accuracy spiked from

approximately 0.3 to approximately 0.7 from the 150th to 200th epoch. After around 200 epochs, its

accuracy slowly increases to slightly below 0.8 where it starts to plateau. This means that out of 15

words that was fed into the model, the model was able to predict approximately (0.8 x 15) = 12 words of

the output correctly after training. This of a very high accuracy. The model with the next highest

accuracy is word2vec. Initially, the accuracy of the word2vec model was very low, close to 0. Its gradient

was almost flat and accuracy was not improving despite the training. From the 255th epoch, its accuracy

spiked from 0.067 to 0.404 in the 266th epoch. Following which, it starts to plateau at around 0.5 for the

Page 32

rest of the training. This model was able to predict approximately (0.5 x 15) =7.5 words of the output

correctly after training. FastText fared the worst at in end of the training in terms of the training

accuracy. Despite having a higher initial accuracy than word2vec, the training was not as efficient. Unlike

the first 2 models, there was no large jump in accuracy found from the fastText training. Accuracy only

improved after the 65th epoch and its rate of increase slowed down after the 200th epoch. Towards the

end of the train, from around 800 to 1000 epoch, there was a lot of noise observed. After 1000 epochs,

its training accuracy ended at 0.26. This means that it could only predict approximate (0.26 x 15) = 4

words of the output correctly.

The loss function is calculated from the cosine proximity/similarity formula:

𝐿 = − 𝑐𝑜𝑠𝑖𝑛𝑒 𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦 = − 𝑐𝑜𝑠 𝜃 = −𝐴 ∙ 𝐵

‖𝐴‖‖𝐵‖ = −

∑ 𝐴𝑖𝐵𝑖𝑛𝑖=1

√∑ 𝐴𝑖2𝑛

𝑖=1 √∑ 𝐵𝑖2𝑛

𝑖=1

Where L represents the loss function and A and B are the input and output vectors

In a usual cosine similarity formula, the value ‘1’ is obtained when 2 vectors are completely similar

(parallel) and 0 when they are dissimilar (orthogonal). However, unlike the usual cosine similarity

formula, there is a negative sign included here. This is to illustrate the loss function to decrease, getting

closer to -1, when the answers are getting more similar.

Figure 5.1.2 Graph of loss against epoch across the 3 models during the training

We can observe that the 3 models are decreasing in a similar shape. Like the accuracy graph, the GloVe

model performed the best (closest to -1). FastText performed better than word2vec in terms of loss.

This means that the vectors predicted by fastText were closer to each other than word2vec (have a

closer vector representation to the targeted answer), however, the final vector results produced were

not the same word, hence fastText scored a lower accuracy.

Page 33

5.1.2 Weights and Biases In this section, we analyse how the weights and biases evolve in each layer during the training and

compared the final training weights and biases across all 3 models.

The weights in the neuron applied to the input vectors are referred to as weights from the kernel while

the weights applied to the hidden state vector which stores the memory from the previous sequence’s

output is referred to as the weights from the recurrent kernel.

All the weights from the kernel and recurrent kernel are initialized with the Glorot normal initializer

(also known as Xavier normal initialization). This function is a truncated normal distribution with mean 0.

The standard deviation (𝜎) is given by:

𝜎 = √2/(𝑖𝑛 + 𝑜𝑢𝑡)

➢ in – number of input units in the weight tensor

➢ out – number of output units in the weight tensor

Glorot normal initializer is a common initializer used in neural networks [37]. If the weights of the

network start too small, the signal going through each layer will shrink and be too small. If the weights

are too large, then the signal going through the weight will grow and be too massive. Glorot normal

initializer is a method commonly used in data science with the aim to have the right size for the weight

distribution, keeping the signal in a reasonable range of values through the layers.

For all weights in the kernel, recurrent kernel and all the biases, we plot the quantity on a log scale. The

first reason for doing so is to deal with the skewness of values from the initialization. The second reason

is to observe if the data follows a power law distribution.

In the following observations, the weights and biases distribution are studied on a macro perspective.

5.1.2.1 Distributions

We plotted and anaysed how the weights and biases for each layer in each model evolved over the 1000

epochs set. In general, we observed that over the epochs, the weights, in both the kernels and recurrent

kernels, and biases spread outwards away from the initialization. The Glorot normal initializer for the

kernels and recurrent kernels flattens outwards with some values favouring the negative values while

others favouring the positive values. The 2 peaks from the initialization of the bias flattened out as well

and converge towards each other.

Page 34

Figure 5.1.2.1.1 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model first LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the first layer of the word2vec network, we observer the weights redistributing away from the

center where the values of the weights were close to 0.

The weights were most volatile at the first 100th epoch and the redistribution is less vigorous as the

epochs increase. For kernel 1, the weight distribution is not symmetrical and is biased more towards the

negative values. For recurrent kernel 1, the weight distribution is found to be more symmetrical on both

sides. We observed that the movement of weights from the recurrent kernel is greater than the kernel.

The bias is initialized with 2 peaks at 0 and 1. Similar to the weights, we can see the biases redistributing

outwards with the first 100th epoch being the most volatile. We observed the peaks combining as the

biases spreads out.

A detailed study on the evolution of its distribution can be found in appendix 9.1.1.

We went on to study the weight and bias distribution after 1000 epoch compared across the 3 models.

In each layer, the weights and biases spread out differently for each model. There are no clear

qualitative differences to the model’s weights and biases.

Page 35

Figure 5.1.2.1.2 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText model first LSTM layer, at the 1000 epochs, with its quantity plotted on a log scale

In the first layer of the 3 models, we observed that the weights from the word2vec model is the most

heterogeneous and GloVe was the least heterogeneous model in the kernel and recurrent kernel.

However, the biases for the GloVe model had the widest distribution while fastText had the narrowest

distribution.

A detailed study across each model after 1000 epochs can be found in appendix 9.1.2.

5.1.3 Fundamental Insights From fundamental analysis, the most important insight that we could draw was from the accuracy

graph. The GloVe model responded the best to the LSTM training, obtaining the best result. The most

interesting insight from the graph was the similar trend occurring in the word2vec and GloVe model, but

not for the fastText model. We observed a jump in accuracy for the first 2 models but not for the third

model. This is an interesting phenomenon that we try to study and model in chapter 6.

5.2 Technical Analysis In this section, we will discuss the result from the actual output which users will experience when they

use the chatbot.

5.2.1 Technical Insights At this stage, we want to test the accuracy of our models. Since we are building a model that mimics

intelligence, judging the accuracy of the model based on the numerical prediction would not be the

most accurate. The accuracy in this section is different from the theoretical accuracy mentioned in

section 5.1.1. The accuracy (experimental accuracy) here is a measure of accuracy on its context rather

than the accuracy of individual words.

Page 36

The best way to evaluate an artificial intelligent is to expose it to humans and gather real feedbacks.

However, since the chatbot it fed with too little data, it is still at its infancy development stage. Hence,

we personally evaluated the chatbot.

We loaded a list of 100 questions to the chatbot. The first 50 questions are questions that the chatbot

trained on, while the next 50 questions are questions that the chatbot have never encountered in its

training. We got the respond of each question from the 3 models and will be giving them a score from 1-

3. 3 for the best response given out of the 3 chatbots and 1 for the worst response. The chatbot will be

evaluated equally based on 2 criteria

1. The accuracy of the response

2. How human is the response

The first column gives us the index of the table. ‘Q’ represents question and ‘A’ represents answer. For

instance, ‘Q1’ is the index for question 1 and ‘A1’ is the index for answer 1. In the second column, we

have the questions and answers. The table is structure in such a way where we have the question,

followed by the actual answer given by the dialogue corpus in the next row.

The next 3 columns are the results from the 3 models, word2vec, GloVe and fasText respectively. The

top row, together with the question, gives the response from the model while the bottom row, together

with the answer, gives the score of each model.

Table 5.2.1.1 Truncated table showing the first question and answer pair from trained models and answer provided by the Ubuntu Dialogue Corpus

Question/Answer Response


Q1 what green ?

samantharonson_@ unfortu_nate

unsigned_char HELEN_COONAN_Well

l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure required provide least havent anyway yes post-

exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-

exertional post-exertional

do think refer porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A1

usb to ps2 converter came with the mouse in

the box 1 2 3 S

The full results of the responses and scoring is in the appendix, section 9.2.

Recalling the sentend vector that we have implemented in section 3.3, this vector will appear at the end

of the sentences. We could replace these vectors in the Chatbot integration, however, as mentioned in

1.3.2, it is not the focus of this project. As observed from each model, the sentend vector represents the

word:

● l'_Affaire, in the word2vec model

● post-exertional, in the GloVe model

● porn.The, in the fastText model

Page 37

The results generated from the 50,000 training input was not able to generate a sufficiently well trained

model. This made the grading very difficult as most of the answers were incorrect. This was especially so

in the second half of the questions with new questions. We also noticed that for the fastText model,

sentend was found not just at the end of the sentence, but also within the sentence. This is reflected by

the training accuracy, achieving the lowest score out of the 3 models.

One positive result was that we could see that the models have learnt from the training data. The

outputs were not completely random but showed sign of learning. It returned words that were within

the domain of the Ubuntu Dialogue Corpus. For example, its replies had words like ‘Ubuntu’, ‘BIT’ and

computing related subjects.

Despite the difficulty to grade the chatbot, we managed to rank the models. More details on the grading

results can be found in appendix 9.2. In general, we felt that GloVe did the best in terms of replying the

most humanly logical sentences. The scores are summarized in table 5.2.1.2.

Table 5.2.1.2 Technical analysis results for the 3 word embedding models on 100 sample question and answer pairs


Score 129 250 215

Page 38

6 Neuron Activation and Ising Spins We observed an interesting phenomenon during the training whereby there was a spike in accuracy

during the training of the word2vec and GloVe model. In this chapter, we try to give an interpretation to

this phenomenon by drawing an analogy of this phenomenon with a phase transition and modelling the

neuron activations to Ising spins.

6.1 Phase Transition and Ising Model Phase transitions refers to the change in state. Common phase transition we are familiar with includes

vapourisation and melting [38].

We explore a familiar phase transition, melting of ice.

The Helmholtz free energy of a system is given by the following equation:

𝐹 ≡ 𝑈 − 𝑇𝑆

Where F is the Helmholtz free energy, U is the internal energy of the system, T is the absolute

temperature of the surrounds and S is the entropy of the system

At very low temperature, the water is in solid form, ice, as heat is applied to the ice the temperature

increases while the molecules held within a crystal lattice vibrates faster and becomes more energetic.

Once the temperature reaches 0°C or 273.15K, this is a critical temperature where phase transition

occurs. At this point, upon heating the ice, the temperature of the ice does not change. The heating

process causes the entropy and internal energy to increase as temperature remains constant as the ice

starts to melt.

Another famous model is the Ising Model, named after physicist Ernst Ising, is a popular model for

magnetic solids. In this model, each atom has an intrinsic magnetic momentum called spin. This spin can

exist as spin “up”, which conventionally equals to 1 or “down” which equates to -1. It is being widely

used as a toy model to study phase transition behaviors in statistical mechanics. In this model, close to

the critical temperature, the correlation lengths of the system approaches infinite, while the variance

from the distribution of the spin cluster sizes is heterogeneous.

6.2 Model and Results In Figure 5.1.1.1, we saw that there was a spike in accuracy over the epoch during the training. By

drawing analogy to phase transitions, we hypothesize that before the training starts, the neural network

is in the disorder phase, as the weights are randomly initialized. It evolves towards a 'ordered' phase in

which the model has learned the patterns in data and is able to make predictions. We see that an abrupt

change in accuracy exists for word2vec and GloVe, but not for fastText. If the analogy is accurate, the

word2vec and GloVe models has a “critical epoch” of 250 and 150 respectively.

Since neural activations do not have a clear spatial allocation unlike the Ising model, we look at the

distribution of activations to explore the dynamical evolutions of the model training process.

We investigate the activation of the neurons by feeding in the sentend vector. After initialization, the

weights and biases will start distributing outwards. As these neural network layers are governed by a

sigmoid and tanh functions, the activations tend towards 0 and 1 for the sigmoid function, 1 and -1 for

the tanh function.

Page 39

From figure 6.2.1, we have plot the reverse cumulative distribution function (CDF) for the word2vec

activation at the first gate, we observe that the likelihood of occurrence after the zeroth epoch is at the

extreme values of 0 and 1.

Figure 6.2.1 Reverse CDF plot for the word2vec activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Refer to appendix 9.3.1 for the reverse CDF plots for all the other layers and models.

The 2 states of the activations are similar to that of the up and down spins in the Ising model. Hence, we

try to draw an analogy of the activations to the Ising model. To verify if phase transition occurs, we plot

the variance of the activation from the sentend over 1000 epochs.

From figure 6.2.2, we observe that maximum variance occurs at the region of the “critical epoch” for

LSTM 2 gate 1 in the word2vec model. The same observation can be made in the same layer and same

gate at the “critical epoch” for the GloVe model, illustrated in figure 6.2.4. This suggest to us that phase

transition could occur in LSTM 2 gate 1.

Instead of setting the activation on the sentend vector, we fed the vector for the word ‘java_14’ into the

word2vec and GloVe model. Figure 6.2.3 and figure 6.2.5 shows that a maximum does not occur at the

“critical epoch”.

Page 40

Figure 6.2.2 Variance plot for the word2vec activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Figure 6.2.3 Variance plot for the word2vec activations in the first gate (sigmoid), second LSTM layer, over 1000 epoch at 100 epoch intervals, when the word ‘java_14’ is fed into the activation.

Page 41

Figure 6.2.4 Variance plot for the GloVe activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Figure 6.2.5 Variance plot for the GloVe activations in the first gate (sigmoid), second LSTM layer, over 1000 epoch at 100 epoch intervals, when the word ‘java_14’ is fed into the activation

Page 42

Plots of variance against epoch at each gate per layer for all other layers can be found in appendix 9.3.2.

From the results, we found that the variance of these activations was not at its maximum during the

abrupt change in accuracy. This was consistent for both word2vec and GloVe model. We do not have

enough evidence to support the hypothesis that the abrupt increase was due to a phase transition.

Page 43

7 Future work and Conclusion As natural language processing is a new field of study, little is known about it. There are still many

uncertainties that experts in the field do not understand. Researchers spend their career understanding

this branch of data science. As someone new to data science, given 1 year to do this project, on top of

school and a part time internship, time was a major limitation in this project. Hence, there are some

major improvements that can be done to improve on the result of this study.

Firstly, more research could be done on the natural language processes and applied to the corpus. In

chapter 2.3, we discussed about the various forms of analysis and mentioned that we have only applied

the most basic natural language processes: tokenization, stop word removal and special character

removal. There are many more methods mentioned in chapter 2.3 which may improve the result of the

output.

Another potential improvement that we could work on would be the word embeddings used. Currently,

we are doing transfer learning, using word embedding trained on other context, Google news for

word2vec and Common Crawl for GloVe and fastText. These are general words trained on a large

corpus, however, were not specific to the topic of interest. For instance, there were many irrelevant

words such as ‘samantharonson_@’, ‘porn.the’ and ‘AP_HOCKEY_NEWS’ that were in the replies of the

chatbot. Training the word embedding on the corpus with only relevant words might produce an

improved result. However, would require a lot more time and computational power.

One of the most critical improvement that is applicable to all data science problem is to train on a larger

dataset. The luxury of a super computing lab was not available and thus the access to computational

resources were limited. We trained the model on a home computer with only 8GB ram. Each model took

approximately 5 days to train and initially, we had to keep retraining the model as we were not sure of

what data we wanted to record. Attempts to use the National Supercomputing Centre Singapore (NSCC)

computer were made. This resource is freely available to NUS students. However, at the start of the

project, there was a waterpipe leakage at their centre, causing their infrastructure to be damage and

most services to be unavailable. Subsequently, we it was back online, we tried to again, however, due to

the complexity of the python libraries, older discontinued libraries such as Theano were used in my

code, we did not have enough time to either change the code or to try and figure out how to get the

libraries installed. We were only able to train 50,000 rows of our full corpus. By training the full corpus,

it will allow the weights to better generalize the model, possibly giving a better result.

The dataset used in this study, the Ubuntu Dialogue Corpus, is meant for users to discuss about

technical issues. A lot of these responses are specific to a particular problem, having specific steps to

solve a particular issue. This is something every difficult to generalize and users will require the exact

steps to solve that specific problem. A ruled based chatbot might be more suitable for this problem

setting. A generative based chatbot might be more situatable for more general question and answer

chatbot at this stage.

In this study, we did the most basic implementation into a user interface. We had to run the code and

talk to the chatbot on the python terminal. In future, when the accuracy of the chatbot is improved with

the suggestions mentioned above, or with other modeling techniques, we could complete the chatbot

by integrating it in a Chatbot user interface, such as on telegram, or even building an entire interface for

it. This might benefit Ubuntu users, providing them the convenience of an instant customer service.

Page 44

The physics intuition to analyse the deep learning models has led us to explore possible evidence of

critical phase transition in the training process. That is, whether the transition of the model from

'unlearned' to 'learned' is a critical phase transition analogous to many physical processes. From the

limited analysis of our work, such evidence is lacking. It could be purely the absence of a 'critical' phase

transition or it could be that the models we employed simply do not perform well in the end, meaning

they have not really reached the 'learned' phase yet. Other models or numerical methods can be applied

to further explore the nature of the evolution process during model training, and that could be future

work.

We hope that in the future, as we conduct more research on the tools required to build the chatbot, we

will have a better understanding on data science and neural networks, with the possibility of marrying

physics theories to accelerate the advancement of artificial intelligence. With a fully functioning

generative based chatbot connected to an open domain, we will be able to achieve what was discussed

in section 1.1 and even more!

Page 45

8 Bibliography [1] EY - Analytics. Retrieved from https://www.ey.com/gl/en/issues/business-environment/ey-

analytics

[2] Wetstein, S. (2017). Designing a Dutch financial chatbot (Masters). VU UNIVERSITY

AMSTERDAM.

[3] Schneider, C. (2017). 10 reasons why AI-powered, automated customer service is the future.

Retrieved from https://www.ibm.com/blogs/watson/2017/10/10-reasons-ai-powered-

automated-customer-service-future/

[4] AI for Customer Service | IBM Watson. Retrieved from https://www.ibm.com/watson/ai-

customer-service?cm_mmc=OSocial_Blog-_-

Watson%20and%20Cloud%20Platform_Watson%20Core%20-%20Conversation-_-WW_WW-_-

Landing%20Page&cm_mmca1=000027BD&cm_mmca2=10006919

[5] Avalverde, D. A Brief History of Chatbots. Retrieved from https://pcc.cs.byu.edu/2018/03/26/a-

brief-history-of-chatbots/

[6] Surmenok, P. (2016). Chatbot Architecture. Retrieved from

https://medium.com/@surmenok/chatbot-architecture-496f5bf820ed

[7] Millennium Problems | Clay Mathematics Institute. Retrieved from

http://www.claymath.org/millennium-problems

[8] Mohan, A. T. (2018). A Deep Learning based Approach to Reduced Order Modeling for Turbulent

Flow Control using LSTM Neural Networks. 22.

[9] Wielgosz, M., Skoczeń, A., & Mertik, M. (2017). Using LSTM recurrent neural networks for

monitoring the LHC superconducting magnets.

[10] Frankenfield, J. (2019). Artificial Intelligence (AI). Retrieved from

https://www.investopedia.com/terms/a/artificial-intelligence-ai.asp

[11] AlphaGo | DeepMind. Retrieved from https://deepmind.com/research/alphago/

[12] Sottek, T. (2017). The world’s best Dota 2 players just got destroyed by a killer AI from Elon

Musk’s startup. Retrieved from https://www.theverge.com/2017/8/11/16137388/dota-2-dendi-

open-ai-elon-musk

[13] Machine Learning: What it is and why it matters. Retrieved from

https://www.sas.com/en_sg/insights/analytics/machine-learning.html

[14] Richárd, N. (2018). The differences between Artificial and Biological Neural Networks. Retrieved

from https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-

networks-a8b46db828b7

[15] What is Deep Learning? - Definition from Techopedia. Retrieved from

https://www.techopedia.com/definition/30325/deep-learning

[16] Audacity, T. (2018). Introduction of Deep Learning. Retrieved from

https://medium.com/@techutzpah/introduction-of-deep-learning-e79252bf353a

[17] Mahapatra, S. (2018). Why Deep Learning over Traditional Machine Learning? Retrieved from

https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-

learning-1b6a99177063

[18] Samson, O. (2017). Deep learning weekly piece: the differences between AI, ML, and DL.

Retrieved from https://towardsdatascience.com/deep-learning-weekly-piece-the-differences-

between-ai-ml-and-dl-b6a203b70698

https://www.ey.com/gl/en/issues/business-environment/ey-analytics

https://www.ey.com/gl/en/issues/business-environment/ey-analytics

https://www.ibm.com/blogs/watson/2017/10/10-reasons-ai-powered-automated-customer-service-future/

https://www.ibm.com/blogs/watson/2017/10/10-reasons-ai-powered-automated-customer-service-future/

https://pcc.cs.byu.edu/2018/03/26/a-brief-history-of-chatbots/

https://pcc.cs.byu.edu/2018/03/26/a-brief-history-of-chatbots/

https://medium.com/@surmenok/chatbot-architecture-496f5bf820ed

http://www.claymath.org/millennium-problems

https://www.investopedia.com/terms/a/artificial-intelligence-ai.asp

https://deepmind.com/research/alphago/

https://www.theverge.com/2017/8/11/16137388/dota-2-dendi-open-ai-elon-musk

https://www.theverge.com/2017/8/11/16137388/dota-2-dendi-open-ai-elon-musk

https://www.sas.com/en_sg/insights/analytics/machine-learning.html

https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7

https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7

https://www.techopedia.com/definition/30325/deep-learning

https://medium.com/@techutzpah/introduction-of-deep-learning-e79252bf353a

https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063

https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063

https://towardsdatascience.com/deep-learning-weekly-piece-the-differences-between-ai-ml-and-dl-b6a203b70698

https://towardsdatascience.com/deep-learning-weekly-piece-the-differences-between-ai-ml-and-dl-b6a203b70698

Page 46

[19] Krishnan, S. (2018). Chatbots are cool! A framework using Python. Retrieved from

https://towardsdatascience.com/chatbots-are-cool-a-framework-using-python-part-1-overview-

7c69af7a7439 [20] Ray, S. (2017). Understanding and coding Neural Networks From Scratch in Python and R.

Retrieved from https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-

in-python-and-r/

[21] Ubuntu IRC Logs. Retrieved from https://irclogs.ubuntu.com/2007/12/12/%23ubuntu.html

[22] Lowe, R., Pow, N., V. Serban†, I., & Pineau, J. (2015). The Ubuntu Dialogue Corpus: A Large

Dataset for Research in Unstructured Multi-Turn Dialogue Systems. School of Computer Science,

McGill University, Montreal, Canada.

[23] Tatman, R. (2017). Ubuntu Dialogue Corpus. Retrieved from

https://www.kaggle.com/rtatman/ubuntu-dialogue-corpus [24] Parrish, A. (2018). Understanding word vectors: A tutorial for "Reading and Writing Electronic

Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0

https://creativecommons.org/choose/zero/, other text released under CC BY 4.0

https://creativecommons.org/licenses/by/4.0/. Retrieved from

https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469

[25] CS 2112/ENGRD 2112 Fall 2018. Retrieved from

http://www.cs.cornell.edu/courses/cs2112/2018fa/lectures/lecture.html?id=parsing [26] NSS. (2017). Intuitive Understanding of Word Embeddings: Count Vectors to Word2Vec.

Retrieved from https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-

word2veec/

[27] Ruder, S. (2016). An overview of word embeddings and their connection to distributional

semantic models - AYLIEN. Retrieved from http://blog.aylien.com/overview-word-embeddings-

history-word2vec-cbow-glove/

[28] Heidenreich, H. (2018). Introduction to Word Embeddings | Hunter Heidenreich. Retrieved from

http://hunterheidenreich.com/blog/intro-to-word-embeddings/

[29] Google Code Archive - Long-term storage for Google Code Project Hosting. (2019). Retrieved

from https://code.google.com/archive/p/word2vec/

[30] Banerjee, S. (2018). Word2Vec — a baby step in Deep Learning but a giant leap towards Natural

Language Processing. Retrieved from https://medium.com/explore-artificial-

intelligence/word2vec-a-baby-step-in-deep-learning-but-a-giant-leap-towards-natural-language-

processing-40fe4e8602ba

[31] Socher, R., D. Manning, C., & Pennington, J. (2019). GloVe: Global Vectors for Word

Representation. Computer Science Department, Stanford University, Stanford, CA 94305.

[32] Rajasekharan, A. (2017). What is the main difference between word2vec and fastText?.

Retrieved from https://www.quora.com/What-is-the-main-difference-between-word2vec-and-

fastText

[33] Selivanov, D. (2015). GloVe vs word2vec revisited. · Data Science notes. Retrieved from

http://dsnotes.com/post/glove-enwiki/

[34] Goyal, P. (2017). What is the difference between LSTM, RNN and sequence to sequence?.

Retrieved from https://www.quora.com/What-is-the-difference-between-LSTM-RNN-and-

sequence-to-sequence

https://towardsdatascience.com/chatbots-are-cool-a-framework-using-python-part-1-overview-7c69af7a7439

https://towardsdatascience.com/chatbots-are-cool-a-framework-using-python-part-1-overview-7c69af7a7439

https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/

https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/

https://irclogs.ubuntu.com/2007/12/12/%23ubuntu.html

https://www.kaggle.com/rtatman/ubuntu-dialogue-corpus

https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469

http://www.cs.cornell.edu/courses/cs2112/2018fa/lectures/lecture.html?id=parsing

https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/

https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/

http://blog.aylien.com/overview-word-embeddings-history-word2vec-cbow-glove/

http://blog.aylien.com/overview-word-embeddings-history-word2vec-cbow-glove/

http://hunterheidenreich.com/blog/intro-to-word-embeddings/

https://code.google.com/archive/p/word2vec/

https://medium.com/explore-artificial-intelligence/word2vec-a-baby-step-in-deep-learning-but-a-giant-leap-towards-natural-language-processing-40fe4e8602ba



https://www.quora.com/What-is-the-main-difference-between-word2vec-and-fastText

https://www.quora.com/What-is-the-main-difference-between-word2vec-and-fastText

http://dsnotes.com/post/glove-enwiki/

https://www.quora.com/What-is-the-difference-between-LSTM-RNN-and-sequence-to-sequence

https://www.quora.com/What-is-the-difference-between-LSTM-RNN-and-sequence-to-sequence

Page 47

[35] Banerjee, S. (2018). An Introduction to Recurrent Neural Networks. Retrieved from

https://medium.com/explore-artificial-intelligence/an-introduction-to-recurrent-neural-

networks-72c97bf0912

[36] Olah, C. (2015). Understanding LSTM Networks. Retrieved from

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[37] Jones, A. An Explanation of Xavier Initialization. Retrieved from

http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

[38] Eastman, P. (2014). 6. Phase Transitions — Introduction to Statistical Mechanics. Retrieved from

https://web.stanford.edu/~peastman/statmech/phasetransitions.html

[39] Shrimal, S. (2017). The Semicolon. Retrieved from https://github.com/shreyans29/thesemicolon

[40] Building a Chatbot: analysis & limitations of modern platforms | Tryolabs Blog. (2017). Retrieved

from https://tryolabs.com/blog/2017/01/25/building-a-chatbot-analysis--limitations-of-modern-

platforms/

[41] Baroni, M., Dinu, G., & Kruszewski, G. (2019). Don’t count, predict! A systematic comparison of

context-counting vs. context-predicting semantic vectors. Retrieved from

http://clic.cimec.unitn.it/marco/publications/acl2014/baroni-etal-countpredict-acl2014.pdf

[42] Drakes, W. Information Retrieval: CHAPTER 8: STEMMING ALGORITHMS. Retrieved from

http://orion.lcg.ufrj.br/Dr.Dobbs/books/book5/chap08.htm

https://medium.com/explore-artificial-intelligence/an-introduction-to-recurrent-neural-networks-72c97bf0912

https://medium.com/explore-artificial-intelligence/an-introduction-to-recurrent-neural-networks-72c97bf0912

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

https://web.stanford.edu/~peastman/statmech/phasetransitions.html

https://github.com/shreyans29/thesemicolon

https://tryolabs.com/blog/2017/01/25/building-a-chatbot-analysis--limitations-of-modern-platforms/

https://tryolabs.com/blog/2017/01/25/building-a-chatbot-analysis--limitations-of-modern-platforms/

http://clic.cimec.unitn.it/marco/publications/acl2014/baroni-etal-countpredict-acl2014.pdf

http://orion.lcg.ufrj.br/Dr.Dobbs/books/book5/chap08.htm

Page 48

9 Appendix

9.1 Weight and Bias Results As discussed in 5.3 about the weights and biases, we look at the plots of the results in detail.

9.1.1 Evolution over 1000 epochs In this section, we plot the kernel, recurrent kernel and bias individual LSTM layers for each model

across 1000 epochs. We can study how the weights and biases evolves in each layer.

9.1.1.1 Word2vec

Figure 9.1.1.1 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model first LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the first layer of the word2vec network, we observer the weights redistributing away from the




negative values. For recurrent kernel 1, the weight distribution is found to be more symmetrical on both




biases spreads out.

Page 49

Figure 9.1.1.1.2 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model second LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the second layer of the word2vec network, we observer the weights redistributing away from the




negative values. For recurrent kernel 2, the weight distribution also not symmetrical and is also biased

more towards the negative values. We observed that the movement of weights from the recurrent

kernel is greater than the kernel.

The weights for both kernel and recurrent kernel spread out more rapidly in the second layer as

compared to the first layer



biases spreads out.

Page 50

Figure 9.1.1.1.3 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model third LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the third layer of the word2vec network, we observer the weights redistributing away from the




negative values. For recurrent kernel 3, the weight distribution also not symmetrical and is also biased

more towards the negative values. We observed that the movement of weights from the recurrent

kernel is greater than the kernel.

The weights for both kernel and recurrent kernel spread out more rapidly in the third layer as compared

to the second layer



biases spreads out.

Page 51

Figure.9.1.1.1.4 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec model fourth LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the fourth layer of the word2vec network, we observer the weights redistributing away from the




negative values. For recurrent kernel 4, the weight distribution also not symmetrical, however the

weights are observed to be biased more towards positive. We observed that the movement of weights

from the recurrent kernel is greater than the kernel.

The weights for both kernel and recurrent kernel spread out more rapidly in the fourth layer as

compared to the third layer



biases spreads out. The distinction between the 2 peaks were not observable by the end of the training

Page 52

9.1.1.2 GloVe

Figure 9.1.1.2.1 Graphs with the distribution of kernel, recurrent kernel and biases for the GloVe model first LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the first layer of the GloVe network, we observer the weights redistributing away from the center

where the values of the weights were close to 0.


epochs increase. For kernel 1, the weight distribution is rather symmetrical on both positive and

negative sides. For recurrent kernel 1, the weight distribution not symmetrical and is also biased more

towards the negative values. We observed that the movement of weights from the recurrent kernel is

greater than the kernel.



biases spreads out.

Page 53

Figure 9.1.1.2.2 Graphs with the distribution of kernel, recurrent kernel and biases for the GloVe model second LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the second layer of the network, we observer the weights redistributing away from the center


The weights were most volatile at the first 100th epoch and the redistribution less is vigorous as the


negative values. For recurrent kernel 2, the weight distribution not symmetrical and is also biased more

towards the negative values. We observed that the movement of weights from the kernel is greater

than the recurrent kernel.





biases spreads out.

Page 54

Figure 9.1.1.2.3 Graphs with the distribution of kernel, recurrent kernel and biases for the GloVe model third LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the third layer of the GloVe network, we observer the weights redistributing away from the center


The weights were most volatile at the first 100th epoch and the redistribution less is vigorous as the



weights are observed to be biased more towards positive values. We observed that the rate which the

weights distributes for the kernel and recurrent kernel is similar.


to the second layer



biases spreads out.

Page 55

Figure 9.1.1.2.4 Graphs with the distribution of kernel, recurrent kernel and biases for the GloVe model fourth LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the fourth layer of the GloVe network, we observer the weights redistributing away from the center





weights are observed to be biased more towards positive. We observed that the movement of weights

from the recurrent kernel is greater than the kernel.






Page 56

9.1.1.3 fastText

Figure 9.1.1.3.1 Graphs with the distribution of kernel, recurrent kernel and biases for the fastText model first LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the first layer of the fastText network, we observer the weights redistributing away from the center



epochs increase. For kernel 1, the weight distribution is rather symmetrical on both positive and

negative sides. For recurrent kernel 1, the weight distribution also symmetrical on both positive and

negative sides. We observed that the movement of weights from the recurrent kernel is greater than

the kernel.



biases spreads out.

Page 57

Figure 9.1.1.3.2 Graphs with the distribution of kernel, recurrent kernel and biases for the fastText model second LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the second layer of the fastText network, we observer the weights redistributing away from the




negative values. For recurrent kernel 2, the weight distribution not symmetrical and is also biased more

towards the negative values. We observed that the movement of weights from the kernel is greater

than the recurrent kernel.





biases spreads out.

Page 58

Figure 9.1.1.3.3 Graphs with the distribution of kernel, recurrent kernel and biases for the fastText model third LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the third layer of the word2vec network, we observer the weights redistributing away from the




positive values. For recurrent kernel 3, the weight distribution I rather symmetrically distributed on both



to the second layer



biases spreads out.

Page 59

Figure 9.1.1.3.4 Graphs with the distribution of kernel, recurrent kernel and biases for the fastText model fourth LSTM layer, over the evolution of 1000 epochs, with its quantity plotted on a log scale

For the fourth layer of the fastText network, we observer the weights redistributing away from the




positive values. For recurrent kernel 4, the weight distribution also not symmetrical and the weights are

observed to be biased more towards positive. We observed that the movement of weights from the

recurrent kernel is greater than the kernel.






Page 60

9.1.2 Model Comparison In this section, we plot the kernel, recurrent kernel and bias individual LSTM layers on the 1000 epoch

across the 3 different models. We can study how the weights and bias differs for each model at the end

of the training.

Figure 9.1.2.1 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText model first LSTM layer, at the 1000 epochs, with its quantity plotted on a log scale

In the first layer of the 3 models, we observed that the weights from the word2vec model is the most

heterogeneous while the GloVe model was the least heterogeneous in the kernel and recurrent kernel.

However, the biases for the GloVe model had the widest distribution while fastText had the narrowest

distribution.

Page 61

Figure 9.1.2.2 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText model second LSTM layer, at the 1000 epochs, with its quantity plotted on a log scale

In the second layer of the 3 models, we observed that the weights from all models distributed almost

similar in the kernel. The distribution for GloVe and fastText is similar in the recurrent kernel, whereas

the word2vec model is slightly less distributed in the positive values. The bias distribution for all 3

models is very similar.

Page 62

Figure 9.1.2.3 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText model third LSTM layer, at the 1000 epochs, with its quantity plotted on a log scale

In the third layer of the 3 models, we observed that the weights from all models distributed almost

similar in the kernel and recurrent kernel. The bias distribution for all 3 models is also very similar.

Page 63

Figure 9.1.2.4 Graphs with the distribution of kernel, recurrent kernel and biases for the word2vec, GloVe and fastText model fourth LSTM layer, at the 1000 epochs, with its quantity plotted on a log scale

In the fourth layer of the 3 models, we observed that the weights in the kernel from fastText model is

the most heterogeneous on the positive values while word2vec model is the most heterogeneous on the

negative values. GloVe had the narrowest distribution in both the positive and negative distribution. The

weight distribution in the recurrent kernel is the widest for word2vec and the narrowest for the glove

model. The bias distribution for GloVe model was the widest in the positive values and fastText model

was the widest in the negative values

9.2 Technical Results In this section, we have the responses from the 3 different chatbot models for 100 questions. The first

50 questions are questions that the chatbot was trained on while the next 50 are questions from the

Ubuntu Dialog Corpus that the chatbot was not trained on. These 100 question and answers were

randomized.

The first column gives us the index of the table. ‘Q’ represents question and ‘A’ represents answer. For

instance, ‘Q1’ is the index for question 1 and ‘A1’ is the index for answer 1. In the second column, we

have the questions and answers. The table is structure in such a way where we have the question,

followed by the actual answer given by the dialog corpus in the next row.

The next 3 columns are the results from the 3 models, word2vec, GloVe and fasText respectively. The

top row, together with the question, gives the response from the model while the bottom row, together

with the answer, gives the score of each model. This score is evaluated based on accuracy of answer and

how human the sentence is, 3 being the best and 1 being the worst score.

Page 64

Table 9.2.1 Questions and answers from trained models and answer provided by the Ubuntu Dialogue Corpus

Question/Answer Response


Q1 what green ?

samantharonson_@ unfortu_nate

unsigned_char HELEN_COONAN_Well


l'_Affaire

sure required provide least havent anyway yes post-

exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

do think refer porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A1

usb to ps2 converter came with the mouse in

the box 1 2 3 S

Q2 yes

Stupid_stupid Circumstances_dictate Circumstances_dictate Circumstances_dictate Circumstances_dictate

reallllly l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

if if if if check required return charters charters charters

policyIf policyIf policyIf post-exertional post-exertional

do do reistall reistall seex porn.The

porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A2 i would try putting them

there 2 3 1 S

Q3 is it okay to just hand out

Ubuntu CDs?

Indexing_Options RUSH_Wait reallllly tried

could install anyway Indexing_Options

slackware astonishing_Siemionow

l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

there can that able need files use use fill either windows

charters post-exertional post-exertional post-exertional

think try use see MicroXP thing nt do

don troubleshooting use

multi-monitors AMplz intellipoint

porn.The R

A3

absolutely what is the output of: wget -O alsa-info.sh http://www.alsa-project.org/alsa-info.sh; chmod +x ./alsa-info.sh;

./alsa-info.sh the command makes a link if you select to upload to

the server ;) ok and what is the issue? 2 1 3 S

Q4

....yeah, thats where it showed me 36 updates

but it wont do them

RUSH_Okay Circumstances_dictate etc_fstab Stupid_stupid anyway want DBX_files anyway VIGELAND_Well

l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

there if install install open file not you check installed

amyone post-exertional post-exertional post-exertional

post-exertional

but BrowseSystem xwindows xshell .Xr

BrowseSystem directory.Click

bsdinstall porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

Page 65

A4

Find the one that says 'NVIDIA accelerated

graphics driver (version xxx) [Recommended]'. Click it then click the

'Activate' button. You can't install updates in a command line 'sudo apt-

get update' then 'sudo apt-get upgrade' what

they said, only you have to use sudo What error?

the command is 'sudo apt-get update' YOu've

got an Nvidia card, it should work fine.

YOu've got to install these updates Just copy this and paste it into a terminal: sudo apt-get update then this: sudo apt-get upgrade do you

have synaptic open? gimme a sec sudo pkill apt then sudo apt-get

upgrade 1 3 2 S

Q5 yup

RUSH_Okay dpa_cb Pfft 1Gig_DIMM

Circumstances_dictate CNA_fa

Circumstances_dictate inlets_ponds_creeks inlets_ponds_creeks

opera_Elaine_Padmore opera_Elaine_Padmore


there file required required change login can check

required required return number charters post-

exertional post-exertional

but .Xr reistall reistall reistall

reistall seex porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A5 s/thread/wik entry/ 2 3 1 S

Q6

how do i get the image on the usb stick or you

mean usb cdrom

Stupid_stupid tried jolie 1Gig_DIMM MediaSmart


sure double-check register post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

onesClick daysdetails wasnt porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A6

usb cdrom would be easy, usb flash didn't

work for me. https://help.ubuntu.com/community/Installation

/FromUSBStick 1 3 2 S

Page 66

Q7 you got the script run?

RUSH_Okay Stupid_stupid WHat

equaling_Vince_Lombardi wmf psd

Ali_Farokhmanesh_gutsy I'l back l'_Affaire l'_Affaire


sure setup directly post-exertional login if address post-


but brought install windoze reistall do porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A7 ya, this is givin gme hell- sorry. thanks for trying.. 0 0 0 S

Q8 Didn't work on my pc.

Just doesn't start

yes get let allow reallllly could might l'_Affaire


l'_Affaire

if exceptional install that post-exertional post-exertional

post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

post-exertional

do showed modules.dep geting

diconnecting getting do try version.Click

porn.The porn.The porn.The porn.The porn.The porn.The R

A8

hmm that doesnt make sense well, if you can't get TVTime, my next

choice would be KDE... XawTV.. as noted by

someone above, works, but its ugly 2 1 3 S

Q9

-> in generality, fdisk /dev/sddX and follow

the menu -> i have already given you to command for using

debootstrap

GLENN_RENWICK Chariton_Newspapers

l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

there router codes policyIf give you anyone can post-

exertional post-exertional post-exertional post-exertional post-exertional post-exertional

post-exertional

do xubuntu porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A9

Yes, okay, thanks. Can you tell me the command for

debootstrap again, as I had a shower, and

mibbit made your text dissappear! 1 3 2 S

Q10

i visited the samba site, but I cant find anything

under downloads

Stupid_stupid rar nt reallllly vm use

PMB_Portable_software asus l'_Affaire l'_Affaire


l'_Affaire

sure run especially id know possible who required required

can charters post-exertional post-exertional post-exertional

post-exertional

thing thats dont AMtrying jaunt linuxes ut2k4

porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A10

try sudo apt-get upgrade samba that is the same version as yours where

are you located? (to select the right

download server) 1 3 2 S

Page 67

Q11

can I do that from here ... hmmmm hold on how can I do that from here ... I can access the other

files

Pfft drives_CDs_DVDs Interrogation_techniques

going ascii polish_turd going install Ehh l'_Affaire


l'_Affaire

sure input that use post-exertional post-exertional


post-exertional

do text-mode dvd-drive xwindows

pkgtool run make picClick but bash-

script have porn.The porn.The porn.The porn.The R

A11

you can do 'sudo cp /etc/network/interfaces /etc/network/interfaces.bak' and then do 'sudo

cp /media/OTHER_INSTALL/etc/network/interfaces

/etc/network/interfaces' 2 3 1 S

Q12

compiz will not let me drag to another

workspace this way work I found where its set:

System Settings > Window Behavior >

Advanced > Placement: Centered see I have no

system settings anywhere

Stupid_stupid redhat associate_Vyacheslav_Sok

olenko JEFF_BURTON_Well

Defragment l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

couldnt information post-exertional login - system 12


umm installed reistall easylist

cpuburn porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A12 I said that was for KDE...

not gnome 1 2 3 S

Q13

uncomplicated firewall for the less ambitios.

'sudo ufw enable' should be done on all macines directly on the internet

alien translates between rpm and deb

know alien @_bbc.co.uk. guess BACHMANN_Well is

Stupid_stupid l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

know that email configure install needed that would

would login need post-exertional post-exertional

post-exertional post-exertional

think alien AMSuch doubletalk vimperator

Slashcode win2k3 stepgen rtos -gui

porn.The porn.The porn.The porn.The

porn.The R

A13

i know what alien is, he wrote 'rvm' which can be

lvm or rpm 3 1 2 S

Q14

once you resolve the name to the IP you can

start a TCP connection to the actual server, if your system ALREADY knows

the resolve it will not need the internet to resolve it and will be

faste your ISP will provide a DNS but if you

want to use different ones then thats you call

RUSH_Okay Men_Seeded_Winners

System_Configuration_Utility could Elena_Nola

usr_sbin Personality_wise l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

if once work hav there concerned policyIf post-


stuffClick use use try run porn.The


porn.The R

A14

Seveas so I'll just take that as a 'yes', I am using Google's DNS instead of my ISPs? :) I am planning on doing the local DNS cache as you suggested as well, but just looking

to find out if the DNS 1 2 3 S

Page 68

actually uses Googles since I'm behind a router

Q15

well... lets just say that i get a big table, and its

aaaaall zeroes

Stupid_stupid try subservient_housewives

Stupid_stupid Chrome_#.#.###.###

1Gig_DIMM see anyway Dell_TrueMobile install


l'_Affaire

there nt get again post-exertional post-exertional


post-exertional

do think much probedisk try one

thing thing thing get ndisgtk mA.

xwindows xubuntu porn.The R

A15

I recommend you try putting that disk back

into the other computer :) 1 2 3 S

Q16

That's awesome - Anyhow... how would I

get 'x' to start via terminal?

Pfft would l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure would that post-exertional post-exertional

post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

do thats thing do thing porn.The


porn.The R

A16 startx 1 3 2 S

Q17

geez i keep thinking i should start playing

WoW.. it's easy.. friends play it, lol.. but i'd drop too much time into it at

least FFXI was a challenge

Oh txt_file MediaSmart l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

ok needs he check can information you work example get either opportunity provide

provide post-exertional

yeah windoze xwindows one so

cant dont dont withstar stuffClick

nvidia do computer diconnecting

porn.The R

A17

WoW is a cash cow now. Leveling is easy, but fun. They do a decent job of

providing content for casual play and hardcore folks. I'm just too much

of an addictive personality to keep any

balance 2 3 1 S

Q18

nope !!! --- I am trying to remember my steps,

but , the main thing is to be able to modify the

fstab

Stupid_stupid going MediaSmart txt_file LTO4

Persistent_rumors l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

there would required log reinstall policyIf installed post-


do system see porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A18 im waiting to your

solution 1 3 2 S

Page 69

Q19 Whatever you do don't

do that ;)

Oh Calverts_disappearance speaker_Koksal_Toptan

reallllly try might JON_ALPERT

Someway_somehow things

PETER_COSTELLO_Well going want KDE_desktops

l'_Affaire l'_Affaire

sure concerned need numbers well if unit installed policyIf

post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

thats installed bucks.Click four

comupter google more thats being but do try install install porn.The R

A19

lol ok, I'll be sure not to where do you

reccomend searching? 1 2 3 S

Q20

how to open the other system then. from

network in nautlius?

Stupid_stupid l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure falan computer work computer provide installed

operate logging post-exertional post-exertional post-exertional post-exertional post-exertional

post-exertional

onesClick daysdetails trying

do porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A20

you'd need to share the filesystems out via

NFS/Samba/SSHFS etc etc 1 3 2 S

Q21

no. when i brows for a connection on my

domain (localdm) i get this error: Browsing for service type _rfb._tcp in domain localdm failed:

Timeout reached

oh ACD_ACD_Systems wanta Which_brings

Cal_JJ_Arrington l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure well nt display to fill there can icono if can if interface ip

whether

do HWInfo bits.A porn.The porn.The

honest work Henyman work thing porn.The

porn.The porn.The porn.The porn.The R

A21

Ã¯Â»Â¿@wigren If this question is below your

level, please do not take offense, as I've found

that such questions can sometimes be worth

asking: You have enabled VNC on the desktop,

correct? Ã¯Â»Â¿@wigren I'm

gonna go refill my coffee. Feel free to send me a personal message,

so I don't miss it. 1 3 2 S

Q22 wierd is an

understatement :P

http yeah datamart hurdled_tackler

Matthieu_Humery_head line lol still ought


http if well documents post-exertional post-exertional


post-exertional

http but do go thing electical drive


A22 outrageously wierd?.. 1 2 3 S

Page 70

Q23

sudo grep -r -i HTTP /etc/apt/* any

response?

Stupid_stupid charset_= CNA_fa l'_Affaire


l'_Affaire

sure 3 entries post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

post-exertional

do 2 objectshidden4fun porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A23 yeah. the source.list and

2 entries in apt.conf 1 3 2 S

Q24

anwy way to get it to display automatically

what do i need to add to make it display the

options of safemode, memtest, and windows?

Oh Stupid_stupid Following

inlets_ponds_creeks l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

there sure back next registrations if post-exertional cvs present charters vista able

specific required post-exertional

think nt work problems work

porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A24

i'll post you a default menu.lst to pastebin so

you can just look for yourselfe what you

need. 1 2 2 S

Q25 reboot and let me know

Oh http_ftp Tee_hee really think Ovidiu_Rom

l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure possible use able anyway did upload one time

registration could post-exertional post-exertional


umm daysdetails do Well and thing work

work one most porn.The porn.The porn.The porn.The

porn.The R

A25

? Ok. Login ok. nvidia installed. Back to

original issue from 5 hours ago! Turning on

Desktop Cube/Rotate in Hardy resets Effects from 'Extra' to 'None' give up

yet? (FYI - nvidia installed via Envy) It all

work is Fiesty and Gutsy You crack me up!

Thanks. Resolved. It was working before you had

me reinstall xserver :) but I appreciate the

attemp 1 2 3 S

Q26

No matter what the operating system is, you HAVE to find a program to backup Or it comes with the distribution, ubuntu in fact does

come with everything you need to back things up If you do not like that

approach, you can search for a solution

oh os bullsh_* look l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure there laptop if check you would wont not that that nt

possible post-exertional post-exertional

do debians withstar do daysdetails see

oh mean thing porn.The porn.The porn.The porn.The porn.The porn.The R

Page 71

more to your liking just as you would on any

other systeme

A26

No they shouldn't, there's no law in the

government to say 'NO OPERATING SYSTEM

WILL HAVE A BACKUP UTILITY' 1 3 2 S

Q27 Yep Should I undo that

or change it or anything?

Stupid_stupid Stupid_stupid might

Dominic_Rhodes_plow webmail_interface ftp calvin redhat see come

want l'_Affaire l'_Affaire l'_Affaire l'_Affaire

if qredit card executables check would there that policyIf post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

do porn.The porn.The porn.The porn.The comupter porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A27

can you paste the results of 'ifconfig -a' to

pastes.ubuntu.com, and send me the link? i

wouldn't change it right now 1 2 3 S

Q28 ActionParsnip i dont care

how i fix it lol

Stupid_stupid samantharonson_@ do

RUSH_Okay insisted l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure can work delete would server need check else post-


thing PMhay navigational use

OS.Click thing porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A28

slighly vague on that cause i haven't had to since probably back

when i had teh mx 440 ActionParsnip has a point about nvidia-

settings 1 3 2 S

Q29 yeah, i figured that... :)

yeah docx_file Contrasting_textures downloaden anyway

going butched i_dunno think ought see l'_Affaire

l'_Affaire l'_Affaire l'_Affaire

there tll post-exertional post-exertional not 12 config output

please able required (250) post-exertional post-exertional

post-exertional

thats with windos glibc replyGetting Vrythramax would <pre facility.Read


A29 btw, whats the shortcut

to the terminal? 2 1 3 S

Q30

ok, i think these are connected, except the

video driver case maybe, but it is also possible

Stupid_stupid 1Gig_DIMM AP_HOCKEY_NEWS l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure know needed amyone sumtink information login

there can required can using that post-exertional post-

do think put send the porn.The


Page 72

that that is the source... l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

exertional porn.The porn.The porn.The porn.The

porn.The

A30

do the freezes occur when certain apps are

running? have you tried to xkill them? 1 3 2 S

Q31

, yes merodent , rahul_ , same point, grub> (cant

use quit/exit)

Stupid_stupid go reinstalling_Windows

Sooooo opera_Elaine_Padmore


there updated log code post-exertional post-exertional


post-exertional

jobClick page1.htm server.domain.com

tpye porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A31 quick ban razer freespire

yuck 1 2 3 S

Q32

but I dont quite understand how the raid

could be the problem here?

Pfft supose try horsepower_Hemi_engine

contactmusic_quoted msn_messenger

Oh_c'mon os findout Chrome_#.#.###.### Dear_Bossy l'_Affaire

l'_Affaire l'_Affaire l'_Affaire

if check post-exertional amyone get xp can that delete

one does if either required post-exertional

do sudo pkgtool can onWikianswers

Quotecan linuxes koneksi idle-timeout run


porn.The R

A32

because grub cant detect the raid-array if it is not activated with dmraid

(what the alternate-cd is doing) maybe give it a try? if it is not working

you can use usb-flash or buy a ide-drive good luck 2 3 1 S

Q33 same reliance net

connect

@_kiranchetrycnn_@ 5.x. is l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

there required file can file policyIf post-exertional post-


ok terminal winmodems

connection PMfeel ping porn.The


A33 ah ok 1 2 3 S

Q34

you cant have got it more right, thats exactly how it is :) if you check

disk usage now - you should see a lot more freespace for /home :)

RUSH_Okay vm AdContext really see decided might could

might might might get anyway l'_Affaire

l'_Affaire

sure vista happen that default to install need would that

there ubuntu either applications post-exertional

but getting waay mHotspot gnu-linux porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

Page 73

A34

that's a thing of beauty then , coz i can still just

as easily look at my music 'close by' without

having to navigate to /media/multimedia/music everytime indeed this

is the acid test 1 2 3 S

Q35

g-ma meant to be a gud modeller, but also no

renderer afaik

anyway wanted let ought presumptious andrea

SADY happeneing anyway l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

give try even specified specific that possible specific example one program post-exertional

post-exertional post-exertional post-exertional

give try anything thats thing thing

cperl-mode stuffClick thing

things thing notesLoading

pkgtool PMmight porn.The R

A35

give kpovmodeler a try, perhaps the povray manual has quite an

amusing example of a raytracer written in

povray's SDL ;) 2 3 1 S

Q36 o okay i just go

permission denied

Stupid_stupid courtney_cox

spokesman_Eitan_Bencuya RUSH_Okay could

www.geospaceinc.com expected

Fox_affiliate_WUPW l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

there required policyIf post-exertional post-exertional


daysdetails porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A36

sudo wpa_passphrase mywireless_ssid

'secretpassphrase' > /etc/wpa_supplicant.con

f 1 3 2 S

Q37 : then?

anyway see going Laughing.

Circumstances_dictate reallllly MediaSmart

reallllly Which_brings reallllly

Dominic_Rhodes_plow l'_Affaire l'_Affaire l'_Affaire l'_Affaire

if as sawed shown files hang post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

post-exertional

think loadlin thing cfset use

kpowersave dpkg-divert install do system-config-

network timesync connection little

porn.The porn.The R

A37 seperate hard drive.. 1 3 2 S

Q38

'Response: 500 OOPS: vsftpd: cannot locate

user specified in 'guest_username':$USER

'

RUSH_Okay associate_Vyacheslav_Sok

olenko really really n'est_pas l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

if one there server so error away fixed nt nid returned

return specified policyIf post-exertional

think daysdetails thing fc18 install

line--a cued porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

Page 74

A38

well, it seems that you have to specify an user...

like 'virtual' or something maybe it will

help you http://archives.neohapsis.com/archives/openbsd/2005-12/0755.html this one surelly will help you, but its quite the same i

told you :) http://howto.gumph.org/content/setup-virtual-

users-and-directories-in-vsftpd/ you will need to enable the local users, i

guess 1 3 2 S

Q39 bash: /dev/dsp: Device

or resource busy

soooooooo SSPX_bishops l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure dirs file information ip given be there there required

file can L8S post-exertional post-exertional

thats porn.The utili- porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A39 lol 2 1 3 S

Q40 iso 10.10 desktop

want using anyway l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

not 64bit command file up create display to possible

required post-exertional post-exertional post-exertional


Puppylinux sorta withstar tv.Click

puter. killdisk porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A40

32bit? 32bit? and what program are you using to

burn? 2 3 1 S

Q41 that is odd then

Pfft msn_messenger Circumstances_dictate

TM_Distribution_Reinvestment ls intefere WHat

stuff l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure that use able post-exertional post-exertional


post-exertional

do isant but do install porn.The


porn.The R

A41

i know i can't figure it out lol it's been driving

me a little crazy 2 3 1 S

Q42 you have 2 entries for

that repo?

RUSH_Okay like refreshing GIF_BMP

setup.exe VKernel_virtual ought l'_Affaire l'_Affaire


sure computer when mount computer nor if need get

elsewhere post-exertional post-exertional post-exertional post-exertional post-exertional

but stuff think thing put .msstyles

objectshidden4fun but windows.Click porn.The drivers.A porn.The porn.The porn.The porn.The R

A42 no 1 3 2 S

Page 75

Q43 Is there a master list of

all PPA's?

Indexing_Options RUSH_Okay

reinstall_Windows_XP Tweak_UI see l'_Affaire


l'_Affaire

there there there changed nt installed throughout place

whether others login login nt one post-exertional

think mesa-utils install.Click mac-like dont thing put sub-

CPU but facility.Read


porn.The R

A43

I think there is a search mechanism, perhaps a directory, let me check 1 2 3 S

Q44

i wanna trasfer the repos to my pc.. coz my friends

done have direct internet connection..

Stupid_stupid vm dont do don'tknow aregoing

another l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure if check login everyone there amyone post-exertional


post-exertional

thing do more http Rapidleech

porn.The cpuburn porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A44 so you don't need to login via ssh for that 1 3 2 S

Q45

:) but the most strange.. i needed to copy the boot of pendrive to / of my

usb hd. hahaha it means the pendrive is not necessary.. excet

because i will not need to install de the grub in the mbr when i start my

ubuntu, my screen freezy on 'configuring Network interface' i

need to press control alt del to skip it, and under

it i see the message about rcS and rc 6. i

want my linux skip this step otherwise i will

always need to press ctrl alt del, do u know how to fix it ? or did u hear about it before ? oh..

just rcS failed to call or something.. but

ok i will restart and i will tell it to u just wait i got it nothing there, when i press ctrl alt del.. just show rcs killed by

sign terminal.. and rc6 .. the same message well

the time is default 120 and i have a dhcp

connection. i think it could be my wireless

even i wait .. this is step

Pfft tried_unsuccessfully want DVRS

www.geospaceinc.com bring courtney_cox i_dunno want see teeny_tiny_lapse


if interface number post-exertional post-exertional


think ummm Postsuser got install

work porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

Page 76

never finish or skip. i really need to press ctrl

alt del so, what can i do ? not sure, its a vaio notebook ah one is

modem other ethernet.. yes the wireless is other really ? cant i skip this

step on init.d ? where do u mean exctally? system

admin network or network tools ? wireless has a - wired has a v and modem - if i disable the wired.. i will not have

internet wireless connection is roaming

mode enable. is it a problem ? please could u

look to it ? http://img120.imageshack.us/my.php?image=ikozu1.png oh shit im stupid sorry.. just feeling stupid

but need i disable that enable roaing mode ?

ok.. lets see what happens thank u very

much. u are genius :** :) i was just reading and im

tired already, i need some coffee.. u need to take a break :) notebook can use agp video too ? or in the case mine is a nvidia it must be pci ?

A45 1 3 2 S

Q46 toshiba satellite a100-

797

think really sudo sudo search see sudo

www.ci.ankeny.ia.us O'REILLY_OK l'_Affaire


l'_Affaire

would work sudo sudo hidd -- search not sudo hidd where

information know post-exertional post-exertional

thing thing sudo sudo hidd -- search

mean sudo usbmount -- search

sorry porn.The porn.The R

A46

the only thing that works for me is when I do 'sudo

sudo hidd --search I mean (sudo hidd --

search) sorry 1 3 2 S

Q47 that's the kernel stuff

Pfft spokesman_Hugo_Eriksse

n 7_Upgrade_Advisor escapist_fiction capping


sure would make concerned continue either post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

do thats provide snap.do installed

near feel.Click porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

Page 77

A47

true /etc/initramfs-tools/scripts/init-bottom

is empty :-( 1 2 3 S

Q48

same command? still same effect. waits like 1.5 seconds and then

new command line pops up

@_kiranchetrycnn_@ www.janus.com run

Continental.com really l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

there use did click access post-exertional post-exertional


ok ask ping.exe thing dofile disable most backup.tar.gz

file porn.The porn.The porn.The porn.The porn.The

porn.The R

A48 try it with the -v switch 1 3 2 S

Q49 in the same file? No luck

with adding that line

imma going said l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure windows work indicate them required either check

need disable parameters post-exertional post-exertional


umm operated question umm

PreservedFish do icon do but

operated regard thing porn.The

porn.The porn.The R

A49

Did you restart the pam and sshd processes before trying login? 2 3 1 S

Q50

shredder only wipe 'files', not free space i

think

Oh CNA_fa Stupid_stupid l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

ok as possible ask nt anyway there able setup anytime

either change post-exertional post-exertional post-exertional

ok installing install work bucks.Click

install opsys install porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A50 ok... well tehre probably are other programs too .. 1 2 3 S

Q51 the buggyness of the distro is fully on topic

anyway try Vyke_Pro try AMERICAblog_Joe_Sudba

y Beckloff_exchanged_pleasantries txt_file get might

go l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

there required changes which there whether we can alot

following post-exertional post-exertional post-exertional


yes removepkg reistall looking linux

grphics Gigabit-Ethernet base-

installer installed being porn.The


A51

No, this is a channel for support, not discussions

about Ubuntu 1 3 2 S

Q52 it doesnt state the time...

Stupid_stupid http://www.cbrands.com

upp install ktre.com go l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

if would connections post-exertional post-exertional


but onesClick porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A52 1 3 2 S

Page 78

Q53 BOOOOOOOYAAAAAAA

HHHHHHHH!

RUSH_Okay Circumstances_dictate Circumstances_dictate Circumstances_dictate Circumstances_dictate Dominic_Rhodes_plow Dominic_Rhodes_plow


login installed count computers computer check additional file log able post-exertional post-exertional

post-exertional post-exertional post-exertional

thats onWikianswers

drives.Click components.Click

battery.Click comptuers porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A53 You scared me. Get soemthing to work? 1 3 2 S

Q54

can someone help me print from a wireless printer using 9.04?

Pfft don'tI'll Kimi_RÃƒâ€žIKKÃƒâ€“NEN

_Yeah Um_um Sooooo COACH_FISHER get automatic_debits By_Azenith_Smith

RUSH_Okay Pajama_Jeans Earn_Rs.####/day_workin

g l'_Affaire l'_Affaire l'_Affaire

sure amyone up work that reset error corresponding specified specified there

apache timestamps there check

do but use Several got get get but

probably one try install do put

porn.The R

A54

is the wireless printer already set up?? Does it have an ip and can you

ping it? 1 2 3 S

Q55

Bug report time on a Breezy>Dapper update! I

did 3 'sudo apt-get -f install's to get things

going, once from an 'x-less' environment,

because the installation screwed up 'x'. In the

process: 1) My sound no longer works. (I have an nVidia card) 2) I had to

re-install a few programs that I had running before, they just

disappeared. To put it mildly... whoever said

that Dapper Drake was ready to release was not

looking at life the way that I l it.

soooooooo burke gma l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

ok if past installed tutorials use need need install if sure post-

exertional post-exertional post-exertional post-exertional

ok nt isant thing nt use use do think


A55

it might be better off the CD. I did an update to Breezy that way and X broke but otherwise it

wasn't too bad. Dapper broke _everything_ but

that was beta2 1 3 2 S

Page 79

Q56 oh god. my frickin ex gf

called.

Oh Tru##_Unix Circumstances_dictate


sure network file twice checked releaseDoors post-exertional post-exertional


post-exertional

do network work porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A56 kick her to the curb 3 1 2 S

Q57

Hello folks, please help me a bit with the

following sentence: 'Order here your

personal photos or videos.' - I think the only allowed version is 'Order your personal videos or photos here.', but I'm

not sure, are you? Did I choose a bad channel? I ask because you seem to

be dumb like windows user

shutup problem Easynews l'_Affaire l'_Affaire try gussy might l'_Affaire


sure error but post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

post-exertional

do problem thats files. porn.The


A57

the second sentence is better english and we

are not dumb 1 3 2 S

Q58

Evening All Is this the best place for new

converts from Windoze to Linux (Ubuntu 8.10)?

yeah +_kerry_kerry Saintula um_ah l'_Affaire


there anytime software reinstall re-install re-install uninstalled anything there know service wont if post-exertional post-exertional

thats isant see Contributor82 stuff

thing porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A58 yes 3 1 2 S

Q59

so i won't be able to use perl rename to do what I

want?

Oh ughh tar.gz tar.gz anyway reallllly os_x Stupid_stupid would

would reallllly l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

there required post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

do use operated porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A59 1 3 2 S

Q60

how to transfer files (250 GB) from my pc and my

new laptop? how to transfer files (250 GB) from my pc to my new

laptop?

Stupid_stupid TRUMP_Well vm imma

want docx_format really courtney_cox

news.bbc.co.uk l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure falan register post-exertional post-exertional


onesClick daysdetails IO-APIC threshold fast-boot porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A60

not by getting kicked from the channel, I

suppose. 1 3 2 S

Q61

where i can i find docs on how to upgrade

ubuntu from warty to hoary?

try AEGIS_LTD Debian_Sarge might could

attempting l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

if see that anyone either post-exertional membership post-


do AWizzArd say deskstop software

puter. things computer Thanks

put three porn.The R

Page 80


post-exertional post-exertional porn.The porn.The porn.The

A61

sudo gedit /etc/apt/sources.list and change all instances of warty to hoary; sudo

apt-get update and then sudo apt-get dist-

upgrade should do it 1 3 2 S

Q62 !pinning | zanberdo

!guidelines | cryptopsy

computor nt spacewalking_tourist get

get Necth_described l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

login our can rego possible provide your 21 access ie etc install post-exertional post-exertional post-exertional

thats MicroXP probedisk win2K isapnp porn.The


porn.The R

A62

/topic works too - I find it more conveniant than copy/pasting. What say

you? 1 3 2 S

Q63

Can anyone recommend an NFS alternative that works transparently?

Pfft try Erland_E._Kailbourne


sure post-exertional additional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

do graphcs run porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A63 once you get NFS set up,

it IS transparent 1 2 3 S

Q64 PLEASE WOMENS

Pfft reallllly l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure check what post-exertional post-exertional


thats qbittorrent so-called porn.The

porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A64 /join #ubuntu-es 1 2 3 S

Q65

Question guys, I know it's probably in the

manual some where but I'm too lazy to look, where abouts is the

setting to make the icons on the desktop smaller?

yeah later Bryan_Hahnfeldt

sober_Boole_added come outrunning_defenders

maybe logoff there'sa_silver_lining line

RUSH_Okay nifty_wraparound nifty_wraparound l'_Affaire l'_Affaire

sure example provide use need installed not provide disk need

if post-exertional post-exertional post-exertional

post-exertional

but rstrui.exe stuff windoze MAME32

perl-base nmap configs. -tf apt-

pinning apt-pinning porn.The porn.The porn.The porn.The R

A65 can't you alt + middle click and resize them? 1 2 3 S

Page 81

Q66 hello everybody i've a

simple problem

shutup Er_um http_ftp courtney_cox Pfft try


l'_Affaire

sure able there log post-exertional post-exertional


post-exertional

do Kanasz winmx and porn.The


A66 which is? 1 3 2 S

Q67

Hi all, I want to install the guest extensions of

virtualbox in ubuntu 10.04 but they dont

work

Pfft move ftp docx_file let wayyy JACKIE_CAMERON want l'_Affaire l'_Affaire


l'_Affaire

if set there fill either post-exertional post-exertional


do one picClick ALL-IN-WONDER should downloud MicroXP

get install bucks.Click ubuntu component booting re-install porn.The R

A67

: Please join #ubuntu+1 for Lucid/10.4

support/discussion. 1 2 3 S

Q68

please help me i movie player display not fine where i go for better

performance

Pfft jolly_japes l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure kde abiword post-exertional post-exertional


thats kpowersave bt5 PMOhhhhhh

daysdetails -l. umm fuck.Click apt-get

seex porn.The porn.The porn.The porn.The porn.The R

A68 what GPU do you have Graphis Processing Unit 2 3 1 S

Q69

huh, PPA sounds sweet! who all is getting in the

beta?

Sooooo Lorem_ipsum want think want think noscript really another

Sooooo might Auslogics_Disk_Defrag

might Circumstances_dictate

l'_Affaire

sure restart install needed that able anyway use post-


post-exertional

heheClick being use tvout oh do

porn.The dont porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A69

it would be really awesome if it

automatically generated packages for all

distributions based on the source code 2 3 1 S

Q70

Hey Pidgin won't connect. I type my email and password but it says it's incorrect. but when i try to sign in though the website it works fine! :s

Er_um use get try exe_files gigapixel_images

use l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

if immediately package work should post-exertional post-


post-exertional

thing thing kpowersave linuxes

uses porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A70

i am a user too, but i think you also may know

there is also a #pidgin channel 3 2 1 S

Q71 That doesn't work for me

Pfft gma l'_Affaire Ovidiu_Rom l'_Affaire


sure there that check install well that that policyIf post-exertional post-exertional

do thats put fix porn.The porn.The porn.The porn.The R

Page 82


post-exertional post-exertional post-exertional post-exertional

porn.The porn.The porn.The porn.The porn.The porn.The

porn.The

A71 1 2 3 S

Q72

do i need to defrag my harddisk first on

windows first before i install ubuntu?

Sooooo really fucking_retarded think

iFuntastic gcc eh supose Wash_rinse zac

Stupid_stupid ought gma norman l'_Affaire

there able novatel policyIf post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

post-exertional

do comouter think think thing laptop.My

xwindows most classpnp.sys nt

core2quad objectshidden4fun

thing problems.More

porn.The R

A72

nah its fine. but if you want to, go on you dont need to defrag. but do you mean partition? 3 2 2 S

Q73

i know this question will send red-flags a mile

high, but i can't seem to get it working on my own. I'm trying to get

aircrack-ng to work, and when i follow the

howto's, it keeps telling we that it can't put my

atheros card can't be put in monitor mode. I've

personally put this card in master mode to use as

a WAP, so i know it SHOUDL be able to do

this also.. is there something ubunt-specific that is goofing me here?

Stupid_stupid TERRY_FRANCONA_Well

@_kiranchetrycnn_@ maybe l'_Affaire l'_Affaire


l'_Affaire

sure if working will there upgrade provide that possible

that there regarding count post-exertional post-exertional

thing do thing 233mhz drbdadm but can probedisk Seems operate put

sometimes.A webbrowsing

graphical porn.The R

A73 ndiswrapper? 1 3 2 S

Q74

It's easy if install first windows 7 and after it

ubuntu. So as last is ubuntu it find all other

operating files to startup.

Stupid_stupid want would docx_file Kinda_sucks

gma l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

if but there able going post-exertional install post-


but winmx porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A74

yes windows lost the bootloader or whatever it is but using liveCD it

can be re-enabled 1 3 2 S

Q75

what time will jaunty-ubuntu mostly be

released

samantharonson_@ reinstall_Windows_XP upgrade wayyy let gma

works Australian_sheepskin_boo

ts l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure able provide changes that functions possible well check


do say think thing thing thing think instlled use thing

but classpnp.sys put objectshidden4fun

porn.The R

Page 83

l'_Affaire

A75 ^^^^^ 1 3 2 S

Q76

sorry subcool whats a light DM without compiz that will still have game

capability?

Stupid_stupid going Sooooo docx_format

Digital_Negative_DNG like nmap wmf

detrimentally_affects RUSH_Okay l'_Affaire


if before linux provide Linux hr post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

post-exertional

do quite pyNeighborhood

thing pkgfile operate AMttt

besides porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A76 XFCE is also pretty

lightweight 1 3 2 S

Q77

hi i would like to upgrade to 11.04 from 10.04 but

update-manager -d propose me only 10.10

should i change my preference to testing or

something like that ?

Pfft hollyrpeete_@ think charset_= Inbox_folder let want l'_Affaire l'_Affaire


if get well you that need change allow user generate provide provide able post-exertional post-exertional

do but other Powertop provide

try use comments.Click use AMPlz xshell make

AMttt make porn.The R

A77

Also, natty/11.04 support/discussion is

only in #ubuntu+1 1 2 3 S

Q78 hey, I just changed the

cd drive AND it worked!

Er_um ftp regional_energo

monetarism_adopted think hahah

SCOTT_THISTLE_covers etc_fstab

Circumstances_dictate gma MammoView_Ã‚Â® [email protected]

get ought l'_Affaire

if would post-exertional post-exertional post-exertional


thing daysdetails use to umm


A78 hehe 1 2 3 S

Q79

Help! If I type in 'sudo depmod -a' nothing

happends. It asks me for the sudo password, I

enter it, and then I just get the regular

command prompt. Should I not get some

data from that command?

Alright_alright associate_Vyacheslav_Sokolenko l'_Affaire l'_Affaire


l'_Affaire

sure 06794 post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

ltsp5 fugured gonaads -----signature-----

Ã¤Â¸â‚¬Ã¨Ë†Â¬Ã¨Â¨Å½Ã¨Â«â€“

freespire install onesClick porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A79 no 1 2 3 S

Q80

its my dads, ive never used it before. IT works for him so i think its all

set up

lÃƒÂ wan't make Ovidiu_Rom l'_Affaire


Ovidiu_Rom l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure well portable either go there that 06095 post-


post-exertional

AMttt use install sets.Click thing

onesClick ntfs get of Ã©Â ÂÃ¦â€¢Â¸

minutesAdd thats velineon

circuit.Click porn.The R

Page 84

A80 1 3 2 S

Q81

i'm trying to tftp to a router. it asks me to

specify a port....which port do i use?

Stupid_stupid sanctuary_DellaBianca

About_generica_viagra_viagra Golly_gee

RUSH_Okay think l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

sure either be able help there you that sure sudo return install option grep post-

exertional

thing porn.The more more

troublshooting being xubuntu

multimedia bitTorrent network computer wireless

ping traps.The porn.The R

A81 tftp:69 ftp:21 1 2 3 S

Q82 its any soft to free ram?

lÃƒÂ Yeah_uh_huh get wayyy smug_git want

site_http://www.lockheedmartin.com files Ehh l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure reinstall file click be application card required

information able vista required nt that post-exertional

AMttt install porn.The zoodad withstar windowz

linpus put objectshidden4fun

do porn.The porn.The porn.The porn.The porn.The R

A82

free ram i wasted ram managin ram is the

kernels job 1 3 2 S

Q83 thanks :)

Er_um Circumstances_dictate

reallllly Dominic_Rhodes_plow Dominic_Rhodes_plow opera_Elaine_Padmore opera_Elaine_Padmore


sure config be possible required code post-exertional


do thing Edonkey dpkg hexchat use

happens.Can porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A83 1 3 2 S

Q84

i need to send a mail when my system starts. but since init/d scripting is beyond me i want to

know if there is a prgram where you can put in yout mail, smtp server and have that program

startup with the complete system

Stupid_stupid 1Gig_DIMM try might WRT##G MediaBlvd_Did Pfft

anyway happiest_camper Appreciable_downside_ris

ks docx_file like courtney_cox l'_Affaire

l'_Affaire

sure error amyone post-exertional post-exertional

post-exertional post-exertional install install additional work install post-exertional post-exertional post-exertional

thing reistall norecoil DoomFrost

install xwindows porn.The

driver.Click porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A84 1 3 2 S

Q85 thanks

Er_um Circumstances_dictate

reallllly Dominic_Rhodes_plow Dominic_Rhodes_plow opera_Elaine_Padmore opera_Elaine_Padmore


sure if check required check check log required charters

post-exertional post-exertional charters charters policyIf post-

exertional

do reistall reistall seex seex porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A85 1 3 2 S

Page 85

Q86

how to clean remove wine and app that

installs.... i try remove with 'apt-get purge wine' but my apps that install on wine still there... any

help please :).

Stupid_stupid f_cked ctrl_alt l'_Affaire l'_Affaire


sure falan password access login once access before

before should configure specify install post-exertional post-

exertional

onesClick daysdetails Hdds install iso-image

remotley porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

A86

they are usually stored in $HOME/.wine so the

package-manager hasn't anything to do with it 1 3 2 S

Q87 how can I install mercury messanger on ubuntu?

Stupid_stupid want dn't Spencer_Troian_Bellisario QuickTime_#.#.#_update

want Raikhan_Daffa l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure specify policyIf post-exertional post-exertional


onesClick operated Oviously porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A87

you need to follow the directions: it's tricky, but

you can if i remember right you need to do some sneaky stuff to enable the sys tray 1 3 2 S

Q88

Ubutntu 9.04 beta: compiz: missing

transparency effeccts on all windows. everything

else in compiz works great. any ideas? intel

graphics

try might l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

there either get executable you using asap icq charters post-


but AMReport Add-ADCentralAccessPolicyMember Install TheBigDawg non-

gui apt-get sites.Use CARRYOVER ActSection

Subdivision So-called Subdivision

Subdivision porn.The R

A88 #ubuntu+1 3 2 1 S

Q89

o man sorry i have to much in my head forget

some time's the rules for all channels i'm in.

Stupid_stupid think tummy_pooch try anyway

wmf ##.#.#_Leopard loooooove cheapy this

would =_strlen system##_folder

etc_fstab l'_Affaire

there if linux return there work using post-exertional post-exertional post-exertional

post-exertional post-exertional post-exertional post-exertional

post-exertional

daysdetails linpus Thats say onesClick

Quotecan do RemotelyAnywhere

use use system -r1.11 problems

harddrive porn.The R

A89 1 3 2 S

Q90

Is something going on? All of my mini.iso installs are failing on the linux-generic package. They were working before,

but it's been a few weeks since I needed to install

Ubuntu.

Indexing_Options going Wally_Pipped

Ovidiu_Rom l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

there need there ms post-exertional post-exertional


post-exertional

think get poweriso use use thing thing

run porn.The porn.The porn.The porn.The porn.The porn.The porn.The R

Page 86

A90

the linux-generic various meta packages have

been updated, but the actual packages aren't up yet (or they weren't

earlier) 1 2 3 S

Q91

http://pastebin.com/d74f8dc93 i guese thats what you want no ?

Stupid_stupid want want try would persuade

Penny_Musgraves wayyy l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

l'_Affaire

there outside when folder shortly login one that access

login corp post-exertional post-exertional post-exertional

post-exertional

do porn.The lol files equipt. porn.The


porn.The R

A91 lspci | grep vga 1 3 2 S

Q92 ah, ok.

Oh Circumstances_dictate reallllly l'_Affaire l'_Affaire


sure well stored post-exertional post-exertional


thats use thing somewhat kinda do porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A92 1 3 2 S

Q93 no the text based

oh document try sheÃƒÂ¢ easily_hackable

eeny_meeny_miny_moe would

Kimi_RÃƒâ€žIKKÃƒâ€“NEN_Yeah system##_folder friend_Dion_Mial would

filesystem come l'_Affaire l'_Affaire

sure work try post-exertional policyIf policyIf register

whether address following possible login ptr that post-

exertional

do reistall tftpboot say software called servers.A but ran


A93

ok , i have intel onboard vga, with hdmi output,

how do i use it? 1 3 2 S

Q94

is there any easy way to update the refresh rate to 72 instead of 60 or

something, ubuntu isn't auto-detecting it

Indexing_Options RUSH_Okay might will

id_=_#####,## eeny_meeny_miny_moe

ought want do BSODs anyway think l'_Affaire


there there see check ask specific specified

corresponding sudo hostname not configure not possible

post-exertional

think mesa-utils reistall porn.The

porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A94 LinuxJones check

XF86Config-4 1 3 2 S

Q95

look at the download page without live

desktop and the gui grub.cfg

anyway reallllly RUSH_Okay l'_Affaire


l'_Affaire

if not same post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

post-exertional

do run say say to script.Click textpad locate costs.Learn porn.The porn.The porn.The porn.The porn.The porn.The R

A95 2 3 1 S

Q96

Is it possible to disable the graphics card in my AGP slot and run from my onboard graphics without removing the

Indexing_Options RUSH_Wait reallllly l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

there can well he ubuntu either example access can

provide using install required work post-exertional

think try use but make nt drivers


Page 87

card? l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

porn.The porn.The

A96

Probably. Look at the BIOS for your computer,

not here. (!) 1 2 3 S

Q97

Hello Does Ubuntu have somekind of register to configure applications and os settings? Hello

Does Ubuntu have somekind of register to configure applications

and os settings?

shutup reinstall @_kiranchetrycnn_@

skeeved Circumstances_dictate


sure disk delete filename go post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional post-exertional

do problem ummm go systems.One

stuff thing porn.The porn.The porn.The porn.The porn.The porn.The porn.The

porn.The R

A97

Linux doesn't have an equivalent to the Windows registry 1 2 3 S

Q98 !schedule october?

RUSH_Okay seemed might mount A.It_s really


l'_Affaire

login available there before mil register 64 required update


thats stuff debian PMup apt-get apt-get devx.sfs have

scsi porn.The porn.The porn.The porn.The porn.The

porn.The R

A98 right, october. 6.10 (2006, 10th month) 3 2 1 S

Q99

any OS is better than windows All microshit employees should be

stoned to death

hollyrpeete_@ Pixelated Homepages

aunt_Philomena_McCann Labor_frontbencher_Tany

a_Plibersek daughter_Tomasita

supose Rightfully O'Reirdan Bert_Brantley

Kazinform_cites_RIA_Novosti

Opposition_Leader_Mario_Dumont could l'_Affaire

l'_Affaire

sure install lspci check computer because failure there there sudo fix post-exertional post-exertional


yes being go fugured errorClick crapClick ( reistall

questionsClick onesClick

AMSOOOO thing xcircuit cable-

connected porn.The R

A99 please stop 1 2 3 S

Q100

oh, ok. It seems a project that dates long time ago. Never heard about. Thx

Oh tar.gz MediaSmart l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire l'_Affaire

sure get able L8S post-exertional post-exertional


post-exertional

do but windows get do have bucks.Click make thing ie use


A100 1 3 2 S

Total Score 129 250 215

Page 88

9.3 Activation Analysis We analyse the reverse cumulative distribution function (CDF) and variance for activations of the 3

models when the sentend vector is fed into the LSTM network. The peak occurred approximately at

around the 260 epoch for word2vec, 150 epoch for GloVe and fastText.

9.3.1 Reverse-CDF We plotted the reverse CDF for the 3 models for each gate in each layer at interval of 100 epochs.

Similar results were found in all 3 models.

From the graphs, apart from the 0th epoch, all the graphs have a close to flat increase from 0 to 1

(sigmoid function) or -1 to 1(tanh function). The likelihood that the activations take values of either 0 or

1(sigmoid function) or -1 to 1(tanh function) is almost 1. This means that almost all the values are close

to either 0 or 1(sigmoid function) or -1 to 1(tanh function)

word2vec

Figure 9.3.1 Reverse CDF plot for the word2vec activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervasl.

Page 89

Figure 9.3.1.2 Reverse CDF plot for the word2vec activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Figure 9.3.1.3 Reverse CDF plot for the word2vec activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 90

Figure 9.3.1.4 Reverse CDF plot for the word2vec activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

GloVe

Figure 9.3.5 Reverse CDF plot for the GloVe activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 91

Figure 9.3.6 Reverse CDF plot for the GloVe activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Figure 9.3.7 Reverse CDF plot for the GloVe activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 92

Figure 9.3.8 Reverse CDF plot for the GloVe activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

fastText

Figure 9.3.9 Reverse CDF plot for the fastText activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 93

Figure 9.3.20 Reverse CDF plot for the fastText activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Figure 9.3.31 Reverse CDF plot for the GloVe activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 94

Figure 9.3.42 Reverse CDF plot for the fastText activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

9.3.2 Activation Variance For each model, we plot the variance of the neuron activations against the epoch at all gates and layers.

The activations are calculated by feeding the activation equation with the sentend vector.

If a phase transition were to occur, a maximum variance should occur approximately at the “critical

epoch” of 260 for word2vec and the “critical epoch” of 150 for GloVe. From figure 9.3.2.1 and figure

9.3.2.6, LSTM 2 Gate 1 for both word2vec and GloVe model showed signs of phase transition as the

maximum point is happening around the “critical epoch” region.

We changed the vector fed into the activation equation to the word ‘java_14’. From figure 9.3.2.2 and

9.3.2.7, the maximum variance was not occurring near the “critical epoch” region. Hence, this suggests

that having the maximum variance at the critical epoch was coincidental and there is no indication of a

phase transition

Page 95

word2vec (spike at approximately 260)

Figure 9.3.5.1 Variance plot for the word2vec activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch interval.

Figure 9.3.6.2 Variance plot for the word2vec activations in the first gate (sigmoid), second LSTM layer, over 1000 epoch at 100 epoch intervals, when the word ‘java_14’ is fed into the activation.

Page 96

Figure 9.3.2.3 Variance plot for the word2vec activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Figure 9.3.2.4 Variance plot for the word2vec activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 97

Figure 9.3.2.5 Variance plot for the word2vec activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervasl.

GloVe (Spike at approximately 150 epoch)

Figure 9.3.2.6 Variance plot for the GloVe activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 98

Figure 9.3.2.7 Variance plot for the GloVe activations in the first gate (sigmoid), second LSTM layer, over 1000 epoch at 100 epoch intervals, when the word ‘java_14’ is fed into the activation

Figure 9.3.2.8 Variance plot for the GloVe activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 99

Figure 9.3.2.9 Variance plot for the GloVe activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Figure 9.3.2.10 Variance plot for the GloVe activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 100

fastText (no spike)

Figure 9.3.2.11 Variance plot for the fastText activations in the first gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Figure 9.3.2.12 Variance plot for the fastText activations in the second gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 101

Figure 9.3.2.13 Variance plot for the fastText activations in the third gate (tanh) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Figure 9.3.2.14 Variance plot for the fastText activations in the fourth gate (sigmoid) across the 4 LSTM layers, over 1000 epoch at 100 epoch intervals.

Page 102

9.4 Codes

http://www.planetb.ca/syntax-highlight-word

9.4.1 Chatbot Building Data Preparation

1. import csv

2. import numpy as np

3. from pprint import pprint

4.

5.

6. # read csv file on python

7. filepath = "dialogueText_301.csv"

8.

9.

10. with open(filepath, "r", encoding='latin1') as f:

11. reader = csv.reader(f)

12. data = [line for line in reader]

13.

14. # read only first 50000 rows

15. df=data[1:50000]

16.

17.

18. # grouping all the next neighbour text together

19. def grouped(df):

20. lst1 = []

21. lst = []

22. for i in range(len(df)):

23. if i in lst:

24. continue

25. else:

26. for j in range(i+1, len(df)):

27. if df[i][3] == df[j][3]:

28. lst.append(j)

29. df[i][-1] += ' ' + df[j][-1]

30. if df[i] not in lst1:

31. lst1.append(df[i])

32. else:

33. if df[j] not in lst1:

34. lst1.append(df[j])

35. break

36. return lst1

37. group = grouped(df)

38. #group by ID

39. ID = []

40. for i in group:

41. g = []

42. g.append(i[1])

43. g.append(i[5])

44. ID.append(g)

45.

46.

47.

http://www.planetb.ca/syntax-highlight-word

Page 103

48. # only chat output list by id

49. values = sorted(set(map(lambda x:x[0], ID)))

50. chat = [[y[1] for y in ID if y[0]==x] for x in values]

51.

52.

53.

54. #group into pairs

55. l = []

56. for i in chat:

57. if len(i)%2==0:

58. paired = zip(i[:-1:2], i[1::2])

59. l.append(list(paired))

60. else:

61. odd_i = i

62. odd_i.append(' ')

63. paired = zip(odd_i[:-1:2], i[1::2])

64. l.append(list(paired))

65. f = open('extract_301.pickle','wb')

66. pickle.dump(l,f)

67. f.close()

NLP pipeline

1. import csv


3. import pandas as pd


5.

6. from prep import l

7. import pickle

8.

9.

10. import nltk

11.

12. import re

13. import string

14. from nltk.tokenize import word_tokenize, sent_tokenize

15. from nltk.corpus import stopwords

16. from nltk.stem import PorterStemmer

17.

18. default_stemmer = PorterStemmer()

19. default_stopwords = stopwords.words('english')

20. def clean_text(text, ):

21.

22. def tokenize_text(text):

23. return [w for s in sent_tokenize(text) for w in word_tokenize(s)]

24.

25. def remove_special_characters(text, characters=string.punctuation.replace('-

', '')):

26. tokens = tokenize_text(text)

27. pattern = re.compile('[{}]'.format(re.escape(characters)))

28. return ' '.join(filter(None, [pattern.sub('', t) for t in tokens]))

29.

30.

Page 104

31. def remove_stopwords(text, stop_words=default_stopwords):

32. tokens = [w for w in tokenize_text(text) if w not in stop_words]

33. return ' '.join(tokens)

34.

35. text = text.strip(' ') # strip whitespaces

36. text = text.lower() # lowercase

37. #text = stem_text(text) # stemming

38. text = remove_special_characters(text) # remove punctuation and symbols

39. text = remove_stopwords(text) # remove stopwords

40. #text.strip(' ') # strip whitespaces again?

41.

42. return text

43.

44.

45. text=l

46.

47.

48.

49. clean = []

50. for i in text:

51. y = []

52. for j in i:

53. for k in j:

54. x = clean_text(k)

55. y.append(x)

56. paired = zip(y[:-1:2], y[1::2])

57. clean.append(list(paired))

58.

59.

60. f = open('w2v50000.pickle','wb')

61. pickle.dump(clean,f)

62. f.close()

For the following codes, the same process is repeated for word2vec, GloVe and fastText. Hence, only

the code for the implementation of word2vec is attached

word2vec embedding

1. # download gensim environment

2. import gensim

3. from gensim.models import KeyedVectors



6. import pickle

7.

8.

9. model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz',

10. encoding='utf-8',

11. binary=True)

12.

13. f = open('clean50000.pickle','rb')

14. clean =pickle.load(f)

15.

16.

Page 105

17.

18. count=0

19. question=[]

20. answer=[]

21.

22. for x in clean:

23. for t in x:

24. for u in t:

25. count+=1

26. if count%2==1:

27. question.append(u)

28. else:

29. answer.append(u)

30.

31. split_question = [item.split(" ") for item in question]

32.

33. split_answer = [item.split(" ") for item in answer]

34.

35. vocab = model.vocab.keys()

36. question_vectors=[]

37.

38. for sent in split_question:

39. sentvec = [model[w] for w in sent if w in model.vocab]

40. question_vectors.append(sentvec)

41.

42.

43.

44. answer_vectors=[]

45.

46. for sent in split_answer:

47. sentvec = [model[w] for w in sent if w in model.vocab]

48. answer_vectors.append(sentvec)

49.

50.

51. ### Return a new array of given shape and type, filled with ones.

52.

53. sentend=np.ones((300,),dtype=np.float32)

54.

55. for tok_sent in question_vectors:

56. tok_sent[14:] = []

57. tok_sent.append(sentend)

58.

59. for tok_sent in question_vectors:

60. if len(tok_sent) < 15:

61. for i in range(15 - len(tok_sent)):


63.

64. for tok_sent in answer_vectors:

65. tok_sent[14:] = []


67.

68. for tok_sent in answer_vectors:

69. if len(tok_sent) < 15:

70. for i in range(15 - len(tok_sent)):


72.

73. f = open('w2v5000.pickle','wb')

Page 106

74. pickle.dump([question_vectors,answer_vectors],f)

75. f.close()

Training embeding on LSTM model

1. import pickle


3. from keras.models import Sequential

4. from keras.layers.recurrent import LSTM, SimpleRNN

5. import theano

6. from keras.callbacks import CSVLogger

7.

8. theano.config.optimizer = "None"

9.

10. f = open('w2v50000.pickle','rb')

11. question_vectors,answer_vectors=pickle.load(f)

12.

13. x_train = np.array(question_vectors)

14. y_train = np.array(answer_vectors)

15.

16. model=Sequential()

17. model.add(LSTM(output_dim=300,input_shape=x_train.shape[1:],return_sequences=True, init

='glorot_normal', inner_init='glorot_normal', activation='sigmoid'))







21. model.compile(loss='cosine_proximity', optimizer='adam', metrics=['accuracy'])

22.

23. csv_logger = CSVLogger('w2vlog.csv', append=True, separator=';')

24. model.fit(x_train, y_train, nb_epoch=0, callbacks=[csv_logger])

25.

26. model.save('w2vLSTM0.h5');


28.



31.


33. model.fit(x_train, y_train, nb_epoch=100 ,callbacks=[csv_logger])

34.



37.



40.



43.



46.


Page 107


49.



52.



55.


57. predictions=model.predict(x_test)

58. f = open('predictions','wb')

59. pickle.dump([predictions],f)

60. f.close()

Simple implementation into chatbot

Implementation only on pycharm interface to retrieve results generated by model


2. from keras.models import load_model


4. import gensim

5. import nltk


7.

8. import theano

9. theano.config.optimizer="None"

10.

11.

12.

13.

14. model=load_model('w2vLSTM1000.h5')

15. mod = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz',


17. binary=True)

18.

19. while (True):

20. x = input("Enter the question:");

21. sentend = np.ones((300,), dtype=np.float32)

22.

23. sent = nltk.word_tokenize(x.lower())

24. sentvec = [mod[w] for w in sent if w in mod.vocab]

25.

26. sentvec[14:] = []

27. sentvec.append(sentend)

28. if len(sentvec) < 15:

29. for i in range(15 - len(sentvec)):


31. sentvec = np.array([sentvec])

32.

33. predictions = model.predict(sentvec)

34. outputlist = [mod.most_similar([predictions[0][i]])[0][0] for i in range(15)]

35. output = ' '.join(outputlist)

36. print (output)

Page 108

9.4.2 Analysis of results Randomisation of 100 questions and expected answers

Code is for extracting first 50 questions from trained data. Similar steps can be done to extract the next

50 questions.

1. import pickle

2. import numpy


4. import random

5. import csv

6.

7. f = open('extract_301.pickle','rb')

8. l = pickle.load(f)

9.

10. count=0

11. question=[]

12. answer=[]

13. for x in l:

14. for t in x:

15. for u in t:

16. count+=1

17. if count%2==1:

18. question.append(u)

19. else:

20. answer.append(u)

21.

22. random.seed(42)

23. random.shuffle(question)

24. random.seed(42)

25. random.shuffle(answer)

26. randomq = (question[:50])

27. randoma = (answer[:50])

28.

29.

30. csvfile = "randomq_301.xls"

31. with open(csvfile,"w") as output:

32. writer = csv.writer(output, lineterminator='\n')

33. for word in randomq:

34. writer.writerow([word])

35. output.close()

36.

37. csvfile = "randoma_301.xls"

38. with open(csvfile,"w") as output:

39. writer = csv.writer(output, lineterminator='\n')

40. for word in randoma:

41. writer.writerow([word])

42. output.close()

word2vec model answers




4. import nltk

Page 109


6.

7.

8. import theano

9. theano.config.optimizer="None"

10.

11. model=load_model('w2vLSTM1000.h5')

12. mod = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz',


14. binary=True)

15.

16. df=pd.read_csv(r'randomq_301.xls', header=None)

17. df=df.rename({0:'questions'}, axis=1)

18.

19.

20.

21. def function(x):

22. sentend = np.ones((300,), dtype=np.float32)

23. sent = nltk.word_tokenize(x.lower())

24. sentvec = [mod[w] for w in sent if w in mod.vocab]

25.

26. sentvec[14:] = []


28. if len(sentvec) < 15:

29. for i in range(15 - len(sentvec)):


31. sentvec = np.array([sentvec])

32.

33. predictions = model.predict(sentvec)

34. outputlist = [mod.most_similar([predictions[0][i]])[0][0] for i in range(15)]

35. output = ' '.join(outputlist)

36. return (output)

37.

38.

39.

40. df['answers'] = df['questions'].apply(function)

41.

42. df.to_csv('w2vqna_301.csv')

9.4.3 Analysis of model parameters Plotting loss fuction and accuracy from training over 1000 epochs



3. import matplotlib.pyplot as plt

4.

5. df=pd.read_csv("w2vlog.csv")

6. df=df.iloc[:,0].str.split(';', expand=True)

7. df=df.rename(index=str, columns={0: "epoch1", 1: "acc",2: "loss"})

8. df=df.astype(float)

9. df = df.reset_index()

10. df["epoch"]=df["index"]

11. df=df.astype(float)

Page 110

12. df.plot(kind='line',x= 'epoch',y='acc',color='red')

13. plt.xlabel("epoch")

14. plt.show()

15.

16. df.plot(kind='line',x='epoch',y='loss',color='blue')

17. plt.show()

Extracting the weights from h5 file stored in the keras file for each 100 epoch



3. import csv

4. import pickle

5.

6. model0=load_model('w2vLSTM0.h5')











17.

18.

19. weights0 = model0.get_weights()











30.

31.

32.

33.

34. f= open('file0.pickle','wb')

35. pickle.dump(weights0,f)

36. f.close()



39. f.close()



42. f.close()


Page 111


45. f.close()



48. f.close()



51. f.close()



54. f.close()



57. f.close()



60. f.close()



63. f.close()



66. f.close()

Plotting of kernel, recurrent kernel and bias for the first layer of the LSTM network. Similar code can be

derived to plot the kernel, recurrent kernel and bias of the second, third and fourth layer.




4. import pickle

5. f= open('file0.pickle','rb')

6.

7. x0 =pickle.load(f)

8. A=x0[0].flatten()

9.

10.



13. a=x1[0].flatten()

14.



17. b=x2[0].flatten()

18.



21. c=x3[0].flatten()

22.



25. d=x4[0].flatten()

26.



29. e=x5[0].flatten()

Page 112

30.



33. z=x6[0].flatten()

34.



37. g=x7[0].flatten()

38.



41. h=x8[0].flatten()

42.



45. i=x9[0].flatten()

46.



49. j=x10[0].flatten()

50.

51. import math

52. intervali = np.linspace(-8, 8, 100)

53. yA, xA = np.histogram(A, bins=intervali)

54. ya, xa = np.histogram(a, bins=intervali)

55. yb, xb = np.histogram(b, bins=intervali)

56. yc, xc = np.histogram(c, bins=intervali)

57. yd, xd = np.histogram(d, bins=intervali)

58. ye, xe = np.histogram(e, bins=intervali)

59. yf, xf = np.histogram(z, bins=intervali)

60. yg, xg = np.histogram(g, bins=intervali)

61. yh, xh = np.histogram(h, bins=intervali)

62. yi, xi = np.histogram(i, bins=intervali)

63. yj, xj = np.histogram(j, bins=intervali)

64.

65. plt.plot(xA[:-1], yA,label="0",color='K')

66. plt.plot(xa[:-1], ya,label="100")

67. plt.plot(xb[:-1], yb,label="200")

68. plt.plot(xc[:-1], yc,label="300")

69. plt.plot(xd[:-1], yd,label="400")

70. plt.plot(xe[:-1], ye,label="500")

71. plt.plot(xf[:-1], yf,label="600")

72. plt.plot(xg[:-1], yg,label="700")

73. plt.plot(xh[:-1], yh,label="800")

74. plt.plot(xi[:-1], yi,label="900")

75. plt.plot(xj[:-1], yj,label="1000")

76.

77. plt.yscale('log')

78. plt.ylabel('Quantity (log)')

79. plt.xlabel('Kernel')

80.

81.

82. plt.legend(loc='upper left')

83. plt.title("w2v histogram of kernels 1")

84. plt.show()

85.


Page 113



89.




93.




97.




101.




105.




109.




113.




117.




121.




125.



128. j=x10[1].flatten()

129.

130. import math













143.

Page 114

144. plt.plot(xA[:-1], yA,label="0",color='k')



147. plt.plot(xc[:-1], yc,label="300")


149. plt.plot(xe[:-1], ye,label="500")


151. plt.plot(xg[:-1], yg,label="700")


153. plt.plot(xi[:-1], yi,label="900")


155.



158. plt.xlabel('Recurrent Kernel')

159.


161. plt.title("w2v histogram of recurrent kernels 1")

162. plt.show()

163.

164. #bias 1




168.




172.




176.




180.




184.




188.




192.




196.




200.

Page 115




204.



207. j=x10[2].flatten()

208.

209. import math













222.

223. plt.plot(xA[:-1], yA,label="0",color='k')



226. # plt.plot(xc[:-1], yc,label="300")


228. # plt.plot(xe[:-1], ye,label="500")


230. # plt.plot(xg[:-1], yg,label="700")


232. # plt.plot(xi[:-1], yi,label="900")


234.



237. plt.xlabel('Bias')

238.


240. plt.title("w2v histogram of bias 1")

241. plt.show()

Plot of the variance of the activation in the first gate ‘forget gate’ of the first LSTM layer.

Similar code can be derived to plot the second, third and fourth gates and for the second third and

fourth LSTM layer.




4. import pickle

5. import pprint

6. import math

7.

8.

9. def sigmoid(gamma):

Page 116

10. if gamma < 0:

11. return 1 - 1/(1 + math.exp(gamma))

12. else:

13. return 1/(1 + math.exp(-gamma))



16.

17.



20.



23.



26.



29.



32.



35.



38.



41.



44.



47.

48. w=np.ones((300,),dtype=np.float32)

49.

50. # gate 1 activation var

51. e0=[]

52. for i in range(300):

53. w1=x0[0][i][0:300]

54.

55. rw1=x0[1][i][0:300]

56.

57. b1=x0[2][i]

58.

59. e0.append(sigmoid(np.dot(w1,w)+np.dot(rw1,w)+b1))

60.

61. e1=[]


63. w1=x1[0][i][0:300]

64.

65. rw1=x1[1][i][0:300]

66.

Page 117

67. b1=x1[2][i]

68.


70.

71. e2=[]


73. w1=x2[0][i][0:300]

74.

75. rw1=x2[1][i][0:300]

76.

77. b1=x2[2][i]

78.


80.

81. e3=[]


83. w1=x3[0][i][0:300]

84.

85. rw1=x3[1][i][0:300]

86.

87. b1=x3[2][i]

88.


90. e4=[]


92. w1=x4[0][i][0:300]

93.

94. rw1=x4[1][i][0:300]

95.

96. b1=x4[2][i]

97.


99.

100. e5=[]


102. w1=x5[0][i][0:300]

103.

104. rw1=x5[1][i][0:300]

105.

106. b1=x5[2][i]

107.


109.

110. e6=[]


112. w1=x6[0][i][0:300]

113.

114. rw1=x6[1][i][0:300]

115.

116. b1=x6[2][i]

117.


119.

120. e7=[]


122. w1=x7[0][i][0:300]

123.

Page 118

124. rw1=x7[1][i][0:300]

125.

126. b1=x7[2][i]

127.


129.

130. e8=[]


132. w1=x8[0][i][0:300]

133.

134. rw1=x8[1][i][0:300]

135.

136. b1=x8[2][i]

137.


139.

140. e9=[]


142. w1=x9[0][i][0:300]

143.

144. rw1=x9[1][i][0:300]

145.

146. b1=x9[2][i]

147.


149.

150.

151.

152. e10=[]


154. w1=x10[0][i][0:300]

155.

156. rw1=x10[1][i][0:300]

157.

158. b1=x10[2][i]

159.


161.

162.

163. var0=np.var(e0)











174.

175. plt.plot([0,100, 200, 300, 400,500,600,700,800,900,1000], [var0,var1,var2,var3,v

ar4,var5,var6,var7,var8,var9,var10])

176. plt.title('LSTM 1 Gate 1 activation')

177. plt.xlabel('epoch')

178. plt.ylabel('variance')

179. plt.show()

Page 119

Plot of the reverse cumulative step historgram for the first gate (‘forget gate’) of the first LSTM layer.

Similar code can be derived to plot the second, third and fourth gates and for the second third and

fourth LSTM layer.




4. import pickle

5. import pprint

6. import math

7.

8.

9. def sigmoid(gamma):

10. if gamma < 0:

11. return 1 - 1/(1 + math.exp(gamma))

12. else:

13. return 1/(1 + math.exp(-gamma))



16.

17.



20.



23.



26.



29.



32.



35.



38.



41.



44.



47.

48. w=np.ones((300,),dtype=np.float32)

49.

Page 120

50. # gate 1 activation

51. e0=[]


53. w1=x0[0][i][0:300]

54.

55. rw1=x0[1][i][0:300]

56.

57. b1=x0[2][i]

58.


60.

61. e1=[]


63. w1=x1[0][i][0:300]

64.

65. rw1=x1[1][i][0:300]

66.

67. b1=x1[2][i]

68.


70.

71. e2=[]


73. w1=x2[0][i][0:300]

74.

75. rw1=x2[1][i][0:300]

76.

77. b1=x2[2][i]

78.


80.

81. e3=[]


83. w1=x3[0][i][0:300]

84.

85. rw1=x3[1][i][0:300]

86.

87. b1=x3[2][i]

88.


90. e4=[]


92. w1=x4[0][i][0:300]

93.

94. rw1=x4[1][i][0:300]

95.

96. b1=x4[2][i]

97.


99.

100. e5=[]


102. w1=x5[0][i][0:300]

103.

104. rw1=x5[1][i][0:300]

105.

106. b1=x5[2][i]

Page 121

107.


109.

110. e6=[]


112. w1=x6[0][i][0:300]

113.

114. rw1=x6[1][i][0:300]

115.

116. b1=x6[2][i]

117.


119.

120. e7=[]


122. w1=x7[0][i][0:300]

123.

124. rw1=x7[1][i][0:300]

125.

126. b1=x7[2][i]

127.


129.

130. e8=[]


132. w1=x8[0][i][0:300]

133.

134. rw1=x8[1][i][0:300]

135.

136. b1=x8[2][i]

137.


139.

140. e9=[]


142. w1=x9[0][i][0:300]

143.

144. rw1=x9[1][i][0:300]

145.

146. b1=x9[2][i]

147.


149.

150.

151.

152. e10=[]


154. w1=x10[0][i][0:300]

155.

156. rw1=x10[1][i][0:300]

157.

158. b1=x10[2][i]

159.


161.

162.

163.

Page 122

164. bins = 150

165.

166. fig, ax = plt.subplots(figsize=(8, 4))

167.

168.

169. # Overlay a reversed cumulative histogram.

170. ax.hist(e0, bins=bins, density=True, histtype='step', cumulative=-1,

171. label='0 epoch')



















190. ax.hist(e10, bins=bins, color='k', density=True, histtype='step', cumulative=-

1,


192.

193. # tidy up the figure

194. ax.grid(True)

195. ax.legend(loc='right')

196. ax.set_title('Reverse cumulative step histograms for gate 1 LSTM 1')

197. ax.set_xlabel('activation')

198. ax.set_ylabel('Likelyhood of occurence')

199.

200. plt.show()

Word Vectorisation for Long-Short-Term-Memory (LSTM) model ...

Documents