Project Report (MBA 653A) 2015
Indian Institute of Technology, Kanpur
REAL TIME SENTIMENT ANALYSIS USING TWITTER FEED
Project Report by
Swapnil Shwetank Jha (11753)
Shibendu Saha (11679)
Anshu Kumar Gupta (11125)
Shivendu Bhushan (11689)
[Group 4]
Project Supervisor: Dr. Shankar Prawesh
IIT Kanpur
This project would not have been possible without the kind support and help of many individuals, and we extend our sincere thanks to all of them.
We are highly indebted to Dr. Shankar Prawesh for his guidance and constant supervision, for providing the necessary information regarding the project, and for his support in completing it. Our thanks and appreciation also go to our colleagues who helped develop the project and to everyone who willingly helped us with their abilities.
Table of Contents
Objective
Dataset
Introduction
Algorithm
Results
Future Prospects
Appendix (Python Codes)
Bibliography
Objective
The aim of our project is to collect real-time tweets about any trending topic
and classify each of them as ‘positive’ or ‘negative’ using a model that we
have trained with a Gaussian Naïve Bayes classifier.
The classified tweets give a picture of the general public response to a
particular product, news item, trend, etc.
Dataset
For training our judgement model, we have used the dataset from ‘Kaggle’
Introduction
Sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic, or the overall contextual polarity of a document. The basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence or an entity feature/aspect is positive or negative.
The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by precision and recall. However, according to research, human raters typically agree with each other only 79% of the time.
Thus, a 70% accurate program is doing nearly as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about any answer.
Algorithm
Get several statements from a database, each with its actual positive or negative response.
Split the statements into two classes: positive and negative.
For each class, compute the tf-idf values and their means and variances to prepare a Gaussian probability distribution.
Map the probabilities using a Naïve Bayes classifier.
Use only 80% of the dataset for training and test the model on the remaining 20%.
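The training steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used in the project: the ten labelled statements are made up, the smoothed idf formula and the small variance floor (var_floor) are choices made for this sketch, and the 80/20 split is taken by position instead of by random shuffling.

```python
import math
from collections import Counter

# Hypothetical labelled statements (not the Kaggle data used in the report).
data = [
    ("great phone love it", "positive"),
    ("terrible battery hate it", "negative"),
    ("love this great watch", "positive"),
    ("hate this awful watch", "negative"),
    ("really great service", "positive"),
    ("really awful service", "negative"),
    ("love the new design", "positive"),
    ("terrible and awful design", "negative"),
    ("love this great design", "positive"),
    ("hate this terrible battery", "negative"),
]

# 80/20 split by position; real data should be shuffled first.
cut = len(data) * 80 // 100
train, test = data[:cut], data[cut:]

train_docs = [text.split() for text, _ in train]
train_labels = [label for _, label in train]
vocab = sorted({tok for doc in train_docs for tok in doc})

# Smoothed inverse document frequency, fitted on the training part only.
n_docs = len(train_docs)
doc_freq = Counter(tok for doc in train_docs for tok in set(doc))
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}


def tfidf(tokens):
    """Map a token list to a tf-idf vector over the training vocabulary."""
    counts = Counter(tokens)
    length = max(len(tokens), 1)
    return [counts[t] / length * idf[t] for t in vocab]


def fit_gaussian_nb(vectors, labels, var_floor=1e-3):
    """Per class: log prior, feature means and floored feature variances."""
    model = {}
    for label in sorted(set(labels)):
        rows = [v for v, lab in zip(vectors, labels) if lab == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [
            sum((x - m) ** 2 for x in col) / len(col) + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(len(rows) / len(labels)), means, variances)
    return model


def predict(model, vector):
    """Return the class with the highest Gaussian log posterior."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(vector, means, variances)
        )
    return max(model, key=log_posterior)


model = fit_gaussian_nb([tfidf(doc) for doc in train_docs], train_labels)
predictions = [predict(model, tfidf(text.split())) for text, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

The variance floor keeps a feature that never occurs in one class from producing a zero variance, which would make the Gaussian density undefined for that class.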
Actual Program
Authenticate with Twitter using a token.
Collect real-time tweets about a trending topic.
Transform the tweets into vectors and pass them to our previously trained judgement model.
Classify the tweets into the two classes based on the computed probabilities.
Display the results about the type of response that the keyword is generating.
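The flow of the actual program can be sketched as follows. Everything named here is hypothetical: fetch_tweets stands in for the authenticated Twitter call (no real API is used and the sample tweets are invented), and toy_classifier is a simple keyword rule standing in for the trained Gaussian Naïve Bayes model. Only the collect, classify and display structure is meant to carry over.

```python
from collections import Counter


def fetch_tweets(keyword):
    """Hypothetical stand-in for the authenticated Twitter request."""
    sample = [
        "Loving the new Apple Watch",
        "Apple Watch battery is terrible",
        "The Apple Watch looks great",
        "Nothing to see here",
    ]
    # Keep only the tweets that mention the keyword.
    return [t for t in sample if keyword.lower() in t.lower()]


def toy_classifier(tweet):
    """Placeholder for the trained model: a keyword rule, for illustration only."""
    negative_words = {"terrible", "awful", "hate"}
    words = set(tweet.lower().split())
    return "negative" if words & negative_words else "positive"


def summarise(tweets, classify):
    """Count how many tweets fall into each class."""
    return Counter(classify(t) for t in tweets)


tweets = fetch_tweets("Apple Watch")
summary = summarise(tweets, toy_classifier)
total = sum(summary.values())
for label in ("positive", "negative"):
    share = 100.0 * summary[label] / total if total else 0.0
    print(f"{label}: {summary[label]} tweets ({share:.0f}%)")
```

Passing the classifier in as a function keeps the display step independent of the model, so the keyword rule could be swapped for the trained model without changing summarise.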
Results (Training)
Overall Accuracy: 88.7%
Per-class recall and precision:
Class      Recall   Precision
Positive   1.00     0.79
Negative   0.80     1.00
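As a reminder of how such figures are derived, the sketch below computes accuracy, precision and recall from paired true and predicted labels. The label counts are hypothetical and are not taken from the report; they were chosen only so that the resulting values land close to the ones reported above.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical test set: 75 positive and 100 negative statements.
y_true = ["positive"] * 75 + ["negative"] * 100
# Every positive is found; 20 negatives are mislabelled as positive.
y_pred = ["positive"] * 75 + ["positive"] * 20 + ["negative"] * 80

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
pos_precision, pos_recall = precision_recall(y_true, y_pred, "positive")
neg_precision, neg_recall = precision_recall(y_true, y_pred, "negative")

print(f"accuracy: {accuracy:.3f}")
print(f"positive: recall {pos_recall:.2f}, precision {pos_precision:.2f}")
print(f"negative: recall {neg_recall:.2f}, precision {neg_precision:.2f}")
```

A positive recall of 1.00 means no positive statement was labelled negative, which is why the negative precision is also 1.00: every statement labelled negative really is negative.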
Results (Real time)
Review of Apple Watch
[Figure: No. of tweets v/s date (in April ’15). Horizontal axis: 15, 16, 17; vertical axis: 0 to 12; series: Positive, Negative.]
Future Prospects
Our application can be used as a service for businesses to analyse the market response to their products and to track how that response changes over time.
Appendix (Codes)
Training
from sklearn.feature_extraction.text import TfidfVectorizer