Dec 16, 2015
SOPS: Stock Prediction using Web Sentiment
Presented by Vivek Sehgal, Charles Song
Department of Computer Science, University of Maryland
ICDMW 2007
2009-05-29
Summarized by Jaeseok Myung
Copyright 2009 by CEBT
In this talk
Introducing three papers on sentiment analysis in finance
[1] Event and Sentiment Detection in Financial Markets (ISWC 08)
– A simple architecture
[2] SOPS: Stock Prediction using Web Sentiment (ICDMW 07)
– The entire process
[3] Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web (Management Science 07)
– An idea that can improve prediction performance
We will focus on SOPS, but will also briefly introduce the others
Center for E-Business Technology
Sentiment Analysis in Financial Markets
Sentiment analysis is one of my favorite research topics
I have done some research using product reviews
In my opinion, finance is a more suitable domain than products
Product sales statistics are not publicly available
– Stock prices are always public
Financial markets are closely related to investors’ sentiment
– ‘The economy is psychology’
– Behavioral finance
– Plenty of empirical evidence
Interesting and worthwhile
Research Problems from [1][2][3]
How can information from various, heterogeneous sources be integrated?
Different formats
How can the opinions in the documents be extracted?
Statistical and NLP approaches
How can the important opinions be filtered?
Reliable sources (news, blogs), trusted authors, promising algorithms
How can the users’ trading decisions be supported?
Finding out the relationships between investors’ sentiment and stock values
An Architecture from [1]
Monitor a huge number of relevant sources
Extract metadata and build a single representation
Decide whether the information needs to be analyzed
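The three steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the architecture of [1]: the two input formats (news and blog dicts), the `Document` record, and the watchlist-based relevance filter are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical unified record; the field names are illustrative, not taken from [1].
@dataclass
class Document:
    source: str          # origin of the item, e.g. "news" or "blog"
    author: str
    text: str
    tickers: frozenset   # stock symbols the item mentions

def from_news(item: dict) -> Document:
    # Assumed news format: {"byline": str, "body": str, "symbols": [str, ...]}
    return Document("news", item["byline"], item["body"], frozenset(item["symbols"]))

def from_blog(item: dict) -> Document:
    # Assumed blog format: {"user": str, "post": str, "tags": [str, ...]}
    return Document("blog", item["user"], item["post"],
                    frozenset(t.upper() for t in item["tags"]))

def should_analyze(doc: Document, watchlist: set) -> bool:
    # Analyze only non-empty items that mention at least one watched stock.
    return bool(doc.text.strip()) and bool(doc.tickers & watchlist)

def run_pipeline(news_items, blog_items, watchlist):
    # 1) monitor sources, 2) map each item to one representation, 3) filter.
    docs = [from_news(n) for n in news_items] + [from_blog(b) for b in blog_items]
    return [d for d in docs if should_analyze(d, watchlist)]
```

Mapping every source to one record type first means the later sentiment steps only ever see one format.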
SOPS: System Overview
Collect data from a message board
Remove HTML tags and extract features
Identify reliable users in order to filter noise
Use several classifiers
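The "remove HTML tags and extract features" step can be sketched with Python's standard `html.parser`. This is a minimal sketch, not the SOPS preprocessing code: the lowercase word regex and the `author:<name>` token (reflecting that SOPS vectors include author names) are assumptions.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects the text of an HTML fragment and drops every tag."""
    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" becomes "&"
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text chunks and collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())

def message_features(html: str, author: str) -> list:
    # Assumed tokenizer: lowercase alphanumeric words, plus one author token.
    words = re.findall(r"[a-z0-9']+", strip_html(html).lower())
    return words + ["author:" + author]
```

For example, `message_features("<p>Strong <b>buy</b>!</p>", "bob")` yields the tokens `strong`, `buy` and `author:bob`.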
SOPS: Data Collection
260,000 messages for 52 popular stocks on Yahoo! Finance
The messages cover a six-month period
A message board exists for each stock traded on a major stock exchange such as the NYSE or NASDAQ
Users must sign up before they can post messages
Every posted message is associated with its author
SOPS: Data Collection (cont.)
SOPS: Feature Representation
After the relevant information has been extracted, each message is converted to a vector of words and author names
The value of each entry in the vector is then calculated using the TF-IDF formula
M: set of all messages, m: a message, w: a term
Example feature vector: ( 3.2, 1.6, 1.09, 3.37. 90, 0.5, … )
Entries correspond to terms such as “good”, “stop”, “asdf”, plus the date and the % of change in stock price
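The slide's TF-IDF formula was not preserved in the text. A common textbook form, using the symbols above, is tfidf(w, m) = tf(w, m) · log(|M| / |{m' ∈ M : w ∈ m'}|), where tf(w, m) is the raw count of w in m. SOPS may use a different variant (for example a normalized term frequency), so the sketch below illustrates the general technique, not the paper's exact weighting.

```python
import math
from collections import Counter

def tfidf_vectors(messages):
    """messages: list of token lists (the set M).
    Returns one {term: weight} dict per message, using raw tf times log(|M| / df)."""
    n = len(messages)
    df = Counter()
    for tokens in messages:
        df.update(set(tokens))  # document frequency: count each term once per message
    vectors = []
    for tokens in messages:
        tf = Counter(tokens)
        vectors.append({w: c * math.log(n / df[w]) for w, c in tf.items()})
    return vectors
```

A term that appears in every message gets weight 0, which is why very common words carry little information in this representation.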
SOPS: Sentiment Prediction
[Figure: a classifier is trained on messages whose sentiment is disclosed (How), then labels a message whose sentiment is undisclosed (What) as Strong Buy, Buy, Hold, Sell, or Strong Sell]
SOPS: Sentiment Prediction
The sentiment for a message m at time instant i is modeled as follows:
m: a message, M_i: set of all messages at time instant i, SV_i: stock value at time instant i
Classifiers:
1. Naïve Bayes
2. Decision Trees
3. Bagging
Output classes: Strong Buy, Buy, Hold, Sell, Strong Sell
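As a minimal sketch of the first classifier listed, here is a multinomial Naïve Bayes written from scratch over the five sentiment classes, trained on messages whose sentiment is disclosed. Laplace smoothing (`alpha=1`), raw token counts instead of TF-IDF weights, and skipping words unseen in training are assumptions of this sketch; in practice a library implementation such as scikit-learn's `MultinomialNB` would usually be used.

```python
import math
from collections import Counter, defaultdict

CLASSES = ["Strong Buy", "Buy", "Hold", "Sell", "Strong Sell"]

def train_nb(examples, alpha=1.0):
    """examples: list of (tokens, label) pairs from messages with disclosed sentiment."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha

def predict_nb(model, tokens):
    """Returns the class with the highest log posterior for an undisclosed message."""
    class_counts, word_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label in CLASSES:
        if class_counts[label] == 0:
            continue  # class never seen in training
        score = math.log(class_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for w in tokens:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + alpha) / denom)
        if score > best_score:
            best, best_score = label, score
    return best
```

Working in log space avoids numeric underflow when a message contains many words.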
TrustValue Calculation
Some authors are more knowledgeable than others about the stock market
Posts by trusted authors should carry more weight => TrustValue
TrustValue
Considers not only the direction in which the stock price moved, but also the magnitude of the move
Takes into account the fact that a single author cannot be an expert on all stocks => an author can be assigned different trust values for different stocks
PredictionScore: the author’s prediction performance, i.e., how closely the author’s predictions follow the stock market
NumberOfPrediction: the total number of predictions made by the author
ExactPrediction: the number of exact predictions
ClosePrediction: the number of “good enough” predictions
ActivityConstant: a constant used to penalize authors with low activity (few predictions)
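The TrustValue formula itself did not survive extraction, so the sketch below is a hypothetical way to combine the quantities defined above, not the paper's formula. Assumptions: actual price moves are discretized into the same five levels as the sentiment labels; a prediction is exact if it matches the actual level and close if it is one level off; PredictionScore = (ExactPrediction + close_weight × ClosePrediction) / NumberOfPrediction; and the score is scaled by min(1, NumberOfPrediction / ActivityConstant) to penalize low activity. Trust is keyed by (author, stock), following the point that one author can have different trust values for different stocks.

```python
from collections import defaultdict

# Assumed numeric levels so that magnitude, not only direction, matters.
LEVELS = {"Strong Sell": -2, "Sell": -1, "Hold": 0, "Buy": 1, "Strong Buy": 2}

def trust_values(predictions, activity_constant=10, close_weight=0.5):
    """predictions: iterable of (author, stock, predicted_label, actual_label).
    Returns {(author, stock): trust}. Hypothetical formula, see lead-in."""
    stats = defaultdict(lambda: [0, 0, 0])  # [NumberOfPrediction, Exact, Close]
    for author, stock, predicted, actual in predictions:
        s = stats[(author, stock)]
        s[0] += 1
        gap = abs(LEVELS[predicted] - LEVELS[actual])
        if gap == 0:
            s[1] += 1      # exact prediction
        elif gap == 1:
            s[2] += 1      # "good enough" prediction
    trust = {}
    for key, (n, exact, close) in stats.items():
        prediction_score = (exact + close_weight * close) / n
        activity = min(1.0, n / activity_constant)  # penalize low activity
        trust[key] = prediction_score * activity
    return trust
```

With these defaults, an author with ten exact predictions on one stock reaches a trust of 1.0, while a single close prediction yields only 0.05.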
SOPS: Stock Prediction
[Figure: a classifier predicts whether the stock price will go up or go down]
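The classifier's input features are not recoverable from this slide. As a simple stand-in, not the SOPS classifier, a trust-weighted vote can turn message-level sentiment into an up/down call: each message contributes its author's TrustValue times a numeric sentiment level, and a positive total means "Go up". The level mapping, the zero default trust for unknown authors, and sending ties to "Go down" are all assumptions.

```python
# Assumed numeric sentiment levels (same scale as the five classes).
LEVELS = {"Strong Sell": -2, "Sell": -1, "Hold": 0, "Buy": 1, "Strong Buy": 2}

def predict_direction(messages, trust, default_trust=0.0):
    """messages: list of (author, stock, sentiment_label) for one stock and period.
    trust: {(author, stock): TrustValue}. Returns "Go up" or "Go down"."""
    score = 0.0
    for author, stock, label in messages:
        # Untrusted or unknown authors contribute little or nothing.
        score += trust.get((author, stock), default_trust) * LEVELS[label]
    return "Go up" if score > 0 else "Go down"  # ties go to "Go down" (arbitrary)
```

Weighting by trust means one reliable author can outvote several noisy ones, which is the motivation for TrustValue given above.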
SOPS: Evaluation Metrics
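The metrics on this slide were not recovered in the text; the conclusion reports precision and recall. For a multi-class sentiment classifier these are usually computed per class, as in this sketch. Mapping a zero denominator to 0.0 is an assumption (libraries differ in how they handle that case).

```python
def precision_recall(y_true, y_pred, label):
    """Per-class precision and recall for one label (e.g. "Buy")."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)  # true + false positives
    actual = sum(1 for t in y_true if t == label)     # true positives + false negatives
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```

Precision asks how many messages labeled "Buy" really were "Buy"; recall asks how many real "Buy" messages were found.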
SOPS: Experiments
Conclusion
SOPS can predict Web sentiment with high precision and recall
SOPS introduces TrustValue, which takes into account the trustworthiness of an author
In my opinion, some points remain unclear
Presentation
– About Summarization
Users
Time Period
Furthermore
We also have paper [3]