Applying Data Mining for News Analytics
Post on 30-Nov-2014
Vasko Yordanov
The Problem
• The growing volume of online financial data causes useful investment information to be “lost”.
• Traditional means of news delivery and analysis are becoming obsolete.
• Online information is unstructured
Information Overload
Thousands of blogospheres
Millions of newsfeeds
Thousands of syndicators
Information is not uniform:
• Structured data: news
• Unstructured data: blogs
• Mixed: syndicators
• Dynamic: constant change and flow of the information stream
• Quality: search engine optimization dilutes quality
Solution:
• Machine interpretation of news.
• Quantify the news so it can be used by “news-flow” trading algorithms.
• Detect “news sentiment”:
- Stocks react to market sentiment
• Incorporate the above into a “smart” news feed aggregator.
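To make "quantify the news" concrete, here is a minimal sketch of lexicon-based sentiment scoring for a headline. The word lists and the function name `sentiment_score` are hypothetical and chosen only for illustration; they are not taken from the slides, and a production system would likely use a much larger, finance-specific lexicon or a trained model.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
POSITIVE = {"beats", "surges", "upgrade", "approval", "record", "gains"}
NEGATIVE = {"misses", "plunges", "downgrade", "recall", "lawsuit", "losses"}


def sentiment_score(headline):
    """Score a headline in [-1.0, 1.0].

    The score is (positive hits - negative hits) / (all hits),
    or 0.0 when no lexicon word appears in the headline.
    """
    words = re.findall(r"[a-z]+", headline.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


if __name__ == "__main__":
    for h in ["Acme surges after FDA approval", "Acme plunges on lawsuit"]:
        print(h, "->", sentiment_score(h))
```

A numeric score like this is one possible input a news-flow trading algorithm could consume alongside price data.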
Opportunity:
• Social media can offer glimpses of information well before it reaches mainstream media.
Ex: The emergency landing of an airplane on the Hudson River became known first on Twitter.
Such data is real-time and instant.
• Real-time analysis of Twitter, blogs, and news gives you a view of what the public sees.
• Reacting appropriately to breaking news events can give traders a significant edge over the rest of the market, if they act on it faster than the competition.
• Quantitative trading groups say that in the near future they will look to develop models that use the historical impact of news events on stock performance to predict the effect real-time events may have on future performance.
Ex: Real time news for Enviro Technologies:
Example:
Avastin is Genentech’s trade name for Bevacizumab, an anti-angiogenic drug which has been approved for use against colorectal cancer since 2004. On 14th March 2005, the National Cancer Institute (NCI) posted the result of Phase 3 trials using Bevacizumab combined with chemotherapy for patients with advanced lung cancer. Four hours later Genentech issued a press release.
The immediate result was a 25% hike in the company’s stock price. Anyone who had made the connection between Bevacizumab and Genentech within the four-hour window could have had a significant market lead.
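The connection in this example (a drug name in a news item pointing to a company) can be sketched as a simple alias lookup. The table `ALIASES` and the function `companies_mentioned` are hypothetical names for illustration; only the Avastin/Bevacizumab entries come from the example above, and real entity identification would need a far richer knowledge base than substring matching.

```python
# Hypothetical alias table mapping product or drug names to companies.
# Only these two entries are grounded in the example above.
ALIASES = {
    "bevacizumab": "Genentech",
    "avastin": "Genentech",
}


def companies_mentioned(text):
    """Return the set of companies whose known aliases appear in the text."""
    lowered = text.lower()
    return {company for alias, company in ALIASES.items() if alias in lowered}


if __name__ == "__main__":
    item = "NCI posts Phase 3 results for Bevacizumab in advanced lung cancer"
    print(companies_mentioned(item))
```

With such a lookup in place, a news item that never names the company can still be routed to traders following its stock.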
Incorporate news into trading algorithms
• News flow itself is an important trading signal, following the old adage “There is no smoke without fire”.
• The sheer volume of news items can be just as much an indicator as the actual information they convey. A sudden rush of headlines suggests that volatility may increase (uncertainty breeds volatility).
• Identify the news “sentiment”
• Data mine Events of Interest (EOI)
What is an event?
• An event is a significant change
• An event is detected by observing a pattern in data acquired over time from multiple sources:
- Stock tickers, news, blogs, ...
• Observe an anomaly:
- The number of blogs per day about a company is 50% higher in the last day than the daily average over the last year
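The anomaly rule above can be sketched as a comparison of the latest daily count against a historical baseline. This sketch assumes the baseline is the mean daily count over the preceding days (for example, the last year); the function name `is_volume_event` and the `threshold` parameter are hypothetical and not part of the original slides.

```python
from statistics import mean


def is_volume_event(daily_counts, threshold=0.5):
    """Flag an event when the latest day exceeds the baseline by `threshold`.

    daily_counts: mention counts per day, oldest first; the last item
    is the most recent day. The baseline is the mean of all earlier days
    (an assumption about how "over the last year" is measured).
    """
    if len(daily_counts) < 2:
        return False  # not enough history to form a baseline
    *history, latest = daily_counts
    baseline = mean(history)
    if baseline == 0:
        return latest > 0  # any mention is unusual for a silent company
    return latest > baseline * (1 + threshold)


if __name__ == "__main__":
    counts = [10] * 364 + [16]  # a year of quiet days, then a busy one
    print(is_volume_event(counts))
```

The same check applies to headline counts, which fits the earlier point that a sudden rush of news may precede higher volatility.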
How?
• Employ advances in emerging AI technologies such as Natural Language Processing, Sentiment Analysis, Entity Identification, and the Semantic Web.
• This arena is heating up, and sophisticated tools already exist, such as Thomson Reuters’ “OpenCalais” service.
The Market for Unstructured Data
• Only 2% of firms employing electronic trading strategies use unstructured data in a machine-readable format, estimates Aite Group.
• Some content is free; for paid content, firms will spend more than $75 million globally in 2009 and over $141 million by 2011, estimates Aite Group.
Use emerging edge technology:
Semantic Web:
• “The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for machines to understand content” – Wikipedia
• Contextual search for more relevant information
– Proprietary algorithm to search and render information using Semantic Web standards
– Google just acquired Freebase, formerly the “poster child” of Semantic linked data. It contained and linked the world’s knowledge. This is likely to cause a shift similar to WWW adoption.
Competitors:
• Relegence’s “FirstTrack” service
• Collective Intellect
• Bloomberg news “heat” analytics.