“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Post on 21-Jan-2018

Category: Economy & Finance

Transcript

Real Time Machine Learning Architecture & Sentiment Analysis

Quantcon 2017, Singapore

Juan CHENG, PhD
Data Scientist
cheng.juan@infotrie.com

www.infotrie.com | @infotrie

www.finsents.com | @finsents

● About us
● News analytics signals in Finance
● Big data architecture
● Demo cases

Frederic GEORJON, CEO

Ajil GEORGE, Head of Development Center

Daniel ABROUK, Head of EMEA

Paris/Singapore London

LONG Zhicheng, CTO

Singapore India

FinSentS.com
➔ Real-time information and trading portal
➔ Millions of sources / Multilingual
➔ SaaS or on premises
➔ Real-time alerts
➔ Actionable signals

Sentiment Data
➔ Through API or third parties
➔ Up to 15 years of history
➔ Low latency / Tick by tick
➔ 50,000+ entities
➔ Stock, Forex, commodities, index, macroeconomic topics, etc.

Consultancy and Training
➔ Trading Technology
➔ Algorithmic trading
➔ Big Data
➔ Natural Language Processing (NLP)
➔ Machine Learning

Access to News / News management

- Visualization tools
- Filtering tools
- On demand view

Feed from multiple sources:
- Social media
- Web based content
- Private sources
- Internal data

News Content Alerts based on sentiment indicator

Provides accurate information from a Big Data environment and pushes it in front of users in real time for risk management

Dashboard

- Consolidated Dashboard
- Portfolio Alerts

Actionable indicators

Users receive news signals for trading / hedging / risk management based on the sentiment indicator

Algo Trading / Robo Trading

Real-time algorithmic trading: sentiment indicator and news analytics

Equity Research / Sales Team Hedging Trader / Prop Trader

- News Tag Cloud
- Filtering newsfeed with social media blotter, news blotter
- Search engine on demand

- Topics detection
- Rumours alerts
- News qualification per importance

- Relevant information from a single screen
- Automatic alert
- Integrated to OMS

Provides relevant news analytics indicators for hedging or trade idea generation

News analytics signals fully integrated into algo trading strategies

Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT

AT&T acquires Time Warner for $85 billion

NEW YORK - AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion, the boldest move yet by a telecommunications company to acquire content to stream over its high-speed network to attract a growing number of online viewers.

The trend of consolidation comes as technology advances have been upending traditional entertainment companies. Many in the industry believe that getting bigger is the best way to compete with companies like Google, Apple, Netflix and Facebook.

David Goldman and Paul R. La Monica contributed to this report.

Source

Category

Time

Location

Named Entity

Sentiment

Event

Hacking skills, regex, NLP, named entity recognition, POS taggers
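As a rough illustration of the regex part of this step, the sketch below pulls the source, category, time, location and company names out of the Reuters example above. The patterns and field names are assumptions made for this example, not the speaker's implementation, and the naive "... Inc" pattern only stands in for a real named entity recognizer. Sentiment and event detection are not attempted here.

```python
import re

# Sample text copied from the slide; the patterns below are illustrative assumptions.
HEADER = "Reuters MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT"
BODY = "NEW YORK- AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion"

# "<source> <CATEGORY> | <date> | <time>"
HEADER_RE = re.compile(
    r"^(?P<source>\w+)\s+(?P<category>[A-Z ]+?)\s*\|\s*"
    r"(?P<date>[^|]+?)\s*\|\s*(?P<time>.+)$"
)
# Upper-case dateline at the start of the body, e.g. "NEW YORK-"
LOCATION_RE = re.compile(r"^(?P<location>[A-Z][A-Z ]+?)\s*-")
# Naive stand-in for NER: capitalised words directly followed by "Inc"
COMPANY_RE = re.compile(r"([A-Z][\w&]*(?: [A-Z][\w&]*)*) Inc\b")


def extract(header, body):
    """Return a dict of the metadata fields that regexes alone can recover."""
    record = HEADER_RE.match(header).groupdict()
    location = LOCATION_RE.match(body)
    record["location"] = location.group("location") if location else None
    record["named_entities"] = COMPANY_RE.findall(body)
    return record


RECORD = extract(HEADER, BODY)
print(RECORD)
```

Regexes like these are brittle by design; they cover the fixed-format parts of a feed, while entities, sentiment and events need the trained models described next.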

- Companies, indexes
- People, locations, organizations
- Events
- Regions

NLP

Text
- Dow Jones, Bloomberg
- Web news, blogs, Twitter
- 1000+ sources

Feature Extraction

Classification

Sentiment

- 15 years of history
- Tens of millions of articles

Training

Indexing
- Sector/industry
- Commodity, FX, ETFs
- Political, country risk
- Macroeconomic
- Fear, greed, anger, happiness

Aggregation

● Entity
● Classification
● Sentiment

Ping An Insurance Group

• SSE: 601318 (A share)

• SEHK: 02318 (H share)

• Also known as Ping An of China

• A holding company whose subsidiaries mainly deal with insurance, banking, and financial services

• Constituent of Shanghai Stock Exchange 50 A Share Index (SSE50)

• A component of Hang Seng Index

NoSQL Database: cache, persistent

Kafka

Filter, topic classification, sentiment calculation, entity detection, stock mapping, sentiment aggregation

Apache Storm

DFS: NLP models, ML models

Producers: blogs, Twitter, news, Bloomberg...

Model training, batch cleaning, batch calculation

Apache Spark

Solr

Relational Database

Web app
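The real-time stages listed above (filter, sentiment calculation, entity detection, stock mapping, sentiment aggregation) can be pictured as a chain of small functions applied to each incoming message. The sketch below is plain Python, not Kafka or Storm code. The word lists, the message format and the function names are hypothetical, and topic classification is left out for brevity. Only the Ping An listing code comes from the slides.

```python
import re
from collections import defaultdict

# Hypothetical toy resources; a production system would load trained
# NLP/ML models instead of these word lists.
POSITIVE_WORDS = {"positive", "rise", "gain"}
NEGATIVE_WORDS = {"negative", "fall", "loss"}
ENTITY_TO_STOCK = {"ping an": "SEHK:02318"}  # listing code taken from the slide


def passes_filter(message):
    """Filter stage: drop empty messages."""
    return bool(message["text"].strip())


def sentiment_score(message):
    """Sentiment stage: positive word count minus negative word count."""
    words = re.findall(r"[a-z]+", message["text"].lower())
    return sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )


def detect_entities(message):
    """Entity detection stage: naive substring lookup of known names."""
    text = message["text"].lower()
    return [name for name in ENTITY_TO_STOCK if name in text]


def run_pipeline(messages):
    """Chain the stages, then aggregate the mean sentiment per stock."""
    scores = defaultdict(list)
    for message in messages:
        if not passes_filter(message):
            continue
        score = sentiment_score(message)
        for entity in detect_entities(message):
            scores[ENTITY_TO_STOCK[entity]].append(score)  # stock mapping
    return {stock: sum(vals) / len(vals) for stock, vals in scores.items()}


MESSAGES = [
    {"text": "Ping An reports positive revenue rise"},
    {"text": "   "},
    {"text": "Ping An shares fall"},
]
AGGREGATED = run_pipeline(MESSAGES)
print(AGGREGATED)
```

In the architecture on the slide each stage would run as its own distributed component, so the stages can scale independently.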

[Chart: Ping An, sentiment from Mandarin and English news versus closing price, 08/14/2016 to 11/07/2016. Annotations: "lead signal in the subsequent price rise"; "positive corporate announcement on stock dividend release"; "positive announcement on insurance fee income and 17.1% rise of revenue in the first three quarters".]

get Articles, Treemap, Tags, Company Sentiment, Sentiment History, Company Static Tags, News Buzz, Data, Articles Tag, Index Score, Asset Sentiment, Article Ids, Leaders Laggers…

Easy API call

Available @ www.infotrie.com | contact@infotrie.com | @infotrie

www.finsents.com | @finsents

Train Document Set:

d1: The sky is blue.
d2: The sun is bright.

Test Document Set:

d3: The sun in the sky is bright.
d4: We can see the shining sun, the bright sun.

Vector Space Model (VSM)

t1 t2...

d1

d2 ...

Vocabulary

Term frequency(TF)

TF alone overemphasizes terms that are present in almost the entire corpus

TF-IDF

TF example IDF example

Normalized TF-IDF
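A minimal sketch of these steps on the four example documents, in plain Python. The stop-word list, the tokenizer and the smoothed IDF formula, idf(t) = ln((1 + N) / (1 + df(t))) + 1, are assumptions chosen for the example; other IDF variants are equally common. The vocabulary and IDF weights are built from the training documents and then applied to the test documents.

```python
import math
import re
from collections import Counter

TRAIN = ["The sky is blue.", "The sun is bright."]
TEST = [
    "The sun in the sky is bright.",
    "We can see the shining sun, the bright sun.",
]
STOP_WORDS = {"the", "is"}  # assumed stop-word list for this example


def tokenize(text):
    """Lower-case, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


# The vocabulary comes from the training documents only.
VOCABULARY = sorted({t for doc in TRAIN for t in tokenize(doc)})


def tf_vector(text):
    """Raw term counts over the vocabulary; unknown words are ignored."""
    counts = Counter(tokenize(text))
    return [counts[t] for t in VOCABULARY]


def idf_vector(docs):
    """Smoothed inverse document frequency (one common variant)."""
    n = len(docs)
    doc_freq = [sum(1 for d in docs if t in tokenize(d)) for t in VOCABULARY]
    return [math.log((1 + n) / (1 + df)) + 1 for df in doc_freq]


def normalized_tfidf(text, idf):
    """TF * IDF, scaled to unit Euclidean (L2) length."""
    weights = [tf * w for tf, w in zip(tf_vector(text), idf)]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm else weights


IDF = idf_vector(TRAIN)
TF_TEST = [tf_vector(doc) for doc in TEST]
TFIDF_TEST = [normalized_tfidf(doc, IDF) for doc in TEST]
print(VOCABULARY)
print(TF_TEST)
print(TFIDF_TEST)
```

Each document becomes one row of the vector space model, with one column per vocabulary term. Words that appear only in the test documents (such as "shining") have no column and are dropped.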

Machine Learning

Analytics on Massive Historical Text Data

Analytics on the recent past

Real-time analytics

Batch layer / real-time layer
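One common way to combine the two layers is to answer queries from a precomputed batch view plus a small real-time view that covers only the data that arrived since the last batch run. The sketch below assumes that design; the class name, the entity and the counts are hypothetical and are not taken from the talk.

```python
from collections import Counter


class ArticleCounts:
    """Toy serving layer: batch view plus real-time increments."""

    def __init__(self, batch_view):
        self.batch_view = Counter(batch_view)  # recomputed periodically
        self.realtime_view = Counter()         # updated per incoming article

    def on_article(self, entity):
        """Real-time layer: count an article as soon as it arrives."""
        self.realtime_view[entity] += 1

    def on_batch_refresh(self, new_batch_view):
        """Batch layer caught up: swap the view and reset the increments."""
        self.batch_view = Counter(new_batch_view)
        self.realtime_view.clear()

    def query(self, entity):
        """Merge both views at query time."""
        return self.batch_view[entity] + self.realtime_view[entity]


# Hypothetical numbers, for illustration only.
COUNTS = ArticleCounts({"ping an": 120})
COUNTS.on_article("ping an")
COUNTS.on_article("ping an")
BEFORE_REFRESH = COUNTS.query("ping an")
COUNTS.on_batch_refresh({"ping an": 122})
AFTER_REFRESH = COUNTS.query("ping an")
print(BEFORE_REFRESH, AFTER_REFRESH)
```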

Fast and general engine for large-scale distributed data processing

Memory, Network, CPUs, Disk

Reference: spark

Logistic regression in Hadoop and Spark
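Logistic regression is a useful comparison case because it is iterative: every pass maps a gradient over all data points and reduces the results, so a system that keeps the dataset in memory between passes avoids re-reading it each time. The sketch below shows that map/reduce shape on a single machine in plain Python. It is not Spark code, and the toy data, learning rate and iteration count are assumptions.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def point_gradient(weights, features, label):
    """Map step: gradient of the log loss for one data point."""
    error = sigmoid(sum(w * x for w, x in zip(weights, features))) - label
    return [error * x for x in features]


def add_vectors(a, b):
    """Reduce step: element-wise sum of two gradients."""
    return [ai + bi for ai, bi in zip(a, b)]


def train(points, dims, iterations=200, learning_rate=0.5):
    """Batch gradient descent; every iteration rescans all points."""
    weights = [0.0] * dims
    for _ in range(iterations):
        gradient = reduce(
            add_vectors, (point_gradient(weights, x, y) for x, y in points)
        )
        weights = [
            w - learning_rate * g / len(points) for w, g in zip(weights, gradient)
        ]
    return weights


# Toy data: features are [bias, x]; the label is 1 when x is positive.
POINTS = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
WEIGHTS = train(POINTS, dims=2)
PREDICTIONS = [
    int(sigmoid(sum(w * x for w, x in zip(WEIGHTS, features))) >= 0.5)
    for features, _ in POINTS
]
print(WEIGHTS, PREDICTIONS)
```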

Open source distributed real-time computation system that makes it easy to process unbounded streams of data

Storm was benchmarked at processing one million 100 byte messages per second per node on hardware with the following specs:

● Processor: 2x Intel E5645 @ 2.4GHz

● Memory: 24 GB

Reference: storm

Spout

bolt
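In Storm terms, a spout is a source of tuples and a bolt consumes tuples, does some processing and may emit new ones. The sketch below imitates that shape with Python generators for a streaming word count. It does not use the Storm API, and all names are made up for the example.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: emits one tuple (here, a sentence) at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Stateless bolt: turns each sentence into a stream of words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


class CountBolt:
    """Stateful bolt: keeps a running count and emits (word, count)."""

    def __init__(self):
        self.counts = Counter()

    def process(self, stream):
        for word in stream:
            self.counts[word] += 1
            yield word, self.counts[word]


COUNTER = CountBolt()
TOPOLOGY = COUNTER.process(split_bolt(sentence_spout(["the sun", "the sky"])))
EMITTED = list(TOPOLOGY)
print(EMITTED)
```

Unlike a batch word count, the counting bolt emits an updated total after every word, so downstream consumers always see the latest value.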

✓ Guaranteed data processing
✓ Horizontal scalability
✓ Fault-tolerance
✓ Higher level abstraction than message passing
✓ Real-time machine learning for classification and predictive analytics

Sentiment in itself is a powerful trading indicator on which multiple trading strategies can be built

Simulate impact of complex events

➔ Scale analysis pipeline
➔ Live stats
➔ Recommendations
➔ Predictions
➔ Real-time analytics
➔ Online machine learning

Apply similar architecture in

MiFID alert: improve client communication

Regulatory: process complex / low-signal events

ESG monitoring: Ecological – Social – Governance

A union calls for a strike at a factory in Argentina?

Negative news coverage of a stock I hold is accelerating in the Chinese press but has not yet reached the English press?

A European company employs children in Bangladesh (*)?

ACTIONS

dfs

counts = (
    text_file.flatMap(lambda line: line.split(" "))
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)  # sum the counts per word
)

Job

Executor

[Diagram: Storm cluster with Nimbus, Zookeeper nodes and Worker nodes]

Velocity

Big Data

Variety

- News, blogs, social media, analyst reports, company announcement, traders’ chat room…

- Financial reports, price, economic events...

- Weather, GPS, image....

Volume

- ETL
- Machine learning
- Correlation analysis
- Regressions…

- As fast as possible

A. No, I find news noisy. There is just too much of it.

B. No, I'm a quant. I find it hard to quantify news.

C. Yes, but I find using news is not very efficient. I have to manually relate it to my portfolio.

Analysis of an Indonesian Company “Pelindo” in English vs Bahasa Indonesia

Tracking of weak signals: local languages with little to no coverage in English press

[Chart axes: Sentiment (-5/5); News volume (#)]