Real Time Machine Learning Architecture & Sentiment Analysis Quantcon 2017, Singapore Juan CHENG, PHD Data Scientist [email protected] www.infotrie.com @infotrie www.finsents.com @finsents

“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Jan 21, 2018


Economy & Finance

Quantopian
Transcript
Page 1: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Real Time Machine Learning Architecture & Sentiment Analysis

Quantcon 2017, Singapore

Juan CHENG, PhD, Data Scientist
[email protected]

www.infotrie.com | @infotrie

www.finsents.com | @finsents

Page 2: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

● About us
● News analytics signals in Finance
● Big data architecture
● Demo cases

Page 3: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Frederic GEORJON, CEO

Ajil GEORGE, Head of Development Center

Daniel ABROUK, Head of EMEA

Paris/Singapore London

LONG Zhicheng, CTO

Singapore India

Page 4: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

FinSentS.com
➔ Real-time information and trading portal
➔ Millions of sources / Multilingual
➔ SaaS or on premises
➔ Real-time Alerts
➔ Actionable signals

Sentiment Data
➔ Through API or third parties
➔ Up to 15 years of history
➔ Low latency / Tick by tick
➔ 50,000+ entities
➔ Stock, Forex, commodities, index, macroeconomic topics etc.

Consultancy and Training
➔ Trading Technology
➔ Algorithmic trading
➔ Big Data
➔ Natural Language Processing (NLP)
➔ Machine Learning

Page 5: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Access to News / News management

- Visualization tools
- Filtering tools
- On demand view

Feed from multiple sources:
- Social media
- Web-based content
- Private sources
- Internal data

News Content Alerts based on sentiment indicator

Provide accurate information from the Big Data environment and push it in front of users in real time for risk management

Dashboard

- Consolidated Dashboard
- Portfolio Alerts

Actionable indicators

Users receive news signals for trading / hedging / risk management based on the sentiment indicator

Algo Trading / Robo Trading

Real-time algorithmic trading with sentiment indicator and news analytics

Equity Research / Sales Team | Hedging Trader / Prop Trader

- News Tag Cloud
- Filtering newsfeed with social media blotter, news blotter
- Search engine on demand

- Topics detection
- Rumours alerts
- News qualification per importance

- Relevant information from a single screen
- Automatic Alert
- Integrated to OMS

Provide relevant news analytics indicators for hedging or trade idea generation

News analytics signals fully integrated into algo trading strategies

Page 6: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT

AT&T acquires Time Warner for $85 billion

NEW YORK - AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion, the boldest move yet by a telecommunications company to acquire content to stream over its high-speed network to attract a growing number of online viewers.

The trend of consolidation comes as technology advances have been upending traditional entertainment companies. Many in the industry believe that getting bigger is the best way to compete with companies like Google, Apple, Netflix and Facebook.

David Goldman and Paul R. La Monica contributed to this report.

Page 7: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie


Fields extracted from the article: Source, Category, Time, Location, Named Entity, Sentiment, Event

Tools: hacking skills, regex, NLP, named entity recognition, POS taggers
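As a rough illustration of the regex part of this step, the sketch below pulls the category, time, company names and deal size out of the example article. The patterns `COMPANY_RE` and `MONEY_RE` and the function names are hypothetical and deliberately naive; they are not Infotrie's extraction rules, and a real system would rely on trained named entity recognition models. The body text is shortened from the slide.

```python
import re
from datetime import datetime

# Header and (shortened) first sentence of the example article on the slide.
HEADER = "MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT"
BODY = (
    "AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion, "
    "the boldest move yet by a telecommunications company to acquire content."
)

# Hypothetical, naive patterns: capitalised words followed by "Inc",
# and dollar amounts followed by a magnitude word.
COMPANY_RE = re.compile(r"(?:[A-Z][\w&]*\s)+Inc\b")
MONEY_RE = re.compile(r"\$(\d+(?:\.\d+)?)\s(billion|million)")


def parse_header(header):
    """Split 'CATEGORY | date | time TZ' into category, timestamp and timezone."""
    category, date_part, time_part = [part.strip() for part in header.split("|")]
    clock, timezone = time_part.split()
    timestamp = datetime.strptime(f"{date_part} {clock}", "%a %b %d, %Y %I:%M%p")
    return {"category": category, "time": timestamp, "timezone": timezone}


def extract_fields(header, body):
    """Combine header fields with regex-based entity and amount extraction."""
    fields = parse_header(header)
    fields["entities"] = COMPANY_RE.findall(body)
    fields["amounts"] = [(float(value), unit) for value, unit in MONEY_RE.findall(body)]
    return fields


if __name__ == "__main__":
    print(extract_fields(HEADER, BODY))
```

Regexes like these break quickly on real news text, which is why the slide lists them alongside NER and POS taggers rather than instead of them.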

Page 8: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Text
- Dow Jones, Bloomberg
- Web news, blogs, Twitter
- 1000+ sources

NLP
- Companies, indexes
- People, locations, organizations
- Events
- Regions

Feature Extraction

Classification

Sentiment

Training
- 15 years history
- Tens of millions of articles

Indexing
- Sector/industry
- Commodity, FX, ETFs
- Political, country risk
- Macroeconomic
- Fear, greed, anger, happiness

Aggregation
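The aggregation step can be pictured as rolling article-level sentiment scores up to entity and sector level. The sketch below assumes a simple average and uses made-up records (`AAA`, `BBB`, `CCC` are placeholder entities); the actual weighting used in the product is not described on the slide.

```python
from collections import defaultdict

# Hypothetical article-level records: (entity, sector, sentiment score).
ARTICLES = [
    ("AAA", "Insurance", 2.0),
    ("AAA", "Insurance", 4.0),
    ("BBB", "Insurance", -1.0),
    ("CCC", "Telecom", 3.0),
]


def aggregate(records, key_index):
    """Average the sentiment score (last field) per key (entity or sector)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, article count]
    for record in records:
        bucket = totals[record[key_index]]
        bucket[0] += record[-1]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


entity_sentiment = aggregate(ARTICLES, 0)  # one score per company
sector_sentiment = aggregate(ARTICLES, 1)  # one score per sector

if __name__ == "__main__":
    print(entity_sentiment)
    print(sector_sentiment)
```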

Page 9: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

● Entity ● Classification● Sentiment

Page 10: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie


Ping An Insurance Group

• SSE: 601318 (A share)

• SEHK: 02318 (H share)

• Also known as Ping An of China

• A holding company whose subsidiaries mainly deal with insurance, banking, and financial services

• Constituent of the Shanghai Stock Exchange 50 A Share Index (SSE50)

• A component of the Hang Seng Index

Page 11: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Producers: blogs, Twitter, news, Bloomberg...

Kafka

Apache Storm: filter, topic classification, sentiment calculation, entity detection, stock mapping, sentiment aggregation

NoSQL database: cache, persistent

DFS: NLP models, ML models

Apache Spark: model training, batch cleaning, batch calculation

Solr

Relational database

Web app
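To make the order of the streaming stages concrete, here is a minimal single-process sketch that chains a filter, a sentiment calculation, entity detection with stock mapping, and an aggregation. It uses plain Python generators as a stand-in for Kafka and Storm, skips topic classification, and the word lists and `STOCK_MAP` are hypothetical placeholders for the NLP and ML models the diagram loads from the DFS.

```python
from collections import defaultdict

# Hypothetical lookup tables; a real deployment would use trained models instead.
POSITIVE = {"rise", "dividend", "growth"}
NEGATIVE = {"loss", "strike", "fraud"}
STOCK_MAP = {"ping an": "SEHK:02318"}


def filter_stage(messages):
    """Drop empty messages and wrap the rest in a dict."""
    for text in messages:
        if text.strip():
            yield {"text": text}


def sentiment_stage(items):
    """Score each item as (#positive words - #negative words)."""
    for item in items:
        words = item["text"].lower().split()
        item["sentiment"] = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        yield item


def entity_stage(items):
    """Detect known entity names and map them to stock codes."""
    for item in items:
        lowered = item["text"].lower()
        item["stocks"] = [code for name, code in STOCK_MAP.items() if name in lowered]
        yield item


def aggregate_stage(items):
    """Sum sentiment per stock code."""
    totals = defaultdict(int)
    for item in items:
        for stock in item["stocks"]:
            totals[stock] += item["sentiment"]
    return dict(totals)


messages = ["Ping An reports revenue rise and dividend", "", "Ping An faces strike"]
result = aggregate_stage(entity_stage(sentiment_stage(filter_stage(messages))))

if __name__ == "__main__":
    print(result)
```

In the architecture on the slide each of these stages would run as a distributed component consuming from Kafka, rather than as a generator in one process.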

Page 14: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie


[Chart: Ping An sentiment in Mandarin and English news vs. closing price; x-axis dates 08/14/2016, 09/11/2016, 10/09/2016, 11/07/2016; series: mandarin, english, close]

Chart annotations:
- Lead signal in the subsequent price rise
- Positive corporate announcement on stock dividend release
- Positive announcement on insurance fee income and 17.1% rise of revenue in the first three quarters
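One simple way to check whether a sentiment series leads price is to correlate sentiment at time t with returns at time t + lag. The sketch below is only an illustration of that idea: the two series are synthetic toy data constructed so that sentiment leads by one step, not the Ping An data from the chart, and the function names are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(signal, returns, lag):
    """Correlate signal[t] with returns[t + lag]; lag must be >= 0."""
    if lag == 0:
        return pearson(signal, returns)
    return pearson(signal[:-lag], returns[lag:])


# Synthetic toy data: returns are yesterday's sentiment scaled by 0.01.
sentiment = [1, -1, 2, 0, -2, 1, 3, -1]
returns = [0.0] + [0.01 * s for s in sentiment[:-1]]

lag0 = lagged_correlation(sentiment, returns, 0)
lag1 = lagged_correlation(sentiment, returns, 1)

if __name__ == "__main__":
    print(f"lag 0: {lag0:.3f}, lag 1: {lag1:.3f}")
```

A correlation that peaks at a positive lag is consistent with a lead signal, although on real data it would need out-of-sample testing before being traded on.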

Page 16: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

get Articles, Treemap, Tags, Company Sentiment, Sentiment History, Company Static Tags, News Buzz, Data, Articles Tag, Index Score, Asset Sentiment, Article Ids, Leaders Laggers…

Easy API call

Page 18: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Available @ [email protected]@infotrie

www.finsents.com@finsents

Page 21: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Train Document Set:

d1: The sky is blue.
d2: The sun is bright.

Test Document Set:

d3: The sun in the sky is bright.
d4: We can see the shining sun, the bright sun.

Vector Space Model (VSM)

[Matrix: documents d1, d2, ... as rows; terms t1, t2, ... as columns]

Page 22: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Train Document Set:

d1: The sky is blue.
d2: The sun is bright.

Vocabulary

Term frequency (TF)
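The vocabulary and term-frequency step can be sketched as follows for the four example documents. The stop-word list is an assumption (the slide does not say which words are dropped), and the function names are illustrative.

```python
import re
from collections import Counter

TRAIN = {"d1": "The sky is blue.", "d2": "The sun is bright."}
TEST = {
    "d3": "The sun in the sky is bright.",
    "d4": "We can see the shining sun, the bright sun.",
}

# Assumed stop-word list; not given on the slide.
STOP_WORDS = {"the", "is", "in", "we", "can", "see"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(docs):
    """Sorted list of distinct terms in the training documents."""
    return sorted({token for text in docs.values() for token in tokenize(text)})


def term_frequency(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


vocabulary = build_vocabulary(TRAIN)
tf_vectors = {name: term_frequency(text, vocabulary) for name, text in TEST.items()}

if __name__ == "__main__":
    print(vocabulary)
    print(tf_vectors)
```

Terms that never appear in the training set, such as "shining" in d4, have no column in the vocabulary and are simply ignored.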

Page 23: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

TF alone over-emphasizes terms that are present in almost the entire corpus

TF-IDF

TF example | IDF example

Normalized TF-IDF
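A minimal sketch of normalized TF-IDF on the two test documents is shown below. It assumes the smoothed IDF variant ln((1 + N) / (1 + df)) + 1 followed by L2 normalization, which is one common convention; the slide does not state which variant was used. The documents are given pre-tokenized with stop words already removed, which is also an assumption.

```python
import math

# Tokenized test documents (stop words assumed removed) and the training vocabulary.
DOCS = {
    "d3": ["sun", "sky", "bright"],
    "d4": ["shining", "sun", "bright", "sun"],
}
VOCABULARY = ["blue", "bright", "sky", "sun"]


def idf(term, docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    df = sum(term in tokens for tokens in docs.values())
    return math.log((1 + len(docs)) / (1 + df)) + 1


def tfidf_vector(tokens, docs, vocabulary):
    """TF-IDF weights over the vocabulary, scaled to unit Euclidean length."""
    raw = [tokens.count(term) * idf(term, docs) for term in vocabulary]
    norm = math.sqrt(sum(value * value for value in raw))
    return [value / norm if norm else 0.0 for value in raw]


vectors = {name: tfidf_vector(tokens, DOCS, VOCABULARY) for name, tokens in DOCS.items()}

if __name__ == "__main__":
    for name, vector in vectors.items():
        print(name, [round(v, 3) for v in vector])
```

In d3, "sky" ends up with a higher weight than "bright" or "sun" because it occurs in only one of the two documents, which is exactly the correction IDF is meant to provide.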

Page 24: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Train and Test Document Sets (d1 to d4, as on page 21) → Vector Space Model (VSM) → Machine Learning

Page 25: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Analytics on massive historical text data

Analytics on the recent past

Real-time analytics

Batch layer | Real-time layer
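The split between a batch layer and a real-time layer can be illustrated with a toy query that merges a periodically recomputed batch view with an incrementally updated real-time view. This is a sketch of the general pattern only; the class and function names are hypothetical, and the metric (article counts per entity) is chosen for simplicity.

```python
from collections import Counter


def batch_view(historical_events):
    """Batch layer: recompute article counts per entity over the full history."""
    return Counter(entity for entity, _ in historical_events)


class RealtimeView:
    """Real-time layer: count events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, entity):
        self.counts[entity] += 1


def query(entity, batch, realtime):
    """Serve a result by combining the batch view with the real-time view."""
    return batch[entity] + realtime.counts[entity]


# Hypothetical events: (entity, headline).
history = [("AAA", "old news 1"), ("AAA", "old news 2"), ("BBB", "old news 3")]
batch = batch_view(history)

realtime = RealtimeView()
realtime.update("AAA")  # an article that arrived after the batch job ran

if __name__ == "__main__":
    print(query("AAA", batch, realtime), query("BBB", batch, realtime))
```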

Page 26: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Apache Spark: fast and general engine for large-scale distributed data processing

Memory, Network, CPUs, Disk

Reference: Spark

Logistic regression in Hadoop and Spark
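Logistic regression is an iterative algorithm: every pass over the data updates the same weight vector, which is why keeping the data in memory between passes matters. The sketch below shows that loop on a single machine in plain Python as a stand-in; it does not use the Spark API, and the toy features (counts of negative and positive words) and hyperparameters are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, iterations=500, learning_rate=0.5):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(iterations):  # each iteration is one full pass over the data
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(x, weights, bias):
    """Return 1 (positive) or 0 (negative)."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Toy data: [negative word count, positive word count] -> 1 if positive article.
X = [[0, 1], [1, 0], [0, 2], [2, 0]]
y = [1, 0, 1, 0]
weights, bias = train_logistic_regression(X, y)

if __name__ == "__main__":
    print(weights, bias, [predict(x, weights, bias) for x in X])
```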

Page 27: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Apache Storm: an open source distributed real-time computation system that makes it easy to process unbounded streams of data

Storm was benchmarked at processing one million 100-byte messages per second per node on hardware with the following specs:

● Processor: 2x Intel [email protected]

● Memory: 24 GB

Reference: storm

Spout

bolt
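The spout and bolt roles can be mimicked in a few lines of plain Python: a spout is a source of tuples, and each bolt consumes tuples and emits new ones. The classes below are a conceptual sketch only; they are not the Storm API, they run in one process, and they leave out acking, parallelism and fault tolerance.

```python
class SentenceSpout:
    """Source of the stream: emits one tuple per sentence."""

    def __init__(self, sentences):
        self._sentences = list(sentences)

    def emit(self):
        for sentence in self._sentences:
            yield (sentence,)


class SplitBolt:
    """Stateless bolt: splits a sentence tuple into one tuple per word."""

    def process(self, tup):
        for word in tup[0].lower().split():
            yield (word,)


class CountBolt:
    """Stateful bolt: keeps a running count per word and emits (word, count)."""

    def __init__(self):
        self.counts = {}

    def process(self, tup):
        word = tup[0]
        self.counts[word] = self.counts.get(word, 0) + 1
        yield (word, self.counts[word])


def run_topology(spout, bolts):
    """Push every spout tuple through the chain of bolts, in order."""
    outputs = []
    for tup in spout.emit():
        stream = [tup]
        for bolt in bolts:
            stream = [out for t in stream for out in bolt.process(t)]
        outputs.extend(stream)
    return outputs


if __name__ == "__main__":
    spout = SentenceSpout(["the sun", "the sky"])
    print(run_topology(spout, [SplitBolt(), CountBolt()]))
```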

Page 28: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

✓ Guaranteed data processing
✓ Horizontal scalability
✓ Fault-tolerance
✓ Higher level abstraction than message passing
✓ Real-time machine learning for classification and predictive analytics

Page 29: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Sentiment in itself is a powerful trading indicator on which multiple trading strategies can be built

Simulate impact of complex events

Page 31: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Apply similar architecture in:
➔ Scale analysis pipeline
➔ Live stats
➔ Recommendations
➔ Predictions
➔ Real-time analytics
➔ Online machine learning

Page 32: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

MiFID alert: improve client communication

Regulatory Process complex / low signals events

ESG monitoring: Environmental – Social – Governance

A union calls for a strike in a factory in Argentina?

Negative news coverage of a stock I hold is accelerating in the Chinese press but has not yet reached the English press?

A European company employs children in Bangladesh (*)?

ACTIONS

Page 33: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

[Diagram: Spark word-count job; labels: dfs, Job, Executor]

text_file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
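For readers without a Spark cluster at hand, the three transformations in the word-count one-liner can be mirrored step by step in plain Python. This is a single-machine equivalent for illustration, not PySpark code; the function name `word_count` is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Plain-Python mirror of flatMap -> map -> reduceByKey."""
    # flatMap: split every line and flatten the results into one stream of words
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map: turn each word into a (word, 1) pair
    pairs = ((word, 1) for word in words)
    # reduceByKey: sum the values of all pairs that share the same key
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


if __name__ == "__main__":
    print(word_count(["the sun", "the sky"]))
```

In Spark the same steps run in parallel across executors, with the reduce step shuffling pairs so that equal keys meet on the same node.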

Page 34: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

[Diagram: Storm cluster with Nimbus, Zookeeper nodes and Workers]

Page 35: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

Big Data

Variety

- News, blogs, social media, analyst reports, company announcements, traders' chat rooms...

- Financial reports, prices, economic events...

- Weather, GPS, images...

Volume

- ETL
- Machine learning
- Correlation analysis
- Regressions...

Velocity

- As fast as possible

Page 36: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

A. No, I find news noisy. There is just too much of it.

B. No, I'm a quant. I find it hard to quantify news.

C. Yes, but I find using news is not very efficient. I have to manually relate it to my portfolio.

Page 37: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie

❏ Guaranteed data processing
❏ Horizontal scalability
❏ Fault-tolerance
❏ Higher level abstraction than message passing
❏ Real-time machine learning for classification and predictive analytics

Page 38: “Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Dr. Juan Cheng, Data Scientist at Infotrie


Analysis of an Indonesian Company “Pelindo” in English vs Bahasa Indonesia

Tracking of weak signals: local languages with little to no coverage in English press

[Chart axes: Sentiment (-5/5); News volume (#)]