Shiva Amiri, PhD, Chief Product Officer - MLConf Seattle - May 1st 2015 - Incorporating the Real Time Component into Analytics and Machine Learning

Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15

Jul 15, 2015

Page 1: Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15

Shiva Amiri, PhD

Chief Product Officer

MLConf Seattle - May 1st 2015

Incorporating the Real Time Component into Analytics and Machine Learning


The Challenge

One or more structural limitations have significantly constrained successful data mining applications and initiatives

Frequently, these problems are associated with the amount of data, the rate of data generation, and the number of attributes (variables) to be processed:

• 1000s of data variables from which to model (dimensionality)
• 100s of billions of records of data to model
• Continuously evolving data elements and changing sets of data
• The need to execute and adapt in Real Time

Increasingly, this “big data” environment expands beyond the capabilities of conventional data mining methods and technology


What are the trends?

Source: http://www.informationweek.com/big-data/big-data-analytics/5-analytics-bi-data-management-trends-for-2015/a/d-id/1318551 (09/01/2015)


The Market Opportunity

IDC reports the Big Data Analytics market at $125 billion in 2015

Gartner reports the Internet of Things (IoT) will have 25 billion devices with sensors connected by 2020, producing exabytes of data

IoT/E market size by 2020 will exceed $14 trillion

Bioinformatics market is $7.5 billion according to Gartner

Streaming data, Real Time analytics and machine learning remain a significant challenge for multiple sectors


Which verticals are we looking at?

Bioinformatics, Computational Biology – genetics, proteomics, EEG data, fMRI, Molecular Dynamics data, etc.

Financials – behaviour, signals, patterns

Internet of Everything

Other sources of fast, massive data are also of interest to us


An example: Complexity of Brain Disorders

[Figure: Disorder X and Disorder Y]


What kinds of questions do we want to ask?

How do the genes and proteins in disorders relate to each other? - clustering, regression, classification, etc.

What are the other factors involved in disease onset and progression?

What about environment data? Quality of life? Education? Socioeconomic status? - natural language processing (NLP), classification, predictive modeling, etc.

How can we handle massive amounts of brain sensing and imaging data (EEG, fMRI) and link them to other data (genes and proteins)?

Integrative analytics

And questions we don’t know we have


Big Data: The Four V’s


RTDS’ SymetryML™: What have we built?

SymetryML™ is a distributed GPU-implemented predictive analysis and modeling technology for our Massive Data universe…

V3.5 released – real time analytics of large-scale data

Exploration (statistics) and model building, assessment and prediction in real time

Robust security and privacy features

V4.0 being developed – distributed computing capability


How is SymetryML™ addressing these challenges?

The V’s of Big Data:

• SymetryML™ can handle heavy volumes of data (Volume)
• SymetryML™ can handle streaming data (Velocity)

Accelerated hardware with GPUs and distributed computing

REST API – flexibility and modular design, seamless integration into existing systems or development of custom systems

Simplicity of the design

Real Time analytics – exploration and model generation/prediction, handling massive data with unprecedented speed in real time

Privacy and security

Service Oriented Architecture – XaaS


Faster: In minutes, SymetryML™ can utilize 10,000s+ variables by constructing 1000s of model combinations and ultimately reducing the variables to a single model; it builds models in real time as it learns

Smarter with Scale: Linearly scalable, with no limit on the length of data sets or the depth of categorical data, which allows for unlimited learning from data

More Agile on-the-fly: Continuous learning, both distributed and parallel

Simply Deployed: SymetryML™ models can be deployed in real time or in the form of scripts (SQL, Java, etc.)

[Diagram: Data, Learner, Modeler, Predictor and Explorer around a Proprietary Statistical Representation]
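Continuous learning that is both distributed and parallel usually depends on summaries that can be built separately on each shard and then combined. The slides do not describe SymetryML™'s proprietary statistical representation, so the following is only an illustrative sketch of the general idea, using a standard technique (Welford's single-pass update plus the pairwise merge of Chan et al.) for one variable's mean and variance. The names `Summary`, `learn`, `merge` and `summarize` are hypothetical and are not part of any RTDS API.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Count, mean and sum of squared deviations (m2) for one variable."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def learn(self, x: float) -> None:
        # Welford's single-pass update: earlier rows are never revisited.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Summary") -> "Summary":
        # Pairwise combination: two partial summaries become the summary
        # of the union of their rows, without touching the raw data.
        n = self.n + other.n
        if n == 0:
            return Summary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    def variance(self) -> float:
        # Sample variance; undefined for fewer than two rows.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize(rows):
    """Build a summary for one shard of data (hypothetical helper)."""
    summary = Summary()
    for x in rows:
        summary.learn(x)
    return summary


# Two shards summarized independently, then combined.
shard_a = summarize([2.0, 4.0, 4.0, 4.0])
shard_b = summarize([5.0, 5.0, 7.0, 9.0])
combined = shard_a.merge(shard_b)
```

Because `merge` needs only the three numbers from each shard, the cost of combining does not depend on how many rows each shard has seen.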


SymetryML™ - A few key features

Parallel Processing/Distributed Computing

Incremental/Decremental Learning (no rescan)

Automated Variable Selection

Add variables on-the-fly
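Incremental/decremental learning with no rescan can be pictured with a small model that is fitted from running sums only. The sketch below is an assumption-laden illustration, not SymetryML™'s algorithm: it fits a one-variable least-squares line, and the class `OnlineLine` and its methods are hypothetical names. Learning a row adds its contribution to the sums; forgetting a row subtracts it again.

```python
class OnlineLine:
    """Fits y = intercept + slope * x from running sums only."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def _update(self, x, y, sign):
        # sign is +1 to learn a row and -1 to forget it.
        self.n += sign
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.sxy += sign * x * y

    def learn(self, x, y):
        self._update(x, y, +1)

    def forget(self, x, y):
        # The caller is responsible for forgetting only rows that were
        # learned earlier; the sums themselves cannot check this.
        self._update(x, y, -1)

    def coefficients(self):
        # Closed-form least squares from the sums. Raw sums can lose
        # precision on badly scaled data; this is a sketch, not a
        # numerically hardened implementation.
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


model = OnlineLine()
for x, y in [(1, 3), (2, 5), (3, 7), (10, 0)]:
    model.learn(x, y)
model.forget(10, 0)  # drop the outlier without rescanning the other rows
slope, intercept = model.coefficients()
```

After the outlier is forgotten, the remaining three rows lie exactly on y = 1 + 2x, and the coefficients are recovered from the sums alone.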


Component Technologies

Component               Project     Language
Web UI                  sym-web     JavaScript
REST API                sym-rest    Java
Core functionalities    sym-core    Java
NVIDIA GPU support      sym-core    C/C++


SymetryML™-CORE Basic Functionality:

Learn / Forget data

Univariate Analysis – Mean, StDev, F Test, Z Test, T Test

Bivariate Analysis

Correlation

Hypothesis Testing

Chi-square Testing

ANOVA

Model Selection and Creation

Predictions

Assessment

Persistence
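Several of the functions listed above, chi-square testing among them, can be computed from counts that are updated one row at a time, which is what makes a learn/forget interface possible. The sketch below is a generic illustration under that assumption, not the sym-core implementation; `ChiSquareTable` and its methods are hypothetical names. It returns the Pearson chi-square statistic and the degrees of freedom; turning these into a p-value would need a chi-square distribution function, which is left out here.

```python
from collections import Counter


class ChiSquareTable:
    """Contingency counts for two categorical variables."""

    def __init__(self):
        self.cells = Counter()

    def learn(self, a, b):
        self.cells[(a, b)] += 1

    def forget(self, a, b):
        # Only rows that were learned earlier should be forgotten.
        self.cells[(a, b)] -= 1

    def statistic(self):
        # Marginal totals are derived from the cell counts on demand.
        rows, cols = Counter(), Counter()
        for (a, b), count in self.cells.items():
            rows[a] += count
            cols[b] += count
        # Ignore categories whose rows have all been forgotten.
        rows = {a: t for a, t in rows.items() if t > 0}
        cols = {b: t for b, t in cols.items() if t > 0}
        total = sum(rows.values())
        chi2 = 0.0
        for a, row_total in rows.items():
            for b, col_total in cols.items():
                expected = row_total * col_total / total
                observed = self.cells.get((a, b), 0)
                chi2 += (observed - expected) ** 2 / expected
        dof = max(len(rows) - 1, 0) * max(len(cols) - 1, 0)
        return chi2, dof


table = ChiSquareTable()
observations = (
    [("A", "yes")] * 10 + [("A", "no")] * 20
    + [("B", "yes")] * 20 + [("B", "no")] * 10
)
for group, outcome in observations:
    table.learn(group, outcome)
chi2, dof = table.statistic()
```

In this made-up 2x2 example every expected cell count is 15, so the statistic is 4 x (5^2 / 15) = 20/3 with one degree of freedom.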


Web-UI - exploration


Web-UI - exploration


Web-UI - modeling


Web-UI - assessment


RTDS Inc. – Headlines

Team of 6 engineers and data scientists in Toronto, Board in NY

Focus on Technology Differentiation

Technology timeline:

• March ’13 – Launched .NET-based desktop version
• July ’13 – Launched SymetryML™ Server with REST API
• December ’13 – Successfully deployed first GPU-based system
• June ’14 – Algorithmic support expanded

’15 Roadmap: Aggressive, Attainable and Defensible

Proven technology with successful deployment in advertising

Current Financing: Mogility Capital


Next steps

We’ve been successful with this technology in the mobile advertising space… now we want to use the power of this technology in other strategic sectors

We are looking for partners as beta users - with unique datasets and use cases - what kinds of questions can we help answer with your data?

We are looking for integration partners where we can both enhance our offerings

Develop the next version (v4.0) of SymetryML™ – fully parallel with Apache Spark


SymetryML™ and GPUs

• A native library that uses NVIDIA GPUs is available for:
  • Linux 64-bit (CentOS 5.x and Amazon Linux)
• Use of GPUs for core operations:
  • Learning / Forgetting data
  • Model Building
  • Model Selection


SymetryML™-WEB

• Interactive HTML5 application
• Direct connection to SYM-REST
• It is a de facto lightweight front end to SYM-REST
• Based on Sencha Ext JS 4.x


SYM-REST

• Provides a RESTful API to sym-core
• Supported Data Sources:
  • Amazon S3
  • SFTP
  • HTTP/HTTPS
  • Redshift
• Upcoming Data Sources:
  • HDFS
  • ODBC/JDBC


SYM-REST Security

• Users of the REST API need an access key
• We generate these keys
• The key is a 128-bit AES key
• Every REST request is authenticated with an HMAC (SHA1) code based on part of the request
• If data encryption is needed, HTTPS can be used
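The slide says only that each request carries an HMAC (SHA1) code computed from part of the request with a 128-bit key; it does not say which parts are signed or how the code is transmitted. The sketch below fills those gaps with loudly labeled assumptions: the signed fields (HTTP method, path and a timestamp), the `/projects` path, and the function names `new_access_key`, `sign` and `verify` are all hypothetical and do not describe the actual SYM-REST protocol. Only Python's standard `hmac`, `hashlib` and `secrets` modules are used.

```python
import hashlib
import hmac
import secrets


def new_access_key() -> bytes:
    # 16 random bytes = 128 bits, matching the key size on the slide.
    return secrets.token_bytes(16)


def sign(key: bytes, method: str, path: str, timestamp: str) -> str:
    # ASSUMPTION: the signed "part of the request" is method, path and
    # timestamp joined by newlines. The real scheme is not documented here.
    message = "\n".join([method, path, timestamp]).encode("utf-8")
    return hmac.new(key, message, hashlib.sha1).hexdigest()


def verify(key: bytes, method: str, path: str, timestamp: str,
           signature: str) -> bool:
    expected = sign(key, method, path, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


key = new_access_key()
signature = sign(key, "GET", "/projects", "2015-05-01T00:00:00Z")
accepted = verify(key, "GET", "/projects", "2015-05-01T00:00:00Z", signature)
tampered = verify(key, "GET", "/other", "2015-05-01T00:00:00Z", signature)
```

An HMAC authenticates the request but does not hide its contents, which is why the slide points to HTTPS when encryption is needed.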


Finance data example

• NASDAQ TotalView-ITCH Intraday Data Modeling
  • 175 GB - one month of raw data
  • 55 GB of transactions for NASDAQ100 constituents
  • 12M rows/400 attributes
  • Univariate analysis across securities
  • Covariance and Hypothesis Testing
  • Model Building: Classification/Regression
  • Prediction of Price Movement
  • Full Order Book Analysis
