Finding the Signal in the Noise June 15, 2015 Webinar Presentation Nova Spivack, CEO [email protected]

SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Jul 28, 2015

Data & Analytics

DATAVERSITY
Transcript
Page 1: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Finding the Signal in the Noise

June 15, 2015
Webinar Presentation

Nova Spivack, CEO
[email protected]

Page 2: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

What is Bottlenose For?

Bottlenose discovers the threats and opportunities that impact your business

Bottlenose does this using patented stream intelligence technology

Page 3: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Key Stream Intelligence Use-Cases

Threats
• Risk detection
• Crisis mitigation
• Competitive threats
• Reputational threats
• Cyber threat detection

Opportunities
• Audience and customer insights
• Innovation and research
• New business and market opportunities
• Competitive intelligence
• Product and marketing intelligence

Page 4: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Vision

Stream Intelligence

Our mission is to build the leading business intelligence company for stream data

Stream data is the fastest-growing segment of data. It includes all types of live or historical, unstructured or structured, time-stamped data, such as: email and messaging data, social media, mobile data, news, IT log data, CRM data, support data, sales data, Web and app analytics data, financial data, sensor and device data.

We have built the first unified platform and application for automating the discovery of actionable intelligence across any stream data source. We call this stream intelligence.

Page 5: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Problem: Massive growth of unstructured data cannot be managed effectively with existing tech infrastructure

Real-Time Discovery Against Streaming Data Is Required:

“... the future belongs to raw unstructured or semi-structured data from both internal and external sources - increasingly delivered in (near) real-time. This data has great value yet most organizations do not have the tech infrastructure to handle all this data.” - IDC

Page 6: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Analysts Are Drowning in Streams

● There are never going to be enough data scientists or analysts to cope with the rise of unstructured stream data in the enterprise

● Analysts need automated stream intelligence tools to help them deal with the volume, velocity and variety of stream data

Page 7: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Solution: Bottlenose Automates Stream Intelligence

• Bottlenose provides the most advanced automated stream intelligence: it finds patterns such as trends, anomalies, threats, opportunities and correlations in stream data

• Bottlenose is easy to use and delivers value right away, without extensive engineering and IT involvement or lengthy professional services engagements

• The platform combines both internal enterprise data and external data from social, broadcast, web and other areas.

We Are in the Stream Intelligence Sweet Spot

The Bottlenose solution is a new generation of tools that automates the production of actionable intelligence from stream data

Diagram labels: Variety; Velocity; Volume; ELK Stack

Page 9: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Competitive Advantage from Coping with Stream Data

Page 10: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Competitive Advantage from Coping with Stream Data

Page 11: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Competitive Advantage from Coping with Stream Data

Page 12: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Competitive Advantage from Coping with Stream Data

Page 13: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Bottlenose Platform

Generate Actionable Intelligence from ANY Stream Data

• Social & traditional media (social networks, blogs, forums, newswires)
• 98% of all live TV & radio broadcasts
• Enterprise data (sales, financials, Web analytics, IT systems, email, internal databases, etc.)
• Web data, commercial data sources, financial market data sources, public data sources
• Machine and sensor data (Internet of Things, machine data, weather data, etc.)

Page 14: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Stream Intelligence Pipeline

• Applications
• Rules & Agents: Alerts/actions based on business interests
• Stream Data Storage & API: Long-term storage, real-time access, search & APIs
• Trend Detection: Extrapolation, correlation/clustering
• Data Mining & Analytics: ~30 entity types and ~150 metrics
• Ingestion & Enrichment: Push/pull of unstructured/structured data

Diagram labels: Data in Motion; Alerts & Actions; New Patterns; Entities & Metrics
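The layers listed on this slide can be pictured as a chain of small stages. The following is only a rough sketch in Python: the stage names (`enrich`, `count_entities`, `apply_rules`), the hashtag-as-entity shortcut and the threshold rules are all assumptions made for illustration and are not taken from the Bottlenose platform.

```python
from collections import Counter

def enrich(messages):
    """Ingestion & enrichment (assumed): attach entities, here just hashtags, to each message."""
    for msg in messages:
        tags = [w.lower() for w in msg["text"].split() if w.startswith("#")]
        yield {**msg, "entities": tags}

def count_entities(enriched):
    """Data mining & analytics (assumed): one metric, a mention count, per entity."""
    counts = Counter()
    for msg in enriched:
        counts.update(msg["entities"])
    return counts

def apply_rules(counts, rules):
    """Rules & agents (assumed): alert when an entity's count reaches its threshold."""
    return [
        f"ALERT {entity}: {counts[entity]} mentions"
        for entity, threshold in rules.items()
        if counts[entity] >= threshold
    ]

# Hypothetical sample data and rules.
messages = [
    {"ts": 1, "text": "Big #outage downtown"},
    {"ts": 2, "text": "Still no power #outage #storm"},
    {"ts": 3, "text": "Lovely weather today"},
]
alerts = apply_rules(count_entities(enrich(messages)), {"#outage": 2, "#storm": 5})
print(alerts)
```

Each stage only consumes the output of the one before it, which is the property the layered diagram suggests; a real system would run the stages continuously rather than over a fixed list.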

Page 15: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Automatically Discover Threats and Opportunities

Diagram labels: Focal Interest; Known Issues; Unknown/Emerging Issues; Breaking News; Customer Problems; Enterprise Risk Factor; Fraud Risk; Product Recall; Competitive Threat; Cyber Attack; Power Outage; Natural Disaster; Traffic Congestion; Device Failure; Financial Trading Anomaly; Reputation Risk; Intellectual Property Violation

Page 16: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Key Metrics

Bottlenose analyzes 72 billion data records every day

Continuous High-Volume Stream Analytics
• 3 billion live + historical messages analyzed every hour
• 72 billion records analyzed per day + predictive analytics on 7.2 billion
• 67,000,000 new messages ingested every day
• Trend detection at a rate of 1 million events per second
• 30 entity types recognized × 150 metrics per entity × tens of millions of entities = ~50 to 100 billion time series monitored and analyzed continuously
• Growing to 200 terabytes of data stored & analyzed continuously in 2015

1000s of High-Level Detected Trends Per Hour
• Automated data science layer applies machine learning, statistics and predictive analytics to correlate, cluster, predict and analyze emergent trends

We See the Near Future Before Anyone Else
• 80% of the time, our system detects breaking news and emerging threats, opportunities and keywords tens to hundreds of minutes ahead of the media, Twitter, ad networks, etc. Similar advantages against non-text data sources
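The headline figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only numbers stated on the slide; the two entity counts (10 and 20 million) are assumed sample values for "tens of millions", and the product is taken literally as written on the slide.

```python
MESSAGES_PER_HOUR = 3_000_000_000   # from the slide
NEW_MESSAGES_PER_DAY = 67_000_000   # from the slide
ENTITY_TYPES = 30                   # from the slide
METRICS_PER_ENTITY = 150            # from the slide

# 3 billion messages per hour over 24 hours gives the daily record count.
records_per_day = MESSAGES_PER_HOUR * 24

# Average ingest rate of new messages, in messages per second.
new_messages_per_second = NEW_MESSAGES_PER_DAY / (24 * 60 * 60)

def time_series_count(entities):
    """Literal product from the slide: entity types x metrics x entities."""
    return ENTITY_TYPES * METRICS_PER_ENTITY * entities

print(f"{records_per_day:,} records per day")
print(f"{new_messages_per_second:.0f} new messages per second on average")
for entities in (10_000_000, 20_000_000):   # assumed sample values
    print(f"{entities:,} entities -> {time_series_count(entities):,} time series")
```

Under these assumptions, 10 million entities give 45 billion time series and 20 million give 90 billion, which is in line with the slide's "~50 to 100 billion" range.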

Page 17: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Demo: Data Agnostic Stream Intelligence

Page 18: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Customer Facing Products

● Analytics, intelligence, and discovery engine
  ○ Nerve Center
  ○ Full-stack offering

● Streaming data services to applications
  ○ Bottlenose API (Platform)

Page 19: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Analytics Engine

‣ Advanced filtering & aggregations using simple OLAP interface

‣ “Interactive Analytics” thanks to sub-second query response time

‣ Add new data sources using central mapping system
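To make "filtering & aggregations" concrete, here is a minimal OLAP-style sketch in Python: filter records on some dimensions, then sum a metric grouped by another dimension. The `aggregate` function, the dimension names and the sample records are hypothetical; the slides do not show the actual query interface.

```python
from collections import defaultdict

def aggregate(records, filters, group_by, metric):
    """Sum `metric` over records matching every filter, grouped by `group_by`."""
    totals = defaultdict(int)
    for rec in records:
        if all(rec.get(dim) == value for dim, value in filters.items()):
            totals[rec[group_by]] += rec[metric]
    return dict(totals)

# Hypothetical records: each row is one (source, language, entity) cell.
records = [
    {"source": "twitter", "lang": "en", "entity": "pepsi", "mentions": 120},
    {"source": "twitter", "lang": "es", "entity": "pepsi", "mentions": 30},
    {"source": "news",    "lang": "en", "entity": "pepsi", "mentions": 8},
    {"source": "twitter", "lang": "en", "entity": "coke",  "mentions": 95},
]

by_entity = aggregate(records, {"source": "twitter", "lang": "en"}, "entity", "mentions")
print(by_entity)
```

A linear scan like this would not give sub-second responses at the volumes quoted elsewhere in the deck; it only illustrates the shape of a filter-then-group query.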

Page 20: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Need for a Radical New Form of Information Retrieval: Semantic Meaning, Bottom-Up from Raw Data

A sophisticated semantic approach is required to make sense of the raw data. The structure of the data can be derived from the entities and dimensions the system has a pattern for. Machine learning techniques can begin to make inferences and match to known profiles as data flows in.

One of the most powerful capabilities comes into play when different data sources need to be compared. A system like ours automatically normalizes them. For example, when the data has different time granularity, we automatically align the time periods in order to find overlaps.

The semantic engine can also be adjusted with a vertical industry's unique facts, relationships, and jargon.
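The time-granularity example on this slide can be illustrated with a small sketch: roll both series up to a common bucket (here, one UTC day) and keep only the buckets present in both. This is one simple way to align series, assumed for illustration; the function names and sample data are hypothetical, and the slides do not describe the actual normalization method.

```python
from collections import defaultdict
from datetime import datetime, timezone

def to_daily(points):
    """Roll (unix_timestamp, value) points up into per-day totals keyed by UTC date."""
    daily = defaultdict(float)
    for ts, value in points:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        daily[day] += value
    return dict(daily)

def align(series_a, series_b):
    """Return {day: (a_total, b_total)} for the days present in both series."""
    a, b = to_daily(series_a), to_daily(series_b)
    return {day: (a[day], b[day]) for day in sorted(a.keys() & b.keys())}

HOUR, DAY = 3600, 86400
# Hypothetical inputs: hourly mention counts for two days, daily sales for the next two.
hourly_mentions = [(h * HOUR, 10.0) for h in range(48)]
daily_sales = [(1 * DAY, 500.0), (2 * DAY, 650.0)]

overlap = align(hourly_mentions, daily_sales)
for day, (mentions, sales) in overlap.items():
    print(day, mentions, sales)
```

Only the second day appears in both inputs, so that is the single bucket on which the two sources can be compared.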

Page 21: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Application - Nerve Center

Our application provides a powerful suite of tools to find business insights in streaming data:

● Monitor: Real-time monitoring with powerful live visualizations.
● Analyze: Fast interactive analytics to dig deep into the data.
● Discover: Automated insight discovery. Get notified when new patterns are detected.
● Customize: Reports and live dashboards can be created for any vertical by mixing and matching insights & visualizations across any combination of data streams.

Page 22: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Bottlenose Platform

Diagram labels: Ingest; Augment; Analytics Engine; Discovery Engine; Nerve Center®; Store

Page 23: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Processing Layers

Diagram labels: Augmentation Engine; Analytics Engine; Detection Engine; Correlations Engine; Rules & Agent Engine. Axis: Depth of Insight

Page 24: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Processing Layers

Diagram labels: Augmentation Engine; Analytics Engine; Detection Engine; Correlations Engine; Rules & Agent Engine. Axis: Depth of Insight

Page 25: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Processing Layers

Diagram labels: Augmentation Engine; Analytics Engine; Detection Engine; Correlations Engine; Rules & Agent Engine. Axis: Depth of Insight

Page 26: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Data Points

‣ Typical topic stream like “Beyonce” (Pepsi)

‣ 4M new events (data records) per month

‣ ~8M unique entities tracked per month

‣ ~8M unique entities × 150 metrics × many time buckets = a lot of data points

‣ And this is just 1 stream. We have thousands of these running at all times...
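The "a lot of data points" product can be made concrete once a bucket size is chosen. The slide does not say how many time buckets are used, so the daily and hourly bucket counts below are assumptions; the result is an upper bound, since it treats every entity as having a value for every metric in every bucket.

```python
UNIQUE_ENTITIES = 8_000_000   # per month, from the slide
METRICS = 150                 # per entity, from the slide

def data_points(buckets_per_month):
    """Upper bound on data points for one stream in one month."""
    return UNIQUE_ENTITIES * METRICS * buckets_per_month

series = UNIQUE_ENTITIES * METRICS   # distinct (entity, metric) time series
daily = data_points(30)              # assumed: one bucket per day, 30-day month
hourly = data_points(30 * 24)        # assumed: one bucket per hour

print(f"{series:,} time series")
print(f"{daily:,} data points with daily buckets")
print(f"{hourly:,} data points with hourly buckets")
```

Even with coarse daily buckets, a single stream reaches tens of billions of potential data points per month under these assumptions.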

Page 27: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Processing Layers

Diagram labels: Augmentation Engine; Analytics Engine; Detection Engine; Correlations Engine; Rules & Agent Engine. Axis: Depth of Insight

Page 28: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Detection Engine

‣ Systematically walk through all data points.

‣ Continuous stream of categorized signals. Searchable.
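As a rough illustration of walking through every series and emitting categorized signals, the sketch below flags points that deviate strongly from their recent history. The z-score rule, the window size, the threshold and the "spike"/"drop" categories are all assumptions chosen for the example; the slides do not specify which detection algorithms are used.

```python
from statistics import mean, stdev

def detect(series_by_key, window=5, threshold=3.0):
    """Walk every series and emit categorized signals.

    A point is flagged when it lies more than `threshold` sample standard
    deviations from the mean of the preceding `window` points.
    """
    signals = []
    for key, values in series_by_key.items():
        for i in range(window, len(values)):
            history = values[i - window:i]
            mu, sigma = mean(history), stdev(history)
            if sigma == 0:
                continue  # flat history: no scale to judge a deviation against
            z = (values[i] - mu) / sigma
            if abs(z) > threshold:
                signals.append({"key": key, "index": i,
                                "category": "spike" if z > 0 else "drop"})
    return signals

# Hypothetical (entity, metric) time series.
series_by_key = {
    ("pepsi", "mentions"): [10, 12, 11, 9, 10, 60, 11],
    ("pepsi", "sentiment"): [5, 5, 6, 5, 6, 5, 6],
}
signals = detect(series_by_key)

# "Searchable": signals are plain records, so they can be filtered by any field.
spikes = [s for s in signals if s["category"] == "spike"]
print(spikes)
```

Only the jump to 60 in the mentions series is flagged; the point after it is not, because the spike itself inflates the standard deviation of the following window.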

Page 29: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Page 30: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends
Page 31: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends
Page 32: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Detection Engine

• Anticipate: Python servers for trend detection & extrapolation
• Detector: Workers that continuously aggregate entities and fetch corresponding metrics
• Context Gathering: Finding additional meta-data around detections
• Time Series Extrapolation
• Clustering: Rolling clustering of trends based on overlapping meta-data and a variety of distance functions

Diagram labels: Analytics Requests; Entities & Time Series; Analytics Requests; Related Entities; Find Related Trends; Related Trends; New and updated trends
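The clustering step on this slide groups trends by overlapping meta-data. One way to sketch that idea is an incremental scheme using Jaccard distance over sets of tags. The distance function, the 0.6 cut-off and the names `jaccard_distance` and `add_trend` are assumptions for illustration; the slide mentions "a variety of distance functions" without naming them.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets of meta-data tags."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def add_trend(clusters, trend_id, tags, max_distance=0.6):
    """Attach a new trend to the closest existing cluster, or start a new cluster."""
    best = min(clusters, key=lambda c: jaccard_distance(c["tags"], tags), default=None)
    if best is not None and jaccard_distance(best["tags"], tags) <= max_distance:
        best["trends"].append(trend_id)
        best["tags"] |= tags   # the cluster's meta-data grows as trends join
    else:
        clusters.append({"trends": [trend_id], "tags": set(tags)})
    return clusters

# Hypothetical trends arriving one at a time.
clusters = []
add_trend(clusters, "t1", {"outage", "boston", "power"})
add_trend(clusters, "t2", {"outage", "boston", "storm"})
add_trend(clusters, "t3", {"pepsi", "beyonce"})

print([c["trends"] for c in clusters])
```

Because each trend is placed as it arrives, the clusters are updated on a rolling basis instead of being recomputed from scratch; a production system would also need to expire old trends, which this sketch leaves out.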

Page 33: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Anticipate

‣ Python library, using SciPy
‣ Algorithms for detection & extrapolation in time series data
‣ Includes tooling for debugging, training and simulating
‣ ~500 detections/CPU-core/second
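As a simple example of extrapolation in time series data, the sketch below fits a straight line by ordinary least squares and projects it forward. It is written in plain Python so that it runs without dependencies; it is not the Anticipate library's algorithm, which the slides do not describe, and the function names are hypothetical. With SciPy available, `scipy.stats.linregress` would give the same slope and intercept.

```python
def fit_line(values):
    """Ordinary least-squares fit of values against their indices 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def extrapolate(values, steps):
    """Project the fitted line `steps` points past the end of the series."""
    slope, intercept = fit_line(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]

history = [2.0, 4.0, 6.0, 8.0]   # hypothetical metric values per time bucket
forecast = extrapolate(history, 2)
print(forecast)
```

A linear fit is only the simplest possible choice; bursty social data usually calls for models that handle seasonality and sudden level shifts.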

Page 34: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Processing Layers

Diagram labels: Augmentation Engine; Analytics Engine; Detection Engine; Correlations Engine; Rules & Agent Engine. Axis: Depth of Insight

Page 35: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Processing Layers

Diagram labels: Augmentation Engine; Analytics Engine; Detection Engine; Correlations Engine; Rules & Agent Engine. Axis: Depth of Insight

Page 36: SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Intelligence as a Service

Our automated insight discovery on streaming data enables “intelligence as a service for every organization”