Stream Analytics in the Enterprise

Apr 16, 2017

Jesus Rodriguez

Transcript
Page 1: Stream Analytics in the Enterprise

Stream Analytics in the Enterprise

Page 2: Stream Analytics in the Enterprise

About Us

• Emerging technology firm focused on helping enterprises build breakthrough software solutions

• Building software solutions powered by disruptive enterprise software trends
  - Machine learning and data science
  - Cyber-security
  - Enterprise IoT
  - Powered by Cloud and Mobile

• Bringing innovation from startups and academic institutions to the enterprise

• Award winning agencies: Inc 500, American Business Awards, International Business Awards

Page 3: Stream Analytics in the Enterprise

Agenda

• The elements of stream analytic solutions
• Stream analytic platforms: on-premise vs. cloud
• On-premise stream analytic platforms
• Cloud stream analytic services
• Complementary technologies

Page 4: Stream Analytics in the Enterprise

The elements of enterprise stream analytic solutions

Page 5: Stream Analytics in the Enterprise

Capabilities of Stream Analytic Solutions

• Real time data ingestion
• Execute SQL queries on dynamic streams of data
• Time window queries
• Connect query outputs to new data streams
• Leverage reference data in the stream queries
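To make the time window and reference data capabilities concrete, here is a minimal Python sketch of a tumbling (fixed, non-overlapping) window average that also looks up reference data. It is only an illustration: the `DEVICE_SITE` table, the event layout, and the function name are assumptions, not part of any platform covered in this deck, and a real engine would emit results continuously as windows close rather than over a finite list.

```python
from collections import defaultdict

# Hypothetical reference data: maps a device id to the site it belongs to.
DEVICE_SITE = {"d1": "plant-a", "d2": "plant-a", "d3": "plant-b"}


def tumbling_window_avg(events, window_seconds):
    """Average `value` per (window start, site) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, device_id, value) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, site) -> [total, count]
    for ts, device_id, value in events:
        # Assign the event to the window that contains its timestamp.
        window_start = int(ts // window_seconds) * window_seconds
        # Reference-data lookup, comparable to a join against a static table.
        site = DEVICE_SITE.get(device_id, "unknown")
        bucket = sums[(window_start, site)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


events = [
    (0, "d1", 10.0),
    (3, "d2", 20.0),
    (7, "d3", 5.0),
    (12, "d1", 30.0),
]
result = tumbling_window_avg(events, window_seconds=10)
```

In a SQL-based service the same idea would typically be written as a grouped query over a window clause; the dictionary lookup stands in for the join against reference data.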

Page 6: Stream Analytics in the Enterprise

Stream analytic platforms

Page 7: Stream Analytics in the Enterprise

Cloud vs. On-premise stream analytic platforms

Page 8: Stream Analytics in the Enterprise

Capabilities of Stream Analytic Solutions

On-premise stream analytic platforms

Benefits
• Extensibility
• Control
• Rich programming model
• Integration with on-premise big data pipeline

Challenges
• Complex infrastructure
• Scalability
• Maintenance and monitoring

Cloud stream analytic services

Benefits
• Simple provisioning
• Elastic scalability
• Integrated with PaaS offerings
• Rich monitoring and management experience

Challenges
• Integration with on-premise systems
• Extensibility
• Lack of customization

Page 9: Stream Analytics in the Enterprise

On-premise stream analytic platforms

Page 10: Stream Analytics in the Enterprise

Lead Platforms

Apache Storm

Apache Spark

Apache Samza

Apache Flink

Akka

Page 11: Stream Analytics in the Enterprise

Apache Storm

• Stream processing framework with micro-batching capabilities

• Included in most Hadoop distributions

• Main model (spouts and bolts)
  - One at a time
  - Lower latency
  - Operates on tuple streams
• Trident
  - Micro-batching
  - Higher throughput
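The difference between the one-at-a-time model and micro-batching can be sketched in plain Python. This is not Storm's API: `sentence_spout`, `split_bolt`, `count_bolt`, and `micro_batches` are hypothetical names that only mimic the roles of spouts, bolts, and Trident-style batches.

```python
def sentence_spout():
    """Spout: emits one tuple at a time (a finite list here; real spouts are unbounded)."""
    for sentence in ["storm processes tuples", "tuples flow through bolts"]:
        yield (sentence,)


def split_bolt(tuples):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in tuples:
        for word in sentence.split():
            yield (word,)


def count_bolt(tuples):
    """Bolt: keeps a running count and emits an update for every incoming tuple."""
    counts = {}
    for (word,) in tuples:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


def micro_batches(tuples, batch_size):
    """Group tuples into small batches, trading latency for throughput."""
    batch = []
    for item in tuples:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# One at a time: every word produces an output as soon as it arrives.
one_at_a_time = list(count_bolt(split_bolt(sentence_spout())))

# Micro-batching: downstream work happens once per batch of four tuples.
batched = list(micro_batches(split_bolt(sentence_spout()), batch_size=4))
```

In the first pipeline each tuple is handled individually, which is where the lower latency comes from; in the second, work is amortized per batch, which is where the higher throughput comes from.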

Page 12: Stream Analytics in the Enterprise

Apache Storm: Benefits vs. Challenges

Benefits
• Broad adoption
• Included in Hadoop distributions
• Vibrant community
• Extensibility
• Support for different programming languages

Challenges
• Increasing competition from newer stacks
• Performance limitations at very large scale

Page 13: Stream Analytics in the Enterprise

Apache Spark

• Micro-batching processing framework

• Elastic scalability models
• Receivers split data into batches
• Spark Streaming processes batches and produces results
• High throughput – higher latency
• Functional APIs
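The batch-oriented flow described above can be illustrated with a small Python sketch. It does not use the Spark API: `split_into_batches` and `process_batch` are hypothetical helpers that play the role of a receiver cutting the stream by batch interval and of a functional transformation applied to each batch.

```python
from itertools import groupby


def split_into_batches(records, batch_interval):
    """Group (timestamp, value) records, assumed sorted by timestamp, by interval.

    Yields (batch_start, values). Intervals with no records yield nothing here;
    a real engine would usually still schedule an empty batch.
    """
    for batch_index, group in groupby(records, key=lambda r: int(r[0] // batch_interval)):
        yield batch_index * batch_interval, [value for _, value in group]


def process_batch(values):
    """Functional-style pipeline: drop negatives, double the rest, sum the result."""
    return sum(map(lambda v: v * 2, filter(lambda v: v >= 0, values)))


records = [(0.2, 1), (0.9, -5), (1.1, 3), (1.8, 4), (3.5, 10)]
results = [
    (start, process_batch(values))
    for start, values in split_into_batches(records, batch_interval=1)
]
```

A result only becomes available once its whole interval has been collected, which is the source of the higher latency noted on the slide.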

Page 14: Stream Analytics in the Enterprise

Spark Streaming: Benefits vs. Challenges

Benefits
• MPP infrastructure
• Interoperability with other Spark programming models (Java, Python, SQL)
• Integration with messaging frameworks
• Extensibility
• Included in most Hadoop distributions

Challenges
• Time window queries
• Complex infrastructure setup
• Integration with line of business systems

Page 15: Stream Analytics in the Enterprise

Apache Samza

• Built to address some of the limitations of Apache Storm

• Deep integration with Kafka and YARN

• Simple API comparable to map-reduce

• Leverages YARN for task distribution, fault tolerance and scalability

Page 16: Stream Analytics in the Enterprise

Apache Samza: Benefits vs. Challenges

Benefits
• Highly scalable, fault-tolerant model
• Stateful stream data processing
• Extensibility
• Simple infrastructure

Challenges
• Small adoption
• Low level API
• Heavy IO operations

Page 17: Stream Analytics in the Enterprise

Apache Flink Streaming

• Alternative to Spark
• Everything is a stream
• Platform to unify batch and stream processing
• True streaming with adjustable latency and throughput
• Supports different stream sources and transformations

Page 18: Stream Analytics in the Enterprise

Apache Flink Streaming: Benefits vs. Challenges

Benefits
• Combine batch and stream data processing
• Expressive APIs
• Data flows and transformation
• Extensibility

Challenges
• Small adoption
• Limited state management
• High availability models

Page 19: Stream Analytics in the Enterprise

Akka Streams

• Micro-service, actor oriented model

• Messaging driven
• Isolated failures
• Reactive programming model based on sources, sinks and flows
• DSL for stream data manipulation
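The source, flow, and sink vocabulary can be sketched with Python generators. This is an analogy only, not the Akka Streams DSL: `source`, `flow`, `sink`, and `run` are hypothetical helpers, and the sketch does not model actors or asynchronous execution.

```python
def source(items):
    """Source: the stage that emits elements into the stream."""
    yield from items


def flow(transform):
    """Flow: a reusable intermediate stage that maps each element."""
    def stage(upstream):
        for element in upstream:
            yield transform(element)
    return stage


def sink(upstream):
    """Sink: the terminal stage that consumes the stream (collected into a list here)."""
    return list(upstream)


def run(src, flows, snk):
    """Wire a source through a sequence of flows into a sink and run it."""
    stream = src
    for stage in flows:
        stream = stage(stream)
    return snk(stream)


output = run(source(range(1, 6)), [flow(lambda x: x * x), flow(str)], sink)
```

Because each stage only depends on the stage before it, flows can be defined once and reused across pipelines, which is the main appeal of describing stream processing as composable stages.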

Page 20: Stream Analytics in the Enterprise

Akka Streams: Benefits vs. Challenges

Benefits
• Rich stream data processing model
• Extensibility
• Concurrency and thread-safety
• Leverage mainstream Java and Scala programming models

Challenges
• Small adoption
• Dependent on Akka’s architecture style
• Support for languages outside the JVM

Page 21: Stream Analytics in the Enterprise

Cloud stream analytic platforms

Page 22: Stream Analytics in the Enterprise

Lead Platforms

AWS Kinesis Analytics

Azure Stream Analytics

Bluemix Streaming Analytics

Page 23: Stream Analytics in the Enterprise

AWS Kinesis

• Native stream data services in AWS

• Combines three products in a single platform

  - Kinesis Streams
  - Kinesis Firehose
  - Kinesis Analytics
• Kinesis Streams collects data streams from any application
• Kinesis Firehose provides a model to load streaming data into AWS
• Kinesis Analytics allows the execution of SQL queries over data streams

Page 24: Stream Analytics in the Enterprise

AWS Kinesis: Benefits vs. Challenges

Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms

Challenges
• AWS Kinesis Analytics hasn’t been released
• Interoperability with on-premise data streams

Page 25: Stream Analytics in the Enterprise

Azure Stream Analytics

• Native stream analytic service in the Azure platform

• Allows the execution of SQL queries over dynamic streams of data

• Integrates with the other components of the Cortana Analytics suite

• Leverages Azure Event Hub for high volume data ingestion

• Very rich monitoring and analytic capabilities

Page 26: Stream Analytics in the Enterprise

Azure Stream Analytics: Benefits vs. Challenges

Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Very complete suite of platforms
• Rich SQL query and analytics model

Challenges
• Interoperability with on-premise data streams
• Extensibility

Page 27: Stream Analytics in the Enterprise

Bluemix Streaming Analytics

• Native stream analytic service in the IBM Bluemix platform

• Built upon IBM Streams technology

• Allows the execution of SQL queries over dynamic streams of data

• Supports interactive and programmatic query models

• Rich analytic and monitoring capabilities

• Stream visualization graph

Page 28: Stream Analytics in the Enterprise

Bluemix Streaming Analytics: Benefits vs. Challenges

Benefits
• Elastic scalability model
• Simple provisioning
• Interoperable APIs
• Rich SQL query and analytics model

Challenges
• Adoption
• Interoperability with on-premise data streams
• Extensibility

Page 29: Stream Analytics in the Enterprise

You can’t buy everything!

Page 30: Stream Analytics in the Enterprise

Capabilities of Enterprise Stream Analytic Solutions

• Stream tracking
• Replay and simulation
• Stream data testing
• Integration with line of business systems
• Stream data search
• Integration with mainstream analytic tools

Page 31: Stream Analytics in the Enterprise

Complementary technologies

Page 32: Stream Analytics in the Enterprise

Other Relevant Technologies in Stream Analytic Solutions

• Enterprise messaging platforms
• Time series databases
• Stream data connectors

Page 33: Stream Analytics in the Enterprise

Enterprise Messaging Platforms

• Persistent messaging
• Pub-sub messaging
• Support for multiple messaging patterns
• Ordered messaging

Page 34: Stream Analytics in the Enterprise

Time Series Databases

• Store time stamped data
• Time series query functions
• Integrate real time and reference data
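As a rough illustration of what storing time stamped data and running time series query functions involves, the following Python sketch keeps points sorted by timestamp and answers range queries with binary search. The `TimeSeries` class is hypothetical and in-memory only; it is not modeled on any specific time series database.

```python
import bisect


class TimeSeries:
    """Minimal in-memory series of (timestamp, value) points kept in time order."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, timestamp, value):
        # Insert at the position that keeps timestamps sorted,
        # so late-arriving points are still stored in order.
        index = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(index, timestamp)
        self._values.insert(index, value)

    def range(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))

    def mean(self, start, end):
        """Average value over the half-open interval, or None if it is empty."""
        points = self.range(start, end)
        return sum(v for _, v in points) / len(points) if points else None


series = TimeSeries()
for ts, value in [(30, 3.0), (10, 1.0), (20, 2.0), (40, 4.0)]:
    series.append(ts, value)

window = series.range(10, 40)
window_mean = series.mean(10, 40)
```

Keeping the data ordered by time is what makes range queries and windowed aggregates cheap, which is the core reason these stores pair well with stream pipelines.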

Page 35: Stream Analytics in the Enterprise

Stream data connectors

• Develop stream data sources from line of business systems

• Integrate real time and reference data from enterprise systems into the stream data pipeline

• Combine real time data from multiple line of business systems into single data streams
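The idea of combining several line of business feeds into a single stream and enriching it with reference data can be sketched as follows. The event shapes, the `crm`/`erp` feed names, and the `CUSTOMER_REGION` table are all hypothetical, and each feed is assumed to be already sorted by timestamp.

```python
import heapq

# Hypothetical reference data held in an enterprise system.
CUSTOMER_REGION = {"c1": "emea", "c2": "amer"}


def merge_streams(*streams):
    """Merge streams that are each sorted by timestamp into one ordered stream."""
    return heapq.merge(*streams, key=lambda event: event["ts"])


def enrich(events, reference):
    """Attach the customer's region from reference data to every event."""
    for event in events:
        yield {**event, "region": reference.get(event["customer"], "unknown")}


crm_events = [
    {"ts": 1, "source": "crm", "customer": "c1"},
    {"ts": 4, "source": "crm", "customer": "c2"},
]
erp_events = [
    {"ts": 2, "source": "erp", "customer": "c2"},
    {"ts": 3, "source": "erp", "customer": "c3"},
]

combined = list(enrich(merge_streams(crm_events, erp_events), CUSTOMER_REGION))
```

`heapq.merge` is lazy, so the same pattern works on generators that read from live connectors, as long as each individual feed delivers events in timestamp order.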

Page 36: Stream Analytics in the Enterprise

Summary

• Stream data processing and analytics is a key element of modern enterprise data pipelines

• Some of the lead on-premise stream analytic stacks include: Apache Storm, Apache Samza, Spark Streaming, Flink Streaming, Akka…

• Some of the lead cloud stream analytic services include: AWS Kinesis, Azure Stream Analytics, Bluemix Streaming Analytics…

• You can’t buy everything! Stream analytic solutions require custom implementations

• When building stream analytic solutions, consider complementary technologies such as enterprise messaging stacks or time series databases

Page 37: Stream Analytics in the Enterprise

Thanks