© 2015 Mesosphere, Inc. All Rights Reserved. SIMPLIFYING STREAMING ANALYTICS, Cassandra Summit 2015, Brenden Matthews @brndnmtthws

Cassandra Summit 2015 - Simplifying Streaming Analytics

Apr 16, 2017

Transcript
Page 1

SIMPLIFYING STREAMING ANALYTICS

Cassandra Summit 2015

Brenden Matthews @brndnmtthws

Page 2

AGENDA

• Introduction
• Streaming analytics:
  • What is it?
  • Why do it?
  • When do I need it?
  • How? (Demo!)
  • What are the limitations?

Page 3

ABOUT ME - BRENDEN MATTHEWS

• ASF member, Mesos committer
• Have contributed to a number of related OSS projects, including Spark, Storm, Kafka, Presto, and a number of Mesos schedulers
• SA @Mesosphere, formerly on the DI team @Airbnb

Page 4

STREAMING ANALYTICS: WHAT IS IT?

Indeed.

Page 5

STREAMING ANALYTICS: WHAT IS IT?

• Perform joins, aggregations, and mutations on data as it happens
• Components typically include:
  • Producer
  • Message broker
  • [E] Consumer
  • [T] Processing engine
  • [L] Storage
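To make "aggregations on data as it happens" concrete, here is a minimal Python sketch of a tumbling-window sum whose result is updated once per incoming event. The event shape `(timestamp, sensor_id, value)`, the function name `tumbling_window_sums`, and the 60-second window are hypothetical choices for illustration. A processing engine such as Spark Streaming or Storm would also handle distribution, fault tolerance, and late-arriving data, and this sketch ignores all of that.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, sensor id, reading)
Event = Tuple[int, str, float]


def tumbling_window_sums(
    events: Iterable[Event], window_s: int = 60
) -> Dict[Tuple[int, str], float]:
    """Sum readings per (window start, sensor) as each event arrives.

    Every event is folded into its window exactly once, so nothing is
    recomputed from scratch when new data shows up.
    """
    sums: DefaultDict[Tuple[int, str], float] = defaultdict(float)
    for ts, sensor_id, value in events:
        window_start = ts - (ts % window_s)  # align to the window boundary
        sums[(window_start, sensor_id)] += value
    return dict(sums)


if __name__ == "__main__":
    stream = [(0, "a", 1.0), (30, "a", 2.0), (61, "a", 5.0), (62, "b", 4.0)]
    print(tumbling_window_sums(stream))
```

Tumbling windows are the simplest choice because each event belongs to exactly one window. Sliding windows would assign an event to several overlapping windows.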

Page 6

STREAMING ANALYTICS: WHAT IS IT?

• Perform joins, aggregations, and mutations on data as it happens
• Components typically include:
  • Producer
  • Message broker
  • [Extract] Consumer
  • [Transform] Processing engine
  • [Load] Storage
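The Extract, Transform, Load roles above can be sketched as a single-process Python pipeline. This is an illustration under stated assumptions, not the implementation used in the talk: `queue.Queue` stands in for the message broker (a real deployment might use Kafka), a plain dict stands in for storage (a real deployment might use Cassandra), and the Celsius-to-Fahrenheit transform and all function names are hypothetical.

```python
import json
import queue
from typing import Dict, Iterator, List

# Stand-ins chosen for illustration only:
#   broker  -> queue.Queue (a real pipeline might use Kafka)
#   storage -> dict of device id to readings (a real pipeline might use Cassandra)


def produce(broker: "queue.Queue[str]", readings: List[Dict]) -> None:
    """Producer: publish each raw reading to the broker as JSON."""
    for reading in readings:
        broker.put(json.dumps(reading))


def consume(broker: "queue.Queue[str]") -> Iterator[Dict]:
    """[Extract] Consumer: yield decoded messages one at a time until the broker is empty."""
    while True:
        try:
            raw = broker.get_nowait()
        except queue.Empty:
            return
        yield json.loads(raw)


def transform(message: Dict) -> Dict:
    """[Transform] Processing engine: hypothetical Celsius-to-Fahrenheit conversion."""
    return {"device": message["device"], "temp_f": message["temp_c"] * 9 / 5 + 32}


def load(storage: Dict[str, List[float]], record: Dict) -> None:
    """[Load] Storage: append the transformed reading under its device id."""
    storage.setdefault(record["device"], []).append(record["temp_f"])


def run_pipeline(readings: List[Dict]) -> Dict[str, List[float]]:
    broker: "queue.Queue[str]" = queue.Queue()
    storage: Dict[str, List[float]] = {}
    produce(broker, readings)
    for message in consume(broker):  # each message is handled as soon as it is read
        load(storage, transform(message))
    return storage


if __name__ == "__main__":
    print(run_pipeline([{"device": "d1", "temp_c": 0}, {"device": "d1", "temp_c": 100}]))
```

Keeping each stage as a separate function mirrors the component list. In a distributed setup each stage would be its own service, and the broker decouples the producer's rate from the consumer's.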

Page 7

STREAMING ANALYTICS: WHY DO IT?

• Data is constantly being generated
  • HTTP traffic, clickstream, IoT, metrics
• Most data is correlated (requires joins)
  • Data can be pre-denormalized (i.e., flattened)
• Immutability
• Build “real time” services: what’s happening right now?
• Compute once
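One way to read "pre-denormalized", "immutability", and "compute once" together is as a stream-table join performed at ingest. Each click is enriched with user attributes once, and the flattened result is written as an immutable record, so readers never have to join again. The sketch below assumes hypothetical shapes (clicks as `(user_id, url)` tuples, users as a dict of attribute dicts) and uses a frozen dataclass to model immutability.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)  # frozen: a record cannot be modified after it is emitted
class EnrichedClick:
    user_id: str
    url: str
    country: str  # copied from the user table when the click is processed


def enrich_clicks(
    clicks: Iterable[Tuple[str, str]], users: Dict[str, Dict[str, str]]
) -> Iterator[EnrichedClick]:
    """Join each (user_id, url) click with user attributes as it arrives.

    The output is flattened, so downstream readers do not need to join again.
    Clicks from unknown users are kept, with country set to "unknown".
    """
    for user_id, url in clicks:
        country = users.get(user_id, {}).get("country", "unknown")
        yield EnrichedClick(user_id=user_id, url=url, country=country)


if __name__ == "__main__":
    users = {"u1": {"country": "CA"}}
    output_log = []  # append-only: records are added, never updated in place
    for record in enrich_clicks([("u1", "/home"), ("u2", "/cart")], users):
        output_log.append(record)
    print(output_log)
```

Defaulting unknown users to `"unknown"` is one design choice. An engine might instead hold the click until the matching user record arrives.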

Page 8

STREAMING ANALYTICS: WHEN DO I NEED IT?

• Messaging platform
• Compliance
• Fraud detection
• Firehose consumption
• Recommendation engine

Page 9

STREAMING ANALYTICS: HOW?

[Diagram: Pipeline (Producer → Broker → Consumer/ML → Storage)]

Page 10

STREAMING ANALYTICS: HOW?

[Diagram: Pipeline]

Page 11

STREAMING ANALYTICS: HOW?

Demo time!

github.com/mesosphere/iot-demo

Page 12

STREAMING ANALYTICS: WHAT ARE THE LIMITATIONS?

• Not a replacement for all batch workloads
• Backfilling is tricky
  • Unless you retain a log of all data mutations, backfilling may be impossible
• Maintaining a completely immutable system may explode storage costs
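The backfilling and storage-cost limitations pull in opposite directions, and a small Python sketch can show why. It assumes a hypothetical mutation log of `(sequence_number, key, value)` tuples, with `None` marking a delete. The function `replay` rebuilds state as of any sequence number from the full log. The function `compact` keeps only the latest mutation per key to save space, and after compaction the earlier states can no longer be reconstructed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mutation format: (sequence number, key, new value); None means delete.
Mutation = Tuple[int, str, Optional[int]]


def replay(log: List[Mutation], up_to_seq: Optional[int] = None) -> Dict[str, int]:
    """Rebuild state by re-applying mutations in sequence order.

    With up_to_seq set, this reconstructs the state as of that point,
    which is what a backfill needs. It only works if the log is complete.
    """
    state: Dict[str, int] = {}
    for seq, key, value in sorted(log, key=lambda m: m[0]):
        if up_to_seq is not None and seq > up_to_seq:
            break
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


def compact(log: List[Mutation]) -> List[Mutation]:
    """Keep only the newest mutation per key, trading history for storage."""
    latest: Dict[str, Mutation] = {}
    for mutation in sorted(log, key=lambda m: m[0]):
        latest[mutation[1]] = mutation
    return sorted(latest.values(), key=lambda m: m[0])


if __name__ == "__main__":
    log = [(1, "a", 10), (2, "b", 20), (3, "a", 11), (4, "b", None)]
    print(replay(log), replay(log, up_to_seq=2))
    compacted = compact(log)
    print(replay(compacted), replay(compacted, up_to_seq=2))
```

Both logs produce the same current state. Only the full log can answer "what did the data look like at sequence 2?" Kafka's log compaction makes the same trade-off, retaining at least the latest value for each key.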

Page 13

QUESTIONS?