The next AMPLab: Real-time Intelligent Secure Execution Ion Stoica October 26, 2016

The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Jan 06, 2017


Data & Analytics

Spark Summit
Transcript
Page 1: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

The next AMPLab: Real-time Intelligent Secure Execution
Ion Stoica
October 26, 2016

Page 2: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Berkeley’s AMPLab
2011 – 2016
• Mission: “Make sense of big data”
• 8 faculty, 60+ students

Governmental and industrial funding

AMP: Algorithms, Machines, People

Page 3: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

AMPLab Goal and Impact


Goal: Next generation of open source data analytics stack for industry & academia

Berkeley Data Analytics Stack (BDAS)

Page 4: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

What is next?

Page 5: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

RISE: Real-time Intelligent Secure Execution

Page 6: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

AMPLab: from batch data to advanced analytics

RISELab: from live data to real-time decisions

Page 7: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Why?

Data only as valuable as the decisions it enables

Page 8: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Why?

Data only as valuable as the decisions it enables

What does this mean?
• Faster decisions better than slower decisions
• Decisions on fresh data better than decisions on stale data
• Decisions on personalized data better than on generic data

Page 9: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Goal

• Real-time decisions: decide in ms
• on live data: the current state of the environment
• with strong security: privacy, confidentiality, integrity

Page 10: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Typical decision system

[Figure: decision system. Data → Preprocess (e.g., train) → Intermediate data (e.g., model) → Query engine → Decision, with an automatic decision engine issuing queries; labels: update latency, decision latency]

Want low update latency & low decision latency
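The two latencies named above can be made concrete with a small sketch. This is a toy illustration, not a RISELab system: the `DecisionSystem` class, its running-mean "model", and the threshold rule are all assumptions introduced here to show where update latency and decision latency would be measured.

```python
import time


class DecisionSystem:
    """Toy decision system: preprocess data into a model, then answer queries."""

    def __init__(self, threshold=2.0):
        # Hypothetical intermediate data: a running mean of observed values.
        self.count = 0
        self.mean = 0.0
        self.threshold = threshold

    def update(self, value):
        """Fold one observation into the model; return update latency in seconds."""
        start = time.perf_counter()
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return time.perf_counter() - start

    def decide(self, value):
        """Flag values far above the running mean; return (decision, latency)."""
        start = time.perf_counter()
        is_anomaly = self.count > 0 and value > self.threshold * self.mean
        return is_anomaly, time.perf_counter() - start


system = DecisionSystem()
for observation in [10.0, 12.0, 11.0]:
    update_latency = system.update(observation)

flagged, decision_latency = system.decide(40.0)   # far above the mean
normal, _ = system.decide(11.5)                   # close to the mean
```

In a real system the update path (training) and the decision path (querying the model) would usually run on different machines and time scales, which is why the two latencies are tracked separately.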

Page 11: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Why is it hard?

Want high quality decisions
• Sophisticated, e.g., fraud, forecast, fleet of drones
• Accuracy, low false positives and negatives
• Robust to noisy and unforeseen data

Want low latency for both updates and decisions

Want strong security: privacy, confidentiality, integrity

Page 12: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Example: Zero-time defense

Problem: zero-day attacks can compromise millions of hosts in seconds
Solution: analyze network flows to detect attacks and patch hosts/software in real-time
• Intermediate data: create attack model
• Decision: detect attack, patch

Quality: sophisticated, accurate, robust
Latency: update (sec) / decision (ms)
Security: privacy (encourage users to share logs), integrity

Page 13: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Application | Quality | Update latency | Decision latency | Security
Zero-time defense | sophisticated, accurate, robust | sec | ms | privacy, integrity
Parking assistant | sophisticated, robust | sec | sec | privacy
Disease discovery | sophisticated, accurate | hours | sec/min | privacy, integrity
IoT (smart buildings) | sophisticated, robust | min/hour | sec | privacy, integrity
Earthquake warning | sophisticated, accurate, robust | min | ms | integrity
Chip manufacturing | sophisticated, accurate, robust | min | sec/min | confidentiality, integrity
Fraud detection | sophisticated, accurate | min | ms | privacy, integrity
“Fleet” driving | sophisticated, accurate, robust | sec | sec | privacy, integrity
Virtual companion | sophisticated, robust | min/hour | sec | integrity
Video QoS at scale | sophisticated | min | ms/sec | privacy, integrity

Page 14: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Challenges

• Automated decisions on live data are hard
• Poor security: exploits are daily occurrences
• One-off solutions, expensive, slow to build

RISE Lab
• Real-time, sophisticated decisions that guarantee worst-case behavior on noisy and unforeseen live data
• Ensure privacy and integrity without impacting functionality
• General platform: Secure Real-time Decision Stack

Page 15: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Research directions

Systems: 100x lower latency, 1,000x higher concurrency than today’s Spark

Machine learning: Robust, on-line ML algorithms

Security: achieve privacy, confidentiality, and integrity without impacting performance or functionality


Page 16: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Early work

Drizzle

Opaque


Page 17: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Streaming: micro-batching vs. record-at-a-time

Micro-batching (e.g., Spark) inherits batch’s properties
• fault-tolerance
• straggler mitigation
• optimizations
• unification with other libraries

Record-at-a-time (e.g., Storm, Flink), typically lower latency
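The difference between the two processing models can be shown with a minimal sketch. It is not how Spark, Storm, or Flink are implemented: the function names are hypothetical, and batches are cut by size here purely for simplicity, whereas a streaming engine would typically cut them by time interval.

```python
def record_at_a_time(events, handle):
    """Invoke the handler once per event, as soon as it arrives."""
    return [handle([event]) for event in events]


def micro_batch(events, handle, batch_size):
    """Group events into fixed-size batches and invoke the handler once per batch."""
    return [handle(events[i:i + batch_size])
            for i in range(0, len(events), batch_size)]


events = list(range(1, 11))                         # ten toy events
per_record = record_at_a_time(events, sum)          # ten handler calls
per_batch = micro_batch(events, sum, batch_size=4)  # three handler calls
```

Each micro-batch is an ordinary batch job, which is where the inherited properties come from; the cost is that an event waits for its batch to close before it is processed.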

Page 18: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Yahoo’s streaming benchmark

Input: 20M JSON ad-events / second, 100 campaigns
Output: ad counts per campaign over a 10sec window
Latency: (time last event was processed) – (end of window)
SLA: 1sec
Findings: Storm and Flink do indeed provide lower latency than Spark

[Figure: ads → Streaming system → ad counts per campaign]
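A minimal sketch of the benchmark's computation, assuming a simplified event format: the JSON field names `campaign_id` and `event_time` are hypothetical, and the real benchmark pipeline has more steps than shown here. Latency is taken as the gap between the end of a window and the time its last event was processed.

```python
import json
from collections import Counter

WINDOW_SEC = 10  # window length from the benchmark description


def count_ads(raw_events):
    """Return {window_start: Counter(campaign -> count)} for JSON ad events."""
    windows = {}
    for raw in raw_events:
        event = json.loads(raw)
        window_start = int(event["event_time"] // WINDOW_SEC) * WINDOW_SEC
        windows.setdefault(window_start, Counter())[event["campaign_id"]] += 1
    return windows


def window_latency(window_start, last_processed_time):
    """Gap between the window's end and when its last event was processed."""
    return last_processed_time - (window_start + WINDOW_SEC)


raw_events = [
    json.dumps({"campaign_id": "c1", "event_time": 1.5}),
    json.dumps({"campaign_id": "c2", "event_time": 4.0}),
    json.dumps({"campaign_id": "c1", "event_time": 9.9}),
    json.dumps({"campaign_id": "c1", "event_time": 12.0}),
]
counts = count_ads(raw_events)
latency = window_latency(0, last_processed_time=10.4)  # within the 1 sec SLA
```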

Page 19: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Spark Streaming

[Figure: master and workers; labels: schedule tasks, process batch, task]

Page 20: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Spark Streaming

[Figure: master and workers; label: cluster status]

Page 21: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Spark Streaming

[Figure: master and workers; labels: schedule tasks, process batch, task]

Page 22: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Spark Streaming

[Figure: master and workers; label: cluster status]

Page 23: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Drizzle

Goal: reduce Spark streaming latency by at least 10x

Key observation: consecutive iterations use same DAG

Solution: push scheduling decisions to workers
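The key observation can be sketched as follows. This is not Drizzle's code: `schedule`, `run_per_batch`, `run_grouped`, and the round-robin placement are hypothetical stand-ins, chosen only to show how reusing one scheduling decision across a group of micro-batches reduces the number of trips to the master.

```python
def schedule(tasks, workers):
    """Assign tasks to workers round-robin (stand-in for a real scheduling decision)."""
    return {task: workers[i % len(workers)] for i, task in enumerate(tasks)}


def run_per_batch(num_batches, tasks, workers):
    """Baseline: the master makes a scheduling decision for every micro-batch."""
    decisions = 0
    for _ in range(num_batches):
        plan = schedule(tasks, workers)
        decisions += 1
    return decisions


def run_grouped(num_batches, tasks, workers, group_size):
    """Schedule once per group of micro-batches and reuse the plan within it."""
    decisions = 0
    plan = None
    for batch in range(num_batches):
        if batch % group_size == 0:
            plan = schedule(tasks, workers)  # same DAG, so the plan stays valid
            decisions += 1
        # Workers would run `plan` for this batch without contacting the master.
    return decisions


tasks = ["map-0", "map-1", "reduce-0"]
workers = ["w1", "w2"]
baseline = run_per_batch(100, tasks, workers)
grouped = run_grouped(100, tasks, workers, group_size=10)
```

With these toy numbers the master is consulted 100 times in the baseline and 10 times with groups of ten. The trade-off, under the same assumption, is that the plan cannot react to cluster changes until the next group boundary.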


Page 24: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Group scheduling

[Figure: Spark Streaming vs. Drizzle, each shown as a master with workers; label: tasks]

Page 25: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

[Figure: Spark Streaming vs. Drizzle, each shown as a master with workers; labels: tasks, process batch]

Page 26: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

[Figure: Spark Streaming vs. Drizzle, each shown as a master with workers; labels: tasks, process batch]

Page 27: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

[Figure: Spark Streaming vs. Drizzle, each shown as a master with workers; labels: tasks, cluster status]

Page 28: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Latency

[Figure: CDF of final event latency (ms), x-axis 0 to 3000 ms; series: Spark, Flink]

Page 29: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Latency

[Figure: CDF of final event latency (ms), x-axis 0 to 3000 ms; series: Spark, Drizzle, Flink]

Similar latency to Flink

Page 30: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Latency, w/ ReduceBy optimization

[Figure: CDF of final event latency (ms), x-axis 0 to 700 ms; series: Spark, Drizzle, Flink]

Aggregate counters on map side to reduce shuffle traffic
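Map-side aggregation can be illustrated with plain Python, assuming each inner list stands for the campaign IDs seen by one map partition. The function names are hypothetical, and this is only a sketch of the idea, not the optimization as implemented in Spark or Drizzle.

```python
from collections import Counter


def shuffle_without_combine(partitions):
    """Every (campaign, 1) pair crosses the network to the reducer."""
    return [(campaign, 1) for part in partitions for campaign in part]


def shuffle_with_combine(partitions):
    """Each partition pre-aggregates locally and ships one pair per campaign."""
    return [pair for part in partitions for pair in Counter(part).items()]


def reduce_counts(pairs):
    """Sum the shuffled partial counts per campaign."""
    totals = Counter()
    for campaign, count in pairs:
        totals[campaign] += count
    return totals


partitions = [["c1", "c1", "c2", "c1"], ["c2", "c2", "c1"]]
plain = shuffle_without_combine(partitions)   # 7 pairs shuffled
combined = shuffle_with_combine(partitions)   # 4 pairs shuffled
```

Both paths reduce to the same totals; only the number of shuffled records differs. With 100 campaigns and millions of events per second, each partition would ship at most 100 pairs per batch.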

Page 31: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Latency, w/ ReduceBy optimization

[Figure: CDF of final event latency (ms), x-axis 0 to 700 ms; series: Spark, Drizzle, Flink]

Aggregate counters on map side to reduce shuffle traffic

Page 32: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Fault tolerance

[Figure: latency (ms, log scale from 100 to 100000) over time (seconds, 150 to 350); series: Drizzle, Spark, Flink]

Four nines SLA: 8.6 sec per day exceeding SLA

Recovers 5x faster than Flink with 10x lower latency
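The four-nines figure follows from simple arithmetic: 0.01% of a day's 86,400 seconds. A short check, with `sla_budget_seconds` as a hypothetical helper name:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sla_budget_seconds(availability):
    """Seconds per day that may exceed the SLA at the given availability."""
    return (1.0 - availability) * SECONDS_PER_DAY


four_nines_budget = sla_budget_seconds(0.9999)  # roughly 8.64 seconds
```

A single recovery that takes longer than this budget uses up the whole day's allowance, which is why recovery time matters as much as steady-state latency.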

Page 33: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Early results

Drizzle

Opaque


Page 34: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

State-of-the-art security today

Authentication, encryption at-rest and in-motion

[Figure: software stack. Spark Streaming, Spark SQL, MLlib, GraphX on Spark Core; below: OS (e.g., Linux), Cluster Manager (e.g., Kubernetes), Hypervisor (e.g., Xen); private/public cluster]

Not enough if the OS or hypervisor is compromised and the attacker gets root access

Page 35: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

State-of-the-art security today

Authentication, encryption at-rest and in-motion

[Figure: software stack. Spark Streaming, Spark SQL, MLlib, GraphX on Spark Core; below: OS (e.g., Linux), Cluster Manager (e.g., Kubernetes), Hypervisor (e.g., Xen); private/public cluster]

Not enough if the attacker can observe network and memory access patterns

Page 36: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Opaque

Leverage Intel’s SGX: hardware enclave
Implement secure distributed relational algebra

[Figure: Spark Streaming, Spark SQL, MLlib, GraphX over Query Optimizer (Catalyst) and Execution; encrypted operators: enc-filter, enc-join, enc-agg, enc-sort]

Page 37: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Opaque: two modes

Encryption mode
• Protect against compromised software (e.g., OS)
• Full data encryption, authentication, and computation verification in hardware enclave

Oblivious mode
• Additionally, hide data access pattern
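What "hiding the data access pattern" means can be shown with a small sketch. It does not reproduce Opaque's operators: it uses odd-even transposition sort, a textbook sorting network, as an assumed stand-in, and `oblivious_sort` is a hypothetical name. The point is that the sequence of positions touched depends only on the input length, never on the values. Python's `min` and `max` are not constant-time, so this is oblivious only at the level of which indices are accessed.

```python
def oblivious_sort(values):
    """Odd-even transposition sort; returns (sorted list, index pairs touched)."""
    data = list(values)
    n = len(data)
    trace = []
    for round_no in range(n):
        start = round_no % 2  # alternate between even and odd pairs
        for i in range(start, n - 1, 2):
            trace.append((i, i + 1))
            # Always read and write both slots, whether or not they swap.
            low, high = min(data[i], data[i + 1]), max(data[i], data[i + 1])
            data[i], data[i + 1] = low, high
    return data, trace


sorted_a, trace_a = oblivious_sort([5, 1, 4, 2])
sorted_b, trace_b = oblivious_sort([1, 2, 3, 4])
```

An observer who sees only `trace_a` and `trace_b` cannot tell the two inputs apart, since the traces are identical. The price is extra work: the sort always runs all its rounds, even on already sorted data, which is consistent with oblivious execution costing more than plain encrypted execution.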

Page 38: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Opaque: Big Data Benchmark

[Figure: runtime (s, log scale from 0.01 to 100) for Query 1, Query 2, Query 3; series: SparkSQL, Opaque encryption, Opaque oblivious]

Page 39: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Opaque: Big Data Benchmark

[Figure: runtime (s, log scale from 0.01 to 100) for Query 1, Query 2, Query 3; series: SparkSQL, Opaque encryption, Opaque oblivious]

Encrypted operators implemented in C++

Page 40: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Opaque: Big Data Benchmark

[Figure: runtime (s, log scale from 0.01 to 100) for Query 1, Query 2, Query 3; series: SparkSQL, Opaque encryption, Opaque oblivious]

Up to 100x slower but 1,000x faster than state-of-the-art

Page 41: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Next AMPLab: RISELab

Already promising results

Expect much more over the next five years!

Goal: develop Secure Real-time Decision Stack, an open source platform, tools and algorithms for real-time decisions on live data with strong security

Page 42: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Thank you

Page 43: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

AMPLab alumni presenting here


Page 44: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Example: “Fleet” driving

Problem: suboptimal driving decisions
Solution: collect & leverage info from other cars and drivers in real-time
• Intermediate data: automatically annotate maps, actions of other drivers
• Decision: avoid obstacles, congestion

Quality: sophisticated, accurate, noise tolerant
Performance: sec (decision) / sec (update)
Security: privacy, data integrity

Page 45: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Not only hypothetical

Attacks getting root access by exploiting OS/DB vulnerabilities

Attacks exploiting access pattern leakage

Page 46: The Next AMPLab: Real-Time, Intelligent, and Secure Computing

[Figure: Spark Streaming vs. Drizzle, each shown as a master with workers; labels: tasks, process batch]