
Load Shedding in a Data Stream Manager Kevin Hoeschele Anurag Shakti Maskey.

Dec 18, 2015


Myron Cole
Transcript
Page 1

Load Shedding in a Data Stream Manager

Kevin Hoeschele

Anurag Shakti Maskey

Page 2

Overview

• Load shedding in streams: an example
• How Aurora looks at load shedding
• The algorithms used by Aurora
• Experiments and results

Page 3

Load Shedding in a DSMS

Systems have a limit on how fast data can be processed.

When the input rate is too high, queues build up waiting for system resources.

Load shedding discards some data so that the system can keep up.

This differs from load shedding in networking: in a DSMS, data has semantic value, so QoS can be used to find the best stream to drop.

Page 4

Hospital Network

• Stream of the locations of free doctors
• Stream of the locations of untreated patients and their condition (dying, critical, injured, barely injured)

Output: match a patient with doctors within a certain distance.

[Figure: the Doctors and Patients streams feed a Join whose output is the doctors who can work on a patient]

Page 5

Too many patients: what to do?

Load shedding based on condition, officially called "triage": the most critical patients get treated first. A filter is added before the join, and its selectivity is based on the number of untreated patients.

[Figure: the Patients stream passes through a Condition Filter before the Join with Doctors; the output is the doctors who can work on a patient]
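A minimal Python sketch of the triage example. The tuple layout (name, x, y, condition) and the rule that tightens the filter as the backlog of untreated patients grows are hypothetical; the slides specify neither. The condition filter runs before the join, so shed patients never reach the more expensive distance join.

```python
import math
from dataclasses import dataclass

@dataclass
class Doctor:          # hypothetical record for the free-doctor stream
    name: str
    x: float
    y: float

@dataclass
class Patient:         # hypothetical record for the untreated-patient stream
    name: str
    x: float
    y: float
    condition: str     # "dying", "critical", "injured", or "barely injured"

# Lower rank = more urgent (triage order from the slide).
URGENCY = {"dying": 0, "critical": 1, "injured": 2, "barely injured": 3}

def triage_filter(patients, max_rank):
    """Condition filter: keep only patients at least as urgent as max_rank."""
    return [p for p in patients if URGENCY[p.condition] <= max_rank]

def match(doctors, patients, radius):
    """Join: pair each patient with every free doctor within `radius`."""
    return [(p.name, d.name) for p in patients for d in doctors
            if math.hypot(p.x - d.x, p.y - d.y) <= radius]

def choose_max_rank(num_untreated, capacity):
    """Hypothetical selectivity rule: more untreated patients -> stricter filter."""
    if num_untreated <= capacity:
        return 3   # no shedding: admit every condition
    if num_untreated <= 2 * capacity:
        return 1   # admit only dying and critical patients
    return 0       # admit only dying patients
```

For example, with a capacity of 2 and three untreated patients, `choose_max_rank` returns 1, so only dying and critical patients reach the join.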

Page 6

Aurora Overview

Push-based data from streaming sources.

3 kinds of Quality of Service (QoS):
• Latency: shows how utility drops as answers take longer to achieve
• Value-based: shows which output values are most important
• Loss-tolerance: shows how approximate answers affect a query

Page 7

Load Shedding Techniques

Filters (semantic drop)
• Choose what to shed based on QoS
• A filter with a predicate whose selectivity is 1 - p
• The lowest-utility tuples are dropped

Drops (random drop)
• Eliminate a random fraction of the input
• Each incoming tuple has a p% chance of being dropped
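The two techniques can be contrasted with a small Python sketch over a finite batch of tuples. The function names and the `utility` callback are hypothetical; in Aurora a semantic drop is expressed as a filter predicate rather than by sorting a batch, so treat this only as an illustration of which tuples each technique removes.

```python
import random

def random_drop(tuples, p, rng):
    """Drop: each tuple is discarded independently with probability p."""
    return [t for t in tuples if rng.random() >= p]

def semantic_drop(tuples, p, utility):
    """Filter: discard the fraction p of tuples with the lowest utility,
    so the filter passes a fraction 1 - p (its selectivity)."""
    n_drop = round(p * len(tuples))
    lowest_first = sorted(range(len(tuples)), key=lambda i: utility(tuples[i]))
    dropped = set(lowest_first[:n_drop])
    # Keep the survivors in their original arrival order.
    return [t for i, t in enumerate(tuples) if i not in dropped]
```

Both remove roughly the same number of tuples; the difference is that the semantic drop always removes the least useful ones, while the random drop removes an arbitrary sample.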

Page 8

3 Questions of Load Shedding

When
• The load of the system needs constant evaluation

Where
• Dropping as early as possible saves the most resources
• This can be a problem with streams that fan out and are used by multiple queries

How much
• The percentage for a random drop
• The predicate for a semantic drop (filter)

Page 9

Load Shedding in Aurora

Aurora Catalog
• Holds QoS and other statistics
• Holds the network description

The Load Shedder monitors these and the input rates, and makes load-shedding decisions: it inserts drops/filters into the query network, and these changes are stored in the catalog.

[Figure: Load Shedder, Catalog, and Query Network (input streams in, outputs out); arrows are labeled "network description", "changes to query plans", and "data rates"]

Page 10

Equation

• N = network
• I = input streams
• C = processing capacity
• U_accuracy = utility from the loss-tolerance QoS graph
• H = headroom factor, the % of system resources that can be used at a steady state

If $\mathrm{Load}(N(I)) > C$, then load shedding is needed (why no H?).

The goal is to get a new network N' based on N such that

$$\min \left\{ U_{accuracy}(N(I)) - U_{accuracy}(N'(I)) \right\} \quad \text{and} \quad \mathrm{Load}(N'(I)) < H \cdot C$$

Page 11

Load Shedding Algorithm

Evaluation Step
• When to shed load?

Load Shedding Road Map (LSRM)
• Where to shed load?
• How much load to shed?

Page 12

Load Evaluation

Load Coefficients (L): the number of processor cycles required to push a single tuple through the network to the outputs.

[Figure: a chain of n operators from input I to output O; operator i has cost c_i and selectivity s_i]

$$L = \sum_{i=1}^{n} \left( \prod_{j=1}^{i-1} s_j \right) c_i$$

• n operators
• c_i = cost
• s_i = selectivity
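The load-coefficient formula for a chain can be computed in one pass by tracking the fraction of a tuple that survives each operator. This Python sketch assumes the chain is given as a list of hypothetical (cost, selectivity) pairs, ordered from input to output.

```python
def chain_load_coefficient(ops):
    """L = sum_i (s_1 * ... * s_{i-1}) * c_i for a chain of (cost, selectivity) pairs."""
    total = 0.0
    surviving = 1.0  # product of the selectivities of the operators already passed
    for cost, selectivity in ops:
        total += surviving * cost
        surviving *= selectivity
    return total
```

For the chain used on Page 23, `chain_load_coefficient([(10, 0.5), (10, 0.8), (5, 1.0)])` gives 10 + 5 + 2 = 17.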

Page 13

Load Evaluation: Load Coefficient

[Figure: input I feeds operator 1 (c1 = 10, s1 = 0.5), whose output is shared by two branches: operator 2 (c2 = 10, s2 = 0.8) → operator 3 (c3 = 5, s3 = 1.0) → O1, and operator 4 (c4 = 10, s4 = 0.9) → O2]

L1 = 10 + (0.5 * 10) + (0.5 * 0.8 * 5) + (0.5 * 10) = 22
L2 = 10 + (0.8 * 5) = 14
L3 = 5
L4 = 10
L(I) = 22
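When an operator's output is shared, the same idea applies recursively: an operator's load coefficient is its own cost plus its selectivity times the sum of the load coefficients of the operators it feeds. A Python sketch, assuming a hypothetical dict encoding of the network on this slide:

```python
# Hypothetical encoding of the network above:
# operator id -> (cost, selectivity, ids of the downstream operators it feeds)
NETWORK = {
    1: (10, 0.5, [2, 4]),
    2: (10, 0.8, [3]),
    3: (5, 1.0, []),   # feeds output O1
    4: (10, 0.9, []),  # feeds output O2
}

def load_coefficient(network, op):
    """Cycles needed to push one tuple from `op` through everything downstream of it."""
    cost, selectivity, downstream = network[op]
    return cost + selectivity * sum(load_coefficient(network, d) for d in downstream)

print({op: load_coefficient(NETWORK, op) for op in NETWORK})
# {1: 22.0, 2: 14.0, 3: 5.0, 4: 10.0}
```

Operator 4's selectivity (0.9) does not affect any coefficient because nothing is downstream of it; L(I) equals L1 because every input tuple enters at operator 1.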

Page 14

Load Evaluation

Stream Load (S): the load created by the current stream rates.

$$S = \sum_{i=1}^{m} L_i \cdot r_i$$

• m input streams
• L_i = load coefficient
• r_i = input rate
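Stream load is a weighted sum over the input streams. A short Python sketch, assuming each input stream is described by a hypothetical (load coefficient, input rate) pair; the result is in processor cycles per unit time.

```python
def stream_load(streams):
    """S = sum of L_i * r_i over the m input streams.

    streams: iterable of (load_coefficient, input_rate) pairs."""
    return sum(load_coefficient * rate for load_coefficient, rate in streams)

# One stream with L = 22 and r = 10, as in the Page 15 example.
print(stream_load([(22, 10)]))  # 220
```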

Page 15

Load Evaluation: Stream Load

[Figure: the network from Page 13 (L1 = 22, L2 = 14, L3 = 5, L4 = 10, L(I) = 22) with input rate r = 10]

S = 22 * 10 = 220

Page 16

Load Evaluation

Queue Load (Q): the load due to any queues that may have built up since the last load evaluation step.

MELT_RATE: how fast to shrink the queues (queue length reduction per unit time).

$$Q = \mathrm{MELT\_RATE} \cdot L_i \cdot q_i$$

• L_i = load coefficient
• q_i = queue length
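A Python sketch of the queue-load term. The slide gives the formula for a single queue; summing the term over every queue in the network is an assumption made here, as is the hypothetical (load coefficient, queue length) input format and the reading of MELT_RATE as the fraction of each queue to drain per unit time. L_i should be the load coefficient at the point where the queue sits, since queued tuples still have to pass through everything downstream of it.

```python
def queue_load(queues, melt_rate):
    """Q = MELT_RATE * L_i * q_i, summed over queues (the sum is an assumption).

    queues: iterable of (load_coefficient_at_queue, queue_length) pairs.
    melt_rate: fraction of each queue to drain per unit time (an interpretation)."""
    return melt_rate * sum(load_coefficient * length for load_coefficient, length in queues)
```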

Page 17

Load Evaluation: Queue Load

MELT_RATE = 0.1

Q = 0.1 * 5 * 100 = 50

[Figure: the network from Page 13 (L1 = 22, L2 = 14, L3 = 5, L4 = 10, L(I) = 22, r = 10) with a queue of q = 100 tuples]

Page 18

Load Evaluation: Total Load

Total Load: T = S + Q

T = 220 + 50 = 270

[Figure: the network from Page 13 (L1 = 22, L2 = 14, L3 = 5, L4 = 10, L(I) = 22, r = 10) with a queue of q = 100 tuples]
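The three load figures from Pages 15, 17, and 18 can be reproduced with a self-contained Python sketch. The helper name `total_load` and its argument format are hypothetical; the numbers are the ones on the slides (L(I) = 22, r = 10, MELT_RATE = 0.1, and a queue of 100 tuples charged at L = 5).

```python
def total_load(streams, queues, melt_rate):
    """T = S + Q, where S = sum(L_i * r_i) and Q = MELT_RATE * sum(L_i * q_i)."""
    S = sum(L * r for L, r in streams)
    Q = melt_rate * sum(L * q for L, q in queues)
    return S, Q, S + Q

S, Q, T = total_load(streams=[(22, 10)], queues=[(5, 100)], melt_rate=0.1)
print(S, Q, T)  # 220 50.0 270.0
```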

Page 19

Load Evaluation

The system is overloaded when

$$T > H \cdot C$$

where H is the headroom factor and C is the processing capacity.
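A Python sketch of the overload test, also reporting the excess load that the load shedder would then need to remove. The headroom H = 0.9 and capacity C = 250 cycles per unit time used below are hypothetical values chosen for illustration; the slides do not give them.

```python
def is_overloaded(total_load, headroom, capacity):
    """True when T > H * C."""
    return total_load > headroom * capacity

def excess_load(total_load, headroom, capacity):
    """How much load exceeds H * C (0.0 when the system is not overloaded)."""
    limit = headroom * capacity
    return total_load - limit if total_load > limit else 0.0

# T = 270 from the total-load example, with hypothetical H = 0.9 and C = 250.
print(is_overloaded(270, 0.9, 250), excess_load(270, 0.9, 250))  # True 45.0
```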

Page 20

Load Shedding Algorithm

Evaluation Step
• When to drop?

Load Shedding Road Map (LSRM)
• How much to drop?
• Where to drop?

Page 21

Load Shedding Road Map (LSRM)

Each LSRM entry is a triple <Cycle Savings Coefficients (CSC), Drop Insertion Plan (DIP), Percent Delivery Cursors (PDC)>:
• CSC: how many cycles will be saved
• DIP: the set of drops that will be inserted
• PDC: where the system will be running when the DIP is adopted

[Figure: the LSRM as a table of entries ENTRY 1 … ENTRY n, each with CSC, DIP, and PDC columns; a cursor moves toward more load shedding (max savings) or less load shedding; a PDC of (0,0,0,…,0) also appears]
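One way to picture the LSRM at run time is as a list of entries ordered by how many cycles they save, with the cursor moved to the cheapest entry that covers the current excess load. The Python sketch below uses a hypothetical encoding (a DIP as a mapping from drop location to drop fraction, a PDC as a tuple of cursor positions); it is not Aurora's actual data structure.

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass
class LSRMEntry:
    csc: float   # cycles saved if this entry's plan is adopted
    dip: dict    # drop location -> drop fraction (hypothetical encoding)
    pdc: tuple   # one percent-delivery cursor per output (hypothetical encoding)

class LSRM:
    def __init__(self, entries):
        # Order entries from least to most load shedding.
        self.entries = sorted(entries, key=lambda e: e.csc)
        self._savings = [e.csc for e in self.entries]

    def lookup(self, excess):
        """Smallest-savings entry with csc >= excess; the max-savings entry if none covers it."""
        i = bisect_left(self._savings, excess)
        return self.entries[min(i, len(self.entries) - 1)]
```

In this sketch the run-time lookup is just a binary search; all the loss/gain analysis happens earlier, when the entries are built.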

Page 22

LSRM Construction

[Flowchart: set Drop Locations → compute & sort Loss/Gain ratios → one of two branches.
Drop-Based LS: take the least ratio → how much to drop? → insert Drop → create LSRM entry.
Filter-Based LS: take the least ratio → how much to drop? → determine predicate → insert Filter → create LSRM entry]

Page 23

Drop Locations: Single Query

(Flowchart step: set Drop Locations)

[Figure: a single query chain I → operator 1 (c1 = 10, s1 = 0.5) → operator 2 (c2 = 10, s2 = 0.8) → operator 3 (c3 = 5, s3 = 1.0) → O, with L1 = 17, L2 = 14, L3 = 5; candidate drop locations A, B, C, and D are marked along the chain]

Page 24

Drop Locations: Single Query

(Flowchart step: set Drop Locations)

[Figure: the same chain (L1 = 17, L2 = 14, L3 = 5); only drop location A remains]

Page 25

Drop Locations: Shared Query

(Flowchart step: set Drop Locations)

[Figure: the network from Page 13 (L1 = 22, L2 = 14, L3 = 5, L4 = 10) with candidate drop locations A, B, C, D, E, and F marked on it]

Page 26

Drop Locations: Shared Query

(Flowchart step: set Drop Locations)

[Figure: the same network (L1 = 22, L2 = 14, L3 = 5, L4 = 10); only drop locations A, B, and C remain]
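The reduction from six candidate locations (A to F) to three (A, B, C), and from four to one in the single-query chain, is consistent with a simple rule: keep the input arcs, which are the earliest places to drop overall, plus the arcs leaving a split point (an operator whose output fans out to several branches), which are the earliest places that affect only one of the queries sharing the upstream operators. The Python sketch below applies that rule, assuming the lettered locations label the arcs of the network and using hypothetical (source, target) arc tuples.

```python
from collections import Counter

def drop_locations(arcs, inputs):
    """Candidate drop locations: input arcs, plus the outgoing arcs of split points.

    arcs: list of (source, target) pairs; inputs: names of the input streams."""
    fan_out = Counter(source for source, _ in arcs)
    return [(source, target) for source, target in arcs
            if source in inputs or fan_out[source] > 1]

single = [("I", "1"), ("1", "2"), ("2", "3"), ("3", "O")]
shared = [("I", "1"), ("1", "2"), ("2", "3"), ("3", "O1"), ("1", "4"), ("4", "O2")]
print(drop_locations(single, {"I"}))  # [('I', '1')]
print(drop_locations(shared, {"I"}))  # [('I', '1'), ('1', '2'), ('1', '4')]
```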

Page 27

Loss/Gain Ratio: Loss

Loss: the utility lost as tuples are dropped, determined using the loss-tolerance QoS graph.

(Flowchart step: compute & sort Loss/Gain ratios)

[Figure: loss-tolerance QoS graph of utility (ticks at 0, 0.7, and 1) against % tuples (ticks at 0, 50, and 100)]

Loss for the first piece of the graph = (1 - 0.7) / 50 = 0.006
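The loss of each piece of a piecewise-linear loss-tolerance QoS graph is its slope: utility lost per percent of tuples dropped. A Python sketch, assuming the graph is given as hypothetical (% tuples delivered, utility) points; the end point (0%, utility 0) is an assumption, since the slide's figure does not survive clearly enough to confirm it.

```python
def loss_per_percent(points):
    """Slope of each linear piece of a loss-tolerance QoS graph.

    points: (percent_delivered, utility) pairs. Pieces are returned in the
    order they are reached as more tuples are dropped (starting from 100%)."""
    pts = sorted(points, reverse=True)
    return [(u_hi - u_lo) / (x_hi - x_lo)
            for (x_hi, u_hi), (x_lo, u_lo) in zip(pts, pts[1:])]

print(loss_per_percent([(100, 1.0), (50, 0.7), (0, 0.0)]))  # roughly [0.006, 0.014]
```

The first piece is the flattest, so the first drops are the cheapest in utility; once delivery falls below 50%, the loss rate more than doubles under these assumed points.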

Page 28

Loss/Gain Ratio: Gain

Gain: the processor cycles gained.

$$G(x) = \begin{cases} R \cdot (x \cdot L - D) & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

• R = input rate into the drop operator
• L = load coefficient
• x = drop percentage
• D = cost of the drop operator
• STEP_SIZE = increment of x used to evaluate G(x)

(Flowchart step: compute & sort Loss/Gain ratios)
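A Python sketch of G(x) and of ranking drop locations by their Loss/Gain ratio for one STEP_SIZE increment. The inputs are hypothetical: for the shared-query network, location A gets R = 10 and L = 22, while B and C (assumed here to sit in front of operators 2 and 4) get R = 5, because operator 1's selectivity of 0.5 halves the rate, and L = 14 and 10. The drop cost D = 1 and the loss-per-percent values are made up; A's loss is counted twice because it affects both outputs. x is treated as a fraction, so a step of 0.25 drops 25% of the tuples at that location.

```python
def gain(x, R, L, D):
    """G(x) = R * (x * L - D) if x > 0, otherwise 0 (cycles saved per unit time)."""
    return R * (x * L - D) if x > 0 else 0.0

def sorted_loss_gain(locations, step_size):
    """Rank drop locations by Loss/Gain for dropping one step_size more.

    locations: name -> (R, L, D, loss_per_percent); step_size is a fraction."""
    ranked = []
    for name, (R, L, D, loss_per_percent) in locations.items():
        g = gain(step_size, R, L, D)
        if g <= 0:
            continue  # at this step size the drop costs more than it saves
        loss = loss_per_percent * step_size * 100  # step_size expressed as a percentage
        ranked.append((loss / g, name))
    return sorted(ranked)

locations = {
    "A": (10, 22, 1, 0.012),  # input arc; affects both outputs
    "B": (5, 14, 1, 0.006),   # after the split, in front of operator 2
    "C": (5, 10, 1, 0.006),   # after the split, in front of operator 4
}
print([name for _, name in sorted_loss_gain(locations, 0.25)])  # ['A', 'B', 'C']
```

The location with the least ratio (A, with these made-up numbers) is where the next drop would go.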

Page 29

Drop-Based Load Shedding: how much to drop?

• Take the least Loss/Gain ratio
• Determine the drop percentage p

(Flowchart: Drop-Based LS branch: take the least ratio, how much to drop?, insert Drop, create LSRM entry)
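Once the location with the least ratio is chosen, the drop percentage can be found by stepping x up by STEP_SIZE until G(x) covers the excess load. A Python sketch under that assumption; the input rate R = 10, drop cost D = 1, and excess of 40 cycles per unit time are hypothetical values, and L = 17 is location A's load coefficient on the single-query chain.

```python
def drop_fraction(excess, R, L, D, step_size):
    """Smallest multiple of step_size whose gain R * (x * L - D) covers `excess`.

    Returns 0.0 when there is no excess, and 1.0 when even dropping every
    tuple at this location is not enough (the rest must come from elsewhere)."""
    if excess <= 0:
        return 0.0
    steps = round(1 / step_size)
    for k in range(1, steps + 1):
        x = k / steps  # computed directly to avoid accumulating floating-point error
        if R * (x * L - D) >= excess:
            return x
    return 1.0

# Location A on the single-query chain (L1 = 17), with hypothetical
# R = 10, D = 1, excess 40, and STEP_SIZE = 0.05.
print(drop_fraction(40, R=10, L=17, D=1, step_size=0.05))  # 0.3
```

Stepping trades precision for simplicity: the exact break-even point solves 10(17x - 1) = 40, giving x ≈ 0.294, so the chosen x can overshoot by up to one STEP_SIZE.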

Page 30

Drop-Based Load Shedding: where to drop?

(Flowchart: Drop-Based LS branch)

[Figure: the single-query chain from Page 23 (L1 = 17, L2 = 14, L3 = 5) with a drop inserted at location A]

If there are other drops in the network, modify their drop percentages.

Page 31

Drop-Based Load Shedding: make LSRM entry

• All drop operators, with their modified percentages, form the DIP
• Compute the CSC
• Advance the QoS cursors and store them in the PDC

LSRM entry: <Cycle Savings Coefficients (CSC), Drop Insertion Plan (DIP), Percent Delivery Cursors (PDC)>

(Flowchart: Drop-Based LS branch)

Page 32

Filter-Based Load Shedding: how much to drop? Predicate for the filter

• Start dropping from the interval with the lowest utility
• Keep a sorted list of intervals according to their utility and relative frequency
• Find out how much to drop and which intervals are needed to reach that amount
• Determine the predicate for the filter

(Flowchart: Filter-Based LS branch: take the least ratio, how much to drop?, determine predicate, insert Filter, create LSRM entry)
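A Python sketch of turning a drop amount into a filter predicate. It assumes each value interval is a hypothetical (low, high, utility, relative_frequency) tuple and sorts intervals by utility with relative frequency as a tie-breaker (one reading of "sorted according to their utility and relative frequency"). It drops whole intervals, so it may shed somewhat more than requested.

```python
def filter_predicate(intervals, drop_fraction):
    """Drop whole value intervals, lowest utility first, until at least
    drop_fraction of the tuples (by relative frequency) is shed.

    intervals: (low, high, utility, relative_frequency) tuples, high exclusive.
    Returns (keep, shed): keep(value) is the filter predicate, shed the fraction dropped."""
    dropped, shed = [], 0.0
    for low, high, _utility, freq in sorted(intervals, key=lambda iv: (iv[2], iv[3])):
        if shed >= drop_fraction:
            break
        dropped.append((low, high))
        shed += freq

    def keep(value):
        return not any(low <= value < high for low, high in dropped)

    return keep, shed

intervals = [(0, 25, 0.2, 0.3), (25, 50, 0.5, 0.2), (50, 75, 0.9, 0.3), (75, 100, 1.0, 0.2)]
keep, shed = filter_predicate(intervals, 0.4)
print([keep(v) for v in (10, 30, 60, 90)], shed)  # [False, False, True, True] 0.5
```

Here the kept intervals are contiguous, so over this value range the predicate is equivalent to value >= 50, the same comparison form as the filters used in the experiments on Page 35.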

Page 33

Filter-Based Load Shedding: place the filter

(Flowchart: Filter-Based LS branch)

[Figure: the single-query chain from Page 23 (L1 = 17, L2 = 14, L3 = 5) with a filter inserted at location A]

If there are other filters in the network, modify their selectivities.

Page 34

Experiment Setup

• Simulated network: the time to process a tuple is simulated by having the simulator process use the CPU for the amount of time an operator needs to consume a tuple
• A process for each input stream
• Randomly created network: the number of queries and the number of operators per query are chosen

Are random networks a good benchmark?

Page 35

Experiments

• Used only the Join, Filter, and Union Aurora operators
• Filters were simple comparison predicates of the form Input_value > filter_constant
• Filter- and Drop-based load shedding were compared to 4 admission control algorithms, similar in style to networking load shedding

Page 36

Evaluation Methods

Loss-tolerance and value-based QoS were used.

Tuple Utility is the utility from the loss-tolerance QoS:
• K = number of time segments
• n_i = number of tuples in time segment i
• u_i = loss-tolerance utility for each tuple during time segment i

Page 37

Value Utility

Value Utility is the utility from the value-based QoS:
• f_i = relative frequency of tuples in value interval i with no drops
• f_i' = frequency relative to the total number of tuples
• U_i = average value utility for value interval i

When there are multiple queries, the overall utility is the sum of the utilities for each query.

Page 38

Algorithms

• Input-Random: one random stream is chosen, and tuples are shed until the excess load is covered; if the whole stream is shed and there is still excess load, another random stream is chosen
• Input-Cost-Top: similar to Input-Random, but uses the input stream with the most costly input
• Input-Uniform: distributes load shedding uniformly across the input streams
• Input-Cost-Uniform: load is shed from all input streams, weighted by their cost
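The first two baselines can be sketched in Python as one greedy loop that differs only in the order in which streams are chosen. Assumptions: each input stream's cost is taken to be the load it creates (L_i * r_i), shedding a fraction f of a stream is assumed to save f times that load (drop-operator cost is ignored), and the function names are hypothetical.

```python
import random

def shed_in_order(loads, excess, order):
    """Shed streams one at a time, in `order`, until `excess` is covered.

    loads: stream -> load it creates. Returns stream -> fraction of its tuples shed."""
    plan = {}
    for stream in order:
        if excess <= 0:
            break
        fraction = min(1.0, excess / loads[stream])
        plan[stream] = fraction
        excess -= fraction * loads[stream]
    return plan

def input_random(loads, excess, rng):
    order = list(loads)
    rng.shuffle(order)  # pick streams in random order
    return shed_in_order(loads, excess, order)

def input_cost_top(loads, excess):
    order = sorted(loads, key=loads.get, reverse=True)  # most costly stream first
    return shed_in_order(loads, excess, order)

loads = {"s1": 220, "s2": 100, "s3": 50}
print(input_cost_top(loads, 250))  # {'s1': 1.0, 's2': 0.3}
```

Neither baseline looks at QoS: the stream to shed is picked by chance or by cost alone, regardless of how much utility its tuples carry.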

Page 39

Results: Tuple Utility Loss

Observations:
• QoS-driven algorithms perform better
• Filter works better than Drop

Page 40

Results: Value Utility Loss

• Filter-LS is clearly the best
• Drop-LS is no better than the admission control algorithms

Page 41

Conclusion

• Load shedding is important to a DSMS
• There are many variables to consider when planning to use load shedding
• Drop and Filter are two QoS-driven algorithms
• QoS-based strategies work better than admission control

Page 42

Questions

• Drop and Filter were the two QoS load-shedding algorithms given here. Are there any others?
• Admission control may be a viable option in processing network requests, but in a streaming database system the connection is already made. If the incoming tuples were put into a buffer to, in effect, deny the stream bandwidth, would this increase utility?
• Why are REDs useful or not useful for streaming databases?

Page 43

More Questions

• When we have an unreliable low-bandwidth connection, like a sensor, and a significant amount of the traffic is out of order, is TCP the best transport protocol?
• When there is high traffic, to what extent should the network do the load shedding? Should the database system be doing more, because it knows the semantics of the tuples?
• So the idea of admission control doesn't directly cross over from networks to streaming databases. But does buffering the input when the process becomes overloaded achieve the same effect? Why doesn't Aurora have this?