
Jan 21, 2016


Boğaziçi University – Computer Engineering Dept. CMPE 521

An Efficient and Resilient Approach to Filtering & Disseminating Streaming Data

PAPER PRESENTATION on

An Efficient and Resilient Approach to Filtering & Disseminating Streaming Data

CMPE 521

Database Systems

Prepared by:

Mürsel Taşgın

Onur Kardeş


Introduction

The internet and the web are increasingly used to disseminate fast changing data.

Examples of fast-changing data:
• sensor readings,
• traffic and weather information,
• stock prices,
• sports scores,
• health monitoring information.


Introduction

Properties of this data:
• highly dynamic,
• streaming,
• aperiodic.

Users are interested not only in monitoring streaming data but also in using it for online decision making.


Introduction

[Figure: Replicating the Source. A SOURCE and Repositories 1, 2 and 3.]


Introduction

Services like Akamai.net and IBM’s edge server technology are exemplars of such networks of repositories, which aim to provide better services by shifting most of the work to the edge of the network (closer to the end users).

Although such systems scale quite well, when the data changes at a fast rate the quality of service at a repository farther from the data source deteriorates.


Introduction

In general, replication can reduce the load on the sources.

But replication of time-varying data introduces new challenges:
• coherency,
• delays and scalability.


Introduction

Coherency requirement (cr): users specify the bound on the tolerable imprecision associated with each requested data item.

[Figure: the same stock quote as seen at different nodes, with USER 1 and USER 2 reading from the repositories.]

SOURCE: Microsoft $60.85 at time 11:43
Repository 2: Microsoft $60.86 at time 11:41
Repository 1: Microsoft $60.89 at time 11:36


Introduction

Coherency-preserving system:
• the delivered data must preserve the associated coherency requirements,
• resilient to failures,
• efficient.

Necessary changes are pushed to the users, instead of each user polling the source independently.


Introduction

Construction of an effective dissemination network of repositories

A logical overlay network of repositories is created according to:
• the coherency needs of the users attached to each repository,
• the expected delays at each repository.

This network is called the dynamic data dissemination graph (d3g).


Introduction

Construction of an effective dissemination network of repositories

The previous algorithm for building a d3g, LeLA, was unable to cope with a large number of data items.

A new algorithm, DiTA, is introduced to build dissemination networks that are scalable and resilient.


Introduction

Construction of an effective dissemination network of repositories

In DiTA, repositories with more stringent coherency requirements are placed closer to the source in the network, as they are likely to get more updates than the ones with looser coherency requirements.

In DiTA, a dynamic data dissemination tree, d3t, is created for each data item x.


Introduction

Construction of an effective dissemination network of repositories

[Figure: example dissemination network rooted at the SOURCE, with the coherency requirement c of each repository.]

Repository 1: c = 0.2
Repository 2: c = 0.3
Repository 3: c = 0.8
Repository 4: c = 0.7
Repository 5: c = 0.9
Repository 6: c = 0.7


Introduction

Provision for the dissemination of dynamic data in spite of failures in the overlay network

To handle repository and communication link failures, back-up parents are used.

A back-up parent is asked to deliver data with a coherency requirement that is less stringent than that associated with the parent.


Introduction

Provision for the dissemination of dynamic data in spite of failures in the overlay network

[Figure: a Parent and a Back-up Parent in the overlay network; nodes are labelled with the data items they hold, e.g. x,y,z,t and a,b,c,x.]


Introduction

Efficient filtering and scheduling techniques for repositories

Normally, a repository receives updates and selectively disseminates them to its downstream dependents.

It is not always necessary to disseminate the exact values of the most recent updates, as long as the values presented preserve the coherency of the data.


The Basic Framework: Data Coherency and Overlay Network


The Basic Framework: Data Coherency and Overlay Network

A coherency requirement (c) is associated with a data item, to denote the maximum permissible deviation of the user’s view from the value of data item x at the source.

c can be specified in terms of:
• time (values should never be out of sync by more than 5 seconds),
• value (e.g., weather information where the temperature value should never be out of sync by more than 2 degrees).


The Basic Framework: Data Coherency and Overlay Network

Each data item in the repository from which a user obtains data must be refreshed in such a way that the user-specified coherency requirements are maintained:

|Ux(t) – Sx(t)| ≤ c1

|Px(t) – Sx(t)| ≤ c2

where Sx(t), Px(t) and Ux(t) are the values of x at the source, at the repository and at the user at time t.

The fidelity f observed by a user can be defined as the total length of time for which the above inequality holds.


The Basic Framework: Data Coherency and Overlay Network

Assume x is served by a single source.

Repositories R1, ..., Rn are interested in x.

These repositories in turn serve a subset of the remaining repositories, such that the resulting network is in the form of a tree rooted at the source and consisting of repositories R1, ..., Rn.

Parent-dependent relationship.


The Basic Framework: Data Coherency and Overlay Network

Since a repository disseminates updates to its users and dependents, its own coherency requirement should be the most stringent requirement that it has to serve.

When a data change occurs at the source, it checks which of its direct and indirect dependents are interested in the change and pushes the change to them.
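The two rules on this slide can be sketched in a few lines. This is only an illustration: the function names, the sample tolerances and the strict ">" test are assumptions, not taken from the paper.

```python
def effective_cr(own_user_crs, dependent_crs):
    """Most stringent (smallest) tolerance this repository has to serve."""
    return min(list(own_user_crs) + list(dependent_crs))


def should_push(new_value, last_pushed, cr):
    """Push only if the receiver's view would otherwise deviate by more than cr."""
    return abs(new_value - last_pushed) > cr


# Hypothetical repository: two local users and two dependent repositories.
cr = effective_cr(own_user_crs=[0.5, 0.3], dependent_crs=[0.7, 0.4])
print(cr)                                # tolerance the parent must honour
print(should_push(60.89, 60.85, cr))     # small change: filtered out
print(should_push(61.20, 60.85, cr))     # large change: pushed
```

A change smaller than the tolerance is never sent, which is where the saving in messages comes from.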


Building a d3t

Start with a physical layout of the communication network in the form of a graph, where the graph consists of a set of sources, repositories and the underlying network.

Try to build a d3t for a data item x.

The root of the d3t will be the source, which serves x.

A repository P that serves repository Q with data item x is called the parent of Q, and Q is called the dependent of P for x.


Building a d3t

[Figure: a d3t for data item x, showing the source for x, repositories R1 and R2, the Parent and Dependents roles, Levels 0 to 2, and the users attached to each repository.]


Building a d3t

A repository should ideally serve at least as many unique (dependent, data item) pairs as the number of data items served to it.

If a repository is currently serving fewer pairs than this fixed number, then we say that the repository has the resources to serve a new dependent.

Pairs served by R1:

Dependent    Data Item
R7           x
R11          y
R18          x
R9           z
R10          t
R21          x
?


Building a d3t

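[Figure: walk-through of adding repository R10 (c = 0.3) to a d3t rooted at the SOURCE. Other repositories shown: R4 (c = 0.1), R5 (c = 0.4), R6 (c = 0.5), R7 (c = 0.8), R8 (c = 0.6), R9 (c = 0.7). Subtrees are labelled with Max(c) values of 0.8, 0.7, 0.8, 0.6 and 0.7.]

Enough resources? NO

Enough resources? NO

Enough resources? YES. cR6 > cR10, so R10 takes the place of R6 and R6 is pushed down.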


Building a d3t

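[Figure: the resulting d3t rooted at the SOURCE, with R4 (c = 0.1), R5 (c = 0.4), R6 (c = 0.5), R7 (c = 0.8), R8 (c = 0.6), R9 (c = 0.7) and R10 (c = 0.3). Subtrees are labelled with Max(c) values of 0.6, 0.8, 0.8, 0.7, 0.7 and 0.5.]

This algorithm is called the Data-item-at-a-Time Algorithm (DiTA).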
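A much simplified sketch of the insertion step follows. It is not the authors' algorithm: the fixed per-node `capacity`, the rule of descending into the child with the loosest c, and all names are assumptions made for illustration, and expected delays are ignored entirely. It only shows the idea that a more stringent repository takes the place of a looser one, which is pushed down.

```python
class Node:
    def __init__(self, name, c, capacity=2):
        self.name, self.c, self.capacity = name, c, capacity
        self.children = []


def insert(parent, new):
    """Insert `new` below `parent` in the tree of one data item (simplified)."""
    if len(parent.children) < parent.capacity:   # "Enough resources?" YES
        parent.children.append(new)
        return
    # NO: descend. Assumption: follow the child with the loosest requirement.
    child = max(parent.children, key=lambda n: n.c)
    if new.c < child.c:
        # `new` is more stringent: it takes the child's place,
        # inherits its dependents, and the child is pushed down.
        parent.children[parent.children.index(child)] = new
        new.children, child.children = child.children, []
        insert(new, child)
    else:
        insert(child, new)


# Repositories and c values from the slides; the source is given c = 0.
source = Node("SOURCE", 0.0)
for name, c in [("R4", 0.1), ("R5", 0.4), ("R6", 0.5), ("R7", 0.8),
                ("R8", 0.6), ("R9", 0.7), ("R10", 0.3)]:
    insert(source, Node(name, c))
```

With these rules every parent ends up at least as stringent as each of its dependents, so it always receives the updates its dependents need.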


Building a d3t

Traces: collection procedure and characteristics

Real-world stock price streams from http://finance.yahoo.com are used.

10,000 values are polled during 1,000 traces; a new data value is obtained approximately once per second.


Building a d3t

Repositories: data, coherency and cooperation characteristics

A coherency requirement c is associated with each of the chosen data items.

The c’s associated with data in a repository are a mix of stringent tolerances (varying from $0.01 to $0.05) and less stringent tolerances (varying from $0.5 to $0.99).

T% of the data items at each repository have stringent coherency requirements; the remaining (100 – T)% have less stringent coherency requirements.


Building a d3t

Physical network: topology and delays

The router topology was generated using BRITE (http://www.cs.bu.edu/brite).

The repositories and the sources are selected randomly.

Node-to-node communication delays are derived from a Pareto distribution: x → 1 / x^(1/α) + x1, where α = x’ / (x’ – 1).


Building a d3t

Physical network: topology and delays

In the delay distribution, x’ is the mean and x1 is the minimum delay a link can have.

In the experiments, x’ = 15 ms and x1 = 2 ms.

The computational delay for dissemination is taken to be 12.5 ms.
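A sketch of how such link delays could be sampled, reading the delay transform as 1 / x^(1/α) + x1 with α = x’ / (x’ – 1). That x is drawn uniformly from (0, 1] is an assumption, as are the function and parameter names.

```python
import random


def link_delay_ms(mean_ms=15.0, min_ms=2.0, rng=random):
    """Sample one node-to-node delay from the heavy-tailed transform."""
    alpha = mean_ms / (mean_ms - 1.0)
    u = 1.0 - rng.random()              # uniform in (0, 1], avoids division by zero
    return 1.0 / u ** (1.0 / alpha) + min_ms


rng = random.Random(0)                  # seeded for repeatability
delays = [link_delay_ms(rng=rng) for _ in range(1000)]
print(min(delays), max(delays))
```

Most samples sit near the minimum, while a few are very large, which is the heavy-tail behaviour expected of link delays.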


Building a d3t

Metrics

The key metric is the loss in fidelity of the data.

Fidelity is the total length of time for which the inequality |P(t) – S(t)| ≤ c holds.

The fidelity of a repository is the mean over all data items stored in that repository.

The fidelity of the system is the mean fidelity of all repositories.

The loss of fidelity is (100% – fidelity).

Another metric is the number of messages in the system (system load).
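The metric can be illustrated on a toy trace. The sketch assumes equally spaced samples, expresses fidelity as a percentage of the observation period, and uses ≤ as in the coherency definition; the traces and names are made up.

```python
from statistics import mean


def fidelity(source_trace, repo_trace, c):
    """Percentage of samples for which |P(t) - S(t)| <= c holds."""
    held = sum(1 for s, p in zip(source_trace, repo_trace) if abs(p - s) <= c)
    return 100.0 * held / len(source_trace)


# Hypothetical values of one data item at the source and at a repository.
source = [10.0, 10.2, 10.5, 10.4]
repo = [10.0, 10.0, 10.0, 10.4]

item_fidelity = fidelity(source, repo, c=0.3)
repo_fidelity = mean([item_fidelity])       # mean over the repository's data items
loss_of_fidelity = 100.0 - repo_fidelity
print(item_fidelity, loss_of_fidelity)
```

The system-wide figure would be the mean of `repo_fidelity` over all repositories.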


Building a d3t

Performance Evaluation

For the base performance measurement, 600 routers, 100 repositories and 4 servers were used.

The total number of data items served by the servers was varied from 100 to 1000.

The T parameter was varied from 20 to 80.

A previous algorithm, LeLA, was used as a benchmark.


Building a d3t

Performance Evaluation

Each node in DiTA does less work than in LeLA.

Thus, in DiTA the height of the dissemination tree is greater.

So, when computational delays are low but link delays are large, LeLA may perform better.

But this happens only for negligible computational delays (0.5 ms) and very high link delays (110 ms).


Enhancing the Resiliency of the Repository Network

Active backups vs. passive backups

Passive backups may increase the load, which causes a loss in fidelity.

So active backup parents are used.

A backup parent serves data to a dependent Q with a coherency requirement cB > c.


Enhancing the Resiliency of the Repository Network

If all changes are smaller than cB, the dependent cannot know when parent P fails. So P should send periodic “I’m alive” messages.

Once P fails, Q requests B to serve it the data at c. When P recovers from the failure, Q requests B to serve the data item at cB again.

In this approach, there is no backup for backups, so when both P and B fail, Q cannot get any updates.
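The behaviour of the dependent Q can be sketched as a small state machine. The class, the method names and the timeout-based failure detection are assumptions for illustration; the slides only state that P sends periodic "I'm alive" messages.

```python
class Dependent:
    """Tracks parent liveness and decides which tolerance to ask the backup for."""

    def __init__(self, c, k, timeout):
        self.c = c                   # tolerance served by the parent P
        self.c_backup = k * c        # looser tolerance cB served by the backup B
        self.timeout = timeout       # silence after which P is presumed failed
        self.last_heard = 0.0
        self.parent_alive = True

    def on_parent_message(self, now):
        """Called for every update or "I'm alive" message from P."""
        self.last_heard = now
        if not self.parent_alive:    # P recovered: B goes back to cB
            self.parent_alive = True
            return self.c_backup
        return None

    def tick(self, now):
        """Called periodically; returns the tolerance to request from B, if any."""
        if self.parent_alive and now - self.last_heard > self.timeout:
            self.parent_alive = False
            return self.c            # P presumed failed: B must now serve at c
        return None


q = Dependent(c=0.1, k=2, timeout=5.0)
q.on_parent_message(1.0)
print(q.tick(3.0), q.tick(7.0), q.on_parent_message(8.0))
```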


Enhancing the Resiliency of the Repository Network

Choice of cB Using a Probabilistic Model

For the sake of simplicity, cB = k * c. Here, the choice of k is important:
• small k: the backup will send updates frequently, which incurs high computational and communication overheads;
• large k: the dependent will miss a large number of changes during a failure of the parent.


Enhancing the Resiliency of the Repository Network

Choice of cB Using a Probabilistic Model

Assuming that the data values change with uniform probability, and using a Markov chain model:

# Misses = 2k^2 – 2

is the number of updates a dependent will miss before it detects that there is a failure.

According to the experiments, this number is rather pessimistic; it is nearly an upper limit.
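Reading the expression on this slide as 2k^2 – 2 (an interpretation of the garbled transcript), the trade-off in k can be tabulated. The function name is hypothetical.

```python
def expected_misses(k):
    """Model estimate of updates missed before a parent failure is detected."""
    return 2 * k ** 2 - 2


for k in (1, 2, 3, 4):
    print(k, expected_misses(k))
```

Under this reading, k = 2, the value used in the experiments, gives 6 missed updates, and the count grows quadratically with k.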


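Enhancing the Resiliency of the Repository Network

Choice of backup parents

[Figure: flowchart for choosing a backup parent in a tree with nodes R, B, P, Q and C. Decision: “Any siblings?” If NO, the question is asked again; if YES (siblings B and C), choose one of them randomly.]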


Enhancing the Resiliency of the Repository Network

Choice of backup parents

If the coherency requirement at which Q wants x from B is tighter than the one at which B itself wants x, the parent of B is asked to serve x to Q with the required tighter coherency.

An advantage of choosing a sibling is that the change in coherency requirement is not percolated all the way to the source.

However, if an ancestor of P and B is heavily loaded, then the delay due to the load will be reflected in the updates of both P and B. This might result in additional loss in fidelity.


Enhancing the Resiliency of the Repository Network

Effect of Repository Failures on Loss of Fidelity

Because the failures considered are memoryless, an exponential probability distribution is used to simulate them:

Pr(X > t) = e^(–λt)

• λ = λ1 for the time to failure,
• λ = λ2 for the time to recover (a large λ2 means fast recovery, a small λ2 means slow recovery).

In this approach link failures are not taken into account, so the model is incomplete.
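A sketch of how one repository's failures and recoveries could be simulated with this model. The function name, the time unit and the horizon are assumptions; the rates are among those quoted in the experiments.

```python
import random


def simulate_repository(lambda_fail, lambda_recover, horizon, rng):
    """Return the (down_start, down_end) intervals of one repository up to `horizon`."""
    t, outages = 0.0, []
    while True:
        t += rng.expovariate(lambda_fail)          # time to the next failure
        if t >= horizon:
            break
        down = rng.expovariate(lambda_recover)     # time to recover
        outages.append((t, min(t + down, horizon)))
        t += down
        if t >= horizon:
            break
    return outages


rng = random.Random(42)                            # seeded for repeatability
outages = simulate_repository(0.0001, 0.01, horizon=100_000.0, rng=rng)
downtime = sum(end - start for start, end in outages)
print(len(outages), downtime)
```

Raising `lambda_recover` shortens each outage without changing how often failures occur.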


Enhancing the Resiliency of the Repository Network

Performance Evaluation

The effect of adding resiliency is shown.

k = 2 is used.

When 100 data items are used, 23% of the updates sent by backups are disseminated.

Some updates sent by backups arrived before those of the parents.


Enhancing the Resiliency of the Repository Network

Performance Evaluation

But when backup parents are loaded ( > 400 data items), their updates are of no use and increase the loss of fidelity.

The dependent should control this by using time stamps on the updates.


Enhancing the Resiliency of the Repository Network

Performance Evaluation

During the experiment, about 80-90% of the repositories experienced at least one failure, and the maximum number of failures in the system at any given time for λ2 = 0.001 was around 12.

For λ2 = 0.01, the maximum number of failures was 5, and for λ2 = 0.1 it was 2.


Enhancing the Resiliency of the Repository Network

Performance Evaluation

The effect of quick recovery is shown.

λ1 = 0.0001 and λ2 = 2

For stringent coherency requirements, resiliency improves fidelity even for transient failures.


Enhancing the Resiliency of the Repository Network

Performance Evaluation

However, with resiliency and a very large number of data items (e.g., 1000), fidelity drops.

This is because, at this point, the cost of resiliency exceeds the benefits obtained from it, which increases the loss in fidelity.


Reducing the Delay at a Repository

Delays

1) Queueing delay: the time between the arrival of an update and the start of its processing.

2) Processing delay: check delay (deciding whether the update should be processed) + computation delay (computing the update and pushing the data to the dependents).

[Figure: updates of x and y arrive and are queued (queueing delay); each is checked to see whether an update is needed, then processed and disseminated (processing delay).]


Reducing the Delay at a Repository

Question: How can we reduce the average delays to improve fidelity?

This can be done by:

a) Better filtering, i.e. reducing the processing delay in determining whether an update needs to be disseminated to one or more dependents.

b) Better scheduling of disseminations.


Reducing the Delay at a Repository

Better Filtering

Dependent ordering

For each dependent, a repository maintains the coherency requirement (cr) and the last value pushed to it:

Upper bound = last pushed value + cr

Lower bound = last pushed value – cr

The cr values for the dependents reside at the repository, sorted:

C1 = 0.7
C2 = 0.6
C3 = 0.5
C4 = 0.3
C5 = 0.1
C6 = 0.05

Algorithm to find the dependents to disseminate data to: find the first dependent, starting from the largest cr, to which the update needs to be disseminated.

A rule, shown in the slide's figure, is valid for every window. If an update violates this rule, a pseudo value is generated as the actual value.
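The bound check can be sketched as follows. This version tests every dependent in turn; the sorted order is what would allow the scan to be cut short once the windows are kept nested, a refinement that is not reproduced here. The dictionary layout and the names are assumptions.

```python
def dependents_to_update(value, dependents):
    """Return the names of dependents whose window [last - cr, last + cr] is violated.

    `dependents` is a list of dicts sorted by decreasing cr; 'last' is the last
    value pushed to that dependent and is refreshed when a push happens.
    """
    pushed = []
    for d in dependents:
        lower, upper = d["last"] - d["cr"], d["last"] + d["cr"]
        if not (lower <= value <= upper):
            pushed.append(d["name"])
            d["last"] = value
    return pushed


# cr values from the slide; every dependent last received 10.0 (made-up value).
crs = [0.7, 0.6, 0.5, 0.3, 0.1, 0.05]
deps = [{"name": f"D{i}", "cr": cr, "last": 10.0} for i, cr in enumerate(crs, 1)]
print(dependents_to_update(10.4, deps))
```

An update to 10.4 falls outside the windows of only the three most stringent dependents, so only those receive it.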


Reducing the Delay at a Repository

Better Filtering

Better filtering provides:

• Updates of dynamic data are sent only to the end users who are actually interested in them.

• With filtering, no garbage data flows over the network (no flooding of data). This improves communication time in the network and provides better response times.

• With the help of filtering, a more scalable system can be established, and it will withstand unexpected heavy loads.


Reducing the Delay at a Repository

Better scheduling of disseminations

[Figure: two queued updates u1 and u2, each with a cost C(ui) (the delay of the update, i.e. the total delay of processing ui) and a benefit b(ui).]

Approach:

Instead of processing the update requests in standard queue order, a form of prioritization gives better performance: b(u)/C(u) scoring. Each update request is scheduled according to this score. b(u) is the number of dependents that will receive the update; C(u) is the cost of dissemination to all dependents. The b(u) values are stored at each repository, so they are precomputed automatically.

Advantages:

• Update requests that are important to many dependents are processed earlier (BUSINESS IMPORTANCE).

• Updates with a low ratio get delayed, and if a new update arrives the older ones are dropped, which improves performance especially in heavily loaded environments (SCALABILITY).
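The scoring idea can be sketched with a priority queue. The class and method names are hypothetical, and dropping every older queued update of the same data item when a newer one arrives is one possible reading of the slide.

```python
import heapq


class Scheduler:
    """Serves queued updates in decreasing order of b(u)/C(u)."""

    def __init__(self):
        self._heap = []       # entries: (-score, sequence number, item, value)
        self._latest = {}     # item -> sequence number of its newest queued update
        self._seq = 0

    def enqueue(self, item, value, benefit, cost):
        self._seq += 1
        self._latest[item] = self._seq          # older updates of `item` become stale
        heapq.heappush(self._heap, (-benefit / cost, self._seq, item, value))

    def next_update(self):
        while self._heap:
            _, seq, item, value = heapq.heappop(self._heap)
            if self._latest.get(item) == seq:   # skip superseded updates
                del self._latest[item]
                return item, value
        return None


s = Scheduler()
s.enqueue("x", 10.0, benefit=5, cost=10)   # score 0.5
s.enqueue("y", 3.0, benefit=8, cost=4)     # score 2.0
s.enqueue("x", 10.2, benefit=5, cost=10)   # supersedes the first update of x
print(s.next_update(), s.next_update(), s.next_update())
```

The update of y is served first because of its higher score, and the stale update of x is never processed.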


Reducing the Delay at a Repository

Better scheduling of disseminations

Scheduling provides:

• A priority scheme and a business-importance approach that achieve better results.

• Like filtering, it improves scalability: some out-of-date update requests are discarded from the queue. This saves unnecessary computations and queueing delays.


Reducing the Delay at a Repository

Experimental Results

“Dependent ordering” has a lower loss of fidelity than the “simple algorithm”. However, “Scheduling” performs better than both (by up to 15%).

“Dependent ordering” makes fewer pushes than the “simple algorithm”.

The “Scheduling” algorithm decreases computation delays because some updates are dropped from the queue: when new updates arrive, the older ones are out of date.

Fidelity loss with “Scheduling” is shown with some numbers. Fidelity drops as the number of data items increases. Even with large increases in the number of data items and high update rates, the loss of fidelity stays within 10%.

This provides better scalability.


Reducing the Delay at a Repository

Advantages of the better performance approaches

Approach 1: Maintaining the dependents ordered by cr values
• Reduces the number of checks required for processing each update.
• Reduces the number of pushes.

Approach 2: Scheduling
• Reduces the overall delay to the end clients by processing updates which provide a higher benefit at a lower cost.
• Gives a better choice in dropping updates, as low-score updates are dropped.
• Due to the lower propagation delay, it provides better scalability and degrades gracefully under unexpected heavy loads.


Related Work

A simple decision procedure is superior, because many complex algorithms and database systems take a lot of computation time to keep a data repository up to date.

Some dynamic web data dissemination algorithms also use a push-based scheme. However, using coherency requirements improves scalability, and another important feature is that data repositories do not need to cooperate with each other to maintain coherency information (it is already up to date).

This approach deals with rapidly changing dynamic data, while some similar approaches focus on web content that changes at slower time scales.

The strongest point of this approach is that it deals with the problem of failures and forms a resilient dissemination network.


Conclusion

The key points in this architecture are:

• Design of push-based dissemination for time-varying data. Not all updates are disseminated to each repository; only the updates needed to meet the coherency requirements are pushed. (EFFICIENT)

• Design of a cooperative dissemination network. This provides a resilient network: even if a failure occurs in the network, data coherency is not completely lost. (RESILIENT)

• Intelligent filtering, scheduling and selective dissemination reduce the overhead in the network. This provides better scalability and is a good alternative for dynamic data publishing. (SCALABLE)