Top Banner
Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems
96

Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Dec 27, 2015

Download

Documents

Melvyn Booth
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Distributed OSes Continued

Andy Wang

COP 5911

Advanced Operating Systems

Page 2: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

More Introductory Materials

Important Issues in distributed OSes Important distributed OS tools and

mechanisms

Page 3: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

More Important Issues in Distributed Operating Systems Autonomy Consistency and transactions

Page 4: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Autonomy

To some degree, users need to control their own resources

The more a system encourages interdependence, the less autonomy

How to best trade off sharing and interdependence versus autonomy?

Page 5: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Problems with Too Much Interdependence Vulnerability to failures Global control Hard to pinpoint responsibility Hard security problems

Page 6: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Problems with Too Much Autonomy Redundancy of functions Heterogeneity

Especially in software Poor resource sharing

Page 7: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Methods to Improve Autonomy Without causing problems with sharing

Replicate vital services on each machine Don’t export services that are unnecessary Provide strong security guarantee

Page 8: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Consistency

Maintaining consistency is a major problem in distributed systems

If more than one system accesses data, can be hard to ensure consistency

But if cooperating processes see inconsistent data, disasters are possible

Page 9: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

A Sample Consistency Problem

Site A

Site B

Site CData Item 1

Page 10: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

A Sample Consistency Problem

Site A

Site B

Site CData Item 1

Page 11: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

A Sample Consistency Problem

Site A

Site B

Site CData Item 1

Page 12: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

A Sample Consistency Problem

Site A

Site B

Site CData Item 1

Page 13: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

A Sample Consistency Problem

Site A

Site B

Site CData Item 1

Page 14: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

A Sample Consistency Problem

Site A

Site B

Site CData Item 1

Page 15: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Causes of Consistency Problems Failures and partitions Caching effects Replication of data

Page 16: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

So why do this stuff?

Note these problems arise because of what are otherwise desirable features

Failures and partitions Working in the face of failures

Caching Avoiding repetition of expensive operations

Replication Higher availability

Page 17: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Handling Consistency Problems Don’t share data

Generally not feasible Callbacks Invalidations Ignore the problem

Sometimes OK, but not always

Page 18: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Callback Methods

Check that your data view is consistent whenever there might be a problem

In most general case, on every access More practically, every so often Extremely expensive if remote check required High overheads if there’s usually no problem

Page 19: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Invalidation Methods

When situations change, inform those who know about the old situation

Requires extensive bookkeeping Practical in some cases when changes

infrequent High overheads if there’s usually no problem

Page 20: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Consistency and Atomicity

Atomic actions are “all or nothing” Either the entire set of actions occur Or none of them do

At all times, including while being performed Apparently indivisible and instantaneous Relatively easy to provide in single machine

systems

Page 21: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Atomic Actions in Single Processors Lock all associated resources (with

semaphores or other synchronization mechanisms)

Perform all actions without examining unlocked resources

Unlock all resources Real trick is to provide atomicity even if

process is switched in the middle

Page 22: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Why are distributed atomic actions hard? Lack of centralized control What if multiple processes on multiple

machines want to perform an atomic action? How do you properly lock everything? How do you properly unlock everything? Failure conditions especially hard

Page 23: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Important Distributed OS Tools and Mechanisms Caching and replication Transactions and two-phase commit Hierarchical name space Optimistic methods

Page 24: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Caching and Replication

Remotely accessing data in the pits It almost always takes longer It’s less predictable It clogs the network It annoys other nodes Other nodes annoy your It’s less secure

Page 25: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

But what else can you do?

Data must be shared And by off-machine processes

If the data isn’t local, and you need it, you must get it

So, make sure data you need is local The problem is that everyone else also wants their

data local

Page 26: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Making Data Local

Store what you need locally Make copies Migrate necessary data in Cache data Replicate data

Page 27: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Store It Locally

Each site stores the data it needs on local media

But what if two sites need to store the same data?

Or if you don’t have enough room for all your data?

Page 28: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Local Storage Example

Site A

Foo

Site B

Bar

Site C

Froz

Page 29: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Make Copies

Each site stores its own copy of the data it needs

Works well for rarely updated data Like copies of system utility programs

Works poorly for frequently written data Doesn’t solve the problem of lack of local

space

Page 30: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Copying Example

Site A

Foo

Site B

Copy of Foo

Site C

Copy of Foo

Page 31: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Migrate the Data In

When you need a piece of data, find it and bring it to your site Taking it away from the old site

Works poorly for highly shared data Can cause severe storage problems Can overburden the network Essentially how shared software licenses

work

Page 32: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Migration Example

Site A

Foo

Site B

Site C

I need Foo

Page 33: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Migration Example

Site A Site B

Site C

Foo

Page 34: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Caching

When data is accessed remotely, temporarily store a copy of it locally Perhaps using callback or invalidation for

consistency Or perhaps not

Avoids problems of storage Still not quite right for frequently written data

Page 35: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Caching Example

Site A

Foo

Site B

Cached Foo

Site C

Cached Foo

Page 36: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Replication

Maintain multiple local replicas of the data Changes made to one replica are

automatically propagated to other replicas Logically connects copies of data into a

single entity Doesn’t answer question of limited space

Page 37: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Replication Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

Page 38: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Replication Advantages

Most accesses to data are purely local So performance is good

Fault tolerance Failure of a single node doesn’t lose data Partitioned sites can access data

Load balancing Replicas can share the work

Page 39: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Replication and Updates

When a data item is replicated, updates to that item must be propagated to all replicas

Updates come to one replica Something must assure they get to the others

Page 40: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Replication Update Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 41: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Replication Update Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 42: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Update Propagation Methods

Instant versus delayed Synchronous versus asynchronous Atomic versus non-atomic

Page 43: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Instant vs. Delayed Propagation “Instant” can’t mean instant in a distributed

system But it can mean “quickly”

Instant notification not always possible What if a site storing a replica is down?

So some delayed version of update is also required

Page 44: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Instant Update Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 45: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Instant Update Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 46: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Instant Update Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 47: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Synchronous vs. Asynchronous Propagation Update request sooner or later gets a

success signal Does it get it before all propagation

completes (asynchronous) or not (synchronous)?

Synchronous propagation delays completion Asynchronous propagation allows

inconsistencies

Page 48: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Synchronous Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 49: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Synchronous Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 50: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Synchronous Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 51: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Synchronous Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

update complete

Page 52: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Asynchronous Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 53: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Asynchronous Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

update complete

Page 54: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Asynchronous Propagation Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

update complete

Page 55: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Atomic vs. Non-Atomic Update Propagation Atomic propagation lets no one see new data

until all replicas store it Non-atomic lets users see data at some

replicas before all replicas have updated it Atomic update propagation can seriously

delay data availability Non-atomic propagation allows users to see

potentially inconsistent data

Page 56: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Replication Consistency Problems Unless update propagation is atomic,

consistency problems can arise One user sees a different data version than

another user at the same time But even atomic propagation isn’t enough to

prevent this situation

Page 57: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Concurrent Update

What if two users simultaneously ask to update different replicas of the data?

“Simultaneously” has a looser definition in distributed systems

How do you prevent both from updating it? Update propagation style offers no help

Page 58: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Concurrent Update Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo update Foo

Page 59: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Concurrent Update Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo update Foo

Page 60: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Preventing Concurrent Updates One solution is to lock all copies before

making updates That’s expensive And what if one of 20 replicas is unavailable? You must allow updates to data when

partitions or failures occur

Page 61: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Locking Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 62: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Locking Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

request lock

request lock

Page 63: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Locking Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

request lock

request lock

lock granted

lock granted

Page 64: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Locking Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 65: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Locking Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 66: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Locking Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

unlock

unlock

Page 67: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Locking Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

unlock

unlock

unlocked

unlocked

Page 68: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Locking Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

update complete

Page 69: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Concurrent Update Prevention Schemes Primary site Token approaches Majority voting Weighted voting

Page 70: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Primary Site Methods

Only one site can accept updates Or that site must approve all updates

In extraordinary circumstances, appoint new primary site

+ Simple

- Poor reliability, availability

- Non-democratic

- Poor performance in many cases

Page 71: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Primary Site Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 72: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Primary Site Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 73: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Second Primary Site Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 74: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Second Primary Site Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 75: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Token-based Approaches

Only the site holding the token can accept updates

But the token can move from site to site+ Relatively simple+ More adaptive than central site+ Exploit locality- Poor reliability, availability- Non-demonstratic- Poor performance in some cases

Page 76: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Token Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 77: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Token Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 78: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Second Token Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 79: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Second Token Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 80: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Second Token Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 81: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Why is this any different than primary site?

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 82: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Majority Voting

To perform updates, replica must receive approval from majority of all replicas

Once a replica grants approval to one update, it cannot grant it to another Until the first update is completed

Page 83: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Majority Voting Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

Page 84: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Majority Voting Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

request vote

request vote

Page 85: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Majority Voting Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

request vote

request vote

yes vote

Page 86: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Majority Voting Example

Site A

Foo1

Site B

Foo2

Site C

Foo3

update Foo

request vote

request vote

yes vote

Page 87: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Majority Voting, Con’t

+ Democratic

+ Easy to understand

+ More reliable, available

- Some sites still can’t write

- Voting is a distributed action

So, it’s expensive to do it

Page 88: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Weighted Voting

Like majority voting, but some replicas get more votes than others

Must obtain majority of votes, but not necessarily from majority of sites

Fits neatly into transaction models

Page 89: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Weighted Voting Con’t

+ More flexible than majority

+ Can provide better performance

- Somewhat less democratic

- Some sites still can’t write

- Still potentially expensive

- More complex

Page 90: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Basic Problems with Update Control Methods Either very poor reliability/availability or

expensive distributed algorithms for update Always some reliability/availability problems Particularly bad for slow networks, expensive

networks, flaky networks, mobile computers

Page 91: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Transactions

A special case of atomic actions Originally from databases

Sequence of operations that transforms a current consistent state to a new consistent state Without ever exposing an inconsistent state

Page 92: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Transaction Example

Move $10 from my savings account to my checking account

Basically, subtract $10 from savings account, add $10 to checking account

But never “lose” my $10 And never give me an extra $10

Page 93: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Running Transactions

Multiple transactions must not interfere So you can run them one at a time Or run them simultaneously

But avoiding all interference Serializability avoids interference

Page 94: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Serializability

A property of a proposed schedule of transactions

A serializable schedule produces the same results as some serial execution of those transactions

Even though the actions may be have been performed in a different order

Page 95: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

Consistency Example for a Distributed System

Site A Site B

Site C

update variable X update variable Y

update variable Z

Page 96: Distributed OSes Continued Andy Wang COP 5911 Advanced Operating Systems.

What Problems Could Arise?

Other processes could write the variables Other processes could read the variables Failures could interrupt the process How can we avoid these problems?