1 Principles of Reliable Distributed Systems Tutorial 12: Frangipani Spring 2009 Alex Shraer

Dec 19, 2015

Principles of Reliable Distributed Systems

Tutorial 12: Frangipani

Spring 2009

Alex Shraer

Frangipani File System

Thekkath, Mann, and Lee, SOSP 1997

Frangipani

• Scalable file system built at SRC-DEC

• Published in SOSP’97

• Uses failure detection, Paxos, leases,…

• Two layers:
– Petal: virtual disk from many “storage bricks”
– Frangipani file system and lock service

Motivation

• Large-scale distributed file systems are hard to administer

• Hard to add/remove machines (servers)

• Hard to add/remove disks (storage space)

• Hard to manage set of current components

• Hard to manage locks

Petal: Distributed Virtual Disks

C. A. Thekkath and E. K. Lee
Systems Research Center, Digital Equipment Corporation
ASPLOS’96

Client’s View

Petal Overview

• Petal provides virtual disks
– Large (2^64 bytes), sparse virtual space

– Disk storage allocated on demand

– Accessible to all file servers over a network

• Virtual disks implemented by
– Cooperating CPUs executing Petal software

– Ordinary disks attached to the CPUs

– A scalable interconnection network
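
The idea of a large, sparse address space with storage allocated on demand can be sketched as follows. This is only a toy in-memory model, not Petal's implementation: the class name `SparseVirtualDisk`, the `BLOCK_SIZE` allocation unit, and the choice to return zeros for never-written regions are all assumptions made for illustration.

```python
BLOCK_SIZE = 64 * 1024  # hypothetical allocation unit chosen for this sketch


class SparseVirtualDisk:
    """Toy model of a sparse 2^64-byte address space backed on demand."""

    SIZE = 2 ** 64

    def __init__(self):
        # Maps block index -> bytearray; absent blocks were never written.
        self.blocks = {}

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.SIZE:
            raise ValueError("write outside the virtual address space")
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, BLOCK_SIZE)
            chunk = data[pos:pos + BLOCK_SIZE - start]
            # Backing storage is allocated only when a block is first written.
            block = self.blocks.setdefault(index, bytearray(BLOCK_SIZE))
            block[start:start + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset, length):
        if offset < 0 or offset + length > self.SIZE:
            raise ValueError("read outside the virtual address space")
        out = bytearray()
        pos = 0
        while pos < length:
            index, start = divmod(offset + pos, BLOCK_SIZE)
            n = min(BLOCK_SIZE - start, length - pos)
            block = self.blocks.get(index)
            # Assumption of this sketch: unallocated regions read back as zeros.
            out += block[start:start + n] if block is not None else bytes(n)
            pos += n
        return bytes(out)
```

Writing a few bytes at an offset of 2^40 allocates a single block, even though the address lies a terabyte into the virtual disk.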

Petal Prototype

Global State Management

• Uses Paxos
– Global state is replicated across all servers
   • Metadata (disk allocation) only!
– Consistent in the face of server and network failures
– A majority is needed to update the global state
– Any server can be added/removed in the presence of failed servers
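
The majority requirement can be illustrated with a short sketch. It shows only the quorum rule, not Paxos itself; the function names `majority_size` and `can_update` are hypothetical.

```python
from itertools import combinations


def majority_size(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def can_update(reachable, members):
    """True if the reachable servers include a majority of the members."""
    members = set(members)
    return len(set(reachable) & members) >= majority_size(len(members))


def majorities_intersect(members):
    """Check that every pair of majority-sized subsets shares a server."""
    k = majority_size(len(members))
    quorums = [set(q) for q in combinations(sorted(members), k)]
    return all(a & b for a in quorums for b in quorums)
```

Because any two majorities overlap in at least one server, two sides of a network partition cannot both update the global state, and up to a minority of servers may fail without blocking updates.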

Key Petal Features

• Storage is incrementally expandable
• Data is optionally mirrored over multiple servers
• Metadata is replicated on all servers
• Transparent addition and deletion of servers
• Supports read-only snapshots of virtual disks
• Client API looks like a block-level disk device
• Throughput
– Scales linearly with additional servers
– Degrades gracefully with failures

Frangipani: A Scalable Distributed File System

C. A. Thekkath, T. Mann, and E. K. Lee
Systems Research Center, Digital Equipment Corporation
SOSP’97

Frangipani Features

• Behaves like a local file system
– Multiple machines cooperatively manage a Petal disk
– Users on any machine see a consistent view of data

• Exhibits good performance, scaling, and load balancing

• Easy to administer

Ease of Administration

• Frangipani machines are modular
– Can be added and deleted transparently

• Common free space pool
– Users don’t have to be moved

• Automatically recovers from crashes

• Consistent backup without halting the system

Frangipani Structure

• Distributed file system built atop a shared virtual disk (Petal)

• Frangipani servers do not communicate with each other directly
– Only through Petal

• Simplifies management
– Addition/removal of servers

Frangipani Layering

Standard Organization

Components of Frangipani

• File system core
– Implements the file system (FS) interface
– Uses FS mechanisms (buffer cache etc.)
– Exploits Petal’s large virtual space

• Locks with leases
– Granted for finite time, must be refreshed

• Write-ahead redo log
– Performance optimization + failure recovery

Locks

• Multiple reader/single writer
• Granularity: lock per entire file or directory
• A lock is really a lease – it expires
– After 30 seconds in their implementation

• Assumption?
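
A multiple-reader/single-writer lock whose grants expire can be sketched as below. This is a simplification for illustration only: the class `LeasedLock` and its methods are hypothetical, expiry is tracked per lock holder here, and time is passed in explicitly as `now` so the behavior is deterministic.

```python
LEASE_SECONDS = 30  # lease length quoted on the slide


class LeasedLock:
    """Toy multiple-reader/single-writer lock whose grants expire."""

    def __init__(self):
        # holder -> (mode, expiry time); mode is "read" or "write"
        self.holders = {}

    def _drop_expired(self, now):
        self.holders = {
            h: (m, t) for h, (m, t) in self.holders.items() if t > now
        }

    def acquire(self, holder, mode, now):
        self._drop_expired(now)
        others = [m for h, (m, _) in self.holders.items() if h != holder]
        if mode == "write" and others:
            return False  # a writer needs exclusive access
        if mode == "read" and "write" in others:
            return False  # readers cannot share with a writer
        self.holders[holder] = (mode, now + LEASE_SECONDS)
        return True

    def renew(self, holder, now):
        self._drop_expired(now)
        if holder not in self.holders:
            return False  # lease already expired; the holder must re-acquire
        mode, _ = self.holders[holder]
        self.holders[holder] = (mode, now + LEASE_SECONDS)
        return True
```

A holder that stops renewing, for example because it crashed, loses the lock automatically once the lease runs out, so no explicit release is needed.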

Using Locks

• Frangipani servers are clients of lock service

• Dirty data is written to disk (Petal) before the lock is given to another machine

• Locks are cached by servers that acquire them
– Soft state: no need to explicitly release locks
– Uses lease timeouts for lock recovery
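
The rule that dirty data reaches Petal before a lock moves to another machine can be sketched as follows. The `FileServer` class and its callbacks (`on_grant`, `on_revoke`) are hypothetical names, a plain dict stands in for the Petal disk, and dropping the cached copy on revocation is an assumption of this sketch.

```python
class FileServer:
    """Toy server that caches data under locks (hypothetical API)."""

    def __init__(self, disk):
        self.disk = disk      # shared dict standing in for the Petal disk
        self.cache = {}       # path -> locally cached contents
        self.dirty = set()    # paths modified locally but not yet on disk
        self.held = set()     # locks cached by this server

    def on_grant(self, path):
        self.held.add(path)

    def write(self, path, data):
        if path not in self.held:
            raise PermissionError("lock not held for " + path)
        self.cache[path] = data
        self.dirty.add(path)

    def read(self, path):
        if path not in self.held:
            raise PermissionError("lock not held for " + path)
        if path not in self.cache:
            self.cache[path] = self.disk.get(path)
        return self.cache[path]

    def on_revoke(self, path):
        # Flush dirty data to the shared disk before giving the lock up.
        if path in self.dirty:
            self.disk[path] = self.cache[path]
            self.dirty.discard(path)
        # Assumption of this sketch: drop the cached copy, as it may go stale.
        self.cache.pop(path, None)
        self.held.discard(path)
```

The next server to be granted the lock reads the flushed contents from the shared disk, which is what keeps the view consistent across machines.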

Distributed Lock Management

• A set of lock servers collaboratively manage locks
– Run Paxos among them
– Consensus on global state: set of locks each server is responsible for, list of current lock servers, lock allocation to clients
– Need majority to make progress

• Using leases requires assuming loosely synchronized clocks
– Expired leases should not be accepted

• Why Paxos then?
– To overcome network partitions
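
Why leases need loosely synchronized clocks can be seen in a small sketch. It assumes a known bound `max_skew` on the difference between any two clocks, which the slides do not specify; both function names are hypothetical.

```python
def holder_may_use(expiry, holder_now, max_skew):
    """The holder stops using the lease max_skew before its nominal expiry."""
    return holder_now < expiry - max_skew


def service_may_reclaim(expiry, service_now):
    """The lock service treats the lease as expired at its nominal expiry."""
    return service_now >= expiry


def never_overlaps(expiry, max_skew, horizon):
    """Check every pair of integer clock readings within the skew bound."""
    for service_now in range(horizon):
        for holder_now in range(service_now - max_skew,
                                service_now + max_skew + 1):
            if (holder_may_use(expiry, holder_now, max_skew)
                    and service_may_reclaim(expiry, service_now)):
                return False
    return True
```

With the safety margin in place, the holder has always stopped using the lock by the time the service hands it to someone else. Without a bound on clock skew no such margin can be chosen.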

Logging

• Frangipani uses a write-ahead redo log for metadata
– Log records are kept on Petal (why?)

• Data is written to Petal
– On sync, fsync, or every 30 seconds
– On lock revocation or when the log wraps

• Each server has a separate log
– Reduces contention
– Independent recovery
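
A write-ahead redo log can be sketched in a few lines. This is a toy model, not Frangipani's log format: the names `MetadataStore`, `RedoLog`, `logged_update`, and `replay` are hypothetical, and the per-block version numbers used to make replay idempotent are an assumption of this sketch rather than something stated on the slides.

```python
class MetadataStore:
    """Stands in for metadata blocks kept on the shared disk."""

    def __init__(self):
        self.blocks = {}  # block id -> (version, value)

    def version(self, block_id):
        return self.blocks.get(block_id, (0, None))[0]


class RedoLog:
    """Stands in for one server's log region on the shared disk."""

    def __init__(self):
        self.records = []  # (block id, version, new value)

    def append(self, block_id, version, value):
        self.records.append((block_id, version, value))


def logged_update(log, store, block_id, value, crash_before_apply=False):
    version = store.version(block_id) + 1
    log.append(block_id, version, value)  # 1. describe the change in the log
    if crash_before_apply:
        return                            # simulated crash: log written only
    store.blocks[block_id] = (version, value)  # 2. then update in place


def replay(log, store):
    """Redo every logged change that the store has not yet absorbed."""
    for block_id, version, value in log.records:
        if version > store.version(block_id):
            store.blocks[block_id] = (version, value)
```

Because the log lives on the shared disk, `replay` can be run by any server, and running it twice leaves the store unchanged.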

Recovery

• Recovery initiated due to failure detection
– By the lock service
– Failure detection implemented using heartbeats

• Any server can recover operations for a failed server
– Log is available via Petal
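
Heartbeat-based failure detection can be sketched as below. The `HeartbeatDetector` class and its timeout value are hypothetical, and time is passed in explicitly as `now` to keep the example deterministic.

```python
class HeartbeatDetector:
    """Toy failure detector: a server is suspected after a silent period."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # server name -> time of its last heartbeat

    def heartbeat(self, server, now):
        self.last_seen[server] = now

    def suspected(self, now):
        """Servers whose last heartbeat is older than the timeout."""
        return sorted(
            s for s, t in self.last_seen.items() if now - t > self.timeout
        )
```

A suspicion can be wrong, for instance when the network is slow, which is one reason expired leases must also be rejected rather than relying on detection alone.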

Conclusions

• Fault-tolerance in the real world

• Overcome crashes and network partitions using consensus-based replication
– Paxos

• Un-contended good performance
– Using locks

• Implement locks as leases for robustness

• Logging for recovery