Jun 20, 2015
ECE 6160: Advanced Computer Networks
SAN
Instructor: Dr. Xubin (Ben) He
Email: [email protected]
Tel: 931-372-3462
Course web: http://www.ece.tntech.edu/hexb/616f05
Prev…
• Networked storage
• NAS
Storage Architectures
Storage Area Networks
SAN connection
• FC
– FC-SAN
• LAN (Ethernet)
– IP-SAN
– iSCSI
• Other networks
– Petal (ATM)
Typical SAN
• Backup solutions (tape sharing)
• Disaster tolerance solutions (distance to remote location)
• Reliable, maintainable, scalable infrastructure
A real SAN.
NAS and SAN shortcomings
• SAN shortcomings
– Data to desktop
– Sharing between NT and UNIX
– Lack of standards for file access and locking
• NAS shortcomings
– Shared tape resources
– Number of drives
– Distance to tapes/disks
• NAS: Focuses on applications, users, and the files and data that they share
• SAN: Focuses on disks, tapes, and a scalable, reliable infrastructure to connect them
• NAS plus SAN: The complete solution, from desktop to data center to storage device
NAS plus SAN.
• NAS plus SAN: The complete solution, from desktop to data center to storage device
Petal/Frangipani
[Figure: layered diagram with the labels Petal, Frangipani, NFS, "SAN", and "NAS".]
Petal/Frangipani
• NFS
– Untrusted
– OS-agnostic
• Frangipani
– FS semantics
– Sharing/coordination
• Petal
– Disk aggregation (“bricks”)
– Filesystem-agnostic
– Recovery and reconfiguration
– Load balancing
– Chained declustering
– Snapshots
– Does not control sharing
Each “cloud” may resize or reconfigure independently. What indirection is required to make this happen, and where is it?
Remaining Slides
The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site at www.thekkath.org.
For ECE6160, several issues are important:
• Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).
• Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.
• Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.
Petal: Distributed Virtual Disks
Systems Research Center
Digital Equipment Corporation
Edward K. Lee
Chandramohan A. Thekkath
Logical System View
[Figure: clients running AdvFS, NT FS, PC FS, and UFS see virtual disks /dev/vdisk1, /dev/vdisk2, /dev/vdisk3, /dev/vdisk4, and /dev/vdisk5, provided by Petal over a scalable network.]
Physical System View
[Figure: four Petal servers attached to a scalable network; a parallel database or cluster file system accesses /dev/shared1.]
Virtual Disks
• Each virtual disk provides a 2^64-byte address space.
• Created and destroyed on demand.
• Allocates disk storage on demand.
• Snapshots via copy-on-write.
• Online incremental reconfiguration.
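The bullets above can be illustrated with a toy model. The following is a minimal sketch, not Petal's implementation: the class name `VirtualDisk`, the 8 KB block size, and the dictionary-based block map are all assumptions made for illustration. It shows a sparse 2^64-byte address space where storage is consumed only for written blocks, and a snapshot that shares unmodified blocks with the live disk.

```python
BLOCK_SIZE = 8192          # hypothetical block size, chosen for illustration
ADDRESS_SPACE = 2 ** 64    # bytes of virtual address space, as on the slide


class VirtualDisk:
    """Toy sparse virtual disk: a block is allocated only when written."""

    def __init__(self, blocks=None):
        # Maps block number -> immutable bytes; absent blocks read as zeros.
        self.blocks = {} if blocks is None else blocks

    def write(self, offset, data):
        if not 0 <= offset < ADDRESS_SPACE:
            raise ValueError("offset outside the 2^64-byte address space")
        if offset % BLOCK_SIZE or len(data) != BLOCK_SIZE:
            raise ValueError("this sketch handles whole, aligned blocks only")
        # Replacing the map entry never touches the old bytes object,
        # so any snapshot that still refers to it is unaffected.
        self.blocks[offset // BLOCK_SIZE] = bytes(data)

    def read(self, offset):
        # Returns the block containing `offset`; unwritten blocks are zeros.
        return self.blocks.get(offset // BLOCK_SIZE, bytes(BLOCK_SIZE))

    def allocated_bytes(self):
        return len(self.blocks) * BLOCK_SIZE

    def snapshot(self):
        # Copy the block map only; the block contents are shared until
        # the live disk overwrites them (copy-on-write at map granularity).
        return VirtualDisk(dict(self.blocks))
```

Writing one block at offset 2^63 allocates 8 KB, not 2^63 bytes, and a snapshot taken before a later overwrite keeps returning the old contents.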
Virtual to Physical Translation
[Figure: a (vdiskID, offset) pair is translated through the Virtual Disk Directory and the GMap to one of Server 0 to Server 3; that server's PMap (PMap0 to PMap3) yields (disk, diskOffset). The overall mapping is (vdiskID, offset) to (server, disk, diskOffset).]
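The translation path named on this slide can be sketched as three table lookups. This is an illustrative sketch under stated assumptions, not Petal's data structures: the 64 KB translation granularity, the round-robin choice of server, and the names `vdir`, `gmaps`, `pmaps`, and `translate` are hypothetical.

```python
CHUNK = 64 * 1024  # hypothetical translation granularity in bytes

# Virtual disk directory: vdiskID -> identifier of a global map (GMap).
vdir = {"vdisk1": "gmap0"}

# GMap: the servers a virtual disk is spread across (replicated global state).
gmaps = {"gmap0": ["server0", "server1", "server2", "server3"]}

# PMap, one per server (local state): (gmap id, chunk) -> (disk, diskOffset).
pmaps = {name: {} for name in gmaps["gmap0"]}
pmaps["server1"][("gmap0", 5)] = ("disk2", 3 * CHUNK)


def translate(vdisk_id, offset):
    """Map (vdiskID, offset) to (server, disk, diskOffset)."""
    gmap_id = vdir[vdisk_id]                   # step 1: directory lookup
    servers = gmaps[gmap_id]
    chunk, within = divmod(offset, CHUNK)
    server = servers[chunk % len(servers)]     # step 2: assumed round-robin
    disk, disk_offset = pmaps[server][(gmap_id, chunk)]  # step 3: local PMap
    return server, disk, disk_offset + within
```

Only the first two steps consult global state; the last lookup is local to the chosen server, which is one way to read the split between the GMap and the per-server PMaps in the figure.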
Global State Management
• Based on Leslie Lamport’s Paxos algorithm.
• Global state is replicated across all servers.
• Consistent in the face of server & network failures.
• A majority is needed to update global state.
• Any server can be added/removed in the presence of failed servers.
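The majority requirement can be checked with a few lines of code. The sketch below is not Paxos; it only illustrates the quorum property that Paxos-style protocols rely on: any two majorities of the same server set share at least one server, so two conflicting updates cannot both be accepted. The function names are hypothetical.

```python
from itertools import combinations


def majority(n_servers):
    """Smallest number of servers that forms a majority."""
    return n_servers // 2 + 1


def can_update(n_servers, reachable):
    """Global state may change only if a majority of servers is reachable."""
    return len(reachable) >= majority(n_servers)


def majorities_always_intersect(n_servers):
    """Brute-force check that every pair of majority quorums overlaps."""
    quorums = list(combinations(range(n_servers), majority(n_servers)))
    return all(set(a) & set(b) for a in quorums for b in quorums)
```

With 8 servers, a partition that isolates 4 of them leaves neither side able to update global state, while any 5 reachable servers can proceed.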
Fault-Tolerant Global Operations
• Create/Delete virtual disks.
• Snapshot virtual disks.
• Add/Remove servers.
• Reconfigure virtual disks.
Data Placement & Redundancy
• Supports non-redundant and chained-declustered virtual disks.
• Parity can be supported if desired.
• Chained-declustering tolerates any single component failure.
• Tolerates many common multiple failures.
• Throughput scales linearly with additional servers.
• Throughput degrades gracefully with failures.
Chained Declustering
Server0  Server1  Server2  Server3
D0       D1       D2       D3
D3       D0       D1       D2
D4       D5       D6       D7
D7       D4       D5       D6
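The layout on this slide follows a simple rule: block Di has one copy on server i mod n and a second copy on the next server in the chain. The sketch below reproduces that placement and shows how a read is redirected when a server is down. The function names and the `failed` parameter are hypothetical, and the sketch ignores load balancing and writes.

```python
def placement(block, n_servers):
    """Return (primary, secondary) server indices for a block."""
    primary = block % n_servers
    secondary = (primary + 1) % n_servers  # next server in the chain
    return primary, secondary


def layout(n_blocks, n_servers):
    """Blocks held by each server, counting both copies."""
    held = {s: [] for s in range(n_servers)}
    for block in range(n_blocks):
        for server in placement(block, n_servers):
            held[server].append(block)
    return held


def read_server(block, n_servers, failed=frozenset()):
    """Pick a live server holding the block, preferring the primary."""
    for server in placement(block, n_servers):
        if server not in failed:
            return server
    raise RuntimeError("both copies of the block are on failed servers")
```

For 8 blocks on 4 servers, `layout` gives Server0 the blocks D0, D3, D4, and D7, as on the slide. With Server1 down, every block still has a live copy; data becomes unavailable only when two adjacent servers in the chain fail together.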
Chained Declustering
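Server0  Server1  Server2  Server3
D0       D1       D2       D3
D3       D0       D1       D2
D4       D5       D6       D7
D7       D4       D5       D6
[Figure: Server1's blocks (D1, D0, D5, D4) are drawn apart from the server; the other servers keep the layout above.]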
The Prototype
• Digital ATM network.
– 155 Mbit/s per link.
• 8 AlphaStation Model 600.
– 333 MHz Alpha running Digital Unix.
• 72 RZ29 disks.
– 4.3 GB, 3.5 inch, fast SCSI (10 MB/s).
– 9 ms avg. seek, 6 MB/s sustained transfer rate.
• Unix kernel device driver.
• User-level Petal servers.
The Prototype
[Figure: hosts src-ss1, src-ss2, … src-ss8 and petal1, petal2, … petal8 attached to the Digital ATM Network (AN2); /dev/vdisk1 is shown on the hosts.]
Throughput Scaling
[Figure: throughput scale-up (0 to 8) versus number of servers (0 to 8), with a LINEAR reference line and curves for 512B, 8KB, and 64KB reads and writes.]
Virtual Disk Reconfiguration
[Figure: throughput in MB/s (0 to 30) versus elapsed time in minutes (0 to 6), for 6 servers and 8 servers; virtual disk with 1 GB of allocated storage, 8 KB reads and writes.]
Frangipani: A Scalable Distributed File System
C. A. Thekkath, T. Mann, and E. K. Lee
Systems Research Center
Digital Equipment Corporation
Why Not An Old File System on Petal?
• Traditional file systems (e.g., UFS, AdvFS) cannot share a block device
• The machine that runs the file system can become a bottleneck
Frangipani
• Behaves like a local file system
– multiple machines cooperatively manage a Petal disk
– users on any machine see a consistent view of data
• Exhibits good performance, scaling, and load balancing
• Easy to administer
Ease of Administration
• Frangipani machines are modular
– can be added and deleted transparently
• Common free space pool
– users don’t have to be moved
• Automatically recovers from crashes
• Consistent backup without halting the system
Components of Frangipani
• File system core
– implements the Digital Unix vnode interface
– uses the Digital Unix Unified Buffer Cache
– exploits Petal’s large virtual space
• Locks with leases
• Write-ahead redo log
Locks
• Multiple reader/single writer
• Locks are moderately coarse-grained
– each lock protects an entire file or directory
• Dirty data is written to disk before lock is given to another machine
• Each machine aggressively caches locks
– uses lease timeouts for lock recovery
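The locking rules above can be sketched as a small lock table. This is a simplified illustration, not Frangipani's lock service: the class name `LockService`, the 30-second default lease, and the explicit `now` argument (used in place of a real clock so the behaviour is easy to follow) are assumptions. A conflicting request is simply refused here, whereas a real service would ask the holder to write back dirty data and release the lock, and would run recovery before reclaiming the locks of a machine whose lease expired.

```python
class LockService:
    """Toy multiple-reader/single-writer lock table with leases."""

    def __init__(self, lease_seconds=30.0):  # hypothetical lease length
        self.lease_seconds = lease_seconds
        self.leases = {}    # machine -> time at which its lease expires
        self.holders = {}   # lock name -> (mode, set of holding machines)

    def renew(self, machine, now):
        self.leases[machine] = now + self.lease_seconds

    def _drop_expired(self, name, now):
        # Forget holders whose lease has run out (recovery is omitted here).
        mode, machines = self.holders.get(name, ("read", set()))
        live = {m for m in machines if self.leases.get(m, 0.0) > now}
        if live:
            self.holders[name] = (mode, live)
        else:
            self.holders.pop(name, None)

    def acquire(self, name, machine, mode, now):
        """Try to take `name` in mode "read" or "write"; return success."""
        self.renew(machine, now)
        self._drop_expired(name, now)
        held = self.holders.get(name)
        if held is None:
            self.holders[name] = (mode, {machine})
            return True
        held_mode, machines = held
        if mode == "read" and held_mode == "read":
            machines.add(machine)        # readers share the lock
            return True
        if machines == {machine}:
            self.holders[name] = (mode, {machine})  # sole holder changes mode
            return True
        return False                     # conflict: writer needs exclusivity
```

Two machines can hold a read lock on the same file at once; a writer is refused until the readers release the lock or their leases lapse.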
Logging
• Frangipani uses a write-ahead redo log for metadata
– log records are kept on Petal
• Data is written to Petal
– on sync, fsync, or every 30 seconds
– on lock revocation or when the log wraps
• Each machine has a separate log
– reduces contention
– independent recovery
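A write-ahead redo log with one log per machine can be sketched as follows. This is an illustrative model under stated assumptions, not Frangipani's on-disk format: the shared dictionary `petal` stands in for a Petal virtual disk, the per-block version numbers used to make replay idempotent are an assumption of this sketch, and the names `Machine` and `recover` are hypothetical. For brevity each log record is written to the shared store immediately.

```python
def new_petal():
    """Shared store standing in for a Petal virtual disk."""
    return {"meta": {}, "logs": {}}  # meta: key -> (version, value)


class Machine:
    """One file server with its own redo log kept on the shared store."""

    def __init__(self, name, petal):
        self.name = name
        self.petal = petal
        self.petal["logs"][name] = []
        self.cache = {}  # dirty metadata not yet written in place

    def update(self, key, value):
        current = self.cache.get(key, self.petal["meta"].get(key, (0, None)))
        version = current[0] + 1
        # Write-ahead: the log record reaches the store before the update.
        self.petal["logs"][self.name].append((key, version, value))
        self.cache[key] = (version, value)

    def flush(self):
        # In-place write, e.g. on sync or lock revocation; log is then free.
        self.petal["meta"].update(self.cache)
        self.cache.clear()
        self.petal["logs"][self.name].clear()


def recover(petal, failed_name):
    """Replay a failed machine's log; any machine can run this."""
    for key, version, value in petal["logs"][failed_name]:
        if petal["meta"].get(key, (0, None))[0] < version:
            petal["meta"][key] = (version, value)  # redo only newer updates
    petal["logs"][failed_name] = []
```

If a machine crashes after logging two updates but before flushing, running `recover` from any other machine brings the metadata to the last logged version, and the other machines' logs are never touched.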
Recovery
• Recovery is initiated by the lock service
• Recovery can be carried out on any machine
– log is distributed and available via Petal
References
• E. Lee and C. Thekkath, “Petal: Distributed Virtual Disks,” Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1996
• P. Sarkar, S. Uttamchandani, and K. Voruganti, “Storage Over IP: When Does Hardware Support Help?” Proc. of 2nd USENIX Conference on File And Storage Technologies (FAST’2003)
• C. Thekkath, T. Mann, and E. Lee, “Frangipani: A scalable distributed file system,” Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP), pp. 224-237, October 1997