Introduction to Distributed Computing

Distributed Computing – Case Study

Outline

• What is distributed computing

• Case study

– Hadoop: HDFS and Map-Reduce

– Gluster File System

What is Distributed Computing/System?

• Distributed computing

– A field of computer science that studies distributed systems.

– The use of distributed systems to solve computational problems.

• Distributed system

– Wikipedia:

• There are several autonomous computational entities, each of which has its own local memory.

• The entities communicate with each other by message passing.

– Operating System Concepts:

• The processors communicate with one another through various communication lines, such as high-speed buses or telephone lines.

• Each processor has its own local memory.
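To make "autonomous entities with local memory that communicate by message passing" concrete, here is a minimal sketch that simulates two entities with Python threads. The class name `Entity` and the convention that `None` means "shut down" are illustrative assumptions. Python threads actually share one address space, so the "local memory" rule is kept only by discipline here; in a real distributed system each entity would be a separate machine and messages would travel over a network.

```python
import queue
import threading


class Entity(threading.Thread):
    """An autonomous entity with private local memory and an inbox for messages."""

    def __init__(self, name):
        super().__init__(name=name)
        self.inbox = queue.Queue()
        self.local_memory = {}  # by convention, only this entity's own thread writes here

    def send(self, other, message):
        # Message passing: the only way to influence another entity's state.
        other.inbox.put((self.name, message))

    def run(self):
        while True:
            sender, message = self.inbox.get()
            if message is None:  # hypothetical shutdown convention
                break
            key, value = message
            # Record the update together with who sent it.
            self.local_memory[key] = (value, sender)


a = Entity("A")
b = Entity("B")
b.start()
a.send(b, ("x", 42))
a.send(b, None)
b.join()
print(b.local_memory)  # {'x': (42, 'A')}
```

Entity A never touches B's dictionary; B changes its own state only in response to messages it receives.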


• Distributed program

– A computer program that runs in a distributed system.

• Distributed programming

– The process of writing distributed programs.


• Common properties

– Fault tolerance

• When one or more nodes fail, the system as a whole keeps working, though performance may degrade.

• The status of each node must be monitored.

– Each node plays a partial role

• Each computer has only a limited, incomplete view of the system and may know only one part of the input.

– Resource sharing

• Users can share the system's computing power and storage resources with other users.

– Load sharing

• Dispatching tasks across the nodes spreads the load over the whole system.

– Easy to expand

• Adding nodes should take little time, ideally none.
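Fault tolerance needs the status of each node, and load sharing needs somewhere to send work. A common way to combine them is heartbeat monitoring plus dispatching only to live nodes. The sketch below assumes a fixed timeout, an explicit `now` timestamp (so the example is deterministic), and simple round-robin dispatch; the names `HeartbeatMonitor` and `dispatch` are hypothetical.

```python
class HeartbeatMonitor:
    """Tracks node status from periodic heartbeats."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence before a node counts as failed
        self.last_seen = {}

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def alive_nodes(self, now):
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)

    def failed_nodes(self, now):
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def dispatch(tasks, nodes):
    """Load sharing: spread tasks round-robin over the given nodes."""
    assignment = {node: [] for node in nodes}
    for i, task in enumerate(tasks):
        assignment[nodes[i % len(nodes)]].append(task)
    return assignment


monitor = HeartbeatMonitor(timeout=10)
for node in ("node1", "node2", "node3"):
    monitor.heartbeat(node, now=0)
monitor.heartbeat("node1", now=15)
monitor.heartbeat("node3", now=15)

alive = monitor.alive_nodes(now=15)   # node2 has been silent for 15 s
print(monitor.failed_nodes(now=15))   # ['node2']
print(dispatch(["t1", "t2", "t3", "t4"], alive))
# {'node1': ['t1', 't3'], 'node3': ['t2', 't4']}
```

The system keeps working after node2 fails; its share of the work simply moves to the remaining nodes, which is the "reduced performance, not failure" behavior described above.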

CASE STUDY - HADOOP

Quick overview

Paramount Q1 2008 - 7

• Features

• HDFS

• Map-Reduce Framework

Features

• Large files

– Gigabytes, Terabytes

• Write once, read many

• Commodity Hardware

HDFS

• Namenode:

– manages the file system namespace and regulates access to files by clients.

– determines the mapping of blocks to DataNodes.

– persists the namespace metadata in fsImage and editLog.

• DataNode:

– manages the storage attached to the node it runs on.

– stores CRC checksums for its blocks.

– sends heartbeats to the namenode.

– Each file is split into blocks (chunks), and each block is stored on several DataNodes.
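The namenode/DataNode split above can be sketched in a few functions: cut a file into blocks, record a CRC per block, and map each block to several DataNodes. This is a simplified sketch, not HDFS code: the block size is tiny for illustration, the DataNode names are hypothetical, and placement is plain round-robin rather than HDFS's rack-aware policy.

```python
import zlib


def split_into_blocks(data, block_size):
    """Split a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, datanodes, replication):
    """Map each block to `replication` distinct DataNodes, round-robin.

    A stand-in for the namenode's block -> DataNode mapping; real HDFS
    uses a rack-aware placement policy instead.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def verify_block(block, expected_crc):
    """A DataNode recomputes the CRC to detect corrupted data."""
    return zlib.crc32(block) == expected_crc


data = b"hello distributed file systems!"      # 31 bytes
blocks = split_into_blocks(data, block_size=8)  # tiny size for illustration only
checksums = [zlib.crc32(block) for block in blocks]
mapping = place_blocks(len(blocks), ["dn1", "dn2", "dn3"], replication=2)
print(mapping)
# {0: ['dn1', 'dn2'], 1: ['dn2', 'dn3'], 2: ['dn3', 'dn1'], 3: ['dn1', 'dn2']}
```

Because every block has two replicas on different DataNodes, losing any single node still leaves a copy of each block readable.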

HDFS

• Secondary Namenode

– responsible for merging fsImage and editLog.

– Despite its name, it is not a standby namenode and cannot take over if the namenode fails.

HDFS architecture

Secondary namenode

• Edit log

– Transaction log

• The transaction log is updated before the in-memory state is changed.

• Every file system change sent to the namenode is appended to this file.

• fsImage

– Persistent checkpoint of the file system metadata

• Secondary namenode

– Responsible for merging editLog and fsImage.
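The edit log / fsImage relationship is a write-ahead log plus periodic checkpoint. The sketch below models the namespace as a Python dict mapping paths to metadata and supports only hypothetical `create` and `delete` operations. It shows the idea only: the real namenode and secondary namenode exchange files on disk and roll the edit log, which is omitted here.

```python
import copy


def replay(namespace, op, path, meta=None):
    """Apply one logged transaction to a namespace (path -> metadata dict)."""
    if op == "create":
        namespace[path] = meta
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


class NameNode:
    def __init__(self):
        self.fs_image = {}   # last persistent checkpoint
        self.edit_log = []   # transactions since that checkpoint
        self.namespace = {}  # in-memory state served to clients

    def apply(self, op, path, meta=None):
        # Write-ahead: log the transaction first, then update memory.
        self.edit_log.append((op, path, meta))
        replay(self.namespace, op, path, meta)


def secondary_checkpoint(fs_image, edit_log):
    """Merge: start from the old fsImage and replay every logged transaction."""
    merged = copy.deepcopy(fs_image)
    for op, path, meta in edit_log:
        replay(merged, op, path, meta)
    return merged


nn = NameNode()
nn.apply("create", "/a.txt", {"blocks": 2})
nn.apply("create", "/b.txt", {"blocks": 1})
nn.apply("delete", "/a.txt")

# The secondary namenode produces a new checkpoint; the edit log can then restart empty.
nn.fs_image = secondary_checkpoint(nn.fs_image, nn.edit_log)
nn.edit_log = []
print(nn.fs_image)  # {'/b.txt': {'blocks': 1}}
```

Without the merge, the edit log would grow without bound and a namenode restart would have to replay all of it; checkpointing keeps the log short.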

Figure: Secondary namenode (from Hadoop: The Definitive Guide)

Map-Reduce Framework

• JobTracker

– Responsible for dispatching tasks to each TaskTracker.

– Job management, such as scheduling and removing jobs.

• TaskTracker

– Responsible for executing tasks. A TaskTracker usually launches a separate JVM to run each task.
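The JobTracker/TaskTracker division of labor can be sketched as a queue of pending tasks handed to trackers that have free slots, with tasks from a failed tracker put back in the queue. This is a toy model: the slot counts, task names, and method names are hypothetical, and the real JobTracker also handles heartbeats, data locality, and speculative execution.

```python
from collections import deque


class JobTracker:
    """Toy JobTracker: hands pending tasks to TaskTrackers that have free slots."""

    def __init__(self, slots):
        self.free_slots = dict(slots)  # TaskTracker name -> free task slots
        self.pending = deque()         # tasks waiting to be dispatched
        self.running = {}              # task -> TaskTracker executing it

    def submit(self, tasks):
        self.pending.extend(tasks)

    def schedule(self):
        for tracker in self.free_slots:
            while self.free_slots[tracker] > 0 and self.pending:
                task = self.pending.popleft()
                self.running[task] = tracker
                self.free_slots[tracker] -= 1

    def task_finished(self, task):
        tracker = self.running.pop(task)
        self.free_slots[tracker] += 1

    def tracker_failed(self, tracker):
        """Put the failed tracker's tasks back at the front of the queue."""
        lost = [task for task, tr in self.running.items() if tr == tracker]
        for task in lost:
            del self.running[task]
        self.pending.extendleft(reversed(lost))
        del self.free_slots[tracker]


jt = JobTracker({"tt1": 2, "tt2": 1})
jt.submit(["m1", "m2", "m3", "m4"])
jt.schedule()             # m1, m2 -> tt1; m3 -> tt2; m4 waits
jt.tracker_failed("tt1")  # m1 and m2 are re-queued ahead of m4
jt.task_finished("m3")    # tt2 has a free slot again
jt.schedule()             # m1 -> tt2
print(jt.running, list(jt.pending))  # {'m1': 'tt2'} ['m2', 'm4']
```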

Figure: Map-Reduce framework (from Hadoop: The Definitive Guide)

Summary - Hadoop

• Hadoop provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster.

• Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
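The classic illustration of the Map/Reduce paradigm is word count. The sketch below runs the map, shuffle, and reduce steps in a single Python process; it shows the programming model, not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(splits):
    # Each map_phase call depends only on its own split, so a lost one can be re-run anywhere.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


splits = ["the quick brown fox", "the lazy dog", "The end"]
counts = word_count(splits)
print(counts["the"], counts["fox"])  # 3 1
```

Because map and reduce calls are pure functions of their inputs, the framework is free to run each fragment on any node and simply re-execute it if that node fails.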

CASE STUDY - GLUSTER FILESYSTEM

Quick overview


• Introduction

• GlusterFS design

• Example: 4-node GlusterFS

GlusterFS: Cluster File System

Introduction

• GlusterFS is an open-source clustered file system that runs on industry-standard hardware from any vendor and is claimed to deliver several times the scalability and performance of conventional storage at a fraction of the cost.

Figure: GlusterFS overview, N x performance & capacity (from the GlusterFS datasheet)

GlusterFS Design

Figure: GlusterFS clustered file system on the x86-64 platform. A cluster of storage clients (supercomputer, data center), each running a GLFS client with a clustered volume manager and clustered I/O scheduler, connects to storage bricks 1-4 over InfiniBand RDMA or TCP/IP (GigE). Storage gateways, themselves GLFS clients, provide NFS/Samba access over TCP/IP for compatibility with MS Windows and other Unices. (From http://www.gluster.org/)

Key Design Considerations

• Capacity Scaling

– Scalable beyond petabytes

• I/O Throughput Scaling

– Pluggable Clustered I/O Schedulers

– Takes advantage of RDMA transport

• Reliability

– Non-Stop Storage

• Ease of Manageability

– Self-Heal

– NFS-like Disk Layout

• Elegance in Design

– Stackable Modules

– Not tied to I/O profiles, hardware, or OS
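"Stackable modules" means every layer (a translator) exposes the same file interface as the layer below it, so layers can be stacked in any order. The sketch below mimics that idea with two Python classes sharing `read`/`write` methods: a storage layer at the bottom and a caching layer on top. These classes are hypothetical stand-ins for illustration; real GlusterFS translators are C modules with a much richer interface.

```python
class PosixBrick:
    """Bottom of the stack: stores file contents (a dict stands in for a local disk)."""

    def __init__(self):
        self.files = {}
        self.reads = 0  # counts how often the "disk" is actually hit

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        self.reads += 1
        return self.files[path]


class IOCache:
    """A performance layer with the same interface: caches reads from its subvolume."""

    def __init__(self, subvolume):
        self.subvolume = subvolume
        self.cache = {}

    def write(self, path, data):
        self.cache.pop(path, None)  # invalidate, then pass the write down the stack
        self.subvolume.write(path, data)

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.subvolume.read(path)
        return self.cache[path]


brick = PosixBrick()
volume = IOCache(brick)  # stack one layer on top of another by wrapping it
volume.write("/a.txt", b"hello")
volume.read("/a.txt")
volume.read("/a.txt")
print(brick.reads)  # 1: the second read was served by the cache layer
```

Since `IOCache` only relies on its subvolume having `read` and `write`, it could equally sit on top of another translator instead of directly on the brick.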

Translators

• Performance translators

1. Read Ahead

2. Write Behind

3. Threaded I/O

4. IO-Cache

• Clustering translators

1. Automatic File Replication (AFR)

2. Stripe

3. Unify

• Scheduling translators

1. Adaptive Least Usage (ALU)

2. Non-Uniform File Access (NUFA)

3. Random

4. Round-Robin
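The scheduling translators decide which brick receives a newly created file in a Unify volume. The sketch below gives simplified versions of four policies behind one `pick(usage)` interface, where `usage` maps each brick to the fraction of its disk in use. The brick names, the usage numbers, and NUFA's 0.9 limit are assumptions. The real ALU scheduler weighs several criteria, such as disk, read, and write usage, while only disk usage is modeled here.

```python
import itertools
import random


class RoundRobin:
    def __init__(self, bricks):
        self._next = itertools.cycle(bricks)

    def pick(self, usage):
        return next(self._next)  # ignores usage, bricks simply take turns


class RandomScheduler:
    def __init__(self, bricks, seed=None):
        self.bricks = list(bricks)
        self._rng = random.Random(seed)

    def pick(self, usage):
        return self._rng.choice(self.bricks)


class LeastUsage:
    """ALU-like: pick the brick with the lowest disk usage (one criterion only)."""

    def __init__(self, bricks):
        self.bricks = list(bricks)

    def pick(self, usage):
        return min(self.bricks, key=lambda brick: usage[brick])


class Nufa:
    """NUFA-like: prefer the local brick until it passes a usage limit."""

    def __init__(self, bricks, local, limit=0.9):
        self.bricks = list(bricks)
        self.local = local
        self.limit = limit  # hypothetical threshold

    def pick(self, usage):
        if usage[self.local] < self.limit:
            return self.local
        return min(self.bricks, key=lambda brick: usage[brick])


bricks = ["brick1", "brick2", "brick3"]
usage = {"brick1": 0.70, "brick2": 0.20, "brick3": 0.50}  # fraction of disk used

rr = RoundRobin(bricks)
print([rr.pick(usage) for _ in range(4)])        # ['brick1', 'brick2', 'brick3', 'brick1']
print(LeastUsage(bricks).pick(usage))            # brick2
print(Nufa(bricks, local="brick3").pick(usage))  # brick3
```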

FUSE

• What’s FUSE?

• Stands for “Filesystem in Userspace”

• Makes it easy to write new filesystems

1. without knowing how the kernel works

2. without breaking unrelated things

3. more quickly and easily than traditional file systems built as kernel modules

Figure: FUSE structure (from http://fuse.sourceforge.net/)

How FUSE Works

• An application makes a file-related syscall.

• The kernel figures out that the file is in a mounted FUSE filesystem.

• The FUSE kernel module forwards the request to your userspace FUSE app.

• Your app tells FUSE how to reply, and the kernel passes that reply back to the application.
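The four steps above can be modeled as a toy simulation in which a `Kernel` object routes calls for paths under a mount point to a userspace filesystem object. This is only a model of the request flow: `Kernel`, `HelloFS`, and their methods are hypothetical and are not the libfuse API, where the filesystem instead registers callbacks that the FUSE kernel module invokes through `/dev/fuse`.

```python
class HelloFS:
    """Userspace filesystem: answers requests forwarded to it (toy model, not libfuse)."""

    def __init__(self):
        self.files = {"/hello.txt": b"Hello from userspace!\n"}

    def getattr(self, path):
        if path == "/":
            return {"type": "dir"}
        if path in self.files:
            return {"type": "file", "size": len(self.files[path])}
        raise FileNotFoundError(path)

    def read(self, path, size, offset):
        return self.files[path][offset:offset + size]


class Kernel:
    def __init__(self):
        self.mounts = {}

    def mount(self, mountpoint, fs):
        self.mounts[mountpoint] = fs

    def syscall(self, op, path, *args):
        # Step 2: figure out whether the path lives in a mounted FUSE filesystem.
        for mountpoint, fs in self.mounts.items():
            if path == mountpoint or path.startswith(mountpoint + "/"):
                relative = path[len(mountpoint):] or "/"
                # Steps 3 and 4: forward to the userspace app and return its reply.
                return getattr(fs, op)(relative, *args)
        raise OSError("path is not on a FUSE mount in this toy model")


kernel = Kernel()
kernel.mount("/mnt/hello", HelloFS())
# Step 1: the application issues a file-related call.
print(kernel.syscall("read", "/mnt/hello/hello.txt", 5, 0))  # b'Hello'
```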

Example: 4-node GlusterFS

Figure: Storage virtualization with GlusterFS (AFR + Unify), ~1.8 TB. Users run virtual machines (Xen + KVM), web apps, and MySQL on a global namespace (/mnt/glusterfs) served by four GlusterFS servers connected over TCP/IP (GigE), each exporting a POSIX back end: vlab01 (Ext4), vlab02 (Ext3), vlab03 (XFS), and vlab04 (Ext3).

The view of GlusterFS client

$ df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/sda1                     901G  115G  740G  14% /
tmpfs                         4.0G     0  4.0G   0% /dev/shm
/etc/glusterfs/glusterfs.vol  1.8T  243G  1.6T  13% /mnt/glusterfs

The view of GlusterFS server

Figure: Unify, mirror, and stripe volumes across BRICK1, BRICK2, and BRICK3.

• Unify volume: each file is stored whole on a single brick (benchmark.pdf, test.ogg, initcore.c, mylogo.xcf, driver.c, ether.c, test.m4a, work.ods, corporate.odp).

• Mirror volume: every brick holds a full copy of accounts-2006.db, accounts-2007.db, and backup.db.zip.

• Stripe volume: north-pole-map, dvd1.iso, and xen-image appear on every brick, each brick holding a portion of each file.
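The difference between the three volume types comes down to where a file's bytes end up. The sketch below describes each layout as a map from brick to a list of (offset, length) ranges. The stripe size is a hypothetical parameter, and in a Unify volume the brick would be chosen by a scheduling translator, so it is passed in directly here.

```python
def unify_layout(size, brick):
    """Unify: the whole file lives on one brick, chosen by a scheduler translator."""
    return {brick: [(0, size)]}


def mirror_layout(size, bricks):
    """Mirror (AFR): every brick stores a full copy."""
    return {brick: [(0, size)] for brick in bricks}


def stripe_layout(size, bricks, stripe_size):
    """Stripe: consecutive chunks of the file go to the bricks in turn."""
    layout = {brick: [] for brick in bricks}
    for i, offset in enumerate(range(0, size, stripe_size)):
        layout[bricks[i % len(bricks)]].append((offset, min(stripe_size, size - offset)))
    return layout


BRICKS = ["BRICK1", "BRICK2", "BRICK3"]
print(unify_layout(10, "BRICK2"))  # {'BRICK2': [(0, 10)]}
print(mirror_layout(10, BRICKS))   # every brick: [(0, 10)]
print(stripe_layout(10, BRICKS, 4))
# {'BRICK1': [(0, 4)], 'BRICK2': [(4, 4)], 'BRICK3': [(8, 2)]}
```

Mirroring trades capacity for reliability (usable space is that of one brick), while striping spreads a single large file's I/O over all bricks, which is why the example places large files such as dvd1.iso and xen-image on the stripe volume.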

Summary - GlusterFS

• GlusterFS clusters together storage building blocks, aggregating disk and memory resources and managing data in a single global namespace.

• GlusterFS is based on a stackable architecture that can be tuned to specific application profiles with simple plug-in modules, giving good performance across a wide range of workloads.

Reference

• http://en.wikipedia.org/wiki/Message_passing

• http://en.wikipedia.org/wiki/Distributed_computing

• http://en.wikipedia.org/wiki/Filesystem_in_Userspace

• http://en.wikipedia.org/wiki/Distributed_file_system

• http://hadoop.apache.org/

• Tom White, Hadoop: The Definitive Guide

• Silberschatz, Galvin, Operating System Concepts

• http://www.gluster.org/

• http://www.zresearch.com/

• http://fuse.sourceforge.net/
