Page 1: Distributed cluster dynamic storage: a comparison of dcache, xrootd, slashgrid storage systems running on batch nodes

Alessandra Forti
CHEP07, Victoria

Page 2: Outline

• Hardware

• Storage setup

• Comparison Table

• Some tests results

• Conclusions

Page 3: Manchester Nodes

• 900 nodes
• 2 x 2.8 GHz CPUs
• 2 x 2 GB RAM
• 2 x 250 GB disks
  – total space available for data ~350 TB
  – disk transfer speed ~470 Mb/s (from specs and benchmarked)
  – WN disks are NOT raided
    • disk 1: OS + scratch + data
    • disk 2: data
• No tape storage
  – nor other more permanent storage such as raided disk servers
• Nodes divided into two identical, independent clusters
  – almost!
  – head nodes have the same specs as the batch nodes

[Diagram: batch node disks, 250 GB for dcache, 150 GB for xrootd or waiting for a pool, 100 GB for OS/scratch.]

Page 4: Cluster Network

• 50 racks
  – each rack:
    • 20 nodes
    • each node: 2 x 1 Gb/s
    • 1 switch: 2 x 1 Gb/s to the central switch
• Central switch
  – 10 Gb/s to the regional network
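
A rough back-of-envelope on the shared links described above, assuming every node reads remote data at the same time:

\[
\frac{2\ \mathrm{Gb/s\ (rack\ uplink)}}{20\ \mathrm{nodes}} = 100\ \mathrm{Mb/s\ per\ node},
\qquad
\frac{10\ \mathrm{Gb/s\ (central\ uplink)}}{50\ \mathrm{racks}} = 200\ \mathrm{Mb/s\ per\ rack}
\]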

Page 5: dCache communication

[Diagram of dCache component communication, taken from the OSG twiki.]

Page 6: dcache setup

[Diagram: the head node runs srm, the pnfs server, the admin service, gridftp and gsidcap doors, and its own pool(s); batch nodes across racks 1..N run pool(s) with pnfs mounted and user processes reading from the local disk (some batch nodes run user processes only).]

• dcache on all the nodes
• at least 2 permanent open connections from each node to the head node = ~900 connections per head node
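
As an illustration of the two access paths a job on a batch node has under this setup, a minimal sketch (hostname, port and pnfs path are hypothetical, not Manchester's real ones):

  # read through the locally mounted pnfs namespace
  dccp /pnfs/example.ac.uk/data/user/ntuple.root /tmp/ntuple.root

  # or go explicitly through the gsidcap door on the head node
  # (22128 is the usual gsidcap default port; adjust to the site's configuration)
  dccp gsidcap://head01.example.ac.uk:22128/pnfs/example.ac.uk/data/user/ntuple.root /tmp/ntuple.root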

Page 7: Xrootd communication

[Diagram (A. Hanushevsky): a client asks the redirector (head node) to open file X; the redirector asks data servers A, B, C "Who has file X?"; the server that answers "I have" receives the client ("go to C"). A supervisor (sub-redirector) repeats the same exchange for a further layer of data servers D, E, F ("go to F"). The client sees all servers as xrootd data servers.]

Page 8: Xrootd setup

[Diagram: the head node runs the olbd manager; batch nodes across racks 1..N run xrootd with an olbd server (one of them acting as olbd supervisor), and user processes read from the local disk.]

• xrootd on 100 nodes only
• There are no permanent connections between the nodes
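
For comparison, an xrootd client only needs the redirector's address; the redirector hands it to whichever data server holds the file. A minimal sketch with a hypothetical hostname and path:

  # the redirector on the head node redirects the copy to the data server holding the file
  xrdcp root://head01.example.ac.uk//store/user/ntuple.root /tmp/ntuple.root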

Page 9: Slashgrid and http/gridsite

• slashgrid is a shared file system based on http
  – ls /grid/<http-server-name>/dir
• Main target applications are
  – input sandbox
  – final analysis of small ntuples
• It was developed as a lightweight alternative to afs
  – it is still in a testing phase and is installed only on 2 nodes in Manchester
• For more information see the poster
  – http://indico.cern.ch/contributionDisplay.py?contribId=103&sessionId=21&confId=3580
• Although there is the possibility of an xrootd-like architecture, in the tests it was used in the simplest way
  – the client contacts the data server directly, without any type of redirection
• Transfer tests didn't involve /grid but used htcp over http/gridsite
• User analysis tests were done reading from /grid

Page 10: Slashgrid setup

[Diagram: batch nodes in rack 1 each run apache with gridsite and a slashgrid daemon; user processes read the local disks through the slashgrid mount, and a slashgrid daemon can also point to a remote apache server.]

Page 11: Comparison table

                                    dcache                 xrootd             slashgrid http/gridsite
rpms                                2                      1                  2+4
config files                        36                     1                  1
log files                           20                     1                  1
databases to manage                 2                      0                  0
srm                                 yes                    no                 no
resilience                          yes                    no                 no
load balancing                      yes                    yes                yes
gsi authentication and VOMS compat  yes                    no                 yes
configuration tools                 yaim for EGEE sites,   not really needed  not really needed
                                    VDT for OSG sites
name space                          /pnfs                  none               /grid
number of protocols supported       5                      1                  3

Page 12: Some tests results

• Basic tests
• Transfers:
  – dccp, srmcp, xrdcp, htcp
• User analysis job over a set of ntuples:
  – dcache, xrootd, slashgrid, root/http, afs
• htcp in different conditions
• srmcp strange behaviour
• BaBar skim production jobs
  – a real life application
• AFS vs slashgrid
  – real time against user time

Page 13: Transfer tools

[Plot "Rate 1GB files": transfer rate in Mb/s (0-700) for ~200 repeated copies of the same 1 GB file with srmcp, dccp, htcp and xrdcp. Annotations on the plot: the same file was copied over many times; htcp dropped when a job kicked in on the serving node; the xrootd server was busy with jobs; gridftp and dccp have access to replicas.]

Page 14: User analysis

[Plots: read rate in Mb/s per file for xrootd (0-600), slashgrid (0-70), http (0-400) and dcap (0-450). The same set of small files was copied many times: 29 files of ~79 MB each.]

Page 15: htcp tests 1GB files

[Plot "htcp test 1GB file": rate in Mb/s (0-1000) for a 1 GB file under different conditions: to /dev/null with and without a parallel process; to disk; to disk with a parallel transfer; to disk with a concurrent lhcb process; to a disk on the same machine; with another process on the same machine. Annotations on the plot: "parallel process kicks in", "memory speed", "writing on the same disk", "parallel process writing on different disk".]

Page 16: Dcap API

• Same behaviour as htcp when transferring data from memory

Page 17: srmcp tests 1GB files

• List of 5 x 1 GB files
  – files on different pools
  – files replicated
• 1st test:
  – copy each file 10 times, repeated for 5 loops
• 2nd test:
  – copy each file once, repeated for 5 loops
• 3rd test:
  – same as the first one
• 4th test:
  – the same as the first one, in a different sequence
• Drop in efficiency after the first loop in each case
• The cluster was empty, so there are no concurrent processes to explain the drop
  – needs more investigation
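
A sketch of how the 1st test could be scripted; the srm endpoint, URL form and file names are hypothetical and depend on the site's SRM setup:

  # 5 loops; in each loop copy each of the 5 files 10 times
  for loop in 1 2 3 4 5; do
    for f in file1 file2 file3 file4 file5; do
      for i in 1 2 3 4 5 6 7 8 9 10; do
        srmcp srm://head01.example.ac.uk:8443/pnfs/example.ac.uk/data/tests/$f \
              file:////tmp/${f}_${loop}_${i}
      done
    done
  done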

[Plot "srmcp rates 1GB file": rate in Mb/s (0-300) per copy, for the 1st to 4th tests.]

Page 18: Efficiency vs reading rates

[Plot: BaBar skim production job efficiency (0.955-1.0) vs reading rate (0-25 Mb/s).]

BaBar skim production

• average reading rate 10 Mb/s
• two files in input
• reading is non-sequential
  – subset of events
  – jumps from one file to another
• average job efficiency (cpu time/elapsed) is 98.6%
• Even increasing the speed by a factor of 10 wouldn't change the job efficiency!
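
A quick check of that claim, assuming the whole 1.4% of lost efficiency is time spent waiting for input:

\[
\mathrm{eff} = \frac{t_{\mathrm{cpu}}}{t_{\mathrm{cpu}} + t_{\mathrm{read}}} = 0.986
\qquad\Rightarrow\qquad
\mathrm{eff}_{\times 10} = \frac{0.986}{0.986 + 0.014/10} \approx 0.9986
\]

A tenfold faster read would therefore move the efficiency only from 98.6% to about 99.9%.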

Page 19: afs vs slashgrid

• The Manchester department uses AFS for
  – local experiment software installation
  – user shared data
• AFS measurements for comparison with slashgrid
  – the AFS cache was smaller than the amount of data read
  – so the job was reading from disk after the first pass anyway
• Slashgrid is not designed to do this.
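
One way to collect the real-vs-user comparison shown in the plots below; the macro name and paths are hypothetical:

  # wall-clock (real) vs CPU (user) time for the same analysis over AFS and over /grid
  time root -l -b -q 'analysis.C("/afs/hep.example.ac.uk/data/ntuples")'
  time root -l -b -q 'analysis.C("/grid/node001.example.ac.uk/data/ntuples")'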

[Plots: "afs time" and "slashgrid time", real vs user time in seconds per run (afs 0-800 s, slashgrid 0-900 s). Annotation: "AFS copying from cache".]

Page 20: Conclusions

• dcache is more complicated and difficult to manage, but it has 3 features that are difficult to beat
  – resilience
  – an srm front end
  – 5 different protocols
• xrootd is elegant and simple, but all the data management is in the users'/admins' hands, and the lack of an SRM front end makes it unappealing for the grid community.
• slashgrid could be useful for software distribution
  – for data reading it is still too slow
  – over AFS it has the advantage of being easy to install and maintain
• Speed within the cluster is comparable among the different protocols
  – applications like BaBar skim production demonstrate that very high speed is not required
  – however, more measurements are needed to better understand the different behaviours, especially when cluster features enter the equation