CSCS STORAGE INFRASTRUCTURE
CSCS HPS
Storage System Engineer
Stefano Claudio Gorini
CSCS GPFS FS

            /users & /apps           /project                      /store
Size        Small                    Very large                    Extreme
Quota       By user                  By group                      By consortium
Retention   User exit @ CSCS         Duration of project           As contractually agreed
            + 6 months               + 6 months
Bandwidth   Normal                   High                          High (if file is on disk)
Protection  Backed up                Backed up                     HSM
Capacity    100 GB per user          Requested and justified       By contract; either matching
                                     in a project proposal         funds or fully paid by customer
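A user can check how close they are to the 100 GB /users limit with the standard GPFS quota command; a minimal sketch, assuming the filesystem device is named "users" as in the mmremotefs listing further below:

~$ mmlsquota users

This reports block and file usage against the per-user limits for the invoking user.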
PROJECT FS – HW

- Data disks: ~2.4 PB (~1.4 PB + ~1 PB; 2 x 480 + 2 x 420 SATA 2 TB disks)
- Metadata disks: ~1 TB on SSD card
- TSM Storage Agent

[Diagram: GPFS NSD servers BERNINA01-05, BERNINA13-16, BERNINA22-23 and BERNINA25, attached to storage arrays GLOBAL 112-113, 116-117, 118-119 and 123-124.]
HOME & APPS FS – HW

- Data disks: 60 of 120 SATA 2 TB disks
- Metadata disks: 4 of 64 FC 500 GB disks
- TSM Storage Agent

[Diagram: GPFS NSD servers BERNINA10, BERNINA11 and BERNINA24, attached to storage array GLOBAL 114-115.]
STORE FS – HW

- Data disks: ~2.1 PB (3 x 300 SATA 3 TB disks)
- Metadata disks: ~500 GB on SSD card
- TSM Storage Agent

[Diagram: GPFS NSD servers ADULA05-06 and MEDEL01-14; SSD appliance RAMSAN1.]
GPFS - CNFS

~# mmremotefs show all
Local Name  Remote Name  Cluster name        Mount Point  Mount Options  Automount  Drive  Priority
global      global       global.cscs.ch      /global      rw             yes        -      0
apps        apps         globalhome.cscs.ch  /apps        rw             yes        -      0
users       users        globalhome.cscs.ch  /users       rw             yes        -      0
store       archive      store.cscs.ch       /store       rw             yes        -      0

NFS exports:
/global  *.cscs.ch(rw,async,no_root_squash)
/users   *.cscs.ch(rw,async,no_root_squash)
/apps    *.cscs.ch(rw,async,no_root_squash)
/store   *.cscs.ch(rw,async,no_root_squash)

Aliases used by CNFS: nfs01.cscs.ch, nfs02.cscs.ch, nfs03.cscs.ch, nfs04.cscs.ch
CNFS nodes: BERNINA07, BERNINA08, BERNINA20, BERNINA21
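From a client inside the cscs.ch domain, any of the CNFS aliases can serve a mount of these exports; a minimal client-side sketch (the choice of alias is arbitrary, since CNFS fails the address over between nodes):

~# mount -t nfs nfs01.cscs.ch:/users /users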
QFS/SAM-FS to GPFS TSM Migration

[Diagram: migration from QFS/SAM-FS to GPFS + TSM/HSM.]
QFS/SAM-FS to GPFS TSM Migration

1. Snapshot and migration of the metadata.
2. Production stays on the old system.
3. Bulk migration of data from the snapshot (checksum step sketched below):
   - Read tar file from SAM-FS/QFS.
   - Transfer data over the network using a parallel copy tool.
   - Verify data integrity after the network transfer using checksums taken from the old system before the migration started.
   - Untar to the new GPFS location.
4. Transition to production after a final synchronization of data.
5. Afterwards, clean up GPFS/TSM in production without access to the old tapes.

[Diagram: data flow SAM-FS → HMK tool → GPFS.]
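A minimal sketch of the verification step in 3, assuming checksum lists were recorded on the SAM-FS/QFS side before migration (tool choice, file names and paths are illustrative, not the actual procedure):

~$ cd /gpfs/store/migration/batch042
~$ md5sum -c /migration/checksums/batch042.md5

Any entry reported as FAILED marks a file to be re-transferred before the batch is accepted.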
QFS/SAM-FS to GPFS TSM Migration

Migrated between 02/09/2011 and 10/12/2011:
- ~26 M files
- ~650 TB
- Average speed ~7 TB/day (~650 TB over the ~99-day window)

Performance was bounded by device speed:
- Tape drive speed (T10000, max. 100 MB/s)
- Network speed (3 Gb/s, limited by a PCI-X card)
- GPFS throughput (the filesystem used to migrate the data delivered 4 GB/s)
DATA TOPOLOGY

[Bar chart: file-size ranges summarized in GB (Y) as a function of last access time (X). Size buckets: Y < 1 MB, 1 MB < Y < 100 MB, 100 MB < Y < 1 GB, 1 GB < Y < 10 GB, Y > 10 GB. Age buckets: X < 1 month, 1 month < X < 3 months, 3 months < X < 1 year, X > 1 year. Scale: 0 to 140,000 GB.]
TSM/HSM

- 3 TSM servers (TSM 6.3) + 1 spare, one active library manager
- 6 TSM storage agents
- 24 LTO tape drives
- 5,719 LTO5 slots and 5,719 cartridges: ~8.58 PB uncompressed
- Backup/restore capacity: 20 x 100 MB/s = 2,000 MB/s = 7.2 TB/h
- 4 additional drives for data management: reclaim, copy, move, DB backup

[Diagram: /users & /apps, /project and /store with HSM & backup clients feeding the GPFS storage agents and TSM servers (two TSM databases); active library manager plus spare.]
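Under TSM/HSM, a file on /store can be explicitly pushed to tape and pulled back with the standard TSM space-management commands; a minimal sketch (the path is illustrative):

~$ dsmmigrate /store/myconsortium/results.tar
~$ dsmrecall  /store/myconsortium/results.tar

Transparent recall on open covers the common case; explicit migrate/recall is useful for pre-staging large files, which is why the table above flags /store bandwidth as high only "if file is on disk".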
Open Issue on GPFS/TSM

MMBACKUP, the GPFS utility that drives backup using the filesystem policy engine, does not yet fully interpret the TSM warning/error catalog:

"Cannot reconcile shadow database.
Unable to compensate for all TSM errors in new shadow database.
Preserving previous shadow database.
Run next mmbackup with -q to synchronize shadow database. exit 12"

Rebuilding the shadow database takes ~10 hours.
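Following the message above, the shadow database is resynchronized on the next run; a minimal sketch (the filesystem name is illustrative):

~# mmbackup /project -q -t incremental

The -q option forces a full query of the TSM server to rebuild the shadow database, which is the ~10-hour step noted above.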
Future Plans

Add a new tiered GPFS "STAGE" filesystem:
- SSD & FC disk
- Data moved across disk groups by GPFS policy (see the sketch below)

Deploy a complete TSM replica (a TSM 6.3 feature).
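A minimal sketch of what such tiering could look like in the GPFS policy language; the pool names 'ssd' and 'fcdisk', the thresholds, and the device name "stage" are assumptions, not the actual configuration:

~# cat > stage.pol <<'EOF'
/* place new files on SSD until that pool is 90% full, then spill to FC disk */
RULE 'ssd-first' SET POOL 'ssd' LIMIT(90)
RULE 'spill' SET POOL 'fcdisk'
/* once the SSD pool passes 80% occupancy, migrate the coldest files down to 60% */
RULE 'demote' MIGRATE FROM POOL 'ssd' THRESHOLD(80,60)
    WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
    TO POOL 'fcdisk'
EOF
~# mmchpolicy stage stage.pol -I test

The -I test flag only validates the policy; installing it without the flag lets GPFS apply the placement rules and drive the threshold-based migration.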
Thanks!