ddn.com © 2016 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change.
Lustre Replication and Migration Tool
2016/09/20
DataDirect Networks Japan, Inc.
Shuichi Ihara ([email protected])
Background: Replication and Migration
▶ Data backup is important
  • There are many backup and replication requirements
  • Lustre data needs to be migrated from old systems to new ones
  • It is time to think about backup for Lustre systems of a few PB
  • Lustre replication is not ready yet
▶ Lustre tiering is possible
  • Data migration from fast HDD (or SSD) to slow disks is possible
    o This is different from fully automated HSM
  • User-space utilities are lacking
Lustre data replication and migration
▶ Data replication (backup)
  • Two independent and different namespaces (file systems)
  • Asynchronous file-level replication
▶ Data migration
  • Migrate data from one storage device to another type of device (e.g. SSD or fast HDD to slow HDD)
  • Keep the same metadata and namespace
  • Applications access data transparently even after migration
This talk introduces two utilities, ldsync and ldmigrate, for data replication and migration in Lustre.
Challenges on Backup and Replication
▶ Two major challenges for backup and replication on large file systems
  • "Delta" detection between file systems
    o Determined by file attributes (mtime, size, checksum, etc.)
    o Cost depends on how many files are in the file systems
  • Data transfer time
    o Copying many large files, as well as a large single shared file
    o Needs efficient resource allocation to maximize utilization
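To make the cost of attribute-based delta detection concrete, here is a minimal sketch of a scan-based comparison. It is not taken from any of the tools discussed; the function names `scan` and `delta` are hypothetical, and it compares only size and mtime. Both trees are walked in full, so the work grows with the total number of files.

```python
import os

def scan(root):
    """Walk a tree and return {relative_path: (size, mtime_ns)} for its files."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # one stat per file: this is the metadata load
            result[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return result

def delta(primary_root, backup_root):
    """Return (paths to copy, paths to delete) so that backup matches primary."""
    src = scan(primary_root)
    dst = scan(backup_root)
    to_copy = sorted(p for p, attrs in src.items() if dst.get(p) != attrs)
    to_delete = sorted(p for p in dst if p not in src)
    return to_copy, to_delete
```

Every run pays for a full walk of both namespaces even when only a handful of files changed.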
Major Copy Tools: RSYNC and DCP
▶ RSYNC
  • Has been maintained for more than 10 years and is packaged in Linux distributions
  • Supports many features, but lacks parallelization
▶ DCP (part of fileutils, http://fileutils.io)
  • Designed for scalability and performance
  • Started as a collaboration among several large US laboratories and DDN, which was involved from the beginning
  • Supports MPI and any POSIX file system
  • Manages files in chunks and allocates MPI ranks efficiently for the copy to maximize resource utilization
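The chunking idea can be sketched as follows. This is an illustration only: the function names are hypothetical, and the round-robin assignment is an assumption chosen for simplicity, not a description of how DCP actually schedules work across MPI ranks.

```python
def split_into_chunks(file_size, chunk_size):
    """Return (offset, length) pairs that together cover file_size bytes."""
    chunks = []
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks

def assign_round_robin(chunks, num_ranks):
    """Distribute chunks over ranks 0..num_ranks-1 (assumed policy)."""
    plan = {rank: [] for rank in range(num_ranks)}
    for i, chunk in enumerate(chunks):
        plan[i % num_ranks].append(chunk)
    return plan
```

Splitting by byte range rather than by file is what lets many ranks work on one large shared file at the same time.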
Accelerating file system delta detection
▶ "Diff" detection between two file systems
  • RSYNC takes a few hours for delta detection with tens of millions of inodes in the file system
  • DCP (DCMP) in fileutils is much faster, but still consumes a lot of time and puts pressure on metadata
▶ Lustre Changelog to the rescue
  • The Lustre Changelog records events that change the file system namespace or file metadata (timestamp, FID and operation)
  • It is kept on the MDT and can be fetched from a Lustre client
  • No more file system scanning except for the initial copy!
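As an illustration of what a Changelog record carries, the sketch below parses one text line into the fields named above. The sample line in the usage note is modeled on `lfs changelog` output but simplified; the exact field layout depends on the Lustre version, so treat the format, the `ChangelogRecord` class and `parse_record` as assumptions rather than as the parser ldsync uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangelogRecord:
    index: int       # record number in the Changelog
    op: str          # operation, e.g. CREAT, UNLNK, MTIME
    timestamp: str   # "date time" as printed in the record
    fid: str         # target FID
    name: str        # last path component (assumed present and space-free)

def parse_record(line):
    """Parse one record line in the assumed layout:
    index type time date flags t=[fid] p=[parent fid] name
    """
    fields = line.split()
    index = int(fields[0])
    op = fields[1][2:]  # drop the two-digit type code: "01CREAT" -> "CREAT"
    timestamp = fields[3] + " " + fields[2]
    fid = next(f[2:] for f in fields if f.startswith("t="))
    return ChangelogRecord(index, op, timestamp, fid, fields[-1])
```

For example, `parse_record("2 01CREAT 20:15:37.1138 2009.07.30 0x0 t=[0x200000402:0x2:0x0] p=[0x200000007:0x1:0x0] a.txt")` yields a `CREAT` record for FID `[0x200000402:0x2:0x0]`.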
ldsync : Parallel synchronization tool based on Lustre changelog
▶ A similar tool, lustre_rsync, exists, but...
  • It is single-threaded and still based on rsync
  • It has only partial changelog support
▶ ldsync is a replacement for lustre_rsync
  • A parallel synchronization framework that includes a Lustre changelog analyzer
  • The changelog analyzer walks through the Changelog and issues the minimum number of stat() calls needed to determine which files to sync
  • Flexible backend copy tool support (DCP for now, but any native copy tool is possible)
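One way a changelog analyzer could keep stat() calls to a minimum is to collapse all events for the same FID before touching the file system. The sketch below shows that idea; `plan_sync` is a hypothetical name and the coalescing rules are assumptions for illustration, not the actual ldsync logic. The operation names follow Lustre Changelog record types.

```python
def plan_sync(events):
    """Collapse (fid, op) events, given in Changelog order, into two lists.

    Returns (to_stat, to_unlink):
      to_stat   - FIDs that still exist and need one stat() and a copy
      to_unlink - FIDs removed on the primary that must be removed on the backup
    """
    first_op, last_op = {}, {}
    for fid, op in events:
        first_op.setdefault(fid, op)
        last_op[fid] = op
    to_stat, to_unlink = [], []
    for fid in sorted(last_op):
        if last_op[fid] == "UNLNK":
            # Created and deleted inside the same window: the backup never
            # saw the file, so nothing needs to be done.
            if first_op[fid] != "CREAT":
                to_unlink.append(fid)
        else:
            to_stat.append(fid)
    return to_stat, to_unlink
```

However many times a file was modified between two runs, it costs at most one stat() in this scheme.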
Architecture of ldsync
1. Fetch Changelog from MDT and analyze
2. Minimum stat() call to MDS to get additional metadata information
3. Copy files by "dcp" and also unlink files from backup file system by "drm"
4. Clear old Changelog
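The four steps above can be sketched as a single cycle in which the backends are passed in as callables. All five parameter names are hypothetical stand-ins (for fetching the Changelog, a stat()-style lookup, dcp, drm and clearing the Changelog); the sketch only illustrates the ordering, in particular that records are cleared after the copy and unlink have run.

```python
def sync_cycle(fetch, stat, copy, unlink, clear):
    """Run one synchronization pass and return the last record index handled.

    fetch()        -> list of (index, fid, op) in Changelog order   (step 1)
    stat(fid)      -> metadata needed to copy that file             (step 2)
    copy(items)    -> copy changed files to the backup              (step 3)
    unlink(fids)   -> remove deleted files from the backup          (step 3)
    clear(index)   -> drop Changelog records up to index            (step 4)
    """
    records = fetch()
    if not records:
        return 0
    changed, removed = set(), set()
    for _index, fid, op in records:
        if op == "UNLNK":
            changed.discard(fid)
            removed.add(fid)
        else:
            removed.discard(fid)
            changed.add(fid)
    items = [stat(fid) for fid in sorted(changed)]  # one lookup per live file
    copy(items)
    unlink(sorted(removed))
    last_index = records[-1][0]
    clear(last_index)  # only after the backup has been updated
    return last_index
```

If the copy raises an exception, `clear` is never reached, so the same records are seen again on the next pass.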
[Diagram: a Data Mover Node sits between the Primary Lustre file system and the Backup Lustre file system. The Data Mover mounts both the primary (read-only) and the backup file system.]
Delta detection speed in two file systems
[Figure: ldsync (Changelog) and rsync (without copy). X axis: number of files (million), 1.2 to 12. Y axis: time (sec), 0 to 1200. Series: ldsync, rsync. Annotation: 25x reduction.]
ldsync : Experimental performance (many small files)
Workload: create 1 million x 4 KB files and synchronize two file systems.
[Figure: bar chart of time (sec), 0 to 450, broken down into Changelog Scan, Changelog Analysis and Data Copy, for three configurations: ldsync (1 node); ldsync (1 node) + Optimized Changelog Walk; ldsync (8 nodes) + Optimized Changelog Walk. Annotations: Changelog Scanning (2 sec); 3x Reduction; 2.5x Reduction.]
ldsync : Experimental performance (1 TB single shared file)
[Figure: bar chart of time (sec), 0 to 5000. Bars: Rsync, DCP (1n1p), DCP (1n4p), DCP (1n8p), DCP (2n16p), DCP (4n32p), DCP (8n64p), DCP (16n128p), grouped as Single Client and Multiple Clients. Annotations: 9x reduction; copy speed ~80% of hardware bandwidth (10 GB/sec).]
Primary system: 1 x DDN ES7K (140 x NLSAS), 2 x FDR
Backup system: 1 x DDN ES7K (140 x NLSAS), 2 x FDR
ldmigrate : Parallel data migration tool for Lustre
▶ Keep metadata, but change OST object placement
  • e.g. migrate data from an SSD OST pool to HDD
  • "lfs migrate" can do it, but with limited scalability
▶ ldmigrate
  • Migrates OST objects to other OSTs (OST pool) based on the Lustre data layout (which determines how data is placed on OSTs)
  • Parallelization and scalability
  • Integration with a job scheduler is possible
Architecture of ldmigrate
[Diagram in two stages, each showing the MDT, OST pool "SSD" and OST pool "HDD", with FileA under /scratch/user/ and a TmpFile, each backed by an OST object.]
1. Parallel data copy: acquire a group lock and start copying data to tmp files with dcp.
2. Swap Lustre layout: after the copy finishes, swap the layout so that the FileA objects are now in the OSTs of OST pool "HDD", then remove the tmp files.
How to Set Up Tiered OST Pools and Migration
▶ Create OST pools for the different types of device
    [root@mds ~]# lctl pool_new scratch.SSD
    [root@mds ~]# lctl pool_new scratch.HDD
    [root@mds ~]# lctl pool_add scratch.SSD OST[0-9]
    [root@mds ~]# lctl pool_add scratch.HDD OST[a-13]
▶ Assign an OST pool to a directory
    [root@client ~]# lfs setstripe -p SSD /scratch/user
    [root@client ~]# lfs getstripe -p /scratch/user/file*
    SSD
▶ Copy and change the layout
    [root@client ~]# ldmigrate -g 100 -m -o SSD /scratch/user /scratch/tmp
    [root@client ~]# lfs getstripe -p /scratch/user/file*
    HDD
Conclusions
▶ Introduced ldsync and ldmigrate for backup and data migration.
▶ Demonstrated large performance improvements compared to existing tools and techniques: a 25x reduction in delta detection time versus rsync and a 9x reduction in copy time for a 1 TB single shared file.
▶ Still investigating several performance optimizations and stability issues. More testing is also required.
▶ These tools are still in a private repository on GitHub, but we plan to publish them as open source or push patches to fileutils.
Thank you!