Ajou University, South Korea
GCC 2003 Presentation
Dynamic Data Grid Replication Strategy based on Internet Hierarchy
Sang Min Park, Jai-Hoon Kim,
and Young-Bae Ko
Ajou University
South Korea
Contents
Introduction to Data Grid
Optimizations in Data Grid
Novel Replication Strategy based on Internet Hierarchy
Simulation
Simulation Results
Conclusions
Introduction to Data Grid
Data Grid Motivations
Petabyte-scale data production
Distributed data storage holding parts of the data
Distributed computing resources that process the data
Two Most Important Approaches for Data Grids
Secure, reliable, and efficient data transport protocol (e.g., GridFTP)
Replication (e.g., replica catalog)
Replication
Large files are partially replicated among sites
Reduce data access time
Application scheduling and dynamic replication issues are emerging
Introduction to Data Grid
Typical Job Execution Scenario
[Figure: Typical job execution scenario. Labels: User; Job Broker; Job Distribution; Master Site with Large-size Storage holding Data from Experiments (HEP, Bioinformatics, etc.); Initial replication; Grid sites, each with a CE (Computing Element) and/or SE (Storage Element), connected by the Internet and Data Grid; Data Fetch and Replication.]
Optimizations in Data Grid
Reducing the Overall Job Execution Time
Scheduling Optimization
Deciding where to allocate the job
Considering the location of replicas and the computational capabilities of sites
Short-term Optimization
Deciding from where to fetch replicas
Considering the available network bandwidth between sites
Using an auction protocol to trigger Long-term Optimization
Site-level locality based on file access patterns
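The scheduling and short-term optimizations above can be sketched as follows. This is a minimal illustration, not OptorSim's optimizer: it assumes every file is 1 GB, that each site has a fixed hypothetical per-job compute time, and that the available bandwidth between each pair of sites is known. The names `Site`, `best_source`, and `schedule` are hypothetical.

```python
from dataclasses import dataclass, field

FILE_SIZE_GB = 1.0  # assumption: every file is 1 GB, as in the simulation parameters


@dataclass
class Site:
    name: str
    compute_time: float                       # hypothetical per-job processing time (sec)
    files: set = field(default_factory=set)   # replicas held in this site's SE


def transfer_time(size_gb: float, bandwidth_mbps: float) -> float:
    """Seconds to move size_gb over a link of bandwidth_mbps (1 GB = 8000 Mb)."""
    return size_gb * 8000 / bandwidth_mbps


def best_source(fname, dest, sites, bw):
    """Short-term optimization: fetch from the holder with the most bandwidth to dest."""
    if fname in dest.files:
        return dest, 0.0                      # already local, nothing to fetch
    holders = [s for s in sites if fname in s.files]
    src = max(holders, key=lambda s: bw[frozenset((s.name, dest.name))])
    return src, transfer_time(FILE_SIZE_GB, bw[frozenset((src.name, dest.name))])


def schedule(job_files, sites, bw):
    """Scheduling optimization: minimize compute time plus total fetch time."""
    def estimate(site):
        fetch = sum(best_source(f, site, sites, bw)[1] for f in job_files)
        return site.compute_time + fetch
    return min(sites, key=estimate)


# Hypothetical example: three sites and symmetric link bandwidths (Mbps).
a = Site("A", 100, {"f1", "f2"})
b = Site("B", 50)
c = Site("C", 80, {"f2"})
sites = [a, b, c]
bw = {frozenset(("A", "B")): 1000, frozenset(("A", "C")): 500, frozenset(("B", "C")): 2000}

print(schedule(["f1", "f2"], sites, bw).name)  # B: 50 + 8 (f1 from A) + 4 (f2 from C)
```

Site A already holds both files but computes slowly (100 s), while site B pays 12 s of transfers and still finishes first (62 s), which is why the scheduler has to weigh replica location and computational capability together. The sketch assumes every file has at least one replica somewhere.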
Existing Dynamic Replication Strategies
The Limitations of Site-level Optimization
A site's storage size is limited, so the locality of data requests it can exploit is also limited
Site-level optimization relies on predictable file access patterns, but there is no guarantee that such patterns exist
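The site-level strategies that BHR is compared against in the results, Delete LRU and Delete Oldest, can be sketched as eviction policies on a single site's storage. This is a simplified model, not OptorSim's code: it assumes equal-size files and that every miss triggers local replication. Either way, a site decides using only its own storage and its own request history, which is the limitation described on this slide. `StorageElement` is a hypothetical name.

```python
from collections import OrderedDict


class StorageElement:
    """A site's SE holding at most `capacity` equal-size files (sketch)."""

    def __init__(self, capacity: int, policy: str = "lru"):
        if policy not in ("lru", "oldest"):
            raise ValueError("policy must be 'lru' or 'oldest'")
        self.capacity = capacity
        self.policy = policy
        # Front of the dict = next eviction victim (least recently used, or oldest replica).
        self.files = OrderedDict()

    def access(self, fname: str) -> bool:
        """Return True on a local hit; on a miss, replicate the file here, evicting if full."""
        if fname in self.files:
            if self.policy == "lru":
                self.files.move_to_end(fname)   # refresh recency; "oldest" ignores reuse
            return True
        if len(self.files) >= self.capacity:
            self.files.popitem(last=False)      # delete the front entry
        self.files[fname] = True
        return False


trace = ["a", "b", "a", "c", "b"]
for policy in ("lru", "oldest"):
    se = StorageElement(capacity=2, policy=policy)
    hits = sum(se.access(f) for f in trace)
    print(policy, hits, list(se.files))
# lru 1 ['c', 'b']
# oldest 2 ['b', 'c']
```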
Replication Strategy based on Bandwidth Hierarchy (BHR)
Network-level Locality
A site is not the only possible source of locality
Another source of locality: network-level locality
If a replica is located at a nearby site, it can be fetched without a long delay
[Figure: Fast replica transmission within a network region (e.g., a country).]
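A minimal sketch of how network-level locality might be exploited when looking for a replica, assuming each site's region membership is known and that any site in the same region counts as "close". The function `locate_replica` and the data layout are hypothetical.

```python
def locate_replica(fname, site, region_of, holdings):
    """Find a source for fname, preferring local, then same-region, then other-region sites."""
    if fname in holdings[site]:
        return site, "local"
    for s, files in holdings.items():
        if s != site and fname in files and region_of[s] == region_of[site]:
            return s, "intra-region"
    for s, files in holdings.items():
        if fname in files and region_of[s] != region_of[site]:
            return s, "inter-region"
    return None, "missing"


region_of = {"a1": "A", "a2": "A", "b1": "B", "master": "M"}
holdings = {"a1": set(), "a2": {"x"}, "b1": {"x", "y"}, "master": {"x", "y", "z"}}

print(locate_replica("x", "a1", region_of, holdings))  # ('a2', 'intra-region')
print(locate_replica("y", "a1", region_of, holdings))  # ('b1', 'inter-region')
```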
Replication Strategy based on Bandwidth Hierarchy (BHR)
Bandwidth Hierarchy
[Figure: Bandwidth hierarchy. Legend: Grid Site, Router, Network Region. Data moving between sites within a region stays inside the region; data moving between sites across regions crosses narrow-bandwidth links carrying contending network traffic.]
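To make the effect of the hierarchy concrete, the sketch below estimates transfer time along a path. It assumes a simple fair-share model in which each link is divided equally among its concurrent flows, so the path runs at its narrowest per-flow share. The 1000 Mbps link speeds come from the simulation parameters; the four contending flows are a hypothetical example.

```python
def path_transfer_seconds(size_gb, links):
    """Transfer time over a path; links is a list of (bandwidth_mbps, concurrent_flows).

    Each link is assumed to be shared equally among its concurrent flows, so the
    path runs at the smallest per-flow share (the bottleneck).
    """
    bottleneck_mbps = min(bw / flows for bw, flows in links)
    return size_gb * 8000 / bottleneck_mbps   # 1 GB = 8000 Mb (decimal units)


# 1 GB file over 1000 Mbps links.
intra = path_transfer_seconds(1, [(1000, 1), (1000, 1)])             # site -> router -> site
inter = path_transfer_seconds(1, [(1000, 1), (1000, 4), (1000, 1)])  # inter-region link shared by 4 flows
print(intra, inter)  # 8.0 32.0
```

Even with identical nominal link speeds, contention on the shared inter-region link makes the cross-region fetch four times slower in this example.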
Replication Strategy based on Bandwidth Hierarchy (BHR)
Maximizing Network-level Locality
1. Avoiding replica duplication within a region
2. Considering the popularity of file requests at the region level
[Figure: A site in a region receives a new replica (A) but has no space left, so it must remove some file. Replica X is duplicated at another site in the same region, so the site deletes its own copy of X.]
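The two BHR principles can be combined into a replacement decision roughly as follows. This is a sketch under stated assumptions, not the authors' optimizer: the exact ordering (duplicates first, then region-level popularity) and the function name `choose_victim` are hypothetical, and region-wide request counts are assumed to be available to every site.

```python
from collections import Counter


def choose_victim(site, region_sites, holdings, region_requests):
    """Pick a file to delete at `site` to make room for a new replica.

    Hypothetical rule combining the two BHR ideas:
      1. prefer files that another site in the same region already stores;
      2. among the candidates, delete the one least requested across the region.
    """
    mine = holdings[site]
    duplicated = {f for f in mine
                  if any(f in holdings[s] for s in region_sites if s != site)}
    candidates = duplicated or mine          # fall back to all local files
    # Counter returns 0 for unseen files; break ties by name for determinism.
    return min(candidates, key=lambda f: (region_requests[f], f))


region = ["s1", "s2"]
requests = Counter({"X": 10, "W": 5, "Y": 3, "Z": 1})

holdings = {"s1": {"X", "Y", "Z"}, "s2": {"X", "W"}}
print(choose_victim("s1", region, holdings, requests))  # X: duplicated at s2, even though popular

holdings = {"s1": {"X", "Y", "Z"}, "s2": {"W"}}
print(choose_victim("s1", region, holdings, requests))  # Z: no duplicates, least popular in region
```

The first call mirrors the figure: X is deleted because another site in the region can still serve it at intra-region speed.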
Simulation
OptorSim
Data Grid dynamic replication simulation tool
Developed as part of the European Data Grid Project
Implemented in Java
Implemented our own region-based optimizer in OptorSim
Simulation
Simulation Environment
[Figure: Simulated topology with four network regions (A, B, C, D) and a Master Site. Legend: Grid Site, Router, Intra-region link, Inter-region link, Link to master.]
Simulations
General configuration of parameters
Parameter                           Value
Number of jobs                      1000
Number of job types                 50
Number of files accessed per job    15
Size of single file                 1 GB
Total size of files                 750 GB

Bandwidth and Storage Size
Parameter                           Value
Intra-region bandwidth              1000 Mbps
Inter-region bandwidth              1000 Mbps
Master-router bandwidth             2000 Mbps
Storage space at site               50 GB
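A quick check of what these parameters imply, using only the values in the tables. With 1 GB files, one 50 GB storage element holds 50 of the 750 files, about 6.7% of the dataset. Also, 50 job types times 15 files gives exactly 750, which is consistent with (though the tables do not state) each job type accessing its own set of files.

```python
jobs, job_types, files_per_job = 1000, 50, 15
file_gb, total_gb, site_storage_gb = 1, 750, 50

total_files = total_gb // file_gb                        # distinct files in the dataset
files_per_job_types = job_types * files_per_job          # file accesses across all job types
data_per_job_gb = files_per_job * file_gb                # data read by one job
total_requested_gb = jobs * data_per_job_gb              # data requested over the whole run
site_fraction = site_storage_gb / total_gb               # share of the dataset one SE can hold

print(total_files, files_per_job_types, data_per_job_gb, total_requested_gb)  # 750 750 15 15000
print(f"{site_fraction:.1%}")                                                 # 6.7%
```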
Simulation Results
[Figure: Total job times of three strategies. X-axis: Dynamic Replication Strategy; y-axis: Total Job Execution Time (sec).]
BHR: 33174.9 sec
Delete LRU: 47962.2 sec
Delete Oldest: 50070.9 sec
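The chart values can be turned into relative savings with simple arithmetic. The sketch below uses only the three totals reported above; the helper name `reduction` is illustrative. By this calculation BHR needs roughly 31% less total job time than Delete LRU and roughly 34% less than Delete Oldest.

```python
totals = {"BHR": 33174.9, "Delete LRU": 47962.2, "Delete Oldest": 50070.9}  # sec, from the chart


def reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of total job time relative to the baseline."""
    return (baseline - improved) / baseline


for name in ("Delete LRU", "Delete Oldest"):
    print(f"BHR vs {name}: {reduction(totals[name], totals['BHR']):.1%} less total job time")
# BHR vs Delete LRU: 30.8% less total job time
# BHR vs Delete Oldest: 33.7% less total job time
```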
Simulation Results
[Figure: Total job time with varying bandwidth and storage size. First panel: x-axis Bandwidth of Inter-region link (600 to 2000), y-axis Total Job Time (sec). Second panel: x-axis Size of SE (30 to 130), y-axis Total Job Time (sec). Legend in both panels: BHR, Delete LRU, Delete Oldest.]
Conclusions
Existing dynamic replication strategies are based only on the site-level locality of file requests
The BHR strategy is based on network-level locality
BHR performs quite well when the bandwidth hierarchy is clearly pronounced and the storage size at each site is small
We extend current site-level replica optimization studies in a more scalable way
References
William H. Bell, David G. Cameron, Luigi Capozza, A. Paul Millar, Kurt Stockinger, and Floriano Zini: Simulation of Dynamic Grid Replication Strategies in OptorSim. In Proc. of the 3rd Int'l IEEE Workshop on Grid Computing (Grid'2002), Baltimore, USA, November 2002. Springer Verlag, Lecture Notes in Computer Science.
William H. Bell, David G. Cameron, Ruben Carvajal-Schiaffino, A. Paul Millar, Kurt Stockinger, and Floriano Zini: Evaluation of an Economy-Based File Replication Strategy for a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at CCGrid 2003, Tokyo, Japan, May 2003. IEEE Computer Society Press.
Mark Carman, Floriano Zini, Luciano Serafini, and Kurt Stockinger: Towards an Economy-Based Optimisation of File Access and Replication on a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at International Symposium on Cluster Computing and the Grid (CCGrid'2002), Berlin, Germany, May 2002. IEEE Computer Society Press.
Ann Chervenak, Ian Foster, Carl Kesselman, Charles Salisbury, and Steven Tuecke: The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. Journal of Network and Computer Applications, 23:187-200, 2001.
EU Data Grid Project: http://www.eu-datagrid.org
I. Foster, C. Kesselman, and S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 15(3), 2001.
Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger, and Kurt Stockinger: Data Management in an International Data Grid Project. 1st IEEE/ACM International Workshop on Grid Computing (Grid'2000), Bangalore, India, Dec 2000.
OptorSim – A Replica Optimizer Simulation: http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html
Sang-Min Park and Jai-Hoon Kim: Chameleon: A Resource Scheduler in a Data Grid Environment. 2003 IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'2003), Tokyo, Japan, May 2003. IEEE Computer Society Press.
Kavitha Ranganathan and Ian Foster: Design and Evaluation of Dynamic Replication Strategies for a High Performance Data Grid. International Conference on Computing in High Energy and Nuclear Physics, Beijing, September 2001.
Kavitha Ranganathan and Ian Foster: Identifying Dynamic Replication Strategies for a High Performance Data Grid. International Workshop on Grid Computing, Denver, November 2001.