MONARC Project Status Report
http://www.cern.ch/MONARC
Harvey Newman, California Institute of Technology
http://l3www.cern.ch/monarc/monarc_lehman151100.ppt
DOE/NSF Joint Review of Software and Computing, BNL
November 15, 2000
PROJECT GOALS
Develop "Baseline Models"
Specify the main parameters characterizing the Model's performance: throughputs, latencies
Verify resource requirement baselines: computing, data handling, networks

TECHNICAL GOALS
Define the Analysis Process
Define RC Architectures and Services
Provide Guidelines for the final Models
Provide a Simulation Toolset for further Model studies
[Figure: Model circa 2005 or 2006. CERN (700k SI95, 1000+ TB disk; robot), Tier1 centres such as FNAL/BNL (167k SI95, 650 TByte disk; robot), Tier2 centres (~35k SI95, ~100 TB disk; robot) and universities (Univ 1, Univ 2, ..., Univ M), connected by 622 Mbits/s, 2.5 Gbits/s and N x 2.5 Gbits/s links.]
MONARC History
Spring 1998      First Distributed Center Models (Bunn; Von Praun)
6/1998           Presentation to LCB; Project Assignment Plan
Summer 1998      MONARC Project Startup (ATLAS, CMS, LHCb)
9-10/1998        Project Execution Plan; Approved by LCB
1/1999           First Analysis Process to be Modeled
2/1999           First Java Based Simulation Models (I. Legrand)
Spring 1999      Java2 Based Simulations; GUI
4/99; 8/99; 12/99  Regional Centre Representative Meetings
6/1999           Mid-Project Progress Report, including MONARC Baseline Models
9/1999           Validation of MONARC Simulation on Testbeds; Reports at LCB Workshop (HN, I. Legrand)
1/2000           Phase 3 Letter of Intent (4 LHC Experiments)
2/2000           Six Papers and Presentations at CHEP2000; Begin Studies with Tapes
Spring 2000      MONARC Model Recognized by Hoffmann WWC Panel; Basis of Data Grid Efforts in US and Europe
MONARC Working Groups/Chairs
Analysis Process Design WG: P. Capiluppi (Bologna, CMS)
Studied the analysis workload, job mix and profiles, and the time to complete the reconstruction and analysis jobs. Worked with the Simulation WG to verify that the resources specified in the models could handle the workload.

Architectures WG: Joel Butler (FNAL, CMS)
Studied the site and network architectures, the operational modes and services provided by Regional Centres, the data volumes stored and analyzed, and candidate architectures for CERN, Tier1 (and Tier2) Centres.

Simulation WG: K. Sliwa (Tufts, ATLAS)
Defined the methodology; then (I. Legrand et al.) designed, built and further developed the simulation system as a toolset for users. Validated the simulation with the Testbeds group.

Testbeds WG: L. Luminari (Rome, ATLAS)
Set up small and larger prototype systems at CERN, at several INFN and US sites, and in Japan, and used them to characterize the performance of the main elements that could limit throughput in the simulated systems.

Steering Group: Laura Perini (Milan, ATLAS); Harvey Newman (Caltech, CMS)
Tests:
(1) Machine A local (2 CPU)
(2) Machine C local (4 CPU)
(3) Machine A (client) and Machine C (server)
Number of client processes: 1, 2, 4, ..., 32

[Figure: timelines of raw-data jobs for cases (1), (2) and (3), each job alternating CPU and I/O phases per event on Machines A and C. Measured points include CPU 17.4 SI95 with I/O 207 MB/s on a 54 MB file, and CPU 14.0 SI95 with I/O 31 MB/s on a 54 MB file.]
MONARC Simulation System: Multitasking Processing Model

"Interrupt"-driven scheme: for each new task, or when one task finishes, an interrupt is generated and all "times to completion" are recomputed.

It provides:
- An easy way to apply different load-balancing schemes
- An efficient mechanism to simulate multitask processing

Active tasks (CPU, I/O, network) are assigned to Java threads; concurrently running tasks share resources (CPU, memory, I/O).
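The recompute-on-interrupt idea can be sketched as a tiny processor-sharing simulator (an illustrative toy with assumed units, not the actual MONARC toolset code): all tasks on one CPU get equal shares, and whenever a task completes, the remaining times to completion are recomputed for the new number of sharers.

```java
import java.util.*;

/** Toy processor-sharing simulator illustrating the "interrupt"-driven scheme.
 *  A sketch under assumed units; not the MONARC simulation code. */
public class InterruptSim {
    /** Finish times of tasks that all start at t=0 on one CPU of power `cpu`
     *  (work units per second). An "interrupt" fires whenever a task completes,
     *  and every remaining time-to-completion is then recomputed. */
    static double[] finishTimes(double[] workArr, double cpu) {
        List<double[]> active = new ArrayList<>();      // {task index, remaining work}
        for (int i = 0; i < workArr.length; i++) active.add(new double[]{i, workArr[i]});
        double[] finish = new double[workArr.length];
        double now = 0.0;
        while (!active.isEmpty()) {
            int n = active.size();
            // time to completion of each active task at the current share cpu/n
            double dtMin = Double.MAX_VALUE;
            for (double[] t : active) dtMin = Math.min(dtMin, t[1] * n / cpu);
            now += dtMin;
            double consumed = dtMin * cpu / n;          // work done per task
            Iterator<double[]> it = active.iterator();
            while (it.hasNext()) {
                double[] t = it.next();
                t[1] -= consumed;
                if (t[1] <= 1e-12) { finish[(int) t[0]] = now; it.remove(); }
            }
        }
        return finish;
    }

    public static void main(String[] args) {
        // Two tasks of 10 and 30 work units on a CPU of power 10 units/s:
        // both share until t=2 (task 1 done), then task 2 runs alone until t=4.
        System.out.println(Arrays.toString(finishTimes(new double[]{10, 30}, 10))); // [2.0, 4.0]
    }
}
```

The point of the scheme is visible in the loop: no fixed time step is needed; the simulator jumps directly from one interrupt to the next.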
Example: Physics Analysis at Regional Centres

Similar data processing jobs are performed in each of several RCs
There is a profile of jobs, each submitted to a job scheduler
Each Centre has the "TAG" and "AOD" databases replicated
The Main Centre provides the "ESD" and "RAW" data
Each job processes AOD data, and also a fraction of the ESD and RAW data
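As a rough illustration of what this access pattern implies, the sketch below computes the volume a single job reads from the local TAG/AOD replicas versus the volume it pulls from the Main Centre's ESD/RAW stores. Every event count, object size and fraction here is an assumed placeholder for illustration, not a MONARC requirements figure.

```java
/** Back-of-envelope data-volume sketch for one analysis job.
 *  All sizes and fractions below are illustrative assumptions. */
public class JobDataVolume {
    /** GB read locally: every event's TAG and AOD objects (sizes in KB/event). */
    static double localRead(double nEvents, double tagKB, double aodKB) {
        return nEvents * (tagKB + aodKB) / 1e6;
    }
    /** GB read from the Main Centre: only a fraction of events need ESD/RAW. */
    static double remoteRead(double nEvents, double esdKB, double rawKB,
                             double esdFrac, double rawFrac) {
        return nEvents * (esdKB * esdFrac + rawKB * rawFrac) / 1e6;
    }
    public static void main(String[] args) {
        double n = 1e7;  // events per job (assumed)
        System.out.printf("local %.1f GB, remote %.1f GB%n",
            localRead(n, 0.1, 10),                    // 0.1 KB TAG, 10 KB AOD
            remoteRead(n, 100, 1000, 0.01, 0.001));   // 1% of ESD, 0.1% of RAW
    }
}
```

Even with generous ESD/RAW object sizes, small access fractions keep the remote volume well below the local one, which is the rationale for replicating only TAG and AOD at each Centre.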
Total Time for Jet & Muon Production Jobs
Job Scheduling in Distributed Systems: Self-Organizing Neural Network (SONN)
Devising efficient job scheduling policies for large distributed systems that evolve dynamically is one of the challenging tasks in HEP computing.

The scheduler must analyze a large number of parameters describing the jobs and the time-dependent state of the system. The problem becomes harder when not all of these parameters are correctly identified, and when knowledge about the system state is incomplete and/or available only with a certain delay.

The aim of this study is to develop tools able to generate effective job scheduling policies in distributed architectures, based on a "Self-Organizing Neural Network" (SONN) system that can dynamically learn and cluster information in a high-dimensional parameter space: an adaptive middleware layer, aware of the currently available resources and learning from "past experience", that develops decision rules heuristically.

We applied the SONN approach to the problem of distributing jobs among regional centres. This job scheduling procedure has been evaluated with the MONARC Simulation Tool.
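The flavor of such an adaptive layer can be conveyed by a greatly simplified stand-in (not the actual SONN, which clusters a much larger parameter space): a dispatcher that keeps a moving-average response-time estimate per regional centre, sends the next job to the current best estimate, and updates the estimate from each observed outcome.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal "learning from past experience" dispatcher.
 *  A stand-in sketch, far simpler than the SONN described in the text. */
public class AdaptiveDispatcher {
    final Map<String, Double> estimate = new HashMap<>(); // site -> estimated response time
    final double alpha = 0.3;                             // learning rate (assumed value)

    /** Send the next job to the site with the lowest estimated response time. */
    String pickSite() {
        return estimate.entrySet().stream()
            .min(Map.Entry.comparingByValue())
            .get().getKey();
    }

    /** Update the site's estimate with an exponential moving average
     *  of the observed response time. */
    void observe(String site, double responseTime) {
        estimate.merge(site, responseTime,
            (old, obs) -> old + alpha * (obs - old));
    }

    public static void main(String[] args) {
        AdaptiveDispatcher d = new AdaptiveDispatcher();
        d.observe("CERN", 12.0);     // observed response times (made-up numbers)
        d.observe("Caltech", 7.0);
        System.out.println(d.pickSite()); // Caltech currently looks best
    }
}
```

As with the SONN, the decision rule is never fixed: each completed job shifts the estimates, so the dispatcher tracks a system whose state is only partially and belatedly known.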
Intuitive Scheduling Model

[Diagram: an "intuitive scheduling scheme" combines the job description parameters {J}, the external RCs' state description {R}, the local RC state {S}, and accumulated knowledge & experience (+ constants) into a job scheduling decision {D}; the job is then executed and evaluated, quantifying its performance {X}.]
A simple toy example

We assume that the time to execute a job in the local farm, which has a certain load (λ), is:

t_l = t_0 · (1 + f(λ))

where t_0 is the theoretical time to perform the job and f(λ) describes the effect of the farm load on the job execution time.

If the job is executed on a remote site, an extra factor, the remote farm's load (λ_r), enters the response time:

t_r = t_0 · (1 + f(λ_r))
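Plugging in numbers makes the local-versus-remote trade-off concrete. In this sketch the load penalty f is taken to be linear in the load, and all the numbers are assumed values for illustration only:

```java
/** Numeric illustration of the toy example: compare the local execution
 *  time t_l = t0*(1 + f(load)) with the remote one t_r = t0*(1 + f(remoteLoad)).
 *  The linear shape f(load) = load is an assumption made for this sketch. */
public class ToyDecision {
    static double f(double load) { return load; }   // assumed load-penalty shape
    static double tLocal(double t0, double load)       { return t0 * (1 + f(load)); }
    static double tRemote(double t0, double remoteLoad) { return t0 * (1 + f(remoteLoad)); }

    public static void main(String[] args) {
        double t0 = 100;  // theoretical job time, arbitrary units
        // a busy local farm (load 0.8) vs. a lightly loaded remote one (load 0.2):
        System.out.printf("local %.1f, remote %.1f%n",
            tLocal(t0, 0.8), tRemote(t0, 0.2));
        System.out.println(tRemote(t0, 0.2) < tLocal(t0, 0.8)
            ? "export the job" : "run locally");
    }
}
```

In practice the comparison is not this clean, since the loads are known imperfectly and with delay, which is precisely what motivates a learning scheduler.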
Evaluating the SONN Scheduling Scheme with the MONARC Simulation
[Diagram: in the simulation, the "Self-Organizing Neural Net" takes the place of the intuitive scheduling scheme as the source of the scheduling DECISION for the stream of jobs generated by the simulated activities and dispatched to the simulated RCs.]
Warming up
The learning process of the Self Organizing Network dynamically adapts itself to changes in the system configuration.
It may require a "classical" scheduling algorithm as a starting point, with the aim of dynamically improving on it over time.
2 RCs Learning to Export Jobs
[Figure, Day 9: three Regional Centres, CERN (30 CPUs), Caltech (25 CPUs) and KEK (20 CPUs, no activity), linked at 0.8 MB/s with 200 ms RTT; mean efficiencies <E> = 0.30, 0.70 and 0.69.]
2 RCs Learning to Export Jobs

[Figure: job-export patterns among Caltech, KEK and CERN on Days 0, 1, 2, 6 and 9.]
May 2000   CMS HLT simulation
           http://www.cern.ch/MONARC/sim_tool/Publish/CMS/publish/
           http://home.cern.ch/clegrand/MONARC/CMS_HLT/sim_cms_hlt.pdf

June 2000  Tape usage study
           http://www.cern.ch/MONARC/sim_tool/Publish/TAPE/publish/

Aug 2000   Update of the simulation tool for large-scale simulations
           http://home.cern.ch/clegrand/MONARC/WSC/wsc_final.pdf
           (to be presented at the IEEE Winter Simulation Conference, WSC2000)
           http://home.cern.ch/clegrand/MONARC/ACAT/sim.ppt

Oct 2000   A study in using SONN for job scheduling
           http://www.cern.ch/MONARC/sim_tool/Publish/SONN/publish/
           http://home.cern.ch/clegrand/MONARC/ACAT/sonn.ppt

Nov 2000   Update of the CMS computing needs: based on the new requirements data, update the baseline models for CMS computing

Dec 2000   Simulation of the current CMS High Level Trigger production
Jan 2001   Update of the MONARC Simulation System: new release, including dynamic scheduling and replication modules (policies); improved documentation

Feb 2001   Role of disks and tapes in Tier1 and Tier2 Centres: more elaborate studies to describe the Tier2-Tier1 interaction and to evaluate data storage needs

May 2001   Complex Tier0-Tier1-Tier2 simulation; study the role of Tier2 centres. The aim is to perform a complete CMS data processing scenario including all major tasks distributed among regional centres

Jul 2001   Real SONN module for job scheduling, based on Mobile Agents: create a Mobile Agents framework able to provide the basic mechanism for scheduling between regional centres

Sep 2001   Add monitoring agents for network and system states, based on SNMP: collect system-dependent parameters using SNMP and integrate them into the mobile agents used for scheduling

Dec 2001   Study of the correlation between data replication and job scheduling: combine the scheduling policies with data replication to optimize different cost functions; integrate this into the Mobile Agents framework
MONARC Status

MONARC is on the way to specifying baseline Models representing cost-effective solutions to LHC Computing.

MONARC's Regional Centre hierarchy model has been accepted by all four LHC Experiments, and is the basis of HEP Data Grid work.

A powerful simulation system has been developed, and is being used both for further Computing Model and strategy development, and for Grid-component studies.

There is strong synergy with other advanced R&D projects: PPDG, GriPhyN, EU HEP Data Grid, ALDAP and others.

Example Computing Models have been provided, and are being updated; this is important input for the Hoffmann LHC Computing Review.

The MONARC Simulation System is now being applied to key Data Grid issues, and to Grid-tool design and development.
MONARC Future: Some "Large" Grid Issues to Be Studied

Query estimation and transaction design for replica management
Queueing and co-scheduling strategies
Strategy for the use of tapes
Strategy for resource sharing among sites and activities
Packaging of object collections into blocks for transport across networks; integration with databases
Effect on networks of large windows, QoS, etc.
Behavior of the Grid Services to be developed
From UserFederation to Private Copy

[Diagram from the ORCA 4 tutorial, part II, 14 October 2000: federation components (CD, CH, MD, MH, TH, MC, TD), an AMS server, the UF.boot and MyFED.boot files, and a UserCollection, illustrating the copy from the User Federation to a private copy.]