EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Ganga Tutorial
Hurng-Chun Lee
EGEE Tutorial, Manchester 2
Agenda
• Part I: Ganga introduction
• Part II: Ganga hands-on
• Part III: More about Ganga
Part I: Ganga Introduction
Ganga overview
Motivation
• In practice users deal with multiple computing backends
[Figure: user communities (LHCb users, ATLAS users, Biomed, ...) and the computing backends they deal with (PBS, LSF, PANDA, SGE, local PC)]
Motivation
• FAQ: running applications on multiple computing backends
– I must learn many interfaces
– How to configure my applications?
– Do I get a consistent view on all my jobs?
User requirements
• User requirements
– interact with all backend systems in a very similar way
  • submit, kill, monitor jobs
– configure the applications easily and transparently across desired backends
– organize work
  • job history: keep track of what the user did
  • save job outputs in a consistent way
  • reuse the configuration of previously submitted jobs
Ganga
Wish list
• Wish list
– easy to learn and use
– powerful: not limiting the features of the backends
– close to the application (which is typically compiled locally)
– not imposing a single working style: scripts, command line, GUI, ...
Ganga
Considerations of scientific application development
• Realities
– The computing environment is heterogeneous
– Computing technology is evolving
– User requirements are also evolving
• Requirement
– Application users prefer to learn as few tools as possible; those tools should be light-weight, handy and well integrated with each other.
• Ganga tries to answer the questions:
– How to minimize the developer's effort in building applications?
– How to minimize the user's effort in running applications?
Ganga
• Ganga: Job Management Interface
– a utility which you download to your computer, or which is already installed at your institute in a shared area
  • for example: /nfs/sw/ganga/install/4.2.14
– it is an add-on to installed software
– comes with a set of plugins for ATLAS and LHCb
– open: other applications and backends may be easily added
  • even by users
[Figure: the Ganga framework with its application plugins and backend plugins; other labels: Ganga, application software, LSF client, LCG UI]
Ganga Job
Where the Ganga journey starts …
[Figure: the components of a Ganga job, each marked as mandatory or optional]
Plug-in based design
Common interface
Specific implementation
Ease user’s experience in switching between different technologies
Concentrate developer’s effort in specific domain
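The idea of a common interface with backend-specific implementations can be sketched in plain Python. This is only an illustration of the pattern, not Ganga's actual plugin code: the class names Backend, LocalBackend and BatchBackend, and the run_everywhere helper, are hypothetical.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Common interface that every backend plugin implements (hypothetical)."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Submit a command and return a backend-specific job identifier."""


class LocalBackend(Backend):
    """Specific implementation: pretend to run on the local machine."""

    def submit(self, command: str) -> str:
        return f"local:{command}"


class BatchBackend(Backend):
    """Specific implementation: pretend to send the command to a batch queue."""

    def __init__(self, queue: str) -> None:
        self.queue = queue

    def submit(self, command: str) -> str:
        return f"batch[{self.queue}]:{command}"


def run_everywhere(command, backends):
    """User code depends only on the common interface, not on the plugin type."""
    return [backend.submit(command) for backend in backends]


results = run_everywhere("echo hello", [LocalBackend(), BatchBackend("short")])
print(results)
```

Switching technology then means passing a different Backend object; the calling code does not change, which is the user-side benefit the slide describes.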
Applications & backends
The Ganga development team
- Support for development work
- Core team: F. Brochu (Cambridge), U. Egede (Imperial), J. Elmsheuser (Munich), K. Harrison (Cambridge), H.C. Lee (ASGC Taipei), D. Liko (CERN), A. Maier (CERN), J.T. Moscicki (CERN), A. Muraru (Bucharest), W. Reece (Imperial), A. Soroko (Oxford), C.L. Tan (Birmingham)
- Ganga is an ATLAS/LHCb joint project
Part I: Introduction to Ganga
Using Ganga
Download, Install, First launch
Download & Install
# download the installer
wget http://ganga.web.cern.ch/ganga/download/ganga-install
# --prefix: installation prefix; --extern: external modules to install; 4.2.8: Ganga version
python ganga-install \
    --prefix=~/opt/ganga \
    --extern=GangaAtlas,GangaGUI,GangaPlotter \
    4.2.8

First Launch
export PATH=$HOME/opt/ganga/install/slc3_gcc323/4.2.8/bin:$PATH
# start Ganga with inline configurations
ganga -o'[LCG]ENABLE_EDG=False' -o'[LCG]ENABLE_GLITE=False'

Ganga CLIP
*** Welcome to Ganga ***
Version: Ganga-4-2-8
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

In [1]:
# <ctrl>-D to exit Ganga CLIP
Do you really want to exit ([y]/n)?
Get your hands dirty ...
• Job().submit(): submit and run a test job on the local machine
• Job(backend=LCG()).submit(): submit and run a test job on LCG
• jobs: browse the created jobs (job history)
• j = jobs[1]: get the first job from the job history
• j: print the details of the job and see what you can set for a job
• j.copy().submit(): make a copy of the job and submit the new job
• j.<tab>: see what you can do with the job
Configurations
Syntax: Python ConfigParser standard
[Configuration]
TextShell = IPython
...
[LCG]
EDG_ENABLE = True
...

How to set configurations
– release config: hardcoded configurations
– site config:
  export GANGA_CONFIG_PATH=/some/physics/subgroup.ini:GangaLHCb/LHCb.ini
  ganga --config-path=/some/physics/subgroup.ini:GangaLHCb/LHCb.ini
– user config: ~/.gangarc and ganga -o

Override sequence
user config > site config > release config
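The override sequence (user config > site config > release config) can be illustrated with Python's standard configparser module, which lets later files override earlier ones. This is a minimal sketch of the layering idea, not how Ganga itself loads its configuration: the file names, the load_config helper and the value dteam are made up for the demonstration, while EDG_ENABLE, VirtualOrganisation and gilda are taken from the slides.

```python
import configparser
import os
import tempfile


def load_config(paths):
    """Read INI files in order; values in later files override earlier ones."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive (e.g. EDG_ENABLE)
    parser.read(paths)        # files that do not exist are skipped silently
    return parser


# Three throwaway files standing in for release, site and user configuration.
with tempfile.TemporaryDirectory() as tmp:
    layers = {
        "release.ini": "[LCG]\nEDG_ENABLE = True\nVirtualOrganisation = dteam\n",
        "site.ini": "[LCG]\nVirtualOrganisation = gilda\n",
        "user.ini": "[LCG]\nEDG_ENABLE = False\n",
    }
    paths = []
    for name, text in layers.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)  # order matters: release first, user last

    config = load_config(paths)
    edg_enabled = config.getboolean("LCG", "EDG_ENABLE")
    vo = config.get("LCG", "VirtualOrganisation")

print(edg_enabled, vo)
```

Here the user file wins for EDG_ENABLE and the site file wins for VirtualOrganisation, because each is the last layer that sets the option.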
The ‘gangadir’
• created at the first launch within the $HOME directory
• To locate it in a different directory:
– [DefaultJobRepository] local_root = /alternative/gangadir/repository (metadata of jobs)
– [FileWorkspace] topdir = /alternative/gangadir/workspace (data of jobs)
User interfaces

CLIP
*** Welcome to Ganga ***
Version: Ganga-4-2-8
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

In [1]: jobs
Out[1]: Statistics: 1 jobs
--------------
# id status name subjobs application backend backend.actualCE
# 1 completed Executable LCG lcg-compute.hpc.unimelb.edu.au:2119/jobmanage

GUI

GPI & Scripting
#!/usr/bin/env ganga
#-*-python-*-
import time
j = Job()
j.backend = LCG()
j.submit()
while not j.status in ['completed', 'failed']:
    print('job still running')
    time.sleep(30)

./myjob.exec
ganga ./myjob.exec
In [1]: execfile("myjob.exec")
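The polling loop in the script runs until the job reaches a final state, so a job that never finishes would block the script forever. One possible variation, sketched here in plain Python, adds a timeout. The wait_for helper and the FakeJob stand-in are hypothetical and not part of Ganga; the only thing assumed about a job object is a status attribute with the values 'completed' and 'failed' used in the script.

```python
import time


class FakeJob:
    """Stand-in for a job object; it 'completes' after a few status checks."""

    def __init__(self, checks_until_done=3):
        self._remaining = checks_until_done

    @property
    def status(self):
        self._remaining -= 1
        return "completed" if self._remaining <= 0 else "running"


def wait_for(job, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll job.status until it is final; raise TimeoutError if it takes too long."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = job.status
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Short intervals so that the demonstration finishes quickly.
final_status = wait_for(FakeJob(), poll_seconds=0.01, timeout_seconds=5.0)
print(final_status)
```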
Some handy functions
• <tab> completion
• <page up/down> for cmd history
• system command integration
In[1]: j = jobs[1]
In[2]: cat $j.outputdir/stdout
Hello World
• Job template
In[1]: t = JobTemplate(name='lcg_simple')
In[2]: t.backend = LCG(middleware='EDG')
In[3]: templates
Out[3]: Statistics: 1 templates
--------------
# id status name subjobs application backend backend.actualCE
# 3 template lcg_simple Executable LCG
In[4]: j = Job(templates[3])
In[5]: j.submit()
• In[1]: plugins()
– plugins('backends')
• In[2]: help()
• etc.
Real Application: the ATLAS data analysis application
CLIP mode / Scripting mode

# application
j = Job()
j.application = Athena()
j.application.option_file = 'myOpts.py'
j.application.prepare(athena_compile = False)

# inputdata
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'interestingDataset.AOD.v12003104'
j.inputdata.type = 'DQ2_Local'

# outputdata
j.outputdata = AthenaOutputDataset()
j.outputdata.outputdata = 'myOutput.root'

# splitter & merger
j.splitter = AthenaSplitterJob(numsubjobs=2)
j.merger = AthenaOutputMerger()

j.backend = LCG( CE='ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas' )
j.submit()
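The roles of a splitter and a merger can be shown with a small plain-Python sketch: the splitter divides the input into one chunk per subjob, each subjob produces a partial result, and the merger combines the partial results. This is a conceptual illustration only and does not reproduce AthenaSplitterJob or AthenaOutputMerger; the functions split, run_subjob and merge are hypothetical, and summing numbers stands in for the real analysis.

```python
def split(items, num_subjobs):
    """Divide items into num_subjobs contiguous chunks of near-equal size."""
    if num_subjobs < 1:
        raise ValueError("num_subjobs must be at least 1")
    base, extra = divmod(len(items), num_subjobs)
    chunks, start = [], 0
    for index in range(num_subjobs):
        size = base + (1 if index < extra else 0)  # first 'extra' chunks get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks


def run_subjob(chunk):
    """Placeholder for the real work of one subjob: here, sum the numbers."""
    return sum(chunk)


def merge(partial_results):
    """Combine the outputs of all subjobs into a single result."""
    return sum(partial_results)


data = list(range(1, 11))
chunks = split(data, 3)
total = merge(run_subjob(chunk) for chunk in chunks)
print(chunks, total)
```

The merged result equals what a single unsplit job would produce, which is the property a splitter and merger pair is expected to preserve.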
Behind the scene ...
Application plugin
generic logic
specific backend binding
Application plugin
A good example: python/GangaTutorial/Lib/*.py
Splitter
Merger
Datasets
Summary
• What is Ganga?
– An easy-to-use front-end for job definition and management
– A generic framework extensible for specific applications
– A light-weight application component fully implemented in Python
• How can Ganga help in application development?
– Ganga provides a set of ready-to-use APIs for high-level job management
– It simplifies developers' effort in developing applications
– End-users can easily extend Ganga for their own purposes
Part II: Ganga hands-on
Step 0: launch Ganga CLIP
https://twiki.cern.ch/twiki/bin/view/ArdaGrid/EGEETutorialPackage
• Skip the installation step
• Start your Ganga CLIP session using the commands:
shell> ganga --config-path=GangaTutorial/Tutorial.ini \
--option='[LCG]VirtualOrganisation=gilda'
Step 1: Your first Ganga job - an arbitrary shell script
In [1]: !vi myscript.sh
In [2]: !chmod +x myscript.sh
In [2]: j = Job()
In [3]: j.application = Executable()
In [4]: j.application.exe = File('myscript.sh')
In [5]: j.application.args = ['ganga']
In [6]: j.backend = Local()
In [7]: j.submit()
In [8]: jobs
In [9]: j.peek()
In [10]: cat $j.outputdir/stdout

myscript.sh:
#!/bin/sh
echo "hello! ${1}"
echo $HOSTNAME
cat /proc/cpuinfo | grep 'model name'
cat /proc/meminfo | grep 'MemTotal'

./myscript.sh ganga
Step 2: your first Ganga job on the Grid
In [11]: j = j.copy()
In [12]: j.backend = LCG()
In [13]: j.application.args = ['grid']
In [14]: j.submit()
In [15]: j
In [16]: cat $j.backend.loginfo(verbosity=1)
In [17]: jobs
Exercise: Prime number lookup
• Use the following Ganga components for this exercise:
– Application: PrimeFactorizer()
– Dataset: PrimeTableDataset()
– Splitter: PrimeFactorizerSplitter()
Hints of the exercise
• Create a Ganga job
• Application specification
• Input dataset specification
• Splitter specification
• Backend specification
• Submit the job
Part III: More about Ganga
Integration with frameworks
Job statistics on 2580 grid jobs from Ganga
Web portal for biologists
• Interface created by biologists (Model-View-Controller design pattern)
– The Model makes use of Ganga as a submission tool and DIANE to better handle docking jobs on the Grid
– The Controller organizes a set of actions to perform the virtual screening pipeline
– The View represents biological aspects
GANGA Activities
• Main Users
• Other activities
[Figure labels: HARP, Garfield]
More than job submission: Monitoring & Accounting
submission tool
More than 500 unique Users
• ~550 different users, ~100 users weekly
• the monitoring started at the end of 2006
• ~60% ATLAS, ~25% LHCb, ~15% others
[Figure: usage plot, with an annotation marking Easter]
Ganga usage
over 50 local sites
CLIP and scripts most popular
More info.
• Ganga Home: http://cern.ch/ganga
• Official Ganga User’s Guide: http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/
• Tutorial for ATLAS data analysis using Ganga: https://twiki.cern.ch/twiki/bin/view/Atlas/DistributedAnalysisUsingGanga
• Looking for help:
– ATLAS user support: [email protected]
– direct support from developers: [email protected]
Working with Ganga …
• Your applications developed in testbed environments can be smoothly migrated to production environments
• Your jobs are managed in a systematic way
• Your grid jobs benefit from a hidden job wrapper instrumented for
– advanced input/output control
– runtime progress monitoring
• New technologies will be transparent for you without changing your way of running applications