Setting up a Pan-European Datagrid using QCDgrid technology. Chris Johnson, James Perry, Lorna Smith and Jean-Christophe Desplat, EPCC, The University Of Edinburgh. The ENACTS “Demonstrator”

Dec 31, 2015


Setting up a Pan-European Datagrid using

QCDgrid technology

Chris Johnson, James Perry, Lorna Smith and Jean-Christophe Desplat

EPCC, The University Of Edinburgh

The ENACTS “Demonstrator”


This Talk

• A summary of the title:
  – ENACTS Demonstrator
  – Pan-European Datagrid
  – QCDgrid.


ENACTS

• European Network for Advanced Computing Technology for Science.
• EC-funded project with 14 members.
• Started in 2000.
• Attempt to ensure that Europe did not lag behind the US in grid technology.
• ENACTS originally consisted of many reports with little technical work.


14 ENACTS partners

Please see: www.enacts.org


ENACTS Demonstrator: Partners involved

• EPCC, Edinburgh, UK
  – Chris Johnson
  – Jean-Christophe Desplat
  – James Perry
• Parallab, Bergen, Norway
  – Jacko Koster
  – Jan-Frode Myklebust
  – Csaba Anderlik
• TCD, Dublin, Ireland
  – Geoff Bradley
  – Bob Crosbie.


ENACTS Demonstrator…

• Objective
  – “To enable the formation of a pan-European HPC metacentre…”.
• The Demonstrator is part of Phase II of the activity and its specific objective is
  – “to draw together the results from all of the Phase I technology studies and evaluate their practical consequences for operating a pan-European metacentre and constructing a best-practice model for collaborative working amongst individual facilities”.


ENACTS Demonstrator

• ENACTS Phase I
  – consisted mainly of reports
• ENACTS Phase II
  – contained the Demonstrator activity
• Phase I identified technologies such as
  – Globus, replica management, LDAP database and XML metadata.
• All these technologies are inherent in the QCDgrid system.


metacentre

• A “virtual organisation” with data described by metadata.
  – Users submit data from any site
  – The data is stored on “the grid”
  – All data is stored reliably
  – All data is easy to retrieve.


Our Demonstrator

• Set up QCDgrid across the 3 sites to create our metacentre.
• Use a genuine scientific scenario.
• Use an XML schema for metadata.
• Ensure the data is portable between the systems involved.


Summary so far…

• The ENACTS Demonstrator is an EC-funded project involving 3 partners

• attempting to set up a pan-European metadata centre

• using QCDgrid technology.


What is QCDgrid?

• It’s not QCD-specific!!
• QCDgrid was written to manage the QCD data belonging to the UK QCD community (UKQCD)
  – 2 previous All-Hands talks (James Perry, EPCC).
• The original grid consisted of 6 geographically dispersed sites.
• Around 5 terabytes of data.
• The amount of data is expected to grow dramatically when QCDOC comes online later in 2004.


What is QCDgrid?

• QCDgrid is a layer of software written on top of the Globus Toolkit.
  – Uses security infrastructure and basic grid operations such as data transfer
  – also uses more advanced features such as the replica catalogue.


How does QCDgrid work?

• A control thread runs on one storage element
  – constantly scans the grid
  – ensures all storage elements are working
  – ensures all files are stored in at least 2 suitable locations.
• When a new file is added it is rapidly replicated across the grid onto 2 or more geographically separate sites.
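
The replication rule on this slide can be illustrated with a minimal Python sketch. It is not QCDgrid code: the in-memory catalogue, the node names and the function `plan_replication` are all hypothetical, and the real system works through the Globus replica catalogue. The sketch only shows the invariant: any file with fewer than 2 live copies gets extra copies, each on a site that does not already hold one.

```python
MIN_REPLICAS = 2  # from the slide: every file stored in at least 2 suitable locations


def plan_replication(catalogue, node_site, alive):
    """Return {file name: [target nodes]} for files with too few live copies.

    catalogue: file name -> set of nodes holding a copy (hypothetical structure)
    node_site: node name -> site name (hypothetical structure)
    alive:     set of nodes that currently respond
    """
    plan = {}
    for name, holders in sorted(catalogue.items()):
        live = [node for node in holders if node in alive]
        used_sites = {node_site[node] for node in live}
        targets = []
        for node in sorted(alive):
            if len(live) + len(targets) >= MIN_REPLICAS:
                break  # enough copies already exist or are planned
            if node in live or node_site[node] in used_sites:
                continue  # keep copies on geographically separate sites
            targets.append(node)
            used_sites.add(node_site[node])
        if targets:
            plan[name] = targets
    return plan


# Hypothetical three-site layout, loosely modelled on the Demonstrator.
node_site = {"edi1": "Edinburgh", "edi2": "Edinburgh",
             "ber1": "Bergen", "dub1": "Dublin"}
catalogue = {"a.dat": {"edi1"}, "b.dat": {"edi1", "ber1"}}
alive = {"edi1", "edi2", "ber1", "dub1"}
plan = plan_replication(catalogue, node_site, alive)
```

Here only `a.dat` needs attention, and its new copy goes to a node outside Edinburgh. A control thread would presumably run a check of this kind on every scan and then carry out the transfers.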


QCDgrid: Dealing with node loss

• If a storage element is lost unexpectedly, all files that were held on the failed system are replicated elsewhere.

• QCDgrid can cope with the loss of an entire site.

• If the control node is lost, control reverts to a secondary node.
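
As a rough illustration of the two recovery steps on this slide, the Python sketch below lists the files affected by a lost node and picks a replacement control node. The function names, the catalogue layout and the idea of an ordered priority list of control candidates are assumptions made for the example; the slides do not say how QCDgrid chooses the secondary node.

```python
def files_to_rereplicate(catalogue, lost_nodes):
    """Map each file that had a copy on a lost node to its surviving holders.

    catalogue:  file name -> set of nodes holding a copy (hypothetical structure)
    lost_nodes: set of nodes that have stopped responding
    """
    affected = {}
    for name, holders in catalogue.items():
        if holders & lost_nodes:
            affected[name] = sorted(holders - lost_nodes)
    return affected


def choose_control_node(priority, alive):
    """Return the first live node in an assumed priority order, or None."""
    for node in priority:
        if node in alive:
            return node
    return None


# Hypothetical example: the whole Edinburgh site (edi1, edi2) goes down.
catalogue = {"a.dat": {"edi1", "ber1"}, "b.dat": {"ber1", "dub1"}}
lost = {"edi1", "edi2"}
affected = files_to_rereplicate(catalogue, lost)
control = choose_control_node(["edi1", "ber1", "dub1"], {"ber1", "dub1"})
```

Each affected file still has a surviving copy to replicate from, which is why keeping copies on separate sites lets the grid cope with the loss of a whole site.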


How is QCDgrid used?

• Assuming QCDgrid is set up and the control thread and metadata database are running…

• and that each user has a valid certificate…

• User runs the usual initialisation commands
  – grid-proxy-init
  – source the correct set-up files
  – set a few paths, classpaths, etc.


Submitting files

Command line

• User submits a file (datafile.dat)
  – put-file-on-qcdgrid datafile.dat
• AND an accompanying metadata file which describes the above data file
  – put-file-on-qcdgrid datafile.xml
  – exist:/db>put datafile.xml


Submitting files

(better still…) Using the GUI

• User runs the Java GUI and submits both data file and metadata file at the same time.
• Metadata file has a tag for the name of the corresponding data file
  – marries the two
  – every data file should have an associated metadata file.
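
The pairing of data and metadata files can be sketched in Python. The slides do not show the XML schema, so the tag name `data_file` and the sample document are hypothetical; the sketch only illustrates reading the file name out of a metadata document and spotting data files that nothing describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document; the real schema is not given in the slides.
SAMPLE_METADATA = """<metadata>
  <data_file>datafile.dat</data_file>
  <description>example entry</description>
</metadata>"""


def data_file_name(xml_text, tag="data_file"):
    """Return the data file named in a metadata document, or None if absent."""
    root = ET.fromstring(xml_text)
    element = root.find(tag)
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


def unmatched_data_files(data_files, metadata_docs):
    """Return the data files that no metadata document refers to."""
    described = {data_file_name(doc) for doc in metadata_docs}
    return sorted(set(data_files) - described)


missing = unmatched_data_files(["datafile.dat", "other.dat"], [SAMPLE_METADATA])
```

In this example `other.dat` is reported as missing its metadata, which is the situation the rule "every data file should have an associated metadata file" is meant to prevent.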


Metadata Browser

• Can submit, search and retrieve data using this Java browser.


Metadata/Datagrid Integration

• QCDgrid software deals with storage and replication of data.

• eXist database deals with cataloguing of data using metadata.

• All can be controlled using command line or GUI.


More on QCDgrid

• Other commands available with QCDgrid
  – qcdgrid-list lists all files on grid.
  – get-file-from-qcdgrid retrieves files from grid.
  – i-like-this-file attempts to store a file local to the user.

• There are also several commands for administering nodes, etc.


QCDgrid Sites

[Figure: QCDgrid sites; labels: QCDgrid, ENACTS]


Our deployment of QCDgrid

• UK (all using e-science CA) -> Europe (all using different CAs).

• Moving from a homogeneous Linux environment to a mixed one (Linux/Solaris).

• Moving from Globus Toolkit (GT) 2.0 -> GT 2.4.


How difficult was it?

• Certificates
  – Some certificate issuers took several weeks to issue certificates.
  – Different policies on issuing certificates, e.g. non-human users (project accounts).
  – Not too many difficulties using multiple certificates.


How difficult was it?

• Moving to a heterogeneous environment.
  – Installing Globus 2.x is difficult on Solaris, which led to the Solaris node being unable to submit data.
• A few minor problems getting system-specific functions to work (e.g. df command).
• Usual minor compilation issues; did require the gcc compiler.


How difficult was it?

• Globus
  – This presented the biggest difficulty!
  – Installation difficulties and firewall issues
    • several months before a “helloworld” job would run from any site to any other.
  – Migrating from GT 2.0 -> GT 2.4
    • Major difficulties!
    • Had to re-write the replica schema.
    • Had to remove some error-handling functionality.


Users – Scientific scenario

• Appealed to QCD
  – Given more time we could have found a different discipline.
• Two users from TCD as well as those involved in the project itself.
• Code used was MILC code.
• Monte-Carlo simulation to investigate string-breaking.
• Three Monte-Carlo chains, one on each node.


User feedback

• Generally impressed with functionality.
• Some frustration in getting certificates.
• Difficulty persuading users to use metadata, although agreed it was useful.
• Would make more use of file-sharing using such a system.
• Liked machine-independent data.
• Wanted grid to do job submission.


Conclusions

• We have created a pan-European datagrid (metacentre) using QCDgrid technology.
• The system works well…
• …Globus is the limiting factor.
• Users were impressed with the system in use.
• The system was not tested with many users but we can see no reason why it would not scale to many users/nodes if Globus allows.


Acknowledgements

• James Perry.
• Jean-Christophe Desplat, Jacko Koster, Jan-Frode Myklebust, Csaba Anderlik, Geoff Bradley, Bob Crosbie.
• Craig McNeile and Bálint Joó.
• Mike Peardon and Jimmy Juge.


References

• ENACTS
  – http://www.enacts.org
• QCDgrid
  – http://www.gridpp.ac.uk/qcdgrid
  – code: http://forge.nesc.ac.uk/projects/qcdgrid
• MILC code
  – http://physics.indiana.edu/~sg/milc.html