Page 1: Southgrid Status, Pete Gronbech, 21st March 2007, GridPP 18 Glasgow.

Southgrid Status

Pete Gronbech: 21st March 2007

GridPP 18 Glasgow

Page 2:

RAL PPD

• Existing 30 Xeon CPUs and 6.5 TB of storage supplemented by an upgrade in mid 2006

• RAL PPD installed a large upgrade: 200 (Opteron 270) CPU cores, equivalent to an extra 260 kSI2k, plus 86TB of storage.

• 50TB was loaned to the RAL Tier 1 and is now being returned.

• 10Gb/s connection to the RAL backbone
– RAL currently connected at 1Gb/s to TVM
– Will be connected at 10Gb/s to the SJ5 backbone by 01/04/2007
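As a rough illustration of what the link upgrade means for the 50TB being returned from the Tier 1, here is a back-of-the-envelope transfer-time estimate (assuming, unrealistically, that the full link rate is usable; real transfers achieve less):

```python
# Illustrative only: ideal transfer time for 50 TB at the quoted link rates.
def transfer_days(terabytes, gigabits_per_s):
    bits = terabytes * 1e12 * 8              # TB -> bits (decimal units)
    seconds = bits / (gigabits_per_s * 1e9)
    return seconds / 86400                   # seconds -> days

print(round(transfer_days(50, 1), 1))        # ~4.6 days at 1 Gb/s
print(round(transfer_days(50, 10), 2))       # ~0.46 days at 10 Gb/s
```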

Page 3:

RAL PPD (2)

• Supports 22 different VOs, of which 18 have run jobs in the last year.

• 1,000,000 kSI2k hours delivered in the last 12 months

• 2007 upgrade of disk and CPU ordered:
– 13 x 6TB SATA disk servers, 3Ware RAID controllers, 14 x 500GB WD disks
– 32 x dual Intel 5150 dual-core CPU nodes with 8GB RAM

• Orders placed, delivery expected in the next 7 days
• Will be installed in the Atlas Centre, due to power/cooling issues in R1

Page 4:

Status at Cambridge

• Currently glite 3 on SL3
• CPUs: 32 x 2.8GHz Xeon
• 3 TB storage
– DPM enabled Oct 05

• Upgrade arrived Christmas 2006: 32 Intel ‘Woodcrest’ based servers, giving 128 CPU cores, equivalent to approx 358 kSI2k.

• Local computer room upgraded.

• Storage upgrade to 40-60TB expected this summer.

• Condor version 6.8.4 is being used, but the latest LCG updates have a dependency on condor-6.7.10-1. This development release should not be used in a production environment; LCG/glite should not be requiring it.
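Condor's numbering convention is what makes that dependency problematic: an odd second digit (6.7.x) marks a development series, an even one (6.8.x) a stable series. A minimal sketch of the check:

```python
# Sketch: classify a Condor release by the odd/even second-digit convention
# (odd second digit = development series, even = stable).
def is_development(version):
    minor = int(version.split(".")[1])
    return minor % 2 == 1

print(is_development("6.8.4"))    # False: the stable series in use at Cambridge
print(is_development("6.7.10"))   # True: the release LCG/glite depends on
```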

Page 5:

Cambridge (2)

• CAMONT VO supported at Cambridge, Oxford and Birmingham. Job submission by Karl Harrison and David Sinclair

• LHCb on Windows project (Ying Ying Li)
– Code ported to Windows
• HEP 4-node cluster
• MS Research Lab 4-node cluster (Windows compute cluster)
– Code running on a server at Oxford, possible expansion onto the OeRC Windows cluster
– Possible Bristol nodes soon

Page 6:

Status at Bristol

• Status
– All nodes running SL3.0.5 with glite 3
– DPM enabled Oct 05, LFC installed Jan 06

• Existing resources
– GridPP nodes plus local cluster nodes used to bring the site on line. Local cluster being integrated.

• New resources
– 10TB of storage coming on line soon.
– Bristol expects to have a percentage of the new campus cluster from early 2007. Includes CPU, high-quality and scratch disk resources.

• Jon Wakelin is working 50% on GPFS and StoRM.
• IBM loan kit

Page 7:

Status at Birmingham

• Currently SL3 with glite 3
• CPUs: 28 x 2.0GHz Xeon (+98 x 800MHz)
• 1.9TB DPM, being replaced by a new 10TB array
• Babar farm starting to become unreliable due to many disk and PSU failures.

• Run the Pre-Production Service, which is used for testing new versions of the middleware.

• Birmingham will have a percentage of the new campus cluster, due May/June 2007. First phase: 256 nodes, each with two dual-core Opteron CPUs.
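For scale, the first phase of that campus cluster works out to a simple core count from the figures above:

```python
# First-phase campus cluster size from the quoted configuration.
nodes = 256
cpus_per_node = 2      # two Opteron CPUs per node
cores_per_cpu = 2      # dual-core
total_cores = nodes * cpus_per_node * cores_per_cpu
print(total_cores)     # 1024 cores in phase one
```

Birmingham's share would then be some fraction of those 1024 cores; the slide does not say what percentage.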

Page 8:

Status at Oxford

• Currently glite 3 on SL3.0.5
• CPUs: 80 x 2.8 GHz
– Compute Element
• 37 worker nodes, 74 job slots, 67 kSI2K
– 37 x dual 2.8GHz P4 Xeon, 2GB RAM
– DPM SRM Storage Element
• 2 disk servers, 3.2TB disk space
• 1.6 TB DPM server, plus a second 1.6TB DPM disk pool node. A bug in DPM stopped load balancing across the pools; it will be fixed with the latest glite update.
– Logical File Catalogue
– Mon and UI nodes
– GridMon network monitor

• 1Gb/s connectivity to the Oxford backbone
– Oxford currently connected at 1Gb/s to TVM

• Submission from the Oxford CampusGrid via the NGS VO is possible.

Page 9:

Usage

• Oxford supports 20 VOs, 17 of which have run jobs in the last year.

• Most active VOs are LHCb (38.5%), Atlas (21.3%) and Biomed (21%).

• 300,000 kSI2k hours delivered in the last 12 months.
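Taken with the 67 kSI2K compute element capacity quoted on the previous slide, those delivered hours imply roughly half-full utilisation. A rough estimate, ignoring downtime and the exact accounting window:

```python
# Rough utilisation estimate from the quoted Oxford figures.
capacity_ksi2k = 67                       # compute element capacity (kSI2K)
delivered_hours = 300_000                 # kSI2k-hours delivered in 12 months
available = capacity_ksi2k * 24 * 365     # kSI2k-hours if fully busy all year
print(round(delivered_hours / available, 2))   # ~0.51, i.e. about half
```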

Page 10:

New Computer Room

The new computer room being built at Begbroke Science Park, jointly for the Oxford Supercomputer and the Physics department, will provide space for 55 (11kW) computer racks, 22 of which will be for Physics. Up to a third of these can be used for the Tier 2 centre.
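Taking the rack figures above at face value, the power budget and Tier 2 allocation work out as follows:

```python
# Power and rack allocation implied by the Begbroke room figures.
racks_total = 55
kw_per_rack = 11
physics_racks = 22
tier2_racks = physics_racks // 3        # "up to a third" of the Physics racks
print(racks_total * kw_per_rack)        # 605 kW room total
print(physics_racks * kw_per_rack)      # 242 kW for Physics
print(tier2_racks)                      # up to ~7 racks for the Tier 2 centre
```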

Disk and CPU (planned purchase for Summer 07):
32 x dual Intel 5150 dual-core CPU nodes with 8GB RAM, giving 353 kSI2k
10 x 12TB SATA disk servers, giving 105 TB usable (after RAID 6)
Quad-core CPUs will be benchmarked, both for SPEC rates and for power consumption. Newer 1TB disks will be more commonplace by the summer.
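The 105 TB usable from 10 x 12TB servers is consistent with RAID 6's two-parity-drive overhead. Assuming, purely for illustration, 16 x 750GB drives per server (the drive count and size are not stated on the slide):

```python
# Sketch: RAID 6 usable capacity. The 16 x 750GB per-server layout is an
# assumption chosen to match the slide's raw and usable totals.
servers = 10
drives_per_server = 16        # assumption, not stated on the slide
drive_tb = 0.75               # 750GB drives (assumption)
raw_tb = servers * drives_per_server * drive_tb
usable_tb = servers * (drives_per_server - 2) * drive_tb   # RAID 6: 2 parity drives
print(raw_tb)                 # 120.0 TB raw
print(usable_tb)              # 105.0 TB usable, matching the slide
```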

Page 11:

Oxford DWB Computer room

A local Physics department infrastructure computer room (100kW) has been agreed.

It should be ready in May/June 07.

This will relieve the local computer rooms and possibly house T2 equipment until the Begbroke room is ready. Racks that are currently in unsuitable locations can be re-housed.

Page 12:

Other Southgrid sites

• Other groups within the SouthGrid EGEE area are: EFDA-JET, with 40 CPUs up and running

• The Advanced Computing and Emerging Technologies (ACET) Centre, School of Systems Engineering, University of Reading, started setting up their cluster in Dec 06.

Page 13:

SouthGrid CPU delivered

SouthGrid provided 1.4M kSI2K hours in the year March 06 to March 07.

Page 14:

SouthGrid VO/Site Shares

Page 15:

Steve Lloyd Tests, 21.3.07

Page 16:

Site Monitoring

• Grid-wide provided monitoring
– GSTAT
– SAM
– GOC Accounting
– Steve Lloyd's Atlas test page

• Local site monitoring
– ganglia
– pakiti
– torque/maui monitoring CLIs
– Investigating MonAMI

• Developing
– Nagios: RAL PPD have developed many plugins; other SouthGrid sites are just setting up

Page 17:

Summary

• SouthGrid continues to run well, and its resources are set to expand throughout this year.

• Birmingham's new university cluster will be ready in the summer.
• Bristol's small cluster is stable; the new university cluster is starting to come on line.
• The Cambridge cluster was upgraded as part of the CamGrid SRIF3 bid.
• Oxford will be able to expand resources this summer when the new computer room is built.
• RAL PPD has expanded last year and this year, well above what was originally promised in the MoU.