STEINBUCH CENTRE FOR COMPUTING - SCC www.kit.edu KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association GridKa Site Report Andreas Petzold

Mar 28, 2015

Page 1:

STEINBUCH CENTRE FOR COMPUTING - SCC

www.kit.edu

KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association

GridKa Site Report

Andreas Petzold

Page 2:

Steinbuch Centre for Computing

Andreas Petzold – GridKa Site Report – HEPiX Ann Arbor 2013

GridKa Batch Farm

Univa Grid Engine is running fine

~150 kHS06

~10k job slots

98 replacement machines this summer

SysGen 2U 4-node chassis

2x Intel Xeon E5-2670 (8-core, 2.6 GHz, 312 HS06)

3 GB/core

3x 500 GB HDD

Page 3:

WN Migration to SL6

Migration of GridKa compute fabric to SL6 finished

Performance: +5.4%

Intel Xeon E5-2670 (8 cores, 2.6 GHz), HT off / HT on:

SL5 + default compiler: 267 HS06 / 335 HS06

SL6 + default compiler: 283 HS06 (+5.8%) / 348 HS06 (+3.9%)

SL5 + gcc-4.8.1: 289 HS06 / 353 HS06

AMD Opteron 6168 (12 cores, 1.9 GHz):

SL5 + default compiler: 183 HS06

SL6 + default compiler: 193 HS06 (+5.6%)

SL5 + gcc-4.8.1: 187 HS06
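
The SL5 to SL6 gains can be recomputed from the scores on the slide. The following is a minimal sketch: the function name `relative_gain` and the dictionary layout are illustrative choices, not part of the talk. Because the slide shows rounded HS06 scores, the recomputed percentages may differ from the quoted ones by a few tenths of a percent.

```python
def relative_gain(baseline: float, new: float) -> float:
    """Return the percentage change of `new` relative to `baseline`."""
    return (new / baseline - 1.0) * 100.0


# HS06 scores copied from the slide: (SL5 default compiler, SL6 default compiler)
scores = {
    "Intel E5-2670, HT off": (267, 283),
    "Intel E5-2670, HT on": (335, 348),
    "AMD Opteron 6168": (183, 193),
}

gains = {name: relative_gain(sl5, sl6) for name, (sl5, sl6) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```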

Page 4:

Ivy Bridge Benchmarks

New Intel Ivy Bridge processors on the market (E5-26## v2)

Manufacturing process: 0.022 micron

Sandy Bridge: 0.032 micron

Up to 12 cores

Sandy Bridge: up to 8 cores

HS06 score increases with the number of cores:

E5-2670 (8 cores, 2.6 GHz, HT on, SL6, default compiler): 348 HS06

E5-2670 v2 (10 cores, 2.5 GHz, HT on, SL6, default compiler): 411 HS06

Power saving of around 25 to 30%

Thanks to DELL for providing the test machine
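
To see how closely the HS06 score follows the core count, the two measurements can be compared as ratios. This is a rough sketch that assumes both test systems have the same number of sockets (the slide does not state this), so only ratios are computed rather than absolute per-core scores. The class and variable names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    cores_per_socket: int
    hs06: float  # score of the whole test system, as quoted on the slide


sandy_bridge = Cpu("E5-2670", cores_per_socket=8, hs06=348)
ivy_bridge = Cpu("E5-2670 v2", cores_per_socket=10, hs06=411)

# Ratio of the benchmark scores and of the core counts
score_ratio = ivy_bridge.hs06 / sandy_bridge.hs06
core_ratio = ivy_bridge.cores_per_socket / sandy_bridge.cores_per_socket

# Below 1.0 means the score grew less than the core count
per_core_ratio = score_ratio / core_ratio

print(f"score ratio:    {score_ratio:.3f}")
print(f"core ratio:     {core_ratio:.3f}")
print(f"per-core ratio: {per_core_ratio:.3f}")
```

With these numbers the score grows by about 18% for 25% more cores, at a slightly lower clock frequency (2.5 GHz versus 2.6 GHz).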

Page 5:

Power Efficiency

[Figure: Power usage (W) per performance score (HS06) for worker node class machines at GridKa. Vertical axis: watts per HS06, 0 to 20. Horizontal axis: year, 2004 to 2013. Data points: AMD 6168, Intel E5430, Intel L5420, AMD 246, AMD 270, Intel 5160, Intel E5345, Intel E5520 (HT on), Intel E5-2670 (HT on), Intel E5-2670v2 (HT on). The E5-2670v2 is a test system provided by DELL.]

Page 6:

GridKa dCache & xrootd

6 production dCache instances + pre-production setup

5 instances running 2.6, 1 running 2.2

9 PB, 287 pools on 58 servers

Upgrade to 2.6 instead of 2.2 recommended by dCache.org

last-minute decision one week before planned downtime

full support for SHA-2 and xrootd monitoring

great support from dCache devs

CMS disk-tape separation

most CMS tape pools converted to disk-only pools

last CMS config changes today

GridKa is the 1st CMS T1 successfully migrated

two xrootd instances for ALICE

2.7 PB

15 servers
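
For a sense of scale, the totals above can be turned into averages per pool and per server. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 1000 TB) and an even spread of capacity, neither of which is stated on the slide, and the variable names are illustrative.

```python
# Totals from the slide, with capacities converted to TB (assumption: 1 PB = 1000 TB)
dcache_capacity_tb = 9000
dcache_pools = 287
dcache_servers = 58

xrootd_capacity_tb = 2700
xrootd_servers = 15

# Averages, assuming capacity is spread evenly (a simplification)
tb_per_pool = dcache_capacity_tb / dcache_pools
pools_per_server = dcache_pools / dcache_servers
dcache_tb_per_server = dcache_capacity_tb / dcache_servers
xrootd_tb_per_server = xrootd_capacity_tb / xrootd_servers

print(f"dCache: {tb_per_pool:.1f} TB per pool, {pools_per_server:.2f} pools per server")
print(f"dCache: {dcache_tb_per_server:.0f} TB per server")
print(f"xrootd: {xrootd_tb_per_server:.0f} TB per server")
```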

Page 7:

GridKa Disk Storage

9x DDN S2A9900

150 enclosures

9000 disks

796 LUNs

SAN Brocade DCX

1x DDN SFA10K

10 enclosures

600 disks

1x DDN SFA12K

5 enclosures

360 disks

14 PB usable storage

Page 8:

Evaluating new Storage Solutions

DDN SFA12K-E

allows running server VMs directly in the storage controller

DDN is testing a complete dCache instance inside the controller

expected benefits

shortening long I/O paths: no SAN + FC HBAs, reduced latency

less hardware: lower power consumption, improved MTBF

possible drawbacks

limited resources in storage controllers for VMs

loss of redundancy

Page 9:

DDN SFA12K-E

Page 10:

Glimpse at Performance

Preliminary performance evaluation

IOZONE testing with 30-100 parallel threads on an XFS file system

still a lot of work ahead

no tuning

file system + controller setup tuning
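
A multi-threaded IOZONE run of the kind mentioned above could be scripted roughly as follows. This is a sketch, not the test setup used at GridKa: the mount point, file size, record size and the choice of write and read tests are all assumptions, and the helper name `iozone_throughput_cmd` is hypothetical. The script only prints the command lines and does not execute anything.

```python
from __future__ import annotations

import shlex


def iozone_throughput_cmd(threads: int, mount_point: str,
                          file_size: str = "4g",
                          record_size: str = "1m") -> list[str]:
    """Build an iozone throughput-mode command with one test file per thread."""
    files = [f"{mount_point}/iozone.{i}" for i in range(threads)]
    return [
        "iozone",
        "-i", "0",            # test 0: write/rewrite
        "-i", "1",            # test 1: read/re-read
        "-t", str(threads),   # throughput mode with this many threads
        "-s", file_size,      # file size per thread (assumed value)
        "-r", record_size,    # record size (assumed value)
        "-F", *files,         # one file name per thread
    ]


# Thread counts spanning the 30-100 range from the slide; /mnt/xfs_test is a placeholder
for n in (30, 50, 100):
    print(shlex.join(iozone_throughput_cmd(n, "/mnt/xfs_test")))
```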

Page 11:

GridKa Tape Storage

2x Oracle/Sun/STK SL8500

2x 10088 slots

22 LTO5, 16 LTO4 drives

1x IBM TS3500

5800 slots

24 LTO4 drives

1x GRAU XL

5376 slots

16 LTO3, 8 LTO4 drives

>20k cartridges

17 PB

Migration to HPSS planned for 2014
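
The library figures above can be summed into totals for slots and drives. This sketch assumes the drive counts listed for the SL8500 apply to both libraries together (the slide does not say whether they are per library), and the data layout is illustrative.

```python
from collections import Counter

# (library, total slots, drives by LTO generation), copied from the slide
libraries = [
    ("2x Oracle/Sun/STK SL8500", 2 * 10088, {"LTO5": 22, "LTO4": 16}),
    ("1x IBM TS3500", 5800, {"LTO4": 24}),
    ("1x GRAU XL", 5376, {"LTO3": 16, "LTO4": 8}),
]

total_slots = sum(slots for _, slots, _ in libraries)

drives_by_generation = Counter()
for _, _, drives in libraries:
    drives_by_generation.update(drives)

total_drives = sum(drives_by_generation.values())

print(f"total slots:  {total_slots}")
print(f"total drives: {total_drives}")
for generation in sorted(drives_by_generation):
    print(f"  {generation}: {drives_by_generation[generation]}")
```

The slot total is comfortably above the cartridge count of more than 20k given on the slide.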

Page 12:

100G WAN at GridKa

Current WAN setup

7x 10 Gb/s links to LHCOPN, LHCONE, German research network, FZU Prague + 1x 1 Gb/s link to Poznan

participation in 100G tests at SC2013

100G equipment provided by CISCO

100G connection provided by DFN, time-shared by Aachen, Dresden, KIT

plan to move LHCOPN, LHCONE to 100G link in 2014

replace old Catalyst border routers

procurement of new Nexus 7k with 100G line cards already underway

requires new arrangement of LHCOPN operation between KIT and DFN

Page 13:

Configuration Management

Still mostly using Cfengine 2

Middleware services used as testbed for Puppet

Started in early 2012

Still based on old homegrown deployment infrastructure “CluClo”

Very smooth operation

Now starting to draw up plans for Puppet migration

We’d like to try many new things: Git integration, deployment management with Foreman, MCollective, …

Will be a step-by-step process

Page 14:

bwLSDF

News from SCC/KIT outside GridKa

new services for the state of Baden-Württemberg run by SCC/KIT

bwSync&Share

“dropbox” for scientists

winner of software evaluation: PowerFolder

start of production Jan 1st 2014

expect 55k active users from all universities, 10 GB quota

bwFileStorage

simple/overflow storage for scientific data

access via SCP, SFTP, HTTPS (r/o)

provided by IBM SONAS

start of production Dec 1st 2013

bwBlockStorage

iSCSI storage over WAN for universities

all services based on storage hosted at Large Scale Data Facility

Page 15:

bwIDM

The bwIDM Project: Vision

• Federated access to services of the State of Baden-Württemberg

• Access control based on local accounts of the home organizations

“bwIDM is not about establishing IDM systems, it’s about federating existing IDM systems and services.”

[Diagram labels: bwLSDF, bwCloud, bwArchive, bwData, bwHPC, bwIDM, bwServices]

20.6.2013 M.Nussbaumer@ISC2013 | Federating HPC access via SAML

Vision: In the state of Baden-Württemberg, researchers can access decentralized web-based AND non web-based services by the use of their local account

Page 16:

bwIDM

bwIDM Overview

• bwIDM
– …federation of 9 universities of the state of Baden-Württemberg → (non) web-based services
– …federates the access to non web-based services such as grid, cloud, and HPC resources.

• LDAP Facade
– Deployable, operable, and maintainable approach to federate non web-based services:

• LDAP facade makes active use of the SAML-ECP and AssertionQuery profile
• LDAP facade offers users a high usability in trustworthy federations
• LDAP facade facilitates temporary trust for scientific portals

• Easy-to-deploy solution for service collaborations of universities, research centres or companies
• Single registration process per service → service access
• Successfully deployed in testing environments

• Deployed Services
– Federated HPC Service “bwUniCluster” (8640 cores, 40.8 TiB RAM, IB FDR) going live in Q4/2013
– Federated Sync&Share Service going live in Q1/2014

• Any Questions? Feel free to contact me: [email protected].


If you have to bring non web-based services together with SAML, make use of the LDAP facade!