Page 1: Fermilab Site Report

Mark O. Kaletka, Head, Core Support Services Department

Computing Division

Page 2: CD mission statement

• The Computing Division’s mission is to play a full part in the mission of the laboratory and in particular:

• To proudly develop, innovate, and support excellent and forefront computing solutions and services, recognizing the essential role of cooperation and respect in all interactions between ourselves and with the people and organizations that we work with and serve.

Page 3: How we are organized

Page 4: We participate in all areas

• Accelerator: TeV BPM project and many projects to help Run II luminosity goals; Accelerator Simulation

• High Energy Physics experiments: older fixed-target analysis, CDF, DZero, MiniBooNE, MINOS, MIPP, test beam, CMS, BTeV, and future proposals such as MINERvA, NOvA, and future kaon experiments

• Astrophysics experiments: Pierre Auger, CDMS, SDSS, and future proposals such as an SDSS extension, the Joint Dark Energy Mission (SNAP), and the Dark Energy Survey

• Theory: Lattice QCD facility

Page 5: Production system capacities

Page 6: Growth in farms usage

Page 7: Growth in farms density

Page 8: Projected growth of computers

[Chart: projected growth in the number of computing nodes and server nodes, FY04–FY08; labeled data points range from 150 to 4,178 nodes.]

Page 9: CD Computer Power Growth

[Chart: projected power growth, comparing projected and actual computer power (KVA) per year from 1995 to 2009 against the FCC maximum of 750 KVA.]

Page 10: Computer rooms

• Provide space, power & cooling for central computers

• Problem: increasing luminosity
  – ~2,600 computers in FCC
  – Expect to add ~1,000 systems/year
  – FCC has run out of power & cooling, cannot add utility capacity

• New Muon Lab
  – 256 systems for Lattice Gauge theory
  – CDF early buys of 160 systems + 160 existing CDF systems from FCC
  – Developing plan for another room

• Wide Band
  – Long-term phased plan FY04–08
  – FY04/05 build: 2,880 computers (~$3M)
  – Tape robot room in FY05
  – FY06/07: ~3,000 computers

Page 11: Computer rooms

Page 12: Computer rooms

Page 13: Storage and data movement

• 1.72 PB of data in ATL
  – Ingest of ~100 TB/mo

• Many 10's of TB fed to analysis programs each day

• Recent work:
  – Parameterizing storage systems for SRM
    • Apply to SAM
    • Apply more generally
  – VO notions in storage systems
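For scale, an ingest of ~100 TB/mo corresponds to roughly 3.3 TB per day, or an average of about 40 MB/s sustained into the tape library.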

Page 14: FNAL Starlight dark fiber project

• FNAL dark fiber to Starlight
  – Completion: mid-June 2004
  – Initial DWDM configuration:
    • One 10 Gb/s (LAN_PHY) channel
    • Two 1 Gb/s (OC48) channels

• Intended uses of link
  – WAN network R&D projects
  – Overflow for production traffic:
    • ESnet link to remain production network link
  – Redundant offsite path

[Diagram: offsite network topology showing the Fermilab border router, the ESnet Chicago PoP, and STARLIGHT, with paths to CERN, UKLight, former MREN sites, I-Wire, CAnet, Abilene (Internet2), other research networks, and the general internet; link speeds of 622 Mb/s and 1 Gb/s (10 Gb/s soon) are annotated, and production versus R&D traffic paths are distinguished in the key.]

Page 15: General network improvements

• Core network upgrades
  – Switch/router (Catalyst 6500s) supervisors upgraded:
    • 720 Gb/s switching fabric (Sup720s); provides 40 Gb/s per slot
  – Initial deployment of 10 Gb/s backbone links

• 1000B-T support expanded
  – Ubiquitous on computer room floors:
    • New farms acquisitions supported on gigabit ethernet ports
  – Initial deployment in a few office areas

Page 16: Network security improvements

• Mandatory node registration for network access
  – "Hotel-like" temporary registration utility for visitors
  – System vulnerability scan is part of the process

• Automated network scan blocker deployed (see the sketch after this list)
  – Based on quasi-real-time network flow data analysis
  – Blocks outbound & inbound scans

• VPN service deployed
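The scan blocker itself is a Fermilab-internal tool, but the underlying idea, flagging hosts whose flow records fan out to an unusually large number of distinct destinations in a short window, can be sketched roughly as follows. This is a minimal illustration, not the production blocker; the flow-record format, threshold, and window size are assumptions.

```python
from collections import defaultdict

# Hypothetical flow record: (timestamp, src_ip, dst_ip, dst_port).
# A real deployment would read these from a NetFlow-style collector.
SCAN_THRESHOLD = 100   # distinct destinations per window (assumed value)
WINDOW_SECONDS = 60    # analysis window (assumed value)

def find_scanners(flows):
    """Return source IPs that contact too many distinct destinations in one window."""
    buckets = defaultdict(set)  # (window, src_ip) -> set of (dst_ip, dst_port)
    for ts, src, dst, dport in flows:
        window = int(ts) // WINDOW_SECONDS
        buckets[(window, src)].add((dst, dport))
    return {src for (window, src), dests in buckets.items()
            if len(dests) >= SCAN_THRESHOLD}

if __name__ == "__main__":
    # Toy data (example addresses): one host touching many destinations in the same minute.
    flows = [(0, "131.225.0.1", f"10.0.0.{i}", 445) for i in range(150)]
    flows += [(0, "131.225.0.2", "10.0.1.1", 80)]
    print(find_scanners(flows))   # -> {'131.225.0.1'}
```

A production blocker would then apply a router or switch block to the flagged sources, covering both inbound and outbound scans as the slide notes.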

Page 17: Central services

• Email
  – Spam tagging in place
    • X-Spam-Flag: YES (see the header-check sketch after this list)
  – Capacity upgrades for gateways, IMAP servers, virus scanning
  – Redundant load sharing

• AFS
  – Completely on OpenAFS
  – SAN for backend storage
  – TiBS backup system
  – DOE-funded SBIR for performance investigations

• Windows
  – Two-tier patching system for Windows
    • 1st tier under control of OU (PatchLink)
    • 2nd tier domain-wide (SUS)
    • 0 Sasser infections post-implementation
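Because tagged messages carry the X-Spam-Flag: YES header, a user or delivery-time filter can route them without any further content scanning. A minimal, hypothetical sketch in Python, assuming Maildir-format mailboxes; the folder paths are illustrative only:

```python
import mailbox

# Hypothetical paths: adjust to the local mail setup.
INBOX_PATH = "Mail/inbox"
SPAM_PATH = "Mail/spam"

def file_tagged_spam(inbox_path=INBOX_PATH, spam_path=SPAM_PATH):
    """Move messages tagged 'X-Spam-Flag: YES' by the gateway into a spam folder."""
    inbox = mailbox.Maildir(inbox_path, create=False)
    spam = mailbox.Maildir(spam_path, create=True)
    for key, msg in list(inbox.items()):
        if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
            spam.add(msg)        # copy into the spam folder...
            inbox.discard(key)   # ...then drop it from the inbox

if __name__ == "__main__":
    file_tagged_spam()
```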

Page 18: Central services -- backups

• Site-wide backup plan is moving forward
  – SpectraLogic T950-5
  – 8 SAIT-1 drives
  – Initial 450-tape capacity for 7 TB pilot project

• Plan for modular expansion to over 200 TB
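As a rough consistency check (assuming SAIT-1's ~500 GB native capacity per cartridge), a fully populated 450-slot library would hold about 450 × 0.5 TB ≈ 225 TB native, in line with the planned modular expansion to over 200 TB; the 7 TB pilot uses only a small fraction of the slots.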

Page 19: Computer security

• Missed by Linux rootkit epidemic
  – but no theoretical reason for immunity

• Experimenting w/ AFS cross-cell authentication
  – w/ Kerberos 5 authentication
  – subtle ramifications

• DHCP registration process
  – includes security scan, does not (yet) deny access
  – a few VIPs have been tapped during meetings

• Vigorous self-scanning program (see the sketch after this list)
  – based on nessus
  – maintain database of results
  – look especially for "critical vulnerabilities" (& deny access)
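The slides do not describe how the results database is implemented; the following is a hypothetical sketch of the bookkeeping step only, assuming scan findings have already been exported from nessus as (host, plugin_id, severity) records. The table layout, severity labels, and access-denial hook are assumptions, not the actual Fermilab tooling.

```python
import sqlite3

def record_results(db_path, scan_date, findings):
    """Store one scan's findings and return hosts with critical vulnerabilities."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS findings (
                        scan_date TEXT, host TEXT, plugin_id INTEGER, severity TEXT)""")
    conn.executemany(
        "INSERT INTO findings (scan_date, host, plugin_id, severity) VALUES (?, ?, ?, ?)",
        [(scan_date, host, plugin, sev) for host, plugin, sev in findings])
    conn.commit()
    critical = [row[0] for row in conn.execute(
        "SELECT DISTINCT host FROM findings WHERE scan_date = ? AND severity = 'critical'",
        (scan_date,))]
    conn.close()
    return critical

if __name__ == "__main__":
    findings = [("node1.fnal.gov", 11835, "critical"),   # example records only
                ("node2.fnal.gov", 10092, "info")]
    for host in record_results("scans.db", "2004-05-24", findings):
        print("would deny network access to", host)      # hook for the blocking step
```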

Page 20: Run II – D0

• D0 reprocessed 600M events in fall 2003
  – using grid-style tools, 100M of those events were processed offsite at 5 other facilities
  – Farm production capacity is roughly 25M events per week
  – MC production capacity is 1M events per week
  – about 1B events/week on the analysis systems

• Linux SAM station on a 2 TB fileserver to serve the new analysis nodes
  – next step in the plan to reduce D0min
  – station has been extremely performant, expanding the Linux SAM cache
  – station typically delivers about 15 TB of data and 550M events per week

• Rolled out a MC production system that has grid-style job submission
  – JIM component of SAM-Grid

• Torque (sPBS) is in use on the most recent analysis nodes (see the submission sketch after this list)
  – has been much more robust than PBS

• Linux fileservers are being used as "project" space
  – physics-group-managed storage with high access patterns
  – good results

Page 21: MINOS & BTeV status

• MINOS
  – data taking in early 2005
  – using "standard" tools
    • Fermi Linux
    • General-purpose farms
    • AFS
    • Oracle
    • enstore & dcache
    • ROOT

• BTeV
  – preparations for CD-1 review by DOE
    • included review of online (but not offline) computing
    • novel feature is that much of the Level 2/3 trigger software will be part of the offline reconstruction software

Page 22: US-CMS computing

• DC04 Data Challenge and the preparation for the Computing TDR
  – preparation for the Physics TDR (P-TDR)
  – roll-out of the LCG Grid service and federating it with the U.S. facilities

• Develop the required Grid and Facilities infrastructure
  – increase the facility capacity through equipment upgrades
  – commission Grid capabilities through Grid2003 and LCG-1 efforts
  – develop and integrate required functionalities and services

• Increase the capability of the User Analysis Facility
  – improve how physicists would use facilities and software
  – facilities and environment improvements
  – software releases, documentation, web presence, etc.

Page 23: US-CMS computing – Tier 1

• 136 Worker Nodes (dual 1U Xeon servers and dual 1U Athlon)
  – 240 CPUs for Production (174 kSI2000)
  – 32 CPUs for Analysis (26 kSI2000)

• All systems purchased in 2003 are connected over gigabit

• 37 TB of Disk Storage
  – 24 TB in Production for Mass Storage Disk Cache
    • In 2003 we switched to SATA disks in external enclosures connected over Fibre Channel
    • Only marginally more expensive than 3ware-based systems, and much easier to administer
  – 5 TB of User Analysis Space
    • Highly available, high-performance, backed-up space
  – 8 TB Production Space

• 70 TB of Mass Storage Space
  – Limited by tape purchases and not silo space
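For reference, the quoted benchmark figures work out to roughly 174,000 / 240 ≈ 725 SI2000 per production CPU and 26,000 / 32 ≈ 810 SI2000 per analysis CPU.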

Page 24: US-CMS computing

Page 25: US-CMS computing – DC03 & GRID 2003

• Over 72K CPU-hours used in a week
• 100 TB of data transferred across Grid3 sites
• Peak number of jobs approaching 900
• Average number during the daytime over 500
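For scale, 72K CPU-hours over a 168-hour week corresponds to an average of roughly 430 CPUs busy around the clock, consistent with the reported daytime average of over 500 jobs and peaks near 900.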

Page 26: US-CMS computing – DC04

[Chart: number of transferred files over time during DC04, 1-Mar-2004 through 26-Apr-2004; y-axis from 0 to 20,000 files.]

Page 27: 1st LHC magnet leaving FNAL for CERN

Page 28: And our science has shown up in some unusual journals…

“Her sneakers squeaked as she walked down the halls where Lederman had walked. The 7th floor of the high-rise was where she did her work, and she found her way to the small, functional desk in the back of the pen.”