PHENIX Computing Center in Japan (CC-J)

Page 1: PHENIX Computing Center  in Japan (CC-J)

PHENIX Computing Center in Japan (CC-J)

Takashi Ichihara

(RIKEN and RIKEN BNL Research Center)

Presented on 08/02/2000 at CHEP2000 conference, Padova, Italy

Page 2: PHENIX Computing Center  in Japan (CC-J)


Contents

1. Overview

2. Concept of the system

3. System Requirements

4. Other requirements as a Regional Computing Center

5. Plan and current status

6. WG for constructing the CC-J (CC-J WG)

7. Current configuration of the CC-J

8. Photographs of the CC-J

9. Linux CPU farm

10. Linux NFS performance vs. kernel

11. HPSS current configuration

12. HPSS performance test

13. WAN performance test

14. Summary

Page 3: PHENIX Computing Center  in Japan (CC-J)


PHENIX CC-J : Overview

PHENIX Regional Computing Center in Japan (CC-J) at RIKEN

Scope: Principal site of computing for PHENIX simulation

PHENIX CC-J aims to cover most of the simulation tasks of the whole PHENIX experiment

Regional Asian computing center

Center for the analysis of RHIC spin physics

Architecture: Essentially follows the architecture of the RHIC Computing Facility (RCF) at BNL

Construction: R&D for the CC-J started in April '98 at RBRC. Construction began in April '99 over a three-year period. 1/3 scale of the CC-J will be operational in April 2000.

Page 4: PHENIX Computing Center  in Japan (CC-J)


Concept of the CC-J System

[Concept diagram] At the RCF (BNL): raw data go through track reconstruction (CRS) to DSTs, stored in HPSS with an STK tape robot and on a 40 TB disk served by SMP servers; analysis (CAS) produces micro-DSTs and physics output. DSTs are imported to the CC-J, mostly via a data duplication facility using SD3 tapes (50 GB/volume) and partly over the APAN/ESnet WAN (about 20 MB/s). At the PHENIX CC-J: HPSS servers with an STK tape robot, a 15 TB disk, SMP servers, and PC farms for analysis and simulation (10k SPECint95) produce micro-DSTs and simulated data, which are exported back to the RCF through the duplicating facility.

Page 5: PHENIX Computing Center  in Japan (CC-J)


System Requirements for the CC-J

Annual data amount

DST 150 TB

micro-DST 45 TB

Simulated Data 30 TB

Total 225 TB

Hierarchical Storage System

Handle data amount of 225TB/year

Total I/O bandwidth: 112 MB/s

HPSS system

Disk storage system: 15 TB capacity

All RAID system

I/O bandwidth: 520 MB/s

CPU ( SPECint95)

Simulation 8200

Sim. Reconst 1300

Sim. ana. 170

Theor. Model 800

Data Analysis 1000

Total 11470

Data Duplication Facility

Export/import DST, simulated data.
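To put these numbers in perspective, a rough back-of-the-envelope sketch (not part of the original slides) shows that 225 TB/year corresponds to an average rate of only a few MB/s, so the planned 112 MB/s tape I/O bandwidth presumably provides headroom for duplication, export/import, and repeated reads:

# Average data rate implied by the annual data volume (rough sketch).
ANNUAL_VOLUME_TB = 225            # DST + micro-DST + simulated data per year
SECONDS_PER_YEAR = 365 * 86400

avg_mb_per_s = ANNUAL_VOLUME_TB * 1e6 / SECONDS_PER_YEAR   # TB -> MB
print(f"average ingest rate: {avg_mb_per_s:.1f} MB/s")      # ~7.1 MB/s

# The planned tape I/O bandwidth (112 MB/s) is roughly 16x this average rate.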

Page 6: PHENIX Computing Center  in Japan (CC-J)


Other Requirements as a Regional Computing Center

Software Environment

• The software environment of the CC-J should be compatible with the PHENIX offline software environment at the RHIC Computing Facility (RCF) at BNL

• AFS accessibility (/afs/rhic)

• Objectivity/DB accessibility (replication to be tested soon)

Data Accessibility

• Need to exchange 225 TB/year of data with the RCF

• Most of the data exchange will be done with SD3 tape cartridges (50 GB/volume)

• Some of the data exchange will be done over the WAN

• CC-J will use the Asia-Pacific Advanced Network (APAN) for the US-Japan connection

• http://www.apan.net/

• APAN currently has 70 Mbps of bandwidth for the Japan-US connection

• Expecting that 10-30% of the APAN bandwidth (7-21 Mbps) can be used for this project:

• 75-230 GB/day (27-82 TB/year) will be transferred over the WAN
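As a quick sanity check of these figures (a sketch, not from the original slides), the daily and yearly volumes follow directly from the assumed usable share of the 70 Mbps APAN link:

# Rough arithmetic behind the WAN transfer estimate (sketch, not from the slides).
APAN_BANDWIDTH_MBPS = 70          # Japan-US APAN bandwidth
SHARES = (0.10, 0.30)             # assumed usable fraction for the CC-J

for share in SHARES:
    mbps = APAN_BANDWIDTH_MBPS * share            # usable bandwidth in Mbit/s
    mb_per_s = mbps / 8                           # -> MB/s
    gb_per_day = mb_per_s * 86400 / 1000          # -> GB/day
    tb_per_year = gb_per_day * 365 / 1000         # -> TB/year
    print(f"{mbps:4.1f} Mbps -> {gb_per_day:5.0f} GB/day -> {tb_per_year:4.0f} TB/year")

# Prints roughly 7 Mbps -> 76 GB/day -> 28 TB/year and
# 21 Mbps -> 227 GB/day -> 83 TB/year, matching the 75-230 GB/day estimate.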

Page 7: PHENIX Computing Center  in Japan (CC-J)


Plan and current status of the CC-J

[Timeline 1998-2002]

At RBRC (BNL): R&D for the CC-J (prototype CPU farms, data duplication facility); CC-J Working Group formed (Oct. 1998); CC-J review at BNL (Dec. 1998); CC-J front end at BNL continues through the construction period.

At RIKEN Wako: HPSS software/hardware installation (March 1999, supplementary budget); CC-J construction in three phases, with Phase 1 (1/3 scale) starting operation in April 2000, Phase 2 (2/3 scale) in 2001, and Phase 3 (full scale) by March 2002, in parallel with the PHENIX experiment at RHIC.

                          Jan. 2000   Mar. 2001   Mar. 2002
CPU farm (number)                64         200         300
CPU farm (SPECint95)           1500        5900       10700
Tape storage size (TB)          100         100         100
Disk storage size (TB)            2          10          15
Tape drives (number)              4           7          10
Tape I/O (MB/s)                  45          78         112
Disk I/O (MB/s)                 100         400         600
SUN SMP server units              2           4           6
HPSS server units                 5           5           5

Page 8: PHENIX Computing Center  in Japan (CC-J)


Working Group for the CC-J construction (CC-J WG)

The CC-J WG is the main body constructing the CC-J

Regular bi-weekly meetings are held at RIKEN Wako to discuss technical items, project plans, etc. A mailing list of the CC-J WG has been created (mail traffic: 1600 mails/year)

Manager (servers, network, HPSS): T. Ichihara (RIKEN and RBRC)

Technical manager (HPSS): Y. Watanabe (RIKEN and RBRC)

Computer scientists:

CPU farms, HPSS: N. Hayashi (RIKEN)

Batch queue system: S. Sawada (KEK)

System monitor: S. Yokkaichi (Kyoto Univ.)

Scientific programming coordinators:

Coordination: H. En'yo (Kyoto and RBRC)

AFS mirroring: H. Hamagaki (CNS, U-Tokyo)

Front end at BNL, data duplication: Y. Watanabe (RIKEN and RBRC)

Software environment: Y. Goto (RBRC)

Prototype CPU farms: A. Taketani (RIKEN)

Page 9: PHENIX Computing Center  in Japan (CC-J)

Current configuration of the CC-J

[Configuration diagram, updated 14 Jan. 2000]

Linux CPU farm: 32 Pentium II (450 MHz) + 32 Pentium III (600 MHz) CPUs, 256 MB memory/CPU, Red Hat Linux 5.2, housed in 4 Alta cluster boxes with a control workstation, connected via 100BaseT x 32 to a Gigabit switch (Cisco Catalyst 2948G).

Data servers: two SUN E450s (an NFS server with 4 CPUs and 1 GB memory, and a general computing environment server with 2 CPUs and 1 GB memory), with 288 GB and 1.6 TB RAID disks plus a 288 GB RAID work area; an experimental AFS server runs on a Compaq DS20.

HPSS: five IBM RS/6000-SP Silver nodes (AIX 4.3.2) serving as HPSS core server, disk movers, and tape movers, with an SP router (Ascend GRF), a HIPPI switch with serial HIPPI links, 288 GB x 2 and 150 GB RAID HPSS cache disks, an HPSS control workstation, a SUN ACSLS workstation, and an STK tape robot (100 TB) with 4 RedWood SD3 drives.

Network: jumbo-frame (9 kB MTU) Gigabit Ethernet (1000BaseSX) through Alteon 180 switches, an EPS-1000, and two layer-3 Gigabit switches on a private address space, with 10/100BaseT distribution and links to the RIKEN LAN and the RIKEN supercomputer in the main building (the CC-J itself is in the computing building).

Page 10: PHENIX Computing Center  in Japan (CC-J)

Photographs of the PHENIX CC-J at RIKEN

Two SUN E450 data servers

1.6 TB RAID5 disk

CPU farm of 64 CPUs

Uninterruptible power supplies (UPS)

HPSS server (IBM RS/6000-SP) and StorageTek (STK) tape robot (100 TB [250 TB maximum])

Page 11: PHENIX Computing Center  in Japan (CC-J)


Linux CPU farm

Memory requirement: 200-300 MB/CPU for a simulation chain

Node specification

• Motherboard: ASUS P2B

• Dual CPUs per node (currently 64 CPUs in total)

• Pentium II (450 MHz) x 32 CPUs + Pentium III (600 MHz) x 32 CPUs

• 512 MB memory per node (1 GB swap per node)

• 14 GB HD per node (system 4 GB, work 10 GB)

• 100BaseT Ethernet interface (DECchip Tulip)

• Linux Red Hat 5.2 (kernel 2.2.11 + nfsv3 patch)

• Portable Batch System (PBS V2.1) for batch queuing

• AFS is accessed through NFS (no AFS client is installed on the Linux PCs)

• Daily mirroring of the /afs/rhic contents to a local disk file system is carried out
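The slides do not say how the mirroring is implemented; a minimal sketch of a daily mirror job, assuming a host that can read /afs/rhic and has rsync installed, and using a hypothetical local target path, might look like this:

#!/usr/bin/env python
# Minimal sketch of a daily /afs/rhic mirror job (hypothetical; the slides do
# not describe the actual implementation). Assumes this host can read
# /afs/rhic (e.g. an AFS-client gateway) and that rsync is available.
import subprocess
import sys
import time

AFS_SOURCE = "/afs/rhic/"                # AFS tree to mirror (trailing slash: copy contents)
LOCAL_MIRROR = "/data/afs-mirror/rhic/"  # hypothetical local, NFS-exported path

def mirror_once():
    """Run one rsync pass and return its exit status."""
    cmd = ["rsync", "-a", "--delete", AFS_SOURCE, LOCAL_MIRROR]
    start = time.time()
    status = subprocess.call(cmd)
    print("rsync finished in %.0f s with status %d" % (time.time() - start, status))
    return status

if __name__ == "__main__":
    # Intended to be started once per day (e.g. from cron).
    sys.exit(mirror_once())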

PC assembly (Alta cluster)

• Remote hardware reset/power control, remote CPU temperature monitoring

• Serial-port login from the neighboring node (minicom) for maintenance (fsck, etc.)

Page 12: PHENIX Computing Center  in Japan (CC-J)


Linux NFS performance vs. kernel

NFS performance test using the bonnie benchmark with a 2 GB file

• NFS server: SUN Enterprise 450 (Solaris 2.6), 4 CPUs (400 MHz), 1 GB memory

• NFS client: Linux RH 5.2, dual Pentium III (600 MHz), 512 MB memory

NFS performance of recent Linux kernels seems to have improved. The nfsv3 patch is still useful for the recent kernel (2.2.14).

– Currently we are using kernel 2.2.11 + nfsv3 patch

– The nfsv3 patch is available from http://www.fys.uio.no/~trondmy/src/

Kernel          NFS write (per char)   NFS write (block)   NFS read (per char)   NFS read (block)
2.2.11          0.6 MB/s               0.5 MB/s            4.7 MB/s              5.4 MB/s
2.2.11+nfsv3    7.1 MB/s               6.5 MB/s            6.4 MB/s              9.8 MB/s
2.2.14          1.1 MB/s               1.9 MB/s            4.7 MB/s              5.8 MB/s
2.2.14+nfsv3    5.5 MB/s               5.6 MB/s            6.2 MB/s              10.2 MB/s
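The numbers above come from bonnie; purely as an illustration of what the block-I/O columns measure, a simplified sequential throughput probe over an NFS mount (hypothetical mount point; this is not the actual benchmark) could look like this:

# Simplified sequential block write/read throughput probe on an NFS mount.
# Only illustrates what bonnie's "block" columns measure; it is not bonnie.
import os
import time

MOUNT_POINT = "/nfs/work"          # hypothetical NFS-mounted directory
TEST_FILE = os.path.join(MOUNT_POINT, "throughput.dat")
FILE_SIZE = 2 * 1024**3            # 2 GB, as in the benchmark above
BLOCK = b"\0" * (64 * 1024)        # 64 kB blocks

def measure_write():
    start = time.time()
    with open(TEST_FILE, "wb") as f:
        for _ in range(FILE_SIZE // len(BLOCK)):
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())       # make sure the data actually left the client
    return FILE_SIZE / (time.time() - start) / 1e6   # MB/s

def measure_read():
    start = time.time()
    with open(TEST_FILE, "rb") as f:
        while f.read(len(BLOCK)):
            pass
    return FILE_SIZE / (time.time() - start) / 1e6   # MB/s

if __name__ == "__main__":
    print("block write: %.1f MB/s" % measure_write())
    print("block read:  %.1f MB/s" % measure_read())
    os.remove(TEST_FILE)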

Page 13: PHENIX Computing Center  in Japan (CC-J)


Current HPSS hardware configuration

• IBM RS/6000-SP

• 5 nodes (Silver nodes: quad PowerPC 604e 332 MHz CPUs per node)

• Core server: 1, disk movers: 2, tape movers: 2

• SP switch (300 MB/s) and 1000BaseSX NIC (OEM of Alteon)

• A StorageTek Powderhorn tape robot

• 4 Redwood drives and 2000 SD3 cartridges (100 TB) dedicated to HPSS

• Sharing the robot with other HSM systems

• 6 drives and 3000 cartridges for other HSM systems

• Gigabit Ethernet

• Alteon ACE180 switch for jumbo frames (9 kB MTU)

• Use of jumbo frames reduces CPU utilization during transfers

• Cisco Catalyst 2948G for distribution to 100BaseT

• Cache disk: 700 GB (total), 5 components

• 3 SSA loops (50 GB each)

• 2 FW-SCSI RAID (270 GB each)

Page 14: PHENIX Computing Center  in Japan (CC-J)


Performance test of parallel FTP (pftp) to HPSS

pput from SUN E450: 12 MB/s for one pftp connection

• Gigabit Ethernet, jumbo frames (9 kB MTU)

pput from Linux: 6 MB/s for one pftp connection

• 100BaseT - Gigabit Ethernet - jumbo frames (defragmentation on a switch)

In total, ~50 MB/s pftp performance was obtained for pput
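The plots summarized below were obtained by varying the number of simultaneous pftp clients; a sketch of how such a scaling test can be driven is shown here (the pftp command line is site-specific, so the invocation below is only a hypothetical placeholder):

# Sketch of an aggregate-throughput scaling test with N concurrent transfer
# clients. The command line is a hypothetical placeholder; the real pftp
# invocation depends on the local HPSS installation.
import subprocess
import time

FILE_SIZE_MB = 1024                # size of each test file pushed into HPSS

def run_clients(n):
    """Start n transfer clients in parallel and return the aggregate MB/s."""
    cmd_template = "pftp_client -put /scratch/test%d.dat"   # placeholder command
    start = time.time()
    procs = [subprocess.Popen(cmd_template % i, shell=True) for i in range(n)]
    for p in procs:
        p.wait()
    elapsed = time.time() - start
    return n * FILE_SIZE_MB / elapsed

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        print("%2d clients: %.1f MB/s aggregate" % (n, run_clients(n)))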

[Plots] Disk transfer rate vs. number of pftp connections (per disk mover d0/d1 and total), total CPU usage of the two disk movers (hpssd0 + hpssd1) vs. number of pftp connections, and CPU usage vs. disk transfer rate.

Page 15: PHENIX Computing Center  in Japan (CC-J)


WAN performance test: RIKEN (12 Mbps) - IMnet - APAN (70 Mbps) - STAR TAP - ESnet - BNL

• Round-trip time for RIKEN-BNL: 170 ms

• File transfer rate is 47 kB/s for an 8 kB TCP window size (Solaris default)

• A large TCP window size is necessary to obtain a high transfer rate

• RFC 1323 (TCP Extensions for High Performance, May 1992) describes the method of using a large TCP window size (> 64 kB)

A large ftp performance (641 kB/s = 5 Mbps) was obtained for a single ftp connection using a large TCP window size (512 kB) over the Pacific Ocean (RTT = 170 ms)

TCP window size   FTP transfer rate (observed)   Theoretical limit for 170 ms RTT
8 kB              41 kB/s                        47 kB/s
16 kB             87 kB/s                        94 kB/s
32 kB             163 kB/s                       188 kB/s
64 kB             288 kB/s                       376 kB/s
128 kB            453 kB/s                       752 kB/s
256 kB            585 kB/s                       1500 kB/s
512 kB            641 kB/s                       3010 kB/s
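The "theoretical limit" column is simply the bandwidth-delay relation, throughput <= window size / RTT; the short sketch below (not part of the original slides) reproduces it:

# Theoretical single-stream TCP throughput limit: window size / round-trip time.
RTT_S = 0.170                                   # RIKEN-BNL round-trip time in seconds
WINDOW_SIZES_KB = [8, 16, 32, 64, 128, 256, 512]

for window_kb in WINDOW_SIZES_KB:
    limit_kb_per_s = window_kb / RTT_S          # at most one window per round trip
    print("%4d kB window -> %5.0f kB/s limit" % (window_kb, limit_kb_per_s))

# Reproduces the table: 8 kB -> 47 kB/s, ..., 512 kB -> about 3010 kB/s.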

Page 16: PHENIX Computing Center  in Japan (CC-J)


Summary

The construction of the PHENIX Computing Center in Japan (CC-J) at the RIKEN Wako campus, which will extend over a three-year period, began in April 1999.

The CC-J is intended as the principal site of computing for PHENIX simulation, a regional PHENIX Asian computing center, and a center for the analysis of RHIC spin physics.

The CC-J will handle about 220 TB of data per year, and the total CPU performance is planned to reach 10,000 SPECint95 in 2002.

The CPU farm of 64 processors (RH 5.2, kernel 2.2.11 with nfsv3 patch) is stable. About 50 MB/s pftp performance was obtained for HPSS access. A large ftp performance (641 kB/s = 5 Mbps) was obtained for a single ftp connection using a large TCP window size (512 kB) over the Pacific Ocean (RTT = 170 ms).

Stress tests of the entire system were carried out successfully. Replication of the Objectivity/DB over the WAN will be tested soon. CC-J operation will start in April 2000.