Level 3 Review

Jan 03, 2016


lucius-ramirez

Page 1: Level 3 Review

Level 3 Review

NT Support

The Level 3 Group: Michael Clements, Doug Chapin, Dave Cutts, Andy Haas, Sean Mattingly, Gordon Watts

• Short Term
• Long Term

Page 2: Level 3 Review

Gordon Watts (UW Seattle) L3 Review 5/4/01 DD

Short Term

Goal: Build a L3 filter that can handle the data rates between now and September 1 (or the start of the Linux farm). Bug fixes too.

Requires

• Faster build/release system
• Verification (minimal)
• Better filter author access to the NT development environment & experts

Page 3: Level 3 Review

Faster Build/Release System

New Machine

• Initial tests indicate a x2 speed-up.
• Multiple logins allowed (flexibility).

Status

• Final release system script testing
• Porting of packages to NT5 (we know what has to be changed)
• Commit to schedule

Do not think there is any real development work left here.

Page 4: Level 3 Review

Faster Build/Release System

Improve Build System

• Initial experiments indicate ctest_nt is x10 faster than gmake.
• A recent upgrade allows one to run tests inline.

Bug Fixes?

Status

• Must be incorporated into the release structure.
• How to deal with “legacy” packages that don’t support the CTEST interface?

Likely 1 week of development work here, then some amount of integration time. Initially the L3 group, then the L3 group & release managers.

Page 5: Level 3 Review

NT5 Port

Status

• Done
• Changes must be fed back into cvs
  – Some have (SSS)
  – Most script changes
  – Fewer than 10 individual changes
• All scripts
• Full build system must be tested
  – Most complex parts have been tested (bldpara)
  – Initial scripts, etc.

Less than a week of work for the L3 Group and Rel Mgrs. Simple author updates.

Page 6: Level 3 Review

Verification

Verification System

• The old build system
  – More effective use of the 4 CPUs.
  – Have money for extra disk.
• Keep local copies of files.
  – Initially by hand, later by script.

Status

• Have to use build systems until verification is ready.
• Just starting (ftp doesn’t like file names longer than 256 characters!).

2 weeks of development time & setup time. Requires both the L3 Group and the Filters Group.

~7000 events/hour
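
The ftp limit noted in the status above can be checked before any transfer is attempted. The following is a minimal sketch, not part of the original slides: the function name is hypothetical, and it assumes the 256-character limit applies to the whole path string, which the slide does not specify.

```python
def names_over_limit(paths, limit=256):
    """Return the paths whose length exceeds `limit` characters.

    Assumption: the limit applies to the full path string; the slide does
    not say whether ftp counts the whole path or only the file name.
    """
    return [p for p in paths if len(p) > limit]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    candidates = ["short_name.raw", "x" * 300 + ".raw"]
    for path in names_over_limit(candidates):
        print(f"too long ({len(path)} chars): {path[:40]}...")
```

Running a check like this over the list of files to be copied locally would flag the problem names up front, whether the copying is done by hand or later by script.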

Page 7: Level 3 Review

Better Development

Author Node

• Maintained environment for users to develop code
  – No more installs on own machine
  – Fast machine. Go to Central Processor Model.
• Install just too complex
  – All platforms, particularly NT

Status

• One up and going. Configuration complete?
  – Unix access
• Second one ordered
• $$ left over for third and possibly fourth
  – So cheap now!

No real work left (one works, others will).

Page 8: Level 3 Review

Code State

State Of Code

• Minimal number of changes to get nt41 to build
• Looks like nt43 went!
• Test-script work
  – Package authors should translate filename locations using cygwin
  – About half already do.

Status

• New tools coming in (CPS, Tracking)
• Most errors are understood by us
• What can we do to help get them fixed?

Figuring out the error requires very little time.

Feedback Loop?
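
To illustrate what translating filename locations involves, here is a minimal sketch that is not taken from the slides. On a real Cygwin installation the `cygpath -w` utility performs this conversion; the pure-Python function below (a hypothetical name) only handles the common `/cygdrive/<letter>/...` form and leaves every other path unchanged, since resolving those would need Cygwin's mount table.

```python
import re

# Matches /cygdrive/<single drive letter> optionally followed by a sub-path.
_CYGDRIVE = re.compile(r"^/cygdrive/([A-Za-z])(/.*)?$")


def cygwin_to_windows(path):
    """Translate a /cygdrive/<letter>/... path to <LETTER>:\\... form.

    Paths that do not start with /cygdrive/<letter> are returned unchanged;
    this sketch does not consult Cygwin's mount table.
    """
    match = _CYGDRIVE.match(path)
    if match is None:
        return path
    drive = match.group(1).upper()
    rest = match.group(2) or "/"
    return drive + ":" + rest.replace("/", "\\")


if __name__ == "__main__":
    # Hypothetical test-script location, for illustration only.
    print(cygwin_to_windows("/cygdrive/c/d0/test/run.dat"))
```

A test script that opens files through native NT tools would apply a translation like this to each location before use.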

Page 9: Level 3 Review

Expert Access

Contacting Us?

• Since last week there has been a flood
  – Almost too much for us to keep up with!
• Most questions are one-time fixes
  – Get them fixed, will not pop up again.
• Quite gratified with the communication
  – SMT tracking, Jet Finding, real filtering in the control room!
• Many, many people contributed to this!

Always room for improvement; ears & inbox open.

Page 10: Level 3 Review

Long Term

Goal: Maintainable and smooth process for building, developing, etc. L3 executables. Complete verification.

Requires

• Faster build/release system
• Verification
• Better filter author access to the NT development environment & experts

Page 11: Level 3 Review

Build System

Improvements

• Recommend moving away from SRT_D0 no matter what
  – Horribly inefficient build system.
• On NT, move away from complex cygwin dependencies
  – Get rid of gmake if at all possible
• Base work on ctest_nt
  – Fast, already well supported by L3 Group

Schedule

• Proof of principle by July 1.
• First production by August 1.
• Improvements through fall.

0.5 FTEs for duration

Page 12: Level 3 Review

Build System

Maintain It

• Once stable, deliberately take the view that it need not be modified often
• L3 filters have a much more restrictive environment than offline packages.
• A few extra features for L3 do not compete with all of L3
• Fewer changes, more stability.

Page 13: Level 3 Review

Doing Builds

Goal

• Make build system boring
  – Close to Ferbelizing it
  – Remove hand art.
• Will still need someone to do builds continuously
  – Target: 0.25 FTE!

0.5 down to 0.25 FTEs for first year

Decreases until Run 2b upgrade due to reduced changes??

Page 14: Level 3 Review

Verification

Email from Amber

What Can One CPU Do?

• Same speed as L3 Farm
• Not memory limited, etc.
• 100 ms per event
• 0.25 MB/event
• 0.86 million events/day
• 2.5 MB/sec
• 210 Gig

Bug Fixes

• 2-4 nodes, dual CPU, data local for speed

Approximately one day’s worth of processing

Or, use production system…

L1/L2 Issues
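
The single-CPU figures above follow from the two inputs on the slide (100 ms per event, 0.25 MB per event). A minimal sketch of the arithmetic, with a hypothetical function name; the conversion 1 GB = 1024 MB is an assumption chosen because it reproduces the slide's "210 Gig".

```python
SECONDS_PER_DAY = 24 * 60 * 60


def single_cpu_throughput(ms_per_event=100.0, mb_per_event=0.25):
    """Return (events/day, MB/sec, GB/day) for one CPU running flat out.

    Assumption: 1 GB = 1024 MB, which is not stated on the slide.
    """
    events_per_sec = 1000.0 / ms_per_event
    events_per_day = events_per_sec * SECONDS_PER_DAY
    mb_per_sec = events_per_sec * mb_per_event
    gb_per_day = events_per_day * mb_per_event / 1024.0
    return events_per_day, mb_per_sec, gb_per_day


if __name__ == "__main__":
    events, rate, volume = single_cpu_throughput()
    print(f"{events / 1e6:.2f} million events/day")
    print(f"{rate:.1f} MB/sec")
    print(f"{volume:.0f} GB/day")
```

With the slide's inputs this gives 864,000 events per day, 2.5 MB/sec, and about 211 GB per day, in line with the quoted 0.86 million events, 2.5 MB/sec, and 210 Gig.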

Page 15: Level 3 Review

Verification

Production Release

• Order of magnitude larger
  – 10-20 million events
  – Terabyte of disk space ($2K?)
• SAM Integration
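
As a rough guide to the CPU cost of a production-scale sample, the sketch below combines the 10-20 million events quoted here with the 100 ms per event figure from the single-CPU estimate earlier in the talk. It is an illustration only: the function name and the CPU count used in the example are hypothetical, and perfect scaling across CPUs is assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def verification_days(n_events, ms_per_event=100.0, n_cpus=1):
    """Wall-clock days to process n_events, assuming perfect scaling.

    Assumption: every CPU sustains 1000 / ms_per_event events per second
    with no I/O or scheduling overhead.
    """
    events_per_sec = n_cpus * 1000.0 / ms_per_event
    return n_events / events_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    for n_events in (10_000_000, 20_000_000):
        one_cpu = verification_days(n_events)
        # Eight CPUs is a hypothetical farm size, e.g. four dual-CPU nodes.
        eight_cpus = verification_days(n_events, n_cpus=8)
        print(f"{n_events:>10} events: {one_cpu:.1f} days on 1 CPU, "
              f"{eight_cpus:.1f} days on 8 CPUs")
```

Under these assumptions a single CPU would need roughly 12 to 23 days, which is why a production-release verification calls for either several nodes or the production system itself.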

Page 16: Level 3 Review

NT Environment

Author Nodes

• Just implemented
  – Undoubtedly will need refinement.
• Increase number of nodes depending upon demand
  – Each node is very cheap

0.1 FTEs for life of experiment

Maintain Author Nodes

0.1 FTEs, but in clumps (L3 Group)

(L3 Group & Rel Mgrs?)

Page 17: Level 3 Review

Software Changes

OS Upgrades

• MS’s slavish commitment to backwards compatibility has some benefits
• Level 3 is a conservative system
  – Imagine OS upgrades will be about once every two years.

Testing: 0.05 FTE (based on NT4->NT5 experience).

Implementing: 0.05 FTE

Some true no matter what we do

Page 18: Level 3 Review

Farm Management

OS, general management

• Automatic tools (AutoStart, etc.).
• Minimal security updates (default deny ACLs).

0.05 FTEs (L3 Group)

General Management

• Configure changes, etc.
• All based on the web

0.05 FTEs (L3 Group)

True no matter the scheme we pick

Page 19: Level 3 Review

Expert Access

Brown & UW Commitment

We will maintain the Level 3 Trigger/DAQ no matter what its form while the experiment takes data!

Providing NT expertise to the experiment is part of this commitment!

0.1 FTE, in clumps (L3 Group)

Page 20: Level 3 Review

Where are we now?

Releases

• nt44 is an excellent release
• Small set of filters in release
• Several more poised to come into release.

Build System

• Works, but slow
• ctest_nt improvements with eye to release building
• Close to ready to move to build machine

Verification

• Just starting
• Needs our effort

Ignoring L3 Farm Issues

• Will have to occur no matter what we do

Page 21: Level 3 Review

Stages

Stage 1 (June 1, 2001): Minimal set of filters running, using new build machine, NT5, author node(s) wide open. High speed readout from L3. Verification by hand.

Stage 2 (July 15, 2001): Robust set of filters, build system speed improvement demonstrated, filtering by default. Small scale automatic verification. Author nodes final.

Stage 3 (September 1, 2001??): Build system speed improvements finished. Full scale verification being implemented. Build system undergoing minimal changes.

Page 22: Level 3 Review

Doing Both

[Timeline diagram. Milestones: Now, June 1, Sept 1. Labels: Minimal Set of Filters; Debugged Filters + unpacking; Linux L4 development work; Linux L4 Farm Initially Ready.]

Page 23: Level 3 Review

Option 2

[Architecture diagram. Labels: Front End Crates; SB; FCH; MCH; L3 Node (x4); Linux Node (x4); Online C/R; Examines; Feynman (SAM); Feynman; Lots of CPU.]

Page 24: Level 3 Review

Possible 3rd Alternative

Use Level 3 as a pre-filter/pipeline

Raw Bandwidth: 5 MB/sec

Unpacked bandwidth/node: 20 MB/sec

GB already?

Simple Filters and Tools: x5 reduction

200 Hz out

Unpacked bandwidth/node: 4 MB/sec

• Make use of idle CPU power
• Maintain what you’ll have by the end of September

Simple Static
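
The bandwidth figures on this slide are related by the x5 reduction factor. A minimal sketch of that arithmetic follows; the helper name is hypothetical, and the input event rate it derives is an inference that assumes the factor applies to the event rate with unchanged event size, which the slide does not state.

```python
def after_reduction(value, reduction_factor=5.0):
    """Scale a rate or bandwidth down by the filter reduction factor."""
    return value / reduction_factor


if __name__ == "__main__":
    unpacked_in_mb_per_sec = 20.0   # from the slide, per node
    output_rate_hz = 200.0          # from the slide

    unpacked_out = after_reduction(unpacked_in_mb_per_sec)
    # Inferred, not on the slide: the input rate that a x5 rate reduction
    # would turn into 200 Hz out.
    implied_input_hz = output_rate_hz * 5.0

    print(f"unpacked bandwidth/node after filtering: {unpacked_out} MB/sec")
    print(f"implied input rate: {implied_input_hz} Hz")
```

The first result reproduces the 4 MB/sec per node quoted on the slide; the second is only as good as the assumption behind it.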

Page 25: Level 3 Review

Conclusions

• Short term effort
  – Has a well functioning system by end of summer/fall
• Long Term
  – Supportable and maintainable
• What is missing?
  – Better understanding of verification
  – If we go the Linux route, we have to support both; not thought through.
• Third alternative
  – Makes use of available CPU
  – Maintains current investment work.