Arnd Meyer (RWTH Aachen) Oct 8, 2003 Page 1 Data Taking Status and Plans Achievements Concerns Plans Arnd Meyer, RWTH Aachen DØ Collaboration Meeting October 8, 2003
Data Taking Status and Plans
Achievements
Concerns
Plans
Arnd Meyer, RWTH Aachen
DØ Collaboration Meeting
October 8, 2003
Data Taking Status
Total data sample on tape with complete detector > 200 pb⁻¹
Data Taking Status cont.
● Lab reached its goal of delivering 225 pb⁻¹ in FY03
Also for DØ: BD delivered ∫L dt = 227.7 pb⁻¹ in FY03
26 pb⁻¹ per month since May
But had to run 6 weeks longer than hoped
● Cuts into our running time next year
Startup November 17 – difficult to anticipate rapid startup
Six week shutdown next summer / fall
● We do not expect to get significantly more ∫L dt in FY04 than we got this year
25% pbar tax (Recycler commissioning), studies ⇒ 233-328 pb⁻¹ in FY04
See Dave McGinnis' presentation on Oct 3 ADM
[Figure labels: FY02, FY03]
FY 04 Luminosity Profile
More (2x/week) and shorter (8 hours) accelerator study periods
Studies only if >140 hrs of store time in the previous 14 days
Higher delivered luminosity through improved stacking rate
Improve stacking rate through shorter pbar production cycle time (2.4 sec → 1.7 sec): 11.3 mA/hr (FY03) → 18 mA/hr (FY04)
≃ 3 months turn-on after shutdown
Pessimistic: ∫L dt ≃ 233 pb⁻¹
Design: ∫L dt ≃ 328 pb⁻¹
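The study-period rule on this slide (studies only if more than 140 hours of store time in the previous 14 days) can be written as a small check. This is a minimal sketch, not an accelerator-division tool: the function name `studies_allowed`, the per-day log format, and the example numbers are all assumptions made for illustration.

```python
from datetime import date, timedelta

def studies_allowed(store_hours_by_day, today, window_days=14, threshold_hours=140.0):
    """Apply the slide's rule: studies only if the store time accumulated in the
    previous `window_days` days exceeds `threshold_hours`.

    `store_hours_by_day` is a hypothetical log mapping a date to hours of store time.
    """
    window_start = today - timedelta(days=window_days)
    recent_hours = sum(
        hours for day, hours in store_hours_by_day.items()
        if window_start <= day < today
    )
    return recent_hours > threshold_hours

# Made-up example log: 11 hours of store time on each of the previous 14 days.
today = date(2004, 3, 15)
log = {today - timedelta(days=i): 11.0 for i in range(1, 15)}
print(studies_allowed(log, today))
```

For scale, 140 hours is roughly 42% of the 336 hours in a 14-day window.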
... and Plans / Dreams
[Figure annotations: "You are here"; 3⋅10³² cm⁻² s⁻¹]
2004
● STT fully commissioned
● Missing pieces of CTT fully commissioned
● Taking data with rates of 2.5 kHz / 1 kHz / 50 Hz after L1 / L2 / L3 and 90% average efficiency
● CPS / FPS used in the trigger and for physics
● Most data quality problems are caught online (Global Monitoring et al.)
● 1-2 fewer people on shift
● Taking shifts and improving the detector is not considered a necessary evil
● We have 0.5 fb⁻¹ of good data on tape
● Reco takes 1 sec/event on my 2-year old desktop
Data Taking Efficiency
[Figure: data taking efficiency over time; annotations: "Shutdown", "Winter '03 Shutdown", "The Lucky Week", "Pre-shutdown special runs/studies"]
Data Taking Efficiency cont.
● Average data taking efficiency for 2003 is 86%
● Current upper limit is ≃ 95%
3-4% global front end busy
1% begin and end store transitions
<1% run transitions
● Typically in the upper 80%'s for the last six months
Since Beaune, “lost” 83.5 hours of store time (12.6% by time)
12 hours for special runs
Largest single failure (4 hrs): low airflow trips of L1CAL on July 27/28. Fan belt replaced. Symptomatic: some of the largest downtimes are one-time occurrences
Failures by component (without special runs):
– SMT: 12 hours
– Muon/L1Muon: 12 hours (trips, readout errors/crashes, trigger problems)
– CAL: 9 hours (mostly BLS power supplies and hot trigger); + 5 hours L1CAL
– CFT/CTT/PS: 5 hours
– L3/DAQ/Online: 4 hours
Tracking crates readout; collaboration's decision: L1A vs. FEB
Fairly optimized, continuous effort to keep this low
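The downtime figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and it assumes that the "+ 5 hours L1CAL" is counted in addition to the 9 CAL hours.

```python
# Numbers quoted on the slide (store time lost since Beaune).
lost_hours = 83.5
lost_fraction = 0.126            # "12.6% by time"
special_runs_hours = 12.0

# Failures by component, in hours (L1CAL kept as its own entry: an assumption).
failures_by_component = {
    "SMT": 12,
    "Muon/L1Muon": 12,
    "CAL": 9,
    "L1CAL": 5,
    "CFT/CTT/PS": 5,
    "L3/DAQ/Online": 4,
}

# Total store time implied by "83.5 hours = 12.6%".
total_store_hours = lost_hours / lost_fraction

# Hours itemized by component, and what remains after special runs.
itemized_failures = sum(failures_by_component.values())
unitemized_hours = lost_hours - special_runs_hours - itemized_failures

# Efficiency ceiling from the quoted irreducible losses, in percent:
# 3-4% front end busy, 1% store transitions, <1% run transitions.
upper_limit_range = (100 - 4.0 - 1.0 - 1.0, 100 - 3.0 - 1.0 - 0.0)

print(round(total_store_hours), itemized_failures, unitemized_hours, upper_limit_range)
```

Under these assumptions the quoted losses correspond to about 663 hours of store time, the component list accounts for 47 hours, and 24.5 hours are not itemized on the slide. The irreducible losses bracket the ceiling between 94% and 96%, consistent with the quoted ≃ 95%.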
Data Taking Efficiency cont.
● Much of the time running close to our desired efficiency to tape – 90%
● At the same time, data quality improves
Number of conditions that cause us to stop data taking (automagically or manually) is continually increasing
● Credit to many (few) dedicated people!
● Last weeks of data taking had large number of special runs, increased number of accelerator studies – still ran over 80%
● Large downtimes are discussed at the weekly operations meeting
● Several systems marginal in terms of expert coverage: one or no resident expert – no manpower for proactive improvements
http://www-d0.fnal.gov/runcoor/
Run II Bests
Regularly updated:
Best days by data taking efficiency
– 95.0% on June 22nd; so far 8 days with 93% or better efficiency
Best runs and days by recorded luminosity (Aug 10, 488 nb⁻¹; May 4, 1.68 pb⁻¹)
Best stores by initial DØ luminosity (Aug 10, 4.55⋅10³¹ cm⁻² s⁻¹)
http://www-d0.fnal.gov/runcoor/
A “Typical” Store
[Figure annotations: wobbly L1 rates (CAL); initial lum. 3.9⋅10³¹ cm⁻² s⁻¹; store lost (quench); 5-3% L1 busy]
Present max. rate guidelines:
Level 1    1.4 kHz    FEB < 5%
Level 2    800 Hz     Muon r/o
Level 3    50 Hz      Offline (30% room)
(Level 1 and Level 2: 5-10% headroom to account for rate fluctuations)
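The rate guidelines above imply a rate reduction at each trigger level, and they lend themselves to a simple online check. The following is a rough sketch: the function names and the measured rates in the example are hypothetical, and only the three guideline values come from the slide.

```python
# Present maximum rate guidelines from the slide, in Hz.
guidelines_hz = {"L1": 1400.0, "L2": 800.0, "L3": 50.0}

def rejection_factors(rates_hz):
    """Rate reduction between consecutive trigger levels."""
    levels = ["L1", "L2", "L3"]
    return {
        f"{upper}->{lower}": rates_hz[upper] / rates_hz[lower]
        for upper, lower in zip(levels, levels[1:])
    }

def over_guideline(measured_hz, limits_hz=guidelines_hz):
    """Return the levels whose measured rate exceeds its guideline."""
    return [level for level, rate in measured_hz.items() if rate > limits_hz[level]]

print(rejection_factors(guidelines_hz))
# Made-up measured rates for illustration only.
print(over_guideline({"L1": 1500.0, "L2": 700.0, "L3": 50.0}))
```

At the guideline values, Level 2 reduces the Level 1 rate by a factor of 1.75 and Level 3 reduces the Level 2 rate by a factor of 16.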
Control Room Shifts
● It is a burden on the collaboration to fill that many shifts (and schedule them!)
The shifter duty is 7 shifts / 6 months per person on masthead
Only about 1 shift per month per person (on average!)
● Calorimeter and Muon shifts consolidated into CalMuon shifts since June
Rocky at times, but overall OK
Cost of additional training offset by savings in total number of shifts
There is more training involved – took some time to be realized by “old” shifters
● Next natural choice for merging is SMT / CFT – will require initiative from detector groups (clear instructions, simplify, automate)
● More than a third of the collaboration have not yet taken a single shift in 2003
● Request for new DAQ shifters: response was mediocre
Have a very good crew currently, but could use more volunteers – we need >26 new DAQ shifters per year. Need one more DAQ shifter this year!
I think I am too naïve for this job
The fact that we are collecting data with high efficiency is to a large part due to the presence of 5 – 6 well trained people in the control room
Data Quality & Global Monitoring
● Online data quality monitoring consists of three (four) parts
Significant Event System (“Slow Control”)
– Catches an increasing number of hard- and software failures
– In many cases pauses the run to ensure consistent data quality
– Working very well, could use additional expert guidance
DAQ Artificial Intelligence
– Notifies shifter of abnormal conditions (global rate fluctuations, BOT trigger rate, ...)
– Automagically fixes certain problems (SCLinit), e.g. sync. problems in L2
Sub-detector monitoring examines
– Many expert-level plots (but experts are not generally on shift)
Global Monitoring
– Trigger rates, Trigger Examine, Vertex Examine, Physics Examine
● Global Monitoring has great potential, but there are many issues – examples follow
Global Monitoring cont.
● Technical Issues
During the transition from trigger list v11 to v12, ran for weeks with wrong/bad reference plots (different triggers, then rapidly changing prescales)
PhysEx uses random sample – should be based on certain triggers
Low statistics (slow reconstruction)
● Psychological Issues
Lack of interaction between detector shifters and GM
– GM detects feature in Gtrack phi distribution – SMT shifter cannot correlate with his occupancy plots
– Need effort from all detector groups to bring their expertise into GM plots (only Muon group has done this so far)
● Organizational Issues
Shifts not being filled (e.g. 8 in August)
– If we can't fill all shifts, need to think about merging with Captain's and other shifters' duties
DQ & GM cont.
● LmTrigger urgently needs improvement
For example averaging over different time periods, uncertainties, luminosity dependence of trigger cross sections, ...
Extremely important tool to identify problems quickly
● Overall, somewhat slow progress (remember Beaune?)
● If we want to continue reducing the number of shifters in the control room, GM needs a major effort (time, people, attention)
From the collaboration – great task for groups new to DØ
Automation should be the goal
Need to catch all major problems online
● Up to one third of the data is thrown away in the analysis stage – sad!
Everybody who discards data through “bad” or “good” run lists should make that extra step and think about how to catch the problems earlier!
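The LmTrigger wish list on this slide (averaging over different time periods, uncertainties, luminosity dependence of trigger cross sections) rests on one relation: a trigger cross section is the trigger rate divided by the instantaneous luminosity. The sketch below illustrates that relation only. It is not LmTrigger's interface: the function names, the input format, and the example numbers are assumptions, and the uncertainty is purely statistical (Poisson on the counts), ignoring the luminosity measurement error.

```python
import math

def trigger_cross_section(counts, live_seconds, lumi_e30):
    """Cross section in microbarn from trigger counts in one interval.

    lumi_e30 is the average luminosity in units of 1e30 cm^-2 s^-1, so
    lumi_e30 * live_seconds is the integrated luminosity in inverse microbarn.
    Returns (sigma, statistical_error).
    """
    if counts < 0 or live_seconds <= 0 or lumi_e30 <= 0:
        raise ValueError("counts must be >= 0; live time and luminosity must be > 0")
    integrated_lumi = lumi_e30 * live_seconds
    return counts / integrated_lumi, math.sqrt(counts) / integrated_lumi

def averaged_cross_section(intervals):
    """Average over a list of (counts, live_seconds, lumi_e30) intervals by
    summing counts and integrated luminosity before dividing."""
    total_counts = sum(counts for counts, _, _ in intervals)
    total_lumi = sum(seconds * lumi for _, seconds, lumi in intervals)
    return total_counts / total_lumi, math.sqrt(total_counts) / total_lumi

# Made-up numbers: 10000 triggers in 100 s at 40e30 cm^-2 s^-1.
print(trigger_cross_section(10000, 100.0, 40.0))
```

Comparing the per-interval values against the average is one way to expose a luminosity dependence: a trigger whose cross section drifts as the store decays shows up as intervals that disagree with the average by more than their errors.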
Shutdown Plans & Progress
● In the middle of 10 week shutdown (Sep 8 – Nov 17)
7-8 weeks for experiments, 2-3 additional weeks with limited access
● Four scheduled power outages – recovered well from the first two (and unscheduled outage on Sep 26), two more on October 13 and 18
● 24x7 DAQ shifts and day shift Captains – thankless task! (“Good God, please, someone, if you see me in the parking lot, run me over. Kill me. I am so bored to death.” - anonymous DAQ shifter)
Keep as much as possible of the detector in working order
● Major goals for the shutdown
Improve reliability: reduce access time, periods with incomplete detector etc.
Improve quality of the data: reduce calorimeter noise, repair Silicon HDI's
● Detector groups together with mechanical and electrical support groups have developed detailed job lists including detector opening and closing, allocation of manpower resources from detector groups and support teams, survey as needed during major moves
● Access went smoothly so far – no accidents, on schedule, great support
● See Friday's talks and parallel sessions for details
Silicon Status
● Bias scans (August): measure depletion one layer at a time
Use tracks from CFT and other SMT layers to determine cluster charge as a function of HV
Runs with HV varied between 0% and 100% in 10% steps
SMT in full readout (no sparsification)
Average over ladders (statistics)
[Figure: x-axis fluence of 1 MeV n/cm² (0 to 4.5E+13); legend: DS Ladder I, DS Ladder II, DS wedge, SS Ladder I, SS Ladder II, DSDM Ladder, Hamburg Model (n), Current Dose]
● Shutdown: Repairs of failed HDI's / electronics
Before shutdown: 136 disabled (up from “irreducible” 84 just after Jan shutdown)
12 are repaired (2 weeks of 2 shifts/day with 4 people); ≃100 are presently disabled
59 are unstable – study in the coming two weeks. Some recoverable for partial readout
● Cancellation of Silicon replacement means we must plan to operate current detector for the life of the experiment
Have to evaluate what steps can be taken to increase chances of the detector's reliable operation long term
CFT and Preshower
● Maintenance of LVPS's and installing upgraded LVPS (better connectors) – this week
● Maintenance of the VLPC He cooling system
● Modifications of AFE boards to remove unused SVX inputs from the readout
Reduce data size and DAQ deadtime
Pedestal RMS's across all channels are higher than before the shutdown, under investigation
Continue testing AFE modifications over the next week.
Calorimeter
● Calorimeter Workshop, plenty of presentations during the week
● Replaced all large cooling fans for preamplifier cooling
● Studies of calorimeter noise – access priority given to “noise task force”
Controlled power-up after power outages
Investigating how SMT noise gets into the Calorimeter
Source of 10MHz noise
– Went around with a small antenna to identify sources
Grounding test planned for October 13
– Detailed plan in place
– Disconnect AC, safety ground, telephone, etc.
– Attach current controlled power supply, slowly increase current up to 100A, and look for heat sources
Preliminary findings this week, final report later
Muons, Lumi, FPD
● Forward Muon
First-time access to A layer forward muon tracker (8-12 hours for opening+closing) completed: replacement of preamplifiers, gas leaks, gas monitors; C layer repairs in progress
Number of non-working channels now 0.15% (trigger counters), 0.5% (drift tubes)
● Central Muon
Installation of extra trigger counters under way – running into a few snags (tight clearance on east side) but no show-stoppers
Installation of 144 remote power cycle relays for front-end electronics and all relevant cabling – on schedule, first driver board tested
Pulled a couple of wires drawing moderate to high currents for investigation
Installed Power PC's in the remaining muon readout crates that had 68k's
● Luminosity system – Cable work in the gaps last Thursday. Come out of the shutdown with complete readout electronics
● Forward proton detector – Maintenance and installation of electronics for full system operation on schedule. Waiting for parts
Trigger, DAQ, Online, General
● Firmware upgrades on L1CTT and L1Muon, maintenance on L1CAL: progressing well
● All Level 2 Alphas have been replaced with Betas! Running smoothly so far, updated shifter instructions
Now progressing on “worker” and “administrator” code to run with 2 Betas per crate
● Online & Controls
Replaced 8 disks that have died over the last 2 years (4 disks for the data logger)
Offline switch upgrade planned for Oct 13 – required electrical work means extended downtime for all online systems as well
Major online software upgrades this week – Python, Epics, VxWorks
● Still some downtimes ahead for TFW (new TCC etc.) and Online (IP shuffling etc.) upgrades/maintenance
● General detector maintenance – Air handlers, hydraulic systems, vacuum jackets, cooling water systems, ODH heads, etc.
● Old Cryo UPS replaced – Switched to commercial power source last Tuesday, then to new UPS on Thursday
Conclusions
● Detector is running well (better than some people want to make us believe)
DØ was in good shape for physics data taking until the last store
Dmitri, detector group leaders and crews, support groups, daily operations have been doing a great job in operating the detector and have done a lot of planning for a successful shutdown
Data taking efficiency 86% for the year – “physics analysis efficiency”??
214 pb⁻¹ integrated luminosity in hand with full detector in readout
Progress in online data quality monitoring not as good as hoped for
● Shutdown progressing well and on schedule
A lot of work scheduled
Brace yourself for 2 more power outages: Oct 13 (½ hour) and Oct 18 (8 hours)
● Bring the detector back into shape by the end of October
Assume access ends on November 17th, close detector the week before
Plan to finish all major jobs in 7 weeks, only limited access during weeks 7 – 10
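The shutdown dates quoted in this talk (Sep 8 to Nov 17, a 10-week shutdown with major jobs planned for the first 7 weeks) can be checked with the standard library. This is a small sketch; the variable names are illustrative, and it assumes the weeks are counted from Sep 8, which the slides do not spell out.

```python
from datetime import date, timedelta

# Dates quoted on the shutdown slide.
shutdown_start = date(2003, 9, 8)
shutdown_end = date(2003, 11, 17)

# Length of the shutdown in whole weeks plus leftover days.
weeks, extra_days = divmod((shutdown_end - shutdown_start).days, 7)

# End of the seven weeks planned for major jobs (assumes counting from Sep 8).
major_jobs_done = shutdown_start + timedelta(weeks=7)

# Remaining power outages listed in the conclusions.
outages = [date(2003, 10, 13), date(2003, 10, 18)]
outages_in_major_window = [d for d in outages if shutdown_start <= d < major_jobs_done]

print(weeks, extra_days, major_jobs_done.isoformat(), len(outages_in_major_window))
```

With this counting the shutdown is exactly ten weeks long, the seven-week mark falls on October 27, in line with the goal of having the detector back in shape by the end of October, and both remaining outages fall inside the seven-week window.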