1
CNAP 22nd March 2004
Summary of Atlas Petabyte Data Store User Group Meeting
March 4th 2004
2
Summary of recent developments
• LHC, PP community and hardware upgrade, and media migration (Tim Folkes)
• SRB interface (Bonny Strong)
• SE interface for GRIDPP (Jens Jensen)
• Belt and braces:
  • Improved environmental monitoring
  • Disaster recovery: new off-site back-up service
• OAIS, the RLG and trusted digital repositories (David Giaretta)
3
9940B connections
[Diagram: eight 9940B drives (1-8) in the STK 9310 “Powder Horn” library, connected through Switch_1 and Switch_2 (ports 11-15) to four RS6000 hosts (adapters fsc0 and fsc1; tape devices rmt1-rmt4 and rmt5-8). The diagram also carries four 1.2TB labels and a Gbit network.]
4
SRB Example: CMS
• Largest project using CCLRC SRB services at present is the CERN CMS experiment.
• SRB chosen for Pre-Challenge Production in 2003, producing data for Data Challenge 2004.
• ADS driver for SRB was developed to meet CMS immediate needs.
• SRB server installed for CMS which interfaces to ADS.
5
Future Plans for SRB to ADS
• The SRB driver developed for CMS will be expanded for use by other projects.
• ADS will run an SRB server for integration into any SRB domain.
• Will translate the SRB user name and/or domain name into an ADS owner name.
• Will use the pathtape server to map SRB collection names to ADS 6-character tape names.
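As a rough illustration of the two translations described above, the sketch below maps an SRB user name and domain to an owner name, and a long SRB collection name to a 6-character tape name. It is a minimal sketch only: the table contents and the `owner_for` and `tape_name_for` helpers are hypothetical and do not describe the real pathtape server or the ADS driver.

```python
# Hypothetical illustration; not the real pathtape or ADS interfaces.
TAPE_NAME_LENGTH = 6  # ADS tape names are 6 characters long (from the slide)

# Assumed lookup tables; the real mappings would be held by the service itself.
OWNERS = {("cmsuser", "cern"): "cms"}
TAPES = {"/cms/pcp03/run001": "CMS001"}


def owner_for(srb_user: str, srb_domain: str) -> str:
    """Translate an SRB user name and domain name into an owner name."""
    try:
        return OWNERS[(srb_user, srb_domain)]
    except KeyError:
        raise LookupError(f"no owner registered for {srb_user}@{srb_domain}")


def tape_name_for(collection: str) -> str:
    """Map an SRB collection name to a 6-character tape name."""
    name = TAPES.get(collection)
    if name is None:
        raise LookupError(f"no tape registered for {collection}")
    if len(name) != TAPE_NAME_LENGTH:
        raise ValueError(f"tape name {name!r} is not {TAPE_NAME_LENGTH} characters")
    return name


if __name__ == "__main__":
    print(owner_for("cmsuser", "cern"), tape_name_for("/cms/pcp03/run001"))
```

Keeping the two lookups separate mirrors the two bullets above: one translation concerns who owns the data, the other where it is stored.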
6
APS Recent New Users & Potential New Users
• Recent New Users
  • National Crystallography Service, Southampton University (~2TB/yr?)
  • WASP (30TB/yr?)
  • VIRGO Consortium (3TB/yr?)
• Potential New Users
  • Integrative Biology (15TB/yr?)
  • Diamond? (1-3PB/yr?)
  • BBSRC (BITS)? (10-20TB/yr?)
  • Arts and Humanities Data Service? (2TB/yr)
7
Actual Growth 1997-2003
[Chart: Cumulative Data Volume (GB) and Actual Growth (GB), plotted quarterly from Jun-97 to Dec-03. X-axis: time (years). Y-axis: data volume (GB), -20000 to 100000.]
8
Questionnaire responses
62% from CCLRC; 38% external
75% currently using ADS; 25% not currently using or not users.
Average years of use 7.4
Max years of use 20.0
Min years of use 0.8
SD years of use 6.6
Some role descriptions of those responding: “Sys admin”, “Data Analysis and data provision”, “Experiment coordinator”, “Archiver”, “User”, “Project Data Storage Manager”, “Responsible for project back-ups”, “Project Manager”.
9
Questionnaire – Motivation and assessment
“Convenient”, “Easy”, “Reliable”, “Support available”, “Secure”, “Long term back up”, “Large volume”
“No need to get involved with tape storage”;“No perceived alternative”
Mean Score (out of 10) 8.2
Min 5.0
Max 10.0
SD 1.8
10
Questionnaire – Web page usage
Web page usage %
Never 21
Rarely 14
Occasional 57
Often 7
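The four rows above can be grouped into two broader bands, those who use the web pages at least occasionally and those who rarely or never do. A minimal sketch of that arithmetic, using only the percentages in the table (the variable names are illustrative):

```python
# Percentages copied from the web page usage table.
usage = {"Never": 21, "Rarely": 14, "Occasional": 57, "Often": 7}

at_least_occasional = usage["Occasional"] + usage["Often"]
rarely_or_never = usage["Never"] + usage["Rarely"]
total = sum(usage.values())  # each row is a rounded percentage, so this need not be 100

print(at_least_occasional, rarely_or_never, total)
```

The rows sum to 99 rather than 100, which is consistent with each percentage having been rounded separately.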
11
Questionnaire – Communication & Awareness
Preferences for improved methods of communication
Method % For % Against % Maybe
Need for list server 71 29 0
Need for user group meeting 57 29 14

User awareness of recent developments
Awareness of Aware (%) Not aware (%)
Hardware upgrade 79 21
SE interface 29 71
SRB interface 50 50
12
Improvements or changes required to the service (1)
• Backup service available on wide platform i.e. Windows PC etc
• Require SRM interface
• Need to store data sets with long names (i.e. > 6 chars) - and better than pathtape look-up is required
• Native support for full path names (i.e. not having to use the pathtape service). Tiny tape names
• Use email more for known downtimes etc
• Ability to store large files (> 2Gb)
13
Improvements or changes required to the service (2)
• More online storage / caching (depending on future requirements)
• Web / Grid interface
• User-queryable database of usage statistics, e.g. to find out my top-100 datasets, or to see how many times this year / month / etc a particular item has been accessed. Having this as a database that I can query using JDBC from my own management applications would be even better than static reports.
• Metadata lookups: it would be useful to check the file size directly from flfsys
14
Improvements or changes required to the service (3)
• Transparent file access (HSM) so that we could forget about (virtual) tapes
• Fix the problem between Solaris and the ADS software regarding multiple files on ADS datasets
• Provide a backup and archive interface for NT servers.
• Really good tape changer driver mapped into Windows server 2003. (More support required)
• Quicker access to off line tapes to improve speed of restores.
• More documentation.
• More user-friendly commands for such things as rules
• Price control.
15
Ranked User issues
Question User specified issue Mean response (A-K)
3 Need to store data sets with long names (i.e. > 6 chars) - and better than pathtape look-up is required 7.9
4 Native support for full path names (i.e. not having to use the pathtape service). Tiny tape names 7.7
6 Ability to store large files (> 2Gb) 7.3
18 Price control. 6.5
8 Web / Grid interface 6.4
5 Use email more for known downtimes etc 6.2
16 More documentation. 6.1
7 More online storage / caching (depending on future requirements) 5.6
17 More user-friendly commands for such things as rules 5.6
1 Backup service available on wide platform i.e Windows PC etc 5.4
15 Quicker access to off line tapes to improve speed of restores. 5.3
9 User-queryable database of usage statistics, 5.0
16
Conclusions (1)
1. Responses have been received mainly from technical, hands-on users, with a good balance between CCLRC and external users.
2. The majority of responses have been received from people who are currently using the Data Store. Most have many years of experience of using it.
3. The responses received represent approximately 20% of the active users. (Total number of active[1] users = 84)
Given 1, 2 and 3 above, the responses come from a knowledgeable section of experienced users, both internal and external to CCLRC, who comprise a representative proportion of all current active users. On this basis the responses can be regarded as reliable and used with confidence.
17
Conclusions (2)
Most users understand the advantages of the ADS, i.e. they know what they want.
Overall, most users get what they want from the service (8.2/10). We now have a measure from which to improve.
Some of the improvements identified by the users have already been addressed or are now being addressed. For those that have not, further clarification is required in order to understand how important each issue is to other users, and to define the problem well enough to consider appropriate solutions. What mechanisms could be used to achieve this?
Most users were aware of the recent hardware upgrade, although a surprisingly high proportion of users (21%) were not. Most users were unaware of the SE interface, and only half were aware of the SRB interface. This matters because there are improved services coming on line from the development team, which some users may wish to take advantage of.
18
Conclusions (3)
Most users (64%) use the web page at least occasionally, whereas 35% use it rarely or never.
Communication between users and the development team needs to be improved. Given that most users make at least occasional use of the web pages, the simplest and most effective means of doing so is to keep the web site up to date with current developments. However, this will not be successful for around one third of users.
Almost 80% of users are in favour of an email list server service. The combination of this with an improved web site should be adequate.
Almost 60% of users are in favour of user group meetings. These should be continued, probably yearly.
19
Backups
20
Digital Curation Centre (DCC)
• Joint collaboration between CCLRC, UKOLN, and Edinburgh and Glasgow Universities.
• Provide advice, support, research and development into aspects of Digital Curation for the UK HE community
• Funded jointly by JISC and EPSRC - £1m/year for three years initially (Feb 2004 - 2007)
• Establish collaboration with industrial partners…
21
ADS Running Costs 04/05. (Option 1).
H/W maintenance 11%
S/W maintenance 3%
Hardware 15%
Network 0%
Other 5%
Staff costs 66%
22
3590/9940 Drive connections (old)
[Diagram: four 3590 drives and four 9940 drives in the STK 9310 (~6000 slots), attached to four RS6000 hosts labelled 54G, 216G, 108G and 108G, on a 100Mbit network.]
23
Real drive performance
[Chart: drive activity at two-week intervals from 01/01/2003 to 28/01/2004. Series: Mounts, Reads, Writes, Real mounts and MB/Sec. One axis runs from 0 to 20000; a second axis runs from 0 to 30 Mbytes/sec. The upgrade is marked on the chart.]