GridPP3 Storage Perspective, Achievements, Challenges Jens Jensen, STFC RAL GridPP20 TCD Dublin, 11-12 March 2008
Mar 28, 2015
GridPP3 Storage Perspective, Achievements, Challenges
Jens Jensen, STFC RAL
GridPP20
TCD Dublin, 11-12 March 2008
Jens Jensen, STFC/RAL
“Bear with me for a moment”
• View of the past
  – Achievements
  – Lessons learned
• Present
  – SRM 2 deployment
• Future
  – Todo
  – Really high level stuff
Who we are…
• GridPP storage community
• As defined by mailing list, has ~55 members
  – Covers every UK site
  – Also in .ie, .nl, .ca, .pl, .it, .de
• However, not all are equally active…
  – But that’s OK
  – Isn’t it?
Support
[Figure: support layers: Developers; Dev support; Depl. support; GridPP support; community support; (local) users]
Support
[Figure: support layers (Developers; Dev support; Depl. support; GridPP support; community support; (local) users), annotated “1 person…”]
Support
[Figure: support layers (Developers; Dev support; Depl. support; GridPP support; community support; users), annotated “Maybe reality is a little more complicated”]
Your name appeared among the beneficiaries who will receive a part-payment of US$2.8 million and has been approved already for months. You are requested to get back to me for more direction and instruction on how to receive your fund. We want to hear from you before we can make the transfer
• Open for questions, goes to Greig and Jens
• Almost all spam
• Promising to solve our financial problems
• They tell us: “Storage, size matters”
Status
• 2/3 of sites running DPM
  – Experimentally on Lustre
  – (Cambridge, UCL)
• 1/3 of sites running dCache
• Tier 1 running CASTOR
  – (and dCache)
• Bristol (Jon) running StoRM
Status
• Finished CCRC 08
• Should have SRM2 deployed
  – At least for ATLAS (sites)
    • Need space token descrs
    • Problems with space manager in dCache
  – And CMS (sites)
    • More static token descrs initially
  – Information system secondary (tokens static)
    • Still req’d for accounting
• Many people worked hard to make it a success
Experiences
• Went well, mostly
• SRM2 used at RAL
  – Few odd bugs and issues
  – E.g. “-0.00P” free
  – Negative file sizes (gridftp 32 bit issue?)
• Took time to get space token (descr) agreed
• Who speaks for expts?
• Using spaces at T2s
  – OK for DPMers
    • Needs firewall open
    • Endpoint published
    • Spaces set up
  – Harder for dCache
    • Problems with space mgr
    • But running on same port
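The two oddities listed above (“-0.00P” free and negative file sizes) are consistent with simple numeric artifacts, although the slide itself only guesses at the cause. As a minimal sketch, assuming (hypothetically) that a byte count is truncated to a signed 32-bit integer somewhere in the transfer path, and that free space is printed in petabytes with two decimals, the following Python reproduces both symptoms. The function names are illustrative only and are not taken from GridFTP or any SRM implementation.

```python
import struct


def as_signed_32bit(size_bytes: int) -> int:
    """Reinterpret the low 32 bits of a byte count as a signed integer.

    Assumption: this mimics what happens when a 64-bit file size is
    stored in a 32-bit signed field.
    """
    low32 = size_bytes & 0xFFFFFFFF
    return struct.unpack("<i", struct.pack("<I", low32))[0]


def format_petabytes(size_bytes: float) -> str:
    """Format a byte count in petabytes (10**15 bytes) with two decimals."""
    return f"{size_bytes / 1e15:.2f}P"


# A 3 GiB file no longer fits in a signed 32-bit field and wraps negative.
print(as_signed_32bit(3 * 1024**3))

# A small negative "free space" value keeps its sign after rounding.
print(format_petabytes(-5e9))
```

Files below 2 GiB pass through `as_signed_32bit` unchanged, which is one reason such a problem would only show up for large files.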
Lessons
• No way to get through to everyone
  – Needs some effort at sites (to do what we need)
  – Workshop at NeSC was a success
• Storage is more difficult than you'd think
  – Particularly the occasional peaks
  – Implementation specific optimisations
  – Locating the problem: complex implementations
• Need to manage risks more carefully
  – GridPP2: surprising number of risks happened!
Risks... (dating back to Dec06-Feb07, needs revision)
Special Achievements
• Beyond the call of duty
• Recognised internationally
• Or special benefits to users
Information Systems
[Figure: Information collected globally; Used for accounting; Users locate resources]
Information Systems
• Much work done on information system backends in GridPP
  – GIP plugin easier
  – DPM (Graeme, then Greig)
  – dCache debug (owned by SARA then DESY)
  – CASTOR
    • Disk servers: Tier 1
    • CASTOR, LSF, tape robot: RAL Storage
    • Oracle databases: RAL DB group
Special Achievements
• Accounting
  – Space “available” and “used”
  – Resource overview and selection
  – (or non-selection)
• Numerous subtle issues with space
• What is used? Available?
• Can info be relied on for selection?
• Subtle implementation issues
• Long propeller head discussions
SRM/SRB interoperation using gLite
• Pretend SRB is a “Classic SE”
• Classic SE still supported by gLite FTS
[Figure: components shown are FTS, LFC, SRB with GridFTP and disk storage, and SRM with GridFTP pool nodes and disk storage; label: “SRM selects pool node…”]
Achievements - FTS monitoring
Achievements – standards
• SRM 2.2 is now an OGF standard
  – Collaboration between SRM developers
  – …and WLCG
  – New challenges ahead
• GLUE
  – Contributed to GLUE SE schema
  – 1.3, also some for 2.0
What Keeps the Unreasonable (Wo)Man Awake at Night?
• CUS – Campaign for Usable Storage
• Fabric
• Staff...!!
• Coordination
What is Usable Storage
• Users: “we want usable storage”
• Deployment: “storage is usable if it’s being used”
• Not necessarily…
• Identified (currently) 13 areas
  – Somewhat overlapping
  – But that is normal
What is Usable Storage
• Robust
  – Doesn’t fall over
  – Measure uptime (for some definition of uptime)
• Good performance
  – Requests per second, concurrent users
  – Can be tested: DESY did this for dCache
  – Can be tested! (Dave Newbold for CASTOR, ScotGrid for DPM and dCache)
  – (Also tests the SRM itself)
What is Usable Storage
• Good Overall Data Performance
  – Tests the data movers and networks
  – Experiments are good at this
  – Also 3rd party transfers, and to tape
  – Optimisations
• Ensures resource availability
  – Concurrent users (other experiments, same expt)
  – Ancient available/used metrics
  – Load balancing, dynamic alloc.
What is Usable Storage
• Monitored. Accountable.
  – See when something goes wrong
  – Reliable accounting data
  – Minimise downtime
• Maintainable
  – Ease upgrade, installation and configuration
  – Minimise downtime
• Tested (prior to release)
What is Usable Storage
• Standards compliant and interoperable
  – Provides SRM 2.2 / GLUE 1.3 / GridFTP
  – Extensive test suite available
• Secure
  – Access control, secure implementations
• Supported
  – Upstream: developers
• Publishing metadata in current schema
• Usable by applications (interfaces)
Challenges
[Figure: the challenge areas: Services; Capabilities; Scale, Performance; Economy, Sustainability; Middleware; State of the Art; Users]
Users
[Figure: Users: Applications; Culture, History; Customer mgmt; Usability]
Services
[Figure: Services: Trust; Availability; Accounting; Discovery]
State of the Art
[Figure: State of the Art: Web Services; Virtualisation; Media]
Middleware
[Figure: Middleware: Stability; Applications; Maintenance Support; Ease of install and config]
Scale, Performance
[Figure: Scale, Performance: Staging; Transfer rates; Size of files; Number of files; Volume]
Sustainability, Economy
[Figure: Economy: Scale; Trust; Dynamic Agreement; Cost Model]
Capabilities
[Figure: Capabilities: Content; Access; Curation; SECURITY]
Conclusion
• Lots of things achieved
• Lots of stuff to do
  – Somehow always harder than expected
  – Doesn’t asymptotically tend to zero
  – Plus there are regular peaks so it doesn’t even converge
• Storage is important! Should not be underestimated
• Good community to go forward into GridPP3