E-Research Infrastructure?
[email protected]
Head, ANU Internet Futures; Grid Services Coordinator, GrangeNet; Leader, APAC Information Infrastructure Program (PhD Mt Stromlo 1988-1992)
Dec 18, 2015
1
2
A gentle (and fast!) overview
Themes:
What does e-Research mean?
What kind of infrastructure is involved?
How is it being developed?
What are the problems?
3
e-Research + infrastructure
The use of IT to enhance research and education!
Access resources transparently
Make data readily available
Make collaboration easier
Is it The Grid?
No, and yes – the Grid is a tool in the kit
Who funds it? The Govt – when building for a large community: NCRIS (SII+MNRF), ARC, eResearch Coordinating Committee
4
ANU Internet Futures
A cross-discipline, cross-campus “applied” research group for e-Research infrastructure development
Objectives: to investigate and deploy advanced Internet-based technologies that support university research and education missions.
Bring research-edge technologies into production use
Engage with APAC, GrangeNet, ARIIC/SII, …, Internet2, APAN, TERENA, …
A strong focus on User Communities
Identify common requirements
5
What does “Grid” mean?
Analogy with the power grid:
A standard service (AC, 240V, 50Hz)
A standard connection
A standard user interface
Users do not care about:
Various generation schemes
Deregulated market
Power auctions
Synchronised generators
Transmission switching, fail-over systems
Accounting and billing
6
What does “Grid” mean in IT?
Transparent use of resources
Distributed, and networked
Multiple “administrative domains”: other people’s resources become available to you
Various IT resources: Computing, Data, Visualisation, Collaboration, etc.
Hide complexity: it should be a “black box” that one just plugs into.
7
What are the bits in eRI?
Network Layer (Physical and Transmission)
(Advanced) Communications Services Layer
Grid, Middleware Services Layer
Applications and Users…
8
What’s in that middle bit?
Computing
Visualisation
Collaboration
Data
Instruments
Middle-ware
(Advanced) Communications Services Layer
Applications and Users…
9
Networks
Physical networks are fundamental to link researchers, observational facilities and IT facilities
Demand for high (and flexible) bandwidth to every astronomical site
Universities, observatories, other research sites/groups: GrangeNet, AARNet3, AREN, … Big-city focus
Today remote sites have wet bits of string, and station wagons
At least 1-10 Gigabit links soon-ish (SSO, ATCA, Parkes, MSO)
Getting 10-20 Gigabits internationally right now, including to the top of Mauna Kea in the next year or so
Canada, US, NL, … are building/running some 40+ Gb/s today
e-VLBI, larger detectors, remote control, multi-site collaboration, real-time data analysis/comparisons, …
Burst needs, as well as sustained
Wavelength Division Multiplexing (WDM) allows for a lot more bandwidth (80λ at 80 Gb/s)
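As a back-of-envelope check on the WDM figure above, aggregate capacity is simply the number of wavelengths times the per-wavelength rate; a minimal sketch in Python:

```python
# Back-of-envelope aggregate capacity of a WDM fibre, using the
# figures quoted above (80 wavelengths at 80 Gb/s each).
def wdm_capacity_gbps(wavelengths: int, gbps_per_lambda: float) -> float:
    """Total capacity of a WDM fibre in Gb/s."""
    return wavelengths * gbps_per_lambda

total = wdm_capacity_gbps(80, 80)
print(f"{total} Gb/s = {total / 1000} Tb/s")  # 6400 Gb/s = 6.4 Tb/s
```

That is three orders of magnitude beyond the 1-10 Gigabit links mentioned for the remote sites.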
10
Common Needs - Middleware
Functionality needed by all the eRI areas
Minimise replication of services
Provide a standard set of interfaces: to applications/users, to the network layer, to grid services
Can be built independently of other areas
A lot of politics and policy issues enter here
11
Common Needs - Middleware - 2
Authentication: something you have, something you know; somebody vouches for you
Certificate Authorities, Shibboleth, …
Authorisation: granularity of permission (resolution, slices, …); limits of permission (time, cycles, storage, …)
Accounting: billing, feedback to authorisation
*Collectively called AAA
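The AAA pattern above fits in a few lines of code. This is a toy illustration with hypothetical names (Grant, AAAService), not any real middleware API; note how accounting feeds back into authorisation:

```python
# Toy AAA service (hypothetical names, not a real middleware API):
# authenticate a user, authorise a request against limits,
# and record usage for accounting.
from dataclasses import dataclass

@dataclass
class Grant:
    max_cpu_hours: float          # limit of permission
    used_cpu_hours: float = 0.0   # accounting feeds back into authorisation

class AAAService:
    def __init__(self):
        self.credentials = {}   # user -> secret ("something you know")
        self.grants = {}        # user -> Grant

    def authenticate(self, user: str, secret: str) -> bool:
        return self.credentials.get(user) == secret

    def authorise(self, user: str, cpu_hours: float) -> bool:
        g = self.grants.get(user)
        return g is not None and g.used_cpu_hours + cpu_hours <= g.max_cpu_hours

    def account(self, user: str, cpu_hours: float) -> None:
        self.grants[user].used_cpu_hours += cpu_hours

aaa = AAAService()
aaa.credentials["alice"] = "s3cret"
aaa.grants["alice"] = Grant(max_cpu_hours=100)
if aaa.authenticate("alice", "s3cret") and aaa.authorise("alice", 40):
    aaa.account("alice", 40)
print(aaa.grants["alice"].used_cpu_hours)  # 40.0
```

Real deployments replace the secret with certificates (CAs, Shibboleth) and the in-memory tables with directories, but the shape of the three checks is the same.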
12
Common Needs - Middleware - 3
Security: encryption, PKI, …; AAA, non-repudiation; firewalls and protocol hurdles (NATs, proxies, …)
Resource discovery: finding stuff on the Net via search engines, portals, registries, p2p mesh, …
Capability negotiation: can you do what I want, when I want?
Network and application signalling: tell the network what services we need (QoS, RSVP, MPLS, …); tell the application what the situation is; and listen for feedback and deal with it.
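Resource discovery plus capability negotiation can be illustrated with a toy registry; all names and fields here are hypothetical, and a real registry would of course be a networked service, not a list:

```python
# Toy resource-discovery registry with capability negotiation:
# resources advertise what they can do, and a client asks
# "can you do what I want, when I want?"
resources = [
    {"name": "cluster-a", "cpus": 256, "free_from": 9},   # hour of day
    {"name": "viz-wall",  "cpus": 8,   "free_from": 14},
]

def negotiate(need_cpus: int, need_hour: int):
    """Return names of resources meeting both capability and time constraints."""
    return [r["name"] for r in resources
            if r["cpus"] >= need_cpus and r["free_from"] <= need_hour]

print(negotiate(need_cpus=64, need_hour=10))  # ['cluster-a']
```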
13
The Computational Grid
Presume Middleware issues are solved…
Probably the main Grid activity
Architectural issues: CPUs, endian-ness, executable format, libraries; non-uniform networking; clusters vs SMP, NUMA, …
Code design: master/slave, P2P; granularity (fine-grained parallelism vs (coarse) parameter sweep)
Scheduling: multiple owners; queuing systems; economics (how to select computational resources, and prioritise)
During execution: job monitoring and steering; access to resources (code, data, storage, …)
But if we solve all these: seamless access to computing resources across the planet. Harness the power of supercomputers, large-to-small clusters, and corporate/campus desktops (Campus-Grid)
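The scheduling ideas above (a coarse-grained parameter sweep, with economics driving resource selection across multiple owners) can be sketched as a greedy cheapest-machine scheduler; the machine names, costs and slot counts are made up for illustration:

```python
# Greedy "economic" scheduler for a coarse parameter sweep
# (hypothetical machines and costs): each parameter point is an
# independent job, placed on the cheapest machine with a free slot.
machines = {"supercomputer":  {"slots": 2, "cost": 10.0},
            "campus-desktop": {"slots": 4, "cost": 1.0}}

def schedule(params):
    """Place each job on the cheapest machine that still has a free slot."""
    placement = {}
    for p in params:
        for name, m in sorted(machines.items(), key=lambda kv: kv[1]["cost"]):
            if m["slots"] > 0:
                m["slots"] -= 1
                placement[p] = name
                break
    return placement

placement = schedule([0.1, 0.2, 0.3, 0.4, 0.5])  # five sweep points
print(placement)
```

The first four jobs land on the cheap campus desktops; only the fifth spills onto the expensive supercomputer. Real schedulers add queuing, priorities and fault handling, but the cost-driven selection is the core idea.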
14
Computing facilities
University computing facilities, within departments or centrally.
Standout facilities: the APAC partnership (www.apac.edu.au)
Qld: QPSF partnership, several facilities around UQ, GU, QUT
NSW: ac3 (at ATP Eveleigh)
ACT: ANU - APAC peak facility, upgraded in 2005 (top 30 in the world)
Vic: VPAC (RMIT)
SA: SAPAC (U.Adelaide?)
WA: IVEC (UWA)
Tas: TPAC (U.Tas)
Other very noteworthy facilities, such as Swinburne's impressive clusters. There are bound to be others, and more are planned.
15
Data Grids
Large-scale, distributed, “federated” data repositories
Making complex data available
Scholarly output and scholarly input: observations, simulations, algorithms, …
…to applications and other grid services
in the “most efficient” way (performance, cost, …)
in the “most appropriate” way, within the same middleware AAA framework
in a sustainable and trustworthy way
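Delivering data in the “most efficient” way often comes down to replica selection: the same data set exists at several repositories, and the grid picks the best instance. A minimal sketch, with hypothetical sites, latencies and costs:

```python
# Toy replica selection in a data grid (hypothetical metrics):
# score each copy of a data set by a weighted mix of latency and
# cost, and fetch from the lowest-scoring site.
replicas = [
    {"site": "canberra",  "latency_ms": 5,   "cost_per_gb": 0.00},
    {"site": "melbourne", "latency_ms": 15,  "cost_per_gb": 0.00},
    {"site": "hawaii",    "latency_ms": 120, "cost_per_gb": 0.05},
]

def best_replica(replicas, latency_weight=1.0, cost_weight=100.0):
    """Lower score is better: weighted mix of latency and transfer cost."""
    return min(replicas,
               key=lambda r: latency_weight * r["latency_ms"]
                           + cost_weight * r["cost_per_gb"])

print(best_replica(replicas)["site"])  # canberra
```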
16
[Diagram: Data Grid 101 – a set of repositories (hardware, software) sharing a purpose or a theme sits behind a Content Archive interface, described by metadata (ontologies, semantics, DRM, …). A user authenticates and authorises against directories (AAA, capabilities, workflows, DRM, …), then issues queries/results and curation requests; Computing, Visualisation and Collaboration services also access the archive, with presentation and accounting layered on top.]
17
Data Grid Issues
Every arrow is a protocol, every interface is a standard
Storage: hardware, software; file format standards, algorithms
Describing data: metadata, external orthographies, dictionaries
Caching/replication: instances (non-identical), identifiers, derivatives
Resource discovery: harvesting, registries, portals
Access: security, rights-management (DRM), anonymity; authorisation granularity
Performance: delivery in appropriate form and size; user-meaningful user interface (rendering/presentation by location and culture)
Standards, and the excess thereof
Social engineering: putting data online is
an effort – needs to be easier, obvious
a requirement! – but not enforced; lacks processes
not recognised nor rewarded
PAPER publishing is!
18
Data facilities
In most cases these are inside departments, or maybe central services in a university.
ANU/APAC host a major storage facility (tape robot) in Canberra that is available for the R&E community to make use of
Currently 1.2 Petabytes peak, and connected to GrangeNet and AARNet3.
It hosts the MSO MACHO-et-al data set at the moment, and more is to come.
To be upgraded every 2 years or so – a factor of 2-5 in capacity each time, if funding is found.
Needs community input. Doesn’t suit everyone (yet)
Mirror/collaborating facilities in other cities in AU and overseas being discussed
Integration with local facilities
VO initiatives – all data from all observatories and computers…
Govt initiatives under ARIIC – APSR, ARROW, MAMS, ADT
19
Collaboration and Visualisation
A lot of intersection between the two
Beyond videoconferencing - telepresence: sharing not just your presence, but also your research
Examples: multi-site large-scale data visualisation, computational steering, engineering and manufacturing design, bio-molecular modelling and visualisation, education and training
What’s the user interface? Guided tour vs independent observation; capability negotiation; local or remote rendering; (arbitrary) application sharing
Tele-collaboration (co-laboratories) revolves around the Access Grid: www.accessgrid.org
20
Access Grid “Nodes”
A collection of interactive, multimedia centres that support collaborative work: distributed large-scale meetings, sessions, seminars, lectures, tutorials and training.
High-end, large-scale “tele-collaboration” facilities, or can run on a single laptop/PDA
Videoconferencing dramatically improved (but not the price)
Much better support for multi-site, multi-camera, multi-application interaction
Flexible, open design
Over 400 in operation around the world; 30+ in operation, design or construction in Australia; 4+ at ANU
22
AccessGrid facilities
University-hosted nodes are generally available for researchers from any area to use; you just need to make friends with their hosts.
Qld: JCU-Townsville, CQU (several cities), UQ, QUT, SQU, GU (Nathan, Gold Coast)
NSW: USyd, UNSW (desktop), UTS
ACT: ANU (4+, one at Mt Stromlo; SSO has been suggested)
Vic: UMelb (soon), Monash-Caulfield, VPAC (by RMIT), Swinburne (desktop), U.Ballarat (desktop)
SA: U.Adelaide (1 desktop and 1 room), Flinders (soon), UniSA (planning)
WA: UWA (IVEC)
Tas: UTas (soon)
NT: I wish!
Another 400+ around the world.
Development by many groups; Australia has some leadership
23
Visualisation Facilities
Active visualisation research community in Australia
OzViz'04 at QUT, 6-7 Dec 2004.
Major nodes with hard facilities include ANU-VizLab, Sydney-VisLab, UQ/QPSF-VisLab, IVEC-WA, I-cubed (RMIT), Swinburne, etc.
24
Online Instruments
Remote, collaborative access to unique/scarce instruments: telescopes, microscopes, particle accelerators, robots, sensor arrays
Need to interface with other eRI services:
Computation – analysis of data
Data – for storage, comparison
Visualisation – for human analysis
Collaboration – to share the facility
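The interface between an instrument and the other eRI services can be sketched as a small class; everything here (the class name, the stand-in reading, the counts-per-second analysis) is hypothetical:

```python
# Toy online instrument wired into the other eRI services
# (all names hypothetical): an observation flows to data storage
# and then on to computation.
class OnlineTelescope:
    def __init__(self):
        self.archive = []                  # Data service: storage for comparison

    def observe(self, target: str) -> dict:
        frame = {"target": target, "counts": 1234}   # stand-in detector reading
        self.archive.append(frame)         # archive every frame as it is taken
        return frame

    def analyse(self, frame: dict) -> float:
        return frame["counts"] / 60.0      # Computation: counts per second

scope = OnlineTelescope()
frame = scope.observe("test-target")
print(scope.analyse(frame), len(scope.archive))
```

Visualisation and collaboration would sit on top of this: rendering the frames for human analysis, and multiplexing control of the facility among remote observers.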
25
So, in summary:
Transparent use of various IT resources
Research and education processes: make existing ones easier and better; allow new processes to be developed
Are we there yet? Not even close!! But development in many areas is promising
In some situations, the problems are not technical but political/social
Some of the results already are very useful
Astronomy needs to help the processes, to help Astronomy!