Technology and Infrastructure Support for Large Scale Information Marcio Faerman The Brazilian National Education and Research Network - RNP [email protected] www.rnp.br

Feb 02, 2016

Page 1: Technology and Infrastructure Support for Large Scale Information

Technology and Infrastructure Support for Large Scale Information

Marcio Faerman
The Brazilian National Education and Research Network - RNP
[email protected]
www.rnp.br

Page 2: Technology and Infrastructure Support for Large Scale Information

Generating Large Data Collections

• Large Data Volumes can be generated much faster than they can be analyzed
  – Instrument Observations
    • Particle Accelerators (CERN LHC)
    • Telescopes, Satellites
    • Sensor Networks
    • Virtual Observatories
  – Large Model Simulations
    • High resolution, Very complex
• Scientific Experiments
  – Medical imaging (fMRI): ~1 GByte per measurement (day)
  – Bio-informatics queries: 500 GByte per database
  – Satellite world imagery: ~5 TByte per year
  – Current particle physics: 1 PByte per year
  – LHC physics (2007): 10-30 PByte per year
  – LSST Astronomy (2012): 5 PByte per year
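To put the yearly volumes listed above into perspective, they can be converted into the average sustained data rate they imply. The following is only a rough sketch: it assumes decimal units (1 PByte = 10^15 bytes) and continuous production over a 365-day year, neither of which is stated in the slides, and the function name is illustrative.

```python
# Average sustained data rate implied by a yearly data volume.
# Assumptions (not from the slides): decimal units and a 365-day year
# of continuous, evenly spread data production.

SECONDS_PER_YEAR = 365 * 24 * 3600
BYTES_PER_UNIT = {"GByte": 10**9, "TByte": 10**12, "PByte": 10**15}


def average_rate_mbit_per_s(volume, unit, seconds=SECONDS_PER_YEAR):
    """Return the mean rate in Mbit/s needed to produce `volume` `unit` in `seconds`."""
    total_bits = volume * BYTES_PER_UNIT[unit] * 8
    return total_bits / seconds / 1e6


# Yearly volumes taken from the slide above.
yearly_volumes = [
    ("Satellite world imagery", 5, "TByte"),
    ("Current particle physics", 1, "PByte"),
    ("LHC physics (lower bound)", 10, "PByte"),
    ("LHC physics (upper bound)", 30, "PByte"),
    ("LSST astronomy", 5, "PByte"),
]

for name, volume, unit in yearly_volumes:
    rate = average_rate_mbit_per_s(volume, unit)
    print(f"{name}: {volume} {unit}/year ~ {rate:,.1f} Mbit/s on average")
```

Under these assumptions, 1 PByte per year already corresponds to roughly 250 Mbit/s of sustained traffic, before any bursts, retransmissions or replication are taken into account.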

Page 3: Technology and Infrastructure Support for Large Scale Information

Challenges Managing Large Volume Data

• Scalability
  – What works for small datasets does not necessarily work for large collections
• Data Integrity
  – At a terabyte scale failures and data corruption are very likely to occur
  – Is data provenance reliable?
• Efficiency
  – Data should be accessed at a rate which keeps work feasible
  – More data: need for more speed
• Distributed Access
  – Data can be at remote (and possibly unknown) location
• Infrastructure Management
  – Heterogeneous
  – Distributed
  – Prone to failures
  – Very Complex
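One common way to address the data integrity concern above is to record a cryptographic fingerprint of each file when it is ingested and to recompute it later. The sketch below uses SHA-256 from the Python standard library. The function names and the temporary demo file are illustrative assumptions and are not taken from any of the tools mentioned in this talk.

```python
import hashlib
import os
import tempfile


def file_fingerprint(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use constant even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_digest):
    """True if the file content still matches the digest recorded at ingestion."""
    return file_fingerprint(path) == expected_digest


# Demo on a small temporary file (a stand-in for a real dataset).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"observation data" * 1000)
    demo_path = tmp.name

recorded = file_fingerprint(demo_path)   # stored alongside the data at ingestion
intact = verify(demo_path, recorded)

with open(demo_path, "r+b") as handle:   # overwrite one byte to simulate corruption
    handle.seek(10)
    handle.write(b"X")

corruption_detected = not verify(demo_path, recorded)
os.remove(demo_path)

print("intact before corruption:", intact)
print("corruption detected:", corruption_detected)
```

Reading in chunks matters at the scales discussed here, since a terabyte-sized file cannot be loaded into memory in one piece.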

Page 4: Technology and Infrastructure Support for Large Scale Information

Challenges – Getting to Know your Data

• Extract knowledge from raw data files
  – Data product derivation
    • Visualization
    • Relationships
    • Patterns
    • New derived quantities
  – Cross-institutional and cross-disciplinary collaborations
    • What-if experiments
      – Your data with our model?
• Dataset Access
  – Multiple formats
    • Each sensor, simulation has its own storage format
  – Federated collections
  – Discovery by content

Page 5: Technology and Infrastructure Support for Large Scale Information

Technological Response

• Integration of compute, communication, storage and instrument resources into a powerful infrastructure
  – Information Grids
  – Very powerful infrastructure
  – Economy of scale
• Serves a broad range of customers
  – Biologists, physicists, government, industry
• Infrastructure is heterogeneous, distributed, very complex
• Middleware and Data Oriented tools act as facilitators to tackle data management complexities

Page 6: Technology and Infrastructure Support for Large Scale Information

Open Access and Preservation Functionalities

• Federated Digital Libraries
  – Integration of distributed repositories
  – Access control: can decide who can see it
  – Organize the data in collections
  – Describe your data: Metadata
• Data Grids
  – Access to efficient parallel I/O systems
  – Hierarchical Systems
    • Disk caches, tapes
    • Often Distributed
  – Analysis, Data Mining
  – Visualization
  – Workflow based systems
  – Transaction based data ingestion
    • Data provenance, Data fingerprinting
  – What-if virtual lab
• End User Oriented Portals
  – "I deal with the data in the way it makes sense to me"
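The digital library functions listed above (collections, metadata description and access control) can be illustrated with a toy in-memory catalogue. This is only a sketch under stated assumptions: the class names, field names, dataset names and storage locations are all hypothetical, and real systems such as SRB expose far richer interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    location: str                                   # hypothetical storage location
    metadata: dict = field(default_factory=dict)    # free-form descriptive metadata
    readers: set = field(default_factory=set)       # users allowed to see the dataset


@dataclass
class Collection:
    name: str
    datasets: list = field(default_factory=list)

    def add(self, dataset):
        self.datasets.append(dataset)

    def find(self, user, **criteria):
        """Return datasets visible to `user` whose metadata matches all criteria."""
        return [
            d for d in self.datasets
            if user in d.readers
            and all(d.metadata.get(key) == value for key, value in criteria.items())
        ]


# Two datasets held at different (made-up) sites, grouped in one logical collection.
climate = Collection("climate-demo")
climate.add(Dataset("run-001", "site-a://store/run-001.nc",
                    {"source": "model", "year": 2006}, {"alice", "bob"}))
climate.add(Dataset("obs-042", "site-b://store/obs-042.h5",
                    {"source": "satellite", "year": 2006}, {"alice"}))

visible_to_alice = [d.name for d in climate.find("alice", year=2006)]
visible_to_bob = [d.name for d in climate.find("bob", year=2006)]
print(visible_to_alice)
print(visible_to_bob)
```

The same query returns different results for different users, which is the sense in which the owner "can decide who can see it", while the physical location of each dataset stays hidden behind the collection.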

Page 7: Technology and Infrastructure Support for Large Scale Information

Middleware and Tools

• Data Management
  – Storage Resource Broker (SRB)
  – Globus Data Management
  – L-Store
  – IBP
  – Storage Resource Manager (SRM)
• Data Representation Libraries
  – HDF5
  – NetCDF
• Portals
  – OGCE
  – JSR 168

Page 8: Technology and Infrastructure Support for Large Scale Information

Today’s Reality

• Exceptional achievements by early adopters
• Integration between domain scientists (data users and producers) is still a challenge
  – Need much more cross-disciplinary interaction
• Emphasis on scale and performance
• Failures are still a taboo
  – Frustration factor should be addressed in partnership with users
  – Focus on failure recovery and quality of service is getting more attention

Page 9: Technology and Infrastructure Support for Large Scale Information

e-Infrastructure Workshop, NUDI/USP, São Paulo, 07.05.2007

Grid Initiatives around the World

Page 10: Technology and Infrastructure Support for Large Scale Information

HEPGrid

Ringrid

EELA

SPRACE

UCRAV

OurGrid

UNAM

SINAPAD

CL Grid

Page 11: Technology and Infrastructure Support for Large Scale Information

Networking in Latin America

RNP-BR

REUNA-CL

CUDI-MX

RAAP-PE

REACCIUN-VE

Page 12: Technology and Infrastructure Support for Large Scale Information


Brazilian National Research And Education Network - RNP

• In November 2005 the RNP networking infrastructure was entirely renovated. It consists of:
  – A multigigabit core connecting 10 capitals at 2.5 and 10 Gbps
  – Connections at 34 Mbps to 11 capitals
  – Connections of up to 16 Mbps to 6 capitals
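The gap between these link classes becomes concrete when estimating how long a large dataset takes to move. The sketch below computes an idealized lower bound for transferring 1 TByte. It assumes decimal units and a fully available link running at its nominal rate with no protocol overhead, so real transfers would take longer; the function name is illustrative.

```python
# Idealized transfer time for a dataset over links of different nominal rates.
# Assumptions (not from the slides): decimal units, a dedicated link at full
# nominal rate, and no protocol overhead. Results are therefore lower bounds.


def transfer_time_hours(size_bytes, link_bits_per_s):
    """Return the minimum time in hours to move `size_bytes` over the link."""
    return size_bytes * 8 / link_bits_per_s / 3600


ONE_TBYTE = 10**12

# Nominal link rates quoted for the RNP backbone and capital connections.
links = {
    "16 Mbps": 16e6,
    "34 Mbps": 34e6,
    "2.5 Gbps": 2.5e9,
    "10 Gbps": 10e9,
}

for label, rate in links.items():
    hours = transfer_time_hours(ONE_TBYTE, rate)
    print(f"1 TByte over {label}: {hours:.1f} hours")
```

Even under these optimistic assumptions, a terabyte needs more than two and a half days over a 34 Mbps connection but under a quarter of an hour at 10 Gbps.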

Page 13: Technology and Infrastructure Support for Large Scale Information

Infrastructure for e-Science

Community Metropolitan Networks

• It is not enough to bring high speed connectivity to each city; it is necessary to bring it to the university campus / research lab as well.
• The metropolitan network is the solution
  – Infrastructure sharing to support:
    • Interconnection of the campuses of each partner institution
    • Access to the RNP national network backbone
  – This sharing substantially reduces deployment costs
  – Preferably, the infrastructure will be owned by the partners themselves (reducing operating costs)
• Pilot: The Metrobel project in the city of Belém do Pará in the Amazon region

Page 14: Technology and Infrastructure Support for Large Scale Information

Metrobel – Belém Metropolitan Network

Page 15: Technology and Infrastructure Support for Large Scale Information


Redecomep Project (2005-2007)

• Following Metrobel, the Brazilian Ministry of Science and Technology is supporting the Community Networks for Education and Research (Redecomep) Project with R$ 39.7 M (~US$ 19.0 M), provided through Finep (Dec 2004)
• Goals:
  – Extend the metropolitan optical network to 26 other cities with RNP points of presence
  – Promote integration in the metropolitan area
  – High speed access to the RNP point of presence

Page 16: Technology and Infrastructure Support for Large Scale Information

Next steps

• Integration between network, data repositories, compute, storage resources and applications
  – Identify who needs better connectivity
  – Developing Brazilian cyberinfrastructure
  – Generally uncoordinated funding for infrastructure resources
  – Funding agencies and partners need a broad vision of application requirements and cyberinfrastructure integration
• RNP is coordinating an e-Science/Infrastructure initiative in Brazil with scientific communities and infrastructure providers

Page 17: Technology and Infrastructure Support for Large Scale Information


JRU-Brazil: 22 members in EELA-2

#   STATE  INSTITUTION                    E-SCIENCE COMMUNITIES
1   SP     CCE / USP                      (e-INFRASTRUCTURE only)
2   RJ     CEFET-RJ                       e-GOVERNMENT, e-INDUSTRY
3   RJ     FCM / UERJ                     BIOMED
4   RJ     FIOCRUZ                        BIOMED, e-EDUCATION
5   SP     IAG / USP                      CLIMATE
6   RJ     IME                            BIOMED
7   SP     INCOR / USP                    BIOMED
8   SP     INPE                           CLIMATE
9   RJ     LNCC                           BIOMED
10  RJ     ON                             PHYSICS
11  BR     RNP (NREN)                     (e-INFRASTRUCTURE only)
12  SP     SPRACE / UNESP                 PHYSICS
13  PB     UFCG                           CLIMATE, EARTH-SCIENCE
14  RJ     UFF                            (e-INFRASTRUCTURE only)
15  MG     UFJF                           BIOMED
16  MS     UFMS                           BIOMED
17  RS     UFRGS                          CLIMATE
18  RJ     UFRJ (coordinator for EELA-2)  BIOMED, PHYSICS, e-EDUCATION, CLIMATE
19  RS     UFSM                           CLIMATE
20  DF     UnB                            BIOMED
21  RJ     UNILASALLE                     e-EDUCATION
22  SP     UNISANTOS                      BIOMED, e-LEARNING, e-GOVERNMENT

Page 18: Technology and Infrastructure Support for Large Scale Information

Developing Together

• Information infrastructure is being redefined in Brazil and Latin America

• Now is the time to have as much cross-disciplinary interaction as possible to define needs, partnerships and investments

• Please contact us

THANK YOU!