Google and Large Scientific Datasets or How To Move 100TB Jon Trowbridge Google Space Telescope Science Institute March 15, 2007

Page 1

Google and Large Scientific Datasets

or

How To Move 100TB

Jon Trowbridge

Google

Space Telescope Science Institute

March 15, 2007

Page 2

Organize the world’s information and make it universally accessible and useful.

Page 3

Motivating Problem

What if a piece of information is too large to efficiently transmit across the Internet as it exists today?

Page 4

“Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” - Andrew Tanenbaum (?)

Page 5

Large Dataset Archive

• Move data by shipping hard drives

• Centralized repository stored on Google’s infrastructure

• Accepting data from all disciplines, but it must be open and free

• Ultimate goal: Promiscuous distribution

Page 6

Nice Properties of Physical HD Shipment

• Uses commodity technologies: Linux, SATA, ext2

• High throughput

• Trivially scalable

• Cheap and easy: $2,400 for 3 TB

• Rapidly getting cheaper

Page 7

Real-World Throughputs

Method                          MiB/s      GiB/hr     TB/day     hrs/TB
1200 baud modem                 1.14E-04   4.02E-04   9.43E-06   2545166
My Home DSL (downstream)        0.3        1.41       0.03       728.18
Ethernet: 10baseT               0.8        2.81       0.07       364.09
Ethernet: 100baseT              8          28.13      0.66       36.41
End-to-end physical shipment    -          -          0.88       27.42
HD Transfer                     30         105.47     2.47       9.71
FedEx phase of shipment         -          -          3.00       8.00
Ethernet: Gigabit               60         210.94     4.94       4.85
LBNL, 2002: 10.6 GiB/s          10854      38160      894.38     0.03
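
The columns of the throughput table are unit conversions of one another, and the relationship can be sketched in a few lines of Python. The slide does not state its conversion factors; the sketch below assumes binary units throughout (1 GiB = 1024 MiB, and "TB" meaning 1024 GiB), which reproduces rows such as HD Transfer to within rounding. The function names `throughput_row` and `days_to_move` are illustrative and not taken from the talk.

```python
# Assumed conversion factors: the slide does not state them, but binary
# units reproduce the table's figures to within rounding.
MIB_PER_GIB = 1024
GIB_PER_TB = 1024


def throughput_row(mib_per_s):
    """Convert a rate in MiB/s into (GiB/hr, TB/day, hrs/TB)."""
    gib_per_hr = mib_per_s * 3600 / MIB_PER_GIB
    tb_per_day = gib_per_hr * 24 / GIB_PER_TB
    hrs_per_tb = GIB_PER_TB / gib_per_hr
    return gib_per_hr, tb_per_day, hrs_per_tb


def days_to_move(terabytes, tb_per_day):
    """Days needed to move a dataset at a sustained rate in TB/day."""
    return terabytes / tb_per_day


if __name__ == "__main__":
    # Rates in MiB/s as listed in the table.
    for name, rate in [("Ethernet: 100baseT", 8),
                       ("HD Transfer", 30),
                       ("Ethernet: Gigabit", 60)]:
        gib_hr, tb_day, hrs_tb = throughput_row(rate)
        print(f"{name:20s} {gib_hr:8.2f} GiB/hr "
              f"{tb_day:6.2f} TB/day {hrs_tb:6.2f} hrs/TB")

    # The talk's headline case: 100 TB at the end-to-end shipment rate.
    print(f"100 TB at 0.88 TB/day: {days_to_move(100, 0.88):.1f} days")
```

The last line applies the table's end-to-end shipment rate to the 100 TB of the talk's title. It assumes a single serial pipeline; shipping several sets of drives in parallel would shorten the elapsed time proportionally.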

Page 8

The Cost of 1GB of Storage

• 1986: $100,000

• 1990: $10,000

• 1994: $1,000

• 1997: $100

• 2000: $10

• 2004: $1

• Today: About 40¢

Creative Computing - February, 1980
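
The price list above implies a fairly steady exponential decline, and a short calculation makes the rate explicit. The sketch below takes the slide's dated figures as given and computes the average time for the cost per GB to halve between two of them; the helper name `halving_time_years` is illustrative, and the undated "Today" figure is left out because it has no year attached.

```python
import math

# Cost of 1 GB of storage in US dollars, as listed on the slide.
COST_PER_GB = {
    1986: 100_000,
    1990: 10_000,
    1994: 1_000,
    1997: 100,
    2000: 10,
    2004: 1,
}


def halving_time_years(year_a, cost_a, year_b, cost_b):
    """Average number of years for the cost to halve between two data points."""
    halvings = math.log2(cost_a / cost_b)
    return (year_b - year_a) / halvings


if __name__ == "__main__":
    first, last = min(COST_PER_GB), max(COST_PER_GB)
    t = halving_time_years(first, COST_PER_GB[first], last, COST_PER_GB[last])
    print(f"{first}-{last}: cost per GB halves roughly every {t:.2f} years")
```

This is an average over the whole span; the slide's own figures show the pace varying between intervals, with a tenfold drop taking four years in some periods and three in others.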

Page 9

Not-So-Nice Properties of Physical HD Shipment

• Physical objects break, get stolen, occasionally explode

• HD copying bottleneck

• Customs/duties make international shipments more complicated

Page 10

The Big Question

What happens when every astronomer has the complete Hubble Legacy Archive on the computer in their office?

Page 11

The Big Question

What happens when every high-school student has the complete Hubble Legacy Archive on the computer in their bedroom?
