PS1 Prototype Systems Design Jan Vandenberg, JHU

Jan 06, 2016

Page 1: PS1 Prototype Systems Design Jan Vandenberg, JHU

slide 1

PS1 Prototype Systems Design
Jan Vandenberg, JHU

Early PS1 Prototype

slide 2

Engineering Systems to Support the Database Design

Raw data size
Index size
Most end-user operations I/O bound
Loading/Ingest more CPU-bound, though we still need solid write performance
Time to do full table scans
Time to do index scans
Need to do most work where the data is; can’t sling TBs over the network quickly
• …though we can brute-force past 1 Gbit Ethernet if necessary
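
The scan-time and network concerns on this slide come down to size divided by sustained rate. A minimal sketch of that arithmetic follows; the 1 TB table size and the 400 MB/s local scan rate are hypothetical placeholders, not measurements from the prototype, and only the 1 Gbit Ethernet line rate comes from the slide.

```python
# Back-of-the-envelope timing. Table size and local scan rate are hypothetical.

def seconds_to_move(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to stream size_bytes at a sustained rate (ignores seeks, caching, protocol overhead)."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s

TB = 10**12
MB = 10**6
GBIT_ETHERNET = 10**9 / 8        # 1 Gbit/s line rate in bytes per second (125 MB/s)

table_size = 1 * TB              # hypothetical table size
local_scan_rate = 400 * MB       # hypothetical sustained local sequential read rate

network_s = seconds_to_move(table_size, GBIT_ETHERNET)
local_s = seconds_to_move(table_size, local_scan_rate)

print(f"1 TB over 1 Gbit Ethernet: {network_s / 3600:.2f} h")
print(f"1 TB local scan at 400 MB/s: {local_s / 3600:.2f} h")
```

Even at full line rate, the network path is the slower of the two under these assumed numbers, which is the argument for doing the work where the data is.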

slide 3

Fibre Channel, SAN

Expensive but not-so-fast physical links (4 Gbit, 10 Gbit)
Expensive switch
Potentially very flexible
Industrial strength manageability
Little control over RAID controller bottlenecks

slide 4

SATA

Fast
Cheap
Ugly, spooky
• <cabling pic>
Tough to manage
• <dlmsdb/sdssdb drive bay map>

slide 5

SAS

For our purposes, it’s SATA without the ugliness
Fast: 12 Gbit/s FD building blocks
Cheap: PS1 prototype MD1000 pricing versus Newegg media costs
Not Ugly: IB cables versus rats’ nest
Industrial strength manageability: pretty blinking lights and mgmt apps versus downtime plus white knuckles
<cabling pic>

slide 6

I/O Performance of Dell SAS Systems in the PS1 Prototype

slide 7

SAS Performance, Gory Details

SAS v. SATA differences

<chart: Native SAS v. SATA Performance; x-axis: Disks (1 to 7); y-axis: MB/s (0 to 500); annotation: 20%>

slide 8

Per-Controller Performance

Luckily, one controller is fast enough for one SATA disk box

<performance chart>

slide 9

Resulting PS1 Prototype I/O Topology

<topo diagram> <aggregate performance chart>

slide 10

RAID-5 v. RAID-10?

Primer, anyone?
RAID-5 probably feasible with contemporary controller…
…though tough to predict real-world effects of latency…
…and not a ton of redundancy
But after we add enough disks to meet performance goals, we have enough storage to run RAID-10 anyway!
• Remember sub-Newegg media costs
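
The capacity argument above can be checked with the textbook rules: a RAID-5 array gives up one disk to parity, and RAID-10 gives up half its disks to mirrors. The sketch below assumes a single array and uses hypothetical numbers for disk count, disk size, and required capacity; none of them are taken from the slides.

```python
def usable_disks(level: str, n_disks: int) -> int:
    """Data-bearing disk count for one array, using the textbook capacity rules."""
    if level == "raid5":
        if n_disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return n_disks - 1          # one disk's worth of parity
    if level == "raid10":
        if n_disks < 4 or n_disks % 2:
            raise ValueError("RAID-10 needs an even number of disks, at least 4")
        return n_disks // 2         # every disk is mirrored
    raise ValueError(f"unknown RAID level: {level}")

DISKS_FOR_PERFORMANCE = 14   # hypothetical: spindles needed to hit throughput goals
DISK_TB = 0.5                # hypothetical per-disk capacity
REQUIRED_TB = 3.0            # hypothetical storage requirement

for level in ("raid5", "raid10"):
    usable_tb = usable_disks(level, DISKS_FOR_PERFORMANCE) * DISK_TB
    verdict = "fits" if usable_tb >= REQUIRED_TB else "too small"
    print(f"{level}: {usable_tb:.1f} TB usable, {verdict}")
```

With these assumed inputs both layouts hold the data, so the spindle count chosen for performance already pays for the extra redundancy of RAID-10.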

slide 11

RAID-10 Performance

Executive summary: RAID0/2 for single-threaded reads, RAID0 perf for 2-user/2-thread workloads. RAID0/2 writes
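
One way to read the executive summary is as a rule of thumb relative to an ideal RAID-0 stripe over the same disks, with "RAID0/2" meaning half of that throughput. The sketch below encodes that reading; the linear RAID-0 scaling, the function names, and the per-disk rate are assumptions for illustration, not measured prototype results.

```python
def raid0_throughput(n_disks: int, per_disk_mb_s: float) -> float:
    """Idealized RAID-0 throughput: assumes perfect linear scaling across disks."""
    return n_disks * per_disk_mb_s

def raid10_estimates(n_disks: int, per_disk_mb_s: float) -> dict:
    """Apply the slide's rule of thumb to the idealized RAID-0 figure."""
    raid0 = raid0_throughput(n_disks, per_disk_mb_s)
    return {
        "single_thread_read": raid0 / 2,  # "RAID0/2 for single-threaded reads"
        "two_thread_read": raid0,         # "RAID0 perf for 2-user/2-thread workloads"
        "write": raid0 / 2,               # "RAID0/2 writes"
    }

# Hypothetical inputs: 14 disks at 60 MB/s each.
for workload, mb_s in raid10_estimates(14, 60.0).items():
    print(f"{workload}: {mb_s:.0f} MB/s")
```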

slide 12

PS1 Prototype Servers

<diagram of server roles plus storage and network interconnects>

slide 13

PS1 Prototype Servers

<iron photo (w/Will?)>

slide 14

Projected PS1 Systems Design

<diagram of 8-slice triply-replicated systems> <plus geoplex?>

slide 15

Backup/Recovery/Replication Strategies

No formal backup
• …except maybe for mydb’s, f(cost*policy)
3-way replication
• Replication != backup
– Little or no history
– Replicas can be a bit too cozy: must notice badness before replication propagates it
• Replicas provide redundancy and load balancing…
• Fully online: zero time to recover
• Replicas needed for happy production performance plus ingest, anyway
Off-site geoplex
• Provides continuity if we lose HI (local or trans-Pacific network outage, facilities outage)
– <lava pic?>
• Could help balance trans-Pacific bandwidth needs (service continental traffic locally)

slide 16

Why No Traditional Backups?

Not super pricey…
…but not very useful relative to a replica for our purposes
• Time to recover
Money no object… do traditional backups too!!!
Synergy, economy of scale with other collaboration needs (IPP?)… do traditional backups too!!!

slide 17

Failure Scenarios

Easy, zero-downtime:
• Disks
• Power supplies
• Fans
Not so spooky, maybe some downtime and manual replica cutover:
• System board (rare)
• Memory (rare and usually proactively detected and handled via scheduled maintenance)
• Disk controller (rare, potentially minimal downtime via cold-spare controller)
• CPU (not utterly uncommon, can be tough and time consuming to diagnose correctly)
More spooky:
• Database mangling by human or pipeline error
– Gotta catch this before replication propagates it everywhere
– Can’t replicate too aggressively
– (and so off-the-shelf near-realtime replication tools don’t help us)
• Catastrophic loss of datacenter
– Have the geoplex
– …but we’re dangling by a single copy ’til recovery complete
– …but are we still screwed? Depending on colo scenarios, did we also lose the IPP and flatfile archive?
Terrifying:
• Unrecoverable badness fully replicated before detection
• Catastrophic loss of datacenter without geoplex
• Can we ever catch back up with the data rate if we need to start over?
– At some point in the survey, the answer likely becomes “no”.
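
The catch-up question can be framed as simple rate arithmetic. This is a rough sketch assuming constant ingest and reload rates, which the slides do not state. A from-scratch reload must work off the backlog accumulated so far while new survey data keeps arriving, so the backlog only shrinks at the difference between the two rates. The function name and all numbers are hypothetical.

```python
import math

def catch_up_days(elapsed_days: float, ingest_rate: float, reload_rate: float) -> float:
    """Days needed to reload everything and catch up with ongoing ingest.

    Rates are in the same units (e.g. TB/day). Assumes both stay constant.
    Returns infinity when the reload cannot outpace the incoming data.
    """
    if reload_rate <= ingest_rate:
        return math.inf
    backlog = elapsed_days * ingest_rate          # data accumulated before the loss
    return backlog / (reload_rate - ingest_rate)  # backlog shrinks at the rate difference

# Hypothetical: reload runs 3x faster than the survey produces data.
for elapsed in (100, 500, 1000):
    print(f"loss at day {elapsed}: {catch_up_days(elapsed, 1.0, 3.0):.0f} days to catch up")
```

Under these assumptions the catch-up time grows linearly with how far into the survey the loss happens, which is one way the answer could drift toward "no" late in the survey.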

slide 18

State Diagram for Replicas?

Loading
Replicating
Load balancing
Failing
Recovering
• Possibly repeat-loading
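
The five states listed above could be captured as a small state machine. The slide names only the states, so every transition in the table below is an assumption made for illustration, including the reading of "repeat-loading" as recovery re-entering the loading state.

```python
from enum import Enum

class ReplicaState(Enum):
    LOADING = "loading"
    REPLICATING = "replicating"
    LOAD_BALANCING = "load balancing"
    FAILING = "failing"
    RECOVERING = "recovering"

S = ReplicaState

# Assumed transitions (hypothetical, not taken from the slides).
ALLOWED = {
    S.LOADING: {S.REPLICATING, S.FAILING},
    S.REPLICATING: {S.LOAD_BALANCING, S.FAILING},
    S.LOAD_BALANCING: {S.LOADING, S.FAILING},
    S.FAILING: {S.RECOVERING},
    # Recovery may need to redo the load ("possibly repeat-loading").
    S.RECOVERING: {S.LOADING, S.REPLICATING, S.FAILING},
}

def can_transition(current: ReplicaState, target: ReplicaState) -> bool:
    """True if the assumed transition table permits current -> target."""
    return target in ALLOWED[current]

print(can_transition(S.LOADING, S.REPLICATING))      # allowed in this sketch
print(can_transition(S.FAILING, S.LOAD_BALANCING))   # must recover first in this sketch
```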

slide 19

Operating Systems, DBMS?

Sql2005 EE x64
• Why?
• Why not DB2, Oracle RAC, PostgreSQL, MySQL, <insert your favorite>?
(Win2003 EE x64)
<Why EE?>
Platform rant from JVV available over beers
• <JVV/beer graphic?>

slide 20

Systems/Database Management

Active Directory infrastructure
Windows patching tools, methodology
Linux patching tools, methodology
Monitoring
Staffing requirements

slide 21

Facilities/Infrastructure Projections for PS1

Cooling
Rack space
Network ports
(plus AD/WSUS/monitoring infrastructure above)

slide 22

Operational Handoff to UofH

slide 23

Mahalo! (See Ya, Hon!)