Page 1: Digg.com Software Architecture

An Infrastructure in Transition

Joe Stump, Lead Architect, Digg

Page 2: Digg.com Software Architecture

Introductions

Page 3: Digg.com Software Architecture

✓ 35,000,000 uniques
✓ 3,500,000 users
✓ 15,000 requests / sec
✓ Hundreds of servers

Page 4: Digg.com Software Architecture

“Web 2.0 sucks (for scaling).”

Joe Stump

Page 5: Digg.com Software Architecture

What’s Scaling?

Page 6: Digg.com Software Architecture

What’s Scaling?

Specialization

Page 7: Digg.com Software Architecture

What’s Scaling?

Severe Hair Loss

Page 8: Digg.com Software Architecture

What’s Performance?

Page 9: Digg.com Software Architecture

What’s Performance?

Who cares?

Page 10: Digg.com Software Architecture

4 Stages of Scaling

Page 15: Digg.com Software Architecture

As it stands ...

Page 16: Digg.com Software Architecture

Diagram (current architecture): Applications, Netscalers, MogileFS, Rec. Engine, ZOMG ROFLAFK WTFLULZ, Lucene

Page 17: Digg.com Software Architecture

Building Blocks

• MogileFS
  - 9 nodes
  - 2.8TB of files

• Gearman
  - Each application server
  - 400,000 jobs / day

• Memcached
  - 25 nodes
  - 2GB / node
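The slide lists 25 Memcached nodes at 2GB each (about 50GB of cache in total) but does not say how keys are spread across them. Memcached servers do not coordinate with one another, so the client usually picks a node by hashing the key. The sketch below is a minimal consistent-hash ring under that assumption; the node names `memcache01` to `memcache25`, the MD5-based ring points and the 100 virtual nodes per server are illustrative choices, not Digg's client code.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping cache keys to Memcached nodes (sketch).

    Each node is placed at several points ("virtual nodes") on a ring of
    64-bit integers; a key belongs to the first node point at or after the
    key's own hash, wrapping around at the end of the ring.
    """

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []   # sorted hash positions on the ring
        self._owners = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # First 8 bytes of MD5, read as an unsigned 64-bit integer.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def remove(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owners[point]

    def node_for(self, key):
        point = self._hash(key)
        idx = bisect.bisect(self._points, point) % len(self._points)
        return self._owners[self._points[idx]]
```

For example, `HashRing([f"memcache{n:02d}" for n in range(1, 26)]).node_for("story:12345")` returns the name of one of the 25 nodes. The point of the ring over a plain `hash(key) % 25` is that taking one node out only remaps the keys that lived on that node; every other key keeps hitting a warm cache.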

Page 18: Digg.com Software Architecture

Moving forward ...

Page 19: Digg.com Software Architecture

Diagram (planned architecture): Netscalers, Applications, Services, IDDB, Messaging, MogileFS, Rec. Engine, Lucene

Page 20: Digg.com Software Architecture

IDDB

✓ Elastic horizontal partitions
✓ Heterogeneous partition types
✓ Multi-homed
✓ IDs live in multiple places
✓ Partitioned result sets

Page 21: Digg.com Software Architecture

IDDB_ID_Int
  id bigint
  date_created timestamp
  status tinyint
  version bigint

IDDB_ID_Char
  charid char
  name char
  value char
  intid bigint
  date_created timestamp

IDDB_ID_Int_Shards
  intid bigint
  shardid int
  status tinyint

IDDB_Shards
  id bigint
  type char
  host char
  port mediumint
  user char
  pass char
  status tinyint
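The slides give only the column lists, but they suggest a lookup path: `IDDB_ID_Int` holds the integer IDs, `IDDB_ID_Char` maps a named identifier (a `name`/`value` pair) to an `intid`, `IDDB_ID_Int_Shards` records which shard or shards an ID lives on (one row per shard, matching "IDs live in multiple places"), and `IDDB_Shards` holds connection details per shard, with `type` allowing heterogeneous partitions. The sketch below is one reading of that schema in an in-memory SQLite database; the column meanings, `status = 1` meaning "active", and the helper names `shards_for` and `resolve_char_id` are assumptions, and `charid` is left unused because the slide does not say what it holds.

```python
import sqlite3

# Column names follow the slide; types are approximated for SQLite.
SCHEMA = """
CREATE TABLE IDDB_ID_Int (
    id           INTEGER PRIMARY KEY,
    date_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status       INTEGER NOT NULL DEFAULT 1,
    version      INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IDDB_ID_Char (
    charid       TEXT,
    name         TEXT,
    value        TEXT,
    intid        INTEGER,
    date_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IDDB_ID_Int_Shards (
    intid   INTEGER,
    shardid INTEGER,
    status  INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IDDB_Shards (
    id     INTEGER PRIMARY KEY,
    type   TEXT,
    host   TEXT,
    port   INTEGER,
    user   TEXT,
    pass   TEXT,
    status INTEGER NOT NULL DEFAULT 1
);
"""


def shards_for(conn, intid, shard_type=None):
    """Return (host, port) for every active shard holding intid (hypothetical helper)."""
    sql = """
        SELECT s.host, s.port
        FROM IDDB_ID_Int_Shards AS m
        JOIN IDDB_Shards AS s ON s.id = m.shardid
        WHERE m.intid = ? AND m.status = 1 AND s.status = 1
    """
    params = [intid]
    if shard_type is not None:
        # Heterogeneous partitions: only shards of the requested type.
        sql += " AND s.type = ?"
        params.append(shard_type)
    return conn.execute(sql + " ORDER BY s.id", params).fetchall()


def resolve_char_id(conn, name, value):
    """Map a named identifier (e.g. a username) to its integer ID (assumed semantics)."""
    row = conn.execute(
        "SELECT intid FROM IDDB_ID_Char WHERE name = ? AND value = ?",
        (name, value),
    ).fetchone()
    return row[0] if row else None
```

With an ID mapped to a "users" shard and a "diggs" shard, `shards_for(conn, 42)` returns both hosts, while `shards_for(conn, 42, "diggs")` narrows the result to one partition type. Keeping this mapping in its own database is what lets partitions be added or split without touching application code.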

Page 22: Digg.com Software Architecture

MemcacheDB

✓ Memcached + BDB
✓ 28,000+ writes a second
✓ Persistent key/value storage
✓ Works with Memcached clients
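"Works with Memcached clients" means MemcacheDB speaks the memcached text protocol, so existing client code can point at it unchanged. The sketch below shows the wire format of a `set` command and a `get` reply in that protocol; the socket handling is left out, and the helper names `encode_set`, `encode_get` and `parse_get_response` are hypothetical.

```python
def _check_key(key: str) -> None:
    # Text-protocol keys: 1 to 250 characters, no whitespace or control characters.
    if not key or len(key) > 250 or any(ord(c) <= 32 or ord(c) == 127 for c in key):
        raise ValueError(f"invalid memcached key: {key!r}")


def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build `set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n`."""
    _check_key(key)
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(*keys: str) -> bytes:
    """Build `get <key> [<key> ...]\\r\\n`."""
    for key in keys:
        _check_key(key)
    return ("get " + " ".join(keys) + "\r\n").encode("ascii")


def parse_get_response(data: bytes) -> dict:
    """Parse a complete `get` reply into {key: (flags, value)}."""
    result = {}
    pos = 0
    while True:
        end = data.index(b"\r\n", pos)
        line = data[pos:end].decode("ascii")
        pos = end + 2
        if line == "END":
            return result
        parts = line.split()
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, flags, length = parts[1], int(parts[2]), int(parts[3])
        # The data block is length-prefixed, so it may itself contain \r\n.
        value = data[pos:pos + length]
        if data[pos + length:pos + length + 2] != b"\r\n":
            raise ValueError("data block not terminated by CRLF")
        result[key] = (flags, value)
        pos += length + 2
```

A server that stores the value replies `STORED\r\n` to the `set`; whether that value lives only in RAM (Memcached) or is also written to Berkeley DB (MemcacheDB) is invisible at this level, which is why the two are interchangeable from the client's side.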

Page 23: Digg.com Software Architecture

War stories ...

Page 24: Digg.com Software Architecture

Digg Images

✓ 15,000 to 17,000 submissions per day
✓ Crawl for images, video embeds, source, other metadata
✓ Ran in parallel via Gearman
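Each submission is crawled for images, video embeds and other metadata, and the crawls run in parallel as Gearman jobs. The sketch below swaps in Python's `concurrent.futures` thread pool as a local stand-in for Gearman workers and uses `html.parser` to pull candidate media out of a page; fetching is omitted (pages are passed in as strings), and the names `MediaExtractor`, `crawl_submission` and `crawl_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaExtractor(HTMLParser):
    """Collect <img> sources, <embed>/<iframe> sources and the page <title>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.embeds = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag == "img" and src:
            self.images.append(urljoin(self.base_url, src))
        elif tag in ("embed", "iframe") and src:
            self.embeds.append(urljoin(self.base_url, src))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl_submission(url, html):
    """One 'job': extract media candidates and metadata from a fetched page."""
    parser = MediaExtractor(url)
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(),
            "images": parser.images, "embeds": parser.embeds}


def crawl_all(pages, workers=8):
    """Run crawl jobs for (url, html) pairs concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda page: crawl_submission(*page), pages))
```

With a real job server the submit step would return immediately and the results would be stored when workers finish; the thread pool only shows the same fan-out of independent per-submission jobs within one process.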

Page 25: Digg.com Software Architecture

Green Badges

✓ 230,000+ Diggs per day
✓ Most active Diggers are also most followed
✓ 3,000 writes per second
✓ Ran in background via Gearman
✓ Eventually consistent
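Because the most active Diggers are also the most followed, a single Digg can fan out into a write for every follower, which is where the write volume comes from. The slide says this work ran in the background via Gearman and was eventually consistent. The sketch below shows that fan-out-on-write pattern with a `queue.Queue` and a worker thread standing in for Gearman; the in-memory `followers` and `badges` dictionaries and the function names are hypothetical.

```python
import queue
import threading
from collections import defaultdict

followers = defaultdict(set)   # user -> users who follow them
badges = defaultdict(set)      # follower -> {(story_id, friend_who_dugg)}
_badges_lock = threading.Lock()
jobs = queue.Queue()           # stand-in for a Gearman background job queue


def record_digg(user, story_id):
    """Request path: enqueue the fan-out and return without waiting for it."""
    jobs.put((user, story_id))


def fanout_worker():
    """Background worker: write one badge entry per follower of the Digger."""
    while True:
        job = jobs.get()
        if job is None:  # shutdown sentinel
            jobs.task_done()
            break
        user, story_id = job
        for follower in list(followers[user]):
            with _badges_lock:
                badges[follower].add((story_id, user))
        jobs.task_done()
```

Start `fanout_worker` in a thread, call `record_digg("kevin", 101)`, and each of kevin's followers gains a badge for story 101 once the worker drains the queue. Until then, readers see slightly stale badges; that window is the "eventually consistent" trade-off the slide accepts in exchange for a fast request path.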

Page 26: Digg.com Software Architecture

user_ip_views

Page 27: Digg.com Software Architecture

Digg Comments

✓ Switched to explicit caching
✓ Intelligently grouped objects in cache
✓ Sorting, limiting, etc. done in the application layer
✓ 200% to 300% gains in performance
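The slide describes caching objects deliberately and in sensible groups, then sorting and limiting in the application instead of issuing a separate query per view. The sketch below caches all of a story's comments under one key and slices them per request; a plain `dict` stands in for a Memcached client, and the key format `comments:<story_id>`, the `load_comments_from_db` callable and the comment fields `diggs`/`date` are hypothetical.

```python
from operator import itemgetter

cache = {}  # stand-in for a Memcached client (get/set by key)


def get_story_comments(story_id, load_comments_from_db):
    """Fetch all comments for a story as one explicitly cached group."""
    key = f"comments:{story_id}"
    comments = cache.get(key)
    if comments is None:
        comments = load_comments_from_db(story_id)  # one DB query per story
        cache[key] = comments
    return comments


def comment_page(story_id, load_comments_from_db, sort_by="diggs", limit=10, offset=0):
    """Sort and limit in the application layer instead of in SQL."""
    comments = get_story_comments(story_id, load_comments_from_db)
    # Most-dugg first when sorting by diggs; oldest first otherwise.
    ordered = sorted(comments, key=itemgetter(sort_by), reverse=(sort_by == "diggs"))
    return ordered[offset:offset + limit]


def invalidate_story_comments(story_id):
    """Drop the cached group, e.g. after a new comment is posted."""
    cache.pop(f"comments:{story_id}", None)
```

Every sort order and page of a story's comments is served from one cache entry, so the database sees one query per story until the group is invalidated. The trade-off is that each write has to invalidate (or rebuild) the whole group.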

Page 28: Digg.com Software Architecture

Data Migration

✓ Vertical partitioning
✓ Migrate in background processes
✓ Use the bots
✓ Keep track of migration
✓ Retry failed migrations automatically
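The migration bullets combine three ideas: move data in background processes, record the state of every item, and retry failures automatically. The sketch below is a minimal tracker for that loop; `migrate_one` is a hypothetical callable that copies one item (for example, one user's rows) to its new vertical partition, and the `max_attempts` limit and status names are assumptions. "Use the bots" is not modeled because the slide does not say what it refers to.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MigrationTracker:
    """Track per-item migration state so failed items are retried automatically."""
    max_attempts: int = 3
    status: dict = field(default_factory=dict)    # item_id -> "pending" | "done" | "failed"
    attempts: dict = field(default_factory=dict)  # item_id -> attempts so far
    errors: dict = field(default_factory=dict)    # item_id -> last error message

    def enqueue(self, item_ids):
        # Idempotent: re-enqueueing never resets progress already recorded.
        for item_id in item_ids:
            self.status.setdefault(item_id, "pending")
            self.attempts.setdefault(item_id, 0)

    def run(self, migrate_one, backoff=0.0):
        """Migrate every pending item; retry failures until max_attempts."""
        while True:
            todo = [i for i, s in self.status.items() if s == "pending"]
            if not todo:
                return
            for item_id in todo:
                self.attempts[item_id] += 1
                try:
                    migrate_one(item_id)
                    self.status[item_id] = "done"
                except Exception as exc:
                    self.errors[item_id] = str(exc)
                    if self.attempts[item_id] >= self.max_attempts:
                        self.status[item_id] = "failed"
                    # Otherwise it stays "pending" and is retried next pass.
            if backoff and any(s == "pending" for s in self.status.values()):
                time.sleep(backoff)
```

Persisting `status` and `attempts` (rather than keeping them in memory as here) is what lets a background migration be stopped and resumed, and the "failed" set gives a short list to inspect by hand instead of re-running the whole job.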

Page 29: Digg.com Software Architecture

Things to ponder ...

Page 30: Digg.com Software Architecture

CAP Theorem

Page 31: Digg.com Software Architecture

Have I run the numbers?

Page 32: Digg.com Software Architecture

Is MySQL the best solution?

Page 33: Digg.com Software Architecture

Can I do this later?

Page 34: Digg.com Software Architecture

How can I partition this data?

Page 35: Digg.com Software Architecture

How should I cache this data?

Page 36: Digg.com Software Architecture

Questions?!