Advanced Topics in Distributed Systems Fall 2011 Instructor: Costin Raiciu

Dec 28, 2015


Advanced Topics in Distributed Systems

Fall 2011

Instructor: Costin Raiciu


We’ve gotten used to great applications


Enabling Such Apps is Hard

• Apps
  – Process huge amounts of data
  – Are fast
  – Are reliable

• One machine is not enough
  – Limited reliability, speed

• Supercomputers are expensive


What Makes These Applications Tick?


Distributed Systems


This course…

• Cares about technology relating to distributed systems:
  – Networks
  – Virtual machines
  – Distributed filesystems
  – Distributed computation

• We care about details, not about products
  – Why?


Traditional Data Center Network Topology

[Figure: tree topology with racks of servers at the bottom, then top-of-rack switches, aggregation switches, and a core switch. Link labels: 1Gbps, 10Gbps, 10Gbps]


Fat Tree Topology [Al-Fares et al., 2008; Clos, 1953]

[Figure: fat tree with K=4: K pods with K switches each, aggregation switches, racks of servers. Link labels: 1Gbps, 1Gbps]
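To make the "K pods with K switches each" structure concrete, here is a minimal sizing sketch. It assumes the standard k-ary fat tree construction from the cited 2008 paper (each pod has k/2 edge and k/2 aggregation switches, each edge switch serves k/2 hosts, and there are (k/2)^2 core switches). The function name `fat_tree_sizes` and the dictionary keys are illustrative, not from the slides.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts for a k-ary fat tree built from identical k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    edge = k * half          # k pods, k/2 edge (top-of-rack) switches per pod
    aggregation = k * half   # k pods, k/2 aggregation switches per pod
    core = half * half       # (k/2)^2 core switches
    hosts = edge * half      # each edge switch serves k/2 hosts
    return {
        "pods": k,
        "edge_switches": edge,
        "aggregation_switches": aggregation,
        "core_switches": core,
        "switches_total": edge + aggregation + core,
        "hosts": hosts,
    }


sizes = fat_tree_sizes(4)
print(sizes["hosts"], sizes["switches_total"])  # prints: 16 20
```

With K=4 as on the slide, this gives 16 hosts behind 20 four-port switches. The host count grows as k^3/4, which is why the design can scale using only commodity switches.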


Inside a Machine: Virtualization

• Many operating systems running on a single box

• Provides:
  – Isolation
  – Flexibility
  – Better utilization of the machine


How do we store data?

• Distributed filesystem
  – NFS:
    • UNIX-like semantics
    • Single server
    • Limited scalability
  – Google File System
    • Optimized for large-batch writes and sequential reads
    • Tolerates inconsistency


How do we get work done?

• MapReduce
  – Apply the same function in parallel on different data on many machines
  – Aggregate results

• Useful for:
  – Building big web-search indices
  – Processing large amounts of data (PB)
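The two steps on this slide (apply the same function to every piece of data, then aggregate) can be sketched in a few lines. This is a single-process illustration of the programming model only, not how Hadoop or Google's system is implemented; `map_reduce`, `count_words`, and `sum_counts` are hypothetical names chosen for the example.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map_fn on every record, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    # Map phase: on a real cluster each record (or split) is handled on a different machine.
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce phase: aggregate all values that share a key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_words(line):
    """Map function: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def sum_counts(word, counts):
    """Reduce function: total the counts emitted for one word."""
    return sum(counts)


documents = ["the quick brown fox", "the lazy dog", "The fox"]
word_counts = map_reduce(documents, count_words, sum_counts)
print(word_counts["the"], word_counts["fox"])  # prints: 3 2
```

Because the map function sees one record at a time and the reduce function sees one key at a time, both phases can be spread over many machines without changing the user's code.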


This is just a taster


Course outline

• Distributed Apps we care about
  – Distributed Computation (MapReduce, Dryad, Hadoop)
  – Distributed Filesystems (NFS and GFS)
  – Web search
  – Caching (Memcached)
  – Distributed Hash Tables (Chord, Dynamo)
  – NoSQL databases (BigTable, Cassandra)

• Infrastructure: networks
  – Topologies: FatTree, VL2, BCube
  – Using capacity: Hedera, MPTCP
  – Performance Optimizations: Incast, DCTCP


Course outline [2]

• Infrastructure: OS abstractions
  – Virtual Machines (Xen, VMM)
  – Distributed memory (Ivy)

• Security
  – Information Leakage
  – Good Isolation vs. High Utilization (Seawall, CloudPolice)


Course Admin

• Lectures:
  – 2 hours per week, Tuesday 8-10, EC102

• Lab classes:
  – 2 hours per week, Tuesday 10-12, EG106
  – Project discussions
  – Help with practical issues
  – Help with high-level goals, theory

• Website: curs.cs.pub.ro
  – If you have problems, let me know


Grading

• Project: 5p
  – Groups of 3-4 students
  – 4 stages: to help you get the job done easily, without last-minute work over Christmas
• Exam: 3p
• Presentation (1h): 1p
• Class participation: 1p


Presentation

• Select one topic before the end of October (list will be posted this week)
  – Presentation date is fixed
  – If you miss your presentation, you lose 2p

• Class participation
  – 2 papers presented per course by your colleagues
  – Read them beforehand and take part in the discussion


Exam

• Open book
• Need to understand and think
  – not memorize

• Studying 3 days before the exam won’t work
  – You need to take part in classes and read up


Projects

• Large-scale data processing with MapReduce
  – We will use Apache Hadoop
  – We will run code on Amazon EC2 (and maybe on local clusters)
  – Several datasets you can choose from


Datasets available

• Crawled set of HTML pages from .uk
• Wikipedia Page Traffic Statistics
• Apache Mail Archives
• Million Song Dataset
• M-Lab dataset: Network Path and Application Diagnosis tool
• Human genome
• US Census databases
• Freebase data dump


Stage 1

• Choose dataset to use
• Select one/many questions to answer using the dataset
• Write small Hadoop script to parse a subset of the data
• Come up with a few simple graphs (e.g. dataset size, histograms)
• Start writing:
  – Introduction to your report, problem statement
• Start the implementation and evaluation
  – Size of dataset, time to do one pass, etc.

• Strict deadline [1p]: November 1st
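As a rough idea of what a "small Hadoop script" producing a histogram might look like, here is a sketch in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated `key<TAB>value` text lines. It assumes, purely for illustration, that each input record has four whitespace-separated fields (`project page_title view_count bytes`); check the actual format of the dataset you choose. The names `mapper` and `reducer` and the sample records are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'bucket<TAB>1' per valid record, where bucket = floor(log2(view_count))."""
    for line in lines:
        fields = line.split()
        # Skip records that do not match the assumed four-field layout.
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        views = int(fields[2])
        if views > 0:
            yield f"{views.bit_length() - 1}\t1"


def reducer(sorted_lines):
    """Sum the counts per bucket; expects lines sorted by key, as the shuffle delivers them."""
    pairs = (line.split("\t") for line in sorted_lines)
    for bucket, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{bucket}\t{sum(int(count) for _, count in group)}"


# Local dry run on made-up records; sorted() stands in for the shuffle step.
sample = [
    "en Main_Page 1000 5000000",
    "en Some_Page 3 12000",
    "de Hauptseite 2 8000",
    "malformed line",
]
histogram = list(reducer(sorted(mapper(sample))))
print(histogram)  # prints: ['1\t2', '9\t1']
```

Under Hadoop Streaming the two functions would typically live in separate scripts that read `sys.stdin` and print their output lines. Testing them locally on a small subset first, as above, also gives a first estimate of the time needed for one pass over the data.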


Stage 2

• How do we solve the problem?
  – Review related work
  – Select potential approaches
    • Discuss pros/cons

• Implementation and evaluation
  – Implement the code
  – Run experiments
  – Refine code and iterate

• Goal: 70% of functionality should be implemented
• Deadline [1p]: December 1st
  – Output in report:
    • Implementation section
    • Early evaluation section


Stage 3

• Final implementation
• Evaluation
• What did we learn?
• Deadline [1p]: December 21st
  – In-class project presentation: 10 mins


Stage 4

• Write-up
  – Polish report
  – Create a coherent story
  – Convince me that this is useful

• Deadline to hand in final report: last day of semester (January 14th) [1p]