Primers CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Some content adapted from Dr. Kalpakis’s CMSC 621 slides
Dec 23, 2015

Primers

CMSC 491
Hadoop-Based Distributed Computing

Spring 2015
Adam Shook

Some content adapted from Dr. Kalpakis’s CMSC 621 slides

Agenda

• Distributed Computing
  – Evolution of Computing Infrastructure
  – Networking Infrastructure
  – Properties of Distributed Systems
  – Example System Architectures
• Java
• Linux

EVOLUTION OF COMPUTING INFRASTRUCTURE

Mainframe – 50s to 70s

• Custom hardware
• Custom low-level specialized code

• Very expensive solutions

Client/Server – 80s to 00s

• IT-led architectures
• More portable solutions
• Scalable solutions based on demand

• Reign of the Enterprise Data Warehouse

Cloud – 00s to Today

• Consumer-grade infrastructure
• Growing IaaS and PaaS markets
• Data revolution

• Focus on applications and not infrastructure

Where does Hadoop fit?

• A piece of your data infrastructure
  – Can crunch data for analytics
  – Can expose data for web applications

• Exploration of raw data
• Augments today’s infrastructure

• IMO, a big toolbox that can do a bit of everything

NETWORKING INFRASTRUCTURE

Single Server

[Figure: a single server with HDDs, CPUs, RAM, and NICs. Scale up: faster CPUs, bigger storage. Scale out: more servers.]

Local-Area Network (LAN)

[Figure: two racks, each holding four servers (HDDs, CPUs, RAM, NICs), with a gateway to the WAN.]

Wide Area Network (WAN)

[Figure: London, England; Beijing, China; New York, NY]

PROPERTIES OF DISTRIBUTED SYSTEMS

Distributed Systems

• The development of low-cost, powerful microprocessors, together with the invention of high-speed networks, enables us to construct computer systems by connecting a large number of computers.

• A distributed system is a collection of independent computers that appears to its users as a single coherent system.

Properties of Distributed Systems

• Reliability
• Scalability
• Availability
• Efficiency
• CAP Theorem

Reliability

• Can the system deliver services in the face of several component failures?

Scalability

• Can the system scale to support a growing number of tasks?

Availability

• How much latency is imposed on the system when a failure occurs?

Efficiency

• How efficient is the system, in terms of latency and throughput?
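The two measures named above can be computed from simple request counts and timings. A minimal sketch, assuming latency is reported as a per-request average in milliseconds and throughput as requests per second; the class and method names are hypothetical, not from the course material.

```java
// Hypothetical helper: computes the two efficiency measures from raw numbers.
class EfficiencyStats {

    // Average latency in milliseconds across all measured requests.
    // Returns NaN if no requests were measured.
    static double averageLatencyMillis(long[] latenciesMillis) {
        long total = 0;
        for (long latency : latenciesMillis) {
            total += latency;
        }
        return (double) total / latenciesMillis.length;
    }

    // Throughput in requests per second over a measurement window.
    static double throughputPerSecond(int requests, long windowMillis) {
        return requests * 1000.0 / windowMillis;
    }
}
```

For example, three requests that took 10, 20, and 30 ms average 20 ms of latency, and 500 requests completed in a 2000 ms window give a throughput of 250 requests per second.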

CAP Theorem

• Consistent
• Available
• Partition Tolerant

• Trade-off between Consistency and Availability

Stateful vs. Stateless

• Whether or not a distributed system saves its state on an attached device for recovery
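To make the distinction concrete, here is a minimal sketch of one counter that keeps its state only in memory and one that checkpoints it to a file. Both classes and the checkpoint file format are hypothetical illustrations, assuming a local file stands in for the "attached device".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Stateless in the slide's sense: the count lives only in memory
// and is lost if the process fails.
class InMemoryCounter {
    private long count = 0;

    long increment() { return ++count; }
}

// Stateful: every update is written to a checkpoint file, so a new
// instance can recover the last saved value after a failure.
class CheckpointedCounter {
    private final Path checkpoint;
    private long count = 0;

    CheckpointedCounter(Path checkpoint) {
        this.checkpoint = checkpoint;
        try {
            // Recover the saved count if a non-empty checkpoint exists.
            if (Files.exists(checkpoint)) {
                String saved = new String(
                        Files.readAllBytes(checkpoint), StandardCharsets.UTF_8).trim();
                if (!saved.isEmpty()) {
                    count = Long.parseLong(saved);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long increment() {
        count++;
        try {
            // Overwrite the checkpoint with the new value.
            Files.write(checkpoint, Long.toString(count).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

Creating a second CheckpointedCounter on the same file continues from the saved value, while a new InMemoryCounter always starts from zero. The cost of the stateful version is a disk write on every update.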

EXAMPLE SYSTEM ARCHITECTURES

Simple Client/Server

Multi-Tiered Client/Server

Round-Robin Client/Server
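The slide's diagram did not survive in this transcript. Assuming it shows requests being handed to a pool of servers in rotation, the selection logic might be sketched as below. The class name, method name, and server names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical balancer: hands out servers in a fixed rotation.
class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = servers;
    }

    // Returns the next server in the rotation. floorMod keeps the index
    // non-negative even after the counter overflows.
    String nextServer() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        RoundRobinBalancer balancer =
                new RoundRobinBalancer(Arrays.asList("server-a", "server-b", "server-c"));
        // Prints server-a, server-b, server-c, then wraps to server-a.
        for (int i = 0; i < 4; i++) {
            System.out.println(balancer.nextServer());
        }
    }
}
```

Round-robin ignores how busy each server is, which keeps the balancer simple but can overload a slow server.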

Java

• Object-oriented class-based programming language designed for code reuse and portability

• Programs compile to bytecode that can run on any Java Virtual Machine (JVM)

• Memory is managed for you and automatically cleaned up by the JVM’s garbage collector

• Syntax is similar to C++

public class Animal {
    // Member variables
    protected int age = 0;
    protected String species = null;

    public Animal() { }

    public Animal(int a, String s) {
        setAge(a);
        setSpecies(s);
    }

    public String getSpecies() { return species; }

    public void setSpecies(String s) { this.species = s; }

    public int getAge() { return age; }

    public void setAge(int a) { this.age = a; }

    @Override
    public String toString() { return age + " " + species; }
}

// Inherits all the public/protected items from Animal
public class Human extends Animal {
    // Additional member variables
    private String name = null;

    public Human(String name, int age) {
        super(age, "Human");
        setName(name);
    }

    public String getName() { return name; }

    public void setName(String n) { this.name = n; }

    @Override
    public String toString() { return name + " " + super.toString(); }
}

// Main class to be executed
public class Main {
    public static void main(String[] args) {
        Animal a = new Animal();
        a.setAge(10);
        a.setSpecies("Hiphopopotamus");

        System.out.println(a);

        a = new Human("Adam", 85);

        System.out.println(a);
    }
}

Output:
10 Hiphopopotamus
Adam 85 Human

// Generic ("templated") class
public class Pair<FIRST, SECOND> {
    public FIRST first;
    public SECOND second;

    @Override
    public String toString() { return first + ":" + second; }
}

// Main class, in its own file (Main.java)
public class Main {
    public static void main(String[] args) {
        Pair<Integer, String> p1 = new Pair<Integer, String>();
        p1.first = 10;
        p1.second = "Rhymenocerous";

        System.out.println(p1);

        Pair<Human, String> p2 = new Pair<Human, String>();
        p2.first = new Human("Adam", 85);
        p2.second = "Hiphopopotamus";

        System.out.println(p2);
    }
}

Output:
10:Rhymenocerous
Adam 85 Human:Hiphopopotamus

101’d

• Simply scratched the surface of Java
• Includes interfaces, abstract classes, lots of libraries for data structures, networking, multi-threading, etc.

• We will be using Eclipse and Maven in this class
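Two of the features listed above, interfaces and abstract classes, can be sketched briefly. This is an illustrative example only; the Speaker, Pet, Dog, and Cat names are hypothetical and are not part of the course code.

```java
import java.util.ArrayList;
import java.util.List;

// An interface declares behavior without implementing it.
interface Speaker {
    String speak();
}

// An abstract class can hold shared state and code, but cannot be
// instantiated; subclasses supply the missing speak() method.
abstract class Pet implements Speaker {
    private final String name;

    Pet(String name) { this.name = name; }

    String getName() { return name; }

    String introduce() { return name + " says " + speak(); }
}

class Dog extends Pet {
    Dog(String name) { super(name); }

    @Override
    public String speak() { return "Woof"; }
}

class Cat extends Pet {
    Cat(String name) { super(name); }

    @Override
    public String speak() { return "Meow"; }
}

class PetDemo {
    public static void main(String[] args) {
        // ArrayList is one of the standard library's data structures.
        List<Pet> pets = new ArrayList<>();
        pets.add(new Dog("Rex"));
        pets.add(new Cat("Tom"));
        for (Pet pet : pets) {
            System.out.println(pet.introduce());
        }
    }
}
```

Running PetDemo prints "Rex says Woof" followed by "Tom says Meow". The loop only knows it holds Pet objects; each subclass decides what speak() returns.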

Let’s look at Maven

• sorry

LINUX

Linux Reference

• A free and open source operating system
• In this course, we live in Eclipse and the command line
• Mastery of 'vi' gets you +4 charisma

http://www.ibm.com/developerworks/library/l-lpic1-v3-103-1/
http://www.linuxdevcenter.com/excerpt/LinuxPG_quickref/linux.pdf