Primers CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Some content adapted from Dr. Kalpakis’s CMSC 621 slides
Dec 23, 2015
Agenda
• Distributed Computing
  – Evolution of Computing Infrastructure
  – Networking Infrastructure
  – Properties of Distributed Systems
  – Example System Architectures
• Java
• Linux
EVOLUTION OF COMPUTING INFRASTRUCTURE
Mainframe – 50s to 70s
• Custom hardware
• Custom low-level specialized code
• Very expensive solutions
Client/Server – 80s to 00s
• IT-led architectures
• More portable solutions
• Scalable solutions based on demand
• Reign of the Enterprise Data Warehouse
Cloud – 00s to Today
• Consumer-grade infrastructure
• Growing IaaS and PaaS markets
• Data revolution
• Focus on applications and not infrastructure
Where does Hadoop fit?
• A piece of your data infrastructure
  – Can crunch data for analytics
  – Can expose data for web applications
• Exploration of raw data
• Augments today’s infrastructure
• IMO, a big toolbox that can do a bit of everything
NETWORKING INFRASTRUCTURE
Single Server
[Figure: one server containing HDDs, CPUs, RAM, and NICs]
Scale Up
• Faster CPUs
• Bigger Storage
Scale Out
• More Servers
Local-Area Network (LAN)
[Figure: two racks of four servers each (every server has HDDs, CPUs, RAM, and NICs), connected through a gateway to the WAN]
Wide Area Network (WAN)
[Figure: sites in London, England; Beijing, China; and New York, NY connected over a WAN]
PROPERTIES OF DISTRIBUTED SYSTEMS
Distributed Systems
• The development of low-cost, powerful microprocessors, together with the invention of high-speed networks, enables us to construct computer systems by connecting a large number of computers.
• A distributed system is a collection of independent computers that appears to its users as a single coherent system.
Properties of Distributed Systems
• Reliability
• Scalability
• Availability
• Efficiency
• CAP Theorem
Reliability
• Can the system deliver services in face of several component failures?
Scalability
• Can the system scale to support a growing number of tasks?
Availability
• How much latency is imposed on the system when a failure occurs?
Efficiency
• How efficient is the system, in terms of latency and throughput?
CAP Theorem
• Consistent
• Available
• Partition Tolerant
• Trade-off between Consistency and Availability
Stateful vs. Stateless
• Whether or not a distributed system saves its state on an attached device for recovery
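The stateful case can be sketched in a few lines of Java. This is only an illustration under stated assumptions: CheckpointedCounter and StatefulDemo are hypothetical names, and the "attached device" is simply a local temp file. A stateless service would skip the checkpoint and start from zero after every restart.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stateful service: persists its counter so a restart can recover it.
class CheckpointedCounter {
    private final Path checkpoint;
    private long count;

    CheckpointedCounter(Path checkpoint) throws IOException {
        this.checkpoint = checkpoint;
        // Recover saved state if a checkpoint exists; otherwise start from zero.
        if (Files.exists(checkpoint)) {
            String saved = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8).trim();
            this.count = saved.isEmpty() ? 0L : Long.parseLong(saved);
        } else {
            this.count = 0L;
        }
    }

    long increment() throws IOException {
        count++;
        // Write the new state to disk before returning it to the caller.
        Files.write(checkpoint, Long.toString(count).getBytes(StandardCharsets.UTF_8));
        return count;
    }

    long get() {
        return count;
    }
}

class StatefulDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("counter", ".chk");
        CheckpointedCounter first = new CheckpointedCounter(file);
        first.increment();
        first.increment();
        // Simulate a crash and restart: a fresh instance reads the checkpoint.
        CheckpointedCounter recovered = new CheckpointedCounter(file);
        System.out.println(recovered.get()); // prints 2
        Files.delete(file);
    }
}
```

Writing on every update keeps recovery simple at the cost of one disk write per request; real systems usually batch or log updates instead.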
EXAMPLE SYSTEM ARCHITECTURES
Simple Client/Server
Multi-Tiered Client/Server
Round-Robin Client/Server
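One way to picture the round-robin arrangement is a dispatcher that hands each incoming request to the next server in a fixed rotation. The sketch below assumes a hypothetical RoundRobinDispatcher class and made-up server names; it shows only the selection logic, not any networking.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical dispatcher: hands out backend servers in strict rotation.
class RoundRobinDispatcher {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinDispatcher(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = servers;
    }

    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        // Server names are illustrative placeholders.
        RoundRobinDispatcher dispatcher =
                new RoundRobinDispatcher(Arrays.asList("server-a", "server-b", "server-c"));
        for (int request = 0; request < 4; request++) {
            System.out.println("request " + request + " -> " + dispatcher.pick());
        }
        // The fourth request wraps around to server-a.
    }
}
```

Round-robin spreads requests evenly without tracking server load, which is why it is a common first choice when all servers are roughly identical.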
Java
• Object-oriented class-based programming language designed for code reuse and portability
• Programs compile to bytecode that can run on any Java Virtual Machine (JVM)
• Memory is managed for you and automatically cleaned up by the JVM’s garbage collector
• Syntax is similar to C++
// Animal.java
public class Animal {
    // Member variables
    protected int age = 0;
    protected String species = null;

    public Animal() { }

    public Animal(int a, String s) {
        setAge(a);
        setSpecies(s);
    }

    public String getSpecies() { return species; }
    public void setSpecies(String s) { this.species = s; }
    public int getAge() { return age; }
    public void setAge(int a) { this.age = a; }

    @Override
    public String toString() { return age + " " + species; }
}
// Human.java
// Inherits all the public/protected items from Animal
public class Human extends Animal {
    // Additional member variables
    private String name = null;

    public Human(String name, int age) {
        super(age, "Human");
        setName(name);
    }

    public String getName() { return name; }
    public void setName(String n) { this.name = n; }

    @Override
    public String toString() { return name + " " + super.toString(); }
}
// Main.java
// Main class to be executed
public class Main {
    public static void main(String[] args) {
        Animal a = new Animal();
        a.setAge(10);
        a.setSpecies("Hiphopopotamus");
        System.out.println(a);

        // A Human is also an Animal, so the same variable can hold it
        a = new Human("Adam", 85);
        System.out.println(a);
    }
}
10 Hiphopopotamus
Adam 85 Human
// Generic class (Java's counterpart to a C++ template)
public class Pair<FIRST, SECOND> {
    public FIRST first;
    public SECOND second;

    @Override
    public String toString() { return first + ":" + second; }
}

public class Main {
    public static void main(String[] args) {
        Pair<Integer, String> p1 = new Pair<Integer, String>();
        p1.first = 10;
        p1.second = "Rhymenocerous";
        System.out.println(p1);

        Pair<Human, String> p2 = new Pair<Human, String>();
        p2.first = new Human("Adam", 85);
        p2.second = "Hiphopopotamus";
        System.out.println(p2);
    }
}
10:Rhymenocerous
Adam 85 Human:Hiphopopotamus
101’d
• We have only scratched the surface of Java
• Java also includes interfaces, abstract classes, and many libraries for data structures, networking, multi-threading, etc.
• We will be using Eclipse and Maven in this class
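As a small taste of two features named above, here is a sketch of an interface and an abstract class working together. The types Shape, NamedShape, Rectangle, and ShapeDemo are hypothetical and exist only for this illustration.

```java
// An interface declares what a type can do, with no implementation.
interface Shape {
    double area();
}

// An abstract class can share code between subclasses but cannot be instantiated itself.
abstract class NamedShape implements Shape {
    private final String name;

    protected NamedShape(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        // area() is supplied by whichever concrete subclass this object is.
        return name + " with area " + area();
    }
}

// A concrete class fills in the missing method.
class Rectangle extends NamedShape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        super("Rectangle");
        this.width = width;
        this.height = height;
    }

    @Override
    public double area() {
        return width * height;
    }
}

class ShapeDemo {
    public static void main(String[] args) {
        Shape s = new Rectangle(2.0, 3.0);
        System.out.println(s); // prints "Rectangle with area 6.0"
    }
}
```

The caller only depends on Shape, so other shapes could be swapped in without changing ShapeDemo.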
Let’s look at Maven
• sorry
LINUX
Linux Reference
• A free and open source operating system
• In this course, we live in Eclipse and the command line
• Mastery of 'vi' gets you +4 charisma
http://www.ibm.com/developerworks/library/l-lpic1-v3-103-1/
http://www.linuxdevcenter.com/excerpt/LinuxPG_quickref/linux.pdf
References
• http://webdam.inria.fr/Jorge/html/wdmch15.html
• Google Images
• http://www.ibm.com/developerworks/library/l-lpic1-v3-103-1/
• http://www.linuxdevcenter.com/excerpt/LinuxPG_quickref/linux.pdf