“Introducing Hadoop on Azure: hello Map-Reduce!”
Joe Hummel, PhD
Visiting Researcher: U. of California, Irvine
Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago
Materials: http://www.joehummel.net/downloads.html
Email: [email protected]
Agenda
A little history…
Why Hadoop?
How it works
Demos
Summary
A little history…
Map-Reduce is from functional programming
// function returns 1 if i is prime, 0 if not:
let isPrime(i) = ...

// sums 2 numbers:
let sum(x, y) = return x + y

// count the number of primes in 1..N:
let countPrimes(N) =
  let L = [ 1 .. N ]        // [ 1, 2, 3, 4, 5, 6, ... ]
  let T = map isPrime L     // [ 0, 1, 1, 0, 1, 0, ... ]
  let count = reduce sum T  // 42
  return count
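The pseudocode above can be tried directly in JavaScript, whose arrays have built-in map and reduce. This is a minimal sketch: the trial-division body of isPrime is an assumption, since the slide leaves it as "...".

```javascript
// Returns 1 if i is prime, 0 if not (simple trial division; the slide
// elides this body, so this implementation is an assumption).
function isPrime(i) {
  if (i < 2) return 0;
  for (let d = 2; d * d <= i; d++) {
    if (i % d === 0) return 0;
  }
  return 1;
}

// Sums 2 numbers.
function sum(x, y) {
  return x + y;
}

// Counts the number of primes in 1..N: map each number to 0 or 1, then reduce by summing.
function countPrimes(N) {
  const L = Array.from({ length: N }, (_, k) => k + 1); // [1, 2, 3, ..., N]
  const T = L.map((i) => isPrime(i));                   // [0, 1, 1, 0, 1, 0, ...]
  return T.reduce(sum, 0);
}

console.log(countPrimes(10)); // 4 (the primes 2, 3, 5, 7)
```

Because each isPrime call is independent of the others, the map step could in principle be spread across many machines; only the reduce step needs to bring results together.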
A little more history…
Created by Google to drive internet search
◦ BIG data: scalable to TBs and beyond
◦ Parallelism: to get the performance
◦ Data partitioning: to drive the parallelism
◦ Fault tolerance: at this scale, machines are going to crash, a lot…
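To make "data partitioning drives the parallelism" concrete, here is a small sketch that runs sequentially on one machine. The helper names (partition, processChunk, countEvens) and the even-number task are illustrative assumptions, not part of Hadoop.

```javascript
// Splits `data` into at most `parts` contiguous chunks of near-equal size.
function partition(data, parts) {
  const size = Math.ceil(data.length / parts);
  const chunks = [];
  for (let start = 0; start < data.length; start += size) {
    chunks.push(data.slice(start, start + size));
  }
  return chunks;
}

// Work done independently on one partition (here: count the even numbers).
function processChunk(chunk) {
  return chunk.filter((x) => x % 2 === 0).length;
}

// Processes each partition separately, then combines the partial results.
function countEvens(data, parts) {
  const partials = partition(data, parts).map(processChunk);
  return partials.reduce((a, b) => a + b, 0);
}

const data = Array.from({ length: 10 }, (_, k) => k + 1); // [1 .. 10]
console.log(partition(data, 3));  // [[1,2,3,4], [5,6,7,8], [9,10]]
console.log(countEvens(data, 3)); // 5
```

Since no chunk depends on another, each one could be handed to a different machine, and a chunk whose machine crashes could simply be processed again elsewhere.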
[Diagram labels: BIG Data, page hits]
Who’s using Hadoop
Search engines: Google, Yahoo, Bing
Facebook
Twitter
Financials
Health industry
Insurance
Credit card companies
Just about any company collecting user data…
Freely-available framework for big data
◦ http://hadoop.apache.org/
// JavaScript version:
var map = function (key, value, context) {
  var values = value.split(",");
  // field 0 contains movieid, field 2 the rating:
  context.write(values[0], values[2]);
};

var reduce = function (key, values, context) {
  var sum = 0;
  var count = 0;

  while (values.hasNext()) {
    count++;
    sum += parseInt(values.next(), 10);
  }
  // emit the average rating for this movie:
  context.write(key, sum / count);
};
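To see what happens between map and reduce, the functions can be exercised on a laptop with a tiny stand-in for the framework. The sketch below is an assumption-laden simulation: runLocally, its context objects, and the sample records are hypothetical and are not a Hadoop API. It only mimics the map, group-by-key, reduce sequence.

```javascript
// map and reduce as on the slide: average rating per movie.
var map = function (key, value, context) {
  var values = value.split(",");
  context.write(values[0], values[2]); // movieid, rating
};

var reduce = function (key, values, context) {
  var sum = 0;
  var count = 0;
  while (values.hasNext()) {
    count++;
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum / count);
};

// Hypothetical local harness (not a Hadoop API): map, group by key, reduce.
function runLocally(lines, map, reduce) {
  // Map phase: collect every value written under its key.
  const groups = new Map();
  const mapContext = {
    write(k, v) {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    },
  };
  lines.forEach((line, offset) => map(offset, line, mapContext));

  // Reduce phase: call reduce once per key with an iterator over its values.
  const results = {};
  const reduceContext = { write(k, v) { results[k] = v; } };
  for (const [k, vals] of groups) {
    let i = 0;
    const iterator = { hasNext: () => i < vals.length, next: () => vals[i++] };
    reduce(k, iterator, reduceContext);
  }
  return results;
}

// Made-up sample records; the middle field (a user id) is an assumption.
const sample = ["1,100,4", "2,101,5", "1,102,2", "2,103,3", "3,104,5"];
console.log(runLocally(sample, map, reduce)); // { '1': 3, '2': 4, '3': 5 }
```

In a real cluster the grouping step is the framework's job: it collects all values for a key, possibly from many machines, before calling reduce.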
Traditional use of Hadoop
Upload data to HDFS
◦ Hadoop file system
Write map / reduce functions
◦ default is to use Java
◦ most languages supported: C, C++, C#, JavaScript, Python, …
Compile and upload code
◦ For Java, you upload .jar file
◦ For others, .exe or script
Submit MapReduce job
Wait for job to complete
When to use Hadoop?
Queries against big datasets
Embarrassingly-parallel problems
◦ Solution must fit into map-reduce framework
Non-real-time demands
Hadoop is not for:
◦ Small datasets (< 1GB?)
◦ Sub-second / real-time needs (though clearly Google makes it work)
We’ll be working with Chicago crime data…
◦ https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
◦ http://www.cityofchicago.org/city/en/narr/foia/CityData.html
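As one possible query against this data, the sketch below counts crimes by type, in the same map / reduce style as the JavaScript shown earlier in the deck. Several things here are assumptions rather than facts from the slides: the column index CRIME_TYPE_FIELD, the sample rows (made up), and the countByType driver, which is a local stand-in and not a Hadoop API. The naive split(",") also ignores quoted fields that contain commas.

```javascript
// Hypothetical column index of the crime type; check the dataset's header row.
var CRIME_TYPE_FIELD = 5;

var map = function (key, value, context) {
  var fields = value.split(","); // naive split: does not handle quoted commas
  context.write(fields[CRIME_TYPE_FIELD], 1);
};

var reduce = function (key, values, context) {
  var count = 0;
  while (values.hasNext()) {
    count += parseInt(values.next(), 10);
  }
  context.write(key, count);
};

// Hypothetical local driver: map every line, group by key, reduce each group.
function countByType(lines) {
  const groups = new Map();
  const mapContext = {
    write(k, v) {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    },
  };
  lines.forEach((line, offset) => map(offset, line, mapContext));

  const counts = {};
  for (const [k, vals] of groups) {
    let i = 0;
    const iterator = { hasNext: () => i < vals.length, next: () => vals[i++] };
    reduce(k, iterator, { write(key, v) { counts[key] = v; } });
  }
  return counts;
}

// Made-up rows in an assumed layout: id, case, date, block, code, type, description
const rows = [
  "1,XX000001,01/01/2012,001XX W EXAMPLE ST,0486,BATTERY,DOMESTIC",
  "2,XX000002,01/02/2012,002XX N SAMPLE AVE,0820,THEFT,UNDER 500",
  "3,XX000003,01/03/2012,003XX S DEMO BLVD,0460,BATTERY,SIMPLE",
];
console.log(countByType(rows)); // { BATTERY: 2, THEFT: 1 }
```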
Presenter: Joe Hummel
◦ Email: [email protected]
◦ Materials: http://www.joehummel.net/downloads.html
For more info:
◦ http://www.hadooponazure.com/
◦ http://msdn.microsoft.com/en-us/magazine/jj190805.aspx
◦ Overview, including how to access via .NET API: