Hadoop on Azure 101: What is the Big Deal?

Post on 24-Feb-2016


Hadoop on Azure 101: What is the Big Deal? Dennis Mulder, Solution Architect – Global Windows Azure Center of Excellence, Microsoft Corporation. Agenda: Why Big Data?; Understanding the Basics; Microsoft and Hadoop. Why Big Data? 1.8 zettabytes of information will be created in 2011.

Transcript

Hadoop on Azure 101: What is the Big Deal?

Dennis Mulder
Solution Architect – Global Windows Azure Center of Excellence
Microsoft Corporation

Agenda

• Why Big Data?
• Understanding the Basics
• Microsoft and Hadoop

Why Big Data?

Example Scenarios

The Potential: Solving Specific Industry Problems

• eCommerce: mining web logs: collaborative filtering, user experience optimisation…
• Manufacturing: detecting trends and anomalies in sensor data: predicting and understanding faults
• Capital Markets: joining market and external data: correlation detection for investment strategy identification, risk calculations…
• Retail Banking: historical transaction mining: fraud detection, customer segmentation…

Industry-specific data-sets leveraged to improve decision making and generate new revenue streams

Traditional E-Commerce Data Flow

[Diagram labels: Operational Data (New User Registry, New Purchase, New Product); Logs; ETL Some Data; Data Warehouse; Excess Data]

New E-Commerce Big Data Flow

[Diagram labels: Operational Data (New User Registry, New Purchase, New Product); Logs; Raw Data "Store it All" Cluster; Data Warehouse]

How much do views for certain products increase when our TV ads run?

Understanding the Basics

Move the Compute to the Data

So How Does It Work?

FIRST, STORE THE DATA

[Diagram: Files stored across several Servers]

SECOND, TAKE THE PROCESSING TO THE DATA

// MapReduce functions in JavaScript

// Map: split each input line into words and emit (word, 1) for each one.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// Reduce: add up all the counts emitted for one word.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};
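To see what the two functions above do without a cluster, they can be driven by a small in-memory harness. This is only a sketch: runLocal is a hypothetical helper written for this illustration, not part of Hadoop or of the Hadoop on Azure JavaScript framework, and it assumes nothing about the real runtime beyond the context.write, hasNext and next shapes the slide's code already uses. The map and reduce functions are repeated so the block runs on its own.

```javascript
// Word-count functions from the slide, repeated so this block is self-contained.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical local stand-in for the cluster runtime (assumption, not a Hadoop API).
function runLocal(mapFn, reduceFn, lines) {
  // Map phase: collect every emitted (key, value) pair, grouped by key.
  var groups = new Map();
  var mapContext = {
    write: function (k, v) {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    }
  };
  lines.forEach(function (line, offset) {
    mapFn(offset, line, mapContext);
  });

  // Reduce phase: visit keys in sorted order and expose each group
  // through a hasNext()/next() iterator, as the reduce function expects.
  var result = {};
  var reduceContext = {
    write: function (k, v) { result[k] = v; }
  };
  Array.from(groups.keys()).sort().forEach(function (k) {
    var vals = groups.get(k);
    var pos = 0;
    var iterator = {
      hasNext: function () { return pos < vals.length; },
      next: function () { return vals[pos++]; }
    };
    reduceFn(k, iterator, reduceContext);
  });
  return result;
}

var counts = runLocal(map, reduce, ["The code is brought", "to the data"]);
console.log(counts);
```

On a real cluster the grouping step happens across machines during the shuffle; here it is a single in-memory Map, which is enough to check the logic of the two functions.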

[Diagram: Servers, with labels RUNTIME and Code]

MapReduce – Workflow

Scenario: Get sum sales grouped by zipCode

Input: blocks of the Sales file in HDFS, held on DataNode1, DataNode2 and DataNode3. Each record is (custId, zipCode, amount).

Block: 4 54235 $75 | 7 10025 $60 | 2 53705 $30 | 1 02115 $15
Block: 5 53705 $65 | 0 54235 $22 | 5 53705 $15 | 6 44313 $10
Block: 3 10025 $95 | 8 44313 $55 | 9 02115 $15 | 6 44313 $25

Map tasks: each Mapper reads a block, emits (zipCode, amount) and groups its output (GroupBy) into one output bucket per reduce task.

Shuffle: each bucket is sent to its reduce task.

Reduce tasks: each Reducer sorts the records it receives by zipCode (Sort) and sums the amounts (SUM).

Reducer: 53705 $65, $30, $15 -> 53705 $110
Reducer: 44313 $10, $25, $55 and 10025 $95, $60 -> 44313 $90, 10025 $155
Reducer: 54235 $75, $22 and 02115 $15, $15 -> 54235 $97, 02115 $30

Done!
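The workflow above can be replayed in a few lines of plain JavaScript. This is a single-process sketch, not Hadoop code: the function names mapSale, shuffle and reduceSales are hypothetical, and the shuffle is simulated with an in-memory object instead of buckets sent between nodes. The twelve records are the ones shown on the slide.

```javascript
// Records from the slide's Sales file: [custId, zipCode, amount in dollars].
var sales = [
  [4, "54235", 75], [7, "10025", 60], [2, "53705", 30], [1, "02115", 15],
  [5, "53705", 65], [0, "54235", 22], [5, "53705", 15], [6, "44313", 10],
  [3, "10025", 95], [8, "44313", 55], [9, "02115", 15], [6, "44313", 25]
];

// Map: drop custId and emit (zipCode, amount).
function mapSale(record) {
  return { key: record[1], value: record[2] };
}

// Shuffle: gather all amounts that share a zipCode.
function shuffle(pairs) {
  var grouped = {};
  pairs.forEach(function (p) {
    (grouped[p.key] = grouped[p.key] || []).push(p.value);
  });
  return grouped;
}

// Reduce: sum the amounts for one zipCode.
function reduceSales(amounts) {
  return amounts.reduce(function (a, b) { return a + b; }, 0);
}

var groupedSales = shuffle(sales.map(mapSale));
var totals = {};
Object.keys(groupedSales).sort().forEach(function (zip) {
  totals[zip] = reduceSales(groupedSales[zip]);
});
console.log(totals);
```

The zip codes are kept as strings so that the leading zero in 02115 survives.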

Hadoop

Hadoop Architecture

Traditional RDBMS vs. MapReduce

             TRADITIONAL RDBMS          MAPREDUCE
Data Size    Gigabytes (Terabytes)      Petabytes (Exabytes)
Access       Interactive and Batch      Batch
Updates      Read / Write many times    Write once, Read many times
Structure    Static Schema              Dynamic Schema
Integrity    High (ACID)                Low
Scaling      Nonlinear                  Linear
DBA Ratio    1:40                       1:3000

Reference: Tom White’s Hadoop: The Definitive Guide

The Hadoop Ecosystem

[Diagram labels: ETL Tools; BI Reporting; RDBMS]

Reference: Tom White’s Hadoop: The Definitive Guide

Microsoft and Hadoop

Hadoop on Azure

[Diagram labels: Name Node; Data Node (x4); HDFS; Azure Blob Storage; S3; SQL Azure; Application end point]

On Premise Enterprise Content
• Transactional DBs
• On Prem logs
• Internal sensors

Cloud Enterprise Content
• Generated in Azure

3rd Party Content
• Azure Datamarket
• Generated/stored elsewhere
• Public content
• Delivered online

What does Hadoop in the Cloud mean?

• Where is HDFS?
• Where is my data stored?
• Azure Blob Storage vs. HDFS

Detailed Offerings

• Hive ODBC Driver & Hive Add-in for Excel
• Integration with Microsoft PowerPivot
• Hadoop based distribution for Windows Server & Azure
• Strategic Partnership with Hortonworks
• JavaScript framework for Hadoop
• RTM of Hadoop connectors for SQL Server and PDW

Demo: Deploying and Interacting With a Hadoop Cluster on Azure

Hadoop on Windows

Insights to all users by activating new types of data
• Integrate with Microsoft Business Intelligence

Differentiation
• Choice of deployment on Windows Server + Windows Azure
• Integrate with Windows Components (AD, System Center)
• Easy installation and configuration of Hadoop on Windows
• Simplified programming with .NET & JavaScript integration
• Integrate with SQL Server Data Warehousing

Summary

• Hadoop is about massive compute and massive data
• The code is brought to the data
• Map -> Split the work
• Reduce -> Combine the results
• Relational databases vs Hadoop? Wrong question - Serve different needs

Resources

http://www.hadooponazure.com/

http://hadoop.apache.org/

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
