Highway to the Information Zone (Andy Cross)

Aug 29, 2014

ITCamp



With major vendors working hard to ease the provisioning of Hadoop, resulting in many Hadoop as a Service (HaaS) offerings, what is the challenge domain in 2014 for Big Data engineers? If HaaS is a highway, where does it lead and how do you travel on it?

In this fast-paced, L300 hands-on session, Andy will demonstrate Hadoop in practice using Microsoft's Cloud technologies, building a system from scratch to ingest information into HDInsight, then query and report on that information.

This session presumes prior knowledge of Map Reduce technologies, Hadoop, HDFS and HCAT.
Transcript
Page 1: Highway to the Information Zone (Andy Cross)

Premium community conference on Microsoft technologies @itcampro #itcamp14

Highway to the Information Zone

Solving 3 key challenges of building Big Data Solutions in the Cloud

@andybareweb

Page 2: Highway to the Information Zone (Andy Cross)

Huge thanks to our sponsors & partners!

Page 3: Highway to the Information Zone (Andy Cross)

Big Data core ethos: distribute the workload to achieve throughput on I/O-bound operations

Flat files + Compute = Azure

Page 4: Highway to the Information Zone (Andy Cross)

HDInsight – Hadoop as a Service

• GA managed Hadoop 2
• Hadoop on Microsoft Azure
• Familiar tools such as Hive, Pig, Oozie
• Additional BoB Microsoft ecosystem tooling with .NET SDK

• PowerShell and .NET for provision
• Execution with .NET and PowerShell for Hive

• Paired with Hortonworks HDP for on-premises Hadoop; compatible with all major Hadoop implementations
• Combined with Excel and traditional Microsoft BI stack for compelling solutions

Page 5: Highway to the Information Zone (Andy Cross)

What is Hadoop?

• MAP REDUCE: simple programming style for efficient distribution
• A cluster topology designed for resilience and efficiency
– Name Node & Job Tracker
– Data Node & Task Tracker (repeated for each worker node)
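
The map and reduce roles named on this slide can be illustrated with a small in-process sketch. This is not Hadoop or the .NET SDK; it is plain Python that mimics the three stages (map, shuffle, reduce) on a word count, and every function name here is an assumption made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

On a real cluster each mapper would run on the node holding its slice of the data, which is what makes the style suit I/O-bound work; here everything runs in one process purely to show the shape of the logic.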

Page 6: Highway to the Information Zone (Andy Cross)

Apply innovative expressions of logic over a stored mass of data

Page 7: Highway to the Information Zone (Andy Cross)

Position in Cloud

Page 8: Highway to the Information Zone (Andy Cross)

Blank Canvas

• Windows Azure Subscription
– Capacity to provision HDInsight
– Capacity to provision Storage Account

Page 9: Highway to the Information Zone (Andy Cross)

Challenge 1: Cluster Provision

Page 10: Highway to the Information Zone (Andy Cross)

We need somewhere to Execute!

• PowerShell / C# / xplat CLI
• All these give further configuration options, including:
– Boost performance by increasing IOPS: stripe data across many Storage Accounts
– Manage cluster-specific features: core-site, mapred-site and hdfs-site

Page 11: Highway to the Information Zone (Andy Cross)

DEMO: Provision a customised HDInsight cluster via PowerShell

Page 12: Highway to the Information Zone (Andy Cross)

Centralised Resources

Page 13: Highway to the Information Zone (Andy Cross)

HDFS

Mount Azure Blob Storage; consume from Hadoop

Page 14: Highway to the Information Zone (Andy Cross)

Provision

Execute

De-provision
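
The provision, execute, de-provision cycle on this slide could be wrapped so that tear-down always happens, even when a job fails. The sketch below assumes two caller-supplied functions, `provision` and `deprovision`, which are hypothetical stand-ins for whatever PowerShell, .NET or CLI calls actually create and delete the cluster; it does not call any Azure API itself.

```python
from contextlib import contextmanager


@contextmanager
def transient_cluster(name, provision, deprovision):
    """Provision a cluster, hand it to the caller, then always tear it down.

    `provision` and `deprovision` are hypothetical callables supplied by the
    caller; this helper only fixes the order in which they run.
    """
    cluster = provision(name)
    try:
        yield cluster          # the "Execute" step happens in the with-body
    finally:
        deprovision(cluster)   # runs even if the job raised an exception
```

A caller would write `with transient_cluster("demo", create_fn, delete_fn) as cluster:` and run jobs inside the block. Because the data lives in blob storage rather than on the cluster, deleting the cluster afterwards should not lose the inputs or results.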

Page 15: Highway to the Information Zone (Andy Cross)

Shard Data to boost performance

Shard source data across Azure storage accounts, giving over 5000 IOPS per HDInsight cluster
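
One way to spread source files across several storage accounts is to hash each file name and use the hash to pick an account. This is a minimal sketch of that idea, not the method shown in the talk; the account names used by a caller and the choice of SHA-256 are assumptions.

```python
import hashlib


def pick_account(blob_name, accounts):
    """Deterministically map a blob name to one of the storage accounts."""
    digest = hashlib.sha256(blob_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(accounts)
    return accounts[index]


def shard(blob_names, accounts):
    """Return {account: [blob names]} covering every input name exactly once."""
    layout = {account: [] for account in accounts}
    for name in blob_names:
        layout[pick_account(name, accounts)].append(name)
    return layout
```

Hashing keeps the assignment stable between runs, so the same file always lands in the same account, while a large set of names spreads roughly evenly and lets reads draw on the IOPS budget of every account at once.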

Page 16: Highway to the Information Zone (Andy Cross)

Isolate logs best practice

Use a state storage account for logs, created automatically at the same time as the cluster

Page 17: Highway to the Information Zone (Andy Cross)

Challenge 2: Data Ingress

Page 18: Highway to the Information Zone (Andy Cross)

Explanation of WASB

• Windows Azure Storage Blobs
– Equivalent to Azure Blob Storage
• Mounted as HDFS compatible file system
– Hadoop can read/write directly with Azure Blobs

ANDYC2014
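
Hadoop addresses blobs through a WASB URI of the form `wasb://<container>@<account>.blob.core.windows.net/<path>`, with `wasbs` as the TLS variant. The helper below is a small sketch that assembles such a URI; the function name and the container, account and path values a caller passes in are illustrative assumptions.

```python
def wasb_uri(container, account, path, secure=False):
    """Build a WASB URI that Hadoop tools can use in place of an HDFS path."""
    scheme = "wasbs" if secure else "wasb"
    host = f"{account}.blob.core.windows.net"
    # Strip leading slashes so the URI has exactly one between host and path.
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"
```

For example, `wasb_uri("data", "myaccount", "/logs/a.txt")` yields `wasb://data@myaccount.blob.core.windows.net/logs/a.txt`, which can then be passed to commands such as `hadoop fs -cat`.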

Page 19: Highway to the Information Zone (Andy Cross)

DEMO: File upload to new WASB location; hadoop fs -cat /path/to/file

Page 20: Highway to the Information Zone (Andy Cross)

In reality you will have a file pipeline; my solution is Cloud Data Sync Agent

Page 21: Highway to the Information Zone (Andy Cross)

Challenge 3: Run a query!

Page 22: Highway to the Information Zone (Andy Cross)

• .NET Map Reduce SDK
• Programmatically express logic
• Implement three main classes
• Job execution from a console application

• Hive query language
• Create Table myTable location '/path'
• Select * from myTable
• PowerShell execution
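
The `Create Table ... location` line above is slide shorthand; a statement that Hive will accept also needs column definitions and a row format. The sketch below builds one plausible full statement as a string. The column list, the comma delimiter, and the use of `EXTERNAL` (so that dropping the table leaves the files in place) are assumptions added for illustration and are not taken from the talk.

```python
def create_table_hql(table, columns, location, delimiter=","):
    """Return a Hive DDL string for a delimited text table over `location`.

    `columns` is a list of (name, hive_type) pairs, e.g. [("id", "INT")].
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"STORED AS TEXTFILE LOCATION '{location}'"
    )
```

The resulting string could be submitted through whichever execution route is in use (PowerShell, .NET or the Hive command line), followed by a query such as `SELECT * FROM myTable`.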

Page 23: Highway to the Information Zone (Andy Cross)

DEMO: Hive and .NET
