Premium community conference on Microsoft technologies @itcampro #itcamp14
Highway to the Information Zone
Solving 3 key challenges of building Big Data Solutions in the Cloud
@andybareweb
Aug 29, 2014
Huge thanks to our sponsors & partners!
Big Data core ethos: distribute the workload to achieve throughput on IO-bound operations
Flat files + Compute = Azure
HDInsight – Hadoop as a Service
• GA (generally available) managed Hadoop 2 on Microsoft Azure
• Familiar tools such as Hive, Pig and Oozie
• Additional best-of-breed (BoB) Microsoft ecosystem tooling with the .NET SDK
• PowerShell and .NET for provisioning
• Execution of Hive jobs with .NET and PowerShell
• Paired with Hortonworks HDP for on-premises Hadoop; compatible with all major Hadoop implementations
• Combined with Excel and the traditional Microsoft BI stack for compelling solutions
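What is Hadoop?
• MapReduce: a simple programming style for efficient distribution
• A cluster topology designed for resilience and efficiency
  – Name Node & Job Tracker
  – Data Node & Task Tracker (repeated on each worker node)
(Job Tracker and Task Tracker are the classic Hadoop 1 MapReduce roles; in Hadoop 2, YARN's ResourceManager and NodeManagers take over job scheduling and execution.)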
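The two halves of the slide can be made concrete with a minimal, single-process Python sketch of the MapReduce programming style. It uses a word count, the usual teaching example, not anything from the talk. All function names are hypothetical. On a real cluster, Hadoop runs the map and reduce steps in parallel on the worker nodes and performs the shuffle between them.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group every value emitted for the same key, ordered by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    # Each split stands in for the block of input one Task Tracker would process;
    # here the "nodes" simply run one after another in a single process.
    mapped = chain.from_iterable(
        map_phase(line) for split in splits for line in split
    )
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

For example, `word_count([["big data", "big cluster"], ["data node"]])` returns `{'big': 2, 'cluster': 1, 'data': 2, 'node': 1}`. Only the mapper and reducer carry job-specific logic. The grouping step in between is what the framework distributes for you.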
Apply innovative expressions of logic over a stored mass of data
Position in the Cloud
Blank Canvas
• Windows Azure subscription
  – Capacity to provision HDInsight
  – Capacity to provision a Storage Account
Challenge 1: Cluster Provisioning
We need somewhere to execute!
• PowerShell / C# / xplat CLI
• All of these give further configuration options, including:
  – Boosting performance by increasing IOPS: stripe data across many Storage Accounts
  – Managing cluster-specific settings: core-site, mapred-site and hdfs-site
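core-site, mapred-site and hdfs-site are Hadoop's XML configuration files. Each one is a list of name/value properties. The Python sketch below renders a dictionary of overrides in that file format, to show what such cluster-specific settings look like. It is not the HDInsight provisioning interface. The property values are illustrative assumptions, not recommendations.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties: dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop *-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative overrides only; real values depend on the workload and VM size.
    print(render_site_xml({"mapreduce.map.memory.mb": "2048"}))  # mapred-site
    print(render_site_xml({"dfs.replication": "3"}))  # hdfs-site
```

Sorting the properties keeps the output stable, which makes generated files easy to diff between cluster builds.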
DEMO: Provision a customised HDInsight cluster via PowerShell
Centralised Resources
HDFS: mount Azure Blob Storage and consume it from Hadoop
Provision → Execute → De-provision
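The data lives in Azure Blob Storage rather than on the cluster, so the cluster itself can be short-lived. Below is a minimal Python sketch of the Provision → Execute → De-provision pattern. `provision` and `deprovision` are hypothetical callables that would wrap whichever tooling you use (PowerShell, C# or the xplat CLI). The only point shown is that de-provisioning still runs when the job fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def transient_cluster(
    provision: Callable[[], str],
    deprovision: Callable[[str], None],
) -> Iterator[str]:
    """Provision a cluster, yield its name, and always de-provision it afterwards."""
    cluster = provision()
    try:
        yield cluster
    finally:
        # Runs on success and on failure, so an idle cluster is never left running.
        deprovision(cluster)


if __name__ == "__main__":
    events: list[str] = []

    def fake_provision() -> str:  # hypothetical stand-in for real provisioning
        events.append("provision")
        return "demo-cluster"

    def fake_deprovision(name: str) -> None:  # hypothetical stand-in for teardown
        events.append(f"deprovision {name}")

    with transient_cluster(fake_provision, fake_deprovision) as name:
        events.append(f"execute on {name}")
    print(events)
```

Running the script prints `['provision', 'execute on demo-cluster', 'deprovision demo-cluster']`.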
Shard Data to boost performance
Shard source data across Azure storage accounts, giving over 5000 IOPS per HDInsight cluster
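One way to shard source files is to place each blob in an account chosen by a stable hash of its name. This spreads the cluster's reads over several accounts' IOPS limits. The sketch below assumes hypothetical account names and uses SHA-256 from `hashlib`. Python's built-in `hash()` is randomised per process for strings, so it would not give a repeatable placement.

```python
import hashlib
from collections import Counter


def assign_shard(blob_name: str, accounts: list[str]) -> str:
    """Pick a storage account for a blob using a stable hash of its name."""
    if not accounts:
        raise ValueError("need at least one storage account")
    digest = hashlib.sha256(blob_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto the account list.
    return accounts[int.from_bytes(digest[:8], "big") % len(accounts)]


# Hypothetical account names, for illustration only.
ACCOUNTS = ["shard0store", "shard1store", "shard2store"]

if __name__ == "__main__":
    files = [f"input/part-{i:05d}.txt" for i in range(300)]
    # Count how many files land in each account.
    print(Counter(assign_shard(name, ACCOUNTS) for name in files))
```

Hashing needs no lookup table, because any process can recompute where a file lives. Round-robin assignment would balance file counts more exactly, but the placement would then have to be recorded somewhere.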
Best practice: isolate logs
Use a dedicated state storage account for logs, created automatically at the same time as the cluster
Challenge 2: Data Ingress
Explanation of WASB
• WASB: Windows Azure Storage Blobs
  – Equivalent to Azure Blob Storage
• Mounted as an HDFS-compatible file system
  – Hadoop can read/write Azure Blobs directly
[Diagram: ANDYC2014]
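Hadoop addresses a blob through a URI of the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`. A path with no scheme, such as `/path/to/file`, resolves against the cluster's default container. The small Python helper below builds such a URI. The account, container and file names used with it are hypothetical.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def wasb_uri(account: str, container: str, path: str, secure: bool = True) -> str:
    """Build a fully qualified WASB URI for a blob path."""
    if not _ACCOUNT_NAME.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    scheme = "wasbs" if secure else "wasb"  # wasbs = access over TLS
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"
```

For instance, `wasb_uri("mystorage", "data", "/input/sample.txt")` gives `wasbs://data@mystorage.blob.core.windows.net/input/sample.txt`. That URI can be passed to `hadoop fs -cat` to read a blob outside the default container, provided the cluster has been given access to that storage account.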
DEMO: File upload to a new WASB location; hadoop fs -cat /path/to/file
In reality you will have a file pipeline; my solution is Cloud Data Sync Agent
Challenge 3: Run a query!
• .NET MapReduce SDK
  – Programmatically express logic
  – Implement three main classes
  – Job execution from a console application
• Hive query language
  – CREATE TABLE myTable LOCATION '/path'
  – SELECT * FROM myTable
  – PowerShell execution
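The Hive route amounts to declaring a table over files that already sit in storage, then querying it. The hypothetical Python helper below only builds the HiveQL text; the slide runs the query through PowerShell. It makes two assumptions. First, it uses `CREATE EXTERNAL TABLE` rather than the slide's plain `CREATE TABLE`, so that dropping the table leaves the underlying blobs in place. Second, it assumes the files are tab-delimited text.

```python
def external_table_ddl(table: str, columns: dict[str, str], location: str) -> str:
    """Return HiveQL declaring an external table over tab-delimited text files."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        # '\\t' puts a literal backslash-t in the HiveQL, which Hive reads as a tab.
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Hypothetical column names; '/path' is the location used on the slide.
    print(external_table_ddl("myTable", {"word": "STRING", "total": "INT"}, "/path"))
    print("SELECT * FROM myTable;")
```

Because the table only points at the location, new files uploaded there appear in the next `SELECT` without reloading anything.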
DEMO: Hive and .NET