Getting your Big Data on with HDInsight
Simon Elliston Ball, Head of Big Data
@sireb
http://bit.ly/GettingHDInsight #gettingHDInsight
May 25, 2015
HDInsight: Hadoop on Azure.
wasb:// (Azure Blob Storage as the Hadoop file system)
YARN (Hadoop resource management)
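On HDInsight, Hadoop jobs read and write Azure Blob Storage through the wasb:// scheme rather than hdfs://. A full address has the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`. The small helper below assembles one; the function name and the container, account and path values are hypothetical placeholders.

```python
def wasb_uri(container: str, account: str, path: str, secure: bool = False) -> str:
    """Build a blob URI of the form
    wasb[s]://<container>@<account>.blob.core.windows.net/<path>."""
    scheme = "wasbs" if secure else "wasb"  # wasbs:// is the TLS variant
    # Paths are relative to the container root, so drop any leading slash.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


print(wasb_uri("mycontainer", "mystorageaccount", "/example/data/sample.log"))
```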
Big Data
What can I do with it?
Data warehousing
Machine Learning
Batch Analytics
ETL
HDInsight (c. 2013)
All grown up
Creating a cluster
Portal
PowerShell
Getting data in
http://www.cerebrata.com/products/azure-explorer/
http://bigdata.red-gate.com/hdfs-explorer
Sqoop up that SQL
Import/Export tool for RDBMS
Command line based
Generates MapReduce jobs
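Because Sqoop generates MapReduce jobs, an import is split across several map tasks, each copying one slice of the source table; by default the slices are ranges over the table's primary key, and Sqoop's `--split-by` and `--num-mappers` options control the column and the task count. The sketch below only illustrates the idea of dividing a numeric key range evenly; it is not Sqoop's own splitter, and the `orders` table and `id` column are hypothetical.

```python
def split_ranges(lo: int, hi: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the inclusive range [lo, hi] of a numeric split column into
    contiguous, non-overlapping slices, one per map task."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be less than lo")
    span = hi - lo + 1
    n = min(num_mappers, span)       # avoid creating empty slices
    base, extra = divmod(span, n)    # spread any remainder over the first slices
    ranges = []
    start = lo
    for i in range(n):
        size = base + (1 if i < extra else 0)
        end = start + size - 1       # inclusive upper bound
        ranges.append((start, end))
        start = end + 1
    return ranges


# Each map task would then import only its own slice of the hypothetical table.
for start, end in split_ranges(1, 100, 4):
    print(f"SELECT * FROM orders WHERE id BETWEEN {start} AND {end}")
```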
Demo!
Sqoop up that SQL
Doing it with PowerShell
Hive: like SQL
SELECT * FROM hivesampletable
Support for window functions
Rollups, aggregates
Limited support for some SQL features
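A rollup computes an aggregate at several levels of a grouping in one query, as in a Hive `GROUP BY country, device WITH ROLLUP` (the column names here are hypothetical). The sketch below shows in plain Python which totals such a query produces; `None` stands in for the NULLs Hive places in rolled-up columns, and the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical sample rows: (country, device, sessions)
rows = [
    ("UK", "phone", 3),
    ("UK", "tablet", 2),
    ("US", "phone", 5),
]


def rollup(rows):
    """Sum sessions at every level of a (country, device) rollup."""
    totals = defaultdict(int)
    for country, device, sessions in rows:
        totals[(country, device)] += sessions  # GROUP BY country, device
        totals[(country, None)] += sessions    # subtotal per country
        totals[(None, None)] += sessions       # grand total
    return dict(totals)


print(rollup(rows))
```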
Hive: like SQL, but…
Works on arbitrary data
Schema on Read
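Schema on read means files are stored as-is and the table definition is only applied when a query reads them, so malformed rows typically surface as NULLs rather than as load failures. The sketch below mimics that behaviour for tab-delimited text; the sample lines and column names are hypothetical, and Hive itself does this through its SerDe layer rather than code like this.

```python
# Raw lines exactly as stored: nothing checked them on the way in.
raw_lines = [
    "2015-05-25\tUK\t3",
    "2015-05-25\tUS\tnot-a-number",
    "2015-05-26\tUK",
]

# Hypothetical table definition, applied only when the data is read.
schema = [("day", str), ("country", str), ("sessions", int)]


def read_with_schema(lines, schema, sep="\t"):
    """Yield one dict per line, casting each field to its declared type."""
    for line in lines:
        fields = line.split(sep)
        record = {}
        for i, (name, cast) in enumerate(schema):
            try:
                record[name] = cast(fields[i])
            except (IndexError, ValueError):
                # Missing or unparseable field: treat it as NULL.
                record[name] = None
        yield record


for record in read_with_schema(raw_lines, schema):
    print(record)
```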
Demo!
Hive
MapReduce
Java based
Simple algorithm
Input: key: value
Map: a:1 a:1 b:1 c:1
Sort / Shuffle: a:1,1 b:1 c:1
Reduce: a:2 b:1 c:1
Output: key: value
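The word-count flow in the diagram can be simulated in a single Python process: map emits (word, 1) pairs, the shuffle groups values by key in key order, and reduce sums each group. The input records are hypothetical, chosen to reproduce the a:2, b:1, c:1 result; on a cluster the phases run in parallel across nodes, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit (word, 1) for every word in every input record."""
    for record in records:
        for word in record.split():
            yield word, 1


def shuffle_phase(pairs):
    """Sort / shuffle: collect all values for each key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups):
    """Reduce: combine each key's values into a single count."""
    return [(key, sum(values)) for key, values in groups]


records = ["a a", "b c"]
print(reduce_phase(shuffle_phase(map_phase(records))))
```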
Streaming Interface
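Hadoop Streaming lets any executable act as mapper or reducer: it reads lines on stdin, writes tab-separated key/value lines to stdout, and the framework sorts the mapper output by key before the reducer sees it. Below is a word-count sketch in that style; the script name and the `map`/`reduce` command-line argument are hypothetical conventions.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Read raw input lines and emit "word<TAB>1" for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Input arrives sorted by key, so equal keys are adjacent and can be summed."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: "wordcount.py map" or "wordcount.py reduce".
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally, `list(reducer(sorted(mapper(lines))))` plays the same role as piping mapper output through `sort` into the reducer; on a cluster the two commands are passed to the hadoop-streaming jar through its `-mapper` and `-reducer` options.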
MapReduce .NET
http://hadoopsdk.codeplex.com/
PM> Install-Package Microsoft.Hadoop.MapReduce
Demo!
MapReduce .NET
Mahout
Machine learning library for Hadoop
Just another Hadoop job
All packaged in a jar
Demo!
Excel and HDInsight
HBase
High performance key-value store
Different cluster type in the portal
Can link to MapReduce and Hive
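HBase addresses each cell by row key and `family:qualifier` column, and keeps rows sorted by row key, which is what makes point lookups and range scans fast. The toy class below models just that data model in memory to show the idea; it is not the HBase client API, it ignores timestamps, versions and distribution across region servers, and the table contents are hypothetical.

```python
from bisect import bisect_left, insort


class ToyTable:
    """In-memory model of HBase-style storage (not the real HBase API)."""

    def __init__(self):
        self._keys = []  # row keys, kept sorted
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        if row not in self._rows:
            insort(self._keys, row)  # keep row keys in sorted order
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row, column):
        return self._rows.get(row, {}).get(column)

    def scan(self, start_row, stop_row):
        """Yield (row key, columns) for start_row <= key < stop_row, in key order."""
        i = bisect_left(self._keys, start_row)
        while i < len(self._keys) and self._keys[i] < stop_row:
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1


table = ToyTable()
table.put("user#002", "info:name", "Bo")
table.put("user#001", "info:name", "Al")
table.put("user#003", "info:name", "Cy")
print(table.get("user#001", "info:name"))
print([key for key, _ in table.scan("user#001", "user#003")])
```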
Quick plug
HDFS Explorer
Hadoop Import/Export
http://bigdata.red-gate.com/
Questions?
Simon Elliston Ball
@sireb
http://bit.ly/GettingHDInsight #gettingHDInsight