Simon Elliston Ball Head of Big Data @sireb Getting your Big Data on with HDInsight http://bit.ly/GettingHDInsight #gettingHDInsight

Getting your Big Data on with HDInsight

May 25, 2015

An introduction to HDInsight and its capabilities, including Azure Storage, Hive, MapReduce, Mahout, and HBase. See also the tools mentioned at http://bigdata.red-gate.com/ and the source code at https://github.com/simonellistonball/GettingYourBigDataOnMapReduce
Transcript
Page 1: Getting your Big Data on with HDInsight

Simon Elliston Ball Head of Big Data

@sireb

Getting your Big Data on with HDInsight

http://bit.ly/GettingHDInsight #gettingHDInsight

Page 2: Getting your Big Data on with HDInsight

HDInsight: Hadoop on Azure.

Page 3: Getting your Big Data on with HDInsight

HDInsight: Hadoop

Page 4: Getting your Big Data on with HDInsight

wasb://

HDInsight: Hadoop on Azure.
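
The wasb:// scheme on this slide is how Hadoop on Azure addresses Blob Storage. As a rough sketch, assuming the commonly documented layout `wasb://<container>@<account>.blob.core.windows.net/<path>`, the pieces of such an address can be pulled apart as below. The container, account, and file names are made up for illustration, and `parse_wasb` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlsplit


def parse_wasb(uri):
    """Split a wasb:// or wasbs:// URI into (container, account, path)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError("not a wasb URI: " + uri)
    # netloc is assumed to look like "<container>@<account>.blob.core.windows.net"
    container, _, host = parts.netloc.partition("@")
    account = host.split(".")[0]
    return container, account, parts.path


# Hypothetical container and account names, for illustration only.
example = "wasb://mycontainer@myaccount.blob.core.windows.net/data/input.csv"
container, account, path = parse_wasb(example)
print(container, account, path)
```

The point of the scheme is that the data lives in the storage account rather than on the cluster, so a path like this can outlive any single cluster.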

Page 5: Getting your Big Data on with HDInsight

wasb://

YARN

HDInsight: Hadoop on Azure.

Page 6: Getting your Big Data on with HDInsight

wasb://

YARN

Page 7: Getting your Big Data on with HDInsight

Big Data

What can I do with it?

Data warehousing

Machine Learning

Batch Analytics

ETL

Page 8: Getting your Big Data on with HDInsight

HDInsight (c. 2013)

Page 9: Getting your Big Data on with HDInsight

All grown up

Page 10: Getting your Big Data on with HDInsight

Creating a cluster

Portal

PowerShell

Page 11: Getting your Big Data on with HDInsight

Getting data in

http://www.cerebrata.com/products/azure-explorer/

http://bigdata.red-gate.com/hdfs-explorer

Page 12: Getting your Big Data on with HDInsight

Sqoop up that SQL

Import/export tool for RDBMS

Command line based

Generates MapReduce jobs

Doing it with PowerShell

Page 13: Getting your Big Data on with HDInsight

Demo!

Sqoop up that SQL

Page 14: Getting your Big Data on with HDInsight

Hive: like SQL

SELECT * FROM hivesampletable

Support for window functions

Rollups, aggregates

Page 15: Getting your Big Data on with HDInsight

Hive: like SQL, but…

Limited support for some SQL features

Works on arbitrary data

Schema on Read
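
"Schema on Read" can be illustrated outside Hive with a small sketch: the raw data is stored untouched, and column names and types are only applied when a query reads it. The column names, sample rows, and the `read_with_schema` helper below are hypothetical. Turning unparseable values into `None` is meant to mirror how Hive typically returns NULL for a field that does not fit the declared type.

```python
import csv
import io

# Raw data lands in storage as-is; nothing validates it on write.
RAW = "2015-05-25\tphone\t3\n2015-05-26\ttablet\tnot-a-number\n"

# The schema travels with the query, not with the data (hypothetical columns).
SCHEMA = [("day", str), ("device", str), ("sessions", int)]


def read_with_schema(raw, schema):
    """Apply column names and types to each row at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw), delimiter="\t"):
        row = {}
        for (name, cast), text in zip(schema, fields):
            try:
                row[name] = cast(text)
            except ValueError:
                row[name] = None  # a value that does not fit the type reads as missing
        rows.append(row)
    return rows


rows = read_with_schema(RAW, SCHEMA)
print(rows)
```

The same raw bytes could be read again later with a different schema, which is what makes this approach workable on arbitrary data.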

Page 16: Getting your Big Data on with HDInsight

Demo!

Hive

Page 17: Getting your Big Data on with HDInsight

MapReduce

Java based

Simple algorithm

Map -> Sort / Shuffle -> Reduce

Map output (key: value): a:1, a:1, b:1, c:1

Sort / Shuffle output (key: values): a:1,1  b:1  c:1

Reduce output (key: value): a:2, b:1, c:1
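
The three stages on this slide can be mimicked in a few lines of plain Python. This is only an in-memory sketch of the idea, not Hadoop's Java API: the function names are invented here, and the input records are chosen so that the intermediate values match the a/b/c example on the slide.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Emit a (key, 1) pair for every token in every record."""
    for record in records:
        for key in record.split():
            yield key, 1


def shuffle_phase(pairs):
    """Sort pairs by key and collect the values for each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    for key, values in grouped:
        yield key, sum(values)


records = ["a a b", "c"]  # hypothetical input split across two records
mapped = list(map_phase(records))        # a:1, a:1, b:1, c:1
shuffled = list(shuffle_phase(mapped))   # a:[1, 1], b:[1], c:[1]
reduced = list(reduce_phase(shuffled))   # a:2, b:1, c:1
print(reduced)
```

On a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the sort and shuffle between them. Only the two user-written functions change from job to job.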

Page 18: Getting your Big Data on with HDInsight

MapReduce .NET

Streaming Interface

http://hadoopsdk.codeplex.com/

PM> Install-Package Microsoft.Hadoop.MapReduce
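
The streaming interface mentioned here lets any executable act as a mapper or reducer: Hadoop feeds it lines on standard input and reads tab-separated key/value lines from its standard output, with the reducer's input arriving sorted by key. The sketch below shows that contract in Python purely for illustration. It is not the Microsoft.Hadoop.MapReduce API, and `run_locally` is a hypothetical stand-in for the framework's sort step.

```python
import io


def mapper(stdin, stdout):
    """Write one 'word<TAB>1' line for each word read from stdin."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


def reducer(stdin, stdout):
    """Sum the counts of consecutive lines that share a key (input is sorted)."""
    current, total = None, 0
    for line in stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                stdout.write(f"{current}\t{total}\n")
            current, total = key, 0
        total += int(value)
    if current is not None:
        stdout.write(f"{current}\t{total}\n")


def run_locally(text):
    """Imitate map -> sort -> reduce on one machine using in-memory streams."""
    mapped = io.StringIO()
    mapper(io.StringIO(text), mapped)
    shuffled = "".join(sorted(mapped.getvalue().splitlines(keepends=True)))
    reduced = io.StringIO()
    reducer(io.StringIO(shuffled), reduced)
    return reduced.getvalue()


result = run_locally("a a b\nc\n")
print(result, end="")
```

In a real streaming job the two functions would be separate scripts reading `sys.stdin` and writing `sys.stdout`. Passing the streams in as arguments just makes the sketch easy to run and check locally.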

Page 19: Getting your Big Data on with HDInsight

Demo!

MapReduce .NET

Page 20: Getting your Big Data on with HDInsight

Mahout

Machine learning library for Hadoop

Just another Hadoop Job

All packaged in a jar

Page 21: Getting your Big Data on with HDInsight

X

Page 22: Getting your Big Data on with HDInsight

Demo!

Excel and HDInsight

Page 23: Getting your Big Data on with HDInsight

HBase

High performance Key-Value store

Different cluster type in the portal

Can link to MapReduce and Hive

Page 24: Getting your Big Data on with HDInsight

Quick plug

HDFS Explorer

Hadoop Import/Export

http://bigdata.red-gate.com/

Page 25: Getting your Big Data on with HDInsight

Questions?

Simon Elliston Ball [email protected]

@sireb

http://bit.ly/GettingHDInsight #gettingHDInsight