Page 1: Hdp r-google charttools-webinar-3-5-2013 (2)

© Hortonworks Inc. 2013

Quick Housekeeping Rules

• A Q&A panel is available if you have any questions during the webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and recording

Page 2: Hdp r-google charttools-webinar-3-5-2013 (2)

Hadoop, R, and Google Chart Tools

Data Visualization for Application Developers

Jeff Markham

Solution Engineer

[email protected]

Page 3: Hdp r-google charttools-webinar-3-5-2013 (2)

Agenda

• Introductions
• Use Case Description
• Preparation
• Demo
• Review
• Q & A

Page 4: Hdp r-google charttools-webinar-3-5-2013 (2)

Use Case Description

• Visualizing data
• Tools vs. application development
• Choosing the technology
  – Hortonworks Data Platform
  – RHadoop
  – Google Charts

Page 5: Hdp r-google charttools-webinar-3-5-2013 (2)

Preparation: Install HDP

Hortonworks Data Platform (HDP): Enterprise Hadoop

• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability

[Diagram: Hortonworks Data Platform (HDP)]

• Hadoop Core: Distributed Storage & Processing (HDFS, WebHDFS, MapReduce, YARN in 2.0)
• Platform Services: Enterprise Readiness (HA, DR, Snapshots, Security, …)
• Data Services: Store, Process and Access Data (HCatalog, Hive, Pig, HBase, Sqoop, Flume)
• Operational Services: Manage & Operate at Scale (Oozie, Ambari)
• Deployment options: OS, Cloud, VM, Appliance

Page 6: Hdp r-google charttools-webinar-3-5-2013 (2)

Preparation: Install R

• Install R language
• Install appropriate packages
  – rhdfs
  – rmr2
  – googleVis
  – shiny
  – Dependencies for all above
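Before moving on, it can help to confirm which of the four packages are already present. The following is a minimal sketch in base R, not part of the original slides; the variable names are illustrative, and it only reports what is missing rather than installing anything, since the slides do not say where each package should be obtained.

```r
# Packages named on the slide.
required <- c("rhdfs", "rmr2", "googleVis", "shiny")

# Names of all packages visible in the current library paths.
installed <- rownames(installed.packages())

# Anything on the slide's list that is not installed yet.
missing_pkgs <- setdiff(required, installed)

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```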

Page 7: Hdp r-google charttools-webinar-3-5-2013 (2)

Preparation

• rmr2
  – Functions to allow for MapReduce in R apps
• rhdfs
  – Functions allowing HDFS access in R apps
• googleVis
  – Use of Google Chart Tools in R apps
• shiny
  – Interactive web apps for R developers

Page 8: Hdp r-google charttools-webinar-3-5-2013 (2)

Demo Walkthrough

Using Hadoop, R, and Google Chart Tools

Page 9: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Data from CDC
  – Vital statistics publicly available data
  – 2010 US birth data file

Sample record:

S 201001 7 2 2 30105 2 011 06 1 123 3405 1 06 01 2 2 0321 1006 314 2000 2 222 2 2 2 2 2 122222 11 3 094 1 M 04 200940 39072 3941 083 22 2 2 2 2 110 110 00 0000000 00 000000000 000000 000 000000000000000000011 101 1 111 1 0 1 1 1 111111 11 1 1 1 1

source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm

Page 10: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Put data into HDFS
  – Create input directory
  – Put data into input directory

Create HDFS dir:

> hadoop fs -mkdir /user/jeff/natality

Put data into HDFS:

> hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/

Page 11: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Write R script
  – Specify use of RHadoop packages
  – Initialize HDFS
  – Specify data input and output location

R script:

#!/usr/bin/env Rscript

require('rmr2')
require('rhdfs')
hdfs.init()

hdfs.data.root = 'natality'
hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
hdfs.out.root = hdfs.data.root
hdfs.out = file.path(hdfs.out.root, 'out')

. . .
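The input and output locations in the script are relative paths built with `file.path()`. The sketch below, which runs in base R without Hadoop, shows the strings the script ends up with. It assumes the script runs as the same user who created `/user/jeff/natality`, since HDFS resolves relative paths against the user's home directory.

```r
# Same path construction as the slide's script, runnable without Hadoop.
hdfs.data.root <- "natality"
hdfs.data <- file.path(hdfs.data.root, "VS2010NATL.DETAILUS.DAT")
hdfs.out <- file.path(hdfs.data.root, "out")

print(hdfs.data)  # "natality/VS2010NATL.DETAILUS.DAT"
print(hdfs.out)   # "natality/out"
```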

Page 12: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Write R script
  – Write mapper function
  – Write reducer function

R script:

. . .

mapper = function(k, fields) {
  keyval(as.integer(substr(fields, 89, 90)), 1)
}

reducer = function(key, vv) {
  # count values for each key
  keyval(key, sum(as.numeric(vv), na.rm=TRUE))
}

. . .
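The mapper and reducer together amount to a count of records grouped by the two-character field at columns 89-90 (the value the later chart labels as the mother's age). The following is a minimal sketch of that logic in base R only, with no Hadoop or rmr2 involved. The records are hypothetical filler built for illustration, not real CDC data, and `make_record` is a made-up helper.

```r
# Hypothetical fixed-width records: 88 filler characters, a two-digit value
# in columns 89-90, then more filler. This is NOT real CDC data.
make_record <- function(age) {
  paste0(strrep("x", 88), sprintf("%02d", age), strrep("x", 10))
}
records <- vapply(c(24, 31, 24, 19, 31, 24), make_record, character(1))

# Map step: one (key, 1) pair per record, mirroring the slide's mapper.
ages <- as.integer(substr(records, 89, 90))
ones <- rep(1, length(ages))

# Reduce step: sum the 1s for each distinct key, mirroring the slide's reducer.
counts <- tapply(ones, ages, sum, na.rm = TRUE)
print(counts)
```

Trying the extraction on a handful of lines like this is a cheap way to check the column offsets before submitting a full job.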

Page 13: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Write R script
  – Write job function

R script:

. . .

job = function(input, output) {
  mapreduce(input = input,
            output = output,
            input.format = "text",
            map = mapper,
            reduce = reducer,
            combine = TRUE)
}

. . .
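In the job function, the `combine` argument asks rmr2 to reuse the reducer as a combiner, so partial sums are computed before the final reduce. This is safe here because addition gives the same total however the values are grouped. The base R sketch below illustrates the idea with hypothetical values for one key split across two map tasks.

```r
# Hypothetical values for a single key, emitted by two separate map tasks.
task_a <- c(1, 1, 1)
task_b <- c(1, 1)

# Without a combiner: the reducer sums every raw value.
direct <- sum(c(task_a, task_b))

# With a combiner: each task pre-sums locally, then the reducer
# sums the partial sums.
combined <- sum(c(sum(task_a), sum(task_b)))

# Both routes give the same count.
stopifnot(direct == combined)
```

A reducer that computed, say, a mean would not have this property and could not be reused as a combiner unchanged.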

Page 14: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Write R script
  – Write result to HDFS output directory

R script:

. . .

out = from.dfs(job(hdfs.data, hdfs.out))
results.df = as.data.frame(out, stringsAsFactors=FALSE)
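To see what the conversion to a data frame produces, the sketch below uses a hypothetical stand-in for the `from.dfs()` result. It assumes the result is a list with `key` and `val` components, and the values are made up. The sort step is an addition for readability and is not in the original script.

```r
# Hypothetical stand-in for the from.dfs() result (assumed key/val list).
out <- list(key = c(31L, 19L, 24L), val = c(2, 1, 3))

# Same conversion as the slide's script.
results.df <- as.data.frame(out, stringsAsFactors = FALSE)

# Optional: order rows by key so a chart's x-axis runs in ascending order.
results.df <- results.df[order(results.df$key), ]
print(results.df)
```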

Page 15: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Create Shiny application
  – Create directory
  – Create ui.R
  – Create server.R

Shiny app dir:

> mkdir ~/my-shiny-app

Page 16: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Create Shiny application
  – Create ui.R

ui.R source:

shinyUI(pageWithSidebar(

  # Application title
  headerPanel("2010 US Births"),

  sidebarPanel(. . .),

  mainPanel(
    tabsetPanel(
      tabPanel("Line Chart", htmlOutput("lineChart")),
      tabPanel("Column Chart", htmlOutput("columnChart"))
    )
  )
))

Page 17: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Create Shiny application
  – Create server.R

server.R source:

library(googleVis)
library(shiny)
library(rmr2)
library(rhdfs)

hdfs.init()

hdfs.data.root = 'natality'
hdfs.data = file.path(hdfs.data.root, 'out')
df = as.data.frame(from.dfs(hdfs.data))

. . .

Page 18: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Create Shiny application
  – Create server.R

server.R source:

. . .

shinyServer(function(input, output) {

  output$lineChart <- renderGvis({
    gvisLineChart(df, options=list(
      vAxis="{title:'Number of Births'}",
      hAxis="{title:'Age of Mother'}",
      legend="none"
    ))
  })

  . . .

Page 19: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Run Shiny application

Run Shiny app:

> shiny::runApp('~/my-shiny-app')
Loading required package: shiny

Welcome to googleVis version 0.4.0

. . .

HADOOP_CMD=/usr/bin/hadoop

Be sure to run hdfs.init()

Listening on port 8100

Page 20: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• View Shiny application

Page 21: Hdp r-google charttools-webinar-3-5-2013 (2)

Demo Live

Using Hadoop, R, and Google Chart Tools

Page 22: Hdp r-google charttools-webinar-3-5-2013 (2)

Visualization Use Case

• Architecture recap
  – Analyze data sets with R on Hadoop
  – Choose RHadoop packages
  – Visualize data with Google Chart Tools via googleVis package
  – Render googleVis output in Shiny applications

• Architecture next steps
  – Integrate Shiny application into existing web apps
  – Create further data models with R

Page 23: Hdp r-google charttools-webinar-3-5-2013 (2)

HDP: Enterprise Hadoop Distribution

Hortonworks Data Platform (HDP): Enterprise Hadoop

• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability

[Diagram: Hortonworks Data Platform (HDP) components, as on the "Preparation: Install HDP" slide]

Page 24: Hdp r-google charttools-webinar-3-5-2013 (2)

HDP Sandbox

Page 25: Hdp r-google charttools-webinar-3-5-2013 (2)

Thank You!

Jeff Markham
Solution Engineer

[email protected]