Hadoop gets Groovy. Steve Loughran, Hortonworks; stevel at hortonworks.com; @steveloughran. Berlin, June 2012. © Hortonworks Inc. 2012

Nov 12, 2014

Presentation on using Hadoop with the Groovy Language from Berlin Buzzwords 2012
Transcript
Page 1: Hadoop gets Groovy

Hadoop gets Groovy

Steve Loughran, Hortonworks; stevel at hortonworks.com; @steveloughran

Berlin, June 2012

Page 2: Hadoop gets Groovy

[Chart: Hadoop skills vs. Groovy skills, placing Doug and Owen; Arun and Jakob; @steveloughran; James Strachan; Guillaume Laforge]

Where are you in this diagram?

Page 3: Hadoop gets Groovy

Grumpy: Groovy Hadoop Library

• Something lightweight for testing

• Wanted to play in the M/R layer

• Already using Groovy

• Liked: JVM integration, tooling, libraries, IntelliJ IDEA, books…

git@github.com:steveloughran/grumpy.git

Page 4: Hadoop gets Groovy

What is Groovy?

A dynamic language within the JVM

• Java++: maps, lists, tuples, closures

• Flavours of Ruby and Python: 'duck' typing, Grails, (scripting)

A way to do things in the JVM that Sun didn't imagine
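To make the "Java++" bullet concrete, here is a small self-contained Groovy script. It needs no Hadoop. It shows list and map literals, a closure, and duck typing. It also shows list-based multiple assignment, which is the nearest everyday stand-in for tuples. The variable names and the describe() helper are illustrative only.

```groovy
// Literal list (java.util.ArrayList) and map (java.util.LinkedHashMap)
def words = ['hadoop', 'gets', 'groovy']
def counts = [lines: 0, words: 0]
counts.lines = 1                 // map keys can be read and written like properties
counts.words = words.size()

// A closure passed to collect() runs on each element; results form a new list
def shouted = words.collect { it.toUpperCase() }

// Multiple assignment from a list
def (firstWord, otherWords) = [words.head(), words.tail()]

// 'Duck' typing: any argument with a size() method works, no shared interface needed
String describe(thing) {
    "${thing.getClass().simpleName} of size ${thing.size()}"
}
```

The same describe() accepts a String, a List or a Map, because the call to size() is resolved at runtime.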

Page 5: Hadoop gets Groovy

Can use & subclass Java classes:

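import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Mapper

// Emits the pair ("lines", 1) once for every input record
class LineCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    static final Text emitKey = new Text("lines")
    static final IntWritable one = new IntWritable(1)
    void map(LongWritable key, Text value, Mapper.Context context) {
        context.write(emitKey, one)
    }
}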

Page 6: Hadoop gets Groovy

Closures & lists

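import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Reducer

// Typed generics so that reduce() overrides Reducer.reduce; against a raw Reducer
// this signature would only be an overload, and it must return void to override
class CountReducer2 extends Reducer<Text, IntWritable, Text, IntWritable> {
    void reduce(Text k, Iterable<IntWritable> values, Reducer.Context ctx) {
        // Unwrap each IntWritable to an int, then total them
        def sum = values.collect() { it.get() }.sum()
        ctx.write(k, new IntWritable(sum))
    }
}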

Page 7: Hadoop gets Groovy

Closures & lists

values.collect() { it.get() }.sum()

Iterable<IntWritable> -> List<Integer> -> Integer
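This is a Hadoop-free check of the chain from a collection of wrapped values, to a list of ints, to their total. java.util.concurrent.atomic.AtomicInteger stands in for IntWritable here, only because it also exposes its value through get().

```groovy
import java.util.concurrent.atomic.AtomicInteger

// AtomicInteger plays the role of IntWritable: both hand back an int from get()
Iterable<AtomicInteger> values = [new AtomicInteger(2), new AtomicInteger(3), new AtomicInteger(5)]

List<Integer> ints = values.collect() { it.get() }   // Iterable<AtomicInteger> -> List<Integer>
Integer total = ints.sum()                          // List<Integer> -> Integer

// The spread-dot operator is a shorter way to write the same collect()
List<Integer> viaSpread = values*.get()
```

Inside a real reducer, unwrap with get() inside the closure, as the slide does. Hadoop reuses a single value object while iterating the reduce input. Collecting the writables themselves would therefore leave a list of references to one object.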

Page 8: Hadoop gets Groovy

Result: MR jobs in Groovy

In:
gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball
gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop
gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21,Franky Panky
gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23,
gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:24,Vas
gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:32,Fleurball
gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:36,vklaptop
gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:37,Franky Panky
gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:38,Vas
gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:43,
gate1,2afaf990ce75f0a7208f7f012c8d12ad,,2006-10-30,16:06:54,Smiley

Out: 163,198,223 device sightings!
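The slide shows only the raw input. Below is a minimal sketch of turning one line into a typed record in plain Groovy.

- The field names are guesses from the sample: gate, device hash, a third column, date, time, and an optional device name.
- The third column is empty in every sample, so its meaning is unknown.
- SightingRecord and parseSighting are hypothetical names.
- The sketch assumes device names never contain commas.

```groovy
// Hypothetical typed view of one input line; field names are guesses from the sample
// data, and 'unknown' holds the third column, which is empty in every sample
class SightingRecord {
    String gate
    String deviceId
    String unknown
    String date
    String time
    String name       // '' when the record carries no device name
}

SightingRecord parseSighting(String line) {
    // A limit of -1 keeps trailing empty fields, so a missing name still yields 6 fields
    String[] f = line.split(',', -1)
    if (f.length != 6) {
        throw new IllegalArgumentException("expected 6 fields but got ${f.length}: ${line}")
    }
    new SightingRecord(gate: f[0], deviceId: f[1], unknown: f[2],
                       date: f[3], time: f[4], name: f[5])
}
```

The -1 limit matters for the nameless records. A plain split(',') drops the trailing empty field, so those lines would be rejected as malformed.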

Page 9: Hadoop gets Groovy

Why not Pig? Sliding Window Debounce

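// window, emit() and BlueEvent are defined elsewhere in the job (not shown on the slide)
void map(LongWritable key, BlueEvent event, Mapper.Context context) {
    window.insert(event)
    // Emit every event that has dropped out of the window
    List<BlueEvent> expired = window.purgeExpired(event)
    expired.each { evt -> emit(context, evt) }
}

// When the mapper finishes, flush whatever is still in the window
void cleanup(Mapper.Context context) {
    window.each { evt -> emit(context, evt) }
}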
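The slide shows only the mapper side. Below is a minimal sketch, in plain Groovy with no Hadoop dependency, of a window that could sit behind it.

- Sightings of the same device less than windowMillis apart are merged into one visit.
- A visit is emitted once the device has been silent for longer than the window.
- DebounceWindow, Visit, insert(), purgeExpired() and drain() are all hypothetical.
- This version purges before inserting, so a device that returns after a long gap starts a new visit. The slide calls insert() first.
- It assumes events arrive in time order.

```groovy
// Hypothetical record of one debounced visit by a device
class Visit {
    String device
    long firstSeen
    long lastSeen
}

// Hypothetical sliding window: merges repeat sightings of a device that
// arrive within windowMillis of its previous sighting
class DebounceWindow {
    final long windowMillis
    private final Map<String, Visit> open = [:]   // LinkedHashMap keeps arrival order

    DebounceWindow(long windowMillis) { this.windowMillis = windowMillis }

    // Remove and return every open visit whose device has been silent too long
    List<Visit> purgeExpired(long now) {
        List<Visit> expired = open.values().findAll { Visit v -> now - v.lastSeen > windowMillis }.toList()
        expired.each { Visit v -> open.remove(v.device) }
        expired
    }

    // Extend the device's open visit, or start a new one
    void insert(String device, long time) {
        Visit v = open[device]
        if (v != null) {          // explicit null check rather than Groovy truth
            v.lastSeen = time
        } else {
            open[device] = new Visit(device: device, firstSeen: time, lastSeen: time)
        }
    }

    // Flush everything still open, as the mapper's cleanup() does
    List<Visit> drain() {
        List<Visit> rest = new ArrayList<Visit>(open.values())
        open.clear()
        rest
    }
}

// Usage: events are (device, time in ms) pairs, assumed to arrive in time order
def window = new DebounceWindow(60000L)
List<Visit> emitted = []
[['A', 0L], ['A', 20000L], ['B', 30000L], ['A', 200000L]].each { device, time ->
    emitted.addAll(window.purgeExpired(time))
    window.insert(device, time)
}
emitted.addAll(window.drain())
```

Here device A's two early sightings collapse into one visit, and its return at 200 s becomes a second one. This kind of state, carried across records, is hard to express in Pig.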

Page 10: Hadoop gets Groovy

Device sightings by day for 2007

[Chart of daily device sightings in 2007; labels: Dec 15, Aug 27, "Tue-Wed peak days"]

Page 11: Hadoop gets Groovy

Improving Hadoop APIs

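import org.apache.hadoop.conf.Configuration

// conf[k] = v calls putAt(k, v) and conf[k] calls getAt(k); typing key as String
// lets these replace the GDK's default property-lookup versions for Configuration
Configuration.metaClass.putAt = { String key, val -> delegate.set(key, val.toString()) }
Configuration.metaClass.getAt = { String key -> delegate.get(key) }
Configuration.metaClass.add = { Map map ->
    def target = delegate   // inside map.each, 'delegate' would be the outer closure
    map.each { k, v -> target.set(k.toString(), v.toString()) }
}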
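The same idea can be exercised without Hadoop on the classpath. Below is a minimal sketch in which a hypothetical StringConf class, holding only string keys and values, stands in for Configuration. Two facts about Groovy matter here:

- The subscript operator calls getAt(key) for reads and putAt(key, value) for writes.
- Declaring key as String gives the added methods the same signature as the GDK's general-purpose property-lookup getAt/putAt on Object. That is what lets them take precedence for this class.

```groovy
// Hypothetical stand-in for org.apache.hadoop.conf.Configuration: string keys and values only
class StringConf {
    private final Map<String, String> entries = [:]
    void set(String key, String value) { entries.put(key, value) }
    String get(String key) { entries.get(key) }
}

// conf[k] = v compiles to conf.putAt(k, v); conf[k] compiles to conf.getAt(k)
StringConf.metaClass.putAt = { String key, Object val -> delegate.set(key, val.toString()) }
StringConf.metaClass.getAt = { String key -> delegate.get(key) }

// Bulk setter; capture delegate first, because inside the nested closure
// 'delegate' refers to the outer closure rather than the StringConf
StringConf.metaClass.add = { Map map ->
    def target = delegate
    map.each { k, v -> target.set(k.toString(), v.toString()) }
}

def conf = new StringConf()
conf['mapscript'] = 'emit(value, 1)'
conf.add([window: 60000, redscript: 'emit(key, values.size())'])
```

Every value is stored through toString(), so the number 60000 reads back as the string "60000". This mirrors how the Hadoop Configuration keeps everything as strings.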

Page 12: Hadoop gets Groovy

And Configuration gets better

conf['mapscript'] = new File(src).text

String scriptText = conf['mapscript']

conf.add([ window:60000, 'redscript':reduceScript ])

Extending the Job class this way is trickier; subclassing works better

Page 13: Hadoop gets Groovy

New today: script-driven MR jobs!

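// Compile the map script once per task, when the mapper is set up
protected void setup(Mapper.Context ctx) {
    this.ctx = ctx
    this.conf = ctx.configuration
    // ScriptCompiler turns script text into a runnable object (its source is not on the slide)
    ScriptCompiler comp = new ScriptCompiler(conf)
    String scriptText = conf['mapscript']
    map = comp.parse(scriptText, this, ctx)
}

// For each record: bind key and value into the compiled script, then run it
protected void map(Writable key, Writable value, Mapper.Context ctx) {
    map.setProperty('key', key)
    map.setProperty('value', value)
    map.run()
}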
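The slide's ScriptCompiler is not shown. Standard Groovy offers the same compile-once, run-per-record pattern through GroovyShell.parse(), which returns a Script whose variables live in a Binding. Below is a minimal sketch without Hadoop. The script text stands in for conf['mapscript']. The emit closure placed in the binding is a hypothetical substitute for context.write().

```groovy
// Stand-in for the text a job would load from conf['mapscript']:
// emit the device hash (second CSV field) with a count of 1
String mapScriptText = "emit(value.split(',', -1)[1], 1)"

// Hypothetical replacement for context.write(): collect pairs in a list
List<List> emitted = []
Binding mapBinding = new Binding()
mapBinding.setVariable('emit', { k, v -> emitted << [k, v] })

// setup(): compile once per task
Script mapScript = new GroovyShell(mapBinding).parse(mapScriptText)

// map(): rebind key/value and rerun the compiled script for each record
List<String> lines = [
    'gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball',
    'gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop',
    'gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:32,Fleurball'
]
lines.eachWithIndex { String line, int offset ->
    mapScript.setProperty('key', (long) offset)
    mapScript.setProperty('value', line)
    mapScript.run()
}
```

Compiling in setup() rather than map() matters because compiling a script costs far more than running one. Since every record shares one Binding, the script should not rely on variables left over from a previous record.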

Page 14: Hadoop gets Groovy

Things to consider

• Performance: Groovy 2 on Java 7

• 'False friends': types, if(), exceptions

• If you can use Pig, use it.

• Use Groovy for testing and for extending Hadoop classes (output formatters, etc.)

• Play with YARN and Giraph with it
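One reading of the "false friends" bullet: Groovy code that looks like Java but behaves differently. The assertions below are plain Groovy. readConfig() is a hypothetical helper.

```groovy
// if(): Groovy truth treats empty strings, zero, empty collections and null as false
assert !''
assert !0
assert ![]
assert !null
assert 'false'            // any non-empty string is true, even "false"

// Types: == calls equals() (not identity), and / on integers yields a BigDecimal
String a = new String('x')
String b = new String('x')
assert a == b             // value equality
assert !a.is(b)           // is() is what Java's == means
assert 1 / 2 == 0.5
assert (1 / 2) instanceof BigDecimal
assert 1.intdiv(2) == 0   // Java-style integer division

// Exceptions: checked exceptions need no throws clause and no catch
def readConfig() { throw new IOException('missing file') }
def caught = null
try {
    readConfig()
} catch (IOException e) {
    caught = e
}
assert caught?.message == 'missing file'
```

Each of these compiles silently when Java habits are carried over, which is why they are easy to miss.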

Page 15: Hadoop gets Groovy

Questions?

hortonworks.com

Page 17: Hadoop gets Groovy

Performance?

• Groovy 1 over-introspects

• A high-level language hides a lot of overhead

• If your work is I/O bound, this matters less

• Speed of development vs. speed of execution

• Need to benchmark on Java 7
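Groovy 2.0 added the @CompileStatic annotation. It compiles a class with direct method calls instead of routing each call through the metaclass, which is one way to cut per-call introspection. Below is a minimal sketch, assuming Groovy 2.0 or later. SumKernel is a hypothetical name. Whether static compilation helps a particular job is exactly what the benchmarking bullet is about.

```groovy
import groovy.transform.CompileStatic

// Hypothetical helper that totals counts the way a reducer would.
// @CompileStatic type-checks this class at compile time and generates
// direct method calls instead of dynamic metaclass dispatch.
@CompileStatic
class SumKernel {
    static long total(Iterable<Integer> counts) {
        long sum = 0L
        for (Integer c in counts) {
            sum += c
        }
        sum
    }
}
```

One trade-off: statically compiled code does not see methods added at runtime through metaClass. A class like this therefore cannot use metaClass extensions such as subscript access on Configuration.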
