© 2015 DataTorrent Akshay Gore, Bhupesh Chawda DataTorrent Apex Hands-on Lab - Into the code! Getting started with your first Apex Application!

Writing an Apache Apex Application

Jan 08, 2017

Page 1: Writing an Apache Apex Application

© 2015 DataTorrent

Akshay Gore, Bhupesh Chawda
DataTorrent

Apex Hands-on Lab - Into the code!
Getting started with your first Apex Application!

Page 2: Writing an Apache Apex Application

Operators
• Input Adapter vs. Generic Operators?
• What are streams?
• What are ports?

Page 3: Writing an Apache Apex Application

Apex Operator Lifecycle
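
The lifecycle diagram for this slide did not survive in the transcript. As a rough reminder of the order in which an operator's callbacks are invoked, here is a minimal plain-Java sketch. It has no Apex dependency: `TracingOperator` and `LifecycleSketch` are hypothetical names, and the driver loop only stands in for the engine. In the real API, `setup` additionally receives a context object, and input operators have further callbacks that are not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an operator; it only records the order of callbacks.
class TracingOperator {
    final List<String> trace = new ArrayList<>();

    void setup() { trace.add("setup"); }
    void beginWindow(long windowId) { trace.add("beginWindow(" + windowId + ")"); }
    void process(String tuple) { trace.add("process(" + tuple + ")"); }
    void endWindow() { trace.add("endWindow"); }
    void teardown() { trace.add("teardown"); }
}

// Hypothetical driver that plays the role of the engine for this sketch.
class LifecycleSketch {
    static List<String> run(List<List<String>> windows) {
        TracingOperator op = new TracingOperator();
        op.setup();                          // once, before any window
        long windowId = 0;
        for (List<String> tuples : windows) {
            op.beginWindow(windowId++);      // start of each streaming window
            for (String tuple : tuples) {
                op.process(tuple);           // once per incoming tuple
            }
            op.endWindow();                  // end of each streaming window
        }
        op.teardown();                       // once, after the last window
        return op.trace;
    }

    public static void main(String[] args) {
        List<List<String>> windows = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c"));
        for (String step : run(windows)) {
            System.out.println(step);
        }
    }
}
```

The point to take into the assignment is that `beginWindow` and `endWindow` bracket every batch of tuples, which is what makes per-window counting possible.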

Page 4: Writing an Apache Apex Application

Apex Streaming Application

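import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;

public class Application implements StreamingApplication {
  @Override
  public void populateDAG(DAG dag, Configuration conf) {
    // Add operators to the DAG: dag.addOperator(args)
    // Add streams between operators: dag.addStream(args)
    // Additional configuration and hints to YARN (optional)
  }
}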

Page 5: Writing an Apache Apex Application

Apex Application - FilterWords

• Problem statement - Filter words in a file
  ᵒ Read a file located on HDFS
  ᵒ Split each line into words, check that each word is not one of the forbidden words, and write it to HDFS

Diagram: HDFS → Lines → Apex Application DAG → Filtered Words → HDFS
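
To make the problem statement concrete, the sketch below shows the two processing steps (split a line into words, drop forbidden words) as plain Java methods, without HDFS or Apex. The class name `FilterWordsSketch`, the forbidden-word list, whitespace splitting, and case-insensitive matching are all assumptions made for illustration; the lab's own operators may differ on each of these.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class FilterWordsSketch {
    // Hypothetical forbidden-word list; the lab's actual list is not given in the slides.
    static final Set<String> FORBIDDEN = new HashSet<>(Arrays.asList("foo", "bar"));

    // Tokenize step: split one line into words on whitespace (an assumption).
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Filter step: keep only the words that are not forbidden (case-insensitive).
    static List<String> filter(List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!FORBIDDEN.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [is, not, but, baz]
        System.out.println(filter(tokenize("foo is not Bar but baz")));
    }
}
```

In the Apex application each of these steps becomes its own operator, so that lines and words flow between them as tuples on streams.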

Page 6: Writing an Apache Apex Application

FilterWords Application DAG

Diagram: HDFS → Reader → (Lines) → Tokenize → (Words) → Processor → (Filtered Words) → Writer → HDFS

• Reader: Input Operator (Adapter)
• Tokenize, Processor: Generic Operators
• Writer: Output Operator (Adapter)

Page 7: Writing an Apache Apex Application

Prerequisites
• Java 1.7 or above
• Maven 3.0 or above
• Apache Apex projects:
  ᵒ Apache Apex Core: core platform, engine
  ᵒ Apache Apex Malhar: operators library
• Hadoop cluster in running state
• Your favourite IDE - Eclipse / vi

Page 8: Writing an Apache Apex Application

Demo time!
• Apex application structure
• Application code walkthrough
• How to execute the application
• Assignment

Page 9: Writing an Apache Apex Application

Assignment - WordCount

• Problem statement - Count occurrences of words in a file
  ᵒ Read a file located on HDFS
  ᵒ Emit the counts at the end of every window and write them to HDFS

Diagram: HDFS → Lines → Apex Application DAG → <Word, Count> → HDFS

Page 10: Writing an Apache Apex Application

Assignment - Word Count Application DAG

Diagram: HDFS → Reader → (Lines) → Tokenize → (Words) → Counter → (<Word, Count>) → Output → HDFS

Page 11: Writing an Apache Apex Application

Assignment - What you need to do

Diagram (existing): Reader → (String: Line) → Tokenizer → (String: Words) → Processor → (String: Words’) → Writer

Diagram (assignment): Counter → (Map: {Word: Count}) → Writer

Page 12: Writing an Apache Apex Application

Assignment - Hints
• Create a copy of Processor.java and name it Counter.java
• Modify Counter.java as follows:
  ᵒ Define a data structure that can hold the counts for words
  ᵒ The process method of the input port must count the occurrences
  ᵒ Clear the counts in the beginWindow() call
  ᵒ Emit the counts in the endWindow() call

Page 13: Writing an Apache Apex Application

Solution - Changes to Counter.java
• Define a data structure that can hold the counts for words

    private HashMap<String, Integer> counts = new HashMap<>();

• The process method of the input port must count the occurrences

    if (counts.containsKey(refinedWord)) {
      counts.put(refinedWord, counts.get(refinedWord) + 1);
    } else {
      counts.put(refinedWord, 1);
    }

• Clear the counts in the beginWindow call

    counts.clear();

• Emit the counts in the endWindow call

    output.emit(counts.toString());

• Run the Application Test
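
Put together, the changes on this slide amount to the following behaviour. This is a plain-Java model rather than the actual Counter.java: `WindowedCounterSketch` is a hypothetical class, the `emitted` list stands in for the operator's output port, and the method names simply mirror the callbacks named on the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the per-window Counter; `emitted` stands in for the output port.
class WindowedCounterSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    final List<String> emitted = new ArrayList<>();

    // Start of a window: forget the counts of the previous window.
    void beginWindow(long windowId) {
        counts.clear();
    }

    // Called once per incoming word: increment its count.
    void process(String refinedWord) {
        Integer current = counts.get(refinedWord);
        counts.put(refinedWord, current == null ? 1 : current + 1);
    }

    // End of a window: emit the counts as a String, as on the slide.
    void endWindow() {
        emitted.add(counts.toString());
    }

    public static void main(String[] args) {
        WindowedCounterSketch counter = new WindowedCounterSketch();
        counter.beginWindow(0L);
        counter.process("apex");
        counter.process("apex");
        counter.endWindow();
        counter.beginWindow(1L);
        counter.process("apex");
        counter.endWindow();
        // prints [{apex=2}, {apex=1}]
        System.out.println(counter.emitted);
    }
}
```

Because the map is cleared in `beginWindow`, the second window reports 1 rather than 3. Note also that a `HashMap` does not guarantee any ordering of words in its `toString()` output.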

Page 14: Writing an Apache Apex Application

Assignment - Are we done yet?
• Change the DAG
  ᵒ Replace the Processor operator with the newly created operator - Counter

Page 15: Writing an Apache Apex Application

Assignment - Slight change
• We are counting in a Map, but it is still emitted as a String.
  ᵒ Change the type of the output port of Counter to Map
  ᵒ Change the type of the input port of Writer to Map
  ᵒ Make appropriate changes to Writer so that it reads a Map and writes it in a format where each line belongs to a single word
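
One possible way to turn a Map of counts into one line per word is sketched below. The slides do not prescribe an output format, so the `word: count` layout, the alphabetical ordering, and the class name `MapLineFormatSketch` are assumptions; how the resulting text is handed to the lab's Writer is also not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapLineFormatSketch {
    // Turn a map of counts into text with one "word: count" line per entry.
    static String toLines(Map<String, Integer> counts) {
        // Copy into a TreeMap for a stable alphabetical order (a choice of this sketch).
        Map<String, Integer> sorted = new TreeMap<String, Integer>(counts);
        StringBuilder lines = new StringBuilder();
        for (Map.Entry<String, Integer> entry : sorted.entrySet()) {
            lines.append(entry.getKey())
                 .append(": ")
                 .append(entry.getValue())
                 .append('\n');
        }
        return lines.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("streams", 1);
        counts.put("apex", 2);
        // prints "apex: 2" and "streams: 1" on separate lines
        System.out.print(toLines(counts));
    }
}
```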

Page 16: Writing an Apache Apex Application

Assignment - Final change
• Change the code so that each count is the overall count, not just the count for each window
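
One way to approach this, shown here as a plain-Java model rather than as the lab's official solution, is to stop clearing the map at the start of each window so that counts accumulate across windows. `CumulativeCounterSketch` is a hypothetical class, and the `emitted` list again stands in for an output port of type Map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a Counter whose counts span all windows seen so far.
class CumulativeCounterSketch {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();
    final List<Map<String, Integer>> emitted = new ArrayList<Map<String, Integer>>();

    // Deliberately does not clear the map, so earlier windows keep contributing.
    void beginWindow(long windowId) {
    }

    // Called once per incoming word: increment its running total.
    void process(String word) {
        Integer current = counts.get(word);
        counts.put(word, current == null ? 1 : current + 1);
    }

    // Emit a copy so that later updates do not alter what was already emitted.
    void endWindow() {
        emitted.add(new HashMap<String, Integer>(counts));
    }

    public static void main(String[] args) {
        CumulativeCounterSketch counter = new CumulativeCounterSketch();
        counter.beginWindow(0L);
        counter.process("apex");
        counter.endWindow();
        counter.beginWindow(1L);
        counter.process("apex");
        counter.endWindow();
        // prints [{apex=1}, {apex=2}]
        System.out.println(counter.emitted);
    }
}
```

The trade-off is that the map now grows with the number of distinct words seen over the whole run instead of being bounded by a single window.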

Page 17: Writing an Apache Apex Application

Summary - Recap
• Writing Apache Apex operators
• Chaining the operators into an Apache Apex application
• Executing the application on the Apache Apex platform

Page 18: Writing an Apache Apex Application

Where to go from here?
Apache Apex Documentation - http://apex.incubator.apache.org/docs.html
Apache Apex Core Git - https://github.com/apache/incubator-apex-core
Apache Apex Malhar Git - https://github.com/apache/incubator-apex-malhar

Join Users Mailing List - [email protected]
Join Dev Mailing List - [email protected]

Send queries to Users Mailing List - [email protected]
Send queries to Dev Mailing List - [email protected]

Page 19: Writing an Apache Apex Application

Thank You