Feb 28, 2019

Spring for Apache Hadoop - Reference Documentation

2.4.0.RELEASE

Costin Leau (Elasticsearch), Thomas Risberg (Pivotal), Janne Valkealahti (Pivotal)

Copyright 2011-2016 Pivotal Software, Inc.

Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.

Table of Contents

Preface
I. Introduction
  1. Requirements
  2. Additional Resources
II. Spring and Hadoop
  3. Hadoop Configuration
    3.1. Using the Spring for Apache Hadoop Namespace
    3.2. Using the Spring for Apache Hadoop JavaConfig
    3.3. Configuring Hadoop
    3.4. Boot Support
      spring.hadoop configuration properties
      spring.hadoop.fsshell configuration properties
  4. MapReduce and Distributed Cache
    4.1. Creating a Hadoop Job
      Creating a Hadoop Streaming Job
    4.2. Running a Hadoop Job
      Using the Hadoop Job tasklet
    4.3. Running a Hadoop Tool
      Replacing Hadoop shell invocations with tool-runner
      Using the Hadoop Tool tasklet
    4.4. Running a Hadoop Jar
      Using the Hadoop Jar tasklet
    4.5. Configuring the Hadoop DistributedCache
    4.6. Map Reduce Generic Options
  5. Working with the Hadoop File System
    5.1. Configuring the file-system
    5.2. Using HDFS Resource Loader
    5.3. Scripting the Hadoop API
      Using scripts
    5.4. Scripting implicit variables
      Running scripts
      Using the Scripting tasklet
    5.5. File System Shell (FsShell)
      DistCp API
  6. Writing and reading data using the Hadoop File System
    6.1. Store Abstraction
      Writing Data
        File Naming
        File Rollover
        Partitioning
        Creating a Custom Partition Strategy
        Writer Implementations
        Append and Sync Data
      Reading Data
        Input Splits
        Reader Implementations
      Using Codecs
    6.2. Persisting POJO datasets using Kite SDK
      Data Formats
        Using Avro
        Using Parquet
      Configuring the dataset support
      Writing datasets
      Reading datasets
      Partitioning datasets
    6.3. Using the Spring for Apache JavaConfig
  7. Working with HBase
    7.1. Data Access Object (DAO) Support
  8. Hive integration
    8.1. Starting a Hive Server
    8.2. Using the Hive JDBC Client
    8.3. Running a Hive script or query
      Using the Hive tasklet
    8.4. Interacting with the Hive API
  9. Pig support
    9.1. Running a Pig script
      Using the Pig tasklet
    9.2. Interacting with the Pig API
  10. Apache Spark integration
    10.1. Simple example for running a Spark YARN Tasklet
  11. Using the runner classes
  12. Security Support
    12.1. HDFS permissions
    12.2. User impersonation (Kerberos)
    12.3. Boot Support
      spring.hadoop.security configuration properties
  13. Yarn Support
    13.1. Using the Spring for Apache Yarn Namespace
    13.2. Usi
