
Copyright © 2016 NTT DATA Corporation

5/9/2016
NTT DATA Corporation
Akira Ajisaka

Apache Hadoop 3, Current Status

Apache: Big Data North America 2016


Self introduction

- Akira Ajisaka (NTT DATA)
  - Apache Hadoop Committer & PMC member
    - 130+ commits in 2015
    - Working on usability and supportability
  - "Open-Source Professional Services" team
    - Has deployed and supported Hadoop clusters totaling 10k+ nodes over 7 years
    - Together with NTT, ranked 6th in the world in contributions to Apache Hadoop [1]
    - ASF committers: Spark, Yetus, HTrace (incubating)

[1] The Activities of Apache Hadoop Community 2014 http://ajisakaa.blogspot.com/2015/02/the-activities-of-apache-hadoop.html


Agenda

- What's Apache Hadoop 3?
- Differences between Hadoop 3 and 2
  - New Features
  - Incompatible Changes
- Current Status
- Summary


What's Apache Hadoop 3?


Disclaimer

- Apache Hadoop 3 is not yet defined
  - Not released
  - Which features will be included is not yet decided
- This presentation may not be entirely correct
- I will introduce what I know so far


What's Apache Hadoop 3?

[Figure: release timeline, 2009 to 2016]
- Branches: branch-1 (branch-0.20), branch-0.23, branch-2, trunk
- Product lines: Hadoop 1 (EOL), Hadoop 2, Hadoop 3
- Releases shown: 0.20.1, 0.20.205, 0.21.0, 0.22.0, 0.23.0, 0.23.11 (final), 1.0.0, 1.1.0, 1.2.1 (stable), 2.0.0-alpha, 2.1.0-beta, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
- Features annotated on the timeline: new append, security, NameNode Federation, YARN, NameNode HA, HDFS snapshots, NFSv3 support, Windows support, heterogeneous storage, HDFS in-memory caching, HDFS ACLs, HDFS rolling upgrades, Application History Server, RM automatic failover, YARN rolling upgrades, transparent encryption, archival storage, drop JDK6 support, truncate API

Hadoop 3 and 2 diverged in 2011 (5 years ago!)


Discussions for releasing Apache Hadoop 3

- 2014/6
  - Discussed releasing Hadoop 3 together with upgrading the JDK version
  - Decided to make 2.6 the last release that supports JDK6
- 2015/3
  - Oracle JDK7 is EoL, so we need to upgrade
  - In the end, the decision was put off for another 12 months
- 2016/2
  - It's time to revisit Hadoop 3 release plans
  - Developers agreed to release Hadoop 3


When will it be released?

- Developer mailing list:
  - Releasing alphas through the summer
  - Freeze and stabilize for GA in Nov/Dec
- I expect Hadoop 3 GA to be released at the end of 2016
- There are many tasks to do
  - I'll introduce them later in these slides


Difference between Hadoop 3 and 2


YARN is unchanged

- YARN: the biggest difference between Hadoop 1 & 2
- Hadoop 3 still uses YARN

[Figure: stack comparison, components shown]
- Hadoop 1: HDFS; MapReduce; Hive, Pig
- Hadoop 2 & Hadoop 3: HDFS; YARN; MapReduce, Spark, Tez; Hive, Pig, Storm, ...


New features

- 440 fixed issues only in Hadoop 3 (as of 5/8/2016)
  - https://s.apache.org/GpUq
- HDFS erasure coding
- Shell script rewrite
- Task level native optimization
- Derive heap size or mapreduce.*.memory.mb automatically
- Support more than 2 NameNodes
- and more


Break compatibility

- A major version upgrade is a chance to clean up the code
  - Deprecated APIs can be removed only when changing the major version
    - @Public and @Stable Java API
    - REST API
    - Metrics/JMX
    - CLI
    - Environment variables
  - Wire-compatibility can be broken
    - In that case, a 2.X client cannot talk to a 3.X server and vice versa
  - Compatibility Guide:
    - https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html


New Features


Erasure Coding (HDFS-7285)

- Problem
  - Storage costs need to be reduced
  - Blocks are replicated to 3 DNs
    - 3x storage overhead is costly
- Solution
  - Use Erasure Code

              3-replication   (6,3)-Reed-Solomon
Tolerates     2 failures      3 failures
Disk Usage    3x              1.5x
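
The two rows of the comparison above follow from simple arithmetic. The sketch below is only an illustration of that arithmetic; the function names are made up for this example and are not part of any Hadoop API.

```python
def replication(copies):
    """Return (tolerated failures, disk usage multiple) for n-way replication."""
    # Data survives as long as one copy remains; every copy is a full block.
    return copies - 1, float(copies)


def reed_solomon(data_blocks, parity_blocks):
    """Return (tolerated failures, disk usage multiple) for a (k, m) Reed-Solomon code."""
    # Any k of the k + m blocks are enough to rebuild the data,
    # so up to m blocks may be lost.
    return parity_blocks, (data_blocks + parity_blocks) / data_blocks


for name, (failures, usage) in [
    ("3-replication", replication(3)),
    ("(6,3)-Reed-Solomon", reed_solomon(6, 3)),
]:
    print(f"{name}: tolerates {failures} failures, disk usage {usage}x")
```

With 6 data blocks and 3 parity blocks, 9 blocks are stored for every 6 blocks of data, which is where the 1.5x figure comes from.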


Erasure Coding: Write files using (6,3)-Reed-Solomon

- Write data to 9 DNs in parallel

[Figure: incoming data flows to the ECClient, which writes 6 data blocks and 3 parity blocks across DN1 ... DN6 and DN7 ... DN9]


Erasure Coding: Read files

- Read data from 6 DNs in parallel

[Figure: ECClient reading from DataNodes; DN1 ... DN6 and DN7 ... DN9 shown]


Erasure Coding: Read files when DN fails

- Read data from any 6 of the surviving DNs in parallel

[Figure: ECClient reading from DN1 ... DN9, with one failed DN marked ×]
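
The idea of rebuilding a lost block from the surviving ones can be shown with a much simpler code than Reed-Solomon. The sketch below uses a single XOR parity block, which tolerates only one lost block; it is a toy stand-in chosen for brevity, not the (6,3)-Reed-Solomon code and not HDFS's implementation. All names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored_blocks, lost_index):
    """Rebuild the block at lost_index from all the other stored blocks."""
    survivors = [b for i, b in enumerate(stored_blocks) if i != lost_index]
    # XOR of every surviving block (data and parity) equals the missing one.
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC"]
stored = encode(data)                   # 3 data blocks + 1 parity block
assert recover(stored, 1) == b"BBBB"    # block 1 rebuilt without reading it
```

Reed-Solomon generalizes this by computing several independent parity blocks, so that more than one loss can be repaired.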


Erasure Coding: Current Status

- Phase 1: striping layout
  - C = 64KB (default)
  - Works for small files
  - No data locality
  - Available on trunk
- Phase 2: contiguous layout
  - C = 128MB (= HDFS block size)
  - Does not work for small files
  - Data locality
  - Now in progress (HDFS-8030)

[Figure: incoming data divided into cells of size C and distributed across DataNode 1, DataNode 2, DataNode 3, DataNode 4, DataNode 5, ...]
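
To make the striping layout concrete, the sketch below maps a logical file offset to a cell in a stripe. It assumes cells are handed out round-robin to the 6 data units of a (6,3) stripe; that assignment rule, the constant names and the function name are assumptions made for this illustration, not taken from the HDFS code.

```python
CELL_SIZE = 64 * 1024   # C = 64KB, the Phase 1 default from the slide
DATA_UNITS = 6          # data units per stripe in a (6,3) layout (assumed)


def locate(offset, cell_size=CELL_SIZE, data_units=DATA_UNITS):
    """Map a file offset to (stripe index, data unit index, offset inside the cell)."""
    cell = offset // cell_size          # which cell of the file this byte is in
    stripe = cell // data_units         # every data_units cells form one stripe
    unit = cell % data_units            # round-robin position within the stripe
    return stripe, unit, offset % cell_size


print(locate(0))                # first byte of the file
print(locate(CELL_SIZE))        # first byte of the second cell
print(locate(6 * CELL_SIZE + 5))
```

Because consecutive 64KB cells land on different DataNodes, even a small file is spread over several nodes, which matches the "works for small files" and "no data locality" points above.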


Shell Script Rewrite (HADOOP-9902)

- Hadoop and shell scripts
  - Launching daemons
  - Hadoop CLI
- Difficult to understand
  - Which env var is the correct one for setting an option?
    - java classpath?
    - java.library.path?
    - GC options?
  - How do we add the option to the env var?
  - We have to read almost all the shell scripts!


After rewriting the scripts ...

- Easy to understand
- Because shell API doc is available
  - Shelldoc maker generates docs from the scripts
  - Similar to JavaDoc

[Figure: excerpt of the generated shell API doc, labeled "Public API"]

My documentation build (trunk): http://aajisaka.github.io/hadoop-project/hadoop-project-dist/hadoop-common/UnixShellAPI.html


.hadoop-env and .hadooprc (HADOOP-11353, HADOOP-13045)

- Very similar to .bashrc
  - Read the API doc
  - Create your own ~/.hadoopXX
    - .hadoop-env : hadoop-env.sh for each user
    - .hadooprc : called after shell env vars are configured
  - And that's all :)
- ex.) Set additional classpath (.hadooprc)

    hadoop_add_classpath /path/to/my/jar


--debug option is available

- Useful for troubleshooting

    $ hadoop --debug version
    DEBUG: hadoop_parse_args: processing version
    DEBUG: hadoop_parse: asking caller to skip 1
    DEBUG: HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
    (snip)
    DEBUG: Applying the user's .hadooprc
    DEBUG: Initial CLASSPATH=/path/to/my/jar
    DEBUG: Initialize CLASSPATH
    (snip)
    DEBUG: Initial CLASSPATH=/usr/local/hadoop/share/hadoop/common/lib/*

CLASSPATH was overwritten!! (before HADOOP-13045)


Many new features, bug fixes, improvements

- 'hadoop distch' to change the ownership and permissions on many files via a MapReduce job
- 'hadoop jnipath' to print java.library.path
- 'hadoop --daemon' instead of hadoop-daemon.sh
  - ex.) hdfs --daemon status namenode
  - The return code for status is LSB-compatible
  - hadoop-daemon(s).sh are now deprecated
- .out files are now appended (not overwritten)
  - Allows external log rotation
- and many more
  - see https://issues.apache.org/jira/browse/HADOOP-9902


Task level native optimization (MAPREDUCE-2841)

- Add a native implementation of the map output collector
  - Sort, Spill and IFile serialization
- Prerequisites
  - Built with -Pnative option
  - Custom writable types and comparators are not supported
- Setting

    <property name="mapreduce.job.map.output.collector.class" value="org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator"/>

  Tips: compact xml form will be supported in Hadoop 3 (HADOOP-6964)


Benchmark

- Release Note in the issue:
  - "For shuffle-intensive jobs this may provide speed-ups of 30% or more."
- Benchmarked with 3 slaves (m3.xlarge)
  - CentOS 7.2
  - 3.0.0-SNAPSHOT (revision 5865fe2b)
- A very shuffle-intensive wordcount job
  - Input: 2.6GB (compressed)
  - Shuffle: 14GB
  - Output: 10GB


Benchmark result

- Result: 506sec -> 383sec (25% speedup)
  - Average Map Time: 182sec -> 89sec
- Tips: Backported to CDH 5.2.0 or later
- Map task log (native optimization enabled)

    INFO [main] org.apache.hadoop.mapred.nativetask.HadoopPlatform: Hadoop platform inited
    INFO [main] org.apache.hadoop.mapred.nativetask.NativeRuntime: Nativetask JNI library loaded.
    INFO [main] org.apache.hadoop.mapred.nativetask.handlers.NativeCollectorOnlyHandler: NativeTask Combiner is enabled, class = org.apache.hadoop.examples.WordCount$IntSumReducer
    INFO [main] org.apache.hadoop.mapred.nativetask.NativeBatchProcessor: NativeHandler: direct buffer size: 1048576
    INFO [main] org.apache.hadoop.mapred.nativetask.handlers.NativeCollectorOnlyHandler: [NativeCollectorOnlyHandler] combiner is not null
    INFO [main] org.apache.hadoop.mapred.nativetask.NativeBatchProcessor: NativeHandler: direct buffer size: 1048576
    INFO [main] org.apache.hadoop.mapred.nativetask.util.OutputUtil: nativetask.output.manager = org.apache.hadoop.mapred.nativetask.util.NativeTaskOutputFiles
    INFO [main] org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator: Native output collector can be successfully enabled!
    INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator
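
A "speedup" percentage can mean either the fraction of time saved or the ratio of old to new run time, and the two give different numbers. The sketch below computes both from the timings reported above; the function names are made up for this illustration.

```python
def time_reduction(before_sec, after_sec):
    """Fraction of elapsed time saved by the faster run."""
    return (before_sec - after_sec) / before_sec


def speedup(before_sec, after_sec):
    """How many times faster the run became (old time / new time)."""
    return before_sec / after_sec


# Timings taken from the benchmark result: whole job and average map time.
print(f"job: {time_reduction(506, 383):.1%} less time, {speedup(506, 383):.2f}x faster")
print(f"map: {time_reduction(182, 89):.1%} less time, {speedup(182, 89):.2f}x faster")
```

For the whole job this gives roughly 24% less elapsed time, or about 1.32x, while the average map time is roughly halved.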


Derive heap size or mapreduce.*.memory.mb automatically (MAPREDUCE-5785)

- In Hadoop 2, two similar properties must be set :(
  - mapreduce.{map,reduce}.memory.mb
    - The amount of memory to request from the scheduler for each task (ex. 2048)
  - mapreduce.{map,reduce}.java.opts
    - Java options for YARN containers (ex. -Xmx2G)
- In Hadoop 3, either is enough
  - .java.opts is derived from .memory.mb and vice versa
    - heap size (-Xmx) in .java.opts = .memory.mb * mapreduce.job.heap.memory-mb.ratio
    - .memory.mb = heap size (-Xmx) in .java.opts / mapreduce.job.heap.memory-mb.ratio
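
The two derivation rules above can be tried out with a few lines of code. This is a minimal sketch, assuming a ratio of 0.8 for mapreduce.job.heap.memory-mb.ratio and rounding to the nearest MB; the function names, the -Xmx parsing and the rounding rule are assumptions for this illustration, and Hadoop's own handling may differ.

```python
import re

HEAP_RATIO = 0.8  # assumed value of mapreduce.job.heap.memory-mb.ratio


def heap_mb_from_memory_mb(memory_mb, ratio=HEAP_RATIO):
    """Derive the JVM heap size in MB from the container size (.memory.mb)."""
    return round(memory_mb * ratio)


def memory_mb_from_java_opts(java_opts, ratio=HEAP_RATIO):
    """Derive the container size in MB from an -Xmx setting in .java.opts."""
    match = re.search(r"-Xmx(\d+)([gGmM])", java_opts)
    if match is None:
        return None  # no heap size given, nothing to derive from
    heap_mb = int(match.group(1)) * (1024 if match.group(2) in "gG" else 1)
    return round(heap_mb / ratio)


print(heap_mb_from_memory_mb(2048))        # heap for a 2048 MB container
print(memory_mb_from_java_opts("-Xmx2G"))  # container for a 2 GB heap
```

The container is always larger than the heap because the JVM also needs memory outside the heap, which is what the ratio accounts for.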


Support more than two NameNodes (HDFS-6440)

- Hadoop 2 currently supports only 2 NameNodes
  - 1 active and 1 standby
- Hadoop 3 supports 2 or more standby NameNodes
  - provides additional fault-tolerance
  - prevents multiple standby NNs from checkpointing at the same time
  - the number of standbys should be kept small because of block report overhead
- FYI: ResourceManager already supports multi-standby in branch-2


Other new features

- metrics2 sink plugin for Apache Kafka (HADOOP-10949)
- .jhist default format is changed from json to binary (MAPREDUCE-6613)
- Use FileOutputCommitter v2 by default (MAPREDUCE-6336)
- Allow/disallow snapshots via WebHDFS (HDFS-9057)
- Check and make checkpoint before stopping NameNode (HDFS-6353)
- YARN TimelineServer v2 (YARN-2928)
- YARN WebUI v2 (YARN-3368)
- Dynamic subcommands (HADOOP-12930)
- and many other changes


Incompatible Changes


Incompatible changes

- Many deprecated APIs will be removed
  - hftp/hsftp/s3 -> webhdfs/s3{n,a}
  - Metrics v1
  - org.apache.hadoop.Records
  - and more
- Improved CLI output
  - 'mapred job -list' shows the job name as well
  - 'hadoop fs -du' shows the raw disk usage, and its output is aligned to be more unix-like
  - and more
- Search 'Incompatible change' flag
  - https://s.apache.org/sMO4


Bump up the versions of the libraries

- Drop JDK7 support (HADOOP-11858)
  - Hello Lambda!
- Dependency Hell
  - Tomcat
  - Jetty
  - Jersey
  - Guava
  - Log4J
  - Jackson
  - and many many many more ...

[Figure: slide background showing a long "STARTUP_MSG: classpath = ..." listing of 3.0.0-SNAPSHOT jars under /usr/local/hadoop/share/hadoop/ (common, hdfs, mapreduce, yarn), including jetty-6.1.26, jersey-core-1.9, guava-11.0.2, log4j-1.2.17 and jackson-core-asl-1.9.13]


Classpath isolation (HADOOP-11656)

- Easing the "dependency hell"
  - Separate client and server jars
  - Client jar does not pull in any third-party dependencies
- If the isolation is done ...
  - We can safely upgrade the libraries in server code
  - In branch-2, the upgrade is incompatible :(


Current Status


Current Status

- Most of the new features are already available
  - Erasure Coding
  - Shell Scripts Rewrite
  - Task level native optimization
- Need more contribution
  - Bumping the library versions
  - Classpath isolation
  - Remove deprecated XXX
- Discussion
  - Release from trunk or cut new branch-3
  - How should we make the releases alpha, beta, or GA?


Summary


Summary

- Apache Hadoop 3 has
  - Many new features and code cleanups
  - Many remaining tasks
- Need your help to release Hadoop 3 earlier!
  - Not only patches but also testing is welcome!
- (Probably) Hadoop 3 GA will be released in 2016


Comments?

- I suppose there are many Apache Hadoop developers/users in this room :)
- Tell me
  - if there are any required tasks not in this talk
  - if I am misunderstanding something
  - anything you want to tell me
- Discussion
  - Release from trunk or cut branch-3
  - When should we support JDK9?
  - What is alpha/beta/GA?
  - ...
- I'd like to give this feedback to the community



Appendix

- Native erasure coding support inside HDFS (Strata + Hadoop World New York 2015)
  - http://conferences.oreilly.com/strata/big-data-conference-ny-2015/public/schedule/detail/42957