Apache Hadoop 2.0

Feb 25, 2016

ofira

Hortonworks

Apache Hadoop 2.0: Migration from 1.0 to 2.0
Vinod Kumar Vavilapalli, Hortonworks Inc.
vinodkv [at] apache.org, @tshooter
Hortonworks Inc. 2014

Hello!
- 6.5 Hadoop-years old
- Previously at Yahoo!, at Hortonworks now.
- Last thing at school: a two-node Tomcat cluster. Three months later, first thing at the job: brought down an 800-node cluster ;)
- Two hats
  - Hortonworks: Hadoop MapReduce and YARN
  - Apache: Apache Hadoop YARN lead, Apache Hadoop PMC, Apache Member
- Worked/working on
  - YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
  - Apache Ambari: kickstarted the project and its first release
  - Stinger: high-performance data processing with Hadoop/Hive
- Lots of random troubleshooting on clusters
- 99%+ of code in Apache, Hadoop

Architecting the Future of Big Data

Agenda
- Apache Hadoop 2
- Migration Guide for Administrators
- Migration Guide for Users
- Summary

Apache Hadoop 2
Next Generation Architecture

Hadoop 1 vs Hadoop 2
HADOOP 1.0: Single Use System (Batch Apps)
- HDFS (redundant, reliable storage)
- MapReduce (cluster resource management & data processing)
HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, ...)
- HDFS2 (redundant, highly-available & reliable storage)
- YARN (cluster resource management)
- MapReduce (data processing) and others

Why Migrate?
- 2.0 > 2 * 1.0
- HDFS: lots of ground-breaking features
- YARN: next-generation architecture
- Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
- Did I mention services like HBase and Accumulo on YARN with HoYA?
- Return on investment: 2x throughput on the same hardware!

Yahoo!
- On YARN (0.23.x)
- Moving fast to 2.x
http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html

Twitter

HDFS
- High Availability: NameNode HA
- Scale further: Federation
- Time-machine: HDFS Snapshots
- NFSv3 access to data in HDFS

HDFS Contd.
- Support for multiple storage tiers: Disk, Memory, SSD
- Finer-grained access: ACLs
- Faster access to data: DataNode Caching
- Operability: Rolling upgrades

YARN: Taking Hadoop Beyond Batch
Applications Run Natively in Hadoop
- HDFS2 (Redundant, Reliable Storage)
- YARN (Cluster Resource Management)

- BATCH (MapReduce)
- INTERACTIVE (Tez)
- STREAMING (Storm, S4, ...)
- GRAPH (Giraph)
- IN-MEMORY (Spark)
- HPC MPI (OpenMPI)
- ONLINE (HBase)
- OTHER (Search, Weave)

Store ALL DATA in one place. Interact with that data in MULTIPLE WAYS, with Predictable Performance and Quality of Service.

5 Key Benefits of YARN
- Scale
- New Programming Models & Services
- Improved cluster utilization
- Agility
- Beyond Java

Any catch?
- I could go on and on about the benefits, but what's the catch?
- Nothing major!
- Major architectural changes
  - But the impact on user applications and APIs was kept to a minimum
- Feature parity
- Administrators
- End-users

Administrators
Guide to migrating your clusters to Hadoop-2.x

New Environment
- Hadoop Common, HDFS and MR are installable separately, but optional
- Env
  - HADOOP_HOME: deprecated, but works
  - HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME: new
- Commands
  - bin/hadoop works as usual, but some sub-commands are deprecated
  - Separate commands for mapred and hdfs
    - hdfs dfs -ls
    - mapred job -kill
  - sbin/yarn-daemon.sh etc. for starting YARN daemons
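To make the command split concrete, here is a small Python sketch that rewrites a deprecated `hadoop <sub-command>` call into its `hdfs` or `mapred` form. The table is an illustrative subset of sub-commands that Hadoop 2's bin/hadoop forwards to the new scripts (with a deprecation warning); the names `FORWARDED_SUBCOMMANDS` and `modernize` are hypothetical, made up for this example.

```python
import shlex

# Illustrative subset (not exhaustive) of bin/hadoop sub-commands that
# Hadoop 2 still accepts but forwards, with a deprecation warning,
# to the separate hdfs or mapred scripts.
FORWARDED_SUBCOMMANDS = {
    "dfs": "hdfs",
    "dfsadmin": "hdfs",
    "fsck": "hdfs",
    "namenode": "hdfs",
    "datanode": "hdfs",
    "job": "mapred",
    "queue": "mapred",
    "pipes": "mapred",
}


def modernize(command: str) -> str:
    """Rewrite 'hadoop <sub> ...' as 'hdfs <sub> ...' or 'mapred <sub> ...'.

    Invocations that are not deprecated, such as 'hadoop fs' or
    'hadoop jar', are returned unchanged.
    """
    argv = shlex.split(command)
    if len(argv) >= 2 and argv[0] == "hadoop" and argv[1] in FORWARDED_SUBCOMMANDS:
        argv[0] = FORWARDED_SUBCOMMANDS[argv[1]]
    return shlex.join(argv)


if __name__ == "__main__":
    for old in ["hadoop dfs -ls /", "hadoop job -kill job_201401010000_0001", "hadoop fs -ls /"]:
        print(old, "->", modernize(old))
```

Handy for sweeping old operator scripts: anything it leaves alone either was never deprecated or is outside the small table above.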

Wire compatibility
- Not RPC wire-compatible with prior versions of Hadoop
- Admins cannot mix and match versions
- Clients must be updated to use the same version of the Hadoop client library as the one installed on the cluster.

Capacity management
- Slots -> dynamic, memory-based resources
- Total memory on each node: yarn.nodemanager.resource.memory-mb
- Minimum and maximum sizes: yarn.scheduler.minimum-allocation-mb, yarn.scheduler.maximum-allocation-mb
- MapReduce configs don't change: mapreduce.map.memory.mb, mapreduce.map.java.opts
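A quick worked example of how these knobs interact, as a Python sketch. It assumes CapacityScheduler-style normalization, where a request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb and a request above yarn.scheduler.maximum-allocation-mb is treated as an error; the function names and the node sizes are hypothetical.

```python
import math


def normalize_request_mb(request_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Round a container memory request up to a multiple of the minimum allocation.

    Assumption: the minimum allocation doubles as the allocation increment,
    and requests above the maximum allocation are rejected.
    """
    if request_mb > max_alloc_mb:
        raise ValueError(
            f"request of {request_mb} MB exceeds the maximum allocation of {max_alloc_mb} MB"
        )
    steps = max(1, math.ceil(request_mb / min_alloc_mb))
    return steps * min_alloc_mb


def containers_per_node(node_memory_mb: int, container_mb: int) -> int:
    """Number of containers of container_mb that fit in one NodeManager's memory."""
    return node_memory_mb // container_mb


if __name__ == "__main__":
    # Hypothetical node: 40 GB for containers, 1 GB minimum, 8 GB maximum.
    node_mb, min_mb, max_mb = 40960, 1024, 8192
    map_mb = normalize_request_mb(1536, min_mb, max_mb)  # mapreduce.map.memory.mb = 1536
    print(map_mb, containers_per_node(node_mb, map_mb))  # 2048 20
```

Note the rounding: asking for 1536 MB per map costs 2048 MB of the node's budget here. Also keep the -Xmx in mapreduce.map.java.opts comfortably below mapreduce.map.memory.mb, since the JVM uses memory beyond its heap and the NodeManager enforces the container limit.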

Cluster Schedulers
- Concepts stay the same
  - CapacityScheduler: Queues, User-limits
  - FairScheduler: Pools
  - Warning: configuration names now have YARN-isms
- Key enhancements
  - Hierarchical queues for fine-grained control
  - Multi-resource scheduling (CPU, memory, etc.)
  - Online administration (add queues, ACLs, etc.)
  - Support for long-lived services (HBase, Accumulo, Storm) (in progress)
  - Node labels for fine-grained administrative controls (future)
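Hierarchical queues are easiest to reason about with numbers. This Python sketch computes each CapacityScheduler queue's absolute share of the cluster from `yarn.scheduler.capacity.<queue-path>.queues` and `yarn.scheduler.capacity.<queue-path>.capacity` properties (normally found in capacity-scheduler.xml). It assumes a child's capacity is a percentage of its parent's and that sibling capacities sum to 100; the function name and the queues `prod`, `dev`, `eng`, `qa` are hypothetical.

```python
from typing import Dict

PREFIX = "yarn.scheduler.capacity."


def absolute_capacities(conf: Dict[str, str]) -> Dict[str, float]:
    """Compute each queue's share of the whole cluster, in percent.

    A child's capacity is a percentage of its parent, so its absolute
    capacity is the parent's absolute capacity times capacity / 100.
    Sibling capacities must add up to 100.
    """
    result = {"root": 100.0}

    def walk(path: str) -> None:
        children = conf.get(f"{PREFIX}{path}.queues", "")
        names = [n.strip() for n in children.split(",") if n.strip()]
        if not names:
            return  # leaf queue
        total = 0.0
        for name in names:
            child = f"{path}.{name}"
            capacity = float(conf[f"{PREFIX}{child}.capacity"])
            total += capacity
            result[child] = result[path] * capacity / 100.0
            walk(child)
        if abs(total - 100.0) > 1e-6:
            raise ValueError(f"capacities under {path} sum to {total}, not 100")

    walk("root")
    return result


if __name__ == "__main__":
    conf = {
        "yarn.scheduler.capacity.root.queues": "prod,dev",
        "yarn.scheduler.capacity.root.prod.capacity": "70",
        "yarn.scheduler.capacity.root.dev.capacity": "30",
        "yarn.scheduler.capacity.root.dev.queues": "eng,qa",
        "yarn.scheduler.capacity.root.dev.eng.capacity": "50",
        "yarn.scheduler.capacity.root.dev.qa.capacity": "50",
    }
    print(absolute_capacities(conf))
```

Splitting `dev` 50/50 gives each sub-queue 15% of the whole cluster, which is the kind of fine-grained control flat Hadoop 1 queues could not express.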

Configuration
- Watch those damn knobs!
- Should work if you are using the previous configs in Common, HDFS and client-side MapReduce
- MapReduce server side is toast
  - No migration
  - Just use new configs
- Past sins
  - From 0.21.x
  - Configuration names changed for better separation of client and server config names
  - Cleaning up naming: mapred.job.queue.name -> mapreduce.job.queuename
  - Old user-facing, job-related configs work as before but are deprecated
  - Configuration mappings exist
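The "configuration mappings" can be pictured as a rename table. Below is a Python sketch that rewrites deprecated keys to their Hadoop 2 names and warns about each one. The four renames are real but only a small illustrative subset of Hadoop's deprecation table; the function name and the rule that an explicitly set new name wins over the old one are choices made for this sketch.

```python
import warnings
from typing import Dict

# Illustrative subset of Hadoop 1 -> Hadoop 2 configuration renames;
# Hadoop itself ships a much longer deprecation table.
RENAMED_KEYS = {
    "mapred.job.queue.name": "mapreduce.job.queuename",
    "mapred.job.name": "mapreduce.job.name",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
    "fs.default.name": "fs.defaultFS",
}


def migrate_keys(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of conf with deprecated keys rewritten to their new names.

    If both the old and the new name are set, the new name wins.
    """
    migrated = {k: v for k, v in conf.items() if k not in RENAMED_KEYS}
    for old, new in RENAMED_KEYS.items():
        if old in conf:
            warnings.warn(f"{old} is deprecated; use {new}", DeprecationWarning, stacklevel=2)
            migrated.setdefault(new, conf[old])
    return migrated
```

Running old job configurations through something like this before the cutover surfaces the deprecated names early, instead of leaving them to scroll past as warnings in job logs.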

Installation/Upgrade
- Fresh install
- Upgrading from an existing version

Fresh Install
- Apache Ambari: fully automated!
- Traditional manual install of RPMs/tarballs

Upgrade
- Apache Ambari
  - Semi-automated
  - Supplies scripts which take care of most things
- Manual upgrade

HDFS Pre-upgrade
- Back up configuration files
- Stop users!
- Run fsck and fix any errors
  hadoop fsck / -files -blocks -locations > /tmp/dfs-old-fsck-1.log
- Capture the complete namespace
  hadoop dfs -lsr / > dfs-old-lsr-1.log
- Create a list of DataNodes in the cluster
  hadoop dfsadmin -report > dfs-old-report-1.log
- Save the namespace
  hadoop dfsadmin -safemode enter
  hadoop dfsadmin -saveNamespace
- Back up NameNode meta-data
  dfs.name.dir/edits
  dfs.name.dir/image/fsimage
  dfs.name.dir/current/fsimage
  dfs.name.dir/current/VERSION
- Finalize the state of the filesystem
  hadoop namenode -finalize
- Other meta-data backup
  - Hive Metastore, HCat, Oozie
  - mysqldump
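The read-only capture steps of this checklist are easy to script. The Python sketch below runs the three capture commands from the checklist (or, by default, only lists them) and saves each output to a log file. Writing all logs into one `out_dir` is an assumption of this sketch (the checklist puts the fsck log under /tmp), and the function name is hypothetical. The safemode, saveNamespace and finalize steps are deliberately left out because they change NameNode state and are better run by hand.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# Read-only capture commands from the pre-upgrade checklist (Hadoop 1.x CLI),
# paired with the log file each one's output is saved to.
CAPTURE_STEPS: List[Tuple[List[str], str]] = [
    (["hadoop", "fsck", "/", "-files", "-blocks", "-locations"], "dfs-old-fsck-1.log"),
    (["hadoop", "dfs", "-lsr", "/"], "dfs-old-lsr-1.log"),
    (["hadoop", "dfsadmin", "-report"], "dfs-old-report-1.log"),
]


def capture_pre_upgrade_state(out_dir: str, dry_run: bool = True) -> List[str]:
    """Return the capture plan as shell-style lines; execute it when dry_run is False.

    With check=True, any command that exits non-zero raises
    CalledProcessError, so the capture stops instead of silently
    leaving an incomplete log behind.
    """
    plan = []
    for argv, log_name in CAPTURE_STEPS:
        target = Path(out_dir) / log_name
        plan.append(f"{' '.join(argv)} > {target}")
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            with target.open("w") as out:
                subprocess.run(argv, stdout=out, check=True)
    return plan


if __name__ == "__main__":
    for line in capture_pre_upgrade_state("/tmp/pre-upgrade"):
        print(line)
```

Keep these logs with the configuration backup: they are the "before" picture for the post-upgrade comparison.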

HDFS Upgrade
- Stop all services
- Tarballs/RPMs

HDFS Post-upgrade
- Process liveness
- Verify that all is well
  - NameNode goes out of safe mode: hdfs dfsadmin -safemode wait
- File-system health
- Compare with before
  - Node list
  - Full namespace
- You can start HDFS without finalizing the upgrade. When you are ready to discard your backup, you can finalize the upgrade:
  hadoop dfsadmin -finalizeUpgrade
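"Compare with before" for the full namespace can be automated by diffing the saved `hadoop dfs -lsr /` output against a fresh `hdfs dfs -ls -R /`. The Python sketch below assumes the usual 8-column listing (permissions, replication, owner, group, size, date, time, path) and compares only entry type and size, ignoring timestamps, owners and permissions; both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[bool, int]  # (is_directory, size in bytes)


def parse_listing(lines: Iterable[str]) -> Dict[str, Entry]:
    """Map each path in a recursive HDFS listing to (is_directory, size).

    Lines that do not look like 8-column entries (e.g. 'Found 3 items')
    are skipped.
    """
    entries: Dict[str, Entry] = {}
    for line in lines:
        parts = line.rstrip("\r\n").split(None, 7)
        if len(parts) != 8 or parts[0][0] not in "d-":
            continue
        perms, _repl, _owner, _group, size, _date, _time, path = parts
        entries[path] = (perms.startswith("d"), int(size))
    return entries


def diff_namespaces(
    before: Dict[str, Entry], after: Dict[str, Entry]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (missing, added, changed) paths between two parsed listings."""
    missing = sorted(before.keys() - after.keys())
    added = sorted(after.keys() - before.keys())
    changed = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return missing, added, changed
```

Usage: parse the old dfs-old-lsr-1.log and the new listing, then call `diff_namespaces`. Three empty lists are a good sign before running -finalizeUpgrade.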

MapReduce upgrade
- Ask users to stop their jobs
- Stop the MR sub-system
- Replace everything

HBase Upgrade
- Tarballs/RPMs
- HBase 0.95 removed support for HFile V1
- Before the actual upgrade, check if there are HFiles in V1 format using HFileV1Detector
- /usr/lib/hbase/bin/hbase upgrade -execute

Users
Guide to migrating your applications to Hadoop-2.x

Migrating the Hadoop Stack
- MapReduce
- MR Streaming
- Pipes
- Pig
- Hive
- Oozie

MapReduce Applications
- Binary compatibility of org.apache.hadoop.mapred APIs
- Full binary compatibility for the vast majority of users and applications
- Nothing to do!
- Use the existing MR application jars of your existing applications via bin/hadoop to submit them directly to YARN

  mapreduce.framework.name = yarn (in mapred-site.xml)
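For reference, the setting above lives in mapred-site.xml in Hadoop's usual `<configuration>`/`<property>` layout. This Python sketch renders name/value pairs in that layout using only the standard library; the function name `site_xml` is hypothetical, and the sketch neither pretty-prints nor merges into an existing file.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def site_xml(properties: Dict[str, str]) -> str:
    """Render name/value pairs in the <configuration>/<property> layout of Hadoop *-site.xml files."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # The setting that sends MapReduce jobs submitted via bin/hadoop to YARN.
    print(site_xml({"mapreduce.framework.name": "yarn"}))
```

The printed snippet is what you would merge into the client-side mapred-site.xml.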

MapReduce Applications contd.
- Source compatibility of the org.apache.hadoop.mapreduce API
- Minority of users
- Proved to be difficult to ensure full binary compatibility for the existing applications
- Existing applications using the mapreduce APIs are source compatible
- They can run on YARN with no changes; they need recompilation only

MapReduce Applications contd.
- MR Streaming applications work without any changes
- Pipes applications will need recompilation

MapReduce Applications contd.
- Examples: can run with minor tricks
- Benchmarks: to compare 1.x vs 2.x
- Things to do
  - Play with YARN
  - Compare performance
http://hortonworks.com/blog/running-existing-applications-on-hadoop-2-yarn/

MapReduce feature parity
- Setup and cleanup tasks are no longer separate tasks, and we dropped the optionality (which was a hack anyway).
- JobHistory
  - The JobHistory file format changed to Avro/JSON.
  - Rumen automatically recognizes the new format.
  - Parsing history files yourself? You need to move to the new parsers.
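If you parse history files yourself, the new format is line-oriented enough for quick scripts. The Python sketch below counts events by type in a Hadoop 2 .jhist file. It assumes a layout of a first line `Avro-Json`, then one line holding the Avro schema, then one JSON-encoded event per line with a top-level `"type"` field; verify this against your own files. For anything serious, Hadoop's Java `org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser` or Rumen is the safer route. The function name is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_history_events(lines: Iterable[str]) -> Counter:
    """Count events by type in a JobHistory (.jhist) file.

    Assumed layout: 'Avro-Json' header line, one schema line,
    then one JSON event per line carrying a "type" field.
    """
    it = iter(lines)
    if next(it, "").strip() != "Avro-Json":
        raise ValueError("not an Avro-Json job history file")
    next(it, None)  # skip the embedded Avro schema
    counts: Counter = Counter()
    for line in it:
        line = line.strip()
        if line:
            counts[json.loads(line)["type"]] += 1
    return counts
```

Usage: `with open(path) as f: print(count_history_events(f))`, where `path` points at a .jhist file copied out of the history directory.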

User logs
- Putting user-logs on DFS. AM logs too!
- While the job is running, logs are on the individual nodes
- After that, on DFS
- Provide pretty printers and parsers for various log files: syslog, stdout, stderr
- User logs directory with quotas beyond their current user directories
- Logs expire after a month by default and get GCed.
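Once aggregated, logs for a finished application can be fetched with `yarn logs -applicationId <application ID>`, and how long they are kept is controlled by yarn.log-aggregation.retain-seconds. The Python sketch below only illustrates the age-based expiry policy described above: given modification times of aggregated log directories, it returns those older than the retention period. It is not YARN's deletion service; the one-month value follows the slide, and the function name and paths are hypothetical.

```python
import time
from typing import Dict, List, Optional

# Roughly "a month", matching the default mentioned above.
ONE_MONTH_SECONDS = 30 * 24 * 60 * 60


def expired_logs(
    mtimes: Dict[str, float],
    retain_seconds: int = ONE_MONTH_SECONDS,
    now: Optional[float] = None,
) -> List[str]:
    """Return log paths whose age exceeds the retention period (candidates for GC)."""
    now = time.time() if now is None else now
    return sorted(path for path, mtime in mtimes.items() if now - mtime > retain_seconds)
```

Passing `now` explicitly keeps the policy testable and lets you ask "what would be deleted next week?" before users notice their logs are gone.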

Application recovery
- No more lost applications on master restart!
- Applications do not lose previously completed work
- If AM crashes, RM will restart it from where i
