

20100130 hadoop apache

May 12, 2015




1. Hadoop and HDFS in CMRI
   China Mobile Research Institute
   WANG, Xu [wangxu(at)]

2. Apache Hadoop
- Open-source clone of Google's infrastructure
- The de facto standard MapReduce framework; has won Terasort several times
- Used for search engines, data mining, and log analysis
- Clusters scale up to 4,000 nodes
- Users: Yahoo!, Facebook, Cloudera; Baidu, Alibaba, China Mobile

3. Hadoop in China 2009
- Beijing, Nov 15, 2009

4. Subprojects of Hadoop (stack diagram)
- HBase: column-based distributed K-V store (after Google BigTable)
- ZooKeeper: distributed lock service (after Google Chubby)
- Pig, Hive: data warehouse layers
- Core platform: Hadoop MapReduce (after Google MapReduce) and HDFS (after Google GFS)
- Hadoop Common (io, ipc) and Avro: serialized data format and RPC
- All running on the JVM

5. HDFS Principles
- Follows the Google GFS paper
- Built for big-data storage and processing
- Write once, read frequently: modification is not permitted, and append will be supported soon; reads take priority over writes
- Runs on commodity PC hardware: hardware may fail at any time, so multiple replicas keep data safe

6. HDFS Architecture (diagram)

7. Data in HDFS: the NameNode's memory holds
- Namespace info: the hierarchical FS tree and Map(file, blocks)
- DataNode map: Map(living datanode, blocks)
- Blocks map: Map(block, file/datanodes)
- Other runtime info: locks held by clients, and blocks being processed (replication, invalidation)

8. Persistence of NameNode data
- The NameNode persists the namespace as FSImage & EditLog, used at startup and shutdown
- Secondary NameNode: checkpoints (merges the EditLog into the FSImage) periodically (every hour by default)
- Backup NameNode: introduced in 0.21 (not released yet); a real-time Secondary NameNode, or a remote EditLog
- The DataNode map and other runtime info exist only in NameNode memory

9. High Availability Considerations
- Availability in mainstream Hadoop: the NameNode is a SPOF; a NameNode failure may cause service interruption for minutes and, in the worst case, data loss covering one checkpoint period
- Possible solution: DRBD + Linux-HA: a mature failover mechanism; service interruption for minutes; almost no data loss
- Another solution: a NameNode Cluster (NNC) extension: continuous service and almost no data loss, but it requires modifying the code and trading consistency against performance

10. HDFS+NNC Architecture (diagram)
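The in-memory NameNode structures described above (Map(file, blocks), the DataNode map, and the blocks map) can be sketched roughly as follows. This is a hypothetical illustration, not Hadoop's actual code; the class and method names are invented.

```python
# Hypothetical sketch (not Hadoop's real implementation) of the three
# in-memory maps a NameNode keeps: Map(file, blocks),
# Map(living datanode, blocks), and Map(block, file/datanodes).

class NameNodeState:
    def __init__(self):
        self.file_blocks = {}      # namespace info: file path -> ordered block ids
        self.datanode_blocks = {}  # living datanode -> set of block ids it holds
        self.block_locations = {}  # block id -> (file path, set of datanodes)

    def add_block(self, path, block_id):
        """Allocate a new block for a file (namespace update)."""
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = (path, set())

    def block_received(self, datanode, block_id):
        """A datanode reports that it now stores a replica of a block."""
        self.datanode_blocks.setdefault(datanode, set()).add(block_id)
        self.block_locations[block_id][1].add(datanode)

    def locate(self, path):
        """Answer a client read: which datanodes hold each block of a file."""
        return [(b, sorted(self.block_locations[b][1]))
                for b in self.file_blocks.get(path, [])]

nn = NameNodeState()
nn.add_block("/logs/a.log", "blk_1")
nn.block_received("dn1", "blk_1")
nn.block_received("dn2", "blk_1")
print(nn.locate("/logs/a.log"))  # [('blk_1', ['dn1', 'dn2'])]
```

Note how `block_received` mirrors slide 8's point that this state exists only in memory: it is rebuilt from datanode block reports, not persisted in the FSImage/EditLog.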
11. NNC Design
- Master & slave, 1:N; the master synchronizes the FSNamesystem to the slaves
- ZooKeeper works as a registry; clients and datanodes can look up the namenode list from it
- The DFSClient can access multiple namenodes for read operations
- Failover is controlled by Linux-HA so far, which gets namenode status info through the ClientProtocol

12. Update Events
- NNU_NOP           // nothing to do
- NNU_BLK           // add or remove a block
- NNU_INODE         // add, remove, or modify an inode (add or remove a file; new block allocation)
- NNU_NEWFILE       // start a new file
- NNU_CLSFILE       // close a new file
- NNU_MVRM          // move or remove a file
- NNU_MKDIR         // mkdir
- NNU_LEASE         // add/update or release a lease
- NNU_LEASE_BATCH   // update a batch of leases
- NNU_DNODEHB_BATCH // batch of datanode heartbeats
- NNU_DNODEREG      // datanode register
- NNU_DNODEBLK      // block report
- NNU_DNODERM       // remove a datanode
- NNU_BLKRECV       // block-received message from a datanode
- NNU_REPLICAMON    // replication monitor work
- NNU_WORLD         // bootstrap a slave node
- NNU_MASSIVE       // bootstrap a slave node

13. Performance and Other Issues
- Overhead of NameNode synchronization: for typical file I/O and MapReduce jobs (sort, wordcount), the NNC system reaches 95% of the performance of Hadoop without NNC; for metadata-write-only operations (parallel touchz or mkdir), it reaches only 15%
- Performance gain from multiple NameNodes on read-only operations: unfortunately, none has been observed so far
- Other design issues:
  - Why send updates from the master to the slaves directly, without an additional delivery node? A delivery node would introduce another SPOF and make the problem more complex.
  - Why not use ZooKeeper for failover? Linux-HA works well, and we are also evaluating whether to change to ZK; any suggestions?

14. Q&A
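The master-to-slave synchronization above can be pictured as the master applying each namespace change locally and then shipping a typed update event to every slave. The sketch below is a toy model under that assumption; the event names follow the NNU_* list, but the replay logic and class names are invented for illustration.

```python
# Hypothetical sketch of NNC-style master/slave synchronization: the master
# applies an update, then pushes the same typed event (per the NNU_* list)
# to each slave so all replicas converge. Not the actual NNC code.

NNU_MKDIR, NNU_MVRM = "NNU_MKDIR", "NNU_MVRM"

class Node:
    def __init__(self):
        self.namespace = set()  # toy FSNamesystem: just a set of paths

    def apply(self, event, path):
        """Replay one update event against the local namespace."""
        if event == NNU_MKDIR:
            self.namespace.add(path)
        elif event == NNU_MVRM:
            self.namespace.discard(path)

class Master(Node):
    def __init__(self, slaves):
        super().__init__()
        self.slaves = slaves

    def update(self, event, path):
        # Apply locally first, then forward the identical event directly
        # to every slave (no intermediate delivery node, per slide 13).
        self.apply(event, path)
        for s in self.slaves:
            s.apply(event, path)

slaves = [Node(), Node()]
master = Master(slaves)
master.update(NNU_MKDIR, "/user/wangxu")
master.update(NNU_MKDIR, "/tmp")
master.update(NNU_MVRM, "/tmp")
print(master.namespace, slaves[0].namespace)  # {'/user/wangxu'} {'/user/wangxu'}
```

This also makes the slide 13 numbers plausible: read paths touch only local state, but every metadata write pays for N extra event deliveries, which is why metadata-write-only workloads drop to 15% of stock Hadoop's throughput.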
