
MapReduce Over Lustre

Jan 20, 2015


david-luan

Integrating Hadoop with Lustre

  • 1. MapReduce over Lustre report. David Luan, Simon Huang, GaoShengGong, 2008.10–2009.6

2. Outline

  • Early research, analysis
  • Platform design & improvement
  • Test cases, test process design
  • Result analysis
  • Related work (GFS-like redundancy)
  • White paper & conclusion

3. Early research, analysis

  • HDFS and Lustre overall benchmark tests
    • IOZone
    • IOR
    • WebDAV (an indirect way to mount HDFS)
  • Hadoop platform overview
    • MapReduce
    • Three kinds of Hadoop I/O
    • Shortcoming & bottlenecks
  • Lustre platform
    • Modules analysis
    • Shortcomings

4. Early research, analysis

  • Overall benchmark tests

5. Early research, analysis

  • MapReduce flow:
    • Split the input into key-value pairs; for each K-V pair, call Map. Each Map produces a new set of K-V pairs.
    • Sort; for each distinct key, call Reduce(K, V[]), producing one K-V pair per distinct key. Output as a set of key-value pairs.

6. Early research, analysis

  • Hadoop I/O phases: Map Read → Local Write → Local Read → HTTP → Reduce Write

7. Early research, analysis
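The flow described on these slides (split → Map → sort → Reduce) can be sketched in a few lines of plain Python. This is only an illustration of the model, not Hadoop itself; the function names are invented for the example:

```python
from itertools import groupby
from operator import itemgetter

def map_reduce(records, map_fn, reduce_fn):
    # Map: each input record yields zero or more (key, value) pairs.
    pairs = [kv for rec in records for kv in map_fn(rec)]
    # Sort by key so equal keys become adjacent, then group them.
    pairs.sort(key=itemgetter(0))
    # Reduce: one call per distinct key, producing one output pair each.
    return [reduce_fn(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))]

# "Distributed grep" in this model: map emits matching lines, reduce counts them.
lines = ["error: disk", "ok", "error: net"]
hits = map_reduce(lines,
                  lambda ln: [(ln, 1)] if "error" in ln else [],
                  lambda k, vs: (k, sum(vs)))
# hits == [("error: disk", 1), ("error: net", 1)]
```

The same driver expresses word counting or sorting by swapping the map and reduce functions, which is why the apps listed below (grep, sort, log processing) all fit the model.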

  • Hadoop + HDFS
    • Job / task-level parallelism
    • Compute/storage tightly coupled
    • HDFS prefers huge files
    • Applications limited (job splitting is difficult)
      • Distributed grep
      • Distributed sort
      • Log processing
      • Data warehousing
  • Lustre
    • I/O-level parallelism
    • Compute/storage loosely coupled
    • POSIX compatible
    • Apps
      • Supercomputing

Platform comparison

8. Early research, analysis

  • HDFS shortcomings
    • Metadata design
    • No parallel I/O
    • Not general-purpose (designed for MapReduce)
  • Lustre shortcomings
    • Inadequate reliability
    • Inadequate stability
    • No native redundancy

Shortcomings comparison

9. Outline

  • Early research, analysis
  • Platform design & improvement
  • Test cases, test process design
  • Result analysis
  • Related work (GFS-like redundancy)
  • White paper & conclusion

10. Platform design & improvement

  • Two ways:
  • Java wrapper for liblustre (without the Lustre client)
    • Motivation
    • Design a method to merge the two systems: implement Hadoop's FileSystem interface with a Java wrapper, so MapReduce can work without the Lustre client.
    • Reached an impasse
  • Use the Lustre client
    • Design
    • Improvement

11. Platform design & improvement

  • The Java wrapper reached an impasse -_-
  • JNI calls into liblustre.so fail:
    • Java JNI mis-links any function whose name is the same as a system call (such as mount, read, write, etc.)
    • If we use C to call the static library (liblustre.a) and compile it into an executable, it works fine.
  • liblustre's other problems
    • The Lustre wiki does not recommend using liblustre
    • When using it, use liblustre.a instead of liblustre.so
    • liblustre depends on the gcc version

12. Platform design & improvement

  • Advantages for each Task (with Lustre)
    • Decentralized I/O
    • Lustre can write in parallel
    • Lustre is general-purpose
    • Great for non-splittable jobs

Platform design (1): advantages

13. Platform design & improvement

  • Platform design (2): modules

14. Platform design & improvement

  • Platform design (3): read/write

15. Platform design & improvement

  • Use hard links instead of HTTP shuffle before the ReduceTask starts [1]
    • Decentralizes network bandwidth usage
    • Delays the ReduceTask's actual read/write
  • Use Lustre block location info to distribute tasks [2]
    • Moves the compute to its data
    • Saves network bandwidth
    • Uses a Java child thread to run a shell command that fetches the location info (details in the white paper)
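The hard-link idea can be illustrated with a small sketch on any shared POSIX filesystem (plain Python; the paths and names are illustrative, not the actual Hadoop patch). Because every node sees the same Lustre namespace, "shuffling" a map output partition to a reducer can be a metadata-only hard link instead of an HTTP copy:

```python
import os
import tempfile

# On a shared filesystem like Lustre, map output is visible to every node,
# so a reducer can link the partition into its own directory: no data moves.
base = tempfile.mkdtemp()
map_out = os.path.join(base, "map_0.part_0")
with open(map_out, "w") as f:
    f.write("key\tvalue\n")

reduce_dir = os.path.join(base, "reduce_0")
os.mkdir(reduce_dir)
linked = os.path.join(reduce_dir, "map_0.part_0")
os.link(map_out, linked)  # same inode under a new name; no copy over the network

assert os.path.samefile(map_out, linked)
with open(linked) as f:
    assert f.read() == "key\tvalue\n"
```

The actual read of the data is thereby deferred until the ReduceTask consumes it, which is the "delay shuffle" effect described above.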

Platform improvement 1

16. Platform design & improvement

  • Platform improvement 2

Add location info as a scheduling parameter; use hard links to delay the shuffle phase

17. Outline

  • Early research, analysis
  • Platform design & improvement
  • Test cases, test process design
  • Result analysis
  • Related work
  • White paper & conclusion

18.

  • Test case design (two kinds of apps)
  • Apps of statistics (search, log processing, etc.)
    • Fine-grained tasks (job → tasks)
    • MapTask intermediate results are small
  • Apps that are poorly splittable & highly complex
    • Coarse-grained tasks (job → tasks)
    • MapTask intermediate results are big
    • Each task is compute-heavy
    • Each task needs heavy I/O

Test cases, test process design

19. Platform design & improvement

  • Apps that are highly complex and poorly splittable

Intermediate results are big; each task is compute-heavy

20. Test cases, test process design

  • Test cases:
  • Apps of statistics: WordCount
  • This test reads text files and counts each word. The output contains a word and its count, separated by a tab.
  • Poorly splittable apps: BigMapOutput
  • A map/reduce program that works on a very big non-splittable file; its map and reduce tasks just read the input file and output the same file, doing nothing else.
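A minimal stand-in for the WordCount logic (plain Python rather than the Hadoop Java job; the tab-separated output format follows the description above):

```python
from collections import Counter

def word_count(lines):
    """Count every word across the input lines (the combined Map+Reduce result)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # Emit "word<TAB>count", one pair per line, sorted by word.
    return ["%s\t%d" % (w, c) for w, c in sorted(counts.items())]

print("\n".join(word_count(["hello world", "hello lustre"])))
```

In the real benchmark each map task produces (word, 1) pairs for its input split and the reduce tasks sum them, but the end result per word is the same as this sketch.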

21. Test cases, test process design

  • Test results
  • Overall execute time
  • Time of each phase
    • Map Read phase (the most time-consuming for Lustre)
    • Local read/write and HTTP phase
    • Reduce write phase

22. Test cases, test process design

  • Test scenarios:
  • No optimization
  • Use hardlink
  • Use hardlink and location info
  • Lustre tuning
    • Stripe size=?
    • Stripe count=?

23. Outline

  • Early research, analysis
  • Platform design & improvement
  • Test cases, test process design
  • Result analysis
  • Related jobs(GFS-like redundancy)
  • White paper & conclusion

24. Result analysis

  • Result analysis
  • Conclusion

25. Result analysis

  • Test1: WordCount with a big file
    • Process one big text file (6 GB)
    • Block size = 32 MB
    • Reduce tasks = 0.95 (or 1.75) * 2 * 7 = 13
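The reduce-task count above follows Hadoop's usual sizing heuristic, reduces ≈ factor × nodes × reduce slots per node, where the factor is 0.95 (one wave of reduces) or 1.75 (two waves). With the 2 nodes and 7 slots per node used here:

```python
def reduce_tasks(factor, nodes, slots_per_node):
    # Hadoop guideline: 0.95 for a single wave of reduces, 1.75 for two waves.
    # The product is truncated to a whole number of tasks.
    return int(factor * nodes * slots_per_node)

print(reduce_tasks(0.95, 2, 7))  # 13  (0.95 * 14 = 13.3, truncated)
```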

26. Result analysis

  • Test2: WordCount with many small files
      • Process a large number of small files (10,000)
      • Reduce tasks = 0.95 * 2 * 7 = 13

27. Result analysis

  • Test3: BigMapOutput with one big file
  • Result 1:
  • Result 2: fresh memory
  • Result 3: mapred.local.dir set to its default value

28. Result analysis

  • Test4: BigMapOutput with hardlink
  • Test5: BigMapOutput with hardlink & location information

29. Result analysis

  • Test6: BigMapOutput Map Read phase
  • Conclusion
    • Map Read is the most time-consuming part:

30. Result analysis

  • Conclusion 1: Hadoop + HDFS I/O phases: Map Read → Local Write → Local Read → HTTP → Reduce Write

31. Result analysis Conclus