Page 1: Apache hadoop by shah

APACHE HADOOP

SHAH HUSSAIN

1213313318

Page 2: Apache hadoop by shah

DATA IS EVERYWHERE

DATA IS IMPORTANT

Page 3: Apache hadoop by shah

What is Hadoop?

Page 5: Apache hadoop by shah

Motivation of Hadoop

• How do you scale up applications?
  – Run jobs processing 100s of terabytes of data
  – Takes 11 days to read on 1 computer

• Need lots of cheap computers
  – Fixes speed problem (15 minutes on 1000 computers), but…
  – Reliability problems
    • In large clusters, computers fail every day
    • Cluster size is not fixed

• Need common infrastructure
  – Must be efficient and reliable
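
The figures above can be checked with a quick back-of-envelope calculation. This is only a sketch under stated assumptions: "100s of terabytes" is taken as exactly 100 TB (decimal units), the work is assumed to split perfectly evenly across machines, and the function names are made up for illustration.

```python
# Back-of-envelope check of the slide's figures.
# Assumption: "100s of terabytes" is taken as exactly 100 TB (decimal units).
DATA_BYTES = 100 * 10**12
SINGLE_MACHINE_DAYS = 11   # from the slide
MACHINES = 1000            # from the slide

SECONDS_PER_DAY = 24 * 60 * 60


def implied_throughput_mb_per_s(data_bytes, days):
    """Read rate (MB/s) one machine needs to finish in the given number of days."""
    return data_bytes / (days * SECONDS_PER_DAY) / 10**6


def parallel_read_minutes(days, machines):
    """Ideal read time in minutes if the work divides evenly across machines."""
    return days * 24 * 60 / machines


throughput = implied_throughput_mb_per_s(DATA_BYTES, SINGLE_MACHINE_DAYS)
minutes = parallel_read_minutes(SINGLE_MACHINE_DAYS, MACHINES)

print(f"Implied single-machine read rate: {throughput:.0f} MB/s")
print(f"Ideal read time on {MACHINES} machines: {minutes:.1f} minutes")
```

Under these assumptions, 11 days on one machine works out to roughly 16 minutes on 1000 machines, which agrees with the "15 minutes" quoted on the slide. Real clusters lose some of this to coordination overhead and failures, which is the reliability problem the slide goes on to raise.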

Page 6: Apache hadoop by shah

Motivation of Hadoop

• Open Source Apache Project

• Hadoop Core includes:

– Distributed File System - distributes data

– Map/Reduce - distributes application

• Written in Java

• Runs on

– Linux, Mac OS X, Windows, and Solaris

– Commodity hardware
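
The Map/Reduce model listed under Hadoop Core can be illustrated with a tiny word count. This is a single-process sketch of the idea only, not Hadoop's API: real Hadoop jobs are written against its Java classes and run across many machines, and every function name below is hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a cluster, each document (or file block) would be mapped on a
    # different machine; here the phases simply run one after another.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["data is everywhere", "data is important"])
print(counts)
```

The point of the split is that the map and reduce functions only ever see one record or one key at a time, so a framework is free to run many copies of them in parallel and rerun any that fail.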

Page 7: Apache hadoop by shah

Fun Fact about Hadoop

"The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid’s term."

-- Doug Cutting, Hadoop project creator

Page 8: Apache hadoop by shah

History of Hadoop

• Doug Cutting, working on Apache Nutch

• 2004: Google publishes the “Map-reduce” paper

• “It is an important technique!”

• Nutch is extended with Map-reduce

• The great journey begins…

Page 15: Apache hadoop by shah

Nowadays…

• When you visit Yahoo!, you are interacting with data processed with Hadoop!

Page 16: Apache hadoop by shah

Nowadays…

• Yahoo! has ~20,000 machines running Hadoop

• The largest clusters are currently 2000 nodes

• Several petabytes of user data (compressed, unreplicated)

• Yahoo! runs hundreds of thousands of jobs every month

Page 17: Apache hadoop by shah

Applications…

• Who uses Hadoop?

• Amazon

• AOL

• Facebook

• Fox Interactive Media

• Google

• IBM

• New York Times

• PowerSet (now Microsoft)

• Quantcast

• Rackspace/Mailtrust

• Veoh

• Yahoo!

Page 18: Apache hadoop by shah

References

• http://hadoop.apache.org/

• http://en.wikipedia.org/wiki/Apache_Hadoop

• https://github.com/apache/hadoop

• http://www.cloudera.com/content/cloudera/en/about/hadoop-and-big-data.html

Page 19: Apache hadoop by shah

Questions?

Page 20: Apache hadoop by shah

THANK YOU!

