Apache Hadoop by Shah

Jul 17, 2015

APACHE HADOOP
SHAH HUSSAIN
1213313318

DATA IS EVERYWHERE
DATA IS IMPORTANT
What is Hadoop?

Motivation of Hadoop
How do you scale up applications?
Run jobs processing 100s of terabytes of data
Takes 11 days to read on 1 computer
Need lots of cheap computers
Fixes speed problem (15 minutes on 1000 computers), but
Reliability problems
In large clusters, computers fail every day
Cluster size is not fixed
Need common infrastructure
Must be efficient and reliable
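
The speed-up quoted on this slide follows from simple division, assuming the read is split evenly across machines with no coordination, network, or failure overhead. That is an idealization, and the slide's own reliability points explain why real clusters fall short of it. A minimal Python sketch of the arithmetic, where `parallel_read_minutes` is a hypothetical helper name introduced only for illustration:

```python
def parallel_read_minutes(single_machine_days: float, machines: int) -> float:
    """Idealized wall-clock minutes when a read is split evenly across machines.

    Assumes perfect linear scaling: no network, scheduling, or failure overhead.
    """
    if machines < 1:
        raise ValueError("machines must be at least 1")
    total_minutes = single_machine_days * 24 * 60  # days -> minutes
    return total_minutes / machines


if __name__ == "__main__":
    # 11 days on one computer, spread over 1000 computers.
    print(round(parallel_read_minutes(11, 1000), 2))
```

Under these assumptions the result is on the order of a quarter of an hour, which is in line with the "15 minutes on 1000 computers" figure.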

Motivation of Hadoop
Open Source Apache Project
Hadoop Core includes:
Distributed File System - distributes data
Map/Reduce - distributes application
Written in Java
Runs on Linux, Mac OS/X, Windows, and Solaris
Commodity hardware
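
To make the Map/Reduce idea concrete, here is a small word-count sketch in plain Python. It is not the Hadoop API. It runs the map, shuffle, and reduce steps sequentially in one process, whereas Hadoop would distribute them across the cluster. All function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would between the two phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["data is everywhere", "data is important"]))
```

Because each call to `map_phase` depends on only one line, and each call to `reduce_phase` on only one key, both steps can in principle run on different machines. That independence is what lets a framework distribute the application.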

Fun Fact of Hadoop
"The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid's term."
-- Doug Cutting, Hadoop project creator

History of Hadoop

Apache Nutch

Doug Cutting

Map-reduce 2004
It is an important technique!
Reads paper
Extended
Joins Yahoo! in 2006
The great journey begins

Nowadays

When you visit Yahoo!, you are interacting with data processed with Hadoop!

Nowadays
Yahoo! has ~20,000 machines running Hadoop
The largest clusters are currently 2000 nodes
Several petabytes of user data (compressed, unreplicated)
Yahoo! runs hundreds of thousands of jobs every month

Applications
Who uses Hadoop?
Amazon
AOL
Facebook
Fox Interactive Media
Google
IBM
New York Times
PowerSet (now Microsoft)
Quantcast
Rackspace/Mailtrust
Veoh
Yahoo!

References
http://hadoop.apache.org/
http://en.wikipedia.org/wiki/Apache_Hadoop
https://github.com/apache/hadoop
http://www.cloudera.com/content/cloudera/en/about/hadoop-and-big-data.html

Questions?

THANK YOU!