
Apache hadoop technology : Beginners

Apr 14, 2017



Shweta Patnaik
Page 1: Apache hadoop technology : Beginners

www.company.com

PRESENTED BY : SHWETA PATNAIK-120101CSR014

Apache Hadoop Technology

Page 2:

Contents :
• Introduction to Hadoop
• Hadoop architecture
• What is Apache Hadoop
• Data flow
• MapReduce
• HDFS
• YARN Framework
• Who uses Hadoop
• Hadoop in the enterprise
• Advantages
• Conclusion

Page 3:

What is Hadoop :
• Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is a top-level project of the Apache Software Foundation.

• At its core, Hadoop has two major layers:
– (a) the processing/computation layer (MapReduce), and
– (b) the storage layer (Hadoop Distributed File System, HDFS).

Page 4:

Hadoop Architecture :

Page 5:

What is Apache Hadoop :
• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

• It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Page 6:

Data flow :

[Diagram components: Web Servers, Scribe Servers, Network Storage, Hadoop Cluster, Oracle RAC, MySQL]

Page 7:

MapReduce :
• Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

• A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks.
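To make the split, map, sort and reduce flow concrete, here is a minimal single-process Python sketch of the classic word-count example. It only simulates the phases in plain Python: the function names are hypothetical and this is not the Hadoop MapReduce API (real Hadoop jobs are typically written in Java against the `Mapper` and `Reducer` classes).

```python
from collections import defaultdict
from itertools import chain


def map_words(chunk):
    """Map task: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_and_sort(pairs):
    """Group map output by key and sort by key, standing in for the
    framework's sort step between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_counts(key, values):
    """Reduce task: sum every count emitted for one word."""
    return key, sum(values)


def word_count(chunks):
    # Chunks are independent, so on a cluster each map_words call could
    # run as a separate map task in parallel; here they run one by one.
    pairs = chain.from_iterable(map_words(chunk) for chunk in chunks)
    return dict(reduce_counts(k, vs) for k, vs in shuffle_and_sort(pairs))


chunks = ["the quick brown fox", "the lazy dog", "The end"]
print(word_count(chunks))
# {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Each input chunk plays the role of one independent split; only the reduce step needs to see all the values for a given key, which is why the sort/group step sits between the two phases.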

Page 8:

Cont. :
• Job – A "full program": an execution of a Mapper and Reducer across a data set.
• Task – An execution of a Mapper or a Reducer on a slice of data (a.k.a. Task-In-Progress, TIP).
• Task Attempt – A particular instance of an attempt to execute a task on a machine.
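The difference between a task and a task attempt can be sketched as follows: one task may need several attempts if a machine fails. This is a simplified Python model under stated assumptions: the classes, the `run_task` helper, the machine names and the `max_attempts=4` limit are all hypothetical and illustrative, not Hadoop's own types or settings.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class TaskAttempt:
    task_id: str
    attempt_no: int
    machine: str
    succeeded: bool = False


@dataclass
class Task:
    task_id: str
    data_slice: list
    attempts: list = field(default_factory=list)


def run_task(task, run_fn, machines, max_attempts=4):
    """Run one task, creating a new TaskAttempt on the next machine each
    time an attempt fails, until one succeeds or max_attempts is reached."""
    for attempt_no, machine in zip(range(1, max_attempts + 1), cycle(machines)):
        attempt = TaskAttempt(task.task_id, attempt_no, machine)
        task.attempts.append(attempt)
        try:
            result = run_fn(task.data_slice, machine)
        except RuntimeError:
            continue  # this attempt failed; retry on the next machine
        attempt.succeeded = True
        return result
    raise RuntimeError(f"{task.task_id} failed after {len(task.attempts)} attempts")


def flaky_sum(data, machine):
    """Hypothetical task body: fails on node-1 to simulate a bad machine."""
    if machine == "node-1":
        raise RuntimeError("simulated machine failure")
    return sum(data)


task = Task("task_0001", [1, 2, 3])
print(run_task(task, flaky_sum, ["node-1", "node-2"]))  # 6
print([(a.machine, a.succeeded) for a in task.attempts])
# [('node-1', False), ('node-2', True)]
```

Here the single task `task_0001` ends up with two attempts: the first fails on `node-1`, and the second succeeds on `node-2`.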

Page 9:

MapReduce High level :

[Diagram: a client computer submits a MapReduce job to the JobTracker on the master node, which distributes work to TaskTrackers on three slave nodes; each TaskTracker runs a task instance.]
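The master/slave split in this diagram can be sketched as a master handing tasks out to TaskTrackers. This minimal Python sketch assumes plain round-robin assignment, which is a deliberate simplification: the real JobTracker assigns work as TaskTrackers report free task slots and prefers nodes that hold a task's input data. The function and node names are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def assign_tasks(task_ids, task_trackers):
    """Simplified master-node sketch: hand tasks to TaskTrackers in
    round-robin order and return the tasks assigned to each slave node."""
    if not task_trackers:
        raise ValueError("no TaskTrackers available")
    assignment = defaultdict(list)
    for task_id, tracker in zip(task_ids, cycle(task_trackers)):
        assignment[tracker].append(task_id)
    return dict(assignment)


tasks = [f"map_{i}" for i in range(5)]
print(assign_tasks(tasks, ["slave-1", "slave-2", "slave-3"]))
# {'slave-1': ['map_0', 'map_3'], 'slave-2': ['map_1', 'map_4'], 'slave-3': ['map_2']}
```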

Page 10:

HDFS :
• HDFS is a distributed file system that stores large files as blocks spread across the machines of a cluster and provides high-throughput access to application data.

• Features :
– It is suitable for distributed storage and processing.
– Hadoop provides a command interface to interact with HDFS.
– The built-in web servers of the NameNode and DataNode help users easily check the status of the cluster.
– It provides streaming access to file system data.
– HDFS provides file permissions and authentication.
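How HDFS stores a file can be illustrated by splitting it into fixed-size blocks and placing several copies of each block on different DataNodes. The sketch below uses 128 MB blocks (the default block size in Hadoop 2 and later) and 3 replicas (the default replication factor). The rotation-based placement and the helper and node names are hypothetical simplifications; HDFS itself uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # default HDFS block size in Hadoop 2 and later
REPLICATION = 3      # default HDFS replication factor


def plan_blocks(file_size_mb, datanodes,
                block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and choose `replication` distinct
    DataNodes for each block by simple rotation (HDFS itself uses a
    rack-aware placement policy instead)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    plan = []
    for b in range(math.ceil(file_size_mb / block_size_mb)):
        size = min(block_size_mb, file_size_mb - b * block_size_mb)
        replicas = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        plan.append({"block": b, "size_mb": size, "replicas": replicas})
    return plan


for block in plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"]):
    print(block)
# {'block': 0, 'size_mb': 128, 'replicas': ['dn1', 'dn2', 'dn3']}
# {'block': 1, 'size_mb': 128, 'replicas': ['dn2', 'dn3', 'dn4']}
# {'block': 2, 'size_mb': 44, 'replicas': ['dn3', 'dn4', 'dn1']}
```

A 300 MB file becomes three blocks, the last holding only the remaining 44 MB. Because every block lives on three different DataNodes, losing any single node still leaves two copies of each block.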

Page 11:

Architecture :

Page 12:

YARN Framework :
• Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology.

• YARN, introduced in Hadoop 2, is the foundation of the new generation of Hadoop. It separates cluster resource management from data processing, so processing engines other than MapReduce can share the same cluster.

• It provides resource management and a central platform to deliver consistent operations, security, and data governance tools across Hadoop clusters.

• It gives developers a consistent framework for writing data access applications that run in Hadoop.

Page 13:

Cont. :
• Some features are :
– Multi-tenancy
– Cluster utilization
– Scalability
– Compatibility
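YARN's ResourceManager gives applications resources as containers (bundles of memory and CPU) on the cluster's NodeManagers. The sketch below shows how several applications can share the same nodes, which is the idea behind the multi-tenancy and cluster-utilization features listed above. It assumes a single memory dimension and a first-fit policy, both simplifications. Real YARN schedulers such as the Capacity and Fair schedulers organize applications into queues and also track CPU. The function, node and application names are hypothetical.

```python
def allocate_containers(requests, free_mb):
    """First-fit sketch of a resource manager granting memory containers.

    requests: list of (app_id, memory_mb) container requests.
    free_mb: dict mapping node name -> free memory in MB (not modified).
    Returns (grants, remaining), where each grant is (app_id, node) and
    node is None when no node currently has enough free memory.
    """
    remaining = dict(free_mb)
    grants = []
    for app_id, mem in requests:
        node = next((n for n, free in remaining.items() if free >= mem), None)
        if node is not None:
            remaining[node] -= mem
        grants.append((app_id, node))
    return grants, remaining


nodes = {"nm-1": 4096, "nm-2": 2048}
requests = [("app-1", 2048), ("app-2", 3072), ("app-1", 2048), ("app-3", 1024)]
grants, remaining = allocate_containers(requests, nodes)
print(grants)
# [('app-1', 'nm-1'), ('app-2', None), ('app-1', 'nm-1'), ('app-3', 'nm-2')]
print(remaining)
# {'nm-1': 0, 'nm-2': 1024}
```

Three different applications end up running side by side on the same two nodes. The 3072 MB request waits because no single node has that much free memory at that moment.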

Page 14:

Architecture :

Page 15:

Who Uses Hadoop :
• Amazon/A9
• Facebook
• Google
• IBM
• Joost
• Last.fm
• New York Times
• PowerSet
• Veoh
• Yahoo!

Page 16:

Page 17:

Hadoop in the Enterprise :
• Accelerating nightly batch business processes
• Storing extremely high volumes of data
• Creating automatic, redundant backups
• Improving the scalability of applications
• Using Java for data processing instead of SQL
• Producing just-in-time (JIT) feeds for dashboards and business intelligence (BI)
• Handling urgent, ad hoc requests for data
• Turning unstructured data into relational data
• Taking on tasks that require massive parallelism
• Moving existing algorithms, code, frameworks, and components to a highly distributed computing environment

Page 18:

Advantage :

• The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient: it automatically distributes the data and work across the machines and, in turn, utilizes the underlying parallelism of the CPU cores.

• Hadoop does not rely on hardware to provide fault tolerance and high availability (FTHA); rather, the Hadoop library itself has been designed to detect and handle failures at the application layer.

Page 19:

• Servers can be added or removed from the cluster dynamically and Hadoop continues to operate without interruption.

• Another big advantage of Hadoop is that, apart from being open source, it is portable across platforms because it is Java-based.

Page 20:

Conclusion :
• Apache Hadoop is a fast-growing data framework.
• Apache Hadoop offers a free, cohesive platform that encapsulates:
– Data integration
– Data processing
– Workflow scheduling
– Monitoring

Page 21:

THANK YOU