Page 1: HDFS .

• HDFS

https://store.theartofservice.com/itil-2011-foundation-complete-certification-kit-fourth-edition-study-guide-ebook-and-online-course.html

Page 2: HDFS .

Apache Hive - Features

1 Apache Hive supports analysis of large datasets stored in Hadoop's HDFS and in compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL while maintaining full support for map/reduce. To accelerate queries, it provides indexes, including bitmap indexes.

Page 3: HDFS .

Apache Hadoop

1 The entire Apache Hadoop “platform” is now commonly considered to consist of the Hadoop kernel,

MapReduce and Hadoop Distributed File System (HDFS), as well as a

number of related projects – including Apache Hive, Apache

HBase, and others.


Page 4: HDFS .

Apache Hadoop - Architecture

1 Hadoop consists of the Hadoop Common package, which provides filesystem and OS

level abstractions, a MapReduce engine (either MapReduce or YARN) and the Hadoop Distributed File System (HDFS). The Hadoop Common package contains the necessary

Java ARchive (JAR) files and scripts needed to start Hadoop. The package also provides

source code, documentation and a contribution section that includes projects

from the Hadoop Community.


Page 5: HDFS .

Apache Hadoop - Architecture

1 HDFS uses rack awareness when replicating data, trying to keep different copies of the data on different racks.
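
The idea of keeping copies on different racks can be sketched roughly as follows. This is a minimal illustration and not HDFS's actual placement policy: the function name `place_replicas`, the topology dictionary, and the rule of simply visiting racks round-robin are all assumptions made for the example.

```python
from itertools import cycle

def place_replicas(topology, replication=3):
    """Pick `replication` distinct nodes, spreading them across racks.

    `topology` maps a rack name to a list of node names (hypothetical layout).
    Racks are visited round-robin, so copies land on different racks
    whenever enough racks exist.
    """
    # Copy each rack's node list so the caller's data is not modified.
    remaining = {rack: list(nodes) for rack, nodes in sorted(topology.items())}
    total = sum(len(nodes) for nodes in remaining.values())
    if replication > total:
        raise ValueError("not enough nodes for the requested replication")

    chosen = []
    for rack in cycle(list(remaining)):
        if len(chosen) == replication:
            break
        if remaining[rack]:
            # Take the next unused node from this rack.
            chosen.append((rack, remaining[rack].pop(0)))
    return chosen
```

With three racks and a replication factor of three, each copy ends up on its own rack; with only two racks, the third copy falls back to a second node on the first rack.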


Page 6: HDFS .

Apache Hadoop - Architecture

1 In clusters where the Hadoop MapReduce engine is deployed

against an alternate file system, the NameNode, secondary NameNode

and DataNode architecture of HDFS is replaced by the file-system-specific

equivalent.


Page 7: HDFS .

Apache Hadoop - Hadoop distributed file system

1 Each datanode serves up blocks of data over the network using a block protocol specific to

HDFS


Page 8: HDFS .

Apache Hadoop - Hadoop distributed file system

1 HDFS is not fully POSIX-compliant, because the requirements for a POSIX file-system differ from the

target goals for a Hadoop application


Page 9: HDFS .

Apache Hadoop - Hadoop distributed file system

1 HDFS added high-availability capabilities, as announced for

release 2.0 in May 2012, allowing the main metadata server (the

NameNode) to be failed over manually to a backup in the event of failure. The project has also started

developing automatic fail-over.


Page 10: HDFS .

Apache Hadoop - Hadoop distributed file system

1 HDFS Federation, a new addition, aims to tackle this problem to a

certain extent by allowing multiple name-spaces served by separate

namenodes.
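
One way a client might route paths when several namenodes each serve their own namespace is a longest-prefix lookup in a mount table. The sketch below is only an illustration of that idea: the function `resolve_namenode`, the table contents, and the namenode names are hypothetical and do not describe HDFS Federation's real configuration format.

```python
def resolve_namenode(path, mount_table):
    """Return the namenode responsible for `path` by longest-prefix match.

    `mount_table` maps a namespace prefix (e.g. "/user") to a namenode name.
    """
    best = None
    for prefix, namenode in mount_table.items():
        p = prefix.rstrip("/")
        # Match the prefix itself or anything below it, but not "/database"
        # against the prefix "/data".
        if path == p or path.startswith(p + "/"):
            if best is None or len(p) > len(best[0]):
                best = (p, namenode)
    if best is None:
        raise KeyError(f"no namespace covers {path}")
    return best[1]

# Hypothetical table: three namespaces served by three namenodes.
MOUNTS = {"/user": "nn1", "/data": "nn2", "/data/logs": "nn3"}
```

Here `resolve_namenode("/data/logs/app.log", MOUNTS)` picks the most specific namespace, `/data/logs`, rather than `/data`.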


Page 11: HDFS .

Apache Hadoop - Hadoop distributed file system

1 An advantage of using HDFS is data awareness between the job tracker and task

tracker
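
Data awareness of this kind can be illustrated with a small scheduling sketch: given the nodes that hold a block, prefer a task tracker on one of those nodes, then one in the same rack, then any idle tracker. The function `pick_tracker`, its arguments, and the three-level preference order are assumptions chosen for the example, not the job tracker's actual algorithm.

```python
def pick_tracker(block_hosts, idle_trackers, rack_of):
    """Choose an idle task tracker for a task that reads one block.

    Preference order (an assumed, simplified policy):
    1. a tracker on a node that stores the block,
    2. a tracker in the same rack as such a node,
    3. any idle tracker.
    """
    for tracker in idle_trackers:
        if tracker in block_hosts:
            return tracker, "node-local"
    block_racks = {rack_of[host] for host in block_hosts}
    for tracker in idle_trackers:
        if rack_of[tracker] in block_racks:
            return tracker, "rack-local"
    if idle_trackers:
        return idle_trackers[0], "remote"
    return None, "none"
```

The closer the chosen tracker is to the data, the less traffic has to cross the network to run the task.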


Page 12: HDFS .

Apache Hadoop - Hadoop distributed file system

1 HDFS was designed for mostly immutable files and may not be suitable for systems requiring concurrent write-operations.


Page 13: HDFS .

Apache Hadoop - Hadoop distributed file system

1 Another limitation of HDFS is that it cannot be mounted directly by an existing

operating system. Getting data into and out of the HDFS file system, an action that often needs to be performed before and

after executing a job, can be inconvenient. A Filesystem in Userspace (FUSE) virtual

file system has been developed to address this problem, at least for Linux and some

other Unix systems.


Page 14: HDFS .

Apache Hadoop - Hadoop distributed file system

1 File access can be achieved through the native Java API, the Thrift API to generate a client in the language of

the users' choosing (C++, Java, Python, PHP, Ruby, Erlang, Perl,

Haskell, C#, Cocoa, Smalltalk, and OCaml), the command-line interface,

or browsed through the HDFS-UI webapp over HTTP.


Page 15: HDFS .

Apache Hadoop - Other file systems

1 HDFS: Hadoop's own rack-aware file system. This is designed to scale to

tens of petabytes of storage and runs on top of the file systems of the underlying operating systems.


Page 16: HDFS .

Apache Hadoop - Other file systems

1 MapR's maprfs file system. This system provides inherent high availability and transactionally correct snapshots and mirrors, while offering higher scaling than HDFS and higher performance. Maprfs is available as part of the MapR distribution and as a native option on Elastic Map Reduce from Amazon's web services, as well as on Google Compute Engine.

Page 17: HDFS .

Apache Hadoop - Other file systems

1 In May 2011, MapR Technologies, Inc. announced the availability of an alternative file system for Hadoop, which replaced the HDFS file system with a full random-access read/write file system with advanced features like snapshots and mirrors, and got rid of the single point of failure issue of the default HDFS NameNode.

Page 18: HDFS .

Apache Hadoop - Other applications

1 The HDFS file system is not restricted to MapReduce jobs


Page 19: HDFS .

Apache Hadoop - Yahoo!

1 There are multiple Hadoop clusters at Yahoo! and no HDFS file systems or MapReduce jobs are split across multiple datacenters. Every Hadoop cluster node bootstraps the Linux image, including the Hadoop distribution. Work that the clusters perform is known to include the index calculations for the Yahoo! search engine.

Page 20: HDFS .

Apache Hadoop - Commercially supported Hadoop-related products

1 EMC released EMC Greenplum Community Edition and EMC

Greenplum HD Enterprise Edition in May 2011. The community edition,

with optional for-fee technical support, consists of Hadoop, HDFS,

HBase, Hive, and the ZooKeeper configuration service. The enterprise

edition is an offering based on the MapR product, and offers proprietary features such as snapshots and wide

area replication.


Page 21: HDFS .

Apache Hadoop - Commercially supported Hadoop-related products

1 Grand Logic's JobServer product allows developers and admins to

deploy, manage and monitor their Hadoop infrastructure, with support for Hadoop job processing and HDFS

file/content management.


Page 22: HDFS .

Apache Hadoop - Commercially supported Hadoop-related products

1 In January 2012, EMC Isilon announced support for HDFS in its OneFS clustered file

system.


Page 23: HDFS .

Comparison of structured storage software - Comparison

1 HBase Key-value Yes. Major version upgrades require re-import. Yes HDFS, Amazon S3 or Amazon Elastic Block Store. Yes Yes See HDFS, S3 or EBS. Java BigTable Apache 2.0

Page 24: HDFS .

Petabyte - Usage examples

1 * In August 2012, Facebook's Hadoop clusters include the largest single

HDFS cluster known, with more than 100 PB physical disk space in a

single HDFS filesystem.


Page 25: HDFS .

Hadoop

1 * Hadoop Distributed File System (HDFS) - a distributed file-system

that stores data on the commodity machines, providing very high

aggregate bandwidth across the cluster.


Page 26: HDFS .

Hadoop

1 All the modules in Hadoop are designed with a fundamental

assumption that hardware failures (of individual machines, or racks of machines) are common and thus

should be automatically handled in software by the framework. Apache

Hadoop's MapReduce and HDFS components originally derived

respectively from Google's MapReduce and Google File System

(GFS) papers.


Page 27: HDFS .

Hadoop

1 Beyond HDFS, YARN and MapReduce, the entire Apache Hadoop “platform” is now commonly considered to consist of a number of related projects as well – Apache Pig, Apache Hive, Apache HBase, and others.

Page 28: HDFS .

Hadoop - History

1 This includes the Hadoop Distributed Filesystem (HDFS) and an implementation of

MapReduce


Page 29: HDFS .

Hadoop - Architecture

1 Hadoop consists of the Hadoop Common package, which provides filesystem and OS level abstractions, a MapReduce engine (either MapReduce/MR1 or YARN/MR2) and the Hadoop Distributed File System (HDFS). The Hadoop Common package contains the necessary Java ARchive (JAR) files and scripts needed to start Hadoop. The package also provides source code, documentation and a contribution section that includes projects from the Hadoop Community.

Page 30: HDFS .

Hadoop - Hadoop distributed file system

1 Each datanode serves up blocks of data over the network using a block protocol specific to

HDFS


Page 31: HDFS .

Hadoop - Hadoop distributed file system

1 HDFS is not fully POSIX-compliant, because the requirements for a POSIX file-system differ from the

target goals for a Hadoop application


Page 32: HDFS .

Hadoop - Hadoop distributed file system

1 HDFS added the high-availability capabilities, as announced for

release 2.0 in May 2012, allowing the main metadata server (the

NameNode) to be failed over manually to a backup in the event of failure. The project has also started

developing automatic fail-over.


Page 33: HDFS .

Hadoop - Hadoop distributed file system

1 HDFS Federation, a new addition, aims to tackle this problem to a

certain extent by allowing multiple name-spaces served by separate

namenodes.


Page 34: HDFS .

Hadoop - Hadoop distributed file system

1 Another limitation of HDFS is that it cannot be mounted directly by an existing operating system. Getting data into and out of the HDFS file system, an action that often needs to be performed before and after executing a job, can be inconvenient. A Filesystem in Userspace (FUSE) virtual file system has been developed to address this problem, at least for Linux and some other Unix systems.

Page 35: HDFS .

Hadoop - Hadoop distributed file system

1 File access can be achieved through the native Java API, the Thrift API to generate a client in the language of the users' choosing (C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml), the command-line interface, or browsed through the HDFS-UI webapp over HTTP.

Page 36: HDFS .

Hadoop - Other file systems

1 * HDFS: Hadoop's own rack-aware file system. This is designed to scale to tens of petabytes of storage and

runs on top of the file systems of the underlying Operating Systems.


Page 37: HDFS .

Hadoop - Other file systems

1 This system provides inherent high availability and transactionally correct snapshots and mirrors, while offering higher scaling than HDFS and higher performance

Page 38: HDFS .

Hadoop - Other file systems

1 * In May 2011, MapR Technologies, Inc. announced the availability of an alternative file system for Hadoop, which replaced the HDFS file system with a full random-access read/write file system with advanced features like snapshots and mirrors, and got rid of the single point of failure issue of the default HDFS NameNode.

Page 39: HDFS .

Hadoop - Commercially supported Hadoop-related products

1 * EMC released EMC Greenplum Community Edition and EMC Greenplum HD Enterprise Edition in May 2011. The community edition, with optional for-fee technical support, consists of Hadoop, HDFS, HBase, Hive, and the ZooKeeper configuration service. The enterprise edition is an offering based on the MapR product, and offers proprietary features such as snapshots and wide area replication.

Page 40: HDFS .

Hadoop - Commercially supported Hadoop-related products

1 * EMC Isilon announced support for HDFS in its

OneFS clustered file system.


Page 41: HDFS .

Hadoop - Commercially supported Hadoop-related products

1 * Grand Logic's JobServer product allows developers and admins to

deploy, manage and monitor their Hadoop infrastructure, with support for Hadoop job processing and HDFS

file/content management.


Page 42: HDFS .

HBase

1 It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop

Page 43: HDFS .

Data-intensive computing - Hadoop

1 Hadoop now encompasses multiple subprojects in addition to the base

core, MapReduce, and HDFS distributed filesystem


Page 44: HDFS .

Data-intensive computing - Hadoop

1 Hadoop includes a distributed file system called HDFS which is analogous to GFS in the Google MapReduce implementation

Page 45: HDFS .

Graph database - Distributed Graph Processing

1 * Apache Hama (http://incubator.apache.org/hama/) - a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.

Page 46: HDFS .

Graph database - Distributed Graph Processing

1 * Faunus (http://thinkaurelius.github.com/faunus/) - a Hadoop-based graph computing framework that uses Gremlin as its query language. Faunus provides connectivity to Titan, Rexster-fronted graph databases, and to text/binary graph formats stored in HDFS. Faunus is developed by Aurelius (http://thinkaurelius.com).

Page 47: HDFS .

Apache Hive - Features

1 Apache Hive supports analysis of large datasets stored in Hadoop's HDFS and compatible file systems such as Amazon S3 filesystem. It provides an SQL-like language called HiveQL while maintaining full support for map/reduce. To accelerate queries, it provides indexes, including bitmap indexes (Working with Students to Improve Indexing in Apache Hive, http://www.facebook.com/notes/facebook-engineering/working-with-students-to-improve-indexing-in-apache-hive/10150168427733920).

Page 48: HDFS .

Filesystem in Userspace - Example uses

1 * HDFS: FUSE bindings (http://wiki.apache.org/hadoop/MountableHDFS) exist for the open source Hadoop distributed filesystem

Page 49: HDFS .

John Stossel - Libertarianism

1 When the Department of Labor reissued federal guidelines in April 2010 governing the employment of unpaid interns under the Fair Labor Standards Act based on a 1947 Supreme Court decision (Fact Sheet #71: Internship Programs Under The Fair Labor Standards Act, United States Department of Labor, April 2010, http://www.dol.gov/whd/regs/compliance/whdfs71.pdf), Stossel criticized the guidelines, appearing in a police uniform during an appearance on the Fox News program America Live, commenting, "I’ve built my career on unpaid interns, and the interns told me it was great – I learned more from you than I did in college"

Page 50: HDFS .

Apache Hama - GroomServer

1 A Groom Server (groom for short) is a process that performs BSP tasks assigned by the BSPMaster. Each groom contacts the BSPMaster, takes assigned tasks, and reports its status by means of periodic piggybacks with the BSPMaster. Each groom is designed to run with HDFS or other distributed storage. Basically, a groom server and a data node should run on one physical node.

Page 51: HDFS .

Vertica - Optimizations

1 The Vertica Analytic Database runs on grids of Linux-based commodity servers. It is also available as a hosted DBMS provisioned by and running on the Amazon Elastic Compute Cloud. The product integrates with Hadoop to leverage HDFS within Vertica and provide access to Vertica's data through MapReduce.

Page 52: HDFS .

Cloudera Impala - Description

1 The Apache-licensed Impala project brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL

queries to data stored in HDFS and Apache HBase without requiring data movement or transformation. Impala is integrated with Hadoop to use the

same file and data formats, metadata, security and resource

management frameworks used by MapReduce, Apache Hive, Apache Pig

and other Hadoop software.


Page 53: HDFS .

Cloudera Impala - Description

1 * Supports HDFS and Apache HBase storage

Page 54: HDFS .

MarkLogic Server - Technology

1 In addition to the distributed, scale-out architecture expected from a

NoSQL database, it has role-based security features, JSON storage,

direct use of Apache Hadoop Distributed File System (HDFS), multiple indexing strategies and

ACID consistency


Page 55: HDFS .

OneFS - Protocols

1 OneFS is equipped with options for accessing storage via NFS, CIFS/SMB, FTP, HTTP, iSCSI, and HDFS. It can utilize non-local authentication such as Active Directory, LDAP, and NIS. It is also capable of interfacing with backup devices using NDMP.

Page 56: HDFS .

Tip (gratuity) - United States

1 These non-eligible employees include dishwashers, cooks, chefs, and janitors (FLSA, US DoL, http://www.dol.gov/whd/regs/compliance/whdfs15.pdf).

Page 57: HDFS .

Greenplum - Technology

1 Greenplum HD is a supported version of Apache Hadoop. It includes Hadoop's Distributed File System (HDFS), Hive, Pig, HBase, and ZooKeeper.

Page 58: HDFS .

Cloudera - Products and services

1 CDH contains the main, core elements of Hadoop that provide reliable, scalable distributed data

processing of large data sets (chiefly MapReduce and HDFS), as well as

other enterprise-oriented components that provide security,

high availability, and integration with hardware and other software.


Page 59: HDFS .

RCFile - Performance Benefits

1 In MapReduce-based systems, data is normally stored on a distributed system, such as Hadoop Distributed File System (HDFS), and different data blocks might be stored in different machines

Page 60: HDFS .

RCFile - Impacts

1 (http://www.slideshare.net/ydn/ahis2011-platform-hive-evolution) In addition, all the data sets stored in HDFS before RCFile have also been transformed to use the RCFile structure.

Page 61: HDFS .

IBM General Parallel File System - History

1 (File Placement Optimizer). This allows GPFS to use locally attached disks on a cluster of network connected servers rather than requiring dedicated servers with shared disks (e.g. using a SAN). GPFS-FPO is suitable for workloads with high data locality such as shared-nothing database clusters like SAP HANA and DB2 DPF, and can be used as an HDFS-compatible filesystem.

Page 62: HDFS .

IBM General Parallel File System - Architecture

1 It is interesting to compare this with Hadoop's HDFS filesystem, which is designed to store similar or greater

quantities of data on commodity hardware — that is, datacenters without RAID disks and a Storage

Area Network (SAN).


Page 63: HDFS .

IBM General Parallel File System - Architecture

1 # HDFS also breaks files up into blocks, and stores them on different filesystem nodes.


Page 64: HDFS .

IBM General Parallel File System - Architecture

1 # HDFS does not expect reliable disks, so instead stores copies of the blocks on different nodes. The failure of a node containing a single copy of a block is a minor issue, dealt with by re-replicating another copy of the set

of valid blocks, to bring the replication count back up to the

desired number. In contrast, while GPFS supports recovery from a lost

node, it is a more serious event, one that may include a higher risk of data

being (temporarily) lost.
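
The re-replication step described above can be sketched as a small simulation. It is a simplified model under stated assumptions: the function `re_replicate`, the block map layout, and the choice of the alphabetically first spare node are hypothetical, and real HDFS also weighs racks, load and free space.

```python
def re_replicate(block_map, live_nodes, target=3):
    """Return a new block map with under-replicated blocks topped up.

    `block_map` maps a block id to the set of nodes holding a copy.
    Copies on nodes missing from `live_nodes` are treated as lost.
    """
    live = sorted(live_nodes)
    repaired = {}
    for block, holders in block_map.items():
        valid = {node for node in holders if node in live_nodes}
        if not valid:
            # No surviving copy to copy from, so the block stays empty.
            repaired[block] = set()
            continue
        # Live nodes that do not yet hold this block.
        spare = [node for node in live if node not in valid]
        while len(valid) < target and spare:
            valid.add(spare.pop(0))
        repaired[block] = valid
    return repaired
```

If node `n2` fails while every block has three copies, each affected block still has two valid copies, and one more is created on another live node to bring the count back to three.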


Page 65: HDFS .

IBM General Parallel File System - Architecture

1 # GPFS supports full POSIX filesystem semantics. HDFS and GFS do not support full POSIX compliance.

Page 66: HDFS .

IBM General Parallel File System - Architecture

1 # GPFS breaks files up into small blocks. Hadoop HDFS prefers much larger blocks, as this reduces the storage requirements of the Namenode. Small blocks or many small files fill up a filesystem's indices fast, and so limit the filesystem's size.
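
A short calculation shows why block size and file count matter for the index. The block sizes and file sizes below are illustrative assumptions only, not figures taken from the text; the helper `block_count` is likewise hypothetical.

```python
def block_count(file_sizes, block_size):
    """Number of blocks needed when each file is split into fixed-size blocks."""
    # Integer ceiling division: a partly filled last block still counts.
    return sum((size + block_size - 1) // block_size for size in file_sizes)

MB = 1024 * 1024
one_big_file = [1024 * MB]           # a single 1 GiB file
many_small_files = [1 * MB] * 1024   # the same data as 1024 files of 1 MiB

print(block_count(one_big_file, 64 * MB))      # 16
print(block_count(one_big_file, 1 * MB))       # 1024
print(block_count(many_small_files, 64 * MB))  # 1024
```

Under these assumed sizes, the same gigabyte of data needs 16 index entries with large blocks, but 1024 entries with small blocks or with many small files, since a small file cannot share a block with another file.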


Page 67: HDFS .

Spark (cluster computing framework)

1 Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS) (Figure showing Spark in relation to other open-source Software projects including Hadoop, https://amplab.cs.berkeley.edu/software/). However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce, for certain applications

Page 68: HDFS .

Platform Computing - Open-source participation

1 * Platform joined the Hadoop project in 2011, and is focused on enhancing the Hadoop Distributed File System (Platform Computing Announces Commercial Support for Apache Hadoop Distributed File System (HDFS), http://www.platform.com/press-releases/2011/PlatformAnnouncesCommercialSupportforApacheHDFS)

Page 69: HDFS .

Peter Schiff - Broadcasting

1 According to Schiff, The Daily Show took his remarks out of context by cutting and editing remarks made in the context of a four-hour interview (http://www.schiffradio.com/b/The-Daily-Show:-The-Daily-Show:-Intellectually-Dishonest-about-the-Intellectually-Disabled/-525361918630098994.html). Intellectually disabled people are currently exempt from the minimum wage (http://www.dol.gov/whd/regs/compliance/whdfs39.pdf), a fact which Mr

Page 70: HDFS .

Quantcast File System

1 Quantcast File System (QFS) is an open-source distributed file system software package for large-scale MapReduce or other batch-processing workloads. It was designed as an alternative to Apache Hadoop’s HDFS, intended to deliver better performance and cost-efficiency for large-scale processing clusters.

Page 71: HDFS .

Isilon Systems - Technology and architecture

1 In addition, Isilon uniquely supports HDFS as a protocol, allowing Hadoop analytics to be performed on files resident on the storage (Interview with Doug Cutting, http://isilonblog.emc.com/an-interview-with-doug-cutting-the-founder-of-hadoop/)

Page 72: HDFS .

Space Telescope Science Institute - Science community service

1 To date, these programs include the Hubble Deep Field (HDF), the Hubble

Deep Field South (HDFS), and the Ultra Deep Field (UDF)


Page 73: HDFS .

Clustered filesystem - Examples

1 * HDFS (Apache Software Foundation)

Page 74: HDFS .

Druid (open-source data store) - Architecture (Druid Project Documentation, http://druid.io/docs/latest)

1 (Druid: A Real-time Analytical Data Store, Metamarkets, retrieved 6 February 2014, http://static.druid.io/docs/druid.pdf) In addition, the cluster includes external dependencies for coordination (Apache ZooKeeper), storage of metadata (MySQL), and a deep storage facility (e.g., HDFS, Amazon S3, or Apache Cassandra).

Page 75: HDFS .

Buell Motorcycle Company - History

1 By 2008, Harley's credit arm, Harley-Davidson Financial Services (HDFS), was struggling, and the lower resale

value of Buell motorcycles meant that new bike sales were significantly

affected


Page 76: HDFS .

Boundless Informant - Technology

1 According to published slides, Boundless Informant leverages Free and Open Source Software (and is therefore available to all NSA developers) and corporate services hosted in the cloud. The tool uses HDFS, MapReduce, and Accumulo (formerly Cloudbase) for data processing.

Page 77: HDFS .

Astronomical acronyms - H

1 * HDFS Hubble Deep Field South - a deep field surveyed by many telescopes at different wavelengths, first selected for deep field observations by the Hubble Space Telescope