DELL EMC ISILON DATA LAKE WITH POWEREDGE SERVERS

Recommended Configurations

Kris Applegate, Solution Architect, Dell EMC Customer Solution Centers
Boni Bruno, Principal Solution Architect, Dell EMC Emerging Technology Team
Armando Acosta, Product Manager, Dell EMC Converged Platform Division
Sai Devulapalli, Data Analytics Practice Lead, Dell EMC Emerging Technology Team

ABSTRACT

This white paper details the validated configuration for connecting Dell EMC Isilon to Dell EMC PowerEdge servers. It also details some recommended configurations and provides guidance on optional modifications for tailoring the solution to each customer’s use case.

December 2016

Dell - Internal Use - Confidential 2

TABLE OF CONTENTS

EXECUTIVE SUMMARY
    AUDIENCE
HADOOP IN THE ENTERPRISE
SHARED STORAGE HADOOP VS. DISTRIBUTED STORAGE HADOOP
DELL EMC ISILON
    Dell EMC Isilon X-Series Nodes
DELL EMC POWEREDGE
    Dell EMC PowerEdge FX2, PowerEdge FC630, and PowerEdge FD332
    Dell EMC PowerEdge R630
HADOOP ROLES
    Compute Node(s)
    Infrastructure Nodes
        Manager Node(s)
        Edge Node(s)
RECOMMENDED CONFIGURATIONS
    Modular Infrastructure
        Network Diagram
        Configuration
    Rack Server Infrastructure
        Network Diagram
        Configuration
    As-Tested Configuration
        Network Diagram
        Configuration
    Considerations
        Sizing Compute Nodes and Isilon Nodes
        Isilon Platform
        Server Platform
        Server CPU
        Server Memory
        Server Local Storage
        Network
DELL EMC CUSTOMER SOLUTION CENTERS
LINKS

EXECUTIVE SUMMARY

Analytics play a crucial role in any modern enterprise. One technology that enables processing data at the scale, speed, and price point needed to extract insights is Hadoop. Dell EMC can provide solutions ranging from pure do-it-yourself reference architectures all the way through complete turn-key appliances that can accommodate almost any project budget.

This white paper details the validated configurations for running Cloudera and Hortonworks Hadoop distributions on Dell EMC Isilon arrays and Dell EMC PowerEdge servers. Additionally, recommended configurations are outlined along with potential variances that can be used to accommodate differing use cases. Configurations are provided for both modular and rack server infrastructure in order to provide the most flexibility to adapt to customer requirements.

AUDIENCE

This white paper is intended for customers looking to leverage validated configurations when customizing their own Hadoop clusters, as well as Dell EMC sales makers and partners who are looking to propose solutions to customers.

Hadoop in the Enterprise

Hadoop is an open source platform that is designed to store and process large datasets in a distributed computing environment. It has two main sub-projects: Hadoop Distributed File System (HDFS) for data storage and MapReduce for data processing. Hadoop breaks down large datasets across servers or shared storage to process the data in parallel.
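To make the split-and-process-in-parallel idea concrete, here is a minimal single-machine sketch of the map and reduce phases in plain Python. It is not Hadoop code: the function names (`map_phase`, `reduce_phase`) and the sample blocks are illustrative assumptions, and on a real cluster each map task would run on a separate compute node against its own block of data.

```python
from collections import Counter
from functools import reduce


def map_phase(block):
    """Count the words in one block of input (stands in for one map task)."""
    return Counter(block.split())


def reduce_phase(partials):
    """Merge the per-block word counts into a single result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical dataset, already split into blocks the way a large file would be.
blocks = ["isilon hadoop isilon", "hadoop spark", "spark spark isilon"]

# Each block is mapped independently, then the partial results are reduced.
totals = reduce_phase(map_phase(b) for b in blocks)
```

Because each call to `map_phase` touches only its own block, the map work can be spread across as many nodes as there are blocks; only the much smaller partial counts need to be brought together in the reduce step.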

Organizations turn to Hadoop for both business and technology advantages. At a business level, Hadoop offers a compelling value proposition from a total cost of ownership standpoint. Hadoop uses industry-standard servers and storage, decreasing the cost to store and process huge datasets compared with traditional business intelligence (BI) and analytics solutions. In addition, cost efficiencies can be achieved via a data lake using scale-out NAS, such as Dell EMC Isilon.

To help organizations overcome their expertise and skills shortages and accelerate the deployment of Hadoop environments that grow with the business, Dell EMC provides a portfolio of validated reference architectures and scalable solutions for Hadoop deployments. Dell EMC backs these offerings with a wide range of professional consulting and support services.

There is no single “right answer” for success with data analytics. It is a journey of continual growth. Every organization’s data is unique, and must be treated as such. Solutions that are perfect for one company may not address the needs of another.

With this thought in mind, Dell EMC offers a wide range of products and solutions to address diverse big data and analytics challenges, from starter bundles and validated reference architectures to integrated appliances and engineered solutions, or even completely customized solutions for your specific environment.

Shared Storage Hadoop vs. Distributed Storage Hadoop

It’s a testament to Hadoop’s flexibility that it can accommodate multiple deployment models to meet varying budget, performance, capacity, and density requirements. The Dell EMC Isilon solution is a shared storage model in which the persistent filesystem data for Hadoop is stored in an Isilon NAS cluster. In the distributed model, by contrast, data is spread across the local storage of the Hadoop nodes themselves.

Figure 1. Shared Vs. Distributed Topologies

These two approaches offer varying advantages:

Shared Storage Hadoop | Distributed Storage Hadoop
--- | ---
Single source of data | Massive Scale (100s of PB+)
Reduced datacenter footprint (Storage Density) | Commodity platforms
Leverage storage platform features (Performance Tiering, Alternative RAID, Multi-Protocol) | Linear scaling
Independent scaling of storage/compute | Flexible replica model

Table 1. Shared Vs. Distributed Comparison

While this white paper focuses on the Dell EMC Isilon configurations, Dell EMC offers solutions built around both shared and distributed models in order to accommodate the breadth of our customers’ use cases. Please contact your Dell EMC sales team or your Dell EMC Customer Solution Centers Solution Architect if you wish to discuss the distributed solution in detail.

Dell EMC Isilon

Dell EMC® Isilon® scale-out storage solutions are designed for enterprises that want to manage their data, not their storage. Isilon storage systems are powerful yet simple to install, manage, and scale to virtually any size. And, unlike traditional enterprise storage, Isilon solutions stay simple no matter how much storage capacity is added, how much performance is required, or how business needs change in the future. We’re challenging enterprises to think differently about their storage, because when they do, they’ll recognize there’s a better, simpler way with Isilon.

Dell EMC Isilon X-Series Nodes

The Isilon X-Series, our most flexible and comprehensive storage product line, strikes the right balance between large capacity and high-performance storage. The highly versatile X-Series is an ideal solution for high-throughput and high-concurrency applications. With SSD technology for file system metadata and file-based storage workflows, the Isilon X-Series significantly accelerates namespace-intensive operations. To meet rigorous data security and compliance requirements, Isilon also offers Data at Rest Encryption (DARE) with self-encrypting drive (SED) options with the X-Series platform.

Figure 2. Dell EMC Isilon X-Series

Dell EMC PowerEdge

Dell EMC’s server portfolio is broad enough that there are many different options for delivering the compute aspect of a Hadoop solution. With models that can accommodate so many differing requirements around price, density, and management capabilities, it would take far too much time to list every possible option. We’ll start the conversation here with two recommended configurations: one from the modular infrastructure portfolio and one from the traditional rack infrastructure portfolio. Use these as a starting point and work with your Dell EMC server specialist to customize to your exact specifications.

Dell EMC PowerEdge FX2, PowerEdge FC630, and PowerEdge FD332

The PowerEdge FX2 family is a fully modular ecosystem with enough configuration options that you can tailor it to meet the demands of any workload. For Isilon Data Lake designs, we need flexible internal storage options as well as robust network capability, both from server to Isilon and from server to client. The Dell EMC PowerEdge FC630 compute node with a PowerEdge FD332 disk shelf is a great way to pack a lot of punch into an easily manageable footprint.

Figure 3. Dell EMC PowerEdge FX2, PowerEdge FC630, PowerEdge FD332

Dell EMC PowerEdge R630

As our most popular server platform, the R630 has been battle-tested in almost every use case possible. In this configuration, we take advantage of plenty of drive slots for either rotational media or solid-state drives, as well as plenty of network bandwidth (both data and client-facing).

Figure 4. Dell EMC PowerEdge R630

Hadoop Roles

Compute Node(s)

With all shared filesystem responsibilities taken care of by the Isilon, these nodes’ primary role is to provide the computational horsepower to comb through all the data. However, they do still need some local storage to help cache or accelerate those operations. With the drastic cost reductions in flash over the last few years, some customers choose to build this local space from solid-state drives (SSDs). The use of SSDs isn’t a hard-and-fast requirement, but it is becoming a common request as SSD prices continue to fall.

Function | Disks | Type
--- | --- | ---
Operating System | 2 | RAID 1 (Mirror)
Spark Scratch / MapReduce Spill | 2-10 (Optionally SSD) | Non-RAID or RAID 0

Table 2. Data Node Disk Layout
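As a rough illustration of how the scratch disks in this layout might be handed to the processing frameworks, the sketch below builds the comma-separated directory list that both `spark.local.dir` and `yarn.nodemanager.local-dirs` accept. The mount points (`/data/1`, `/data/2`, ...), the `scratch` subdirectory, and the helper names are assumptions made for this example, not part of the validated configuration.

```python
def scratch_dirs(num_disks, mount_prefix="/data", subdir="scratch"):
    """Return one scratch directory per local disk.

    The 2-10 range mirrors the disk count in the layout table; the
    mount_prefix and subdir defaults are hypothetical.
    """
    if not 2 <= num_disks <= 10:
        raise ValueError("layout assumes between 2 and 10 scratch disks")
    return [f"{mount_prefix}/{i}/{subdir}" for i in range(1, num_disks + 1)]


def local_dir_properties(num_disks):
    """Build the property values that spread scratch I/O across all disks."""
    joined = ",".join(scratch_dirs(num_disks))
    return {
        "spark.local.dir": joined,
        "yarn.nodemanager.local-dirs": joined,
    }


props = local_dir_properties(8)
```

Listing each non-RAID disk as its own directory lets the frameworks spread spill and shuffle files across the drives themselves, which is one reason a non-RAID layout is a workable choice for scratch space.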

Infrastructure Nodes

The number of infrastructure servers will vary from customer to customer. In our recommended configuration we allocate 4 nodes, but it could be done with fewer, depending on your requirements for service high availability.

Manager Node(s)

The manager nodes in the cluster are responsible for running things like Cloudera Manager (Cloudera Hadoop), Ambari (Hortonworks Hadoop), and the principal roles for services like Hive, Oozie, and ZooKeeper. We need 3 of them in order to provide a quorum for high availability in case of a node failure. These boxes don’t need high-end configurations and are a ripe area for cost optimization. For the sake of our recommended configurations we’ll use the same chassis and server types as our compute nodes in order to keep a common platform, but this is by no means required. Additionally, if your requirements allow it, you can co-locate these roles on compute or edge nodes.
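The reasoning behind three manager nodes can be checked with a little arithmetic. The sketch below assumes a simple majority quorum, as used by ZooKeeper-style ensembles; the function names are illustrative.

```python
def quorum_size(members):
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)


for n in (1, 2, 3, 4, 5):
    print(n, "members: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

Under this assumption, three members is the smallest group that survives a single node failure, and adding a fourth member does not increase the number of failures tolerated.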

Edge Node(s)

The role of the edge nodes is to be the primary interface for funneling data into a cluster as well as for pushing result data out of the cluster. They are most often multi-homed to the Isilon network as well as the datacenter network. The configuration of these nodes can vary drastically depending on the customer’s use case. For example, if you are staging batch jobs into a cluster, you’ll need a larger amount of local storage for that data to land on before you copy it into HDFS. If you are streaming data into the cluster, you won’t need a large amount of space but rather faster storage (like SSDs) to keep that data moving quickly. Much like the Manager Node(s), this is an area ripe for optimization depending on use case. Our recommendations keep the same configuration as the Manager Nodes just to keep some platform commonality. Lastly, as with the Manager Node(s), you can co-locate this role onto compute or manager nodes if your use case allows.

Recommended Configurations

Modular Infrastructure

Network Diagram

Figure 5. Modular Infrastructure – Network Diagram

Configuration

Isilon Data Lake Array

Component | Specification
--- | ---
Isilon Node | 4x Dell EMC Isilon X410 102TB HDD / 3.2TB SSD 256GB 2x10GE and 2x1GE
Isilon Switch | 2x QDR IB Switch - 8 Port, 1U, 1PS

Table 3. Modular Infrastructure Configuration - Isilon Data Lake Array

Networking

Component | Specification
--- | ---
Data Network Switches | 2x Dell EMC Networking S4048-ON 10GbE Switches
Management Network Switches | 1x Dell EMC Networking S3048-ON 1GbE Switch

Table 4. Modular Infrastructure Configuration – Networking

Compute Chassis

Component | Specification
--- | ---
Compute Chassis | 3x Dell EMC PowerEdge FX2s
Chassis I/O Module | 2x (Per-chassis) Dell EMC FX2 10 GbE Pass-through Module
Compute Platform | 2x (Per-chassis) Dell EMC FC630 w/ 2x 2.5” disk slots
Compute Storage | 2x (Per-chassis) Dell EMC FD332 w/ 16x 2.5” disk slots

Table 5. Modular Infrastructure Configuration – Compute Chassis

Compute Servers

Component | Specification
--- | ---
Compute Platform Processor | 2x (Per Sled) Intel Xeon E5-2698v4 (20C)
Compute Platform Memory | 256 GB (Per Sled) - 16x 16GB 2400MHz RDIMM
Compute Platform Disks | (OS) – 2x (Per Sled) 200GB Boot MLC 2.5” Intel S3610 Solid State Drives
Compute Platform Network Cards | 1x (Per Sled) Intel X710 Dual Port 10GbE Network Daughter Card

Table 6. Modular Infrastructure Configuration – Compute Servers

Compute Storage Shelves

Component | Specification
--- | ---
Compute Storage Shelves | 8x (Per Sled) 1.2TB 10K RPM 2.5” HDD

Table 7. Modular Infrastructure Configuration – Compute Storage Shelves

Infrastructure Nodes Chassis

Component | Specification
--- | ---
Infrastructure Chassis | 2x Dell EMC PowerEdge FX2s
Infrastructure Chassis I/O Module | 2x Dell EMC FX2 10 GbE Pass-through Module

Table 8. Modular Infrastructure Configuration – Infrastructure Nodes Chassis

Infrastructure Nodes Servers

Component | Specification
--- | ---
Infrastructure Node Platform | 2x (Per-chassis) Dell EMC FC630 w/ 2x 2.5” disk slots
Infrastructure Node Processor | 2x (Per Sled) Intel Xeon E5-2640v4 (10C)
Infrastructure Node Memory | 128 GB (Per Sled) - 8x 16GB 2400MHz RDIMM
Infrastructure Node Disks | (OS) – 2x (Per Sled) 200GB Boot MLC 2.5” Intel S3610 Solid State Drives
Infrastructure Node Network Cards | 1x (Per Sled) Intel X710 Dual Port 10GbE Network Daughter Card

Table 9. Modular Infrastructure Configuration – Infrastructure Nodes Servers

Infrastructure Node Storage Shelves

Component | Specification
--- | ---
Infrastructure Node Storage Shelves | 3x (Per Sled) 1.2TB 10K RPM 2.5” HDD

Table 10. Modular Infrastructure Configuration – Infrastructure Storage Shelves

Rack Server Infrastructure

Network Diagram

Figure 6. Rack Server Infrastructure – Network Diagram

Configuration

Isilon Data Lake Array

Component | Specification
--- | ---
Isilon Node | 4x Dell EMC Isilon X410 102TB HDD / 3.2TB SSD 256GB 2x10GE and 2x1GE
Isilon Switch | 2x QDR IB Switch - 8 Port, 1U, 1PS

Table 11. Rack Server Infrastructure Configuration - Isilon Data Lake Array

Networking

Component | Specification
--- | ---
Data Network Switches | 2x Dell EMC Networking S4048-ON 10GbE Switches
Management Network Switches | 1x Dell EMC Networking S3048-ON 1GbE Switch

Table 12. Rack Server Infrastructure Configuration – Networking

Compute Servers

Component | Specification
--- | ---
Compute Platform | 6x Dell EMC PowerEdge R630 10-drive chassis
Compute Platform Processor | 2x Intel Xeon E5-2698v4 (20C)
Compute Platform Memory | 256 GB - 16x 16GB 2400MHz RDIMM
Compute Platform Disks | (OS) – 2x 200GB Boot MLC 2.5” Intel S3610 Solid State Drives; (Data) – 8x 1.2TB 10K RPM 2.5” HDD
Compute Platform Network Daughter Card | Intel X710 Dual Port 10GbE Network Daughter Card

Table 13. Rack Server Infrastructure Configuration – Compute Servers

Infrastructure Nodes Servers

Component | Specification
--- | ---
Infrastructure Node Platform | 4x Dell EMC PowerEdge R630 10-drive chassis
Infrastructure Node Processor | 2x Intel Xeon E5-2640v4 (10C)
Infrastructure Node Memory | 128 GB - 8x 16GB 2400MHz RDIMM
Infrastructure Node Disks | (OS) – 2x 200GB Boot MLC 2.5” Intel S3610 Solid State Drives; (Data) – 3x 1.2TB 10K RPM 2.5” HDD
Infrastructure Node Network Daughter Cards | Intel X710 Dual Port 10GbE Network Daughter Card

Table 14. Rack Server Infrastructure Configuration – Infrastructure Nodes Servers

As-Tested Configuration

The configuration below only documents what was stood up in the Dell EMC Customer Solution Center in order to validate basic functionality. We encourage customers to leverage the same capabilities that the Customer Solution Centers provide by executing their own proofs of concept with us at no cost.

Compute and infrastructure roles were shared across the same nodes. This isn’t recommended in production, but it is acceptable at small scale in a proof of concept.

Network Diagram

Figure 7. As-Tested Configuration – Network Diagram

Configuration

Isilon Data Lake Array

Component | Specification
--- | ---
Isilon Node | 3x Dell EMC Isilon S210 19.8TB HDD / 1.6TB SSD 256GB 2x10GE and 2x1GE (OneFS v8.0.0.2)
Isilon Switch | 1x QDR IB Switch - 8 Port, 1U, 1PS

Table 15. As-Tested Configuration - Isilon Data Lake Array

Networking

Component | Specification
--- | ---
Data Network Switches | 1x Dell EMC Networking S4048-ON 10GbE Switch
Management Network Switches | 1x Dell Force10 S60 1GbE Switch

Table 16. As-Tested Configuration – Networking

Compute Chassis

Component | Specification
--- | ---
Compute Chassis | 1x Dell EMC PowerEdge FX2s
Chassis I/O Module | 2x (Per-chassis) Dell EMC FX2 10 GbE Pass-through Module
Compute Platform | 4x (Per-chassis) Dell EMC FC630 w/ 8x 1.8” SSD slots

Table 17. As-Tested Configuration – Compute Chassis

Compute Servers

Component | Specification
--- | ---
Compute Platform Processor | 2x (Per Sled) Intel Xeon E5-2680v3 (12C)
Compute Platform Memory | 256 GB (Per Sled) - 16x 16GB 2400MHz RDIMM
Compute Platform Disks | (OS) – 2x (Per Sled) 480GB Intel S3610 MLC 1.8” Solid State Drives; (Data) – 6x (Per Sled) 480GB Intel S3610 MLC 1.8” Solid State Drives
Compute Platform Network Cards | 1x (Per Sled) Intel X520k Dual Port 10GbE Network Daughter Card
Compute Platform Operating System | Red Hat Enterprise Linux 7.2.1511

Table 18. As-Tested Configuration – Compute Servers

Considerations

Sizing Compute Nodes and Isilon Nodes

Many different factors go into the sizing of your cluster. It’s important to work with your Dell EMC account teams and Dell EMC Customer Solution Centers Solution Architects to make sure you’re appropriately accounting for as many variables as possible. Variables that may need to be accounted for include:

Amount of Initial Data

Number of Replicas

Rate of Ingest

Duration of retention

Scratch Space

Compression

Read/Write I/O Mix

Our initial guidance for the ratio of compute nodes to Isilon nodes is 2:1. However, this is only initial guidance, and we strongly recommend a more formal discussion with your Customer Solution Centers Solution Architect to come up with a more specifically tailored recommendation given your requirements for capacity, performance, and any additional functions that the Isilon may be serving.
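As a quick worked example of the 2:1 starting point, the sketch below turns an Isilon node count into an initial compute node count. The function name and the `ratio` parameter are illustrative assumptions; the result is only a first estimate to bring into a sizing discussion.

```python
import math


def initial_compute_nodes(isilon_nodes, ratio=2.0):
    """Starting compute node count for a given number of Isilon nodes.

    ratio is compute nodes per Isilon node; 2.0 reflects the initial
    2:1 guidance and should be adjusted for the actual workload.
    """
    if isilon_nodes < 1:
        raise ValueError("at least one Isilon node is required")
    return math.ceil(isilon_nodes * ratio)


print(initial_compute_nodes(4))             # 2:1 starting point for 4 Isilon nodes
print(initial_compute_nodes(4, ratio=1.5))  # a lighter compute mix
```

Note that the recommended configurations earlier in this paper pair 6 compute nodes with 4 Isilon nodes, which corresponds to a ratio of 1.5:1, so the ratio is best treated as an adjustable input rather than a fixed rule.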

Isilon Platform

Isilon clusters simplify storage by combining the file system, volume manager, and data protection into the EMC Isilon OneFS® operating system. Through the clustered use of EMC Isilon high-performance X-Series nodes, high-capacity NL-Series nodes, and high-density HD-Series nodes, a single Isilon cluster can contain a mix of tiers that provide the best economics, throughput, or I/Os per second into the petabyte range. With over 80 percent storage utilization, Isilon clusters need less raw capacity than most storage systems. Compared to traditional direct-attached storage (DAS) Hadoop, Isilon can store the same data with a third of the storage capacity while providing more protection. Consolidating your unstructured data on Isilon results in greater efficiency, simplified management, and cost savings.

Server Platform

There are plenty of options when it comes to compute and infrastructure nodes inside the Dell EMC PowerEdge portfolio. We’ve detailed two recommended configurations above, but there are many others as well that can be discussed with your Dell EMC Customer Solution Centers Solution Architects. Options include:

PowerEdge Rack / Tower Servers – These R- and T-series servers are among the most popular options for customers looking for traditional 1U and 2U options. Either the PowerEdge R630 at 1U for density or the PowerEdge R730/XD for drive options are great choices.

Modular Servers – Customers looking for robust manageability and integrated networking can look to the Dell EMC modular infrastructure portfolio. The Dell EMC PowerEdge M1000e blade chassis and the Dell EMC PowerEdge FX family are great choices. Just make sure that you have enough drive slots or disk capacity to accommodate the local storage/scratch space that is needed. These are also great for cases where high datacenter density is required (co-location / hosting).

Server CPU

The server core and frequency requirements for each customer can vary widely. We recommend working closely with your Dell EMC Customer Solution Centers Solution Architects to identify the right processor given your unique workload. You can also execute a proof of concept in the Customer Solution Centers at no charge in order to get an accurate characterization of your expected performance.

Server Memory

As with server CPUs, memory requirements can vary from customer to customer and use case to use case. Generally we recommend starting at 256GB and going up from there as your utilization of in-memory technologies (Spark, Impala, Alluxio, etc.) increases.

Server Local Storage

You’ll need some host-side cache / scratch space for your compute nodes. Approximately 5-8TB is common, on either rotational or flash media. The total scratch space across your compute nodes should equal approximately 25% of your usable Hadoop capacity. With the rapidly falling prices of flash memory, it can make sense to use flash to get fast local storage in ever-increasing amounts. If you do opt for SSDs, this local scratch space can be SSDs either in drive bays or in PCIe form factors.
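The 25% rule of thumb translates into a per-node figure with simple arithmetic. The sketch below is a minimal illustration; the function name and the example capacity of 160TB are assumptions chosen for the example, not measured values.

```python
def scratch_per_node_tb(usable_hadoop_tb, compute_nodes, fraction=0.25):
    """Scratch space each compute node should provide, in TB.

    fraction is the share of usable Hadoop capacity to mirror as
    cluster-wide scratch space (0.25 follows the guidance above).
    """
    if compute_nodes < 1:
        raise ValueError("at least one compute node is required")
    return usable_hadoop_tb * fraction / compute_nodes


# Hypothetical cluster: 160TB usable Hadoop capacity, 6 compute nodes.
per_node = scratch_per_node_tb(160, 6)
print(f"{per_node:.1f} TB of scratch per compute node")
```

For this hypothetical cluster the result lands inside the 5-8TB range mentioned above; a result well outside that range is a hint to revisit either the compute node count or the drive configuration.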

Network

At a minimum, you’ll want dual 10GbE from each host to the Isilon data nodes. As your bandwidth needs increase, you’ll want to consider either segmenting front-side traffic (client to compute nodes) onto its own network cards, or increasing the number and/or speed of the links to each node. Prices on 25GbE and 40GbE cards are becoming very affordable, and you may want to consider investing in those early in order to reduce complexity (no need for complicated bonding) as well as to prepare for the ever-increasing bandwidth needs of emerging workloads. The Dell EMC Networking S6100 switch is an excellent switch for high-bandwidth needs either at the host level or at the aggregation tier linking multiple racks together.

It’s also worth noting that as the Dell EMC Isilon product evolves, investing in 40GbE networking will be very wise for both compute-node connectivity as well as datanode-to-datanode connectivity.
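To put numbers on the bandwidth discussion, the sketch below compares theoretical aggregate link capacity on the compute side and on the Isilon side. The node and port counts are taken from the recommended configurations earlier in this paper (6 compute nodes with dual-port 10GbE cards, 4 Isilon X410 nodes with 2x10GE each); the function name is illustrative, and the figures are line-rate ceilings rather than measured throughput.

```python
def aggregate_gbps(nodes, links_per_node, link_gbps):
    """Theoretical aggregate bandwidth in Gb/s, ignoring protocol overhead."""
    return nodes * links_per_node * link_gbps


compute_gbps = aggregate_gbps(nodes=6, links_per_node=2, link_gbps=10)
isilon_gbps = aggregate_gbps(nodes=4, links_per_node=2, link_gbps=10)

# Values above 1.0 mean the compute side can request more than the
# storage side can deliver at line rate.
demand_ratio = compute_gbps / isilon_gbps

print(compute_gbps, isilon_gbps, demand_ratio)
```

Re-running the calculation with `link_gbps=40` on either side shows how moving to faster links changes the balance without adding bonded ports.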

Dell EMC Customer Solution Centers

The Dell EMC Customer Solution Centers are a global network of connected labs that allow Dell to help customers architect, validate, and build solutions. With multiple footprints in every region, they can help you understand anything from simple hardware platforms to more complex solutions. These engagements range from informal 30-60 minute briefings, through longer half-day workshops, to proofs of concept that allow customers to kick the tires of their solution prior to signing on the dotted line. Customers may engage with their account team and have them submit a request to take advantage of these services at no charge.

Links

Dell Customer Solution Centers – http://www.dell.com/customersolutioncenter
Dell EMC FX PowerEdge Server FX Architecture – http://www.dell.com/en-us/work/learn/fx-server-solutions
Dell EMC Isilon Info Hub For Hadoop – https://community.emc.com/docs/DOC-39529
Isilon Hadoop Tools – https://github.com/Isilon/isilon_hadoop_tools
Cloudera – http://cloudera.com
Hortonworks – http://hortonworks.com