
HUAWEI OceanStor Dorado Series SSD Array Technical White Paper

Issue 1.0

Date 2013-05-24

INTERNAL

HUAWEI TECHNOLOGIES CO., LTD.

Issue 1.0 (2013-05-24) Huawei Proprietary and Confidential

Copyright © Huawei Technologies Co., Ltd.


Copyright © Huawei Technologies Co., Ltd. 2013. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.

All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.

Address: Huawei Industrial Base

Bantian, Longgang

Shenzhen 518129

People's Republic of China

Website: http://enterprise.huawei.com


HUAWEI OceanStor Dorado Series SSD Array

Technical White Paper Contents




Contents

Change History
1 Executive Summary
2 Introduction
2.1 Limitation of Traditional Storage Arrays
2.2 Flash Memory
2.2.1 Concept and Principles
2.2.2 Technical Features
2.3 SSD
2.3.1 Address Space Virtualization
2.3.2 Capacity Redundancy
2.3.3 Garbage Collection
2.3.4 Wear Leveling
2.3.5 Bad Block Management
2.3.6 SSD Service Life
3 Solution
3.1 Dorado Series All-Flash-Memory Arrays
3.1.2 Dorado2100
3.1.3 Dorado5100
3.1.4 Dorado2100 G2
3.2 Benefits
3.2.1 Reduced TCO
3.2.2 Improved Customer Service Competitiveness
3.3 Technical Analysis
3.3.1 Problems Caused by SSDs
3.3.2 Design Philosophy
3.4 Reliability, Service Life, and Performance
3.4.1 Reliability
3.4.2 Service Life
3.4.3 Performance
4 Experience
4.1 Application Analysis of All-Flash-Memory Arrays


Technical White Paper Contents



iii

4.2 Typical Applications in Target Industries
4.3 Typical Cases
4.3.1 OLTP Case
4.3.2 OLAP Case
5 Conclusion
A Acronyms and Abbreviations



1 Executive Summary

Based on in-depth analysis of and investigation into customer data centers, Huawei Technologies Co., Ltd. (Huawei for short) has found that most data centers face the following two problems.

Problem 1: With the rapid development of cost-effective x86 servers, virtualization technology has spread from the high-end server market to ordinary enterprises. Enterprises of all sizes have begun to virtualize their data center infrastructure to eliminate the problems caused by silo architectures. Virtualized infrastructure improves server hardware utilization, simplifies IT management, and reduces the operating expense (OPEX) of data centers. However, it also exposes customers to the I/O blender effect: because the virtualization layer hides the individual application systems from the storage, I/Os from many applications are interleaved, and streams that were sequential within each application reach the array as random I/Os. These random I/Os degrade the performance of back-end storage arrays. In addition, traditional storage arrays cannot be optimized for specific upper-layer application systems. As a result, traditional storage arrays become the performance bottleneck of the virtualized infrastructure and prevent customers from obtaining the maximum return on investment (ROI) from it.
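The blending effect can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a measurement: each hypothetical VM issues a perfectly sequential stream of block addresses in its own region, and the virtualization layer forwards them to the array in round-robin order. The helper names (`vm_stream`, `blend`, `sequential_fraction`) are illustrative only.

```python
from itertools import chain


def vm_stream(base_lba: int, count: int) -> list[int]:
    """Hypothetical VM issuing `count` consecutive block addresses from base_lba."""
    return [base_lba + i for i in range(count)]


def blend(streams: list[list[int]]) -> list[int]:
    """Interleave equal-length streams round-robin, as a shared hypervisor might."""
    return list(chain.from_iterable(zip(*streams)))


def sequential_fraction(lbas: list[int]) -> float:
    """Share of requests whose address immediately follows the previous one."""
    if len(lbas) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(lbas, lbas[1:]) if cur == prev + 1)
    return hits / (len(lbas) - 1)


# Four VMs, each perfectly sequential inside its own 1,000,000-block region.
streams = [vm_stream(vm * 1_000_000, 100) for vm in range(4)]
print(sequential_fraction(streams[0]))      # 1.0: one VM on its own
print(sequential_fraction(blend(streams)))  # 0.0: what the array receives
```

Real hypervisor scheduling is not strictly round-robin, so the effect is usually less extreme, but the direction is the same: sequential streams lose their locality once mixed.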

Problem 2: To eliminate the performance bottleneck of traditional storage arrays, many enterprises increase IOPS by adding hard disk drives (HDDs). This increases capital expenditure (CAPEX) on storage capacity and OPEX on data center space and energy consumption, so traditional storage arrays offset a large proportion of the benefits generated by the virtualized infrastructure. Moreover, stacking HDDs increases only the IOPS of a traditional storage array; it cannot reduce I/O response latency. Therefore, service system performance cannot be fully improved this way.
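A rough model makes the IOPS-versus-latency point concrete. It assumes, hypothetically, that every HDD needs about 5 ms per random I/O (seek plus rotational delay) and that I/Os are spread evenly across disks with no queuing. The figures are illustrative, not measured.

```python
def hdd_array(num_disks: int, service_time_ms: float = 5.0) -> tuple[float, float]:
    """Return (aggregate IOPS, per-I/O latency in ms) for an idealized HDD array.

    Assumes each disk serves one random I/O at a time and the load is spread
    evenly, so there is no queuing delay (an optimistic simplification).
    """
    per_disk_iops = 1000.0 / service_time_ms
    return num_disks * per_disk_iops, service_time_ms


for disks in (24, 96, 384):
    iops, latency = hdd_array(disks)
    print(f"{disks:4d} HDDs: {iops:8.0f} IOPS, {latency:.1f} ms per I/O")
```

Aggregate IOPS grows linearly with the number of disks, while each I/O still waits for one disk's mechanical service time; under real load, queuing only makes latency worse.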

Some enterprise customers have a thorough understanding of the problems caused by infrastructure virtualization and HDD stacking. To resolve these problems, they install both HDDs and solid-state drives (SSDs) in the same storage array to meet the requirements of different services. Before 2012, when SSDs had a relatively high unit price, mixing HDDs and SSDs was the most cost-effective choice for most enterprise customers. As SSD prices fall and solid state storage technologies develop rapidly, enterprise customers are turning to SSD arrays.

To resolve the problems that enterprise customers face, Huawei has launched the high-performance OceanStor Dorado series SSD arrays. This document describes the changes in storage technology brought by solid state storage, especially flash memory, as well as customer concerns and the application scenarios of Dorado series storage products.




2 Introduction

According to documents issued by the Storage Networking Industry Association (SNIA), solid state storage means any storage capability that is provided by non-moving memory technologies rather than moving magnetic or optical media.

By this definition, random access memory (RAM), flash memory, and phase change memory (PCM) are all solid state storage. In fact, solid state storage has long been used to store mission-critical data. Before SSDs were available, some enterprises used RAM arrays to store core data in real time to meet real-time computing requirements.

In 2008, as flash memory matured and its price fell, some storage array vendors began to introduce flash memory–based SSDs into their arrays to improve performance.

The storage industry then realized that traditional storage arrays had been designed around HDDs. Although inserting SSDs improved performance immediately, it could neither fully exploit the advantages of flash memory nor avoid its disadvantages. In 2010, various all-flash-memory arrays entered the market, claiming to be designed and developed specifically for flash memory.

In response, some vendors of traditional storage arrays launched arrays fully populated with SSDs and marketed them as all-flash-memory arrays, mainly as a way to expand their market share. As a result, a wide variety of so-called all-flash-memory arrays flooded the market, confusing customers.

All-flash-memory arrays are built on flash memory and SSDs. The difference between all-flash-memory arrays and traditional storage arrays can be understood only with a grasp of flash memory characteristics and SSD design concepts.

This chapter describes flash memory and SSDs.

2.1 Limitation of Traditional Storage Arrays

HDD-based traditional storage arrays have provided reliable storage services for customers. However, as enterprise-class storage applications become more complex, traditional storage arrays expose the following limitations:

Reliability

Because HDDs rely on mechanical components, the annual failure rate (AFR) of each HDD is difficult to reduce further, which restricts reliability improvements.

Performance

Traditional storage arrays raise IOPS by stacking a large number of HDDs. However, this method cannot reduce I/O latency. In addition, as data center virtualization expands, traditional storage arrays receive more random I/Os, which makes targeted optimization more difficult.

Cost

Obtaining the required IOPS by stacking HDDs incurs additional costs for capacity, equipment room space, and power consumption.

Flash memory, together with its falling cost, makes it possible to resolve these problems.

2.2 Flash Memory

Storage modes and technologies based on RAM, flash memory, and PCM are all called solid state storage.

Because flash memory currently has advantages over other storage media in price, capacity, and reliability, it is widely used in solid state storage.

2.2.1 Concept and Principles

A flash memory is an electronic non-volatile computer storage device. Non-volatile means that the data stored in flash memory is retained when power is removed.

There are two main types of flash memory: Negated OR (NOR) and Negated AND (NAND). Because of their different characteristics, they are used in different areas. NOR flash stores system boot programs and is commonly used in embedded devices. NAND flash stores data and is commonly used in SSDs.

Both types work on the same principle. A memory cell is a transistor with a source, a drain, and a gate, and the source-drain current is controlled by the electric field effect. A floating gate is added between the gate and the silicon substrate to trap electrons; the charge held in the floating gate represents the stored data.






Figure 2-1 Flash memory cell

Each flash memory cell shown in Figure 2-1 stores 1 bit of data. A charged floating gate represents binary 0, and a discharged floating gate represents binary 1.

Erasing a cell sets it to 1.

Programming a cell sets it to 0.

2.2.2 Technical Features

Because NAND flash is the type commonly used in SSDs and all-flash-memory arrays, "flash memory" in the following sections refers to NAND flash.

NAND flash structure

Figure 2-2 NAND flash chip structure

− As shown in Figure 2-2, each NAND flash chip consists of thousands of identical blocks, each ranging from several hundred KB to several MB in size.

− Each block is divided into pages of equal size, typically 4 KB or 8 KB.

Data writing

− Data is written to flash memory in pages.

− If a page already contains data, new data cannot be written to it until the existing data is cleared.

− Data is cleared only in whole blocks: a single clear operation clears an entire block. Clearing is called erasing. After a block is erased, all bits in the block are set to 1.

− Writing is called programming. When data is written to a page, the specified bits change from 1 to 0, and the page thereby stores the data.

− Flash memory operates in program/erase cycles. Each cycle is called one program/erase (P/E).

− Each block can endure only a finite number of P/Es. Once the number of P/Es reaches this limit, data in the block can no longer be accessed reliably.

− The number of P/Es depends on multiple factors. The P/E figures in the following sections reflect mainstream flash memory at the time this document was prepared.
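The write and erase rules above can be modeled in a few lines. This is a simplified sketch: a hypothetical block of small pages in which programming can only change bits from 1 to 0, erasing resets the whole block to all 1s, and a per-block P/E counter is checked against an assumed endurance limit. `FlashBlock` and its parameters are illustrative, not a real device interface.

```python
class FlashBlock:
    """Toy NAND block: program clears bits (1 -> 0); only erase sets them back to 1."""

    def __init__(self, pages: int = 4, page_bits: int = 8, pe_limit: int = 3000):
        self.all_ones = (1 << page_bits) - 1
        self.pages = [self.all_ones] * pages  # erased state: every bit is 1
        self.programmed = [False] * pages
        self.pe_cycles = 0
        self.pe_limit = pe_limit              # assumed endurance, for illustration

    def program(self, page: int, data: int) -> None:
        if self.programmed[page]:
            raise ValueError("page already holds data; erase the block first")
        # Programming can only pull bits from 1 to 0, modeled as a bitwise AND.
        self.pages[page] &= data & self.all_ones
        self.programmed[page] = True

    def erase(self) -> None:
        if self.pe_cycles >= self.pe_limit:
            raise RuntimeError("P/E limit reached; block is worn out")
        # Erasing acts on the whole block and sets every bit back to 1.
        self.pages = [self.all_ones] * len(self.pages)
        self.programmed = [False] * len(self.pages)
        self.pe_cycles += 1


block = FlashBlock()
block.program(0, 0b1010_0101)
try:
    block.program(0, 0b1111_0000)  # rewriting in place is not allowed
except ValueError as err:
    print(err)
block.erase()                      # one P/E cycle completed
block.program(0, 0b1111_0000)
print(bin(block.pages[0]), block.pe_cycles)
```

Counting a P/E at each erase is the convention used here; real firmware tracks wear per block in a similar spirit but with device-specific details.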

Data reading

− Over time, bit errors may accumulate in the data stored in flash memory. If data read from a page were passed to upper-layer services with such errors, these services could fail.

− To ensure that data passed to upper-layer services is correct, flash memory reserves space for storing error-correcting codes (ECCs) of the service data. When data is read, the controller uses the corresponding ECCs to detect and correct errors.

− Because controllers have limited computing capability, the correction capability of an ECC is bounded: an ECC works only when the number of bit errors does not exceed an upper threshold. Currently, a typical ECC can correct up to 32 bit errors per 1 KB. As long as each 1 KB of service data and ECC data contains no more than 32 bit errors, the controller can recover the correct service data.

− If the number of bit errors on a page exceeds what the controller can correct, the page cannot be read correctly and an uncorrectable (UNC) error is reported.

− A UNC error can be repaired only by a higher-level RAID mechanism.
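The read-side decision can be sketched as follows, assuming, as stated above, an ECC that corrects up to 32 bit errors per 1 KB. The function name and return values are hypothetical; real controllers decode in hardware and do not expose per-chunk error counts this way.

```python
ECC_LIMIT_PER_KB = 32  # correctable bit errors per 1 KB, per the figure above


def read_page(bit_errors_per_kb: list[int]) -> str:
    """Classify a page read from the bit-error count of each 1 KB chunk.

    Returns "ok" when every chunk is within the ECC limit (errors corrected),
    or "UNC" when any chunk exceeds it and the page cannot be recovered locally.
    """
    if all(errors <= ECC_LIMIT_PER_KB for errors in bit_errors_per_kb):
        return "ok"
    return "UNC"  # must be repaired by a higher-level mechanism such as RAID


print(read_page([0, 3, 32, 7]))  # 4 KB page, worst chunk exactly at the limit
print(read_page([0, 3, 33, 7]))  # one chunk exceeds the limit
```

A single chunk beyond the limit is enough to make the whole read uncorrectable, which is why the check uses `all()` over the chunks.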

Categories of NAND flash

− NAND flash is divided into single-level cell (SLC), multi-level cell (MLC), and triple-level cell (TLC).

− Figure 2-1 shows a cell of a NAND flash chip.

− In an SLC, each cell stores only one bit of information: 1 or 0.

− In an MLC, each cell can store multiple bits of information. In practice, each cell stores two bits: 00, 01, 10, or 11.

− In a TLC, each cell stores three bits of information: 000, 001, 010, 011, 100, 101, 110, or 111.

− MLC is further divided into enterprise MLC (eMLC) and consumer MLC (cMLC).

− eMLC and cMLC are essentially the same. During manufacturing, vendors classify MLC chips that are verified to endure more P/Es as eMLC and the remaining MLC chips as cMLC.

− In the industry, "MLC" commonly refers to cMLC.

− The NAND flash types differ in capacity, number of P/Es, and price, as shown in Table 2-1.

Table 2-1 Comparison among NAND flash types

Type | Capacity per Unit Volume | Number of P/Es | Price per Unit Capacity
SLC | Small | About 100,000 | High
eMLC | Moderate | About 30,000 | Medium
cMLC | Moderate | 5,000 to 10,000 | Low
TLC | Large | 500 to 1,000 | Very low

− Currently, SLC and eMLC are used in the enterprise-class market, and cMLC is used in the consumer market. TLC is not yet widely used.

2.3 SSD

This document covers flash memory–based SSDs. Figure 2-3 shows the structure of an SSD.

Figure 2-3 Components of an SSD

An SSD consists of a controller, memory, and flash chips. Most vendors give SSDs the same form factor, interface properties, and data access methods as HDDs. Therefore, SSDs can be used in any scenario where HDDs are used.

The controller provides the ports for connecting to external hosts, manages the internal flash memory, and runs the SSD firmware on an embedded CPU. The SSD firmware manages the host-visible storage address space, the physical flash memory space, garbage collection (GC), and wear leveling (WL). The controllers used in Huawei SSDs are self-designed ASIC chips with independent intellectual property rights, and they provide 6 Gbit/s SAS 2.0 ports.

The memory is used to run the SSD firmware and to store the data required by address space virtualization, such as the mapping table.

Multiple flash chips are distributed on the circuit board to provide the storage space of the SSD.

Unlike an HDD, an SSD has no voice coil motor or actuator arm, so it is highly shock-resistant. In addition, because an SSD supports many concurrent accesses with low latency, its IOPS is more than two orders of magnitude higher than that of an HDD.

2.3.1 Address Space Virtualization

As described in section 2.2.2 "Technical Features", flash memory supports only a limited number of P/Es. If a large amount of data is written to hotspot areas, the physical areas behind them reach their P/E limit and fail. Address space virtualization is designed for SSDs to solve this problem and to speed up SSDs' response to write requests.

Address space virtualization works as follows:

1. The mapping from a logical block address (LBA) to a physical block address (PBA) on an SSD is not fixed.

2. The minimum manageable unit of an SSD is the page. Each page has a unique number, namely its PBA.

3. The SSD maintains a mapping table that records the mappings between LBAs and PBAs.

4. When data is written to the SSD, one or more clean pages are selected to store the data, and the new mappings are recorded in the mapping table. Clean pages are pages that have been erased and not yet programmed.

5. As a result, when data is written repeatedly to the same logical area of the SSD, it is stored in different physical areas.
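The numbered steps above describe a page-mapped translation layer. A minimal sketch, assuming a hypothetical `PageMappedFTL` class that uses a plain dictionary as the mapping table and a FIFO queue of clean pages; real firmware uses more compact structures and smarter allocation, and this toy simply raises `IndexError` when clean pages run out (the point at which GC would have to step in).

```python
from collections import deque


class PageMappedFTL:
    """Toy page-level LBA -> PBA mapping with out-of-place writes."""

    def __init__(self, num_pages: int):
        self.mapping: dict[int, int] = {}     # LBA -> PBA (the mapping table)
        self.clean = deque(range(num_pages))  # erased, not yet programmed pages
        self.invalid: set[int] = set()        # pages holding stale (junk) data
        self.flash: dict[int, bytes] = {}     # PBA -> stored data

    def write(self, lba: int, data: bytes) -> int:
        pba = self.clean.popleft()            # always write to a clean page
        self.flash[pba] = data
        old = self.mapping.get(lba)
        if old is not None:
            self.invalid.add(old)             # the previous copy becomes junk
        self.mapping[lba] = pba               # record the new mapping
        return pba

    def read(self, lba: int) -> bytes:
        return self.flash[self.mapping[lba]]


ftl = PageMappedFTL(num_pages=8)
first = ftl.write(100, b"AA")
second = ftl.write(100, b"BB")  # same LBA, different physical page
print(first, second, ftl.read(100), ftl.invalid)
```

Rewriting LBA 100 lands on a new page and leaves the old page as junk, which is exactly the situation garbage collection has to clean up.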

2.3.2 Capacity Redundancy

To prevent an entire SSD from failing because of faulty flash memory, SSDs are designed with capacity redundancy. For example, an SSD with a nominal capacity of 100 GB can provide more than 110 GB of actual flash memory–based physical capacity.

The ratio of the capacity in excess of the nominal capacity to the nominal capacity is called the redundancy ratio. Generally, an SSD with a larger redundancy ratio provides higher reliability, longer service life, and better performance.

All OceanStor Dorado series SSD arrays use Huawei self-developed SSDs, whose redundancy ratio is up to 28%, meeting enterprise requirements. Table 2-2 lists the nominal and physical capacities of some Huawei self-developed SSDs.

Table 2-2 Redundancy ratio of Huawei self-developed SSDs

Nominal Capacity | Physical Capacity | Redundancy Ratio
100 GB | 128 GB | 28%
200 GB | 256 GB | 28%
400 GB | 512 GB | 28%
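The redundancy ratio defined above is simple arithmetic; this short sketch reproduces the Table 2-2 figures. The function name is illustrative.

```python
def redundancy_ratio(nominal_gb: float, physical_gb: float) -> float:
    """Capacity beyond the nominal capacity, divided by the nominal capacity."""
    return (physical_gb - nominal_gb) / nominal_gb


for nominal, physical in [(100, 128), (200, 256), (400, 512)]:
    ratio = redundancy_ratio(nominal, physical)
    print(f"{nominal} GB nominal, {physical} GB physical: {ratio:.0%}")
```

The same formula gives 10% for the earlier example of a 100 GB SSD with 110 GB of physical flash.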

2.3.3 Garbage Collection

Address space virtualization and capacity redundancy not only prevent an SSD from failing because of flash failures, but also provide spare space for GC, which keeps SSD performance steady.

Address space virtualization avoids repeatedly writing to the same physical area, but it produces junk data and junk pages. The process is as follows:

1. A host accesses LBA 100 and writes data AA. In this example, page 401 is used to store data AA.

2. Some time later, the host accesses LBA 100 again and writes data BB. The address space virtualization mechanism writes the data to a different page. In this example, data BB is written to page 623.

3. After these two steps, data AA is stored on page 401 and data BB on page 623. The data on page 401 is invalid, while the data on page 623 is valid.

4. The invalid data on page 401 is called junk data.

5. Because old data in flash memory must be erased before new data can be written, new data cannot be written directly to page 401. Page 401 is therefore called a junk page until its data is erased.

GC is designed to erase junk data on junk pages. After GC, data can be written to these pages again.

On an SSD, GC is a background task that monitors the usage of blocks and pages. When many junk pages exist, GC performs the following steps:

1. Migrate the valid data on blocks that contain many junk pages to clean pages on other blocks.

2. Erase the blocks that no longer contain valid data.

3. Return these blocks to the resource pool for new data writes.
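The three GC steps can be sketched with a greedy victim-selection policy: reclaim the block with the most junk pages first. Greedy selection is a common textbook policy assumed here for illustration; the text does not state which policy Huawei firmware uses, and `Block` and `garbage_collect` are hypothetical names. Block capacity is not modeled, so the receiving block is assumed to have room for the migrated pages.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare blocks by identity, not by contents
class Block:
    valid: list[str] = field(default_factory=list)  # pages with live data
    junk: int = 0                                    # pages with stale data
    pe_cycles: int = 0


def garbage_collect(blocks: list[Block], free_pool: list[Block]) -> int:
    """Reclaim one block greedily; return the number of pages migrated."""
    victim = max(blocks, key=lambda b: b.junk)  # block with the most junk pages
    target = free_pool.pop()                    # a clean block to receive data
    target.valid.extend(victim.valid)           # 1. migrate valid data
    migrated = len(victim.valid)
    victim.valid, victim.junk = [], 0           # 2. erase the victim block
    victim.pe_cycles += 1
    blocks.remove(victim)
    blocks.append(target)
    free_pool.append(victim)                    # 3. return it to the resource pool
    return migrated


blocks = [Block(valid=["a", "b", "c"], junk=1), Block(valid=["d"], junk=3)]
free_pool = [Block()]
moved = garbage_collect(blocks, free_pool)
print(moved, [b.valid for b in blocks], len(free_pool))
```

Choosing the block with the most junk pages minimizes the number of valid pages that must be copied, and those copies are the extra flash writes behind write amplification.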

As the process shows, GC not only cleans up the SSD but also generates additional writes. As a result, the amount of data written to flash memory exceeds the amount written by the host. This phenomenon is known as write amplification.

When the service model is fixed, the ratio of the amount of data written to flash memory to the amount written by the host is roughly constant, with a small fluctuation range. This ratio is called the write amplification coefficient.

The write amplification coefficient depends on the redundancy ratio and on other SSD firmware algorithms. A smaller write amplification coefficient means better performance. The write amplification coefficient of Huawei self-developed SSDs is about 2.5 for purely small random I/Os and about 1.1 for large sequential I/Os, on par with the industry mainstream.
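As a worked example with assumed numbers: if the host writes 1,000 pages and GC migrates another 1,500 valid pages over the same period, the flash absorbs 2,500 page writes, giving a coefficient of 2.5. The page counts are chosen to match the values quoted above, not measured, and the sketch counts only GC migrations (other internal writes such as metadata updates are ignored).

```python
def write_amplification(host_pages: int, gc_migrated_pages: int) -> float:
    """Flash page writes divided by host page writes."""
    if host_pages <= 0:
        raise ValueError("host_pages must be positive")
    return (host_pages + gc_migrated_pages) / host_pages


print(write_amplification(1_000, 1_500))  # small random I/O example
print(write_amplification(1_000, 100))    # large sequential I/O example
```

Sequential workloads tend to invalidate whole blocks at once, so GC finds few valid pages to move and the coefficient stays close to 1.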

2.3.4 Wear Leveling

Address space virtualization and capacity redundancy alone cannot prevent some flash blocks from reaching their P/E limits earlier than others. WL is introduced to ensure that all flash blocks are erased and written evenly.

WL records the number of P/Es of each block and selects blocks with fewer P/Es for erasing and data writing, which maximizes the service life of the SSD.

WL is divided into dynamic WL and static WL.

Dynamic WL

Dynamic WL is triggered by host I/Os.

When a host sends a write request, the SSD needs to find one or more clean pages for the data. The dynamic WL algorithm then chooses blocks with fewer P/Es to provide these clean pages.

Static WL

Static WL is started internally by the SSD.

Suppose that a 100 GB SSD is full of user data, of which 99 GB is cold data that has not been updated since it was written and 1 GB is hot data that is updated frequently.

If nothing is done, the cold data permanently occupies at least 99 GB of physical flash space, leaving only 1 GB for erasing and writing. As a result, blocks in the 99 GB space accumulate far fewer P/Es than blocks in the 1 GB space, and the SSD fails prematurely.

Static WL solves this problem. It records the number of P/Es of each block, identifies blocks that are lightly worn and have had no junk pages for a long time (that is, blocks storing cold data), and migrates the valid data on these blocks to more heavily worn blocks. In this way, the entire SSD wears evenly.
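Both WL variants can be sketched in terms of per-block P/E counts. This is a simplified model: `pick_block_for_write` applies the dynamic rule (take the free block with the fewest P/Es), and `static_wl_swap` applies the static rule by moving cold data from the least-worn occupied block to the most-worn free block once the wear gap exceeds a threshold. The function names and the threshold of 100 P/Es are hypothetical.

```python
from typing import Optional


def pick_block_for_write(free_blocks: dict[int, int]) -> int:
    """Dynamic WL: return the id of the free block with the fewest P/Es."""
    return min(free_blocks, key=free_blocks.get)


def static_wl_swap(cold_blocks: dict[int, int], free_blocks: dict[int, int],
                   gap_threshold: int = 100) -> Optional[tuple[int, int]]:
    """Static WL: move cold data from a lightly worn block to a heavily worn one.

    Both dicts map block id -> P/E count. Returns (source, destination) if
    the wear gap justifies a migration, otherwise None.
    """
    source = min(cold_blocks, key=cold_blocks.get)  # least-worn block holding cold data
    dest = max(free_blocks, key=free_blocks.get)    # most-worn free block
    if free_blocks[dest] - cold_blocks[source] < gap_threshold:
        return None
    # The cold data now lives on the worn block, which leaves the free pool.
    cold_blocks[dest] = free_blocks.pop(dest)
    # The source block is erased (one more P/E) and rejoins the free pool,
    # so future hot writes land on it.
    free_blocks[source] = cold_blocks.pop(source) + 1
    return source, dest


free = {0: 2900, 1: 2950, 2: 3000}
cold = {7: 12, 8: 15}
print(pick_block_for_write(free))  # 0
print(static_wl_swap(cold, free))  # (7, 2)
print(pick_block_for_write(free))  # 7: the freed, lightly worn block
```

Parking cold data on a worn block is safe because cold data is rarely rewritten, so that block accumulates few further P/Es.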

2.3.5 Bad Block Management

Although various mechanisms and algorithms are used to prolong the service life of SSDs, flash memory damage is inevitable. Capacity redundancy provides the basis for tolerating it.

Flash memory damage occurs at the page level. A block contains multiple pages, some of which may be normal while others are damaged.

In practice, once several pages in a block are damaged, the other pages in the block are also likely to fail. For this reason, SSD firmware manages flash memory damage in units of blocks. If the number of unreadable pages in a block exceeds a threshold, the block is regarded as damaged: its valid data is migrated to other available blocks, and the block is marked as bad and is no longer used to store service data.

Generally, SSD firmware discovers bad blocks in two ways: through host I/O triggering and through internal inspection.

This process is called bad block management.
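The block-level retirement rule can be sketched as below. The threshold of 2 unreadable pages, `ManagedBlock`, and `report_unreadable_page` are hypothetical; the text says only that a block is retired once its count of unreadable pages exceeds a threshold. Data on the unreadable pages themselves would have to come from a higher-level mechanism such as RAID; the sketch simply moves the block's remaining valid data.

```python
from dataclasses import dataclass, field
from typing import Optional

BAD_PAGE_THRESHOLD = 2  # hypothetical; real thresholds are firmware-specific


@dataclass(eq=False)
class ManagedBlock:
    block_id: int
    valid_data: list[str] = field(default_factory=list)
    unreadable_pages: int = 0
    bad: bool = False


def report_unreadable_page(block: ManagedBlock,
                           spares: list[ManagedBlock]) -> Optional[ManagedBlock]:
    """Record one unreadable page, found by host I/O or internal inspection.

    When the count exceeds the threshold, copy the block's valid data to a
    spare block, mark the block bad, and return the spare that replaced it.
    """
    block.unreadable_pages += 1
    if block.bad or block.unreadable_pages <= BAD_PAGE_THRESHOLD:
        return None
    spare = spares.pop()
    spare.valid_data.extend(block.valid_data)  # migrate the valid data
    block.valid_data.clear()
    block.bad = True                           # never allocated again
    return spare


blk = ManagedBlock(1, ["x", "y"])
spares = [ManagedBlock(9)]
results = [report_unreadable_page(blk, spares) for _ in range(3)]
print([r.block_id if r else None for r in results], blk.bad)
```

The spare blocks come from the capacity redundancy described in section 2.3.2, which is why a larger redundancy ratio also helps reliability.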

2.3.6 SSD Service Life

Address space virtualization, capacity redundancy, GC, WL, and bad block management together maximize the service life of SSDs.

The service life of an SSD can generally be estimated as follows. Suppose the host runs a database service around the clock with about 5000 IOPS, an average I/O size of 8 KB, and a read/write ratio of 40:60. The amount of data written per day is:

5000 × 60% × 8 KB × 60 × 60 × 24 ≈ 2 TB

Based on 2 TB written per day and a write amplification coefficient of 2.5 (the value for purely random I/Os), the estimated service life of SSDs of various types and capacities is as follows:

Table 2-3 Estimated service life of Huawei self-developed SSDs

SSD | Calculation Result
100 GB SLC | More than 7 years
200 GB SLC | More than 14 years
200 GB eMLC | More than 4 years
400 GB eMLC | More than 8 years
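The entries in Table 2-3 can be reproduced with the usual endurance estimate: total writable data (physical capacity × P/Es) divided by daily flash writes (daily host writes × write amplification coefficient). That the table used the physical capacities from Table 2-2 and the P/E figures from Table 2-1 is an inference from the numbers, not something the text states; decimal units (1 TB = 1,000 GB) and perfectly even wear are also assumed.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def daily_host_writes_gb(iops: float, write_ratio: float, io_kb: float) -> float:
    """Host data written per day, in decimal GB (1 GB = 1,000,000 KB)."""
    return iops * write_ratio * io_kb * SECONDS_PER_DAY / 1_000_000


def service_life_years(physical_gb: float, pe_cycles: int,
                       host_gb_per_day: float, waf: float) -> float:
    """Years until the rated P/Es are used up, assuming perfectly even wear."""
    flash_gb_per_day = host_gb_per_day * waf  # what the flash actually absorbs
    return physical_gb * pe_cycles / flash_gb_per_day / 365


print(round(daily_host_writes_gb(5000, 0.6, 8)))  # 2074, i.e. about 2 TB

# (SSD, physical GB from Table 2-2, P/Es from Table 2-1); 2 TB/day rounded as in the text.
for name, physical, pe in [("100 GB SLC", 128, 100_000), ("200 GB SLC", 256, 100_000),
                           ("200 GB eMLC", 256, 30_000), ("400 GB eMLC", 512, 30_000)]:
    print(f"{name}: {service_life_years(physical, pe, 2000, 2.5):.1f} years")

# At 50 GB/day, the shortest-lived drive in the table lasts about 40 times longer.
print(f"{service_life_years(256, 30_000, 50, 2.5):.0f} years")
```

The loop prints 7.0, 14.0, 4.2, and 8.4 years, consistent with the lower bounds in Table 2-3.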

Analysis and statistics from mainstream operating system and storage device vendors show that, in the enterprise-class market, the average amount of data written to each SSD per day is far less than 50 GB. The 2 TB per day used in the previous calculation is therefore at least 40 times the actual amount.

Under actual service pressure, the service life of SSDs is thus at least 40 times the results in Table 2-3, exceeding 100 years.

Therefore, the service life of SSDs can meet the requirements of enterprise customers.



3 Solution

As the price of flash media per unit capacity declines, all-flash-memory arrays are entering the market, and many vendors now claim to offer them. Free from the constraints of HDDs, all-flash-memory arrays come in various forms, such as all-in-one boxes and traditional storage arrays fully populated with SSDs. In fact, the all-flash-memory arrays of some vendors are simply traditional storage arrays with their HDDs replaced by SSDs.

The I/O response latency of SSDs is two orders of magnitude lower than that of HDDs. Therefore, traditional storage arrays designed for HDDs cannot fully exploit the advantages of SSDs.

This chapter describes the differences between all-flash-memory arrays and traditional storage arrays, the basic features of HUAWEI Dorado all-flash-memory arrays, and the benefits that customers obtain from all-flash-memory arrays.

3.1 Dorado Series All-Flash-Memory Arrays

The Dorado series consists of all-flash-memory arrays developed by Huawei, built with self-developed array controllers, software, and SSDs. The Dorado series features enhanced reliability, high performance, ease of use, and ease of maintenance.

Figure 3-1 Dorado identifier

Figure 3-1 shows a fast-moving dorado, the identifier of Dorado series all-flash-memory arrays.

The dorado is a fish known for its speed, and the name indicates that the Dorado series aims to be the fastest among storage devices. Dorado series all-flash-memory arrays have demonstrated outstanding IOPS and latency in globally recognized performance benchmark tests.

3.1.2 Dorado2100

The Dorado2100 is the first product of the Dorado series and has been superseded by the Dorado2100 G2.

Figure 3-2 Front view of a Dorado2100 controller enclosure with an air hood

Table 3-1 lists Dorado2100 specifications.

Table 3-1 Dorado2100 specifications

Form | The 2 U controller enclosure houses twenty-four 2.5-inch SSDs. The controller enclosure is sold fully populated with SSDs of the same model and does not support expansion disk enclosures. All active components, including controllers, power modules, and fans, are redundant and field replaceable.
SSD | SLC: 50 GB and 100 GB; MLC: 100 GB and 200 GB
Capacity | SLC: 1.2 TB and 2.4 TB; MLC: 2.4 TB and 4.8 TB
Host Connectivity | 8 Gbit/s Fibre Channel
Performance | SPC-1 IOPS™: 100,051.99 @ 0.95 ms

3.1.3 Dorado5100

The Dorado5100 is a high-performance all-flash-memory array that offers flexible configuration and covers a wide range of applications.

Figure 3-3 Front view of a Dorado5100 controller enclosure with an air hood






Figure 3-4 Rear view of a Dorado5100 controller enclosure

Figure 3-5 Dorado5100 disk enclosure

Table 3-2 lists Dorado5100 specifications.

Table 3-2 Dorado5100 specifications

| Item | Specification |
| --- | --- |
| Form | The stand-alone 4 U controller enclosure supports multiple interface cards. A 2 U disk enclosure houses twenty-four 2.5-inch SSDs. Each disk enclosure is sold fully populated with SSDs of the same model. A Dorado5100 supports a maximum of four disk enclosures of the same specifications. |
| SSD | eMLC: 200 GB and 400 GB |
| Capacity | SLC: 2.4 TB to 19.2 TB; eMLC: 4.8 TB to 38.4 TB |
| Host connectivity | 8 Gbit/s Fibre Channel, 10 Gbit/s Ethernet or iSCSI |
| Performance (SPC-1 IOPS™) | 600,052.49 @ 1.09 ms |
| Advanced features | Snapshot and remote replication |


3.1.4 Dorado2100 G2

The Dorado2100 G2 replaces the Dorado2100. Compared with the Dorado2100, it offers much better performance and more advanced features.

Figure 3-6 Front view of a Dorado2100 G2 controller enclosure with an air hood

Figure 3-7 Rear view of a Dorado2100 G2 controller enclosure without an air hood

Figure 3-8 Dorado2100 G2 disk enclosure

The previous figures show that the Dorado2100 G2 controller enclosure houses both controllers and disks and can be connected to disk enclosures. Each controller enclosure provides 25 disk slots, which makes it easy to configure a hot spare disk.


Table 3-3 lists Dorado2100 G2 specifications.

Table 3-3 Dorado2100 G2 specifications

| Item | Specification |
| --- | --- |
| Form | The 2 U controller enclosure houses twenty-five 2.5-inch SSDs. A 2 U disk enclosure houses twenty-five 2.5-inch SSDs. Both the controller enclosure and the disk enclosures are sold fully populated with SSDs of the same model. A Dorado2100 G2 supports a maximum of three disk enclosures of the same specifications. |
| SSD | eMLC: 200 GB and 400 GB |
| Capacity | SLC: 2.5 TB to 20.0 TB; eMLC: 5.0 TB to 40.0 TB |
| Host connectivity | 8 Gbit/s Fibre Channel, 10 Gbit/s Ethernet or iSCSI with TCP offload engine (TOE) technology, and 40 Gbit/s InfiniBand |
| Performance (SPC-1 IOPS™) | 400,587.11 @ 0.75 ms |
| Advanced features | Thin provisioning, global WL, and VMware VAAI |

3.2 Benefits

All-flash-memory arrays are designed to help customers improve data center processing capability, enhance service competitiveness, reduce the total cost of ownership (TCO), and handle application scenarios where traditional storage arrays fall short.

3.2.1 Reduced TCO

Currently, the price per unit capacity of all-flash-memory arrays is several times that of traditional storage arrays, so some customers assume that all-flash-memory arrays are simply more expensive. In practice, all-flash-memory arrays offer low latency and high performance while requiring less rack space and power. They also achieve high IOPS without stacking disks and cost less to maintain. As a result, the TCO of all-flash-memory arrays can be lower than that of traditional storage arrays.


Table 3-4 compares the TCO of traditional storage arrays H and I with that of the Dorado2100 G2.

Table 3-4 TCO comparison between traditional storage arrays and the Dorado2100 G2

| Item | Traditional Storage Array H | Traditional Storage Array I | Dorado2100 G2 |
| --- | --- | --- | --- |
| Disk configuration | 896 x 10k rpm 300 GB SAS HDDs | 230 x 15k rpm 300 GB SAS HDDs | 100 x 400 GB eMLC SSDs |
| Physical capacity | 268,800 GB | 69,000 GB | 40,000 GB |
| SPC-1 IOPS™ | 109,986.41 | 82,496.08 | Approximately 250,000 |
| Latency | Approximately 0.5 ms | Approximately 7 ms | Approximately 2 ms |
| Price (including 3-year warranty) | $484,985.78 | $361,416.00 | Approximately $310,000 |
| Price/capacity ratio ($/GB) | 1.80 | 5.24 | Approximately 7.75 |
| Price/performance ratio ($/SPC-1 IOPS™) | 4.41 | 4.38 | Approximately 1.24 |
| Rack space | 3 racks | 16 U | 8 U |
| Typical power consumption | Approximately 10 kW | Approximately 3.3 kW | Approximately 1.5 kW |
| First year's OPEX | $42,000 | $5,600 | $2,800 |
| Second year's OPEX | $42,000 | $5,600 | $2,800 |
| Third year's OPEX | $42,000 | $5,600 | $2,800 |
| TCO | $610,985.78 | $378,216.00 | Approximately $318,400 |

The data for traditional storage arrays H and I in Table 3-4 comes from SPC-1 reports on the SPC website and from specifications on the products' official websites. These arrays passed the SPC-1 benchmark test in early 2013, shortly before this document was prepared, so the SPC-1 data reflects current products.

To view the SPC-1 test results, go to http://www.storageperformance.org/results/benchmark_results_spc1.

In Table 3-4, the Dorado2100 G2 is fully populated with one hundred 400 GB eMLC SSDs and provides 40 TB of physical storage space, which meets the capacity requirements of many high-performance storage applications.

The operating expense (OPEX) consists mainly of rack space and electricity costs. Because OPEX varies by region, it is computed here based on the rack leasing costs of mainstream telecom carriers in China.

Table 3-4 shows that traditional storage arrays achieve relatively high performance by stacking disks, which forces customers to pay for capacity they do not need. Even so, their performance cannot match that of all-flash-memory arrays in terms of IOPS or latency. The Dorado2100 G2 meets customers' capacity requirements while reducing their TCO, and falling prices make all-flash-memory arrays increasingly competitive.

For customers who need high performance but only moderate capacity, all-flash-memory arrays can reduce both the initial investment and the TCO.
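The TCO row of Table 3-4 is the purchase price (including the 3-year warranty) plus three years of OPEX. As a minimal sketch, the following Python recomputes TCO and the two price ratios from the table's inputs; the dictionary layout and the function name `three_year_tco` are illustrative, and the Dorado2100 G2 inputs are the table's approximate values.

```python
# Inputs copied from Table 3-4 (Dorado2100 G2 values are approximate).
ARRAYS = {
    "H": {"price": 484_985.78, "opex_per_year": 42_000, "capacity_gb": 268_800, "iops": 109_986.41},
    "I": {"price": 361_416.00, "opex_per_year": 5_600, "capacity_gb": 69_000, "iops": 82_496.08},
    "Dorado2100 G2": {"price": 310_000, "opex_per_year": 2_800, "capacity_gb": 40_000, "iops": 250_000},
}


def three_year_tco(price, opex_per_year, years=3):
    """TCO = purchase price (incl. 3-year warranty) + yearly OPEX x number of years."""
    return price + opex_per_year * years


for name, a in ARRAYS.items():
    tco = three_year_tco(a["price"], a["opex_per_year"])
    print(f"{name}: TCO ${tco:,.2f}, "
          f"${a['price'] / a['capacity_gb']:.2f}/GB, "
          f"${a['price'] / a['iops']:.2f} per SPC-1 IOPS")
```

For array I this gives $361,416.00 + 3 x $5,600 = $378,216.00. The Dorado2100 G2 has the highest $/GB of the three but the lowest three-year TCO and the lowest price per SPC-1 IOPS.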

3.2.2 Improved Customer Service Competitiveness

All-flash-memory arrays can help customers save on investment and solve problems that traditional storage arrays cannot.

During a business expansion, a building materials retailer found that the high latency of its traditional storage arrays kept the I/O wait of the database running on the server consistently high. This significantly delayed every transaction across the business system and limited how far the effective concurrency of the system could grow.

The retailer also found that, beyond a certain level of concurrency, adding more traditional storage arrays did not improve overall system performance.

To resolve this problem, the retailer's IT department consulted its system integrator, which proposed two solutions:

Redesign the IT infrastructure, including servers and storage devices.

Use SSD arrays as the primary storage arrays.

After evaluation, the retailer preferred the second solution because the first would require extensive changes and its outcome would be difficult to assess.

After the Dorado5100 was deployed as the primary storage device in the existing storage system, the retailer found that the wait time for service processing dropped to 18.3% of the original. In actual operation, the wait time of each transaction dropped to 20% of the original, and in the simulated environment, the maximum number of system users increased 20-fold. Based on the simulation, the retailer chose the Dorado5100 to improve the service processing capability of the entire system and support its business expansion.

The previous case shows that all-flash-memory arrays can help customers achieve their business objectives while saving IT investment and reducing system latency, thereby improving their service competitiveness.

For details about this case, see section 4.3.1 "OLTP Case."


3.3 Technical Analysis

An all-flash-memory array is a storage array that uses no HDDs and relies solely on flash memory for data access.

By this definition, all-flash-memory arrays on the market fall into the following three types:

Proprietary structure

All components are placed in a single all-in-one box. Arrays of this kind resemble rack-mounted devices filled with SSDs, in effect larger SSDs. Apart from high performance, these arrays provide no other storage features and are hard to maintain.

Open structure

Arrays of this kind look no different from traditional storage arrays. However, because their system software is developed specifically for flash memory, they deliver high performance as well as a wide range of storage features.

Traditional storage array fully equipped with SSDs

The HDDs of a traditional array are replaced with SSDs. Arrays of this kind deliver ordinary performance but a rich set of functions and features.

A traditional storage array whose HDDs are replaced with SSDs can be called an all-flash-memory array. However, such an array cannot fully exploit the advantages of flash memory.

Figure 3-9 Comparison among three types of all-flash-memory arrays

Figure 3-9 compares the SPC-1 IOPS™ performance of these three types of SSD arrays.

The horizontal axis represents IOPS, and the vertical axis represents latency (ms). The figure shows the latency of each storage array under various IOPS loads. The data in Figure 3-9 comes from the SPC official website.


SPC-1 is an I/O model and benchmark defined by the Storage Performance Council (SPC) to simulate the I/O characteristics of online transaction processing (OLTP) and online analytical processing (OLAP). The SPC-1 benchmark is well known in the storage industry. Generally, the SPC-1 IOPS™ result is below 50k for entry-level to mid-range storage arrays, between 50k and 200k for mid-range to high-end storage arrays, and between 200k and 300k for high-end storage arrays.

Figure 3-9 compares the performance of five all-flash-memory array models, which are categorized as follows:

Proprietary structure: RamSan-630

Open structure: Dorado series

Traditional storage array fully equipped with SSDs: V7000

Figure 3-9 shows how the I/O response latency of each all-flash-memory array changes as IOPS increases.

The comparison shows that:

1. The Dorado5100, Dorado2100 G2, and RamSan-630 deliver better performance than high-end traditional storage arrays.

2. At low IOPS, the RamSan-630 has slightly lower latency than the Dorado series. At high IOPS, the latency of the Dorado series remains low, whereas that of the RamSan-630 rises sharply.

3. The latency of the V7000 (all SSDs) increases linearly with IOPS. The V7000 does not fully exploit the low latency of flash memory.

4. Among these storage arrays, only the Dorado series preserves the low latency of flash memory and outperforms the other types of all-flash-memory arrays.

The Dorado series achieves better performance through rewritten system software. It also inherits the reliability and maintainability advantages of traditional storage arrays and further improves them with designs based on flash memory.

3.3.1 Problems Caused by SSDs

Problem Caused by Low Latency

The relationship among IOPS, concurrent I/Os, and I/O latency is as follows:

IOPS = (Number of concurrent I/Os)/(I/O latency)

The latency of a high-performance enterprise-class SAS HDD is about 5 ms for 4 KB random I/Os.

The latency of a SAS SSD is about 0.2 ms for 4 KB random I/Os.

In Figure 3-10, an HDD and an SSD are each connected directly to a host, and both hosts have the same configuration. Each host issues single I/Os (one outstanding I/O at a time).


Figure 3-10 An HDD and an SSD directly connected to hosts

In the previous figure, the left drive is an HDD, and its IOPS is calculated as follows: 1/5 ms = 200 IOPS

The right drive is an SSD, and its IOPS is calculated as follows: 1/0.2 ms = 5000 IOPS

Now insert a controller between each host and its drive, as shown in Figure 3-11.

Figure 3-11 An HDD and an SSD connected to hosts through controllers

These controllers add processing latency. This latency generally varies with load and is about 0.2 ms for single I/Os. As a result, the IOPS perceived by the hosts changes.


In the previous figure, the IOPS on the left side is calculated as follows: 1/(0.2 ms + 5 ms) = 192 IOPS

The IOPS on the right side is calculated as follows: 1/(0.2 ms + 0.2 ms) = 2500 IOPS

The previous calculation shows that for an HDD, adding a controller increases latency and decreases IOPS only slightly. This is why the IOPS of an HDD array can be estimated by multiplying the IOPS per HDD by the number of HDDs. For an SSD, however, adding a controller doubles the latency and halves the IOPS. Therefore, the IOPS of an all-flash-memory array cannot be estimated by simply multiplying the IOPS per SSD by the number of SSDs.

An all-flash-memory array built by simply replacing HDDs with SSDs therefore cannot fully exploit the high performance of SSDs.
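The single-I/O arithmetic above is the IOPS relationship with one outstanding I/O. The following Python sketch reproduces it; the latency figures are the approximate values from the text, and `iops` is an illustrative helper, not part of any product interface.

```python
CONTROLLER_MS = 0.2                    # controller latency for single I/Os (from the text)
MEDIA_MS = {"HDD": 5.0, "SSD": 0.2}   # 4 KB random I/O latency per drive


def iops(latency_ms, concurrent_ios=1):
    """IOPS = concurrent I/Os / latency, with latency converted from ms to seconds."""
    return concurrent_ios / (latency_ms / 1000)


for drive, media_ms in MEDIA_MS.items():
    direct = iops(media_ms)
    via_controller = iops(media_ms + CONTROLLER_MS)
    print(f"{drive}: direct {direct:.0f} IOPS, via controller {via_controller:.0f} IOPS "
          f"({via_controller / direct:.0%} of direct)")
```

It reports 192 IOPS (96% of direct) for the HDD and 2500 IOPS (50% of direct) for the SSD. Under this simple model, passing a larger `concurrent_ios` scales the result proportionally, which is why concurrency matters once the controller's share of latency is large.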

Performance Difference for Processing Random and Sequential I/Os

HDDs and SSDs differ greatly in performance across I/O types. Table 3-5 compares their performance for random and sequential I/Os.

Table 3-5 Performance difference between an HDD and an SSD for processing random and sequential I/Os

| Drive | Operation | 4 KB random I/O: IOPS | 4 KB random I/O: bandwidth (MB/s) | 512 KB sequential I/O: IOPS | 512 KB sequential I/O: bandwidth (MB/s) |
| --- | --- | --- | --- | --- | --- |
| HDD | Read/Write | Approximately 200 | Approximately 0.8 | Approximately 400 | Approximately 200 |
| SSD | Read | Approximately 20,000 | Approximately 80 | Approximately 500 | Approximately 250 |
| SSD | Write | Approximately 60,000 | Approximately 240 | Approximately 600 | Approximately 300 |

Table 3-5 shows that for an HDD, the bandwidth for sequential I/Os is more than 200 times that for random I/Os. This is why traditional storage arrays use complicated cache algorithms to reorganize the service data delivered by hosts and then access HDDs sequentially to improve system performance.

For an SSD, the bandwidth for sequential writes is only about 1.25 times that for random writes, and the bandwidth for sequential reads is only about 3 times that for random reads.

Therefore, the HDD-oriented cache and I/O scheduling algorithms used by various vendors may not suit all-flash-memory arrays.
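Each bandwidth value in Table 3-5 is roughly IOPS multiplied by block size. A minimal Python sketch, assuming 1 MB = 1024 KB (the table's rounded values mix conventions), recomputes the sequential-to-random ratios; the block size unit cancels out of each ratio, so that assumption does not affect them.

```python
# Approximate IOPS from Table 3-5: (4 KB random, 512 KB sequential)
TABLE_3_5 = {
    "HDD read/write": (200, 400),
    "SSD read": (20_000, 500),
    "SSD write": (60_000, 600),
}


def bandwidth_mb_s(iops, block_kb):
    """Bandwidth = IOPS x block size (1 MB taken as 1024 KB)."""
    return iops * block_kb / 1024


for name, (random_iops, sequential_iops) in TABLE_3_5.items():
    random_bw = bandwidth_mb_s(random_iops, 4)
    sequential_bw = bandwidth_mb_s(sequential_iops, 512)
    print(f"{name}: random {random_bw:.1f} MB/s, sequential {sequential_bw:.1f} MB/s, "
          f"sequential/random = {sequential_bw / random_bw:.2f}x")
```

The ratios come out at 256 for the HDD, 3.2 for SSD reads, and 1.28 for SSD writes. They differ slightly from ratios of the table's rounded bandwidth values but tell the same story: sequential access buys an HDD two orders of magnitude, and an SSD very little.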

Performance Bottleneck

For a traditional storage array, the HDDs are the performance bottleneck. Therefore, the IOPS and bandwidth of the entire storage array can be increased by adding HDDs, and the system software of a traditional storage array is designed to mitigate the bottleneck caused by HDDs.


For an all-flash-memory array, a single SSD delivers tens of thousands of IOPS, so a disk enclosure with 24 slots can reach about one million IOPS. Therefore, the performance bottleneck of an all-flash-memory array lies in the controller, including its CPU processing capability, system bandwidth, and system software design and algorithms.

Because all-flash-memory arrays have different performance bottlenecks from traditional storage arrays, their hardware design must also differ; otherwise, their high performance cannot be fully exploited.
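The shift of the bottleneck can be shown with a one-line model: deliverable IOPS is the smaller of what the drives can supply and what the controller can process. In this Python sketch, `CONTROLLER_LIMIT` is a purely hypothetical figure chosen for illustration, and 40,000 IOPS per SSD is an assumed value within the "tens of thousands" range given above.

```python
CONTROLLER_LIMIT = 300_000  # hypothetical controller IOPS ceiling, for illustration only
SLOTS = 24                  # disk enclosure slots


def enclosure_iops(drive_iops, num_drives=SLOTS, controller_limit=CONTROLLER_LIMIT):
    """Deliverable IOPS is the smaller of the drives' aggregate IOPS and the controller ceiling."""
    aggregate = drive_iops * num_drives
    limited_by = "drives" if aggregate < controller_limit else "controller"
    return min(aggregate, controller_limit), limited_by


print(enclosure_iops(200))     # HDDs at ~200 IOPS each
print(enclosure_iops(40_000))  # SSDs at an assumed 40,000 IOPS each (~960,000 in aggregate)
```

With HDDs the drives limit the enclosure (4,800 IOPS); with SSDs the same hypothetical controller becomes the limit, so adding SSDs no longer raises IOPS unless the controller design changes.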

Limited Number of P/Es

Flash memory supports only a limited number of program/erase (P/E) cycles. Studies show that the failure probability of flash memory increases with the number of P/E cycles.

Although the SSD failure rate can be kept low until the maximum number of P/E cycles is reached, it can be reduced further, and the service life of the entire all-flash-memory array extended, by reducing P/E cycles and preventing hotspot data from concentrating frequent access on a few SSDs.

Currently, many all-flash-memory array vendors provide online deduplication and global wear leveling (WL) to reduce P/E cycles and prevent hotspot areas from wearing out prematurely.
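The idea behind global WL can be sketched as placing each new write on the least-worn SSD in the array, so that a hot logical block does not repeatedly wear the same drive. This Python sketch is a toy model, not Huawei's algorithm: it treats every placed write as one P/E cycle, whereas real SSDs erase whole blocks and also level wear internally.

```python
import heapq


class GlobalWearLeveler:
    """Toy global wear leveling: each new write is placed on the least-worn SSD in the array."""

    def __init__(self, num_ssds):
        self.heap = [(0, ssd) for ssd in range(num_ssds)]  # (P/E count, SSD index)
        heapq.heapify(self.heap)

    def place_write(self):
        """Return the SSD that receives the next write and charge it one P/E cycle."""
        pe_count, ssd = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (pe_count + 1, ssd))
        return ssd

    def wear(self):
        """Current P/E count per SSD."""
        return {ssd: pe for pe, ssd in self.heap}


wl = GlobalWearLeveler(4)
# Even if the host keeps rewriting one hot logical block, placement spreads the wear.
targets = [wl.place_write() for _ in range(10)]
print(targets, wl.wear())
```

Ten writes over four SSDs are spread in rotation, leaving a wear difference of at most one P/E cycle between drives.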

3.3.2 Design Philosophy

The Dorado series all-flash-memory arrays are developed by Huawei. In designing and developing the Dorado series, Huawei R&D personnel followed these principles:

Inherit the advantages of Huawei traditional storage arrays and match customers' existing usage habits. Traditional storage arrays embody accumulated experience in ensuring system reliability and maintainability; for example, all active components are redundant and replaceable online. The Dorado series inherits these features.

Fully consider the performance difference between SSDs and HDDs. SSD performance is two orders of magnitude higher than HDD performance, which shifts the system performance bottleneck. Therefore, performance designs made for HDDs must be re-examined.

Fully consider the failure modes of SSDs and HDDs. The AFR of SSDs is 0.44%, and that of HDDs is 0.6%. Although these statistics show that SSDs are more reliable than HDDs, the two fail in different ways. Therefore, dedicated design and development for SSDs can further reduce the SSD failure rate in an all-flash-memory array.

3.4 Reliability, Service Life, and Performance

After several years of continuous development, SSDs and SSD arrays have improved greatly in reliability, service life, and performance, and now meet the requirements of various enterprise-class storage applications.


3.4.1 Reliability

Reliability Basics

Reliability can be understood in a narrow sense or a broad sense. In the narrow sense, reliability is a device's ability to operate without failure. In the broad sense, it covers a device's probability of failure-free operation, its mean time to repair, and its availability over a long period of operation.

Reliability in this document refers to reliability in the broad sense.

Zero-failure probability

Mean time between failures (MTBF), failure in time (FIT), and annual failure rate (AFR) are used to assess the failure probability of devices.

MTBF is a measure of system reliability. It is the average time, expressed in hours, between consecutive failures of a piece of equipment. A larger MTBF indicates a more reliable device.

FIT measures the number of failures per one billion device hours; for example, 1 FIT = 1 failure in 10^9 device hours. The FIT of a device is the sum of the FITs of its components. A smaller FIT indicates a more reliable device.

AFR is a statistical failure rate based on a large number of samples and is expressed as a percentage. A smaller AFR indicates a more reliable device.

The relationship among MTBF, FIT, and AFR is as follows:

− MTBF = 10^9/FIT

− FIT = (10^9 x AFR)/(365 x 24)
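The MTBF, FIT, and AFR relationships above can be written as three small conversions. In this Python sketch the function names are illustrative, and AFR is handled as a fraction rather than a percentage.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def mtbf_from_fit(fit):
    """MTBF (hours) = 10^9 / FIT."""
    return 1e9 / fit


def fit_from_afr(afr):
    """FIT = 10^9 x AFR / (365 x 24), with AFR as a fraction (0.0029 means 0.29%)."""
    return 1e9 * afr / HOURS_PER_YEAR


def afr_from_fit(fit):
    """Inverse of fit_from_afr."""
    return fit * HOURS_PER_YEAR / 1e9


print(f"{mtbf_from_fit(331):,.0f} hours")  # SSD FIT from Table 3-6
print(f"{afr_from_fit(331):.2%}")
```

For the SSD FIT of 331 used in Table 3-6, this gives an MTBF of about 3,021,148 hours and an AFR of about 0.29%, consistent with the AFR of Huawei SSDs cited later in this chapter.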

Mean time to repair a device

Mean time to repair (MTTR) is a basic measure of the maintainability of repairable items. It is the average time, expressed in hours, that a component or device takes to recover from a failure. In essence, it reflects the fault tolerance capability of a device: a smaller MTTR indicates stronger fault tolerance.

Availability of a device over long-term operation

Availability is the probability that a system works as required during a mission period. It can be calculated by the following formula:

A = MTBF/(MTBF + MTTR)

The formula indicates that increasing MTBF or decreasing MTTR improves the availability of a device.

For a device consisting of multiple repairable, replaceable, and independent components that must all work, its availability is the product of the availabilities of its components:

Availability of a device = Availability of component 1 x Availability of component 2 x ... x Availability of component N

In telecommunications, 99.999% availability allows no more than about 5 minutes of downtime per year (0.001% of 525,600 minutes is about 5.26 minutes).
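The availability formula and the five-nines downtime budget can be checked with a short Python sketch. The helper names are illustrative, and the series product applies only when every component must work; redundant pairs need the parallel form used in the availability analysis later in this chapter.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours, mttr_hours):
    """A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series_availability(component_availabilities):
    """Every component must work, so availabilities multiply."""
    return prod(component_availabilities)


def downtime_minutes_per_year(avail):
    """Expected unavailable minutes in a 365-day year."""
    return (1 - avail) * HOURS_PER_YEAR * 60


print(f"{availability(400_000, 0.5):.5%}")                 # controller row of Table 3-6
print(f"{downtime_minutes_per_year(0.99999):.2f} min/yr")  # five nines
```

The first line matches the controller availability of 99.99988% in Table 3-6; the second shows that 99.999% availability leaves about 5.26 minutes of downtime per year.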


SSD Reliability

SSDs are the primary components of an all-flash-memory array. Unlike HDDs, SSDs have no mechanical components, so their failure modes differ from those of HDDs. SSD failures are generally easier to predict and manage, which is why the AFR of SSDs is lower than that of HDDs.

For details about SSD reliability, see the HUAWEI SSD Technical White Paper.

Hardware Reliability

The Dorado series all-flash-memory arrays use a fully redundant hardware design. All active components are redundant, which eliminates single points of failure and allows components to be replaced online.

Figure 3-12 Components in a controller enclosure of Dorado5100

Figure 3-12 shows the components in a Dorado5100 controller enclosure. These components are described as follows:

1: system enclosure

The passive design ensures high system reliability.

2: controllers

The two controllers back each other up and are field replaceable.

3: backup battery units (BBUs)

The four BBUs protect against unexpected power failures.

4: fans

The three fans, with 16-level speed control, ensure effective heat dissipation.

5: power modules

The four power supplies greatly reduce the risk of system power failures.

6: interface card slots

The interface card slots support Fibre Channel and SAS interface cards.


Software Reliability

In addition to common measures such as active-active controllers, RAID protection, global hot spare, and online upgrades, Dorado series all-flash-memory arrays take dedicated measures based on SSD characteristics and failure modes to further improve system reliability.

Statistics and analysis of faulty SSDs from Huawei and other vendors show two main causes of SSD failure:

Flash chip failure

SSD hardware defects

Although flash chip failures cannot be eliminated, RAID protection and repair can reduce the SSD failures they cause.

The HUAWEI Dorado series uses self-developed SSDs. Each SSD generation enhances capability and eliminates the design and implementation defects of the previous generation. Currently, the Dorado series uses third-generation SSDs.

The Dorado series all-flash-memory arrays implement the following features to improve system reliability:

Bad block repair. Failed areas on SSDs are identified, and their data is repaired using RAID.

Global capacity redundancy. The redundant space of other SSDs in the array is used to cope with the failure of multiple flash chips on individual SSDs.

Global anti-wear leveling. Global WL prolongs the service life of all SSDs, while global anti-wear leveling prevents multiple SSDs in the same RAID group from failing at the same time.

SSD staggered running. Many software errors are caused by counter overflows, which are hard to discover during development. With SSD staggered running, the running periods of all SSDs are staggered so that their counters do not overflow at the same time, preventing batch failures caused by counter overflows.
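The benefit of staggered running can be illustrated with a hypothetical firmware counter. This Python sketch assumes a 32-bit counter that increments once per millisecond and wraps after about 49.7 days; the counter width, tick rate, and 24-hour offsets are assumptions for illustration, since the document does not describe how Dorado staggers running periods.

```python
COUNTER_BITS = 32            # hypothetical firmware counter width
TICKS_PER_HOUR = 3_600_000   # hypothetical: counter increments once per millisecond
WRAP_HOURS = 2 ** COUNTER_BITS / TICKS_PER_HOUR  # about 1193 hours (~49.7 days)


def overflow_time(start_offset_hours):
    """Wall-clock hour at which an SSD's counter first wraps, given when it started counting."""
    return start_offset_hours + WRAP_HOURS


# All SSDs started together: every counter wraps at the same instant (batch failure risk).
simultaneous = {overflow_time(0) for _ in range(6)}
# Staggered running: each SSD's running period is offset by 24 hours (hypothetical spacing).
staggered = [round(overflow_time(24 * i)) for i in range(6)]
print(len(simultaneous), staggered)
```

When all six SSDs start together, their counters share a single overflow time; with staggered running periods, the overflows fall 24 hours apart, so a counter-overflow defect would affect the SSDs one at a time instead of all at once.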

The industry-average AFR of SSDs is about 0.44%. The AFR of Huawei SSDs is only 0.29%, about two-thirds of the industry average.

Availability Analysis

The following analyzes and computes the availability of a Dorado series all-flash-memory array. Table 3-6 lists the reliability data of components in a Dorado series all-flash-memory array.

Table 3-6 Reliability data of components in a Dorado series all-flash-memory array

| Item | Component | FIT | MTBF (hours) | MTTR (hours) | Availability (%) |
| --- | --- | --- | --- | --- | --- |
| Controller enclosure | Controller | 2500 | 400,000 | 0.5 | 99.99988 |
| Controller enclosure | Backplane | 150 | 6,666,666.7 | 4 | 99.99994 |
| Controller enclosure | Fan | 1000 | 1,000,000 | 0.1 | 99.99999 |
| Controller enclosure | Power supply | 1000 | 1,000,000 | 0.1 | 99.99999 |
| Controller enclosure | BBU | 1000 | 1,000,000 | 0.1 | 99.99999 |
| Disk enclosure | Expansion module | 400 | 2,500,000 | 0.2 | 99.99999 |
| Disk enclosure | Backplane | 150 | 6,666,666.7 | 4 | 99.99994 |
| Disk enclosure | Fan | 1000 | 1,000,000 | 0.1 | 99.99999 |
| Disk enclosure | Power supply | 1000 | 1,000,000 | 0.1 | 99.99999 |
| SSD | Disk unit | 331 | 3,021,148 | 1 | 99.99997 |

Analyzing the availability of a Dorado series all-flash-memory array involves the controller enclosure, disk enclosures, and RAID groups:

Availability of a Dorado series all-flash-memory array = Availability of the controller enclosure x Availability of disk enclosures x Availability of RAID groups

For a simplified calculation, the following assumes that all redundant components are in 1+1 redundancy.

Availability of the controller enclosure

Availability of dual controllers (A1) = 1 – (1 – Availability of a single controller) x (1 – Availability of a single controller) > 99.99999%

Availability of the backplane (A2) = 99.99994%

Availability of dual fans (A3) = 1 – (1 – Availability of a single fan) x (1 – Availability of a single fan) > 99.99999%

Availability of dual power supplies (A4) = 1 – (1 – Availability of a single power supply) x (1 – Availability of a single power supply) > 99.99999%

Availability of dual BBUs (A5) = 1 – (1 – Availability of a single BBU) x (1 – Availability of a single BBU) > 99.99999%

Availability of the controller enclosure = A1 x A2 x A3 x A4 x A5 ≈ 99.99994%

Because a redundant pair is unavailable only when both members fail, A1, A3, A4, and A5 are extremely close to 100%, so the non-redundant backplane dominates the result.

Availability of a disk enclosure

Availability of dual expansion modules (A1) = 1 – (1 – Availability of a single expansion module) x (1 – Availability of a single expansion module) > 99.99999%

Availability of the backplane (A2) = 99.99994%

Availability of dual fans (A3) = 1 – (1 – Availability of a single fan) x (1 – Availability of a single fan) > 99.99999%

Availability of dual power supplies (A4) = 1 – (1 – Availability of a single power supply) x (1 – Availability of a single power supply) > 99.99999%

Availability of a disk enclosure = A1 x A2 x A3 x A4 ≈ 99.99994%

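Availability of a RAID group

The following assumes that every six SSDs form a RAID 5 group, which tolerates the failure of only one SSD. The group becomes unavailable only if two of its six SSDs are down at the same time, and C(6, 2) = 15 is the number of possible SSD pairs:

Availability of a RAID group = 1 – (1 – Availability of a single SSD) x (1 – Availability of a single SSD) x C(6, 2) > 99.99999%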


Availability of a Dorado all-flash-memory array

If a Dorado5100 all-flash-memory array is equipped with 1 controller enclosure, 4 disk enclosures, 96 SSDs, and 16 RAID groups:

Availability of the Dorado5100 all-flash-memory array = Availability of the controller enclosure x (Availability of a disk enclosure)^4 x (Availability of a RAID group)^16 = 99.99970%

The previous calculation shows that the hardware of a Dorado series all-flash-memory array delivers enterprise-class availability of 99.999% or higher.
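The availability chain above can be reproduced in a few lines of Python from the MTBF and MTTR values in Table 3-6, under the same 1+1 redundancy and six-SSD RAID 5 assumptions. Carrying unrounded values through the chain shows that the single backplane in each enclosure dominates the result. Variable and function names are illustrative.

```python
from math import comb, prod


def avail(mtbf_hours, mttr_hours):
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_pair(a):
    """1+1 redundancy: the pair is down only if both members are down."""
    return 1 - (1 - a) ** 2


# (MTBF, MTTR) in hours, from Table 3-6
controller = avail(400_000, 0.5)
backplane = avail(6_666_666.7, 4)
fan = power = bbu = avail(1_000_000, 0.1)
expansion = avail(2_500_000, 0.2)
ssd = avail(3_021_148, 1)

ctrl_enclosure = prod([redundant_pair(controller), backplane, redundant_pair(fan),
                       redundant_pair(power), redundant_pair(bbu)])
disk_enclosure = prod([redundant_pair(expansion), backplane, redundant_pair(fan),
                       redundant_pair(power)])
# RAID 5 over six SSDs: down only if two SSDs are down at once (C(6, 2) = 15 pairs)
raid_group = 1 - (1 - ssd) ** 2 * comb(6, 2)

array = ctrl_enclosure * disk_enclosure ** 4 * raid_group ** 16
for name, a in [("controller enclosure", ctrl_enclosure), ("disk enclosure", disk_enclosure),
                ("RAID group", raid_group), ("Dorado5100 array", array)]:
    print(f"{name}: {a:.5%} (unavailability {1 - a:.1e})")
```

The controller and disk enclosures each come out at about 99.99994%, the RAID groups are effectively 100%, and the whole Dorado5100 configuration at about 99.99970%.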

Network Reliability

Dual-Switch Networking and UltraPath Software

HUAWEI OceanStor Dorado series SSD arrays use an active-active controller architecture and support dual-switch networking. As shown in Figure 3-13, Dorado series SSD arrays provide four paths that back one another up.

Figure 3-13 Dual-switch networking


The OceanStor Dorado series SSD array uses Huawei's self-developed UltraPath as its multipathing software. UltraPath is installed on each server to provide multiple paths from the server to the SSD array. This multipath design improves both reliability and performance.

UltraPath performs the following functions:

a. Presents the multiple paths to each LUN to the operating system as a single disk.

b. Automatically switches services over from the active path to a standby path when the active path fails.

This function is called failover.

c. Automatically switches services back to the active path after it recovers.

This function is called failback. Failover and failback eliminate single points of failure on paths.


d. Balances I/Os among reachable paths using the shortest queue first, least load first, or round robin scheduling algorithm.

UltraPath runs on Linux, AIX, and Windows operating systems, and on Hyper-V and Xen virtual machines.
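The failover, failback, and load-balancing behavior described above can be sketched as a toy path selector. This Python sketch is not UltraPath's implementation or interface: the class, method, and path names (such as "A1" for a port on controller A) are hypothetical, and only the round robin and shortest queue first policies are shown.

```python
from itertools import cycle


class MultipathLun:
    """Toy multipath model: one LUN reachable over several paths (path names are hypothetical)."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.up = {p: True for p in self.paths}        # path state
        self.queue_depth = {p: 0 for p in self.paths}  # outstanding I/Os per path
        self._rr = cycle(self.paths)

    def set_path_state(self, path, is_up):
        """is_up=False models a path failure (failover); True models recovery (failback)."""
        self.up[path] = is_up

    def _live_paths(self):
        live = [p for p in self.paths if self.up[p]]
        if not live:
            raise RuntimeError("no reachable path to the LUN")
        return live

    def pick_round_robin(self):
        self._live_paths()  # raises if every path is down
        while True:
            p = next(self._rr)
            if self.up[p]:
                return p

    def pick_shortest_queue(self):
        return min(self._live_paths(), key=lambda p: self.queue_depth[p])


lun = MultipathLun(["A1", "A2", "B1", "B2"])  # e.g. two ports on each of controllers A and B
lun.set_path_state("A1", False)                # path failure: I/Os fail over to the rest
print([lun.pick_round_robin() for _ in range(3)])
```

With path A1 down, round robin skips it (failover); calling `set_path_state("A1", True)` returns it to the rotation (failback).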

Snapshot

Snapshot technology creates a copy of the source LUN's data as it was at a specific point in time, without interrupting services running on the source LUN. The copy is available immediately after creation, and reading or writing it has no impact on the source data. Snapshots support online backup, data analysis, and application testing.
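One common way to make a snapshot available instantly without copying data up front is copy-on-write. The document does not say which technique Dorado snapshots use, so this Python sketch assumes copy-on-write on a LUN modeled as a block map; all class and method names are hypothetical.

```python
class Lun:
    """Toy LUN as a block map, with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # block address -> data
        self.snapshots = []

    def create_snapshot(self):
        snap = Snapshot(self)       # instant: no data is copied at creation time
        self.snapshots.append(snap)
        return snap

    def write(self, addr, data):
        # Copy-on-write: preserve the old block in each snapshot that has not kept it yet.
        for snap in self.snapshots:
            if addr not in snap.own:
                snap.own[addr] = self.blocks.get(addr)
        self.blocks[addr] = data

    def read(self, addr):
        return self.blocks.get(addr)


class Snapshot:
    def __init__(self, source):
        self.source = source
        self.own = {}  # preserved originals plus writes made to the snapshot itself

    def read(self, addr):
        if addr in self.own:
            return self.own[addr]
        return self.source.read(addr)  # unchanged since the snapshot: share the source block

    def write(self, addr, data):
        self.own[addr] = data          # private to the snapshot; the source is untouched


lun = Lun({0: "a", 1: "b"})
snap = lun.create_snapshot()
lun.write(0, "A")          # host keeps writing to the source LUN
snap.write(1, "test")      # e.g. application testing on the snapshot
print(lun.read(0), snap.read(0), lun.read(1), snap.read(1))  # A a b test
```

The host's write to the source does not change what the snapshot sees, and the test write to the snapshot does not touch the source.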

Asynchronous remote replication

Remote replication is a form of data mirroring and can be synchronous or asynchronous. HUAWEI OceanStor Dorado series SSD arrays support asynchronous remote replication, which maintains data copies at two or more sites and thereby eliminates the risk of data loss from a single-site failure. Asynchronous remote replication uses snapshot technology to capture consistent data instantly and to provide point-in-time copies for recovery after faults.

Figure 3-14 Schematic diagram of asynchronous remote replication
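Snapshot-based asynchronous replication can be sketched as periodic cycles: capture a point-in-time image, send only the blocks that changed since the previous cycle, and let host writes complete locally in between. In this Python sketch a full dictionary copy stands in for an array snapshot, the remote "transfer" is a local dictionary update, and all names are hypothetical.

```python
class AsyncReplicator:
    """Toy snapshot-based asynchronous replication of a LUN modeled as a block map."""

    def __init__(self, primary):
        self.primary = primary   # dict: block address -> data at the primary site
        self.secondary = {}      # remote copy, updated only when a cycle completes
        self.last_image = {}     # point-in-time image shipped in the previous cycle

    def run_cycle(self):
        """Take a point-in-time image and send only blocks changed since the last one."""
        image = dict(self.primary)  # a full copy stands in for an array snapshot
        delta = {addr: data for addr, data in image.items()
                 if self.last_image.get(addr) != data}
        self.secondary.update(delta)  # "transfer" to the remote site
        self.last_image = image
        return len(delta)


lun = {0: "a", 1: "b"}
rep = AsyncReplicator(lun)
rep.run_cycle()                 # initial synchronization: 2 blocks
lun[1] = "B"                    # host writes complete locally without waiting
print(rep.secondary[1])         # remote copy still shows the last recovery point: b
print(rep.run_cycle(), rep.secondary == lun)  # 1 changed block shipped; copies match
```

Between cycles the remote copy lags behind the primary but always holds a consistent recovery point, and each cycle ships only the changed blocks.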

3.4.2 Service Life

The service life of SSDs is described in section 2.3.6 "SSD Service Life." Table 2-3 shows that the service life of Huawei SSDs under heavy workloads meets enterprise requirements.

SSD workloads are defined by industry standards. The Joint Electron Device Engineering Council (JEDEC) is a global leader in developing open standards and publications for the microelectronics industry. JEDEC is an independent semiconductor engineering trade organization and standardization body whose members are companies from around the world. Currently, JEDEC focuses on developing open standards for solid state technologies.

JEDEC's JESD218 and JESD219A standards define a workload model for calculating the service life of SSDs. Under this workload model, the service life of Huawei SSDs meets the requirements of various enterprise-class applications.

In addition, Dorado series all-flash-memory arrays provide several features that extend the service life of the entire array:


Global WL. WL is implemented across the entire storage array to prevent service hotspots from wearing out some SSDs prematurely.

Online deduplication. The amount of data written to flash memory is reduced to decrease write amplification (in development).
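Inline deduplication typically fingerprints each block and writes only content that has not been seen before, so duplicate data costs a metadata update instead of a flash write. The document notes that Dorado's online deduplication is still in development and does not describe its design; this Python sketch assumes SHA-256 fingerprints, ignores hash collisions and space reclamation, and uses a hypothetical class name.

```python
import hashlib


class DedupStore:
    """Toy inline deduplication: blocks with identical content are written to flash once."""

    def __init__(self):
        self.fingerprints = {}  # SHA-256 digest -> physical block index
        self.blocks = []        # physical blocks actually written to flash
        self.mapping = {}       # logical block address -> physical block index

    def write(self, lba, data):
        digest = hashlib.sha256(data).digest()
        if digest not in self.fingerprints:       # new content: one physical write
            self.fingerprints[digest] = len(self.blocks)
            self.blocks.append(data)
        self.mapping[lba] = self.fingerprints[digest]  # duplicate: metadata update only

    def read(self, lba):
        return self.blocks[self.mapping[lba]]


store = DedupStore()
for lba in range(8):
    store.write(lba, bytes(4096))       # eight identical all-zero 4 KB blocks
store.write(8, b"x" * 4096)
print(len(store.mapping), "logical blocks,", len(store.blocks), "physical writes")
```

Nine logical writes result in only two physical writes, which is the kind of write reduction the feature aims for.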

3.4.3 Performance

All models of Dorado all-flash-memory arrays have been tested in the SPC-1 benchmark. Figure 3-15 compares the SPC-1 results of Dorado series all-flash-memory arrays with those of traditional mid-range and high-end storage arrays from other vendors.

Figure 3-15 Test result comparison between all-flash-memory arrays and traditional storage arrays

The data in Figure 3-15 comes from the SPC official website. The horizontal axis represents IOPS, and the vertical axis represents latency (ms). The figure shows the latency of each storage array under various IOPS loads.

According to this figure:

1. All-flash-memory arrays are designed and developed for flash memory and achieve high IOPS through low latency, whereas traditional storage arrays achieve high IOPS by stacking disks.

2. The Dorado2100 delivers the same IOPS as traditional mid-range storage arrays with lower latency.

3. A traditional mid-range storage array fully equipped with SSDs can achieve relatively high IOPS, but its latency cannot match that of all-flash-memory arrays.

4. The IOPS of the Dorado5100 and Dorado2100 G2 is much higher than that of traditional high-end storage arrays, and the latency of the Dorado series is one to two orders of magnitude lower than that of traditional high-end storage arrays.

To provide low latency and high IOPS, Dorado series all-flash-memory arrays are designed and developed specifically for flash memory, using the following techniques:

Rewriting the cache algorithm. Because of the performance differences between SSDs and HDDs described in section 3.3.1 "Problems Caused by SSDs," cache algorithms designed for HDDs do not suit SSDs. Dorado series all-flash-memory arrays use page tables to organize the cache and simplify the data flush and eviction algorithms, reducing processing time and CPU usage to achieve lower latency and higher IOPS.

Physically separating the data plane from the management plane. Hardware features are used to accelerate service data processing, freeing the CPU to handle higher IOPS.

Global GC (in development). The cache buffers host service data while each SSD in turn is given a time window during which it receives no write I/Os. This allows GC to proceed smoothly and reduces its impact on array performance.
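The global GC scheme above is essentially a round-robin schedule. The sketch below models it under hypothetical parameters (the SSD count, window length, and all names are assumptions): it computes which SSD is in its GC window at a given time and holds host writes for that SSD in cache until the window closes. The paper does not say how windows are sized or how held writes are drained, so those parts are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GlobalGcScheduler:
    """Hypothetical round-robin GC window scheduler (illustration only)."""
    num_ssds: int
    window_s: float                       # length of each SSD's GC window (assumed)
    held: Dict[int, List[Tuple[int, bytes]]] = field(default_factory=dict)

    def gc_ssd_at(self, t_s: float) -> int:
        # Windows rotate through SSD 0, 1, ..., num_ssds - 1, then repeat.
        return int(t_s // self.window_s) % self.num_ssds

    def route_write(self, t_s: float, ssd: int, lba: int, data: bytes) -> str:
        # The SSD currently in its GC window receives no writes; cache holds them.
        if ssd == self.gc_ssd_at(t_s):
            self.held.setdefault(ssd, []).append((lba, data))
            return "cache"
        return "ssd"

    def drain(self, ssd: int) -> List[Tuple[int, bytes]]:
        # Placeholder: called after the SSD's window closes to release held writes.
        return self.held.pop(ssd, [])
```

Because only one SSD is excluded from writes at any moment, the rest of the array keeps serving I/O while that drive collects garbage.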



4 Experience

4.1 Application Analysis of All-Flash-Memory Arrays

Item: Traditional storage array equipped with SSDs

Feature: HDDs perform best with large-block sequential read and write I/Os, but most application I/Os are random. Because traditional storage arrays are designed for HDDs, their cache and I/O scheduling algorithms must consolidate random I/Os into large-block sequential I/Os, which increases algorithm complexity and latency. A traditional storage array treats SSDs as HDDs, so the SSDs' advantage in processing random I/Os cannot be fully exploited, and the complex algorithms offset their low-latency advantage.

Advantage: Tiered storage, reducing overall cost

Disadvantage: Although applications can be accelerated, the advantages of SSDs cannot be fully exploited. Because the reliability and wear leveling (WL) mechanisms are not designed for SSDs, the SSDs' reliability and service life cannot match those of dedicated solid-state storage.

Application scenario: A few applications requiring tiered storage and acceleration

Item: PCIe SSD

Feature: PCIe SSDs are installed in servers, which places the SSDs closest to the CPU. This method makes maintenance and capacity expansion more complex. In addition, the storage space of these SSDs cannot be shared, and the convenience and reliability designs of external storage cannot be applied to them.

Advantage: Highest-performance storage deployment method

Disadvantage: Single points of failure reduce reliability. Services must be stopped for maintenance. Storage capacity cannot be expanded or shared.

Application scenario: Non-core applications and distributed computing systems

Item: All-flash-memory array

Feature: SSDs are installed in SSD-based all-flash-memory arrays, so their performance advantages can be fully exploited and their reliability and service life are maximized. All-flash-memory arrays feature high reliability and maintainability.

Advantage: 1. High performance 2. Robust reliability 3. Low cost per unit of performance

Disadvantage: High cost per unit of capacity

Application scenario: Core services requiring high performance and reliability, especially databases

4.2 Typical Applications in Target Industries

Government:

External e-government network: A database that supports government transparency and public services. These customers expect to accelerate public queries and service processing, improve the efficiency of public services, and enhance public satisfaction. The Dorado series is used as the primary storage array for the database.

Internal e-government network: Used for internal government work and communication between government departments. These customers expect to improve their work efficiency. The Dorado series is used as the primary storage array for the database.

Virtual desktop infrastructure (VDI) for work efficiency: These customers expect a VDI that offers robust performance, high density, low cost, and energy savings. The Dorado series is used as the primary storage array for the VDI.

Public security bureau:

Population, entry and exit, and vehicle information resource databases: Store information about the population, people entering and exiting the country, goods, and vehicle registrations, violations, and annual inspections. Other applications extract the data they need from these basic databases. These customers expect to accelerate information access and queries and to handle a large volume of concurrent access. Serving as the primary storage array for these databases, the Dorado series improves the efficiency of applications built on the resource databases and supports a large volume of high-speed access.

Information judgment and analysis data warehouse: A large platform on which information extracted from the basic databases is processed in batches. These customers expect to shorten information analysis and improve analysis efficiency. Serving as the storage array for the data warehouse, the Dorado series improves analysis efficiency and shortens analysis duration.

Human resources and social security:

Personnel information management system

Information comparison and inspection service system

Service information comparison and inspection service system

Electronic record management system






Employment management subsystem

Employment service subsystem

Basic social security funds monitoring subsystem

Social security monitoring, inspection, and management subsystem

Social security management subsystem for medical payment

Urban and rural residents social security management subsystem

Employees social security management subsystem

Social security card certification subsystem

These are the information databases for human resources, social security, and employment, together with the service systems built on them. In the human resources and social security systems of the Ministry of Human Resources and Social Security and its provincial and municipal bureaus, both online transaction processing (OLTP) and online analytical processing (OLAP) workloads exist. OLTP is used to query and access the information databases, and OLAP is used to analyze, report on, and integrate database information. These customers have the following requirements:

1. Fast public services

2. Support for a large volume of concurrent access

3. Efficient reporting, auditing, batch processing, and data integration

Serving as the primary storage array, the Dorado series accelerates service processing, supports more concurrent access, and shortens reporting and batch processing, improving customer satisfaction and work efficiency at human resources and social security bureaus.

Finance:

Treasury payment system and budgeting system: Treasury payment is the most important service in the financial system; the treasury pays departmental expenditures after approval. The budgeting system processes and analyzes various budget reports and requires high performance, especially at the beginning and end of every month and year, when it performs batch data processing and prepares reports. These customers expect to shorten financial and budget report generation and batch processing. Serving as the primary storage array for the OLAP database, the Dorado series improves the efficiency of batch processing and report generation.

Tax:

Tax collection system: The most important system in the tax industry, responsible for tax collection, statistics, analysis, and reporting. These customers expect to shorten the OLAP processing of tax collection statistics, analysis, and reporting. The Dorado series is used as the primary storage array for the tax collection system.

Customs:

E-customs port system: Responsible for customs clearance registration and inspection, customs declaration forms, and the integration, analysis, and reporting of customs declaration data. The customs clearance system uses both OLTP and OLAP: OLTP processes customs clearance documents, and OLAP integrates and processes data in batches at night. These customers have the following requirements:

1. Faster customs clearance and more efficient service.

2. Shorter data integration, so that service data generated during the day can be processed the same night.

Serving as the primary storage array, the Dorado series accelerates customs clearance and shortens data integration.

E-hospital:

Hospital information system (HIS): Processes patients' medical histories, doctor appointments, and payments.

Laboratory information system (LIS): Processes patients' medical histories, test reports, and medication use. These customers expect to accelerate HIS and LIS processing.

The Dorado series is used as the primary storage array for these two systems.

Education:

Online exam paper marking system: Enables teachers to read and mark electronic exam papers from a database and then enter scores. These customers expect to accelerate marking to cope with the large number of exam papers after the college entrance examination. The Dorado series is used as the primary storage array for the system.

VDI teaching: Uses VDI to deploy teaching systems in schools, especially primary and secondary schools. These customers expect a teaching system that offers robust performance, high density, low cost, and energy savings.

Grid:

Advanced metering infrastructure (AMI): An architecture of hardware and software for automated, two-way communication between a smart utility meter with an IP address and a utility company. The goal of AMI is to give utility companies real-time data about power consumption and to allow customers to make informed choices about energy usage based on the price at the time of use. The data is typical database data: its volume is small and it is generated once a day. Gateway energy meters change frequently. These customers expect to improve system performance. The Dorado series is used as the primary storage array for the database.

Finance:

Bank card system, loans, and deposits: A bank's core services and its bank card, loan, and deposit management systems require fast transactions and high concurrency.

Report, finance, and data warehouse: Reporting, analysis, data mining, and auditing require analyzing large amounts of data in a short time.

Customer relationship management (CRM) system: Manages customers' detailed information, assets, credit, and bad records, and can filter, collect statistics on, analyze, and mine customer information in different dimensions. These customers require fast transactions, high concurrency, and short analysis times.

Retail:

Sales system: Manages the sales records, inventory forecasting, and cargo scheduling of each outlet. OLTP performance determines transaction speed and whether the system can support highly concurrent access during peak hours.


Business intelligence (BI): Analyzes sales volumes, markets, and customer behavior so that business plans can be formulated based on the analysis. OLAP capability and data analysis speed determine how efficiently customer cargo is scheduled and how quickly the retailer can respond to the market.

Voyage transportation and logistics:

Voyage transportation management system: Relies on the OLAP system to process order data and generate warehousing, transport, and routing plans at night. If these plans are not generated in time, costs rise and profits fall. Therefore, the system has strict requirements on nightly batch processing duration.

Petroleum:

Seismic data processing system: Requires high-performance computing to simulate and compute seismic data. Large amounts of data must be computed and scheduled, which requires high-performance storage devices.

Oil pipeline Enterprise Resource Planning (ERP): Efficient production depends on efficient ERP and production management services running on the OLAP system.

Finished product, wholesale, and retail: Finished product management, bulk accounting, inventory forecasting, cargo scheduling, and market trend analysis require an integrated marketing system. Combining the OLTP system with the OLAP system allows reports and scheduling plans to be generated rapidly, saving costs.

4.3 Typical Cases

4.3.1 OLTP Case

A building materials retailer reduced its transaction waiting time to 20% of the original, improved customer satisfaction, increased its maximum number of concurrent users 20-fold, and achieved smooth service growth.

The retailer is Germany's largest building materials retailer. In 2001, it had over 110 stores in seven Western European countries and established a strategic relationship with Kingfisher. In 2012, it had more than 200 stores in Eastern Europe, Asia, and Australia.

Customer Pain Points: Excessive transaction waiting time prevents the number of concurrent users from growing.

In German-speaking countries, including Germany, Switzerland, and Austria, the retailer uses unified data centers for online transactions. The online transaction system, SAP, runs on an Oracle 11g database and uses eight IBM 3850 servers, VMware virtual machines (VMs), and an IBM DS8700. During peak hours, the system processes an average of 500 transactions per minute. Each transaction takes 10 seconds, and an average of 3.5 people wait at each cash register.

In early 2012, the number of stores increased by 25%, and the system had to process 600 transactions per minute during peak hours. In practice, the system could not reach 600 transactions per minute at peak, and transaction time increased to 30 seconds, with an average of 7.5 people waiting at each cash register. Customer complaints increased, and about 8% of customers abandoned their purchases because of the long wait. Monthly sales losses totaled 10 million euros.






In desperation, the retailer closed one-sixth of the cash registers in each store so that the system could process 500 transactions per minute at 10 seconds per transaction. As a result, an average of 5.1 people waited at each cash register, and customer complaints and abandoned purchases persisted.

Huawei Solution: The Dorado5100 is used to store the database of the SAP system.

Analysis of each database event showed that as service pressure increased, the I/O latency of the original storage system rose sharply. During peak hours, IOPS reached 300,000 and average I/O latency reached 10 ms. Because of this high latency, 97% of database run time was spent waiting for I/O, during which the CPU sat idle, greatly prolonging service processing. To improve database performance, latency had to be reduced at loads of 300,000 IOPS and above.
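The reasoning above can be checked with a simple model. Assume, as a simplification the paper does not state, that database run time splits into non-I/O work that storage does not affect and I/O wait that scales linearly with average I/O latency. The sketch below applies that assumption to this case's figures (97% I/O wait, about 10 ms before, within 1 ms after); the function name is hypothetical.

```python
def relative_runtime(io_wait_fraction: float, old_latency_ms: float,
                     new_latency_ms: float) -> float:
    """Run time after a latency change, as a fraction of the original run time.

    Simplifying assumption: non-I/O time is fixed and I/O wait time scales
    linearly with average I/O latency (queueing and CPU contention ignored).
    """
    if not 0.0 <= io_wait_fraction <= 1.0:
        raise ValueError("io_wait_fraction must be in [0, 1]")
    non_io_part = 1.0 - io_wait_fraction
    io_part = io_wait_fraction * (new_latency_ms / old_latency_ms)
    return non_io_part + io_part


# Figures from this case: 97% I/O wait, 10 ms before, 1 ms after.
print(f"{relative_runtime(0.97, 10.0, 1.0):.1%}")
```

The model gives 0.03 + 0.97 × 0.1 = 0.127, or roughly 13% of the original run time. This is only a rough estimate, but it is in the same range as the 18.3% (TPC-C) and 20% (production) waiting-time figures reported for this case; the model ignores everything except I/O wait.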

In July 2012, the Dorado5100 replaced the original IBM DS8700 as the online storage device of the SAP system, and the DS8700 was reassigned to testing, backup, and office applications. The Dorado5100 keeps latency within 1 ms while processing 600,000 IOPS.

In TPC-C tests, the Dorado5100 reduced service processing waiting time to 18.3% of the original. In production, the waiting time for each order fell to 2 seconds, 20% of the original. In TPC-C tests, the Dorado5100 also increased the number of supported concurrent users 20-fold; in production, it processes 800 orders per minute.

Benefits: Services are expanded economically, effectively, and conveniently.

The LVM function of AIX allowed the data to be migrated within 15 minutes. The transaction system now processes 800 transactions per minute at 2 seconds per transaction, and an average of 1.3 people wait at each cash register. Customers no longer complain about waiting time or abandon transactions because of waiting, and the retailer no longer needs to close cash registers to keep waiting times acceptable.

In addition, with the Dorado5100, the number of concurrent transactions increased by 85%. The transaction system can process 85% more transactions without additional servers, VMs, software licenses, or service and maintenance costs. For the retailer, every 10 new stores save about 350,000 euros in IT costs.

4.3.2 OLAP Case

A voyage transportation company reduced its batch processing duration from 155 minutes to 15 minutes (9.6% of the original) and successfully expanded its services to North America.

This company is a large voyage transportation company whose services cover Europe, Africa, North America, and South America. Its services include shipping, warehousing, land transportation, cargo handling, and ship management.






Customer Pain Point: Nightly batch processing takes too long, limiting service processing capability.

The company uses its self-developed MAP system, which runs on an Oracle 11g database and integrates OLTP and OLAP. The MAP system currently uses two IBM P750 midrange computers, and its storage system uses IBM SAN Volume Controller (SVC) to manage one IBM DS300 and one DS4800. During daytime business hours, the MAP system processes OLTP services and orders, including reservations for land transportation, loading and unloading, warehouses, containers, ships, customs, and insurance. During non-business hours at night, OLAP data is consolidated and backed up for planning and scheduling land transportation, warehouses, containers, and ships. This processing must be completed before 6:00 a.m. the next day to keep services running smoothly.

Almost 100,000 transactions must be handled each day. At 00:15, the MAP system starts batch processing, such as sorting orders, compiling statistics, and generating business plans for land transportation, warehousing, and shipment. Batch processing must be completed within 3 hours so that, after plan delivery and data backup, nightly operations do not affect the next day's services. Batch processing currently takes 155 minutes, so services run properly.

In early 2012, the company planned to expand its services to North America, and daily orders were expected to reach 150,000. At that volume, the MAP system could not process the day's transactions within 3 hours. Each day of delay in report output would cost the company nearly 100,000 euros.
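The conclusion that 150,000 daily orders cannot be processed in time follows from a simple projection, assuming that batch duration scales linearly with the number of transactions (an assumption the paper leaves implicit). The sketch below applies it to the 155-minute baseline and to the 15-minute Dorado5100 result, and checks each against the 3-hour window starting at 00:15. The function names are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW_MIN = 180  # batch processing must finish within 3 hours


def projected_minutes(base_minutes: float, base_tx: int, new_tx: int) -> float:
    """Linear projection of batch duration (assumes constant cost per transaction)."""
    return base_minutes * new_tx / base_tx


def finish_time(start: str, minutes: float) -> str:
    """Clock time (HH:MM) at which a batch starting at `start` ends."""
    t = datetime.strptime(start, "%H:%M") + timedelta(minutes=minutes)
    return t.strftime("%H:%M")


for label, base in (("original storage", 155), ("Dorado5100", 15)):
    est = projected_minutes(base, 100_000, 150_000)
    verdict = "fits" if est <= WINDOW_MIN else "misses"
    print(f"{label}: ~{est:.1f} min, ends {finish_time('00:15', est)}, "
          f"{verdict} the 3-hour window")
```

Under this assumption, the original system would need about 232 minutes, well beyond the 3-hour limit, whereas the 15-minute result projects to about 22 minutes, consistent with the expectation of less than 30 minutes stated later in this case.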

Solution: The HUAWEI Dorado5100 is used for database storage of the MAP system.

Analysis of each database event showed that as service pressure increased, the I/O latency of the original storage system rose sharply. During peak hours, IOPS reached 200,000 and average I/O latency reached 8 ms. Because of this high latency, 80% of database run time was spent waiting for I/O, during which the CPU sat idle, greatly prolonging batch processing. To improve database performance, latency had to be reduced at loads of 200,000 IOPS and above.

The Dorado5100 replaced the original storage system as the online storage device of the MAP system. It keeps latency within 1 ms while processing 600,000 IOPS. Batch processing time was reduced to 15 minutes on three consecutive days.

Benefits: Batch processing time is reduced to 9.6% of the original, ensuring smooth service growth.

The batch processing time for 100,000 transactions fell from 155 minutes to 15 minutes, and that for 150,000 transactions is expected to be less than 30 minutes. This strong data processing capacity ensures smooth service growth. The company says that the performance of the Huawei Dorado solution exceeded its expectations: batch processing time stays short even if services grow two- or threefold. Efficient batch processing allows the company to issue service plans on time, preventing extra OPEX caused by plan delays.



5 Conclusion

Huawei is dedicated to providing high-quality storage products and user-friendly services. Based on this commitment, Dorado series all-flash-memory arrays deliver low latency, high IOPS, robust reliability, and enhanced usability, helping customers reduce TCO and maximize service competitiveness.



A Acronyms and Abbreviations

LUN logical unit number

RAID redundant array of independent disks

SCSI Small Computer System Interface

SAS serial attached SCSI

RAM random access memory

PCM phase change memory

SSD solid-state drive

HDD hard disk drive

MTBF mean time between failures

FIT failures in time

AFR annual failure rate

MTTR mean time to repair

OLTP online transaction processing

OLAP online analytical processing

CAPEX capital expenditure

OPEX operating expense

TCO total cost of ownership
