
WHITE PAPER

Panasas ActiveStor Solution: Architectural Overview


Contents

Executive Summary
Panasas ActiveStor
Scalability
Performance
Reliability
Manageability
Panasas ActiveStor Solution Components
Director Nodes
Storage Nodes
Panasas PanFS
Parallel File System Based on Object Scalability
Object Storage Device File System
Scalable Metadata Services
Cluster Management Services
Panasas PanActive Manager
Panasas DirectFlow Parallel Data Access Protocol
NFS and SMB Protocols
Metadata Architecture
File-Level Metadata
Block-Level Metadata
Data Protection in Panasas PanFS
Erasure-Coded Data Protection
Reliability That Increases with Scale
Data Scrubbing
Extended File System Availability
Conclusion


Executive Summary

The need to process, store, and analyze ever larger amounts of information has intensified the pressure to build compute and storage environments that can stay ahead of business and data growth. Much of that data growth is unstructured—ranging in size from small text files of a few kilobytes to multiterabyte image and video files. Industry analysts consistently forecast that the growth in unstructured sequential data now significantly exceeds that of structured data. Network-attached storage (NAS) is the ideal choice for unstructured data because it is designed for application-based data sharing, ease of management, and support for multiple file access protocols such as Network File System (NFS) and Server Message Block (SMB).

The challenge is that as traditional scale-up NAS systems grow, they quickly hit one or more physical limits. Further growth results in multiple storage silos that are costly and difficult to manage. While basic scale-out NAS systems helped address this, they lack the high performance needed to support modern workflow requirements. The better approach is an integrated scale-out NAS system with a performance-focused scale-out protocol that is easy to manage and offered at a compelling price point.

Panasas ActiveStor

Panasas® ActiveStor® performance scale-out NAS is an advanced technology solution designed to overcome the expansion challenges and the performance limitations of other NAS architectures. A Panasas clustered storage architecture avoids silos of storage by delivering the flexibility to start with dozens to hundreds of terabytes and then seamlessly and transparently expand to petabytes, while linearly scaling performance. Just add more ActiveStor units to the cluster, and each added ActiveStor enclosure will immediately contribute more storage capacity, dynamic random-access memory (DRAM) cache, processing power, and networking bandwidth. Even in the largest Panasas deployments, all data resides within a single namespace with a single management graphical user interface (GUI), delivering data at very high reliability and availability.

Scalability

A key Panasas architectural accomplishment is enabling efficient scaling by clustering one or more ActiveStor enclosures together into a shared storage pool with a single namespace. Growing from one to 10 or even 100 ActiveStor enclosures will increase performance and capacity by the same factor, with nearly perfect linear scaling.

Performance

No one wants to wait to get data back from their storage, but high-performance and data-intensive applications have much more data to move and usually much tighter deadlines to meet.

Panasas ActiveStor storage can meet those demands more easily than other NAS architectures because Panasas has been optimizing for those requirements for more than a decade. We use a scale-out architecture that grows storage capacity, DRAM caching, and network bandwidth incrementally and linearly as you add more ActiveStor enclosures. We deliver data from our storage nodes in parallel to the application, thereby multiplying the bandwidth an application can achieve to a single file, not just increasing aggregate bandwidth. And finally, data flows directly from our storage nodes to the application. There are no hops through intermediate servers or even extra network links.

Reliability

As total storage capacity goes up in a NAS system, drive failures can become more frequent, simply because there are more drives in the system. Panasas addresses this challenge with an advanced per-file, distributed erasure-coding software implementation with two levels of protection. A dual-parity erasure-coded level of protection is applied as data in each file is distributed across storage nodes in the cluster, correcting up to two simultaneous failures, whether of drives or whole nodes. The second level of protection handles failures of a portion of a drive, where just a range of blocks within a drive is inaccessible. This allows the system to detect and correct sector-level errors and other low-level issues without having to perform a full node-level rebuild.

KEY PANASAS ARCHITECTURAL CONCEPTS

PARALLEL DATA FLOW

Just as a highway accomplishes greater traffic flow by adding more lanes, the Panasas architecture increases I/O throughput by adding more network channels between a storage consumer and Panasas storage. Traditional scale-up architectures, and even the more traditional scale-out architectures, are limited to only one network path between a storage consumer and a network share, constricting data throughput.

DIRECT ACCESS TO DATA

Scale-up NAS and the more traditional scale-out NAS architectures force data access to go through an intermediary filer head or a single node of the NAS cluster to get to the storage node that contains the data. The Panasas solution directly connects all storage consumers to all storage nodes in our scale-out NAS cluster, boosting performance while also eliminating hotspots and increasing availability. The Panasas solution also separates metadata access from the data path, which is key to avoiding another aspect of the congestion present in scale-up and other scale-out storage solutions.

SCALE-OUT NETWORK-ATTACHED STORAGE

You can easily and incrementally grow a single pool of Panasas storage, adding capacity and performance as needed without having to create multiple silos of separately managed storage in the process. It remains online and accessible as a single global namespace during the process. In contrast, you grow scale-up NAS by adding more and more drives to the single NAS head until the limit is reached. Scaling further means either deploying another storage silo or replacing the NAS filer head through a disruptive forklift upgrade.

HYBRID STORAGE WITH INTELLIGENT TIERING

The Panasas architecture leverages two types of storage media—flash and hard drives. Small files and file system metadata are stored in the flash, while large files are stored on the hard drives. The streaming bandwidth of hard drives makes them ideal to store large sequential file data, and the high random IOPS rates of flash are a good match for storing small files and file system metadata. Blending these architectural concepts maximizes price/performance for random and sequential I/O and results in an attractive price point for the solution. Intelligent hybrid tiering hides the complexities while maintaining a single view into the file system.

ERASURE-CODE-BASED PER-FILE DATA PROTECTION

Panasas uses software-based erasure codes to separately protect individual files instead of protecting whole raw storage devices. Software-based data protection avoids the scalability challenges of hardware-based RAID while also enabling more flexibility in data protection levels. Operating at the file level accelerates rebuild times, which increases data availability with scale rather than decreasing it as with traditional RAID techniques.


Traditional hardware or software RAID architectures rebuild a whole physical drive onto another physical drive. Rebuilding a physical drive is limited by the bandwidth of writing to the replacement drive, and the entire replacement drive is written to, even if the drive that failed is not full. The Panasas solution rebuilds only the files that are affected by the failure, and it rebuilds them onto free space distributed throughout the storage cluster rather than onto a dedicated and otherwise idle “hot spare” drive. Because we use the whole cluster to rebuild any affected files, we can recover from failures in a fraction of the time traditional RAID architectures require.

Manageability

A Panasas scale-out storage system is a single entity you manage from a single GUI, no matter how many ActiveStor enclosures have been integrated into the single namespace it supports. The cluster will automatically rebalance capacity across the ActiveStor enclosures if they become unbalanced; automatically reconstruct the full levels of erasure-coded data protection for all files in the event of any failures; and continuously scan all the files in the cluster in the background to scrub out any latent data failures. A cluster is much easier to manage than separate silos of traditional NAS, resulting in lower operational expenses.

Panasas ActiveStor Solution Components

The Panasas ActiveStor solution consists of the Panasas PanFS® distributed clustered file system running on your choice of two different form factors of ActiveStor bladed hardware platforms (see Figure 1).

You can deploy an ActiveStor solution using a 4RU hardware enclosure that houses 11 nodes, both DB-class director nodes and storage nodes. Each enclosure can contain from zero to three DB-class director nodes intermixed with eight to 11 storage nodes.

You can also deploy director functionality using a 2RU hardware enclosure that houses one to four of the more powerful ASD-class director nodes. The two types of director nodes, DB-class and ASD-class, function in the same manner, taking on the same roles and responsibilities in the PanFS clustered distributed file system. Throughout this document we refer to them both as simply director nodes unless otherwise noted.

You can configure the ratio of director to storage nodes, and the class of director nodes, to meet the demands of specific application workloads and file protocols.

Scalability is accomplished by growing the number of ActiveStor enclosures in the single namespace, with each one adding to the capacity, performance, and network bandwidth of the solution.

Director Nodes

Director nodes make up the “control plane” of the architecture, managing metadata rather than storing user data. Each director node has a processor, DRAM, and several high-bandwidth Ethernet ports and runs a Panasas-developed software image that contains control and interface processes for many different aspects of the overall storage system. Among other things, director nodes track the health and “quorum membership” of all director and storage nodes (whether they are alive and well and part of the Panasas cluster or not). They manage the namespace (file names and the hierarchy of directories), the distribution and consistency of user data on storage nodes, and failure recovery actions like data scrubbing and rebuilds, and host a GUI that treats the whole storage cluster as a single entity.

Director nodes also provide a “gateway” functionality, translating between the native Panasas DirectFlow® protocol of the Panasas architecture and the standards-defined storage protocols NFS and SMB. Director nodes do all of this without being in the data path; user data in files does not pass through a director node unless it is being translated to/from NFS or SMB. The new higher-performance ASD-class director nodes provide roughly twice the metadata performance per volume of DB-class director nodes.

Storage Nodes

Storage nodes contain all the application and user data stored by the Panasas system and make up the “data plane” of the architecture. Each storage node is a “hybrid” device with one flash solid-state drive (SSD), two Serial ATA (SATA) hard disk drives (HDDs), a processor, and a pair of redundant Ethernet ports to connect it to the client systems and the director nodes. Each 4U ActiveStor enclosure includes redundant fans and power supplies, in addition to a dedicated internal UPS (uninterruptible power supply) that allows the storage node software to treat all DRAM as a power-protected cache of newly written data. The flash SSD supports fast access to small user data files and PanFS file system metadata. The two high-capacity SATA drives are striped together and are well-suited for large unstructured sequential user data. The storage node runs a Panasas-developed software image called object storage device file system (OSDFS) that interfaces to the rest of the Panasas architecture, including directly to the client systems, and manages storing data and metadata on the SSD and HDDs on behalf of those systems.

Figure 1. High scalability and efficiency


Panasas PanFS

To power the ActiveStor performance scale-out NAS solution, Panasas developed the PanFS distributed clustered parallel file system. Unlike other storage products that loosely couple parallel file system software with legacy block storage arrays, the PanFS solution is designed for the more modular and granular ActiveStor enclosures, with an easy-to-maintain field-replaceable unit (FRU) design. The result is an enterprise-class solution that is as reliable as it is easy to manage and grow. We call a storage cluster a “realm.”

The PanFS scale-out NAS parallel file system is designed for high performance, reliability, and manageability. The PanFS platform combines the functions of a distributed and clustered file system, a volume manager, and a scalable software-based erasure-coding data-protection engine into a single high-performance system that serves up to hundreds of gigabytes per second of data from a single namespace. A single system administrator can manage the PanFS operating environment at any scale, from dozens of terabytes to dozens of petabytes of user data.

To simplify usability, the PanFS solution supports a namespace containing from one to as many volumes as you need. To maximize throughput and scalability, the PanFS operating environment distributes the data across a pool of storage nodes and enables direct parallel access to all of it. Multiple user authentication schemes and multiple access protocols control data access permissions for smooth integration into Linux, macOS, and Windows environments. Clients requiring the highest performance access the PanFS architecture via the DirectFlow parallel data protocol, while traditional NFS and SMB protocols provide open access to the same namespace.

The PanFS architecture (see Figure 2) is composed of the following functional blocks:

• Parallel file system – Coordination of the actions of the object storage devices and metadata processing

• OSDFS – Storage and retrieval of data from drives on behalf of client systems

• Scalable metadata services – Cache coherency and file metadata processing

• Cluster management services – Tracking which nodes are currently healthy and in the “quorum” and which are not, recovery actions if nodes fail, configuration changes, etc.

• PanActive® Manager – CLI, SNMP, and XML management interfaces

• DirectFlow protocol – Native high-performance, cache-coherent file access for Linux and macOS

• NFS protocol – Standard NFSv3 protocol for Linux, macOS, and others

• SMB protocol – Standard SMBv3.1 protocol for Windows and macOS

• Director gateway services – Support for exporting the PanFS architecture via protocol implementations layered above it

Parallel File System Based on Object Scalability

The Panasas PanFS clustered parallel file system delivers scale-out NAS file access from an underlying distributed object storage device pool. Object-based data layout is one of the key design principles behind the high scalability and efficiency of the Panasas architecture. Files in the PanFS environment are stored inside objects, but because we use per-file erasure coding to protect each file, we “shard,” or stripe, the file across multiple objects. The overall file is stored in a “virtual object,” which is made up of many component objects that store the shards. The PanFS operating environment only stores the component objects; the virtual object is just a map identifying the set of component objects plus the striping parameters that make up this file. The map is stored near the component objects. To avoid a single storage node failure affecting more than one component object of a given file, all the component objects for a file are spread across different storage nodes. File system metadata (information about files) and directories are also stored inside objects and are kept in the same object store as the objects containing file data.

The DirectFlow protocol is the native high-performance method of direct client access to object storage device data. Clients using the DirectFlow protocol read and write objects directly to storage nodes in parallel, bypassing the metadata managers. DirectFlow clients interact with the metadata managers out of band via a remote procedure call (RPC) to obtain access permissions and location information for the objects that store files (e.g., the map).
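
As a rough illustration of that read path, the following Python sketch models a client that receives a file's map from a director out of band and then fetches the component objects from the storage nodes in parallel. The in-memory "storage nodes" and the map layout are stand-ins for illustration only; they are not the DirectFlow client API.

    # Toy model of the access pattern described above: the map comes from a
    # director out of band; data moves between client and storage nodes
    # directly and in parallel. Names and structures are illustrative only.
    from concurrent.futures import ThreadPoolExecutor

    storage_nodes = [
        {"obj-17": b"Hello, "},   # each dict stands in for one storage node
        {"obj-18": b"parallel "},
        {"obj-19": b"world"},
    ]

    # What a director would return from the out-of-band RPC: which component
    # objects make up the file and which storage node holds each one.
    file_map = [("obj-17", 0), ("obj-18", 1), ("obj-19", 2)]

    def read_component(entry):
        object_id, node_index = entry
        return storage_nodes[node_index][object_id]  # direct read from that node

    with ThreadPoolExecutor(max_workers=len(file_map)) as pool:
        shards = list(pool.map(read_component, file_map))

    print(b"".join(shards))   # b'Hello, parallel world'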

Object Storage Device File System

All the PanFS data services are built on top of the capabilities of the storage nodes. The software on storage nodes implements an abstraction called an object storage device file system (OSDFS). The DirectFlow protocol describes the object-level reads and writes that the DirectFlow client software wants the storage node to perform.

To optimize performance, the OSDFS includes advanced caching capabilities and intelligent object data placement on the SSD and HDDs. In addition to a read cache to improve data retrieval, the OSDFS caches newly written data in power-protected DRAM. Data fragmentation is reduced by accumulating newly written data into larger sequential regions for writing, so that later reads of that same data will also be sequential. The OSDFS determines the best physical storage device within the storage node for each object based on the object’s size. PanFS metadata and user files smaller than 60KB will reside on the storage node’s SSD, and all user data larger than 60KB will reside on the storage node’s hard drives.
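
A minimal sketch of that placement rule, assuming the 60KB threshold quoted above applies to user data and that file system metadata always goes to flash (the function and device names below are illustrative, not the OSDFS implementation):

    # Minimal sketch of the size-based placement rule described above: metadata
    # and small objects go to the SSD, large objects go to the striped HDD pair.
    # The threshold mirrors the 60KB figure in the text; everything else here is
    # an illustrative assumption, not OSDFS code.
    SMALL_OBJECT_THRESHOLD = 60 * 1024  # bytes

    def choose_device(object_size_bytes, is_metadata=False):
        if is_metadata or object_size_bytes < SMALL_OBJECT_THRESHOLD:
            return "ssd"          # high random-IOPS flash
        return "hdd_stripe"       # two SATA drives striped for streaming bandwidth

    print(choose_device(4 * 1024))                            # ssd
    print(choose_device(1 * 1024 * 1024))                     # hdd_stripe
    print(choose_device(1 * 1024 * 1024, is_metadata=True))   # ssd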

Scalable Metadata Services

PanFS metadata services run on director nodes, implement all file system semantics, and manage the sharding (striping) of data across the storage nodes. They control distributed file system operations such as file-level and object-level metadata consistency, client cache coherency, recoverability from interruptions to client I/O, storage node operations, and secure multiuser access to files. Portable Operating System Interface (POSIX) compliance mandates that each modification to the directory hierarchy or file metadata be atomic; director nodes use a transaction log to ensure that atomicity. Fault tolerance is based on synchronously replicating each local transaction log to another director node, which is, in addition to its other duties, the designated “backup node” for this director node. The backup node relationships are assigned and reassigned automatically.
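
The following toy model illustrates the commit pattern described above: a metadata transaction is appended to the local log and synchronously copied to the backup director's log before it is acknowledged. The classes and method names are hypothetical, not PanFS internals.

    # Toy in-memory model of synchronous transaction-log replication between a
    # director and its designated backup director. Illustrative only.
    class DirectorLog:
        def __init__(self, name):
            self.name = name
            self.entries = []

        def append(self, txn):
            self.entries.append(txn)

    class Director:
        def __init__(self, name, backup_log):
            self.log = DirectorLog(name)
            self.backup_log = backup_log     # the backup director's log

        def commit(self, txn):
            self.log.append(txn)             # local transaction log
            self.backup_log.append(txn)      # synchronous replica on the backup node
            return "committed"               # only now is the change acknowledged

    backup_log = DirectorLog("director-2")
    primary = Director("director-1", backup_log)
    print(primary.commit({"op": "rename", "from": "/vol1/a", "to": "/vol1/b"}))
    print(len(backup_log.entries))           # 1 -- the backup holds the same entry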

Figure 2. Overcome the limitations of more traditional scale-out NAS solutions.


Cluster Management Services

Every ActiveStor system is a cluster of storage and director nodes. Each node in the cluster runs a common PanFS cluster management service, enhanced with additional services that provide hardware monitoring, configuration management for that node, and control of the services running on that node. An arbitrary subset of three or more of the director nodes in the realm is selected by the customer to be part of the “repset,” the directors that will host a replicated copy of the global configuration database. The director nodes in the repset vote to elect one of their members as the “realm president” using the Paxos algorithm; the president runs the master cluster management service for the realm. This process avoids the so-called “split-brain” conditions found in some other storage architectures and is the reason at least three director nodes are required in a realm for normal operation. The realm president is responsible for modifications to the global system configuration, for detecting and responding to cluster node failures, and for handling software upgrades and system restarts. In addition, it decides which services are started on which nodes in the realm.
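
The sketch below illustrates only the majority-vote idea behind the repset: why at least three directors are needed and why a network partition cannot produce two presidents. It is a deliberately simplified stand-in for the Paxos protocol PanFS actually uses.

    # Simplified illustration of quorum voting. A decision (such as electing a
    # president) is only accepted when a strict majority of repset members agree,
    # which rules out "split-brain". This is a toy majority check, not Paxos.
    def has_quorum(votes_received, repset_size):
        return votes_received > repset_size // 2

    repset_size = 3
    print(has_quorum(2, repset_size))   # True  -- 2 of 3 directors agree
    print(has_quorum(1, repset_size))   # False -- a lone director cannot decide

    # With an even split (e.g., 2 of 4 on each side of a network partition),
    # neither side has a strict majority, so neither side can elect a president.
    print(has_quorum(2, 4))             # False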

Panasas PanActive Manager

Storage administrators interact with PanFS cluster management services via Panasas PanActive Manager, an intuitive web-based GUI. It allows storage administrators to manage the cluster of ActiveStor enclosures as a single appliance regardless of scale, providing a single point of storage management for the entire namespace. PanActive Manager automates key workflows such as new storage discovery, load balancing to streamline performance, and enterprise data services such as reporting, snapshots, and user quota enforcement. In addition to accessing the PanFS operating environment via PanActive Manager, administrators can use a command-line interface (CLI), a scriptable XML interface, and a standards-based SNMP interface.

Panasas DirectFlow Parallel Data Access Protocol

The Panasas DirectFlow parallel data access protocol avoids traditional protocol I/O bottlenecks by allowing client systems to access all the storage nodes in the entire ActiveStor storage cluster directly and in parallel. This results in higher performance than what you can typically achieve with industry-standard protocols such as NFS and SMB, while adding cache coherency across client systems, which those protocols do not provide.

The DirectFlow client that runs on a system wanting access to the PanFS operating environment is implemented as a file system module that runs inside the kernel of the client’s operating system. There are corresponding DirectFlow protocol server-side components in the director and storage nodes. The client-side kernel module implements a standard virtual file system (VFS) interface; clients can access the PanFS platform as a standards-compliant POSIX file system.

DirectFlow clients divide files into fragments using sharding. When data is being written, it is first sharded. Then an erasure code algorithm is run over the shards to generate data protection information for that region of the file, with the protection information being treated as additional shards. All the shards are then written via the DirectFlow protocol as independent component objects, each to a different storage node, in parallel. Similarly, clients read files by fetching the shard component objects directly from the storage nodes that contain them, in parallel, and combining them back into the desired file (see Figure 3). Performing the erasure coding in the client eliminates the performance bottleneck of storage-side erasure coding imposed by more traditional scale-out NAS solutions.
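
The following Python sketch mirrors that write path: shard the data, compute parity over the shards, and treat the parity as additional shards destined for their own storage nodes. For brevity it computes a single XOR parity shard, which tolerates only one lost shard; PanFS itself uses dual-parity erasure coding, which tolerates two simultaneous failures.

    # Illustrative sketch of client-side sharding plus parity generation.
    # Single XOR parity is used for brevity; it is not the PanFS erasure code.
    def shard(data: bytes, num_shards: int, shard_size: int):
        padded = data.ljust(num_shards * shard_size, b"\0")
        return [padded[i * shard_size:(i + 1) * shard_size] for i in range(num_shards)]

    def xor_parity(shards):
        parity = bytearray(len(shards[0]))
        for s in shards:
            for i, byte in enumerate(s):
                parity[i] ^= byte
        return bytes(parity)

    data_shards = shard(b"client-side erasure coding example", 4, 9)
    parity_shard = xor_parity(data_shards)

    # Each shard (data or parity) becomes a component object written to a
    # different storage node in parallel. Losing any one shard is recoverable:
    lost = data_shards[2]
    recovered = xor_parity([s for s in data_shards if s is not lost] + [parity_shard])
    assert recovered == lost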

NFS and SMB Protocols

In addition to native DirectFlow access, the PanFS operating environment provides scalable access for client systems via the standard NFS or SMB protocols. Running on director nodes as a “gateway” (see Figure 4), these protocol services enable the PanFS platform to integrate into heterogeneous IT environments consisting of a combination of Linux, macOS, and Windows clients.

The NFS service is a tuned version of the standard FreeBSD NFS server with support for the NFSv3 protocol. There are two options for SMB access to the PanFS environment: our older SMB implementation, which supports cross-protocol cache coherency, and our new SMB implementation, which dramatically increases performance but does not yet support cross-protocol cache coherency. Cross-protocol cache coherency is valuable if you anticipate accessing a file from both NFS and SMB at the same time. The previous version of the PanFS SMB service runs on both DB-class and ASD-class director nodes and is derived from a commercially licensed SMB 2.1 implementation compliant with Microsoft SMB protocol specifications (Panasas is a Microsoft protocol licensee). The new PanFS SMB service runs only on the more powerful ASD-class director nodes and is based on Samba version 4.6.6. Using either SMB gateway, users can easily manage files created in a Microsoft Windows or macOS environment. User authentication is managed via a variety of options, including Active Directory and Lightweight Directory Access Protocol (LDAP).

Figure 3. Native high-performance method of direct client access to object storage device data

Figure 4. The Panasas PanFS operating environment provides scalable access for client systems via the NFS or SMB protocols.

The PanFS platform provides administrators with the capability to map Windows security identifiers (SIDs) to Linux user IDs so that storage quotas can apply simultaneously to a given user’s Windows, macOS, and Linux accounts.

Metadata Architecture

Storage administrators can easily create volumes within the PanFS global namespace. Volumes are normal hierarchies of directories and files and share the common pool of storage capacity in the PanFS operating environment, but they separate the overall namespace into areas that can have different administrative controls. For example, per-user capacity quotas can be defined at the volume level, unique to each volume. In addition, snapshots are taken at the volume level. Volumes appear as top-level directories in the file system namespace. Each volume has a set of management services associated with it to govern the behavior of that volume, implementing quotas and snapshots, for example, with each volume manager running on a different director node. Because the number of director nodes increases as the system scales out, this partitioning scheme allows overall file system metadata performance to scale linearly with the growth of the overall namespace.
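
As a hedged sketch of that partitioning idea, the snippet below spreads one volume manager per volume across the available director nodes, so adding directors adds metadata capacity. The round-robin placement is purely illustrative; it is not the actual PanFS assignment policy.

    # Illustrative only: assign each volume's manager to a director round-robin,
    # so metadata work is spread across however many directors exist.
    def assign_volume_managers(volumes, directors):
        placement = {d: [] for d in directors}
        for i, volume in enumerate(volumes):
            placement[directors[i % len(directors)]].append(volume)
        return placement

    volumes = ["/home", "/scratch", "/projects", "/archive", "/tmpdata"]
    print(assign_volume_managers(volumes, ["dir-1", "dir-2", "dir-3"]))
    # {'dir-1': ['/home', '/archive'], 'dir-2': ['/scratch', '/tmpdata'],
    #  'dir-3': ['/projects']}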

The term “metadata” can mean several different things at the same time and usually means different things to different vendors. In general, metadata means almost everything that is not data. In the PanFS operating environment, the following is treated as metadata:

• The realm president treats the global configuration database as metadata.

• Director nodes treat the directory hierarchy and file name information as metadata.

• Director nodes treat file attributes such as access control lists (ACLs), permissions bits, file owners, and so on, as metadata.

• Director nodes treat the mapping of a file (virtual object) to a set of IDs for the sharded component objects that contain the file as metadata.

• Storage nodes treat the mapping from a component object ID to a set of block addresses for that component object in that storage node as metadata.

The PanFS platform classifies metadata into two basic categories: file-level and block-level. File-level metadata is manipulated by directors but stored on storage nodes. Block-level metadata is used and visible only inside storage nodes.

File-Level Metadata

The PanFS operating environment maintains several different types of metadata about files, including the typical user-visible information such as the owner, size, and modification time. In addition, it maintains a per-file map from each file’s virtual object to the set of component object IDs, showing which component objects store the sharded pieces of the file and how the data is striped across those objects. The Panasas solution stores this file metadata as, essentially, “extended attributes” associated with three of that file’s component objects. Those component objects are designated “attribute storing components.” The rest of the component objects only have basic attributes such as individual object length and modification times.
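
A possible in-memory shape for such a map is sketched below. The field names, the stripe unit, and the object IDs are hypothetical; they only illustrate the relationship between a virtual object, its component objects, and the three attribute-storing components.

    # Hypothetical data-structure sketch of a per-file map; not the on-disk
    # PanFS format. All field names and values are illustrative assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class ComponentObject:
        object_id: str
        storage_node: str
        stores_attributes: bool = False   # three components also hold file attributes

    @dataclass
    class VirtualObjectMap:
        stripe_unit_bytes: int
        components: list = field(default_factory=list)

    file_map = VirtualObjectMap(
        stripe_unit_bytes=64 * 1024,      # illustrative striping parameter
        components=[
            ComponentObject("obj-101", "storage-node-3", stores_attributes=True),
            ComponentObject("obj-102", "storage-node-7", stores_attributes=True),
            ComponentObject("obj-103", "storage-node-1", stores_attributes=True),
            ComponentObject("obj-104", "storage-node-9"),
        ],
    )
    print(sum(c.stores_attributes for c in file_map.components))   # 3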

Block-Level Metadata

The software inside a storage node (OSDFS) manages block-level metadata using a “delayed allocation” scheme in which data, block pointers, and object descriptors are batched into large write operations in DRAM before being sent to the SSD and HDDs. The OSDFS write buffer in DRAM is protected by the ActiveStor platform’s integrated UPS and is flushed to disk on power failure or other system failure.

The OSDFS is fairly typical of modern “local” file systems. It tracks where the bytes for each component object are stored on the drives via a traditional direct/indirect/double-indirect scheme. A proprietary bitmap-like data structure that is optimized for copy-on-write reference counting tracks free space. The copy-on-write support is integral to the PanFS space-efficient snapshotting technique.
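
The toy model below captures the copy-on-write reference-counting idea behind that snapshot technique: a snapshot adds a reference to a block, and a later write to a shared block allocates a new block rather than modifying the shared one. It is a simplified illustration, not the OSDFS bitmap structure.

    # Simplified copy-on-write reference counting; illustrative only.
    class BlockStore:
        def __init__(self):
            self.blocks = {}      # block id -> data
            self.refcount = {}    # block id -> number of references
            self.next_id = 0

        def allocate(self, data):
            block_id = self.next_id
            self.next_id += 1
            self.blocks[block_id] = data
            self.refcount[block_id] = 1
            return block_id

        def snapshot(self, block_id):
            self.refcount[block_id] += 1          # snapshot shares the block
            return block_id

        def write(self, block_id, data):
            if self.refcount[block_id] == 1:
                self.blocks[block_id] = data      # sole owner: update in place
                return block_id
            self.refcount[block_id] -= 1          # shared: copy on write
            return self.allocate(data)

    store = BlockStore()
    live = store.allocate(b"version 1")
    snap = store.snapshot(live)
    live = store.write(live, b"version 2")        # new block; snapshot keeps old data
    print(store.blocks[snap], store.blocks[live]) # b'version 1' b'version 2'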

In traditional NAS systems, block-level metadata management often consumes a large portion of available processing power. By delegating low-level drive management to the storage nodes, PanFS director nodes have an order of magnitude less work to do than equivalent SAN or NAS file system managers that must track all the blocks in the file system, in addition to all the files in the file system.

Data Protection in Panasas PanFS

There are multiple layers of advanced, scalable protection for your data in the PanFS architecture, resulting in a system that is both highly available and very tolerant of faults while still providing high performance.

Erasure-Coded Data Protection

The PanFS platform offers an advanced per-file, distributed erasure-coding software implementation with two levels of protection. A dual-parity erasure-coded level of protection is applied as data in each file is distributed across storage nodes in the cluster, correcting up to two simultaneous failures, whether of drives or whole nodes. The second level of protection handles failures of a portion of a drive, where just a range of blocks within a drive is inaccessible. This allows the system to detect and correct sector-level errors and other low-level issues without having to perform a full node-level rebuild.

The top level of data protection in the PanFS operating environment is calculated on a per-file basis rather than per-drive or within a RAID group as in other architectures. Files are sharded across storage nodes. Sharding implies that each file is striped across a different subset of the storage nodes. If one storage node fails, it will only affect one shard of that file and the erasure coding algorithm can easily rebuild the missing data from the remaining shards.

The second level of data protection in the PanFS operating environment is called “vertical parity.” It protects against drive-level sector failures (i.e., unrecoverable read errors, or UREs) within each storage node. UREs are much more common than outright failures of a drive and can even be transient, failing sometimes and not others. Experiencing a URE on a storage node while the system is reconstructing a set of files would look to the system like an additional failure, risking “too many failures” and data loss for the affected file. Vertical parity can detect and correct those UREs, avoiding the “too many failures” case.
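
The arithmetic behind that risk can be sketched as a simple failure budget: dual parity tolerates two missing shards of a file, so an uncorrected URE during a two-failure rebuild would be a third. The numbers below are illustrative only.

    # Simplified failure-accounting sketch; not a model of the actual PanFS
    # recovery logic. Dual parity tolerates at most two missing shards per file.
    DUAL_PARITY_TOLERANCE = 2

    def file_survives(node_failures, unrepaired_ures):
        return (node_failures + unrepaired_ures) <= DUAL_PARITY_TOLERANCE

    # Two storage nodes down, then a URE surfaces during the rebuild:
    print(file_survives(node_failures=2, unrepaired_ures=1))   # False -- data loss
    # Same scenario, but vertical parity corrects the URE inside the node:
    print(file_survives(node_failures=2, unrepaired_ures=0))   # True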

Reliability That Increases with Scale

Traditional storage products based on physical RAID groups, whether implemented in hardware or in software, base their data protection strategy on multiple groups of drives and an algorithm that can recover from one or possibly two drive failures within each group. Those groups are usually quite small, on the order of 12 to 24 drives, and usually fixed in size. Upon any failure, the RAID controller will pull an idle and unused drive from a “hot spare” pool and use an algorithm to fill that new drive with the same data the failed drive had.

There are several important and potentially performance-impairing consequences to this approach:

• “Rebuild” of a RAID group can only go as fast as the new drive can accept the rebuilt data.

• As drives continue to grow in capacity, their write bandwidth is not growing at nearly the same rate as their capacity, if at all.

• Because drives fail at a roughly constant rate over their lifetime, the probability of an additional drive failing in a RAID group while that group is recovering from an earlier failed drive depends on how long it takes to rebuild the first drive: the longer the rebuild takes, the higher the likelihood of a “dual failure” and data loss.

• You paid for a “hot spare pool” of drives that are not contributing to the performance or capacity you need.

Using this architectural approach means overall data reliability decreases as the scale increases. The PanFS platform, on the other hand, is designed to deliver increasing data reliability as scale (and performance) increases. One of the most important factors behind that is parallel reconstruction. PanFS rebuilds files instead of drives, and because files are each striped across differing subsets of all the storage nodes, all the files affected by a failure can be rebuilt in parallel utilizing the bandwidth of all the storage nodes at once. The surviving fragments of each file are read from the storage nodes holding them, and the reconstructed portions of each file are written to other storage nodes.

PanFS rebuild is not limited by the performance of a single drive. Rebuild performance scales just as linearly as data access performance. The shorter rebuild time dramatically reduces the window of vulnerability where a file is not fully covered by all the layers of data protection. The actual work of reconstructing each file, reading that file’s surviving data and erasure code parity information, calculating the missing data, and writing that rebuilt data to a new storage node is distributed across all the director nodes in the system. As more director nodes are added to the system, reconstruction performance increases linearly.
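
A back-of-envelope comparison makes the difference concrete. The drive capacity, write bandwidth, and node count below are assumptions chosen for illustration, not measured Panasas figures.

    # Rough arithmetic comparing the two rebuild models described above.
    drive_capacity_tb = 12
    drive_write_bw_mb_s = 200          # a single replacement drive's write speed
    affected_data_tb = 6               # data actually needing reconstruction

    # Traditional RAID: the whole failed drive is rewritten onto one spare drive.
    raid_rebuild_hours = (drive_capacity_tb * 1e6) / drive_write_bw_mb_s / 3600

    # Declustered, file-level rebuild: only affected files are rebuilt, and the
    # writes are spread across free space on many storage nodes at once.
    storage_nodes = 40
    parallel_rebuild_hours = (affected_data_tb * 1e6) / (storage_nodes * drive_write_bw_mb_s) / 3600

    print(f"whole-drive rebuild: ~{raid_rebuild_hours:.1f} h")        # ~16.7 h
    print(f"parallel file rebuild: ~{parallel_rebuild_hours:.1f} h")  # ~0.2 h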

Data Scrubbing

The PanFS operating environment performs background scrubbing of all files. Each file is checked to ensure that all the levels of data protection information for that file match the data in the file. Detection of inconsistent erasure-code parity information (for example, one level of the erasure code does not match the data) leads to correcting the error with file-level reconstruction using the other two levels of protection. The vertical parity layer at the bottom of the data protection system can detect and correct latent “bit rot” errors in the drive media. Data scrubbing is important in catching any latent failures early so that they can be fixed before there is any risk of data loss.
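
The shape of such a scrubbing pass can be sketched as follows. For brevity the sketch verifies a per-file checksum as a stand-in for the parity-consistency checks described above, and the function names are illustrative, not PanFS internals.

    # Illustrative background-scrub loop. A checksum comparison stands in for
    # the real parity-consistency check; names and structures are hypothetical.
    import hashlib

    def checksum(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def scrub(files):
        """files: iterable of (name, data, stored_checksum) tuples."""
        needs_repair = []
        for name, data, stored in files:
            if checksum(data) != stored:
                # In PanFS this would trigger file-level reconstruction from the
                # surviving shards and parity; here we just record the file.
                needs_repair.append(name)
        return needs_repair

    good = (b"intact data", checksum(b"intact data"))
    rotten = (b"bit-rotted data", checksum(b"original data"))
    print(scrub([("fileA", *good), ("fileB", *rotten)]))   # ['fileB']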

Extended File System Availability

In addition to the two levels of protection for data, dual-parity erasure coding and vertical parity, the PanFS solution provides an additional layer of data protection called extended file system availability (EFSA) for the namespace, directory hierarchy, and file names. In the extremely unlikely event of encountering errors that our erasure coding cannot recover from, such as three or more storage nodes failing at the same time, the PanFS system will know which files have been affected and will fence them off as bad. Because we have an additional layer of protection for the directory hierarchy and file names, we can usually tell the administrators the full path names of precisely which files have been affected. This provides users with uninterrupted access to all files unaffected by the failure (typically the vast majority of user files), while the smaller number of affected files are logged for restoration from backup or some other source.

Conclusion

The Panasas ActiveStor architecture with the PanFS operating environment is designed to break through the performance constraints of traditional scale-up NAS architectures and overcome the limitations of more traditional scale-out NAS solutions. The comprehensive and tightly integrated Panasas solution enables high-performance direct parallel access to terabytes and petabytes of data while avoiding the reliability problems inherent in legacy NAS systems as they grow.

True Linear Scalability for Large, Unstructured Data Sets

With the Panasas solution, you can easily and seamlessly scale the performance and capacity of your storage. Growing from 10 to 100, or even 1,000 ActiveStor enclosures will increase performance, capacity, and client access by the same factor with nearly perfect linear scaling.

Ultrafast Parallel Performance

The Panasas DirectFlow protocol can support your highest data transfer requirements, accelerating time to results via clients doing direct parallel data I/O to multiple storage nodes.

Mixed Workloads

The I/O performance of the Panasas ActiveStor architecture remains consistent across large numbers of concurrently executing applications, even when they are reading and writing a mixture of large and small unstructured data sets.

Enterprise-Grade Reliability and Availability

Data reliability and availability increase as you expand ActiveStor scale-out NAS. The Panasas advanced per-file, distributed erasure-coding software implementation offers two levels of protection. One detects and corrects problems when a storage node fails. The other detects and corrects sector-level errors and other low-level issues within a storage node without having to perform a full node-level rebuild. In addition, Panasas directly connects all storage consumers to all storage nodes in our scale-out NAS cluster, boosting performance while also eliminating hotspots and increasing availability.

Management Simplicity, Lower Costs

The Panasas ActiveStor solution is a fully integrated scale-out NAS system with a single point of management regardless of scale. It includes enterprise data services you expect from a NAS system and is optimized for high-performance commercial market workloads. By combining high performance with ease of use, Panasas reduces storage complexity in your data center by consolidating many different unstructured data workloads into a single scale-out ActiveStor solution.

For more information about Panasas performance scale-out NAS, visit www.panasas.com.

© 2017 Panasas Inc. All rights reserved. Panasas, the Panasas logo, ActiveStor, Active Directory, DirectFlow, PanActive and PanFS are trademarks or registered trademarks of Panasas Inc. in the U.S. and/or other countries. All other trademarks, registered trademarks, trade names, company names, and service marks are the respective properties of their holders.