-
2014 IBM Corporation
IBM Linux Technology Center
Storage Trends: File and Object Based Storage
and how NFS-Ganesha can play
Venkateswararao Jujjuri (JV)
File Systems and Storage Architect, IBM Linux Technology Center
[email protected] | [email protected]
2014
-
LinuxCon 2014
Outline
* Data is Exploding
* Storage Trends
* Unstructured Data
* Need for new solution
* Object Store
* File vs Object and Object details
* Big question and answer
* FOBS: File and Object Based Storage
* Object Storage details and variations
* NFS Evolution and pNFS and future
* NFS-Ganesha
* Conclusions
-
Data is Exploding
We create
Source: http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
Growth will reach
IDC says data will grow from 4.4 ZB today to 44 ZB by 2020.
-
Storage Trends
* Data growth is around 70% per year, and most of it is unstructured.
* Scale-out rather than scale-up.
* Object is gaining a lot of traction, but file is not going away; NAS will remain a significant player.
* Analysts predict NAS will grow at a CAGR of 25.44% over 2013-2018. (http://cti.tmcnet.com/news/2014/04/04/7762020.htm)
* Unified storage: NAS, SAN, and Object
* Growth mantra: FOBS
IDC Projections
* Structured data will grow at a 21.8% CAGR
* Unstructured data will grow at a 61.7% CAGR
Market Needs and Adoption
* 2000: Direct Attached Storage, SAN
* 2010: Network Attached Storage (NFS, CIFS)
* 2020: File and Object Based Storage
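The growth figures quoted so far are compound annual growth rates. A minimal sketch of the arithmetic follows. It assumes that "today" in the IDC projection means 2014 (the date of this talk), so the 4.4 ZB to 44 ZB span is six years. The function names are illustrative, and the five-year horizon applied to the structured and unstructured rates is an assumption made only for comparison.

```python
def cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


def growth_factor(rate, years):
    """Total multiplier after compounding `rate` (0.25 means 25%) for `years` years."""
    return (1.0 + rate) ** years


# IDC projection: 4.4 ZB growing to 44 ZB by 2020.
# Assumption: "today" is 2014, so the span is 6 years.
implied = cagr(4.4, 44.0, 2020 - 2014)

# NAS forecast: 25.44% CAGR over 2013-2018 (5 years).
nas_multiplier = growth_factor(0.2544, 2018 - 2013)

# IDC structured vs unstructured rates, compounded over 5 years
# purely for comparison (the slide does not state the period).
structured = growth_factor(0.218, 5)
unstructured = growth_factor(0.617, 5)

print(f"implied data CAGR: {implied:.1%}")
print(f"NAS market multiplier: {nas_multiplier:.2f}x")
print(f"structured {structured:.1f}x vs unstructured {unstructured:.1f}x")
```

Under these assumptions, the tenfold IDC projection works out to a little under 47% per year, and the NAS forecast implies a market roughly three times its 2013 size by 2018.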
-
Unstructured Data
* Basically non-database data
* Usually generated on an event (Cheese... Click)
* Typically never accessed again, or only read (photos, X-rays, dental records)
* Tough to interpret the content (a JPEG can be a silly picture or a blueprint)
* Emails, instant messages, documents, spreadsheets, graphics, images, videos, social media, medical records, wearables, and on and on...
* Explosive growth drives the search for cost effectiveness and manageability.
Why not continue with the file/NAS model?
* Simple access model: no need for a heavy POSIX interface.
* Scale-out: the hierarchical model is more of an overhead.
* Context: it is difficult to build the context of an individual file (the entire path is needed).
* Metadata is distributed, which makes policies complex and inefficient.
* Loose/eventual consistency is often good enough.
-
Need for a New Solution
Requirements
* Simple interface
* Easy access, no need to traverse through dirs/subdirs
* Context of the contents
* Scale-out capabilities
* Massive and cheap
* Easy policies for ILM
-
File vs Object
File: Penguin.jpg
* FileName: Penguin.jpg
* Times: atime, mtime, ctime, etc.
* Owner:Group
* Permissions: Unix style, ACLs, etc.
Object
* ObjectId, FileType, Times
* Camera Info, Resolution, Owner Name, Location, Copyright, Orientation, YCbCr positioning, Compression, Exposure Time, X-Resolution, Y-Resolution, Focal, Aperture, Flash, Focal Length, Color Space, Angle, Orientation
* Preferred Display, Category, Importance, Tags, Version, Notes, Voice/comment
Object: simply an abstract container where data and metadata are co-located
-
Objects
* Rich meta-data that co-exists with the data; easy policies
* Addressed by a 128-bit id
* Flat access
* Checksum is part of the metadata
* Multiple file types can be in one object (a wave and a jpeg)
* Cost effective because of eventual consistency and the lack of POSIX complexity
* Scales well with off-the-shelf hardware
* Simple access protocol, RESTful API
* Suited to the unstructured data generated by the digital world
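The object properties listed above (128-bit id, flat access, checksum kept in the metadata, metadata co-located with data) can be sketched in a few lines. This is only an illustration: `FlatObjectStore` and its methods are hypothetical names, the store lives in memory, and SHA-256 is an arbitrary choice of checksum.

```python
import hashlib
import uuid


class FlatObjectStore:
    """Hypothetical in-memory store: flat namespace, data and metadata together."""

    def __init__(self):
        self._objects = {}  # object id -> (data, metadata); no directories

    def put(self, data, **metadata):
        # uuid4 yields a random 128-bit identifier.
        object_id = uuid.uuid4().int
        # The checksum travels with the rest of the metadata.
        metadata["checksum"] = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        data, metadata = self._objects[object_id]
        if hashlib.sha256(data).hexdigest() != metadata["checksum"]:
            raise IOError("checksum mismatch")
        return data, metadata

    def find(self, **criteria):
        """Policy-style query answered from metadata alone."""
        return [oid for oid, (_, md) in self._objects.items()
                if all(md.get(k) == v for k, v in criteria.items())]


store = FlatObjectStore()
oid = store.put(b"\xff\xd8 jpeg bytes", file_type="jpeg", owner="jv", category="photo")
data, md = store.get(oid)
```

Because the metadata sits next to the data, a policy such as "find every object in category photo" needs no path traversal, which is the point the slide makes about easy policies.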
-
Big Question
So...
File and NAS are DEAD?
-
...and the Answer is...
NO
* File and NAS will continue to grow
* File and Object join hands to keep the party on!
FOBS: File and Object Based Storage
-
FOBS: File and Object Based Storage
* Object storage works best for WORM workloads, and not all data fits that bill.
* Object is meant for low-cost mass storage which is not actively shared.
* Use of traditional applications and file systems continues.
* File fills the part of the spectrum that needs a rich set of security and consistency guarantees.
* Object storage fills the space where file/NAS is weak.
* After all, most object stores and structured data stores (databases) are created on file systems.
-
FOBS: File and Object Based Storage
[Figure: Object Store, File, and Volume Based storage plotted by market share against access/update frequency, with FOBS marked.]
Labels: Secure, Consistency, General Purpose, Performance, Legacy
Labels: WORM/Cold, Cost Effective, High Volume, Scalability, Manageability
-
Object Storage
Object stores are broadly divided into two categories.
Storage Devices
* Move smarts into the device layer: NASD, OSD-1, OSD-2, OSD layer on FS etc
* Access command set. Ex: SCSI model command set for OSD.
* Custom OSD model: Lustre, Ceph
* T10 OSD model: EXOFS, PanFS
* pNFS support.
* PBs of storage on 1000s of disks, 1000s of clients
Web Services
* Objects created on filesystems and accessed through the web.
* REST model over the HTTP protocol. Operations: Get, Put, Post, Delete
* Highly available
* Loosely consistent.
* SWIFT, S3, Azure etc
* Gaining tremendous popularity.
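The REST model used by web-services object stores maps the four HTTP verbs onto object operations. The sketch below shows that mapping without any networking. `RestObjectStore` is a hypothetical class, POST is treated as a metadata-only update (which is how Swift's object API uses it), and the status codes are chosen to resemble Swift's responses rather than taken from this talk.

```python
class RestObjectStore:
    """Hypothetical handler mapping HTTP verbs to object operations."""

    def __init__(self):
        self._objects = {}   # URL path -> bytes
        self._metadata = {}  # URL path -> dict of metadata headers

    def handle(self, method, path, body=b"", headers=None):
        """Return (status_code, body) for one request."""
        headers = headers or {}
        if method == "PUT":      # create or replace the object
            self._objects[path] = body
            self._metadata[path] = dict(headers)
            return 201, b""
        if path not in self._objects:
            return 404, b""
        if method == "GET":      # read the object
            return 200, self._objects[path]
        if method == "POST":     # update metadata only
            self._metadata[path].update(headers)
            return 202, b""
        if method == "DELETE":   # remove the object
            del self._objects[path]
            del self._metadata[path]
            return 204, b""
        return 405, b""


store = RestObjectStore()
url_path = "/v1/acc/cont/obj"  # illustrative path
```

Every object is addressed by its full URL path, so there is no open/close state and no directory traversal on the access path.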
-
Object Based Filesystem (Ex: Ceph)
* Provides a POSIX-compliant FS on top of the object-based Ceph Storage Cluster
* Files get mapped to objects and the MDS below librados
* The MDS stores all filesystem metadata (directories, owners, access info etc)
* Data is stored directly on OSDs
* Out-of-band IO: metadata provides the data location, and IO goes directly to the OSDs
* Offers a kernel mount or FUSE interface
Source: http://ceph.com/docs/master/architecture/
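The split between the metadata path and the data path can be sketched as follows. This is a toy model, not Ceph code: the object naming scheme is hypothetical, placement uses a plain hash as a stand-in for Ceph's CRUSH algorithm, and each OSD is a Python dict.

```python
import hashlib

OBJECT_SIZE = 4  # bytes per object; tiny so the example is easy to follow


class MetadataServer:
    """Holds only file system metadata: path -> (inode, size)."""

    def __init__(self):
        self._inodes = {}
        self._next_inode = 1

    def create(self, path, size):
        self._inodes[path] = (self._next_inode, size)
        self._next_inode += 1
        return self._inodes[path]

    def lookup(self, path):
        return self._inodes[path]


def object_names(inode, size):
    """Names of the objects backing a file (hypothetical naming scheme)."""
    count = (size + OBJECT_SIZE - 1) // OBJECT_SIZE
    return [f"{inode:x}.{index:08x}" for index in range(count)]


def place(name, osd_count):
    """Stand-in for placement: a plain hash, not Ceph's CRUSH algorithm."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % osd_count


def write_file(mds, osds, path, data):
    inode, size = mds.create(path, len(data))        # metadata goes to the MDS
    for i, name in enumerate(object_names(inode, size)):
        chunk = data[i * OBJECT_SIZE:(i + 1) * OBJECT_SIZE]
        osds[place(name, len(osds))][name] = chunk   # data goes straight to an OSD


def read_file(mds, osds, path):
    inode, size = mds.lookup(path)                   # one metadata round trip
    return b"".join(osds[place(name, len(osds))][name]
                    for name in object_names(inode, size))


mds = MetadataServer()
osds = [{}, {}, {}]  # each dict stands in for one OSD
write_file(mds, osds, "/photos/penguin.jpg", b"0123456789")
```

The client computes object locations itself once it has the inode and size, so the metadata server never sits in the data path.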
-
Web Services Object Store - SWIFT
Source: http://docs.openstack.org/training-guides/content/module003-ch007-cluster-architecture.html
* Storage nodes hold objects, stored as binary files on the filesystem with metadata stored in the file's extended attributes (xattrs).
* Proxy nodes receive and process incoming requests and determine the correct storage server for each request.
* All objects stored in Swift have a URL.
* All objects stored are replicated 3x in as-unique-as-possible zones.
* Object data can be located anywhere in the cluster.
* Nodes/disks can be added without downtime.
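A rough sketch of what "replicated 3x in as-unique-as-possible zones" means for the proxy's placement decision. This is a deliberate simplification and not Swift's ring implementation: the device list is invented, and the hash omits the cluster-specific salting a real deployment would use.

```python
import hashlib

REPLICAS = 3

# Hypothetical cluster: (device name, zone)
DEVICES = [("sdb1", 1), ("sdc1", 1), ("sdb2", 2), ("sdc2", 2), ("sdb3", 3), ("sdb4", 4)]


def object_hash(account, container, obj):
    """Hash of the object's logical path (unsalted, for illustration only)."""
    path = f"/{account}/{container}/{obj}"
    return hashlib.md5(path.encode()).hexdigest()


def pick_devices(account, container, obj, devices=DEVICES, replicas=REPLICAS):
    """Choose `replicas` devices, preferring zones not used yet."""
    start = int(object_hash(account, container, obj), 16) % len(devices)
    ordered = devices[start:] + devices[:start]   # deterministic per object
    chosen, used_zones = [], set()
    for dev, zone in ordered:                     # first pass: unique zones only
        if zone not in used_zones and len(chosen) < replicas:
            chosen.append((dev, zone))
            used_zones.add(zone)
    for dev, zone in ordered:                     # second pass: fill up if zones ran out
        if len(chosen) < replicas and (dev, zone) not in chosen:
            chosen.append((dev, zone))
    return chosen


placement = pick_devices("acc", "cont", "obj")
```

With four zones available, the three replicas always land in three different zones; with fewer zones than replicas, the second pass reuses a zone rather than failing, which is the "as possible" part.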
-
NFS Evolution
* NFS is extremely popular and widely used.
* Stateless NFSv3 is very successful and is the de facto meaning of 'NFS'.
* Stateful NFSv4 came out in 2003.
  - Adoption is slow but has gained momentum since NFSv4.1 came out.
  - Became a stepping stone towards NFSv4.1.
* NFSv4.1, introduced in 2010, added enhancements and addressed NFSv4 deficiencies.
  - Improved performance: pNFS, directory delegations, trunking
  - Robustness: exactly-once semantics
  - Security: Windows native ACL support, Kerberized back channel
* For time-to-market reasons, a few players are skipping NFSv4.0 and moving directly to NFSv4.1. Ex: VMware, Microsoft.
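Exactly-once semantics in NFSv4.1 come from sessions: each request occupies a slot and carries a sequence id, and the server caches the last reply per slot so a retransmission is replayed instead of re-executed. A simplified sketch of one slot follows. `SessionSlot` is an illustrative name, and a real server tracks much more state than this.

```python
class SessionSlot:
    """One slot of an NFSv4.1-style session reply cache (simplified)."""

    def __init__(self):
        self.last_seqid = 0
        self.cached_reply = None

    def call(self, seqid, operation):
        if seqid == self.last_seqid and self.cached_reply is not None:
            # Retransmission: replay the cached reply, do not re-execute.
            return self.cached_reply
        if seqid != self.last_seqid + 1:
            return "NFS4ERR_SEQ_MISORDERED"
        reply = operation()          # executed once per sequence id
        self.last_seqid = seqid
        self.cached_reply = reply
        return reply


counter = {"value": 0}


def increment():
    """A non-idempotent operation: running it twice would be visible."""
    counter["value"] += 1
    return counter["value"]


slot = SessionSlot()
first = slot.call(1, increment)
replay = slot.call(1, increment)   # same sequence id, e.g. after a lost reply
```

The replayed call returns the cached result and leaves the counter untouched, which is what makes non-idempotent operations safe to retry.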
-
Parallel NFS - pNFS
[Figure: pNFS architecture; labels include File/Object Layout Driver, POSIX Interface, ExoFS, and Control.]
* Removes IO bottlenecks and improves large file performance.
* Load balancing
* Scale-out model
* Control protocol is not standardized; it is a vendor value-add.
* Allows direct client access to the storage devices
* Clients can do parallel IO across storage
* Layouts can be leased, recallable, and revocable.
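The core pNFS idea, obtaining a layout from the metadata server and then doing IO directly and in parallel against the data servers, can be sketched as below. The striping scheme, class name, and dict-based data servers are assumptions for illustration; layout leasing and recall are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_UNIT = 4  # bytes per stripe unit; tiny for illustration


class LayoutServer:
    """Plays the metadata server: hands out layouts, never touches file data."""

    def __init__(self, data_server_count):
        self.data_server_count = data_server_count

    def layout_get(self, file_size):
        """Return [(data server index, offset, length), ...] for the file."""
        layout = []
        for offset in range(0, file_size, STRIPE_UNIT):
            server = (offset // STRIPE_UNIT) % self.data_server_count
            layout.append((server, offset, min(STRIPE_UNIT, file_size - offset)))
        return layout


def parallel_read(layout, data_servers):
    """Fetch every stripe unit directly from its data server, concurrently."""
    def fetch(segment):
        server, offset, _length = segment
        return data_servers[server][offset]
    with ThreadPoolExecutor(max_workers=len(data_servers)) as pool:
        return b"".join(pool.map(fetch, layout))   # map preserves layout order


payload = b"parallel NFS data"
mds = LayoutServer(data_server_count=3)
layout = mds.layout_get(len(payload))

# Populate three stand-in data servers (offset -> bytes) according to the layout.
data_servers = [{}, {}, {}]
for server, offset, length in layout:
    data_servers[server][offset] = payload[offset:offset + length]

result = parallel_read(layout, data_servers)
```

Once the client holds the layout, the metadata server is out of the data path, so aggregate bandwidth can grow with the number of data servers.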
-
Parallel NFS - pNFS
* Supported layout types are open-ended.
* Supports three types of layouts
  - File (RFC 5661)
  - Block (RFC 5663)
  - Object (RFC 5664)
  - Future: Flexible File Layout (proposal) and others
* File Layout
  - Files, NFS protocol
  - Default layout and many implementations
* Block Layout
  - SCSI blocks, iSCSI, FCP etc
* Object Layout
  - OSD SCSI object protocol, OSD2
  - Few implementations: PanFS, Exofs (OSDFS)
[Figure: layered stack, top to bottom: User Interface; NFSv4.1; pNFS Layouts (File, Obj, Block, Future); Network / IO stack.]
-
NAS and Object to become FOBS
* Many traditional applications are written for POSIX access.
* Object storage is different and foreign to the traditional applications.
* One solution is to create a filesystem layer on top of an object store.
  - Ex: the Maldivica storage connector creates a filesystem interface on top of a SWIFT object store, which can be exported via NFS/CIFS (NAS).
* Another is to provide an object interface on NAS.
  - Ex: Calsoft integrates NAS with a modified OpenStack SWIFT and provides a SWIFT interface on NAS.
Source: http://www.calsoftinc.com/OpenStack-Object-Storage-Swift.aspx# , http://maldivica.com/
-
Swift-on-File
* Ability to access the back-end using both the object interface and the file interface.
* Swift-on-File stores objects following the same path hierarchy as that object's URL.
  - Object URL: https://swift.example.com/v1/acc/cont/obj
  - Swift: /mnt/sdb1/2/node/sdb2/objects/981/f79/f566bd022b9285b05e665fd7b843bf79/1401254393.89313.data
  - SoF: /mnt/gluster-vol/acc/cont/obj
* Enables objects created using the Swift API to be accessed as files on a POSIX filesystem.
* This opens up enormous possibilities, including using both NAS and a RESTful interface to create and access the same data.
* Use case: create video files using SWIFT, use file access to transcode them, and let SWIFT serve them in a different codec.
Source: https://github.com/swiftonfile/swiftonfile/blob/master/README.md
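The two on-disk layouts in the Swift-on-File example can be related to the object URL with a short sketch. The function names are illustrative. For the hashed Swift layout, the code assumes the partition directory is the top bits of the object hash and the suffix directory is its last three hex digits; the partition power of 10 is inferred from the example path, not stated in the source. The hash itself is taken as given, since computing it requires cluster-specific salts.

```python
import posixpath
from urllib.parse import urlparse


def swift_on_file_path(url, mount="/mnt/gluster-vol"):
    """Map an object URL to the file path the Swift-on-File layout uses."""
    parts = urlparse(url).path.strip("/").split("/")
    # parts[0] is the API version ("v1"); the rest is account/container/object.
    return posixpath.join(mount, *parts[1:])


def swift_hashed_location(object_hash, part_power=10):
    """Split a 32-hex-digit object hash into (partition, suffix) directory names.

    Assumption: the partition is the top `part_power` bits of the hash and the
    suffix is its last three hex digits. part_power=10 is inferred from the
    example path above, not stated in the source.
    """
    partition = int(object_hash[:8], 16) >> (32 - part_power)
    return partition, object_hash[-3:]


sof = swift_on_file_path("https://swift.example.com/v1/acc/cont/obj")
partition, suffix = swift_hashed_location("f566bd022b9285b05e665fd7b843bf79")
```

The Swift-on-File path is readable straight from the URL, whereas the default layout is only meaningful through the hash, which is why the former can double as a NAS export.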
-
NFS for future
* NFS Pathless Objects: filesystem objects which can be created, queried for and destroyed without being associated with a pathname. (http://tools.ietf.org/html/draft-dipankar-nfsv4-pathless-objects-02)
* Metastripe: RFCs are being proposed to stripe/scale meta-data servers. (http://tools.ietf.org/html/draft-mbenjamin-nfsv4-pnfs-metastripe-01)
* Ceph provides access to the back-end RADOS object store through the LIBRADOS API, an S3/Swift compatible API, Block, and CephFS, which can be NFS exported, including pNFS. (http://ceph.com/docs/master/architecture/)
* pNFS over Ceph: CohortFS with metastripe
* pNFS over Lustre: CEA, French defense organization.
* Possible to offer selectable consistency with an NFS-backed object store versus a web-based one.
* OpenStack Manila project (https://wiki.openstack.org/wiki/Manila/)
-
NFSv4.2
Added advanced features take NFS into the advanced file sharing category.
Performance:
* Server Side Copy: removes one leg of the copy operation
* IO_ADVISE: client advises the server on the application access pattern
* Application Data Blocks (ADB): ex: VM image file type
* Sparse file support
Security:
* Labeled NFS: Mandatory Access Control based on system-wide policy
Scalability and QoS:
* Space Reservation: reserve storage, useful in thin provisioning
* Hole Punching: return unused parts of the file back to the pool
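From an application's point of view, server-side copy is reached through an ordinary copy call that lets the kernel decide how to move the bytes. The sketch below uses Python's `os.copy_file_range` (Linux only) and falls back to a plain copy elsewhere. Whether the copy is actually offloaded to an NFS server depends on the kernel, the mount, and the server, none of which this sketch checks; `copy_with_offload` and `demo` are illustrative names.

```python
import os
import shutil
import tempfile


def copy_with_offload(src_path, dst_path):
    """Copy src to dst, preferring os.copy_file_range when it is available."""
    size = os.path.getsize(src_path)
    if hasattr(os, "copy_file_range"):
        try:
            with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
                remaining = size
                while remaining > 0:
                    # The kernel may copy fewer bytes than requested, so loop.
                    copied = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
                    if copied == 0:
                        break
                    remaining -= copied
                if remaining == 0:
                    return "copy_file_range"
        except OSError:
            pass  # unsupported on this file system or kernel; fall through
    shutil.copyfile(src_path, dst_path)   # client reads and writes every byte
    return "fallback"


def demo():
    """Copy a small stand-in VM image and report (method used, contents match)."""
    content = b"vm image " * 1000
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "image.raw")
        dst = os.path.join(tmp, "clone.raw")
        with open(src, "wb") as f:
            f.write(content)
        method = copy_with_offload(src, dst)
        with open(dst, "rb") as f:
            return method, f.read() == content
```

In the fallback path the data crosses the network twice (read, then write); an offloaded copy avoids the client leg, which is the "one leg" the slide refers to.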
-
NFS-Ganesha
* One of the mainstream NFS servers.
* User-level NFS server suitable for enterprise applications
  - Manageability and debuggability (http://tinyurl.com/kka8czz)
  - Can manage huge meta-data and data caches
  - Provision to exploit FS-specific features
* Can serve multiple types of file systems at the same time.
* Can serve multiple protocols at the same time.
* Can act as a proxy server and export a remote NFSv4 server.
* Cluster friendly and cluster manager agnostic.
  - Easy recovery, failover and failback implementation.
  - Multi-protocol support with common DLM (planned)
* Small but growing community. Active participants: IBM, Panasas, Redhat, CohortFS (LinuxBox), CES, Bull, plus a few more.
-
NFS-Ganesha
* Supports many filesystems through the FSAL layer: VFS, GPFS, PanFS, Gluster, CEPH, Lustre, XFS, FUSE, Proxy, PT etc
* NFSv3, NFSv4.0, NFSv4.1, pNFS support. Minimal NFSv4.2.
* IBM, Redhat, LinuxBox, Panasas released/releasing products based on NFS-Ganesha.
* Released 1.5, 2.0, 2.1; 2.2 is set to be GA'd by end of October 2014.
* Delegation, statistics, dynamic exports, LTTng support.
* Supports file and object layouts of pNFS
* Cluster Manager Abstraction Layer (CMAL)
* Clustered DRC, DLM, multi-protocol support.
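The idea behind the FSAL layer, one server front end with pluggable back ends per export, can be illustrated with a small abstraction. This is a conceptual sketch only: the real FSAL API is written in C and is far larger, and every class and method name here is hypothetical.

```python
from abc import ABC, abstractmethod


class FSAL(ABC):
    """Hypothetical back-end interface; the real FSAL API is in C and much larger."""

    @abstractmethod
    def read(self, path): ...

    @abstractmethod
    def write(self, path, data): ...


class MemoryFSAL(FSAL):
    """Stand-in for a local file system back end."""

    def __init__(self):
        self._files = {}

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = data


class ProxyFSAL(FSAL):
    """Forwards every call to another back end, standing in for a remote server."""

    def __init__(self, remote):
        self._remote = remote

    def read(self, path):
        return self._remote.read(path)

    def write(self, path, data):
        self._remote.write(path, data)


class Server:
    """Routes each request to whichever back end owns the export."""

    def __init__(self):
        self._exports = {}

    def add_export(self, prefix, fsal):
        self._exports[prefix] = fsal

    def _resolve(self, path):
        for prefix, fsal in self._exports.items():
            if path.startswith(prefix + "/"):
                return fsal, path[len(prefix):]
        raise FileNotFoundError(path)

    def write(self, path, data):
        fsal, rel = self._resolve(path)
        fsal.write(rel, data)

    def read(self, path):
        fsal, rel = self._resolve(path)
        return fsal.read(rel)


remote = MemoryFSAL()
server = Server()
server.add_export("/local", MemoryFSAL())
server.add_export("/remote", ProxyFSAL(remote))
server.write("/local/a", b"1")
server.write("/remote/b", b"2")
```

The protocol-facing code never needs to know which back end it is talking to, which is what lets a single server instance export several file system types, including a proxied remote server, at the same time.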
-
Conclusions
* Object storage is expanding, and file remains a very important part of the equation; the two are expected to play together.
* FOBS is the future.
* Unified storage: NAS, Object, SAN
* NFS is progressing as a protocol; NFSv4.1 and pNFS support is a must to be competitive in the market space.
* pNFS has major advantages: scale-out of meta-data and data; parallel IO and performance improvement.
* NFSv4.2 is positioning NFS as a preferred filesystem/access protocol for future storage needs.
-
NFS-Ganesha links
* NFS-Ganesha is available under the terms of the LGPLv3 license.
* NFS-Ganesha project homepage on GitHub: https://github.com/nfs-ganesha/nfs-ganesha/wiki
* Github: https://github.com/nfs-ganesha/nfs-ganesh
* Download page: http://sourceforge.net/projects/nfs-ganesha/files
* Mailing lists:
  - [email protected]
  - [email protected]
  - [email protected]
-
Legal Statements
* This work represents the view of the author and does not necessarily represent the view of IBM.
* IBM is a registered trademark of International Business Machines Corporation in the United States and/or other countries.
* UNIX is a registered trademark of The Open Group in the United States and other countries.
* Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
* Other company, product, and service names may be trademarks or service marks of others.
* CONTENTS are "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. Author/IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
-
THANK YOU!
Q&A