Transcript
Page 1: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big “Unstructured” Data A Case for Optimized Object Storage

Paul Speciale

Friday, July 27, 2012

Page 2: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Storage facts and trends

Recent studies estimate that data storage capacities will increase more than 30X over the coming decade, to over 35 zettabytes (ZB)

[Chart: storage consumption over time, growing ~30X to ~35 ZB by 2020; callouts: high-capacity drives, less staff per TB, unstructured data]


Page 3: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Storage facts and trends

But the number of qualified people available to manage this huge volume of data will stay essentially flat (roughly 1.5X growth)

Each administrator will therefore be expected to manage about 20X more data (≈30X data growth ÷ 1.5X staff growth)

[Chart: capacity vs. budget over time; storage requirements grow far faster than the storage budget, and the gap must be closed through efficiency: automate & reduce overhead]


Page 4: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:


Storage facts and trends

• Much of that growth (80%) is driven by unstructured data
• Billions of large objects and files

Examples: Media Archives, Online Images, Large Files, Medical Images, Online Storage, Online Movies


Page 5: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:


Storage facts and trends: Media & Entertainment Industry Example

Media & Entertainment is driving huge capacity requirements, in both file sizes and the number of files and storage capacity in use, driven by HD and 3D video formats:

“Petabytes are peanuts”

3TB per hour for 4K video
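To put that rate in perspective: at 3TB per hour, a single two-hour 4K title is roughly 6TB, so fewer than two hundred such titles already add up to a petabyte, before counting additional formats or copies.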


Page 6: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big Data for Analytics vs. Big “Unstructured” Data



Page 7: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big Data for Analytics


• In the 90’s, we experienced an explosion of data captured for analytics purposes:
  • Academic Research
  • Chemical R&D facilities
  • Travel industry
  • Geo-industry, oil & gas
  • Financial / Trading
  • Agriculture

• In the 2000’s, online applications & social media triggered a flood of trend data


Page 8: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big Data for Analytics


• Data is captured as many small log files & concatenated as “Big Data”

• Relational databases were not optimal:
  • Too much data, too big
  • Insufficient performance for analytics

• This stimulated innovations (a toy sketch of the map/reduce pattern follows below):
  • Hadoop, MapReduce, GFS
  • XML databases

• => This is Big Data for Analytics
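As a toy sketch of the map/reduce pattern named above (not Hadoop itself; the log directory, file pattern, and log-line format are assumptions for illustration), the Python fragment below maps each log line to a (key, 1) pair and reduces the pairs into per-key totals:

    # Toy map/reduce over many small log files (illustration only; paths and
    # log format are hypothetical placeholders).
    import glob
    from collections import Counter

    def map_line(line):
        # Map step: emit (key, 1) pairs; here the key is the HTTP status field.
        fields = line.split()
        if len(fields) >= 9:
            yield fields[8], 1

    def reduce_pairs(pairs):
        # Reduce step: sum the counts for each key.
        totals = Counter()
        for key, count in pairs:
            totals[key] += count
        return totals

    pairs = (pair
             for path in glob.glob("logs/*.log")          # many small files
             for line in open(path, errors="ignore")
             for pair in map_line(line))
    print(reduce_pairs(pairs))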


Page 9: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big Data Evolution


• Today, the Big Data trend refers to both Big Data for Analytics and Big Unstructured Data:

  • Media
  • Streaming
  • Business
  • Scientific

• Fundamentally different data but with lots of similarities

  • Immense capacities
  • Number of transactions or objects

• Unstructured data is traditionally stored on host file systems, but:

  • Host file systems impose fixed limits and do not scale up to the size we need

  • File systems do not meet performance requirements, because access is limited by the host


Page 10: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big Unstructured Data


• Most unstructured data is archived, often to tape (for cost reasons), and is then difficult to access

• Volumes are increasing exponentially

• Data archives are an organization & management burden (Grandma’s Attic)


Page 11: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big Unstructured Data


• Companies are starting to see the value of the data in their archives:
  • Documents of individuals can be valuable for others
  • Some companies have legal reasons to keep data available
  • Unexplored analytics opportunities
  • This data can be mined and monetized


Page 12: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big Unstructured Data


But how do we store all this data in a cost-efficient way?

“Building cost-efficient Live Archives”


Page 13: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big Unstructured Data


What are the requirements?

• Tape is a difficult option: access latency is key

• Data has to be always available online

• Direct interface to the applications

• Petabyte scalability

• Extreme reliability, integrity

• Cost-efficient

• Security

Disk Storage (online, low-latency access)

+ Open application APIs (App & Cloud-enabled)

+ Ultra-high data durability (Erasure Coding)

= Optimized Object Storage


Page 14: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Disk vs. Tape


Tape has several obvious advantages over disk & there will always be use cases for tape

But disks enable live archives with instant data accessibility

More arguments for disk-based archives:
• Disks can be powered down
• Tape requires replication to protect against media errors
• Data integrity checking
• Massive migration projects
• …


Page 15: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Object Storage Simplifies this Problem


• File System organization of data becomes a burden

• File systems impose limitations on numbers of files & directories

• Very time-consuming to organize data

• Object Storage simplifies this problem

• Flat “Namespaces” (not file systems) - without storage limits

• Lets the applications talk directly to the storage

• Use “object” application APIs to let applications directly manage objects & metadata (see the sketch below)

• File Gateways can be used as a transition bridge

• Bring legacy data and apps into Object Storage

[Diagram: multiple applications connecting directly to the object store through the Object API]
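As an illustration only (using the generic S3-style object API via boto3 as a stand-in rather than Amplidata’s native interface; the bucket, key, file name, and metadata values are made-up placeholders), an application can store an object with its metadata and read the metadata back, with no file system in between:

    # Sketch of an application talking directly to object storage through an
    # S3-style object API. Bucket, key, and metadata are hypothetical.
    import boto3

    s3 = boto3.client("s3")  # endpoint and credentials come from the environment

    # Store the object and its metadata in a flat namespace (no directories).
    with open("interview-raw.mov", "rb") as body:
        s3.put_object(
            Bucket="media-archive",
            Key="2012/07/interview-raw.mov",
            Body=body,
            Metadata={"camera": "4k-rig-01", "project": "structure-data"},
        )

    # Read back only the metadata, without downloading the object body.
    head = s3.head_object(Bucket="media-archive", Key="2012/07/interview-raw.mov")
    print(head["Metadata"])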


Page 16: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Petabyte Scalability and Beyond


Systems should scale BIG
• Beyond petabytes of data – no built-in limits
• Beyond billions of data objects

Systems should scale uniformly
• Add resources incrementally and grow as a Single System View
• Manage from a “Single Pane of Glass”
• Scale performance and capacity separately
• Migration and seamless growth across newer generations of component technologies (processors, disk densities)


Page 17: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Ultra-High Levels of Data Integrity


• Data needs to be archived for lifetimes
  • Expect “bit perfect” integrity to store gold-copy of critical assets
  • Consolidate multiple copies of data into a single highly-durable tier

• Ensuring the integrity of a long-term unstructured data archive requires new data protection algorithms to:
  • Address the increasing capacity of disk drives
  • Solve issues related to long RAID rebuild windows

“Object storage systems based on erasure-coding can not only protect data from higher numbers of drive failures, but also against the failure of entire storage modules.”
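As a toy illustration of the erasure-coding idea behind that claim (a single XOR parity chunk that can rebuild any one lost data chunk; real object stores use stronger codes that survive several simultaneous drive or module failures, and the chunk count below is an arbitrary choice):

    # Toy erasure coding: K data chunks plus one XOR parity chunk; any single
    # lost chunk can be rebuilt from the K surviving pieces. Real systems use
    # codes that tolerate multiple simultaneous failures.
    from functools import reduce

    K = 4  # number of data chunks (arbitrary for this sketch)

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data):
        # Pad, split into K equal chunks, and append the XOR parity chunk.
        size = -(-len(data) // K)                      # ceiling division
        padded = data.ljust(K * size, b"\0")
        chunks = [padded[i * size:(i + 1) * size] for i in range(K)]
        return chunks + [reduce(xor, chunks)]

    def rebuild(pieces, lost):
        # The XOR of the K surviving pieces reproduces the missing one.
        return reduce(xor, [p for i, p in enumerate(pieces) if i != lost])

    pieces = encode(b"gold-copy critical asset")
    pieces[2] = None                                   # simulate a failed drive
    pieces[2] = rebuild(pieces, 2)                     # recover it
    print(b"".join(pieces[:K]).rstrip(b"\0"))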


Page 18: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Big Unstructured Data


What are the requirements?

• Tape is a difficult option: access latency is key

• Data has to be always available online

• Direct interface to the applications

• Petabyte scalability

• Extreme reliability, integrity

• Cost-efficient

• Security

Disk Storage (online, low-latency access)

+ Open application APIs (App & Cloud-enabled)

+ Ultra-high data durability (Erasure Coding)

= Optimized Object Storage


Page 19: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Paul Speciale, VP Products, Amplidata Inc. www.amplidata.com

Thank You!


Page 20: SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:

Sponsored Workshop
