University of Minnesota Digital Technology Center Intelligent Storage Consortium Aravindan Raghuveer David H.C. Du DISC Object-based Storage for Exhaustive Search

Aravindan Raghuveer David.H.C.Du DISC - DTC · 2012. 8. 16. · Aravindan Raghuveer David.H.C.Du DISC Object-based Storage for Exhaustive Search. University of Minnesota Digital Technology

Mar 18, 2021


Aravindan Raghuveer, David H.C. Du

DISC

Object-based Storage for Exhaustive Search


Introduction

Exhaustive Search
– Examine all objects in a storage system.
– An expensive operation.

Why Exhaustive Search?
– Fuzzy queries:
    Semantic gap in image and video; hard to annotate.
    Content-based (query-by-example) search.
    Demonstrated in the Diamond project at Intel/CMU.
– Index creation:
    Not effective: curse of dimensionality.
    Too expensive.
    Not always possible: fuzzy queries.

A “necessary evil” feature on all filesystems.


The factors . . .

Four recent developments spur us to rethink the way exhaustive search is implemented today:

– Data Characteristics
– Disk Technology Trends
– Filesystem and Database Design
– Concurrent Applications


Factor-1: Data Characteristics

Petabyte-scale data sets are becoming common in data mining and HPC applications.

Video surveillance:
– Terabytes of data (depending on the number of feeds) that need to be searched by content.

For instance, experiments at the Mayo Clinic generate:
– 3 million images per week
– 1 TB every 9 days
– Presented at ISW last year.

Impact on exhaustive search:
– Reduce the search space of exhaustive search as much as possible (for instance, based on metadata).
– Exhaustive search had better be darn efficient!


Factor-2: Disk Technology Trends

Bits per unit area are increasing rapidly.
I/O bandwidth is lagging behind.
Effect on exhaustive search:
– 1 day to sequentially read 10 TB*
– 5 months with 8 KB chunk random access!

Impact on exhaustive search:
– The exhaustive search algorithm should consciously avoid random disk seeks.
– Try to get as close to sequential performance as we possibly can.

* Dr. Jim Gray’s keynote from FAST’05
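The two figures above can be checked with back-of-the-envelope arithmetic. The sketch below assumes a disk that streams about 120 MB/s sequentially and serves about 100 random 8 KB reads per second; both rates are illustrative assumptions, not numbers from the slides, and the function names are hypothetical.

```python
# Assumed device characteristics (illustrative, not taken from the slides).
SEQ_BANDWIDTH = 120 * 10**6   # bytes per second when streaming sequentially
RANDOM_IOPS = 100             # random reads served per second
CHUNK = 8 * 1024              # 8 KB random-access chunk
DATA = 10 * 10**12            # 10 TB to scan

SECONDS_PER_DAY = 86_400


def sequential_scan_seconds(total_bytes, bandwidth):
    """Time to stream the whole data set at the sequential bandwidth."""
    return total_bytes / bandwidth


def random_scan_seconds(total_bytes, chunk_bytes, iops):
    """Time to read the data set one chunk per random I/O."""
    requests = total_bytes / chunk_bytes
    return requests / iops


seq_days = sequential_scan_seconds(DATA, SEQ_BANDWIDTH) / SECONDS_PER_DAY
rnd_months = random_scan_seconds(DATA, CHUNK, RANDOM_IOPS) / (30 * SECONDS_PER_DAY)

print(f"sequential: {seq_days:.2f} days, random: {rnd_months:.1f} months")
```

Under these assumed rates the sequential scan takes roughly one day and the random-access scan several months, which matches the order of magnitude quoted on the slide.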


Factor-3: Filesystem and Database Design

Exhaustive search (e.g. grep) runs on top of a filesystem.

The filesystem or database is not even aware that the application is searching exhaustively.

Exhaustive search is not a primary design criterion for today's filesystems.
– Filesystem-level exhaustive search: recursive exploration of directories.
– With aged, fragmented filesystems:
    At the disk, an exhaustive search will look more like random access than sequential access.

Databases: not as efficient as filesystems in handling blobs in the presence of fragmentation*.

Impact on exhaustive search: the filesystem and the database are not the right place to embed exhaustive search.

* R. Sears, C. van Ingen, “Fragmentation in Large Object Repositories”, CIDR 2007
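A minimal Python sketch of a filesystem-level exhaustive search, in the spirit of a recursive grep. The function name `filesystem_search` is hypothetical. The point it illustrates is that the visiting order follows the directory namespace, so nothing in this loop knows where the file extents sit on disk.

```python
import os


def filesystem_search(root, needle):
    """Return paths under `root` whose contents contain the bytes `needle`.

    Files are visited in directory-traversal order, which is a logical
    order and says nothing about physical placement on the disk.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Whole-file read keeps the sketch short; a real tool would
            # read in chunks to bound memory use.
            with open(path, "rb") as fh:
                if needle in fh.read():
                    matches.append(path)
    return matches
```

Calling `filesystem_search("/some/dir", b"pattern")` returns the matching paths in traversal order.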


Factor-4: Concurrent Applications

Exhaustive search: a long-running, I/O-intensive task.
Other filesystem applications run concurrently.
Concurrent execution of both:
– Performance isolation:
    Impact on the response time of other applications should be minimal.
    Impact on the efficiency of exhaustive search should be as low as possible.

Not possible with block storage:
– Cannot distinguish one block request from another.
– No priorities or QoS levels are assigned to requests.

Impact on exhaustive search:
– The storage device should be able to provide differentiated services.
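One way to picture differentiated services is a request queue that knows each request's class. The sketch below is an assumption-laden illustration, not the DISC design: the class names `REALTIME` and `BACKGROUND`, the `RequestQueue` type, and the request labels are all hypothetical. Real-time requests are always dispatched before exhaustive-search requests, and requests of the same class keep their arrival order.

```python
import heapq
from itertools import count

# Hypothetical service classes: a lower number is served first.
REALTIME, BACKGROUND = 0, 1


class RequestQueue:
    """Priority queue that orders requests by class, then by arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves FIFO order

    def submit(self, service_class, request):
        heapq.heappush(self._heap, (service_class, next(self._arrival), request))

    def next_request(self):
        service_class, _, request = heapq.heappop(self._heap)
        return service_class, request

    def __len__(self):
        return len(self._heap)


queue = RequestQueue()
queue.submit(BACKGROUND, "scan-fragment-17")
queue.submit(BACKGROUND, "scan-fragment-18")
queue.submit(REALTIME, "read-object-42")

dispatch_order = [queue.next_request()[1] for _ in range(len(queue))]
print(dispatch_order)
```

A block device cannot build such a queue because it sees only anonymous block requests; an object interface can carry the class with each request.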


Summary of what we’ve seen:

Data Characteristics:
– Exhaustive search had better be darn efficient!

Disk Technology Trends:
– The exhaustive search algorithm should consciously avoid random disk seeks.

Filesystem and Database Design:
– The filesystem and the database are not the right place to embed exhaustive search.

Concurrent Applications:
– The storage device should be able to provide differentiated services.


What is this work about?

A fresh look at exhaustive search.
Ensure that the storage system is never the performance bottleneck.
Conscious of random disk seeks.
Close-to-sequential performance, always.
Concurrent execution with other filesystem applications.
– Without compromising extensively on response time and efficiency.


An Overview of the Proposed Approach

Layout-aware:
– Search order is based not on the logical filesystem view but on the physical on-disk organization.
– As close to sequential performance as possible.

Suspend-and-resume:
– On a real-time request to disk:
    Suspend exhaustive search.
    Service the real-time request.
    Resume exhaustive search.
– Modify the search order based on the current disk head position.
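The layout-aware idea can be sketched in a few lines of Python. The `Fragment` type, the block numbers, and the use of total seek distance as a cost measure are illustrative assumptions; the slides do not specify how the planner is implemented. The sketch compares scanning fragments in logical (per-object) order with scanning them sorted by physical start block.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """A contiguous run of blocks belonging to one object (hypothetical model)."""
    object_id: str
    start_block: int
    length: int


def seek_distance(plan, head=0):
    """Total distance the head travels between fragments when following `plan`."""
    total = 0
    for frag in plan:
        total += abs(frag.start_block - head)
        head = frag.start_block + frag.length  # head ends just past the fragment
    return total


def layout_aware_plan(fragments):
    """Order fragments by physical position instead of by owning object."""
    return sorted(fragments, key=lambda frag: frag.start_block)


# Logical order: object "a" first, then "b", then "c", each fragmented.
logical_plan = [
    Fragment("a", 900, 10), Fragment("a", 100, 10),
    Fragment("b", 500, 10), Fragment("b", 50, 10),
    Fragment("c", 700, 10), Fragment("c", 300, 10),
]

logical_cost = seek_distance(logical_plan)
physical_cost = seek_distance(layout_aware_plan(logical_plan))
print(logical_cost, physical_cost)
```

Because exhaustive search does not care in which order fragments are examined, the sorted plan visits the same data with far less head movement.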


Questions to be answered …

Architecture:
– Where to embed the functionality: filesystem or smart object-based disk?

Layout-aware search:
– Planning the search?
– Metadata handling and placement?
    Where are object extents located?
    List of objects already scanned.

Suspend-resume:
– Maintaining search-progress metadata to avoid re-scanning [suspend]
– Computing a new search plan [resume]


Proposed Solution

Architecture: an intelligent storage node (ISN) capable of exhaustive search.
– Exposes an object interface.
– A case for application-awareness at the storage level to improve performance.

Why OSD?
– Filesystems and databases have no knowledge of storage internals and parameters.
– A filesystem can be built on multiple layers of virtualization, disconnected from the physical reality (disk boundaries).
– Filesystem-level search performance degrades with fragmentation.
– Block storage does not differentiate between real-time and exhaustive-search requests.


The Intelligent Storage Node

Storage node with:
– T10-compatible object interface.
– Capable of executing a limited set of exhaustive search queries.
– Called an Intelligent Storage Node (ISN).

Extension to the OSD interface:
– Command OSD_QUERY to trigger an exhaustive search.
– resultSetCollectionID OSD_QUERY(queryType, exampleObjectID)

Search order: object-fragment-level search as opposed to object-level search.

Suspend and resume:
– Static: search order not modified on resume.
– Dynamic: search order adjusted on resume.
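The difference between static and dynamic resume can be sketched as follows. This is an illustration under stated assumptions, not the ISN implementation: fragments are represented only by their start block, the function names are hypothetical, and the dynamic policy shown (sweep upward from the head, then wrap around) is one plausible choice that the slides do not prescribe.

```python
def remaining_fragments(plan, scanned):
    """Fragments of the original plan that have not been scanned yet."""
    return [block for block in plan if block not in scanned]


def resume_static(plan, scanned):
    """Static resume: continue the original plan exactly where it stopped."""
    return remaining_fragments(plan, scanned)


def resume_dynamic(plan, scanned, head):
    """Dynamic resume: re-plan from wherever the real-time request left the head.

    Assumed policy: scan everything at or beyond the head in ascending
    order, then wrap around to the fragments behind it.
    """
    remaining = remaining_fragments(plan, scanned)
    ahead = sorted(block for block in remaining if block >= head)
    behind = sorted(block for block in remaining if block < head)
    return ahead + behind


plan = [10, 20, 30, 40, 50, 60]   # start blocks in planned order
scanned = {10, 20}                # progress metadata kept at suspend time
head_after_realtime = 45          # head position after the real-time request

static_plan = resume_static(plan, scanned)
dynamic_plan = resume_dynamic(plan, scanned, head_after_realtime)
print(static_plan, dynamic_plan)
```

In both cases the `scanned` set plays the role of the search-progress metadata that prevents re-scanning; only the dynamic variant uses the new head position.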


ISN for Exhaustive Search

A case for application-aware intelligent storage.

Application characteristic:
– The order in which object fragments are scanned is not important.

Storage-device characteristic:
– Sequential performance is 10X better than random-access performance.

Application-aware storage optimization:
– Determining the search order of fragments to obtain close-to-sequential performance.
– Suspend-and-resume support for real-time requests.


Architecture of ISN:

Prototype under development, based on the DISC OSD reference implementation:
– Object filesystem (ext3)
– Fragment Indexer
– Search Planner

Real-time request support: implementation in progress.

Initial results look very promising.

[Figure: ISN architecture diagram with components: OSD Command Interpreter, Object Filesystem, Fragment Indexer, Search Planner, Block Device]


Experimental Setup

[Figure: experimental setup. Components: Aging tool, Ext3 Filesystem, F/S Search Planner, Layout-Aware Search Planner, Search executor. Outputs labeled: Filesystem search plan, Layout-aware search plan.]

The aging tool synthetically fragments a filesystem through file append, delete, and create operations.
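To see why a mix of create, append, and delete operations fragments files, the toy simulation below ages an in-memory "disk". It is not the aging tool used in the experiments: the lowest-free-block allocator, the function names, and the parameters are all assumptions made for illustration.

```python
import random


def age_disk(num_blocks, operations, seed=0):
    """Apply random create/append/delete operations to a toy block allocator.

    Returns a dict mapping file id to the list of blocks it occupies.
    Allocation always takes the lowest free block (an assumed policy).
    """
    rng = random.Random(seed)
    free = list(range(num_blocks))  # kept sorted, lowest block first
    files = {}
    next_id = 0
    for _ in range(operations):
        op = rng.choice(["create", "append", "delete"])
        if op == "create" and free:
            files[next_id] = [free.pop(0)]
            next_id += 1
        elif op == "append" and files and free:
            file_id = rng.choice(sorted(files))
            files[file_id].append(free.pop(0))
        elif op == "delete" and files:
            file_id = rng.choice(sorted(files))
            free.extend(files.pop(file_id))  # freed blocks leave holes
            free.sort()
    return files


def count_fragments(blocks):
    """Number of contiguous runs in a file's block list."""
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


aged = age_disk(num_blocks=64, operations=500, seed=1)
for file_id, blocks in sorted(aged.items()):
    print(file_id, count_fragments(blocks), blocks)
```

Deletes leave holes that later appends fill, so a file's blocks end up scattered; running more operations generally produces more fragments per file.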


Results

Storage age = 5
Filesystem usage: 10G (partition size = 63G)
Time taken for exhaustive search:
– Filesystem: 41 mins
– Layout-aware search: 7 mins
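The reported times can be turned into a speedup and an effective scan rate. The calculation assumes that "10G" means 10 GiB and that each search reads all of the used space; neither is stated on the slide.

```python
USED_MIB = 10 * 1024          # assumption: "10G" = 10 GiB of data scanned
FS_SECONDS = 41 * 60          # filesystem-level search
LAYOUT_SECONDS = 7 * 60       # layout-aware search

speedup = FS_SECONDS / LAYOUT_SECONDS
fs_rate = USED_MIB / FS_SECONDS          # MiB per second
layout_rate = USED_MIB / LAYOUT_SECONDS  # MiB per second

print(f"speedup: {speedup:.1f}x")
print(f"filesystem: {fs_rate:.1f} MiB/s, layout-aware: {layout_rate:.1f} MiB/s")
```

Under these assumptions the layout-aware search is a little under six times faster than the filesystem-level search.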


Acknowledgments

DISC Team
– Faculty and Cory Devor
– Students

Member Companies


Thank You!

Questions ??


More spindles ??

More spindles come at a cost:
– Hardware cost: not too bad.
– Maintenance (backup, scrubbing): expensive.
– Concurrent failure issues.

Our technique can make “more spindles” even better.

More spindles are not a solution for all scenarios:
– Home user.
– Video and image search imaginable.


Lazy defragmentation of filesystem??

Alleviates the issue, but is an expensive operation in itself.

Still may not work as well as a storage-level search when there are multiple levels of virtualization.


Investigations to do:

Layout-awareness:
– Two modes of layout-aware search: pre-planned and ad hoc.
    Pre-planned mode is used when the disk stores a small number of objects.
    Ad hoc mode is used when the disk is almost full.
    Pre-planned and ad hoc modes can be used at finer granularities (example: different modes on different areas of the disk).

Suspend-resume:
– Suspend: search metadata is distributed over the disk, close to the data.
– Resume: based on the remaining number of objects, we shift to either the pre-planned or the ad hoc mode.
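The mode choice on resume could look like the sketch below. The threshold value, the function names, and the region labels are hypothetical; the slides say only that the choice depends on the number of remaining objects and that it may differ per disk area.

```python
PRE_PLANNED, AD_HOC = "pre-planned", "ad hoc"


def choose_mode(remaining_objects, threshold=1000):
    """Pick a search mode from the number of objects still to scan.

    `threshold` is an assumed tuning parameter, not a value from the slides.
    """
    return PRE_PLANNED if remaining_objects <= threshold else AD_HOC


def choose_modes_per_region(remaining_by_region, threshold=1000):
    """Apply the same rule independently to each disk region."""
    return {
        region: choose_mode(remaining, threshold)
        for region, remaining in remaining_by_region.items()
    }


modes = choose_modes_per_region({"outer": 120, "inner": 50_000})
print(modes)
```

With few objects left, a full plan is cheap to compute up front; with many, planning incrementally avoids that cost. Where the crossover lies is one of the open questions listed above.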