Accelerating Image Recognition Processing Using Tiered Storage and Spark
Isom Crawford Jr., Ph.D., IBM
David Chen, Ph.D., IBM

Sep 11, 2020

Page 1: Accelerating Image Recognition Processing Using Tiered Storage and Spark

Isom Crawford Jr., Ph.D., IBM

David Chen, Ph.D., IBM

Page 2: Spark Processing

• Open-source framework, improvements on Hadoop
• Popular for data analytics – text, imagery, etc.

Page 3: Tiers of Storage

• Multiple “levels” of latency, bandwidth, and cost

[Figure: storage tier diagram with labels Processor, SSD, SAS etc., Cloud, Sequential Media]

Page 4: Tiered Storage

• Organized and managed to move data being processed closer to the processor
• And vice versa: move data not being processed to less-expensive storage

[Figure: storage tier diagram with labels Processor, SSD, SAS etc., Cloud, Sequential Media]

Page 5: Problem – Less-expensive storage I/O speeds

• Tiered storage
  • Various levels of I/O performance
  • Faster -> smaller
  • Larger -> slower

• Data (or Information) Lifecycle Management
  • Handles automatic migration between tiers, e.g., ILM policies
  • Eliminates manual retrieval

• But on-demand retrieval adds a lot of latency

Page 6: Example – Image Recognition

• Consider image recognition
  • Once the model is built, recognition is relatively fast

• Faster than cloud or sequential media can feed data

• Problem: how to improve I/O speeds from less-expensive storage

Page 7: Sequential approach

[Figure: sequential timeline; legend: Recognition Processing (Tp), Data Retrieval (Td)]
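As a rough illustration of why the sequential approach leaves the processor waiting, the sketch below adds up Td + Tp per image. The function name and the sample timings (100 images, 2.0 s retrieval and 0.5 s recognition each) are hypothetical and are not measurements from the talk.

```python
def sequential_times(retrieval_times, processing_times):
    """Return (total wall time, fraction of it spent waiting on retrieval)
    when each image is fetched and then recognized, one after another."""
    total_td = sum(retrieval_times)
    total_tp = sum(processing_times)
    total = total_td + total_tp
    return total, total_td / total


# Hypothetical timings: 100 images, Td = 2.0 s and Tp = 0.5 s each.
total, waiting = sequential_times([2.0] * 100, [0.5] * 100)
print(f"total = {total:.1f} s, waiting on I/O = {waiting:.0%}")
# prints: total = 250.0 s, waiting on I/O = 80%
```

With these assumed numbers the processor is idle for most of the run, which is the gap the pipeline approach tries to close.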

Page 8: Pipeline approach: single processor, multi-retriever

[Figure: pipeline timeline; legend: Recognition Processing (Tp), Data Retrieval (Td)]
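One way to picture the single-processor, multi-retriever pipeline is a small thread pool that fetches images while one consumer runs recognition. This is only a sketch under stated assumptions: `retrieve` and `recognize` are hypothetical stand-ins that sleep instead of touching tape, cloud, or a real model, and the slides do not prescribe this structure.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def retrieve(name, td):
    """Hypothetical stand-in for a slow fetch from tape or cloud."""
    time.sleep(td)
    return name, b"image-bytes"


def recognize(name, data, tp):
    """Hypothetical stand-in for running the recognition model."""
    time.sleep(tp)
    return f"{name}: recognized ({len(data)} bytes)"


def run_pipeline(names, n_retrievers, td, tp):
    """Fetch with n_retrievers threads; recognize on the calling thread."""
    results = []
    with ThreadPoolExecutor(max_workers=n_retrievers) as pool:
        # All retrievals are queued up front; at most n_retrievers run at once.
        futures = [pool.submit(retrieve, name, td) for name in names]
        # Consume in submission order so results line up with the input list.
        for future in futures:
            name, data = future.result()
            results.append(recognize(name, data, tp))
    return results
```

Calling `run_pipeline(["Abc1.png", "Abc2.png"], n_retrievers=2, td=0.2, tp=0.05)` overlaps the two fetches, so recognition of the first image can start while the second is still arriving. A real version would also bound how far retrieval runs ahead so the cache tier does not overflow.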

Page 9: Pipeline approach: single processor, multi-retriever

[Figure: pipeline timeline; legend: Recognition Processing (Tp), Data Retrieval (Td); label: Nr]

Ostensibly (ideally?), the number of retrieval threads is the ratio of retrieval time to processing time: Nr ~ Td/Tp

Challenges:
• Nonuniform processing times Tp
• Retrieval threads may correspond to tape drives (or may not!)
• Retrieval times not likely to be proportional to image size
  • initial Tp (initial tape load, etc.)
  • network traffic (cloud)
• Generally non-deterministic
• Limited by number of tape drives, network connections
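The ratio of retrieval time to processing time can be turned into a starting estimate for the number of retrieval threads. The sketch below is an assumption-laden illustration: it uses mean observed times to smooth over nonuniform samples, rounds up, and caps the result at a hypothetical `max_channels` (for example, the number of tape drives or network connections). None of these choices come from the slides.

```python
import math
from statistics import mean


def estimate_retrievers(retrieval_times, processing_times, max_channels):
    """Estimate Nr as ceil(mean Td / mean Tp), clamped to [1, max_channels]."""
    ratio = mean(retrieval_times) / mean(processing_times)
    return max(1, min(max_channels, math.ceil(ratio)))


# Hypothetical samples: retrieval about 2 s, recognition about 0.5 s.
print(estimate_retrievers([1.8, 2.2], [0.4, 0.6], max_channels=8))  # prints 4
print(estimate_retrievers([1.8, 2.2], [0.4, 0.6], max_channels=2))  # prints 2
```

Because retrieval times are generally non-deterministic, an estimate like this is at best a first guess to be adjusted as measurements come in.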

Page 10: Strategies

• Prefetch input datasets

• Use “close” storage as cache

• Communication between processing threads and prefetch threads
  • Use comm file, Spark shared RDD (IgniteRDD, IBM Conductor/shared RDD)

• Leverage schedulers (Lava, LSF, SGE, Conductor for Spark, etc.) to start prefetch job(s) before processing
• Need to manage “cache” capacity – use communication to purge
• Leverage Information Lifecycle functionality (Scale/GPFS, etc.)

Page 11: Simple Prefetch Approach

Simple data structure:

(Filename)   Status
Abc1.png     processed
Abc2.png     processed
Abc3.png     processing
Abc4.png     cache
Abc5.png     cache
Abc6.png     tape
Abc7.png     tape
…

Prefetch threads find the next input object still archived and initiate retrieval.

If/when the cache approaches being full, archive threads move objects to less-expensive storage.
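The status table and the two thread roles above might be sketched as follows. The state names `processed`, `processing`, `cache`, and `tape` come from the slide; everything else is an assumption made for illustration, including the extra `archived` state (so purged objects are not fetched again), the object-count capacity, and the 0.8 high-water mark.

```python
# States from the slide's table, plus a hypothetical "archived" state.
TAPE, CACHE, PROCESSING, PROCESSED = "tape", "cache", "processing", "processed"
ARCHIVED = "archived"  # assumption: processed and moved back to cheap storage


def next_to_prefetch(status):
    """Return the first object still on tape, or None if there is none."""
    for name, state in status.items():
        if state == TAPE:
            return name
    return None


def prefetch_one(status):
    """Mark the next archived object as cached; real code would start the fetch here."""
    name = next_to_prefetch(status)
    if name is not None:
        status[name] = CACHE
    return name


def cache_usage(status):
    """Count objects that currently occupy the cache tier."""
    return sum(1 for s in status.values() if s in (CACHE, PROCESSING, PROCESSED))


def archive_if_full(status, capacity, high_water=0.8):
    """When usage reaches the high-water mark, move processed objects out."""
    moved = []
    if cache_usage(status) >= high_water * capacity:
        for name, state in list(status.items()):
            if state == PROCESSED:
                status[name] = ARCHIVED
                moved.append(name)
    return moved
```

In a multi-threaded setting the table would need a lock or a shared store around these updates; the sketch leaves that out to keep the state transitions visible.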

Page 12: Implementation Approaches

• Use comm file
  • Use POSIX file-locking to allow communication

• Spark shared RDD (Apache Ignite, IBM Conductor-with-Spark)
  • Leverage shared memory approach
  • Faster communication
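For the comm-file option, a minimal sketch of POSIX file locking from Python could look like the following. It assumes the comm file is a small JSON object mapping file names to states, which is a format chosen here for illustration; the slides do not specify one. `fcntl.lockf` wraps the POSIX `fcntl()` record locks and is only available on Unix-like systems.

```python
import fcntl
import json
import os


def update_status(path, name, new_state):
    """Set one entry in the JSON comm file while holding an exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until other processes release
        try:
            raw = f.read()
            table = json.loads(raw) if raw else {}
            table[name] = new_state
            f.seek(0)
            f.truncate()  # drop the old contents before rewriting
            json.dump(table, f)
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return table


def read_status(path):
    """Read the comm file under a shared lock."""
    with open(path, "r") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            raw = f.read()
            return json.loads(raw) if raw else {}
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

POSIX record locks are held per process, so this coordinates separate prefetch and processing processes; threads inside one process would still need their own lock. The shared-RDD option avoids the file round trip entirely, which is the "faster communication" point on the slide.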

Page 13: Summary

• Tiered storage presents efficiencies and challenges
• Prefetching technology is well understood for “single tier” (file caching)
• Future investigation
  • Identify shared RDD template, syntax to better support multi-tier prefetch
  • Prefetch objects, files, and/or blocks (granularity challenges & efficiencies)
  • Application-triggered archival, post-processing (selective, read-only, etc.)