Page 1:

Accelerating Image Recognition Processing Using Tiered Storage and Spark

Isom Crawford Jr., Ph.D., IBM

David Chen, Ph.D., IBM

Page 2:

Spark Processing

• Open-source framework, improvements on Hadoop
• Popular for data analytics: text, imagery, etc.

Page 3:

Tiers of Storage

• Multiple “levels” of latency, bandwidth, and cost

[Figure: storage tiers labeled Processor; SSD; SAS, etc.; Sequential Media; Cloud]

Page 4:

Tiered Storage

• Organized and managed to move data being processed closer to processor
• And vice-versa: move data not being processed to less-expensive storage

[Figure: storage tiers labeled Processor; SSD; SAS, etc.; Sequential Media; Cloud]

Page 5:

Problem: Less-expensive storage I/O speeds

• Tiered storage
  • Various levels of I/O performance
  • Faster -> smaller
  • Larger -> slower

• Data (or Information) Lifecycle Management
  • Handles automatic migration between tiers, e.g., ILM policies
  • Eliminates manual retrieval

• But on-demand retrieval adds a lot of latency

Page 6:

Example: Image Recognition

• Consider Image Recognition
  • Once model is built, recognition is relatively fast
  • Faster than cloud or sequential media can feed data

• Problem: How to improve I/O speeds from less-expensive storage

Page 7:

Sequential approach

[Figure: timeline with Data Retrieval (Td) and Recognition Processing (Tp)]

Page 8:

Pipeline approach: single processor, multi-retriever

[Figure: timeline with Data Retrieval (Td) and Recognition Processing (Tp)]

Page 9:

Pipeline approach: single processor, multi-retriever

[Figure: timeline with Data Retrieval (Td) and Recognition Processing (Tp), Nr retrieval threads]

Ostensibly (ideally?), the number of retrieval threads is the ratio of retrieval time to processing time: Nr ~ Td/Tp

Challenges:
• Nonuniform processing times Tp
• Retrieval threads may correspond to tape drive (or may not!)
• Retrieval times not likely to be proportional to image size
  • Initial Td (initial tape load, etc.)
  • Network traffic (cloud)
• Generally non-deterministic
• Limited by number of tape drives, network connections
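The thread-count estimate on this slide (retrieval time divided by processing time) can be tried out numerically. The sketch below is a rough model only: it assumes constant Td and Tp, which the challenges listed on the slide say will not hold in practice, and retrievers staggered so that images arrive evenly. The function names and sample values are illustrative and do not come from the slides.

```python
import math


def retrieval_threads(td: float, tp: float) -> int:
    """Fewest retrievers that keep one processor busy: ceil(Td / Tp)."""
    return max(1, math.ceil(td / tp))


def sequential_time(n_images: int, td: float, tp: float) -> float:
    """Retrieve, then process, one image at a time."""
    return n_images * (td + tp)


def pipelined_time(n_images: int, td: float, tp: float, nr: int) -> float:
    """Idealized pipeline: nr staggered retrievers feed a single processor.

    The first image is ready after td. After that, images complete at the
    slower of the processing rate (one per tp) and the aggregate retrieval
    rate (one per td / nr).
    """
    step = max(tp, td / nr)
    return td + tp + (n_images - 1) * step


# Hypothetical timings: 2.0 s to retrieve an image, 0.5 s to recognize it.
nr = retrieval_threads(2.0, 0.5)
print(nr, sequential_time(100, 2.0, 0.5), pipelined_time(100, 2.0, 0.5, nr))
```

With these hypothetical values the model gives 4 retrieval threads, and 250.0 s sequentially against 52.0 s pipelined for 100 images. Adding retrievers beyond Td/Tp does not help in this model, because the single processor becomes the bottleneck.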

Page 10:

Strategies

• Prefetch input datasets

• Use “close” storage as cache

• Communication between processing threads and prefetch threads
  • Use comm file, Spark shared RDD (Ignite RDD, IBM Conductor/shared RDD)

• Leverage schedulers (Lava, LSF, SGE, Conductor for Spark, etc.) to start prefetch job(s) before processing
• Need to manage “cache” capacity: use communication to purge
• Leverage Information Lifecycle functionality (Scale/GPFS, etc.)
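The prefetch-plus-cache strategy can be illustrated with plain Python threads. This is a minimal sketch, not the authors' implementation: a bounded in-memory queue stands in for the "close" storage cache, and `fetch_from_archive` and `recognize` are hypothetical placeholders for the slow tier read and the recognition model.

```python
import queue
import threading
import time


def fetch_from_archive(name: str) -> bytes:
    """Hypothetical stand-in for a slow tape or cloud read."""
    time.sleep(0.01)
    return name.encode()


def recognize(data: bytes) -> str:
    """Hypothetical stand-in for the recognition model."""
    return data.decode().upper()


def run_pipeline(names, n_retrievers=4, cache_slots=8):
    """Process every name once, with prefetch threads running ahead."""
    todo = queue.Queue()
    for name in names:
        todo.put(name)
    # Bounded queue as the cache: put() blocks while the cache is full.
    cache = queue.Queue(maxsize=cache_slots)

    def retriever():
        while True:
            try:
                name = todo.get_nowait()
            except queue.Empty:
                return  # nothing left to prefetch
            cache.put((name, fetch_from_archive(name)))

    threads = [threading.Thread(target=retriever) for _ in range(n_retrievers)]
    for t in threads:
        t.start()

    results = {}
    for _ in range(len(names)):
        name, data = cache.get()  # taking an item frees a cache slot
        results[name] = recognize(data)
    for t in threads:
        t.join()
    return results


print(run_pipeline(["Abc1.png", "Abc2.png", "Abc3.png"], n_retrievers=2))
```

The blocking `put()` is what keeps the prefetchers from overrunning the cache; in a real deployment the same back-pressure would have to come from the communication channel between the prefetch and processing jobs.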

Page 11:

Simple Prefetch Approach

Simple data structure:

(Filename)   Status
Abc1.png     processed
Abc2.png     processed
Abc3.png     processing
Abc4.png     cache
Abc5.png     cache
Abc6.png     tape
Abc7.png     tape
…

Prefetch threads find next input object still archived, initiate retrieval

If/when cache approaches being full, archive threads move objects to less-expensive storage
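The status table and the two thread roles on this slide can be sketched as follows. This is one possible reading, not the authors' code: the slide does not say which objects the archive threads evict or what status they get afterwards, so the sketch assumes processed objects are evicted first and introduces a hypothetical `archived` status for them. The function names and the cache capacity are also illustrative.

```python
PROCESSED, PROCESSING, CACHE, TAPE = "processed", "processing", "cache", "tape"
ARCHIVED = "archived"  # assumption: not one of the statuses shown on the slide


def prefetch_step(table, cache_slots):
    """Mark the next still-archived input as cached if a slot is free."""
    in_cache = sum(s in (PROCESSED, PROCESSING, CACHE) for s in table.values())
    if in_cache >= cache_slots:
        return None  # cache full: wait for an archive step
    for name, status in table.items():  # dicts keep insertion (input) order
        if status == TAPE:
            table[name] = CACHE  # a real prefetch thread starts retrieval here
            return name
    return None


def archive_step(table):
    """Move the oldest processed object out of the cache."""
    for name, status in table.items():
        if status == PROCESSED:
            table[name] = ARCHIVED
            return name
    return None


table = {
    "Abc1.png": PROCESSED, "Abc2.png": PROCESSED, "Abc3.png": PROCESSING,
    "Abc4.png": CACHE, "Abc5.png": CACHE, "Abc6.png": TAPE, "Abc7.png": TAPE,
}
print(prefetch_step(table, cache_slots=5))  # None: all five slots are in use
print(archive_step(table))                  # Abc1.png
print(prefetch_step(table, cache_slots=5))  # Abc6.png
```

In a multi-threaded or multi-process setting, each step would need to run under a lock so that two prefetch threads do not claim the same object.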

Page 12:

Implementation Approaches

• Use comm file
  • Use POSIX file-locking to allow communication

• Spark shared RDD (Apache Ignite, IBM Conductor-with-Spark)
  • Leverage shared memory approach
  • Faster communication
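The comm-file option can be sketched with Python's standard `fcntl` module, which exposes POSIX advisory locks on Unix-like systems. The JSON layout, file name, and function names below are assumptions made for illustration; the slides do not specify a file format.

```python
import fcntl
import json
import os


def update_status(path, name, status):
    """Set one object's status in a JSON comm file under an exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            text = f.read()
            table = json.loads(text) if text else {}
            table[name] = status
            f.seek(0)
            f.truncate()
            json.dump(table, f)
            f.flush()  # write the data out before releasing the lock
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return table


def read_status(path):
    """Read the whole table under a shared lock."""
    with open(path, "r") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            return json.load(f)
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

A prefetch job might call `update_status("comm.json", "Abc6.png", "cache")` when a retrieval finishes, while the processing job polls `read_status("comm.json")`. These locks are advisory and held per process, so they coordinate separate jobs rather than threads inside one process.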

Page 13:

Summary

• Tiered storage presents efficiency and challenges
• Prefetching technology well-understood for “single tier” (file caching)
• Future investigation
  • Identify shared RDD template, syntax to better support multi-tier prefetch
  • Prefetch objects, files, and/or blocks (granularity challenges & efficiencies)
  • Application-triggered archival, post-processing (selective, read-only, etc.)

