RaVioli: A Parallel Video Processing Library with Auto Resolution Adjustability Hiroko SAKURAI † Masaomi OHNO † Shintaro OKADA ‡ Tomoaki TSUMURA † Hiroshi MATSUO † † Nagoya Institute of Technology, Japan ‡ Toyota Motor Corp., Japan IADIS International Conference APPLIED COMPUTING 2009 November 19 – 21, 2009 Rome, Italy
Background (1/2): Portability of Video Applications
• Real-time video processing applications
  – should run on a great variety of platforms
    • Cell phones
    • Cars
    • PCs
  – Principal goal of an application
    • Long battery life
    • High throughput
    • Good accuracy
Applied Computing 2009 2
We must rewrite a video processing program when porting it to another platform
Background (2/2): Many-Core Era is Coming
• Multi/many-core processors have come into wide use
• Video processing applications
  – have various parallelisms
    • Pixels in video frames have data parallelism
    • Multiple frames can be processed in parallel by pipelining
  – promise good performance on such parallel systems
Parallelizing programs is not so simple
Improving compilers and libraries becomes much more important
A Video Processing Library: RaVioli
• RaVioli provides:
  – Easy writeability of
    • pseudo real-time video processing
  – Interfaces for parallelization
    • Detecting data dependencies and formulating reductions
    • Balancing loads of pipeline stages
Outline
• Concept of RaVioli
  – RaVioli hides resolutions from programmers
  – Easy writeability of video processing applications
• Pseudo real-time processing by adjusting loads
• Semi-automatic parallelization functions
  – Automatic block decomposition
  – Pipelining interface with automatic load balance mechanism
• Evaluation results
Traditional Image Processing Program
• Image processing program written in traditional C
int sum = 0;
void pixSum(RV_Pixel p){
    sum += 1;
}
int main(){
    RV_Image InputImg;
    // read image data into "InputImg"
    InputImg.procPix(pixSum);
}
Reduction operation
• sum += 1: associative law? commutative law?
• associative law OK! commutative law OK!
• sum += 1; is split into _localsum += 1; and sum += _localsum;

Component function:
__thread int _localsum = 0;
_localsum += 1;

void __pixSum(int threadNum){
    mutex_lock(&Mutex);
    sum += _localsum;
    mutex_unlock(&Mutex);
}
InputImg.procPix(pixSum, 4);
inputImg.reduction(__pixSum);
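The split into a per-thread `_localsum` and a mutex-protected merge can be sketched in standard C++ as follows. This is only an illustration of the general technique, not RaVioli's generated code: `Pixels`, `countPixelsParallel`, and the chunking scheme are hypothetical names and choices made for this example, and `std::thread` stands in for the pthreads calls on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an image: a flat pixel buffer.
using Pixels = std::vector<unsigned char>;

// Counts pixels in parallel. Each thread accumulates into its own local
// counter and merges it into the shared total once, under a mutex.
long countPixelsParallel(const Pixels& pixels, unsigned numThreads) {
    assert(numThreads > 0);
    long sum = 0;         // shared result, like `sum` on the slide
    std::mutex sumMutex;  // protects `sum` during the merge step
    std::vector<std::thread> workers;

    // Split the pixel range into numThreads contiguous chunks.
    const std::size_t chunk = (pixels.size() + numThreads - 1) / numThreads;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin =
            std::min<std::size_t>(pixels.size(), static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min<std::size_t>(pixels.size(), begin + chunk);
        workers.emplace_back([&, begin, end] {
            long localSum = 0;  // plays the role of `_localsum`
            for (std::size_t i = begin; i < end; ++i) {
                localSum += 1;  // body of the component function
            }
            std::lock_guard<std::mutex> lock(sumMutex);
            sum += localSum;    // the reduction step
        });
    }
    for (auto& w : workers) w.join();
    return sum;
}
```

Because addition is associative and commutative, the order in which threads merge their partial counts does not change the result, which is the property the slide checks before applying the transformation.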
Outline
• Concept of RaVioli
  – RaVioli hides resolutions from programmers
  – Easy writeability of video processing applications
• Pseudo real-time processing by adjusting loads
• Semi-automatic parallelization functions
  – Automatic block decomposition
  – Pipelining interface with automatic load balance mechanism
• Evaluation results of our work
Assisting Pipeline Implementation
• For building a pipeline
  – The whole process is split into several stages
  – Several threads are created and assigned to the stages
  – FIFOs must be implemented and managed for data transfer between stages
[Figure: pipeline of stages binarize, edgedetect, houghtrans run by thread1, thread2, thread3, with FIFO1, FIFO2, FIFO3 carrying data between them]
Creating threads and FIFOs
• is not the essence of video processing
• is troublesome for programmers
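To make the amount of boilerplate concrete, here is a minimal hand-written two-stage pipeline in standard C++17. It is a sketch under assumptions, not RaVioli code: `Fifo`, `runTwoStagePipeline`, and the use of plain `int` values as "frames" are all hypothetical choices for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking FIFO. close() signals that no more items will arrive.
template <typename T>
class Fifo {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        ready_.notify_one();
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until an item is available; returns nullopt once closed and empty.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// Runs each frame through stage1 then stage2, with one thread per stage.
std::vector<int> runTwoStagePipeline(const std::vector<int>& frames,
                                     int (*stage1)(int), int (*stage2)(int)) {
    assert(stage1 && stage2);
    Fifo<int> fifo1, fifo2, out;
    std::thread t1([&] {
        while (auto f = fifo1.pop()) fifo2.push(stage1(*f));
        fifo2.close();  // propagate end-of-stream to the next stage
    });
    std::thread t2([&] {
        while (auto f = fifo2.pop()) out.push(stage2(*f));
        out.close();
    });
    for (int f : frames) fifo1.push(f);
    fifo1.close();
    t1.join();
    t2.join();
    std::vector<int> results;
    while (auto r = out.pop()) results.push_back(*r);
    return results;
}
```

Most of the code above is queue and thread management rather than image processing, which is the overhead a pipelining interface is meant to hide.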
Interface for Pipelining
RV_Pipedata* GrayScale(RV_Pipedata* data){
    // Grayscale processing for a frame
    return data;
}
RV_Pipedata* Laplacian(RV_Pipedata* data){
    // Laplacian filter processing for a frame
    return data;
}
int main(){
    RV_Pipeline pipe;
    pipe.push(GrayScale);
    pipe.push(Laplacian);
    pipe.run();
    return 0;
}
[Figure: RV_Pipeline pipe. push registers GrayScale and Laplacian as stages; run executes them on thread1 and thread2, linked by FIFO1 and FIFO2]
Load Imbalance between Stages
[Figure: timeline of stages A, B, C on thread1, thread2, thread3 for frame1 to frame3; the pipeline stalls]
Automatic Load Balancing
[Figure: stages A, B, C on thread1, thread2, thread3 for frame1 to frame3; stage B moves to thread1 and stage C is assigned to both thread2 and thread3]
Automatic Load Balancing
[Figure: after balancing, thread1 runs stages A and B while thread2 and thread3 both run stage C, for frame1 to frame3]
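The effect of the rebalanced assignment can be checked with a little arithmetic. The sketch below assumes a simple steady-state model in which the slowest stage group sets the pace and a group with several threads processes different frames concurrently; the type `StageGroup`, the function `frameInterval`, and the stage costs used in the usage note are hypothetical and do not come from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One group of consecutive stages that shares a pool of threads.
struct StageGroup {
    double costPerFrame;  // total time the group's stages need for one frame
    int threads;          // threads serving this group, one frame each at a time
};

// Steady-state time between finished frames: the slowest group sets the pace.
double frameInterval(const std::vector<StageGroup>& groups) {
    double worst = 0.0;
    for (const StageGroup& g : groups) {
        assert(g.threads > 0);
        worst = std::max(worst, g.costPerFrame / g.threads);
    }
    return worst;
}
```

For example, with made-up costs A = 1, B = 1, C = 4 time units, one thread per stage gives `frameInterval({{1.0, 1}, {1.0, 1}, {4.0, 1}})`, an interval of 4 units per frame. Merging A and B onto one thread and giving C two threads gives `frameInterval({{2.0, 1}, {4.0, 2}})`, an interval of 2 units, with the same three threads.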
Outline
• Concept of RaVioli
  – RaVioli hides resolutions from programmers
  – Easy writeability of video processing applications
• Pseudo real-time processing by adjusting loads
• Semi-automatic parallelization functions
  – Automatic parallelization with block decomposition
  – Pipelining interface with automatic load balance mechanism
• Evaluation results of our work
OS                                  Solaris 10
CPU                                 UltraSPARC T1
Frequency                           1.0 GHz
Number of cores                     8
Number of active threads per core   4
Memory                              16 GB
Compiler                            Sun Studio 12 (Sun C++ 5.9)
Compiler options                    -fast -m64 -xchip=ultraT1
Thread library                      pthreads
Conclusion
• RaVioli
  – hides resolutions from programmers
    • pseudo real-time processing
  – has semi-automatic parallelization functions
    • semi-automatic block decomposition
    • load balancing mechanism between pipeline stages
• Our future work
  – implementing an automatic power-saving function in RaVioli
  – making RaVioli adaptable to various platforms such as Cell Broadband Engine
  – designing an easy-to-write language that cooperates with RaVioli