3-Tier Dynamically Adaptive Power-Aware Motion Estimator for H.264/AVC Video Encoding

Muhammad Shafique, Lars Bauer, and Jörg Henkel
University of Karlsruhe, Chair for Embedded Systems, Karlsruhe, Germany
{shafique, lars.bauer, henkel}@informatik.uni-karlsruhe.de
ABSTRACT

The limitation of energy in portable communication/entertainment devices necessitates the reduction of video encoding complexity. The H.264/AVC video coding standard is one of the latest video codecs and features a complex Motion Estimation scheme that accounts for a major part of the encoder energy [2]. We therefore present a power-aware Motion Estimator for H.264 that adapts at run time according to the available energy level. We perform a set of adaptations at different Processing Stages of Motion Estimation. Our results show that for CIF videos (typically used in portable devices, though our approach is equally applicable to other video resolutions), we achieve an average energy reduction of 52 times and 27 times as compared to UMHexagonS [12] and EPZS [14], respectively. This energy saving comes at the cost of an average loss of only 0.39 dB in Peak Signal to Noise Ratio (PSNR, an objective quality measure) and a 23% increase in area (synthesized for 90 nm technology).

Categories and Subject Descriptors: B.7.1 [INTEGRATED CIRCUITS]: Types and Design Styles - Algorithms implemented in hardware

General Terms: Performance, Design, Algorithms

Keywords: H.264, MPEG-4 AVC, Motion Estimation, low power, low complexity, run-time adaptation, early termination, search pattern

1. INTRODUCTION AND BASIC IDEA

In today's mobile devices, energy/power is a critical design parameter, and multimedia is a major application domain. Video encoding consumes a significant amount of processing time and energy, and the encoding effort depends heavily upon the characteristics of the input video sequence and the target bit rates. Under changing scenarios of input-data characteristics and available energy budgets, embedded solutions for video encoding require run-time adaptivity.
H.264/MPEG-4 AVC [1] from the Joint Video Team (JVT) of the ITU-T VCEG and ISO/IEC MPEG is one of the latest video coding standards. It provides a bit rate reduction of 50% as compared to MPEG-2 with similar subjective visual quality [10], but at the cost of additional computational complexity (~10x relative to MPEG-4 simple profile encoding [11]). This is mainly due to the enhanced Motion Estimation (ME) comprising variable block size ME, sub-pixel ME up to quarter-pixel accuracy, and multiple reference frames. As a result, the ME process may consume up to 60% (1 reference frame) or 80% (5 reference frames) of the total encoding time [12]. The high computational load makes the ME module not only time consuming but also power demanding [2].
The available energy budgets may change according to various application scenarios on mobile devices. Although there has been extensive research on low-power ME [3, 4, 16], related work does not provide run-time adaptivity. In particular, state-of-the-art low-power ME approaches focus either on modifying the Sum of Absolute Differences (SAD) formula [3, 4] or on low-power hardware implementations [16]. [5] exploits input-data variations to dynamically configure the search-window size of Full Search but does not consider the changing levels of energy/power. Moreover, it targets Full Search, which is far more energy-consuming than state-of-the-art Motion Estimators. Summarizing, state-of-the-art low-power Motion Estimators offer a fixed low-power solution that may work well for short-duration encodings of low-resolution images with low motion characteristics. These approaches do not perform well in the case of rapid motions and long-duration real-time encodings, and are ultimately less energy/power efficient. Varying motion types and the changing status of available energy budgets stimulate the need for a run-time adaptive, power-aware Motion Estimator that exhibits minimal loss in video quality (PSNR).
In this paper, we propose a novel run-time adaptive, power-aware Motion Estimator to cope with the changing availability of energy at run time while maintaining good video quality. We achieve this by means of multiple Operational Levels, between which our novel Motion Estimator dynamically switches. Each level comprises multiple ME stages (as we will see in Section 3), where each stage is itself run-time adaptive to save energy using motion field characteristics and power-aware termination criteria (see Section 4). Besides the three Operational Levels of ME, we propose a set of Conditions and Equations to activate run-time adaptations and level switching under changing energy/power and performance scenarios. We have benchmarked the hardware implementation of our Motion Estimator against four state-of-the-art Motion Estimators for various test video sequences. We will now present a simple example (corroborated with results) to show how we reduce the energy requirements of ME.
Test Conditions: We consider a mobile device scenario where an H.264 video encoder encodes the 30 fps Quarter Common Intermediate Format (QCIF) Carphone [8] test video at 128 kbps. We have used the Orinoco tool [9] and an available 90 nm technology library (frequency: 150 MHz) for energy estimation.
Figure 1: Switch Search Pattern from UMHexagonS to TSS
Example: Reduction in the total number of computed SADs: The Sum of Absolute Differences (SAD) is the basic processing block in the ME process and is used to calculate the block distortion for a current Macroblock (MB) w.r.t. a reference MB. For one MB in the current frame (Ft), various SADs are computed using MBs from the reference frame (e.g. the immediately previous frame Ft-1). The higher the number of SADs/MB, the higher the energy consumption. Figure 1 shows the implementation of UMHexagonS [12] and Three-Step Search (TSS) [15]. For the above test conditions, UMHexagonS requires on average 45 SADs/MB. Once the available energy drops below a particular level, we may switch the search pattern from UMHexagonS to TSS, which is relatively lightweight and requires only 25 SADs/MB. Table 1 shows that this switching of the search pattern reduces the total energy from 98.15 µWs to 54.20 µWs for one QCIF video frame.
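To make the cost of each candidate concrete, here is a minimal SAD routine in plain Python. The frame layout (2-D lists of luma samples) and the helper name `sad` are our own illustrative choices, not the paper's hardware unit:

```python
MB = 16  # macroblock edge length in pixels (H.264 luma MB)

def sad(cur_frame, ref_frame, cur_x, cur_y, ref_x, ref_y, size=MB):
    """Sum of Absolute Differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        cur_row = cur_frame[cur_y + dy]
        ref_row = ref_frame[ref_y + dy]
        for dx in range(size):
            total += abs(cur_row[cur_x + dx] - ref_row[ref_x + dx])
    return total

# Toy check: a flat block versus one brightened by 1 differs by 1 at each
# of the 256 pixel positions.
cur = [[10] * MB for _ in range(MB)]
ref = [[11] * MB for _ in range(MB)]
print(sad(cur, ref, 0, 0, 0, 0))  # 256
```

Every candidate position evaluated by a search pattern pays this full 256-pixel cost, which is why dropping from 45 to 25 SADs/MB translates almost directly into energy savings.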
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED'08, August 11-13, 2008, Bangalore, India. Copyright 2008 ACM 978-1-60558-109-5/08/08...$5.00.
Recapitulating, our novel multi-level Motion Estimator, which incorporates power-aware adaptations at different Processing Stages of the ME process (Sections 3 and 4), is crucial. We will show an average energy reduction of 52 times and 27 times as compared to UMHexagonS [12] and EPZS [14], respectively, at the cost of an average loss of 0.39 dB in PSNR (Section 5). Our novel contributions:
• A 3-tier low-power Motion Estimator that adapts at run time depending upon the changing scenarios of the available energy budget, motion field characteristics, and performance (Sections 3 and 4).
• A set of Conditions and Equations, along with the Triggering Parameters, for run-time Adaptations at each Processing Stage of the Motion Estimator (Section 4).
• A set of Search Patterns (Octagonal-Star and Polygon), power-aware dynamically adaptive Thresholds for early termination, Temporal Median Predictors, and an AltGroup4AltRow pixel decimation pattern for SAD computation (Section 3).
In Section 5, we present the hardware of our novel Motion Estimator and benchmark it for energy/performance/area against four state-of-the-art Motion Estimators using various video sequences. We conclude in Section 6.

2. RELATED WORK

Many fast block-matching algorithms have focused on search strategies with predictor sets, early termination criteria, and search patterns, such as Three-Step Search (TSS) [15], Unsymmetrical-cross Multi-Hexagon-grid Search (UMHexagonS) [12], simple UMHexagonS [13], and Enhanced Predictive Zonal Search (EPZS) [14]. However, these algorithms only consider performance and visual quality, not energy/power, and compute a large number of SADs/MB for Motion Estimation.
[16, 17] provide various VLSI implementations to expedite the process of H.264 Motion Estimation. Most of these hardware implementations are suited either for Full Search or for UMHexagonS. An ASIP-based approach is considered in [18], but it uses only spatial predictors, does not consider the temporal information of the motion field, and takes longer to find the optimal MV in the case of heavy angular motions. [19] explores fixed and random sub-sampling patterns for computation reduction in the case of Global Motion Estimation. [6] focuses on reducing the power by eliminating candidates using partial distortion sorting. Partial distortion sorting is itself an overhead, and it excludes only a subset of candidates from Full Search, which still leaves a much larger number of candidates. [4] presents a Motion Estimation scheme based on algorithmic noise tolerance. It uses an estimator based on input sub-sampling, but this approach results in degradation, especially for videos with small objects.
Our work differs from the previous approaches in several aspects: In contrast to [3-6], our Motion Estimator handles the energy/power issue using three dynamically selectable Operational Levels, each with a different energy consumption and a different visual quality. This makes our Motion Estimator adaptive for both energy/power and performance, which has not been targeted by others before. Unlike [18], our Octagonal-Star Search pattern captures large angular/irregular motions, and our Polygon Search pattern refines the motion search in the close vicinity. Unlike [12, 14], our threshold scheme for early termination is run-time adaptive for both energy/power and performance.
3. OUR POWER-AWARE MOTION ESTIMATOR
Figure 2: Overview of our Proposed Scheme of Run-Time Power-Aware Motion Estimator

We have performed a detailed energy analysis of state-of-the-art Motion Estimators. We found that these Motion Estimators (although providing a good video quality) do not consider changing energy budgets. Our proposed 3-tier Motion Estimator has 3 Operational Levels, each composed of multiple Processing Stages. Figure 2 presents the methodology of our run-time adaptive Motion Estimator considering the available energy budget and performance constraints. A set of parameters is passed to the Adaptation block. Our Motion Estimator tracks the motion field characteristics (Monitoring block) and passes this information as feedback to the Trigger block. The Trigger block gathers energy/power, motion field, and user control information and forwards it to the Adaptation block. Section 4 sheds more light on the Trigger, Monitoring, and Adaptation blocks with a state transition diagram of the Operational Levels. We now explain the complete flow of our proposed Motion Estimator and the constituent Processing Stages of the three Operational Levels.

3.1 Flow of the Proposed Motion Estimator

1.  // For Each Video Frame
2.  powerLevel = getBatteryPower();
3.  LEVEL = setMELevel(powerLevel, bitRate); // Figure 7
4.  setSearchRange&SAD(LEVEL); // Section 4.2
5.  Thresholdpred = setThresholdspred(LEVEL); // Section 4.5
6.  Thresholdpattern = setThresholdspattern(LEVEL); // Section 4.5
7.  IntegerPixelMotionEstimation:
8.  INPUT: currentMB, referenceMB, referenceFrame, blockType, bitRate, powerLevel, SearchRange
9.  OUTPUT: MVbest, SADbest
10. BEGIN
11. For all Macroblocks in the Video Frame
12.   Predictors = selectPredictorSet(LEVEL); // Figure 6
13.   // Check the Predictor Conditions for motion-dependent adaptations
14.   RemainingPreds = checkPredConds(Predictors); // Figure 9
15.   PerformSearch(RemainingPreds, Thresholdpred);
16.   // Choose and process an appropriate Search Pattern depending upon the available energy budget
17.   For all Patterns in the selected LEVEL
18.     PerformSearch(pattern, Thresholdpattern); // Figure 5
19.   EndFor
20.   Store SADbest and MVbest obtained from Lines 13-18;
21. EndFor
22. END
Figure 3: Complete Flow of our proposed Motion Estimator

Figure 3 shows the flow of our Motion Estimator. ME is performed at frame level [20]. Before starting ME, the Level is determined and initialized (Lines 2-6). Inside the ME loop, the information of the motion field is used for adaptation at Macroblock level (Lines 12 and 13). Afterwards, the selected predictors and search patterns are processed depending upon the chosen Operational Level (Lines 14-18). At each step (Lines 14 and 17), the corresponding Thresholds are checked for early termination.
Figure 4: Pixel-Decimation Patterns for SAD Computation
3.2.1 Pixel Decimation

A technique that we use to reduce the energy of Motion Estimation (ME) is block matching with pixel decimation. Normally, the matching criterion (SAD) is evaluated using every pixel of the MB. Figure 4 shows the four decimation patterns considered for evaluation. AltPixel and AltGroup4 are beneficial for high-textured videos and reduce the energy, but they are not very beneficial for cache-based architectures because the skipped data is already in the cache. On the other hand, AltRow and AltGroup4AltRow are beneficial for cache-based architectures, as they skip row by row. Therefore, we have chosen AltGroup4 for Level-2 to counter the issue of heavy-textured videos, and AltGroup4AltRow for Level-3 to obtain a significant energy reduction even for cache-based architectures. These patterns scale down accordingly for the different block modes in H.264.

3.2.2 Initial Search Point Prediction

Initial Search Point Prediction provides a good starting point to converge quickly to the near-optimal Motion Vector (MV). Based on the assumption that the blocks of an object move together in the direction of the object motion, spatial and temporal neighbors are considered good predictor candidates. Therefore, the MV of the current MB is highly correlated with the MVs of the spatially and temporally adjacent MBs that have been previously calculated. Our predictor set is: PredictorsSpatial = {MVzero, MVLeft, MVTop, MVTop-Left, MVTop-Right, Medianspatial (eq. 1)}, PredictorsTemporal = {MVcollocated, Mediantemporal1 (eq. 2), Mediantemporal2 (eq. 3)}. MVcollocated is the MV of the collocated MB in the previous frame (Ft-1).

Medianspatial = median(MVL, MVT, MVTR)Ft (1)
Mediantemporal1 = median(MVL, MVT, MVTR)Ft-1 (2)
Mediantemporal2 = median(MVR, MVD, MVDR)Ft-1 (3)

3.2.3 Search Patterns
Figure 5: Four Search Patterns used in our Motion Estimator

The search starts by choosing the best MV from the predictor set as the search center and evaluates the following search patterns:
a) Octagonal-Star Search: This pattern consists of 20 search points and handles large irregular motion cases. Figure 5 shows an Octagonal-Star Search pattern executing at a distance of 8 pixels from the search center. For small-to-medium motion and CIF/QCIF video sequences, only one Octagonal-Star Search pattern is processed. For heavier motions or higher resolutions (D1, HD), multiple Octagonal-Star Search patterns may be executed (each extended by a pixel distance of 12) until the minimum SAD is less than or equal to Thresholdpattern. The MV with the minimum SAD in this step is chosen as the search center of the next search pattern.
b) Polygon Search: This pattern consists of 10 search points and narrows the search space after the processing of the Octagonal-Star Search pattern. Figure 5 shows both the full and the sparse Polygon Search. This pattern favors horizontal motion over vertical motion, because in typical video scenes horizontal motion is dominant. Multiple patterns may be used, and the search stops when the minimum SAD of the current Polygon is unchanged. The MV with the minimum SAD serves as the center for the next search pattern.
c) Small Diamond Search: At the end, a 4-point Diamond Search pattern is applied to refine the motion search.
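The final Small Diamond refinement (step c) can be sketched as a generic 4-point diamond descent. Here `cost` stands in for the SAD of a candidate MV; the function name and the termination rule (stop when the centre itself is the minimum) are a standard formulation, and the exact Octagonal-Star/Polygon geometries of Figure 5 are not reproduced:

```python
# 4-point diamond around the current centre, plus the centre itself.
DIAMOND = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

def small_diamond_search(center, cost):
    """Move the centre to the cheapest diamond point until it is a local minimum."""
    while True:
        candidates = [(center[0] + dx, center[1] + dy) for dx, dy in DIAMOND]
        best = min(candidates, key=cost)
        if best == center:
            return center
        center = best

# With a toy cost (Manhattan distance to the true MV), the search walks
# from the predictor (0, 0) to the optimum (3, -2).
print(small_diamond_search((0, 0), lambda p: abs(p[0] - 3) + abs(p[1] + 2)))  # (3, -2)
```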
3.2.4 Stopping Criteria

Our thresholds are run-time adaptive for both power and performance. The search stops if the current SADbest (i.e. the minimum SAD) is smaller than a threshold. Early termination results in energy saving, but care must be taken to avoid false termination. We have two different thresholds: Thresholdpred (eq. 4) for predictors and Thresholdpattern (eq. 5) for patterns.

Thresholdpred = SADstationaryMB * (1+δ) * αpower (4)
Thresholdpattern = SADstationaryMB * (1+γ) * αpower (5)

SADstationaryMB (see Footnote 1) thereby is the SAD for a static MB (i.e. with zero motion). αpower (eqs. 4 and 5) is the power-scaling factor that provides a tradeoff between reconstructed video quality and energy consumption. δ and γ are modulation factors that provide a tradeoff between reconstructed video quality and search speed. The values of αpower, δ, and γ are determined dynamically (see Section 4.5).

3.3 Operational Levels of the Motion Estimator

Combining the above ME Processing Stages (Section 3.2), our Motion Estimator executes at three different Operational Levels. Each level differs in its configuration of these Processing Stages. Our proposed approach is orthogonal to the number of Operational Levels. However, we have chosen three levels as a compromise between the discontinuity of the motion field (affecting the adaptation conditions; see Figure 9) and the frequency of level switching. As each level offers a different visual quality, more levels would cause frequent changes in the reconstructed visual quality that may be visually uncomfortable for the user. Figure 6 shows the composition of our three Operational Levels. Level-1 targets high quality and operates when more energy is available. It utilizes the complete set of predictors and three search patterns. Level-2 excludes spatial neighbors from the predictor set and Octagonal-Star from the search patterns; thus it consumes less energy than Level-1. Level-3 consumes the least energy of all three. It uses only the most probable predictors, followed by Sparse Polygon and Small Diamond search patterns for final refinement.
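Plugging in the constants reported in Section 5 (ThresholdP1 = 0.6, ThresholdP2 = 0.3, w = 7), eqs. (4)-(5) together with the αpower procedure of Figure 10 can be sketched in Python. The function names are our own, and the level-to-αpower mapping follows Figure 10 under these assumed constants:

```python
THRESHOLD_P1, THRESHOLD_P2, W = 0.6, 0.3, 7  # values reported in Section 5

def alpha_power(level):
    """Level-dependent power-scaling factor, following Figure 10."""
    a1 = 1.0
    a2 = a1 + (THRESHOLD_P1 - THRESHOLD_P2)
    a3 = a2 + W * THRESHOLD_P2
    return {1: a1, 2: a2, 3: a3}[level]

def threshold_pred(sad_stationary, delta, level):
    """eq. (4): early-termination threshold for the predictor stage."""
    return sad_stationary * (1 + delta) * alpha_power(level)

def threshold_pattern(sad_stationary, gamma, level):
    """eq. (5): early-termination threshold for the pattern stage."""
    return sad_stationary * (1 + gamma) * alpha_power(level)

# Lower-energy levels get a larger alpha_power, i.e. looser thresholds and
# earlier termination at some PSNR cost.
print([round(alpha_power(l), 2) for l in (1, 2, 3)])  # [1.0, 1.3, 3.4]
```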
Figure 6: Three Operational Levels of our Novel Run-Time Adaptive Motion Estimator

4. RUN-TIME ADAPTATIONS FOR LOW-POWER AND IMPROVED PERFORMANCE

This section sheds light on the flow of Triggering Parameters and Adaptations, which corresponds directly to the Trigger and Adaptation blocks in Figure 2. Figure 7 shows the transitions between the three states (i.e. Operational Levels) of the proposed Motion Estimator. Four constants (x, y, a, b) are used for hysteresis to avoid oscillation. The values of these constants may be selected by the application designer depending upon the application requirements. Frequent switching between these states is undesirable, since it causes rapid variations in the reconstructed video quality, which is visually uncomfortable for the user. Moreover, oscillation results in a random motion field, which disturbs the behavior of the motion-dependent adaptations (see Figure 9).

Footnote 1: SADstationaryMB is calculated in a similar way as mentioned in [12].
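The hysteresis idea behind Figure 7 can be illustrated with a small controller. The boundary values reuse ThresholdP1 = 0.6 and ThresholdP2 = 0.3 from Section 5 only as plausible placeholders, and a single `margin` stands in for the paper's four constants (x, y, a, b), so this is a simplified sketch rather than the exact state machine:

```python
class LevelController:
    """Switch between Operational Levels with a hysteresis band to avoid oscillation."""

    def __init__(self, hi=0.6, lo=0.3, margin=0.05):
        self.hi, self.lo, self.margin = hi, lo, margin
        self.level = 1  # start at the high-quality level

    def update(self, power):
        """power: available energy, normalized to [0, 1]."""
        if self.level == 1 and power < self.hi - self.margin:
            self.level = 2
        elif self.level == 2:
            if power > self.hi + self.margin:
                self.level = 1
            elif power < self.lo - self.margin:
                self.level = 3
        elif self.level == 3 and power > self.lo + self.margin:
            self.level = 2
        return self.level

ctl = LevelController()
print(ctl.update(0.50))  # 2: power fell clearly below the upper boundary
print(ctl.update(0.58))  # 2: inside the hysteresis band, no switch back
print(ctl.update(0.70))  # 1: clearly above the band, back to high quality
```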
Figure 7: State Diagram showing the transitions between the three Operational Levels of our Proposed Motion Estimator, depending upon the Triggers and Constraints
4.1 Triggers

In addition to Run-Time Monitored Parameters, the proposed approach also facilitates User-Defined Controls for enhanced flexibility. Typical user controls are Target Bit Rate (kbps), Target Frame Rate (fps), Video Resolution (e.g. CIF or QCIF), and Quality Level. The values of these parameters may change at any time during the execution of the application. However, these controls are activated from the next frame onwards, as we avoid major adaptations during the ME process to preserve the motion field. Additionally, we monitor a set of parameters, e.g. available power, Quantization Parameter (QP) value, and motion field characteristics (MVs and SADs of the current and previous frame). Other parameters may also be considered, e.g. video frame characteristics (brightness, texture, edge map, etc.), to predict the motion behavior before estimation. For example, if the video frame is less textured, then Level-2 is preferred over Level-1. Available power is monitored by a system call after a (configurable) period. Motion field characteristics and QP values are used for adaptation at frame level. We now explain the different adaptations (Sections 4.2-4.5) of the Processing Stages that compose an Operational Level of the Motion Estimator appropriate for the current system conditions.

4.2 Search Range and SAD Adaptation

The value of the Search Range depends upon the resolution of the video sequence and the type of motion (low or high). It determines the number of candidates for SAD calculation. Figure 8 shows the run-time adaptation of the Search Range (Lines 2-6). We put a lower limit of 16 on the Search Range to avoid high prediction errors (caused by insufficient ME).

1.  setSearchRange&SAD(LEVEL)
2.  If (LEVEL != 1 and SearchRange > 16) Then
3.    SearchRange >>= 1;
4.  ElseIf (SearchRange < 64) Then
5.    SearchRange <<= 1;
6.  EndIf
7.  Switch (LEVEL)
8.  Case Level-1: SAD = SAD_Orig;
9.  Case Level-2: SAD = SAD_AltGroup4;
10. Case Level-3: SAD = SAD_AltGroup4AltRow;
11. End Switch
Figure 8: Power-Aware Search Range and SAD Selection

For one 16x16 SAD computation, we need 256 subtractions, 256 absolute operations, and 255 additions, along with loading 256 current and 256 reference Macroblock pixels from memory. In order to save energy for one SAD computation (reducing memory transfers and computation), we exclude some pixels from the SAD computation when the available energy is low. In Section 3.2.1, we introduced several pixel decimation patterns for the SAD calculation. The AltPixel, AltGroup4, and AltRow patterns (Figure 4) reduce the number of pixels for SAD computation by a factor of 2, while AltGroup4AltRow reduces it by a factor of 4, which directly corresponds to an energy reduction (mainly due to reduced memory transfers). The selection of a particular pattern depends highly upon the video scene characteristics. Figure 8 shows the power-aware selection of the SAD Decimation Pattern (Lines 7-11).

4.3 Reduction of the Predictor Set

A large set of predictors comes at the cost of higher energy spent on SAD calculation. For low energy/power (i.e. Level-2 and Level-3), we cut down the set of predictors, giving more importance to the highly probable ones. Figure 6 shows the set of spatial and temporal predictors chosen for the different Operational Levels. Figure 9 shows the conditions under which the predictor set enables early termination to save energy, depending upon the motion field characteristics. The motion field changes with the properties of the input video sequence and thus drives adaptation at run time.
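The savings selected in Lines 7-11 of Figure 8 follow directly from how many pixels each decimation mask keeps. Below, `use_pixel` encodes illustrative versions of the Figure 4 masks; the exact AltGroup4AltRow geometry (alternating 4-pixel groups on every other row) is our assumption from the pattern names, not copied from the figure:

```python
MB = 16  # macroblock edge length in pixels

def use_pixel(pattern, x, y):
    """Illustrative decimation masks, assumed from the Figure 4 pattern names."""
    if pattern == "Orig":
        return True
    if pattern == "AltRow":            # every other row -> 1/2 of the pixels
        return y % 2 == 0
    if pattern == "AltGroup4AltRow":   # alternating 4-pixel groups on every
        return y % 2 == 0 and (x // 4) % 2 == 0  # other row -> 1/4 of the pixels
    raise ValueError(pattern)

def sad_decimated(cur, ref, pattern):
    """SAD over only the pixels kept by the decimation mask."""
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(MB) for x in range(MB)
               if use_pixel(pattern, x, y))

cur = [[10] * MB for _ in range(MB)]
ref = [[11] * MB for _ in range(MB)]
print(sad_decimated(cur, ref, "Orig"))             # 256 pixel differences
print(sad_decimated(cur, ref, "AltRow"))           # 128: half the loads and ops
print(sad_decimated(cur, ref, "AltGroup4AltRow"))  # 64: a quarter
```

Because the dominant cost is the memory traffic for the current and reference pixels, the 2x and 4x reductions in evaluated pixels map almost directly onto the energy reductions described above.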
Figure 9: Conditions for Motion-Dependent Adaptations
4.4 Pattern Switching

Switching to a simpler search pattern reduces not only the energy consumption but also the computational complexity, as it limits the region where an MV may be found. For small to medium motions, the impact on visual quality is not noticeable (as we will see in Section 5.1), but for heavier motions there may be a slight degradation in the visual quality (depending upon the input video). However, given the low energy/power status, a user may accept a slight degradation in the visual quality.

4.5 Modifying the Stopping Criterion

In Section 3.2.4 we introduced the three factors αpower, δ, and γ for run-time adaptation (eqs. 4 and 5) of the threshold calculation for early termination of the motion search. Figure 10 shows the procedure for setting the value of the power factor αpower. ThresholdP1 and ThresholdP2 are two values normalized w.r.t. the maximum available energy, and w is a user-defined weighting factor that amplifies the threshold values to increase the number of forceful terminations, thus saving energy at the cost of a slight degradation in the visual quality (PSNR).

1. setAlpha4Thresholds(LEVEL)
2. Switch (LEVEL)
3. Case Level-1: αpower_L1 = 1.0; break;
4. Case Level-2: αpower_L2 = αpower_L1 + (ThresholdP1 - ThresholdP2); break;
5. Case Level-3: αpower_L3 = αpower_L2 + w * ThresholdP2; break;
6. End Switch
Figure 10: Updating Thresholds for the Stopping Criterion

Modulation factors δ and γ provide a trade-off between reconstructed video quality and search speed. The initial values of δ and γ are determined by the Quantization Parameter (QP; see eqs. 6 and 7): the larger the value of QP, the larger the values of δ and γ.
This is due to the following reason: when the quantization step is larger, the quantization noise is also larger and most of the details of the reconstructed image are lost. In this case, the difference between the best matching block and a sub-optimal matching block becomes blurred. c1 and c2 are user-defined weights that control the effect of a change in the QP value. If the ME fails to meet its timeline, i.e. encoding at the targeted frame rate (fps), then the values of δ and γ are increased at the cost of a loss in PSNR (eqs. 8 and 9).

δ += c1 * ((QP > 20) ? (QP - 20) : 0)   (6)
γ += c2 * ((QP > 20) ? (QP - 20) : 0)   (7)
δ += (fps_target - fps_achieved) / (2 * fps_target)   (8)
γ += (fps_target - fps_achieved) / fps_target   (9)
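The updates of Figure 10 and eqs. 6-9 can be combined into a small runnable sketch. The constant values are the test conditions from Section 5, and treating eqs. 8 and 9 as applying only when the achieved frame rate falls short of the target is our reading of the text.

```python
def set_alpha_for_level(level, tp1=0.6, tp2=0.3, w=7):
    """alpha_power per Operational Level (Figure 10).

    Default ThresholdP1/ThresholdP2/w values are the test conditions
    given in Section 5."""
    alpha_l1 = 1.0                       # Level-1: thresholds unchanged
    alpha_l2 = alpha_l1 + (tp1 - tp2)    # Level-2
    alpha_l3 = alpha_l2 + w * tp2        # Level-3: amplified by user weight w
    return {1: alpha_l1, 2: alpha_l2, 3: alpha_l3}[level]

def update_delta_gamma(delta, gamma, qp, fps_target, fps_achieved,
                       c1=0.025, c2=0.015):
    """Modulation factors delta/gamma per eqs. 6-9."""
    qp_excess = qp - 20 if qp > 20 else 0
    delta += c1 * qp_excess              # eq. 6: larger QP -> earlier termination
    gamma += c2 * qp_excess              # eq. 7
    if fps_achieved < fps_target:        # ME missed its timeline (our reading)
        delta += (fps_target - fps_achieved) / (2 * fps_target)  # eq. 8
        gamma += (fps_target - fps_achieved) / fps_target        # eq. 9
    return delta, gamma
```

With the Section 5 constants this yields αpower values of 1.0, 1.3, and 3.4 for the three Levels, i.e. Level-3 tolerates a roughly 3.4x larger early-termination threshold than Level-1.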
5. EVALUATION AND RESULTS
We have compared our approach against four benchmark Motion Estimators (Full Search, UMHexagonS [12], UMHexagonS_Simple [13], EPZS [14]) in terms of energy and video quality (PSNR). To validate our approach we employed various QCIF and CIF resolution video sequences (typical for mobile devices) with diverse motion types [8]. The test conditions are: IPPP sequence, 1 Reference Frame, Frame Rate = 30 fps, Bit Rate = 256 kbps, w = 7, ThresholdP1 = 0.6, ThresholdP2 = 0.3 (normalized w.r.t. the maximum available energy/power), c1 = 0.025, and c2 = 0.015. For the energy and area estimation of our hardware implementation we used Orinoco [9] and an available 90nm technology library (150 MHz). In our current architecture, we consider the Motion Estimator hardware as a co-processor synthesized as an ASIC, processing ME at full frame level [20], but our proposed approach is beneficial for both software (GPP, DSP) and hardware (ASIC, FPGA) implementations.

5.1 Discussion of Energy Results
Table 2: Energy and Performance Results for One Video Frame (averaged over 139 frames)
[Table 2 body not reproduced here; only header fragments survived the extraction. The columns group the CIF sequences (Foreman, Mobile, Susie, Akiyo) and QCIF sequences (Carphone, Claire, MissAmerica), with one row per Motion Estimator and attribute.]
Table 2 gives the details of the performance and energy results for the four benchmarks and the three Operational Levels of our Motion Estimator for various video sequences. Mobile [8] is a sequence with small object motions; compared to UMHexagonS [12], our Level-1, Level-2, and Level-3 achieve an energy reduction of 16.15, 37.22, and 110.61 times at the cost of an insignificant 0.04, 0.05, and 0.1 dB loss in PSNR2, respectively. Carphone [8] is a video
2 For low-to-high bit rates, a loss of 0.5 dB in PSNR results in a small visual quality degradation (corresponding to a 10% reduced bit-rate) [7].
conferencing test sequence; compared to UMHexagonS, our Level-1, Level-2, and Level-3 achieve an energy reduction of 9.69, 38.11, and 44.85 times at the cost of an insignificant 0.05, 0.06, and 0.15 dB loss in PSNR, respectively. Across all sequences, Level-3 achieves an overall energy reduction of up to 110.61 times and 40.43 times (avg. 52.42 and 27.42 times) compared to UMHexagonS and EPZS, respectively, at the cost of an average 0.39 dB loss in PSNR. This shows that Level-3 provides a significant energy saving while still giving a reasonably good visual quality.
Table 3: Energy Distribution [µWs] for One Frame (averaged over 139 frames) for our 3 Operational Levels and UMHexagonS
[Table 3 body not fully reproduced here; only the Total row survived the extraction: 54.78, 23.77, 8.00, 884.53, 10.13, 2.58, 2.19, and 98.31 µWs.]
Table 3 shows the breakdown of the energy consumption for the Motion Estimation of one frame, averaged over 139 video frames, comparing the 3 Operational Levels of our Motion Estimator with UMHexagonS. We save energy in both the SAD calculation and the reduced number of memory transfers, which we achieve by processing fewer #SADs/MB (Table 2) and by our pixel decimation in each SAD.
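As a first-order illustration of why these two savings compound, consider the following model. It is our own sketch for illustration, not a model taken from the paper, and the candidate counts used in the example are made up.

```python
# Illustrative first-order energy model (an assumption for illustration,
# not taken from the paper): ME energy scales with the number of SAD
# evaluations per macroblock times the pixels accessed per SAD, so fewer
# candidates and pixel decimation compound multiplicatively.

def relative_me_energy(n_sads_per_mb, pixels_per_sad, e_per_pixel=1.0):
    """Relative energy for one macroblock's motion estimation."""
    return n_sads_per_mb * pixels_per_sad * e_per_pixel

# Hypothetical numbers: 100 candidates with full 16x16 SADs vs.
# 25 candidates with the AltGroup4AltRow pattern (256/4 = 64 pixels).
full      = relative_me_energy(100, 256)
decimated = relative_me_energy(25, 64)
```

Under these made-up numbers the combined reduction is 16x, matching the intuition that candidate pruning (Sections 4.3 and 4.4) and pixel decimation (Section 4.2) multiply rather than add.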
[Figure 11 (bar chart) not reproduced: it compares the total Motion Estimation energy consumed [µWs] by our three Operational Levels (Level-1: HQ, Level-2: MQ, Level-3: LQ) when encoding 139 frames of the Foreman, Mobile, Susie, Akiyo, Carphone, Claire, and MissAmerica test sequences. Annotation: all Levels having almost the same energy consumption shows that Level-1 and Level-2 are adaptive internally too.]

Figure 11: Comparison of 3 Operational Levels of our Motion Estimator for different video sequences

Let us compare the three Operational Levels: Figure 11 shows the total energy consumption for encoding 139 frames of various video sequences. Note that for the Claire sequence, Level-1 and Level-2 have almost the same energy consumption. This is due to the fact that Level-1 already adapts internally for low power thanks to our two temporal predictors that capture the true motion field. For the Foreman sequence, Level-2 and Level-3 provide a 3.26 and 10.78 times energy reduction, respectively, compared to Level-1.
[Figure 12 (line plot) not reproduced: it shows the per-frame energy consumed [µWs] by the three Operational Levels (Level-1: HQ, Level-2: MQ, Level-3: LQ) over the 139 frames of the Foreman sequence. Label-A: Level-1 approaching the energy consumption of Level-2. Label-B: Level-1 having a different energy consumption per frame, showing that a Level also adapts internally. Label-C: Level-2 approaching the energy consumption of Level-3.]

Figure 12: A Frame-wise Energy Consumption Comparison of 3 Operational Levels of our Motion Estimator for the Foreman Video Sequence
Figure 12 illustrates the variations in energy consumption at frame level for encoding 139 frames of the Foreman sequence, which is characterized as a high-motion video. Label-A highlights a noticeable case where, at frame numbers 107 and 108, the energy consumption of Level-1 is almost equal to that of Level-2. This validates that our Motion Estimator is not only adapting across Operational Levels but also within each Operational Level while ensuring a high visual quality. Therefore, each Operational Level needs a different amount of energy for different types of motion in one video sequence (Label-B). This adaptivity within each Operational Level is achieved with the help of the motion field properties (Section 4.3) and the power-adaptive thresholds for early termination (Section 4.5). Label-C shows a similar kind of behavior at frame number 28 for Level-2 and Level-3.

5.2 Hardware Architecture and Area Results
Figure 13: HW Architecture of our 3-Tier Motion Estimator
Figure 13 presents the hardware block diagram of our proposed Motion Estimator. The current and reference frames are transferred from the main memory to two low-power memory modules. Note that our current architecture has no cache, but for other architectures a cache can replace the two low-power memory modules. The Motion Estimator block has an internal Address Generation Unit (AGU) that generates the addresses of the current and reference MBs. The ME Logic Module provides the base addresses of the current and reference MBs and the offset in terms of the candidate vector. The current and reference MB data is fetched from the two low-power memory modules and sent to the SAD Unit. The SAD Unit consists of a Row SAD Module and two control units for pixel decimation. The decimation decision is made by the Adaptation Unit. The ME Logic Module checks all candidates and stores SADbest and MVbest in registers, from where they are forwarded to the output and to the Monitoring Unit. The Monitoring Unit keeps track of the MV, SAD, and QP of each MB in the current and previous frames. This information is sent to the Adaptation Unit to perform a set of adaptations, and the chosen Level ID is sent to the ME Logic Module that activates the corresponding Operational Level.
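The per-macroblock control flow described above can be sketched in software as a minimal functional model. The function names mirror the modules of Figure 13, but the interfaces, the data layout, and the block size are our assumptions; the real design is a hardware pipeline.

```python
# Minimal functional model of the ME Logic Module + AGU + SAD Unit loop.
# Block size, data layout, and function interfaces are assumptions made
# for illustration only.

def fetch_block(frame, top, left, size):
    """AGU + memory module: read a size x size block at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(cur, ref):
    """Row SAD Module: sum of absolute differences (no decimation here)."""
    return sum(abs(a - b) for rc, rr in zip(cur, ref) for a, b in zip(rc, rr))

def me_logic(cur_mb, ref_frame, origin, candidates):
    """Check all candidate vectors; keep SADbest and MVbest in 'registers'."""
    oy, ox = origin
    sad_best, mv_best = float("inf"), (0, 0)
    for dy, dx in candidates:
        ref_mb = fetch_block(ref_frame, oy + dy, ox + dx, len(cur_mb))
        s = sad(cur_mb, ref_mb)
        if s < sad_best:
            sad_best, mv_best = s, (dy, dx)
    return sad_best, mv_best
```

In the hardware, the Monitoring and Adaptation Units would additionally observe each (SADbest, MVbest) result to pick the candidate set, decimation pattern, and Operational Level for subsequent macroblocks.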
Table 4: Area Statistics for Our Multi-Level Motion Estimator and UMHexagonS
Our Motion Estimator UMHexagonS
Area of SAD+AGU [mm2] 0.236 0.141
Area of Adapt_ME [mm2] 0.178 0.195
Total Area [mm2] 0.414 0.336
Table 4 gives the area details of our Motion Estimator and of UMHexagonS [12]. The increase in the area of the SAD Unit is due to the control logic for decimation. The Adapt_ME area contains the ME Logic, Adaptation, and Monitoring Units. The area of our Motion Estimator is 23% bigger than that of UMHexagonS, but this pays off in terms of a significant reduction in energy consumption. The leakage energy is 0.07 µWs and 3.29 µWs for our Motion Estimator and UMHexagonS, respectively. Due to the increased area we have more leakage power, but our Motion Estimator is much faster than UMHexagonS and finishes the Motion Estimation earlier (the #SADs in Table 2 directly correspond to the computation time). For this reason, despite the bigger area, our leakage energy is much lower than that of UMHexagonS3.

6. CONCLUSION
We have presented a novel 3-tier Motion Estimator that adapts at run time depending upon the available energy budgets, user controls, and motion field characteristics. At run time, a set of Conditions and Equations is evaluated on energy/performance Triggers to choose one of the three Operational Levels of our Motion Estimator. Our results show that for CIF videos we achieve an average energy reduction of 52 times and 27 times compared to UMHexagonS [12] and EPZS [14], respectively. This energy saving comes at the cost of a 0.39 dB loss in PSNR (barely noticeable in a subjective video quality analysis) and a 23% increase in area compared to UMHexagonS.

7. REFERENCES
[1] ITU-T Rec. H.264 and ISO/IEC 14496-10:2005 (E) (MPEG-4 AVC), “Advanced video coding for generic audiovisual services”, 2005.
[2] S. Yang, W. Wolf, N. Vijaykrishnan, “Power and performance analysis of motion estimation based on hardware and software realizations”, IEEE Transactions on Computers, vol. 54, no. 6, pp. 714-726, 2005.
[3] M. G. Koziri, G. I. Stamoulis, I. X. Katsavounidis, “Power reduction in an H.264 encoder through algorithmic and logic transformations”, ISLPED, pp. 107-112, 2006.
[4] G. V. Varatkar and N. R. Shanbhag, “Energy-efficient motion estimation using error-tolerance”, ISLPED, pp. 113-118, 2006.
[5] S. Saponara, L. Fanucci, “Data-adaptive motion estimation algorithm and VLSI architecture design for low-power video systems”, IEEE Computers and Digital Techniques, vol. 151, pp. 51-59, 2004.
[6] Y. Song, Z. Liu, T. Ikenaga, S. Goto, “Low-power partial distortion sorting fast motion estimation algorithms and VLSI implementations”, IEICE Transactions on Information and Systems, vol. E90-D, pp. 108-117, 2007.
[7] G. Bjontegaard, “Calculation of Average PSNR Differences Between RD-Curves”, ITU-T SG16 Doc. VCEG-M33, 2001.
[8] Xiph.org Test Media, Video Sequences: http://media.xiph.org/video/derf
[9] Orinoco: http://www.chipvision.com/orinoco/index.php
[10] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, A. Luthra, “Overview of the H.264/AVC video coding standard”, IEEE Transactions on Circuits and Systems for Video Technology (CSVT), vol. 13, pp. 560-576, 2003.
[11] J. Ostermann et al., “Video coding with H.264/AVC: Tools, Performance, and Complexity”, IEEE Circuits and Systems Magazine, vol. 4, no. 1, pp. 7-28, 2004.
[12] Z. Chen, P. Zhou, Y. He, “Fast integer pel and fractional pel motion estimation for JVT”, JVT-F017, December 2002.
[13] X. Yi, J. Zhang, N. Ling, W. Shang, “Improved and simplified motion estimation for JM”, JVT-P021, July, 2005.
[14] A. M. Tourapis, “Enhanced predictive zonal search for single and multiple frame motion estimation”, in Proc. VCIP, 2002, pp. 1069-1079.
[15] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, T. Ishiguro, “Motion-compensated interframe coding for video conferencing”, National Telecommunications Conference, pp. G.5.3.1-G.5.3.5, 1981.
[16] L. Deng, W. Gao, M. Z. Hu, Z. Z. Ji, “An efficient hardware implementation for motion estimation of AVC standard”, IEEE Transactions on Consumer Electronics, vol. 51, no. 4, pp. 1360-1366, 2005.
[17] C. A. Rahman, W. Badawy, “UMHexagonS Algorithm Based Motion Estimation Architecture for H.264/AVC”, International Workshop on System-on-Chip for Real-Time Applications, pp. 207-210, 2005.
[18] S. Momcilovic, N. Roma, L. Sousa, “An ASIP approach for adaptive AVC motion estimation”, IEEE Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), pp. 165-168, 2007.
[19] H. Alzoubi, W. D. Pan, “Efficient global motion estimation using fixed and random subsampling patterns”, IEEE International Conference on Image Processing (ICIP), vol. 1, pp. I-477 - I-480, 2007.
[20] M. Shafique, L. Bauer, J. Henkel, “An Optimized Application Architecture of the H.264 Video Encoder for Application Specific Platforms”, ESTIMedia 2007, pp. 119-124.
3 Since leakage is technology dependent, the numbers for e.g. 65nm will be different. However, our leakage results for 90nm show that we will still be far better than UMHexagonS due to faster execution.