3-Tier Dynamically Adaptive Power-Aware Motion Estimator for H.264/AVC Video Encoding

Muhammad Shafique, Lars Bauer, and Jörg Henkel
University of Karlsruhe, Chair for Embedded Systems, Karlsruhe, Germany
{shafique, lars.bauer, henkel}@informatik.uni-karlsruhe.de
ABSTRACT

The limitation of energy in portable communication/entertainment devices necessitates the reduction of video encoding complexity. The H.264/AVC video coding standard is one of the latest video codecs and features a complex Motion Estimation scheme that accounts for a major part of the encoder energy [2]. We therefore present a power-aware Motion Estimator for H.264 that adapts at run time according to the available energy level. We perform a set of adaptations at different Processing Stages of Motion Estimation. Our results show that for CIF videos (typically used in portable devices, though our approach is equally applicable to other video resolutions), we achieve an average energy reduction of 52 times and 27 times as compared to UMHexagonS [12] and EPZS [14], respectively. This energy saving comes at the cost of an average loss of only 0.39 dB in Peak Signal to Noise Ratio (PSNR, an objective quality measure) and a 23% increase in area (synthesized for 90 nm technology).

Categories and Subject Descriptors: B.7.1 [INTEGRATED CIRCUITS]: Types and Design Styles - Algorithms implemented in hardware

General Terms: Performance, Design, Algorithms

Keywords: H.264, MPEG-4 AVC, Motion Estimation, low power, low complexity, run-time adaptation, early termination, search pattern

1. INTRODUCTION AND BASIC IDEA

In today's mobile devices, energy/power is a critical design parameter, and multimedia is a major application domain. Video encoding consumes a significant amount of processing time and energy, and the encoding effort depends heavily upon the characteristics of the input video sequence and the target bit rates. Under changing scenarios of input-data characteristics and available energy budgets, embedded solutions for video encoding require run-time adaptivity.
H.264/MPEG-4 AVC [1] from the Joint Video Team (JVT) of the ITU-T VCEG and ISO/IEC MPEG is one of the latest video coding standards. It provides a bit rate reduction of 50% as compared to MPEG-2 with similar subjective visual quality [10], but at the cost of additional computational complexity (~10x relative to MPEG-4 simple profile encoding [11]). This is mainly due to the enhanced Motion Estimation (ME) comprising variable block size ME, sub-pixel ME up to quarter-pixel accuracy, and multiple reference frames. As a result, the ME process may consume up to 60% (1 reference frame) or 80% (5 reference frames) of the total encoding time [12]. The high computational load makes the ME module not only time consuming but also power demanding [2].
The available energy budgets may change according to various application scenarios on mobile devices. Although there has been extensive research on low-power ME [3, 4, 16], related work does not provide run-time adaptivity. In particular, state-of-the-art low-power ME approaches focus either on modifying the Sum of Absolute Differences (SAD) formula [3, 4] or on low-power hardware implementations [16]. [5] exploits input-data variations to dynamically configure the search-window size of Full Search but does not consider the changing levels of energy/power. Moreover, it targets Full Search, which is far more energy-consuming than state-of-the-art Motion Estimators. Summarizing, state-of-the-art low-power Motion Estimators offer a fixed low-power solution that may work well for short-duration encodings of low-resolution images with low motion characteristics. These approaches do not perform well in the case of rapid motions and long-duration real-time encodings, and are ultimately less energy/power efficient. Varying motion types and the changing status of available energy budgets stimulate the need for a run-time adaptive, power-aware Motion Estimator that exhibits minimal loss in video quality (PSNR).
In this paper, we propose a novel run-time adaptive, power-aware Motion Estimator to cope with the changing availability of energy at run time while maintaining good video quality. We achieve this by means of multiple Operational Levels, between which our novel Motion Estimator dynamically switches. Each level comprises multiple ME stages (as we will see in Section 3), where each stage is itself run-time adaptive to save energy using motion field characteristics and power-aware termination criteria (see Section 4). Besides the three Operational Levels of ME, we propose a set of Conditions and Equations to activate run-time adaptations and level switching under changing energy/power and performance scenarios. We have benchmarked the hardware implementation of our Motion Estimator against four state-of-the-art Motion Estimators for various test video sequences. We will now present a simple example (corroborated with results) to show how we reduce the energy requirements of ME.
Test Conditions: We consider a mobile device scenario where an H.264 video encoder encodes the 30 fps Quarter Common Intermediate Format (QCIF) Carphone [8] test video at 128 kbps. We have used the Orinoco tool [9] and an available 90 nm technology library (frequency: 150 MHz) for energy estimation.
Figure 1: Switch Search Pattern from UMHexagonS to TSS
Example: Reduction in the total number of computed SADs: The Sum of Absolute Differences (SAD) is the basic processing block in the ME process and is used to calculate the block distortion for a current Macroblock (MB) w.r.t. a reference MB. For one MB in the current frame (Ft), various SADs are computed using MBs from the reference frame (e.g. the immediately previous frame Ft-1). The higher the number of SADs/MB, the higher the energy consumption. Figure 1 shows the implementation of UMHexagonS [12] and Three-Step Search (TSS) [15]. For the above test conditions, UMHexagonS requires on average 45 SADs/MB. Once the available energy drops below a particular level, we may switch the search pattern from UMHexagonS to TSS, which is relatively lightweight and requires only 25 SADs/MB. Table 1 shows that this switching of the search pattern reduces the total energy from 98.15 µWs to 54.20 µWs for one QCIF video frame.
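To make the cost of each candidate concrete, here is a minimal SAD routine in plain Python. The frame layout (2-D lists of luma samples) and the helper name `sad` are our own illustrative choices, not the paper's hardware unit:

```python
MB = 16  # macroblock edge length in pixels (H.264 luma MB)

def sad(cur_frame, ref_frame, cur_x, cur_y, ref_x, ref_y, size=MB):
    """Sum of Absolute Differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        cur_row = cur_frame[cur_y + dy]
        ref_row = ref_frame[ref_y + dy]
        for dx in range(size):
            total += abs(cur_row[cur_x + dx] - ref_row[ref_x + dx])
    return total

# Toy check: a flat block versus one brightened by 1 differs by 1 at each
# of the 256 pixel positions.
cur = [[10] * MB for _ in range(MB)]
ref = [[11] * MB for _ in range(MB)]
print(sad(cur, ref, 0, 0, 0, 0))  # 256
```

Every candidate position evaluated by a search pattern pays this full 256-pixel cost, which is why dropping from 45 to 25 SADs/MB translates almost directly into energy savings.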
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED'08, August 11-13, 2008, Bangalore, India. Copyright 2008 ACM 978-1-60558-109-5/08/08...$5.00.
Recapitulating, our novel multi-level Motion Estimator, which incorporates power-aware adaptations at different Processing Stages of the ME process (Sections 3 and 4), is crucial. We will show an average energy reduction of 52 times and 27 times as compared to UMHexagonS [12] and EPZS [14], respectively, at the cost of an average loss of 0.39 dB in PSNR (Section 5). Our novel contributions:
• A 3-tier low-power Motion Estimator that adapts at run time depending upon the changing scenarios of the available energy budget, motion field characteristics, and performance (Sections 3 and 4).
• A set of Conditions and Equations, along with the Triggering Parameters, for run-time Adaptations at each Processing Stage of the Motion Estimator (Section 4).
• A set of Search Patterns (Octagonal-Star and Polygon), power-aware dynamically adaptive Thresholds for early termination, Temporal Median Predictors, and an AltGroup4AltRow pixel decimation pattern for SAD computation (Section 3).
In Section 5, we present the hardware of our novel Motion Estimator and benchmark it for energy/performance/area against four state-of-the-art Motion Estimators using various video sequences. We conclude in Section 6.

2. RELATED WORK

Many fast block-matching algorithms have focused on search strategies with predictor sets, early termination criteria, and search patterns, such as Three-Step Search (TSS) [15], Unsymmetrical-cross Multi-Hexagon-grid Search (UMHexagonS) [12], simple UMHexagonS [13], and Enhanced Predictive Zonal Search (EPZS) [14]. However, these algorithms only consider performance and visual quality, not energy/power, and compute a large number of SADs/MB for Motion Estimation.
[16, 17] provide various VLSI implementations to expedite the process of H.264 Motion Estimation. Most of these hardware implementations are suited either for Full Search or for UMHexagonS. An ASIP-based approach is considered in [18], but it uses only spatial predictors, does not consider the temporal information of the motion field, and takes longer to find the optimal MV in the case of heavy angular motions. [19] explores fixed and random sub-sampling patterns for computation reduction in the case of Global Motion Estimation. [6] focuses on reducing the power by eliminating candidates using partial distortion sorting. Partial distortion sorting is itself an overhead, and it excludes only a subset of candidates from Full Search, which still leaves a much larger number of candidates. [4] presents a Motion Estimation scheme based on algorithmic noise tolerance. It uses an estimator based on input sub-sampling, but this approach results in degradation, especially for videos with small objects.
Our work differs from the previous approaches in several aspects: In contrast to [3-6], our Motion Estimator handles the energy/power issue using three dynamically selectable Operational Levels, each with a different energy consumption and a different visual quality. This makes our Motion Estimator adaptive for both energy/power and performance, which has not been targeted by others before. Unlike [18], our Octagonal-Star Search pattern captures large angular/irregular motions, and our Polygon Search pattern refines the motion search in the close vicinity. Unlike [12, 14], our threshold scheme for early termination is run-time adaptive for both energy/power and performance.
3. OUR POWER-AWARE MOTION ESTIMATOR
Figure 2: Overview of our Proposed Scheme of Run-Time Power-Aware Motion Estimator

We have performed a detailed energy analysis of state-of-the-art Motion Estimators. We found that these Motion Estimators (although providing a good video quality) do not consider changing energy budgets. Our proposed 3-tier Motion Estimator has 3 Operational Levels, each composed of multiple Processing Stages. Figure 2 presents the methodology of our run-time adaptive Motion Estimator considering the available energy budget and performance constraints. A set of parameters is passed to the Adaptation block. Our Motion Estimator tracks the motion field characteristics (Monitoring block) and passes this information as feedback to the Trigger block. The Trigger block gathers energy/power, motion field, and user control information and forwards it to the Adaptation block. Section 4 sheds more light on the Trigger, Monitoring, and Adaptation blocks with a state transition diagram of the Operational Levels. We now explain the complete flow of our proposed Motion Estimator and the constituent Processing Stages of the three Operational Levels.

3.1 Flow of the Proposed Motion Estimator

1.  // For Each Video Frame
2.  powerLevel = getBatteryPower();
3.  LEVEL = setMELevel(powerLevel, bitRate); // Figure 7
4.  setSearchRange&SAD(LEVEL); // Section 4.2
5.  Thresholdpred = setThresholdspred(LEVEL); // Section 4.5
6.  Thresholdpattern = setThresholdspattern(LEVEL); // Section 4.5
7.  IntegerPixelMotionEstimation:
8.  INPUT: currentMB, referenceMB, referenceFrame, blockType, bitRate, powerLevel, SearchRange
9.  OUTPUT: MVbest, SADbest
10. BEGIN
11. For all Macroblocks in the Video Frame
12.   Predictors = selectPredictorSet(LEVEL); // Figure 6
13.   // Check the Predictor Conditions for motion-dependent adaptations
14.   RemainingPreds = checkPredConds(Predictors); // Figure 9
15.   PerformSearch(RemainingPreds, Thresholdpred);
16.   // Choose and process an appropriate Search Pattern depending upon the available energy budget
17.   For all Patterns in the selected LEVEL
18.     PerformSearch(pattern, Thresholdpattern); // Figure 5
19.   EndFor
20.   Store SADbest and MVbest obtained from Lines 13-18;
21. EndFor
22. END
Figure 3: Complete Flow of our proposed Motion Estimator

Figure 3 shows the flow of our Motion Estimator. ME is performed at frame level [20]. Before starting ME, the Level is determined and initialized (Lines 2-6). Inside the ME loop, the information of the motion field is used for adaptation at Macroblock level (Lines 12 and 13). Afterwards, the selected predictors and search patterns are processed depending upon the chosen Operational Level (Lines 14-18). At each step (Lines 14 and 17), the corresponding Thresholds are checked for early termination.
Figure 4: Pixel-Decimation Patterns for SAD Computation
3.2.1 Pixel Decimation

A technique that we use to reduce the energy of Motion Estimation (ME) is block matching with pixel decimation. Normally, the matching criterion (SAD) is evaluated using every pixel of the MB. Figure 4 shows the four decimation patterns considered for evaluation. AltPixel and AltGroup4 are beneficial for high-textured videos and reduce the energy, but they are not very beneficial for cache-based architectures because the skipped data is already in the cache. On the other hand, AltRow and AltGroup4AltRow are beneficial for cache-based architectures, as they skip row by row. Therefore, we have chosen AltGroup4 for Level-2 to counter the issue of heavy-textured videos, and AltGroup4AltRow for Level-3 to obtain a significant energy reduction even for cache-based architectures. These patterns scale down accordingly for the different block modes in H.264.

3.2.2 Initial Search Point Prediction

Initial Search Point Prediction provides a good starting point to converge quickly to the near-optimal Motion Vector (MV). Based on the assumption that the blocks of an object move together in the direction of the object motion, spatial and temporal neighbors are considered good predictor candidates. Therefore, the MV of the current MB is highly correlated with the MVs of the spatially and temporally adjacent MBs that have been previously calculated. Our predictor set is: PredictorsSpatial = {MVzero, MVLeft, MVTop, MVTop-Left, MVTop-Right, Medianspatial (eq. 1)}, PredictorsTemporal = {MVcollocated, Mediantemporal1 (eq. 2), Mediantemporal2 (eq. 3)}. MVcollocated is the MV of the collocated MB in the previous frame (Ft-1).

Medianspatial = median(MVL, MVT, MVTR)Ft (1)
Mediantemporal1 = median(MVL, MVT, MVTR)Ft-1 (2)
Mediantemporal2 = median(MVR, MVD, MVDR)Ft-1 (3)

3.2.3 Search Patterns
Figure 5: Four Search Patterns used in our Motion Estimator

The search starts by choosing the best MV from the predictor set as the search center and evaluates the following search patterns:
a) Octagonal-Star Search: This pattern consists of 20 search points and handles large irregular motion cases. Figure 5 shows an Octagonal-Star Search pattern executing at a distance of 8 pixels from the search center. For small-to-medium motion and CIF/QCIF video sequences, only one Octagonal-Star Search pattern is processed. For heavier motions or higher resolutions (D1, HD), multiple Octagonal-Star Search patterns may be executed (each extended by a pixel distance of 12) until the minimum SAD is less than or equal to Thresholdpattern. The MV with the minimum SAD in this step is chosen as the search center of the next search pattern.
b) Polygon Search: This pattern consists of 10 search points and narrows the search space after the processing of the Octagonal-Star Search pattern. Figure 5 shows both the full and the sparse Polygon Search. This pattern favors horizontal motion over vertical motion, because in typical video scenes horizontal motion is dominant. Multiple patterns may be used, and the search stops when the minimum SAD of the current Polygon is unchanged. The MV with the minimum SAD serves as the center for the next search pattern.
c) Small Diamond Search: At the end, a 4-point Diamond Search pattern is applied to refine the motion search.
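The final Small Diamond refinement (step c) can be sketched as a generic 4-point diamond descent. Here `cost` stands in for the SAD of a candidate MV; the function name and the termination rule (stop when the centre itself is the minimum) are a standard formulation, and the exact Octagonal-Star/Polygon geometries of Figure 5 are not reproduced:

```python
# 4-point diamond around the current centre, plus the centre itself.
DIAMOND = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

def small_diamond_search(center, cost):
    """Move the centre to the cheapest diamond point until it is a local minimum."""
    while True:
        candidates = [(center[0] + dx, center[1] + dy) for dx, dy in DIAMOND]
        best = min(candidates, key=cost)
        if best == center:
            return center
        center = best

# With a toy cost (Manhattan distance to the true MV), the search walks
# from the predictor (0, 0) to the optimum (3, -2).
print(small_diamond_search((0, 0), lambda p: abs(p[0] - 3) + abs(p[1] + 2)))  # (3, -2)
```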
3.2.4 Stopping Criteria

Our thresholds are run-time adaptive for both power and performance. The search stops if the current SADbest (i.e. the minimum SAD) is smaller than a threshold. Early termination results in energy saving, but care must be taken to avoid false termination. We have two different thresholds: Thresholdpred (eq. 4) for predictors and Thresholdpattern (eq. 5) for patterns.

Thresholdpred = SADstationaryMB * (1+δ) * αpower (4)
Thresholdpattern = SADstationaryMB * (1+γ) * αpower (5)

SADstationaryMB (see Footnote 1) thereby is the SAD for a static MB (i.e. with zero motion). αpower (eqs. 4 and 5) is the power-scaling factor that provides a tradeoff between reconstructed video quality and energy consumption. δ and γ are modulation factors that provide a tradeoff between reconstructed video quality and search speed. The values of αpower, δ, and γ are determined dynamically (see Section 4.5).

3.3 Operational Levels of the Motion Estimator

Combining the above ME Processing Stages (Section 3.2), our Motion Estimator executes at three different Operational Levels. Each level differs in its configuration of these Processing Stages. Our proposed approach is orthogonal to the number of Operational Levels. However, we have chosen three levels as a compromise between the discontinuity of the motion field (affecting the adaptation conditions; see Figure 9) and the frequency of level switching. As each level offers a different visual quality, more levels would cause frequent changes in the reconstructed visual quality that may be visually uncomfortable for the user. Figure 6 shows the composition of our three Operational Levels. Level-1 targets high quality and operates when more energy is available. It utilizes the complete set of predictors and three search patterns. Level-2 excludes spatial neighbors from the predictor set and Octagonal-Star from the search patterns; thus it consumes less energy than Level-1. Level-3 consumes the least energy of all three. It uses only the most probable predictors, followed by Sparse Polygon and Small Diamond search patterns for final refinement.
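Plugging in the constants reported in Section 5 (ThresholdP1 = 0.6, ThresholdP2 = 0.3, w = 7), eqs. (4)-(5) together with the αpower procedure of Figure 10 can be sketched in Python. The function names are our own, and the level-to-αpower mapping follows Figure 10 under these assumed constants:

```python
THRESHOLD_P1, THRESHOLD_P2, W = 0.6, 0.3, 7  # values reported in Section 5

def alpha_power(level):
    """Level-dependent power-scaling factor, following Figure 10."""
    a1 = 1.0
    a2 = a1 + (THRESHOLD_P1 - THRESHOLD_P2)
    a3 = a2 + W * THRESHOLD_P2
    return {1: a1, 2: a2, 3: a3}[level]

def threshold_pred(sad_stationary, delta, level):
    """eq. (4): early-termination threshold for the predictor stage."""
    return sad_stationary * (1 + delta) * alpha_power(level)

def threshold_pattern(sad_stationary, gamma, level):
    """eq. (5): early-termination threshold for the pattern stage."""
    return sad_stationary * (1 + gamma) * alpha_power(level)

# Lower-energy levels get a larger alpha_power, i.e. looser thresholds and
# earlier termination at some PSNR cost.
print([round(alpha_power(l), 2) for l in (1, 2, 3)])  # [1.0, 1.3, 3.4]
```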
Figure 6: Three Operational Levels of our Novel Run-Time Adaptive Motion Estimator

4. RUN-TIME ADAPTATIONS FOR LOW-POWER AND IMPROVED PERFORMANCE

This section sheds light on the flow of Triggering Parameters and Adaptations, which corresponds directly to the Trigger and Adaptation blocks in Figure 2. Figure 7 shows the transitions between the three states (i.e. Operational Levels) of the proposed Motion Estimator. Four constants (x, y, a, b) are used for hysteresis to avoid oscillation. The values of these constants may be selected by the application designer depending upon the application requirements. Frequent switching between these states is undesirable, since it causes rapid variations in the reconstructed video quality, which is visually uncomfortable for the user. Moreover, oscillation results in a random motion field, which disturbs the behavior of the motion-dependent adaptations (see Figure 9).

Footnote 1: SADstationaryMB is calculated in a similar way as mentioned in [12].
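The hysteresis idea behind Figure 7 can be illustrated with a small controller. The boundary values reuse ThresholdP1 = 0.6 and ThresholdP2 = 0.3 from Section 5 only as plausible placeholders, and a single `margin` stands in for the paper's four constants (x, y, a, b), so this is a simplified sketch rather than the exact state machine:

```python
class LevelController:
    """Switch between Operational Levels with a hysteresis band to avoid oscillation."""

    def __init__(self, hi=0.6, lo=0.3, margin=0.05):
        self.hi, self.lo, self.margin = hi, lo, margin
        self.level = 1  # start at the high-quality level

    def update(self, power):
        """power: available energy, normalized to [0, 1]."""
        if self.level == 1 and power < self.hi - self.margin:
            self.level = 2
        elif self.level == 2:
            if power > self.hi + self.margin:
                self.level = 1
            elif power < self.lo - self.margin:
                self.level = 3
        elif self.level == 3 and power > self.lo + self.margin:
            self.level = 2
        return self.level

ctl = LevelController()
print(ctl.update(0.50))  # 2: power fell clearly below the upper boundary
print(ctl.update(0.58))  # 2: inside the hysteresis band, no switch back
print(ctl.update(0.70))  # 1: clearly above the band, back to high quality
```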
Figure 7: State Diagram showing the transitions between the three Operational Levels of our Proposed Motion Estimator, depending upon the Triggers and Constraints
4.1 Triggers

In addition to Run-Time Monitored Parameters, the proposed approach also facilitates User-Defined Controls for enhanced flexibility. Typical user controls are Target Bit Rate (kbps), Target Frame Rate (fps), Video Resolution (e.g. CIF or QCIF), and Quality Level. The values of these parameters may change at any time during the execution of the application. However, these controls are activated from the next frame onwards, as we avoid major adaptations during the ME process to preserve the motion field. Additionally, we monitor a set of parameters, e.g. available power, Quantization Parameter (QP) value, and motion field characteristics (MVs and SADs of the current and previous frame). Other parameters may also be considered, e.g. video frame characteristics (brightness, texture, edge map, etc.), to predict the motion behavior before estimation. For example, if the video frame is less textured, then Level-2 is preferred over Level-1. Available power is monitored by a system call after a (configurable) period. Motion field characteristics and QP values are used for adaptation at frame level. We now explain the different adaptations (Sections 4.2-4.5) of the Processing Stages that compose an Operational Level of the Motion Estimator appropriate for the current system conditions.

4.2 Search Range and SAD Adaptation

The value of the Search Range depends upon the resolution of the video sequence and the type of motion (low or high). It determines the number of candidates for SAD calculation. Figure 8 shows the run-time adaptation of the Search Range (Lines 2-6). We put a lower limit of 16 on the Search Range to avoid high prediction errors (caused by insufficient ME).

1.  setSearchRange&SAD(LEVEL)
2.  If (LEVEL != 1 and SearchRange > 16) Then
3.    SearchRange >>= 1;
4.  ElseIf (SearchRange < 64) Then
5.    SearchRange <<= 1;
6.  EndIf
7.  Switch (LEVEL)
8.  Case Level-1: SAD = SAD_Orig;
9.  Case Level-2: SAD = SAD_AltGroup4;
10. Case Level-3: SAD = SAD_AltGroup4AltRow;
11. End Switch
Figure 8: Power-Aware Search Range and SAD Selection

For one 16x16 SAD computation, we need 256 subtractions, 256 absolute operations, and 255 additions, along with loading 256 current and 256 reference Macroblock pixels from memory. In order to save energy for one SAD computation (reducing memory transfers and computation), we exclude some pixels from the SAD computation when the available energy is low. In Section 3.2.1, we introduced several pixel decimation patterns for the SAD calculation. The AltPixel, AltGroup4, and AltRow patterns (Figure 4) reduce the number of pixels for SAD computation by a factor of 2, while AltGroup4AltRow reduces it by a factor of 4, which directly corresponds to an energy reduction (mainly due to reduced memory transfers). The selection of a particular pattern depends highly upon the video scene characteristics. Figure 8 shows the power-aware selection of the SAD Decimation Pattern (Lines 7-11).

4.3 Reduction of the Predictor Set

A large set of predictors comes at the cost of higher energy spent on SAD calculation. For low energy/power (i.e. Level-2 and Level-3), we cut down the set of predictors, giving more importance to the highly probable ones. Figure 6 shows the set of spatial and temporal predictors chosen for the different Operational Levels. Figure 9 shows the conditions under which the predictor set enables early termination to save energy, depending upon the motion field characteristics. The motion field changes with the properties of the input video sequence and thus drives adaptation at run time.
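The savings selected in Lines 7-11 of Figure 8 follow directly from how many pixels each decimation mask keeps. Below, `use_pixel` encodes illustrative versions of the Figure 4 masks; the exact AltGroup4AltRow geometry (alternating 4-pixel groups on every other row) is our assumption from the pattern names, not copied from the figure:

```python
MB = 16  # macroblock edge length in pixels

def use_pixel(pattern, x, y):
    """Illustrative decimation masks, assumed from the Figure 4 pattern names."""
    if pattern == "Orig":
        return True
    if pattern == "AltRow":            # every other row -> 1/2 of the pixels
        return y % 2 == 0
    if pattern == "AltGroup4AltRow":   # alternating 4-pixel groups on every
        return y % 2 == 0 and (x // 4) % 2 == 0  # other row -> 1/4 of the pixels
    raise ValueError(pattern)

def sad_decimated(cur, ref, pattern):
    """SAD over only the pixels kept by the decimation mask."""
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(MB) for x in range(MB)
               if use_pixel(pattern, x, y))

cur = [[10] * MB for _ in range(MB)]
ref = [[11] * MB for _ in range(MB)]
print(sad_decimated(cur, ref, "Orig"))             # 256 pixel differences
print(sad_decimated(cur, ref, "AltRow"))           # 128: half the loads and ops
print(sad_decimated(cur, ref, "AltGroup4AltRow"))  # 64: a quarter
```

Because the dominant cost is the memory traffic for the current and reference pixels, the 2x and 4x reductions in evaluated pixels map almost directly onto the energy reductions described above.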
Figure 9: Conditions for Motion-Dependent Adaptations
4.4 Pattern Switching

Switching to a simpler search pattern reduces not only the energy consumption but also the computational complexity, as it limits the region where an MV may be found. For small to medium motions, the impact on visual quality is not noticeable (as we will see in Section 5.1), but for heavier motions there may be a slight degradation in the visual quality (depending upon the input video). However, given the low energy/power status, a user may accept a slight degradation in the visual quality.

4.5 Modifying the Stopping Criterion

In Section 3.2.4 we introduced the three factors αpower, δ, and γ for run-time adaptation (eqs. 4 and 5) of the threshold calculation for early termination of the motion search. Figure 10 shows the procedure for setting the value of the power factor αpower. ThresholdP1 and ThresholdP2 are two values normalized w.r.t. the maximum available energy, and w is a user-defined weighting factor that amplifies the threshold values to increase the number of forceful terminations, thus saving energy at the cost of a slight degradation in the visual quality (PSNR).

1. setAlpha4Thresholds(LEVEL)
2. Switch (LEVEL)
3. Case Level-1: αpower_L1 = 1.0; break;
4. Case Level-2: αpower_L2 = αpower_L1 + (ThresholdP1 - ThresholdP2); break;
5. Case Level-3: αpower_L3 = αpower_L2 + w * ThresholdP2; break;
6. End Switch
Figure 10: Updating Thresholds for the Stopping Criterion

Modulation factors δ and γ provide a trade-off between reconstructed video quality and search speed. The initial values of δ and γ are determined by the Quantization Parameter (QP; see eqs. 6 and 7): the larger the value of QP, the larger the values of δ and γ.
This is due to the following reason: when the quantization step is larger, the quantization noise is also larger and most of the details of the reconstructed image are lost. In this case, the difference between the best matching block and a sub-optimal matching block becomes blurred. c1 and c2 are user-defined weights that control the effect of a change in the QP value. If the ME fails to meet its timeline, i.e. encoding at the targeted frame rate (fps), then the values of δ and γ are increased at the cost of a loss in PSNR (eqs. 8 and 9).

δ += c1 * ((QP > 20) ? (QP - 20) : 0)   (6)
γ += c2 * ((QP > 20) ? (QP - 20) : 0)   (7)
δ += (fps_target - fps_achieved) / (2 * fps_target)   (8)
γ += (fps_target - fps_achieved) / fps_target   (9)
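The updates of Figure 10 and eqs. 6-9 can be combined into a small runnable sketch. The constant values are the test conditions from Section 5, and treating eqs. 8 and 9 as applying only when the achieved frame rate falls short of the target is our reading of the text.

```python
def set_alpha_for_level(level, tp1=0.6, tp2=0.3, w=7):
    """alpha_power per Operational Level (Figure 10).

    Default ThresholdP1/ThresholdP2/w values are the test conditions
    given in Section 5."""
    alpha_l1 = 1.0                       # Level-1: thresholds unchanged
    alpha_l2 = alpha_l1 + (tp1 - tp2)    # Level-2
    alpha_l3 = alpha_l2 + w * tp2        # Level-3: amplified by user weight w
    return {1: alpha_l1, 2: alpha_l2, 3: alpha_l3}[level]

def update_delta_gamma(delta, gamma, qp, fps_target, fps_achieved,
                       c1=0.025, c2=0.015):
    """Modulation factors delta/gamma per eqs. 6-9."""
    qp_excess = qp - 20 if qp > 20 else 0
    delta += c1 * qp_excess              # eq. 6: larger QP -> earlier termination
    gamma += c2 * qp_excess              # eq. 7
    if fps_achieved < fps_target:        # ME missed its timeline (our reading)
        delta += (fps_target - fps_achieved) / (2 * fps_target)  # eq. 8
        gamma += (fps_target - fps_achieved) / fps_target        # eq. 9
    return delta, gamma
```

With the Section 5 constants this yields αpower values of 1.0, 1.3, and 3.4 for the three Levels, i.e. Level-3 tolerates a roughly 3.4x larger early-termination threshold than Level-1.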
5. EVALUATION AND RESULTS
We have compared our approach against four benchmark Motion Estimators (Full Search, UMHexagonS [12], UMHexagonS_Simple [13], EPZS [14]) in terms of energy and video quality (PSNR). To validate our approach we employed various QCIF and CIF resolution video sequences (typical for mobile devices) with diverse motion types [8]. The test conditions are: IPPP sequence, 1 Reference Frame, Frame Rate = 30 fps, Bit Rate = 256 kbps, w = 7, ThresholdP1 = 0.6, ThresholdP2 = 0.3 (normalized w.r.t. the maximum available energy/power), c1 = 0.025, and c2 = 0.015. For the energy and area estimation of our hardware implementation we used Orinoco [9] and an available 90nm technology library (150 MHz). In our current architecture, we consider the Motion Estimator hardware as a co-processor synthesized as an ASIC, processing ME at full frame level [20], but our proposed approach is beneficial for both software (GPP, DSP) and hardware (ASIC, FPGA) implementations.

5.1 Discussion of Energy Results
Table 2: Energy and Performance Results for One Video Frame (averaged over 139 frames)
[Table 2 body not reproduced here; only header fragments survived the extraction. The columns group the CIF sequences (Foreman, Mobile, Susie, Akiyo) and QCIF sequences (Carphone, Claire, MissAmerica), with one row per Motion Estimator and attribute.]
Table 2 gives the details of the performance and energy results for the four benchmarks and the three Operational Levels of our Motion Estimator for various video sequences. Mobile [8] is a sequence with small object motions; compared to UMHexagonS [12], our Level-1, Level-2, and Level-3 achieve an energy reduction of 16.15, 37.22, and 110.61 times at the cost of an insignificant 0.04, 0.05, and 0.1 dB loss in PSNR2, respectively. Carphone [8] is a video
2 For low-to-high bit rates, a loss of 0.5 dB in PSNR results in a small visual quality degradation (corresponding to a 10% reduced bit-rate) [7].
conferencing test sequence; compared to UMHexagonS, our Level-1, Level-2, and Level-3 achieve an energy reduction of 9.69, 38.11, and 44.85 times at the cost of an insignificant 0.05, 0.06, and 0.15 dB loss in PSNR, respectively. Across all sequences, Level-3 achieves an overall energy reduction of up to 110.61 times and 40.43 times (avg. 52.42 and 27.42 times) compared to UMHexagonS and EPZS, respectively, at the cost of an average 0.39 dB loss in PSNR. This shows that Level-3 provides a significant energy saving while still giving a reasonably good visual quality.
Table 3: Energy Distribution [µWs] for One Frame (averaged over 139 frames) for our 3 Operational Levels and UMHexagonS
[Table 3 body not fully reproduced here; only the Total row survived the extraction: 54.78, 23.77, 8.00, 884.53, 10.13, 2.58, 2.19, and 98.31 µWs.]
Table 3 shows the breakdown of the energy consumption for the Motion Estimation of one frame, averaged over 139 video frames, comparing the 3 Operational Levels of our Motion Estimator with UMHexagonS. We save energy in both the SAD calculation and the reduced number of memory transfers, which we achieve by processing fewer #SADs/MB (Table 2) and by our pixel decimation in each SAD.
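As a first-order illustration of why these two savings compound, consider the following model. It is our own sketch for illustration, not a model taken from the paper, and the candidate counts used in the example are made up.

```python
# Illustrative first-order energy model (an assumption for illustration,
# not taken from the paper): ME energy scales with the number of SAD
# evaluations per macroblock times the pixels accessed per SAD, so fewer
# candidates and pixel decimation compound multiplicatively.

def relative_me_energy(n_sads_per_mb, pixels_per_sad, e_per_pixel=1.0):
    """Relative energy for one macroblock's motion estimation."""
    return n_sads_per_mb * pixels_per_sad * e_per_pixel

# Hypothetical numbers: 100 candidates with full 16x16 SADs vs.
# 25 candidates with the AltGroup4AltRow pattern (256/4 = 64 pixels).
full      = relative_me_energy(100, 256)
decimated = relative_me_energy(25, 64)
```

Under these made-up numbers the combined reduction is 16x, matching the intuition that candidate pruning (Sections 4.3 and 4.4) and pixel decimation (Section 4.2) multiply rather than add.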
[Figure 11 (bar chart) not reproduced: it compares the total Motion Estimation energy consumed [µWs] by our three Operational Levels (Level-1: HQ, Level-2: MQ, Level-3: LQ) when encoding 139 frames of the Foreman, Mobile, Susie, Akiyo, Carphone, Claire, and MissAmerica test sequences. Annotation: all Levels having almost the same energy consumption shows that Level-1 and Level-2 are adaptive internally too.]

Figure 11: Comparison of 3 Operational Levels of our Motion Estimator for different video sequences

Let us compare the three Operational Levels: Figure 11 shows the total energy consumption for encoding 139 frames of various video sequences. Note that for the Claire sequence, Level-1 and Level-2 have almost the same energy consumption. This is due to the fact that Level-1 already adapts internally for low power thanks to our two temporal predictors that capture the true motion field. For the Foreman sequence, Level-2 and Level-3 provide a 3.26 and 10.78 times energy reduction, respectively, compared to Level-1.
[Figure 12 (line plot) not reproduced: it shows the per-frame energy consumed [µWs] by the three Operational Levels (Level-1: HQ, Level-2: MQ, Level-3: LQ) over the 139 frames of the Foreman sequence. Label-A: Level-1 approaching the energy consumption of Level-2. Label-B: Level-1 having a different energy consumption per frame, showing that a Level also adapts internally. Label-C: Level-2 approaching the energy consumption of Level-3.]

Figure 12: A Frame-wise Energy Consumption Comparison of 3 Operational Levels of our Motion Estimator for the Foreman Video Sequence
Figure 12 illustrates the variations in energy consumption at frame level for encoding 139 frames of the Foreman sequence, which is characterized as a high-motion video. Label-A highlights a noticeable case where, at frame numbers 107 and 108, the energy consumption of Level-1 is almost equal to that of Level-2. This validates that our Motion Estimator is not only adapting across Operational Levels but also within each Operational Level while ensuring a high visual quality. Therefore, each Operational Level needs a different amount of energy for different types of motion in one video sequence (Label-B). This adaptivity within each Operational Level is achieved with the help of the motion field properties (Section 4.3) and the power-adaptive thresholds for early termination (Section 4.5). Label-C shows a similar kind of behavior at frame number 28 for Level-2 and Level-3.

5.2 Hardware Architecture and Area Results
Figure 13: HW Architecture of our 3-Tier Motion Estimator
Figure 13 presents the hardware block diagram of our proposed Motion Estimator. The current and reference frames are transferred from the main memory to two low-power memory modules. Note that our current architecture has no cache, but for other architectures a cache can replace the two low-power memory modules. The Motion Estimator block has an internal Address Generation Unit (AGU) that generates the addresses of the current and reference MBs. The ME Logic Module provides the base addresses of the current and reference MBs and the offset in terms of the candidate vector. The current and reference MB data is fetched from the two low-power memory modules and sent to the SAD Unit. The SAD Unit consists of a Row SAD Module and two control units for pixel decimation. The decimation decision is made by the Adaptation Unit. The ME Logic Module checks all candidates and stores SADbest and MVbest in registers, from where they are forwarded to the output and to the Monitoring Unit. The Monitoring Unit keeps track of the MV, SAD, and QP of each MB in the current and previous frames. This information is sent to the Adaptation Unit to perform a set of adaptations, and the chosen Level ID is sent to the ME Logic Module that activates the corresponding Operational Level.
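The per-macroblock control flow described above can be sketched in software as a minimal functional model. The function names mirror the modules of Figure 13, but the interfaces, the data layout, and the block size are our assumptions; the real design is a hardware pipeline.

```python
# Minimal functional model of the ME Logic Module + AGU + SAD Unit loop.
# Block size, data layout, and function interfaces are assumptions made
# for illustration only.

def fetch_block(frame, top, left, size):
    """AGU + memory module: read a size x size block at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(cur, ref):
    """Row SAD Module: sum of absolute differences (no decimation here)."""
    return sum(abs(a - b) for rc, rr in zip(cur, ref) for a, b in zip(rc, rr))

def me_logic(cur_mb, ref_frame, origin, candidates):
    """Check all candidate vectors; keep SADbest and MVbest in 'registers'."""
    oy, ox = origin
    sad_best, mv_best = float("inf"), (0, 0)
    for dy, dx in candidates:
        ref_mb = fetch_block(ref_frame, oy + dy, ox + dx, len(cur_mb))
        s = sad(cur_mb, ref_mb)
        if s < sad_best:
            sad_best, mv_best = s, (dy, dx)
    return sad_best, mv_best
```

In the hardware, the Monitoring and Adaptation Units would additionally observe each (SADbest, MVbest) result to pick the candidate set, decimation pattern, and Operational Level for subsequent macroblocks.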
Table 4: Area Statistics for Our Multi-Level Motion Estimator and UMHexagonS
Our Motion Estimator UMHexagonS
Area of SAD+AGU [mm2] 0.236 0.141
Area of Adapt_ME [mm2] 0.178 0.195
Total Area [mm2] 0.414 0.336
Table 4 gives the area details of our Motion Estimator and of UMHexagonS [12]. The increase in the area of the SAD Unit is due to the control logic for decimation. The Adapt_ME area contains the ME Logic, Adaptation, and Monitoring Units. The area of our Motion Estimator is 23% bigger than that of UMHexagonS, but this pays off in terms of a significant reduction in energy consumption. The leakage energy is 0.07 µWs and 3.29 µWs for our Motion Estimator and UMHexagonS, respectively. Due to the increased area we have more leakage power, but our Motion Estimator is much faster than UMHexagonS and finishes the Motion Estimation earlier (the #SADs in Table 2 directly correspond to the computation time). For this reason, despite the bigger area, our leakage energy is much lower than that of UMHexagonS3.

6. CONCLUSION
We have presented a novel 3-tier Motion Estimator that adapts at run time depending upon the available energy budgets, user controls, and motion field characteristics. At run time, a set of Conditions and Equations is evaluated on energy/performance Triggers to choose one of the three Operational Levels of our Motion Estimator. Our results show that for CIF videos we achieve an average energy reduction of 52 times and 27 times compared to UMHexagonS [12] and EPZS [14], respectively. This energy saving comes at the cost of a 0.39 dB loss in PSNR (barely noticeable in a subjective video quality analysis) and a 23% increase in area compared to UMHexagonS.

7. REFERENCES
[1] ITU-T Rec. H.264 and ISO/IEC 14496-10:2005 (E) (MPEG-4 AVC), “Advanced video coding for generic audiovisual services”, 2005.
[2] S. Yang, W. Wolf, N. Vijaykrishnan, “Power and performance analysis of motion estimation based on hardware and software realizations”, IEEE Transactions on Computers, vol. 54, no. 6, pp. 714-726, 2005.
[3] M. G. Koziri, G. I. Stamoulis, I. X. Katsavounidis, “Power reduction in an H.264 encoder through algorithmic and logic transformations”, ISLPED, pp. 107-112, 2006.
[4] G. V. Varatkar and N. R. Shanbhag, “Energy-efficient motion estimation using error-tolerance”, ISLPED, pp. 113-118, 2006.
[5] S. Saponara, L. Fanucci, “Data-adaptive motion estimation algorithm and VLSI architecture design for low-power video systems”, IEEE Computers and Digital Techniques, vol. 151, pp. 51-59, 2004.
[6] Y. Song, Z. Liu, T. Ikenaga, S. Goto, “Low-power partial distortion sorting fast motion estimation algorithms and VLSI implementations”, IEICE Transactions on Information and Systems, vol. E90-D, pp. 108-117, 2007.
[7] G. Bjontegaard, “Calculation of Average PSNR Differences Between RD-Curves”, ITU-T SG16 Doc. VCEG-M33, 2001.
[8] Xiph.org Test Media, Video Sequences: http://media.xiph.org/video/derf
[9] Orinoco: http://www.chipvision.com/orinoco/index.php
[10] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, A. Luthra, “Overview of the H.264/AVC video coding standard”, IEEE Transactions on Circuits and Systems for Video Technology (CSVT), vol. 13, pp. 560-576, 2003.
[11] J. Ostermann et al., “Video coding with H.264/AVC: Tools, Performance, and Complexity”, IEEE Circuits and Systems Magazine, vol. 4, no. 1, pp. 7-28, 2004.
[12] Z. Chen, P. Zhou, Y. He, “Fast integer pel and fractional pel motion estimation for JVT”, JVT-F017, December 2002.
[13] X. Yi, J. Zhang, N. Ling, W. Shang, “Improved and simplified motion estimation for JM”, JVT-P021, July, 2005.
[14] A. M. Tourapis, “Enhanced predictive zonal search for single and multiple frame motion estimation”, in Proc. VCIP, 2002, pp. 1069-1079.
[15] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, T. Ishiguro, “Motion-compensated interframe coding for video conferencing”, National Telecommunications Conference, pp. G.5.3.1-G.5.3.5, 1981.
[16] L. Deng, W. Gao, M. Z. Hu, Z. Z. Ji, “An efficient hardware implementation for motion estimation of AVC standard”, IEEE Transactions on Consumer Electronics, vol. 51, no. 4, pp. 1360-1366, 2005.
[17] C. A. Rahman, W. Badawy, “UMHexagonS Algorithm Based Motion Estimation Architecture for H.264/AVC”, International Workshop on System-on-Chip for Real-Time Applications, pp. 207-210, 2005.
[18] S. Momcilovic, N. Roma, L. Sousa, “An ASIP approach for adaptive AVC motion estimation”, IEEE Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), pp. 165-168, 2007.
[19] H. Alzoubi, W. D. Pan, “Efficient global motion estimation using fixed and random subsampling patterns”, IEEE International Conference on Image Processing (ICIP), vol. 1, pp. I-477 - I-480, 2007.
[20] M. Shafique, L. Bauer, J. Henkel, “An Optimized Application Architecture of the H.264 Video Encoder for Application Specific Platforms”, ESTIMedia 2007, pp. 119-124.
3 Since leakage is technology dependent, the numbers for e.g. 65nm will be different. However, our leakage results for 90nm show that we will still be far better than UMHexagonS due to faster execution.