Video Scaler CookBook
Victor Ramamoorthy
Software Engineering Group, Information Appliance
National Semiconductor Corp, Santa Clara, CA
30 October 2002
Contents

Introduction
Performance of a Video Scaler
Scaling Fundamentals
    Example
    Linear Interpolation
Choosing a better predictor
Mirroring Pixels
Implementing the Interpolating Filter
Putting It All Together
Finally
Tables
Video Scaler Cook Book
Introduction

A video scaler is a device that stretches or shrinks images or video frames. For the purposes of illustration, we have a source image as shown below:
Source Image

When we expand the size of the image, the total number of pixels contained in the destination image is more than what is available in the source image. Hence a whole lot of new pixels have to be created by the scaler. When we shrink the image, we may keep some of the pixels and throw away the rest. In addition, we may still have to generate new pixels even if we are shrinking the image. The scaler's job is simple: (1) it decides which pixels to keep from the source image, and (2) which pixels have to be recreated. In this cookbook we will explore the scaler's anatomy.
Destination Image: Shrinking

Note that a scaler can stretch an image in the X direction only, in the Y direction only, or in both. We want to design a scaler that can achieve arbitrary1 image sizes without noticeable quality loss. There is no restriction to keep the aspect ratio2 constant while scaling.
2 The ratio of width and height of an image is called the aspect ratio.
Destination Image: Expansion

No distinction is made between shrinking and expansion of an image as far as the scaler algorithm is concerned; they are opposite sides of the same problem. The scaler will operate exactly the same way whether we are expanding or squeezing the source image. This cookbook describes the process by which smooth expansion/contraction scaling is obtained. It also details the construction of a hardware system that can operate at video frame rates. In particular, the focus of the treatment is to spell out the different parameters that control the performance of the scaler. This cookbook is aimed at engineers who are not familiar with the video scaling problem. Experts on the subject are warned that they may be wasting their time reading this cookbook. Many of the theoretical issues are merely glossed over, and the attention is centered on the implementation. Useful tables at the end contain filter coefficients that can be used in a practical situation.
Performance of a Video Scaler

As with any engineering design, we have to ask critical questions like "what are we designing, and how do we know it is good?" When it comes to a video scaler, those kinds of questions are tough to answer because the quality of the end result simply cannot be measured. Video quality is a subjective thing. It varies with the content and viewing conditions. More importantly, when we scale video frames, there is no reference to compare against. We are creating a new frame that was not there before, and we have to compare it with an original frame of a different size. It is like comparing apples and oranges. The best way to get around this problem is to ask whether the scaled frame "looks" like the original and hunt for artifacts that might have crept in during the scaling process. Still, we are looking for something that has no reference and has somehow appeared in the end result, which is not a good way to measure performance. So we come back and settle down to the measurement of "how it looks" after scaling.
The primary performance measure is undoubtedly the visual quality of the scaler. In general, visual quality is not easily measured by metrics such as Peak Signal-to-Noise Ratio and Mean Square Error, though they are often quoted in the literature. The secondary performance measures are the computational load and the complexity of the design. We will attempt to quantify these measures wherever meaningful. Another thing to remember is that performance is a function of the color conversion stages employed. For instance, if the source image is RGB, see fig. 1 below, the scaling can be done in the RGB domain. Notice that scaling is done on the component images. If the image is only available in the YUV domain, then scaling can be done in the YUV domain as in fig. 2:
Fig.1: The R, G, and B component images of the source are each passed through a scaler to produce the R, G, and B components of the destination image.

Fig.2: Likewise, the Y, U, and V component images of the source are each passed through a scaler to produce the Y, U, and V components of the destination image.
As suggested by fig. 3, the color conversion can be included in the last stage, before the destination image is displayed. Or it can also be done before scaling, as shown in fig. 4:
Fig.3: The R, G, and B components of the source image (RGB) are each scaled, and the result is then color-converted to produce the destination image (YUV).

Fig.4: The source image (RGB) is first color-converted to Y, U, and V components, each of which is then scaled to produce the destination image (YUV).
The performance of the systems in fig.3 and fig.4 will in general be different, because the additional operations involved in color conversion offer ample opportunity to inject additional noise into the system. However, by careful choices, we can make the visual qualities of these two systems approach each other. Another important thing to note is that the scaler operates on planar images. A planar image is a component image: an image containing just the red values, say, or just the Y values. Repeating the scaling on all the component images completes the scaling.
Scaling Fundamentals

Let sx, sy be the sizes in the X direction and Y direction, respectively, of a planar source image. Let the corresponding sizes of the destination image be dx, dy. This means that the source image has sx pixels in a row, which must be converted to dx pixels in the destination image. The ratio of these two numbers is called the scaling factor:

    Sx = dx / sx .    (1)

Similarly, the scaling factor in the Y direction is Sy = dy / sy. Note that the scaling factor is the ratio of two integers and is hence a real (floating-point) number. We will just illustrate the method for scaling in the X direction in this cookbook, as the same kind of operations can be duplicated for the Y direction.

The key idea here is to expand the source size range to include floating-point numbers. What is the size range? Well, it is the set of integers ISx = {0, 1, 2, 3, ..., sx - 1} which denote the pixel positions3 in the X direction of the source image. The first pixel position starts at 0. The last one ends at sx - 1. We can expand this range to contain all real numbers, not just integers. Note that this is our mental construct and not a real thing. When everything is finished we will map this back into integers, as a pixel position cannot be a fractional number. Let us denote this new range as RSx, containing all real numbers from 0 to sx - 1.
Now we are ready to map the destination pixel positions back onto the source pixel positions, i.e., onto the range RSx. Why is this needed? We have the job of generating new pixels in the destination image, and they must bear a resemblance to the source image. We need to know where to place these new pixels. We also need to know
3 We are talking about pixel positions which are sequential numbers starting at 0 from left. Do not confuse this with the value of a pixel. We will use the notation V(.) to denote the value of a pixel.
their values. Because we only know about the source image and know nothing about the destination image, we must make the connection between these two. We need to derive the destination pixel values from the source image. We can do that only if we know where these pixels are located.
From (1), we see that sx = dx / Sx. The mapping of the destination range back onto RSx then becomes:

    {0, 1/Sx, 2/Sx, 3/Sx, ..., (dx - 1)/Sx}.

Out of this set, we will force the pixel position at 0 to map onto pixel position 0. The other end condition is also forced: the pixel position at dx - 1 will go to the pixel position at sx - 1. This means that the i-th pixel position of the destination image, dx - 1 > i > 0, falls on the position i/Sx in the expanded range RSx of the source image. The position i/Sx is a real number and not an integer. That is why we expanded the range to include real numbers in the first place! The next thing to do is to find where exactly the position i/Sx is located.

To do this, we need to find the integer part of i/Sx. This is also known as the F-center4,

    Fc(i) = [[ i/Sx ]].

The fractional part is denoted by

    fc(i) = (( i/Sx )).

This means that the i-th pixel position of the destination image falls in between source pixels in positions Fc(i) and Fc(i) + 1, and is a fractional value fc(i) off from the F-center Fc(i). Are you with me? Let us do an example and clarify the notation.
Example

Let the source X size be 4 pixels and the destination X size be 7 pixels. That is, sx = 4 and dx = 7. The source pixel positions are denoted by {0, 1, 2, 3} and the destination pixel positions are denoted by {0, 1, 2, 3, 4, 5, 6}. The scaling factor is

    Sx = dx / sx = 7/4 = 1.75.

Let us now map the destination pixel positions onto the expanded range of the source image:

    {0, 1/1.75, 2/1.75, 3/1.75, 4/1.75, 5/1.75, 6/1.75}
    = {0, 0.5714, 1.1429, 1.7143, 2.2857, 2.8571, 3.4286}.

The end conditions force position 0 to map to 0 and position 6 to map to 3. For the middle positions, the F-centers are {0, 1, 1, 2, 2} and the fractional parts are {0.5714, 0.1429, 0.7143, 0.2857, 0.8571}.
4 The F-center is actually the pixel on which the filter is centered. You will get it later. Keep cool.
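The F-center and fractional-part computation for this example can be sketched in a few lines. This is an illustrative sketch, not code from the original design tool; the variable names are our own.

```python
import math

sx, dx = 4, 7
Sx = dx / sx  # scaling factor = 1.75

mapping = []
for i in range(dx):
    pos = i / Sx              # position in the expanded source range RSx
    Fc = math.floor(pos)      # integer part: the F-center
    fc = pos - Fc             # fractional part: offset from the F-center
    mapping.append((Fc, round(fc, 4)))

print(mapping)
# → [(0, 0.0), (0, 0.5714), (1, 0.1429), (1, 0.7143), (2, 0.2857), (2, 0.8571), (3, 0.4286)]
```

The end conditions of the text would then force the first and last destination positions onto source pixels 0 and 3, regardless of the computed mapping.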
Hence the simplest way to scale from 4 pixels to 7 pixels is to do the following:

Value of pixel 0 at destination = Value of pixel 0 at the source (end condition)
Value of pixel 1 at destination = Value of pixel 0 at the source
Value of pixel 2 at destination = Value of pixel 1 at the source
Value of pixel 3 at destination = Value of pixel 1 at the source
Value of pixel 4 at destination = Value of pixel 2 at the source
Value of pixel 5 at destination = Value of pixel 2 at the source
Value of pixel 6 at destination = Value of pixel 3 at the source (end condition)

We wanted 7 pixels and we got them all. Unfortunately, such simple-minded scaling does not offer good visual quality!
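The copy-the-F-center rule above can be sketched as follows. This is a minimal illustration with invented sample values; the function name is ours.

```python
import math

def scale_copy(src, dst_len):
    """Scale a row by copying the F-center (nearest-below) source pixel."""
    Sx = dst_len / len(src)
    out = []
    for i in range(dst_len):
        if i == dst_len - 1:
            out.append(src[-1])            # end condition: last maps to last
        else:
            out.append(src[math.floor(i / Sx)])
    return out

print(scale_copy([10, 20, 30, 40], 7))     # → [10, 10, 20, 20, 30, 30, 40]
```

The staircase of repeated values in the output is exactly why this method looks blocky.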
Linear Interpolation

What can we do better? Since we know the fractional values of the pixel positions, we can put them to work. This is called Linear Interpolation. It can be illustrated as follows:

Value of pixel 0 at destination = Value of pixel 0 at the source (end condition)
Value of pixel 1 at destination = Value of pixel 0 at the source x (1.0 - 0.5714) + Value of pixel 1 at the source x 0.5714
Value of pixel 2 at destination = Value of pixel 1 at the source x (1.0 - 0.1429) + Value of pixel 2 at the source x 0.1429
Value of pixel 3 at destination = Value of pixel 1 at the source x (1.0 - 0.7143) + Value of pixel 2 at the source x 0.7143
Value of pixel 4 at destination = Value of pixel 2 at the source x (1.0 - 0.2857) + Value of pixel 3 at the source x 0.2857
Value of pixel 5 at destination = Value of pixel 2 at the source x (1.0 - 0.8571) + Value of pixel 3 at the source x 0.8571
Value of pixel 6 at destination = Value of pixel 3 at the source (end condition)

The above can be succinctly written for the middle pixels as:

    V(i at destination) = V(Fc(i) at source) x (1.0 - fc(i)) + V(Fc(i) + 1 at source) x fc(i)    (2)
where V(.) denotes the value of a pixel. This may be considered an improvement over the case where we just copied pixels. Using the fractional part indeed helps in improving the quality. However, the quality improvement with adjacent-pixel linear prediction is not very great. The next big improvement can occur if we extend the linear prediction idea and concoct a workable system.
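The linear-interpolation rule for the middle pixels can be sketched directly from the F-center and fractional parts. As before, this is an illustrative sketch with invented sample values.

```python
import math

def scale_linear(src, dst_len):
    """Linearly interpolate between the two source pixels straddling
    each mapped destination position; ends are pinned (end conditions)."""
    Sx = dst_len / len(src)
    out = []
    for i in range(dst_len):
        if i == 0 or i == dst_len - 1:
            out.append(src[0] if i == 0 else src[-1])   # end conditions
            continue
        pos = i / Sx
        Fc = math.floor(pos)          # F-center
        fc = pos - Fc                 # fractional part
        out.append(src[Fc] * (1.0 - fc) + src[Fc + 1] * fc)
    return out

print([round(v, 2) for v in scale_linear([10, 20, 30, 40], 7)])
```

Compare the output with the copied-pixel version: the interpolated row ramps smoothly instead of repeating values.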
Choosing a better predictor

The previous section explained the notion of using two adjacent pixels to define the value of a new pixel that falls in between. This is equivalent to using just a 2-tap linear filter with varying filter coefficients. Bringing additional pixels into the prediction can only help5. But there is a big problem. Since the fractional part can assume any value between 0 and 1, extending to a multi-tap filter has inherent computational problems: depending on the value of the fractional part, we would need to design a filter for every pixel! A better idea is to quantize the fractional part into M regions and choose an appropriate filter from a stored set of filter coefficients. It is common practice to use M as a power of 2, but this is not necessary. The quantized fractional part f^c(i) = Q(fc(i)) now takes values from 0 to M-1. For each value, a different set of filter coefficients will be used to predict the pixel value.

What would be an ideal filter to use? That is a separate topic by itself. We will not go into details, as ample texts cover the subject of digital signal processing. Suffice it to say that the optimal interpolation filter has an impulse response described by the Sinc function:
5 Recall the statistical motto: The more you know, the better you compute the estimate.
    h(k) = sin(pi k / M) / (pi k / M)    (3)

for k = ..., -2, -1, 0, 1, 2, ..., where M is the number of quantization levels used. The corresponding time response of the PolyPhase Interpolation Filters is given by:

    h~rho(n) = h(nM + rho) = sin[pi(n + rho/M)] / [pi(n + rho/M)],  rho = 0, 1, 2, 3, ..., M-1, and all n.    (4)
The direct way of designing a scaler is to use the above equation and truncate the ideal prototype given in (4). Unfortunately this leads to ringing and the Gibbs phenomenon, which can be visually disturbing in the end result. A better way is to taper the filter coefficients gradually to zero with a windowing function w(k), as shown below:

    h(k) = h~(k) w(k),  -(N-1)/2 <= k <= (N-1)/2.    (5)
A number of windows have been shown to have good properties in handling the Gibbs phenomenon. Here we use a generalized Hamming window given by

    wH(k) = γ + (1 - γ) cos[2 pi k / N],  -(N-1)/2 <= k <= (N-1)/2
          = 0,  otherwise    (6)
where γ is in the range 0 <= γ <= 1. If γ = 0.54, the window is called a Hamming Window, and if γ = 0.5, it is called a Hanning Window. The filter coefficients given by equations (4), (5), and (6) are still real numbers. They need to be quantized to digital numbers to operate in a digital filter. As far as you, the reader, are concerned, you do not need to worry about the above equations and theoretical stuff. We give you all of this in the form of tables which you can readily use without losing sleep. Though most scalers use an odd number of filter taps, here we use an even number of filter taps. The reason is that odd-numbered filter taps generate phase switching noise (unless they are big enough), and we suggest looking into the nicer solution of even-numbered filter tap designs.
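A sketch of how tables like these can be generated from equations (4), (5), and (6): a windowed-sinc tap set for each of the M quantized phases, scaled to 12-bit integers. The function name, the default parameter values, and the normalization (scaling each phase so its taps sum to 4095, matching the divide-by-4095 used later) are our illustrative choices, not the exact procedure behind the original tables.

```python
import math

def polyphase_taps(n_taps=6, M=16, gamma=0.54, coef_bits=12):
    """Windowed-sinc polyphase coefficient tables: M phases x n_taps taps."""
    half = n_taps // 2
    tables = []
    for rho in range(M):                      # one filter per quantized phase
        taps = []
        for n in range(-half, half):          # even number of taps
            x = n + rho / M                   # eq. (4): h(nM + rho) = sinc(n + rho/M)
            sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
            # eq. (6): generalized Hamming window across the tap span
            w = gamma + (1 - gamma) * math.cos(2 * math.pi * x / n_taps)
            taps.append(sinc * w)
        scale = (2 ** coef_bits - 1) / sum(taps)   # quantize to 12-bit integers
        tables.append([round(t * scale) for t in taps])
    return tables

tables = polyphase_taps()
print(tables[0])   # phase 0: pass-through, all weight on the center tap
```

Note that phase 0 degenerates to a single 4095 coefficient at the center tap, i.e. a pure copy of the F-center pixel, as expected.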
Mirroring Pixels

Fig.5: A source row ABCDEFGHIJKL... is extended past the image edge with a mirrored border, so the left edge reads FEDCBA | ABCDEFGHIJKL...

Since images have finite support (that is, finite width and height), employing filtering is a little tricky. To alleviate the "border" problems, an artificial border is created around the source image by mirroring the pixels both column-wise and row-wise. With mirroring, filtering can start right at the image edge and stop at the corresponding edge on the other side. Each row of pixels is extended as shown in fig.5, as if there were a mirror right on the border to reflect the pixels. The width of the border is related to the half-length of the filter used. Fig.5 illustrates the case when a 6-pixel border is created with mirroring.

What happens if you do not do mirroring? Nothing, except that you end up with a black border on your destination image!

Implementing the Interpolating Filter

First let us see how to design a Finite Impulse Response filter for scaling. The theory of the filtering was covered in a previous section. The section "Tables" gives a set of filters that can be used in practice. Here let us construct a simple 6-tap filter system (see table 8). The next figure illustrates how a filter works on the source image to produce the content of the destination image.
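The mirrored border of fig.5 can be sketched for a single row as follows; the function name is ours, and the same reflection would be applied column-wise for the vertical pass.

```python
def mirror_row(row, border):
    """Reflect `border` pixels at each end of a row, as if a mirror sat
    on the image border (fig.5: ABCDEF... gains FEDCBA on its left)."""
    left = row[:border][::-1]      # first `border` pixels, reversed
    right = row[-border:][::-1]    # last `border` pixels, reversed
    return left + row + right

print("".join(mirror_row(list("ABCDEFGHIJKL"), 6)))
# → FEDCBAABCDEFGHIJKLLKJIHG
```

With a 6-pixel border, a 6-tap filter can be centered on the very first and very last true pixels without reading outside the buffer.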
The table chosen has 6 filter coefficients for each quantized value of the fractional part f^c(i) = Q(fc(i)). The integer part Fc(i) selects the center pixel in the source image, which will support the filtering in its immediate neighborhood. In the case of a 6-tap filter, the 2 pixels before and the 3 pixels after the center pixel, together with the center pixel itself, form the six pixels that will be used in the creation of the corresponding destination pixel, as shown in fig. 6.
Fig.6

Since the filter uses 12 bits of coefficient precision, dividing the sum (formed from the filter coefficients and the pixel values) by 4095 normalizes the product sum. This yields the destination pixel value shown in fig. 6. This type of Finite Impulse Response filtering is routinely done in DSP applications. Over the years, the hardware required for filtering has become almost standardized. A DSP engine is constructed from a multiplier, an adder, a shifter, and a coefficient RAM. Coefficients are first loaded into the RAM before operations begin. The multiplier typically has two registers for loading the multiplicands. After values are multiplied, they are added to the previous sum. Appropriate shifting is needed for normalization to avoid overflow.
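One destination pixel of the 6-tap filter can be sketched as a multiply-accumulate over the center pixel's neighborhood, normalized by 4095. The sample row and coefficient set are invented for illustration; a real run would pick the coefficients from the table indexed by the quantized fractional part.

```python
def filter_pixel(src, Fc, coefs):
    """6-tap FIR at F-center Fc: taps cover Fc-2 .. Fc+3.
    coefs are 12-bit integers that sum to 4095."""
    acc = 0
    for k in range(-2, 4):
        acc += coefs[k + 2] * src[Fc + k]   # multiply-accumulate
    return acc // 4095                      # normalize the product sum

row = [12, 14, 16, 18, 20, 22, 24, 26]      # a mirrored source row (sample)
coefs = [0, 0, 4095, 0, 0, 0]               # phase-0 filter: copy the center
print(filter_pixel(row, 3, coefs))          # → 18
```

In hardware this loop is exactly the DSP engine described above: one multiplier, one accumulating adder, and a final shift/divide for normalization.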
Putting It All Together

Now is the time to assemble all the previous knowledge and put it to work. Fig. 7 shows the micro architecture of the scaler. It uses all the familiar digital building blocks like multipliers, adders, and shifters. Details of clocking are not shown in fig. 7 for the sake of clarity and brevity.

Fig.7: Video Scaler Micro Architecture. The 16-bit Fractional Register (FR) feeds 16-bit Adder 1; Shifter 1 keeps log M bits of Adder 1's output as the address into the Coefficient RAM, and Adder 1's overflow, together with the 16-bit Integer Register (IR), feeds 16-bit Adder 2, whose output addresses the mirrored source image frame buffer. The DSP engine comprises the A Register and B Register (16 bits each), a 16-bit multiplier with 32-bit output, the 32-bit Adder 3 fed back on itself, and Shifter 2, which delivers 8-bit results to the destination image frame buffer.
First, going back to our notation, we need to compute the scaling factor Sx = dx / sx. It must be computed by another entity controlling the scaler. In fact, what we need is just the reciprocal of the scaling factor: we separate the integer and fractional parts of the reciprocal,

    1/Sx = [[ 1/Sx ]] + (( 1/Sx )).

Why do we do this? Our goal is to compute i/Sx. This can be done by a pair of adders if we separate the fractional and integer parts and add them up as we increase i. Note that

    (i + 1)/Sx = i/Sx + 1/Sx.

The integer part is already a whole number. We can
load this integer part into the 16-bit Integer Register IR. The fractional part is a real number; it is multiplied by 65536 to make it into a 16-bit number and is then loaded into the Fractional Register FR. These loadings are done as a part of initialization. The coefficient RAM also needs to be filled with the correct numbers before the scaling operations begin.

Note that the content of the Fractional Register is also loaded into 16-bit Adder 1. Similarly, the content of the Integer Register is loaded into 16-bit Adder 2. This loading of Adder 1 and Adder 2 will be repeated for every pixel in the destination image. In addition, note that Adder 1 feeds back into itself while supplying its output as an address to the Coefficient RAM. To get the correct address, we need Shifter 1 to drop all bits except what is needed to select the right filter coefficients. Hence, if we use an M-level quantization of the fractional part, then we just need log2(M) bits for the address in selecting the correct filter coefficients in the coefficient RAM.

Since Adder 1 feeds back its output into its input, it will generate the overflow, which will be added as input to Adder 2. Adder 2 has its output pointing to an address of the source pixel being operated on. If the scaling is being performed in the X direction, then the Adder 2 output will be a number representing the column of the pixel. If the scaling is done in the Y direction, this will be the row number of the pixel.

How does the interpolation work? First we need to load the pixel value from the mirrored source image buffer into the A Register of the DSP engine shown in fig. 7. The corresponding filter coefficient is taken from the coefficient RAM and loaded into the B Register of the DSP. The DSP engine consists of two 16-bit registers (called A Register and B Register), a 16-bit full multiplier with 32-bit output, and a 32-bit adder called Adder 3. The output of Adder 3 is fed back to its input. When the computations are completed, Shifter 2 scales the output to 8 bits and sends it to the destination location. The clocks governing the DSP engine run p times faster than the clock governing Adder 1, Adder 2, and Shifter 1, where p is the number of filter taps used.

The rest is, as the saying goes, elementary.
Finally

We have the theory. We have the architecture. We have the tables. The next thing is to put them to good use. To help in the design process, we have a Video Scaler Design Tool, shown in fig. 8. This tool allows one to change all the pertinent parameters and arrive at a useful design that satisfies the design goal.
Fig.8

Figure 8 shows the GUI of the tool. It accepts BMP files and scales the input on the fly. The first combo box allows the selection of M (Inter Pixel Quantization) in bits. The second combo box offers a selection of the number of bits used in representing each filter coefficient. The third combo box selects the number of filter taps. The fourth one selects the window. When the "Input File" button is clicked, a pop-up window helps in choosing an input BMP file. A thumbnail of the chosen file appears in the small picture box below. By operating the sliders around the picture box, one can scale the input in both the X and Y directions.