Introduction to Video Encoding
Preben N. Olsen
University of Oslo and Simula Research Laboratory
August 26, 2013
Agenda
1 Introduction
  • Repetition
  • History
  • Quality Assessment
  • Containers
2 Video Encoding Fundamentals
  • Macroblocks
  • Frames
  • Prediction Modes
  • Motion Compensation
  • Parallel Encoding
Repetition
From the first lecture...
• Media Compression
  • Raw data is inconvenient, very large file sizes
  • Compression reduces bandwidth and storage costs
• Image Representation
  • Number of pixels, e.g., 1920 × 1080
  • Color representation per pixel
• YUV (YCbCr) Color Space
  • Y is the luma component (light intensity)
  • U is a chroma component (color)
  • V is a chroma component (color)
  • Reduce file size by chroma sub-sampling
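A minimal sketch of chroma sub-sampling, assuming the common 4:2:0 scheme, 8-bit samples, and a plane stored as a list of rows. The names `subsample_420` and `bits_per_pixel_420` are illustrative and not from the lecture. Each 2 × 2 group of chroma samples is replaced by its rounded average, so U and V shrink to a quarter of their size while Y stays at full resolution.

```python
def subsample_420(plane):
    """Average each 2x2 group of chroma samples (height and width must be even)."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def bits_per_pixel_420(bit_depth=8):
    """Y at full resolution plus U and V at quarter resolution each."""
    return bit_depth * (1 + 0.25 + 0.25)


u_plane = [
    [100, 102, 50, 52],
    [104, 106, 54, 56],
]
print(subsample_420(u_plane))   # [[103, 53]]
print(bits_per_pixel_420())     # 12.0
```

Under these assumptions a pixel costs 12 bits on average instead of 24, which halves the raw frame size before any real compression is applied.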
Repetition
Figure: RGB and CMYK [1]
Repetition
Figure: YUV Dissected, original [2]
Repetition
Full HD YUV frame size...
1920 × 1080 × 24 bits ≈ 5.9 MB
Repetition
Why not (buy and) download The Hobbit in full YUV format?
5.9 MB × 48 FPS × (162 × 60) seconds ≈ 2.6 TB
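The two size estimates can be reproduced with a few lines of Python. This is only a sketch of the arithmetic on the slides; the variable names are illustrative, and the results match the slide figures when MB and TB are read as binary units (MiB and TiB).

```python
WIDTH, HEIGHT = 1920, 1080
BITS_PER_PIXEL = 24          # 8 bits each for Y, U and V, no sub-sampling
FPS = 48
DURATION_S = 162 * 60        # running time used on the slide, in seconds

frame_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
movie_bytes = frame_bytes * FPS * DURATION_S

print(f"frame: {frame_bytes / 2**20:.1f} MiB")   # frame: 5.9 MiB
print(f"movie: {movie_bytes / 2**40:.1f} TiB")   # movie: 2.6 TiB
```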
Repetition
Figure: YUV Data Layout, original [3]
Repetition
Figure: JPG Block Diagram [1]
Repetition
• JPEG is short for Joint Photographic Experts Group
• There's a trade-off between size and quality in JPEG images
• A compression ratio of 1:10 gives a reasonable result
• Lossless JPEG encoding yields a compression ratio of approximately 1:1.6
History
• MPEG is short for Moving Picture Experts Group
• Industry together with ISO and ITU develops standards
• MPEG-1 started in 1988, released in 1993
• MPEG-2 started in 1990, released in 1996
History
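• MPEG-3 was to include support for HDTV (1080p), but was abandoned when MPEG-2 was extended to cover it
• MPEG-4 started in 1998, released between 1999-...
• Part 2 of MPEG-4 is based on H.263 and defines the Advanced Simple Profile (ASP)
• Sometimes referred to as DivX or Xvid, which are codec implementations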
History
• Part 10 of MPEG-4 defines H.264, introduced in 2003
• Roughly twice the compression of MPEG-4 ASP at the same quality
• Used by Blu-ray, RiksTV, YouTube, and many others
• Sometimes referred to as x264, which is actually an open-source encoder implementation
• This codec has 17 different profiles
History
• High Efficiency Video Coding (HEVC), or H.265, started in 2004
• HEVC has better compression at the same level of quality
• Released to the public on June 7th, 2013 [4]
• Supports Ultra High Definition TV (UHDTV), 7680 × 4320
Money and Politics
• Patent pool created by MPEG-LA
• About 1,500 patents related to H.264
• Incentive for large, global companies
History
• Google bought On2, which initially developed VP8
• VP8 specification released with an open-source implementation in 2010
• Supported by many browsers and mobile platforms
• Ongoing development on VP9, an HEVC competitor
Quality Assessment
• Assessing video quality is difficult
• A group of people rate which version is best
• People have different opinions on quality
• Objective measurements can give an estimate
• Peak Signal-to-Noise Ratio (PSNR)
• A shell script for computing PSNR can be found in the MPlayer source tree
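A minimal sketch of PSNR, assuming 8-bit samples (peak value 255) and frames flattened into equally long sequences. It uses the standard definition PSNR = 10 · log10(MAX² / MSE); the function name `psnr` is illustrative and is not the MPlayer script mentioned above.

```python
import math


def psnr(original, distorted, max_value=255):
    """PSNR in dB between two equally sized sample sequences."""
    if len(original) != len(distorted):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf          # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


ref = [52, 55, 61, 66]
dec = [54, 55, 60, 64]
print(round(psnr(ref, dec), 2))   # 44.61
```

A higher value means the decoded samples are closer to the original, but PSNR is only an estimate and does not always agree with how people rate the result.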
Containers
• File containers are not codecs
• Video codecs are used for encoding and decoding bitstreams
• Containers are used for packaging bitstreams
• Examples include Audio Video Interleave (AVI), Matroska (MKV), Video Object (VOB), and Ogg
Video Encoding Fundamentals
Figure: Overview of H.264/VP8
Macroblocks
Figure: Missing macroblocks [5]
Macroblocks
• Different macroblock types and sizes
• 16 × 16 pixels, which can be subdivided down to 4 × 4 blocks
• Intra-, predicted-, and bi-directional predicted macroblocks
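A minimal sketch of how a frame is divided into 16 × 16 macroblocks, assuming a single plane stored as a list of rows. The helper names are illustrative; a real encoder would also pad the frame so that partial macroblocks at the edges are filled.

```python
MB_SIZE = 16


def macroblock_grid(width, height, mb_size=MB_SIZE):
    """Number of macroblock columns and rows, rounding up at the frame edges."""
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols, rows


def extract_block(frame, mb_x, mb_y, size=MB_SIZE):
    """Return the size x size block at macroblock coordinates (mb_x, mb_y)."""
    y0, x0 = mb_y * size, mb_x * size
    return [row[x0:x0 + size] for row in frame[y0:y0 + size]]


cols, rows = macroblock_grid(1920, 1080)
print(cols, rows, cols * rows)   # 120 68 8160
```

Note that 1080 is not a multiple of 16, so the last macroblock row of a Full HD frame is only half covered by picture data.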
Macroblocks
Figure: Foreman and macroblocks
Frames
• Also different frame types
• Usually intra-predicted frames, predicted frames, and bi-directional predicted frames
• VP8 does not have bi-directional frames, but alt-ref and golden frames
Figure: Different frames [6]
Frames
Prediction type
1 Intra-prediction
2 Inter-prediction
3 Bi-directional
Predict the pixels of a macroblock using information available within a single frame.
Typically predicts from the left, top, and top-left macroblocks by interpolating or extrapolating the border pixels' values.
Different prediction modes are available, e.g., horizontal, vertical, and average.
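A minimal sketch of the three intra-prediction modes named above, assuming a 4 × 4 block and already reconstructed border pixels to the top and left. The function names and the use of SAD to pick the winning mode are illustrative choices, not the exact procedure of H.264 or VP8.

```python
def predict_horizontal(left, size):
    """Each row repeats the reconstructed pixel to its left."""
    return [[left[y]] * size for y in range(size)]


def predict_vertical(top, size):
    """Each column repeats the reconstructed pixel above it."""
    return [list(top[:size]) for _ in range(size)]


def predict_average(top, left, size):
    """Every pixel is the rounded mean of the top and left border pixels."""
    total = sum(top[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)
    return [[dc] * size for _ in range(size)]


def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40]] * 4      # vertical stripes

candidates = {
    "horizontal": predict_horizontal(left, 4),
    "vertical": predict_vertical(top, 4),
    "average": predict_average(top, left, 4),
}
best = min(candidates, key=lambda name: sad(block, candidates[name]))
print(best)   # vertical
```

For this striped block the vertical mode reproduces the pixels exactly, so only the mode itself would need to be signalled.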
Frames
Prediction type
1 Intra-prediction
2 Inter-prediction
3 Bi-directional
Predict a macroblock by reusing pixels from another frame. Objects tend to move around in a video, and motion vectors are used to compensate for this.
H.264 allows up to 16 reference frames, while VP8 only supports 3 frames.
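A minimal sketch of inter-prediction with a motion vector, assuming integer-pixel vectors that stay inside the reference frame (no sub-pixel interpolation or edge handling). All names are illustrative. The encoder would transmit the motion vector plus the residual instead of the pixels themselves.

```python
def fetch_block(frame, x, y, size):
    """Return the size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def motion_compensate(reference, x, y, mv, size):
    """Predicted block for position (x, y), displaced by mv = (dx, dy)."""
    dx, dy = mv
    return fetch_block(reference, x + dx, y + dy, size)


def residual(current_block, predicted_block):
    """Per-pixel difference that remains after prediction."""
    return [[c - p for c, p in zip(rc, rp)]
            for rc, rp in zip(current_block, predicted_block)]


# 8x8 reference frame with a bright 2x2 object at (2, 2);
# in the current frame the object has moved one pixel to the right.
reference = [[0] * 8 for _ in range(8)]
current = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (2, 3):
        reference[y][x] = 200
        current[y][x + 1] = 200

cur_block = fetch_block(current, 2, 2, 4)
pred = motion_compensate(reference, 2, 2, (-1, 0), 4)
print(residual(cur_block, pred))   # every entry is 0
```

With the zero vector the residual would contain values of ±200, which is far more expensive to code than a single motion vector.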
Frames
Prediction type
1 Intra-prediction
2 Inter-prediction
3 Bi-directional
Predict the pixels of a macroblock using information available in other frames, both previous and upcoming frames; that is, going back and forward in time.
Can reference every type of frame, including other bi-directional predicted frames.
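A minimal sketch of bi-directional prediction, assuming the two motion-compensated predictions are simply averaged. The slides do not specify how the two references are combined, so the averaging and the name `bi_predict` are assumptions for illustration.

```python
def bi_predict(past_block, future_block):
    """Rounded per-pixel average of a past and a future prediction."""
    return [
        [(p + f + 1) // 2 for p, f in zip(row_p, row_f)]
        for row_p, row_f in zip(past_block, future_block)
    ]


# A fade: the pixel values rise linearly over three frames.
past = [[100, 100], [100, 100]]
current = [[110, 110], [110, 110]]
future = [[120, 120], [120, 120]]

pred = bi_predict(past, future)
print(pred)   # [[110, 110], [110, 110]]
```

In this fade neither the past nor the future frame alone matches the current one, but their average does, which is the kind of case where looking in both directions pays off.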
Determining Prediction Modes
• The motion estimator tries many modes
• Different blocks are evaluated
• Two-step process, initial and refinement
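A minimal sketch of a two-step motion search, assuming a coarse search on every second position followed by a single-pixel refinement around the coarse winner, with SAD as the cost. The step sizes, search range, and function names are assumptions for illustration; real encoders use their own search patterns.

```python
def block_sad(cur, ref, cx, cy, rx, ry, size):
    """SAD between the block at (cx, cy) in cur and the block at (rx, ry) in ref."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size) for i in range(size)
    )


def search(cur, ref, cx, cy, size, candidates):
    """Return (cost, (dx, dy)) for the cheapest in-bounds candidate vector."""
    h, w = len(ref), len(ref[0])
    best = None
    for dx, dy in candidates:
        rx, ry = cx + dx, cy + dy
        if 0 <= rx <= w - size and 0 <= ry <= h - size:
            cost = block_sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


def two_step_search(cur, ref, cx, cy, size=4, search_range=4):
    # Step 1: coarse search on every second position.
    steps = range(-search_range, search_range + 1, 2)
    coarse = [(dx, dy) for dy in steps for dx in steps]
    _, (bx, by) = search(cur, ref, cx, cy, size, coarse)
    # Step 2: refine around the coarse winner with single-pixel steps.
    fine = [(bx + dx, by + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return search(cur, ref, cx, cy, size, fine)


def frame_with_object(ox, oy, size=16):
    """Black frame with a 4x4 object whose top-left corner is (ox, oy)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(oy, oy + 4):
        for x in range(ox, ox + 4):
            frame[y][x] = 100
    return frame


# The object sits at (6, 6) in the current frame and at (9, 7) in the
# reference frame, so the true motion vector is (3, 1).
cur = frame_with_object(6, 6)
ref = frame_with_object(9, 7)
print(two_step_search(cur, ref, 6, 6))   # (0, (3, 1))
```

The coarse step evaluates 25 candidates and the refinement 9 more, compared with 81 for an exhaustive search over the same range.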
Some Cost Functions
• Mean square error (MSE)
• Sum of Absolute Differences (SAD)
• Sum of Absolute Transformed Differences (SATD)
• SATD is more accurate than SAD
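A minimal sketch of the three cost functions for 4 × 4 blocks, assuming SATD is computed with an unnormalized 4 × 4 Hadamard transform (encoders typically apply a scaling factor, which is omitted here). The helper names are illustrative.

```python
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def diff(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def mse(a, b):
    d = diff(a, b)
    return sum(v * v for row in d for v in row) / (len(d) * len(d[0]))


def sad(a, b):
    return sum(abs(v) for row in diff(a, b) for v in row)


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def satd4x4(a, b):
    """Sum of absolute 4x4 Hadamard-transformed differences (unnormalized)."""
    d = diff(a, b)
    t = matmul(matmul(H4, d), H4)   # H4 is symmetric, so it equals its transpose
    return sum(abs(v) for row in t for v in row)


base = [[50] * 4 for _ in range(4)]
flat = [[51] * 4 for _ in range(4)]     # every pixel off by 1
spike = [row[:] for row in base]
spike[0][0] = 66                        # one pixel off by 16

print(sad(flat, base), sad(spike, base))           # 16 16
print(satd4x4(flat, base), satd4x4(spike, base))   # 16 256
```

SAD rates both errors the same, while SATD rates the single spike as far more expensive. Because the residual is transformed before it is coded, a cost measured in the transform domain tends to track the final bit cost more closely, which is one way to read the claim that SATD is more accurate than SAD.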
Motion Compensation
• With the best motion vector, a predicted block is generated
• The original reference frame cannot be used directly as input to the motion compensator, as the decoder never sees the original image
• The decoder "sees" a reconstructed image, i.e., an image with loss
• A reconstructed reference image must be used as input
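A minimal sketch of why the encoder must predict from the reconstructed image, assuming a toy codec where each frame is predicted from the previous one and the residual is quantized with a hypothetical step size `QSTEP`. All names are illustrative. When the encoder predicts from the original instead, its reference drifts away from what the decoder actually has.

```python
QSTEP = 8  # hypothetical quantizer step size


def quantize(residual):
    return [round(r / QSTEP) for r in residual]


def dequantize(levels):
    return [level * QSTEP for level in levels]


def encode(frames, use_reconstructed=True):
    """Yield quantized residuals; the prediction is simply the previous frame."""
    reference = [0] * len(frames[0])
    for frame in frames:
        levels = quantize([c - p for c, p in zip(frame, reference)])
        yield levels
        if use_reconstructed:
            # Mirror the decoder: rebuild the frame from the lossy residual.
            reference = [p + r for p, r in zip(reference, dequantize(levels))]
        else:
            reference = frame   # the original, which the decoder never sees


def decode(stream, width):
    reference = [0] * width
    for levels in stream:
        reference = [p + r for p, r in zip(reference, dequantize(levels))]
        yield reference


def max_error(frames, use_reconstructed):
    decoded = list(decode(encode(frames, use_reconstructed), len(frames[0])))
    return max(abs(a - b) for f, d in zip(frames, decoded) for a, b in zip(f, d))


# One pixel that gets 3 brighter in each of 10 frames.
frames = [[3 * (i + 1)] for i in range(10)]
print(max_error(frames, True), max_error(frames, False))   # 4 30
```

With a reconstructed reference the error stays within half a quantizer step; with the original as reference every small residual is rounded away and the error accumulates frame by frame.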
Parallel Encoding
• Approaches available both for intra- and inter-prediction
• Some give up compression efficiency for increased parallelism
• A pipeline approach shouldn't be combined with real-time requirements, since pipelining adds latency
Parallel Encoding
What should be optimized?
Figure: VP8 profiling
Parallel Encoding
Figure: Group of Pictures [6]
Parallel Encoding
Figure: Slice-based approach
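A minimal sketch of the slice-based approach, assuming a frame is split into contiguous groups of macroblock rows that are encoded independently. `encode_slice` is a stand-in that only sums pixels, and Python threads are used purely to show the structure; a real encoder would run native code per slice.

```python
from concurrent.futures import ThreadPoolExecutor

MB_SIZE = 16


def split_into_slices(mb_rows, num_slices):
    """Split macroblock rows 0..mb_rows-1 into contiguous, near-equal slices."""
    base, extra = divmod(mb_rows, num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        count = base + (1 if i < extra else 0)
        slices.append(range(start, start + count))
        start += count
    return slices


def encode_slice(frame, rows):
    """Stand-in for a real slice encoder: sums the pixels in the slice."""
    return sum(
        sum(frame[y])
        for r in rows
        for y in range(r * MB_SIZE, min((r + 1) * MB_SIZE, len(frame)))
    )


def encode_frame_parallel(frame, num_slices=4):
    mb_rows = (len(frame) + MB_SIZE - 1) // MB_SIZE
    slices = split_into_slices(mb_rows, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        return list(pool.map(lambda rows: encode_slice(frame, rows), slices))


frame = [[1] * 64 for _ in range(72)]    # 72 pixel rows, i.e. 5 macroblock rows
print(encode_frame_parallel(frame))      # [2048, 1024, 1024, 512]
```

Slices can run in parallel because prediction does not cross slice boundaries, which is also why this approach gives up some compression efficiency.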
Conclusion
• Video encoding is mainly about trying (and failing) different prediction modes, limited by user-defined restrictions (resource usage)
• The "actual" encoding of the video, when the parameters are known, usually accounts for a small percentage of the running time
• Any (reasonable) codec can produce the desired video quality; what differs between them is the size of the output bitstream they produce
The End
References
[1] Video & Image Compression Techniques: Image Coding Fundamentals. http://goo.gl/6fCK7N
[2] Wikipedia: YUV. http://en.wikipedia.org/wiki/Yuv
[3] Any To YUV: Documentation. http://any2yuv.sourceforge.net/Docs
[4] H.265: High efficiency video coding. http://www.itu.int/rec/T-REC-H.265
[5] BitBlit.Org. http://www.bitblit.org/gsoc/g3dvl/
[6] GOP (Group of Pictures). http://goo.gl/83D7Hz