
An Overview of Modern Video Forgery Detection: A Literature Review

Kumara M.P.T.R., Fernando W.M.S., Perera J.M.C.U., Philips C.H.C., Jayawardane A.L.H.S.
Department of Computer Science and Engineering

University of Moratuwa, Sri Lanka


Table of Contents

1. Introduction
2. Literature survey
   2.1 Double MPEG compression
      2.1.1 Intra Coding Principle
      2.1.2 Inter Coding Principle
      2.1.3 Detecting static artifacts in a double compressed MPEG video
      2.1.4 Detecting temporal artifacts in a double compressed MPEG video
         2.1.4.1 Discrete Cosine Transform and Support Vector Machine
         2.1.4.2 Double Compression Detection Algorithm
            2.1.4.2.1 First digit distribution in MPEG video
   2.2 Detecting Duplication
      2.2.1 Detecting duplicated frames
      2.2.2 Detecting duplicated regions across frames
   2.3 Extending image forgery detection techniques for videos
      2.3.1 An approach for JPEG resize and image splicing detection
      2.3.2 JPEG compression analysis and algorithm for forgery detection
      2.3.3 Method based on directional filter using JPEG image analysis
   2.4 Combing artifact in screen shots
   2.5 Multimodal feature fusion
   2.6 Detecting Video Forgery by Ghost Shadow Artifact
      2.6.1 Video Inpainting
      2.6.2 Ghost Shadow Artifact
      2.6.3 Detecting Video Forgery
3. Conclusion
4. References


1. Introduction

Video data have become increasingly common with the advancement of digital cameras and high-bandwidth networking technologies. As a result, many systems make use of video data and rely on its accuracy. An inevitable adverse effect of this reliance is video forgery. Many software tools available on the internet facilitate video editing; with these resources, editing has become so easy that even novices can produce an edited video stream within minutes. This introduces serious security concerns, so detecting video forgery has become a critical requirement for ensuring the integrity of video data.

There are two major approaches to protecting video data against tampering: active and passive methods. Traditionally, active protection methods such as digital signatures and watermarking were used to maintain integrity and authenticate video data. The problem with these methods is that the situations in which they can be applied are greatly limited; they also reduce video quality and may require specialized hardware [1]. More practical passive techniques therefore detect forgeries in video data without relying on previously embedded information such as digital signatures. In these passive methods, statistical properties of the video data are used to assess its validity. The overall objective of video forgery detection is thus to verify that video data have not been changed after the recording time. The ideas behind video forgery detection have evolved considerably, and in this literature survey these methods are assessed based on their applicability and how well each method fits a given situation.


2. Literature survey

As described in the introduction, various video forgery methods exploit a wide range of tampering techniques. Some of the most common techniques are,

● Inpainting - image inpainting, video inpainting and motion inpainting (inserting or removing frames from the actual scene)

● Object computing (motion estimation, motion compensation and motion tracking)

● Frame and region duplication

Video forgery refers to the finishing techniques applied after inpainting and object computing, which integrate those manipulations into a high-quality forged video. Methods have been developed to identify such tampering, and these techniques are discussed throughout this survey.

2.1 Double MPEG compression

Digital videos are usually encoded with an MPEG-x or H.26x coding standard. An MPEG video is organized into a hierarchy of layers, as follows.

Figure 1. The hierarchy of MPEG video

The algorithm and corresponding experiments in the reviewed work focus on MPEG-2 Constant Bit Rate (CBR) video, but they are also applicable to MPEG-1. There are two coding principles to be discussed:

1. Intra coding principle
2. Inter coding principle


2.1.1 Intra Coding Principle

When an image is compressed without reference to any other images, the time axis naturally does not come into play; this type of compression is therefore referred to as "intra-coded" compression [16]. Intra coding involves three technologies:

1. Discrete Cosine Transform (DCT)
2. Quantization
3. Entropy coding (variable-length coding)

The DCT introduced above is used to achieve energy concentration and de-correlation of the signal. The low-frequency DCT coefficients, including the Direct Current (DC) coefficient, are much more significant than the high-frequency coefficients, which are almost zero. Because of this energy concentration, the higher-frequency DCT coefficients can be quantized more coarsely. The values in the quantization matrix are pre-scaled by multiplying them with a quantizer scale, which takes effect on a macroblock basis. The quantizer scale is an encoder parameter in Variable Bit Rate (VBR) video, while in CBR video it is changed adaptively by the output bit-rate controller. Each DCT coefficient is divided by the corresponding scaled quantization value, and entropy coding is then applied to the quantized DCT coefficients.

2.1.2 Inter Coding Principle

Double MPEG compression is widely exploited for video forgery detection. Here double compression means encoding the original video data twice: first by the recorder itself, and later by the tamperer after introducing changes to the video stream. This double compression introduces distinctive static and temporal statistical perturbations, which can be used as evidence of tampering.

Motion compensation is an algorithmic technique employed in video compression, including the MPEG-x standards. It describes the transformation from a reference picture to the current picture; the reference may be either a previous or a future picture. Compression efficiency improves when images can be synthesized from previously transmitted or stored images. For many frames of a movie, the only difference between one frame and the next is the result of the camera moving or an object in the frame moving, so much of the information that represents one frame is repeated in the next. Motion compensation exploits this fact.

Inter coding applies motion compensation to exploit this temporal redundancy. To identify redundancy, a spatial search is performed within a search range; if a sufficiently good match is found, the macroblock is inter coded, otherwise it is intra coded. After predicting a frame using motion compensation, the coder computes the residual error, which is then compressed and transmitted: the predicted macroblock is subtracted from the real macroblock, leaving a much simpler residual.


As in intra coding, the residual error is spatially coded. The difference is that the quantization matrix is flat, i.e., it has a constant value. Most of the quantized DCT coefficients are zero because, as described above, the residual error carries little information.

Figure 2: Schematic view of the double MPEG compression [1]

The MPEG compression process can be described as follows. When an MPEG video is produced from an original video stream, a coding sequence of three different frame types is formed [2], normally following a repeating pattern.

● Intra (I) frames - These are typically the highest-quality frames, and consequently the least compression is applied to them. In this type of frame, compression is achieved by removing spatial redundancies within a single frame, with no reference to neighboring frames. [2]

● Predictive (P) frames - These frames are generated as a result of removing temporal redundancies across the frames and predicting the motion of the current frame with respect to the preceding frames. [2]

● Bidirectionally predictive (B) frames - Like P frames, B frames are encoded with motion compensation, but unlike P frames they use both past and future I and P frames (preceding and succeeding frames) for prediction. B frames therefore achieve the highest compression level. [2][16]

As mentioned earlier, the I, P and B frames are packed into a repeating pattern called a Group Of Pictures (GOP). A GOP starts with an I frame and runs up to the next I frame; this defines the GOP length N. Within a GOP, the spacing between successive anchor (I or P) frames is denoted by M. Consider the following frame sequence as an example, with N = 12 and M = 3.

I1 B2 B3 P4 B5 B6 P7 B8 B9 P10 B11 B12 I13 B14 …

If someone tampers with this video and saves it again in MPEG format, this pattern is distorted, and the distortion can be recognized. It introduces both spatial and temporal artifacts into the resulting video stream.


Figure 3 illustrates how double MPEG compression introduces evidence of tampering into the video stream.

Figure 3: How the original GOP changes after double MPEG compression [2]

The topmost frame sequence is the original MPEG video stream. Assume that the three shaded frames are deleted from the video; the second row of frames shows the resulting sequence after re-encoding. The following sections describe how the resulting static and temporal artifacts are detected in a double compressed MPEG video.

2.1.3 Detecting static artifacts in a double compressed MPEG video

As mentioned earlier, I frames are based entirely on static compression: the amount of compression in an I frame does not depend on neighboring frames. The static artifacts of double compression are therefore introduced by the double compression of the I frames. Further details about this process and double quantization are described in section 2.3.


2.1.4 Detecting temporal artifacts in a double compressed MPEG video

P and B frames are responsible for introducing temporal artifacts into a double compressed MPEG video, because the amount of compression in P and B frames depends on the neighboring frames. When a sequence of frames is altered, removed from or added to the original frame sequence, the existing P and B frames are also altered as a result of this cumulative compression procedure. It is noteworthy that this temporal dependency on neighboring frames only holds within a single GOP, because every GOP is bounded by I frames whose compression is self-contained. Once a sequence of frames is altered and re-encoded, frames from different original GOPs are grouped together in the same new GOP. In the original video sequence, P frames are strongly correlated with the I frame of their GOP, but once P frames shift into other GOPs as a result of frame removal or addition, this correlation becomes weaker and the motion error increases [2]. When the motion errors are large enough, they can be identified by performing a Fourier-domain analysis on the video stream.

The advantage of this technique is that it detects both static and temporal errors in the video stream. Unlike the active video protection techniques, this method employs statistical analysis on video data. So it does not introduce any quality degradation.

On the other hand, this method has certain limitations. It fails to detect tampering if the number of removed frames is a multiple of the GOP length, because removing an entire GOP leaves no net change in the GOP structure. In addition, the methodology applies only to videos encoded with the MPEG compression scheme, although this limitation is not critical because MPEG is the most widely used encoding method. Moreover, a number of counter-measures can hide video tampering from this detector. Newer methods have been proposed to overcome these limitations; one of them exploits Markov statistics to detect double MPEG compression and has been shown to provide average detection accuracy above 90% [3]. Another method of detecting double MPEG compression is described below.

2.1.4.1. Discrete Cosine Transform and Support Vector Machine

Discrete Cosine Transform (DCT): the DCT expresses a finite sequence of data points as a sum of cosine functions with different frequencies, similar to a cosine Fourier series. The use of cosine rather than sine series is critical in these compression applications.

Double compression disturbs the DCT coefficients, which manifests as a violation of the parametric logarithmic law for the first-digit distribution of the quantized Alternating Current (AC) coefficients.


Support Vector Machine (SVM)

An SVM is a machine learning algorithm that learns by example to assign labels to objects. For instance, it can learn to detect fraudulent bank cheques by examining hundreds or thousands of fraudulent and genuine cheques.

2.1.4.2 Double Compression Detection Algorithm

2.1.4.2.1 First digit distribution in MPEG video

The paper [15] argues that it is reasonable to deduce that the first-digit distribution in MPEG has the same characteristics as in JPEG. Since the first-digit distribution has been shown to work well for JPEG double compression detection, the same is assumed for MPEG; this assumption can be supported with an example.

Parametric logarithmic law:

p(x) = N * log10( 1 + 1 / (s + x^q) ),   x = 1, 2, ..., 9

where x is the first digit, N is a normalization factor, and q ∈ [0.1, 3] and s ∈ [-1, 1] are the model parameters.
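To make this concrete, the sketch below shows one way such a fit could be computed in Python. It is a rough illustration rather than the reference implementation of [15]: the function names are my own, and the synthetic Laplacian coefficients merely stand in for real quantized AC coefficients extracted from I frames.

```python
import numpy as np
from scipy.optimize import curve_fit

def logarithmic_law(x, n, q, s):
    """Parametric logarithmic (generalized Benford) law for first digits x = 1..9."""
    return n * np.log10(1.0 + 1.0 / np.maximum(s + np.power(x, q), 1e-9))

def first_digit_distribution(coefficients):
    """Probability of each leading digit 1..9 among the non-zero quantized coefficients."""
    digits = np.array([int(str(abs(int(c)))[0]) for c in coefficients if int(c) != 0])
    return np.array([(digits == d).mean() for d in range(1, 10)])

def fit_statistics(probs):
    """Fit the law and return (SSE, RMSE, R-squared); a poor fit hints at double compression."""
    x = np.arange(1, 10, dtype=float)
    params, _ = curve_fit(logarithmic_law, x, probs, p0=[1.0, 1.0, 0.0], maxfev=10000)
    residuals = probs - logarithmic_law(x, *params)
    sse = float(np.sum(residuals ** 2))
    rmse = float(np.sqrt(sse / len(x)))
    r2 = 1.0 - sse / float(np.sum((probs - probs.mean()) ** 2))
    return sse, rmse, r2

# Synthetic stand-in for the quantized AC coefficients of an I frame.
rng = np.random.default_rng(0)
coefficients = np.rint(rng.laplace(scale=8.0, size=50_000))
print(fit_statistics(first_digit_distribution(coefficients)))
```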

Figure 4: Fitting results for the original MPEG video. (a) Intra (I) frame. (b) Non-intra (P) frame. [15]


Figure 5: Fitting results for a doubly compressed MPEG video. (a) Target bit rate larger than the original bit rate. (b) Target bit rate smaller than the original bit rate. [15]

Figure 6: Fitting results in log-log scale. The solid, dashed and dash-dotted lines stand for the original video, the doubly compressed video with larger bit rate, and the doubly compressed video with smaller bit rate, respectively. [15]

The first-digit distributions of the intra frames and the non-intra frames of an original MPEG video are shown in Figure 4. Both follow the parametric logarithmic law, but the distribution of the non-intra frame differs slightly: AC coefficients with first digit 1 have a larger proportion, while first digits 7, 8 and 9 have smaller proportions. Non-intra frames therefore may not follow the logarithmic law as well as intra frames, which is attributable to the inter-coded macroblocks in P and B frames. The fitting results for a doubly compressed MPEG video are shown in Figure 5. The violation of the law is large when the target bit rate of the second compression is higher than the original bit rate, and can be seen clearly with the naked eye.


The violation is less obvious when the target bit rate is smaller than the original bit rate; in that case the first-digit distribution still tends to follow the law. The results for that case are re-plotted on a log-log scale in Figure 6, which clearly shows that the doubly compressed MPEG video drifts away from the first-digit distribution of the original MPEG video [15]. From this point onwards, the algorithm proceeds as follows.

1) For both the query video and the training videos, the first-digit distribution of the quantized AC coefficients is extracted.
2) The first-digit distribution is tested against the parametric logarithmic law. Three goodness-of-fit statistics are calculated: the sum of squares due to error (SSE), the root mean squared error (RMSE) and R-square. SSE and RMSE closer to zero, and R-square closer to one, indicate a good fit.
3) The nine first-digit probabilities and the three goodness-of-fit statistics are combined into a 12-D feature. Only I frames are taken into consideration, because the fitting results for intra frames are better than those for non-intra frames.
4) Each GOP, with its 12-D feature, is treated as a detection unit, so the SVM classifier judges on a GOP basis. The GOP proportion D = M/N (where M is the number of GOPs judged as doubly compressed and N is the total number of GOPs) is then used as the overall detection score.

According to [16], current video and image forensic approaches fall into three categories:

● Source identification or camera sensor fingerprinting.● Image and video tampering detection.● Image and video hidden content detection and recovery.

This falls into image and video tampering detection category.

The methods used in this category include detection of cloned regions, analysis of feature variations (comparing the original and tampered image/video), inconsistencies in features, video splicing and cloning detection, inconsistencies in the acquisition process, structural inconsistencies introduced by targeted attacks, and lighting inconsistencies [16]. The method implemented in [15] falls under video splicing and cloning detection. According to [16], there are two ways to implement video splicing and cloning for MPEG-2:

● Decoding the MPEG-2 encoded file using a video editor and using the editor to edit the frames and then re-encode the modified video as a new MPEG-2 encoded file.

● Directly editing the information stored in the MPEG-2 encoded bit stream, i.e., .mpeg file, without going through an intermediate video editor.

The method described in [15] addresses the first category, which produces the double compression discussed above. The second approach is seldom used because the attacker cannot visualize the video frames while the MPEG-2 file is in its encoded, compressed form. The benefits of using these methods rather than other methods are:


● There is no need to analyze the hardware devices (cameras, voice recorders, etc.) used to record the video.

● Tracking inconsistencies such as lighting, image or voice inconsistencies can consume considerable resources and effort and may require expert experience in the field.

● Tampering that results in double compression is the most frequent case, so this method can detect a large share of forgeries.

● A large body of research has been done, and is ongoing, on double compression detection techniques.

● The method can be improved further using that knowledge; for example, analysis of video histograms that are indicative of double compression can strengthen the method described here.

● Human interaction is not required.

An important point is that this is not the only tampering mechanism, so if no double compression is detected, one cannot conclude that the video is not forged. Other analyses should therefore be carried out to rule out other types of forgery. Use of the first-digit distribution is a solid practice, but it can give misleading results; for example, when the target bit rate is smaller than the original bit rate, as shown in Figure 5(b), an inexperienced analyst may conclude that the video is not forged. It is therefore advisable to combine it with other methods rather than relying on the first-digit distribution alone.

One complementary technique is to take the Fast Fourier Transform (FFT) of the mean motion errors of the P-frames: for a doubly compressed video, the spectrum shows spikes when frames have been deleted or added, although this, too, is not always accurate. It is a more general technique than the first-digit distribution. The reason for this change in motion error is that the P-frames within a single GOP are correlated with the GOP's initial I-frame; these compression artifacts propagate through the P-frames because of motion-compensated encoding, so the motion error of each P-frame becomes correlated with its neighboring P-frames and I-frame [16]. This method also has limitations: the spikes are visible only as long as the number of deleted (or added) frames is not a multiple of the GOP length, so the technique fails if, for example, an entire GOP is deleted. According to [16], some encoders can also adaptively change the GOP length while encoding. A further limitation is that human inspection is needed to recognize the spikes. A forensic investigator therefore cannot be satisfied by this technique alone either.
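A minimal sketch of this FFT check is given below. It assumes that the mean prediction residual of each P-frame has already been exported from an MPEG decoder; the function names, the spike factor and the toy sequence are illustrative choices of mine, not taken from [2] or [16].

```python
import numpy as np

def motion_error_spectrum(mean_motion_errors):
    """Magnitude spectrum of the mean-removed P-frame motion error sequence."""
    errors = np.asarray(mean_motion_errors, dtype=float)
    errors = errors - errors.mean()              # remove the DC component
    return np.abs(np.fft.rfft(errors))

def has_periodic_spikes(spectrum, factor=4.0):
    """Crude indicator: any bin much larger than the median bin suggests tampering."""
    baseline = np.median(spectrum[1:]) + 1e-12
    return bool(np.any(spectrum[1:] > factor * baseline))

# Toy example: a clean error sequence vs. one with a periodic disturbance, as
# produced when deleted frames shift P-frames into foreign GOPs.
rng = np.random.default_rng(1)
clean = rng.normal(1.0, 0.05, 120)
tampered = clean.copy()
tampered[::4] += 0.5                             # every 4th P-frame has an inflated error
print(has_periodic_spikes(motion_error_spectrum(clean)),
      has_periodic_spikes(motion_error_spectrum(tampered)))
```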

Another relevant development is a counter-forensics algorithm, in which the attacker first constructs a target P-frame motion error sequence that is free of the temporal fingerprints and then selectively alters the video's predicted frames so that the motion error sequence of the tampered video matches the target. This is done by setting the motion vectors of certain macroblocks in the P-frames to zero and then recalculating the associated motion error values for the affected macroblocks [16]. Whether or not such counter-forensics are applied, the rate at which a forensic investigator detects tampered videos depends on two things:

● How carefully the attacker crafts the tampered video
● How sound the threshold is that the forensic investigator sets for detecting tampered videos

It is therefore recommended that a forensic investigator try all available methods and techniques for detecting video forgeries; no technique is 100% reliable, and it is not good practice to rely on a single technique alone.

2.2 Detecting Duplication

One very common way of altering video data is duplication. It can be used to remove people, objects and undesired events from a video sequence; it is relatively easy to perform and, when done with care, hard to detect. Over the years several methods have been developed to detect duplication, but their main problem is computational cost, since even a video of modest length can contain thousands of frames [8]. It is therefore important to develop duplication detection methods that are computationally efficient.

Detecting duplication can be divided into two parts:

● Detecting duplicated frames
● Detecting duplicated regions across frames

2.2.1 Detecting duplicated frames

The basic approach is first to divide the full-length video sequence into short overlapping subsequences. The temporal and spatial correlations within each subsequence are then computed and compared across the video, and similarities in these correlation values are used to detect duplication. When frames are represented in vector form, the correlation coefficient between two vectors u and v is given by the following equation:

c(u, v) = Σ_i (u_i − μ_u)(v_i − μ_v) / sqrt( Σ_i (u_i − μ_u)² · Σ_i (v_i − μ_v)² )

where u_i and v_i are the i-th elements of u and v, and μ_u and μ_v are the respective means of u and v [8].


For a video subsequence with n frames starting at time t, the subsequence can be denoted by

{ f_t, f_(t+1), ..., f_(t+n−1) }

where t denotes the sequence starting time.

The temporal correlation matrix can then be defined as the symmetric n × n matrix whose (i, j) element is the correlation coefficient between the i-th and j-th frames of the subsequence.

Similarly, a spatial correlation matrix can be computed by tiling each frame with m non-overlapping blocks; in this m × m matrix the (i, j) element gives the correlation coefficient between the i-th and j-th blocks.

In the first stage of the detection process, the temporal correlation matrices of all overlapping subsequences are computed, and then the correlation coefficient between pairs of these matrices is calculated. Values above a threshold close to 1 indicate possible duplication. In the next stage, the spatial correlation matrices of those candidate subsequences are computed and compared; again, a threshold close to 1 is used to confirm the duplication.
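The sketch below illustrates this first stage under simple assumptions of my own (window length, threshold and the toy frames are not taken from [8]); in practice the flagged pairs would still have to be confirmed by the spatial-correlation stage.

```python
import numpy as np

def corr(u, v):
    """Correlation coefficient between two arrays flattened to vectors."""
    u = u.ravel().astype(float) - u.mean()
    v = v.ravel().astype(float) - v.mean()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def temporal_matrix(frames):
    """n x n symmetric matrix of pairwise frame correlations for one subsequence."""
    n = len(frames)
    t = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            t[i, j] = t[j, i] = corr(frames[i], frames[j])
    return t

def candidate_duplications(video, length=10, threshold=0.99):
    """Start indices (i, j) of subsequences whose temporal matrices nearly coincide."""
    mats = [temporal_matrix(video[k:k + length])
            for k in range(len(video) - length + 1)]
    hits = []
    for i in range(len(mats)):
        for j in range(i + length, len(mats)):   # skip trivially overlapping windows
            if corr(mats[i], mats[j]) > threshold:
                hits.append((i, j))
    return hits

# Toy video: frames mix two fixed patterns in varying proportions; frames 40-49
# are then overwritten with copies of frames 5-14 to simulate duplication.
rng = np.random.default_rng(2)
p1, p2 = rng.standard_normal((2, 32, 32))
alpha = rng.random(60)
video = [a * p1 + (1 - a) * p2 + 0.2 * rng.standard_normal((32, 32)) for a in alpha]
video[40:50] = [f.copy() for f in video[5:15]]
print(candidate_duplications(video))             # expected to contain the pair (5, 40)
```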

2.2.2 Detecting duplicated regions across frames

Two cases of region duplication can be considered:

● Stationary camera
● Moving camera

In the case of a stationary camera, the normalized cross power spectrum of two frames is first defined as

P(ω_x, ω_y) = F_1(ω_x, ω_y) F_2*(ω_x, ω_y) / || F_1(ω_x, ω_y) F_2*(ω_x, ω_y) ||

where F_1 and F_2 are the Fourier transforms of the two frames, * denotes the complex conjugate and ||·|| is the complex magnitude.

In this case, a significant peak is expected at the origin (0, 0) compared to other positions; peaks at other positions can be used as indicators of duplication.
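A small sketch of this check is shown below; the helper names are hypothetical and the toy frames are synthetic, but the computation follows the normalized cross power spectrum defined above.

```python
import numpy as np

def phase_correlation(frame_a, frame_b):
    """Inverse FFT of the normalized cross power spectrum of two equal-size frames."""
    fa = np.fft.fft2(frame_a.astype(float))
    fb = np.fft.fft2(frame_b.astype(float))
    cross = fa * np.conj(fb)
    p = cross / (np.abs(cross) + 1e-12)          # normalized cross power spectrum
    return np.abs(np.fft.ifft2(p))

def off_origin_peak(frame_a, frame_b):
    """Location and height of the strongest peak away from the origin (0, 0)."""
    surface = phase_correlation(frame_a, frame_b)
    surface[0, 0] = 0.0                          # ignore the expected origin peak
    idx = np.unravel_index(np.argmax(surface), surface.shape)
    return idx, float(surface[idx])

# Toy example: frame_b repeats the content of frame_a shifted by (5, 12); the
# strong off-origin peak exposes the (wrapped) relative translation.
rng = np.random.default_rng(3)
frame_a = rng.random((64, 64))
frame_b = np.roll(frame_a, shift=(5, 12), axis=(0, 1))
print(off_origin_peak(frame_a, frame_b))
```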

In the case of a moving camera, a rough measure of the camera motion is computed to determine whether the fields of view of two frames are sufficiently different to ensure that they do not overlap. The camera motion can be approximated as a global translation [8].


One main advantage of the described methods is computational efficiency. In addition to their low computational cost, they can detect duplication in both high- and low-quality compressed videos. One disadvantage of frame duplication detection is that a stationary surveillance camera generally records a static scene, which is likely to generate many nearly identical frames that also produce correlation values close to the threshold; the method therefore cannot distinguish static frames from duplicated frames. The problem with region duplication detection is that it is not designed to be robust to geometric transformations [9].

2.3 Extending image forgery detection techniques for videos

Another popular approach to video forgery detection is to treat a video as a sequence of still images, so that image forgery detection techniques can be applied to detect video forgeries.

2.3.1 An approach for JPEG resize and image splicing detection

Forged images can be detected using the correlation of neighbouring discrete cosine transform (DCT) coefficients; this makes it possible to detect image resizing and image splicing. Studies have shown that the method is highly effective for detecting JPEG image resizing and splicing. One drawback is that its performance depends on the image complexity and the resize scaling factor.

Studies have shown that adaptively varying the two parameters of the generalized Gaussian distribution (GGD) [4] can achieve a good approximation of the probability density function (PDF). The PDF used in this method is

p(x) = ( β / (2 α Γ(1/β)) ) · exp( −(|x| / α)^β )

where α models the width of the PDF peak and the shape parameter β models the shape of the distribution. [13]

To study the dependency between the compressed DCT coefficients and their neighbours, the neighbouring joint distribution of the DCT coefficients is examined; a multivariate GGD model can give a good approximation of this joint probability density function, as described below.


p(x) = k · exp( −[ (x − μ)ᵀ Σ⁻¹ (x − μ) ]^β )

where k is a normalizing constant, Σ is the covariance matrix and μ is the expectation vector. [13]

Regarding the neighbouring joint probability density, if the left DCT coefficient is denoted by the random vector x1 and the right adjacent DCT coefficient by x2, we can define x = (x1, x2). If different parts of a JPEG image are found to have different resize histories, the image may be a tampered one.

Recent studies in JPEG steganalysis show that most information hiding techniques modify the neighbouring joint density of the DCT coefficients [7]. Based on those studies, the neighbouring DCT coefficient method assumes that when a JPEG image is resized or spliced, the neighbouring joint probability density of its DCT coefficients is affected. In this detection method, the neighbouring joint density probabilities are used as the detection features.
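The sketch below shows one way such a neighbouring joint density could be estimated. A grayscale input, a single fixed quantization step and a clipped coefficient range are simplifying assumptions on my part; [13] works with the actual JPEG quantization data.

```python
import numpy as np
from scipy.fft import dctn

def block_dct_coefficients(image, q=16):
    """Quantized 8x8 block DCT coefficients of a grayscale image (edges cropped)."""
    h, w = (d - d % 8 for d in image.shape)
    blocks = image[:h, :w].reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    return np.rint(dctn(blocks, axes=(-2, -1), norm='ortho') / q)

def neighbouring_joint_density(coeffs, radius=4):
    """Normalized 2-D histogram of horizontally adjacent coefficient pairs, clipped to [-radius, radius]."""
    flat = coeffs.reshape(-1, 8, 8)
    left = np.clip(flat[:, :, :-1], -radius, radius).ravel()
    right = np.clip(flat[:, :, 1:], -radius, radius).ravel()
    hist, _, _ = np.histogram2d(left, right, bins=2 * radius + 1,
                                range=[[-radius - 0.5, radius + 0.5]] * 2)
    return hist / hist.sum()

# The flattened histogram acts as the feature vector: resizing or splicing
# disturbs it, so a classifier is trained on genuine vs. tampered examples.
rng = np.random.default_rng(4)
image = rng.random((128, 128)) * 255
print(neighbouring_joint_density(block_dct_coefficients(image)).shape)   # (9, 9)
```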

Other existing methods to detect JPEG image resizing and splicing include:

● Image splicing detection based on bipolar signal perturbation
● Image splicing detection based on consistency checking of the camera characteristics of the image
● Image splicing detection based on statistical moment generating functions

Compared with the above methods, the DCT analysis of neighbouring coefficients is considerably more accurate at detecting spliced images. Its main disadvantage is that splicing detection performance is strongly affected by the degree of image compression.

2.3.2 JPEG compression analysis and algorithm for forgery detection

Another approach to image forgery detection is format-based forgery detection, which exploits properties of the file format itself. Format-based detection can be used for both copy-paste and copy-move forgeries. A crafty individual with unlimited time who wants to perfect an image forgery can produce an image in which the forgery is very hard to detect, and for such images this method might not work; however, most image forgeries made with simple tools are easily detected by format-based methods. Photo image forgery is classified into two categories:

● Copying one area of the image and pasting it into another area of the same image (Copy-Move forgery, or cloning)

● Copying areas from one or more other images and pasting them onto the image being forged (Copy-Create forgery)

Block-based image processing is a common and efficient way to process an image: the image is broken into equal square sub-blocks.


A simple algorithm based on analysing the JPEG compression blocks is given below. [14]

Figure 7: JPEG Image Compression Algorithm

The steps of this JPEG compression based detection algorithm are as follows.

Step 1: Divide the image into disjoint 8 x 8 compression blocks (i, j). For each 8 x 8 JPEG compression block (i, j) within bounds, sample the pixel values A, B, C and D around the block corner at (8i, 8j) (e.g., A = pixelValue(8i + 1, 8j), B = pixelValue(8i, 8j + 1), D = pixelValue(8i + 1, 8j + 1)).

Step 2: For each 8 x 8 JPEG compression block (i, j) within bounds, compute the difference measures D_right(i, j) and D_bottom(i, j) with respect to the blocks to the right and below.

Step 3: For each 8 x 8 JPEG compression block (i, j) within bounds:
   If D_right(i, j) or D_bottom(i, j) exceeds the threshold, set all pixel values in block (i, j) to white;
   else set all pixel values in block (i, j) to black.
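As an illustration of this block-wise flow, the sketch below paints blocks whose boundary differences to the right and bottom neighbours exceed a threshold; the specific difference measure and threshold value are stand-ins of mine, not the exact definitions from [14].

```python
import numpy as np

def blocking_artifact_map(gray, threshold=12.0):
    """Binary map: 255 where a block's boundary difference exceeds the threshold."""
    h, w = (d - d % 8 for d in gray.shape)
    gray = gray[:h, :w].astype(float)
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(0, h - 8, 8):
        for j in range(0, w - 8, 8):
            block = gray[i:i + 8, j:j + 8]
            d_right = np.abs(block[:, -1] - gray[i:i + 8, j + 8]).mean()
            d_bottom = np.abs(block[-1, :] - gray[i + 8, j:j + 8]).mean()
            if d_right > threshold or d_bottom > threshold:
                out[i:i + 8, j:j + 8] = 255
    return out

# Usage: feed a decoded (possibly re-saved) JPEG as a grayscale array; pasted
# regions whose 8x8 grid no longer lines up tend to light up in the map.
rng = np.random.default_rng(5)
gray = rng.integers(0, 256, (128, 128)).astype(float)
print(blocking_artifact_map(gray).mean())
```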

Here are the results of the JPEG compression based forgery detection.


Figure 8: Forgery Detection Result for different threshold values.

One drawback of this method is that it is applicable only to JPEG images. The method described below can detect image forgeries independently of the image format.

2.3.3 Method based on directional filter using JPEG image analysis

Forgery detection methods based on JPEG compression thresholds work only for the JPEG image format, which is a significant disadvantage. Today's digital cameras support various image formats, and the method described here can detect forgeries in any of them, giving more freedom in image forgery detection.

The steps involved in this algorithm are as follows. [14]

Step 1, Image preprocessing: If the image is not represented in the HSV color space, convert it to HSV using the appropriate transformation.

Step 2, Edge detection: This step focuses attention on areas where tampering may have occurred. A simple operator is used to convert the gray-scale image into an edge image.

Step 3, Localization of the tampered part: Horizontal and vertical projections are calculated, and with the help of horizontal and vertical thresholds the other directional edges are removed.


Figure 9: Edge detection of tampered area.

Step 4: Calculate the horizontal and vertical projection profiles.
Step 5: Find the boundary pixel values where the projection profiles change along the X and Y directions.
Step 6: Calculate the feature map.
Step 7: Identify the forgery region.
Step 8: Display the forgery region.
Step 9: Extract the forgery region.
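The sketch below strings steps 1-7 together under my own assumptions about the operators involved (Sobel edges on the HSV value channel and mean-relative projection thresholds); [14] may use different filters and thresholds.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv
from scipy.ndimage import sobel

def localize_region(rgb, edge_thresh=0.5, proj_thresh=2.0):
    """Bounding box (top, bottom, left, right) of the strongest edge concentration."""
    hsv = rgb_to_hsv(rgb.astype(float) / 255.0)        # Step 1: HSV colour space
    value = hsv[..., 2]
    grad = np.hypot(sobel(value, axis=0), sobel(value, axis=1))
    edges = grad > edge_thresh                          # Step 2: edge image
    h_profile = edges.sum(axis=1)                       # Steps 3-4: projection profiles
    v_profile = edges.sum(axis=0)
    rows = np.where(h_profile > proj_thresh * h_profile.mean())[0]
    cols = np.where(v_profile > proj_thresh * v_profile.mean())[0]
    if rows.size == 0 or cols.size == 0:
        return None                                     # nothing salient found
    return rows[0], rows[-1], cols[0], cols[-1]         # Steps 5-7: region boundaries

# Toy example: a flat image with a pasted bright square yields a box around it.
img = np.full((120, 160, 3), 90, dtype=np.uint8)
img[40:70, 60:100] = 220
print(localize_region(img))
```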

2.4 Combing artifact in screen shots

Video forgery detection is not only about detecting tampered content. With the rise of modern social media and content sharing services, the rights attached to copyrighted video data are frequently violated. One such scenario is illegal screen shots taken from copyrighted video content. Modern video playback software provides frame capturing facilities, which has introduced adverse effects as well: it is now a common sight for people to share screen shots taken from video streams (e.g., movies) without the owner's consent, which is a serious copyright violation. It has therefore become a concern in video forgery detection, and methods have been developed to identify whether a still image was captured from a particular kind of video stream. In the method discussed here, the properties of interlaced video are exploited to detect illegal screen shots. In contrast to progressive scan mode, in interlaced mode a video frame is represented as the combination of two components, the odd field and the even field. Surveys have shown that many illegal screen shots are taken from television screens.


TVs use the interlaced video mode to render frames [5]. The interlacing process adds certain artifacts to the video frames; one such artifact is called the combing artifact.

Figure 10: Combing artifact in interlaced video [5]

The following explains how the combing artifact is generated in interlaced video. A video frame can be broken into two parts, the odd field and the even field: the odd field is obtained by selecting only the odd horizontal lines of the full-resolution frame, while the even field is obtained by extracting the even lines.

Figure 11: Odd and Even fields in a video frame [5]

Video interlacing can now be explained. An interlaced video frame is formed by combining an odd field and an even field, but these two fields do not correspond to the same full-resolution frame: if the odd field is taken from the frame at time t (F(x, y, t)), the even field is taken from the frame at time t-1 (F(x, y, t-1)).

Figure 12: Combining odd and even fields to obtain an interlaced video frame [6]


The two fields obviously exhibit a motion difference, because they are taken from two different video frames, at times t-1 and t. This motion between the odd and even fields introduces an artifact [6] called the combing artifact (Figure 10). The idea of this method is to capture traces of the combing artifact in a screen shot as evidence that it was captured from interlaced video.

The initial step is to extract a set of features that emphasizes the areas rich in combing artifact; it is observed that the artifact is predominant around vertical edges in the screen shot [6]. Eight features are identified, divided into two categories: one group is computed from the sub-bands (the four Low-Low, Low-High, High-Low and High-High sub-bands) obtained from a Discrete Wavelet Transform, and the other from the vertical and horizontal differential histograms of the screen shot. [6]

Figure 13: Feature extraction procedure of combing artifact [6]

The standard deviation, skewness and kurtosis are calculated for both the LH and HL bands, which provides 3 × 2 = 6 features.


Standard deviation:

s = sqrt( (1 / N²) Σ_(x,y) ( I(x, y) − μ_I )² )

where the sub-band image I has N × N pixels and μ_I is its mean.

Skewness:

skew(I) = (1 / N²) Σ_(x,y) ( ( I(x, y) − μ_I ) / s )³

where s is the standard deviation of I.

Kurtosis:

kurt(I) = (1 / N²) Σ_(x,y) ( ( I(x, y) − μ_I ) / s )⁴

where s is the standard deviation of I.

Two other features are extracted from the vertical differential histogram (H_v) and the horizontal differential histogram (H_h) of the screen shot; these histograms are accumulated block by block from the pixel differences in the vertical and horizontal directions, respectively [6].

Two further features are then defined from these histograms.

Manhattan distance:

d_M = Σ_i | H_v(i) − H_h(i) |

Euclidean distance:

d_E = sqrt( Σ_i ( H_v(i) − H_h(i) )² )

There are now eight features altogether, which are fed into an SVM that estimates the amount of combing artifact. The SVM classifier is trained on features extracted from various screen shots and from original images [6].
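A rough sketch of this 8-D feature vector is given below; it uses the PyWavelets and SciPy packages, and the differential-histogram step is my reading of [6] (first differences along rows and columns), not a verbatim reimplementation.

```python
import numpy as np
import pywt
from scipy.stats import kurtosis, skew

def difference_histogram(gray, axis, bins=64):
    """Normalized histogram of first differences along the given axis."""
    d = np.diff(gray.astype(float), axis=axis).ravel()
    hist, _ = np.histogram(d, bins=bins, range=(-255, 255))
    return hist / max(hist.sum(), 1)

def combing_features(gray):
    """8-D feature: (std, skew, kurtosis) of two detail sub-bands + two histogram distances."""
    _, (band_h, band_v, _) = pywt.dwt2(gray.astype(float), 'haar')   # detail (LH/HL-style) bands
    stats = [f(band.ravel()) for band in (band_h, band_v) for f in (np.std, skew, kurtosis)]
    h_v = difference_histogram(gray, axis=0)      # vertical (row-to-row) differences
    h_h = difference_histogram(gray, axis=1)      # horizontal differences
    manhattan = float(np.abs(h_v - h_h).sum())
    euclidean = float(np.sqrt(((h_v - h_h) ** 2).sum()))
    return np.array(stats + [manhattan, euclidean])

# Usage: extract this vector from suspect screen shots and from known originals,
# then train a classifier such as sklearn.svm.SVC on the two classes.
rng = np.random.default_rng(6)
print(combing_features(rng.integers(0, 256, (256, 256)).astype(float)))
```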

The robustness and reliability of this method come from its use of a support vector machine for screen shot detection. Unlike purely statistical methods, the classifier can learn from additional training data, so the average accuracy improves as more examples become available. According to [6], the tests achieved an accuracy of 97.77%, so the method has proven successful in identifying whether a screen shot was taken from a video stream. Its main drawback is that it relies entirely on the combing artifact: when the relative motion between the odd and even fields is small, the combing artifact is weak and the SVM does not produce satisfactory results. This implies that a more robust feature set is needed to handle such extreme cases. The method has been applied to screen shots in popular image formats such as JPEG, BMP and TIFF and video formats such as MPEG-2, MPEG-4 and H.264; a possible improvement is to make the system compatible with further image and video formats as well.

2.5 Multimodal feature fusion

This method, based on combining local and global image features, offers a way to discriminate a genuine image from a tampered or forged one. In video footage depicting human communication and interaction tasks such as speaking, talking, acting or expressing emotions, different facial regions such as the lips, eyes and eyebrows undergo different levels of motion; by exploiting the spatio-temporal dynamics of these regions in face images, it is possible to discriminate a genuine video from a tampered or forged one.

The region of interest (ROI) segmentation for detecting faces and regions within faces (lips, eyes, eyebrows, nose) is done in the first frame of the video sequence. The tracking of the face and lip region in subsequent frames is done by projecting the markers from the first frame. This is followed by measurements on the lip region boundaries based on pseudo-hue edge detection and tracking. The advantage of this segmentation technique based on exploiting the alternative color spaces is that it is simpler and more powerful in comparison with other methods used to segment the regions of interest.


The first stage is to classify each pixel in the given image as a skin or non-skin pixel. The second stage is to identify the different skin regions in the image through connectivity analysis. The last stage is to determine, for each identified skin region, whether it represents a face. This is done using two criteria: the aspect ratio (height to width) of the skin-colored blob, and template matching with an average face image at different scales and orientations.

With a Gaussian skin color model based on red-blue chromatic space, the “skin likelihood” for any pixel of an image is extracted.

Figure 14: Face detection by skin color in red-blue chromatic space [18]

The skin probability image obtained is thresholded to produce a binary image. Morphological segmentation then applies erosion, dilation and connected component analysis to isolate the skin-colored blobs. By applying an aspect-ratio rule and template matching with an average face, it is ascertained whether a skin-colored blob is indeed a face. The figure above shows an original image from a facial video database and the corresponding skin-likelihood and skin-segmented images, obtained by statistical skin color analysis and morphological segmentation. Once the ROI is segmented and the face region is localized, local features from different sections of the face (the lip region, for example) are detected using another color space.
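The sketch below shows this thresholding and morphological cleanup pipeline with placeholder Gaussian parameters; in [18] the skin-colour mean and covariance are estimated from training pixels rather than fixed by hand, and the helper names here are my own.

```python
import numpy as np
from scipy.ndimage import binary_closing, binary_opening, label

SKIN_MEAN = np.array([0.42, 0.28])                 # placeholder mean of (r, b)
SKIN_COV_INV = np.linalg.inv(np.array([[0.004, 0.0], [0.0, 0.003]]))

def skin_likelihood(rgb):
    """Gaussian likelihood of each pixel being skin in red-blue chromatic space."""
    total = rgb.astype(float).sum(axis=-1) + 1e-6
    x = np.stack([rgb[..., 0] / total, rgb[..., 2] / total], axis=-1) - SKIN_MEAN
    mahal = np.einsum('...i,ij,...j->...', x, SKIN_COV_INV, x)
    return np.exp(-0.5 * mahal)

def skin_blobs(rgb, threshold=0.5):
    """Labelled skin-coloured blobs after thresholding and morphological cleanup."""
    mask = skin_likelihood(rgb) > threshold
    mask = binary_closing(binary_opening(mask, iterations=1), iterations=2)
    labelled, count = label(mask)
    return labelled, count

# Each blob would then be screened with the aspect-ratio rule and face-template
# matching described above before being accepted as a face region.
rng = np.random.default_rng(7)
print(skin_blobs(rng.integers(0, 256, (64, 64, 3)))[1])
```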

Figure 15 shows ROI region extraction (lip region) based on hue-saturation thresholding.

Figure 15: Lip region localization using hue-saturation thresholding [18]


An illustration of the geometry of the extracted features and the measured key points is shown in the figure below. The local features extracted from the lip region are used as the visual feature vector. This method is applied to video sequences showing a speaking face for all subjects in the database.

Figure 16: Lip-ROI key points for different lip openings ofa speaking face [18]

For extracting global features, Principal Component Analysis (PCA) or Eigen analysis is performed on face and segmented facial regions corresponding to lips, eyes, eyebrows, forehead and nose. The feature fusion involves concatenation of local and global features corresponding to each facial region.

To evaluate the performance of the proposed local feature extraction and feature fusion technique, experiments were carried out on a facial video sequence corpus. The data were recorded with a broadcast-quality digital video camera in a noisy office environment.

The fusion of local and global features is evaluated in three different experimental scenarios. In the first experiment, the local feature vector for each facial part (lips, eyes, eyebrows, forehead and nose) is used on its own. In the second experiment, global features, in the form of 10 eigen projections of each facial part based on principal component analysis, are used. In the third experiment, the local features from each part of the face are concatenated with the global features.

Discriminating a forged/tampered image from a genuine one is done in a Bayesian framework and approached as a two-class classification task based on hypothesis testing. Decision scores are computed for each facial part separately and combined using a late fusion approach with equal weights. Instead of deciding from a single image, the decision about whether an image sequence is genuine or tampered/forged is made using at least 100 correlated image frames.

The proposed fusion of local features extracted from alternative color spaces, such as the red-blue chrominance and pseudo-hue color spaces, with global features obtained by principal component analysis shows a significant improvement in performance in detecting


tampering/forgery compared with using global or local features alone. The technique demonstrates a simple and powerful way of verifying the authenticity of images. Further investigations include extending the proposed modeling techniques to blindly extract and estimate internal camera processing, targeting complex video surveillance, secure access control and forensic investigation scenarios.

2.6 Detecting Video Forgery by Ghost Shadow Artifact

2.6.1 Video Inpainting

Video inpainting is a field of computer vision that deals with removing objects or restoring missing or damaged regions in a video sequence using temporal and spatial information from neighboring frames. The objective is to generate an inpainted area that merges into the video so that consistency is maintained; when the video is played as a sequence, the human eye cannot detect any distortion in the affected areas [12]. Video inpainting techniques can be classified into two groups [11]:

● Patch-based techniques
● Object-based techniques

Patch-based methods extend image inpainting methods to video. Some of these methods are:

● The Navier-Stokes video inpainting algorithm
● Video completion using global optimization
● Video completion using tracking and fragment merging

Object-based methods usually produce higher-quality results than patch-based methods [11]. Some of these methods are:

● Video repairing under variable illumination using cycle motion
● The rank minimization approach
● Human object inpainting using manifold learning-based posture sequence estimation

2.6.2 Ghost Shadow Artifact

A problem associated with video inpainting is that temporal discontinuities in the inpainted area produce flicker, and the ghost shadow artifact is the visual annoyance caused by this flicker. Even when efforts are made to remove the ghost shadow, complicated video and camera motions degrade inpainting performance, which leads to the ghost shadow artifact [10].


2.6.3 Detecting Video Forgery

To detect video forgeries, each frame is first segmented into a static background and a moving foreground by block matching: a rough motion confidence mask is computed for each frame by comparing it with the following frame. The camera shift can be estimated as the median shift of all the blocks in the image, and blocks that still have a significant shift after the camera motion is subtracted are identified as moving foreground. All frames are then aligned according to the camera motion and the foreground mosaic is built; the foreground mosaic is the panoramic image obtained by stitching a number of frames together. Next, a binary accumulative difference image (ADI) is formed by choosing a reference frame and comparing the other frames with it. The speed and direction of a moving object can be obtained from the accumulative difference image. Because of distortions caused by MPEG compression, the binary ADI of an inpainted video sequence can contain isolated points, which can easily be removed by mathematical morphological operations. Finally, inconsistencies between the foreground mosaic and the object track are used as indications of a forged video [10].
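The accumulative difference image step can be sketched as follows; frame alignment and the foreground mosaic are assumed to have been computed already, and the thresholds are illustrative rather than taken from [10].

```python
import numpy as np
from scipy.ndimage import binary_opening

def accumulative_difference_image(frames, reference_index=0, diff_thresh=20, count_thresh=3):
    """Binary ADI: pixels that differ from the reference in at least count_thresh frames."""
    reference = frames[reference_index].astype(float)
    counts = np.zeros(reference.shape, dtype=int)
    for k, frame in enumerate(frames):
        if k == reference_index:
            continue
        counts += np.abs(frame.astype(float) - reference) > diff_thresh
    adi = counts >= count_thresh
    return binary_opening(adi)                 # drop isolated points left by compression

# Toy example: a small object moves across otherwise static frames; its track
# shows up in the ADI, and inconsistencies between this track and the foreground
# mosaic would indicate inpainting.
frames = [np.zeros((48, 48)) for _ in range(10)]
for k in range(10):
    frames[k][20:26, 4 * k:4 * k + 6] = 255
print(accumulative_difference_image(frames).sum())
```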

The advantage of this method is that it can detect video inpainting even in the presence of MPEG compression and recompression. Its limitation is that it can currently detect inpainting only in video sequences with static backgrounds; modifications are needed before it can be applied to sequences with complicated motion and changing camera angles [10].


3. Conclusion

Throughout this literature survey, a number of video/image forgery detection mechanisms and techniques have been discussed from different perspectives. Video tampering is carried out using different methods, so there must be correspondingly different methods to detect the different types of video forgery. Modern video forgery detection mechanisms are far more advanced than those of several years ago, which shows how rapidly this area of study is evolving. No single detection method works best in every situation; which method is appropriate for a given situation depends on a number of factors:

● Techniques used for the video forgery
● Available technology
● Computational restrictions
● Video/image quality
● Video/image formats

It is therefore essential to understand the requirements and the environmental parameters described above when choosing a video forgery detection approach.

In contrast to the early days of video forgery detection, researchers are now more interested in intelligent detection mechanisms, mainly because of the appeal of algorithms capable of self-learning and evolving. Most recent research is oriented around these concepts and has shown promising results, which encourages the use of more and more learning algorithms in video forgery detection. One might therefore think that classical image processing techniques no longer matter in modern approaches, but this is not true, for a good reason: even though detection methods have developed dramatically, the basic structure of video and images has remained almost the same. For example, televisions have been broadcasting interlaced video for a very long time. So even though detection mechanisms change rapidly, the underlying theoretical concepts will last for many years, and the fundamentals of image processing remain valid.

With video recorders and digital cameras easily accessible to the public, video forgery detection has become one of the most challenging topics. Optical data are used in almost every application nowadays, from simple photography and videography to advanced applications such as face recognition, security access clearance and many other security-focused fields. The use of video data in critical fields such as security has added further value to the field of video forgery detection.


4. References

[1] Wen Chen, Yun Q. Shi, “Detection of Double MPEG Compression Based on First Digit Statistics” in Digital Watermarking , Springer Berlin Heidelberg, 2008, pp 16-30

[2] Weihong Wang, Hany Farid, “Exposing digital forgeries in video by detecting double MPEG compression”, MM&Sec, 2006

[3] Xinghao Jiang, Wan Wang, Tanfeng Sun, Yun Q. Shi, Fellow, IEEE, and Shilin Wang, “Detection of Double Compression in MPEG-4 Videos Based on Markov Statistic”, IEEE Signal Processing Letters, Vol 20, No 5, May 2013

[4] Sharifi K and Garcia AL (1995). Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video. IEEE Trans. Circuits Syst. Video Technol. 5: 52–56.

[5] “Interlacing- Luke’s Video Guide” [Online]. Available: http://www.neuron2.net/LVG//interlacing.html [Accessed: 15-Aug-2013]

[6] Ji-Won Lee, Min-Jeong Lee, Tae-Woo Oh, Seung-Jin Ryu, Heung-Kyu Lee, "Screenshot identification using combing artifact from interlaced video", MM&Sec, 2010

[7] Liu Q, Sung AH, and Qiao M (2009). Improved detection and evaluation for JPEG steganalysis. ACM Multimedia 2009, Beijing, China, October 19-24, 2009.

[8] W. Wang and H. Farid, “Exposing digital forgeries in video by detecting duplication,” in Proceedings of the 9th workshop on Multimedia & security, 2007

[9] W. Wang, “Digital video forensics,” 2009

[10] Jing Zhang, Yuting Su, Mingyu Zhang, "Exposing Digital Video Forgery by Ghost Shadow Artifact", Proceedings of the First ACM Workshop on Multimedia in Forensics (MiFor)

[11] Anu Rachel Abraham, A. Kethsy Prabhavathy, J. Devi Shree, PhD, “A Survey on Video Inpainting”, International Journal of Computer Applications (0975 – 8887) Volume 55– No.9, October 2012

[12] Sean Moran, “Video Inpainting”, April 2012

[13] Qingzhong Liu, Andrew H. Sung , “A New Approach for JPEG Resize and Image Splicing Detection”, MiFor, 2009


[14] S. Murali, Govindraj B. Chittapur, H. S. Prabhakar, "Format Based Photo Forgery Image Detection", CCSEIT '12, Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology, 2012

[15] Tanfeng Sun, Wan Wang, Xinghao Jiang, "Exposing Video Forgeries by Detecting MPEG Double Compression", Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012

[16] Ho Hee-Meng, “Digital Video Forensics Detecting MPEG-2 Video Tampering through Motion Errors” , Technical Report RHUL–MA–2013–5, 01 May 2013

[17] Michihiro Kobayashi, Takahiro Okabe, Yoichi Sato, “Detecting Video Forgeries Based on Noise Characteristics”, PSIVT, 2006

[18] Girija Chetty, Matthew Lipton, “Multimodal feature fusion for video forgery detection”, Information Fusion, 2010
