
Compression and distribution of panoramic videos utilising MPEG-7-based image registration

Andrzej Glowacz & Michał Grega & Piotr Romaniak &

Mikołaj Leszczuk & Zdzisław Papir & Ignacy Pardyka

Published online: 2 July 2008
© Springer Science + Business Media, LLC 2008

Abstract This paper describes an innovative compression method for panoramic images based on MPEG-7 descriptors. The proposed solution detects the overlaps of a series of individual video frames in order to produce concatenated panoramic images. The presented method is easy to implement even in simple devices, such as low-power chipsets installed in remote cameras with limited power supplies. Subjective tests have proved that the concatenation method achieves lower transmission rates while sustaining picture quality.

Keywords Wireless content distribution · Panoramic image · MPEG-7 · Edge histogram descriptor · Mean opinion score · Quality evaluation

1 Introduction

Panoramic images are widely used in surveillance applications, scientific experiments and, most widely, in commercial applications.

Multimed Tools Appl (2008) 40:321–339. DOI 10.1007/s11042-008-0209-0

A. Glowacz (*) · M. Grega · P. Romaniak · M. Leszczuk · Z. Papir
AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland
e-mail: [email protected]

M. Grega
e-mail: [email protected]

P. Romaniak
e-mail: [email protected]

M. Leszczuk
e-mail: [email protected]

Z. Papir
e-mail: [email protected]

I. Pardyka
Swietokrzyska Academy, ul. Swietokrzyska 15, 25-406 Kielce, Poland
e-mail: [email protected]


The commercial applications vary from simple web-based broadcasts to fully commercial TV programs. A good example of such an application is a TV station broadcasting panoramic video sequences from skiing areas in the winter season. Such applications require a slowly rotating, high-quality camera with a high-speed data link installed at a mountain peak. This often limits the applicability of such a panoramic video source or makes its distribution extremely expensive in difficult-to-access locations. The main constraints are the power supply and access to a fast data link. There are several approaches to compressing sequences of panoramic images, mostly based on MPEG schemes with modifications in the motion estimation and compensation loop and an application of epipolar geometry, as for example [18, 22, 23].

Camera lenses can only acquire pictures with a narrow angle; therefore, a panoramic image is usually constructed as a concatenation of several images taken by a camera rotated around its vertical axis. If motion transmission is desired, there are significant overlaps (resulting in redundancy) between consecutive frames, as the camera rotates relatively slowly. Transmission of a concatenated panoramic image (instead of a series of redundant frames) can reduce bandwidth requirements.

Our motivation is to create an alternative compression method for panoramic images based on MPEG-7 descriptors which lowers the required bandwidth, thus allowing placement of panoramic cameras in locations without a proper transmission infrastructure. The proposed image compression solution reduces the cost of placing panoramic cameras and contributes to a broader dissemination of such a service. In scenarios in which bandwidth is not a constraint, the proposed compression method allows for improving the quality of the received image.

The paper is structured as follows: the next section describes the overlapping of panoramic images, introduces the metric for overlapping (the MPEG-7 Edge Histogram Descriptor), the detection of image overlaps, as well as the issue of concatenating and transmitting images. In the third section, the quality of compression is evaluated using subjective tests. The fourth section presents the possible further extensions of the system and concludes the paper. Additionally, Appendix 1 describes the overlap coordinates detection algorithm in more detail.

2 New approach to compression, transmission and decompression of panoramic videos

The architecture of the proposed solution is depicted in Fig. 1, with two paths shown for comparison. The first, traditional one is the transmission of an ordinary video stream (e.g. MPEG-4) from the panorama camera to a receiving end (broadcast point). The other path is the proposed solution, which captures several images, concatenates them into one panoramic image, and then transmits it over a wireless channel. After the image is transmitted, it can be split into a video stream by cropping frames out of the panoramic image. The images do not have to be transmitted in real time, as for most applications of remote panoramic image capture systems it is enough to update the panoramic image from time to time.

The issue of panorama creation is well addressed in the literature. Methods have been proposed that combine panoramic images into a mosaic using bundle adjustment [21, 29], block matching and reconstruction [5, 27, 28], edge methods [4] and point methods [26]. In [6] a very regular camera cycle is considered to analytically compute image concatenation equations.


The proposed solution can be seen as a tool for fast construction of panoramic images on the sender's side of the wireless distribution system. The novelty of the technique lies in the employment of MPEG-7 descriptors for detecting overlapping regions, which results in lowering of the transmission requirements.

2.1 MPEG-7 edge histogram descriptor

The MPEG-7 standard [10] defines several image descriptors. The descriptors can characterize, e.g. colours, textures and shapes. The descriptors also allow for measuring similarities between two different images. The MPEG-7 colour descriptors characterize the dominant colour, the scalable colour, the colour layout and the colour structure. Textures are characterized by homogeneous texture descriptors (HTD), texture browsing descriptors (TBD), and edge histogram descriptors (EHD). The MPEG-7 standard also specifies descriptors for 2D and 3D shapes [1].

Employing the MPEG-7 descriptors allows for comparing image regions using descriptor distances and matching overlapping regions of panoramic images. As overlapping regions are featured with similar textures, one of the MPEG-7 texture descriptors should be selected. EHD seems to be the most suitable one since it operates on non-homogeneous textures (overlapping regions can usually be non-homogeneous). One of the most important advantages of EHD is its robustness to illumination changes.

Fig. 1 Traditional architecture versus the proposed solution. Traditional approach: image capture, video compression (MPEG-4, H.264/AVC), wireless transmission, video decompression (MPEG-4, H.264/AVC), broadcast. Proposed solution: image capture, overlapping detection, image concatenation, image compression (JPEG, JPEG 2000), wireless transmission, decompression, frame cropping, video creation, broadcast


In outdoor systems, the change of illumination over time alters the appearance of the scene and causes deviations of the background. This shortcoming makes automated detection of overlapping regions challenging under changing illumination conditions [24]. The usage of EHD has been discussed in detail in [1].

The EHD defines a histogram of the elementary edge types: vertical, horizontal, 45° diagonal, 135° diagonal and non-directional edges. The edge types are calculated in various configurations of sub-images (usually 16, arranged in a 4×4 grid), as shown in Fig. 2. Each sub-image is further divided into non-overlapping square image blocks [20].

The descriptor represents the spatial distribution of five types of edges, namely four directional edges and one non-directional edge. It primarily targets image-to-image matching, especially for natural images (having a non-uniform edge distribution). The best effectiveness of this descriptor is obtained by using the semi-global and the global histograms generated directly from the edge histogram descriptor, as well as the local ones, for the matching process [10].

The descriptor employs neither advanced mathematics nor sophisticated algorithms, so it is quite easy to implement even in hardware devices. The extraction is mostly based on edge classification, processing each image block as a 2×2 super-pixel block and applying appropriately oriented edge detectors to compute the corresponding edge strengths. The matching process is also simple, and is based on the so-called l1 (Manhattan) distance measure. The distance D between two images A and B is defined by the following equation:

D(A, B) = \sum_{i=0}^{79} \left| h_A(i) - h_B(i) \right| + 5 \times \sum_{i=0}^{4} \left| h^g_A(i) - h^g_B(i) \right| + \sum_{i=0}^{64} \left| h^s_A(i) - h^s_B(i) \right|    (1)

where h(i) represents the edge histogram values, h^g(i) represents the normalized bin values for the global edge histogram and h^s(i) represents the histogram bin values for the semi-global edge histograms [10].
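As an illustration only, here is a minimal sketch of the matching step defined by Eq. (1). It assumes the 80 local, 5 global and 65 semi-global histogram bins have already been extracted for both images (the extraction itself is not shown); the function name and the NumPy dependency are ours rather than part of the MPEG-7 reference software.

```python
import numpy as np

def ehd_distance(local_a, global_a, semi_a, local_b, global_b, semi_b):
    """l1 (Manhattan) distance between two EHD descriptions, following
    Eq. (1): 80 local bins, 5 global bins weighted by 5, 65 semi-global bins."""
    local_a, local_b = np.asarray(local_a), np.asarray(local_b)
    global_a, global_b = np.asarray(global_a), np.asarray(global_b)
    semi_a, semi_b = np.asarray(semi_a), np.asarray(semi_b)
    return (np.abs(local_a - local_b).sum()
            + 5 * np.abs(global_a - global_b).sum()
            + np.abs(semi_a - semi_b).sum())
```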

2.2 Detecting image overlaps

In order to correctly combine a given set of images into a panoramic image, the individual images have to contain overlapping parts, and these overlapping areas need to be detected.


Fig. 2 Definition of sub-images and image blocks (based on [20])


This section discusses the use of the MPEG-7 EHD descriptor for this purpose.

The basic goal of image overlap detection is finding the (x, y) coordinates pinpointing the location of the intersection of the images. Within the reported research, x represents the pixel value of the horizontal coordinate of the intersection on the left image and y represents the vertical offset between the images. A positive value of y means that the left image needs to be elevated by y pixels above the level of the right picture. Movement of the camera along the vertical axis causes the offset y. This movement, although very small, is unavoidable in field applications and requires compensation. In some cases, the camera may also roll around the axis of the lens, causing additional distortions. These distortions, as well as geometric distortions caused by multiple camera lenses, are out of the scope of this work.

The EHD-based comparator accepts two images as input. These images are compared and a numerical result in the form of a single number (the measure of distance, D) is returned. A distance value of 0 indicates identical images; the higher the distance value, the larger the difference between the images.

A recursive algorithm searches consecutively for the x and y coordinates of intersection, assessing the progress by comparison of descriptor values for the overlapping parts of the images and trying to minimize the value of this distance. The algorithm is depicted in Fig. 3 and described in detail in Appendix 1.

Horizontal and vertical coordinates are calculated in a similar way. The horizontal coordinate search algorithm will be described in detail.

There are several approaches to searching for the overlap coordinate. The most obvious solution (referred to as method A) is to overlap the images pixel-by-pixel and use the EHD descriptors to calculate the coordinate of overlapping.

Fig. 3 The graphical representation of the algorithm used for the search for the optimal coordinates of intersection of panoramic images (load input images; calculate x and the descriptor value d_x; crop images along x; calculate y and the descriptor value d_{x+y}; if d_{x+y} < d_x, crop images along (x, y) and repeat; otherwise (x, y) is optimal)


This method, however, does not give good results. The EHD-based comparator seems to work optimally with stripes of limited width. Therefore the second approach was applied: a search with utilization of a sliding window (referred to as method B).

Two approaches to the comparison with use of a sliding window were tested. The first was to use one stationary window at the side of the picture (right side for the left picture or left side for the right picture) and one sliding window. This method (named B1) is influenced by geometric distortions caused by the camera lenses. Geometric distortions have the strongest impact on the image at the edges.

The other approach, which gives much better results than the previous one, was to utilize two sliding windows (we will call it B2). The horizontal search starts with selection of the size (width, in the case of the x coordinate search) of the sliding window. The sliding window determines the part of the image that is compared in a single step of the algorithm. At the beginning of the algorithm, the sliding windows are set at the right end of the left image and the left end of the right image. The parts of the images lying inside the sliding windows are compared, and the result is stored. In each step of the algorithm, the sliding windows are moved in opposite directions and comparisons are made. After both pictures are completely compared, the minimum value of the distance is found. The position of the sliding window for the minimum value of the distance gives the sought-after x coordinate.
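The sketch below shows one possible reading of method B2, under the assumption that both windows advance by the same step per iteration (so a minimum found after s steps corresponds to an overlap of 2s plus the window width). The helper ehd_distance_images, which would extract the EHD descriptors of the two strips and apply Eq. (1), is hypothetical, as are the default window width and step.

```python
import numpy as np

def find_x_overlap(left_img, right_img, win_width=64, step=4):
    """Two-sliding-window search (method B2) for the x coordinate.
    left_img and right_img are H x W x 3 arrays of equal size."""
    h, w, _ = left_img.shape
    best_d, best_s = np.inf, 0
    for s in range(0, (w - win_width) // 2 + 1, step):
        strip_left = left_img[:, w - win_width - s : w - s]   # window sliding left
        strip_right = right_img[:, s : s + win_width]         # window sliding right
        d = ehd_distance_images(strip_left, strip_right)      # hypothetical EHD wrapper
        if d < best_d:
            best_d, best_s = d, s
    overlap = 2 * best_s + win_width   # estimated overlap width
    return w - overlap                 # x: column where the right image's left edge lands
```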

Figure 4 shows the numerical results obtained for methods B1 and B2, respectively. As can be seen, in the B1 method the minimum is hidden in the noise-like shape of the plot. In the B2 method the minimum can be found with greater certainty, but false results can still occur. To make sure that the found intersection coordinates are optimal, the size of the sliding window has to be tuned (described later, at the end of this subsection).

The choice of the width of the sliding window required precise investigation. The lower limit of the size of the sliding window is restricted by the construction of the EHD descriptor. We investigated sliding window widths of 32, 48, 64, 96 and 128 pixels (not all sliding window widths were used in every test).

Fig. 4 Comparison of methods B1 and B2


Table 1 presents the results (found x coordinates) for different sizes of the sliding window for an example set of images. The column "reference" represents the ideal points of overlapping (found manually). The next four columns represent the calculated x coordinates, and the last four columns the differences between the reference x coordinate and the automatically found one. As can be observed, for each size of the sliding window there are values close to the ideal ones and some very far from them. By calculating the average x coordinate over the four sliding window sizes and comparing the individual results with the average, we can eliminate the evidently false results; note that this method eliminates faulty results without referring to the reference x coordinates, so the process can be automated. In Table 1 the evidently false coordinates (the outliers) are those which differ significantly from the mean. The median or trimmed mean (mean of the sample, excluding extreme values), here both giving very similar results, can be easily calculated and used for image concatenation. It was also observed that the average error of the x coordinate is around 0.02 of the overall image width. This error can be easily masked in the process of image concatenation.
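A minimal sketch of this combination step is given below; the function name is ours, and the median stands in for the median/trimmed-mean estimate mentioned above.

```python
import numpy as np

def combine_window_estimates(x_candidates):
    """Merge the x coordinates obtained with different sliding-window
    widths (e.g. 32, 48, 64 and 128 px) into one robust estimate; the
    median ignores the evidently false detections without requiring a
    manually found reference coordinate."""
    return float(np.median(np.asarray(x_candidates, dtype=float)))

# Example from Table 1 (left image 3, right image 4, reference x = 144):
print(combine_window_estimates([99, 145, 144, 162]))  # -> 144.5
```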

Another test was executed on a different group of nine images of a panoramic view (a minimum overlap of 50% between the images; the main pixel transformation is translation). The results are very similar (a relative mean error of 0.0238, with σ=0.01).

2.3 Image concatenation

A comprehensive study of the panoramic image concatenation methodology is presented in this section. The entire process is presented systematically, with suitable figures illustrating the approach.

The image concatenation process consists of several steps. The methodology will be explained by an example including two overlapping images; with a larger number of images it is performed in a similar way. The whole process is fully automatic and does not require human interaction. The module handles image stitching with overlap from 0% to almost 100%. The overlapping process is performed with the use of PerlMagick, which is an object-oriented Perl interface to ImageMagick. As a result, a panorama in PNG format is obtained.

In the first step, the dimensions of the background, determined by the dimensions of the left image, the right image and the overlap coordinates, have to be calculated.

Table 1 Results of x coordinate search for a set of 320×240 images

Left image ID  Right image ID  Reference  X coordinate (window width)   Difference
                                          32    48    64    128         Δ32   Δ48   Δ64   Δ128
1              2               142        140   142   37    109         2     0     105   33
2              3               170        170   170   154   132         1     1     16    38
3              4               144        99    145   144   162         45    1     0     18
4              5               197        197   200   205   47          1     4     9     150
5              6               108        14    117   108   116         94    9     0     8
6              7               107        107   103   102   98          1     4     5     9
7              8               118        114   85    117   119         4     33    1     2
8              9               226        257   209   240   21          32    17    15    205
9              10              95         93    94    125   99          2     1     31    5
10             11              84         80    95    86    82          4     12    3     2
11             12              115        113   156   140   99          2     41    25    16


The overlap coordinates are measured from the upper left corner of the left image and indicate the point on this image where the upper left corner of the right image should be placed; hence, a larger magnitude of a coordinate means larger background dimensions (both width and height). The left image is placed on the background first, with the left edges of the image and the background aligned; the y coordinate indicates the vertical offset (see Fig. 5).

Subsequently, the right image is placed in the proper position. Once this has been done, it is easy to verify that the overlap coordinates were calculated properly: the images are lined up precisely enough and there are no obvious breaks in the view at the image edges (see Fig. 6).
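A compact sketch of these placement steps is given below. It uses Pillow rather than the PerlMagick module employed in the prototype; the function name and the sign convention for y are our reading of the description above (a positive y places the right image y pixels lower, so the left image ends up elevated).

```python
from PIL import Image

def place_pair(left_path, right_path, x, y):
    """Compute the background size from the overlap coordinates (x, y),
    paste the left image, then paste the right image at (x, y).
    Blending and cropping of the top/bottom bars come later."""
    left, right = Image.open(left_path), Image.open(right_path)
    bg_w = max(left.width, x + right.width)
    bg_h = max(left.height, right.height) + abs(y)
    background = Image.new("RGB", (bg_w, bg_h))
    if y >= 0:
        background.paste(left, (0, 0))    # left image elevated by y pixels
        background.paste(right, (x, y))
    else:
        background.paste(left, (0, -y))
        background.paste(right, (x, 0))
    return background
```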

The discussed image is an intermediate result, meant only to show how well the images are lined up; it is not the final output. There are obvious stitch lines in the preview, but this is acceptable because they will be removed in the following steps.

Fig. 6 Overlapped images

Fig. 5 Background with the left picture


The next step is to optimise the visual quality of the panorama by setting up a blend area for the images. The spatial distribution of pixel transparency is determined in the overlap-detection phase and is always located within the overlapping stretch. Visual transparency is achieved by computing weighted-average R, G and B values for the overlapping pixels of the left and right images. The weight of each pixel of the pair depends on its location within the blend area and changes smoothly from the maximum to the minimum value. The location and dimensions of the blend area are adjustable, and some research was performed in order to calibrate the module.
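The sketch below illustrates the weighted-average idea with the simplest possible blend area: a linear feather across the full width of the overlapping strip. The actual blend-area shapes used in the paper (Fig. 7) differ, and the function name and NumPy dependency are ours.

```python
import numpy as np

def feather_blend(strip_left, strip_right):
    """Blend two overlapping strips of equal shape (H x W x 3): the weight
    of the left image falls smoothly from 1 to 0 across the strip while the
    weight of the right image rises from 0 to 1, and the R, G, B values are
    averaged with these weights."""
    h, w, _ = strip_left.shape
    alpha = np.linspace(1.0, 0.0, w)[np.newaxis, :, np.newaxis]  # per-column left weight
    blended = alpha * strip_left + (1.0 - alpha) * strip_right
    return blended.astype(np.uint8)
```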

A good tutorial on blending methods can be found in [25]. We experimented with two feathering approaches to setting the blend area, presented in Fig. 7; black and white colours represent the two different images. The approach presented on the left side was examined in order to determine the location of the blend area that gives the best 'optical' results. The left and right images were cut off the panorama just after the blending process and subtracted from the original ones, showing how they differ. The best results were achieved when the location and dimensions of the blend area were the same as in the overlap-detection phase.

The approach presented on the right side was used to create the full panoramic image presented in this paper. It was designed in order to obtain better blending results than the previous one and can be perceived as an improvement.

Fig. 7 Two different approaches to set the blend area

Fig. 8 Blending result


The shape of the blend area is similar to half of an ellipsoid, with an adjustable length of one semi-axis. The maximum length of the semi-axis should be applied only for pairs of images that are perfectly matched in the vertical direction. In the case of poor vertical matching, shortening of the semi-axis allows for concealing this defect; however, it results in a worse blending effect in the corners. Hence, the dimension of the ellipsoid should be reasonably assessed, respecting the current matching quality. In fact, both methods are feathering approaches and thus can reveal known problems, e.g. with high-frequency details that cannot be confidently reproduced in a panoramic image [25].

The results of the blending process (second approach) applied to create the full panoramic image are presented in Fig. 8 (compare with Fig. 6, taken before the blending process). In the last step, the redundant bars at the top and the bottom, which were created as a result of a non-zero overlap value in the vertical axis, have to be cropped off the picture (both of height equal to the y coordinate) in order to obtain the "field of view", a part of the full panoramic image (see Fig. 9).

The result of the image concatenation methodology described in this section is presented in Fig. 10, which illustrates the full panoramic image created from a few individual images.

Fig. 9 A part of full panoramic image

Fig. 10 Full panoramic image


2.4 Sending and receiving images

After the panoramic image has been composed, it needs to be compressed and transmitted. Three compression algorithms are proposed: JPEG [14], JPEG 2000 [7, 16] and PNG [11]. JPEG offers a lower quality and compression ratio; however, its algorithm is relatively simple and does not require high computation power. JPEG 2000 offers a better quality and compression ratio, but at the cost of an increase in computation power. The computation power required for both the JPEG and JPEG 2000 compression schemes can be easily delivered by using embedded encoders [28]. PNG offers lossless compression, generates no artefacts and preserves sharp edges.

It is desired to achieve the shortest possible delay between the start of transmission of a panoramic image and its playout at the recipient side. In order to ensure this, the panoramic image should be rotated 90° clockwise (for left-to-right panning) or counter-clockwise (for right-to-left panning). This allows for simultaneously transmitting a panoramic image and displaying the fragments already downloaded: its decompression may start before the whole file is received. Unfortunately, the method works only for JPEG compression; the JPEG 2000 scheme requires receiving the entire image file prior to decompression, so a delay will be introduced. For the same reason Progressive JPEG compression cannot be used.
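The sender-side preparation implied by this paragraph can be sketched as follows (Pillow-based; the function and parameter names are ours and the JPEG quality setting is an arbitrary example).

```python
from PIL import Image

def prepare_for_transmission(panorama_path, out_path, pan_left_to_right=True):
    """Rotate the panorama by 90 degrees before baseline JPEG encoding, so
    that the rows decoded first at the receiver correspond to the leading
    edge of the pan and playout can start before the whole file arrives."""
    panorama = Image.open(panorama_path)
    # Pillow rotates counter-clockwise for positive angles, so -90 is clockwise.
    angle = -90 if pan_left_to_right else 90
    rotated = panorama.rotate(angle, expand=True)
    rotated.save(out_path, format="JPEG", quality=75, progressive=False)
    return out_path
```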

2.5 Decompression

When the panoramic image is received, it needs to be decompressed into a video sequence. This requires, in the first stage, the decompression of the JPEG/JPEG 2000 image. The computation power at the receiving end is not a constraint in most applications.

Construction of the video sequence is done by cropping frames out of the panoramic picture. As the whole panoramic image is received, it is possible to change the scrolling speed of the panoramic image, change the direction of scroll or even, if the image is of higher resolution, zoom into the interesting parts of the image. This allows creation of services more advanced than a simple video broadcast; an interactive panorama service can be constructed and delivered.
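A minimal, Pillow-based sketch of this cropping step is shown below (names and defaults are ours). The step size controls the apparent panning speed, and reversing the resulting frame list reverses the scroll direction.

```python
from PIL import Image

def crop_frames(panorama_path, frame_w=320, frame_h=240, step=8):
    """Slide a SIF-sized viewport (320 x 240) across the decoded panorama;
    each position yields one frame of the reconstructed video sequence."""
    panorama = Image.open(panorama_path)
    frames = []
    for x in range(0, panorama.width - frame_w + 1, step):
        frames.append(panorama.crop((x, 0, x + frame_w, frame_h)))
    return frames
```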

3 Quality evaluation of the compression of concatenated images

Evaluation of each compression scheme involves an analysis of a trade-off between two competing factors: fidelity loss versus compression rate.

This section presents a comparison of concatenation-based panoramic image compression (deploying the JPEG, JPEG 2000 or PNG algorithms) with a traditional streaming approach (deploying the MPEG-4 or H.264/AVC standards).

Quality evaluation was performed with a test sequence containing the same video content compressed with all the mentioned techniques. As the CR (compression ratio) is constant for the PNG encoded sequence, all the other compressed sequences were encoded with a number of different CRs.

The fidelity loss due to an applied compression algorithm can be derived using either an objective or a subjective approach.

The specific test requirements (quality assessment of cropped-out parts of a full panoramic picture) enforced a subjective approach, since objective ones (e.g. the PSNR, MAE, MSE and RMSE metrics) are not adequate for this type of distortion, where slight movements of image parts are present.


Such movements produce a huge difference for pixel-to-pixel-based metrics while being insignificant from the point of view of the end user's perceived quality. The performed tests proved that results obtained with objective metrics indicate the existence of a very strong distortion, while the user's feeling about the quality may be quite different.

This section presents a subjective quality evaluation and comparison for cropped-out frames from a full (concatenated) panoramic picture as well as for the corresponding MPEG-4 and H.264/AVC [8, 13] compressed frames (320×240 resolution). The applied experiment principles, methodology and results are presented.

In order to provide an accurate quality comparison for video transmitted with different techniques (as a panoramic picture and as a video sequence), an analysis of corresponding frames had to be performed. To fulfill this requirement, video streams were decompressed into separate frames, and full panoramic pictures were divided into a number of cropped-out parts (corresponding to the decompressed frames). All these frames/images constituted the test set and were used in the subjective evaluation trials.

3.1 Experiment principles

In order to evaluate the quality of reconstructed images and to compare the panoramic approach with MPEG-4 and its successor H.264/AVC, the authors gathered several subjective quality scores. The responders compared frames reconstructed from the panoramic image with their equivalents from MPEG-4 and H.264/AVC streams. Delta frames were selected in the case of video streams, as they are the prevailing ones. The frames were encoded using SIF resolution (320×240). The authors used frames compressed under several compression ratios. The test set contained losslessly compressed frames as well.

3.2 Experiment methodology

Methodologies for subjective tests can differ, depending on the specific test requirements. In order to fulfill the requirements, one method can be used, or a composition of a few of the existing methods can be applied.

Subjective tests of image quality should be conducted using diverse and numerous groups of expert users. The results of the tests should allow for calculating the mean opinion score (MOS) [15] for all of the types of pictures presented in the tests.
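The MOS itself is simply the average of the collected category grades per test condition; a tiny sketch (with made-up data and our own names) is given below.

```python
from collections import defaultdict
from statistics import mean

def mos_per_condition(responses):
    """responses: iterable of (condition, grade) pairs with grades on the
    1-5 scale of Table 2; returns the mean opinion score per condition."""
    grades = defaultdict(list)
    for condition, grade in responses:
        grades[condition].append(grade)
    return {condition: mean(g) for condition, g in grades.items()}

print(mos_per_condition([("JPEG, CR=50", 3), ("JPEG, CR=50", 2),
                         ("H.264/AVC, CR=50", 2), ("H.264/AVC, CR=50", 1)]))
# -> {'JPEG, CR=50': 2.5, 'H.264/AVC, CR=50': 1.5}
```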

The methodology for different subjective test methods is described in [15]. The first method, called the double stimulus impairment scale (DSIS), operates on five-level grading. The reference image is always shown against the distorted one. Assessment of the image quality refers to the distortion level, not the absolute image quality. The second method is called the double stimulus continuous quality scale (DSCQS). The picture quality is assessed on a continuous quality scale from excellent to bad. Experts are not aware which picture is the reference one, so absolute image quality is assessed.

Another type of subjective test methodology is presented in [12]: single-stimulus and stimulus-comparison methods. In the single-stimulus method, a single image or sequence of images is presented and the responder (tester) provides an index of the entire presentation. The method applied in the test is a single-stimulus method called the adjectival categorical judgement method, the most adequate for the requirements and applied in the subjective tests in the presented research. Experts assign an image or image sequence to one of a set of categories that, typically, are defined in semantic terms. Categorical scales that assess image quality and image impairment have been used most often, and the ITU-R scales are given in Table 2 [12].


The test session consists of a series of assessment trials. These should be presented in a random order and, preferably, in a different random sequence for each observer. The test pictures or sequences are presented only once in the test session: the experiment normally ensures that the same image is not presented twice in succession with the same level of impairment [12].

The subjective tests were performed using the adjectival categorical judgement method (a single-stimulus method), as it reflects the perceptual response of a viewer who does not have access to the original content (the tester has no reference and cannot compare the quality of the original and distorted content directly). Hence, the overall quality is assessed based only on the transmitted (distorted) and decoded content. The subjective trial consisted of a number of images presented (one by one) to the testers, who were asked to grade the quality of the screened images according to the scale presented in Table 2. The test set included separated frames from MPEG-4 and H.264/AVC streams as well as JPEG, JPEG 2000, PNG and uncompressed images. Different compression ratios correspond to different image quality. The images in the JPEG, JPEG 2000 and PNG formats are cropped-out parts of a full panoramic picture, equivalent to the other frames used in the test.

3.3 Results

The MOS evaluations were performed using around 160 responders (most of them faculty staff members and students; however, according to [12], this does not influence the test results). Around 5,000 responses were collected. Figure 11 presents the MOS values for several visual codecs as a function of CR.

The plots should be interpreted as follows. As can be noticed, the overall score is below 4. Two effects contribute to such a low score. First, people generally tend not to give the boundary (1, 5) marks. Second, no images in SIF resolution were scored 5 due to their small dimensions.

Fig. 11 MOS for several visual codecs as a function of CR (logarithmic CR scale for better visualization). Except for the very low compression ratios, the proposed JPEG (2000)-based compression schemes, operating on full panoramic images, prevail over the traditional MPEG-4/H.264/AVC-based approaches, operating on frame sequences

Table 2 ITU-R quality and impairment scales

Quality         Impairment
5 – Excellent   5 – Imperceptible
4 – Good        4 – Perceptible, but not annoying
3 – Fair        3 – Slightly annoying
2 – Poor        2 – Annoying
1 – Bad         1 – Very annoying


The “JPEG” and “JPEG 2000” plots as well as the “PNG” point show the quality scores for the corresponding frames extracted from the panoramic image. Clearly, there is an advantage (higher MOS) of the panorama-based compression scheme while the CR is between around 20 and 100.

As seen in Fig. 11, the MPEG-4 codec does not allow for achieving MOS > 2 for CR > 20. The MOS for the H.264/AVC codec drops to 2 for CR between 30 and 40. Fortunately, there is an improvement in quality if cropped-out panoramic images (instead of video sequence frames) are used. When compressing the still panoramic image using the regular JPEG codec, it is possible to achieve MOS > 2 for CR up to 50. In the case where the more modern JPEG 2000 codec is used, the CR threshold is up to 60.

The benefits of the proposed compression approach in various wireless systems are highlighted on the basis of the simulated transmission time of 14-frame-long sequences (Fig. 12). The assumed quality was MOS = 2.5.
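The simulation behind Fig. 12 is essentially a payload-over-bit-rate calculation; the sketch below illustrates the idea with bearer rates for GPRS, EDGE and UMTS that are rough assumptions of ours, not the values used in the paper.

```python
def transmission_time_s(uncompressed_bytes, cr, bearer_kbps):
    """Time needed to send data compressed with ratio CR over a bearer
    of the given (illustrative) bit rate."""
    compressed_bits = uncompressed_bytes * 8 / cr
    return compressed_bits / (bearer_kbps * 1000)

# 14 uncompressed SIF frames (320 x 240 pixels, 3 bytes each), CR = 40.
raw_bytes = 14 * 320 * 240 * 3
for bearer, kbps in [("GPRS", 40), ("EDGE", 200), ("UMTS", 384)]:
    print(bearer, round(transmission_time_s(raw_bytes, 40, kbps), 1), "s")
```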

4 Conclusions and future work

This paper described an innovative compression method for panoramic videos utilising MPEG-7-based image registration. The proposed solution detects the overlaps of a series of individual video frames in order to produce concatenated panoramic images. The concatenation is done using detection of overlaps with the MPEG-7 edge histogram descriptor, which is easy to implement even in simple devices. The presented approach allows for achieving lower transmission volumes with sustained picture quality, which consequently allows for shortening transmission times. The compression gain relates to the desired MOS (mean opinion score) level. As panoramic images are widely used in surveillance applications, scientific experiments and, most widely, in commercial applications (for example, slowly rotating, high-quality cameras with low-speed data links installed at mountain peaks, transmitting panoramic video sequences from skiing areas in the winter season), the approach considers distribution using links based on mobile bearers such as GPRS, EDGE and UMTS. As further work, the authors leave the examination of the method's efficiency at various rotation speeds and FPS ratios, as well as comparing the results with state-of-the-art motion compensation methods.

The work has been brought to the stage of a working prototype. The next stage is the implementation of a stand-alone codec. Such software would allow for the last step of the work: an example implementation based on the above-described idea and the codec, creating a product ready to be used.

Fig. 12 Simulated transmission times of a 14-frame sequence. The same compression principles as in Section 2.4 have been used here


In addition, some work can be done in the area of optimisation. It is possible to optimise the algorithm so that it could transmit the panoramic images in real time.

Acknowledgements The work presented in this paper was supported in part by the Polish State Ministry of Science and Higher Education under the Grants No. NN517438833 and 4T11D00525, as well as by the European Commission under the grant no. FP6-0384239 (Network of Excellence CONTENT).

Appendix 1 Description of overlap coordinates detection algorithm

A recursive algorithm is implemented [1]. It starts with a zero translation vector: (x0, y0) = (0, 0). In the first step the x1 coordinate is searched for. This coordinate pinpoints the best place of intersection of the images along the horizontal axis (with no vertical offset). The distance value D(x1, y0) of the image parts overlapping at (x1, y0) is calculated for evaluation of the progress of the algorithm. After calculating the horizontal offset, a vertical offset is searched for: for the found x1 coordinate, the y1 coordinate is searched for, and the distance value D(x1, y1) of the image parts overlapping at (x1, y1) is calculated. If the result is better than the previous one, i.e. D(x1, y1) < D(x1, y0), the above algorithm is repeated, aiming at finding (x2, y1) with D(x2, y1). Otherwise, it terminates. The comparison of descriptor values calculated in the previous steps allows assessing whether the result is closer to the optimal one. If any progress is made (the latter distance value is smaller than the previous one), the algorithm continues. Otherwise, the previous coordinates pinpoint the best location for the intersection of the images. The algorithm is depicted in Fig. 3 in Section 2.2. Horizontal and vertical coordinates are calculated in a similar way.
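A sketch of this refinement loop is given below; search_x, search_y and distance are hypothetical helpers standing for the sliding-window searches of Section 2.2 and the EHD distance of Eq. (1) computed on the parts overlapping under a translation (x, y).

```python
def refine_overlap(left_img, right_img, search_x, search_y, distance):
    """Alternate between horizontal and vertical searches, keeping the new
    coordinates only while the EHD distance of the overlapping parts keeps
    decreasing (the loop of Fig. 3 / Appendix 1)."""
    x, y = 0, 0                                   # zero translation vector (x0, y0)
    x = search_x(left_img, right_img, y)          # x1: best horizontal intersection for y = 0
    d_x = distance(left_img, right_img, x, y)     # D(x1, y0)
    while True:
        y_new = search_y(left_img, right_img, x)          # y_i for the current x
        d_xy = distance(left_img, right_img, x, y_new)    # D(x_i, y_i)
        if d_xy < d_x:                            # progress made: keep refining
            y = y_new
            x = search_x(left_img, right_img, y)  # x_{i+1} for the new y
            d_x = distance(left_img, right_img, x, y)
        else:                                     # no improvement: previous (x, y) is optimal
            return x, y
```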

References

1. Glowacz A, Grega M, Leszczuk M, Romaniak P (2006) Detecting panoramic image overlaps with MPEG-7 descriptors. In: Proceedings of the international conference on signals and electronic systems (ICSES ’06). Łódź, Poland

2. Halonen T, Romero J, Melero J (2003) GSM, GPRS and EDGE performance—evolution towards 3G/UMTS. Wiley, New York

3. Holma H, Toskala A (2004) WCDMA for UMTS. Wiley, New York

4. Hsieh JW (2004) Fast stitching algorithm for moving object detection and mosaic construction. Image Vis Comput 22:291–306

5. Hsu CT, Tsan YC (2004) Mosaics of video sequences with moving objects. Signal Process Image Commun 19:81–98

6. Huang F (2000) Epipolar geometry in concentric panoramas, research report CTU-CMP-2000-07. University of Auckland, Auckland, New Zealand

7. ISO Standard IS 14444-1:2004 (2004) Information technology—JPEG 2000 image coding system: core coding system

8. ISO Standard IS 14496-10 (2004) Information technology—coding of audio-visual objects—part 10: advanced video coding

9. ISO Standard IS 14496-2 (2004) Information technology—coding of audio-visual objects—part 2: visual

10. ISO/IEC Standard TR 15938 (2005) Information technology—multimedia content description interface

11. ISO/IEC 15948 (2004) Information technology—computer graphics and image processing—portable network graphics (PNG): functional specification

12. ITU-R Recommendation BT.500-11 (2002) Methodology for the subjective assessment of the quality of television pictures. Geneva, Switzerland

13. ITU-T Recommendation H.264 (2005) Advanced video coding for generic audiovisual services

14. ITU-T Recommendation T.81 (1992) Information technology—digital compression and coding of continuous-tone still images—requirements and guidelines


15. ITU-T Recommendation P.800 (1996) Methods for subjective determination of transmission quality. Geneva, Switzerland

16. ITU-T Recommendation T.800 (2004) Information technology—JPEG 2000 image coding system: core coding system

17. Kaaranen H, Ahtiainen A, Laitinen L, Naghian S, Niemi V (2005) UMTS networks—architecture, mobility and services. Wiley, New York

18. Kamisetty C, Jawahar CV (2003) Multiview image compression using algebraic constraints. In: Proc. IEEE region 10 conference on convergent technologies (TENCON). Bangalore, India, pp 927–931

19. Kim DH, Yoon YI, Choi JS (2003) An efficient method to build panoramic image mosaics. Pattern Recogn Lett 24:2421–2429

20. Manjunath BS, Salembier P, Sikora T (2002) Introduction to MPEG-7 multimedia content description interface. Wiley, Chichester, England

21. McLauchlan PF, Jaenicke A (2003) Image mosaicing using sequential bundle adjustment. Image Vis Comput 20:751–759

22. Pardyka I (2006) Homography-based panoramic image sequence compression method. ICSES ’06. Lodz, Poland, pp 293–296

23. Pardyka I (2006) Panoramic image sequence compression method. Visualization, Imaging, and Image Processing (VIIP 2006). ACTA, Mallorca, Spain, pp 282–286

24. Shah M, Javed O, Shafique K (2007) Automated visual surveillance in realistic scenarios. IEEE Multimedia 14(1):30–39

25. Szeliski R (2006) Image alignment and stitching: a tutorial. Foundations Trends Comput Graphics Vis 2(1):1–104

26. Tian GY, Gledhill D, Taylor D (2003) Comprehensive interest points based imaging mosaic. Pattern Recogn Lett 24:1171–1179

27. Traka M, Tziritas G (2003) Panoramic view construction. Signal Process Image Commun 18:465–481

28. Zhang C, Long Y, Kurdahi F (2005) Embedded computer systems: architectures, modeling, and simulation. In: Proceedings of the 5th international workshop SAMOS (SAMOS ’05) (Samos, Greece, July 18–20, 2005). Springer, Lecture Notes in Computer Science, vol. 3553. Berlin/Heidelberg, p 334

29. Zitova B, Flusser J (2003) Image registration methods: a survey. Image Vis Comput 21:977–1000

Andrzej Glowacz received his M.Sc. in Telecommunications with first class honours and the Golden Medal of Stanislaw Staszic for Best Graduate from the AGH University of Science and Technology, Krakow, in 2002. He has also been awarded five nationwide distinctions for his Master Thesis. He finished Ph.D. studies in Computer Science and obtained a Ph.D. in Telecommunications in 2007. Currently he is an Assistant Professor at the AGH University of Science and Technology. Andrzej Glowacz has been working as an IT Expert in numerous commercial projects, grants of the Polish Ministry of Science and Education and the European research projects DAIDALOS, DAIDALOS 2, CARMEN, EuroNGI, EuroFGI, OASIS Archive, and GAMA. His main professional areas are image processing and recognition, wireless QoS, modern transport protocols, network simulations, advanced Linux programming, and operating systems. He is the author of over forty scientific papers and technical reports, and he also serves as a reviewer for several international journals and conferences.


Michał Grega started his university education at the AGH University of Science and Technology in Cracow, Poland in 2001. In 2004 he qualified for the EU “Socrates–Erasmus” program and studied one semester at the Tampere University of Technology, Tampere, Finland. In 2006 he presented the Master Thesis “Trust Management in Ad-Hoc Networks”, which received the highest possible grades, and he received a diploma with honors. In 2006 he passed the exams for Ph.D. studies at the Department of Telecommunications, AGH University of Science and Technology. In 2006 and 2007 he was awarded a “Sapere Auso” Ph.D. student grant, funded by the European Commission. In 2005 he joined a working team at the Department of Telecommunications at the AGH University of Science and Technology, where he took part in several national and European projects. He is an author of 10 publications and a book chapter.

Piotr Romaniak received his M.Sc. Eng. degree in Telecommunications (2006) and is currently a Ph.D. student in the Department of Telecommunications at the AGH University of Science and Technology (Krakow, Poland). His areas of interest and skills include multimedia systems, content-based indexing, video and image quality evaluation (objective and subjective techniques), video streaming and video contribution technologies (TV-studio quality, montage purposes), and wide programming skills. Piotr Romaniak was or is involved in Culture 2000, eContentPlus, FP6 and FP7 projects. He was also involved in a few research activities: for a top Polish telecommunication services provider (assessment of 3rd generation fax image quality, subjective and objective evaluation methods) and in cooperation with other educational entities (panoramic image concatenation, subjective tests - MOS).


Mikołaj Leszczuk is an assistant professor at the Department of Telecommunications (AGH University of Science and Technology, Krakow, Poland). He received his M.Sc. in Electronics and Telecommunications in 2000 and his Ph.D. degree in Telecommunications in 2006, both from the AGH University of Science and Technology. He is currently lecturing on Digital Video Libraries, Information Technology and Basics of Telecommunications. In 2000 he visited Universidad Carlos III de Madrid (Madrid, Spain) on a scientific scholarship. During 1997-1999 he served for several Comarch holding companies as Manager of the Research and Development Department, President of the Management and Manager of the Multimedia Technologies Department. He has participated actively as a steering committee member or researcher in several national and European projects, including INDECT, BRONCHOVID, GAMA, e-Health ERA, PROACCESS, Krakow Centre of Telemedicine, CONTENT, E-NEXT, OASIS Archive, and BTI. His current activities are focused on e-Health, multimedia for security purposes, P2P, image/video processing (for general purposes as well as for medicine) and the development of digital video libraries, particularly video summarization, indexing, compression and streaming subsystems. He has been a chairman of sessions of the “e-Health in Common Europe” conferences. He has been a member of the IEEE since 2000. He has served as an expert for the European Framework Programme and the Polish State Foresight Programme, and as a reviewer for several scientific conferences and journals.

Zdzisław Papir is a professor at the Department of Telecommunications (AGH University of Science and Technology). He received the M.Sc. degree in Electrical Engineering in 1976 and the Ph.D. degree in Computer Networks, both from the AGH University of Science and Technology. In 1992 he received the Dr Hab. degree from the Technical University of Gdansk. He is currently lecturing on Signal Theory, Modulation and Detection Theory, and Modelling of Telecommunication Networks. During 1991-98 he made several visits to universities in Belgium, Germany, Italy, and the US, working on traffic modelling. During 1994-95 he served for the Polish Cable Television as a Design Department Manager. Since 1995 he has also served as a consultant in the area of broadband access networks for Polish telecom operators. He has authored five books and about 60 research papers.


He was involved in the organization of several international conferences at home and abroad. He is a guest editor for IEEE Communications Magazine, responsible for the Broadband Access Series. He is a member of the editorial board of the book series “Global Information Society” (in Polish). He has been participating in several R&D IST European projects.

Ignacy Pardyka is a Senior Lecturer at the University of Jan Kochanowski in Kielce (UJK), Poland. He received his M.Eng. degree in Electronics and his Ph.D. in EE from the Warsaw University of Technology in 1975 and 1987, respectively. He was a lecturer at the Kielce University of Technology (1978-2001) and has been at the UJK since 2001. Currently his research activity is focused on image and video compression methods.
