
Apr 09, 2019


Human Detection using HDR Images

Tomoya Miyoshi, Takao Jinno and Shigeru Kuriyama
Toyohashi University of Technology

Abstract: This paper proposes a robust human detection method using high dynamic range (HDR) images, which can avoid over- or under-exposure under steep lighting changes. Such images have the potential to support robust object detection in very dark or very bright scenes. Directly applying them to ordinary object detection algorithms, however, increases computational and storage cost. In addition, compressing the dynamic range of HDR images into a low dynamic range (LDR) often over-boosts object features. This paper therefore investigates which properties of existing tone-mapping methods are useful or obstructive when applying HDR images to human detection with boosted HOG features.

Keywords: Human detection, HOG feature, HDR image, tone-mapping.

I. INTRODUCTION

HDR images can capture high dynamic range scenes without over- or under-exposure since they have a wider dynamic range and higher bit-depth than LDR images. This property is suitable for robustly detecting objects in night scenes, including backlit scenes and scenes with sudden or temporary changes in illumination. Directly applying HDR images to ordinary object detection algorithms, however, requires modifying those algorithms because they are designed for LDR images. The pixel values of LDR images captured with an ordinary general-purpose digital camera are non-linear with respect to sensor irradiance. HDR images, on the other hand, have linear, high bit-depth pixel values. This inconsistency between the acquisition processes of LDR and HDR images decreases the accuracy of object detection.

Tone-mapping methods were introduced to compress the dynamic range and reduce the bit-depth of HDR images so that they can be displayed on an ordinary monitor. Although these methods can preserve minute details through local or non-linear processing, they often degrade detection performance.

This paper introduces various types of tone-mapping in order to compare their performance on a human detection task based on HOG features. We clarify the difficulty of utilizing HDR images for object detection by analyzing the side effects caused by these tone-mappings.

II. DATASET

Directly applying HDR images to human detection often increases computational and storage cost. We therefore compress their dynamic range into an LDR one and apply the result to our human detector, which uses HOG features [1] boosted through Real AdaBoost [2]. We introduce two separate datasets, one for boosting and one for detection, and regard training data consisting of conventional LDR images as the optimal dataset for boosting. Since our human detection mechanism using HDR images should remain compatible with existing detectors, an existing dataset was used for training. The testing data, on the other hand, were collected by the authors and consist of HDR images whose dynamic range is compressed by local or global tone-mapping.
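As a rough illustration of the boosting step, the following sketch fits one domain-partitioning weak classifier in the style of Real AdaBoost [2], where each bin of a scalar feature outputs the confidence 0.5 ln(W+ / W-). The function name, the equal-width binning, the bin count, and the smoothing constant `eps` are illustrative assumptions and are not taken from this paper; the scalar feature stands in for a single HOG component.

```python
import numpy as np

def fit_real_adaboost_stump(feature, labels, weights, n_bins=8, eps=1e-6):
    """Fit one confidence-rated weak classifier on a scalar feature.

    feature: 1-D array of feature values (hypothetically one HOG bin).
    labels:  1-D array of +1 (human) / -1 (non-human).
    weights: 1-D array of current sample weights (summing to 1).
    Returns the bin edges and the per-bin confidence 0.5 * ln(W+ / W-).
    """
    # Equal-width partition of the feature range (an assumption).
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    # Interior edges give bin indices in 0 .. n_bins - 1.
    bins = np.clip(np.digitize(feature, edges[1:-1]), 0, n_bins - 1)
    # Accumulate the weight of positive and negative samples per bin.
    w_pos = np.bincount(bins, weights=weights * (labels == 1), minlength=n_bins)
    w_neg = np.bincount(bins, weights=weights * (labels == -1), minlength=n_bins)
    # eps keeps the log finite when a bin holds only one class.
    confidence = 0.5 * np.log((w_pos + eps) / (w_neg + eps))
    return edges, confidence
```

Bins dominated by positive samples receive a positive confidence and bins dominated by negative samples a negative one; a full detector would sum such outputs over many selected features.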

We assume that the HDR images are obtained by fusing multi-exposure images as follows [3]:

$$I_{HDR} = \frac{\sum_{p=1}^{P} w(I_{LDR,p}) \cdot f^{-1}(I_{LDR,p}) / \Delta t_p}{\sum_{p=1}^{P} w(I_{LDR,p})} \tag{1}$$

where $p$ is the index of the exposure images, $P$ is the number of exposures, $\Delta t_p$ is the shutter speed of the $p$-th exposure, $f^{-1}$ denotes the inverse of the camera response curve, and $w$ denotes the weighting function.
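The fusion in equation (1) can be sketched in a few lines of NumPy. The paper does not specify the weighting function or the response curve, so the hat-shaped weight `hat_weight` and the gamma-curve `inverse_response` below are placeholder assumptions, as are all function names; pixel values are assumed to be normalized to [0, 1].

```python
import numpy as np

def hat_weight(z):
    """Hypothetical weighting w: favors mid-range pixels, zero at 0 and 1."""
    return 1.0 - np.abs(2.0 * z - 1.0)

def inverse_response(z, gamma=2.2):
    """Hypothetical inverse response f^{-1}: a simple gamma curve."""
    return np.power(z, gamma)

def fuse_exposures(ldr_images, exposure_times, eps=1e-8):
    """Equation (1): weighted average of per-exposure radiance estimates."""
    numerator = np.zeros_like(ldr_images[0], dtype=np.float64)
    denominator = np.zeros_like(numerator)
    for ldr, dt in zip(ldr_images, exposure_times):
        w = hat_weight(ldr)
        # f^{-1}(I) / dt is the radiance estimated from this exposure.
        numerator += w * inverse_response(ldr) / dt
        denominator += w
    # Guard against pixels whose weight is zero in every exposure.
    return numerator / np.maximum(denominator, eps)
```

With three exposures at the optimal value and ±2 EV, `exposure_times` would be proportional to 0.25, 1, and 4.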

The testing dataset of 531 HDR images was composed by fusing three exposure images, captured at the optimal exposure value calculated by the automatic exposure control and at ±2 EV from it. The training dataset of 531 LDR images consists of only the optimal-exposure images, as shown in Figure 1. These datasets include 203 night scenes, as shown in Figure 1(b), and some images have a wide dynamic range caused by car headlights or sparse street lamps.

We selected 1386 positive and 7654 negative samples from the training dataset for boosting the classifiers, as shown in Figure 2. We intentionally excluded night scenes containing much dark-current noise and many saturated regions in order to avoid undesirable effects on boosting.

III. TONE-MAPPING

Our method introduces global and local tone-mappings based on [4]. The global tone-mapping is given as

$$L_{GTM} = \frac{L}{1 + L} \cdot \left(1 + \frac{L}{L_w^2}\right), \tag{2}$$

$$L = \frac{\alpha}{\bar{L}_{HDR}} \cdot L_{HDR}, \tag{3}$$

where the scaling parameter $\alpha$ takes the default value 0.18 from [4], $\bar{L}_{HDR}$ is the logarithmic mean of $L_{HDR}$, and $L_w$ is the smallest luminance that is mapped to pure white. This paper sets [...]

The local tone-mapping is given as

$$L_{LTM} = \frac{L}{1 + V}, \tag{4}$$

where $V$ is an edge-preserving smoothed luminance computed with a fast bilateral filter [5] to accelerate the conversion.
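Equations (2) to (4) translate almost directly into NumPy, as the following sketch shows. The value this paper uses for the white point is not recoverable from the text, so defaulting `l_white` to the maximum scaled luminance is purely an assumption. The bilateral filtering of [5] is not implemented here: the smoothed luminance `smoothed` is expected from the caller, and computing it from the scaled luminance is also an assumption. All function names are hypothetical.

```python
import numpy as np

def log_mean(lum, delta=1e-6):
    """Logarithmic mean of the luminance; delta avoids log(0)."""
    return float(np.exp(np.mean(np.log(lum + delta))))

def scale_luminance(lum_hdr, alpha=0.18):
    """Equation (3): scale the HDR luminance by alpha over its log mean."""
    return alpha / log_mean(lum_hdr) * lum_hdr

def global_tone_map(lum_hdr, l_white=None, alpha=0.18):
    """Equation (2): global operator with white point l_white."""
    L = scale_luminance(lum_hdr, alpha)
    if l_white is None:
        # Assumption: map the brightest scaled pixel to pure white.
        l_white = float(L.max())
    return L / (1.0 + L) * (1.0 + L / l_white ** 2)

def local_tone_map(lum_hdr, smoothed, alpha=0.18):
    """Equation (4): divide by an edge-preserving smoothed luminance V.

    `smoothed` is V, supplied by the caller (for example from a bilateral
    filter applied to the scaled luminance).
    """
    L = scale_luminance(lum_hdr, alpha)
    return L / (1.0 + smoothed)
```

With the assumed default white point, the brightest pixel maps exactly to 1 because L / (1 + L) · (1 + 1 / L) = 1 when L equals the white point.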

Fig. 1. Examples of LDR images: (a) daytime scenes, (b) night scenes.

Fig. 2. Training data: (a) positive data, (b) negative data.

Three types of tone-mapping are applied to compress the dynamic range of the testing dataset, which we categorize as follows. Global tone-mapping (GTM) compresses the input HDR images using the global tone-mapping of equation (2). Local tone-mapping (LTM) compresses the input HDR images using the local tone-mapping of equation (4). The virtual camera model within each window (VCM) compresses the input HDR images within each detection window using a virtual camera model: we estimate the optimal exposure for capturing images with the response function [6] of a virtual camera and adaptively set the shutter speed for every window. We comparatively evaluate these three methods against the automatic camera model (ACM), which uses only the image captured at the automatically controlled optimal exposure.
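The per-window virtual camera can be pictured with the following minimal sketch. The paper does not state how the optimal exposure is estimated, so the rule used here (choose the shutter speed that maps the window's mean radiance to a mid-gray target of 0.18) and the gamma-curve response are assumptions for illustration only, as is the function name.

```python
import numpy as np

def virtual_camera(window_hdr, target=0.18, gamma=2.2):
    """Render one HDR detection window as an LDR image.

    Assumption: the shutter speed is chosen so that the mean radiance of
    the window lands on `target`; the response curve is a plain gamma.
    Returns the LDR window in [0, 1] and the chosen shutter speed.
    """
    mean_radiance = float(np.mean(window_hdr))
    shutter = target / max(mean_radiance, 1e-12)
    # Exposure = radiance * shutter time, clipped like a saturating sensor.
    exposure = np.clip(window_hdr * shutter, 0.0, 1.0)
    ldr = np.power(exposure, 1.0 / gamma)
    return ldr, shutter
```

Because the shutter speed is recomputed for every window, a dark pedestrian next to a headlight is exposed independently of the headlight as long as the two fall into different windows.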

Our human detector slides the detection window six times, varying the window size and position in the order shown in Figure 3, in order to find the window that just encloses a human. As this process often detects multiple windows per person, our method regards every window that encloses a whole human body as a correct region, as shown in Figure 4. Each of the above tone-mappings is then used to detect a human within each window.

Fig. 3. Sliding order of detection window

Fig. 4. Classification of correct and false images
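The rule that a window counts as correct when it encloses a whole human body can be written as a simple rectangle containment test. The `Box` type, the function names, and the coordinates in the usage lines are hypothetical and serve only to illustrate the criterion.

```python
from typing import List, NamedTuple

class Box(NamedTuple):
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int

def encloses(window: Box, body: Box) -> bool:
    """True if the detection window fully contains the body rectangle."""
    return (window.x <= body.x
            and window.y <= body.y
            and window.x + window.w >= body.x + body.w
            and window.y + window.h >= body.y + body.h)

def label_windows(windows: List[Box], body: Box) -> List[bool]:
    """Mark each detected window as correct (True) or false (False)."""
    return [encloses(win, body) for win in windows]

# Usage with made-up coordinates: the second window cuts off the left
# side of the body and is therefore labeled as false.
body = Box(40, 20, 30, 80)
windows = [Box(30, 10, 64, 128), Box(50, 10, 64, 128)]
labels = label_windows(windows, body)
```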

IV. EXPERIMENTAL EVALUATIONS

A. Evaluation for daytime and night scenes

Figure 5 compares the three tone-mappings described above and the automatic camera model in daytime scenes. The GTM is suitable for human detection because it can compress the HDR image without saturation and without strongly enhancing details. The image compressed by the GTM, however, is often rather bright because of the scaling parameter in equation (3). This often enhances the dark-current noise, which decreases the detection accuracy. The ACM often produces saturated regions, so its detection accuracy is lower than that of the GTM. The VCM reduces the saturation of the ACM, but the compressed image often becomes very bright. The detection accuracy of the VCM is approximately the same as that of the ACM. The LTM enhances the details of both the human and the background, which increases the number of false detections.

Figure 6 shows the same comparison for night scenes. The GTM also achieves high detection accuracy for night scenes. Since night scenes have many saturated regions, the detection accuracy of the ACM is significantly degraded. The saturation improvement of the VCM is effective for night scenes. Although the LTM enhances the details of both the human and the background, its dusky mapping suppresses the effect of the enhanced background because the background of night scenes mostly lies in dark regions. The accuracy of the LTM is therefore approximately the same as that of the GTM.

We summarize the above observations in Table I. The GTM best satisfies the three conditions considered there (saturation, detail enhancement, and brightness of the scaled image), and it therefore achieved the highest detection accuracy in our experiment.

Fig. 5. Performance curves for daytime scenes

Fig. 6. Performance curves for night scenes

TABLE I. CONCLUSION OF THE COMPARATIVE EVALUATION OF TONE-MAPPING ALGORITHMS

| Method | Saturation region | Detail enhancement | Brightness of scaled image | Daytime result | Night result |
|--------|-------------------|--------------------|----------------------------|----------------|--------------|
| ACM    | Many              | No                 | No scaling                 | Medium         | Low          |
| VCM    | Few               | No                 | Bright                     | Medium         | Medium       |
| GTM    | None              | No                 | Rather bright              | High           | High         |
| LTM    | None              | Yes                | Dusky                      | Low            | High         |

B. Effect of the noise of HDR image

Fusing multi-exposure images often causes undesirable effects called dark-current noise (Figure 8) and ghosts (Figure 9). The dark-current noise in HDR images significantly decreases the detection accuracy in dark regions. For this reason, we suppressed this noise by improving the weighting function in equation (1). Although many ghost removal algorithms have been proposed, most of them are unsuited to human detection because of the large error in the remapped image. Our human detection should therefore remove both noise and ghosts when synthesizing HDR images.

V. CONCLUSION

We have proposed a basic framework for human detection with HDR images to improve accuracy, especially for night scenes disturbed by very bright lighting. The dynamic range of an input HDR image is compressed so that existing human detectors for LDR images can be adopted.

The accuracy of human detection depends on the algorithm used to compress the dynamic range. We have comparatively evaluated three types of tone-mapping algorithms and have found that reducing saturation and applying optimal scaling are effective for robust human detection. The enhancement of contrast or details by local tone-mapping, however, decreases the detection accuracy.

Consequently, the GTM is currently the best choice for HDR-based human detection, and developing a new tone-mapping algorithm by extending the GTM is our future work.

Fig. 7. Comparative evaluation of tone-mapping algorithms (VCMw: correct, GTM: correct, LTM: false)

Fig. 8. Effect of dark-current noise in HDR image (original HDR image: false, modified HDR image: correct)

Fig. 9. Effect of the ghost in HDR image (OPT: correct, VCMw: false, GTM: false, LTM: false)

REFERENCES

[1] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” IEEE CVPR, pp. 886-893, 2005.

[2] R. E. Schapire and Y. Singer, “Improved Boosting Algorithms Using Confidence-rated Predictions,” Machine Learning, Vol. 37, 1999.

[3] E. Reinhard, W. Heidrich, P. Debevec, S. Pattanaik, G. Ward and K. Myszkowski, “High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting,” Morgan Kaufmann Publishers, 2010.

[4] E. Reinhard, M. Stark, P. Shirley and J. Ferwerda, “Photographic tone reproduction for digital images,” ACM SIGGRAPH ’02, pp. 267-276, 2002.

[5] J. Chen, S. Paris and F. Durand, “Real-time edge-aware image processing with the bilateral grid,” ACM SIGGRAPH ’07, 2007.

[6] P. E. Debevec and J. Malik, “Recovering High Dynamic Range Radiance Maps from Photographs,” ACM SIGGRAPH ’97, pp. 369-378, 1997.