
Machine Vision and Applications (1998) 10: 308–320 © Springer-Verlag 1998

Extracting characters of license plates from video sequences

Yuntao Cui, Qian Huang

    Siemens Corporate Research, 755 College Road East, Princeton, NJ 08536, USA; e-mail: {cui,huang}@scr.siemens.com

    Received: 13 August 1997 / Accepted: 7 October 1997

Abstract. In this paper, we present a new approach to extract characters on a license plate of a moving vehicle, given a sequence of perspective-distortion-corrected license plate images. Different from many existing single-frame approaches, our method simultaneously utilizes spatial and temporal information. We first model the extraction of characters as a Markov random field (MRF), where the randomness is used to describe the uncertainty in pixel label assignment. With the MRF modeling, the extraction of characters is formulated as the problem of maximizing the a posteriori probability based on given prior knowledge and observations. A genetic algorithm with a local greedy mutation operator is employed to optimize the objective function. Experiments and a comparison study were conducted, and some of our experimental results are presented in the paper. It is shown that our approach provides better performance than other single-frame methods.

Key words: Document analysis – Binarization – Image sequence analysis

    1 Introduction

Automatic recognition of car license plates plays an important role in traffic surveillance systems. Recently, we have seen quite a few computer-vision-based systems that recognize license plates [2, 8, 9, 13]. Most existing systems focus on the development of a reliable optical character recognizer (OCR). However, prior to the recognition an OCR system performs, the characters have to be extracted from the license plates. To simplify the extraction problem, the existing systems assume that a license plate is a rectangular area containing a number of dark characters on a white background. With this assumption, various approaches exist that extract characters using global threshold methods [8, 9, 13], sometimes with global contrast enhancement prior to the extraction [2]. Unfortunately, these methods do not work well in most real applications because of the following factors: 1) the low resolution of the characters on the plate, due to the application requirement that the entire car has to be visible in the image; 2) global thresholding or enhancement methods work well only when the plate is uniformly illuminated and not too noisy, which usually is not the case in real applications.

Correspondence to: Y. Cui

Takahashi et al. [19] proposed a morphology-based thresholding method to improve the performance in extracting characters from license plates. They viewed a character as a combination of ditches, where a ditch is formed by two edges with opposite directions. Then, they designed appropriate morphological operators to enhance the area within the ditch (between the two edges). Although this method is suitable when the contrast between the character and the background is strong, it is very difficult to pick up the correct locations of edges in low-contrast images, a situation that often occurs in real applications.

Another set of solutions to extract the characters of a license plate is to use adaptive thresholding (see [20] for a survey of binarization methods). Unlike global methods, adaptive approaches find the thresholds based on information from local regions. Therefore, they are capable of dealing with non-uniformly illuminated license plate images. However, the performance of these adaptive thresholding methods depends on the selection of the local regions. Another problem is that these methods tend to generate broken characters, as a single character may belong to different local regions that have different local thresholds. To deal with this problem, some algorithms apply region growing to fill the holes [11, 23].
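For reference (this is not the method proposed in this paper), a minimal sketch of local adaptive thresholding of a cropped plate image, assuming OpenCV is available; the file names, the 15x15 block size, and the offset are illustrative choices, not values from the paper:

import cv2

# Load the cropped plate region as a grayscale image (hypothetical file name).
plate = cv2.imread("plate.png", cv2.IMREAD_GRAYSCALE)

# Local (adaptive) thresholding: each pixel is compared against the mean of its
# 15x15 neighborhood minus a small offset, so non-uniform illumination across
# the plate matters less than with a single global threshold.
binary = cv2.adaptiveThreshold(
    plate, 255,
    cv2.ADAPTIVE_THRESH_MEAN_C,   # local mean defines the threshold surface
    cv2.THRESH_BINARY_INV,        # dark characters become foreground (255)
    15, 5)

cv2.imwrite("plate_binary.png", binary)

As the text notes, such a sketch still suffers from broken characters when a character straddles local regions with different thresholds.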

Another alternative to improve the extraction is to use additional data constraints from multiple frames. Multiframe methods have been shown to be effective in image restoration tasks [16, 21]. In this paper, we address the issue of extracting characters by simultaneously utilizing spatial and temporal information. In our approach, the extraction of characters from license plate images is modeled as a 3D Markov random field (MRF), where the randomness is used to describe the uncertainty in the label assignment of pixels. As a result, the prior knowledge which promotes consistency between adjacent pixels can be represented in terms of the clique functions associated with the underlying Gibbs probability distribution function (pdf) describing the MRF.


Under the MRF modeling assumption, the extraction problem can be formulated as an optimization problem of maximizing the a posteriori probability given prior knowledge and observations. Then, a genetic algorithm with a local greedy mutation operator is employed to optimize the objective function.

The paper is organized as follows. In Sect. 2, we use the MRF model to formulate the character extraction problem as an optimization problem. Then, in Sect. 3, we apply a genetic algorithm with our local greedy mutation operator to optimize the objective function. Section 4 describes the preprocessing step of extracting license plates from the images in a sequence. In Sect. 5, we show the experimental results. Finally, we draw our conclusions in Sect. 6.

    2 The MRF-model-based character extraction

The MRF model, as an extension of the one-dimensional Markov process, has attracted much attention in the image processing and computer vision community (e.g., [3, 4]). MRF models can be used to incorporate prior contextual information or constraints in a quantitative way. Local spatial/contextual dependencies can be utilized to perform binarization [7]. Another advantage of the MRF model is that it tends to be local, and hence it is suitable for parallel hardware implementation. In this section, we present an MRF-model-based approach to extract characters from multiple frames. The model combines the prior knowledge and observations from both spatial and temporal dimensions into a unified framework.

    2.1 Problem statement

Our problem is to extract characters from a moving vehicle. The extracted results can be fed into an OCR to perform automatic recognition. The developed approach can be used in many applications, including automatic traffic violation control, automatic parking lot billing, etc. Formally, let $y_l$ be the $l$th frame which contains a license plate, where $l = 1, 2, \ldots, n$. Our goal is to extract one rectangular image $I$ of the license plate with size $N_1 \times N_2$. Each pixel $z_{i,j} \in I$ is labeled either as 1 (character pixel) or 0 (background pixel).

    2.2 Motion model

Multiple frames of license plates are obtained at different time instants. Since the vehicle is moving and the view of the camera is not necessarily perpendicular to the incoming vehicle, the license plates from different frames have not only different sizes but also different perspective distortions. The first thing we need to do is transform each license plate $y_l$ into a rectangular image $f_l$ with size $N_1 \times N_2$. Any pixel $y^l_{i,j}$ on $y_l$ can be mapped onto $f^l_{i',j'}$ using a planar-surface motion model [1], where

$$i' = i + p_1 i + p_2 j + p_5 + p_7 i^2 + p_8 ij,$$
$$j' = j + p_3 i + p_4 j + p_6 + p_7 ij + p_8 j^2. \qquad (1)$$

The coefficients $p_i$ can be solved for if four correspondences are available. We will discuss this in detail in Sect. 4.
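As an illustration of Eq. 1 (a sketch under our own assumptions, not the authors' implementation), the eight coefficients can be recovered from four or more point correspondences by solving a small linear system; the function name, argument layout, and use of least squares are ours:

import numpy as np

def solve_motion_params(src_pts, dst_pts):
    """Solve the coefficients p1..p8 of the planar-surface motion model (Eq. 1)
    from point correspondences (at least four are needed).
    src_pts, dst_pts: sequences of (i, j) coordinates in the source and
    destination images, respectively."""
    A, b = [], []
    for (i, j), (ip, jp) in zip(src_pts, dst_pts):
        # i' = i + p1*i + p2*j + p5 + p7*i^2 + p8*i*j
        A.append([i, j, 0, 0, 1, 0, i * i, i * j]); b.append(ip - i)
        # j' = j + p3*i + p4*j + p6 + p7*i*j + p8*j^2
        A.append([0, 0, i, j, 0, 1, i * j, j * j]); b.append(jp - j)
    # Least squares handles the exactly determined four-point case as well as
    # an over-determined system built from more tracked correspondences.
    p, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return p  # [p1, p2, p3, p4, p5, p6, p7, p8]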

Fig. 1. A four-node clique (nodes B1, B2, B3, B4)

Fig. 2. The diagram of the plate localization and the correction of the perspective distortion (localize the plate in one image; extract feature points from the plate; track the feature points; estimate the motion parameters; correct the perspective distortion)

    2.3 The MRF model

After correction of the perspective distortion, we have a stack of equally sized license plate images, based on which we intend to obtain $I$ using an MRF-model-based approach. Let $S$ be a finite set of $N_1 \times N_2$ sites. Consider $z$ to be a binary random field and $g = \{g_{i,j}, (i,j) \in S\}$ to be a neighborhood system on $S$, such that

1. $(i,j)$ is not in $g_{i,j}$,
2. $(i,j) \in g_{k,l}$ if and only if $(k,l) \in g_{i,j}$, for all $(i,j), (k,l) \in S$.

We say $z$ is an MRF with respect to $g$ if and only if [4]:

1. $P(z = s) > 0$ for any realization $s$ of $z$,
2. $P(z_{i,j} = s_{i,j} \mid z_{k,l} = s_{k,l}, (i,j) \neq (k,l)) = P(z_{i,j} = s_{i,j} \mid z_{k,l} = s_{k,l}, (k,l) \in g_{i,j})$,

where $P(\cdot)$ and $P(\cdot \mid \cdot)$ are the joint and conditional pdfs, respectively.


Fig. 3. Five images of a moving vehicle

An important feature of the MRF model is that its joint pdf has a general form, known as the Gibbs distribution, defined based on the concept of cliques [4]. A clique is a subset $C \subseteq S$ such that every pair of distinct sites in $C$ are neighbors. Another important feature of the MRF model is that $z$ is an MRF on $S$ with respect to the neighborhood system $g$ if and only if its probability distribution is a Gibbs distribution based on the cliques. A Gibbs distribution can be represented as follows:

$$P(s) = \frac{1}{Z} \exp\left(-U(s)/T\right), \qquad (2)$$

where

$$U(s) = \sum_{c \in C} V_c(s) \qquad (3)$$

is the Gibbs energy function, $V_c(s)$ is called the clique potential, and $T$ is the temperature parameter. Finally,

$$Z = \sum_{\text{all } s} \exp\left(-U(s)/T\right) \qquad (4)$$

is the normalization factor. Notice that the preceding MRF pdf is quite rich, in that the clique functions can be arbitrary as long as they depend only on the nodes in the corresponding cliques. Therefore, the MRF-model-based approach provides potential advantages in the problem of character extraction.


Table 1. Clique energies

Configuration of B1B2B3B4    Energy
0000                          1.0
0001                          6.0
0010                          6.0
0011                          3.0
0100                          6.0
0101                          3.0
0110                         18.0
0111                          6.0
1000                          1.0
1001                         18.0
1010                          3.0
1011                          6.0
1100                          3.0
1101                          6.0
1110                          6.0
1111                          1.0

For example, we can define the clique functions to promote consistency of the labeling between neighboring pixels.
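To make the use of Table 1 concrete, here is a small sketch that sums the clique energies over a candidate labeling; it assumes B1..B4 are the four pixels of a 2x2 clique read row by row (our reading of Fig. 1), and the function name is ours:

import numpy as np

# Clique energies from Table 1, keyed by the configuration B1 B2 B3 B4.
CLIQUE_ENERGY = {
    "0000": 1.0, "0001": 6.0, "0010": 6.0, "0011": 3.0,
    "0100": 6.0, "0101": 3.0, "0110": 18.0, "0111": 6.0,
    "1000": 1.0, "1001": 18.0, "1010": 3.0, "1011": 6.0,
    "1100": 3.0, "1101": 6.0, "1110": 6.0, "1111": 1.0,
}

def prior_energy(labels):
    """Sum the clique potentials U(s) of Eq. 3 over all 2x2 cliques of a
    binary label image `labels` (values 0/1), using the Table 1 energies."""
    labels = np.asarray(labels, dtype=int)
    total = 0.0
    for r in range(labels.shape[0] - 1):
        for c in range(labels.shape[1] - 1):
            key = f"{labels[r, c]}{labels[r, c+1]}{labels[r+1, c]}{labels[r+1, c+1]}"
            total += CLIQUE_ENERGY[key]
    return total

With these energies, smooth 2x2 configurations (all 0s or all 1s) are cheap, while checkerboard-like configurations (0110, 1001) are strongly penalized, which is what "promoting consistency between neighboring pixels" means here.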

    2.4 The MRF-model-based formulation

We formulate the character extraction from multiple frames as a Bayesian MAP estimation problem. The MAP estimate is at the maximum of the posterior probability $P(z \mid \{f_l\})$, or equivalently, at the maximum of the log-likelihood function

$$\hat{z} = \arg\max_{z} \log P(z \mid f_1, f_2, \ldots, f_n). \qquad (5)$$

Applying Bayes' theorem, we have

$$\hat{z} = \arg\max_{z} \{\log P(z) + \log P(f_1, f_2, \ldots, f_n \mid z)\}. \qquad (6)$$

The prior probability can be written as

$$P(z = s) = \frac{1}{Z} \exp\left(-U(s)/T\right), \qquad U(s) = \sum_{c \in C} V_c(s). \qquad (7)$$

The parameter $T$ is assumed to be 1 for simplicity. Here, the clique energies are chosen to encourage consistency of the labeling between neighboring pixels. Encouragement or discouragement is done by assigning energy values to each clique configuration. We use a four-node clique, as shown in Fig. 1.

For the extraction problem, each pixel $B_i$ on the clique is assigned either 1 or 0. Therefore, $B_1 B_2 B_3 B_4$ forms a four-bit binary configuration (one hexadecimal digit) with 16 different choices. Table 1 shows one set of energies which was used in the experiments.

The observations between frames are assumed to be independent, so that the complete conditional density can be written as

$$P(f_1, f_2, \ldots, f_n \mid z) = \prod_{l=1}^{n} P(f_l \mid z). \qquad (8)$$

Let $z_l$ be the extraction result based on the single frame $l$. Assume the observation model is given by

$$z = z_l + N, \qquad (9)$$

Fig. 4. The results of the localization are shown using white rectangles

    Fig. 5. The top 30 features within the license plate region

where $N$ is a zero-mean white Gaussian random field with variance $\sigma_l^2$ for each variable in $N$. Then, we have

$$P(f_l \mid z) = \frac{1}{(2\pi\sigma_l^2)^{N_1 N_2 / 2}} \exp\left(-\frac{\|z - z_l\|^2}{2\sigma_l^2}\right), \qquad (10)$$

for $l = 1, 2, \ldots, n$. Incorporating the prior and the conditional density into Eq. 6, we have

$$\hat{z} = \arg\min_{z} \left\{ \sum_{c \in C} V_c(z) + \sum_{l=1}^{n} \frac{\|z - z_l\|^2}{2\sigma_l^2} \right\}. \qquad (11)$$

The above objective function is not well behaved. Many gradient-based techniques cannot be applied here since the function is not differentiable. The genetic algorithm (GA) is an adaptive search algorithm based on the mechanics of natural selection and natural genetics [5]. A GA requires only objective function values to perform an effective search. This characteristic makes a GA a more canonical method than many other search schemes.
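Before turning to the GA, here is a minimal sketch of how the objective of Eq. 11 might be evaluated for a candidate labeling; the per-frame extractions z_l, the noise deviations sigma_l, and the prior-energy callback (for instance, the Table 1 sketch above) are assumed to be supplied by earlier processing, and the function name is ours:

import numpy as np

def objective(z, single_frame_results, sigmas, prior_energy):
    """Evaluate the objective of Eq. 11 for a candidate binary labeling z.
    single_frame_results: list of per-frame extractions z_l (same shape as z).
    sigmas: list of noise standard deviations sigma_l, one per frame.
    prior_energy: a function returning the sum of clique potentials for z."""
    z = np.asarray(z, dtype=float)
    data_term = 0.0
    for z_l, sigma_l in zip(single_frame_results, sigmas):
        diff = z - np.asarray(z_l, dtype=float)
        # Squared distance to each single-frame result, weighted by 1/(2*sigma_l^2).
        data_term += np.sum(diff * diff) / (2.0 * sigma_l ** 2)
    return prior_energy(z.astype(int)) + data_term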

    3 Optimization using GA

GAs were introduced by Holland [6] as a computational analog of adaptive systems. Given the objective function (11), GAs can be used to find a solution $z_i$ that minimizes the fitness value $g(z_i)$. With GAs, a set of fixed size, called a population, is prepared, consisting of $M$ individuals $z_i = z_{i,1} z_{i,2} \cdots z_{i,N_1 N_2}$, where $z_{i,j} \in \{0, 1\}$. The fitness function $g$ is used to evaluate individuals.


Fig. 6. The tracking results

Fig. 7. The results of the mapping

    3.1 Simple GA

A simple GA is composed of three operators (a code sketch of all three follows the list):

1. Selection. This is a process in which individuals are reproduced according to their fitness values. Intuitively, for our minimization problem we would like to duplicate, with higher probability, the individuals whose fitness values are lower. The probability of an individual $z_i$ being reproduced in the next generation is defined as

$$P_{z_i} = \frac{1/g(z_i)}{\sum_{j=1}^{M} 1/g(z_j)}. \qquad (12)$$

2. Crossover. It selects two individuals from the current population with a given crossover probability. Then, it mates them by exchanging the $l$ right-most bits of the two individuals, where the number of exchanged bits, $l$, is chosen uniformly at random from $[1, N_1 \times N_2]$.

3. Mutation. Given a mutation probability, randomly choose an individual from the population and a bit position, and then invert that bit.
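A compact sketch of the three operators for bit-string individuals (an illustration under our own conventions, not the authors' code; the population is a list of '0'/'1' strings and g is the objective of Eq. 11):

import random

def select(population, g):
    """Fitness-proportionate selection for a minimization problem (Eq. 12):
    individuals with lower g(z) are reproduced with higher probability."""
    weights = [1.0 / g(z) for z in population]
    return random.choices(population, weights=weights, k=len(population))

def crossover(a, b):
    """Exchange the l right-most bits of two bit-strings, with l drawn
    uniformly at random from [1, len(a)]."""
    l = random.randint(1, len(a))
    return a[:-l] + b[-l:], b[:-l] + a[-l:]

def mutate(z):
    """Flip one randomly chosen bit of a bit-string."""
    pos = random.randrange(len(z))
    return z[:pos] + ('1' if z[pos] == '0' else '0') + z[pos + 1:]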

    3.2 Greedy mutation operator

In the simple GA, the fittest individual in a population is not guaranteed to survive into the next generation, and thus an extremely good solution to the fitness function may be discovered and subsequently lost. One way to avoid this problem is to use elitist selection, in which the best individual in the population survives with probability one [15, 18]. Since the simple GA runs with a non-zero mutation rate, it is trivial to show that a global optimum will be reached when the GA is left to run infinitely. However, the crossover and mutation operators randomly explore the solution space in simple GAs. The random search is not very efficient, and it makes the GA converge very slowly, especially when the entire solution space is large, as in our case. In this paper, we use a local greedy mutation operator to speed up the convergence.


Fig. 8. The results of the character extraction using Parks method

Fig. 9. The results of the character extraction using Yanowitz and Bruckstein's method

Fig. 10. The result after 20 iterations

Let $z_{i,j}$ be the bit selected to flip. We define the local greedy flip probability $P_f$ to be

$$P_f = \begin{cases} \gamma_1 & \text{if } g(z_i') < g(z_i), \\ \gamma_2 & \text{otherwise,} \end{cases} \qquad (13)$$

where $z_i'$ is $z_i$ with the $j$th bit flipped and $\gamma_1 > \gamma_2 > 0$. A non-zero $\gamma_2$ is necessary to prevent the GA from getting stuck in local minima. Due to the locality of both the prior (cliques) and the conditional density, only a few local pixels are needed to compute the difference between $g(z_i)$ and $g(z_i')$. This greedy mutation operator utilizes the problem information, thus making the solution search more efficient.
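A sketch of the flip rule of Eq. 13 for a bit-string individual; as noted above, in practice only a few local pixels are needed to compute g(z_i') - g(z_i), but for clarity this sketch simply re-evaluates g, and the default gamma values are placeholders rather than values from the paper:

import random

def greedy_mutate(z, g, gamma1=0.9, gamma2=0.1):
    """Local greedy mutation (Eq. 13): a candidate bit flip is accepted with
    probability gamma1 when it lowers the objective g, and with the smaller
    probability gamma2 otherwise (a non-zero gamma2 keeps the search from
    getting stuck in local minima)."""
    j = random.randrange(len(z))
    flipped = z[:j] + ('1' if z[j] == '0' else '0') + z[j + 1:]
    accept_prob = gamma1 if g(flipped) < g(z) else gamma2
    return flipped if random.random() < accept_prob else z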

3.3 Expected convergence rate with the greedy mutation operator

Let $l = N_1 \times N_2$ be the length of the binary strings; then $r = 2^l$ is the total number of possible strings. If $n$ is the population size, then the number of possible populations $N$ is [10]

$$N = \binom{n + r - 1}{n}. \qquad (14)$$
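For intuition about the size of $N$ in Eq. 14, a quick computation with exact integer arithmetic (the population size of 20 is an illustrative choice):

import math

def num_populations(n, l):
    """Number of possible populations (Eq. 14) for population size n and
    binary strings of length l, where r = 2**l distinct strings exist."""
    r = 2 ** l
    return math.comb(n + r - 1, n)

# A 64x64 plate image gives strings of length l = 4096; with a population of
# 20 individuals, N already has tens of thousands of decimal digits.
print(len(str(num_populations(20, 64 * 64))), "decimal digits")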

As can be seen, even for a relatively small $64 \times 64$ image, $N$ can be very large. So, with the random crossover and mutation operators, the GA converges very slowly. The greedy mutation operator, which utilizes the problem information, is expected to be more efficient. In this section, we use a Markov chain model to see how the greedy mutation operator moves in the solution space and to answer the following question: what is the expected number of generations until the GA population contains a copy of the optimum? A closed-form analysis is difficult in general. Here, we consider a type of problem called simple greedy problems.

Definition 1. Let $z_i$ be a realization in the solution space of a problem $E$, which is defined to minimize $g(z_i)$. If, for any $z_i$, there exists a $j$ such that $g(z_i') < g(z_i)$, where $z_i'$ is $z_i$ with the $j$th bit flipped, then $E$ is a simple greedy problem.

For a simple greedy problem, we can use the greedy mutation operator, with $\gamma_1 = 1.0$ and the crossover probability set to $0$, to minimize the objective function. A simple probabilistic Markov chain is adopted to model the behavior of the algorithm. Consider a stochastic process $\{z_i, i = 0, 1, 2, \ldots, N\}$. If $z_n = i$, then the process is said to be in state $i$ at time $n$. We assume that, whenever the process is in state $i$, there is a fixed probability $P_{i,j}$ that it will move to state $j$. Such a stochastic process is known as a Markov chain [14].

Theorem 1. Let $E$ be a simple greedy problem and $N$ be the number of possible populations. The expected number of generations until the GA population contains a copy of the optimum is $O(\log N)$ when the greedy mutation probability $\gamma_1 = 1.0$ and the crossover probability is $0$.

Proof. Since $E$ is a simple greedy problem, whenever the population contains the $j$th best solution, after the greedy mutation the population will contain a better solution. Assume that the new solution is equally likely to be any of the $j - 1$ best. This can be modeled by a Markov chain for which $P_{1,1} = 1$ and

$$P_{i,j} = \frac{1}{i - 1}, \qquad (15)$$

where $j = 1, 2, \ldots, i - 1$ and $i > 1$. Let $T_i$ denote the number of transitions needed to go from state $i$ to state 1. A recursive formula for $E(T_i)$ can be obtained by conditioning on the initial transition:

$$E(T_i) = 1 + \frac{1}{i - 1} \sum_{j=1}^{i-1} E(T_j) = \sum_{j=1}^{i-1} \frac{1}{j}. \qquad (16)$$

Since $\sum_{j=1}^{i-1} 1/j \le 1 + \int_{1}^{N} \frac{dx}{x} = 1 + \ln N$, the expected number of generations is $O(\log N)$.
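As a sanity check of Eqs. 15 and 16 (our own verification sketch, not from the paper), the Markov chain can be simulated and compared against the harmonic-number prediction:

import random

def harmonic(k):
    """H_k = sum_{j=1}^{k} 1/j, the predicted E(T_i) of Eq. 16 with k = i - 1."""
    return sum(1.0 / j for j in range(1, k + 1))

def simulate_hits(i, trials=100_000):
    """Average number of transitions to reach state 1 from state i under
    Eq. 15: from state i the chain jumps uniformly to one of states 1..i-1."""
    total = 0
    for _ in range(trials):
        state, steps = i, 0
        while state > 1:
            state = random.randint(1, state - 1)
            steps += 1
        total += steps
    return total / trials

i = 50
print(simulate_hits(i), harmonic(i - 1))  # the two values should be close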