MSc Dissertation
NAO project : object recognition and localization
Author : Sylvain Cherrier
MSc : Robotics and Automation
Period : 04/04/2011 → 30/09/2011

Acknowledgement

In order to carry out my project, I used many helpful documents which simplified my work. I can mention a website by Utkarsh Sinha (SIFT, mono-calibration and other topics), the publication by David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints" (SIFT and feature matching), and the dissertation of a previous MSc student, Suresh Kumar Pupala, entitled "3D-Perception from Binocular Vision" (stereovision and other topics).

I would also like to thank Dr. Bassem Alachkar for having worked with me on the recognition. He guided me towards several feature extraction methods (SIFT and SURF) and showed me several publications which allowed me to understand feature matching.

Then, I would like to thank my project supervisor, Dr. Samia Nefti-Meziani, who allowed me to use the laboratory, and in particular the NAO robot and the stereovision system.

Finally, I would like to thank all the researchers and students of the laboratory for their kindness and their help.

Table of contents

1 Introduction
2 Subject : object recognition and localization on NAO robot
 2.1 Mission and goals
 2.2 Recognition
 2.3 Localization
 2.4 NAO control
3 Implementation and testing
 3.1 Languages, software and libraries
 3.2 Tools and methods
 3.3 Steps of the project
4 Conclusion
5 References
6 Appendices
 6.1 SIFT : a method of feature extraction
 6.2 SURF : another method adapted for real time applications
 6.3 Mono-calibration
 6.4 Stereo-calibration
1 Introduction

Video cameras are interesting devices to work with because they provide a large amount of data. For human beings or animals, image data inform them about the objects in the environment they meet: a goal for the future is to understand how we recognize an object. First we learn about it, and afterwards we are able to recognize it within a different environment.

The strength of the brain is its ability to recognize an object among different classes of objects: we call this ability clustering. But above all, the brain is able to extract from an object the interesting features that make it belong to a class.

Thus, the difficulty of implementing a recognition ability on a robot has two levels: which features should be extracted from an object? How can a specific object be identified among many others? We first need well-chosen features before being able to identify an object, so the first question has to be answered before the second one.

Besides recognition, we often need the location and orientation of the object in order to make the robot interact with it (control). Thus, the vision system must also enable object localization, as it enriches the description of the object seen.

In this report, I will first present the university and the laboratory where I worked, before describing the context and goals of my project. I will then focus on the three subjects I had to study (object recognition, object localization and NAO control), before presenting how I implemented and managed my project.

2 Subject : object recognition and localization on NAO robot

2.1 Mission and goals

2.1.1. Context

In real life, the environment can change easily, and our brain can easily adapt itself to match an object whatever the conditions.

In Robotics, in order to tackle recognition problems, we often think about a specific context; I can mention three examples:

- For industrial robots, we may have to make a robot recognize circular objects on a moving belt: the features to extract are well defined and an effective program can be implemented to match only circular shapes. In this first example, the program is designed to work as long as the background is the moving belt. Thus, a first problem is the background: in real applications, it can vary a lot!
- When we want to improve a CCTV camera to recognize vehicles, we can use the thickness of their edges to distinguish them from people walking in the street. We can also use the car blobs to match a specific size, or the optical flow to match a specific speed. The problem here can be occlusion: the car blob may not be valid if a tree hides the vehicle, and the car would not be identified.
- Using an HSV colour space, we can recognize single-coloured objects under constant illumination. Even if the use of blobs can improve the detection, a big problem is illumination variation.

From these examples we can underline three types of variation:
- background
- occlusion
- illumination

But there are many others, such as all the transformations in space:
- 3D translation (on the image: translation (X, Y) and scale (1/Z))
- 3D rotation
2.1.2. State of the art

Many people have worked on finding invariant features to recognize an object whatever the conditions. An interesting line of study is to work on invariant keypoints:

- The Scale-Invariant Feature Transform (SIFT) was published in 1999 by David G. Lowe, a researcher at the University of British Columbia. This algorithm detects local keypoints within an image; these features are invariant to scale, rotation and illumination. Using keypoints instead of keyregions makes it possible to deal with occlusion.
- The Speeded Up Robust Feature (SURF) was presented in 2006 by Herbert Bay et al., researchers at ETH Zurich. This algorithm is partly inspired by SIFT, but it is several times faster and is claimed by its authors to be more robust against different image transformations.

2.1.3. My mission

My mission was to recognize an object in difficult conditions (occlusion, background, different rotations, scales and positions). Moreover, the goal was also to apply it for a robotic purpose: the control of the robot needs to be linked to the recognition. Thus, I had to study the localization between recognition and control in order to orientate the work towards robotic applications.

In the following, I will say more about these methods and how to use them to match an object. I will then focus on localization, as it gives a complete description of the object in space: it enables more interaction between the robot and the object. As I worked on the NAO robot, I will describe its hardware and software and the ways to program it. Finally, I will link all the previous parts (recognition, localization and robot control) in order to describe my implementation and testing. I will also describe the way I managed my project.

2.2 Recognition

2.2.1. Introduction

In pattern recognition, there are two main steps before being able to make a judgement about an object:

- Firstly, we have to analyze the image for the features (which can be, for example, keypoints or keyregions) likely to be interesting to work with. In the following, I call this step feature extraction.
- Secondly, we have to compare these features with those stored in the database. I call this step feature matching.

Concerning the database, there are two ways to create it: either it is created manually by the user and stays fixed (the user teaches the robot), or it is updated automatically by the robot (the robot learns by itself).

2.2.2. SIFT and SURF : two methods of invariant feature extraction

SIFT and SURF are two invariant-feature extraction methods which are divided into two main steps:

- Keypoint extraction: keypoints are found at specific locations in the image and an orientation is assigned to them. They are selected depending on their stability with scale and position: the invariance to scale is obtained at this step.
- Descriptor generation: a descriptor is generated for each keypoint in order to identify it accurately. In order to be flexible with respect to transformations, it has to be invariant: the invariances to rotation, brightness and contrast are obtained at this step.

During the keypoint extraction, each method has to build a scale space in order to estimate the Laplacian of Gaussian. The Laplacian detects pixels with rapid intensity changes, but it has to be applied to a Gaussian in order to obtain the invariance to scale. Moreover, the Gaussian makes the Laplacian less noisy.
From the Laplacian of Gaussian applied to the image, the keypoints are found at the extrema of the function, as their intensity changes very quickly around their location. The second step is then to compute their orientation using the gradients in their neighbourhood.

Concerning the descriptor generation, we first need to rotate the axes around each keypoint by its orientation in order to obtain rotation invariance. Then, as for the computation of the keypoint orientation, it is necessary to compute the gradients over a specific neighbourhood (or window). This window is then divided into several subwindows in order to build a vector containing several gradient parameters. Finally, a threshold is applied to this vector and it is normalized to unit length, which makes it invariant to brightness and to contrast respectively.

This vector has a specific number of values depending on the method: with SIFT it has 128 values, and with SURF, 64.

The fact that SURF works on integral images and uses simplified masks reduces the computation and makes the algorithm quicker to execute. For these reasons, SURF is better suited to real time applications than SIFT, even if SIFT can be very robust.

For more details about the feature extraction methods SIFT and SURF, you can refer to the appendices, where I describe each of these methods further.

The diagram below sums up the main common steps used by SIFT and SURF.

[Diagram: keypoint extraction (build a scale space for scale invariance, estimate the Laplacian of Gaussian, compute gradients around the keypoints in a window W1, estimate the orientations) followed by descriptor generation (rotate the keypoint axes by the keypoint orientation for rotation invariance, compute gradients in a window W2, divide W2 into subwindows, compute gradient parameters for each subwindow, threshold and normalize the vector for brightness and contrast invariance), giving the keypoint location, keypoint orientation and descriptor.]
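As an illustration of this extraction step, here is a minimal Python sketch using OpenCV (which NAOqi ships with); it assumes a recent OpenCV build where SIFT is available (and SURF through the contrib module), and uses synthetic images in place of real camera frames.

import cv2
import numpy as np

# Synthetic stand-ins for the model image and the acquired image; in practice
# these would come from a file or from the robot's camera.
img_model = (np.random.default_rng(0).random((200, 200)) * 255).astype(np.uint8)
img_acq = (np.random.default_rng(1).random((480, 640)) * 255).astype(np.uint8)

sift = cv2.SIFT_create()   # SURF would be cv2.xfeatures2d.SURF_create(400), if the contrib build is available

# detectAndCompute performs the two steps described above: keypoint extraction
# (location, scale, orientation) and descriptor generation (128 values for SIFT, 64 for SURF).
kp_model, desc_model = sift.detectAndCompute(img_model, None)
kp_acq, desc_acq = sift.detectAndCompute(img_acq, None)
print(len(kp_model), "model keypoints,", len(kp_acq), "acquired keypoints")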
2.2.3. Feature matching

We have now seen two methods of feature extraction which locate and orientate keypoints and assign them an invariant descriptor. Feature matching consists in recognizing a model within an acquired image.

The goal of the matching is to compare two lists of keypoint-descriptor pairs in order to:
- identify the presence of an object
- know the 2D position, 2D size, 2D orientation and scale of the object in the image

A big problem of recognition is to generalize it to 3D objects: a 3D object is complicated to read with a camera, and its 3D position, 3D size and 3D orientation are difficult to estimate with only one camera.

Thus, we will see how a matching can be done using a single model that we want to recognize within the acquired image. It can be divided into three main steps:
- compare the descriptor vectors (descriptor matching)
- compare the features of the matched keypoints (keypoint matching)
- calculate the 2D features of the model within the acquired image (calculation of 2D model features)

[Diagram: the acquired image and the model image each go through feature extraction (SIFT or SURF), producing keypoint lists (kpList_acq, kpList_model) and descriptor lists (descList_acq, descList_model); the feature matching block then chains descriptor matching, keypoint matching and the calculation of the 2D model features.]

The diagram above shows the principle of the feature matching: features are extracted from the acquired image and from the model image in order to match their descriptors and keypoints. The goal is to obtain the 2D features of the model within the acquisition at the end of the matching.

2.2.3.1 descriptor matching

2.2.3.1.1 Compare distances

The descriptors we have previously computed using SIFT or SURF give us data invariant to scale, rotation, brightness and contrast.

In order to compare descriptors, we can focus on their distance or on their angle. If desc_acq = (desc_acq(1), desc_acq(2), ..., desc_acq(size_desc)) is a descriptor from the acquired image and desc_model = (desc_model(1), desc_model(2), ..., desc_model(size_desc)) is one from the model:

d_{euclidean} = \sqrt{(desc_{acq}(1) - desc_{model}(1))^2 + \dots + (desc_{acq}(size_{desc}) - desc_{model}(size_{desc}))^2}

\theta = \arccos\left(desc_{acq}(1)\,desc_{model}(1) + \dots + desc_{acq}(size_{desc})\,desc_{model}(size_{desc})\right)

Thus, there are two basic ways to compare descriptors: using the Euclidean distance or the angle (in the following I refer to the angle as a distance as well). There are other types of distances that I do not mention here in order to keep things simple.

We could then compute the distances between one descriptor from the acquired image and each descriptor from the model image, and compare the smallest distance to a specific threshold. But how do we choose the threshold? Each descriptor could have its own valid threshold, so the threshold has to be relative to a distance.

David G. Lowe had the idea of comparing the smallest distance with the next smallest one, and showed that rejecting all matches with a distance ratio (ratio_distance = distance_smallest / distance_next-smallest) greater than 0.8 eliminates 90 % of the false matches while discarding less than 5 % of the correct ones.

The Probability Density Functions (PDF) of this distance ratio (closest / next closest), for correct and for false matches, show that cutting at 0.8 keeps most of the correct matches and rejects most of the false ones. We can also keep only matches below 0.6 in order to focus on correct matches only (but we lose data).

This method quickly eliminates matches coming from the background of the acquisition which are completely incompatible with the model. At the end of the procedure, we have a number of interesting keypoints from the acquired image that are supposed to match the model, but we need a method more precise than raw distances.
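Below is a small sketch of this distance comparison with Lowe's ratio criterion, assuming OpenCV's brute-force matcher and stand-in descriptors in place of the SIFT or SURF output shown earlier.

import cv2
import numpy as np

# Stand-in descriptors (in practice desc_model and desc_acq come from the SIFT
# or SURF extraction); each row is a 128-value descriptor vector.
rng = np.random.default_rng(0)
desc_model = rng.random((50, 128), dtype=np.float32)
desc_acq = rng.random((400, 128), dtype=np.float32)

matcher = cv2.BFMatcher(cv2.NORM_L2)               # brute-force Euclidean distance
knn = matcher.knnMatch(desc_model, desc_acq, k=2)  # two nearest neighbours per descriptor

good_matches = []
for pair in knn:
    if len(pair) < 2:
        continue
    best, second = pair
    # Lowe's criterion: keep the match only if the closest distance is clearly
    # smaller than the next closest one (0.8, or 0.6 for a stricter selection).
    if best.distance < 0.8 * second.distance:
        good_matches.append(best)
print(len(good_matches), "matches kept out of", len(knn))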
2.2.3.1.2 k-d tree

The goal now is to match the nearest neighbours more accurately. To do that, we can use the k-d tree method.

A k-d tree works with two lists of vectors of at least k dimensions (as it only considers the k specified dimensions):
- the reference (N vectors): the vectors used to build the tree
- the query (M vectors): the vectors for which we look for a nearest neighbour

When the k-d tree has finished querying each query vector against the tree, each of them has a nearest reference vector: we have M matches at the end. In order to avoid several query vectors matching the same reference vector, we need:

N \geq M

Then, depending on the previous operation (compare distances), we have to assign the model and acquisition descriptor lists correctly, depending on how many descriptors each contains: if there are more descriptors in the acquisition than in the model, the acquisition is the reference of the k-d tree and the model is the query; if there are more descriptors in the model than in the acquisition, it is the opposite.

In our case, if we want accuracy, the dimension of the tree has to be the dimension of the corresponding descriptors (128 for SIFT and 64 for SURF). The drawback is that the more dimensions we have, the more computation is needed and the slower the process is.

I do not explain the k-d tree in more detail, as I used it as a tool.
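As a hedged illustration of how the k-d tree can be used as a tool, the sketch below relies on scipy.spatial.cKDTree (any equivalent implementation would do) and applies the reference/query assignment rule just described.

import numpy as np
from scipy.spatial import cKDTree

def kdtree_match(desc_model, desc_acq):
    # Choose the larger list as the reference (tree) so that N >= M.
    if len(desc_acq) >= len(desc_model):
        reference, query = desc_acq, desc_model
    else:
        reference, query = desc_model, desc_acq

    tree = cKDTree(reference)              # built on the full descriptor dimension (128 or 64)
    dist, idx = tree.query(query, k=1)     # one nearest reference vector per query vector
    return dist, idx                       # M distances and M reference indices

# Example with random 128-dimensional vectors standing in for SIFT descriptors.
rng = np.random.default_rng(0)
d_model = rng.random((50, 128)).astype(np.float32)
d_acq = rng.random((200, 128)).astype(np.float32)
distances, indices = kdtree_match(d_model, d_acq)
print(distances.shape, indices.shape)      # (50,) (50,)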
2.2.3.2 Keypoint matching

Once the nearest neighbours are known, the descriptors are not useful any more. We now have to check whether the matches are coherent at the level of the geometry of the keypoints. We therefore use the data attached to each keypoint (2D position, scale, magnitude and orientation) in order to check whether some false matches remain.

To do this, we can use a Generalized Hough Transform (GHT), of which there are two types:
- a GHT invariant to 2D translation (classical GHT)
- a GHT invariant to 2D translation, scale and rotation (invariant GHT)

The classical GHT uses only the 2D translation of the keypoints as reference. It has the advantage of already giving an idea of the scale and orientation of the model within the acquired image, but it requires more computation, as it is necessary to go through several possible scales and 2D rotations. Moreover, it is necessary to know a specific range of scales and 2D rotations to work with, and at what sampling.

The invariant GHT uses both the 2D translation and the orientation of the keypoints as references. Its main advantage is to be invariant to all the 2D transformations of the object, but it gives no information about the scale and the 2D rotation and, as it uses orientations, it is sensitive to their accuracy.

In the following, we study both the classical Generalized Hough Transform and the invariant one.

2.2.3.2.1 Classical GHT

The Generalized Hough Transform leans mainly on keypoint coordinates defined relative to a reference point. Using one reference point for all the keypoints keeps a quantity that is invariant to 2D translation. As the goal of the GHT here is to check whether the keypoints are correct, its input is the set of keypoints matched between the acquired image and the model.

Build the Rtable of the model

First of all, the model needs a corresponding Rtable which describes, for each of its keypoints, its displacement from the reference point:

\gamma(1, 0) = \begin{pmatrix} x_{model} - x_r \\ y_{model} - y_r \end{pmatrix}

where x_model and y_model are the coordinates of the keypoint from the model and x_r and y_r are those of the reference point. The two parameters of the displacement refer respectively to the scale (s = 1) and the rotation (\theta = 0 rad), as the model is considered as the base.

The reference point can simply be taken as the mean position of all the keypoints:

x_r = \frac{\sum x_{model}}{nbKp_{model}} \qquad y_r = \frac{\sum y_{model}}{nbKp_{model}}

where nbKp_model is the number of keypoints in the model. The Rtable of the model is therefore structured as follows:

T = \begin{pmatrix} x_{model}(1) - x_r & y_{model}(1) - y_r \\ x_{model}(2) - x_r & y_{model}(2) - y_r \\ \vdots & \vdots \\ x_{model}(nbKp_{model}) - x_r & y_{model}(nbKp_{model}) - y_r \end{pmatrix}

Check the features of the acquisition

The second step is to estimate the position of the reference point within the acquired image and to keep the keypoints which vote consistently.

As we already have a match between the keypoints from the acquisition and those from the model, we can directly use the Rtable to make each keypoint vote for a specific position of the reference point. If we consider a transformation with no scale change and no rotation, we have:

x_r = x_{acq} - Rtable(numKp_{model}, 1) \qquad y_r = y_{acq} - Rtable(numKp_{model}, 2)

where x_acq and y_acq are the position of the keypoint in the acquisition and numKp_model is the index of the model keypoint matched by this acquisition keypoint.

With a scale s and a rotation \theta, the displacement given by the Rtable has to be modified:

\gamma(s, \theta) = s \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \gamma(1, 0)

Thus, the new reference point voted for by a keypoint of the acquisition is:

x_r = x_{acq} - s\,(\cos\theta \; Rtable(numKp_{model}, 1) - \sin\theta \; Rtable(numKp_{model}, 2))
y_r = y_{acq} - s\,(\sin\theta \; Rtable(numKp_{model}, 1) + \cos\theta \; Rtable(numKp_{model}, 2))

At the end of this step, we have an accumulator of votes of the size of the acquired image.

[Image: example of an accumulator of votes; the maximum indicates the most probable position of the reference point within the acquired image.]

One problem is to know at which scale and rotation we have the best results: personally, I focused on the maximum number of votes, but there can be several maxima for different scales and rotations. An idea would be to detect where the votes are the most grouped.

Adding the scale and the rotation also adds computation, and the difficulty is to know which range to use and how many iterations to run (the more iterations, the more accurate the result, but the more time it takes).
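The following rough numpy sketch illustrates the classical GHT voting described above; kp_model and kp_acq are assumed to be aligned (n, 2) arrays of matched keypoint positions, the scale and angle ranges are arbitrary examples, and the tolerance radius used later to reject outliers is omitted.

import numpy as np

def ght_vote(kp_model, kp_acq, img_shape, scales=(0.5, 1.0, 1.5), angles_deg=(0, 90, 180, 270)):
    ref = kp_model.mean(axis=0)                 # reference point of the model
    rtable = kp_model - ref                     # displacement of each keypoint
    best = (0, None)                            # (vote count, (s, theta, x_r, y_r))
    for s in scales:
        for a in np.radians(angles_deg):
            c, si = np.cos(a), np.sin(a)
            # rotate and scale the displacements, then subtract them from the acquisition positions
            dx = s * (c * rtable[:, 0] - si * rtable[:, 1])
            dy = s * (si * rtable[:, 0] + c * rtable[:, 1])
            votes_x = np.round(kp_acq[:, 0] - dx).astype(int)
            votes_y = np.round(kp_acq[:, 1] - dy).astype(int)
            acc = np.zeros(img_shape, dtype=int)        # accumulator the size of the image
            inside = (votes_x >= 0) & (votes_x < img_shape[1]) & \
                     (votes_y >= 0) & (votes_y < img_shape[0])
            np.add.at(acc, (votes_y[inside], votes_x[inside]), 1)
            peak = acc.max()
            if peak > best[0]:
                y_r, x_r = np.unravel_index(acc.argmax(), acc.shape)
                best = (peak, (s, a, x_r, y_r))
    return best   # highest vote count and the corresponding (s, theta, x_r, y_r)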
2.2.3.2.2 Invariant GHT

Contrary to the previous method, we do not use the displacement this time. We use a relation between the speed r' of a point and its acceleration r''.

Speed r' and acceleration r''

It is easier to define the speed and the acceleration on a template rather than on keypoints. The studied template has a reference point as origin (x0, y0). The speed (at a specific angle of the template) is defined as the tangent to its border. The acceleration is defined as the direction going through the border point and the reference point (x0, y0).

We can prove that, between the angle of the speed \theta'_a and the angle of the acceleration \theta''_a, there is a value k invariant to translation, rotation and scale:

k = \theta'_a - \theta''_a

In our case, we use the orientation of the keypoint as the angle of the speed \theta'_a, and the acceleration r'' is defined by:

r'' = \frac{y - y_0}{x - x_0} \qquad \theta''_a = \arctan(r'')

Build the Rtable of the model

As previously, we build an Rtable for the model, but this time we record the value of k instead of the displacement. We therefore use both the position and the orientation of each keypoint to build the Rtable.

Check the features of the acquisition

Using the position of the keypoints within the acquisition and their corresponding matches, we have an associated value of k. Then, knowing the orientation of the keypoints (\theta'_a), we obtain the angle of acceleration \theta''_a. Contrary to the previous method, each keypoint does not vote for a precise position but for a line.

2.2.3.2.3 Conclusion

At the end of both methods, we have to look for the maximum of votes in the accumulator, or for the place where the votes are the most grouped. I chose to focus on the maximum, as it is simpler. In order to be more tolerant, I added a radius of tolerance around the maximum: the false matches outside the tolerance zone are rejected, which ensures coherence at the geometric level. The GHT thus reduces the errors in the calculation of the 2D model features that we will see next.

2.2.3.3 calculation of 2D model features

Least Squares Solution

In order to calculate the 2D features, we have to use a Least Squares Solution. Indeed, a point of the model (p_model) is expressed in the acquired image (p_model/acq) using a specific translation t, a scale s (scalar) and a rotation matrix R:

p_{model/acq} = s\,R\,p_{model} + t

where

R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \qquad t = \begin{pmatrix} t_x \\ t_y \end{pmatrix}

\theta is the angle of rotation between the model and the model within the acquired image. We consider that the model itself has a scale s of 1 and an angle of 0 radians.

Thus, we apply the Least Squares in order to find two matrices:

M = s\,R = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix} \qquad t = \begin{pmatrix} t_x \\ t_y \end{pmatrix}

The idea of the Least Squares is to rearrange the previous equation so as to group the unknowns m11, m12, m21, m22, tx and ty within a vector:

p_{model/acq} = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix} p_{model} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}

with

p_{model/acq} = \begin{pmatrix} x_{model/acq} \\ y_{model/acq} \end{pmatrix} \qquad p_{model} = \begin{pmatrix} x_{model} \\ y_{model} \end{pmatrix}

Thus, we also have:

A\,mt = \begin{pmatrix} x_{model/acq} \\ y_{model/acq} \end{pmatrix}

with

A = \begin{pmatrix} x_{model} & y_{model} & 0 & 0 & 1 & 0 \\ 0 & 0 & x_{model} & y_{model} & 0 & 1 \end{pmatrix} \qquad mt = \begin{pmatrix} m_{11} & m_{12} & m_{21} & m_{22} & t_x & t_y \end{pmatrix}^T

The vector mt can be determined using the inverse of the matrix A; since A is not square, we have to compute a pseudo-inverse:

mt = (A^T A)^{-1} A^T \begin{pmatrix} x_{model/acq} \\ y_{model/acq} \end{pmatrix}

As we have six unknowns, we require at least three positions p_model with three corresponding positions p_model/acq within the acquired image. But, in order to increase the accuracy of the Least Squares Solution, it is better to have more positions: all the matches can be used (provided, of course, that the previous steps, the descriptor matching and the keypoint matching, were correct).

Thus, we have:

A\,mt = v_{model/acq}

with

A = \begin{pmatrix} x_{model}(1) & y_{model}(1) & 0 & 0 & 1 & 0 \\ 0 & 0 & x_{model}(1) & y_{model}(1) & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{model}(nb_{matches}) & y_{model}(nb_{matches}) & 0 & 0 & 1 & 0 \\ 0 & 0 & x_{model}(nb_{matches}) & y_{model}(nb_{matches}) & 0 & 1 \end{pmatrix}

v_{model/acq} = \begin{pmatrix} x_{model/acq}(1) \\ y_{model/acq}(1) \\ \vdots \\ x_{model/acq}(nb_{matches}) \\ y_{model/acq}(nb_{matches}) \end{pmatrix}
nb_matches is the number of matches detected. As previously, the solution is:

mt = (A^T A)^{-1} A^T v_{model/acq}

We now have the matrix M and the translation t, but we still need the scale s and the angle \theta describing the transformation from the model to the acquisition. To compute them, we can simply rely on the properties of the rotation matrix R:

R\,R^T = I \qquad R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

We conclude:

M\,M^T = s^2 I \qquad s = \sqrt{MM^T(1,1)} = \sqrt{MM^T(2,2)} = \sqrt{\frac{MM^T(1,1) + MM^T(2,2)}{2}}

We deduce the rotation matrix R and the angle \theta:

R = \frac{M}{s} \qquad \theta = \arctan\left(\frac{R(2,1)}{R(1,1)}\right)

At the end of this step, we therefore have the translation t, the scale s and the angle \theta describing the transformation of the model within the acquisition.

In the diagram below, the red points are examples of matches between acquisition and model: at least three couples of positions (acquisition-model) are necessary in order to calculate the features. Some useful measures can be seen there:
- C_model and C_model/acq refer respectively to the centre of rotation within the model and within the acquisition.
- the Least Squares Solution only gives an estimation of the origin of the model within the acquired image, O_model/acq, but the centre of rotation is easily calculated using the size of the model (sW0, sH0), where W0 and H0 refer to the model image from which the keypoints were extracted.

[Diagram: the acquisition, with origin O_acq, contains the transformed model with origin O_model/acq (offset by tx, ty), size (sW0, sH0) and centre of rotation C_model/acq; the model image has origin O_model, size (W0, H0) and centre C_model.]

Reprojection error

In order to have an idea of the quality of the Least Squares Solution, it is necessary to compute the reprojection error. It is obtained by computing the new position vector of the model within the acquisition, v'_model/acq, from the position vector of the model v_model and the matrix M and translation t previously calculated. The reprojection error error_repro is then defined from the distance between the computed position vector v'_model/acq and the real one v_model/acq:

d_{error} = \| v'_{model/acq} - v_{model/acq} \| \qquad error_{repro} = \frac{d_{error}}{\| v_{model/acq} \|}
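To make this step concrete, here is a small numpy sketch of the least-squares estimation and of the reprojection error, assuming kp_model and kp_acq are the (n, 2) arrays of matched positions kept after the GHT.

import numpy as np

def estimate_2d_features(kp_model, kp_acq):
    n = len(kp_model)                       # need n >= 3 for the six unknowns
    A = np.zeros((2 * n, 6))
    v = np.zeros(2 * n)
    A[0::2, 0] = kp_model[:, 0]; A[0::2, 1] = kp_model[:, 1]; A[0::2, 4] = 1
    A[1::2, 2] = kp_model[:, 0]; A[1::2, 3] = kp_model[:, 1]; A[1::2, 5] = 1
    v[0::2] = kp_acq[:, 0]
    v[1::2] = kp_acq[:, 1]

    mt, *_ = np.linalg.lstsq(A, v, rcond=None)     # pseudo-inverse solution
    M = mt[:4].reshape(2, 2)
    t = mt[4:]

    MMt = M @ M.T
    s = np.sqrt((MMt[0, 0] + MMt[1, 1]) / 2.0)     # scale from M M^T = s^2 I
    R = M / s
    theta = np.arctan2(R[1, 0], R[0, 0])           # rotation angle (arctan2 keeps the quadrant)

    # relative reprojection error of the solution
    v_fit = A @ mt
    error_repro = np.linalg.norm(v_fit - v) / np.linalg.norm(v)
    return s, theta, t, error_repro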
2.3 Localization

2.3.1. Introduction

There are many ways to localize objects:
- ultrasonic sensors
- infrared sensors
- cameras

Localization with cameras provides a lot of data, and working with pixels is more accurate; however it needs more processing time and computation. In the context of the project, localization improves the interaction between the robot and the detected object.

We will see in the following how we can localize objects using one and then several cameras, along with the drawbacks and advantages of these two methods. But first of all we need to focus on the parameters of a video camera, which are important to define.

2.3.2. Parameters of a video camera

A video camera can be defined by two kinds of parameters:
- extrinsic
- intrinsic

The extrinsic parameters concern the location of the camera in space, whereas the intrinsic parameters deal with the internal features of the camera. In the following, we describe each of these parameters.

2.3.2.1 extrinsic

The extrinsic parameters of a camera refer to the position and orientation of the camera in relation to the observed object. Thus, a position expressed in the camera frame, p_cam, is defined as:

p_{cam} = R_{cam}\,p_{world} + T_{cam} = \begin{pmatrix} R_{cam} & T_{cam} \end{pmatrix} \tilde{p}_{world}

where:
- p_world is the position of the point expressed in the world frame (\tilde{p}_{world} is the same position in homogeneous coordinates)
- R_cam and T_cam refer respectively to the rotation and translation of the camera relative to the reference frame:

R_{cam} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} \qquad T_{cam} = \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix}

In our case, it is interesting to know the object position p_obj/world in the camera frame, p_obj/cam. To build this relation, we consider that R_cam and T_cam are referenced to the object and not to the world as before. As we want to express T_cam in the object frame, the translation by -T_cam is applied before the rotation:

p_{obj/cam} = R_{cam}\,(p_{obj/world} - T_{cam}) = R_{cam}\begin{pmatrix} I & -T_{cam} \end{pmatrix} \tilde{p}_{obj/world}

[Diagram: the object frame (X_obj/world, Y_obj/world, Z_obj/world, origin O_obj/world) and the camera frame (X_obj/cam, Y_obj/cam, Z_obj/cam, origin O_obj/cam), related by (R_cam, T_cam).]

2.3.2.2 intrinsic

The intrinsic parameters can be divided into two categories:
- the projection parameters, which build the projection relation between the 2D points seen and the real 3D points
- the distortions, defined with respect to an ideal pinhole camera

2.3.2.2.1 projection parameters

The projection parameters fill the projection matrix linking 3D points of the world with the 2D points (or pixels) seen by the camera.

This matrix applies the relation followed by ideal pinhole cameras. A pinhole camera is a simple camera without a lens and with a single small aperture; we can assume that a video camera with a lens behaves similarly, because the rays go through the optical centre of the lens.

This relation is easily determined by geometry, using a front-projection pinhole camera:

\frac{x'_{img}}{x_{obj/cam}} = \frac{y'_{img}}{y_{obj/cam}} = \frac{f_{cam}}{z_{obj/cam}}

We deduce:

x'_{img} = \frac{f_{cam}\,x_{obj/cam}}{z_{obj/cam}} \qquad y'_{img} = \frac{f_{cam}\,y_{obj/cam}}{z_{obj/cam}}

[Diagram: front-projection pinhole model with optical centre O_cam, optical axis, image plane at focal distance f_cam, an object point (x_obj/cam, y_obj/cam) at depth z_obj/cam and its projection (x'_img, y'_img).]

These relations build the projection matrix linking the 2D points on the image p'_img and the 3D object points expressed in the camera frame p_obj/cam:

p'_{img} = s\,P_{cam}\,p_{obj/cam}

where

P_{cam} = \begin{pmatrix} f_{cam} & 0 & 0 \\ 0 & f_{cam} & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad s = \frac{1}{z_{obj/cam}}

s is a scale factor depending on the distance between the object and the camera: the higher s is (the closer the object), the bigger the object appears on the image.

The position on the image p'_img is expressed in distance units; it is now necessary to convert it into pixel units (p_img) and to apply an offset:

p_{img} = P_{img}\,p'_{img}

where

P_{img} = \begin{pmatrix} N_x & -N_x\cot\theta & c_x \\ 0 & N_y/\sin\theta & c_y \\ 0 & 0 & 1 \end{pmatrix}

N_x and N_y are respectively the number of pixels per unit of distance horizontally and vertically. c_x and c_y are the coordinates of the image centre (in pixels); as the z axis of the camera frame lies on the optical axis, it is necessary to offset p_img by half the image width on x and half the image height on y (c_x and c_y respectively). \theta is the angle between the x and y axes; it can be responsible for a skew distortion of the image. The axes are generally orthogonal, so this angle is very close to 90 degrees.

Finally, we can deduce the final projection matrix P_cam-px, which directly gives the position on the image in pixels, p_img:

p_{img} = s\,P_{img}P_{cam}\,p_{obj/cam} = s\,P_{cam\text{-}px}\,p_{obj/cam}

where

P_{cam\text{-}px} = \begin{pmatrix} N_x f_{cam} & -N_x f_{cam}\cot\theta & c_x \\ 0 & N_y f_{cam}/\sin\theta & c_y \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} f_x & -f_x\cot\theta & c_x \\ 0 & f_y/\sin\theta & c_y \\ 0 & 0 & 1 \end{pmatrix}

We have f_x = N_x f_cam and f_y = N_y f_cam; generally the term sin(\theta) is taken as 1 since \theta is close to 90 degrees, but the cot(\theta) term is often kept.
Thus, a skew coefficient is defined as \alpha_c = -\cot\theta, and the projection matrix can be approximated by:

P_{cam\text{-}px} \approx \begin{pmatrix} f_x & f_x\alpha_c & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{where } \alpha_c \approx 0

2.3.2.2.2 distortions

There are two different types of distortion:
- the radial distortion d_r
- the tangential distortions on x (d_tx) and on y (d_ty)

The radial distortion comes from the convexity of the lens: the further we go from the centre of the lens (the centre of the image), the more distortion we have. The tangential distortions come from the lack of parallelism between the lens and the image plane.

Thus, we can express the distorted and normalized position p_nd relative to the normalized object position s\,p_obj/cam = (x_n, y_n, 1):

p_{nd} = M_d\,s\,p_{obj/cam} \qquad \text{where } M_d = \begin{pmatrix} d_r & 0 & d_{tx} \\ 0 & d_r & d_{ty} \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad s = \frac{1}{z_{obj/cam}}

Note that we have to normalize the object position before applying the distortions, and that this must be done before the camera projection seen in the previous part. Indeed, the focal lengths, the image centre and the skew coefficient are applied to the distorted and normalized position p_nd instead of directly to the object position p_obj/cam; it is then not necessary to normalize again by 1/s during the projection.

d_r, d_tx and d_ty are expressed as functions of the object position normalized by its z coordinate (x_n, y_n):

x_n = \frac{x_{obj/cam}}{z_{obj/cam}} \qquad y_n = \frac{y_{obj/cam}}{z_{obj/cam}} \qquad r = \sqrt{x_n^2 + y_n^2}

The distortion coefficients are then calculated as follows:

d_r = 1 + d_{r1} r^2 + d_{r2} r^4 + d_{r3} r^6 + \dots + d_{rn} r^{2n}
d_{tx} = 2 d_{t1} x_n y_n + d_{t2} (r^2 + 2 x_n^2)
d_{ty} = d_{t1} (r^2 + 2 y_n^2) + 2 d_{t2} x_n y_n

2.3.2.3 Conclusion

Using the equations above, we can express a real 2D pixel point on the image, p_img, as a function of a 3D object point in the world, p_obj/world:

p_{img} = s\,P_{cam\text{-}px}\,M_d\,R_{cam}\begin{pmatrix} I & -T_{cam} \end{pmatrix}\tilde{p}_{obj/world}

p_{img} = s \begin{pmatrix} f_x & f_x\alpha_c & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} d_r & 0 & d_{tx} \\ 0 & d_r & d_{ty} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{pmatrix} \tilde{p}_{obj/world} = M\,\tilde{p}_{obj/world}

M will be called the intrinsic matrix in the following.
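As a summary of sections 2.3.2.2.1 and 2.3.2.2.2, the following sketch projects a 3D world point to a pixel with the conventions above; all numerical values in the example call are invented for illustration and do not come from a real calibration.

import numpy as np

def project_point(p_world, R, T, fx, fy, cx, cy, dist):
    # distortion coefficients in the order (d_r1, d_r2, d_r3, d_t1, d_t2) of the formulas above
    k1, k2, k3, p1, p2 = dist
    p_cam = R @ (p_world - T)                     # extrinsic: world frame -> camera frame
    xn, yn = p_cam[0] / p_cam[2], p_cam[1] / p_cam[2]   # normalize by z (factor s)
    r2 = xn**2 + yn**2
    d_r = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3   # radial distortion
    d_tx = 2 * p1 * xn * yn + p2 * (r2 + 2 * xn**2)
    d_ty = p1 * (r2 + 2 * yn**2) + 2 * p2 * xn * yn
    xd, yd = d_r * xn + d_tx, d_r * yn + d_ty     # distorted normalized position
    u = fx * xd + cx                              # intrinsic projection (skew neglected)
    v = fy * yd + cy
    return u, v

# Illustrative values only (not a real calibration of the NAO cameras).
u, v = project_point(np.array([0.1, 0.05, 1.0]), np.eye(3), np.zeros(3),
                     600.0, 600.0, 320.0, 240.0, (0.1, -0.05, 0.0, 0.001, 0.001))
print(u, v)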
2.3.3. Localization with one camera

With one camera, localization in space is simple to set up, even if it is necessary to know at least one dimension in distance units. It consists of the following steps:
- calibrate the camera (mono-calibration)
- undistort the image
- find patterns
- calculate the position of the patterns

[Diagram: the camera calibration is performed once, then the real time application loops over undistorting the image, finding patterns and calculating their position.]

As the diagram shows, the calibration of the camera is only needed at the beginning of the process. In the following, we describe each of these steps.

2.3.3.1 Calibrate the camera (mono-calibration)

First of all, we have to calibrate the camera in order to know its intrinsic parameters. Calibration can also inform us about the extrinsic parameters, but as they are always variable, they are less interesting here.

What are the different methods of calibration ?

There are several ways to calibrate a camera:
- photogrammetric calibration
- self-calibration

Photogrammetric calibration needs to know the exact geometry of a reference object in 3D space. This method is therefore not flexible in an unknown environment and, moreover, it is difficult to implement because precise 3D objects are needed. Self-calibration does not need such measurements and still produces accurate results. It is this second method that I chose, calibrating with a chessboard, which has simple repetitive patterns.

Chessboard corner detection

A simple way to calibrate is to use a chessboard: it allows the position of the corners to be known accurately on the image. In the appendices you can find more details about this type of calibration; I describe the Tsai method, which is a simple way to proceed.

Thus, the calibration informs us about the intrinsic parameters of the camera:
- the focal lengths on x (f_x) and on y (f_y)
- the position of the centre (c_x, c_y)
- the skew coefficient (\alpha_c)
- the radial distortion (d_r) and the tangential ones (d_tx and d_ty)

Calibration is only needed at the beginning of a process; the intrinsic parameters can be saved in an XML file.

2.3.3.2 Undistort the image

Undistorting the image is the first step of the real time application, which runs at a specific rate. Distortions can be responsible for a bad arrangement of the pixels within the image, which has a big impact on the accuracy of the localization. Distortions therefore have to be compensated by an undistortion process that remaps the image.

To do so, we use the distortion matrix M_d and the projection matrix P_cam-px in order to relate the ideal pixel and the distorted one. If we denote the ideal pixel p^i_img and the distorted one p^d_img, from what we said previously we have:

p^i_{img} = s\,P_{cam\text{-}px}\,p_{obj/cam} \qquad p^d_{img} = s\,P_{cam\text{-}px}\,M_d\,p_{obj/cam}

The idea is to calculate the distorted pixel from the ideal one:

p^d_{img} = s\,P_{cam\text{-}px}\,M_d\,\frac{1}{s}\,P_{cam\text{-}px}^{-1}\,p^i_{img} = P_{cam\text{-}px}\,M_d\,P_{cam\text{-}px}^{-1}\,p^i_{img}

We cannot directly calculate the ideal pixel from the distorted one, because the distortion matrix M_d is itself computed from the ideal pixels. Because of this, we use image mapping:

x^d_{img} = map_x(x^i_{img}, y^i_{img}) \qquad y^d_{img} = map_y(x^i_{img}, y^i_{img})

Thus, in order to build the ideal image, each ideal pixel is remapped from the distorted image:

img^i(x^i_{img}, y^i_{img}) = img^d\big(map_x(x^i_{img}, y^i_{img}),\, map_y(x^i_{img}, y^i_{img})\big)

2.3.3.3 find patterns

Then we have to find the patterns we want to focus on. We can either:
- find keypoints (Harris, Shi-Tomasi, SUSAN, SIFT, SURF, ...)
- or find keyregions (RANSAC, Hough, ...)

SIFT and SURF add robust pattern recognition invariant to scale, rotation, brightness and contrast, even if they require more computation than other corner detection methods. I do not give more details about the corner detection methods, but they fall into two types:
- Harris, Shi-Tomasi: a Harris matrix is calculated from the local gradients and, depending on its eigenvalues, it is possible to extract specific corners.
- SUSAN (Smallest Univalue Segment Assimilating Nucleus): it works with a SUSAN mask and generates a USAN (Univalue Segment Assimilating Nucleus) region area depending on the pixel intensities. It can be improved using shaped rings counting the number of intensity changes.

As my project is oriented towards pattern recognition, I used SIFT and SURF to extract keypoints.

2.3.3.4 calculate the position of the patterns

Knowing the projection matrix P_cam-px given by the calibration, it is possible to relate the 3D coordinates (x_obj/cam, y_obj/cam, z_obj/cam) to the pixels (x_img, y_img):

\begin{pmatrix} x_{img} \\ y_{img} \\ 1 \end{pmatrix} = s\,P_{cam\text{-}px} \begin{pmatrix} x_{obj/cam} \\ y_{obj/cam} \\ z_{obj/cam} \end{pmatrix} = \frac{1}{z_{obj/cam}} \begin{pmatrix} f_x x_{obj/cam} + c_x z_{obj/cam} \\ f_y y_{obj/cam} + c_y z_{obj/cam} \\ z_{obj/cam} \end{pmatrix} \qquad \text{if } \alpha_c \approx 0

In the following, we consider \alpha_c = 0.
Knowing the distance to the object (z_obj/cam), we can directly calculate the position of the object (x_obj/cam, y_obj/cam):

x_{obj/cam} = \frac{x_{img} - c_x}{f_x}\,z_{obj/cam} \qquad y_{obj/cam} = \frac{y_{img} - c_y}{f_y}\,z_{obj/cam}

Knowing one dimension of the object (d_obj), we can calculate the distance to the object (z_obj/cam):

z_{obj/cam} = \frac{f_0\,d_{obj}}{d_{img}}

where f_0 and c_0 are the mean values of f_x, f_y and of c_x, c_y respectively, and d_img is the dimension d_obj as seen on the image (in pixels).

Thus, we see the two main limitations of localization with one camera: either the distance to the object z_obj/cam has to be known, or at least one dimension of the object (d_obj) has to be known, together with a way to extract it from the image (d_img). In order to make the localization more autonomous, we need more than one camera, as we will see in the next part on stereovision.
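Before moving on to stereovision, here is a tiny sketch of the one-camera computation above (pixel plus known depth, or depth from a known dimension); the focal lengths and image centre are invented values, not a real calibration of the NAO cameras.

def pixel_to_camera(x_img, y_img, z, fx, fy, cx, cy):
    # inverse of the pinhole projection, with the skew neglected (alpha_c = 0)
    x = (x_img - cx) / fx * z
    y = (y_img - cy) / fy * z
    return x, y, z

def depth_from_dimension(d_img_px, d_obj_m, f0):
    # known physical dimension d_obj_m (metres) seen as d_img_px pixels on the image
    return f0 * d_obj_m / d_img_px

f0 = (600.0 + 600.0) / 2.0                     # mean of fx and fy (illustrative)
z = depth_from_dimension(d_img_px=80.0, d_obj_m=0.12, f0=f0)   # a 12 cm wide object
print(pixel_to_camera(400.0, 260.0, z, 600.0, 600.0, 320.0, 240.0))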
2.3.4. Localization with two cameras : stereovision

We now focus on stereovision using two cameras (one on the left and one on the right): it enables autonomous localization, even if it is more complicated to implement. It requires the same main steps as with one camera, but adds a few more:
- mono and stereo-calibrate the cameras
- rectify, reproject and undistort the images
- find patterns in the left camera
- find the correspondences in the right camera
- calculate the position of the patterns

As previously, the calibration is isolated from the real time application. In the following, I give more details about each of these steps.

[Diagram: the mono and stereo-calibration are performed once, then the real time application loops over rectifying, reprojecting and undistorting the images, finding patterns in the left camera, finding the correspondences in the right image and calculating the position of the patterns.]

2.3.4.1 mono and stereo-calibrate the cameras

This operation consists, as its name says, in mono-calibrating each of the cameras (left and right) and stereo-calibrating the system made up of the two cameras. A simple way to proceed is to first mono-calibrate each camera in order to obtain its intrinsic and extrinsic parameters; the stereo-calibration then uses their extrinsic parameters in order to determine their relationship. This relationship is defined by:
- the relative position between the left camera and the right one, P_lr (rotation R_lr and translation T_lr)
- the essential and fundamental matrices E and F

[Diagram: an object point P seen from the two optical centres O_l and O_r, with its positions P_l and P_r relative to the left and right cameras and the relative transformation (R_lr, T_lr) from the left camera to the right one.]

In the diagram, P_l and P_r are the positions of the object P relative to the left and right cameras, and T_lr and R_lr are the relative translation and rotation from the left camera to the right one.

The essential matrix E links the positions seen from the left camera, P_l, and from the right one, P_r (it considers only extrinsic parameters):

P_r^T\,E\,P_l = 0

The fundamental matrix F links the image projections of the left and right cameras, p_l and p_r (it considers both extrinsic and intrinsic parameters):

p_r^T\,F\,p_l = 0 \qquad \text{where } p_l = M_l P_l \text{ and } p_r = M_r P_r

M_l and M_r are the intrinsic matrices of the left and right cameras.

The matrix F gives a clear relation between one pixel of the left camera and the pixels of the right one. Indeed, one pixel on the left image corresponds to a line in the right one (called the right epipolar line l_r), and likewise one pixel on the right image corresponds to a left epipolar line l_l.

[Diagram: a pixel p_l on the left image img_l and its corresponding right epipolar line l_r on the right image img_r; O_l and O_r are the optical centres of the left and right cameras.]

Thus, thanks to the fundamental matrix F, we can calculate the epipolar lines:
- if p_l is a pixel from the left camera, the corresponding right epiline is l_r = F p_l
- if p_r is a pixel from the right camera, the corresponding left epiline is l_l = F^T p_r

Moreover, using the relative position P_lr, we are able to calculate rectification and reprojection matrices for each of the cameras. But why rectify and reproject? As we are working with a stereovision system of two cameras, the goal is to have a simple relationship between them in order to:
- simplify the correspondence step (on the right image)
- but above all, easily calculate the disparity between the two images (position calculation step)

Indeed, a translation only along x lets us predict more easily the position of the features within the right camera: it avoids having to use the fundamental matrix F and the epilines to make the correspondence. The fact that the cameras only have a shift along x between them makes the epipolar lines lie at the same y as the pixel from the other image.

- The rectification compensates the relative position between the cameras in order to keep only a shift along x between them. A rectification matrix, which is a rotation, is applied to each of the cameras (R_l-rect and R_r-rect).
- The reprojection unifies the projection matrices of the two cameras (P_l-cam-px and P_r-cam-px) in order to build a global one, P_global-cam-px. The goal here is to find the global camera equivalent to the stereovision system. A reprojection matrix is also applied to each of the cameras (P_l-repro and P_r-repro).

[Diagram: after rectification, the right epipolar line corresponding to a left pixel p_l is horizontal, at the same y in img_r.]

In the appendices, there are more explanations about the epipolar geometry used in stereovision and about the calculation of the rectification and reprojection matrices.

2.3.4.2 rectify, reproject and undistort the image

This step is similar to the undistortion step seen in the mono-camera case; the difference is that we also apply the rectification and reprojection matrices to each camera. Our ideal pixel is now:

p^i_{img} = s_{global}\,P_{repro}\,R_{rect}\,p_{obj/cam}

Thus, we can express the distorted (and unrectified) pixel p^d_img as a function of p^i_img:

p^d_{img} = s\,P_{cam\text{-}px}\,M_d\,\frac{1}{s_{global}}\,R_{rect}^T\,P_{repro}^{-1}\,p^i_{img}

Then, similarly to the undistortion, we use image mapping.
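The sketch below illustrates the rectification, reprojection and undistortion step with OpenCV's stereoRectify and remapping functions; the calibration values are placeholders standing in for the output of the mono and stereo calibration.

import cv2
import numpy as np

# Placeholder calibration data: identical pinhole cameras, no distortion, and a
# pure 10 cm translation along x between the two cameras.
K_l = K_r = np.array([[600.0, 0.0, 320.0],
                      [0.0, 600.0, 240.0],
                      [0.0, 0.0, 1.0]])
dist_l = dist_r = np.zeros(5)
R_lr = np.eye(3)
T_lr = np.array([0.10, 0.0, 0.0])
image_size = (640, 480)
left_raw = np.zeros((480, 640), np.uint8)      # stand-ins for the two camera images
right_raw = np.zeros((480, 640), np.uint8)

R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K_l, dist_l, K_r, dist_r,
                                            image_size, R_lr, T_lr)

# R1/R2 play the role of the rectification rotations and P1/P2 of the reprojection
# matrices of this section; the maps implement the image mapping step.
map_lx, map_ly = cv2.initUndistortRectifyMap(K_l, dist_l, R1, P1, image_size, cv2.CV_32FC1)
map_rx, map_ry = cv2.initUndistortRectifyMap(K_r, dist_r, R2, P2, image_size, cv2.CV_32FC1)
left_rect = cv2.remap(left_raw, map_lx, map_ly, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_raw, map_rx, map_ry, cv2.INTER_LINEAR)

The Q matrix returned by stereoRectify can also be used to reproject disparities to 3D, which parallels the position calculation of section 2.3.4.5.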
2.3.4.3 find patterns in the left camera

This step is exactly the same as in the one-camera case, except that we now extract the patterns from the left camera only.

2.3.4.4 find the correspondences in the right camera

The correspondences in the right image can be calculated using:
- a matching function on the epipolar lines
- an optical flow (Lucas-Kanade, for example)

Matching function on the epipolar lines

The first method takes advantage of the rectification, which lets us work on a horizontal line (constant y). Using the pixels of the patterns found in the left image, it goes through the horizontal line of the right image at the same y: as we work on the right image, the correspondence points are expected to be on the left side of the x position of the pattern. A matching operation over a small window, the Sum of Absolute Differences (SAD), is computed at each candidate pixel of the right image (this operation relies mainly on texture). The extremum of the matching function (the minimum in the case of the SAD) gives the best correspondence.

Optical flow

Optical flow (or optic flow) is the pattern of apparent motion of objects, surfaces and edges in a visual scene caused by the relative motion between an observer and the scene. An optical flow can thus be extracted using:
- several images from one camera at different instants
- several images from different cameras with different viewpoints

In stereovision, we are in the second case.

The Lucas-Kanade method is an accurate method to find the velocity in a sequence of images, but it leans on several assumptions:
- brightness constancy: a pixel from a detected object has a constant intensity.
- temporal persistence: the image motion is small.
- spatial coherence: neighbouring points in the world have the same motion, belong to the same surface and stay neighbours on the image.

Using a Taylor series, we can express the intensity of the next image I(x+\delta x, y+\delta y, t+\delta t) as a function of the previous one I(x, y, t):

I(x+\delta x, y+\delta y, t+\delta t) \approx I(x, y, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t

Using the brightness constancy assumption and taking the relation above as an equality, we have:

I(x+\delta x, y+\delta y, t+\delta t) = I(x, y, t) \qquad \text{and} \qquad I_x V_x + I_y V_y + I_t = 0

where V_x = \delta x / \delta t and V_y = \delta y / \delta t.

As we have two unknowns for the components of the velocity (V_x and V_y), the equation cannot be solved directly. We need to use a region made up of several pixels (at least two) over which we consider the velocity vector to be constant (spatial coherence assumption). As the Lucas-Kanade method focuses on local features (a small window), we see why it is important to have small movements (temporal persistence assumption).

In the following, I write:

I_x = \frac{\partial I}{\partial x} \qquad I_y = \frac{\partial I}{\partial y} \qquad I_t = \frac{\partial I}{\partial t}

Then, applying the previous formula to a small region of n pixels:

\begin{pmatrix} I_{x1} & I_{y1} \\ I_{x2} & I_{y2} \\ \vdots & \vdots \\ I_{xn} & I_{yn} \end{pmatrix} \begin{pmatrix} V_x \\ V_y \end{pmatrix} = -\begin{pmatrix} I_{t1} \\ I_{t2} \\ \vdots \\ I_{tn} \end{pmatrix}

In order to calculate the velocity, we compute a Least Squares Solution:

\begin{pmatrix} V_x \\ V_y \end{pmatrix} = -(A^T A)^{-1} A^T \begin{pmatrix} I_{t1} \\ I_{t2} \\ \vdots \\ I_{tn} \end{pmatrix} \qquad \text{where } A = \begin{pmatrix} I_{x1} & I_{y1} \\ I_{x2} & I_{y2} \\ \vdots & \vdots \\ I_{xn} & I_{yn} \end{pmatrix}

The Least Squares Solution works well for features such as corners, but less well for flat zones.

The Lucas-Kanade method can be improved to handle large movements as well, by using the scale dimension: the velocity (V_x, V_y) is computed at a coarse scale and refined several times before working on the raw image (pyramidal Lucas-Kanade).

[Diagram: principle of the pyramidal Lucas-Kanade method.]

Using the velocity, we can then calculate the displacement vector d and thus the corresponding pixels within the right image.
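As an illustration of the optical-flow correspondence search, here is a hedged sketch using OpenCV's pyramidal Lucas-Kanade implementation on the rectified pair; the images and left-image keypoints are placeholders.

import cv2
import numpy as np

# Placeholders standing in for the rectified grayscale pair and the pattern
# keypoints found in the left image (shape (n, 1, 2), float32, as OpenCV expects).
left_rect = np.zeros((480, 640), np.uint8)
right_rect = np.zeros((480, 640), np.uint8)
pts_left = np.array([[[350.0, 250.0]], [[300.0, 200.0]]], dtype=np.float32)

lk_params = dict(winSize=(21, 21), maxLevel=3,     # maxLevel sets the pyramid depth
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
pts_right, status, err = cv2.calcOpticalFlowPyrLK(left_rect, right_rect,
                                                  pts_left, None, **lk_params)

# Keep the points that were tracked; after rectification the displacement should
# be essentially horizontal, and x_l - x_r directly gives the disparity.
good = status.ravel() == 1
disparity = pts_left[good, 0, 0] - pts_right[good, 0, 0]
print(disparity)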
2.3.4.5 calculate the position of the patterns

This step is the core of the use of stereovision for localization. The diagram below illustrates the stereovision system once the two images have been undistorted, rectified and reprojected.

[Diagram: rectified stereo pair with baseline T between the optical centres, focal length f, an object point at distance Z and its projections x_l and x_r on the left and right images.]

We can find a geometric relation between the cameras in order to obtain the distance Z to the object:

\frac{T - (x_l - x_r)}{Z - f} = \frac{T}{Z} \qquad \text{thus} \qquad Z = \frac{f\,T}{x_l - x_r}

The disparity between the two cameras is defined as:

d = x_l - x_r

Thus, similarly to the mono-camera case, knowing Z we can calculate the complete position of the object relative to the camera (X, Y, Z):

X = \frac{x_l - c_{xl}}{f}\,Z \qquad Y = \frac{y_l - c_{yl}}{f}\,Z

disparity

The disparity d is an interesting parameter to measure, as it is inversely proportional to the distance Z of the object: for an object far from the camera it is near zero, and the localization becomes less accurate; for an object close to the camera it tends to infinity, and the localization becomes very accurate. Thus, we have to find a maximal distance up to which the object is detected with enough accuracy.
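A minimal sketch of this triangulation, with invented values for the focal length, the baseline and the image centre:

def triangulate(xl, yl, xr, f, T, cxl, cyl):
    d = xl - xr                     # disparity, near zero for far objects
    Z = f * T / d
    X = (xl - cxl) / f * Z
    Y = (yl - cyl) / f * Z
    return X, Y, Z

print(triangulate(xl=350.0, yl=250.0, xr=310.0, f=600.0, T=0.12, cxl=320.0, cyl=240.0))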
2.4 NAO control

2.4.1. Introduction

NAO is an autonomous humanoid robot built by the French company Aldebaran-Robotics. This robot is well suited to education, as it is designed to be easy and intuitive to program. It integrates several sensors and actuators which allow students to work on locomotion, grasping, audio and video signal processing, voice recognition and more. A humanoid robot such as NAO makes courses more fun and motivates students to work on practical projects: it allows complicated algorithms to be tested very quickly, in real time and in a real environment.

2.4.2. Hardware

2.4.2.1 General

Firstly, it is necessary to know that there are several models of NAO, which enable different kinds of applications:
- NAO T2 (humanoid torso) : sensing, thinking. This model allows one to focus on signal processing and artificial intelligence.
- NAO T14 (humanoid torso) : sensing, thinking, interacting, controlling, grasping. This model adds a more practical aspect to the previous one, with the capacity to work on control.
- NAO H21 (humanoid) : sensing, thinking, interacting, controlling, travelling. This first humanoid model allows NAO to move within the environment: many applications are possible, for example mapping.
- NAO H25 (humanoid) : sensing, thinking, interacting, controlling, travelling, being autonomous, grasping. This last model is the most evolved: it adds to the humanoid the ability to grasp objects (as the torso NAO T14 also does).

Notice: besides these different models of NAO, some modules can be added to NAO, such as the laser head. The use of a laser improves the navigation of the robot in complex indoor environments.

Besides the model, a NAO robot also corresponds to a specific version (V3+, V3.2, V3.3, ...). The robots available in the laboratory are NAO H25 version V3.2: this hardware is a good base for a lot of applications.

2.4.2.2 Features

[Figure: the four NAO models NAO T2, NAO T14, NAO H21 and NAO H25.]

The NAO robot (NAO H25) has the following features:
- height of 0.57 metres and weight of 4.5 kg
- sensors : 2 CMOS digital cameras, 4 microphones, 2 ultrasonic emitters and receivers (sonars), 2 infrared emitters and receivers, 1 inertial board (2 one-axis gyrometers and 1 three-axis accelerometer), 9 tactile sensors (buttons), 8 pressure sensors (FSR), 36 encoders (Hall effect sensors)
- actuators : brush DC motors
- other outputs : 1 voice synthesizer, LED lights, 2 high quality speakers
- connectivity : internet (Ethernet and WIFI) and infrared (remote control, other NAO, ...)
- a CPU (x86 AMD GEODE 500 MHz + 256 MB SDRAM / 2 GB Flash memory) located in the head, running a Linux kernel (32-bit x86 ELF) and the Aldebaran Software Development Kit (SDK) NAOqi
- a second CPU located in the torso
- a 55 Watt-hour Lithium-ion battery (nearly 1.5 hours of autonomy)

NAO is well equipped and can therefore be very flexible in its applications. We can note that the motors are controlled by a dsPIC microcontroller and through their encoders (Hall effect sensors).

The robot has 25 Degrees Of Freedom (DOF):
- head : 2 DOF
- arm : 4 DOF in each arm
- pelvis : 1 DOF
- leg : 5 DOF in each leg
- hand : 2 DOF in each hand

For my project, I focused on the head at the beginning in order to work on object tracking. Afterwards, I planned to work on object grasping: this task requires more computation to synchronise the actuators correctly. Given the available time, I had to leave out some of the control work, as my main topic was image processing for object recognition and localization.

In the next part, I give more details about the two cameras of the NAO robot, as I focused on these sensors.

2.4.2.3 Video cameras

NAO has two identical video cameras located on its face. They provide a resolution of 640x480 and can run at 30 frames per second. Moreover, the two cameras have the following features:
- camera output : YUV422
- Field Of View (FOV) : 58 degrees (diagonal)
- focus range : 30 cm to infinity
- focus type : fixed focus

The upper camera points straight ahead through the head of NAO: it enables the robot to see in front of it. The lower camera, shifted downwards, is oriented towards the ground: it enables NAO to see what it meets on the ground close to it. Note that these cameras cannot be used for stereovision, as there is no overlap between them.

2.4.3. Software

There are several pieces of software which enable NAO to be programmed very quickly and easily. They also allow easy interaction with the robot.

2.4.3.1 Choregraphe

Choregraphe is a cross-platform application for editing NAO's movements and behaviors. A behavior is a group of elementary actions linked together following an event-based and a time-based approach. It consists of a simple GUI (Graphical User Interface) based on the NAOqi SDK, which is the framework used to control NAO: we will focus on it later.

Interactions

Choregraphe displays a NAO robot on the screen which imitates the actions of the real NAO in real time. It lets the user work with the simulated NAO to save time during the tests. The software also makes it possible to un-enslave the motors, that is, to impose no stiffness on each motor.
Removing the stiffness in this way allows the user to rely on the simulated NAO whenever the code has to be tested more quickly and safely. Moreover, the software enables each joint to be controlled in real time from the GUI. The interface displays the new joint values in real time even if the user moves the robot manually. Thanks to these features, the user can easily test the range of each joint, for example.

Boxes

In Choregraphe, each movement and detection (action) can be represented as a box; these boxes are arranged in order to create an application. As mentioned previously, behaviors, which are groups of actions, can also be represented as boxes, but with several boxes inside. In order to connect the boxes, they have input(s) and output(s) which generally allow the action to be started and finished. Specific boxes, such as sensor or computation boxes, can have several outputs (the values of the detected signals, calculated variables, ...) and several inputs (variables, conditions, ...).

Timeline

An interesting feature of Choregraphe is the possibility to create an action or a sequence of behaviors using the timeline. Contrary to the system of boxes seen previously, the timeline gives a view over time, which is the real conductor of the application. The screenshot above shows the timeline (marked as 1), the behavior layers (marked as 2) and the position on the frame (marked as 3). Thus, it is easier to manage several behaviors in series or in parallel.

Moreover, the timeline enables an action to be created from several positions of the robot's joints. Indeed, it is only necessary to select a specific frame in the timeline, to put the NAO in a specific position and to save the joints. The software will automatically calculate the required speed of the actuators in order to follow the constraints of the frames. It is also possible to edit the speed profile of each motor manually, even if this requires some care. In the screenshot above, you can see the keyframes (squares) and several functions controlling the speed of the motors. Editing a movement using keyframes makes it easy, for example, to implement a dance for NAO.

Choregraphe offers other possibilities, such as a video monitor panel enabling the user to:
- display the images from the NAO cameras
- recognize an object, using the selection of a pattern in the image as a model

2.4.3.2 Telepathe

Telepathe is an application that provides feedback from the robot, as well as the ability to send basic orders. Like Choregraphe, it is a cross-platform GUI, but a customizable one (the user can load plugins into different widgets). These widgets or windows can be connected to different NAO robots. Telepathe is actually a more sophisticated tool than Choregraphe for the configuration of devices and the visualization of data. It has a direct link with the memory of NAO, which allows a lot of variables to be followed in real time.

Memory viewer

The ALMemory module of NAO (which is part of NAOqi) allows data to be written to and read from the memory of the robot. Telepathe uses this module to display variables with tables and graphs, and the user can monitor variables accurately during testing (the short script sketch at the end of this section shows the same kind of read access).

Video configuration

Moreover, Telepathe enables the video camera to be configured in depth. Thus, it makes the testing of computer vision applications easier.
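As an illustration of what the memory viewer does, here is a minimal Python sketch that reads one value from ALMemory through the NAOqi SDK (introduced in the next section). It is only a sketch: the robot address, port and memory key below are example values, not ones taken from my project.

```python
# Minimal sketch: reading a value from ALMemory, as Telepathe's memory viewer does.
# The IP address, port and memory key are example values.
from naoqi import ALProxy

NAO_IP = "192.168.1.10"   # example address of the robot
NAO_PORT = 9559           # default NAOqi port

memory = ALProxy("ALMemory", NAO_IP, NAO_PORT)

# Example key: the measured position of the head yaw joint (in radians).
key = "Device/SubDeviceList/HeadYaw/Position/Sensor/Value"
value = memory.getData(key)
print("%s = %f" % (key, value))
```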
2.4.4. NAOqi SDK

The Software Development Kit (SDK) NAOqi is a robotic framework built especially for NAO. There are many others aimed at a large range of robots: Orocos, YARP (Yet Another Robot Platform), Urbi and more. They can be specialized in motion or in real-time behaviour, be AI-oriented, and they are sometimes open source so that anyone can work on the source code. NAOqi improves the management of events and the synchronization of tasks, as a real-time kernel does. It works with a system of modules organised around a shared memory (the ALMemory module).

2.4.4.1 Languages

The framework mainly uses three languages:
- Python: since only a simple interpreter is needed to run the code, this language makes it possible to control NAO quickly and to manage parallelism more easily.
- C++: this language needs a compiler to produce an executable; it is used to program new modules, as modules fit well with object-oriented languages and are expected to stay fixed.
- URBI (UObjects and urbiScript): URBI is a robotic framework like NAOqi which can be started from NAOqi (through a NAOqi module called ALUrbi). It enables the creation of UObjects, which correspond to modules, and uses urbiScript to manage events and parallelism.

URBI is an interesting framework as it is compatible with a lot of robots (Aibo, Mindstorms NXT, NAO, Spykee, ...), but during my internship I focused on NAOqi, which follows the same principle. Thus, in the following, I will focus on the Python and C++ languages, which use the core of NAOqi. NAOqi is very flexible regarding languages (it is cross-language), as C++ modules can be called from Python code and vice versa: this simplifies programming and the exchange of programs between people. Moreover, NAOqi ships with several libraries such as Qt or OpenCV, which respectively enable the creation of GUIs and the manipulation of images.

2.4.4.2 Brokers and modules

NAOqi works with brokers and modules:
- a broker is an executable and a server that listens for remote commands on an IP address and a port;
- a module is both a class and a library which deals with a specific device or action of NAO.

Modules can be called from a broker using proxies: a proxy makes it possible to use the methods of a local or remote module. Indeed, a broker can be executed on NAO but also on other NAOs and on computers: a single program can thus work across several devices. An interesting property of modules is that they cannot call other modules directly: only proxies enable them to use methods from different modules, as proxies refer to the IP address and port of a specific broker.

In the diagram above, you can see the architecture adopted by NAO and its software; purple rectangles correspond to brokers and green circles to modules. The broker mainBroker is the broker of the NAO robot: it is executed as soon as the robot is switched on and has an IP address (Ethernet or WiFi). This architecture is a safe one, as the robot and each piece of software work with different brokers: if Choregraphe or Telepathe crashes, the robot is less likely to be affected, since the brokers simply stop communicating for a short time (around 100 ms). When developing with NAOqi, we can choose to use the mainBroker (unsafe but fast) or to create our own remote broker.
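To make the proxy mechanism more concrete, here is a minimal Python sketch of remote calls through the robot's broker. The IP address is an example value; ALTextToSpeech and ALMotion are standard NAOqi modules, but this is only an illustration, not code taken from my project.

```python
# Minimal sketch: calling NAOqi modules through proxies.
# The IP address below is an example; 9559 is the default broker port.
from naoqi import ALProxy

NAO_IP = "192.168.1.10"
NAO_PORT = 9559

# Each proxy connects to a module exposed by a broker (here the mainBroker of the robot).
tts = ALProxy("ALTextToSpeech", NAO_IP, NAO_PORT)
motion = ALProxy("ALMotion", NAO_IP, NAO_PORT)

tts.say("Hello")                        # remote call to the speech module
motion.setStiffnesses("Head", 1.0)      # enslave the head joints
motion.angleInterpolation(["HeadYaw"],  # move the head yaw joint to 0.5 rad in 1 s
                          [0.5], [1.0], True)
```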
Personally, as I focused at first on the actuators of the head for tracking, I preferred to use the mainBroker, as it was more intuitive and faster. As the diagram above shows, a lot of modules are available on NAO (through the mainBroker), but more can be added by generating modules from C++ classes.

The modules often correspond to devices available on the robot:
- sensors (ALSensors): tactile sensors
- motion (ALMotion): actuators
- leds (ALLeds): LED outputs
- videoInput (ALVideoDevice): video cameras
- audioIn (ALSoundDetection): microphones
- AudioPlayer (ALAudioPlayer): speakers

But there are others which are built for a specific purpose:
- interpret Python code: pythonbridge (ALPythonBridge)
- run computational processes: faceDetection (ALVisionRecognition), landmark (ALLandMarkDetection), ...
- communicate with the lower-level software controlling the electrical devices: DCM (DCM)
- manage a shared memory between modules: memory (ALMemory)

Concerning Telepathe, we find a memory module and a camera module, as expected.

In order for a module to make its methods available to the brokers, these methods need to be bound to the module's API (Application Programming Interface). Bound methods can then be called both locally and remotely. As we said previously, Python code allows NAO to be controlled quickly, because the NAOqi SDK provides a lot of Python functions to create proxies on one or several brokers and to execute methods through these proxies. NAOqi provides the same kind of methods for C++, but they are more oriented towards the generation of modules.

2.4.4.3 Generation of modules in C++

First of all, a module can be generated using a Python script called module_generator.py. It automatically generates several C++ templates:
- projectNamemain.cpp: it creates the module for a specific platform (NAO or PC).
- moduleName.h and moduleName.cpp: these are the files which will be customized by the user.

projectName and moduleName are replaced by the desired names. Thus, as NAOqi provides module libraries and other libraries (OpenCV for example) for C++, the user can develop easily with the framework. When the module is ready, it has to be compiled.

In order to build modules, NAOqi uses CMake, which compiles the sources using a configuration file (generally named CMakeLists) and other options such as the location of the sources or of the binaries. The Python script module_generator.py also creates a CMakeLists file, which facilitates the compiling process. CMake can be executed from the command line, as is the case on Linux, or through a GUI, as on Windows.

In order to compile modules, cross-compiling is used so that the code runs on a specific platform (PC or robot). Indeed, the PC does not have the same modules as the robot, since it does not have the same devices. Cross-compiling relies on a CMake toolchain file: toolchain-pc.cmake to compile a module for the PC and toolchain-geode.cmake for NAO. On Linux, the CMake command has to mention an additional option in order to cross-compile:

cmake -DCMAKE_TOOLCHAIN_FILE=path/toolchain_file.cmake ..

Cross-compilation from Windows for NAO is not possible, as NAO runs a Linux OS; however, Cygwin on Windows can be an alternative. After execution, CMake provides:
- a Makefile on Linux: it describes the compiler to use (g++ in our case) and many other options, and it has to be executed with the make command.
- a Visual Studio solution file (.sln) on Windows: it can be opened in Visual Studio, where it has to be built.

At the end of the process, both operating systems create a library file (.so on Linux and .dll on Windows). The problem with Windows in this case is that it cannot produce a library for NAO, as the robot uses .so libraries (Linux). The module library can then be moved to the lib folder of NAO or of the PC, and the configuration file autoload.ini has to be modified in order to load this module.

2.4.4.4 The vision module (ALVideoDevice)

As I focused on vision, I present in this part the management of the NAO cameras through the vision module called ALVideoDevice. Vision on NAO (the ALVideoDevice module) is organised around three components:

- the V4L2 driver
It acquires frames from a device, which can be a real camera, a simulated camera or a file. It uses a circular buffer which contains a specific number of image buffers (for NAO, there are four images). These images are updated at a specific frame rate, resolution and format depending on the features of the device.

- the Video Input Module (VIM)
This module manages the video device and the V4L2 driver: it opens and closes the video device using the I2C bus, and it starts and stops it in streaming mode (circular buffer). It configures the driver for a specific device (real camera, simulated camera, file). Moreover, it can extract an image from the circular buffer (unmap buffer) at a specific frame rate and convert the image to a specific resolution and format. We can notice that several VIMs can be created at the same time, for different devices or for the same one; they are identified by a name.

- the Generic Video Module (GVM)
This module is the main interface between the video device and the user: it allows the user to choose a device, a frame rate, a resolution and a format, and the GVM configures the VIM accordingly. Moreover, this module enables the user to control the video device in local or remote mode. The difference lies in the data received: in local mode, the image data (from the VIM) can be accessed through a pointer, whereas in remote mode the image data has to be duplicated from the image of the VIM and is received as an ALImage. The GVM also has methods to retrieve the raw images from the VIM (without any conversion): this improves the frame rate, even though it restricts the number of VIMs depending on the number of buffers enabled by the driver.

The diagram below shows the three components, each with its own thread (driver, VIM and GVM). The GVM can access the image in two ways:
- access 1, with conversion: through a pointer in local mode and through an ALImage variable in remote mode
- access 2, without conversion (raw image): through a pointer in local mode and through an ALImage variable in remote mode

In my case, I used the GVM in remote and local mode, with conversion.
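As an illustration of the remote access with conversion, here is a minimal Python sketch based on the NAOqi 1.x video API. The robot address is an example, and the resolution and colour-space constants (VGA, BGR) follow the NAOqi conventions of that version, so they should be checked against the installed SDK.

```python
# Minimal sketch: grabbing one converted image from ALVideoDevice in remote mode.
# The IP address is an example; resolution/colour-space values follow NAOqi 1.x conventions.
from naoqi import ALProxy
import numpy as np

NAO_IP, NAO_PORT = "192.168.1.10", 9559
video = ALProxy("ALVideoDevice", NAO_IP, NAO_PORT)

resolution = 2    # kVGA (640x480)
color_space = 13  # kBGRColorSpace, convenient for OpenCV
fps = 15

# Subscribing creates a GVM; the returned name identifies it.
gvm_name = video.subscribe("my_gvm", resolution, color_space, fps)
try:
    result = video.getImageRemote(gvm_name)   # ALImage packed as an array
    width, height, channels = result[0], result[1], result[2]
    image = np.frombuffer(result[6], dtype=np.uint8).reshape(height, width, channels)
    print("Received a %dx%d image with %d channels" % (width, height, channels))
finally:
    video.unsubscribe(gvm_name)
```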
I used the front camera of NAO in RGB and BGR modes: as the camera initially outputs YUV422, a conversion is needed.

3 Implementation and testing

In this part, I will focus on the hardware and software I used to accomplish my mission: object recognition and localization on a NAO robot for control purposes.

3.1 Languages, softwares and libraries

I worked in two ways:

- simulating and testing on Matlab
Matlab is a very useful tool to test algorithms, as it is very easy to manipulate data and display it on figures. Moreover, a lot of toolboxes are available on the web, which enables users to quickly reuse programs as tools. Matlab is also cross-platform: it works on Windows, Linux and Mac. It is cross-language as well, as the mex interface allows C, C++ or Fortran code to be used from Matlab and the opposite. It offers toolboxes for several fields such as aerospace, image processing, advanced control, ... Matlab also provides functions to read data from the serial port (serial) or from other hardware such as video cameras, which makes it possible to work with a large range of devices. All these aspects make Matlab very flexible for any application requiring computation.

- real implementation in C++/Python with OpenCV for NAO
NAO can be programmed using both C++ and Python, as we saw in the NAO control part. C++ allows modules to be built for NAO, whereas Python, which is directly interpreted, allows applications to be written quickly using these modules.

In my case, as vision was the core of my project, I chose to build a vision module in order to extract the main features of the recognized object and to let the user choose which object to recognize, among other parameters. I used Python for control, as I did not focus on control during my project: if there is a problem in the control part, it is very easy to modify the Python code, since it does not need to be compiled. I could have used Matlab to simulate the control, as I did for the vision part, but I did not have enough time. I used the Open Source Computer Vision (OpenCV) library in order to implement the vision part more easily.

3.2 Tools and methods

3.2.1. Recognition

For recognition, I used two ways of testing on Matlab.

Firstly, I used an image (the studied image) and a transformed version of it (cropped, rotated, scaled) as the model image. This approach makes it easy to measure the number of keypoints detected and the number of matches depending on the rotation, the scale and the size of the cropped image. Using this method, we can test both the feature extraction (SIFT/SURF) and the feature matching. I focused first of all on the invariance to rotation: I used several rotated images and calculated the number of keypoints and of matches with the studied image (with SIFT). You can see below the keypoints detected for several rotations and the number of keypoints as a function of the rotation of the model image (green points: studied image, red points: model image, blue points: matches). The number of matches decreased with the rotation of the image: SIFT was not efficient enough and I had to improve it. Afterwards, I focused on the calculation of the 2D model features within the studied image: the model image was cropped, rotated and scaled depending on the inputs of the user (a small sketch of this kind of rotation test is given below).
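The following minimal Python/OpenCV sketch reproduces the idea of this rotation test (it is not my Matlab code): the model image is obtained by rotating the studied image by a chosen angle, SIFT keypoints are extracted in both images and matched, and the number of matches is reported. It assumes an OpenCV build where cv2.SIFT_create is available, and the file name is only an example.

```python
# Minimal sketch of the rotation test: rotate the model image, extract SIFT
# keypoints in both images, match the descriptors and count the good matches.
# Assumes an OpenCV build providing cv2.SIFT_create (e.g. OpenCV >= 4.4).
import cv2

def count_matches(studied_path, angle):
    studied = cv2.imread(studied_path, cv2.IMREAD_GRAYSCALE)
    h, w = studied.shape
    rot = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    model = cv2.warpAffine(studied, rot, (w, h))   # rotated copy used as model image

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(studied, None)
    kp2, des2 = sift.detectAndCompute(model, None)

    # Ratio test from Lowe's paper to keep only distinctive matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.75 * n.distance]
    return len(kp1), len(kp2), len(good)

if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(angle, count_matches("studied.png", angle))   # example file name
```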
From the matches, the applied translation, rotation and scale could be estimated; in order to test this, I calculated the translation, rotation and scale errors.

Secondly, I used images acquired from the video camera of my computer together with model images of objects. This approach makes it possible to test the algorithm on real 3D objects from the world, and it shows the influence of the 3D rotation of the object, which limits the recognition. To keep it simple, I focused first on planar objects (books for example), as only one model image is required; for specific objects such as robots, several model images are needed. The 2D features of the model within the acquired image are always interesting to calculate.

3.2.2. Localization

For localization, I did not take the time to simulate the whole processing chain on Matlab and focused directly on the implementation in C++ using the Open Source Computer Vision library (OpenCV). I used a stereovision system available in the laboratory (a previous MSc student, Suresh Kumar Pupala, had designed it for his dissertation). As I wanted to learn more about mono-calibration, stereo-calibration and stereo correspondence, which the OpenCV functions hide, I also used Matlab to test calibration and stereo correspondence programs: understanding the concepts made me more comfortable with camera devices.

The photos below show the stereovision system and NAO wearing it. The system can be connected to a PC (USB ports); in order to use it with NAO, it is necessary to start NAOqi both on NAO and on the PC. You can see below the screenshots showing the stereo-calibration and the stereovision using C++/OpenCV: the blue points correspond to the corners found in the left image (left camera), the green points to the corresponding points in the right image obtained by stereo correspondence (right camera), and the red points to the deduced 3D points drawn in the left image.

3.2.3. NAO control

My final goal was to use a vision module (providing object recognition and localization) combined with a program in Python for NAO control. I wanted first to implement a basic control for NAO (tracking the object with its head, moving towards the object, saying which object it is watching) and afterwards, if possible, a more complicated control (grasping an object for example). In order to practice on NAO, I wrote a small vision module to track a red ball (HSV color space) and I made NAO focus on the barycenter of the object by rotating its head (pitch and yaw). In the screenshot above, you can see the original image on the left and the HSV mask on the right: tracking using the HSV color space is not perfect, as it depends a lot on the brightness, but it was a simple project to practice on NAO.

3.3 Steps of the project

My project was divided into several steps:
- As we said before, recognition is divided into two main steps: feature extraction and feature matching. I worked first on feature extraction with SIFT. I implemented SIFT on Matlab using a C++ code written by an Indian student in Computer Science, Utkarsh Sinha, who relied, as I did, on the publication of David G. Lowe (Distinctive Image Features from Scale-Invariant Keypoints). I focused on a clear program rather than a fast one, because I wanted to divide it into several distinct functions: by doing this, the program becomes more flexible.
- Afterwards, I studied the feature matching and focused on the descriptor matching and on the calculation of the 2D model features (least squares solution).
- Then, I looked into the localization by learning about the parameters of a video camera, calibration and stereovision. In order to implement it, I used a stereovision system adaptable to NAO's head which was available in the laboratory. I used a lot of references, such as the MSc dissertation of a previous student in Robotics and Automation (3D-Perception from Binocular Vision, Suresh Kumar Pupala) or the publication of David G. Lowe. For the C++ implementation, I used the Open Source Computer Vision (OpenCV) library, and I also did some experiments on Matlab in order to better understand calibration and stereo correspondence. Before linking the localization with the recognition, I focused only on corners: this allowed me to study the evolution of the disparity and the values of the 3D positions.
- After that, I learned about the control of NAO and I tested a small recognition module in C++ (ballTracking) combined with a control part in Python. The goal of this small project was to deal with the use of the NAO camera or another camera (laptop camera), in remote or local mode, and to build a simple control of the head of NAO. The project consisted in tracking a red ball in the image from the camera (HSV color space), calculating its barycenter, and controlling the head of NAO in order to keep its camera focused on the object.
- Later, I studied another invariant feature extraction method (SURF). Contrary to SIFT, I did not implement it on Matlab: I only downloaded the OpenSURF library, which is available both as Matlab code and in C++. As it was quicker than SIFT, I used it to study the feature matching.
- Finally, I learned about the Generalized Hough Transform, which enables keypoint matching based on the position and the orientation of the keypoints. I first studied the algorithms outside the matching and then integrated them into it; I used only Matlab to test them.

My work is not finished, as I did not implement the feature matching in C++. The goal is then to replace the corner detection of the localization by SURF feature extraction and to add the feature matching before doing the stereo correspondence on the right image: the recognition will be done on the left image only. Thus, combining recognition and localization, we will have keypoints localized in space, which makes it possible to estimate the average position of the object and maybe more (3D orientation, 3D size, ...).

(Diagram: summary of the project steps. SIFT: SIFT on Matlab, some testing; feature matching: descriptor matching, 2D features calculation; localization: parameters of a video camera, calibration, stereovision; NAO control: ball tracking project; SURF: use of the OpenSURF library, some testing; feature matching: keypoint matching; future work: feature matching in C++, SURF and feature matching combined with stereovision in a module for NAO.)

4 Conclusion

The project revolved around three subjects:
- object recognition
- object localization
- NAO control

I first presented these subjects before describing what I did and what remains to be done.
I used different methods to deal with each of these subjects: I preferred to simulate the recognition on Matlab, whereas I chose to move quickly to practical work for the localization and the control of NAO (using C++, OpenCV and Python). However, for the localization, I also used Matlab in order to better understand what lies inside the calibration and the stereo-calibration.

I can conclude that this project allowed me to learn more about the use of video cameras for robotic purposes. Indeed, recognition is useful at the beginning to track the desired object, and localization enables the robot to interact with it using control: the behavior of the robot depends on these three parts (recognition, localization and control).

Concerning my project, as I said when I described its steps, it is not finished: it still needs some implementation in C++ and some testing, which I hope will be done. Contrary to SIFT, SURF has open-source implementations such as OpenSURF: thus, the recognition I presented is not only academic and could be used by companies.

5 References

- 3D-Perception from Binocular Vision (Suresh Kumar Pupala)
- Website of Utkarsh Sinha: http://www.aishack.in/
- A simple camera calibration method based on sub-pixel corner extraction of the chessboard image (Yang Xingfang, Huang Yumei and Gao Feng): http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5658280&tag=1
- Epipolar geometry notes, University of Nevada, Reno: http://www.cse.unr.edu/~bebis/CS791E/Notes/EpipolarGeonetry.pdf
- A simple rectification method of stereo image pairs (Huihuang Su, Bingwei He): http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5678343
- OpenCV tutorials (Noah Kuntz): http://www.pages.drexel.edu/~nk752/tutorials.html
- Learning OpenCV (Gary Bradski and Adrian Kaehler)
- OpenCV 2.1 C Reference: http://opencv.willowgarage.com/documentation/c/index.html
- Structure from Stereo Vision using Optical Flow (Brendon Kelly): http://www.cosc.canterbury.ac.nz/research/reports/HonsReps/2006/hons_0608.pdf
- Distinctive Image Features from Scale-Invariant Keypoints (David G. Lowe): http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
- Feature Extraction and Image Processing (Mark S. Nixon and Alberto S. Aguado)
- Speeded-Up Robust Features (SURF) (Herbert Bay, Andreas Ess, Tinne Tuytelaars and Luc Van Gool): ftp://ftp.vision.ee.ethz.ch/publications/articles/eth_biwi_00517.pdf
- Aldebaran official website (Aldebaran): http://www.aldebaran-robotics.com/
- Aldebaran Robotics documentation (Aldebaran): http://users.aldebaran-robotics.com/docs/site_en/index_doc.html
- URBI (Gostai): http://www.gostai.com/downloads/urbi-sdk/2.x/doc/urbi-sdk.htmldir/index.html#urbi-platforms.html
- NAO tutorials (Robotics Group of the University of León): http://robotica.unileon.es/mediawiki/index.php/Nao_tutorial_1:_First_steps

6 Appendices

6.1 SIFT: a method of feature extraction

6.1.1. Definition and goal

The goal is to find keypoints that are invariant to the scale, rotation and illumination of the object. In pattern recognition, SIFT has two advantages:
- it can identify complex objects in a scene using specific keypoints;
- it can identify the same object for several positions and rotations of the camera.

However, it has one main disadvantage: the processing time. SURF (Speeded Up Robust Features) relies on the same principle as SIFT but is quicker; in order to remain invariant to rotation, it has to rely on a fast keypoint detection step.
6.1.2. Description

The SIFT process can be divided into two main steps which we will study in detail:
- find the location and the orientation of the keypoints (keypoint extraction)
- generate a specific descriptor for each of them (descriptor generation)

Indeed, we first have to know the location of the keypoints in the image, but afterwards we have to build a very discriminating identifier, which is why we need one descriptor per keypoint. There can be a lot of keypoints, so we have to be very selective in our choice. In SIFT, one descriptor is a vector of 128 values.

6.1.3. Keypoint extraction

This first process is divided into two main steps:
- find the location of the keypoints
- find their orientation

These two steps have to be done very carefully because they have a big impact on the descriptor generation we will see later. Firstly, in order to find their location, we have to work with the Laplacian of Gaussian.

6.1.3.1 Keypoint location

6.1.3.1.1 The Laplacian of Gaussian (LoG) operator

The Laplacian operator can be written as follows:

\[ L(x, y) = \Delta I(x, y) = \nabla^2 I(x, y) = \frac{\partial^2 I(x, y)}{\partial x^2} + \frac{\partial^2 I(x, y)}{\partial y^2} \]

The Laplacian detects pixels where the intensity changes rapidly: they correspond to extrema of the Laplacian. However, as it is a second-derivative measurement, it is very sensitive to noise, so before applying the Laplacian the image is usually blurred in order to remove the high-frequency noise. To do this, we use a Gaussian operator which blurs the image in 2D according to the parameter \( \sigma \):

\[ G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \]

We call the resulting operator (Gaussian followed by Laplacian) the Laplacian of Gaussian (LoG). It can be computed as:

\[ \Delta G(x, y, \sigma) = \frac{\partial^2 G(x, y, \sigma)}{\partial x^2} + \frac{\partial^2 G(x, y, \sigma)}{\partial y^2} = \frac{x^2 + y^2 - 2\sigma^2}{2\pi\sigma^6} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \]

Actually, in the SIFT method, we do not calculate the Laplacian of Gaussian: we focus on the difference of Gaussians (DoG), which has several advantages we will see in the next part.

6.1.3.1.2 The operator used in SIFT: the difference of Gaussians (DoG)

Using the heat diffusion equation, we have an estimation of the Laplacian of Gaussian:

\[ \frac{\partial G(x, y, \sigma)}{\partial \sigma} = \sigma \, \nabla^2 G(x, y, \sigma) \tag{1} \]

where \( \partial G / \partial \sigma \) can be approximated by a difference of Gaussians:

\[ \frac{\partial G(x, y, \sigma)}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma} \tag{2} \]

From equations (1) and (2), we deduce:

\[ G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1) \, \sigma^2 \, \nabla^2 G(x, y, \sigma) \tag{3} \]

Using the difference of Gaussians instead of the Laplacian brings three main advantages:
- it reduces the processing time, because the Laplacian requires two derivatives;
- it is less sensitive to noise than the Laplacian;
- it is already scale invariant (normalization by \( \sigma^2 \), shown in equation (3)).

We can notice that the factor \( k - 1 \) is constant in the SIFT method, so it does not affect the location of the extrema.

6.1.3.1.3 The resulting scale space

In order to generate these differences of Gaussians, we have to build a scale space. Instead of blurring the full-resolution image over and over, we also resize it several times in order to reduce the calculation time.
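As a minimal illustration (a Python/OpenCV sketch, not the actual SIFT implementation), one octave of this scale space can be obtained by blurring the image with increasing values of sigma and subtracting consecutive Gaussians to get the DoG images. The parameter values below are only illustrative.

```python
# Minimal sketch of one octave of the scale space: Gaussians with increasing
# sigma, and differences of consecutive Gaussians (DoG). Parameter values are
# illustrative (Lowe suggests sigma0 = 1.6 and intervalsNb = 3).
import cv2
import numpy as np

def build_octave(image, sigma0=1.6, intervals_nb=3):
    k = 2.0 ** (1.0 / intervals_nb)
    # intervalsNb + 3 Gaussians are needed to obtain intervalsNb + 2 DoG images.
    gaussians = [cv2.GaussianBlur(image, (0, 0), sigma0 * (k ** i))
                 for i in range(intervals_nb + 3)]
    dogs = [cv2.subtract(gaussians[i + 1], gaussians[i])
            for i in range(len(gaussians) - 1)]
    return gaussians, dogs

if __name__ == "__main__":
    img = cv2.imread("studied.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    gaussians, dogs = build_octave(img)
    print(len(gaussians), "Gaussians,", len(dogs), "DoG images")
    # The next octave would start from a half-sized version of the image.
```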
This downsampling is possible because the more we blur, the more high-frequency detail we lose, and the more the image can be approximated by a reduced-size version. Below, you can see the structure of the scale space for the Gaussians. Several constants have to be specified:
- the number of octaves (octavesNb), specifying the number of scales to work with
- the number of intervals (intervalsNb), specifying the number of local extrema images to generate for each octave

Of course, the more octaves and intervals there are, the more keypoints are found, but the longer the processing time. In order to generate intervalsNb local extrema images, we have to generate intervalsNb+2 differences of Gaussians (DoG) and intervalsNb+3 Gaussians. Indeed, two Gaussians are subtracted to generate one DoG, and three DoGs are needed to find local extrema.

(Diagram: structure of the scale space. For each octave o = 0, ..., octavesNb−1, the image size is divided by 2^o and the blur ranges from 2^o · 2^{0/intervalsNb} σ to 2^o · 2^{(intervalsNb+2)/intervalsNb} σ; each octave contains intervalsNb+3 Gaussians, intervalsNb+2 differences of Gaussians and intervalsNb local extrema (MIN/MAX) images.)

Comment: prior smoothing
In order to increase the number of stable keypoints, a prior smoothing of the raw image can be done. David G. Lowe found experimentally that a Gaussian of \( \sigma = 1.6 \) could increase the number of stable keypoints by a factor of 4. In order to keep the highest frequencies, the size of the raw image is doubled. Thus, two Gaussians are applied:
- one before doubling the size of the image, for anti-aliasing (\( \sigma = 0.5 \))
- one after, for pre-blurring (\( \sigma = 1.0 \))

6.1.3.1.4 Keep stable keypoints

Local extrema can be numerous at each scale, so we have to be selective in our choice. There are several ways to be more selective:
- keep keypoints whose DoG has enough contrast
- keep keypoints whose DoG is located on a corner

In order to find whether a keypoint is located on a corner, we use the Hessian matrix:

\[ H(x, y) = \begin{pmatrix} \frac{\partial^2 I(x, y)}{\partial x^2} & \frac{\partial^2 I(x, y)}{\partial x \partial y} \\ \frac{\partial^2 I(x, y)}{\partial y \partial x} & \frac{\partial^2 I(x, y)}{\partial y^2} \end{pmatrix} = \begin{pmatrix} d_{xx} & d_{xy} \\ d_{xy} & d_{yy} \end{pmatrix} \]

Using the classical definition of the derivative, we can estimate the different entries of the Hessian matrix:

\[ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{f(x+h) - f(x-h)}{2h} \tag{8} \]

Using (8) with h = 0.5, we deduce:

\[ d_{x,\,h=0.5}(x, y) = I(x+0.5, y) - I(x-0.5, y) \]

So,

\[ d_{xx} \approx d_{x,\,h=0.5}(x+0.5, y) - d_{x,\,h=0.5}(x-0.5, y) = I(x+1, y) + I(x-1, y) - 2\,I(x, y) \]

We have similarly: d_{yy} ≈ I(x,