PROJECTS
Digital Image/Video Processing (ΨΗΦΙΑΚΗ ΕΠΕΞΕΡΓΑΣΙΑ ΕΙΚΟΝΑΣ/VIDEO)
Σπύρος Φωτόπουλος

1. Aggregating local descriptors into a compact image representation - VLAD descriptor

Figure: Images and corresponding VLAD descriptors, for K=16 centroids. The components of the descriptor are represented like SIFT, with negative components in red.

Related material:
Paper: Aggregating local descriptors into a compact image representation. Jegou, H. (INRIA Rennes, Rennes, France); Douze, M.; Schmid, C.; Perez, P.
http://lear.inrialpes.fr/pubs/2010/JDSP10/jegou_compactimagerepresentation.pdf
https://hal.inria.fr/inria-00633013/PDF/jegou_aggregate.pdf

This is a simple image feature-extraction procedure based on k-means clustering: the well-known VLAD algorithm. You are asked to implement it and apply it to image "distance", i.e. image retrieval.
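The core aggregation step of VLAD can be sketched as follows; this is a minimal NumPy sketch that assumes the local descriptors (e.g. SIFT) and the k-means centroids, learned offline on a training set, are already available:

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors into a VLAD vector.

    descriptors: (n, d) array of local descriptors (e.g. SIFT).
    centroids:   (k, d) array of k-means centroids learned offline.
    Returns a flat (k*d,) L2-normalized VLAD descriptor.
    """
    # Assign each descriptor to its nearest centroid.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)
    k, d = centroids.shape
    v = np.zeros((k, d))
    # Accumulate residuals (descriptor minus its centroid) per cluster.
    for i, c in enumerate(assign):
        v[c] += descriptors[i] - centroids[c]
    v = v.ravel()
    # L2-normalize; the paper additionally compresses the result with PCA/quantization.
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Image "distance" for retrieval is then, for example, the Euclidean distance between two such VLAD vectors.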
Project description

Detecting faces and extracting key facial features remains an active research area with a wide range of applications. This project seeks to leverage these facial feature detection and morphing methods for an entertaining application: intelligently combining the faces of two individuals to form a composite baby image. The algorithm will involve a number of topics covered in class lectures, including color balancing, image segmentation, face detection, eigenimages, and edge detection.

The first portion of the project involves implementing an algorithm for detecting human
faces and localizing the key facial features. The implementation will build upon several face
detection and facial feature extraction algorithms including Turk and Pentland’s work using
eigenfaces to locate faces [1], Huang and Chen’s work using active contour models for facial feature
extraction [2], and Saber and Tekalp’s work using color, shape, and symmetry-based cost functions for
facial detection and feature extraction [3]. Additionally, the input faces’ ethnicities will be classified
using eigenimages or fisher images and combined to select the baby’s ethnicity from a predefined
database. The second portion of the project involves combining the identified facial features of the two
individuals to form a composite image. This algorithm incorporates facial morphing techniques
including Beier's field-morphing algorithm [4] to properly weight and combine the input faces. A
randomized weighting method will be used for selecting which features from the input images will
appear in the output image so that a different baby image is generated each time the program runs.
The final implementation of the baby face generator will include a user-friendly interface
implemented in MATLAB.
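As a toy illustration of the randomized weighting idea (sketched in Python/NumPy rather than the project's MATLAB, and blending whole image bands rather than localized features after field morphing, so every detail below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng()

def blend_faces(face_a, face_b, n_regions=4):
    """Toy randomized blend of two aligned, same-size face images.

    Each horizontal band gets a random weight for face A vs. face B, so
    repeated runs produce a different composite each time, mirroring the
    "different baby image per run" behavior described above.
    """
    assert face_a.shape == face_b.shape
    h = face_a.shape[0]
    out = np.empty_like(face_a, dtype=float)
    bounds = np.linspace(0, h, n_regions + 1).astype(int)
    for i in range(n_regions):
        w = rng.uniform(0.3, 0.7)            # random weight per band
        sl = slice(bounds[i], bounds[i + 1])
        out[sl] = w * face_a[sl] + (1 - w) * face_b[sl]
    return out
```

The real project would apply such weights to morphed facial features (eyes, nose, mouth) rather than raw pixel bands.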
REFERENCES
[1] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of cognitive
neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
[2] C.-L. Huang and C.-W. Chen, "Human facial feature extraction for face interpretation and recognition," Pattern Recognition, vol. 25, no. 12, pp. 1435-1444, 1992.
[3] E. Saber and A. M. Tekalp, "Frontal-view face detection and facial feature extraction using color, shape and symmetry based cost functions," Pattern Recognition Letters, vol. 19, no. 8, pp. 669-680, 1998.
[4] T. Beier and S. Neely, "Feature-based image metamorphosis," in Proc. SIGGRAPH '92, 1992, pp. 35-42.
Beer Label Classification for Mobile Applications

The goal of this project is to develop an image-processing algorithm that can recognize and classify beer bottle labels in near real time. Mobile applications that implement such an algorithm could be used to provide information not available on the bottle (e.g. consumer ratings and reviews) in on-the-go situations like grocery shopping.
Goals and Implementation:
Since this project revolves around machine learning, we will first need to collect images for
the training and testing datasets. For the training set, we will collect “clean” beer labels using
images available online (Figure 1, left). For the testing set, we will use photographs of
bottles with corresponding labels, similar to those that would be acquired with a mobile
phone (Figure 1, right). We plan to use a training set of at least 100 images/beer labels,
including some that come from the same brewery to make the classification more difficult.
Figure 1: Example training image (left) and matching test image (right).
For the test images, we will perform the following pre-processing steps before feature
extraction:
1. Conversion from RGB to grayscale (for some features)
2. 8:1 or 4:1 downsampling with low-pass filtering to improve runtime speed
3. Segmentation to extract the label from the rest of the photograph
4. De-warping to make the label flat using an approximate cylindrical projection [1]
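Steps 1 and 2 can be sketched as follows, assuming a float RGB image in [0, 1] and using block averaging as a crude low-pass filter (a Gaussian prefilter would be the more careful choice):

```python
import numpy as np

def preprocess(rgb, factor=4):
    """Grayscale conversion, then low-pass filtering + downsampling.

    rgb: (h, w, 3) float array; `factor` is the downsampling ratio (4:1 or 8:1).
    Block averaging suppresses aliasing before decimation.
    """
    # Luma weights (ITU-R BT.601) for RGB -> grayscale.
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    h, w = gray.shape
    h, w = h - h % factor, w - w % factor       # crop to a multiple of `factor`
    blocks = gray[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))             # average each factor x factor block
```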
Three sets of features will be investigated:
1. Scale-invariant features, identified by finding local extrema in difference-of-Gaussian image pyramids using the SIFT algorithm [2].
2. RGB color histograms of the extracted label. Although this method is sensitive to subtle
changes in noise and lighting, it can be useful for labels that are largely one color (for
example, the label shown in Figure 1). Feature comparisons can be computed using the histogram intersection kernel for support vector machines [3].
3. Prevalence of text. Since labels can include many different fonts and sizes of letters, any attempt to identify specific characters via morphological image processing would probably not succeed. Therefore, instead of character recognition, we will perform text detection –
that is, detecting regions where there is text versus regions where there is no text [4]. How well a
test label matches with each image in the training set can then be quantified by the fraction of the
label that is taken up by text.
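The comparison measure for the second feature set, the histogram intersection kernel of [3], can be sketched as follows; the bin count and the companion `rgb_histogram` helper are illustrative assumptions:

```python
import numpy as np

def intersection_kernel(h1, h2):
    """Histogram intersection kernel K(h1, h2) = sum_i min(h1_i, h2_i).

    With L1-normalized histograms the value lies in [0, 1], where 1 means
    identical histograms; it is a valid kernel for an SVM, as used in [3].
    """
    return np.minimum(h1, h2).sum()

def rgb_histogram(img, bins=8):
    """L1-normalized concatenated per-channel histogram of an (h, w, 3) image in [0, 1]."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 1))[0] for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()
```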
We will perform the final label classification taking each of these features into account and
weighting them according to their individual success.
References:
[1] N. Stamatopoulos, et al., "A two-step dewarping of camera document images," in Proc. 8th IAPR International Workshop on Document Analysis Systems (DAS'08), 2008, pp. 209-216.
[2] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. 7th IEEE International Conference on Computer Vision, 1999, pp. 1150-1157.
[3] A. Barla, et al., "Histogram intersection kernel for image classification," in Proc. International Conference on Image Processing (ICIP 2003), 2003, vol. 2, pp. III-513-16.
[4] C. Liu, et al., "Text detection in images based on unsupervised classification of edge-based features," in Proc. 8th International Conference on Document Analysis and Recognition, 2005, pp. 610-614.
Supporting attached paper: Beer Label Classification for Mobile Applications
The human face, as a window to the soul, conveys a significant amount of nonverbal information that facilitates real-world human-to-human communication. Recall the first time you meet a person: you have the ability, probably developed early in life, to accurately determine facial attributes such as identity, age, and gender. We want a machine that can do the same job, as in some science-fiction films.
Specifications

For this project in particular, we aim to estimate human age only; this could later be extended to identity or gender. We plan to develop a MATLAB application that can perform the following tasks on an input human face image:
1. Accurately recognize the human face as well as important facial aging features and patterns.
2. Given aging features/patterns, estimate the age range.
Prior works

Age estimation by machine has been a challenging problem for a long time. Different people age at different rates, determined not only by their genes but also by many factors such as health condition, lifestyle, working environment, and social life.
Papers [1]-[4] introduce a few techniques to tackle this problem:
The system in [1] was evaluated on a large database, the Yamaha Gender and Age (YGA) database, which can be downloaded from the web. Biologically-inspired features (BIF) were investigated for age estimation and showed good performance; estimation performance improves further, and significantly, when manifold learning is applied on top of the BIF features. This could be a promising technique for us in terms of efficiency and manageability.
Paper [2] uses a similar approach to that in [1]. It improves the BIF models and shows better age-estimation results. We will implement the improvements if time permits.
Paper [3] proposes a novel scheme for aging-feature extraction and automatic age estimation. A new method (locally adjusted robust regressor) is introduced for robust learning and prediction of aging patterns. Experiments are performed on the FG-NET database; the complexity of this method is higher.
Paper [4] introduces an estimation method called AGES. It models the aging pattern, defined as the sequence of a particular individual's face images sorted in time order, by constructing a representative subspace.
References
[1] Guodong Guo, Guowang Mu, Yun Fu, C. Dyer, and T. Huang, "A study on automatic age estimation using a large database," in Proc. 2009 IEEE 12th International Conference on Computer Vision.
Distance information for potential obstacles in the path of a vehicle is critical for intelligent automotive systems in order to avoid collisions. We propose to develop a real-time vehicle detection and distance measurement algorithm using temporally correlated sequential images from monocular vision.

Many distance determination algorithms have been proposed and developed in the last decade [1-3]. Active detection systems are widely implemented commercially in vehicles today because of their immunity to changing ambient light conditions; however, their cost is usually higher than that of passive systems because they involve transmitters and receivers. Among passive detection systems, vision-based methods are the most common and effective techniques, and can be roughly classified into two categories: monocular and stereo vision. Stereo vision utilizes images from two or more cameras to reconstruct 3D space, which usually provides better accuracy than monocular vision; however, the cost is also higher, and it is prone to error if the parameters of the cameras are unknown or change due to vibration on the road. Therefore, we choose to focus on monocular vision.
A complete distance measurement system includes two steps: vehicle detection and distance
calculation. Motion-based and appearance-based methods are the two main approaches for vehicle detection [3].
We will first apply the combination of a Histogram of Oriented Gradients (HOG) descriptor and Support Vector Machine (SVM) classification for vehicle detection, because it has shown promise in many previous works [4]. There are also databases online for us to train the SVM classifier [5]. The optical flow method, one of the popular motion-based methods, is another candidate.

Once the vehicles are detected, their location in the 3D world needs to be computed. Because monocular vision is used, reference features with known dimensions are required. We will start from simple cases. For example, the parallel lines on the road can be used to calculate the vanishing points as a reference; the widths of vehicles are roughly on the same scale, which can also be used as references; license plates are rectangular and have a fixed size, serving as a great reference, though detecting license plates from a distance may be very challenging. While these techniques should be enough to measure distance in the simplest cases, e.g. detecting a vehicle straight ahead, we are also interested in investigating some edge cases, including occlusions, driving at night or on rainy days, vibration, and being lit by the headlights of oncoming vehicles. We will combine different techniques to tackle these cases.

[1] Sun, Zehang, et al., "On-road vehicle detection: A review," IEEE Transactions on Pattern Analysis and Machine Intelligence, 28.5 (2006): 694-711.
[2] Cualain, Diarmaid O., et al., "Distance detection systems for the automotive environment: a review," in Irish Signals and Systems Conf., 2007.
[3] Sivaraman, Sayanan, and Mohan Manubhai Trivedi, "Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis," IEEE Transactions on Intelligent Transportation Systems, 14, no. 4 (2013): 1773-1795.
[4] H. Tehrani Niknejad, et al., "On-road multivehicle tracking using deformable object model and particle filter with improved likelihood estimation," IEEE Trans. Intell. Transp. Syst., vol. 13, no. 2, pp. 748-758, Jun. 2012.
[5] http://pascallin.ecs.soton.ac.uk/challenges/VOC/
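The width-as-reference idea above reduces to the pinhole camera relation; the example numbers (a focal length in pixels, a ~1.8 m car width) are illustrative assumptions, not calibration values from any of the references:

```python
def distance_from_width(focal_px, real_width_m, pixel_width):
    """Monocular distance estimate from a reference of known width.

    Pinhole model: pixel_width = focal_px * real_width_m / distance, hence
    distance = focal_px * real_width_m / pixel_width. The vehicle width or a
    license plate's fixed size can serve as real_width_m; focal_px is the
    camera focal length expressed in pixels.
    """
    return focal_px * real_width_m / pixel_width
```

For example, a 1.8 m wide car spanning 90 pixels under a 1000-pixel focal length is estimated at 20 m.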
Supporting (attached) paper: On-Road Vehicle and Lane Detection
In real life, it is often very handy to draw an electronic circuit with various components on paper.
However, paper is not a reliable medium for storing information. On the other hand, sometimes we
want to try things out and test if the sketched circuit is functional, which is impossible to realize on
paper. To solve this problem, we propose to scan the circuit sketch on paper with an Android device, translate it into a standard layout, and run circuit simulations.

Challenges in sketched symbol recognition lie in the different sketch styles with regard to stroke
order, direction, etc. We plan to adopt the following approach:
1. Solve the correspondence problem between the reference image and the sketch.
2. Use the correspondence to solve the alignment problem.
3. Compute the distance between the corresponding points of the two shapes.
4. Find the reference image in our database that has the lowest distance score.
We will also continue looking for other promising methods.

Plan
1. Build image databases of standard circuits.
2. Test on printed circuits. Start with recognizing single circuit components. Then test on multiple
components connected together as a circuit.
3. Test on sketched circuits.
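Steps 1-3 of the approach can be sketched in toy form; the real system would use a richer matcher such as shape contexts [2], so the nearest-neighbor correspondence and centroid-only alignment below are simplifying assumptions:

```python
import numpy as np

def sketch_distance(pts_a, pts_b):
    """Toy distance between two 2-D point sets (sketch vs. reference symbol).

    Align both sets by their centroids (crude version of step 2), match each
    point in A to its nearest neighbor in B (step 1), and return the mean
    matched distance (step 3). The reference symbol with the lowest score
    would be chosen in step 4.
    """
    a = pts_a - pts_a.mean(axis=0)
    b = pts_b - pts_b.mean(axis=0)
    # Pairwise distances, then nearest-neighbor correspondence from a to b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

Two identical shapes at different positions score (near) zero, while shapes of different geometry score higher.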
References
[1] Calhoun C, Stahovich TF, Kurtoglu T, Kara LB. Recognizing multistroke symbols. In AAAI Spring Symposium on Sketch Understanding, 2002, p. 15-23.
[2] Belongie S, Malik J, Puzicha J. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence 2002;24(4):509-22.
[3] Hse H, Newton AR. Sketched symbol recognition using Zernike moments. Technical report, EECS, University of California, 2003.
Supporting material: Circuit Sketch Recognition (attached)
Recognition of a small set of sign language gestures using hand gestures
The aim of this project is to build a model for gesture recognition. The main goal is to count the number of fingers denoted by a gesture using image-processing techniques only. The second challenge is to devise an algorithm for this counting; an edge-detection algorithm designed specifically for this problem is used, and is discussed in detail in the 'Approach' section. The full description (but not the code) is provided. Related papers:
[1] Zhong Yang, Yi Li, Weidong Chen, and Yang Zheng, "Dynamic Hand Gesture Recognition Using Hidden Markov Models," 7th International Conference on Computer Science and Education, 2012, pp. 360-365, IEEE Conference Publication.
[2] Rautaray, S.S., and Agrawal, A., "Design of Gesture Recognition System for Dynamic User Interface," 2012 IEEE International Conference on Technology Enhanced Education (ICTEE), pp. 1-6, IEEE Conference Publication.
[3] Panwar, M., "Hand Gesture Recognition based on Shape Parameters," 2012 International Conference on Computing, Communication and Applications (ICCCA), pp. 1-6, IEEE Conference Publication.
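As a stand-in for the (unspecified) custom edge-detection step, a minimal Sobel gradient-magnitude detector might look like this; the kernel choice and threshold are illustrative assumptions:

```python
import numpy as np

def sobel_edges(img, thresh=0.5):
    """Minimal Sobel edge detector on a 2-D grayscale image.

    Computes horizontal/vertical gradients with 3x3 Sobel kernels (valid-mode
    correlation), then thresholds the gradient magnitude to a binary edge map.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    mag = np.hypot(gx, gy)        # gradient magnitude
    return mag > thresh
```

On a hand silhouette, the resulting edge map is the input from which finger boundaries would then be counted.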
Video Detection and Enhancement of Small Unmanned Aircraft
This project is aimed at being able to successfully detect, track, and highlight the motion of a small unmanned aerial vehicle (UAV). We would like to explore both a ground-based observer location and an in-flight observer location. The work for this project focuses on the detection of small aircraft (UAVs) and is an extension of [1].

A proposed starting point for a ground-based observer method is a technique called sky segmentation, which splits sky and ground elements in the frame and looks for non-sky elements in the sky portion [2]. This would assist us in the detection of an aircraft against a near-static background, as there is a good probability that the UAV will be in the sky portion of the video. For an in-flight observer viewing from above the aircraft, the entire field of view will be non-sky, requiring a different approach. Here we can start to leverage techniques that compare the velocity of different elements of the video (by comparing sequential frames) and look for elements that are traveling at different rates than the background.

UAVs pose an additional problem in that some types of vehicles have the ability to hover and therefore move along with the background. To address this issue, we plan on using a navigation-like approach, similar to the approach in [3], where a Kalman filter was used. Using a statistical model of the dynamics of a UAV and measured dynamics information from the video, we plan on decreasing false positives and improving tracking performance. After detection, we would like to use image-processing techniques from class to process aircraft and background pixels separately, to highlight the aircraft and display its path in the video for an enhanced user experience.

References
Previous work
[1] https://stacks.stanford.edu/file/druid:my512gb2187/Hammond_Padial_Obstacle_Classification_and_Segmentation.pdf
Sky segmentation
[2] http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1570842
Distance and velocity determination
[3] http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6205034
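The navigation-like tracking idea can be sketched with a constant-velocity Kalman filter over the aircraft's measured position; the state model and the noise levels below are illustrative assumptions, not values from [3]:

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter over noisy 1-D position measurements.

    State is [position, velocity]; q and r are assumed process and measurement
    noise levels. Returns the filtered position estimate at each step.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])     # state transition (constant velocity)
    H = np.array([[1.0, 0.0]])                # we observe position only
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.array([[measurements[0]], [0.0]])  # initial state: first measurement, zero velocity
    P = np.eye(2)
    estimates = []
    for z in measurements:
        # Predict.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the new measurement.
        y = np.array([[z]]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0, 0])
    return np.array(estimates)
```

In the full system the measurement would be the detected aircraft position in the frame (one such filter per image axis), and the predicted state would help reject false detections far from the track.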