Posted on Wed 13 July 2016

A brief introduction to object recognition techniques in CV

Object recognition is one of the hot topics in computer vision. It is of particular significance because of its many applications in robotics, identification, interpretation and other image-oriented tasks.

Object recognition techniques can be split into 4 major subsets:

  1. Template based approach: we match the object against the distribution pattern of pixel intensities in the image. It is an exhaustive search and is not invariant to geometric transformations (such as rotation or scaling) inside the search image, so it is not very robust. Eg: template matching.

  2. Geometric approach: recognize the object on the basis of edges, corners and angles. We then establish one-to-one correspondences and compute a transformation that handles rotation, scaling, etc.
    Eg: matching w.r.t. the positioning of objects, classifying patterns.

  3. Graph based approach: we match similar types of structures, encoding the relationships between structures rather than the geometry of the figure.

  4. Bag of Visual Words: I will cover Bag of Words in a separate post.

I’ll be going through a couple of common techniques. First I’ll introduce template matching, then probabilistic recognition, and then machine learning approaches such as BOW models coupled with an SVM, or feature extraction and recognition using neural nets.

Template based approach

Well, there are a few contentions I’d like to make about this approach. Shape based matching is not the key, or the best, method for recognition. But in industry, every method has its perks. Say we have an industrial conveyor belt that is supposed to carry only boxes or T-joints, or anything so simple and uniform that shape matching is feasible, instead of complex approaches like neural nets or an SVM combined with bag of words. Then I’d say template matching, or shape based matching, is the best bet in terms of both accuracy and speed.

Let’s dive a little deeper into this topic. I’ll use template matching as an example to explain how shape based matching works. Consider two images: the template image (T) and the actual image (I). We want to find the location / occurrence of the template inside the actual image.

In simple words, return True iff \(T \subseteq I\). But life ain’t so simple. In reality, every pixel needs to be iterated over and checked. A template belongs to a particular image iff, at some position, all of its pixels match exactly, i.e. the sum of the absolute differences between corresponding pixels is zero.

$$SAD(x, y) = \sum_{i=0}^{T_{\text{rows}}-1}\sum_{j=0}^{T_{\text{cols}}-1} \left|\text{T}(i,j) - \text{I}(x+i, y+j)\right|$$

Here’s how the above formula works.

\(SAD\): sum of absolute differences. \(SAD(x, y)\): the SAD when the template is placed with its top-left corner at pixel location \((x, y)\) of the actual image \(I\).

Each pixel in \(I\) can be addressed as \((x, y)\). It has an intensity, or colour value, expressed as \(Intensity(I(x,y))\).

Similarly, a pixel \(T(p,q)\) of the template image has intensity \(Intensity(T(p,q))\). In the formulas, \(I(\cdot)\) and \(T(\cdot)\) are used as shorthand for these intensities.
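To make the SAD formula concrete, here is a minimal sketch that evaluates \(SAD(x, y)\) at a single position. It assumes grayscale images stored as plain Python lists of rows, with \(x\) indexing rows and \(y\) indexing columns as in the formula; the function name `sad` and the tiny images are hypothetical, chosen only for illustration.

```python
def sad(image, template, x, y):
    """Sum of absolute differences between `template` and the window of
    `image` whose top-left corner is at row x, column y.

    Images are lists of rows of grayscale intensities (an assumption
    made for this sketch)."""
    rows, cols = len(template), len(template[0])
    total = 0
    for i in range(rows):          # i runs over template rows: 0 .. T_rows - 1
        for j in range(cols):      # j runs over template columns: 0 .. T_cols - 1
            total += abs(template[i][j] - image[x + i][y + j])
    return total


# Hypothetical 3x4 image containing the 2x2 template at (1, 2).
I = [
    [10, 10, 10, 10],
    [10, 10, 50, 60],
    [10, 10, 70, 80],
]
T = [
    [50, 60],
    [70, 80],
]

print(sad(I, T, 1, 2))  # → 0   (exact match)
print(sad(I, T, 0, 0))  # → 220 (40 + 50 + 60 + 70)
```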

A search window the size of the template image is created. The window is slid over the actual image \(I\), and the SAD is computed at every position. To find a single occurrence of the template, the iterations can stop once the SAD is \(0\); otherwise they continue until the window reaches the end of \(I\). In real images noise rarely allows an exact \(0\), so the position with the lowest SAD is usually taken as the match.
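The sliding-window search can be sketched as follows, again assuming list-of-rows grayscale images. The names `sad`, `find_template` and the `stop_at_first_exact` flag are hypothetical. The window only visits positions where it fits entirely inside \(I\), and the lowest SAD seen so far is kept, so the search still returns a sensible answer when no exact match exists.

```python
def sad(image, template, x, y):
    """Sum of absolute differences for the window whose top-left corner is (x, y)."""
    return sum(
        abs(template[i][j] - image[x + i][y + j])
        for i in range(len(template))
        for j in range(len(template[0]))
    )


def find_template(image, template, stop_at_first_exact=True):
    """Slide a template-sized window over `image` and return (best_sad, (x, y)).

    Returns None if the template is larger than the image.
    """
    t_rows, t_cols = len(template), len(template[0])
    best = None
    # Only positions where the whole window fits inside the image.
    for x in range(len(image) - t_rows + 1):
        for y in range(len(image[0]) - t_cols + 1):
            score = sad(image, template, x, y)
            if best is None or score < best[0]:
                best = (score, (x, y))
            # An SAD of 0 means every pixel matched exactly.
            if score == 0 and stop_at_first_exact:
                return best
    return best


T = [[50, 60],
     [70, 80]]
exact = [[10, 10, 10, 10],
         [10, 10, 50, 60],
         [10, 10, 70, 80]]
noisy = [[10, 10, 10, 10],
         [10, 10, 50, 60],
         [10, 10, 70, 81]]  # one pixel off by 1

print(find_template(exact, T))  # → (0, (1, 2))
print(find_template(noisy, T))  # → (1, (1, 2))
```

With `stop_at_first_exact=False` the whole image is scanned and the lowest-SAD position is returned, which is the behaviour you would want when only approximate matches are expected.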

We’ll be implementing the algorithm using OpenCV and Python.

Note the cv2.TM_SQDIFF method passed to matchTemplate() in the code. OpenCV has no plain SAD method; TM_SQDIFF is its closest rendition and computes the sum of squared differences (SSD) instead:

$$R(x, y) = \sum_{i, j}\ {(\text{T}(i,j) - \text{I}(x+i, y+j))^2}$$
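With OpenCV itself, the score map comes from cv2.matchTemplate(image, template, cv2.TM_SQDIFF), and for this method the best match is the location of the minimum reported by cv2.minMaxLoc (which gives locations as (column, row)). The sketch below does not call OpenCV; it spells out the \(R(x, y)\) formula in plain Python on list-of-rows grayscale images so the arithmetic is visible. The names `sqdiff_map` and `best_match` are hypothetical.

```python
def sqdiff_map(image, template):
    """R(x, y) = sum_{i,j} (T(i,j) - I(x+i, y+j))**2 for every position
    where the template fits (the formula documented for TM_SQDIFF)."""
    t_rows, t_cols = len(template), len(template[0])
    out_rows = len(image) - t_rows + 1
    out_cols = len(image[0]) - t_cols + 1
    R = [[0] * out_cols for _ in range(out_rows)]
    for x in range(out_rows):
        for y in range(out_cols):
            R[x][y] = sum(
                (template[i][j] - image[x + i][y + j]) ** 2
                for i in range(t_rows)
                for j in range(t_cols)
            )
    return R


def best_match(R):
    """For a squared-difference score, lower is better: return (min_value, (x, y))."""
    return min((v, (x, y)) for x, row in enumerate(R) for y, v in enumerate(row))


T = [[50, 60],
     [70, 80]]
I = [[10, 10, 10, 10],
     [10, 10, 50, 60],
     [10, 10, 70, 81]]  # hypothetical image, one pixel off by 1

R = sqdiff_map(I, T)
print(best_match(R))  # → (1, (1, 2))
```

Because the differences are squared, one large mismatch weighs more heavily here than it would in SAD.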

And here is the output! We search for a pair of glares (sunglasses) in this paraphernalia.

[Figure: crop (the template)]

[Figure: original (the search image)]

Okay, so template matching can be done with the OpenCV API. It’s also quite simple to implement in plain old C++; not a biggie. We’d use helper functions for reading the image, accessing pixel values, etc., but the rest can be implemented with a bunch of for loops.

You can find the rest of my code on GitHub.

Posts on the remaining approaches are coming soon …

