Best tool for object recognition [closed]

I want to do a small project on object recognition. Any tools or literature suggestions on this topic?

OpenCV
It's free, usable from C/C++ and Python, and has a large community, plenty of examples, and college courses based on it.
An alternative, if you have a copy (or some spare money), is MATLAB.

Literature:
You will probably need to work with image processing techniques in your project. A very good introductory book to this area is Digital Image Processing by Gonzalez and Woods. It covers topics such as image segmentation, a technique used to separate the objects to be recognized from the rest of the image.
After you have identified the objects in the input image, the next step is to find a way to measure how similar they are to one another. Probably the best way to do that is with image descriptors. For object recognition, the best class of descriptors is usually the shape-based ones. The article "Review of shape representation and description techniques" by Zhang and Lu provides a great review of shape descriptors.
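To make the descriptor idea concrete, here is a toy shape descriptor: the second-order central moments of a binary mask. It is far cruder than the techniques surveyed by Zhang and Lu, and the example masks are made up, but it shows the general shape-to-feature-vector step.

```python
# Toy shape descriptor: central moments of a binary mask (list of rows).
def moment_descriptor(mask):
    pts = [(x, y) for y, row in enumerate(mask)
                  for x, v in enumerate(row) if v]
    n = len(pts)
    cx = sum(x for x, _ in pts) / n          # centroid x
    cy = sum(y for _, y in pts) / n          # centroid y
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n        # horizontal spread
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n        # vertical spread
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n  # diagonal skew
    return [mu20, mu02, mu11]

# a 3-pixel horizontal bar is "wide"; a 2x2 square is symmetric
bar = [[1, 1, 1]]
square = [[1, 1], [1, 1]]
```

Because the moments are taken around the centroid, the descriptor is invariant to where the object sits in the image, which is the kind of property you want before comparing objects.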
Finally, you have to classify those objects. Machine Learning by Mitchell is a classic book that discusses techniques such as k-NN that you can use in your project.
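As a sketch of that classification step, here is a minimal k-NN classifier over descriptor vectors. The feature values and labels are made up purely for illustration; a real project would feed in the shape descriptors computed earlier and would likely use a library implementation.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; returns majority label
    among the k training samples nearest to the query."""
    neighbors = sorted((math.dist(vec, query), label) for vec, label in train)
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# made-up 2-D descriptors for two object classes
train = [([0.1, 0.9], "circle"), ([0.2, 0.8], "circle"),
         ([0.9, 0.1], "square"), ([0.8, 0.2], "square")]
```

With k=3, a query near the "circle" cluster gets two circle votes and one square vote, so the majority vote is "circle".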
Tools:
OpenCV or Matlab. I particularly use OpenCV and I really like it for the following reasons:
Very good documentation and a great number of good books and tutorials about it.
Implementations of a number of segmentation algorithms, such as Otsu's method and the watershed transform.
Provides basic GUI and media I/O.
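For a feel of what Otsu's method does under the hood (OpenCV exposes it via cv2.threshold with the THRESH_OTSU flag), here is the core computation on a toy histogram. This is a simplified sketch of the algorithm, not OpenCV's implementation.

```python
# Otsu's method: pick the threshold that maximizes the between-class
# variance of the grayscale histogram.
def otsu_threshold(hist):
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(len(hist) - 1):
        w0 += hist[t]          # weight of class 0: levels 0..t
        sum0 += t * hist[t]
        w1 = total - w0        # weight of class 1: levels t+1..
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t  # levels <= best_t go to one class

# bimodal toy histogram over 6 gray levels: peaks at levels 0-1 and 4-5
hist = [10, 8, 0, 0, 9, 11]
```

On this histogram the method picks the threshold between the two peaks, which is exactly the behavior you rely on when separating objects from background.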

A nice playground is Processing ( http://processing.org/ ) together with the various computer vision libraries for it, especially OpenCV ( http://ubaa.net/shared/processing/opencv/ ). You don't even need the libraries for simple frame grabbing from a built-in or external USB camera, because that works out of the box.
With a USB camera connected you can start doing interesting stuff straight away, because programming with Processing is very, very easy. I was detecting and tracking faces in no time, and I have no background in the subject.

Also research Adobe Flash for object recognition. Seriously.

Related

How does a marker-based augmented reality algorithm (like ARToolKit's) work?

For my job I've been using a Java version of ARToolKit (NyARToolkit). So far it has proven good enough for our needs, but my boss is starting to want the framework ported to other platforms such as the web (Flash, etc.) and mobile. While I suppose I could use other ports, I'm increasingly annoyed by not knowing how the kit works, and beyond that, by some of its limitations. Later I'll also need to extend the kit's abilities to add things like interaction (virtual buttons on cards, etc.), which as far as I've seen isn't supported in NyARToolkit.
So basically, I need to replace ARToolKit with a custom marker detector (and, in the case of NyARToolkit, try to get rid of JMF and use a better solution via JNI). However, I don't know how these detectors work. I know about 3D graphics and I've built a nice framework around it, but I need to know how to build the underlying tech :-).
Does anyone know any sources on how to implement a marker-based augmented reality application from scratch? When searching Google I only find "applications" of AR, not the underlying algorithms :-/.
'From scratch' is a relative term. Truly doing it from scratch, without using any pre-existing vision code, would be very painful and you wouldn't do a better job of it than the entire computer vision community.
However, if you want to do AR with existing vision code, this is more reasonable. The essential sub-tasks are:
Find the markers in your image or video.
Make sure they are the ones you want.
Figure out how they are oriented relative to the camera.
The first task is keypoint localization. Techniques for this include SIFT keypoint detection, the Harris corner detector, and others. Some of these have open-source implementations; I think OpenCV has the Harris corner detector in the function GoodFeaturesToTrack.
The second task is making region descriptors. Techniques for this include SIFT descriptors, HOG descriptors, and many others. There should be an open-source implementation of one of these somewhere.
The third task is also done by keypoint localizers. Ideally you want an affine transformation, since this will tell you how the marker is sitting in 3-space. The Harris affine detector should work for this. For more details go here: http://en.wikipedia.org/wiki/Harris_affine_region_detector
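To make the keypoint-localization step concrete, here is a toy Harris response computed straight from the definition: image gradients, a 3x3 structure-tensor window, and the response R = det(M) - k * trace(M)^2. The synthetic image is made up; a real system would use OpenCV's implementation instead of this sketch.

```python
def harris_response(img, k=0.04):
    h, w = len(img), len(img[0])
    Ix = [[0.0] * w for _ in range(h)]
    Iy = [[0.0] * w for _ in range(h)]
    # image gradients by central differences
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            Ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            Iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    R = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # structure tensor entries summed over a 3x3 window
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = Ix[y + dy][x + dx], Iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            R[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return R

# synthetic image: a bright 6x6 square on a dark 12x12 background
img = [[0.0] * 12 for _ in range(12)]
for y in range(3, 9):
    for x in range(3, 9):
        img[y][x] = 1.0
R = harris_response(img)
# R is positive at the square's corners, negative along its edges,
# and zero in flat regions.
```

Corners score positive because both eigenvalues of the structure tensor are large there, edges score negative, and flat regions score near zero; thresholding R (plus non-maximum suppression) yields the keypoints.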

SVM library suitable for online embedding

We're working on a machine learning project in which we'd like to see the influence of certain online sample embedding methods on SVMs.
In the process we've tried interfacing with Pegasos and dlib, as well as designing (and attempting to write) our own SVM implementation.
dlib seems promising, as it allows interfacing with user-written kernels.
Yet kernels don't give us the desired "online" behavior (unless that assumption is wrong).
Therefore, if you know of an SVM library which supports online embedding and custom-written embedders, it would be of great help.
Just to be clear about "online":
it is crucial that the embedding process happens online, to avoid heavy memory usage.
We basically want to do the following within stochastic subgradient descent (in very general pseudocode):
w = 0 vector
for t = 1 to T:
    i = random integer from [1, n]
    x_i = embed(sample_i)
    // x_i is passed to the subgradient of loss_i as a parameter
    w = w - (alpha / t) * sub_gradient(loss_i, x_i)
I think in your case you might want to consider Budgeted Stochastic Gradient Descent for Large-Scale SVM Training (BSGD) [1] by Wang, Crammer, and Vucetic.
As the paper's discussion of the "curse of kernelization" shows, you might want to explore this option instead of what you have indicated in the pseudocode in your question.
The Shark machine learning library implements BSGD; check the quick tutorial here.
Maybe you want to use something like dlib's empirical kernel map. You can read its documentation, and particularly the example program, for the gory details of what it does, but basically it lets you project a sample into the span of some basis set in a kernel feature space. There are even algorithms in dlib that iteratively build the basis set, which may be what you are asking about.
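As a rough illustration of the empirical-kernel-map idea: representing a sample by its kernel evaluations against a basis set. This is not dlib's actual implementation (dlib additionally whitens the vector using the basis Gram matrix); the kernel choice and basis here are made up to show the core intuition.

```python
import math

def rbf(x, y, gamma=1.0):
    # Gaussian RBF kernel between two equal-length vectors
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def empirical_map(basis, x, kernel=rbf):
    # project x into the span of the basis by evaluating the kernel
    # against each basis sample (dlib also whitens this vector)
    return [kernel(b, x) for b in basis]

basis = [[0.0, 0.0], [1.0, 1.0]]
phi = empirical_map(basis, [0.0, 0.0])
```

A linear SVM trained on phi(x) then behaves like a kernel SVM restricted to the basis span, and because phi(x) is computed one sample at a time, the embedding step stays online as the question requires.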

Image Processing Basics

I am planning to do a project on image processing; my knowledge of this subject in general is low. My preferred language is C++.
Can the members out here give me:
A brief idea of what image processing is?
What books should I consult? [Please keep in mind I am a beginner and am ONLY interested in making a college project.]
What libraries can I use? [I know about Boost/OpenCV etc. I would like to know what is simplest and can get my project done quickly; it's a minor project.]
Apart from the above 3 points, anything else I should know would be of good help. Thanks in advance.
I would suggest reading a good book. Image processing is not a programming field; it's an engineering field, and it involves mathematics, signal processing knowledge, and intuition. Gonzalez and Woods' Digital Image Processing is quite good and doesn't require a vast knowledge of signal processing before you start reading it. The bottom line is you don't learn image processing like you learn a new programming language; you learn it like a completely new subject that just happens to involve coding. To break this up into answers to your questions:
Image processing is a discipline of digital signal processing which is itself at an intersection of computer science and applied mathematics. It involves pixel-based image operations for purposes of image enhancement (color and contrast correction, denoising, deblurring), visual effects (spatial distortion, morphing, color-substitution), artificial vision (feature extraction, texture segmentation, pattern identification, spatial perception). There are also many narrowly applied areas of image processing such as RADAR image processing, medical image processing, etc.
The book I mentioned above is really a great read. If it's a bit pricey for you, I always find it useful going to Amazon and searching for an inexpensive older edition used book on the subject with a five star rating. Never failed me yet. Beware of getting books that are too old though.
There are plenty of libraries for the task, Boost/CImg are some of them, and it really depends on the platform you're coding for. However, I would think that an image processing project would not involve any libraries, instead you would be writing image processing filters and other operators yourself -- that's the essence of it. You would very likely use algorithm libraries though for faster computation. A project in image processing is not a software project; rather, it's an engineering project and using a library would kill the purpose completely. That is in my humble opinion, of course.
Answer to 3.: CImg might be a good choice to start quickly.
Image processing means modifying the image data in such a way as to get the desired effect (for example, changing a colour image into a black and white image).
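That colour-to-black-and-white example, written out as pixel-level operations: convert RGB to luminance (the standard BT.601 luma weights) and then threshold to a binary image. The sample image is made up; a real project would load pixels through a library.

```python
def to_binary(rgb_image, threshold=128):
    """rgb_image: rows of (r, g, b) tuples in 0..255; returns rows of 0/1."""
    binary = []
    for row in rgb_image:
        out_row = []
        for r, g, b in row:
            # ITU-R BT.601 luma weights for perceived brightness
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            out_row.append(1 if luma >= threshold else 0)
        binary.append(out_row)
    return binary

img = [[(255, 0, 0), (10, 10, 10)],
       [(200, 200, 200), (0, 0, 255)]]
```

Note that pure red and pure blue both fall below a mid-gray threshold here, because the luma weights reflect how bright those colours look to the eye, not their raw channel values.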
Very broad question, and the answer depends on what you want to do.
Take a look at GraphicsMagick or ImageMagick.
Image processing is a lot about math, in particular matrix manipulations and, in more advanced processing, the Fourier transform.
Image processing is, at its most basic, image manipulation, whatever the manipulations are (colour manipulation, feature extraction, enhancement, ...). Image processing is different from computer graphics (2D and 3D).
I would suggest visiting your local college library; they should have existing references for image processing, algorithms, and all that jazz. You have to decide (with your college professor/adviser) what part of image processing you want to explore.
Have a look at the ImageMagick libraries (among others); they offer a good package to start learning about image processing, and source code is available.
Max.
Although old, I think Digital Image Processing by K. Pratt is a good choice to start with (to get a gist of common techniques), but IMHO you shouldn't learn with C++; a high-level language with a good image processing toolbox (like MATLAB) is far better for trying out algorithms (which sometimes need heavy use of complex numerical methods).

What C++ feedforward neural network library do you suggest?

I want to do a classification problem (word sense disambiguation) in C++, and I need a feedforward neural network (MLP). I know there are many libraries, but I want one that is not very large (I need just an MLP) and is easy to learn and get working.
I've read about OpenCV and FANN, but I have no idea which library is best.
FANN looks fantastic, and has very helpful introduction/getting started sections on its website (here and a PDF here for example). The MLP is a fairly straightforward concept for neural networks so your work should be well within the capabilities of the library.
With regard to size, you only need to include the bits of the library you intend to use.
Some more info on the basics of Neural networks here if you need it. Although if you're tackling word sense disambiguation you probably know more about them already.
I'm not sure why you'd want to use OpenCV, since that's really intended for computer vision and image processing. Are you trying to grab additional input for your MLP from video recordings of a speaker?
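For reference, the forward pass of a one-hidden-layer MLP, the core of what a library like FANN evaluates, is small enough to sketch directly. Python is used here for brevity, and the weights are made up purely to show the structure; FANN's own C/C++ API is what you would actually call.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, hidden_layer, output_layer):
    """Each layer is a list of (weights, bias) per neuron."""
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in hidden_layer]          # hidden activations
    return [sigmoid(sum(w * hi for w, hi in zip(ws, h)) + b)
            for ws, b in output_layer]       # output activations

# made-up weights: 2 inputs -> 2 hidden neurons -> 1 output
hidden = [([2.0, -1.0], 0.0), ([-1.0, 2.0], 0.0)]
output = [([1.5, 1.5], -1.5)]
y = mlp_forward([1.0, 0.0], hidden, output)
```

Training is then just adjusting those (weights, bias) pairs by backpropagation, which is the part a library saves you from writing.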

Implementing a real-time frequency spectrum for a beginner

I want to develop an application that takes audio (.wav) as input and displays its frequency spectrum in real time. From what I have read on the subject, this requires a Fourier transform of the waves. Can someone suggest where I should start, with possible references and books? I want to learn the details of implementing a real-time frequency spectrum rather than the development of the GUI, which I am quite familiar with (in C# and in C++).
There are already many libraries to do FFTs for you. No reason to reinvent the wheel. DirectX has an implementation but it might only be in the most recent version. Here's an open source C library for it.
If you want to understand the math behind it, here's a simple explanation and here's a complicated explanation.
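If seeing the math as code helps, here is a naive DFT computing the magnitude spectrum of one frame of samples. Python is used for brevity; this is the O(N^2) textbook definition, and FFT libraries compute the same values in O(N log N), so use one of those in the real application.

```python
import cmath
import math

def magnitude_spectrum(samples):
    n = len(samples)
    # X[k] = sum_t samples[t] * e^(-2*pi*i*k*t/n); report magnitudes
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]  # real input: bins 0..N/2 suffice

# a pure tone sitting exactly on bin 2 of an 8-sample frame
frame = [math.sin(2 * math.pi * 2 * t / 8) for t in range(8)]
spectrum = magnitude_spectrum(frame)
```

For a real signal of N samples, bin k corresponds to frequency k * sample_rate / N; the pure tone above shows up as a single peak in bin 2 with magnitude N/2.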
You should begin by opening the wav file, extracting the audio stream, and decoding it. There are third-party libraries to help with this.
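As a sketch of that first step, Python's standard library can already read (and, here, synthesize) a wav file; in C++/C# you would use a wav or media library for the same open/decode steps. The file name and tone parameters are made up, and the file is written first so the example is self-contained.

```python
import math
import struct
import wave

RATE = 8000
# synthesize one second of a 440 Hz tone as 16-bit little-endian mono
frames = b"".join(
    struct.pack("<h", int(16000 * math.sin(2 * math.pi * 440 * t / RATE)))
    for t in range(RATE)
)
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)      # 16-bit samples
    w.setframerate(RATE)
    w.writeframes(frames)

# reading it back: decode the raw bytes into integer samples
with wave.open("tone.wav", "rb") as w:
    raw = w.readframes(w.getnframes())
samples = [s for (s,) in struct.iter_unpack("<h", raw)]
```

Once you have the integer samples, you window them into frames and hand each frame to the Fourier transform to get the spectrum to display.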
Take a look at FFTW.
As far as books go, the classic textbook on signal processing is Oppenheim and Schafer's Digital Signal Processing. It's college level, but it is quite thorough. You do need some knowledge of calculus in places.
One should understand a bit of the theory before going off and implementing an application to display something. Here are some free online resources on digital signal processing, which is the basis for understanding FFTs and frequency spectrums, and maybe how not to misuse them.
http://www.dspguide.com/pdfbook.htm
http://www.bores.com/courses/intro/index.htm
http://ccrma.stanford.edu/courses/320/Welcome.html
http://yehar.com/blog/?p=121/
