Teaching Computers to Recognize Objects

Researchers on the European “Cognitive-Level Annotation using Latent Statistical Structure” (CLASS) project are developing technologies that could help computers visually recognize both specific objects, such as one’s own glasses, and general classes of objects, such as a random car on the street. While these tasks are trivial for human beings, machines need detailed and complex instructions to perform them.
 Prof. Luc Van Gool (Credit: ETH Zurich)

Recognizing an object involves collecting visual and other sensory data, cross-referencing the essential elements with memories, and making mental deductions. Humans perform all of these steps almost instantaneously, but they are much harder for computers to imitate. Recently, scientists on the CLASS project have introduced several novel solutions to this conundrum.

The first step was to define the different sensory elements required for object perception. “Vision is our most important sense and about half of the human brain is involved with vision in one way or another,” explains Prof. Luc Van Gool of Belgium’s Leuven University (KUL), who also leads the Computer Vision Laboratory at the Swiss Federal Institute of Technology (ETH). “It enables us to recognize the objects.”

Visual accessories can give computers the ability to ‘see’ in numerous ways, but seeing alone is not enough. The computer also needs to know what the object generally looks like, the various shapes and sizes it comes in, the accessories that could be attached to it, and so on. “The same object will look different depending on the viewpoint, the illumination, or the occlusions caused by other objects in front,” notes Van Gool.

In brief, although computers have incredible computational power, they find it almost impossible to recognize that Chihuahuas and Dobermans belong to the same species without complex and painstaking programming. Van Gool elaborates: “The recognition of an object as belonging to a particular group is a harder problem for a computer than the recognition of a specific object. The reason is that object classes show large variability among their members.”

Over the last three and a half years, the EU-funded project has achieved notable technological improvements over previous efforts. It developed a system in which the description of an object is based on the appearance of many separate, small patches. Such localized features provide the robustness needed to deal with the large variations mentioned earlier. In addition, CLASS created special mechanisms, known as efficient approximate neighborhood searches, for comparing an image or an object with huge numbers of reference images.
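To make the idea of matching small patches against many reference images more concrete, here is a minimal sketch. It is not the CLASS system: the article does not describe the project's algorithms in detail, so the sketch substitutes random-hyperplane hashing, one common way of doing approximate neighbor search. All function names, the four-number patch descriptors, and the labels are hypothetical and chosen only for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw random hyperplanes (assumed technique, not confirmed as CLASS's)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(patch, planes):
    """Map a patch descriptor to a short bit code: one bit per hyperplane side."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, patch)) >= 0 else 0
        for plane in planes
    )


def build_index(reference, planes):
    """Group every reference patch into a bucket keyed by its bit code."""
    index = {}
    for label, patches in reference.items():
        for patch in patches:
            index.setdefault(hash_key(patch, planes), []).append((label, patch))
    return index


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vote(query_patches, index, planes):
    """Let each query patch vote for the label of its closest bucket mate.

    Only one bucket is inspected per patch, so the search is approximate:
    a patch whose bucket is empty (for example, an occluded region) is skipped.
    """
    votes = {}
    for patch in query_patches:
        bucket = index.get(hash_key(patch, planes), [])
        if not bucket:
            continue
        label, _ = min(bucket, key=lambda item: sq_dist(patch, item[1]))
        votes[label] = votes.get(label, 0) + 1
    return votes


# Hypothetical toy data: each object is described by a few small patches.
reference = {
    "monument": [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]],
    "album": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.9, 0.1]],
}
planes = make_hyperplanes(dim=4, n_bits=3)
index = build_index(reference, planes)
print(vote(reference["monument"], index, planes))  # {'monument': 2}
```

Because every patch votes independently, an object can still collect enough votes when some of its patches are hidden or badly lit, which is the kind of robustness the localized features are meant to provide. Inspecting a single bucket per patch, instead of every reference patch, is what keeps the comparison cheap as the number of reference images grows.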

Beyond satisfying the researchers’ curiosity and posing a theoretical challenge, the work has already found a commercial application. Through a company known as “Kooaba”, CLASS technology enables mobile phone subscribers to install software that instantly ‘tags’ photos. For instance, after a user photographs a monument or an album cover, the relevant online information is retrieved. “It’s like the object itself becomes the link to further information,” says Van Gool. Other expected applications of this technology, also using Kooaba’s service, are guided tours of cities and museums.

TFOT has previously covered the Italian Magic Mirror, a mirror that interacts with shoppers and lets them view personalized content about items they are interested in, and Object, a new gizmo that offers a unique photographic experience by enabling users to link digital data to physical objects. Another related TFOT story covers the Wizkid, a robot that sets its ‘eyes’ on you and turns to watch and interact with you wherever you go.

For more information about the CLASS project, see its website.