Fictional Goggles turn realistic
This is the kind of stuff one would imagine for the year 2044 A.D.: You are holidaying in Paris and the only French you know is “Bonjour”, which you pronounce with such a rustic accent that even that hardly counts. You are lost and need directions, and with language being a roadblock, you take out your phone and point it at the buildings around you. The phone starts giving you information on the area you are in and places you on an interactive map. You ask for a nice Italian pizzeria and it gives you a list of options; on selecting one of them, you get a choice of routes to it. On the way a Greek restaurant catches your attention, you point your mobile at it, and you get all of its reviews. You decide to try it out and ask for the menu, point your phone at the menu, get image results for the dishes and decide on your order. Just then you see a friend standing at the far end and decide to play a prank; you point your mobile at her and voila! Your phone gives you the option to call her, SMS her, post a message on her Facebook page, blah blah… So what is the story? Just that none of this is 2044 stuff; it has become possible as we speak! May I have the pleasure of introducing you to Google Goggles, in case you two haven’t met already!
Background
Image recognition has been attracting a lot of R&D effort since the late nineties, and biggies like Google, Microsoft and Nokia are pouring millions into the research. Google has been focusing on non-text objects since 2000; one would remember the patent filed by Larry Page in 2004 titled “Method for searching media”. In 2006 Google bought Neven Vision, which was into “next generation” face and object recognition technologies, and with it picked up a handful of patents that Neven Vision owned. Google also bought Transformics in 2006, which enabled it to index the pages its crawlers were not able to reach, basically the unstructured information. Google then integrated this technology with Picasa and launched face detection, though in a primitive form, in 2008. With the launch of the Android phone, Google got a base on which it could bring out its future technologies and capture feedback in a legal and cost-effective way. And a look into Google Labs introduces this next-generation image recognition application: Google Goggles.
Google Goggles was developed for use on Google’s Android operating system for mobile devices. While it is currently available only as a beta version for Android phones, Google has announced that it plans to make the software run on other platforms, notably the Apple iPhone and BlackBerry devices. See the video below to get the gist of Google Goggles.
Competition
IBM (Direct competition)
IBM came up with SAPIR (Search in Audio-Visual Content Using Peer-to-peer Information Retrieval) in 2009, which analyzes photos, sound files and even video queries. It has built its database by extracting data from Flickr’s ginormous archive and indexing features such as color structure, color layout, shape edges and texture. It also allows one to combine text with media to narrow down the search.
Demo: Click here for YouTube demo video
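To get a feel for what indexing an image by color means, here is a minimal sketch in Python. It is not SAPIR's actual method, which I have not seen; it assumes the simplest possible color feature, a coarse color histogram, and compares two images by how much their histograms overlap. The function names and the toy "images" (plain lists of RGB tuples) are made up for illustration.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Quantize RGB pixels into coarse bins and return a normalized histogram.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    This is an assumed, simplified feature, not SAPIR's real descriptor.
    """
    if not pixels:
        return {}
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity between 0 and 1: the share of color mass both images have in common."""
    return sum(min(share, h2.get(bucket, 0.0)) for bucket, share in h1.items())


def rank_by_similarity(query_pixels, catalog):
    """Return (name, score) pairs for the catalog, most similar image first."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (name, histogram_intersection(query_hist, color_histogram(pixels)))
        for name, pixels in catalog.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy "images": a mostly red one and a mostly blue one.
catalog = {
    "sunset": [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10,
    "ocean": [(10, 10, 250)] * 95 + [(250, 10, 10)] * 5,
}
query = [(240, 20, 20)] * 80 + [(20, 20, 240)] * 20

for name, score in rank_by_similarity(query, catalog):
    print(name, round(score, 2))
```

The mostly red query lands closest to the mostly red "sunset" image. A real system would combine several such features (layout, edges, texture) and use a proper index instead of scanning every image, but the idea of turning pictures into comparable numbers is the same.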
Nokia (Can be a direct competition)
Nokia announced its Point and Find app for its handsets in April 2009, which can recognize barcodes and cinema posters. The software uses the phone’s camera, internet access and GPS to call up pre-programmed tags, which can then bring up local movie times, the ability to book tickets and, eventually, price comparisons.
Demo: http://pointandfind.nokia.com
Microsoft (Indirect competition)
Microsoft has not come up with anything as sensational in image detection as the examples listed above, as it has been focusing more on gesture recognition and object recognition. I have already blogged about Microsoft Surface; read the post at http://www.jasginder.com/bizblog/2009/12/microsoft_surface. You will also hear about Project Natal, which is being used to develop the next version of the Xbox to challenge the Wii. As I mentioned earlier, Microsoft’s focus is on the enterprise and commercial sector rather than the consumer side.
Like.com (from Riya.com, the start-up which introduced the concept of face recognition in personal photos)
Riya was the first website to introduce the feature of tagging friends in photos using face recognition. As Google and Microsoft made inroads into this space, Riya CEO Munjal Shah decided to become a niche player by venturing into Like.com. Like.com is an image search engine that takes both images and text as inputs, something only IBM has been able to replicate as of now. Say a user likes the watch that Megan Fox wore at some party; the user can then use it as an image query and Like.com will return results showing watches that look very similar. Right now it supports only shoes, jewelry, handbags and clothing, but it plans to expand over time to include other categories.














