Mobile Search for a New Era: Voice, Location and Sight

Editor's note: today Google held a launch event at the Computer History Museum in Mountain View, CA. Fresh off the stage, we've invited Vic to highlight the mobile team's announcements, and the unique set of technologies that make them possible. (All [video] links point to event footage that will be viewable later today.)

A New Era of Computing

Mobile devices straddle the intersection of three significant industry trends: computing (or Moore's Law), connectivity, and the cloud. Simply put:
  • Phones get more powerful and less expensive all the time
  • They're connected to the Internet more often, from more places; and
  • They tap into computational power that's available in datacenters around the world
These "Cs" aren't new: we've discussed them in isolation for over 40 years. But today's smartphones -- for the first time -- combine all three into a personal, handheld experience. We've only begun to appreciate the impact of these converged devices, but we're pretty sure about one thing: we've moved past the PC-only era, into a world where search is forever changed.

Just think: with a sensor-rich phone that's connected to the cloud, users can now search by voice (using the microphone), by location (using GPS and the compass), and by sight (using the camera). And we're excited to share Google's early contributions to this new era of computing.

Search by Voice

We first launched search by voice about a year ago, enabling millions of users to speak to Google. And we're constantly reminded that the combination of a powerful device, an Internet connection, and datacenters in the cloud is what makes it work. After all:
  • We first stream sound files to Google's datacenters in real time
  • We then convert utterances into phonemes, into words, into phrases
  • We then compare phrases against Google's billions of daily queries to assign probability scores to all possible transcriptions; and
  • We do all of this in the time it takes to speak a few words
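
To make the scoring step in the list above concrete, here is a minimal sketch in Python. It is an illustration only, not Google's implementation: the function name, the sample phrases, the query counts and the add-one smoothing are all assumptions made for the example. It combines a hypothetical acoustic score for each candidate transcription with how often that phrase shows up in a made-up query log.

```python
import math

def rank_transcriptions(candidates, query_counts, total_queries):
    """Rank candidate transcriptions, most likely first.

    candidates:    phrase -> acoustic probability in (0, 1] (hypothetical values)
    query_counts:  phrase -> occurrences in an illustrative query log
    total_queries: total number of queries in that log
    """
    vocab = len(query_counts) + 1  # add-one smoothing so unseen phrases keep a nonzero prior
    scored = []
    for phrase, acoustic_p in candidates.items():
        prior = (query_counts.get(phrase, 0) + 1) / (total_queries + vocab)
        # Work in log space: log P(audio | phrase) + log P(phrase)
        scored.append((math.log(acoustic_p) + math.log(prior), phrase))
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored]

# Made-up data: the acoustically strongest guess is not a common query.
candidates = {"pizza near me": 0.40, "pete's a near me": 0.45, "pizza near knee": 0.15}
query_counts = {"pizza near me": 9000, "weather": 20000}
best = rank_transcriptions(candidates, query_counts, total_queries=100000)
print(best[0])
```

In this toy data the phrase that people actually search for wins, even though another candidate scored slightly higher on sound alone.
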
Over the past 12 months we've introduced the product on many more devices, in more languages, with vastly improved accuracy rates. And today we're announcing that search by voice understands Japanese, joining English and Mandarin.

Looking ahead, we dream of combining voice recognition with our language translation infrastructure to provide in-conversation translation [video] -- a UN interpreter for everyone! And we're just getting started.

Search by Location

Your phone's location is usually your location: it's in your pocket, in your purse, or on your nightstand, and as a result it's more personal than any PC before it. This intimacy is what makes location-based services possible, and for its part, Google continues to invest in things like My Location, real-time traffic, and turn-by-turn navigation. Today we're tackling a question that's simple to ask, but surprisingly difficult to answer: "What's around here, anyway?"

Suppose you're early to pick up your child from school, or your drive to dinner was quicker than expected, or you've just checked into a new hotel. Chances are you've got time to kill, but you don't want to spend it entering addresses, sifting through POI categories, or even typing a search. Instead you just want stuff nearby, whatever that might be. Your location is your query, and we hear you loud and clear.

Today we're announcing "What's Nearby" for Google Maps on Android 1.6+ devices, available as an update from Android Market. To use the feature, just long-press anywhere on the map, and we'll return a list of the 10 closest places, including restaurants, shops and other points of interest. It's a simple answer to a simple question, finally. (And if you visit google.com from your iPhone or Android device in a few weeks, clicking "Near me now" will deliver the same experience [video].)
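
The idea of "your location is your query" can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how Maps works internally: the function names, the place list and its approximate coordinates are invented for the example, and a real service would use a spatial index and ranking signals rather than sorting every place by great-circle (haversine) distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_places(lat, lon, places, limit=10):
    """Return up to `limit` places, closest first. Each place is (name, lat, lon)."""
    return sorted(places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))[:limit]

# Illustrative places with approximate coordinates.
places = [
    ("Computer History Museum", 37.4143, -122.0774),
    ("Shoreline Amphitheatre", 37.4268, -122.0807),
    ("Ferry Building", 37.7955, -122.3937),
]
nearby = nearest_places(37.4140, -122.0770, places, limit=2)
print([name for name, _, _ in nearby])
```

The default `limit=10` mirrors the 10 closest places mentioned above; the only input is a point on the map.
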

Of course our future plans include more than just nearby places. In the new year we'll begin showing local product inventory in search results [video]; and Google Suggest will even include location-specific search terms [video]. All thanks to powerful, Internet-enabled mobile devices.

Search by Sight

When you connect your phone's camera to datacenters in the cloud, it becomes an eye to see and search with. It sees the world like you do, but it simultaneously taps the world's info in ways that you can't. And this makes it a perfect answering machine for your visual questions.

Perhaps you're vacationing in a foreign country, and you want to learn more about the monument in your field of view. Maybe you're visiting a modern art museum, and you want to know who painted the work in front of you. Or maybe you want wine tasting notes for the Cabernet sitting on the dinner table. In every example, the query you care about isn't a text string, or a location -- it's whatever you're looking at. And today we're announcing a Labs product for Android 1.6+ devices that lets users search by sight: Google Goggles.

In a nutshell, Goggles lets users search for objects using images rather than words. Simply take a picture with your phone's camera, and if we recognize the item, Goggles returns relevant search results. Right now Goggles identifies landmarks, works of art, and products (among other things), and in all cases its ability to "see further" is rooted in powerful computing, pervasive connectivity, and the cloud:
  • We first send the user's image to Google's datacenters
  • We then create signatures of objects in the image using computer vision algorithms
  • We then compare signatures against all other known items in our image recognition databases
  • We then figure out how many matches exist
  • We then return one or more search results, based on available metadata and ranking signals; and
  • We do all of this in just a few seconds
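
For readers who like to see the shape of such a pipeline, here is a toy sketch in Python of the compare-and-count steps in the list above. Everything in it is an assumption made for illustration: a "signature" is modeled as a set of integer feature IDs, similarity is plain Jaccard overlap, and the database, threshold and function names are invented. Real computer vision signatures and ranking are far richer than this.

```python
def jaccard(a, b):
    """Overlap between two feature sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_image(query_signature, database, threshold=0.5):
    """Return names of known items whose signature overlaps the query, best match first."""
    matches = []
    for item, signature in database.items():
        score = jaccard(query_signature, signature)
        if score >= threshold:  # keep only sufficiently similar items
            matches.append((score, item))
    matches.sort(reverse=True)
    return [item for _, item in matches]

# Hypothetical database: item name -> set of feature IDs standing in for a signature.
database = {
    "Golden Gate Bridge": frozenset({1, 2, 3, 4, 5, 6}),
    "Eiffel Tower": frozenset({7, 8, 9, 10}),
    "Bay Bridge": frozenset({1, 2, 3, 11, 12, 13}),
}
query = frozenset({1, 2, 3, 4, 5, 20})  # features extracted from the user's photo
results = match_image(query, database)
print(len(results), results)
```

The length of the returned list is the "how many matches exist" step; an empty list corresponds to an image that isn't recognized.
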
Now, with all this talk of algorithms, image corpora and metadata, you may be wondering, "Why is Goggles in Labs?" The answer -- as you might guess -- lies in both the nascence of the technology and the scope of our ambitions.

Computer vision, like all of Google's extra-sensory efforts, is still in its infancy. Today Goggles recognizes certain images in certain categories, but our goal is to return high-quality results for any image. Today you frame and snap a photo to get results, but one day visual search will be as natural as pointing a finger -- like a mouse for the real world. Either way we've got plenty of work to do, so please download Goggles from Android Market and help us get started.

The Beginning of the Beginning

All of today's mobile announcements -- from Japanese Voice Search to a new version of Maps to Google Goggles -- are just early examples of what's possible when you pair sensor-rich devices with resources in the cloud. After all, we've only recently entered this new era, and we'll have more questions than answers for the foreseeable future. But something has changed. Computing has changed. And the possibilities inspire us.
