Somewhere in the world, this scene is taking place. The lights are low in the living room, and a couple wants to switch from a video game to a movie.
The end of the remote control
“No, it’s the other remote.”
“No, the one over there, under the table.”
“Which button do I push?”
Hands-free: Instead of pressing buttons, this boy is playing with Kinect
Sound familiar? These unlucky people are struggling to communicate with a machine by poking their fingers at a poorly designed interface. If only they could speak to the television in plain language, as if it were a person, and simply tell it to switch from Forza to Avatar.
That future is at hand. Designers on three continents are busy creating more intuitive methods to interact with the growing ranks of ever-smarter machines. The systems they are building make sense of our words, our gestures and movements, even our shrugs and facial expressions. This field is known as Natural User Interface, or NUI, and it’s exploding.
Fuelling the demand is the proliferation of information machines around us. Until recently, most people dealt with just a few computers, maybe one at home, another at work. These big boxes had keyboards and mice, and users mastered the necessary clicks and commands. But new types of computers are on the rise, from phones to security cameras. And in the coming years, we will be surrounded by ever-smarter machines, some the size of rooms, others practically invisible. They’ll be navigating the car, booking a plumber, tracking the dog, letting us into the office, adjusting the air conditioning, fine-tuning medications and handling myriad other tasks. Most of the time, we will be able to communicate with them without sitting down and logging in, and often without even taking the time to pick up a gadget.
Someday, perhaps, we will be able to send signals by pretending we are holding the device. At the Hasso Plattner Institute near Berlin, researchers have discovered that people often remember with startling precision the layout of icons on their smart phones. In a new application, users can call up an app – perhaps to get directions or hear a voice mail – by simply moving their fingers in certain patterns in the air. These movements are captured by a tiny camera in their lapel and sent to a handset nearby. “People should be able to control a lot in their device while it’s in their pocket,” says Patrick Baudisch, chair of the institute’s Human-Computer Interaction Lab.
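To make the idea concrete, here is a minimal sketch, in Python, of the lookup such a system might perform once the camera has worked out where the finger tapped on the imaginary layout. The grid, the app names and the normalised coordinates are invented for illustration; they are not the institute’s actual software.

```python
# Hypothetical sketch of the "remembered icon layout" idea described above.
# The grid, app names and coordinate convention are assumptions made for
# illustration only.

ICON_GRID = [
    ["phone",   "mail",      "maps",     "music"],
    ["camera",  "voicemail", "calendar", "clock"],
    ["weather", "notes",     "photos",   "settings"],
]

def app_at(x, y):
    """Map a fingertip position in empty space (0..1 on each axis) to the
    app the user remembers sitting at that spot on the real phone."""
    rows, cols = len(ICON_GRID), len(ICON_GRID[0])
    row = min(int(y * rows), rows - 1)
    col = min(int(x * cols), cols - 1)
    return ICON_GRID[row][col]

# The lapel camera reports where the finger "tapped"; the handset in the
# pocket then launches whatever app lives at that remembered position.
print(app_at(0.30, 0.55))   # -> voicemail
```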
That doesn’t mean that touch screens, a popular breakthrough in the past decade, are going to disappear. But they too are sure to evolve, especially as screens shrink. How can the user see what he or she is doing? One design at the Plattner Institute puts the touch on the back of a transparent screen, so that the finger does its work in the background.
Other advances look to more ingrained body movements. At Microsoft Research, scientists are developing not only verbal commands for televisions and game consoles, but also systems that recognise everyday gestures. Instead of hitting an OK button on a remote, for example, a user might simply hold up a thumb. None of this would be possible without smarter, more perceptive machines. Traditional computers were deaf and blind. The only messages they could receive came from keyboards and the movements of a mouse on a two-dimensional grid. “For decades, humans have been the eyes and ears of computers,” says Anoop Gupta, Distinguished Scientist at Microsoft Research. “Now the sensors are getting powerful enough, and software sophisticated, so that computers can have eyes and ears of their own.”
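A toy version of that thumbs-up-instead-of-OK idea might look like the sketch below. The joint names, coordinates and threshold are assumptions for illustration; a real system such as Kinect works from far richer skeleton and depth data.

```python
# Illustrative only: classify a crude "thumbs-up" from a couple of assumed
# hand-joint positions, then map the gesture to a television command.

GESTURE_TO_COMMAND = {
    "thumbs_up":  "confirm",
    "palm_out":   "pause",
    "swipe_left": "previous_channel",
}

def classify(joints):
    """Crude rule: a thumb tip held well above the wrist while the rest of
    the hand stays closed reads as a thumbs-up."""
    thumb_y = joints["thumb_tip"][1]
    wrist_y = joints["wrist"][1]
    hand_open = joints.get("fingers_extended", 0) > 1
    if thumb_y > wrist_y + 0.15 and not hand_open:
        return "thumbs_up"
    return None

frame = {"thumb_tip": (0.42, 0.80), "wrist": (0.40, 0.55), "fingers_extended": 0}
print(GESTURE_TO_COMMAND.get(classify(frame), "no action"))   # -> confirm
```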
Starting with video games
The sensors are becoming very perceptive, and this opens endless opportunities for new natural interfaces. Some of the most dramatic come from the world of video games. Five years ago, Nintendo introduced the Wii. Its controller contains motion sensors that allow the user to interact with hand gestures, whether hitting backhands or shooting skeet. This was a harbinger of NUI. People could easily forget they were dealing with a computer.
Microsoft created a revolution with the November 2010 release of Kinect for the Xbox 360: the user does not need to hold a controller any more. Equipped with projected infrared light and other sensors, Kinect can detect the motion of players’ bodies, recognise their faces and process voice commands. It sold a record 8 million units in its first 60 days, driving home an important message: NUI sells. Perhaps more significantly, last June Microsoft released a development kit for Kinect programmers, which opened the door for a host of new applications for this breakthrough sensory technology.
One of them is on display at TigerPlace, a home for elderly people in Columbia, Missouri, where Kinect systems monitor the day-to-day movements of volunteer test subjects. The system captures each person as a faceless three-dimensional silhouette. This helps to protect their sense of privacy, while providing enough detail to analyse changing patterns of walking or bending. These could be signs that the person is losing balance and faces a growing risk of falling.
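The pattern analysis itself can start out surprisingly simple. The sketch below is a rough illustration rather than the project’s published method: it tracks how fast a resident’s silhouette crosses the room and flags a sustained slow-down. The figures and the threshold are assumed values.

```python
# Illustrative gait check on anonymous silhouette data: has this resident's
# walking speed dropped noticeably against their own baseline? All figures
# and thresholds below are invented for the example.

def walking_speed(centroids, fps=30):
    """Average speed (metres per second) of the silhouette's centroid as it
    crosses the room; centroids are (x, z) positions on the floor plane."""
    if len(centroids) < 2:
        return 0.0
    dist = sum(
        ((x1 - x0) ** 2 + (z1 - z0) ** 2) ** 0.5
        for (x0, z0), (x1, z1) in zip(centroids, centroids[1:])
    )
    return dist / ((len(centroids) - 1) / fps)

def flag_decline(weekly_speeds, drop=0.15):
    """Flag a resident whose recent average speed has fallen more than
    `drop` m/s below their own first-month baseline."""
    baseline = sum(weekly_speeds[:4]) / 4
    recent = sum(weekly_speeds[-2:]) / 2
    return (baseline - recent) > drop

weekly = [1.05, 1.02, 1.04, 1.00, 0.98, 0.95, 0.86, 0.82]
print(flag_decline(weekly))   # -> True: worth alerting the care team
```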
This may sound more like surveillance than communication, but Eric Dishman, an Intel executive who oversees similar in-home monitoring for the elderly in Dublin, Ireland, and Portland, Oregon, says that this type of in-home health care technology will increasingly interact with the people it watches over. One of Intel’s tools, the Magic Carpet, features a network of sensors underneath the tiles of a kitchen floor. If it determines from the subjects’ walking patterns that they are at risk of a fall, he says, “the same platform can literally lead them through exercises. And while they’re doing those exercises, you’re collecting even more data.”
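Judged purely as an engineering problem, the fall-risk check can be as plain as watching the rhythm of footfalls on the tiles. The sketch below is an invented illustration of that idea, not Intel’s implementation, and its threshold is an assumed value.

```python
# Illustration: irregular timing between footfalls on a sensor floor can
# accompany unsteady gait. Threshold and sample data are invented.
import statistics

def unsteady(footfall_times, max_cv=0.12):
    """Flag a walk whose step-to-step timing varies too much, using the
    coefficient of variation of the intervals between footfalls."""
    intervals = [t1 - t0 for t0, t1 in zip(footfall_times, footfall_times[1:])]
    if len(intervals) < 3:
        return False
    return statistics.stdev(intervals) / statistics.mean(intervals) > max_cv

steady  = [0.0, 0.55, 1.12, 1.66, 2.21, 2.77]   # seconds at which tiles fired
erratic = [0.0, 0.48, 1.30, 1.71, 2.65, 3.02]
print(unsteady(steady), unsteady(erratic))       # -> False True
```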
What if the person doesn’t feel like doing the prescribed exercises? The simplest solution would be an interface that understood a spoken sentence: “I don’t want to today.” Capturing natural language is a crucial component of NUI. In laboratories around the world, linguists and computer scientists are working to expand the active vocabularies of machines, and to teach them to adapt to different accents and dialects. In essence, these computers will be going through an apprenticeship not only in Humanity 101, but also in understanding each individual user.
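Even a deliberately tiny example shows the shape of the problem: the machine has to turn a free-form reply into something it can act on. The phrases and intent labels below are invented for illustration; production systems rely on far richer language models.

```python
# Minimal intent matcher, for illustration only: map a spoken reply to an
# action the exercise coach understands. Cue phrases are invented.

INTENTS = {
    "decline": ["don't want", "not today", "later", "too tired"],
    "accept":  ["ok", "sure", "let's go", "fine"],
    "help":    ["how do i", "show me", "what do you mean"],
}

def recognise_intent(utterance):
    """Return the first intent whose cue phrase appears in the utterance,
    or None if nothing matches."""
    text = utterance.lower()
    for intent, cues in INTENTS.items():
        if any(cue in text for cue in cues):
            return intent
    return None

print(recognise_intent("I don't want to today."))   # -> decline
```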
David Ferrucci headed the IBM team that built a question-answering computer called Watson. Early in 2011, the machine demonstrated advanced natural-language skills as it defeated two human champions in an American televised quiz show called Jeopardy! In Ferrucci’s vision, computers like Watson will soon accompany us, perhaps through our cell phones, listening to complex spoken questions and providing correct responses. “I look at it like the computer in Star Trek,” he says. “You forget it’s a computer and simply ask it questions. It works through a massive database, which only a computer can do, and comes back with answers.”
To many, advanced speech sounds like the ideal interface. But Bill Buxton, Principal Researcher at Microsoft, argues that there is no such thing as ideal. The success of each approach, whether touch screens, hand gestures or speech, depends on its context, he says. Speech, for example, works far better than a touch screen for delivering an important business message while driving a car. “But let’s say I’m landing at the San Jose Airport,” he says, “and I deliver that same message by voice. I might get fired if it turns out the guy sitting next to me is from a competitor.” In that case, the best interface might be a touch screen – or even a venerable qwerty keyboard.
Floor sensors may detect that a nursing-home patient is limping across the kitchen floor and at risk of a fall. But how does the system know who it is? Multitoe technology, developed at the Hasso Plattner Institute in Germany, creates a profile of each user’s foot.
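One way to picture that matching step: treat each stored foot profile as a handful of numbers and find the closest one to the print that just landed on the tiles. The features and figures below are invented for illustration, not Multitoe’s actual representation.

```python
# Hypothetical foot-profile matching: the nearest stored profile wins.
# Feature choice (length cm, width cm, heel-to-toe pressure ratio) and all
# values are assumptions made for this example.

USER_PROFILES = {
    "alice": (24.1, 9.2, 0.62),
    "bob":   (27.8, 10.5, 0.48),
    "carol": (22.3, 8.7, 0.71),
}

def identify(footprint):
    """Return the user whose stored profile lies closest (Euclidean
    distance) to the observed footprint."""
    def distance(profile):
        return sum((a - b) ** 2 for a, b in zip(profile, footprint)) ** 0.5
    return min(USER_PROFILES, key=lambda user: distance(USER_PROFILES[user]))

print(identify((27.5, 10.3, 0.50)))   # -> bob
```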
The ultimate natural interface would be a system that directs our thoughts straight to a machine. It sounds like science fiction, but the early versions of such technology are taking form. Already, researchers have connected prosthetic limbs to the edges of people’s nervous systems, permitting them to move the limbs with signals from the brain. In European laboratories, from the University Medical Center in Utrecht, Netherlands, to the Institute for Knowledge Discovery in Graz, Austria, researchers are developing techniques to process more complex signals from the brain, allowing stroke victims and people suffering from locked-in syndrome to operate computers. Another project, financed by the National Science Foundation in the United States, is developing a generation of brain microprocessors. The goal is to have them ready by 2020.
If successful, this brain-machine interface could spread to the population at large, eventually enabling us to control certain computer functions – play video games, pull up the contact list on our smart phones – with our thoughts. Conceivably, messages could be sent brain to brain, creating a digital version of telepathy. This would present thorny new challenges for designers – how to protect people, for example, from messaging their stray thoughts. In the meantime, though, we’ll be building vast portfolios of words and body signals to tell the machines in our lives what we’re up to. We’ve been yelling and gesturing at them for decades. Now, with NUI, the computers will finally be paying attention.
This article was originally published in Issue 9 of Futures Magazine.