Small computing devices, such as enhanced cameras, are becoming smarter and more capable: not only do they record visual input (video) but they are starting to interpret it.
Small cameras that not only see, but watch!
Video cameras are being built into phones, robots, vehicles, traffic lights, and homes. For example, in-home embedded vision systems will have the dual roles of security and interaction, monitoring for children in dangerous parts of a kitchen or serving as the substrate for robots and entertainment-centre controls. But that is just the start.
“Critically missing from these devices is an embedded computer vision system that can group the parts of a scene that are moving together,” explains Marc Pollefeys, Professor at ETH Zurich, Switzerland, heading the Computer Vision and Geometry group. Pollefeys, together with his fellow researcher Gabriel Brostow from ETH and Andrew Blake from Microsoft Research Cambridge, recently started a research project awarded and supported by the Innovation Cluster for Embedded Software (ICES), a joint venture of Microsoft Research, the ETH Zurich and the EPF Lausanne. Pollefeys continues, “The main goal of our project is to develop novel algorithms that automatically detect simultaneously moving objects in video images.”
In the long term, embedded systems that are good at interpreting video motion will be critical for building devices ranging from stationary stand-alone CCTVs to car-mounted cameras. Vision will also play an increasing role in making mobile phones aware of their environment, so they can be used for video-conferencing and for personal or inspection-style data collection.
There are numerous potential application scenarios: for example, embedded CCTV systems can count the number of pedestrians or vehicles passing through their field of view, distinguish their trajectories at intersections and help resolve congestion. The paths of people, cars and other entities moving through a scene must be tracked to establish what went where, and how each entity looked. Real outdoor traffic scenes have long periods of sparse activity, punctuated by waves of rush-hour traffic that follow roughly the same trajectories.
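The "what went where" bookkeeping described above can be sketched in a few lines. The following toy example, assuming per-entity trajectories (lists of (x, y) image positions) have already been recovered by a tracker, counts how many entities crossed a virtual line in each direction; the function name and the crossing rule are illustrative assumptions, not the project's actual algorithm.

```python
# Toy trajectory analysis: count entities crossing a virtual vertical
# line x = line_x, per direction.  Each trajectory is a list of (x, y)
# centroids, one per frame, as a hypothetical tracker might produce.

def count_crossings(trajectories, line_x):
    """Return (left_to_right, right_to_left) crossing counts."""
    ltr = rtl = 0
    for traj in trajectories:
        # Walk consecutive position pairs and test whether the segment
        # straddles the line, and from which side.
        for (x0, _), (x1, _) in zip(traj, traj[1:]):
            if x0 < line_x <= x1:
                ltr += 1
            elif x1 < line_x <= x0:
                rtl += 1
    return ltr, rtl

# One entity moving right across x = 50, one moving left across it.
trajectories = [[(0, 0), (10, 0), (60, 0)], [(100, 5), (40, 5)]]
print(count_crossings(trajectories, 50.0))  # prints "(1, 1)"
```

A real embedded system would feed such counts back periodically as distilled statistics, rather than transmitting the raw video.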
Another example is counting people in crowds in order to feed real numbers to architects and urban planners, who study crowd density and resolve bottlenecks in pedestrian traffic at different times of day. Such information is also important when planning large sporting events, where safety limits must be considered and where public-transit frequency is adapted to real-time needs. In such scenarios, mobile devices with built-in video cameras will be mounted near traffic signs and will periodically send back distilled information about how many people are going which way, without compromising the privacy of individuals.
And, of course, another application domain is science, where cameras and integrated video analysis could help monitor and track species for biodiversity research. Sensors have just started to impact data collection in science.
Today, video camera systems are used predominantly for collecting footage that must be watched and interpreted by a human in order to be useful. The limited computing power and memory of previously available small hand-held devices prevented the research group at ETH Zurich from considering them a viable platform for computer vision. However, small embedded computing devices, particularly those attached to or augmented by a camera, have gained computing power and will become still more capable in the future.
Advances on the software side must keep pace with hardware developments to unleash the power of these technologies. “In the past there has been much research invested in analysing still images, but what we need here are techniques that are relevant for video analysis,” explains Pollefeys. Motion arises in two situations: either the camera observes things and people in motion, or the camera itself is moving, so stationary objects appear to move. In both situations, Pollefeys and his team face the challenge that objects are often partially occluded in parts of a video sequence, which limits the use of some existing image-analysis techniques; motion itself must therefore be analysed to reveal object boundaries. Another challenge is that when the camera is moving, nearby objects appear to move more quickly than distant ones, a phenomenon called “motion parallax”.
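The motion-parallax effect follows directly from the standard pinhole-camera model: for a camera translating sideways with speed v, a static point at depth Z has apparent image velocity u = f·v/Z, where f is the focal length in pixels. The sketch below illustrates this with made-up numbers for f, the camera speed, and the scene depths; it is a numeric illustration of the geometric cue, not the project's segmentation method.

```python
# Motion parallax under a pinhole camera: apparent image speed of a
# static point is inversely proportional to its depth, so nearby
# points sweep across the image faster than distant ones.

def image_velocity(f_px, cam_speed, depth):
    """Apparent horizontal image velocity (pixels/s) of a static point
    at the given depth, for a sideways-translating camera: u = f*v/Z."""
    return f_px * cam_speed / depth

f_px = 800.0   # focal length in pixels (assumed)
v = 1.5        # camera speed in metres/second (assumed)
near = image_velocity(f_px, v, 2.0)    # point 2 m away
far = image_velocity(f_px, v, 20.0)    # point 20 m away
# The near point appears to move ten times faster than the far one --
# the depth-dependent cue that a motion-analysis system must account
# for when deciding which pixels are "moving together".
print(near, far)  # prints "600.0 60.0"
```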
“We enjoy working with Andrew Blake and the Computer Vision team of Microsoft Research Cambridge as they are one of the best, if not the best, teams in this domain worldwide,” says Marc Pollefeys. “This is the value of the collaboration within the ICES, and bringing together the mutual expertise in computer vision and breaking down the problem to implementable software that runs on little devices promises great results.”
The Microsoft Innovation Cluster for Embedded Software (ICES) in Switzerland has been set up in cooperation with the two leading national universities, the ETH Zurich and the EPF Lausanne. The ICES research programme is due to run for five years and Microsoft will invest up to 1 million Swiss francs per year. The collaboration and participation of researchers from various disciplines, involving both Swiss universities, Microsoft Research Cambridge and partners from industry, is key to creating and expanding a pool of know-how that is accessible to all partners of the Innovation Cluster. An Industry Council is being established to better connect academia with industry and to create a channel to industry for early-stage technologies that result from research.