No language left behind

Imagine the informational and cultural isolation that can result if you don’t speak one of the world’s major languages. Think about how limited your online experience would be.

This is a reality for many people worldwide who find themselves cut off linguistically from a great knowledge resource. With the rapid proliferation of digital content, some languages are falling behind in their usefulness to native speakers.

In Europe, according to a study by Meta-Net, “many European languages are unlikely to survive in the digital age. Assessing the level of support through language technology for 30 of the approximately 80 European languages, the experts concluded that digital support for 21 of the 30 languages investigated is ‘non-existent’ or ‘weak’ at best”.

So, how can native speakers of those languages with weak support gain access to content on the Web and avoid being cut off from digital knowledge?

Automated translation

One way to make more of the Web available in any language is to teach computers how to understand and process written and spoken human language. Bing Translator and Google Translate are good examples of such automated translation tools.

In addition, developers and webmasters can leverage the power of Microsoft Translator by installing the Microsoft Translator Widget, which lets any visitor to a site see the content translated in real time into any of the 40+ supported languages. Users can then use the collaborative features of the widget to suggest alternate translations and improve translation quality.

The Microsoft Translator Hub takes machine translation one step further as it empowers businesses and communities to build, improve, and deploy their own automatic language translation systems — bringing better and specialized translation quality to established languages, as well as languages that are not yet supported by major translation providers.

We have seen great momentum for the Translator Hub in both the business and language communities. Since the launch of the Hub last August, 156 translation systems targeting European languages have been built with it, and they are accessible to all website developers. Community efforts to add new languages such as Breton, Chuvash, and Kyrgyz are underway.

Consumer-focused solutions such as the Bing Translator App for Windows Phone allow users to harness the power of translation while traveling or away from their computer. The app combines augmented-reality translation using your camera, speech and text translation, word-of-the-day live tiles and a travel-optimized offline mode.

Supporting the creation of multi-lingual content

Another technology solution is to build tools that make it easier for communities to create content in their own language from existing content in a different language. On Wikipedia, for instance, as of January 2013 there were more than three times more pages in English than in French, 52 times more than in Greek, and 1,492 times more than in Maltese (data from http://stats.wikimedia.org/EN/Sitemap.htm). Since it is difficult to increase the number of content contributors, who need to be experts in specific domains, these disadvantaged communities can use existing good-quality English content as the basis for new articles, which contributors passionate about the local language then refine.

WikiBhasha, developed by Microsoft Research, is a cloud-hosted service that addresses that need. The name comes from the well-known term “Wiki”, denoting collaboration, and “Bhasha”, which means “language” in Sanskrit. WikiBhasha enables instant and cost-effective translation and recreation of articles from English into more than 40 other languages using machine-translation technology from Microsoft Translator. Content creators can then modify the translated text as appropriate to create local-language content, which helps break the language barrier for communities, businesses and developers.

The WikiBhasha technology is easy to use, allowing a wide array of users to create Wikipedia content in non-English languages. In addition, WikiBhasha eliminates the need for the contributor to know the Wikipedia markup language. WikiBhasha is also available as an open-source tool (released in MediaWiki), so it can be customized easily for other environments.

Similarly, the Afkar Portal offers a set of smart tools and solutions to enrich Arabic Internet content and user experience in multiple domains like multi-language content authoring, web browsing, translation services and Arabic language experience.

Next generation of translation technology

In the realm of language technologies, the single most important ability – yet also one of the most difficult for computers – is that of translating human speech. For the last 60 years, computer scientists have been working to build systems that can understand what a person says when they talk. Over the last 10 years, better methods, faster computers and the ability to process dramatically more data have led to many improvements. Until recently though, even the best speech systems still had word error rates of 20-25 per cent on arbitrary speech.
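
To give a feel for what a word error rate of 20-25 per cent means, the metric is commonly computed as the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and the recogniser's output, divided by the number of words in the reference. The Python sketch below illustrates that common definition; the function name and the sample sentences are invented for this illustration and are not taken from any Microsoft system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of five.
reference = "barriers to understanding each other"
hypothesis = "barriers to understanding beach other"
print(word_error_rate(reference, hypothesis))  # 0.2
```

In other words, at a 20 per cent word error rate, roughly one word in every five is recognised incorrectly, dropped or spuriously added.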

Just over two years ago, researchers at Microsoft Research and the University of Toronto made an important breakthrough. By using a technique called Deep Neural Networks, which is patterned after human brain behaviour, researchers were able to train more discriminative and better speech recognisers. During a presentation in China last October, Rick Rashid, chief research officer at Microsoft Research, had the opportunity to showcase the results. In particular, the tool enabled Rashid, a native English speaker, to present in Chinese in his own voice. When Rashid spoke in English, the system automatically combined all the underlying technologies to deliver a robust real-time Chinese translation in his own voice. (See video recording of this demo and more detailed explanation of the technology)

With the rapid advancement of technology, we may not have to wait until the 22nd century for a universal translator. And as barriers to understanding language are removed, we can hope that barriers to understanding each other might also be reduced.

Speech Recognition Breakthrough for the Spoken, Translated Word

Chief Research Officer Rick Rashid demonstrates a speech recognition breakthrough via machine translation that converts his spoken English words into computer-generated Chinese language. The breakthrough is patterned after deep neural networks and significantly reduces errors in spoken as well as written translation.