C&B Notes

Rebuilding Babel

A variety of innovators, ranging from Microsoft to individuals in their garage, are working hard to improve machine-driven translation.  Cloud-based processing easily accessed by high-speed wireless internet will eventually allow all of us to have our own personal C-3POs at our beck and call.

How long, then, before automatic simultaneous translation becomes the norm, and all those tedious language lessons at school are declared redundant?  Not, perhaps, as long as language teachers, interpreters and others who make their living from mutual incomprehension might like.  A series of announcements over the past few months from sources as varied as mighty Microsoft and string-and-sealing-wax private inventors suggest that workable, if not yet perfect, simultaneous-translation devices are now close at hand.

* * * * *

Microsoft’s contribution is perhaps the most beguiling.  When Rick Rashid, the firm’s chief research officer, spoke in English at a conference in Tianjin in October, his peroration was translated live into Mandarin, appearing first as subtitles on overhead video screens, and then as a computer-generated voice.  Remarkably, the Chinese version of Mr. Rashid’s speech shared the characteristic tones and inflections of his own voice… The first challenge is to recognize and digitize speech.  In the past, speech-recognition software has parsed what is being said into its constituent sounds, known as phonemes.  There are around 25 of these in Mandarin, 40 in English and over 100 in some African languages.  Statistical speech models and a probabilistic technique called Gaussian mixture modelling are then used to identify each phoneme, before reconstructing the original word.  This is the technology most commonly found in the irritating voice-mail jails of companies’ telephone-answering systems.  It works acceptably with a restricted vocabulary, but try anything more free-range and it mistakes at least one word in four.
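The Gaussian-mixture idea can be sketched in a few lines: each phoneme gets its own mixture of Gaussians over acoustic features, and an incoming frame of sound is assigned to whichever phoneme's mixture gives it the highest likelihood.  The sketch below is a toy illustration, not Microsoft's system — the two phoneme labels, the one-dimensional feature, and all the mixture parameters are invented for the example (real recognizers use mixtures over roughly 39-dimensional feature vectors, with parameters learned from training data).

```python
import math

# Toy acoustic model: each phoneme is a mixture of 1-D Gaussians over a
# single acoustic feature.  Phoneme names and parameters are invented.
PHONEMES = {
    "ae": {"weights": [0.6, 0.4], "means": [1.0, 2.0], "vars": [0.5, 0.3]},
    "iy": {"weights": [0.5, 0.5], "means": [4.0, 5.0], "vars": [0.4, 0.6]},
}

def gaussian_pdf(x, mean, var):
    """Density of a single Gaussian component at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_likelihood(x, model):
    """Weighted sum of the component densities: the mixture's likelihood of x."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(model["weights"], model["means"], model["vars"]))

def classify_frame(x):
    """Pick the phoneme whose mixture assigns the frame the highest likelihood."""
    return max(PHONEMES, key=lambda p: mixture_likelihood(x, PHONEMES[p]))

print(classify_frame(1.2))  # a frame near the "ae" means -> "ae"
print(classify_frame(4.8))  # a frame near the "iy" means -> "iy"
```

A word is then reconstructed by stringing together the per-frame phoneme decisions, typically with a statistical language model arbitrating between competing sequences.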

The translator Mr. Rashid demonstrated employs several improvements.  For a start, it aims to identify not single phonemes but sequential triplets of them, known as senones.  English has more than 9,000 of these.  If they can be recognized, though, working out which words they are part of is far easier than would be the case starting with phonemes alone.  Microsoft’s senone identifier relies on deep neural networks, a mathematical technique inspired by the human brain.  Such artificial networks are pieces of software composed of virtual neurons.  Each neuron weighs the strengths of incoming signals from its neighbors and sends outputs based on those to other neighbors, which then do the same thing.  Such a network can be trained to match an input to an output by varying the strengths of the links between its component neurons.
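The weigh-and-adjust idea can be shown with a single artificial neuron, the building block of such networks.  The sketch below is a minimal illustration, not Microsoft's model: the task (learning the logical AND of two inputs) and the learning rate are invented for the example, and the update used is the classic perceptron rule, whereas real deep networks stack many layers of neurons and train them with backpropagation.

```python
# One artificial neuron learning logical AND.  The neuron weighs its
# incoming signals; training nudges the link strengths (weights) until
# each input maps to the desired output.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]  # AND truth table

weights = [0.0, 0.0]
bias = 0.0

def fire(x):
    # The neuron sums its weighted inputs and fires if the total crosses zero.
    total = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1 if total > 0 else 0

# Perceptron rule: nudge each weight in proportion to the output error.
for _ in range(20):
    for x, t in zip(inputs, targets):
        error = t - fire(x)
        weights[0] += 0.1 * error * x[0]
        weights[1] += 0.1 * error * x[1]
        bias += 0.1 * error

print([fire(x) for x in inputs])  # [0, 0, 0, 1]
```

In Microsoft's system the same principle operates at vastly larger scale: many layers of such neurons, with the trained network mapping acoustic input to one of the 9,000-plus senones.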