Peter Roach * Speech Technology: a Look into the Future




3. Speech synthesis applications

In talking about speech recognition, [24] we noted that we can speak more rapidly than we can type. One disadvantage of speech synthesis as a way of providing information is that, in general, we can read a screen more rapidly than we can listen to a voice. Receiving information from a synthesiser can be frustratingly slow, so we need to look carefully to find applications where the advantages of speech output compensate for this. Clearly we should look at cases where the user's eyes are not available. In-car information is one example which is developing rapidly: as cars become stuck in congestion more and more often, there is a growing market for systems which advise on the least congested route. This is more useful than the rudimentary in-car synthesis systems of a decade ago, when mechanical voices informed the driver about the car's oil level or an unfastened seat-belt. Drivers did not like them (indeed, it has been reported that some drivers actually paid garages to disconnect the voice module).

A recent example from the UK of the difficulty of getting synthetic voices into everyday use comes from a story in the Guardian newspaper, reporting on trials of a new "talking bus stop" in Leeds. Passengers waiting for a bus could press buttons on a panel to ask for spoken information about the various destinations of buses. Unfortunately, a young computer "hacker" managed to penetrate the computer system providing this service and substituted some messages that should not be spoken in polite society, and certainly never by a talking bus stop. When this breach of security had been fixed, another problem remained: the synthesiser could only speak in a refined Southern accent ("Received Pronunciation") which is not liked in the Yorkshire city of Leeds, and there were so many complaints about this accent that the service has been withdrawn until a Yorkshire accent can be substituted.

Speech synthesis can, however, help the disabled. One of the most attractive applications is that of reading machines for the blind. A printed page is scanned, the text is converted into phonetic symbolic form and speech is synthesised. This requires a synthesis-by-rule program, and the improvement of synthesis-by-rule is probably the most important activity in this field. Of course, speech synthesis can also help those disabled people who are unable to speak. One of Britain's greatest scientists, Professor Stephen Hawking, is able to speak only by means of a "pointer" keyboard and speech synthesiser.
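The middle step of the reading machine, converting text into phonetic symbolic form, can be illustrated with a very small sketch. Everything in it is an assumption made for illustration: the rule table, the phoneme symbols and the function names are hypothetical, and a practical synthesis-by-rule program would need far more rules, sensitivity to context and a list of exceptions. The sketch stops at the symbolic form and does not attempt to generate any sound.

```python
import re

# Hypothetical letter-to-sound rules: spelling fragment -> phoneme symbol.
# The symbols are illustrative only, not a real phonetic inventory.
LETTER_TO_SOUND = {
    "sh": "S", "ch": "tS", "th": "T", "ee": "i:", "oo": "u:",
    "b": "b", "d": "d", "e": "e", "i": "I", "k": "k",
    "p": "p", "s": "s", "t": "t",
}
MAX_FRAGMENT = max(len(fragment) for fragment in LETTER_TO_SOUND)


def word_to_phonemes(word):
    """Convert one lower-case word to phoneme symbols, longest rule first."""
    phonemes = []
    pos = 0
    while pos < len(word):
        # Try the longest spelling fragment first, so "sh" beats "s".
        for size in range(MAX_FRAGMENT, 0, -1):
            fragment = word[pos:pos + size]
            if fragment in LETTER_TO_SOUND:
                phonemes.append(LETTER_TO_SOUND[fragment])
                pos += len(fragment)
                break
        else:
            # No rule applies: pass the letter through, marked as unknown.
            phonemes.append("?" + word[pos])
            pos += 1
    return phonemes


def text_to_phonemes(text):
    """Split scanned text into words and convert each to phoneme symbols."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word_to_phonemes(word) for word in words]


if __name__ == "__main__":
    print(text_to_phonemes("Sheep teeth"))
```

Trying the longest fragment first is one simple way of letting a two-letter rule such as "sh" take priority over the single letter "s". Letters with no rule are marked with "?" rather than silently dropped, so that gaps in the rule table remain visible.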

Sadly, it must be admitted that the application of speech synthesis which is most likely to make money is that of talking toys.

3.1. Synthesis techniques

As with recognition, it is not possible here to review the whole range of synthesis techniques. I would like to mention a few important points, however. Firstly, the self-teaching processes described under speech recognition above also work for synthesis: Hidden Markov Models and Artificial Neural Networks can both be used, and it follows that our work on constructing speech databases has value in this field also. Secondly, there are many applications where it has been found that a completely artificial synthetic voice is not necessarily the best solution.


