Peter Roach *
Speech Technology: a Look into the Future
3. Speech synthesis applications
In talking about speech recognition, [24] we noted that we can speak more
rapidly than we can type. One disadvantage of speech synthesis as a way of
providing information
is that, in general, we can read a screen more rapidly than we
can listen to a voice. Receiving information from a synthesiser
can be frustratingly slow, so we need to look carefully to find
applications where the advantages of speech output compensate
for this. Clearly we should look at cases where the user's eyes
are not available. In-car information is one example which is
developing rapidly: as cars become stuck in congestion more and
more often, there is a growing market for systems which advise
on the least congested route. This is more useful than the rudimentary
in-car synthesis systems of a decade ago, when mechanical voices
informed the driver about the car's oil level or an unfastened
seat-belt. Drivers did not like them (indeed, it has been reported
that some drivers actually paid garages to disconnect the voice
module). A recent example from the UK of the difficulty of getting
synthetic voices into everyday use comes from a story in the Guardian
newspaper, reporting on trials of a new "talking bus stop"
in Leeds. Passengers waiting for a bus could press buttons on
a panel to ask for spoken information about the various destinations
of buses. Unfortunately, a young computer "hacker" managed
to penetrate the computer system providing this service and substituted
some messages that should not be spoken in polite society, and
certainly never by a talking bus stop. When this breach of security
had been fixed, another problem remained: the synthesiser could
only speak in a refined Southern accent ("Received Pronunciation")
which is not liked in the Yorkshire city of Leeds, and there were
so many complaints about this accent that the service has been
withdrawn until a Yorkshire accent can be substituted.
Speech synthesis can, however,
help the disabled. One of the most attractive applications is
that of reading machines for the blind. A printed page is scanned,
the text is converted into phonetic symbolic form and speech is
synthesised. This requires a synthesis-by-rule program, and improving
synthesis-by-rule is probably the most important
activity in this field. Of course, speech synthesis can also help
disabled people who are unable to speak. One of Britain's greatest
scientists, Professor Stephen Hawking, is only able to speak by
means of a "pointer" keyboard and speech synthesiser.
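The middle step of the reading machine described above, converting text into phonetic symbolic form by rule, can be illustrated with a very small sketch. Everything in it is an assumption made for illustration: the rule table RULES, the function name to_phonemes, and the ARPAbet-like symbols are hypothetical, and a real synthesis-by-rule program would need a far larger and context-sensitive rule set.

```python
# Toy letter-to-sound rules (hypothetical; real rule sets are much larger
# and take the surrounding context into account).
RULES = {
    "sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW",
    "a": "AE", "e": "EH", "i": "IH", "o": "AA", "u": "AH", "c": "K",
}
# Simple consonants map to a symbol of the same name.
RULES.update({c: c.upper() for c in "bdfghjklmnprstvwz"})


def to_phonemes(text):
    """Convert text to a list of phoneme symbols by greedy longest match."""
    phonemes = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try two-letter rules before one-letter rules.
            for length in (2, 1):
                chunk = word[i:i + length]
                if len(chunk) == length and chunk in RULES:
                    phonemes.append(RULES[chunk])
                    i += length
                    break
            else:
                i += 1  # no rule for this character (e.g. punctuation)
        phonemes.append("|")  # mark the word boundary
    return phonemes


print(to_phonemes("the sheep"))
```

The resulting symbol string is what a synthesiser would then turn into sound; the sketch stops short of that step.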
Sadly, it must be admitted that the application of
speech synthesis which is most likely to make money is that of
talking toys.
3.1. Synthesis techniques
As with recognition, it is not possible here to review
the whole range of synthesis techniques. I would like to mention
a few important points, however. Firstly, the self-teaching processes
described under speech recognition above also work for synthesis:
Hidden Markov Models and Artificial Neural Networks can be used
for synthesis, and it follows that our work on constructing speech
databases has value in this field also. Secondly, there are many
applications where it has been found that a completely artificial
synthetic voice is not necessarily the best solution.