Peter Roach *
Speech Technology: a Look into the Future
As signal processing techniques develop, it is becoming more practical to manipulate
"real" speech signals to generate new messages without
the old problem of noticeable discontinuities in the signal where
pieces of speech are joined together. Finally, it is important
to remember that while high-quality synthesis of small sections
of speech is important in the speech research context, it is synthesis-by-rule
which represents the commercial future of this field, and at present
there is still a long way to go before the goal of truly natural-sounding
synthetic speech from synthesis-by-rule is achieved. Synthesis-by-rule
takes as its input written text and produces as its output connected
speech. One of the most important components of such a system
is a dictionary which gives the pronunciations of words in computer-readable
form. I have recently finished working on a 90,000-word pronouncing
dictionary of English [25], which exists in computer-readable form,
and it is our intention to exploit this dictionary for such purposes.
Another pronunciation dictionary [26] is also available in
computer-readable form.
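The role of the dictionary in such a system can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon (`TOY_LEXICON`) with SAMPA-like symbols; the entries, the symbol set and the function names are hypothetical and are not taken from the dictionaries cited as [25] and [26]. Words missing from the lexicon are only reported here, whereas a real system would need letter-to-sound rules for them.

```python
import re

# Hypothetical toy lexicon: a few words mapped to SAMPA-like phoneme symbols.
# Illustrative only; not the format of any published dictionary.
TOY_LEXICON = {
    "speech": ["s", "p", "i:", "tS"],
    "is": ["I", "z"],
    "text": ["t", "e", "k", "s", "t"],
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def text_to_phonemes(text, lexicon):
    """Look each word up in the lexicon.

    Returns the concatenated phoneme list and the list of words
    that were not found in the lexicon.
    """
    phonemes = []
    unknown = []
    for word in tokenize(text):
        pronunciation = lexicon.get(word)
        if pronunciation is None:
            unknown.append(word)  # would be passed to letter-to-sound rules
        else:
            phonemes.extend(pronunciation)
    return phonemes, unknown


if __name__ == "__main__":
    print(text_to_phonemes("Speech is text", TOY_LEXICON))
```

The phoneme string produced in this way is only the first stage: durations, pitch and the signal itself still have to be generated by later components.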
One of the most significant challenges in synthesis
is the production of realistic prosody. As with recognition, this
is an area waiting for more research work to be done [27, 28, 29].
We need large amounts of prosodically-transcribed data. One
example of such data is the Spoken English Corpus, and its computer-readable
version MARSEC [19, 20].
Data of this kind make it possible to train a prosody-generating
program, and to simulate attitudes and emotions [30].
4. Education
So far I have spoken of specific issues in recognition
and synthesis, but I would like to treat education as a separate
area. There is growing interest in speech technology [31] as a
way of providing additional teaching for advanced-level language
learners who need practice in using the spoken language. Computer
systems are being developed which give learners tasks, evaluate
their spoken performance and diagnose errors. It is not realistic
to think of these as replacing teachers, but rather as providing
additional work for students who require additional practice outside
the classroom. A good example of a specific research project which
made important progress in this area is the SPELL project, funded
by the European Union's ESPRIT programme (see [32, 33, 34, 35, 36]).
For a different approach, with the suggestion that
speech files could be moved over the Internet for pronunciation
training, see [37].
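As a rough illustration of what diagnosing errors might involve, the following sketch compares an expected phoneme sequence with the sequence a learner actually produced and lists the differences. It assumes that both sequences are already available as lists of symbols (obtaining the learner's sequence from the speech signal is the hard recognition problem and is not shown), and it uses a generic sequence alignment from the Python standard library. It is not the method of the SPELL project or of any system cited here, and the function name `diagnose` is hypothetical.

```python
from difflib import SequenceMatcher


def diagnose(expected, produced):
    """Align two phoneme sequences and report the mismatches.

    Returns a list of (operation, expected_segment, produced_segment)
    tuples, where operation is 'replace', 'delete' or 'insert'.
    """
    matcher = SequenceMatcher(a=expected, b=produced, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((tag, expected[i1:i2], produced[j1:j2]))
    return errors


if __name__ == "__main__":
    # Assumed example: target "thin" (T I n) produced with an initial s.
    print(diagnose(["T", "I", "n"], ["s", "I", "n"]))
```

A practical system would also need to weight errors by their seriousness and to examine prosody as well as segments; the sketch covers only the comparison step.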
5. Conclusions
For most of my research career,
speech recognition and speech synthesis have been areas of technology
that were only available in well-funded research laboratories.
The first speech recognition system that I worked with in the
1980s cost about the same as a good new car. In the last
few years the price of this technology has dropped dramatically,
and some of the major manufacturers of speech recognition technology
are selling small-scale systems for as little as $50, even though these
are capable of being used for serious dictation work. The breakthrough
into widespread public use of speech technology is already beginning
to happen. One of the most urgent problems facing us is that so much
work remains to be done on languages that are not major world
languages. I hope that speech research on languages such as Romanian
will be encouraged and accelerated by these developments.