Peter Roach * Speech Technology: a Look into the Future




As signal processing techniques develop, it is becoming more practical to manipulate "real" speech signals to generate new messages without the old problem of noticeable discontinuities where pieces of speech are joined together. Finally, it is important to remember that while high-quality synthesis of small sections of speech matters in the speech research context, it is synthesis-by-rule which represents the commercial future of this field, and there is still a long way to go before truly natural-sounding synthetic speech is achieved by this means. Synthesis-by-rule takes written text as its input and produces connected speech as its output. One of the most important components of such a system is a dictionary which gives the pronunciations of words in computer-readable form. I have recently finished working on a 90,000-word pronouncing dictionary of English [25] which exists in computer-readable form, and it is our intention to exploit this dictionary for such purposes. Another pronunciation dictionary [26] is also available in computer-readable form.
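The role a computer-readable pronouncing dictionary plays at the front of a text-to-speech system can be illustrated with a minimal sketch in Python. The three-word lexicon, its phoneme notation and the function name `transcribe` are hypothetical, chosen only for illustration; they are not taken from the dictionaries cited above, and a real system would also need letter-to-sound rules for words the dictionary does not contain.

```python
import re

# Hypothetical toy lexicon: spelling -> space-separated phoneme symbols.
# Entries are illustrative only, not drawn from the dictionaries in [25, 26].
LEXICON = {
    "speech": "s p iː tʃ",
    "is": "ɪ z",
    "natural": "n æ tʃ r əl",
}


def transcribe(text, lexicon=LEXICON):
    """Look up each word of the input text in the lexicon.

    Returns a list of (word, pronunciation) pairs. A word missing from
    the lexicon is paired with None, marking where a fuller system
    would fall back on letter-to-sound rules.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [(word, lexicon.get(word)) for word in words]
```

Calling `transcribe("Speech is natural.")` pairs each of the three words with its phoneme string, while a word outside the lexicon, such as "synthesis", is paired with `None`.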

One of the most significant challenges is the production of realistic prosody. As with recognition, this is an area where more research is needed [27, 28, 29]. We need large amounts of prosodically-transcribed data. One example of such data is the Spoken English Corpus, and its computer-readable version MARSEC [19, 20]. Data of this kind make it possible to train a prosody-generating program, and to simulate attitudes and emotions [30].

4. Education

Having discussed specific issues in recognition and synthesis, I would like to treat education as a separate area. There is growing interest in speech technology [31] as a way of providing additional teaching for advanced-level language learners who need practice in using the spoken language. Computer systems are being developed which give learners tasks, evaluate their spoken performance and diagnose errors. It is not realistic to think of these as replacing teachers; rather, they provide extra work for students who need more practice outside the classroom. A good example of a research project which made important progress in this area is the SPELL project, funded by the European Union's ESPRIT programme (see [32, 33, 34, 35, 36]). For a different approach, with the suggestion that speech files could be transferred over the Internet for pronunciation training, see [37].

5. Conclusions

For most of my research career, speech recognition and speech synthesis have been areas of technology that were only available in well-funded research laboratories. The first speech recognition system that I worked with in the 1980s cost about as much as a good new car. In the last few years the price of this technology has dropped dramatically, and some of the major manufacturers of speech recognition technology are selling small-scale systems for as little as $50, even though these systems are capable of being used for serious dictation work. The breakthrough into widespread public use of speech technology is already beginning to happen. One of the most urgent problems facing us is that so much work remains to be done on languages that are not major world languages. I hope that speech research on languages such as Romanian will be encouraged and accelerated by these developments.



