Peter Roach * Speech Technology: a Look into the Future




I have to say that I do not regard automatic dictation machines as an unmixed blessing: when the technology is fully established in the market, and computer companies are making good profits from selling their machines, tens of thousands of secretarial jobs will be lost, while in general letters will not be typed any better than they were before. There are other applications where the benefits are clearer. One very large field is that of telephone interaction with information systems [13]. Although it is often possible to connect a computer to a remote server over telephone lines, this is frequently inconvenient, and to be able to use speech over the telephone is a real advantage. I have had experience of using a telephone airline flight enquiry system developed by the Marconi company and found it very effective. Enquiries about trains, weather, financial movements, entertainment, bank accounts, even elementary medical diagnosis could be made to work in this way; imagine a remote community with no nearby doctor being able to dictate the symptoms of an illness to a computer and receive advice on how serious the condition was likely to be, with the option of having the computer call a doctor in if the diagnosis was sufficiently urgent. The advantage of this becomes even more obvious if you imagine a community where most people are illiterate and would be unable to use a keyboard even if they had one. The computer which receives the telephone call and gives out the information is, of course, available 24 hours per day, every day.

There are two major technical challenges here. One is that most telephone systems still effectively band-pass filter the signal between about 300 and 3,300 Hz, severely reducing the amount of acoustic information available to the recogniser. The second is that the system must be able to work with any voice presented to it: the full range of speaker types (male and female, young and old), accent types and speaking styles must be recognised, and while in an office environment it may be assumed that operators of voice-input technology will be well-trained and co-operative, information system users calling in by telephone may well be less easy to work with.
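The effect of the telephone band limit described above can be illustrated with a small sketch. It is not taken from any system discussed in this paper: the sample rate, the three test tones and the function names are all assumptions chosen for illustration, and an ideal "brick-wall" filter is used in place of the gradual roll-off of a real telephone channel. The sketch builds a signal from one tone below the band, one inside it and one above it, and then discards everything outside 300 to 3,300 Hz.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (adequate for a short illustrative signal)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; returns the real part, since the input signal was real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def telephone_band(x, sample_rate, low_hz=300.0, high_hz=3300.0):
    """Idealised band-pass filter: zero every frequency bin outside the pass band."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        # Bins k and n - k are mirror images of the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return idft(spectrum)

def tone(freq_hz, sample_rate, n):
    """A unit-amplitude sine wave of n samples."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

def energy(x):
    return sum(v * v for v in x)

# Assumed values: 8 kHz sampling and 400 samples give 20 Hz per frequency bin,
# so each test tone falls exactly on one bin.
SAMPLE_RATE = 8000
N = 400

below_band = tone(100, SAMPLE_RATE, N)    # lost: under 300 Hz
in_band = tone(1000, SAMPLE_RATE, N)      # kept: inside the pass band
above_band = tone(3600, SAMPLE_RATE, N)   # lost: over 3,300 Hz

signal = [a + b + c for a, b, c in zip(below_band, in_band, above_band)]
filtered = telephone_band(signal, SAMPLE_RATE)

# Fraction of the original signal energy that survives the channel.
kept = energy(filtered) / energy(signal)
print(f"fraction of energy kept: {kept:.3f}")
```

Only the in-band tone survives in this toy signal. In real speech, what lies outside the band includes the fundamental frequency of many voices and much of the high-frequency energy of fricatives such as [s] and [f], which is part of why the recogniser has less to work with over the telephone.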

Another big area within the applications field is the "hands and eyes busy" situation - situations where someone needs to interact with a computer but is not in a position to use a keyboard. The example most often quoted is that of aircraft pilots, but I think there are many less exotic applications. One is certainly the car phone: it is well known that drivers dialling telephone numbers while at the wheel are unsafe, and the technology exists already to allow drivers to request a telephone number by voice. Manual dialling while driving should be made illegal, and this would dramatically boost the sale of "voice-dialling" systems. There are many other applications: I have been involved in several projects with the Institute for Transport Studies in Leeds University. Research in transport engineering requires a lot of observation of traffic in motion, and researchers often have to stand on motorway bridges or railway platforms manually recording what they see. Studies of urban car parking may require regular recording of information on all the cars in a car park, while street fittings such as warning notices and road markings also have to be surveyed regularly. We found these tasks could be made much easier if the researchers were equipped with hand-held portable computers with voice recognition capability, so that the spoken data was entered directly into a database. In our research, we found that recogniser performance could be unsatisfactory, and could deteriorate over time as the speaker became tired, unless the speaker received immediate feedback from the computer confirming what had been said.
We also looked at the work of geologists: in inspecting core samples, the geologists were often working in difficult and dirty conditions which would have damaged most computers and left keyboards covered in mud; however, using a radio microphone connected to a speech recogniser allowed observations of the samples to be entered instantly into the computer being used for the survey work.

Another application area for speech technology, and one with a value that everyone can see, is in helping the disabled. There are many people who are physically unable to operate a keyboard but have the power of speech. To be able to control their environment by spoken commands (open the door, switch on the heating, operate an alarm) would be a big help to such people, and voice-operated devices can provide this.

2.2. Techniques in speech recognition

Many books and papers on speech technology devote considerable space to reviewing the heroic days of the 1970s and 80s when many of the computational techniques in use today were laboriously worked out. There is no space in this paper to go through such a historical review. I would simply like to make two basic points. Firstly, the most lasting developments of speech technology have been the result of partnership between specialists in computer science and electronic engineering on the one hand and specialists in speech science and linguistics on the other. Attempts to solve the many problems of speech recognition simply by advanced engineering have resulted in systems that work satisfactorily within the laboratory for an ideal speaker, but have been unable to survive exposure to the enormous variability of speech in the real world. The input of speech science has been of different types in different applications, but I believe phonetic expertise is always an essential component of a successful system.


