Marian Boldea * Speech Technology Research at Computer Science Department, "Politehnica" University of Timișoara




Fig. 1. - Block diagram of the text-to-speech system.



4. Speech recognition

Over the last two decades, hidden Markov models (HMMs) [2] have become established as the most successful tool in automatic speech recognition, and attempts to resort to other methods have ended up as mere variations of output observation probability estimators for them [14,15]. HMMs have therefore been chosen for all our work in this area.

A first set of experiments explored the use of HMMs in isolated word recognition [3] and tried to select an optimal signal processing method [4]. Using a database of the digits 0 to 9 [5] spoken by 100 speakers, balanced by sex and split into a 68-speaker training set and a 32-speaker test set, recognition scores higher than 99% were achieved [4].
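To make the isolated word setting concrete, the following is a minimal sketch of HMM-based classification: one model per word, each scored with the forward algorithm in the log domain, with the best-scoring model giving the recognized word. It assumes discrete observation symbols and two hand-written toy models; the names `log_forward`, `recognize`, `MODELS` and all probabilities are illustrative assumptions, not the models or features used in [3,4].

```python
import math

def logs(probs):
    """Element-wise log of a list of probabilities, with log(0) = -inf."""
    return [math.log(p) if p > 0 else -math.inf for p in probs]

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM.

    log_pi[i]   : log initial probability of state i
    log_A[i][j] : log transition probability from state i to state j
    log_B[i][o] : log probability of emitting symbol o in state i
    """
    n = len(log_pi)
    alpha = [log_pi[i] + log_B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)]) + log_B[j][o]
            for j in range(n)
        ]
    return logsumexp(alpha)

def recognize(obs, models):
    """Return the label of the word model with the highest likelihood."""
    return max(models, key=lambda word: log_forward(obs, *models[word]))

# Toy two-state left-to-right models over symbols {0, 1, 2} (assumed values).
LEFT_TO_RIGHT = [logs(row) for row in [[0.6, 0.4], [0.0, 1.0]]]
MODELS = {
    "zero": (logs([1.0, 0.0]), LEFT_TO_RIGHT,
             [logs(row) for row in [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]]),
    "one": (logs([1.0, 0.0]), LEFT_TO_RIGHT,
            [logs(row) for row in [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]]),
}

print(recognize([0, 0, 1, 1], MODELS))
print(recognize([2, 2, 1], MODELS))
```

In a real system the discrete symbols would be replaced by the output of the chosen signal processing front end, and the model parameters would be estimated from the training speakers rather than written by hand.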

At present, continuous speech recognition is being investigated using an appropriate database [10], which is first to be automatically labelled at the phoneme level. This will be done using a simplified version of an automatic continuous speech recognition system [16] (Fig. 2).
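One common way to obtain phoneme-level labels with a simplified recognizer is forced alignment: the phoneme sequence of each utterance is assumed known, and dynamic programming only decides where the boundaries fall. The sketch below illustrates that idea under stated assumptions; it is not necessarily the procedure of [16]. The function `force_align`, the per-frame log-scores and the example phonemes are hypothetical.

```python
import math

def force_align(frame_scores, phonemes):
    """Assign each frame to one phoneme of a known left-to-right sequence.

    frame_scores[t][p] : log-score of frame t under the model of phoneme p
    phonemes           : the known phoneme sequence of the utterance
    Returns a list of (phoneme, first_frame, last_frame) segments.
    """
    T, N = len(frame_scores), len(phonemes)
    if T < N:
        raise ValueError("need at least one frame per phoneme")
    best = [[-math.inf] * N for _ in range(T)]
    advanced = [[False] * N for _ in range(T)]  # True if entered from n - 1
    best[0][0] = frame_scores[0][phonemes[0]]
    for t in range(1, T):
        for n in range(N):
            stay = best[t - 1][n]
            move = best[t - 1][n - 1] if n > 0 else -math.inf
            score = frame_scores[t][phonemes[n]]
            if move > stay:
                best[t][n] = move + score
                advanced[t][n] = True
            else:
                best[t][n] = stay + score
    # Trace back from the last phoneme at the last frame.
    segments, n, end = [], N - 1, T - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][n]:
            segments.append((phonemes[n], t, end))
            end, n = t - 1, n - 1
    segments.append((phonemes[n], 0, end))
    segments.reverse()
    return segments

# Toy log-scores for four frames and two phoneme models (assumed values).
FRAMES = [
    {"m": -1.0, "a": -5.0},
    {"m": -1.0, "a": -4.0},
    {"m": -6.0, "a": -1.0},
    {"m": -7.0, "a": -1.0},
]
print(force_align(FRAMES, ["m", "a"]))
```

Each phoneme is treated here as a single state with no transition costs; a practical labeller would use multi-state phoneme models and transition probabilities, but the boundary search has the same structure.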

Once labelled, the database will be used to train acoustic models for a speaker-independent continuous speech recognizer (Fig. 3). This is essentially a directed Viterbi decoder that evaluates the probabilities of various word sequences as the likelihoods of the analysed signal being generated by the sequence of acoustic models associated with their pronunciations, with a pruning procedure used to reduce the computational load and processing time [17].
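The interplay between Viterbi decoding and pruning can be sketched at the level of HMM states. The example below is a simplified illustration that assumes a beam-style pruning rule (drop any hypothesis scoring more than a fixed margin below the current best); the pruning procedure of [17] may differ, and a real decoder would search over word sequences through their pronunciations. The function names and the toy log-scores, which are not normalised probabilities, are assumptions.

```python
import math

def prune(hyps, beam):
    """Keep only hypotheses within `beam` of the best current score."""
    if not hyps:
        return hyps
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}

def viterbi_beam(obs_logprob, log_init, log_trans, beam=10.0):
    """Best state path through an HMM, with beam pruning at every frame.

    obs_logprob[t][s] : log-likelihood of frame t in state s
    log_init[s]       : log initial score of state s
    log_trans[s]      : dict {next_state: log transition score}
    Returns (log-score, state path), or (-inf, []) if no path survives.
    """
    active = {s: (lp + obs_logprob[0][s], [s]) for s, lp in log_init.items()}
    active = prune(active, beam)
    for t in range(1, len(obs_logprob)):
        new = {}
        for s, (score, path) in active.items():
            for nxt, lt in log_trans.get(s, {}).items():
                cand = score + lt + obs_logprob[t][nxt]
                # Viterbi recursion: keep only the best way into each state.
                if nxt not in new or cand > new[nxt][0]:
                    new[nxt] = (cand, path + [nxt])
        active = prune(new, beam)
        if not active:
            return (-math.inf, [])
    return max(active.values(), key=lambda h: h[0])

# Toy scores (assumed): state "B" starts badly but fits later frames well.
LOG_INIT = {"A": -1.0, "B": -1.0}
LOG_TRANS = {"A": {"A": -0.5, "B": -4.0}, "B": {"B": -0.5}}
OBS = [
    {"A": -1.0, "B": -4.0},
    {"A": -3.0, "B": -0.5},
    {"A": -6.0, "B": -0.5},
]

print(viterbi_beam(OBS, LOG_INIT, LOG_TRANS, beam=10.0))
print(viterbi_beam(OBS, LOG_INIT, LOG_TRANS, beam=2.0))
```

With the wide beam the decoder returns the path B, B, B with score -7.0. With the narrow beam the initially weak state B is discarded at the first frame, and the result is the slightly worse path A, B, B with score -7.5. This is the usual trade-off of pruning: less computation at the risk of losing the best hypothesis.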

Fig. 2. - Block diagram of the automatic labelling system.

Fig. 3. - Block diagram of the automatic continuous speech recognition system.



5. Instead of conclusions

Although a series of encouraging results has been obtained so far, the experience gained confirms the views of well-established researchers [18] that making computers more accessible to a wide range of users through voice interfaces requires multidisciplinary research, the development of shared linguistic resources, the availability of adequate computational support, and rapid communication with the scientific community.

As far as our work is concerned, at one point or another we have felt the need for all these, and we hope that the future will bring some changes without which there is little to be expected.


