Mircea Giurgiu * Results on Automatic Speech Recognition in Romanian




Table 5. Recognition performances for independent Romanian digit recognition with HMM
WORD SPEAKER 1 SPEAKER 2 SPEAKER 3 OUTSIDE SPEAKER
UNU 100.00% 100.00% 100.00% 100.00%
DOI 80.00% 100.00% 100.00% 80.00%
TREI 80.00% 100.00% 100.00% 100.00%
PATRU 80.00% 100.00% 100.00% 100.00%
CINCI 100.00% 80.00% 80.00% 60.00%
ȘASE 100.00% 100.00% 100.00% 80.00%
ȘAPTE 100.00% 100.00% 100.00% 80.00%
OPT 80.00% 60.00% 100.00% 80.00%
NOUĂ 100.00% 100.00% 100.00% 100.00%
ZECE 100.00% 100.00% 100.00% 100.00%
TOTAL 92.00% 94.00% 98.00% 88.00%

3.3. Experiments using Artificial Neural Networks for automatic speech recognition

There are three main neural-network approaches to speech recognition: (a) feed-forward networks, such as the Multilayer Perceptron (MLP), which transform sets of input signals into sets of output signals; (b) feedback networks, in which the input information defines the state of a feedback system and, after a sequence of state transitions, the asymptotic final state is identified as the outcome of the computation; and (c) self-organizing feature maps (SOFM), in which mutual lateral interconnections among neighboring cells in the network develop specific detectors for different signal patterns [3,9].

3.3.1. Experiments on speech recognition with Multilayer Perceptrons

Here we focus on feed-forward networks known as Multilayer Perceptrons (MLP). The MLP structure we propose has ni neurons in the input layer, nh neurons in the hidden layer, and no neurons in the output layer. The network is fed at the input with the spectral information extracted from the speech waveform and produces a coded output for the recognized word. The appropriate number of neurons in each layer has been studied experimentally. For the word recognition task, the network input is the spectral information of the whole word. All training examples, consisting of spectral patterns extracted from words collected from different speakers, are presented cyclically until the error over the entire set is acceptably low. After training, an MLP is able to respond properly to input patterns not presented during the learning process [9].
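The forward pass of such a single-hidden-layer MLP can be sketched as follows. This is a minimal illustration assuming NumPy; the layer sizes follow the 600/40/10 LPC configuration from the experiments, but the weights and the input vector here are random stand-ins, not the paper's trained networks.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation, the function selected in the experiments."""
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W1, b1, W2, b2):
    """One forward pass through an MLP with one hidden layer."""
    h = sigmoid(W1 @ x + b1)  # hidden activations, shape (nh,)
    y = sigmoid(W2 @ h + b2)  # output activations, shape (no,)
    return y

rng = np.random.default_rng(0)
ni, nh, no = 600, 40, 10              # the 600/40/10 (LPC) configuration
W1 = rng.normal(0.0, 0.1, (nh, ni)); b1 = np.zeros(nh)
W2 = rng.normal(0.0, 0.1, (no, nh)); b2 = np.zeros(no)

x = rng.normal(size=ni)               # stand-in for one 600-band LPC spectrum
y = mlp_forward(x, W1, b1, W2, b2)
digit = y.argmax()                    # index of the recognized digit
```

Each output neuron corresponds to one vocabulary word, so the recognized digit is simply the index of the most active output unit.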

Applying the MLP to the automatic word recognition task consists of defining an output layer with as many units as the vocabulary size (10 neurons, each corresponding to one of the digits 1, 2, ..., 10 uttered in Romanian), one hidden layer with a variable number of processing units, and an input layer fed with the LPC spectral representation or with the VQ codewords of each digit. When MLPs are used for speech recognition, a major difficulty arises from the start: the variable length of speech patterns does not suit the fixed dimension of the MLP input. In this particular case the input layer has 600 or 50 processing units, each corresponding to one spectral band derived from the LPC spectrum or to a VQLPC codeword, respectively. Different tests were carried out to find the activation function giving the best recognition rate, and the sigmoid function was finally selected for all experiments because it better encodes the speech spectral information (Table 6). Training was accomplished with the Back-Propagation algorithm, and the entire speech database was divided into two parts (one for training and one for testing), each containing 200 patterns [3,10].
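The cyclic Back-Propagation training described above can be sketched as below. This is a toy illustration, not the paper's setup: the training set is random data with one-hot digit labels, the layer sizes are a hypothetical small stand-in for the 50-input VQLPC networks, and the learning rate is an arbitrary choice; only the update rule (gradient descent with the delta rule for sigmoid units) reflects the method in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
ni, nh, no = 50, 20, 10               # hypothetical small VQLPC-style network
W1 = rng.normal(0.0, 0.1, (nh, ni)); b1 = np.zeros(nh)
W2 = rng.normal(0.0, 0.1, (no, nh)); b2 = np.zeros(no)

# Toy training set: random stand-ins for spectral patterns, one-hot labels.
X = rng.normal(size=(40, ni))
T = np.eye(no)[rng.integers(0, no, size=40)]

lr, errs = 0.5, []
for epoch in range(200):              # present the whole set cyclically
    err = 0.0
    for x, t in zip(X, T):
        h = sigmoid(W1 @ x + b1)      # forward pass
        y = sigmoid(W2 @ h + b2)
        dy = (y - t) * y * (1 - y)    # delta rule for sigmoid output units
        dh = (W2.T @ dy) * h * (1 - h)
        W2 -= lr * np.outer(dy, h); b2 -= lr * dy   # gradient descent step
        W1 -= lr * np.outer(dh, x); b1 -= lr * dh
        err += 0.5 * np.sum((y - t) ** 2)
    errs.append(err)                  # total squared error over the set
print(f"error over the set: {errs[0]:.2f} -> {errs[-1]:.2f}")
```

Training stops in practice when the error over the entire set is acceptably low; here the loop simply runs a fixed number of epochs and records the decreasing error.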

Table 6. Recognition performances for MLP with sigmoid and tanh activation functions
MLP input/hidden/output (spectral info)  RECOGNITION [%], f(x)=1/(1+e^(-x))  RECOGNITION [%], f(x)=tanh(x)
600/40/10 (LPC)     84.3  20.0
600/50/10 (LPC)     82.5  47.5
100/140/10 (VQLPC)  81.5  48.0
100/120/10 (VQLPC)  75.0  42.5

Table 7. The characteristics of four MLP used in experiments
Network Inputs Hidden Outputs
net40sw.nnw 600 LPC 40 10
net50sw.nnw 600 LPC 50 10
nvq120t.nnw 50 VQLPC 120 10
nvq140t.nnw 50 VQLPC 140 10




