WORD | SPEAKER 1 | SPEAKER 2 | SPEAKER 3 | OUTSIDE SPEAKER |
UNU | 100.00% | 100.00% | 100.00% | 100.00% |
DOI | 80.00% | 100.00% | 100.00% | 80.00% |
TREI | 80.00% | 100.00% | 100.00% | 100.00% |
PATRU | 80.00% | 100.00% | 100.00% | 100.00% |
CINCI | 100.00% | 80.00% | 80.00% | 60.00% |
ȘASE | 100.00% | 100.00% | 100.00% | 80.00% |
ȘAPTE | 100.00% | 100.00% | 100.00% | 80.00% |
OPT | 80.00% | 60.00% | 100.00% | 80.00% |
NOUĂ | 100.00% | 100.00% | 100.00% | 100.00% |
ZECE | 100.00% | 100.00% | 100.00% | 100.00% |
TOTAL | 92.00% | 94.00% | 98.00% | 88.00% |
3.3. Experiments using Artificial Neural Networks for automatic speech recognition
There are three different neural-network-based approaches to speech recognition: (a) feed-forward networks, such as the MLP, which transform a set of input signals into a set of output signals; (b) feedback networks, in which the input information defines a state of a feedback system and, after a sequence of transitions, the asymptotic final state is identified as the outcome of the computation; and (c) self-organizing feature maps (SOFM), in which mutual lateral interconnections among neighboring cells in a neural network develop specific detectors for different signal patterns [3,9].
3.3.1. Experiments on speech recognition with Multilayer Perceptrons
In this case we focus on feed-forward networks known as Multilayer Perceptrons (MLP). The MLP structure we propose has ni neurons in the input layer, nh neurons in the hidden layer, and no neurons in the output layer. The network is fed at the input with spectral information extracted from the speech wave and produces a coded output for the recognized word. The appropriate number of neurons in each layer has been studied. For the word recognition task, all training examples, consisting of spectral patterns extracted from words collected from different speakers, are presented cyclically until the error over the entire set is acceptably low. After training, an MLP is able to respond properly to input patterns not presented during the learning process [9].
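As a minimal illustrative sketch (our illustration, not the original implementation; the random weight initialization is an assumption, while the 600/50/10 layer sizes follow Table 6 below), the forward pass of such an ni/nh/no MLP can be expressed in Python with NumPy:

import numpy as np

def sigmoid(x):
    # Logistic activation f(x) = 1 / (1 + e^(-x)), the function selected in the experiments.
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """One-hidden-layer perceptron: ni inputs -> nh hidden -> no outputs."""
    def __init__(self, ni, nh, no, seed=0):
        rng = np.random.default_rng(seed)
        # Small random initial weights (an illustrative choice, not from the paper).
        self.W1 = rng.normal(0.0, 0.1, (nh, ni))
        self.b1 = np.zeros(nh)
        self.W2 = rng.normal(0.0, 0.1, (no, nh))
        self.b2 = np.zeros(no)

    def forward(self, x):
        # x: fixed-length spectral feature vector extracted from one word.
        h = sigmoid(self.W1 @ x + self.b1)   # hidden-layer activations
        y = sigmoid(self.W2 @ h + self.b2)   # one output unit per vocabulary word
        return h, y

# Example: the 600/50/10 topology of Table 6
# (600 LPC spectral inputs, 50 hidden units, 10 digit outputs).
net = MLP(ni=600, nh=50, no=10)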
Applying the MLP to the automatic word recognition task means defining an output layer with a number of units equal to the size of the vocabulary (10 neurons, one for each of the digits 1, 2, ..., 10 uttered in Romanian). One hidden layer with a variable number of processing units has been used, and the input layer is fed either with the LPC spectral representation or with the VQ codewords of each digit. When MLPs are used for speech recognition, an immediate problem arises: the variable length of speech patterns does not suit the fixed dimension of the MLP input. In this particular case the input layer has 600 or 50 processing units, each corresponding to one spectral band derived from the LPC spectrum or to a VQLPC codeword, respectively. Different tests have been carried out in order to find the activation function that gives the best recognition rate with the MLP; finally, the sigmoid function was selected for all experiments, because it encodes the speech spectral information better (Table 6). Training has been performed with the Back-Propagation algorithm, and the entire speech database has been divided into two parts (one for training and one for testing), each containing 200 patterns [3,10].
Table 6
MLP input/hidden/output (spectral info) | RECOGNITION [%] f(x)=1/(1+e^-x) | RECOGNITION [%] f(x)=tanh(x) |
600/40/10 (LPC) | 84.3 | 20.0 |
600/50/10 (LPC) | 82.5 | 47.5 |
50/140/10 (VQLPC) | 81.5 | 48.0 |
50/120/10 (VQLPC) | 75.0 | 42.5 |
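For concreteness, a sketch of the per-pattern Back-Propagation loop for the sigmoid MLP above follows; the learning rate, epoch limit, and stopping threshold are illustrative assumptions, not values reported here. It mirrors the cyclic presentation of the training patterns until the error over the whole set is acceptably low:

def train(net, X, T, lr=0.5, epochs=1000, target_error=0.01):
    """Back-Propagation (squared-error loss) for the sigmoid MLP sketched above.

    X: (n_patterns, ni) spectral patterns; T: (n_patterns, no) coded word outputs.
    """
    for _ in range(epochs):
        total_error = 0.0
        for x, t in zip(X, T):
            h, y = net.forward(x)
            e = y - t
            total_error += float(e @ e)
            # Sigmoid derivative: f'(a) = f(a) * (1 - f(a)).
            delta_out = e * y * (1.0 - y)
            delta_hid = (net.W2.T @ delta_out) * h * (1.0 - h)
            # Per-pattern (stochastic) weight updates.
            net.W2 -= lr * np.outer(delta_out, h)
            net.b2 -= lr * delta_out
            net.W1 -= lr * np.outer(delta_hid, x)
            net.b1 -= lr * delta_hid
        if total_error < target_error:
            break   # error over the entire set is acceptably low
    return total_error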
Network | Inputs | Hidden | Outputs |
net40sw.nnw | 600 LPC | 40 | 10 |
net50sw.nnw | 600 LPC | 50 | 10 |
nvq120t.nnw | 50 VQLPC | 120 | 10 |
nvq140t.nnw | 50 VQLPC | 140 | 10 |
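To tie such trained networks to recognition rates like those in Table 6, a small evaluation helper (again illustrative, not taken from the paper) can score the 200-pattern test set by picking the most active output unit:

def recognition_rate(net, X_test, T_test):
    # A word counts as recognized when the most active output unit
    # matches the unit that codes the spoken digit.
    correct = sum(
        int(np.argmax(net.forward(x)[1]) == np.argmax(t))
        for x, t in zip(X_test, T_test)
    )
    return 100.0 * correct / len(X_test)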