Mircea Giurgiu * Results on Automatic Speech Recognition in Romanian
WORD | 20 | 30 | 60 | 90 | 150 | 200 | 300 | 400 |
1 A
T | 100
100 | 100
100 | 100
100 | 100
100 | 100
100 | 100
100 | 100
100 | 100
100 |
2 A
T | 76.92
66.67 | 76.92
66.67 | 84.62
66.67 | 100
75.11 | 100
66.67 | 100
66.67 | 100
66.67 | 100
66.67 |
3 A
T | 84.62
76.92 | 84.62
76.92 | 100
92.31 | 100
84.62 | 100
92.31 | 100
92.31 | 100
92.31 | 100
92.31 |
4 A
T | 100
100 | 100
100 | 100
75.00 | 100
75.00 | 100
75.00 | 100
75.00 | 100
75.00 | 100
75.00 |
5 A
T | 100
100 | 100
100 | 100
92.31 | 100
100 | 100
100 | 100
100 | 100
100 | 100
100 |
6 A
T | 61.54
53.85 | 61.54
53.85 | 100
100 | 100
100 | 100
92.31 | 100
92.31 | 100
92.31 | 100
92.31 |
7A
T | 61.54
57.14 | 61.54
57.14 | 76.92
50.00 | 76.92
50.00 | 100
71.43 | 100
71.43 | 100
71.43 | 100
71.43 |
8 A
T | 46.14
69.23 | 46.15
69.23 | 76.92
76.92 | 84.62
69.23 | 84.62
69.23 | 84.62
69.23 | 84.62
69.23 | 84.62
69.23 |
9 A
T | 100
83.33 | 100
83.33 | 100
91.67 | 100
91.67 | 100
91.67 | 100
91.67 | 100
91.67 | 100
91.67 |
10A
T | 91.67
61.54 | 91.67
61.54 | 91.67
84.62 | 91.67
84.62 | 91.67
84.62 | 91.67
84.62 | 91.67
84.62 | 91.67
84.62 |
TOTAL
A
T | 81.89
76.38 | 81.89
76.38 | 92.91
82.68 | 95.28
83.46 | 97.64
84.25 | 97.64
84.25 | 97.64
84.25 | 97.64
84.25 |
The DTW approach was used primarily
to better understand the ASR problem, but it proved to be an
efficient technique for small vocabularies, where it offers a
very good temporal description. Recognition is accurate when the
average length of the references is approximately the same as
that of the input words. We proposed a DTW system that incorporates
VQ of the LPC patterns in order to reduce the amount of computation.
Unfortunately, the algorithms cannot model long-term correlations
in the speech wave, and adapting the system to a new speaker is
time-consuming.
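The template-matching idea behind DTW can be illustrated with a minimal Python sketch. It is not the system evaluated in this paper: it assumes a plain Euclidean local distance between feature vectors rather than VQ-coded LPC patterns, and the names `dtw_distance` and `recognize` are hypothetical.

```python
import math

def dtw_distance(ref, inp):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(ref), len(inp)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning ref[:i] with inp[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], inp[j - 1])  # local frame distance (assumed Euclidean)
            # Allowed moves: advance in ref, advance in inp, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(templates, inp):
    """Return the label of the reference template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(templates[label], inp))
```

Each comparison costs on the order of `len(ref) * len(inp)` local distance evaluations per template, which is consistent with the motivation for reducing computation stated above.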
Based on a statistical model of
production for each word, with a well-defined mathematical
description, HMMs are able to recognize words in a speaker-independent
manner. The vocabulary can be extended simply by adding a new HMM
to the database. The main problems of HMMs are the estimation
of parameters from insufficient training data, the scaling of
probabilities, and the computing time for training.
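The probability scaling problem mentioned above can be sketched as follows. This is a generic scaled forward pass for a discrete-observation HMM, shown only as an illustration under that assumption; the function names and the `(pi, A, B)` model layout are hypothetical and do not describe the exact models used in the experiments.

```python
import math

def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm; returns log P(obs | model).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : sequence of symbol indices
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Renormalise at every frame so alpha never underflows;
        # the log of the scale factors accumulates the log-likelihood.
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik

def recognize(models, obs):
    """Pick the word whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda word: forward_log_likelihood(*models[word], obs))
```

In this layout, extending the vocabulary amounts to adding one more entry to the `models` dictionary, which mirrors the remark about adding a new HMM to the database.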
The application of ANNs to speech recognition demonstrates
the ability of MLP structures to recognize speech patterns. To
cope with the variable duration of spoken words at the input
of the MLP, we proposed and tested the spectral segmentation
of the speech sequence. This is important for evaluating the
stationary segments of the speech wave, with some possible impact
on the phonetics of Romanian digits. Moreover, with segmentation
the recognition performance increases by 6% and the training
time decreases, compared to the standard MLP approach.
The influence of the ANN parameters on the recognition
rate was also investigated. We studied the influence of the
number of reference vectors in the SOFM and found that increasing
this number reduces the recognition error, but only up to 150
vectors. The results demonstrate the power of SOFM when applied
to the classification of speech and the applicability of
Constrained Clustering Segmentation (CCS) to segmenting speech
into significant acoustic segments. The training process is faster
than the backpropagation algorithm of the MLP and is unsupervised,
but a labeling process is needed for each entry in the training
set. The experiments presented in this framework reveal the
capacity of SOFM to recognize speech patterns better, and faster,
when CCS is used.
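To make the roles of unsupervised training and subsequent labeling concrete, here is a minimal Python sketch of a one-dimensional SOFM. It is an illustration only: the linear decay schedules, the Gaussian neighbourhood, the majority-vote labeling, and all function names are assumptions, not the configuration used in the experiments. The parameter `n_units` plays the role of the number of reference vectors.

```python
import math
import random
from collections import Counter

def best_matching_unit(units, x):
    """Index of the reference vector closest to x."""
    return min(range(len(units)), key=lambda k: math.dist(units[k], x))

def train_sofm(data, n_units, epochs=20, lr0=0.5, seed=0):
    """Unsupervised training of a 1-D map of n_units reference vectors."""
    rng = random.Random(seed)
    # Initialise reference vectors from randomly chosen training samples.
    units = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
            bmu = best_matching_unit(units, x)
            for k, w in enumerate(units):
                # Gaussian neighbourhood measured on the map index.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return units

def label_units(units, data, labels):
    """Label each unit by majority vote of the training vectors mapped to it."""
    votes = {}
    for x, lab in zip(data, labels):
        votes.setdefault(best_matching_unit(units, x), Counter())[lab] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}
```

Training never looks at the labels; they are used only in `label_units`, after the map has been formed, which corresponds to the labeling step noted above.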
Building on the experience accumulated
with ASR for isolated words, we would like to extend the work
to continuous speech recognition and to develop a practical
application, possibly using hardware platforms with dedicated
signal processors.
Acknowledgments.
The author would like to thank Prof. Toderean Gavril from the
Technical University of Cluj-Napoca for supervising the PhD work and
Dr. Dan Tufiş for his continuous efforts to highlight the
research potential in language technology. Special thanks are
also due to Prof. Antonio Rubio and Prof. Antonio Peinado from
Granada University, Spain, for their expert and valuable points
of view during the author's visits to their department.