Corneliu Burileanu & al * Text-to-Speech Synthesis for Romanian Language




For pitch detection we used a parallel processing approach based on an algorithm first proposed by Gold and later modified by Gold and Rabiner [19]. The main steps of the algorithm are the following:

  1. The speech signal is passed through a lowpass filter with a cutoff frequency of about 800 Hz. Special attention was paid to this filtering because it is crucial both for the accuracy of pitch detection and for the processing time. Figure 6 shows the magnitude response (in dB) of the FIR filter we used.

    Fig. 6. - The FIR filter response

  2. From the filtered signal, six impulse trains are derived: a train with impulses equal to the amplitude of each maximum, at the location of each peak; a train with impulses equal to the difference between the peak amplitude and the previous minimum amplitude, at the location of each peak; a train with impulses equal to the difference between two consecutive peak amplitudes, at the location of each peak; and three trains built similarly from the negative of the amplitude of each minimum, at the location of each valley.
  3. Each of the six impulse trains is processed by its own pitch period estimator, which uses a "peak detecting exponential window" algorithm.
  4. The six current estimates are combined with the two most recent estimates from each detector. These estimates are then compared, and the value with the most occurrences is declared the pitch period at that time. If there is an obvious lack of consistency among the estimates, the frame is declared "unvoiced" and the pitch value is set to zero. Figure 7 shows the pitch contour (in samples) for the utterance "Detecția perioadei fundamentale folosind metoda Rabiner" ("Pitch detection using the Rabiner method"). The most important remaining problem is voiced/unvoiced frame detection.
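
Steps 1 and 2 above can be illustrated with a minimal sketch. It is not the implementation used for the system described here: the 16 kHz sampling rate, the 101-tap length and the Hamming window are assumptions, since the text specifies only an FIR lowpass filter with a cutoff of about 800 Hz. The fifth and sixth trains follow the usual description of the Gold-Rabiner detector as we understand it (mirroring the second and third trains at the valleys), and clamping negative differences to zero and leaving a difference at zero when no earlier extremum exists are likewise assumptions. All function names are hypothetical.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc low-pass FIR taps (Hamming window), unity gain at DC."""
    fc = cutoff_hz / fs_hz              # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]

def fir_filter(x, taps):
    """Direct-form convolution; samples before the start are taken as zero."""
    return [sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]

def impulse_trains(x):
    """Return six impulse trains [m1..m6], each the same length as x."""
    m = [[0.0] * len(x) for _ in range(6)]
    prev_peak = prev_valley = None      # amplitudes of the last peak / valley seen
    for n in range(1, len(x) - 1):
        if x[n - 1] < x[n] >= x[n + 1]:             # local maximum
            m[0][n] = x[n]                          # peak amplitude
            if prev_valley is not None:
                m[1][n] = x[n] - prev_valley        # peak minus previous minimum
            if prev_peak is not None:
                m[2][n] = max(x[n] - prev_peak, 0.0)    # peak minus previous peak
            prev_peak = x[n]
        elif x[n - 1] > x[n] <= x[n + 1]:           # local minimum
            m[3][n] = -x[n]                         # negated valley amplitude
            if prev_peak is not None:
                m[4][n] = prev_peak - x[n]          # assumed mirror of m2
            if prev_valley is not None:
                m[5][n] = max(prev_valley - x[n], 0.0)  # assumed mirror of m3
            prev_valley = x[n]
    return m
```

In use, a frame would first be filtered with `fir_filter(frame, lowpass_fir(800.0, 16000.0))` and the result passed to `impulse_trains`. The pure-Python convolution is written for readability only; the text stresses that processing time matters, so a real system would use an optimized filter routine.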

Fig. 7. - Pitch contour for a sentence in the Romanian language.
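
The per-train estimator of step 3 and the decision of step 4 can be sketched in the same hedged way. The blanking interval, the decay rate, the matching tolerance and the minimum number of agreeing estimates are all hypothetical values chosen for illustration, and the fixed (non-adaptive) window is a simplification. Counting estimates that fall within a small tolerance of each other is one possible reading of "the value with the most occurrences", not necessarily the rule used to produce Figure 7.

```python
import math

def detect_periods(train, blank=20, decay=0.02):
    """Peak-detecting exponential window over one impulse train.

    After an impulse is accepted, the next `blank` samples are ignored;
    later impulses must exceed a threshold that decays exponentially from
    the accepted amplitude. Returns the spacings (in samples) between
    consecutive accepted impulses.
    """
    periods = []
    last_n, last_amp = None, 0.0
    for n, amp in enumerate(train):
        if amp <= 0:
            continue
        if last_n is None:                      # first impulse starts the window
            last_n, last_amp = n, amp
            continue
        elapsed = n - last_n
        if elapsed <= blank:                    # still inside the blanking interval
            continue
        threshold = last_amp * math.exp(-decay * (elapsed - blank))
        if amp > threshold:
            periods.append(elapsed)
            last_n, last_amp = n, amp
    return periods

def vote(recent, tolerance=2, min_agree=6):
    """Pick the pitch period from six detectors' three latest estimates.

    `recent` holds six lists of period estimates in samples. Returns the
    estimate supported by the most candidates, or 0 ("unvoiced") when
    fewer than `min_agree` candidates agree.
    """
    candidates = [p for history in recent for p in history if p > 0]
    best, best_count = 0, 0
    for c in candidates:
        count = sum(1 for p in candidates if abs(p - c) <= tolerance)
        if count > best_count:
            best, best_count = c, count
    return best if best_count >= min_agree else 0
```

A return value of 0 from `vote` corresponds to the "unvoiced" decision described in step 4. The choice of `min_agree` directly trades missed voiced frames against spurious pitch values, which is where the voiced/unvoiced problem noted above shows up in practice.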



5. Conclusions

General considerations concerning a text-to-speech system for the Romanian language were presented. First, we reviewed the terms and definitions related to automatic speech synthesis. The most important speech synthesis techniques were presented in a tutorial manner. Then, a general approach to a text-to-speech synthesis system was detailed. We described two different approaches for this task:




