Corneliu Burileanu & al * Text-to-Speech Synthesis for Romanian Language
A general text-to-speech system includes two main stages:
The basic steps involved in converting text to speech are illustrated in Figure 3.
Fig. 3. - Principal elements of a text-to-speech conversion system.
I. The linguistic preprocessing must convert the input text into a "normalized" form that can then be processed correctly (we will assume that an ASCII representation of each input sentence is available as input to the text analysis stage):
II. Syntactic analysis is important both for word pronunciation and as one of several components in the determination of prosody; at least a partial syntactic analysis is made in order to take some decisions about the syntactic structure of the sentence (identification of the part of speech of each word).
III. The next stage performs the letter-to-phoneme conversion; the phonemic representation is obtained with the use of a dictionary and letter-to-sound rules, so that orthographic characters are mapped into the appropriate strings of phonemes and their associated lexical stress markers.
In text-to-speech synthesis a reliable grapheme-to-phoneme conversion tool is an important component, whatever the synthesis method is. Most existing commercial TTS systems translate input text into a phonetic transcription by making a compromise between the use of letter-to-sound rules (sometimes hundreds of ordered rules) and a pronouncing dictionary (for the most common irregular words, for example). The development of a large set of rules for a given language is usually a very difficult and laborious task; the storage of a large dictionary and the time required to identify a word also raise many problems.
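The compromise between a pronouncing dictionary and ordered letter-to-sound rules can be sketched as follows. This is only a minimal illustration under stated assumptions: the names EXCEPTIONS, RULES and to_phonemes are hypothetical, the handful of rules and the phoneme symbols are toy placeholders, and none of it is taken from the system described in this paper.

```python
# Toy exception dictionary for words the rules below would get wrong.
# The entry and the phoneme symbols are illustrative assumptions only.
EXCEPTIONS = {
    "examen": ["e", "g", "z", "a", "m", "e", "n"],
}

# Ordered letter-to-sound rules as (grapheme string, phoneme list) pairs.
# The first matching rule wins, so longer graphemes come before shorter ones.
RULES = [
    ("che", ["k", "e"]),
    ("chi", ["k", "i"]),
    ("ce", ["tS", "e"]),
    ("ci", ["tS", "i"]),
    ("c", ["k"]),
    ("x", ["k", "s"]),
]


def to_phonemes(word):
    """Convert one word to a list of phoneme symbols.

    The dictionary is consulted first; otherwise the ordered rules are
    applied left to right. Letters not covered by any rule map to themselves.
    """
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])

    phonemes = []
    i = 0
    while i < len(word):
        for grapheme, phones in RULES:
            if word.startswith(grapheme, i):
                phonemes.extend(phones)
                i += len(grapheme)
                break
        else:
            # No rule matched at this position: default one-to-one mapping.
            phonemes.append(word[i])
            i += 1
    return phonemes


print(to_phonemes("cina"))    # rules only
print(to_phonemes("examen"))  # dictionary lookup
```

Checking the dictionary first keeps the rule set small, at the cost of storing and searching the exception list, which is the trade-off discussed above.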
Stress assignment rules are also an important constituent of TTS systems; incorrect lexical stress assignment is very unpleasant for the listener, because the stress pattern (which affects several acoustic dimensions) is a basic feature in the recognition of speech. Stress assignment rules are generally modeled after linguistic theories, but some simpler stress rules may be used if one knows only the number of syllables and basic syllabic structure.
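A simple stress rule of this kind, using only the syllable count and whether the final syllable is open or closed, might look like the sketch below. The specific rule (final stress if the last syllable ends in a consonant, penultimate stress otherwise) is a placeholder assumption chosen for illustration, not the rule set used by the authors; the input is assumed to be already syllabified, and the names VOWELS, assign_stress and mark_stress are hypothetical.

```python
VOWELS = set("aeiou")  # assumption: plain ASCII vowels only


def assign_stress(syllables):
    """Return the index of the stressed syllable.

    Placeholder rule: monosyllables are stressed; otherwise stress falls on
    the last syllable if it is closed (ends in a consonant) and on the
    penultimate syllable if it is open (ends in a vowel).
    """
    if not syllables:
        raise ValueError("word has no syllables")
    count = len(syllables)
    if count == 1:
        return 0
    last_is_open = syllables[-1][-1] in VOWELS
    return count - 2 if last_is_open else count - 1


def mark_stress(syllables):
    """Join syllables with '-' and prefix the stressed one with an apostrophe."""
    stressed = assign_stress(syllables)
    return "-".join(
        ("'" + syl) if i == stressed else syl
        for i, syl in enumerate(syllables)
    )


print(mark_stress(["ca", "sa"]))          # 'ca-sa
print(mark_stress(["vor", "bi", "tor"]))  # vor-bi-'tor
```

A rule this coarse will misplace stress on many words, so in practice it would serve as a fallback behind a dictionary carrying lexical stress markers.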