Attila Ferencz et al. * A Text-To-Speech System for the Romanian Language




4. Text preprocessing

A text preprocessing module is needed at the grapheme level to convert the incoming orthography into a linguistically reasonable standard form. Normal orthography contains many phenomena that must be handled: underlining, capitals, abbreviations containing periods, abbreviations containing no vowels, numbers (with and without a decimal point), fractions, Roman numerals, dates, times, formulas, and a wide variety of punctuation, including periods, commas, question marks, parentheses, quotation marks and hyphens. In our system the abbreviations are stored in a vocabulary that the user can extend, so field-specific abbreviations can be built into the system.
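A user-extensible abbreviation vocabulary of this kind could be sketched as follows. This is only an illustration under stated assumptions: the class name `AbbreviationVocabulary`, its methods, and the sample entries ("dr.", "nr.") are hypothetical and are not taken from the system described here, and the sketch handles only whitespace-separated tokens.

```python
class AbbreviationVocabulary:
    """Hypothetical user-extensible table of abbreviations."""

    def __init__(self, entries=None):
        # Keys are stored in lower case so that lookup ignores capitals.
        self._entries = {}
        for abbreviation, expansion in (entries or {}).items():
            self.add(abbreviation, expansion)

    def add(self, abbreviation, expansion):
        """Add or overwrite one entry (the user-extension step)."""
        self._entries[abbreviation.lower()] = expansion

    def expand(self, text):
        """Replace every whitespace-separated token found in the table."""
        tokens = text.split()
        return " ".join(self._entries.get(tok.lower(), tok) for tok in tokens)


# Illustrative entries only; a real vocabulary would be much larger.
vocabulary = AbbreviationVocabulary({"dr.": "doctor"})
vocabulary.add("nr.", "numărul")  # field-specific entry added by the user
expanded = vocabulary.expand("Dr. Popescu, nr. 5")
```

Keeping the table separate from the expansion logic is what allows new entries to be added without touching the rest of the preprocessing.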



5. Speech sound set

We used a set of 31 phonemes for the Romanian language. Appropriate symbols were used for the Romanian sounds, with the following exceptions: g1 (in ge, gi), g (in ghe, ghi), c (in ce, ci), k (in che, chi), a1 (for the Romanian letter ă), i1 (for î and â), s1 (for ș), t1 (for ț).



6. Grapheme to phoneme conversion

In order to transform the written text into phonetic form we designed a set of rules. These rules are alphabetized according to the first letter of the sequence. Each letter of the alphabet has its own rule block in the table. Such a block is organized like a saw-tooth triangle contour: it is widest at the top, where the longest rule is placed (having the highest number), and its peak is at the bottom, i.e. the last rule consists of only one letter, the initial letter of the block itself.

Examples:

the e sound rule block    the c sound rule block
es1ti=yes1tj1y            coop=koyop
este=yj1es1tey            cea=ca
exa=egza                  cio=co
eio=ej1o                  chi=ki
ea=yj1ay                  che=ke
el=yj1ely                 ci=ci
ei=yj1eiy                 ce=ce
e=e                       c=k

where y denotes a pause and j1 a special short i.
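The ordering of a rule block, longest rule first and single letter last, suggests a longest-match lookup. The following is a minimal sketch of that idea, not the actual implementation: it uses only the e and c rules quoted above, assumes that any letter without a rule block is copied unchanged, and ignores any context conditions (such as word boundaries) that the full rule table may impose. The names `build_blocks` and `to_phonemes` are hypothetical.

```python
# Rules copied from the e and c example blocks above.
RULE_STRINGS = [
    "es1ti=yes1tj1y", "este=yj1es1tey", "exa=egza", "eio=ej1o",
    "ea=yj1ay", "el=yj1ely", "ei=yj1eiy", "e=e",
    "coop=koyop", "cea=ca", "cio=co", "chi=ki",
    "che=ke", "ci=ci", "ce=ce", "c=k",
]


def build_blocks(rule_strings):
    """Group rules by first letter, longest grapheme sequence first."""
    blocks = {}
    for rule in rule_strings:
        graphemes, phonemes = rule.split("=")
        blocks.setdefault(graphemes[0], []).append((graphemes, phonemes))
    for block in blocks.values():
        block.sort(key=lambda pair: len(pair[0]), reverse=True)
    return blocks


def to_phonemes(text, blocks):
    """Apply the first (longest) matching rule at each position."""
    output = []
    i = 0
    while i < len(text):
        for graphemes, phonemes in blocks.get(text[i], []):
            if text.startswith(graphemes, i):
                output.append(phonemes)
                i += len(graphemes)
                break
        else:
            # Assumption: letters without a rule block map to themselves.
            output.append(text[i])
            i += 1
    return "".join(output)


BLOCKS = build_blocks(RULE_STRINGS)
phonemes_for_crocodil = to_phonemes("crocodil", BLOCKS)
```

Because the single-letter rule closes each block, the lookup always finds a match for a letter that has a block, which is what the triangular layout guarantees.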

Finally, the string of diphones is obtained. For example, the string corresponding to the word 'crocodil' is: yk kr ro ok ko od di il ly.
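The pairing step can be illustrated with a short sketch. It assumes the phonemes are already available as a list of symbols (so that two-character symbols such as s1 or j1 stay intact) and that a pause y is placed at both ends of the word, as in the 'crocodil' example; the function name `diphones` is hypothetical.

```python
def diphones(phonemes, pause="y"):
    """Pair each phoneme with its successor, padding with pauses."""
    padded = [pause, *phonemes, pause]
    return [first + second for first, second in zip(padded, padded[1:])]


# Phoneme symbols of 'crocodil' (c is realized as k before r and o).
crocodil = ["k", "r", "o", "k", "o", "d", "i", "l"]
diphone_string = " ".join(diphones(crocodil))
```

A word of n phonemes therefore yields n + 1 diphones, including the two that join the word to the surrounding pauses.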



7. Accent, intonation and rhythm

We need a set of prosodic rules to obtain synthesized speech close to natural speech. The three main aspects of prosody are accent, intonation and rhythm, related respectively to the energy, the frequency and the temporal aspect of the vocal signal.

In Romanian the word accent is free: it usually falls on one of the last two syllables of the word, but many words are accented elsewhere. Moreover, semantically different words can share the same orthography and differ only in the place of the accent. For example:

	cúrele (cure, plural)             curéle (belt, plural)
	vésela (gay, feminine, plural)    veséla (dishes)

7.1. Sentence intonation

When devising acceptable intonation for unrestricted text, a set of rules has to be formulated which produces natural sounding pitch contours for utterances that may have never been spoken.

We experimentally determined the pitch contour for different kinds of sentences (declaratives, questions, exclamations). For declarative sentences, the fundamental frequency rises on the first word from 100% to 140% of its value, falls to 125% over the last part of this word, and continues to fall until the end of the sentence, except on the last word, where it falls to 70% and remains constant.
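The declarative contour can be written down as a list of breakpoints, as in the sketch below. Only the percentages 100, 140, 125 and 70 come from the description above. The position of the peak inside the first word (`peak_pos`), the value reached just before the last word (`pre_final`), and the modelling of the final fall as an immediate step are assumptions made for illustration, and the function name is hypothetical.

```python
def declarative_contour(n_words, peak_pos=0.5, pre_final=110.0):
    """Return (position, F0 percent) breakpoints for a declarative sentence.

    Position is measured in words: word k spans the interval [k, k + 1].
    peak_pos and pre_final are illustrative assumptions, not source values.
    """
    if n_words < 2:
        raise ValueError("this sketch needs at least two words")
    points = [
        (0.0, 100.0),       # start of the first word
        (peak_pos, 140.0),  # rise on the first word
        (1.0, 125.0),       # fall over the last part of the first word
    ]
    if n_words > 2:
        # Gradual decline up to the start of the last word.
        points.append((float(n_words - 1), pre_final))
    # On the last word F0 drops to 70% and stays there.
    points.append((float(n_words - 1), 70.0))
    points.append((float(n_words), 70.0))
    return points


four_word_sentence = declarative_contour(4)
```

Values between breakpoints would be obtained by interpolation; the two breakpoints sharing the same position encode the step at the start of the last word.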

Questions may or may not contain a Q-word (a specific interrogative word). For the former, the fundamental frequency rises on this word from 100% to 160% and then returns to 100%. For the latter type of question we adopted a conventional pitch contour, but very subtle intonation effects cannot be handled.

7.2. Superimposing the effects of punctuation marks

Finally we superimpose the effect of the punctuation marks to obtain the sound waveform. The acoustic realization of the comma, semicolon and colon improves the naturalness of the resulting speech. In our system the fundamental frequency falls to 70% at the middle of the word preceding the comma, and rises to 120% at the end of this word.
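As a small worked example of the comma rule, the sketch below converts the two percentages into frequency targets for the word preceding the comma. The reference frequency `base_hz`, the use of word start and end times, and the function name `comma_targets` are assumptions introduced for illustration; only the 70% and 120% values are taken from the text.

```python
def comma_targets(base_hz, word_start, word_end):
    """Return (time, F0 in Hz) targets for the word before a comma."""
    middle = (word_start + word_end) / 2.0
    return [
        (middle, base_hz * 70 / 100),     # fall to 70% at mid-word
        (word_end, base_hz * 120 / 100),  # rise to 120% at the word end
    ]


# Hypothetical word lasting from 1.0 s to 2.0 s, reference F0 of 120 Hz.
targets = comma_targets(120.0, 1.0, 2.0)
```

With these hypothetical inputs the targets are 84 Hz at 1.5 s and 144 Hz at 2.0 s.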


