Romanian Language Technology

Amalia Todirașcu * A Unification-Based Model for Speech Generation

4. A multilingual system for speech generation

The last part describes a possible application of the model to speech generation. Languages differ widely in both structure and origin. Nevertheless, studies of several European languages [4] have concluded that these languages share a number of phonemes. Phonemes common to most languages are treated uniformly in the present framework; in addition, every language has a number of specific phonemes. The model is constraint-based: instead of rules, the system has a set of constraint descriptions. The set of constraints used in generation has two parts: constraints applicable in every language, which are used to generate the common phonemes, and a set of structures describing language-dependent constraints. The system is modelled on a comparison between English and Romanian. The constraints are applied by unifying the structures that describe them with the structure describing the text to be translated. The system can be extended to complete texts, because f-structures describe isolated words and complex phrases in the same way.
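The unification step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: feature structures are modelled as nested dictionaries, and the feature names (`graph`, `lang`, `phon`, `voiced`, `strength`) are hypothetical, not the attributes used in the model itself.

```python
from typing import Any, Dict, Optional

FStruct = Dict[str, Any]


def unify(a: FStruct, b: FStruct) -> Optional[FStruct]:
    """Return the unification of two feature structures, or None on a clash."""
    result = dict(a)
    for feature, b_val in b.items():
        if feature not in result:
            # Feature only present in b: simply add it.
            result[feature] = b_val
            continue
        a_val = result[feature]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            # Both values are structures: unify them recursively.
            sub = unify(a_val, b_val)
            if sub is None:
                return None
            result[feature] = sub
        elif a_val != b_val:
            # Two different atomic values: unification fails.
            return None
    return result


# Hypothetical structures, for illustration only.
segment = {"graph": "p", "lang": "ro"}
common_constraint = {"graph": "p", "phon": {"type": "consonant", "voiced": False}}
english_constraint = {"lang": "en", "phon": {"strength": "fortis"}}

print(unify(segment, common_constraint))   # succeeds: the constraint applies
print(unify(segment, english_constraint))  # None: the "lang" values clash
```

In this sketch a language-independent constraint carries no `lang` feature and therefore unifies with segments of any language, while a language-dependent constraint fails to unify with a segment of another language.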

For every language, a small exception dictionary is available, containing difficult words and their phonetic representations. A lexical entry has an HPSG-like representation. A new word is added to the dictionary after a test: its phonetic form is generated using the set of constraints, and this form is compared with the phonetic representation associated with the word.
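The dictionary test can be sketched as follows. The text does not say what happens after the comparison, so this sketch assumes that a word is stored only when the generated form differs from its reference form, which is what an exception dictionary suggests. The function name `add_if_exception`, the toy generator, and the toy transcriptions are all hypothetical.

```python
from typing import Callable, Dict


def add_if_exception(
    word: str,
    reference: str,
    generate: Callable[[str], str],
    exceptions: Dict[str, str],
) -> bool:
    """Store the word only if the constraints fail to produce its reference form.

    Assumption: a mismatch between generated and reference forms is what
    makes a word an exception. Returns True when the word was added.
    """
    if generate(word) == reference:
        return False  # the constraints already handle this word
    exceptions[word] = reference
    return True


def toy_generate(word: str) -> str:
    """Toy stand-in for constraint-based generation: one symbol per letter."""
    return word.lower()


exceptions: Dict[str, str] = {}
added_regular = add_if_exception("pat", "pat", toy_generate, exceptions)
added_irregular = add_if_exception("knee", "ni", toy_generate, exceptions)
print(added_regular, added_irregular, exceptions)
```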

The system can work in two ways:

a word from the dictionary is pronounced according to its corresponding orthographic representation;
a new word can be pronounced after the language has been specified and the corresponding orthographic form has been generated according to the set of constraints.
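The two modes of operation amount to a lookup with a fallback, which might be sketched as below. The names `lookup_or_generate`, `DICTIONARIES` and `GENERATORS` are hypothetical, and the per-language generator is a toy stand-in for generation from the set of constraints.

```python
from typing import Callable, Dict, Optional

# Hypothetical per-language exception dictionaries (toy entries).
DICTIONARIES: Dict[str, Dict[str, str]] = {"en": {"knee": "ni"}, "ro": {}}


def spell_out(word: str) -> str:
    """Toy stand-in for constraint-based generation of a word's form."""
    return "-".join(word.lower())


# Hypothetical mapping from a language to its generator.
GENERATORS: Dict[str, Callable[[str], str]] = {"en": spell_out, "ro": spell_out}


def lookup_or_generate(word: str, language: Optional[str]) -> str:
    """Mode 1: use the stored dictionary form. Mode 2: generate a new one."""
    if language is None or language not in GENERATORS:
        # A new word can only be handled once the language is specified.
        raise ValueError("the language must be specified")
    stored = DICTIONARIES.get(language, {}).get(word)
    if stored is not None:
        return stored
    return GENERATORS[language](word)


print(lookup_or_generate("knee", "en"))  # taken from the dictionary
print(lookup_or_generate("pat", "en"))   # generated by the toy generator
```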

A collection of phonemes is available, stored in f-structures of the types introduced earlier. To improve pronunciation, techniques for modifying speech parameters such as intonation and rhythm [5,6,7] are also described using the formalism.

Example: Most consonants in Romanian and English are described in the same way, using the representations introduced with the model.

English: p is a bilabial, fortis, voiceless, plosive consonant;
Romanian: p is a labial, voiceless, occlusive consonant.

The representation given below is a good description of the phoneme and covers the descriptions given in both languages:

The steps required to obtain speech synthesis from the initial text are:

the transformation from text to orthographic string, using context-dependency rules;
accent identification, after the correct identification of the syllables;
application of rhythm and intonation rules, in order to improve the pronunciation.
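The ordering of the three steps above can be sketched as a simple pipeline. Only the sequencing is taken from the text. The stage bodies are placeholders, and the names `context_stage`, `accent_stage`, `prosody_stage` and `run_pipeline` are hypothetical.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Stage = Callable[[State], State]


def context_stage(state: State) -> State:
    """Step 1 (placeholder): turn the text into a string of symbols."""
    return {**state, "string": list(state["text"].lower())}


def accent_stage(state: State) -> State:
    """Step 2 (placeholder): mark an accent position on the symbol string."""
    if "string" not in state:
        raise ValueError("accent identification needs the output of step 1")
    return {**state, "accent": 0}  # placeholder: always the first position


def prosody_stage(state: State) -> State:
    """Step 3 (placeholder): attach rhythm and intonation settings."""
    return {**state, "prosody": {"rhythm": "default", "intonation": "default"}}


STAGES: List[Stage] = [context_stage, accent_stage, prosody_stage]


def run_pipeline(text: str, stages: List[Stage]) -> State:
    """Apply the stages in order, each one extending the shared state."""
    state: State = {"text": text}
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline("Casa", STAGES)
print(result)
```

Keeping each step as a separate function makes the dependency explicit: accent identification cannot run before the first step has produced its string.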

The constraints specific to each language are divided into three sets:

constraints imposed by the context (context-dependency rules);
accent identification rules (depending on the four suprasegmental phonemes existing in each language);
rhythm and intonation rules (which are used in the case of continuous speech).
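One possible way to organise the constraints is a shared common set plus three language-specific sets per language, as in the sketch below. The constraint names are placeholder strings, and applying the common constraints before the language-specific ones is an assumption of this sketch, not something the text states.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageConstraints:
    """The three language-specific constraint sets (placeholder contents)."""
    context: List[str] = field(default_factory=list)
    accent: List[str] = field(default_factory=list)
    prosody: List[str] = field(default_factory=list)

    def ordered(self) -> List[str]:
        # Context first, then accent, then rhythm and intonation.
        return self.context + self.accent + self.prosody


# Hypothetical constraint inventory; the names carry no linguistic content.
COMMON: List[str] = ["common-phoneme-constraint"]
BY_LANGUAGE: Dict[str, LanguageConstraints] = {
    "en": LanguageConstraints(["en-context"], ["en-accent"], ["en-prosody"]),
    "ro": LanguageConstraints(["ro-context"], ["ro-accent"], ["ro-prosody"]),
}


def constraints_for(language: str) -> List[str]:
    """Common constraints followed by those of the given language (assumed order)."""
    return COMMON + BY_LANGUAGE[language].ordered()


print(constraints_for("ro"))
```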
