The last part describes a possible application of the model
to speech generation.
Languages differ widely in structure and in origin. Nevertheless,
studies of several European languages [4] have concluded that
some phonemes are common to those languages. These phonemes,
which are common to most languages, will be treated in a uniform
way in the present framework. In addition, every language has
a number of phonemes specific to it. The model
is constraint-based: instead of rules, the system has a set
of constraint descriptions. The constraints used in generation
fall into two parts: constraints applicable in every language,
used to generate the common phonemes, and a set of structures
describing language-dependent constraints.
The system is modelled on a comparison between English and
Romanian. The constraints are applied by unifying the structures
describing the constraints with the structure describing the
text to be translated. The system can be extended to complete
texts, because f-structures describe isolated words and complex
phrases in the same way.
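The unification step can be sketched as follows. This is a minimal
illustration only, assuming that f-structures are represented as
nested dictionaries; the function name unify, the feature names and
the sample constraints are hypothetical and are not taken from the
system described here.

```python
from typing import Any, Optional


def unify(a: Any, b: Any) -> Optional[Any]:
    """Unify two feature values; return None when they conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_value in b.items():
            if key in result:
                merged = unify(result[key], b_value)
                if merged is None:
                    return None  # conflicting values under the same feature
                result[key] = merged
            else:
                result[key] = b_value
        return result
    # Atomic values unify only when they are identical.
    return a if a == b else None


# Hypothetical constraint shared by every language.
common = {"letter": "p", "phoneme": {"voicing": "voiceless"}}
# Hypothetical language-dependent constraint (English).
english = {"letter": "p", "phoneme": {"place": "bilabial"}}
# Structure describing the input.
text = {"letter": "p"}

result = unify(unify(text, common), english)
clash = unify({"letter": "b"}, common)  # None: "b" conflicts with "p"
```

Because a word and a phrase would both be nested structures of this
kind, the same function applies unchanged to larger inputs.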
For every language, a small exception dictionary is available,
containing difficult words and their phonetic representations.
A lexical entry has an HPSG-like representation. A new word is
added to the dictionary after a test: the phonetic form is
generated using the set of constraints, and this form is compared
with the phonetic representation associated with the word.
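The test can be sketched as below. The text does not state what
happens after the comparison, so the sketch assumes that a word is
stored only when the generated form differs from its reference
representation, which is what would make it an exception. The function
add_if_exception, the placeholder generator and the sample
transcriptions are hypothetical.

```python
from typing import Callable


def add_if_exception(
    dictionary: dict[str, str],
    word: str,
    reference: str,
    generate: Callable[[str], str],
) -> bool:
    """Store `word` only when the constraints fail to reproduce `reference`.

    Assumption: a mismatch is what makes a word an exception.
    """
    if generate(word) == reference:
        return False  # the constraints already cover this word
    dictionary[word] = reference
    return True


def toy_generate(word: str) -> str:
    # Placeholder for constraint-based generation: one symbol per letter.
    return word


exceptions: dict[str, str] = {}
regular_added = add_if_exception(exceptions, "pat", "pat", toy_generate)
irregular_added = add_if_exception(exceptions, "el", "jel", toy_generate)
```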
The system can work in two ways:
- a word from the dictionary is pronounced according to its
stored phonetic representation;
- a new word can be pronounced after the language has been
specified and the corresponding phonetic form has been generated
according to the set of constraints.
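The two modes can be pictured as a lookup with a fallback. The sketch
below is an assumption about how they might be combined: it asks for
the language in both cases for simplicity, and the names
phonetic_form, dictionaries and generators are hypothetical.

```python
from typing import Callable


def phonetic_form(
    word: str,
    language: str,
    dictionaries: dict[str, dict[str, str]],
    generators: dict[str, Callable[[str], str]],
) -> str:
    """Return the stored form for a dictionary word, else generate one."""
    stored = dictionaries.get(language, {}).get(word)
    if stored is not None:
        return stored  # first mode: the word is in the exception dictionary
    return generators[language](word)  # second mode: apply the constraints


# Hypothetical data: one exception entry and a placeholder generator.
dictionaries = {"ro": {"el": "jel"}}
generators = {"ro": lambda word: word}

from_dictionary = phonetic_form("el", "ro", dictionaries, generators)
generated = phonetic_form("pat", "ro", dictionaries, generators)
```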
A collection of phonemes is available, stored in f-structures
of the types introduced earlier. To improve pronunciation, some
techniques for modifying speech parameters such as intonation
and rhythm [5,6,7] are also described using the formalism.
Example:
Most consonants in Romanian and English are described in the
same way, using the representations introduced with the model.
English: p is a bilabial,
fortis, voiceless, plosive consonant;
Romanian: p is a labial,
voiceless, occlusive consonant.
The representation given below
is a good description of the phoneme and covers the descriptions
given in both languages:
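One way to picture such a covering description is sketched below. It
is an illustration only and not the f-structure of the model. It
assumes that "bilabial" is a special case of "labial" and that
"plosive" and "occlusive" name the same manner of articulation; the
feature names (place, force, voicing, manner) are hypothetical.

```python
# Assumed links from a specific label to the label used in the
# shared description; these are not taken from the source.
MORE_GENERAL = {"bilabial": "labial", "plosive": "occlusive"}


def generalise(value: str) -> str:
    """Map a label to its more general counterpart, if one is assumed."""
    return MORE_GENERAL.get(value, value)


def shared_description(a: dict[str, str], b: dict[str, str]) -> dict[str, str]:
    """Keep the features on which both descriptions agree after generalising."""
    shared = {}
    for feature in a.keys() & b.keys():
        if generalise(a[feature]) == generalise(b[feature]):
            shared[feature] = generalise(a[feature])
    return shared


english_p = {
    "place": "bilabial",
    "force": "fortis",
    "voicing": "voiceless",
    "manner": "plosive",
}
romanian_p = {"place": "labial", "voicing": "voiceless", "manner": "occlusive"}

common_p = shared_description(english_p, romanian_p)
```

The feature given only in the English description (fortis) is left out
of the shared structure and would remain a language-dependent detail.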
The steps required to obtain synthesized speech from the initial
text are:
- the transformation from text to orthographic string, using
context-dependency rules;
- accent identification, after
the correct identification of the syllables;
- application of rhythm and
intonation rules, in order to improve the pronunciation.
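The ordering of these steps can be sketched as a chain of stages. The
stage bodies below are placeholders and do not implement the actual
rules; the names run_pipeline, to_string, find_syllables, mark_accent
and apply_prosody are hypothetical.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(text: str, stages: list[Stage]) -> dict:
    """Apply each stage in order, passing the accumulated state along."""
    state = {"text": text}
    for stage in stages:
        state = stage(state)
    return state


def to_string(state: dict) -> dict:
    # Placeholder for the context-dependency rules.
    return {**state, "string": state["text"].lower()}


def find_syllables(state: dict) -> dict:
    # Placeholder: treats the whole string as a single syllable.
    return {**state, "syllables": [state["string"]]}


def mark_accent(state: dict) -> dict:
    # Needs the syllables, so it must run after find_syllables.
    return {**state, "accent": 0 if state["syllables"] else None}


def apply_prosody(state: dict) -> dict:
    # Placeholder for the rhythm and intonation rules.
    return {**state, "prosody": "neutral"}


result = run_pipeline("Pat", [to_string, find_syllables, mark_accent, apply_prosody])
```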
The constraints specific to each
language are divided into three sets:
- constraints imposed by the context (context-dependency rules);
- accent identification rules
(depending on the four suprasegmental phonemes existing in each
language);
- rhythm and intonation rules
(which are used in the case of continuous speech).