Romanian Language Technology

Amalia Todirașcu * A Unification-Based Model for Speech Generation

The constraints that express the above rules are the negation of the action of those rules. Here are some examples of constraints that are applied when a hyphenation algorithm is used:

The rule "a syllable must contain at least one vowel" is represented as:
The rule VCV --> V- CV is represented as:
The Romanian rule "a syllable must contain only one vowel" has the following representation:

The constraints are structured in a hierarchy according to their priorities. A given delimitation of the syllables may violate one of the constraints and still be considered valid, provided that the violated constraint has a lower priority. Exception rules have high priority.

Example: In Romanian, the rules

VCC*V --> VC - C*V and
VCct*V --> VCc - t*V

are contradictory, but the second rule expresses an exception and therefore has the higher priority.
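The interaction of the two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the unification-based representation used in the model: the names `SplitRule`, `RULES` and `split_cluster` are hypothetical, priorities are plain integers (larger wins), uppercase C is read as "any consonant" while lowercase c and t are read as the literal letters, and the vowel inventory is a simplified orthographic one.

```python
import re
from dataclasses import dataclass

# Assumed (simplified) set of Romanian vowel letters.
VOWELS = "aeiouăâî"
CONS = f"[^{VOWELS}]"  # regex class for "any consonant letter"


@dataclass(frozen=True)
class SplitRule:
    name: str
    priority: int   # assumption: a larger value means a higher priority
    pattern: str    # regex over the consonant cluster between two vowels
    coda_len: int   # number of consonants kept with the left syllable


# Hypothetical encoding of the two rules from the example.
RULES = [
    SplitRule("VCC*V -> VC-C*V", 1, rf"{CONS}{{2,}}", 1),
    SplitRule("VCct*V -> VCc-t*V", 2, rf"{CONS}ct.*", 2),
]


def split_cluster(cluster, rules=RULES):
    """Split an intervocalic consonant cluster with the highest-priority
    rule that matches it; return None when no rule applies."""
    matching = [r for r in rules if re.fullmatch(r.pattern, cluster)]
    if not matching:
        return None
    best = max(matching, key=lambda r: r.priority)
    return cluster[:best.coda_len], cluster[best.coda_len:]


print(split_cluster("nct"))  # both rules match; the exception wins
print(split_cluster("rt"))   # only the general rule matches
```

Both rules match the cluster "nct", and the exception is chosen because of its higher priority; for "rt" only the general rule applies.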

The list of syllables is obtained in two steps: nucleus identification and constraint verification. The nuclei are identified as the most sonorous phonemes of the word, and the remaining phonemes are grouped around each nucleus. If the constraints are violated, each syllable that does not satisfy them is concatenated with the previous syllable (if one exists) or with the next one (depending on the case), and the new list of syllables is checked against the list of constraints again.

Example. A possible classification of the phonemes according to their sonority could be the following (in decreasing order of sonority):

English: open vowels; closed vowels; the liquids /l/ and /r/; the nasals /m, n, ŋ/; the fricatives, affricates and plosives, voiceless plosives. The vowels (open or closed), the liquids and the nasals can be the nucleus of a syllable.
Romanian: the vowels are more sonorous than the consonants. Only the vowels can be the nucleus.
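The two-step procedure (nucleus identification, then constraint verification with concatenation of the offending syllables) can be sketched for the Romanian case as follows. This is a minimal sketch under several assumptions: it works on letters rather than phonemes, it treats every vowel letter as a nucleus (so diphthongs and triphthongs are not handled), it checks only the constraint "a syllable must contain at least one vowel", and the function names are hypothetical.

```python
# Assumed (simplified) set of Romanian vowel letters.
VOWELS = set("aeiouăâî")


def has_vowel(syllable):
    """Constraint: a syllable must contain at least one vowel."""
    return any(ch in VOWELS for ch in syllable)


def group_around_nuclei(word):
    """Step 1: every vowel is a nucleus; the consonants that precede it
    are grouped with it. Trailing consonants form a leftover chunk."""
    chunks, current = [], ""
    for ch in word:
        current += ch
        if ch in VOWELS:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)
    return chunks


def repair(chunks, constraints):
    """Step 2: concatenate each violating chunk with the previous one
    (or with the next one if it is the first), then verify again."""
    chunks = list(chunks)
    changed = True
    while changed and len(chunks) > 1:
        changed = False
        for i, chunk in enumerate(chunks):
            if all(check(chunk) for check in constraints):
                continue
            if i > 0:
                chunks[i - 1:i + 1] = [chunks[i - 1] + chunk]
            else:
                chunks[0:2] = [chunk + chunks[1]]
            changed = True
            break  # restart verification on the new list
    return chunks


def syllabify(word):
    return repair(group_around_nuclei(word), [has_vowel])


print(syllabify("student"))
```

For "student", step 1 yields the chunks "stu", "de" and "nt"; the last one contains no vowel, so step 2 concatenates it with the previous chunk. Cluster-splitting rules such as VCC*V --> VC - C*V are deliberately left out of this sketch.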

The constraints to be satisfied are language-dependent. When consonants and vowels alternate, the constraint system works well. Ambiguities appear in the case of diphthongs and triphthongs. When ambiguities of this kind arise and the constraints are not effective, an exception dictionary is used to hyphenate the word correctly. A hash table makes access to the dictionary very fast. The exception dictionary contains about 1,000 Romanian words, and its entries are also modelled in HPSG style. The model can easily be adapted to other languages: the structures describing the constraints must be replaced with the constraints of the new language, and a new exception dictionary must be created. The size of the dictionary is language-dependent.
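The role of the exception dictionary can be illustrated with a short Python sketch, in which a built-in `dict` serves as the hash table. Everything here is an assumption made for illustration: the two entries are sample words chosen for their vowel sequences and are not taken from the dictionary described in the paper, the entries are plain lists rather than HPSG-style structures, and `hyphenate` and its `rule_based` parameter are hypothetical names.

```python
# Illustrative entries only (not the actual exception dictionary).
EXCEPTIONS = {
    "ploaie": ["ploa", "ie"],
    "biologie": ["bi", "o", "lo", "gi", "e"],
}


def hyphenate(word, rule_based, exceptions=EXCEPTIONS):
    """Look the word up in the exception dictionary first; fall back to
    the rule-based (constraint-driven) syllabifier otherwise."""
    entry = exceptions.get(word.lower())  # average O(1) hash lookup
    if entry is not None:
        return list(entry)  # copy, so callers cannot alter the dictionary
    return rule_based(word)


# Usage with a trivial stand-in for the constraint-based syllabifier.
print(hyphenate("ploaie", rule_based=lambda w: [w]))
```

Adapting such a lookup to another language would amount to swapping in a different dictionary and a different rule-based fallback, which mirrors the replacement of constraints and exception dictionary described above.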

The correct identification of the syllables of a word is important in the context of speech generation. Syllables may be stressed or unstressed, and can be grouped into rhythmic groups (or stress groups). The syllables identified by the method described above are then processed to identify the stressed ones and the rhythmic groups.

Each language has its own rules for the position of the accent. A syllable that carries the accent is made prominent by the action of stress, pitch, quality and quantity. Stress and pitch are two of the suprasegmental phonemes, which can be identified by the same criteria as the segmental ones.
