Svetlana Cojocaru * Romanian Lexicon: Tools, Implementation, Usage




2.2.2. Nouns and adjectives

We have established criteria for classifying nouns and adjectives into three inflexion groups: automated, partially automated and irregular. To inflect a word it is necessary to know its vowel and consonant alternations and its affix series. Absolute regular alternations are, in turn, divided into two groups: absolute automatic, when inflections are produced without alternations or with consonant alternations, and semiautomatic, characterised by vowel and consonant alternations.

After the affix is separated from the base word, the absolute regular automatic alternation rules specified for nouns and adjectives are applied (for example, the consonant alternations for masculine nouns, or the vowel alternations ea/e and ia/ie for nouns and adjectives). The partial regular alternations require a detailed context analysis; sometimes the user must be asked for additional information. Words with irregularities form a separate set. They are singled out in advance and processed in a special way.
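The application of a vowel alternation after affix separation can be illustrated with a minimal sketch. It is not the implementation described here: the function names `apply_alternation` and `inflect_with_alternation` are hypothetical, and the assumption that the alternation affects the last occurrence of the vowel group in the root is ours. The example words fereastră/ferestre (ea/e) and piatră/pietre (ia/ie) are standard Romanian forms chosen for illustration.

```python
def apply_alternation(root: str, source: str, target: str) -> str:
    """Replace the last occurrence of `source` in `root` with `target`.

    Assumption for this sketch: the alternation affects the vowel group
    closest to the affix. The root is returned unchanged if `source`
    does not occur in it.
    """
    pos = root.rfind(source)
    if pos == -1:
        return root
    return root[:pos] + target + root[pos + len(source):]


def inflect_with_alternation(word, old_affix, new_affix, alternation=None):
    """Strip `old_affix`, optionally alternate the root, append `new_affix`."""
    if not word.endswith(old_affix):
        raise ValueError(f"{word!r} does not end in {old_affix!r}")
    root = word[:len(word) - len(old_affix)]
    if alternation is not None:
        source, target = alternation
        root = apply_alternation(root, source, target)
    return root + new_affix


# Illustrative usage: feminine nouns in -ă with plural in -e.
print(inflect_with_alternation("fereastră", "ă", "e", ("ea", "e")))
print(inflect_with_alternation("piatră", "ă", "e", ("ia", "ie")))
```

A real rule set would also need the context conditions under which each alternation applies, which this sketch deliberately leaves out.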

The word to be inflected is entered, and its grammatical category is determined. If the word is a noun, its gender and number are specified. For each word, the basic form (indefinite form, nominative case, singular number) is specified. The word is divided into root and affix as follows. The affixes specified in [3,5,6] are arranged in decreasing order of length. If the noun proposed for inflection belongs to the set of irregular words, all its inflected forms appear on the screen immediately; otherwise, the ending of the word is compared with the affixes in decreasing order of length. When it coincides with one of the affixes, the corresponding inflection procedure is called.
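The lookup order described above (irregular set first, then affixes from longest to shortest) might be sketched as follows. The affix list, the irregular entry and the names `split_root_affix` and `lookup` are assumptions made for illustration; the actual affix series are those specified in [3,5,6].

```python
# Hypothetical data: a tiny irregular set and a handful of affixes.
IRREGULAR = {"om": ["om", "oameni"]}
AFFIXES = ["ă", "e", "ie", "tor"]


def split_root_affix(word, affixes):
    """Return (root, affix) using the longest affix that ends the word.

    Affixes are tried in decreasing order of length, so "ie" is
    preferred over "e" for a word such as "familie".
    """
    for affix in sorted(affixes, key=len, reverse=True):
        if word.endswith(affix) and len(word) > len(affix):
            return word[:-len(affix)], affix
    return word, ""  # no known affix matched


def lookup(word):
    """Irregular words return stored forms; others return root and affix."""
    if word in IRREGULAR:
        return ("irregular", IRREGULAR[word])
    return ("by_affix", split_root_affix(word, AFFIXES))


print(lookup("om"))
print(lookup("familie"))
print(lookup("muncitor"))
```

In a fuller version, the affix returned by `split_root_affix` would serve as the key that selects the inflection procedure.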

The word's affix serves as the distinguishing criterion. If the key-affix belongs to the absolute regular set of nouns or adjectives (those inflected without the user's intervention), the specified word is declined in accordance with the inflection model found. If the key-affix belongs to the set of partial regular nouns or adjectives, the appropriate alternation rules must be selected from several possible variants. In this case a dialogue is initiated in which the user is asked to select the suitable variant or to add a new one. To simplify the user's work, the inflection programs generate all possible variants of application of the alternation rules. Some of the generated words may seem strange, but seeing them side by side makes the selection easy. For example, when the word dulap is inflected, the procedure suggests two variants for the neuter noun, plural number, nominative case: dulapuri and dulape (the latter by analogy with fir, fire). The user selects the suitable word, dulapuri. After that, the corresponding procedure produces the other necessary inflections.
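The dialogue step for partial regular words can be sketched as candidate generation followed by a selection callback. This is only an illustration under stated assumptions: the table `PLURAL_ENDINGS` and the functions `plural_candidates` and `select_plural` are hypothetical, and the `choose` callback stands in for the interactive dialogue with the user.

```python
# Hypothetical table: competing plural endings per gender.
PLURAL_ENDINGS = {
    "neuter": ["uri", "e"],
}


def plural_candidates(word, gender):
    """Generate every plural variant allowed by the competing endings."""
    return [word + ending for ending in PLURAL_ENDINGS[gender]]


def select_plural(word, gender, choose):
    """Return the chosen plural form.

    `choose` receives the list of candidates and returns the index of the
    suitable one; with a single candidate no dialogue is needed.
    """
    candidates = plural_candidates(word, gender)
    if len(candidates) == 1:
        return candidates[0]
    return candidates[choose(candidates)]


# The user is shown ["dulapuri", "dulape"] and picks the first variant.
print(select_plural("dulap", "neuter", lambda options: 0))
```

Passing the selection as a callback keeps the generation logic separate from the user interface, so the same code could be driven by a console prompt or by a stored answer.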

The word inflection program shows all generated forms on the screen, and the user can edit them before they are written to the vocabulary database.

In [6] we obtained statistical data which indicate the degree of automation of the inflexion process. These data show that 88% of the nouns and adjectives can be declined automatically and only 12% need a dialogue.

3. Romanian spelling pack

Romanian Spelling Pack consists of the following components:

We discuss these components of RomPW in more detail in further subsections.

3.1. Hyphenation function

The Hyphenation function does not depend on the orthography variant of a word and does not use vocabularies. The function has two parameters. The first parameter is a pointer to a zero-terminated string (C-string) containing a word to be hyphenated. The second parameter is the pointer to the buffer reserved for the result. Syllables in the result are separated by hyphens.

The algorithm takes into account, as far as possible, the classical rules of syllable division, which are based on the phonetic value of letters [8]. The specific character of the Romanian language does not allow these rules to be formalised completely. The problem of diphthongs and triphthongs, which is the most difficult one in syllable division for Romanian, is solved only for some specific situations, namely when the diphthongs are at the beginning or at the end of the word, and for some cases in the middle of the word.

