Romanian Language Technology

Svetlana Cojocaru, Romanian Lexicon: Tools, Implementation, Usage

2.1. Romanian word inflection: static method

Let us consider a scattered context grammar rule:

$$[/]^{*}\,[N_1]\; a_1 \neg b_1\, a_2 \ldots a_{n-1} \neg b_{n-1}\, a_n \;\rightarrow\; a'_1 \neg b_1\, a'_2 \ldots a'_{n-1} \neg b_{n-1}\, a'_n\; [N_2]$$

where $a_i$, $a'_i$ are arbitrary words and each $b_i$ is either a nonempty word or the special symbol *. $N_1$ and $N_2$ are ending-set numbers. The rule is interpreted as follows.

Let $w$ be a lemma. Each sign / cuts the last letter from $w$. The word $v$ obtained after these deletions is treated as a root if $N_1$ is not empty, and $N_1$ is then its index in the ending-set list $L$. In any case, $v$ must have the form

$$v = f_0 a_1 f_1 a_2 f_2 \ldots a_{n-1} f_{n-1} a_n f_n,$$

where each $f_i$ is an arbitrary (possibly empty) word and, for $i = 1, 2, \ldots, n-1$, $f_i$ does not contain the veto subword $b_i$. If more than one such representation exists, the first one is selected, scanning $v$ from left to right (or from right to left if the sign # is present). The special symbol * in place of $b_i$ admits an arbitrary $f_i$.

In this context the parallel substitution

$$a_1, a_2, \ldots, a_n \rightarrow a'_1, a'_2, \ldots, a'_n,$$

is applied, generating a new root $v' = f_0 a'_1 f_1 a'_2 f_2 \ldots a'_{n-1} f_{n-1} a'_n f_n$, whose ending-set number is $N_2$.
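The interpretation above can be sketched in Python. The `Rule` class and its field names are a hypothetical encoding chosen for illustration, not the paper's data format. "The first representation" is assumed here to mean the one whose positions for $a_1, \ldots, a_n$ come first in left-to-right order, found by backtracking; the # sign is emulated by reversing all words, and missing veto words are treated as *. The example cuts one letter from the lemma "fată" and applies the alternation a → e (as in the plural "fete").

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical encoding of one scattered context rule."""
    cut: int                    # number of '/' signs: letters cut from the end of w
    a: List[str]                # context words a_1 .. a_n
    a_new: List[str]            # replacements a'_1 .. a'_n
    veto: List[Optional[str]] = field(default_factory=list)  # b_1 .. b_{n-1}; None means '*'
    n1: Optional[int] = None    # ending set of v (None if N_1 is empty)
    n2: int = 0                 # ending set of the new root v'
    from_right: bool = False    # True if the '#' sign is present


def _find(v: str, a: List[str], veto: List[Optional[str]], i: int, start: int) -> Optional[List[int]]:
    """Leftmost start positions of a[i:] in v[start:], respecting veto words (backtracking)."""
    if i == len(a):
        return []
    pos = v.find(a[i], start)
    while pos != -1:
        # The gap between a[i-1] and a[i] must not contain the veto word veto[i-1].
        if i > 0 and veto[i - 1] is not None and veto[i - 1] in v[start:pos]:
            return None  # a later position would only lengthen the gap
        rest = _find(v, a, veto, i + 1, pos + len(a[i]))
        if rest is not None:
            return [pos] + rest
        pos = v.find(a[i], pos + 1)
    return None


def apply_rule(w: str, rule: Rule) -> Optional[Tuple[str, str]]:
    """Return (v, v') for lemma w, or None if v has no admissible representation."""
    v = w[:len(w) - rule.cut]
    # Missing veto words are treated as '*' (no restriction).
    veto = list(rule.veto) + [None] * (len(rule.a) - 1 - len(rule.veto))
    text, a, a_new = v, rule.a, rule.a_new
    if rule.from_right:  # scan right to left by reversing every word
        text = v[::-1]
        a = [x[::-1] for x in reversed(a)]
        a_new = [x[::-1] for x in reversed(a_new)]
        veto = [b[::-1] if b is not None else None for b in reversed(veto)]
    positions = _find(text, a, veto, 0, 0)
    if positions is None:
        return None
    parts, prev = [], 0
    for pos, old, new in zip(positions, a, a_new):
        parts.append(text[prev:pos])  # f_{i-1}, copied unchanged
        parts.append(new)             # a_i replaced by a'_i
        prev = pos + len(old)
    parts.append(text[prev:])         # f_n
    new_root = "".join(parts)
    if rule.from_right:
        new_root = new_root[::-1]
    return v, new_root


# Lemma "fată": cut one letter, then a -> e (root "fat", new root "fet").
print(apply_rule("fată", Rule(cut=1, a=["a"], a_new=["e"], n1=1, n2=2)))  # ('fat', 'fet')
```

Backtracking is needed because taking the earliest occurrence of one $a_i$ can make a later veto test fail, while a later occurrence might still succeed.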

Thus, to generate the word forms of a lemma whose group number is known, it suffices to interpret the corresponding grammar rules. Following the classification in [4], grammar rules can be built for every group. Sometimes more than two roots arise, and more than one grammar rule is needed. Experiments showed that 866 grammar rules and 320 ending sets suffice to decompose the Romanian vocabulary.
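Once the roots and their ending-set numbers are known, the word forms follow by concatenating each root with each ending of its set. A minimal sketch, assuming a hypothetical two-entry ending-set table written by hand for the noun "fată" (roots "fat" and "fet"); these are not the 320 ending sets of the actual lexicon, and `word_forms` is an illustrative name:

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-made ending sets for illustration only.
ENDING_SETS: Dict[int, List[str]] = {
    1: ["ă", "a"],
    2: ["e", "ele", "ei", "elor"],
}


def word_forms(roots: List[Tuple[str, int]], ending_sets: Dict[int, List[str]]) -> List[str]:
    """Concatenate every root with every ending of its ending set."""
    forms = []
    for root, set_number in roots:
        for ending in ending_sets[set_number]:
            forms.append(root + ending)
    return forms


print(word_forms([("fat", 1), ("fet", 2)], ENDING_SETS))
# ['fată', 'fata', 'fete', 'fetele', 'fetei', 'fetelor']
```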

2.2. Romanian word inflection: dynamic method

To inflect a word, one needs to know:

the vowel and consonant alternations;
the contexts in which the alternation rules apply;
the affix series.

The tables of affix series, the alternation sets, and their admissible combinations [3,5,6,7] form the basis of the inflection programs. Below we consider these processes for each grammatical category.
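A minimal sketch of how these three ingredients might combine, assuming a hypothetical representation in which an alternation is an (old, new) substring pair applied to the rightmost occurrence in the root, and each cell of an affix series lists the alternations its context triggers. The data for the present indicative of "a purta" (port, porți, poartă, purtăm, purtați, poartă) are hand-written for illustration and are not taken from the tables in [3,5,6,7].

```python
from typing import List, Tuple

Alternation = Tuple[str, str]  # (old, new): substring of the root to replace


def alternate(root: str, alt: Alternation) -> str:
    """Apply one alternation to the rightmost occurrence of `old` in the root."""
    old, new = alt
    i = root.rfind(old)
    if i == -1:
        raise ValueError(f"alternation {alt} does not apply to {root!r}")
    return root[:i] + new + root[i + len(old):]


def inflect(root: str, cells: List[Tuple[List[Alternation], str]]) -> List[str]:
    """For each cell, apply its alternations to the root, then append its affix."""
    forms = []
    for alternations, affix in cells:
        r = root
        for alt in alternations:
            r = alternate(r, alt)
        forms.append(r + affix)
    return forms


# Hypothetical data: present indicative of "a purta" (root "purt").
PURTA_PRESENT = [
    ([("u", "o")], ""),               # 1sg  port
    ([("u", "o"), ("t", "ț")], "i"),  # 2sg  porți
    ([("u", "oa")], "ă"),             # 3sg  poartă
    ([], "ăm"),                       # 1pl  purtăm
    ([], "ați"),                      # 2pl  purtați
    ([("u", "oa")], "ă"),             # 3pl  poartă
]

print(inflect("purt", PURTA_PRESENT))
# ['port', 'porți', 'poartă', 'purtăm', 'purtați', 'poartă']
```

Keeping alternations separate from affixes lets one alternation set be reused across many affix series, which is the point of storing their admissible combinations rather than full paradigms.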

2.2.1. The Verb

The conjugation schemes of verbs are described in [3,5]. The main models reduce to 56 conjugation schemes, and from these the scheme of a given verb can be determined. For this purpose we introduce two simple definitions.

Definition 1. Let S be the infinitive of a verb (without the particle "a"). The verb root is the word R = Substr(S, 1, Length(S) - k), where k = 2 for verbs of the second conjugation and k = 1 otherwise.

Definition 2. The preaffix of length n (n ≤ Length(R)) of a root R is the string P(R, n) = Substr(R, Length(R) - (n - 1), n), i.e. the last n letters of R.
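Both definitions translate directly into code. A sketch in Python, assuming S is given without the particle "a"; the helper `substr` is a hypothetical stand-in for a Pascal-style Substr with a 1-based start index:

```python
def substr(s: str, start: int, length: int) -> str:
    """Pascal-style Substr: `length` characters of s starting at 1-based `start`."""
    return s[start - 1:start - 1 + length]


def verb_root(s: str, conjugation: int) -> str:
    """Definition 1: R = Substr(S, 1, Length(S) - k)."""
    k = 2 if conjugation == 2 else 1
    return substr(s, 1, len(s) - k)


def preaffix(r: str, n: int) -> str:
    """Definition 2: P(R, n) = Substr(R, Length(R) - (n - 1), n), the last n letters."""
    if not 0 <= n <= len(r):
        raise ValueError("n must satisfy 0 <= n <= Length(R)")
    return substr(r, len(r) - (n - 1), n)


print(verb_root("căpăta", 1), verb_root("vedea", 2), preaffix("apăs", 3))  # căpăt ved păs
```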

Using these notions we can determine the conjugation schemes. For example:

A verb is conjugated according to the model of "a căpăta" if Length(R) > 3 & R[Length(R) - 1] = "ă" & (R[Length(R) - 3] = "ă" or R[Length(R) - 4] = "ă").
One can easily verify that this condition is satisfied by verbs such as "a cățăra", "a scărmăna", "a scăpăra", etc.
Verbs satisfying the condition P(R, 2) ∈ {"fl", "cr", "pl", "nu", "tr", "bl", "tu"} are conjugated according to the model of "a afla"; the verbs "a insufla" and "a consacra" belong to this set. An exception is the verb "a lătra".
Verbs satisfying the condition P(R, 3) ∈ {"păs", "făs", "făt", "băr", "măt", "băt", "văț", "făț", "ărs", "păl", "săl"} follow the model of "a apăsa".
For example, the verbs "a învăța", "a înfășa", "a vărsa", etc. are conjugated according to this model.
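The three conditions can be combined into a small classifier. A sketch under stated assumptions: the rules are tried in the order given above (the text does not fix an order), the exception list contains only "lătra", all these verbs are first-conjugation verbs (k = 1), and the function names are hypothetical. The 1-based index R[Length(R) - j] becomes r[len(r) - j - 1] in Python.

```python
from typing import Optional

EXCEPTIONS = {"lătra"}  # hypothetical list; handled separately from the rules

AFLA_PREAFFIXES = {"fl", "cr", "pl", "nu", "tr", "bl", "tu"}
APASA_PREAFFIXES = {"păs", "făs", "făt", "băr", "măt", "băt",
                    "văț", "făț", "ărs", "păl", "săl"}


def verb_root(s: str, conjugation: int) -> str:
    """Definition 1: drop 2 letters for the second conjugation, otherwise 1."""
    k = 2 if conjugation == 2 else 1
    return s[:len(s) - k]


def preaffix(r: str, n: int) -> str:
    """Definition 2: the last n letters of r (caller ensures n <= len(r))."""
    return r[len(r) - n:]


def is_capata_model(r: str) -> bool:
    n = len(r)
    # The n > 4 guard prevents r[-1] from silently wrapping around.
    return (n > 3 and r[n - 2] == "ă"
            and (r[n - 4] == "ă" or (n > 4 and r[n - 5] == "ă")))


def conjugation_model(s: str, conjugation: int = 1) -> Optional[str]:
    """Return the model verb for infinitive s, or None if no rule applies."""
    if s in EXCEPTIONS:
        return None
    r = verb_root(s, conjugation)
    if is_capata_model(r):
        return "căpăta"
    if len(r) >= 2 and preaffix(r, 2) in AFLA_PREAFFIXES:
        return "afla"
    if len(r) >= 3 and preaffix(r, 3) in APASA_PREAFFIXES:
        return "apăsa"
    return None


print(conjugation_model("scărmăna"), conjugation_model("insufla"), conjugation_model("vărsa"))
# căpăta afla apăsa
```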

Such rules were formulated for all verb conjugation schemes. Exceptions, such as auxiliary and defective verbs, are of course handled separately.

All the information needed to assign a verb to a grammatical group is determined procedurally from the ending of its infinitive; for verbs of groups I and IV, however, the system must also know whether the verb is conjugated with or without a suffix. Another highly relevant piece of information is whether the verb is marked [±personal]: depending on the answer, the function generates 40 or 12 inflected forms, respectively. These answers are obtained through a dialogue between the user and the system; once this information is provided, inflection proceeds automatically.
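The dialogue can be sketched as a function that asks only the questions it needs. This assumes that groups I to IV are the four conjugations, recognised by the infinitive endings -a, -ea, -e and -i/-î (ignoring exceptions such as "a crea", which belongs to group I), and that the suffix question concerns forms like "lucrez" or "citesc"; `conjugation_group`, `verb_dialogue` and the `ask` callback are hypothetical names standing in for the real user interface.

```python
from typing import Callable, Tuple


def conjugation_group(infinitive: str) -> int:
    """Guess the group from the infinitive ending (simplified, no exceptions)."""
    if infinitive.endswith("ea"):
        return 2
    if infinitive.endswith("a"):
        return 1
    if infinitive.endswith("e"):
        return 3
    if infinitive.endswith(("i", "î")):
        return 4
    raise ValueError(f"unrecognised infinitive ending: {infinitive!r}")


def verb_dialogue(infinitive: str, ask: Callable[[str], bool]) -> Tuple[int, bool, int]:
    """Return (group, with_suffix, number_of_forms), asking the user only when needed."""
    group = conjugation_group(infinitive)
    with_suffix = False
    if group in (1, 4):  # only groups I and IV need the suffix question
        with_suffix = ask("Is the verb conjugated with a suffix?")
    personal = ask("Is the verb [+personal]?")
    return group, with_suffix, 40 if personal else 12


# Example: "a lucra" (lucrez), answering yes to every question.
print(verb_dialogue("lucra", lambda question: True))  # (1, True, 40)
```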

We conclude that formalizing all cases and subcases is a rather difficult problem. Moreover, given the development and extension of a natural language, and the use of different forms for the same grammatical category (for example, comenzi and comanzi), the program's output must be examined and, if necessary, corrected by the user; only then may the inflected forms be included in a dictionary.