Svetlana Cojocaru * Romanian Lexicon: Tools, Implementation, Usage
Let us consider a scattered context grammar rule:
$$[/]^{*}[N_1]\, a_1 \neg b_1 a_2 \ldots a_{n-1} \neg b_{n-1} a_n \;\rightarrow\; a'_1 \neg b_1 a'_2 \ldots a'_{n-1} \neg b_{n-1} a'_n [N_2]$$
where $a_i$, $a'_i$ are arbitrary words and each $b_i$ is either a nonempty word or the special symbol $*$. The $N_j$ are endings set numbers. The rule is interpreted as follows.
Let $w$ be a lemma. Every sign / cuts the last letter from $w$. The word $v$ obtained after the deletions is treated as a root (if $N_1$ is not empty), and $N_1$ is its index in the endings set list $L$. In any case the word $v$ must have the form
$$f_0 a_1 f_1 a_2 f_2 \ldots a_{n-1} f_{n-1} a_n f_n,$$
where every $f_i$ is an arbitrary (possibly empty) word that, for $i = 1, 2, \ldots, n-1$, does not contain the veto subword $b_i$. If more than one representation of this kind exists, the first one is selected, scanning $v$ from left to right, or from right to left if the sign # is present. The special character $*$ in place of $b_i$ admits an arbitrary $f_i$.
In this context the parallel substitution
$$a_1, a_2, \ldots, a_n \rightarrow a'_1, a'_2, \ldots, a'_n$$
is performed, generating a new root $v' = f_0 a'_1 f_1 a'_2 f_2 \ldots a'_{n-1} f_{n-1} a'_n f_n$, and $N_2$ is its endings set number.
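The interpretation above can be sketched in a few lines of Python. This is only an illustrative reading of the rule, not the implementation used by the author. The names `apply_rule` and `_search` are hypothetical, `None` plays the role of the symbol * in the veto list, and only left-to-right scanning is covered (the variant triggered by the sign # is omitted). The sample rule for the lemma "carte" is invented for illustration and is not taken from the 866 rules mentioned in the text.

```python
from typing import List, Optional, Sequence


def _search(v: str, parts: Sequence[str], vetoes: Sequence[Optional[str]],
            i: int, start: int) -> Optional[List[int]]:
    """Return start positions for parts[i:], scanning v from `start`.

    vetoes[i - 1] is the word forbidden in the gap before parts[i];
    None stands for the symbol * (any gap is allowed).
    """
    if i == len(parts):
        return []
    veto = vetoes[i - 1] if i > 0 else None  # the gap f_0 is unrestricted
    pos = v.find(parts[i], start)
    while pos != -1:
        if veto is not None and veto in v[start:pos]:
            # Every later position only widens this gap, so stop here.
            return None
        rest = _search(v, parts, vetoes, i + 1, pos + len(parts[i]))
        if rest is not None:
            return [pos] + rest
        pos = v.find(parts[i], pos + 1)
    return None


def apply_rule(lemma: str, cuts: int, parts: Sequence[str],
               vetoes: Sequence[Optional[str]],
               new_parts: Sequence[str]) -> Optional[str]:
    """Cut `cuts` final letters, then replace each a_i by a'_i in parallel.

    Returns the new root v', or None if the rule does not apply.
    """
    if len(new_parts) != len(parts) or len(vetoes) != max(len(parts) - 1, 0):
        raise ValueError("inconsistent rule description")
    if cuts > len(lemma):
        return None
    v = lemma[:len(lemma) - cuts]          # one letter removed per '/' sign
    positions = _search(v, parts, vetoes, 0, 0)
    if positions is None:
        return None
    out, cursor = [], 0
    for pos, old, new in zip(positions, parts, new_parts):
        out.append(v[cursor:pos])           # the unchanged gap f_i
        out.append(new)                     # a'_i replaces a_i
        cursor = pos + len(old)
    out.append(v[cursor:])                  # the trailing gap f_n
    return "".join(out)


# Hypothetical rule: one cut, a -> ă and t -> ț, no veto between them.
print(apply_rule("carte", 1, ["a", "t"], [None], ["ă", "ț"]))  # cărț
```

The search backtracks over the positions of each $a_i$ in increasing order, so the first factorization it returns is the leftmost one. A right-to-left variant could be obtained by running the same search on the reversed word and reversed parts.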
So, in order to generate word-forms knowing the lemma's group number, it is sufficient to interpret the corresponding grammar rules. According to the classification in [4], it is possible to build the grammar rules for every group. Sometimes more than two roots arise and more than one grammar rule is necessary. The experiment showed that 866 grammar rules and 320 ending sets were sufficient to achieve vocabulary decomposition for the Romanian language.
2.2. Romanian words inflection: dynamic method
To inflect one word it is necessary to know
The affix series tables, the alternation sets and their admissible combinations [3,5,6,7] form the basis of the inflection programs. We will consider these processes for each grammatical category.
2.2.1. The Verb
The conjugation schemes of verbs are determined in [3,5]. The main models are reduced to 56 conjugation schemes. Based on these schemes we can determine the corresponding one for a given verb. For this purpose we introduce two simple definitions.
Definition 1. The verb root is a word R=Substr(S,1,Length(S)-k), where k=2 for the verbs of the second conjugation, k=1 for other cases.
Definition 2. The preaffix of length n of root R (n ≤ Length(R)) is the substring P(R,n)=Substr(R,Length(R)-(n-1),n), that is, the last n letters of R.
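The two definitions translate directly into code. The sketch below assumes that S is the infinitive written without the particle "a", that Substr is 1-based and takes a start position and a length, and that the caller already knows whether the verb belongs to the second conjugation. The function names `substr`, `verb_root` and `preaffix` are illustrative only.

```python
def substr(s: str, start: int, length: int) -> str:
    """1-based Substr(s, start, length), as used in the definitions."""
    return s[start - 1:start - 1 + length]


def verb_root(infinitive: str, second_conjugation: bool) -> str:
    """Definition 1: drop k final letters, k = 2 for the second conjugation."""
    k = 2 if second_conjugation else 1
    return substr(infinitive, 1, len(infinitive) - k)


def preaffix(root: str, n: int) -> str:
    """Definition 2: the last n letters of the root, with n <= Length(R)."""
    if not 1 <= n <= len(root):
        raise ValueError("n must satisfy 1 <= n <= Length(R)")
    return substr(root, len(root) - (n - 1), n)


print(verb_root("cânta", False))   # cânt
print(verb_root("vedea", True))    # ved
print(preaffix("cânt", 2))         # nt
```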
Using these notions we can determine the conjugation schemes. For example:
Such conjugation rules were formulated for all verb conjugation schemes. Of course, there are exceptions, such as auxiliary and defective verbs, which are treated separately.
The information needed to assign a verb to one grammatical group or another is determined procedurally from the ending of the verb in the infinitive. For verbs of grammatical groups I and IV, however, it is also necessary to know whether the verb conjugates with or without a suffix. Another highly relevant piece of information is whether the verb is marked [±personal]: depending on the answer, the function generates 40 or 12 flexions respectively. The answers are obtained through a dialogue between the user and the system. Once this information is provided, the inflection process is carried out automatically.
Consequently, we concluded that the formalization of cases and subcases is a rather difficult problem. Moreover, given the development and extension of a natural language and the use of different forms for the same grammatical category (for example, comenzi and comanzi), it is clear that the results of the program's work must be examined and, if necessary, corrected by the user, and only after that may the flexions be included in a dictionary.