Dan Tufiş, Ana-Maria Barbu * A Reversible and Reusable Morpho-Lexical Description of Romanian
The feature structure for a word occurrence, which is the output of the morpho-lexical analysis or the input expected by the morpho-lexical generator, contains information provided jointly by the heading, the stem and the ending. The consistency of this information is ensured through feature-structure unification.
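As a rough illustration of how unification keeps the three sources of information consistent, the sketch below merges nested dictionaries and fails on conflicting atomic values. It is a simplification, not the EGLU mechanism: it has no variables or shared substructures, and the feature names, the lemma and the function `unify` are assumptions introduced only for this example.

```python
from functools import reduce


class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting values."""


def unify(a, b):
    """Return the unification of two feature structures (nested dicts)."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            # Shared features must unify recursively; new ones are added.
            result[key] = unify(result[key], value) if key in result else value
        return result
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} conflicts with {b!r}")


# Hypothetical contributions of the heading, the stem and the ending.
heading = {"cat": "verb", "type": "main"}
stem = {"cat": "verb", "lemma": "cânta"}
ending = {
    "tense": "past-perfect",
    "mood": "indicative",
    "agr": {"number": "singular", "person": 1},
}

word = reduce(unify, [heading, stem, ending])
```

Here `word` collects the features of all three parts, while an attempt to unify `{"cat": "verb"}` with `{"cat": "noun"}` raises `UnificationFailure`, which is how an incongruent segmentation would be rejected in this sketch.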
As previously mentioned, in the EGLU implementation, the Romanian lexicon is a finite state automaton where the states contain information (equations and relational abstractions) and the transitions are labelled with strings of characters corresponding to word segments. Within a state, successor states are indicated with the special character '$'. A lexicon is thus distributed among a number of 'sublexicons', some of which are starting points for analysis and generation, while others are 'continuation classes'.
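The organisation into sublexicons and continuation classes can be sketched as a small recursive analyser. This is only an approximation under stated assumptions: the sublexicon name `verb_stems`, the stem `cânt` and the feature names are invented for illustration, the continuation class is named after the partial paradigm of Figure 3, and a plain dictionary merge stands in for unification.

```python
# Each sublexicon maps a word segment (a transition label) to a pair:
# (information carried by the target state, names of its continuation classes).
# "verb_stems" and the stem "cânt" are illustrative assumptions.
LEXICON = {
    "verb_stems": {
        "cânt": ({"lemma": "cânta", "cat": "verb"}, ["indic_mmcperf_1"]),
    },
    "indic_mmcperf_1": {
        "asem": ({"tense": "past-perfect", "number": "singular", "person": 1}, []),
        "ase": ({"tense": "past-perfect", "number": "singular", "person": 3}, []),
    },
}


def analyse(word, sublexicon="verb_stems", features=None):
    """Return every feature set obtained by consuming the whole word."""
    features = features or {}
    analyses = []
    for segment, (info, continuations) in LEXICON[sublexicon].items():
        if not word.startswith(segment):
            continue
        rest = word[len(segment):]
        merged = {**features, **info}  # stand-in for unification
        if not rest and not continuations:
            analyses.append(merged)  # final state reached, input exhausted
        for name in continuations:
            analyses.extend(analyse(rest, name, merged))
    return analyses
```

Calling `analyse("cântasem")` follows the stem transition and then the matching ending in the continuation class. A bare stem such as `"cânt"` yields no analysis because no ending has been consumed.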
Having analysed the specific attributes of each part of speech, we used 20 grammatical categories for the morpho-syntactic description of Romanian. The classification took into account not only the requirements of morpho-lexical processing, but also the granularity needed by syntactic parsing and generation. The grammatical and derivational morphology of Romanian is specified by means of several global paradigms, each of them being a combination of partial paradigms.
The verb morphology is encoded by means of 48 global grammatical paradigms, three of them being specific to the auxiliaries (see Figure 2). A global paradigm is made up of several partial paradigms, each corresponding to a simple mood and tense (there are 107 such partial paradigms). Figure 3 shows the encoding of one of the six partial paradigms corresponding to the indicative past-perfect. Besides the grammatical verbal paradigms (exhaustively encoded), we considered the 27 most frequent and productive lexical paradigms, which attach lexical suffixes to verbal stems. These suffixes change the grammatical category of the verb by yielding nouns, adjectives and adverbs. Figure 4 shows such a lexical paradigm, which allows a noun to be derived from a verbal stem.
# paradigm verb1
- !Verb !type(main)
$indic_prez_1
$indic_imperf_1
$indic_perfsim_1
$indic_mmcperf_1
$conj_prez_1
$imper_prez_1
$infin_prez_1
$part_1
$gerund_2
Fig. 2. - Global verbal paradigm.

# paradigm indic_mmcperf_1
- !VTensed(past-perfect,indicative)
asem     v  {+past}  !my_Vagr(singular,1,_)
aseşi    v  {+past}  !my_Vagr(singular,2,_)
ase      v  {+past}  !my_Vagr(singular,3,_)
aserăm   v  {+past}  !my_Vagr(plural,1,_)
aserăţi  v  {+past}  !my_Vagr(plural,2,_)
aseră    v  {+past}  !my_Vagr(plural,3,_)
Fig. 3. - Partial verbal paradigm.
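Read in the generation direction, the partial paradigm in Figure 3 amounts to a lookup from agreement values to an ending. The sketch below transcribes the six endings and their number and person values from the figure; the function `generate`, the example stem `cânt`, and the decision to ignore the third, unspecified argument of `my_Vagr` are assumptions made for illustration only.

```python
# Endings with their (number, person) values, transcribed from Fig. 3.
INDIC_MMCPERF_1 = [
    ("asem", "singular", 1),
    ("aseşi", "singular", 2),
    ("ase", "singular", 3),
    ("aserăm", "plural", 1),
    ("aserăţi", "plural", 2),
    ("aseră", "plural", 3),
]


def generate(stem, number, person):
    """Attach the past-perfect ending that matches the agreement values."""
    for ending, ending_number, ending_person in INDIC_MMCPERF_1:
        if (ending_number, ending_person) == (number, person):
            return stem + ending
    raise ValueError(f"no ending for {number!r}, person {person!r}")


# Hypothetical stem, used only to exercise the table.
forms = [generate("cânt", n, p) for _, n, p in INDIC_MMCPERF_1]
```

The same table could serve analysis by matching endings against the tail of a word form, which is the sense in which a single declarative description supports both directions.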
# paradigm verb_suf4
!Encl(_,imperson) !common !rol(denom)
ar  n  {+a}{-b}  $nom_fem8
ăr  n  {-a}{+b}  $nom_fem8
Fig. 4. - A lexical paradigm for verbal stems.