Dan Tufiș, Ana-Maria Barbu * A Reversible and Reusable Morpho-Lexical Description of Romanian
A lexical entry is a complex feature structure whose encoding differs slightly between the two formats we used. In the FAVR format (see Figure 6), besides a flat attribute-value structure, we used some GPSG-flavoured attributes.
[ROOT:abandon V:+ N:- BAR:0 AUX:- IMPERS:- PRDM:verb21
 PRD:@ FIN:@ VFORM:@ TENSE:@ PERS:@ PLU:@
 [VOICE:{active reflexive}
  [SUBCAT:<np_nom np_acc>
   SEM:<PRED:abandona AGENT:np_nom PACIENT:np_acc>]
  [SUBCAT:<np_nom np_acc np_dat>
   SEM:<PRED:abandona AGENT:np_nom PACIENT:np_acc BENEFICIARY:np_dat>]]
 [VOICE:passive
  [SUBCAT:<np_nom pp_de>
   SEM:<PRED:abandona AGENT:pp_de PACIENT:np_nom>]
  [SUBCAT:<np_nom np_dat pp_de>
   SEM:<PRED:abandona AGENT:pp_de PACIENT:np_nom BENEFICIARY:np_dat>]]]
Fig. 6. - Lexical entry example in FAVR format.
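To make the nesting of such an entry easier to follow, the sketch below mirrors the Figure 6 entry as a plain Python dictionary. This is only an illustration, not FAVR or EGLU syntax: the key names VOICES and FRAMES and the two helper functions are hypothetical, and the "@" values are copied from the figure without any interpretation.

```python
# Hypothetical Python mirror of the Fig. 6 entry; the key names VOICES and
# FRAMES are invented here to label the nested blocks of the figure.
ABANDON = {
    "ROOT": "abandon", "V": "+", "N": "-", "BAR": 0,
    "AUX": "-", "IMPERS": "-", "PRDM": "verb21",
    # "@" values are copied verbatim from the figure, without interpretation.
    "PRD": "@", "FIN": "@", "VFORM": "@", "TENSE": "@", "PERS": "@", "PLU": "@",
    "VOICES": [
        {"VOICE": {"active", "reflexive"},
         "FRAMES": [
             {"SUBCAT": ["np_nom", "np_acc"],
              "SEM": {"PRED": "abandona", "AGENT": "np_nom",
                      "PACIENT": "np_acc"}},
             {"SUBCAT": ["np_nom", "np_acc", "np_dat"],
              "SEM": {"PRED": "abandona", "AGENT": "np_nom",
                      "PACIENT": "np_acc", "BENEFICIARY": "np_dat"}},
         ]},
        {"VOICE": {"passive"},
         "FRAMES": [
             {"SUBCAT": ["np_nom", "pp_de"],
              "SEM": {"PRED": "abandona", "AGENT": "pp_de",
                      "PACIENT": "np_nom"}},
             {"SUBCAT": ["np_nom", "np_dat", "pp_de"],
              "SEM": {"PRED": "abandona", "AGENT": "pp_de",
                      "PACIENT": "np_nom", "BENEFICIARY": "np_dat"}},
         ]},
    ],
}


def frames_for(entry, voice):
    """Return the SUBCAT lists listed under the given voice."""
    return [frame["SUBCAT"]
            for block in entry["VOICES"] if voice in block["VOICE"]
            for frame in block["FRAMES"]]


def roles_are_grounded(entry):
    """Check that every SEM role except PRED points at a SUBCAT member."""
    return all(
        filler in frame["SUBCAT"]
        for block in entry["VOICES"]
        for frame in block["FRAMES"]
        for role, filler in frame["SEM"].items()
        if role != "PRED"
    )
```

Calling frames_for(ABANDON, "passive") lists the two passive frames, in which the agent is realised by pp_de rather than np_nom.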
The EGLU encoding, described in the rest of this section, uses minimal embedding, which is hidden from the lexicographer by means of relational abstractions (for a discussion of the expressive power of relational abstractions, as well as an overall description of EGLU, see [17]). Physically, in our EGLU implementation, the description of the lexical stock is spread over several files, each of which encodes a certain type of information relevant to the morpho-lexical description of Romanian. When these separate descriptions are compiled, the coreferential information is unified and the result is associated with the headword of the corresponding lexical entry.

The words (in fact, stems and lemmas) belonging to the inflecting categories (noun, verb, adjective, indefinite pronoun and adjective, demonstrative pronoun, demonstrative adjective, relative/interrogative pronoun, article and numeral) are each described in two files: the first contains the morphological information and the second the lexical information associated with the respective lexicon entry. For the non-inflecting grammatical categories, only the lemma forms and the lexical information attached to them need to be specified. A very simple (but quite inefficient) solution has been adopted for prefixes: for each prefix we define its continuation lexicons (that is, the categories of words it can attach to), and each lexical entry specifies which prefixes it can take.
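The compilation step and the prefix treatment described above can be approximated with a small Python sketch. It is a simplification under stated assumptions, not the EGLU compiler: feature structures are flat dictionaries, both sources are keyed by the same headword (the paper distinguishes stems from lemmas), and the attribute names CAT and PREFIXES, the table PREFIX_CONTINUATIONS and all sample data are made up for illustration.

```python
def unify(left, right):
    """Unify two flat feature dicts; return None if any value clashes."""
    result = dict(left)
    for attr, value in right.items():
        if attr in result and result[attr] != value:
            return None
        result[attr] = value
    return result


def compile_lexicon(morph_entries, lex_entries):
    """Attach the unified description to each headword found in both sources."""
    compiled = {}
    for headword, morph in morph_entries.items():
        lex = lex_entries.get(headword)
        if lex is None:
            continue  # no lexical description available for this headword
        merged = unify(morph, lex)
        if merged is not None:
            compiled[headword] = merged
    return compiled


# Hypothetical continuation lexicons: the categories each prefix may attach to.
PREFIX_CONTINUATIONS = {"re": {"verb"}, "ne": {"adjective"}}


def prefix_allowed(prefix, entry):
    """A prefix must be licensed by its continuation lexicon and by the entry."""
    return (entry.get("CAT") in PREFIX_CONTINUATIONS.get(prefix, set())
            and prefix in entry.get("PREFIXES", ()))


# Made-up sample data standing in for one morphological and one lexical file.
MORPH = {"abandon": {"CAT": "verb", "PRDM": "verb21"}, "orphan": {"CAT": "noun"}}
LEX = {"abandon": {"CAT": "verb", "PRED": "abandona", "PREFIXES": ("re",)}}
COMPILED = compile_lexicon(MORPH, LEX)
```

The double check in prefix_allowed reflects the two places where prefix information is stated; it also hints at why the solution is inefficient, since every entry has to enumerate its own prefixes.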
The master file of the lexical stock definition has the following structure (note the inclusion of several files):
    # Language romanian
    #############
    ## Including general settings
    #############
    # include header
    # include helpers
    # include display
    # include lookup
    #############
    ## Including affixes paradigms
    #############
    # include prefixes
    # include romanian_morphology
    #############
    ## Including lexical relational abstractions
    #############
    # include romanian_lexis
    #############
    ## Including nominal, adjectival, verbal... stems associated
    ## with their corresponding inflectional description
    #############
    # include noun_morphological_entries
    # include adjective_morphological_entries
    # include verb_morphological_entries
    ...
    #############
    ## Including nominal, adjectival, verbal... lemmas associated
    ## with their corresponding lexical descriptions
    #############
    # include noun_lexical_entries
    # include adjective_lexical_entries
    # include verb_lexical_entries
    ...
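As an illustration of how such a master file could be resolved into a single description, the following Python sketch expands the include lines recursively. It assumes, from the listing alone, that an inclusion is written as "# include" followed by one file name and that all other lines are passed through unchanged; the function expand_includes, the in-memory files mapping and the sample contents are hypothetical, and the actual EGLU include semantics may differ.

```python
def expand_includes(name, files, active=None):
    """Return the lines of `name`, replacing each '# include X' by X's lines.

    `files` maps a file name to its list of lines (an in-memory stand-in for
    the file system); `active` tracks the current include chain so that a
    circular inclusion is reported instead of recursing forever.
    """
    active = set() if active is None else active
    if name in active:
        raise ValueError(f"circular include: {name}")
    active.add(name)
    expanded = []
    for line in files[name]:
        parts = line.split()
        if len(parts) == 3 and parts[0] == "#" and parts[1] == "include":
            expanded.extend(expand_includes(parts[2], files, active))
        else:
            expanded.append(line)  # comments and ordinary content pass through
    active.discard(name)
    return expanded


# Made-up miniature lexical stock with placeholder file contents.
FILES = {
    "master": ["# Language romanian",
               "## Including general settings",
               "# include header",
               "# include noun_lexical_entries"],
    "header": ["header line"],
    "noun_lexical_entries": ["noun entry 1", "noun entry 2"],
}
EXPANDED = expand_includes("master", FILES)
```

Because inclusion order is preserved, the affix paradigms and relational abstractions are seen before the stem and lemma files that refer to them, which matches the ordering of the master file.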