Teodor Vușcan, Emma Tamâianu, Sanda Cherata * SILEX - a Lexico-Morphological Software for Romanian




The common attributes of all entries are:

  1. the lemma;
  2. the lexico-grammatical class (morpho-lexical category) of the lemma; the values of this attribute correspond to the traditional classification and entail the specific attributes of each class;
  3. the root/roots of the paradigm; the dictionary entries are ordered according to the values of this attribute;

    For inflectional words, some additional attributes are specified:

  4. the code of the list of endings attached to the paradigm (or to a subset of the paradigm);
  5. the paradigm (or the sub-paradigm) corresponding to the root; if there is a single root for the whole paradigm (for instance, in the case of the word cas), then the value of this attribute is total; otherwise, the sub-paradigm to which the root belongs is specified (in coded form);
  6. for each lexico-grammatical class, a set of specific attributes is given (for instance, the gender for nouns; the type of pronoun: personal, relative, etc.; the type of verb: predicative, auxiliary, copula, etc.).
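The attributes listed above can be pictured as a simple record. The following is only a minimal sketch in Python, not the actual SILEX data structure: the class name `Entry`, the field names, the sample values and the endings-list code `1` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Hypothetical record mirroring attributes 1-6 of a dictionary entry."""
    lemma: str                           # 1. the lemma
    word_class: str                      # 2. the lexico-grammatical class
    root: str                            # 3. the root of the (sub-)paradigm
    endings_code: Optional[int] = None   # 4. inflectional words only
    paradigm: Optional[str] = None       # 5. "total" or a coded sub-paradigm
    specific: dict = field(default_factory=dict)  # 6. class-specific attributes


# Illustrative entries; the attribute values are assumptions, not SILEX data.
entries = [
    Entry("sau", "conjunction", "sau"),  # non-inflectional: no endings list
    Entry("elev", "noun", "elev", endings_code=1, paradigm="total",
          specific={"gender": "masculine"}),
    Entry("demn", "adjective", "demn", endings_code=1, paradigm="total"),
]

# The text states that entries are ordered by the value of the root attribute.
ordered = sorted(entries, key=lambda entry: entry.root)
```

Keeping the class-specific attributes in a separate mapping reflects the point that their set depends on the value of the class attribute.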

Optimization of the dictionary entries

In order to reduce the number of dictionary entries without reducing the number of words that the system is able to recognize, we decided not to include the following categories of words as separate dictionary entries:

  1. verb participles, including the adjective-participles (e.g. elev citit); in this way, approximately 5000 entries are saved;
  2. nouns derived from the long infinitive; in this way, 5000 more entries are saved;
  3. nouns and adjectives derived from a verbal root with the suffix -tor (e.g. muncitor, muncitoare); thus, approximately 7000 entries are saved;
  4. nouns homographic to adjectives (e.g. calmant, diagonal, tonic);
  5. nouns, adjectives and verbs derived from a verbal root with the prefixes ne- and re- (e.g. a rescrie, necitit, recitit);
  6. adjectives derived from other adjectives with the prefix ne- (e.g. demn - nedemn).

In addition, for mobile nouns there is a single entry in the dictionary, the one corresponding to the masculine gender (e.g. for the pair elev/elevă, only the entry corresponding to the word elev is included in the dictionary).

Words that do not have their own entry in the dictionary are recognized by inflectional algorithms. Besides saving memory space, this solution also has the merit of reflecting more closely the dynamism of lexical derivation.
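The idea of recognizing derived words without storing them can be illustrated for the prefixes ne- and re-. The sketch below is a simplified assumption, not the SILEX algorithm: it works on whole lemmas rather than on roots and endings, and the names `LEXICON`, `PREFIX_RULES` and `recognize` are hypothetical. It also restricts ne- to adjectives and re- to verbs, which covers only part of the categories listed above.

```python
# Hypothetical mini-lexicon: only base words have their own entries.
LEXICON = {"demn": "adjective", "scrie": "verb"}

# Simplifying assumption: which classes each prefix may attach to.
PREFIX_RULES = {"ne": {"adjective"}, "re": {"verb"}}


def recognize(word):
    """Return (base, prefix, word_class), or None if the word is unknown."""
    # A word with its own entry is recognized directly.
    if word in LEXICON:
        return (word, None, LEXICON[word])
    # Otherwise, try to strip a productive prefix and look up the base.
    for prefix, allowed_classes in PREFIX_RULES.items():
        base = word[len(prefix):]
        if word.startswith(prefix) and LEXICON.get(base) in allowed_classes:
            return (base, prefix, LEXICON[base])
    return None
```

With this lexicon, `recognize("nedemn")` is resolved through the entry for demn, although nedemn itself is not stored.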

2.2. The lists of word endings

Besides the internal dictionary, SILEX contains, for each inflectional lexico-grammatical class, a database with the lists of word endings belonging to that class. Thus, there are databases with the lists of endings for nouns, adjectives, pronouns and verbs. Each list of endings is referred to by a number associated with it. The information in the SILEX dictionary, together with that contained in the lists of endings, makes possible the recognition and the inflection of Romanian words, as well as a large number of lexical derivations.
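The interplay between roots and numbered lists of endings can be sketched as follows. This is a minimal illustration under stated assumptions: the list number `1`, its contents and the names `ENDINGS`, `ROOTS`, `inflect` and `recognize` are hypothetical and do not reproduce the actual SILEX databases.

```python
# Hypothetical lists of endings, keyed by the number that identifies each list.
ENDINGS = {
    1: ["", "ul", "ului", "i", "ii", "ilor"],
}

# Hypothetical dictionary fragment: root -> (lemma, class, endings-list number).
ROOTS = {"elev": ("elev", "noun", 1)}


def inflect(root):
    """Generate all forms of a root by appending the endings of its list."""
    _lemma, _word_class, code = ROOTS[root]
    return [root + ending for ending in ENDINGS[code]]


def recognize(word):
    """Try every root/ending split; keep those licensed by the root's list."""
    analyses = []
    for cut in range(len(word), 0, -1):
        root, ending = word[:cut], word[cut:]
        if root in ROOTS:
            lemma, word_class, code = ROOTS[root]
            if ending in ENDINGS[code]:
                analyses.append((lemma, word_class, ending))
    return analyses
```

The same two tables serve both directions: `inflect` generates forms from a root, while `recognize` analyses a form back into a root and an ending.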

2.3. The auxiliary dictionaries

The SILEX system has to meet the following main requirements:

To meet these requirements, SILEX has some auxiliary dictionaries, external and independent of any specific application. These dictionaries are used to produce automatically, for each application, the specific and most appropriate internal dictionary. An internal dictionary is active only during the execution of the application that uses it.

When defining the structures of the auxiliary dictionaries, there are no space and/or access time constraints. These structures are defined so that:

The auxiliary dictionaries are structured according to morphological classes; each class has its corresponding auxiliary dictionary. There are automatic procedures for entering new words into the dictionaries; these procedures ensure the correctness, consistency and coherence of both the data entered by the user and the dictionaries themselves.
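As a rough illustration of per-class auxiliary dictionaries with checked input, and of deriving an internal dictionary from them, consider the sketch below. Everything in it is an assumption: the text does not describe the actual checks or the generation procedure, and the names `KNOWN_ENDINGS`, `new_auxiliary`, `add_word` and `build_internal` are hypothetical.

```python
# Hypothetical: which endings-list numbers exist for each inflectional class.
KNOWN_ENDINGS = {"noun": {1}, "adjective": {1}, "pronoun": set(), "verb": set()}


def new_auxiliary():
    """One auxiliary dictionary (lemma -> data) per morphological class."""
    return {word_class: {} for word_class in KNOWN_ENDINGS}


def add_word(aux, word_class, lemma, root, endings_code):
    """Insert a word after some basic checks (assumed, not from the source)."""
    if word_class not in aux:
        raise ValueError(f"unknown class: {word_class}")
    if not lemma or not root:
        raise ValueError("lemma and root must be non-empty")
    if endings_code not in KNOWN_ENDINGS[word_class]:
        raise ValueError(f"no endings list {endings_code} for {word_class}")
    if lemma in aux[word_class]:
        raise ValueError(f"duplicate entry: {lemma}")
    aux[word_class][lemma] = {"root": root, "endings_code": endings_code}


def build_internal(aux, classes):
    """Produce an internal dictionary for one application, ordered by root."""
    return sorted(
        (data["root"], word_class, lemma, data["endings_code"])
        for word_class in classes
        for lemma, data in aux[word_class].items()
    )
```

An application that needs only nouns would call `build_internal(aux, ["noun"])` and keep the result in memory for the duration of its run.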

