Sanda Cherata * CONCORD - Software System for Concordances of Romanian Poetical Texts




3. Lemmatization of the text

This part of the CONCORD system uses the information in the basic databases to lemmatize every word occurring in the works of the author being processed, keeping, for each word, the information relevant for the concordance:

The lemmatization routine is designed so that lemmatization proceeds automatically; the user only has to confirm the result. The lexical analysis of each word is performed by the lemmatization function of the SILEX system. When a word receives several analyses (this happens occasionally, because of homographs and because the analysis is performed context-free), the user has to choose the one validated by the context. In a small number of cases, the lemmatization information for a word must be entered by hand. This happens when the word is not recognized automatically: either because it does not comply with current grammatical rules (as may happen in a poetical text), or because the machine dictionary contains no information about it (if it is an archaism or a dialectal word), or because it is a word in a foreign language.

The CONCORD system offers all the facilities needed to input and modify the data required to generate the concordance. In addition, a special function makes it possible to change, under the system's control, concordance information already stored in the databases, in a way that is convenient for the user and that preserves the coherence and consistency of the data.
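The three cases described above (a single analysis, several analyses, no analysis) can be illustrated with a minimal Python sketch. It is not the CONCORD or SILEX code: the names `Analysis`, `TOY_DICTIONARY`, `analyze`, `lemmatize_word`, and the two callbacks `choose` and `enter_manually` are hypothetical, and the tiny dictionary merely stands in for the machine dictionary.

```python
from typing import Callable, List, NamedTuple, Tuple


class Analysis(NamedTuple):
    lemma: str
    category: str


# Toy stand-in for the machine dictionary (hypothetical content).
# "vin" and "mare" are homographs, so they carry two analyses each.
TOY_DICTIONARY = {
    "codru": [Analysis("codru", "noun")],
    "vin": [Analysis("vin", "noun"), Analysis("veni", "verb")],
    "mare": [Analysis("mare", "adjective"), Analysis("mare", "noun")],
}


def analyze(word: str) -> List[Analysis]:
    """Context-free analysis: return every analysis known for the word."""
    return TOY_DICTIONARY.get(word.lower(), [])


def lemmatize_word(
    word: str,
    choose: Callable[[str, List[Analysis]], Analysis],
    enter_manually: Callable[[str], Analysis],
) -> Tuple[Analysis, str]:
    """Return the retained analysis and how it was obtained."""
    candidates = analyze(word)
    if len(candidates) == 1:
        # Unambiguous word: the result only needs confirmation.
        return candidates[0], "automatic"
    if candidates:
        # Homograph: the user picks the analysis validated by the context.
        return choose(word, candidates), "chosen"
    # Unrecognized word: the information is entered by hand.
    return enter_manually(word), "manual"
```

In an interactive tool, `choose` and `enter_manually` would be dialogs; passing them as parameters keeps the decision logic separate from the user interface.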

The linguistic information input in the databases has to be validated by linguists, and the system provides the interface and the means for this activity.

The data obtained by lemmatizing the text of a poem are organized as a database containing all the information needed for the concordance.

Note: Because the SILEX system was developed to perform left-contextual lexical analysis, the result of lemmatizing a word is more accurate; that is, for most homographs it no longer returns several results, but only the one validated by the context.

As a result, the CONCORD system was enriched with a new function: at the user's request, lemmatization is performed automatically for one or several poems, with no intervention required. The result of the lemmatization is then printed so that it can be checked. Any necessary corrections are made afterwards using the appropriate function of the CONCORD system.
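A batch run of this kind might look like the following Python sketch. It is only an illustration under stated assumptions: the dictionary, the function names `lemmatize_batch` and `rows_to_check`, the status labels, and the sample verses are all invented here, and the left-contextual analysis of SILEX is not modelled (an ambiguous word simply keeps its first candidate and is flagged).

```python
import re

# Toy stand-in for the machine dictionary (hypothetical content).
TOY_DICTIONARY = {
    "codru": [("codru", "noun")],
    "frate": [("frate", "noun")],
    "vin": [("vin", "noun"), ("veni", "verb")],
}

# Runs of letters, including letters with diacritics.
WORD = re.compile(r"[^\W\d_]+")


def lemmatize_batch(poems):
    """Lemmatize several poems without user intervention.

    poems maps a title to its list of verses. Each report row is
    (title, verse number, word, lemma, category, status).
    """
    report = []
    for title, verses in poems.items():
        for verse_no, verse in enumerate(verses, start=1):
            for word in WORD.findall(verse):
                candidates = TOY_DICTIONARY.get(word.lower(), [])
                if len(candidates) == 1:
                    status = "ok"
                elif candidates:
                    status = "ambiguous"
                else:
                    status = "unknown"
                # Keep a provisional result so that every word gets a row.
                lemma, category = candidates[0] if candidates else (word.lower(), "?")
                report.append((title, verse_no, word, lemma, category, status))
    return report


def rows_to_check(report):
    """Rows a linguist should review after the batch run."""
    return [row for row in report if row[-1] != "ok"]


if __name__ == "__main__":
    sample = {"Sample poem": ["Codru vin", "frate xyz"]}  # invented verses
    for row in lemmatize_batch(sample):
        print(*row, sep="\t")
```

Printing every row and flagging the doubtful ones mirrors the workflow in the text: the run itself needs no intervention, and corrections happen in a later pass.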


4. The generation of the concordance

The information prepared by the previous phases is automatically processed, in order to create the concordance. This means that, at the user's request, a concordance may be generated for the poems of a volume, or for the whole work of an author. Concordances of the works of several authors may subsequently be merged.
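Merging can be pictured with a short Python sketch. The representation is an assumption made for illustration, not the CONCORD database layout: a concordance is taken to be a dictionary that maps a (lemma, category) pair to the list of its occurrences, and `merge_concordances` is a hypothetical name.

```python
from collections import defaultdict


def merge_concordances(*concordances):
    """Merge concordances keyed by (lemma, category).

    Occurrence lists for the same key are concatenated in argument
    order; the keys of the result are sorted by lemma, then category.
    """
    merged = defaultdict(list)
    for concordance in concordances:
        for key, occurrences in concordance.items():
            merged[key].extend(occurrences)
    return {key: merged[key] for key in sorted(merged)}


if __name__ == "__main__":
    # Invented sample data: (author, poem, verse number) per occurrence.
    author_a = {("codru", "noun"): [("Author A", "Poem 1", 1)]}
    author_b = {
        ("vin", "noun"): [("Author B", "Poem 2", 4)],
        ("codru", "noun"): [("Author B", "Poem 2", 3)],
    }
    for key, occurrences in merge_concordances(author_a, author_b).items():
        print(key, occurrences)
```

Because each occurrence carries its own author and poem, the merged structure still allows per-author counts to be recovered.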

The generation of a concordance consists of the following steps:

The user can consult the concordance interactively: after a lemma and its category are specified, the system displays the lemma's absolute and relative frequencies and all the contexts in which it appears.
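Such a query can be sketched in Python as follows. Several points are assumptions rather than facts from the paper: the concordance is modelled as a dictionary from (lemma, category) to a list of contexts, the relative frequency is taken to be the lemma's occurrences divided by the total number of recorded occurrences, and the name `lookup` and the sample verses are invented.

```python
def lookup(concordance, lemma, category):
    """Return (absolute frequency, relative frequency, contexts)."""
    contexts = concordance.get((lemma, category), [])
    total = sum(len(occurrences) for occurrences in concordance.values())
    absolute = len(contexts)
    # Assumed definition: share of all recorded occurrences.
    relative = absolute / total if total else 0.0
    return absolute, relative, contexts


# Invented sample: each context is (poem, verse number, verse text).
SAMPLE = {
    ("codru", "noun"): [("Poem 1", 1, "codru vin")],
    ("vin", "noun"): [("Poem 1", 1, "codru vin")],
    ("veni", "verb"): [("Poem 1", 2, "eu vin"), ("Poem 2", 5, "ei vin")],
}

if __name__ == "__main__":
    absolute, relative, contexts = lookup(SAMPLE, "veni", "verb")
    print(absolute, relative)
    for poem, verse_no, verse in contexts:
        print(f"{poem}, verse {verse_no}: {verse}")
```

Keying on the pair (lemma, category) rather than on the lemma alone keeps homographic lemmas of different categories apart, which is why the query asks for both.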

