Sanda Cherata * CONCORD - Software System for Concordances of Romanian Poetical Texts
The lemmatization routine is
designed so that lemmatization proceeds automatically, the user's
action being needed only to confirm the result.
The lexical analysis of each word is performed using the lemmatization
function of the SILEX system. When a word receives several analyses
(this happens now and then, because of homographs and because the
analysis is performed context-free), the user has to choose the one
validated by the context. In a small number of cases, the
lemmatization information for a word must be entered by hand. This
situation arises when the word is not automatically recognized -
either because the word does not comply with current grammatical
rules (as may happen in a poetical text), or because the machine
dictionary does not contain information about that word (if it is an
archaism or a dialectal word), or because it is a word in a foreign
language. The CONCORD system offers all the facilities
needed to enter and modify the data required to generate the concordance.
In addition, a special function makes it possible to change, under
the system's control, the concordance information already stored in
the databases, in a way that is convenient for the user and that
preserves the coherence and consistency of the data.
The linguistic information input
in the databases has to be validated by linguists, and the system
provides the interface and the means for this activity.
The data obtained after the lemmatization
of the text of a poem are organized as a database containing all
the information needed for the concordance.
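
The lemmatization routine described above can be sketched roughly as follows. This is only an illustrative sketch, not the CONCORD or SILEX implementation: the `Analysis` record, the `TOY_DICTIONARY` table, the two callbacks `choose` and `enter_by_hand`, and the fields of each stored record are all assumptions introduced for the example. The final confirmation by the user is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Analysis:
    """One lexical analysis of a word form (hypothetical structure)."""
    lemma: str
    category: str


# Hypothetical stand-in for the machine dictionary (not the SILEX interface).
TOY_DICTIONARY: Dict[str, List[Analysis]] = {
    "codrul": [Analysis("codru", "noun")],
    # Homograph: the noun "poartă" (gate) and a form of the verb "purta".
    "poartă": [Analysis("poartă", "noun"), Analysis("purta", "verb")],
}

Chooser = Callable[[str, List[Analysis]], Analysis]
ManualEntry = Callable[[str], Analysis]


def lemmatize_word(word: str, choose: Chooser, enter_by_hand: ManualEntry) -> Analysis:
    """Return the analysis of one word, asking the user only when needed."""
    candidates = TOY_DICTIONARY.get(word.lower(), [])
    if len(candidates) == 1:
        return candidates[0]            # recognized unambiguously
    if candidates:
        return choose(word, candidates) # homograph: the user picks one analysis
    return enter_by_hand(word)          # not recognized: the user supplies the data


def lemmatize_poem(title: str, lines: List[str],
                   choose: Chooser, enter_by_hand: ManualEntry) -> List[dict]:
    """Build one record per word occurrence (assumed record layout)."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        for raw in line.split():
            word = raw.strip(".,;:!?\"'()")
            if not word:
                continue
            analysis = lemmatize_word(word, choose, enter_by_hand)
            records.append({
                "poem": title,
                "line": line_no,
                "word": word,
                "lemma": analysis.lemma,
                "category": analysis.category,
            })
    return records


# Example run with scripted answers in place of a real user dialogue.
demo_records = lemmatize_poem(
    "Demo",
    ["Codrul poartă", "adagio"],
    choose=lambda word, candidates: candidates[1],
    enter_by_hand=lambda word: Analysis(word.lower(), "foreign"),
)
```

In an interactive setting the two callbacks would prompt the user; passing them as parameters keeps the automatic part of the routine separate from the dialogue.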
Note:
Because the SILEX
system has been developed to perform left-contextual lexical
analysis, the result of lemmatizing a word is more accurate;
that is, in most cases of homographs, it no longer returns
several results, but only the one validated by the context.
Consequently, the CONCORD system
was enriched with a new function: at the user's request, lemmatization
is performed automatically for one or several poems, with no
intervention required. The result of the lemmatization is then
printed so that it can be checked. Any necessary corrections are
made using the appropriate function of the CONCORD
system.
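
A batch run of this kind might look like the sketch below. It is a hedged illustration only: the analyzer signature (a word plus the words to its left), the single toy disambiguation rule, and the layout of the printed listing are assumptions made for the example and do not describe how SILEX actually uses the left context. For simplicity, the left context is reset at the start of every line.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Analysis = Tuple[str, str]  # (lemma, category)
# Hypothetical signature: (word, words to its left) -> candidate analyses.
Analyzer = Callable[[str, Sequence[str]], List[Analysis]]


def toy_analyzer(word: str, left: Sequence[str]) -> List[Analysis]:
    """Toy left-contextual analyzer; the rule below is illustrative, not SILEX."""
    word = word.lower()
    if word == "el":
        return [("el", "pronoun")]
    if word == "poartă":
        # Toy rule: directly after the pronoun "el", keep only the verb reading.
        if left and left[-1].lower() == "el":
            return [("purta", "verb")]
        return [("poartă", "noun"), ("purta", "verb")]
    return []  # not in the toy dictionary


def batch_lemmatize(poems: Dict[str, List[str]], analyzer: Analyzer) -> List[str]:
    """Lemmatize several poems without interaction and return a listing to check."""
    report = []
    for title, lines in poems.items():
        for line_no, line in enumerate(lines, start=1):
            left: List[str] = []  # left context, reset per line in this sketch
            for raw in line.split():
                word = raw.strip(".,;:!?\"'()")
                if not word:
                    continue
                analyses = analyzer(word, left)
                if len(analyses) == 1:
                    (lemma, category), status = analyses[0], "ok"
                elif analyses:
                    (lemma, category), status = analyses[0], "CHECK: ambiguous"
                else:
                    (lemma, category), status = ("?", "?"), "CHECK: not recognized"
                report.append(
                    f"{title}\t{line_no}\t{word}\t{lemma}\t{category}\t{status}"
                )
                left.append(word)
    return report


if __name__ == "__main__":
    for row in batch_lemmatize({"Demo": ["el poartă", "poartă xyz"]}, toy_analyzer):
        print(row)
```

Rows marked `CHECK` are the ones a reader of the printed listing would need to correct afterwards.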
The generation of a concordance consists of the following steps:
The user can consult the concordance
interactively: after a lemma and its category are specified,
the system displays its absolute and relative frequencies
and all the contexts in which that lemma
appears.
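
The interactive consultation can be illustrated with a minimal sketch. The `Occurrence` record and the `consult` function are hypothetical names, and the definition of relative frequency used here (the lemma's occurrences divided by the total number of word occurrences in the corpus) is an assumption, since the text does not state how it is computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Occurrence:
    """One lemmatized word occurrence (assumed record layout)."""
    poem: str
    line_no: int
    line_text: str
    lemma: str
    category: str


def consult(occurrences: List[Occurrence], lemma: str, category: str
            ) -> Tuple[int, float, List[Tuple[str, int, str]]]:
    """Return absolute frequency, relative frequency, and contexts of a lemma."""
    hits = [o for o in occurrences
            if o.lemma == lemma and o.category == category]
    absolute = len(hits)
    # Assumed definition: share of all word occurrences in the corpus.
    relative = absolute / len(occurrences) if occurrences else 0.0
    contexts = [(o.poem, o.line_no, o.line_text) for o in hits]
    return absolute, relative, contexts


# Small invented corpus, for illustration only.
corpus = [
    Occurrence("Demo", 1, "Codrul poartă", "codru", "noun"),
    Occurrence("Demo", 1, "Codrul poartă", "purta", "verb"),
    Occurrence("Demo", 2, "codrul vechi", "codru", "noun"),
    Occurrence("Demo", 2, "codrul vechi", "vechi", "adjective"),
]

absolute, relative, contexts = consult(corpus, "codru", "noun")
```

Keying the lookup on both lemma and category keeps homographic lemmas of different categories apart.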
4. The generation of the concordance