Morphological inflection is a necessary part of building computational lexicons.
Approaches to inflection for the Romanian language are described
in [1,2,3]. The static method, described in [1,2], starts from
the base word and its inflectional group according to the classification
given in [4]. The dynamic method [3] starts from the base word and its
morphological category (the part of speech, the gender for nouns, etc.).
We discuss both methods in Section 2.
A recurring problem with natural language processing software is how to
integrate it into various environments and how to develop an application
for a specific platform. We propose the Romanian Spelling Pack (RomPW),
which is implemented under Windows as a set of DLLs (dynamic link libraries).
The Romanian Spelling Pack consists of four components: the hyphenation
function, the spelling-checking function, the inflection function for
Romanian words, and the vocabulary support function. These components are
described in Section 3.
In Section 4 we show how RomPW is integrated into MS Word 6.0 as the word
processing environment (the Romanian Spelling Checker). RomPW is still under
development; directions for its further development are presented in
Section 5, the final section.
Given the definition of binary decomposition in [1,2], there are many
ways to construct such a decomposition. At one extreme, R = V and every
ending is the empty word; at the other, there is a single root (the empty
word) and every element of V serves as an ending. If V is the vocabulary
of word-forms of a language, there is some hope that applying the above
method to a natural decomposition into R and E yields a reasonable map,
that is, one for which the list L of all distinct values of the subsets
f(r) is small compared with the size of V. In that case it suffices to
store, for every root r, only the index of its subset f(r) in L, so the
memory needed for the vocabulary consists of two main parts: memory for
the root set R (plus one index per root) and memory for the list L of
possible sets of endings.
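To make the storage argument concrete, here is a minimal Python sketch. It assumes (our reading, since the formal definition is given in [1,2]) that a decomposition is a function splitting each word-form w into a root r and an ending e with w = re, and that f(r) is the set of endings e for which re belongs to V. The toy vocabulary (six forms each of the Romanian nouns pom, lup and porc), the splitting functions and the character-count cost are hypothetical illustrations, not the authors' data or memory model.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical toy vocabulary: six forms of three Romanian masculine nouns.
ENDINGS = ["", "ul", "ului", "i", "ii", "ilor"]
V = [root + e for root in ("pom", "lup", "porc") for e in ENDINGS]


def identity_split(word: str) -> Tuple[str, str]:
    """Trivial decomposition: R = V, every ending is the empty word."""
    return word, ""


def single_root_split(word: str) -> Tuple[str, str]:
    """Trivial decomposition: one empty root, every word-form is an ending."""
    return "", word


def natural_split(word: str) -> Tuple[str, str]:
    """Hypothetical 'natural' split: strip the longest known ending,
    keeping a non-empty root."""
    for e in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(e) and len(word) > len(e):
            return word[: len(word) - len(e)], e
    raise ValueError(f"no ending fits {word!r}")


def compress(vocabulary: List[str], split: Callable[[str], Tuple[str, str]]):
    """Compute f(r) for every root r, then keep only the index of f(r) in L."""
    f: Dict[str, set] = {}
    for word in vocabulary:
        root, ending = split(word)
        assert root + ending == word, "split must satisfy word = root + ending"
        f.setdefault(root, set()).add(ending)

    L: List[FrozenSet[str]] = []              # distinct ending sets
    position: Dict[FrozenSet[str], int] = {}  # ending set -> its index in L
    index: Dict[str, int] = {}                # root -> index of f(root) in L
    for root, endings in f.items():
        key = frozenset(endings)
        if key not in position:
            position[key] = len(L)
            L.append(key)
        index[root] = position[key]
    return index, L


def expand(index: Dict[str, int], L: List[FrozenSet[str]]) -> set:
    """Rebuild V from the compressed form (checks that nothing was lost)."""
    return {root + e for root, i in index.items() for e in L[i]}


def cost(index: Dict[str, int], L: List[FrozenSet[str]]) -> int:
    """Rough memory proxy: letters of R plus one index slot per root,
    plus letters of all endings in L plus one separator per ending."""
    roots = sum(len(r) for r in index) + len(index)
    endings = sum(len(e) for s in L for e in s) + sum(len(s) for s in L)
    return roots + endings


if __name__ == "__main__":
    for split in (identity_split, single_root_split, natural_split):
        index, L = compress(V, split)
        assert expand(index, L) == set(V)
        print(split.__name__, len(index), "roots,", len(L), "ending sets, cost", cost(index, L))
```

On this toy vocabulary both trivial decompositions cost 118 units under the proxy (all 99 letters of V end up either in R or in L), while the natural split costs 32, because the three roots share a single ending set in L.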
The starting point for this approach was the book [4], in which most
Romanian inflected words are classified according to how their inflected
forms are created. The book contains 100 groups of masculine nouns,
273 groups of verbs, etc., and lists about 30,000 words together with their
group numbers. The classification was made from a linguistic point of
view; for example, accents were taken into account. Nevertheless, this
classification proved useful and led to the idea of introducing a special
grammar that formalises the production of word-forms. We present it in
subsection 2.1. Using these grammar rules, we can formalise the process of
creating the decomposed vocabulary.
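The following minimal sketch illustrates how a group number can drive word-form production and yield a root together with its ending set. The rule format (delete the last k letters of the base word, then append a suffix) and the group labels M1 and M2 are assumptions made for illustration; they are neither the grammar of subsection 2.1 nor the group numbering of [4]. Group M2 models a consonant alternation, as in brad, plural brazi.

```python
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[int, str]  # (k, s): delete the last k letters of the base, then append s

# Hypothetical groups (not the numbering of [4]). Forms in order: indefinite sg.,
# definite sg., definite gen./dat. sg., indefinite pl., definite pl.,
# definite gen./dat. pl.
GROUPS: Dict[str, List[Rule]] = {
    "M1": [(0, ""), (0, "ul"), (0, "ului"), (0, "i"), (0, "ii"), (0, "ilor")],     # pom
    "M2": [(0, ""), (0, "ul"), (0, "ului"), (1, "zi"), (1, "zii"), (1, "zilor")],  # brad
}


def apply_rule(base: str, rule: Rule) -> str:
    """Apply one rewriting rule to the base word."""
    k, suffix = rule
    if k > len(base):
        raise ValueError(f"rule {rule} cannot apply to {base!r}")
    return base[: len(base) - k] + suffix


def word_forms(base: str, group: str) -> List[str]:
    """Produce all word-forms of a base word from its group number."""
    return [apply_rule(base, r) for r in GROUPS[group]]


def decompose(base: str, group: str) -> Tuple[str, FrozenSet[str]]:
    """Split the forms into one common root and its ending set f(root).
    The root drops as many letters as the most destructive rule deletes."""
    rules = GROUPS[group]
    cut = len(base) - max(k for k, _ in rules)
    if cut < 0:
        raise ValueError(f"group {group} cannot apply to {base!r}")
    root = base[:cut]
    endings = frozenset(base[cut: len(base) - k] + s for k, s in rules)
    return root, endings


if __name__ == "__main__":
    print(word_forms("brad", "M2"))
    print(decompose("brad", "M2"))
```

For brad this gives the forms brad, bradul, bradului, brazi, brazii, brazilor, and the root bra with endings d, dul, dului, zi, zii, zilor. Under these hypothetical rules, every M2 word ending in d shares this same ending set, which is the kind of sharing that keeps the list L short.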
The above decomposition method relies on knowing the morphological group
of a given word. However, it must also be possible to add a new set of
word-forms for a given entry without this knowledge, so the group number
has to be detected dynamically.
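A minimal sketch of dynamic group detection, assuming that the base word is known and appears among the supplied word-forms: every candidate group is applied to the base word, and the groups whose generated set coincides with the supplied set are returned. The group table and its rule format ((k, s): delete the last k letters, append s) are hypothetical, not the classification of [4]. Comparing sets ignores which grammatical slot each form fills, and more than one group may match, so the function returns a list.

```python
from typing import Dict, Iterable, List, Tuple

Rule = Tuple[int, str]  # (k, s): delete the last k letters of the base, then append s

# Hypothetical group table, for illustration only.
GROUPS: Dict[str, List[Rule]] = {
    "M1": [(0, ""), (0, "ul"), (0, "ului"), (0, "i"), (0, "ii"), (0, "ilor")],
    "M2": [(0, ""), (0, "ul"), (0, "ului"), (1, "zi"), (1, "zii"), (1, "zilor")],
}


def generate(base: str, rules: List[Rule]) -> set:
    """Apply every rule of one group to the base word."""
    return {base[: len(base) - k] + s for k, s in rules}


def detect_groups(base: str, forms: Iterable[str],
                  groups: Dict[str, List[Rule]] = GROUPS) -> List[str]:
    """Return every group whose rules reproduce exactly the given word-forms."""
    target = set(forms)
    matches = []
    for name, rules in groups.items():
        if any(k > len(base) for k, _ in rules):
            continue  # the group cannot apply to such a short base word
        if generate(base, rules) == target:
            matches.append(name)
    return matches


if __name__ == "__main__":
    print(detect_groups("brad", ["brad", "bradul", "bradului", "brazi", "brazii", "brazilor"]))
```

In this sketch an empty result means that no existing group reproduces the supplied forms, while several matches mean that the supplied forms do not distinguish between the candidate groups.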
First of all, the word-forms themselves must be obtained. A special
program that facilitates this tedious work is presented in subsection 2.2.
2. Romanian word inflection