Dan Tufiş, Ana-Maria Barbu * A Reversible and Reusable Morpho-Lexical Description of Romanian
As RACAI has made extensive use of this
environment for developing the Romanian morphology and
a large lexicon (see Chapter 3) meant for the public domain,
we decided to make another port, this time using a public-domain
LISP. Our choice was CMU-CL, running under SOLARIS, probably
the best freeware implementation of Common Lisp. The decision
was supported not only by the efficiency of this CL implementation,
but also by the API facilities offered by CMU-CL, which allow almost
zero-cost future ports to other platforms (HP, SGI, Linux, etc.).
The full access that CMU-CL gives to X graphical functions was
another very attractive feature of this environment (we plan to
add a graphical interface and a feature-structure browser). This
new port, which we named EGLU (Environnement Generique Linguistique
d'Unification), maintains full compatibility with the initial ELU
and, thanks to conditional compilation special forms, its code,
as it is, compiles and runs under MCL (MacOS), AllegroCL (SunOS)
and CMU-CL (SOLARIS). It takes better advantage of the operating
system (for instance, a command not recognized by the interpreter
is sent to an operating system shell and, if the return code is
an error code, a complaint is addressed to the user). Building
applications and patching have been optimised (with appropriate
conditional compilation forms) for CMU-CL. For instance, a dumped
application initially needed 21 MB of hard disk space. After filtering
out the code unnecessary for the execution of EGLU applications,
the hard disk requirement decreased to 6 MB. Besides the ISO-Latin1
character set supported by the initial ELU implementation, ISO-Latin2
(covering Romanian diacritics) was incorporated.
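The shell fallback mentioned above (an unrecognized command is handed to an operating system shell, and a non-zero return code produces a complaint) can be illustrated with a minimal sketch. It is written in Python rather than Lisp and is not taken from the EGLU sources; the names `dispatch`, `BUILTINS` and `echo-upper` are hypothetical and serve only the illustration.

```python
import subprocess


def dispatch(line, builtins):
    """Run `line` as an interpreter command, else hand it to the OS shell.

    Returns a (handled_by, message) pair; message is None when there is
    nothing to report to the user.
    """
    words = line.split()
    if not words:
        return ("interpreter", None)
    name, args = words[0], words[1:]
    if name in builtins:
        # Known interpreter command: run it directly.
        return ("interpreter", builtins[name](*args))
    # Unknown command: pass the whole line to an operating system shell.
    result = subprocess.run(line, shell=True)
    if result.returncode != 0:
        # A non-zero return code is treated as an error: complain to the user.
        return ("shell",
                f"command failed: {line!r} (exit code {result.returncode})")
    return ("shell", None)


# Hypothetical interpreter commands, for illustration only.
BUILTINS = {"echo-upper": lambda *args: " ".join(args).upper()}
```

Here `dispatch("echo-upper ab cd", BUILTINS)` is handled by the interpreter table, whereas a line such as `exit 3` is not found there, goes to the shell, and comes back as a complaint carrying the exit code.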
Although not discussed here, EGLU
has been used for developing a comprehensive grammar of the noun
phrase in Romanian. This implementation purposely avoided generalizations
which would have committed it to a specific linguistic theory.
Therefore, it is a gloss over the most frequent NP structures
rather than an efficient description. However, being operational, it can
serve as a testbed for a future HPSG-based encoding of a Romanian
computational grammar.
3.1. Encoding Romanian morphology
One of the assumptions in the
computational linguistics of the 1980s was that procedural
formalization was the most appropriate one for describing the phonological,
morphological and orthographic levels. This opinion spread thanks
to the influential work of Koskenniemi and his colleagues on the
two-level phono-morphological model. On the other hand, the spread
of unification-based formalisms, which advocate descriptive uniformity
and adequacy, declarativeness, reversibility and reusability,
has naturally led to new modelling alternatives based on feature-structure
theories. The feature-structure paradigmatic morphology, defined
at the end of the 1980s [9,10,11,12,13],
tends to become widespread because,
besides the advantages noted above, it blurs the distinction between
inflexional and derivational morphology, as far as the
necessary processing tools are concerned. It is worth mentioning
another characteristic of the paradigmatic morphology, deriving
from the underlying unification theory: the combination of partial
information provided by different knowledge sources (this is relevant
for the morpho-lexical acquisition process, as well as for the
natural language parsing and generation process) is strictly monotonic.
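The monotonic combination of partial information can be made concrete with a small sketch of feature-structure unification. It is written in Python, not in the EGLU formalism, and assumes that feature structures are nested dictionaries with atomic leaves (no reentrancy, no disjunction). The feature names `lemma`, `cat` and `agr` are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns the merged structure, or None if the structures clash.
    Neither input is modified.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # incompatible values for the same feature
                result[feature] = merged
            else:
                result[feature] = value  # new information is only ever added
        return result
    # Atomic values (or an atom against a dict) unify only when equal.
    return a if a == b else None


# Partial information from two hypothetical knowledge sources for "vin".
lexical = {"lemma": "veni", "cat": "verb"}
paradigm = {"cat": "verb", "agr": {"person": 1, "number": "sg"}}

combined = unify(lexical, paradigm)
```

In this sketch the result contains everything either source stated, the order of combination does not matter (`unify(lexical, paradigm) == unify(paradigm, lexical)`), and conflicting information such as `{"cat": "noun"}` makes unification fail instead of overwriting what is already known.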
3. Encoding linguistic knowledge in EGLU
1 In estimating this number, we counted homographs and homonyms as
individual word-forms; for instance, the string "vin" counts as three word-forms:
"wine", "(I) come" and "(they) come".