Luciana Peev, Lidia Bibolar, Jodal Endre * A Formalization Model of the Romanian Morphology




5. The present stage

The current version of the Romanian lexicon contains all the words in DEX. For this lexicon, the inflection and recognition algorithms are running experimentally. As a first test of the model, a prototype spelling checker for Romanian was built.

A product was also created that could serve as the basis for a computer-assisted system for learning Romanian inflection. The information currently stored is suitable and sufficient for other specialized applications, such as automatic indexing and interactive bilingual dictionaries, and it could support a successful approach to computer-assisted translation.

The primary lexical base will be further enriched with the attributes required for syntactic analysis, and it remains open to any attributes needed in other areas of linguistic interest.

ORTOGRAF is a spelling-check program. It can be regarded as an independent unit that can be integrated into different text processors. It is the first component of a complete linguistic package, which will be extended with a syntactic-analysis program and a dictionary of synonyms. The package will thus be similar to the linguistic packages published by other companies for other languages, intended to improve the quality of texts written with text processors. The module includes a special function for automatic hyphenation. The software relies on its own dictionary of over 60,000 roots, which covers more than 2,000,000 word forms, given the strongly inflectional character of Romanian.

The spelling-check module contains two main functions:

1) The spelling check function receives a word from the text under analysis as input and returns a verdict on its correctness as output. The recognition algorithm works by identifying the grammatical components of the word: it checks whether the root exists in the dictionary and whether the other components (prefixes, suffixes and endings) exist in its internal schemes. The final answer is determined by checking the coherence of the word formation against the specific grammar rules of Romanian. For a wrong or unknown word, the module offers (if possible) a list of suggested correct forms, also taking possible typing errors into account.

The correction mechanism handles hyphenated constructions, which are specific to Romanian, reasonably well.

Orthographic analysis can also be controlled by means of several user dictionaries. The most common one is intended to enrich the vocabulary with uninflected words (proper names, denominations, neologisms, etc.). Another dictionary helps recognize inflected words that are currently absent from the built-in dictionaries, based on grammatical similarities with words that already exist there. The dictionary of suggestions serves to improve the default correction. Words that are otherwise correct but unwanted in the current editing context can be placed in an exclusion dictionary. These user dictionaries are plain (ASCII) text files that can be edited with any editor, the only condition being that the file is saved as ASCII text.

2) The hyphenation function receives the word to be hyphenated as input and returns the sequence of syllables determined by the specific hyphenation rules. This function can also be extended by means of a user dictionary that lists the exceptions to the usual hyphenation rules.
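
The recognition steps described for the spelling check function (function 1 above) can be illustrated with a minimal sketch. It is not ORTOGRAF's implementation: the roots, inflection classes, endings and all function names are invented for illustration, prefixes and suffixes are left out, and the suggestion step simply tries single-character edits, which is one common way of modelling typing errors.

```python
# Toy data, invented for this sketch; not ORTOGRAF's dictionary or schemes.
ROOTS = {"cas": "noun_f", "mas": "noun_f", "lucr": "verb"}
ENDINGS = {
    "noun_f": {"ă", "e", "a", "ei", "ele", "elor"},
    "verb": {"a", "ez", "ezi", "ează", "ăm"},
}
ALPHABET = "aăâbcdefghiîjklmnopqrsștțuvwxyz"


def analyses(word):
    """Yield each (root, ending) split that is coherent: the root is in the
    dictionary and the ending is allowed for that root's inflection class."""
    for i in range(1, len(word) + 1):
        root, ending = word[:i], word[i:]
        word_class = ROOTS.get(root)
        if word_class is not None and ending in ENDINGS[word_class]:
            yield root, ending


def is_correct(word, user_words=frozenset(), excluded=frozenset()):
    """Verdict for one word. The exclusion dictionary rejects the word first,
    the user dictionary of uninflected words accepts it as is, and otherwise
    at least one coherent analysis must exist."""
    w = word.lower()
    if w in excluded:
        return False
    if w in user_words:
        return True
    return any(True for _ in analyses(w))


def suggestions(word, user_words=frozenset(), excluded=frozenset()):
    """Correct forms reachable by one deletion, replacement, transposition
    or insertion (an assumed model of typing errors)."""
    w = word.lower()
    candidates = set()
    for i in range(len(w) + 1):
        head, tail = w[:i], w[i:]
        if tail:
            candidates.add(head + tail[1:])                      # deletion
            candidates.update(head + c + tail[1:] for c in ALPHABET)  # replacement
        if len(tail) > 1:
            candidates.add(head + tail[1] + tail[0] + tail[2:])  # transposition
        candidates.update(head + c + tail for c in ALPHABET)     # insertion
    candidates.discard(w)
    return sorted(c for c in candidates if is_correct(c, user_words, excluded))
```

In this toy setup, `is_correct("casez")` is rejected even though the root `cas` and the ending `ez` both exist, because the ending does not belong to the root's class; this is the coherence check in miniature. The `user_words` and `excluded` arguments stand in for two of the user dictionaries described above.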

ORTOGRAF is independent of the fonts used, but it assumes by default the character codes of WinCP 1250 (Windows code page 1250), which is the accepted standard for Eastern European languages.
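
As a small illustration of what the code page dependency means in practice, the sketch below encodes Romanian text with Python's standard `cp1250` codec. Code page 1250 contains s and t with cedilla (U+015F, U+0163) but not the comma-below letters (U+0219, U+021B), so the sketch maps the latter to the former before encoding. That mapping and the function names are assumptions made for the example; the paper does not describe how ORTOGRAF handles such input.

```python
# Map comma-below letters (not in code page 1250) to their cedilla
# counterparts (present in code page 1250). Assumed convention, not ORTOGRAF's.
COMMA_TO_CEDILLA = str.maketrans(
    "\u0219\u021b\u0218\u021a",   # s, t, S, T with comma below
    "\u015f\u0163\u015e\u0162",   # s, t, S, T with cedilla
)


def to_cp1250(text: str) -> bytes:
    """Encode text as Windows code page 1250 bytes."""
    return text.translate(COMMA_TO_CEDILLA).encode("cp1250")


def from_cp1250(data: bytes) -> str:
    """Decode Windows code page 1250 bytes back to text."""
    return data.decode("cp1250")
```

Each Romanian letter occupies a single byte in this encoding, which is why the checker can work on character codes without regard to the font used to display them.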

At present, the product is integrated into the text processors of the Microsoft Office package: it runs in 16 bits for Word 6.0 and in 32 bits for Word 7.0. After installation, the product signals its presence through two additional options in the list of selectable languages: ROMANIAN, for the new orthography recommended by the Romanian Academy, and ROMANIAN (OLD), for the previous orthography. The selected language must be assigned to the portion of text chosen for analysis. The analysis itself is launched with the editor's own command and starts from the current insertion point in the text. For unknown words, a dialogue box appears showing the editor's specific information. The analysis can be restricted to small portions of the text, the smallest unit being a single word. Any errors flagged in the text can be corrected interactively by activating the editing area, and the analysis can then be resumed from the current cursor position.

The product can be integrated into any editing system. All that is required are the specific interfaces and the installation procedures through which the software becomes an internal function of the processor. It needs about 1 MB of disk space.

