Poul Andersen * Cooperation with Central and Eastern Europe in Language Engineering
The ONOMASTICA project is building
a pronunciation lexicon of names from across the European Union. These
cover city and town names, street names, family names and product names
in a total of 11 languages: Danish, Dutch, English, French,
German, Greek, Italian, Norwegian, Portuguese, Spanish and Swedish.
The goal for ONOMASTICA-Copernicus
is to extend the languages covered in the database to include
names and pronunciations for Czech, Estonian, Latvian, Polish,
Romanian, Slovakian, Slovenian and Ukrainian.
Pronunciation dictionaries for
up to 250,000 names per language will be constructed.
The objective of the project
is to make available quality-controlled pronunciation lexicons
in machine-readable form (CD-ROM) for use in automated language
systems, of primary interest to international European companies
in the telecommunications sector and in the (dictionary) publishing
industry, as well as to language system researchers and developers.
The Romanian partner
in ONOMASTICA-Copernicus is Prof. Cezar Tabarcea, Faculty of
Arts, University of Bucharest.
MULTEXT-EAST is a spin-off of
MULTEXT, one of the largest EU projects in the domain of language
tools and resources. MULTEXT has three main objectives:
MULTEXT-EAST extends the scope
of MULTEXT to CEE countries, and together the two projects create
a network of more than 20 academic research centres and companies,
developing and using common lingware and methodologies, as well
as producing the first annotated large-scale multilingual corpus
for 12 EU and CEE languages.
East European languages
covered: Bulgarian, Czech, Estonian, Hungarian, Romanian,
Slovenian.
The Romanian partner
in MULTEXT-EAST is the same as in TELRI (see above).
The concrete and practical outcome of PRACTEAST is the compilation
of four terminological collections in such priority domains as
Economics and Management, Energy, Environment and Telecommunications,
each collection containing the 2,000 most common terms in the
field, with English terms and definitions and their French and Spanish
equivalents.
Each partner will be able to use
the Multilingual Database for its own research purposes. The corresponding
conventional paper dictionaries can be published for use by the
public, and each associated partner can benefit from the possible
sales of such dictionaries.
East European languages
covered: Bulgarian, Czech, Estonian, Hungarian, Latvian, Lithuanian,
Polish, Romanian, Russian, Slovakian and Ukrainian.
The Romanian partners
are Marius Sala, Institute of Linguistics (Romanian Academy of Sciences),
and Mihail Ciocodeica, Romanian Standards Institute.