Wolfgang Teubert * Language Resources for Language Technology


Validation is important for generic resources designed for reusability. These resources will always be necessary for language technology. However, the more sophisticated the applications being designed, the more specific the language resources backing them will have to be. These resources will have to be either produced from scratch or derived by adding value to existing generic resources. In either case, specific resources of this kind will probably not be distributed by large centers, but produced by one institution and then passed on directly to the user under a bilateral agreement. In the long run, this trend will lead to a reassessment of the validation issue.

5. The Trans-European Language Resources Infrastructure (TELRI)

The boom in the language industry has brought with it a growing demand for more and better monolingual and multilingual resources. In Europe, only a joint effort of existing focal language institutions could be expected to harmonize existing resources and design standardized new ones in compliance with the needs of dictionary makers and developers of language technology applications.

The European Commission, realizing the central role of language engineering in the emergent information and communication technology market, has supported a number of relevant infrastructure activities, serving the needs of the three 'colleges': speech (spoken language), terminology, and written resources. These projects helped to set up a common infrastructure for the countries of the European Union and the European Economic Area (formerly EFTA), and at the same time encouraged the formation of national language resources networks. After years of preparation, the new PAROLE II project, with partners from all European Union countries, will produce a first generation of harmonized, comparable, reusable generic textual and lexical resources, meeting the basic demands of language technology.

But Europe is larger than the European Union. All European countries must be given the opportunity to participate on an equal level in academic and industrial research and development. In the COPERNICUS Programme, the European Commission provided a framework of projects aiming at the integration of activities in Central and Eastern Europe with complementary ones in Western Europe. Several projects currently underway deal with various aspects of speech, terminology, and written resources. One of these projects, dealing primarily with written resources, is the Trans-European Language Resources Infrastructure (TELRI) [8]. Since it aims at including as many partners in Central and Eastern Europe as possible, it is set up as a Concerted Action rather than as a project proper. Its partners are 22 focal language and language technology institutions in 17 countries: six Western European and 11 Central and Eastern European countries. The partners in the West are also cooperating in the PAROLE projects, thus closely linking TELRI activities with Western European developments. For the time being, there are no partners from former Yugoslavia (with the exception of Slovenia) or from the Commonwealth of Independent States. However, formalized links have been established with the leading institutions in Croatia, Serbia, and Russia, and these associated partners are participating in TELRI activities.

The Concerted Action TELRI has an initial duration of three years, beginning in early 1995, and is working on a budget of about half a million ECU. It is not a research project: rather, its goal is to create a viable infrastructure in order to establish a permanent platform for industry, research institutes and universities, and to supply the NLP community with precompetitive or public domain monolingual and multilingual language resources. These resources are: corpora, machine readable dictionaries and lexicons, lexical data bases, and software tools for the creation, reuse, maintenance, valorization, and exploitation of linguistic data.

The activities of TELRI are organized in Working Groups for specific tasks. The collection, documentation, and dissemination of relevant information on language resources, providers and users, their potentials, and their needs is a basic activity. TELRI will promote the formation of national language resource networks, and TELRI partners will act as focal nodes. They will also design small scale joint ventures with private industry in order to foster cooperation between academic research and development. TELRI will pool and enhance existing service activities, providing resources, expertise, consulting and training facilities. The central platform will be annual seminars directed at the needs of small- and medium-sized enterprises. TELRI will engage in European and global standardization and validation activities and contribute to the harmonization of already existing resources. It is organizing joint research in the field of corpus-based multilingual lexicography and the use of parallel aligned texts.

