Language Resources for Language Technology
1. Language resources for language technology: academia meets industry
Language engineering is the core of information technology, and
information technology will be the key industry of the 21st century.
The information super highways conceived today will soon transport
vast amounts of digital data, images, sounds, tables, figures,
calculations, and process protocols. If these data are to be intelligible,
if they are to make sense, they must be bound together by language.
Without natural language processing (NLP), information remains
incomprehensible.
More than any other continent,
Europe is multilingual by common commitment. This situation poses
a challenge to European language technology. We all want information
to cross borders freely; however, countries can only uphold their
cultural and linguistic identity if all the relevant information
is available and accessible in the national language(s). This
is an important principle of the European Union today, and it
also holds for all European nations that have not yet joined the
European Community. For the emergent European information society,
we have to develop a language technology that meets the multilingual
challenge. If this challenge is met, European language engineering
will play a leading role on the global market.
Language resources are the raw
material of all language technology. The better they are, the
more expensive they are to create. The language industry, small- and
medium-sized enterprises in particular, often cannot afford to
build them up. On the other hand, in practically all European
countries there are focal language centers with a long tradition
in the creation (and also in the application) of language resources.
This is true not only for all of Western Europe, but also for
most of Central and Eastern Europe including the former Soviet
Union. In its short history, computational linguistics has always
been a global discipline, and the NLP community was and continues
to be well connected.
However, while the results of academic research traveled freely
(with a few clandestine pockets here and there), language resources,
corpora, and machine readable dictionaries did not flow as easily.
The old restrictions, which were due to the lack of hardware
compatibility, have now often given way to new limitations based on
property rights. Solutions can only be found in joint efforts. We need
a European network of all academic research institutions to allow
the free flow of language resources between all partners under
fair conditions.
A research project, however, is
only the first step. In a second step, we have to set up an operational
infrastructure of (public domain) research and (private) industry.
We need a common platform where providers and users of language
resources come together, share expertise, discuss their needs,
exchange resources, join forces, and give birth to new visions.
Private industry will ensure that new language technology applications
find a market (and the money invested will not be wasted), and
public domain research will provide the linguistic expertise to
make the products a success.
In some European countries, such
a national infrastructure already exists; in others, it is gradually
evolving. Still, most of the work is devoted to monolingual applications.
Until some years ago, there was not much cross-border cooperation,
at least not in academic circles. This is why repeated efforts
have been made in Western Europe to set up a transnational infrastructure
that can serve the needs of multilingual language technology applications.
In March 1995, the European Language Resources Association (ELRA)
was founded with strong backing by the Commission of the European
Community. The years to come will show which additional
measures are necessary to secure Europe's role as a leading actor
in the global language engineering market. This future Europe
will be larger than the European Union of today. Linguistic expertise,
language resources, and computational linguistics are highly developed
in the countries of the former Soviet Union and in Central and
Eastern Europe. We can observe the emergence of a dynamic, if
still small, language industry in these locations. If we want
to make Europe a competitor on the world market of the language industry,
we must build up a common infrastructure all the way from Galway
on the west coast of Ireland to Vladivostok on the Sea of Japan.