The Research Institute for Artificial Intelligence and the Institute of Computer Science have been collaborating, within a priority project of the Romanian Academy, on the creation of a reference electronic corpus of contemporary Romanian, i.e., a very large collection of written and spoken texts (hundreds of millions of word forms), annotated with metadata (date, author, etc.) and with linguistic data (part of speech, grammatical categories, syntactic dependencies, etc.).

Owing to the naturalness of its texts and to the annotation it will contain, the corpus will be useful to linguists (for describing various aspects of the language), to lexicographers (for creating general and specialized dictionaries), to developers of natural language applications (for which corpora offer training, learning and testing material), to those learning Romanian as a foreign language (a corpus offers real examples of the contexts in which a target word or word form can occur, of the relations the target word establishes with other words, etc.), and to teachers of Romanian (both in teaching and in evaluating students).


