Romanian Language Technology

Walter von Hahn * Machine Translation

6. Challenges for research / Future aims

- linguistics -

One of the most urgent needs of complex natural language systems is the integration of formalisms, i.e. of syntactic (HPSG, ...), semantic (STUFF, ...) and discourse (DRS, ...) formalisms.

Another challenge is the integration of rule-based processing and stochastic methods (stochastically driven chart parsing, see Weber). Many stochastic methods used in information systems and document processing are unrelated to linguistic approaches. These methods must be evaluated as components of linguistic processing.
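
To make "stochastically driven chart parsing" more concrete, here is a minimal sketch of one well-known instance of the idea, a probabilistic CKY (Viterbi) chart parser. A hand-written grammar in Chomsky normal form supplies the rules, as in rule-based parsing, and rule probabilities decide which of the competing chart entries survives. The toy grammar, its probabilities and the names `pcky_parse` and `build_tree` are hypothetical illustrations, and this is not a reconstruction of Weber's approach.

```python
# Hypothetical toy PCFG in Chomsky normal form.
# Binary rules: (lhs, (rhs1, rhs2)) -> probability
BINARY = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("Det", "N")): 0.8,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
}
# Lexical rules: (lhs, word) -> probability
LEXICAL = {
    ("Det", "the"): 1.0,
    ("N", "man"): 0.5,
    ("N", "telescope"): 0.5,
    ("V", "saw"): 1.0,
    ("P", "with"): 1.0,
}


def pcky_parse(words):
    """Fill a CKY chart; chart[i][j] maps a label to (best probability, backpointer)."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # Rule-based part: lexical rules seed the chart cells of length 1.
    for i, w in enumerate(words):
        for (lhs, word), p in LEXICAL.items():
            if word == w:
                chart[i][i + 1][lhs] = (p, w)
    # Combine adjacent spans; the stochastic part keeps only the most probable entry.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (lhs, (b, c)), p in BINARY.items():
                    if b in chart[i][k] and c in chart[k][j]:
                        prob = p * chart[i][k][b][0] * chart[k][j][c][0]
                        if prob > chart[i][j].get(lhs, (0.0, None))[0]:
                            chart[i][j][lhs] = (prob, (k, b, c))
    return chart


def build_tree(chart, i, j, label):
    """Follow backpointers to rebuild the most probable tree as nested tuples."""
    _, back = chart[i][j][label]
    if isinstance(back, str):
        return (label, back)
    k, b, c = back
    return (label, build_tree(chart, i, k, b), build_tree(chart, k, j, c))


words = "the man saw the man with the telescope".split()
chart = pcky_parse(words)
print(chart[0][len(words)]["S"][0])
print(build_tree(chart, 0, len(words), "S"))
```

With these toy probabilities the chart holds both attachments of the prepositional phrase, and the parser prefers attaching it to the verb phrase (0.3 × 0.28 × 0.4 = 0.0336) over attaching it to the noun phrase (0.7 × 0.032 = 0.0224); changing the rule probabilities changes the preferred analysis without touching the rules themselves.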

There is too little research on the needs and demands addressed by the different types of machine translation (see Carbonell). Likewise, there is too little formal and operational research on human translation and interpreting.

Standards and benchmarks must be defined to overcome the difficulty of measuring quality in machine translation.

Translators, especially technical translators, stress that an adequate translation requires more than linguistic knowledge (Schmitt 1992). The role of knowledge representation and knowledge engineering must therefore be redefined for MT and MAT.

Example: Japanese to English⁴

As an example, let us inspect some features of Japanese that cause difficulties when translating from or into other languages, e.g. English (see Uszkoreit 1995):

- Japanese is a head-final language. Translating Japanese sentences into English therefore always means moving constituents from the left of the verb to its right, and vice versa when translating from English.
- Japanese frequently omits pronouns (zero pronouns). When translating into English, the missing pronouns must be "invented", which rather often means artificially determining a referent that the source text sometimes leaves ambiguous. Conversely, when translating English into Japanese, pronouns must be omitted and the referent must be recoverable implicitly.
- Japanese has no particles in time and spatial expressions, which requires similar reconstruction techniques.
- Japanese has phrase-final particles. In English they must be replaced by a completely different linguistic class: punctuation.
- Japanese has very flexible nominalization techniques. English syntax can adopt them only in some cases, whereas translations into Japanese can use this feature more often.
- Japanese does not distinguish between complements and adjuncts, whereas English needs this distinction for lexical choice.
- The most prominent feature of Japanese is the use of honorifics. Lexical choice is determined, e.g., by the social rank, sex and age of the conversation partners. When translating into Japanese, missing information about the listeners or readers of a written text can make lexical choice rather complicated. In translations into English, several words and grammatical classes collapse into one, and it is often difficult to decide whether honorifics must be conveyed by other means (e.g. in diplomatic talks or political negotiations).
- Japanese speakers like to insert metacommunicative comments, which is unusual (to this extent) in English. Conversely, fluent Japanese translations may require inserting such meta-level utterances.
- Japanese allows free topicalization, which can cause problems when translating into languages with a stricter grammatical word order.
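
The first point, head-finality, can be illustrated with a deliberately simplified sketch. It assumes the Japanese clause has already been parsed and its words glossed into English (lexical transfer done, case particles such as ga and o dropped), so only word order remains. Under that assumption, turning head-final verb phrases and postpositional phrases into head-initial ones amounts to mirroring their daughters. The tree encoding, the head table `HEAD_OF` and the functions `to_head_initial` and `leaves` are hypothetical, and a real transfer component would also have to handle the other phenomena listed above.

```python
# A node is (label, word) for a leaf or (label, [children]) for a phrase.
# Hypothetical head table: the label of the head daughter of each phrase type.
HEAD_OF = {"VP": "V", "PP": "P"}


def to_head_initial(node):
    """Recursively mirror head-final VPs and PPs so that the head comes first."""
    label, content = node
    if isinstance(content, str):
        return node  # leaf: nothing to reorder
    children = [to_head_initial(child) for child in content]
    head = HEAD_OF.get(label)
    if head is not None and children and children[-1][0] == head:
        # Head-final phrase: reverse the daughters, moving the head to the front
        children = children[::-1]
    return (label, children)


def leaves(node):
    """Return the words of a tree from left to right."""
    label, content = node
    if isinstance(content, str):
        return [content]
    return [word for child in content for word in leaves(child)]


# "Taro ga gakkou de hon o yonda" (Taro read a book at school), glossed word by word.
jp = ("S", [("NP", "Taro"),
            ("VP", [("PP", [("NP", "school"), ("P", "at")]),
                    ("NP", "book"),
                    ("V", "read")])])

print(" ".join(leaves(jp)))                   # Japanese (head-final) order
print(" ".join(leaves(to_head_initial(jp))))  # English-like (head-initial) order
```

The reordered output still lacks the English article ("a book"); supplying material that the Japanese source leaves implicit is a separate reconstruction step, comparable to inventing zero pronouns.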

- computational concepts -

Research in this field centers on the notion of translation strategies:

What is the search space for translation equivalents: linguistic correctness, communicative adequacy, a rough paraphrase of the content, or a combination of these?
Which strategies are applied in the case of translation difficulties? Compare the following internal and external situations:

- one component (e.g. syntax) of the system cannot achieve a consistent result,
- the input text is ambiguous,
- the input text is structurally incorrect, or
- the input text is factually wrong.

⁴ Material from Hans Uszkoreit
