Text-to-Speech Synthesis for Romanian Language: Present and Future Trends

Corneliu Burileanu, Eugeniu Oancea, Mihai Radu, Dragoș Burileanu, Jănel Arhip, Laurențiu Vasilescu

1. Introduction

Speech communication is one of the essential capabilities of human beings. Speech may be considered the most important method for exchanging information among people. Although written language is effective for sharing knowledge and lasts longer than spoken language if properly preserved, the amount of information exchanged by speech is considerably larger.

Speech processing technologies have achieved significant results in recent decades. Progress in these areas has been made possible by advances in acoustic and linguistic theory, mathematical modelling of speech production and perception, signal processing, structured programming, VLSI technology and hardware design. As computers become more prevalent, the need for communication between man and machine is increasing, and there is a visible tendency for computing machines to accommodate the characteristics of their human users. Obviously, one of these characteristics is the voice, and for this reason the artificial production of speech, and especially text-to-speech (TTS) conversion, is today a main objective of speech processing and a subject of intensive research [1, 2, 3, 4].

A text-to-speech system may serve a wide range of applications in a number of fields, from voice access to electronic mail and various kinds of databases to reading for blind people. It is important to observe that voice response technology based on synthesised speech presents several advantages for information transmission:

anybody can easily understand the message without training or intense concentration;
the message can be received even when the listener is involved in other activities, such as walking, handling an object or looking at something;
the conventional telephone network can be used to provide quick and easy remote access to information;
this form of messaging is essentially a paper-free communication form.

All of these factors, together with the various applications demanded by industry, have been driving forces for research on TTS systems, resulting in currently available commercial systems that can produce speech of acceptable intelligibility from unrestricted input text.

One of the issues to be addressed for TTS systems is the improvement of synthesised speech quality, which is still unsatisfactory; in particular, naturalness must be improved. Synthetic speech is inferior to natural speech in voice quality, in the smoothness of articulatory movement, and in prosodic characteristics such as intonation and rhythm, and it contains unnatural sounds. It must also be emphasised that the quality of a speech synthesis system depends essentially on the language being synthesised.
