A Text-To-Speech System for the Romanian Language

Attila Ferencz, Diana Zaiu, Maria Ferencz, Teodora Raţiu, Gavril Toderean

1. Introduction

What is speech synthesis? A simple answer would be: speech synthesis is the process of producing speech by means of machines. But this answer cannot be considered a definition, because even a tape recorder replays the human voice and nobody considers it a voice synthesiser.

A more precise definition, which we adopt in this paper, is: "Speech synthesis is the process of producing a vocal signal using machines, based on the phonetic transcription of messages".

We must also distinguish between "speech synthesis systems" (systems that transform written text into speech, also called text-to-speech systems) and "speech synthesisers". The former produce voice signals from written text, while speech synthesisers produce them from parameter sets. A speech synthesiser is therefore the final module of a text-to-speech system.

The parameters used by synthesisers can be classified into two groups. The first describes the source features, such as fundamental frequency, intensity and sonority. The second describes the acoustical features of the vocal tract: resonance frequencies (called formants) and their corresponding bandwidths, linear prediction coefficients, the signal energy in filters, and so on.

This distinction is based on the "source-filter" concept. It is convenient in text-to-speech systems because it allows the excitation and the transfer function of the filter to be controlled separately (this independence is an approximation of the real process).
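The separation of source and filter parameters can be illustrated with a minimal sketch. It is not the synthesiser described in this paper: it assumes a simple impulse-train or noise excitation and a cascade of standard two-pole digital resonators, one per formant. All names (`SourceParams`, `Formant`, `excitation`, `resonator`, `synthesise`) and the formant values in the usage example are hypothetical and chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SourceParams:
    """Source-side parameters (first group in the text)."""
    f0_hz: float      # fundamental frequency
    amplitude: float  # intensity
    voiced: bool      # sonority: voiced (periodic) or unvoiced (noise)


@dataclass
class Formant:
    """Filter-side parameters (second group): one vocal-tract resonance."""
    freq_hz: float
    bandwidth_hz: float


def excitation(src: SourceParams, n_samples: int, sample_rate: int,
               seed: int = 0) -> List[float]:
    """Assumed excitation: impulse train if voiced, white noise otherwise."""
    if src.voiced:
        period = max(1, round(sample_rate / src.f0_hz))
        return [src.amplitude if n % period == 0 else 0.0
                for n in range(n_samples)]
    rng = random.Random(seed)
    return [src.amplitude * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(signal: List[float], formant: Formant,
              sample_rate: int) -> List[float]:
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unit DC gain."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * formant.bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * formant.bandwidth_hz * t)
         * math.cos(2.0 * math.pi * formant.freq_hz * t))
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesise(src: SourceParams, formants: List[Formant],
               n_samples: int, sample_rate: int = 16000) -> List[float]:
    """Pass the excitation through the formant resonators in cascade."""
    signal = excitation(src, n_samples, sample_rate)
    for formant in formants:
        signal = resonator(signal, formant, sample_rate)
    return signal


if __name__ == "__main__":
    # Illustrative vowel-like setting; the values are not measured data.
    source = SourceParams(f0_hz=120.0, amplitude=1.0, voiced=True)
    tract = [Formant(700.0, 130.0), Formant(1200.0, 70.0), Formant(2600.0, 160.0)]
    samples = synthesise(source, tract, n_samples=1600)
```

In this sketch, changing `f0_hz` alters only the excitation, while changing the `Formant` list alters only the filter, which mirrors the separate control that the source-filter concept allows.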

2. The building elements of the text-to-speech system

A text-to-speech system consists of two types of modules: text-processing modules, which must cover every step from context-dependent analysis to the conversion of the text into sound codes, and synthesiser modules (the synthesis algorithm and the hardware on which it runs).

We will briefly discuss the functions of the modules that make up our system for the Romanian language. The next figure presents the interconnections between these modules.
