A more precise definition, which
we adopt in this paper, is: "Speech synthesis is the
process of producing a vocal signal by machine, based on the
phonetic transcription of messages".
We must also distinguish between
"speech synthesis systems" (systems that transform written text
into speech, also called text-to-speech systems) and "speech
synthesisers". The former produce a voice signal from written
text, while speech synthesisers produce it from a set of
parameters. A speech synthesiser is therefore the final module
of a text-to-speech system.
The parameters used by synthesisers can be classified
into two groups: the first describes the source features, such as
fundamental frequency, intensity and sonority; the second describes
the acoustic features of the vocal tract: resonance frequencies
(called formants) and their corresponding bandwidths, linear prediction
coefficients, the signal energy in filters, and so on.
This distinction is based on the
"source-filter" concept, which is convenient for
text-to-speech systems because it allows the excitation and the
transfer function of the filter to be controlled separately (this
independence is only an approximation of the real process).
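The separation of source and filter can be illustrated with a minimal
sketch. It is not the synthesiser used in our system: it assumes an
impulse-train excitation and a cascade of standard two-pole digital
resonators, one per formant. The function names (impulse_train,
resonator, synthesise), the sampling rate and the formant values are
hypothetical and chosen only for illustration.

```python
import math


def impulse_train(f0, fs, n):
    """Source: unit impulses spaced one fundamental period apart."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(signal, freq, bandwidth, fs):
    """Filter: two-pole digital resonator modelling a single formant."""
    r = math.exp(-math.pi * bandwidth / fs)           # pole radius
    b = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    c = -r * r
    a = 1.0 - b - c                                   # unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesise(f0, formants, fs=16000, duration=0.05):
    """Pass the excitation through one resonator per (freq, bandwidth)."""
    n = int(round(fs * duration))
    signal = impulse_train(f0, fs, n)                 # source parameters
    for freq, bandwidth in formants:                  # filter parameters
        signal = resonator(signal, freq, bandwidth, fs)
    return signal


# Illustrative formant values only (assumed, not taken from this paper).
vowel = [(700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)]
low_pitch = synthesise(100.0, vowel)
high_pitch = synthesise(200.0, vowel)
```

In this sketch the fundamental frequency is changed without touching
the formant list, and the formant list could be changed without touching
the excitation, which is the separate command that the source-filter
concept allows.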
We will briefly discuss the functions
of the modules composing our system for the Romanian language. The
next figure presents the interconnections between these modules.
2. The building elements of the text-to-speech system