Speech processing technologies have achieved significant results
in recent decades. Progress in these areas has been made possible
by advances in acoustic and linguistic theory, mathematical modelling
of speech production and perception, signal processing, structured
programming, VLSI technology and hardware design. As computers
become more prevalent, the need for communication between man
and machine is increasing and there is a visible tendency for
computing machines to accommodate the characteristics of their
human users. Obviously, one of these characteristics is the voice,
and for this reason the artificial production of speech,
and especially text-to-speech (TTS) conversion, is today a main
objective of speech processing and a subject of intensive research
[1, 2, 3, 4].
A text-to-speech system may offer
a wide range of applications in a number of fields, from accessing
electronic mail and various kinds of databases by voice to reading
for blind people. It is important to observe that voice response
technology for synthesised speech presents several advantages
for information transmission:
All of these factors and the various
applications demanded by industry have been driving forces for
research on TTS systems, resulting in currently available commercial
systems that can produce speech of acceptable intelligibility
from unrestricted input text.
One of the issues to be addressed
for TTS systems is the improvement of synthesised speech quality,
which is still unsatisfactory; in particular, naturalness
must be improved. Synthetic speech is inferior to natural speech
in voice quality, smoothness of articulatory movement and prosodic
characteristics such as intonation and rhythm, and it contains unnatural
sounds. It must also be emphasised that the quality of speech
synthesis systems depends essentially on the language of communication.