Before describing the application, and to make clearer how it works, it is worth looking at the motivation that led us to develop it. Computer-aided language learning systems usually include exercises intended to help students improve their pronunciation of phonemes, syllables, words, or longer utterances. The student may be trained to perceive utterances correctly by repeatedly listening to speech signals pronounced accurately by native speakers, and to produce them correctly through rules of coarticulation, instructions on positioning the vocal tract articulators, and so on. When it comes to evaluating the student's own performance, however, that is, when the student speaks, the automatic tutor gives no feedback, so it is left to the student to judge whether her pronunciation is good or not.
PROSODICS is a system that reacts to the student's actions and provides the missing feedback. The general scenario the application stages is as follows: a master signal (pronounced by a native speaker), possibly labeled with phonological information, is supplied to the student; the student in turn records her own voice and listens to it, comparing it with the master's, and then waits for the system's reply to see whether her pronunciation was good or not. The reply consists of a visual aid and a written diagnosis, together with indications on how to correct the mistake.
PROSODICS compares the two signals (named master-signal and student-signal) and assigns the student a score; if the score is bad, it tells her where the mistake is and what it consists of. The parameters the comparison focuses on are obtained by time-domain analysis and concern energy, voicing, and F0 estimation. Energy is used to segment the signals, while accurate F0 estimation is essential for approximating the intonational contour and for deciding where the main accent of the utterance lies. The final diagnosis relies on rhythm and duration, the overall intonational curve, lexical and sentential stress, and the prosodic pattern.
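The paper does not detail its segmentation or F0 procedures. As a minimal sketch under stated assumptions, energy-based segmentation can threshold the short-time energy of fixed-length frames, and a time-domain F0 estimate can take the autocorrelation peak within a plausible pitch range, treating frames with weak periodicity as unvoiced. The frame sizes, the 60 to 400 Hz search range, the thresholds, and all function names below are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def frames(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into frames of frame_len samples spaced hop samples apart.
    Trailing samples that do not fill a whole frame are dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def segment_by_energy(signal: List[float], frame_len: int, hop: int,
                      threshold: float) -> List[Tuple[int, int]]:
    """Return (first_frame, end_frame) runs, end exclusive, whose energy exceeds
    the (hypothetical, fixed) threshold; other frames are treated as silence."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    k = -1
    for k, fr in enumerate(frames(signal, frame_len, hop)):
        active = short_time_energy(fr) > threshold
        if active and start is None:
            start = k                      # a speech segment begins
        elif not active and start is not None:
            segments.append((start, k))    # the segment ends before frame k
            start = None
    if start is not None:                  # signal ended while still in speech
        segments.append((start, k + 1))
    return segments


def estimate_f0(frame: List[float], sample_rate: float, f0_min: float = 60.0,
                f0_max: float = 400.0, voicing_threshold: float = 0.3) -> Optional[float]:
    """Autocorrelation F0 estimate in Hz, or None if the frame is judged unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    r0 = sum(s * s for s in x)
    if r0 == 0.0:
        return None                        # silent frame
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # autocorrelation normalized by the zero-lag energy, so it lies in [-1, 1]
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None                        # weak periodicity: treat as unvoiced
    return sample_rate / best_lag
```

Applying `estimate_f0` to each frame returned by `frames` yields an F0 contour in which `None` marks unvoiced or silent frames. A fixed energy threshold is the simplest possible choice and is sensitive to the background noise level.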
To deal with this task, dedicated procedures have been implemented for silence detection, fricative detection, F0 estimation, noise elimination, detection of the boundaries delimiting speech units, prosody computation, signal-to-signal alignment, and signal-to-text alignment.
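The paper does not say how signal-to-signal alignment or the score is computed. One common choice, assumed here, is dynamic time warping (DTW) over per-frame features. The sketch below aligns F0 contours expressed in semitones relative to each speaker's geometric-mean F0, so that a student with a higher voice is not penalized for register alone, and reports the mean deviation along the warping path as a distance. The function names, the semitone normalization, and the dropping of unvoiced frames are hypothetical choices.

```python
import math
from typing import List, Optional, Tuple


def dtw_align(master: List[float], student: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Dynamic time warping between two feature sequences.
    Returns the accumulated cost and the path as (master_idx, student_idx) pairs."""
    if not master or not student:
        raise ValueError("both sequences must be non-empty")
    n, m = len(master), len(student)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(master[i - 1] - student[j - 1])   # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # master advances, student holds
                                 cost[i][j - 1],      # student advances, master holds
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the optimal path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def to_semitones(f0_contour: List[Optional[float]]) -> List[float]:
    """Keep voiced frames (hypothetical choice) and express them in semitones
    relative to the speaker's geometric-mean F0, so register differences cancel."""
    voiced = [f for f in f0_contour if f is not None and f > 0]
    if not voiced:
        return []
    ref = math.exp(sum(math.log(f) for f in voiced) / len(voiced))
    return [12.0 * math.log2(f / ref) for f in voiced]


def intonation_distance(master_f0: List[Optional[float]],
                        student_f0: List[Optional[float]]) -> float:
    """Mean absolute semitone deviation along the DTW path (0 means same shape)."""
    a, b = to_semitones(master_f0), to_semitones(student_f0)
    total, path = dtw_align(a, b)
    return total / len(path)
```

A distance of zero means the two contours have the same shape up to register and timing; turning the distance into a good/bad score would require a threshold that this sketch does not attempt to choose.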
The paper is organized as follows: section 2 briefly describes how the application behaves during a working session, section 3 gives details of the technical implementation, and section 4 contains some concluding remarks.