Maria-Mirela Petrea, Dan Cristea * Dealing with Prosody. A Computer-Assisted Language Learning Approach
Fig. 5. - Female speaker, "More peaceful?"; (a) - the waveform signal, (b) - pitch finally detected, (c) - conflicting pitch traces.
The algorithm generates a tree-like state space in which a path denotes a route under development and each node records a pitch trace together with a score reflecting that trace's contribution to the developing route. Suppose that, at a certain moment, a path has been developed; the algorithm then searches for all traces that follow the current path on the time axis, creates a node in the tree for each of them, and computes their scores according to the fuzzy propositions. The universe of discourse for the scores assigned to conflicting traces is illustrated in figure 4.
One of the fuzzy propositions used has the following motivation (see figure 5): the closer a pitch trace is in time to one of its neighbors (let's say the right one), the closer the last period of the inspected trace and the first period of this neighbor must be. That is, over short time intervals the F0 contour changes gradually (while over long intervals it may change significantly). Consequently, when a pitch trace is added to a partial route, it is checked whether it naturally continues what has already been constructed. If the construction is developing left-to-right, the contribution of the current pitch trace to the (previously developed) partial route is asserted to be BAD, ACCEPTABLE or GOOD, according to a function of the last pitch period in the route and the first pitch period in the trace, with the truth degree computed by a function of the time interval between the route and the trace. For instance, in figure 5 pitch traces are identified by numeric labels; if the partial route is the sequence 1, 2, 3, 4, then the route obtained by adding the trace labeled 6 will receive a better score than the one obtained by adding the trace labeled 5. Finally, among the tree's leaves, the route with the best score is chosen.
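The proposition above can be illustrated with a minimal sketch. The paper does not give the membership functions, so the triangular shapes, the parameters `base_tol` and `tol_per_ms`, and the centroid weights used for defuzzification are all assumptions chosen only to reproduce the stated behavior: the shorter the time gap, the smaller the period jump that still counts as GOOD.

```python
def continuation_memberships(last_period, first_period, gap_ms,
                             base_tol=0.05, tol_per_ms=0.01):
    """Truth degrees of BAD / ACCEPTABLE / GOOD for appending a trace.

    last_period  -- last pitch period of the partial route (ms)
    first_period -- first pitch period of the candidate trace (ms)
    gap_ms       -- time interval between route and trace (ms)
    base_tol, tol_per_ms -- hypothetical tuning constants (not from the paper)
    """
    # Relative jump between the two periods that would become adjacent.
    jump = abs(first_period - last_period) / last_period
    # The tolerated jump grows with the gap: short gaps demand close periods.
    tol = base_tol + tol_per_ms * gap_ms
    x = jump / tol  # jump expressed in units of the tolerance

    good = max(0.0, 1.0 - x)                    # 1 at x = 0, 0 from x = 1
    acceptable = max(0.0, 1.0 - abs(x - 1.0))   # triangle peaking at x = 1
    bad = min(1.0, max(0.0, x - 1.0))           # 0 up to x = 1, 1 from x = 2
    return {"BAD": bad, "ACCEPTABLE": acceptable, "GOOD": good}


def defuzzify(memberships):
    """Collapse the three truth degrees into one crisp score in [0, 1]."""
    # Assumed representative values for each linguistic label.
    centers = {"BAD": 0.0, "ACCEPTABLE": 0.5, "GOOD": 1.0}
    total = sum(memberships.values())
    return sum(memberships[k] * centers[k] for k in memberships) / total


if __name__ == "__main__":
    # Same 10 ms gap, one smooth and one abrupt continuation.
    print(defuzzify(continuation_memberships(5.0, 5.1, gap_ms=10.0)))
    print(defuzzify(continuation_memberships(5.0, 7.0, gap_ms=10.0)))
```

With these assumed shapes the three memberships always sum to one, so the crisp score decreases smoothly as the period jump grows relative to the gap-dependent tolerance.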
The actual implementation uses backtracking; only the leaves of the tree (identifying complete routes) are subject to defuzzification. The algorithm could be further improved by using a branch-and-bound method, heuristically based on the defuzzification of partial routes, to deal with uncertain trace development.
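The backtracking search over routes might be sketched as follows. This is not the authors' implementation: the `Trace` record, the `successors` rule (a trace may follow a route if it starts after the route ends and no other trace fits entirely in between), the crisp `step_score` standing in for the fuzzy evaluation, and the averaging of step scores at the leaves are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    label: int
    start: float         # ms
    end: float           # ms
    first_period: float  # ms
    last_period: float   # ms


def step_score(prev, nxt):
    """Crisp stand-in for the fuzzy continuation score, in [0, 1]."""
    gap = nxt.start - prev.end
    jump = abs(nxt.first_period - prev.last_period) / prev.last_period
    tol = 0.05 + 0.01 * gap  # hypothetical gap-dependent tolerance
    return max(0.0, 1.0 - jump / (2.0 * tol))


def successors(route_end, traces):
    """Traces that can directly follow a route ending at route_end."""
    later = [t for t in traces if t.start >= route_end]
    if not later:
        return []
    horizon = min(t.end for t in later)  # earliest end among later traces
    return [t for t in later if t.start < horizon]


def best_route(traces):
    """Depth-first backtracking; only complete routes (leaves) are scored."""
    best = ([], float("-inf"))

    def extend(route, total):
        nonlocal best
        end = route[-1].end if route else float("-inf")
        candidates = successors(end, traces)
        if not candidates:  # leaf: the route cannot be extended further
            score = total / max(1, len(route) - 1)
            if score > best[1]:
                best = (list(route), score)
            return
        for t in candidates:
            gain = step_score(route[-1], t) if route else 0.0
            route.append(t)
            extend(route, total + gain)
            route.pop()  # backtrack

    extend([], 0.0)
    return best


if __name__ == "__main__":
    # Made-up data loosely mirroring the figure: traces 5 and 6 conflict.
    traces = [
        Trace(1, 0, 40, 5.0, 5.1), Trace(2, 45, 80, 5.1, 5.2),
        Trace(3, 85, 120, 5.2, 5.3), Trace(4, 125, 160, 5.3, 5.4),
        Trace(5, 165, 200, 2.7, 2.8), Trace(6, 166, 205, 5.4, 5.5),
    ]
    route, score = best_route(traces)
    print([t.label for t in route], score)
```

A branch-and-bound variant would additionally abandon a partial route as soon as its best achievable score falls below the best complete route found so far.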
3.7. Prosody
Prosody is computed from the fundamental frequency, since F0 is the most important acoustic correlate of stress and of the "melody" of an utterance. To represent the intonational contour, the pitch has to be stylized. Stylization of the F0 contour aims to obtain a sequence of line segments that closely approximates F0 (this time computed on frequency, not on period), and is therefore an accurate representation of the movements in the speaker's intonation.
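The paper does not state which stylization procedure is used. As one possible illustration, the sketch below uses a top-down split in the style of the Ramer-Douglas-Peucker algorithm (a swapped-in technique, not the authors' method): a segment is drawn between the end points of the contour and is split recursively at the sample that deviates most, until every sample lies within a tolerance. The function name `stylize` and the parameter `tol_hz` are hypothetical.

```python
def stylize(times, f0, tol_hz=5.0):
    """Return indices of the breakpoints of a piecewise-linear fit to F0.

    times  -- sample times, strictly increasing (ms)
    f0     -- F0 values in Hz at those times (voiced samples only)
    tol_hz -- largest vertical deviation allowed from a segment (assumed)
    """
    def split(lo, hi):
        t0, t1 = times[lo], times[hi]
        y0, y1 = f0[lo], f0[hi]
        worst, worst_dev = None, tol_hz
        for i in range(lo + 1, hi):
            # Value of the straight line from (t0, y0) to (t1, y1) at times[i].
            line = y0 + (y1 - y0) * (times[i] - t0) / (t1 - t0)
            dev = abs(f0[i] - line)
            if dev > worst_dev:
                worst, worst_dev = i, dev
        if worst is None:  # every sample is within tolerance
            return [lo, hi]
        left = split(lo, worst)
        right = split(worst, hi)
        return left[:-1] + right  # drop the duplicated middle breakpoint

    if len(times) < 2:
        return list(range(len(times)))
    return split(0, len(times) - 1)


if __name__ == "__main__":
    # Made-up rise-fall contour sampled every 10 ms.
    times = [10.0 * i for i in range(11)]
    f0 = [200.0 + 10.0 * i for i in range(6)] + [250.0 - 10.0 * i for i in range(1, 6)]
    print(stylize(times, f0))
```

Since pitch detection works on periods, each period T (in ms) would first be converted to frequency as 1000 / T Hz before stylization.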
Fig. 6. - Master signal, female speaker, "Can you manage?"; (a) - the waveform signal; (b) - the pitch; (c) - the loudness; (d) - the prosody contour.