Maria-Mirela Petrea, Dan Cristea * Dealing with Prosody. A Computer-Assisted Language Learning Approach
The first action to which the signal is subjected, that is, the one that decides which peaks are epochs, is performed according to two criteria:
Then the filtered peaks are entered into the buffer, as candidates participating in different traces. The periodicities are searched for within this buffer. The length of the buffer is calculated according to the formula:
buffer_length > 2 * Tmax / Tmin
where Tmax and Tmin are the maximum and minimum permissible pitch periods, corresponding to the permissible range of the fundamental frequency delimited by Fmin and Fmax. Two adjacent peaks cannot occur closer than the minimum permissible period, and the buffer must be long enough to permit tracing, that is, to hold at least three peaks separated by the longest possible period. As emphasized above, each item in the buffer keeps a record of the participation of the corresponding peak in different traces; each peak entering the buffer can therefore participate in at most Tmax/Tmin traces, which also limits the number of channels that must be allocated for the buffer's elements (channels are records allocated for the parallel development of traces).
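As an illustration of the sizing rule above, the following minimal sketch derives the period limits, the buffer length and the channel count from a sampling rate and a frequency range. The function name `buffer_parameters`, the choice to measure periods in samples, and the rounding (smallest integer strictly above 2 * Tmax / Tmin for the buffer, rounding up for the channels) are assumptions made for the example and are not taken from the text.

```python
import math


def buffer_parameters(sample_rate_hz, f_min_hz, f_max_hz):
    """Return (t_min, t_max, buffer_length, channels), periods in samples.

    Hypothetical helper: the rounding choices are assumptions, the text
    only states buffer_length > 2 * Tmax / Tmin and a bound of Tmax/Tmin
    traces per peak.
    """
    t_min = sample_rate_hz / f_max_hz  # shortest permissible period
    t_max = sample_rate_hz / f_min_hz  # longest permissible period
    ratio = t_max / t_min
    # Smallest integer strictly greater than 2 * Tmax / Tmin.
    buffer_length = math.floor(2 * ratio) + 1
    # One channel per trace a peak may take part in (rounded up).
    channels = math.ceil(ratio)
    return t_min, t_max, buffer_length, channels


if __name__ == "__main__":
    # Example values (assumed): 16 kHz audio, F0 between 50 Hz and 500 Hz.
    print(buffer_parameters(16000, 50, 500))
```

With these assumed values the periods range from 32 to 320 samples, so the buffer must hold more than 20 peaks and each buffer element needs at most 10 channels.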
Fig. 3. - Waveform signal, male speaker; illustration of pitch periods between significant peaks.
The algorithm proceeds as follows (figure 3 illustrates the description): suppose that at a certain moment the buffer already holds a number of traces in progress, each trace "knowing" the last period introduced (that is, the number of samples between the last two peaks in the trace), which we call T1; suppose also that the controller performing the first action has decided which is the next epoch and has entered it into the buffer. The controller then verifies whether (and which of) the traces under development can be continued with the current tail of the buffer. First, it verifies that the number of samples between the tail peak in the buffer and the tail peak of a trace, which we call T2, is not greater than Tmax (T2 ≤ Tmax); if it is greater, the development of the trace is stopped. Otherwise, it checks whether T1 ≈ T2. If this does not hold, the trace remains, still expecting a peak to continue it; if it holds, the tail peak in the buffer is marked as participating in the trace being continued, and the trace is updated with the last peak and the last period (that is, T2). The other action performed by the controller is to start new traces: it checks whether three peaks, the tail of the buffer and two others from those previously introduced into the buffer, can start a new trace. The criteria used are the same: the periods must lie within the range between Tmin and Tmax, and the two periods corresponding to the three peaks must be close enough.
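The two controller actions described above can be sketched as follows. This is only an illustrative reading of the description: the names `Trace`, `try_continue` and `can_start_trace` are hypothetical, peaks are represented by their sample indices, and the relative `tolerance` used to decide that two periods are "close enough" is an assumption, since the text does not specify the closeness test.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    last_peak: int       # sample index of the last peak in the trace
    last_period: int     # T1: samples between the last two peaks
    active: bool = True  # False once development has been stopped


def try_continue(trace, new_peak, t_max, tolerance=0.1):
    """Try to extend a trace with the peak at the tail of the buffer.

    Returns "stopped", "waiting" or "continued". The relative
    tolerance is an assumed stand-in for the "close enough" test.
    """
    t2 = new_peak - trace.last_peak
    if t2 > t_max:
        # The gap exceeds the longest permissible period.
        trace.active = False
        return "stopped"
    if abs(t2 - trace.last_period) > tolerance * trace.last_period:
        # T1 and T2 differ too much; the trace waits for another peak.
        return "waiting"
    # T1 is close to T2: record the new last peak and last period.
    trace.last_peak = new_peak
    trace.last_period = t2
    return "continued"


def can_start_trace(p0, p1, p2, t_min, t_max, tolerance=0.1):
    """Check whether three peaks (sample indices) may start a new trace."""
    t1 = p1 - p0
    t2 = p2 - p1
    if not (t_min <= t1 <= t_max and t_min <= t2 <= t_max):
        return False
    return abs(t2 - t1) <= tolerance * t1
```

In this sketch a stopped trace is only flagged as inactive; how stopped traces are stored or released from their channels is left open, as the text does not describe it at this point.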
3.6.2. Post-processing
Pitch tracing tends to find multiple traces simultaneously; a post-processing phase is therefore intended to disambiguate among them. Because of the large range of permissible frequencies for F0 (which also differs for male and female speakers), the algorithm can detect traces that mirror halves, doubles or even triples of the real pitch periods. In what follows, we call a pitch route a sequence of consecutive (not necessarily adjacent) pitch traces, and conflicting (or uncertain) traces those pitch traces that intersect.
Fig. 4. - The universe of discourse for scores assigned to conflicting traces.