Romanian Language Technology

Maria-Mirela Petrea, Dan Cristea * Dealing with Prosody. A Computer-Assisted Language Learning Approach

Pitch detection uses two structures: a buffer of peaks and an array, named TRACES, that keeps information about the pitch "routes" in progress. The buffer is a queue of cells, each holding the time and amplitude of a peak in the original signal and, most importantly, the different traces in which the peak can participate. The TRACES array simply keeps the pitch traces under development, each trace being uniquely identified by a label. As already mentioned, the algorithm tries to develop several traces simultaneously, which may intersect each other, and finally chooses among them the T₀ route.
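The two structures can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the names Peak, Trace and their fields, as well as the buffer length of 8, are hypothetical and do not come from the implementation described here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Peak:
    time: int                                  # sample index of the peak in the original signal
    amplitude: float                           # amplitude of the peak
    traces: set = field(default_factory=set)   # labels of the traces the peak can participate in


@dataclass
class Trace:
    label: int            # unique identifier of the trace
    last_peak_time: int   # time of the most recent peak in the trace
    last_period: int      # samples between the two most recent peaks


buffer = deque(maxlen=8)  # queue of recent peaks; the length 8 is an arbitrary placeholder
TRACES = {}               # label -> Trace, the traces under development

peak = Peak(time=120, amplitude=0.8)
buffer.append(peak)
TRACES[0] = Trace(label=0, last_peak_time=120, last_period=100)
peak.traces.add(0)        # the peak records that it participates in trace 0
```

A dictionary keyed by label is used here instead of a fixed array only for brevity; a bounded deque drops the oldest peak automatically when a new one is appended to a full buffer.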

The first operation applied to the signal, the one that decides which peaks are epochs, is performed according to two criteria:

amplitude of peaks;
permissible range of the fundamental frequency.

The filtered peaks are then entered into the buffer as candidates for participation in different traces. Periodicities are searched for within this buffer. The length of the buffer is calculated according to the formula:

buffer_length > 2 * T_max / T_min

where T_max and T_min are the longest and shortest periods allowed by the permissible range of the fundamental frequency, delimited by F_min and F_max (T_max = 1/F_min, T_min = 1/F_max). Two adjacent peaks cannot occur closer than the minimum permissible period, and the buffer must be long enough to permit tracing, that is, it must hold at least three peaks separated by the longest possible period. Since peaks can be as close as T_min, a span of 2 * T_max can contain up to 2 * T_max / T_min peaks. As emphasized above, each item in the buffer keeps a record of the participation of the corresponding peak in different traces; each peak entering the buffer can therefore participate in at most T_max/T_min traces, which also limits the number of channels that must be allocated for the buffer's elements (channels are records allocated for the parallel development of traces).
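As a worked example of the formula, the sketch below computes the buffer length and the channel bound from a frequency range. The range of 60 to 400 Hz is an assumed value chosen for illustration, not one given in the text, and the function names are hypothetical.

```python
import math


def buffer_length(f_min, f_max):
    """Smallest integer strictly greater than 2 * T_max / T_min."""
    # T_max = 1 / f_min and T_min = 1 / f_max, so T_max / T_min = f_max / f_min.
    return math.floor(2 * f_max / f_min) + 1


def max_channels(f_min, f_max):
    """Largest whole number of traces not exceeding T_max / T_min."""
    return math.floor(f_max / f_min)


# Assumed range: F_min = 60 Hz, F_max = 400 Hz.
length = buffer_length(60.0, 400.0)    # 2 * 400 / 60 = 13.33..., so 14 cells
channels = max_channels(60.0, 400.0)   # 400 / 60 = 6.66..., so 6 channels
```

Because the bound depends only on the ratio F_max/F_min, the sampling rate does not enter the computation.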

Fig. 3. - Waveform signal, male speaker; illustration of pitch periods between significant peaks.

The algorithm proceeds as follows (figure 3 illustrates the description). Suppose that, at a certain moment, the buffer already holds a number of traces in progress, each trace "knowing" its last period, that is, the number of samples between its last two peaks; call it T₁. Suppose also that the controller performing the first action has decided which is the next epoch and has entered it into the buffer. The controller then verifies whether, and which of, the traces under development can be continued with the current tail of the buffer. First, it checks that the number of samples between the tail peak in the buffer and the tail peak of the trace, call it T₂, is not greater than T_max (T₂ ≤ T_max); if it is greater, the development of the trace is stopped. Otherwise, it checks whether T₁ ≈ T₂. If not, the trace remains as it is, still expecting a peak to continue it; if so, the tail peak in the buffer is marked as participating in the trace, and the trace records it as its last peak, with T₂ as its last period. The other action of the controller is to start new traces: it checks whether three peaks, the tail of the buffer and two of those previously entered into the buffer, can start a new trace. The criteria are the same: the periods must lie between T_min and T_max, and the two periods defined by the three peaks must be close enough.
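The two actions of the controller can be sketched as follows. This is a simplified illustration, not the implementation described above: the relative tolerance used for "close enough", the representation of a trace as a list of peak times, and the rule that skips a triple already forming the tail of an existing trace are all assumptions, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def close_enough(t1, t2, tol=0.15):
    """Assumed similarity test for two periods (relative tolerance)."""
    return abs(t1 - t2) <= tol * max(t1, t2)


def process_epoch(traces, buffer, new_time, t_min, t_max, tol=0.15):
    """Update `traces` (label -> {'peaks', 'open'}) with a newly accepted epoch.

    `buffer` holds the times of the peaks entered earlier, oldest first.
    """
    # Action 1: try to continue each trace still under development.
    for trace in traces.values():
        if not trace["open"]:
            continue
        peaks = trace["peaks"]
        t1 = peaks[-1] - peaks[-2]     # last period recorded in the trace
        t2 = new_time - peaks[-1]      # distance from the tail peak of the trace
        if t2 > t_max:
            trace["open"] = False      # too far away: development is stopped
        elif close_enough(t1, t2, tol):
            peaks.append(new_time)     # continued; the last period is now t2
        # otherwise the trace stays as it is, expecting a later peak

    # Action 2: try to start new traces from two earlier peaks plus the new one.
    for a, b in combinations(buffer, 2):
        p1, p2 = b - a, new_time - b
        in_range = t_min <= p1 <= t_max and t_min <= p2 <= t_max
        # Assumption: skip a triple that an existing trace already ends with.
        duplicate = any(t["peaks"][-3:] == [a, b, new_time] for t in traces.values())
        if in_range and close_enough(p1, p2, tol) and not duplicate:
            traces[len(traces)] = {"peaks": [a, b, new_time], "open": True}

    buffer.append(new_time)


# Demo with assumed values: periods of 100 samples, T_min = 40, T_max = 270.
traces, buffer = {}, deque(maxlen=14)
for t in (0, 100, 200, 300):
    process_epoch(traces, buffer, t, t_min=40, t_max=270)
regular = [list(tr["peaks"]) for tr in traces.values()]

process_epoch(traces, buffer, 700, t_min=40, t_max=270)  # gap of 400 > T_max
still_open = [tr["open"] for tr in traces.values()]
```

In the demo, the third peak starts a single trace, the fourth peak continues it, and the late peak at 700 stops it without starting a new one. Stopped traces are kept rather than deleted so that labels stay unique and the traces remain available to a later disambiguation step.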

3.6.2. Post-processing

Pitch tracing is designed to find multiple traces simultaneously; a post-processing phase is therefore needed to disambiguate among them. Because the range of permissible frequencies for F₀ is large (and differs between male and female speakers), the algorithm can detect traces that correspond to halves, doubles or even triples of the real pitch periods. In what follows, we call a pitch route a sequence of consecutive (not necessarily adjacent) pitch traces, and conflicting (or uncertain) traces those pitch traces that intersect.

Fig. 4. - The universe of discourse for scores assigned to conflicting traces.
