[sdiy] Bandwidth vs. resonance in the world of vocoders

Magnus Danielson magnus at rubidium.dyndns.org
Wed Jul 13 17:18:19 CEST 2011


Dear fellow synth-diyers,

I wanted to revisit how resonance in vocoder analysis and synthesis 
filters relates to their response to transients and speech.

There are essentially two basic roads to go down when designing vocoder 
analysis and synthesis filters: either Butterworth-style bandpass 
responses, where the steep slopes derive from the filter order, or 
high-resonance filters, where the steep slopes derive not from the 
filter order but from the high Q-values. The latter strategy allows for 
lower-order filters, making them cheaper.

Now, for traditional filters like Butterworth, the slopes are a matter 
of placing the zeroes at one or the other end of the spectrum, while 
the poles are not very resonant at all. The slopes are derived from how 
many zeroes are placed on which side of the center frequency, and you 
can place as many zeroes as there are poles (in fact you must always 
place exactly the same number, but you rarely need to see this grim 
detail, it gets done anyway). Each zero is good for 6 dB/Oct, so if you 
have two, you can get a 12 dB lowpass slope, 6 dB lowpass + 6 dB 
highpass, or a 12 dB highpass slope. You can also use one or both 
zeroes for all-pass responses, but that is beside the point I want to 
make.
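
To make the zero-placement point concrete, here is a minimal sketch 
that measures the far-out slopes of a second-order section for the 
three placements mentioned above. The normalized transfer function 
(center frequency at w = 1), the Butterworth-like Q of 0.707 and all 
function names are assumptions chosen for illustration, not taken from 
any particular vocoder design.

```python
import math

Q = 0.707  # assumed, low-resonance (Butterworth-like) pole pair

# Numerator coefficients (b2, b1, b0) of b2*s^2 + b1*s + b0.
# The denominator is always s^2 + s/Q + 1 (center frequency w = 1).
RESPONSES = {
    "lowpass": (0.0, 0.0, 1.0),      # both zeroes at infinity
    "bandpass": (0.0, 1.0 / Q, 0.0), # one zero at DC, one at infinity
    "highpass": (1.0, 0.0, 0.0),     # both zeroes at DC
}


def magnitude_db(num, w, q=Q):
    """Gain in dB at angular frequency w for the given numerator."""
    b2, b1, b0 = num
    s = complex(0.0, w)
    h = (b2 * s * s + b1 * s + b0) / (s * s + s / q + 1.0)
    return 20.0 * math.log10(abs(h))


def slope_db_per_octave(num, w, q=Q):
    """Gain change in dB when going from w to 2*w."""
    return magnitude_db(num, 2.0 * w, q) - magnitude_db(num, w, q)


def skirt_slopes(num, q=Q):
    """Slopes far below and far above the center frequency."""
    low = slope_db_per_octave(num, 1.0 / 128.0, q)
    high = slope_db_per_octave(num, 64.0, q)
    return low, high


if __name__ == "__main__":
    for name, num in RESPONSES.items():
        low, high = skirt_slopes(num)
        print(f"{name:9s} below: {low:+6.2f} dB/oct  above: {high:+6.2f} dB/oct")
```

With two zeroes to spend, the sketch should show roughly 12 dB/Oct on 
one side, or 6 dB/Oct on each side, matching the counting argument 
above.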

The other strategy confuses us at first, since the close-in slopes are 
not derived from the number of zeroes available (usually confused with 
the number of poles, but it's really the degree of the system) but from 
using very high Q-values. As you move away from the center frequency 
the resonance rolls off and what remains is whatever slope the zero 
placements give, but much further down, since the peak responses have 
been included in the gain structure: the overall response is damped so 
that the average pass-band is at about unity gain.
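
As a rough illustration of this, here is a minimal sketch of a single 
second-order bandpass with unity gain at its peak. The normalized 
center frequency (w = 1), the Q-values of 1 and 20 and the function 
names are assumptions picked for the example. It compares the drop 
over the first octave above the peak with the slope far away from it.

```python
import math


def bandpass_gain_db(w, q):
    """Gain in dB of a unity-peak second-order bandpass centered at w = 1.

    |H(jw)| = 1 / sqrt(1 + (q * (w - 1/w))**2)
    """
    return -10.0 * math.log10(1.0 + (q * (w - 1.0 / w)) ** 2)


def first_octave_drop_db(q):
    """Attenuation one octave above the peak, relative to the peak."""
    return bandpass_gain_db(2.0, q) - bandpass_gain_db(1.0, q)


def far_slope_db_per_octave(q, w=64.0):
    """Slope in dB/octave measured far above the peak."""
    return bandpass_gain_db(2.0 * w, q) - bandpass_gain_db(w, q)


if __name__ == "__main__":
    for q in (1.0, 20.0):  # assumed example Q-values
        print(
            f"Q={q:5.1f}  first octave: {first_octave_drop_db(q):7.2f} dB"
            f"  far slope: {far_slope_db_per_octave(q):6.2f} dB/oct"
        )
```

The high-Q case loses far more in the first octave, yet far from the 
peak both cases settle at about 6 dB/Oct, which is all the single zero 
on each side can give.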

So far I have only explained the strategies at hand. Now I come to the 
issue I would like to discuss: what is the reaction time of these 
strategies, and how does it affect the speech intelligibility of the 
vocoder?

The more narrowband a filter is, the longer its reaction time, and the 
second strategy uses even narrower bands next to each other (2 to 4 
spread-out peaks are used). Can the reaction time become so long that 
it affects speech intelligibility? Also, there is a subtle difference 
between filtering before and after the VCA on the synthesis side. When 
filtering after the VCA, the output filters also need to be woken up or 
release their energy, not only the analysis filter for that band, so 
the resonance beats us twice... Filtering before the VCA avoids this 
issue, since the amplitude transients do not go through the synthesis 
filters.
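
To put rough numbers on the reaction time, here is a minimal sketch 
for a single second-order resonator. It assumes the standard relations 
for such a section: a -3 dB bandwidth of B = f0/Q and an amplitude 
envelope that decays as exp(-pi*B*t). The 1 kHz center frequency and 
the Q-values are hypothetical examples, and a real multi-peak band 
filter will behave differently in detail.

```python
import math


def bandwidth_hz(f0_hz, q):
    """-3 dB bandwidth of a second-order resonator."""
    return f0_hz / q


def envelope_time_constant_s(f0_hz, q):
    """Time constant tau of the envelope exp(-t/tau), tau = Q / (pi * f0)."""
    return q / (math.pi * f0_hz)


def decay_time_s(f0_hz, q, drop_db=60.0):
    """Time for the ringing envelope to fall by drop_db decibels."""
    tau = envelope_time_constant_s(f0_hz, q)
    # exp(-t/tau) expressed in dB is -20*log10(e)*t/tau; solve for t.
    return tau * drop_db * math.log(10.0) / 20.0


if __name__ == "__main__":
    f0 = 1000.0  # assumed band center in Hz
    for q in (4.0, 20.0):  # assumed example Q-values
        print(
            f"Q={q:5.1f}  B={bandwidth_hz(f0, q):6.1f} Hz"
            f"  tau={1e3 * envelope_time_constant_s(f0, q):6.2f} ms"
            f"  60 dB decay={1e3 * decay_time_s(f0, q):6.2f} ms"
        )
```

The time constant scales directly with Q at a given center frequency, 
so the lower bands of a high-Q design are where a sluggish response 
would be expected to show up first.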

OK, if I still have some readers with me at this point... did you get my 
overall analysis and have something to say, experiences to share?

PS. First day at the summerhouse, and the weather does not allow 
anything but boredom. :)

Cheers,
Magnus


