[sdiy] Bandwidth vs. resonance in the world of vocoders
Magnus Danielson
magnus at rubidium.dyndns.org
Wed Jul 13 17:18:19 CEST 2011
Dear fellow synth-diyers,
I wanted to revisit the effect of resonance in vocoder analysis and
synthesis filters and its relation to the response to transients/speech.
There are essentially two basic roads to go down when designing vocoder
analysis and synthesis filters: either Butterworth-style bandpass
responses, where the steep slopes derive from the filter order, or
high-resonance filters, where the steep slopes derive not from the filter
order but from the high Q-values. The latter strategy allows for
lower-order filters, making them cheaper.
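To put rough numbers on the two roads, here is a minimal sketch. It
assumes the "Butterworth-style" bandpass comes from the standard
lowpass-to-bandpass transformation, with frequencies normalized to the
band center; the function names, the order (3) and the Q (20) are
illustrative choices, not values from this post.

```python
import math

def butterworth_bandpass_db(w, order, rel_bw):
    """Gain in dB of a Butterworth bandpass built from an `order`-pole
    lowpass prototype (standard lowpass-to-bandpass transformation).
    w is frequency relative to the band center, rel_bw is the -3 dB
    bandwidth relative to the band center."""
    x = (w - 1.0 / w) / rel_bw
    return -10.0 * math.log10(1.0 + x ** (2 * order))

def resonator_db(w, q):
    """Single 2nd-order bandpass with unity gain at the center.
    This is the order-1 case above with rel_bw = 1/q."""
    return butterworth_bandpass_db(w, 1, 1.0 / q)

# Rejection one octave above the band center (w = 2) for both roads.
wide_high_order = butterworth_bandpass_db(2.0, order=3, rel_bw=0.5)
narrow_low_order = resonator_db(2.0, q=20.0)
print(f"6th-order Butterworth bandpass, bandwidth 0.50 x center: {wide_high_order:6.1f} dB")
print(f"2nd-order resonator, Q = 20,    bandwidth 0.05 x center: {narrow_low_order:6.1f} dB")
```

Under these assumptions both filters give about the same rejection one
octave out, but the resonator gets there with a third of the order and a
passband ten times narrower.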
Now, the slopes of traditional filters like Butterworth are a
matter of placing the zeroes at one end or the other of the
spectrum, while the poles are not very resonant at all. The slopes are
determined by how many zeroes are placed on which side of the center
frequency, and you can place as many zeroes as there are poles (in fact
you must always place exactly the same number, but you rarely need to
see this grim detail, it gets done anyway). Each zero is good for 6
dB/Oct, so if you have two, you can get a 12 dB lowpass slope, 6 dB
lowpass + 6 dB highpass, or a 12 dB highpass slope. You can also use one
or both zeroes for all-pass responses, but that's beside the point I want
to make.
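The two-zero case can be checked numerically. The sketch below assumes a
normalized second-order section H(s) = N(s) / (s^2 + s/Q + 1) with a
low, Butterworth-like Q, and only moves the zeroes between DC and
infinity; the helper names are made up for illustration.

```python
import math

def gain_db(num, w, q=0.707):
    """Gain in dB at normalized frequency w for
    H(s) = num(s) / (s^2 + s/q + 1), evaluated at s = jw."""
    s = complex(0.0, w)
    den = s * s + s / q + 1.0
    return 20.0 * math.log10(abs(num(s) / den))

def slope_db_per_octave(num, w):
    """Gain change over the octave from w to 2*w."""
    return gain_db(num, 2.0 * w) - gain_db(num, w)

# Same poles in all three cases; only the zero placement differs.
lowpass = lambda s: 1.0      # both zeroes at infinity
bandpass = lambda s: s       # one zero at DC, one at infinity
highpass = lambda s: s * s   # both zeroes at DC

for name, num in (("lowpass", lowpass), ("bandpass", bandpass), ("highpass", highpass)):
    below = slope_db_per_octave(num, 0.001)   # far below the center frequency
    above = slope_db_per_octave(num, 1000.0)  # far above the center frequency
    print(f"{name:9s} below center: {below:+6.2f} dB/oct   above center: {above:+6.2f} dB/oct")
```

Each zero moved to DC adds 6 dB/Oct of highpass slope and takes 6 dB/Oct
away from the lowpass side, which gives the three combinations listed
above.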
The other strategy confuses us at first, since the narrowing slopes are
not derived from the number of zeroes available (usually confused with
the number of poles, but it's really the degree of the system) but from
using very high Q-values. As you move away from the center frequency the
resonance rolls off and the remaining response is whatever the
zero placement gives, but much further down, since the peak responses
have been included in the gain structure: the overall response is damped
such that the average pass-band is at about unity gain.
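A small sketch of that "same skirt, but further down" behaviour,
assuming a single second-order bandpass normalized to unity gain at its
center, H(s) = (s/Q) / (s^2 + s/Q + 1). The Q values are arbitrary
illustrations.

```python
import math

def bandpass_gain_db(w, q):
    """Gain in dB of H(s) = (s/q) / (s^2 + s/q + 1) at s = jw.
    The peak at w = 1 is normalized to 0 dB for every q."""
    s = complex(0.0, w)
    h = (s / q) / (s * s + s / q + 1.0)
    return 20.0 * math.log10(abs(h))

for q in (1.0, 5.0, 20.0):
    one_octave = bandpass_gain_db(2.0, q)
    # Far away from the center only the zero placement is left: 6 dB/Oct.
    far_slope = bandpass_gain_db(2000.0, q) - bandpass_gain_db(1000.0, q)
    far_level = bandpass_gain_db(1000.0, q)
    print(f"Q = {q:4.1f}  one octave up: {one_octave:6.1f} dB  "
          f"far slope: {far_slope:+5.2f} dB/oct  level at 1000 x center: {far_level:6.1f} dB")
```

The far slope stays at 6 dB/Oct whatever the Q, but the whole skirt
drops by roughly 20*log10(Q) once the peak is scaled back to unity.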
So far I have only explained the strategies at hand. Now I come to the
issue I would like to discuss. What is the reaction time of these
strategies, and how will it affect the speech intelligibility of the
vocoder?
The more narrowband a filter is, the longer its reaction time, and
the second strategy uses even narrower bands next to each other (2 to 4
spread-out peaks are used). Can the reaction time be so long that it
will affect the speech intelligibility? Also, there is a subtle
difference between filtering before and after the VCA on the
synthesis side. When filtering after the VCA, the output filters also
need to be woken up or release their energy, and not only the analysis
filter for that band, so the resonance beats us twice... filtering prior
to the VCA avoids this issue, since the amplitude transients do not go
through the synthesis filters.
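For a feel of the time scales, here is a rough sketch assuming each peak
behaves like a single second-order resonator, whose ringing envelope
decays as exp(-t/tau) with tau = Q / (pi * f0) = 1 / (pi * bandwidth).
The center frequencies and Q values are illustrative only and are not
taken from any particular vocoder.

```python
import math

def time_constant_s(f0_hz, q):
    """Envelope time constant of a 2nd-order resonator with center
    frequency f0_hz and quality factor q: tau = q / (pi * f0)."""
    return q / (math.pi * f0_hz)

def decay_time_s(f0_hz, q, drop_db=20.0):
    """Time for the ringing envelope to fall by drop_db decibels
    after the input stops."""
    return time_constant_s(f0_hz, q) * math.log(10.0) * drop_db / 20.0

for f0_hz in (200.0, 1000.0, 3000.0):
    for q in (3.0, 20.0):
        tau_ms = 1e3 * time_constant_s(f0_hz, q)
        t20_ms = 1e3 * decay_time_s(f0_hz, q)
        print(f"f0 = {f0_hz:6.0f} Hz  Q = {q:4.1f}  "
              f"tau = {tau_ms:6.2f} ms  -20 dB after {t20_ms:6.2f} ms")
```

The time constant grows linearly with Q and is worst in the low bands.
With the synthesis filter after the VCA, a second resonator of this kind
sits in the path of the amplitude transient, so a comparable delay is
paid once more.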
OK, if I still have some readers with me at this point... did you get my
overall analysis and have something to say, experiences to share?
PS. First day at the summerhouse, and the weather does not allow
anything but boredom. :)
Cheers,
Magnus