Speech Synthesizers

Ethan Duni eduni at ucsd.edu
Wed Nov 18 05:36:24 CET 1998


At 02:05 PM 11/18/98 +1000, you wrote:
>>A vocoder uses some number of envelopes following a signal (the voice) to
>>determine the amplitude of bands across a harmonically rich signal, right?
>>Can you use real envelopes to do the same thing?  Not necessarily ADSRs,
>>but some means of "shaping" these lines.  Is this how (some) speech
>>synthesizers work?  Is this why a vocoder sounds so much like a
>>speak-and-spell?  Would you use filtered harmonics for these lines, or
>>just put an osc somewhere within each freq band?  Any thoughts?
>
>
>Ditto ; )
>
>I'd kinda assumed you could make one by just controlling the VCA of an
>oscillator (manually controlling frequency) using an envelope follower fed
>by a mic, but I've been told this is not the case.
>What else do you need?  Can such a thing be made CHEAP, say just shaping
>the audio out of a synth?

-well, the crucial element of speech is what are called formants.  formants
are frequency bands of various widths that are emphasized to some degree
(not a great textbook definition, but workable).  anyway, what makes a
given vowel or consonant sound like itself is which bands are emphasized,
and by how much.
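
To make "which bands are emphasized, and by how much" concrete, here is a minimal Python sketch. It assumes a vowel can be modelled as a few resonant peaks, with two-pole resonators in cascade (the approach used in cascade formant synthesizers). The sample rate, the helper names `resonator_gain` and `formant_envelope`, and the formant numbers are all illustrative assumptions, not measured data.

```python
import math

SAMPLE_RATE = 8000.0  # assumed sample rate, Hz


def resonator_gain(freq, center, bandwidth, fs=SAMPLE_RATE):
    """Magnitude response at `freq` Hz of a two-pole resonator
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], normalized to unity gain at DC."""
    r = math.exp(-math.pi * bandwidth / fs)     # pole radius: narrower band -> closer to 1
    theta = 2 * math.pi * center / fs           # pole angle: sets the center frequency
    b = 2 * r * math.cos(theta)
    c = -r * r
    a = 1 - b - c                               # scales the DC gain to exactly 1
    w = 2 * math.pi * freq / fs
    z_inv = complex(math.cos(w), -math.sin(w))  # e^{-jw}
    return abs(a / (1 - b * z_inv - c * z_inv * z_inv))


def formant_envelope(freq, formants):
    """Gain at `freq` of resonators in cascade, one per (center, bandwidth) pair."""
    gain = 1.0
    for center, bandwidth in formants:
        gain *= resonator_gain(freq, center, bandwidth)
    return gain


# Hypothetical formant sets (center Hz, bandwidth Hz), only roughly
# like an "ee" and an "ah"; chosen for illustration.
EE = [(300, 60), (2300, 100)]
AH = [(700, 80), (1200, 90)]

for f in (300, 700, 1200, 2300):
    print(f, round(formant_envelope(f, EE), 2), round(formant_envelope(f, AH), 2))
```

The printed table shows the two "vowels" as different gain-versus-frequency curves: same kind of filter, different peak positions.
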
now, to make words, you have to be able to morph between given sets of
formants.  a vocoder accomplishes this by taking two signals, one control
and one carrier (the control would be a voice, for speech).  it then
runs both of these signals through identical banks of bandpass filters,
which split each signal into as many frequency regions as there are
filters.  the outputs of the filters on the control signal are run through
envelope followers (which give you a voltage proportional to the volume of
the signal in that band), and the outputs of the filters on the carrier are
run through VCAs controlled by those envelope followers.  so whichever
frequency regions were loud in the control signal are emphasized in the
carrier, and the quiet ones are attenuated, in roughly the same
proportions.  only roughly, because the carrier's energy is distributed
differently across the bands (unless you used the same signal for both,
and why would you?): there may be no energy in a given band to boost or
cut, or the balance between bands will simply be different.  this is why
you use a "harmonically rich" carrier, so that every band has something in
it.  so whatever formants were in the control signal end up impressed on
the carrier, and they change in the same ways as the control signal's.  if
you wanted to do speech specifically, you could design the filters so that
they were centered around the important formants in speech.
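
The signal flow just described (two identical filter banks, envelope followers on the control side, VCAs on the carrier side) can be sketched in plain Python. This is a minimal, slow, sample-by-sample sketch under assumptions made only for illustration: the band-pass filters are RBJ-cookbook biquads, the envelope follower is a full-wave rectifier plus a one-pole low-pass with an arbitrary 30 Hz cutoff, both signals are equal-length lists of floats at the same sample rate, and `vocode` and the other function names are hypothetical.

```python
import math


def bandpass_coeffs(center, q, fs):
    """Band-pass biquad (RBJ audio-EQ cookbook form, 0 dB gain at `center`)."""
    w0 = 2 * math.pi * center / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def biquad(signal, b, a):
    """Run `signal` through one biquad (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope_follower(signal, fs, cutoff=30.0):
    """Full-wave rectify, then smooth with a one-pole low-pass,
    so the output tracks how loud the band is."""
    k = 1 - math.exp(-2 * math.pi * cutoff / fs)
    level, env = 0.0, []
    for x in signal:
        level += k * (abs(x) - level)
        env.append(level)
    return env


def vocode(control, carrier, centers, fs, q=5.0):
    """Channel vocoder: identical band-pass banks on both signals, one
    envelope follower per control band driving one 'VCA' per carrier band.
    Assumes len(control) == len(carrier)."""
    out = [0.0] * len(carrier)
    for fc in centers:
        b, a = bandpass_coeffs(fc, q, fs)
        env = envelope_follower(biquad(control, b, a), fs)  # analysis side
        band = biquad(carrier, b, a)                        # synthesis side
        for n in range(len(out)):
            out[n] += env[n] * band[n]                      # VCA = multiply
    return out
```

In use, `control` would be the voice and `carrier` something like a sawtooth, with `centers` spread over the speech range; with a single band and no filters the whole thing collapses back to the oscillator, VCA and envelope follower from the question.
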

so, yeah, it's a bit more complicated than an oscillator, a vca and an
envelope follower...

yes, it's theoretically possible to use envelopes instead of
speech->filters->envelope followers to control the carrier section of a
vocoder, but you'd need one envelope per band, and i'm thinking you'd need
a lot of stages in each of them to make it say anything other than
"eeeaaaaaoooohh".. and it would be a pain in the ass to program it to say
anything.. fun though...
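
Replacing the analysis side with envelopes means one multi-stage envelope per carrier band. A minimal Python sketch, assuming the band gains for each target sound are typed in by hand: `breakpoint_envelope`, `TARGETS`, the other helper names, and all gain values are hypothetical. The resulting per-band gains would drive the carrier-band VCAs the way the envelope followers do in a vocoder.

```python
def breakpoint_envelope(points, t):
    """Multi-stage envelope: straight lines between (time_s, level) points,
    holding the first level before the start and the last level after the end."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]


# Hypothetical band gains for a 4-band carrier section; the numbers are made
# up for illustration, not measured from real vowels.
TARGETS = {
    "ee": [0.9, 0.1, 0.2, 0.8],
    "ah": [0.3, 0.9, 0.7, 0.2],
    "oh": [0.8, 0.6, 0.1, 0.05],
}


def envelopes_from_script(script):
    """Turn [(time_s, target_name), ...] into one breakpoint list per band."""
    n_bands = len(TARGETS[script[0][1]])
    return [[(t, TARGETS[name][band]) for t, name in script]
            for band in range(n_bands)]


def band_gains_at(envelopes, t):
    """Gain for each carrier-band VCA at time t (seconds)."""
    return [breakpoint_envelope(env, t) for env in envelopes]


envs = envelopes_from_script([(0.0, "ee"), (0.3, "ah"), (0.6, "oh")])
print(band_gains_at(envs, 0.15))  # halfway between "ee" and "ah"
```

Even this three-target script only manages an "ee-ah-oh" glide, which gives a feel for how many breakpoints per band a real word would need.
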

i thought there was some device called a "voder" (?) that had keys that
represented various sets of formants that you could press sequentially to
form words.. anyone?

Ethan
