Speech Synthesizers

Martin Czech martin.czech at intermetall.de
Wed Nov 18 08:07:52 CET 1998


SNIP
> -well, the crucial element of speech is what's called formants.  formants
> are frequency bands of various widths that are emphasized to some degree
> (not a great text-book definition, but workable).  anyway, what makes a
> given vowel or consonant itself is which bands are emphasized by how much.
> now, to make words, you have to be able to morph between given sets of
> formants.  a vocoder accomplishes this by taking two signals, one control
> and one carrier.  (the control one would be a voice for speech).  it then
> complicated than an oscillator, a vca and an
> envelope follower...
SNIP

From my experiments, it is not so important exactly where the formants
are, or what their bandwidth is; the movement as a whole (yes, formants move
during speech, the mechanical equivalent of a bp vcf...) is the important
thing. The ear seems to evaluate how they move. If there are only three
main formants, it seems a bit strange to use a 20-channel fixed filter
bank for vocoder synthesis. But how can one extract formant data from
a given spectrum?
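
One naive way to approach the extraction question is to smooth the magnitude
spectrum and take its strongest local maxima as formant candidates. The sketch
below does only that, in plain Python. Everything in it is an assumption made
for illustration: the function names, the moving-average smoother, the 50 Hz
bin spacing, and the synthetic test spectrum with bumps at 700, 1200 and
2600 Hz. Real speech would probably need a better envelope estimate than a
moving average, so treat this as a starting point rather than a method.

```python
def smooth(values, width=5):
    """Moving average over `width` bins (an assumed, arbitrary choice)."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def formant_candidates(freqs, mags, count=3, width=5):
    """Frequencies of the `count` strongest local maxima of the smoothed
    spectrum, returned in ascending order of frequency."""
    env = smooth(mags, width)
    peaks = [(env[i], freqs[i])
             for i in range(1, len(env) - 1)
             if env[i] > env[i - 1] and env[i] >= env[i + 1]]
    strongest = sorted(peaks, reverse=True)[:count]
    return sorted(f for _, f in strongest)


def toy_spectrum(centres, bandwidth=120.0, step=50.0, top=4000.0):
    """Synthetic magnitude spectrum: a sum of resonance-shaped bumps.
    Purely illustrative, not measured speech data."""
    freqs = [i * step for i in range(int(top / step) + 1)]
    mags = [sum(1.0 / (1.0 + ((f - c) / bandwidth) ** 2) for c in centres)
            for f in freqs]
    return freqs, mags


freqs, mags = toy_spectrum([700.0, 1200.0, 2600.0])
print(formant_candidates(freqs, mags))
```

Running this on successive short frames of a recording would give a track of
candidate frequencies over time, which is the movement that seems to matter.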

It is quite instructive to try recording backwards speech, i.e. to try to
speak in such a way that the speech sounds correct when the tape is played
backwards. You will see that the written word is sometimes not a good
phonetic description: eeelreeelk tiawk zith zoosh snetnezz tsaaal ethh  !
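
Without a tape deck, the same experiment can be tried on a sound file. Below
is a minimal sketch using Python's standard `wave` module, assuming an
uncompressed PCM WAV recording; the function names and file paths are
hypothetical. It reverses the order of whole frames, so the bytes of each
sample and the channel order stay intact.

```python
import wave


def reverse_frames(raw, frame_size):
    """Reverse the order of whole frames, keeping each frame's bytes as is."""
    frames = [raw[i:i + frame_size] for i in range(0, len(raw), frame_size)]
    return b"".join(reversed(frames))


def reverse_wav(src_path, dst_path):
    """Write a time-reversed copy of a PCM WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        raw = src.readframes(src.getnframes())
    # One frame holds one sample for every channel.
    frame_size = params.sampwidth * params.nchannels
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(reverse_frames(raw, frame_size))
```

Usage would be something like `reverse_wav("attempt.wav", "attempt_reversed.wav")`,
with both file names made up here.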


Just my $0.02
m.c.





