[sdiy] speech formant frequencies
Tim Parkhurst
tim.parkhurst at gmail.com
Thu Jun 28 19:11:04 CEST 2007
On 6/28/07, Michael Buchstaller <buchi at allnet.de> wrote:
>
> >Interestingly, it's pretty intelligible once you read the accompanying
> >screen text -- but when I close my eyes I hardly can decipher the words.
>
> hmmm.... i thought i would be alone with that problem. It seems to be
> consistent over various kinds of speech generation (phoneme-based,
> vocal-tract based, various software text-to-speech engines...),
> when i read the text and hear it, i think it is clear and intelligible. But
> turning the head away from the written text, i can hardly understand
> anything...
> mfG
> -Michael Buchstaller
>
What most speech synthesizers (and vocoders) fail at is reproducing
the consonants, fricatives in particular. The steady-state vowel
sounds are relatively easy, but a great deal of the information in
speech lies in the percussive pops, clicks and hisses of the
consonants. Look at words like "buy," "sigh," and "why." The vowel
sound is identical in all three, so the only way to hear the
difference is the consonant at the start. To add to the complexity,
synthesizing a "b" (a short burst) is very different from
synthesizing a "k," an "s" (a sustained hiss), or a "w" (a glide).
Each one of these would require a very different patch on an analog
synth. This is why the best speech synthesizers are either
sample-based or digitally controlled. Check out
http://www.research.att.com/~ttsweb/tts/demo.php
I love this link. You can type in text, select a voice (there are
several to choose from), and hear (and even download a sound file of)
the results. I like to put in nonsense and hear the resulting
gibberish. Then again, that's me ;-)
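
For anyone who wants to poke at the vowel vs. consonant difference in
software, here is a rough Python sketch of the two "patches." It is
only an illustration, not how any particular synthesizer works. The
sample rate, the formant values for an "ah"-like vowel, the noise band
for an "s"-like hiss, and all the function names are my own
assumptions, picked to make the contrast obvious. The vowel is a
pulse train fed through a few resonators; the hiss is white noise fed
through one wide resonator.

```python
import math
import random

SAMPLE_RATE = 16000  # Hz; arbitrary choice for this sketch


def resonator(signal, freq_hz, bandwidth_hz):
    """Two-pole resonant filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)  # pole radius < 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vowel(duration_s, pitch_hz=120.0,
          formants=((700, 110), (1200, 120), (2600, 160))):
    """Steady vowel: periodic pulses through parallel formant resonators.

    The (frequency, bandwidth) pairs are rough, illustrative values for
    an "ah"-like sound, not measured data.
    """
    n = int(duration_s * SAMPLE_RATE)
    period = int(SAMPLE_RATE / pitch_hz)
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    bands = [resonator(source, f, bw) for f, bw in formants]
    return [sum(samples) for samples in zip(*bands)]


def fricative(duration_s, center_hz=5500.0, bandwidth_hz=2000.0, seed=0):
    """'s'-like hiss: white noise through one wide, high resonator."""
    rng = random.Random(seed)
    n = int(duration_s * SAMPLE_RATE)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return resonator(noise, center_hz, bandwidth_hz)


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(signal, signal[1:])
                    if (a < 0.0) != (b < 0.0))
    return crossings / (len(signal) - 1)


if __name__ == "__main__":
    print("vowel zero-crossing rate:", zero_crossing_rate(vowel(0.2)))
    print("hiss zero-crossing rate: ", zero_crossing_rate(fricative(0.2)))
```

The zero-crossing rate is just a crude way to see the difference
without listening: the noisy hiss should change sign far more often
than the pitched vowel. Note that the two sounds need different
sources (pulses vs. noise) as well as different filters, which is the
"different patch" problem in miniature.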
Tim (I, for one, welcome our speech synthesizing robot overlords) Servo
--
"Imagination is more important than knowledge." - Albert Einstein