[sdiy] Accuracy with integer maths
Scott Gravenhorst
music.maker at gte.net
Tue Feb 4 15:23:40 CET 2014
Tom Wiltshire <tom at electricdruid.net> wrote:
>Hi All,
>
>Does anyone know any good websites where I can learn about
>accuracy requirements in integer maths?
>
>Specifically, I'd like to know how many bits I need to calculate
>for various algorithms; multiplication, interpolation, etc. I've
>been doing some experiments and they suggest to me that I need
>more bits than I'd expected to eliminate errors. For example, if
>I take two 8-bit integers and multiply them, I get a 16-bit
>result. If I'm only interested in (say) the top 10 bits of that
>result, do I need to calculate the full 8x8 multiplication? Can I
>eliminate some of the calculation without getting any errors?
>What about if I needed 12 bits? Or what about if I were treating
>one 8-bit value as a 0 -> 0.99 amount value, and I multiply by
>the other 8-bit value and then use only the top 8 bits of the
>result.
>
>If anyone knows anywhere where there is a clear,
>programmer-orientated explanation of this sort of stuff, I'd be
>grateful. A serious academic number-theory treatment probably
>won't get through to me, since I'm not a (remotely!) serious
>academic or a number theoretician.
>
>Thanks!
>
>Tom
I'm probably not the most qualified to speak on the subject, but I do
have a comment based on some experience.
From my projects, I have seen that different applications and even
different components of these applications may require different bit
widths for fixed point. Consider a phase accumulator - let's start with
a ridiculously small 4 bits. We don't even need to do maths to see
that you can't get very much pitch resolution with 4 bits. This is
because with four bits, a one bit change amounts to 1/16 of the entire
pitch range. Each added bit halves that step, so increasing the number
of bits increases the resolution.
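The phase accumulator point can be put into numbers with a short
sketch. The function name phase_acc_step_hz and the 48 kHz sample rate
used in the note below are assumptions for illustration only, not
taken from any particular project. The smallest pitch step is the
sample rate divided by 2^N for an N-bit accumulator.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest frequency step of an N-bit phase accumulator.
 * Changing the increment by one LSB moves the output frequency by
 * sample_rate_hz / 2^bits. The name and signature are hypothetical. */
double phase_acc_step_hz(unsigned bits, double sample_rate_hz)
{
    /* Keep the shift below within the range of a 64-bit integer. */
    assert(bits >= 1 && bits <= 63);
    return sample_rate_hz / (double)(UINT64_C(1) << bits);
}
```

With the assumed 48 kHz rate, phase_acc_step_hz(4, 48000.0) gives a
3000 Hz step, which is useless for pitch, while 24 bits brings the
step below 0.003 Hz.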
I remember some work using a simple single pole low pass IIR filter
which started doing bad things when a0 was either near 0.0 or near 1.0.
In this case, the multiplies were truncated back to 18 bits and most
or all of the real information was in the low 18 bits of the product
and was tossed into the bit bucket. Increasing the filter's bit width
fixed the problem. Unfortunately, I can't give any math formulas that
can be used to compute where to draw the line. I think that multiplies
with truncated products are where a lot of fixed point error comes
from. I produced my working result experimentally after writing a C
program to display the products in full and truncated.
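The original C program is not shown here, so the following is only a
minimal sketch of the same idea: compute a product in full and then
truncated back to the operand width. It assumes unsigned 18-bit
fractional values (a real filter would use signed values), and the
names WIDTH, mul_full, mul_trunc and show_product are made up for
this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define WIDTH 18u  /* assumed operand width: unsigned 0.18 fractions */

/* Full-width product of two 18-bit values (up to 36 bits). */
uint64_t mul_full(uint32_t a, uint32_t b)
{
    assert(a < (1u << WIDTH) && b < (1u << WIDTH));
    return (uint64_t)a * b;
}

/* Product truncated back to 18 bits: the low 18 bits are discarded. */
uint32_t mul_trunc(uint32_t a, uint32_t b)
{
    return (uint32_t)(mul_full(a, b) >> WIDTH);
}

/* Print both forms side by side so the lost bits are visible. */
void show_product(uint32_t a, uint32_t b)
{
    printf("a=%lu b=%lu full=%llu truncated=%lu\n",
           (unsigned long)a, (unsigned long)b,
           (unsigned long long)mul_full(a, b),
           (unsigned long)mul_trunc(a, b));
}
```

With a coefficient near 0.0, for example show_product(3, 1000), the
full product is 3000 but the truncated product is 0, because the whole
result sits in the low 18 bits that get thrown away. Widening the
values moves that information back above the truncation point.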
Maybe others can expand/correct what I've written here.
-- ScottG
________________________________________________________________________
-- Scott Gravenhorst
-- FPGA MIDI Synth Info: jovianpyx.dyndns.org:8080/public/FPGA_synth/
-- FatMan Mods Etc.: jovianpyx.dyndns.org:8080/public/fatman/
-- Some Random Electronics Bits: jovianpyx.dyndns.org:8080/public/electronics/
-- When the going gets tough, the tough use the command line.
-- Matt 21:22