[sdiy] Approximating sine with plain integer math

John Ames commodorejohn at gmail.com
Thu Apr 7 19:43:21 CEST 2016


On 4/6/16, Brian Willoughby <brianw at audiobanshee.com> wrote:
> Your latest code uses 3 multiply, 1 divide, 1 subtract, and 2 branches (the
> ?: operator).
> I believe that the SVF can be implemented with only 2 multiply and 1 add.
> It's rather significant that there are no branches in the SVF calculations,
> because most pipelined processors lose performance when branching. Some
> processor instruction sets, such as the PowerPC AltiVec, have single opcodes
> that can pick one of two results based on a condition, allowing things like
> "(saw < 0) ? -1 : 1" to be implemented without flushing the pipeline, but
> it's far easier to work with algorithms that don't branch at all, assuming
> you want performance.
>
> Granted, one of your multiply operations can be implemented as a 2-bit shift
> (i.e. * 4). However, that divide instruction could be expensive unless you
> use fixed point fractions and multiply by the equivalent of 1/INT_MAX. That
> makes your algorithm 3 multiple, 1 2-bit shift, 1 subtract, and 2 branches.
> Still a lot more than the SVF.
Also, to come back to this briefly, this is a totally correct
assessment performance-wise, but the point of the C implementation was
just as a refernce implementation. In a situation where it actually
made a difference, I'd take a different approach.

For starters, only one branch would really be required, as the
expression with the ternary operator is the same in both places so you
can save the result. Also, in an architecture with conditional
execution (ARM, for example,) you don't even need a full branch to
implement it; if I remember correctly, skipped conditional
instructions don't even really cause a bubble in the pipeline. And if
branches are really that expensive, there's ways to bit-juggle things
so that you can get a functionally equivalent result (-1 or +1
depending on the sign of the input) without ever branching, or even
doing a test. (Example in a minute.)

And yes, you can absolutely reduce one multiply and one divide to
shift operations, which gets it down to two multiplies, two shifts,
and a subtract. You can further change the multiply by the sign into a
negate instruction if you have conditional execution or branches
aren't expensive.

So, as a hypothetical exercise, here's how the algorithm would look if
we were optimizing for performance in J. Random Assembly Language:

1. Take the sawtooth value as an input.
2. Square it; save the original value in another register.
3. Shift right by (sample width.)

(Deriving the sign)
4. Get a copy of the original value.
5. AND it with -max (i.e. 0x8000 in a 16-bit word.)
6. Arithmetic-shift right by (sample width - 2) - duplicates the MSB
to all but the lowest place.
7. Increment by 1 (results in -1 if the input was negative, +1 if it was not.)

9. Multiply the square of the input by the sign of the input.
9a. Or, negate the square of the input if the input was negative (if
this is less expensive.)
10. Subtract the original value.
11. Add the sign of the original value (overflow fix.)
12. Shift left by 2 (multiplies by 4.)
13. Negate, if you want your sine non-inverted.

Now, whether this is "fast" depends entirely on what architecture
you're on, and whether it's faster than SVF, I dunno - but it is
totally possible to implement this technique in a simple,
non-branching form using only one or two multiplies if that's what's
important.



More information about the Synth-diy mailing list