[sdiy] STM32 processor
Tom Wiltshire
tom at electricdruid.net
Thu Sep 15 21:56:57 CEST 2011
On 15 Sep 2011, at 19:08, Eric Brombaugh wrote:
> On 09/15/2011 10:41 AM, Olivier Gillet wrote:
>> Has anybody tried to benchmark those (esp. the 72 MHz parts) against
>> dsPICs for audio generation or audio effects? I'm curious to know
>> whether 40 Mhz clock + 16 bits instruction set with DSP features can
>> beat 72 Mhz clock (with wait states from the flash) + generic 32 bits
>> instruction set.
>
> That was part of the rationale behind my experiment with the 'F100 part - to see how it compares with the dsPICs I've used previously. Although this one is running at just 24MHz compared to the 40MHz that the dsPIC runs (no waits in either), I found that the 32-bit native register operations provided a 'force multiplier' effect that erased some of the difference in cycle speed. The interpolated wavetable oscillator algorithms compare quite favorably running on the two different architectures.
>
> The dsPIC does have an advantage in providing true DSP capabilities though - zero-overhead looping hardware, dual operand buses with parallel address calculations and true single-cycle MAC instructions that greatly accelerate filters, interpolations, etc. The ARM architecture won't be able to match that without a behind-the-scenes overhaul which even the newer Cortex-M4 series doesn't really provide based on my reading of the documentation.
If you can take full advantage of the DSP features of the dsPIC, you can really turn up the performance of what is a fairly basic chip. The single-cycle MAC that Eric mentioned can also increment/decrement two pointers and prefetch two operands into registers ready for the next multiply. That is essentially five actions from a single instruction, in a single cycle. Obviously the more of those you can squeeze into your code, the better. In many situations it's worth rewriting things just to use them.
Arrangement of variables in RAM (or even ROM) becomes key too, so that extensive use of pointers is a realistic possibility. There are only 15 registers available to the user, so after you use some for pointers and some for values, you don't get many of each. But if the variables are in the right order, it's possible to start a code loop using a register as a pointer to one variable, move on to another, and finally use it for a third, leaving yourself in the right place ready to start the next loop (scheme below).
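To make the shape of that concrete, here is a rough sketch in C rather than dsPIC assembly. It is not code from my project: the name mac_dot and the use of a 32-bit accumulator are assumptions for illustration only. Each pass of the loop does one multiply-accumulate and post-increments two pointers, which is the pattern the MAC instruction is built for. Whether a given C compiler actually turns this into a single MAC per iteration is not guaranteed, which is why I end up in hand-written assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): a dot product written in the
 * shape a MAC loop wants. Each iteration is one multiply-accumulate plus
 * two pointer post-increments. */
int32_t mac_dot(const int16_t *x, const int16_t *y, size_t n)
{
    assert(x != NULL && y != NULL);

    int32_t acc = 0;  /* stands in for the hardware accumulator */
    while (n--) {
        /* fetch both operands, advance both pointers, multiply, accumulate */
        acc += (int32_t)(*x++) * (int32_t)(*y++);
    }
    return acc;
}
```

The point of writing it this way is that nothing else happens inside the loop: no index arithmetic, no conditionals, just the two fetches and the accumulate.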
What I've been learning recently is not how the chip works, but how to work the chip! You can get a lot more out of it, but it's hand-tweaked assembly coding, which isn't everyone's cup of tea.
Tom
<low-level coding example only for people who are interested>
Say I've got three variables that I use in my code (A, B and C) for each of four oscillators.
I can put the variables in memory like this:
A0, A1, A2, A3
B0, B1, B2, B3
C0, C1, C2, C3
Then I need three pointers to them:
Pa, Pb, Pc
I set up a loop to process oscillators, and after I've used each variable for a given oscillator, I increment its pointer ready for the next oscillator.
This works and isn't bad, but you soon run out of pointers.
Instead, if you arrange the variables like this:
A0, B0, C0, A1, B1, C1, A2, B2, C2, A3, B3, C3
You can use a single pointer, increment it after each use, get the next variable you need, carry on, and when the loop is finished, you're in the right place to start the next cycle. The whole thing only uses one pointer.
Obviously this demands that the variables are used in a certain order, and that they are always used (e.g. their use can't be conditional). But you can usually find some of a group of variables that fit these conditions, and arrange them such that you can save several pointers.
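For anyone who would rather see the two layouts side by side, here is a minimal sketch in C. The function names, the 16-bit types and the per-oscillator calculation (A * B + C, summed over the oscillators) are all made up for illustration; what matters is only how many pointers each version needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_OSC 4

/* Layout 1: A0..A3, B0..B3, C0..C3. One pointer per variable (Pa, Pb, Pc).
 * The A * B + C sum is a placeholder calculation, not a real oscillator. */
int32_t process_planar(const int16_t *pa, const int16_t *pb, const int16_t *pc)
{
    assert(pa != NULL && pb != NULL && pc != NULL);

    int32_t sum = 0;
    for (size_t i = 0; i < NUM_OSC; i++) {
        int32_t a = *pa++;  /* each pointer steps on to the next oscillator */
        int32_t b = *pb++;
        int32_t c = *pc++;
        sum += a * b + c;
    }
    return sum;
}

/* Layout 2: A0, B0, C0, A1, B1, C1, ... One pointer walks the whole block,
 * provided A, B and C are always read, and always in that order. */
int32_t process_interleaved(const int16_t *p)
{
    assert(p != NULL);

    int32_t sum = 0;
    for (size_t i = 0; i < NUM_OSC; i++) {
        int32_t a = *p++;
        int32_t b = *p++;
        int32_t c = *p++;   /* p now points at the next oscillator's A */
        sum += a * b + c;
    }
    return sum;
}
```

Both versions do the same arithmetic. The second just ties up one pointer register instead of three, at the cost of fixing the order in which the variables are read.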