[sdiy] Raspberry Pi 2 Synthesizer Project

ASSI Stromeko at nexgo.de
Sun Feb 7 18:29:54 CET 2016


On Sunday 07 February 2016, 07:47:01, Eric Brombaugh wrote:
> I'd be interested to find out if you have any success with optimizing via
> hand-written assembly language.

Going to plain assembly is usually not the best option unless you really need 
to wring out the last nanosecond (and then it gets really hard).  There's 
quite a bit of house-keeping you'd need to do (for each of the pipelines you 
use), so you'd probably need a cycle-accurate simulator showing you what's 
really going on.

> Like you, I started my DSP work in dsPIC
> assembly and felt pretty competent at wringing the most from that
> architecture. Lately though I've been doing a fair amount of ARM-based
> synth coding using GCC and have made several stabs at assembly without
> being able to improve on the performance of optimized C. Digging into the
> disassembled C code it becomes obvious that the compiler is doing a fair
> amount of loop unrolling and interleaving to make maximum use of pipeline
> latencies. Keeping track of pipeline requirements while hand-coding
> assembly seems like it would be pretty difficult, especially given that
> the exact nature of the pipeline changes from one ARM architecture to
> another, and I find myself using a variety of CPUs in different projects.

While compilers have been getting impressively good at that sort of thing, 
there's always the problem of how to tell them to not cater for corner cases 
that you know cannot happen.  Also, general scheduling is NP-hard, so unless 
you have something that can be exhaustively solved or falls into some 
category that the compiler already implements good heuristics for, you'd 
still need to hand-hold the compiler one way or the other.
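
As a rough illustration of what "telling the compiler" can look like in C,
here is a minimal sketch.  The function name mix_block and the
multiple-of-four block size are assumptions made up for this example;
`restrict` is standard C99 and __builtin_unreachable() is a GCC/Clang
extension.  Whether the generated code actually improves depends on the
compiler version and target, so check the disassembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mixing loop: out[i] += gain * in[i].
 * 'restrict' promises that out and in never overlap, so the compiler
 * does not have to generate code for the aliasing case. */
void mix_block(float *restrict out, const float *restrict in,
               float gain, size_t n)
{
    /* Debug builds verify the promise made below. */
    assert(n % 4 == 0);
#if defined(__GNUC__)
    /* Assumption for this sketch: blocks are always a multiple of four
     * samples.  Marking the other case unreachable lets the optimizer
     * drop the remainder loop after vectorizing or unrolling. */
    if (n % 4 != 0)
        __builtin_unreachable();
#endif
    for (size_t i = 0; i < n; i++)
        out[i] += gain * in[i];
}
```

If a promise like this is ever broken the behaviour is undefined, which is
why the assert is kept for debug builds.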

> In the end I've concluded that for the sake of productivity it's better to
> write my C code carefully and try a couple optimization settings in the
> compiler, using the one that performs best (the results seem to vary from
> project to project and between compiler versions) rather than worry too
> much about the minutiae of the assembly. I suspect that even if I were
> able to improve on the compiler's results the gains would be fairly
> minor.

I've not done that for ARM, but on other architectures I've used compiler 
intrinsics and maybe some inline assembler to good effect.  That typically 
gets you to over 90% of the theoretical peak performance.  The one time I've 
needed to code up assembler directly was when UltraSPARC added VIS (Visual 
Instruction Set) and the compilers didn't know it yet.  But that was only a 
few inner loops; the loop setup, intro and wind-down were still done in C++.  
The assembly itself was produced with M4 macros, which kept the alignment 
constraints and counted issue slots as well as (optionally) unrolling the 
loop.  In all those cases you'd still start with a plain C (or C++) 
implementation that you can fall back to on new architectures you don't have 
optimized code for.
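
To make the "intrinsics plus plain C fallback" pattern concrete for ARM,
here is a small sketch.  It is not code from any of the projects mentioned;
the function name mul_add is made up for illustration.  The NEON intrinsics
come from <arm_neon.h> and the path is only compiled when the compiler
defines __ARM_NEON (on a Pi 2 that presumably means building with
something like -mfpu=neon-vfpv4).  Everywhere else the plain C loop does
all the work.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for n samples. */
void mul_add(float *out, const float *a, const float *b,
             const float *c, size_t n)
{
    size_t i = 0;

    assert(out && a && b && c);
#if defined(__ARM_NEON)
    /* Optimized path: four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t vc = vld1q_f32(c + i);
        /* vmlaq_f32(acc, x, y) computes acc + x * y per lane. */
        vst1q_f32(out + i, vmlaq_f32(vc, va, vb));
    }
#endif
    /* Plain C: handles the leftover samples after the NEON loop, and
     * the whole block on targets without an optimized path. */
    for (; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

The scalar loop doubles as the reference implementation to test the
optimized path against.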


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf microQ V2.22R2:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada



