[sdiy] Some Audio DSP prototypes

sleepy_dog at gmx.de
Wed Apr 20 22:38:58 CEST 2022

Mike Bryant:
> > E.g. "For the vast majority of benchmarks the LLVM Clang vs. GCC
> > performance was quite close"
> > and "The NCNN neural network inference library from Tencent was
> > performing hugely better when built under the GCC compiler"
> > https://www.phoronix.com/scan.php?page=article&item=apple-m1-compilers
> I’m afraid I see this sort of comment all the time.  Since 2014,
> Apple, ARM, Google, IBM et al have poured a fortune into developing
> LLVM as development of gcc had gone down a bit of a dead end.
> Comments like they are ‘quite close’ tend to come from people using
> the free version of Clang which is years out of date, rather than the
> paid for versions such as ARM Compiler or Xcode which are two
> generations newer with many improvements.  And as GNU simply copied
> some of the optimisations of the LLVM project into gcc without even
> referencing those copies, it is quite possible future improvements may
> never be introduced into the open source versions of LLVM, or at least
> kept back a few generations.
> In our applications I generally see anything from 8% for general logic
> (an example you can try is the Circle RTOS for Raspberry Pis) to 34%
> for DSP functions (which is our main area of expertise so definitely
> application specific but far too large a gain to throw away).
> However I have one particular routine for a high speed multiplexed
> digital drop and accumulate/insert bus where the improvement is over
> 200% as no matter what -Ox I applied, gcc simply couldn’t see an
> obvious register optimisation that stopped it using the stack.


> You can force it to use registers by setting -O0 but then it doesn’t
> optimise the logic as you can’t switch optimisation levels inside
> functions.
I guess one could refactor the code so that the bit needing this trick
is an inline function tagged with __attribute__((optimize("O0")))?
I've never tried what happens with inlining there, i.e. whether the
optimization level of the surrounding function overrides it... but
when I just throw this at GCC, it doesn't complain that the attribute
is moot, as it tends to do when you do pointless stuff.

> But ARM compiler saw the obvious straight away.  This is the routine
> I’m cutting and pasting into RP2040 code.
> Since about 2017, I’ve only found one thing that was better with gcc,
> and that was a custom Linux build we used for an audio analyser
> product, which of course is the thing gcc is optimised for and on
> which it is tested.  But Android, which is based on Linux of course,
> has been developed using ARM Compiler since 2014 and is usually
> compiled on it or Clang.  Similarly with MacOS and iOS.
> The thing that amazes me is people spend ages trying to overclock
> processors using expensive cooling systems to get maybe a 10%
> improvement, yet ignore similar gains available by just getting a
> decent compiler.

I guess some people don't like depending on the ever-changing terms
(read: the whims) of commercial tool vendors when they can avoid it,
especially smaller outfits.
