[sdiy] Some Audio DSP prototypes
Mike Bryant
mbryant at futurehorizons.com
Wed Apr 20 21:55:50 CEST 2022
From: Synth-diy [mailto:synth-diy-bounces at synth-diy.org] On Behalf Of Steve via Synth-diy
>> But main issue is the last time I looked STM still hadn’t integrated Clang into CubeIDE which makes it useless for any application where performance is essential.
> Do you have anything to point to, to substantiate this idea of Clang's general superiority?
> Could it be application specific?
> E.g. "For the vast majority of benchmarks the LLVM Clang vs. GCC performance was quite close"
> and "The NCNN neural network inference library from Tencent was performing hugely better when built under the GCC compiler"
> https://www.phoronix.com/scan.php?page=article&item=apple-m1-compilers
I’m afraid I see this sort of comment all the time. Since 2014, Apple, ARM, Google, IBM et al. have poured a fortune into developing LLVM, because gcc development had gone down something of a dead end. Comments that the two are ‘quite close’ tend to come from people using the free version of Clang, which is years out of date, rather than commercial or vendor-maintained versions such as ARM Compiler or Xcode, which are two generations newer with many improvements. And as GNU simply copied some of the LLVM project’s optimisations into gcc without even acknowledging them, it is quite possible that future improvements will never make it into the open-source versions of LLVM, or will at least be held back a few generations.
In our applications I generally see performance gains of anywhere from 8% for general logic (an example you can try is the Circle RTOS for Raspberry Pi) to 34% for DSP functions. DSP is our main area of expertise, so that figure is definitely application specific, but it is far too large a gain to throw away.
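For anyone wanting to check figures like these on their own target, here is an illustrative sketch of the kind of inner loop where DSP differences tend to show up: a Q15 FIR dot product with a 64-bit accumulator, rounding and saturation. It is not code from the products mentioned above; the function name `fir_q15` and the data layout (input history already ordered so that `x[k]` pairs with coefficient `h[k]`) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative Q15 FIR dot product (hypothetical, not production code).
 * Assumed layout: x[] holds the most recent n_taps input samples, already
 * ordered so that x[k] pairs with coefficient h[k].
 * Each product is Q30; products are summed in a 64-bit accumulator so that
 * practical filter lengths cannot overflow, then rounded and saturated to Q15.
 */
int16_t fir_q15(const int16_t *restrict x, const int16_t *restrict h,
                size_t n_taps)
{
    assert(n_taps == 0 || (x != NULL && h != NULL));

    int64_t acc = 0;
    for (size_t k = 0; k < n_taps; ++k) {
        acc += (int32_t)x[k] * h[k];   /* Q15 x Q15 -> Q30 multiply-accumulate */
    }

    acc += INT64_C(1) << 14;           /* add half an output LSB: round to nearest, halves up */
    acc >>= 15;                        /* Q30 -> Q15; arithmetic shift on GCC and Clang */

    if (acc > INT16_MAX) {             /* saturate rather than wrap */
        acc = INT16_MAX;
    } else if (acc < INT16_MIN) {
        acc = INT16_MIN;
    }
    return (int16_t)acc;
}
```

Compiling a loop like this (or, better, your own hot loops) with each toolchain at the same optimisation level, for example with `-O2 -S` to keep the assembly and `-mcpu=cortex-m0plus` for an RP2040, makes differences in register allocation and loop handling easy to compare directly.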
However, I have one particular routine for a high-speed multiplexed digital drop-and-accumulate/insert bus where the improvement is over 200%, because no matter which -Ox level I applied, gcc simply couldn’t see an obvious register optimisation that would have stopped it using the stack. You can force it to use registers by setting -O0, but then it doesn’t optimise the logic, as you can’t switch optimisation levels inside a function. ARM Compiler, on the other hand, saw the obvious optimisation straight away. This is the routine I’m cutting and pasting into my RP2040 code.
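The routine itself isn’t included here, so what follows is only a hypothetical sketch of the general shape of such code, together with one common source-level habit that helps any compiler keep hot values in registers: hold the running sum in a local variable, touch memory through the pointers once at the start and once at the end, and mark the pointers `restrict` so the compiler knows the mix output does not overlap the frame. The frame layout, `BUS_SLOTS`, `bus_frame_t` and `bus_drop_acc_insert` are invented for illustration; this is not a reconstruction of the routine described above and makes no claim about how gcc or ARM Compiler would treat it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUS_SLOTS 8   /* hypothetical number of time slots per frame */

/* Hypothetical multiplexed bus frame: one 32-bit sample per time slot. */
typedef struct {
    int32_t slot[BUS_SLOTS];
} bus_frame_t;

/*
 * Hypothetical drop / accumulate / insert pass over one frame:
 *   drop:       return the sample currently in slot `drop`
 *   accumulate: add every slot selected by `mask` into *mix
 *   insert:     overwrite slot `insert` with `value`
 * The running sum lives in the local `acc` (one load and one store through
 * `mix`) instead of being updated through the pointer on every iteration,
 * and `restrict` promises that `mix` does not point into the frame, so the
 * compiler is free to keep the sum and loop state in registers.
 * Assumes the samples have enough headroom that the sum cannot overflow.
 */
int32_t bus_drop_acc_insert(bus_frame_t *restrict f, int32_t *restrict mix,
                            uint32_t mask, unsigned drop, unsigned insert,
                            int32_t value)
{
    assert(f != NULL && mix != NULL);
    assert(drop < BUS_SLOTS && insert < BUS_SLOTS);

    const int32_t dropped = f->slot[drop];   /* drop */

    int32_t acc = *mix;                      /* load the running sum once */
    for (unsigned i = 0; i < BUS_SLOTS; ++i) {
        if (mask & (UINT32_C(1) << i)) {
            acc += f->slot[i];               /* accumulate selected slots */
        }
    }
    *mix = acc;                              /* store it back once */

    f->slot[insert] = value;                 /* insert */
    return dropped;
}
```

Whether a given compiler actually removes the stack traffic for code like this is something to confirm in the generated assembly rather than assume.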
Since about 2017, I’ve found only one case where gcc did better: a custom Linux build we used for an audio analyser product, which is, of course, exactly what gcc is optimised for and tested on. But Android, which is itself based on Linux, has been developed using ARM Compiler since 2014 and has usually been compiled with it or with Clang. The same goes for macOS and iOS.
What amazes me is that people spend ages trying to overclock processors with expensive cooling systems to get maybe a 10% improvement, yet ignore similar gains available simply by using a decent compiler.