[sdiy] GCC vs Clang for Audio DSP

Eric Brombaugh ebrombaugh at gmail.com
Thu May 5 04:26:40 CEST 2022


You make some excellent points about tuning code for the compiler - 
that's often helpful when you're trying to squeeze the last drop of 
performance out. If one looks at the CMSIS DSP libraries there are many 
examples of this - code that looks ungainly but is obviously crafted 
with an eye towards compiler and target idiosyncrasies.
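For anyone who hasn't dug into CMSIS-DSP, the "ungainly" style mostly means loops hand-unrolled by four, with a second loop to mop up the leftover samples. Below is a minimal sketch of that pattern. The function name `scale_block_f32` is hypothetical and is not a CMSIS API; it only mimics the structure.

```c
#include <stdint.h>

/* Hypothetical kernel in the CMSIS-DSP style: dst[i] = src[i] * gain,
 * hand-unrolled in groups of four, then a scalar tail loop. */
void scale_block_f32(const float *src, float gain, float *dst, uint32_t block_size)
{
    uint32_t blk_cnt = block_size >> 2u;   /* number of whole groups of four */

    while (blk_cnt > 0u) {
        /* Four independent multiplies per pass give the compiler room to
         * schedule loads and stores, with a quarter of the loop overhead. */
        dst[0] = src[0] * gain;
        dst[1] = src[1] * gain;
        dst[2] = src[2] * gain;
        dst[3] = src[3] * gain;
        src += 4;
        dst += 4;
        blk_cnt--;
    }

    blk_cnt = block_size & 3u;             /* 0..3 leftover samples */
    while (blk_cnt > 0u) {
        *dst++ = *src++ * gain;
        blk_cnt--;
    }
}
```

Whether this beats a plain loop depends on the compiler and flags; at -O2 and above either compiler may already do the equivalent on its own.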

I'd also agree that this test I've done applies mainly to the 
performance of these tools with a Cortex M4F processor as found in the 
STM32F303K8 I used here. It would not surprise me at all to learn that 
things are very different on Cortex A.

With regard to your question of looking at disassembly - I have taken a 
few peeks (mostly while debugging some linker issues), but my work here 
is mainly quantitative and it doesn't attempt to explain the results, 
merely present them. There are clearly many avenues to explore regarding 
why these differences in performance arise - elsewhere someone suggested 
an exercise of bisecting all the -fxxx options that go into the various 
-O* optimization settings to discover more about which are most 
effective. Although interesting, that's a job for another day.

The only place where I'd take exception is the statement that "gcc is 
past its use-by date". I think these results show clearly that it still 
has some life in it, even if only for lower-end embedded systems. A 
variety of corporate sponsors including ARM are still actively 
supporting its development and although it's been around the block it 
remains a useful tool for those who choose not to support the 
proprietary offerings.


On 5/4/22 13:28, Mike Bryant wrote:
> With my DSP based code (audio mixer type stuff - filters, compressors, mix busses, effects, etc, all running on a 96kHz prime cycle with sidechains as necessary) - I usually see a slightly larger code size - ten percent-ish - but much faster execution time.   From what I've seen gcc tends to unroll the wrong loops, and it's hard even with pragmas to make it unroll the correct ones.
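For reference, the per-loop unroll pragmas look like the sketch below. It assumes GCC 8 or later for `#pragma GCC unroll` and uses Clang's `#pragma clang loop unroll_count`; both are hints, so how (or whether) each compiler honours them is its own business. `fir_sample` and `FIR_TAPS` are hypothetical names. Clang also defines `__GNUC__`, so it has to be tested first.

```c
#include <stddef.h>

#define FIR_TAPS 8  /* hypothetical filter length */

/* One output sample of an FIR filter: sum of h[k] * x[k]. The pragma
 * requests a 4x unroll of this specific loop only. */
float fir_sample(const float h[FIR_TAPS], const float x[FIR_TAPS])
{
    float acc = 0.0f;
#if defined(__clang__)
#pragma clang loop unroll_count(4)
#elif defined(__GNUC__) && (__GNUC__ >= 8)
#pragma GCC unroll 4
#endif
    for (size_t k = 0; k < FIR_TAPS; k++) {
        acc += h[k] * x[k];
    }
    return acc;
}
```

With a tap count this small and known at compile time, either compiler may fully unroll regardless, so a real test needs a longer or runtime-sized loop.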
> I've one routine using split double-precision multiply-accumulates (using a combination of struct and union) where Clang was over 100% faster simply because it noticed an optimisation gcc didn't, and I couldn't force gcc to take that optimisation short of pasting the Clang code in as assembly.
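The actual routine isn't shown, so the following is only a guess at the kind of construct described: a 64-bit accumulator overlaid by a union on two 32-bit halves and fed by 32x32-bit multiply-accumulates. It assumes a little-endian target (the usual Cortex-M configuration), so the low word comes first. `acc64_t`, `mac_q31` and `acc64_hi` are hypothetical names.

```c
#include <stdint.h>

/* 64-bit accumulator whose 32-bit halves can be read directly.
 * Assumes little-endian layout: lo at the lower address. */
typedef union {
    int64_t acc;
    struct {
        uint32_t lo;
        int32_t  hi;
    } w;
} acc64_t;

/* 32x32 -> 64-bit multiply-accumulate. On a Cortex-M4 a compiler can
 * map this onto a single SMLAL instruction if it spots the pattern. */
static inline void mac_q31(acc64_t *a, int32_t x, int32_t y)
{
    a->acc += (int64_t)x * (int64_t)y;
}

/* Upper 32 bits of the accumulator, read through the union. */
static inline int32_t acc64_hi(const acc64_t *a)
{
    return a->w.hi;
}
```

Reading a union member other than the one last written is well-defined type punning in C99 and later, though not in C++.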
> Conversely, I had an SPDIF receiver where on first compile Clang was about 8% faster, but with some careful work I got gcc down to just one unused jmp more than Clang. That remaining jmp is there because gcc doesn't like producing indexed jumps (or at least I can't persuade it to do so), so I had to use a line of assembly code to force it to.
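To illustrate where indexed jumps come from: a dense `switch` over contiguous case values is the usual candidate for a jump table (on Thumb-2 targets such as the Cortex-M4, often a `TBB`/`TBH` table branch). The sketch below is a hypothetical S/PDIF subframe field sequencer, not the receiver described above. It assumes biphase-mark decoding and preamble detection happen elsewhere, and follows the IEC 60958 subframe layout after the preamble: 4 aux bits, 20 audio bits, then V, U, C and P, least significant bit first. Whether either compiler actually emits a table here is down to its own heuristics.

```c
#include <stdint.h>

/* Hypothetical states, numbered densely from zero. */
typedef enum {
    ST_AUX = 0,   /* slots 4-7: auxiliary bits       */
    ST_AUDIO,     /* slots 8-27: 20-bit audio sample */
    ST_VALIDITY,  /* slot 28: V                      */
    ST_USER,      /* slot 29: U                      */
    ST_CHANNEL,   /* slot 30: C                      */
    ST_PARITY,    /* slot 31: P                      */
    ST_DONE       /* subframe complete               */
} rx_state_t;

typedef struct {
    rx_state_t state;
    unsigned   left;   /* bits still expected in the current multi-bit field */
    uint32_t   aux;
    uint32_t   audio;
    unsigned   v, u, c, p;
} subframe_rx_t;

/* Call after a preamble has been detected (detection not shown). */
void subframe_start(subframe_rx_t *rx)
{
    rx->state = ST_AUX;
    rx->left  = 4u;
    rx->aux   = 0u;
    rx->audio = 0u;
    rx->v = rx->u = rx->c = rx->p = 0u;
}

/* Feed one already-decoded data bit; fields arrive LSB first. */
void subframe_bit(subframe_rx_t *rx, unsigned bit)
{
    bit &= 1u;
    switch (rx->state) {
    case ST_AUX:
        rx->aux |= (uint32_t)bit << (4u - rx->left);
        if (--rx->left == 0u) {
            rx->state = ST_AUDIO;
            rx->left  = 20u;
        }
        break;
    case ST_AUDIO:
        rx->audio |= (uint32_t)bit << (20u - rx->left);
        if (--rx->left == 0u) {
            rx->state = ST_VALIDITY;
        }
        break;
    case ST_VALIDITY: rx->v = bit; rx->state = ST_USER;    break;
    case ST_USER:     rx->u = bit; rx->state = ST_CHANNEL; break;
    case ST_CHANNEL:  rx->c = bit; rx->state = ST_PARITY;  break;
    case ST_PARITY:   rx->p = bit; rx->state = ST_DONE;    break; /* parity not checked */
    case ST_DONE:
    default:
        break;        /* ignore bits until the next preamble */
    }
}
```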
> So I think my opinion stated isn't that gcc is sub-standard for embedded, just that you should write code either compiler can take and see which is best, and in my case that usually is Clang, but may not be for all use-cases.
> However, if you are writing code for Macs, iPhones, Android or PCs then it's more or less a done deal: gcc is past its use-by date, as is the Intel compiler.
> What I am surprised by is that you found no difference between ARM Compiler and Clang.  I'd need to look at your code, but it may be that for the M0-M3 class processors they are the same.  Did you look at the actual assembly code produced?
> When you get to the A7x series using 64-bit it's night and day, as it makes far better use of the NEON vector processor, and even on the 32-bit M7 processors there's usually a definite, noticeable improvement of a percent or two.
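One common way to give either compiler a fair shot at vectorising onto NEON is to make the absence of aliasing explicit with C99 `restrict`. A minimal sketch, with `mix2` as a hypothetical mix-bus routine. One caveat on 32-bit Arm: GCC's documentation notes that it won't auto-vectorise floating-point code onto NEON unless `-funsafe-math-optimizations` is given, because ARMv7 NEON doesn't fully implement IEEE 754 (denormals are flushed to zero); as far as I know AArch64 doesn't have that restriction, which may account for part of the 64-bit difference.

```c
#include <stddef.h>

/* Hypothetical two-bus mix: out[i] = ga*a[i] + gb*b[i].
 * restrict promises the three arrays never overlap, so the compiler
 * need not guard against out[] aliasing a[] or b[]. */
void mix2(float *restrict out,
          const float *restrict a,
          const float *restrict b,
          float ga, float gb, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Each element is independent (no reduction), so no
         * reassociation is needed to vectorise this loop. */
        out[i] = ga * a[i] + gb * b[i];
    }
}
```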
> -----Original Message-----
> From: Synth-diy [mailto:synth-diy-bounces at synth-diy.org] On Behalf Of Ben Stuyts
> Sent: 04 May 2022 20:51
> To: Eric Brombaugh
> Cc: Synth-diy at synth-diy.org
> Subject: Re: [sdiy] GCC vs Clang for Audio DSP
> Hi Eric,
> Very interesting, and not what I expected. I generally see smaller code with clang compared to gcc. I'm mostly using Cortex-M0..M3 class CPUs with -Os. Only critical files are compiled for speed. It could be that CrossWorks is not setting all the cmd line options in an optimal way.
> Have you looked at the difference in error messages? I generally think those from clang are better / easier to understand than those from gcc.
> Ben
>> On 4 May 2022, at 21:20, Eric Brombaugh via Synth-diy <synth-diy at synth-diy.org> wrote:
>> In a previous thread we had some exchanges about the performance of various compilers used in embedded ARM applications. I've been using GCC for this over the course of the last decade and had seen some remarkable improvements in its performance so I was somewhat surprised that it was viewed as being substantially underperforming in comparison to the proprietary toolchains based on LLVM/clang.
>> I decided to try for myself and put together a quick DSP benchmark that's representative of the sort of stuff that I often do in my embedded work. I took some time to create build scripts for clang and test out both the free and proprietary versions to compare them against GCC for both execution speed and binary size. The results are interesting, so I summarized them with tables and charts over here:
>> https://github.com/emeb/f303k8_nucleo
>> The quick summary is that based on this one example it appears that recent builds of GCC are not grossly out of line with the performance of Clang in both the free and proprietary flavors. For all levels of optimization greater than -O0 GCC performs roughly as well if not better than either version of Clang. This test also shows very little difference between the free and proprietary versions of Clang.
>> I would like to emphasize that this is one particular use-case and by no means representative of the performance you'd see in all applications of these compilers. It does however provide an alternative viewpoint backed by hard evidence to the notion that GCC is generally sub-standard and not for professional use.
>> If you have questions about the methodology I used here, the entire source and build scripts are available at that github repo for your inspection. I will admit that I'm not a clang power-user so there may be some optimizations that I've overlooked and I'd appreciate any suggestions on how to make this analysis more complete.
>> _______________________________________________
>> Synth-diy mailing list
>> Synth-diy at synth-diy.org
>> http://synth-diy.org/mailman/listinfo/synth-diy
>> Selling or trading? Use marketplace at synth-diy.org
