[sdiy] Hardware convolution box?

rsdio at audiobanshee.com rsdio at audiobanshee.com
Sun Feb 12 22:39:43 CET 2017


Hello,

Source code like that shown in the link below is good for educational purposes, but it's still not the most efficient. Full DSP chips have an instruction set that is optimized for the FFT and other common operations. One significant example is bit-reversed addressing. A side effect of the "fast" aspect of the FFT is that it produces results in the "wrong" order, and bit-reversed addressing is what puts them back into the expected order (increasing frequency bins).

While it's true that a general purpose processor can calculate the indices needed to unscramble the results, that costs extra instructions for every element. Modern DSP chips typically have a special addressing mode that implements bit-reversed addressing with the same efficiency as direct addressing. To a DSP, it's as if the FFT is calculating the results in the correct order, with no additional cycles needed at all to put the results in the right order.
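To make the cost concrete, here is a minimal sketch in Standard C of the index unscrambling that a general purpose processor has to do in software. The function names (bit_reverse, bit_reverse_permute) are made up for illustration and are not taken from the linked article or from any vendor library. It assumes a radix-2 FFT of length 2^bits that leaves its output in bit-reversed order.

```c
#include <stddef.h>

/* Reverse the lowest `bits` bits of index i.
   With bits = 3: 1 (001) becomes 4 (100), 3 (011) becomes 6 (110). */
static unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);  /* shift the low bit of i into r */
        i >>= 1;
    }
    return r;
}

/* Reorder n = 2^bits values in place, from bit-reversed order
   to natural order. Each pair is swapped exactly once. */
static void bit_reverse_permute(float *x, unsigned bits)
{
    unsigned n = 1u << bits;
    for (unsigned i = 0; i < n; i++) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {
            float t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }
}
```

The inner loop in bit_reverse, plus the compare and swap, is exactly the work that a bit-reversed addressing mode folds into the address generator.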

Some ARM chips have a MAC instruction, and many people seem to treat this as if it makes the ARM as good as a DSP. But bit-reversed addressing is another DSP feature that makes these chips significantly faster than an ARM. In addition, DSP chips have the ability to automatically wrap pointers to any power-of-two buffer size, saving several more instruction cycles per loop in many signal processing algorithms. When used together, all of these instruction set enhancements (and more) make the DSP able to process more samples in fewer cycles.
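As a rough illustration of the pointer wrapping and MAC described above, here is a sketch of one FIR output sample computed from a power-of-two circular buffer in Standard C. The names (delay_line, fir_step, BUF_LEN) are hypothetical, and the buffer length of 8 is chosen only to keep the example small. It assumes ntaps is no larger than BUF_LEN.

```c
#include <stddef.h>

#define BUF_LEN  8u               /* must be a power of two */
#define BUF_MASK (BUF_LEN - 1u)   /* AND with this wraps an index */

typedef struct {
    float    samples[BUF_LEN];
    unsigned head;                /* index of the most recent sample */
} delay_line;

/* Push one input sample and return one FIR output sample.
   Assumes ntaps <= BUF_LEN. */
static float fir_step(delay_line *d, const float *coeffs,
                      unsigned ntaps, float in)
{
    d->head = (d->head + 1u) & BUF_MASK;   /* explicit wrap in software */
    d->samples[d->head] = in;

    float acc = 0.0f;
    unsigned idx = d->head;
    for (unsigned k = 0; k < ntaps; k++) {
        acc += coeffs[k] * d->samples[idx]; /* the multiply-accumulate */
        idx = (idx - 1u) & BUF_MASK;        /* another explicit wrap */
    }
    return acc;
}
```

On a general purpose core, the two masking operations are separate instructions inside the loop. With a hardware circular addressing mode, the wrap happens as part of the pointer update, so the loop body reduces to the MAC itself.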

So, when combining FIR and FFT processing for convolution, you'll need MAC, bit-reversed addressing, automatically-wrapped buffer pointers, and possibly other special instructions for maximum efficiency at a given instruction clock rate. Hopefully the DSP you choose will have example code in optimized assembly for a partitioned convolution, and you won't have to piece all of this together yourself. Yes, you could do it all in Standard C on a general purpose ARM or XMOS, but you'll need a higher clock rate and more code to do the same amount of work.
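For anyone unfamiliar with the partitioning idea itself, here is a small sketch in Standard C. It is not an optimized or FFT-based implementation, and the function names (convolve_add, convolve_partitioned) are made up for this example. It only shows that splitting the impulse response into blocks, convolving each block separately, and adding each partial result at its block's delay gives the same output as one long convolution. In a real partitioned convolution, the later blocks would be processed with FFTs instead of the direct loop used here.

```c
#include <stddef.h>

/* Direct convolution, accumulated into y.
   y must hold at least nx + nh - 1 values. */
static void convolve_add(const float *x, size_t nx,
                         const float *h, size_t nh,
                         float *y)
{
    for (size_t n = 0; n < nx; n++)
        for (size_t k = 0; k < nh; k++)
            y[n + k] += x[n] * h[k];
}

/* Same result, computed one partition of h at a time.
   The partition starting at tap `off` contributes to the output
   delayed by `off` samples. The caller must zero y first, and
   y must hold nx + nh - 1 values. */
static void convolve_partitioned(const float *x, size_t nx,
                                 const float *h, size_t nh,
                                 size_t part, float *y)
{
    for (size_t off = 0; off < nh; off += part) {
        size_t len = (nh - off < part) ? (nh - off) : part;
        convolve_add(x, nx, h + off, len, y + off);
    }
}
```

The first partition can stay a short direct FIR for low latency, while the longer tail is where the FFT, and therefore bit-reversed addressing, earns its keep.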

Brian Willoughby
Sound Consulting


On Feb 11, 2017, at 2:45 PM, cheater00 cheater00 <cheater00 at gmail.com> wrote:
> Coincidentally I have found this really simple and concise definition of the FFT today:
> 
> https://www.reddit.com/r/Python/comments/1la4jp/understanding_the_fft_algorithm_with_python/
> 
> On Sat, 11 Feb 2017 11:30 Mikko Helin, <maohelin at gmail.com> wrote:
>> Forgot to mention the method is called non-uniform partitioned
>> convolution, the paper can be found here:
>> http://www.cs.ust.hk/mjg_lib/bibs/DPSu/DPSu.Files/Ga95.PDF
