[sdiy] Hardware convolution box?
Olivier Gillet
ol.gillet at gmail.com
Fri Feb 10 02:01:35 CET 2017
I think you're vastly overestimating how much computation this
requires.
A well-known trick is to partition the head of the IR into small
blocks (say 32 samples long if you want sub-ms latency at 48kHz), and
use larger blocks for the tail of the IR (latency is not a problem for
the tail). The whole convolution can be decomposed as a sum of
convolutions with each of the blocks, each one delayed by its block's
offset within the IR. Each of these can be evaluated in the frequency
domain: DFT of the input, complex multiplication by the DFT of the IR
block, and inverse DFT.
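
A minimal sketch of that decomposition, in plain Python so it runs
without extra libraries. It processes a whole signal offline, so it only
demonstrates that the per-block frequency-domain convolutions sum to the
full convolution; it says nothing about real-time scheduling. The block
schedule in partition_sizes (two 32-sample blocks, then doubling) and
all function names are assumptions made for illustration, not taken from
the paper.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_convolve(a, b):
    """Linear convolution of two real sequences via zero-padded FFTs."""
    n_out = len(a) + len(b) - 1
    size = 1
    while size < n_out:
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    y = fft([p * q for p, q in zip(fa, fb)], inverse=True)
    return [v.real / size for v in y[:n_out]]  # apply the 1/size scaling here

def direct_convolve(x, h):
    """Reference time-domain convolution, one MAC per (input, tap) pair."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

def partition_sizes(ir_len, head_block=32, head_count=2):
    """Hypothetical schedule: head_count small blocks, then sizes double."""
    sizes, size, used = [], head_block, 0
    while used < ir_len:
        take = min(size, ir_len - used)
        sizes.append(take)
        used += take
        if len(sizes) >= head_count:
            size *= 2
    return sizes

def partitioned_convolve(x, h, head_block=32):
    """Sum of per-block convolutions, each delayed by the block's offset."""
    y = [0.0] * (len(x) + len(h) - 1)
    offset = 0
    for size in partition_sizes(len(h), head_block):
        part = fft_convolve(x, h[offset:offset + size])
        for n, v in enumerate(part):
            y[offset + n] += v
        offset += size
    return y
```

Comparing partitioned_convolve(x, h) against direct_convolve(x, h) on
short test signals should give matching results up to floating-point
rounding. In a real-time version the small head blocks would be
processed every 32 samples and the larger ones less often, which is
where the scheduling work comes in.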
I did some back-of-the-envelope computations and arrived at a figure
of 40 MMACs for a sample rate of 48kHz and a 2s-long IR.
I found it a bit too good to be true, so I went back to the source:
http://www.cs.ust.hk/mjg_lib/bibs/DPSu/DPSu.Files/Ga95.PDF
p. 132, just before the beginning of section 6:
"Thus a filter of size 128K samples will require approximately 427
multiplies per output sample".
This assumes that the DFTs of all the blocks the IR is made of have
been pre-computed; but this can be done faster than real-time when the
IR is loaded, assuming you've got enough RAM.
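
To put the quoted figure in per-second terms, here is a small arithmetic
sketch. It assumes 128K means 128 * 1024 samples and a 48kHz sample
rate; the function name is made up for illustration. Note that the paper
counts multiplies, which is not exactly the same unit as MACs, so this
is an order-of-magnitude comparison only.

```python
def cost_estimate(ir_samples=128 * 1024, sample_rate=48_000,
                  mults_per_sample=427):
    """Per-second cost of the quoted figure vs. naive direct convolution."""
    return {
        # Duration of the impulse response in seconds.
        "ir_seconds": ir_samples / sample_rate,
        # Quoted figure: multiplies per output sample times output rate.
        "partitioned_mults_per_s": mults_per_sample * sample_rate,
        # Direct form: one MAC per IR tap for every output sample.
        "direct_macs_per_s": ir_samples * sample_rate,
    }
```

With these defaults the IR is about 2.7s long, the partitioned scheme
needs about 20.5 million multiplies per second, and the direct form
needs about 6.3 billion MACs per second, a factor of roughly 300.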
Of course there's the issue of scheduling and a lot of additional
bookkeeping, but at the very worst we're in the order of hundreds of
MMACs and a couple of MBytes of RAM.
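
A rough check of the RAM figure, under assumptions that are mine and not
from the paper: each IR block of B samples is zero-padded to 2B, stored
as the B + 1 complex bins of a real FFT, in 32-bit floats, and the
history of input spectra takes about as much room as the IR spectra.

```python
def spectrum_bytes(ir_samples, block, bytes_per_real=4):
    """Bytes needed to store pre-computed spectra of all IR blocks."""
    blocks = -(-ir_samples // block)   # ceiling division
    bins = block + 1                   # real FFT of a 2*block frame
    return blocks * bins * 2 * bytes_per_real  # 2 reals per complex bin

def total_bytes(ir_samples, block):
    """IR spectra plus an equally sized input-spectrum history (assumed)."""
    return 2 * spectrum_bytes(ir_samples, block)
```

For a 2s IR at 48kHz (96000 samples) this comes to a bit under 0.8MB for
the IR spectra whether the blocks are 32 or 4096 samples long, so about
1.6MB in total under these assumptions, before any working buffers.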
On Fri, Feb 10, 2017 at 1:09 AM, cheater00 cheater00
<cheater00 at gmail.com> wrote:
> Found the right spot at the TI website. I've made a somewhat large
> survey of AD and TI chips. I've uploaded the data to Google Docs (see
> link at the end of this email).
>
> For a lot of power, TI can't be beat. Their chips are as cheap as
> $0.78/GMACS, that's on TMS320C6678CYP, a chip with 8.5MB ram and 256
> GMACS, $200 at Mouser.
>
> The cheapest TMS320C is TMS320C6652CZH6 with 19.2 GMACS, 1MB ram, at
> $41.95 at Arrow.
>
> For cheap chips, AD is great. Their most powerful non-obsolete
> offering is ADSP-BF561SKBCZ-5A, 2 GMACS, 328KB ram, $32.59 at Arrow,
> for $16.30/GMACS. Some of their unusually cheap chips include:
> ADSP-BF525BBCZ-5A, 1.2 GMACS, 132KB, $11.79 at Newark Element14 for $9.83/GMACS
> ADSP-BF534BBCZ-4A, 1 GMACS, 134KB, $5.88 at Newark Element14 for $5.88/GMACS
> ADSP-BF531SBBCZ400 0.8 GMACS, 53KB, $4.44 at Avnet for $5.54/GMACS
>
> Those chips were noticeably (3-4x) cheaper than their close
> counterparts; apparently Newark and Avnet have some sort of blowout.
>
> I stopped surveying AD chips around 1.2 GMACS. There are going to be
> much cheaper ones than I found, I guess, but they just have so many
> chips I'd spend 2 days figuring out the prices. It's obvious: their
> stuff is cheap.
>
> AD are inexpensive, but clearly, if you need a lot of processing power
> and/or a lot of memory the TI will be 5 to 10 times cheaper. $200
> might not be so much if that's the majority of the cost of the box for
> a DIY gamer.
>
> As far as evaluation boards go, the highest-powered TI board seems to
> be the best value. The TMDSEVM6678L costs $399 on TI's website, has
> 64MB Flash, 512 MB DDR3 SDRAM, gigabit ethernet, usb mini-B, 80 IO
> header and an AMC header with PCIe, an emulator port, a small FPGA for
> configuration and booting, etc. See features at these two links:
> http://www.ti.com/tool/tmdsevm6678#Technical%20Documents
> http://www2.advantech.com/Support/TI-EVM/6678le_of.aspx
>
> I don't know if the USB can be used in host mode. Does anyone know?
>
> It is unclear to me which version of the chip this board has - the
> 320GMACS one at 1.25 GHz or the 256 GMACS one at 1 GHz.
>
> Finally, there is a version of this board that costs $599 (50% more)
> and it has an XDS560V2 emulation mode. I understand that's a debugger.
> I don't know why exactly it is significant. What advantages does this
> bring for a developer?
> Is the emulator port shown on advantech's website only available in
> this more expensive version? If the cheaper version also has it, what
> can it be used for if the XDS560V2 emulation mode is not available?
>
> Survey data is on Google Docs. Anyone can comment:
>
> https://docs.google.com/spreadsheets/d/1oT-9PVh8yZMMwAkpltqGo8mhSL1NLXZJiC-LH7swsbY/edit?usp=sharing
>
>
> Have fun!
>
> On Thu, Feb 9, 2017 at 9:02 PM, cheater00 cheater00 <cheater00 at gmail.com> wrote:
>> Yeah, usb host mode sounds super useful unless SD will allow faster UI
>> interaction.
>>
>> Do you know which TI chips have the most MMACS? I find the website
>> confusing.
>>
>>
>> On Thu, 9 Feb 2017 20:29 , <rsdio at audiobanshee.com> wrote:
>>>
>>> Based on your survey, I'd recommend the Analog Devices board, even though
>>> I usually lean towards TMS320. The TMS320 family is huge, including both
>>> fixed-point and floating-point, low-power and high-speed, old and new
>>> designs, etc. Some of the TMS320 boards you listed are really geared more
>>> towards motor control than audio, which is why they might be underpowered
>>> for long impulse response convolution. I know that the AD SHARC family is
>>> also large, and they're very popular, but I am less familiar with the
>>> options.
>>>
>>> Don't forget to look at the chip manufacturer as a direct source for these
>>> boards. I always buy directly from Texas Instruments because Digi-Key tends
>>> to have a markup. Outside the US, maybe it's a different story due to
>>> international availability.
>>>
>>> I'd recommend something like the 1MB 800 MMAC board and not worry about
>>> external RAM. 1MB seems like plenty. I'd also recommend trying to implement
>>> both the time domain convolution and the frequency domain version. There are
>>> ways to reduce the latency of the frequency domain approach, and at least it
>>> would allow for longer impulse responses to be supported. For IRs that are
>>> short enough, the time domain approach would work. I've also seen papers on
>>> combining the two, since LTI techniques can be run in parallel and summed.
>>>
>>> As for taking pairs of 16-bit samples to speed things up, be aware that
>>> not all instructions can work that way. I think that most DSPs can do a few
>>> simple operations on value pairs, but the most complex DSP instructions can
>>> only handle full samples. DSP architectures have internal registers that are
>>> much larger than the sample size, like 56-bit or higher. If you think about
>>> all of the potential overflow when adding thousands of samples from an
>>> impulse response, you can see why such large registers are needed. When
>>> working in that model, it's not possible to handle the overflow from two
>>> samples that are combined in a single 32-bit input value.
>>>
>>> Finally, I think that nobody has made something like this because the user
>>> interface would be rather difficult. It's a bit of a power-user effect. On
>>> that note, some sort of SD card might be useful, so I can see why you're
>>> looking into that. However, perhaps just a custom USB class device would be
>>> enough of an interface to allow downloading impulse responses to the device.
>>> At a minimum, you'll need a large Flash to store the current impulse
>>> response, or some way to partition the program Flash to set aside room for
>>> the data. The AD board with USB host mode could feasibly read directly from
>>> a USB memory stick or Flash drive.
>>>
>>> Brian
>>>
>>>
>>> On Feb 6, 2017, at 9:20 PM, cheater00 cheater00 <cheater00 at gmail.com>
>>> wrote:
>>> > Brian, $50 is a steal. I've had a look at Digikey.
>>> >
>>> > This TI board is £24. It has ~150 KB on-chip RAM, but it has an
>>> > integrated SDRAM interface.
>>> >
>>> > http://www.digikey.co.uk/product-detail/en/texas-instruments/LAUNCHXL-F28377S/296-42484-ND/5404239
>>> >
>>> >
>>> > This TI board is £40. It has ~384 KB on-chip RAM and an integrated
>>> > SRAM interface and SD card support. 200 MMACS.
>>> >
>>> > http://www.digikey.co.uk/product-detail/en/texas-instruments/TMDX5505EZDSP/296-24965-ND/2127652
>>> >
>>> > This AD board is £60. It has 1MB on-chip RAM and USB host mode, no
>>> > idea about ram interface or SD card. 800 MMACS.
>>> >
>>> > http://www.digikey.co.uk/product-detail/en/analog-devices-inc/ADZS-BF706-EZMINI/ADZS-BF706-EZMINI-ND/5408943
>>> >
>>> > This last one sports 800 MMACS. Is this enough processing power for
>>> > the 5-second convolution I mentioned above? It seemed to me like that
>>> > would need 4608 MMACs. Maybe 2304 if we take pairs of 16 bit samples
>>> > and treat them as 32 bit values. Are my numbers correct? Are there
>>> > optimizations that can be done to lower this number, while still
>>> > having zero latency? I understand FFT domain convolution introduces
>>> > latency, which is not wanted in hardware. "Naive" MAC based
>>> > convolution doesn't seem too far out of reach.
>>> >
>>> > This TI board is £156. It has 256 KB on-chip RAM and support for DDR2
>>> > SDRAM. No mention of MMACs but they say 3648 MIPS and I assume a
>>> > pipelined MAC costs one instruction, would that be correct?
>>> >
>>> > http://www.digikey.co.uk/product-detail/en/texas-instruments/TMDSLCDK6748/TMDSLCDK6748-ND/5213032
>>> >
>>> >
>>> > The more expensive boards don't seem to have more powerful DSP chips.
>>> > And those chips don't really get much more powerful either. However,
>>> > convolution is easily parallelised. So, worst case scenario, if you
>>> > wanted really long impulse responses you'd have to use a few chips.
>>> > However, even the really good 800 MMACS Blackfin ones are £15 unit
>>> > price, so that's not so bad...
>>> >
>>> > So, tell me, why hasn't anyone made this yet?
>>> >
>
> _______________________________________________
> Synth-diy mailing list
> Synth-diy at synth-diy.org
> http://synth-diy.org/mailman/listinfo/synth-diy