[sdiy] Hardware convolution box?
Terry Shultz
thx1138 at earthlink.net
Fri Feb 10 17:43:55 CET 2017
Hi Bruno,
I use the Code Composer tools from TI and I have been using these tools since they were in Alpha state.
Assembly code is difficult on the TI DSP Sitara platform because of the pipeline length and such.
I find it easier to use one DSP as the decoder for Atmos and the 2nd for post-processing, bass management, etc.
I also found it easier to do assembly programming on the ADI 4th Generation Falcon parts, as the pipeline was a bit shorter. The longer the pipeline,
the more difficult it is to re-order instructions efficiently.
In some cases I have had to build a small library of hand-tuned FFTs and IFFTs. I can’t show this IP, as I was paid by a semiconductor company to develop it for automotive applications.
The compilers have gotten pretty good for TI and ADI, and I do less and less assembly work.
Now I am building NEON signal processing blocks for the Cortex-A series of ARM processors.
Check out http://projectne10.github.io/ or https://projectne10.github.io/Ne10/
If you want to improve performance, use the NEON unit and build up your own libs.
I hope this helps you guys.
best regards,
Terry
> On Feb 10, 2017, at 7:35 AM, Bruno Afonso <bafonso at gmail.com> wrote:
>
> How are you going about using the DSPs? Could you comment on the toolchain and how easy it is to develop code for it compared to other DSPs?
>
> That X15 does indeed look mighty interesting for even non audio projects of mine.
>
>
>
> On Fri, Feb 10, 2017 at 12:59 AM Terry Shultz <thx1138 at earthlink.net <mailto:thx1138 at earthlink.net>> wrote:
> Hi Guys,
>
> I usually start with the highest performing product first and cost reduce after I get it running.
>
> Why drive yourself nuts worrying about MIPS and memory? At $250.00 I can build a monster convolution engine.
>
> Then again, the part is not cheap in small volume, and the 0.8 mm pitch BGA means at least a 10-layer board for high performance.
>
> I could not even build that X-15 board for what they sell it for on the Web.
>
> Jason sent me one last spring and I have it running Linux and hooked up to my 2nd monitor.
>
> It is way overkill, but I need the dual DSPs for Dolby Atmos decoding on a 3D audio headphone project I am building.
>
> best regards,
>
> Terry
>
>> On Feb 9, 2017, at 9:24 PM, Terry Shultz <thx1138 at earthlink.net <mailto:thx1138 at earthlink.net>> wrote:
>>
>> Another guy I know is Jason Kridner, who started BeagleBoard.org. See https://beagleboard.org/x15/
>>
>> This is a monster performer: http://www.ti.com/product/am5728
>>
>> Processor: TI AM5728, 2×1.5-GHz ARM® Cortex-A15
>> - 2GB DDR3 RAM
>> - 4GB 8-bit eMMC on-board flash storage
>> - 2D/3D graphics and video accelerators (GPUs)
>> - 2×700-MHz C66 digital signal processors (DSPs)
>> - 2×ARM Cortex-M4 microcontrollers (MCUs)
>> - 4×32-bit programmable real-time units (PRUs)
>>
>> Connectivity
>> - 2×Gigabit Ethernet
>> - 3×SuperSpeed USB 3.0 host
>> - HighSpeed USB 2.0 client
>> - eSATA (500mA)
>> - full-size HDMI video output
>> - microSD card slot
>> - Stereo audio in and out
>> - 4×60-pin headers with PCIe, LCD, mSATA
>> - and much more: http://elinux.org/Beagleboard:BeagleBoard-X15
>>
>> Software Compatibility
>> - Debian
>> - Android
>> - Ubuntu
>> - Cloud9 IDE on Node.js
>> - plus much more
>>
>> http://uk.rs-online.com/web/p/processor-microcontroller-development-kits/8874764/ Cost is approx. 207.49, but it is backordered at this site until 18/04/2017.
>>
>> More than enough horsepower for Linux and a convolution engine, I should think.
>>
>> regards,
>>
>> Terry
>>
>>
>>> On Feb 9, 2017, at 6:26 PM, Terry Shultz <thx1138 at earthlink.net <mailto:thx1138 at earthlink.net>> wrote:
>>>
>>> Check out my friend Dr. Paul Beckman’s site for tools: https://www.dspconcepts.com/audio-weaver
>>>
>>> and my friend Tony Rouget’s site in Hong Kong: https://www.minidsp.com
>>>
>>> and lastly my pal Al Clark’s site: https://www.danvillesignal.com and https://www.danvillesignal.com/landing-pages/snowbird-audio
>>>
>>> These are good examples of audio products that are better than what the DSP manufacturers can build.
>>>
>>> And also my old friend from MIT, Dr. Bill Gardner:
>>>
>>> https://www.audiobuildersworkshop.com
>>>
>>> hope this helps you guys a bit more.
>>>
>>> regards,
>>>
>>> Terry
>>>
>>>
>>>
>>>
>>>> On Feb 9, 2017, at 5:20 PM, cheater00 cheater00 <cheater00 at gmail.com <mailto:cheater00 at gmail.com>> wrote:
>>>>
>>>> That makes sense, it's also a very solid way to do things, if one manages to dot all the i's so the result is an accurate copy of the naive method.
>>>>
>>>>
>>>> On Fri, 10 Feb 2017 02:01 Olivier Gillet, <ol.gillet at gmail.com <mailto:ol.gillet at gmail.com>> wrote:
>>>> I think you're vastly overestimating how much computational resources
>>>> this requires.
>>>>
>>>> A well-known trick is to partition the head of the IR into small
>>>> blocks (say 32 samples long if you want sub ms latency at 48kHz), and
>>>> use larger blocks for the tail of the IR (latency is not a problem for
>>>> the tail). The whole convolution can be decomposed as a sum of
>>>> convolutions by each of the blocks, which can be evaluated in the
>>>> frequency domain by DFT, complex multiplication by the DFT of the IR
>>>> block, and IFT.
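
A minimal sketch of the partitioning idea described above, assuming NumPy is available. To keep it short it uses one block size for the whole IR (the text suggests small blocks for the head and larger ones for the tail), and it runs offline over a whole signal rather than in a real-time callback. The function name `partitioned_convolve` is made up for illustration.

```python
import numpy as np

def partitioned_convolve(x, h, block=32):
    """Uniformly partitioned FFT convolution with overlap-add (offline sketch)."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    nfft = 2 * block  # room for the linear convolution of two blocks

    # Split the IR into blocks and precompute the DFT of each block.
    n_parts = -(-len(h) // block)
    h_pad = np.zeros(n_parts * block)
    h_pad[:len(h)] = h
    H = np.fft.rfft(h_pad.reshape(n_parts, block), n=nfft, axis=1)

    # Split the input into blocks and take the DFT of each block.
    n_blocks = -(-len(x) // block)
    x_pad = np.zeros(n_blocks * block)
    x_pad[:len(x)] = x
    X = np.fft.rfft(x_pad.reshape(n_blocks, block), n=nfft, axis=1)

    y = np.zeros((n_blocks + n_parts) * block)
    for m in range(n_blocks + n_parts - 1):
        # Sum the spectra of every (input block, IR block) pair that lands
        # on output block m, then do a single inverse transform.
        acc = np.zeros(nfft // 2 + 1, dtype=complex)
        for k in range(n_parts):
            i = m - k
            if 0 <= i < n_blocks:
                acc += X[i] * H[k]
        y[m * block:m * block + nfft] += np.fft.irfft(acc, n=nfft)
    return y[:len(x) + len(h) - 1]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(300)
print(np.allclose(partitioned_convolve(x, h), np.convolve(x, h)))
```

The output should agree with direct convolution up to floating-point rounding. A real-time version would keep a history of the input spectra and produce one output block per input block.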
>>>>
>>>> I did some back of the envelope computations and arrived at the result
>>>> of 40 MMACs for a sample rate of 48kHz and a 2s-long IR.
>>>>
>>>> I found it a bit too good to be true and I got back to the source:
>>>>
>>>> http://www.cs.ust.hk/mjg_lib/bibs/DPSu/DPSu.Files/Ga95.PDF
>>>> p. 132, just before the beginning of section 6:
>>>>
>>>> "Thus a filter of size 128K samples will require approximately 427
>>>> multiples per output sample".
>>>>
>>>> This assumes that the DFT of all the blocks the IR is made of has been
>>>> pre-computed; but this can be done in faster than real-time when the
>>>> IR is loaded, assuming you've got enough RAM.
>>>>
>>>> Of course there's the issue of scheduling and a lot of additional
>>>> bookkeeping, but at the very worst the order of magnitude we're in is
>>>> hundreds of MMACs and a couple of MBytes of RAM.
>>>>
>>>> On Fri, Feb 10, 2017 at 1:09 AM, cheater00 cheater00
>>>> <cheater00 at gmail.com <mailto:cheater00 at gmail.com>> wrote:
>>>> > Found the right spot at the TI website. I've made a somewhat large
>>>> > survey of AD and TI chips. I've uploaded the data to Google Docs (see
>>>> > link at the end of this email).
>>>> >
>>>> > For a lot of power, TI can't be beat. Their chips are as cheap as
>>>> > $0.78/GMACS, that's on TMS320C6678CYP, a chip with 8.5MB ram and 256
>>>> > GMACS, $200 at Mouser.
>>>> >
>>>> > The cheapest TMS320C is TMS320C6652CZH6 with 19.2 GMACS, 1MB ram, at
>>>> > $41.95 at Arrow.
>>>> >
>>>> > For cheap chips, AD is great. Their most powerful non-obsolete
>>>> > offering is ADSP-BF561SKBCZ-5A, 2 GMACS, 328KB ram, $32.59 at Arrow,
>>>> > for $16.30/GMACS. Some of their unusually cheap chips include:
>>>> > ADSP-BF525BBCZ-5A, 1.2 GMACS, 132KB, $11.79 at Newark Element14 for $9.83/GMACS
>>>> > ADSP-BF534BBCZ-4A, 1 GMACS, 134KB, $5.88 at Newark Element14 for $5.88/GMACS
>>>> > ADSP-BF531SBBCZ400 0.8 GMACS, 53KB, $4.44 at Avnet for $5.54/GMACS
>>>> >
>>>> > Those chips were noticeably (3-4x) cheaper than their close
>>>> > counterparts, apparently Newark and Avnet have some sort of blowout.
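
The $/GMACS figures above are just price divided by peak GMACS. A small sketch of that arithmetic, using only the prices and GMACS ratings quoted in the survey text (the dictionary and function names are made up for illustration). Small differences from the quoted figures come down to rounding.

```python
# (price in USD, peak GMACS) as quoted in the survey text above
chips = {
    "TMS320C6678CYP": (200.00, 256.0),
    "TMS320C6652CZH6": (41.95, 19.2),
    "ADSP-BF561SKBCZ-5A": (32.59, 2.0),
    "ADSP-BF525BBCZ-5A": (11.79, 1.2),
    "ADSP-BF534BBCZ-4A": (5.88, 1.0),
    "ADSP-BF531SBBCZ400": (4.44, 0.8),
}

def usd_per_gmacs(price_usd, gmacs):
    """Cost of one GMACS of peak throughput."""
    return price_usd / gmacs

# Rank from cheapest to most expensive per GMACS.
ranked = sorted(chips.items(), key=lambda kv: usd_per_gmacs(*kv[1]))
for name, (price, gmacs) in ranked:
    print(f"{name:20s} ${usd_per_gmacs(price, gmacs):6.2f}/GMACS")
```

Peak GMACS says nothing about memory bandwidth or how much of the peak a real convolution kernel reaches, so this is only a first-pass comparison.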
>>>> >
>>>> > I stopped surveying AD chips around 1.2 GMACS. There are going to be
>>>> > much cheaper ones than I found, I guess, but they just have so many
>>>> > chips I'd spend 2 days figuring out the prices. It's obvious: their
>>>> > stuff is cheap.
>>>> >
>>>> > AD are inexpensive, but clearly, if you need a lot of processing power
>>>> > and/or a lot of memory the TI will be 5 to 10 times cheaper. $200
>>>> > might not be so much if that's the majority of the cost of the box for
>>>> > a DIY gamer.
>>>> >
>>>> > As far as evaluation boards go, the highest-powered TI board seems to
>>>> > be the best value. The TMDSEVM6678L costs $399 on TI's website, has
>>>> > 64MB Flash, 512 MB DDR3 SDRAM, gigabit ethernet, usb mini-B, 80 IO
>>>> > header and an AMC header with PCIe, an emulator port, a small FPGA for
>>>> > configuration and booting, etc. See features at these two links:
>>>> > http://www.ti.com/tool/tmdsevm6678#Technical%20Documents
>>>> > http://www2.advantech.com/Support/TI-EVM/6678le_of.aspx
>>>> >
>>>> > I don't know if the USB can be used in host mode. Does anyone know?
>>>> >
>>>> > It is unclear to me which version of the chip this board has - the
>>>> > 320GMACS one at 1.25 GHz or the 256 GMACS one at 1 GHz.
>>>> >
>>>> > Finally, there is a version of this board that costs $599 (50% more)
>>>> > and it has an XDS560V2 emulation mode. I understand that's a debugger.
>>>> > I don't know why exactly it is significant. What advantages does this
>>>> > bring for a developer?
>>>> > Is the emulator port shown on advantech's website only available in
>>>> > this more expensive version? If the cheaper version also has it, what
>>>> > can it be used for if the XDS560V2 emulation mode is not available?
>>>> >
>>>> > Survey data is on Google Docs. Anyone can comment:
>>>> >
>>>> > https://docs.google.com/spreadsheets/d/1oT-9PVh8yZMMwAkpltqGo8mhSL1NLXZJiC-LH7swsbY/edit?usp=sharing
>>>> >
>>>> >
>>>> > Have fun!
>>>> >
>>>> > On Thu, Feb 9, 2017 at 9:02 PM, cheater00 cheater00 <cheater00 at gmail.com <mailto:cheater00 at gmail.com>> wrote:
>>>> >> Yeah, usb host mode sounds super useful unless SD will allow faster UI
>>>> >> interaction.
>>>> >>
>>>> >> Do you know which TI chips have the most MMACS? I find the website
>>>> >> confusing.
>>>> >>
>>>> >>
>>>> >> On Thu, 9 Feb 2017 20:29 , <rsdio at audiobanshee.com <mailto:rsdio at audiobanshee.com>> wrote:
>>>> >>>
>>>> >>> Based on your survey, I'd recommend the Analog Devices board, even though
>>>> >>> I usually lean towards TMS320. The TMS320 family is huge, including both
>>>> >>> fixed-point and floating-point, low-power and high-speed, old and new
>>>> >>> designs, etc. Some of the TMS320 boards you listed are really geared more
>>>> >>> towards motor control than audio, which is why they might be underpowered
>>>> >>> for long impulse response convolution. I know that the AD SHARC family is
>>>> >>> also large, and they're very popular, but I am less familiar with the
>>>> >>> options.
>>>> >>>
>>>> >>> Don't forget to look at the chip manufacturer as a direct source for these
>>>> >>> boards. I always buy directly from Texas Instruments because Digi-Key tends
>>>> >>> to have a markup. Outside the US, maybe it's a different story due to
>>>> >>> international availability.
>>>> >>>
>>>> >>> I'd recommend something like the 1MB 800 MMAC board and not worry about
>>>> >>> external RAM. 1MB seems like plenty. I'd also recommend trying to implement
>>>> >>> both the time domain convolution and the frequency domain version. There are
>>>> >>> ways to reduce the latency of the frequency domain approach, and at least it
>>>> >>> would allow for longer impulse responses to be supported. For IRs that are
>>>> >>> short enough, the time domain approach would work. I've also seen papers on
>>>> >>> combining the two, since LTI techniques can be run in parallel and summed.
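
A minimal sketch of combining the two approaches, assuming NumPy. Because convolution is linear, the IR can be split into a short head handled in the time domain and a tail handled by FFT, with the tail's output delayed by the head length and summed in. The function name `hybrid_convolve` and the default head length are made up for illustration, and the tail here is done with one big FFT rather than block by block.

```python
import numpy as np

def hybrid_convolve(x, h, head_len=64):
    """Head of the IR in the time domain, tail in the frequency domain, summed."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    head, tail = h[:head_len], h[head_len:]
    y = np.zeros(len(x) + len(h) - 1)

    # Head: direct (time-domain) convolution, no added latency.
    y_head = np.convolve(x, head)
    y[:len(y_head)] += y_head

    # Tail: FFT convolution; its output starts head_len samples later.
    if len(tail) > 0:
        n = len(x) + len(tail) - 1
        nfft = max(2, 1 << (n - 1).bit_length())  # power of two >= n
        spectrum = np.fft.rfft(x, nfft) * np.fft.rfft(tail, nfft)
        y_tail = np.fft.irfft(spectrum, nfft)[:n]
        y[head_len:head_len + n] += y_tail
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
h = rng.standard_normal(500)
print(np.allclose(hybrid_convolve(x, h), np.convolve(x, h)))
```

In a real-time box the head length sets how much time the tail computation has before its first output sample is needed.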
>>>> >>>
>>>> >>> As for taking pairs of 16-bit samples to speed things up, be aware that
>>>> >>> not all instructions can work that way. I think that most DSPs can do a few
>>>> >>> simple operations on value pairs, but the most complex DSP instructions can
>>>> >>> only handle full samples. DSP architectures have internal registers that are
>>>> >>> much larger than the sample size, like 56-bit or higher. If you think about
>>>> >>> all of the potential overflow when adding thousands of samples from an
>>>> >>> impulse response, you can see why such large registers are needed. When
>>>> >>> working in that model, it's not possible to handle the overflow from two
>>>> >>> samples that are combined in a single 32-bit input value.
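
A rough sketch of the overflow argument above, as a conservative worst-case bound: each product of a signed sample and a signed coefficient fits in (sample bits + coefficient bits), and summing N products can add up to ceil(log2(N)) more bits. The function names are made up for illustration, and real signals rarely hit the worst case.

```python
def accumulator_bits(sample_bits, coeff_bits, n_taps):
    """Worst-case accumulator width for summing n_taps signed products."""
    product_bits = sample_bits + coeff_bits
    guard_bits = (n_taps - 1).bit_length()  # ceil(log2(n_taps)) for n_taps >= 1
    return product_bits + guard_bits

def max_taps(accumulator_width, sample_bits, coeff_bits):
    """Taps that can be summed with no possibility of overflow."""
    guard_bits = accumulator_width - (sample_bits + coeff_bits)
    return 2 ** guard_bits if guard_bits >= 0 else 0

# 16-bit samples and coefficients, 240,000 taps (5 s at 48 kHz)
print(accumulator_bits(16, 16, 240_000))  # 32 product bits + 18 guard bits
# A 56-bit accumulator fed with 24x24-bit products
print(max_taps(56, 24, 24))
```

Either way, the running sum for a single channel outgrows 16 bits almost immediately, so two 16-bit lanes packed into one 32-bit word have nowhere to put the carry.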
>>>> >>>
>>>> >>> Finally, I think that nobody has made something like this because the user
>>>> >>> interface would be rather difficult. It's a bit of a power-user effect. On
>>>> >>> that note, some sort of SD card might be useful, so I can see why you're
>>>> >>> looking into that. However, perhaps just a custom USB class device would be
>>>> >>> enough of an interface to allow downloading impulse responses to the device.
>>>> >>> At a minimum, you'll need a large Flash to store the current impulse
>>>> >>> response, or some way to partition the program Flash to set aside room for
>>>> >>> the data. The AD board with USB host mode could feasibly read directly from
>>>> >>> a USB memory stick or Flash drive.
>>>> >>>
>>>> >>> Brian
>>>> >>>
>>>> >>>
>>>> >>> On Feb 6, 2017, at 9:20 PM, cheater00 cheater00 <cheater00 at gmail.com <mailto:cheater00 at gmail.com>>
>>>> >>> wrote:
>>>> >>> > Brian, $50 is a steal. I've had a look at Digikey.
>>>> >>> >
>>>> >>> > This TI board is £24. It has ~150 KB on-chip RAM, but it has an
>>>> >>> > integrated SDRAM interface.
>>>> >>> >
>>>> >>> > http://www.digikey.co.uk/product-detail/en/texas-instruments/LAUNCHXL-F28377S/296-42484-ND/5404239
>>>> >>> >
>>>> >>> >
>>>> >>> > This TI board is £40. It has ~384 KB on-chip RAM and an integrated
>>>> >>> > SRAM interface and SD card support. 200 MMACS.
>>>> >>> >
>>>> >>> > http://www.digikey.co.uk/product-detail/en/texas-instruments/TMDX5505EZDSP/296-24965-ND/2127652
>>>> >>> >
>>>> >>> > This AD board is £60. It has 1MB on-chip RAM and USB host mode, no
>>>> >>> > idea about ram interface or SD card. 800 MMACS.
>>>> >>> >
>>>> >>> > http://www.digikey.co.uk/product-detail/en/analog-devices-inc/ADZS-BF706-EZMINI/ADZS-BF706-EZMINI-ND/5408943
>>>> >>> >
>>>> >>> > This last one sports 800 MMACS. Is this enough processing power for
>>>> >>> > the 5-second convolution I mentioned above? It seemed to me like that
>>>> >>> > would need 4608 MMACs. Maybe 2304 if we take pairs of 16 bit samples
>>>> >>> > and treat them as 32 bit values. Are my numbers correct? Are there
>>>> >>> > optimizations that can be done to lower this number, while still
>>>> >>> > having zero latency? I understand FFT domain convolution introduces
>>>> >>> > latency, which is not wanted in hardware. "Naive" MAC based
>>>> >>> > convolution doesn't seem too far out of reach.
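
For reference, the cost of "naive" MAC-based convolution is one multiply-accumulate per IR tap per output sample. A small sketch of that arithmetic follows; the function name is made up for illustration, and mono audio at 48 kHz is an assumption, since the parameters behind the 4608 figure are in an earlier message not quoted here.

```python
def direct_macs_per_second(sample_rate_hz, ir_seconds, channels=1):
    """MACs per second for direct time-domain convolution."""
    taps = round(sample_rate_hz * ir_seconds)  # IR length in samples
    return taps * sample_rate_hz * channels    # taps MACs per output sample

for seconds in (2, 5):
    mmacs = direct_macs_per_second(48_000, seconds) / 1e6
    print(f"{seconds} s IR at 48 kHz: {mmacs:.0f} MMAC/s")
```

Under these assumptions a 2-second IR (96,000 taps) works out to 4608 MMAC/s and a 5-second IR to 11520 MMAC/s, so the cost grows linearly with IR length and with the square of the sample rate.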
>>>> >>> >
>>>> >>> > This TI board is £156. It has 256 KB on-chip RAM and support for DDR2
>>>> >>> > SDRAM. No mention of MMACs but they say 3648 MIPS and I assume a
>>>> >>> > pipelined MAC costs one instruction, would that be correct?
>>>> >>> >
>>>> >>> > http://www.digikey.co.uk/product-detail/en/texas-instruments/TMDSLCDK6748/TMDSLCDK6748-ND/5213032
>>>> >>> >
>>>> >>> >
>>>> >>> > The more expensive boards don't seem to have more powerful DSP chips.
>>>> >>> > And those chips don't really get much more powerful either. However,
>>>> >>> > convolution is easily parallelised. So, worst case scenario, if you
>>>> >>> > wanted really long impulse responses you'd have to use a few chips.
>>>> >>> > However, even the really good 800 MMACS Blackfin ones are £15 unit
>>>> >>> > price, so that's not so bad...
>>>> >>> >
>>>> >>> > So, tell me, why hasn't anyone made this yet?
>>>> >>> >
>>>> >
>>>> > _______________________________________________
>>>> > Synth-diy mailing list
>>>> > Synth-diy at synth-diy.org <mailto:Synth-diy at synth-diy.org>
>>>> > http://synth-diy.org/mailman/listinfo/synth-diy <http://synth-diy.org/mailman/listinfo/synth-diy>