[sdiy] Hardware convolution box?

cheater00 cheater00 cheater00 at gmail.com
Fri Feb 10 18:43:18 CET 2017


Those are some awesome tips, thanks again, Terry.

On Fri, 10 Feb 2017 17:44 Terry Shultz, <thx1138 at earthlink.net> wrote:

> Hi Bruno,
>
> I use the Code Composer tools from TI and I have been using these tools
> since they were in Alpha state.
> Assembly code is difficult on the TI DSP Sitara platform because of the
> pipeline length and such.
>
> I find it easier to use one DSP as the Decoder for Atmos and the 2nd as
> post processing and Bass manager etc.
>
> Also found it easier to do assembly programming on the ADI 4th Generation
> Falcon parts, as the pipeline was a bit shorter. The longer the pipeline,
> the more difficult it is to reorder instructions efficiently.
>
> In some cases I have had to build a small library of hand-tuned FFTs and
> IFFTs. I can’t show this IP, as I was paid by a semiconductor company to
> develop it for automotive apps.
>
> The compilers for TI and ADI have gotten pretty good, and I do less and
> less assembly work.
>
> Now I am building Neon signal-processing blocks for the Cortex-A series of
> ARM processors.
>
> Check out https://projectne10.github.io/Ne10/
>
> If you want to improve performance, use the Neon processor and build up
> your own libs.
>
> I hope this helps you guys.
>
> best regards,
>
> Terry
>
> On Feb 10, 2017, at 7:35 AM, Bruno Afonso <bafonso at gmail.com> wrote:
>
> How are you going about using the DSPs? Could you comment on the toolchain
> and how easy it is to develop code for it compared to other DSPs?
>
> That X15 does indeed look mighty interesting, even for non-audio projects
> of mine.
>
>
>
> On Fri, Feb 10, 2017 at 12:59 AM Terry Shultz <thx1138 at earthlink.net>
> wrote:
>
> Hi Guys,
>
> I usually start with the highest performing product first and cost reduce
> after I get it running.
>
> Why drive yourself nuts worrying about MIPS and memory? At $250.00 I can
> build a monster convolution engine.
>
> Then again, the part is not cheap in small volumes, and 0.8 mm pitch on a
> BGA means at least a 10-layer board for high performance.
>
> I could not even build that X-15 board for what they sell it for on the
> Web.
>
> Jason sent me one last spring, and I have it running Linux and hooked up
> to my 2nd monitor.
>
> It is way overkill, but I need the dual DSPs for Dolby Atmos decoding
> on a 3D audio headphone project I am building.
>
> best regards,
>
> Terry
>
> On Feb 9, 2017, at 9:24 PM, Terry Shultz <thx1138 at earthlink.net> wrote:
>
> Another guy I know is Jason Kridner, who started BeagleBoard.org
> (http://beagleboard.org/): https://beagleboard.org/x15/
>
> This is a monster performer http://www.ti.com/product/am5728
>
> Processor: TI AM5728 2×1.5-GHz ARM® Cortex-A15
>
>    - 2GB DDR3 RAM
>    - 4GB 8-bit eMMC on-board flash storage
>    - 2D/3D graphics and video accelerators (GPUs)
>    - 2×700-MHz C66 digital signal processors (DSPs)
>    - 2×ARM Cortex-M4 microcontrollers (MCUs)
>    - 4×32-bit programmable real-time units (PRUs)
>
> Connectivity
>
>    - 2×Gigabit Ethernet
>    - 3×SuperSpeed USB 3.0 host
>    - HighSpeed USB 2.0 client
>    - eSATA (500mA)
>    - full-size HDMI video output
>    - microSD card slot
>    - Stereo audio in and out
>    - 4×60-pin headers with PCIe, LCD, mSATA
>    - and much more... <http://elinux.org/Beagleboard:BeagleBoard-X15>
>
> Software Compatibility
>
>    - Debian
>    - Android
>    - Ubuntu
>    - Cloud9 IDE on Node.js
>    - plus much more
>
> http://uk.rs-online.com/web/p/processor-microcontroller-development-kits/8874764/
> The cost is approx. 207.49, but it is backordered at this site until 18/04/2017.
>
> More than enough horsepower for Linux and Convolution engine I should
> think.
>
> regards,
>
> Terry
>
>
> On Feb 9, 2017, at 6:26 PM, Terry Shultz <thx1138 at earthlink.net> wrote:
>
> Check out my friend Dr. Paul Beckman’s site for tools
> https://www.dspconcepts.com/audio-weaver
>
> and my friend Tony Rouget’s site in Hong Kong https://www.minidsp.com
>
> and lastly my pal Al Clark’s site https://www.danvillesignal.com
> https://www.danvillesignal.com/landing-pages/snowbird-audio
>
> These are good examples of audio products that are better than what the
> DSP manufacturers can build.
>
> and lastly my old friend from MIT Dr. Bill Gardner
>
> https://www.audiobuildersworkshop.com
>
> hope this helps you guys a bit more.
>
> regards,
>
> Terry
>
> On Feb 9, 2017, at 5:20 PM, cheater00 cheater00 <cheater00 at gmail.com>
> wrote:
>
> That makes sense. It's also a very solid way to do things, if one manages
> to dot all the i's so the result is an accurate copy of the naive method.
>
> On Fri, 10 Feb 2017 02:01 Olivier Gillet, <ol.gillet at gmail.com> wrote:
>
> I think you're vastly overestimating how much computational resources
> this requires.
>
> A well-known trick is to partition the head of the IR into small
> blocks (say 32 samples long if you want sub-ms latency at 48kHz), and
> use larger blocks for the tail of the IR (latency is not a problem for
> the tail). The whole convolution can be decomposed as a sum of
> convolutions by each of the blocks, each of which can be evaluated in the
> frequency domain by a DFT, complex multiplication by the DFT of the IR
> block, and an inverse DFT.
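A minimal sketch of the block-partitioned idea, in Python for readability. It is the simpler uniformly partitioned overlap-save variant (every IR block has the same size, so latency equals one block), not the non-uniform small-head/large-tail scheme described above; the function names and the test signal are illustrative only. Each IR block's spectrum is precomputed once, and every input block then costs one FFT, one complex multiply-accumulate per IR block, and one inverse FFT.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def partitioned_convolve(x, h, block):
    """Uniformly partitioned overlap-save convolution; block must be a power of two.

    Returns the first len(x) samples of the linear convolution of x with h.
    """
    n_fft = 2 * block
    # Precompute the spectrum of every IR block once, when the IR is loaded.
    spectra = []
    for i in range(0, len(h), block):
        part = [complex(v) for v in h[i:i + block]]
        spectra.append(fft(part + [0j] * (n_fft - len(part))))
    # Frequency-domain delay line: spectra of the most recent input windows.
    fdl = [[0j] * n_fft for _ in spectra]
    prev = [0.0] * block
    y = []
    for start in range(0, len(x), block):
        cur = list(x[start:start + block])
        cur += [0.0] * (block - len(cur))        # zero-pad a final short block
        fdl = [fft([complex(v) for v in prev + cur])] + fdl[:-1]
        acc = [0j] * n_fft
        for X, H in zip(fdl, spectra):           # one complex MAC per bin per IR block
            for b in range(n_fft):
                acc[b] += X[b] * H[b]
        out = fft(acc, inverse=True)
        # Overlap-save: only the second half of the window is free of wrap-around.
        y.extend(v.real / n_fft for v in out[block:])
        prev = cur
    return y[:len(x)]


def direct_convolve(x, h):
    """Reference time-domain FIR (first len(x) output samples)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


x = [float((i * 7) % 11 - 5) for i in range(100)]
h = [1.0 / (k + 1) for k in range(70)]
err = max(abs(a - b) for a, b in zip(partitioned_convolve(x, h, 16), direct_convolve(x, h)))
print(err < 1e-9)  # True
```

Summing all the partition products before a single inverse FFT is allowed because the inverse DFT is linear; that is what keeps the per-block cost to one forward and one inverse transform regardless of the IR length.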
>
> I did some back-of-the-envelope computations and arrived at a figure
> of 40 MMACs for a sample rate of 48kHz and a 2s-long IR.
>
> I found it a bit too good to be true and I got back to the source:
>
> http://www.cs.ust.hk/mjg_lib/bibs/DPSu/DPSu.Files/Ga95.PDF
> p. 132, just before the beginning of section 6:
>
> "Thus a filter of size 128K samples will require approximately 427
> multiplies per output sample".
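As a quick arithmetic check (a sketch, not quoted from the paper): the 427 multiplies per output sample is reproduced exactly by 34·log2(N) − 151 at N = 128K = 2^17, which we believe is the cost expression Gardner gives for his zero-delay scheme; verify it against the PDF before relying on it. At 48 kHz this comes to roughly 20.5 million multiplies per second, the same order of magnitude as the 40 MMAC estimate, versus about 6.3 billion for a direct-form FIR of the same length. The function names are hypothetical.

```python
import math


def gardner_mults_per_sample(n_taps):
    """Assumed cost model: about 34*log2(N) - 151 multiplies per output sample."""
    return 34 * math.log2(n_taps) - 151


def direct_mults_per_sample(n_taps):
    """Direct-form FIR: one multiply per tap per output sample."""
    return n_taps


fs = 48_000
n = 128 * 1024  # "128K samples", about 2.7 s of IR at 48 kHz
print(gardner_mults_per_sample(n))              # 427.0
print(gardner_mults_per_sample(n) * fs / 1e6)   # roughly 20.5 (million multiplies/s)
print(direct_mults_per_sample(n) * fs / 1e6)    # roughly 6291 (million multiplies/s)
```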
>
> This assumes that the DFTs of all the blocks making up the IR have been
> precomputed, but this can be done faster than real time when the
> IR is loaded, assuming you've got enough RAM.
>
> Of course there's the issue of scheduling and a lot of additional
> bookkeeping, but at the very worst we're in the order of magnitude of
> hundreds of MMACs and a couple of MBytes of RAM.
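A rough way to check the RAM side of that estimate, assuming the uniformly partitioned variant with real-input FFTs of size 2B (so each partition keeps B + 1 complex bins) stored as single-precision complex values of 8 bytes. Both the precomputed IR spectra and the frequency-domain delay line of past input spectra have that shape. The function name is hypothetical, and working buffers and code size are ignored.

```python
import math


def upols_memory_bytes(ir_seconds, fs, block, bytes_per_complex=8):
    """Rough RAM estimate for uniformly partitioned overlap-save convolution."""
    n_taps = math.ceil(ir_seconds * fs)
    partitions = math.ceil(n_taps / block)
    bins = block + 1                                      # real-input FFT of size 2*block
    ir_spectra = partitions * bins * bytes_per_complex    # precomputed at IR load time
    delay_line = partitions * bins * bytes_per_complex    # past input spectra, same shape
    return ir_spectra + delay_line


print(upols_memory_bytes(2.0, 48_000, 32) / 1e6)     # about 1.6 MB
print(upols_memory_bytes(2.0, 48_000, 1024) / 1e6)   # about 1.5 MB
```

The total barely depends on the block size, because partitions × (B + 1) stays close to the IR length; for a 2 s IR at 48 kHz it lands around 1.5 to 1.6 MB, consistent with "a couple of MBytes".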
>
> On Fri, Feb 10, 2017 at 1:09 AM, cheater00 cheater00
> <cheater00 at gmail.com> wrote:
> > Found the right spot at the TI website. I've made a somewhat large
> > survey of AD and TI chips. I've uploaded the data to Google Docs (see
> > link at the end of this email).
> >
> > For a lot of power, TI can't be beat. Their chips are as cheap as
> > $0.78/GMACS, that's on TMS320C6678CYP, a chip with 8.5MB ram and 256
> > GMACS, $200 at Mouser.
> >
> > The cheapest TMS320C is TMS320C6652CZH6 with 19.2 GMACS, 1MB ram, at
> > $41.95 at Arrow.
> >
> > For cheap chips, AD is great. Their most powerful non-obsolete
> > offering is ADSP-BF561SKBCZ-5A, 2 GMACS, 328KB RAM, $32.59 at Arrow,
> > for $16.30/GMACS. Some of their unusually cheap chips include:
> > ADSP-BF525BBCZ-5A, 1.2 GMACS, 132KB, $11.79 at Newark Element14 for $9.83/GMACS
> > ADSP-BF534BBCZ-4A, 1 GMACS, 134KB, $5.88 at Newark Element14 for $5.88/GMACS
> > ADSP-BF531SBBCZ400, 0.8 GMACS, 53KB, $4.44 at Avnet for $5.55/GMACS
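The $/GMACS figures are simply unit price divided by the vendor's GMACS rating. A short sketch that recomputes them from the 2017 prices quoted in this message and sorts the parts; it makes entries easy to re-check (for example the BF531: 4.44 / 0.8 = 5.55). The list and function name are illustrative, and current prices will differ.

```python
# (part, GMACS, unit price in USD) as quoted in the survey (2017 prices)
PARTS = [
    ("TMS320C6678CYP", 256.0, 200.00),
    ("TMS320C6652CZH6", 19.2, 41.95),
    ("ADSP-BF561SKBCZ-5A", 2.0, 32.59),
    ("ADSP-BF525BBCZ-5A", 1.2, 11.79),
    ("ADSP-BF534BBCZ-4A", 1.0, 5.88),
    ("ADSP-BF531SBBCZ400", 0.8, 4.44),
]


def usd_per_gmacs(gmacs, price_usd):
    """Cost of one GMACS of rated peak throughput for a given part."""
    return price_usd / gmacs


for name, gmacs, price in sorted(PARTS, key=lambda p: usd_per_gmacs(p[1], p[2])):
    print(f"{name:20s} {usd_per_gmacs(gmacs, price):6.2f} $/GMACS")
```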
> >
> > Those chips were noticeably (3-4x) cheaper than their close
> > counterparts, apparently Newark and Avnet have some sort of blowout.
> >
> > I stopped surveying AD chips around 1.2 GMACS. There are going to be
> > much cheaper ones than I found, I guess, but they just have so many
> > chips I'd spend 2 days figuring out the prices. It's obvious: their
> > stuff is cheap.
> >
> > AD chips are inexpensive, but clearly, if you need a lot of processing
> > power and/or a lot of memory, TI will be 5 to 10 times cheaper. $200
> > might not be so much if that's the majority of the cost of the box for
> > a DIY gamer.
> >
> > As far as evaluation boards go, the highest-powered TI board seems to
> > be the best value. The TMDSEVM6678L costs $399 on TI's website, has
> > 64MB Flash, 512 MB DDR3 SDRAM, Gigabit Ethernet, USB mini-B, 80 IO
> > header and an AMC header with PCIe, an emulator port, a small FPGA for
> > configuration and booting, etc. See features at these two links:
> > http://www.ti.com/tool/tmdsevm6678#Technical%20Documents
> > http://www2.advantech.com/Support/TI-EVM/6678le_of.aspx
> >
> > I don't know if the USB can be used in host mode. Does anyone know?
> >
> > It is unclear to me which version of the chip this board has: the
> > 320 GMACS one at 1.25 GHz or the 256 GMACS one at 1 GHz.
> >
> > Finally, there is a version of this board that costs $599 (50% more)
> > and it has an XDS560V2 emulation mode. I understand that's a debugger.
> > I don't know why exactly it is significant. What advantages does this
> > bring for a developer?
> > Is the emulator port shown on Advantech's website only available in
> > this more expensive version? If the cheaper version also has it, what
> > can it be used for if the XDS560V2 emulation mode is not available?
> >
> > Survey data is on Google Docs. Anyone can comment:
> >
> > https://docs.google.com/spreadsheets/d/1oT-9PVh8yZMMwAkpltqGo8mhSL1NLXZJiC-LH7swsbY/edit?usp=sharing
> >
> >
> > Have fun!
> >
> > On Thu, Feb 9, 2017 at 9:02 PM, cheater00 cheater00 <cheater00 at gmail.com>
> > wrote:
> >> Yeah, USB host mode sounds super useful, unless SD will allow faster UI
> >> interaction.
> >>
> >> Do you know which TI chips have the most MMACS? I find the website
> >> confusing.
> >>
> >>
> >> On Thu, 9 Feb 2017 20:29 , <rsdio at audiobanshee.com> wrote:
> >>>
> >>> Based on your survey, I'd recommend the Analog Devices board, even though
> >>> I usually lean towards TMS320. The TMS320 family is huge, including both
> >>> fixed-point and floating-point, low-power and high-speed, old and new
> >>> designs, etc. Some of the TMS320 boards you listed are really geared more
> >>> towards motor control than audio, which is why they might be underpowered
> >>> for long impulse response convolution. I know that the AD SHARC family is
> >>> also large, and they're very popular, but I am less familiar with the
> >>> options.
> >>>
> >>> Don't forget to look at the chip manufacturer as a direct source for these
> >>> boards. I always buy directly from Texas Instruments because Digi-Key tends
> >>> to have a markup. Outside the US, maybe it's a different story due to
> >>> international availability.
> >>>
> >>> I'd recommend something like the 1MB 800 MMAC board and not worry about
> >>> external RAM. 1MB seems like plenty. I'd also recommend trying to implement
> >>> both the time-domain convolution and the frequency-domain version. There are
> >>> ways to reduce the latency of the frequency-domain approach, and at least it
> >>> would allow longer impulse responses to be supported. For IRs that are
> >>> short enough, the time-domain approach would work. I've also seen papers on
> >>> combining the two, since LTI filters can be run in parallel and their
> >>> outputs summed.
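Because convolution is linear, splitting the IR into a short head and a long tail and summing the two partial outputs gives exactly the full convolution. A minimal sketch of that combination, assuming a direct time-domain FIR for the head and a single zero-padded FFT for the tail (a real-time version would process the tail in blocks); the function names and the head length of 8 are illustrative choices. The tail's output is shifted by the head length, so in a streaming implementation the frequency-domain path could take up to that many samples of latency without changing the result.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def direct_convolve(x, h):
    """Time-domain FIR: full linear convolution, len(x) + len(h) - 1 samples."""
    n_out = len(x) + len(h) - 1
    return [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
            for n in range(n_out)]


def fft_convolve(x, h):
    """Frequency-domain linear convolution via one zero-padded FFT."""
    n_out = len(x) + len(h) - 1
    n = 1
    while n < n_out:
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real / n for v in y[:n_out]]


def hybrid_convolve(x, h, head_len):
    """Head taps in the time domain, tail taps in the frequency domain, then sum."""
    head, tail = h[:head_len], h[head_len:]
    y = direct_convolve(x, head) + [0.0] * len(tail)
    if tail:
        # The tail starts head_len taps into the IR, so its output is delayed by head_len.
        for i, v in enumerate(fft_convolve(x, tail)):
            y[head_len + i] += v
    return y


x = [float((i * 5) % 9 - 4) for i in range(50)]
h = [0.9 ** k for k in range(40)]
print(max(abs(a - b) for a, b in zip(hybrid_convolve(x, h, 8), direct_convolve(x, h))) < 1e-9)  # True
```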
> >>>
> >>> As for taking pairs of 16-bit samples to speed things up, be aware that
> >>> not all instructions can work that way. I think that most DSPs can do a few
> >>> simple operations on value pairs, but the most complex DSP instructions can
> >>> only handle full samples. DSP architectures have internal registers that are
> >>> much larger than the sample size, like 56-bit or higher. If you think about
> >>> all of the potential overflow when adding thousands of samples from an
> >>> impulse response, you can see why such large registers are needed. When
> >>> working in that model, it's not possible to handle the overflow from two
> >>> samples that are combined in a single 32-bit input value.
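The point about register width can be made concrete with a worst-case bound: a signed product of a B_x-bit sample and a B_h-bit coefficient fits in B_x + B_h bits, and summing N such products can add up to ceil(log2 N) more bits. The sketch below is that conservative bound only (real DSPs also offer saturation and fractional modes, which it ignores), plus a tiny illustration of why one plain 32-bit addition on two packed 16-bit lanes leaks a carry from one lane into the other. Function names and the 40-bit example accumulator are hypothetical.

```python
import math


def accumulator_bits(sample_bits, coeff_bits, taps):
    """Conservative worst-case accumulator width for a signed FIR sum of `taps` products."""
    return sample_bits + coeff_bits + math.ceil(math.log2(taps))


def max_safe_taps(acc_bits, sample_bits, coeff_bits):
    """Largest tap count that the same worst-case bound allows for a given accumulator."""
    return 2 ** (acc_bits - sample_bits - coeff_bits)


def packed_add(a_pair, b_pair):
    """Add two (hi, lo) pairs of 16-bit lanes using one ordinary 32-bit addition."""
    def pack(hi, lo):
        return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

    word = (pack(*a_pair) + pack(*b_pair)) & 0xFFFFFFFF
    return (word >> 16) & 0xFFFF, word & 0xFFFF


print(accumulator_bits(16, 16, 4096))     # 44
print(accumulator_bits(16, 16, 96_000))   # 49 (a 2 s IR at 48 kHz)
print(max_safe_taps(40, 16, 16))          # 256
print(packed_add((1, 0xFFFF), (0, 1)))    # (2, 0): the low lane's carry leaked into the high lane
```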
> >>>
> >>> Finally, I think that nobody has made something like this because the user
> >>> interface would be rather difficult. It's a bit of a power-user effect. On
> >>> that note, some sort of SD card might be useful, so I can see why you're
> >>> looking into that. However, perhaps just a custom USB class device would be
> >>> enough of an interface to allow downloading impulse responses to the device.
> >>> At a minimum, you'll need a large Flash to store the current impulse
> >>> response, or some way to partition the program Flash to set aside room for
> >>> the data. The AD board with USB host mode could feasibly read directly from
> >>> a USB memory stick or Flash drive.
> >>>
> >>> Brian
> >>>
> >>>
> >>> On Feb 6, 2017, at 9:20 PM, cheater00 cheater00 <cheater00 at gmail.com>
> >>> wrote:
> >>> > Brian, $50 is a steal. I've had a look at Digikey.
> >>> >
> >>> > This TI board is £24. It has ~150 KB on-chip RAM, but it has an
> >>> > integrated SDRAM interface.
> >>> >
> >>> > http://www.digikey.co.uk/product-detail/en/texas-instruments/LAUNCHXL-F28377S/296-42484-ND/5404239
> >>> >
> >>> >
> >>> > This TI board is £40. It has ~384 KB on-chip RAM and an integrated
> >>> > SRAM interface and SD card support. 200 MMACS.
> >>> >
> >>> > http://www.digikey.co.uk/product-detail/en/texas-instruments/TMDX5505EZDSP/296-24965-ND/2127652
> >>> >
> >>> > This AD board is £60. It has 1MB on-chip RAM and USB host mode; no
> >>> > idea about a RAM interface or SD card. 800 MMACS.
> >>> >
> >>> > http://www.digikey.co.uk/product-detail/en/analog-devices-inc/ADZS-BF706-EZMINI/ADZS-BF706-EZMINI-ND/5408943
> >>> >
> >>> > This last one sports 800 MMACS. Is this enough processing power for
> >>> > the 5-second convolution I mentioned above? It seemed to me like that
> >>> > would need 4608 MMACs. Maybe 2304 if we take pairs of 16 bit samples
> >>> > and treat them as 32 bit values. Are my numbers correct? Are there
> >>> > optimizations that can be done to lower this number, while still
> >>> > having zero latency? I understand FFT domain convolution introduces
> >>> > latency, which is not wanted in hardware. "Naive" MAC based
> >>> > convolution doesn't seem too far out of reach.
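The direct-form numbers can be checked with one line of arithmetic: a time-domain FIR needs one MAC per IR tap per output sample, so MACs per second = (IR length in seconds × fs) × fs per channel. A sketch assuming mono at 48 kHz (the sample rate is not restated in this message): by this formula 4608 MMAC/s corresponds to a 2 s IR, while a 5 s IR at 48 kHz would need 11520 MMAC/s. Conversely, an 800 MMACS part doing nothing but the MAC loop could cover about 16,666 taps, roughly 0.35 s. Function names are hypothetical.

```python
def direct_fir_macs_per_second(ir_seconds, fs, channels=1):
    """Time-domain convolution cost: one MAC per IR tap per output sample."""
    taps = round(ir_seconds * fs)
    return taps * fs * channels


def max_direct_taps(macs_per_second, fs):
    """Longest IR a direct-form FIR can run at fs, ignoring all overhead."""
    return int(macs_per_second // fs)


print(direct_fir_macs_per_second(2.0, 48_000) / 1e6)  # 4608.0
print(direct_fir_macs_per_second(5.0, 48_000) / 1e6)  # 11520.0
print(max_direct_taps(800e6, 48_000))                  # 16666 taps, about 0.35 s
```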
> >>> >
> >>> > This TI board is £156. It has 256 KB on-chip RAM and support for DDR2
> >>> > SDRAM. No mention of MMACs but they say 3648 MIPS and I assume a
> >>> > pipelined MAC costs one instruction, would that be correct?
> >>> >
> >>> > http://www.digikey.co.uk/product-detail/en/texas-instruments/TMDSLCDK6748/TMDSLCDK6748-ND/5213032
> >>> >
> >>> >
> >>> > The more expensive boards don't seem to have more powerful DSP chips.
> >>> > And those chips don't really get much more powerful either. However,
> >>> > convolution is easily parallelised. So, worst case scenario, if you
> >>> > wanted really long impulse responses you'd have to use a few chips.
> >>> > However, even the really good 800 MMACS Blackfin ones are £15 unit
> >>> > price, so that's not so bad...
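The "easily parallelised" remark follows from linearity: cut the IR into contiguous segments, let each chip convolve the input with its own segment, delay each partial output by the segment's offset, and sum. A sketch of that split in plain Python, with hypothetical function names and no modelling of inter-chip data transfer or synchronisation; the last helper just divides the required MAC rate by each chip's rating, so 4608 MMAC/s over 800 MMACS parts gives 6 chips.

```python
import math


def split_ir(h, n_chips):
    """Split an IR into contiguous (offset, segment) pairs, one per chip."""
    seg = math.ceil(len(h) / n_chips)
    return [(i, h[i:i + seg]) for i in range(0, len(h), seg)]


def fir(x, h):
    """Direct-form FIR, first len(x) output samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def parallel_convolve(x, h, n_chips):
    y = [0.0] * len(x)
    for offset, seg in split_ir(h, n_chips):
        part = fir(x, seg)                # work done independently by one chip
        for n in range(offset, len(x)):
            y[n] += part[n - offset]      # delay by the segment's offset, then sum
    return y


def chips_needed(required_macs, macs_per_chip):
    """Chip count from raw MAC throughput alone, ignoring overhead."""
    return math.ceil(required_macs / macs_per_chip)


print(chips_needed(4608e6, 800e6))  # 6
```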
> >>> >
> >>> > So, tell me, why hasn't anyone made this yet?
> >>> >
> >
> > _______________________________________________
> > Synth-diy mailing list
> > Synth-diy at synth-diy.org
> > http://synth-diy.org/mailman/listinfo/synth-diy
>
>