[sdiy] IR Reverb
rsdio at audiobanshee.com
Thu Feb 15 09:36:25 CET 2018
On Feb 14, 2018, at 11:42 PM, Tim Ressel <timr at circuitabbey.com> wrote:
> (note: laughter is understandable and probably mandatory)
> So I just did a cannonball into the murky waters of impulse response reverb. My tenuous grasp of DSP coupled with my sketchy math skills is making this, well, interesting. So far I have acquainted myself with linear convolution, which looks suspiciously like an FIR. Since it says "impulse response" on the box, that seemed to make sense. But then I stepped back and tried to imagine a reverb system as I understand it and got confused.
> Reverbs have two things going on: time delay and filtering. The time component gives the reverb time and overall thickness of the reverb, while the filtering can make the effect warmer or colder (yes, oversimplified). So I am guessing the impulse response is a room characterization used to color a reverb. However that seems incomplete. Unless the impulse response is really long or is sets of impulse responses over time.
> I suspect that gurgling sound is me in over my head.
No laughter here, but maybe others will laugh louder after reading my response…
I don't know if it will help, but here's my "quick" explanation.
A couple of introductory paragraphs for terminology, and then the crux around (*) below.
(cheater teaser: if you know precisely what happens to a single audio sample that is tossed into a room, then you know exactly what will happen to any sequence of any number of samples played in that same room.)
Impulse Responses work on Linear, Time-Invariant systems, and can recreate them rather faithfully.
Linear means that there is no distortion. The important thing about Linear is that if you know what happens to two separate signals when passed through a Linear system, then you can just add the signals together, pass them through the system once, and get the same result as processing each one separately and adding the outputs. Whether the mixing happens before or after the "reverb" doesn't change anything. This would not be true if there were distortion, because distortion is very level-dependent, and adding two signals would give more distortion than each signal on its own, and then it would very much matter in which order the mixing occurred relative to the "reverb." Filtering is also Linear, in the sense that you can mostly filter before or after and get the same results (filtering inside a regeneration loop of a delay doesn't quite fit into this math, but thankfully physics guarantees that no natural room will reverberate forever unless you add energy).
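To make the "mix before or after" point concrete, here is a minimal sketch in plain Python. The three-sample `ir`, the two short "tracks" `a` and `b`, and the hard-clipping `clip` function are all made-up toy values for illustration, not measurements of anything.

```python
def convolve(x, h):
    """Direct linear convolution: every input sample launches a scaled copy of h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def clip(v):
    """Hard clipper at +/-1.0, standing in for 'distortion' (hypothetical)."""
    return max(-1.0, min(1.0, v))

ir = [1.0, 0.5, 0.25]        # toy impulse response (assumption)
a = [1.0, 0.0, -1.0, 2.0]    # first toy track
b = [0.5, 0.5, 0.0, -2.0]    # second toy track

# Linear system: mixing before or after the "reverb" gives the same output.
mix_then_reverb = convolve([p + q for p, q in zip(a, b)], ir)
reverb_then_mix = [p + q for p, q in zip(convolve(a, ir), convolve(b, ir))]
print(mix_then_reverb == reverb_then_mix)

# Distortion: the order of mixing now matters.
mix_then_clip = [clip(p + q) for p, q in zip(a, b)]
clip_then_mix = [clip(p) + clip(q) for p, q in zip(a, b)]
print(mix_then_clip == clip_then_mix)
```

The first comparison holds for any signals and any IR; the second fails as soon as one sample of the mix exceeds the clipping level.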
Time-Invariant simply means that the output isn't really any different if you send the input sooner or later (other than the obvious fact that the output also occurs sooner or later, correspondingly). Compressors, Limiters, Expanders, and Gates are not Time-Invariant because the gain changes over time, according to the Attack and Decay among other things, such that mixing audio before or after Dynamics gives different results.
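The Time-Invariant property can be checked the same way. This is only a sketch with a made-up `ir` and input `x`; `delay` is a hypothetical helper that prepends silence.

```python
def convolve(x, h):
    """Direct linear convolution of input x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def delay(x, d):
    """Delay a signal by d samples by prepending d zeros."""
    return [0.0] * d + list(x)

ir = [1.0, 0.5, 0.25]    # toy impulse response (assumption)
x = [1.0, -1.0, 2.0]     # toy input

late_input = convolve(delay(x, 3), ir)    # send the input 3 samples later
late_output = delay(convolve(x, ir), 3)   # or just delay the original output

print(late_input == late_output)
```

Sending the input later produces exactly the original output, just shifted by the same number of samples.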
That (LTI) said, an IR Reverb works because once you know the output based on one particular kind of input, you know what the output will be for any input. If the input is louder, the output will be louder. If the input occurs sooner or later, the output will occur sooner or later by the same time (I suppose this could be confusing, since Reverb involves delays - but suffice it to say that the first echo and subsequent echoes will all occur in the same predictable fashion based upon when the input happens). If you break the input into individual samples, then you can recreate any possible output simply by combining copies of the single-sample response that vary in amplitude and time delay.
If you're still with me, then the Impulse Response is basically a summary of how the Reverb responds to a single sample. Since we really prefer to listen to the Reverb response to a whole bunch of samples in order (making up a track), the Linear, Time-Invariant rules tell us everything we need to know about all the input-to-output sample relationships based upon that one test sample.
(*) Another way to put it is this: For each individual input sample, the IR Reverb calculates the total "room" response to that single sample, spread out over time as more samples come in. Of course, this will effectively involve echoing that one sample for quite a long time, and with some smoothing (filtering). When the second input sample arrives, the IR Reverb adds the still-unfolding response to the first sample to the response to the second sample. Because of the Linear part, each output sample is actually the sum (mix) of the responses to a *bunch* of previous samples. And so on...
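Here is a rough sketch of that idea in plain Python, written two ways. `scatter` follows the description above (each input sample launches its own scaled, delayed copy of the IR), while `gather` computes each output sample as a weighted sum of previous inputs, which is the FIR view. The four-sample `ir` and three-sample `x` are toy values I made up.

```python
def scatter(x, h):
    """Each input sample adds a scaled copy of the IR, starting at its own time."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def gather(x, h):
    """Each output sample is a weighted sum of the current and previous inputs."""
    y = []
    for n in range(len(x) + len(h) - 1):
        acc = 0.0
        for k, hk in enumerate(h):
            if 0 <= n - k < len(x):
                acc += hk * x[n - k]
        y.append(acc)
    return y

ir = [1.0, 0.0, 0.5, 0.25]   # toy "room": direct sound plus two echoes (assumption)
x = [1.0, 2.0, -1.0]         # toy input

print(scatter([1.0], ir))    # a single unit sample just returns the IR itself
print(scatter(x, ir))        # [1.0, 2.0, -0.5, 1.25, 0.0, -0.25]
print(gather(x, ir) == scatter(x, ir))
```

Both functions do the same arithmetic in a different order, which is why linear convolution "looks suspiciously like an FIR."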
So, you're right, there are a lot of samples in an Impulse Response. They can be really long. If you cut them short, the reverb tail will be cut short. Some vendors with limited memory IR implementations start with the actual Impulse Response for the first part, but synthesize a generic reverb for the quieter tail. It's not as accurate, but it takes less memory and less processing power, and you generally can't hear the detail in the quietest part of the tail because it's overwhelmed by louder sounds. As long as it doesn't suddenly drop out, you're fine.
If your Reverb tail is 1 second, then the IR will be 48,000 samples long at a 48 kHz sample rate. That means every single output sample from an IR Reverb is the weighted sum of 48,000 input samples (the newest one plus the 47,999 before it), with the 48,000 IR samples as the weights! That's a lot of processing, so there are always limitations and clever shortcuts.
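For a sense of scale, here is the back-of-the-envelope arithmetic for a brute-force implementation. The 4-bytes-per-sample figure assumes 32-bit samples, which is my assumption and not something fixed by the math.

```python
sample_rate = 48_000         # samples per second
tail_seconds = 1             # reverb tail length
bytes_per_sample = 4         # assuming 32-bit samples (assumption)

ir_length = sample_rate * tail_seconds        # IR samples to store
macs_per_output = ir_length                   # one multiply-add per IR sample
macs_per_second = macs_per_output * sample_rate
ir_bytes = ir_length * bytes_per_sample

print(ir_length)          # 48000
print(macs_per_second)    # 2304000000
print(ir_bytes)           # 192000
```

That is about 2.3 billion multiply-adds per second for one mono channel with a 1 second tail, before any shortcuts.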
There is no filtering in an IR Reverb, per se. The tone changes are part of the linear amplitude and time delayed samples of the IR itself. Again, you're right that it's like an FIR. Some Reverbs do use standard filters to avoid full IR calculations, because those take less CPU. Mixing partial pure IR with some standard reverb techniques can help. But at the core, you don't have to specifically measure the warmth or coldness of a room when measuring its IR - you get that for free. Of course, if you're synthesizing an IR Reverb, then you might have parameters for warm/cold.
p.s. the ideal Impulse is a single sample at maximum volume, followed (and preceded) by perfect silence. It's basically equivalent to throwing all frequencies into a room at once, and measuring the result in time and amplitude. However, it's difficult to find amplifiers and speakers that can reproduce all frequencies together at maximum volume without distortion - and if you recall the Linear requirement above, any distortion ruins the works.
So one popular solution is to feed in only one frequency at a time, at a calibrated level, and then sweep through the frequencies at a precise rate. Amplifiers and speakers have an easier time of this. Then, this weirdly spread out response has to be collapsed in time very precisely to recreate what would have happened with an ideal Impulse that has everything all at once. If you don't start the sweep below 20 Hz or continue above 20 kHz, then you don't actually have the equivalent of a true impulse (which would ideally include all frequencies). Since we can't hear outside that range anyway, the lack of those parts of the Impulse Response does not matter. We don't really save any CPU by skipping the frequencies below 20 Hz - they're just too hard to generate at full volume. But there are massive CPU savings by skipping higher frequencies, especially since the high frequency content drops off towards the tail of the reverb anyway, at least for natural IRs.
TASCAM actually has a patent on a method for reducing the CPU usage by taking advantage of reduced sample rate processing of the tail of a reverb. Since the math tells us we can recreate any LTI system by summing its parts separately, the TASCAM reverb processes part of the audio at full sample rate, and part of the audio at half sample rate, then mixes the two together after properly delaying the tail by the required number of samples. Sounds exactly the same as if it were all processed at full sample rate, provided that the tail doesn't have any high frequencies in it after the dividing point.
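The "summing its parts separately" step can be sketched on its own. To be clear, this is not the TASCAM method: it only splits a toy IR into a head and a tail, processes both at full sample rate, delays the tail result, and mixes. The half sample rate processing of the tail is left out entirely, and `ir`, `x`, and `split` are made-up values.

```python
def convolve(x, h):
    """Direct linear convolution of input x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

ir = [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]   # toy decaying IR (assumption)
x = [1.0, -1.0, 2.0, 0.5]                       # toy input
split = 2                                       # hypothetical dividing point

head, tail = ir[:split], ir[split:]

full = convolve(x, ir)                            # the whole IR in one pass
head_out = convolve(x, head)                      # early part of the response
tail_out = [0.0] * split + convolve(x, tail)      # tail, delayed by `split` samples

# Mix the two partial outputs, padding the shorter head output with silence.
head_out += [0.0] * (len(full) - len(head_out))
combined = [p + q for p, q in zip(head_out, tail_out)]

print(combined == full)
```

Because the two partial outputs sum to the full result, the tail branch is free to be computed more cheaply, as long as whatever shortcut is used there does not change what it outputs.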
Ok, that wasn't "quick" at all...