Yahoo Groups archive

Milter-greylist

Re: [milter-greylist] Re: Limiting resident memory usage

2006-11-02 by Matt Kettler

eclark wrote:
> Jon, please refer to Matthias' previous email, regarding use of rbls to do 
> greylisting, not blacklisting. Specifically these bits:
> 
> dnsrbl "SORBS DUN" dnsbl.sorbs.net 127.0.0.10
> acl greylist dnsrbl "SORBS DUN" delay 12h
> 

<snip>

> 
> Again, these are all just suggestions, but I very strongly feel your overly 
> broad and resource intensive regex approach to the issue will ultimately bite 
> you in the ass.
> 

Wait.. you think the *regex* is too resource intensive, but advocate using RBLs
instead?

Are you completely out of your MIND???!!!


An RBL is a NETWORK TEST. You have to create a UDP socket, send a request, wait
for a reply, parse the reply..
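
For anyone who hasn't looked at what an RBL check does under the hood, here is a rough sketch in Python. It assumes the usual DNSBL convention of reversing the IPv4 octets and prepending them to the list's zone. The function names are made up for illustration and are not milter-greylist code; the zone and the 127.0.0.10 answer come from the config quoted above. The second function does a real DNS lookup through the system resolver, which is exactly the socket/send/wait/parse round trip described here.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets + list zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % (ip,))
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_listed(ip, zone, expected="127.0.0.10"):
    """Return True if the list answers with the expected A record.

    This goes out over the network via the system resolver, so it can
    block for as long as the resolver is willing to wait.
    """
    name = dnsbl_query_name(ip, zone)
    try:
        _host, _aliases, addresses = socket.gethostbyname_ex(name)
    except OSError:
        # Lookup failures (including "no such name") are treated as not listed.
        return False
    return expected in addresses


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address, used here only to show the name.
    print(dnsbl_query_name("192.0.2.1", "dnsbl.sorbs.net"))
```

Building the name is cheap string work; everything expensive happens inside the resolver call.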

That's FAR more resource intensive than the regex is. Probably by a factor of
at least 10, and more along the lines of several thousand times more expensive.

I'll admit that all of the numbers below are educated guesses on my part.
However, they are likely to be fairly close to real. Certainly much closer than
the viewpoint that RBLs are cheaper than regexes.


Time:	Regex - about 0.1 microseconds
	RBL - tens of milliseconds (> 10000 microseconds)
	RBL is: about one hundred thousand times slower

Memory:	Regex - about 100 bytes, including annotations
	RBL - probably about 2000 bytes, with the socket structures, buffers to
	store packets in, etc.
	RBL uses: approximately 20 times the RAM

CPU:	Regex - a few hundred clock cycles
	RBL - about ten thousand clock cycles. (Remember, you have to format the
	query and parse the response here, PLUS you have the overhead of creating
	a UDP socket, IP stack processing, network interface driver, etc.)
	RBL uses: approximately 50 times the CPU

IO:	Regex - RAM only
	RBL - RAM + bus access to the NIC registers + busmastering to RAM by the NIC.
	RBL uses: at least 10 times the IO bus time. Bus accesses to NIC registers
	are considerably slower than CPU-to-RAM accesses, and are not cacheable.
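
The ratios above follow directly from the guessed figures. A small Python check, purely as arithmetic on the estimates in this message and not as measurements; "a few hundred clock cycles" is taken as 200 here, which is an assumption made only so the division has a concrete input:

```python
# Estimates copied from the comparison above: (regex, RBL).
# Time is written in nanoseconds so the arithmetic stays in whole numbers.
ESTIMATES = {
    "time_ns": (100, 10_000_000),     # 0.1 microseconds vs 10000 microseconds
    "memory_bytes": (100, 2_000),
    "cpu_cycles": (200, 10_000),      # "a few hundred" assumed to be 200
}


def rbl_to_regex_ratios(estimates):
    """Return how many times more the RBL costs than the regex, per resource."""
    return {name: rbl / regex for name, (regex, rbl) in estimates.items()}


if __name__ == "__main__":
    for name, ratio in rbl_to_regex_ratios(ESTIMATES).items():
        print(name, ratio)
```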


If you think you're saving resources using RBLs, do yourself a favor and
re-think that viewpoint.

Perhaps you mistakenly got this viewpoint from the bigevil.cf or
sa-blacklist-uri.cf vs surbl.org DNS issues with SpamAssassin.

However, in that case, bigevil contains HUNDREDS of VERY complicated regexes,
plus SpamAssassin adds lots of overhead beyond just the regex itself.

sa-blacklist-uri contains 540+ regexes like this one:

m/\b0(?:204-qazwsxma\.biz|2319\.com|241\.com|242\.com|243\.com|25ma\.com|284\.com|287\.com|28jsh\.com|2bikes\.com|2cruises\.com|2energydeals\.com|2host\.com|2optrix\.com|2owsk\.info|2refi\.net|2techno\.com|3-shopper-value\.com|32439\.com|345fjh\.com|35171246\.net|3l\.net|3newsletter-server1\.com|449\.com|491\.com|4aol\.com|4lyrics\.com|4newsletter-server1\.com|4olympics\.com|4rivival\.com|5100\.com|512ly\.com|534star\.com|571che\.com|58\.cn|5988\.com\.cn|5cars\.com|5m0rt\.com|5m0rt\.net|5mort\.com|5startlogic\.com|6100\.com|657\.com|683\.com|684\.com|685\.com|693\.com|695\.com|697\.com|6chip\.com|6q\.com)\b/i


Yes.. 500 regexes that are over 600 bytes each in text form are REALLY slow and
REALLY CPU intensive..

So yes, there is a point where it's good to replace regexes with RBLs if you can
replace a LOT of regexes with one RBL. The cost of an RBL query is fixed no
matter how many entries exist in it. Regexes on the other hand start adding up
the more of them you have.
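
The fixed-versus-additive cost argument can be put as a break-even count. This is a minimal sketch that assumes every regex costs the same and plugs in the guessed timings from earlier in this message (about 0.1 microseconds per short regex, about 10 milliseconds per RBL query); the function name is made up for illustration. Heavier regexes, or per-rule overhead like SpamAssassin's, raise the per-regex cost and pull the break-even point down accordingly.

```python
import math


def break_even_regex_count(rbl_cost_ns, per_regex_cost_ns):
    """Smallest number of equal-cost regexes whose total cost reaches one RBL query."""
    if rbl_cost_ns <= 0 or per_regex_cost_ns <= 0:
        raise ValueError("costs must be positive")
    return math.ceil(rbl_cost_ns / per_regex_cost_ns)


def total_regex_cost_ns(count, per_regex_cost_ns):
    """Regex cost grows linearly with the number of regexes."""
    return count * per_regex_cost_ns


if __name__ == "__main__":
    RBL_NS = 10_000_000   # ~10 ms, the guess used in this message
    REGEX_NS = 100        # ~0.1 microseconds, the guess used in this message
    print("break-even regex count:", break_even_regex_count(RBL_NS, REGEX_NS))
    print("cost of 2 short regexes (ns):", total_regex_cost_ns(2, REGEX_NS))
```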

However, using an RBL to replace 2 short lightweight regexes is NOT a performance
gain. It's a massive performance loss.
