Yahoo Groups archive

Milter-greylist

Re: [milter-greylist] [RFC] implementing taRgrey

2009-07-07 by Adam Katz

I wrote:
>> One important implementation note: if the connecting server drops
>> the connection but then comes back later, the tarpit clock should
>> have been counting from that first connection. (Otherwise, some 
>> noncompliant servers might never deliver mail.)

Kouhei Sutou wrote:
> I want to rescue the servers by greylisting not tarpitting.

Eh?  Define "rescue" here...  If we offer a tarpit action, it has to
play well with the greylist action, even if you do not intend to use
tarpitting (in which case, I wonder why you suggested it).
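
To make the "clock counts from the first connection" point concrete,
here is a minimal Python sketch.  It is an illustration only, not
milter-greylist code: the class name TarpitClock, the client key, and
the 65-second delay are all hypothetical.

```python
import time


class TarpitClock:
    """Remember when each client was first seen, so tarpit time already
    served carries over if the client drops the connection and returns.

    Hypothetical helper for illustration; not part of milter-greylist.
    """

    def __init__(self, tarpit_seconds, now=time.time):
        self.tarpit_seconds = tarpit_seconds
        self.now = now            # injectable clock, handy for testing
        self.first_seen = {}      # client key -> time of first connection

    def remaining(self, client_key):
        """Return how many seconds this client must still be held."""
        t = self.now()
        # setdefault keeps the ORIGINAL timestamp on reconnects.
        start = self.first_seen.setdefault(client_key, t)
        return max(0.0, self.tarpit_seconds - (t - start))
```

A client that gives up after 20 seconds and reconnects is then held
only for the remainder, so an impatient but legitimate server still
gets through eventually instead of restarting the full delay each time.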

>> After reading a bit on S25R, I'm quite dubious. No concrete data
>> on false-positives is presented and the whitelist is MASSIVE.
> 
> Yes. S25R has some false positives. We need a whitelist when we use
> S25R.
> 
> We can use S25R with greylisting to maintain our whitelist 
> automatically.

Perhaps if you use lazyaw, but even with that, massive mailers like
Hotmail won't get properly whitelisted.  To somebody trying to limit
the negative impact of greylisting, this mandates manual whitelisting.

> Here is a configuration to use S25R in milter-greylist:
> 
>   extendedregex
>   racl greylist domain /^\[.+\]$/ msg "S25R rule 0"
>   racl greylist domain /^[^.]*[0-9][^0-9.]+[0-9].*\./ msg "S25R rule 1"
>   racl greylist domain /^[^.]*[0-9][0-9][0-9][0-9][0-9]/ msg "S25R rule 2"
>   racl greylist domain /^([^.]+\.)?[0-9][^.]*\.[^.]+\..+\.[a-z]/ msg "S25R rule 3"
>   racl greylist domain /^[^.]*[0-9]\.[^.]*[0-9]-[0-9]/ msg "S25R rule 4"
>   racl greylist domain /^[^.]*[0-9]\.[^.]*[0-9]\.[^.]+\..+\./ msg "S25R rule 5"
>   racl greylist domain /^(dhcp|dialup|ppp|[achrsvx]?dsl)[^.]*[0-9]/ msg "S25R rule 6"

Rule 0 is commonly hit by legitimate servers, so it will produce a
large number of false positives.  I would not recommend using it.  As
to the provided statistics, they are stale (April 2004) and drawn from
too small a sample to be significant (only 567 IPs observed).  Because
the paper's implementation blocked rather than greylisted, its
false-positive rate cannot be determined.

The paper's stats show rules 4-6 are almost completely useless; rules
4 and 5 together blocked 1.2% of messages and rule 6 failed to get even
0.1%, while rule 3 (the worst of the remaining rules) blocked 6.5%.

For such broad-sweeping "high impact" rules, I see untested methods
which vary from too many false positives to too few hits.
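
For anyone who wants to see which rule a given hostname trips, the
quoted patterns can be tried outside the milter.  This is a rough
Python sketch: the regexes are copied from the configuration above,
but the function name and the sample hostnames are made up, and
Python's re module is assumed to treat these particular patterns the
same way as the extended regexes milter-greylist uses.

```python
import re

# Patterns copied from the quoted racl lines; rule numbers as given there.
S25R_RULES = [
    (0, r"^\[.+\]$"),
    (1, r"^[^.]*[0-9][^0-9.]+[0-9].*\."),
    (2, r"^[^.]*[0-9][0-9][0-9][0-9][0-9]"),
    (3, r"^([^.]+\.)?[0-9][^.]*\.[^.]+\..+\.[a-z]"),
    (4, r"^[^.]*[0-9]\.[^.]*[0-9]-[0-9]"),
    (5, r"^[^.]*[0-9]\.[^.]*[0-9]\.[^.]+\..+\."),
    (6, r"^(dhcp|dialup|ppp|[achrsvx]?dsl)[^.]*[0-9]"),
]
COMPILED = [(number, re.compile(pattern)) for number, pattern in S25R_RULES]


def first_s25r_rule(hostname):
    """Return the number of the first rule matching hostname, or None."""
    for number, regex in COMPILED:
        if regex.search(hostname):
            return number
    return None


# Hypothetical hostnames, for illustration only.
for name in ("[192.0.2.10]", "mail.example.com",
             "host-192-0-2-10.example.net", "dsl7.example.net"):
    print(name, "->", first_s25r_rule(name))
```

Running real connection logs through something like this is a cheap
way to count per-rule hits before committing any rule to an ACL.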

>> I've implemented S25R in SpamAssassin with near-zero scores to
>> see what kind of impact it would have on my servers, but I doubt
>> it will prove useful (since SA fires after greylisting).
> 
> S25R detects most of spam-bots and greylisting also detects (and
> rejects) most of spam-bots. SpamAssassin will not receive mails
> that can be detected by S25R.

Sorry, I thought the last part of my above quoted sentence explained
that.  However, don't forget that botnets sometimes survive
greylisting or find their way onto whitelists like DNSWL (which is
becoming quite common).

Sure enough, I have results after less than a day.  In the following
table, I show hits as a percentage of all SpamAssassin-scanned mail
(after passing or bypassing greylisting), then what percentage of those
hits would have been marked or blocked at various rule scores.  I
reject mail at 8+ points and mark mail at 5+ points, so the table has
results for scoring each rule at 3.0, 2.0, and 1.0 points.  2500 emails
were included in these stats.

So S25R rule 1 hit 12.0% of the SA-scanned mails.  If scored 3.0, 4.1%
more of those hits would have been marked as spam and 21.5% would have
been rejected.  If scored 2.0, 2.9% more of the hits would have been
marked and 14.6% would have been rejected, etc.

S25R   total      --- 3.0 ---       --- 2.0 ---       --- 1.0 ---
rule    hits      mark   block      mark   block      mark   block
 0      6.7%      zero    3.6%      zero    2.8%      zero    2.1%
 1     12.0%      4.1%   21.5%      2.9%   14.6%      0.3%    2.3%
 2      2.1%      0.8%    2.3%      0.2%    1.5%      0.0%    zero
 3      1.9%      0.4%    zero      0.0%    zero      0.0%    zero
 4      0.8%      zero    zero      zero    zero      zero    zero
 5      1.0%      zero    zero      zero    zero      zero    zero
 6      0.0%      zero    zero      zero    zero      zero    zero
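
The mark/block columns are a "what if" calculation, which can be
sketched roughly as below.  The 5-point and 8-point thresholds are the
ones stated above; everything else is an assumption made for
illustration: the function names, the choice to count only messages
whose outcome changes, and the sample scores, which are invented and
are not the data behind the table.

```python
MARK_AT = 5.0     # mail is marked as spam at 5+ points
REJECT_AT = 8.0   # mail is rejected at 8+ points


def classify(score):
    """Map a SpamAssassin score to 'pass', 'mark', or 'block'."""
    if score >= REJECT_AT:
        return "block"
    if score >= MARK_AT:
        return "mark"
    return "pass"


def rescoring_impact(hit_scores, rule_score):
    """Percentage of a rule's hits that would newly be marked / blocked.

    hit_scores are the scores of messages the rule hit, excluding the
    rule's own (near-zero) contribution.
    """
    if not hit_scores:
        return (0.0, 0.0)
    newly_marked = newly_blocked = 0
    for score in hit_scores:
        before = classify(score)
        after = classify(score + rule_score)
        if after == before:
            continue              # outcome unchanged by the rescoring
        if after == "mark":
            newly_marked += 1
        else:
            newly_blocked += 1
    total = len(hit_scores)
    return (100.0 * newly_marked / total, 100.0 * newly_blocked / total)


# Invented scores: at 3.0 points one message moves from pass to mark
# and one from mark to block; at 1.0 points nothing changes.
print(rescoring_impact([3.0, 6.0, 1.0, 9.0], 3.0))
print(rescoring_impact([3.0, 6.0, 1.0, 9.0], 1.0))
```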


My data agrees with the paper's numbers, suggesting rules 4-6 are
almost completely useless.  Of course, this makes me wonder about the
thoroughness of the paper and how much we should really trust it...

>> I suspect the "botnet" plugin for SpamAssassin is far more
>> comprehensive, and I've already decided not to use it thanks to the
>> fact that greylisting's main function is combating botnets. The same
>> will probably go for S25R.
> 
> S25R is very lightweight because it uses only 7 regular
> expressions. It seems a reasonable solution as the
> first filter. We will use other comprehensive filters (that
> may be heavier than S25R) for mails that have
> passed the S25R (+ greylisting) check.

"only 7 regular expressions" though it should be obvious that rules 0,
4, 5, and 6 are poor quality ... which makes it "only 3" regexps, with
no data suggesting any degree of accuracy.  SpamAssassin's RDNS_NONE
is identical to S25R rule 0.  SpamAssassin's RDNS_DYNAMIC has a huge
overlap with the rest of S25R's rules.  Mass-check stats for these
tests are available:
http://ruleqa.spamassassin.org/week/RDNS_NONE/detail
http://ruleqa.spamassassin.org/week/RDNS_DYNAMIC/detail

While RDNS_NONE and RDNS_DYNAMIC both default to scores of 0.1, I
rescore them as 0.9 and 0.4 respectively.

>> Implementing S25R within milter-greylist once the tarpitting
>> functionality is present should prove trivial, so I see no need to
>> implement a "targrey" clause.
> 
> We don't need new code for S25R because we can use S25R
> with the current milter-greylist as I showed above. :-)

Then we agree on that issue; those two paragraphs say the same thing.
