I wrote:
>> One important implementation note: if the connecting server drops
>> the connection but then comes back later, the tarpit clock should
>> have been counting from that first connection. (Otherwise, some
>> noncompliant servers might never deliver mail.)

Kouhei Sutou wrote:
> I want to rescue the servers by greylisting not tarpitting.

Eh? Define "rescue" here... If we offer a tarpit action, it has to
play well with the greylist action, even if you do not intend to use
tarpitting (in which case, I wonder why you suggested it).

>> After reading a bit on S25C, I'm quite dubious. No concrete data
>> on false-positives is presented and the whitelist is MASSIVE.
>
> Yes. S25R has some false positives. We need a whitelist when we use
> S25R.
>
> We can use S25R with greylisting to maintain our whitelist
> automatically.

Perhaps if you use lazyaw, but even with that, massive mailers like
hotmail won't get properly whitelisted. To somebody trying to limit
the negative impact of greylisting, this mandates whitelisting.

> Here is a configuration to use S25R in milter-greylist:
>
>   extendedregex
>   racl greylist domain /^\[.+\]$/ msg "S25R rule 0"
>   racl greylist domain /^[^.]*[0-9][^0-9.]+[0-9].*\./ msg "S25R rule 1"
>   racl greylist domain /^[^.]*[0-9][0-9][0-9][0-9][0-9]/ msg "S25R rule 2"
>   racl greylist domain /^([^.]+\.)?[0-9][^.]*\.[^.]+\..+\.[a-z]/ msg "S25R rule 3"
>   racl greylist domain /^[^.]*[0-9]\.[^.]*[0-9]-[0-9]/ msg "S25R rule 4"
>   racl greylist domain /^[^.]*[0-9]\.[^.]*[0-9]\.[^.]+\..+\./ msg "S25R rule 5"
>   racl greylist domain /^(dhcp|dialup|ppp|[achrsvx]?dsl)[^.]*[0-9]/ msg "S25R rule 6"

Rule 0 is commonly hit by legit servers, so it will contain a large
number of false positives. I would not recommend using it.

As to the provided statistics, they are stale (April 2004) and
insignificant (only 567 IPs observed). Because the paper's
implementation blocked rather than greylisted, false-positive rates
cannot be determined.
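
For anyone who wants to see which of the quoted rules a given reverse-DNS
name trips, here is a minimal sketch in Python. It is not how
milter-greylist evaluates racl lines; it only reuses the seven patterns
quoted above. Assumptions on my part: Python's `re` engine stands in for
POSIX extended regexes (these patterns use only syntax common to both),
names are lowercased before matching, and the helper name `s25r_hits`
and the sample hostnames are made up for illustration.

```python
import re

# The seven S25R patterns, copied from the configuration quoted above.
S25R_RULES = [
    (0, r"^\[.+\]$"),
    (1, r"^[^.]*[0-9][^0-9.]+[0-9].*\."),
    (2, r"^[^.]*[0-9][0-9][0-9][0-9][0-9]"),
    (3, r"^([^.]+\.)?[0-9][^.]*\.[^.]+\..+\.[a-z]"),
    (4, r"^[^.]*[0-9]\.[^.]*[0-9]-[0-9]"),
    (5, r"^[^.]*[0-9]\.[^.]*[0-9]\.[^.]+\..+\."),
    (6, r"^(dhcp|dialup|ppp|[achrsvx]?dsl)[^.]*[0-9]"),
]
COMPILED = [(number, re.compile(pattern)) for number, pattern in S25R_RULES]


def s25r_hits(rdns_name):
    """Return the numbers of all S25R rules matching a reverse-DNS name.

    Hypothetical helper: lowercasing the name first is an assumption,
    not something the quoted configuration specifies.
    """
    name = rdns_name.lower()
    return [number for number, regex in COMPILED if regex.search(name)]


if __name__ == "__main__":
    # Made-up sample names; "[192.0.2.1]" stands for a client with no rDNS.
    for host in ("[192.0.2.1]", "mail.example.com",
                 "dsl-203-0-113-7.example.net"):
        print(host, s25r_hits(host))
```

Running real client names from a mail log through something like this is
a cheap way to estimate how often each rule would fire before putting it
into an ACL.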
The paper's stats show rules 4-6 are almost completely useless; rules
4 and 5 together blocked 1.2% of messages and rule 6 failed to get
even 0.1%, while rule 3 (the worst of the remaining rules) blocked
6.5%. For such broad-sweeping "high impact" rules, I see untested
methods which vary from too many false positives to too few hits.

>> I've implemented S25C in SpamAssassin with near-zero scores to
>> see what kind of impact it would have on my servers, but I doubt
>> it will prove useful (since SA fires after greylisting).
>
> S25R detects most of spam-bots and greylisting also detects (and
> rejects) most of spam-bots. SpamAssassin will not receive mails
> that can be detected by S25R.

Sorry, I thought the last part of my above quoted sentence explained
that. However, don't forget that botnets sometimes survive greylisting
or find their way onto whitelists like DNSWL (which is becoming quite
common).

Sure enough, I have results after less than a day. In the following
table, I show hits as a ratio of all spamassassin-scanned mail (after
passing or bypassing greylisting), then what percent of those hits
would have become marked or blocked at various rule scores. I reject
mail at 8+ points and mark mail at 5+ points, so the table has results
for scoring each rule at 3.0, 2.0, and 1.0 points. 2500 emails were
included in these stats.

So S25R rule 1 hit 12.0% of the SA-scanned mails. If scored 3.0, 4.1%
more of those hits would have been marked as spam and 21.5% would have
been rejected. If scored 2.0, 2.9% more of the hits would have been
marked and 14.6% would have been rejected, etc.
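
The threshold arithmetic described above can be sketched as follows.
This is only an illustration of the idea, assuming the "mark" and
"block" figures count hits that newly cross the 5-point or 8-point
threshold once the rule's score is added. The function name
`threshold_shift` and the sample scores are hypothetical; they are not
my actual mail data.

```python
MARK_AT = 5.0    # mail is marked as spam at 5+ points
BLOCK_AT = 8.0   # mail is rejected at 8+ points


def threshold_shift(base_scores, rule_score):
    """Return (newly_marked, newly_blocked) as fractions of a rule's hits.

    base_scores: SpamAssassin scores of the messages that hit the rule,
    measured while the rule itself scores (near) zero.
    """
    if not base_scores:
        return (0.0, 0.0)
    newly_marked = 0
    newly_blocked = 0
    for base in base_scores:
        new = base + rule_score
        if base < BLOCK_AT <= new:
            newly_blocked += 1      # crossed the reject threshold
        elif base < MARK_AT <= new:
            newly_marked += 1       # crossed only the mark threshold
    total = len(base_scores)
    return (newly_marked / total, newly_blocked / total)


if __name__ == "__main__":
    # Hypothetical scores for eight messages that hit one rule.
    sample = [0.5, 2.5, 4.2, 6.1, 7.4, 9.0, 1.0, 3.3]
    for rule_score in (3.0, 2.0, 1.0):
        marked, blocked = threshold_shift(sample, rule_score)
        print(f"score {rule_score}: mark {marked:.1%}, block {blocked:.1%}")
```

Messages already at or above a threshold are not counted again, which is
why a rule can hit a lot of mail and still show "zero" in every column.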
S25R  total   --- 3.0 ---    --- 2.0 ---    --- 1.0 ---
rule  hits    mark  block    mark  block    mark  block
 0     6.7%   zero   3.6%    zero   2.8%    zero   2.1%
 1    12.0%   4.1%  21.5%    2.9%  14.6%    0.3%   2.3%
 2     2.1%   0.8%   2.3%    0.2%   1.5%    0.0%   zero
 3     1.9%   0.4%   zero    0.0%   zero    0.0%   zero
 4     0.8%   zero   zero    zero   zero    zero   zero
 5     1.0%   zero   zero    zero   zero    zero   zero
 6     0.0%   zero   zero    zero   zero    zero   zero

My data agrees with the paper's numbers, suggesting rules 4-6 are
almost completely useless. Of course, this makes me wonder about the
thoroughness of the paper and how much we should really trust it...

>> I suspect the "botnet" plugin for SpamAssassin is far more
>> comprehensive, and I've already decided not to use it thanks to the
>> fact that greylisting's main function is combating botnets. The same
>> will probably go for S25R.
>
> S25R is very lightweight because it just uses only 7 regular
> expressions. It seems that it's reasonable solution at the
> first filter. We will use other comprehensive filters (that
> may be heavy rather than S25R) for mails that they are
> passed S25R (+ greylisting) check.

"only 7 regular expressions" though it should be obvious that rules
0, 4, 5, and 6 are poor quality ... which makes it "only 3" regexps,
with no data suggesting any degree of accuracy.

SpamAssassin's RDNS_NONE is identical to S25R rule 0. SpamAssassin's
RDNS_DYNAMIC has a huge overlap with the rest of S25R's rules.
Mass-check stats for these tests are available:

http://ruleqa.spamassassin.org/week/RDNS_NONE/detail
http://ruleqa.spamassassin.org/week/RDNS_DYNAMIC/detail

While RDNS_NONE and RDNS_DYNAMIC both default to scores of 0.1, I
rescore them as 0.9 and 0.4 respectively.

>> Implementing S25R within milter-greylist once the tarpitting
>> functionality is present should prove trivial, so I see no need to
>> implement a "targrey" clause.
>
> We doesn't need new codes for S25R because we can use S25R
> with the current milter-greylist as I show in the above.
:-) Then we agree on that issue; those two paragraphs say the same thing.
Message
Re: [milter-greylist] [RFC] implementing taRgrey
2009-07-07 by Adam Katz