Synth-DIY Yahoo! Groups Archives

Thread

Questions about greylist.db file...

2005-10-26 by Ogogon !!!

Questions about greylist.db file:

1. Why does greylist.db use a plain text list instead of a dbm hash?
Mine holds more than two hundred thousand records. At that size, a dbm
structure would give a noticeable performance improvement.

2. Why is such a convenient parameter as the Message-ID not used when
incoming messages are recorded in greylist.db?

Ogogon.

Re: [milter-greylist] Questions about greylist.db file...

2005-10-26 by Matt Kettler

Ogogon !!! wrote:
> Questions about greylist.db file:
> 
> 1. Why does greylist.db use a plain text list instead of a dbm hash?
> Mine holds more than two hundred thousand records. At that size, a dbm
> structure would give a noticeable performance improvement.

I can't answer this one.

> 
> 2. Why is such a convenient parameter as the Message-ID not used when
> incoming messages are recorded in greylist.db?

At the time of greylisting, the Message-ID is not known.

You'll only know the message-id after the end of the SMTP DATA phase when the
message has already been transferred.

The only data available at the time of greylisting is:

Remote IP
Remote RDNS (if any)
Remote HELO/EHLO string
envelope from (ie: the MAIL FROM command)
envelope recipient (ie: the RCPT TO command).

That's all the information that has been provided by the remote server at the
time milter-greylist decides to greylist or not. You don't know the contents of
any message headers, or the body.
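
As an illustration of the point above, here is a minimal sketch of a greylist
decision that uses only values known at RCPT TO time. The class name, the
300-second delay, and the choice of (IP, envelope from, envelope recipient) as
the key are assumptions made for the example; this is not milter-greylist's
actual code.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Greylist:
    """Toy greylist keyed on data available before the DATA phase."""
    delay: float = 300.0  # assumed minimum wait (seconds) before a retry passes
    first_seen: dict = field(default_factory=dict)

    def check(self, ip, mail_from, rcpt_to, now=None):
        """Return 'tempfail' or 'accept' for one RCPT TO command."""
        now = time.time() if now is None else now
        # Lowercasing whole addresses is a simplification for this sketch.
        key = (ip, mail_from.lower(), rcpt_to.lower())
        # Remember when this triplet was first seen; keep the oldest time.
        seen = self.first_seen.setdefault(key, now)
        return "accept" if now - seen >= self.delay else "tempfail"
```

No header such as Message-ID appears in the key, because the sender has not
transmitted any headers yet when the decision is made.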

Re: [milter-greylist] Questions about greylist.db file...

2005-10-27 by Ogogon !!!

Matt Kettler wrote:
> At the time of greylisting, the Message-ID is not known.
> You'll only know the message-id after the end of the SMTP DATA phase when the
> message has already been transferred.
>
> The only data available at the time of greylisting is:
>
> Remote IP
> Remote RDNS (if any)
> Remote HELO/EHLO string
> envelope from (ie: the MAIL FROM command)
> envelope recipient (ie: the RCPT TO command).
>
> That's all the information that has been provided by the remote server at the
> time milter-greylist decides to greylist or not. You don't know the contents of
> any message headers, or the body.
>   
I agree with that completely, but RFC-821 does not forbid me from reading
the message up to the Message-ID, then returning a "4.7.1" code and
closing the connection after discarding the buffer.
That would not be a rejection of the message. It would only tie up the
channel, and my peer would then retry the transfer. However, the volume
of inbound traffic increases and becomes harder to control, which is an
unpleasant price to pay.
Sometimes, it seems to me, there are fairly ambiguous situations that
are easily resolved by using the Message-ID. One example is a large
network hidden behind NAT that contains both a spammer and a legitimate
mail relay. (There have been such cases in Moscow.)

Ogogon.

Re: [milter-greylist] Questions about greylist.db file...

2005-10-27 by Emmanuel Dreyfus

On Thu, Oct 27, 2005 at 02:14:19AM +0400, Ogogon !!! wrote:
> 1. Why does greylist.db use a plain text list instead of a dbm hash?
> Mine holds more than two hundred thousand records. At that size, a dbm
> structure would give a noticeable performance improvement.

Some time ago, I tried converting milter-greylist to use a DB style file,
but it was not possible to have a strong guarantee that the file would not
be corrupted if the milter died during operation. The current scheme ensures
that the text dump is never corrupted.
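
One common way to get that kind of guarantee for a text dump is to write a
temporary file and rename it over the old one. The sketch below shows that
general technique only; the thread does not say that milter-greylist works
exactly this way, and the function name and temporary-file prefix are
hypothetical.

```python
import os
import tempfile


def dump_atomically(path, lines):
    """Write lines to path so readers see the old or the new file, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".greylist-")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # A failed dump leaves the previous file untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before the rename, the previous dump is still intact,
which is harder to promise for a database file that is updated in place.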

Moreover, does your machine throttle because of the file size? Mine has 
92268 lines in greylist.db, load is 0.30.

Bigger setups would benefit from alternative storage backends. There have
been discussions about this last week on the list, but nobody seems to 
be in such a need for the feature, since nobody volunteered to work on it :-) 

> 2. Why is such a convenient parameter as the Message-ID not used when
> incoming messages are recorded in greylist.db?

Good question: why not use Message-ID? Does anyone have an idea?
-- 
Emmanuel Dreyfus
manu@...

Re: [milter-greylist] Questions about greylist.db file...

2005-10-27 by Emmanuel Dreyfus

On Wed, Oct 26, 2005 at 07:14:12PM -0400, Matt Kettler wrote:
> > 2. Why is such a convenient parameter as the Message-ID not used when
> > incoming messages are recorded in greylist.db?
> 
> At the time of greylisting, the Message-ID is not known.
> 
> You'll only know the message-id after the end of the SMTP DATA phase when the
> message has already been transferred.

And the missing bit: you can wait until the SMTP DATA stage to greylist, but
if you do that you cannot have different settings for different recipients:
the mail will be greylisted (or whitelisted) for every recipient.
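
A small sketch of why that is: each RCPT TO command gets its own reply, while
the end of DATA gets a single reply covering all accepted recipients. The
function names, the per-recipient settings dictionary, and the "defer if any
recipient wants greylisting" policy are assumptions for illustration, and
only a first delivery attempt is modelled.

```python
def reply_at_rcpt(recipients, greylist_enabled):
    """One SMTP reply per RCPT TO: each recipient can be treated differently."""
    return {
        rcpt: "451" if greylist_enabled.get(rcpt, True) else "250"
        for rcpt in recipients
    }


def reply_at_data(recipients, greylist_enabled):
    """One SMTP reply for the whole message: all recipients share the outcome."""
    # Assumed policy: defer everyone if any recipient has greylisting enabled.
    wants_greylist = any(greylist_enabled.get(rcpt, True) for rcpt in recipients)
    return "451" if wants_greylist else "250"
```

With one recipient opted in and another opted out, the RCPT-stage function
returns two different replies, while the DATA-stage function has to pick one.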
-- 
Emmanuel Dreyfus
manu@...

Re: [milter-greylist] Questions about greylist.db file...

2005-10-27 by Matt Kettler

Ogogon !!! wrote:
> Matt Kettler wrote:
> 
>>At the time of greylisting, the Message-ID is not known.
>>You'll only know the message-id after the end of the SMTP DATA phase when the
>>message has already been transferred.
>>
>>The only data available at the time of greylisting is:
>>
>>Remote IP
>>Remote RDNS (if any)
>>Remote HELO/EHLO string
>>envelope from (ie: the MAIL FROM command)
>>envelope recipient (ie: the RCPT TO command).
>>
>>That's all the information that has been provided by the remote server at the
>>time milter-greylist decides to greylist or not. You don't know the contents of
>>any message headers, or the body.
>>  
> 
> I agree with that completely, but RFC-821 does not forbid me from reading
> the message up to the Message-ID, then returning a "4.7.1" code and
> closing the connection after discarding the buffer.

That is true, you CAN do that. But that's not where milter-greylist ties in.
Milter-greylist has made the deliberate choice of greylisting before the SMTP
DATA phase in order to save bandwidth.

If you greylist after the DATA phase, a legitimate sender that retries will
waste your bandwidth and theirs by having to send the whole message several
times before you accept it.

And to what gain? Very little. Now you know the message-id and can differentiate
between a retry and a resend.

In the process you've managed to piss off a lot of other network admins, many of
whom may choose to null-route mail sent to your network because you're
intentionally wasting substantial quantities of network bandwidth.
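
For reference, the retry-versus-resend distinction mentioned above could be
sketched as follows. This assumes a post-DATA greylist that stores the
Message-IDs seen per triplet; the function and variable names are
hypothetical, and milter-greylist is not described in this thread as doing
this.

```python
def classify(seen, triplet, message_id):
    """Label a delivery attempt using Message-IDs recorded per triplet."""
    previous = seen.setdefault(triplet, set())
    if not previous:
        kind = "first attempt"
    elif message_id in previous:
        kind = "retry"   # same message sent again after a temporary failure
    else:
        kind = "resend"  # a different message from the same sender and host
    previous.add(message_id)
    return kind
```

The cost is that every attempt, including the first, must be received far
enough to read the headers before it can be deferred.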

Re: [milter-greylist] Questions about greylist.db file...

2005-10-27 by Emmanuel Dreyfus

On Thu, Oct 27, 2005 at 10:50:23AM -0400, Matt Kettler wrote:
> That is true, you CAN do that. But that's not where milter-greylist ties in.
> Milter-greylist has made the deliberate choice of greylisting before the SMTP
> DATA phase in order to save bandwidth.

IMO it was more to have greylisting activated on a per recipient basis.

-- 
Emmanuel Dreyfus
manu@...

Re: [milter-greylist] Questions about greylist.db file...

2005-10-27 by Matt Kettler

Emmanuel Dreyfus wrote:
> On Thu, Oct 27, 2005 at 10:50:23AM -0400, Matt Kettler wrote:
> 
>>That is true, you CAN do that. But that's not where milter-greylist ties in.
>>Milter-greylist has made the deliberate choice of greylisting before the SMTP
>>DATA phase in order to save bandwidth.
> 
> 
> IMO it was more to have greylisting activated on a per recipient basis.
> 

Fair enough. I hadn't even considered that angle when I wrote my response.

That said, it would be a considerable waste of bandwidth to wait until after the
DATA phase, given that the accuracy boost you would see from it would be very
small percentage-wise.

In my experience, 94.4% of spammers never retry or resend at all. So your
absolute best-case gain would be to pick up the last 5.6%. However, I doubt
you'd get more than 1% additional elimination this way. Most of that 5.6%
appears to be abused relays that really do queue, and no greylist will ever
eliminate those sources, just delay them so you can get better RBL hits.

The network bandwidth overhead of greylisting pre-data phase is pretty small.
Maybe 200 bytes, at most? However, the post-data phase overhead could be large.
Say an email with a large .zip attachment: 100MB?

So, even if you could still do per-recipient filtering after the DATA phase, it
would still be a bad idea.
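
To put the figures above side by side, here is a rough calculation. It takes
the 94.4%, 200-byte, and 100MB numbers from this message as given, and treats
1MB as 10**6 bytes, which is an assumption.

```python
never_retry = 0.944              # share of spammers that never retry or resend
best_case_gain = 1 - never_retry  # the most a Message-ID check could add

pre_data_bytes = 200             # rough cost of one deferral before DATA
post_data_bytes = 100 * 10**6    # one deferral of a 100MB message after DATA

# How many times more traffic a single post-DATA deferral costs.
cost_ratio = post_data_bytes / pre_data_bytes

print(f"best-case extra catch: {best_case_gain:.1%}")
print(f"post-DATA deferral costs {cost_ratio:,.0f} times more bytes")
```

For that worst-case message, one deferred attempt after DATA costs 500,000
times as much traffic as a pre-DATA deferral, in exchange for at most a 5.6%
improvement.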