Yahoo Groups archive

Milter-greylist

Index last updated: 2026-04-28 23:32 UTC

Another idea for rating system

2004-12-11 by egcrosser

Manu and guys,
Sorry, I feel that I am too chatty today.  I promise to stop after
this message ;-)

I've got another idea how to measure rating of the peer, this time
completely within existing infrastructure.  The idea is this:

if a greylisted submission was *not* retried by the peer within, say,
12 hours, the peer is likely no good!  Good guys always retry after 4xx.

One caveat: the peer might have resent the message via another MX.  So
reception of the same (sender, recipient) from one of our MXes should
grant amnesty to the original sender address.
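
A minimal sketch of this rule, assuming a hypothetical in-memory table of pending greylist entries. The names `Pending`, `RETRY_WINDOW` and `unretried_peers` are illustrative and not part of milter-greylist. A peer is flagged only if its deferred (sender, recipient) pair was never seen again, through any peer or MX, within the window.

```python
from dataclasses import dataclass

RETRY_WINDOW = 12 * 3600  # seconds; the "say, 12 hours" from the proposal


@dataclass(frozen=True)
class Pending:
    peer: str         # IP address that received the 4xx
    sender: str
    recipient: str
    deferred_at: int  # Unix time of the temporary failure


def unretried_peers(pending, deliveries, now):
    """Return peers whose deferred mail was never retried in time.

    `deliveries` is an iterable of (sender, recipient, time) tuples for
    messages that arrived again, through any peer or any of our MXes.
    """
    retried = {}
    for sender, recipient, when in deliveries:
        retried.setdefault((sender, recipient), []).append(when)

    bad = set()
    for p in pending:
        if now - p.deferred_at < RETRY_WINDOW:
            continue  # still inside the window, no verdict yet
        times = retried.get((p.sender, p.recipient), [])
        # Amnesty: the same (sender, recipient) showed up again in time,
        # possibly via another MX.
        if any(p.deferred_at < t <= p.deferred_at + RETRY_WINDOW for t in times):
            continue
        bad.add(p.peer)
    return bad
```

Keying the amnesty on the pair alone is the weakest possible match; anything stricter would need more data than the envelope provides.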

How does it sound?

Eugene

Re: [milter-greylist] Another idea for rating system

2004-12-11 by Ivan F. Martinez

On Sat, 11 Dec 2004 10:55:30 -0000
"egcrosser" <egcrosser@...> wrote:

E> 
E> 
E> Manu and guys,
E> Sorry, I feel that I am too chatty today.  I promice to stop after
E> this message ;-)
E> 
E> I've got another idea how to measure rating of the peer, this time
E> completely within existing infrastructure.  The idea is this:
E> 
E> if a greylisted submission was *not* retried by the peer within, say,
E> 12 hours, the peer is likely no good!  Good guys always retry after
E> 4xx.
E> 
E> One caveat: the peer might have resent the message via anotehr MX. 
E> So reception of the same (sender, recipient) from one of our MXes
E> should amnesty the original sender address.
E> 
E> How does it sound?

Sounds nice to me. The -L option with something like /25 or /24 can solve the
problem with multiple MXes at many small sites; the big ones will end up in
the whitelist after some time of using the greylist.
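
To illustrate what matching on a /24 means here, a small sketch follows. The helper `peer_key` is hypothetical and is not milter-greylist code. Two sending addresses in the same /24 map to the same key, so a retry from a neighbouring outbound relay would count as a retry from the original peer.

```python
import ipaddress


def peer_key(addr, prefix_len=24):
    """Collapse an IPv4 address to its enclosing network, e.g. a /24."""
    net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return str(net)
```

With a /25 the same two relays only match if they fall in the same half of the /24, which is the trade-off between the two masks.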


--

Re: [milter-greylist] Another idea for rating system

2004-12-11 by manu@netbsd.org

egcrosser <egcrosser@...> wrote:

> Sorry, I feel that I am too chatty today.  I promice to stop after
> this message ;-)

Don't refrain talking about your ideas, that would lower the chances of
a good thing getting out of the discussion. :) 
 
> I've got another idea how to measure rating of the peer, this time
> completely within existing infrastructure.  The idea is this:
> 
> if a greylisted submission was *not* retried by the peer within, say,
> 12 hours, the peer is likely no good!  Good guys always retry after 4xx.
> 
> One caveat: the peer might have resent the message via anotehr MX.  So
> reception of the same (sender, recipient) from one of our MXes should
> amnesty the original sender address.
> 
> How does it sound?

I see a big flaw: imagine you get a flood of messages from
<big@...>. You grant amnesty where you shouldn't have.

The problem is about recognizing the message. Could the message-Id be
used for that? Is there a guarantee of its presence? How unique is it?
Any SMTP guru in the room? 
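
One way to picture the question, as a hedged sketch rather than an answer: key the amnesty on the Message-Id when the header is present and fall back to the (sender, recipient) pair otherwise. The function `amnesty_key` is hypothetical. The header is only visible once the message content has been sent, and nothing here settles how unique it is in practice.

```python
from email import message_from_string


def amnesty_key(sender, recipient, raw_headers):
    """Build a key that identifies one message across retries."""
    msg = message_from_string(raw_headers)
    message_id = msg.get("Message-Id")  # header lookup is case-insensitive
    if message_id:
        return ("id", message_id.strip())
    # No Message-Id: fall back to the weaker envelope pair.
    return ("pair", sender.lower(), recipient.lower())
```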

-- 
Emmanuel Dreyfus
Il y a 10 sortes de personnes dans le monde: ceux qui comprennent 
le binaire et ceux qui ne le comprennent pas.
manu@...

Re: Another idea for rating system

2004-12-13 by Klas Heggemann

"egcrosser" <egcrosser@...> wrote:

> Manu and guys,
> Sorry, I feel that I am too chatty today.  I promice to stop after
> this message ;-)
>
> I've got another idea how to measure rating of the peer, this time
> completely within existing infrastructure.  The idea is this:
>
> if a greylisted submission was *not* retried by the peer within, say,
> 12 hours, the peer is likely no good!  Good guys always retry after 
> 4xx.
>

Unfortunately, good guys sometimes get hit by bad guys.
When checking the logs, I find more than one site that
resends a perfectly legitimate message after more than 2 days.
My guess is that they have been struck by a very long mail queue,
due to spam with bad addresses and bad return addresses,
which their MTA cannot handle well.

(This has happened to us, with sendmail never going through the whole
queue because the load gets too high :-( . I had to make arrangements for the
queue to be processed regularly, despite the load. Nowadays the queue
is manageable thanks to the greylist milter.)



/klas

Re: Another idea for rating system

2004-12-14 by egcrosser

OK, I'm back from hacking an antivirus module for zmscanner, and full
of sh^H^Hthoughts.  Maybe too global for this project, maybe not.

First, I think that I should introduce myself better.  In my day job, I
am a sysadmin team leader in general, and a postmaster in particular,
for a big ISP in my country.  We do not run sendmail.  And we do have
a home-grown spam-suppression tool which is very similar to
greylisting.  Aside from that, I am running sendmail on several
small/private sites, with milter-greylist on one of them.

Now, when I think about an antispam solution, I am equally concerned
about my small and big systems.  Being a big ISP means being a target
of choice for spammers.  It means that if I leave a *potential* hole
in the system, bad guys find it and start using it within a couple of
weeks.  A good antispam design should not allow that.

For the greylisting approach, the "worst case" scenario is an army of zombies
that can retry after 4xx.  And this will happen very soon after any
big ISP deploys greylisting.  You can trust me on that :-)

That's why I think that greylisting per se can only be a temporary
remedy.  But combined with other means, it's a different story.
Greylisting's strong point is that its protection starts instantly as
an attack begins.  Its weak point is that it quickly gives up as the
attack continues.

On the other hand, reputation systems (including blacklists and
whitelists as their extreme form) gradually grow stronger with time,
but cannot catch a sudden attack.  So it seems worth trying to
combine the two approaches.  Greylisting will slow down a new attack
and feed data to a reputation system.  Then, in turn, it can consult
the reputation system and set the "level of throttling" according to the
rating of the peer, up to complete blocking with a 5xx code.
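
A rough sketch of such a throttle, under stated assumptions: the rating is a number in [-1.0, 1.0], and the thresholds, delays and the function name `throttle` are placeholders invented for illustration.

```python
def throttle(rating):
    """Map a peer rating in [-1.0, 1.0] to an SMTP decision.

    Returns (code, delay_seconds). All thresholds are placeholders.
    """
    if rating <= -0.8:
        return (550, None)   # very bad reputation: reject outright
    if rating >= 0.8:
        return (250, 0)      # very good reputation: effectively whitelisted
    # In between, scale the greylisting delay from 5 minutes (good)
    # up to 12 hours (bad).
    min_delay, max_delay = 300, 12 * 3600
    badness = (0.8 - rating) / 1.6  # 0.0 at rating 0.8, 1.0 at rating -0.8
    return (451, int(min_delay + badness * (max_delay - min_delay)))
```

A linear ramp is only one choice; a stepped table would serve equally well and be easier to tune by hand.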

Now to the reputation system.  It will probably be best if it collects
rating data from a number of different sources, with different
weights.  Things that come to mind are:

- intensity of the flow of submissions
- percentage of submissions to non-existent users
- intensity of submissions to honeypot addresses
- for SPF-validated submissions, age of the domain
- for back-resolvable peer addresses, fuzzy check against typical
dialup/dsl/cable patterns, and against typical valid mail server
patterns (we found this one particularly useful here)
- for peers that are not back-resolvable, the fact that they are not
back-resolvable
- intensity of DCC positive submissions
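
The list above could be combined along these lines. This is a sketch only: the signal names, the weights and the function `rating` are all made up for illustration, and each signal is assumed to be normalized to [0, 1] beforehand.

```python
# Hypothetical weights; negative values lower the rating.
WEIGHTS = {
    "flow_intensity": -0.5,
    "unknown_user_ratio": -2.0,
    "honeypot_hits": -3.0,
    "spf_domain_age": 1.0,
    "rdns_looks_dialup": -2.0,
    "rdns_looks_mailserver": 1.0,
    "no_rdns": -1.0,
    "dcc_positive": -1.5,
}


def rating(signals):
    """Combine normalized signals (each in [0, 1]) into one score.

    Missing signals count as 0; unknown signal names are rejected so a
    typo does not silently drop evidence.
    """
    unknown = set(signals) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown signals: {sorted(unknown)}")
    return sum(WEIGHTS[name] * value for name, value in signals.items())
```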

(my previous idea that non-retried greylisted peer should grow
negative rating is not very useful: mail from it is blocked anyway)

Now, a reputation system gets better the more MTAs it serves.  I
don't believe in worldwide reputation systems, but corporation-wide ones
seem realistic.  So it should have some simple network interface, with
strong enough access control to prevent poisoning.  DNSBL style?
Note that it should be able to accept updates from different sources
and give them different weights.

OK, enough for today.
Now you tell me how stupid/wishful my thinking is...
:-)
Eugene

Re: [milter-greylist] Re: Another idea for rating system

2004-12-14 by Gary Aitken

Hi Eugene,

> On the other hand, reputation systems (including blacklists and
> whitelists as their extreme form) gradually grow stronger with time,
> but cannot catch a sudden attack.  So, it seems worth to try to
> combine the two approaches.  Greylisting will slow down a new attack,
> and feed data to a reputation system.  Then it turn, it can consult
> the reputation system and set "level of throttling" according to the
> rating of the peer.  Up to complete blocking with 5xx code.

By throttling you simply mean extend the greylist delay, correct?
It's not clear to me that this will have any effect in the end,
until one gets to reject (5xx).

If we assume a compliant sender, then it will look and behave
completely valid and eventually we will be forced to accept the
mail unless it makes it to the blacklist.  Although the resources
necessary for a retry over a long period will obviously be larger.

Presumably if one determines complete blocking with 5xx, then one
also submits to an rbl?  Or must one also maintain a local rbl
with addressee specific info?

> Now to the reputation system.  It probably will be best if it collects
> rating data from a number of different sources, with different
> weights.  Things that come in mind are:

These need some means of seeding them with info if the server
is known to emit good mail, even though no mail from those
sources has yet been received.  In some cases, it may be a
long time before good mail arrives from one of these sources,
and they may be sources of spam as well.

> - intensity of the flow of submissions
> - percentage of submissions to non-existant users
> - intensity of submissions to honeypot addresses

I really like this one.  The system could be given
the location of a local web page and a template for rewriting it,
and rules for generating email addresses.  It could periodically
rewrite the page and tune accordingly.  The advantage over other
sources is that it is guaranteed spam.  Cross-referenced to
something like DCC it could be very effective in rejecting spam
to legitimate users.  The disadvantage is the obvious time
delay before the honeypot starts receiving mail.

> - for SPF-validated submission, age of the domain
> - for back-resolvable peer address, fuzzy check against typical
> dualup/dsl/cable patterns, and against typical valid mail server
 > patterns (we found this one particularily useful here)

Isn't this what DCC does already on a per-message basis?

If not, I don't see how this is particularly useful.  How do you
differentiate valid mail from that server?  It seems to me there
is a very high probability for valid mail being rejected.  Or were
you just lucky and had no valid mail from that server?

Again, a correlation with DCC info would significantly
improve the validity of the results.

> - not not back-resolvable, the fact that they are not back-resolvable
> - intensity of DCC positive submissions

> (my previous idea that non-retried greylisted peer should grow
> negative rating is not very useful: mail from it is blocked anyway)

I'm not sure I understand this.  If it is not retried, it is
effectively blocked on the particular receiving site, but shouldn't
this information be added to the rating in case they upgrade and
begin to retry?

> Now, a reputation system is the better the more MTAs it serves.  I
> don't beleive in worldwide reputation systems but corporation-wide
> seem realistic.  So it should have some simple network interface, with
> strong enough access control to disallow poisoning.  DNSBL style?
> Note that it should be able to accept updates from different sources
> and  give them different weights.

Certainly a worthwhile effort, and something to keep in mind for the
design.  I would initially keep it as simple as possible to allow
an implementation that works locally to be easily built.  Abstract
out the communications layer and you should be able to generalize it
to multiple MTAs without impacting anything else, I would hope.

> OK, enough for today.
> Now you tell me how stupid/whishfull thinking I am...

I think it has potential.
When do I get to try it? :-)

Gary

Re: Another idea for rating system

2004-12-14 by egcrosser

--- In milter-greylist@yahoogroups.com, Gary Aitken <greylist@d...> wrote:

> By throttling you simply mean extend the greylist delay, correct?
> It's not clear to me that this will have any effect in the end,
> until one gets to reject (5xx).

The general idea is to accept spam and mail from anyone (except those with
a very bad reputation), but make the job harder (bigger delays, higher
chance of rejection) for those who send more spam than ham.  If a site
sends only spam, it will end up with a very bad reputation and
will eventually be blocked.  If it sends good mail and almost no
spam, its reputation will grow good, which is equivalent to
whitelisting.  The owners of sites that send both spam and ham will
get complaints from their users, and have an overcrowded outgoing queue,
and thus get an incentive to put more pressure on their spammers.

> Presumably if one determines complete blocking with 5xx, then one
> also submits to an rbl?  Or must one also maintain a local rbl
> with addressee specific info?

Essentially, a reputation system is a (better) replacement for an RBL.
A reputation that is lower than a defined threshold means blocking.

> > Now to the reputation system.  It probably will be best if it collects
> > rating data from a number of different sources, with different
> > weights.  Things that come in mind are:
> 
> These need some means of seeding them with info if the server
> is known to emit good mail, even though no mail from those
> sources has yet been received.

Not necessarily.  The initial "default" reputation may be good enough to
impose only a small greylisting delay.  It may also be seeded by
reverse DNS and/or whois inspired guesstimates.

On our production system, we always instantly accept the first submission
from an unknown source.  They may have a harder time pushing the next
one, depending on a number of conditions.  I am not completely
satisfied with the performance (and that's why I am here :), but it
keeps junk at a more or less tolerable level.

> > - for back-resolvable peer address, fuzzy check against typical
> > dualup/dsl/cable patterns, and against typical valid mail server
>  > patterns (we found this one particularily useful here)
> 
> Isn't this what DCC does already on a per-message basis?

By DCC I mean a generic thing that recognizes repeating patterns, not
the particular implementation from Rhyolite.

> If not, I don't see how this is particularly useful.  How do you
> differentiate valid mail from that server?  It seems to me there
> is a very high probability for valid mail being rejected.  Or were
> you just lucky and had no valid mail from that server?

The chance of getting valid mail from p123-nas45.dsl.isp.net is close
to zero.  If someone tries to install a real SMTP server on the end of
a DSL connection he is just unlucky.  If he really must, he can use
SPF, for some time suffer from big delays, but eventually get his
reputation improved and live happily ever after.
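
A sketch of what such a fuzzy check on the reverse-DNS name might look like. The two regular expressions and the function `classify_rdns` are illustrative guesses, not the patterns used on the production system described here.

```python
import re

# Illustrative patterns only; a real deployment would need a tuned list.
DYNAMIC = re.compile(
    r"(^|[.-])(dsl|adsl|cable|dial(up)?|ppp|dyn(amic)?|pool)([.-]|\d)", re.I
)
MAILISH = re.compile(r"^(mail|smtp|mx|mta|relay|out(bound)?)[0-9-]*\.", re.I)


def classify_rdns(hostname):
    """Return 'mailserver', 'dynamic', or 'unknown' for a reverse-DNS name."""
    if MAILISH.search(hostname):
        return "mailserver"
    if DYNAMIC.search(hostname):
        return "dynamic"
    return "unknown"
```

The result would feed the rating as one weighted signal rather than decide acceptance on its own, which leaves room for the SPF route mentioned above.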

> Again, a correlation with DCC info would significantly
> improve the validity of the results.

That's true.  Although it probably should be DCC's responsibility to
distinguish generic reports from reports from honeypots and give the
latter more weight (again, I mean hypothetical DCC here).

> > (my previous idea that non-retried greylisted peer should grow
> > negative rating is not very useful: mail from it is blocked anyway)
> 
> I'm not sure I understand this.  If it is not retried, it is
> effectively blocked on the particular receiving site, but shouldn't
> this information be added to the rating in case they upgrade and
> begin to retry?

Well, maybe.

Eugene

Re: [milter-greylist] Re: Another idea for rating system

2004-12-14 by manu@netbsd.org

egcrosser <egcrosser@...> wrote:

> For greylisting approach, "worst case" scenario is an army of zombies
> that can retry after 4xx.  And this will happen very soon after any
> big ISP deploys greylisting.  You can trust me on that :-)

Yes, I believe that.

Hum... I'll work on introducing stupid bugs to prevent large-scale
deployment of milter-greylist, just in case :)
 
> - intensity of the flow of submissions
> - percentage of submissions to non-existant users
> - intensity of submissions to honeypot addresses
> - for SPF-validated submission, age of the domain
> - for back-resolvable peer address, fuzzy check against typical
> dualup/dsl/cable patterns, and against typical valid mail server
> patterns (we found this one particularily useful here)
> - not not back-resolvable, the fact that they are not back-resolvable
> - intensity of DCC positive submissions

You omit honeypots. Is it on purpose? A good set of spam trap addresses
is a very simple and efficient way of detecting zombies. 
 
> Now, a reputation system is the better the more MTAs it serves.  I
> don't beleive in worldwide reputation systems but corporation-wide
> seem realistic.  So it should have some simple network interface, with
> strong enough access control to disallow poisoning.  DNSBL style? 
> Note that it should be able to accept updates from different sources
> and  give them different weights.

Did you have a look at http://ftp.espci.fr/pub/dst ?

It's a real-time spam trap system. The idea is to set up honeypot
addresses and to publish them (on a web page, in an HTML
comment, for instance: <!-- spamtrap20041214@... -->). You can
also post in the news and have it in your headers.

Spambots will collect the addresses and spammers will send spam to the
addresses. Here you have a very important property: you know that any
mail sent to such an address is spam.  
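
As an illustration of dated trap addresses, here is a small sketch. The domain `example.org`, the `spamtrap` prefix convention and all three function names are assumptions made for the example; DST itself may handle this differently.

```python
from datetime import date


def trap_address(day, domain="example.org", prefix="spamtrap"):
    """Build a dated spam trap address from a prefix and a date."""
    return f"{prefix}{day:%Y%m%d}@{domain}"


def html_bait(day, domain="example.org"):
    """Hide the trap address in an HTML comment for harvesters to find."""
    return f"<!-- {trap_address(day, domain)} -->"


def is_trap(recipient, valid_days, domain="example.org"):
    """True if the recipient is one of the currently published traps."""
    return recipient.lower() in {trap_address(d, domain) for d in valid_days}
```

Embedding the date in the address also records when it was published, which helps when old traps are retired.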

But knowing someone sent some spam is not enough. Spammers can
distribute the attacks across their zombie army, and use a different IP
address for each spam they send to your domain. So we need a way of
quickly distributing the information that a message was dropped in a
spam trap. If the spam traps are widely distributed, then you'll be able
to know about a zombie as soon as it sends a single mail to any machine
participating in the spam trap network.

DST (Distributed Spam Traps) tries to achieve that. It works with two
pieces:

dstc is meant to be installed on a mailbox (you use a .forward and pipe
mail through it, for instance). It reads the headers, looking for the
sending machine's IP. It then sends an RSA-signed report to the nearest dstd.

dstd is a daemon. It exchanges spam reports with neighbouring dstd instances
in exactly the same way as NNTP works: the report is a small RFC822-like
message, with a Message-Id and a Path header to avoid loops.
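
The loop avoidance can be sketched as follows. This is not dstd code: the `Spam-Source` field name, the `!`-separated Path layout borrowed from Usenet, and the `Node` class are assumptions for illustration.

```python
from email.message import Message


def make_report(spam_source_ip, message_id, path):
    """Build a small RFC822-like spam report (field names are assumed)."""
    report = Message()
    report["Message-Id"] = message_id
    report["Path"] = path
    report["Spam-Source"] = spam_source_ip
    return report


class Node:
    """A hypothetical dstd-like node that floods reports to its peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.seen = {}  # Message-Id -> reported IP

    def receive(self, report):
        msg_id = report["Message-Id"]
        if msg_id in self.seen:
            return  # already have it: drop duplicates
        self.seen[msg_id] = report["Spam-Source"]
        path = report["Path"].split("!")
        for peer in self.peers:
            if peer.name in path:
                continue  # the report already went through this peer
            forwarded = make_report(report["Spam-Source"], msg_id,
                                    "!".join([self.name] + path))
            peer.receive(forwarded)
```

The Path check saves a transfer to peers that already handled the report, while the Message-Id check stops duplicates that arrive over a different route.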

dstd can log reports, it can verify the RSA signature, and it can feed a
DNSRBL through DNS updates. 

The key points are:
- you get a real-time notification to many machines for one spam
- that works with any MTA (I assume any MTA supports DNSRBL)
- there is no single point of failure: spammers can DDoS a site to death
and DST will still work

I halted development a long time ago because it was not useful at the
moment, but we could consider working on it now. There are a few issues
that should be worked on:

1- Large deployment means key management problems. The public key
should probably be sent with the report and stored in a pending keys
directory, so that the administrator can decide to add the key of a new
site. That seems the simplest to me.

2- What should be done with the information? One counter-attack from
spammers could be to identify a spam trap and use zombies to send
messages there through many ISP mail servers. With the ISP mail servers
blacklisted by the DST, it becomes completely poisoned, and therefore
useless. To avoid that, we just have to change spam trap
addresses often. Your ideas about varying greylisting time could help
solve that problem too.

-- 
Emmanuel Dreyfus
Il y a 10 sortes de personnes dans le monde: ceux qui comprennent 
le binaire et ceux qui ne le comprennent pas.
manu@...

Re: [milter-greylist] Re: Another idea for rating system

2004-12-14 by manu@netbsd.org

Emmanuel Dreyfus <manu@...> wrote:

> > - intensity of submissions to honeypot addresses
> You omit honeypots. Is it on purpose?

Hum... I swear I read the beginning of the line. I just skipped
the last words! :)

-- 
Emmanuel Dreyfus
Il y a 10 sortes de personnes dans le monde: ceux qui comprennent 
le binaire et ceux qui ne le comprennent pas.
manu@...

Re: [milter-greylist] Another idea for rating system

2004-12-16 by Matthias Scheler

On Sat, Dec 11, 2004 at 10:55:30AM -0000, egcrosser wrote:
> if a greylisted submission was *not* retried by the peer within, say,
> 12 hours, the peer is likely no good!

Or it runs Exim and therefore has trouble handling the load.

	Kind regards

-- 
Matthias Scheler                                  http://scheler.de/~matthias/

Re: Another idea for rating system

2004-12-16 by egcrosser

--- In milter-greylist@yahoogroups.com, manu@n... wrote:

> > Now, a reputation system is the better the more MTAs it serves.  
> > I don't beleive in worldwide reputation systems but corporation-wide
> > seem realistic.  So it should have some simple network interface, with
> > strong enough access control to disallow poisoning.  DNSBL style? 
> > Note that it should be able to accept updates from different sources
> > and  give them different weights.
> 
> Did you have a look at http://ftp.espci.fr/pub/dst ?

I did download the thingie and read the docs.  I confess that I did
not try to build it though :-)  Anyway, when I realized that you
already have two important elements of the model that I was mulling I
decided to climb up the soapbox.

Anyway, your system presumes "binary" rating: if a honeypot address is
hit, the sender is blacklisted for good.  I think that it is too
dangerous: valid servers *may* eventually send some spam, including
that to honeypots.  For a reputation system with gradations this is
not a problem because good behavior keeps server reputation high, and
a bit of spam just slightly offsets it.
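
A toy sketch of such gradations, with made-up step sizes and a hypothetical `update` function: good mail nudges the reputation up, and a trap hit pulls it down without wiping out an established record.

```python
def update(reputation, event, step=0.05):
    """Nudge a reputation in [-1.0, 1.0] after one observed event.

    `event` is "ham", "spam" or "trap_hit"; the step sizes are placeholders.
    """
    delta = {"ham": step, "spam": -2 * step, "trap_hit": -4 * step}[event]
    return max(-1.0, min(1.0, reputation + delta))
```

A server with a long history of ham stays well above zero after one trap hit, while a source that only ever hits traps sinks to the floor within a handful of events.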

If we want gradations of black and white, the DNS query/update mechanism
that you currently use in DST may not be adequate...

BTW, in answer to Matthias Scheler's concern about bounces being taken
for spam: nowadays many people think that bounces are almost as bad as
spam, and servers that generate bounces instead of rejecting
submissions at the SMTP stage deserve the same treatment as spam senders.

> 1- Large deployement means key management problems. The public key
> should probably be sent with the report and stored in a pending keys
> directory, so that the administrator can decide to add the key of a new
> site. That seems the simplier to me.

Large (worldwide) deployment has more problems than just key
management.  First of all, it's about trust.  Even if you trust the
people who run the servers, you can hardly hope that all of them are
adequately intruder-resistant.  And if an enemy infiltrates even a
single node, he may wreak havoc on the whole system.

Another interesting point is that for a large server, cooperation may
not be that beneficial.  That's because attacks targeted at just one
specific big server are not uncommon.

I think that a protection system design should presume that it can be
deployed in both widely distributed and relatively "local" environments.

> 2- What should be done with the information. One counter attack from
> spammers could be to identify a spam trap and use zombies to send
> messages there through many ISP mail server. The ISP mail servers being
> blacklisted by the DST, it becomes completely poisoned, and therefore
> useless. In order to avoid that, we just have to change spamtraps
> addresses often. Your ideas about varying greylisting time could help
> solving that problem too.

Yes, I think that we should try to detect both "good" and "bad"
behavior, and adjust reputation accordingly.  This would minimize
impact of accidental false positives and deliberate poisoning attacks
alike.

Eugene

Re: [milter-greylist] Re: Another idea for rating system

2004-12-16 by manu@netbsd.org

egcrosser <egcrosser@...> wrote:

[DST]
> Anyway, your system presumes "binary" rating: if a honeypot address is
> hit, the sender is blacklisted for good.  I think that it is too
> dangerous: valid servers *may* eventually send some spam, including
> that to honeypots.  For a reputation system with gradations this is
> not a problem because good behavior keeps server reputation high, and
> a bit of spam just slightly offsets it.

Sure. One of the problems is spam sent through an ISP mail server.
Indeed there is the need for something smarter. 

I am thinking about adding more information to the DNSRBL, such as a spamming
score for a given machine, but in order to do something useful, we need
to collect spam patterns from the real world. That's why I call for an
experimental deployment of DST.

Once we have 100 reports per hour (with only one spam trap, I already
get a lot of spam each day), it will be easier to invent a scoring
system.
 
> If we want gradations of black and white, DNS query/update machanism
> that you currently use in DST may not be adequate...

You can store a lot of information in the DNS. You can have a score in a
TXT record.
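
For illustration, a sketch of how a score might travel in a TXT record. The zone name `dst.example.org` and the `score=N` field syntax are assumptions, and the actual DNS lookup is left out; only the usual reversed-octet query name and the parsing are shown.

```python
import ipaddress
import re


def rbl_query_name(ip, zone="dst.example.org"):
    """Build the DNSBL-style name to query for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def parse_score(txt_record):
    """Extract an integer score from a TXT string such as 'score=42'.

    Returns None when the record does not carry a well-formed score.
    """
    for field in txt_record.split():
        key, _, value = field.partition("=")
        if key == "score" and re.fullmatch(r"-?[0-9]+", value):
            return int(value)
    return None
```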
 
> BTW, answering to Matthias Scheler's concern about bounces being taken
> for spam: nowdays many people think that bounces are almost as bad as
> spam, and servers that generate bounces instead of rejecting
> submissions at SMTP stage deserve that same treatment as spam senders.

I agree here, rejection should occur at SMTP level whenever it is
possible. The Internet will move toward that, but I won't 
 
> > 1- Large deployement means key management problems. The public key
> > should probably be sent with the report and stored in a pending keys
> > directory, so that the administrator can decide to add the key of a new
> > site. That seems the simplier to me.
> 
> Large (worlwide) deployment has more problems than just key
> management.  First of all, it's about trust.  Even if you trust the
> people who run servers, you harly can hope that all of them are
> adequatly intruder-resistant.  And if an enemy infiltrates to even a
> single node, he may wreak havoc to the whole system.

Give scores to reporters, with higher scores for more trusted ones.
Intrusions are not a real issue, however. The spammer can use an army of
compromised hosts to target a spamtrap through ISP mail servers,
therefore poisoning the DST. We need a mechanism to address that. Maybe a
spamtrap should be closed after enough hits, or something like this.
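
Both ideas can be sketched together, purely as an assumption-laden illustration: the class `TrapLedger`, the trust weights and the `max_hits` limit are invented here and are not part of DST.

```python
from collections import defaultdict


class TrapLedger:
    """Weigh reports by reporter trust and retire traps that get too hot."""

    def __init__(self, trust, max_hits=100):
        self.trust = trust               # reporter name -> weight in [0, 1]
        self.max_hits = max_hits         # hits before a trap is closed
        self.hits = defaultdict(int)     # trap address -> hit count
        self.score = defaultdict(float)  # reported IP -> weighted score

    def report(self, reporter, trap, source_ip):
        """Record one report; return False if it was ignored."""
        if self.hits[trap] >= self.max_hits:
            return False  # trap is closed: likely known to spammers
        self.hits[trap] += 1
        # Unknown reporters get zero weight until an administrator adds them.
        self.score[source_ip] += self.trust.get(reporter, 0.0)
        return True
```

Closing a trap after a fixed number of hits limits how much damage a deliberate poisoning run through ISP mail servers can do with any single address.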

-- 
Emmanuel Dreyfus
Il y a 10 sortes de personnes dans le monde: ceux qui comprennent 
le binaire et ceux qui ne le comprennent pas.
manu@...
