Yahoo Groups archive

Milter-greylist

Storing Hashes instead of full tuples

2005-01-15 by billstewart2002a

There was a discussion a while back about storing hashes of a tuple
instead of the tuple itself.  It makes the live database much smaller,
as much as a factor of 10, and also protects against attacks like
overly-long source or destination addresses.  The initial proposals
suggested using MD5, a formerly popular cryptographic hash.  However,
for this application, MD5 is overkill, because you're not worrying
about the sender trying to invert the hash.  A simple CRC code is much
faster to calculate, and you could also choose a shorter hash, e.g. 64
bits instead of 128.
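As a rough illustration of the idea above, here is a minimal sketch that collapses a tuple into a 64-bit CRC. The field names, the NUL separator, and the choice of the CRC-64/ECMA-182 polynomial are assumptions made for the example; nothing here reflects how milter-greylist actually stores its tuples.

```python
# Sketch only: field names and the NUL separator are assumptions,
# not milter-greylist's actual storage format.

CRC64_POLY = 0x42F0E1EBA9EA3693  # CRC-64/ECMA-182 generator polynomial
MASK64 = 0xFFFFFFFFFFFFFFFF


def crc64(data: bytes) -> int:
    """Bitwise, MSB-first CRC-64 (short but slow; a table-driven version is faster)."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & 0x8000000000000000:
                crc = ((crc << 1) ^ CRC64_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


def tuple_key(client_ip: str, sender: str, recipient: str) -> int:
    """Collapse a greylisting tuple into a fixed-size 64-bit key."""
    # Join with NUL (assumed never to occur inside a field) so that
    # ("ab", "c") and ("a", "bc") cannot produce the same byte string.
    raw = "\0".join((client_ip, sender, recipient)).encode("utf-8")
    return crc64(raw)


key = tuple_key("192.0.2.1", "alice@example.org", "bob@example.net")
# An absurdly long sender still costs only 8 bytes in the database.
long_key = tuple_key("192.0.2.1", "a" * 100_000 + "@example.org", "bob@example.net")
print(f"{key:016x} {long_key:016x}")
```

Whatever the length of the addresses, the stored key is always 8 bytes, which is where both the size saving and the protection against overly-long addresses come from.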

A shorter hash does have an increased chance of collisions - the birthday
problem says a 64-bit hash probably gets one once you've got around 2**32
tuples in your database, which is pretty unlikely except maybe for
very big ISPs.  But an occasional hash collision isn't a big problem,
because the worst consequence is that a new spam message collides with
a real message's tuple, so the spam gets in, but the tuple gets
whitelisted safely.  One extra spam in a few billion messages isn't
going to break the system.
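The birthday estimate above can be checked with the usual approximation p = 1 - exp(-n(n-1) / 2^(b+1)) for n tuples and a b-bit hash. This is a small sketch assuming the hash values are uniformly distributed; the million-tuple database size is a made-up comparison point.

```python
import math


def collision_probability(n_tuples: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_tuples share a hash value."""
    pairs = n_tuples * (n_tuples - 1) / 2
    # 1 - exp(-pairs / 2**bits), using expm1 to keep precision for tiny values.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


p_huge = collision_probability(2 ** 32, 64)   # the 2**32-tuple case from the post
p_small = collision_probability(10 ** 6, 64)  # hypothetical million-tuple database
print(f"{p_huge:.3f} {p_small:.1e}")
```

Under that approximation a database of 2**32 tuples has about a 0.39 chance of containing at least one collision, with even odds reached at roughly 1.18 * 2**32 tuples, while a million-tuple database is in the range of a few in 10**8.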

Some of the discussion suggested that the hash wouldn't be useful for
wildcarding and destination-based or source-based whitelisting, but
you can do those things before calculating the hash.  There's also the
issue of logfile analysis, but logfiles and the live database are
really separate issues - the logfile can be a regular file that gets
appended to, and you can store the hash with the rest of the records
if it makes sense (e.g. you want to check how often a given source or
destination's tuples get expired.)
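One way to picture the ordering described above: pattern-based whitelisting runs on the plain addresses first, and only tuples that survive it are hashed and looked up. The sketch below is hypothetical throughout (the pattern lists, the 300-second delay, the function names, and the log format are all invented for illustration), and it uses `zlib.crc32` as a 32-bit stand-in for whatever hash would really be chosen.

```python
import fnmatch
import zlib

# Hypothetical whitelist patterns; a real deployment would load these from config.
WHITELISTED_SOURCES = ["192.0.2.*"]
WHITELISTED_RECIPIENTS = ["postmaster@*"]
GREYLIST_DELAY = 300  # seconds; an assumed value


def tuple_key(client_ip, sender, recipient):
    # zlib.crc32 is only a 32-bit stand-in to keep the sketch short.
    return zlib.crc32("\0".join((client_ip, sender, recipient)).encode("utf-8"))


def check(db, now, client_ip, sender, recipient):
    """Pattern rules see the plain addresses; the database sees only hashes."""
    if any(fnmatch.fnmatchcase(client_ip, p) for p in WHITELISTED_SOURCES):
        return "whitelisted"
    if any(fnmatch.fnmatchcase(recipient, p) for p in WHITELISTED_RECIPIENTS):
        return "whitelisted"
    key = tuple_key(client_ip, sender, recipient)
    first_seen = db.setdefault(key, now)  # records the time of the first attempt
    return "accept" if now - first_seen >= GREYLIST_DELAY else "tempfail"


def log_line(now, verdict, client_ip, sender, recipient):
    """Append-only log record that keeps the readable tuple next to its hash."""
    key = tuple_key(client_ip, sender, recipient)
    return f"{now} {verdict} {key:08x} {client_ip} {sender} {recipient}"


db = {}
v1 = check(db, 0, "192.0.2.7", "a@example.org", "b@example.net")       # source wildcard
v2 = check(db, 0, "198.51.100.9", "a@example.org", "b@example.net")    # first attempt
v3 = check(db, 600, "198.51.100.9", "a@example.org", "b@example.net")  # retry after delay
print(v1, v2, v3)
print(log_line(600, v3, "198.51.100.9", "a@example.org", "b@example.net"))
```

Because the log line carries both the hash and the original addresses, later analysis can group records by source or destination without the live database ever holding the full tuple.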

Re: [milter-greylist] Storing Hashes instead of full tuples

2005-01-15 by manu@netbsd.org

billstewart2002a <bill.stewart@...> wrote:

> There was a discussion a while back about storing hashes of a tuple
> instead of the tuple itself.  It makes the live database much smaller,
> as much as a factor of 10, and also protects against attacks like
> overly-long source or destination addresses. 

But you lose regex matching.
What problem will you solve by using hashes anyway? Do you have problems
with the greylist database being too big for your system?

-- 
Emmanuel Dreyfus
Subliminal advertising: buy this book!
http://www.eyrolles.com/Informatique/Livre/9782212114638/livre-bsd.php
manu@...
