Synth-DIY Yahoo! Groups Archives

Thread

milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-05 by rudeyak

Hi, gang.  Is anyone else running milter-greylist on OpenBSD and, if
so, how's the new 3.1 branch performing?  In our environment (OpenBSD
3.9 on i386), we see a substantial, faster-than-linear slowdown in the
time required to load the greylist.db file on startup, which is of
particular concern when that file grows, as ours does, to hundreds of
thousands of entries or more.

greylist.db lines     m-gr 3.0 startup     m-gr 3.1.1 startup
1,000                 0.06s                  0.03s
5,000                 0.11s                  0.15s
10,000                0.20s                  1.05s
20,000                0.33s                  6.71s
35,000                0.58s                 22.05s
50,000                0.80s                 44.62s
75,000                1.16s                101.12s

The slowdown seems to be isolated to the dump_parse() routine, but
I've been unable to figure out why, and Emmanuel suggested that I post
to the list for assistance.  Any ideas out there?
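
As a rough check on how the load time scales, the timings in the table can be fitted to a power law t ~ n**k. This is only a sketch: the fit skips the two smallest files (where fixed startup overhead dominates), and the helper name growth_exponent is made up for illustration.

```python
import math

# Startup times in seconds, copied from the table in this message.
lines = [1_000, 5_000, 10_000, 20_000, 35_000, 50_000, 75_000]
t_30 = [0.06, 0.11, 0.20, 0.33, 0.58, 0.80, 1.16]
t_311 = [0.03, 0.15, 1.05, 6.71, 22.05, 44.62, 101.12]


def growth_exponent(n, t, start=2):
    """Least-squares slope of log(t) against log(n), i.e. k in t ~ n**k.

    The first `start` points are skipped because fixed startup overhead
    dominates the timings of very small files.
    """
    xs = [math.log(v) for v in n[start:]]
    ys = [math.log(v) for v in t[start:]]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


k_30 = growth_exponent(lines, t_30)
k_311 = growth_exponent(lines, t_311)
print(f"3.0   exponent: {k_30:.2f}")
print(f"3.1.1 exponent: {k_311:.2f}")
```

The fitted exponent comes out slightly below 1 for 3.0 and a little above 2 for 3.1.1, so the 3.1.1 load looks roughly quadratic in the number of lines. That is the pattern one would expect if each parsed entry triggered a scan over the entries already loaded, but that cause is a guess and is not confirmed anywhere in this thread.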

Cheers,

Erick.

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-05 by Emmanuel Dreyfus

On Fri, Jan 05, 2007 at 03:26:04PM -0000, rudeyak wrote:
> The slowdown seems to be isolated to the dump_parse() routine, but
> I've been unable to figure out why, and Emmanuel suggested that I post
> to the list for assistance.  Any ideas out there?

Well, the goal is to gather other people's experience: is this specific
to OpenBSD or not?

-- 
Emmanuel Dreyfus
manu@...

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-05 by Fabien Tassin

According to rudeyak:

> file grows, as ours does, to hundreds of thousands of entries, or more.

Just curious: what is the ratio of greylisted to whitelisted entries in
your db? (What does the Summary line at the end of the db look like?)

I reduced my own db quite drastically by using 'flushaddr' on my (many)
honeypots, but I don't have zillions of users either.

/Fabien

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-05 by goemon@anime.net

On Fri, 5 Jan 2007, rudeyak wrote:
> The slowdown seems to be isolated to the dump_parse() routine, but
> I've been unable to figure out why, and Emmanuel suggested that I post
> to the list for assistance.  Any ideas out there?

Someone really needs to work on an SQL backend :(

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-07 by AIDA Shinra

At Fri, 5 Jan 2007 10:30:23 -0800 (PST),
goemon@... wrote:
> 
> On Fri, 5 Jan 2007, rudeyak wrote:
> > The slowdown seems to be isolated to the dump_parse() routine, but
> > I've been unable to figure out why, and Emmanuel suggested that I post
> > to the list for assistance.  Any ideas out there?
> 
> Someone really needs to work on an SQL backend :(

I'm writing a Berkeley DB backend, but cannot make progress because of
real life... Development may or may not resume in February.

A rough draft from my disk:
http://www.j10n.org/files/engine-20061202.c

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-07 by manu@netbsd.org

AIDA Shinra <shinra@...> wrote:

> > Someone really needs to work on an SQL backend :(
> I'm writing a Berkeley DB backend, but cannot make progress because of
> real life... Development may or may not resume in February.

I completed a BDB backend back in May 2004. Look for the BDB tag in
CVS.

I dropped it because we had no guarantee of not losing data if the
milter were killed. How are you going to address that problem?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-07 by Fabien Tassin

According to manu@...:
> 
> > > Someone really needs to work on an SQL backend :(
> > I'm writing a Berkeley DB backend, but cannot make progress because of
> > real life... Development may or may not resume in February.
> 
> I completed a BDB backend back in May 2004. Look for the BDB tag in
> CVS.
> 
> I dropped it because we had no guarantee of not losing data if the
> milter were killed. How are you going to address that problem?

What about SQLite? I'd love to have that. No need to maintain a local DNSRBL
then. Any tool could update a table very easily from the outside.

/Fabien

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-07 by Emmanuel Dreyfus

On Sun, Jan 07, 2007 at 06:53:30PM +0100, Fabien Tassin wrote:
> > I dropped it because we had no guarantee of not losing data if the
> > milter were killed. How are you going to address that problem?
> 
> What about SQLite? I'd love to have that. No need to maintain a local DNSRBL
> then. Any tool could update a table very easily from the outside.

How does it hold up against a kill -9?

-- 
Emmanuel Dreyfus
manu@...

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-07 by Fabien Tassin

According to Emmanuel Dreyfus:
> On Sun, Jan 07, 2007 at 06:53:30PM +0100, Fabien Tassin wrote:
> > > I dropped it because we had no guarantee of not losing data if the
> > > milter were killed. How are you going to address that problem?
> > 
> > What about SQLite? I'd love to have that. No need to maintain a local DNSRBL
> > then. Any tool could update a table very easily from the outside.
> 
> How does it hold up against a kill -9?

In theory:

Transactions are atomic, consistent, isolated, and durable (ACID) even after
system crashes and power failures.

(...)

If a process is writing to the database file but exits abruptly without
finishing its write (perhaps because of a power failure or an OS crash) it
leaves behind a "hot journal". Subsequent processes which try to read the
database will see this hot journal and use it to restore the database to a
consistent state.
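
The kill -9 case can be tried directly. The following is a minimal sketch using Python's built-in sqlite3 module on a POSIX system; the greylist table and its columns are made up for illustration and are not milter-greylist's format. A child process commits one row, starts a second insert, and then sends itself SIGKILL before committing.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Child process: commits one row, begins a second insert, then kills itself
# with SIGKILL (the same signal as `kill -9`) before that insert is committed.
# The table layout is hypothetical.
CHILD = r"""
import os, signal, sqlite3, sys
con = sqlite3.connect(sys.argv[1])
con.execute("CREATE TABLE greylist (ip TEXT, sender TEXT, rcpt TEXT)")
con.execute("INSERT INTO greylist VALUES ('192.0.2.1', 'a@example.org', 'b@example.net')")
con.commit()
con.execute("INSERT INTO greylist VALUES ('192.0.2.2', 'c@example.org', 'd@example.net')")
os.kill(os.getpid(), signal.SIGKILL)
"""


def rows_after_kill():
    """Run the child, then reopen the database and list the surviving rows."""
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "greylist.sqlite")
        subprocess.run([sys.executable, "-c", CHILD, db])
        con = sqlite3.connect(db)
        try:
            return con.execute("SELECT ip FROM greylist").fetchall()
        finally:
            con.close()


survivors = rows_after_kill()
print(survivors)
```

Only the committed row should be visible after the kill; the uncommitted insert is gone and the file still opens cleanly. This exercises a killed process only. Surviving a power failure additionally depends on the operating system and disk honouring sync requests, which this sketch does not test.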

Read this for more on past database corruption issues:
http://www.sqlite.org/cvstrac/wiki?p=DatabaseCorruption

/Fabien

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-08 by Oliver Fromme

Emmanuel Dreyfus wrote:
 > Fabien Tassin wrote:
 > > Emmanuel Dreyfus wrote:
 > > > I dropped it because we had no guarantee of not losing data if the
 > > > milter were killed. How are you going to address that problem?
 > > 
 > > What about SQLite? I'd love to have that. No need to maintain a local DNSRBL
 > > then. Any tool could update a table very easily from the outside.
 > 
 > How does it hold up against a kill -9?

You need to use a real transactional database (such as
PostgreSQL) in order to support that.  If the transaction
is aborted for any reason (e.g. SIGKILL), it is guaranteed
that the database server rolls it back and the database is
always in a consistent state.  No data is lost except (of
course) for the actual current transaction that has been
aborted.

Using the C client library of PostgreSQL isn't difficult.
There is very detailed documentation for the current
version here:
http://www.postgresql.org/docs/8.2/static/libpq.html
In fact, only a small subset of the API is required for
the simple things that milter-greylist would need to do
(i.e. just "INSERT" and "SELECT" SQL statements).  The
docs include several example programs on the last page.
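
To illustrate how small that "INSERT" plus "SELECT" subset is, here is a minimal sketch of a greylist lookup. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; it is not libpq. The table name greylist, its columns, and the 300-second delay are assumptions made for the example.

```python
import sqlite3

DELAY = 300  # hypothetical greylisting delay in seconds


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) greylist table."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS greylist ("
        " ip TEXT, sender TEXT, rcpt TEXT, first_seen INTEGER,"
        " PRIMARY KEY (ip, sender, rcpt))"
    )
    return con


def check(con, ip, sender, rcpt, now):
    """Return True if the message may pass, False if it should be deferred."""
    row = con.execute(
        "SELECT first_seen FROM greylist"
        " WHERE ip = ? AND sender = ? AND rcpt = ?",
        (ip, sender, rcpt),
    ).fetchone()
    if row is None:
        # First sighting of this triplet: record it and defer the message.
        with con:  # commits on success, rolls back on an exception
            con.execute(
                "INSERT INTO greylist VALUES (?, ?, ?, ?)",
                (ip, sender, rcpt, now),
            )
        return False
    # Seen before: accept only once the delay has elapsed.
    return now - row[0] >= DELAY
```

With con = open_store(), a first call to check() for a triplet defers the message, a retry before the delay has passed is deferred again, and a retry after it is accepted. Expiry and whitelisting are left out to keep the sketch short.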

Using PostgreSQL also has the advantage that it is very
resistant against crashes and other kinds of forced
shutdowns.  In my daily job as a consultant I often have
to work with various databases, most of them MySQL
and PostgreSQL.  When a live MySQL database crashes,
it's quite likely that one or more tables are corrupted,
so you have to repair them manually; sometimes they're
even damaged to the point that you have to drop them and
restore from the most recent backup.  That never happened
with PostgreSQL, never ever.  Even after a hard reboot
(e.g. after a power outage) the database engine detects
automatically which transactions had been committed to
disk completely and which had not.  It uses techniques
similar to a journalled file system (called "WAL" =
write-ahead log) to keep track of its data.

That's one of the reasons (but not the only one) I always
suggest PostgreSQL as the number one open-source database.

If I had a bit of free time, I would have a look at adding
the support to milter-greylist myself.  It really isn't a
big deal if you have an idea what SQL is.  Unfortunately
my job and my real life don't let me have enough time for
it right now.  :-(

By the way, using an SQL database would also provide more
solutions to the MX synchronization problem.  For example,
instead of using milter-greylist's own MX sync feature,
you could simply let several instances of milter-greylist
(on different MX servers) access the same SQL database.
Or -- for improved redundancy -- let each MX server run its
own SQL server instance, and use SQL replication features
to distribute all changes ("INSERT" commands) to the other
SQL servers.  There are probably even more possibilities,
but the ones mentioned are the most obvious ones.

It would also be a solution to the memory (RAM) problem,
I think.  The milter-greylist processes wouldn't need to
hold any data anymore, so they require very little RAM.
An SQL database, on the other hand, is optimized for
managing huge amounts of data in an efficient way.  That's
the main purpose of a database, after all.

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"I started using PostgreSQL around a month ago, and the feeling is
similar to the switch from Linux to FreeBSD in '96 -- 'wow!'."
        -- Oddbjorn Steffensen

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-08 by manu@netbsd.org

AIDA Shinra <shinra@...> wrote:

> I'm writing a Berkeley DB backend, but cannot make progress because of
> real life... Development may or may not resume in February.

That will obviously require a storage API for milter-greylist. It would
be nice if that could be discussed here before a complete patch is done.
Other people might have other storage plans, and having an API on which
anything could plug in would be better than hacking the thing over and
over to add more storage back-ends.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-08 by AIDA Shinra

At Mon, 8 Jan 2007 09:47:19 +0100,
manu@... wrote:
> 
> AIDA Shinra <shinra@...> wrote:
> 
> > I'm writing a Berkeley DB backend, but cannot make progress because of
> > real life... Development may or may not resume in February.
> 
> That will obviously require a storage API for milter-greylist. It would
> be nice if that could be discussed here before a complete patch is done.
> Other people might have other storage plans, and having an API on which
> anything could plug in would be better than hacking the thing over and
> over to add more storage back-ends.

Plug-ins cause many problems. Milter-greylist is not such a big project
that the benefits of a plug-in architecture exceed the costs.

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-08 by eclark

I agree with Aida here. I also believe that, while some flavor of db
backend might be nice for smaller users, it would be of doubtful use in a
larger environment. Currently gdmilter is very fast, writing its db only
periodically and keeping as much of the transitional information in memory as
possible. Adding layers of complexity does nothing beneficial for the
milter outside of small homegrown offices. I certainly would not want a
mail server handling 2 million+ mails daily to depend on some sort of
third-party database for its updates, whitelistings, and blacklistings.
That slows the entire transaction down, increases the hardware required to
run reliably, and provides only better auditing and recovery in case of
total failure.
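
The "keep everything in memory, write the db only periodically" approach can be combined with some protection against being killed in the middle of a dump. The sketch below is an assumption about how that could be done, not a description of the milter's actual dump code: the function names and the tab-separated layout are hypothetical. It writes to a temporary file and renames it into place, so a reader sees either the old dump or the new one, never a partial file.

```python
import os
import tempfile


def dump_atomically(entries, path):
    """Write all entries to `path` without ever exposing a partial file.

    `entries` maps (ip, sender, rcpt) -> first-seen timestamp. The
    tab-separated layout is hypothetical, not the real greylist.db format.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".dump-")
    try:
        with os.fdopen(fd, "w") as out:
            for (ip, sender, rcpt), first_seen in sorted(entries.items()):
                out.write(f"{ip}\t{sender}\t{rcpt}\t{first_seen}\n")
            out.flush()
            os.fsync(out.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load(path):
    """Read a dump written by dump_atomically back into a dict."""
    entries = {}
    with open(path) as src:
        for line in src:
            ip, sender, rcpt, first_seen = line.rstrip("\n").split("\t")
            entries[(ip, sender, rcpt)] = int(first_seen)
    return entries
```

The trade-off is the one discussed in this thread: anything added since the last dump is lost if the process is killed, in exchange for lookups that never leave memory.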


On Monday 08 January 2007 08:11 am, AIDA Shinra wrote:
> At Mon, 8 Jan 2007 09:47:19 +0100,
>
> manu@... wrote:
> > AIDA Shinra <shinra@...> wrote:
> > > I'm writing a Berkeley DB backend, but cannot make progress because of
> > > real life... Development may or may not resume in February.
> >
> > That will obviously require a storage API for milter-greylist. It would
> > be nice if that could be discussed here before a complete patch is done.
> > Other people might have other storage plans, and having an API on which
> > anything could plug in would be better than hacking the thing over and
> > over to add more storage back-ends.
>
> Plug-ins cause many problems. Milter-greylist is not such a big project
> that the benefits of a plug-in architecture exceed the costs.

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-08 by AIDA Shinra

At Mon, 8 Jan 2007 09:06:32 +0100 (CET),
Oliver Fromme wrote:
> 
> 
> Emmanuel Dreyfus wrote:
>  > Fabien Tassin wrote:
>  > > Emmanuel Dreyfus wrote:
>  > > > I dropped it because we had no guarantee of not losing data if the
>  > > > milter were killed. How are you going to address that problem?
>  > > 
>  > > What about SQLite? I'd love to have that. No need to maintain a local DNSRBL
>  > > then. Any tool could update a table very easily from the outside.
>  > 
>  > How does it hold up against a kill -9?
> 
> You need to use a real transactional database (such as
> PostgreSQL) in order to support that.  If the transaction
> is aborted for any reason (e.g. SIGKILL), it is guaranteed
> that the database server rolls it back and the database is
> always in a consistent state.  No data is lost except (of
> course) for the actual current transaction that has been
> aborted.

And Berkeley DB is also available for this purpose.

> By the way, using an SQL database would also provide more
> solutions to the MX synchronization problem.  For example,
> instead of using milter-greylist's own MX sync feature,
> you could simply let several instances of milter-greylist
> (on different MX servers) access the same SQL database.
> Or -- for improved redundancy -- let each MX server run its
> own SQL server instance, and use SQL replication features
> to distribute all changes ("INSERT" commands) to the other
> SQL servers.  There are probably even more possibilities,
> but the ones mentioned are the most obvious ones.

Replication copies data from the master read/write server to other
readonly servers. It is not what we want. In contrast, MX sync is
a peer-to-peer mechanism. While MX sync cannot ensure consistency
between servers, it satisfies the basic needs of the SMTP world.

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-08 by Oliver Fromme

AIDA Shinra wrote:
 > Oliver Fromme wrote:
 > [...]
 > > By the way, using an SQL database would also provide more
 > > solutions to the MX synchronization problem.  For example,
 > > instead of using milter-greylist's own MX sync feature,
 > > you could simply let several instances of milter-greylist
 > > (on different MX servers) access the same SQL database.
 > > Or -- for improved redundancy -- let each MX server run its
 > > own SQL server instance, and use SQL replication features
 > > to distribute all changes ("INSERT" commands) to the other
 > > SQL servers.  There are probably even more possibilities,
 > > but the ones mentioned are the most obvious ones.
 > 
 > Replication copies data from the master read/write server to other
 > readonly servers.

Not necessarily.  It depends on what kind of replication setup
you use.  It's perfectly possible to have two (or more)
"master" servers (read+write) which replicate changes to
each other.

(Whether it makes sense to do that is another question.
But it's not a bad thing to have the choice to do that, and
the possibility comes for free when an SQL database is
supported.)

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"Life is short (You need Python)"
        -- Bruce Eckel, ANSI C++ Committee member, author
           of "Thinking in C++" and "Thinking in Java"

Re: [milter-greylist] milter-greylist 3.1.x branch and OpenBSD performance loading greylist.db

2007-01-08 by manu@netbsd.org

AIDA Shinra <shinra@...> wrote:

> > That will obviously require a storage API for milter-greylist. It would
> > be nice if that could be discussed here before a complete patch is done.
> > Other people might have other storage plans, and having an API on which
> > anything could plug in would be better than hacking the thing over and
> > over to add more storage back-ends.
> Plug-ins cause many problems. Milter-greylist is not such a big project
> that the benefits of a plug-in architecture exceed the costs.

I'm not talking about having a directory of DSOs implementing back-ends,
just about defining the API between milter-greylist and the storage
software. Doing that cleanly will not cost any performance,
but it will help future development.
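
What such an API boundary might look like can be sketched in a few lines. The example below is written in Python for brevity, although milter-greylist itself is C; the interface name GreylistStore, its three methods, and the in-memory backend are all hypothetical and are not taken from any patch discussed here. The point is only that the greylisting logic talks to the interface and never to a specific back-end.

```python
from typing import Dict, Optional, Protocol, Tuple

Triplet = Tuple[str, str, str]  # (ip, sender, rcpt)


class GreylistStore(Protocol):
    """Hypothetical storage interface; the method names are illustrative."""

    def get(self, key: Triplet) -> Optional[int]: ...

    def put(self, key: Triplet, first_seen: int) -> None: ...

    def delete(self, key: Triplet) -> None: ...


class MemoryStore:
    """In-memory back-end; a BDB or SQL back-end would offer the same methods."""

    def __init__(self) -> None:
        self._entries: Dict[Triplet, int] = {}

    def get(self, key: Triplet) -> Optional[int]:
        return self._entries.get(key)

    def put(self, key: Triplet, first_seen: int) -> None:
        self._entries[key] = first_seen

    def delete(self, key: Triplet) -> None:
        self._entries.pop(key, None)


def should_accept(store: GreylistStore, key: Triplet, now: int,
                  delay: int = 300) -> bool:
    """Greylisting decision written only against the interface."""
    first_seen = store.get(key)
    if first_seen is None:
        store.put(key, now)  # first sighting: remember it and defer
        return False
    return now - first_seen >= delay
```

Swapping MemoryStore for another class with the same three methods requires no change to should_accept(), which is the benefit of agreeing on the API before individual back-ends are written.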

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...