Hello,
Regarding a DB backend for storing greylist tuples ...
What is the average ratio of creating tuples vs. looking
up tuples in the list? In other words: How many write
operations and how many read operations? That is a very
important question, because it affects the scalability
of a DB backend solution.
Someone mentioned a situation with 20 servers. Currently
every newly created tuple has to be replicated to all the
other servers, so that's 19 packet exchanges. On the other
hand, a look-up is extremely cheap: Every server has a
complete copy of the list, so it can look up tuples locally
without any network access.
Now consider the same 20 servers using a database cluster
(let's say 2 DB servers for redundancy and load-balancing).
When a new tuple is created, only the DB servers need
to be contacted, so there are two packet exchanges (let's
ignore the fact that an SQL command is more complicated
than milter-greylist's MX sync packet). On the other hand,
every look-up also requires a network packet exchange (SQL
select query) between an MX server and one of the database
servers. This means more delay on the greylist servers,
and possibly more thread contention.
So which one scales better? If 10% of all operations are
new tuple creation, and 90% are look-ups, then I think the
current implementation without separate DB servers scales
better. If the numbers are 50%/50%, then the answer might
be different, but it probably depends on a lot of things
(number of servers, performance of DB servers, network
bandwidth, and so on). You can't even benchmark it properly
without actually putting the whole setup in production,
I guess, because it's very difficult to generate realistic
mail load on such a large setup.
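
To put rough numbers on the comparison, here is a minimal sketch of the
packet counting used above. It is only a back-of-the-envelope model: the
function names and the `Cost` type are made up for illustration, it assumes
one packet exchange per peer or DB server per operation, and it ignores
packet size, DB server load and thread contention. It reports two figures
separately, because the total number of packets and the number of look-ups
that have to wait for the network are different concerns.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    packets_per_op: float         # average packet exchanges per operation
    remote_lookups_per_op: float  # share of operations that are look-ups over the network


def peer_sync(write_ratio: float, mx_servers: int) -> Cost:
    """Every MX holds a full copy of the list (the current scheme)."""
    # A new tuple is sent to every other MX; look-ups are purely local.
    return Cost(write_ratio * (mx_servers - 1), 0.0)


def db_cluster(write_ratio: float, db_servers: int) -> Cost:
    """Tuples live on a separate DB cluster (hypothetical setup)."""
    lookup_ratio = 1.0 - write_ratio
    # A new tuple is written to every DB server; each look-up asks one of them.
    return Cost(write_ratio * db_servers + lookup_ratio, lookup_ratio)


if __name__ == "__main__":
    for ratio in (0.1, 0.5):
        print(ratio, peer_sync(ratio, 20), db_cluster(ratio, 2))
```

Packet counts alone do not settle the question: in the DB variant every
look-up sits in the path of a mail transaction, which is the delay
mentioned above, and the model says nothing about how much that costs.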
What I'm trying to say: Implementing a DB backend will not
necessarily solve any scalability problems. You should
first get some statistics on the behaviour (number of
newly created tuples vs. number of tuple look-ups per
time unit) in order to decide whether it's even worth the
effort to implement a DB backend.
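
Gathering those statistics could look roughly like the sketch below. It
assumes you can already extract a stream of (timestamp, kind) events from
somewhere, with kind being "create" or "lookup"; that event format and the
function names are hypothetical, not something milter-greylist provides in
this form.

```python
from collections import Counter


def tally_per_interval(events, interval_seconds=3600):
    """Count "create" and "lookup" events per time interval.

    events: iterable of (timestamp_in_seconds, kind) pairs (assumed format).
    Returns a dict mapping interval index -> Counter of kinds.
    """
    buckets = {}
    for timestamp, kind in events:
        bucket = int(timestamp // interval_seconds)
        buckets.setdefault(bucket, Counter())[kind] += 1
    return buckets


def write_fraction(counts):
    """Share of operations in one interval that created a new tuple."""
    total = counts["create"] + counts["lookup"]
    return counts["create"] / total if total else 0.0


if __name__ == "__main__":
    sample = [(0, "create"), (10, "lookup"), (20, "lookup"), (3700, "lookup")]
    for bucket, counts in sorted(tally_per_interval(sample).items()):
        print(bucket, dict(counts), write_fraction(counts))
```

Looking at the fraction per interval rather than one overall average
shows whether the ratio shifts under peak load.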
I think there are other -- maybe better -- ways to improve
scalability. For example, when you've got 20 servers, then
don't use all of them for the same set of domains. Instead
you should partition them into sets that create roughly
the same load ("divide and conquer"), e.g. use 5 sets of
4 servers for different domains. Each new tuple then has to
be replicated to only 3 other servers instead of 19, which
cuts the MX sync traffic by roughly 84%.
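
As an illustration of the partitioning idea, the following sketch assigns
domains to server sets so that the sets end up with roughly equal load. It
uses a simple greedy heuristic (heaviest domain first, always into the
currently lightest set); the domain names, the load figures and the
function names are invented for the example, and a real setup would need
measured per-domain load.

```python
import heapq


def partition_domains(domain_load, num_sets):
    """Greedily spread domains over num_sets sets with similar total load."""
    heap = [(0.0, index) for index in range(num_sets)]  # (total load, set index)
    heapq.heapify(heap)
    sets = [[] for _ in range(num_sets)]
    # Place the heaviest domains first, each into the lightest set so far.
    for domain, load in sorted(domain_load.items(),
                               key=lambda item: item[1], reverse=True):
        total, index = heapq.heappop(heap)
        sets[index].append(domain)
        heapq.heappush(heap, (total + load, index))
    return sets


def sync_exchanges_per_tuple(servers_in_set):
    """A new tuple is replicated to every other server of the same set."""
    return servers_in_set - 1


if __name__ == "__main__":
    load = {"a.example": 5, "b.example": 3, "c.example": 2, "d.example": 2}
    print(partition_domains(load, 2))
    print(sync_exchanges_per_tuple(20), sync_exchanges_per_tuple(4))
```

The second function just restates the replication count: with all 20
servers in one set a new tuple goes to 19 peers, with sets of 4 it goes
to 3.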
Best regards
Oliver
--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd
"And believe me, as a C++ programmer, I don't hesitate to question
the decisions of language designers. After a decent amount of C++
exposure, Python's flaws seem ridiculously small." -- Ville Vainio
Re: [milter-greylist] Implement MySQL backend in
2009-01-20 by Oliver Fromme