[milter-greylist] MX synchronization loss critical bug
2009-05-14 by attila.bruncsak@itu.int
Hello,
From time to time I get the following error message in the log:
milter-greylist: Unexpected reply "105" from peer a.b.c.d closing connection (0 entries queued)
After this entry, the peer often could not re-establish MX
synchronization.
I started debugging why this happens and found the bug,
but the conditions that trigger it are very specific:
both peers have to read a new configuration file at approximately
the same time, no syncaddr is specified,
and only IPv4 is in use on the system.
When milter-greylist reads a new configuration file,
there is a short window during which the peer list is empty.
If a new peer connection comes in at exactly
that moment, the sync_master thread exits.
On its own this is not a problem,
since sync_master_restart() is called regularly.
However, if only the IPv4 sync_master thread exits
while the IPv6 sync_master thread keeps running,
sync_master_restart() will not restart the IPv4 sync_master thread,
because of this check:
    if (empty || sync_master4.runs || sync_master6.runs)
        goto last;
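To illustrate the difference, here is a minimal sketch of a restart routine that checks each listener on its own instead of giving up as soon as either one is alive. It assumes that `empty` in the check above means the peer list is empty, and it uses a simplified, hypothetical state struct (`struct sync_master` with `name`, `supported` and `runs` fields) plus hypothetical helpers `start_master()` and `restart_masters()`. It is not the code from the attached patch, and it starts no real threads or sockets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for one listener thread's state; the real
 * milter-greylist structures are not reproduced here. */
struct sync_master {
    const char *name;   /* "IPv4" or "IPv6" */
    bool supported;     /* address family usable on this host (assumed) */
    bool runs;          /* true while the listener thread is alive */
};

/* Hypothetical starter: real code would create the socket and spawn
 * the thread; here it only records the outcome. Returns 0 on success. */
static int start_master(struct sync_master *m)
{
    assert(m != NULL);
    if (!m->supported) {
        fprintf(stderr, "cannot start %s sync_master\n", m->name);
        return -1;
    }
    m->runs = true;
    return 0;
}

/* Restart each listener independently. The original check skips the
 * restart whenever *either* thread is alive, so a dead IPv4 listener
 * stays dead while the IPv6 one keeps running.
 * Returns the number of listeners (re)started. */
static int restart_masters(bool peer_list_empty,
                           struct sync_master *m4, struct sync_master *m6)
{
    int started = 0;

    if (peer_list_empty)
        return 0;
    if (!m4->runs && start_master(m4) == 0)
        started++;
    if (!m6->runs && start_master(m6) == 0)
        started++;
    return started;
}
```

With the IPv4 listener dead and the IPv6 one alive, `restart_masters()` restarts only the IPv4 listener, whereas the original check would have skipped the restart entirely.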
I have rearranged the code of sync_master_restart() to take into account
that either sync_master thread can exit independently.
It also reports possible error conditions better.
A side effect of this change shows up on Tru64 UNIX, where the
compilation environment supports IPv6 but the run-time environment
does not by default.
Both sync_master threads now start, but I get a warning:
milter-greylist: cannot set IPV6_V6ONLY: Invalid argument
After that, both sync_master threads try to run on an IPv4 socket.
The second one fails in bind(), so milter-greylist exits.
To fix this error condition, I also had to add
SO_REUSEPORT handling to the sync_listen() function.
The patch is attached.
Bests,
Attila