Yahoo Groups archive

Milter-greylist

Index last updated: 2026-04-28 23:32 UTC

Thread

[milter-greylist] MX synchronization loss critical bug

[milter-greylist] MX synchronization loss critical bug

2009-05-14 by attila.bruncsak@itu.int

Hello,

Time to time I got the following error message in the log:

milter-greylist: Unexpected reply "105" from peer a.b.c.d closing
connection (0 entries queued)

After this entry many times the peer could not re-establish the MX
synchronization.

I started to debug why is that happening and I found the bug,
but the conditions are very specific:
Both the two peers has to get approximately the same time new
configuration file
and no syncaddr is specified
and only IPV4 should be used on the system.

When a new configuration file is read by the milter-greylist
there is a time when the peer list is empty.
If a new peer connection is coming in exactly
that moment the sync_master thread will exit.

Alone this situation is not a problem
since the sync_master_restart() is called regularly.
On the other hand if only the sync_master for the IPV4 thread exit,
and the sync_master for IPV6 continues to run
The sync_master_restart() will not restart the IPV4 sync_master thread:

        if (empty || sync_master4.runs || sync_master6.runs)
                goto last;

I have rearranged to code of sync_master_restart() to take into account
that any of the sync_master thread can exit independently.
It also reports better possible error conditions.
The side effect of this is that on Tru64 UNIX the compilation
environment
supports IPV6 but the run-time not by default.
Both the two sync_master threads are running, but I get a warning:

milter-greylist: cannot set IPV6_V6ONLY: Invalid argument

After that both the two sync_master thread tries to run on IPV4 socket.
The second one fails on the bind, so the milter-greylist exit.

To fix this error condition I had to add in addition
too the SO_REUSEPORT code in the sync_listen() function.

The patch is attached.

Bests,
Attila

Re: [milter-greylist] MX synchronization loss critical bug

2009-05-15 by Hajimu UMEMOTO

Hi,

>>>>> On Thu, 14 May 2009 11:52:39 +0200
>>>>> attila.bruncsak@... <attila.bruncsak@...> said:

attila> I have rearranged to code of sync_master_restart() to take into account
attila> that any of the sync_master thread can exit independently.
attila> It also reports better possible error conditions.
attila> The side effect of this is that on Tru64 UNIX the compilation
attila> environment
attila> supports IPV6 but the run-time not by default.
attila> Both the two sync_master threads are running, but I get a warning:

attila> milter-greylist: cannot set IPV6_V6ONLY: Invalid argument

attila> After that both the two sync_master thread tries to run on IPV4 socket.
attila> The second one fails on the bind, so the milter-greylist exit.

attila> To fix this error condition I had to add in addition
attila> too the SO_REUSEPORT code in the sync_listen() function.

It seems your code try to listen IPv4 1st then try to listen IPv6.  It
doesn't work correctly on some environment.  So, you need to try to
listen IPv6 before IPv4.  I think trying IPv6 before IPv4 should fix
your problem without issuing SO_REUSEPORT.

attila> The patch is attached.

You changed to test sync_master4.runs and sync_master6.runs
independly.  It seems to me that when either IPv6 thread or IPv4
thread fails, milter-greylist exit, now.  I suspect milter-greylist
became not work on at least IPv4 only host or the envionment where
IPv4-mapped IPv6 address is required to handle both IPv6 and IPv4.
This is why the following code was exist:

	if (!sync_master4.runs && !sync_master6.runs) {
		mg_log(LOG_ERR, "cannot start MX sync, socket failed: %s",
		    strerror(errno));
		exit(EX_OSERR);

Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
ume@...  ume@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/

Re: [milter-greylist] MX synchronization loss critical bug

2009-05-15 by manu@netbsd.org

Hajimu UMEMOTO <ume@...> wrote:

> It seems your code try to listen IPv4 1st then try to listen IPv6.  It
> doesn't work correctly on some environment.  So, you need to try to
> listen IPv6 before IPv4.  I think trying IPv6 before IPv4 should fix
> your problem without issuing SO_REUSEPORT.

Um, it seems I committed too early. Shall I roll back the change?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] MX synchronization loss critical bug

2009-05-15 by Hajimu UMEMOTO

Hi,

>>>>> On Fri, 15 May 2009 07:49:24 +0200
>>>>> manu@... said:

> It seems your code try to listen IPv4 1st then try to listen IPv6.  It
> doesn't work correctly on some environment.  So, you need to try to
> listen IPv6 before IPv4.  I think trying IPv6 before IPv4 should fix
> your problem without issuing SO_REUSEPORT.

manu> Um, it seems I committed too early. Shall I roll back the change?

How about this patch?  It is against fresh 4.3.2.
I confirmed that it runs on my FreeBSD box.  However, I don't test if
it solved the problem.

Sincerely,

RE: [milter-greylist] MX synchronization loss critical bug

2009-05-15 by attila.bruncsak@itu.int

Hello,

> attila> I have rearranged to code of sync_master_restart() to 
> take into account
> attila> that any of the sync_master thread can exit independently.
> attila> It also reports better possible error conditions.
> attila> The side effect of this is that on Tru64 UNIX the compilation
> attila> environment
> attila> supports IPV6 but the run-time not by default.
> attila> Both the two sync_master threads are running, but I 
> get a warning:
> 
> attila> milter-greylist: cannot set IPV6_V6ONLY: Invalid argument
> 
> attila> After that both the two sync_master thread tries to 
> run on IPV4 socket.
> attila> The second one fails on the bind, so the milter-greylist exit.
> 
> attila> To fix this error condition I had to add in addition
> attila> too the SO_REUSEPORT code in the sync_listen() function.
> 
> It seems your code try to listen IPv4 1st then try to listen IPv6.  It
> doesn't work correctly on some environment.  So, you need to try to
> listen IPv6 before IPv4.  I think trying IPv6 before IPv4 should fix
> your problem without issuing SO_REUSEPORT.

I have tried on my system with IPv6 first
and after the IPv4 and behaves
exactly the same way as with the reverse order.
I do not know on other systems is it a problem
If yes we can change the order.

> 
> attila> The patch is attached.
> 
> You changed to test sync_master4.runs and sync_master6.runs
> independly.  It seems to me that when either IPv6 thread or IPv4
> thread fails, milter-greylist exit, now.  I suspect milter-greylist
> became not work on at least IPv4 only host or the envionment where
> IPv4-mapped IPv6 address is required to handle both IPv6 and IPv4.
> This is why the following code was exist:
> 
> 	if (!sync_master4.runs && !sync_master6.runs) {
> 		mg_log(LOG_ERR, "cannot start MX sync, socket 
> failed: %s",
> 		    strerror(errno));
> 		exit(EX_OSERR);
> 

In code I provided the IPv6 code is protected with
#ifdef AF_INET6
So that should not be a problem for IPv4 only systems.
I do not know what the status is for the
IPv4-mapped IPv6 address case.
Can someone try this out and let us know?

An other thinking:
The sync_master_restart() is called very frequently
actually at every new envelop recipient.
At the entry point of this function we do not
have enough information why a sync_master is not running:
we had a permanent failure of starting it up earlier or
it was just exited because the peer list became empty.
If it is a permanent failure (due to improper IP version support)
it is not practical to try to restart so frequently.
We should try to restart only if it was just due
a previous empty peer list exit condition.
What do you think to encode the additional
Information about the reason of non-running
condition into the sync_master[46].runs?

Bests,
Attila

RE: [milter-greylist] MX synchronization loss critical bug

2009-05-15 by attila.bruncsak@itu.int

Hello,

> > It seems your code try to listen IPv4 1st then try to 
> listen IPv6.  It
> > doesn't work correctly on some environment.  So, you need to try to
> > listen IPv6 before IPv4.  I think trying IPv6 before IPv4 should fix
> > your problem without issuing SO_REUSEPORT.
> 
> manu> Um, it seems I committed too early. Shall I roll back 
> the change?
> 
> How about this patch?  It is against fresh 4.3.2.
> I confirmed that it runs on my FreeBSD box.  However, I don't test if
> it solved the problem.
> 

On my Tru64 UNIX system it looks working but I get
"milter-greylist: cannot start MX sync, bind failed: Address already in
use"
message for every envelop recipient,
practically flooded with that message.
Probably with SO_REUSEPORT that message would go away.

Bests,
Attila

Re: [milter-greylist] MX synchronization loss critical bug

2009-05-15 by Hajimu UMEMOTO

Hi,

>>>>> On Fri, 15 May 2009 11:15:28 +0200
>>>>> attila.bruncsak@... <attila.bruncsak@...> said:

attila> On my Tru64 UNIX system it looks working but I get
attila> "milter-greylist: cannot start MX sync, bind failed: Address already in
attila> use"
attila> message for every envelop recipient,
attila> practically flooded with that message.

I suspect you got the message in question even before you tried to fix
the problem, and you get "cannot set IPV6_V6ONLY:" message as well,
right? 
If so, your system uses IPv4-mapped IPv6 address and we cannot disable
use of it.  In that case, your system listen IPv4 by IPv6 socket.  So,
you don't need to listen on IPv4 separately.

attila> Probably with SO_REUSEPORT that message would go away.

If is is your case, the error is expected behavior, and use of
SO_REUSEPORT is not good way, IMHO.  Just disabling the message is
enough.

Index: sync.c
diff -u -p sync.c.orig sync.c
--- sync.c.orig	2009-05-15 16:55:20.000000000 +0900
+++ sync.c	2009-05-15 18:42:14.249372861 +0900
@@ -961,15 +961,19 @@ sync_listen(addr, port, sms)
 		optval = 1;
 		if ((setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY,
 		    &optval, sizeof(optval))) != 0) {
+#if 0
 			mg_log(LOG_ERR, "cannot set IPV6_V6ONLY: %s",
 			    strerror(errno));
+#endif
 		}
 	}
 #endif
 
 	if (bind(s, SA(&laddr), laddrlen) != 0) {
+#if 0
 		mg_log(LOG_ERR, "cannot start MX sync, bind failed: %s",
 		    strerror(errno));
+#endif
 		sms->runs = 0;
 		close(s);
 		return;


Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
ume@...  ume@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/

Re: [milter-greylist] MX synchronization loss critical bug

2009-05-15 by Hajimu UMEMOTO

Hi,

>>>>> On Fri, 15 May 2009 10:41:21 +0200
>>>>> attila.bruncsak@... <attila.bruncsak@...> said:

attila> In code I provided the IPv6 code is protected with
attila> #ifdef AF_INET6
attila> So that should not be a problem for IPv4 only systems.

Unfortunately, we cannot have such assumption.  There are the
environments where AF_INET6 is defined but IPv6 is not supported.
Further, there are the people who disables IPv6 support by kernel.

Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
ume@...  ume@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/

RE: [milter-greylist] MX synchronization loss critical bug

2009-05-15 by attila.bruncsak@itu.int

> I suspect you got the message in question even before you tried to fix
> the problem, and you get "cannot set IPV6_V6ONLY:" message as well,
> right? 

Yes, I got it once.

> If so, your system uses IPv4-mapped IPv6 address and we cannot disable
> use of it.  In that case, your system listen IPv4 by IPv6 socket.  So,
> you don't need to listen on IPv4 separately.
> 
> attila> Probably with SO_REUSEPORT that message would go away.
> 
> If is is your case, the error is expected behavior, and use of
> SO_REUSEPORT is not good way, IMHO.  Just disabling the message is
> enough.
> 
> Index: sync.c
> diff -u -p sync.c.orig sync.c
> --- sync.c.orig	2009-05-15 16:55:20.000000000 +0900
> +++ sync.c	2009-05-15 18:42:14.249372861 +0900
> @@ -961,15 +961,19 @@ sync_listen(addr, port, sms)
>  		optval = 1;
>  		if ((setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY,
>  		    &optval, sizeof(optval))) != 0) {
> +#if 0
>  			mg_log(LOG_ERR, "cannot set IPV6_V6ONLY: %s",
>  			    strerror(errno));
> +#endif
>  		}
>  	}
>  #endif
>  
>  	if (bind(s, SA(&laddr), laddrlen) != 0) {
> +#if 0
>  		mg_log(LOG_ERR, "cannot start MX sync, bind failed: %s",
>  		    strerror(errno));
> +#endif
>  		sms->runs = 0;
>  		close(s);
>  		return;

I still feel it is an overhead to try
to open a new socket for every new recipient,
even if there is no error message about the
bind failure because we hide it.

What about the following:
When we once get the message
"cannot set IPV6_V6ONLY"
consider it as serious
(actually no real support for IPv6 in the kernel)
so close the socket do not start
the IPv6 sync_master thread
but set the sync_master6.runs equals to -1
without associated thread running.
The mg will not try to start it again later.

That way the IPv4 thread sync_master thread will start
happily no need the SO_REUSEPORT.

Please see the attached patch for an implementation.
Also based on the vanilla 4.3.2.

Bests,
Attila

Re: [milter-greylist] MX synchronization loss critical bug

2009-05-15 by manu@netbsd.org

<attila.bruncsak@...> wrote:

> Please see the attached patch for an implementation.
> Also based on the vanilla 4.3.2.

I will be AFK for the next week. I hope the issue will be settled then
:-)

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] MX synchronization loss critical bug

2009-05-16 by Hajimu UMEMOTO

Hi,

>>>>> On Fri, 15 May 2009 16:13:22 +0200
>>>>> attila.bruncsak@... <attila.bruncsak@...> said:

attila> I still feel it is an overhead to try
attila> to open a new socket for every new recipient,
attila> even if there is no error message about the
attila> bind failure because we hide it.

attila> What about the following:
attila> When we once get the message
attila> "cannot set IPV6_V6ONLY"
attila> consider it as serious
attila> (actually no real support for IPv6 in the kernel)
attila> so close the socket do not start
attila> the IPv6 sync_master thread
attila> but set the sync_master6.runs equals to -1
attila> without associated thread running.
attila> The mg will not try to start it again later.

attila> That way the IPv4 thread sync_master thread will start
attila> happily no need the SO_REUSEPORT.

attila> Please see the attached patch for an implementation.
attila> Also based on the vanilla 4.3.2.

It seems your idea is good to me.  However, we cannot use IPv6 where
IPV6_V6ONLY is defined but is not usable.
I think sync_listen() should set -1 to sms->runs for all errors,
instead.  The candidate patch against fresh 4.3.2 is attached.

Sincerely,

Re: [milter-greylist] MX synchronization loss critical bug

2009-05-16 by Hajimu UMEMOTO

Hi,

>>> Sat, 16 May 2009 18:19:18 +0900,
>>> Hajimu UMEMOTO <ume@...> said:

ume> It seems your idea is good to me.  However, we cannot use IPv6 where
ume> IPV6_V6ONLY is defined but is not usable.
ume> I think sync_listen() should set -1 to sms->runs for all errors,
ume> instead.  The candidate patch against fresh 4.3.2 is attached.

It didn't help for the case syncaddr is specified.  And, I made
cosmetic changes.  The patch against 4.3.2 is attached.

Sincerely,

RE: [milter-greylist] MX synchronization loss critical bug

2009-05-18 by attila.bruncsak@itu.int

> 
> ume> It seems your idea is good to me.  However, we cannot 
> use IPv6 where
> ume> IPV6_V6ONLY is defined but is not usable.
> ume> I think sync_listen() should set -1 to sms->runs for all errors,
> ume> instead.  The candidate patch against fresh 4.3.2 is attached.
> 
> It didn't help for the case syncaddr is specified.  And, I made
> cosmetic changes.  The patch against 4.3.2 is attached.
> 

Hello,

This works well for me both on Tru64 UNIX and on Linux.
Thanks.

Bests,
Attila

Move to quarantaine

This moves the raw source file on disk only. The archive index is not changed automatically, so you still need to run a manual refresh afterward.