Yahoo Groups archive

Milter-greylist

Cause of "peer queue overflow" errors?

2006-11-21 by Rohe, Patrick J.

Hello everyone,

I am currently running the 3.0 build of milter-greylist (previously running various 3.0rc# releases), and continue to have a problem with sync'ing.  We have 4 servers, all handling about 500,000 messages total daily.

Peer sync'ing will work for a period of time, then start producing "peer queue overflow" errors.  This happens on different servers at different times.  What we've noticed is that the error will affect a particular server, but the other servers will continue to sync with each other with no problem; so the problem appears to be localized to the server initiating the synchronization.  (And eventually, all the servers will end up with queue overflow errors.)

In addition, the affected server appears to stop synchronizing *any* information, and synchronization resumes only after we kill the milter process and restart it.  The TCP connection remains established the whole time, so I can't see any connectivity problem between the affected server and its peers.  We don't see any other bottlenecks on the system, and the rest of the local milter-greylist functionality continues to work during these sync failures.  We've even tried increasing the value of SYNC_MAXQLEN in sync.h to 10240, and the problem still occurs.

My questions are:  What are some potential causes of "peer queue overflow" errors?  What is the expected behavior in the event of an overflow -- should at least *some* records continue to be synchronized (while dropping others)?  Why would one server register a peer queue overflow for *all* other peers, when each of those peers can receive sync'd information just fine from their other peers -- unless there is a problem specific to the server initiating the sync?

Thanks to anyone who can help or offer any information,
Patrick

RE: Cause of "peer queue overflow" errors?

2007-03-22 by An.H.Nguyen

I have the same problem.
Has anybody figured out a solution?
Thanks,
An Nguyen

Re: [milter-greylist] RE: Cause of "peer queue overflow" errors?

2007-03-22 by manu@netbsd.org

An.H.Nguyen <AnNguyen251@...> wrote:

>   Peer sync'ing will work for a period of time, then start producing "peer
> queue overflow" errors. This happens on different servers at different
> times. What we've noticed is that the error will affect a particular
> server, but the other servers will continue to sync with each other with
> no problem; so the problem appears to be localized to the server
> initiating the synchronization. (And eventually, all the servers will
> end up with queue overflow errors.)

milter-greylist puts records to be sent on a queue, and a syncer thread
is responsible for emptying the queue. If the syncer thread gets stuck,
the queue will grow to the limit, and you'll get the error message.
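
To illustrate the mechanism, here is a minimal sketch, not the actual sync.c code. The names demo_queue, demo_enqueue, demo_drain_one and DEMO_MAXQLEN are hypothetical, and DEMO_MAXQLEN only stands in for SYNC_MAXQLEN. It models a bounded queue whose consumer may stop running:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAXQLEN 4  /* hypothetical limit, stands in for SYNC_MAXQLEN */

struct demo_queue {
    size_t len;      /* records currently waiting for the syncer */
    size_t dropped;  /* records refused because the queue was full */
};

/* Producer side: returns 0 if the record was queued, -1 on overflow. */
static int demo_enqueue(struct demo_queue *q)
{
    assert(q != NULL);
    if (q->len >= DEMO_MAXQLEN) {
        q->dropped++;   /* this is where an overflow would be logged */
        return -1;
    }
    q->len++;
    return 0;
}

/* Consumer side: removes one record, but only if the syncer is running. */
static int demo_drain_one(struct demo_queue *q, int syncer_running)
{
    assert(q != NULL);
    if (!syncer_running || q->len == 0)
        return 0;
    q->len--;
    return 1;
}
```

In this model, once syncer_running is 0 every later enqueue fails as soon as the limit is reached, whatever the limit is. Under that assumption, a larger limit would only delay the first error, which would be consistent with raising SYNC_MAXQLEN not helping.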

There may be a race condition hidden somewhere that causes the syncer
thread to get hung. In order to debug that, you'll have to add
mg_log(LOG_DEBUG, "%s() %s:%d", __func__, __FILE__, __LINE__);
lines everywhere in sync.c:sync_sender(). When the thread stops
operating, check the last debug message so that we can get an idea of
where it got stuck.
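
A minimal standalone sketch of that checkpoint technique follows. DEMO_TRACE, demo_last_trace, demo_sender_step and demo_trace_is_from are hypothetical names, and the buffer is only a stand-in for the log so that the example does not depend on mg_log():

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the log: keeps only the latest checkpoint. */
static char demo_last_trace[128];

/* Record function, file and line of the point where this is expanded. */
#define DEMO_TRACE() \
    snprintf(demo_last_trace, sizeof(demo_last_trace), \
             "%s() %s:%d", __func__, __FILE__, __LINE__)

/* Toy sender step: stall_early simulates a call that never returns. */
static int demo_sender_step(int stall_early)
{
    DEMO_TRACE();           /* checkpoint before the risky call */
    if (stall_early)
        return -1;          /* stands in for a hang at this point */
    DEMO_TRACE();           /* checkpoint after the risky call */
    return 0;
}

/* Returns 1 if the latest checkpoint was recorded inside func. */
static int demo_trace_is_from(const char *func)
{
    assert(func != NULL);
    return strstr(demo_last_trace, func) != NULL;
}
```

The last recorded checkpoint names the line just before the point where progress stopped, which is the information the real debug messages would provide.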

Another idea: there is a loop over all the peers:
        LIST_FOREACH(peer, &peer_head, p_list) {

You can add a log here:
mg_log(LOG_DEBUG, "%s sync with %s", __func__, peer->p_name);

So that we check that the peer list does not get corrupted.
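
As a rough sketch of such a check, the walk below uses the sys/queue.h LIST macros. The demo_peer structure, the demo_count_peers function and the max_expected bound are assumptions made for illustration, not milter-greylist code. It reports a problem if a name is missing or if the list holds more entries than configured:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical peer record; only the fields needed for the walk. */
struct demo_peer {
    const char *p_name;
    LIST_ENTRY(demo_peer) p_list;
};
LIST_HEAD(demo_peerlist, demo_peer);

/*
 * Walk the list and count the peers. Returns the count, or -1 if an
 * entry has no name or if more than max_expected entries are found,
 * either of which would suggest a damaged list.
 */
static int demo_count_peers(struct demo_peerlist *head, int max_expected)
{
    struct demo_peer *peer;
    int n = 0;

    assert(head != NULL);
    LIST_FOREACH(peer, head, p_list) {
        if (peer->p_name == NULL || ++n > max_expected)
            return -1;
    }
    return n;
}
```

With 4 servers, each one should see 3 peers, so a count that differs from the configured number on the affected server would point at the list rather than at the network.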

Sorry, I can't help more, as I have never seen that problem occur on my own servers.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...
