On Fri, Oct 14, 2005 at 04:15:20PM -0400, Matthew S. Cramer wrote:
> I've added another global var and am now incrementing each time
> sync_server starts and decrementing each time it returns.
> Each time milter-greylist crashes it is preceded by the
> pthread_create errors of EAGAIN, always when trying to start a thread
> for sync_server.

I think I've made some progress. I have 3 MX servers, and on 1 of them
I made the change above. Before this, I was never able to get the milter
to run with sync enabled for more than a couple of hours before it
crashed. I made the change on Friday before leaving for the weekend. The
server with the change did NOT crash all weekend, while the other two
did.

Is it possible there is a race condition in sync_server, or some other
thread problem? This is my current theory: to increment and decrement
my global var for counting sync threads I take a mutex, and that mutex
may be fixing the real problem as a side effect, namely that the
sync_server threads need to wait on each other to finish. There is a
mutex in the code that initiates a connection to another server, but
there wasn't one in the function that handles the incoming connection.

Maybe you have to be running more than 2 servers to see this problem?
Is anyone else using more than 2?

I've moved my patched code over to the other 2 servers and will see if
they run all day without crashing. I will let the list know what I find.
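For anyone who wants to try the counting idea outside of milter-greylist,
here is a minimal standalone sketch of the same pattern: a counter that is
only touched while holding a mutex, incremented when a thread starts and
decremented when it returns. All names in it (count_lock, active, peak,
worker, run_workers) are made up for illustration and do not appear in
milter-greylist. It also joins every thread it starts, which the real
sync_server code may or may not do.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration; none come from milter-greylist. */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int active = 0;  /* threads currently inside worker() */
static int peak = 0;    /* highest value of active seen so far */

static void *worker(void *arg)
{
    (void)arg;

    /* Count this thread in, holding the mutex while we touch the counter. */
    pthread_mutex_lock(&count_lock);
    active++;
    if (active > peak)
        peak = active;
    pthread_mutex_unlock(&count_lock);

    /* The real per-connection work would go here. */

    /* Count this thread out again before returning. */
    pthread_mutex_lock(&count_lock);
    active--;
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Start up to n threads (capped at 64), wait for all of them,
 * and return the final value of the counter. */
int run_workers(int n)
{
    pthread_t tid[64];
    int started = 0;
    int i, err, result;

    if (n > 64)
        n = 64;

    for (i = 0; i < n; i++) {
        /* pthread_create returns the error number directly (e.g. EAGAIN). */
        err = pthread_create(&tid[started], NULL, worker, NULL);
        if (err != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(err));
            break;
        }
        started++;
    }

    /* Join every thread that was actually started. */
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_lock(&count_lock);
    result = active;
    pthread_mutex_unlock(&count_lock);
    return result;
}
```

Calling run_workers(8) from a small main() and printing the result and
peak shows whether every thread that was counted in was also counted out.
Build with the compiler's pthread option (for gcc, -pthread).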
My patches for reference:

bash-3.00$ diff milter-greylist.h milter-greylist-2.0.1/milter-greylist.h
96,99d95
< /* MSC */
< int syncers;
< /* MSC */
<
bash-3.00$ diff milter-greylist.c milter-greylist-2.0.1/milter-greylist.c
586,592d585
<
< /* MSC */
< extern int syncers;
<
< syncers = 0;
< /* MSC */
<
bash-3.00$ diff sync.c milter-greylist-2.0.1/sync.c
74,77d73
< /* MSC */
< pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
< /* MSC */
<
826,834d821
< /* MSC */
< extern int syncers;
<
< pthread_mutex_lock( &mutex1 );
< syncers++;
< syslog(LOG_ERR, "Syncers: %d", syncers);
< pthread_mutex_unlock( &mutex1 );
< /* MSC */
<
1003,1009d989
< /* MSC */
< pthread_mutex_lock( &mutex1 );
< syncers--;
< syslog(LOG_ERR, "Syncers: %d", syncers);
< pthread_mutex_unlock( &mutex1 );
< /* MSC */
<

Optimistically,

Matt

--
Matthew S. Cramer <mscramer@...>                  Office: 717-396-5032
Project Manager, Planning and Service Management  Fax:    717-396-5590
Armstrong World Industries, Inc.                  Cell:   717-917-7099
Re: [milter-greylist] sync causing crash
2005-10-17 by Matthew S. Cramer