Yahoo Groups archive

Milter-greylist

milter-greylist and p0f: socket dialog problem

2013-12-05 by Jim Klimov

Hello all,

This is a question about p0f, since I can't seem to contact its
author, and I hoped someone in the milter-greylist community has
any experience with it too (version 3.06b).

I've enabled p0f on one of the deployments, and found that after a
while the p0f daemon consumes all of the CPU with about 75% of the
time spent in kernel. Tracing shows that it is busy looping in the
"poll (pollsys) - read" cycle that can be seen in p0f.c around line
970. Apparently, the problem happens when a client does not close
the dialog on the socket gracefully, but it also happens with the
stock p0f-client program, despite the presence of a close(sock).

The loop in p0f is supposed to poll with a 250 msec timeout, which
it does with little impact until a client disconnects (be it a
restart of milter-greylist or a query with p0f-client). From then
on the loop runs without any delay and fires several tens of
thousands of times per second on this box. With every further client
disconnection, the p0f daemon sees one more FD reported as ready
to be read, and it reads them all on every pass, to no avail:

pollsys(0x080BC6F6, 5, 0x08047B30, 0x00000000)  = 2
read(6, 0x080ACA42, 21)                         = 0
read(8, 0x080ACB4E, 21)                         = 0
pollsys(0x080BC6F6, 5, 0x08047B30, 0x00000000)  = 2
read(6, 0x080ACA42, 21)                         = 0
read(8, 0x080ACB4E, 21)                         = 0
pollsys(0x080BC6F6, 5, 0x08047B30, 0x00000000)  = 2
read(6, 0x080ACA42, 21)                         = 0
read(8, 0x080ACB4E, 21)                         = 0
pollsys(0x080BC6F6, 5, 0x08047B30, 0x00000000)  = 3
accept(5, 0x00000000, 0x00000000, SOV_DEFAULT)  = 10
fcntl(10, F_SETFL, FNONBLOCK)                   = 0
pollsys(0x080BC6F6, 6, 0x08047B30, 0x00000000)  = 3
read(6, 0x080ACA42, 21)                         = 0
read(8, 0x080ACB4E, 21)                         = 0
read(10, "01 F 0 P04 Q05 q05\0\0\0".., 21)      = 21
pollsys(0x080BC6F6, 6, 0x08047B30, 0x00000000)  = 3
read(6, 0x080ACA42, 21)                         = 0
read(8, 0x080ACB4E, 21)                         = 0
write(10, "02 F 0 P  \0\0\0\0\0\0\0".., 232)    = 232
pollsys(0x080BC6F6, 6, 0x08047B30, 0x00000000)  = 3
read(6, 0x080ACA42, 21)                         = 0
read(8, 0x080ACB4E, 21)                         = 0
read(10, 0x080ACD66, 21)                        = 0
pollsys(0x080BC6F6, 6, 0x08047B30, 0x00000000)  = 3
read(6, 0x080ACA42, 21)                         = 0
read(8, 0x080ACB4E, 21)                         = 0
read(10, 0x080ACD66, 21)                        = 0

I am ready to accept that this may be some glitch of Solaris 10u8 x86
involved as the platform; but still - does anyone have any ideas how
to fix or work-around this?

For example, I see that read() returns 0 bytes repeatedly - can this
be a valid reason to forcibly close the file descriptor from the p0f
daemon side? Should there be some grace period for such closure (i.e.
count that this happens 10 times in a row for this descriptor, or 10000)?

Thanks for ideas,
//Jim Klimov
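
On the grace-period question: for a stream socket, POSIX defines a read() return of 0 as end-of-file, meaning the peer has closed its end. A non-blocking socket that merely has nothing to deliver yet fails with EAGAIN instead, so the two cases should not be confused. The sketch below illustrates the distinction on a local socketpair(). The names classify_read() and demo_read_outcomes() are hypothetical and for illustration only, not p0f code, and the 21-byte buffer simply mirrors the read size seen in the trace above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from p0f): read once from a non-blocking
   stream socket and classify the outcome.
   Returns  1 if data arrived,
            0 on end-of-file (the peer closed its end),
           -1 if there is nothing to read yet (EAGAIN/EWOULDBLOCK),
           -2 on any other error. */
int classify_read(int fd) {
  char buf[21];
  ssize_t n = read(fd, buf, sizeof(buf));

  if (n > 0) return 1;
  if (n == 0) return 0;
  if (errno == EAGAIN || errno == EWOULDBLOCK) return -1;
  return -2;
}

/* Hypothetical demo: records the outcome for an idle peer, a peer that
   sent one byte, and a peer that has closed, in out[0], out[1], out[2].
   Returns 0 on success, -1 if the test sockets could not be set up. */
int demo_read_outcomes(int out[3]) {
  int sv[2];

  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

  if (fcntl(sv[0], F_SETFL, O_NONBLOCK) != 0) {
    close(sv[0]);
    close(sv[1]);
    return -1;
  }

  out[0] = classify_read(sv[0]);      /* peer open, nothing sent */

  if (write(sv[1], "x", 1) != 1) {
    close(sv[0]);
    close(sv[1]);
    return -1;
  }
  out[1] = classify_read(sv[0]);      /* one byte pending */

  close(sv[1]);
  out[2] = classify_read(sv[0]);      /* peer closed: end-of-file */

  close(sv[0]);
  return 0;
}
```

If this reading is right, a single zero-length read is already conclusive for a stream socket, so counting 10 or 10000 repeats would only delay the cleanup.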

Re: [milter-greylist] milter-greylist and p0f: socket dialog problem

2013-12-05 by Jim Klimov

On 2013-12-05 17:50, Jim Klimov wrote:
> Hello all,
>
> This is a question about p0f, since I can't seem to contact its
> author, and I hoped someone in the milter-greylist community has
> any experience with it too (version 3.06b).

FWIW, I added this check to p0f.c:979 and it worked for me:

           if (i < 0) PFATAL("read() on API socket fails despite POLLIN.");

+          if (i == 0) {
+            pfds[cur].revents |= POLLHUP;
+            pfds[cur].revents |= POLLERR;
+            DEBUG("[#] API connection on fd[%d]=%d has ended: read returned zero.\n",
+                cur, pfds[cur].fd);
+          }

           ctable[cur]->in_off += i;


The flags set here are checked a bit later as the condition for a
closed dialog, and they trigger the logic that tears its server side down.

No more CPU hogging now :)

Still hoping for comments on whether this was the right thing to do...

//Jim
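
The idea behind the patch can be shown in isolation with a small stand-in loop. This is a sketch under stated assumptions, not the p0f implementation: drain_until_eof() is a hypothetical function, it watches a single descriptor rather than p0f's pfds[] table, and it closes the descriptor directly instead of setting POLLHUP for later teardown code to pick up.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for an API socket loop (not p0f code).
   Polls one client fd with a 250 ms timeout and treats read() == 0 as
   a hangup, closing the fd instead of polling it again.
   Returns the number of bytes received before end-of-file, or -1 on
   error or if max_rounds polls pass without the peer closing. */
long drain_until_eof(int fd, int max_rounds) {
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  long total = 0;
  char buf[21];

  for (int round = 0; round < max_rounds; round++) {
    int pret = poll(&pfd, 1, 250);

    if (pret < 0) return -1;
    if (pret == 0) continue;          /* timeout: nothing happened */

    if (pfd.revents & POLLIN) {
      ssize_t n = read(fd, buf, sizeof(buf));

      if (n < 0) return -1;
      if (n == 0) {                   /* end-of-file: peer is gone */
        close(fd);
        return total;
      }
      total += n;
    } else if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) {
      close(fd);                      /* hangup or error without data */
      return total;
    }
  }

  return -1;
}
```

Reading pending data before acting on a hangup is a deliberate choice in this sketch: some platforms report POLLHUP together with POLLIN while unread bytes are still queued, and closing on the flag alone would drop them.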

Re: [milter-greylist] milter-greylist and p0f: socket dialog problem

2013-12-06 by manu@...

Jim Klimov <jimklimov@...> wrote:

> I am ready to accept that this may be some glitch of Solaris 10u8 x86
> involved as the platform; but still - does anyone have any ideas how
> to fix or work-around this?

How is it implemented in p0f code? select(2) or poll(2) loop? Perhaps
there is something wrong with the timeout?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@...

Re: [milter-greylist] milter-greylist and p0f: socket dialog problem

2013-12-06 by Jim Klimov

On 2013-12-06 02:12, manu@... wrote:
> Jim Klimov <jimklimov@...> wrote:
>
>  > I am ready to accept that this may be some glitch of Solaris 10u8 x86
>  > involved as the platform; but still - does anyone have any ideas how
>  > to fix or work-around this?
>
> How is it implemented in p0f code? select(2) or poll(2) loop? Perhaps
> there is something wrong with the timeout?

It is a poll with a 250 msec timeout, as can be seen in the
live_event_loop() function at p0f.c:874. The code calls poll(), then
interprets its results and acts on them with a series of switch statements:

   while (!stop_soon) {

     s32 pret, i;
     u32 cur;

     /* We use a 250 ms timeout to keep Ctrl-C responsive without resorting to
        silly sigaction hackery or unsafe signal handler code. */

poll_again:

     pret = poll(pfds, pfd_count, 250);
...


Ultimately, by line 973, it determines that there is some data
to read() from one file descriptor or another. However, in the
pathological case, the reads return zero bytes, and the descriptor
is still flagged as readable on every poll(), so it becomes a
full-speed infinite loop.

I read up a bit on this yesterday, and it seems that the proper
solution is to close the connection if read() returns zero bytes
while poll() is signalling that "something" is available.

//Jim
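
The spin described above can be reproduced outside p0f. The following is a minimal sketch, assuming a local socketpair() behaves like the API socket in this respect. count_eof_wakeups() is a hypothetical name, and the function deliberately never closes the descriptor so that the pathological behaviour shows.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo (not p0f code): the peer of a socketpair is closed
   up front, then the surviving end is polled `rounds` times with a
   250 ms timeout and is never closed in between.
   Returns how many rounds reported the fd as readable and then read
   zero bytes, or -1 if the socketpair could not be created. */
int count_eof_wakeups(int rounds) {
  int sv[2];
  char buf[21];
  int wakeups = 0;

  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
  close(sv[1]);                       /* the "client" goes away */

  struct pollfd pfd = { .fd = sv[0], .events = POLLIN };

  for (int i = 0; i < rounds; i++) {
    pfd.revents = 0;

    /* With end-of-file pending, poll() reports the fd as ready at once
       instead of sleeping for the 250 ms timeout. */
    if (poll(&pfd, 1, 250) > 0 &&
        (pfd.revents & POLLIN) &&
        read(sv[0], buf, sizeof(buf)) == 0)
      wakeups++;
  }

  close(sv[0]);
  return wakeups;
}
```

If every round counts as a wakeup, the 250 ms timeout is never actually waited out, which would match the tens of thousands of iterations per second seen in the trace.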
