milter-greylist and p0f: socket dialog problem
2013-12-05 by Jim Klimov
Hello all, This is a question about p0f, since I can't seem to contact its author, and I hoped someone in the milter-greylist community has any experience with it too (version 3.06b). I've enabled p0f on one of the deployments, and found that after a while the p0f daemon consumes all of the CPU with about 75% of the time spent in kernel. Tracing shows that it is busy looping in the "poll (pollsys) - read" cycle that can be seen in p0f.c around line 970. Apparently, the problem happens when a client does not close the dialog on the socket gracefully, but it also happens with the stock p0f-client program, despite the presence of a close(sock). The loop in p0f is supposed to poll with a 250msec timeout, which it does with little impact, until a client disconnects (be it a restart of the milter-greylist or a query with p0f-client) - then the loop without delays begins, and fires several tens of thousands of times per second, on this box. If the new client disconnection repeats, I see the p0f daemon receiving more and more FD's ready to be read - and it is reading them all too, to no avail: pollsys(0x080BC6F6, 5, 0x08047B30, 0x00000000) = 2 read(6, 0x080ACA42, 21) = 0 read(8, 0x080ACB4E, 21) = 0 pollsys(0x080BC6F6, 5, 0x08047B30, 0x00000000) = 2 read(6, 0x080ACA42, 21) = 0 read(8, 0x080ACB4E, 21) = 0 pollsys(0x080BC6F6, 5, 0x08047B30, 0x00000000) = 2 read(6, 0x080ACA42, 21) = 0 read(8, 0x080ACB4E, 21) = 0 pollsys(0x080BC6F6, 5, 0x08047B30, 0x00000000) = 3 accept(5, 0x00000000, 0x00000000, SOV_DEFAULT) = 10 fcntl(10, F_SETFL, FNONBLOCK) = 0 pollsys(0x080BC6F6, 6, 0x08047B30, 0x00000000) = 3 read(6, 0x080ACA42, 21) = 0 read(8, 0x080ACB4E, 21) = 0 read(10, "01 F 0 P04 Q05 q05\0\0\0".., 21) = 21 pollsys(0x080BC6F6, 6, 0x08047B30, 0x00000000) = 3 read(6, 0x080ACA42, 21) = 0 read(8, 0x080ACB4E, 21) = 0 write(10, "02 F 0 P \0\0\0\0\0\0\0".., 232) = 232 pollsys(0x080BC6F6, 6, 0x08047B30, 0x00000000) = 3 read(6, 0x080ACA42, 21) = 0 read(8, 0x080ACB4E, 21) = 0 read(10, 0x080ACD66, 21) = 0 pollsys(0x080BC6F6, 6, 0x08047B30, 0x00000000) = 3 read(6, 0x080ACA42, 21) = 0 read(8, 0x080ACB4E, 21) = 0 read(10, 0x080ACD66, 21) = 0 I am ready to accept that this may be some glitch of Solaris 10u8 x86 involved as the platform; but still - does anyone have any ideas how to fix or work-around this? For example, I see that read() returns 0 bytes repeatably - can this be a valid reason to forcibly close the file descriptor from p0f daemon side? Should there be some grace period for such closure (i.e. count that this happens 10 times in a row for this descriptor, or 10000)? Thanks for ideas, //Jim Klimov