Yahoo Groups archive

Milter-greylist

Re: [milter-greylist] ext4fs reliability bug

2009-03-20 by Greg Troxel

kogan <kogan@...> writes:

> fclose() guarantees that internal stdio lib buffers are "flushed" to disk
> using write() and close() syscalls. Years ago that was a guarantee that
> data had in fact made its way to persistent storage (in the early days
> of Unix read/write syscalls were translated to actual read/write
> operations of a storage device).

I think it was only supposed to be a guarantee that the metadata had hit
the disk, but it may be that in practice the writes were scheduled on
the data and the RK05 driver had no reordering.

> So, fclose() only pushes user-level buffers down into the kernel but does
> not guarantee that changes are really written to disk. Guaranteeing
> that requires not only writing any dirty kernel buffers but also
> issuing a bus/drive-specific command to flush the device's internal
> buffer. And believe me, you don't want to do it on every file
> close(). Some devices tend to execute the flush command far longer than
> others, and some flush not only write-back but also read-ahead caches,
> which impacts subsequent read() operations.

Agreed, but fsync really needs to ensure that the data is on disk, not
just in a disk cache, and there's FUA and tagged queuing and this seems
to be quite hard in practice.

> That is why we need explicit fsync()/fdatasync() to ensure actual disk
> writes and internal buffer flushes.
>
> About fclose() macro:
>
> I believe there is no need to fsync() data before each close(). Most of
> the files we write are of no value to us after a system restart. The only
> file that matters is the database dump. I think the only fsync() we
> need is before closing the database dump file.

I agree that only files that need transaction properties should be
fsynced. But that may be most of them.

I also do not understand how fclose(3) can be combined with fsync. It
seems it really needs an FSYNC flag and doesn't have one. So we need to do

    fflush(fp);
    fsync(fileno(fp));
    fclose(fp);

in these cases.
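
As a rough sketch of that sequence in C, with error checking: the
helper name fclose_sync() is made up for illustration and is not
something that exists in milter-greylist or in libc. It assumes a POSIX
system where fileno() and fsync() are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush, sync, and close a stdio stream.
 * Returns 0 if every step succeeded, -1 otherwise.
 */
int fclose_sync(FILE *fp)
{
    int failed = 0;

    /* Push stdio's user-space buffer down into the kernel. */
    if (fflush(fp) != 0)
        failed = 1;

    /* Ask the kernel to commit the file's data to stable storage. */
    if (!failed && fsync(fileno(fp)) != 0)
        failed = 1;

    /* Close even after an earlier error so the stream is not leaked. */
    if (fclose(fp) != 0)
        failed = 1;

    return failed ? -1 : 0;
}
```

Whether fsync() really reaches the platter, as opposed to the drive's
cache, still depends on the kernel and the device, as discussed above.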

One can argue that the real trouble in ext4 is not that data is pending,
but that it's wrong to commit a rename to the journal before all data
associated with the new file has been committed. I don't know if soft
updates gets this right, or if NetBSD's WAPBL gets it right either.

So in the real world, we should fsync.

It's not just about avoiding data that's a bit stale - that's what
happens when the fsync doesn't happen. It's about replacing the data
with a zero-length file, which is much worse.
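
To make the rename case concrete, here is a minimal sketch of replacing
a dump file so that the new data is synced before the rename happens.
The function name, its parameters, and the temporary-file approach are
assumptions for illustration, not the actual milter-greylist code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `contents` to `tmp_path`, sync it, then
 * rename it over `path`. Returns 0 on success, -1 on failure (in which
 * case the temporary file is removed and `path` is left untouched).
 */
int replace_file_durably(const char *path, const char *tmp_path,
                         const char *contents)
{
    FILE *fp = fopen(tmp_path, "w");
    if (fp == NULL)
        return -1;

    /* Write, flush stdio buffers, then fsync before the rename. */
    int ok = fputs(contents, fp) != EOF
          && fflush(fp) == 0
          && fsync(fileno(fp)) == 0;

    if (fclose(fp) != 0)
        ok = 0;

    /* Only rename once the new file's data has been synced. */
    if (!ok || rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

With the fsync in place, a crash shortly after the rename should leave
either the old dump or the complete new one, rather than an empty file.
The sketch does not try to make the rename itself durable.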
