Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month at the Massachusetts Institute of Technology, in Building E51.

BLU Discuss list archive



[Discuss] Gluster startup, small-files performance



On Wed, May 14, 2014 at 11:25 AM, F. O. Ozbek <ozbek at gmx.com> wrote:
>
>
> On 05/14/2014 11:13 AM, Richard Pieri wrote:
>>
>> F. O. Ozbek wrote:
>>>
>>> The data gets written. We have tested it.
>>
>>
>> Ignoring fsync/O_SYNC means that the file system driver doesn't flush
>> its write buffers when instructed to do so. Maybe the data gets written.
>> Maybe not. You can't be sure of it unless writes are atomic, and you
>> don't get that from MooseFS.
>>
>> So, like I said, good luck with that. Because outages like the one that
>> clobbered Cambridge back in November 2012 happen.
>>
>
> That is the whole point, "doesn't flush its write buffers when instructed to
> do so". You don't need to instruct. The data gets written
> all the time. When we have done the tests, we have done tens of thousands of
> writes (basically checksum'ed test files) and
> read tests succeeded all the time. OK, I admit, you probably
> do not want to run your transactional financial applications on moosefs
> but the reality is that these filesystems are used in research
> environments where high I/O bandwidth is the key.
> The fact that it doesn't support forcing the buffer to the disk
> is not the problem in this case. Glusterfs will start giving
> you random I/O errors under heavy load. How is that any good?
>
> I don't know what you are referring to in Cambridge but
> we are not Cambridge.

The issue is how much data you lose if the power goes off. In
particular, what data do you lose from file descriptors that have been
closed, or whose most recent operation was an fsync() call? If a
filesystem caches writes for an extended period before committing them
to disk, you could lose significant data in a power failure or system
crash. Is there a delay between the time an fsync() or a close()
completes and the time all data and metadata reach stable storage?

I believe Richard is saying that there can be a delay with MooseFS and
that is a problem. I recall hearing something like that myself, but the
current FAQ for MooseFS seems to imply that it does handle
close()/fsync() correctly:

http://www.moosefs.org/moosefs-faq.html#wriiten
...
Hence, before executing close, it is recommended (especially when
using MooseFS) to perform an fsync operation after writing to a file
and then checking the status of the result of the fsync operation.
Then, for good measure, also check the return status of close as well.
...

That seems to imply that if MooseFS had this problem, it doesn't have
it anymore. Without careful testing (or reading the source), that is
difficult to verify. Getting the timing right to provoke such a failure
(essentially pulling the plug at the wrong time, or a sudden non-disk
hardware failure) can be extremely difficult to arrange. I don't know
if you tested for this failure mode. I don't know when Richard last
checked on this, and it is possible that it was broken at one time and
has been fixed since then.
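
The sequence the FAQ passage recommends (write, fsync and check the
result, then close and check that too) might look like the following
minimal Python sketch. The function name write_durably is made up for
illustration and is not part of MooseFS. In Python, os.write(),
os.fsync() and os.close() report failure by raising OSError, so
"checking the status" here means letting that exception propagate.
Whether a successful fsync() really means the data is on stable storage
still depends on the filesystem underneath.

```python
import os


def write_durably(path, data):
    """Write bytes to path, then fsync and close, surfacing any error.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the filesystem to flush this file; raises OSError on failure.
        os.fsync(fd)
    finally:
        # close() can also report a deferred write error, so it is not
        # wrapped in anything that would hide an exception.
        os.close(fd)
```

A caller would wrap write_durably("foobar", payload) in a try/except
OSError and treat any exception as "the data may not be on disk".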

It may also be that you don't actually care. Unless you are logging
transactions that need to be kept consistent, the higher performance
that could come with aggressive write caching might be worth the
reduced reliability. (OTOH, losing all data written to a file that has
been open for writing for days would seem to be a bit harsh.)

Of course, as the FAQ points out, many programmers assume even tighter
guarantees from a filesystem, i.e. that only a relatively small amount
of data can be lost even if an fsync()/close() hasn't been done. I
don't think POSIX requires this, but since most filesystems seem to
work this way we have gotten used to that behavior. This is why the FAQ
suggests doing something like:

prog | fsync-cat > foobar

rather than

prog > foobar

Pipes tend to have relatively small buffers, so prog can never get far
ahead of the reader. By using an fsync()ing cat program you can limit
how much unsynced data is at risk, and so guard against data loss when
using MooseFS.
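
I don't know of a standard utility called fsync-cat, so the following
is only a sketch of what such a program might do, written in Python.
The function name fsync_cat and the chunk_size parameter are
assumptions for illustration. It copies one file descriptor to another
and calls fsync() after every chunk, so the unsynced data at any moment
is bounded by roughly one chunk plus whatever sits in the pipe buffer.

```python
import os


def fsync_cat(in_fd, out_fd, chunk_size=65536):
    """Copy in_fd to out_fd, calling fsync on out_fd after each chunk.

    Hypothetical sketch; returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = os.read(in_fd, chunk_size)
        if not chunk:
            break  # end of input
        view = memoryview(chunk)
        while view:
            # Handle short writes by looping until the chunk is written.
            written = os.write(out_fd, view)
            view = view[written:]
        # Raises OSError if the flush fails or if out_fd does not
        # support fsync (for example a terminal or a pipe).
        os.fsync(out_fd)
        total += len(chunk)
    return total
```

Used as a filter, it would be called as fsync_cat(0, 1) with standard
output redirected to a regular file, as in the "> foobar" example
above. A smaller chunk_size narrows the window of possible loss at the
cost of more fsync() calls.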

Bill Bogstad



BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org