
BLU Discuss list archive



Asynchronous File I/O on Linux



On Tue, May 18, 2010 at 10:20 AM, Richard Pieri <richard.pieri-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote:
> On May 18, 2010, at 8:58 AM, Mark Woodward wrote:
>>
>> Wait, even from a pedantic perspective, asynchronous I/O is the ability to issue disk I/O requests without blocking the process or thread. I am merely attempting to use this ability to optimize a particular type of operation on a file.
>
> No, you're not.  Really.  Try this: open() a file handle with the async I/O option, then try to read() and see what happens.  Experiment, because the results are not what you seem to expect.

He already knows that he can't do what he wants with the standard
read() function.  He is trying to determine if some other function
will work.

>
>> With tagged queuing on SATA and SCSI before it, a driver is able to issue multiple requests simultaneously to the device and the device is supposed to be able to get requested blocks in cache and return them over the device I/O bus.
>
> This is concurrent I/O.

Can we stop harping on what to call what he wants to accomplish?
Fine, we'll call it concurrent IO.   So how does he do the
equivalent (submit multiple requests in one operation) from a
Linux/UNIX application?  I'm guessing that he wants to interleave the
IO and computation time in his app and therefore wants to submit the
requests in a non-blocking fashion and either receive an event or poll
for completion status.
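For what it's worth, that submit-then-poll pattern can be sketched with POSIX AIO (aio_read()/aio_error()/aio_return()).  This is just an illustration of the shape of the API, not anything from the original thread; the polling interval is arbitrary, and on older glibc you link with -lrt:

```c
/* Sketch: queue a read without blocking, then poll for completion
 * with POSIX AIO.  Link with -lrt on older glibc. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static ssize_t read_async(const char *path, void *buf, size_t count, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = count;
    cb.aio_offset = offset;

    if (aio_read(&cb) != 0) {            /* request queued; returns immediately */
        close(fd);
        return -1;
    }

    /* The app could do computation here; we just poll for completion. */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);

    ssize_t n = aio_return(&cb);         /* bytes read, or -1 on error */
    close(fd);
    return n;
}
```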

He also doesn't want to open the file multiple times which causes a
problem since a file descriptor can only have one lseek() position at
a time.   I can imagine scenarios (for example a library might be
passed in an already opened FD) which would make it helpful to be able
to do this.  I've heard no comment about why using /proc/self/fd/###
to reopen the same file is not an acceptable solution to that part of
his problem.
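The /proc/self/fd trick mentioned above is only a few lines.  Reopening through that path yields a new open file description with its own independent offset, even when the original descriptor was passed in by a caller (a minimal sketch; error handling is abbreviated):

```c
/* Sketch: give yourself an independent lseek() position on a file
 * you only know by an already-open descriptor, by reopening it
 * through /proc/self/fd/N.  Linux-specific. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int reopen_fd(int fd)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    return open(path, O_RDONLY);   /* new descriptor, new file offset */
}
```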

>> (3) A "few" (4) milliseconds shaved off a function that is run half a million to a million times is between 1/2 hour and an hour of wall clock time. That is important.
>
> If you stripe across three spindles then you cut your access times by approximately 33% without having to code anything.  But if you code it anyway then your code will probably run *slower* because you're wasting CPU cycles trying to optimize something with completely different seek timings from what you expect from a single spindle.

Err, CPU cycles are practically free compared to disk seeks.   That's
why disk schedulers implement things like elevator algorithms rather
than FIFO:

http://en.wikipedia.org/wiki/Elevator_algorithm

These algorithms tend to work better with longer queues of requests,
and he wants to fill the scheduler's queue with pending requests.
Think of this as application-directed file read ahead, rather than the
more naive "read the next NN Kbytes".

You are assuming that the particular set of requests his SINGLE (go
fast) app wants to make happen to have been written by the filesystem
such that they land in stripes on different spindles.   Striping data
works well for high speed sequential file IO, but may have little or
no benefit for other access patterns.   A good mirrored disk
implementation should speed up reads with any access pattern (though
it's no help for writes).  Of course, this is ONLY true if the (go
fast) app has a way to efficiently submit multiple requests, and it
also helps if you can interleave your computation with IO time.

BTW, in thinking/googling about this; I ran across two interesting
system calls which could help:

pread(fd, buf, count, offset) - sort of like a combined
lseek()/read().  Unfortunately, it is blocking so you would have to
have one thread per request and some kind of synchronization with the
main computation thread.   The advantage is that it does not change
the file offset, so multiple threads could use the same FD.

readahead(fd, offset, count) - Linux-specific way to pre-populate the
page cache with data from a file.  I/O is done in whole pages, so
offset is rounded down and offset+count rounded up to page
boundaries.   This is a blocking operation, so threads would be
required.  It doesn't affect the file offset, so the IO threads could
use the same FD as the computation thread.
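A minimal sketch of the call itself (the function name `prefetch` is mine, not from the thread; readahead() needs _GNU_SOURCE and returns 0 on success):

```c
/* Sketch: Linux readahead() pre-populates the page cache for a byte
 * range of an open file.  The kernel rounds the range to page
 * boundaries; the call blocks until the read ahead is initiated. */
#define _GNU_SOURCE
#include <fcntl.h>

static int prefetch(int fd, off_t offset, size_t count)
{
    return (int)readahead(fd, offset, count);  /* 0 on success, -1 on error */
}
```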

One approach might be to have the computation thread spawn a new IO
thread for each readahead() call.   The computation thread would then
call the regular read() when it actually needed the data.    If the
IO interleaving worked, the read() would return immediately; if not,
the computation thread would block until the data was available.   If
the computation thread can't be allowed to ever block, then having the
IO thread use pread() and some kind of completion queue for the
computation thread to examine might work.
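The spawn-a-thread-per-readahead() idea above might look roughly like this.  This is a sketch under my own assumptions (one detached request struct per thread, no thread pool, no error handling beyond the basics), not a tested design from the thread:

```c
/* Sketch: spawn one IO thread per readahead() request so the blocking
 * call happens off the computation thread; the computation thread
 * later read()s from the (hopefully warm) page cache. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct ra_req { int fd; off_t off; size_t len; };

static void *ra_worker(void *arg)
{
    struct ra_req *r = arg;
    readahead(r->fd, r->off, r->len);   /* blocks in this thread only */
    free(r);
    return NULL;
}

static int start_readahead(int fd, off_t off, size_t len, pthread_t *tid)
{
    struct ra_req *r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->fd = fd; r->off = off; r->len = len;
    return pthread_create(tid, NULL, ra_worker, r);
}
```

Compile with -pthread.  The computation thread keeps the same fd; readahead() never touches the file offset, so there is no lseek() collision.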

If threads are totally unacceptable, I don't see how to do this.   I
see no way to submit multiple requests in a single system call, so if
you want (near) simultaneous requests (to feed the elevator algorithm)
you seem to need one thread per request.  A single system call that
combined pread() and readv() would be nice.  (Recent kernels actually
have one: preadv(), added in Linux 2.6.30 / glibc 2.10, though it
reads all its buffers starting from a single offset.)

Bill Bogstad







BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org