BLU Discuss list archive


Asynchronous File I/O on Linux

Subject: Asynchronous File I/O on Linux
From: bogstad-e+AXbWqSrlAAvxtiuMwx3w at public.gmane.org (Bill Bogstad)
Date: Tue, 18 May 2010 14:06:25 -0400
In-reply-to: <F56F6DB2-C40B-4D3C-8432-EA4B264D3FFA-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
References: <mailman.90988.1274159676.8097.discuss@blu.org> <4BF28EFD.8050304@mohawksoft.com> <8947B611-EA2B-4923-9921-85C6E3E3119A@gmail.com> <AANLkTilLEV3_vU_pmcB9FRWckJ7mWYS7w4vhgEkVJV5i@mail.gmail.com> <F56F6DB2-C40B-4D3C-8432-EA4B264D3FFA@gmail.com>

On Tue, May 18, 2010 at 1:19 PM, Richard Pieri <richard.pieri-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote:
> On May 18, 2010, at 1:01 PM, Bill Bogstad wrote:
>>
>> He also doesn't want to open the file multiple times which causes a
>> problem since a file descriptor can only have one lseek() position at
>> a time.  I can imagine scenarios (for example a library might be
>
> This is why you create a bunch of threads, each with its own file handle, each file handle with its own unique pointer.  You have to open the file multiple times and run each read() in its own thread to truly parallelize the read operations.

Please re-read the end of my last message.  Take a look at pread()
(POSIX) and readahead() (Linux only).  It turns out you do not need
separate file handles: pread() takes the offset as an argument and
leaves the descriptor's lseek() position alone.  Threads may still be
required to make the reads non-blocking.
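
A rough sketch of that idea, using Python's os.pread(), which wraps the
POSIX call.  The helper name read_chunks, the worker count, and the demo
file contents are assumptions made up for illustration; nothing here
comes from the thread itself.  One descriptor is opened once and shared
by all the worker threads:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_chunks(path, requests, workers=4):
    """Read (offset, length) chunks through ONE file descriptor in parallel.

    os.pread() takes an explicit offset and does not move the descriptor's
    file position, so the threads cannot disturb each other's reads.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Each pread() blocks only its own worker thread.
            futures = [pool.submit(os.pread, fd, length, offset)
                       for offset, length in requests]
            # Results come back in the order the requests were given.
            return [f.result() for f in futures]
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical demo data: a 1000-byte scratch file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789" * 100)
    try:
        print(read_chunks(tmp.name, [(0, 4), (10, 3), (995, 10)]))
    finally:
        os.unlink(tmp.name)
```

A read that runs past end of file simply returns fewer bytes, as pread()
itself does.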

>> Err, CPU cycles are practically free compared to disk seeks.  That's
>> why disk schedulers implement things like elevator algorithms rather
>> than FIFO:
>
> They're not free if your poll/select loop is waiting for input from the device.  In other words, you're blocking on the I/O anyway, you're just shifting where you block from the kernel into a loop in your program.

Not necessarily.  Imagine a program asking the kernel the following.
I would like random chunks of data from this file (perhaps I will NEED
them at some specific point later in the computation), but I have some
other computation I can do in the meantime.  Please start the disk I/O
now.  Don't make me create multiple FDs for a single file.  At my
option, I would like you to:

1. signal me in some way when the data is available, or
2. give me a non-blocking operation I can use to check whether the data is ready.
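
With threads allowed, both options can be approximated in user space.
The sketch below is only one way to do it: start_read, on_ready, and the
shared pool are hypothetical names, and the notification is a callback
from a worker thread rather than a real Unix signal.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared pool of worker threads that perform the blocking reads.
_pool = ThreadPoolExecutor(max_workers=4)


def start_read(fd, offset, length, on_ready=None):
    """Start a pread() in a worker thread and return a Future at once."""
    future = _pool.submit(os.pread, fd, length, offset)
    if on_ready is not None:
        # Option 1: notification.  The callback runs in the worker thread,
        # or right away in the caller if the read has already finished.
        future.add_done_callback(lambda f: on_ready(f.result()))
    return future


if __name__ == "__main__":
    ready = threading.Event()
    fd = os.open(__file__, os.O_RDONLY)
    try:
        fut = start_read(fd, 0, 16, on_ready=lambda data: ready.set())
        steps = 0
        while not fut.done():   # Option 2: non-blocking readiness check
            steps += 1          # stand-in for other useful computation
        ready.wait()            # the callback may fire just after done()
        print(fut.result(), "after", steps, "steps of other work")
    finally:
        os.close(fd)
```

The caller never blocks unless it chooses to call result() before the
read has completed, which matches the "at my option" part of the request.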

A bonus would be the ability to submit multiple such requests to the
kernel with a single system call, so I can be sure the disk scheduler's
algorithm has all my requests as soon as possible.
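
A partial approximation, sketched under stated assumptions: Python
exposes posix_fadvise(), and the POSIX_FADV_WILLNEED hint plays a role
similar to readahead() by asking the kernel to start pulling a range
into the page cache.  This does NOT achieve the single-system-call
bonus, since it still costs one call per range; it only gets the hints
to the kernel early from one thread.  The name prefetch is made up for
this example.

```python
import os


def prefetch(fd, ranges):
    """Hint that each (offset, length) range will be needed soon.

    Returns how many hints the kernel accepted.  The hints are purely
    advisory, so a platform or filesystem that rejects them yields a
    lower count (possibly 0) rather than an error.
    """
    if not hasattr(os, "posix_fadvise"):
        return 0  # assumption: treat a missing call as "no hints issued"
    accepted = 0
    for offset, length in ranges:
        try:
            # One system call per range; there is no batching here.
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
            accepted += 1
        except OSError:
            pass  # advisory only: a refused hint is not fatal
    return accepted


if __name__ == "__main__":
    fd = os.open(__file__, os.O_RDONLY)
    try:
        print("hints accepted:", prefetch(fd, [(0, 64), (128, 64)]))
        # ... other computation would go here ...
        print(os.pread(fd, 16, 0))  # likely served from the page cache
    finally:
        os.close(fd)
```

Later pread() calls on the hinted ranges still return correct data
whether or not the hints were honored; the hints only affect timing.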

I don't see why this is an absurd way to want to do computation.  It
may not be possible in a Linux/POSIX environment, but I don't see why
it isn't worth thinking about how to come close to it.  If I hadn't,
I would never have learned about pread() and readahead(), which look
to be very useful.  If you allow threads, it would appear that just
about everything except the bonus part is doable.

Bill Bogstad


