
BLU Discuss list archive



How to best do zillions of little files?



I had a scenario where I was trying to create 300,000 files in
one directory.  The files were named V[number], where [number]
increased monotonically from 0 to 299999.  I killed the process
after waiting a couple of hours.

Breaking it up into directories of about 2000 files each REALLY
helped.  In my case I used a one-level-deep sub-directory tree,
where the directories were named D[number] and, with n files per
directory, D[i] contained files V[i*n] to V[(i+1)*n-1].  Creating
the tree of 300,000 files using this method took about 5 minutes,
and lookups are also fast.  There are only 150 directories involved,
with 2000 files each, and I only need to know a file's name in order
to know which directory to find it in.
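The bucketing rule above is simple enough to sketch.  The Python below
is a minimal illustration under stated assumptions, not Derek's actual
code: the helper names `bucket_dir`, `file_path` and `create_tree`, and
the placeholder file contents, are hypothetical.  It assumes n = 2000
files per directory, so V[number] lives in D[number // n] and V299999
lands in D149, the 150th directory.

```python
import os
import tempfile

# Assumption: n = 2000 files per directory, as in the message above.
FILES_PER_DIR = 2000


def bucket_dir(number: int, n: int = FILES_PER_DIR) -> str:
    """Name of the directory D[i] holding V[number]; i = number // n."""
    return f"D{number // n}"


def file_path(root: str, number: int, n: int = FILES_PER_DIR) -> str:
    """Full path of V[number], e.g. root/D2/V4321 when n = 2000."""
    return os.path.join(root, bucket_dir(number, n), f"V{number}")


def create_tree(root: str, count: int, n: int = FILES_PER_DIR) -> None:
    """Create V0 .. V{count-1}, at most n files per D[i] directory."""
    for number in range(count):
        path = file_path(root, number, n)
        if number % n == 0:
            # First file of a new bucket: create its directory once.
            os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(f"placeholder contents of V{number}\n")  # hypothetical payload


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        create_tree(root, 10_000)
        print(sorted(os.listdir(root)))  # → ['D0', 'D1', 'D2', 'D3', 'D4']
        print(bucket_dir(299_999))  # → D149
```

Because the directory is computed from the file number alone, a lookup
never has to list or scan any directory to find the file.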

-derek

John Chambers <jc at trillian.mit.edu> writes:

> I have a job that entails on the order of 50 million web pages (they
> say this week ;-), a few Kbytes each.  Now, Unix file systems are
> generally known to not work that well when you have millions of files
> in a single directory, and the general approach of splitting it up
> into a tree is well known.  But I haven't seen any good info about
> Linux file systems, and the obvious Google keywords seem to get lots
> of interesting but irrelevant stuff.
> 
> Anyone know of some good info on this topic for various file systems?
> Is there a generally-useful directory scheme that makes it work well
> (or at least not too poorly) on all Linux file systems?
> 
> There's also the possibility of trying the DB systems, but it'd be a
> bit disappointing to spend months doing this and find that the best
> case is an order of magnitude slower than the dumb nested-directory
> approach.  (I've seen this so many times already that I consider it
> the most likely outcome of storing files as records in a DB.  ;-)
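For 50 million pages, one level of 2000-file buckets would still put
25,000 subdirectories in the top directory, so a second level helps.
One common variant of the nested-directory approach, not something
proposed in this thread, hashes each page's key (for example its URL)
and uses the leading hex digits of the digest as directory names.  The
sketch below assumes SHA-1 and two levels of two hex digits each, which
gives 65,536 leaf directories and roughly 760 files per leaf at 50
million pages; `hashed_path`, `store` and `load` are hypothetical names.

```python
import hashlib
import os
import tempfile


def hashed_path(root: str, key: str, levels: int = 2, width: int = 2) -> str:
    """Map an arbitrary key (e.g. a URL) to root/<ab>/<cd>/<full digest>.

    Each of `levels` directory names is `width` hex digits taken from the
    SHA-1 digest of the key, so the defaults give 16**2 = 256 entries per
    level and 256 * 256 = 65,536 leaf directories.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, digest)


def store(root: str, key: str, data: bytes) -> str:
    """Write `data` under the hashed path for `key`; return that path."""
    path = hashed_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


def load(root: str, key: str) -> bytes:
    """Read back whatever store() wrote for `key`."""
    with open(hashed_path(root, key), "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store(root, "http://example.com/index.html", b"<html>...</html>")
        print(load(root, "http://example.com/index.html"))  # → b'<html>...</html>'
        print(50_000_000 // (256 * 256))  # → 762 files per leaf, on average
```

Hashing trades the readable numeric layout of Derek's scheme for an
even spread when the keys are arbitrary strings.  The digest cannot be
reversed, so if the original URL matters it has to be kept somewhere
else, for instance inside the file itself.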

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org