
BLU Discuss list archive



Log rotation complexity (was Re: logrolling for dummies without root access)



On Fri, Aug 03, 2001 at 05:47:37PM -0000, Seth Gordon wrote:
> On one of our machines at work (running Digital Unix, if that
> matters), there's a daemon process that spits information into a log
> file.  We want to rotate that log file without stopping the daemon.

Ah, log rotation...  This for some reason has always intrigued me.
It's seemingly a simple thing, but because of some nuances of file I/O
in Unix, it turns out that it can be quite an interesting problem.

Many people mentioned logrotate, which is a great package...  But
Jerry hit upon the key problem that you face: Unless you can convince
your daemon to close the log file (at least temporarily), rotating
your logs in some of the more obvious ways will only result in your
filesystems mysteriously filling up.  This is because of the way Unix
handles file I/O.

When a process opens a file, the kernel creates various tables in
memory for that file.  So long as any process has that file open, the
physical file will remain on the disk, no matter what you do to it.
Even if you remove the file (thereby removing the directory entry),
the file will still physically reside on the disk, accessible to the
kernel via its inode entry in the filesystem, which is not removed
until all programs have closed the file and its link count goes to
zero.
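
You can watch this happen from a shell.  What follows is only a
minimal sketch, assuming a POSIX-style shell and a writable temp
directory; the file name and the choice of descriptor 3 are made up
for illustration.

```shell
# Sketch: a file stays readable through an open descriptor after rm.
tmpdir=$(mktemp -d)
demo="$tmpdir/demo.log"        # hypothetical file name

echo "still here" > "$demo"
exec 3< "$demo"                # open the file on descriptor 3
rm "$demo"                     # remove the only directory entry

if [ -e "$demo" ]; then name_exists=yes; else name_exists=no; fi
read -r contents <&3           # data is still reachable via the descriptor
exec 3<&-                      # last close: now the kernel can free the blocks

echo "name exists: $name_exists, contents: $contents"
rmdir "$tmpdir"
```

The name is gone as soon as rm returns, but the read still succeeds
because the descriptor keeps the inode alive.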

If you can't convince your daemon program to close the file (often
this can be accomplished by sending the daemon a signal, such as
SIGHUP or SIGUSR1, etc.), then your only option may be to truncate the
file.  If you need to retain the data, and 100% data integrity of your
log file is a requirement, then you're pretty much out of luck.  If a
little data loss is acceptable, then there is an approach that will
work.

 - make a copy of the log file under a new name
 - truncate the original file in place, so it keeps the old name

If your daemon is busy, you will lose some data doing this, because
the daemon will log messages in between the time you copy the original
to a new file and the time that you truncate the original log file.
Here's an example of the concept in shell:

  Logfile="/some/log/file"

  # copy the current contents aside (quoted, in case the path has spaces)
  cp "$Logfile" "$Logfile.1"
  # the `:' does nothing, but allows us to truncate $Logfile
  : > "$Logfile"

By the way, logrotate WILL do this (if you can get it to work on your
platform).  You just need to configure it properly.  If you can live
with the relatively small amount of data loss, check out the man page
for logrotate, and look for the `copytruncate' option.
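
For what it's worth, a stanza using that option might look roughly
like the one below.  This is a sketch, not a tested configuration for
Digital Unix: the log path, the weekly schedule, the retention count
and the location of the config file are all placeholders.  The
directive names (weekly, rotate, copytruncate, missingok, notifempty)
are standard logrotate directives.

```shell
# Sketch: write a per-user logrotate config that uses copytruncate.
conf="${TMPDIR:-/tmp}/logrotate-demo.conf"   # hypothetical location

cat > "$conf" <<'EOF'
/some/log/file {
    weekly
    rotate 4
    copytruncate
    missingok
    notifempty
}
EOF

# Without root, keep the state file somewhere you can write, e.g.:
#   logrotate -s "$HOME/logrotate.state" "$conf"
```

The -s option points logrotate at a state file of your choosing, which
matters here since the default location is normally writable only by
root.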

Or, if you can't get logrotate to work, I think I have a shell script
that does essentially the same thing.  You may need to tweak it for
your environment.  Ask me if you're interested...
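
In the meantime, here is a rough sketch of the idea (written fresh for
this note, not the script mentioned above).  The function name
rotate_copytruncate and its arguments are hypothetical, and it assumes
a POSIX-style shell with cp -p available.

```shell
# Sketch: copy-and-truncate rotation keeping $2 old generations.
# rotate_copytruncate is a hypothetical helper, not an existing tool.
rotate_copytruncate() {
    log=$1
    keep=${2:-4}

    [ -f "$log" ] || return 1

    # Shift old generations up: file.3 -> file.4, file.2 -> file.3, ...
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        [ -f "$log.$prev" ] && mv -f "$log.$prev" "$log.$i"
        i=$prev
    done

    # Anything the daemon writes between these two commands is lost.
    cp -p "$log" "$log.1" && : > "$log"
}
```

You would call it as something like `rotate_copytruncate /some/log/file 7`
from cron.  The oldest generation is simply overwritten once the limit
is reached.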

If your daemon will close and reopen its log file when it receives,
say, a SIGHUP, then the process changes only slightly.

  Logfile="/some/log/file"

  mv "$Logfile" "$Logfile.1"
  kill -HUP <daemon_process_id>

This is sufficient to rotate the file without any data loss, because
the daemon will continue to log to the same file, even after you move
the file to a different filename!  Remember, once the file is opened,
I/O to it is done by the kernel referencing the inode (or other
related structure in memory -- I'm not a kernel hacker).  Moving the
file to another filename doesn't change the inode, so the I/O goes to
the same place.  Sending the HUP signal tells the daemon to close its
log file, and re-open it using the same name it opened the log file
with before, creating a NEW file (since the old one has a different
name now), with a NEW inode entry, to which log output will now go.

BTW, earlier I mentioned mysterious depletion of available space on
your filesystem...  This can happen due to the fact that the kernel
keeps the file open, referencing it by inode, even if you've removed
the file's directory entry.  Many programs use this technique with
temporary files.  The program does something similar to this:

     /* 
      * create a temporary file and immediately unlink it,
      * so that it can't be messed with by other processes 
      */
     file = fopen( "/some/temp/file", "w" );
     status = unlink( "/some/temp/file" );

     /* 
      * add some brain-dead code to this scenario and you have
      * a system administration nightmare...
      */
     while ( 1 ) {
         fprintf( file, "This will fill up your filesystem\n" );
     }

It's fairly obvious that this will repeat forever, and eventually fill
up your filesystem.  But now you can see that a badly written program
which loops infinitely due to a logic error (a fairly common
programming error) will have the same result.

The code above will, in many cases, be sufficient to drive your system
administrator insane.  You'll be hard-pressed to find out how your
filesystem got filled up, because the file which has filled up the FS
*HAS NO DIRECTORY ENTRY!*  This means you will not be able to find it
with ls -l, du, or any utility that looks at files by walking
directory entries.  Furthermore, you'll bang your head against your
desk as the filesystem CONTINUES to fill up every time you delete a
large file.  Unless you can get the process to close the file itself,
the only way to reclaim the disk space used up by this process is to
kill the process which has it open.  At that point, the file has no
directory entries, and is not open by any process, so it
will be removed, and the space is reclaimed.
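
On Linux you can hunt for such files by looking at /proc.  The sketch
below is Linux-specific and assumes readlink is installed; the
function name find_deleted_open is hypothetical.  It only sees
processes whose descriptors you are allowed to read, so run it as root
to cover the whole system.  Where lsof is available, `lsof +L1` (open
files with a link count below one) gives a similar listing.

```shell
# Sketch (Linux only): list open files that no longer have a directory entry.
# find_deleted_open is a hypothetical helper; it relies on /proc/<pid>/fd.
find_deleted_open() {
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable or vanished entries are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#/proc/}
                pid=${pid%%/*}
                printf '%s\t%s\n' "$pid" "$target"
                ;;
        esac
    done
}
```

Each output line is a process ID and the old path of the file, which
tells you which process to go after.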

Yes, I found out about this little gotcha the hard way, when a
developer decided to test buggy code which ran as root on a production
server.  The lsof utility is your friend!  Actually, this was on
HP-UX, and their Glance performance monitoring tool made finding the
process fairly easy, once we figured out what was going on.  I really
wish Linux had a tool like Glance; it makes a whole host of problems
much easier to diagnose and correct.


-- 
---------------------------------------------------
Derek Martin          |   Unix/Linux geek
ddm at pizzashack.org    |   GnuPG Key ID: 0x81CFE75D
Retrieve my public key at http://pgp.mit.edu
