
BLU Discuss list archive



RAID Monitoring on Red Hat 7.2



What good is a software RAID system that can smoothly survive a disk
failure if you never find out that you need to replace a disk?  (This
ignores the question of whether the computer can survive the loss of a
disk without crashing, but even if it reboots, I certainly want to
know that it came up with only one disk.)  I want to know if something
goes wrong.


Here is my approach.

First, look at /proc/mdstat by hand and make sure you like what you
see, then make the reference copy:

  # mkdir /etc/mdstat.reference
  # cp /proc/mdstat /etc/mdstat.reference/mdstat

Then, create the file /etc/cron.daily/raidcheck, containing:

  #!/bin/sh

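  # Compare the live RAID status with the known-good reference copy.
  # Stay silent if they match; otherwise mail a unified diff to root.
  diff /etc/mdstat.reference/mdstat /proc/mdstat > /dev/null || diff -u /etc/mdstat.reference/mdstat /proc/mdstat | mail -s "RAID Status Changed on `hostname`" root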

Finally, make it executable:

  # chmod 755 /etc/cron.daily/raidcheck
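
One way to convince yourself that the comparison fires, without
pulling a disk, is to run the same diff test against scratch files,
one of which has been deliberately altered.  The following is only a
sketch: the temporary directory, the sample status lines, and the
raid_changed function name are all made up for illustration and are
not part of the cron script.

```shell
#!/bin/sh
# Sketch: try the "has the status changed?" test on scratch files.
# The status lines below are made-up stand-ins for /proc/mdstat contents.

# raid_changed REFERENCE CURRENT
# Exit status 0 when diff reports a difference (or cannot read a file),
# which is the case where the cron script would send mail.
raid_changed() {
    ! diff "$1" "$2" > /dev/null 2>&1
}

tmp=`mktemp -d` || exit 1
printf 'md0 : active raid1 hdc1[1] hda1[0]\n' > "$tmp/reference"
cp "$tmp/reference" "$tmp/same"
printf 'md0 : active raid1 hda1[0]\n' > "$tmp/degraded"

if raid_changed "$tmp/reference" "$tmp/same"; then
    same_result=mail
else
    same_result=quiet
fi

if raid_changed "$tmp/reference" "$tmp/degraded"; then
    degraded_result=mail
else
    degraded_result=quiet
fi

echo "identical copy: $same_result"
echo "altered copy:   $degraded_result"
rm -rf "$tmp"
```

It should report "quiet" for the identical copy and "mail" for the
altered one.  Note that a missing or unreadable reference file also
counts as a change here, which seems like the safer failure mode.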


It might be crude, but it is simple enough to maybe work correctly.  I
am making a few assumptions:

  1. /proc/mdstat doesn't change unless something notable changes

  2. cron is already running on the machine

  3. someone is actually reading root's e-mail (change the "root" to
     something louder in /etc/cron.daily/raidcheck if necessary)

  4. failure of a RAID disk (assuming it doesn't crash the machine)
     is not an emergency: the other disk(s) are still running, and
     hot spares will have been put into service if you cared to have
     them, but you need to buy a replacement disk soon.  Getting an
     e-mail every day until you do seems simple and appropriate
     
  5. this is useful even if not integrated into RH and the various
     raid tools (I put the reference copy in a directory so I can put
     an explanatory readme.txt next to it)
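
If assumption 3 is shaky on your machine, the recipient and the file
locations can be pulled out into variables so they are easy to change
in one place.  This is a minimal sketch, not a replacement for the
script above: the names REFERENCE, CURRENT, RECIPIENT, MAILER, and
notify_if_changed are my own inventions for illustration, and it
assumes a mail command that accepts -s before the recipient.

```shell
#!/bin/sh
# Sketch of a parameterized raidcheck.  All variable and function names
# here are illustrative assumptions, not part of the original script.

REFERENCE=${REFERENCE:-/etc/mdstat.reference/mdstat}  # known-good copy
CURRENT=${CURRENT:-/proc/mdstat}                      # live status
RECIPIENT=${RECIPIENT:-root}                          # who gets the mail
MAILER=${MAILER:-mail}                                # command used to send

# notify_if_changed REFERENCE CURRENT RECIPIENT
# Stays silent when the two files match; otherwise pipes a unified diff
# (or diff's error message, if a file is missing) to the mailer.
notify_if_changed() {
    diff "$1" "$2" > /dev/null 2>&1 && return 0
    diff -u "$1" "$2" 2>&1 | $MAILER -s "RAID Status Changed on `hostname`" "$3"
}
```

The cron script would then end with a single call:

  notify_if_changed "$REFERENCE" "$CURRENT" "$RECIPIENT"

Keeping the mailer in a variable also makes it possible to substitute
a harmless command while testing.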


If anyone finds it useful, please use it.  If anyone has comments, I
am interested in hearing them.  


-kb, the Kent who likes that this is quite simple and doesn't attempt
to be clever and update state information.




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.



