
BLU Discuss list archive



[Discuss] Is this bad?



Derek Atkins wrote:
> I finally turned on Smartd on one of my servers (don't ask me why it
> wasn't on earlier)...

I discovered that smartd wasn't enabled on my desktop long after it had
been deployed, because /etc/default/smartmontools, which comes from
Debian and is carried through to Ubuntu, has it turned off by default.
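A quick way to check that file is sketched below. It assumes the
Debian-style key start_smartd=yes, which ships commented out; other
distributions may use a different file or key, so treat the variable
name as an assumption.

```python
import re

def smartd_enabled(defaults_text: str) -> bool:
    """Return True if an uncommented start_smartd=yes line is present.

    Assumes the Debian/Ubuntu key name "start_smartd"; other
    distributions may configure smartd startup differently.
    """
    for line in defaults_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out settings don't count
        m = re.match(r'start_smartd\s*=\s*"?(\w+)"?', stripped)
        if m:
            return m.group(1).lower() == "yes"
    return False  # key absent: treat smartd as not enabled
```

Usage: pass it the contents of /etc/default/smartmontools, e.g.
smartd_enabled(open("/etc/default/smartmontools").read()).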

I don't know if they've changed this in newer OS versions, or what the
justification is for not having it enabled by default. (I'd also install,
by default, the packages that display desktop notifications of SMART
errors.)

I've also found that the DEVICESCAN directive in /etc/smartd.conf, which
is supposed to find the drives automatically, never quite works as I
expect, so I end up having to list each drive explicitly. (I generally
want to stagger the times that self-tests run anyway.)
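Listing drives explicitly with staggered schedules can be generated
rather than typed by hand. The sketch below uses smartd.conf's -s
schedule regex of the form T/MM/DD/d/HH (d is day of week, 1 = Monday
through 7 = Sunday). The particular layout, short tests daily with each
drive offset by gap_hours and long tests weekly on a different weekday
per drive, is just one assumed policy, and the function name is
hypothetical.

```python
def staggered_smartd_lines(devices, first_hour=2, gap_hours=2):
    """Build one smartd.conf line per device with staggered self-tests.

    Hypothetical helper. Short (S) tests run daily at first_hour plus
    i * gap_hours for the i-th device; long (L) tests run weekly, one hour
    after that device's short test, on weekday (i % 7) + 1 so that
    different drives' long tests land on different days.
    """
    lines = []
    for i, dev in enumerate(devices):
        hour = (first_hour + i * gap_hours) % 24
        long_hour = (hour + 1) % 24
        long_day = i % 7 + 1  # 1 = Monday ... 7 = Sunday
        schedule = f"(S/../.././{hour:02d}|L/../../{long_day}/{long_hour:02d})"
        lines.append(f"{dev} -a -s {schedule}")
    return lines
```

For ["/dev/sda", "/dev/sdb"] this yields
"/dev/sda -a -s (S/../.././02|L/../../1/03)" and
"/dev/sdb -a -s (S/../.././04|L/../../2/05)".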


>  /dev/sda [SAT] :
>     Prefailure: Raw_Read_Error_Rate (1) changed to 113...

Ditto on what Dan said. Posting a question to the smartmontools mailing
list might turn up a better answer, specific to your drive. smartd can
be tuned to ignore attributes or alter the warning thresholds if you
determine this to be noise.
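If an attribute does turn out to be noise, smartd.conf's -I ID directive
tells smartd to ignore that attribute when checking for changes in
attribute values (-a enables the change tracking). A minimal sketch of
building such a line, with a hypothetical helper name, might look like:

```python
def smartd_line(device, ignore_changes=(), extra=()):
    """Return a smartd.conf line for device (hypothetical helper).

    -a enables smartd's default checks, including reports of attribute
    changes; each "-I ID" asks smartd to ignore changes in attribute ID.
    Any further directives can be passed through via extra.
    """
    parts = [device, "-a"]
    for attr_id in sorted(set(ignore_changes)):
        parts += ["-I", str(attr_id)]
    parts += list(extra)
    return " ".join(parts)
```

For example, smartd_line("/dev/sda", ignore_changes=[195, 1]) gives
"/dev/sda -a -I 1 -I 195"; only add an ID after deciding its changes
really are noise for that drive.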


>     Usage: Hardware_ECC_Recovered (195) changed to 34...

Ditto on what David Miller said. Note how this attribute is labeled
"usage" and not "prefailure." I'd adjust smartd or logwatch to ignore these.
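To separate the two kinds of report before deciding what to ignore, the
summary lines quoted above can be split by type. This sketch only
assumes the "Kind: Name (ID) changed to N" prefix seen in the quoted
log; whatever follows the first value (elided above) is not parsed.

```python
import re

# Matches e.g. "Prefailure: Raw_Read_Error_Rate (1) changed to 113".
# The trailing text after the first value is ignored (format assumed).
CHANGE_RE = re.compile(
    r"(Prefailure|Usage):\s+(\w+)\s+\((\d+)\)\s+changed to\s+(\d+)"
)

def split_changes(lines):
    """Group attribute-change lines into prefailure and usage buckets."""
    buckets = {"Prefailure": [], "Usage": []}
    for line in lines:
        m = CHANGE_RE.search(line)
        if m:
            kind, name, attr_id, value = m.groups()
            buckets[kind].append((int(attr_id), name, int(value)))
    return buckets
```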


>     Usage: Airflow_Temperature_Cel (190) changed to 68...

This one I'd look into. (Didn't that Google whitepaper on drive
longevity point to operating temperature as being a significant factor?)

Theoretically, the conversion formula used by smartd to calculate the
temperature could be wrong for your drive, so you could verify it using
a temperature probe, though that'll only tell you the exterior
temperature of the drive. It's probably simpler to just cool the
drive(s) better.
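As an illustration of how much the conversion matters: if, as seems
likely, the 68 in the report is the attribute's normalized value rather
than its raw value, its meaning depends on the drive. Some vendors
report the normalized temperature value as 100 minus degrees Celsius,
and many drives keep the current temperature in the lowest byte of the
raw value. Both conventions are vendor-specific assumptions here;
`smartctl -A /dev/sda` shows the VALUE and RAW_VALUE columns, which is
the way to confirm which applies.

```python
def temp_from_raw(raw_value: int) -> int:
    """Current temperature (deg C) from a raw temperature attribute.

    Assumes the common layout where the lowest byte is the current
    temperature; higher bytes often hold min/max readings.
    """
    return raw_value & 0xFF


def temp_if_100_minus(normalized: int) -> int:
    """Temperature implied by a normalized value, ONLY IF the drive uses
    the vendor-specific convention normalized = 100 - degrees Celsius."""
    return 100 - normalized
```

Under that second convention a normalized value of 68 would mean 32 C,
not 68 C, so it's worth checking before buying fans.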

Do you have a drive cage fan that died? If not, consider dropping in
another fan. You can pick up decent fans at Micro Center for as little
as $3 from their bulk packaged area. I added a few fans to my MythTV
back-end last summer.


> Could this be related to why when the machine is under heavy load that
> ksoftirqd/1 starts spinning and taking up lots of CPU? 

Seems unlikely, as I'd expect the SMART errors you mentioned to be
handled internally by the drive, and thus not to trigger additional I/O
operations.
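One way to test that reasoning is to see which softirq class is actually
growing on CPU1 while ksoftirqd/1 is busy: sample /proc/softirqs twice
under load and compare. If BLOCK dominates, disk I/O is implicated; if
NET_RX or TIMER dominates, the drive is probably not the cause. The
helper names below are hypothetical.

```python
def softirq_counts(text: str, cpu: str = "CPU1") -> dict:
    """Parse /proc/softirqs-style text into {softirq_name: count} for one CPU."""
    lines = text.strip().splitlines()
    col = lines[0].split().index(cpu)  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        name, *values = line.split()
        counts[name.rstrip(":")] = int(values[col])
    return counts


def busiest_softirq(before: dict, after: dict) -> str:
    """Name of the softirq class whose count grew the most between samples."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    return max(deltas, key=deltas.get)
```

Usage: read /proc/softirqs, wait a few seconds under load, read it
again, and pass both parsed samples to busiest_softirq().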


>  **Unmatched Entries**
>  Device: /dev/sda [SAT], previous self-test completed without error
>  Device: /dev/sdb [SAT], previous self-test completed without error

Need to update logwatch?
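Newer logwatch releases may already recognize these lines. If not, the
pattern a filter would need is simple; a sketch in Python (not
logwatch's own code, which is Perl) that pulls out the devices reporting
a clean self-test:

```python
import re

# Matches lines like:
#   "Device: /dev/sda [SAT], previous self-test completed without error"
SELFTEST_RE = re.compile(
    r"Device: (?P<dev>\S+) \[(?P<type>\w+)\], "
    r"previous self-test completed without error"
)

def selftest_ok_devices(lines):
    """Return the devices whose previous self-test is reported as clean."""
    return [m.group("dev") for line in lines if (m := SELFTEST_RE.search(line))]
```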

 -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/



BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org