
BLU Discuss list archive



mcelog reports AMD DRAM Parity Error?



Jarod Wilson <jarod-ajLrJawYSntWk0Htik3J/w at public.gmane.org> writes:

> On Nov 18, 2010, at 10:30 AM, Derek Atkins wrote:
>
>> Hey,
>> 
>> Back onto my mcelog issue from a while ago..
>
> Crap, I apologize; I'd meant to follow up on this, and it fell
> through the cracks... So I'm jumping on it right now.
>
>> I finally updated to the
>> newly released mcelog.x86_64 2:1.0-0.1.pre3.fc13 and when I ran mcelog
>> I got this output:
>> 
>> HARDWARE ERROR. This is *NOT* a software problem!
>> Please contact your hardware vendor
>> MCE 0
>> CPU 0 4 northbridge TSC 24b8cb30a62636 
>> MISC c008000001000000 ADDR 3c5e80c80 
>>  Northbridge DRAM Parity Error
>>       bit34 = err cpu2
>>       bit43 = L3 subcache in error bit 1
>>       bit46 = corrected ecc error
>>       bit59 = misc error valid
>>  memory/cache error 'generic read mem transaction, generic transaction, level generic'
>> STATUS 9c294834001d011b MCGSTATUS 0
>> SOCKETID 0 
>> 
>> Does this mean I have a busted CPU?  Or busted RAM?
>
> RAM. However, it's not a fatal error, it's simply a corrected
> ECC error. I'm told this is all a single event here, and the
> event was the corrected ECC error, anyway. So you might want
> to replace some memory at some point, but hey, it's ECC memory
> doing what it's designed to do here.
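
For reference, the flag bits in that STATUS word can be decoded by hand. A quick sketch (bit positions taken from the generic x86 MCA layout in the AMD/Intel manuals; the model-specific bits 34/43/46 below are just what mcelog itself printed):

```python
# Decode the high status bits of the MCE STATUS word quoted above.
# Bit positions follow the generic x86 MCA MCi_STATUS layout.

STATUS = 0x9C294834001D011B  # from the mcelog report

def bit(value, n):
    """Return bit n of value as 0 or 1."""
    return (value >> n) & 1

FLAGS = {
    63: "VAL    - error information valid",
    62: "OVER   - error overflow",
    61: "UC     - uncorrected error",
    60: "EN     - error reporting enabled",
    59: "MISCV  - MISC register valid",
    58: "ADDRV  - ADDR register valid",
    57: "PCC    - processor context corrupt",
}

for n, name in sorted(FLAGS.items(), reverse=True):
    print(f"bit {n}: {bit(STATUS, n)}  {name}")

# UC=0 together with the model-specific "corrected ecc error" bit (46)
# being set is why mcelog reports this as corrected rather than fatal.
print("corrected ECC error bit (46):", bit(STATUS, 46))
```

With UC clear and PCC clear, nothing was lost; the hardware fixed the flipped bit on the way through.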

Is there an easy way to figure out which bank of RAM had the error?

I guess I can wait until I have another issue..
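
One possibility, assuming the kernel's EDAC driver for the memory controller is loaded: the per-csrow corrected-error counters under /sys/devices/system/edac/mc accumulate by bank, so whichever csrow's count climbs is the suspect DIMM pair. A sketch that walks that tree (the sysfs layout is the standard EDAC one; the root path is a parameter so it's easy to try):

```python
# Walk an EDAC sysfs tree and report corrected-error (CE) counts per csrow.
# Standard layout: <root>/mc<N>/csrow<M>/ce_count

import os

def edac_ce_counts(root="/sys/devices/system/edac/mc"):
    """Return {(controller, csrow): corrected_error_count} from an EDAC tree."""
    counts = {}
    if not os.path.isdir(root):
        return counts  # EDAC driver not loaded on this machine
    for mc in sorted(os.listdir(root)):
        mcdir = os.path.join(root, mc)
        if not mc.startswith("mc") or not os.path.isdir(mcdir):
            continue
        for csrow in sorted(os.listdir(mcdir)):
            ce_file = os.path.join(mcdir, csrow, "ce_count")
            if csrow.startswith("csrow") and os.path.isfile(ce_file):
                with open(ce_file) as f:
                    counts[(mc, csrow)] = int(f.read().strip())
    return counts

if __name__ == "__main__":
    for (mc, csrow), ce in edac_ce_counts().items():
        print(f"{mc}/{csrow}: {ce} corrected errors")
```

`edac-util -v` from the edac-utils package prints the same counters, if you'd rather not poke sysfs directly. Mapping a csrow back to a physical DIMM slot still takes the motherboard manual, though.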

> I'd probably not worry about the memory too much, unless it's
> happening at least daily, and/or if it's causing some sort of
> noticeable performance hit.

Well, when I was running the F13 kernel my VMs would get into a snit and
the virtual disks would lock up, causing "disk IO errors" inside the
VMs.  The same hardware running the F10 kernel doesn't exhibit this
problem.  So, is this a performance hit?  I would say so.  Or it could
be an issue with vmware-server and the F13 kernel.

-derek
-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord-DPNOqEs/LNQ at public.gmane.org                        PGP key available






BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org