Opened 17 years ago
Last modified 16 years ago
#1298 assigned defect
suspect bad memory
| Reported by: | jhoblitt | Owned by: | cindy |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | hardware | Version: | |
| Severity: | normal | Keywords: | |
| Cc: | | | |
Description
These nodes have at one time or another been suspected of stability issues caused by bad memory:
ipp005
ipp008
ipp018
ipp027
ipp025
ipp018 is the worst offender, as MCE errors are being logged.
They should be taken offline and have a memory check run on them. In the past, we've had dubious results running memtest86+, but we've never gone into the BIOS and disabled ECC checking. I think it's worth trying this sort of test again with ECC/scrubbing disabled. The way to track down a defective memory module is via a binary search. E.g., run memtest86+ until an error is hit, remove half of the memory sticks, and run memtest86+ again. If an error is still hit, the bad module is among the sticks still installed; if no error is hit, swap the removed memory back in and the good memory out. Repeat until a single stick is isolated. I'd let memtest86+ run for 5-7 days without an error before considering it a negative result.
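The binary search described above can be sketched roughly as follows. This is only an illustration of the bookkeeping, not a tool that exists on these nodes: `find_bad_module` and the `fails_memtest` callback are hypothetical names, and the callback stands in for a person installing only the given sticks and running memtest86+ for the full soak period. It assumes exactly one defective module, that the error reproduces reliably within the test window, and that the board will boot with any subset of sticks installed.

```python
from typing import Callable, List, Sequence


def find_bad_module(
    modules: Sequence[str],
    fails_memtest: Callable[[List[str]], bool],
) -> str:
    """Narrow a set of memory sticks down to the single bad one.

    `fails_memtest(installed)` is a hypothetical stand-in for the manual
    step: install only `installed`, run memtest86+, and report True if
    an error was hit. Assumes exactly one bad module.
    """
    if not modules:
        raise ValueError("need at least one module to test")

    suspects = list(modules)
    while len(suspects) > 1:
        half = len(suspects) // 2
        installed, removed = suspects[:half], suspects[half:]
        if fails_memtest(installed):
            # Error still occurs: the bad stick is among those installed.
            suspects = installed
        else:
            # Clean run: the bad stick is among those that were removed.
            suspects = removed
    return suspects[0]


if __name__ == "__main__":
    # Simulated run: slot labels are made up for the example.
    sticks = [f"DIMM{i}" for i in range(8)]
    bad = "DIMM5"
    print(find_bad_module(sticks, lambda installed: bad in installed))
```

With 8 sticks this takes 3 test rounds. Each round that ends in a clean result costs the full 5-7 day soak, so a node with a marginal stick could be offline for a few weeks.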
Change History (3)
comment:1 by , 16 years ago
comment:3 by , 16 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |

ipp005 done - no memory
ipp018 done - 1 bad memory board