
Opened 17 years ago

Last modified 16 years ago

#1298 assigned defect

suspect bad memory

Reported by: jhoblitt
Owned by: cindy
Priority: normal
Milestone:
Component: hardware
Version:
Severity: normal
Keywords:
Cc:

Description

These nodes have been suspected at one time or another of stability issues due to bad memory:

ipp005
ipp008
ipp018
ipp027
ipp025

ipp018 is the worst offender, as MCE errors are being logged.

They should be taken offline and have a memory check run on them. In the past, we've had dubious results running memtest86+, but we've never gone into the BIOS and disabled ECC checking. I think it's worth trying this sort of test again with ECC/scrubbing disabled. The way to track down a defective memory module is via a binary search. E.g., run memtest86+ until an error is hit, remove half the memory sticks, and run memtest86+ again. If no error is hit, swap the removed memory back in and the good memory out, then repeat. I'd let memtest86+ run for 5-7 days without an error before considering it a negative result.
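
A rough sketch of the binary search described above, in Python, for anyone who wants to keep track of which sticks to pull next. It assumes exactly one faulty module and that a test run reliably reports an error whenever that module is installed. The names `find_bad_module` and `has_error` are hypothetical; `has_error` stands in for a manual multi-day memtest86+ run with only the listed sticks installed. It also ignores any motherboard rules about populating slots in pairs or in a fixed order.

```python
from typing import Callable, List, Sequence


def find_bad_module(
    modules: Sequence[str],
    has_error: Callable[[List[str]], bool],
) -> str:
    """Isolate a single faulty module by repeatedly halving the installed set.

    `has_error(installed)` is a hypothetical callback: it should return True
    if a memory test with only `installed` modules present hits an error.
    Assumes exactly one bad module and a reliable test.
    """
    candidates = list(modules)
    # First confirm the fault reproduces with everything installed.
    if not has_error(candidates):
        raise ValueError("no error observed with all modules installed")

    while len(candidates) > 1:
        mid = len(candidates) // 2
        kept, removed = candidates[:mid], candidates[mid:]
        # Test with only `kept` installed. If the run is clean, the fault
        # must be in the half that was pulled, so swap the halves.
        candidates = kept if has_error(kept) else removed

    return candidates[0]


if __name__ == "__main__":
    # Simulated example: eight sticks, with DIMM5 made up as the bad one.
    sticks = [f"DIMM{i}" for i in range(8)]
    print(find_bad_module(sticks, lambda installed: "DIMM5" in installed))
```

With n sticks this takes about log2(n) test runs after the initial one, which matters when each clean run is left going for 5-7 days.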

Change History (3)

comment:1 by cindy, 16 years ago

ipp005 done - no memory
ipp018 - done 1 bad memory board

comment:2 by cindy, 16 years ago

ipp008 ipp0018 - no bad memory

comment:3 by eugene, 16 years ago

Owner: changed from jhoblitt to cindy
Status: new → assigned