Hi! A few weeks ago our server shut down because one of its memory DIMMs encountered an uncorrectable ECC error. In our host configuration we use the default Template Server Chassis by IPMI template, but this template only has an Information-severity trigger for DIMM value changes, so we missed the alert.
So I'm wondering: if we want to be aware of these kinds of errors, is there a better template that can detect DIMM ECC errors rather than just reporting them as Information?
Or maybe I should not use IPMI to detect memory DIMM errors? If so, what interface should I use to detect them?
Any help would be appreciated, thanks!
