Feature #17

EDAC monitoring support

Added by Mikhail Yakshin 3250 days ago.

Status: New
Start: 01/23/2009
Priority: Normal
Due date:
Assigned to: -
% Done: 0%
Category: -
Target version: 4.0

Description

EDAC (Error Detection And Correction) is a generic name for the technologies that handle ECC memory monitoring and control in modern Linux kernels. We need to implement a monitoring script that supports ECC memory checking/monitoring in Inquisitor.

Most modern Linux kernels support EDAC in some form. Its stability status is the same as that of lm_sensors, i.e. officially declared "unstable", but in practice EDAC is widely supported by distro vendors and has been backported into a wide range of enterprise kernels. Modern Linux EDAC monitoring seems to work better than memtest86+'s chipset-dependent ECC support, which always appears to lag 1 to 1.5 years behind new chipsets.

A short list of relevant links about EDAC:

To make a long story short: EDAC is implemented as Linux kernel modules that should be loaded and configured. After that, they will:

  • report errors to the kernel log (dmesg)
  • allow querying via the sysfs interface (for control, reading error counters/indicators, resetting them, etc.)
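As a rough illustration of the sysfs side, the sketch below collects per-memory-controller error counters. It assumes the legacy EDAC sysfs layout, where each memory controller appears as /sys/devices/system/edac/mc/mcN/ with ce_count (corrected errors) and ue_count (uncorrected errors) files. The function name read_edac_counters is hypothetical, and the root path is a parameter so the code can be pointed at a test directory.

```python
import os
import re

# Default location of the EDAC memory-controller tree (legacy sysfs layout).
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"

# Counter files read from each mcN directory:
# ce_count = corrected errors, ue_count = uncorrected errors.
COUNTER_FILES = ("ce_count", "ue_count")


def read_edac_counters(root=EDAC_MC_ROOT):
    """Return {"mc0": {"ce_count": int, "ue_count": int}, ...}.

    Hypothetical helper (not part of edac-utils). Returns an empty dict
    if the EDAC tree does not exist, e.g. when no EDAC driver is loaded.
    """
    counters = {}
    if not os.path.isdir(root):
        return counters
    for name in sorted(os.listdir(root)):
        # Only mc0, mc1, ... are memory controllers; skip other entries.
        if not re.fullmatch(r"mc\d+", name):
            continue
        values = {}
        for counter in COUNTER_FILES:
            path = os.path.join(root, name, counter)
            try:
                with open(path) as f:
                    values[counter] = int(f.read().strip())
            except (OSError, ValueError):
                # Missing or unparsable file: leave this counter out.
                continue
        counters[name] = values
    return counters


if __name__ == "__main__":
    # Print whatever counters the running kernel exposes.
    for mc, values in read_edac_counters().items():
        print(mc, values)
```

Reading sysfs directly avoids depending on edac-util being installed; a real script might prefer edac-util's output instead, since it also knows about per-DIMM details.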

In addition, there are userspace utilities:

  • edac-ctl: loads the necessary modules, autodetecting which ones the processor needs (like sensors-detect in lm_sensors)
  • edac-util: reads everything from the sysfs interface and outputs it in an easily parsable form (like sensors in lm_sensors)

In general, to get this done, we'd need:

  1. A kernel with full EDAC support; most modern kernels already have it.
  2. edac-utils installed in the chroot.
  3. The necessary kernel modules loaded during init (can be done using edac-ctl).
  4. A monitoring script that wakes up every N seconds, checks for any errors and logs them to the server.
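Step 4 could look roughly like the following sketch. It assumes error counters are available as snapshots of the form {"mc0": {"ce_count": 3, "ue_count": 0}}; the names find_new_errors and monitor, and the read_counters and report callables, are hypothetical. In Inquisitor, report would send the errors to the server, but that call is not specified here.

```python
import time


def find_new_errors(previous, current):
    """Compare two counter snapshots and return the counters that grew.

    Returns (controller, counter, delta) tuples. Negative deltas, e.g.
    after counters were reset, are ignored.
    """
    new_errors = []
    for mc, values in sorted(current.items()):
        old_values = previous.get(mc, {})
        for counter, value in sorted(values.items()):
            delta = value - old_values.get(counter, 0)
            if delta > 0:
                new_errors.append((mc, counter, delta))
    return new_errors


def monitor(read_counters, report, interval=60, iterations=None):
    """Poll counters every `interval` seconds and report increases.

    read_counters: callable returning a snapshot dict (hypothetical,
        e.g. something reading the EDAC sysfs counters).
    report: callable receiving a non-empty list of new errors
        (hypothetical stand-in for logging to the server).
    iterations: number of polls before returning; None runs forever.
    """
    # Empty baseline: errors already counted at startup are reported
    # on the first poll instead of being silently absorbed.
    previous = {}
    count = 0
    while iterations is None or count < iterations:
        current = read_counters()
        new_errors = find_new_errors(previous, current)
        if new_errors:
            report(new_errors)
        previous = current
        count += 1
        time.sleep(interval)
```

Reporting deltas rather than absolute counts means each error is logged once, and a counter reset does not trigger a false report.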
