[PATCH v5 00/21] EEH reorganization

Anton Blanchard anton at samba.org
Tue Apr 17 11:37:38 EST 2012


Hi,

> Thanks for the information. I'll try to reproduce the issue on
> Firebird-L today. By the way, it seems that "mstmread" is some
> user-level application accessing the config space while the problem
> happened?

The EEH error is caused by the Mellanox firmware tools.

> It seems the crash was caused by something like WARN_ON(). I checked
> the function pointed to by the backtrace (eeh_dn_check_failure) and I
> didn't find any place that calls WARN_ON(). Maybe I missed
> something here.

No. I replaced that backtrace in eeh_dn_check_failure with a WARN_ON()
because the backtrace doesn't give us enough info. I'm submitting a
patch for that today.

Bottom line is mstmread has been causing an EEH error since at least
3.0, but in 3.4 we now oops instead of recovering. The signs all point
to the EEH rework in 3.4.

Anton

