Hardware Watchdog Device in pSeries?

Alan Robertson alanr at unix.sh
Thu Oct 14 07:30:02 EST 2004


Mike Strosaker wrote:
> Linas Vepstas wrote:
> 
>> I might have volunteered to hack this up real quick, were it not for
>> Mike Strosaker's correction, that the surveillance features were taken
>> out of Power5.  
>> Anyone on this list know why?
>>
> 
> I sent the reason I got from the hardware RAS folks to this list a while 
> back.
> Luckily, it's still in my sent mail folder:
> 
> "Because of the virtualization layer and partitioning, the surveillance
> requirement was moved to PHYP<->SP.  Apparently, this was a hotly
> contested issue among the platform design folks (especially considering
> that partitioned power4 systems still have OS<->SP surveillance).  I think
> the logic is: If an OS goes down, it's not likely a server problem, hence
> no requirement to monitor from the server side.
> 
> At least the platform gets notified of panics via os-term.  I gather
> that some user space tools are expected to monitor for deadlocks/hangs
> (maybe clustering tools). "

This is about half-right.

There is one particular circumstance which can ONLY be monitored from a 
hardware-level monitor.

OS hangs.

If the OS hangs, then nothing but a hardware timer can bring the machine 
out of its hung state.  Hangs do NOT panic (by definition), and can't be 
reliably detected any other way.
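
To make the mechanism concrete, here is a minimal sketch of a hardware watchdog as a toy simulation.  It is not pSeries or Linux driver code.  The names WatchdogTimer, kick, tick and run are made up for illustration, and the timeout of 3 ticks is an arbitrary assumption.  The point it shows: the timer counts down on its own, and only a live OS restarts it.

```python
class WatchdogTimer:
    """Toy model of a hardware countdown timer (hypothetical, for illustration)."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.fired = False  # True once the timer has expired (machine would be reset)

    def kick(self):
        """Called by a live OS to restart the countdown."""
        if not self.fired:
            self.remaining = self.timeout_ticks

    def tick(self):
        """Advance hardware time by one tick, independent of the OS."""
        if self.fired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.fired = True


def run(total_ticks, hang_at, timeout_ticks=3):
    """Return the tick at which the watchdog fires, or None if it never does.

    The simulated OS kicks the timer on every tick until `hang_at`, then goes
    silent.  hang_at=None means the OS never hangs.
    """
    wd = WatchdogTimer(timeout_ticks)
    for t in range(total_ticks):
        if hang_at is None or t < hang_at:
            wd.kick()   # a healthy OS keeps resetting the countdown
        wd.tick()       # the hardware keeps counting regardless
        if wd.fired:
            return t
    return None
```

A healthy run never fires; a run that hangs stops kicking, and the timer expires a few ticks later without any cooperation from the hung OS.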

In highly available systems (like telecom systems), hardware-level monitors 
are required.  Leaving them out sends the message that "availability isn't 
important".

The normal way to build a highly available system is to have layers (or a 
hierarchy) of watchers.

	At the bottom is the hardware monitor.

	Above that is an application monitor.

	Above that are resource monitors.

	etc.

But there are certain kinds of faults which cannot be caught without this 
bottom-layer monitor.
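
The layering can be sketched as a small model.  Everything here is an assumption made for illustration: the Monitor type, the fault names ("app_crash", "resource_failure", "os_hang"), and which layer detects which fault.  The one property it is meant to show is that the software layers depend on a running OS, and the hardware monitor does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monitor:
    name: str
    needs_running_os: bool   # software monitors die with the OS
    detects: frozenset       # fault names this layer can notice (assumed)


def first_detector(fault, monitors):
    """Return the name of the highest layer that catches `fault`, or None."""
    os_alive = fault != "os_hang"
    for m in monitors:  # ordered top to bottom
        if m.needs_running_os and not os_alive:
            continue  # a hung OS takes its software monitors down with it
        if fault in m.detects:
            return m.name
    return None


# Hypothetical stack, listed top to bottom.
STACK = [
    Monitor("resource monitor", True, frozenset({"resource_failure"})),
    Monitor("application monitor", True, frozenset({"app_crash"})),
    Monitor("hardware monitor", False, frozenset({"os_hang"})),
]
```

With the full stack, first_detector("os_hang", STACK) names the hardware monitor.  With the bottom layer removed (STACK[:2]), the same fault is caught by nothing.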




-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce



More information about the Linuxppc64-dev mailing list