Hardware Watchdog Device in pSeries?

Linas Vepstas linas at austin.ibm.com
Fri Oct 15 02:21:41 EST 2004

Hi Alan,

Long emails confuse me ...

On Wed, Oct 13, 2004 at 10:41:26PM -0600, Alan Robertson was heard to remark:
> Linas Vepstas wrote:
> >why should someone buy 12 pci-card watchdogs, one for each partition,
> >chewing up 12 pci slots, when the pSeries is already capable of doing
>  It looks really Rube Goldberg-ish (to say the least).

> The hardware watchdog timer is a 3rd party 
> monitoring system, and therefore is likely to be reliable when the thing it 
> is watching is sick - 

Not sure where you're going with this; are you saying that 
3rd-party watchdog PCI cards, one for each partition, are a 
good idea, or a bad idea?  

Would you rather have the OS monitoring done 
(a) with watchdog PCI cards,
(b) with 'surveillance' done by firmware/hypervisor, or
(c) with some other method?

> 	The bootstrap loader should work much the

I guess I didn't get this exposition either.  Although it's nice to 
know that boot was successful,  I see boot as a whole lot less 
important than monitoring the system once it's gone 'online'.  The boot
sequence can be monitored much more loosely, with a whole lot less
complexity.  The hypervisor knows when the OS boot sequence starts.
If the OS hasn't completely booted after, say, 10 minutes, then it
can call a human to look at the problem.  I don't see why one needs
to heartbeat once a second during boot; that's hard to do and seems
unnecessary.  By contrast, I'd expect to turn on the once-per-second
heartbeat just before the system goes 'online' or 'critical'.
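
To make the two-phase idea concrete, here is a rough sketch of the
policy in Python.  It is purely illustrative and is not how the
pSeries firmware or hypervisor actually does it: the class name, the
method names, the returned strings and the 2-second heartbeat
tolerance are all made up for the example.  Only the 10-minute boot
deadline and the once-per-second heartbeat come from the paragraph
above.

```python
from enum import Enum


class State(Enum):
    BOOTING = "booting"
    ONLINE = "online"


class Surveillance:
    """Hypothetical monitor for one partition (illustrative only)."""

    def __init__(self, boot_start, boot_deadline=600.0, heartbeat_timeout=2.0):
        # boot_deadline: the "say, 10 minutes" allowed for the whole boot.
        # heartbeat_timeout: assumed tolerance for a once-per-second
        # heartbeat; the 2-second value is an assumption, not a spec.
        self.state = State.BOOTING
        self.boot_start = boot_start
        self.boot_deadline = boot_deadline
        self.heartbeat_timeout = heartbeat_timeout
        self.last_beat = None

    def go_online(self, now):
        # The OS announces it is about to go 'online' or 'critical';
        # strict heartbeat checking starts from this moment.
        self.state = State.ONLINE
        self.last_beat = now

    def heartbeat(self, now):
        # Heartbeats only matter once the partition is online.
        if self.state is State.ONLINE:
            self.last_beat = now

    def check(self, now):
        # Loose check during boot: a single overall deadline.
        if self.state is State.BOOTING:
            if now - self.boot_start > self.boot_deadline:
                return "call-human"
            return "ok"
        # Tight check once online: the heartbeat must be recent.
        if now - self.last_beat > self.heartbeat_timeout:
            return "missed-heartbeat"
        return "ok"
```

Timestamps are passed in by the caller rather than read from a clock,
so the policy can be exercised without waiting ten minutes.  What to
do on "missed-heartbeat" is deliberately left open.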


More information about the Linuxppc64-dev mailing list