[PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation
Cedric Le Goater
legoater at free.fr
Tue Mar 17 17:55:37 EST 2009
> Again, how would 'cr' obtain exit status for these tasks, and how would
> it distinguish failure from normal operation?
Here's our solution to this issue.
mcr maintains in its kernel container object an exitcode attribute for
the mcr-restart process. This process is detached from the fork tree of
the restarted application.
When the restart is finished, an mcr-wait command can be called to reap
this exitcode. This makes it possible to distinguish an exit of the
application process from an exit of the mcr-restart process.
This is a must-have for batch managers in an HPC environment.
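To illustrate the idea, here is a rough userspace sketch in Python. It is
only a model of the scheme described above, not mcr code: the Container
class and the names restart_exited, task_exited and wait_restart are all
hypothetical, and the real attribute lives in a kernel object.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Container:
    """Hypothetical stand-in for the kernel container object."""

    # Exit code of the restart helper, kept apart from the application's tasks.
    restart_exitcode: Optional[int] = None
    # Exit codes of the restarted application's tasks, keyed by pid.
    task_exitcodes: Dict[int, int] = field(default_factory=dict)

    def restart_exited(self, code: int) -> None:
        """Record the exit code of the (detached) restart process."""
        self.restart_exitcode = code

    def task_exited(self, pid: int, code: int) -> None:
        """Record the exit code of one application task."""
        self.task_exitcodes[pid] = code

    def wait_restart(self) -> Optional[int]:
        """Reap the restart exit code, in the spirit of an mcr-wait call.

        Returns None if nothing is recorded or it was already reaped.
        """
        code = self.restart_exitcode
        self.restart_exitcode = None
        return code


if __name__ == "__main__":
    c = Container()
    c.restart_exited(0)      # the restart itself succeeded
    c.task_exited(1234, 1)   # an application task later exits with an error
    print("restart exit code:", c.wait_restart())
    print("application exit codes:", c.task_exitcodes)
```

Because the two kinds of exit code are stored separately, a batch manager
can tell "the restart failed" apart from "the application was restarted
and then exited with an error".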
Cheers,
C.