[Skiboot] [PATCH] [v3] Fast reboot for P8

Stewart Smith stewart at linux.vnet.ibm.com
Mon Aug 22 14:35:50 AEST 2016

Stewart Smith <stewart at linux.vnet.ibm.com> writes:
> Benjamin Herrenschmidt <benh at kernel.crashing.org> writes:
>> This is an experimental patch that implements "Fast reboot" on P8
>> machines.
>> The basic idea is that when the OS calls OPAL reboot, we gather all
>> the threads in the system using a combination of patching the reset
>> vector and soft-resetting them, then cleanup a few bits of hardware
>> (we do re-probe PCIe for example), and reload & restart the bootloader.
>> This is very experimental and needs a lot of testing and also auditing
>> code for other bits of HW that might need to be cleaned up. I also need
>> to check if we are properly PERST'ing PCI devices.
>> I've successfully fast rebooted a Habanero a few times.
> I've been hammering it a bunch on Garrison. I had to patch around a bit
> for everything to build on master, but nothing too bad.
> my timings for it are pretty impressive. This is with latest BMC
> firmware build, petitboot env from current op-build master-next (so
> kernel with quiet) and Ubuntu on sda2... which I think is spinning rust:
> 18.5 seconds to petitboot
> 1.5 seconds to discover boot options
> 5 second (configured) petitboot timeout before booting.
> 20 seconds to boot Ubuntu.
> So, that's a reboot cycle every 45 seconds.
> I had HTX set up to do continuous bootme on it, and even found a bug! we
> have a mem leak, as on successive boots we have this much free HEAP at
> end of boot:
> 7334560
> 6804152
> 6273704
> 5743336
> 5212904
> 4682488
> 4152008
> I think it's the device tree blob, so should be easy fix.
> Now going to try with NVMe installed OS... that could be fun... and
> perhaps cut down the reboot time to *insanely* quick too.
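
The free-heap figures quoted above can be turned into a per-boot leak estimate. This is only a rough sketch: the numbers are copied from the list above, while the helper name and the "boots left" extrapolation are assumptions for illustration (the latter assumes the leak stays constant and nothing else eats heap), not anything skiboot reports.

```python
# Free HEAP at end of boot, in bytes, copied from the quoted message.
heap_free = [7334560, 6804152, 6273704, 5743336, 5212904, 4682488, 4152008]


def per_boot_deltas(samples):
    """Return how much free heap was lost between each pair of successive boots."""
    return [before - after for before, after in zip(samples, samples[1:])]


deltas = per_boot_deltas(heap_free)
mean_leak = sum(deltas) / len(deltas)

# Hypothetical extrapolation: complete boots left before the heap runs out,
# using the largest observed delta to stay on the conservative side.
boots_left = heap_free[-1] // max(deltas)

print("per-boot deltas:", deltas)
print("mean leak per boot: %.0f bytes" % mean_leak)
print("estimated boots left:", boots_left)
```

Each delta comes out at roughly 530 KB, so the leak looks like a fixed-size allocation repeated once per boot, which fits the guess that it is something like the device tree blob.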

Update on testing...

I left a Garrison in a fast reboot loop over the weekend, fast rebooting
every 10 minutes (and running HTX while it was booted).

At some point over the weekend we crapped out and a fast reboot didn't
succeed, but due to my oversight in not logging, say, everything going to
SoL, I have NFI what occurred... nothing got garded though!

It does survive at least 173 fast reboots though, and all of this is
with the nap code too.

My hacked together tree is up on

There's still a memory leak, albeit a tiny one:
[1778586948042,5,10] RESET: Initiating fast reboot 30...
[2616226996510,5,10] RESET: Initiating fast reboot 40...
[3126353501745,5,10] RESET: Initiating fast reboot 44...

Not sure where that bit of memory is going though; I may have to delve
into things further.

Stewart Smith
OPAL Architect, IBM.
