[Skiboot] [PATCH v3 0/5] CAPI disabling support for kexec/fast reboot

Andrew Donnellan andrew.donnellan at au1.ibm.com
Wed Mar 1 11:34:10 AEDT 2017


On 27/01/17 18:33, Andrew Donnellan wrote:
> Currently, if you attempt to kexec or fast reboot from a machine with a
> CAPI card and the cxl driver loaded, you are going to have an exceedingly
> bad time. It turns out that the hardware doesn't really cope very well with
> going through the standard Linux PCI initialisation process while a PHB is
> still in CAPI mode. Checkstops everywhere!
>
> This series implements support for switching a PHB from CAPI mode back to
> regular PCIe mode during a complete reset. The SCOM sequences have been
> derived through a mix of advice from other IBMers, reading through various
> internal workbooks, and trial and error.
>
> This has only been lightly tested - I've kexec-ed/fast rebooted quite a few
> times with no real problems, and I've run some basic CAPI tests that don't
> seem to fail too much more than they normally fail. As this procedure
> involves forcing the CAPP into recovery, we do see a lot of HMIs but as far
> as I'm aware they're harmless and there's not much we can really do about
> them.
>
> At this stage, I haven't thought too hard about whether we can
> disable CAPI mode while Linux is running for e.g. PCI hotplug, which could
> get tricky. That's a question for later...
>
> Thanks to Vaibhav Jain (who made a previous attempt at this), Mikey
> Neuling, Ben Herrenschmidt, Gavin Shan, Bill Daly, Ken Lauricella and JT
> Kellington for advice on various bits of this.

Stewart - thoughts on merging this?

I haven't had a chance to track down what's causing kexec failures on 
one of our machines/cards - it looks like the card involved may have 
been removed... it's not exactly a regression though, as it'll fail just 
as badly without this series. (Fred: did freak lose its FPGA card?)

I haven't seen any issues with fast reboot, and I haven't seen any kexec 
problems except on one specific box. I'm happy to do a bit more work on 
tracking down the failure, but with upcoming leave it'll probably take me 
another month, and I don't think anyone else from the CAPI team is able to 
take this on.


Andrew


>
> Andrew
>
> ---
>
> Changes since V2:
>
> * Address some code review comments from Gavin
>
> Changes since V1:
>
> * Add some more comments (thanks Fred)
>
> Changes since RFC V1:
>
> * switched to using a host sync notifier so the old kernel triggers the
> creset, rather than relying on the new kernel, and we don't need a new OPAL
> call to check CAPI state (suggested by Stewart)
>
> * code style comments from Gavin
>
> * minor tidying up around the place
>
> Andrew Donnellan (5):
>   core/pci: remove misleading fast reboot comment
>   fast-reboot: creset PHBs on fast reboot
>   hw/phb3: disable CAPI mode during complete reset
>   hw/phb3: add host sync notifier to trigger creset/CAPP disable on kexec
>   fast-reboot: remove CAPI check
>
>  core/fast-reboot.c |  12 +---
>  core/pci.c         |  23 +++++-
>  hw/phb3.c          | 181 ++++++++++++++++++++++++++++++++++++++++++++--
>  include/phb3.h     |   9 +-
>  4 files changed, 202 insertions(+), 23 deletions(-)
>
> base-commit: e0225ccaf9bd6c8882a2839256d07645737836e4
>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan at au1.ibm.com  IBM Australia Limited


