[Skiboot] [PATCH v3 0/5] CAPI disabling support for kexec/fast reboot
Andrew Donnellan
andrew.donnellan at au1.ibm.com
Wed Mar 1 11:34:10 AEDT 2017
On 27/01/17 18:33, Andrew Donnellan wrote:
> Currently, if you attempt to kexec or fast reboot from a machine with a
> CAPI card and the cxl driver loaded, you are going to have an exceedingly
> bad time. It turns out that the hardware doesn't really cope very well with
> going through the standard Linux PCI initialisation process while a PHB is
> still in CAPI mode. Checkstops everywhere!
>
> This series implements support for switching a PHB from CAPI mode back to
> regular PCIe mode during a complete reset. The SCOM sequences have been
> derived through a mix of advice from other IBMers, reading through various
> internal workbooks, and trial and error.
>
> This has only been lightly tested - I've kexec-ed/fast rebooted quite a few
> times with no real problems, and I've run some basic CAPI tests that don't
> seem to fail too much more than they normally fail. As this procedure
> involves forcing the CAPP into recovery, we do see a lot of HMIs but as far
> as I'm aware they're harmless and there's not much we can really do about
> them.
>
> At this stage, I haven't thought too hard about whether we can
> disable CAPI mode while Linux is running for e.g. PCI hotplug, which could
> get tricky. That's a question for later...
>
> Thanks to Vaibhav Jain (who made a previous attempt at this), Mikey
> Neuling, Ben Herrenschmidt, Gavin Shan, Bill Daly, Ken Lauricella and JT
> Kellington for advice on various bits of this.
Stewart - thoughts on merging this?
I haven't had a chance to track down what's causing kexec failures on
one of our machines/cards - it looks like the card involved may have
been removed... it's not exactly a regression though, as it'll fail just
as badly without this series. (Fred: did freak lose its FPGA card?)
I haven't seen any issues with fast reboot, and I haven't seen any kexec
problems except on one specific box. I'm happy to do a bit more work on
tracking down the failure, but with upcoming leave it'll probably take me
another month, and I don't think anyone else from the CAPI team is able to
take this on.
Andrew
>
> Andrew
>
> ---
>
> Changes since V2:
>
> * Address some code review comments from Gavin
>
> Changes since V1:
>
> * Add some more comments (thanks Fred)
>
> Changes since RFC V1:
>
> * switched to using a host sync notifier so the old kernel triggers the
> creset, rather than relying on the new kernel, and we don't need a new OPAL
> call to check CAPI state (suggested by Stewart)
>
> * code style comments from Gavin
>
> * minor tidying up around the place
>
> Andrew Donnellan (5):
> core/pci: remove misleading fast reboot comment
> fast-reboot: creset PHBs on fast reboot
> hw/phb3: disable CAPI mode during complete reset
> hw/phb3: add host sync notifier to trigger creset/CAPP disable on kexec
> fast-reboot: remove CAPI check
>
> core/fast-reboot.c | 12 +---
> core/pci.c | 23 +++++-
> hw/phb3.c | 181 ++++++++++++++++++++++++++++++++++++++++++++--
> include/phb3.h | 9 +-
> 4 files changed, 202 insertions(+), 23 deletions(-)
>
> base-commit: e0225ccaf9bd6c8882a2839256d07645737836e4
>
--
Andrew Donnellan OzLabs, ADL Canberra
andrew.donnellan at au1.ibm.com IBM Australia Limited