[Skiboot] [PATCH v7 18/22] fadump: Add documentation

Tue Jun 4 20:15:28 AEST 2019

On Thu, May 30, 2019 at 1:16 PM Hari Bathini <hbathini at linux.ibm.com> wrote:
>
>
> On 29/05/19 12:09 PM, Oliver wrote:
> [...]
> >>> I'm wondering what we can safely do once we hit the final step. As far
> >>> as I can tell the intention is to boot into the same kernel that we
> >>> crashed from so that it can run makedumpsterfire to produce a
> >>> crashdump, invalidate the dump, and continue to boot into a
> >>> functioning OS. However I don't see how we'd actually guarantee that
> >>> actually happens. I realise that it's *probably* going to work most of
> >>> the time since we'll probably be running the same kernel that's the
> >>> default boot option, but surely we can come up with something that's
> >>> less jank.
> >> Yes. Idea is to boot back the same kernel. Since we are initializing everything
> >> again most of the time it will work fine (much better than kdump situation).
> > I'm not really convinced that MPIPL is drastically better than kdump.
> > The main reason kdump doesn't work well is GPUs with NVlink. As far as
>
>
> No. There are other reasons why FW assisted dump is better
> than KDump. KDump is susceptible to device state inconsistencies,
> device driver robustness, DMAs in flight, buggy software stomping
> on reserved memory where KDump kernel is loaded and
> reinitialization issues (PCI bus, PHB, etc..) with newer platform
> hardware and adapters...

When entering a kdump kernel we do a complete reset of each PHB before
scanning them and this is *identical* to what happens in the normal
IPL, fast-reboot, and MPIPL paths. See Linux commit 361f2a2a1536
("powerpc/powernv: Reset PHB in kdump kernel").

> > I can tell MPIPL doesn't do anything to help there.
>
> I don't know if it is not the case and why though. But definitely
> something to get right..

The real question is: after a GPU has done something bad and the
NPU has gone out to lunch, can we actually recover without a power
cycle? The actual implementation of MPIPL is a mess of code split
between hostboot and the SBE and I'm not entirely sure what happens in
the MPIPL case. If it re-scans the NPU then it might be ok. I can't
tell.

> >>> For contrast the kdump approach allows the crashing kernel to specify
> >>> what the crash environment is going to look like. If I were an OS
> >>> vendor I'd say that's a pretty compelling reason to use kdump instead
> >>> of this. If the main benefit of fadump is that we can reliably reset
> >>> and reinitialise hardware devices then maybe we should look at trying
> >>> to use MPIPL as an alternative kdump entry path. Rather than having
> >>> skiboot load petitboot from flash, we could have skiboot enter the
> >>> preloaded crash kernel and go from there.
> >> preloaded crash kernel is similar to kdump right? I don't think we gain
> >> much from this approach.
> > Can you actually respond to what I'm saying rather than dismissing it
> > out of hand with a non-argument?
> >
> > If we can use MPIPL to make kdump more robust then I think we should
> > do that rather than having a completely separate mechanism to capture
>
>
> KDump uses kexec to boot the next kernel and I am not sure
> where MPIPL fits in this. But I understand your intention that
> KDump can be hardened instead of going for a new approach
> but KDump is a continuous chase with every new driver and
> hardware and the fix most often is needed in those modules
> but with f/w assisted dump, the hardening part is relatively
> straightforward with newer hardware...

The ABI between skiboot and the petitboot kernel is the same ABI as
kexec. There's no reason we can't go directly from skiboot into an
alternative payload, such as a kdump kernel. We could use that as a
basis for a hardened kdump rather than going down the fadump route,
which requires a trip through petitboot and whatever bugs that
introduces.

> > a crash dump. One of the goals of OpenPower is to have tooling and
> > processes that are consistent with what is used by the rest of the
> > industry rather than inventing IBM specific ways of doing everything.
> > So why are we doing this instead? I'm not saying that there's no good
> > reasons to take the approach you have, but you, Hari and Mahesh need
> > to do a better job of spelling them out. Have you spoken with anyone
> > from SuSE or RH about what they would prefer?
>
>
> The reasons I mentioned above are why we prefer this approach.
> pHyp guests have support for f/w assisted dump as alternative
> for KDump and there are customers using this for the reliability
> it brings to the table over KDump. We are trying to extend such
> support to OPAL platforms for the same reasons...
>
> Thanks
> Hari
>