[Skiboot] [PATCH v7 18/22] fadump: Add documentation

Mahesh J Salgaonkar mahesh at linux.vnet.ibm.com
Tue May 21 02:40:54 AEST 2019


On 2019-05-16 15:35:21 Thu, Nicholas Piggin wrote:
> Vasant Hegde's on May 14, 2019 9:23 pm:
> > On 05/09/2019 10:28 AM, Nicholas Piggin wrote:
> >> Vasant Hegde's on April 13, 2019 7:15 pm:
> > The kernel informs OPAL about the ranges of memory to be preserved during
> > MPIPL (source, destination, size).
> 
> Well it also contains crashing_cpu, type, and comes in this clunky
> structure.
> 
> > After reboot, we get the resulting ranges back from hostboot. We pass those
> > to the kernel via the device tree.
> > 
> >> 
> >> Why not just an API which can add a range, and delete a range, and
> >> that's it? Range would just be physical start, end, plus an arbitrary
> >> tag (which caller can use to retrieve metadata that is used to
> >> decipher the dump).
> > 
> > We want a one-to-one mapping between source and destination.
> 
> Ah yes, sure that too. So two calls, one which adds or removes
> (source, dest, length) entries, and another which sets a tag.
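
For reference, based purely on the description in this thread, the
single-call registration structure under discussion looks roughly like
the sketch below. The field names and layout here are illustrative
only, not the exact layout from the patch:

#include <stdint.h>

/*
 * Illustrative sketch only: one entry per memory range the kernel
 * wants moved/preserved across the MPIPL boot.
 */
struct fadump_section {
	uint64_t src_addr;	/* start of the range to preserve */
	uint64_t dest_addr;	/* where the content is moved to */
	uint64_t size;		/* length of the range in bytes */
};

/* Illustrative registration blob passed to OPAL in a single call. */
struct fadump {
	uint32_t type;		/* dump type */
	uint32_t crashing_cpu;	/* CPU that triggered the crash */
	uint32_t section_count;	/* number of entries in section[] */
	struct fadump_section section[];
};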

The list of memory ranges selected by the kernel to be moved/preserved is
based on the fact that they will be overwritten on the next boot after a
crash, either by the new kernel, the initrd, or some f/w code load. Hence
it is important to move and preserve all of that memory content to create
a valid dump. Failure to move/preserve even one memory range would result
in an invalid/incomplete vmcore.

Hence, the idea was to have a single call that adds all the ranges at once
or fails the registration. Splitting this into one call per range widens
the window in which only a partial set of ranges is registered for MPIPL.
If OPAL crashes during registration (between add API calls), the
subsequent MPIPL would result in the kernel creating an
incomplete/incorrect vmcore. If we decide to have multiple API calls to
add memory ranges, then we would also need an API to complete the
registration, with which the kernel indicates to OPAL that it is done
adding ranges and the changes can be committed.
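
For illustration, a per-range interface along the lines suggested above,
extended with the commit step just described, might look something like
this. The names and signatures here are hypothetical, not an actual OPAL
API:

#include <stdint.h>

/* Hypothetical ops for a per-range registration call. */
enum mpipl_op {
	MPIPL_ADD_RANGE,	/* add one (src, dest, size) entry */
	MPIPL_REMOVE_RANGE,	/* withdraw a previously added entry */
};

/* Add or remove a single memory range to be preserved on MPIPL. */
int64_t opal_mpipl_update(enum mpipl_op op, uint64_t src,
			  uint64_t dest, uint64_t size);

/* Set an opaque tag the kernel uses after the MPIPL boot to locate
 * its dump metadata. */
int64_t opal_mpipl_set_tag(uint64_t tag);

/*
 * Commit the accumulated ranges. Until this succeeds, OPAL treats
 * the registration as incomplete and must discard the partial set
 * on the next MPIPL, so the kernel never assembles a vmcore from
 * half-registered ranges.
 *
 * Example sequence:
 *	opal_mpipl_update(MPIPL_ADD_RANGE, src1, dest1, size1);
 *	opal_mpipl_update(MPIPL_ADD_RANGE, src2, dest2, size2);
 *	opal_mpipl_set_tag(metadata_addr);
 *	opal_mpipl_commit();
 */
int64_t opal_mpipl_commit(void);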

> 
> > Also we have
> > to update this information in HDAT so that hostboot can access it.
> 
> That's okay though, isn't it? You can return failure if you don't
> have enough room.
> 
> > Also, having a structure allows us to pass all this information nicely to OPAL.
> 
> I don't think OPAL needs to know about the kernel crash metadata, and
> it could get its own by looking at addresses and tags that come up.
> Although I'm not really convinced it's a good idea to have a
> cooperative system where you have kernel and OPAL both managing crash
> dumps at the same time... I really think OPAL crash information and
> especially when the host is running could benefit from more thought.
> 
> > Finally, this is a similar concept to what we have in PowerVM LPARs as well.
> > Hence I have added the structure.
> 
> Is that a point for or against this structure? :)
> 
> Thanks,
> Nick
> 

-- 
Mahesh J Salgaonkar


