[PATCH 3/3] watchdog/aspeed: add support for dual boot
linux at roeck-us.net
Thu Aug 22 04:10:09 AEST 2019
On Wed, Aug 21, 2019 at 08:42:24PM +0300, Alexander Amelkin wrote:
> 21.08.2019 19:32, Guenter Roeck wrote:
> > On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote:
> >> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
> >> to clear out boot code source and re-enable access to the primary SPI flash
> >> chip while booted via wdt2 from the alternate chip.
> >> AST2400 datasheet says:
> >> "In the 2nd flash booting mode, all the address mapping to CS0# would be
> >> re-directed to CS1#. And CS0# is not accessable under this mode. To access
> >> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
> >> register WDT30.bit."
> > Is there reason to not do this automatically when loading the module
> > in alt-boot mode ? What means does userspace have to determine if CS0
> > or CS1 is active at any given time ? If there is reason to ever have CS1
> > active instead of CS0, what means would userspace have to enable it ?
> Yes, there is. The driver is loaded long before the filesystems are mounted. The filesystems, in the event of alternate/recovery boot, need to be mounted from the same chip that the kernel was booted. For one reason because the main chip at CS0 is most probably corrupt. If you clear that bit when driver is loaded, your software will not know that and will try to mount the wrong filesystems. The whole idea of ASPEED's switching chipselects is to have identical firmware in both chips, without the need to process the alternate boot state in any way except for indicating a successful boot and restoring access to CS0 when needed.
> The userspace can read bootstatus sysfs node to determine if an alternate boot has occured.
> With ASPEED, CS1 is activated automatically by wdt2 when system fails to boot from the primary flash chip (at CS0) and disable the watchdog to indicate a successful boot. When that happens, both CS0 and CS1 controls get routed in hardware to CS1 line, making the primary flash chip inaccessible. Depending on the architecture of the user-space software, it may choose to re-enable access to the primary chip via CS0 at different times. There must be a way to do so.
So by activating cs0, userspace would essentially pull its own root file system
from underneath itself ?
> > If userspace can not really determine if CS1 or CS0 is active, all it could
> > ever do was to enable CS0 to be in a deterministic state. If so, it doesn't
> > make sense to ever have CS1 active, and re-enabling CS0 could be automatic.
> > Similar, if CS1 can ever be enabled, there is no means for userspace to ensure
> > that some other application did not re-enable CS0 while it believes that CS1
> > is enabled. If there is no means for userspace to enable CS1, it can never be
> > sure what is enabled (because some other entity may have enabled CS0 while
> > userspace just thought that CS1 is still enabled). Again, the only means
> > to guarantee a well defined state would be to explicitly enable CS0 and provive
> > no means to enable CS1. Again, this could be done during boot, not requiring
> > an explicit request from userspace.
> Please understand that activation of CS1 in place of CS0 is NOT a software choice!
> >> + if (unlikely(!wdt))
> >> + return -ENODEV;
> >> +
> > How would this ever happen, and how / where is drvdata set to NULL ?
> This is purely for robustness. Seeing a pointer obtained via a function accessed without first checking it for validity makes me nervous.
This is not how kernel code is commonly written.
Sure, we could add similar checks to each sysfs access code in the kernel,
blowing up its size significantly. I do not see a point of this.
> This code most probably adds nothing at the assembly level.
That seems quite unlikely. Please demonstrate.
> >> + writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
> >> + wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
> >> + wdt->wdd.bootstatus |= WDIOF_EXTERN1;
> > The variable reflects the _boot status_. It should not change after booting.
> Is there any documentation that dictates that? All I could find is
> "bootstatus: status of the device after booting". That doesn't look to me like it absolutely can not change to reflect the updated status (that is, to reflect that the originally set up alternate CS routing has been reset to normal).
You choose to interpret "after booting" in a kind of novel way,
which I find a bit disturbing. I am not really sure how else to
describe "boot status" in a way that does not permit such
reinterpratation of the term.
On top of that, how specifically would "WDIOF_EXTERN1" reflect
what you claim it does ? Not only you are hijacking bootstatus9
(which is supposed to describe the reason for a reboot), you
are also hijacking WDIOF_EXTERN1. That seems highly arbitrary
to me, and is not really how an API/ABI should be used.
> If you absolutely disallow that, I think we could make 'access_cs0' readable instead, so it could report the current state of the boot code selection bit. Reverted, I suppose. That way 'access_cs0' would report 1 after 1 has been written to it (it wouldn't be possible to write a zero).
> > @@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
> > wdt->ctrl = WDT_CTRL_1MHZ_CLK;
> > + if (of_property_read_bool(np, "aspeed,alt-boot"))
> > + wdt->wdd.groups = bswitch_groups;
> > +
> > Why does this have to be separate to the existing evaluation of
> > aspeed,alt-boot, and why does the existing code not work ?
> > Also, is it guaranteed that this does not interfer with existing
> > support for alt-boot ?
> I think Ivan will comment on this.
> With best regards,
> Alexander Amelkin,
> BIOS/BMC Team Lead, YADRO
More information about the Linux-aspeed