[Skiboot] [PATCH v2 4/5] hw/phb3: add host sync notifier to trigger creset/CAPP disable on kexec

Mon Jan 16 10:51:42 AEDT 2017

On Fri, Jan 13, 2017 at 04:09:40PM +1100, Andrew Donnellan wrote:
>To support kexec in Linux, we need to trigger a creset to disable CAPP mode
>on each PHB that has been attached to a CAPP.
>
>Add a host sync notifier, phb3_host_sync_reset(), that will be triggered by
>the opal_sync_host_reboot() call that Linux makes when "shutting down" a
>powernv system (this includes bringing the system down to prepare it for
>kexec). This notifier will trigger a creset only on PHBs that need it, and
>will poll regularly until the creset completes.
>
>This approach is somewhat hacky, as it's somewhat of an abuse of the host
>sync notifier system (IMHO), but it seems the most obvious way to
>ensure that the reset/CAPP disable occurs that will work with old kernel
>versions and not require additional support on the kernel side.
>
>Suggested-by: Stewart Smith <stewart at linux.vnet.ibm.com>
>Signed-off-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
>
>---
>
>v1->v2:
>* Add explanatory comment about use of host sync notifier (suggested by
>Fred)
>---
> hw/phb3.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
>diff --git a/hw/phb3.c b/hw/phb3.c
>index 87dc87f..05221a4 100644
>--- a/hw/phb3.c
>+++ b/hw/phb3.c
>@@ -4622,6 +4622,44 @@ static bool phb3_calculate_windows(struct phb3 *p)
> 	return true;
> }
> 
>+/*
>+ * Trigger a creset to disable CAPI mode on kernel shutdown.
>+ *
>+ * This helper is called repeatedly by the host sync notifier mechanism, which
>+ * relies on the kernel to regularly poll the OPAL_SYNC_HOST_REBOOT call as it
>+ * shuts down.
>+ *
>+ * This is a somewhat hacky abuse of the host sync notifier mechanism, but the
>+ * alternatives require a new API call which won't work for older kernels.
>+ */
>+static bool phb3_host_sync_reset(void *data)
>+{
>+	struct phb3 *p = (struct phb3 *)data;
>+	struct pci_slot *slot = p->phb.slot;
>+	struct proc_chip *chip = get_chip(p->chip_id);
>+	int64_t rc;
>+
>+	switch (slot->state) {
>+	case PHB3_SLOT_NORMAL:
>+		lock(&capi_lock);
>+		rc = (chip->capp_phb3_attached_mask & (1 << p->index)) ?
>+			OPAL_PHB_CAPI_MODE_CAPI :
>+			OPAL_PHB_CAPI_MODE_PCIE;
>+		unlock(&capi_lock);
>+
>+		if (rc == OPAL_PHB_CAPI_MODE_PCIE)
>+			return true;
>+
>+		PHBINF(p, "PHB in CAPI mode, resetting\n");
>+		p->flags &= ~PHB3_CAPP_RECOVERY;
>+		phb3_creset(slot);
>+		return false;
>+	default:
>+		rc = slot->ops.poll(slot);
>+		return rc == OPAL_SUCCESS;

The code seems incorrect. In opal_sync_host_reboot(), OPAL_BUSY_EVENT
is returned to Linux if any notifier returns false. Linux delays 10ms
and calls opal_sync_host_reboot() again, meaning all notifiers
(including phb3_host_sync_reset()) will be triggered again.
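
To illustrate the retry behaviour described above, here is a minimal
standalone model. It is not the skiboot implementation: the model_*
names, the sync_entry struct, the two test notifiers and the numeric
return codes are all assumptions made for the sketch. It only shows that
one notifier returning false makes every notifier run again on the next
pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative return codes; the numeric values are assumptions. */
enum { MODEL_SUCCESS = 0, MODEL_BUSY_EVENT = -12 };

typedef bool (*sync_notifier_fn)(void *data);

/* Hypothetical registration record: a callback plus its argument. */
struct sync_entry {
	sync_notifier_fn notify;
	void *data;
};

/*
 * One pass over the notifier list. Every notifier is called, and the
 * pass reports "busy" if at least one of them returned false.
 */
static int model_sync_host_reboot(struct sync_entry *ents, size_t n)
{
	int ret = MODEL_SUCCESS;

	for (size_t i = 0; i < n; i++)
		if (!ents[i].notify(ents[i].data))
			ret = MODEL_BUSY_EVENT;
	return ret;
}

/*
 * Caller side: repeat the pass until it reports success or max_passes
 * is reached. Returns the number of passes that were made.
 */
static int model_kernel_shutdown(struct sync_entry *ents, size_t n,
				 int max_passes)
{
	int passes = 0;

	while (passes < max_passes) {
		passes++;
		if (model_sync_host_reboot(ents, n) == MODEL_SUCCESS)
			break;
		/* Linux would delay about 10ms here before retrying. */
	}
	return passes;
}

/* Test notifier: counts its calls and is always finished. */
static bool count_and_finish(void *data)
{
	(*(int *)data)++;
	return true;
}

/* Test notifier: counts its calls and is finished from the third on. */
static bool finish_on_third_call(void *data)
{
	int *calls = data;

	return ++(*calls) >= 3;
}
```

With one count_and_finish and one finish_on_third_call registered, the
model needs three passes, and the notifier that was already finished is
still called on each of them.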

When slot->ops.poll(slot) returns an error, the PHB might be left in a
non-PHB3_SLOT_NORMAL state, and false is returned for that error. The
function is then called again and jumps to the default case. That's
obviously not what we want, so the code should be:

		return rc <= OPAL_SUCCESS;

Anyway, I think opal_sync_host_reboot() needs an enhancement to avoid
triggering all notifiers again on failure from any one of them. That's
out of scope for this patch, though.
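
The difference between "rc == OPAL_SUCCESS" and "rc <= OPAL_SUCCESS" can
be sketched with a small standalone model. It assumes that a positive
poll result means "still in progress", zero means "done" and a negative
value is an error. The MODEL_* constants, their numeric values and the
helper names are made up for the sketch and are not taken from skiboot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative codes; the numeric values are assumptions. */
#define MODEL_OPAL_SUCCESS	0
#define MODEL_OPAL_HARDWARE	(-6)

typedef bool (*done_fn)(int64_t rc);

/* Check as posted in v2: only a clean completion counts as done. */
static bool done_v2(int64_t rc)
{
	return rc == MODEL_OPAL_SUCCESS;
}

/* Suggested check: completion or an error both end the polling. */
static bool done_suggested(int64_t rc)
{
	return rc <= MODEL_OPAL_SUCCESS;
}

/*
 * Feed a sequence of poll results to a check. Returns the 1-based
 * index of the poll at which the notifier reports done, or -1 if it
 * never does within n polls.
 */
static int polls_until_done(done_fn done, const int64_t *rcs, int n)
{
	for (int i = 0; i < n; i++)
		if (done(rcs[i]))
			return i + 1;
	return -1;
}
```

For a slot whose poll keeps failing, done_v2 never reports done in this
model, so the caller would keep retrying, while done_suggested stops at
the first error. For a slot that completes normally the two checks
behave the same.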

>+	}
>+}
>+
> static void phb3_create(struct dt_node *np)
> {
> 	const struct dt_property *prop;
>@@ -4755,6 +4793,8 @@ static void phb3_create(struct dt_node *np)
> 	/* Load capp microcode into capp unit */
> 	capp_load_ucode(p);
> 
>+	opal_add_host_sync_notifier(phb3_host_sync_reset, p);
>+
> 	/* Platform additional setup */
> 	if (platform.pci_setup_phb)
> 		platform.pci_setup_phb(&p->phb, p->index);

Thanks,
Gavin