[Skiboot] [PATCH v1] PCI: pci should throw error on platform PCI devices not being detected

Gavin Shan gwshan at linux.vnet.ibm.com
Thu Jun 9 21:47:25 AEST 2016


On Thu, Jun 09, 2016 at 02:29:56PM +0530, Mamatha Inamdar wrote:
>Problem Description: Sometimes system boots to petitboot and get into a state where
>only the PHBs were detected and *no* other PCI devices.
>
>Fix: This patch is to check the detected PCI devices against the PCI slot 
>table in the platform definition and display an error if they don't match 
>and commit an errorlog.
>
>Test Results:
>After testing the patch, we see following traces on the SOL console.
>[8212824503,5] PCI: Check for a present device...
>[8212921065,3] Slot3 PCI: No device found
>[8212987391,5] Device Found in SLOT= Backplane PLX
>[8213085726,3] Slot4 PCI: No device found 
>
>Signed-off-by: Mamatha Inamdar <mamatha4 at linux.vnet.ibm.com>

Mamatha, there is root port behind PHB at least. It's conflicting
with the PCI hotplug patchset where the "struct pci_slot_info" is
replaced by "struct pci_slot".

>---
> core/pci.c         |   33 +++++++++++++++++++++++++++++++++
> include/errorlog.h |    3 ++-
> 2 files changed, 35 insertions(+), 1 deletion(-)
>
>diff --git a/core/pci.c b/core/pci.c
>index 9b238d0..f7ca477 100644
>--- a/core/pci.c
>+++ b/core/pci.c
>@@ -20,6 +20,7 @@
> #include <pci-cfg.h>
> #include <timebase.h>
> #include <device.h>
>+#include <errorlog.h>
> #include <fsp.h>
> 
> #define MAX_PHB_ID	256
>@@ -47,6 +48,9 @@ int last_phb_id = 0;
> 	      ((_bdfn) >> 8) & 0xff,			\
> 	      ((_bdfn) >> 3) & 0x1f, (_bdfn) & 0x7, ## a)
> 
>+DEFINE_LOG_ENTRY(OPAL_RC_PCI_DETECT_DEV, OPAL_PLATFORM_ERR_EVT, OPAL_PCI,
>+		OPAL_MISC_SUBSYSTEM,OPAL_PREDICTIVE_ERR_GENERAL,
>+		OPAL_NA);
> /*
>  * Generic PCI utilities
>  */
>@@ -1510,6 +1514,28 @@ static void pci_do_jobs(void (*fn)(void *))
> 	free(jobs);
> }
> 
>+static void scan_present_device(struct phb *phb)
>+{
>+	int64_t rc;
>+	struct pci_device *pd;
>+
>+	/*
>+	for PCI/PCI-X, we get the slot info and heck
>+	if the PHB has anything connected to it
>+	*/

I guess the log format can be improved.

>+	while ((pd = list_pop(&phb->devices, struct pci_device, link)) != NULL) {

This check is unecessary as you're going to check the link behind the root port
with phb->ops->presence_detect(). There are no PCI devices behind the root port
when it returns !OPAL_SHPC_DEV_PRESENT.

>+		if (platform.pci_get_slot_info)
>+			platform.pci_get_slot_info(phb, pd);

platform.pci_get_slot_info could be NULL or there is no PCI slot behind the root
port, meaning pd->slot_info can be NULL.

>+
>+		rc = phb->ops->presence_detect(phb);
>+		if (rc != OPAL_SHPC_DEV_PRESENT)
>+			log_simple_error(&e_info(OPAL_RC_PCI_DETECT_DEV), "%s "
>+			"PCI: No device found\n", pd->slot_info->label);
>+		else
>+			prlog(PR_NOTICE, "Device Found in SLOT= %s\n", pd->slot_info->label);

I don't think we need this log as the dumped devices reveals the PCI enumeration
result. By the way, I don't understand why we need log_simple_error() which would
be passed to management software if I'm correct enough, could you please explain
the purpose?

>+	}
>+}
>+
> void pci_init_slots(void)
> {
> 	unsigned int i;
>@@ -1538,6 +1564,13 @@ void pci_init_slots(void)
> 
> 		phbs[i]->ops->phb_final_fixup(phbs[i]);
> 	}
>+
>+	prlog(PR_NOTICE, "PCI: Check for a present device...\n");

This log seems unnecessary.

>+	for (i = 0; i < ARRAY_SIZE(phbs); i++) {
>+		if (!phbs[i])
>+			continue;
>+		scan_present_device(phbs[i]);

It could be run on multiple CPUs as we do in pci_scan_phb().

>+	}
> }
> 
> /*
>diff --git a/include/errorlog.h b/include/errorlog.h
>index b8fca7d..d6f18d1 100644
>--- a/include/errorlog.h
>+++ b/include/errorlog.h
>@@ -280,7 +280,8 @@ enum opal_reasoncode {
> 	OPAL_RC_PCI_INIT_SLOT   = OPAL_PC | 0x10,
> 	OPAL_RC_PCI_ADD_SLOT    = OPAL_PC | 0x11,
> 	OPAL_RC_PCI_SCAN        = OPAL_PC | 0x12,
>-	OPAL_RC_PCI_RESET_PHB   = OPAL_PC | 0x10,
>+	OPAL_RC_PCI_RESET_PHB   = OPAL_PC | 0x13,
>+	OPAL_RC_PCI_DETECT_DEV  = OPAL_PC | 0x14,
> /* ATTN */
> 	OPAL_RC_ATTN		= OPAL_AT | 0x10,
> /* MEM_ERR */
>

Thanks,
Gavin



More information about the Skiboot mailing list