[Skiboot] [PATCH v1] PCI: pci should throw error on platform PCI devices not being detected

Mamatha Inamdar mamatha4 at linux.vnet.ibm.com
Fri Jun 10 16:01:08 AEST 2016


Hi Gavin,

Thanks for the review...
I will work on all your comments, please find my comments below


On 06/09/2016 05:17 PM, Gavin Shan wrote:
> On Thu, Jun 09, 2016 at 02:29:56PM +0530, Mamatha Inamdar wrote:
>> Problem Description: Sometimes system boots to petitboot and get into a state where
>> only the PHBs were detected and *no* other PCI devices.
>>
>> Fix: This patch is to check the detected PCI devices against the PCI slot
>> table in the platform definition and display an error if they don't match
>> and commit an errorlog.
>>
>> Test Results:
>> After testing the patch, we see following traces on the SOL console.
>> [8212824503,5] PCI: Check for a present device...
>> [8212921065,3] Slot3 PCI: No device found
>> [8212987391,5] Device Found in SLOT= Backplane PLX
>> [8213085726,3] Slot4 PCI: No device found
>>
>> Signed-off-by: Mamatha Inamdar <mamatha4 at linux.vnet.ibm.com>
> Mamatha, there is root port behind PHB at least. It's conflicting
> with the PCI hotplug patchset where the "struct pci_slot_info" is
> replaced by "struct pci_slot".

So, Can I wait until new patchset integrated into skiboot?

>
>> ---
>> core/pci.c         |   33 +++++++++++++++++++++++++++++++++
>> include/errorlog.h |    3 ++-
>> 2 files changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/core/pci.c b/core/pci.c
>> index 9b238d0..f7ca477 100644
>> --- a/core/pci.c
>> +++ b/core/pci.c
>> @@ -20,6 +20,7 @@
>> #include <pci-cfg.h>
>> #include <timebase.h>
>> #include <device.h>
>> +#include <errorlog.h>
>> #include <fsp.h>
>>
>> #define MAX_PHB_ID	256
>> @@ -47,6 +48,9 @@ int last_phb_id = 0;
>> 	      ((_bdfn) >> 8) & 0xff,			\
>> 	      ((_bdfn) >> 3) & 0x1f, (_bdfn) & 0x7, ## a)
>>
>> +DEFINE_LOG_ENTRY(OPAL_RC_PCI_DETECT_DEV, OPAL_PLATFORM_ERR_EVT, OPAL_PCI,
>> +		OPAL_MISC_SUBSYSTEM,OPAL_PREDICTIVE_ERR_GENERAL,
>> +		OPAL_NA);
>> /*
>>   * Generic PCI utilities
>>   */
>> @@ -1510,6 +1514,28 @@ static void pci_do_jobs(void (*fn)(void *))
>> 	free(jobs);
>> }
>>
>> +static void scan_present_device(struct phb *phb)
>> +{
>> +	int64_t rc;
>> +	struct pci_device *pd;
>> +
>> +	/*
>> +	for PCI/PCI-X, we get the slot info and heck
>> +	if the PHB has anything connected to it
>> +	*/
> I guess the log format can be improved.

Sure, will update this...

>
>> +	while ((pd = list_pop(&phb->devices, struct pci_device, link)) != NULL) {
> This check is unecessary as you're going to check the link behind the root port
> with phb->ops->presence_detect(). There are no PCI devices behind the root port
> when it returns !OPAL_SHPC_DEV_PRESENT.
>
>> +		if (platform.pci_get_slot_info)
>> +			platform.pci_get_slot_info(phb, pd);
> platform.pci_get_slot_info could be NULL or there is no PCI slot behind the root
> port, meaning pd->slot_info can be NULL.
>
>> +
>> +		rc = phb->ops->presence_detect(phb);
>> +		if (rc != OPAL_SHPC_DEV_PRESENT)
>> +			log_simple_error(&e_info(OPAL_RC_PCI_DETECT_DEV), "%s "
>> +			"PCI: No device found\n", pd->slot_info->label);
>> +		else
>> +			prlog(PR_NOTICE, "Device Found in SLOT= %s\n", pd->slot_info->label);
> I don't think we need this log as the dumped devices reveals the PCI enumeration
> result. By the way, I don't understand why we need log_simple_error() which would
> be passed to management software if I'm correct enough, could you please explain
> the purpose?

Actually the requirement is...In skiboot check for the detected PCI 
devices against what's
in the slot table in the platform definition and throwing an error if 
the device is not present.
This way, there'd be a SEL/eSEL/PEL log saying "devices not detected".
else we don't have any data saying that there are no PCI devices found.

In one of the defect we observed system boots to petitboot and displays 
nothing to boot from.

>
>> +	}
>> +}
>> +
>> void pci_init_slots(void)
>> {
>> 	unsigned int i;
>> @@ -1538,6 +1564,13 @@ void pci_init_slots(void)
>>
>> 		phbs[i]->ops->phb_final_fixup(phbs[i]);
>> 	}
>> +
>> +	prlog(PR_NOTICE, "PCI: Check for a present device...\n");
> This log seems unnecessary.
>
>> +	for (i = 0; i < ARRAY_SIZE(phbs); i++) {
>> +		if (!phbs[i])
>> +			continue;
>> +		scan_present_device(phbs[i]);
> It could be run on multiple CPUs as we do in pci_scan_phb().
>
>> +	}
>> }
>>
>> /*
>> diff --git a/include/errorlog.h b/include/errorlog.h
>> index b8fca7d..d6f18d1 100644
>> --- a/include/errorlog.h
>> +++ b/include/errorlog.h
>> @@ -280,7 +280,8 @@ enum opal_reasoncode {
>> 	OPAL_RC_PCI_INIT_SLOT   = OPAL_PC | 0x10,
>> 	OPAL_RC_PCI_ADD_SLOT    = OPAL_PC | 0x11,
>> 	OPAL_RC_PCI_SCAN        = OPAL_PC | 0x12,
>> -	OPAL_RC_PCI_RESET_PHB   = OPAL_PC | 0x10,
>> +	OPAL_RC_PCI_RESET_PHB   = OPAL_PC | 0x13,
>> +	OPAL_RC_PCI_DETECT_DEV  = OPAL_PC | 0x14,
>> /* ATTN */
>> 	OPAL_RC_ATTN		= OPAL_AT | 0x10,
>> /* MEM_ERR */
>>
> Thanks,
> Gavin
>



More information about the Skiboot mailing list