[Skiboot] [PATCH] OPAL:PCI should throw error on platform PCI devices not being detected

Mukesh Ojha mukesh02 at linux.vnet.ibm.com
Thu Jun 9 16:08:36 AEST 2016



On Thursday 09 June 2016 11:20 AM, Mamatha Inamdar wrote:
>
>
> On 06/08/2016 09:48 PM, Mukesh Ojha wrote:
>> Hi Mamatha,
>>
>> A few observations.
>>
>> On Wednesday 08 June 2016 09:00 PM, Mamatha Inamdar wrote:
>>> Problem Description: Some times system boots to petitboot and get 
>>> into a state where
>>> only the PHBs were detected and *no* other PCI devices.
>>
>> s/Some times/Sometimes
>
> Thanks..will update
>
>>
>>>
>>> Fix: This patch is to check the detected PCI devices against the PCI 
>>> slot table in the platform
>>> definition and display an error if they don't match and commit an 
>>> errorlog.
>>>
>>> Test Results:
>>> After testing the patch, we see the following traces on the SOL console.
>>> [8212824503,5] PCI: Check for a present device...
>>> [8212921065,3] Slot3 PCI: No device found
>>> [8212987391,5] Device Found in SLOT= Backplane PLX
>>> [8213085726,3] Slot4 PCI: No device found
>>>
>>> From: Mamatha Inamdar <mamatha4 at linux.vnet.ibm.com>
>>>
>>> Signed-off-by: Mamatha Inamdar <mamatha4 at linux.vnet.ibm.com>
>>> ---
>>>   core/pci.c         |   33 +++++++++++++++++++++++++++++++++
>>>   include/errorlog.h |    3 ++-
>>>   2 files changed, 35 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/core/pci.c b/core/pci.c
>>> index 9b238d0..987f69d 100644
>>> --- a/core/pci.c
>>> +++ b/core/pci.c
>>> @@ -20,6 +20,7 @@
>>>   #include <pci-cfg.h>
>>>   #include <timebase.h>
>>>   #include <device.h>
>>> +#include <errorlog.h>
>>>   #include <fsp.h>
>>>     #define MAX_PHB_ID    256
>>> @@ -47,6 +48,9 @@ int last_phb_id = 0;
>>>             ((_bdfn) >> 8) & 0xff,            \
>>>             ((_bdfn) >> 3) & 0x1f, (_bdfn) & 0x7, ## a)
>>>   +DEFINE_LOG_ENTRY(OPAL_RC_PCI_SLOT, OPAL_PLATFORM_ERR_EVT, OPAL_PCI,
>>> +        OPAL_MISC_SUBSYSTEM,OPAL_PREDICTIVE_ERR_GENERAL,
>>> +        OPAL_NA);
>>
>> Is this critical enough to be logged to the BMC?
>> For an error to be logged to the BMC, its severity should be greater than
>> 'OPAL_PREDICTIVE_ERR_FAULT_RECTIFY_REBOOT'.
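
To make the severity point concrete, here is a minimal sketch of the comparison I have in mind. The enum names, their numeric values and the helper sketch_logged_to_bmc() are illustrative assumptions, not copied from skiboot; please check include/errorlog.h and the BMC logging path for the real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative severity levels. The numeric values are assumptions made
 * for this sketch; the real definitions live in include/errorlog.h.
 */
enum sketch_severity {
	SKETCH_INFO                                = 0x00,
	SKETCH_PREDICTIVE_ERR_GENERAL              = 0x20,
	SKETCH_PREDICTIVE_ERR_FAULT_RECTIFY_REBOOT = 0x22,
	SKETCH_UNRECOVERABLE_ERR_GENERAL           = 0x40,
};

/*
 * Hypothetical helper: true when the severity is strictly greater than
 * the FAULT_RECTIFY_REBOOT threshold mentioned in the review comment.
 */
static bool sketch_logged_to_bmc(enum sketch_severity sev)
{
	return sev > SKETCH_PREDICTIVE_ERR_FAULT_RECTIFY_REBOOT;
}
```

Under these assumed values, an entry defined with a predictive "general" severity would stay below the threshold, which is why I am asking whether that severity is intended.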
>>>   /*
>>>    * Generic PCI utilities
>>>    */
>>> @@ -1510,6 +1514,28 @@ static void pci_do_jobs(void (*fn)(void *))
>>>       free(jobs);
>>>   }
>>>   +static void scan_present_device(struct phb *phb)
>>> +{
>>> +    int64_t rc;
>>> +    struct pci_device *pd;
>>> +
>>> +    /*
>>> +    for PCI/PCI-X, we get the slot info and check
>>> +    if the PHB has anything connected to it
>>> +    */
>>> +    while ((pd = list_pop(&phb->devices, struct pci_device, link)) != NULL) {
>>> +        if (platform.pci_get_slot_info)
>>> +            platform.pci_get_slot_info(phb, pd);
>>> +
>>> +        rc = phb->ops->presence_detect(phb);
>>> +        if (rc != OPAL_SHPC_DEV_PRESENT)
>>> +            log_simple_error(&e_info(OPAL_RC_PCI_SLOT), "%s "
>>> +            "PCI: No device found\n", pd->slot_info->label);
>>> +        else
>>> +            prlog(PR_NOTICE, "Device Found in SLOT= %s\n", pd->slot_info->label);
>>> +    }
>>> +}
>>> +
>>>   void pci_init_slots(void)
>>>   {
>>>       unsigned int i;
>>> @@ -1538,6 +1564,13 @@ void pci_init_slots(void)
>>>             phbs[i]->ops->phb_final_fixup(phbs[i]);
>>>       }
>>> +
>>> +    prlog(PR_NOTICE, "PCI: Check for a present device...\n");
>>> +    for (i = 0; i < ARRAY_SIZE(phbs); i++) {
>>> +        if (!phbs[i])
>>> +            continue;
>>> +        scan_present_device(phbs[i]);
>>> +    }
>>>   }
>>>     /*
>>> diff --git a/include/errorlog.h b/include/errorlog.h
>>> index b8fca7d..5b754f7 100644
>>> --- a/include/errorlog.h
>>> +++ b/include/errorlog.h
>>> @@ -280,7 +280,8 @@ enum opal_reasoncode {
>>>       OPAL_RC_PCI_INIT_SLOT   = OPAL_PC | 0x10,
>>>       OPAL_RC_PCI_ADD_SLOT    = OPAL_PC | 0x11,
>>>       OPAL_RC_PCI_SCAN        = OPAL_PC | 0x12,
>>> -    OPAL_RC_PCI_RESET_PHB   = OPAL_PC | 0x10,
>>> +    OPAL_RC_PCI_RESET_PHB   = OPAL_PC | 0x13,
>>> +    OPAL_RC_PCI_SLOT    = OPAL_PC | 0x14,
>>
>> Can't we use 'OPAL_RC_PCI_INIT_SLOT' here?
>
> In this patch we are not initializing the slot, so the above reason code does not fit.
> We are checking whether devices are detected in the available slots.
>

Then we could name it OPAL_RC_PCI_DETECT_SLOT.
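
Roughly, the PCI block of the enum would then look like the sketch below. The OPAL_PC value is only a placeholder so that the snippet compiles on its own, and pci_reasoncodes_unique() is a hypothetical helper added purely for illustration. It shows why moving OPAL_RC_PCI_RESET_PHB from 0x10 to 0x13 matters: with the old value it collided with OPAL_RC_PCI_INIT_SLOT.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder component prefix (assumption); the real OPAL_PC is in include/errorlog.h. */
#define OPAL_PC 0x5000

enum opal_reasoncode {
	OPAL_RC_PCI_INIT_SLOT   = OPAL_PC | 0x10,
	OPAL_RC_PCI_ADD_SLOT    = OPAL_PC | 0x11,
	OPAL_RC_PCI_SCAN        = OPAL_PC | 0x12,
	OPAL_RC_PCI_RESET_PHB   = OPAL_PC | 0x13, /* was 0x10, duplicating INIT_SLOT */
	OPAL_RC_PCI_DETECT_SLOT = OPAL_PC | 0x14, /* suggested name for the new code */
};

static const int pci_reasoncodes[] = {
	OPAL_RC_PCI_INIT_SLOT,
	OPAL_RC_PCI_ADD_SLOT,
	OPAL_RC_PCI_SCAN,
	OPAL_RC_PCI_RESET_PHB,
	OPAL_RC_PCI_DETECT_SLOT,
};

/* Hypothetical helper: returns 1 when no two PCI reason codes share a value, else 0. */
static int pci_reasoncodes_unique(void)
{
	size_t n = sizeof(pci_reasoncodes) / sizeof(pci_reasoncodes[0]);

	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (pci_reasoncodes[i] == pci_reasoncodes[j])
				return 0;
	return 1;
}
```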

-Mukesh

>>
>> Cheers,
>> -Mukesh
>>
>>>   /* ATTN */
>>>       OPAL_RC_ATTN        = OPAL_AT | 0x10,
>>>   /* MEM_ERR */
>>>
>>> _______________________________________________
>>> Skiboot mailing list
>>> Skiboot at lists.ozlabs.org
>>> https://lists.ozlabs.org/listinfo/skiboot
>>
>


