fsi/sbefifo problems on bmc

Ivan Mikhaylov i.mikhaylov at yadro.com
Wed Jul 3 22:34:53 AEST 2019


On Wed, 2019-07-03 at 10:07 +1000, Benjamin Herrenschmidt wrote:
> 
> Nothing obvious comes to mind. Is basic CFAM access working ? Can you
> check the SBE it still up and running and not crashed ?
> 
> You rescan FSI on power on ?
> 
> Cheers,
> Ben.
> 

Ben, this is what I have before 'obmcutil poweron':
root@nicole:~# pdbg -a probe
pdbg: Unable to open /sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw: No such file or directory
root@nicole:~# echo 1 > /sys/devices/platform/gpio-fsi/fsi0/rescan
root@nicole:~# ls /sys/devices/platform/gpio-fsi/fsi0/
break    of_node/ rescan   uevent

After 'obmcutil poweron':

root@nicole:~# pdbg -a probe
fsi0: Kernel based FSI master (*)
    fsi1: CFAM hMFSI Port
        pib1: POWER FSI2PIB (*)
            chiplet0: POWER9 Chiplet
                core0: POWER9 Core (*)
                    thread0: POWER9 Thread (*)
...
(many more chiplets, cores, etc.)
root@nicole:~# pdbg -a getscom 0x6C708
p1: 0x000000000006c708 = 0x4008152900000000 (/kernelfsi@0/hmfsi@100000/pib@1000)
p0: 0x000000000006c708 = 0x400816fd00000000 (/kernelfsi@0/pib@1000)
p1: 0x000000002006c708 = 0x00000000000f0000 (/kernelfsi@0/hmfsi@100000/pib@1000/chiplet@20000000/core@0)
...
(many more accesses)
root@nicole:~# ls /sys/devices/platform/gpio-fsi/fsi0/slave\@00\:00/
00:00:00:00       00:00:00:03       00:00:00:09       00:00:00:0d       00:00:00:10       00:00:00:13       dev               send_echo_delays
00:00:00:01       00:00:00:06       00:00:00:0a       00:00:00:0e       00:00:00:11       cfam_id           of_node           send_term
00:00:00:02       00:00:00:07       00:00:00:0c       00:00:00:0f       00:00:00:12       chip_id           raw               uevent
root@nicole:~# echo 1 > /sys/devices/platform/gpio-fsi/fsi0/rescan
root@nicole:~# ls /sys/devices/platform/gpio-fsi/fsi0/slave\@00\:00/
00:00:00:00       00:00:00:03       00:00:00:09       00:00:00:0d       00:00:00:10       00:00:00:13       dev               send_echo_delays
00:00:00:01       00:00:00:06       00:00:00:0a       00:00:00:0e       00:00:00:11       cfam_id           of_node           send_term
00:00:00:02       00:00:00:07       00:00:00:0c       00:00:00:0f       00:00:00:12       chip_id           raw               uevent


After 'obmcutil poweroff':

root@nicole:~# pdbg -a getscom 0x6C708
pdbg: Failed to read from 0x00003410 (0000000000003404): No such device
pdbg: Failed to write to 0x00003410 (0000000000003404): No such device
Unable to enable HMFSI port 1
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
root@nicole:~# pdbg -a probe
pdbg: Failed to read from 0x00003410 (0000000000003404): No such device
pdbg: Failed to write to 0x00003410 (0000000000003404): No such device
Unable to enable HMFSI port 1
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
fsi0: Kernel based FSI master (*)
root@nicole:~# ls /sys/devices/platform/gpio-fsi/fsi0/
break        of_node      rescan       slave@00:00  uevent
root@nicole:~# ls /sys/devices/platform/gpio-fsi/fsi0/slave\@00\:00/
00:00:00:00/      00:00:00:03/      00:00:00:09/      00:00:00:0d/      00:00:00:10/      00:00:00:13/      dev               send_echo_delays
00:00:00:01/      00:00:00:06/      00:00:00:0a/      00:00:00:0e/      00:00:00:11/      cfam_id           of_node/          send_term
00:00:00:02/      00:00:00:07/      00:00:00:0c/      00:00:00:0f/      00:00:00:12/      chip_id           raw               uevent

As far as I can see, the slave is still present here along with its engines,
including sbefifo, which in our case is '00:00:00:06'.
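
To script the same check, here is a minimal Python sketch that lists the engine
entries under the slave directory. The helper name list_fsi_engines and the
address pattern are assumptions made for illustration (the pattern is derived
only from the listings above); the sysfs path is the one used in this session.

```python
import os
import re

# Engine entries look like "00:00:00:06" in the listings above; this pattern
# is an assumption derived from those listings, not from kernel documentation.
ENGINE_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){3}$")

SLAVE_DIR = "/sys/devices/platform/gpio-fsi/fsi0/slave@00:00"


def list_fsi_engines(slave_dir=SLAVE_DIR):
    """Return sorted engine entries under slave_dir, or [] if it is missing."""
    try:
        names = os.listdir(slave_dir)
    except FileNotFoundError:
        return []
    return sorted(n for n in names if ENGINE_RE.match(n))


if __name__ == "__main__":
    engines = list_fsi_engines()
    print("engines:", engines)
    print("sbefifo (00:00:00:06) present:", "00:00:00:06" in engines)
```

Returning an empty list when the directory is missing keeps the "slave gone
after rescan" case from raising, so the script can be run in any power state.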

After rescan:
root@nicole:~# echo 1 > /sys/devices/platform/gpio-fsi/fsi0/rescan
root@nicole:~# ls /sys/devices/platform/gpio-fsi/fsi0/
break    of_node  rescan   uevent
root@nicole:~# pdbg -a probe
pdbg: Unable to open /sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw: No such file or directory
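
The rescan-then-look sequence above can be repeated from a script. This is
only a sketch: the function name rescan_and_check is hypothetical, and it does
nothing beyond the two manual steps shown (write 1 to the rescan file, then
check whether the slave directory exists).

```python
import os

MASTER_DIR = "/sys/devices/platform/gpio-fsi/fsi0"


def rescan_and_check(master_dir=MASTER_DIR, slave="slave@00:00"):
    """Write 1 to <master_dir>/rescan, then report whether the slave exists."""
    with open(os.path.join(master_dir, "rescan"), "w") as f:
        f.write("1\n")
    return os.path.isdir(os.path.join(master_dir, slave))


if __name__ == "__main__":
    # Only touch the rescan file if the FSI master is actually there.
    if os.path.exists(os.path.join(MASTER_DIR, "rescan")):
        print("slave present after rescan:", rescan_and_check())
    else:
        print("no FSI master at", MASTER_DIR)
```

Running it once in each power state gives the same information as the manual
'echo 1 > rescan' followed by 'ls'.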


After that, trying to bring the host up again with 'obmcutil poweron':

root@nicole:~# obmcutil poweron
root@nicole:~# pdbg -a probe
pdbg: Failed to read from 0x00003410 (0000000000003404): No such device
pdbg: Failed to write to 0x00003410 (0000000000003404): No such device
Unable to enable HMFSI port 1
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
fsi0: Kernel based FSI master (*)
root@nicole:~# ls /sys/devices/platform/gpio-fsi/fsi0/slave\@00\:00/
00:00:00:00       00:00:00:02       00:00:00:06       chip_id           of_node           send_echo_delays  uevent
00:00:00:01       00:00:00:03       cfam_id           dev               raw               send_term
root@nicole:~# echo 1 > /sys/devices/platform/gpio-fsi/fsi0/rescan
root@nicole:~# ls /sys/devices/platform/gpio-fsi/fsi0/
break    of_node  rescan   uevent


This is what we see in journalctl after that last 'obmcutil poweron':

Jul 03 10:48:22 nicole systemd[1]: Stopped target Stop Host0.
Jul 03 10:48:22 nicole systemd[1]: Stopped target Chassis0 (Power Off).
Jul 03 10:48:22 nicole systemd[1]: Stopped target Power0 Off.
Jul 03 10:48:22 nicole systemd[1]: op-power-stop@0.service: Succeeded.
Jul 03 10:48:22 nicole systemd[1]: Stopped Stop Power0.
Jul 03 10:48:22 nicole systemd[1]: op-wait-power-off@0.service: Succeeded.
Jul 03 10:48:22 nicole systemd[1]: Stopped Wait for Power0 to turn off.
Jul 03 10:48:22 nicole systemd[1]: Stopped target Power0 Off (Pre).
Jul 03 10:48:22 nicole systemd[1]: Stopped target Host0 (Stopped).
Jul 03 10:48:22 nicole systemd[1]: Stopped target Host0 (Stopping).
Jul 03 10:48:22 nicole systemd[1]: Stopped target Stop Host0 (Pre).
Jul 03 10:48:22 nicole systemd[1]: Started mapper subtree-remove /xyz/openbmc_project/software:xyz.openbmc_project.Software.ActivationBlocksTransition.
Jul 03 10:48:22 nicole systemd[1]: Started Reload mboxd during power on.
Jul 03 10:48:22 nicole systemd[1]: mapper-subtree-remove@-xyz-openbmc\x5fproject-software\x3Axyz.openbmc_project.Software.ActivationBlocksTransition.service: Succeeded.
Jul 03 10:48:22 nicole phosphor-host-state-manager[1117]: Received signal that host is off
Jul 03 10:48:22 nicole phosphor-host-state-manager[1117]: Change to Host State
Jul 03 10:48:22 nicole phosphor-chassis-state-manager[1099]: Received signal that power OFF is complete
Jul 03 10:48:22 nicole phosphor-chassis-state-manager[1099]: Change to Chassis Power State
Jul 03 10:48:22 nicole systemd[1]: Reached target Power0 On (Pre).
Jul 03 10:48:22 nicole systemd[1]: Starting Start Power0...
Jul 03 10:48:23 nicole power_control.exe[1029]: PowerControl: setting power up SOFTWARE_PGOOD to 1
Jul 03 10:48:23 nicole power_control.exe[1029]: PowerControl: setting power up BMC_POWER_UP to 1
Jul 03 10:48:23 nicole systemd[1]: Started Start Power0.
Jul 03 10:48:23 nicole phosphor-gpio-monitor[1540]: GPIO line altered
Jul 03 10:48:23 nicole systemd[1]: phosphor-gpio-monitor@checkstop.service: Succeeded.
Jul 03 10:48:23 nicole systemd[1]: Stopped Phosphor GPIO checkstop monitor.
Jul 03 10:48:23 nicole systemd[1]: Stopping Phosphor poweron watchdog...
Jul 03 10:48:23 nicole systemd[1]: Starting OpenPOWER debug data collector for host checkstop...
Jul 03 10:48:23 nicole systemd[1]: phosphor-watchdog@poweron.service: Main process exited, code=killed, status=15/TERM
Jul 03 10:48:23 nicole systemd[1]: phosphor-watchdog@poweron.service: Succeeded.
Jul 03 10:48:23 nicole systemd[1]: Stopped Phosphor poweron watchdog.
Jul 03 10:48:24 nicole systemd[1]: Started Wait for Power0 to turn on.
Jul 03 10:48:24 nicole systemd[1]: Reached target Power0 On.
Jul 03 10:48:24 nicole systemd[1]: Reached target Power0 (On).
Jul 03 10:48:24 nicole systemd[1]: Started Phosphor Fan Presence Tach Daemon.
Jul 03 10:48:25 nicole systemd[1]: Starting Scan FSI devices...
Jul 03 10:48:25 nicole kernel: sbefifo 00:00:00:06: Cleanup: FIFO not clean (up=0x02a8fe01 down=0x01100000)
Jul 03 10:48:33 nicole checkstop_app[1633]: Host checkstop condition detected
Jul 03 10:48:33 nicole phosphor-log-manager[1051]: Failed to find metadata
Jul 03 10:48:33 nicole systemd[1]: openpower-debug-collector-checkstop@0.service: Succeeded.
Jul 03 10:48:33 nicole systemd[1]: Started OpenPOWER debug data collector for host checkstop.
Jul 03 10:48:33 nicole phosphor-dump-manager[1061]: Dump not captured due to a cap.
Jul 03 10:48:33 nicole systemd[1]: Reached target Quiesce Target.
Jul 03 10:48:33 nicole systemd[1]: Reached target Host instance 0 crashed.
Jul 03 10:48:33 nicole phosphor-host-state-manager[1117]: Auto reboot enabled, rebooting
Jul 03 10:48:33 nicole phosphor-host-state-manager[1117]: Beginning reboot...
Jul 03 10:48:33 nicole phosphor-host-state-manager[1117]: Host State transaction request
Jul 03 10:48:34 nicole sh[1536]: Job for obmc-host-startmin@0.target canceled.
Jul 03 10:48:34 nicole systemd[1]: phosphor-reboot-host@0.service: Main process exited, code=exited, status=1/FAILURE
Jul 03 10:48:34 nicole systemd[1]: phosphor-reboot-host@0.service: Failed with result 'exit-code'.
Jul 03 10:48:34 nicole systemd[1]: Reached target Stop Host0 (Pre).
Jul 03 10:48:34 nicole systemd[1]: Stopped target Host0 (Reset Check).
Jul 03 10:48:34 nicole systemd[1]: Stopped target Host0 running after reset.
Jul 03 10:48:34 nicole systemd[1]: Stopped target Host instance 0 crashed.
Jul 03 10:48:34 nicole systemd[1]: Stopped target Quiesce Target.
Jul 03 10:48:34 nicole systemd[1]: Starting Soft power off of the host...
Jul 03 10:48:34 nicole ipmid[1102]: Command in process, no attention
Jul 03 10:48:52 nicole ipmid[1102]: Host control timeout hit!
Jul 03 10:48:52 nicole ipmid[1102]: Failed to deliver host command
Jul 03 10:48:52 nicole ipmid[1102]: Failed to deliver host command
Jul 03 10:48:52 nicole phosphor-softpoweroff[1634]: Timeout on host attention, continue with power down
Jul 03 10:48:52 nicole systemd[1]: xyz.openbmc_project.Ipmi.Internal.SoftPowerOff.service: Succeeded.
Jul 03 10:48:52 nicole systemd[1]: Started Soft power off of the host.
Jul 03 10:48:52 nicole systemd[1]: Reached target Host0 (Stopping).
Jul 03 10:48:52 nicole systemd[1]: Reached target Host0 (Stopped).
Jul 03 10:48:52 nicole systemd[1]: Reached target Power0 Off (Pre).
Jul 03 10:48:52 nicole systemd[1]: Started Stop Power0.
Jul 03 10:48:52 nicole systemd[1]: Starting Wait for Power0 to turn off...
Jul 03 10:48:52 nicole power_control.exe[1029]: PowerControl: setting power up SOFTWARE_PGOOD to 0
Jul 03 10:48:52 nicole power_control.exe[1029]: PowerControl: setting power up BMC_POWER_UP to 0
Jul 03 10:48:52 nicole kernel: sbefifo 00:00:00:06: Failed to read UP fifo status during reset , rc=-19
Jul 03 10:48:52 nicole kernel: sbefifo 00:00:00:06: Initial HW cleanup failed, will retry later
Jul 03 10:48:52 nicole kernel: occ-hwmon occ-hwmon.1: failed to get OCC poll response: -19
Jul 03 10:48:52 nicole kernel:  slave@00:00: error reading slave registers
Jul 03 10:48:52 nicole systemd[1]: fsi-scan@0.service: Main process exited, code=killed, status=15/TERM
Jul 03 10:48:52 nicole systemd[1]: fsi-scan@0.service: Failed with result 'signal'.
Jul 03 10:48:52 nicole systemd[1]: Stopped Scan FSI devices.
Jul 03 10:48:52 nicole systemd[1]: Stopped target Power0 (On).
Jul 03 10:48:52 nicole systemd[1]: Stopped target Power0 On.
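
For reading the two kernel messages in this log, the Python sketch below
decodes the 'up'/'down' status words from the "FIFO not clean" line and maps
rc=-19 to its errno name. The bit layout is an assumption: it is written from
memory of drivers/fsi/fsi-sbefifo.c and should be checked against the kernel
tree actually in use. The names decode_sts and describe_rc are hypothetical,
and bits not listed in the table are simply ignored.

```python
import errno
import os

# Status bit layout as recalled from drivers/fsi/fsi-sbefifo.c; treat these
# values as assumptions and verify them against your kernel source.
STS_FLAGS = {
    0x20000000: "PARITY_ERR",
    0x02000000: "RESET_REQ",
    0x00800000: "GOT_EOT",
    0x00400000: "MAX_XFER_LIMIT",
    0x00200000: "FULL",
    0x00100000: "EMPTY",
}


def decode_sts(value):
    """Split a FIFO status word into flag names and counter fields."""
    return {
        "flags": [name for bit, name in STS_FLAGS.items() if value & bit],
        "entry_count": (value >> 16) & 0xF,
        "valid_mask": (value >> 8) & 0xFF,
        "eot_mask": value & 0xFF,
    }


def describe_rc(rc):
    """Map a negative kernel return code such as -19 to its errno name."""
    return errno.errorcode.get(-rc, "unknown")


if __name__ == "__main__":
    print("up  :", decode_sts(0x02A8FE01))
    print("down:", decode_sts(0x01100000))
    print("rc=-19:", describe_rc(-19), "-", os.strerror(19))
```

Under that assumed layout, the 'up' word decodes to RESET_REQ, GOT_EOT and
FULL with an entry count of 8, and the 'down' word to EMPTY. rc=-19 is ENODEV,
which matches the "No such device" errors pdbg prints after poweroff.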


