NXP P50XX/e5500: SMP doesn't work anymore with the latest Git kernel

Michael Ellerman mpe at ellerman.id.au
Thu Nov 1 00:20:57 AEDT 2018


Christian Zigotzky <chzigotzky at xenosoft.de> writes:

> Little progress ...
>
> I reverted the following two OF files of the commit 'Merge tag 
> devicetree-for-4.20' and SMP works! The problematic code is somewhere in 
> these two files.
>
> a/include/linux/of.h
> a/drivers/of/base.c

Hi Christian,

Trying to debug things by reverting like this can work, but it's quite
error prone and is usually only used *after* a bisect has identified the
suspect code, or if a bisect can't work for some reason.

I know you said you'd had trouble bisecting in the past, but this one
should be a good one to practice on.

You already identified that the merge of the devicetree changes was the
problem, ie. 

  b27186abb37b Merge tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux


So you do:
  $ git show b27186abb37b 
  commit b27186abb37b7bd19e0ca434f4f425c807dbd708
  Merge: 0ef7791e2bfb d061864b89c3
  Author: Linus Torvalds <torvalds at linux-foundation.org>
  Date:   Fri Oct 26 12:09:58 2018 -0700
  
      Merge tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux


And that shows you the two commits that were merged (the "Merge:" line),
0ef7791e2bfb and d061864b89c3. If you look at them you see:

  $ git log -1 --oneline 0ef7791e2bfb
  0ef7791e2bfb Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
   
  $ git log -1 --oneline d061864b89c3
  d061864b89c3 ARM: dt: relicense two DT binding IRQ headers

You can see that the first one is the previous commit on Linus' branch,
ie. an unrelated merge. The 2nd commit is the commit that was on top of
robh's tree, ie. that's the start of the interesting commits for us.

You can also get to that 2nd commit using b27186abb37b^2.
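
If the ^1 / ^2 notation is new to you, here's a rough sketch you can play
with in a throwaway repo, so nothing touches your kernel tree. The branch
name "topic" and all the commit messages are made up for illustration. It
builds one merge and then looks at both parents, plus the range
"^1..^2", which lists only what the merge brought in:

```shell
#!/bin/sh
# Sketch only: a tiny scratch repo with a single merge commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)   # default branch, whatever it is called

# The side branch, playing the role of robh's tree.
git checkout -q -b topic
git commit -q --allow-empty -m "topic: interesting change"
topic_tip=$(git rev-parse HEAD)

# An unrelated commit on the mainline, playing the role of Linus' branch.
git checkout -q "$main"
git commit -q --allow-empty -m "mainline: unrelated change"
main_tip=$(git rev-parse HEAD)

git merge -q --no-ff -m "Merge branch 'topic'" topic
merge=$(git rev-parse HEAD)

# ^1 is the branch that was merged into, ^2 is the branch that was merged.
first_parent=$(git rev-parse "$merge^1")
second_parent=$(git rev-parse "$merge^2")
git log -1 --oneline "$merge^1"
git log -1 --oneline "$merge^2"

# Commits reachable from the 2nd parent but not from the 1st.
brought_in=$(git rev-list --count "$merge^1..$merge^2")
git log --oneline "$merge^1..$merge^2"
```

On the real tree the equivalent range would be b27186abb37b^1..b27186abb37b^2.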

If you look at what came in via Rob's branch with:

  $ git log --oneline d061864b89c3
  or
  $ git log --oneline b27186abb37b^2

You see there are quite a few commits, and in particular there's another
merge:

  389d0a8a7af8 Merge branch 'dt/cpu-type-rework' into dt/next

If we log the 2nd parent of that, we see:

 $ git log --oneline 389d0a8a7af8^2
 4c29e5934f6c microblaze: get cpu node with of_get_cpu_node
 a691240e36e3 fbdev: fsl-diu: get cpu node with of_get_cpu_node
 651d44f9679c of: use for_each_of_cpu_node iterator
 a9a455e854cd iommu: fsl_pamu: use for_each_of_cpu_node iterator
 37dc218bed44 edac: cpc925: use for_each_of_cpu_node iterator
 76ec23b127cd clk: mvebu: use for_each_of_cpu_node iterator
 7de8f4aa2f35 x86: DT: use for_each_of_cpu_node iterator
 8cabf5bc1049 SH: use for_each_of_cpu_node iterator
 38959a091e4a powerpc: 8xx: get cpu node with of_get_cpu_node
 84dbc69a2ff3 powerpc: 4xx: get cpu node with of_get_cpu_node
 a94fe366340a powerpc: use for_each_of_cpu_node iterator
 5e5abae858b5 openrisc: use for_each_of_cpu_node iterator
 1f0fe1f67cef nios2: get cpu node with of_get_cpu_node
 5a931a3c80b5 c6x: use for_each_of_cpu_node iterator
 de76e70a8d4e arm64: use for_each_of_cpu_node iterator
 5af5d40c4015 ARM: shmobile: use for_each_of_cpu_node iterator
 07d44f1f82b7 ARM: topology: remove unneeded check for /cpus node
 d4866f751edf ARM: use for_each_of_cpu_node iterator
 6487c15f1cc9 of: Support matching cpu nodes with no 'reg' property
 f1f207e43b8a of: Add cpu node iterator for_each_of_cpu_node()
 f6707fd6241e of: make PowerMac cache node search conditional on CONFIG_PPC_PMAC
 6d0a70a284be vsprintf: print OF node name using full_name
 a613b26a5013 of: Convert to using %pOFn instead of device_node.name
 6901378c799d of/unittest: add printf tests for node name
 b610e2ff4622 of/unittest: remove use of node name pointer in overlay high level test
 57361846b52b (tag: v4.19-rc2) Linux 4.19-rc2


So if we think the suspect commit is in there, we would confirm that by
checking out v4.19-rc2 and testing that it works. And then checking out
4c29e5934f6c and testing that it's broken.

Assuming the former worked and the latter was broken, we do:

 $ git bisect start
 $ git bisect good v4.19-rc2
 $ git bisect bad 4c29e5934f6c

And then just follow the prompts.
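
If you want to practice the mechanics before spending time on kernel
builds, here's a minimal sketch on a scratch repo. Everything in it is
made up for illustration: the "state" file stands in for "does SMP work",
and commit 7 is the pretend culprit. It also uses "git bisect run", which
automates the good/bad marking with a test command. For a kernel you'd
more likely build, boot and mark each step by hand as described above.

```shell
#!/bin/sh
# Sketch only: bisect a toy history where commit 7 introduces the "bug".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then
        echo "fails" > state
    else
        echo "works" > state
    fi
    echo "$i" > counter
    git add state counter
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then git tag known-good; fi
    if [ "$i" -eq 7 ]; then culprit=$(git rev-parse HEAD); fi
    i=$((i + 1))
done

git bisect start
git bisect bad HEAD
git bisect good known-good

# The test command exits 0 for a good commit and non-zero for a bad one.
git bisect run sh -c 'grep -q works state'

# While the bisect is still active, refs/bisect/bad is the first bad commit.
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null
git log -1 --oneline "$found"
```

The last line should print the "commit 7" entry, which is the commit git
reports as the first bad one.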

One thing to watch out for is hitting an unrelated bug, which can
sometimes derail your bisection.

In this case the bug we're looking for is that CPU 1 isn't onlined
properly. But if, for example, the system doesn't boot fully, then you
shouldn't mark the commit as bad. Instead it's better to skip it with
"git bisect skip". Then git will choose a different commit for you to test.
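
To make the good / bad / skip decision concrete, here's a hedged sketch of
a helper that classifies a saved boot log. The function name and the
heuristic are my assumptions, not something git or the kernel provides: I'm
treating a log that never reaches "smp: Bringing up secondary CPUs" as
"didn't boot far enough to judge". The return values follow the
"git bisect run" convention, where 0 means good, 125 means skip, and other
codes from 1 to 127 mean bad. The sample logs reuse lines from the report
quoted below, and the "2 CPUs" line is an assumed example of a working boot.

```shell
#!/bin/sh
# classify_boot_log FILE (hypothetical helper)
# Returns 0 = good, 1 = bad, 125 = skip (cannot test this commit).
classify_boot_log() {
    log=$1
    # Assumption: without this line the kernel never got to SMP bring-up,
    # so we can't say anything about the bug we're chasing.
    if ! grep -q "smp: Bringing up secondary CPUs" "$log"; then
        return 125
    fi
    # The symptom from the report: a secondary CPU failed to start.
    if grep -q "smp: failed starting cpu" "$log"; then
        return 1
    fi
    return 0
}

tmp=$(mktemp -d)
printf '%s\n' \
    '[    0.002161] smp: Bringing up secondary CPUs ...' \
    '[    0.002339] No cpu-release-addr for cpu 1' \
    '[    0.002347] smp: failed starting cpu 1 (rc -2)' \
    '[    0.002401] smp: Brought up 1 node, 1 CPU' > "$tmp/bad.log"
printf '%s\n' \
    '[    0.002161] smp: Bringing up secondary CPUs ...' \
    '[    0.002401] smp: Brought up 1 node, 2 CPUs' > "$tmp/good.log"
printf '%s\n' \
    '[    0.000000] CoreNet Generic board' > "$tmp/hang.log"

rc_bad=0;  classify_boot_log "$tmp/bad.log"  || rc_bad=$?
rc_good=0; classify_boot_log "$tmp/good.log" || rc_good=$?
rc_skip=0; classify_boot_log "$tmp/hang.log" || rc_skip=$?
echo "bad=$rc_bad good=$rc_good skip=$rc_skip"
```

When bisecting by hand you'd map those three outcomes to "git bisect bad",
"git bisect good" and "git bisect skip" respectively.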

Anyway hope that helps.

cheers

> On 29 October 2018 at 6:00PM, Christian Zigotzky wrote:
>> Hello,
>>
>> I figured out that the problem is in the OF source code of the commit: 
>> Merge tag devicetree-for-4.20. [1]
>>
>> I reverted the following OF files and SMP works!
>>
>> drivers/of/base.c
>> drivers/of/device.c
>> drivers/of/of_mdio.c
>> drivers/of/of_numa.c
>> drivers/of/of_private.h
>> drivers/of/overlay.c
>> drivers/of/platform.c
>> drivers/of/unittest-data/overlay_15.dts
>> drivers/of/unittest-data/tests-overlay.dtsi
>> drivers/of/unittest.c
>> include/linux/of.h
>>
>> Cheers,
>> Christian
>>
>> [1] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b27186abb37b7bd19e0ca434f4f425c807dbd708
>>
>>
>> On 29 October 2018 at 10:56AM, Christian Zigotzky wrote:
>>> Hello,
>>>
>>> I have figured out that the commit 'devicetree-for-4.20' [1] is 
>>> responsible for the SMP problem. I was able to revert this commit 
>>> with 'git revert b27186abb37b7bd19e0ca434f4f425c807dbd708 -m 1' today.
>>>
>>> [master ec81438] Revert "Merge tag 'devicetree-for-4.20' of 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux"
>>> 138 files changed, 931 insertions(+), 1538 deletions(-)
>>> rename Documentation/devicetree/bindings/arm/{atmel-sysregs.txt => 
>>> atmel-at91.txt} (67%)
>>> delete mode 100644 
>>> Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-dcfg.txt
>>> delete mode 100644 
>>> Documentation/devicetree/bindings/arm/freescale/fsl,layerscape-scfg.txt
>>> rename Documentation/devicetree/bindings/arm/{zte,sysctrl.txt => 
>>> zte.txt} (62%)
>>> delete mode 100644 Documentation/devicetree/bindings/misc/lwn-bk4.txt
>>> create mode 100644 arch/c6x/boot/dts/linked_dtb.S
>>> delete mode 100644 arch/nios2/boot/dts/Makefile
>>> create mode 100644 arch/nios2/boot/linked_dtb.S
>>> delete mode 100644 arch/powerpc/boot/dts/Makefile
>>> delete mode 100644 arch/powerpc/boot/dts/fsl/Makefile
>>> delete mode 100644 scripts/dtc/yamltree.c
>>>
>>> It solves the SMP problem! SMP works again on my P5020 board and on 
>>> virtual e5500 QEMU machines.
>>>
>>> QEMU command: ./qemu-system-ppc64 -M ppce500 -cpu e5500 -m 2048 
>>> -kernel /home/christian/Downloads/uImage-4.20-alpha5 -drive 
>>> format=raw,file=/home/christian/Dokumente/ubuntu_MATE_16.04.3_LTS_PowerPC_QEMU/ubuntu_MATE_16.04_PowerPC.img,index=0,if=virtio 
>>> -nic user,model=e1000 -append "rw root=/dev/vda3" -device virtio-vga 
>>> -device virtio-mouse-pci -device virtio-keyboard-pci -soundhw es1370 
>>> -smp 4
>>>
>>> Screenshot: 
>>> https://plus.google.com/u/0/photos/photo/115515624056477014971/6617705776207990082
>>>
>>> Do we need a new dtb file or is it a bug?
>>>
>>> Thanks,
>>> Christian
>>>
>>> [1] 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b27186abb37b7bd19e0ca434f4f425c807dbd708
>>>
>>>
>>> On 28 October 2018 at 5:35PM, Christian Zigotzky wrote:
>>>> Hello,
>>>>
>>>> SMP doesn't work anymore with the latest Git kernel (28/10/18 
>>>> 11:12AM GMT) on my P5020 board and on virtual e5500 QEMU machines.
>>>>
>>>> Board with P5020 dual core CPU:
>>>>
>>>> [    0.000000] -----------------------------------------------------
>>>> [    0.000000] phys_mem_size     = 0x200000000
>>>> [    0.000000] dcache_bsize      = 0x40
>>>> [    0.000000] icache_bsize      = 0x40
>>>> [    0.000000] cpu_features      = 0x00000003008003b4
>>>> [    0.000000]   possible        = 0x00000003009003b4
>>>> [    0.000000]   always          = 0x00000003008003b4
>>>> [    0.000000] cpu_user_features = 0xcc008000 0x08000000
>>>> [    0.000000] mmu_features      = 0x000a0010
>>>> [    0.000000] firmware_features = 0x0000000000000000
>>>> [    0.000000] -----------------------------------------------------
>>>> [    0.000000] CoreNet Generic board
>>>>
>>>>     ...
>>>>
>>>> [    0.002161] smp: Bringing up secondary CPUs ...
>>>> [    0.002339] No cpu-release-addr for cpu 1
>>>> [    0.002347] smp: failed starting cpu 1 (rc -2)
>>>> [    0.002401] smp: Brought up 1 node, 1 CPU
>>>>
>>>> Virtual e5500 quad core QEMU machine:
>>>>
>>>> [    0.026394] smp: Bringing up secondary CPUs ...
>>>> [    0.027831] No cpu-release-addr for cpu 1
>>>> [    0.027989] smp: failed starting cpu 1 (rc -2)
>>>> [    0.030143] No cpu-release-addr for cpu 2
>>>> [    0.030304] smp: failed starting cpu 2 (rc -2)
>>>> [    0.032400] No cpu-release-addr for cpu 3
>>>> [    0.032533] smp: failed starting cpu 3 (rc -2)
>>>> [    0.033117] smp: Brought up 1 node, 1 CPU
>>>>
>>>> QEMU command: ./qemu-system-ppc64 -M ppce500 -cpu e5500 -m 2048 
>>>> -kernel 
>>>> /home/christian/Downloads/vmlinux-4.20-alpha4-AmigaOne_X1000_X5000/X5000_and_QEMU_e5500/uImage-4.20 
>>>> -drive 
>>>> format=raw,file=/home/christian/Downloads/MATE_PowerPC_Remix_2017_0.9.img,index=0,if=virtio 
>>>> -nic user,model=e1000 -append "rw root=/dev/vda" -device virtio-vga 
>>>> -device virtio-mouse-pci -device virtio-keyboard-pci -usb -soundhw 
>>>> es1370 -smp 4
>>>>
>>>> .config:
>>>>
>>>> ...
>>>> CONFIG_SMP=y
>>>> CONFIG_NR_CPUS=4
>>>> ...
>>>>
>>>> Please test the latest Git kernel on your NXP P50XX boards.
>>>>
>>>> Thanks,
>>>> Christian
>>>>
>>>
>>>
>>
>>

