Petitboot crash on Talos 2 with Broadcom SAS controller

Timothy Pearson tpearson at raptorengineering.com
Tue Jul 7 03:59:09 AEST 2020


There was an unfortunate kernel bug that crept in just before we released our firmware. It only affects the LSI (now Broadcom) SAS controllers driven by mpt3sas, and it creates the exact symptom you see.

You can work around it by using the beta PNOR from here:
https://wiki.raptorcs.com/wiki/Talos_II/Firmware/Public_Beta#2.01-next_.2804-16-2019_branch.29

The actual patch that fixes it in the skiroot kernel is here:
https://git.raptorcs.com/git/talos-op-build/commit/?id=5d7b717fceeaa1e7cf45de2938e190cb6e0ed1ee
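
For anyone who wants to check whether their own crash matches this symptom, here is a minimal sketch that scans a captured xmon backtrace for the pattern in the trace quoted below: a kexec-initiated shutdown that passes through scsih_shutdown in mpt3sas. The function names parse_frames and matches_symptom are hypothetical helpers written for this illustration, not part of petitboot or the kernel, and the frame format is assumed from the quoted trace only.

```python
import re

# Matches xmon backtrace frames of the form seen in the quoted trace, e.g.
#   [c000001fe6c47b10] c00800000117bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
# The lookahead stops the optional module tag from swallowing the
# 16-digit stack pointer that opens the next frame.
FRAME_RE = re.compile(
    r"\[[0-9a-f]{16}\]\s+[0-9a-f]{16}\s+"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>(?![0-9a-f]{16}\])\w+)\])?"
)


def parse_frames(trace):
    """Return a list of (symbol, module or None) for each backtrace frame."""
    # Drop mail quote markers and rejoin wrapped lines, so a module tag
    # that was wrapped onto its own line is still attached to its frame.
    text = " ".join(line.lstrip("> ").strip() for line in trace.splitlines())
    return [(m.group("sym"), m.group("mod")) for m in FRAME_RE.finditer(text)]


def matches_symptom(trace):
    """True if the trace shows mpt3sas shutting down on the kexec path."""
    frames = parse_frames(trace)
    on_kexec_path = any(sym == "kernel_kexec" for sym, _ in frames)
    in_mpt3sas_shutdown = any(
        sym == "scsih_shutdown" and mod == "mpt3sas" for sym, mod in frames
    )
    return on_kexec_path and in_mpt3sas_shutdown
```

A match only says the backtrace has the same shape as the one reported here. It does not confirm that the installed skiroot kernel lacks the fix.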

----- Original Message -----
> From: "Christian Müller" <cmueller at trufflepig-forensics.com>
> To: "petitboot" <petitboot at lists.ozlabs.org>
> Sent: Monday, July 6, 2020 6:04:42 AM
> Subject: Petitboot crash on Talos 2 with Broadcom SAS controller

> Hi guys,
> I've bought a Talos 2 and am currently trying to get it working.
> There's a Broadcom SAS 9305-24i controller on the board and petitboot
> comes up normally.
> My problem: when kexec is executed (I can select the Alpine image
> I want to boot from the petitboot TUI), it crashes with the following
> stack trace:
> 
> SIGTERM received, booting...
> cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000001fe6c47620]
>    pc: c0000000001d05f0: __free_pages+0x10/0x50
>    lr: c000000000123c24: dma_direct_free_pages+0x54/0x90
>    sp: c000001fe6c478b0
>   msr: 900000000280b033
>   dar: c04240000000f8b4
>  current = 0xc000001fe6b88400
>  paca    = 0xc000001fff7ff480   irqmask: 0x03   irq_happened: 0x01
>    pid   = 1021, comm = kexec
> Linux version 5.5.0-openpower1 (root@raptor-build-public-staging-01) (gcc version 6.5.0 (Buildroot 2019.05.3-06769-g7bdd570165)) #2 SMP Thu Feb 20 02:19:47 UTC 2020
> enter ? for help
> [c000001fe6c478b0] c000000000123c24 dma_direct_free_pages+0x54/0x90 (unreliable)
> [c000001fe6c478d0] c000000000038728 dma_iommu_free_coherent+0x98/0xc0
> [c000001fe6c47920] c000000000123020 dma_free_attrs+0x100/0x110
> [c000001fe6c47970] c0000000001d9bf4 dma_pool_destroy+0x174/0x200
> [c000001fe6c47a10] c0080000011617e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
> [c000001fe6c47aa0] c00800000116b428 mpt3sas_base_detach+0x40/0x160 [mpt3sas]
> [c000001fe6c47b10] c00800000117bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
> [c000001fe6c47b70] c0000000003cda10 pci_device_shutdown+0x50/0xc0
> [c000001fe6c47ba0] c00000000064a908 device_shutdown+0x1f8/0x330
> [c000001fe6c47c40] c0000000000cfe2c kernel_restart_prepare+0x4c/0x60
> [c000001fe6c47c60] c0000000001505c0 kernel_kexec+0xa0/0xe0
> [c000001fe6c47cd0] c0000000000d03f4 __do_sys_reboot+0x234/0x2c0
> [c000001fe6c47e20] c00000000000b50c system_call+0x5c/0x68
> --- Exception: c01 (System Call) at 00007fffa70f10a4
> SP (7ffff4af0240) is in userspace
> 0:mon>
> 
> As far as I can tell, the crash happens before the entry point of the Alpine
> kernel is called, so this might be a Petitboot issue or (more likely) a
> firmware issue with the card. I have updated the firmware to the current
> version and the problem persists.
> Can you give me some hints on how I can proceed with debugging this issue?
> 
> Thanks for your time and have a nice day!
> 
> Chris