linux 4.10 on ast2400
Joel Stanley
joel at jms.id.au
Tue Nov 7 20:56:52 AEDT 2017
On Tue, Nov 7, 2017 at 8:09 PM, Joel Stanley <joel at jms.id.au> wrote:
> On Tue, Nov 7, 2017 at 11:42 AM, Patrick Venture <venture at google.com> wrote:
>> I've been doing testing with linux 4.10 on the ast2400 and on some
>> percentage (20% of systems) when they boot they're not able to really
>> launch applications. The one we see failing is agetty, but ipmid also
>> ends up not running. Here is the log from what we're seeing on the
>> quanta-q71l:
>>
>> [ OK ] Started Clear one time boot overrides.
>> [ OK ] Found device /dev/ttyS4.
>> [ OK ] Found device /dev/ttyVUART0.
>> [ 42.360000] 8021q: adding VLAN 0 to HW filter on device eth1
>> [ OK ] Started Network Service.
>> [ 42.420000] 8021q: adding VLAN 0 to HW filter on device eth0
>> [ OK ] Started Phosphor Inventory Manager.
>> [ OK ] Started Phosphor Settings Daemon.
>> [ OK ] Reached target Network.
>> Starting Permit User Sessions...
>> [ OK ] Started Lightweight SLP Server.
>> [ OK ] Started Phosphor Console Muxer listening on device /dev/ttyVUART0.
>> [ OK ] Started Phosphor Inband IPMI.
>> [ OK ] Created slice system-xyz.openbmc_project.Hwmon.slice.
>> [ OK ] Started Permit User Sessions.
>> [ OK ] Started Serial Getty on ttyS4.
>> [ OK ] Reached target Login Prompts.
>> [ OK ] Reached target Multi-User System.
>> [ 44.530000] ftgmac100 1e680000.ethernet eth1: NCSI interface down
>> [ 45.800000] ftgmac100 1e660000.ethernet eth0: NCSI interface down
>> [ 49.430000] Unable to handle kernel paging request at virtual
>> address e1a00006
>> [ 49.430000] pgd = 85354000
>> [ 49.430000] [e1a00006] *pgd=00000000
>> [ 49.430000] Internal error: Oops: 1 [#1] ARM
>> [ 49.430000] CPU: 0 PID: 932 Comm: (agetty) Not tainted
>> 4.10.17-eced538e6233c50729cc107958596a1443947ba2 #1
>
> This SHA isn't in the OpenBMC dev-4.10 tree. Where are you getting
> your kernel sources from?
>
> Wherever you've grabbed it from it's out of date as the line numbers
> don't quite make sense.
>
>> [ 49.430000] Hardware name: ASpeed SoC
>> [ 49.430000] task: 86e1c000 task.stack: 858f6000
>> [ 49.430000] PC is at unlink_anon_vmas+0x98/0x1b0
>
> We have seen memory corruption when running under Qemu. This is the
> first time I've had a report of it happening on hardware.
>
> https://github.com/openbmc/qemu/issues/9
>
> Can you share some information with how you're booting?
>
> Are you netbooting?
>
> Which u-boot tree are you using? Does it enable networking before
> jumping to the kernel? Or trigger any other kind of DMA?

Can you reproduce with some debugging turned on? Build your kernel with:

  DEBUG_LIST
  PAGE_POISONING
  DEBUG_PAGEALLOC
  DEBUG_SLAB

Or even more. Take a look through the Kernel hacking menu in
menuconfig and enable things until the system slows down too much to
reproduce the issue :)
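
As a rough sketch, the options above would look something like this as a
.config fragment. This assumes your tree uses the SLAB allocator (DEBUG_SLAB
is only offered when SLAB is selected) and that DEBUG_KERNEL is needed to
expose the rest; check the result in menuconfig rather than trusting this
blindly.

```
# Debug options for chasing memory corruption (sketch, not a tested config)
CONFIG_DEBUG_KERNEL=y
# Sanity-check list_head manipulation
CONFIG_DEBUG_LIST=y
# Poison pages when they are freed and verify the pattern on allocation
CONFIG_PAGE_POISONING=y
# Extra checking around page alloc/free
CONFIG_DEBUG_PAGEALLOC=y
# Assumes CONFIG_SLAB=y; with SLUB the equivalent is SLUB_DEBUG
CONFIG_DEBUG_SLAB=y
```

Depending on the kernel version, page poisoning and pagealloc debugging may
also need to be switched on at boot (page_poison=1 and debug_pagealloc=on on
the kernel command line), so it is worth checking the help text for each
option.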

Does it reproduce if you disable the FTGMAC100 devices (set them to
status = "disabled" in your device tree, or disable them in the kernel
config)?
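
For the device tree route, something along these lines in the board .dts
should do it. The mac0 and mac1 labels are an assumption based on how the
upstream aspeed-g4.dtsi names the nodes at 1e660000 and 1e680000; use
whatever labels your tree has.

```
/* Board .dts fragment: override the SoC-level MAC nodes by label.
 * mac0/mac1 are assumed label names, check your aspeed-g4.dtsi. */
&mac0 {
	status = "disabled";
};

&mac1 {
	status = "disabled";
};
```

The kernel config alternative would be to turn off CONFIG_FTGMAC100, which
removes the driver entirely instead of just leaving the devices unprobed.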