linux 4.10 on ast2400
Patrick Venture
venture at google.com
Tue Dec 19 08:57:46 AEDT 2017
I loaded 4.10 with some memory debugging options enabled, and noticed
that free memory varied wildly from one reboot to the next. Here are
the results of reading /proc/meminfo immediately after boot, with a
reboot between each reading:
root@quanta-q71l:~# cat /proc/meminfo
MemTotal: 115076 kB
MemFree: 42228 kB
root@quanta-q71l:~# cat /proc/meminfo
MemTotal: 115076 kB
MemFree: 1668 kB
root@quanta-q71l:~# cat /proc/meminfo
MemTotal: 115076 kB
MemFree: 1876 kB
root@quanta-q71l:~# cat /proc/meminfo
MemTotal: 115076 kB
MemFree: 27464 kB
root@quanta-q71l:~# cat /proc/meminfo
MemTotal: 115076 kB
MemFree: 12140 kB
root@quanta-q71l:~# cat /proc/meminfo
MemTotal: 115076 kB
MemFree: 2084 kB
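To compare readings like these across many reboots, a small script can help. The sketch below is a hypothetical helper, not part of OpenBMC: it assumes each /proc/meminfo dump has been saved off the BMC (for example from the serial console), parses the "Key: value kB" lines, and summarizes the MemFree spread for the six readings above.

```python
import re
import statistics

# Hypothetical helper: parse "Key:   value kB" lines as found in /proc/meminfo.
# Lines without a "kB" unit (e.g. HugePages_Total) are accepted too.
_LINE_RE = re.compile(r"^([^:\s]+):\s+(\d+)(?:\s+kB)?$")


def parse_meminfo(text: str) -> dict:
    """Return a mapping of field name -> integer value (kB where applicable)."""
    fields = {}
    for line in text.splitlines():
        m = _LINE_RE.match(line.strip())
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def summarize(free_kb: list, total_kb: int) -> dict:
    """Summarize MemFree readings (in kB) collected across several boots."""
    lo, hi = min(free_kb), max(free_kb)
    return {
        "min_kb": lo,
        "max_kb": hi,
        "median_kb": statistics.median(free_kb),
        "min_pct": round(100 * lo / total_kb, 1),
        "max_pct": round(100 * hi / total_kb, 1),
    }


# MemFree values from the six boots above; MemTotal was 115076 kB every time.
SAMPLES = [42228, 1668, 1876, 27464, 12140, 2084]
TOTAL = 115076

if __name__ == "__main__":
    print(summarize(SAMPLES, TOTAL))
```

For the six readings above, free memory right after boot ranges from about 1.4% to 36.7% of MemTotal, with a median of 7112 kB. If Python happens to be available on the BMC, parse_meminfo(open("/proc/meminfo").read())["MemFree"] would give the current value directly.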
On Thu, Nov 9, 2017 at 11:47 AM, Patrick Venture <venture at google.com> wrote:
> I added these configurations and after ~10 reboots it wasn't
> reproducing, but I'll keep an eye out and update over the coming days.
>
> Thanks!
>
> On Tue, Nov 7, 2017 at 1:56 AM, Joel Stanley <joel at jms.id.au> wrote:
>> On Tue, Nov 7, 2017 at 8:09 PM, Joel Stanley <joel at jms.id.au> wrote:
>>> On Tue, Nov 7, 2017 at 11:42 AM, Patrick Venture <venture at google.com> wrote:
>>>> I've been testing Linux 4.10 on the ast2400, and on roughly 20% of
>>>> systems, applications fail to launch after boot. The one we see
>>>> failing is agetty, but ipmid also ends up not running. Here is the
>>>> log from what we're seeing on the quanta-q71l:
>>>>
>>>> [ OK ] Started Clear one time boot overrides.
>>>> [ OK ] Found device /dev/ttyS4.
>>>> [ OK ] Found device /dev/ttyVUART0.
>>>> [ 42.360000] 8021q: adding VLAN 0 to HW filter on device eth1
>>>> [ OK ] Started Network Service.
>>>> [ 42.420000] 8021q: adding VLAN 0 to HW filter on device eth0
>>>> [ OK ] Started Phosphor Inventory Manager.
>>>> [ OK ] Started Phosphor Settings Daemon.
>>>> [ OK ] Reached target Network.
>>>> Starting Permit User Sessions...
>>>> [ OK ] Started Lightweight SLP Server.
>>>> [ OK ] Started Phosphor Console Muxer listening on device /dev/ttyVUART0.
>>>> [ OK ] Started Phosphor Inband IPMI.
>>>> [ OK ] Created slice system-xyz.openbmc_project.Hwmon.slice.
>>>> [ OK ] Started Permit User Sessions.
>>>> [ OK ] Started Serial Getty on ttyS4.
>>>> [ OK ] Reached target Login Prompts.
>>>> [ OK ] Reached target Multi-User System.
>>>> [ 44.530000] ftgmac100 1e680000.ethernet eth1: NCSI interface down
>>>> [ 45.800000] ftgmac100 1e660000.ethernet eth0: NCSI interface down
>>>> [ 49.430000] Unable to handle kernel paging request at virtual address e1a00006
>>>> [ 49.430000] pgd = 85354000
>>>> [ 49.430000] [e1a00006] *pgd=00000000
>>>> [ 49.430000] Internal error: Oops: 1 [#1] ARM
>>>> [ 49.430000] CPU: 0 PID: 932 Comm: (agetty) Not tainted 4.10.17-eced538e6233c50729cc107958596a1443947ba2 #1
>>>
>>> This SHA isn't in the OpenBMC dev-4.10 tree. Where are you getting
>>> your kernel sources from?
>>>
>>> Wherever you've grabbed it from, it's out of date, as the line numbers
>>> don't quite make sense.
>>>
>>>> [ 49.430000] Hardware name: ASpeed SoC
>>>> [ 49.430000] task: 86e1c000 task.stack: 858f6000
>>>> [ 49.430000] PC is at unlink_anon_vmas+0x98/0x1b0
>>>
>>> We have seen memory corruption when running under Qemu. This is the
>>> first time I've had a report of it happening on hardware.
>>>
>>> https://github.com/openbmc/qemu/issues/9
>>>
>>> Can you share some information about how you're booting?
>>>
>>> Are you netbooting?
>>>
>>> Which u-boot tree are you using? Does it enable networking before
>>> jumping to the kernel? Or trigger any other kinds of DMA?
>>
>> Can you reproduce with some debugging turned on? Build your kernel with:
>>
>> DEBUG_LIST
>> PAGE_POISONING
>> DEBUG_PAGEALLOC
>> DEBUG_SLAB
>>
>> Or even more. Take a look through the Kernel hacking menu in
>> menuconfig and enable things until the system slows down too much to
>> reproduce the issue :)
>>
>> Does it reproduce if you disable the FTGMAC100 devices (set them to
>> status = "disabled" in your device tree, or disable them in the kernel
>> config)?
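When counting how often a failure reproduces (the roughly 20% rate above, or the ~10 clean reboots after enabling the debug options), it can help to classify each captured boot log automatically. The sketch below assumes one console log per boot has been saved as a string; the helper names classify_boot and failure_rate are hypothetical. It looks for the oops marker strings that appear in the log above and decodes the "PC is at symbol+offset/size" line, where offset is the distance into the function and size is the function's length.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Marker strings copied from the oops in the log above.
OOPS_MARKERS = (
    "Unable to handle kernel paging request",
    "Internal error: Oops",
)

# Matches e.g. "PC is at unlink_anon_vmas+0x98/0x1b0".
PC_RE = re.compile(r"PC is at ([^\s+]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


@dataclass
class BootResult:
    oopsed: bool
    pc_symbol: Optional[str] = None
    pc_offset: Optional[int] = None  # bytes into the function
    func_size: Optional[int] = None  # total function length in bytes


def classify_boot(log: str) -> BootResult:
    """Hypothetical helper: report whether a boot log contains an oops."""
    oopsed = any(marker in log for marker in OOPS_MARKERS)
    m = PC_RE.search(log)
    if m:
        return BootResult(oopsed, m.group(1), int(m.group(2), 16), int(m.group(3), 16))
    return BootResult(oopsed)


def failure_rate(logs: List[str]) -> float:
    """Fraction of boot logs that contain an oops (0.0 for no logs)."""
    if not logs:
        return 0.0
    return sum(classify_boot(log).oopsed for log in logs) / len(logs)
```

For the oops above, this reports unlink_anon_vmas at offset 0x98 (152 bytes) into a 0x1b0-byte (432-byte) function. Mapping that offset to a source line only makes sense against the vmlinux of the exact build, for example with gdb or addr2line, which is why matching kernel sources matter.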