[Skiboot] [PATCH] core/mem_region: Give up on freeing unmanaged memory block
gwshan at linux.vnet.ibm.com
Mon May 29 11:13:17 AEST 2017
On Fri, May 26, 2017 at 04:57:19PM +1000, Oliver O'Halloran wrote:
>On Fri, May 26, 2017 at 4:30 PM, Gavin Shan <gwshan at linux.vnet.ibm.com> wrote:
>> On Fri, May 26, 2017 at 04:12:24PM +1000, Oliver O'Halloran wrote:
>>>On Fri, May 26, 2017 at 3:21 PM, Gavin Shan <gwshan at linux.vnet.ibm.com> wrote:
>>>> When adding memory regions to global list, the overlapped one
>>>> (including it's descriptor) is purged. However, the descriptor might
>>>> be in BSS (not HEAP) section. It will trigger the exception caused
>>>> by assert() in mem_free() as below backtrace indicates. The memory
>>>> regions causing this are "ibm,os-reserve" and "hostboot-reserve-0"
>>>> separately. The former descriptor is resident in BSS section while
>>>> the later one is in HEAP:
>>>> CPU 0050 Backtrace:
>>>> S: 0000000031d43a00 R: 00000000300136d0 .backtrace+0x2c
>>>> S: 0000000031d43a90 R: 00000000300191ec ._abort+0x4c
>>>> S: 0000000031d43b10 R: 0000000030019268 .assert_fail+0x34
>>>> S: 0000000031d43b90 R: 0000000030015d08 .mem_free+0x64
>>>> S: 0000000031d43c20 R: 0000000030017080 .__free+0x38
>>>> S: 0000000031d43cb0 R: 0000000030015518 .add_region+0x230
>>>> S: 0000000031d43d60 R: 0000000030016854 .mem_region_init+0x2c8
>>>> S: 0000000031d43e30 R: 0000000030014994 .main_cpu_entry+0x3ec
>>>> S: 0000000031d43f00 R: 0000000030002648 boot_entry+0x198
>>>> This fixes the issue by doing nothing when trying to freeing memory
>>>> block which is resident in unmanaged (not HEAP) region.
>>>> Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
>>>Not a fan. The specific bug you're seeing here is because of hostboot
>>>being broken on FSP machines and reserving memory that it shouldn't.
>>>This patch papers over the problem by making any invalid free()
>>>silently fail which is something we should avoid.
>> Well, It's not to fail sliently. The memory block resident in
>> bss/data can't simply free'd. I agree it's reasonable to avoid freeing
>> the memory chunks, which in unmanaged from source. If you're going
>> to do that, this patch can be dropped sliently. On the other hand,
>> I also found the last skiboot can't boot and it's related to get_hb_reserved_me().
>> Everything becomes fine when it's simply commented out (or skipped). It seems the
>> memory blocks (in device-tree) that should have reserved by kernel, aren't
>> reserved properly. It leads to data/memory corruption eventually. I'm
>> thinking the correct thing to do is mering the memory region with the
>> overlapped one in mem_region.c.
>In this situation merging the overlapping reservations is fine, but we
>can't rely on that being true in general. We've had problems in the
>past with reservations for the OCC being overlapped with those for the
>hostboot runtime services. In those situations I would prefer we just
>crashed early rather than silently corrupting state.
Yes, It makes sense. This patch can be dropped.
More information about the Skiboot