[PATCH 00/11] fix memory leak while kset_register() fails
Yang Yingliang
yangyingliang at huawei.com
Fri Oct 21 20:23:04 AEDT 2022
On 2022/10/21 16:41, Luben Tuikov wrote:
> On 2022-10-21 04:24, Luben Tuikov wrote:
>> On 2022-10-21 04:18, Greg KH wrote:
>>> On Fri, Oct 21, 2022 at 03:55:18AM -0400, Luben Tuikov wrote:
>>>> On 2022-10-21 01:37, Greg KH wrote:
>>>>> On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote:
>>>>>> On 2022-10-20 22:20, Yang Yingliang wrote:
>>>>>>> The previous discussion link:
>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F0db486eb-6927-927e-3629-958f8f211194%40huawei.com%2FT%2F&data=05%7C01%7Cluben.tuikov%40amd.com%7Cd41da3fd6449492d01f808dab33cdb75%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019371236833115%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=C%2Bj1THkHpzVGks5eqB%2Fm%2FPAkMRohR7CYvRnOCqUqdcM%3D&reserved=0
>>>>>> The very first discussion on this was here:
>>>>>>
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Fdri-devel%2Fmsg368077.html&data=05%7C01%7Cluben.tuikov%40amd.com%7Cd41da3fd6449492d01f808dab33cdb75%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019371236833115%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pSR10abmK8nAMvKSezqWC0SPUBL4qEwtCCizyIKW7Dc%3D&reserved=0
>>>>>>
>>>>>> Please use this link, and not the that one up there you which quoted above,
>>>>>> and whose commit description is taken verbatim from the this link.
>>>>>>
>>>>>>> kset_register() is currently used in some places without calling
>>>>>>> kset_put() in error path, because the callers think it should be
>>>>>>> kset internal thing to do, but the driver core can not know what
>>>>>>> caller doing with that memory at times. The memory could be freed
>>>>>>> both in kset_put() and error path of caller, if it is called in
>>>>>>> kset_register().
>>>>>> As I explained in the link above, the reason there's
>>>>>> a memory leak is that one cannot call kset_register() without
>>>>>> the kset->kobj.name being set--kobj_add_internal() returns -EINVAL,
>>>>>> in this case, i.e. kset_register() fails with -EINVAL.
>>>>>>
>>>>>> Thus, the most common usage is something like this:
>>>>>>
>>>>>> kobj_set_name(&kset->kobj, format, ...);
>>>>>> kset->kobj.kset = parent_kset;
>>>>>> kset->kobj.ktype = ktype;
>>>>>> res = kset_register(kset);
>>>>>>
>>>>>> So, what is being leaked, is the memory allocated in kobj_set_name(),
>>>>>> by the common idiom shown above. This needs to be mentioned in
>>>>>> the documentation, at least, in case, in the future this is absolved
>>>>>> in kset_register() redesign, etc.
>>>>> Based on this, can kset_register() just clean up from itself when an
>>>>> error happens? Ideally that would be the case, as the odds of a kset
>>>>> being embedded in a larger structure is probably slim, but we would have
>>>>> to search the tree to make sure.
>>>> Looking at kset_register(), we can add kset_put() in the error path,
>>>> when kobject_add_internal(&kset->kobj) fails.
>>>>
>>>> See the attached patch. It needs to be tested with the same error injection
>>>> as Yang has been doing.
>>>>
>>>> Now, struct kset is being embedded in larger structs--see amdgpu_discovery.c
>>>> starting at line 575. If you're on an AMD system, it gets you the tree
>>>> structure you'll see when you run "tree /sys/class/drm/card0/device/ip_discovery/".
>>>> That shouldn't be a problem though.
>>> Yes, that shouldn't be an issue as the kobject embedded in a kset is
>>> ONLY for that kset itself, the kset structure should not be controling
>>> the lifespan of the object it is embedded in, right?
>> Yes, and it doesn't. It only does a kobject_get(parent) and kobject_put(parent).
>> So that's fine and natural.
>>
>> Yang, do you want to try the patch in my previous email in this thread, since you've
>> got the error injection set up already?
> I spoke too soon. I believe you're onto something, because looking at the idiom:
>
> kobj_set_name(&kset->kobj, format, ...);
> kset->kobj.kset = parent_kset;
> kset->kobj.ktype = ktype;
> res = kset_register(kset);
>
> The ktype defines a release method, which frees the larger struct the kset is embedded in.
> And this would result in double free, for instance in the amdgpu_discovery.c code, if
> kset_put() is called after kset_register() fails, since we kzalloc the larger object
> just before and kfree it on error just after. Ideally, we'd only "revert" the actions
> done by kobj_set_name(), as there's some error recovery on create_dir() in kobject_add_internal().
>
> So, we cannot do this business with the kset_put() on error from kset_register(), after all.
> Not sure how this wasn't caught in Yang's testing--the kernel should've complained.
I have already tried the change that in your posted patch when I was
debugging this issue
before sent these fixes, because it may lead double free in some cases,
and I had mentioned
it in this mail:
https://lore.kernel.org/lkml/0db486eb-6927-927e-3629-958f8f211194@huawei.com/T/#m68eade1993859dfc6c3d14d35f988d35a32ef837
Thanks,
Yang
>
> Regards,
> Luben
>
> .
More information about the Linux-erofs
mailing list