[PATCH v6 37/37] cxlflash: Fix to avoid bypassing context cleanup
Andrew Donnellan
andrew.donnellan at au1.ibm.com
Thu Oct 22 13:01:57 AEDT 2015
On 22/10/15 07:16, Matthew R. Ochs wrote:
> Contexts may be skipped over for cleanup in situations where contention
> for the adapter's table-list mutex is experienced in the presence of a
> signal during the execution of the release handler.
>
> This can lead to two known issues:
>
> - A hang condition on remove, as that path waits for users to clean
>   up. The wait will never complete in this scenario because the user
>   has already cleaned up from their perspective.
>
> - An Oops in the unmap_mapping_range() call that is made as part of
>   the user waiting mechanism that is invoked on remove when contexts
>   are found to still exist.
>
> The root cause of this issue can be found in get_context() and how the
> table-list mutex is acquired. As this code path is shared by several
> different access points within the driver, a decision was made during
> the development cycle to acquire this mutex in this location using the
> interruptible version of the mutex locking service. In almost all of
> the use-cases and environmental scenarios this holds up, even when the
> mutex is contended. However, for critical system threads (such as the
> release handler), failing to acquire the mutex and bailing with the
> intention of the user being able to try again later is unacceptable.
>
> In such a scenario, the context _must_ be derived as it is on an
> irreversible path to being freed. Without being able to derive the
> context, the code mistakenly assumes that it has already been freed
> and proceeds to free up the underlying CXL context resources. From
> this point on, any usage of [the now stale] CXL context resources
> will result in undefined behavior. This is the root cause of the Oops
> mentioned as the second known issue as the mapping passed to the
> unmap_mapping_range() service is owned by the CXL context.
>
> To fix this problem, acquisition of the table-list mutex within
> get_context() is simply changed to use the uninterruptible version
> of the mutex locking service. This is safe as the timing windows for
> holding this mutex are short and also protected against blocking.
>
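To make the failure mode and the fix described above concrete, here is a
minimal userspace sketch. It is not the driver code: every name in it
(toy_mutex, toy_cfg, toy_get_context(), toy_release(), the
signal_interrupts_lock flag) is a hypothetical stand-in, and the two toy
lock helpers only model the return-value contract of
mutex_lock_interruptible() and mutex_lock(), assuming the interruptible
variant fails with -EINTR when a signal arrives while the mutex is
contended.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CTX 4

/* Hypothetical stand-ins, not the kernel or cxlflash definitions. */
struct toy_mutex { bool locked; };
struct toy_ctx   { bool cleaned; };
struct toy_cfg {
	struct toy_mutex tbl_list_mutex;        /* models the table-list mutex */
	struct toy_ctx  *ctx_tbl[TOY_MAX_CTX];  /* models the context table */
};

/* Models "mutex is contended and a signal is pending for this thread". */
static bool signal_interrupts_lock;

/* Models mutex_lock_interruptible(): may bail out with -EINTR. */
static int toy_lock_interruptible(struct toy_mutex *m)
{
	if (signal_interrupts_lock)
		return -EINTR;
	m->locked = true;
	return 0;
}

/* Models mutex_lock(): a pending signal never makes it fail. */
static void toy_lock(struct toy_mutex *m)   { m->locked = true; }
static void toy_unlock(struct toy_mutex *m) { m->locked = false; }

/* Look up a context under the table-list mutex. */
static struct toy_ctx *toy_get_context(struct toy_cfg *cfg, int id,
				       bool interruptible)
{
	struct toy_ctx *ctx;

	if (id < 0 || id >= TOY_MAX_CTX)
		return NULL;

	if (interruptible) {
		/* Old behaviour: give up if a signal interrupts the wait. */
		if (toy_lock_interruptible(&cfg->tbl_list_mutex) != 0)
			return NULL;
	} else {
		/* Fixed behaviour: always wait for the mutex. */
		toy_lock(&cfg->tbl_list_mutex);
	}

	ctx = cfg->ctx_tbl[id];
	toy_unlock(&cfg->tbl_list_mutex);
	return ctx;
}

/* Release path: returns true only if the context was found and cleaned. */
static bool toy_release(struct toy_cfg *cfg, int id, bool interruptible)
{
	struct toy_ctx *ctx = toy_get_context(cfg, id, interruptible);

	if (ctx == NULL)
		return false; /* treated as "already freed": cleanup skipped */

	ctx->cleaned = true;
	cfg->ctx_tbl[id] = NULL;
	return true;
}

/* Count contexts left in the table after a release that races a signal. */
static int toy_stale_after_release(bool interruptible)
{
	struct toy_ctx ctx = { .cleaned = false };
	struct toy_cfg cfg = { .tbl_list_mutex = { false }, .ctx_tbl = { &ctx } };

	signal_interrupts_lock = true;
	toy_release(&cfg, 0, interruptible);
	signal_interrupts_lock = false;

	return cfg.ctx_tbl[0] != NULL ? 1 : 0;
}
```

In this model, toy_stale_after_release(true) leaves the context in the
table, which is the entry that the remove path would later wait on or
unmap, while toy_stale_after_release(false) leaves nothing behind.
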
> Signed-off-by: Matthew R. Ochs <mrochs at linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
--
Andrew Donnellan Software Engineer, OzLabs
andrew.donnellan at au1.ibm.com Australia Development Lab, Canberra
+61 2 6201 8874 (work) IBM Australia Limited