[Skiboot] [PATCH] core/lock: Add deadlock detection

Matt Brown matthew.brown.dev at gmail.com
Fri Jul 7 13:01:30 AEST 2017


On Fri, Jun 23, 2017 at 6:24 PM, Nicholas Piggin <npiggin at gmail.com> wrote:
> On Fri, 23 Jun 2017 17:14:34 +1000
> Oliver <oohall at gmail.com> wrote:
>
>> Hey nick, you're a concurrency wizard can you review this?
>
> I could try.
>
>> On Fri, Jun 23, 2017 at 2:35 PM, Matt Brown <matthew.brown.dev at gmail.com> wrote:
>> > This adds simple deadlock detection.
>> > The detection looks for circular dependencies in the lock requests.
>> > It will abort and display a stack trace when a deadlock occurs.
>> > The detection will reduce performance so only use for debugging.
>> > To use, enable DEBUG_DEADLOCKS in include/config.h (disabled by default).
>>
>> It might be useful if we can get stack traces for every thread
>> involved in the deadlock. You might be able
>> to use the NMI injection that nick was working on to wake up
>> individual threads to get them to print the stack
>> trace.
>
> Would probably be a good idea. We'll have to somehow bribe or
> Alistair to do the NMI injection firmware though :)
>
>> > Signed-off-by: Matt Brown <matthew.brown.dev at gmail.com>
>> > ---
>> >  core/lock.c      | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> >  include/config.h |  3 ++
>> >  include/lock.h   |  5 ++++
>> >  3 files changed, 90 insertions(+), 1 deletion(-)
>
> Hmm, okay so the algorithm is follows, if a CPU can't get a lock
> immediately, it registers that it is waiting for the lock in
> lock_table, and then checks if the lock owner of that lock is waiting
> for another lock, etc. and if it reaches back to us it's a deadlock.
>
> Seems like a reasonable algorithm.
>
> I think the deadlock detection algorithm itself needs to be
> protected from concurrency. You could have a lock owner release
> its lock and later try to grab the one you hold while the check
> is running, I think?
>
Yeah you're right. I just submitted a v3 which stops other cpu's from
getting locks while you are checking for the deadlock. (forgot to add
this to v2)
There isn't any significant overhead from the mutual exclusion, which
was my main concern.


> An issue with this type of detection is that it doesn't catch a lot
> in testing. I mean only actual deadlocks. Probably better than
> nothing, but what would be really cool is to dynamically build the
> observed relationship between locks (aka lockdep aka witness).
> Then you can complain if you have ever seen locks being taken in
> the wrong order.
>
> But I don't want to say "don't do this because we can spend a lot
> more time and work to do something else" :) This could make a good
> start for lock debugging. A basic timeout warning for acquiring a
> lock wouldn't be a bad idea either (and may catch some different
> bugs).
>
Also added a simple patch to print a timeout warning and backtrace on
locks which wait for > 10 secs (pretty arbitrary value, but the xscom
and uart reads can take 6-8 seconds in my testing).

Thanks,
Matt

> Thanks,
> Nick


More information about the Skiboot mailing list