[Skiboot] [PATCH] core/lock: Add deadlock detection
npiggin at gmail.com
Fri Jun 23 18:24:12 AEST 2017
On Fri, 23 Jun 2017 17:14:34 +1000
Oliver <oohall at gmail.com> wrote:
> Hey nick, you're a concurrency wizard can you review this?
I could try.
> On Fri, Jun 23, 2017 at 2:35 PM, Matt Brown <matthew.brown.dev at gmail.com> wrote:
> > This adds simple deadlock detection.
> > The detection looks for circular dependencies in the lock requests.
> > It will abort and display a stack trace when a deadlock occurs.
> > The detection will reduce performance so only use for debugging.
> > To use, enable DEBUG_DEADLOCKS in include/config.h (disabled by default).
> It might be useful if we can get stack traces for every thread
> involved in the deadlock. You might be able
> to use the NMI injection that nick was working on to wake up
> individual threads to get them to print the stack
Would probably be a good idea. We'll have to somehow bribe or
Alistair to do the NMI injection firmware though :)
> > Signed-off-by: Matt Brown <matthew.brown.dev at gmail.com>
> > ---
> > core/lock.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > include/config.h | 3 ++
> > include/lock.h | 5 ++++
> > 3 files changed, 90 insertions(+), 1 deletion(-)
Hmm, okay so the algorithm is follows, if a CPU can't get a lock
immediately, it registers that it is waiting for the lock in
lock_table, and then checks if the lock owner of that lock is waiting
for another lock, etc. and if it reaches back to us it's a deadlock.
Seems like a reasonable algorithm.
I think the deadlock detection algorithm itself needs to be
protected from concurrency. You could have a lock owner release
its lock and later try to grab the one you hold while the check
is running, I think?
An issue with this type of detection is that it doesn't catch a lot
in testing. I mean only actual deadlocks. Probably better than
nothing, but what would be really cool is to dynamically build the
observed relationship between locks (aka lockdep aka witness).
Then you can complain if you have ever seen locks being taken in
the wrong order.
But I don't want to say "don't do this because we can spend a lot
more time and work to do something else" :) This could make a good
start for lock debugging. A basic timeout warning for acquiring a
lock wouldn't be a bad idea either (and may catch some different
More information about the Skiboot