[RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

Thu Jun 23 04:40:42 AEST 2016

On Wed, Jun 22, 2016 at 11:26:21AM -0700, Andy Lutomirski wrote:
> On Wed, Jun 22, 2016 at 11:22 AM, Josh Poimboeuf <jpoimboe at redhat.com> wrote:
> > On Wed, Jun 22, 2016 at 10:59:23AM -0700, Andy Lutomirski wrote:
> >> > So I got a chance to look at this some more.  I'm thinking that to make
> >> > this feature more consistently useful, we shouldn't only annotate
> >> > pt_regs frames for calls to handlers; other calls should be annotated as
> >> > well: preempt_schedule_irq, CALL_enter_from_user_mode,
> >> > prepare_exit_to_usermode, SWAPGS, TRACE_IRQS_OFF, DISABLE_INTERRUPTS,
> >> > etc.  That way, the unwinder will always be able to find pt_regs from an
> >> > interrupt/exception, even if starting from one of these other calls.
> >> >
> >> > But then, things get ugly.  You have to either setup and tear down the
> >> > frame for every possible call, or do a higher-level setup/teardown
> >> > across multiple calls, which invalidates several assumptions in the
> >> > entry code about the location of pt_regs on the stack.
> >> >
> >> > Also problematic is that several of the macros (like TRACE_IRQS_IRETQ)
> >> > make assumptions about the location of pt_regs.  And they're used by
> >> > both syscall and interrupt code.  So if we didn't create a frame pointer
> >> > header for syscalls, we'd basically need two versions of the macros: one
> >> > for irqs/exceptions and one for syscalls.
> >> >
> >> > So I think the cleanest way to handle this is to always allocate two
> >> > extra registers on the stack in ALLOC_PT_GPREGS_ON_STACK.  Then all
> >> > entry code can assume that pt_regs is at a constant location, and all
> >> > the above problems go away.  Another benefit is that we'd only need two
> >> > saves instead of three -- the pointer to pt_regs is no longer needed
> >> > since pt_regs is always immediately after the frame header.
> >> >
> >> > I worked up a patch to implement this -- see below.  It writes the frame
> >> > pointer in all entry paths, including syscalls.  This helps keep the
> >> > code simple.
> >> >
> >> > The downside is a small performance penalty: with getppid()-in-a-loop on
> >> > my laptop, the average syscall went from 52ns to 53ns, which is about a
> >> > 2% slowdown.  But I doubt it would be measurable in a real-world
> >> > workload.
> >> >
> >> > It looks like about half the slowdown is due to the extra stack
> >> > allocation (which presumably adds a little d-cache pressure on the stack
> >> > memory) and the other half is due to the stack writes.
> >> >
> >> > I could remove the writes from the syscall path but it would only save
> >> > about half a ns, and it would make the code less robust.  Plus it's nice
> >> > to have the consistency of having *all* pt_regs frames annotated.
> >>
> >> This is a bit messy, and I'm not really sure that the entry code
> >> should be have to operate under constraints like this.  Also,
> >> convincing myself this works for NMI sounds unpleasant.
> >>
> >> Maybe we should go back to my idea of just listing the call sites in a table.
> >
> > So are you suggesting something like:
> >
> >         .macro ENTRY_CALL func pt_regs_offset=0
> >         call \func
> > 1:      .pushsection .entry_calls, "a"
> >         .long 1b - .
> >         .long \pt_regs_offset
> >         .popsection
> >         .endm
> >
> > and then change every call in the entry code to ENTRY_CALL?
> 
> Yes, exactly, modulo whether the section name is good.  hpa is
> probably the authority on that.

Well, as you probably know, I don't really like peppering ENTRY_CALL
everywhere. :-/

Also I wonder how we could annotate the hypercalls, for example
DISABLE_INTERRUPTS actually wraps the call in a push/pop pair.

-- 
Josh