[PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
Darren Hart
dvhltc at us.ibm.com
Fri Jul 23 14:44:04 EST 2010
On 07/22/2010 04:57 PM, Darren Hart wrote:
> On 07/22/2010 03:25 PM, Benjamin Herrenschmidt wrote:
>> On Thu, 2010-07-22 at 11:24 -0700, Darren Hart wrote:
>>>
>>> 1) How can the preempt_count() get mangled across the H_CEDE hcall?
>>> 2) Should we call preempt_enable() in cpu_idle() prior to cpu_die() ?
>>
>> The preempt count is on the thread info at the bottom of the stack.
>>
>> Can you check the stack pointers ?
>
> Hi Ben, thanks for looking.
>
> I instrumented the area around extended_cede_processor() as follows
> (please confirm I'm getting the stack pointer correctly).
>
> while (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
> asm("mr %0,1" : "=r" (sp));
> printk("before H_CEDE current->stack: %lx, pcnt: %x\n", sp, preempt_count());
> extended_cede_processor(cede_latency_hint);
> asm("mr %0,1" : "=r" (sp));
> printk("after H_CEDE current->stack: %lx, pcnt: %x\n", sp, preempt_count());
> }
>
>
> On Mainline (2.6.33.6, CONFIG_PREEMPT=y) I see this:
> Jul 22 18:37:08 igoort1 kernel: before H_CEDE current->stack: c00000010e9e3ce0, pcnt: 1
> Jul 22 18:37:08 igoort1 kernel: after H_CEDE current->stack: c00000010e9e3ce0, pcnt: 1
>
> This surprised me as preempt_count is 1 before and after, so no
> corruption appears to occur on mainline. This makes the pcnt of 65 I see
> without the preempt_count()=0 hack very strange. I ran several hundred
> off/on cycles. The issue of preempt_count being 1 is still addressed by
> this patch however.
>
> On PREEMPT_RT (2.6.33.5-rt23 - tglx, sorry, rt/2.6.33 next time, promise):
> Jul 22 18:51:11 igoort1 kernel: before H_CEDE current->stack: c000000089bcfcf0, pcnt: 1
> Jul 22 18:51:11 igoort1 kernel: after H_CEDE current->stack: c000000089bcfcf0, pcnt: ffffffff
>
> In both cases the stack pointer appears unchanged.
>
> Note: there is a BUG triggered in between these statements as the
> preempt_count causes the printk to trigger:
> Badness at kernel/sched.c:5572
At Steven's suggestion I updated the instrumentation to display the
local_save_flags and irqs_disabled_flags:
while (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
local_save_flags(flags);
printk("local flags: %lx, irqs disabled: %d\n", flags, irqs_disabled_flags(flags));
asm("mr %0,1" : "=r" (sp));
printk("before H_CEDE current->stack: %lx, pcnt: %x\n", sp, preempt_count());
extended_cede_processor(cede_latency_hint);
asm("mr %0,1" : "=r" (sp));
printk("after H_CEDE current->stack: %lx, pcnt: %x\n", sp, preempt_count());
}
Jul 22 23:36:58 igoort1 kernel: local flags: 0, irqs disabled: 1
Jul 22 23:36:58 igoort1 kernel: before H_CEDE current->stack: c00000010e9e3ce0, pcnt: 1
Jul 22 23:36:58 igoort1 kernel: after H_CEDE current->stack: c00000010e9e3ce0, pcnt: 1
I'm not sure if I'm reading that right, but I believe interrupts are
intended to be disabled here. If accomplished via the
spin_lock_irqsave() this would behave differently on RT. However, this
path disables the interrupts handled by xics, all but the IPIs anyway.
On RT I disabled the decrementer as well.
Is it possible for RT to be receiving other interrupts here?
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
More information about the Linuxppc-dev
mailing list