ptrace on linux 2.6.12 causes oops

Fri Jul 15 15:03:12 EST 2005

Hi Anton,

On Fri, Jul 15, 2005 at 11:42:27AM +0200, Anton Wöllert wrote:
> > 
> > Yep, just that now its the ptraceing process which is faulting in the page,
> > instead of the (ptraced) process itself.
> > 
> > So Anton, can you move the _tlbie() call up to
> > 
> > 	    && !test_bit(PG_arch_1, &page->flags)) {
> > 		<---------- HERE
> > 		if (vma->vm_mm == current->active_mm)
> > 			__flush_dcache_icache((void *) address);
> > 		else
> > 			flush_dcache_icache_page(page);
> > 		set_bit(PG_arch_1, &page->flags);
> > 
> > So that it covers both cases instead of just (vma->vm_mm == 
> > current->active_mm) ?
> > 
> > Its safe to do it because the address space ID is ignored by tlbie
> > accordingly to the manual page:
> > 
> > The ASID value in the entry is ignored for the purpose of
> > matching an invalidate address, thus multiple entries can be invalidated
> > if they have the same effective address and different ASID values.
> 
> 
> 
> Well, unfortunately, that doesn't work :(. 

Oh doh, in the case where the process faulting in the page is not the owner of the
vma (which is the case with ptrace here), the flushing is done via the physical address.

So it's expected that the _tlbie() on the process virtual address will not change
a thing, since flush_dcache_icache_page() is working on the physical address.

> If i'm right, the 
> __flush_dcache_icache((void *) address) should avoid that the cache says 
> faulting address again.
> The flush_dcache_icache_page(page) should flush the cache, where stands, 
> page not mapped. but the flush_dcache_icache_page(page) oopses on my system. 
> but instead of this call, the call __flush_dcache_icache(page_address(page)) 
> works. for me, that also makes more sence. 

Well, you end up flushing the page through its kernel virtual address mapping (returned
by page_address()). 

That requires the kernel virtual mapping to be faulted into the TLB, which is
certainly slower than doing the direct physical address flush (which requires
updating the MSR twice (+isync) to turn the MMU off and back on).

But other than that I see no problem with your suggestion really, as a workaround for the
oops.

Still, we should try to understand what is going on...

> and also, the flush_dcache_icache_page(page) calls the flush_dcache_icache_phys, which 
> turns off the data virtual address mapping. i found that a bit strange. any 
> comments? 

It seems that the instruction at "__flush_dcache_icache_phys+0x38" is icbi - can you send 
the disassembly of __flush_dcache_icache_phys?  

We should figure out what is causing the oops. 

What is causing DAR to be set to "00000010"? I.e., why the hell is it trying to access
address 00000010?

Oops: kernel access of bad area, sig: 11 [#2]
NIP: C000543C LR: C000B060 SP: C0F35DF0 REGS: c0f35d40 TRAP: 0300 Not tainted
MSR: 00009022 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 10
DAR: 00000010, DSISR: C2000000
TASK = c0ea8430[761] 'gdbserver' THREAD: c0f34000
Last syscall: 26
GPR00: 00009022 C0F35DF0 C0EA8430 00F59000 00000100 FFFFFFFF 00F58000 00000001
GPR08: C021DAEF C0270000 00009032 C0270000 22044024 10025428 01000800 00000001
GPR16: 007FFF3F 00000001 00000000 7FBC6AC0 00F61022 00000001 C0839300 C01E0000
GPR24: 00CD0889 C082F568 3000AC18 C02A7A00 C0EA15C8 00F588A9 C02ACB00 C02ACB00
NIP [c000543c] __flush_dcache_icache_phys+0x38/0x54
LR [c000b060] flush_dcache_icache_page+0x20/0x30