Apparent kernel bug with GDB on ppc405
Grant Likely
grant.likely at secretlab.ca
Thu Oct 25 08:27:52 EST 2007
On 10/24/07, Matt Mackall <mpm at selenic.com> wrote:
> On Wed, Oct 24, 2007 at 03:42:16PM -0500, Matt Mackall wrote:
> > On Wed, Oct 24, 2007 at 02:28:14PM -0600, Grant Likely wrote:
> > > On 10/24/07, Matt Mackall <mpm at selenic.com> wrote:
> > > > I'm trying to debug a trivial statically-linked hello world program on
> > > > a Xilinx PPC 405 and I'm seeing the following behavior:
> > > >
> > > <snip>
> > > >
> > > > Any suggestions?
> > >
> > > http://thread.gmane.org/gmane.linux.ports.ppc.embedded/11202
> > >
> > > I was fighting with a similar problem almost 2 years ago. Looks like
> > > it might be related. At some point the problem seemed to go away and
> > > I never determined what the root cause was. :-(
> > >
> > > I haven't been using gdb lately, so I don't know if it's the same
> > > problem. Nobody I had talked to had seen the issue on other 405
> > > platforms. It could very well be something virtex-specific.
> >
> > Could be the same problem, but I'm seeing only your symptom 3 so far.
> >
> > I've tried throwing some larger hammers at the problem. Flushing all
> > of the dcache and icache (flush_dcache_all and
> > flush_instruction_cache) isn't helping. But printk(".") does!
>
> Well there was one remaining cache - the TLB. This patch seems to make
> things work, but don't ask me why:
>
> --- include/asm-ppc/cacheflush.h (revision 10439)
> +++ include/asm-ppc/cacheflush.h (working copy)
> @@ -11,6 +11,7 @@
> #define _PPC_CACHEFLUSH_H
>
> #include <linux/mm.h>
> +#include <asm/tlbflush.h>
>
> /*
> * No cache flushing is required when address mappings are
> @@ -35,10 +36,23 @@
> extern void flush_icache_user_range(struct vm_area_struct *vma,
> struct page *page, unsigned long addr, int len);
>
> #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
> do { memcpy(dst, src, len); \
> flush_icache_user_range(vma, page, vaddr, len); \
> + _tlbia(); \
> } while (0)
Hmmm; thinking out loud here...
- so tlbia invalidates all TLB entries
- When gdb inserts a breakpoint, the .text pages are marked read-only,
so the kernel does a copy-on-write so that gdb can modify the
instruction. The kernel also updates the page tables so that the test
process now uses the new page.
- This means that there are now 2 pages for that one section of
executable code; the original and the one with the breakpoint.
- However, the program is still in memory, and there is probably
already a TLB entry pointing to the original page for that range of
addresses.
Could it be that the kernel page tables are getting updated to point at
the new page, but the active set of TLB entries is not getting updated?
If so, then printk(".") probably solves the problem simply because it
touches enough pages in its execution path that the old TLB entry gets
overwritten. There are only 64 TLB entries, after all.
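
To make the hypothesis concrete, here is a toy user-space model of it. This
is not kernel code and not how the 405 MMU is actually implemented; every
name in it (toy_mmu, toy_translate, toy_cow_remap, toy_tlb_invalidate_all)
is made up for illustration, and the round-robin replacement policy and
256-page address space are assumptions. It only shows that a page table
update without a TLB invalidate leaves a stale translation behind until
either the TLB is flushed or enough other pages are touched to evict it.

```c
#include <stdbool.h>
#include <stddef.h>

#define TLB_ENTRIES 64    /* number of TLB entries mentioned above */
#define NUM_VPAGES  256   /* size of the toy address space (assumption) */

struct toy_tlb_entry {
    bool valid;
    unsigned vpage;       /* virtual page number */
    unsigned frame;       /* cached physical frame */
};

struct toy_mmu {
    unsigned page_table[NUM_VPAGES];        /* vpage -> physical frame */
    struct toy_tlb_entry tlb[TLB_ENTRIES];
    unsigned next_victim;                   /* round-robin slot (assumption) */
};

/* Start with an identity mapping and an empty TLB. */
static void toy_init(struct toy_mmu *m)
{
    for (size_t i = 0; i < NUM_VPAGES; i++)
        m->page_table[i] = (unsigned)i;
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        m->tlb[i].valid = false;
    m->next_victim = 0;
}

/* A TLB hit returns the cached frame even if the page table has changed.
 * A miss reloads from the page table into the next round-robin slot. */
static unsigned toy_translate(struct toy_mmu *m, unsigned vpage)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        if (m->tlb[i].valid && m->tlb[i].vpage == vpage)
            return m->tlb[i].frame;

    struct toy_tlb_entry *e = &m->tlb[m->next_victim];
    m->next_victim = (m->next_victim + 1) % TLB_ENTRIES;
    e->valid = true;
    e->vpage = vpage;
    e->frame = m->page_table[vpage];
    return e->frame;
}

/* Model of the suspected bug: the copy-on-write updates the page table
 * but deliberately leaves the TLB alone. */
static void toy_cow_remap(struct toy_mmu *m, unsigned vpage, unsigned new_frame)
{
    m->page_table[vpage] = new_frame;
}

/* Rough analogue of tlbia: drop every cached translation. */
static void toy_tlb_invalidate_all(struct toy_mmu *m)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        m->tlb[i].valid = false;
}
```

In this model, translating a page, remapping it with toy_cow_remap(), and
translating it again still returns the old frame. Calling
toy_tlb_invalidate_all() (the _tlbia() case) or translating 64 other pages
(the printk(".") case) makes the next lookup return the new frame.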
Thoughts?
g.
--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
grant.likely at secretlab.ca
(403) 399-0195