PowerPC radeon KMS - is it possible?

Wed Apr 18 21:17:57 EST 2012

On Wed, 2012-04-18 at 12:34 +0200, Michel Dänzer wrote:
> On Mit, 2012-04-18 at 20:20 +1000, Benjamin Herrenschmidt wrote: 
> > On Wed, 2012-04-18 at 10:02 +0200, Michel Dänzer wrote:
> > > 
> > > > GPU lockup appears to be a common problem with the radeon driver.
> > > 
> > > It's what happens when anything goes wrong with the GPU. If it doesn't
> > > happen with agpmode=-1, it's probably an AGP related coherency issue. 
> > 
> > I had some success hacking the DRM to do an in_le32 from the ring head
> > after writing it. Just a gross hack but it seemed to help on a G5.
> 
> AFAICT radeon_ring_commit() does that already:
> 
>         DRM_MEMORYBARRIER();
>         WREG32(ring->wptr_reg, (ring->wptr << ring->ptr_reg_shift) & ring->ptr_reg_mask);
>         (void)RREG32(ring->wptr_reg);
> 
> We added the readback about a decade ago. :)

Hrm, I have a different hack in that old tree I was playing with a while
back, let me see...

--- a/drivers/gpu/drm/radeon/radeon_cp.c
+++ b/drivers/gpu/drm/radeon/radeon_cp.c
@@ -2245,6 +2245,9 @@ void radeon_commit_ring(drm_radeon_private_t *dev_priv)
        DRM_MEMORYBARRIER();
        GET_RING_HEAD( dev_priv );
 
+#ifdef CONFIG_PPC
+       in_be32(dev_priv->ring.start);
+#endif
        if ((dev_priv->flags & RADEON_FAMILY_MASK) >= CHIP_R600) {


I think my rationale was to ensure that all previous stores to
AGP (indirect buffers etc...) were pushed out & ordered vs the ring
wptr update or something like that, because I think those paths aren't
well ordered in HW. In fact I suspect we might even need a bigger hammer
than that in_be32().
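
As a rough illustration of the ordering that hack is after, here is a
user-space sketch in C. Everything in it is a stand-in: ring[] and
wptr_reg are plain variables rather than AGP memory and an MMIO
register, commit_ring() is a hypothetical name, and __sync_synchronize()
(a GCC/Clang builtin) plays the role of DRM_MEMORYBARRIER(). A volatile
load is also much weaker than a real in_be32(), so this only shows the
sequence of steps, not the hardware effect.

```c
#include <stdint.h>

/* Hypothetical stand-ins: ordinary memory plays the role of the
 * AGP-mapped ring and of the write-pointer register. */
#define RING_DWORDS 16u

static volatile uint32_t ring[RING_DWORDS];
static volatile uint32_t wptr_reg;

/* Queue n command dwords starting at wptr, then publish the new wptr.
 * Returns the updated write pointer. */
static uint32_t commit_ring(uint32_t wptr, const uint32_t *cmds, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        ring[wptr] = cmds[i];
        wptr = (wptr + 1) % RING_DWORDS;
    }

    __sync_synchronize();   /* order the stores above (DRM_MEMORYBARRIER() role) */
    (void)ring[0];          /* read back from the ring (in_be32() role in the hack) */

    wptr_reg = wptr;        /* only now tell the "GPU" about the new commands */
    (void)wptr_reg;         /* readback of the register, as in radeon_ring_commit() */
    return wptr;
}
```

The point is only the order: data stores, barrier, a load from the same
memory, and the wptr update last.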

Another hack I had around was removing the SBA reset from agp-uninorth
completely on binding new pages, as the reset seemed to cause hangs.

> > I suspect there's a fundamental design issue with apple bridge in that
> > the CPU to memory path isn't coherent at all with the GPU to memory path
> > ie. even vs. cache flush instructions (ie buffers in the memory
> > controllers can still be out of sync).
> > 
> > Darwin does some gross hacks to work around that, some of them visible
> > in the AGP drivers, some buried in the Apple driver, I don't know for
> > sure. It's possible that they end up mapping all AGP memory as cache
> > inhibited, but we can't do that because of our linear mapping.
> 
> We are doing that though...

Are we really? I thought we were taking existing cacheable RAM objects
and mapping them into the AGP GART. Are we replacing both kernel & user
mappings for those objects with an equivalent cache inhibited mapping?

I'm not -that- familiar with how ttm works here. In any case it can
cause bus checkstops because the same pages can be prefetched into the
cache via the linear mapping which is covered by BATs (unless you make
your graphic objects HIGHMEM only but good luck with that :-)

To make that work reliably we should disable the BAT mapping so the
linear mapping can then be controlled on a per-page basis (on 32-bit)
and this is complicated... we have code elsewhere that more or less
relies on the BAT mapping being there. On 64-bit it's even nastier
because we use 16M pages for the linear mapping.
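
To put a number on the granularity problem, a small C sketch. The helper
names are made up for illustration, and a 4K base page size is assumed;
it just shows how many 4K pages share one mapping entry, and therefore
one set of cache attributes, when the linear mapping uses a larger unit.

```c
#include <stdint.h>

#define SZ_4K  (4ull * 1024)            /* assumed base page size */
#define SZ_16M (16ull * 1024 * 1024)    /* linear mapping page size on 64-bit */

/* How many 4K pages get their attributes changed together when the
 * mapping unit is map_size bytes (hypothetical helper). */
static uint64_t pages_per_mapping(uint64_t map_size)
{
    return map_size / SZ_4K;
}

/* Start of the mapping unit that covers addr; map_size must be a
 * power of two (hypothetical helper). */
static uint64_t mapping_base(uint64_t addr, uint64_t map_size)
{
    return addr & ~(map_size - 1);
}
```

With 16M pages, pages_per_mapping(SZ_16M) is 4096, so making a single
4K graphics page cache inhibited in the linear mapping would drag 4095
neighbours along with it. A BAT covers at least as coarse a region.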

Cheers,
Ben.