PowerPC radeon KMS - is it possible?

Wed Apr 18 23:27:58 EST 2012

On Mit, 2012-04-18 at 21:17 +1000, Benjamin Herrenschmidt wrote: 
> On Wed, 2012-04-18 at 12:34 +0200, Michel Dänzer wrote:
> > On Mit, 2012-04-18 at 20:20 +1000, Benjamin Herrenschmidt wrote: 
> > > On Wed, 2012-04-18 at 10:02 +0200, Michel Dänzer wrote:
> > > > 
> > > > > GPU lockup appears to be a common problem with the radeon driver.
> > > > 
> > > > It's what happens when anything goes wrong with the GPU. If it doesn't
> > > > happen with agpmode=-1, it's probably an AGP related coherency issue. 
> > > 
> > > I had some success hacking the DRM to do an in_le32 from the ring head
> > > after writing it. Just a gross hack but it seemed to help on a G5.
> > 
> > AFAICT radeon_ring_commit() does that already:
> > 
> >         DRM_MEMORYBARRIER();
> >         WREG32(ring->wptr_reg, (ring->wptr << ring->ptr_reg_shift) & ring->ptr_reg_mask);
> >         (void)RREG32(ring->wptr_reg);
> > 
> > We added the readback about a decade ago. :)
> 
> Hrm, I have a different hack in that old tree I was playing with a while
> back, let me see...
> 
> --- a/drivers/gpu/drm/radeon/radeon_cp.c
> +++ b/drivers/gpu/drm/radeon/radeon_cp.c

Note that radeon_cp.c is UMS code, for KMS you need to look at
radeon_ring.c.

> @@ -2245,6 +2245,9 @@ void radeon_commit_ring(drm_radeon_private_t
> *dev_priv)
>         DRM_MEMORYBARRIER();
>         GET_RING_HEAD( dev_priv );
>  
> +#ifdef CONFIG_PPC
> +       in_be32(dev_priv->ring.start);
> +#endif
>         if ((dev_priv->flags & RADEON_FAMILY_MASK) >= CHIP_R600) {
> 
> 
> I think that my rational was to ensure that all previous stores to
> AGP (indirect buffers etc...) were pushed out & ordered vs the ring
> wptr update or something like that, bcs I think those path aren't well
> ordered in HW. In fact I suspect we might even need a bigger hammer than
> that in_be32().

Probably wouldn't hurt trying something like that in the KMS code as
well.

> Another hack I had around was removing the SBA reset from agp-uninorth
> completely on binding new pages, it seemed to cause hangs.

You mean like commit 5613beb46d54da6ef7f1c4589e9f2e60eeb10721? :)

> > > I suspect there's a fundamental design issue with apple bridge in that
> > > the CPU to memory path isn't coherent at all with the GPU to memory path
> > > ie. even vs. cache flush instructions (ie buffers in the memory
> > > controllers can still be out of sync).
> > > 
> > > Darwin does some gross hacks to work around that, some of them visible
> > > in the AGP drivers, some burried in the Apple driver, I don't know for
> > > sure. It's possible that they end up mapping all AGP memory as cache
> > > inhibited, but we can't do that because of our linear mapping.
> > 
> > We are doing that though...
> 
> Are we really ? I thought we were taking existing cachable RAM objects
> and mapping them into the AGP gart.

No, the radeon driver has always mapped memory uncacheable to the CPU
while it's bound into the AGP GART.

> Are we replacing both kernel & user mappings for those objects with an
> equivalent cache inhibited mapping ? 
> 
> I'm not -that- familiar with how ttm works here.

I'm hardly more familiar with how it all works than you. :)

> In any case it can cause bus checkstops because the same pages can be
> prefetched into the cache via the linear mapping which is covered by
> BATs

So you've been saying for about a decade. :) But I've never seen any
problems tracked down to that.

> (unless you make your graphic objects HIGHMEM only but good luck with
> that :-)

FWIW I think TTM indeed prefers highmem pages for GPU access. The radeon
driver normally doesn't need kernel mappings for them.

-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer