[Cbe-oss-dev] [RFC/PATCH] ps3vram: new driver for accessing PS3 video RAM as MTD

Vivien Chappelier vivien.chappelier at free.fr
Mon Apr 28 03:45:55 EST 2008


Hi Geert and Jim,

On Fri, Apr 25, 2008 at 10:45:14PM -0400, Jim Paris wrote:
> Having it properly divided between ps3fb and ps3vram would be nice.
> But it seems strange to me that lv1_gpu_memory_allocate doesn't
> already do that -- a memory allocator that returns overlapping memory?

Earlier versions of ps3fb used to allocate zero bytes of video memory, so the next context would start at the beginning of VRAM too. Newer versions of ps3fb allocate video memory correctly (a zero allocation would be rejected by FW >= 2.10 anyway), so the extra skipping should not be needed anymore. Initially, the module was aimed at being compatible with a wide range of older kernels; for submission we should remove this if it is not needed. I think it is better for each module to handle its own VRAM allocation independently, rather than having ps3fb do it.

> > Currently your cache is 7 x 256 KiB, allocated as one big block? Ps3fb and
> > ps3flash faced a similar problem, and preallocate memory in
> > arch/powerpc/platforms/ps3/setup.c.
> > 
> > You could reduce memory pressure by allocating individuals blocks of 256 KiB
> > instead.
> 
> We'd need to iomap each block individually to the GPU which I presume
> would need an separate context/fifo for each one?  I am not too
> familiar with this aspect of the code.  Unless we can just map all
> system RAM to the GPU in one go?

The problem is that lv1_gpu_context_iomap() can only map memory in chunks that are aligned multiples of 1MB (unless there are additional flags to this call that we are not yet aware of). So allocating individual 256KB blocks would not work unless they are all 1MB aligned. This would probably be easier to obtain than a contiguous region of 2MB of physical memory, but I don't know of a simple way to ask Linux for that.
My initial idea was to map the whole RAM too, but this does not work either as the HV will reject mapping of more than 128MB. This might come from the fact that 'physical' memory as seen by Linux is actually allocated from the HV in two sets, one big chunk of 128MB 'real' memory, plus a second allocation for the rest.
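
The two constraints described above (1MB alignment and the 128MB limit) can be sketched as plain address arithmetic. This is only an illustration in user-space C: the names IOMAP_GRANULE, IOMAP_MAX, iomap_aligned(), iomap_round_up() and iomap_region_ok() are hypothetical, do not come from the patch, and do not call the hypervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the message: mappings must be aligned multiples of 1MB. */
#define IOMAP_GRANULE (1024ULL * 1024ULL)
/* Assumed from the message: mappings larger than 128MB are rejected. */
#define IOMAP_MAX     (128ULL * 1024ULL * 1024ULL)

/* True if addr is a multiple of the 1MB granule. */
bool iomap_aligned(uint64_t addr)
{
	return (addr & (IOMAP_GRANULE - 1)) == 0;
}

/* Round size up to the next multiple of the 1MB granule. */
uint64_t iomap_round_up(uint64_t size)
{
	assert(size <= UINT64_MAX - (IOMAP_GRANULE - 1)); /* no overflow */
	return (size + IOMAP_GRANULE - 1) & ~(IOMAP_GRANULE - 1);
}

/* A region passes only if base and length are both 1MB multiples,
 * the length is non-zero, and it does not exceed the 128MB limit. */
bool iomap_region_ok(uint64_t base, uint64_t len)
{
	return len != 0 && len <= IOMAP_MAX &&
	       iomap_aligned(base) && iomap_aligned(len);
}
```

With these helpers, a 7 x 256KB cache rounds up to 2MB, and a lone 256KB block at a 256KB-aligned address fails the check.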


> > Is there a special reason you chose 256 KiB for the cache blocks, e.g.
> > performance benchmarking?

Yes, DMA is more efficient on large blocks. Here is the performance I obtain on my PS3 (FW 2.01, 2.6.24 + initial patch):

dd if=/dev/zero of=/dev/mtdblock0 bs=1M count=229 oflag=direct
dd if=/dev/mtdblock0 of=/dev/null bs=1M count=229 iflag=direct

CACHE_PAGE_SIZE   32kB    64kB     128kB    256kB
write           117 MB/s 153 MB/s 181 MB/s 199 MB/s
read             78 MB/s 109 MB/s 135 MB/s 156 MB/s

So it is a compromise between DMA efficiency and number of chunks for random access, but it could be made configurable.
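
To make the "number of chunks" side of the compromise concrete, here is a small sketch that derives the page count from a configurable page size. It assumes the total cache size stays fixed at the 7 x 256 KiB quoted earlier in this thread; CACHE_TOTAL_SIZE and cache_page_count() are hypothetical names, not taken from the patch.

```c
#include <assert.h>

/* Assumption: the total cache stays at 7 x 256 KiB, as quoted in the thread. */
#define CACHE_TOTAL_SIZE (7u * 256u * 1024u)

/* Number of cache pages available for a given page size in bytes.
 * The page size must be a non-zero power of two. */
unsigned int cache_page_count(unsigned int page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return CACHE_TOTAL_SIZE / page_size;
}
```

Under that assumption, going from 256kB pages down to 32kB pages gives 56 chunks instead of 7 for random access, at the cost of the lower DMA throughput measured above.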

> > Would it be possible to have this memory non-contiguous and use vmalloc(), i.e.
> > can the RSX 2D accel engine do scatter/gather, or can the copy be split in
> > multiple commands that each copy one or more 4 KiB pages?

This would be possible if we found a way to remap the whole Linux memory to the GPU. It would also be more efficient as we would not have to copy memory to and from the bounce buffer. Unfortunately, I don't know how to do that.


> > > +	if ((priv->fifo_ptr - priv->fifo_base) * 4 > FIFO_SIZE - 1024) {
> > 
> > Where does the 1024 come from? Perhaps DMA_NOTIFIER_OFFSET_BASE?

No, this is unrelated. For some reason the GPU will crash if the FIFO buffer is used entirely. I don't know exactly why; maybe some prefetching. In any case, we have to issue the rewind command before reaching the end of the FIFO, and 1024 is an empirical value, leaving 63kB out of 64kB for commands.
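
The guard condition quoted above can be isolated as follows. This is a user-space sketch only: it assumes that fifo_ptr and fifo_base point to 32-bit command words (which is what the "* 4" in the quoted line suggests) and that FIFO_SIZE is 64kB as stated; struct fifo_state, FIFO_GUARD and fifo_needs_rewind() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_SIZE  (64 * 1024) /* 64kB, per the message */
#define FIFO_GUARD 1024        /* empirical margin kept free at the end */

/* Hypothetical stand-in for the relevant fields of the driver's priv. */
struct fifo_state {
	uint32_t *fifo_base; /* start of the command FIFO */
	uint32_t *fifo_ptr;  /* next free 32-bit command slot */
};

/* True once more than FIFO_SIZE - FIFO_GUARD bytes have been used,
 * i.e. when a rewind should be issued before writing further. */
bool fifo_needs_rewind(const struct fifo_state *f)
{
	ptrdiff_t used = (f->fifo_ptr - f->fifo_base) * (ptrdiff_t)sizeof(uint32_t);

	assert(used >= 0 && used <= FIFO_SIZE);
	return used > FIFO_SIZE - FIFO_GUARD;
}
```

With these values the check stays false up to and including 63kB of commands (16128 words) and becomes true one word later.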

Thanks to Geert for the review, and to Jim for cleaning up and integrating my RSX changes. Let me know if you have any further questions.

regards,
Vivien.


