[Cbe-oss-dev] ps3vram performance

Siarhei Siamashka siarhei.siamashka at gmail.com
Thu Oct 15 12:00:54 EST 2009


On Wednesday 14 October 2009, Geoff Levand wrote:
> On 10/14/2009 02:01 AM, Siarhei Siamashka wrote:
> > On Wednesday 14 October 2009, Geoff Levand wrote:
> >> > On Wednesday 23 September 2009, Ken Werner wrote:
> >> >> It looks like the firmware from 2.7.6 to 3.0.1 update has no impact
> >> >> but the kernel version does.
> >>
> >> Just FYI, I have a kernel fix that I hope to release this week.  I need
> >> to do some more work to verify it before I release it though.
> >
> > OK, nice. Let me know if you need some external testing.
>
> I pushed out the fix to ps3-linux.git.  Please test and report.
>
>  
> http://git.kernel.org/?p=linux/kernel/git/geoff/ps3-linux-patches.git;a=blob;f=ps3-wip/ps3-vram-speedup.patch;hb=HEAD

Thanks, it improves ps3vram throughput a great deal:

# dd if=/dev/zero of=/dev/ps3vram bs=1M oflag=direct
dd: writing `/dev/ps3vram': No space left on device
247+0 records in
246+0 records out
257949696 bytes (258 MB) copied, 1.07786 s, 239 MB/s

# dd if=/dev/ps3vram of=/dev/null bs=1M iflag=direct
246+0 records in
246+0 records out
257949696 bytes (258 MB) copied, 0.697402 s, 370 MB/s


Good to know that the problem has been pinpointed. Looking at the patch and
the current ps3vram implementation, I have a few questions and ideas.

1. Would it make sense to change the code to use udelay(1), or to get rid of
udelay completely and check the timeout based on jiffies, as the msleep loop
does?

If I understand it correctly, the ps3vram driver just spins the CPU, checking
the completion flag every 10 usec. Because of this granularity, ~5 usec are
lost on average per wait. Given the read throughput (370 MB/s, i.e.
~353 MiB/s) and the 256 KiB cache page size, we get ~1400 DMA reads completed
per second, or ~700 usec per cache page. Based on these numbers, msleep(1) is
quite wasteful, because a DMA read should complete in less than 1 ms on
average. The 200 usec busy-wait timeout was probably selected so that we
almost never reach the msleep loop (the 700 usec also include time spent
outside the ps3vram driver and in the parts of ps3vram outside this loop, such
as DMA setup), though I have not yet checked whether that is true.
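
As a rough sketch of the arithmetic above, here is a small userspace C
fragment. The function names are made up for illustration and are not part of
ps3vram; the only inputs are the dd read figures and the 256 KiB page size.

```c
#include <assert.h>

/* DMA page reads per second for a given throughput (bytes/s) and
 * cache page size (bytes). */
double pages_per_second(double bytes_per_sec, double page_bytes)
{
	assert(page_bytes > 0.0);
	return bytes_per_sec / page_bytes;
}

/* Average wall-clock time per cache page, in microseconds. This includes
 * everything, not only the time spent waiting for DMA completion. */
double usec_per_page(double bytes_per_sec, double page_bytes)
{
	return 1e6 / pages_per_second(bytes_per_sec, page_bytes);
}

/* Average latency added by polling every period_usec, assuming the
 * completion lands uniformly within a polling interval. */
double avg_poll_loss_usec(double period_usec)
{
	assert(period_usec >= 0.0);
	return period_usec / 2.0;
}
```

Fed with the read test above (257949696 bytes in 0.697402 s) and a 262144-byte
page, this gives roughly 1410 pages per second and roughly 709 usec per page,
and 5 usec of average polling loss for a 10 usec period.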

Does it make any sense to keep the msleep loop at all?
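
To illustrate the granularity point, here is a minimal userspace simulation,
not driver code. struct fake_dma and poll_completion() are hypothetical; the
simulated clock stands in for udelay() and a jiffies-style deadline, and the
"hardware" is simply considered done once the clock reaches done_at_usec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware and the clock. */
struct fake_dma {
	uint64_t now_usec;      /* simulated current time */
	uint64_t done_at_usec;  /* time at which the transfer completes */
};

/* Poll every period_usec until the transfer is seen as complete, or until
 * timeout_usec has elapsed on the simulated clock. Returns the time at
 * which completion was noticed, or -1 on timeout. */
int64_t poll_completion(struct fake_dma *dma, uint64_t period_usec,
			uint64_t timeout_usec)
{
	uint64_t deadline;

	assert(period_usec > 0);
	deadline = dma->now_usec + timeout_usec;

	for (;;) {
		if (dma->now_usec >= dma->done_at_usec)
			return (int64_t)dma->now_usec;  /* flag seen */
		if (dma->now_usec >= deadline)
			return -1;                      /* gave up */
		dma->now_usec += period_usec;   /* models udelay(period) */
	}
}
```

For a transfer that completes at 703 usec, a 10 usec period notices it at
710 usec, while a 1 usec period notices it at 703 usec. With a 200 usec
timeout the same transfer is reported as timed out, which is the case where
the real driver would fall through to its sleeping loop.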

2. It looks like it would make sense to perform some of the DMA transfers for
evicting dirty cache pages "in the background", without waiting for their
completion in ps3vram.

This could be done by introducing something like a special eviction buffer.
When a cache page eviction requires pushing data to GPU memory, just move the
page to the eviction buffer. Then, before exiting from the ps3vram code, a DMA
transfer can be started to actually push the data from the eviction buffer to
GPU memory. This can be implemented without any extra memcpy by assigning the
role of the eviction buffer to different cache pages in turn, or something
like this. In any case, at least one page should preferably always be free
and ready for allocation.

This would require somewhat more complex synchronization with ps3fb than
ps3_gpu_mutex alone.
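
A very rough sketch of the bookkeeping for idea 2, as plain userspace C. All
names here (struct cache, the PAGE_* states, NPAGES) are invented for
illustration and do not match the ps3vram source. There is no real DMA and no
locking, and only one pending eviction is handled at a time.

```c
#include <assert.h>

#define NPAGES  4      /* hypothetical page count, small for illustration */
#define NO_PAGE (-1)

enum page_state { PAGE_FREE, PAGE_CLEAN, PAGE_DIRTY, PAGE_EVICTING };

struct cache {
	enum page_state state[NPAGES];
	int pending;       /* page waiting for its write-back to start */
	int dma_started;   /* number of background write-backs kicked off */
};

void cache_init(struct cache *c)
{
	int i;

	for (i = 0; i < NPAGES; i++)
		c->state[i] = PAGE_FREE;
	c->pending = NO_PAGE;
	c->dma_started = 0;
}

/* Index of a free page, or NO_PAGE if none is available. */
int cache_find_free(const struct cache *c)
{
	int i;

	for (i = 0; i < NPAGES; i++)
		if (c->state[i] == PAGE_FREE)
			return i;
	return NO_PAGE;
}

/* Evict a dirty page without copying: the page itself takes over the
 * eviction buffer role until its write-back has finished. */
void cache_evict(struct cache *c, int victim)
{
	assert(victim >= 0 && victim < NPAGES);
	assert(c->state[victim] == PAGE_DIRTY);
	assert(c->pending == NO_PAGE);  /* sketch: one eviction at a time */
	c->state[victim] = PAGE_EVICTING;
	c->pending = victim;
}

/* On the way out of the driver: start the write-back, do not wait. */
void cache_kick_writeback(struct cache *c)
{
	if (c->pending == NO_PAGE)
		return;
	c->dma_started++;       /* real code would program the DMA here */
	c->pending = NO_PAGE;
}

/* Once the DMA is known to have completed, the page is free again. */
void cache_writeback_done(struct cache *c, int page)
{
	assert(page >= 0 && page < NPAGES);
	assert(c->state[page] == PAGE_EVICTING);
	c->state[page] = PAGE_FREE;
}
```

The point of the PAGE_EVICTING state is that the page cannot be reused until
the write-back is known to be complete, which is why keeping at least one
other page free matters.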

-- 
Best regards,
Siarhei Siamashka

