[Cbe-oss-dev] [PATCH] block/ps3: Fix slow VRAM IO

Akira Tsukamoto akirat at rd.scei.sony.co.jp
Fri Nov 13 13:03:18 EST 2009


Hello Andrew Morton,

Ping?

This patch is pretty important to improve the performance of PS3.
I really appreciate for your reply.

Thanks,

Akira

On Mon, 09 Nov 2009 15:40:42 +0900, 
Akira Tsukamoto <akirat at rd.scei.sony.co.jp> mentioned: 
> Thank you for the review!
> 
> > > The current PS3 VRAM driver uses msleep() to wait for completion
> > > of RSX DMA transfers between system memory and VRAM.  Depending
> > > on the system timing, the processing delay and overhead of this
> > > msleep() call can significantly impact VRAM driver IO.
> > > 
> > > To avoid the condition, add a short duration (200 usec max)
> > > udelay() polling loop before entering the msleep() polling
> > > loop.
> > > 
> > 
> > When raising a performance-based patch, please always try to include
> > before-and-after performance measurements in the changelog.  People
> > want to know the magnitude of the improvement.
> 
> No problem we will add the difference of improvement in the changelog.
> This is the results. Pretty impressive.
> Before
>   Reading:  33MB/s 
>   Writing:  16MB/s
> After
>   Reading: 370MB/s
>   Writing: 238MB/s
> 
> > > +		if (!notify[3])
> > > +			return 0;
> > > +		udelay(10);
> > > +	}
> > 
> > You might as well do a udelay(1) here.  The additional cost will be
> > negligible, and it will reduce latency.
> 
> Are you mentioning adding udelay(1) in the between udelay polling 
> and msleep polling? Or are you mentioning to change udelay(10) to udelay(1)
> inside the udelay polling?
> 
> The former is no problem, but the later has impact on performance of PS3 
> system.
> Because Cell/B.E.(consists of PPE and SPEs cores) and GPU are connected with 
> ring bus called EIB and every issuing notify[3] to check VRAM-DMA results 
> will generate data transfer to the bus. 
> There are only one EIB bus in PS3 and other devices connected on the bus
> such as SPEs will be affected if the bus is occupied by many notify[3] and
> as a result it will decrease the over all system performance.
> 
> The udelay(10) was the most reasonable distance not to overcrowd the bus 
> and not to wait too long for checking DMA on VRAM.
> We have tried udelay(5) but did not improve the VRAM IO speed.
> 
> > > +	timeout = jiffies + msecs_to_jiffies(timeout_ms);
> > 
> > The maximum latency is now timout_ms + 200usec.
> > 
> > That's OK with the current constants, but if someone later changes a
> > constant, the error could become significant.
> 
> Yes, I think so too. Probably reconstructing the design entirely based on 
> usec instead of msec might be ideal but adding 200usec loops fixes the
> current slow VRAM driver, so I thought it is acceptable work around.
> 
> > Perhaps that isn't worth bothering about though.
> > 
> > >  	do {
> > >  		if (!notify[3])
> 
> -- 
> Akira Tsukamoto
> Sony Computer Entertainment Inc. 
> Architecture Lab.
> Japan
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Akira Tsukamoto
Sony Computer Entertainment Inc. 
Architecture Lab.
Japan



More information about the cbe-oss-dev mailing list