[patch 05/18] PS3: Fix sparse warnings

Fri Jun 8 15:59:08 EST 2007

On 2007/06/07, at 23:34, Geoff Levand wrote:

> Arnd Bergmann wrote:
>> On Wednesday 06 June 2007, Geoff Levand wrote:
>>> -	spu->local_store = ioremap(spu->local_store_phys, LS_SIZE);
>>> +	spu->local_store = (__force void *)ioremap(spu->local_store_phys,
>>> +						   LS_SIZE);
>>
>> I haven't noticed this before, but it seems to be a preexisting bug:
>> You map the local_store with the guarded page table bit set, which
>> causes a performance degradation when accessing the memory from kernel
>> space.
>>
>> If you're lucky, your hypervisor knows this and will fix it up for
>> you, but I would replace the ioremap call with an
>> ioremap_flags(..., _PAGE_NO_CACHE); to be on the safe side.
>>
>> If you want to measure the impact, I'd suggest timing a user space
>> read() on the mem file of a running SPU context.
>
> Hi Arnd,
>
> I asked Noguchi-san to check the performance and below is his
> report and test program.  I'll add the change into my patch set.
>
> -Geoff
>
> -------- Original Message --------
> Subject: RE: [patch 05/18] PS3: Fix sparse warnings
> Date: Thu, 7 Jun 2007 05:39:43 -0700
> From: Noguchi, Masato <Masato.Noguchi at jp.sony.com>
> To: Levand, Geoff <Geoffrey.Levand at am.sony.com>
>
>  << A time to read a whole of LS by read system call >>
> not patched: avg. 21053.7800 tick ( 263.831830 microseconds )
> patched:     avg. 20809.2412 tick ( 260.767434 microseconds )
>
> about 1% faster.
> I think it's a valid difference. (not a measurement error.)

Let me correct the above measurements and analysis.

My understanding is:

	1) Caching-inhibited loads are usually implemented as guarded. Thus,
	   the guarded property will not affect load performance.

	2) Caching-inhibited, non-guarded sequential stores are usually
	   gathered (merged) in the store buffer. Thus, the guarded property
	   will affect store performance significantly.

Therefore, we should measure write(2) performance to evaluate the impact
of the patch.

Attached is a slightly modified test program based on Noguchi-san's.
It measures write(2) performance. The buffer area is 'pre-touched' to
avoid VM overhead.
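
For readers without the attachment, the measurement described above can
be sketched roughly as follows. This is not the attached test_write_ls.c,
only a minimal illustration under stated assumptions: the names
read_ticks() and best_write_ticks() are hypothetical, the timebase is
read with the compiler builtin __builtin_ppc_get_timebase() when building
for PowerPC (with a clock_gettime() fallback elsewhere, in nanoseconds
rather than TB cycles), and the path of the SPU context's mem file is
left to the caller.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define LS_SIZE (256 * 1024)	/* SPU local store size: 256 KiB */

/* Read a monotonically increasing tick counter. */
static uint64_t read_ticks(void)
{
#if defined(__powerpc__) && defined(__GNUC__)
	/* Assumption: compiler provides the PowerPC timebase builtin. */
	return __builtin_ppc_get_timebase();
#else
	/* Fallback for other hosts: nanoseconds, not TB cycles. */
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/*
 * Write 'size' bytes to the start of 'path' 'tries' times and store the
 * smallest elapsed tick count in *best.  Returns 0 on success, -1 on error.
 */
static int best_write_ticks(const char *path, size_t size, int tries,
			    uint64_t *best)
{
	uint64_t min = UINT64_MAX;
	char *buf;
	int fd, i;

	assert(path != NULL && best != NULL && tries > 0);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	buf = malloc(size);
	if (buf == NULL) {
		close(fd);
		return -1;
	}
	/* Pre-touch the buffer so page faults are not part of the timing. */
	memset(buf, 0x5a, size);

	for (i = 0; i < tries; i++) {
		uint64_t t0, t1;
		ssize_t n;

		/* Rewind outside the timed region; only write(2) is timed. */
		if (lseek(fd, 0, SEEK_SET) < 0)
			goto fail;
		t0 = read_ticks();
		n = write(fd, buf, size);
		t1 = read_ticks();
		if (n != (ssize_t)size)
			goto fail;
		if (t1 - t0 < min)
			min = t1 - t0;
	}

	free(buf);
	close(fd);
	*best = min;
	return 0;

fail:
	free(buf);
	close(fd);
	return -1;
}
```

A caller would pass the mem file of a running SPU context, LS_SIZE, and
100 tries, then print *best. Taking the minimum rather than the average
keeps scheduling noise out of the result.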

***** test results *****

before:	103903	[TB cycles]	(best value out of 100 tries)
after:	 14277	[TB cycles]	(best value out of 100 tries)
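
To put these numbers in the same units as the earlier read() report, the
cycle counts can be converted with the ratio implied by that report
(21053.78 ticks for 263.83183 microseconds, i.e. about 79.8 ticks per
microsecond). The sketch below assumes the write test ran on the same
timebase, which this thread does not state explicitly; the helper names
are hypothetical.

```c
#include <stdio.h>

/* Ticks per microsecond implied by the quoted read() measurement. */
#define TB_TICKS_PER_US (21053.7800 / 263.831830)

/* Convert a timebase tick count to microseconds. */
static double ticks_to_us(double ticks)
{
	return ticks / TB_TICKS_PER_US;
}

/* Ratio of the unpatched time to the patched time. */
static double speedup(double before, double after)
{
	return before / after;
}

/* Print both results in microseconds along with the speedup factor. */
static void print_summary(double before, double after)
{
	printf("before: %.1f us, after: %.1f us, speedup: %.2fx\n",
	       ticks_to_us(before), ticks_to_us(after),
	       speedup(before, after));
}
```

Calling print_summary(103903, 14277) gives roughly 1302 microseconds
before and 179 microseconds after, a speedup of about 7.3 times.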

-- Takao Shinohara

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_write_ls.c
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20070608/7ab7c989/attachment.txt>