83xx GPIO/EXT int in arch/powerpc/

Fri Jun 15 07:04:57 EST 2007

On Jun 12, 2007, at 11:06 AM, Marc Leeman wrote:

>> I'm confused what you are comparing here, 3 seconds on arch=ppc vs
>> over a minute on arch=powerpc?
>
> Loading of a TI DSP over HPI. HPI is implemented in UPMB (programmed
> by U-Boot); all the kernel has to do is write to the location
> programmed by U-Boot (0xe2400000) to trigger the UPMB HPI protocol.
>
>> I'd expect the driver to be exactly the same (or close to it) for
>> arch=ppc vs arch=powerpc.
>
> Same here: there are a number of performance issues compared with an
> older 8245-based implementation (network seems slower too), and this
> was not one I expected when switching to powerpc.

Are you comparing 8245 on arch=ppc to 83xx on arch=powerpc, or 83xx in
both cases?

>> There shouldn't be, but if you are seeing this we really should
>> figure out what's going on.
>>
>> What kernel is this on?  What processor are you using?
>
> $ cat /proc/cpuinfo
> processor       : 0
> cpu             : e300c1
> clock           : 396.000000MHz
> revision        : 1.1 (pvr 8083 0011)
> bogomips        : 131.28
> timebase        : 66000000
> platform        : BARCO834x SVC2

Hmm, we really need to put SVR in there as well (add that to the todo
list).  Which 83xx is this?

> $ uname -a
> Linux barco 2.6.21.1-barco1 #1 PREEMPT Tue Jun 12 09:48:12 CEST 2007 ppc unknown
>
> The board is based on the Freescale SYS/EMDS reference design.
>
> Most of the HPI operations are stuff like this:
>
> static inline int8_t _hpi_set_hhwil(uint8_t b)
> {
>         volatile immap_t* im;
>
>         if(!(im = ioremap((immrbar),sizeof(struct immap)))){
>                 return -EINVAL;
>         }
>         (b)?(im->gpio[0].dat |= HHWIL):(im->gpio[0].dat &= ~HHWIL);
>         iounmap(im);
>
>         return 0;
> }
>
> static inline uint32_t __hpi_read_hpid(void)
> {
>         uint32_t returnval;
>
>         /* Program HPID */
>         _hpi_set_hcntl1(1);
>         _hpi_set_hcntl0(1);
>
>         /* first halfword */
>         _hpi_set_hhwil(0);
>         /* dummy read */
>         returnval = (((uint32_t)(*hpi_dsp))<<16);
>
>         /* delay */
>         udelay(1);
>
>         /* second halfword */
>         _hpi_set_hhwil(1);
>         /* dummy read */
>         returnval |= *hpi_dsp;
>
>         /* delay */
>         udelay(1);
>
>         return returnval;
> }
>
> The ioremap was certainly a bottleneck; moving it to initialisation
> with a global pointer got us from 60 secs back to around 1 sec, but a
> similar effect was obtained on the ppc arch with this change (it just
> hides a bad situation).
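
For anyone following along, the map-once change described above might
look roughly like the following. This is only a userspace sketch so it
can be compiled and checked outside the kernel: fake_ioremap(),
struct gpio_regs, fake_gpio, map_calls and the HHWIL bit value are all
made-up stand-ins, not taken from the driver. In the real code the
mapping would come from ioremap() at init time and be released with
iounmap() at teardown.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define HHWIL 0x00000001u

/* Hypothetical stand-in for one GPIO register block in the immap. */
struct gpio_regs {
	volatile uint32_t dir;
	volatile uint32_t odr;
	volatile uint32_t dat;
};

static struct gpio_regs fake_gpio;   /* pretend MMIO region */
static unsigned int map_calls;       /* counts how often we "map" */

/* Stand-in for ioremap(); the real call sets up a kernel mapping. */
static struct gpio_regs *fake_ioremap(void)
{
	map_calls++;
	return &fake_gpio;
}

/* Mapping is established once and cached, instead of per access. */
static struct gpio_regs *hpi_gpio;

static int hpi_init(void)
{
	hpi_gpio = fake_ioremap();
	return hpi_gpio ? 0 : -1;
}

/* The hot path only touches the cached pointer. */
static inline void hpi_set_hhwil(int b)
{
	if (b)
		hpi_gpio->dat |= HHWIL;
	else
		hpi_gpio->dat &= ~HHWIL;
}
```

With this shape the cost of setting up and tearing down the mapping is
paid once, no matter how many times the HPI helpers are called.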

Do you have a sense of how many calls you make to _hpi_set_hhwil &
__hpi_read_hpid?

I think I know why the ppc case was more efficient.

> Even after this change, the load of a streaming application was
> around 40% on ppc and 60% on powerpc.

What's going on during the streaming?

- k