[Cbe-oss-dev] [PATCH 8/10] MARS: workload queue mutex protection
Geoff Levand
geoffrey.levand at am.sony.com
Fri Aug 29 11:31:49 EST 2008
Geoff Levand wrote:
> Yuji Mano wrote:
>> Kazunori Asayama wrote:
>>> Yuji Mano wrote:
>>>> This adds mutex protection when accessing the shared workload queue.
>>>>
>>>> Prior to this patch the kernel scheduler accessed the whole queue block
>>>> atomically. Now the kernel scheduler atomically locks the queue block when it
>>>> needs to access it.
>>>>
>>>> Host-side access of the queue block now also uses the mutex locks to keep the
>>>> workload queue functions thread-safe.
>>>>
>>>> This also replaces the previously used MARS context mutex. The workload queue
>>>> handles its own mutex without the workload model APIs needing to manage the
>>>> MARS context mutex.
>>>>
>>>> Signed-off-by: Yuji Mano <yuji.mano at am.sony.com>
>>>
>>> (snip)
>>>
>>>> --- a/include/common/mars/mars_workload_types.h
>>>> +++ b/include/common/mars/mars_workload_types.h
>>>> @@ -64,9 +64,9 @@ extern "C" {
>>>> #define MARS_WORKLOAD_SIGNAL_ON 0x1 /* signal set on */
>>>> #define MARS_WORKLOAD_SIGNAL_OFF 0x0 /* signal set off */
>>>>
>>>> -#define MARS_WORKLOAD_MAX 1024 /* wl max */
>>>> -#define MARS_WORKLOAD_PER_BLOCK 16 /* wl per block */
>>>> -#define MARS_WORKLOAD_NUM_BLOCKS 64 /* wl max / per block */
>>>> +#define MARS_WORKLOAD_PER_BLOCK 15 /* wl/block */
>>>
>>> This change of MARS_WORKLOAD_PER_BLOCK from 16 to 15 will cause *real*
>>> divide operation on MPU when calculating block number and index in the
>>> block by ID, i.e.:
>>>
>>> int block = id / MARS_WORKLOAD_PER_BLOCK;
>>> int index = id % MARS_WORKLOAD_PER_BLOCK;
>>>
>>> I think we should avoid that, if possible.
>>>
>>> Other things look ok to me.
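For reference, roughly what that costs (illustration only, not actual
compiler output; id is the workload ID as in the quoted lines above):

        /* MARS_WORKLOAD_PER_BLOCK == 16: the divide and modulo reduce
         * to simple bit operations */
        int block = id >> 4;            /* same as id / 16 */
        int index = id & 0xf;           /* same as id % 16 */

        /* MARS_WORKLOAD_PER_BLOCK == 15: no such reduction, so the MPU
         * ends up doing the divide in software */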
>>
>> Unfortunately I can't think of a way to keep it at 16 with the
>> current implementation.
>> Of course we can reduce MARS_WORKLOAD_PER_BLOCK, but that might be more wasteful
>> than beneficial.
>>
>
>> /* 128 byte workload queue header structure */
>> struct mars_workload_queue_header {
>> + uint32_t lock;
>> + uint32_t count;
>> uint64_t queue_ea;
>> uint64_t context_ea;
>> - uint32_t count;
>> uint8_t flag;
>> } __attribute__((aligned(MARS_WORKLOAD_QUEUE_HEADER_ALIGN)));
>
> It seems like you would only need a single bit to implement a lock.
> I guess the EAs will always be aligned, so there will be unused bits
> that could maybe be used for the lock.
I looked closer at what the problem was and realized it was not the
queue header above, but this:
> /* 128 byte workload queue block structure */
> struct mars_workload_queue_block {
> + uint32_t lock;
> + uint32_t pad;
> struct mars_workload_queue_block_bits bits[MARS_WORKLOAD_PER_BLOCK];
> } __attribute__((aligned(MARS_WORKLOAD_QUEUE_BLOCK_ALIGN)));
A solution would be to have a union:
#define MARS_WORKLOAD_PER_BLOCK 16

union mars_workload_queue_block {
        uint32_t lock;
        struct mars_workload_queue_block_bits bits[MARS_WORKLOAD_PER_BLOCK];
} __attribute__((aligned(MARS_WORKLOAD_QUEUE_BLOCK_ALIGN)));
The operations on mars_workload_queue_block.bits would know that bits[0]
is not used. That should still be faster than a non-power-of-2 divide.
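Roughly, on top of that union (untested, and the helper names here are
just for illustration):

        /* ID -> block/index: with MARS_WORKLOAD_PER_BLOCK back at a
         * power of two, the compiler lowers these to a shift and a
         * mask, so there is no real divide on the MPU. */
        static inline int workload_id_to_block(int id)
        {
                return id / MARS_WORKLOAD_PER_BLOCK;
        }

        static inline int workload_id_to_index(int id)
        {
                return id % MARS_WORKLOAD_PER_BLOCK;
        }

The allocation and lookup code would just start at index 1 in each
block, trading one workload slot per block for keeping the power-of-2
math.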
-Geoff