[Cbe-oss-dev] [PATCH 8/10] MARS: workload queue mutex protection

Fri Aug 29 11:30:56 EST 2008

Geoff Levand wrote:
> Geoff Levand wrote:
>> Yuji Mano wrote:
>>> Kazunori Asayama wrote:
>>>> Yuji Mano wrote:
>>>>> This adds mutex protection when accessing the shared workload queue.
>>>>> 
>>>>> Prior to this patch the kernel scheduler accessed the whole queue block
>>>>> atomically. Now the kernel scheduler atomically locks the queue block when it
>>>>> needs to access it.
>>>>> 
>>>>> Host-side access of the queue block now also uses the mutex locks to keep the
>>>>> workload queue functions thread-safe.
>>>>> 
>>>>> This also replaces the previously used MARS context mutex. The workload queue
>>>>> handles its own mutex without the workload model APIs needing to manage the
>>>>> MARS context mutex.
>>>>> 
>>>>> Signed-off-by: Yuji Mano <yuji.mano at am.sony.com>
>>>> 
>>>> (snip)
>>>> 
>>>>> --- a/include/common/mars/mars_workload_types.h
>>>>> +++ b/include/common/mars/mars_workload_types.h
>>>>> @@ -64,9 +64,9 @@ extern "C" {
>>>>>  #define MARS_WORKLOAD_SIGNAL_ON			0x1	/* signal set on */
>>>>>  #define MARS_WORKLOAD_SIGNAL_OFF		0x0	/* signal set off */
>>>>>  
>>>>> -#define MARS_WORKLOAD_MAX			1024	/* wl max */
>>>>> -#define MARS_WORKLOAD_PER_BLOCK			16	/* wl per block */
>>>>> -#define MARS_WORKLOAD_NUM_BLOCKS		64	/* wl max / per block */
>>>>> +#define MARS_WORKLOAD_PER_BLOCK			15	/* wl/block */
>>>> 
>>>> This change of MARS_WORKLOAD_PER_BLOCK from 16 to 15 will cause *real*
>>>> divide operation on MPU when calculating block number and index in the
>>>> block by ID, i.e.:
>>>> 
>>>>   	int block = id / MARS_WORKLOAD_PER_BLOCK;
>>>>   	int index = id % MARS_WORKLOAD_PER_BLOCK;
>>>> 
>>>> I think we should avoid that, if possible.
>>>> 
>>>> Other things look ok to me.
>>> 
>>> Unfortunately I can't think of a possible solution with the current
>>> implementation to keep it 16.
>>> Of course we can reduce MARS_WORKLOAD_PER_BLOCK, but that might be more wasteful
>>> than beneficial.
>>> 
>> 
>>>  /* 128 byte workload queue header structure */
>>>  struct mars_workload_queue_header {
>>> +	uint32_t lock;
>>> +	uint32_t count;
>>>  	uint64_t queue_ea;
>>>  	uint64_t context_ea;
>>> -	uint32_t count;
>>>  	uint8_t flag;
>>>  } __attribute__((aligned(MARS_WORKLOAD_QUEUE_HEADER_ALIGN)));
>> 
>> It seems like you would only need a single bit to implement a lock.
>> I guess the EAs will always be aligned, so there will be unused bits
>> that could maybe use for the lock.
> 
> I looked closer at what the problem was and realized it was not this,
> but this:
> 
>>  /* 128 byte workload queue block structure */
>>  struct mars_workload_queue_block {
>> +	uint32_t lock;
>> +	uint32_t pad;
>>  	struct mars_workload_queue_block_bits bits[MARS_WORKLOAD_PER_BLOCK];
>>  } __attribute__((aligned(MARS_WORKLOAD_QUEUE_BLOCK_ALIGN)));
> 
> 
> A solution would be to have a union:
> 
> MARS_WORKLOAD_PER_BLOCK = 16
> union mars_workload_queue_block {
> 	uint32_t lock;
> 	struct mars_workload_queue_block_bits bits[MARS_WORKLOAD_PER_BLOCK];
> }
> 
> The operations on mars_workload_queue_block.bits would know that bits[0]
> is not used.  It should be faster a non-power of 2 divide.
> 
> -Geoff
> 

Yes that is definitely a possible solution.
Thanks for the insight!

Regards,
Yuji