[Cbe-oss-dev] [PATCH 8/10] MARS: workload queue mutex protection
Yuji Mano
Yuji.Mano at am.sony.com
Fri Aug 29 11:30:56 EST 2008
Geoff Levand wrote:
> Geoff Levand wrote:
>> Yuji Mano wrote:
>>> Kazunori Asayama wrote:
>>>> Yuji Mano wrote:
>>>>> This adds mutex protection when accessing the shared workload queue.
>>>>>
>>>>> Prior to this patch the kernel scheduler accessed the whole queue block
>>>>> atomically. Now the kernel scheduler atomically locks the queue block when it
>>>>> needs to access it.
>>>>>
>>>>> Host-side access of the queue block now also uses the mutex locks to keep the
>>>>> workload queue functions thread-safe.
>>>>>
>>>>> This also replaces the previously used MARS context mutex. The workload queue
>>>>> handles its own mutex without the workload model APIs needing to manage the
>>>>> MARS context mutex.
>>>>>
>>>>> Signed-off-by: Yuji Mano <yuji.mano at am.sony.com>
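
(For the archives: the host-side access pattern this patch introduces
looks roughly like the following. This is a simplified sketch with
assumed helper names and a GCC-style atomic builtin, not the literal
patch code; the MPU-side equivalent would presumably use the atomic
DMA commands, getllar/putllc.)

    /* spin until the lock word atomically changes from 0 to 1 */
    static void queue_block_lock(struct mars_workload_queue_block *block)
    {
            while (!__sync_bool_compare_and_swap(&block->lock, 0, 1))
                    ;
    }

    static void queue_block_unlock(struct mars_workload_queue_block *block)
    {
            __sync_synchronize();   /* flush prior stores before release */
            block->lock = 0;
    }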
>>>>
>>>> (snip)
>>>>
>>>>> --- a/include/common/mars/mars_workload_types.h
>>>>> +++ b/include/common/mars/mars_workload_types.h
>>>>> @@ -64,9 +64,9 @@ extern "C" {
>>>>> #define MARS_WORKLOAD_SIGNAL_ON 0x1 /* signal set on */
>>>>> #define MARS_WORKLOAD_SIGNAL_OFF 0x0 /* signal set off */
>>>>>
>>>>> -#define MARS_WORKLOAD_MAX 1024 /* wl max */
>>>>> -#define MARS_WORKLOAD_PER_BLOCK 16 /* wl per block */
>>>>> -#define MARS_WORKLOAD_NUM_BLOCKS 64 /* wl max / per block */
>>>>> +#define MARS_WORKLOAD_PER_BLOCK 15 /* wl/block */
>>>>
>>>> This change of MARS_WORKLOAD_PER_BLOCK from 16 to 15 will cause a *real*
>>>> divide operation on the MPU when calculating the block number and the
>>>> index within the block from an ID, i.e.:
>>>>
>>>> int block = id / MARS_WORKLOAD_PER_BLOCK;
>>>> int index = id % MARS_WORKLOAD_PER_BLOCK;
>>>>
>>>> I think we should avoid that, if possible.
>>>>
>>>> Other things look ok to me.
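
(To spell out the cost concern: with a power-of-two divisor the
compiler reduces the divide and modulo to a shift and a mask, which
the MPU handles cheaply; with 15 it has to emit a real divide or a
multiply-by-reciprocal sequence. A quick illustration:)

    int block = id >> 4;    /* id / 16: a single shift */
    int index = id & 0xf;   /* id % 16: a single mask  */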
>>>
>>> Unfortunately I can't think of a solution with the current
>>> implementation that keeps it at 16.
>>> Of course we could reduce MARS_WORKLOAD_PER_BLOCK further to the next
>>> power of 2, but that might be more wasteful than beneficial.
>>>
>>
>>> /* 128 byte workload queue header structure */
>>> struct mars_workload_queue_header {
>>> +	uint32_t lock;
>>> +	uint32_t count;
>>> 	uint64_t queue_ea;
>>> 	uint64_t context_ea;
>>> -	uint32_t count;
>>> 	uint8_t flag;
>>> } __attribute__((aligned(MARS_WORKLOAD_QUEUE_HEADER_ALIGN)));
>>
>> It seems like you would only need a single bit to implement a lock.
>> I guess the EAs will always be aligned, so there will be unused low
>> bits that could maybe be used for the lock.
>
> I looked closer at what the problem was and realized it was not the
> header, but the block: with 16 eight-byte entries the block already
> fills its 128 bytes, so there is no room left for the lock:
>
>> /* 128 byte workload queue block structure */
>> struct mars_workload_queue_block {
>> +	uint32_t lock;
>> +	uint32_t pad;
>> 	struct mars_workload_queue_block_bits bits[MARS_WORKLOAD_PER_BLOCK];
>> } __attribute__((aligned(MARS_WORKLOAD_QUEUE_BLOCK_ALIGN)));
>
>
> A solution would be to have a union:
>
> #define MARS_WORKLOAD_PER_BLOCK 16
>
> union mars_workload_queue_block {
> 	uint32_t lock;
> 	struct mars_workload_queue_block_bits bits[MARS_WORKLOAD_PER_BLOCK];
> };
>
> The operations on mars_workload_queue_block.bits would know that bits[0]
> is not used. That should be faster than a non-power-of-2 divide.
>
> -Geoff
>
Yes, that is definitely a possible solution.
Thanks for the insight!
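
For the list: with that union, the lock word overlays bits[0], so
index 0 of every block is simply never handed out and the ID math
stays shift/mask. An untested sketch (the helper names are mine, not
from the tree):

    static inline int workload_block(int id)
    {
            return id >> 4;     /* id / MARS_WORKLOAD_PER_BLOCK (16) */
    }

    static inline int workload_index(int id)
    {
            return id & 0xf;    /* id % 16; valid indices are 1..15 */
    }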
Regards,
Yuji