[Cbe-oss-dev] [PATCH 8/10] MARS: workload queue mutex protection
Geoff Levand
geoffrey.levand at am.sony.com
Fri Aug 29 11:31:49 EST 2008
Geoff Levand wrote:
> Yuji Mano wrote:
>> Kazunori Asayama wrote:
>>> Yuji Mano wrote:
>>>> This adds mutex protection when accessing the shared workload queue.
>>>>
>>>> Prior to this patch the kernel scheduler accessed the whole queue block
>>>> atomically. Now the kernel scheduler atomically locks the queue block when it
>>>> needs to access it.
>>>>
>>>> Host-side access of the queue block now also uses the mutex locks to keep the
>>>> workload queue functions thread-safe.
>>>>
>>>> This also replaces the previously used MARS context mutex. The workload queue
>>>> handles its own mutex without the workload model APIs needing to manage the
>>>> MARS context mutex.
>>>>
>>>> Signed-off-by: Yuji Mano <yuji.mano at am.sony.com>
>>>
>>> (snip)
>>>
>>>> --- a/include/common/mars/mars_workload_types.h
>>>> +++ b/include/common/mars/mars_workload_types.h
>>>> @@ -64,9 +64,9 @@ extern "C" {
>>>> #define MARS_WORKLOAD_SIGNAL_ON 0x1 /* signal set on */
>>>> #define MARS_WORKLOAD_SIGNAL_OFF 0x0 /* signal set off */
>>>>
>>>> -#define MARS_WORKLOAD_MAX 1024 /* wl max */
>>>> -#define MARS_WORKLOAD_PER_BLOCK 16 /* wl per block */
>>>> -#define MARS_WORKLOAD_NUM_BLOCKS 64 /* wl max / per block */
>>>> +#define MARS_WORKLOAD_PER_BLOCK 15 /* wl/block */
>>>
>>> This change of MARS_WORKLOAD_PER_BLOCK from 16 to 15 will cause *real*
>>> divide operation on MPU when calculating block number and index in the
>>> block by ID, i.e.:
>>>
>>> int block = id / MARS_WORKLOAD_PER_BLOCK;
>>> int index = id % MARS_WORKLOAD_PER_BLOCK;
>>>
>>> I think we should avoid that, if possible.
>>>
>>> Other things look ok to me.
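For reference, roughly what that costs (illustration only, not actual
compiler output; id is the workload ID as in the quoted lines above):

        /* MARS_WORKLOAD_PER_BLOCK == 16: the divide and modulo reduce
         * to simple bit operations */
        int block = id >> 4;            /* same as id / 16 */
        int index = id & 0xf;           /* same as id % 16 */

        /* MARS_WORKLOAD_PER_BLOCK == 15: no such reduction, so the MPU
         * ends up doing the divide in software */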
>>
>> Unfortunately I can't think of a way to keep it at 16 with the
>> current implementation.
>> Of course we can reduce MARS_WORKLOAD_PER_BLOCK, but that might be more wasteful
>> than beneficial.
>>
>
>> /* 128 byte workload queue header structure */
>> struct mars_workload_queue_header {
>> + uint32_t lock;
>> + uint32_t count;
>> uint64_t queue_ea;
>> uint64_t context_ea;
>> - uint32_t count;
>> uint8_t flag;
>> } __attribute__((aligned(MARS_WORKLOAD_QUEUE_HEADER_ALIGN)));
>
> It seems like you would only need a single bit to implement a lock.
> I guess the EAs will always be aligned, so there will be unused bits
> that could maybe be used for the lock.
I looked closer at what the problem was and realized it was not the
queue header above, but this:
> /* 128 byte workload queue block structure */
> struct mars_workload_queue_block {
> + uint32_t lock;
> + uint32_t pad;
> struct mars_workload_queue_block_bits bits[MARS_WORKLOAD_PER_BLOCK];
> } __attribute__((aligned(MARS_WORKLOAD_QUEUE_BLOCK_ALIGN)));
A solution would be to have a union:
#define MARS_WORKLOAD_PER_BLOCK 16

union mars_workload_queue_block {
        uint32_t lock;
        struct mars_workload_queue_block_bits bits[MARS_WORKLOAD_PER_BLOCK];
} __attribute__((aligned(MARS_WORKLOAD_QUEUE_BLOCK_ALIGN)));
The operations on mars_workload_queue_block.bits would know that bits[0]
is not used. That should still be faster than a non-power-of-2 divide.
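Roughly, on top of that union (untested, and the helper names here are
just for illustration):

        /* ID -> block/index: with MARS_WORKLOAD_PER_BLOCK back at a
         * power of two, the compiler lowers these to a shift and a
         * mask, so there is no real divide on the MPU. */
        static inline int workload_id_to_block(int id)
        {
                return id / MARS_WORKLOAD_PER_BLOCK;
        }

        static inline int workload_id_to_index(int id)
        {
                return id % MARS_WORKLOAD_PER_BLOCK;
        }

The allocation and lookup code would just start at index 1 in each
block, trading one workload slot per block for keeping the power-of-2
math.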
-Geoff