[PATCH 25/45] staging/octeon: Use get/put_online_cpus_atomic() to prevent CPU offline

David Daney ddaney.cavm at gmail.com
Tue Jun 25 04:17:06 EST 2013


On 06/23/2013 11:55 AM, Srivatsa S. Bhat wrote:
> On 06/23/2013 11:47 PM, Greg Kroah-Hartman wrote:
>> On Sun, Jun 23, 2013 at 07:13:33PM +0530, Srivatsa S. Bhat wrote:
>>> Once stop_machine() is gone from the CPU offline path, we won't be able
>>> to depend on disabling preemption to prevent CPUs from going offline
>>> from under us.
>>>
>>> Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going
>>> offline, while invoking from atomic context.
>>>
>>> Cc: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
>>> Cc: devel at driverdev.osuosl.org
>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat at linux.vnet.ibm.com>
>>> ---
>>>
>>>   drivers/staging/octeon/ethernet-rx.c |    3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/staging/octeon/ethernet-rx.c b/drivers/staging/octeon/ethernet-rx.c
>>> index 34afc16..8588b4d 100644
>>> --- a/drivers/staging/octeon/ethernet-rx.c
>>> +++ b/drivers/staging/octeon/ethernet-rx.c
>>> @@ -36,6 +36,7 @@
>>>   #include <linux/prefetch.h>
>>>   #include <linux/ratelimit.h>
>>>   #include <linux/smp.h>
>>> +#include <linux/cpu.h>
>>>   #include <linux/interrupt.h>
>>>   #include <net/dst.h>
>>>   #ifdef CONFIG_XFRM
>>> @@ -97,6 +98,7 @@ static void cvm_oct_enable_one_cpu(void)
>>>   		return;
>>>
>>>   	/* ... if a CPU is available, Turn on NAPI polling for that CPU.  */
>>> +	get_online_cpus_atomic();
>>>   	for_each_online_cpu(cpu) {
>>>   		if (!cpu_test_and_set(cpu, core_state.cpu_state)) {
>>>   			v = smp_call_function_single(cpu, cvm_oct_enable_napi,
>>> @@ -106,6 +108,7 @@ static void cvm_oct_enable_one_cpu(void)
>>>   			break;
>>>   		}
>>>   	}
>>> +	put_online_cpus_atomic();
>>
>> Does this driver really need to be doing this in the first place?  If
>> so, why?  The majority of network drivers don't, why is this one
>> "special"?


It depends on your definition of "need".

The current driver receives packets from *all* network ports into a 
single queue (in OCTEON-speak, this queue is called a POW group).  Under 
high packet rates, the CPU time required to process the packets may 
exceed the capabilities of a single CPU.

In order to increase throughput beyond the single-CPU rate, we bring 
more than one CPU into play for NAPI receive.  The code being patched 
here is part of the logic that controls which CPUs are used for NAPI 
receive.
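
For context, the function that smp_call_function_single() dispatches in 
the hunk above just schedules the target CPU's per-CPU NAPI instance, so 
that CPU starts pulling packets from the shared POW group.  Roughly 
(simplified from the driver; the per-CPU cvm_oct_napi array and its 
wrapper struct are elided down to what matters here):

	/* Runs on the target CPU via IPI.  Scheduling the per-CPU
	 * NAPI instance makes this CPU start polling the shared
	 * POW group. */
	static void cvm_oct_enable_napi(void *unused)
	{
		int cpu = smp_processor_id();

		napi_schedule(&cvm_oct_napi[cpu].napi);
	}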

Just for the record:  Yes, I know that doing this may lead to packet 
reordering when forwarding.

A further question that wasn't asked is: will the code work at all if a 
CPU is taken offline, even when the race that this patch eliminates is 
avoided?

I doubt it.
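
For offline to actually work, the driver would additionally need CPU 
hotplug handling along these lines.  This is only a rough, untested 
sketch against the then-current cpu notifier API; cvm_oct_enable_one_cpu() 
and core_state.cpu_state are the driver's own symbols, and the exact 
bookkeeping accessor is an assumption:

	#include <linux/cpu.h>
	#include <linux/notifier.h>

	/* Sketch only: if the departing CPU holds a NAPI receive
	 * slot, release the slot and recruit a replacement CPU.
	 * Quiescing the departing CPU's NAPI instance is omitted
	 * here but would also be needed. */
	static int cvm_oct_cpu_callback(struct notifier_block *nfb,
					unsigned long action, void *hcpu)
	{
		int cpu = (unsigned long)hcpu;

		switch (action & ~CPU_TASKS_FROZEN) {
		case CPU_DOWN_PREPARE:
			if (cpumask_test_and_clear_cpu(cpu,
						       &core_state.cpu_state))
				cvm_oct_enable_one_cpu();
			break;
		}
		return NOTIFY_OK;
	}

	static struct notifier_block cvm_oct_cpu_notifier = {
		.notifier_call = cvm_oct_cpu_callback,
	};

	/* ...and at module init:
	 *	register_cpu_notifier(&cvm_oct_cpu_notifier);
	 */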

As far as the patch goes:

Acked-by: David Daney <david.daney at cavium.com>

David Daney

>>
>
> Honestly, I don't know. Let's CC the author of that code (David Daney).
> I wonder why get_maintainer.pl didn't generate his name for this file,
> even though the entire file is almost made up of his commits alone!
>
> Regards,
> Srivatsa S. Bhat
>


