[PATCH v3] can: grcan: Add device driver for GRCAN and GRHCAN cores
Wolfgang Grandegger
wg at grandegger.com
Thu Nov 1 07:21:04 EST 2012
On 10/31/2012 05:33 PM, Andreas Larsson wrote:
> On 2012-10-31 13:51, Wolfgang Grandegger wrote:
>> On 10/30/2012 05:24 PM, Andreas Larsson wrote:
>>> On 10/30/2012 11:07 AM, Wolfgang Grandegger wrote:
>>>>> + /* AHB bus error interrupts (not CAN bus errors) - shut down the
>>>>> + * device.
>>>>> + */
>>>>> + if (sources & (GRCAN_IRQ_TXAHBERR | GRCAN_IRQ_RXAHBERR)) {
>>>>> + if (sources & GRCAN_IRQ_TXAHBERR) {
>>>>> + netdev_err(dev, "got AHB bus error on tx\n");
>>>>> + stats->tx_errors++;
>>>>> + } else {
>>>>> + netdev_err(dev, "got AHB bus error on rx\n");
>>>>> + stats->rx_errors++;
>>>>> + }
>>>>> + netdev_err(dev, "halting device\n");
>>>>> +
>>>>> + /* Prevent anything to be enabled again and halt device */
>>>>> + SPIN_LOCK(&priv->lock);
>>>>> + priv->closing = true;
>>>>> + netif_stop_queue(dev);
>>>>> + grcan_stop(dev);
>>>>> + SPIN_UNLOCK(&priv->lock);
>>>>
>>>> Hm, does that really happen? How can the user/app realized the problem
>>>> and recover?
>>>
>>> My understanding of it is that if you get amba bus errors something is
>>> seriously wrong and nothing can be done at driver level to recover.
>>> Shutting down the device is to prevent the driver from spamming console
>>> messages. I used to have a sysfs indication of such errors. Now dmesg is
>>> the way to find out about the problem. The user can always bring the
>>> interface down and up again and try again after such an error.
>>
>> Well, sounds like a fatal error. Did you ever saw it? If that happens
>> frequently, or even once, the device/system seems not really usable.
>
> Fatal and yes if this is ever seen something is very bad with the
> system. I have never seen it. However the hardware reports if it would
> happen, so why not try to take care of it in some manner. If this is
> seen the can device is probably useless but it is possible that the rest
> of the system might be usable. I don't really know in what circumstances
> this would trigger.
OK, maybe a more serious message would just be fine.
>>>> Furthermore, why is a spin_clock enough here? THe interrupt may run a
>>>> thread.
>>>
>>> It should be enough because the function is only called
>>> directly from the interrupt handler, right? Or did I miss something?
>>
>> The interrupt handler may run as thread, e.g. if the -rt patch has been
>> applied. Therefore it's safer to use the irqsave/restore variants of the
>> spin_lock functions. At least that's my understanding. See also
>> "Documentation/spinlock.txt".
>
> I'll look into this. Seems strange if the same cpu can be interrupted in
> an interrupt handler to run the very same interrupt handler. Sounds like
> a horrible situation for any driver that does not lock down most of its
> interrupt handler. Or maybe there is some situation that I don't foresee.
As I said, if you use CONFIG_PREEMPT_RT all interrupt isr will be
threaded (running by threads).
>>>>> + priv->can.ctrlmode_supported =
>>>>> + CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_ONE_SHOT;
>>>>
>>>> What about triple-sampling?
>>>
>>> That is not supported by the hardware.
>
> Well, now it will be in future versions of the controller, so support is
> added to the driver.
>
>
>>>> I curious how the device behaves on bus errors and state changes. Could
>>>> you please show the output of "candump -e any,0:0,#FFFFFFFF" while
>>>> sending a CAN message with no other node on the bus (not connected) and
>>>> with going bus off (by short-circuiting CAN high and low).
>>>
>>>
>>> Here is the output (with long sequences of similar error frames
>>> where only one counter is increasing cut out) from the the upcoming v4.
>>> let me know if you see any problems with this.
>>
>> How did your trigger that sequence? Short-circuit or not cable
>> connected? The latter, I assume, as bus-off is not reached.
>
> Triggered by short-circuiting, just as asked.
I ask for both tests but I missed the bus-off forther down.
>>>
>>> can0 20000006 [8] 00 00 00 00 00 00 10 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{}
>>> error-counter-tx-rx{{16}{0}}
>>
>> This is most probably an ack slot bus error similar to. You seem not to
>> handle bus errors and it's not a controller problem as the state did not
>> change.
>
> No, the hardware has no support for can-bus error reporting.
OK, but why is CAN_ERR_CRTL set?
>> http://lxr.linux.no/#linux+v3.6.4/drivers/net/can/flexcan.c#L353
>>
>>> can0 20000004 [8] 00 20 00 00 00 00 88 00 ERRORFRAME
>>> controller-problem{tx-error-passive}
>>> error-counter-tx-rx{{136}{0}}
>>
>> OK, a state change to error passive. The device seems not to report
>> changes to error warning.
>
> No, there is no interrupts or anything that alert about the error
> states. That is why I set the sate based on the error counters in the
> error handler.
Who does trigger the message above?
>>> can0 20000006 [8] 00 20 00 00 00 00 90 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{tx-error-passive}
>>> error-counter-tx-rx{{144}{0}}
>>
>> State changes should be reported only once.
>
> Do you mean that an error message should only be sent on state change?
Yes. On state change or another error, e.g. the lost-arbitration above.
>> But it did not change. This
>> is then a bus error (CAN_ERR_PROT | CAN_ERR_BUSERROR). See also above.
>>
>>> [...]
>>> can0 20000006 [8] 00 20 00 00 00 00 F0 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{tx-error-passive}
>>> error-counter-tx-rx{{240}{0}}
>>> can0 20000006 [8] 00 20 00 00 00 00 F8 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{tx-error-passive}
>>> error-counter-tx-rx{{248}{0}}
>>> can0 20000042 [8] 00 00 00 00 00 00 80 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> bus-off
>>> error-counter-tx-rx{{128}{0}}
>>> can0 20000006 [8] 00 00 00 00 00 00 18 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{}
Obviously the device restarts automatically after bus-off. Can this be
switched off. The normal procedure is to call "ip ... type can restart"
or to set "restart-ms" for automatic restart after the specified delay.
>> Also a sequence to bus-off including the recovery would be nice.
The sequence with no cable connected would be nice as well.
...
> Thanks for the input! I'll send in v4 so you can see what the code that
> generated the above.
Please fix the error handling first.
Thanks,
Wolfgang.
More information about the devicetree-discuss
mailing list