[PATCH v8 3/9] misc: smpro-errmon: Add Ampere's SMpro error monitor driver

Quan Nguyen quan at os.amperecomputing.com
Thu Jun 2 19:36:22 AEST 2022


On 01/06/2022 16:33, Greg Kroah-Hartman wrote:
> On Wed, Jun 01, 2022 at 03:21:47PM +0700, Quan Nguyen wrote:
>>>> +	if (err_type & BIT(2)) {
>>>> +		/* Error with data type */
>>>> +		ret = regmap_read(errmon->regmap, err_info->err_data_low, &data_lo);
>>>> +		if (ret)
>>>> +			goto done;
>>>> +
>>>> +		ret = regmap_read(errmon->regmap, err_info->err_data_high, &data_hi);
>>>> +		if (ret)
>>>> +			goto done;
>>>> +
>>>> +		count = sysfs_emit(buf, "%01x%02x%01x%02x%04x%04x%04x\n",
>>>> +				   4, (ret_hi & 0xf000) >> 12, (ret_hi & 0x0800) >> 11,
>>>> +				   ret_hi & 0xff, ret_lo, data_hi, data_lo);
>>>> +		/* clear the read errors */
>>>> +		ret = regmap_write(errmon->regmap, err_info->err_type, BIT(2));
>>>> +
>>>> +	} else if (err_type & BIT(1)) {
>>>> +		/* Error type */
>>>> +		count = sysfs_emit(buf, "%01x%02x%01x%02x%04x%04x%04x\n",
>>>> +				   2, (ret_hi & 0xf000) >> 12, (ret_hi & 0x0800) >> 11,
>>>> +				   ret_hi & 0xff, ret_lo, data_hi, data_lo);
>>>> +		/* clear the read errors */
>>>> +		ret = regmap_write(errmon->regmap, err_info->err_type, BIT(1));
>>>> +
>>>> +	} else if (err_type & BIT(0)) {
>>>> +		/* Warning type */
>>>> +		count = sysfs_emit(buf, "%01x%02x%01x%02x%04x%04x%04x\n",
>>>> +				   1, (ret_hi & 0xf000) >> 12, (ret_hi & 0x0800) >> 11,
>>>> +				   ret_hi & 0xff, ret_lo, data_hi, data_lo);
>>
>> Hi Greg,
>>
>> Since the internal representation of the internal error is split into high
>> low chunks of the info and data values which need to be communicated
>> atomically, I'm treating them as "one value" here.
> 
> That is a huge "one value", that's not what this really is, it needs to
> be parsed by userspace, right?
> 
Thanks Greg for the review,

User space needs the whole of this "one value" to know exactly what the error is.

In our latest version, we removed the if...else chain and simplified the
code as below:
/*
  * The internal representation of the internal error is split into high
  * and low chunks of the info and data values. Rather than temporarily
  * dumping these into an array and printing that, skip the intermediate
  * step and print them using a concatenation encoding.
  */
count = sysfs_emit(buf, "%04x%04x%04x%04x\n",
		   info_h, info_l, data_h, data_l);

/* clear the read error */
ret = regmap_write(errmon->regmap, err_info->type, err_type);
return ret ? ret : count;
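
For illustration only, here is a minimal user-space sketch of how the
concatenated "%04x%04x%04x%04x\n" line could be split back into its four
16-bit chunks. The struct and function names (smpro_err_record,
smpro_parse_record) are hypothetical and not part of the driver; the
field order is assumed to match the sysfs_emit() call above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space view of one error record (illustrative names). */
struct smpro_err_record {
	uint16_t info_hi;
	uint16_t info_lo;
	uint16_t data_hi;
	uint16_t data_lo;
};

/*
 * Parse one line in the "%04x%04x%04x%04x\n" encoding.
 * Returns 0 on success, -1 if the line is not exactly 16 hex digits
 * (optionally followed by a newline).
 */
static int smpro_parse_record(const char *line, struct smpro_err_record *rec)
{
	unsigned int f[4];
	size_t len = strcspn(line, "\n");

	/* Reject anything that is not exactly 16 hex digits. */
	if (len != 16 || strspn(line, "0123456789abcdefABCDEF") != 16)
		return -1;

	/* Each %4x consumes exactly four of the validated hex digits. */
	if (sscanf(line, "%4x%4x%4x%4x", &f[0], &f[1], &f[2], &f[3]) != 4)
		return -1;

	rec->info_hi = (uint16_t)f[0];
	rec->info_lo = (uint16_t)f[1];
	rec->data_hi = (uint16_t)f[2];
	rec->data_lo = (uint16_t)f[3];
	return 0;
}
```

A caller would read one line from the sysfs file and pass it to
smpro_parse_record(); decoding the meaning of the individual fields is
still left to user space.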

> And why does this have to be atomic?  What happens if the values change
> right after you read them?  What is userspace going to do with them?
> 
Because the error is bigger than a single register can hold, it is split
into small chunks that are reported via multiple separate registers.

Firmware stores each error in a queue. Because the error's chunks are
stored in separate registers, all of these registers need to be read out
before the error is cleared so that the next error in the queue can be
reported. That is why we say those chunks must be read out atomically.
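
To make the ordering concrete, here is a small stand-alone simulation,
not driver code. Everything in it (mock_fw, mock_read, mock_clear,
read_one_error, the register indices) is made up for illustration, and it
assumes the behaviour described above: the registers expose the head of
the firmware queue, and clearing the error advances to the next entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indices for the four chunks of one error. */
enum { REG_INFO_LO, REG_INFO_HI, REG_DATA_LO, REG_DATA_HI, REG_COUNT };

/* Mock firmware: a small queue of errors, each made of REG_COUNT chunks. */
struct mock_fw {
	uint16_t queue[4][REG_COUNT];
	size_t head;	/* index of the error currently exposed */
	size_t tail;	/* one past the last queued error */
};

/* The registers always reflect the error at the head of the queue. */
static uint16_t mock_read(const struct mock_fw *fw, int reg)
{
	return fw->head < fw->tail ? fw->queue[fw->head][reg] : 0;
}

/* Clearing the error lets the firmware expose the next one. */
static void mock_clear(struct mock_fw *fw)
{
	if (fw->head < fw->tail)
		fw->head++;
}

/*
 * Read every chunk of the current error, then clear it.
 * Returns 1 if an error was read, 0 if the queue is empty.
 * Clearing before all chunks are read would mix chunks from two errors.
 */
static int read_one_error(struct mock_fw *fw, uint16_t out[REG_COUNT])
{
	if (fw->head == fw->tail)
		return 0;

	for (int reg = 0; reg < REG_COUNT; reg++)
		out[reg] = mock_read(fw, reg);

	mock_clear(fw);
	return 1;
}
```

In the real driver the reads and the clear would go through
regmap_read() and regmap_write(); the point of the sketch is only that
the clear comes after all chunks have been read.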

User space will need to parse this information itself.

>> I could dump them in a
>> temporary array and print that, but it seems like additional complexity for
>> the same result. Can we consider this concatenated encoding as "an array of
>> the same type" for the purposes of this driver?"
> 
> That's really not a good idea as sysfs files should never need to be
> "parsed" like this.
> 
> Again, what are you trying to do here, and why does it have to be
> atomic?
> 
> thanks,
> 
> greg k-h


