[PATCH] powerpc/papr_scm: Fix warning triggered by perf_stats_show()

Vaibhav Jain vaibhav at linux.ibm.com
Sat Sep 12 18:27:41 AEST 2020


Ira Weiny <ira.weiny at intel.com> writes:

> On Sat, Sep 12, 2020 at 01:36:46AM +0530, Vaibhav Jain wrote:
>> Thanks for reviewing this patch Ira,
>> 
>> 
>> Ira Weiny <ira.weiny at intel.com> writes:
>> 
>> > On Thu, Sep 10, 2020 at 02:52:12PM +0530, Vaibhav Jain wrote:
>> >> A warning is reported by the kernel in case perf_stats_show() returns
>> >> an error code. The warning is of the form below:
>> >> 
>> >>  papr_scm ibm,persistent-memory:ibm,pmemory at 44100001:
>> >>  	  Failed to query performance stats, Err:-10
>> >>  dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
>> >>  fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count
>> >> 
>> >> On investigation it looks like that the compiler is silently truncating the
>> >> return value of drc_pmem_query_stats() from 'long' to 'int', since the
>> >> variable used to store the return code 'rc' is an 'int'. This
>> >> truncated value is then returned back as a 'ssize_t' back from
>> >> perf_stats_show() to 'dev_attr_show()' which thinks of it as a large
>> >> unsigned number and triggers this warning..
>> >> 
>> >> To fix this we update the type of variable 'rc' from 'int' to
>> >> 'ssize_t' that prevents the compiler from truncating the return value
>> >> of drc_pmem_query_stats() and returning correct signed value back from
>> >> perf_stats_show().
>> >> 
>> >> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance
>> >>        stats from PHYP')
>> >> Signed-off-by: Vaibhav Jain <vaibhav at linux.ibm.com>
>> >> ---
>> >>  arch/powerpc/platforms/pseries/papr_scm.c | 3 ++-
>> >>  1 file changed, 2 insertions(+), 1 deletion(-)
>> >> 
>> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> index a88a707a608aa..9f00b61676ab9 100644
>> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> @@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> >>  static ssize_t perf_stats_show(struct device *dev,
>> >>  			       struct device_attribute *attr, char *buf)
>> >>  {
>> >> -	int index, rc;
>> >> +	int index;
>> >> +	ssize_t rc;
>> >
>> > I'm not sure this is really fixing everything here.
>> 
>> The issue is with the statement in perf_stats_show():
>> 
>> 'return rc ? rc : seq_buf_used(&s);'
>> 
>> The function seq_buf_used() returns an 'unsigned int' and with 'rc'
>> typed as 'int', forces a promotion of the expression to 'unsigned int'
>> which causes a loss of signedness of 'rc' and compiler silently
>> assigns this unsigned value to the function return typed as 'signed
>> long'.
>> 
>> Making 'rc', a 'signed long' forces a promotion of the expresion to
>> 'signed long' which preserves the signedness of 'rc' and will also be
>> compatible with the function return type.
>
> Ok, ok, I read this all wrong.
>
> FWIW I would also cast seq_buf_used() to ssize_t to show you know what you are
> doing there.
>
>> 
>> >
>> > drc_pmem_query_stats() can return negative errno's.  Why are those not checked
>> > somewhere in perf_stats_show()?
>> >
>> For the specific invocation 'drc_pmem_query_stats(p, stats, 0)' we only
>> expect return value 'rc <=0' with '0' indicating a successful fetch of
>> nvdimm performance stats from hypervisor. Hence there are no explicit
>> checks for negative error codes in the functions as all return values
>> !=0 indicate an error.
>> 
>> 
>> > It seems like all this fix is handling is a > 0 return value: 'ret[0]' from
>> > line 289 in papr_scm.c...  Or something?
>> No, in case the arg 'num_stats' is '0' and 'buff_stats != NULL' the
>> variable 'size' is assigned a non-zero value hence that specific branch
>> you mentioned  is never taken. Instead in case of success
>> drc_pmem_query_stats() return '0' and in case of an error a negative
>> error code is returned.
>> 
>> >
>> > Worse yet drc_pmem_query_stats() is returning ssize_t which is a signed value.
>> > Therefore, it should not be returning -errno.  I'm surprised the static
>> > checkers did not catch that.
>> Didnt quite get the assertion here. The function is marked to return
>> 'ssize_t' because we can return both +ve for success and -ve values to
>> indicate errors.
>
> Sorry I was reading this as size_t and meant to say unsigned...  I was looking
> at this too quickly.
>
>> 
>> >
>> > I believe I caught similar errors with a patch series before which did not pay
>> > attention to variable types.
>> >
>> > Please audit this code for these types of errors and ensure you are really
>> > doing the correct thing when using the sysfs interface.  I'm pretty sure bad
>> > things will eventually happen (if they are not already) if you return some
>> > really big number to the sysfs core from *_show().
>> I think this problem is different compared to what you had previously pointed
>> to. The values returned from drc_pmem_query_stats() can be stored in an
>> 'int' variable too, however it was the silent promotion of a signed type
>> to unsigned type was what caused this specific issue.
>
> Ok this makes more sense now.  Sorry about not looking more carefully.
>
> But I still think matching up the return of seq_buf_used() is worth it.  I
> don't particularly like depending on 'automatic' promotions which make
> reviewing code harder like this.
Agree, I have sent out a v2 addressing this.

>
> And sorry if my email seemed harsh I did not mean it to be.  I just like when
> types are more explicit because I feel like it can avoid issues like this.
> (Specifically my confusion over the types...)
>
> :-D
No worries :-)


Thanks,
-- 
Cheers
~ Vaibhav


More information about the Linuxppc-dev mailing list