[RFC] powerpc/pseries: Interface to represent PAPR firmware attributes

Thu Jun 10 18:01:02 AEST 2021

On 10/06/21 5:33 am, Fabiano Rosas wrote:
> Pratik Sampat <psampat at linux.ibm.com> writes:
>
>>>> 3. version info                      - 1 byte
>>>> 4. A data array of size num attributes, which contains the following:
>>>>     a. attribute ID              - 8 bytes
>>>>     b. attribute value in number - 8 bytes
>>>>     c. attribute name in string  - 64 bytes
>>>>     d. attribute value in string - 64 bytes
>>> Is this new hypercall already present in the spec? These seem a bit
>>> underspecified to me.
>> Yes, it is present in the spec. I probably summarized a little more than needed
>> here and I could expand upon below.
>>
>> The input buffer recives the following data:
>>
>> 1. “flags”:
>> 	a. Bit 0: singleAttribute
>> 		If set to 1, only return the single attribute matching firstAttributeId.
>> 	b. Bits 1-63: Reserved
>> 2. “firstAttributeId”: The first attribute to retrieve
>> 3. “bufferAddress”: The logical real address of the start of the output buffer
>> 4. “bufferSize”: The size in bytes of the output buffer
>> 	
>>
>>   From the document, the format of the output buffer is as follows:
>>
>> Table 1 --> output buffer
>> ================================================================================
>> | Field Name           | Byte   | Length   |  Description
>> |                      | Offset | in Bytes |
>> ================================================================================
>> | NumberOf             |        |          | Number of Attributes in Buffer
>> | AttributesInBuffer   | 0x000  | 0x08     |
>> --------------------------------------------------------------------------------
>> | AttributeArrayOffset | 0x008  | 0x08     | Byte offset to start of Array
>> |                      |        |          | of Attributes
>> |                      |        |          |
>> --------------------------------------------------------------------------------
>> | OutputBufferData     |        |          | Version of the Header.
>> | HeaderVersion        | 0x010  | 0x01     | The header will be always
>> | AttributesInBuffer   |        |          | backward compatible, and changes
>> |                      |        |          | will not impact the Array of
>> |                      |        |          | attributes.
>> |                      |        |          | Current version = 0x01
> This is not clear to me. In the event of a header version change, is the
> total set of attributes guaranteed to remain the same? Or only the array
> layout? We might not need to expose the version information after all.

I believe, the way versioning currently works is that if any new attribute is
added/modified to the list, this will entail a new version.

Regardless, the older attributes and their ids will not change and will still
be backwards compatible.

If the versioning does change, this patch does introduce a version check and
will fail to populate the sysfs and, a tool like powerpc-utils will not read
incorrect/non-coherent information.

So I'm inclined also believe now that verisoning information may not be needed
to expose to userspace.

>> --------------------------------------------------------------------------------
>> | ArrayOfAttributes    |        |          | The array will contain
>> |                      |        |          | "NumberOfAttributesInBuffer"
>> |                      |        |          | array elements not to exceed
>> |                      |        |          | the size of the buffer.
>> |                      |        |          | Layout of the array is
>> |                      |        |          | detailed in Table 2.
>> --------------------------------------------------------------------------------
>>
>>
>> Table 2 --> Array of attributes
>> ================================================================================
>> | Field Name           | Byte   | Length   |  Description
>> |                      | Offset | in Bytes |
>> ================================================================================
>> | 1st AttributeId      | 0x000  | 0x08     | The ID of the Attribute
>> --------------------------------------------------------------------------------
>> | 1st AttributeValue   | 0x008  | 0x08     | The numerical value of
>> |                      |        |          | the attribute
>> --------------------------------------------------------------------------------
>> | 1st AttributeString  | 0x010  | 0x40     | The ASCII string
>> | Description          |        |          | description of the
>> |                      |        |          | attribute, up to 63
>> |                      |        |          | characters plus a NULL
>> |                      |        |          | terminator.
> There is a slight disconnect in that this is called "description" by the
> spec, which makes me think they could eventually have something more
> verbose than what you'd expect from "name".
>
> So they could give us either: "Frequency" or "The Frequency in GigaHertz".

Yes, the description can be more verbose, like I can see attributes with the
description as "Minimum Frequency (MHz)". That's probably why parsing based on
IDs is a better approach.

>
>> --------------------------------------------------------------------------------
>> | 1st AttributeValue   | 0x050  | 0x40     | The ASCII string
>> | StringDescription    |        |          | description of the
>> |                      |        |          | attribute value, up to 63
>> |                      |        |          | characters plus a NULL
>> |                      |        |          | terminator. If this
>> |                      |        |          | contains only a NULL
>> |                      |        |          | terminator, then there is
>> |                      |        |          | no ASCII string
>> |                      |        |          | associated with AttributeValue.
>> --------------------------------------------------------------------------------
>> | ....                 |        |          |
>>
>>
>>>> The new H_CALL exports information in direct string value format, hence
>>>> a new interface has been introduced in /sys/firmware/papr to export
>>> Hm.. Maybe this should be something less generic than "papr"?
>> The interface naming was inspired from /sys/firmware/opal's naming convention.
>> We believed the name PAPR could serve as more generic name to be used by both
>> Linux running on PHYP and linux on KVM.
> Right, I agree with that rationale, but /opal has identifiable elements
> in it whereas /papr would have the generic "attr_X_name", which does not
> give much hint about what they are.
>
> We also expect people to iterate the "attr_X_*" files, so if we decide
> to add something else under /papr in the future, that would potentially
> cause issues with any tool that just lists the content of the directory.
>
> So maybe we should be proactive and put the hcall stuff inside a
> subdirectory already. /papr/energy_scale_attrs comes to mind, but I
> don't have a strong opinion on the particular name.

Encapsulating it within another directory like energy_scale_attrs does make
sense and keeps the PAPR directory open to more such information going forward.

>> If you have something more concrete in mind, please let me know. I'm open to
>> suggestions.
>>
>>>> this information to userspace in an extensible pass-through format.
>>>> The H_CALL returns the name, numeric value and string value. As string
>>>> values are in human readable format, therefore if the string value
>>>> exists then that is given precedence over the numeric value.
>>> So the hypervisor could simply not send the string representation? How
>>> will the userspace tell the difference since they are reading everything
>>> from a file?
>>>
>>> Overall I'd say we should give the data in a more structured way and let
>>> the user-facing tool do the formatting and presentation.
>> That's a valid concern, the design for this was inspired from hwmon's interface
>> to housing the sensor information.
>>
>> One alternative to add more structure to this format could be to introduce:
>> attr_X_name, attr_X_num_val, attr_X_str_val
>>
>> However, in some cases like min/max frequency the string value is empty. In
>> that case the file attr_X_str_val will also be empty.
>> Is that an acceptable format of having empty files that in some cases will
>> never be populated?
> I'm thinking yes, but I'm not sure. Let's see if someone else has a say
> in this.

Sure, if we can have empty sysfs files, then this presents a coherent interface.

@mpe, can you weigh in here, can we have an interface where we have the following structure:
/sys/firmware/papr/energy_scale_attrs/
    |-- <id>/
      |-- desc
      |-- value
      |-- value_desc
where value_desc can be empty in some case?
If so, can we leave them empty or do we need to have them populated with a
string "NULL"/"NONE"?

>
>> We also went ahead to confirm with the SPEC team that if a string value exists
>> in their buffer, that must be given precedence.
> Huh.. That must be a recommendation only. The hypervisor has no control
> over how people present the information in userspace.
>
>> Another alternative format could to keep attr_X_name, attr_X_val intact but
>> change what X means. Currently X is just an iteratively increasing number. But
>> X can also serve as an ID which we get from H_CALL output buffer.
> This seems like a good idea. It makes it easier to correlate the
> attribute with what is in PAPR.
>
>> In this case, we should also include some versioning so that the tool now also
>> has cognizance of contents of each file.
>>
>>>> The format of exposing the sysfs information is as follows:
>>>> /sys/firmware/papr/
>>>>     |-- attr_0_name
>>>>     |-- attr_0_val
>>>>     |-- attr_1_name
>>>>     |-- attr_1_val
>>>> ...
>>> How do we keep a stable interface with userspace? Say the hypervisor
>>> decides to add or remove attributes, change their order, string
>>> representation, etc? It will inform us via the version field, but that
>>> is lost when we output this to sysfs.
>>>
>>> I get that if the userspace just iterate over the contents of the
>>> directory then nothing breaks, but there is not much else it could do it
>>> seems.
>> Fair point, having the version exposed to the sysfs does seem crucial.
>>
>> Currently in ppc-utils we iterate over all the information, however as you
>> rightly pointed out there may be other tools needing just specific information.
>> The alternative I suggested a few sentences above to include ID based attribute
>> naming and versioning maybe a more elegant way of solving this problem.
>>
>> What are your thoughts on a design like this?
>>
> Based on all the new information you provided, I'd say present all the
> data and group it under the ID:
>
> /sys/firmware/papr/energy_scale_attrs/
>     |-- <id>/
>       |-- desc
>       |-- value
>       |-- value_desc
>     |-- <id>/
>       |-- desc
>       |-- value
>       |-- value_desc
>
> Is that workable?

If we can confirm if value descriptions can be empty, then I too think this is
a good interface to introduce for energy attributes.

Thanks for your feedback.
Pratik

>>>> The energy information that is exported is useful for userspace tools
>>>> such as powerpc-utils. Currently these tools infer the
>>>> "power_mode_data" value in the lparcfg, which in turn is obtained from
>>>> the to be deprecated H_GET_EM_PARMS H_CALL.
>>>> On future platforms, such userspace utilities will have to look at the
>>>> data returned from the new H_CALL being populated in this new sysfs
>>>> interface and report this information directly without the need of
>>>> interpretation.
>>>>
>>>> Signed-off-by: Pratik R. Sampat <psampat at linux.ibm.com>
>> Thanks
>> Pratik