PLDM design proposal
Deepak Kodihalli
dkodihal at linux.vnet.ibm.com
Fri Dec 14 03:30:18 AEDT 2018
Hi All,
I've put down some thoughts below on an initial PLDM design on OpenBMC.
The structure of the document is based on the OpenBMC design template.
Please review and let me know your feedback. Once we've had a discussion
here on the list, I can move this to Gerrit with some more details. I'd
say reading the MCTP proposal from Jeremy should be a precursor to
reading this.
# PLDM Stack on OpenBMC
Author: Deepak Kodihalli <dkodihal at linux.vnet.ibm.com> <dkodihal>
## Problem Description
On OpenBMC, in-band IPMI is currently the primary industry-standard
means of communication between the BMC and the Host firmware. We've
started hitting some inherent limitations of IPMI on OpenPOWER servers:
a limited number of sensors, and a lack of a generic control mechanism
(sensors are a generic monitoring mechanism) are the major ones. There
is a need to improve upon the communication protocol, but at the same
time inventing a custom protocol is undesirable.
This design aims to employ Platform Level Data Model (PLDM), a standard
application layer communication protocol defined by the DMTF. PLDM draws
inputs from IPMI, but it overcomes most of the latter's limitations.
PLDM is also designed to run on standard transport protocols, e.g.
MCTP (also designed by the DMTF). MCTP provides for a common transport
layer over several physical channels, by defining hardware bindings. The
solution of PLDM over MCTP also helps overcome some of the limitations
of the hardware channels that IPMI uses.
PLDM's purpose is to enable all sorts of "inside the box communication":
BMC - Host, BMC - BMC, BMC - Network Controller and BMC - Other (e.g.
sensor) devices. This design doesn't preclude enablement of
communication channels not involving the BMC and the host.
## Background and References
PLDM is designed to be an effective interface and data model that
provides efficient access to low-level platform inventory, monitoring,
control, event, and data/parameters transfer functions. For example,
temperature, voltage, or fan sensors can have a PLDM representation that
can be used to monitor and control the platform using a set of PLDM
messages. PLDM defines data representations and commands that abstract
the platform management hardware.
As stated earlier, PLDM is designed for different flavors of "inside the
box" communication. PLDM groups commands under broader functions, and
defines separate specifications for each of these functions (also called
PLDM "Types"). The currently defined Types (and corresponding specs)
are: PLDM base (with associated IDs and states specs), BIOS, FRU, Platform
monitoring and control, Firmware Update and SMBIOS. All these
specifications are available at:
https://www.dmtf.org/standards/pmci
Some of the reasons PLDM sounds promising (some of these are advantages
over IPMI):
- Common in-band communication protocol.
- Already existing PLDM Type specifications that cover the most common
communication requirements. Up to 64 PLDM Types can be defined (the last
one is OEM). At the moment, 6 are defined. Each PLDM type can house up
to 256 PLDM commands.
- PLDM sensors are 2 bytes in length.
- PLDM introduces the concept of effecters - a control mechanism. Both
sensors and effecters are associated with entities (similar to IPMI,
entities can be physical or logical), where sensors are a mechanism for
monitoring and effecters are a mechanism for control. Effecters can be
numeric or state based. PLDM defines commonly used entities and their
IDs, but there are 8K slots available to define OEM entities.
- PLDM allows bidirectional communication, and sending asynchronous events.
- A very active PLDM related working group in the DMTF.
The plan is to run PLDM over MCTP. MCTP is defined in a spec of its own,
and a proposal on the MCTP design is in discussion already. There's
going to be an intermediate PLDM over MCTP binding layer, which lets us
send PLDM messages over MCTP. This is defined in a spec of its own, and
the design for this binding will be proposed separately.
## Requirements
How different BMC/Host/other applications make use of PLDM messages is
outside the scope of this requirements doc. The requirements listed here
are related to the PLDM protocol stack and the request/response model:
- Marshalling and unmarshalling of PLDM messages, defined in various
PLDM Type specs, must be implemented. This can of course be staged based
on the need of specific Types and functions. Since this is just encoding
and decoding PLDM messages, I believe there would be motivation to build
this into a library that could be shared between BMC, host and other
firmware stacks. The specifics of each PLDM Type (such as FRU table
structures, sensor PDR structures, etc) are implemented by this lib.
- Mapping PLDM concepts to native OpenBMC concepts must be implemented.
For example: mapping PLDM sensors to phosphor-hwmon hosted D-Bus objects,
mapping PLDM FRU data to D-Bus objects hosted by
phosphor-inventory-manager, etc. The mapping shouldn't be restricted to
D-Bus alone (meaning it shouldn't be necessary to put objects on the Bus
just to serve PLDM requests, a problem that exists with
phosphor-host-ipmid today). Essentially these are platform specific PLDM
message handlers.
- The BMC should be able to act as a PLDM responder as well as a PLDM
requester. As a PLDM requester, the BMC can monitor/control other
devices. As a PLDM responder, the BMC can react to PLDM messages
directed to it via requesters in the platform, e.g. the Host.
- As a PLDM requester, the BMC must be able to discover other PLDM
enabled components in the platform.
- As a PLDM requester, the BMC must be able to send simultaneous
messages to different responders, but it can have only one outstanding
message to any specific responder at a time.
- As a PLDM requester, the BMC must be able to handle out of order
responses.
- As a PLDM responder, the BMC may simultaneously respond to messages
from different requesters, but the spec doesn't mandate this. In other
words the responder could be single-threaded.
- It should be possible to plug in PLDM functions that are not yet
implemented (these may be new/existing standard Types, or OEM Types)
into the PLDM stack.
## Proposed Design
The following are high level structural elements of the design:
### PLDM encode/decode libraries
This library would take a PLDM message, decode it and spit out the
different fields of the message. Conversely, given a PLDM Type, command
code, and the command's data fields, it would make a PLDM message. The
thought is to design this library such that it can be used by BMC and
the host firmware stacks, because it's the encode/decode and protocol
piece (and not the handling of a message). I'd like to know if there's
enough motivation to have this as a common lib. That would mean
additional requirements such as having this as a C lib instead of C++,
because of the runtime constraints of host firmware stacks. If there's
not enough interest to have this as a common lib, this could just be
part of the provider libs (see below), and it could then be written in C++.
There would be one encode/decode lib per PLDM Type. For example,
something like /usr/lib/pldm/libbase.so, /usr/lib/pldm/libfru.so, etc.
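
To make the encode/decode split a bit more concrete, here is a minimal
sketch of packing and unpacking the common PLDM message header. The
field layout (Rq bit, 5-bit instance id, 6-bit Type, 8-bit command code)
is my reading of the PLDM base spec and should be checked against it;
the names PldmHeader, encodeMessage and decodeHeader are hypothetical.
It is written in C++ for brevity, whereas a lib shared with host
firmware would more likely expose the same thing as plain C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed header layout (to be verified against the PLDM base spec):
//   byte 0: Rq (bit 7), D (bit 6), reserved (bit 5), instance id (bits 4..0)
//   byte 1: header version (bits 7..6), PLDM Type (bits 5..0)
//   byte 2: command code
struct PldmHeader
{
    bool request;       // true for a request, false for a response
    uint8_t instanceId; // 0..31
    uint8_t type;       // 0..63
    uint8_t command;    // 0..255
};

constexpr std::size_t headerSize = 3;

// Build a message: the 3-byte header followed by the command's data fields.
std::vector<uint8_t> encodeMessage(const PldmHeader& hdr,
                                   const std::vector<uint8_t>& data)
{
    assert(hdr.instanceId < 32 && hdr.type < 64);
    std::vector<uint8_t> msg;
    msg.reserve(headerSize + data.size());
    msg.push_back(
        static_cast<uint8_t>((hdr.request ? 0x80 : 0x00) | hdr.instanceId));
    msg.push_back(hdr.type); // header version 0 in the top two bits
    msg.push_back(hdr.command);
    msg.insert(msg.end(), data.begin(), data.end());
    return msg;
}

// Extract the header fields; returns std::nullopt if the message is too
// short to contain a header.
std::optional<PldmHeader> decodeHeader(const std::vector<uint8_t>& msg)
{
    if (msg.size() < headerSize)
    {
        return std::nullopt;
    }
    PldmHeader hdr{};
    hdr.request = (msg[0] & 0x80) != 0;
    hdr.instanceId = msg[0] & 0x1F;
    hdr.type = msg[1] & 0x3F;
    hdr.command = msg[2];
    return hdr;
}
```

Per-Type libs would then layer the command-specific data fields (FRU
tables, PDRs, and so on) on top of this common header handling.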
### PLDM provider libraries
These libraries would implement the platform specific handling of
incoming PLDM requests (basically helping with the PLDM responder
implementation, see the Request/Response Model section below), so for
instance they would query D-Bus objects (or even something like a JSON
file) to fetch platform specific information to respond to the PLDM
message. They would link with the encode/decode libs. Like the
encode/decode libs, there would be one per PLDM Type (e.g.
/usr/lib/pldm/providers/libfru.so).
These libraries would essentially be plug-ins. That lets someone add
functionality for new PLDM (standard as well as OEM) Types, and it also
lets them replace default handlers. The libraries would implement a
"register" API to plug-in handlers for specific PLDM messages. Something
like:
    // `register` itself is a reserved word in C++, hence registerHandler.
    template <typename Handler, typename... args>
    auto registerHandler(uint8_t type, uint8_t command, Handler handler);
This allows for providing a strongly-typed C++ handler registration
scheme. It would also be possible to validate the parameters passed to
the handler at compile time.
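
As a rough illustration of the plug-in idea, the sketch below keeps a
table of handlers keyed by (Type, command). It is deliberately simpler
than the strongly-typed scheme described above: handlers are type-erased
and take/return raw payload bytes. HandlerRegistry, Payload and find()
are hypothetical names, and the API is called registerHandler because
`register` is a reserved word in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Payload = std::vector<uint8_t>;
// A handler receives the request payload and returns the response payload.
using Handler = std::function<Payload(const Payload&)>;

class HandlerRegistry
{
  public:
    // Registering the same (type, command) again replaces the earlier
    // handler, which is how a provider could override a default.
    void registerHandler(uint8_t type, uint8_t command, Handler handler)
    {
        assert(handler); // an empty std::function is not a usable handler
        handlers[std::make_pair(type, command)] = std::move(handler);
    }

    // Returns nullptr when no handler is registered for this command.
    const Handler* find(uint8_t type, uint8_t command) const
    {
        auto it = handlers.find(std::make_pair(type, command));
        return it == handlers.end() ? nullptr : &it->second;
    }

  private:
    std::map<std::pair<uint8_t, uint8_t>, Handler> handlers;
};
```

A provider lib would call registerHandler() for each command it
implements when it is loaded, and the responder would call find() for
each incoming request.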
### Request/Response Model
There are two approaches that I've described here, and they correlate to
the two options in Jeremy's MCTP design for how to notify on incoming
PLDM messages: in-process callbacks vs D-Bus signals.
#### With in-process callbacks
In this case, there would be a single PLDM (over MCTP) daemon that
implements both the PLDM responder and PLDM requester function. The
daemon would link with the encode/decode libs mentioned above, and the
MCTP lib.
The PLDM responder function would involve registering the PLDM provider
libs on startup. The PLDM responder implementation would sit in the
callback handler from the transport's rx. If it receives PLDM messages
of type Request, it will route them to an appropriate handler in a
provider lib, get the response back, and send back a PLDM response
message via the transport's tx API. If it receives messages of type
Response, it will put them on a "Response queue".
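
The routing in the rx callback could look roughly like the sketch below.
Responder, route and tx are hypothetical stand-ins (route for the
provider lib lookup, tx for the transport's transmit API), and the
request/response test assumes the Rq flag is bit 7 of the first header
byte, which should be checked against the PLDM base spec.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

using Message = std::vector<uint8_t>;

struct Responder
{
    // Hypothetical hooks supplied by the daemon at startup.
    std::function<Message(const Message&)> route; // provider lib dispatch
    std::function<void(const Message&)> tx;       // transport transmit

    // Responses are parked here until the requester side picks them up.
    std::deque<Message> responseQueue;

    // Called from the transport's rx path with one complete PLDM message.
    void onRx(const Message& msg)
    {
        if (msg.size() < 3)
        {
            return; // shorter than the assumed 3-byte header: drop it
        }
        const bool isRequest = (msg[0] & 0x80) != 0;
        if (isRequest)
        {
            assert(route && tx);
            tx(route(msg)); // handle the request and send the response
        }
        else
        {
            responseQueue.push_back(msg);
        }
    }
};
```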
I think designing the BMC as a PLDM requester is interesting. We haven't
had this with IPMI, because the BMC was typically an IPMI server. I
envision PLDM requester functions to be spread across multiple OpenBMC
applications (instead of a single big requester app) - based on the
responder they're talking to and the high level function they implement.
For example, there could be an app that lets the BMC upgrade firmware
for other devices using PLDM - this would be a generic app in the sense
that the same set of commands might have to be run irrespective of the
device on the other side. There could also be an app that does fan
control on a remote device, based on sensors from that device and
algorithms specific to that device.
The PLDM daemon would have to provide a D-Bus interface to send a PLDM
request message. This API would be used by apps wanting to send out PLDM
requests. If the message payload is too large, the interface could
accept an fd (containing the message), instead of an array of bytes. The
implementation of this would send the PLDM request message via the
transport's tx API, and then conditionally wait on the response queue to
have an entry that matches this request (the match is by instance id).
The conditional wait (or something equivalent) is required because the
app sending the PLDM message must block until getting a response back
from the remote PLDM device.
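
One possible shape for the conditional wait is a response queue guarded
by a mutex and a condition variable, as sketched below. ResponseQueue,
push() and waitFor() are hypothetical names. The timeout is my addition
(the text above only says the caller blocks), and matching on instance
id alone is a simplification: a real implementation would presumably
also key on the remote endpoint.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

using Message = std::vector<uint8_t>;

class ResponseQueue
{
  public:
    // Called from the rx path when a response message arrives.
    void push(uint8_t instanceId, Message response)
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            pending[instanceId] = std::move(response);
        }
        cv.notify_all(); // wake every waiter; each re-checks its own id
    }

    // Called after the request has been handed to the transport's tx API.
    // Blocks until a response with this instance id arrives, or returns
    // std::nullopt once the timeout expires.
    std::optional<Message> waitFor(uint8_t instanceId,
                                   std::chrono::milliseconds timeout)
    {
        assert(instanceId < 32); // assumed 5-bit instance id
        std::unique_lock<std::mutex> lock(mtx);
        const bool arrived = cv.wait_for(lock, timeout, [&] {
            return pending.count(instanceId) != 0;
        });
        if (!arrived)
        {
            return std::nullopt;
        }
        Message response = std::move(pending[instanceId]);
        pending.erase(instanceId);
        return response;
    }

  private:
    std::mutex mtx;
    std::condition_variable cv;
    std::map<uint8_t, Message> pending;
};
```

With an event loop instead of threads, the same matching logic would
apply, but waitFor() would become a callback or future completed from
the rx path.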
With what's been described above, it's obvious that the responder and
requester functions need to be able to run concurrently (this is as per
the PLDM spec as well). The BMC can simultaneously act as a responder
and requester. Waiting on an rx from the transport layer shouldn't block
other BMC apps from sending PLDM messages. So this means the PLDM daemon
would have to be multi-threaded, or maybe we can instead achieve this
via an event loop.
#### With D-Bus signals
This lets us separate PLDM daemons from the MCTP daemon, and eliminates
the need to handle request and response messages concurrently in the
same daemon, at the cost of much more D-Bus traffic. The MCTP daemon
would emit D-Bus signals describing the type of the PLDM message
(request/response) and containing the message payload. Alternatively it
could pass the PLDM message over a D-Bus API that the PLDM daemons would
implement. The MCTP daemon would also implement a D-Bus API to send PLDM
messages, as with the previous approach.
With this approach, I'd recommend two separate PLDM daemons - a
responder daemon and a requester daemon. The responder daemon reacts to
D-Bus signals corresponding to PLDM Request messages. It handles
incoming requests as before. The requester daemon would react to D-Bus
signals corresponding to PLDM response messages. It would implement the
instance id generation, and would also implement the response queue and
the conditional wait on that queue. It would also have to implement a
D-Bus API to let other PLDM-enabled OpenBMC apps send PLDM requests. The
implementation of that API would send the message to the MCTP daemon,
and then block on the response queue to get a response back.
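
For the instance id generation mentioned above, a small per-responder
allocator may be enough. The sketch below assumes a 5-bit instance id
(32 values, to be confirmed against the PLDM base spec) and identifies
each responder by a hypothetical one-byte endpoint id; the class and
method names are made up for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

class InstanceIdAllocator
{
  public:
    static constexpr std::size_t maxIds = 32; // assumed 5-bit instance id

    // Hand out the lowest free instance id for this responder, or
    // std::nullopt if all ids are currently in flight.
    std::optional<uint8_t> allocate(uint8_t eid)
    {
        auto& used = inUse[eid];
        for (std::size_t id = 0; id < maxIds; ++id)
        {
            if (!used.test(id))
            {
                used.set(id);
                return static_cast<uint8_t>(id);
            }
        }
        return std::nullopt;
    }

    // Return an id to the pool once its response has been consumed (or
    // the request has been given up on).
    void release(uint8_t eid, uint8_t id)
    {
        assert(id < maxIds);
        inUse[eid].reset(id);
    }

  private:
    std::map<uint8_t, std::bitset<maxIds>> inUse; // keyed by endpoint id
};
```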
### Multiple requesters and responders
The PLDM spec does allow simultaneous connections between multiple
responders/requesters, for example the BMC talking to a multi-host
system on two different physical channels. Instead of implementing this
in one
MCTP/PLDM daemon, we could spawn one daemon per physical channel.
## Impacts
Development would be required to implement the PLDM protocol, the
request/response model, and platform specific handling. Low level design
is required to implement the protocol specifics of each of the PLDM
Types. Such low level design is not included in this proposal.
Design and development needs to involve potential host firmware
implementations.
## Testing
Testing can be done without having to depend on the underlying transport
layer.
The responder function can be tested by mocking a requester and the
transport layer: this would essentially test the protocol handling and
platform specific handling. The requester function can be tested by
mocking a responder: this would test the instance id handling and the
send/receive functions.
APIs from the shared libraries can be tested via fuzzing.
Thanks,
Deepak