<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Aug 21, 2017 at 2:13 PM, Brendan Higgins <span dir="ltr">&lt;<a href="mailto:brendanhiggins@google.com" target="_blank">brendanhiggins@google.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello all,<br>
<br>
Nancy has informed me that there is some renewed interest in our Generic Flash<br>
Access mechanism, and I know people have been asking for it for a while now, so<br>
here it finally is!<br>
<br>
Our design has evolved quite a bit since it was originally shared (original<br>
proposal can be found here<br>
<a href="https://lists.ozlabs.org/pipermail/openbmc/2016-October/005131.html" rel="noreferrer" target="_blank">https://lists.ozlabs.org/<wbr>pipermail/openbmc/2016-<wbr>October/005131.html</a> for anyone<br>
interested), so I think it makes sense to start from scratch.<br>
<br>
I decided to add Corey Minyard, the Linux IPMI maintainer on this email as well<br>
because I think he will have some useful insights and this will also shed some<br>
light on the types of things we are trying to get our BMC to do as well as our<br>
conceptual approach.<br>
<br>
Also, before anyone says anything, I am aware that Generic Flash Access and<br>
Generic Transport Layer are not great names, but we have not thought of anything<br>
better yet. If anyone has any suggestions of better names, or any suggestions of<br>
any sort, please feel free to share.<br>
<br>
The primary motivation for Generic Flash Access is that there is no generic<br>
flash access mechanism that is available to all BMC based systems. LPC based<br>
mailbox is not available on all platforms and thus cannot be the sole basis for<br>
a generic mechanism. Furthermore, we have many types of operations that we would<br>
like to do with flash that do not fit well into the mailbox model.<br></blockquote><div><br></div><div>Definitely the case. Any approach should be something that can be implemented per system, such that the data and the command &amp; control can go over separate channels and don't imply direct flash access.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Generic Flash Access is composed of two parts: the Generic Transport Layer, and<br>
the Generic Flash Access Protocol.<br>
<br>
The Generic Transport Layer (GTL) attempts to provide an abstract stream<br>
oriented interface between the BMC and the host CPU. It accomplishes this by<br>
piggy-backing on another pre-existing interface, currently IPMI; however, others<br>
are possible.<br>
<br>
Conceptually, GTL sort of looks like a net device. A user would open a "socket"<br>
(in this case, request a transaction ID) and would then be able to write<br>
arbitrary data which would be sent as a request. The request would be routed to<br>
a handler on the opposite side which would parse the request and send a<br>
response, which could then be read by the requester. A request may be<br>
initiated by either the BMC or the host.<br>
<br>
GTL is essentially a packetization specification for arbitrary data, with<br>
bindings on other protocols. Given an arbitrary piece of data, GTL breaks the<br>
data up into packets:<br>
<br>
```<br>
struct packet {<br>
uint8 transaction_id;<br>
uint8 control;<br>
uint32 seq;<br>
uint8 checksum;<br>
uint8 payload[];<br>
};<br>
```<br></blockquote><div><br></div><div>This packet doesn't have a payload size. Also is an 8-bit checksum worthwhile instead of a 16-bit checksum? If the data is sufficiently large, the 8-bit checksum loses a lot of value.</div><div><br></div><div>I'm less worried about the 32-bit for the sequence number as that'll work for quite some time (famous last words). I don't think it's worth adding a timestamp field as was done to address sequence overflow in TCP.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
where `transaction_id` allows packets to be associated with a given transaction<br>
so that multiple transactions can be in progress all at once. The most<br>
significant bit identifies whether the transaction was originated by the host or<br>
the BMC; if it is unset it is from the host, if it is set, it is from the BMC.<br>
In this way, each endpoint only has to keep track of allocations for its<br>
transaction IDs.<br>
<br>
`control` is a bit field:<br>
<br>
* 7:3 reserved for future use<br>
* 2:0 an enum:<br>
- 000 final packet: indicates that it is the last packet in a message.<br>
- 001 not final packet: indicates that there are subsequent packets in the<br>
message.<br>
- 010 retransmit request: requests that all packets after the specified<br>
packet, using the seq field, should be resent.<br>
- 011 ack: indicates that the last packet for a message was received.<br>
- 100 invalid state: indicates that a response packet was received for a<br>
unknown request.<br>
- 101-111 reserved for future use: packets using invalid (reserved) control<br>
codes must be silently dropped.<br>
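</blockquote><div>A minimal sketch of how the transaction ID origin bit and the control enum described above might be decoded. The names <code>Control</code>, <code>originated_by_bmc</code> and <code>decode_control</code> are hypothetical, and ignoring the reserved bits 7:3 on receipt is an assumption, since the proposal only says they are reserved.</div><pre>
```python
from enum import IntEnum

BMC_ORIGIN_BIT = 0x80     # most significant bit of transaction_id
CONTROL_TYPE_MASK = 0x07  # bits 2:0 of control carry the enum


class Control(IntEnum):
    FINAL = 0b000
    NOT_FINAL = 0b001
    RETRANSMIT_REQUEST = 0b010
    ACK = 0b011
    INVALID_STATE = 0b100


def originated_by_bmc(transaction_id):
    """True if the MSB is set, meaning the BMC allocated this ID."""
    return (transaction_id & BMC_ORIGIN_BIT) != 0


def decode_control(control):
    """Return the Control value, or None for a reserved code.

    A None result means the packet should be silently dropped.
    Bits 7:3 are masked off and ignored (an assumption).
    """
    code = control & CONTROL_TYPE_MASK
    try:
        return Control(code)
    except ValueError:
        return None
```
</pre><div>With this split, the host would allocate IDs 0x00 to 0x7F and the BMC 0x80 to 0xFF, so neither side needs to know about the other's allocations.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">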
<br>
`seq` is a sequence number that is incremented for each packet. For a retransmit<br>
request, the sequence number is that of the last packet the requester received<br>
correctly, or 0 if it has not received the first packet.<br>
<br>
`checksum` is a CRC 8 checksum that includes all of the data of the packet with<br>
the exception of itself. If the checksum in a received packet is invalid, the<br>
packet is dropped and a retransmit request with the previous sequence number is<br>
sent.<br>
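</blockquote><div>A minimal sketch of packing a packet and verifying its checksum as described above. Several details here are assumptions, because the proposal does not fix them: the CRC-8 polynomial (0x07 with initial value 0), network byte order for <code>seq</code>, and the payload being whatever follows the 7-byte header (the link layer is assumed to preserve packet boundaries, since the struct carries no payload size). The function names are hypothetical.</div><pre>
```python
import struct

# transaction_id, control, seq, checksum; "!" means network byte order (assumed)
HEADER = struct.Struct("!BBIB")


def crc8(data, poly=0x07):
    """Bitwise CRC-8, MSB first, initial value 0 (assumed parameters)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def pack_packet(transaction_id, control, seq, payload):
    """Serialize a packet; the checksum covers everything except itself."""
    covered = struct.pack("!BBI", transaction_id, control, seq) + payload
    return HEADER.pack(transaction_id, control, seq, crc8(covered)) + payload


def unpack_packet(raw):
    """Return (transaction_id, control, seq, payload), or None on a bad checksum."""
    transaction_id, control, seq, checksum = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:]
    covered = struct.pack("!BBI", transaction_id, control, seq) + payload
    if crc8(covered) != checksum:
        return None  # caller drops the packet and sends a retransmit request
    return transaction_id, control, seq, payload
```
</pre><div>A caller that gets None back would drop the packet and send a retransmit request carrying the previous sequence number, as the text describes.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">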
<br>
GTL itself is divided into two layers: the transaction layer and the link layer.<br>
<br>
The transaction layer is common across all GTL implementations and is<br>
responsible for translating a message (a request or a response) into a series<br>
of packets and then translating packets back into messages. When the transaction<br>
layer receives a packet, it finds the rx queue associated with the packet's<br>
transaction ID and then checks to see if the packet's sequence number<br>
immediately follows the previous; if it does, it is added to the queue; if not,<br>
it is dropped and a retransmit request is sent with the last sequence number it<br>
correctly received or zero if no packets have been received for that<br>
transaction. If the packet is marked as the final packet, the message is<br>
assembled and passed to the user and an ack is sent. If there is no active<br>
transaction for the packet, meaning a response packet was received and there is<br>
no associated request, an invalid state packet is sent with the same transaction<br>
ID, and sequence number.<br></blockquote><div><br></div><div>Just to verify, if you're sending a 32MB message, I don't think it's prudent to rebuild the entire message in memory before passing it to the receiving end of the "socket."</div><div><br></div><div>In the prototype implementation design, are you planning on having something effectively call "recv" on some library implementing this, with similar semantics to Berkeley sockets?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
The sender must follow the above rules as well. It must retain all packets<br>
associated with a message until the ack for that message arrives. If it receives<br>
an invalid state response, it must cancel the corresponding request, if one<br>
exists; it may, but is not required to, take additional action. NB: the most likely explanation for<br>
an invalid state packet is that the endpoint being communicated with was reset.<br>
Other methods of detecting endpoint resets are encouraged and may be used to<br>
cancel requests.<br>
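</blockquote><div>The receive-side rules of the transaction layer described above could be sketched roughly as follows. This assumes sequence numbers start at 1 for each transaction (so that 0 can mean "nothing received yet") and that the ack carries the sequence number of the final packet; neither is stated in the proposal. <code>RxQueue</code> is a hypothetical name.</div><pre>
```python
FINAL, NOT_FINAL, RETRANSMIT_REQUEST, ACK = 0, 1, 2, 3


class RxQueue:
    """Receive state for one transaction ID."""

    def __init__(self):
        self.last_seq = 0  # 0 means no packet has been received yet
        self.chunks = []

    def receive(self, control, seq, payload):
        """Handle one data packet.

        Returns (reply, message). reply is a (control, seq) tuple to send
        back, or None. message is the reassembled bytes once the final
        packet arrives, otherwise None.
        """
        if seq != self.last_seq + 1:
            # Out of order: drop it and ask for everything after last_seq.
            return (RETRANSMIT_REQUEST, self.last_seq), None
        self.last_seq = seq
        self.chunks.append(payload)
        if control == FINAL:
            # Message complete: hand it to the user and acknowledge it.
            return (ACK, seq), b"".join(self.chunks)
        return None, None
```
</pre><div>Note that this version buffers the whole message before handing it over. For very large messages, a streaming variant that passes each in-order chunk to the reader as it arrives would avoid holding everything in memory.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">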
<br>
Although not required for correct functionality, the message from which to send<br>
the next packet will be selected according to a round-robin strategy; however,<br>
retransmit, ack, and invalid state packets will always be sent before other<br>
packets.<br>
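</blockquote><div>One way the send-side selection described above might look: control packets go first, and data packets are taken round robin across pending messages. <code>TxScheduler</code> and its methods are hypothetical names, and the retention of sent packets until the ack arrives is left out to keep the sketch short.</div><pre>
```python
from collections import deque


class TxScheduler:
    def __init__(self):
        self.control = deque()   # retransmit, ack, and invalid state packets
        self.messages = deque()  # one deque of data packets per message

    def queue_control(self, packet):
        self.control.append(packet)

    def queue_message(self, packets):
        if packets:
            self.messages.append(deque(packets))

    def next_packet(self):
        """Return the next packet to send, or None if nothing is pending."""
        if self.control:
            return self.control.popleft()
        if not self.messages:
            return None
        message = self.messages.popleft()
        packet = message.popleft()
        if message:
            # Packets remain, so this message goes to the back of the line.
            self.messages.append(message)
        return packet
```
</pre><div>With two messages queued, this interleaves their packets one at a time, so a small request is not stuck behind a large transfer.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">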
<br>
The link layer is specific to the underlying protocol and is only responsible<br>
for exchanging packets between the BMC and the host. For IPMI, the host will<br>
continually send GTL IPMI OEM requests to the BMC; if there are any packets to<br>
send, the body of the IPMI request will contain the next packet to send as<br>
determined by the transaction layer; the BMC will then complete the request with<br>
the next packet as determined by its transaction layer. If either side has<br>
nothing to send, the corresponding body will simply be left empty. When the link<br>
layer sends the final packet in a message, the next packet that it receives<br>
should be an ACK because ACK packets have higher priority than all other<br>
packets; if an ACK was not received, an error occurred and the link layer should<br>
retransmit the final packet unless requested to retransmit an earlier packet as<br>
directed by a retransmit request.<br></blockquote><div><br></div><div>This made sense to me until the messages portion, which seems to live on top of this, but really next to it, or both. Can you elaborate on a use case more fully with pseudo-code?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
The host's link layer will issue a GTL IPMI OEM request whenever it has packets<br>
available to send. It must also issue a GTL IPMI OEM request if the response to<br>
the previous request contained a packet. If neither of these conditions is<br>
true, the host will periodically send empty requests to see if the BMC has<br>
anything to send. We could also maybe take advantage of `SMS_ATN`, but I do not<br>
think that is important now.<br>
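</blockquote><div>The host-side polling rule described above reduces to a small decision function. This is only a sketch: <code>host_should_poll</code> and the <code>idle_poll_interval</code> default of one second are hypothetical, since the proposal does not say how often the idle poll happens.</div><pre>
```python
def host_should_poll(has_tx_packet, last_response_had_packet,
                     seconds_since_last_request, idle_poll_interval=1.0):
    """Decide whether the host should issue a GTL IPMI OEM request now."""
    # Send immediately if there is something to send, or if the BMC's last
    # response carried a packet (it may have more queued).
    if has_tx_packet or last_response_had_packet:
        return True
    # Otherwise fall back to a periodic empty request.
    return seconds_since_last_request >= idle_poll_interval
```
</pre><div>An attention signal such as SMS_ATN could replace the periodic branch, at the cost of tying the link layer more closely to IPMI.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">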
<br>
As I mentioned, the Generic Flash Access Protocol is built on top of GTL: if GTL<br>
is analogous to TCP, GFA is analogous to a REST interface. All it is is a set of<br>
messages mapped to a particular transaction id. GFA is just a set of protocol<br>
buffer messages:<br></blockquote><div><br></div><div>This comparison is confusing to me. REST is a layer that is directly tied to data, with action-driven endpoints. In the use case you're describing... well, I'm confused. I plan to upload the image over mechanism in big data packets (64KB minus headers), and handle all C2 (command &amp; control) over IPMI OEM.</div><div><br></div><div>In the use case I'm designing, there is a setup step that precedes being able to even send the data. Is that all handled by the receiver of the messages, or should there be "ready to send" type messages?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
```<br>
message RawFlashRequest {<br>
message Read {<br>
uint64 offset = 1;<br>
uint64 size = 2;<br>
}<br>
<br>
message Write {<br>
uint64 offset = 1;<br>
bytes data = 2;<br>
}<br>
<br></blockquote><div>Per the security of our environment, we are disallowing direct write to the flash chip itself. We'll be staging the image and cryptographically verifying it before flashing it. Presumably whatever mechanism is parsing and handling the protobufs will handle that implementation.</div><div><br></div><div>To send the cryptographic signature, should a separate mechanism be used, possibly also over GTL, or would this protobuf set be extended?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
message Info {<br>
}<br>
<br>
message Checksum {<br>
uint64 offset = 1;<br>
uint64 size = 2;<br>
}<br></blockquote><div><br></div><div>I'm not familiar with protobufs, but I don't see a corresponding response to the Checksum request type.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
bytes partition_guid = 1;<br>
<br>
oneof request {<br>
Read read = 2;<br>
Write write = 3;<br>
Info info = 4;<br>
Checksum checksum = 5;<br>
}<br>
}<br>
<br>
message RawFlashResponse {<br>
message Info {<br>
bytes guid = 1;<br>
uint64 block_size = 2;<br>
uint64 num_blocks = 3;<br>
bool read_only = 4;<br>
}<br>
<br>
oneof response { // Empty for Write<br>
Info info = 1; // from Info<br>
bytes data = 2; // from Read<br>
bytes checksum = 3; // from Checksum<br>
}<br>
}<br>
<br>
message PblogRequest {<br>
message GetBootnum {<br>
}<br>
<br>
message WriteEvent {<br>
bytes payload = 1;<br>
}<br>
<br>
message GetRawRegions {<br>
}<br>
<br>
oneof request {<br>
GetBootnum get_bootnum = 1;<br>
WriteEvent write_event = 2;<br>
GetRawRegions get_raw_regions = 3;<br>
}<br>
}<br>
<br>
message PblogResponse {<br>
oneof response { // Empty for write_event<br>
uint32 bootnum = 1; // from GetBootnum<br>
bytes raw_regions = 2; // from GetRawRegions<br>
}<br>
}<br>
<br>
message FirmwareUpdateRequest {<br>
bytes checksum = 1;<br>
bytes payload = 2;<br>
}<br>
<br>
message FirmwareUpdateResponse {<br>
}<br>
<br>
message GfaRequest {<br>
oneof request {<br>
RawFlashRequest raw_request = 1;<br>
PblogRequest pblog_request = 2;<br>
FirmwareUpdateRequest firmware_update_request = 3;<br>
}<br>
}<br>
<br>
message GfaResponse {<br>
bool success = 1;<br>
string error = 2; // Set when success == false<br>
<br>
oneof response {<br>
RawFlashResponse raw_response = 3;<br>
PblogResponse pblog_response = 4;<br>
FirmwareUpdateResponse firmware_update_response = 5;<br>
}<br>
}<br>
``` <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Unlike GTL, which is a strictly defined protocol, the above GFA messages are<br>
more of a suggestion of operations that would be provided. The reason we chose<br>
protobufs is that protobufs are an efficient encoding and, more importantly,<br>
that they are extensible.</blockquote><div> </div><div>Are protobufs sufficiently widely accepted that using them here is worthwhile? I don't see anything in the structures that can't be represented just as compactly with normal data structures, which are just as extensible.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The reason that this is so important is that a flash<br>
chip is likely to have multiple partitions that each have their own security<br>
policy; for example, a CPU flash chip is likely to have a firmware image, which<br>
should only be updated with cryptographically signed updates, but it will also<br>
have an NVRAM variable section, which, depending on the BIOS implementation, may<br>
or may not hold variables that could affect the security of the BIOS; we<br>
also have our own system event log called PBLog; most would probably not be<br>
interested in this, but it is a good example of a message type that the host<br>
should be able to send us at any time and the only policy we would likely have is<br>
rate limiting.<br>
<br>
In addition to allowing easy modification to meet the needs of different<br>
systems, using a mechanism that readily allows modification means that the<br>
protocol can be updated and can evolve over time.<br>
<br>
In summary, the only required messages are GfaRequest and GfaResponse, which can<br>
be modified to support arbitrary functionality.<br>
<br>
On a side note, all of the mechanisms provided above can be used to implement<br>
other arbitrary protocols; for example, we have implemented I2C passthrough<br>
using raw IPMI OEM messages, which have some shortcomings due to the length<br>
restrictions of IPMI messages; using the above mechanisms, GTL with protobufs,<br>
implementing I2C passthrough would be very easy. The possibility of implementing<br>
other protocols beyond GFA is why I decided to support multiple concurrent<br>
transactions.<br>
<br>
We are planning on implementing GTL within the kernel as its own framework. The<br>
IPMI based link layer implementation is intended to be implemented on top of the<br>
BMC side IPMI framework I have been discussing with Corey. I think my decision<br>
to put it here is intuitive since it is really supposed to be a hardware<br>
abstraction of sorts, but we can discuss it if it is not obvious.<br>
<br>
Currently, GFA has a userland tool to be used from the host and a BMC side<br>
daemon. The daemon is mostly complete; the tool is basically a wrapper around<br>
the protobufs listed above, so there is not much to do.<br>
<br>
Cheers!<br>
</blockquote></div><br></div></div>