Summarizing Meeting on BMC Aggregation

vishwa vishwa at linux.vnet.ibm.com
Mon Jan 27 20:49:48 AEDT 2020


Hi Richard,

Thanks for capturing and sharing the discussion here. If I am reading it 
all correctly, the aggregator is an external entity and not part of one 
of the BMCs in the domain; to draw a rough analogy, it is an aggregator 
somewhat like Nagios. Did I get that right?

The email mentions "data and control". Could you give an example of how 
the problem statements below would be handled and executed by the 
proposed aggregator?

*Hypothetical Problems*:

Case-1: I have 4 nodes in the rack, each with its own BMC responsible 
for managing THAT node.
I want to power on all the nodes in the rack using Redfish from a 
management console.
Where does the aggregator sit in this setup, and how is the operation 
orchestrated?
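
Just to make the question concrete, this is roughly what a management 
console has to do today without an aggregator, assuming each BMC exposes 
the standard Redfish ComputerSystem.Reset action (the addresses, 
credentials, and the "system" resource ID below are placeholders only):

    import requests

    # Placeholder addresses for the four node BMCs in the rack.
    bmcs = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

    for bmc in bmcs:
        # Standard Redfish power-on action; the "system" resource ID is
        # only an example and differs between implementations.
        url = (f"https://{bmc}/redfish/v1/Systems/system"
               "/Actions/ComputerSystem.Reset")
        # TLS verification is skipped here purely for the sketch.
        resp = requests.post(url, json={"ResetType": "On"},
                             auth=("admin", "password"), verify=False)
        print(bmc, resp.status_code)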

Case-2: One of the BMCs fails to power on the node it manages and needs 
to report the error back to the initiator.
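
As an illustration only, if the per-BMC calls above were used, the 
initiator (or an aggregator acting on its behalf) would have to notice 
the failure and surface the standard Redfish error payload, along these 
lines:

    # Illustrative helper: pull the Redfish ExtendedInfo messages out of
    # a failed response and report them back to the initiator.
    def report_failure(bmc, resp):
        if resp.status_code >= 400:
            error = resp.json().get("error", {})
            for msg in error.get("@Message.ExtendedInfo", []):
                print(f"{bmc}: {msg.get('Severity')}: {msg.get('Message')}")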

Thank you very much for taking this initiative,

!! Vishwa !!

On 1/17/20 1:45 AM, Richard Hanley wrote:
> Hi everyone,
>
> We had a meeting today to talk about BMC aggregation.  I wanted to 
> thank everyone who joined.
>
> Below is my summary of the topics we discussed, and some of the action 
> items I took from the meeting.  Please let me know if there was 
> something important that I missed or mischaracterized.
> ------------------------------------------------------------------------------------------------------
>
> There is a strong need to aggregate data and control features from 
> multiple BMCs into a single uniform view of a "machine."
>
> The definition of a machine here is relatively loose, but it can be 
> thought of as an atomic physical unit for management.  A machine is 
> then split into multiple domains, each of which is managed by some 
> management controller (in most cases a BMC).  There may be 
> some cases where a domain has multiple BMCs for redundancy.
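>
> As a very rough sketch of those relationships (the names are only 
> illustrative, not a proposed schema):
>
>     from dataclasses import dataclass, field
>
>     @dataclass
>     class ManagementController:
>         """A BMC (or similar controller) managing part of a machine."""
>         address: str
>         protocol: str  # e.g. "redfish", "pldm", "ipmi"
>
>     @dataclass
>     class Domain:
>         """A physically adjacent unit, possibly with redundant BMCs."""
>         name: str
>         controllers: list[ManagementController] = field(default_factory=list)
>
>     @dataclass
>     class Machine:
>         """The atomic physical unit of management presented upstream."""
>         name: str
>         domains: list[Domain] = field(default_factory=list)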
>
> Domains are relatively close to each other physically. Sometimes they 
> will be in the same chassis/enclosure, while in other cases they will 
> be 
> in an adjacent tray.
>
> One key point that was discussed in this meeting was that the data and 
> transport for these domains are relatively unconstrained.  Domains may 
> be connected to the aggregator via a LAN, but there is a 
> community need to support other transports like SMBus and PCIe.
>
> An aggregator will likely need to be split into three layers (a rough 
> sketch of the interfaces follows the list):
>
> 1) The lowest layer would detect, import, and transform individual 
> domains into a common data model.  We would need to provide a 
> specification for that data model and tooling for implementers to 
> create their own instance of a domain's data.
>
> 2) An aggregation layer would take the instances of these domain-level 
> data models and aggregate them into a single view or graph of the 
> system.  This process could be largely automated graph manipulation.
>
> 3) A presentation layer would take that aggregate and expose it to 
> the outside world.  This presentation layer could be Redfish, but 
> there is some divergence on that (see below).  Regardless, we would 
> need tooling that lets implementers program against the data model 
> and modify their presentation layers as needed.
>
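> To make the layering concrete, here is a minimal sketch of the 
> interfaces involved (every name is hypothetical, not a proposed API):
>
>     from abc import ABC, abstractmethod
>
>     class DomainImporter(ABC):
>         """Layer 1: import one domain over some transport and
>         transform it into the common data model."""
>         @abstractmethod
>         def import_domain(self) -> dict:
>             ...
>
>     def aggregate(domains: list[dict]) -> dict:
>         """Layer 2: merge per-domain instances of the data model
>         into a single graph of the machine."""
>         return {"machine": {"domains": domains}}
>
>     class Presenter(ABC):
>         """Layer 3: expose the aggregated view to the outside world,
>         e.g. as a Redfish service."""
>         @abstractmethod
>         def serve(self, machine: dict) -> None:
>             ...
>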
> There is fairly broad agreement that Layer 1 would need to support 
> multiple protocols, including Redfish, PLDM/MCTP, and legacy IPMI 
> systems.  There would need to be support for creating custom drivers 
> for importing these various transports into a common data model.
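>
> Building on the sketch above, a custom driver would simply be another 
> importer plugged into Layer 1; purely as an illustration:
>
>     class RedfishImporter(DomainImporter):
>         """Hypothetical Layer 1 driver that walks a BMC's Redfish
>         tree and maps it into the common data model."""
>         def __init__(self, address: str):
>             self.address = address
>
>         def import_domain(self) -> dict:
>             # A real driver would crawl /redfish/v1 here; this stub
>             # only records where the data came from.
>             return {"source": self.address, "protocol": "redfish"}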
>
> There are some diverging needs when it comes to the presentation 
> layer.  Here at Google, we were planning to have the presentation 
> layer be primarily Redfish and the common data model be more 
> Redfish focused.  Neeraj pointed out that there are some needs for 
> other presentation layers besides Redfish.
>
> Another design consideration is the hardware target for this 
> aggregator.  The aggregator will have to run on an OpenBMC platform, 
> but Google also has a need for it to run on host Linux machines for 
> legacy platforms without an out-of-band connection.
>
> Another consideration is the security of this aggregator. The 
> aggregation layer will have the primary responsibility of 
> adjudicating authentication and authorization for the subordinate nodes.
>
> One of the key takeaways (for me, anyway) from this meeting is that 
> there is community interest in keeping this aggregator generic and 
> not tied too closely to a particular protocol, transport, or 
> presentation layer.  There was mention of the CIM data model, which 
> may be appropriate for this situation.
>
> We will be having follow-up meetings because this project is going to 
> take some time to scope out and design.  I will be researching prior 
> art for existing data models that we could build a presentation layer 
> off of.
>
> Regards,
> Richard


