Summarizing Meeting on BMC Aggregation
vishwa
vishwa at linux.vnet.ibm.com
Tue Jan 28 01:58:01 AEDT 2020
Missed mentioning this variant.
All 4 nodes in the rack together form one machine. So, a power-on
would mean powering on all the nodes. Similarly, "get the data" would
mean getting the data from all the nodes.
From an external entity there is ONE power-on. However, it needs to be
translated into 4 power-ons, one per BMC in the rack.
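As a rough sketch of that fan-out (the node BMC names are placeholders, and the stubbed power-on stands in for a real Redfish request; this is illustration only, not a proposal):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node BMC identifiers; real addresses/credentials would differ.
NODE_BMCS = ["bmc-node0", "bmc-node1", "bmc-node2", "bmc-node3"]

def power_on_node(bmc):
    # A real aggregator would send a Redfish ComputerSystem.Reset action to
    # the node BMC here; this stub only models the per-node result.
    return (bmc, "ok")

def power_on_machine():
    """One external power-on fans out into one power-on per node BMC."""
    with ThreadPoolExecutor(max_workers=len(NODE_BMCS)) as pool:
        results = dict(pool.map(power_on_node, NODE_BMCS))
    # The aggregator folds the per-node results into a single machine status.
    return all(status == "ok" for status in results.values()), results

ok, per_node = power_on_machine()
```

The point is only that the caller sees one operation and one combined status, while the aggregator tracks the individual node results.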
Thanks,
!!Vishwa !!
On 1/27/20 3:19 PM, vishwa wrote:
> Hi Richard,
>
> Thanks for capturing and sharing the discussion here. If I am reading
> it all correctly, it looks like the aggregator here is an external
> entity and not part of one of the BMCs in the domain. To somewhat
> relate, this is an aggregator along the lines of Nagios. Did I get that right?
>
> The email mentions "data and control". Could you give an example of
> how the problem statements below might be handled and executed by
> the proposed aggregator ?
>
> *Hypothetical Problems*:
>
> Case-1 : I have 4 nodes in the rack, each with a BMC inside that is
> responsible for managing THAT node.
> I want to power on all the nodes in the rack, and I want to use Redfish
> from a management console.
> Where is the aggregator in this setup, and how is it orchestrated ?
>
> Case-2 : Some BMC fails to power on its node, and it needs to
> report the error back to the initiator.
>
> Thank you very much for taking this initiative,
>
> !! Vishwa !!
>
> On 1/17/20 1:45 AM, Richard Hanley wrote:
>> Hi everyone,
>>
>> We had a meeting today to talk about BMC aggregation. I wanted to
>> thank everyone who joined.
>>
>> Below is my summary of the topics we discussed and some of the
>> action items I took from the meeting. Please let me know if there
>> was something important that I missed or mischaracterized.
>> ------------------------------------------------------------------------------------------------------
>>
>>
>> There is a strong need to aggregate data and control features from
>> multiple BMCs into a single uniform view of a "machine."
>>
>> The definition of a machine here is deliberately loose, but it can be
>> thought of as an atomic physical unit for management. A machine is
>> then split into multiple domains, each of which is managed by some
>> management controller (in most cases a BMC). There may be
>> some cases where a domain has multiple BMCs for redundancy.
>>
>> Domains are relatively close to each other physically. Sometimes they
>> will be in the same chassis/enclosure, while in other cases they will be
>> in an adjacent tray.
>>
>> One key point that was discussed in this meeting was that the data
>> and transport of these domains are relatively unconstrained. Domains
>> may be connected to the aggregator via a LAN, but there is a
>> community need to support other transports like SMBus and PCIe.
>>
>> An aggregator will likely need to be split up into three layers:
>>
>> 1) The lowest layer would detect, import, and transform individual
>> domains into a common data model. We would need to provide a
>> specification for that data model and tooling for implementers to
>> create their own instance of a domain's data.
>>
>> 2) An aggregation layer would take the instances of these domain-
>> level data models and aggregate them into a single view or graph of
>> the system. This process could be largely automated graph
>> manipulation.
>>
>> 3) A presentation layer would take that aggregate, and expose it to
>> the outside world. This presentation layer could be Redfish, but
>> there is some divergence on that (see below). Regardless, we would
>> need tooling to program against the data model for implementers to
>> modify their presentation layers as needed.
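The three layers above could be sketched roughly like this (the function names and the dict-based shape of the "common data model" are made up for illustration; nothing here is an agreed design):

```python
# Layer 1: a protocol-specific importer normalizes one domain's data
# into the (assumed) common data model.
def import_domain(raw, transport):
    return {"transport": transport, "resources": raw}

# Layer 2: aggregation merges the domain models into one view of the machine.
def aggregate(domains):
    return {"machine": {f"domain{i}": d for i, d in enumerate(domains)}}

# Layer 3: presentation exposes the aggregate to the outside world
# (here just by listing the domains; Redfish would be one real option).
def present(view):
    return sorted(view["machine"])

domains = [import_domain({"power": "on"}, t) for t in ("redfish", "pldm")]
machine_view = aggregate(domains)
```

Each layer only depends on the output of the one below it, which is what lets the presentation layer be swapped without touching the importers.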
>>
>> There is fairly broad agreement that Layer 1 would need to support
>> multiple protocols, including Redfish, PLDM/MCTP, and legacy IPMI
>> systems. There would need to be support for creating custom drivers
>> for importing these various transports into a common data model.
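One way such custom drivers might plug in is a per-transport registry, sketched below (the decorator, registry, and importer names are all invented for illustration):

```python
# Sketch of pluggable Layer-1 drivers; the registry and names are hypothetical.
DRIVERS = {}

def driver(transport):
    """Decorator registering a custom importer for one transport."""
    def register(fn):
        DRIVERS[transport] = fn
        return fn
    return register

@driver("redfish")
def from_redfish(raw):
    # Redfish data already resembles a resource tree; pass it through.
    return {"source": "redfish", "resources": raw}

@driver("ipmi")
def from_ipmi(sensors):
    # Legacy IPMI readings would need translating into the common model.
    return {"source": "ipmi", "resources": {"sensors": sensors}}

def import_with_driver(transport, raw):
    return DRIVERS[transport](raw)
```

An implementer supporting a new transport would then only add one registered importer, without touching the aggregation or presentation layers.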
>>
>> There are some diverging needs when it comes to the presentation
>> layer. Here at Google, we were planning to have the presentation
>> layer be primarily Redfish, with the common data model being
>> Redfish focused. Neeraj pointed out that there are needs for
>> other presentation layers besides Redfish.
>>
>> Some other design considerations include the hardware target for this
>> aggregator. The aggregator will have to run on an OpenBMC platform,
>> but Google also needs an aggregator that can run on host Linux
>> machines for legacy platforms without an out-of-band connection.
>>
>> Another consideration is the security of this aggregator. The
>> aggregation layer will have the primary responsibility of
>> adjudicating authentication and authorization for the subordinate
>> nodes.
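That adjudication point might look something like the following (the token and role scheme is invented purely to illustrate where the decision is made):

```python
# Hypothetical tokens and roles; a real aggregator would use proper sessions.
ROLES = {"admin-token": {"power", "read"}, "viewer-token": {"read"}}

def authorize(token, action):
    """The aggregator decides; subordinate BMCs never see the caller's token."""
    return action in ROLES.get(token, set())

def forward(token, action, nodes):
    if not authorize(token, action):
        raise PermissionError(f"{action} not allowed")
    # Would be forwarded using the aggregator's own per-node credentials.
    return [(node, action) for node in nodes]
```

Centralizing the check this way keeps the subordinate BMCs from each needing to know about external users.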
>>
>> One of the key takeaways (for me, anyway) from this meeting is that
>> there is community interest in keeping this aggregator generic and
>> not tied too closely to a particular protocol, transport, or
>> presentation layer. There was mention of the CIM data model, which may
>> be appropriate for this situation.
>>
>> We will be having follow-up meetings because this project is going to
>> take some time to scope out and design. I will be researching prior
>> art for existing data models that we could build a presentation layer
>> off of.
>>
>> Regards,
>> Richard
>
More information about the openbmc
mailing list