apply for a new repo "openbmc/node-data-sync"

Patrick Williams patrick at stwcx.xyz
Fri Apr 23 23:04:21 AEST 2021


Hello Daniel,

I've read through this and feel like most of my original questions are
still unanswered.  I've pasted them here again for reference:

---
> So, we need a new repo to sync the sensor data between primary node and
> secondary nodes, could you create a repo "openbmc/node-data-sync"? thanks.

Do you have a design on how this would be accomplished?  I'm curious
what your proposal ends up looking like at a dbus-level.  We should make
sure this is documented at a high level and also aligns with how we are
handling multi-host scenarios already in our dbus layout.

How are the two BMC devices connected?  How do you ensure you can safely
/ securely accept the data between the two BMCs?  If done poorly this
could open up a large security attack surface.

I suspect your software design is somewhat driven by a particular hardware
design as well, so we need to make sure the scope of your solution is
clear and should probably even communicate that in the name.  For
instance, if your data transport is IPMB over i2c and you don't plan on
this being a general daemon that could be expanded to a network
connection, we shouldn't name it so generally.
---


On Mon, Apr 19, 2021 at 10:26:31AM -0400, Brad Bishop wrote:
> Hi Daniel
> 
> Typically email attachments are not opened.  To ensure your mail gets 
> read, avoid attachments.  Resending without the attachment...
> 
> On Mon, Apr 19, 2021 at 12:41:00PM +0800, yugang.chen wrote:
> >attach the design document, please take a look.
> >
> >Best Regards
> >Daniel(Yugang)
> >
> 
> >## Requirements
> >resources, secondary BMCs need to collect its local management resources and report
> >them to primary BMC.

Why is this a requirement?  What is driving it?

You could alternatively have datacenter-level software gather data directly
from both BMCs.  Please put that in the alternatives section and give
rationale on why this extra software is a better solution to the problem.

> >## Proposed Design
> >Each time a BMC reboot, The BMC needs to check 3 GPIO Pins: FM_STANDALONE_MONDE_N / FM_4S_8S_N_MODE / FM_NODE_ID to get working mode as standalone mode or primary /secondary(4S/8S) role according to the GPIO values.
> >After confirming the mode and BMCs' role, BMCs should set properties according
> >to the correct configuration.
> >In 4S/8S mode only node id 0 will be primary BMC because only this node will be the PCH.L. Node id 1,2,3 will be the secondary nodes.

Some of this is very specific to your hardware design / architecture
being used (ex. "PCH" is an Intel term).  We may want to put some of
this into a background or architecture-specific section.

I take away two high-level design points from reading this:
    1. You are not handling dynamic primary/secondary BMC selection; it
       is all assigned by physical position.  This is not unreasonable
       but it isn't sufficient for some other designs, so we should
       spell this out.
    2. Assignment of role will be done based on GPIOs for your
       particular implementation, but could be handled in a different
       manner for other systems.

Based on #2, I would suggest the role-selection code is separated from
any dbus/Redfish propagation code.  The role should likely be a dbus
property.
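
As a rough sketch of the separation I mean (every name below is
hypothetical, and the GPIO polarity and node-id encoding are assumptions,
not taken from your design), the platform-specific policy could be one
function and the publishing side could know nothing about GPIOs:

```python
from enum import Enum
from typing import Callable, Dict


class Role(Enum):
    STANDALONE = "standalone"
    PRIMARY = "primary"
    SECONDARY = "secondary"


# Hypothetical logical GPIO names, not the schematic names from the proposal.
STANDALONE_GPIO = "multi-node-standalone"
NODE_ID_GPIO = "multi-node-id"


def role_from_gpios(read_gpio: Callable[[str], int]) -> Role:
    """Platform-specific policy: derive the role from GPIO values.

    Assumes STANDALONE_GPIO reads 1 in standalone mode and NODE_ID_GPIO
    yields the node id as an integer; real hardware may differ.
    """
    if read_gpio(STANDALONE_GPIO) == 1:
        return Role.STANDALONE
    return Role.PRIMARY if read_gpio(NODE_ID_GPIO) == 0 else Role.SECONDARY


class RolePublisher:
    """Stand-in for the dbus side: it only sees a Role, never a GPIO."""

    def __init__(self) -> None:
        self.properties: Dict[str, str] = {}

    def publish(self, role: Role) -> None:
        # A real daemon would set a dbus property here instead.
        self.properties["Role"] = role.value


def fake_gpios(values: Dict[str, int]) -> Callable[[str], int]:
    """Test helper that serves GPIO values from a dictionary."""
    return lambda name: values[name]
```

A system that picks its role some other way would replace only
role_from_gpios() and leave the publisher untouched.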

Also, these GPIO names seem like they are schematic names.  You might
want to look at docs:designs/device-tree-gpio-naming.md.  We typically
want to see a logical name from a userspace application point of view.

> >
> >Once a BMC gets mode is in 4S/8S, node roles are configured by node
> >ID (GPIO Pins) and keep consistent once AC on. 

I think I know what you meant by this, but as worded it is very difficult
for a BMC to accomplish.  OCP systems don't even have "AC power" since the
server is supplied 12V or 48V DC from the rack.  The BMC has almost no
way to know if it was reset in certain ways vs had standby power cycled.

I would suggest at a hardware level that the node roles are kept static
during the entire standby power cycle.  Likely this is what you meant, but
the wording here could be clearer.

> >Once node role check is done,
> >
> >In 4S/8S mode, Primary BMC needs to broadcast its role to make sure there is only
> >one primary BMC in the system.

And what if not?

> >Need a new feature to make sure secondary BMCs send local redfish events to primary
> >BMC. And primary BMC needs to add a tag to those events coming from secondary BMC so
> >that user can know the event logs happened on which node.

Why Redfish and not dbus?  I think this is the first time that Redfish
is mentioned.

Does the primary really just need "events" or does it really need all of
the data necessary to create a Redfish model of the resources modeled by
the secondary?  I assume the latter, because if it was just events then
something still needs to go to the secondary to get the rest of the
model and to make changes to it.

> >Even in 4S/8S mode, each BMC will collect its local management resources,
> >including sensors, fans and do FSC according to the values of local sensors.
> >PSU and Fans on each node will not be connected together. Configuration
> >settings of each secondary node will remain the same, and won't be synced across
> >the nodes.

FSC?  Fan Speed Control?  Please don't use new and previously unused acronyms,
unless they are a well-known / industry standard protocol.

> >In 4S/8S mode, PECI will only be connected to primary node. Primary BMC needs to
> >monitor all CPU and DIMM sensors and deliver the sensor values of the CPUs/DIMMs
> >on secondary nodes to secondary BMCs. So that secondary BMCs can use the sensor values to control their own FSCs. Primary BMC also needs to have a way to find how many
> >CPUs are in the whole system include Primary and Secondary nodes.

This paragraph implies to me that the primary is also *pushing* sensor
values to the secondary so that the secondary can make local decisions
about fan speeds from information obtained over PECI on the primary?

It really seems like Redfish "events" is not sufficient for what you're
trying to accomplish, but you want some generic "sync dbus state between
BMCs" design.
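
To illustrate what I mean by syncing state rather than events, here is a
minimal sketch.  The store, its method names, and the example object path
are all made up for illustration and the transport is left out entirely;
the point is only that each BMC keeps the last known properties of remote
objects, keyed by the node they came from, and that updates can flow in
either direction:

```python
from typing import Any, Dict, Tuple

Properties = Dict[str, Any]


class NodeStateStore:
    """Last known property values of objects, keyed by (node id, path)."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[int, str], Properties] = {}

    def apply_update(self, node_id: int, path: str, changed: Properties) -> None:
        """Merge changed properties into the stored view of one object."""
        self._objects.setdefault((node_id, path), {}).update(changed)

    def objects_for(self, node_id: int) -> Dict[str, Properties]:
        """Return a copy of everything known about one node."""
        return {
            path: dict(props)
            for (nid, path), props in self._objects.items()
            if nid == node_id
        }


# Hypothetical example: the primary learns a fan reading from node 1,
# then a later update changes only the value.
store = NodeStateStore()
store.apply_update(1, "/sensors/fan0", {"Value": 4200, "Unit": "RPM"})
store.apply_update(1, "/sensors/fan0", {"Value": 4350})
```

The same structure would serve a secondary receiving CPU temperatures
that the primary read over PECI, which is why events alone look too
narrow to me.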

> >## Alternatives Considered

All of these alternatives need an explanation of why they were not
chosen.

> >Primary node monitors all the IPMI sensors in secondary nodes and creates redfish log
> >by itself.

Is "IPMI sensors" really what you wanted to say here?

> >Instead of BMC reboot, only AC cycle will make BMCs check GPIO pins and set
> >Legacy BMC or Non-Legacy BMC mode.

I didn't get the impression above that a BMC reboot was when the GPIO pins
were checked (which is part of the reason why I pointed that out above).

> >Only primary BMC broadcast its role and secondary only waiting for the broadcast
> >from primary.

I'm not sure what this means...

> >## Impacts
> >Only on the motherboard where legacy PCH is located, POST code/Front Panel/KCS
> >port/UART will work, while these interfaces on board with non-legacy PCH will
> >not work due to BIOS and HW design. And this will cause non-functioning of
> >SOL/KVM/Virtual media on secondary BMCs.

We are interested in *software* impacts here.  What code needs to
change?  How does this new design affect the way we think about existing
software components?

-- 
Patrick Williams