BMC redundancy

Deepak Kodihalli dkodihal at
Fri Apr 13 21:57:24 AEST 2018

On 13/04/18 5:15 pm, Deepak Kodihalli wrote:
> On 06/02/18 12:14 pm, Deepak Kodihalli wrote:
>> On 06/02/18 11:40 am, Michael E Brown wrote:
>>> On Fri, Feb 02, 2018 at 01:10:43PM -0800, Vernon Mauery wrote:
>>>> On 02-Feb-2018 11:18 AM, Andrew Jeffery wrote:
>>>>> Hi Deepak,
>>>>>> So several of the existing OpenBMC apps implement specific D-Bus
>>>>>> services. What does it take to make remote D-Bus calls to such apps?
>>>>>> - It doesn't look like the D-Bus spec or libdbus officially has anything
>>>>>> for D-Bus across computers. There are some good notes at
>>>>> Applications can connect to remote dbus servers; the --address option
>>>>> to dbus-daemon allows it to listen on a TCP socket and setting
>>>>> DBUS_SESSION_BUS_ADDRESS will point applications in the right
>>>>> direction. So there are probably two ways we could do this:
>>>> Putting DBus on an externally-available TCP socket is a security
>>>> architect's nightmare. All command and control of the entire BMC is done
>>>> over DBus; we cannot put that on an externally-available address. I
>>>> suppose if you have an internal connection and switching fabric between
>>>> the nodes, this would be possible.
>>> Vernon, I completely and wholeheartedly agree with this assessment. Most
>>> of the things I've heard so far start way too high up in the stack and
>>> try to solve the issue there. I believe that we ought to start at the
>>> networking layer and build up from there.
>>> Here is a dump of my thoughts related to what we talked about on the
>>> conference call.
>>> Step 0: Internal network
>>>     Definition: This is a private network that is internal to a chassis
>>> and cannot be attached via the external world. This does not make this
>>> network implicitly secure, however. I would strongly suggest that we
>>> create standards for addressing this network and how components
>>> communicate at the IP level between BMCs.
>>>     Proposal: all nodes on this network use IPv6 SLAAC, with a designation
>>> for one or a redundant pair of nodes running "radvd" to provide stable
>>> site-local address space assignments.
>>> Step 1: Discovery
>>>     Definition: This is how you figure out that there are other BMCs on
>>> the internal network. Zeroconf/Avahi/MDNS are three names for one method.
>>> This also includes figuring out which role any specific node may fill
>>> (chassis-level BMC vs node-level BMC, for example).
>>>     Proposal: MDNS DNS-SD with TXT records indicating BMC Role
>>> Step 2: Exchanging certificates
>>>     Definition: To start a crypto system, one needs to exchange public
>>> keys, at a minimum. Other information about a machine can also be useful.
>>> Note that exchanging certificates does not imply "trust" of those
>>> certificates, but only provides the basis upon which you later decide if
>>> you trust or not.
>>>     Proposal: OAuth 2.0 Dynamic Client Registration
>>> Step 3: Trust
>>>     Definition: based on many factors, a machine decides if it trusts
>>> another machine. There is the implication that end-users may wish to
>>> disable new trust relationships from forming, but that's not a
>>> requirement. Factors that determine trust: 1) which network initiated
>>> connection, 2) machine identity certs, 3) any signatures on the machine
>>> identity certs (for example, a vendor signature probably implies higher
>>> trust), 4) user wishes/configuration, and 5) any possible remote
>>> attestation (TPM) or similar. These factors could certainly be extended
>>> to include many other things, but these are a baseline for starting to
>>> talk about this. Depending on user settings, devices might require OAuth
>>> 2.0 device flow or be implicitly trusted based on vendor-signed device
>>> certificates.
>>>     Proposal: The OAuth 2.0 Client Credentials grant or the OAuth 2.0
>>> device flow. (Depending on policy)
>>> Step 4: RPC
>>>     After establishing trust, you need a mechanism to do remote calls
>>> between machines. This could be as simple as REST (using OAuth tokens
>>> granted in Step 3), or as complex as a full DBUS interface.
>>>     Proposal: None at this time
>>> Step 5: Clustering
>>>     Definition: Clustering is generally divided into a cluster
>>> communication protocol and a cluster resource manager. The resource
>>> manager has the job of taking developer constraints about which daemons
>>> have to run where and what resources they need, and running the jobs on
>>> the machines available. For example, you might specify that only one
>>> machine should run sensors connected to physical hardware i2c lines, and
>>> specify the list of daemons that depend on these hardware resources. The
>>> resource manager would be responsible for running the daemons in the
>>> correct order on the correct number of machines.
>>>     Proposal: Corosync/Pacemaker have fairly reliable and flexible
>>> clustering, and can describe complex requirements.
>>> The one thing to keep in mind is that everything up to step 4/5 is part
>>> of your forwards/backwards compatibility guarantee for all time for
>>> everything in the chassis. To make sure it is supportable for a very long
>>> time, try to keep it as simple as possible, but no simpler.
>>> Another part of the design is figuring out if you need Active/Active
>>> clustering, or Active/Standby. If you can get away with Active/Standby,
>>> you can greatly minimize your RPC requirements between the machines, to
>>> the point you don't really need much other than REST.
>>> -- 
>>> Michael
>> Thanks for this break-up and summary, Michael. I'm trying to collect
>> the factors that can help us weigh up the pros and cons of the RPC
>> mechanism (REST vs D-Bus vs something else). This is what I've
>> gathered so far:
>> - Some of the other layers in the picture can influence the RPC 
>> mechanism - you've brought out the active/active vs active/standby 
>> design, but I'm not sure if it's possible to identify an RPC mechanism 
>> that fits all such designs, because such configurations could depend 
>> on the system design, which can vary. So REST for example may not fit 
>> the bill for a multi-master.
>> - One thing about D-Bus that people have brought up is that our 
>> REST/Redfish API might be based around the existing D-Bus interface, 
>> so it kind of seems natural that the same interface serves as the RPC 
>> interface as well.
>> - Security aspects of D-Bus vs REST: these depend on the design/mechanisms
>> chosen for the steps prior to RPC.
>> - Any other factors?
>> Regards,
>> Deepak
> Hello,
> I'd like to resurrect this topic. A quick recap - this is applicable to 
> multi BMC (or peer BMC) systems, where typically each BMC is managing a 
> host.
> While I agree about solving this problem across different layers (the
> internal network between BMCs, RPC, trust, clustering, etc.), some of
> these have well-known solutions that can be easily integrated into 
> OpenBMC (such as Avahi for discovery and Corosync for clustering). We 
> still need discussions in those areas, but with this email I'd like to 
> propose a peer-BMC RPC mechanism based on a D-Bus model. The motivation 
> behind using D-Bus is that most BMC apps should be minimally impacted or 
> would be agnostic to the fact that there are peer BMCs in the system.
> I've been thinking of various use-cases to see how a D-Bus based RPC 
> fits in. Consider one such use case where an external application wants 
> to issue a power-on command to each of the peer BMCs. Further, the 
> external app wants to communicate with a specific point-of-contact (POC) 
> BMC, expecting the POC to broadcast the command across peers and to 
> aggregate responses. Let's also consider that the external application
> is using the Redfish API to communicate with the POC, so it might send out
> something like /redfish/v1/system/control with {"power" : "on"} (I don't
> know the exact Redfish API for this).
> The POC has two tasks here - translating the redfish request to a D-Bus 
> method call (which would have to be done even for a single BMC system), 
> and then propagating that call across BMCs. The proposal is that the 
> same D-Bus model is created on every peer BMC, with appropriately named 
> object paths. So in this case, say on a 4-BMC system, you could have the
> following:
> /bmc0/xyz/openbmc_project/host/control/power
> /bmc1/xyz/openbmc_project/host/control/power
> /bmc2/xyz/openbmc_project/host/control/power
> /bmc3/xyz/openbmc_project/host/control/power
> On every BMC, the objects in the model that point to other BMCs are
> proxies; they'll route D-Bus calls to the relevant BMC and retrieve the
> response. The advantage of this approach is that, for example, the
> REST/Redfish code that attempts to find D-Bus objects corresponding to a
> REST URI (say by making a mapper query based on the D-Bus interface) will
> still work as before, just that there will be proxy objects found as well
> now.
> These proxy objects will implement the same interface that a 
> /xyz/openbmc_project/host/control/power D-Bus object would. This 
> approach also works with native D-Bus apps wanting to communicate with 
> peer BMCs; a differentiation of a native vs remote path can be made 
> based on the path itself, or the proxy objects could implement an 
> interface indicating they're remote objects.
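To make the path-based differentiation concrete, here is a minimal sketch in Python. It only manipulates path strings; it does not talk to D-Bus. The helper names (proxy_path, parse_path) and the convention that a leading /bmcN segment marks the owning BMC are assumptions taken from the example paths above, not an existing OpenBMC API.

```python
from typing import NamedTuple, Optional

# Path used in the example above for a single-BMC system.
POWER_PATH = "/xyz/openbmc_project/host/control/power"


class ParsedPath(NamedTuple):
    bmc: Optional[int]   # None means the path carries no /bmcN prefix
    native_path: str     # the path as it would exist on the owning BMC


def proxy_path(bmc_index: int, native_path: str) -> str:
    """Prefix a native object path with the /bmcN segment of its owner."""
    if not native_path.startswith("/"):
        raise ValueError("D-Bus object paths must start with '/'")
    return f"/bmc{bmc_index}{native_path}"


def parse_path(path: str) -> ParsedPath:
    """Split a path into (owning BMC, native path) based on its first segment."""
    first, _, rest = path.lstrip("/").partition("/")
    if first.startswith("bmc") and first[3:].isdigit():
        return ParsedPath(int(first[3:]), "/" + rest)
    return ParsedPath(None, path)


# The four paths of the 4-BMC example.
paths = [proxy_path(i, POWER_PATH) for i in range(4)]
```

A router on the POC could then call parse_path() on an incoming object path and forward the call to the peer whose index it returns, or handle it locally when no prefix is present.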
> So effectively most of the well-known D-Bus model would reside on each 
> peer BMC. In terms of how this scales, I guess it's comparable to a 
> single BMC managing multiple nodes.
> With the previous example, the object paths to construct are well-known, 
> but this may not apply to objects such as error logs. In that case, I 
> think it should be possible to implement proxy "object managers" under 
> well known D-Bus roots. So a call to retrieve all objects under the 
> D-Bus root will retrieve objects across all the peer BMCs, with the 
> proxy object managers routing the request to the appropriate BMCs.
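A rough sketch of what such a proxy object manager might do when asked for all objects under a root. The standard org.freedesktop.DBus.ObjectManager.GetManagedObjects call returns a dictionary keyed by object path; the sketch merges one such dictionary per BMC and prefixes each path with /bmcN. The per-BMC fetch functions are stand-ins for the real remote calls, and the logging path, interface name and property values are illustrative data only.

```python
from typing import Callable, Dict

# Shape of a GetManagedObjects reply:
# {object path: {interface name: {property name: value}}}
ManagedObjects = Dict[str, Dict[str, Dict[str, object]]]


def aggregate_managed_objects(
    fetchers: Dict[int, Callable[[], ManagedObjects]],
) -> ManagedObjects:
    """Merge per-BMC replies into one view, prefixing each path with /bmcN."""
    merged: ManagedObjects = {}
    for bmc_index, fetch in sorted(fetchers.items()):
        for path, interfaces in fetch().items():
            merged[f"/bmc{bmc_index}{path}"] = interfaces
    return merged


# Stand-ins for remote calls; a real proxy would query each peer BMC here.
def fetch_bmc0() -> ManagedObjects:
    return {
        "/xyz/openbmc_project/logging/entry/1": {
            "xyz.openbmc_project.Logging.Entry": {"Id": 1},
        },
    }


def fetch_bmc1() -> ManagedObjects:
    return {
        "/xyz/openbmc_project/logging/entry/1": {
            "xyz.openbmc_project.Logging.Entry": {"Id": 1},
        },
        "/xyz/openbmc_project/logging/entry/2": {
            "xyz.openbmc_project.Logging.Entry": {"Id": 2},
        },
    }


all_logs = aggregate_managed_objects({0: fetch_bmc0, 1: fetch_bmc1})
```

Because the BMC index is part of the key, entries with the same native path on different BMCs (entry/1 above) do not collide in the merged result.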
> For the actual remote D-Bus calls, one possibility is to use the API
> offered by sd-bus (it can transport D-Bus messages over ssh).
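For quick experiments, the same ssh transport is reachable from the command line through busctl's --host option. The sketch below only builds the argument list for such a call and does not execute anything. The host name root@bmc1 and the service name xyz.openbmc_project.Example are placeholders; org.freedesktop.DBus.Peer.Ping is a standard D-Bus method that takes no arguments.

```python
from typing import List, Sequence


def remote_call_argv(
    host: str,
    service: str,
    path: str,
    interface: str,
    method: str,
    signature: str = "",
    args: Sequence[str] = (),
) -> List[str]:
    """Build a `busctl --host=... call ...` command line for a remote BMC."""
    argv = ["busctl", f"--host={host}", "call", service, path, interface, method]
    if signature:
        # busctl expects the D-Bus type signature followed by the arguments.
        argv += [signature, *args]
    return argv


# Placeholder host and service; the path is the one from the power example.
ping = remote_call_argv(
    "root@bmc1",
    "xyz.openbmc_project.Example",
    "/xyz/openbmc_project/host/control/power",
    "org.freedesktop.DBus.Peer",
    "Ping",
)
```

Passing the resulting list to subprocess.run() on a BMC that has ssh access to its peer would issue the call; a proxy daemon would more likely use the sd-bus C API directly than shell out.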
> Thanks,
> Deepak

Forgot to copy the list.
