BMC redundancy
Deepak Kodihalli
dkodihal at linux.vnet.ibm.com
Fri Apr 13 21:57:24 AEST 2018
On 13/04/18 5:15 pm, Deepak Kodihalli wrote:
> On 06/02/18 12:14 pm, Deepak Kodihalli wrote:
>> On 06/02/18 11:40 am, Michael E Brown wrote:
>>> On Fri, Feb 02, 2018 at 01:10:43PM -0800, Vernon Mauery wrote:
>>>> On 02-Feb-2018 11:18 AM, Andrew Jeffery wrote:
>>>>> Hi Deepak,
>>>>>
>>>>>> So several of the existing OpenBMC apps implement specific D-Bus
>>>>>> services. What does it take to make remote D-Bus calls to such apps?
>>>>>> - It doesn't look like the D-Bus spec or libdbus officially has
>>>>>> anything
>>>>>> for D-Bus across computers. There are some good notes at
>>>>>> https://www.freedesktop.org/wiki/Software/DBusRemote/.
>>>>>
>>>>> Applications can connect to remote dbus servers; the --address
>>>>> option to dbus-daemon allows it to listen on a TCP socket and
>>>>> setting DBUS_SESSION_BUS_ADDRESS will point applications in the
>>>>> right direction. So there are probably two ways we could do this:
>>>>
>>>> Putting DBus on an externally-available TCP socket is a security
>>>> architect's
>>>> nightmare. All command and control of the entire BMC is done over
>>>> DBus; we
>>>> cannot put that on an externally-available address. I suppose if you
>>>> have an
>>>> internal connection and switching fabric between the nodes, this
>>>> would be
>>>> possible.
>>>
>>> Vernon, I completely and wholeheartedly agree with this assessment.
>>> Most of the
>>> things I've heard so far start way too high up in the stack and try
>>> to solve
>>> the issue there. I believe that we ought to start at the networking
>>> layer and
>>> build up from there.
>>>
>>> Here is a dump of my thoughts related to what we talked about on the
>>> conference
>>> call.
>>>
>>> Step 0: Internal network
>>> Definition: This is a private network that is internal to a chassis and
>>> cannot be reached from the external world. This does not make this network
>>> implicitly secure, however. I would strongly suggest that we create
>>> standards
>>> for addressing this network and how components communicate at the IP
>>> level
>>> between BMCs.
>>> Proposal: All nodes on this network use IPv6 SLAAC, with one node (or
>>> a redundant pair of nodes) designated to run "radvd" to provide stable
>>> site-local address space assignments.
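As a purely illustrative sketch of that proposal, a radvd.conf along the
following lines would let the designated node advertise a stable prefix on
the internal network (the interface name and the ULA prefix - ULA being the
current replacement for deprecated site-local addressing - are assumptions):

    interface eth1
    {
        # Send router advertisements on the chassis-internal link.
        AdvSendAdvert on;

        # Stable, locally administered /64 that all BMCs autoconfigure from.
        prefix fd00:1234:5678:1::/64
        {
            AdvOnLink on;
            AdvAutonomous on;
        };
    };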
>>>
>>> Step 1: Discovery
>>> Definition: This is how you figure out that there are other BMCs
>>> on the
>>> internal network. Zeroconf/Avahi/MDNS are three names for one method.
>>> This also
>>> includes figuring out which role any specific node may fill
>>> (chassis-level BMC
>>> vs node-level BMC, for example).
>>> Proposal: MDNS DNS-SD with TXT records indicating BMC Role
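To make that concrete, an Avahi service file along these lines would
announce a BMC and its role over mDNS/DNS-SD (the service type, port and
TXT keys are illustrative assumptions, not an agreed convention):

    <?xml version="1.0" standalone='no'?>
    <!DOCTYPE service-group SYSTEM "avahi-service.dtd">
    <service-group>
      <!-- Advertise this host under its own name on the internal link -->
      <name replace-wildcards="yes">%h</name>
      <service>
        <type>_obmc-bmc._tcp</type>
        <port>443</port>
        <!-- Role published as a DNS-SD TXT record -->
        <txt-record>role=node-bmc</txt-record>
      </service>
    </service-group>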
>>>
>>> Step 2: Exchanging certificates
>>> Definition: To start a crypto system, one needs to exchange public
>>> keys, at a minimum. Other information about a machine can also be
>>> useful. Note
>>> that exchanging certificates does not imply "trust" of those
>>> certificates, but
>>> only provides the basis upon which you later decide if you trust or not.
>>> Proposal: OAuth 2.0 Dynamic Client Registration
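As a sketch of what that exchange might look like, a node registering
itself with a peer per RFC 7591 could POST something like the following
(the endpoint path, host name and key material are placeholders):

    POST /register HTTP/1.1
    Host: chassis-bmc.local
    Content-Type: application/json

    {
      "client_name": "node-bmc-2",
      "grant_types": ["client_credentials"],
      "token_endpoint_auth_method": "private_key_jwt",
      "jwks": { "keys": [ { "kty": "RSA", "e": "AQAB", "n": "..." } ] }
    }

The response carries the issued client_id (and any client_secret), which
the registering node keeps for the trust and token steps that follow.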
>>>
>>> Step 3: Trust
>>> Definition: Based on many factors, a machine decides if it trusts
>>> another machine. There is the implication that end-users may wish to
>>> disable
>>> new trust relationships from forming, but that's not a requirement.
>>> Factors
>>> that determine trust: 1) which network initiated connection, 2) machine
>>> identity certs, 3) any signatures on the machine identity certs (for
>>> example, a
>>> vendor signature probably implies higher trust), 4) user
>>> wishes/configuration,
>>> and 5) any possible remote attestation (TPM) or similar. These
>>> factors could
>>> certainly be extended to include many other things, but these are a
>>> baseline
>>> for starting to talk about this. Depending on user settings, devices
>>> might
>>> require OAuth 2.0 device flow or be implicitly trusted based on
>>> vendor-signed
>>> device certificates.
>>> Proposal: The OAuth 2.0 Client Credentials grant or the OAuth 2.0
>>> device flow, depending on policy.
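For the client-credentials path, the token request would just be the
standard RFC 6749 exchange, roughly (host name and scope are placeholders):

    POST /token HTTP/1.1
    Host: chassis-bmc.local
    Authorization: Basic <base64(client_id:client_secret)>
    Content-Type: application/x-www-form-urlencoded

    grant_type=client_credentials&scope=bmc.peer

with the peer returning a bearer token along the lines of:

    { "access_token": "<opaque-or-jwt>", "token_type": "Bearer",
      "expires_in": 3600 }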
>>>
>>> Step 4: RPC
>>> After establishing trust, you need a mechanism to do remote calls
>>> between machines. This could be as simple as REST (using oauth tokens
>>> granted
>>> in #3), or as complex as a full DBUS interface.
>>> Proposal: None at this time
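Even without a firm proposal, the REST variant is easy to picture: after
step 3, one BMC calls a peer with a bearer token. A rough sketch (host
name and token are placeholders, and the exact resource path depends on
the Redfish implementation; ComputerSystem.Reset is the standard action):

    curl -k -X POST \
         -H "Authorization: Bearer <token-from-step-3>" \
         -H "Content-Type: application/json" \
         -d '{"ResetType": "On"}' \
         https://bmc1.local/redfish/v1/Systems/system/Actions/ComputerSystem.Reset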
>>>
>>> Step 5: Clustering
>>> Definition: Clustering is generally divided into a cluster
>>> communication protocol and a cluster resource manager. The resource
>>> manager has
>>> the job of taking developer constraints about which daemons have to
>>> run where
>>> and what resources they need, and running the jobs on the machines
>>> available.
>>> For example, you might specify that only one machine should run sensors
>>> connected to physical hardware i2c lines, and specify the list of
>>> daemons that
>>> depend on these hardware resources. The resource manager would be
>>> responsible
>>> for running the daemons in the correct order on the correct number of
>>> machines.
>>> Proposal: Corosync/Pacemaker have fairly reliable and flexible
>>> clustering, and can describe complex requirements.
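As a rough illustration of the constraint language, the i2c-bound sensor
example above could be expressed with Pacemaker's pcs tool along these
lines (the unit, resource and node names are all placeholders):

    # Manage the hardware-attached sensor daemon as a cluster resource.
    pcs resource create sensor-mgr systemd:phosphor-hwmon
    # Pin it to the node that is physically wired to the i2c lines.
    pcs constraint location sensor-mgr prefers chassis-bmc0=INFINITY
    # A daemon that depends on those sensors runs on the same node, after it.
    pcs resource create sensor-agg systemd:sensor-aggregator
    pcs constraint colocation add sensor-agg with sensor-mgr INFINITY
    pcs constraint order sensor-mgr then sensor-agg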
>>>
>>> The one thing to keep in mind is that everything up to step 4/5 is
>>> part of your
>>> forwards/backwards compatibility guarantee for all time for
>>> everything in the
>>> chassis. To make sure it is supportable for a very long time, try to
>>> keep it as
>>> simple as possible, but no simpler.
>>>
>>> Another part of the design is figuring out if you need Active/Active
>>> clustering, or Active/Standby. If you can get away with
>>> Active/Standby, you can
>>> greatly minimize your RPC requirements between the machines, to the
>>> point you
>>> don't really need much other than REST.
>>>
>>> --
>>> Michael
>>
>>
>> Thanks for this break-up and summary, Michael. I'm trying to collect
>> the factors that can help us weigh the pros and cons of the RPC
>> mechanism (REST vs D-Bus vs something else). This is what I've
>> gathered so far:
>>
>> - Some of the other layers in the picture can influence the RPC
>> mechanism - you've brought up the active/active vs active/standby
>> design, but I'm not sure it's possible to identify an RPC mechanism
>> that fits all such designs, because such configurations depend on the
>> system design, which can vary. REST, for example, may not fit the bill
>> for a multi-master setup.
>> - One thing about D-Bus that people have brought up is that our
>> REST/Redfish API might be based around the existing D-Bus interface,
>> so it kind of seems natural that the same interface serves as the RPC
>> interface as well.
>> - Security aspects of D-Bus vs REST: these depend on the design/mechanisms
>> chosen for the steps prior to RPC.
>> - Any other factors?
>>
>> Regards,
>> Deepak
>
> Hello,
>
> I'd like to resurrect this topic. A quick recap - this is applicable to
> multi-BMC (or peer-BMC) systems, where each BMC typically manages a
> host.
>
> While I agree this problem should be solved across different layers (the
> internal network between BMCs, RPC, trust, clustering, etc.), some of
> these layers have well-known solutions that can be easily integrated into
> OpenBMC (such as Avahi for discovery and Corosync for clustering). We
> still need discussions in those areas, but with this email I'd like to
> propose a peer-BMC RPC mechanism based on a D-Bus model. The motivation
> behind using D-Bus is that most BMC apps should be minimally impacted or
> would be agnostic to the fact that there are peer BMCs in the system.
>
> I've been thinking of various use-cases to see how a D-Bus based RPC
> fits in. Consider one such use case where an external application wants
> to issue a power-on command to each of the peer BMCs. Further, the
> external app wants to communicate with a specific point-of-contact (POC)
> BMC, expecting the POC to broadcast the command across peers and to
> aggregate responses. Let's also consider that the external application
> is using the Redfish API to communicate with the POC, so it might send out
> something like /redfish/v1/system/control with {"power" : "on"} (I don't
> know the exact Redfish API for this).
>
> The POC has two tasks here - translating the redfish request to a D-Bus
> method call (which would have to be done even for a single BMC system),
> and then propagating that call across BMCs. The proposal is that the
> same D-Bus model is created on every peer BMC, with appropriately named
> object paths. So in this case, say on a 4-BMC system, you could have the
> following :
>
> /bmc0/xyz/openbmc_project/host/control/power
> /bmc1/xyz/openbmc_project/host/control/power
> /bmc2/xyz/openbmc_project/host/control/power
> /bmc3/xyz/openbmc_project/host/control/power
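With such a layout, a local client would address a peer's power control
exactly like a local object, e.g. (the service, interface and method names
here are hypothetical):

    busctl call xyz.openbmc_project.Proxy \
        /bmc2/xyz/openbmc_project/host/control/power \
        xyz.openbmc_project.Control.Power PowerOn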
>
> On every BMC, the objects in the model that point to other BMCs are
> proxies; they'll route D-Bus calls to the relevant BMC and retrieve the
> response. The advantage of this approach is that, for example, the
> rest/redfish code that attempts to find D-Bus objects corresponding to a
> REST URI (say by making a mapper query based on the D-Bus interface) will
> still work as before, except that proxy objects will now be found as well.
> These proxy objects will implement the same interface that a
> /xyz/openbmc_project/host/control/power D-Bus object would. This
> approach also works with native D-Bus apps wanting to communicate with
> peer BMCs; a differentiation of a native vs remote path can be made
> based on the path itself, or the proxy objects could implement an
> interface indicating they're remote objects.
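A rough sketch of one such proxy using sd-bus is below; every service,
interface, method and host name in it is an assumption made purely for
illustration, and error handling is kept minimal:

    /* proxy.c - expose a peer-prefixed D-Bus object locally and forward
     * calls to the peer BMC.
     * Build: gcc proxy.c -o proxy $(pkg-config --cflags --libs libsystemd)
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <systemd/sd-bus.h>

    static sd_bus *remote;  /* connection to the peer BMC's system bus */

    static int power_on_handler(sd_bus_message *m, void *userdata,
                                sd_bus_error *ret_error)
    {
        /* Forward the call to the unprefixed object on the peer. */
        int r = sd_bus_call_method(remote,
                                   "xyz.openbmc_project.Control.Host",
                                   "/xyz/openbmc_project/host/control/power",
                                   "xyz.openbmc_project.Control.Power",
                                   "PowerOn",
                                   ret_error, NULL, "");
        if (r < 0)
            return r;

        /* Answer the local caller once the peer has replied. */
        return sd_bus_reply_method_return(m, "");
    }

    static const sd_bus_vtable proxy_vtable[] = {
        SD_BUS_VTABLE_START(0),
        SD_BUS_METHOD("PowerOn", "", "", power_on_handler,
                      SD_BUS_VTABLE_UNPRIVILEGED),
        SD_BUS_VTABLE_END
    };

    int main(void)
    {
        sd_bus *local = NULL;
        int r;

        r = sd_bus_open_system(&local);
        if (r < 0)
            goto finish;

        /* sd-bus tunnels this connection to the peer over ssh. */
        r = sd_bus_open_system_remote(&remote, "bmc1");
        if (r < 0)
            goto finish;

        /* Publish the proxy under the peer-prefixed object path. */
        r = sd_bus_add_object_vtable(local, NULL,
                                     "/bmc1/xyz/openbmc_project/host/control/power",
                                     "xyz.openbmc_project.Control.Power",
                                     proxy_vtable, NULL);
        if (r < 0)
            goto finish;

        r = sd_bus_request_name(local, "xyz.openbmc_project.Proxy", 0);
        if (r < 0)
            goto finish;

        for (;;) {
            r = sd_bus_process(local, NULL);
            if (r < 0)
                goto finish;
            if (r > 0)
                continue;
            r = sd_bus_wait(local, UINT64_MAX);
            if (r < 0)
                goto finish;
        }

    finish:
        fprintf(stderr, "proxy exiting: %s\n", strerror(-r));
        sd_bus_unref(remote);
        sd_bus_unref(local);
        return 1;
    }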
>
> So effectively most of the well-known D-Bus model would reside on each
> peer BMC. In terms of how this scales, I guess it's comparable to a
> single BMC managing multiple nodes.
>
> With the previous example, the object paths to construct are well-known,
> but this may not apply to objects such as error logs. In that case, I
> think it should be possible to implement proxy "object managers" under
> well known D-Bus roots. So a call to retrieve all objects under the
> D-Bus root will retrieve objects across all the peer BMCs, with the
> proxy object managers routing the request to the appropriate BMCs.
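For instance, fetching all of bmc2's error log objects could then be the
standard ObjectManager call, just issued against the prefixed root (the
service name and logging path are again only illustrative):

    busctl call xyz.openbmc_project.Proxy \
        /bmc2/xyz/openbmc_project/logging \
        org.freedesktop.DBus.ObjectManager GetManagedObjects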
>
> For the actual remote D-Bus calls, one possibility is to use the API
> offered by sd-bus (it can transport D-Bus messages over ssh).
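The same transport can also be exercised from the command line with
busctl's --host option, which likewise tunnels over ssh (the user, host
and D-Bus names below are placeholders):

    busctl --host=root@bmc1 call xyz.openbmc_project.Control.Host \
        /xyz/openbmc_project/host/control/power \
        xyz.openbmc_project.Control.Power PowerOn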
>
> Thanks,
> Deepak
Forgot to copy the list.
Regards,
Deepak