BMC redundancy
Brad Bishop
bradleyb at fuzziesquirrel.com
Tue Jan 30 05:14:49 AEDT 2018
+cc: list
> On Jan 29, 2018, at 1:13 PM, Brad Bishop <bradleyb at fuzziesquirrel.com> wrote:
>
> Thanks for the quick reply!
>
>> On Jan 29, 2018, at 11:45 AM, Alexander Amelkin <a.amelkin at yadro.com> wrote:
>>
>> Brad, do you have any examples or existing systems having multiple BMCs with any commercial firmware like MegaRAC?
>
> Not really. I’m just familiar with IBM systems and this is something
> we have done in our commercial BMC stacks for a long time.
>
>>
>> I can't think of a problem you're proposing to address with multiple BMCs for a single host, but I can imagine a number of problems it may add.
>
> Indeed. Obviously complexity is one. I’d be interested in hearing about
> other problems. You’ve exposed my agenda - I’m looking for ways for IBM to
> be able to support something like this in OpenBMC but at the same time
> minimize the complexity burden for everyone else.
>
>>
>> BMC lockup? Solved by hardware watchdog.
>> BMC firmware corruption? Solved by read-only golden image on a separate flash IC (well supported by at least Aspeed).
>> BMC DoS attack? Solved by network isolation and overall correct network environment configuration.
>> BMC chip burnout? Does it happen at all? Isn't this an indicator of some major hardware design flaw? Does adding another chip actually solve this problem?
>
> Yeah, on the surface I agree with all your points here, assuming the
> definition of a system is a single board. In that case it doesn’t make sense.
>
> We do have some modular system designs though where N discrete chassis
> can be connected together with high speed cabling or a backplane for
> a single SMP fabric across the N chassis, for example. It's these kinds
> of system designs where multiple BMCs make a little more sense.
>
>>
>> What else?
>
> It's really the connections between the BMC and the host hardware on
> these larger systems. These buses can and do have both transient and
> hard failures, and IBM needs a way to maintain connections from a BMC
> to the host hardware in that state. Also, while not really a redundancy
> statement, it simply isn’t economical for BMC vendors to develop SoCs
> that can provide enough pins on systems this large.
>
> -brad