BMC redundancy

Brad Bishop bradleyb at fuzziesquirrel.com
Tue Jan 30 05:14:49 AEDT 2018


+cc: list

> On Jan 29, 2018, at 1:13 PM, Brad Bishop <bradleyb at fuzziesquirrel.com> wrote:
> 
> Thanks for the quick reply!
> 
>> On Jan 29, 2018, at 11:45 AM, Alexander Amelkin <a.amelkin at yadro.com> wrote:
>> 
>> Brad, do you have any examples or existing systems having multiple BMCs with any commercial firmware like MegaRAC ?
> 
> Not really.  I’m just familiar with IBM systems and this is something
> we have done in our commercial BMC stacks for a long time.
> 
>> 
>> I can't think of a problem you're proposing to address with multiple BMCs for a single host, but I can imagine a number of problems it may add.
> 
> Indeed.  Obviously complexity is one.  I’d be interested in hearing about
> other problems.  You’ve exposed my agenda - I’m looking for ways for IBM to
> be able to support something like this in OpenBMC but at the same time
> minimize the complexity burden for everyone else.
> 
>> 
>> BMC lockup? Solved by hardware watchdog.
>> BMC firmware corruption? Solved by read-only golden image on a separate flash IC (well supported by at least Aspeed).
>> BMC DoS attack? Solved by network isolation and overall correct network environment configuration.
>> BMC chip burnout? Does it happen at all? Isn't this an indicator of some major hardware design flaw? Does adding another chip actually solve this problem?
> 
> Yeah, on the surface I agree with all your points here, assuming the
> definition of a system is a single board.  In that case it doesn’t make sense.
> 
> We do have some modular system designs though where N discrete chassis
> can be connected together with high speed cabling or a backplane for
> a single SMP fabric across the N chassis, for example.  It's these kinds
> of system designs where multiple BMCs make a little more sense.
> 
>> 
>> What else?
> 
> It's really the connections between the BMC and the host hardware on
> these larger systems.  These busses can and do have both transient and
> hard failures, and IBM needs a way to maintain connections from a BMC
> to the host hardware in that state.  Also, while not really a redundancy
> statement, it simply isn’t economical for BMC vendors to develop SoCs
> with enough pins for systems this large.
> 
> -brad

