<div dir="ltr">I'm going to resurrect this thread for the new year.<div><br></div><div>It sounds like there is a decent need for some type of aggregator. Would anyone be interested in setting up a meeting to try and synthesize our use cases into some broadly applicable requirements?</div><div><br></div><div>I'm located on the West Coast, but I have a pretty flexible schedule for other time zones next week.</div><div><br></div><div>Some topics for us to discuss (either in a meeting or offline) include:</div><div><br></div><div>1) Layer 2/3 discovery and negotiation</div><div>2) Caching, proxy, and consistency requirements</div><div>3) Target hardware, performance requirements, and scale of aggregation</div><div>4) Tooling and infrastructure improvements needed to support an aggregator</div><div>5) Amount of configuration and knowledge an aggregator needs to know a priori.</div><div><br></div><div>Any ideas on what else we can cover? Is there a preferred format or medium that would work best to gather these higher level requirements?</div><div><br></div><div></div><div>Regards,</div><div>Richard</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 19, 2019 at 2:17 AM vishwa <<a href="mailto:vishwa@linux.vnet.ibm.com">vishwa@linux.vnet.ibm.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Richard, <br>
</p>
<p>Thanks for putting it together.<br>
</p>
<div>On 12/13/19 1:32 AM, Richard Hanley
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">In our case we are working to migrate away from
IPMI to Redfish. Most of the solutions I've been thinking about
have leaned pretty heavily into that.
<div><br>
</div>
<div>In my mind I've sliced this project up into a few different
areas.
<div><br>
</div>
<div><b>Merging/Transforming Redfish Resources</b></div>
<div>Let's say that there are several Redfish services. They
will have collections of Systems, Chassis, and Managers
that need to be merged. In the simplest uses this would be
just an HTTP proxy cache with some URL cleaning.</div>
<div><br>
</div>
<div>However, this could end up being a pretty deep merge in
cases where some resources are split across multiple
management domains. Memory errors being on one node, but
the temperature sensor being on a separate node is a good
example. Another example would be the "ContainedBy" link.
These links might reach across different BMC boundaries, and
would need to be inserted by the primary node. </div>
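<div><br>
</div>
<div>A minimal sketch of the simple case: merging the Members of one
collection from several services and rewriting the member URLs so
they stay unique behind the aggregator. The service names (bmc-a,
bmc-b), the prefixing scheme, and the merge_collections helper are
assumptions for illustration, not something Redfish defines.</div>

```python
def merge_collections(collection_uri, collections):
    """Merge one Redfish collection as reported by several services.

    collections maps a (hypothetical) service name to that service's
    collection payload. Each member URL is rewritten to
    collection_uri/SERVICE_LEAF so ids from different BMCs cannot clash.
    """
    merged = []
    for service in sorted(collections):
        for member in collections[service].get("Members", []):
            # URL cleaning: /redfish/v1/Systems/1 -> /redfish/v1/Systems/bmc-a_1
            leaf = member["@odata.id"].rstrip("/").rpartition("/")[2]
            merged.append({"@odata.id": f"{collection_uri}/{service}_{leaf}"})
    return {
        "@odata.id": collection_uri,
        "Members": merged,
        "Members@odata.count": len(merged),
    }
```

<div>The deep-merge case (for example inserting "ContainedBy" links
across BMC boundaries) would need per-resource rules on top of
this.</div>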
<div><br>
</div>
<div><b>Aggregating Services and Actions</b></div>
<div>This is where I think the DMTF proposals for Redfish
aggregation (located <a href="https://members.dmtf.org/apps/org/workgroup/redfish/document.php?document_id=91811" target="_blank">here</a>) provide
the most insight. My reading of this proposal is that an
aggregation service would be used to tie actions together.
For example, there may be individual chassis reset actions
embedded in the chassis resources, and then an aggregated
action for a full reset.</div>
<div><br>
</div>
<div>DMTF seems to be leaving the arbiter of the aggregation
up to the implementation. I'd imagine that some
implementations would provide a static aggregation service,
while others would allow clients to create their own dynamic
aggregates.</div>
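<div><br>
</div>
<div>One way to picture an aggregated action, as a sketch only: the
aggregate holds the individual Chassis.Reset action targets and
fans a single request out to each of them. The aggregate_reset name
and the post callable are hypothetical, and how failures should be
reported is an open choice; here a failed member is simply recorded
as None.</div>

```python
def aggregate_reset(targets, post, reset_type="ForceRestart"):
    """Send the same reset request to every member action target.

    post is a hypothetical callable (target_uri, body) -> HTTP status code;
    a real aggregator would issue an HTTP POST to the owning BMC here.
    """
    results = {}
    for target in targets:
        try:
            results[target] = post(target, {"ResetType": reset_type})
        except ConnectionError:
            # Record the failure and keep going, so one unreachable
            # node does not block the rest of the aggregate.
            results[target] = None
    return results
```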
<div><b><br>
</b></div>
<div><b>Discovery, Negotiation, and Error Recovery</b></div>
<div>This is an area where I'd like to hear more about your
requirements, Vishwa. Would you expect the BMC cluster to
be hot-swappable? Is there a particular reason that it has
to be peer to peer? What kind of error recovery should be
supported when a node fails? </div>
<div><br>
</div>
<div>At a high level, the idea that has been suggested
internally is to have a designated master node at install
time. That node would discover any other Redfish services
on the LAN, and begin aggregating them. The master node
would keep an in-memory cache of the other services, and
reload resources on demand. If a node goes down, then the
error is propagated using HTTP return codes. If the master
node goes down, then the entire aggregate will go down. In
theory a client could talk to individual nodes if it needed
to.</div>
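<div><br>
</div>
<div>A rough sketch of that master-node behaviour, assuming a fetch
callable that stands in for the real HTTP GET to the other
service. The AggregatorCache name, the fetch signature, and the
choice of 502 as the status for an unreachable node are all
assumptions made for illustration.</div>

```python
class AggregatorCache:
    """In-memory cache of resources owned by other Redfish services."""

    def __init__(self, fetch):
        # fetch is a hypothetical callable (node, path) -> dict that
        # raises ConnectionError when the node cannot be reached.
        self._fetch = fetch
        self._cache = {}

    def get(self, node, path, refresh=False):
        """Return (http_status, body), loading the resource on demand."""
        key = (node, path)
        if refresh or key not in self._cache:
            try:
                self._cache[key] = self._fetch(node, path)
            except ConnectionError:
                # Drop any stale copy and surface the failure to the
                # client as an HTTP status code.
                self._cache.pop(key, None)
                return 502, {"error": f"{node} is unreachable"}
        return 200, self._cache[key]
```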
<div><b><br>
</b></div>
</div>
</div>
</blockquote>
<p>Case-1:<br>
.......<br>
</p>
<p>Consider a hypothetical case where I have 4 compute nodes, each
with its own BMC that is responsible for initiating
power-on and other services for that node, getting the debug data
out of that node, etc.</p>
<p>We would want an external Management Console (MC) to manage this
rack. Instead of going to 4 nodes separately, the MC can ask 1 BMC,
which I am calling the "Point Of Contact" (POC) BMC or Primary BMC for that
rack. It is the job of that BMC to do whatever is needed to return
the result.</p>
<p>Similarly, when the POC goes down, we would need another POC.</p>
<p>I believe Redfish discovery can be used to discover each BMC.
But how does the heartbeat work between the discovered BMCs?<br>
Also, when the POC goes down, how can we sense that and make some
other BMC the POC using the Redfish framework?</p>
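<p>To make the heartbeat question concrete, here is a minimal sketch
that assumes each peer is polled periodically (for example with a
GET on its service root) and that a peer is treated as down once no
poll has succeeded within a timeout. The HeartbeatMonitor class,
the BMC ids, and the timeout value are hypothetical; whether
Redfish itself should carry the heartbeat is still the open
question.</p>

```python
class HeartbeatMonitor:
    """Track when each BMC was last heard from."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def record(self, bmc_id, now):
        # Call this whenever a poll of bmc_id succeeds.
        self.last_seen[bmc_id] = now

    def is_alive(self, bmc_id, now):
        seen = self.last_seen.get(bmc_id)
        return seen is not None and self.timeout_s >= now - seen

    def failed(self, now):
        """Return the known BMCs whose heartbeat has timed out."""
        return sorted(b for b in self.last_seen if not self.is_alive(b, now))
```

<p>Timestamps are passed in explicitly so that the timeout logic can
be exercised without a real clock.</p>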
<p><br>
Case-2:<br>
.......</p>
<p>I have a control node that houses 2 BMCs. One can be Primary
and the other can be Slave. Each BMC has a complete view of the
whole system. <br>
</p>
<p>I am assuming we could still discover the other BMC using
Redfish. But again, how do we exchange heartbeats and do failover
operations?</p>
<p>Thanks,</p>
<p>!! Vishwa !!<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div>
<div><b> Authentication and Authorization</b></div>
</div>
<div>This is an area where I think Redfish is a little hands
off. In an ideal world ACLs could be set up without
proliferating usernames and passwords across nodes. As an aside,
we've been thinking about how to use Redfish without any
usernames or passwords. By using a combination of
certificates and authorization tokens it should be possible to
extend a security zone to a small cluster of BMCs.</div>
<div><br>
</div>
<div>Regards,</div>
<div>Richard</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Dec 11, 2019 at 11:33
PM Neeraj Ladkani <<a href="mailto:neladk@microsoft.com" target="_blank">neladk@microsoft.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div>
<p class="MsoNormal"><span style="color:windowtext">Sure,
how do we want to enable BMC-BMC communication?
Standard Redfish/IPMI?
</span></p>
<p class="MsoNormal"><span style="color:windowtext"> </span></p>
<p class="MsoNormal"><span style="color:windowtext">Neeraj</span></p>
<p class="MsoNormal"><span style="color:windowtext"> </span></p>
<p class="MsoNormal"><span style="color:windowtext"> </span></p>
<div>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0in 0in">
<p class="MsoNormal"><b><span style="color:windowtext">From:</span></b><span style="color:windowtext"> vishwa <<a href="mailto:vishwa@linux.vnet.ibm.com" target="_blank">vishwa@linux.vnet.ibm.com</a>>
<br>
<b>Sent:</b> Wednesday, December 11, 2019 10:59 PM<br>
<b>To:</b> Neeraj Ladkani <<a href="mailto:neladk@microsoft.com" target="_blank">neladk@microsoft.com</a>><br>
<b>Cc:</b> <a href="mailto:openbmc@lists.ozlabs.org" target="_blank">openbmc@lists.ozlabs.org</a>;
<a href="mailto:sgundura@in.ibm.com" target="_blank">sgundura@in.ibm.com</a>;
<a href="mailto:kusripat@in.ibm.com" target="_blank">kusripat@in.ibm.com</a>;
<a href="mailto:shahjsha@in.ibm.com" target="_blank">shahjsha@in.ibm.com</a>;
<a href="mailto:vikantan@in.ibm.com" target="_blank">vikantan@in.ibm.com</a>;
Richard Hanley <<a href="mailto:rhanley@google.com" target="_blank">rhanley@google.com</a>><br>
<b>Subject:</b> Re: [EXTERNAL] Re: Managing
heterogeneous systems</span></p>
</div>
</div>
<p class="MsoNormal"> </p>
<div>
<p class="MsoNormal">On 12/10/19 3:20 PM, Neeraj Ladkani
wrote:</p>
</div>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<p class="MsoNormal"><span style="color:rgb(0,32,96)">Great
discussion. </span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)"> </span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)">The
problem is not the physical interface, as they can
communicate using LAN. The problem is entity binding,
as one compute node can be connected to one or more
storage nodes. How can we have one view of the system
from an operational perspective: power on/off, SEL
logs, telemetry? </span></p>
</blockquote>
<div>
<p class="MsoNormal"><span style="color:windowtext"> </span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:windowtext"><br>
Correct. This is where I mentioned the "Primary
BMC acting as Point Of Contact" for external
requests.<br>
Depending on how we want to service the request, we
could orchestrate it via the PoC BMC, or tell
external requesters where they can get the data
so that they connect to those BMCs directly.</span></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12pt"><span style="color:windowtext"><br>
!! Vishwa !!</span></p>
</div>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<p class="MsoNormal"><span style="color:rgb(0,32,96)"> </span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)">Some
of the problems:</span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)"> </span></p>
<ol style="margin-top:0in" start="1" type="1">
<li style="color:rgb(0,32,96);margin-left:0in">
Power operations: power and reset operations need to be
coordinated across all nodes in a system. </li>
<li style="color:rgb(0,32,96);margin-left:0in">
Telemetry: the OS runs only on the head node, so a
request to read telemetry should return
telemetry (SEL logs, sensor values) from all the
nodes.
</li>
<li style="color:rgb(0,32,96);margin-left:0in">
Firmware update</li>
<li style="color:rgb(0,32,96);margin-left:0in">
RAS: memory errors are logged by UEFI SMM on the head
node, but the corresponding DIMM temperature and inlet
temperature are logged on the secondary node, and the two
are not mapped to each other. </li>
</ol>
<p class="MsoNormal"><span style="color:rgb(0,32,96)"> </span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)"> </span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)">I
have been exploring a couple of routes:
</span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)"> </span></p>
<ol style="margin-top:0in" start="1" type="1">
<li style="color:rgb(0,32,96);margin-left:0in">
LUN discovery and routing: this is similar to IPMI,
but I am working on an architecture to extend this to
support multiple LUNs and route them from the head node
(we would need LUN routing over LAN).
</li>
<li style="color:rgb(0,32,96);margin-left:0in">
Redfish hierarchy for systems </li>
</ol>
<pre><span style="color:black">  "Systems": {</span></pre>
<pre><span style="color:black">    "@odata.id": "/redfish/v1/Systems"</span></pre>
<pre><span style="color:black">  },</span></pre>
<pre><span style="color:black">  "Chassis": {</span></pre>
<pre><span style="color:black">    "@odata.id": "/redfish/v1/Chassis"</span></pre>
<pre><span style="color:black">  },</span></pre>
<pre><span style="color:black">  "Managers": {</span></pre>
<pre><span style="color:black">    "@odata.id": "/redfish/v1/Managers"</span></pre>
<pre><span style="color:black">  },</span></pre>
<pre><span style="color:black">  "AccountService": {</span></pre>
<pre><span style="color:black">    "@odata.id": "/redfish/v1/AccountService"</span></pre>
<pre><span style="color:black">  },</span></pre>
<pre><span style="color:black">  "SessionService": {</span></pre>
<pre><span style="color:black">    "@odata.id": "/redfish/v1/SessionService"</span></pre>
<pre><span style="color:black">  },</span></pre>
<pre><span style="color:black">  "Links": {</span></pre>
<pre><span style="color:black">    "Sessions": {</span></pre>
<pre><span style="color:black">      "@odata.id": "/redfish/v1/SessionService/Sessions"</span></pre>
<pre><span style="color:black">    }</span></pre>
<pre><span style="color:black">  }</span></pre>
<pre style="margin-left:0.5in"><span>3.<span style="font:7pt 'Times New Roman'"> </span></span><span style="font-family:Calibri,sans-serif;color:rgb(0,32,96)">Custom messaging over LAN (PubSub)</span></pre>
<p class="MsoNormal"><span style="color:rgb(0,32,96)"> </span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)">I
am also working on a whitepaper in the same area :)
Happy to work with you
guys if you have any ideas on how we can standardize
this.
</span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)"> </span></p>
<p class="MsoNormal"><span style="color:rgb(0,32,96)">Neeraj</span></p>
<p class="MsoNormal"><span style="color:windowtext"> </span></p>
<div>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0in 0in">
<p class="MsoNormal"><b><span style="color:windowtext">From:</span></b><span style="color:windowtext"> vishwa
<a href="mailto:vishwa@linux.vnet.ibm.com" target="_blank"><vishwa@linux.vnet.ibm.com></a>
<br>
<b>Sent:</b> Tuesday, December 10, 2019 1:00 AM<br>
<b>To:</b> Richard Hanley <a href="mailto:rhanley@google.com" target="_blank"><rhanley@google.com></a>;
Neeraj Ladkani
<a href="mailto:neladk@microsoft.com" target="_blank"><neladk@microsoft.com></a><br>
<b>Cc:</b> <a href="mailto:openbmc@lists.ozlabs.org" target="_blank">openbmc@lists.ozlabs.org</a>;
<a href="mailto:sgundura@in.ibm.com" target="_blank">sgundura@in.ibm.com</a>;
<a href="mailto:kusripat@in.ibm.com" target="_blank">
kusripat@in.ibm.com</a>; <a href="mailto:shahjsha@in.ibm.com" target="_blank">shahjsha@in.ibm.com</a>;
<a href="mailto:vikantan@in.ibm.com" target="_blank">vikantan@in.ibm.com</a><br>
<b>Subject:</b> [EXTERNAL] Re: Managing
heterogeneous systems</span></p>
</div>
</div>
<p class="MsoNormal"> </p>
<p>Hi Richard / Neeraj,</p>
<p>Thanks for bringing this up. It's one of the
interesting topics for IBM.</p>
<p>Some thoughts here:</p>
<p>When we have multiple BMCs as part of a single
system, there are 3 main parts to it.</p>
<p>1/. Discovering the peer BMCs and role assignment<br>
2/. Monitoring the existence of peer BMCs - heartbeat
<br>
3/. In the event of losing the master, detect it
using #2 and then reassign the role</p>
<p>Depending on how we want to establish the roles, we
could have Single-Master, Many-Slave or Multi-Master,
Multi-Slave, etc.</p>
<p>One of the teams here is trying to do a proof of concept for a
multi-BMC architecture and is still at a very early
stage.
<br>
The team is currently studying/evaluating the
available solutions: Corosync / Heartbeat /
Pacemaker.<br>
Corosync works nicely with clusters, but we need to
see if we can trim it down for a BMC.<br>
<br>
If we cannot use Corosync for some reason, then we need
to see if we can do the discovery using PLDM
(probably using the terminus IDs)<br>
and come up with custom rules for assigning
Master-Slave roles.</p>
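<p>As an example of what such a custom rule could look like, here is
a small sketch that keeps the current master while it is still
reachable and otherwise promotes the lowest id. The ids could be
PLDM terminus IDs if discovery ends up working that way; the
assign_roles name, the lowest-id rule, and the "primary" /
"secondary" labels used for the Master-Slave roles are assumptions
for illustration only.</p>

```python
def assign_roles(live_ids, current_primary=None):
    """Return {bmc_id: role} for the BMCs that are currently reachable.

    The rule is an assumption for illustration: keep the current
    primary while it is reachable, otherwise promote the lowest id.
    """
    ids = sorted(set(live_ids))
    if not ids:
        return {}
    primary = current_primary if current_primary in ids else ids[0]
    return {i: ("primary" if i == primary else "secondary") for i in ids}
```

<p>Keeping the existing primary when possible avoids changing roles
every time a lower-numbered BMC rejoins.</p>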
<p>If we choose to have Single-Master and Many-Slave, we
could have that Single-Master act as the
Point of Contact for external requests, and it could then
orchestrate with the needed BMCs internally to get the
job done.</p>
<p>I would be happy to hear of any alternatives
that suit a BMC kind of architecture.</p>
<p>!! Vishwa !!</p>
<div>
<p class="MsoNormal">On 12/10/19 4:32 AM, Richard
Hanley wrote:</p>
</div>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<div>
<p class="MsoNormal">Hi Neeraj, </p>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">This is an open question that
I've been looking into as well. </p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">For BMC to BMC communication
there are a few options.</p>
</div>
<div>
<ol start="1" type="1">
<li class="MsoNormal">
If you have network connectivity you can
communicate using Redfish.</li>
<li class="MsoNormal">
If you only have a PCIe connection, you'll
have to use either the inband connection or
the side band I2C*. PLDM and MCTP are
protocols that are defined to handle this use
case, although I'm not sure if the OpenBMC
implementations have been used in production.</li>
<li class="MsoNormal">
There is always IPMI, which has its own
pros/cons.</li>
</ol>
<div>
<p class="MsoNormal">For taking several BMCs and
aggregating them into a single logical
interface that is exposed to the outside
world, there are a few things happening on
that front. DMTF has been working on an
aggregation protocol for Redfish. However,
it's my understanding that their proposal is
more directed at the client level, as opposed
to within a single "system".</p>
</div>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">I just recently joined the
community, but I've been thinking about how a
proxy layer could merge two Redfish services
together. Since Redfish is fairly strongly
typed and has a well defined mechanism for OEM
extensions, this should be pretty generally
applicable. I am planning on having a white
paper on the issue sometime after the holidays.</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Another thing to note,
recently DMTF released a spec for running a
binary Redfish over PLDM called RDE. That might
be a useful way of tying all these concepts
together. </p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">I'd be curious about your
thoughts and use cases here. Would either PLDM
or Redfish fit your use case?</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">Regards,</p>
</div>
<div>
<p class="MsoNormal">Richard</p>
</div>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">*I've heard of some proposals
that run a network interface over PCIe. I don't
know enough about PCIe to know if this is a good
idea.</p>
</div>
</div>
<p class="MsoNormal"> </p>
<div>
<div>
<p class="MsoNormal">On Mon, Dec 9, 2019 at 1:27
PM Neeraj Ladkani <<a href="mailto:neladk@microsoft.com" target="_blank">neladk@microsoft.com</a>>
wrote:</p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0in 0in 0in 6pt;margin:5pt 0in 5pt 4.8pt">
<div>
<div>
<p class="MsoNormal">Are there any standards
for managing heterogeneous systems? For
example, in a rack there may be a compute
node (with its own BMC) and a storage node
(with its own BMC) connected using a PCIe
switch. How are these two BMCs represented as
one system? Are there any standards for
BMC – BMC communication?
</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Neeraj</p>
<p class="MsoNormal"> </p>
</div>
</div>
</blockquote>
</div>
</blockquote>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div>