Design to isolate BMC service access

Wed Apr 5 08:22:37 AEST 2023

This is to gather information needed to write a design(1) for a BMC 
function to upload a "service access token" which gives access to BMC 
internals to a service agent.

BMCWeb uses the DMTF Redfish standard(2)(3), and I will pursue a Redfish
spec change for this.  If the reading from Redfish is that we should
create a custom OEM solution instead, I would pursue that at the OpenBMC
community level.  I believe we can and should (at least partially)
standardize some REST APIs in this area.

This topic was introduced and briefly discussed in the OpenBMC Security 
Working Group meeting(4) on 2023-03-29 where I agreed to write a design 
for this.  I intend to move forward with this email and a proper design.

If this interests you, please read and reply with changes needed to fit 
your scheme.  Study questions are at the bottom of the design sketch.

- Joseph

References:

 1. https://github.com/openbmc/docs/blob/master/designs/design-template.md
 2. https://redfish.dmtf.org/
 3. https://github.com/openbmc/bmcweb/blob/master/Redfish.md
 4. https://docs.google.com/document/d/1b7x9BaxsfcukQDqbvZsU2ehMq4xoJRQvLxxsDUWmAOI

------------------------------------------------------------------------

*Problem description*

The BMC has:

 1. External interfaces such as REST APIs.  These provide full
    operational control over the BMC.
 2. Internal interfaces within the BMC such as D-Bus, Systemd, and the
    sysfs file system.  These provide full access to the BMC's raw
    capabilities, are needed to provide the BMC's external function, and
    are needed to diagnose and fix problems.  Direct access to these
    internal interfaces is typically via SSH to the BMC's command shell.
 3. A default root user who has full administrator access to the BMC's
    external interfaces and access to the BMC's internal interfaces.

In some use cases, it is desirable to isolate the internal interfaces 
away from administrator users.  For example, when the BMC is part of a
system which has sensitive data, you want to isolate the BMC user from 
what little access the BMC has to that data.  When this use case is 
desired, a typical deployment involves three organizations:

 1. Development.  The development team creates the BMC's firmware, is
    responsible for the function of all internal and external
    interfaces, and may be needed to debug complex problems on
    operational systems.
 2. Operations.  The operations team installs the BMC, and the BMC
    administrator operates the BMC *only* via its intended external
    interface, and does not allow access to BMC internals.
 3. Service.  The service team diagnoses and debugs problems on
    operational BMCs, and sometimes needs to use the BMC's internal
    interfaces.

The trust relationships from an operational point of view:

 1. BMC administrators are trained only to operate the BMC's external
    interfaces.  When they find a problem, they call for authorized
    service agents.  To be extra careful, they may remove any sensitive
    workloads from the system before allowing service technicians to
    access the BMC.
 2. BMC service technicians are trained to use the BMC's internal
    interfaces.  They may not be fully trusted by the operations team
    who may carefully watch them or may operate the BMC's internal
    controls for them.
 3. BMC developers may be called to diagnose problems, and sometimes
    need access to the failing BMC's internal interfaces.

In addition, the development team may not trust the operations team to 
have access to BMC internals.  For example, operations' access to BMC 
internals can cause problems for which development may be blamed.  
Developers want to lock out everyone except (trained and trusted) 
authorized service agents from the BMC's internal interfaces.

To be clear, the problem is:

 1. The administrator users have access to the BMC's internal interfaces
    which they do not need, and that access can be used to harm the
    system.  Also, developers may want to lock out access to the BMC's
    internal interfaces.
 2. If this access is taken away, this limits the capability to
    service the BMC.  We must retain the ability to access the BMC's
    internal interfaces so we can debug the BMC.
 3. Both the BMC administrator and service must agree before access to
    BMC internals is granted.

*The solution*

A way to solve this access problem has three pieces (all needed for a 
complete solution):

 1. Make BMC firmware so administrators are allowed access only to
    the BMC's intended operational interfaces (and are not allowed to
    access BMC internal interfaces).  For example, move away from root
    account logins, and create a new "admin" account which can access
    only the BMC's external interfaces.
 2. Add function to the BMC so it can be accessed by a service user,
    where access is disabled by default, and where access can be to
    varying levels of BMC internal functions (for example, using
    "service" APIs to perform common functions, or root access to the
    BMC command shell for the deepest or most permissive level of access).
 3. Provide a way for a BMC admin user to request service access to
    their BMC, with 2 requirements:
     1. Only the BMC admin should be allowed to enable this access,
        meaning a service user should not be allowed to self-enable
        their own access (presuming they don't also have admin access).
     2. When service access is enabled, the admin user should not have
        service access.  For example, the service user should have to
        authenticate to the BMC using credentials not known to the admin.

Note: This design does not give the solution to create an "admin" 
account as in solution point 1 above.  That part of the solution is 
necessary, but it can be provided elsewhere.  This design addresses 
points 2 and 3.

A "service access token" is proposed.  Details are below but for now, a 
service access token:

  * Is a small file (kilobytes), a digitally-signed request to access a
    specific BMC function on a specific BMC for a limited time window. 
    This token may have additional information about its origin, etc.
  * Is created by an authorized service agent.  Only service agents can
    digitally sign the tokens so they can be verified by the BMC.
  * Is uploaded to the BMC by an admin user to perform a specific
    service function.
  * Has nothing that is secret to the BMC admin user.  If the token
    encodes a password, it is stored in the form of a secure hash.
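
As a rough illustration of the properties listed above, a token could be
modeled as follows.  This is only a sketch: every field name
(bmc_serial, function, not_before, not_after, salt, password_hash) and
the PBKDF2 work factor are assumptions made for the example, not a
proposed format.  The digital signature is left out here; the point is
that the token carries only a salted hash of the password, so it
contains nothing secret to the admin.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

PBKDF2_ROUNDS = 100_000  # assumed work factor, for illustration only


def hash_password(password: str, salt: bytes) -> bytes:
    """Return a salted PBKDF2-HMAC-SHA256 hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ROUNDS)


@dataclass(frozen=True)
class ServiceAccessToken:
    """Hypothetical token contents; all field names are illustrative."""
    bmc_serial: str      # the one BMC this token is valid for
    function: str        # requested service function, e.g. "shell-access"
    not_before: int      # start of the validity window, Unix seconds
    not_after: int       # end of the validity window, Unix seconds
    salt: bytes = b""
    password_hash: Optional[bytes] = None  # hash only, never the password

    def is_valid_for(self, bmc_serial: str, now: int) -> bool:
        """True if the token targets this BMC and the time window is open."""
        return (self.bmc_serial == bmc_serial
                and self.not_before <= now <= self.not_after)

    def check_password(self, candidate: str) -> bool:
        """Compare a login attempt against the stored hash in constant time."""
        if self.password_hash is None:
            return False
        return hmac.compare_digest(
            hash_password(candidate, self.salt), self.password_hash)
```

In this sketch the service agent would pick the salt (for example, 16
random bytes) and keep the clear-text password to themselves.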

Here is a sketch of the steps a BMC admin and their service agent would 
use to make a service call to gain access to BMC internals:

 1. The administrator gathers information about their BMC.  They may get
    the system model and serial number, or use the system to generate a
    token (via the system's root of trust) needed to make the service
    call.  The admin passes this data to the service agent along with
    their request for service.
 2. The service agent receives the data and (using their privileged
    position behind their organization’s firewall) creates a "service
    access token" needed to gain service access to the BMC.
 3. The service agent gives (via shared storage, email, etc.) the
    service access token to the BMC admin.
 4. The BMC admin uploads the service access token to the BMC. Doing so
    enables the service function indicated within the service access
    token.  Design question: Should the function be activated when it is
    uploaded, or via a separate activate function?
 5. If the service function is to allow root login to the BMC command
    shell, the service user can now log in to the BMC using a unique
    password associated with the service access token and known only
    to them.
 6. Other popular functions might be to recover the admin account,
    disable various security features, or perform a service dump.
    Example: Customers regularly call for service because they lost
    access to their admin account.  Recovery means, for example:
    recreate the admin account or set it to a usable state, and set its
    password to a known value, reset its password lockout, etc.
 7. The service agent then deletes or deactivates the service access
    token or allows it to expire.
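
Steps 1, 2 and 4 above can be sketched roughly as follows.  Note the
big assumption: HMAC-SHA256 with a shared demo key stands in for a real
digital signature, only because it needs nothing beyond the Python
standard library.  A real design would use a public-key scheme so the
BMC holds only a verification key and cannot create tokens itself.  The
function names and the JSON layout are hypothetical.

```python
import hashlib
import hmac
import json

# ASSUMPTION: a shared-key HMAC stands in for a public-key signature.
DEMO_KEY = b"demo-key-not-for-production"


def make_service_request(model: str, serial: str) -> dict:
    """Step 1: the admin gathers identifying data about their BMC."""
    return {"model": model, "serial": serial}


def create_token(request: dict, function: str, not_after: int, key: bytes) -> dict:
    """Step 2: the service agent builds and signs a token for that BMC."""
    payload = {
        "serial": request["serial"],
        "function": function,
        "not_after": not_after,
    }
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def bmc_accepts(token: dict, my_serial: str, now: int, key: bytes) -> bool:
    """Step 4: on upload, the BMC checks signature, target and expiry."""
    body = json.dumps(token["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return False  # payload was altered or signed with another key
    payload = token["payload"]
    return payload["serial"] == my_serial and now <= payload["not_after"]
```

For example, a token created for serial "SN123" is accepted only by the
BMC with that serial, only before `not_after`, and only if the payload
is unchanged since signing.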

To simplify the design and implementation, at most one service access 
token is allowed on the BMC at any given time.  Design question: Is this 
okay with the service planner?

Anti-replay protection for these access tokens is assumed.  For example, 
an access token used to get access to a BMC command shell could not be 
used twice: the second attempt to upload it should result in permission 
denied with reason: anti-replay protection. (In this example, login 
access is allowed multiple times until the token expires or is deleted.)
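
One simple way to get this behavior is for the BMC to remember a digest
of every token it has accepted.  The sketch below assumes that
approach; the class name is hypothetical, and a real BMC would need to
keep the record in persistent storage so it survives reboots.

```python
import hashlib


class ReplayGuard:
    """Remembers digests of tokens already uploaded to this BMC."""

    def __init__(self):
        # In-memory only for the sketch; assumed persistent on a real BMC.
        self._seen = set()

    def check_and_record(self, token_bytes: bytes) -> bool:
        """Return True on first upload, False if the token was seen before."""
        digest = hashlib.sha256(token_bytes).hexdigest()
        if digest in self._seen:
            return False  # caller reports: denied, anti-replay protection
        self._seen.add(digest)
        return True
```

This check applies to the upload only.  Logins made under an accepted
token would not go through it.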

*New BMC functions*

This new "service access" function is optionally built into the firmware
image, controlled by an image feature, and defaults to disabled (which
means the REST APIs and underlying function are not present).  Enabling
this feature includes the new REST APIs and their implementation on the
BMC.

The new BMC REST API functions (used by an admin user) for this are:

 1. Generate request for service.  The BMC returns a small file to the
    admin (which contains the BMC model and serial number, or a signed
    request, etc).
 2. Upload (POST) the service access token.  The BMC admin uses this
    to upload the service access token to the BMC.
 3. Inspect (GET) information about the service access token.
 4. Activate the service access token.  This causes the service access
    token to perform its function (take a service dump, activate the
    "service" account login, etc.).
 5. Delete the service access token.
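
The upload, inspect, activate and delete operations, together with the
rule of at most one token on the BMC, could be modeled as a small state
machine.  The sketch below is illustrative only: the class and method
names are hypothetical, the token is a plain dict, and it assumes
activation is a separate step from upload, which this design leaves as
an open question.

```python
from enum import Enum
from typing import Optional


class TokenState(Enum):
    UPLOADED = "uploaded"
    ACTIVE = "active"


class TokenSlot:
    """Holds at most one service access token at any given time."""

    def __init__(self):
        self._token = None
        self._state = None

    def upload(self, token: dict) -> None:
        """Store a token; refuse if one is already present."""
        if self._token is not None:
            raise RuntimeError("a service access token is already present")
        self._token = token
        self._state = TokenState.UPLOADED

    def inspect(self) -> Optional[dict]:
        """Return non-secret information about the token, or None."""
        if self._token is None:
            return None
        return {"function": self._token["function"],
                "state": self._state.value}

    def activate(self) -> None:
        """Mark the uploaded token as active (assumed separate step)."""
        if self._state is not TokenState.UPLOADED:
            raise RuntimeError("no inactive token to activate")
        self._state = TokenState.ACTIVE

    def delete(self) -> None:
        """Remove the token, freeing the slot for the next service call."""
        self._token = None
        self._state = None
```

If upload were to activate the token immediately, `upload` would simply
set the state to `ACTIVE` and `activate` would go away.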

The backend function which runs on the BMC would be a new D-Bus service 
to handle the "service access token", possibly with 2 popular functions: 
(1) enable service account access to the BMC command shell, and (2) 
recover admin account access.  The implementor would have freedom to 
customize these functions and to provide their own custom functions.  
Each manufacturer would have a different set of purposes for the access 
tokens, and the use cases would not be shared.
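
To illustrate how a backend might offer the two common functions while
leaving room for custom ones, here is a minimal dispatch sketch.  It is
plain Python rather than a real D-Bus service, and the function names
("enable-shell-access", "recover-admin") and return strings are
placeholders invented for the example.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]

# Registry mapping a token's function name to the code that performs it.
_HANDLERS: Dict[str, Handler] = {}


def service_function(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler under a function name."""
    def register(handler: Handler) -> Handler:
        _HANDLERS[name] = handler
        return handler
    return register


@service_function("enable-shell-access")
def enable_shell_access(token: dict) -> str:
    # Placeholder: a real handler would enable the service account login.
    return "service shell login enabled"


@service_function("recover-admin")
def recover_admin(token: dict) -> str:
    # Placeholder: a real handler would reset the admin account state.
    return "admin account recovered"


def run_service_function(token: dict) -> str:
    """Run the handler named by the token, or reject unknown functions."""
    handler = _HANDLERS.get(token["function"])
    if handler is None:
        raise ValueError("unsupported service function: " + token["function"])
    return handler(token)
```

An implementor would add their own functions by registering more
handlers, without changing the dispatch code.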

This function could be used to begin to standardize various existing 
custom schemes such as IBM's Access Control File (ACF), Microsoft's 
Secure Unlock, and others.

*Questions*

 1. Is this design sketch clear?  What improvements are needed?
 2. Who in the OpenBMC community can use this?  IBM and Microsoft have
    discussed this.
 3. Is this the right set of BMC functions to support all the use
    cases?  What else is needed?
 4. Should the "service access token" be an X.509 certificate?  Or is
    that inappropriate?
 5. Is Redfish interested in putting this into the spec (or is this
    already spec'd)?  Or should this be an OpenBMC OEM solution?
 6. Does threading this through BMCWeb, D-Bus, and a new Systemd D-Bus
    service seem right?
 7. Should uploading the service access token also activate it
    immediately, or should that be a separate step?  For example, a BMC
    admin might want to: (A) upload a token, (B) inspect the token
    (using the BMC function) to ensure it looks legitimate and performs
    the function they agreed to, and then finally (C) activate the
    token, for example, to disable secure boot.
 8. Does it make sense to have a common implementation for the functions
    listed above (such as recovering admin account access)?