[Skiboot] [PATCH v2 2/6] secvar/secboot_tpm: unify behavior for bank hash check and secboot header check

Daniel Axtens dja at axtens.net
Wed Nov 3 16:35:52 AEDT 2021


Eric Richter <erichte at linux.ibm.com> writes:

> On 11/2/21 9:16 AM, Daniel Axtens wrote:
>> Eric Richter <erichte at linux.ibm.com> writes:
>> 
>>> As the PNOR variable space cannot be locked, the data must be integrity
>>> checked when loaded to ensure it has not beeen modified by an unauthorized
>>> party. In the event that a modification has been detected (i.e. hash mismatch),
>>> we must not load in data that could potentially be compromised.
>>>
>>> However, the previous code was a bit overzealous with its reaction to detecting
>>> a compromised SECBOOT partition, and also had some inconsistencies in behavior.
>>>
>>> Case 1: SECBOOT partition cleared.
>>> .init() checks the header for the magic number and version. As neither matches,
>>> will reformat the entire partition. Now, .load_bank() will pass, as the data
>>> was just freshly reformatted (note: this also could trigger the bug addressed
>>> in the previous patch). Only variables in the TPM will be loaded by
>>> .load_bank() as the data in SECBOOT is now empty.
>>>
>>> Case 2: Bank hash mismatch.
>>> .load_bank() panics and returns an error code, causing secvar_main() to jump
>>> to the error scenario, which prevents the secvar API from being exposed.
>>> os-secure-enforcing is set unconditionally, and the user will have no API to
>>> manage or attempt to fix their system without issuing a key clear request.
>>>
>>> This patch unifies the behavior of both of these cases. Now, .init() handles
>>> checking the header AND comparing the bank hash. If either check fails, the
>>> SECBOOT partition will be reformatted. Variables in the TPM will still be
>>> loaded in the .load_bank() step, and provided the backend stores its
>>> secure boot state in the TPM, secure boot state can be preserved.
>>>
>>> Signed-off-by: Eric Richter <erichte at linux.ibm.com>
>>> Tested-by: Nick Child <nick.child at ibm.com>
>>> ---
>>>  libstb/secvar/storage/secboot_tpm.c          | 30 +++++++++++++++++++-
>>>  libstb/secvar/test/secvar-test-secboot-tpm.c |  6 ++++
>>>  2 files changed, 35 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/libstb/secvar/storage/secboot_tpm.c b/libstb/secvar/storage/secboot_tpm.c
>>> index 5907ff07..932e5fd3 100644
>>> --- a/libstb/secvar/storage/secboot_tpm.c
>>> +++ b/libstb/secvar/storage/secboot_tpm.c
>>> @@ -374,7 +374,9 @@ fail:
>>>  	return rc;
>>>  }
>>>  
>>> -static int secboot_tpm_load_variable_bank(struct list_head *bank)
>>> +
>>> +/* Helper to validate the current active SECBOOT bank's data against the hash stored in the TPM */
>>> +static int compare_bank_hash(void)
>>>  {
>>>  	char bank_hash[SHA256_DIGEST_LENGTH];
>>>  	uint64_t bit = tpmnv_control_image->active_bit;
>>> @@ -394,6 +396,15 @@ static int secboot_tpm_load_variable_bank(struct list_head *bank)
>>>  		/* Tampered pnor space detected, abandon ship */
>>>  		return OPAL_PERMISSION;
>>>  
>>> +	return OPAL_SUCCESS;
>>> +}
>>> +
>> 
>> So in doing this you take the check out of secvar_ops.load_bank(). I am
>> guessing that:
>> 
>>  - secboot_tpm_store_init() gets called before any calls to load_bank()
>> 
>
> Correct.
>
>>  - there's nothing that skiboot does that might cause the contents of
>>    the PNOR to change between store_init() and load_bank()?
>> 
>
> Correct.
>
>>  - anything that's able to change the PNOR while skiboot is running this
>>    early boot stuff is outside the threat model?
>> 
>
> Correct.
>
>> Is that right?
>
> See secvar_main() in secvar_main.c for the flow. It is one call to _init,
> then one call to _load_bank() for the variable bank, then the update bank.
>
> Side note: the entire SECBOOT partition is loaded into memory in _init().
> No further reads from PNOR are made, so even if PNOR is modified, it
> won't be read until the next boot (assuming it doesn't get overwritten
> by a write). load_bank() is more accurately "parse_bank()".
>
> This is of course assuming memory can't be arbitrarily changed in this
> time, but that would have even worse implications than this.

Yep, ok.

>
>> 
>>> +
>>> +static int secboot_tpm_load_variable_bank(struct list_head *bank)
>>> +{
>>> +	uint64_t bit = tpmnv_control_image->active_bit;
>>> +	int rc;
>>> +
>>>  	rc = secboot_tpm_deserialize_from_buffer(bank, tpmnv_vars_image->vars, tpmnv_vars_size, SECVAR_FLAG_PROTECTED);
>>>  	if (rc)
>>>  		return rc;
>>> @@ -692,8 +703,25 @@ static int secboot_tpm_store_init(void)
>>>  		rc = secboot_format();
>>>  		if (rc)
>>>  			goto error;
>>> +		goto done;
>>>  	}
>>>  
>>> +	/* Verify the active bank's integrity by comparing against the hash in TPM.
>>> +	 * Reformat if it does not match -- we do not want to load potentially
>>> +	 * compromised data.
>>> +	 * Ideally, the backend driver should retain secure boot state in
>>> +	 * protected (TPM) storage, so secure boot state should be the same, albeit
>>> +	 * without the data in unprotected (PNOR) storage.
>> 
>> Don't you have 3 reserved bytes in struct secboot_header? Could you use
>> one and say it's only defined in the case of the tpmnv_control header?
>> Or does this state already get preserved and I've just misunderstood the
>> interplay between backends/storage drivers/whatever?
>> 
>
> This is fancy wordplay since this is a storage driver that isn't supposed
> to care about secure boot modes, states, and variables.
>
> I'm specifically talking about the PK. If the PK is populated with a key,
> we are in os-secure-enforcing. That is the "secure boot state" that is
> being preserved across a PNOR wipe by being stored in TPM NV instead.

I think I get the vague picture: things will be OK because the actual
storage driver puts the PK in the TPM, but this code refers to a general
abstract storage driver rather than the one we have at the moment. I
find it all confusing but I think the design is just interacting poorly
with my general tiredness so I wouldn't read too much into it.

>> I haven't go through the rest of the series but is the rework part of
>> this series or not?
>> 
>
> No. The test case rework is next up, but it is not as urgently needed.
> Currently we are testing by hand the changes in this set.
>
>> I think I conceptually prefer just deleting the code and leaving it in
>> the git history but I don't _really_ care, especially for tests...

If you end up respinning this, I'd probably remove the code, but I won't
hold things up for that.

In conclusion, the commit seems to:
 - do what it says on the tin.

 - prevent a nasty error case (bank hash mismatch -> complete lockdown
   without any recovery mechanism other than a physical presence assert)

 - not introduce any other obvious issues. You can probably use this to
   construct a replay attack - the sort of thing you address in the next
   patch - but if you can corrupt the bank hash you can probably wipe
   the magic bytes so this hardly seems worse, and you address at least
   the obvious case in the next patch.

With the caveat that I haven't compiled it or tested it at all:
Reviewed-by: Daniel Axtens <dja at axtens.net>

Kind regards,
Daniel


More information about the Skiboot mailing list