[PATCH v8 1/5] crypto: aspeed: Add HACE hash driver

Tue Aug 9 17:39:10 AEST 2022

> >> On 2022/7/26 19:34, Neal Liu wrote:
> >>> Hash and Crypto Engine (HACE) is designed to accelerate the
> >>> throughput of hash data digest, encryption, and decryption.
> >>>
> >>> Basically, HACE can be divided into two independently engines
> >>> - Hash Engine and Crypto Engine. This patch aims to add HACE hash
> >>> engine driver for hash accelerator.
> >>>
> >>> Signed-off-by: Neal Liu <neal_liu at aspeedtech.com>
> >>> Signed-off-by: Johnny Huang <johnny_huang at aspeedtech.com>
> >>> ---
> >>>  MAINTAINERS                              |    7 +
> >>>  drivers/crypto/Kconfig                   |    1 +
> >>>  drivers/crypto/Makefile                  |    1 +
> >>>  drivers/crypto/aspeed/Kconfig            |   32 +
> >>>  drivers/crypto/aspeed/Makefile           |    6 +
> >>>  drivers/crypto/aspeed/aspeed-hace-hash.c | 1389
> >> ++++++++++++++++++++++
> >>>  drivers/crypto/aspeed/aspeed-hace.c      |  213 ++++
> >>>  drivers/crypto/aspeed/aspeed-hace.h      |  186 +++
> >>>  8 files changed, 1835 insertions(+)  create mode 100644
> >>> drivers/crypto/aspeed/Kconfig  create mode 100644
> >>> drivers/crypto/aspeed/Makefile  create mode 100644
> >>> drivers/crypto/aspeed/aspeed-hace-hash.c
> >>>  create mode 100644 drivers/crypto/aspeed/aspeed-hace.c
> >>>  create mode 100644 drivers/crypto/aspeed/aspeed-hace.h
> >>>
> >>> diff --git a/MAINTAINERS b/MAINTAINERS index
> >>> f55aea311af5..23a0215b7e42 100644
> >>> --- a/MAINTAINERS
> >>> +++ b/MAINTAINERS
> >>> @@ -3140,6 +3140,13 @@ S:	Maintained
> >>>  F:	Documentation/devicetree/bindings/media/aspeed-video.txt
> >>>  F:	drivers/media/platform/aspeed/
> >>>
> >>> +ASPEED CRYPTO DRIVER
> >>> +M:	Neal Liu <neal_liu at aspeedtech.com>
> >>> +L:	linux-aspeed at lists.ozlabs.org (moderated for non-subscribers)
> >>> +S:	Maintained
> >>> +F:
> >> 	Documentation/devicetree/bindings/crypto/aspeed,ast2500-hace.yaml
> >>> +F:	drivers/crypto/aspeed/
> >>> +
> >>>  ASUS NOTEBOOKS AND EEEPC ACPI/WMI EXTRAS DRIVERS
> >>>  M:	Corentin Chary <corentin.chary at gmail.com>
> >>>  L:	acpi4asus-user at lists.sourceforge.net
> >>> diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index
> >>> ee99c02c84e8..b9f5ee126881 100644
> >>> --- a/drivers/crypto/Kconfig
> >>> +++ b/drivers/crypto/Kconfig
> >>> @@ -933,5 +933,6 @@ config CRYPTO_DEV_SA2UL
> >>>  	  acceleration for cryptographic algorithms on these devices.
> >>>
> >>>  source "drivers/crypto/keembay/Kconfig"
> >>> +source "drivers/crypto/aspeed/Kconfig"
> >>>
> >>>  endif # CRYPTO_HW
> >>> diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile index
> >>> f81703a86b98..116de173a66c 100644
> >>> --- a/drivers/crypto/Makefile
> >>> +++ b/drivers/crypto/Makefile
> >>> @@ -1,5 +1,6 @@
> >>>  # SPDX-License-Identifier: GPL-2.0
> >>>  obj-$(CONFIG_CRYPTO_DEV_ALLWINNER) += allwinner/
> >>> +obj-$(CONFIG_CRYPTO_DEV_ASPEED) += aspeed/
> >>>  obj-$(CONFIG_CRYPTO_DEV_ATMEL_AES) += atmel-aes.o
> >>>  obj-$(CONFIG_CRYPTO_DEV_ATMEL_SHA) += atmel-sha.o
> >>>  obj-$(CONFIG_CRYPTO_DEV_ATMEL_TDES) += atmel-tdes.o diff --git
> >>> a/drivers/crypto/aspeed/Kconfig b/drivers/crypto/aspeed/Kconfig new
> >>> file mode 100644 index 000000000000..059e627efef8
> >>> --- /dev/null
> >>> +++ b/drivers/crypto/aspeed/Kconfig
> >>> @@ -0,0 +1,32 @@
> >>> +config CRYPTO_DEV_ASPEED
> >>> +	tristate "Support for Aspeed cryptographic engine driver"
> >>> +	depends on ARCH_ASPEED
> >>> +	help
> >>> +	  Hash and Crypto Engine (HACE) is designed to accelerate the
> >>> +	  throughput of hash data digest, encryption and decryption.
> >>> +
> >>> +	  Select y here to have support for the cryptographic driver
> >>> +	  available on Aspeed SoC.
> >>> +
> >>> +config CRYPTO_DEV_ASPEED_HACE_HASH
> >>> +	bool "Enable Aspeed Hash & Crypto Engine (HACE) hash"
> >>> +	depends on CRYPTO_DEV_ASPEED
> >>> +	select CRYPTO_ENGINE
> >>> +	select CRYPTO_SHA1
> >>> +	select CRYPTO_SHA256
> >>> +	select CRYPTO_SHA512
> >>> +	select CRYPTO_HMAC
> >>> +	help
> >>> +	  Select here to enable Aspeed Hash & Crypto Engine (HACE)
> >>> +	  hash driver.
> >>> +	  Supports multiple message digest standards, including
> >>> +	  SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, and so on.
> >>> +
> >>> +config CRYPTO_DEV_ASPEED_HACE_HASH_DEBUG
> >>> +	bool "Enable HACE hash debug messages"
> >>> +	depends on CRYPTO_DEV_ASPEED_HACE_HASH
> >>> +	help
> >>> +	  Print HACE hash debugging messages if you use this option
> >>> +	  to ask for those messages.
> >>> +	  Avoid enabling this option for production build to
> >>> +	  minimize driver timing.
> >>> diff --git a/drivers/crypto/aspeed/Makefile
> >> b/drivers/crypto/aspeed/Makefile
> >>> new file mode 100644
> >>> index 000000000000..8bc8d4fed5a9
> >>> --- /dev/null
> >>> +++ b/drivers/crypto/aspeed/Makefile
> >>> @@ -0,0 +1,6 @@
> >>> +obj-$(CONFIG_CRYPTO_DEV_ASPEED) += aspeed_crypto.o
> >>> +aspeed_crypto-objs := aspeed-hace.o \
> >>> +		      $(hace-hash-y)
> >>> +
> >>> +obj-$(CONFIG_CRYPTO_DEV_ASPEED_HACE_HASH) +=
> aspeed-hace-hash.o
> >>> +hace-hash-$(CONFIG_CRYPTO_DEV_ASPEED_HACE_HASH) :=
> >> aspeed-hace-hash.o
> >>> diff --git a/drivers/crypto/aspeed/aspeed-hace-hash.c
> >> b/drivers/crypto/aspeed/aspeed-hace-hash.c
> >>> new file mode 100644
> >>> index 000000000000..63a8ad694996
> >>> --- /dev/null
> >>> +++ b/drivers/crypto/aspeed/aspeed-hace-hash.c
> >>> @@ -0,0 +1,1389 @@

[...]

> >>> +static int aspeed_ahash_dma_prepare_sg(struct aspeed_hace_dev
> >> *hace_dev)
> >>> +{
> >>> +	struct aspeed_engine_hash *hash_engine =
> &hace_dev->hash_engine;
> >>> +	struct ahash_request *req = hash_engine->req;
> >>> +	struct aspeed_sham_reqctx *rctx = ahash_request_ctx(req);
> >>> +	struct aspeed_sg_list *src_list;
> >>> +	struct scatterlist *s;
> >>> +	int length, remain, sg_len, i;
> >>> +	int rc = 0;
> >>> +
> >>> +	remain = (rctx->total + rctx->bufcnt) % rctx->block_size;
> >>> +	length = rctx->total + rctx->bufcnt - remain;
> >>> +
> >>> +	AHASH_DBG(hace_dev, "%s:0x%x, %s:0x%x, %s:0x%x, %s:0x%x\n",
> >>> +		  "rctx total", rctx->total, "bufcnt", rctx->bufcnt,
> >>> +		  "length", length, "remain", remain);
> >>> +
> >>> +	sg_len = dma_map_sg(hace_dev->dev, rctx->src_sg, rctx->src_nents,
> >>> +			    DMA_TO_DEVICE);
> >>> +	if (!sg_len) {
> >>> +		dev_warn(hace_dev->dev, "dma_map_sg() src error\n");
> >>> +		rc = -ENOMEM;
> >>> +		goto end;
> >>> +	}
> >>> +
> >>> +	src_list = (struct aspeed_sg_list *)hash_engine->ahash_src_addr;
> >>> +	rctx->digest_dma_addr = dma_map_single(hace_dev->dev,
> rctx->digest,
> >>> +					       SHA512_DIGEST_SIZE,
> >>> +					       DMA_BIDIRECTIONAL);
> >>> +	if (dma_mapping_error(hace_dev->dev, rctx->digest_dma_addr)) {
> >>> +		dev_warn(hace_dev->dev, "dma_map() rctx digest error\n");
> >>> +		rc = -ENOMEM;
> >>> +		goto free_src_sg;
> >>> +	}
> >>> +
> >>> +	if (rctx->bufcnt != 0) {
> >>> +		rctx->buffer_dma_addr = dma_map_single(hace_dev->dev,
> >>> +						       rctx->buffer,
> >>> +						       rctx->block_size * 2,
> >>> +						       DMA_TO_DEVICE);
> >>> +		if (dma_mapping_error(hace_dev->dev,
> rctx->buffer_dma_addr)) {
> >>> +			dev_warn(hace_dev->dev, "dma_map() rctx buffer
> error\n");
> >>> +			rc = -ENOMEM;
> >>> +			goto free_rctx_digest;
> >>> +		}
> >>> +
> >>> +		src_list[0].phy_addr = rctx->buffer_dma_addr;
> >>> +		src_list[0].len = rctx->bufcnt;
> >>> +		length -= src_list[0].len;
> >>> +
> >>> +		/* Last sg list */
> >>> +		if (length == 0)
> >>> +			src_list[0].len |= HASH_SG_LAST_LIST;
> >>> +
> >>> +		src_list[0].phy_addr = cpu_to_le32(src_list[0].phy_addr);
> >>> +		src_list[0].len = cpu_to_le32(src_list[0].len);
> >>> +		src_list++;
> >>> +	}
> >>> +
> >>> +	if (length != 0) {
> >>> +		for_each_sg(rctx->src_sg, s, sg_len, i) {
> >>> +			src_list[i].phy_addr = sg_dma_address(s);
> >>> +
> >>> +			if (length > sg_dma_len(s)) {
> >>> +				src_list[i].len = sg_dma_len(s);
> >>> +				length -= sg_dma_len(s);
> >>> +
> >>> +			} else {
> >>> +				/* Last sg list */
> >>> +				src_list[i].len = length;
> >>> +				src_list[i].len |= HASH_SG_LAST_LIST;
> >>> +				length = 0;
> >>> +			}
> >>> +
> >>> +			src_list[i].phy_addr = cpu_to_le32(src_list[i].phy_addr);
> >>> +			src_list[i].len = cpu_to_le32(src_list[i].len);
> >>> +		}
> >>> +	}
> >>> +
> >>> +	if (length != 0) {
> >>> +		rc = -EINVAL;
> >>> +		goto free_rctx_buffer;
> >>> +	}
> >>> +
> >>> +	rctx->offset = rctx->total - remain;
> >>> +	hash_engine->src_length = rctx->total + rctx->bufcnt - remain;
> >>> +	hash_engine->src_dma = hash_engine->ahash_src_dma_addr;
> >>> +	hash_engine->digest_dma = rctx->digest_dma_addr;
> >>> +
> >>> +	goto end;
> >> Exiting via "goto xx" is not recommended in normal code logic (this
> >> requires two jumps), exiting via "return 0" is more efficient.
> >> This code method has many times in your entire driver, it is
> >> recommended to modify it.
> >
> > If not exiting via "goto xx", how to release related resources without any
> problem?
> > Is there any proper way to do this?
> maybe I didn't describe it clearly enough.
> "in normal code logic"  means rc=0
> In this scenario (rc=0), "goto xx" is no longer required, it can be replaced with
> "return 0"

Okay, I got your point. In this case, "goto end" is no longer required of course.
I would send next patch with this fixed included.

> >
> >>> +
> >>> +free_rctx_buffer:
> >>> +	if (rctx->bufcnt != 0)
> >>> +		dma_unmap_single(hace_dev->dev, rctx->buffer_dma_addr,
> >>> +				 rctx->block_size * 2, DMA_TO_DEVICE);
> >>> +free_rctx_digest:
> >>> +	dma_unmap_single(hace_dev->dev, rctx->digest_dma_addr,
> >>> +			 SHA512_DIGEST_SIZE, DMA_BIDIRECTIONAL);
> >>> +free_src_sg:
> >>> +	dma_unmap_sg(hace_dev->dev, rctx->src_sg, rctx->src_nents,
> >>> +		     DMA_TO_DEVICE);
> >>> +end:
> >>> +	return rc;
> >>> +}
> >>> +
> >>> +static int aspeed_ahash_complete(struct aspeed_hace_dev *hace_dev)
> >>> +{
> >>> +	struct aspeed_engine_hash *hash_engine =
> &hace_dev->hash_engine;
> >>> +	struct ahash_request *req = hash_engine->req;
> >>> +
> >>> +	AHASH_DBG(hace_dev, "\n");
> >>> +
> >>> +	hash_engine->flags &= ~CRYPTO_FLAGS_BUSY;
> >>> +
> >>> +	crypto_finalize_hash_request(hace_dev->crypt_engine_hash, req, 0);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * Copy digest to the corresponding request result.
> >>> + * This function will be called at final() stage.
> >>> + */
> >>> +static int aspeed_ahash_transfer(struct aspeed_hace_dev *hace_dev)
> >>> +{
> >>> +	struct aspeed_engine_hash *hash_engine =
> &hace_dev->hash_engine;
> >>> +	struct ahash_request *req = hash_engine->req;
> >>> +	struct aspeed_sham_reqctx *rctx = ahash_request_ctx(req);
> >>> +
> >>> +	AHASH_DBG(hace_dev, "\n");
> >>> +
> >>> +	dma_unmap_single(hace_dev->dev, rctx->digest_dma_addr,
> >>> +			 SHA512_DIGEST_SIZE, DMA_BIDIRECTIONAL);
> >>> +
> >>> +	dma_unmap_single(hace_dev->dev, rctx->buffer_dma_addr,
> >>> +			 rctx->block_size * 2, DMA_TO_DEVICE);
> >>> +
> >>> +	memcpy(req->result, rctx->digest, rctx->digsize);
> >>> +
> >>> +	return aspeed_ahash_complete(hace_dev); }
> >>> +
> >>> +/*
> >>> + * Trigger hardware engines to do the math.
> >>> + */
> >>> +static int aspeed_hace_ahash_trigger(struct aspeed_hace_dev
> *hace_dev,
> >>> +				     aspeed_hace_fn_t resume)
> >>> +{
> >>> +	struct aspeed_engine_hash *hash_engine =
> &hace_dev->hash_engine;
> >>> +	struct ahash_request *req = hash_engine->req;
> >>> +	struct aspeed_sham_reqctx *rctx = ahash_request_ctx(req);
> >>> +
> >>> +	AHASH_DBG(hace_dev, "src_dma:0x%x, digest_dma:0x%x,
> >> length:0x%x\n",
> >>> +		  hash_engine->src_dma, hash_engine->digest_dma,
> >>> +		  hash_engine->src_length);
> >>> +
> >>> +	rctx->cmd |= HASH_CMD_INT_ENABLE;
> >>> +	hash_engine->resume = resume;
> >>> +
> >>> +	ast_hace_write(hace_dev, hash_engine->src_dma,
> >> ASPEED_HACE_HASH_SRC);
> >>> +	ast_hace_write(hace_dev, hash_engine->digest_dma,
> >>> +		       ASPEED_HACE_HASH_DIGEST_BUFF);
> >>> +	ast_hace_write(hace_dev, hash_engine->digest_dma,
> >>> +		       ASPEED_HACE_HASH_KEY_BUFF);
> >>> +	ast_hace_write(hace_dev, hash_engine->src_length,
> >>> +		       ASPEED_HACE_HASH_DATA_LEN);
> >>> +
> >>> +	/* Memory barrier to ensure all data setup before engine starts */
> >>> +	mb();
> >>> +
> >>> +	ast_hace_write(hace_dev, rctx->cmd, ASPEED_HACE_HASH_CMD);
> >> A hardware service sending requires 5 hardware commands to complete.
> >> In a multi-concurrency scenario, how to ensure the order of commands?
> >> (If two processes send hardware task at the same time, How to ensure
> >> that the hardware recognizes which task the current command belongs
> >> to?)
> >
> > Linux crypto engine would guarantee that only one request at each time to
> be dequeued from engine queue to process.
> > And there has lock mechanism inside Linux crypto engine to prevent the
> scenario you mentioned.
> > So only 1 aspeed_hace_ahash_trigger() hardware service would go through
> at a time.
> >
> > [...]
> > .
> >
> You may not understand what I mean, the command flow in a normal scenario:
> request_A: Acmd1-->Acmd2-->Acmd3-->Acmd4-->Acmd5
> request_B: Bcmd1-->Bcmd2-->Bcmd3-->Bcmd4-->Bcmd5
> In a multi-process concurrent scenario, multiple crypto engines can be enabled,
> and each crypto engine sends a request. If multiple requests here enter
> aspeed_hace_ahash_trigger() at the same time, the command flow will be
> intertwined like this:
> request_A, request_B:
> Acmd1-->Bcmd1-->Acmd2-->Acmd3-->Bcmd2-->Acmd4-->Bcmd3-->Bcmd4-->A
> cmd5-->Bcmd5
> 
> In this command flow, how does your hardware identify whether these
> commands belong to request_A or request_B?
> Thanks.
> Longfang.

For my understanding, all requests will transfer into engine queue through crypto_transfer_hash_request_to_engine().
In your example, request_A & request_B would also enqueue into the engine queue, and pump out 1 request which might be FIFO to handle it.
crypto_pump_requests() will dequeue only 1 request at a time and to prepare_request() & do_one_request() if it's registered.
And aspeed_hace_ahash_trigger() is inside do_one_request(), so that means no other requests would come in during aspeed_hace_ahash_trigger() whole process.
The command flow intertwined
Acmd1-->Bcmd1-->Acmd2-->Acmd3-->Bcmd2-->Acmd4-->Bcmd3-->Bcmd4-->Acmd5-->Bcmd5 would not exist in any scenario.
Correct me if I'm misunderstanding, Thanks.