summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'linus' of ↵Linus Torvalds2017-02-23187-9574/+26959
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Try to catch hash output overrun in testmgr - Introduce walksize attribute for batched walking - Make crypto_xor() and crypto_inc() alignment agnostic Algorithms: - Add time-invariant AES algorithm - Add standalone CBCMAC algorithm Drivers: - Add NEON acclerated chacha20 on ARM/ARM64 - Expose AES-CTR as synchronous skcipher on ARM64 - Add scalar AES implementation on ARM64 - Improve scalar AES implementation on ARM - Improve NEON AES implementation on ARM/ARM64 - Merge CRC32 and PMULL instruction based drivers on ARM64 - Add NEON acclerated CBCMAC/CMAC/XCBC AES on ARM64 - Add IPsec AUTHENC implementation in atmel - Add Support for Octeon-tx CPT Engine - Add Broadcom SPU driver - Add MediaTek driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (142 commits) crypto: xts - Add ECB dependency crypto: cavium - switch to pci_alloc_irq_vectors crypto: cavium - switch to pci_alloc_irq_vectors crypto: cavium - remove dead MSI-X related define crypto: brcm - Avoid double free in ahash_finup() crypto: cavium - fix Kconfig dependencies crypto: cavium - cpt_bind_vq_to_grp could return an error code crypto: doc - fix typo hwrng: omap - update Kconfig help description crypto: ccm - drop unnecessary minimum 32-bit alignment crypto: ccm - honour alignmask of subordinate MAC cipher crypto: caam - fix state buffer DMA (un)mapping crypto: caam - abstract ahash request double buffering crypto: caam - fix error path for ctx_dma mapping failure crypto: caam - fix DMA API leaks for multiple setkey() calls crypto: caam - don't dma_map key for hash algorithms crypto: caam - use dma_map_sg() return code crypto: caam - replace sg_count() with sg_nents_for_len() crypto: caam - check sg_count() return value crypto: caam - fix HW S/G in ablkcipher_giv_edesc_alloc() ..
| * crypto: xts - Add ECB dependencyMilan Broz2017-02-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | Since the commit f1c131b45410a202eb45cc55980a7a9e4e4b4f40 crypto: xts - Convert to skcipher the XTS mode is based on ECB, so the mode must select ECB otherwise it can fail to initialize. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cavium - switch to pci_alloc_irq_vectorsChristoph Hellwig2017-02-232-141/+65Star
| | | | | | | | | | | | | | | | | | pci_enable_msix has been long deprecated, but this driver adds a new instance. Convert it to pci_alloc_irq_vectors and greatly simplify the code, and make sure the prope code properly unwinds. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cavium - switch to pci_alloc_irq_vectorsChristoph Hellwig2017-02-232-53/+10Star
| | | | | | | | | | | | | | | | | | pci_enable_msix has been long deprecated, but this driver adds a new instance. Convert it to pci_alloc_irq_vectors and greatly simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cavium - remove dead MSI-X related defineChristoph Hellwig2017-02-231-2/+0Star
| | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: brcm - Avoid double free in ahash_finup()Rob Rice2017-02-151-1/+0Star
| | | | | | | | | | | | | | | | | | | | In Broadcom SPU driver, in case where incremental hash is done in software in ahash_finup(), tmpbuf was freed twice. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rob Rice <rob.rice@broadcom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cavium - fix Kconfig dependenciesArnd Bergmann2017-02-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver fails to build if MSI support is disabled: In file included from /git/arm-soc/drivers/crypto/cavium/cpt/cptpf_main.c:18:0: drivers/crypto/cavium/cpt/cptpf.h:57:20: error: array type has incomplete element type 'struct msix_entry' struct msix_entry msix_entries[CPT_PF_MSIX_VECTORS]; ^~~~~~~~~~~~ drivers/crypto/cavium/cpt/cptpf_main.c: In function 'cpt_enable_msix': drivers/crypto/cavium/cpt/cptpf_main.c:344:8: error: implicit declaration of function 'pci_enable_msix';did you mean 'cpt_enable_msix'? [-Werror=implicit-function-declaration] On the other hand, it doesn't seem to have any build dependency on ARCH_THUNDER, so let's allow compile-testing to catch this kind of problem more easily. The 64-bit dependency is needed for the use of readq/writeq. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cavium - cpt_bind_vq_to_grp could return an error codeGeorge Cherian2017-02-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | cpt_bind_vq_to_grp() could return an error code. However, it currently returns a u8. This produce the static checker warning. drivers/crypto/cavium/cpt/cptpf_mbox.c:70 cpt_bind_vq_to_grp() warn: signedness bug returning '(-22)' Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: doc - fix typoGilad Ben-Yossef2017-02-151-1/+1
| | | | | | | | | | | | | | Fix a single letter typo in api-skcipher.rst. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * hwrng: omap - update Kconfig help descriptionRussell King2017-02-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | omap-rng also supports Marvell Armada 7k/8k SoCs, but no mention of this is made in the help text, despite the dependency being added. Explicitly mention these SoCs in the help description so people know that it covers more than just TI SoCs. Fixes: 383212425c92 ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ccm - drop unnecessary minimum 32-bit alignmentArd Biesheuvel2017-02-151-2/+1Star
| | | | | | | | | | | | | | | | | | | | The CCM driver forces 32-bit alignment even if the underlying ciphers don't care about alignment. This is because crypto_xor() used to require this, but since this is no longer the case, drop the hardcoded minimum of 32 bits. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ccm - honour alignmask of subordinate MAC cipherArd Biesheuvel2017-02-151-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CCM driver was recently updated to defer the MAC part of the algorithm to a dedicated crypto transform, and a template for instantiating such transforms was added at the same time. However, this new cbcmac template fails to take the alignmask of the encapsulated cipher into account, which may result in buffer addresses being passed down that are not sufficiently aligned. So update the code to ensure that the digest buffer in the desc ctx appears at a sufficiently aligned offset, and tweak the code so that all calls to crypto_cipher_encrypt_one() operate on this buffer exclusively. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - fix state buffer DMA (un)mappingHoria Geantă2017-02-151-55/+52Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we register the DMA API debug notification chain to receive platform bus events: dma_debug_add_bus(&platform_bus_type); we start receiving warnings after a simple test like "modprobe caam_jr && modprobe caamhash && modprobe -r caamhash && modprobe -r caam_jr": platform ffe301000.jr: DMA-API: device driver has pending DMA allocations while released from device [count=1938] One of leaked entries details: [device address=0x0000000173fda090] [size=63 bytes] [mapped with DMA_TO_DEVICE] [mapped as single] It turns out there are several issues with handling buf_dma (mapping of buffer holding the previous chunk smaller than hash block size): -detection of buf_dma mapping failure occurs too late, after a job descriptor using that value has been submitted for execution -dma mapping leak - unmapping is not performed in all places: for e.g. in ahash_export or in most ahash_fin* callbacks (due to current back-to-back implementation of buf_dma unmapping/mapping) Fix these by: -calling dma_mapping_error() on buf_dma right after the mapping and providing an error code if needed -unmapping buf_dma during the "job done" (ahash_done_*) callbacks Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - abstract ahash request double bufferingHoria Geantă2017-02-151-29/+48
| | | | | | | | | | | | | | | | | | | | caamhash uses double buffering for holding previous/current and next chunks (data smaller than block size) to be hashed. Add (inline) functions to abstract this mechanism. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - fix error path for ctx_dma mapping failureHoria Geantă2017-02-151-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case ctx_dma dma mapping fails, ahash_unmap_ctx() tries to dma unmap an invalid address: map_seq_out_ptr_ctx() / ctx_map_to_sec4_sg() -> goto unmap_ctx -> -> ahash_unmap_ctx() -> dma unmap ctx_dma There is also possible to reach ahash_unmap_ctx() with ctx_dma uninitialzed or to try to unmap the same address twice. Fix these by setting ctx_dma = 0 where needed: -initialize ctx_dma in ahash_init() -clear ctx_dma in case of mapping error (instead of holding the error code returned by the dma map function) -clear ctx_dma after each unmapping Fixes: 32686d34f8fb6 ("crypto: caam - ensure that we clean up after an error") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - fix DMA API leaks for multiple setkey() callsHoria Geantă2017-02-152-252/+102Star
| | | | | | | | | | | | | | | | | | | | | | | | setkey() callback may be invoked multiple times for the same tfm. In this case, DMA API leaks are caused by shared descriptors (and key for caamalg) being mapped several times and unmapped only once. Fix this by performing mapping / unmapping only in crypto algorithm's cra_init() / cra_exit() callbacks and sync_for_device in the setkey() tfm callback. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - don't dma_map key for hash algorithmsHoria Geantă2017-02-151-17/+1Star
| | | | | | | | | | | | | | | | | | | | | | Shared descriptors for hash algorithms are small enough for (split) keys to be inlined in all cases. Since driver already does this, all what's left is to remove unused ctx->key_dma. Fixes: 045e36780f115 ("crypto: caam - ahash hmac support") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - use dma_map_sg() return codeHoria Geantă2017-02-151-62/+71
| | | | | | | | | | | | | | | | | | dma_map_sg() might coalesce S/G entries, so use the number of S/G entries returned by it instead of what sg_nents_for_len() initially returns. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - replace sg_count() with sg_nents_for_len()Horia Geantă2017-02-152-112/+88Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace internal sg_count() function and the convoluted logic around it with the standard sg_nents_for_len() function. src_nents, dst_nents now hold the number of SW S/G entries, instead of the HW S/G table entries. With this change, null (zero length) input data for AEAD case needs to be handled in a visible way. req->src is no longer (un)mapped, pointer address is set to 0 in SEQ IN PTR command. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - check sg_count() return valueHoria Geantă2017-02-151-2/+42
| | | | | | | | | | | | | | | | | | | | | | sg_count() internally calls sg_nents_for_len(), which could fail in case the required number of bytes is larger than the total bytes in the S/G. Thus, add checks to validate the input. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - fix HW S/G in ablkcipher_giv_edesc_alloc()Horia Geantă2017-02-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HW S/G generation does not work properly when the following conditions are met: -src == dst -src/dst is S/G -IV is right before (contiguous with) the first src/dst S/G entry since "iv_contig" is set to true (iv_contig is a misnomer here and it actually refers to the whole output being contiguous) Fix this by setting dst S/G nents equal to src S/G nents, instead of leaving it set to init value (0). Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - fix JR IO mapping if one failsTudor Ambarus2017-02-151-8/+9
| | | | | | | | | | | | | | | | | | If one of the JRs failed at init, the next JR used the failed JR's IO space. The patch fixes this bug. Signed-off-by: Tudor Ambarus <tudor-dan.ambarus@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - check return code of dma_set_mask_and_coherent()Horia Geantă2017-02-152-10/+24
| | | | | | | | | | | | | | | | Setting the dma mask could fail, thus make sure it succeeds before going further. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: caam - don't include unneeded headersHoria Geantă2017-02-152-3/+0Star
| | | | | | | | | | | | | | | | intern.h, jr.h are not needed in error.c error.h is not needed in ctrl.c Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ccp - Simplify some buffer management routinesGary R Hook2017-02-151-86/+56Star
| | | | | | | | | | | | | | | | The reverse-get/set functions can be simplified by eliminating unused code. Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ccp - Update the command queue on errorsGary R Hook2017-02-151-2/+5
| | | | | | | | | | | | | | | | Move the command queue tail pointer when an error is detected. Always return the error. Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ccp - Change mode for detailed CCP init messagesGary R Hook2017-02-151-3/+2Star
| | | | | | | | | | | | | | | | The CCP initialization messages only need to be sent to syslog in debug mode. Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: atmel-sha - fix error management in atmel_sha_start()Cyrille Pitchen2017-02-151-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch clarifies and fixes how errors should be handled by atmel_sha_start(). For update operations, the previous code wrongly assumed that (err != -EINPROGRESS) implies (err == 0). It's wrong because that doesn't take the error cases (err < 0) into account. This patch also adds many comments to detail all the possible returned values and what should be done in each case. Especially, when an error occurs, since atmel_sha_complete() has already been called, hence releasing the hardware, atmel_sha_start() must not call atmel_sha_finish_req() later otherwise atmel_sha_complete() would be called a second time. Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: atmel-sha - fix missing "return" instructionsCyrille Pitchen2017-02-151-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a previous patch: "crypto: atmel-sha - update request queue management to make it more generic". Indeed the patch above should have replaced the "return -EINVAL;" lines by "return atmel_sha_complete(dd, -EINVAL);" but instead replaced them by a simple call of "atmel_sha_complete(dd, -EINVAL);". Hence all "return" instructions were missing. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ccp - Set the AES size field for all modesGary R Hook2017-02-153-2/+10
| | | | | | | | | | | | | | | | Ensure that the size field is correctly populated for all AES modes. Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: brcm - Add Broadcom SPU driverRob Rice2017-02-1112-0/+9516
| | | | | | | | | | | | | | | | | | | | Add Broadcom Secure Processing Unit (SPU) crypto driver for SPU hardware crypto offload. The driver supports ablkcipher, ahash, and aead symmetric crypto operations. Signed-off-by: Steve Lin <steven.lin1@broadcom.com> Signed-off-by: Rob Rice <rob.rice@broadcom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: brcm - DT documentation for Broadcom SPU hardwareRob Rice2017-02-111-0/+22
| | | | | | | | | | | | | | | | | | | | Device tree documentation for Broadcom Secure Processing Unit (SPU) crypto hardware. Signed-off-by: Steve Lin <steven.lin1@broadcom.com> Signed-off-by: Rob Rice <rob.rice@broadcom.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cavium - Enable CPT options crypto for buildGeorge Cherian2017-02-113-0/+9
| | | | | | | | | | | | | | | | | | | | | | Add the CPT options in crypto Kconfig and update the crypto Makefile Update the MAINTAINERS file too. Signed-off-by: George Cherian <george.cherian@cavium.com> Reviewed-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cavium - Add the Virtual Function driver for CPTGeorge Cherian2017-02-118-1/+2581
| | | | | | | | | | | | | | | | | | Enable the CPT VF driver. CPT is the cryptographic Acceleration Unit in Octeon-tx series of processors. Signed-off-by: George Cherian <george.cherian@cavium.com> Reviewed-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cavium - Add Support for Octeon-tx CPT EngineGeorge Cherian2017-02-117-0/+1774
| | | | | | | | | | | | | | | | | | | | | | Enable the Physical Function driver for the Cavium Crypto Engine (CPT) found in Octeon-tx series of SoC's. CPT is the Cryptographic Accelaration Unit. CPT includes microcoded GigaCypher symmetric engines (SEs) and asymmetric engines (AEs). Signed-off-by: George Cherian <george.cherian@cavium.com> Reviewed-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * hwrng: cavium - Use per device name to allow for multiple devices.David Daney2017-02-111-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Systems containing the Cavium HW RNG may have one device per NUMA node. A typical configuration is a 2-node NUMA system, which results in 2 RNG devices. The hwrng subsystem refuses (and rightly so) to register more than one device with he same name, so we get failure messages on these systems. Make the hwrng name unique by including the underlying device name. Also remove spaces from the name to make it possible to switch devices via the sysfs knobs. Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: atmel - fix 64-bit build warningsArnd Bergmann2017-02-112-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we enable COMPILE_TEST building for the Atmel sha and tdes implementations, we run into a couple of warnings about incorrect format strings, e.g. In file included from include/linux/platform_device.h:14:0, from drivers/crypto/atmel-sha.c:24: drivers/crypto/atmel-sha.c: In function 'atmel_sha_xmit_cpu': drivers/crypto/atmel-sha.c:571:19: error: format '%d' expects argument of type 'int', but argument 6 has type 'size_t {aka long unsigned int}' [-Werror=format=] In file included from include/linux/printk.h:6:0, from include/linux/kernel.h:13, from drivers/crypto/atmel-tdes.c:17: drivers/crypto/atmel-tdes.c: In function 'atmel_tdes_crypt_dma_stop': include/linux/kern_levels.h:4:18: error: format '%u' expects argument of type 'unsigned int', but argument 2 has type 'size_t {aka long unsigned int}' [-Werror=format=] These are all fixed by using the "%z" modifier for size_t data. There are also a few uses of min()/max() with incompatible types: drivers/crypto/atmel-tdes.c: In function 'atmel_tdes_crypt_start': drivers/crypto/atmel-tdes.c:528:181: error: comparison of distinct pointer types lacks a cast [-Werror] Where possible, we should use consistent types here, otherwise we can use min_t()/max_t() to get well-defined behavior without a warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: atmel - refine Kconfig dependenciesArnd Bergmann2017-02-111-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the new authenc support, we get a harmless Kconfig warning: warning: (CRYPTO_DEV_ATMEL_AUTHENC) selects CRYPTO_DEV_ATMEL_SHA which has unmet direct dependencies (CRYPTO && CRYPTO_HW && ARCH_AT91) The problem is that each of the options has slightly different dependencies, although they all seem to want the same thing: allow building for real AT91 targets that actually have the hardware, and possibly for compile testing. This makes all four options consistent: instead of depending on a particular dmaengine implementation, we depend on the ARM platform, CONFIG_COMPILE_TEST as an alternative when that is turned off. This makes the 'select' statements work correctly. Fixes: 89a82ef87e01 ("crypto: atmel-authenc - add support to authenc(hmac(shaX), Y(aes)) modes") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: algapi - make crypto_xor() and crypto_inc() alignment agnosticArd Biesheuvel2017-02-118-34/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of unconditionally forcing 4 byte alignment for all generic chaining modes that rely on crypto_xor() or crypto_inc() (which may result in unnecessary copying of data when the underlying hardware can perform unaligned accesses efficiently), make those functions deal with unaligned input explicitly, but only if the Kconfig symbol HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers. For crypto_inc(), this simply involves making the 4-byte stride conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that it typically operates on 16 byte buffers. For crypto_xor(), an algorithm is implemented that simply runs through the input using the largest strides possible if unaligned accesses are allowed. If they are not, an optimal sequence of memory accesses is emitted that takes the relative alignment of the input buffers into account, e.g., if the relative misalignment of dst and src is 4 bytes, the entire xor operation will be completed using 4 byte loads and stores (modulo unaligned bits at the start and end). Note that all expressions involving misalign are simply eliminated by the compiler when HAVE_EFFICIENT_UNALIGNED_ACCESS is defined. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: improve gcc optimization flags for serpent and wp512Arnd Bergmann2017-02-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An ancient gcc bug (first reported in 2003) has apparently resurfaced on MIPS, where kernelci.org reports an overly large stack frame in the whirlpool hash algorithm: crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=] With some testing in different configurations, I'm seeing large variations in stack frames size up to 1500 bytes for what should have around 300 bytes at most. I also checked the reference implementation, which is essentially the same code but also comes with some test and benchmarking infrastructure. It seems that recent compiler versions on at least arm, arm64 and powerpc have a partial fix for this problem, but enabling "-fsched-pressure", but even with that fix they suffer from the issue to a certain degree. Some testing on arm64 shows that the time needed to hash a given amount of data is roughly proportional to the stack frame size here, which makes sense given that the wp512 implementation is doing lots of loads for table lookups, and the problem with the overly large stack is a result of doing a lot more loads and stores for spilled registers (as seen from inspecting the object code). Disabling -fschedule-insns consistently fixes the problem for wp512, in my collection of cross-compilers, the results are consistently better or identical when comparing the stack sizes in this function, though some architectures (notable x86) have schedule-insns disabled by default. The four columns are: default: -O2 press: -O2 -fsched-pressure nopress: -O2 -fschedule-insns -fno-sched-pressure nosched: -O2 -no-schedule-insns (disables sched-pressure) default press nopress nosched alpha-linux-gcc-4.9.3 1136 848 1136 176 am33_2.0-linux-gcc-4.9.3 2100 2076 2100 2104 arm-linux-gnueabi-gcc-4.9.3 848 848 1048 352 cris-linux-gcc-4.9.3 272 272 272 272 frv-linux-gcc-4.9.3 1128 1000 1128 280 hppa64-linux-gcc-4.9.3 1128 336 1128 184 hppa-linux-gcc-4.9.3 644 308 644 276 i386-linux-gcc-4.9.3 352 352 352 352 m32r-linux-gcc-4.9.3 720 656 720 268 microblaze-linux-gcc-4.9.3 1108 604 1108 256 mips64-linux-gcc-4.9.3 1328 592 1328 208 mips-linux-gcc-4.9.3 1096 624 1096 240 powerpc64-linux-gcc-4.9.3 1088 432 1088 160 powerpc-linux-gcc-4.9.3 1080 584 1080 224 s390-linux-gcc-4.9.3 456 456 624 360 sh3-linux-gcc-4.9.3 292 292 292 292 sparc64-linux-gcc-4.9.3 992 240 992 208 sparc-linux-gcc-4.9.3 680 592 680 312 x86_64-linux-gcc-4.9.3 224 240 272 224 xtensa-linux-gcc-4.9.3 1152 704 1152 304 aarch64-linux-gcc-7.0.0 224 224 1104 208 arm-linux-gnueabi-gcc-7.0.1 824 824 1048 352 mips-linux-gcc-7.0.0 1120 648 1120 272 x86_64-linux-gcc-7.0.1 240 240 304 240 arm-linux-gnueabi-gcc-4.4.7 840 392 arm-linux-gnueabi-gcc-4.5.4 784 728 784 320 arm-linux-gnueabi-gcc-4.6.4 736 728 736 304 arm-linux-gnueabi-gcc-4.7.4 944 784 944 352 arm-linux-gnueabi-gcc-4.8.5 464 464 760 352 arm-linux-gnueabi-gcc-4.9.3 848 848 1048 352 arm-linux-gnueabi-gcc-5.3.1 824 824 1064 336 arm-linux-gnueabi-gcc-6.1.1 808 808 1056 344 arm-linux-gnueabi-gcc-7.0.1 824 824 1048 352 Trying the same test for serpent-generic, the picture is a bit different, and while -fno-schedule-insns is generally better here than the default, -fsched-pressure wins overall, so I picked that instead. default press nopress nosched alpha-linux-gcc-4.9.3 1392 864 1392 960 am33_2.0-linux-gcc-4.9.3 536 524 536 528 arm-linux-gnueabi-gcc-4.9.3 552 552 776 536 cris-linux-gcc-4.9.3 528 528 528 528 frv-linux-gcc-4.9.3 536 400 536 504 hppa64-linux-gcc-4.9.3 524 208 524 480 hppa-linux-gcc-4.9.3 768 472 768 508 i386-linux-gcc-4.9.3 564 564 564 564 m32r-linux-gcc-4.9.3 712 576 712 532 microblaze-linux-gcc-4.9.3 724 392 724 512 mips64-linux-gcc-4.9.3 720 384 720 496 mips-linux-gcc-4.9.3 728 384 728 496 powerpc64-linux-gcc-4.9.3 704 304 704 480 powerpc-linux-gcc-4.9.3 704 296 704 480 s390-linux-gcc-4.9.3 560 560 592 536 sh3-linux-gcc-4.9.3 540 540 540 540 sparc64-linux-gcc-4.9.3 544 352 544 496 sparc-linux-gcc-4.9.3 544 344 544 496 x86_64-linux-gcc-4.9.3 528 536 576 528 xtensa-linux-gcc-4.9.3 752 544 752 544 aarch64-linux-gcc-7.0.0 432 432 656 480 arm-linux-gnueabi-gcc-7.0.1 616 616 808 536 mips-linux-gcc-7.0.0 720 464 720 488 x86_64-linux-gcc-7.0.1 536 528 600 536 arm-linux-gnueabi-gcc-4.4.7 592 440 arm-linux-gnueabi-gcc-4.5.4 776 448 776 544 arm-linux-gnueabi-gcc-4.6.4 776 448 776 544 arm-linux-gnueabi-gcc-4.7.4 768 448 768 544 arm-linux-gnueabi-gcc-4.8.5 488 488 776 544 arm-linux-gnueabi-gcc-4.9.3 552 552 776 536 arm-linux-gnueabi-gcc-5.3.1 552 552 776 536 arm-linux-gnueabi-gcc-6.1.1 560 560 776 536 arm-linux-gnueabi-gcc-7.0.1 616 616 808 536 I did not do any runtime tests with serpent, so it is possible that stack frame size does not directly correlate with runtime performance here and it actually makes things worse, but it's more likely to help here, and the reduced stack frame size is probably enough reason to apply the patch, especially given that the crypto code is often used in deep call chains. Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/ Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149 Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/aes - add NEON/Crypto Extensions CBCMAC/CMAC/XCBC driverArd Biesheuvel2017-02-112-2/+267
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On ARMv8 implementations that do not support the Crypto Extensions, such as the Raspberry Pi 3, the CCM driver falls back to the generic table based AES implementation to perform the MAC part of the algorithm, which is slow and not time invariant. So add a CBCMAC implementation to the shared glue code between NEON AES and Crypto Extensions AES, so that it can be used instead now that the CCM driver has been updated to look for CBCMAC implementations other than the one it supplies itself. Also, given how these algorithms mostly only differ in the way the key handling and the final encryption are implemented, expose CMAC and XCBC algorithms as well based on the same core update code. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: ccm - switch to separate cbcmac driverArd Biesheuvel2017-02-112-137/+245
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the generic CCM driver to defer CBC-MAC processing to a dedicated CBC-MAC ahash transform rather than open coding this transform (and much of the associated scatterwalk plumbing) in the CCM driver itself. This cleans up the code considerably, but more importantly, it allows the use of alternative CBC-MAC implementations that don't suffer from performance degradation due to significant setup time (e.g., the NEON based AES code needs to enable/disable the NEON, and load the S-box into 16 SIMD registers, which cannot be amortized over the entire input when using the cipher interface) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: testmgr - add test cases for cbcmac(aes)Ard Biesheuvel2017-02-112-0/+67
| | | | | | | | | | | | | | | | | | In preparation of splitting off the CBC-MAC transform in the CCM driver into a separate algorithm, define some test cases for the AES incarnation of cbcmac. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: aes - add generic time invariant AES cipherArd Biesheuvel2017-02-113-0/+393
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lookup table based AES is sensitive to timing attacks, which is due to the fact that such table lookups are data dependent, and the fact that 8 KB worth of tables covers a significant number of cachelines on any architecture, resulting in an exploitable correlation between the key and the processing time for known plaintexts. For network facing algorithms such as CTR, CCM or GCM, this presents a security risk, which is why arch specific AES ports are typically time invariant, either through the use of special instructions, or by using SIMD algorithms that don't rely on table lookups. For generic code, this is difficult to achieve without losing too much performance, but we can improve the situation significantly by switching to an implementation that only needs 256 bytes of table data (the actual S-box itself), which can be prefetched at the start of each block to eliminate data dependent latencies. This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the ordinary generic AES driver manages 18 cycles per byte on this hardware). Decryption is substantially slower. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: aes-generic - drop alignment requirementArd Biesheuvel2017-02-111-32/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic AES code exposes a 32-bit align mask, which forces all users of the code to use temporary buffers or take other measures to ensure the alignment requirement is adhered to, even on architectures that don't care about alignment for software algorithms such as this one. So drop the align mask, and fix the code to use get_unaligned_le32() where appropriate, which will resolve to whatever is optimal for the architecture. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: sha512-mb - Protect sha512 mb ctx mgr accessTim Chen2017-02-111-22/+42
| | | | | | | | | | | | | | | | | | | | The flusher and regular multi-buffer computation via mcryptd may race with another. Add here a lock and turn off interrupt to to access multi-buffer computation state cstate->mgr before a round of computation. This should prevent the flusher code jumping in. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/crc32 - merge CRC32 and PMULL instruction based driversArd Biesheuvel2017-02-115-312/+41Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PMULL based CRC32 implementation already contains code based on the separate, optional CRC32 instructions to fallback to when operating on small quantities of data. We can expose these routines directly on systems that lack the 64x64 PMULL instructions but do implement the CRC32 ones, which makes the driver that is based solely on those CRC32 instructions redundant. So remove it. Note that this aligns arm64 with ARM, whose accelerated CRC32 driver also combines the CRC32 extension based and the PMULL based versions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Matthias Brugger <mbrugger@suse.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm/aes - don't use IV buffer to return final keystream blockArd Biesheuvel2017-02-032-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARM bit sliced AES core code uses the IV buffer to pass the final keystream block back to the glue code if the input is not a multiple of the block size, so that the asm code does not have to deal with anything except 16 byte blocks. This is done under the assumption that the outgoing IV is meaningless anyway in this case, given that chaining is no longer possible under these circumstances. However, as it turns out, the CCM driver does expect the IV to retain a value that is equal to the original IV except for the counter value, and even interprets byte zero as a length indicator, which may result in memory corruption if the IV is overwritten with something else. So use a separate buffer to return the final keystream block. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/aes - don't use IV buffer to return final keystream blockArd Biesheuvel2017-02-032-18/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arm64 bit sliced AES core code uses the IV buffer to pass the final keystream block back to the glue code if the input is not a multiple of the block size, so that the asm code does not have to deal with anything except 16 byte blocks. This is done under the assumption that the outgoing IV is meaningless anyway in this case, given that chaining is no longer possible under these circumstances. However, as it turns out, the CCM driver does expect the IV to retain a value that is equal to the original IV except for the counter value, and even interprets byte zero as a length indicator, which may result in memory corruption if the IV is overwritten with something else. So use a separate buffer to return the final keystream block. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/aes - replace scalar fallback with plain NEON fallbackArd Biesheuvel2017-02-032-11/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new bitsliced NEON implementation of AES uses a fallback in two places: CBC encryption (which is strictly sequential, whereas this driver can only operate efficiently on 8 blocks at a time), and the XTS tweak generation, which involves encrypting a single AES block with a different key schedule. The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback, given that it is faster than scalar on low end cores (which is what the NEON implementations target, since high end cores have dedicated instructions for AES), and shows similar behavior in terms of D-cache footprint and sensitivity to cache timing attacks. So switch the fallback handling to the plain NEON driver. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>