summaryrefslogtreecommitdiffstats
path: root/arch/arm64/crypto
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'linus' of ↵Linus Torvalds2019-07-096-102/+130
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Here is the crypto update for 5.3: API: - Test shash interface directly in testmgr - cra_driver_name is now mandatory Algorithms: - Replace arc4 crypto_cipher with library helper - Implement 5 way interleave for ECB, CBC and CTR on arm64 - Add xxhash - Add continuous self-test on noise source to drbg - Update jitter RNG Drivers: - Add support for SHA204A random number generator - Add support for 7211 in iproc-rng200 - Fix fuzz test failures in inside-secure - Fix fuzz test failures in talitos - Fix fuzz test failures in qat" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (143 commits) crypto: stm32/hash - remove interruptible condition for dma crypto: stm32/hash - Fix hmac issue more than 256 bytes crypto: stm32/crc32 - rename driver file crypto: amcc - remove memset after dma_alloc_coherent crypto: ccp - Switch to SPDX license identifiers crypto: ccp - Validate the the error value used to index error messages crypto: doc - Fix formatting of new crypto engine content crypto: doc - Add parameter documentation crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR crypto: arm64/aes-ce - add 5 way interleave routines crypto: talitos - drop icv_ool crypto: talitos - fix hash on SEC1. crypto: talitos - move struct talitos_edesc into talitos.h lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE crypto/NX: Set receive window credits to max number of CRBs in RxFIFO crypto: asymmetric_keys - select CRYPTO_HASH where needed crypto: serpent - mark __serpent_setkey_sbox noinline crypto: testmgr - dynamically allocate crypto_shash crypto: testmgr - dynamically allocate testvec_config crypto: talitos - eliminate unneeded 'done' functions at build time ...
| * crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTRArd Biesheuvel2019-07-033-31/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements 5-way interleaving for ECB, CBC decryption and CTR, resulting in a speedup of ~11% on Marvell ThunderX2, which has a very deep pipeline and therefore a high issue latency for NEON instructions operating on the same registers. Note that XTS is left alone: implementing 5-way interleave there would either involve spilling of the calculated tweaks to the stack, or recalculating them after the encryption operation, and doing either of those would most likely penalize low end cores. For ECB, this is not a concern at all, given that we have plenty of spare registers. For CTR and CBC decryption, we take advantage of the fact that v16 is not used by the CE version of the code (which is the only one targeted by the optimization), and so we can reshuffle the code a bit and avoid having to spill to memory (with the exception of one extra reload in the CBC routine) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/aes-ce - add 5 way interleave routinesArd Biesheuvel2019-07-033-68/+52Star
| | | | | | | | | | | | | | | | | | | | In preparation of tweaking the accelerated AES chaining mode routines to be able to use a 5-way stride, implement the core routines to support processing 5 blocks of input at a time. While at it, drop the 2 way versions, which have been unused for a while now. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: chacha - constify ctx and iv argumentsEric Biggers2019-06-131-1/+1
| | | | | | | | | | | | | | | | | | | | Constify the ctx and iv arguments to crypto_chacha_init() and the various chacha*_stream_xor() functions. This makes it clear that they are not modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/sha2-ce - correct digest for empty data in finupElena Petrova2019-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sha256-ce finup implementation for ARM64 produces wrong digest for empty input (len=0). Expected: the actual digest, result: initial value of SHA internal state. The error is in sha256_ce_finup: for empty data `finalize` will be 1, so the code is relying on sha2_ce_transform to make the final round. However, in sha256_base_do_update, the block function will not be called when len == 0. Fix it by setting finalize to 0 if data is empty. Fixes: 03802f6a80b3a ("crypto: arm64/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer") Cc: stable@vger.kernel.org Signed-off-by: Elena Petrova <lenaptr@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/sha1-ce - correct digest for empty data in finupElena Petrova2019-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sha1-ce finup implementation for ARM64 produces wrong digest for empty input (len=0). Expected: da39a3ee..., result: 67452301... (initial value of SHA internal state). The error is in sha1_ce_finup: for empty data `finalize` will be 1, so the code is relying on sha1_ce_transform to make the final round. However, in sha1_base_do_update, the block function will not be called when len == 0. Fix it by setting finalize to 0 if data is empty. Fixes: 07eb54d306f4 ("crypto: arm64/sha1-ce - move SHA-1 ARMv8 implementation to base layer") Cc: stable@vger.kernel.org Signed-off-by: Elena Petrova <lenaptr@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner2019-06-1923-92/+23Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner2019-05-302-12/+2Star
|/ | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge branch 'linus' of ↵Linus Torvalds2019-05-0715-39/+56
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Add support for AEAD in simd - Add fuzz testing to testmgr - Add panic_on_fail module parameter to testmgr - Use per-CPU struct instead multiple variables in scompress - Change verify API for akcipher Algorithms: - Convert x86 AEAD algorithms over to simd - Forbid 2-key 3DES in FIPS mode - Add EC-RDSA (GOST 34.10) algorithm Drivers: - Set output IV with ctr-aes in crypto4xx - Set output IV in rockchip - Fix potential length overflow with hashing in sun4i-ss - Fix computation error with ctr in vmx - Add SM4 protected keys support in ccree - Remove long-broken mxc-scc driver - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits) crypto: ccree - use a proper le32 type for le32 val crypto: ccree - remove set but not used variable 'du_size' crypto: ccree - Make cc_sec_disable static crypto: ccree - fix spelling mistake "protedcted" -> "protected" crypto: caam/qi2 - generate hash keys in-place crypto: caam/qi2 - fix DMA mapping of stack memory crypto: caam/qi2 - fix zero-length buffer DMA mapping crypto: stm32/cryp - update to return iv_out crypto: stm32/cryp - remove request mutex protection crypto: stm32/cryp - add weak key check for DES crypto: atmel - remove set but not used variable 'alg_name' crypto: picoxcell - Use dev_get_drvdata() crypto: crypto4xx - get rid of redundant using_sd variable crypto: crypto4xx - use sync skcipher for fallback crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues crypto: crypto4xx - fix ctr-aes missing output IV crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA crypto: ux500 - use ccflags-y instead of CFLAGS_<basename>.o crypto: ccree - handle tee fips error during power management resume crypto: ccree - add function to handle cryptocell tee fips error ...
| * crypto: arm64/aes-neonbs - don't access already-freed walk.ivEric Biggers2019-04-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free. xts-aes-neonbs doesn't set an alignmask, so currently it isn't affected by this despite unconditionally accessing walk.iv. However this is more subtle than desired, and unconditionally accessing walk.iv has caused a real problem in other algorithms. Thus, update xts-aes-neonbs to start checking the return value of skcipher_walk_virt(). Fixes: 1abee99eafab ("crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64") Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/cbcmac - handle empty messages in same way as templateEric Biggers2019-04-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My patches to make testmgr fuzz algorithms against their generic implementation detected that the arm64 implementations of "cbcmac(aes)" handle empty messages differently from the cbcmac template. Namely, the arm64 implementations return the encrypted initial value, but the cbcmac template returns the initial value directly. This isn't actually a meaningful case because any user of cbcmac needs to prepend the message length, as CCM does; otherwise it's insecure. However, we should keep the behavior consistent; at the very least this makes testing easier. Do it the easy way, which is to change the arm64 implementations to have the same behavior as the cbcmac template. For what it's worth, ghash does things essentially the same way: it returns its initial value when given an empty message, even though in practice ghash is never passed an empty message. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64 - convert to use crypto_simd_usable()Eric Biggers2019-03-2215-34/+47
| | | | | | | | | | | | | | | | | | Replace all calls to may_use_simd() in the arm64 crypto code with crypto_simd_usable(), in order to allow testing the no-SIMD code paths. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/gcm-aes-ce - fix no-NEON fallback codeEric Biggers2019-03-221-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arm64 gcm-aes-ce algorithm is failing the extra crypto self-tests following my patches to test the !may_use_simd() code paths, which previously were untested. The problem is that in the !may_use_simd() case, an odd number of AES blocks can be processed within each step of the skcipher_walk. However, the skcipher_walk is being done with a "stride" of 2 blocks and is advanced by an even number of blocks after each step. This causes the encryption to produce the wrong ciphertext and authentication tag, and causes the decryption to incorrectly fail. Fix it by only processing an even number of blocks per step. Fixes: c2b24c36e0a3 ("crypto: arm64/aes-gcm-ce - fix scatterwalk API violation") Fixes: 71e52c278c54 ("crypto: arm64/aes-ce-gcm - operate on two input blocks at a time") Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | arm64: HWCAP: add support for AT_HWCAP2Andrew Murray2019-04-167-12/+12
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | As we will exhaust the first 32 bits of AT_HWCAP let's start exposing AT_HWCAP2 to userspace to give us up to 64 caps. Whilst it's possible to use the remaining 32 bits of AT_HWCAP, we prefer to expand into AT_HWCAP2 in order to provide a consistent view to userspace between ILP32 and LP64. However internal to the kernel we prefer to continue to use the full space of elf_hwcap. To reduce complexity and allow for future expansion, we now represent hwcaps in the kernel as ordinals and use a KERNEL_HWCAP_ prefix. This allows us to support automatic feature based module loading for all our hwcaps. We introduce cpu_set_feature to set hwcaps which complements the existing cpu_have_feature helper. These helpers allow us to clean up existing direct uses of elf_hwcap and reduce any future effort required to move beyond 64 caps. For convenience we also introduce cpu_{have,set}_named_feature which makes use of the cpu_feature macro to allow providing a hwcap name without a {KERNEL_}HWCAP_ prefix. Signed-off-by: Andrew Murray <andrew.murray@arm.com> [will: use const_ilog2() and tweak documentation] Signed-off-by: Will Deacon <will.deacon@arm.com>
* Merge branch 'linus' of ↵Linus Torvalds2019-03-057-347/+383
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Add helper for simple skcipher modes. - Add helper to register multiple templates. - Set CRYPTO_TFM_NEED_KEY when setkey fails. - Require neither or both of export/import in shash. - AEAD decryption test vectors are now generated from encryption ones. - New option CONFIG_CRYPTO_MANAGER_EXTRA_TESTS that includes random fuzzing. Algorithms: - Conversions to skcipher and helper for many templates. - Add more test vectors for nhpoly1305 and adiantum. Drivers: - Add crypto4xx prng support. - Add xcbc/cmac/ecb support in caam. - Add AES support for Exynos5433 in s5p. - Remove sha384/sha512 from artpec7 as hardware cannot do partial hash" [ There is a merge of the Freescale SoC tree in order to pull in changes required by patches to the caam/qi2 driver. ] * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (174 commits) crypto: s5p - add AES support for Exynos5433 dt-bindings: crypto: document Exynos5433 SlimSSS crypto: crypto4xx - add missing of_node_put after of_device_is_available crypto: cavium/zip - fix collision with generic cra_driver_name crypto: af_alg - use struct_size() in sock_kfree_s() crypto: caam - remove redundant likely/unlikely annotation crypto: s5p - update iv after AES-CBC op end crypto: x86/poly1305 - Clear key material from stack in SSE2 variant crypto: caam - generate hash keys in-place crypto: caam - fix DMA mapping xcbc key twice crypto: caam - fix hash context DMA unmap size hwrng: bcm2835 - fix probe as platform device crypto: s5p-sss - Use AES_BLOCK_SIZE define instead of number crypto: stm32 - drop pointless static qualifier in stm32_hash_remove() crypto: chelsio - Fixed Traffic Stall crypto: marvell - Remove set but not used variable 'ivsize' crypto: ccp - Update driver messages to remove some confusion crypto: adiantum - add 1536 and 4096-byte test vectors crypto: nhpoly1305 - add a test vector with len % 16 != 0 crypto: arm/aes-ce - update IV after partial final CTR block ...
| * crypto: arm64/aes-blk - update IV after partial final CTR blockEric Biggers2019-02-221-2/+1Star
| | | | | | | | | | | | | | | | | | | | Make the arm64 ctr-aes-neon and ctr-aes-ce algorithms update the IV buffer to contain the next counter after processing a partial final block, rather than leave it as the last counter. This makes these algorithms pass the updated AES-CTR tests. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/aes-neonbs - fix returning final keystream blockEric Biggers2019-02-081-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arm64 NEON bit-sliced implementation of AES-CTR fails the improved skcipher tests because it sometimes produces the wrong ciphertext. The bug is that the final keystream block isn't returned from the assembly code when the number of non-final blocks is zero. This can happen if the input data ends a few bytes after a page boundary. In this case the last bytes get "encrypted" by XOR'ing them with uninitialized memory. Fix the assembly code to return the final keystream block when needed. Fixes: 88a3f582bea9 ("crypto: arm64/aes - don't use IV buffer to return final keystream block") Cc: <stable@vger.kernel.org> # v4.11+ Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/crct10dif-ce - cleanup and optimizationsEric Biggers2019-02-082-267/+233Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The x86, arm, and arm64 asm implementations of crct10dif are very difficult to understand partly because many of the comments, labels, and macros are named incorrectly: the lengths mentioned are usually off by a factor of two from the actual code. Many other things are unnecessarily convoluted as well, e.g. there are many more fold constants than actually needed and some aren't fully reduced. This series therefore cleans up all these implementations to be much more maintainable. I also made some small optimizations where I saw opportunities, resulting in slightly better performance. This patch cleans up the arm64 version. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/crct10dif - register PMULL variants as separate algosArd Biesheuvel2019-02-011-12/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The arm64 CRC-T10DIF implementation either uses 8-bit or 64-bit polynomial multiplication instructions, since the latter are faster but not mandatory in the architecture. Since that prevents us from testing both implementations on the same system, let's expose both implementations to the crypto API, with the priorities reflecting that the P64 version is the preferred one if available. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/crct10dif - remove dead codeArd Biesheuvel2019-02-011-11/+0Star
| | | | | | | | | | | | | | | | | | Remove some code that is no longer called now that we make sure never to invoke the SIMD routine with less than 16 bytes of input. Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/crct10dif - revert to C code for short inputsArd Biesheuvel2019-02-011-19/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SIMD routine ported from x86 used to have a special code path for inputs < 16 bytes, which got lost somewhere along the way. Instead, the current glue code aligns the input pointer to 16 bytes, which is not really necessary on this architecture (although it could be beneficial to performance to expose aligned data to the the NEON routine), but this could result in inputs of less than 16 bytes to be passed in. This not only fails the new extended tests that Eric has implemented, it also results in the code reading past the end of the input, which could potentially result in crashes when dealing with less than 16 bytes of input at the end of a page which is followed by an unmapped page. So update the glue code to only invoke the NEON routine if the input is at least 16 bytes. Reported-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Fixes: 6ef5737f3931 ("crypto: arm64/crct10dif - port x86 SSE implementation to arm64") Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/ghash - register PMULL variants as separate algosArd Biesheuvel2019-02-011-28/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The arm64 GHASH implementation either uses 8-bit or 64-bit polynomial multiplication instructions, since the latter are faster but not mandatory in the architecture. Since that prevents us from testing both implementations on the same system, let's expose both implementations to the crypto API, with the priorities reflecting that the P64 version is the preferred one if available. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/aes-ccm - don't use an atomic walk needlesslyArd Biesheuvel2019-02-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When the AES-CCM code was first added, the NEON register were saved and restored eagerly, and so the code avoided doing so, and executed the scatterwalk in atomic context inside the kernel_neon_begin/end section. This has been changed in the meantime, so switch to non-atomic scatterwalks. Fixes: bd2ad885e30d ("crypto: arm64/aes-ce-ccm - move kernel mode neon ...") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/aes-ccm - fix bugs in non-NEON fallback routineArd Biesheuvel2019-02-011-3/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5092fcf34908 ("crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback") introduced C fallback code to replace the NEON routines when invoked from a context where the NEON is not available (i.e., from the context of a softirq taken while the NEON is already being used in kernel process context) Fix two logical flaws in the MAC calculation of the associated data. Reported-by: Eric Biggers <ebiggers@kernel.org> Fixes: 5092fcf34908 ("crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback") Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: arm64/aes-ccm - fix logical bug in AAD MAC handlingArd Biesheuvel2019-02-011-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NEON MAC calculation routine fails to handle the case correctly where there is some data in the buffer, and the input fills it up exactly. In this case, we enter the loop at the end with w8 == 0, while a negative value is assumed, and so the loop carries on until the increment of the 32-bit counter wraps around, which is quite obviously wrong. So omit the loop altogether in this case, and exit right away. Reported-by: Eric Biggers <ebiggers@kernel.org> Fixes: a3fd82105b9d1 ("arm64/crypto: AES in CCM mode using ARMv8 Crypto ...") Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/chacha - fix hchacha_block_neon() for big endianEric Biggers2019-02-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | On big endian arm64 kernels, the xchacha20-neon and xchacha12-neon self-tests fail because hchacha_block_neon() outputs little endian words but the C code expects native endianness. Fix it to output the words in native endianness (which also makes it match the arm32 version). Fixes: cc7cf991e9eb ("crypto: arm64/chacha20 - add XChaCha20 support") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/chacha - fix chacha_4block_xor_neon() for big endianEric Biggers2019-02-281-0/+16
|/ | | | | | | | | | | | | | | The change to encrypt a fifth ChaCha block using scalar instructions caused the chacha20-neon, xchacha20-neon, and xchacha12-neon self-tests to start failing on big endian arm64 kernels. The bug is that the keystream block produced in 32-bit scalar registers is directly XOR'd with the data words, which are loaded and stored in native endianness. Thus in big endian mode the data bytes end up XOR'd with the wrong bytes. Fix it by byte-swapping the keystream words in big endian mode. Fixes: 2fe55987b262 ("crypto: arm64/chacha - use combined SIMD/ALU routine for more speed") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* Merge tag 'kbuild-v4.21' of ↵Linus Torvalds2018-12-291-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: "Kbuild core: - remove unneeded $(call cc-option,...) switches - consolidate Clang compiler flags into CLANG_FLAGS - announce the deprecation of SUBDIRS - fix single target build for external module - simplify the dependencies of 'prepare' stage targets - allow fixdep to directly write to .*.cmd files - simplify dependency generation for CONFIG_TRIM_UNUSED_KSYMS - change if_changed_rule to accept multi-line recipe - move .SECONDARY special target to scripts/Kbuild.include - remove redundant 'set -e' - improve parallel execution for CONFIG_HEADERS_CHECK - misc cleanups Treewide fixes and cleanups - set Clang flags correctly for PowerPC boot images - fix UML build error with CONFIG_GCC_PLUGINS - remove unneeded patterns from .gitignore files - refactor firmware/Makefile - remove unneeded rules for *offsets.s - avoid unneeded regeneration of intermediate .s files - clean up ./Kbuild Modpost: - remove unused -M, -K options - fix false positive warnings about section mismatch - use simple devtable lookup instead of linker magic - misc cleanups Coccinelle: - relax boolinit.cocci checks for overall consistency - fix warning messages of boolinit.cocci Other tools: - improve -dirty check of scripts/setlocalversion - add a tool to generate compile_commands.json from .*.cmd files" * tag 'kbuild-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits) kbuild: remove unused cmd_gentimeconst kbuild: remove $(obj)/ prefixes in ./Kbuild treewide: add intermediate .s files to targets treewide: remove explicit rules for *offsets.s firmware: refactor firmware/Makefile firmware: remove unnecessary patterns from .gitignore scripts: remove unnecessary ihex2fw and check-lc_ctypes from .gitignore um: remove unused filechk_gen_header in Makefile scripts: add a tool to produce a compile_commands.json file kbuild: add -Werror=implicit-int flag unconditionally kbuild: add -Werror=strict-prototypes flag unconditionally kbuild: add -fno-PIE flag unconditionally scripts: coccinelle: Correct warning message scripts: coccinelle: only suggest true/false in files that already use them kbuild: handle part-of-module correctly for *.ll and *.symtypes kbuild: refactor part-of-module kbuild: refactor quiet_modtag kbuild: remove redundant quiet_modtag for $(obj-m) kbuild: refactor Makefile.asm-generic user/Makefile: Fix typo and capitalization in comment section ...
| * kbuild: move .SECONDARY special target to Kbuild.includeMasahiro Yamada2018-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 54a702f70589 ("kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers"), I missed one important feature of the .SECONDARY target: .SECONDARY with no prerequisites causes all targets to be treated as secondary. ... which agrees with the policy of Kbuild. Let's move it to scripts/Kbuild.include, with no prerequisites. Note: If an intermediate file is generated by $(call if_changed,...), you still need to add it to "targets" so its .*.cmd file is included. The arm/arm64 crypto files are generated by $(call cmd,shipped), so they do not need to be added to "targets", but need to be added to "clean-files" so "make clean" can properly clean them away. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* | crypto: arm64/chacha - use combined SIMD/ALU routine for more speedArd Biesheuvel2018-12-132-35/+239
| | | | | | | | | | | | | | | | | | | | | | To some degree, most known AArch64 micro-architectures appear to be able to issue ALU instructions in parellel to SIMD instructions without affecting the SIMD throughput. This means we can use the ALU to process a fifth ChaCha block while the SIMD is processing four blocks in parallel. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/chacha - optimize for arbitrary length inputsArd Biesheuvel2018-12-132-37/+184
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the 4-way NEON ChaCha routine so it can handle input of any length >64 bytes in its entirety, rather than having to call into the 1-way routine and/or memcpy()s via temp buffers to handle the tail of a ChaCha invocation that is not a multiple of 256 bytes. On inputs that are a multiple of 256 bytes (and thus in tcrypt benchmarks), performance drops by around 1% on Cortex-A57, while performance for inputs drawn randomly from the range [64, 1024) increases by around 30%. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/chacha - add XChaCha12 supportEric Biggers2018-12-132-1/+19
| | | | | | | | | | | | | | | | | | | | | | Now that the ARM64 NEON implementation of ChaCha20 and XChaCha20 has been refactored to support varying the number of rounds, add support for XChaCha12. This is identical to XChaCha20 except for the number of rounds, which is 12 instead of 20. This can be used by Adiantum. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/chacha20 - refactor to allow varying number of roundsEric Biggers2018-12-133-49/+57
| | | | | | | | | | | | | | | | | | In preparation for adding XChaCha12 support, rename/refactor the ARM64 NEON implementation of ChaCha20 to support different numbers of rounds. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/chacha20 - add XChaCha20 supportEric Biggers2018-12-133-43/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | Add an XChaCha20 implementation that is hooked up to the ARM64 NEON implementation of ChaCha20. This can be used by Adiantum. A NEON implementation of single-block HChaCha20 is also added so that XChaCha20 can use it rather than the generic implementation. This required refactoring the ChaCha20 permutation into its own function. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305Eric Biggers2018-12-134-0/+188
| | | | | | | | | | | | | | | | | | | | | | | | Add an ARM64 NEON implementation of NHPoly1305, an ε-almost-∆-universal hash function used in the Adiantum encryption mode. For now, only the NH portion is actually NEON-accelerated; the Poly1305 part is less performance-critical so is just implemented in C. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> # big-endian Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: chacha20-generic - refactor to allow varying number of roundsEric Biggers2018-11-201-20/+20
|/ | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for adding XChaCha12 support, rename/refactor chacha20-generic to support different numbers of rounds. The justification for needing XChaCha12 support is explained in more detail in the patch "crypto: chacha - add XChaCha12 support". The only difference between ChaCha{8,12,20} are the number of rounds itself; all other parts of the algorithm are the same. Therefore, remove the "20" from all definitions, structures, functions, files, etc. that will be shared by all ChaCha versions. Also make ->setkey() store the round count in the chacha_ctx (previously chacha20_ctx). The generic code then passes the round count through to chacha_block(). There will be a ->setkey() function for each explicitly allowed round count; the encrypt/decrypt functions will be the same. I decided not to do it the opposite way (same ->setkey() function for all round counts, with different encrypt/decrypt functions) because that would have required more boilerplate code in architecture-specific implementations of ChaCha and XChaCha. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-blk - ensure XTS mask is always loadedArd Biesheuvel2018-10-121-4/+4
| | | | | | | | | | | | | | Commit 2e5d2f33d1db ("crypto: arm64/aes-blk - improve XTS mask handling") optimized away some reloads of the XTS mask vector, but failed to take into account that calls into the XTS en/decrypt routines will take a slightly different code path if a single block of input is split across different buffers. So let's ensure that the first load occurs unconditionally, and move the reload to the end so it doesn't occur needlessly. Fixes: 2e5d2f33d1db ("crypto: arm64/aes-blk - improve XTS mask handling") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes - fix handling sub-block CTS-CBC inputsEric Biggers2018-10-081-4/+9
| | | | | | | | | | | | In the new arm64 CTS-CBC implementation, return an error code rather than crashing on inputs shorter than AES_BLOCK_SIZE bytes. Also set cra_blocksize to AES_BLOCK_SIZE (like is done in the cts template) to indicate the minimum input size. Fixes: dd597fb33ff0 ("crypto: arm64/aes-blk - add support for CTS-CBC mode") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-blk - improve XTS mask handlingArd Biesheuvel2018-09-213-19/+32
| | | | | | | | | | | | | | | | | | | The Crypto Extension instantiation of the aes-modes.S collection of skciphers uses only 15 NEON registers for the round key array, whereas the pure NEON flavor uses 16 NEON registers for the AES S-box. This means we have a spare register available that we can use to hold the XTS mask vector, removing the need to reload it at every iteration of the inner loop. Since the pure NEON version does not permit this optimization, tweak the macros so we can factor out this functionality. Also, replace the literal load with a short sequence to compose the mask vector. On Cortex-A53, this results in a ~4% speedup. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-blk - add support for CTS-CBC modeArd Biesheuvel2018-09-212-1/+243
| | | | | | | | | | | | | Currently, we rely on the generic CTS chaining mode wrapper to instantiate the cts(cbc(aes)) skcipher. Due to the high performance of the ARMv8 Crypto Extensions AES instructions (~1 cycles per byte), any overhead in the chaining mode layers is amplified, and so it pays off considerably to fold the CTS handling into the SIMD routines. On Cortex-A53, this results in a ~50% speedup for smaller input sizes. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-blk - revert NEON yield for skciphersArd Biesheuvel2018-09-211-173/+108Star
| | | | | | | | | | | | | | The reasoning of commit f10dc56c64bb ("crypto: arm64 - revert NEON yield for fast AEAD implementations") applies equally to skciphers: the walk API already guarantees that the input size of each call into the NEON code is bounded to the size of a page, and so there is no need for an additional TIF_NEED_RESCHED flag check inside the inner loop. So revert the skcipher changes to aes-modes.S (but retain the mac ones) This partially reverts commit 0c8f838a52fe9fd82761861a934f16ef9896b4e5. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-blk - remove pointless (u8 *) castsArd Biesheuvel2018-09-211-24/+23Star
| | | | | | | | | | For some reason, the asmlinkage prototypes of the NEON routines take u8[] arguments for the round key arrays, while the actual round keys are arrays of u32, and so passing them into those routines requires u8* casts at each occurrence. Fix that. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/crct10dif - implement non-Crypto Extensions alternativeArd Biesheuvel2018-09-042-2/+162
| | | | | | | | | | | | | | | | The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit polynomial multiplication instructions, which are optional in the architecture, and if these instructions are not available, we fall back to the C routine which is slow and inefficient. So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that uses a sequence of ~40 instructions involving 8x8 bit PMULL and some shifting and masking. This is a lot slower than the original, but it is still twice as fast as the current [unoptimized] C code on Cortex-A53, and it is time invariant and much easier on the D-cache. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/crct10dif - preparatory refactor for 8x8 PMULL versionArd Biesheuvel2018-09-042-76/+90
| | | | | | | | | Reorganize the CRC-T10DIF asm routine so we can easily instantiate an alternative version based on 8x8 polynomial multiplication in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/crc32 - remove PMULL based CRC32 driverArd Biesheuvel2018-09-044-539/+0Star
| | | | | | | | | | | | | | | | | | Now that the scalar fallbacks have been moved out of this driver into the core crc32()/crc32c() routines, we are left with a CRC32 crypto API driver for arm64 that is based only on 64x64 polynomial multiplication, which is an optional instruction in the ARMv8 architecture, and is less and less likely to be available on cores that do not also implement the CRC32 instructions, given that those are mandatory in the architecture as of ARMv8.1. Since the scalar instructions do not require the special handling that SIMD instructions do, and since they turn out to be considerably faster on some cores (Cortex-A53) as well, there is really no point in keeping this code around so let's just remove it. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-modes - get rid of literal load of addend vectorArd Biesheuvel2018-09-041-7/+9
| | | | | | | | | | | | | | | | Replace the literal load of the addend vector with a sequence that performs each add individually. This sequence is only 2 instructions longer than the original, and 2% faster on Cortex-A53. This is an improvement by itself, but also works around a Clang issue, whose integrated assembler does not implement the GNU ARM asm syntax completely, and does not support the =literal notation for FP registers (more info at https://bugs.llvm.org/show_bug.cgi?id=38642) Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: speck - remove SpeckJason A. Donenfeld2018-09-044-643/+0Star
| | | | | | | | | | | | | | | | These are unused, undesired, and have never actually been used by anybody. The original authors of this code have changed their mind about its inclusion. While originally proposed for disk encryption on low-end devices, the idea was discarded [1] in favor of something else before that could really get going. Therefore, this patch removes Speck. [1] https://marc.info/?l=linux-crypto-vger&m=153359499015659 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Eric Biggers <ebiggers@google.com> Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/aes-gcm-ce - fix scatterwalk API violationArd Biesheuvel2018-08-251-6/+23
| | | | | | | | | | | | | | | | | | | | | | | | Commit 71e52c278c54 ("crypto: arm64/aes-ce-gcm - operate on two input blocks at a time") modified the granularity at which the AES/GCM code processes its input to allow subsequent changes to be applied that improve performance by using aggregation to process multiple input blocks at once. For this reason, it doubled the algorithm's 'chunksize' property to 2 x AES_BLOCK_SIZE, but retained the non-SIMD fallback path that processes a single block at a time. In some cases, this violates the skcipher scatterwalk API, by calling skcipher_walk_done() with a non-zero residue value for a chunk that is expected to be handled in its entirety. This results in a WARN_ON() to be hit by the TLS self test code, but is likely to break other user cases as well. Unfortunately, none of the current test cases exercises this exact code path at the moment. Fixes: 71e52c278c54 ("crypto: arm64/aes-ce-gcm - operate on two ...") Reported-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/sm4-ce - check for the right CPU feature bitArd Biesheuvel2018-08-251-1/+1
| | | | | | | | | | | | | ARMv8.2 specifies special instructions for the SM3 cryptographic hash and the SM4 symmetric cipher. While it is unlikely that a core would implement one and not the other, we should only use SM4 instructions if the SM4 CPU feature bit is set, and we currently check the SM3 feature bit instead. So fix that. Fixes: e99ce921c468 ("crypto: arm64 - add support for SM4...") Cc: <stable@vger.kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: arm64/ghash-ce - implement 4-way aggregationArd Biesheuvel2018-08-072-51/+142
| | | | | | | | | | Enhance the GHASH implementation that uses 64-bit polynomial multiplication by adding support for 4-way aggregation. This more than doubles the performance, from 2.4 cycles per byte to 1.1 cpb on Cortex-A53. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>