summaryrefslogtreecommitdiffstats
path: root/arch/x86/crypto
Commit message (Collapse)AuthorAgeFilesLines
* crypto: x86 - add more optimized XTS-mode for serpent-avxJussi Kivilinna2013-04-254-46/+244
| | | | | | | | | | | | | | | | | This patch adds AVX optimized XTS-mode helper functions/macros and converts serpent-avx to use the new facilities. Benefits are slightly improved speed and reduced stack usage as use of temporary IV-array is avoided. tcrypt results, with Intel i5-2450M: enc dec 16B 1.00x 1.00x 64B 1.00x 1.00x 256B 1.04x 1.06x 1024B 1.09x 1.09x 8192B 1.10x 1.09x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: crc32-pclmul - Use gas macro for pclmulqdqSandy Wu2013-04-251-2/+3
| | | | | | | | | | Occurs when CONFIG_CRYPTO_CRC32C_INTEL=y and CONFIG_CRYPTO_CRC32C_INTEL=y. Older versions of bintuils do not support the pclmulqdq instruction. The PCLMULQDQ gas macro is used instead. Signed-off-by: Sandy Wu <sandyw@twitter.com> Cc: stable@vger.kernel.org # 3.8+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: sha512 - Create module providing optimized SHA512 routines using ↵Tim Chen2013-04-252-0/+284
| | | | | | | | | | SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA512 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX2 RORX ↵Tim Chen2013-04-251-0/+743
| | | | | | | | | | | instruction. Provides SHA512 x86_64 assembly routine optimized with SSE, AVX and AVX2's RORX instructions. Speedup of 70% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX ↵Tim Chen2013-04-251-0/+423
| | | | | | | | | | instructions. Provides SHA512 x86_64 assembly routine optimized with SSE and AVX instructions. Speedup of 60% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: sha512 - Optimized SHA512 x86_64 assembly routine using Supplemental ↵Tim Chen2013-04-251-0/+421
| | | | | | | | | | SSE3 instructions. Provides SHA512 x86_64 assembly routine optimized with SSSE3 instructions. Speedup of 40% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: sha256 - Create module providing optimized SHA256 routines using ↵Tim Chen2013-04-252-0/+277
| | | | | | | | | | SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA256 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructionsTim Chen2013-04-031-0/+772
| | | | | | | | | Provides SHA256 x86_64 assembly routine optimized with SSE, AVX and AVX2's RORX instructions. Speedup of 70% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: sha256 - Optimized sha256 x86_64 assembly routine with AVX instructions.Tim Chen2013-04-031-0/+496
| | | | | | | | Provides SHA256 x86_64 assembly routine optimized with SSE and AVX instructions. Speedup of 60% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: sha256 - Optimized sha256 x86_64 assembly routine using Supplemental ↵Tim Chen2013-04-031-0/+506
| | | | | | | | | | SSE3 instructions. Provides SHA256 x86_64 assembly routine optimized with SSSE3 instructions. Speedup of 40% or more has been measured over the generic implementation. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: x86 - build AVX block cipher implementations only if assembler ↵Jussi Kivilinna2013-04-031-11/+23
| | | | | | | | | | | | | supports AVX instructions These modules require AVX support in assembler, so add new check to Makefile for this. Other option would be to use CONFIG_AS_AVX inside source files, but that would result dummy/empty/no-fuctionality modules being created. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: x86/crc32-pclmul - assembly clean-ups: use ENTRY/ENDPROCJussi Kivilinna2013-04-031-3/+3
| | | | | Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: crc32c - Update the links to the white papers on CRC32C calculations ↵Tim Chen2013-03-101-2/+3
| | | | | | | | | | | | | | with PCLMULQDQ instructions. Herbert, The following patch update the stale link to the CRC32C white paper that was referenced. Tim Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: crc32c - Kill pointless CRYPTO_CRC32C_X86_64 optionHerbert Xu2013-02-261-1/+1
| | | | | | | This bool option can never be set to anything other than y. So let's just kill it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2013-02-2624-294/+668
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull crypto update from Herbert Xu: "Here is the crypto update for 3.9: - Added accelerated implementation of crc32 using pclmulqdq. - Added test vector for fcrypt. - Added support for OMAP4/AM33XX cipher and hash. - Fixed loose crypto_user input checks. - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (43 commits) crypto: user - ensure user supplied strings are nul-terminated crypto: user - fix empty string test in report API crypto: user - fix info leaks in report API crypto: caam - Added property fsl,sec-era in SEC4.0 device tree binding. crypto: use ERR_CAST crypto: atmel-aes - adjust duplicate test crypto: crc32-pclmul - Kill warning on x86-32 crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize jump targets crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember functions and rename ECRYPT_* to salsa20_* crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functions crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: aesni-intel - add ENDPROC statements for assembler functions crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets crypto: testmgr - add test vector for fcrypt ...
| * crypto: crc32-pclmul - Kill warning on x86-32Herbert Xu2013-01-201-1/+0Star
| | | | | | | | | | | | | | | | This patch removes a gratuitous warning on x86-32: arch/x86/crypto/crc32-pclmul_asm.S:87:2: warning: #warning Using 32bit code support [-Wcpp] Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump ↵Jussi Kivilinna2013-01-204-48/+29Star
| | | | | | | | | | | | | | | | labels Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROCJussi Kivilinna2013-01-201-5/+5
| | | | | | | | | | | | Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize ↵Jussi Kivilinna2013-01-203-48/+27Star
| | | | | | | | | | | | | | | | jump targets Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember ↵Jussi Kivilinna2013-01-203-34/+27Star
| | | | | | | | | | | | | | | | functions and rename ECRYPT_* to salsa20_* Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functionsJussi Kivilinna2013-01-201-0/+4
| | | | | | | | | | | | Signed-off-by: Jussi Kivilinna <jussi.kivilinn@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROCJussi Kivilinna2013-01-201-2/+6
| | | | | | | | | | | | Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functionsJussi Kivilinna2013-01-201-24/+11Star
| | | | | | | | | | | | Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and ↵Jussi Kivilinna2013-01-201-30/+18Star
| | | | | | | | | | | | | | | | localize jump targets Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler ↵Jussi Kivilinna2013-01-202-52/+36Star
| | | | | | | | | | | | | | | | functions and localize jump targets Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and ↵Jussi Kivilinna2013-01-201-25/+14Star
| | | | | | | | | | | | | | | | localize jump targets Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: aesni-intel - add ENDPROC statements for assembler functionsJussi Kivilinna2013-01-201-1/+22
| | | | | | | | | | | | Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targetsJussi Kivilinna2013-01-202-25/+20Star
| | | | | | | | | | | | Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table ↵Alexander Boyko2013-01-203-0/+450
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implementation This patch adds crc32 algorithms to shash crypto api. One is wrapper to gerneric crc32_le function. Second is crc32 pclmulqdq implementation. It use hardware provided PCLMULQDQ instruction to accelerate the CRC32 disposal. This instruction present from Intel Westmere and AMD Bulldozer CPUs. For intel core i5 I got 450MB/s for table implementation and 2100MB/s for pclmulqdq implementation. Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: aesni-intel - remove rfc3686(ctr(aes)), utilize rfc3686 from ↵Jussi Kivilinna2013-01-081-37/+0Star
|/ | | | | | | | | | | ctr-module instead rfc3686 in CTR module is now able of using asynchronous ctr(aes) from aesni-intel, so rfc3686(ctr(aes)) in aesni-intel is no longer needed. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2012-12-1518-587/+3052
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull crypto update from Herbert Xu: - Added aesni/avx/x86_64 implementations for camellia. - Optimised AVX code for cast5/serpent/twofish/cast6. - Fixed vmac bug with unaligned input. - Allow compression algorithms in FIPS mode. - Optimised crc32c implementation for Intel. - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (32 commits) crypto: caam - Updated SEC-4.0 device tree binding for ERA information. crypto: testmgr - remove superfluous initializers for xts(aes) crypto: testmgr - allow compression algs in fips mode crypto: testmgr - add larger crc32c test vector to test FPU path in crc32c_intel crypto: testmgr - clean alg_test_null entries in alg_test_descs[] crypto: testmgr - remove fips_allowed flag from camellia-aesni null-tests crypto: cast5/cast6 - move lookup tables to shared module padata: use __this_cpu_read per-cpu helper crypto: s5p-sss - Fix compilation error crypto: picoxcell - Add terminating entry for platform_device_id table crypto: omap-aes - select BLKCIPHER2 crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file crypto: tcrypt - add async speed test for camellia cipher crypto: tegra-aes - fix error-valued pointer dereference crypto: tegra - fix missing unlock on error case crypto: cast5/avx - avoid using temporary stack buffers crypto: serpent/avx - avoid using temporary stack buffers crypto: twofish/avx - avoid using temporary stack buffers crypto: cast6/avx - avoid using temporary stack buffers ...
| * crypto: cast5/cast6 - move lookup tables to shared moduleJussi Kivilinna2012-12-062-16/+16
| | | | | | | | | | | | | | | | | | CAST5 and CAST6 both use same lookup tables, which can be moved shared module 'cast_common'. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of ↵Jussi Kivilinna2012-11-093-0/+1663
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | camellia cipher This patch adds AES-NI/AVX/x86_64 assembler implementation of Camellia block cipher. Implementation process data in sixteen block chunks, which are byte-sliced and AES SubBytes is reused for Camellia s-box with help of pre- and post-filtering. Patch has been tested with tcrypt and automated filesystem tests. tcrypt test results: Intel Core i5-2450M: camellia-aesni-avx vs camellia-asm-x86_64-2way: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.98x 0.96x 0.99x 0.96x 0.96x 0.95x 0.95x 0.94x 0.97x 0.98x 64B 0.99x 0.98x 1.00x 0.98x 0.98x 0.99x 0.98x 0.93x 0.99x 0.98x 256B 2.28x 2.28x 1.01x 2.29x 2.25x 2.24x 1.96x 1.97x 1.91x 1.90x 1024B 2.57x 2.56x 1.00x 2.57x 2.51x 2.53x 2.19x 2.17x 2.19x 2.22x 8192B 2.49x 2.49x 1.00x 2.53x 2.48x 2.49x 2.17x 2.17x 2.22x 2.22x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.97x 0.98x 0.99x 0.97x 0.97x 0.96x 0.97x 0.98x 0.98x 0.99x 64B 1.00x 1.00x 1.01x 0.99x 0.98x 0.99x 0.99x 0.99x 0.99x 0.99x 256B 2.37x 2.37x 1.01x 2.39x 2.35x 2.33x 2.10x 2.11x 1.99x 2.02x 1024B 2.58x 2.60x 1.00x 2.58x 2.56x 2.56x 2.28x 2.29x 2.28x 2.29x 8192B 2.50x 2.52x 1.00x 2.56x 2.51x 2.51x 2.24x 2.25x 2.26x 2.29x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: camellia-x86_64 - share common functions and move structures and ↵Jussi Kivilinna2012-11-091-57/+23Star
| | | | | | | | | | | | | | | | | | | | | | function definitions to header file Prepare camellia-x86_64 functions to be reused from AVX/AESNI implementation module. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cast5/avx - avoid using temporary stack buffersJussi Kivilinna2012-10-242-131/+280
| | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new assembler functions to avoid use temporary stack buffers in glue code. This also allows use of vector instructions for xoring output in CTR and CBC modes and construction of IVs for CTR mode. ECB mode sees ~0.5% decrease in speed because added one extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~5%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: serpent/avx - avoid using temporary stack buffersJussi Kivilinna2012-10-242-95/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new assembler functions to avoid use temporary stack buffers in glue code. This also allows use of vector instructions for xoring output in CTR and CBC modes and construction of IVs for CTR mode. ECB mode sees ~0.5% decrease in speed because added one extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~3%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: twofish/avx - avoid using temporary stack buffersJussi Kivilinna2012-10-242-129/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new assembler functions to avoid use temporary stack buffers in glue code. This also allows use of vector instructions for xoring output in CTR and CBC modes and construction of IVs for CTR mode. ECB mode sees ~0.2% decrease in speed because added one extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~3%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: cast6/avx - avoid using temporary stack buffersJussi Kivilinna2012-10-243-125/+227
| | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new assembler functions to avoid use temporary stack buffers in glue code. This also allows use of vector instructions for xoring output in CTR and CBC modes and construction of IVs for CTR mode. ECB mode sees ~0.5% decrease in speed because added one extra function call. CBC mode decryption and CTR mode benefit from vector operations and gain ~2%. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: x86/glue_helper - use le128 instead of u128 for CTR modeJussi Kivilinna2012-10-247-45/+45
| | | | | | | | | | | | | | | | | | | | 'u128' currently used for CTR mode is on little-endian 'long long' swapped and would require extra swap operations by SSE/AVX code. Use of le128 instead of u128 allows IV calculations to be done with vector registers easier. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instructionTim Chen2012-10-153-0/+542
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the crc_pcl function that calculates CRC32C checksum using the PCLMULQDQ instruction on processors that support this feature. This will provide speedup over using CRC32 instruction only. The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and kernel_fpu_end and incur some overhead. So the new crc_pcl function is only invoked for buffer size of 512 bytes or more. Larger sized buffers will expect to see greater speedup. This feature is best used coupled with eager_fpu which reduces the kernel_fpu_begin/end overhead. For buffer size of 1K the speedup is around 1.6x and for buffer size greater than 4K, the speedup is around 3x compared to original implementation in crc32c-intel module. Test was performed on Sandy Bridge based platform with constant frequency set for cpu. A white paper detailing the algorithm can be found here: http://download.intel.com/design/intarch/papers/323405.pdf Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * crypto: crc32c - Rename crc32c-intel.c to crc32c-intel_glue.cTim Chen2012-10-152-0/+1
| | | | | | | | | | | | | | | | | | | | This patch renames the crc32c-intel.c file to crc32c-intel_glue.c file in preparation for linking with the new crc32c-pcl-intel-asm.S file, which contains optimized crc32c calculation based on PCLMULQDQ instruction. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | crypto: aesni - fix XTS mode on x86-32, add wrapper function for asmlinkage ↵Jussi Kivilinna2012-10-181-2/+7
|/ | | | | | | | | | | | | | | aesni_enc() Calling convention for internal functions and 'asmlinkage' functions is different on x86-32. Therefore do not directly cast aesni_enc as XTS tweak function, but use wrapper function in between. Fixes crash with "XTS + aesni_intel + x86-32" combination. Cc: stable@vger.kernel.org Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* crypto: x86/glue_helper - fix storing of new IV in CBC encryptionJussi Kivilinna2012-10-041-1/+1
| | | | | | | | | Glue_helper incorrectly XORs new IV over old IV at end of CBC encryption function when it should store. This causes CBC encryption to give incorrect output on multi-page encryption requests. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: cast5/avx - fix storing of new IV in CBC encryptionJussi Kivilinna2012-09-271-1/+1
| | | | | | | | | cast5/avx incorrectly XORs new IV over old IV at end of CBC encryption function when it should store. This causes CBC encryption to give incorrect output on multi-page encryption requests. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: camellia-x86_64 - fix sparse warnings (constant is so big)Jussi Kivilinna2012-09-061-688/+688
| | | | | | | Fix "constant 0xXXXXXXXXXXXXXXXX is so big it's unsigned long" sparse warnings. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: cast6-avx - tune assembler code for more performanceJussi Kivilinna2012-09-061-114/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | Patch replaces 'movb' instructions with 'movzbl' to break false register dependencies, interleaves instructions better for out-of-order scheduling and merges constant 16-bit rotation with round-key variable rotation. tcrypt ECB results: Intel Core i5-2450M: size old-vs-new new-vs-generic old-vs-generic enc dec enc dec enc dec 256 1.13x 1.19x 2.05x 2.17x 1.82x 1.82x 1k 1.18x 1.21x 2.26x 2.33x 1.93x 1.93x 8k 1.19x 1.19x 2.32x 2.33x 1.95x 1.95x [v2] - Do instruction interleaving another way to avoid adding new FPU<=>CPU register moves as these cause performance drop on Bulldozer. - Improvements to round-key variable rotation handling. - Further interleaving improvements for better out-of-order scheduling. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: cast5-avx - tune assembler code for more performanceJussi Kivilinna2012-09-061-106/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | Patch replaces 'movb' instructions with 'movzbl' to break false register dependencies, interleaves instructions better for out-of-order scheduling and merges constant 16-bit rotation with round-key variable rotation. tcrypt ECB results (128bit key): Intel Core i5-2450M: size old-vs-new new-vs-generic old-vs-generic enc dec enc dec enc dec 256 1.18x 1.18x 2.45x 2.47x 2.08x 2.10x 1k 1.20x 1.20x 2.73x 2.73x 2.28x 2.28x 8k 1.20x 1.19x 2.73x 2.73x 2.28x 2.29x [v2] - Do instruction interleaving another way to avoid adding new FPU<=>CPU register moves as these cause performance drop on Bulldozer. - Improvements to round-key variable rotation handling. - Further interleaving improvements for better out-of-order scheduling. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: twofish-avx - tune assembler code for more performanceJussi Kivilinna2012-09-061-85/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch replaces 'movb' instructions with 'movzbl' to break false register dependencies and interleaves instructions better for out-of-order scheduling. Tested on Intel Core i5-2450M and AMD FX-8100. tcrypt ECB results: Intel Core i5-2450M: size old-vs-new new-vs-3way old-vs-3way enc dec enc dec enc dec 256 1.12x 1.13x 1.36x 1.37x 1.21x 1.22x 1k 1.14x 1.14x 1.48x 1.49x 1.29x 1.31x 8k 1.14x 1.14x 1.50x 1.52x 1.32x 1.33x AMD FX-8100: size old-vs-new new-vs-3way old-vs-3way enc dec enc dec enc dec 256 1.10x 1.11x 1.01x 1.01x 0.92x 0.91x 1k 1.11x 1.12x 1.08x 1.07x 0.97x 0.96x 8k 1.11x 1.13x 1.10x 1.08x 0.99x 0.97x [v2] - Do instruction interleaving another way to avoid adding new FPU<=>CPU register moves as these cause performance drop on Bulldozer. - Further interleaving improvements for better out-of-order scheduling. Tested-by: Borislav Petkov <bp@alien8.de> Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: aesni_intel - improve lrw and xts performance by utilizing parallel ↵Jussi Kivilinna2012-08-201-35/+218
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AES-NI hardware pipelines Use parallel LRW and XTS encryption facilities to better utilize AES-NI hardware pipelines and gain extra performance. Tcrypt benchmark results (async), old vs new ratios: Intel Core i5-2450M CPU (fam: 6, model: 42, step: 7) aes:128bit lrw:256bit xts:256bit size lrw-enc lrw-dec xts-dec xts-dec 16B 0.99x 1.00x 1.22x 1.19x 64B 1.38x 1.50x 1.58x 1.61x 256B 2.04x 2.02x 2.27x 2.29x 1024B 2.56x 2.54x 2.89x 2.92x 8192B 2.85x 2.99x 3.40x 3.23x aes:192bit lrw:320bit xts:384bit size lrw-enc lrw-dec xts-dec xts-dec 16B 1.08x 1.08x 1.16x 1.17x 64B 1.48x 1.54x 1.59x 1.65x 256B 2.18x 2.17x 2.29x 2.28x 1024B 2.67x 2.67x 2.87x 3.05x 8192B 2.93x 2.84x 3.28x 3.33x aes:256bit lrw:348bit xts:512bit size lrw-enc lrw-dec xts-dec xts-dec 16B 1.07x 1.07x 1.18x 1.19x 64B 1.56x 1.56x 1.70x 1.71x 256B 2.22x 2.24x 2.46x 2.46x 1024B 2.76x 2.77x 3.13x 3.05x 8192B 2.99x 3.05x 3.40x 3.30x Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Reviewed-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* crypto: cast6 - add x86_64/avx assembler implementationJohannes Goetzfried2012-08-013-0/+985
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a x86_64/avx assembler implementation of the Cast6 block cipher. The implementation processes eight blocks in parallel (two 4 block chunk AVX operations). The table-lookups are done in general-purpose registers. For small blocksizes the functions from the generic module are called. A good performance increase is provided for blocksizes greater or equal to 128B. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: Intel Core i5-2500 CPU (fam:6, model:42, step:7) cast6-avx-x86_64 vs. cast6-generic 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.97x 1.00x 1.01x 1.01x 0.99x 0.97x 0.98x 1.01x 0.96x 0.98x 64B 0.98x 0.99x 1.02x 1.01x 0.99x 1.00x 1.01x 0.99x 1.00x 0.99x 256B 1.77x 1.84x 0.99x 1.85x 1.77x 1.77x 1.70x 1.74x 1.69x 1.72x 1024B 1.93x 1.95x 0.99x 1.96x 1.93x 1.93x 1.84x 1.85x 1.89x 1.87x 8192B 1.91x 1.95x 0.99x 1.97x 1.95x 1.91x 1.86x 1.87x 1.93x 1.90x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.97x 0.99x 1.02x 1.01x 0.98x 0.99x 1.00x 1.00x 0.98x 0.98x 64B 0.98x 0.99x 1.01x 1.00x 1.00x 1.00x 1.01x 1.01x 0.97x 1.00x 256B 1.77x 1.83x 1.00x 1.86x 1.79x 1.78x 1.70x 1.76x 1.71x 1.69x 1024B 1.92x 1.95x 0.99x 1.96x 1.93x 1.93x 1.83x 1.86x 1.89x 1.87x 8192B 1.94x 1.95x 0.99x 1.97x 1.95x 1.95x 1.87x 1.87x 1.93x 1.91x Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>