- Aug 04, 2017
-
-
Ard Biesheuvel authored
For the final round, avoid the expanded and padded lookup tables exported by the generic AES driver. Instead, for encryption, we can perform byte loads from the same table we used for the inner rounds, which will still be hot in the caches. For decryption, use the inverse AES Sbox directly, which is 4x smaller than the inverse lookup table exported by the generic driver. This should significantly reduce the Dcache footprint of our code, which makes the code more robust against timing attacks. It does not introduce any additional module dependencies, given that we already rely on the core AES module for the shared key expansion routines. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Implement a NEON fallback for systems that do support NEON but have no support for the optional 64x64->128 polynomial multiplication instruction that is part of the ARMv8 Crypto Extensions. It is based on the paper "Fast Software Polynomial Multiplication on ARM Processors Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and Ricardo Dahab (https://hal.inria.fr/hal-01506572 ) On a 32-bit guest executing under KVM on a Cortex-A57, the new code is not only 4x faster than the generic table based GHASH driver, it is also time invariant. (Note that the existing vmull.p64 code is 16x faster on this core). Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
There are quite a number of occurrences in the kernel of the pattern if (dst != src) memcpy(dst, src, walk.total % AES_BLOCK_SIZE); crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE); or crypto_xor(keystream, src, nbytes); memcpy(dst, keystream, nbytes); where crypto_xor() is preceded or followed by a memcpy() invocation that is only there because crypto_xor() uses its output parameter as one of the inputs. To avoid having to add new instances of this pattern in the arm64 code, which will be refactored to implement non-SIMD fallbacks, add an alternative implementation called crypto_xor_cpy(), taking separate input and output arguments. This removes the need for the separate memcpy(). Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Jun 01, 2017
-
-
Ard Biesheuvel authored
Make the module autoloadable by tying it to the CPU feature bits that describe whether the optional instructions it relies on are implemented by the current CPU. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Make the module autoloadable by tying it to the CPU feature bit that describes whether the optional instructions it relies on are implemented by the current CPU. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Make the module autoloadable by tying it to the CPU feature bit that describes whether the optional instructions it relies on are implemented by the current CPU. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Make the module autoloadable by tying it to the CPU feature bit that describes whether the optional instructions it relies on are implemented by the current CPU. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Make the module autoloadable by tying it to the CPU feature bit that describes whether the optional instructions it relies on are implemented by the current CPU. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Mar 09, 2017
-
-
Ard Biesheuvel authored
Currently, the bit sliced NEON AES code for ARM has a link time dependency on the scalar ARM asm implementation, which it uses as a fallback to perform CBC encryption and the encryption of the initial XTS tweak. The bit sliced NEON code is both fast and time invariant, which makes it a reasonable default on hardware that supports it. However, the ARM asm code it pulls in is not time invariant, and due to the way it is linked in, cannot be overridden by the new generic time invariant driver. In fact, it will not be used at all, given that the ARM asm code registers itself as a cipher with a priority that exceeds the priority of the fixed time cipher. So remove the link time dependency, and allocate the fallback cipher via the crypto API. Note that this requires this driver's module_init call to be replaced with late_initcall, so that the (possibly generic) fallback cipher is guaranteed to be available when the builtin test is performed at registration time. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Mar 01, 2017
-
-
Ard Biesheuvel authored
The accelerated CRC32 module for ARM may use either the scalar CRC32 instructions, the NEON 64x64 to 128 bit polynomial multiplication (vmull.p64) instruction, or both, depending on what the current CPU supports. However, this also requires support in binutils, and as it turns out, versions of binutils exist that support the vmull.p64 instruction but not the crc32 instructions. So refactor the Makefile logic so that this module only gets built if binutils has support for both. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by:
Jon Hunter <jonathanh@nvidia.com> Tested-by:
Jon Hunter <jonathanh@nvidia.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Annotate a vmov instruction with an explicit element size of 32 bits. This is inferred by recent toolchains, but apparently, older versions need some help figuring this out. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Feb 03, 2017
-
-
Ard Biesheuvel authored
The ARM bit sliced AES core code uses the IV buffer to pass the final keystream block back to the glue code if the input is not a multiple of the block size, so that the asm code does not have to deal with anything except 16 byte blocks. This is done under the assumption that the outgoing IV is meaningless anyway in this case, given that chaining is no longer possible under these circumstances. However, as it turns out, the CCM driver does expect the IV to retain a value that is equal to the original IV except for the counter value, and even interprets byte zero as a length indicator, which may result in memory corruption if the IV is overwritten with something else. So use a separate buffer to return the final keystream block. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Jan 23, 2017
-
-
Ard Biesheuvel authored
The GNU assembler for ARM version 2.22 or older fails to infer the element size from the vmov instructions, and aborts the build in the following way; .../aes-neonbs-core.S: Assembler messages: .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10' .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9' .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8' .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7' Fix this by setting the element size explicitly, by replacing vmov with vmov.32. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Jan 13, 2017
-
-
Ard Biesheuvel authored
The ARMv8-M architecture introduces 'tt' and 'ttt' instructions, which means we can no longer use 'tt' as a register alias on recent versions of binutils for ARM. So replace the alias with 'ttab'. Fixes: 81edb426 ("crypto: arm/aes - replace scalar AES cipher") Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
This replaces the unwieldy generated implementation of bit-sliced AES in CBC/CTR/XTS modes that originated in the OpenSSL project with a new version that is heavily based on the OpenSSL implementation, but has a number of advantages over the old version: - it does not rely on the scalar AES cipher that also originated in the OpenSSL project and contains redundant lookup tables and key schedule generation routines (which we already have in crypto/aes_generic.) - it uses the same expanded key schedule for encryption and decryption, reducing the size of the per-key data structure by 1696 bytes - it adds an implementation of AES in ECB mode, which can be wrapped by other generic chaining mode implementations - it moves the handling of corner cases that are non critical to performance to the glue layer written in C - it was written directly in assembler rather than generated from a Perl script Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Jan 12, 2017
-
-
Ard Biesheuvel authored
This replaces the scalar AES cipher that originates in the OpenSSL project with a new implementation that is ~15% (*) faster (on modern cores), and reuses the lookup tables and the key schedule generation routines from the generic C implementation (which is usually compiled in anyway due to networking and other subsystems depending on it). Note that the bit sliced NEON code for AES still depends on the scalar cipher that this patch replaces, so it is not removed entirely yet. * On Cortex-A57, the performance increases from 17.0 to 14.9 cycles per byte for 128-bit keys. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Dec 28, 2016
-
-
Herbert Xu authored
This patch reverts the following commits: 8621caa0 80966672 I should not have applied them because they had already been obsoleted by a subsequent patch series. They also cause a build failure because of the subsequent commit 9ae433bc. Fixes: 9ae433bc ("crypto: chacha20 - convert generic and...") Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Dec 27, 2016
-
-
Ard Biesheuvel authored
This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Dec 07, 2016
-
-
Ard Biesheuvel authored
This is a combination of the the Intel algorithm implemented using SSE and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in version 8 of the architecture. Two versions of the above combo are provided, one for CRC32 and one for CRC32C. The PMULL/NEON algorithm is faster, but operates on blocks of at least 64 bytes, and on multiples of 16 bytes only. For the remaining input, or for all input on systems that lack the PMULL 64x64->128 instructions, the CRC32 instructions will be used. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16 byte aligned (but of any size) Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Dec 01, 2016
-
-
Herbert Xu authored
The variable aes_simd_algs should be static. In fact if it isn't it causes build errors when multiple copies of aes-ce-glue.c are built into the kernel. Fixes: da40e7a4 ("crypto: aes-ce - Convert to skcipher") Reported-by:
kbuild test robot <fengguang.wu@intel.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Nov 30, 2016
-
-
Ard Biesheuvel authored
The CBC encryption routine should use the encryption round keys, not the decryption round keys. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Herbert Xu authored
This patch adds one more missing SIMD select for AES_ARM_BS. It also changes selects on ALGAPI to BLKCIPHER. Fixes: 211f41af ("crypto: aesbs - Convert to skcipher") Reported-by:
Horia Geantă <horia.geanta@nxp.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Nov 29, 2016
-
-
Herbert Xu authored
The skcipher conversion for ARM missed the select on CRYPTO_SIMD, causing build failures if SIMD was not otherwise enabled. Fixes: da40e7a4 ("crypto: aes-ce - Convert to skcipher") Fixes: 211f41af ("crypto: aesbs - Convert to skcipher") Reported-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Nov 28, 2016
-
-
Herbert Xu authored
This patch converts aesbs over to the skcipher interface. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Herbert Xu authored
This patch converts aes-ce over to the skcipher interface. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Oct 21, 2016
-
-
Ard Biesheuvel authored
The AES key schedule generation is mostly endian agnostic, with the exception of the rotation and the incorporation of the round constant at the start of each round. So implement a big endian specific version of that part to make the whole routine big endian compatible. Fixes: 86464859 ("crypto: arm - AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions") Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Sep 13, 2016
-
-
Ard Biesheuvel authored
The AES-CTR glue code avoids calling into the blkcipher API for the tail portion of the walk, by comparing the remainder of walk.nbytes modulo AES_BLOCK_SIZE with the residual nbytes, and jumping straight into the tail processing block if they are equal. This tail processing block checks whether nbytes != 0, and does nothing otherwise. However, in case of an allocation failure in the blkcipher layer, we may enter this code with walk.nbytes == 0, while nbytes > 0. In this case, we should not dereference the source and destination pointers, since they may be NULL. So instead of checking for nbytes != 0, check for (walk.nbytes % AES_BLOCK_SIZE) != 0, which implies the former in non-error conditions. Fixes: 86464859 ("crypto: arm - AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions") Cc: stable@vger.kernel.org Reported-by:
xiakaixu <xiakaixu@huawei.com> Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Sep 07, 2016
-
-
Ard Biesheuvel authored
The fact that the internal synchrous hash implementation is called "ghash" like the publicly visible one is causing the testmgr code to misidentify it as an algorithm that requires testing at boottime. So rename it to "__ghash" to prevent this. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Since commit 8996eafd ("crypto: ahash - ensure statesize is non-zero"), all ahash drivers are required to implement import()/export(), and must have a non-zero statesize. Fix this for the ARM Crypto Extensions GHASH implementation. Fixes: 8996eafd ("crypto: ahash - ensure statesize is non-zero") Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
The ARMv7 NEON module is explicitly built in ARM mode, which is not supported by the Thumb2 kernel. So remove the explicit override, and leave it up to the build environment to decide whether the core SHA1 routines are assembled as ARM or as Thumb2 code. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Jun 23, 2016
-
-
Herbert Xu authored
This patch fixes an old bug where requests can be reordered because some are processed by cryptd while others are processed directly in softirq context. The fix is to always postpone to cryptd if there are currently requests outstanding from the same tfm. This patch also removes the redundant use of cryptd in the async init function as init never touches the FPU. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Feb 17, 2016
-
-
Stephan Mueller authored
Commit 28856a9e missed the addition of the crypto/xts.h include file for different architecture-specific AES implementations. Signed-off-by:
Stephan Mueller <smueller@chronox.de> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Feb 16, 2016
-
-
Stephan Mueller authored
The patch centralizes the XTS key check logic into the service function xts_check_key which is invoked from the different XTS implementations. With this, the XTS implementations in ARM, ARM64, PPC and S390 have now a sanity check for the XTS keys similar to the other arches. In addition, this service function received a check to ensure that the key != the tweak key which is mandated by FIPS 140-2 IG A.9. As the check is not present in the standards defining XTS, it is only enforced in FIPS mode of the kernel. Signed-off-by:
Stephan Mueller <smueller@chronox.de> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- Feb 15, 2016
-
-
Jeremy Linton authored
ECB modes don't use an initialization vector. The kernel /proc/crypto interface doesn't reflect this properly. Acked-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Jeremy Linton <jeremy.linton@arm.com> Signed-off-by:
Will Deacon <will.deacon@arm.com>
-
- Jul 06, 2015
-
-
Baruch Siach authored
These files are generated since commits f2f770d7 (crypto: arm/sha256 - Add optimized SHA-256/224, 2015-04-03) and c80ae7ca (crypto: arm/sha512 - accelerated SHA-512 using ARM generic ASM and NEON, 2015-05-08). Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Baruch Siach <baruch@tkos.co.il> Acked-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-
- May 11, 2015
-
-
Ard Biesheuvel authored
This trims off a couple of instructions of the total size of the core AES transform by reordering the final branch in the AES-192 code path with the rounds that are performed regardless of whether the branch is taken or not. Other than the slight size reduction, this has no performance benefit. Fix up a comment regarding the prototype of this function while we're at it. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
-