Carry-less product

The carry-less product of two binary numbers is the result of carry-less multiplication of these numbers. This operation conceptually works like long multiplication except for the fact that the carry is discarded instead of applied to the more significant position. It can be used to model operations over finite fields, in particular multiplication of polynomials from GF(2)[X], the polynomial ring over GF(2).
The operation is also known as an XOR multiplication, as carry-discarding addition is equivalent to an exclusive or.
Definition
Given two numbers and , with denoting the bits of these numbers, the carry-less product of these two numbers is defined to be , with each bit computed as the exclusive or of products of bits from the input numbers as follows:[1]
Example
Consider a = 101000102 and b = 100101102, with all numbers given in binary. Then the carry-less multiplication of these is essentially what one would get from performing a long multiplication but ignoring the carries.
1 0 1 0 0 0 1 0 = a
---------------|---|-------|--
1 0 0 1 0 1 1 0|0 0 0 0 0 0 0
1 0 0 1 0 1 1 0|0 0 0 0 0
1 0 0 1 0 1 1 0|0
------------------------------
1 0 1 1 0 0 0 1 1 1 0 1 1 0 0
^ ^
For every logic 1 bit in the number a, the number b is shifted to the left
as many bits as indicated by the position these bits a.
All these shifted versions are then "added" by using an XOR instead of the regular binary addition used in regular long multiplication.
Results of XOR are indicated by ^ . In full binary adder these would result in a carry to the column to the left. Here this does not happen so the carry-less product of a and b would be c = 1011000111011002.
Multiplication of polynomials
The carry-less product can also be seen as a multiplication of polynomials over the field GF(2). This is because the exclusive or corresponds to the addition in this field.
In the example above, the numbers a and b correspond to polynomials
and the product of these is
which is what the number c computed above encodes. Notice how and thanks to the arithmetic in GF(2). This corresponds to the columns marked ^ in the example.
Applications
The elements of GF(2n), i.e. a finite field whose order is a power of two, are usually represented as polynomials in GF(2)[X]. Multiplication of two such field elements consists of multiplication of the corresponding polynomials, followed by a reduction with respect to some irreducible polynomial which is taken from the construction of the field. If the polynomials are encoded as binary numbers, carry-less multiplication can be used to perform the first step of this computation.
Such fields have applications in cryptography and for some checksum algorithms.
Implementations
Many modern CPU architectures support various extensions that provide hardware support for some form of carry-less multiplication. This support may take the form of either vector or scalar instructions.
Listed below are some instruction set extensions for carry-less multiplication in various CPU architectures.
| CPU architecture | Instruction set extension | Instruction mnemonic | Multiplication size | Implementations |
|---|---|---|---|---|
| Intel x86 | PCLMULQDQ (scalar) | PCLMULQDQ |
64x64→128 | Intel Core/Xeon (Westmere and later), Intel Atom (Goldmont and later) AMD Bulldozer/Jaguar/Zen, Zhaoxin ZX-C and later |
| VPCLMULQDQ (vector) | VPCLMULQDQ |
64x64→128 (vector) | Intel Core/Xeon (Ice Lake/Rocket Lake and later) AMD Zen 3 and later | |
| ARM (32-bit) | NEON | VMUL.P8 |
8x8→8 (vector, low half) | ARM Cortex-A8 and later |
VMULL.P8 |
8x8→16 (vector) | |||
| ARMv8 Cryptographic Extensions |
VMULL.P64 |
64x64→128 | ARM Cortex-A57 and later, Apple Cyclone and later | |
| ARM (64-bit) | NEON | PMUL |
8x8→8 (vector, low half) | |
PMULLPMULL2 |
8x8→16 (vector) | |||
| ARMv8 Cryptographic Extensions |
PMULL,PMULL2 |
64x64→128 | ||
| Power ISA | Power ISA v2.07[2] | VPMSUMB | 8x8→16 (vector) | IBM POWER8 and later |
| VPMSUMH | 16x16→32 (vector) | |||
| VPMSUMW | 32x32→64 (vector) | |||
| VPMSUMD | 64x64→128 | |||
| RISC-V | Zbc (scalar bit-manipulation) | CLMUL,CLMULH,CLMULR[3]
|
XLENxXLEN→XLEN (32x32 in 32-bit mode, 64x64 in 64-bit mode, each instruction returns half of the result) |
Many, e.g.: |
| Zvbc (vector bit-manipulation) | VCLMUL,VCLMULH[3]
|
64x64→64 (vector; low or high half) | SpacemiT K3[8] | |
| SPARC | VIS 3[9] | XMULX,XMULXHI |
64x64→64 (low or high half) | SPARC T4 and later |
| z/Architecture | Vector Facility[10] | VGFM,VGFMA
|
8x8→16 (vector), 16x16→32 (vector), 32x32→64 (vector), 64x64→128 |
IBM z13 and later |
For other targets it is possible to implement the computation above as a software algorithm, and many cryptography libraries will contain an implementation as part of their finite field arithmetic operations.
For wide carry-less multiplications, it is possible to adapt fast integer multiplication algorithms such as the Karatsuba and Toom-Cook algorithms to work with carry-less multiplications.[11][12]
Other bases
The definition of a carry-less product as the result of a long multiplication discarding carry would readily apply to bases other than 2. But the result depends on the basis, which is therefore an essential part of the operation. As this operation is typically being used on computers operating in binary, the binary form discussed above is the one employed in practice.
Polynomials over other finite fields of prime order do have applications, but treating the coefficients of such a polynomial as the digits of a single number is rather uncommon, so the multiplication of such polynomials would not be seen as a carry-less multiplication of numbers.
See also
- CLMUL instruction set, an x86 ISA extension
- Finite field arithmetic
- Galois/Counter Mode
References
- ↑ Shay Gueron (2011-04-13). "Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode - Rev 2". Intel. http://software.intel.com/en-us/articles/intel-carry-less-multiplication-instruction-and-its-usage-for-computing-the-gcm-mode/.
- ↑ IBM, Power ISA, Version 2.07, 3 May 2013, page 259. Archived on 7 Jan 2017.
- ↑ 3.0 3.1 RISC-V International, The RISC-V Instruction Set Manual, Volume I, version 20260120 - see section 30.4 on page 225 for the Zbc extension and section 33.2.2 on page 477 for the Zvbc extension. Archived on 14 Apr 2026.
- ↑ RISC-V International, XuanTie C908: High-performance RISC-V Processor Catered to AIoT Industry, Nov 3, 2022
- ↑ OpenHW Group, RVZbc: Carry-less multiplication
- ↑ AndesTech, AndesCore A25
- ↑ AMD, AMD MicroBlaze V processor
- ↑ SpacemiT, SpacemiT K3: A RVA23 RISC-V AI CPU with 60 TOPS AI Compute, 29 Jan 2026. Archived on 29 Apr 2026.
- ↑ Oracle, Oracle SPARC Architecture 2011, 12 Jan 2016, section 7.143 on page 362. Archived on 13 Dec 2025.
- ↑ IBM, z/Architecture Principles of Operation, ref no. SA22-7832-14, Apr 2025, see page 1700. Archived on 11 Feb 2026.
- ↑ ean-Marc Robert and Pascal Véron, Faster Multiplication over F2(X) using AVX512 instruction set and VPCLMULQDQ instruction, 25 Jan 2022, see appendices A and D for carry-less Karatsuba and Toom-Cook algorithms. Archived on 11 Feb 2026.
- ↑ Inria, gf2x library, see toom-gpl.c (archive) file for a C/C++ implementation of carryless Toom-Cook.
