Carry-less product

The carry-less product of two binary numbers is the result of carry-less multiplication of these numbers. This operation conceptually works like long multiplication except for the fact that the carry is discarded instead of applied to the more significant position. It can be used to model operations over finite fields, in particular multiplication of polynomials from GF(2)[X], the polynomial ring over GF(2).

The operation is also known as an XOR multiplication, as carry-discarding addition is equivalent to an exclusive or.

Definition

Given two numbers $a = \sum_{i} a_{i} 2^{i}$ and $b = \sum_{i} b_{i} 2^{i}$ , with $a_{i}, b_{i} \in {0, 1}$ denoting the bits of these numbers, the carry-less product of these two numbers is defined to be $c = \sum_{i} c_{i} 2^{i}$ , with each bit $c_{i}$ computed as the exclusive or of products of bits from the input numbers as follows:^[1]

c_{i} = ⨁_{j = 0}^{i} a_{j} b_{i - j}

Example

Consider a = 10100010₂ and b = 10010110₂, with all numbers given in binary. Then the carry-less multiplication of these is essentially what one would get from performing a long multiplication but ignoring the carries.

                  1 0 1 0 0 0 1 0 = a
   ---------------|---|-------|--
   1 0 0 1 0 1 1 0|0 0 0 0 0 0 0
       1 0 0 1 0 1 1 0|0 0 0 0 0
               1 0 0 1 0 1 1 0|0
   ------------------------------
   1 0 1 1 0 0 0 1 1 1 0 1 1 0 0
             ^ ^

For every logic 1 bit in the number a, the number b is shifted to the left as many bits as indicated by the position these bits a. All these shifted versions are then "added" by using an XOR instead of the regular binary addition used in regular long multiplication. Results of XOR are indicated by ^ . In full binary adder these would result in a carry to the column to the left. Here this does not happen so the carry-less product of a and b would be c = 101100011101100₂.

Multiplication of polynomials

The carry-less product can also be seen as a multiplication of polynomials over the field GF(2). This is because the exclusive or corresponds to the addition in this field.

In the example above, the numbers a and b correspond to polynomials

A = \sum_{i} a_{i} X^{i} = X^{7} + X^{5} + X^{1} B = \sum_{i} b_{i} X^{i} = X^{7} + X^{4} + X^{2} + X^{1}

and the product of these is

C = A \cdot B = \sum_{i} c_{i} X^{i} = X^{14} + X^{12} + X^{11} + X^{7} + X^{6} + X^{5} + X^{3} + X^{2}

which is what the number c computed above encodes. Notice how $(X^{7} \cdot X^{1}) + (X^{1} \cdot X^{7}) \equiv 0$ and $(X^{7} \cdot X^{2}) + (X^{5} \cdot X^{4}) \equiv 0$ thanks to the arithmetic in GF(2). This corresponds to the columns marked ^ in the example.

Applications

The elements of GF(2ⁿ), i.e. a finite field whose order is a power of two, are usually represented as polynomials in GF(2)[X]. Multiplication of two such field elements consists of multiplication of the corresponding polynomials, followed by a reduction with respect to some irreducible polynomial which is taken from the construction of the field. If the polynomials are encoded as binary numbers, carry-less multiplication can be used to perform the first step of this computation.

Such fields have applications in cryptography and for some checksum algorithms.

Implementations

Many modern CPU architectures support various extensions that provide hardware support for some form of carry-less multiplication. This support may take the form of either vector or scalar instructions.

Listed below are some instruction set extensions for carry-less multiplication in various CPU architectures.

CPU architecture	Instruction set extension	Instruction mnemonic	Multiplication size	Implementations
Intel x86	PCLMULQDQ (scalar)	`PCLMULQDQ`	64x64→128	Intel Core/Xeon (Westmere and later), Intel Atom (Goldmont and later) AMD Bulldozer/Jaguar/Zen, Zhaoxin ZX-C and later
Intel x86	VPCLMULQDQ (vector)	`VPCLMULQDQ`	64x64→128 (vector)	Intel Core/Xeon (Ice Lake/Rocket Lake and later) AMD Zen 3 and later

ARM (32-bit)	NEON	`VMUL.P8`	8x8→8 (vector, low half)	ARM Cortex-A8 and later
	NEON	`VMULL.P8`	8x8→16 (vector)	ARM Cortex-A8 and later
	ARMv8 Cryptographic Extensions	`VMULL.P64`	64x64→128	ARM Cortex-A57 and later, Apple Cyclone and later
ARM (64-bit)	NEON	`PMUL`	8x8→8 (vector, low half)
	NEON	`PMULL` `PMULL2`	8x8→16 (vector)
	ARMv8 Cryptographic Extensions	`PMULL`, `PMULL2`	64x64→128

Power ISA	Power ISA v2.07^[2]	VPMSUMB	8x8→16 (vector)	IBM POWER8 and later
		VPMSUMH	16x16→32 (vector)
		VPMSUMW	32x32→64 (vector)
		VPMSUMD	64x64→128

RISC-V	Zbc (scalar bit-manipulation)	`CLMUL`, `CLMULH`, `CLMULR`^[3]	XLENxXLEN→XLEN (32x32 in 32-bit mode, 64x64 in 64-bit mode, each instruction returns half of the result)	Many, e.g.: Alibaba XuanTie C908^[4] OpenHW Group CVA6^[5] Andes Technology AndesCore A25^[6] AMD MicroBlaze V^[7]
RISC-V	Zvbc (vector bit-manipulation)	`VCLMUL`, `VCLMULH`^[3]	64x64→64 (vector; low or high half)	SpacemiT K3^[8]

SPARC	VIS 3^[9]	`XMULX`, `XMULXHI`	64x64→64 (low or high half)	SPARC T4 and later

z/Architecture	Vector Facility^[10]	`VGFM`, `VGFMA`	8x8→16 (vector), 16x16→32 (vector), 32x32→64 (vector), 64x64→128	IBM z13 and later

For other targets it is possible to implement the computation above as a software algorithm, and many cryptography libraries will contain an implementation as part of their finite field arithmetic operations.

For wide carry-less multiplications, it is possible to adapt fast integer multiplication algorithms such as the Karatsuba and Toom-Cook algorithms to work with carry-less multiplications.^[11]^[12]

Other bases

The definition of a carry-less product as the result of a long multiplication discarding carry would readily apply to bases other than 2. But the result depends on the basis, which is therefore an essential part of the operation. As this operation is typically being used on computers operating in binary, the binary form discussed above is the one employed in practice.

Polynomials over other finite fields of prime order do have applications, but treating the coefficients of such a polynomial as the digits of a single number is rather uncommon, so the multiplication of such polynomials would not be seen as a carry-less multiplication of numbers.

References

↑ Shay Gueron (2011-04-13). "Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode - Rev 2". Intel. http://software.intel.com/en-us/articles/intel-carry-less-multiplication-instruction-and-its-usage-for-computing-the-gcm-mode/.
↑ IBM, Power ISA, Version 2.07, 3 May 2013, page 259. Archived on 7 Jan 2017.
↑ ^3.0 ^3.1 RISC-V International, The RISC-V Instruction Set Manual, Volume I, version 20260120 - see section 30.4 on page 225 for the Zbc extension and section 33.2.2 on page 477 for the Zvbc extension. Archived on 14 Apr 2026.
↑ RISC-V International, XuanTie C908: High-performance RISC-V Processor Catered to AIoT Industry, Nov 3, 2022
↑ OpenHW Group, RVZbc: Carry-less multiplication
↑ AndesTech, AndesCore A25
↑ AMD, AMD MicroBlaze V processor
↑ SpacemiT, SpacemiT K3: A RVA23 RISC-V AI CPU with 60 TOPS AI Compute, 29 Jan 2026. Archived on 29 Apr 2026.
↑ Oracle, Oracle SPARC Architecture 2011, 12 Jan 2016, section 7.143 on page 362. Archived on 13 Dec 2025.
↑ IBM, z/Architecture Principles of Operation, ref no. SA22-7832-14, Apr 2025, see page 1700. Archived on 11 Feb 2026.
↑ ean-Marc Robert and Pascal Véron, Faster Multiplication over F2(X) using AVX512 instruction set and VPCLMULQDQ instruction, 25 Jan 2022, see appendices A and D for carry-less Karatsuba and Toom-Cook algorithms. Archived on 11 Feb 2026.
↑ Inria, gf2x library, see toom-gpl.c (archive) file for a C/C++ implementation of carryless Toom-Cook.

0.00

(0 votes)

Original source: https://en.wikipedia.org/wiki/Carry-less product. Read more

[1] Shay Gueron (2011-04-13). "Intel Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode - Rev 2". Intel. http://software.intel.com/en-us/articles/intel-carry-less-multiplication-instruction-and-its-usage-for-computing-the-gcm-mode/.

[2] IBM, Power ISA, Version 2.07, 3 May 2013, page 259. Archived on 7 Jan 2017.

[riscv-ref-2026-3] 3.0 ^3.1 RISC-V International, The RISC-V Instruction Set Manual, Volume I, version 20260120 - see section 30.4 on page 225 for the Zbc extension and section 33.2.2 on page 477 for the Zvbc extension. Archived on 14 Apr 2026.

[4] RISC-V International, XuanTie C908: High-performance RISC-V Processor Catered to AIoT Industry, Nov 3, 2022

[5] OpenHW Group, RVZbc: Carry-less multiplication

[6] AndesTech, AndesCore A25

[7] AMD, AMD MicroBlaze V processor

[8] SpacemiT, SpacemiT K3: A RVA23 RISC-V AI CPU with 60 TOPS AI Compute, 29 Jan 2026. Archived on 29 Apr 2026.

[9] Oracle, Oracle SPARC Architecture 2011, 12 Jan 2016, section 7.143 on page 362. Archived on 13 Dec 2025.

[10] IBM, z/Architecture Principles of Operation, ref no. SA22-7832-14, Apr 2025, see page 1700. Archived on 11 Feb 2026.

[11] -Marc Robert and Pascal Véron, Faster Multiplication over F2(X) using AVX512 instruction set and VPCLMULQDQ instruction, 25 Jan 2022, see appendices A and D for carry-less Karatsuba and Toom-Cook algorithms. Archived on 11 Feb 2026.

[12] Inria, gf2x library, see toom-gpl.c (archive) file for a C/C++ implementation of carryless Toom-Cook.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

Anonymous

Search

Carry-less product

Namespaces

More

Page actions

Contents

Definition

Example

Multiplication of polynomials

Applications

Implementations

Other bases

See also

References

Navigation

Navigation

Resources

Help

googletranslator

Navigation

Wiki tools

Wiki tools

Anonymous

Search

Carry-less product

Definition

Example

Multiplication of polynomials

Applications

Implementations

Other bases

See also

References

Navigation

Wiki tools

Page tools

Other projects

Categories