Dixon's factorization method

From HandWiki

In number theory, Dixon's factorization method (also Dixon's random squares method[1] or Dixon's algorithm) is a general-purpose integer factorization algorithm; it is the prototypical factor base method. Unlike for other factor base methods, its run-time bound comes with a rigorous proof that does not rely on conjectures about the smoothness properties of the values taken by a polynomial. The algorithm was designed by John D. Dixon, a mathematician at Carleton University, and was published in 1981.[2]

Basic idea

Dixon's method is based on finding a congruence of squares modulo the integer N which is intended to factor. Fermat's factorization method finds such a congruence by selecting random or pseudo-random x values and hoping that the integer x2 mod N is a perfect square (in the integers):

[math]\displaystyle{ x^2\equiv y^2\quad(\hbox{mod }N),\qquad x\not\equiv\pm y\quad(\hbox{mod }N). }[/math]

For example, if N = 84923, (by starting at 292, the first number greater than N and counting up) the 5052 mod 84923 is 256, the square of 16. So (505 − 16)(505 + 16) = 0 mod 84923. Computing the greatest common divisor of 505 − 16 and N using Euclid's algorithm gives 163, which is a factor of N.

In practice, selecting random x values will take an impractically long time to find a congruence of squares, since there are only N squares less than N.

Dixon's method replaces the condition "is the square of an integer" with the much weaker one "has only small prime factors"; for example, there are 292 squares smaller than 84923; 662 numbers smaller than 84923 whose prime factors are only 2,3,5 or 7; and 4767 whose prime factors are all less than 30. (Such numbers are called B-smooth with respect to some bound B.)

If there are many numbers [math]\displaystyle{ a_1 \ldots a_n }[/math] whose squares can be factorized as [math]\displaystyle{ a_i^2 \mod N = \prod_{j=1}^m b_j^{e_{ij}} }[/math] for a fixed set [math]\displaystyle{ b_1 \ldots b_m }[/math] of small primes, linear algebra modulo 2 on the matrix [math]\displaystyle{ e_{ij} }[/math] will give a subset of the [math]\displaystyle{ a_i }[/math] whose squares combine to a product of small primes to an even power — that is, a subset of the [math]\displaystyle{ a_i }[/math] whose squares multiply to the square of a (hopefully different) number mod N.

Method

Suppose the composite number N is being factored. Bound B is chosen, and the factor base is identified (which is called P), the set of all primes less than or equal to B. Next, positive integers z are sought such that z2 mod N is B-smooth. Therefore we can write, for suitable exponents ai,

[math]\displaystyle{ z^2 \text{ mod } N = \prod_{p_i\in P} p_i^{a_i} }[/math]

When enough of these relations have been generated (it is generally sufficient that the number of relations be a few more than the size of P), the methods of linear algebra, such as Gaussian elimination, can be used to multiply together these various relations in such a way that the exponents of the primes on the right-hand side are all even:

[math]\displaystyle{ {z_1^2 z_2^2 \cdots z_k^2 \equiv \prod_{p_i\in P} p_i^{a_{i,1}+a_{i,2}+\cdots+a_{i,k}}\ \pmod{N}\quad (\text{where } a_{i,1}+a_{i,2}+\cdots+a_{i,k} \equiv 0\pmod{2}) } }[/math]

This yields a congruence of squares of the form a2b2 (mod N), which can be turned into a factorization of N, N = gcd(a + b, N) × (N/gcd(a + b, N)). This factorization might turn out to be trivial (i.e. N = N × 1), which can only happen if a ≡ ±b (mod N), in which case another try must be made with a different combination of relations; but if a nontrivial pair of factors of N is reached, the algorithm terminates.

Pseudocode

input: positive integer [math]\displaystyle{ N }[/math]
output: non-trivial factor of [math]\displaystyle{ N }[/math]

Choose bound [math]\displaystyle{ B }[/math]
Let [math]\displaystyle{ P := \{p_1, p_2, \ldots, p_k\} }[/math] be all primes [math]\displaystyle{ \leq B }[/math]

repeat
    for [math]\displaystyle{ i=1 }[/math] to [math]\displaystyle{ k+1 }[/math] do
        Choose [math]\displaystyle{ 0\lt z_i\lt N }[/math] such that [math]\displaystyle{ z_i^2 \text{ mod } N }[/math] is [math]\displaystyle{ B }[/math]-smooth
        Let [math]\displaystyle{ a_i := \{a_{i1}, a_{i2}, \ldots, a_{ik}\} }[/math] such that [math]\displaystyle{ z_i^2 \text{ mod } N = \prod_{p_j \in P} p_j^{a_{ij}} }[/math]
    end for

    Find non-empty [math]\displaystyle{ T \subseteq \{1, 2, \ldots, k+1\} }[/math] such that [math]\displaystyle{ \sum_{i \in T} a_i \equiv \vec{0} \pmod{2} }[/math]
    Let [math]\displaystyle{ x := \left(\prod_{i \in T} z_i\right) \text{ mod } N }[/math]
        [math]\displaystyle{ y := \left(\prod_{p_j \in P} p_j^{\left(\sum_{i \in T} a_{ij}\right)/2} \right) \text{ mod } N }[/math]
while [math]\displaystyle{ x \equiv \pm y \pmod{N} }[/math] 

return [math]\displaystyle{ \gcd(x+y, N) }[/math]

Example

This example will try to factor N = 84923 using bound B = 7. The factor base is then P = {2, 3, 5, 7}. A search can be made for integers between [math]\displaystyle{ \left\lceil\sqrt{84923} \right\rceil = 292 }[/math] and N whose squares mod N are B-smooth. Suppose that two of the numbers found are 513 and 537:

[math]\displaystyle{ 513^2 \mod 84923 = 8400 = 2^4 \cdot 3 \cdot 5^2 \cdot 7 }[/math]
[math]\displaystyle{ 537^2 \mod 84923 = 33600 = 2^6 \cdot 3 \cdot 5^2 \cdot 7 }[/math]

So

[math]\displaystyle{ (513 \cdot 537)^2 \mod 84923 = 2^{10} \cdot 3^2 \cdot 5^4 \cdot 7^2 \mod 84923 }[/math]

Then

[math]\displaystyle{ \begin{align} & {} (513 \cdot 537)^2 \mod 84923 \\ & = (275481)^2 \mod 84923 \\ & = (84923 \cdot 3 + 20712)^2 \mod 84923 \\ & =(84923 \cdot 3)^2 + 2\cdot(84923\cdot 3 \cdot 20712) + 20712^2 \mod 84923 \\ & = 0 + 0 + 20712^2 \mod 84923 \end{align} }[/math]

That is, [math]\displaystyle{ 20712^2 \mod 84923 = (2^5 \cdot 3 \cdot 5^2 \cdot 7)^2 \mod 84923 = 16800^2 \mod 84923. }[/math]

The resulting factorization is 84923 = gcd(20712 − 16800, 84923) × gcd(20712 + 16800, 84923) = 163 × 521.

Optimizations

The quadratic sieve is an optimization of Dixon's method. It selects values of x close to the square root of N such that x2 modulo N is small, thereby largely increasing the chance of obtaining a smooth number.

Other ways to optimize Dixon's method include using a better algorithm to solve the matrix equation, taking advantage of the sparsity of the matrix: a number z cannot have more than [math]\displaystyle{ \log_2 z }[/math] factors, so each row of the matrix is almost all zeros. In practice, the block Lanczos algorithm is often used. Also, the size of the factor base must be chosen carefully: if it is too small, it will be difficult to find numbers that factorize completely over it, and if it is too large, more relations will have to be collected.

A more sophisticated analysis, using the approximation that a number has all its prime factors less than [math]\displaystyle{ N^{1/a} }[/math] with probability about [math]\displaystyle{ a^{-a} }[/math] (an approximation to the Dickman–de Bruijn function), indicates that choosing too small a factor base is much worse than too large, and that the ideal factor base size is some power of [math]\displaystyle{ \exp\left(\sqrt{\log N \log \log N}\right) }[/math].

The optimal complexity of Dixon's method is

[math]\displaystyle{ O\left(\exp\left(2 \sqrt 2 \sqrt{\log n \log \log n}\right)\right) }[/math]

in big-O notation, or

[math]\displaystyle{ L_n [1/2, 2 \sqrt 2] }[/math]

in L-notation.

References

  1. Kleinjung, Thorsten (2010). "Factorization of a 768-Bit RSA Modulus". Advances in Cryptology – CRYPTO 2010. Lecture Notes in Computer Science. 6223. pp. 333–350. doi:10.1007/978-3-642-14623-7_18. ISBN 978-3-642-14622-0. 
  2. Dixon, J. D. (1981). "Asymptotically fast factorization of integers". Math. Comp. 36 (153): 255–260. doi:10.1090/S0025-5718-1981-0595059-1.