Schonhage Strassen

M. Mlist,
Victoria, Australia

September 2021

Abstract The Schonhage-Strassen algorithm for multiplying large integers makes use of the Fast Fourier Transform (FFT) algorithm which performs the Discrete Fourier Transform (DFT). An interactive demonstration of these algorithms is given.

1. Introduction

The Schonhage-Strassen algorithm [SS71] for multiplying two large integers, m and n, converts the integers into vectors of "digits", applies the Fast Fourier Transform (FFT) algorithm (which performs the Discrete Fourier Transform (DFT)) to the vectors, multiplies corresponding elements of the transformed vectors, applies to inverse FFT algorithm to the result, and lastly propagates carries to recover the digits of the product, m×n. If m and n both have L digits the Schonhage-Strassen algorithm has time-complexity O(L log(L) log(log(L))). Table 1 gives a toy interactive demonstration of these algorithms. It operates in base ten for ease of checking and it omits the recursive step in SS(M,N) but the javascript shows where.

2. Multiplication

Long-multiplication (fig. 1) of multi-digit numbers is taught in school. It involves multiplying pairs of digits, adding up columns, and propagating "carries" from right to left in each row and also in the final total.

        1 2 3 4    m x n = 1234 x 5678
      x 5 6 7 8
  --------------.   ^
  6 1 7 0       |  5|
    7 4 0 4     |  6|
      8 6 3 8   |  7|
        9 8 7 2 |  8|
  -------------
  7 0 0 6 6 5 2

Figure 1: Long-Multiplication.

In the final total, 2=2 carry 0, 8+7+0=5 carry 1, and so on.

It makes no difference if we process n in reverse.

        1 2 3 4    m x n = 1234 x 5678
      x 5 6 7 8
  --------------.
        9 8 7 2 |  8|
      8 6 3 8   |  7|
    7 4 0 4     |  6|
  6 1 7 0       |  5|
  -------------     v
  7 0 0 6 6 5 2

Figure 2: Flip n.

The working of long-multiplication can be laid out in another slightly different but equivalent way (fig. 3) which involves adding up along diagonals rather than columns.

       1   2   3   4    m x n = 1234 x 5678
     .<-------------
   8 | 9   8   7   2
     |               .
   7 | 8   6   3   8   .
     |               .   .
   6 | 7   4   0   4   .   .
     |               .   .   2
   5 | 6   1   7   0   .   5
     |   .   .   .   .   6
     v     .   .   .   6
             .   .   0
               .   0
                 7

   1234 x 5678 = 7006652

Figure 3: Another Angle on Long-Multiplication.

This suggests the convolution of two vectors (sec. 3).

3. Convolutions

The (linear-) convolution of two vectors mm and nn, both of length L, can be visualised by writing the vectors as in fig. 4, performing multiplications of elements, mm_i×nn_j, and adding up the diagonals to give a vector of length 2L−1. This is just like long-multiplication but without propagating carries from one diagonal (column) to the next in the final result.

       1   2   3   4    1,2,3,4 conv 5,6,7,8
       <------------
   8 | 8  16  24  32
     |               .
   7 | 7  14  21  28   .
     |               .   .
   6 | 6  12  18  24   .   .
     |               .   .  32
   5 | 5  10  15  20   .  52
     |   .   .   .   .  61
     v     .   .   .  60
             .   .  34
               .  16
                 5

  5,16,34,60,61,52,32 >carries> 7,0,0,6,6,5,2

Figure 4: Linear Convolution

If mm and nn are the digits of integers m and n respectively, the elements of the convolution are larger than the digits of m and n but if "carries" are then propagated along the convolution, e.g., 32=2 carry 3, 52+3=55=5 carry 5, and so on, then we recover the digits of their product. (The product may be of length 2L−1 or 2L.)

The convolution theorem states that
conv(u,v) = F⁻¹(F(u).F(v))
where F is the Discrete Fourier Transform, F⁻¹ is the inverse transform, and '.' is element-wise multiplication, (F(u))_i×(F(v))_i. If a vector v has n elements, the Fast Fourier Algorithm (FFT) (see below) computes the Discrete Fourier Transform of v in O(n log n) steps.

4. DFT and FFT

An integer, e.g., n=d₃d₂d₁d₀ can be thought of as a polynomial, e.g., f(x)=d₃x³+d₂x²+d₁x+d₀. A polynomial of degree L−1 is also uniquely defined by its values at L, or more, distinct points and, taken together, these values form an alternative representation of the polynomial; it is quick to multiply two polynomials represented in this way on a common set of points. The product of two polynomials each of degree L−1 is of degree 2(L−1) and is uniquely defined by values at 2L−1 points (so 2L points more than suffice).

The Karatsuba (O(d^log₂(3)) time) and Toom3 (O(d^log₃(5)) time) algorithms for integer multiplication can be constructed by considering the values of the polynomials derived from integers at certain values – at −1, 0 and +1 for Karatsuba and at 0, +1, −1, −2 and "∞" for Toom3 [All21]. In comparison, a Discrete Fourier Transform is made of the values of a polynomial at powers of a primitive^† L^th root of unity where L is a power of two (if a given L is not a power of two it is rounded up to a power, e.g., 5→8, and the integer/polynomial padded with leading zeroes).

The Fast Fourier Transform (FFT) algorithm [CT65] performs the Discrete Fourier Transform (DFT) of a vector, quickly.

E.g., FFT, n=4 . . .

n=d₃,d₂,d₁,d₀; or f(x)=d₃x³+d₂x²+d₁x+d₀				divide
n_e=d₂, d₀		n_o=d₃, d₁		divide
n_e,e=d₀	n_e,o=d₂	n_o,e=d₁	n_o,o=d₃	base cases
y_e,0=d₀+d₂, y_e,1=d₀−d₂		y_o,0=d₁+d₃, y_o,1=d₁−d₃		conquer
y₀=y_e,0+y_o,0=d₀+d₂+d₁+d₃=f(1); y₁=y_e,1+ry_o,1=d₀−d₂+r(d₁−d₃)=f(r); y₂=y_e,0−y_o,0=d₀+d₂−(d₁+d₃)=f(r^2); y₃=y_e,1−ry_o,1=d₀−d₂−r(d₁−d₃)=f(r^3)				conquer

where n_e is evens, n_o is odds and, in this case, r is a primitive (4th) root of unity, r²=−1, r³=−r, r⁴=1.

A remarkable thing is that, with a small modification, the same process gives its inverse {y_i}→{d_i}. Check . . .

x₀=y₀+y₂+y₁+y₃=4d₀
x₁=y₀−y₂+r^-1(y₁−y₃)=4d₁
x₂=y₀+y₂−(y₁+y₃)=4d₂
x₃=y₀−y₂−r^-1(y₁−y₃)=4d₃

so instead of r use r^-1 (=r³ in this case), and divide the answer by two at each "conquer" step.

The example above where n=4 generalises to 'n' being other powers of 2.

The demonstration below calculates the linear-, cyclic-, and negative-cyclic- convolutions of m and n; these take quadratic time and are for comparison purposes. It also interpolates the product polynomial which involves continuous values and thus may show rounding errors. It lastly runs a simplified(!) S and S algorithm. Note, if m or n have more than 8 digits results may be too big for a javascript "int".

5. Rings

In signal processing the FFT algorithm is performed on complex numbers C with r being a primitive root of unity in C but it can be performed over rings other than C. All that FFT requires is that there are primitive 2^?th roots of unity.

Ring: + commutative & associative; × associative; ∃ 0 s.t. ∀ a, a+0=0+a=a; ∀ a, ∃ (−a) s.t. a+(−a)=0; ∃ 1 s.t. ∀ a, a×1=1×a=a; × distributes over +. (× can, but need not, be commutative. We can write a−b for a+(−b).) E.g., ∀ k>1, Z_k is a ring. (If k happens to be a prime number then Z_k is a field in which case there are no divisors of zero and every non-zero element has a multiplicative inverse.): Note, (k−1)²=k(k−2)+1, so (k−1)²=1 mod k, i.e., k−1 (=−1) is a square root of unity if k>2.
Z₅: 2²=4, 3²=4, 4²=1.: Note, 2 is a primitive (4th) root of unity 2→2²=4→2³=3→2⁴=1. (So is 3: 3→4→2→1, which gives the same cycle of values in reverse.)
Z₉: 2²=4, 3²=9, 4²=7, 5²=7, 6²=0, 7²=4, 8²=1.: Note, 2 is a primitive (sixth) root of unity, 2→4→8→7→5→1.
Z₁₇ (17=2⁴+1),: squares: 0, 1, 4, 9, 16, 5²=8, 6²=2, 7²=15, 8²=13, 9²=13, 10²=15, 11²=2, 12²=8, 13²=16, 14²=9, 15²=4, 16²=1.; Note, 4 is a primitive (4th) root of unity, 4→16→13→1 and 2 is a primitive (8th) root of unity, 2→4→8→16→15→13→9→1.
2 is a 2k^th root of unity in Z_2^k+1, as 2^k=(2^k+1)−1=−1, so 2^2k=(2^k)²=1, all mod 2^k+1.: In Z_2^k+1, r=2, r^k=2^k=−1, and ∀ j, r^j=−r^k+j.

For use in integer multiplication, the ring Z_k means there is no rounding error (as would be the case with complex numbers). The only question is, how big should the modulus k=2^?+1 be? It should be large enough that multiplying digits and then summing a diagonal (fig. 4) should accomodate distinct values from zero to at least (base−1)²×L.

5.1 Mod base^p+1

An interesting thing

123456 mod 11₁₀ = −1+2−3+4−5+6 mod 11 = 3; 123456 mod 101₁₀ = 12−34+56 = 34 mod 101, (123456=1222×101+34); 123456 mod 1001₁₀ = −123+456 mod 1001 = 333
consider the base ten numeral d₅d₄d₃d₂d₁d₀ and the operation 'mod 101₁₀' (mod one hundred and one).: d₁d₀ mod 101 = d₁d₀; d₃d₂×100 mod 101 = (d₃d₂×101 − d₃d₂) mod 101 = −d₃d₂; d₅d₄×10000 mod 101 = (d₅d₄×100×100) mod 101 = (d₅d₄×100×101 − d₅d₄×100) mod 101 = −d₅d₄×100 mod 101 = +d₅d₄
i.e., d₅d₄d₃d₂d₁d₀ mod 101 = d₅d₄−d₃d₂+d₁d₀ mod 101.
(Note, 4321 mod 101 = (−43+21) mod 101 = −22 mod 101 = 79.)

This relates to the negacyclic convolution (and suggests another way of achieving 'mod 2^k+1').

Corresponding results hold for other bases including 2^k and the operation 'mod (2^k+1)' where, instead of digit groupings and digit-shifts, bit groupings and bit-shifts are used – useful in Z_2^k+1. This avoids the use of division to implement mod.

5.2 Mod base^p−1

In a similar vein to the previous section

123456 mod 999 = 123+456 = 579
123456 mod 99 = 12+34+56 mod 99 = 102 mod 99 = 3
123456 mod 9 = 1+2+3+4+5+6 mod 9 = 21 mod 9 = 3

This relates to the cyclic convolution.

5.3 the "base"

For ease of checking, the demonstration (table 1) divides m and n into decimal digits; you could say that it uses base ten. It could as well use base 100 or base 2^k. The best choice is approximately √L.

6. Conclusion

The Schonhage-Strassen algorithm [SS71] (O(L log(L) log(log(L)))-time for L-digit integers) was the fastest multiplication algorithm for large integers for almost four decades until Furer [Fu07] gave an algorithm that is asymptotically faster for enormous integers. Harvey and Van Der Hoeven [HV19] finally removed the log log L factor, as Schonhage and Strassen had speculated would be possible.

References

[All21] L. Allison, 'Arithmetic on bigInts', Aug., 2021 [www]←click.: Discusses and demonstrates +, − × and ÷ on large integers, on class bigInt.
[CT65] J. W. Cooley and J. W. Tukey, 'An algorithm for the machine calculation of complex Fourier series', Math. Comp., 19, 297-301, 1965 [doi:10.1090/S0025-5718-1965-0178586-1].: Also see the FFT@[wikip]['21]; another source is [Co09].
[Co09] T. Cormen, C. Leiserson, R. Rivest, C. Stein, 'Introduction to Algorithms', MIT Press, 3rd ed., 2009.: Discusses the FFT and briefly sketches the Schonhage-Strassen algorithm.
[Fu07] M. Furer, 'Faster integer multiplication', STOC 39, pp.57-66, 2007 [doi:10.1145/1250790.1250800] or [www]['21].: "... running in time n log(n) 2^{(log^*(n))} ..." (Note here, "log^* (log star) is the number of times the logarithm function must be iteratively applied before the result is less than or equal to 1. ..." – wikip. And, "... 2^{(log^*(n))} is better than log(log(n)) for n > 2^2⁶⁴ ..." – wikip. So little difference for practical values.) Also see [HV21].
[HV19] D. Harvey and J. Van Der Hoeven, 'Integer multiplication in time O(n log n)', HALarchive, 2019 [www]['21].: This is conjectured to be as fast as is possible. Will it be proven so?
[SS71] A. Schonhage and V. Strassen, 'Schnelle Multiplikation grosser Zahlen', Computing, 7, pp.281-292, 1971, [doi:10.1007/BF02242355].: Also see SS@[wikip]['21].

^† r is a primitive k^th root of unity iff the smallest j s.t. r^j = 1 is j = k.

M. Mlist,Victoria, Australia