On the Schonhage-Strassen Multiplication Algorithm

M. Mlist,
Victoria, Australia

September 2021

Abstract The Schonhage-Strassen algorithm for multiplying large integers makes use of the Fast Fourier Transform (FFT) algorithm which performs the Discrete Fourier Transform (DFT). An interactive demonstration of these algorithms is given.

1. Introduction

The Schonhage-Strassen algorithm [SS71] for multiplying two large integers, m and n, converts the integers into vectors of "digits", applies the Fast Fourier Transform (FFT) algorithm (which performs the Discrete Fourier Transform (DFT)) to the vectors, multiplies corresponding elements of the transformed vectors, applies the inverse FFT algorithm to the result, and lastly propagates carries to recover the digits of the product, m×n. If m and n both have L digits the Schonhage-Strassen algorithm has time-complexity O(L log(L) log(log(L))). Table 1 gives a toy interactive demonstration of these algorithms. It operates in base ten for ease of checking and it omits the recursive step in SS(M,N), but the javascript shows where that step would go.

2. Multiplication

Long-multiplication (fig. 1) of multi-digit numbers is taught in school. It involves multiplying pairs of digits, adding up columns, and propagating "carries" from right to left in each row and also in the final total.

        1 2 3 4    m x n = 1234 x 5678
      x 5 6 7 8
  --------------.   ^
  6 1 7 0       |  5|
    7 4 0 4     |  6|
      8 6 3 8   |  7|
        9 8 7 2 |  8|
  -------------
  7 0 0 6 6 5 2

    Figure 1: Long-Multiplication.

In the final total, working from the right, 2 = 2 carry 0, then 8+7+0 = 15, i.e., 5 carry 1, and so on.
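
A minimal sketch of this row-by-row working on digit vectors, least-significant digit first (longMult and base are illustrative names, not the demonstration's javascript):

  // Schoolbook long-multiplication of two digit-vectors, least-significant
  // digit first, e.g., 1234 -> [4,3,2,1].  Illustrative sketch only.
  function longMult(mm, nn, base) {
    const prod = new Array(mm.length + nn.length).fill(0);
    for (let j = 0; j < nn.length; j++) {        // one row per digit of n
      let carry = 0;
      for (let i = 0; i < mm.length; i++) {      // multiply and add into the columns
        const t = prod[i + j] + mm[i] * nn[j] + carry;
        prod[i + j] = t % base;
        carry = Math.floor(t / base);
      }
      prod[j + mm.length] += carry;              // carry off the end of the row
    }
    return prod;                                 // digits of m x n, perhaps with a leading 0
  }

  // longMult([4,3,2,1], [8,7,6,5], 10) -> [2,5,6,6,0,0,7,0], i.e., 7006652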

It makes no difference if we process n in reverse.

        1 2 3 4    m x n = 1234 x 5678
      x 5 6 7 8
  --------------.
        9 8 7 2 |  8|
      8 6 3 8   |  7|
    7 4 0 4     |  6|
  6 1 7 0       |  5|
  -------------     v
  7 0 0 6 6 5 2

    Figure 2: Flip n.

The working of long-multiplication can be laid out in another slightly different but equivalent way (fig. 3) which involves adding up along diagonals rather than columns.

       1   2   3   4    m x n = 1234 x 5678
     .<-------------
   8 | 9   8   7   2
     |               .
   7 | 8   6   3   8   .
     |               .   .
   6 | 7   4   0   4   .   .
     |               .   .   2
   5 | 6   1   7   0   .   5
     |   .   .   .   .   6
     v     .   .   .   6
             .   .   0
               .   0
                 7

   1234 x 5678 = 7006652

      Figure 3: Another Angle on Long-Multiplication.

This suggests the convolution of two vectors (sec. 3).

3. Convolutions

The (linear-) convolution of two vectors mm and nn, both of length L, can be visualised by writing the vectors as in fig. 4, performing multiplications of elements, mm_i×nn_j, and adding up the diagonals to give a vector of length 2L−1. This is just like long-multiplication but without propagating carries from one diagonal (column) to the next in the final result.

       1   2   3   4    1,2,3,4 conv 5,6,7,8
       <------------
   8 | 8  16  24  32
     |               .
   7 | 7  14  21  28   .
     |               .   .
   6 | 6  12  18  24   .   .
     |               .   .  32
   5 | 5  10  15  20   .  52
     |   .   .   .   .  61
     v     .   .   .  60
             .   .  34
               .  16
                 5

  5,16,34,60,61,52,32 >carries> 7,0,0,6,6,5,2

      Figure 4: Linear Convolution

If mm and nn are the digits of integers m and n respectively, the elements of the convolution are larger than the digits of m and n but if "carries" are then propagated along the convolution, e.g., 32=2 carry 3, 52+3=55=5 carry 5, and so on, then we recover the digits of their product. (The product may be of length 2L−1 or 2L.)
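
A minimal sketch of these two steps, convolution followed by carry propagation, on digit vectors stored least-significant digit first (convolve and carries are illustrative names, not the demonstration's javascript):

  // Linear convolution of two digit-vectors (least-significant digit first);
  // no carries yet, so elements may exceed base-1.
  function convolve(mm, nn) {
    const c = new Array(mm.length + nn.length - 1).fill(0);
    for (let i = 0; i < mm.length; i++)
      for (let j = 0; j < nn.length; j++)
        c[i + j] += mm[i] * nn[j];
    return c;
  }

  // Propagate carries to turn a convolution back into ordinary digits
  // (handles negative entries too).
  function carries(c, base) {
    const d = [];
    let carry = 0;
    for (let i = 0; i < c.length; i++) {
      const t = c[i] + carry;
      d.push(((t % base) + base) % base);
      carry = Math.floor(t / base);
    }
    while (carry > 0) { d.push(carry % base); carry = Math.floor(carry / base); }
    return d;
  }

  // convolve([4,3,2,1], [8,7,6,5])     -> [32,52,61,60,34,16,5]  (fig. 4, reversed)
  // carries([32,52,61,60,34,16,5], 10) -> [2,5,6,6,0,0,7],       i.e., 7006652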

The convolution theorem states that
    conv(u,v) = F^−1(F(u).F(v))
where F is the Discrete Fourier Transform, F^−1 is the inverse transform, and '.' is element-wise multiplication, (F(u))_i×(F(v))_i. If a vector v has n elements, the Fast Fourier Transform (FFT) algorithm (see below) computes the Discrete Fourier Transform of v in O(n log n) steps.
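
The theorem can be checked numerically with a naive, O(n^2), DFT over the complex numbers, as in the hedged sketch below; the vectors are zero-padded to twice their length so that the transform product gives the linear rather than the cyclic convolution. The names dft and convByDFT are illustrative only:

  // Naive O(n^2) DFT; complex values are [re, im] pairs.
  // sign = -1 gives the forward transform, +1 the (unscaled) inverse.
  function dft(v, sign) {
    const n = v.length, out = [];
    for (let k = 0; k < n; k++) {
      let re = 0, im = 0;
      for (let j = 0; j < n; j++) {
        const th = sign * 2 * Math.PI * j * k / n;
        re += v[j][0] * Math.cos(th) - v[j][1] * Math.sin(th);
        im += v[j][0] * Math.sin(th) + v[j][1] * Math.cos(th);
      }
      out.push([re, im]);
    }
    return out;
  }

  // conv(u,v) by the convolution theorem; u and v of equal length L, padded to 2L.
  function convByDFT(u, v) {
    const n = 2 * u.length;
    const pad = (w) => Array.from({length: n}, (_, i) => [i < w.length ? w[i] : 0, 0]);
    const U = dft(pad(u), -1), V = dft(pad(v), -1);
    const P = U.map((x, i) => [x[0]*V[i][0] - x[1]*V[i][1],    // element-wise complex product
                               x[0]*V[i][1] + x[1]*V[i][0]]);
    return dft(P, +1).map((x) => Math.round(x[0] / n));        // inverse, scale by 1/n, round
  }

  // convByDFT([4,3,2,1], [8,7,6,5]) -> [32,52,61,60,34,16,5,0]  (cf. fig. 4)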

4. DFT and FFT

An integer, e.g., n=d3d2d1d0, can be thought of as a polynomial, e.g., f(x) = d3·x^3 + d2·x^2 + d1·x + d0. A polynomial of degree L−1 is also uniquely defined by its values at L, or more, distinct points and, taken together, these values form an alternative representation of the polynomial; it is quick to multiply two polynomials represented in this way on a common set of points. The product of two polynomials each of degree L−1 is of degree 2(L−1) and is uniquely defined by values at 2L−1 points (so 2L points more than suffice).

The Karatsuba (O(d^(log_2 3)) time) and Toom3 (O(d^(log_3 5)) time) algorithms for integer multiplication can be constructed by considering the values of the polynomials derived from integers at certain values – at −1, 0 and +1 for Karatsuba and at 0, +1, −1, −2 and "∞" for Toom3 [All21]. In comparison, a Discrete Fourier Transform is made of the values of a polynomial at powers of a primitive Lth root of unity where L is a power of two (if a given L is not a power of two it is rounded up to a power, e.g., 5→8, and the integer/polynomial padded with leading zeroes).
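
For instance, one level of the Karatsuba construction, using the values at 0, +1 and −1, might be sketched as follows on javascript BigInts (karatsuba1 and halfBits are illustrative names, unrelated to the bigInt class of [All21]):

  // One level of Karatsuba: split a = a1*B + a0, b = b1*B + b0, and evaluate the
  // corresponding polynomials at 0, +1 and -1 -- three half-size multiplications
  // instead of four.  Illustrative sketch only.
  function karatsuba1(a, b, halfBits) {
    const B = 1n << BigInt(halfBits);
    const a1 = a / B, a0 = a % B, b1 = b / B, b0 = b % B;
    const p0 = a0 * b0;                         // c(0)
    const p1 = (a0 + a1) * (b0 + b1);           // c(+1)
    const pm = (a0 - a1) * (b0 - b1);           // c(-1)
    const c0 = p0;
    const c1 = (p1 - pm) / 2n;                  // odd coefficient
    const c2 = (p1 + pm) / 2n - p0;             // even coefficient
    return c2 * B * B + c1 * B + c0;            // recombine c2*B^2 + c1*B + c0
  }

  // karatsuba1(1234n, 5678n, 8) -> 7006652n    (split at 8 bits: 1234 = 4*256 + 210)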

The Fast Fourier Transform (FFT) algorithm [CT65] performs the Discrete Fourier Transform (DFT) of a vector, quickly.

E.g., FFT, n=4 . . .
  n = d3,d2,d1,d0;   or   f(x) = d3·x^3 + d2·x^2 + d1·x + d0              divide
  n_e = d2,d0;   n_o = d3,d1                                              divide
  n_ee = d0;   n_eo = d2;   n_oe = d1;   n_oo = d3                        base cases
  y_e0 = d0+d2,  y_e1 = d0−d2;   y_o0 = d1+d3,  y_o1 = d1−d3              conquer
  y0 = y_e0 + y_o0   = d0+d2+d1+d3     = f(1);
  y1 = y_e1 + r·y_o1 = d0−d2+r(d1−d3)  = f(r);
  y2 = y_e0 − y_o0   = d0+d2−(d1+d3)   = f(r^2);
  y3 = y_e1 − r·y_o1 = d0−d2−r(d1−d3)  = f(r^3)                           conquer
where n_e is the evens, n_o is the odds and, in this case, r is a primitive (4th) root of unity: r^2=−1, r^3=−r, r^4=1.
A remarkable thing is that, with a small modification, the same process gives its inverse {y_i}→{d_i}. Check . . .
  x0 = y0+y2+y1+y3          = 4·d0
  x1 = y0−y2+r^−1·(y1−y3)   = 4·d1
  x2 = y0+y2−(y1+y3)        = 4·d2
  x3 = y0−y2−r^−1·(y1−y3)   = 4·d3
so instead of r use r^−1 (=r^3 in this case), and divide the answer by two at each "conquer" step.
The example above where n=4 generalises to 'n' being other powers of 2.
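
That pattern can be written as a recursive sketch over the complex numbers (complex values as [re, im] pairs). This is an illustration only, not the demonstration's own javascript, and the scaling for the inverse is done once at the end rather than by two at each level:

  // Recursive radix-2 FFT; v.length must be a power of two.
  // inverse=true uses r^-1; the caller must then divide the result by v.length
  // (the "divide by two at each conquer step" of the text, done all at once).
  function fft(v, inverse) {
    const n = v.length;
    if (n === 1) return [v[0]];                                  // base case
    const even = fft(v.filter((_, i) => i % 2 === 0), inverse);  // divide
    const odd  = fft(v.filter((_, i) => i % 2 === 1), inverse);
    const out = new Array(n);
    for (let k = 0; k < n / 2; k++) {                            // conquer
      const th = (inverse ? 2 : -2) * Math.PI * k / n;           // r^k or r^-k
      const re = Math.cos(th) * odd[k][0] - Math.sin(th) * odd[k][1];
      const im = Math.cos(th) * odd[k][1] + Math.sin(th) * odd[k][0];
      out[k]         = [even[k][0] + re, even[k][1] + im];       // y_k       = e_k + r^k.o_k
      out[k + n / 2] = [even[k][0] - re, even[k][1] - im];       // y_(k+n/2) = e_k - r^k.o_k
    }
    return out;
  }

  // fft([[1,0],[2,0],[3,0],[4,0]], false) gives f(1), f(r), f(r^2), f(r^3) for
  // f(x) = 4x^3 + 3x^2 + 2x + 1;  applying fft(..., true) to that result and
  // dividing every element by 4 recovers 1, 2, 3, 4.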

The demonstration below calculates the linear-, cyclic-, and negative-cyclic (negacyclic) convolutions of m and n; these take quadratic time and are for comparison purposes. It also interpolates the product polynomial, which involves continuous values and thus may show rounding errors. It lastly runs a simplified(!) Schonhage-Strassen algorithm. Note, if m or n has more than 8 digits the results may be too big for exact integer arithmetic in javascript numbers.


— Table 1, demonstration of the algorithm —
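
For reference, the quadratic-time cyclic and negative-cyclic convolutions that the demonstration computes for comparison can be sketched directly (cyclicConv and negacyclicConv are illustrative names):

  // Cyclic convolution of two length-L vectors: indices wrap around mod L.
  function cyclicConv(u, v) {
    const L = u.length, c = new Array(L).fill(0);
    for (let i = 0; i < L; i++)
      for (let j = 0; j < L; j++)
        c[(i + j) % L] += u[i] * v[j];
    return c;
  }

  // Negacyclic (negative-cyclic) convolution: wrapped-around terms are negated.
  function negacyclicConv(u, v) {
    const L = u.length, c = new Array(L).fill(0);
    for (let i = 0; i < L; i++)
      for (let j = 0; j < L; j++)
        c[(i + j) % L] += (i + j < L ? 1 : -1) * u[i] * v[j];
    return c;
  }

  // cyclicConv([4,3,2,1], [8,7,6,5])     -> [66, 68, 66, 60]  = [32+34, 52+16, 61+5, 60]
  // negacyclicConv([4,3,2,1], [8,7,6,5]) -> [-2, 36, 56, 60]  = [32-34, 52-16, 61-5, 60]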

5. Rings

In signal processing the FFT algorithm is performed on complex numbers, C, with r being a primitive root of unity in C, but it can be performed over rings other than C. All that the FFT requires is that the ring has primitive roots of unity whose order is a (large enough) power of two.

Ring: + commutative & associative; × associative; ∃ 0 s.t. ∀ a, a+0=0+a=a; ∀ a, ∃ (−a) s.t. a+(−a)=0; ∃ 1 s.t. ∀ a, a×1=1×a=a; × distributes over +. (× can, but need not, be commutative. We can write a−b for a+(−b).)
E.g., ∀ k>1, Z_k is a ring.  (If k happens to be a prime number then Z_k is a field, in which case there are no divisors of zero and every non-zero element has a multiplicative inverse.)
Note, (k−1)^2 = k(k−2)+1, so (k−1)^2 = 1 mod k, i.e., k−1 (=−1) is a square root of unity if k>2.
Z_5: 2^2=4, 3^2=4, 4^2=1.
Note, 2 is a primitive (4th) root of unity: 2 → 2^2=4 → 2^3=3 → 2^4=1. (So is 3: 3→4→2→1, which gives the same cycle of values in reverse.)
Z_9: 2^2=4, 3^2=9=0, 4^2=7, 5^2=7, 6^2=0, 7^2=4, 8^2=1.
Note, 2 is a primitive (6th) root of unity: 2→4→8→7→5→1.
Z_17 (17 = 2^4+1),
squares: 0, 1, 4, 9, 16, 5^2=8, 6^2=2, 7^2=15, 8^2=13, 9^2=13, 10^2=15, 11^2=2, 12^2=8, 13^2=16, 14^2=9, 15^2=4, 16^2=1.
Note, 4 is a primitive (4th) root of unity, 4→16→13→1, and 2 is a primitive (8th) root of unity, 2→4→8→16→15→13→9→1.
2 is a 2k-th root of unity in Z_(2^k+1), as 2^k = (2^k+1)−1 = −1, so 2^(2k) = (2^k)^2 = 1, all mod 2^k+1.
In Z_(2^k+1), r=2, r^k = 2^k = −1, and ∀ j, r^j = −r^(k+j).
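
Such claims are easy to check mechanically; the small sketch below (order is an illustrative name) computes the multiplicative order of r in Z_k, that is, the smallest j >= 1 with r^j = 1 mod k:

  // Multiplicative order of r modulo k, or 0 if no power of r is 1 (r not a unit).
  function order(r, k) {
    let x = r % k;
    for (let j = 1; j <= k; j++) {
      if (x === 1) return j;    // r^j = 1, so r is a (primitive) j-th root of unity
      x = (x * r) % k;
    }
    return 0;
  }

  // order(2, 5)  -> 4    2 -> 4 -> 3 -> 1            in Z_5
  // order(2, 9)  -> 6    2 -> 4 -> 8 -> 7 -> 5 -> 1  in Z_9
  // order(2, 17) -> 8,   and order(4, 17) -> 4,      in Z_17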

For use in integer multiplication, the ring Z_k means there is no rounding error (as there would be with complex numbers). The only question is, how big should the modulus k = 2^?+1 be? It should be large enough that multiplying digits and then summing a diagonal (fig. 4) can accommodate distinct values from zero to at least (base−1)^2 × L.
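
For example, with base ten and L = 8 digits per operand, a single diagonal sum is at most 9^2 × 8 = 648, so any modulus greater than 648, such as 2^10+1 = 1025, would be big enough in this small illustrative setting.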

5.1 Mod base^p+1

An interesting thing:

123456 mod 11 (= 10^1+1) = (−1+2−3+4−5+6) mod 11 = 3
123456 mod 101 (= 10^2+1) = (12−34+56) mod 101 = 34,   (123456 = 1222×101 + 34)
123456 mod 1001 (= 10^3+1) = (−123+456) mod 1001 = 333
Consider the base ten numeral d5d4d3d2d1d0 and the operation 'mod 101' (mod one hundred and one).
d1d0 mod 101 = d1d0
d3d2×100 mod 101 = (d3d2×101 − d3d2) mod 101 = −d3d2
d5d4×10000 mod 101 = (d5d4×100×100) mod 101 = (d5d4×100×101 − d5d4×100) mod 101 = −d5d4×100 mod 101 = +d5d4
i.e., d5d4d3d2d1d0 mod 101 = d5d4−d3d2+d1d0 mod 101.
(Note, 4321 mod 101 = (−43+21) mod 101 = −22 mod 101 = 79.)

This relates to the negacyclic convolution (and suggests another way of achieving 'mod 2^k+1').

Corresponding results hold for other bases including 2^k and the operation 'mod (2^k+1)' where, instead of digit groupings and digit-shifts, bit groupings and bit-shifts are used – useful in Z_(2^k+1). This avoids the use of division to implement mod.
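
A hedged sketch of this on javascript BigInts, using shifts and masks to split x into k-bit blocks and alternately add and subtract them, with one small conventional reduction at the end (modFermat is an illustrative name, not the demonstration's code):

  // x mod (2^k + 1): split x into k-bit blocks (low block first) and form their
  // alternating sum, since 2^k = -1 (mod 2^k + 1).  Illustrative sketch on BigInts.
  function modFermat(x, k) {
    const K = BigInt(k), mask = (1n << K) - 1n, m = (1n << K) + 1n;
    let sum = 0n, sign = 1n;
    while (x > 0n) {
      sum += sign * (x & mask);   // next k-bit block, alternately + and -
      x >>= K;
      sign = -sign;
    }
    return ((sum % m) + m) % m;   // reduce the (small) alternating sum into 0 .. 2^k
  }

  // modFermat(123456n, 4) -> 2n, i.e., 123456 mod 17 = 2
  // (the base-2^k analogue of 123456 mod 101 = 12 - 34 + 56 above)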

5.2 Mod base^p−1

In a similar vein to the previous section:

123456 mod 999 = 123+456 = 579
123456 mod 99 = 12+34+56 mod 99 = 102 mod 99 = 3
123456 mod 9 = 1+2+3+4+5+6 mod 9 = 21 mod 9 = 3

This relates to the cyclic convolution.
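
A corresponding sketch for mod base^p−1, summing blocks of p digits (modBlockSum is an illustrative name):

  // x mod (base^p - 1) by summing p-digit blocks, since base^p = 1 (mod base^p - 1).
  function modBlockSum(x, base, p) {
    const m = Math.pow(base, p) - 1;
    let sum = 0;
    while (x > 0) {
      sum += x % Math.pow(base, p);           // next block of p digits
      x = Math.floor(x / Math.pow(base, p));
    }
    return sum % m;                           // reduce the (much smaller) sum
  }

  // modBlockSum(123456, 10, 3) -> 579   (123 + 456,          cf. 123456 mod 999)
  // modBlockSum(123456, 10, 2) -> 3     (12 + 34 + 56 = 102,  102 mod 99 = 3)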

5.3 the "base"

For ease of checking, the demonstration (table 1) divides m and n into decimal digits; you could say that it uses base ten. It could as well use base 100 or base 2^k. The best choice groups roughly √L of the input's L digits into each big "digit", so that the number of pieces and the size of each piece are balanced.

6. Conclusion

The Schonhage-Strassen algorithm [SS71] (O(L log(L) log(log(L)))-time for L-digit integers) was the fastest multiplication algorithm for large integers for almost four decades until Furer [Fu07] gave an algorithm that is asymptotically faster for enormous integers. Harvey and van der Hoeven [HV19] finally removed the log(log(L)) factor, as Schonhage and Strassen had speculated would be possible.

References

[All21] L. Allison, 'Arithmetic on bigInts', Aug. 2021 [www].
Discusses and demonstrates +, −, × and ÷ on large integers, using the class bigInt.

[CT65] J. W. Cooley and J. W. Tukey, 'An algorithm for the machine calculation of complex Fourier series', Math. Comp., 19, 297-301, 1965 [doi:10.1090/S0025-5718-1965-0178586-1].
Also see the FFT@[wikip]['21]; another source is [Co09].

[Co09] T. Cormen, C. Leiserson, R. Rivest, C. Stein, 'Introduction to Algorithms', MIT Press, 3rd ed., 2009.
Discusses the FFT and briefly sketches the Schonhage-Strassen algorithm.

[Fu07] M. Furer, 'Faster integer multiplication', STOC 39, pp.57-66, 2007 [doi:10.1145/1250790.1250800] or [www]['21].
"... running in time n log(n) 2(log*(n)) ..."  (Note here, "log* (log star) is the number of times the logarithm function must be iteratively applied before the result is less than or equal to 1. ..." – wikip.  And, "... 2(log*(n)) is better than log(log(n)) for n > 2264 ..." – wikip.  So little difference for practical values.)  Also see [HV21].

[HV19] D. Harvey and J. van der Hoeven, 'Integer multiplication in time O(n log n)', HAL archive, 2019 [www]['21].
This is conjectured to be as fast as is possible. Will it be proven so?

[SS71] A. Schonhage and V. Strassen, 'Schnelle Multiplikation grosser Zahlen', Computing, 7, pp.281-292, 1971, [doi:10.1007/BF02242355].
Also see SS@[wikip]['21].

r is a primitive kth root of unity iff the smallest j s.t. r^j = 1 is j = k.