Abstract The Schonhage-Strassen algorithm for multiplying large integers makes use of the Fast Fourier Transform (FFT) algorithm which performs the Discrete Fourier Transform (DFT). An interactive demonstration of these algorithms is given.
The Schonhage-Strassen algorithm [SS71] for multiplying two large integers, m and n, converts the integers into vectors of "digits", applies the Fast Fourier Transform (FFT) algorithm (which performs the Discrete Fourier Transform (DFT)) to the vectors, multiplies corresponding elements of the transformed vectors, applies to inverse FFT algorithm to the result, and lastly propagates carries to recover the digits of the product, m×n. If m and n both have L digits the Schonhage-Strassen algorithm has time-complexity O(L log(L) log(log(L))). Table 1 gives a toy interactive demonstration of these algorithms. It operates in base ten for ease of checking and it omits the recursive step in SS(M,N) but the javascript shows where.
Long-multiplication (fig. 1) of multi-digit numbers is taught in school. It involves multiplying pairs of digits, adding up columns, and propagating "carries" from right to left in each row and also in the final total.
1 2 3 4 m x n = 1234 x 5678 x 5 6 7 8 --------------. ^ 6 1 7 0 | 5| 7 4 0 4 | 6| 8 6 3 8 | 7| 9 8 7 2 | 8| ------------- 7 0 0 6 6 5 2
In the final total, 2=2 carry 0, 8+7+0=5 carry 1, and so on.
It makes no difference if we process n in reverse.
1 2 3 4 m x n = 1234 x 5678 x 5 6 7 8 --------------. 9 8 7 2 | 8| 8 6 3 8 | 7| 7 4 0 4 | 6| 6 1 7 0 | 5| ------------- v 7 0 0 6 6 5 2
The working of long-multiplication can be laid out in another slightly different but equivalent way (fig. 3) which involves adding up along diagonals rather than columns.
1 2 3 4 m x n = 1234 x 5678 .<------------- 8 | 9 8 7 2 | . 7 | 8 6 3 8 . | . . 6 | 7 4 0 4 . . | . . 2 5 | 6 1 7 0 . 5 | . . . . 6 v . . . 6 . . 0 . 0 7 1234 x 5678 = 7006652
This suggests the convolution of two vectors (sec. 3).
The (linear-) convolution of two vectors mm and nn, both of length L, can be visualised by writing the vectors as in fig. 4, performing multiplications of elements, mmi×nnj, and adding up the diagonals to give a vector of length 2L−1. This is just like long-multiplication but without propagating carries from one diagonal (column) to the next in the final result.
1 2 3 4 1,2,3,4 conv 5,6,7,8 <------------ 8 | 8 16 24 32 | . 7 | 7 14 21 28 . | . . 6 | 6 12 18 24 . . | . . 32 5 | 5 10 15 20 . 52 | . . . . 61 v . . . 60 . . 34 . 16 5 5,16,34,60,61,52,32 >carries> 7,0,0,6,6,5,2
If mm and nn are the digits of integers m and n respectively, the elements of the convolution are larger than the digits of m and n but if "carries" are then propagated along the convolution, e.g., 32=2 carry 3, 52+3=55=5 carry 5, and so on, then we recover the digits of their product. (The product may be of length 2L−1 or 2L.)
The convolution theorem states that
conv(u,v) = F−1(F(u).F(v))
where F is the Discrete Fourier Transform,
F−1 is the inverse transform, and
'.' is element-wise multiplication,
(F(u))i×(F(v))i.
If a vector v has n elements, the Fast Fourier Algorithm (FFT) (see below)
computes the Discrete Fourier Transform of v in O(n log n) steps.
An integer, e.g., n=d3d2d1d0 can be thought of as a polynomial, e.g., f(x)=d3x3+d2x2+d1x+d0. A polynomial of degree L−1 is also uniquely defined by its values at L, or more, distinct points and, taken together, these values form an alternative representation of the polynomial; it is quick to multiply two polynomials represented in this way on a common set of points. The product of two polynomials each of degree L−1 is of degree 2(L−1) and is uniquely defined by values at 2L−1 points (so 2L points more than suffice).
The Karatsuba (O(dlog2(3)) time) and Toom3 (O(dlog3(5)) time) algorithms for integer multiplication can be constructed by considering the values of the polynomials derived from integers at certain values – at −1, 0 and +1 for Karatsuba and at 0, +1, −1, −2 and "∞" for Toom3 [All21]. In comparison, a Discrete Fourier Transform is made of the values of a polynomial at powers of a primitive† Lth root of unity where L is a power of two (if a given L is not a power of two it is rounded up to a power, e.g., 5→8, and the integer/polynomial padded with leading zeroes).
The Fast Fourier Transform (FFT) algorithm [CT65] performs the Discrete Fourier Transform (DFT) of a vector, quickly.
n=d3,d2,d1,d0; or f(x)=d3x3+d2x2+d1x+d0 | divide | |||
ne=d2, d0 | no=d3, d1 | divide | ||
ne,e=d0 | ne,o=d2 | no,e=d1 | no,o=d3 | base cases |
ye,0=d0+d2, ye,1=d0−d2 | yo,0=d1+d3, yo,1=d1−d3 | conquer | ||
y0=ye,0+yo,0=d0+d2+d1+d3=f(1); y1=ye,1+ryo,1=d0−d2+r(d1−d3)=f(r); y2=ye,0−yo,0=d0+d2−(d1+d3)=f(r^2); y3=ye,1−ryo,1=d0−d2−r(d1−d3)=f(r^3) | conquer |
The demonstration below calculates the linear-, cyclic-, and negative-cyclic- convolutions of m and n; these take quadratic time and are for comparison purposes. It also interpolates the product polynomial which involves continuous values and thus may show rounding errors. It lastly runs a simplified(!) S and S algorithm. Note, if m or n have more than 8 digits results may be too big for a javascript "int".
In signal processing the FFT algorithm is performed on complex numbers C with r being a primitive root of unity in C but it can be performed over rings other than C. All that FFT requires is that there are primitive 2?th roots of unity.
For use in integer multiplication, the ring Zk means there is no rounding error (as would be the case with complex numbers). The only question is, how big should the modulus k=2?+1 be? It should be large enough that multiplying digits and then summing a diagonal (fig. 4) should accomodate distinct values from zero to at least (base−1)2×L.
An interesting thing
This relates to the negacyclic convolution (and suggests another way of achieving 'mod 2k+1').
Corresponding results hold for other bases including 2k and the operation 'mod (2k+1)' where, instead of digit groupings and digit-shifts, bit groupings and bit-shifts are used – useful in Z2k+1. This avoids the use of division to implement mod.
In a similar vein to the previous section
This relates to the cyclic convolution.
For ease of checking, the demonstration (table 1) divides m and n into decimal digits; you could say that it uses base ten. It could as well use base 100 or base 2k. The best choice is approximately √L.
The Schonhage-Strassen algorithm [SS71] (O(L log(L) log(log(L)))-time for L-digit integers) was the fastest multiplication algorithm for large integers for almost four decades until Furer [Fu07] gave an algorithm that is asymptotically faster for enormous integers. Harvey and Van Der Hoeven [HV19] finally removed the log log L factor, as Schonhage and Strassen had speculated would be possible.