Approximations: From symbolic to numerical computation, and applications

Master d’informatique fondamentale
École normale supérieure de Lyon
Fall-winter 2013

Nicolas Brisebarre, Bruno Salvy

http://www.ens-lyon.fr/LIP/AriC/M2R/ASNA
Table of contents

1 Introduction . . . 7
  1.1 From symbolic to numerical computations . . . 7
    1.1.1 Examples of numerical instability . . . 7
    1.1.2 Efficiency issues . . . 11
  1.2 Approximations . . . 11
  1.3 Symbolic Computation . . . 12
    1.3.1 Representation of mathematical objects . . . 12
    1.3.2 Efficiency . . . 13
      One minute . . . 13
      Complexity models . . . 13
      Asymptotic estimates . . . 13
      Integers and polynomials . . . 13
2 Polynomial approximations . . . 15
  2.1 Density of the polynomials in (C([a, b]), ‖·‖∞) . . . 15
  2.2 Best L∞ (or minimax) approximation . . . 18
  2.3 Polynomial interpolation . . . 22
      Linear algebra . . . 22
      The divided-difference method . . . 22
      Lagrange’s formula . . . 23
  2.4 Interpolation and approximation, Chebyshev polynomials . . . 23
  2.5 Clenshaw’s method for Chebyshev sums . . . 25
  2.6 Computation of the Chebyshev coefficients . . . 26
3 D-Finiteness . . . 27
  3.1 Linear differential equations and linear recurrences . . . 27
    3.1.1 Definition . . . 27
    3.1.2 Translation . . . 27
  3.2 Closure properties . . . 28
    3.2.1 Sum and product . . . 28
    3.2.2 Hadamard product . . . 30
  3.3 Algebraic series . . . 30
  3.4 Binary splitting . . . 31
    3.4.1 Fast computation of n! . . . 31
    3.4.2 General case . . . 31
  3.5 Numerical evaluation . . . 32
    3.5.1 exp(1) . . . 32
    3.5.2 Analytic D-finite series . . . 33
    3.5.3 Analytic continuation . . . 34
    3.5.4 Bit burst . . . 35
4 Rational Approximation . . . 37
  4.1 Why rational approximation? . . . 37
  4.2 Best L∞ approximations . . . 38
    4.2.1 Existence . . . 38
    4.2.2 Equioscillation and unicity . . . 39
  4.3 An extension of Taylor approximation: Padé approximation . . . 39
    4.3.1 Introduction . . . 39
    4.3.2 Rational fraction reconstruction . . . 40
      4.3.2.1 A reminder of the extended Euclidean algorithm . . . 40
      4.3.2.2 Solving the approximation problems P1 and P2 . . . 43
    4.3.3 Summary for the case of Padé approximation . . . 44
  4.4 Application of Padé approximation to irrationality and transcendence proofs . . . 44
5 Numerical approximation using Padé approximants . . . 47
  5.1 Numerical experiments . . . 47
    5.1.1 Starting from power series . . . 47
    5.1.2 Acceleration of convergence . . . 49
      Huygens and the calculation of π . . . 49
      Euler’s method . . . 50
      Aitken’s ∆² method . . . 50
      Shanks’ method . . . 50
      Slow convergence . . . 51
      Wynn’s ε-algorithm . . . 51
  5.2 Continued Fractions . . . 51
    5.2.1 Definition and notation . . . 51
    5.2.2 Möbius transformations and the Riemann sphere . . . 52
      The Riemann sphere . . . 52
      Möbius transformations . . . 53
    5.2.3 Basic properties of continued fractions . . . 53
    5.2.4 Relation to Padé approximants . . . 53
    5.2.5 Changes of representation . . . 54
    5.2.6 Simple continued fractions . . . 55
    5.2.7 Hypergeometric series and recurrences of order 2 . . . 55
      Hypergeometric series . . . 56
      5.2.7.1 The hypergeometric 0F1(α; z) . . . 56
      5.2.7.2 The hypergeometric 1F1(α; β; z) . . . 57
      5.2.7.3 The hypergeometric 2F1(α, β; γ; z) . . . 57
  5.3 Convergence . . . 58
    5.3.1 Series expression . . . 58
    5.3.2 Positive coefficients . . . 59
    5.3.3 Fractions with complex coefficients . . . 60
    5.3.4 Speed of convergence . . . 61
    5.3.5 Stieltjes series . . . 61
6 Orthogonal polynomials - Chebyshev series . . . 63
  6.1 Orthogonal polynomials . . . 63
  6.2 A little bit of quadrature: Gauss methods . . . 66
  6.3 Lebesgue constants . . . 67
    6.3.1 Lebesgue constants for polynomial interpolation . . . 67
    6.3.2 Lebesgue constants for L2 best approximation . . . 68
    6.3.3 Corollary: A first statement on the convergence of Chebyshev interpolants and truncated Chebyshev series . . . 69
  6.4 Chebyshev expansions . . . 69
    6.4.1 Convergence . . . 69
    6.4.2 Relation with Chebyshev interpolation . . . 70
  6.5 Chebyshev expansions for D-finite functions . . . 70
    6.5.1 Warm-up: Taylor series . . . 71
    6.5.2 Formal manipulation of Chebyshev series . . . 71
    6.5.3 An algorithm for Chebyshev expansion . . . 72
    6.5.4 Examples . . . 73
  6.6 Ore polynomials . . . 74
    6.6.1 Definition . . . 74
    6.6.2 Properties . . . 74
      Degree . . . 74
      Euclidean division on the right . . . 75
      Euclidean algorithm for right gcds (gcrds) . . . 75
      Extended Euclidean algorithm . . . 75
      Least common left multiples (lclms) . . . 75
    6.6.3 Fractions . . . 75
    6.6.4 Application to Chebyshev Series . . . 75
7 Interval Arithmetic, Interval Analysis . . . 77
  7.1 Interval arithmetic . . . 77
    7.1.1 Operations on intervals . . . 78
    7.1.2 Floating-point interval arithmetic . . . 79
  7.2 Interval functions . . . 80
8 Linear recurrences and Chebyshev series . . . 85
  8.1 Constant coefficients and the power method . . . 85
    8.1.1 Power Method . . . 85
    8.1.2 Inverse Power Method . . . 86
      Application to recurrences . . . 86
    8.1.3 Block Power Method . . . 87
  8.2 Asymptotics of linear recurrences . . . 87
    8.2.1 Poincaré’s theorem . . . 87
    8.2.2 Perron’s theorem . . . 89
    8.2.3 From Perron’s theorem to asymptotic behaviour . . . 90
    8.2.4 Birkhoff-Trjitzinsky’s theorem . . . 90
  8.3 Miller’s algorithm . . . 90
  8.4 Computation of Chebyshev series . . . 91
    8.4.1 Shape of the recurrence . . . 91
    8.4.2 Problems . . . 92
    8.4.3 Convergence . . . 92
    8.4.4 Symmetry . . . 93
    8.4.5 Algorithm . . . 93
9 Rigorous Polynomial Approximations . . . 95
  9.1 Taylor models . . . 95
    9.1.1 Taylor series, automatic differentiation and Taylor models . . . 95
    9.1.2 Arithmetic operations on Taylor models . . . 96
      Addition . . . 96
      Multiplication . . . 96
      Composition . . . 96
      Basic functions . . . 96
    9.1.3 Ranges of polynomials . . . 97
  9.2 Chebyshev models . . . 97
  9.3 A little, little, little bit of fixed-point theory . . . 97
Bibliography . . . 99
Chapter 1
Introduction
The classical presentation of mathematical methods usually leaves out the problems of actually getting numerical values. In practice, a compromise between accuracy and efficiency is desirable, and this turns out to require the development of specific (and often very nice) algorithms. Our aim in this course is to exhibit the interplay between symbolic and numerical computation in order to achieve computations that are as precise (or guaranteed, or proved) as possible, and fast! (At least for a number of problems.)
1.1 From symbolic to numerical computations
1.1.1 Examples of numerical instability
Example 1.1. The Fibonacci recurrence.
It is classical that the recurrence

u_{n+2} = u_{n+1} + u_n

admits a solution of the form

u_n = a φ^n + b φ̄^n,  with  φ = (1 + √5)/2  and  φ̄ = −1/φ = (1 − √5)/2

the solutions of the characteristic polynomial x² − x − 1. The values of a and b are dictated by the initial conditions. The classical Fibonacci sequence is obtained with u_0 = 0 and u_1 = 1, leading to a = −b = 1/√5, so that in particular u_n → ∞ when n → ∞, since a > 0 and |φ| > 1. On the opposite side, the sequence obtained with a = 0, b = 1, or equivalently u_0 = 1, u_1 = φ̄, is (φ̄^n), whose terms tend to 0 when n → ∞ since |φ̄| < 1. In practice however, this phenomenon is very difficult to observe, as the following experiments show.
Maple 1] phi:=[solve(x^2=x+1,x)];
[1/2 √5 + 1/2, 1/2 − 1/2 √5]
First, a purely numerical experiment: we compute ϕ̄ numerically and use it to get the first 50 values
Maple 2] map(evalf,phi);
[1.618033988, −0.6180339880]

Maple 3] phi2:=%[2];

−0.6180339880

Maple 4] N:=50:
Maple 5] u[0]:=1:u[1]:=phi2:for i from 0 to N-2 do u[i+2]:=u[i]+u[i+1] od:
Maple 6] L:=[seq(u[i],i=0..N)];
[1, −0.6180339880, 0.3819660120, −0.2360679760, 0.1458980360, −0.0901699400, 0.0557280960, −0.0344418440, 0.0212862520, −0.0131555920, 0.0081306600, −0.0050249320, 0.0031057280, −0.0019192040, 0.0011865240, −0.0007326800, 0.0004538440, −0.0002788360, 0.0001750080, −0.0001038280, 0.0000711800, −0.0000326480, 0.0000385320, 0.0000058840, 0.0000444160, 0.0000503000, 0.0000947160, 0.0001450160, 0.0002397320, 0.0003847480, 0.0006244800, 0.0010092280, 0.0016337080, 0.0026429360, 0.0042766440, 0.0069195800, 0.0111962240, 0.0181158040, 0.0293120280, 0.0474278320, 0.0767398600, 0.1241676920, 0.2009075520, 0.3250752440, 0.5259827960, 0.8510580400, 1.377040836, 2.228098876, 3.605139712, 5.833238588, 9.438378300]
Here is a plot of these values:
Maple 7] plots[listplot]([seq(u[i],i=0..N)]):
The problem is that the numerical error introduced when replacing φ̄ by a 10-digit approximation amounts to having a very small, but nonzero, value for a. At first, this goes unnoticed, but eventually, since φ^n tends to infinity, it overtakes the φ̄^n part. A natural solution is to work exactly, starting with a symbolic value for φ̄ and reproducing the same steps using symbolic computation:
Maple 8] phi2:=phi[2];
1/2 − 1/2 √5
Maple 9] u[0]:=1:u[1]:=phi2: for i from 0 to N-2 do u[i+2]:=u[i]+u[i+1] od:
L:=[seq(u[i],i=0..N)];
[1, 1/2 − 1/2 √5, 3/2 − 1/2 √5, 2 − √5, 7/2 − 3/2 √5, 11/2 − 5/2 √5, 9 − 4 √5, 29/2 − 13/2 √5, 47/2 − 21/2 √5, 38 − 17 √5, 123/2 − 55/2 √5, 199/2 − 89/2 √5, 161 − 72 √5, 521/2 − 233/2 √5, 843/2 − 377/2 √5, 682 − 305 √5, 2207/2 − 987/2 √5, 3571/2 − 1597/2 √5, 2889 − 1292 √5, 9349/2 − 4181/2 √5, 15127/2 − 6765/2 √5, 12238 − 5473 √5, 39603/2 − 17711/2 √5, 64079/2 − 28657/2 √5, 51841 − 23184 √5, 167761/2 − 75025/2 √5, 271443/2 − 121393/2 √5, 219602 − 98209 √5, 710647/2 − 317811/2 √5, 1149851/2 − 514229/2 √5, 930249 − 416020 √5, 3010349/2 − 1346269/2 √5, 4870847/2 − 2178309/2 √5, 3940598 − 1762289 √5, 12752043/2 − 5702887/2 √5, 20633239/2 − 9227465/2 √5, 16692641 − 7465176 √5, 54018521/2 − 24157817/2 √5, 87403803/2 − 39088169/2 √5, 70711162 − 31622993 √5, 228826127/2 − 102334155/2 √5, 370248451/2 − 165580141/2 √5, 299537289 − 133957148 √5, 969323029/2 − 433494437/2 √5, 1568397607/2 − 701408733/2 √5, 1268860318 − 567451585 √5, 4106118243/2 − 1836311903/2 √5, 6643838879/2 − 2971215073/2 √5, 5374978561 − 2403763488 √5, 17393796001/2 − 7778742049/2 √5, 28143753123/2 − 12586269025/2 √5]
However, a new difficulty occurs:

Maple 10] plots[listplot]([seq(u[i],i=0..N)]):
Again, the values explode eventually, although we have exact values all along. The reason for this lies in the numerical evaluation of the large coefficients involved in the exact expression. This can be seen by evaluating both terms in each value separately:
Maple 11] u[50];
28143753123/2 − 12586269025/2 √5
Maple 12] A:=[op(%)];
[28143753123/2, −12586269025/2 √5]
Maple 13] evalf(A);
[14071876560.0, −14071876560.0]

Thus, in this case, increasing the precision of the numerical evaluation is sufficient:
Maple 14] evalf(u[50],20);
0.0
Maple 15] evalf(u[50],30);
3.55318637 × 10^(−11)
Note that since both summands in the expression grow like φ^n and we are computing a value that decreases like φ̄^n, the number of required digits grows linearly with n, making such a computation costly.
The behaviour of this sequence is by no means an isolated accident. Every time a sequence has an asymptotic behaviour which is not the dominating one, its direct evaluation presents this kind of difficulty. A simple and efficient way to compute such sequences will be presented in Chapter 8.
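The same instability is easy to reproduce in any floating-point environment; here is a minimal Python sketch (an illustration, not from the notes) of the experiment above with double precision: even 16 digits of accuracy only delay the explosion.

```python
import math

# Iterate u[n+2] = u[n+1] + u[n] starting from u0 = 1, u1 = phibar,
# where phibar = (1 - sqrt(5))/2.  The exact solution is phibar^n -> 0,
# but the rounding error in phibar acts as a tiny phi^n component.
phibar = (1 - math.sqrt(5)) / 2
u, v = 1.0, phibar
values = [u, v]
for _ in range(99):
    u, v = v, u + v
    values.append(v)

print(values[30])   # still small: close to phibar^30, about 5e-7
print(values[100])  # the phi^n contamination has taken over
```

With a 10-digit approximation of φ̄, as in the Maple session, the same explosion already shows up around n = 25.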
Example 1.2. The Airy function.

This is a classical special function, with many applications in asymptotic analysis and mathematical physics (it is related to the location of the supernumerary rays that are sometimes visible underneath a rainbow). It can be defined by the following equations

y″(x) − x y(x) = 0,  y(0) = 3^(−2/3)/Γ(2/3),  y′(0) = −3^(1/6) Γ(2/3)/(2π).
For very similar reasons, solving this equation numerically by a scheme like Euler’s or Runge-Kutta is bound to explode and fail to capture the true behaviour of this function, which tends to 0 as x → ∞.

Maple 16] deq:={diff(y(x),x,x)-x*y(x)=0,y(0)=3^(1/3)/3/GAMMA(2/3),D(y)(0)=-3^(1/6)*GAMMA(2/3)/2/Pi};
{ d²/dx² y(x) − x y(x) = 0,  y(0) = 3^(1/3)/(3 Γ(2/3)),  D(y)(0) = −3^(1/6) Γ(2/3)/(2π) }
Maple 17] dsolve(deq,y(x));
y(x) = Ai(x)
Maple recognizes this as a function it knows about and can plot it from there:
Maple 18] plot(rhs(%),x=-10..10):
But the numerical solver ends up exploding:
Maple 19] plots[odeplot](dsolve(deq,y(x),numeric),x=-10..10,color=black):
Note that apart from this region where x is large, the numerical solver behaves very well, as we can see by superposing both curves
Maple 20] plots[display](%%,%):
Here, this behaviour is very unfortunate: this function has been isolated and given a name by mathematical physicists precisely because it has a mild behaviour at ∞. It is therefore necessary to find other ways for its evaluation. An efficient approach to the guaranteed computation of such functions with high precision will be presented in Chapter 3.
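For readers who want to reproduce the phenomenon outside Maple, here is a small Python sketch (an illustration, not the course’s method) applying Euler’s method to the Airy equation with the initial conditions above: the exponentially growing second solution (Bi) eventually dominates, even though the exact solution Ai(x) tends to 0.

```python
import math

# Airy equation y'' = x*y with the initial conditions of Ai,
# integrated by Euler's method.  The exact solution Ai(x) tends to 0,
# with Ai(10) of order 1e-10, but discretization errors excite the
# other solution Bi(x), which grows like exp((2/3) x^(3/2)).
y = 3.0 ** (-2.0 / 3.0) / math.gamma(2.0 / 3.0)                    # Ai(0)
v = -(3.0 ** (1.0 / 6.0)) * math.gamma(2.0 / 3.0) / (2 * math.pi)  # Ai'(0)
h = 1e-3
x = 0.0
for _ in range(10000):  # integrate up to x = 10
    y, v = y + h * v, v + h * x * y
    x += h

print(y)  # far from Ai(10): the computed solution has exploded
```

Shrinking the step size delays, but does not prevent, the explosion, for exactly the reason explained in the text.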
1.1.2 Efficiency issues
A typical example is provided by the following equation

y″(x) + y(x) = 0,  y(0) = 0,  y′(0) = 1,

that defines the sine function. Suppose we want to evaluate this function numerically on the interval [0, π] with, say, absolute error bounded by ε = 10^(−10).
It is easy to compute many terms of the Taylor expansion of sin and make sure that the error committed in truncating the power series is negligible compared to ε. However, approximation by the Taylor expansion is usually not the most efficient way to approximate such a function. We will see in Chapter 2 another approach based on
1. computing a Chebyshev series symbolically instead of a Taylor series (i.e., expanding on the basis of Chebyshev polynomials (T_n(x)) rather than the power basis (x^n));

2. evaluating this series numerically at well-chosen points;

3. computing a polynomial of small degree interpolating these points, while keeping the errors smaller than ε;

4. (optionally) using a process called “economization” to compute a polynomial that is even cheaper to evaluate.
All these steps are motivated by the fact that evaluating a polynomial can be rather efficient.
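As a baseline for these steps, here is a hedged Python sketch (an illustration, not the course’s method) of the straightforward approach the text mentions first: truncating the Taylor series of sin and evaluating it by Horner’s rule. Degree 21 already meets ε = 10^(−10) on [0, π]; the Chebyshev-based process above is designed to reach the same target with a cheaper polynomial.

```python
import math

# Odd Taylor polynomial of sin up to degree 21:
# sin(x) ≈ x - x^3/3! + ... - x^19/19! + x^21/21!
# On [0, pi] the tail is alternating with decreasing terms, so the
# first omitted term, pi^23/23! ≈ 1e-11, bounds the truncation error.
coeffs = [(-1) ** k / math.factorial(2 * k + 1) for k in range(11)]

def sin_taylor(x):
    # Horner's rule in the variable x^2, then one multiplication by x.
    s = 0.0
    x2 = x * x
    for c in reversed(coeffs):
        s = s * x2 + c
    return s * x

err = max(abs(sin_taylor(x) - math.sin(x))
          for x in [math.pi * i / 1000 for i in range(1001)])
print(err)  # below 1e-10
```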
1.2 Approximations
We now list a few of the questions dealt with in this course, whose natural habitat lies sometimes within approximation theory and sometimes within symbolic computation.
• Compute the first 1000 digits of π, ln 2, √7, exp(−10), ... (see Chapter 3);
• Compute the floating-point number in the IEEE standard that is closest to these numbers;

• Compute the first 1000 Taylor coefficients of
1/(1 − x − x²),  arcsin(x),  sin(tan(x)) − tan(sin(x)),

or of the solutions of

y(x) = 1 + x y(x)⁵,  y(x) = x + x log(1/(1 − y(x))),  x² y″(x) + x y′(x) + (x² − 1) y(x) = 0
(Efficient algorithms exist in computer algebra, see Chapter 3).
• Compute a polynomial P of minimal degree such that

|f(x) − P(x)| < 10^(−15) for all x ∈ [0, 1/4],

and this for each of the functions above (see Chapter 2).
• Conversely, given a function f and a polynomial P , compute a bound on |f − P | on suchan interval (Chapter 2);
• Polynomials are not very good at approximating functions that have poles at or near the endpoints of the interval. It is therefore natural to ask the same questions with rational functions instead of polynomials, minimizing the sum of the degrees of the numerator and denominator (Chapters 4 and 5);
• Same questions when minimizing

∫_0^{1/4} (f(t) − P(t))² dt

instead;
• Given (x_1, y_1), ..., (x_n, y_n), compute a polynomial of minimal degree (or a rational function with minimal sum of degrees) such that P(x_i) = y_i, i = 1, ..., n;
• Same question with a fixed degree, minimizing

∑_{i=1}^{n} |P(x_i) − y_i|   or   ∑_{i=1}^{n} |P(x_i) − y_i|²;
• Same question with a given f, with y_i = f(x_i), and now the aim is to minimize |f(x) − P(x)| for x ∈ [a, b] or

∫_a^b |f(t) − P(t)|² dt;
• Same question if the choice of x_1, ..., x_n is free;

• Compute these f(x_i) when f is given by a linear differential equation (end of the course);

• Compute integrals, zeroes of functions, ...
For all these questions, the objects of study will be the existence and uniqueness of solutions, the discovery of iterations converging to them or other means of computing them, and their efficiency. Proofs of irrationality are sometimes not very far.
1.3 Symbolic Computation
We want to use symbolic computation tools to help produce good numerical approximations to functions. In the setting of approximation theory, the input is often given as “a function f continuous (say) over an interval [a, b].” In the setting of symbolic computation, it is necessary to be much more precise concerning the way this function and this interval are given, as well as to take into account constraints due to the algorithms. We thus conclude this first course by a brief introduction to symbolic computation.
A fundamental issue is that not all mathematics can be decided by a computer. An importantresult in this area is the following.
Theorem 1.3. [Richardson-Matiyasevich] There cannot exist an algorithm taking as input anyfunction of one variable f(x) built by repeated composition from the constant 1, addition, subtrac-tion, multiplication, the functions sine and absolute value, and that decides whether f=0 or not.
This result restricts the class of functions that can be handled in symbolic computation systems.It also implies that simplification is at best a collection of useful heuristics. The way out of thisundecidability result is to stay within algebraic constructions that preserve decidability.
1.3.1 Representation of mathematical objects
We call effective those sets of mathematical objects that can be dealt with (meaning, for which there is a representation such that arithmetic operations and the test for equality to 0 are given by algorithms). Examples are: machine integers (usually Z/2⁶⁴Z or Z/2³²Z); integers of arbitrary size; vectors and matrices over an effective ring; polynomials in one or several variables over such a ring; rational functions; truncated power series.
An important idea that enlarges the scope of symbolic computation is that equations are a data-structure for their solutions. Thus, using univariate polynomials as a data-structure lets one manipulate algebraic numbers and therefore work in the algebraic closure of any effective field (the algorithms are nothing but Euclid’s algorithm and the computation of resultants). A typical exercise in this thread is to prove automatically the following beautiful identity:

sin(2π/7)/sin²(3π/7) − sin(π/7)/sin²(2π/7) + sin(3π/7)/sin²(π/7) = 2√7.
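As a sanity check (a floating-point check, of course, is not the automatic proof meant here), the identity can be tested numerically; a quick Python illustration:

```python
import math

# Evaluate both sides of the identity in double precision.
s = math.sin
t = (s(2 * math.pi / 7) / s(3 * math.pi / 7) ** 2
     - s(math.pi / 7) / s(2 * math.pi / 7) ** 2
     + s(3 * math.pi / 7) / s(math.pi / 7) ** 2)
print(t, 2 * math.sqrt(7))  # both about 5.2915: equal up to rounding error
```

The symbolic proof replaces sin(kπ/7) by roots of an explicit polynomial and reduces the identity to exact polynomial arithmetic.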
Similarly, and closer to the aim of this course, the solutions of linear differential equations with polynomial coefficients over an effective field enjoy a large number of closure properties made effective by simple algorithms that will be presented in Chapter 3. Not only can one prove automatically identities like sin² + cos² = 1, but this also gives effective access to various operations with special functions and orthogonal polynomials, thereby providing us with a large set of effective “functions” that require approximation.
1.3.2 Efficiency
One minute
The basic algorithms of symbolic computation are extremely efficient. Thus, in one minute of cpu time of a typical modern computer, it is possible to compute

• the product of integers with 500,000,000 digits;

• the factorial of 20,000,000 (the result has roughly 140 × 10⁶ digits);

• (by contrast, only the factorisation of a 45-digit number can be done within this time);

• the product of polynomials in K[x] of degree 14 × 10⁶ (where K = Z/pZ with p = 67,108,879, a 26-bit prime number);

• the gcd of polynomials of K[x] of degree 600,000;

• the determinant of matrices of size 4,500 × 4,500 over K;

• the determinant of matrices of size 700 × 700 if their entries are 32-bit integers.
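Some of these orders of magnitude can be checked without running the computation itself; for instance, the digit count of 20,000,000! follows from Stirling’s formula (a Python sketch, for illustration):

```python
import math

# The number of decimal digits of n! is floor(log10(n!)) + 1,
# and log(n!) = lgamma(n + 1), so no big number is ever built.
n = 20_000_000
digits = math.floor(math.lgamma(n + 1) / math.log(10)) + 1
print(digits)  # about 137 million, i.e. roughly 140 * 10^6 as stated
```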
Complexity models
A simple means of assessing the efficiency of algorithms is by analyzing their complexity. This entails selecting a complexity model that defines precisely what operations are counted in the analysis. One such model that is commonly used is the RAM model (for Random Access Machine). In this model, the machine has one tape where it reads its input (one integer per cell of the tape); another tape where it writes its output; and a tape for its computations. It has an arbitrary number of registers, and the operations that are counted at unit cost are: reading or writing a cell; adding, multiplying, subtracting, and dividing integers; and jumps, either unconditional or depending on tests of the type ‘=0’ or ‘>0’ on the value of a register.
The complexity measured in this model is called the arithmetic complexity. While it is a good measure of time when working in settings where the sizes of the integers are all similar, for instance for polynomials or matrices over a finite field, this model does not predict time correctly when large integers come into play. A variant of this model is then used, where the cells can only contain one bit (0 or 1) and the operations only act on one bit as well. The complexity measured in this model is called bit complexity.
Asymptotic estimates
On modern computers, a computation that takes more than 10 seconds is usually already spending its time in the part that dominates the other ones asymptotically, and thus fair predictions of execution time can be obtained by a first-order asymptotic estimate of the complexity. In each case it is of course necessary to specify the complexity model (arithmetic or bit complexity in this course).
For instance, the computation of n! requires

• O(n² log² n) bit operations by the naïve algorithm, and only O(n) arithmetic operations;

• O(√n log n) arithmetic operations with the currently known best algorithm in terms of arithmetic operations;

• O(n log³ n loglog n) bit operations, with an algorithm presented in Chapter 3, which explains the speed displayed above.
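The key idea behind the fast bit-complexity algorithm (binary splitting, presented in Chapter 3) is to multiply numbers of balanced sizes, so that fast integer multiplication pays off. A hedged Python sketch of the underlying product tree:

```python
def prod_range(a, b):
    """Product a * (a+1) * ... * b by a balanced product tree.

    Multiplying numbers of similar sizes lets fast integer
    multiplication pay off, unlike the naive left-to-right product,
    which repeatedly multiplies a huge number by a tiny one.
    """
    if a > b:
        return 1
    if a == b:
        return a
    m = (a + b) // 2
    return prod_range(a, m) * prod_range(m + 1, b)

def factorial(n):
    return prod_range(1, n)

print(factorial(10))  # 3628800
```

This sketch inherits the speed of Python’s built-in integer multiplication; the precise complexity analysis is the subject of Chapter 3.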
Integers and polynomials
The starting point for fast algorithms in symbolic computation is fast multiplication. We use M(N) to denote the arithmetic complexity of the product of two polynomials of degree bounded by N in one variable. Then,

M(N) =
• O(N²) by the naïve algorithm;
• O(N^(log₂ 3)) by Karatsuba’s algorithm;
• O(N log N loglog N) by fast Fourier transform (FFT).
Thus asymptotically, multiplication is not much slower than addition. Similarly, we use M_Z(N) to denote the complexity of the product of two integers of at most N bits. The algorithms are a bit more intricate because of the need to deal with carry propagation, but the end result is almost the same:

M_Z(N) =
• O(N²) by the naïve algorithm;
• O(N^(log₂ 3)) by Karatsuba’s algorithm;
• O(N log N loglog N) by fast Fourier transform (FFT).
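For illustration, Karatsuba’s idea — trading four half-size products for three — can be sketched in a few lines of Python on integers (nonnegative inputs assumed; a real implementation would tune the base-case threshold):

```python
def karatsuba(x, y):
    # Three half-size products instead of four:
    # x*y = a*2^(2m) + b*2^m + c with
    # a = hi1*hi2, c = lo1*lo2, b = (hi1+lo1)(hi2+lo2) - a - c.
    if x < 2 ** 64 or y < 2 ** 64:
        return x * y  # base case: small enough to multiply directly
    m = max(x.bit_length(), y.bit_length()) // 2
    hi1, lo1 = x >> m, x & ((1 << m) - 1)
    hi2, lo2 = y >> m, y & ((1 << m) - 1)
    a = karatsuba(hi1, hi2)
    c = karatsuba(lo1, lo2)
    b = karatsuba(hi1 + lo1, hi2 + lo2) - a - c
    return (a << (2 * m)) + (b << m) + c

print(karatsuba(12345678901234567890, 98765432109876543210))
```

Each level replaces one product of size N by three of size N/2, whence the O(N^(log₂ 3)) recurrence.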
These will be the building blocks for fast algorithms in the following chapters. In particular, it isimportant to keep in mind the following complexity estimates:
• For power series, products, inverses, square roots and, more generally, solutions of polynomial equations can be computed in O(M(N)) arithmetic operations;
• For polynomials, gcd, multipoint evaluation (evaluation of a polynomial of degree N at Npoints) and interpolation can be computed in O(M(N)logN) arithmetic operations.
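As an illustration (a Python sketch, not part of the original notes), here is Karatsuba's O(N^{log₂ 3}) scheme next to the naïve product, on polynomials represented as coefficient lists in increasing degree order:

```python
# Illustrative sketch: naive O(N^2) polynomial product vs Karatsuba's
# O(N^{log2 3}) product, on coefficient lists in increasing degree order.

def naive_mul(a, b):
    """Schoolbook product: every coefficient of a times every one of b."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def add_lists(u, v):
    """Coefficient-wise sum, padding the shorter list with zeros."""
    n = max(len(u), len(v))
    u = u + [0] * (n - len(u))
    v = v + [0] * (n - len(v))
    return [x + y for x, y in zip(u, v)]

def karatsuba_mul(a, b):
    """Split a = a0 + x^m a1, b = b0 + x^m b1 and use the identity
    a0*b1 + a1*b0 = (a0+a1)(b0+b1) - a0*b0 - a1*b1: 3 recursive products."""
    la, lb = len(a), len(b)
    n = max(la, lb)
    if n <= 4:  # below a small cutoff, the schoolbook product is faster
        return naive_mul(a, b)
    a = a + [0] * (n - la)  # pad to a common length
    b = b + [0] * (n - lb)
    m = n // 2
    a0, a1 = a[:m], a[m:]
    b0, b1 = b[:m], b[m:]
    p0 = karatsuba_mul(a0, b0)
    p2 = karatsuba_mul(a1, b1)
    pm = karatsuba_mul(add_lists(a0, a1), add_lists(b0, b1))
    c = [0] * (2 * n - 1)
    for i, v in enumerate(p0):   # low part, also subtracted from the middle
        c[i] += v
        c[i + m] -= v
    for i, v in enumerate(p2):   # high part, also subtracted from the middle
        c[i + m] -= v
        c[i + 2 * m] += v
    for i, v in enumerate(pm):   # (a0+a1)(b0+b1) at offset m
        c[i + m] += v
    return c[: la + lb - 1]

a, b = [1, -2, 3, 0, 5, 7, 2], [4, 0, -1, 6, 2]
print(karatsuba_mul(a, b) == naive_mul(a, b))  # True
```

The cutoff value 4 is arbitrary; a tuned implementation would pick it experimentally.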
Chapter 2
Polynomial approximations
In this chapter, we present various theoretical and algorithmic results regarding polynomial approximations of functions. We will mainly deal with real-valued continuous functions over a compact interval [a, b], a, b ∈ R, a ≤ b. We will denote C([a, b]) the real vector space of continuous functions over [a, b]. In the framework of function evaluation one usually works with the following two norms over this vector space:

• the least-square norm L2: given a nonnegative weight function w ∈ C([a, b]), if dx denotes the Lebesgue measure, we write g ∈ L2([a, b], w, dx) if

∫_a^b w(x) |g(x)|² dx < +∞.

We first show that En(f) := inf_{p ∈ Rn[x]} ‖f − p‖∞ tends to 0 as n → ∞, a result due to Weierstraß (1885). Various proofs of this result have been published, in particular those by Runge (1885), Picard (1891), Lerch (1892 and 1903), Volterra (1897), Lebesgue (1898), Mittag-Leffler (1900), Fejér (1900 and 1916), Landau (1908), de la Vallée Poussin (1908), Jackson (1911), Sierpinski (1911), Bernstein (1912), Montel (1918). The text [16] is an interesting account of Weierstraß' contribution to Approximation Theory and, in particular, his fundamental result on the density of polynomials in C([a, b]).
We now give a proof inspired by Bernstein's.
Theorem 2.1. [Weierstraß, 1885] For all f ∈ C([a, b]) and for all ε > 0, there exist n ∈ N, p ∈ Rn[x] such that ‖p − f‖∞ < ε.

Proof. We may assume [a, b] = [0, 1]. For n ∈ N and k = 0, ..., n, consider the Bernstein polynomials

bn,k(x) = C(n, k) x^k (1 − x)^{n−k},   Bn(f, x) = Σ_{k=0}^{n} f(k/n) bn,k(x),

and let M = ‖f‖∞. Let ε > 0. The function f is continuous and hence uniformly continuous over [0, 1], hence there exists δ > 0 such that

∀x1, x2 ∈ [0, 1], |x2 − x1| ≤ δ ⇒ |f(x2) − f(x1)| ≤ ε.

Since the bn,k are nonnegative over [0, 1] and Σ_{k=0}^{n} bn,k(x) = 1, we get, for all x ∈ [0, 1],

|f(x) − Bn(f, x)| ≤ Σ_{k=0}^{n} |f(x) − f(k/n)| bn,k(x) ≤ ε + 2M Σ_{k=0, |x−k/n|>δ}^{n} bn,k(x).

Note that we actually have

Σ_{k=0, |x−k/n|>δ}^{n} bn,k(x) ≤ Σ_{k=0, |x−k/n|>δ}^{n} ((x − k/n)/δ)² bn,k(x) ≤ x(1 − x)/(n δ²),

the last inequality coming from the identity Σ_{k=0}^{n} (x − k/n)² bn,k(x) = x(1 − x)/n.
Therefore, we obtain |f(x) − Bn(f, x)| ≤ ε + M/(2 n δ²). The upper bound does not depend on x and can be made as small as desired. □

Remark 2.2. One of the very nice features of this proof is that it provides an explicit sequence of polynomials which converges to the function f. It is worth mentioning that Bernstein polynomials prove useful in various other domains (computer graphics, global optimization, ...). See [7] for instance.
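As a small illustration (a Python sketch, not part of the notes), the Bernstein approximation can be evaluated directly from the definition, and the uniform error indeed shrinks — slowly, of order 1/√n for a Lipschitz function:

```python
# Sketch: evaluating the Bernstein approximation B_n(f, x) and watching
# the sup-norm error decay on a grid.
from math import comb

def bernstein(f, n, x):
    """B_n(f, x) = sum_k f(k/n) * C(n,k) * x^k * (1-x)^(n-k)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)  # Lipschitz, not differentiable at 1/2
for n in (10, 100, 1000):
    err = max(abs(f(i / 200) - bernstein(f, n, i / 200)) for i in range(201))
    print(n, err)  # decays roughly like 1/sqrt(n)
```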
Note that, in the proof, we only used the values of Bn(g, x) for the monomials g: x ↦ xⁿ, 0 ≤ n ≤ 2. In fact, we have the following result.
Theorem 2.3. (Bohman and Korovkin) Let (Ln) be a sequence of monotone linear operators on C([a, b]), that is to say, for all f, g ∈ C([a, b]):
• Ln(µ f + λ g) = µ Ln(f) + λ Ln(g) for all λ, µ ∈ R,
• if f(x) ≥ g(x) for all x ∈ [a, b] then Ln f(x) ≥ Ln g(x) for all x ∈ [a, b].
Then the following conditions are equivalent:
i. Ln f → f uniformly for all f ∈ C([a, b]);
ii. Ln f → f uniformly for the three functions x ↦ 1, x, x²;
iii. Ln 1 → 1 and (Ln φt)(t) → 0 uniformly in t ∈ [a, b], where φt: x ∈ [a, b] ↦ (t − x)².
Proof. See [4]. □
A refinement of Weierstraß’s theorem that gives the speed of convergence is obtained in termsof the modulus of continuity.
Definition 2.4. The modulus of continuity of f is the function ω defined by

ω(δ) = sup_{|x−y| ≤ δ} |f(x) − f(y)| for all δ > 0.

Theorem 2.5. For all f ∈ C([0, 1]) and all n ≥ 1, we have ‖f − Bn(f, ·)‖∞ ≤ (9/4) ω(1/√n).

Proof. Let δ > 0 and x ∈ [0, 1]. If k ∈ {0, ..., n} is such that |x − k/n| ≤ δ, then |f(x) − f(k/n)| ≤ ω(δ). Now let k be such that |x − k/n| > δ. Let M = ⌊|x − k/n|/δ⌋ and let

yj = x + j/(M + 1) · (k/n − x) for j = 0, ..., M + 1.

Note that, for all j = 0, ..., M, we have |y_{j+1} − yj| ≤ δ, hence

|f(x) − f(k/n)| ≤ Σ_{j=0}^{M} |f(y_{j+1}) − f(yj)| ≤ (M + 1) ω(δ) ≤ (1 + |x − k/n|/δ) ω(δ) ≤ (1 + |x − k/n|²/δ²) ω(δ),

the last step using |x − k/n|/δ > 1. Since bn,k(y) ≥ 0 for all y ∈ [0, 1], for all x ∈ [0, 1] we can write

|f(x) − Bn(f, x)| ≤ Σ_{k=0, |x−k/n| ≤ δ}^{n} |f(x) − f(k/n)| bn,k(x) + Σ_{k=0, |x−k/n| > δ}^{n} |f(x) − f(k/n)| bn,k(x)
≤ ω(δ) + Σ_{k=0, |x−k/n| > δ}^{n} ω(δ) (1 + |x − k/n|²/δ²) bn,k(x)
≤ ω(δ) (2 + (1/δ²) Σ_{k=0, |x−k/n| > δ}^{n} (x − k/n)² bn,k(x))
≤ ω(δ) (2 + x(1 − x)/(n δ²))
≤ ω(δ) (2 + 1/(4 n δ²)).

Finally, replace δ with 1/√n. □
Remark 2.6. This result is not optimal. For improvements and refinements, see Section 4.6 of [4] or Chapter 16 of [17] for a presentation of Jackson's theorems.

Corollary 2.7. When f is Lipschitz continuous, En(f) = O(n^{−1/2}).
2.2 Best L∞ (or minimax) approximation
The infimum En(f) is reached, thanks to the following proposition.
Proposition 2.8. Let (E, ‖·‖) be a normed R-vector space and let F be a finite-dimensional subspace of (E, ‖·‖). For all f ∈ E, there exists p ∈ F such that ‖p − f‖ = min_{q∈F} ‖q − f‖. Moreover, the set of best approximations to a given f ∈ E is convex.

Proof. Let f ∈ E. Consider F0 = {p ∈ F : ‖p‖ ≤ 2‖f‖}. Then F0 is nonempty (it contains 0), closed and bounded; since dim F < ∞, it is compact. Hence the continuous function ϕ: p ↦ ‖f − p‖ attains its minimum over F0 at some p⋆. Moreover, for p ∈ F \ F0, we have ϕ(p) ≥ ‖p‖ − ‖f‖ > ‖f‖ = ϕ(0) ≥ ϕ(p⋆) since 0 ∈ F0. Thus ‖f − p⋆‖ = min_{p∈F} ‖f − p‖.
Now, let p and q ∈ F be two best approximations to f. For all λ ∈ [0, 1], the vector λp + (1−λ)q is an element of the vector space F and we have, from the triangle inequality, ‖λp + (1−λ)q − f‖ ≤ λ‖p − f‖ + (1−λ)‖q − f‖ = min_{r∈F} ‖r − f‖: the vector λp + (1−λ)q is also a best approximation to f. □
The best L2 approximation is unique, which is not always the case in the L∞ setting.
Exercise 2.1. Consider the following simple situation: the interval is [−1, 1], f is the constant function 1 and F = Rg where g: x ↦ x². Determine the set of best L∞ approximations to f.
In the case of L∞, it is necessary to introduce an additional condition known as the Haar condition.
Definition 2.9. Consider n + 1 functions ϕ0, ..., ϕn defined over [a, b]. We say that ϕ0, ..., ϕn satisfy the Haar condition iff

a) the ϕi are continuous;

b) and the following equivalent statements hold:

• for all x0, x1, ..., xn ∈ [a, b], the determinant det(ϕj(xi))_{0≤i,j≤n} vanishes if and only if there exist i ≠ j with xi = xj;

• given pairwise distinct x0, ..., xn ∈ [a, b] and values y0, ..., yn, there exists a unique interpolant

p = Σ_{k=0}^{n} αk ϕk, with αk ∈ R for all k = 0, ..., n,

such that p(xi) = yi for all i = 0, ..., n;

• any p = Σ_{k=0}^{n} αk ϕk ≠ 0 has at most n distinct zeros in [a, b].
Exercise 2.2. Prove that the conditions above are equivalent.
A set of functions that satisfy the Haar condition is called a Chebyshev system. The prototype example is ϕi(x) = xⁱ, for which we have

det(ϕj(xi))_{0≤i,j≤n} = det(xiʲ)_{0≤i,j≤n} = Vn = ∏_{0≤i<j≤n} (xj − xi).   (2.1)
Example 2.13. The best approximation to cos over [0, 10π] on the Chebyshev system {1, x, x²} is the constant function 0! Moreover, the same is true for {1, x, ..., x^h} up to and including h = 9.
Proof. We can assume that f ∉ Span_R{ϕ0, ..., ϕn}. We already proved the existence of a best approximation.
We now show that the equioscillation property implies optimality of the approximation. Let p be an approximation with equioscillating error function, and suppose that there exists q = Σ βj ϕj with ‖f − q‖ < ‖f − p‖. Writing p − q = (p − f) − (q − f), we see that p − q changes sign between each pair of consecutive equioscillation points xi. It then follows from the intermediate value theorem that there exist n + 1 points y0, ..., yn, with x0 < y0 < x1 < ··· < yn < x_{n+1}, at which p − q vanishes: this contradicts the Haar condition, since p − q is a nonzero element of Span_R{ϕ0, ..., ϕn} with n + 1 distinct zeros.
Finally, let us prove the uniqueness. Let p, q be two best approximations, and let

µ = ‖f − p‖∞ = ‖f − q‖∞.

It follows from Proposition 2.8 that (p + q)/2 is a best approximation too. Thus there exist
t0 < t1 < ···
Proof. The de La Vallée Poussin theorem tells us that after each iteration, we have |ε| ≤ En(f) ≤ |ε| + δ. □
We will not give more details concerning this algorithm. See [4] or [17].
Theorem 2.17. Let pk denote the value of p after k(n + 2) loop turns, and let p∗ be such that En(f) = ‖f − p∗‖∞. There exists θ ∈ (0, 1) such that ‖pk − p∗‖∞ = O(θ^k).

Under mild regularity assumptions, the bound O(θ^k) can in fact be improved to O(θ^{2^k}) [25].
2.3 Polynomial interpolation
Now we restrict our study to polynomials in Rn[x].
At this stage, it seems natural to focus on techniques for computing polynomials that interpolate functions at a given finite family of points:

• sometimes a finite number of values is the only information we have on the function,
• Step 2.a of Remez' algorithm requires an efficient interpolation process,
• Theorem 2.10 shows that, for all n, there exist a ≤ z0 < z1 < ··· < zn ≤ b such that f(zi) = p∗(zi) for i = 0, ..., n, where p∗ is the minimax approximation of f: the polynomial p∗ is an interpolation polynomial of f.
Let A be a commutative ring (with unity). Given pairwise distinct x0, ..., xn ∈ A and corresponding y0, ..., yn ∈ A, the interpolation problem is to find p ∈ An[x] such that p(xi) = yi for all i. Write p = Σ_k ak x^k. The problem can be restated as

V · a = y   (2.2)

where V is a Vandermonde matrix. If det V is invertible, there is a unique solution.
From now on we assume A = R. The expression (2.1) of the Vandermonde determinant shows that as soon as the xi are pairwise distinct, there is a unique solution. We now discuss several ways to compute the interpolation polynomial.
Linear algebra. We could solve the system (2.2) using standard linear algebra algorithms. This takes O(n³) operations using Gaussian elimination. In theory, the best known complexity bound is currently O(n^θ) where θ ≈ 2.3727 (Williams). In practice, Strassen's algorithm yields a cost of O(n^{log₂ 7}). There are issues with this approach, though:

• the problem is ill-conditioned: a small perturbation of the yi leads to a significant perturbation of the solution,
• we can do better from the complexity point of view: O(n²) or even O(n log^{O(1)} n) in general, and O(n log n) if the xi are so-called Chebyshev nodes.
The divided-difference method. Newton's divided-difference method allows us to compute interpolation polynomials incrementally. The idea is as follows. Let pk ∈ Rk[x] be such that pk(xi) = yi for 0 ≤ i ≤ k ≤ n, and write

p_{n+1}(x) = pn(x) + a_{n+1} (x − x0) ··· (x − xn).

Then we have

p_{n+1}(xj) = yj, 0 ≤ j ≤ n,
p_{n+1}(x_{n+1}) = pn(x_{n+1}) + a_{n+1} (x_{n+1} − x0) ··· (x_{n+1} − xn).

Given y0, ..., yk, we denote by [y0, ..., yk] the corresponding ak. Then, we can compute the ak using the relation

[y0, ..., y_{k+1}] = ([y1, ..., y_{k+1}] − [y0, ..., yk]) / (x_{k+1} − x0).
This leads to a triangular tree of divided differences: its leaves are [y0] = y0, ..., [yn] = yn, each internal node [yi, ..., y_{i+k}] is computed from its two children [yi, ..., y_{i+k−1}] and [y_{i+1}, ..., y_{i+k}], and its root is [y0, ..., yn].
Hence, the coefficients can be computed in O(n²) operations.
The cost of evaluation at a given point z is then O(n) operations in R.
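The two steps can be sketched as follows (Python, not from the notes): an O(n²) triangular computation of the coefficients, then an O(n) Horner-like evaluation on the Newton basis.

```python
# Sketch of Newton's divided differences: O(n^2) coefficients, O(n) evaluation.

def divided_differences(xs, ys):
    """Return the coefficients a_k = [y_0, ..., y_k]."""
    a = list(ys)
    for k in range(1, len(xs)):
        # after this pass, a[i] = [y_{i-k}, ..., y_i] for i >= k
        for i in range(len(xs) - 1, k - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (xs[i] - xs[i - k])
    return a

def newton_eval(xs, a, z):
    """Evaluate a[0] + a[1](z-x0) + ... + a[n](z-x0)...(z-x_{n-1})."""
    p = a[-1]
    for k in range(len(a) - 2, -1, -1):  # Horner-like scheme
        p = p * (z - xs[k]) + a[k]
    return p

xs = [0.0, 1.0, 2.0, 3.0]
ys = [x**3 - 2 * x + 1 for x in xs]  # a cubic is recovered exactly
a = divided_differences(xs, ys)
print(newton_eval(xs, a, 1.5))  # 1.375 = 1.5^3 - 2*1.5 + 1
```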
Lagrange's Formula. For all j, let

Lj(x) = ∏_{k≠j} (x − xk)/(xj − xk).

Then we have deg Lj = n and Lj(xi) = δ_{i,j} for all 0 ≤ i, j ≤ n. The polynomials Lj, 0 ≤ j ≤ n, form a basis of Rn[x], and the interpolation polynomial p can be written

p(x) = Σ_{i=0}^{n} yi Li(x).
Thus, writing the interpolation polynomial on the Lagrange basis is straightforward.
What about the cost of evaluating the resulting polynomial at a given point z? If we do it naively, computing Lj(z) costs (say) 2n subtractions, 2n + 1 multiplications and 1 division. The total cost is O(n²) operations in R.
But we can also write

p(x) = W(x) Σ_{i=0}^{n} yi / ((x − xi) W′(xi)),   W(x) = ∏_{i=0}^{n} (x − xi).

Assuming the W′(xi) are precomputed, the cost of evaluating p(z) using this formula is only O(n) arithmetic operations.
Google “barycentric Lagrange interpolation” and/or see Trefethen's book [22] for more information (this decomposition has excellent properties regarding stability issues).
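A direct transcription (Python sketch, not from the notes) of this O(n) evaluation scheme, with the weights 1/W′(xi) precomputed once in O(n²):

```python
# Sketch of the modified Lagrange formula with precomputed weights.

def barycentric_weights(xs):
    """w_i = 1 / W'(x_i) = 1 / prod_{k != i} (x_i - x_k), in O(n^2)."""
    ws = []
    for i, xi in enumerate(xs):
        w = 1.0
        for k, xk in enumerate(xs):
            if k != i:
                w /= xi - xk
        ws.append(w)
    return ws

def lagrange_eval(xs, ys, ws, z):
    """Evaluate p(z) = W(z) * sum_i y_i w_i / (z - x_i) in O(n)."""
    num = 0.0
    W = 1.0
    for xi, yi, wi in zip(xs, ys, ws):
        if z == xi:        # z is a node: return the data value directly
            return yi
        num += yi * wi / (z - xi)
        W *= z - xi
    return W * num

xs = [0.0, 1.0, 2.0, 3.0]
ys = [x**3 - 2 * x + 1 for x in xs]
ws = barycentric_weights(xs)
print(lagrange_eval(xs, ys, ws, 1.5))  # ≈ 1.375
```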
2.4 Interpolation and approximation, Chebyshev polynomials
How useful is interpolation for our initial L∞ approximation problem? It turns out that the choice of the points is critical. The more points, the better? Actually, with equidistant points, the error can grow with the number of points (Runge's phenomenon).
Exercise 2.4. Using your computer algebra system of choice, interpolate the function

f: x ↦ 1/(1 + 5x²)

at the points −1 + 2k/n, 0 ≤ k ≤ n, for n = 10, 15, ..., 30. Compare with f on [−1, 1].
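For readers without a computer algebra system at hand, here is a plain Python sketch of the experiment (naive Lagrange evaluation; the point is the growth of the error near ±1):

```python
# Sketch: interpolate f at equidistant points and watch the sup-norm
# error grow with n — Runge's phenomenon.

def interp_eval(xs, ys, z):
    """Naive Lagrange evaluation of the interpolant at z, O(n^2)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        L = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                L *= (z - xk) / (xj - xk)
        total += yj * L
    return total

f = lambda x: 1 / (1 + 5 * x * x)
for n in (10, 20, 30):
    xs = [-1 + 2 * k / n for k in range(n + 1)]
    ys = [f(x) for x in xs]
    err = max(abs(f(z) - interp_eval(xs, ys, z))
              for z in (-1 + 2 * i / 1000 for i in range(1001)))
    print(n, err)  # the maximum error, attained near the endpoints, grows
```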
In short, we should never use equidistant points when approximating a function by interpolation. Are there better choices?
Theorem 2.18. [Faber] For each n, let ξ0^{(n)}, ..., ξn^{(n)} ∈ [a, b] be a system of n + 1 distinct nodes. Then for some f ∈ C([a, b]), the sequence of errors (‖f − pn‖∞)_{n∈N} is unbounded, where pn ∈ Rn[x] denotes the polynomial which interpolates f at ξ0^{(n)}, ..., ξn^{(n)}.
We discuss better choices below. We start with the following analogue of the Taylor-Lagrange formula.
Theorem 2.19. Let a ≤ x0 < ··· < xn ≤ b, and let f ∈ C^{n+1}([a, b]). Let p ∈ Rn[x] be such that f(xi) = p(xi) for all i. Then, for all x ∈ [a, b], there exists ξx ∈ (a, b) such that

f(x) − p(x) = (f^{(n+1)}(ξx)/(n + 1)!) W(x),   W(x) = ∏_{i=0}^{n} (x − xi).

Proof. This is obvious when x ∈ {xi, i = 0, ..., n}. Assuming x ∉ {xi, i = 0, ..., n}, let ϕ = f − p − λW where λ is chosen so that ϕ(x) = 0. Then ϕ vanishes at the n + 2 points x, x0, ..., xn, and by Rolle's theorem there exist n + 1 points y1 < ··· < y_{n+1} with ϕ′(yi) = 0. Iterating the argument, there exists ξx ∈ (a, b) such that ϕ^{(n+1)}(ξx) = 0. Now recall that the polynomial W is monic of degree n + 1 and the polynomial p has degree at most n: this implies W^{(n+1)}(ξx) = (n + 1)! and p^{(n+1)}(ξx) = 0, which yields the result. □
This result encourages us to search for families of xi which make ‖W‖∞ as small as possible. It is time for us to introduce Chebyshev polynomials.
Assume [a, b] = [−1, 1]. The n-th Chebyshev polynomial of the first kind is defined by Tn(cos t) = cos(n t) for all t ∈ [0, 2π].
The Tn can also be defined by

T0(x) = 1, T1(x) = x, T_{n+2}(x) = 2x T_{n+1}(x) − Tn(x) for all n ∈ N.

Among their numerous nice features, there is the following result, which suggests considering a certain family of interpolation nodes.
Proposition 2.20. Let n ∈ N, n ≠ 0. The minimum of the set

{ max_{x∈[−1,1]} |p(x)| : p ∈ Rn[x], lc(p) = 1 }

is uniquely attained for Tn/2^{n−1} and is therefore equal to 2^{−n+1}.
Forcing W(x) = 2^{−n} T_{n+1}(x) leads to the interpolation points

µk = cos((2k + 1)π / (2(n + 1))), k = 0, ..., n,

called the Chebyshev nodes of the first kind.
Another important family is that of the Chebyshev polynomials of the second kind Un(x), defined by

Un(cos x) = sin((n + 1)x) / sin(x).

They can also be defined by

U0(x) = 1, U1(x) = 2x, U_{n+2}(x) = 2x U_{n+1}(x) − Un(x) for all n ∈ N.
For all n > 0, we have (d/dx) Tn = n U_{n−1}. So the extrema of Tn are −1, 1 and the zeros of U_{n−1}, that is, the points

νk = cos(kπ/n), k = 0, ..., n,

called the Chebyshev nodes of the second kind. With W(x) = 2^{−n+1} (1 − x²) U_{n−1}(x), we have ‖W‖∞ = 2^{−n+1}.
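A quick numerical sketch (Python, not from the notes) of both node families, checking how much smaller ‖W‖∞ is for Chebyshev nodes than for equidistant points:

```python
# Sketch: Chebyshev nodes of both kinds, and ||W||_inf on a fine grid.
from math import cos, pi

def cheb_first(n):
    """mu_k = cos((2k+1) pi / (2(n+1))), k = 0..n (zeros of T_{n+1})."""
    return [cos((2 * k + 1) * pi / (2 * (n + 1))) for k in range(n + 1)]

def cheb_second(n):
    """nu_k = cos(k pi / n), k = 0..n (extrema of T_n)."""
    return [cos(k * pi / n) for k in range(n + 1)]

def sup_W(nodes, samples=2001):
    """Approximate max over [-1,1] of |prod_k (x - x_k)| on a grid."""
    best = 0.0
    for i in range(samples):
        x = -1 + 2 * i / (samples - 1)
        w = 1.0
        for xk in nodes:
            w *= x - xk
        best = max(best, abs(w))
    return best

n = 20
equi = [-1 + 2 * k / n for k in range(n + 1)]
print(sup_W(cheb_first(n)), 2.0**(-n))  # ≈ 2^{-n}, the optimal bound
print(sup_W(equi))                       # orders of magnitude larger
```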
It is obvious that deg Tn = deg Un = n for all n ∈ N. Therefore, in particular, the family (Tk)_{0≤k≤n} is a basis of Rn[x]. In the sequel of the chapter, we give results that allow for the (fast) computation of the coefficients of interpolation polynomials, at the Chebyshev nodes, expressed in the basis (Tk)_{0≤k≤n}.
Let Σ″ denote a sum such that the first and the last terms of the sum have to be halved.
Proposition 2.21. (Discrete orthogonality.) Let j, ℓ ∈ {0, ..., n}.

i. We have

Σ_{k=0}^{n} Tj(µk) Tℓ(µk) = 0 if j ≠ ℓ;  n + 1 if j = ℓ = 0;  (n + 1)/2 if j = ℓ ≠ 0.

ii. We have

Σ″_{k=0}^{n} Tj(νk) Tℓ(νk) = 0 if j ≠ ℓ;  n if j = ℓ ∈ {0, n};  n/2 if j = ℓ ∉ {0, n}.
Exercise 2.5. Prove the previous proposition.
The discrete orthogonality property implies the following (Σ′ denotes that the first term of the sum has to be halved).

Proposition 2.22.

i. If p1,n = Σ′_{0≤i≤n} c1,i Ti ∈ Rn[x] interpolates f on the set {µk : 0 ≤ k ≤ n}, then

c1,i = (2/(n + 1)) Σ_{k=0}^{n} f(µk) Ti(µk).

ii. Likewise, if p2,n = Σ″_{0≤i≤n} c2,i Ti interpolates f at {νk : 0 ≤ k ≤ n}, then

c2,i = (2/n) Σ″_{k=0}^{n} f(νk) Ti(νk).
Proof. Exercise. □
2.5 Clenshaw’s method for Chebyshev sums
Given coefficients c0, ..., cN and a point t, we would like to compute the sum

Σ_{k=0}^{N} ck Tk(t).

Recall that the polynomials Tk satisfy T_{k+2}(x) = 2x T_{k+1}(x) − Tk(x). A first idea would be to use this relation to compute the Tk(t) that appear in the sum. Unfortunately, this method is numerically unstable. This is related to the fact that the Uk(x) satisfy the same recurrence but grow faster: we have

‖Tk‖∞ = 1,  ‖Uk‖∞ = k + 1.

Clenshaw's algorithm below does better.
Algorithm 2.2
Input. Chebyshev coefficients c0, ..., cN, a point t
Output. Σ_{k=0}^{N} ck Tk(t)
1. b_{N+1} ← 0, bN ← cN
2. for k = N − 1, N − 2, ..., 1
   a. bk ← 2 t b_{k+1} − b_{k+2} + ck
3. return c0 + t b1 − b2
Proof. By definition of the bk, we have

Σ_{k=0}^{N} ck Tk(t) = c0 + (b1 − 2t b2 + b3) T1(t) + ··· + (b_{N−1} − 2t bN + b_{N+1}) T_{N−1}(t) + cN TN(t).

Using the recurrence relation and the values of bN, b_{N+1}, the sum simplifies to c0 + b1 t + b2 (T2(t) − 2t T1(t)) = c0 + t b1 − b2. □
This algorithm runs in O(N) arithmetic operations.
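In code, Algorithm 2.2 is a short backward loop; here is a Python sketch (not from the notes), checked against the defining relation Tk(cos u) = cos(k u):

```python
# Runnable sketch of Clenshaw's backward recurrence (Algorithm 2.2).

def clenshaw(c, t):
    """Evaluate sum_{k=0}^{N} c[k] * T_k(t)."""
    N = len(c) - 1
    b1, b2 = 0.0, 0.0          # play the roles of b_{k+1} and b_{k+2}
    for k in range(N, 0, -1):  # k = N, N-1, ..., 1
        b1, b2 = 2 * t * b1 - b2 + c[k], b1
    return c[0] + t * b1 - b2

from math import cos
u, c = 0.3, [0.5, -1.0, 0.25, 2.0]
direct = sum(ck * cos(k * u) for k, ck in enumerate(c))  # T_k(cos u) = cos(ku)
print(clenshaw(c, cos(u)), direct)  # the two values agree
```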
2.6 Computation of the Chebyshev coefficients
Now, how do we compute the ck? Assume we want to perform interpolation at the Chebyshev nodes of the second kind and obtain the result in the Chebyshev basis: given y0, ..., yN, we are looking for c0, ..., cN such that p(x) = Σ″_{j=0}^{N} cj Tj(x) satisfies p(νk) = yk for all k.
By discrete orthogonality, we have

cj = (2/N) Σ″_{k=0}^{N} yk Tk(νj).
Observe that we have

Tk(νj) = cos(jkπ/N)

and hence

cj = (2/N) Re( Σ″_{k=0}^{N} yk ω^{jk} ),   ω = e^{iπ/N}.
So the cj are (up to scaling) the real part of the discrete Fourier transform of the yk.
The DFT is the map Y ↦ Vω Y, where ω = e^{2πi/M} and Vω = Vandermonde(1, ω, ..., ω^{M−1}). We have V_{ω^{−1}} · Vω = M · Id, hence the DFT is almost its own inverse. The DFT sends the coefficient vector of a polynomial P = Σ_{n=0}^{M−1} yn xⁿ to its values P(1), P(ω), ..., P(ω^{M−1}).
Assume that M = 2m is even; then ω^m = −1. Rewrite P as

P(X) = Q0(X)(X^m − 1) + R0(X) = Q1(X)(X^m + 1) + R1(X), with deg R0, deg R1 < m.
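Before any FFT, the coefficient formula itself can be checked directly (a Python sketch, not from the notes; O(N²) here, which the FFT brings down to O(N log N)). Sampling T2 at the nodes νk should return the coefficient vector of T2:

```python
# Direct O(N^2) sketch of the coefficient formula at the second-kind nodes.
from math import cos, pi

def chebyshev_coeffs(y):
    """Given y[k] = p(nu_k), nu_k = cos(k pi/N), return c_0..c_N with
    p = sum'' c_j T_j (first and last terms of the sum halved)."""
    N = len(y) - 1
    c = []
    for j in range(N + 1):
        s = sum(y[k] * cos(j * k * pi / N) for k in range(1, N))
        s += 0.5 * (y[0] + y[N] * cos(j * pi))  # halved k = 0 and k = N terms
        c.append(2 * s / N)
    return c

N = 4
y = [cos(2 * k * pi / N) for k in range(N + 1)]  # samples of T_2 at nu_k
print([round(cj, 12) + 0.0 for cj in chebyshev_coeffs(y)])  # [0.0, 0.0, 1.0, 0.0, 0.0]
```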
Chapter 3
D-Finiteness
In this chapter, we present a nice class of functions that:

• contains a large number of elementary and special functions;
• admits fast algorithms for evaluation;
• allows for automatic proofs of identities.
3.1 Linear differential equations and linear recurrences
3.1.1 Definition
Notation 3.1. K denotes a field, K[[x]] the ring of formal power series with coefficients in K, and K((x)) the field of fractions of K[[x]], that is, the field of formal Laurent series. Observe that K((x)) is an algebra over K(x).

Definition 3.2. A formal power series A ∈ K[[x]] is called differentially finite (abbreviated D-finite) when its derivatives A, A′, A″, ... span a finite-dimensional vector subspace of K((x)), regarded as a vector space over K(x).

In other words, there exist polynomials p0(x), ..., pr(x) in K[x] such that A satisfies a linear differential equation of the form

p0(x) A^{(r)}(x) + ··· + p_{r−1}(x) A′(x) + pr(x) A(x) = 0.
Example 3.3. Rational functions are D-finite, and so are the classical exp, ln, sin, cos, sinh, cosh,arcsin(h), arccos(h), arctan(h) as well as many special functions like Bessel Jν, Iν, Kν, Yν, Airy Aiand Bi, the integral sine (Si), cosine (Ci) and exponential (Ei) and many more.
Our point of view on these objects is that differential equations will serve as a data structureto work with D-finite series.
Definition 3.4. A sequence (an) of elements of K is called polynomially recursive (abbreviatedP-recursive) when its shifts (an), (an+1), ... span a finite-dimensional vector space over K(n).
Translation. A sequence (an) is P-recursive when it satisfies a recurrence relation of the form

q0(n) a_{n+ℓ} + ··· + qℓ(n) an = 0, n ≥ 0,

with polynomial coefficients q0, ..., qℓ.
3.1.2 Translation
Theorem 3.5. A formal power series is D-finite if and only if its sequence of coefficients is P-recursive.
Proof. We have the following dictionary (actually a ring morphism in a suitable setting):

f(x) ↔ (fn),
α f(x) ↔ (α fn),
x f(x) ↔ (f_{n−1}),
x f′(x) ↔ (n fn).

By combining these rules, we can translate any monomial xⁱ f^{(j)}(x) (resp. nⁱ f_{n+j}), and hence any linear differential equation/recurrence. □
Example 3.6. The differential equation y′ = y that defines λ exp translates into (n + 1) y_{n+1} = yn, which defines yn = λ/n!.

Example 3.7. The orders of the linear recurrence and differential equation do not necessarily match. For instance, the first-order equation y′ − x^{k−1} y = 0, which defines λ exp(x^k/k), translates into the linear recurrence (n + 1) y_{n+1} − y_{n−k+1} = 0 of order k. This recurrence has a vector space of solutions of dimension k, but only a subspace of dimension 1 corresponds to the solutions of the linear differential equation. This subspace can be isolated by paying attention to the initial values during the translation. Here, the identities y1 = ··· = y_{k−1} = 0 also come out of the translation.
Example 3.8. Assume we want to compute the coefficient of x^1000 in

p(x) = (1 + x)^1000 (1 + x + x²)^500.

A naive way of doing it would be to expand the polynomial. However, observing that

p′(x)/p(x) = 1000/(1 + x) + 500 (2x + 1)/(1 + x + x²)

yields a linear differential equation (LDE) of order 1 for p, with coefficients of degree 3:
Maple 7] p:=(1+x)^1000*(1+x+x^2)^500:
deq:=numer(diff(y(x),x)/y(x)-diff(p,x)/p);

(1 + x)^999 (x² + x + 1)^499 ((x³ + 2x² + 2x + 1) (d/dx y(x)) − (2000x² + 2500x + 1500) y(x))
This equation then translates into a linear recurrence equation (LRE) of order 3 with linear coefficients:

Maple 8] gfun:-diffeqtorec({%,y(0)=1},y(x),u(n));

{(n − 2000) u(n) + (2n − 2498) u(n + 1) + (2n − 1496) u(n + 2) + (n + 3) u(n + 3), u(0) = 1, u(1) = 1500, u(2) = 1124750}

Then it suffices to unroll this recurrence. A fast way of doing so is presented in §3.4.
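The recurrence is easy to unroll in any language; here is a Python sketch (not from the notes) that checks it against a truncated expansion of the product — the degree-1000 computation is the same, only larger:

```python
# Sketch: unroll the order-3 recurrence for the coefficients of
# p = (1+x)^1000 (1+x+x^2)^500 and compare with a truncated expansion.
from fractions import Fraction

def coeffs_by_recurrence(n_max):
    u = [Fraction(1), Fraction(1500), Fraction(1124750)]
    for n in range(0, n_max - 2):
        u.append(-((n - 2000) * u[n] + (2 * n - 2498) * u[n + 1]
                   + (2 * n - 1496) * u[n + 2]) / Fraction(n + 3))
    return [int(x) for x in u]

def trunc_mul(a, b, n_max):
    """Product of coefficient lists, truncated at degree n_max."""
    c = [0] * (n_max + 1)
    for i, ai in enumerate(a[: n_max + 1]):
        for j, bj in enumerate(b[: n_max + 1 - i]):
            c[i + j] += ai * bj
    return c

def trunc_pow(a, e, n_max):
    """a^e truncated at degree n_max, by repeated squaring."""
    r = [1]
    while e:
        if e & 1:
            r = trunc_mul(r, a, n_max)
        a = trunc_mul(a, a, n_max)
        e >>= 1
    return r

n_max = 40
direct = trunc_mul(trunc_pow([1, 1], 1000, n_max),
                   trunc_pow([1, 1, 1], 500, n_max), n_max)
print(coeffs_by_recurrence(n_max) == direct)  # True
```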
3.2 Closure properties
3.2.1 Sum and product
Theorem 3.9. The set of D-finite power series in K[[x]] is a K-algebra. The set of P-recursive sequences with elements in K is a K-algebra as well.
Proof. We need to prove that both D-finite series and P-recursive sequences are stable under the operations of sum and product. All these proofs are similar, products being slightly more difficult than sums. We detail the case of products of D-finite series.
Let f, g ∈ K[[x]] be D-finite series, and let h = f · g. We know that for all i, j,

f^{(i)} ∈ Vect_{K(x)}(f, f′, f″, ..., f^{(m)}),   g^{(j)} ∈ Vect_{K(x)}(g, g′, g″, ..., g^{(ℓ)})

for some m and ℓ. Now, by Leibniz' formula,

h^{(k)} = Σ_{i=0}^{k} C(k, i) f^{(i)} g^{(k−i)} ∈ Vect_{K(x)}{ f^{(i)} g^{(j)} : 0 ≤ i ≤ m, 0 ≤ j ≤ ℓ },

i.e. the derivatives of h all lie in a finite-dimensional vector space. □
Note that these proofs are fully effective: for instance, from the differential equations for f and g, we can compute a differential equation for f g using linear algebra over K(x).
Example 3.10. A simple proof that

arcsin(x)² = Σ_{k=0}^{∞} [ k! / ((1/2)(1/2 + 1) ··· (1/2 + k)) ] x^{2k+2}/(2k + 2).

The starting point is the observation that f(x) = arcsin(x) satisfies f′ = 1/√(1 − x²), which makes it possible to define it by the linear differential equation f″ = (x/(1 − x²)) f′, plus initial conditions. From there, we get the sequence of computations:

h = f²,
h′ = 2 f f′,
h″ = 2 f′² + (2x/(1 − x²)) f f′,
h‴ = ((4x² + 2)/(1 − x²)²) f f′ + (6x/(1 − x²)) f′².

At this stage we have 4 vectors (h, h′, h″, h‴) expressed in terms of 3 generators (f², f f′, f′²). A linear dependency is therefore found by looking for the kernel of a 3 × 4 matrix. The coordinates of a generator of the kernel are the coefficients of the desired differential equation. (Note that actually in this example, since f² only occurs in h, it is sufficient to consider the three last equations.) This whole computation is easily automated.
Example 3.11. A computation-free proof that sin² + cos² = 1. Both sin and cos are defined by y″ + y = 0. If f is a solution of this equation, then as in the previous example, f² is a solution of a linear differential equation of order at most 3, all derivatives being generated by (f², f f′, f′²). Thus both sin² and cos² satisfy the same linear differential equation of order at most 3, and therefore so does their sum. Next, −1 is a solution of a linear differential equation of order 1, namely y′ = 0, and thus the sum sin² + cos² − 1 and its derivatives live in a vector space of dimension at most 4, hence it is a solution of a linear differential equation of order at most 4. Moreover, since the computations only involve differential equations with constant coefficients, the leading term of the resulting equation is a constant, so that Cauchy's theorem on differential equations applies. Thus, checking that the desired solution of this equation is exactly 0 reduces to checking 4 initial conditions, i.e., the proof is summarized by

sin²x + cos²x − 1 = O(x⁴) ⟹ sin² + cos² = 1.
Note that the actual value of the differential equation is not important here, only its order has been used.
Note also that one can simplify the argument further (and reduce the order accordingly) if one takes into account the fact that sin′ = cos. Then we consider h = f² + f′² − 1, whose derivative is h′ = 2 f f′ + 2 f′ f″ = 2 f f′ − 2 f′ f = 0 (since f″ = −f), and thus checking the value at 0 is sufficient.
3.2.2 Hadamard product
Proposition 3.12. If f(x) = Σ fn xⁿ and g(x) = Σ gn xⁿ are D-finite, then so is their Hadamard product

f ⊙ g = Σ fn gn xⁿ.
Proof. This is a combination of the previous results: since f and g are D-finite, their sequences of coefficients (fn) and (gn) are P-recursive, their product (fn gn) is P-recursive too by the previous theorem, and finally the generating series of this product is D-finite. □
Example 3.13. Mehler's identity for Hermite polynomials. The Hermite polynomials are defined by

Σ_{n≥0} Hn(x) zⁿ/n! = exp(z(2x − z)).

Mehler's identity asserts that

Σ_n Hn(x) Hn(y) zⁿ/n! = exp( 4z(xy − z(x² + y²)) / (1 − 4z²) ) / √(1 − 4z²).

This can be proved easily by noticing that the left-hand side is nothing but

exp(z(2x − z)) ⊙ exp(z(2y − z)) ⊙ Σ_{n=0}^{∞} n! zⁿ,

a Hadamard product of three factors that are clearly D-finite.
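Since both sides are analytic near z = 0, the identity can at least be sanity-checked numerically (a Python sketch, not from the notes; the Hn are generated via the recurrence H_{n+1} = 2x Hn − 2n H_{n−1}, which follows from the generating function):

```python
# Numerical sanity check of Mehler's identity for small |z|.
from math import exp, sqrt, factorial

def hermite_list(x, N):
    """H_0(x), ..., H_N(x), for the generating function exp(z(2x - z))."""
    H = [1.0, 2.0 * x]
    for n in range(1, N):
        H.append(2 * x * H[n] - 2 * n * H[n - 1])
    return H[: N + 1]

def mehler_lhs(x, y, z, N=60):
    """Truncated sum of H_n(x) H_n(y) z^n / n! (converges for |2z| < 1)."""
    Hx, Hy = hermite_list(x, N), hermite_list(y, N)
    return sum(Hx[n] * Hy[n] * z**n / factorial(n) for n in range(N + 1))

def mehler_rhs(x, y, z):
    return exp(4 * z * (x * y - z * (x * x + y * y)) / (1 - 4 * z * z)) \
        / sqrt(1 - 4 * z * z)

x, y, z = 0.3, -0.2, 0.1
print(mehler_lhs(x, y, z), mehler_rhs(x, y, z))  # the two values agree
```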
3.3 Algebraic series
Definition 3.14. A formal power series A(x) ∈ K[[x]] is algebraic if there exists a nonzero polynomial P ∈ K[x, y] such that P(x, A(x)) = 0.

Theorem 3.15. Algebraic series are D-finite.

As a simple consequence, the series expansions of algebraic functions can be computed in only linear (i.e., optimal) arithmetic complexity.
Proof. Without loss of generality, we can assume that P and Py = ∂P/∂y are coprime. From the equation P(x, A(x)) = 0, we get by differentiation

Px(x, A(x)) + Py(x, A(x)) A′(x) = 0.

We then invert Py mod P. Let U, V ∈ K(x)[y] be the cofactors in Bézout's identity, which satisfy U Py + V P = 1. Then multiplying the previous equation by U(x, A(x)) leads to

U(x, A(x)) Px(x, A(x)) + (1 − V(x, A(x)) P(x, A(x))) A′(x) = 0,

where P(x, A(x)) = 0. The factor of A′ is therefore exactly 1 (this was the aim of the inversion modulo P) and the first term is a polynomial evaluated at A(x), which is therefore equal to the evaluation of its remainder in the Euclidean division by P. Denoting by δ the degree of P with respect to y, we obtain

A′(x) = R1(x, A(x)), with deg_y R1 < δ.

Iterating this computation shows that every derivative A^{(k)}(x) can be written Rk(x, A(x)) with deg_y Rk < δ, so that all the derivatives of A lie in the K(x)-vector space generated by 1, A, ..., A^{δ−1}. □
Corollary 3.16. If f is D-finite and A is algebraic with A(0)=0, then f ◦A is D-finite.
Proof. Consider Vect_{K(x)}(f^{(i)}(A) A^{j}). □
3.4 Binary splitting
3.4.1 Fast computation of n!
Stirling's formula tells us that

log n! = n log n − n + (1/2) log n + O(1) as n → ∞.

Hence the bit size of n! is Θ(n log n) (and thus we cannot hope to compute it in less than Ω(n) bit operations). Similarly, 1!, 2!, ..., n! taken together have size Θ(n² log n).
In the naïve algorithm (compute n! by successive multiplications by k, k = 2, ..., n), the multiplication k × (k − 1)! can be done by decomposing (k − 1)! into k chunks of roughly log k bits. The cost of the multiplication is then O(k log k log log k log log log k), and the total complexity

O(n² log n log log n log log log n),

which is not too bad if we need all the values 1!, 2!, ..., n!.
If however we want to compute n! alone, we can do much better. Define

P(a, b) = (a + 1)(a + 2) ··· b.

Compute n! as

n! = P(0, n) = (1 · 2 ··· ⌈n/2⌉) · ((⌈n/2⌉ + 1) ··· n) = P(0, ⌈n/2⌉) P(⌈n/2⌉, n)

and recurse. The key observation is that P(0, ⌈n/2⌉) has size half of that of n! (by Stirling's formula), and therefore so does the second factor. Assuming for simplicity that n is a power of 2, the bit complexity can be bounded as follows:
C(0, n) = C(0, n/2) + C(n/2, n) + MZ((n/2) log n)
≤ MZ((n/2) log n) + 2 C(n/2, n)
≤ MZ((n/2) log n) + 2 MZ((n/4) log n) + 4 C(3n/4, n)
≤ ··· ≤ MZ((n/2) log n) log n = O(n log³ n log log n).

In the second line, we use the fact that the factors increase; in the third line, we iterate the inequality once and use the convexity of the multiplication function MZ (so that 2 MZ((n/4) log n) ≤ MZ((n/2) log n)); in the last one, the bound log n on the number of recursive steps.
Finally, we have obtained n! in quasi-optimal bit complexity.
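In code, the recursion is immediate (a Python sketch; Python's integers are arbitrary-precision, and the balanced split keeps the operands of each big multiplication of comparable size):

```python
# Binary splitting for P(a, b) = (a+1)(a+2)...b.
import math

def P(a, b):
    """Product (a+1)(a+2)...b, computed by splitting at the midpoint."""
    if b - a <= 4:  # small range: direct product
        r = 1
        for k in range(a + 1, b + 1):
            r *= k
        return r
    m = (a + b) // 2
    return P(a, m) * P(m, b)

print(P(0, 100) == math.factorial(100))  # True
```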
3.4.2 General case
We now consider a general recurrence

p0(n) u_{n+r} + ··· + pr(n) un = 0, n ≥ 0,   (3.1)

with given initial values u0, ..., u_{r−1}. Letting

Un = (un, u_{n+1}, ..., u_{n+r−1})ᵗ

lets us reduce the question to a first-order recurrence over vectors:

U_{n+1} = (1/p0(n)) A(n) Un,

where A(n) is the companion matrix with entries p0(n) on the superdiagonal, zeros elsewhere in its first r − 1 rows, and last row (−pr(n), −p_{r−1}(n), ..., −p1(n)). Hence we can apply the same idea and get

UN = (1/(p0(N − 1) ··· p0(0))) A(N − 1) A(N − 2) ··· A(0) U0,

whose first coordinate is uN. This “matrix factorial” can be computed the same way as the real factorial above.
Theorem 3.17. Assume that the recurrence (3.1) is nonsingular (p0 does not vanish at any nonnegative integer), that all pi have degree bounded by d and integer coefficients bounded by 2^ℓ, and that u0, ..., u_{r−1} ∈ Q have numerators and denominators bounded by 2^ℓ. Then one can compute uN in

O(r^θ N log² N log log N (d log N + ℓ))

bit operations, where r^θ is a bound on the number of arithmetic operations needed to compute the product of two r × r matrices.
Proof. Consider the norm ‖u‖ := Σ_{j=1}^{r} |uj| and the induced norm on matrices

‖M‖ := max_{1≤j≤r} Σ_{i=1}^{r} |mij|.

The hypothesis implies

‖A(k)‖ ≤ C r 2^ℓ k^d

for some C > 0. From there, portions of the matrix factorial are bounded by

‖A(λn)‖ ··· ‖A(µn)‖ ≤ (C r 2^ℓ)^{(λ−µ)n} ((λn)!/(µn)!)^d   (λ ≥ µ)

and thus have size O((λ − µ)(d n log n + ℓ n)) as n → ∞. From there the proof proceeds as in the case of n!.
(Note that the values of pi(n) for 0 ≤ n ≤ N − 1 are also needed. They can be computed within the same complexity bound even with a naïve algorithm.) □
3.5 Numerical evaluation
The idea of binary splitting leads to fast methods for the numerical evaluation of a wide varietyof constants and functions.
3.5.1 exp(1)
As a first example, consider the sequence

en = Σ_{k=0}^{n} 1/k! → e, n → ∞.

Additionally, we have 0 < e − en < 1/(n · n!) for all n, hence e − en < 10^{−N} for n = O(N/log N). Now we obtain a linear recurrence of order 2 satisfied by (en):

e_{n+1} − en = 1/(n + 1)! = (1/(n + 1)) (en − e_{n−1}).
By Theorem 3.17, it follows that e can be computed within precision 10^{−N} in O(N log² N log log N) bit operations. (This gives us a huge rational number close to e. To get a binary/decimal expansion, there remains to do a division. The inverse of the denominator can be computed efficiently by Newton's method in O(MZ(N)) bit operations.)
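Here is what binary splitting looks like for this series (a Python sketch, not from the notes): over a range (a, b], return integers P, Q with Q = (a+1)···b and P/Q = Σ_{k=a+1}^{b} 1/((a+1)···k), so that Σ_{k=0}^{n} 1/k! = 1 + P(0, n)/n!.

```python
# Binary splitting for e = sum_{k>=0} 1/k!.

def split(a, b):
    """Over (a, b]: return (P, Q) with Q = (a+1)...b and
    P/Q = sum_{k=a+1}^{b} 1/((a+1)...k)."""
    if b - a == 1:
        return 1, b
    m = (a + b) // 2
    p1, q1 = split(a, m)
    p2, q2 = split(m, b)
    # left partial sum rescaled by the right denominator, plus right sum
    return p1 * q2 + p2, q1 * q2

def e_digits(N):
    """First N decimal digits of e, as an integer (2, 7, 1, 8, ...)."""
    n, f = 0, 1
    while f <= 10 ** (N + 1):  # ensure the tail of the series is negligible
        n += 1
        f *= n
    p, q = split(0, n)         # sum_{k=0}^{n} 1/k! = 1 + p/q, with q = n!
    return (10 ** (N - 1) * (q + p)) // q

print(e_digits(10))  # 2718281828
```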
Example 3.18. All recent records in the computation of π were obtained using basically the same technique, starting from the following series due to the Chudnovskys:

1/π = (12/C^{3/2}) Σ_{n=0}^{∞} (−1)ⁿ (6n)! (A + Bn) / ((3n)! n!³ C^{3n}),

with A = 13,591,409, B = 545,140,134, C = 640,320. This series yields about 14 digits per term. That alone is not sufficient to reach a good efficiency, which is achieved by observing that the summand satisfies a linear recurrence of first order and applying binary splitting.
3.5.2 Analytic D-finite series
The fast computation of exp(1) can be generalized.
Consider y(z) solution to
a0(z) y^{(r)}(z) + ··· + ar(z) y(z) = 0,   (3.2)
where the ai are polynomials. Our aim is to evaluate y numerically given initial conditions. We first state an existence theorem, with an effective proof for later use.
Theorem 3.19. (Cauchy's theorem.) If z0 ∈ C is such that 0 ∉ a0(D(z0, R)) then, for any y0, ..., y_{r−1}, there exists a solution of Eq. (3.2) that satisfies y^{(i)}(z0) = yi for 0 ≤ i < r and that is analytic in D(z0, R).
Proof. (By the Cauchy-Kovalevskaya method of majorants.) Rewrite (3.2) as

Y′ = A Y,   Y = (y, y′, ..., y^{(r−1)})ᵗ.   (3.3)

Since 0 ∉ a0(D(z0, R)), the matrix function A is analytic in D(z0, R). This implies that it has an expansion as a power series

A(z) = Σ_{k≥0} Ak (z − z0)ᵏ,

such that for 0 < ρ < R, there exists α > 0 satisfying ‖Ak‖ ≤ α/ρ^{k+1} for all k ≥ 0.
Now consider the formal power series defined by

    Y(z_0) = \begin{pmatrix} y_0 \\ \vdots \\ y_{r−1} \end{pmatrix} \quad and \quad Y' = A Y,

that is, by

    (n+1) Y_{n+1} = \sum_{k=0}^{n} A_k Y_{n−k}.
We have

    (n+1) ‖Y_{n+1}‖ ≤ \sum_{k=0}^{n} ‖A_k‖ ‖Y_{n−k}‖ ≤ \sum_{k=0}^{n} \frac{α}{ρ^{k+1}} ‖Y_{n−k}‖.

Define (u_n) by u_0 = ‖Y_0‖ and

    (n+1) u_{n+1} = \sum_{k=0}^{n} \frac{α}{ρ^{k+1}} u_{n−k}.
Then ‖Y_n‖ ≤ u_n for all n. The generating series u(z) = \sum_n u_n z^n satisfies

    u'(z) = \frac{α}{ρ − z} u(z),

hence

    u(z) = u_0 \left(1 − \frac{z}{ρ}\right)^{−α}.
Since the coefficients of Y are dominated by those of u, which is analytic, the series Y is convergent for |z − z_0| < ρ. And since we can do this for any ρ < R, the solution is analytic in D(z_0, R). □

The majorant series also bounds the truncation error:

    \left\| \sum_{n ≥ N} Y_n (z − z_0)^n \right\| ≤ \sum_{n ≥ N} u_n |z − z_0|^n,

where

    u_n = u_0 ρ^{−n} (−1)^n \binom{−α}{n} = u_0 ρ^{−n} \binom{α + n − 1}{n},

so that

    \sum_{n ≥ N} u_n |z − z_0|^n = u_0 \binom{α + N − 1}{N} \left(\frac{|z − z_0|}{ρ}\right)^{N} \left(1 + \frac{α + N}{1 + N} \frac{|z − z_0|}{ρ} + ··· \right).
The series on the right-hand side is convergent; say its sum is bounded by M. Now, in order to ensure

    \left\| \sum_{n ≥ N} Y_n (z − z_0)^n \right\| ≤ 10^{−k},

it is sufficient to take

    k \log 10 ≤ N \log \frac{ρ}{|z − z_0|} + \log(···) + cst,

i.e.

    N ≥ \frac{k \log 10}{\log \frac{ρ}{|z − z_0|}} + cst.
Combining this bound with the binary splitting method, we obtain the following theorem.
Corollary 3.20. When y_i ∈ Q, a_i ∈ Q[z], ζ ∈ Q ∩ D(z_0, R), all with numerators and denominators bounded by 2^ℓ, then y(ζ) can be computed at precision 10^{−N} in

    O\left( N \log^2 N \log\log N \; \frac{r \log N + ℓ}{\log \frac{R}{|ζ − z_0|}} \right)

bit operations. So can y'(ζ), ..., y^{(r−1)}(ζ).
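To make the coefficient recurrence (n+1) Y_{n+1} = Σ_k A_k Y_{n−k} from the proof of Theorem 3.19 concrete, here is a small sketch (our own illustration, not from the course) in the special case where A is a constant matrix, so A_k = 0 for k ≥ 1, applied to y'' = −y with Y = (y, y'):

```python
from fractions import Fraction

def taylor_coeffs(A, Y0, N):
    """Coefficients Y_0, ..., Y_N of the series solution of Y' = A Y,
    via (n+1) Y_{n+1} = A Y_n (constant-A case of the general recurrence)."""
    coeffs = [Y0]
    for n in range(N):
        Yn = coeffs[-1]
        coeffs.append([sum(row[j] * Yn[j] for j in range(len(Yn))) / (n + 1)
                       for row in A])
    return coeffs

# y'' = -y with y(0) = 0, y'(0) = 1: the solution is sin, and the first
# components of the Y_n are the Taylor coefficients 0, 1, 0, -1/6, 0, 1/120, ...
A = [[Fraction(0), Fraction(1)], [Fraction(-1), Fraction(0)]]
Y0 = [Fraction(0), Fraction(1)]
```

Exact rational arithmetic makes the link with binary splitting visible: the coefficients are the huge fractions on which the splitting operates.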
3.5.3 Analytic continuation
Proposition 3.21. The set of solutions of Y ′=AY forms an r-dimensional vector space over C.
Proof. Consider the solutions Y^{[i]} with initial conditions all 0 except the ith coordinate, which is 1, for 1 ≤ i ≤ r. □
Definition 3.22. A fundamental matrix of Y' = A Y is a matrix W whose columns form a basis of solutions.
Clearly, such a matrix satisfies W' = A W, and any solution can be written W · C with C a constant vector.
Definition 3.23. The transition matrix between z_0 and z_1 ∈ D(z_0, R) is the matrix M(z_0 → z_1) such that W(z_1) = M(z_0 → z_1) · W(z_0).
This matrix is well defined since W(z_0) is invertible. The fundamental matrix itself has a radius of convergence, and by Cauchy's theorem the solution can be extended analytically inside this new disk. Proceeding in this manner, one constructs a path (z_0, z_1, ..., z_k) and transition matrices M(z_0 → z_1), M(z_1 → z_2), ..., M(z_{k−1} → z_k) whose product (in the right order) gives the analytic continuation of the fundamental matrix along that path. Each of these matrices can be computed efficiently by binary splitting if one takes for the z_i points with rational coordinates bounded by 2^ℓ in the notation of the previous section. With a bit more effort, the cumulated error is bounded by the product of the norms ‖M(z_i → z_{i+1})‖, which can be controlled.
3.5.4 Bit burst
The previous section shows how to compute the value of an arbitrary D-finite function at an arbitrary non-singular point given by its rational coordinates of moderate size. If the point ζ itself is given at precision 2^{−N}, then the estimate obtained with ℓ = N gives a quadratic complexity. However, this can be improved.
Proposition 3.24. For any ζ ∈ D(z_0, R), the value y(ζ) can be computed at precision 2^{−N} in O(N log^3 N log log N) binary operations.
Proof. Compute z_0, z_1, ..., z_m = ζ with

    z_i = ⌊2^{2^i} ζ⌋ \, 2^{−2^i},

i.e., z_i is the binary number made of the first 2^i bits of ζ. The idea is that the computation of y(z_{i+1}) needs more accuracy than that of y(z_i), but fewer terms of the series expansion. Thus in the complexity estimate, the term

    \frac{ℓ + r \log N}{\log \frac{R}{|z_{i+1} − z_i|}}

becomes

    \frac{2^i + r \log N}{2^i + \log R}

and their sum is

    \sum_{i=1}^{\log N} \left( \frac{2^i}{2^i + \log R} + \frac{r \log N}{2^i + \log R} \right) ≤ \left( \sum_{i=1}^{\log N} 1 \right) + \sum_{i=1}^{\log N} \frac{r \log N}{2^i} ≤ (r + 1) \log N = O(\log N).
□
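For illustration, the truncation points z_i can be generated exactly with rational arithmetic (a small sketch; `bit_burst_points` is our name for it):

```python
from fractions import Fraction
from math import floor

def bit_burst_points(zeta, m):
    """z_i = floor(2**(2**i) * zeta) / 2**(2**i): zeta truncated to its first 2**i bits."""
    points = []
    for i in range(m + 1):
        scale = 2 ** (2 ** i)
        points.append(Fraction(floor(zeta * scale), scale))
    return points
```

Successive points satisfy |z_{i+1} − z_i| ≤ 2^{−2^i}, so each step of the bit-burst scheme roughly doubles the number of bits of ζ taken into account.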
Chapter 4
Rational Approximation
4.1 Why rational approximation?
From a computer scientist's point of view, the two most natural kinds of approximations to deal with are polynomial and rational approximations. Indeed, polynomial approximations can be evaluated using only the ring operations +, −, ×, and rational approximations using +, −, ×, /. However, we will use rational fractions only when they offer a significant advantage in terms of approximation, for the following reasons.
1. Rational functions are nonlinear objects.
2. Divisions are usually slower than multiplications, especially for small precisions. (Floating-point division at machine precision is currently about ten times as slow as multiplication. At high precisions, the cost of division is not much larger than that of multiplication, thanks to Newton's method.)
The complexity/"nonlinearity" of rational functions can also be an advantage. For instance, it is hopeless to approximate a function by a polynomial in the neighborhood of a singularity, while rational functions can "swallow" poles.
Let m, n ∈ N. Denote

    R_{m,n} = \left\{ R = \frac{P}{Q} ∈ R(x) : P ∈ R_m[x], \; Q ∈ R_n[x] \setminus \{0\} \right\}.
We could additionally assume P ∧ Q = 1 and lc Q = 1 in the definition. Moreover, since we are interested in continuous functions (and thus elements of R_{m,n} with no poles on [a, b]), we can replace R_{m,n} by
    R'_{m,n} = \left\{ R = \frac{P}{Q} ∈ R(x) : P ∈ R_m[x], \; Q ∈ R_n[x] \setminus \{0\}, \; |lc Q| = 1, \; x ∈ [a, b] ⇒ Q(x) > 0 \right\}

in the statements of this chapter.
Let f ∈ C([a, b]). Observe that {‖f − R‖, R ∈ R_{m,n}} is a nonempty subset of R_+, and let

    E_{m,n}(f) = \inf_{R ∈ R_{m,n}} ‖f − R‖.

(This generalizes the notation E_n introduced for polynomials in Chapter 2.)
Let us compare the quality of approximations by polynomials and by rational functions on some examples. In the case of f = exp over [0, 1], one finds

    E_{4,4}(f) = 4.95... · 10^{−13}, \quad E_{8,0}(f) = 3.49... · 10^{−10}, \quad E_{10,0}(f) = 1.98... · 10^{−14}.

Using Horner's scheme, evaluating a polynomial of degree 8 uses 8 multiplications and 8 additions. Evaluating a deg 4 / deg 4 rational fraction uses 8 multiplications, 8 additions, plus one (expensive!) division. Rational approximation makes little sense in this case.
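The operation counts above correspond to the following evaluation schemes (a sketch with placeholder coefficients, not actual minimax coefficients):

```python
def horner(coeffs, x):
    """Evaluate sum_i coeffs[i] * x**i by Horner's scheme:
    a degree-d polynomial costs d multiplications and d additions."""
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = acc * x + c
    return acc

def eval_rational(p, q, x):
    """Evaluate P(x)/Q(x): (deg P + deg Q) multiply/add pairs plus one division."""
    return horner(p, x) / horner(q, x)
```

The single division is what tips the balance: on [0, 1] a degree-10 polynomial beats the 4/4 fraction both in accuracy and in cost.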
If f(x) = tanh(5x) over [0, 1], one gets

    E_{4,4}(f) = 3.33... · 10^{−6}, \quad E_{8,0}(f) = 2.99... · 10^{−4}, \quad E_{15,0}(f) = 2.49... · 10^{−6}, \quad E_{16,0}(f) = 5.99... · 10^{−7}.
Now consider f(x) = e^x/(2 − x) over [−1, 1]. We find

    E_{5,1}(f) = 5.11... · 10^{−6}, \quad E_{5,5}(f) = 6.13... · 10^{−13}, \quad E_{22,0}(f) = 6.43... · 10^{−13}, \quad E_{23,0}(f) = 1.72... · 10^{−13}.

Here rational functions perform much better; this is related to the presence of a pole of the function not too far from the segment we are interested in.
More examples:
• If we consider f(x) = |x| over [−1, 1], one can show

    E_{n,0}(f) ∼ \frac{β}{n} with β = 0.2801... [24], \qquad E_{n,n}(f) ∼ 8 e^{−π \sqrt{n}} [19].

• Assume f(x) = e^x on (−∞, 0]. Since f is bounded, it cannot be well approximated by nonconstant polynomials. In contrast, one can prove that

    E_{0,n}(f)^{1/n} → \frac{1}{3} [18] \quad and \quad E_{n,n}(f) ∼ 2 H^{n + 1/2}, \; H^{−1} ≈ 9.28 [1],

as n → ∞. This last example is related to a question raised in [3].
Rational approximation is useful for

• function evaluation;
• digital signal processing;
• Diophantine approximation (e.g., irrationality proofs);
• analytic continuation;
• acceleration of convergence;
• ...
4.2 Best L∞ approximations
4.2.1 Existence
Proposition 4.1. To each function f ∈ C([a, b]), there corresponds at least one best rational approximation in R_{m,n}.
Proof. By analogy with the polynomial case, we might be tempted to consider the set

    {R ∈ R_{m,n} : ‖R − f‖ ≤ 2‖f‖}.

This is a nonempty, closed and bounded set. It is not compact, though (remember that R_{m,n} is not a finite-dimensional vector space over R, unlike R_n[X]). To illustrate this, simply consider the sequence of continuous functions R_k(x) = 1/(kx + 1), x ∈ [0, 1], k ∈ N. For any k ∈ N, ‖R_k‖ ≤ 1, but the pointwise limit R = lim_{k→+∞} R_k is not continuous, since R(0) = 1 and R(x) = 0 otherwise.
Instead, let (R_h)_{h∈N} be a sequence of elements of R_{m,n} such that

    ‖R_h − f‖_∞ → E_{m,n}(f) \quad and \quad ‖R_h‖ ≤ 2‖f‖.

Write R_h = P_h/Q_h. We can assume that Q_h(x) ≠ 0 for all x ∈ [a, b] and ‖Q_h‖ = 1. We have

    ‖P_h‖ ≤ ‖Q_h‖ ‖R_h‖ ≤ 2‖f‖,

hence both (‖P_h‖) and (‖Q_h‖) are bounded sequences, now in finite-dimensional vector spaces. Let φ: N → N be a strictly increasing function such that P_{φ(h)} → P* ∈ R_m[x] and Q_{φ(h)} → Q* ∈ R_n[x]. Note that ‖Q*‖ = 1: Q* is nonzero. Set R* = P*/Q*. For all x ∈ [a, b] such that Q*(x) ≠ 0, we have

    |f(x) − R*(x)| = \lim_{h→∞} \left| f(x) − \frac{P_{φ(h)}(x)}{Q_{φ(h)}(x)} \right| ≤ E_{m,n}(f).

The same holds for the remaining x by continuity. □
4.2.2 Equioscillation and uniqueness
Let R = P/Q ∈ R_{m,n}, with P, Q ∈ R[x], P ∧ Q = 1, lc Q = 1. Write μ = deg P, ν = deg Q.
Definition 4.2. The defect of R is the integer d(R) = min(m − μ, n − ν).
Theorem 4.3. (Achieser, 1930) Let f ∈ C([a, b]). A rational function R ∈ R_{m,n} is a best approximation to f if and only if R − f equioscillates between at least m + n + 2 − d(R) extreme points. There is a unique best approximation.
Remark 4.4. There is again a Remez algorithm for computing best rational approximations, withthe same rate of convergence as in the polynomial case.
4.3 An extension of Taylor approximation: Padé approximation
4.3.1 Introduction
Let K be a field, and let f ∈ K[[x]]. For m ∈ N, the degree-m Taylor approximant of f is the unique p_m ∈ K_m[x] such that

    f(x) − p_m(x) = 0 \mod x^{m+1}.

If now f is C^{m+1} in the neighborhood of 0 (instead of f being a formal series), the analogous condition is f(x) − p_m(x) = O(x^{m+1}).
To extend this to rational functions, given f ∈ K[[x]], we would like to determine R = P/Q (P ∈ K_m[x], Q ∈ K_n[x]) such that

    f(x) − R(x) = 0 \mod x^{m+n+1}.    (4.1)

Here again, we may also consider f ∈ C^{m+n+1}, in which case we ask that f(x) − R(x) = O(x^{m+n+1}).
In contrast with the case of Taylor approximation, it is not always possible to satisfy (4.1).
Example 4.5. Consider m = n = 1 and f(x) = 1 + x^2 + x^4 + ···. If R is a solution, then

    R(x) = 1 + x^2 \mod x^3.    (4.2)

Write R(x) = (ax + b)/(cx + d), where we can assume that ad − bc ≠ 0 (otherwise R ∈ K); note that d ≠ 0 since R(0) = 1. Then

    R'(0) = \frac{ad − bc}{d^2} ≠ 0,

which contradicts (4.2), since (4.2) forces R'(0) = 0.
If we consider instead the problem of finding P ∈ K_m[x] and Q ∈ K_n[x] such that

    Q(x) f(x) − P(x) = 0 \mod x^{m+n+1},

it always has a nontrivial solution: think of it as a linear algebra problem; the homogeneous linear system has n + 1 + m + 1 = n + m + 2 unknowns, namely the coefficients of P and Q, and n + m + 1 equations. Actually this linear system is given by a so-called Toeplitz matrix of dimension (m + n + 1, n + 1). It is a "structured matrix" for which fast inversion algorithms exist, with the same costs as the ones given in Remark 4.13. Nevertheless, we favour another presentation of the problem.
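This linear-algebra point of view can be sketched directly (our own illustration, not code from the course): we normalize q_0 = 1, solve the small Toeplitz system for the remaining coefficients of Q exactly over the rationals, and read off P as the truncation of Q·f. The helper `pade` assumes the n × n system is nonsingular; the degenerate cases (as in Example 4.5) are exactly those where this assumption fails.

```python
from fractions import Fraction

def solve(mat, rhs):
    """Tiny exact Gaussian elimination (assumes the square system is nonsingular)."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [xr - M[r][col] * xc for xr, xc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def pade(f, m, n):
    """[m/n] Pade approximant of the series sum_k f[k] x^k (needs f[0..m+n]).
    Returns (P, Q) as coefficient lists with q_0 = 1."""
    # the coefficients of x^j in Q f - P must vanish for j = m+1, ..., m+n:
    mat = [[f[j - i] if j - i >= 0 else Fraction(0) for i in range(1, n + 1)]
           for j in range(m + 1, m + n + 1)]
    rhs = [-f[j] for j in range(m + 1, m + n + 1)]
    q = [Fraction(1)] + solve(mat, rhs)
    # P is then the truncation of Q f to degree m:
    p = [sum(q[i] * f[j - i] for i in range(min(j, n) + 1)) for j in range(m + 1)]
    return p, q
```

For f = exp and m = n = 2 this recovers the classical approximant (12 + 6x + x^2)/(12 − 6x + x^2).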
4.3.2 Rational fraction reconstruction
Definition 4.6. Let M, P ∈ K[x] \ {0}, with deg M = N > deg P. Given k ∈ ⟦1, N⟧, the rational reconstruction of P modulo M is the determination of a pair (R, T) ∈ K[x]^2 satisfying the conditions

P1. gcd(T, M) = 1, deg R < k, deg T ≤ N − k;
P2. R/T ≡ P \mod M.
3. If n ≥ m, this algorithm requires at most (2m + 1)(n − m + 1) + 1 arithmetic operations in K.
Proof. 1. Let (Q_1, R_1) and (Q_2, R_2) ∈ K[x]^2 be such that A = B Q_1 + R_1 and deg R_1 < deg B. By induction on i ≥ 1,

    U_i \begin{pmatrix} A \\ B \end{pmatrix} = W_i \begin{pmatrix} R_{i−1} \\ R_i \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & −Q_i \end{pmatrix} \begin{pmatrix} R_{i−1} \\ R_i \end{pmatrix} = \begin{pmatrix} R_i \\ R_{i−1} − Q_i R_i \end{pmatrix} = \begin{pmatrix} R_i \\ R_{i+1} \end{pmatrix}.
2. A straightforward induction yields the first equality which, combined with 1, implies S_i A + T_i B = R_i for 0 ≤ i ≤ ℓ + 1.
3. We have, by definition, U_i = W_i ··· W_1 U_0. It follows that det(U_i) = det(W_i) ··· det(W_1) det(U_0). Since det U_i = S_i T_{i+1} − S_{i+1} T_i, det(W_j) = −1 for all j and det U_0 = 1, we obtain S_i T_{i+1} − S_{i+1} T_i = (−1)^i.
4. Let i ∈ {0, ..., ℓ}. We deduce from 1 that

    \begin{pmatrix} R_ℓ \\ 0 \end{pmatrix} = W_ℓ ··· W_{i+1} U_i \begin{pmatrix} A \\ B \end{pmatrix} = W_ℓ ··· W_{i+1} \begin{pmatrix} R_i \\ R_{i+1} \end{pmatrix}.

It follows that R_ℓ is a linear combination over K[x] of R_i and R_{i+1}. Therefore gcd(R_i, R_{i+1}) divides R_ℓ. Moreover, det W_i = −1: the matrix W_i is invertible with inverse

    W_i^{−1} = \begin{pmatrix} Q_i & 1 \\ 1 & 0 \end{pmatrix}.

Hence

    \begin{pmatrix} R_i \\ R_{i+1} \end{pmatrix} = W_{i+1}^{−1} ··· W_ℓ^{−1} \begin{pmatrix} R_ℓ \\ 0 \end{pmatrix},

which implies that R_ℓ divides R_i and R_{i+1}. This shows that R_ℓ is a greatest common divisor of R_i and R_{i+1} and gcd(R_i, R_{i+1}) = R_ℓ/lc(R_ℓ). This is true in particular for i = 0. □
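The sequences (R_i, S_i, T_i) and the invariant S_i A + T_i B = R_i can be sketched on dense polynomials (our own illustration; coefficients are stored low degree first, and the zero polynomial is the empty list):

```python
from fractions import Fraction

def trim(p):
    """Drop trailing (leading-degree) zero coefficients; [] represents 0."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])

def psub(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)
                 for i in range(n)])

def pmul(a, b):
    if not a or not b:
        return []
    res = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] += x * y
    return trim(res)

def pdiv(a, b):
    """Euclidean division: a = b*q + r with deg r < deg b."""
    r, b = trim(a), trim(b)
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        d = len(r) - len(b)
        c = r[-1] / b[-1]
        q[d] = c
        r = psub(r, pmul([Fraction(0)] * d + [c], b))
    return trim(q), r

def ext_euclid(A, B):
    """Rows (R_i, S_i, T_i) of the extended Euclidean scheme, with the
    invariant S_i A + T_i B = R_i; the last nonzero R_i is a gcd of A and B."""
    rows = [(trim(A), [Fraction(1)], []), (trim(B), [], [Fraction(1)])]
    while rows[-1][0]:
        (r0, s0, t0), (r1, s1, t1) = rows[-2], rows[-1]
        qi, r2 = pdiv(r0, r1)
        rows.append((r2, psub(s0, pmul(qi, s1)), psub(t0, pmul(qi, t1))))
    return rows
```

Running it on A = x^4 − 1 and B = x^3 − 1 gives the gcd x − 1 (up to a constant) and cofactor rows satisfying the invariant at every step.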
Proposition 4.10. Assume deg A > deg B. Then

    \deg S_i = \sum_{2 ≤ j < i} \deg Q_j \quad and \quad \deg T_i = \sum_{1 ≤ j < i} \deg Q_j

for all i = 3, ..., ℓ + 1.

Proof. We have deg R_i > deg R_{i+1} for 1 ≤ i ≤ ℓ. It follows that deg Q_1 > 0 and, for 2 ≤ i ≤ ℓ, deg Q_i > 0, since Q_j is the quotient of the division of R_{j−1} by R_j for j = 1, ..., ℓ. Therefore we have, for 1 ≤ i ≤ ℓ,

    \deg(Q_i R_i) = \deg Q_i + \deg R_i > \deg R_i > \deg R_{i+1},

hence deg R_{i−1} = deg(Q_i R_i + R_{i+1}) = deg(Q_i R_i), i.e. deg Q_i = deg R_{i−1} − deg R_i for 1 ≤ i ≤ ℓ. Since S_{i+1} = S_{i−1} − Q_i S_i and deg S_{i−1} < deg(Q_i S_i), an induction gives

    \deg S_{i+1} > \deg S_i \quad and \quad \deg S_{i+1} = \deg(Q_i S_i) = \deg Q_i + \deg S_i = \sum_{2 ≤ j ≤ i} \deg Q_j.

Similarly, deg T_i > deg T_{i−1} and deg T_i = \sum_{1 ≤ j < i} \deg Q_j for all i = 3, ..., ℓ + 1. □
Proposition 4.11. The cost of the extended Euclidean algorithm is O(mn) operations in K.
Proof. If deg B > deg A, the first step consists of swapping A and B, at no arithmetic cost. We assume deg B ≤ deg A. From Proposition 4.8, we know that the Euclidean division of a polynomial P by a polynomial Q requires at most (2 deg Q + 1)(deg P − deg Q + 1) + 1 arithmetic operations in K. The cost of the Euclidean algorithm (i.e. the computation of the sequences (Q_i)_{1 ≤ i ≤ ℓ} and (R_i)_{0 ≤ i ≤ ℓ+1}) is therefore upper bounded by the sum

    \sum_{i=1}^{ℓ} \left( (2 \deg R_i + 1)(\deg R_{i−1} − \deg R_i) + 1 \right).

The degree of each R_i is less than or equal to deg B = m for i ≥ 1. As the deg R_i are nonincreasing for i ≥ 1, the cost is then upper bounded by (2m + 1) \sum_{i=1}^{ℓ} (deg R_{i−1} − deg R_i) + ℓ = (2m + 1)(deg R_0 − deg R_ℓ) + ℓ. The number ℓ is less than or equal to m + 1 since R_1 = B and deg R_{i+1} < deg R_i for i ≥ 1. This yields a cost upper bounded by (2m + 1)(n − deg R_ℓ) + m + 1 ≤ 2mn + n + m + 1 ≤ 5mn as soon as min(n, m) ≥ 1.
The computation of Si+1= Si−1−Qi Si requires