Multivariate Approximation and Matrix Calculus
Mathematical Modeling and Simulation; Module 2: Matrix Calculus and Optimization
Introduction

A Taylor series represents a function as an infinite sum of terms calculated from the values of the function's derivatives at a single point. Any finite number of initial terms of the Taylor series of a function is called a Taylor polynomial. In single-variable calculus, the Taylor polynomial of degree n is used to approximate an (n+1)-times differentiable function, and the error of the approximation can be estimated by the (n+1)-th term of the Taylor series. By introducing vector and matrix calculus notation, we can express the same idea for multivariate functions and vector functions.
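For reference, the degree-$n$ Taylor polynomial of $f$ about $x = a$ is

$$T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x-a)^k,$$

and the error of the approximation is governed by the remainder term $R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-a)^{n+1}$ for some $\xi$ between $a$ and $x$.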
Applications

Because all numbers that can be represented with finitely many digits are rational, the numerical computation of an irrational function at a particular point is almost always an approximation. The first-order and second-order Taylor polynomials are most frequently selected as the rational functions used to approximate irrational functions; in calculus this idea is called linear and quadratic approximation, respectively. In addition, the quadratic approximation is used in optimization because a local maximum or minimum occurs at a critical point, where the second term (first derivatives) of the Taylor polynomial is zero and the third term (second derivatives) is positive or negative definite. To obtain the first- or second-order Taylor polynomial, we compute the coefficients of the Taylor series by calculating the first and second derivatives of the original function. When we move toward more advanced applications (temperature in 4-dimensional spatio-temporal space, or the vector field of moving hurricane centers), we need multivariate (vector) functions instead of single-variable functions. For linear and quadratic approximation we still use the idea of first- and second-order Taylor polynomials. However, we must first generalize the concepts of the first and second derivatives to the multivariate context to obtain the coefficients of the Taylor polynomials. Then we can form the multivariate Taylor polynomial to approximate an irrational multivariate function.
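Concretely, for a scalar-valued function $f:\mathbb{R}^n \to \mathbb{R}$ near a point $\mathbf{a}$, these two approximations take the form

$$f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a})^{T}(\mathbf{x}-\mathbf{a}) \quad \text{(linear)},$$

$$f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a})^{T}(\mathbf{x}-\mathbf{a}) + \tfrac{1}{2}(\mathbf{x}-\mathbf{a})^{T} H_f(\mathbf{a})\,(\mathbf{x}-\mathbf{a}) \quad \text{(quadratic)},$$

where $\nabla f$ is the gradient vector and $H_f$ is the Hessian matrix of second partial derivatives, both of which are developed below.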
Goal and Objectives
We will extend the concepts of the first and second derivatives to the context of multivariate functions and apply these concepts to obtain the first- and second-order Taylor polynomials for multivariate functions. Our objectives are to learn the following concepts and their associated formulas (a short MATLAB preview of objective 4 follows the list):
1. Gradient vectors and matrix calculus
2. Linear approximation of multivariate functions
3. The quadratic Taylor formula for multivariate functions
4. Using MATLAB to compute Taylor series
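As a preview of objective 4, here is a minimal sketch (assuming MATLAB's Symbolic Math Toolbox; the function and expansion point are chosen only for illustration):

% Degree-2 (quadratic) Taylor polynomial of sin(x) about x = pi/4.
% 'Order',3 truncates at O((x-a)^3), i.e., keeps terms up to degree 2.
syms x
T2 = taylor(sin(x), x, 'ExpansionPoint', pi/4, 'Order', 3)
double(subs(T2, x, 0.8))   % quadratic estimate of sin(0.8)
sin(0.8)                   % compare with the built-in value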
Reflection Questions

Historically, mathematicians had to spend years calculating value tables of special functions such as the Bessel and Legendre functions. Nowadays, it takes a trivial click in MATLAB to estimate the value of any known function at any particular point, so it may seem unnecessary to learn approximation techniques. But consider these questions:
1. How do you render a smooth surface to graph a multivariate function?
2. Why do we only need to consider the first and second derivatives to find optimal solutions in most applications?
3. What does an IMU (Inertial Measurement Unit) in a robotic navigation system need to sense in order to estimate its own position?
Use built-in functions to define functions and take derivatives of different orders

Differentiation of a symbolic expression is performed by means of the function diff. For instance, let's find the derivative of f(x) = sin(e^x).

>> syms x
>> f=sin(exp(x))
f = sin(exp(x))
>> diff(f)
ans = cos(exp(x))*exp(x)

The nth derivative of f is diff(f,n).

>> diff(f,2)
ans = -sin(exp(x))*exp(x)^2+cos(exp(x))*exp(x)

Define multivariate functions; partial derivatives of first and higher order

To compute the partial derivative of an expression with respect to some variable, we specify that variable as an additional argument in diff. Let f(x,y) = x^3*y^4 + y*sin(x).

>> syms x y
>> f=x^3*y^4+y*sin(x)
f = x^3*y^4+y*sin(x)

First we compute ∂f/∂x.

>> diff(f,x)
ans = 3*x^2*y^4+y*cos(x)

Next we compute ∂f/∂y.

>> diff(f,y)
ans = 4*x^3*y^3+sin(x)

Finally we compute ∂³f/∂x³.

>> diff(f,x,3)
ans = 6*y^4-y*cos(x)
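Mixed partial derivatives follow the same pattern: diff accepts several variables in sequence. A minimal sketch, continuing the example above:

% Mixed second partial ∂²f/∂y∂x of f = x^3*y^4 + y*sin(x)
syms x y
f = x^3*y^4 + y*sin(x);
diff(f, x, y)     % differentiate with respect to x, then y
% ans = 12*x^2*y^3 + cos(x)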
[Figure: "Approximation using Taylor series" — a plot of f and its Taylor expansions about x = 2.5, showing the constant, linear, and quadratic approximations; the fifth-order expansion is indistinguishable from the original curve. Horizontal axis: x from 0 to 4; vertical axis: roughly 12 to 22.]
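The exact function used in the original figure is not recoverable from this copy, but the following sketch (assuming the Symbolic Math Toolbox, with a stand-in function f chosen only for illustration) shows how such a comparison plot is generated:

% Plot a function together with its Taylor expansions about x0 = 2.5.
% f here is a hypothetical stand-in for the one in the original figure.
syms x
f  = x^2 + 10*sin(x);
x0 = 2.5;
figure; hold on
fplot(f, [0 4], 'k', 'LineWidth', 1.5)   % original function
for n = [1 2 3 6]   % 'Order' n keeps terms up to degree n-1:
    % n=1 constant, n=2 linear, n=3 quadratic, n=6 fifth-order
    fplot(taylor(f, x, 'ExpansionPoint', x0, 'Order', n), [0 4])
end
legend('original','constant','linear','quadratic','fifth-order')
title('Approximation using Taylor series'), xlabel('x'), hold off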
Define a vector multivariate function and calculate its Jacobian matrix

The Jacobian matrix of a function f: R^n -> R^m can be found directly using the jacobian function. For example, let f: R^2 -> R^3 be defined by f(x,y) = (sin(xy), x^2 + y^2, 3x - 2y).

>> syms x y
>> f=[sin(x*y); x^2+y^2; 3*x-2*y];
>> Jf=jacobian(f)
Jf =
[ y*cos(x*y), x*cos(x*y)]
[        2*x,        2*y]
[          3,         -2]

The Jacobian of a linear transformation is the coefficient matrix of the transformation. To illustrate, let T be the linear transformation defined by the matrix A below.
>> syms x1 x2 x3 x4
>> A=[11 -3 14 7; 5 7 9 2; 8 12 -6 3];
>> T=A*[x1; x2; x3; x4]
T =
[ 11*x1-3*x2+14*x3+7*x4]
[  5*x1+7*x2+9*x3+2*x4]
[ 8*x1+12*x2-6*x3+3*x4]

Now let's find the Jacobian of T.

>> JT=jacobian(T)
JT =
[ 11, -3, 14,  7]
[  5,  7,  9,  2]
[  8, 12, -6,  3]

The Jacobian of T is precisely A.
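This is no accident: if $T(\mathbf{x}) = A\mathbf{x}$, then each component is $T_i = \sum_j a_{ij}x_j$, so $\partial T_i/\partial x_j = a_{ij}$, and the Jacobian of any linear transformation is its coefficient matrix.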
Next suppose f: R^n -> R is a scalar-valued function. Then its Jacobian is just its gradient. (Well, almost: strictly speaking, they are transposes of one another, since the Jacobian is a row vector and the gradient is a column vector.) For example, let f(x,y) = (4x^2 - 1)e^(-x^2-y^2).

>> syms x y
>> f=(4*x^2-1)*exp(-x^2-y^2);
>> gradf=jacobian(f)
gradf =
[ -2*x*exp(-x^2-y^2)*(4*x^2-5), -2*y*exp(-x^2-y^2)*(4*x^2-1)]

Find critical points

Next we use solve to find the critical points of f.
>> S=solve(gradf(1),gradf(2));
>> [S.x S.y]
ans =
[            0, 0]
[  1/2*5^(1/2), 0]
[ -1/2*5^(1/2), 0]

Thus the critical points are (0,0), (√5/2, 0), and (−√5/2, 0). As a check: both gradient components contain the factor e^(-x^2-y^2), which never vanishes, so the first component forces x ∈ {0, ±√5/2}; since 4x^2 − 1 ≠ 0 at each of these values, the second component then forces y = 0.
The Hessian matrix is the Jacobian of the Jacobian matrix

The Hessian of a scalar-valued function f: R^n -> R is the n×n matrix of second-order partial derivatives of f. In MATLAB we can obtain the Hessian of f by computing the Jacobian of the Jacobian of f. Consider once again the function f(x,y) = (4x^2 - 1)e^(-x^2-y^2).
>> syms x y real
>> Hf=jacobian(jacobian(f));
>> Hf=simple(Hf)
Hf =
[ 2*exp(-x^2-y^2)*(2*x+1)*(2*x-1)*(2*x^2-5),       4*x*y*exp(-x^2-y^2)*(-5+4*x^2)]
[       4*x*y*exp(-x^2-y^2)*(-5+4*x^2), 2*exp(-x^2-y^2)*(-1+2*y^2)*(2*x+1)*(2*x-1)]

Calculate the determinant to find maximum or minimum

We can now use the Second Derivative Test to determine the type of each critical point of f found above.

>> subs(Hf,{x,y},{0,0})
ans =
    10     0
     0     2
>> subs(Hf,{x,y},{1/2*5^(1/2),0})
ans =
   -5.7301         0
         0   -2.2920
>> subs(Hf,{x,y},{-1/2*5^(1/2),0})
ans =
   -5.7301         0
         0   -2.2920

Each of these Hessians is diagonal, so its eigenvalues are the diagonal entries: at (0,0) both are positive (the Hessian is positive definite), while at (±√5/2, 0) both are negative (negative definite). Thus f has a local minimum at (0,0) and local maxima at the other two critical points. Evaluating f at the critical points gives the maximum and minimum values of f.

>> subs(f,{x,y},{0,0})
ans = -1
>> subs(f,{x,y},{'1/2*5^(1/2)',0})
ans = 4*exp(-5/4)
>> subs(f,{x,y},{'-1/2*5^(1/2)',0})
ans = 4*exp(-5/4)
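A minimal sketch (assuming the Symbolic Math Toolbox) that automates this definiteness check with eig; in two variables the determinant test is equivalent, but eigenvalues generalize to any number of variables:

% Classify each critical point of f by the eigenvalues of its Hessian.
syms x y real
f  = (4*x^2-1)*exp(-x^2-y^2);
Hf = jacobian(jacobian(f));                 % Hessian = Jacobian of the Jacobian
eig(double(subs(Hf,{x,y},{0,0})))           % both > 0 -> local minimum
eig(double(subs(Hf,{x,y},{sqrt(5)/2,0})))   % both < 0 -> local maximum
eig(double(subs(Hf,{x,y},{-sqrt(5)/2,0})))  % both < 0 -> local maximum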
Thus the minimum value of f is f(0,0) = −1 and the maximum value is f(±√5/2, 0) = 4e^(-5/4) ≈ 1.1460. The graph of f is shown in Figure 1.
Review Exercises
1. Use your scientific calculator to find the answers.
(a) Find the linear approximation of … at the point …, and use it to estimate ….
(b) Find the quadratic approximation of … at the point …, and use it to estimate ….
(c) Use your calculator to compute the "accurate" answer and compare the relative accuracy of the two approximations above, where relative accuracy is defined as the ratio of the error to the accurate answer.
2. Use MATLAB commands to find the answers to Problem 1 (see the sketch below).
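Since the function and point from Problem 1 are garbled in this copy, the sketch below uses a hypothetical stand-in, estimating 26^(1/3) by expanding f(x) = x^(1/3) about a = 27 (assuming the Symbolic Math Toolbox):

% Linear and quadratic Taylor approximations of a hypothetical f.
syms x
f  = x^(1/3);  a = 27;  x0 = 26;
T1 = taylor(f, x, 'ExpansionPoint', a, 'Order', 2);  % linear
T2 = taylor(f, x, 'ExpansionPoint', a, 'Order', 3);  % quadratic
e1 = double(subs(T1, x, x0))      % linear estimate
e2 = double(subs(T2, x, x0))      % quadratic estimate
ex = nthroot(x0, 3)               % "accurate" answer
[abs(e1-ex)/ex, abs(e2-ex)/ex]    % relative errors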
Answers to Self-Check Exercises
1. [Answer garbled in the source.]
2. The Jacobian matrix is the matrix consisting of the three gradients of the vector function above. [The matrix entries are garbled in the source.]
3.–5. [The worked solutions are garbled in the source; only the numerical comparison below survives.]
The "accurate" answer from MATLAB is 14.9403. The absolute error of the linear approximation is 14.9690 − 14.9403 = 0.0287, a relative error of 0.19%. The absolute error of the quadratic approximation is 14.9649 − 14.9403 = 0.0246, a relative error of 0.16%.

Remark: In this example, the quadratic approximation is only slightly better than the linear approximation. Try both estimates for … and discuss the differences.