Conjugate Gradient algorithm
• Need: A symmetric positive definite;
• Cost: 1 matrix-vector product per step;
• Storage: fixed, independent of number of steps.
The CG method minimizes the $A$-norm of the error:
\[
x_k = \arg\min_{x \in \mathcal{K}_k(A,b)} \|x - x_*\|_A^2,
\]
where $x_*$ is the true solution and $\|z\|_A^2 = z^{\mathsf T} A z$.
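Note that this minimizer can be computed although $x_*$ is unknown (a short expansion, added here for clarity): since $A x_* = b$ and $A$ is symmetric,
\[
\|x - x_*\|_A^2 = x^{\mathsf T} A x - 2\, x^{\mathsf T} A x_* + \|x_*\|_A^2
= x^{\mathsf T} A x - 2\, x^{\mathsf T} b + \text{const},
\]
so minimizing the $A$-norm of the error over $\mathcal{K}_k(A,b)$ amounts to minimizing $\frac{1}{2} x^{\mathsf T} A x - x^{\mathsf T} b$ there.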
Krylov subspaces
The $k$th Krylov subspace associated with the matrix $A$ and the vector $b$ is
\[
\mathcal{K}_k(A, b) = \mathrm{span}\{b, Ab, \ldots, A^{k-1}b\}.
\]
Iterative methods which seek the solution in a Krylov subspace are called Krylov subspace iterative methods.
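As a concrete illustration (not part of the original notes; the QR step is an added numerical safeguard), a minimal NumPy sketch that builds an orthonormal basis of $\mathcal{K}_k(A,b)$:

```python
import numpy as np

def krylov_basis(A, b, k):
    """Orthonormal basis of K_k(A, b) = span{b, A b, ..., A^{k-1} b}."""
    V = np.empty((b.size, k))
    v = b
    for j in range(k):
        V[:, j] = v
        v = A @ v              # next Krylov vector
    Q, _ = np.linalg.qr(V)     # orthogonalize: the raw Krylov vectors
    return Q                   # quickly become nearly linearly dependent
```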
At each step of CG, minimize
\[
\alpha \mapsto \|x_{k-1} + \alpha p_{k-1} - x_*\|_A^2.
\]
Solution:
\[
\alpha_k = \frac{\|r_{k-1}\|^2}{p_{k-1}^{\mathsf T} A p_{k-1}}.
\]
New update:
\[
x_k = x_{k-1} + \alpha_k p_{k-1}.
\]
The search directions start from
\[
p_0 = r_0 = b - Ax_0,
\]
and subsequent directions are chosen $A$-conjugate to the previous ones:
\[
p_k^{\mathsf T} A p_j = 0, \quad 0 \le j \le k - 1.
\]
They are found by writing
\[
p_k = r_k + \beta_k p_{k-1}, \qquad r_k = b - Ax_k, \qquad
\beta_k = \frac{\|r_k\|^2}{\|r_{k-1}\|^2}.
\]
Algorithm (CG).
Initialize: $x_0 = 0$; $r_0 = b - Ax_0$; $p_0 = r_0$;
for $k = 1, 2, \ldots$ until the stopping criterion is satisfied
    $\alpha_k = \|r_{k-1}\|^2 / (p_{k-1}^{\mathsf T} A p_{k-1})$;
    $x_k = x_{k-1} + \alpha_k p_{k-1}$;
    $r_k = r_{k-1} - \alpha_k A p_{k-1}$;
    $\beta_k = \|r_k\|^2 / \|r_{k-1}\|^2$;
    $p_k = r_k + \beta_k p_{k-1}$;
end
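A direct NumPy transcription of this pseudocode, offered as a sketch; the relative-residual tolerance is an illustrative stopping criterion, since the slides leave the criterion open:

```python
import numpy as np

def cg(A, b, tol=1e-8, max_iter=None):
    """Conjugate Gradient for symmetric positive definite A, with x0 = 0."""
    max_iter = b.size if max_iter is None else max_iter
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float)              # r0 = b - A x0 = b, since x0 = 0
    p = r.copy()
    rho = r @ r                      # ||r_{k-1}||^2
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rho / (p @ Ap)       # alpha_k
        x += alpha * p               # x_k = x_{k-1} + alpha_k p_{k-1}
        r -= alpha * Ap              # r_k = r_{k-1} - alpha_k A p_{k-1}
        rho_new = r @ r
        if np.sqrt(rho_new) <= tol * np.linalg.norm(b):
            break                    # illustrative stopping criterion
        p = r + (rho_new / rho) * p  # p_k = r_k + beta_k p_{k-1}
        rho = rho_new
    return x
```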
CGLS Method
Conjugate Gradient method for Least Squares (CGLS)
• Need: A can be rectangular (non-square);
• Cost: 2 matrix-vector products per step (one with $A$, one with $A^{\mathsf T}$);
• Storage: fixed, independent of number of steps.
Mathematically equivalent to applying CG to the normal equations
\[
A^{\mathsf T} A x = A^{\mathsf T} b,
\]
without actually forming them.
CGLS minimization problem
The $k$th iterate solves the minimization problem
\[
x_k = \arg\min_{x \in \mathcal{K}_k(A^{\mathsf T}A,\,A^{\mathsf T}b)} \|b - Ax\|.
\]
Equivalently, the $k$th iterate $x_k$ of the CGLS method (with $x_0 = 0$) is characterized by
\[
\Phi(x_k) = \min_{x \in \mathcal{K}_k(A^{\mathsf T}A,\,A^{\mathsf T}b)} \Phi(x),
\quad \text{where} \quad
\Phi(x) := \frac{1}{2}\, x^{\mathsf T} A^{\mathsf T} A x - x^{\mathsf T} A^{\mathsf T} b.
\]
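The two characterizations agree by a one-line expansion (added here for clarity):
\[
\|b - Ax\|^2 = \|b\|^2 - 2\, x^{\mathsf T} A^{\mathsf T} b + x^{\mathsf T} A^{\mathsf T} A x
= \|b\|^2 + 2\,\Phi(x),
\]
so minimizing $\Phi$ over $\mathcal{K}_k(A^{\mathsf T}A, A^{\mathsf T}b)$ is the same as minimizing $\|b - Ax\|$ there.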
Determination of the minimizer
Perform sequential line searches along $A^{\mathsf T}A$-conjugate directions $p_0, p_1, \ldots, p_{k-1}$ that span $\mathcal{K}_k(A^{\mathsf T}A, A^{\mathsf T}b)$.
Determine $x_k$ from $x_{k-1}$ and $p_{k-1}$ according to
\[
x_k := x_{k-1} + \alpha_k p_{k-1},
\]
where $\alpha_k \in \mathbb{R}$ solves
\[
\min_{\alpha \in \mathbb{R}} \Phi(x_{k-1} + \alpha p_{k-1}).
\]
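Carrying out this one-dimensional minimization is a standard calculation, spelled out here for completeness. Since $\Phi$ is quadratic, setting the derivative with respect to $\alpha$ to zero gives
\[
\alpha_k = \frac{p_{k-1}^{\mathsf T} r_{k-1}}{\|A p_{k-1}\|^2}
= \frac{\|r_{k-1}\|^2}{\|A p_{k-1}\|^2},
\]
where the second equality holds because the residual $r_{k-1}$ is orthogonal to the earlier search directions. This is exactly the coefficient $\alpha_k = \|r_{k-1}\|^2 / \|t_{k-1}\|^2$, with $t_{k-1} = A p_{k-1}$, used in the CGLS algorithm below.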
Residual Error
Introduce the residual error associated with $x_k$:
\[
r_k := A^{\mathsf T} b - A^{\mathsf T} A x_k.
\]
Then set
\[
p_k := r_k + \beta_k p_{k-1}.
\]
Choose $\beta_k$ so that $p_k$ is $A^{\mathsf T}A$-conjugate to the previous search directions:
\[
p_k^{\mathsf T} A^{\mathsf T} A p_j = 0, \quad 0 \le j \le k - 1.
\]
Discrepancy
The discrepancy associated with $x_k$ is
\[
d_k = b - A x_k.
\]
Then
\[
r_k = A^{\mathsf T} d_k.
\]
It was shown by Hestenes and Stiefel that
\[
\|d_{k+1}\| \le \|d_k\|, \qquad \|x_{k+1}\| \ge \|x_k\|:
\]
the discrepancy norm decreases monotonically while the norm of the iterate grows, which is what makes the stopping rules discussed below well defined.
Algorithm (CGLS).
Initialize: $x_0 := 0$; $d_0 = b$; $r_0 = A^{\mathsf T} b$; $p_0 = r_0$; $t_0 = A p_0$;
for $k = 1, 2, \ldots$ until the stopping criterion is satisfied
    $\alpha_k = \|r_{k-1}\|^2 / \|t_{k-1}\|^2$;
    $x_k = x_{k-1} + \alpha_k p_{k-1}$;
    $d_k = d_{k-1} - \alpha_k t_{k-1}$;
    $r_k = A^{\mathsf T} d_k$;
    $\beta_k = \|r_k\|^2 / \|r_{k-1}\|^2$;
    $p_k = r_k + \beta_k p_{k-1}$;
    $t_k = A p_k$;
end
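The same pseudocode in NumPy, as a sketch; note that only products with $A$ and $A^{\mathsf T}$ appear, so $A^{\mathsf T}A$ is never formed:

```python
import numpy as np

def cgls(A, b, max_iter=None):
    """CGLS: one product with A and one with A.T per step."""
    m, n = A.shape
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    d = b.astype(float)          # discrepancy d0 = b
    r = A.T @ d                  # r0 = A^T b
    p = r.copy()
    t = A @ p                    # t0 = A p0
    rho = r @ r
    for _ in range(max_iter):
        alpha = rho / (t @ t)    # alpha_k = ||r_{k-1}||^2 / ||t_{k-1}||^2
        x += alpha * p
        d -= alpha * t           # d_k = d_{k-1} - alpha_k t_{k-1}
        r = A.T @ d
        rho_new = r @ r
        p = r + (rho_new / rho) * p
        t = A @ p
        rho = rho_new
    return x
```

In exact arithmetic this generates the same iterates as applying the cg sketch above to $A^{\mathsf T}A x = A^{\mathsf T}b$, while avoiding the explicit formation of $A^{\mathsf T}A$.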
Example: a Toy Problem
Take an invertible $2 \times 2$ matrix $A$ and noisy data
\[
y_j = A x_* + \varepsilon_j, \quad j = 1, 2.
\]
Preimages:
\[
x_j = A^{-1} y_j, \quad j = 1, 2.
\]
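A tiny numerical sketch of the phenomenon the figure below illustrates; the matrix and the noise level are hypothetical choices, not given on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 1.0],
              [1.0, 1.01]])       # hypothetical, nearly singular matrix
x_star = np.array([1.0, 1.0])
y_star = A @ x_star

for j in (1, 2):
    y_j = y_star + 0.01 * rng.standard_normal(2)  # small noise in the data
    x_j = np.linalg.solve(A, y_j)                 # preimage x_j = A^{-1} y_j
    print(j, x_j)  # typically far from x_star: inversion amplifies the noise
```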
[Figure: the points $x_*, x_1, x_2$ and their images $y_*, y_1, y_2$ under the forward map $A$; the preimages $x_j = A^{-1} y_j$ of the nearby noisy data lie far from $x_*$.]
Solution by iterative methods: semiconvergence.
Write
\[
B = A x_* + E, \qquad E \sim \mathcal{N}(0, \sigma^2 I),
\]
generate a sample of data vectors $b_1, b_2, \ldots, b_n$, and solve each system with CG.
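A sketch of such an experiment with a hypothetical ill-conditioned SPD matrix (the slides do not specify the problem data). The recorded error $\|x_k - x_*\|$ first decreases and then grows again as the iterates begin to fit the noise; this is the semiconvergence referred to above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ill-conditioned SPD test problem.
n = 64
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(0.8 ** np.arange(n)) @ U.T      # eigenvalues 1, 0.8, 0.8^2, ...
x_star = U @ np.ones(n)
b = A @ x_star + 1e-3 * rng.standard_normal(n)  # noisy data

# CG with x0 = 0, recording the error at every iterate.
x = np.zeros(n); r = b.copy(); p = r.copy(); rho = r @ r
errors = []
for k in range(n):
    Ap = A @ p
    alpha = rho / (p @ Ap)
    x = x + alpha * p
    r = r - alpha * Ap
    rho_new = r @ r
    p = r + (rho_new / rho) * p
    rho = rho_new
    errors.append(np.linalg.norm(x - x_star))

print(int(np.argmin(errors)) + 1)   # iteration with the smallest error
```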
When should one stop iterating?
Let
\[
b = b_* + \varepsilon = A x_* + \varepsilon.
\]
Approximate information about the noise level:
\[
\|\varepsilon\| \approx \eta,
\]
where $\eta > 0$ is known. For the true solution,
\[
\|A x_* - b\| = \|\varepsilon\| \approx \eta,
\]
so there is no reason to fit the data more accurately than the noise level. Any solution satisfying
\[
\|Ax - b\| \le \tau \eta,
\]
where $\tau \ge 1$ is a fixed safety factor, is reasonable. This is the Morozov discrepancy principle: stop the iteration at the first index $k$ for which $\|A x_k - b\| \le \tau \eta$.
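Since the CGLS iteration carries the discrepancy $d_k = b - A x_k$ along, the principle costs only one norm evaluation per step. A sketch combining the two; the safety factor $\tau = 1.2$ is an illustrative choice:

```python
import numpy as np

def cgls_morozov(A, b, eta, tau=1.2, max_iter=None):
    """CGLS stopped by the Morozov discrepancy principle:
    iterate until ||b - A x_k|| <= tau * eta, with eta ~ ||noise||."""
    m, n = A.shape
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    d = b.astype(float)                    # d0 = b
    r = A.T @ d
    p = r.copy()
    t = A @ p
    rho = r @ r
    for _ in range(max_iter):
        alpha = rho / (t @ t)
        x += alpha * p
        d -= alpha * t                     # d_k = b - A x_k
        if np.linalg.norm(d) <= tau * eta:
            break                          # discrepancy principle
        r = A.T @ d
        rho_new = r @ r
        p = r + (rho_new / rho) * p
        t = A @ p
        rho = rho_new
    return x
```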
Example: numerical differentiation
Let $f : [0,1] \to \mathbb{R}$ be a differentiable function with $f(0) = 0$.
Data: $f(t_j)$ + noise, where
\[
t_j = \frac{j}{n}, \quad j = 1, 2, \ldots, n.
\]
Problem: estimate $f'(t_j)$.
Direct numerical differentiation by, e.g., a finite difference formula does not work: the noise takes over.
Where is the inverse problem?
[Figure: the data $f(t_j)$ + noise, noise level 5%.]
[Figure: derivative estimate by finite differences; the noise takes over.]
Formulation as an inverse problem
Denote $g(t) = f'(t)$. Then
\[
f(t) = \int_0^t g(\tau)\, d\tau.
\]
Linear model:
\[
\text{Data} = y_j = f(t_j) + e_j = \int_0^{t_j} g(\tau)\, d\tau + e_j,
\]
where $e_j$ is the noise.
Discretization
Write
\[
\int_0^{t_j} g(\tau)\, d\tau \approx \frac{1}{n} \sum_{k=1}^{j} g(t_k).
\]
Denoting $g(t_k) = x_k$,
\[
y = Ax + e,
\]
where
\[
A = \frac{1}{n}
\begin{pmatrix}
1 \\
1 & 1 \\
\vdots & & \ddots \\
1 & 1 & \cdots & 1
\end{pmatrix}.
\]
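Putting the pieces together, a sketch that builds this matrix and recovers the derivative with the discrepancy-stopped CGLS given earlier (it reuses the cgls_morozov sketch above); the test function and the 5% noise level are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100
t = np.arange(1, n + 1) / n
A = np.tril(np.ones((n, n))) / n   # A = (1/n) * lower-triangular ones

# Illustrative test function with f(0) = 0.
f_true = t * np.sin(2 * np.pi * t)
g_true = np.sin(2 * np.pi * t) + 2 * np.pi * t * np.cos(2 * np.pi * t)  # f'

sigma = 0.05 * np.max(np.abs(f_true))        # 5% noise level
y = f_true + sigma * rng.standard_normal(n)

eta = sigma * np.sqrt(n)           # E||e||^2 = n sigma^2 for iid noise
x = cgls_morozov(A, y, eta)        # estimate of g(t_j) = f'(t_j)

print(np.linalg.norm(x - g_true) / np.linalg.norm(g_true))  # relative error
```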