
In regularizing an ill-posed problem, the obvious questions one needs to answer are the following:

• How can I construct such regularization (regularization operators) as discussed earlier?

• How can I select a parameter choice rule to produce convergent methods of regularization?

• How can I perform these steps in some optimal way?

These questions will be dealt with in this and the following sections. The result below provides a description of regularization operators and thus answers the first question; by this basic property, it can be shown that for any regularization there exists an a-priori parameter choice rule, and thus a convergent regularization method.

Proposition 3.6. Let $\{R_\alpha\}_{\alpha>0}$ be a family of continuous operators forming a regularization in the sense of Definition 3.1. Then there exists an a-priori parameter choice rule $\alpha$ such that $(R_\alpha, \alpha)$ is a convergent regularization for equation (41).

Proof. Let $y \in \mathcal{D}(A^\dagger)$ be arbitrary but fixed. Due to the pointwise convergence $R_\alpha \to A^\dagger$, we can find a monotone increasing function $\sigma : \mathbb{R}^+ \to \mathbb{R}^+$ with $\lim_{\varepsilon \to 0} \sigma(\varepsilon) = 0$ such that for every $\varepsilon > 0$ we have
$$\|R_{\sigma(\varepsilon)} y - A^\dagger y\| \le \frac{\varepsilon}{2}.$$

As the operator $R_{\sigma(\varepsilon)}$ is continuous for fixed $\varepsilon$, there exists $\rho(\varepsilon) > 0$ such that
$$\|R_{\sigma(\varepsilon)} z - R_{\sigma(\varepsilon)} y\| \le \frac{\varepsilon}{2} \quad \text{for all } z \in H_2 \text{ with } \|z - y\| \le \rho(\varepsilon).$$

Without loss of generality we can assume that $\rho$ is a continuous, strictly monotonically increasing function with $\lim_{\varepsilon \to 0} \rho(\varepsilon) = 0$. Then there is a strictly monotone and continuous inverse function $\rho^{-1}$ on the range $\mathrm{Ran}(\rho)$ with $\lim_{\delta \to 0} \rho^{-1}(\delta) = 0$. We now continuously extend $\rho^{-1}$ to $\mathbb{R}^+$ and define the a-priori parameter choice rule as

$$\alpha : \mathbb{R}^+ \to \mathbb{R}^+, \qquad \delta \mapsto \sigma(\rho^{-1}(\delta)).$$

Then $\lim_{\delta \to 0} \alpha(\delta) = 0$ follows. By our construction, for every $\varepsilon > 0$ there exists $\delta := \rho(\varepsilon)$ such that, with $\alpha(\delta) = \sigma(\varepsilon)$,
$$\|R_{\alpha(\delta)} y^\delta - A^\dagger y\| \le \|R_{\sigma(\varepsilon)} y^\delta - R_{\sigma(\varepsilon)} y\| + \|R_{\sigma(\varepsilon)} y - A^\dagger y\| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$
follows for all $y^\delta \in H_2$ with $\|y - y^\delta\| \le \delta$. Thus $(R_\alpha, \alpha)$ is a convergent regularization method for equation (41), and the function $\alpha$ defines an a-priori parameter choice rule.

Remark 3.7. Conversely to Proposition 3.6, if $(R_\alpha, \alpha)$ is a convergent regularization method, then we can conclude from equation (44) that
$$\lim_{\delta \to 0} R_{\alpha(\delta, y^\delta)}\, y = A^\dagger y \quad \text{for } y \in \mathcal{D}(A^\dagger),$$
and if $\alpha$ is continuous with respect to $\delta$, this implies
$$\lim_{\sigma \to 0} R_\sigma y = A^\dagger y.$$

Therefore, the correct approach to understanding the concept of regularization is pointwise convergence of the regularization operators. Furthermore, in the case of $y \notin \mathcal{D}(A^\dagger)$, as the generalised inverse is not defined for such functions, we cannot expect the regularized solutions $R_\alpha y$ of a convergent regularization to remain bounded as $\alpha \to 0$, since then $A^\dagger$ would have to be bounded. This is confirmed by the following result.

Proposition 3.8. Let $\{R_\alpha\}_{\alpha>0}$ be a regularization of $A^\dagger$ by continuous linear operators, as in Definition 3.1, and let
$$x_\alpha := R_\alpha y. \qquad (47)$$
If, moreover,
$$\sup_{\alpha > 0} \|A R_\alpha\| < \infty, \qquad (48)$$
then
$$\|x_\alpha\| \to \infty \quad \text{for } y \notin \mathcal{D}(A^\dagger). \qquad (49)$$

Proof. For $y \in \mathcal{D}(A^\dagger)$, the convergence of $x_\alpha$ in equation (47) follows from Proposition 3.6 above. We therefore only need to consider the case $y \notin \mathcal{D}(A^\dagger)$. Assume that there is a sequence $\alpha_n \to 0$ such that $\|x_{\alpha_n}\|$ is uniformly bounded. Then there is a weakly convergent subsequence, again denoted by $x_{\alpha_n}$, with some limit $x \in H_1$. As continuous linear operators are also weakly continuous, we have $A x_{\alpha_n} \rightharpoonup A x$ weakly. However, as the operators $A R_\alpha$ are uniformly bounded, we also conclude $A x_{\alpha_n} = A R_{\alpha_n} y \to Q y$, with $Q$ the orthogonal projector onto $\overline{\mathrm{Ran}(A)}$. Hence $A x = Q y$, and consequently $y \in \mathcal{D}(A^\dagger)$, a contradiction to the assumption $y \notin \mathcal{D}(A^\dagger)$.

In conclusion, for $y \notin \mathcal{D}(A^\dagger)$ no bounded sequence $\|x_{\alpha_n}\|$ can exist, hence equation (49) holds.

Finally, we can characterise the a-priori parameter choice rules $\alpha$ that lead to convergent regularization methods by the following proposition.

Proposition 3.9. Let $\{R_\alpha\}_{\alpha>0}$ be a linear regularization, and $\alpha : \mathbb{R}^+ \to \mathbb{R}^+$ an a-priori parameter choice rule. Then $(R_\alpha, \alpha)$ is a convergent regularization method if and only if
$$\lim_{\delta \to 0} \alpha(\delta) = 0 \qquad (50)$$
and
$$\lim_{\delta \to 0} \delta\, \|R_{\alpha(\delta)}\| = 0 \qquad (51)$$
hold.

Proof. If equations (50) and (51) hold, then for every $y^\delta \in H_2$ with $\|y^\delta - y\| \le \delta$ we have
$$\|R_{\alpha(\delta)} y^\delta - A^\dagger y\| \le \|x_{\alpha(\delta)} - A^\dagger y\| + \|x_{\alpha(\delta)} - R_{\alpha(\delta)} y^\delta\| \le \|x_{\alpha(\delta)} - A^\dagger y\| + \delta\, \|R_{\alpha(\delta)}\|.$$
Due to equations (47), (50) and (51), the right-hand side of the inequality converges to zero, and thus $(R_\alpha, \alpha)$ is a convergent regularization method. We now show the converse.

Now let $(R_\alpha, \alpha)$ be a convergent regularization method and assume that equation (51) does not hold, so that there exists a sequence $\delta_n \to 0$ such that $\delta_n \|R_{\alpha(\delta_n)}\| \ge C > 0$ for some constant $C$. Then we can find a sequence $\{z_n\}$ in $H_2$ with $\|z_n\| = 1$ such that $\delta_n \|R_{\alpha(\delta_n)} z_n\| \ge \frac{C}{2}$. Then for any $y \in \mathcal{D}(A^\dagger)$ and $y_n := y + \delta_n z_n$ we obtain $\|y - y_n\| \le \delta_n$, but
$$R_{\alpha(\delta_n)} y_n - A^\dagger y = \bigl(R_{\alpha(\delta_n)} y - A^\dagger y\bigr) + \delta_n R_{\alpha(\delta_n)} z_n$$
does not converge to $0$, since the norm of the second term $\delta_n R_{\alpha(\delta_n)} z_n$ is bounded away from zero while the first term tends to zero. Hence, for sufficiently small $\delta_n$, equation (45) is violated for $y^\delta = y_n$, and thus $(R_\alpha, \alpha)$ is not a convergent regularization method.
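To illustrate the two conditions with a concrete (assumed) operator-norm bound of the form $\|R_\alpha\| \le C/\alpha$, as exhibited by the difference quotient in Example 3.10 below, any a-priori rule that lets $\alpha(\delta)$ tend to zero more slowly than $\delta$, for instance $\alpha(\delta) = \sqrt{\delta}$, satisfies both (50) and (51):
$$\lim_{\delta \to 0} \alpha(\delta) = \lim_{\delta \to 0} \sqrt{\delta} = 0 \qquad \text{and} \qquad \delta\,\|R_{\alpha(\delta)}\| \le \delta \cdot \frac{C}{\sqrt{\delta}} = C\sqrt{\delta} \xrightarrow[\delta \to 0]{} 0.$$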

Now we consider an example of a regularization constructed to fit the definitions above.

Refer to [8] for more examples.

Example 3.10. Consider the operator $A : L^2[0,1] \to L^2[0,1]$ defined by
$$(Ax)(s) := \int_0^s x(t)\, dt.$$

Then $A$ is a bounded, compact linear operator and it is easily seen that
$$\mathrm{Ran}(A) = \{\, y \in W^{1,2}[0,1] \mid y \in C([0,1]),\ y(0) = 0 \,\}, \qquad (52)$$
where $W^{1,2}$ denotes the Sobolev space on $[0,1]$ of order $1$ over $L^2$, and $C([0,1])$ is the set of continuous functions on $[0,1]$. The distributional derivative, mapping $\mathrm{Ran}(A)$ to $L^2[0,1]$, is the inverse of $A$. Since $\mathrm{Ran}(A)$ contains $C_0^\infty(0,1)$, $\mathrm{Ran}(A)$ is dense in $L^2[0,1]$ and $\mathrm{Ran}(A)^\perp = \{0\}$. For $y \in C([0,1])$ and $\alpha > 0$, define

$$(R_\alpha y)(t) := \begin{cases} \dfrac{1}{\alpha}\bigl(y(t+\alpha) - y(t)\bigr), & \text{if } 0 \le t \le \tfrac{1}{2},\\[1ex] \dfrac{1}{\alpha}\bigl(y(t) - y(t-\alpha)\bigr), & \text{if } \tfrac{1}{2} < t \le 1. \end{cases} \qquad (53)$$

Then $\{R_\alpha\}$ is a family of linear and bounded operators with
$$\|R_\alpha y\|_{L^2[0,1]} \le \frac{\sqrt{6}}{\alpha}\, \|y\|_{L^2[0,1]}, \qquad (54)$$
defined on a dense linear subspace of $L^2[0,1]$; thus it can be extended to the whole of $L^2[0,1]$, and equation (54) still holds. Since the measure of $[0,1]$ is finite, for $y \in \mathrm{Ran}(A)$ the distributional derivative of $y$ lies in $L^1[0,1]$, so $y$ is a function of bounded variation. By Lebesgue's theorem, the derivative $y'$ exists almost everywhere in $[0,1]$ and is equal to the distributional derivative of $y$ as an $L^2$-function. We can therefore use the dominated convergence theorem to prove that

$$\|R_\alpha y - A^\dagger y\|_{L^2[0,1]} \to 0 \quad \text{as } \alpha \to 0,$$
so that, according to Proposition 3.6, $R_\alpha$ is a regularization of the distributional derivative $A^\dagger$.
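As a numerical illustration (not part of the original example), the following Python sketch applies the difference quotient (53) to noisy samples of $y = Ax$; the grid, the test functions and the noise level are assumptions made purely for the demonstration. The printed errors indicate how the reconstruction degrades when $\alpha$ is taken too small relative to the noise level $\delta$, consistent with the bound (54).

```python
import numpy as np

# Minimal numerical sketch of Example 3.10 (all discretization choices are assumptions):
# A is integration on [0,1], its generalized inverse is differentiation, and R_alpha is
# the one-sided difference quotient (53), here applied to noisy data y^delta.

def apply_R_alpha(y_samples, t, alpha):
    """Evaluate (R_alpha y)(t) from samples of y via piecewise-linear interpolation."""
    y = lambda s: np.interp(np.clip(s, 0.0, 1.0), t, y_samples)
    return np.where(t <= 0.5,
                    (y(t + alpha) - y(t)) / alpha,   # forward difference on [0, 1/2]
                    (y(t) - y(t - alpha)) / alpha)   # backward difference on (1/2, 1]

t = np.linspace(0.0, 1.0, 2001)
x_true = np.cos(2 * np.pi * t)                  # the sought derivative x = A^dagger y
y_exact = np.sin(2 * np.pi * t) / (2 * np.pi)   # y = A x, note y(0) = 0

delta = 1e-3
rng = np.random.default_rng(0)
y_noisy = y_exact + delta * rng.standard_normal(t.size)

# Too large an alpha gives a poor difference-quotient approximation of the derivative;
# too small an alpha amplifies the noise, since ||R_alpha|| grows like 1/alpha by (54).
for alpha in [0.2, 0.05, 0.01, 0.001]:
    rms_error = np.sqrt(np.mean((apply_R_alpha(y_noisy, t, alpha) - x_true) ** 2))
    print(f"alpha = {alpha:6.3f}   RMS error = {rms_error:.4f}")
```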

4 LANDWEBER ITERATION

4.1 Introduction

Let us consider equation (1) and assume that equation (40) is satisfied for the perturbed data $y^\delta$. The idea of most iterative methods is to approximate $A^\dagger y$ with a sequence of iterates $\{x_k\}_{k \in \mathbb{N}}$; they are based on the transformation of the normal equation (18) into equivalent fixed point equations such as
$$x = x + A^*(y - Ax) = (I - A^*A)\,x + A^*y \qquad (55)$$
[2, 20, 42]. The vector $A^*(y - Ax)$ is the direction of the negative gradient of the quadratic functional
$$x \longmapsto \|Ax - y\|^2.$$

Landweber [43] established strong convergence of this iteration in 1951, provided $A$ is compact and $y \in \mathcal{D}(A^\dagger)$. Fridman [44] studied the case where $A$ is not only compact but also a self-adjoint positive semi-definite operator. Bialy [45], on the other hand, extended the results of Landweber and Fridman in 1959 to an operator $A$ that is not necessarily compact. The Landweber iteration is given an appropriate initial guess, say $x$, which selects the particular solution to be approximated; in case one is given noisy data $y^\delta$ instead of $y$, the iteration starts with $x_0^\delta = x$ and computes the sequence of iterates $\{x_k^\delta\}_{k \in \mathbb{N}}$ recursively.

The Landweber iteration is defined as follows:

Definition 4.1 (Landweber Iteration). Fix an appropriate initial guess $x_0^\delta = x \in H_1$ and for $k = 1, 2, \ldots$ compute the Landweber approximations recursively using the formula
$$x_k^\delta = x_{k-1}^\delta + A^*\bigl(y^\delta - A x_{k-1}^\delta\bigr). \qquad (56)$$

As observed in Definition 4.1, one can conveniently assume without loss of generality that
$$\|A\| \le 1, \qquad (57)$$
in which case $I - A^*A$ and $I - AA^*$ are both positive semi-definite operators with norm at most one, as seen in [44], since
$$\|T\| = \sup_{\|x\|_{H_1} = 1} |\langle x, Tx\rangle_{H_1}|$$
for any self-adjoint operator $T : H_1 \to H_1$, applied here to $T = A^*A$ and, analogously on $H_2$, to $T = AA^*$ [46]. If equation (57) did not hold, one would introduce a fixed relaxation parameter $\tau \in \mathbb{R}$ with $0 < \tau \le \frac{1}{\|A\|^2}$ multiplying $A^*$ in equation (56). That is, the iteration would be
$$x_k^\delta = x_{k-1}^\delta + \tau A^*\bigl(y^\delta - A x_{k-1}^\delta\bigr) = (I - \tau A^*A)\, x_{k-1}^\delta + \tau A^* y^\delta, \qquad k \in \mathbb{N}. \qquad (58)$$
This iteration scheme is a special case of the steepest descent algorithm applied to the quadratic functional $\|Ax - y\|_2^2$, as seen in the following lemma.

Lemma 4.2. Let the sequence $\{x_k^\delta\}$ be defined by equation (58) and define the quadratic functional $\Psi : H_1 \to \mathbb{R}$ by $\Psi(x) = \frac{1}{2}\|Ax - y^\delta\|_2^2$. Then $\Psi$ is Fréchet differentiable at each $z \in H_1$ and
$$\Psi'(z)x = (Az - y^\delta, Ax) = (A^*(Az - y^\delta), x), \qquad x \in H_1. \qquad (59)$$
The linear functional $\Psi'(z)$ can be identified with $A^*(Az - y^\delta) \in H_1$ in the Hilbert space $H_1$ over the field of real numbers $\mathbb{R}$.

Therefore $x_k^\delta = x_{k-1}^\delta + \tau A^*(y^\delta - A x_{k-1}^\delta)$ is the steepest descent step with step-size $\tau$. Equivalently, one could multiply the equation $Ax = y^\delta$ by $\sqrt{\tau}$ and perform the iteration with equation (56).

Furthermore, it follows from equation (58) that if $\{z_k^\delta\}$ is the sequence of iterates with initial guess $z_0^\delta = 0$ and data $y^\delta - A x_0^\delta$, then $x_k^\delta = x_0^\delta + z_k^\delta$. So one can assume without loss of generality the standard choice of initial guess $x_0^\delta = 0$.
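The following Python sketch is a minimal discretized illustration of the iteration (58) with the standard initial guess $x_0^\delta = 0$; the matrix (a crude discretization of the integration operator from Example 3.10), the noise level and the iteration count are illustrative assumptions, not part of the text.

```python
import numpy as np

# Minimal sketch of the Landweber iteration (58) with initial guess x_0^delta = 0.
# The discretized operator (integration on a grid, as in Example 3.10), the noise level
# and the number of iterations are illustrative assumptions.

def landweber(A, y_delta, n_iter, tau=None):
    """Iterate x_k = x_{k-1} + tau * A^T (y_delta - A x_{k-1}), returning all iterates."""
    if tau is None:
        tau = 1.0 / np.linalg.norm(A, 2) ** 2        # relaxation: 0 < tau <= 1/||A||^2
    x = np.zeros(A.shape[1])                         # standard initial guess x_0 = 0
    iterates = []
    for _ in range(n_iter):
        x = x + tau * A.T @ (y_delta - A @ x)        # gradient step for (1/2)||Ax - y||^2
        iterates.append(x.copy())
    return iterates

# Small test problem: A ~ integration operator, x_true the "derivative" to be recovered.
n = 200
t = np.linspace(0.0, 1.0, n)
h = t[1] - t[0]
A = h * np.tril(np.ones((n, n)))                     # (Ax)(s) ~ integral_0^s x(t) dt
x_true = np.cos(2 * np.pi * t)
delta = 1e-3
y_delta = A @ x_true + delta * np.random.default_rng(1).standard_normal(n)

errors = [np.linalg.norm(x - x_true) * np.sqrt(h) for x in landweber(A, y_delta, 5000)]
k_best = int(np.argmin(errors)) + 1
print(f"best iterate: k = {k_best}, error = {min(errors):.4f}; error at k = 5000: {errors[-1]:.4f}")
```

The recorded error history typically decreases at first and then deteriorates again, which anticipates the semi-convergence behaviour discussed in Remark 4.7 below.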

If $\|A\|^2 \le 1 < 2$, then the fixed point operator $I - A^*A$ in equation (55) is nonexpansive and the method of successive approximations may be applied [2, 42]. For ill-posed problems, however, the fixed point operator $I - A^*A$ is not a contraction. This is because the spectrum of $A^*A$ clusters at the origin. For instance, if $A$ is compact, then there exists a sequence $\{\lambda_n\}$ of eigenvalues of $A^*A$ with $|\lambda_n| \to 0$ as $n \to \infty$, and with the associated eigenvectors $\{v_n\}$ we have
$$\|(I - A^*A)v_n\|\,\|v_n\|^{-1} = \|(1 - \lambda_n)v_n\|\,\|v_n\|^{-1} = |1 - \lambda_n| \longrightarrow 1 \quad \text{as } n \to \infty.$$
That is, $\|I - A^*A\| = 1$, so $I - A^*A$ is not a contraction.

The following theorem is due to Landweber [43], who proved strong convergence of the iteration for compact operators.

Theorem 4.3. If $y \in \mathcal{D}(A^\dagger)$, then the Landweber approximations $x_k$ corresponding to the true data $y$ converge to $A^\dagger y$, i.e., $x_k \to A^\dagger y = x^\dagger$ as $k \to \infty$. If $y \notin \mathcal{D}(A^\dagger)$, then $\|x_k\| \to \infty$ as $k \to \infty$.

Proof. By mathematical induction, the iteration terms $x_k$ may be written non-recursively in the form
$$x_k = \sum_{j=0}^{k-1} (I - A^*A)^j A^* y = g_k(A^*A)\, A^* y,$$
where we denote the functions $g$ and $r$ by
$$g_k(\lambda) = \sum_{j=0}^{k-1} (1 - \lambda)^j \quad \text{and} \quad r_k(\lambda) = (1 - \lambda)^k. \qquad (62)$$
Here $g_k(\lambda)$ and $r_k(\lambda)$ are parameter-dependent families of functions which are piecewise continuous on $[0, \|A\|^2]$ (that is, on a set containing the spectrum of $A^*A$). Since $\|A\| \le 1$ as assumed before, we consider $\lambda \in (0,1]$: on this interval $\lambda\, g_k(\lambda) = 1 - r_k(\lambda)$ is uniformly bounded, and $g_k(\lambda)$ converges to $\frac{1}{\lambda}$ as $k \to \infty$ because $r_k(\lambda)$ converges to $0$.
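As a small aside, the two elementary facts used above, the geometric-series identity $\lambda\, g_k(\lambda) = 1 - r_k(\lambda)$ and the convergence $g_k(\lambda) \to \frac{1}{\lambda}$ (slow for small $\lambda$), can be checked numerically; the sample values of $\lambda$ and $k$ below are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative check of the filter functions in (62):
# lambda * g_k(lambda) = 1 - r_k(lambda)  (geometric series), and g_k(lambda) -> 1/lambda.

def g(k, lam):
    return sum((1.0 - lam) ** j for j in range(k))   # g_k(lambda) = sum_{j=0}^{k-1} (1 - lambda)^j

def r(k, lam):
    return (1.0 - lam) ** k                          # r_k(lambda) = (1 - lambda)^k

for lam in [0.9, 0.5, 0.1, 0.01]:
    for k in [1, 10, 100, 1000]:
        assert np.isclose(lam * g(k, lam), 1.0 - r(k, lam))
    print(f"lambda = {lam:5.2f}: g_1000(lambda) = {g(1000, lam):9.3f}, 1/lambda = {1.0 / lam:9.3f}")
```

For small $\lambda$ the approach of $g_k(\lambda)$ to $\frac{1}{\lambda}$ is slow, which mirrors the typically slow convergence of the Landweber iteration.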

Theorem 4.3 states that the approximation error of the Landweber iteration converges to $0$ when $y \in \mathcal{D}(A^\dagger)$. However, what happens if we have perturbed data $y^\delta$? We must then examine the behaviour of the propagated data error. According to Theorem 4.3, for noisy data $y^\delta$ with $y^\delta \notin \mathcal{D}(A^\dagger)$ the iterates must diverge; on the other hand, the $k$-th iterate $x_k^\delta$ for fixed $k$ depends continuously on the data. This is seen in the following theorem and the subsequent result.

Theorem 4.4. For fixed iteration index $k$, the iterate $x_k^\delta$ depends continuously on the perturbed data $y^\delta$, since $x_k^\delta$ is the result of a combination of continuous operations: for $k \in \mathbb{N}$,
$$x_k^\delta = R_k\, y^\delta, \qquad R_k := \sum_{j=0}^{k-1} (I - A^*A)^j A^*,$$
holds with the linear and bounded operator $R_k : H_2 \to H_1$. Assuming further that $\|A\| \le 1$ and that $A$ is injective with dense range $\mathrm{Ran}(A) \subset H_2$: if $y^\delta \in \mathrm{Ran}(A)$, then $x_k^\delta \to A^{-1} y^\delta$ as $k \to \infty$, and if $y^\delta \notin \mathrm{Ran}(A)$, then $\|x_k^\delta\| \to \infty$ as $k \to \infty$.

Proof. It is easy to see that $R_k : y^\delta \mapsto x_k^\delta$ is continuous, because $x_k^\delta$ is the result of a (finite) combination of continuous transformations of the given data. The particular form of $R_k$ is readily obtained by mathematical induction: since $x_0^\delta = 0$, we have $x_1^\delta = A^* y^\delta$, and hence $R_1 = A^*$ as claimed; for $k > 1$ it follows from equation (56) and the induction hypothesis that
$$x_k^\delta = (I - A^*A)\, x_{k-1}^\delta + A^* y^\delta = \bigl((I - A^*A) R_{k-1} + A^*\bigr) y^\delta = R_k\, y^\delta.$$
Accordingly, the assumption that $A$ is injective with dense range $\mathrm{Ran}(A) \subset H_2$ implies that $\mathrm{Ran}(A^*)$ is dense in $H_1$ and that $(I - A^*A)^k$ converges pointwise to $0$ on a dense subset of $H_1$; by the Banach-Steinhaus theorem, the iteration error in equation (64) thus converges to $0$ for every element of $H_1$ as $k \to \infty$.

Now, we have
$$\|A R_k\| \le 1, \qquad (65)$$
and hence, arguing as in Proposition 3.8, if $A$ is injective with dense range in $H_2$ (so that $\mathcal{D}(A^\dagger) = \mathrm{Ran}(A)$) and $y^\delta \notin \mathrm{Ran}(A)$, then the iterates diverge to $\infty$ in norm as $k \to \infty$.

Thus, for fixed $k$, $x_k^\delta$ depends continuously on the data, so that the propagated data error cannot be arbitrarily large. This results in the following.

Proposition 4.5. Let $y, y^\delta \in H_2$ be a pair of data satisfying equation (40) and let $\{x_k\}_{k=1}^\infty$ and $\{x_k^\delta\}_{k=1}^\infty$ be their respective iteration sequences. We then have
$$\|x_k - x_k^\delta\| \le \sqrt{k}\,\delta. \qquad (66)$$

Proof. By linearity, $x_k - x_k^\delta = R_k(y - y^\delta)$, and we are to find the norm of $R_k$. From equation (61) it follows that $\|R_k\|^2 = \|R_k R_k^*\| = \sup_{\lambda \in (0,1]} \lambda\, g_k(\lambda)^2$; since $\lambda\, g_k(\lambda) \le 1$ and $g_k(\lambda) \le k$ on this interval, the right-hand side of the inequality is bounded by $k$, and the assertion follows.

Remark 4.6. From the previous theorems and results, there are two components of the total error, as seen in equation (42): an approximation error with slow convergence to $0$, and the propagated data error of order at most $\sqrt{k}\,\delta$. In equation (42) the data error is multiplied by the condition number $\|R_\alpha\|$; comparing this with equation (66), the role of $\|R_\alpha\|$ is played by $\sqrt{k}$. If $k$ is small, then the computed approximation $x_k$ is an over-smoothed solution [47]. That is, the data error in equation (42) is negligible (the difference between $x_k$ and $x_k^\delta$ in equation (66) is very small) and the total error decreases towards the true solution $A^\dagger y$; but when $\sqrt{k}\,\delta$ approaches the order of magnitude of the approximation error, the data error becomes visible in $x_k^\delta$ and the total error starts to increase, up to the worst-case rate of divergence.

Remark 4.7. The phenomenon observed in Remark 4.6 has been termed semi-convergence behaviour of iterative methods by Natterer [48], see also [49, 50]. The regularizing effect of iterative techniques is only effective if one finds a realistic stopping rule or criterion to detect the transition between convergence and divergence. This means that the iteration index $k$ takes the role of the regularization parameter $\alpha$, and the stopping rule plays a role similar to that of the parameter choice rule for continuous regularization methods [51]. That is, in the case of noisy data, the iteration procedure is combined with a stopping rule in order to serve as a regularization method, and one should terminate the iteration by an appropriate stopping rule that involves the noisy data and the noise level $\delta$. This means the iteration in equation (58) is stopped after $k = k(\delta, y^\delta)$ iterations, where $k$ is the stopping index [42, 52].

The most well-known stopping rule employed here is the discrepancy principle of Morozov [53], which we will discuss fully in the next subsection.
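As a preview of that discussion, the sketch below indicates how such a stopping rule might be combined with the iteration (58): the iteration is terminated at the first index $k$ for which the residual $\|A x_k^\delta - y^\delta\|$ falls below a multiple of the noise level. The tolerance factor $\tau_{dp} > 1$, the discretized operator and the noise model are assumptions made for illustration only.

```python
import numpy as np

# Hedged preview of an a-posteriori stopping rule for the Landweber iteration (58):
# stop at the first k with ||A x_k - y_delta|| <= tau_dp * delta.  The factor tau_dp > 1
# and the toy operator below are illustrative assumptions; the discrepancy principle
# itself is discussed in the next subsection.

def landweber_discrepancy(A, y_delta, delta, tau_dp=1.1, max_iter=50_000):
    tau = 1.0 / np.linalg.norm(A, 2) ** 2          # relaxation: 0 < tau <= 1/||A||^2
    x = np.zeros(A.shape[1])                       # initial guess x_0 = 0
    for k in range(1, max_iter + 1):
        x = x + tau * A.T @ (y_delta - A @ x)
        if np.linalg.norm(A @ x - y_delta) <= tau_dp * delta:
            return x, k                            # stopping index k = k(delta, y_delta)
    return x, max_iter

# Toy problem: discretized integration operator with noisy right-hand side.
n = 200
t = np.linspace(0.0, 1.0, n)
A = (t[1] - t[0]) * np.tril(np.ones((n, n)))
x_true = np.cos(2 * np.pi * t)
noise = 1e-3 * np.random.default_rng(2).standard_normal(n)
y_delta = A @ x_true + noise

x_stop, k_stop = landweber_discrepancy(A, y_delta, delta=np.linalg.norm(noise))
print(f"discrepancy-type stopping rule terminated the iteration at k = {k_stop}")
```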

4.2 Connection of the Singular Value Expansion and the