
2.3 Path to collaborative denoising

2.3.4 Shrinkage

The core of transform-domain filters is shrinkage in the chosen transform domain. The shrinkage operator is a non-linear filter applied separately to each transform coefficient, leveraging the enhanced sparsity of the domain to attenuate noise. Denoting by $T^{dD}$ and $Q^{dD}$ the $d$-dimensional forward and inverse transforms, respectively, shrinkage of a patch⁴ $z_x$ at a coordinate $x$ in the $T^{dD}$ transform domain can be expressed as

$$\hat{y}_x = Q^{dD}\left(\Upsilon\left(T^{dD}(z_x)\right)\right), \tag{2.6}$$

where $\hat{y}_x$ is the resulting patch estimate and $\Upsilon$ is the non-linear shrinkage operator. We denote by

$$s^x_i = \langle z_x, b^{dD}_i \rangle, \quad i = 1, \dots, N,$$

a coefficient of $z_x$ in the $T^{dD}$ domain, where $N$ is the total pixel count of the patch and $b^{dD}_i$ is the $i$-th basis function of $T^{dD}$. A generic shrinkage operator can then be expressed as

$$\Upsilon : s^x_i \mapsto \alpha_i s^x_i, \tag{2.7}$$

where $\alpha_i \in [0,1]$ is a shrinkage attenuation factor that depends on $s^x_i$, the noise statistics, and possibly other priors. Coefficients that are small relative to the corresponding transform-domain noise standard deviation $\sqrt{v_i}$, where $v_i$ is the noise variance of $s^x_i$, are commonly assumed to be mostly noise (as they are likely to arise from $\mathcal{N}(0, v_i)$) and are therefore assigned a small $\alpha_i$.
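A minimal sketch of (2.6) and (2.7) follows, assuming a square patch and an orthonormal 2-D DCT-II as $T^{dD}$, so that $Q^{dD}$ is simply its transpose. The helper names `dct_matrix`, `shrink_patch` and `alpha_fn` are hypothetical and chosen for illustration only.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal 1-D DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def shrink_patch(z_x, alpha_fn):
    """Patch estimate Q(Upsilon(T(z_x))) as in (2.6), with Upsilon: s -> alpha * s as in (2.7)."""
    c = dct_matrix(z_x.shape[0])   # assumes a square patch
    s = c @ z_x @ c.T              # forward separable 2-D transform T
    alpha = alpha_fn(s)            # one attenuation factor in [0, 1] per coefficient
    return c.T @ (alpha * s) @ c   # inverse transform Q (transpose of an orthonormal T)


# Each coefficient is an inner product with a 2-D basis function b_(k,l) = outer(c[k], c[l]).
rng = np.random.default_rng(0)
z_x = rng.normal(size=(8, 8))
c = dct_matrix(8)
s = c @ z_x @ c.T
print(np.isclose(s[2, 5], np.sum(z_x * np.outer(c[2], c[5]))))  # True
```

Passing `np.ones_like` as `alpha_fn` returns the patch unchanged, while any data-dependent rule, such as the thresholding functions discussed next, plugs in the same way.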

2.3.4.1 Denoising by thresholding

Hard-thresholding is a simple yet effective shrinkage operation that sets to zero all spectrum coefficients whose magnitude is smaller than a threshold $\sqrt{v_i}\,\lambda$:

$$\alpha^{\mathrm{HT}}_i = \begin{cases} 1 & \text{if } |s^x_i| \geq \sqrt{v_i}\,\lambda, \\ 0 & \text{otherwise}, \end{cases} \tag{2.8}$$

⁴ In the case of filters that consider the full image at once, the patch is simply of the same size as the image.

where $\lambda \geq 0$ is a fixed constant and $v_i$ is the noise variance of $s^x_i$, which for white noise equals $\|b^{dD}_i\|_2^2\,\mathrm{var}\{\eta\}$. Hard-thresholding can offer very good performance when the signal is concentrated in only a few coefficients, as any coefficient not exceeding the threshold is completely eliminated. However, coefficients carrying significant signal are kept unchanged, so the noise in these coefficients is not attenuated at all.
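The sketch below applies (2.8) to one noisy patch, assuming white Gaussian noise and an orthonormal 2-D DCT-II as $T^{dD}$. It computes $v_i = \|b^{dD}_i\|_2^2\,\mathrm{var}\{\eta\}$ explicitly from the basis functions, even though for an orthonormal transform this reduces to $\mathrm{var}\{\eta\}$. The test signal, `sigma` and `lam` are illustrative values, not taken from the text.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal 1-D DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def hard_threshold_factors(s, v, lam):
    """alpha^HT_i from (2.8): 1 where |s_i| >= sqrt(v_i) * lam, 0 elsewhere."""
    return (np.abs(s) >= np.sqrt(v) * lam).astype(float)


n, sigma, lam = 8, 0.1, 3.0                      # illustrative patch size, noise std, lambda
c = dct_matrix(n)
basis = np.einsum("ki,lj->klij", c, c)           # basis[k, l] is the 2-D basis function b_(k,l)
v = np.sum(basis**2, axis=(2, 3)) * sigma**2     # v_i = ||b_i||_2^2 var{eta} (white noise)

rng = np.random.default_rng(1)
y_x = 2.0 * basis[1, 2] + 1.0 * basis[0, 0]      # signal concentrated in two coefficients
z_x = y_x + sigma * rng.normal(size=(n, n))
s = c @ z_x @ c.T
alpha = hard_threshold_factors(s, v, lam)
y_hat = c.T @ (alpha * s) @ c
print("kept coefficients:", int(alpha.sum()))
print("MSE noisy:", np.mean((z_x - y_x) ** 2), "MSE estimate:", np.mean((y_hat - y_x) ** 2))
```

The two signal-carrying coefficients survive with their noise intact, which is exactly the limitation of hard-thresholding noted in the text.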

Another thresholding function is soft-thresholding, which, in addition to setting to zero the coefficients smaller than the threshold, shrinks all larger coefficients toward zero by the threshold magnitude. As a result, some noise is removed even from large coefficients. Although soft-thresholding commonly achieves a smoother appearance, its average error tends to be greater than that of hard-thresholding, which makes hard-thresholding a common choice despite its simplicity (Johnstone 2019).
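A short comparison of the two thresholding rules on a few scalar coefficients follows. The helper names are hypothetical, and the soft rule is additionally written as a data-dependent attenuation factor $\alpha_i = \max(0, 1 - t/|s^x_i|)$ so that it matches the form of (2.7).

```python
import numpy as np


def hard_threshold(s, t):
    """Keep coefficients with |s| >= t unchanged, set the rest to zero."""
    return np.where(np.abs(s) >= t, s, 0.0)


def soft_threshold(s, t):
    """Set |s| < t to zero and shrink the remaining coefficients toward zero by t."""
    return np.sign(s) * np.maximum(np.abs(s) - t, 0.0)


def soft_factor(s, t):
    """Soft-thresholding as an attenuation factor alpha in [0, 1], as in (2.7)."""
    a = np.abs(s)
    ratio = np.divide(t, a, out=np.full_like(a, np.inf), where=a > 0)  # t/|s|, inf at s = 0
    return np.maximum(0.0, 1.0 - ratio)


s = np.array([-3.0, -0.5, 0.0, 0.2, 1.0, 4.0])
t = 1.0
print(hard_threshold(s, t))   # large coefficients kept as they are, noise included
print(soft_threshold(s, t))   # large coefficients reduced in magnitude by t
print(np.allclose(soft_factor(s, t) * s, soft_threshold(s, t)))
```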

Considering an orthonormal $T^{dD}$, the thresholding functions can be seen as the solution of an optimization problem combining quadratic fidelity with an $\ell^p$ penalty that promotes sparsity, where a smaller $p$ corresponds to higher sparsity (Johnstone 2019):

$$\hat{y}_x = \operatorname*{argmin}_{y_x} \left( \|z_x - y_x\|_2^2 + c\,\|T^{dD}(y_x)\|_p \right),$$

where $c$ is a constant scaling the penalty, related to the threshold value. Setting $p = 1$ yields soft-thresholding as the shrinkage in (2.6), whereas $p = 0$ (with $\|\cdot\|_0$ defined as the number of non-zero elements) yields hard-thresholding.
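For an orthonormal $T^{dD}$ the minimization decouples coefficient by coefficient, which makes the relation between $c$ and the threshold explicit. The short derivation below is a sketch under two assumptions: a single penalty weight $c$ for all coefficients, and $|u|^0$ read as $1$ for $u \neq 0$ and $0$ for $u = 0$. Writing $u_i = \langle y_x, b^{dD}_i \rangle$ and using Parseval's identity,

$$\|z_x - y_x\|_2^2 + c\,\|T^{dD}(y_x)\|_p = \sum_{i=1}^{N} \left( (s^x_i - u_i)^2 + c\,|u_i|^p \right), \qquad p \in \{0, 1\},$$

so each $u_i$ can be optimized separately. For $p = 1$, the optimality condition $0 \in 2(u_i - s^x_i) + c\,\partial|u_i|$ (with $\partial|u|$ the subdifferential of $|u|$) gives soft-thresholding with threshold $c/2$; for $p = 0$, keeping $u_i = s^x_i$ costs $c$ while setting $u_i = 0$ costs $(s^x_i)^2$, which gives hard-thresholding with threshold $\sqrt{c}$:

$$\hat{u}_i^{(p=1)} = \operatorname{sign}(s^x_i)\max\left(|s^x_i| - \tfrac{c}{2},\, 0\right), \qquad \hat{u}_i^{(p=0)} = \begin{cases} s^x_i & \text{if } |s^x_i| > \sqrt{c}, \\ 0 & \text{otherwise}. \end{cases}$$

Matching the latter with (2.8) gives $c = \lambda^2 v_i$, which presumes equal $v_i$ for all coefficients, as is the case for white noise under an orthonormal transform.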

Choice of threshold

For ideal noise attenuation, the threshold should be large enough to cover all noise extremes. However, if the signal is nonzero in a thresholded coefficient, too large a threshold also removes a significant amount of signal. For this reason, the preferred threshold is commonly just above the expected maximum of the noise (Mallat 1999), which is proportional to the transform-domain noise standard deviation $\sqrt{v_i}$. An optimized threshold can also be obtained by minimizing an expected risk (e.g., the expected squared error) estimated from the noisy data itself, as done by Donoho and Johnstone (1995) using Stein's unbiased risk estimate (Stein 1981). Adaptive threshold estimation for sub-band thresholding is further developed in Chang et al. (2000).
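As an illustration of risk-based threshold selection, the sketch below minimizes SURE for soft-thresholding of coefficients with known noise variance $\sigma^2$, in the spirit of Donoho and Johnstone (1995); it omits refinements of their actual procedure. For soft-thresholding at $t$, SURE equals $N\sigma^2 - 2\sigma^2\,\#\{i : |s_i| \le t\} + \sum_i \min(|s_i|, t)^2$, and because this is non-decreasing between consecutive values of $|s_i|$, it suffices to evaluate it at $t \in \{0\} \cup \{|s_i|\}$. The function names and the synthetic sparse signal are hypothetical.

```python
import numpy as np


def sure_soft(s, sigma, t):
    """Stein unbiased estimate of E||soft(s, t) - r||^2 when s = r + N(0, sigma^2) noise."""
    a = np.abs(s)
    return (s.size * sigma**2
            - 2.0 * sigma**2 * np.count_nonzero(a <= t)
            + np.sum(np.minimum(a, t) ** 2))


def sure_threshold(s, sigma):
    """Threshold minimizing SURE over the candidates {0} U {|s_i|}."""
    candidates = np.concatenate(([0.0], np.sort(np.abs(s))))
    risks = np.array([sure_soft(s, sigma, t) for t in candidates])
    return candidates[np.argmin(risks)]


rng = np.random.default_rng(0)
n, sigma = 1000, 1.0
r = np.zeros(n)
r[:50] = 5.0                                  # sparse noise-free coefficients (synthetic)
s = r + sigma * rng.normal(size=n)
t = sure_threshold(s, sigma)
est = np.sign(s) * np.maximum(np.abs(s) - t, 0.0)
print("SURE threshold:", t, "actual squared error:", np.sum((est - r) ** 2))
```

The selection uses only the noisy coefficients and $\sigma$; the noise-free `r` appears solely to report the actual error.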

Universal threshold

A commonly used approximation of the expected maximum of white Gaussian noise is the so-called universal threshold (Donoho and Johnstone 1994), $\sqrt{v_i}\sqrt{2\ln(N)}$; this leads to the natural choice $\lambda = \sqrt{2\ln(N)}$. The curious dependence on the sample size arises from the tail of the Gaussian distribution: the more samples are drawn, the more likely increasingly large values become. The universal threshold gives only an upper bound for the expected maximum of white Gaussian noise; it is nevertheless commonly employed due to its simplicity (Johnstone 2019). The universal threshold is also not optimal in terms of risk; in general, a lower threshold reduces the expected error and is commonly preferred in practical applications (Johnstone 2019; Mallat 1999).
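A quick Monte Carlo check of the universal threshold for unit-variance white Gaussian noise ($v_i = 1$) follows; the sample size $N = 64$ and the number of trials are arbitrary choices. It compares $\lambda = \sqrt{2\ln(N)}$ with the average per-trial maximum of the absolute noise and reports how often that maximum still exceeds $\lambda$.

```python
import numpy as np


def universal_lambda(n):
    """Universal threshold multiplier sqrt(2 ln N) (Donoho and Johnstone 1994)."""
    return np.sqrt(2.0 * np.log(n))


rng = np.random.default_rng(0)
n, trials = 64, 5000
noise = rng.normal(size=(trials, n))        # white Gaussian noise with v_i = 1
max_abs = np.abs(noise).max(axis=1)        # sample maximum of |noise| in each trial
lam = universal_lambda(n)
print(f"lambda = {lam:.3f}")
print(f"mean of max |noise| = {max_abs.mean():.3f}")
print(f"fraction of trials with max |noise| > lambda = {np.mean(max_abs > lam):.3f}")
```

The average maximum stays below $\lambda$, yet individual trials can still exceed it, so the threshold bounds the typical rather than every sample maximum.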

Thresholding for correlated noise

Hard-thresholding (2.8) is directly applicable to denoising correlated noise. In that case, the noise variance $v_i$ of each transform-domain coefficient depends on the power spectrum of the noise. The theoretical basis of the universal threshold $\sqrt{v_i}\sqrt{2\ln(N)}$ considers only white noise, although the threshold is also used for correlated noise. For correlated noise it is in practice only an upper bound (of an upper bound), since the probability that the sample maximum exceeds a given threshold is highest when the standardized samples are independent (Johnstone and Silverman 1997).
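The remark on independence can be illustrated by Monte Carlo. The sketch compares the probability that the maximum of $N = 64$ standardized Gaussian samples exceeds $\sqrt{2\ln(N)}$ for independent samples and for samples with an assumed AR(1)-type correlation $\rho^{|j-k|}$ with $\rho = 0.9$; both the correlation model and $\rho$ are illustrative choices.

```python
import numpy as np


def exceed_prob(cov, lam, trials, rng):
    """Monte Carlo estimate of P(max_i |X_i| > lam) for X ~ N(0, cov)."""
    chol = np.linalg.cholesky(cov)
    x = rng.normal(size=(trials, cov.shape[0])) @ chol.T   # rows have covariance cov
    return np.mean(np.abs(x).max(axis=1) > lam)


n = 64
lam = np.sqrt(2.0 * np.log(n))                            # universal threshold multiplier
idx = np.arange(n)
rho = 0.9                                                 # assumed correlation strength
cov_corr = rho ** np.abs(idx[:, None] - idx[None, :])     # unit variances: standardized
cov_white = np.eye(n)

rng = np.random.default_rng(0)
p_white = exceed_prob(cov_white, lam, 20000, rng)
p_corr = exceed_prob(cov_corr, lam, 20000, rng)
print(f"P(max > lambda): independent {p_white:.3f}, correlated {p_corr:.3f}")
```

For Gaussian noise, Šidák's inequality states that the probability of all $|X_i|$ staying below a threshold is at least the product of the marginal probabilities for any correlation structure, so the independent case is indeed the worst case for this exceedance probability.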

2.3.4.2 Wiener filter

The Wiener filter is defined through the minimization of the mean squared error between the estimate and an oracle reference signal representing the underlying noise-free data. The mean squared error of an estimate $\hat{r}$ of a noise-free signal $r$ can be written as the sum of the variance and the squared bias:

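$$\mathrm{MSE}(r, \hat{r}) = E\{(\hat{r} - r)^2\} = \mathrm{var}\{\hat{r}\} + \mathrm{Bias}_r\{\hat{r}\}^2. \tag{2.9}$$

Considering a noisy coefficient $s^x_i = r^x_i + n^x_i$, where $r^x_i$ is the underlying noise-free coefficient and $n^x_i$ is the zero-mean noise component, the shrinkage operation in (2.7) with a factor $\alpha_i$ gives $\hat{r}^x_i = \alpha_i s^x_i = \alpha_i r^x_i + \alpha_i n^x_i$. Then,

$$\begin{aligned}
\operatorname*{argmin}_{\alpha_i \in [0,1]} \left( \mathrm{var}\{\hat{r}^x_i\} + \mathrm{Bias}_{r^x_i}\{\hat{r}^x_i\}^2 \right)
&= \operatorname*{argmin}_{\alpha_i \in [0,1]} \left( \mathrm{var}\{\alpha_i n^x_i\} + \left( E\{\alpha_i r^x_i\} - r^x_i \right)^2 \right) \\
&= \operatorname*{argmin}_{\alpha_i \in [0,1]} \left( \alpha_i^2 v_i + \left( \alpha_i r^x_i - r^x_i \right)^2 \right),
\end{aligned}$$

which, by setting the derivative $2\alpha_i v_i + 2(\alpha_i - 1)|r^x_i|^2$ to zero, yields the attenuation factor

$$\alpha^{\mathrm{wie}}_i = \frac{|r^x_i|^2}{|r^x_i|^2 + v_i}. \tag{2.10}$$

Instead of using the mean squared error directly, scaling factors can be added to the bias or variance term of (2.9) to adjust the balance between minimizing the two sources of error. In practical applications an oracle signal is not available; in such cases, an image estimate from another filter, e.g., hard-thresholding, can be used as the reference signal (Ghael et al. 1997).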
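The sketch below checks (2.10) numerically against a grid search over $\alpha_i \in [0,1]$ and then uses it as an empirical Wiener filter, with a hard-thresholded pilot estimate standing in for the unavailable oracle, as suggested in the text. The signal, the noise level and the pilot threshold multiplier `3.0` are illustrative assumptions.

```python
import numpy as np


def wiener_factor(r, v):
    """alpha^wie_i = |r_i|^2 / (|r_i|^2 + v_i), as in (2.10)."""
    r2 = np.abs(r) ** 2
    return r2 / (r2 + v)


def expected_mse(alpha, r, v):
    """Variance plus squared bias of alpha * s for s = r + n with var{n} = v."""
    return alpha**2 * v + (alpha * r - r) ** 2


# Closed form vs. grid search for a single coefficient.
alphas = np.linspace(0.0, 1.0, 10001)
r_true, v = 2.0, 1.0
print(alphas[np.argmin(expected_mse(alphas, r_true, v))], wiener_factor(r_true, v))

# Empirical Wiener shrinkage with a hard-thresholded pilot in place of the oracle.
rng = np.random.default_rng(0)
r = np.zeros(1000)
r[:50] = 5.0                                        # sparse noise-free coefficients (synthetic)
s = r + np.sqrt(v) * rng.normal(size=r.size)
pilot = np.where(np.abs(s) >= 3.0 * np.sqrt(v), s, 0.0)
r_hat = wiener_factor(pilot, v) * s
print("MSE noisy:", np.mean((s - r) ** 2), "MSE empirical Wiener:", np.mean((r_hat - r) ** 2))
```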

An example of a simple transform-domain filter utilizing patch shrinkage is the sliding patch filter (Oktem et al. 1998; Yaroslavsky et al. 2001), in which a patch $z_x$ is extracted from the image at each coordinate $x$. Each patch is then filtered as in (2.6); commonly, the full resulting patch estimates $\hat{y}_x$ are aggregated to form the image estimate. The basic model uses rectangular blocks as patches; shape-adaptive alternatives have also been proposed to improve denoising performance (Foi et al. 2006; Foi et al. 2007). A particularly successful approach for denoising both white and correlated noise is to model neighborhoods of transform-domain coefficients through multiscale Gaussian scale mixtures, in which clusters of coefficients are modeled as the product of a Gaussian vector and a positive scaling variable; from this model, the noise-free coefficients can be effectively estimated through Bayesian least squares (Portilla et al. 2003).
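A compact sketch of a sliding patch filter follows. It assumes white Gaussian noise with known standard deviation `sigma`, square patches with an orthonormal 2-D DCT-II, hard-thresholding (2.8) as the shrinkage, and uniform averaging of all overlapping patch estimates as the aggregation; weighted aggregation is a common alternative. The function name, the patch size and the `lam` default are illustrative choices.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal 1-D DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def sliding_patch_denoise(z, sigma, patch=8, lam=2.7):
    """Hard-threshold every overlapping patch in a 2-D DCT domain, then average the estimates."""
    c = dct_matrix(patch)
    thr = lam * sigma                     # sqrt(v_i) = sigma for white noise and orthonormal T
    acc = np.zeros(z.shape)
    weight = np.zeros(z.shape)
    h, w = z.shape
    for i in range(h - patch + 1):
        for j in range(w - patch + 1):
            s = c @ z[i:i + patch, j:j + patch] @ c.T       # T(z_x)
            s[np.abs(s) < thr] = 0.0                        # hard-thresholding, (2.8)
            acc[i:i + patch, j:j + patch] += c.T @ s @ c    # patch estimate y_hat_x
            weight[i:i + patch, j:j + patch] += 1.0
    return acc / weight                   # every pixel is covered by at least one patch


rng = np.random.default_rng(0)
u = np.linspace(0.0, np.pi, 48)
y = np.outer(np.sin(u), np.cos(u))        # smooth synthetic test image
sigma = 0.1
z = y + sigma * rng.normal(size=y.shape)
y_hat = sliding_patch_denoise(z, sigma)
print("MSE noisy:", np.mean((z - y) ** 2), "MSE estimate:", np.mean((y_hat - y) ** 2))
```

The double loop keeps the correspondence with "one patch per coordinate $x$" explicit; a practical implementation would vectorize it.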

2.3.4.3 Transform-domain noise power spectrum

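To calculate the noise variance $v_i$ of the $T^{dD}$ spectrum coefficients needed for accurate shrinkage, we note that

$$s^x_i = \langle z_x, b^{dD}_i \rangle = \left( z \circledast \overleftrightarrow{b}^{dD}_i \right)(x),$$

where the $\overleftrightarrow{\cdot}$ decoration denotes reflection about the origin of $\mathbb{Z}^d$. Thus

$$\mathrm{var}\{s^x_i\} = \mathrm{var}\left\{ \left( \nu \circledast g \circledast \overleftrightarrow{b}^{dD}_i \right)(x) \right\} = \mathrm{var}\{\nu\} \left\| g \circledast \overleftrightarrow{b}^{dD}_i \right\|_2^2$$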
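A one-dimensional ($d = 1$) sketch of the last identity follows. It computes $v_i = \mathrm{var}\{\nu\}\,\|g \circledast \overleftrightarrow{b}_i\|_2^2$ for each DCT-II basis vector through a full linear convolution with the reflected basis vector, and cross-checks the result against the quadratic form $b_i^{\mathsf{T}} C\, b_i$, where $C$ is the covariance of the correlated noise $\nu \circledast g$ over the patch. The kernel `g`, `var_nu` and the patch size are arbitrary illustrative choices, and boundary effects are ignored by treating $\nu$ as white noise on an unbounded grid.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal 1-D DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def coefficient_variances(basis, g, var_nu):
    """v_i = var{nu} * ||g (*) reflect(b_i)||_2^2 for each row b_i of `basis`."""
    return np.array([var_nu * np.sum(np.convolve(g, b[::-1], mode="full") ** 2)
                     for b in basis])


def noise_covariance(g, var_nu, n):
    """Covariance of nu (*) g over n consecutive samples: var{nu} * sum_m g[m] g[m + d]."""
    def autocorr(d):
        d = abs(d)
        return float(np.dot(g[:len(g) - d], g[d:])) if d < len(g) else 0.0
    return var_nu * np.array([[autocorr(l - j) for l in range(n)] for j in range(n)])


n, var_nu = 8, 1.0
g = np.array([1.0, 0.6, 0.3])            # illustrative low-pass noise kernel
basis = dct_matrix(n)
v = coefficient_variances(basis, g, var_nu)
cov = noise_covariance(g, var_nu, n)
print(np.round(v, 3))                    # larger at low frequencies for this low-pass g
print(np.allclose(v, [b @ cov @ b for b in basis]))
```

With `g = np.array([1.0])` the noise is white and every $v_i$ reduces to $\|b_i\|_2^2\,\mathrm{var}\{\nu\} = \mathrm{var}\{\nu\}$, consistent with the white-noise expression used for hard-thresholding.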

In document Exact Transform-Domain Variances for Collaborative Filtering of Correlated Noise (pages 26-30)