Exact Transform-Domain Variances for Collaborative Filtering of Correlated Noise

(1)

Exact Transform- Domain Variances for Collaborative Filtering of

Correlated Noise

YMIR MÄKINEN

(2)

(3)

Tampere University Dissertations 527

YMIR MÄKINEN

Exact Transform-Domain Variances for Collaborative Filtering of Correlated Noise

ACADEMIC DISSERTATION To be presented, with the permission of

the Faculty of Information Technology and Communication Sciences of Tampere University,

for public discussion in the auditorium TB109 of Tietotalo, Korkeakoulunkatu 1, Tampere,

on 10 December 2021, at 12 o’clock.

(4)

ACADEMIC DISSERTATION

Tampere University, Faculty of Information Technology and Communication Sciences Finland

Responsible supervisor and Custos

Prof. Alessandro Foi Tampere University Finland

Pre-examiners Dr. Javier Portilla Instituto de Optica, CSIC Spain

Prof. Bart Goossens Ghent University Belgium

Opponent Dr. Charles Deledalle Brain Corp

USA

The originality of this thesis has been checked using the Turnitin OriginalityCheck service.

ISBN 978-952-03-2222-9 (print) ISBN 978-952-03-2223-6 (pdf) ISSN 2489-9860 (print) ISSN 2490-0028 (pdf)

http://urn.fi/URN:ISBN:978-952-03-2223-6 PunaMusta Oy – Yliopistopaino

Joensuu 2021

(5)

PREFACE

First and foremost, I want to thank my supervisor Prof. Alessandro Foi, without whom this work obviously would not have been possible. Not only did Alessandro provide endless ideas and guide me through creating all of the publications included in this work, but he was also the one to motivate and (patiently) introduce me to the world of image processing. I also want to thank him for organizing many opportunities for me to present my work around the world, and perhaps most importantly in these times, ensuring that I also man- age to return home.

Huge thanks go out to my pre-examiners, Prof. Javier Portilla and Prof. Bart Goossens, for their excellent and constructive feedback regarding my thesis, as well as to Dr. Charles Deledalle for accepting to serve as the opponent for my defense.

I want to thank Dr. Stefano Marchesini for introducing us to the ring arti- fact denoising problem, for lending his expertise in computed tomography, and for welcoming us to visit the synchrotron at the Berkeley labs in California.

Special thanks go also to Dr. Lucio Azzari, without whose guidance I might still be stuck at learning the basics of transform-domain filtering, for his in- valuable professional help, but also listening to my endless complaining and dragging me to Pump. I also want to thank my other co-workers, in particular Dr. Venkatessh Ramakrishnan and Nasser Eslahi, for all the interesting discussions and support. I further want to thank other staff of the lab, in particular Prof. Joni Kämäräinen, Kari Suomela, Virve Larmila, and Elina Orava for helping with practical matters and understanding the mysterious university policies.

I also want to thank my family, in particular my wonderful mother for her neverending support (even if she may still ask whether this program was for a ”professor’s degree”), and the amazing Ossi for endless encouragement,

(6)

patience, love, and food. I also want to thank all the people at Excalibur for many a game played during these four years and more. Lastly, I want to thank my late father for introducing me to the wonder of computers.

Finally, I want to thank Dr. Essi Isohanni and again Prof. Joni Kämäräinen, without whom two I would have never found this position as a doctoral re- searcher, or the field of image processing.

(7)

ABSTRACT

Noise as a distortion of signal is an unavoidable part of all imaging. Extreme imaging conditions tend to induce more noise due to low signal energy and faster degradation of the imaging equipment. We focus on modeling and removal of spatially correlated noise, caused by, for example, pixel cross-talk, thermal fluctuations, and variations in sensor response.

Collaborative filters perform denoising through shrinkage on jointly transformed groups of similar patches extracted from the image, crucially depending on the computation of noise variance in the joint transform domain. These variances have conventionally been only crudely approximated, preventing successful collaborative denoising of strongly correlated noise. In the work presented in this thesis, we propose a method for exact computation of the joint transform- domain noise power spectrum, used to improve the conventional algorithms in shrinkage, patch-matching and aggregation of the patch estimates. The improved filters offer state-of-the-art attenuation of correlated noise, with very significant improvements over the conventional versions for strongly structured correlation. We further demonstrate the usage of the exact transform-domain variances in noise estimation, enabling a fully blind denoising setup.

Correlated noise is abundant in many imaging applications. In this work, we specifically consider projections in computed microtomography, commonly corrupted by streak noise with very long-range correlation. We note that the streak noise can, upon a logarithmic transformation, be approximately modeled as additive correlated noise, which we denoise through multiscale frameworks embedding the collaborative filters. The acquisitions are further corrupted by approximately Poissonian noise; as a separate multiscale filtering step, we consider attenuation of this Poissonian component upon a variance stabilization applied on the log-scale data. The presented fully automatic denoisers yield state-of-the-art results in denoising of real microtomography data.

(8)

(9)

ORIGINAL PUBLICATIONS INCLUDED IN THIS THESIS

Publication I M¨akinen, Y., Azzari, L. and Foi, A. (2020). Collaborative filtering of correlated noise: exact transform-domain variance for improved shrinkage and patch matching. IEEE Trans.

Image Process. 29, 8339–8354.

Publication II M¨akinen, Y., Marchesini, S. and Foi, A. (2021a). Ring arti- fact reduction via multiscale nonlocal collaborative filtering of spatially correlated noise. J. Synchrotron Radiat. 28.3, 876–888.

Publication III M¨akinen, Y., Marchesini, S. and Foi, A. (2021b). Ring arti- fact and Poisson noise attenuation via volumetric multiscale nonlocal collaborative filtering of spatially correlated noise.

Submitted to J. Synchrotron Radiat. (under review).

Author’s contribution

The author of the thesis is the first author and main contributor of all of the publications included in this work. The author was responsible for the main development and analysis of the algorithms, as well as all program code used in the experiments and the distributed packages. The author was also the main person responsible for the preparation of each of the manuscripts.

All co-authors have contributed to the research work with valuable comments and ideas, and helped with the preparation of the respective manuscripts. All publications were prepared under the supervision of Prof. Alessandro Foi.

(14)

(15)

1 INTRODUCTION

Whether a photograph, thermal image or a tomogram, all sensor-based imaging is subject to errors, called noise. Noise in images varies both in statistics and source; most commonly, noise is generated by the particle nature of light and thermal agitation, but can also be a result of the imaging pipeline or defects in the sensor. Generally, the lower the energy used in the imaging, the greater the relative effects of noise: this can be easily noted when comparing photographs taken in different lighting, but the same principle extends to other imaging methods, such as X-ray, where dose reduction may be desirable.

Hence, noise, as a physical phenomenon, cannot be avoided. It is then left for post-processing, called denoising, to estimate the noise-free image. This thesis focuses on modeling and removal of spatially correlated noise, i.e., noise whose value in any single pixel of an image is correlated with the noise of some other pixels. Such correlation is caused by, for example, pixel cross-talk, thermal fluctuations, and variations in sensor response.

Although many advancements have been recently proposed for indepen- dently distributed noise, especially through neural networks, (e.g., Agostinelli et al. (2013), Burger et al. (2012), Lefkimmiatis (2017), Mao et al. (2016), Tai et al. (2017), Xie et al. (2012), Zhang et al. (2017a) and Zhang et al. (2017b);

an extensive overview is provided in Tian et al. (2020)), the same progress has not yet been extended for correlated noise denoising. Although neural networks perform very well if trained precisely for a specific noise correlation, training for a more broad range can be difficult, as demonstrated in Appendix A. As noise is very rarely completely uncorrelated, it is crucial to allow for denoising of non-independent noise. Furthermore, some setups exhibit extremely long- range correlation, making proper modeling of the noise distribution essential for a successful denoising.

This work is built around the stationary additive correlated noise model,

(16)

meaning that the noise does not depend on the signal and the noise statistics stay constant across the image. Although few natural noise sources fit directly into this model, most noise distributions can be approximated in this form with relative ease, e.g., through variance stabilization of signal-dependent noise.

In this thesis, we focus specifically on denoising throughcollaborative filters, most prominent of which is the Block-matching and 3-D filtering (BM3D) denoiser (Dabov et al. 2007). Collaborative filters combine the principles of non-local filters, which leverage image self-similarity, and transform-domain methods, which perform shrinkage in a domain where the signal is sparse, i.e., concentrated to only a few non-zero coefficients. In particular, collaborative filters perform denoising through shrinkage on jointly transformed groups of similar patches extracted from the image. By utilizing the patch similarity, the joint spectra can offer enhanced sparsity compared to that of an individual patch.

Although neural networks are commonly regarded as state-of-the-art in independent noise removal, BM3D is still widely used due to its stability and execution speed generally superior to neural networks when executed on GPU or programmable hardware (Davy 2019; Davy and Ehret 2020; Davy et al.

2019; Wang et al. 2020), and avoiding the potentially expensive re-training common for neural networks. For example, the 2019 Huawei mobile processor Kirin was manufactured and marketed through its special hardware support for BM3D denoising (HiSilicon Kirin 990 5G Specifications 2019).

Although originally proposed for independent noise denoising, BM3D has been expanded for correlated noise removal already in 2008 (Dabov et al. 2008);

however, it has suffered from massive misapproximations of correlated noise variance in the joint transform domain. Indeed, in this work (Publication I), we note that the collaborative transform-domain shrinkage at the core of the denoiser crucially depends on accurate calculations of transform-domain noise variances, which have previously been only crudely approximated. Implement- ing the exact calculation of these variances within BM3D and the volumetric denoiser BM4D (Maggioni et al. 2012b), we achieve massive improvements in denoising of strongly correlated noise. We further apply the exact variances to noise estimation, allowing a fully blind denoising setup.

Correlated noise is abundant in many imaging applications. Previously, the

(17)

insufficient modeling of the transform-domain variances has prevented the application of collaborative filtering to the denoising of strongly correlated noise, especially of noise with long range correlation. We consider specifically the case of computed microtomography, where an object is imaged across multiple angles for 3-D reconstruction. Any defects in the imaging system hence create streak noise with very long-range correlation across the angular dimension. In the reconstructed images, this noise presents as concentric circular distortions, commonly referred to as ring artifacts.

We note that the streak noise is, upon a logarithmic transformation, approximately modeled through the additive correlated noise model, which we consider through a multiscale framework. Through a separate noise variance estimation at each scale, most streaks can be attenuated by the collaborative filter even with a simple line kernel noise model. We consider both 2-D (Publi- cation II) and 3-D (Publication III) frameworks for denoising of the projection data. As the imaging operates through a photon-counting detector, some approximately Poissonian noise is necessarily included in the acquisitions. In Publication III, as a separate multiscale filtering step, we consider the attenuation of this Poissonian component upon a variance stabilization applied on the log-scale data. The presented denoisers are evaluated on both synthetic and real microtomography data, yielding state-of-the-art results.

Software packages for BM3D, BM4D, and the tomography denoising frameworks are publicly available, described in Appendix B.

1.1 Contributions and outline of the thesis

The relations of contributions of the thesis are illustrated in Figure 1.1. The core of this work is the computation of exact collaborative transform-domain noise variances, which are utilized in shrinkage, patch-matching and aggregation for improved BM3D and BM4D denoising. Furthermore, we utilize the proposed exact variance formula for noise power spectrum estimation and blind denoising. The improved collaborative denoisers are used through the additive correlated noise model for multiscale filtering of streak noise for sinogram data in computed microtomography, as well as Poissonian noise attenuation upon stabilization.

(18)

Publication I Publication II Publication III

streak model

streak estimation

multiscale streak attenuation exact variances

(Section 2.4.4)

improved BM3D

3-D streak estimation improved BM4D

multiscale 3-D streak attenuation

multiscale Poisson attenuation

collaborative filtering(Section 2.4)

non-local filtering(Section 2.3.2) transform-domain filtering(Section 2.3.3)

correlated noise model(Section 2.2)

refiltering of denoising residual noise variance

estimation (Section 2.5)

variance stabilization (Section 2.6.3) estimationPSD

(Chapter 3)

Figure 1.1 Illustration of the contributions of the thesis, marked with solid boxes, as well as some of the core components used in the advancements. Arrows indicate the main components in building of each contribution. The italics correspond to locations within this thesis.

The thesis is based on the three publications Publication I, Publication II, and Publication III, developed as follows. In Chapter 2, we discuss different sources of correlated noise, present relevant noise models, and look at collaborative denoising as a combination of non-local and transform-domain denoising principles. We consider some of the problems in conventional collaborative denoising, and show the exact computation of collaborative transform-domain noise variances. We discuss practical application of the collaborative filter in combination with various noise models, and introduce briefly the imaging setup and noise of X-ray microtomography. In Chapter 3, we present an application of the exact non-local transform-domain variances to noise estimation through collections of jointly transformed groups of similar patches. In Chapter 4, we summarize the contributions of the publications included in this work. Chap- ter 5 concludes the thesis.

(19)

2 BACKGROUND

2.1 Correlated noise

Correlated noise refers to noise for which different values are not completely independent from each other, present in varying levels in most real imaging.

In digital cameras, the charge accumulated by a detector is commonly influ- enced not only by the photons which directly reach the sensor, but also by the charges of the surrounding detectors, meaning that noise in spatially nearby detectors will be correlated with each other. Furthermore, electrical coupling of quantities may occur during the reading and transportation of charges from the detectors, inducing correlation in the resulting values; this type of correlation is often referred to as cross-talk. (Azzari et al. 2018)

Pink noise is correlated noise characterized by its power spectrum recipro- cal to the frequency; some amount is created by all analog electronic devices due to resistor voltage fluctuations (Weissman 1988). Various patterned noise types may also be modeled as correlated noise; such patterns include grid-like structures, band-pass noise, and line patterns commonly found in analog videos (Aizenberg and Butakoff 2002; Varghese 2016).

In X-ray medical imaging, noise affecting the projections is often assumed correlated. This correlation is commonly caused by spatial blurring of signal and noise, which may happen due to X-ray spread, scintillator scattering the emitted light, or sensor cross-talk (Hajdok et al. 2008; Lubberts 1968). Cor- relation may also be induced by the reconstruction process (Riederer et al.

1978). Correlated noise models have been proposed, for example, for flat-panel imagers (Samei 2003; Siewerdsen et al. 1998), flat-panel cone-beam computed tomography (Steven Tilley et al. 2014), film mammograms (Burgess 1999), and digital breast tomosynthesis (Zheng et al. 2017).

Thermal cameras are also affected by spatially correlated noise, as well as a

(20)

significant fixed-pattern component, i.e., bias in the capture depending on the sensor pixel element; with multiple captures in time, these bias stay relatively constant and such may be considered as a case of very long-range correlation (Kennedy 1993; Maggioni et al. 2014; Mooney and Shepherd 1996).

Another similar type of correlation is often found in satellite and remote- sensing imagery, particularly those captured using push-broom sensors and radar (Abramov et al. 2017; Chen et al. 2003; G´omez-Chova et al. 2008). The sensors for each wavelength of the push-broom system are located within a thin slit perpendicular to the movement of the containing platform. As a result, each pixel within a column of the resulting image is acquired by the same pixel sensor, meaning that uncompensated variations in sensor sensitivity or slit size will result in approximately fixed-pattern stripe noise across this dimension.

(G´omez-Chova et al. 2008)

In fact, any setup where per-sensor differences in sensitivity induce noise and which involves capturing a sequence such that the same sensor is used to capture multiple pixels is prone to noise correlation between these pixels.

Another prominent case with such correlation is X-ray microtomography, where the use of radiation, and the very small size and high precision needed for the components make such defects common (M¨unch et al. 2009; Vo et al. 2018).

2.2 Modeling stationary correlated noise

A standard way to model correlated noise η is through its correlation kernel g, η = ν ⊛ g, where ⊛ is the convolution operator¹, and ν is statistically independent noise. This yields a stationary model, meaning that the noise statistics are consistent throughout the image. We consider specifically the denoising ofstationary additive Gaussian noise² for which a noisy observation

1In the notation and the derivation of the formulas, the convolution is always treated as circular, which allows to adopt the duality with multiplication in Fourier domain. However, circularity ofη is an ideal hypothesis that is not encountered in real acquisitions; hence, for added realism, all of the experiments in the thesis and publications are based on non-circular noise.

2There are simple and effective ways for extending additive Gaussian noise models to other types of noise; such are discussed in Section 2.6.

(21)

z:X→Rcan be modeled as

z(x) =y(x) +η(x), x∈X, η =ν⊛g, ν(·)∼ N(0,1),

(2.1)

where y is the deterministic noise-free image³, x∈X⊂Z^d is the coordinate in the finite d-dimensional regular image domain X, and ν is a independent and identically distributed (i.i.d.) random variable following a normal distribution with mean 0 and variance 1; the variance of η is then ∥g∥²₂.

Another way of characterizing stationary correlated noise is by its power spectral density (PSD), defined as the noise variance in Fourier domain. The PSD Ψ can be computed as

Ψ = var{F[η]}

= var{F[ν]F[g]}= var{F[ν]}|F[g]|²

=|X||F[g]|²,

(2.2)

where F denotes the d-dimensional Fourier transform. The scaling by |X| comes from the adopted normalization of F, where the Euclidean norm of each basis element is equal to √︁

|X|.

Stationary correlated noise is often calledcolored noise, whereas i.i.d. noise is white; other colors are sometimes used to characterize different correlated noise PSDs loosely based on the respective color wavelengths. However, correlation kernels can be of any shape, and as such not all types of colored noise have a corresponding ”color”.

Notably, white noise can be represented as an extreme case of correlated noise where g is a Dirac delta scaled to the noise standard deviation; in this case, Ψ simplifies to a constant var{η}|X|. Some examples of noise correlation are shown in Figure 2.1 reproduced from Publication I, and their mathematical definitions in Table 2.1 reproduced from Publication I.

3We refer to these data arrays asimages, and to individual data points aspixels regardless of dimensionality; e.g., in volumetric data, they correspond to volumes and voxels.

(22)

gw ν⊛gw |F[gw]|²

g1 ν⊛g1 |F[g1]|²

g2 ν⊛g2 |F[g2]|²

g3 ν⊛g3 |F[g3]|²

g4 ν⊛g4 |F[g4]|²

Figure 2.1 Left: correlation kernels (displayed on a 71×71-pixel canvas). Center: correlated noise fields obtained by convolving the kernels with white noise. Right:

corresponding power spectral densities Ψ. The horizontal kernel g₁ is rep- resentative of sensor crosstalk in digital image acquisition, g2 is represen- tative of band-pass noise, and g3 is a line pattern noise common to analog videos (Aizenberg and Butakoff 2002; Goossens 2008). The kernelg4realizes an example of pink noise, which is found in analog electronic devices due to resistor voltage fluctuations (Weissman 1988). Asg_wis a Dirac delta,ν⊛g_w is white noise and the corresponding PSD is flat.

(23)

Table 2.1 Reproducible models for the noise correlation examples in Figure 2.1, where x⁽¹⁾ andx⁽²⁾ denote the horizontal and vertical integer coordinates, andG10

denotes a Gaussian function with standard deviation10centered at the origin.

The first three models are defined by their kernel, upon suitable renormaliza- tion to a desired variance levelvar{η}=∥g∥²2. The last is defined through its PSD|F[g4]|² of equal size to the image.

g1 max{0,1− |x⁽²⁾|}max{0,16− |x⁽¹⁾|}

g2 cos(︁(︁

(x⁽¹⁾)²+ (x⁽²⁾)²)︁1/2)︁

G¹⁰(x⁽¹⁾, x⁽²⁾) g3 cos(x⁽¹⁾+x⁽²⁾)G¹⁰(x⁽¹⁾, x⁽²⁾)

|F[g4]|² (︁

((x⁽¹⁾)²+ (x⁽²⁾)²)︁1/2

+ 10⁻²|X|^1/2)⁻¹

2.3 Path to collaborative denoising

The goal of denoising is to produce an estimate yˆ of y from z, either blindly or by leveraging the (estimated) noise statistics, i.e. Ψ. This section focuses on the progression from local filters to transform-domain filters andnon-local methods, the principles of which form the basis for collaborative filters from which this work is born.

2.3.1 Local filtering

Local denoising methods estimate values for each pixel by utilizing its neighboring pixel values within a a small window. A basic local filter is the Nadaraya- Watson filter (Nadaraya 1964; Watson 1964). Both proposed to estimate the signal as a locally weighted average through a Gaussian kernel applied through the Euclidean distances between the neighboring pixel and the referencepixel:

yˆ(x_R) =

∑︁

x∈XG(∥x_R−x∥2)z(x)

∑︁

x∈XG(∥xR−x∥2) , (2.3) where x_R is the d-dimensional pixel coordinate of the reference pixel, G is a Gaussian kernel and X is the d-dimensional image domain in practice limited to the support ofG. Although fairly effective for smoothing, the filter does not perform well in preserving non-uniformities or removing proportionally large errors, as the weights are based on positional distance alone.

A similar, but non-linear local filter is the median filter: each resulting pixel

(24)

value is the median of pixel values within its limited neighborhood. Although it suffers from many of the same problems as a weighted average, the median filter can be effective in removing extreme outliers; unlike the mean, the median of a window is not likely to change significantly from a single pixel value, no matter how large. The median filter has also been applied in combination with linear filtering, e.g., by Frieden (1976).

Local filters also include polynomial regression (Cleveland and Devlin 1988;

Fan and Gijbels 1996; Savitzky and Golay 1964; Stone 1977): for each reference pixel, a low-degree polynomial of appropriate dimensionality is fitted (e.g., with least squares) into the window, and can then be used to obtain the pixel estimate.

Not all local filters discard the pixel values in weighting. In particular, bi- lateral filters (Tomasi and Manduchi 1998) compute the weights of a weighted average through a bi-variate Gaussian taking into account the positional distance but also the intensity difference between the two pixels. In this way, the filter better preserves both edges and texture.

2.3.2 Non-local filtering

While local filters utilize a small neighborhood of pixels around the reference pixel,non-localfilters may collect information from around the full noisy image.

Crucially, this allows for leveraging similarity of features around different areas in the image (De Bonet 1997). The core work of this category of denoisers is the Non-Local Means (NLM) algorithm (Buades et al. 2005). The core of NLM denoising is also a weighted average, but instead of a local neighborhood, NLM finds similar neighborhoods from the image which are then used as the elements of the weighted mean. The ℓ²-distance is commonly used as the similarity measure. We denote a patch containing the limited neighborhood of pixel by its coordinate x, and the contents of the patch by z_x. Further denoting the coordinate of thereference patchasxR, a distance functionLx_R(x) representing the inverse ℓ²-similarity to the reference patch can be written as

L_x_R(x) =∥z_x−z_x_R∥2. (2.4)

(25)

The chosen distance function can then used to inversely weight each pixel of the image to give more weight to pixels with similar neighborhoods:

y

ˆ(x_R) = ∑︂

x∈X

e^−L²^x^R^(x)H⁻²z(x)

∑︁

x∈Xe⁻^L²^x^R^(x)H⁻², (2.5) where z(x) is the pixel of z at x, and H is a parameter which relates to the degree of filtering. A similar method may also be used to simultaneously obtain estimates for each pixel of zx_R, hence aggregating multiple estimates for each pixel due to overlapping patches (Buades et al. 2011).

Many works aim to improve parts of the NLM algorithm, such as the patch- selection, shape, and weighting (Deledalle et al. 2012; Kervrann and Boulanger 2006; Kervrann and Boulanger 2008). Furthermore, an extension to NLM, called NLM-C, has been proposed specifically for correlated noise denoising through an iterative framework which considers the noise covariance in patch selection (Goossens et al. 2008).

2.3.3 Transform-domain filtering

Transform-domain filtering utilizes properties of natural images to separate signal from noise. The filtering is commonly performed in a domain where the noise-free signal is sparse, i.e. concentrated to only a few significantly non-zero coefficients (Donoho 1995; Donoho and Johnstone 1995). Noise in any other coefficient can hence be removed with little loss of signal details. Common sparsifying transforms include the discrete Fourier transform (DFT) and discrete cosine transform (DCT). Furthermore, various families of wavelets are commonly used for multi-scale analysis, such as Beylkin et al. (1991), Coifman and Meyer (1991), Daubechies (1988) and Haar (1911). Using separate transforms for different scales of the image can offer improved flexibility, for example to preserve both large smooth areas and small-scale details. These multiscale transforms are often built through specialized wavelets, e.g. Antoine et al.

(1996), Labate et al. (2005) and Starck et al. (2002).

(26)

2.3.4 Shrinkage

The core of transform-domain filters is shrinkage in the chosen transform domain. The shrinkage operator is a non-linear filter applied separately on each transform coefficient, leveraging the enhanced sparsity of the domain to attenuate noise. Denoting by T^dD andQ^dD thed-dimensional forward and inverse transforms, respectively, shrinkage of a patch⁴ at a coordinatexinT^dD transform domain can be expressed as

y

ˆ_x =Q^dD(Υ(T^dD(z_x))), (2.6) where yˆ_x is the resulting patch estimate, and Υ is the non-linear shrinkage operator. We denote by s^x_i =⟨︁

z_x, b^dD_i ⟩︁

,i= 1, . . . , N, whereN is the total pixel count of the patch and b^dD_i is the i-th basis function of T^dD, a coefficient ofz_x in the T^dD domain; a generic shrinkage operator can then be expressed by

Υ :s^x_i ↦−→αis^x_i, (2.7) where αi ∈ [0,1] is a shrinkage attenuation factor which depends on s^x_i, the noise statistics, and possible other priors. Coefficients small in relation to the corresponding transform-domain noise variance v_i are commonly assumed to be mostly noise (as they are likely to arise fromN(0, v_i)), and as such assigned a small α_i.

2.3.4.1 Denoising by thresholding

Hard-thresholding is a simple yet effective shrinkage operation performed by setting spectrum coefficients smaller than a threshold √v_iλto zero:

α^HT_i =

⎧⎨

⎩

1 if |s^x_i| ≥√viλ 0 otherwise,

(2.8)

4In case of filters which consider the full image at once, this patch is simply of the image size.

(27)

whereλ≥0 is a fixed constant andviis the noise variance ofs^x_i, which for white noise is equal to ∑︁N

i=1∥b^dD_i ∥²2var{η}. Hard-thresholding can offer very good performance when the signal is concentrated to only few coefficients, as any coefficient not exceeding the threshold will be completely eliminated. However, no noise will be attenuated from the coefficients with significant signal, meaning that noise in such coefficients will be preserved.

Another thresholding function is soft-thresholding, where, on top of setting to zero coefficients smaller than the threshold, all larger coefficients are shrunk by the threshold magnitude. As such, some noise is eliminated even from large coefficients. Although soft-thresholding commonly achieves a smoother appear- ance, the average error tends to be greater than that of the hard threshold, making hard-thresholding a common choice despite its simplicity (Johnstone 2019).

Considering an orthonormal T^dD, the thresholding functions can be seen as an optimization of quadratic fidelity and an ℓ^p-penalty promoting sparsity, where smaller pcorresponds to higher sparsity (Johnstone 2019):

y

ˆ_x = argmin

yx

(∥z_x−y_x∥²2+c∥T^dD(y_x)∥p),

where cis a constant scaling the penalty, relating to the threshold value. Set- ting p = 1 yields a soft-thresholding in (2.6), whereas p = 0 (with ℓ⁰ defined as the number of non-zero elements) yields hard-thresholding.

Choice of threshold

For ideal noise attenuation, the threshold should be large enough to cover all noise extremes. However, if the signal is nonzero in the thresholded coefficient, too large a threshold will also attenuate significant amounts of signal. For this reason, the preferred threshold is commonly just above the expected maximum of noise (Mallat 1999), which is proportional to the transform-domain noise standard deviation √v_i. An optimized threshold can also be estimated through minimizing an expected risk (e.g., the expected squared error) based on the noisy data itself, as used in Donoho and Johnstone (1995) through the Stein unbiased risk estimate (Stein 1981). Adaptive estimation for sub-band thresholding is further developed in Chang et al. (2000).

(28)

Universal threshold

A commonly used approximation of the expected maximum of white Gaus- sian noise, the so-called universal threshold (Donoho and Johnstone 1994), is

√v_i√︁

2ln(N); this leads to a natural choice λ = √︁

2ln(N). The curious de- pendence on the sample size arises from the tail of the Gaussian distribution, raising the likelihood of increasingly large coefficients as the sample size in- creases. The universal threshold gives only the upper bound for the expected maximum for white Gaussian noise; it is nevertheless commonly employed due to its simplicity (Johnstone 2019). The universal threshold is not optimal in terms of risk; in general, a lower threshold reduces this expected error, and is commonly preferred in practical applications (Johnstone 2019; Mallat 1999).

Thresholding for correlated noise

Hard-thresholding (2.8) is directly applicable to correlated noise denoising.

In such case, the noise variance of each transform-domain coefficient depends on the power spectrum of the noise. The theoretical basis of the universal threshold √v_i√︁

2ln(N) considers only white noise, although it is also used in thresholding correlated noise. In practice, it indicates only an upper bound (of an upper bound) for correlated noise, noting that the probability of the sample maximum to exceed a given threshold is highest when standardized samples are independent (Johnstone and Silverman 1997).

2.3.4.2 Wiener filter

Wiener filter is a filter defined through minimization of the mean squared error between the estimate and an oraclereference signal representing the underlying noise-free data. The mean squared error of an estimateˆ of a noise-free signalr r can be written as a sum of bias and variance as

MSE(r,rˆ) = E{(rˆ−r)²}= var{rˆ}+ Biasr{rˆ}². (2.9) Considering a noisy coefficient s^x_i =r^x_i +n^x_i, where r_i^x is the underlying noise- free coefficient and n^x_i is the zero-mean noise component, the shrinkage oper-

(29)

ation in (2.7) with a factorαi gives rˆ^x_i =αis^x_i =αir_i^x+αin^x_i. Then, argmin

αi∈[0,1]

(var{rˆ^x_i}+ Biasr^x_i {rˆ^x_i}²) = argmin

αi∈[0,1]

(var{αin^x_i}+ (E{αir_i^x} −r^x_i)²)

= argmin

αi∈[0,1]

(α²_ivi+ (αir_i^x−r_i^x)²), which yields the attenuation factor

α^wie_i = |r_i^x|²

|r^x_i|²+v_i. (2.10) Instead of using directly the mean squared error, adding scaling factors to the bias or variance of (2.9) can be used to adjust the balance between minimization of the two sources of error. In practical applications, an oracle signal is not available; in such cases, an image estimate from another filter, e.g., hard- thresholding, can be used as the reference signal (Ghael et al. 1997).

An example of a simple transform-domain filter utilizing patch shrinkage is the sliding patch filter (Oktem et al. 1998; Yaroslavsky et al. 2001), in which a patchzx is extracted from the image at each coordinatex. Each patch is then filtered as in (2.6); commonly, the full resulting patch-estimates yˆ_x are aggre- gated to form the image estimate. The basic model uses rectangular blocks as patches; shape-adaptive alternatives have also been proposed to improve denoising performance (Foi et al. 2006; Foi et al. 2007). A particularly successful approach for denoising of white as well as correlated noise is to model the transform-domain coefficient neighborhoods through multiscale Gaussian scale mixtures, where clusters of coefficients are modeled as a product of a Gaussian vector and a positive scaling variable; from this model, the noise-free coefficients can be effectively estimated through Bayesian least-squares (Portilla et al. 2003).

2.3.4.3 Transform-domain noise power spectrum

To calculate the noise variance v_i of the T^dD spectrum coefficients needed for accurate shrinkage, we note thats^x_i =⟨︁

z_x, b^dD_i ⟩︁

=(︁

z⊛^←^→b_i^dD)︁

(x), where the ^←^→

(30)

decoration denotes the reflection about the origin of Z^d. Thus var{s^x_i}= var{︂(︂

ν⊛g⊛^←^→b_i^dD)︂

(x)}︂

= var{ν}⃦⃦⃦g⊛^←^→b_i^dD⃦⃦⃦²

2,

showing that this variance does not depend on the coordinate x of the patch.

Hence, we have adopted the simple notation v_i= var{s^x_i}. Since var{ν}= 1, v_i =⃦⃦⃦g⊛^←^→b_i^dD⃦⃦⃦²

2 =

⃦⃦

⃦⃦|X|⁻²Ψ⃓⃓⃓F[︂_←_→ b_i^dD]︂⃓⃓⃓²

⃦⃦

1

, (2.11)

where the last equality follows from Parseval’s isometry and (2.2).

2.4 Collaborative filtering and BM(d+1)D

Although effective in many scenarios, the transforms used in patch-based transform-domain filters generally require a fairly small patch size for good sparsity;

this severely limits the information available for separating signal from noise.

On the other hand, non-local averaging filters gain a significant boost by leveraging redundancy through nonlocality, but are severely limited by the linear nature of the weighted average estimators.

In the light of this,collaborative filters, first proposed by Dabov et al. (2007) for white noise denoising, were designed to combine the principles of both nonlocality and transform-domain denoising; in particular, collaborative filters perform shrinkage on collectively transformed groups of similar patches. As collaborative filters are at the core of this work, the following sections from Section 2.4.1 to Section 2.4.3 are dedicated for an in-depth look at a general conventional BM(d+ 1)D⁵ denoising algorithm particularly through the implementation of BM3D for correlated noise (Dabov et al. 2008). Other extensions of BM3D include volumetric denoising (Maggioni et al. 2012b) and video denoising algorithms using both 3-D (Maggioni et al. 2012a) and 2-D patches (Maggioni et al. 2014).

Collaborative filters process the image through a moving reference patch, positioned at each point of a grid covering the full image. For example, we

5We consider the denoising of ad-dimensional image, hence rendering thenon-local stack dimension thed+1-th.

(31)

may obtain reference patch coordinates x_R every 3 pixels along each axis, yielding roughly |X|/3^d reference patch locations. For each reference patch, the following steps are executed:

1. Collect similar patches into a group through patch-matching.

2. Obtain the (d+1)-D transform spectrum by collectively transforming the group of patches.

3. Perform shrinkage.

4. Transform the shrunk spectra back to patch estimates andaggregatethem to the original locations from which they were collected.

The common implementation of both BM3D (Dabov et al. 2008) and BM4D (Maggioni et al. 2012b) include two distinct denoising stages, both of which perform each of these steps. In the first stage, shrinkage is performed through hard-thresholding; the second stage uses a Wiener filter. In the second stage, the hard-thresholding image estimate is then used both in patch-matching and as the reference signal of the Wiener filter.

2.4.1 Patch-matching

Like other non-local methods, collaborative filters compare patches around a moving reference patch atx_R. Each potential patch atx is evaluated through the ordering function L_x_R(x) in order to select the best M (possibly overlapping) patches to construct the group{z_x₁, . . . , z_x_M}extracted fromzat coordinatesx₁, . . . , x_M, respectively. In practice, the goal is often to find the patches which are most similar to the reference patch in terms of the underlying noise- free content. AsLoperates on noisy patches, the difference between noise-free content can only be estimated or approximated. One may use directly the ℓ²-norm of the difference of the patches as in (2.4), difference of their spectra L_x_R(x) =∥s^x_i^R−s^x_i∥²₂, or other, non-Euclidean distance measures (Rubel et al.

2014). To reduce the effects of noise, many methods (e.g., Dabov et al. (2008) and Rubel et al. (2015)) include a scaling of the difference by the respective standard deviations of the coefficients such that

L_x_R(x) =

⃦⃦

s^x_i^R−s^x_i

√vi

⃦⃦

2

. (2.12)

(32)

This aims to give less weight to those coefficients which are presumed to be especially noisy. However, such weighting poses problems particularly when few coefficients have very small variance. In such case, the patch-matching can be entirely guided by only a few coefficients with exceedingly large weights.

The issue is not merely numerical, but conflicts with the main goal of patch- matching, i.e., similarity between the grouped underlying noise-free patches;

similarity of low-noise coefficients within two patches does not necessarily imply similarity between other, noisy coefficients of the same patches.

In the second denoising stage, the patch-matching is performed on the hard- thresholded image estimate yˆ^HT instead ofz to drastically reduce the effect of noise on the ordering. As this estimate can reasonably be presumed to be a noise-free approximation of the clean signal, this matching is commonly done without any compensation for noise.

2.4.2 Shrinkage

The core of collaborative filtering is shrinkage in the T^(d+1)D domain of the group of similar patches. To obtain the group spectrum for collaborative shrinkage, each patch of the group {z_x₁, . . . , z_x_M} is first transformed locally through a d-dimensional transform T^dD. The transformed patches are then

”stacked” to form a (d+ 1)-dimensional volume [s^x_i¹, . . . , s^x_i^M], i= 1, . . . , N through which a separate one-dimensional transform T^NL of lengthM is then applied to produce the jointly-transformed group, for which a generic coefficient can be written as

s^x_i,j¹^,...,x^M =⟨︂

[z_x₁;· · · ;z_x_M], b^dD_i ⊗b^NL_j ⟩︂

=

=⟨︁

[s^x_i¹,· · · , s^x_i^M], b^NL_j ⟩︁

=

∑︂M t=1

b^NL_j (t)s^x_i^t, (2.13)

where b^NL_j (t) is the t-th element of the j-th basis function b^NL_j of T^NL, and [·;· · ·;·] denotes the stacking along the (d+1)-th dimension; notation of the transforms and stacking are illustrated in Figure 2.2 reproduced from Publi- cation I. A generic shrinkage of the combined T^(d+1)D spectrum can then be

(33)

written as

s^x_i,j¹^,...,x^M ↦−→α^x_i,j¹^,...,x^Ms^x_i,j¹^,...,x^M, (2.14) where α^x_i,j¹^,...,x^M is commonly computed as either

α^HT_i,j =

⎧⎨

⎩

1 if ⃓⃓⃓s^x_i,j¹^,...,x^M⃓⃓⃓ ≥√v_iλ 0 otherwise,

(2.15)

for hard-thresholding, or

α^wie_i,j=

⃦⃦

⃦⟨︂[︁

yˆ^HT_x₁ ;· · · ;yˆ^HT_x_M]︁

, b^dD_i ⊗b^NL_j ⟩︂⃦⃦⃦²

⃦⃦

⃦⟨︂[︁

y

ˆ^HT_x₁ ;· · · ;yˆ^HT_x_M]︁

, b^dD_i ⊗b^NL_j ⟩︂⃦⃦⃦²+v_i

, (2.16)

for the Wiener filter, where yˆ^HT is the image estimate produced by the hard- thresholding stage. For the sake of simplicity and based on empirical tuning, BM3D uses a fixed shrinkage threshold λ independent of N or M instead of directly the universal threshold√︁

2 ln(N M) (whereN M is the total maximum sample size of the stack), specifically λ= 2.7 for white noise (Dabov et al.

2007) and a slightly larger λ= 2.9 for correlated noise (Dabov et al. 2008).

The common parameters used in these implementations,N = 64 andM = 16, would instead result in a considerably largerλ≈3.7. Furthermore, considering solely the expected noise maximum (as with the universal threshold) as the basis forλ, the threshold of correlated noise would be expected to be smaller than that of white noise, further deviating these values from the theoretical guidelines of Johnstone and Silverman (1997).

In Publication I, we analyze the selection of λ, showing that the optimal choice for collaborative filtering depends significantly on the noise PSD, and does not follow directly the theoretical relation, where increased correlation would imply a smaller threshold. Hence, to solve the selection of parameters for a specific input PSD, Publication I resorts to a simple learning-based approach.

(34)

z

z_x₂

s_, s_,

1,1 x1

1

s^x₄¹ s^x₃²

s^x₃³

x1

s _, x₁

x2 x₁

s s ^,x²^,x³

x₁,x₂,x₃

,x2,x₃

,x₂,x₃ 41

x1 33 32x1

z

T^2D

T^NL

zx₃

T^2D T^2D

Figure 2.2 Notation and indexing of patch coordinatesx_k, patchesz_x_k, and coefficients s^x_i^k ands^x_i,j¹^,...,x^M in the correspondingT^dD andT^(d^+1)D spectra. The illustration is for a group of three patches of size2×2at coordinatesx₁= (4,3), x₂= (7,5),x₃= (8,6)within a10×10-pixel image (d= 2).

2.4.2.1 Noise variance of the T^(d+1)D spectrum

The variance of the (d+ 1)-D spectrum coefficients can be written as var{s^x_i,j¹^,...,x^M}= var

{︄ _M

∑︂

t=1

b^NL_j (t)s^x_i^t }︄

=

∑︂M k=1

∑︂M t=1

b^NL_j (k)b^NL_j (t) cov{s^x_i^t, s^x_i^k}=

=

∑︂M t=1

(︁b^NL_j (t))︁2

var{︁

s^x_i^t}︁

+∑︂

k̸=t

b^NL_j (k)b^NL_j (t) cov{s^x_i^t, s^x_i^k}=

=v_i

∑︂M t=1

(︁b^NL_j (t))︁2

+∑︂

k̸=t

b^NL_j (k)b^NL_j (t) cov{s^x_i^t, s^x_i^k}, (2.17)

where cov{s^x_i^t, s^x_i^k} is the covariance between same T^dD spectrum coefficients for different patches.

Despite the second term showing a clear difference to thed-dimensional case without the stack transform, one may note that the notation of variancev_iused in (2.15) and (2.16) is identical to those of Section 2.3.4. Indeed, all conventional collaborative filters, apart from the video denoising algorithmRF3D, use transform-spectrum noise variance directly adopted from the local transform filters. The common assumption allowing this simplification (e.g., Dabov et al.

(35)

(2007), Dabov et al. (2008) and Matrecano et al. (2012)) is the independence of the noise between stacked patches; in other words, noise may be correlated within each patch, but not across distinct patches. Thus, the covariance terms in (2.17) are all 0; under the further assumption that T^NL is orthonormal, (2.17) then simplifies to (2.11):

var{s^x_i,j¹^,...,x^M} ≈ v_i

∑︂M t=1

(︁b^NL_j (t))︁2

=v_i, (2.18)

meaning that T^(d+1)D spectrum variances are identical to those of T^dD and hence become independent of the patch coordinates.

Although the approximation (2.18) is used by BM3D and its derivative works, the requirement of independence between patches is not always fulfilled.

It obviously fails if the patches are overlapping, as already noted by Dabov et al. (2007); furthermore, correlation in the noise may extend between two different patches, failing the requirement even when they do not overlap. This results in very large approximation errors of the variances; to demonstrate this issue, we compare the approximations v_i to the real spectrum variances in Figure 2.3 reproduced from Publication I for white noise, and in Figure 2.4 reproduced from Publication I for correlated noise. The calculation of the variances presented in the coming Section 2.4.4 and in Publication I is exact even when this requirement is not met.

(36)

white Gaussian no is e

^-4

0 4

0.9 1

1.1 j=1

0.9 1

1.1 j=2

0.9 1

1.1 j=3

0.9 1

1.1 j=4

-4 0 4

0.8 1 1.4 j=1

0.8 1 1.4 j=2

0.8 1 1.4 j=3

0.8 1 1.7 j=4

Figure 2.3 Comparison between local transform-domain variancesv_iand the exact variances of the joint spectrumvar{s^x_i,j¹^,...,x^M}. Each group of plots shows a 1-D noise signal (black line) with unit variance from which we extractM= 4seg- ments ofN= 16samples (colored shaded areas). We compute the transform of each group using two 1-D orthonormal transforms (DCT of each segment followed by the Haar transform applied across the segments), equivalent to T^dD and T^NL in the d-D case. Finally, we show 4 bar-plots that repre- sent the ratio √︂

v_i/var{s^x_i,j¹^,...,x^M}, withj = 1, . . . ,4. Top: white Gaussian noise with non-overlapping segments. Bottom: white Gaussian noise with overlapping segments. Correlated examples are shown in Figure 2.4.

(37)

correlated Gaussian noise

-4 0 4

0.9 1

1.1 j=1

0.9 1

1.1 j=2

0.9 1

1.1 j=3

0.9 1

1.1 j=4

-4 0 4

0.7 1

1.6 j=1

0.9 1

1.3 j=2

0.7 1

1.3 j=3

0.7 1

1.3 j=4

-4 0 4

0.5 1

2.3 j=1

0.5

5.5 j=2

0.8

8 j=3

0.8

2.4 j=4

Figure 2.4 Comparison between local transform-domain variancesv_iand variances of the joint spectrumvar{s^x_i,j¹^,...,x^M}for correlated noise, shown as in Figure 2.3.

Top-to-bottom: 1) non-overlapping distant segments, 2) non-overlapping close segments, and 3) overlapping segments. Note the massive differences between v_i and var{s^x_i,j¹^,...,x^M}, especially when the patches overlap; v_i is exact only in the topmost case of Figure 2.3 with non-overlapping patches of i.i.d. noise.

Exact Transform-Domain Variances for Collaborative Filtering of Correlated Noise

Exact Transform- Domain Variances for Collaborative Filtering of

Correlated Noise

YMIR MÄKINEN

YMIR MÄKINEN

Exact Transform-Domain Variances for Collaborative Filtering of Correlated Noise

PREFACE

ABSTRACT

CONTENTS

ORIGINAL PUBLICATIONS INCLUDED IN THIS THESIS

Author’s contribution

1 INTRODUCTION

1.1 Contributions and outline of the thesis

Publication I Publication II Publication III

2 BACKGROUND

2.1 Correlated noise

2.2 Modeling stationary correlated noise

2.3 Path to collaborative denoising

2.3.1 Local filtering

2.3.2 Non-local filtering

2.3.3 Transform-domain filtering

2.3.4 Shrinkage

2.4 Collaborative filtering and BM(d+1)D

2.4.1 Patch-matching

2.4.2 Shrinkage

white Gaussian no is e

correlated Gaussian noise