Richardson–Lucy deconvolution
The Richardson–Lucy algorithm, also known as Lucy–Richardson deconvolution, is an iterative procedure for recovering an underlying image that has been blurred by a known point spread function. It is named after William Hadley Richardson and Leon B. Lucy, who described it independently.[1][2]
Description
When an image is produced using an optical system and detected using photographic film or a charge-coupled device, for instance, it is inevitably blurred: an ideal point source does not appear as a point but is spread out into what is known as the point spread function. Extended sources can be decomposed into the sum of many individual point sources, so the observed image can be represented in terms of a transition matrix p operating on an underlying image:
- [math]\displaystyle{ d_{i} = \sum_{j} p_{i,j} u_{j}\, }[/math]
where [math]\displaystyle{ u_{j} }[/math] is the intensity of the underlying image at pixel [math]\displaystyle{ j }[/math] and [math]\displaystyle{ d_{i} }[/math] is the detected intensity at pixel [math]\displaystyle{ i }[/math]. The matrix element [math]\displaystyle{ p_{i,j} }[/math] describes the fraction of light from source pixel j that is detected in pixel i. In most good optical systems (or, in general, linear systems that are described as shift invariant) the transition matrix p can be expressed simply in terms of the spatial offset between the source pixel j and the observation pixel i:
- [math]\displaystyle{ p_{i,j} = P(i-j) }[/math]
where [math]\displaystyle{ P(\Delta i) }[/math] is called a point spread function. In that case the above equation becomes a convolution. This has been written for one spatial dimension, but of course most imaging systems are two dimensional, with the source, detected image, and point spread function all having two indices. So a two dimensional detected image is a convolution of the underlying image with a two dimensional point spread function [math]\displaystyle{ P(\Delta x,\Delta y) }[/math] plus added detection noise.
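As a concrete illustration of this forward model, the following NumPy sketch blurs a one-dimensional image containing two point sources with a normalized Gaussian point spread function and adds Poisson detection noise; the signal, the PSF width, and the variable names are illustrative assumptions of this example, not part of the original formulation.

```python
import numpy as np

# Underlying image u: two point sources on a faint flat background.
u = np.full(64, 0.5)
u[20] += 40.0
u[45] += 25.0

# Normalized Gaussian point spread function P (sums to 1, conserving flux).
offsets = np.arange(-7, 8)
P = np.exp(-offsets**2 / (2.0 * 2.0**2))
P /= P.sum()

# Observed image: d = convolution of u with P, plus Poisson detection noise.
blurred = np.convolve(u, P, mode="same")
d = np.random.default_rng(0).poisson(blurred).astype(float)
```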
In order to estimate [math]\displaystyle{ u_j }[/math] given the observed [math]\displaystyle{ d_i }[/math] and a known [math]\displaystyle{ P(\Delta i_x,\Delta i_y) }[/math], the following iterative procedure is employed, in which the estimate of [math]\displaystyle{ u_j }[/math], called [math]\displaystyle{ \hat{u}_{j}^{(t)} }[/math], for iteration number t is updated as follows:
- [math]\displaystyle{ \hat{u}_{j}^{(t+1)} = \hat{u}_j^{(t)} \sum_{i} \frac{d_{i}}{c_{i}}p_{ij} }[/math]
where
- [math]\displaystyle{ c_{i} = \sum_{j} p_{ij} \hat{u}_{j}^{(t)} }[/math]
and [math]\displaystyle{ \sum_i p_{ij}=1 }[/math] is assumed, i.e. the point spread function conserves total flux. It has been shown that if this iteration converges, it converges to the maximum likelihood solution for [math]\displaystyle{ u_j }[/math].[3]
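A minimal NumPy sketch of this update in matrix form may help make it concrete; the 5-pixel toy problem, the Gaussian-like transition matrix, and the function name are assumptions made for the example.

```python
import numpy as np

def rl_update(u_hat, p, d):
    """One Richardson–Lucy iteration: u_j <- u_j * sum_i p_ij * d_i / c_i."""
    c = p @ u_hat                  # c_i = sum_j p_ij * u_hat_j (predicted image)
    return u_hat * (p.T @ (d / c))

# Toy 5-pixel problem with a Gaussian-like transition matrix.
n = 5
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
p = np.exp(-((i - j) ** 2.0))
p /= p.sum(axis=0)                 # normalize so that sum_i p_ij = 1

u_true = np.array([0.0, 8.0, 0.0, 3.0, 1.0])
d = p @ u_true                     # noise-free observation
u_hat = np.ones(n)                 # strictly positive initial guess
for _ in range(500):
    u_hat = rl_update(u_hat, p, d)
print(np.round(u_hat, 3))          # approaches u_true in this noise-free case
```

In the noise-free case the iterate approaches the true underlying image, consistent with the maximum likelihood interpretation above.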
Writing this more generally for two (or more) dimensions in terms of convolution with a point spread function P:
- [math]\displaystyle{ \hat{u}^{(t+1)} = \hat{u}^{(t)}\cdot\left(\frac{d}{\hat{u}^{(t)}\otimes P}\otimes P^*\right) }[/math]
where the division and multiplication are element-wise, [math]\displaystyle{ \otimes }[/math] indicates a 2D convolution, and [math]\displaystyle{ P^* }[/math] is the flipped point spread function, [math]\displaystyle{ P^*(\Delta x, \Delta y) = P(-\Delta x, -\Delta y) }[/math].
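In code, this two-dimensional form translates directly. The sketch below assumes a SciPy FFT-based convolution, a flat initial estimate, and a small constant eps guarding the element-wise division; these are choices of the example, not of the algorithm. A ready-made implementation is also available in scikit-image as skimage.restoration.richardson_lucy.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(d, psf, num_iter=30, eps=1e-12):
    """Deconvolve image d with a known psf by Richardson–Lucy iteration."""
    d = np.asarray(d, dtype=float)
    psf_flipped = psf[::-1, ::-1]               # P*: the PSF flipped in both axes
    u_hat = np.full_like(d, d.mean())           # flat, strictly positive start
    for _ in range(num_iter):
        c = fftconvolve(u_hat, psf, mode="same")         # u_hat (x) P
        ratio = d / (c + eps)                            # element-wise division
        u_hat = u_hat * fftconvolve(ratio, psf_flipped, mode="same")
    return u_hat
```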
In problems where the point spread function [math]\displaystyle{ p_{ij} }[/math] is not known a priori, a modification of the Richardson–Lucy algorithm has been proposed to accomplish blind deconvolution.[4]
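As a rough, schematic illustration only, blind deconvolution is often realized by alternating Richardson–Lucy updates of the image and of the point spread function. The sketch below follows that alternating idea under simplifying assumptions (the PSF is stored at full image size, the iteration counts are ad hoc); it is not a transcription of the algorithm of [4].

```python
import numpy as np
from scipy.signal import fftconvolve

def blind_richardson_lucy(d, psf0, num_outer=10, num_inner=5, eps=1e-12):
    """Alternate Richardson–Lucy updates of the image and of the PSF."""
    d = np.asarray(d, dtype=float)
    u = np.full_like(d, d.mean())                # image estimate
    psf = np.asarray(psf0, dtype=float).copy()   # PSF guess, same shape as d
    for _ in range(num_outer):
        for _ in range(num_inner):       # image held fixed: update the PSF
            c = fftconvolve(u, psf, mode="same")
            psf *= fftconvolve(d / (c + eps), u[::-1, ::-1], mode="same")
            psf /= psf.sum()             # renormalize the PSF to unit flux
        for _ in range(num_inner):       # PSF held fixed: update the image
            c = fftconvolve(u, psf, mode="same")
            u *= fftconvolve(d / (c + eps), psf[::-1, ::-1], mode="same")
    return u, psf
```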
Derivation
In the context of fluorescence microscopy, the probability of measuring a set of photon counts (or digitization counts proportional to detected light) [math]\displaystyle{ \mathbf{m} = [m_0, ..., m_K] }[/math] for expected values [math]\displaystyle{ \mathbf{E} = [E_0, ..., E_K] }[/math] for a detector with [math]\displaystyle{ K+1 }[/math] pixels is given by
- [math]\displaystyle{ P(\mathbf{m} \vert \mathbf{E}) = \prod_{i=0}^{K} \mathrm{Poisson}(E_{i}) = \prod_{i=0}^{K} \frac{{E_{i}}^{m_i} e^{-E_{i}}}{m_i !} }[/math]
It is convenient to work with [math]\displaystyle{ \ln(P) }[/math], since in the context of maximum likelihood estimation we seek the position of the maximum of the likelihood function and are not interested in its absolute value:
- [math]\displaystyle{ \ln (P(m \vert E)) = \sum_{i=0}^{K} \left[ (m_i \ln E_i - E_i) - \ln(m_i!) \right] }[/math]
Since [math]\displaystyle{ \ln(m_i!) }[/math] does not depend on [math]\displaystyle{ E }[/math], it gives no additional information about the position of the maximum, so let us consider
- [math]\displaystyle{ \alpha(m \vert E) = \sum_{i=0}^{K} \left[m_i \ln E_i - E_i\right] }[/math]
where [math]\displaystyle{ \alpha }[/math] shares its maximum position with [math]\displaystyle{ P(m \vert E) }[/math]. Now consider that [math]\displaystyle{ E }[/math] arises from a ground truth [math]\displaystyle{ x }[/math] through a measurement process [math]\displaystyle{ \mathbf{H} }[/math], which we assume to be linear. Then
- [math]\displaystyle{ \mathbf{E} = \mathbf{H} \mathbf{x} }[/math]
where a matrix multiplication is implied. We can also write this in the form
- [math]\displaystyle{ E_i = \sum_{j=0}^{K} H_{ij} x_j }[/math]
where we can see how [math]\displaystyle{ H }[/math] mixes or blurs the ground truth.
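For illustration, the reduced log-likelihood [math]\displaystyle{ \alpha }[/math] and the forward model [math]\displaystyle{ \mathbf{E} = \mathbf{H} \mathbf{x} }[/math] fit in a few lines of NumPy; the function name and argument shapes are assumptions of this sketch.

```python
import numpy as np

def alpha(m, H, x):
    """Reduced Poisson log-likelihood: sum_i (m_i * ln E_i - E_i), with E = H x."""
    E = H @ x                      # forward model: blur the ground truth
    return np.sum(m * np.log(E) - E)
```

Such a function can be evaluated along the iterations of the algorithm derived below to monitor the ascent of the likelihood.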
It can also be shown that the derivative of an element of [math]\displaystyle{ \mathbf{E} }[/math], say [math]\displaystyle{ E_i }[/math], with respect to an element of [math]\displaystyle{ x }[/math], say [math]\displaystyle{ x_j }[/math], is:
- [math]\displaystyle{ \frac{\partial E_i}{\partial x_j} = H_{ij} }[/math]  (1)
Tip: it is easy to check this by writing out a small matrix [math]\displaystyle{ \mathbf{H} }[/math], say 5 × 5, and two vectors [math]\displaystyle{ \mathbf{E} }[/math] and [math]\displaystyle{ \mathbf{x} }[/math] of five elements each, as in the sketch below. This last equation can be interpreted as how much one element of [math]\displaystyle{ \mathbf{x} }[/math], say element [math]\displaystyle{ j }[/math], influences a given element of [math]\displaystyle{ \mathbf{E} }[/math], element [math]\displaystyle{ i }[/math] (the case [math]\displaystyle{ i = j }[/math] is of course also included). For example, in a typical case an element of the ground truth [math]\displaystyle{ \mathbf{x} }[/math] will influence nearby elements of [math]\displaystyle{ \mathbf{E} }[/math] but not very distant ones (a value of [math]\displaystyle{ 0 }[/math] is expected for those matrix elements).
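Following that tip, identity (1) can be checked numerically; the 5 × 5 size, the finite-difference step, and the tolerance below are arbitrary choices of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.random((5, 5))
x = rng.random(5)

# Finite-difference estimate of dE_i/dx_j for E = H x.
h = 1e-6
numeric = np.empty((5, 5))
for j in range(5):
    x_step = x.copy()
    x_step[j] += h
    numeric[:, j] = (H @ x_step - H @ x) / h

print(np.allclose(numeric, H, atol=1e-5))   # True: dE_i/dx_j equals H_ij
```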
Now, the key and arbitrary step: we do not know [math]\displaystyle{ x }[/math], but we want to estimate it with [math]\displaystyle{ \hat{\mathbf{x}} }[/math]. Let [math]\displaystyle{ \hat{\mathbf{x}}_{old} }[/math] and [math]\displaystyle{ \hat{\mathbf{x}}_{new} }[/math] be the estimated ground truths at successive steps of the RL algorithm, where the hat symbol distinguishes the estimator from the ground truth itself:
- [math]\displaystyle{ \hat{\mathbf{x}}_{new} = \hat{\mathbf{x}}_{old} + \lambda \left. \frac{\partial \alpha(m \vert E(x))}{\partial x} \right|_{\hat{\mathbf{x}}_{old}} }[/math]  (2)
where [math]\displaystyle{ \frac{\partial}{\partial x} }[/math] denotes the gradient with respect to the elements of [math]\displaystyle{ x }[/math]. If we work out the derivative of [math]\displaystyle{ \alpha(m \vert E(x)) }[/math] we get
- [math]\displaystyle{ \frac{\partial \alpha(m \vert E(x))}{\partial x_j} = \frac{\partial}{\partial x_j} \sum_{i=0}^{K} \left[m_i \ln E_i - E_i\right] = \sum_{i=0}^{K} \left[\frac{m_i}{E_i} \frac{\partial E_i}{\partial x_j} - \frac{\partial E_i}{\partial x_j} \right] = \sum_{i=0}^{K} \frac{\partial E_i}{\partial x_j} \left[\frac{m_i}{E_i} - 1 \right] }[/math]
and if we now use (1) we get
- [math]\displaystyle{ \frac{\partial \alpha(m \vert E(x))}{\partial x_j} = \sum_{i=0}^{K} H_{ij} \left[\frac{m_i}{E_i} - 1 \right] }[/math]
But [math]\displaystyle{ {H}^{T}_{ji} = H_{ij} }[/math] by the definition of the transpose, and hence
- [math]\displaystyle{ \frac{\partial \alpha(m \vert E(x))}{\partial x_j} = \sum_{i=0}^{K} {H}^{T}_{ji} \left[\frac{m_i}{E_i} - 1 \right] }[/math]  (3)
Then, letting [math]\displaystyle{ j }[/math] span all the elements of [math]\displaystyle{ x }[/math], this equation can be rewritten in vector form:
- [math]\displaystyle{ \frac{\partial \ \alpha(m \vert \mathbf{E}(x))}{\partial x} = {\mathbf{H}^T} \left[\frac{\mathbf{m}}{\mathbf{E}} - \mathbf{1} \right] }[/math]
where [math]\displaystyle{ \mathbf{H}^T }[/math] is a matrix and [math]\displaystyle{ m }[/math], [math]\displaystyle{ E }[/math], and [math]\displaystyle{ \mathbf{1} }[/math] are vectors. Let us now propose the following arbitrary but key step:
- [math]\displaystyle{ \lambda = \frac{\hat{\mathbf{x}}_{old}}{\mathbf{H}^T \mathbf{1}} }[/math]  (4)
where [math]\displaystyle{ \mathbf{1} }[/math] is a vector of ones with the same length as [math]\displaystyle{ m }[/math], [math]\displaystyle{ E }[/math], and [math]\displaystyle{ x }[/math], and the division is element-wise. Using (3) and (4) we can rewrite (2) as
- [math]\displaystyle{ \hat{\mathbf{x}}_{new} = \hat{\mathbf{x}}_{old} + \lambda \frac{\partial \alpha(\mathbf{m} \vert \mathbf{E}(x))}{\partial x} = \hat{\mathbf{x}}_{old} + \frac{\hat{\mathbf{x}}_{old}}{{\mathbf{H}^T} \mathbf{1}} {\mathbf{H}^T} \left[\frac{\mathbf{m}}{\mathbf{E}} - \mathbf{1} \right] = \hat{\mathbf{x}}_{old} + \frac{\hat{\mathbf{x}}_{old}}{\mathbf{H}^T \mathbf{1}} \mathbf{H}^T \frac{\mathbf{m}}{\mathbf{E}} - \hat{\mathbf{x}}_{old} }[/math]
which yields
- [math]\displaystyle{ \hat{\mathbf{x}}_{new} = \hat{\mathbf{x}}_{old} \cdot \left( \mathbf{H}^T \frac{\mathbf{m}}{\mathbf{E}} \right) / \left( \mathbf{H}^T \mathbf{1} \right) }[/math]  (5)
where the division refers to element-wise division, [math]\displaystyle{ \mathbf{H}^T }[/math] operates as a matrix, and the product implicit after [math]\displaystyle{ \hat{\mathbf{x}}_{old} }[/math] is element-wise as well. Also, [math]\displaystyle{ \mathbf{E} = E(\hat{\mathbf{x}}_{old}) = \mathbf{H} \hat{\mathbf{x}}_{old} }[/math] can be calculated because we assume:
- The initial guess [math]\displaystyle{ \hat{\mathbf{x}}_{0} }[/math] is known (and is typically set to be the experimental data)
- The measurement function [math]\displaystyle{ \mathbf{H} }[/math] is known
On the other hand, [math]\displaystyle{ \mathbf{m} }[/math] is the experimental data. Therefore, equation (5), applied successively, provides an algorithm for estimating the ground truth [math]\displaystyle{ \mathbf{x} }[/math] by ascending in the likelihood landscape, since each step moves in the direction of the gradient of the likelihood. This derivation does not demonstrate convergence, nor does it show any dependence on the initial choice. Note that equation (2) provides a way of following the direction that increases the likelihood, but the choice of the log-derivative is arbitrary; equation (4), on the other hand, introduces a way of weighting the step from the previous iteration. If this weighting term were not present in (5), the algorithm would move the estimate even when [math]\displaystyle{ \mathbf{m} = E(\hat{\mathbf{x}}_{old}) }[/math]. The only strategy used here is to maximize the likelihood at all costs, so artifacts can be introduced into the image; no prior knowledge of the shape of the ground truth [math]\displaystyle{ \mathbf{x} }[/math] is used in this derivation.
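Putting the pieces together, equation (5) can be applied directly. In the sketch below, the random nonnegative measurement matrix, the iteration count, and the clipping of the initial guess to strictly positive values are illustrative assumptions; the reduced log-likelihood [math]\displaystyle{ \alpha }[/math] is tracked to confirm that the iteration ascends in the likelihood landscape.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy forward model: nonnegative measurement matrix H and ground truth x.
K = 8
H = rng.random((K, K))
x_true = 10.0 * rng.random(K)
m = rng.poisson(H @ x_true).astype(float)    # experimental data

def alpha(x):
    E = H @ x
    return np.sum(m * np.log(E) - E)         # reduced log-likelihood

x_hat = np.maximum(m, 1e-3)                  # data as initial guess, kept positive
norm = H.T @ np.ones(K)                      # H^T 1
for _ in range(100):
    x_hat = x_hat * (H.T @ (m / (H @ x_hat))) / norm   # equation (5)

print(alpha(np.maximum(m, 1e-3)) <= alpha(x_hat))      # likelihood did not decrease
```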
Software
- RawTherapee (since v.2.3)
References
- ↑ Richardson, William Hadley (1972). "Bayesian-Based Iterative Method of Image Restoration". Journal of the Optical Society of America 62 (1): 55–59. doi:10.1364/JOSA.62.000055. Bibcode: 1972JOSA...62...55R.
- ↑ Lucy, L. B. (1974). "An iterative technique for the rectification of observed distributions". Astronomical Journal 79 (6): 745–754. doi:10.1086/111605. Bibcode: 1974AJ.....79..745L.
- ↑ Shepp, L. A.; Vardi, Y. (1982), "Maximum Likelihood Reconstruction for Emission Tomography", IEEE Transactions on Medical Imaging 1 (2): 113–22, doi:10.1109/TMI.1982.4307558, PMID 18238264
- ↑ Fish D. A.; Brinicombe A. M.; Pike E. R.; Walker J. G. (1995), "Blind deconvolution by means of the Richardson–Lucy algorithm", Journal of the Optical Society of America A 12 (1): 58–65, doi:10.1364/JOSAA.12.000058, Bibcode: 1995JOSAA..12...58F, https://pdfs.semanticscholar.org/9e3f/a71e22caf358dbe873e9649f08c205d0c0c0.pdf
Original source: https://en.wikipedia.org/wiki/Richardson–Lucy deconvolution.