Penalized maximum likelihood simultaneous longitudinal PET image reconstruction with difference‐image priors

Purpose Many clinical contexts require the acquisition of multiple positron emission tomography (PET) scans of a single subject, for example, to observe and quantitate changes in functional behaviour in tumors after treatment in oncology. Typically, the datasets from each of these scans are reconstructed individually, without exploiting the similarities between them. We have recently shown that sharing information between longitudinal PET datasets by penalizing voxel‐wise differences during image reconstruction can improve reconstructed images by reducing background noise and increasing the contrast‐to‐noise ratio of high‐activity lesions. Here, we present two additional novel longitudinal difference‐image priors and evaluate their performance using two‐dimesional (2D) simulation studies and a three‐dimensional (3D) real dataset case study. Methods We have previously proposed a simultaneous difference‐image‐based penalized maximum likelihood (PML) longitudinal image reconstruction method that encourages sparse difference images (DS‐PML), and in this work we propose two further novel prior terms. The priors are designed to encourage longitudinal images with corresponding differences which have (a) low entropy (DE‐PML), and (b) high sparsity in their spatial gradients (DTV‐PML). These two new priors and the originally proposed longitudinal prior were applied to 2D‐simulated treatment response [18F]fluorodeoxyglucose (FDG) brain tumor datasets and compared to standard maximum likelihood expectation‐maximization (MLEM) reconstructions. These 2D simulation studies explored the effects of penalty strengths, tumor behaviour, and interscan coupling on reconstructed images. Finally, a real two‐scan longitudinal data series acquired from a head and neck cancer patient was reconstructed with the proposed methods and the results compared to standard reconstruction methods. Results Using any of the three priors with an appropriate penalty strength produced images with noise levels equivalent to those seen when using standard reconstructions with increased counts levels. In tumor regions, each method produces subtly different results in terms of preservation of tumor quantitation and reconstruction root mean‐squared error (RMSE). In particular, in the two‐scan simulations, the DE‐PML method produced tumor means in close agreement with MLEM reconstructions, while the DTV‐PML method produced the lowest errors due to noise reduction within the tumor. Across a range of tumor responses and different numbers of scans, similar results were observed, with DTV‐PML producing the lowest errors of the three priors and DE‐PML producing the lowest bias. Similar improvements were observed in the reconstructions of the real longitudinal datasets, although imperfect alignment of the two PET images resulted in additional changes in the difference image that affected the performance of the proposed methods. Conclusion Reconstruction of longitudinal datasets by penalizing difference images between pairs of scans from a data series allows for noise reduction in all reconstructed images. An appropriate choice of penalty term and penalty strength allows for this noise reduction to be achieved while maintaining reconstruction performance in regions of change, either in terms of quantitation of mean intensity via DE‐PML, or in terms of tumor RMSE via DTV‐PML. Overall, improving the image quality of longitudinal datasets via simultaneous reconstruction has the potential to improve upon currently used methods, allow dose reduction, or reduce scan time while maintaining image quality at current levels.


I. INTRODUCTION
Positron emission tomography (PET) is widely used to observe and measure functional and metabolic behaviour in vivo. In a number of contexts, multiple PET Accepted Article biomarkers in reflecting tumour response 4 . Such studies have been performed for various cancer treatments, such as chemotherapy 5 , radiotherapy 6 , and immunotherapy 7 . The number of scans that are performed in longitudinal oncological protocols varies greatly; although two to five scans are typical 2,6,8 , more than ten can be acquired in specific contexts 7 . Analysis of longitudinal changes throughout these image series is also varied, ranging from qualitative methods 2 , through semi-quantitative measures like standardised uptake values (SUVs) 3,5,6 , to full quantification of tumour metabolism, e.g. by Patlak analysis 8,9 . Reported SUVs include the maximum SUV measured within the tumour 5 , the mean value within the tumour 6 , and in some cases the SUV is compared to a background region such as the liver for increased stability 4,7 .
For the majority of these longitudinal PET studies, images are typically reconstructed with iterative reconstruction methods such as maximum likelihood expectationmaximisation (MLEM 10 ) or, more commonly, an accelerated version of MLEM known as ordered subsets expectation-maximisation, or OSEM 11 . Iterative methods can produce PET images that are superior in quality to analytic methods such as filtered back-projection (FBP) by incorporating more realistic models of the acquisition process and by taking into account the Poisson distributed nature of the acquired counts 12 .
These iterative methods attempt to find the set of parameters (most commonly voxel intensities) which produce expected data that is closest to the measured data, given some model of the acquisition process. However, due to the counts-limited nature of PET data, they produce noisy images at convergence. In clinical practice, this problem is solved by terminating the reconstruction early, before the noise becomes excessive. However, using unconverged image estimates can impact on quantification by introducing bias into intensity values. An alternative approach is to regularise the PET reconstruction to avoid noisy images. One of the most common methods of regularising PET image reconstructions is to use a Bayesian or penalised maximum likelihood framework. These methods allow the introduction of prior expectations into the reconstruction, and allow a tradeoff between how well the reconstructed image matches the measured data (data consistency), and how well it matches the prior expectations. Since noise manifests itself as large differences between neighbouring voxels, a common technique for reducing PET image noise via regularisation methods is to include a penalisation of neighbourhood differences in the prior expectations. However, large neighbourhood differences are to be expected over valid image edges and the penalisation is not desired in these regions of the image. It is possible to define isotropic, edge-preserving reconstruction priors that use total-variational methods [13][14][15] , but these methods can produce artificial-looking, piecewise constant images that may not reflect the true nature of the activity distribution.
Another option for edge-preservation in regularised PET image reconstruction is to use a prior source of knowledge to define which voxels are expected to appear similar, thereby switching off the PET regularisation over valid edges. A promising source of such information is from magnetic resonance (MR) images of the same patient, since MR images often have superior noise and resolution properties to PET. There have been many studies into the use of MR-based priors in PET reconstruction [16][17][18][19][20][21][22] , but one key remaining question is in ensuring that MR-specific features are not imprinted upon the PET image.
The advent of simultaneous PET-MR systems provides convenient MR images for guiding PET image reconstruction; due to the simultaneity of the acquisitions, the inter-modality registration problem is made simpler. Furthermore, simultaneous PET-MR image reconstruction has been proposed to allow both modalities to influence the reconstruction of the other, allowing improved PET images along with faster MR acquisition times [23][24][25] . However, again, there is a danger of unwanted crosstalk between the imaging modalities degrading the reconstructed images.
Inspired by joint simultaneous PET-MR reconstruction methods, we aim to couple longitudinal PET datasets during the reconstruction process to improve the images from all scans. Our proposed methods operate in the difference image domain in order to produce PET images that are natural looking and of a superior quality. We previously proposed the simultaneous reconstruction of longitudinal PET datasets 26 , where noise reduction was achieved by penalising voxel differences between longitudinal PET scans. Our results showed that this method allows for the reduction of image-wide reconstruction errors and noise levels while maintaining the appearance of regions of change, resulting in higher contrast-to-noise ratios for lesions in inserted lesion brain scans. However, the method was restricted to two scans with the same background intensity levels. In this work we aim to extend the simultaneous longitudinal reconstruction framework to a general number of scans with arbitrary activity levels. Additionally, we present two new longitudinal priors that have been designed to improve reconstruction performance within lesions. These new simultaneous longitudinal reconstruction methods are characterised using 2D simulation studies before being applied to real datasets, where two PET images were acquired in the context of a head and neck treatment response study.

A. Theory
Let θ be a vector containing the intensities of a PET image so that θ j is the model of the mean of the number of emissions (proportional to activity concentration) from This article is protected by copyright. All rights reserved.

Accepted Article
the jth voxel. In this work, we consider a set of longitudinal PET images, denoted as {θ s }, where s = 1, ..., S indicates the scan number. In this notation, the intensity of the jth voxel of the sth longitudinal scan is denoted θ j,s .
The joint reconstruction of multiple PET datasets is achieved by formulating the following penalised maximum likelihood (PML) objective function: where y s is the measured data vector (for example, the number of recorded counts in each sinogram bin) for scan s, L (θ s |y s ) is the Poisson log-likelihood of image θ s producing y s , U ({θ s }) is a penalty function in terms of all the longitudinal images, and β is a hyperparameter that controls the strength of the regularisation.
In this work, we apply penalties to the set of difference images, {δ sk }, given bỹ whereθ s = α s θ s represents the image θ s normalised so that voxels within regions which have not changed metabolically between scans yield differences of zero. In general, the penalty term is written as: Explicitly, given S scans, the S 2 difference images can be calculated and the total penalty is the weighted sum of the function u applied to all possible difference images. The weights w sk control the coupling between scans in the reconstruction; setting w sk = 0 ensures that there is no similarity encouraged directly between scans s and k.
In this work, in order to seek the maximum of Equation (1), we employ the one-step-late iterative reconstruction method proposed by Green 27 . This results in the following update equation: where P ij,s are the elements of the system matrix for scan s, ν is the iteration number, andȳ s θ (ν) s is the expected mean data vector given the image θ s , according to the affine relationship: where b i,s are the expected additive contributions to the measured data through random coincidences and scattered photons. As shown in equation 4, the one-step-late algorithm uses the gradient of the penalty function evaluated at the current image estimate to allow a closed form update similar to that of standard MLEM. The disadvantage of this approach is that it is not guaranteed to converge to a global maximum of the objective function and can result in unstable reconstructions when β is too large. Nonetheless, the one-step-late update has been widely used in the PET image reconstruction literature and provides valuable insight into the performance of PML reconstruction methods provided β is not too large.

Difference image sparsity prior -DS-PML
Assuming that the only a small image region changes from scan to scan, one possible prior is to expect the difference images to be sparse. In the inverse problems and parameter estimation literature, the ℓ 1 -norm is often used as a convex surrogate for the ℓ 0 -norm to encourage sparsity in some domain. Therefore, the difference image sparsity (DS) prior for longitudinal PET image reconstruction uses the following form for u(δ sk ): where ε is a small value introduced to ensure that the prior is differentiable at all values. In previous experiments 26 a similar longitudinal prior was shown to reduce PET reconstruction errors by lowering image noise in regions of no change, while producing small biases in longitudinally changing tumours.

Difference image entropy prior -DE-PML
For longitudinal PET images consisting of a number of distinct tissue classes (e.g. white matter, grey matter, tumour), it is expected that each one of these tissues varies in activity across the longitudinal scans in a consistent way, e.g. that all grey matter voxels change in intensity by the same amount. This can be formulated as assuming that the difference image should have a low entropy. The use of information-based priors has previously been explored in the field of MR-guided PET image reconstruction 16 , but to our knowledge, they have not been incorporated into longitudinal PET reconstructions or applied to difference images.
Given a set of parameters {θ j }, the probability density function (PDF) of a continuous random variable, X, can be estimated asp (x; θ) using the Parzen window method, or kernel density estimation: As previously mentioned, we want to encourage solutions which have a low entropy difference image, i.e: Note that in practice we replace the continuous variable X with a discrete approximation, and the integral becomes a sum. Using this formulation, the gradient of the entropy prior can be calculated 16 and used for the one-step-late reconstruction.

Difference image total variation prior -DTV-PML
The total variation (TV) of a 2D image can be defined as 28 where θ x,y here denotes the image indexed in each spatial dimension separately. Here we propose that we expect the difference images to have sparse spatial gradient images, i.e. that the TV of the difference images is low. Therefore, the difference image TV prior (DTV) uses the following for u(δ sk ): The TV penalty has previously been used in singledataset PET reconstruction 28 , but to the best of the authors' knowledge, it has never been applied to difference images between longitudinal PET scans.

B. 2D ground truth images and data simulation
Realistic brain phantom software 29 was used to create 2D [ 18 F]fluorodeoxyglucose (FDG) ground truth images comprising four segmented intensity regions: background, white matter, grey matter, and tumour. Background voxels were assigned activities of 0 and the ratio between grey and white matter intensities was 4:1. The tumour was a variable uniform circular region on the boundary between the left middle frontal gyrus and white matter, characterised by its radius and intensity. Ground truth images had an array size of 512 × 512 with a pixel size of 0.5× 0.5 mm 2 .
To transform between image and data spaces, the Radon and adjoint Radon transforms were used for projection and backprojection respectively. In order to generate the datasets for reconstruction, the ground truth images were blurred by a 4.3 mm Gaussian kernel before forward projecting. Sinogram space attenuation factors were calculated from an attenuation map comprising background (µ = 0 mm −1 ), soft tissue (µ = 0.0096 mm −1 ), and bone (µ = 0.0172 mm −1 ) 30 . Random events were approximated by a uniform sinogram, and scattered coincidences were approximated by smoothing the projected ground truth in each case. The scatters and randoms were scaled and added to the data to produce noise-free sinograms which contained 20% scatter and 20% randoms. Noise-free and non-resolution degraded data were reconstructed with filtered backprojection (FBP) to provide reference images in the same space as the noisy reconstructions.

C. Experiment 1: Hyperparameter selection
Two ground truth datasets were generated as outlined above, one baseline (PET1) and one follow-up (PET2). The PET1 tumour had a radius of 15 mm and a tumour to white matter intensity ratio of 8:1, and the PET2 tumour was 8 mm with a corresponding ratio of 6:1. The two images were scaled in order to set the total expected number of counts in the PET1 dataset to approximately 4 million. In addition, a second, double-counts pair of noise-free datasets were generated with double the intensity to provide a low-noise benchmark.
100 noisy realisations of each of the datasets (PET1, PET2, double-counts PET1, and double-counts PET2) were then produced by introducing Poisson noise into the noise-free datasets. These datasets were reconstructed with a variety of methods, all run for 200 iterations, and all using the original attenuation, scatters and randoms sinograms from the simulation process. The datasets with an average of approximately 4 million counts per sinogram were reconstructed with the following methods: 1. independent dataset MLEM, 2. DS-PML, 3. DE-PML, and 4. DTV-PML. The noisy double-counts PET1 and PET2 were reconstructed with independent dataset MLEM to demonstrate the image quality achievable when a higher level of counts are recorded.
For each of the longitudinal difference-image prior reconstruction methods, the scan-to-scan weighting matrix w sk was set to a value of 1 for all s,k, as were all normalising factors α s . The prior specific parameters were as follows: a. DS-PML: β values were varied between 7.5 × 10 −5 to 15 × 10 −4 and ε was set to 1 × 10 −6 .
b. DE-PML: A Gaussian window of standard deviation equal to 0.35 (arb. intensity units) was used to estimate the PDF of the difference images. During each iteration of the DE-PML algorithm, a 100-level discretisation was used for approximating the continuous PDF. In addition, a mask was used to ensure that entropy was only calculated in the brain itself since the background naturally has low entropy. β values were varied between This article is protected by copyright. All rights reserved.
In addition, a longitudinal smooth was applied to the MLEM reconstructions, using a two-point Gaussian kernel parameterised by the standard deviation, σ, so that with σ = 0 the images are unchanged, and with σ = +∞ the outputs are just the average of the input images.
To evaluate the quality of the reconstructed images, the 100 noisy realisations of the datasets were reconstructed as described above and the following regional metrics were calculated.
For a set of images, {θ jn }, with n indexing noisy realisations, the percentage root square mean error relative to a reference image {θ Ref j } for a given region Ω is calculated as: (12) where N Ω is the number of voxels in Ω and N (= 100) is the number of noisy realisations. %RMSE values were calculated for PET1 and PET2 images using the noisefree FBP images as the corresponding reference images. Ω was selected as a 226-voxel circular region of interest (ROI) encompassing the 15 mm tumour and some surrounding tissue. Note that the same region was used in both PET1 and PET2 analyses so that the PET2 image, with the smaller tumour, contained more background within Ω. The mean intensities within the PET1 and PET2 tumours were also calculated for each reconstruction, and the average of these across all noisy realisations calculated to provide a measure of the effects of the regularisation penalties on tumour quantification.
To measure the background noise present in the images, the coefficient of variation (CV, equal to the standard deviation divided by the mean) was calculated in an eroded white matter mask.
Finally, bias and standard deviation maps were produced by calculating the bias (relative to the noise-free FBP images) and standard deviation of each voxel across the N = 100 noisy realisations.
From the results of Experiment 1, hyperparameters for each of the reconstruction methods were selected and used for the rest of the 2D simulation experiments, where imaging conditions and counts levels per scan were nominally the same as in Experiment 1.

D. Experiment 2: Varying tumour response
Experiment 1 considered only one example of a tumour response. To evaluate the performance of the presented difference-image prior methods for a range of pre-and post-treatment behaviours, we performed a second experiment in which tumour radius and intensity were varied and the performance of the reconstruction methods with fixed β was assessed. Initially keeping the tumour intensities the same as listed in Section II C, the PET1 and PET2 tumour radii were varied between 3 mm and 15 mm in steps of 2 mm. 100 noisy realisations of simulated PET data were generated for each radius pair using the same methodology and counts levels as in Experiment 1, and reconstructed with 200 iterations of each method with the selected hyperparameters. For all PET1 and PET2 reconstructions for all tumour pairs, %RMSE was calculated in the same 226-voxel circular ROI used in Experiment 1.
Following this, the radii of the two tumours were kept the same as listed in Section II C, with the intensities varied between 2 and 8 (relative to a nominal white matter intensity of 1). Using 200 iterations of each reconstruction method, with 100 noisy realisations per dataset again, %RMSE was calculated in the tumour ROI.

E. Experiment 3: Five-scan simulation study
Experiments 1 and 2 used the simplifying case of a twoscan protocol. To investigate the effect of the inter-scan weights w sk , a five-scan case was simulated. Five ground truth images and datasets were created in accordance with Section II B. The tumour radii were: 15 mm, 8 mm, 4 mm, 3 mm, and 5 mm, and the respective intensities were 8, 6, 4, 3, and 6 (relative to white matter). The counts level of each scan was the same as that used in Experiments 1 and 2, i.e. approximately 4 million counts per scan, and 100 noisy realisations of each dataset were generated.
The five datasets were reconstructed with each of the difference-image prior methods. The w sk array is now rewritten so that: where σ w is a parameter than can be varied from 0 (i.e. w sk = 1 iff s = k) to +∞ (i.e. w sk = w∀s, k), and ∆ sk = min (|s − k|, |s − k − S|, |s − k + S|) is the cyclic longitudinal distance between scans s and k. Using a cyclic distance results in all scans having the same total weighting with other scans, ensuring that, for example, scans 1 and 5 are not regularised less just by virtue of having no preceding or succeeding scan. κ s is a normalising factor used to control the sum of the rows of the weights array. In this experiment we choose κ s so that k w sk = 2, in agreement with the total weight used in the previous experiments where w sk was 1 2×2 .
The σ w parameter was varied between 0.4 and 8, with a final σ w of ∞ also included to provide a case comparable to those used in Experiments 1 and 2. β values for the difference-image priors were set to the same values as used in Experiment 2, based on the results of Experiment 1.
In order to provide comparable images at different image noise levels, the five noise-free longitudinal datasets This article is protected by copyright. All rights reserved.

Accepted Article
were scaled to different counts levels, used to generate 100 noisy, high-counts datasets, and then reconstructed with 200 iterations of MLEM. These counts levels ranged from 1× to 5× the original level, to correspond to the maximum noise reduction expected from the DS-PML, DE-PML and DTV-PML methods.
Reconstructed images for all methods were analysed by calculating mean tumour intensities, white matter CV values, and mean squared error (MSE, equal to the sum of the squared bias and squared standard deviation) maps.

F. Experiment 4: Application to real data for head and neck cancer
In the final experiment, the proposed priors were applied to a real longitudinal pair of [ 18 F]FDG datasets acquired from a head and neck cancer patient in a Siemens mCT time-of-flight (TOF) PET-CT scanner (Siemens Healthcare, Erlangen, Germany). The patient exhibited a highly-uptaking tumour in the right side of the lower jaw, which reduced in size and uptake between the two scans (an interval of 85 days). The injected activity of each of the two scans was 340 MBq and 357 MBq, resulting in total prompt counts of 78.5 × 10 6 and 76.7 × 10 6 respectively in sinograms acquired in the head bed position. Note that although the data were acquired in TOF format, sinograms were rebinned to non-TOF for this experiment.
As well as the prompts sinograms, computed tomography (CT) images were acquired for attenuation correction, and delayeds sinograms were acquired for randoms estimation. The vendor-supplied e7 tools software was used to calculate attenuation correction factors and the randoms sinograms for each dataset to be used in the reconstructions. Scatter correction was not performed.
To allow a comparison between reconstructions with the proposed methods and double-counts images as performed in Experiments 1 to 3, the counts levels in the original mCT sinograms were reduced by 50% by randomly removing individual counts with a probability of 0.5. The randoms estimations were scaled by a factor of 0.5 accordingly.
The counts-reduced longitudinal datasets were then reconstructed with DS-PML, DE-PML, and DTV-PML, using projectors based on the mCT geometry 31 . To reduce reconstruction time, these reconstruction methods were accelerated with an ordered subsets implementation 11 . The same counts-reduced datasets were also reconstructed with standard OSEM. All reconstructions were run with 5 iterations of 21 subsets reconstructing into an image grid size of 400 × 400 × 109, with voxel sizes of 2.036 × 2.036 × 2.027 mm 3 . Each reconstruction used resolution modelling with a Gaussian kernel of full-width-at-half-maximum of 3 mm.
To address the inherent mis-alignment between the datasets due to differing patient position in the two acquisitions, non-rigid registration was performed with the two CT-based attenuation maps utilising a demons-based registration (MATLAB, MathWorks, MA, USA). Using the baseline scan as the reference image space, the calculation of the penalty gradients for each difference-image prior, as required in equation 4, was performed by applying the non-rigid alignment fields to the current PET2 image before calculation of the penalty gradients. This produced the penalty gradient in the same image space as the PET1 scan, so that the alignment operator estimated from the image registration process was then used to transform the penalty gradients back into the PET2 space. Note that with this methodology each of the PET datasets are reconstructed in their respective native image spaces; the only time alignment is applied is in the calculation of the penalty gradient terms.
For the difference-image prior methods the weights w sk were 1 for all s and k, and α s were given by T 1 /T s where T s denotes the total number of prompt counts recorded in scan s. For DS-PML, β was varied between 0 and 0.2 and ε was 1 × 10 −6 . For DE-PML, β was varied between 0 and 2000 with a Gaussian Parzen window width of 0.02. Finally, for DTV-PML, β values were varied between 0 and 0.1, and ε = 1 × 10 −6 . Whereas in the 2D simulation studies only DE-PML required a masking of voxels for calculation of the penalty gradient, for the real 3D datasets masking was required for all priors due to low sensitivity at the edges of the field-of-view. This penalty mask was calculated using the extent of the patient's head based on the PET1 attenuation map, and cropped to remove the low-sensitivity axial edges of the field-of-view.
The quality of the reconstructed images was assessed by measuring the mean of the tumour and the CV in a background region in both the PET1 and PET2 reconstructed images. To serve as a reference, the original full-counts data were reconstructed into the same image grid, using OSEM with 5 iterations of 21 subsets, and the tumour mean and background CV were measured. Figure 1 shows the tradeoffs between tumour mean and white matter CV as a function of regularisation hyperparameter for the proposed methods. An ideal method would provide a point that coincides with the doublecounts MLEM point with only the same amount of data that is available to the proposed methods. It is clear that compared to a naive longitudinal smooth, the proposed methods are far superior in terms of limiting the bias introduced into the tumour. In particular, the DE-PML method provides high amounts of noise reduction while keeping the tumour mean close to its MLEM value.

A. Experiment 1: Hyperparameter selection
The corresponding tradeoff between tumour %RMSE and white matter CV is shown in figure 2. Here the double-counts MLEM reconstructions exhibit both lower This article is protected by copyright. All rights reserved.  tumour %RMSE and lower white matter CV. The DS-PML method at high β values allows high levels of white matter noise reduction, while approximately keeping tumour %RMSE similar to the MLEM reconstructions. On the other hand, the DE-PML and in particular the DTV-PML methods reduce PET1 tumour error compared to the MLEM reconstructions, at the same time as reducing white matter CV. For the PET2 reconstructions, the three difference-image priors produce similar error levels, due to the region containing a proportionally higher amount of background than in the PET1 case.

Accepted Article
To assess the tradeoff between bias and variance that gives rise to the %RMSE values in figure 2, the tumour mean was plotted against the tumour CV as a function of β for the presented difference-image prior methods, as shown in figure 3. It was observed that both DTV-PML and DE-PML produced noise reduction in the tumour itself, resulting in the lower %RMSE values observed previously. The DS-PML method performed worse than the other two methods in terms of noise reduction. It is interesting to note that in the PET1 reconstructions, DE-PML and DTV-PML with sufficiently high values of β reduced tumour noise to levels considerably below even the double-counts reconstruction, although for the PET2 reconstructions tumour noise plateaued at a CV level greater than the double-counts case. Based on figures 1 to 3, the selection of an optimal β value for each method is non-trivial. In practice, there is a tradeoff between all variables shown in those figures, so that any single β value cannot be optimal for all metrics of image quality. Since optimising so many correlated and anti-correlated metrics simultaneously is difficult, and since ultimately any clinical usage of these algorithms would be subject to the requirements of the particular application in question, we did not attempt to select β values based on a quantitative measure. Instead, to demonstrate both the full potential offered by each method and their respective drawbacks, we selected β values that were sufficiently high so as to approach the noise levels of the double-counts MLEM reconstructions, without excessive penalisation. The selected values are indicated by arrows in figures 1 to 3, and listed in table I.
To test the convergence of the presented reconstruction methods, a single noisy realisation of the Experiment 1 data was reconstructed with each of the methods with the β values listed in table I, as well as with MLEM. All reconstructions were run to 1000 iterations, and the value of the corresponding objective function at each iteration number was calculated. Figure 4 shows the normalised This article is protected by copyright. All rights reserved.  When considering the bias and standard deviation maps (figure 6), it is apparent that all of the proposed methods reduce the voxel-wise variance in background regions due to the noise reduction indicated by figures 1 and 2. However, behaviour in and around the tumour is different for the various methods, with DS-PML maintaining voxel-wise variance at levels similar to MLEM, and with the DE-PML and DTV-PML methods reducing voxel-wise variance in the tumour. In terms of the bias maps, the difference-image prior methods produce similar results to the MLEM reconstructions, but with a slight visible increase in bias in the edges of the PET2 tumour for DE-PML and DTV-PML.

Algorithm
Selected β value DS-PML 9 × 10 −4 DE-PML 1.8 DTV-PML 4 × 10 −4 Table I. β values selected for use in the remainder of the 2D simulation experiments. Figure 7 displays the %RMSE in the tumour region as a function of PET1 and PET2 tumour radius, and figure  8 shows the corresponding results for changing PET1 and PET2 tumour intensities. The CV in background regions is similar to those shown in figures 1 and 2 and so are not replicated.

B. Experiment 2: Varying tumour response
Firstly, it is apparent that for all tumour sizes doubling the number of counts used in the MLEM reconstruction reduces reconstruction errors, reflecting the results shown in figure 2. In general, a smaller tumour radius and intensity causes higher reconstruction errors in MLEM reconstructions.
When using the difference-image priors, error levels are generally the same as or reduced compared to MLEM reconstructions with the same number of counts when the tumour radii vary (figure 7). In particular, with the difference-image prior methods, the highest errors occur when the change in tumour radius is greatest (c.f. bottom right and top left corners of the heat maps shown in figure 7). For the DS-PML method, these extreme cases have %RMSE levels similar to the MLEM reconstructions, while elsewhere %RMSE is reduced slightly due to noise reduction. For the DE-PML and DTV-PML methods, even at the most extreme radii changes, error levels are lower than the MLEM case, in agreement with figure 1. In some cases these errors approach the double-counts MLEM reconstructions.  This article is protected by copyright. All rights reserved.  Figure 9 shows the tumour mean values and the white matter CV as a function of σ w for DS-PML, DE-PML and DTV-PML with β values unchanged from Experiment 2. Also shown are the corresponding values for MLEM with a number of recorded counts varied between 1× and 5× the number used for the difference-image prior methods. As σ w tends towards infinity, the noise is increasingly reduced until at σ w = ∞, the noise levels reach a minimum which is determined by the selected β value. Out of the three priors, the DE-PML method gives the least biased tumour mean values, with σ w = ∞ providing values comparable to MLEM reconstructions.

C. Experiment 3: Five-scan simulation study
Observation of the PET1 and PET2 tumour mean/CV tradeoffs (figure 10) reflects the results seen in figure 3, despite the change of regularisation parameter from β to σ w . In figure 10, increasing σ w reduces tumour CV for the DTV-PML method, and to a lesser extent the DE-PML method. DS-PML has a weaker effect on tumour CV; in the PET1 reconstructions, the tumour noise was maintained at levels similar to the 1× counts MLEM reconstructions, despite the introduction of a bias lowering the measured tumour mean value. Of the three difference-image prior methods, DE-PML provides on average the best tradeoff between noise reduction in the background, noise reduction in the tumour and biasmitigation in the tumour.
When considering the example reconstructed images at σ w = ∞ (figure 11), the improvements are visually obvious. Compared to the MLEM reconstructions with the same level of counts, the difference-image priors produce visually superior images with reduced noise throughout the image. These noise levels are visually similar to the  I. Tumour intensities were kept constant at 8 and 6 (relative to a nominal white matter uptake of 1). Also shown are average %RMSEs across PET1 and PET2 and the difference of these averages from the MLEM averages. Note that in all cases the analysis region was a dilation of the largest (15 mm) tumour, and so for smaller tumours the analysis region contained proportionally more background. It is apparent that across tumour sizes, the difference-image prior methods allow for good reconstruction of the region around the tumour, with error values generally falling between MLEM and double-counts MLEM reconstructions.
MLEM reconstructions with 5× the data. In the tumour itself, the effects of the different methods manifest themselves. The DS-PML reconstruction provides a tumour that appears similar to the MLEM reconstruction with the same amount of data, due to the tendency of this method to leave regions of change relatively unaltered. For the DE-PML and DTV-PML methods, the tumour itself is visually improved compared to MLEM with the same number of counts, due to noise reduction in the tumour.
Finally, figure 12 shows the mean-squared-error (MSE, equal to the sum of voxel-wise bias squared and voxelwise standard deviation squared) images for each of the methods. Compared to MLEM reconstructions with the same amount of data, the difference-image prior reconstructions reduce image-wide MSE, due to noisereduction in shared regions. In the tumour the levels of error reduction vary between the proposed methods. The DS-PML method slightly reduces error compared to This article is protected by copyright. All rights reserved. the MLEM reconstructions, due to the small amount of noise-reduction that occurs when voxel-values are longitudinally biased offsetting the bias itself, as previously reported 26 . For the DE-PML and DTV-PML methods, error reduction in the tumour is stronger. Here we see that despite the correct estimation of tumour means provided by the DE-PML method in figure 9, this method still produces higher voxel-wise errors in the tumour compared to DTV-PML. This suggests that the tumour means are maintained with DE-PML at the expense of higher levels of tumour voxel variance.

D. Experiment 4: Application to real data for head and neck cancer
The tumour mean and background CV for the real data experiment are shown in figure 13. Similarly to the results seen in the simulation experiments, increasing β reduces background noise while affecting the tumour mean. In the PET1 reconstructions, DS-PML and DTV-PML introduced a negative bias into the tumour   mean, whereas DE-PML produced a positive bias. In the PET2 reconstructions, DS-PML and DTV-PML produced tumour mean values similar to standard reconstruction techniques (i.e. with β = 0), which themselves were positively biased compared to the double-counts reference OSEM reconstruction. It was also observed that while the background CV in the PET1 images reduced to levels similar to the double-counts OSEM reconstruction, in agreement with the simulation study results, the PET2 background CV did not fall by the same amount, remaining considerably higher than the double-counts reconstruction even at the highest β values used.
Based on figure 13, β values were selected for the real data case. These values were 0.2 for DS-PML, 2000 for DE-PML, and 0.04 for DTV-PML. Figure 14 shows the reconstructed images for each of these methods, alongside the double-counts OSEM reconstruction and the OSEM reconstruction with the same data as the difference-image prior methods. It was noted that the tumour in the PET2 image consisted of two distinct peaks as opposed to the single peak observed in the PET1 images. The background noise reduction indicated by figure 13 is visible in the difference-image prior methods, particularly in the PET1 scans. The appearance of the tumours generally appears the same as the OSEM methods, although DTV-PML smooths the tumour slightly. The distinction between the three methods is most pronounced when considering the difference images though, where the mechanism of each of the three priors becomes apparent. The DS-PML method successfully sets a large portion of the difference image to zero (or close to zero), enhancing the visibility of the remaining changes. DE-PML has also set much of the difference image background to zero, with relatively few remaining intense voxels, mainly in the tumour. Lastly, the DTV-PML method has effectively smoothed the difference image. It is noted that in this real data case, imperfections in the alignment process have left other large changes present in the difference image, most noticeably around the cerebellum.
Finally, figure 15 shows the profiles through the tumour in the PET1, aligned PET2, and difference images for each reconstruction method. In the PET1 DE-PML reconstruction ( figure 15(a)), the tumour peak is considerably higher than the other reconstruction methods, reflecting the positive bias observed in figure 13. In the PET2 reconstructions ( figure 15(b)), the overestimation This article is protected by copyright. All rights reserved.   figure 15(c)) show the overestimation of the difference provided by DE-PML, and also the effect of each method in the observed differences in the smaller peak of the PET2 tumour, which was more intense in PET2 than in PET1 (red circle). DS-PML and DE-PML suppress this small real change relative to the OSEM reconstructions, whereas DTV-PML better preserves it.

IV. DISCUSSION
In this work we have built upon the concept of simultaneous reconstruction of longitudinal PET datasets via the use of difference-image penalty terms. Specifically, we have proposed novel penalties that encourage low entropy difference images (DE-PML) and difference images with sparse spatial gradients (DTV-PML). The results of 2D simulation study experiments have shown that using longitudinal difference-image priors to reconstruct S longitudinal scans can reduce image noise up to levels typically achieved by standard reconstructions using S× the number of recorded counts. However, there is a tradeoff between the level of background noise reduction achieved and the reconstruction of regions of change between scans.
In particular, the results from Experiment 1 showed that all of the proposed penalty terms introduce some level of bias in terms of estimation of tumour mean values with sufficiently high penalty strengths ( figure 1). Nonetheless, the entropy-based method DE-PML was observed to produce the lowest levels of bias when using a double-counts MLEM as the reference. On the other hand, when considering reconstruction error in and around the tumour, the DTV-PML method produced the lowest %RMSE values, beating even the DE-PML method (figure 2). Given that the DE-PML method reconstructs tumour means more accurately, the improvement in %RMSE seen with the DTV-PML method results from noise reduction within the tumour, an observation reinforced by the lower voxel-wise variance observed in figure 6. Overall though, the level of bias introduced by any of the proposed methods was small, and inspection of the longitudinal trends of the tumour mean ( figure 5) shows that for this particular case (intensity reduction of 25% and radius reduction of 47%), even a relatively strong regularisation in terms of high levels of background noise reduction allows for clear discernment of the behaviour of the tumour.
When the regularisation level was held constant and tumour response varied, the performance of the proposed methods was largely stable. At large radius changes (with fixed intensities), error levels began to rise, with DTV-PML producing the lowest errors of the three penalties ( figure 7). This is due to the noise reduction properties observed in Experiment 1. However, in general, with any of the three methods error levels in the tumour as a function of response are more complicated than for MLEM reconstructions. Before use in a clinical setting, the proposed methods would need to be validated across a representative, application-specific set of tumour responses to ensure that they are robust to the range of changes anticipated in that context. Such a validation study is beyond the scope of this present work which focuses primarily on the proposal of new reconstruction methodologies.
Extending the difference-image prior methods to the five-scan 2D case reinforced the results seen previously with the two-scan case. In particular, the tradeoff between background noise reduction and tumour bias was shown to hold true for varying σ w , which controlled the coupling between scans throughout the longitudinal series (figure 9). At σ w = ∞, maximum noise reduction was achieved at the expense of bias levels similar to those observed in the two-scan case. Also reflected from the two-scan case was the ability of the DE-PML method to more accurately reconstruct tumour means, and the error reduction of DTV-PML due to noise reduction in the tumour ( figure 12). In the five-scan case, the visual impact of using difference-image priors is stronger in the reconstructed images due to the greater amount of data that is available to be shared between the scans.
Overall, the results of the real data case study reflected those of the simulation experiments, with increasing β reducing background image noise and preserving tumour appearance. Again, some bias was introduced into the tumour mean relative to the standard OSEM reconstructions. However, the real data experiment also highlighted some of the issues with the three difference-image priors, particularly when there are additional changes between the images beyond those caused by the tumour response, This article is protected by copyright. All rights reserved. Figure 14. Example images for the real data case study. From left to right: PET1 images in their native space; PET2 images in their native space; PET2 images aligned to the PET1 image space; and difference images in the PET1 image space (i.e. column 3 minus column 1). Top row shows CT images with tumour (red) and background (blue) regions of interest shown for reference. PET reconstructions include the double-counts OSEM reconstructions, OSEM reconstructions, and reconstructions using DS-PML, DE-PML and DTV-PML, each with β values as indicated in figure 13. These latter three difference-image prior methods reduce noise in the background of the image while maintaining the visual appearance of the tumour. Each also displays a distinctive difference image reflecting the characteristics encouraged by each prior.

Accepted Article
such as those that were observed in the brain of the patient. For the DS-PML method, the existence of these additional strong changes could be the cause of the suppression of the tumour differences caused by the appearance of a second tumour peak in the PET2 scan ( figure  15, red circle). Because the amplitude of this valid tumour change was lower than the differences observed elsewhere in the image, it was suppressed in place of these other changes in order to encourage the sparsity sought by the prior term. Likewise, the existence of the changes within the brain is suspected to be the cause of the larger biases observed with the DE-PML method; by attempt-ing to produce a low entropy difference image, similar changes can be encouraged to have the same value, even over large spatial distances. This is reflected in figure  14, where it can be seen that the changes in the brain and the tumour in the DE-PML difference image seem to have similar values, despite them appearing distinct in the other difference images. This issue could be alleviated in future by adopting a local entropy prior on the difference image, encouraging spatially distant parts of the image to possess their own low-entropy distributions without allowing them to interfere with each other. Contrary to the other two methods, the DTV-PML method This article is protected by copyright. All rights reserved. Figure 15. Line profiles through the tumour for (a) the PET1 images, (b) the aligned PET2 images, and (c) the difference images (bottom). The red circle highlights differences due to a region that increased in uptake from PET1 to PET2 that was suppressed by the DS-PML and DE-PML methods but retained by the DTV-PML method.

Accepted Article
seems to deal better with the existence of other changes in the difference image, maintaining the second tumour peak that was suppressed by the other two methods (figure 15, red circles). However, the DTV-PML method still introduced bias into the tumour reconstructions because of spatial smoothing applied by the penalty in the difference image domain. This reflects the results seen in the simulation studies (e.g. figure 3(a)), where the tumour was smoothed relative to the standard MLEM reconstructions.
The proposal of longitudinal image reconstruction for PET images is, as far as the authors are aware, still a novel concept previously proposed only by our group 26 . In MR imaging however, the coupling of longitudinal datasets has been previously proposed, via the use of adaptive data acquisition and penalisation of differences between the follow-up image and the baseline image in the reconstruction 32,33 . While there are similarities between this method and the ones we describe here and previously 26 , coupling of MR datasets is distinct from the PET case since in MR imaging information loss in the form of data undersampling results in image artefacts and not necessarily increased image noise. Therefore, in MR, it is possible to have a single high quality baseline scan to be able to improve the follow-up scan in a manner analogous to guided reconstructions in PET. On the other hand, in PET imaging, the noise levels are almost always high. In this case, reconstructing the follow-up scan by penalising differences with respect to a pre-reconstructed baseline image would not be useful. By penalising both images simultaneously as in our proposed framework, noise can be effectively suppressed at early iterations, achieving the inter-scan transfer of information for the PET case.
Another area of research related to longitudinal reconstruction is that of dynamic, or 4D PET image reconstruction. In 4D reconstruction, the temporal behaviour of a tracer distribution is required, and there exist many methodologies for estimating this 34 . However, there are notable differences between 4D reconstruction methods and longitudinal reconstruction methods. Firstly, 4D reconstructions generally have access to all data from within a dynamic acquisition, which ensures that timeframes are adjacent. This means that it is easier to encourage temporal similarity between frames, since it is safe to assume that no large changes have occurred. In the longitudinal case, the scans could quite feasibly be separated by months, which is ample time for large changes to occur. Secondly, because the overall aim of 4D reconstruction is often to estimate kinetic parameters according to pharmacokinetic compartmental models, there can be a greater level of robustness to biases in individual isolated voxel values that can be caused by the model fitting. On the other hand however, in longitudinal studies the exact values of voxels and ROIs can be very important for a given single scan, often more so than the case of a single frame within a 4D reconstruction. Finally, in 4D PET contexts, the number of time frames (commonly pseudo-continuous temporal sampling is possible when reconstructing in list-mode; or at least > 10 timeframes are used) is considerably higher than the typical number of scans available for longitudinal studies (2 is standard; 5 would be considered high). This limitation of longitudinal studies means that it is more difficult to devise meaningful longitudinal models which would be suitable for direct incorporation into the image reconstruction method.
The methods and results presented in this work have, of course, their limitations. Firstly, and most importantly, the introduction of bias into the tumour region could be a cause for concern. However, as shown in this work, these biases are relatively small compared to typically expected variations due to tumour progression or response. Furthermore, the amount of bias that is acceptable will be application specific. For example, in tumour detection contexts, the benefit of background noise reduction provided by our proposed methods might be considered worth the bias introduced into the tumour. On the other hand, in tumour response scans where the threshold for responder or non-responder is very fine these methods might be of limited benefit (or at least lower penalty strengths would need to be used).
As highlighted by the real data experiment, another limitation of the presented difference-image prior reconstruction methods is the need for high-quality alignment fields to be used during the reconstruction. Errors in alignment fields introduce additional features into the difference images, which can adversely impact the effectiveness of the priors as described above. In this work we used non-rigid deformation fields estimated using a demons-based method, which resulted in difference images that contained some residual changes that were not due to the tumour. Future work would require refinement of the registration process by considering other registration algorithms and optimising all relevant parameters, although the most appropriate registration algorithm will depend on the application. In this respect, previous work aiming to provide regional or voxel-level analysis of tumour differences 35-38 might be informative as to the applications that are most well-developed for incorporation of the methods presented in this paper. Another opportunity for improvement, with the increased usage of simultaneous PET-MR scanners, is the use of MR images, with superior soft tissue contrast over CT, to perform the registration. Furthermore, incorporating the estimation of motion fields during PET image reconstruction is a growing area of research [39][40][41] , which could have potential for refining alignment fields estimated from other imaging modalities.
Ultimately, however, the methods presented in this work need to be tested in an application-specific manner to ascertain which areas could benefit most from their use. This would include testing over many representative cases to allow parameter values like β and σ w (or rules for setting them) to be derived based on a population, rather than the limited single object used in the simulations in this work. It is important to note that while this work has focused on oncology treatment response scans as an application of longitudinal imaging, there are many others where difference-image priors could be of use, such as in the long-term observation of neuro-degenerative disorders 42,43 , ictal and interictal subtraction imaging in epilepsy 44,45 , and functional brain studies that explore changes in brain activation arising from drugs or other stimuli 46-48 .

V. CONCLUSIONS
Longitudinal PET scans are often acquired in order to observe and quantify physiological and molecular changes in vivo over extended time scales. Typically these scans are reconstructed independently using standard PET reconstruction algorithms such as MLEM. Simultaneous reconstruction of longitudinal PET datasets is a newly proposed area of research which has opened up many possibilities for improvement of longitudinal PET images. This work builds on the promising previous preliminary results by proposing additional difference-image priors and evaluating their performance. In simulation studies, results showed that considerable noise reduction can be achieved across background portions of the image. Performance of the reconstruction methods within the tumour itself vary, with encouragement of a low-entropy difference image maintaining low levels of bias in the tumour, while encouraging difference images with few spatial gradients provides the lowest reconstruction errors by way of noise reduction at the cost of induced tumour bias. Application to a real data case study showed similar improvements, but highlighted the need for high-quality alignment fields to allow the proposed methods to perform optimally.
Future work will require exploration of the issue of mis-registration of longitudinal scans by exploring other registration algorithms and optimising parameters. Furthermore, evaluation of the performance of the proposed methods in an application-specific manner will need to be carried out in order to ascertain which areas of longitudinal PET imaging, not limited to oncology, will benefit from simultaneous image reconstruction. Ultimately, the coupling of longitudinal datasets during image reconstruction has the potential to improve longitudinal PET images 'for free', if current injected doses are used, or to reduce injected dose (and therefore the patient's absorbed dose) by a factor approximately equal to the number of longitudinal scans and maintain current levels of image quality.