Volume 48, Issue 5, pp. 2230-2244
Research Article

Dynamic PET image reconstruction utilizing intrinsic data-driven HYPR4D denoising kernel

Ju-Chieh (Kevin) Cheng (Corresponding Author)

Pacific Parkinson’s Research Centre, The University of British Columbia, 2215 Wesbrook Mall, Vancouver, BC, V6T 1Z3 Canada

Department of Physics and Astronomy, The University of British Columbia, 6224 Agricultural Road, Vancouver, BC, V6T 1Z1 Canada

Author to whom correspondence should be addressed. Electronic mail: [email protected].

Connor Bevington

Department of Physics and Astronomy, The University of British Columbia, 6224 Agricultural Road, Vancouver, BC, V6T 1Z1 Canada

Arman Rahmim

Department of Physics and Astronomy, The University of British Columbia, 6224 Agricultural Road, Vancouver, BC, V6T 1Z1 Canada

Department of Radiology, University of British Columbia, Vancouver, BC, V5Z 1M9 Canada

Ivan Klyuzhin

Department of Medicine, Division of Neurology, University of British Columbia, Vancouver, BC, V6T 2B5 Canada

Julian Matthews

Division of Neuroscience and Experimental Psychology, Wolfson Molecular Imaging Centre, The University of Manchester, Manchester, M20 3LJ UK

Ronald Boellaard

Department of Radiology and Nuclear Medicine, VU University Medical Center, De Boelelaan 1117, Amsterdam, 1081 HV Netherlands

Department of Nuclear Medicine and Molecular Imaging, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713 KC Groningen, Netherlands

Vesna Sossi

Department of Physics and Astronomy, The University of British Columbia, 6224 Agricultural Road, Vancouver, BC, V6T 1Z1 Canada
First published: 03 February 2021

Abstract

Purpose

Reconstructed PET images are typically noisy, especially in dynamic imaging where the acquired data are divided into several short temporal frames. High noise in the reconstructed images translates to poor precision/reproducibility of image features. One important role of “denoising” is therefore to improve the precision of image features. However, typical denoising methods achieve noise reduction at the expense of accuracy. In this work, we present a novel four-dimensional (4D) denoised image reconstruction framework, which we validate using 4D simulations, an experimental phantom, and clinical patient data, that achieves 4D noise reduction while preserving spatiotemporal patterns and minimizing the error introduced by denoising.

Methods

Our proposed 4D denoising operator/kernel is based on HighlY constrained backPRojection (HYPR), which is applied either after each update of OSEM reconstruction of dynamic 4D PET data or within the recently proposed kernelized reconstruction framework inspired by kernel methods in machine learning. Our HYPR4D kernel makes use of the spatiotemporal high frequency features extracted from a 4D composite, generated within the reconstruction, to preserve the spatiotemporal patterns and constrain the 4D noise increment of the image estimate.

Results

Results from simulations, experimental phantom, and patient data showed that the HYPR4D kernel with our proposed 4D composite outperformed other denoising methods, such as the standard OSEM with spatial filter, OSEM with 4D filter, and HYPR kernel method with the conventional 3D composite in conjunction with recently proposed High Temporal Resolution kernel (HYPRC3D-HTR), in terms of 4D noise reduction while preserving the spatiotemporal patterns or 4D resolution within the 4D image estimate. Consequently, the error in outcome measures obtained from the HYPR4D method was less dependent on the region size, contrast, and uniformity/functional patterns within the target structures compared to the other methods. For outcome measures that depend on spatiotemporal tracer uptake patterns such as the nondisplaceable Binding Potential (BPND), the root mean squared error in regional mean of voxel BPND values was reduced from ~8% (OSEM with spatial or 4D filter) to ~3% using HYPRC3D-HTR and was further reduced to ~2% using our proposed HYPR4D method for relatively small target structures (~10 mm in diameter). At the voxel level, HYPR4D produced two to four times lower mean absolute error in BPND relative to HYPRC3D-HTR.

Conclusion

As compared to conventional methods, our proposed HYPR4D method can produce more robust and accurate image features without requiring any prior information.

1 INTRODUCTION

Positron emission tomography (PET) is a functional imaging modality with significant capabilities to image and quantify numerous biochemical and molecular processes in vivo.1-3 Reconstructed PET images, however, are typically noisy, especially in dynamic imaging where the acquired data are divided into several short temporal frames. The signal-to-noise ratio (SNR) in these short frames can be very poor since the SNR is directly proportional to the number of acquired counts and thus frame duration for a given count rate.4 Poor SNR in the reconstructed images translates to poor precision or reproducibility of image features such as nondisplaceable binding potential (BPND). One important role of “denoising” is therefore to improve the precision of image features. For the case of PET, denoising can also be applied to monitor tracers for a longer period of time or to reduce the injected dose.5 However, typical denoising methods achieve noise reduction at the expense of accuracy.4, 5 It is thus highly desirable to minimize bias introduced by denoising.

Recently, we proposed a denoised image reconstruction method, HYPR-AU-OSEM,4 which incorporates HighlY constrained back-PRojection (HYPR) denoising6, 7 directly within the widely used Ordinary Poisson Ordered Subset Expectation Maximization (OSEM) algorithm8 (see Section 2 for the meaning of AU). Our previous work demonstrated that HYPR-AU-OSEM achieves spatial noise reduction and improves the reproducibility in spatial image features such as contrast recovery without degrading accuracy in terms of spatial resolution and contrast within single frame reconstruction. Furthermore, the method does not require any prior information and is not computationally intensive. In the present work, we present the 4D extension of our denoised reconstruction for dynamic imaging (i.e., HYPR4D-AU-OSEM) and the incorporation of our 4D denoising operator within the recently proposed kernelized reconstruction inspired by kernel methods in machine learning9 (i.e., HYPR4D-K-OSEM).

In dynamic PET imaging, denoising methods such as HYPR and the Non-Local Means (NLM) kernel method make use of composite image(s)/feature vector(s), generated by summing the temporal data,7, 9, 10 to achieve the favorable noise characteristics of the composite while preserving spatial resolution. The drawbacks of this approach are: (a) a mismatch in contrast between the composite and target image due to the change in tracer distribution over time can introduce bias into spatiotemporal features such as the magnitudes and shapes of the time-activity curves (TAC), (b) the sum of temporal data can still be quite noisy, thus providing insufficient spatial noise reduction, and (c) this type of 3D composite is also not very effective in constraining temporal noise.11 In this work we propose a novel intrinsic data-driven spatiotemporal 4D composite within HYPR4D-AU-OSEM and HYPR4D-K-OSEM to remedy the drawbacks of conventional denoising methods.

In an effort to minimize potential bias introduced by external prior information such as differences and mis-matches between anatomical and functional features commonly observed in MR-guided denoising methods5 or models which rely on specific tracer kinetics typically used in 4D reconstructions,12, 13 we make use of only the PET data set itself to achieve denoising within the reconstruction task without any prior knowledge of the spatiotemporal patterns of the imaging tracer/biomarker. The proposed prior-free method was evaluated by first using an experimental phantom study to demonstrate the proof of concept that the proposed 4D denoising operator/kernel achieves noise reduction while preserving 4D resolution. The phantom study also showed a practical advantage of HYPR4D-K-OSEM over HYPR4D-AU-OSEM. Then we validated our proposed HYPR4D kernel method and compared to standard and denoised reconstructions using 4D simulation and clinical human data. The impact of 4D denoised reconstructions on parametric BPND estimates for target structures of different sizes, contrasts, and uniformities/functional patterns was also investigated.

2 MATERIALS AND METHODS

2.A HYPR denoising

Originally developed for time-resolved MR angiography,6 HYPR is a denoising method which has been demonstrated to improve SNR and detection sensitivity of imaging features in PET.7, 14 The image space HYPR post-processing is defined as:
$$H_t = C_t \cdot \frac{F \ast I_t}{F \ast C_t} \qquad (1)$$
where $H_t$ is the HYPR denoised image for the tth time frame of the dynamic series, $I_t$ is the target image at the tth time frame, $C_t$ is the composite image for the tth time frame, $F$ is the filter function (e.g., Gaussian), and $\ast$ represents the convolution operation. The composite image is typically defined as the weighted sum of dynamic frame images according to the frame duration or counts for the case of PET.7, 10

This denoising method is effective under the condition that the composite has similar resolution as compared to the target, and high frequency features in the composite contain less noise than those in the target. Furthermore, the contrast in the composite needs to be close to that in the target; otherwise, the composite can introduce bias in the denoised image.10 Note that the high frequency features in the image contain information about structure boundaries and noise. The noise in the HYPR denoised image depends on both the noise in the composite and that in the filtered target image. The filter kernel size controls the amount of high frequency features extracted from the composite and those reduced from the target; for example, narrower kernels would preserve more high frequency features from the target than wider kernels. Once the kernel reaches a certain size such that the filtered target contains negligible noise as compared to the composite, the noise in the HYPR denoised image would be purely defined by that in the composite (i.e., the best achievable noise performance). As a result, a narrow filter kernel is typically used to limit bias when the contrast in the composite is very different from that in the target, and a wide kernel can be used otherwise.
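As a concrete illustration, below is a minimal sketch of the image-space HYPR operation in Eq. (1), assuming NumPy/SciPy; the array shapes, Gaussian width, and the simple frame-averaged composite are illustrative choices rather than the exact processing used in this work.

```python
# Minimal sketch of image-space HYPR post-processing, Eq. (1): H = C * (F*I) / (F*C).
# All names, shapes, and the Gaussian width are illustrative placeholders.
import numpy as np
from scipy.ndimage import gaussian_filter

def hypr_denoise(target, composite, sigma_vox=2.0, eps=1e-8):
    smoothed_target = gaussian_filter(target, sigma_vox)        # F * I_t
    smoothed_composite = gaussian_filter(composite, sigma_vox)  # F * C_t
    return composite * smoothed_target / (smoothed_composite + eps)

# Toy dynamic series: 16 frames of a 64^3 volume; a frame-averaged composite stands in
# for the count-weighted sum described in the text.
frames = np.random.poisson(5.0, size=(16, 64, 64, 64)).astype(float)
composite = frames.mean(axis=0)
denoised_frame = hypr_denoise(frames[3], composite)
```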

2.B HYPR-AU-OSEM

An effective strategy for incorporating HYPR denoising within the OSEM reconstruction has been proposed previously for single frame reconstruction.4 By incorporating the HYPR denoising operator (A)fter each OSEM (U)pdate (i.e., HYPR-AU-OSEM) and progressively updating the composite using the sum or superposition of denoised subset images from the previous iteration, one can effectively achieve noise reduction while preserving the spatial resolution and contrast. As different OSEM subset data correspond to different realizations of the same tracer distribution observed from different angular views for the projection-based reconstruction, they do not necessarily agree with one another especially at low count situations. As a result, “limit cycle” behavior or oscillation in image features such as contrast recovery across subset estimates has been typically observed. Moreover, at the end of each OSEM iteration, the estimate is biased toward the last subset of data.

On the other hand, the OSEM subset data and corresponding estimates also do not share the same noise pattern. Consequently, by summing the subset images, the true signals exhibit mostly “constructive interference,” whereas random noise is less likely to pile up. Furthermore, early updates of reconstruction contain low noise as the reconstruction process typically starts with a noise-free initial estimate. As a result, the high frequency features in the resultant composite image, predominantly obtained from early updates of the reconstruction, contain less noise than those in the target image. In addition, using the sum of the subset images as the composite for denoising also reduces the limit cycle behavior and makes the OSEM estimate less biased toward the last subset of data (see the structural similarity index comparison in the results section). In other words, inconsistent patterns/features across subsets are discouraged. The HYPR-AU-OSEM is given by:
$$\lambda^{m,s} = H\!\left(I^{m,s}\right) = C^{m}\cdot\frac{F \ast I^{m,s}}{F \ast C^{m}}, \qquad I^{m,s} = \frac{\lambda^{m,s-1}}{P_s^{T}\mathbf{1}}\, P_s^{T}\!\left(\frac{y_s}{P_s\,\lambda^{m,s-1} + b_s}\right), \qquad C^{m} = \sum_{s=1}^{S}\lambda^{m-1,s} \qquad (2)$$
where $y_s$ is the measured projection data for the sth subset (s = 1, ..., S), $\lambda^{m,s}$ is the image estimate at the sth subset of the mth iteration, and $b_s$ is the background contamination in the PET measurement such as the scatter and randoms for the sth subset. $P_s$ is the system matrix of the scanner for the sth subset, $H$ is the HYPR denoising operator, $I^{m,s}$ is the target image at the sth subset of the mth iteration, $C^{m}$ is the composite at the mth iteration, $F$ is the filter function (Gaussian in this case), and $\ast$ represents the convolution operation.
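The update above can be sketched as follows for a toy problem with dense per-subset system matrices; the uniform initialization, the projector, and the averaging (rather than summing) of subset images into the composite are simplifying assumptions for illustration only.

```python
# Sketch of HYPR-AU-OSEM, Eq. (2): a standard OSEM subset update followed by HYPR
# denoising, with the composite rebuilt from the denoised subset images of the
# previous iteration. Inputs and initialization are illustrative placeholders.
import numpy as np
from scipy.ndimage import gaussian_filter

def hypr(target_img, composite_img, sigma=2.0, eps=1e-8):
    return composite_img * gaussian_filter(target_img, sigma) / (
        gaussian_filter(composite_img, sigma) + eps)

def hypr_au_osem(P_subsets, y_subsets, b_subsets, img_shape, n_iter=4, sigma=2.0):
    lam = np.ones(img_shape)          # uniform initial estimate (placeholder)
    composite = lam.copy()            # would normally come from one iteration of OSEM
    for m in range(n_iter):
        subset_imgs = []
        for P, y, b in zip(P_subsets, y_subsets, b_subsets):
            sens = P.T @ np.ones(P.shape[0])                          # subset sensitivity image
            ratio = y / (P @ lam.ravel() + b + 1e-8)
            target = (lam.ravel() / (sens + 1e-8)) * (P.T @ ratio)    # OSEM target I^{m,s}
            lam = hypr(target.reshape(img_shape), composite, sigma)   # H applied after the update
            subset_imgs.append(lam)
        # Mean instead of sum: the composite magnitude cancels in the HYPR ratio.
        composite = np.mean(subset_imgs, axis=0)
    return lam
```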

2.C A novel 4D composite and HYPR4D-AU-OSEM

As mentioned in the introduction, the conventional 3D composite generated by summing temporal data may introduce bias and is not effective in 4D noise reduction. In this work, our proposed spatiotemporal 4D composite is based on the following observation: in our 3D denoised reconstruction framework where no temporal information was utilized, the composite was generated directly within the single frame reconstruction and updated for each OSEM iteration as the sum of the preceding subset images from the previous iteration.4 Since there is no mismatch in tracer distribution between the composite and target, a wide denoising kernel can be used to effectively constrain noise as previously demonstrated. When reconstructing a 4D data set (i.e., multiple temporal frame images) using this approach, one can generate a frame-specific composite for each time point. This dynamic series of spatial composites forms the 4D composite (i.e., there is a one-to-one dynamic voxel or "doxel" matching between the 4D image estimate and the 4D composite), which enables the denoising operation to be applied in both the spatial and temporal domains through the HYPR4D operation described shortly. The conceptual illustration of the conventional 3D composite and the proposed 4D composite is shown in Fig. 1.

Fig. 1. Conceptual representation of the conventional three-dimensional (3D) composite (sum of temporal data) and the proposed four-dimensional composite (sum of spatial subset data). Note that all temporal information is lost in the conventional 3D composite.
The proposed HYPR4D denoising operator (H4D^m) is defined in Eq. (3), where m and s correspond to the OSEM iteration and subset indices, respectively; C4D and I4D represent the 4D composite and the target image, respectively, and F4D is the 4D Gaussian filter (3D spatial + 1D temporal convolutions) in this case. Here the contrast and the magnitude/shape of the TACs in the composite are ensured to be close to those in the target since the 4D composite is updated every iteration. This intrinsically computed 4D composite enables a voxel-specific, data-driven temporal kernel to guide the temporal denoising process and allows the temporal information from the data to be utilized effectively (i.e., a spatially variant temporal kernel), as will be described in more detail later.
$$H4D^{m}\!\left(I4D^{m,s}\right) = C4D^{m}\cdot\frac{F4D \ast I4D^{m,s}}{F4D \ast C4D^{m}} \qquad (3)$$
The HYPR4D-AU-OSEM (hereafter referred to as the AU method) is thus given by:
$$\lambda4D^{m,s} = H4D^{m}\!\left(\frac{\lambda4D^{m,s-1}}{P_s^{T}\mathbf{1}}\, P_s^{T}\!\left(\frac{y_s}{P_s\,\lambda4D^{m,s-1} + b_s}\right)\right), \qquad C4D^{m} = \sum_{s=1}^{S}\lambda4D^{m-1,s} \qquad (4)$$
Here all dynamic images are updated at once as the 4D image estimate (λ4D).
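A minimal sketch of the HYPR4D operation in Eq. (3) is given below, assuming the 4D image estimate and the 4D composite are stored as (frame, z, y, x) arrays with one-to-one doxel matching; the spatial and temporal sigmas are placeholder values.

```python
# Sketch of the HYPR4D operator, Eq. (3): a 4D (1 temporal + 3 spatial) Gaussian is
# applied to both the 4D target and the frame-matched 4D composite.
import numpy as np
from scipy.ndimage import gaussian_filter

def hypr4d(target_4d, composite_4d, sigma_spatial=2.0, sigma_temporal=1.0, eps=1e-8):
    """target_4d, composite_4d: (T, Z, Y, X) arrays with one-to-one doxel matching."""
    sig4d = (sigma_temporal, sigma_spatial, sigma_spatial, sigma_spatial)  # F4D
    return composite_4d * gaussian_filter(target_4d, sig4d) / (
        gaussian_filter(composite_4d, sig4d) + eps)
```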

2.D HYPR4D-K-OSEM

Typically, kernelized reconstruction methods5, 9 reparameterize the EM algorithm into an alternative set of spatial basis functions (i.e., kernel matrix K) and kernel coefficients (α). In this work, the proposed HYPR4D denoising operator forms a set of spatiotemporally variant convolutional basis functions within the kernel matrix which constrains the noise increment in both spatial and temporal domains while effectively updating the 4D contrast. The proposed 4D denoised kernel OSEM (HYPR4D-K-OSEM) is defined as:
$$\alpha4D^{m,s} = \frac{\alpha4D^{m,s-1}}{\left(P_s K_{H4D}^{m}\right)^{T}\mathbf{1}}\,\left(P_s K_{H4D}^{m}\right)^{T}\!\left(\frac{y_s}{P_s K_{H4D}^{m}\,\alpha4D^{m,s-1} + b_s}\right), \qquad \lambda4D^{m,s} = K_{H4D}^{m}\,\alpha4D^{m,s} \qquad (5)$$
where the HYPR4D kernel matrix (KH4Dm) is given by:
$$K_{H4D}^{m} = \operatorname{diag}\!\left(h^{m}\right) F4D, \qquad h^{m} = \frac{C4D^{m}}{F4D \ast C4D^{m}} \qquad (6)$$
Here the spatiotemporally variant convolutional kernel matrix is decomposed into the self-normalized spatiotemporal weights extracted from the 4D composite for the preservation of 4D high frequency features (hm) and the spatiotemporally invariant 4D Gaussian convolution (F4D). The sparsity of the kernel matrix only depends on the width of the 4D Gaussian since the matrix which contains hm is diagonal. Similar to the AU method, the proposed kernel matrix is updated along with the 4D composite every iteration. For both AU and kernel methods, one iteration of standard OSEM was used to initialize the composite (i.e., sum of subset updates within the first iteration of OSEM) in the 4D denoising operator and kernel matrix. The one OSEM iteration images are also used as the input initial 4D estimate for Eqs (4) and (5). After the first HYPR4D iteration, the composite is updated using the denoised subset images from the previous iteration as shown in Eqs (3) and (6) thus providing a highly constrained noise increment per update and allowing the 4D high frequency features to be updated in a cleaner fashion as compared to conventional methods. This progressive update also makes the kernel matrix more adaptive to the measured data.
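Because the kernel matrix in Eq. (6) is a diagonal weight matrix times a Gaussian convolution, it never needs to be formed explicitly. The sketch below (with illustrative sigma values) applies the kernel and its transpose as convolutions, using the symmetry of the Gaussian convolution matrix.

```python
# Sketch of applying the HYPR4D kernel of Eq. (6) matrix-free:
#   K . a   = h * (F4D conv a)        (diag(h) applied after the 4D Gaussian)
#   K^T . x = F4D conv (h * x)        (the Gaussian convolution matrix is symmetric)
# Sigma values are illustrative placeholders.
import numpy as np
from scipy.ndimage import gaussian_filter

SIGMA_4D = (1.0, 2.0, 2.0, 2.0)   # (temporal, z, y, x) in frames/voxels

def kernel_weights(composite_4d, eps=1e-8):
    """Self-normalized 4D high-frequency weights h = C4D / (F4D conv C4D)."""
    return composite_4d / (gaussian_filter(composite_4d, SIGMA_4D) + eps)

def apply_kernel(h, alpha_4d):          # K . alpha
    return h * gaussian_filter(alpha_4d, SIGMA_4D)

def apply_kernel_transpose(h, x_4d):    # K^T . x
    return gaussian_filter(h * x_4d, SIGMA_4D)
```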
To summarize, the proposed AU/kernel method:
  • intrinsically generates a PET data-driven 4D composite starting from the early iteration of the reconstruction (i.e., the sum of the subset updates within the first OSEM iteration in this case; the initial 4D composite defines the best achievable noise performance);
  • uses the HYPR4D denoising operator/kernel to extract the 4D high frequency features from the 4D composite; these 4D high frequency features or doxel-specific weights are what make the 4D convolution spatiotemporally “variant”;
  • applies the extracted 4D high frequency features to guide the denoising process across all OSEM subset updates for the next iteration of reconstruction;
  • progressively updates the 4D composite/kernel matrix at the end of each OSEM iteration using the denoised subset images from the preceding iteration.

Typically, early termination of the reconstruction (i.e., stopping at a relatively low number of iterations) has been commonly used to control noise in EM based reconstruction. This approach has also been employed in this work; other example criteria for when to stop iterating can be found in Section 4. In an effort to accelerate the computation of the 4D kernel operation, the attenuation image or µ-map, which consists of the linear attenuation coefficients of the object, was used as a mask for the extraction of temporal high frequency features and the temporal convolution process. In other words, the spatially variant temporal kernel was computed only for voxels with nonzero µ-value, as tracking temporal patterns outside the object is not very meaningful and can be time consuming. A Gaussian blurring (5 mm FWHM in this case) was applied to the µ-map to account for possible mismatch between the µ-map and the emission image (e.g., due to motion or error in the coregistration). For the case of the High Resolution Research Tomograph (HRRT),15 only 20–30% of the voxels within the field of view correspond to the scanned object (i.e., a three to five times gain in computation time of the 4D kernel).

2.E HYPRC3D-HTR-K-OSEM

In addition, the HYPR kernel method using the conventional 3D composite (Cs), generated by summing the entire temporal series (It) into a single 3D spatial composite as described in16, in conjunction with the recently proposed High Temporal Resolution (HTR) kernel11 (i.e., HYPRC3D-HTR) was evaluated to demonstrate the advantages of the proposed intrinsic 4D composite (with progressive/adaptive update) using 4D simulations as well as data acquired on the HRRT. Although this spatiotemporal kernel method based on the conventional 3D composite and HTR has been evaluated using the NLM kernel in,11 the HYPRC3D-HTR-K-OSEM presented here is the first demonstration of the HYPR kernel version based on a similar prior composite concept, which is expected to behave similarly to its NLM counterpart. The consistent use of the HYPR kernel in all of our kernel methods also allows fair comparisons between different composite implementations. The 4D kernel matrix (K) used in the HYPRC3D-HTR method is defined by:
$$K = K_s \otimes K_t, \qquad K_s = \operatorname{diag}\!\left(h_s\right) F_s, \qquad K_t = \operatorname{diag}\!\left(h_t\right) F_t, \qquad h_s = \frac{C_s}{F_s \ast C_s}, \qquad h_t = \frac{C_t}{F_t \ast C_t} \qquad (7)$$
where ⊗ is the Kronecker product, Ks and Kt are the spatial and temporal kernel matrices, respectively, hs and ht are the weights which recover the spatial and temporal high frequency features, respectively, and Fs and Ft are the 3D spatial and 1D temporal Gaussian filtering, respectively.

Here the guidance for 4D denoising is predefined using the weights/features extracted from the conventional 3D composite and the temporal composite (Ct), which consists of the trues sinogram counts at each time point t (T(t)). This method therefore uses a prior spatiotemporal composite precomputed with additional reconstruction(s), a fixed kernel matrix (i.e., no progressive/adaptive update), and a spatially invariant temporal kernel; that is, only a single temporal pattern extracted from the PET sinogram data is used to guide the temporal denoising for all voxels. In contrast, the proposed (prior-free) HYPR4D kernel method computes and progressively updates the 4D composite/kernel matrix intrinsically within the denoised reconstruction while utilizing a spatially variant or voxel-specific data-driven temporal kernel. For comparison purposes, the same spatial and temporal kernel sizes as in HYPR4D-K were used (as will be described shortly) to extract the same amount of features from the composite in the HYPRC3D-HTR method.
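Since the kernel in Eq. (7) is a Kronecker product of spatial and temporal factors, it can be applied separably, as in the sketch below; the composites passed in, the sigma values, and the overall pipeline are illustrative placeholders rather than the exact prior-composite implementation evaluated here.

```python
# Sketch of applying the separable HYPRC3D-HTR kernel, Eq. (7): the temporal kernel Kt
# acts along the frame axis and the spatial kernel Ks acts on each frame. Inputs and
# sigma values are illustrative placeholders.
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_filter1d

def hyprc3d_htr_apply(alpha_4d, composite_3d, composite_t, sigma_s=2.0, sigma_t=1.0, eps=1e-8):
    """alpha_4d: (T, Z, Y, X) coefficients; composite_3d: (Z, Y, X); composite_t: (T,)."""
    h_s = composite_3d / (gaussian_filter(composite_3d, sigma_s) + eps)    # fixed spatial weights
    h_t = composite_t / (gaussian_filter1d(composite_t, sigma_t) + eps)    # single temporal pattern
    out = gaussian_filter1d(alpha_4d, sigma_t, axis=0) * h_t[:, None, None, None]   # Kt along time
    out = np.stack([gaussian_filter(frame, sigma_s) for frame in out]) * h_s[None]  # Ks per frame
    return out
```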

2.F Experimental contrast phantom study

A modified Esser phantom with fillable cylindrical inserts as described in4 was scanned on the HRRT. The phantom was filled with 18F with a hot-to-background ratio of approximately 4:1. The total activity within the phantom was approximately 40 MBq at the start of data acquisition, with data recorded for 8 h. The list-mode data were histogrammed into span-9 sinograms with a typical tracer protocol used in our institution (16 temporal frames: 4 × 60 s, 3 × 120 s, 8 × 300 s, 1 × 600 s). The data were reconstructed up to 12 iterations with 16 subsets using OSEM, HYPR4D-AU-OSEM, and HYPR4D-K-OSEM with different 4D kernel sizes within the denoising operator/kernel. Note that the one iteration of OSEM used to initialize the HYPR4D methods is not counted as a HYPR4D iteration. Here the kernel width of 7 × 7 × 7 × 7 4D doxels corresponds to 2.5 mm FWHM in the spatial domain and two frames FWHM in the temporal domain, and the 13 × 13 × 13 × 13 4D kernel width corresponds to 5 mm FWHM and four frames FWHM. Different sized kernels were used to evaluate the effect of kernel size on the proposed methods, and kernel sizes equal to or greater than two voxels FWHM and two frames FWHM were selected in order to achieve effective 4D noise reduction. The spatial Gaussian post-filter applied to OSEM images had a FWHM of 2 mm in all cases. The reconstructed 3D volume matrix size is 256 × 256 × 207 with a voxel size of (1.21875 mm)³. For the 4D denoised reconstructions, the reconstructed 4D matrix size is 256 × 256 × 207 × (number of temporal frames) with a doxel size of (1.21875 mm)³ × frame duration.

The reconstructed data were corrected for detector normalization, object attenuation, randoms, and scattered events. The average (over frames) contrast recovery coefficient (CRC) vs image voxel noise, CRC vs reconstruction iterations, and image voxel noise vs reconstruction iterations were evaluated for the first four (60 s) frames which contained relatively low counts (~20 million counts). CRC and image voxel noise are defined as:
$$\mathrm{CRC} = \frac{C_H/C_B - 1}{A_H/A_B - 1} \qquad (8)$$
$$\mathrm{Noise} = \frac{1}{\bar{x}}\sqrt{\frac{1}{N-1}\sum_{n=1}^{N}\left(x_n - \bar{x}\right)^2} \qquad (9)$$
where $C_B$ and $C_H$ are the mean counts within the background and hot regions, respectively. $A_H$ and $A_B$ are the reference activities for the hot and background regions, respectively. $x_n$ is the nth image voxel value, $\bar{x}$ is the mean over the uniform background voxels, and $N$ is the total number of voxels in the uniform background regions.
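For reference, a minimal sketch of the CRC and background voxel-noise computations of Eqs. (8) and (9); the ROI masks and reference activities are assumed inputs, and the relative (STD/mean) normalization of the noise metric follows the definition above.

```python
# Sketch of the CRC and voxel-noise metrics, Eqs. (8) and (9); masks and reference
# activities are illustrative inputs.
import numpy as np

def crc(img, hot_mask, bkg_mask, a_hot, a_bkg):
    c_hot = img[hot_mask].mean()
    c_bkg = img[bkg_mask].mean()
    return (c_hot / c_bkg - 1.0) / (a_hot / a_bkg - 1.0)

def voxel_noise(img, bkg_mask):
    vals = img[bkg_mask]
    return vals.std(ddof=1) / vals.mean()   # relative STD over uniform background voxels
```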

An artificial temporal pattern was created by not correcting the dynamic images for radioactive decay and image frame duration; that is, sharp temporal peaks and dips were formed by the differences in frame duration and the radioactive decay of 18F. Here the assigned frame duration increased with time, creating rapid uptake patterns, while the radioactive decay produced a declining pattern across the TAC. The gold standard reference was generated from the average of high count OSEM reconstructions and by assuming constant (fully corrected) activity across time. The uncorrected reference TAC was then created by decorrecting the flat TAC for decay and frame duration. The temporal patterns (i.e., TACs) obtained from OSEM with a two frames FWHM Gaussian temporal filter and the proposed 4D denoised reconstructions were compared.

As will be seen from the results section, although better CRC vs noise trajectories can be achieved by the AU method as compared to the kernel method using the same kernel size, using the same wide (e.g., 13 × 13 × 13 × 13) kernel in the AU method would cause the CRC convergence rate to be too slow to be practical with only minor gain in CRC vs noise trajectory. When adjusting the 4D kernel size separately for the AU and kernel method such that they produce similar CRC vs noise trade-off, the kernel method would always require fewer iterations to reach similar results as the AU method. Due to this advantage, the rest of this work was carried out using the kernel method when comparing to OSEM with spatial/4D filter and HYPRC3D-HTR methods in the later analyses.

2.G 4D simulations

Analytical 4D simulations of dynamic [11C]Raclopride (RAC) scans with temporal framing of 4 × 60 s, 3 × 120 s, 8 × 300 s, 1 × 600 s, similar to that described in17, were used for evaluations of cases with time-varying tracer distribution. Dynamic noise-free reference images were generated at the aforementioned temporal time points using the segmented anatomical regions, patient-derived kinetic parameters such as R1, k2, and BPND, and the 2-pass Simplified Reference Tissue Model (SRTM2).18 The reference images were then blurred with a 2.5 mm FWHM Gaussian filter to simulate the intrinsic resolution of the HRRT before forward projecting to sinogram data using the system geometry of the HRRT. The effect of attenuation and detector normalization was included in the simulation, while scatter and randoms were not. Poisson noise was incorporated into the sinogram data. The simulated noise level is equivalent to that from [11C]RAC scans with 8 mCi bolus injection typically performed at our institution.

Twenty noisy realizations were generated and reconstructed using OSEM with a 2 mm FWHM spatial Gaussian filter (the standard reconstruction protocol used at our institution), OSEM with a 2 mm FWHM and 2 frame FWHM 4D Gaussian filter, HYPRC3D-HTR with a 13 × 13 × 13 × 13 4D kernel size, and the proposed HYPR4D kernel method with a 13 × 13 × 13 × 13 4D kernel size. All reconstructions were performed up to 12 iterations with 16 subsets. Relatively narrow post-filters were used for OSEM in order to achieve noise reduction without greatly degrading the resolution and contrast, while the 13 × 13 × 13 × 13 4D kernel size was used for the kernel methods in order to achieve noise reduction without making the kernel matrix excessively nonsparse. Note that unlike the post-filters, the kernel size used in the kernel methods is not limited by the resolution and contrast since wider kernel sizes can still achieve the same resolution and contrast as well as a stronger noise constraint. Reconstruction of noise-free data using OSEM was also included for comparison purposes (i.e., noise-free OSEM).

In order to focus on the effect due to noise, voxels within eroded regions of interest (ROI) were used to minimize the partial volume effect (PVE)/cross-talk contamination near the structure boundaries due to the intrinsic resolution of the PET scanner [see Fig. 2(a) for examples of eroded ROIs]. The standard deviation (STD) was computed over the 20 noisy realizations at the voxel level within the eroded ROI, and the average STD across the TAC was computed as a measure of the overall 4D noise for each voxel. The average percent coefficient of variation (%COV) across time points, based on the STD and mean of the voxel over the 20 realizations, was also computed as a relative measure of 4D voxel noise (see Fig. 3). The STD and %COV are defined as:
$$\mathrm{STD} = \sqrt{\frac{1}{R-1}\sum_{r=1}^{R}\left(x_r - \bar{x}\right)^2} \qquad (10)$$
$$\%\mathrm{COV} = 100 \times \frac{\mathrm{STD}}{\bar{x}} \qquad (11)$$
where $x_r$ is the voxel value at the rth realization, $\bar{x}$ is the mean voxel value over the realizations, and $R$ is the total number of noisy realizations.
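A short sketch of the across-realization metrics of Eqs. (10) and (11) for a single voxel TAC follows; the (realization, time point) array layout is an illustrative choice.

```python
# Sketch of the across-realization voxel metrics, Eqs. (10) and (11), averaged along the TAC.
import numpy as np

def realization_std_cov(voxel_tacs):
    """voxel_tacs: (R, K) array, one voxel's TAC over R noisy realizations and K frames."""
    std_per_frame = voxel_tacs.std(axis=0, ddof=1)                     # Eq. (10), per time point
    cov_per_frame = 100.0 * std_per_frame / voxel_tacs.mean(axis=0)    # Eq. (11), per time point
    return std_per_frame.mean(), cov_per_frame.mean()                  # averaged across the TAC
```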
Fig. 2. (a) Examples of eroded ROIs used for the BPND analyses; the size of the ROI in the caudate corresponds to the volume of a 10 mm diameter sphere, while the size of the ROI in the putamen corresponds to the volume of a 16 mm diameter sphere. (b-g) Parametric BPND images (zoomed in at the striatum) generated from simulated noise-free images, OSEM with/without post-filter, HYPRC3D-HTR, and HYPR4D-K (same color scale for all cases). Six iterations were used for all methods based on the lowest RMSE in BPND. A disease-induced gradient pattern was introduced in the putamen as a challenge for the reconstruction task. (h) The corresponding vertical line profiles in the putamen, and (i) the mean profile ± STD over the 20 realizations for the best performing method: HYPR4D-K.
Fig. 3. An example of image update progression for the standard OSEM and the proposed HYPR4D-K-OSEM reconstruction of the simulated [11C]RAC study; a coronal 2D slice of the 3D/4D activity concentration estimate for a frame corresponding to 25–30 min from the start of the scan is shown for each method. A measure of 4D voxel noise (%COV) in the uniform background (cerebellum in this case) is shown for each case.
Percent mean absolute error (%MAE) across the TAC was generated using the mean over the 20 realizations as a measure of bias for each voxel. Additionally, regional TACs and their associated %MAE from a representative realization for regions with various sizes typically used in the kinetic analyses were compared between reconstruction methods. The %MAE is defined as:
$$\%\mathrm{MAE} = \frac{100}{K}\sum_{k=1}^{K}\frac{\left|x_k - G_k\right|}{G_k} \qquad (12)$$
where xk is the voxel value or regional mean of voxels at the kth time point, Gk is the ground truth voxel value or regional mean of voxels at kth time point, and K is the total number of time points.
Parametric nondisplaceable binding potential (BPND) images were generated using SRTM2 with the cerebellum as the reference region and compared between methods. The proposed method without the temporal component of the kernel matrix (i.e., HYPR3D-K) was also included here to examine the effect of temporal noise/features on BPND. The root mean squared error (%RMSE), %Bias, and coefficient of variation (%COV) of the regional mean of voxel BPND values over 20 noisy realizations for different sized target regions were computed and compared across all methods [see Fig. 2(a) for examples of different sized ROIs]. The %RMSE, %Bias, and %COV are defined as:
$$\%\mathrm{RMSE} = 100 \times \frac{1}{BP_{GT}}\sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(BP_r - BP_{GT}\right)^2} \qquad (13)$$
$$\%\mathrm{Bias} = 100 \times \frac{\frac{1}{R}\sum_{r=1}^{R}BP_r - BP_{GT}}{BP_{GT}} \qquad (14)$$
$$\%\mathrm{COV} = 100 \times \frac{1}{\overline{BP}}\sqrt{\frac{1}{R-1}\sum_{r=1}^{R}\left(BP_r - \overline{BP}\right)^2} \qquad (15)$$
where $BP_r$ is the regional mean of voxel BPND obtained from the rth realization, $\overline{BP}$ is the mean of $BP_r$ over the realizations, $R$ is the total number of realizations, and $BP_{GT}$ is the regional ground truth BPND value.
The size of the target regions evaluated in this work, such as the caudate and putamen, ranged from ~300 to 1200 voxels (i.e., 10–16 mm in diameter), and the BPND ranged from 0 to 8 (e.g., > 4 was considered high contrast, while < 2 was considered low contrast). A PET-unique feature, namely a disease-induced BPND gradient pattern, was also included in the posterior putamen. In addition, the %MAE in voxel BPND for target regions with different contrasts and uniformities/functional patterns was computed and compared across all methods [see Fig. 2(a) for an example ROI used to compute the %MAE in the putamen]. The %MAE in voxel BPND is defined as:
$$\%\mathrm{MAE} = \frac{100}{N}\sum_{n=1}^{N}\frac{\left|BP_n - BP_{Gn}\right|}{BP_{Gn}} \qquad (16)$$
where BPn is the nth voxel BPND value within the ROI, BPGn is the nth ground truth voxel BPND value within the ROI, and N is the total number of voxels within the ROI.
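The BPND error metrics of Eqs. (13)-(16) can be computed as in the following sketch; the input arrays are assumed to hold the regional or voxel BPND values described above.

```python
# Sketch of the regional BPND error metrics, Eqs. (13)-(15), and the voxel-level %MAE, Eq. (16).
import numpy as np

def regional_bp_errors(bp_regional, bp_gt):
    """bp_regional: (R,) regional mean voxel BPND per realization; bp_gt: ground truth value."""
    rmse = 100.0 * np.sqrt(np.mean((bp_regional - bp_gt) ** 2)) / bp_gt        # Eq. (13)
    bias = 100.0 * (bp_regional.mean() - bp_gt) / bp_gt                        # Eq. (14)
    cov = 100.0 * bp_regional.std(ddof=1) / bp_regional.mean()                 # Eq. (15)
    return rmse, bias, cov

def voxel_bp_mae(bp_voxels, bp_gt_voxels):
    """Eq. (16): mean absolute error over the N voxels of an ROI, in percent."""
    return 100.0 * np.mean(np.abs(bp_voxels - bp_gt_voxels) / bp_gt_voxels)
```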

2.H Clinical patient study

A dynamic human [11C]RAC study with 8 mCi bolus injection acquired on the HRRT was included for the validation. The list-mode data were histogrammed into 16 temporal frames: 4 × 60 s, 3 × 120 s, 8 × 300 s, 1 × 600 s. TACs and parametric BPND images obtained from SRTM2 with cerebellum as the reference region were compared between all the reconstruction methods described above. Post-reconstruction interframe realignment/motion correction was performed for all methods; motion within each temporal frame was not corrected. The Structural Similarity Index (SSIM)19 between the average image estimate across all OSEM subsets for each iteration and the image estimate at the end of each iteration for the standard OSEM and the proposed HYPR4D kernel method with different 4D kernel sizes was computed. The averaged SSIM over all dynamic frames was then used to evaluate the consistency between the average subset estimate and the estimate at the end of the subset updates for each iteration.
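The frame-averaged SSIM consistency check described above can be sketched as follows, assuming scikit-image's structural_similarity; the image lists and the data_range handling are illustrative.

```python
# Sketch of the per-iteration consistency check: SSIM between the mean estimate across
# subsets and the end-of-iteration estimate, averaged over dynamic frames.
import numpy as np
from skimage.metrics import structural_similarity

def subset_consistency_ssim(mean_subset_frames, end_of_iteration_frames):
    """Both inputs: lists of (Z, Y, X) frames for one iteration; returns frame-averaged SSIM."""
    scores = []
    for mean_img, end_img in zip(mean_subset_frames, end_of_iteration_frames):
        data_range = max(mean_img.max(), end_img.max()) - min(mean_img.min(), end_img.min())
        scores.append(structural_similarity(mean_img, end_img, data_range=data_range))
    return float(np.mean(scores))
```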

3 RESULTS

3.A Experimental contrast phantom study

From the CRC vs noise comparison for a relatively small hot region (8 mm in diameter) in the low count data as shown in Fig. 4(a), one can observe that the kernel method with a relatively narrow spatial kernel had similar convergence rate in CRC as compared to the standard OSEM, while the AU method achieved better CRC vs noise trajectory using the same kernel size but with slower convergence rate in CRC. However, with a relatively wide spatial kernel size, the convergence rate in CRC for the AU method became much slower as previously observed.4 Moreover, the CRC after 12 iterations (~200 updates) was only slightly higher than that of OSEM with post-filter [see Fig. 4(b)] even though the CRC vs noise trajectory was approaching a straight vertical line (i.e., improving CRC without increasing noise). On the other hand, the kernel method using the same wide spatial kernel still had relatively fast convergence rate in CRC and the trajectory outperformed that of the AU method with a narrow kernel. OSEM with spatial post-filter or spatially invariant convolution achieved noise reduction at the cost of lower CRC as expected.

Fig. 4. (a) Contrast recovery coefficient (CRC) vs voxel noise comparison at low count level for the 8 mm hot insert from the experimental contrast phantom study. Each data point represents a complete OSEM iteration across all subsets, and the number of iterations increases from left to right. The data were extracted from phantom images reconstructed using standard OSEM with and without spatial post-filter, HYPR4D-K-OSEM, and HYPR4D-AU-OSEM with different 4D kernel sizes. The initial OSEM iteration for the HYPR4D methods is displayed by the first data point of the OSEM method. (b) CRC and (c) voxel noise as a function of the number of iterations from the same data as (a). (d) The uncorrected regional TAC comparison between OSEM with temporal filter and HYPR4D-K-OSEM (7 × 7 × 7 × 7 kernel). Unlike the spatially or temporally invariant convolutional filtering/smoothing, the proposed 4D kernel based on spatiotemporally variant convolution preserved both spatial resolution/contrast and temporal pattern (i.e., 4D resolution).

Note that the noise increment per iteration approached zero with the larger kernel size for both AU and kernel methods as shown in Fig. 4(c). The best achievable noise performance for both methods depends on the initial point of the CRC vs noise trajectory, which is defined by the noise in the composite obtained within the first iteration of OSEM. This initial composite was ~50% less noisy than one iteration of OSEM image in this case. Since the same initial composite was used for both AU and kernel methods, the best achievable noise performance was the same for both methods. Additionally, the spatial CRC vs noise trajectories were not very sensitive to the size of the temporal kernel; for example, 7 × 7 × 7 × 13 kernel size produced nearly identical CRC vs noise trajectories as 7 × 7 × 7 × 7 kernel size (results not shown).

The temporal pattern comparison between the proposed HYPR4D kernel method and the standard OSEM method with temporal filtering is depicted in Fig. 4(d). The temporal filtering or temporally invariant convolution altered the pattern along the TAC; for example, all sharp peaks and dips were removed, and the initial step pattern became a continuous uptake pattern. The proposed method based on spatiotemporally variant convolution, however, preserved both the activity level and the temporal pattern (i.e., 4D resolution). The AU method showed nearly identical results as the kernel method (not shown). The spatial CRC vs noise and temporal pattern comparisons demonstrated that the 4D high frequency features extracted from the proposed 4D composite with progressive update were able to recover the spatiotemporal patterns in the 4D image estimate after the 4D spatiotemporally invariant Gaussian filtering (F4D) during the kernel operation in Eq. (6). Although very good results were obtained using the AU method in terms of CRC vs noise, due to its very slow convergence rate for small image features the rest of this work was carried out using the kernel method.

3.B 4D simulations

An example of image update progression for the standard OSEM and the proposed HYPR4D-K reconstruction of the simulated [11C]RAC study is shown in Fig. 3. The estimate from one iteration of standard OSEM was used as the initial input estimate for HYPR4D-K, and the composite derived from the subset images within the first iteration of OSEM was used to initialize the HYPR4D denoising operator/kernel matrix. Note that initially the composite contained lower noise and contrast as compared to the corresponding current estimate (i.e., one iteration of OSEM image), and this initial composite defined the best achievable noise level as mentioned previously.

After a relatively high number of updates (12 × 16 updates), the standard OSEM estimate became much noisier than that from the early iterations of the reconstruction, whereas the HYPR4D-K estimate showed lower noise than one iteration of OSEM image [this can also be observed from Fig. 4(a)]. The composite for the 12th iteration of HYPR4D-K showed nearly identical contrast and noise level as compared to the image estimate at the 12th iteration of HYPR4D-K. This demonstrated that the progressive update of the composite ensures that the contrast in the composite is close to that in the target and once the estimate becomes stable, the composite becomes nearly identical to the target. In contrast, without the progressive update of the composite (i.e., a “static” composite) the estimate would converge to a low contrast image very quickly since the contrast in the composite obtained from early subset updates of the reconstruction is underestimated (see Composite it1 in Fig. 3).

The voxel TAC comparison (mean ± STD over 20 noisy realizations) obtained from the 4D [11C]RAC simulation is depicted in Fig. 5. The %MAE across the TAC was 6% for OSEM with spatial filter, 10% for OSEM with 4D filter (mostly contributed by high bias at early time points), 11% for HYPRC3D-HTR, and 5% for HYPR4D-K. Note that the underestimation bias typically observed from the post-filter used in filtered OSEM images was minimized as the voxel was away from the structure boundary (i.e., neighboring voxels had similar concentration values). Although OSEM with 4D filter performed very well for the later time points, the temporally invariant convolution produced the highest bias for the initial uptake pattern; for example, some of the early time points do not even agree with the reference within STD. Interestingly, the opposite trend was observed from HYPRC3D-HTR, which performed very well for the initial time points while showing more apparent deviation from the reference at later time points. This consistent deviation at later time points was likely due to the mismatch between the true temporal pattern of the voxel and that extracted from the PET sinogram data, as well as the difference in contrast between the conventional 3D composite and the target frames. In contrast, HYPR4D-K showed more adaptive performance across the TAC and produced the lowest MAE. Although slightly higher error was observed near the peak of the TAC from HYPR4D-K in this case, this behavior was not consistent across voxels, as can be seen from the regional TAC comparison depicted in Fig. 6, and was thus likely due to statistical variation.

Fig. 5. A representative voxel time-activity curve located away from the structure boundary in the putamen (mean ± STD over 20 noisy realizations) obtained from the 4D [11C]RAC simulation, reconstructed using (a) OSEM with spatial filter, (b) OSEM with 4D filter, (c) HYPRC3D-HTR, and (d) HYPR4D-K methods. The OSEM reconstruction of noise-free data was also included for comparison purposes.
Fig. 6. Caudate (~10 mm in diameter) time-activity curves comparison between (a) OSEM with spatial filter, (b) OSEM with 4D filter, (c) HYPRC3D-HTR, and (d) HYPR4D-K methods. Noise-free data reconstructed using OSEM was also included for comparison purposes.

As compared to the standard OSEM with spatial filter, the HYPRC3D-HTR achieved a 13% reduction in STD, while OSEM with 4D filter and HYPR4D-K achieved a 44% and a 39% reduction in STD, respectively. This can also be seen from the width of the “strip” formed by the dashed curves in Fig. 5. As expected, OSEM with 4D filter achieved substantially lower STD than OSEM with spatial filter due to the additional temporal convolution but at the expense of a nearly two times higher MAE. On the other hand, HYPR4D-K achieved similar STD as compared to OSEM with 4D filter without the degradation in MAE/accuracy. Note that STD is a measure of the overall noise which is contributed from both spatial noise and temporal noise. Therefore, even though HYPRC3D-HTR produced very smooth voxel TACs (i.e., the residual sum of squares in the SRTM2 fit was more than ten times lower than all other methods), the relatively high spatial noise from the conventional 3D composite made the TACs for a given voxel not overlap between different noisy realizations, and thus, a relatively high STD was produced as the result (i.e., smooth TACs with highly varying magnitudes).

The regional TAC comparison between reconstruction methods for a representative realization of the 4D [11C]RAC simulation is shown in Fig. 6 for a ~10 mm diameter region in caudate. One can observe from Figs. 6(a) and 6(b) that the underestimation bias due to the post-filter used in the filtered OSEM images was more apparent here since voxels affected by partial volume were included within the ROI. The wider the filter size the worse the underestimation bias given that the target regions are surrounded by voxels with lower activity concentration values. Note that the eroded ROI used here only minimized the PVE due to the intrinsic resolution of the PET scanner, and it did not remove the additional PVE introduced by the post-filter. As expected OSEM with 4D filter produced a smoother TAC than OSEM with spatial filter only; however, higher bias was introduced to the early time points as was observed previously.

As for HYPRC3D-HTR, although it performed poorly in terms of voxel level metrics, its regional level performance was among the better methods (comparable to HYPR4D-K). Again, higher deviation from the reference was observed in the later part of the TAC as compared to the early time points for HYPRC3D-HTR. The %MAE was 0.6% for noise-free OSEM, 8% for OSEM with spatial filter, 13% for OSEM with 4D filter, 3% for HYPRC3D-HTR, and 3% for HYPR4D-K based on the representative realization. Note that without the inclusion of a temporal component in the kernel matrix, the (spatial only) kernel methods would exhibit a similar noise-induced temporal pattern across the TAC as compared to OSEM with spatial filter (results not shown).

Tables I and II summarize the results from the parametric BPND analyses. OSEM with spatial filter showed nearly identical results as OSEM with 4D filter, while the proposed HYPR4D-K method outperformed the other methods in terms of both RMSE of regional mean of voxel BPND and MAE in voxel BPND. In particular, the RMSE in regional mean of voxel BPND values was reduced from ~8% (OSEM with either spatial or 4D filter) to ~3% using HYPRC3D-HTR, and it was further reduced to ~2% using the proposed HYPR4D-K method for relatively small targets (~10 mm in diameter). The higher RMSE from OSEM with spatial or 4D filter was mainly contributed from the additional PVE/bias due to the post-filter. HYPR4D-K achieved the best accuracy (lowest %Bias) and similar precision (%COV) as compared to all other methods. For relatively big targets (~16 mm in diameter), all methods performed similarly (RMSE= ~2%). From the MAE comparison in voxel BPND within regions with different contrasts and patterns, HYPRC3D-HTR showed the highest error due to the fact that the conventional 3D composite obtained from the sum of temporal data was still relatively noisy in this case [see Fig. 2(d) for the corresponding parametric BPND image].

Table I. Parametric BPND analyses.

  • The lowest %RMSE (out of all the reconstructed iterations), the corresponding %Bias, and %COV in mean voxel BPND within relatively small and big regions are listed for all methods. The proposed method without the temporal component in the kernel matrix (i.e., HYPR3D-K) was also included for comparison purposes.
Table II. The %MAE in voxel BPND within high contrast/uniform and low contrast gradient regions, listed for all methods.

ROI type \ Recon. method                        OSEM w. spatial filter   OSEM w. 4D filter   HYPRC3D-HTR   HYPR3D-K   HYPR4D-K   OSEM on noise-free data
MAE in voxel BPND (%): uniform/high contrast              13                     12               24            12         12              2
MAE in voxel BPND (%): low contrast gradient              17                     17               41            13         11              2

In addition, it was observed that the MAE in voxel BPND was substantially worse in the putamen with the low contrast gradient pattern than in the uniform and/or high contrast regions for OSEM with spatial/4D filter and HYPRC3D-HTR, whereas it remained low with less dependency on the contrast and uniformity/functional pattern within the region for HYPR4D-K (i.e., more robust). This robust behavior can also be observed from the noise-free reconstruction. The improvement in BPND from SRTM2 achieved by the proposed method as compared to the other methods was mostly attributable to the better denoising/preservation of spatial patterns through the spatial component of the proposed kernel matrix, and BPND was observed to be less sensitive to temporal noise/features than to spatial patterns, as shown by the comparison between HYPR4D-K and HYPR3D-K. The comparison of the reconstructed gradient pattern in the parametric BPND image is depicted in Fig. 2. One can observe that the standard methods are not very effective in recovering the low contrast BPND gradient, while a cleaner and more accurate gradient pattern in the putamen can be obtained using the proposed method (see Table II for the corresponding MAE in voxel BPND within the low contrast gradient region).

3.C Clinical patient study

The comparison of regional TACs from the [11C]RAC human scan between various reconstruction methods is shown in Fig. 7(a). Due to the large ROI size (~33 mm in diameter), the cerebellum TACs were nearly identical across all methods except for OSEM with 4D filter, since the bias in the early time points introduced by the temporally invariant convolution is independent of the size of the ROI. On the other hand, substantial differences in TACs can be observed between reconstruction methods for relatively small regions such as the caudate. As expected, a noise-induced temporal pattern was observed from OSEM with spatial filter and kernel methods without the temporal component in the kernel matrix (not shown), while all other methods showed smoother TACs due to the temporal filtering or kernel denoising. Similar to what was observed from the simulations, TACs obtained from filtered OSEM images had on average lower magnitudes as compared to the kernel methods due to the lack of resolution preservation. Furthermore, HYPRC3D-HTR showed higher deviation from HYPR4D-K for the later part of the TAC due to the difference between the prior-based 4D composite and the more data-driven intrinsic 4D composite.

Fig. 7. (a) Regional time-activity curves comparison obtained from the patient scan between various reconstruction methods. (b) Structural similarity index (SSIM) between the mean image estimate across all OSEM subsets for each iteration and the image estimate at the end of each iteration obtained from the patient scan for the standard OSEM and the proposed HYPR4D-K method with different 4D kernel sizes. Note that since all reconstruction methods shown in Fig. 7(b) started with one iteration of OSEM, the first OSEM iteration was labeled as iteration zero.

The corresponding BPND images for the patient scan are shown in Fig. 8. One can observe that high frequency features such as the functional structure boundaries were blurred/buried by the post-filter in the parametric BPND map obtained from filtered OSEM images. The parametric BPND image obtained from HYPRC3D-HTR appeared to be noisier than that from OSEM with post-filter but maintained higher contrast. This showed that the sum of temporal data can still be noisy, and the resultant images from denoised reconstruction with noisy composite can be noisier than those from the standard OSEM with post-filter. In contrast, the proposed HYPR4D-K method outperformed all other methods in terms of 4D noise reduction while preserving 4D resolution which results in cleaner functional structure boundary definitions. In addition, the effect of the temporal kernel on BPND (i.e., difference between HYPR4D-K and HYPR3D-K) was more apparent in the background (low BPND) regions. Although there is no ground truth for the clinical human data, differences in image features between reconstruction methods observed here were consistent with those from the simulations.

Fig. 8. Parametric BPND images generated from various reconstruction methods (same color scale) for the human [11C]RAC scan. The transaxial, coronal, and sagittal views are shown from left to right, respectively. OSEM with spatial filter is omitted here as its BPND image is nearly identical visually to that from OSEM with 4D filter. The number of iterations used was determined based on the contrast recovery coefficient vs noise from the experimental phantom scan: six iterations for OSEM and ten iterations for HYPRC3D-HTR, HYPR3D-K, and HYPR4D-K were used.

As depicted by Fig. 7(b), the SSIM comparison shows that the estimate at the end of the subset updates for each iteration becomes more and more different in terms of structural features from the average across the subset estimates for OSEM as iteration/noise increases (after the first iteration), whereas much better agreement can be achieved by the proposed method. The relatively poor SSIM performance of OSEM is likely due to the fact that OSEM is known to be biased toward the last subset of data which does not necessarily agree with the rest of the subset data at low counts, and the proposed method is less susceptible to such bias. One can also observe that using a wider 4D kernel in the proposed method produced higher SSIM values as compared to using a narrow kernel. Similar trends were observed from simulations as well.

4 DISCUSSION

In this work, we have demonstrated that the proposed intrinsic data-driven HYPR4D denoising kernel can achieve 4D noise reduction while preserving spatiotemporal features (e.g., spatial contrast, sharpness and pattern in TAC, low contrast gradient pattern in BPND, etc) using experimental contrast phantom, 4D simulation, and human data as shown in Figs. 2-8. The AU method outperformed the kernel method in terms of CRC vs noise trajectories for a given 4D kernel size. On the other hand, the kernel method outperformed the AU method in terms of CRC convergence rate especially for relatively small hot regions. Furthermore, using a relatively wide kernel in the kernel method can achieve better CRC vs noise trajectories and faster CRC convergence rate as compared to the AU method using a narrow kernel, whereas using the same wide kernel in the AU method would cause the CRC convergence rate to be too slow to be practical with only minor improvement in CRC vs noise trajectory. When adjusting the 4D kernel size separately for the AU and kernel method such that they produce similar CRC vs noise trade-off, the kernel method would always require fewer iterations to reach similar results as the AU method. Due to this advantage, the rest of this work was carried out using the kernel method as previously mentioned.

With regard to the effect of kernel size used in the HYPR4D kernel, the spatial and temporal kernel widths control the level of noise constraint in the spatial and temporal domains, respectively. Using wider kernels can achieve better CRC vs spatial noise trajectories, lower temporal noise, and more consistent spatiotemporal image features. As mentioned previously, when the width of the 4D kernel is greater than a certain size such that the noise in the filtered target is negligible as compared to that in the composite, the 4D noise increment per update would approach zero, and the best achievable spatiotemporal noise performance is defined by the noise in the initial 4D composite derived from subset images within the first OSEM iteration (the same for both AU and kernel methods). The size of the kernel is therefore straightforward to optimize in terms of noise reduction performance. Nevertheless, using wider kernels does slow down the convergence rate in CRC especially for the AU method as observed previously.4 A wider kernel also implies that the kernel matrix is less sparse and the computation time for the kernel operation would be further increased. For the kernel method, it was observed that a 13 × 13 × 13 × 13 4D kernel can achieve good 4D denoising without making the kernel matrix excessively nonsparse.

One approach to determine the optimal kernel size and the corresponding optimal number of iterations for the proposed method would be to acquire a contrast phantom scan and frame the data according to the temporal count distribution of a typical dynamic scan. Standard voxel noise averaged across frames and the variation (%COV) across the fully corrected voxel TACs, averaged over the uniform background voxels of the phantom, can be used as a measure of 4D noise; alternatively, another measure of 4D noise can be generated using Eqs. (10) and (11) with bootstrap replicates of the phantom data.4 The optimal kernel size can then be determined by adjusting the kernel size until the 4D noise increment per update is close to zero or below a desired level (this can also be checked by comparing the noise in the 4D filtered target images to that in the 4D initial composite within the reconstruction) and iterating until the metric of choice such as CRC changes minimally per update (e.g., < 0.1% per update) or when a criterion of choice is met. For example, a criterion of choice we have used previously to determine the optimal number of iterations is the lowest RMSE in CRC.4 In this work, we chose a kernel size (13 × 13 × 13 × 13) such that the output of the final HYPR4D-K iteration, which achieved relatively accurate CRC, contained a noise level lower than that in the initial input image estimate, that is, one iteration of the standard OSEM reconstruction [see Fig. 4(a)]. This kernel size also did not introduce prolonged computation time as mentioned previously.
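The stopping procedure described above can be sketched as a simple loop over full iterations; the update, CRC, and noise callables as well as the tolerances are placeholders rather than values prescribed by this work.

```python
# Sketch of the practical stopping check: iterate until the chosen metric (CRC here)
# and the 4D noise change by less than a tolerance per iteration. All callables and
# thresholds are illustrative placeholders.
def run_until_stable(update_fn, crc_fn, noise_fn, max_iter=12, crc_tol=1e-3, noise_tol=1e-3):
    estimate, prev_crc, prev_noise = None, None, None
    for it in range(max_iter):
        estimate = update_fn(estimate)               # one full HYPR4D-K iteration (all subsets)
        crc, noise = crc_fn(estimate), noise_fn(estimate)
        if prev_crc is not None and abs(crc - prev_crc) < crc_tol and \
                abs(noise - prev_noise) < noise_tol:
            break
        prev_crc, prev_noise = crc, noise
    return estimate, it + 1
```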

As shown in Fig. 7(b), the size of the kernel also affects the SSIM performance. Moreover, it was observed to affect the log-likelihood (LLH) value across updates in a similar fashion for both the AU and kernel methods. A narrow kernel produces an oscillatory pattern in LLH across updates, similar to that of standard OSEM but less severe, whereas a wide kernel enforces more consistent features and yields a continuously increasing LLH across updates with a relatively smooth transition between the last subset of the current iteration and the first subset of the next iteration (i.e., when the kernel matrix is updated). Although we have no theoretical proof that the LLH is guaranteed to increase when the kernel matrix is updated, we hypothesize that the LLH will most likely increase when a sufficiently wide kernel is used, though the kernel size should be determined based on practical considerations, as previously discussed.
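The LLH referred to here is the usual Poisson log-likelihood of the measured data given the current image estimate; a minimal sketch of how it can be tracked across updates is shown below, where P, y, and r denote the system matrix, measured counts, and additive (randoms plus scatter) term, and a dense matrix-vector product stands in for the actual projector.

```python
import numpy as np

def poisson_loglik(y, x, P, r):
    """Poisson log-likelihood of sinogram counts y given image x,
    up to the constant -log(y!) term."""
    ybar = P @ x + r                       # expected counts per sinogram bin
    ybar = np.clip(ybar, 1e-12, None)      # guard against log(0)
    return float(np.sum(y * np.log(ybar) - ybar))
```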

Both the proposed method and the pure maximum likelihood EM method (MLEM), which does not divide the data into subsets, make use of all the events in the data for each update, but they do so in different ways. The proposed method uses all the events through the sum/average subset estimate (i.e., the intrinsic composite) that guides the denoising within each subset update, whereas MLEM updates the image using all the events without any noise constraint. As a result, the proposed method can achieve, for example, a CRC similar to that of a later MLEM update while retaining the noise level of an early MLEM update. As shown in Fig. 4(a), the proposed method achieved a noise level lower than that of the one-iteration, 16-subset OSEM image (equivalent to the noise level after 16 MLEM updates) while reaching a CRC close to that of 192 MLEM updates (i.e., 12 iterations with 16 subsets).

When comparing the effectiveness of post-filtering and of different 4D composites within the kernel method, the proposed 4D composite was found to outperform all other methods in terms of 4D noise reduction while preserving spatiotemporal patterns as well as accuracy and precision in TAC and BPND estimates. Post-filtering methods typically introduce spatial/temporal correlations between voxels to reduce noise without preserving the spatial/temporal resolution or patterns, as can be observed from the CRC vs noise and TAC comparisons, while the conventional 3D composites used in denoised reconstruction are not very effective for spatial noise reduction when the composites themselves are still relatively noisy. Moreover, prior temporal weights or patterns extracted from PET sinogram data do not always match the true temporal pattern of each individual voxel. Although BPND from SRTM2 was observed to be relatively insensitive to temporal noise/features, applications such as functional segmentation/clustering, lp-ntPET,14 and ROI drawing on PET data for blood pools or tumors are expected to benefit more from temporal denoising that preserves temporal features/patterns.

It was observed that the error in image features estimated with the proposed method is lower and less dependent on the size of the region, the contrast, and the uniformity/functional pattern within the target structures than with methods using standard post-filters or a prior-based 4D composite. In other words, more robust and accurate image features can be obtained with the proposed intrinsic data-driven 4D denoising kernel. Additionally, since the proposed 4D composite is generated intrinsically within the reconstruction, no additional decisions need to be made on how to obtain/reconstruct the composite prior to the reconstruction task, for example, how to sum the temporal data to form a single 3D composite or multiple sliding-window composites, or how to obtain the prior temporal information.

Typically, kernelized reconstruction methods utilize the NLM kernel.9, 11 Recently, it has been reported that the HYPR kernel achieves better performance than the NLM kernel for the case of the conventional 3D composite while being simpler to implement.16 Nevertheless, we expect that the proposed 4D composite with the NLM kernel would behave similarly to what was obtained here with the HYPR kernel and would outperform the other methods tested in this work. The effectiveness of the proposed 4D composite with progressive update using the NLM kernel still needs to be validated.
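For reference, an NLM-type kernel row is typically built from Gaussian similarities between composite-derived feature vectors (e.g., patches or TAC values) of neighboring voxels; the sketch below shows one such construction, with sigma, the neighbor set, and the row normalization being illustrative choices that may differ from the formulations in Refs. 9 and 11.

```python
import numpy as np

def nlm_kernel_row(features, voxel_idx, neighbor_idx, sigma=1.0):
    """NLM-style kernel weights for one voxel: Gaussian similarity between the
    composite-derived feature vector of `voxel_idx` and those of its neighbors.
    features: (n_voxels, n_features) array; neighbor_idx: indices of neighbors."""
    d2 = np.sum((features[neighbor_idx] - features[voxel_idx]) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum()   # one normalized row of the kernel matrix K (image x = K @ alpha)
```

In this view, swapping the HYPR kernel for an NLM kernel changes only how the rows of K are computed from the (3D or 4D) composite, not the kernelized update itself.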

The spatiotemporal patterns within the 4D image estimate are combinations of the true signal, noise, and other physical effects such as subject motion. It is expected that once the effect of noise is reduced, the true signal, along with these other physical effects, becomes clearer. In some of the human studies, consistent changes in temporal pattern were observed with all methods (except for temporal post-filtering, which smooths out any pattern across TACs, and, to an extent, the HTR-based kernel method when the physical effect is not captured by the temporal pattern of the PET sinogram data); these changes were likely attributable to subject motion, especially near the end of the scans where the frame durations were 5 min or longer. This is not surprising, as subject motion is more likely to occur within longer temporal frames. As demonstrated by the temporal pattern comparison in the experimental phantom study, the proposed method does not smooth out features (e.g., sharp peaks and dips) caused by real physical effects in TACs, since the spatiotemporal information for every voxel is preserved within the data-driven 4D composite; that is, consistent patterns across the 4D subset images are always preserved. As a result, motion-induced temporal patterns are preserved by the proposed 4D composite as well.

Although machine learning based denoising methods have shown promising results in recent years (mostly in static imaging),20 we foresee that the proposed method will be beneficial for any existing or new imaging task where no prior information is available or desirable. For example, a relatively new imaging task is the detection of neurotransmitter release due to a drug or intervention within a single PET scan session.14 In this case, the spatial location and timing of the neurotransmitter release are unknown and can be quite variable among subject populations, and denoising is crucial for accurate and reproducible estimation of those parameters from the PET images. As such, one needs to minimize any assumption or prior information about the spatiotemporal patterns within the PET images, since prior information may remove or alter the change in spatiotemporal patterns due to the drug or intervention and introduce bias in the spatial activation/release pattern, the strength of release, and the timing parameters. Functional clustering/PET segmentation works under a similar principle. The proposed method is therefore expected to be more suitable for these types of imaging tasks owing to its intrinsic data-driven nature and the fact that it does not require any prior knowledge of the spatiotemporal patterns of the tracer/biomarker.

5 CONCLUSION

Results from simulations, experimental phantom, and patient data showed that our proposed HYPR4D denoising kernel method outperforms other denoising methods, such as standard OSEM with a spatial filter, OSEM with a spatiotemporal 4D filter, and the HYPR kernel method using the conventional 3D composite in conjunction with the High Temporal Resolution kernel (HYPRC3D-HTR), in terms of 4D noise reduction and preservation of the spatiotemporal patterns/distributions of the radiotracer. As a result, the error in outcome measures such as BPND remains low and becomes less dependent on the size of the region, contrast, and uniformity/functional patterns within the target structures. In summary, the proposed method produces more robust and accurate image features without any prior information, as compared to conventional methods.

ACKNOWLEDGMENTS

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) grant number 240670-13.

CONFLICT OF INTEREST

The authors have no conflicts of interest to disclose.