Adaptive Sparsity Level and Dictionary Size Estimation for Image Reconstruction in Accelerated 2D Radial Cine MRI

Purpose: In the past, Dictionary Learning (DL) and Sparse Coding (SC) have been proposed for the regularization of image reconstruction problems. The regularization is given by a sparse approximation of all image patches using a learned dictionary, i.e. an overcomplete set of basis functions learned from data. Despite its competitiveness, DL and SC require the tuning of two essential hyper-parameters: the sparsity level S, i.e. the number of dictionary basis functions (called atoms) used to approximate each patch, and K, the overall number of atoms in the dictionary. These two hyper-parameters usually have to be chosen a priori and are determined by repetitive and computationally expensive experiments. Further, the final reported values vary depending on the specific situation. As a result, the clinical application of the method is limited, as standardized reconstruction protocols have to be used.

Methods: In this work, we use adaptive DL and propose a novel adaptive sparse coding algorithm for 2D radial cine MR image reconstruction. Using adaptive DL and adaptive SC, the optimal dictionary size K as well as the optimal sparsity level S are chosen dependent on the data under consideration.

Results: Our three main results are the following. First, adaptive DL and adaptive SC deliver results which are comparable to or better than those of the most widely used non-adaptive versions of DL and SC. Second, the regularization is accelerated because the sparsity level S is never overestimated. Finally, the a-priori choice of S and K is no longer needed; instead, both are chosen optimally for the data under consideration.

Conclusions: Adaptive DL and adaptive SC can greatly facilitate the application of DL- and SC-based regularization methods. While in this work we focused on 2D radial cine MR image reconstruction, we expect the method to be applicable to other imaging modalities as well.


where F_I := S_I • F and e denotes random noise. Images directly reconstructed from undersampled k-space by applying the adjoint operator F_I^H contain severe artefacts which limit the diagnostic quality. Since non-measured data is discarded, problem (1) is ill-posed and regularization is required, see e.g. [5] and [6]. Here, x denotes the unknown solution, y_I the measured undersampled k-space data, λ a regularization parameter, and ‖γ_j‖_0 counts the number of non-zero coefficients in γ_j. The operator R_j extracts the j-th 3D patch from the image x, Ψ denotes the dictionary, i.e. a set of K unit-norm vectors also referred to as atoms, and γ_j the sparse representation of the patch R_j x with respect to Ψ. The difference between (P1) and (P2) is that in (P1), one assumes a pre-trained dictionary Ψ, while in (P2), the dictionary Ψ is learned during the reconstruction based on the current image estimates. Note that in [6] and [5], a TV term is further added to the minimization problem (P2).

Problem (2) is solved with any SC algorithm, while (3) is typically solved using an alternating minimization procedure, which alternates between DL to obtain Ψ and SC to obtain the set of vectors {γ_j}_j. The choice of the algorithms used for training the dictionary Ψ and obtaining the sparse codes {γ_j}_j is the main focus of this work. The right-hand side b is given by a linear combination of the initial reconstruction x_I and an image which corresponds to a properly averaged combination of all patches Ψγ_j.

Each iteration of ITKrM computes for every patch y_n the thresholded support I_t^n and the residual

    a_n = y_n − Ψ_{I_t^n} Ψ_{I_t^n}^† y_n,

which captures the remaining signal energy and is used for the atom update

    ψ̄_k = Σ_{n : k ∈ I_t^n} (a_n + ψ_k ⟨ψ_k, y_n⟩) · sign(⟨ψ_k, y_n⟩).
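The thresholding, residual, and atom-update steps described above can be sketched in a few lines. The following is a minimal, unoptimized single-iteration sketch; the variable names and the replacement of dead atoms by random directions are our own assumptions, not the authors' implementation:

```python
import numpy as np

def itkrm_step(Y, Psi, S):
    """One ITKrM-style dictionary update (sketch).

    Y   : (d, N) matrix of vectorized patches y_n
    Psi : (d, K) dictionary with unit-norm columns (atoms)
    S   : sparsity level used for thresholding
    """
    d, N = Y.shape
    K = Psi.shape[1]
    corr = Psi.T @ Y                       # all inner products <psi_k, y_n>
    Psi_new = np.zeros_like(Psi)
    for n in range(N):
        y = Y[:, n]
        # thresholded support I_t^n: the S atoms with largest |<psi_k, y_n>|
        I = np.argsort(-np.abs(corr[:, n]))[:S]
        # residual a_n = y_n - Psi_I Psi_I^† y_n (projection onto selected atoms)
        proj = Psi[:, I] @ (np.linalg.pinv(Psi[:, I]) @ y)
        a = y - proj
        # atom update: accumulate (a_n + psi_k <psi_k, y_n>) sign(<psi_k, y_n>)
        for k in I:
            Psi_new[:, k] += (a + Psi[:, k] * corr[k, n]) * np.sign(corr[k, n])
    # renormalize atoms; replace unused ("dead") atoms with random directions
    for k in range(K):
        nrm = np.linalg.norm(Psi_new[:, k])
        if nrm > 1e-12:
            Psi_new[:, k] /= nrm
        else:
            v = np.random.default_rng(k).standard_normal(d)
            Psi_new[:, k] = v / np.linalg.norm(v)
    return Psi_new
```

Note that, in contrast to K-SVD, each patch only requires one thresholding step and one projection, which is where the complexity advantage discussed below comes from.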
Obviously, ITKrM exhibits a much lower computational complexity than K-SVD. Its adaptive variant goes one step further: not only does the learned dictionary exhibit good properties, but the dictionary size K and the sparsity level S are also adapted in each iteration.

Concretely, for adapting the dictionary size, a replacement strategy is used. To adapt the sparsity level, a local estimate S_n is computed for each patch y_n by counting the coefficients (|(Ψ_{I_t^n}^† y_n)(i)|²)_{i∈I_t^n} and the residual inner products with the dictionary (|⟨ψ_i, a_n⟩|²)_{i∉I_t^n} that are larger than some threshold θ² times the residual energy ‖a_n‖₂². Note that this threshold is computed within the algorithm and does not have to be given as an input parameter. If the average of these estimated sparsity levels, S̄ = (1/N) Σ_n S_n, is larger than S_e, this indicates that the current estimate S_e is too small and has to be increased by one; if S̄ = S_e, it is kept the same; and if S_e > S̄, S_e has to be decreased by one.

As a next step, we introduce adaptivity into the SC procedure. In particular, we propose an adaptive version of OMP, aOMP. The probability that such an inner product is larger than τ‖r‖₂ can be calculated, and for τ = √(2 log(4K)/d) it is smaller than 1/2. Inequality (8) is the main ingredient of the algorithm, as it provides a proper threshold τ which is used as stopping condition in aOMP and prevents aOMP from overfitting the considered patch by discarding noise.
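The sparsity-level adaptation rule above can be sketched as follows. For simplicity, the threshold θ is passed in as a parameter here, whereas in the paper it is computed inside the algorithm; the function name and interface are our own assumptions:

```python
import numpy as np

def adapt_sparsity_level(Y, Psi, S_e, theta):
    """Estimate per-patch sparsity levels S_n and adjust S_e by at most one.

    Y     : (d, N) matrix of vectorized patches
    Psi   : (d, K) dictionary with unit-norm atoms
    S_e   : current sparsity-level estimate
    theta : threshold (assumed given; computed internally in the paper)
    """
    d, N = Y.shape
    S_est = np.zeros(N)
    for n in range(N):
        y = Y[:, n]
        corr = Psi.T @ y
        I = np.argsort(-np.abs(corr))[:S_e]          # thresholded support I_t^n
        coeffs = np.linalg.pinv(Psi[:, I]) @ y       # (Psi_I^† y_n)
        a = y - Psi[:, I] @ coeffs                   # residual a_n
        res_energy = a @ a                           # ||a_n||_2^2
        # count in-support coefficients above theta^2 * ||a_n||^2 ...
        in_sup = np.sum(np.abs(coeffs) ** 2 > theta**2 * res_energy)
        # ... and out-of-support residual correlations above the same level
        mask = np.ones(Psi.shape[1], dtype=bool)
        mask[I] = False
        out_sup = np.sum(np.abs(Psi[:, mask].T @ a) ** 2 > theta**2 * res_energy)
        S_est[n] = in_sup + out_sup
    S_bar = S_est.mean()
    if S_bar > S_e:
        return S_e + 1          # current estimate too small
    if S_bar < S_e:
        return max(S_e - 1, 1)  # current estimate too large
    return S_e
```

Because S_e changes by at most one per iteration, the estimate drifts slowly towards the data-dependent optimum rather than oscillating.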

To further accelerate aOMP, we introduce a preliminary step in which we select the 'strongest' part of the support. In particular, before adding the next best-fitting atom one at a time, we choose part of the support by thresholding with τ₁ = √(2 log(8K)/d).

were used as ground-truth images. From these images, the k-space data was retrospectively generated and corrupted by Gaussian noise in order to simulate an acceleration factor of 9. We repeated the experiments for different choices of the sparsity level S.
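Putting the preliminary thresholding step and the data-driven stopping condition together, aOMP can be sketched as below. The thresholds follow the formulas τ = √(2 log(4K)/d) and τ₁ = √(2 log(8K)/d) from the text; the function signature, the cap max_S, and the numerical-zero guard are our own assumptions:

```python
import numpy as np

def aomp(y, Psi, max_S=None):
    """Adaptive OMP (sketch): greedy atom selection with a noise-calibrated stop."""
    d, K = Psi.shape
    tau  = np.sqrt(2 * np.log(4 * K) / d)   # stopping threshold, inequality (8)
    tau1 = np.sqrt(2 * np.log(8 * K) / d)   # preliminary thresholding
    if max_S is None:
        max_S = d // 2                      # safety cap (assumption)
    # preliminary step: take all atoms correlating above tau1 * ||y|| at once
    corr = Psi.T @ y
    support = list(np.flatnonzero(np.abs(corr) > tau1 * np.linalg.norm(y)))
    while True:
        if support:
            coeffs = np.linalg.pinv(Psi[:, support]) @ y
            r = y - Psi[:, support] @ coeffs
        else:
            r = y
        if np.linalg.norm(r) < 1e-10:       # patch fully explained
            break
        res = np.abs(Psi.T @ r)
        res[support] = 0
        best = int(np.argmax(res))
        # stop when no atom correlates with the residual above tau * ||r||:
        # the remaining residual is then indistinguishable from noise
        if res[best] <= tau * np.linalg.norm(r) or len(support) >= max_S:
            break
        support.append(best)                # otherwise add one atom and repeat
    gamma = np.zeros(K)
    if support:
        gamma[support] = np.linalg.pinv(Psi[:, support]) @ y
    return gamma
```

The preliminary step saves greedy rounds whenever several atoms clearly belong to the support, while the τ-based stop replaces the fixed global sparsity level S.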

More precisely, to demonstrate the impact of a potentially too low or too high choice of S, we used S = 4, S = 8 and S = 16 for the non-adaptive DL and SC algorithms.

obtained by a zero-filled reconstruction as in [5] or [6]. Since the artefacts can be expected to have a more high-frequency type of texture, we decided to only use K = 128.

For assessing the convergence speed of the reconstruction algorithms, we tracked the differ-

Here, we report the times for the different components of the DL-based reconstruction algorithms. The components which significantly contributed to the relatively high reconstruction times were the DL and SC algorithms and the PCG method, which is needed to obtain an approximate solution of (4). Obviously, the latter was constant for the three different combinations of DL and SC. We see that K-SVD was the slowest DL algorithm and took approximately 69-71 seconds.

no a-priori choice of S and K was required. Moreover, the optimal number of atoms K is also data-dependent. In particular, for dictionaries learned on images containing more structure, a larger K is needed than for fairly smooth ones. Further, the optimal size of the dictionary was also shown to depend on the noise level of a corrupted image, i.e. the more noise, the smaller K should be chosen [22].

These observations suggest that a global choice of S and K cannot be optimal, apart from the fact that they are not known and can only be guessed.

Subsection II.C., the S- and K-adaptivity achieves competitive results compared to K-SVD + OMP and additionally reduces the required reconstruction times.

Figure 6: Examples of eight three-dimensional atoms (un-stacked along the time dimension) of the dictionaries learned by K-SVD (left), ITKrM (mid) and aITKrM (right). The dictionaries were learned on 3D patches extracted from the initial NUFFT-reconstruction x_I (first row) and the penultimate image estimate (second row). For K-SVD and ITKrM, the sparsity level was S = 16. Since S = 16 is relatively high, the atoms obtained by K-SVD and ITKrM contain quite some noise. Note that the constant atom is not shown in the images.

As with all iterative reconstruction methods which employ a-priori knowledge expressed as a penalty term, the DL-based regularization method requires choosing the regularization parameter λ. However, note that quite some work has been dedicated to how to adaptively choose the parameter λ as well, see e.g. [10,11], which might be incorporated in the reconstruction algorithm using aITKrM + aOMP.

The achieved image quality using aITKrM + aOMP is comparable with the one achieved using the standard combination K-SVD + OMP with the best reported choices of the sparsity level, as can be seen in Figure 1 and Table 1. The performed experiments reveal that for

Learning a dictionary with aITKrM instead of K-SVD leads to an acceleration factor of approximately 10, which is useful when the dictionary is learned during the reconstruction.

The reason is that the computationally most expensive component of K-SVD is OMP, whereas aITKrM only requires the faster thresholding. More importantly, using aOMP has the potential of greatly reducing the time needed for the sparse approximation of all patches since, instead of using a (as we have seen, potentially too high) global sparsity level

We point out that the used implementations of aITKrM as well as aOMP were not run on a GPU and could therefore be further accelerated and optimized. In particular, the underlying nature of ITKrM offers the possibility to transfer the calculations to a GPU and exploit parallelisation, as it can process the patches independently of each other. Also, note that for
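The cost difference between the two sparse-coding strategies can be made concrete. Thresholding computes all atom-patch correlations in one matrix product and keeps the S largest per patch, whereas OMP re-projects and re-correlates after every selected atom, i.e. it performs S dependent greedy rounds per patch. The following is an illustrative sketch of thresholding-based sparse coding, with names of our own choosing:

```python
import numpy as np

def sc_by_thresholding(Y, Psi, S):
    """Sparse coding by thresholding (as used inside ITKrM), sketched.

    One product Psi^T Y yields all correlations at once; per patch, only a
    top-S selection and one small least-squares solve remain.
    """
    corr = Psi.T @ Y                                  # all <atom, patch> products
    Gamma = np.zeros((Psi.shape[1], Y.shape[1]))
    for n in range(Y.shape[1]):
        I = np.argsort(-np.abs(corr[:, n]))[:S]       # top-S support
        # back-project onto the selected atoms (one solve, not S of them)
        Gamma[I, n] = np.linalg.pinv(Psi[:, I]) @ Y[:, n]
    return Gamma
```

The matrix product parallelizes trivially across patches, which is also why a GPU port of ITKrM is straightforward.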