An efficient fusion algorithm combining feature extraction and variational optimization for CT and MR images

Abstract In medical image processing, image fusion is the process of combining complementary information from images of different modalities to obtain an informative fused image and thereby improve clinical diagnostic accuracy. In this paper, we propose a two-stage fusion framework for computed tomography (CT) and magnetic resonance (MR) images. First, the intensity and geometric structure features of both CT and MR images are extracted by a saliency detection method and the structure tensor, respectively, and an initial fused image is obtained. Then, the initial fused image is optimized by a variational model that contains a fidelity term and a regularization term. The fidelity term retains the intensity of the initial fused image, and the regularization term constrains the gradient information of the fused image to approximate that of the MR image. The primal-dual algorithm is employed to solve the variational problem. The proposed method is applied to five pairs of clinical CT and MR-T1/MR-T2 images, and the comparison metrics SF, MI, QAB/F, QW, and VIFF are calculated for assessment. Compared with seven state-of-the-art methods, the proposed method shows a comprehensive advantage in preserving the salient intensity features as well as texture structure information, both in visual effects and in objective assessments.


1 | INTRODUCTION
The rapid development of computer technology promotes the progress of medical imaging, clinical practice, and medical science. Different imaging modalities are complementary and provide more information than any single modality alone. For example, computed tomography (CT) can capture high-density bone structures, while magnetic resonance imaging (MRI) provides clear soft tissue structures in organs such as the brain, heart, and liver. Medical image fusion is the process of combining information from multimodality medical images to obtain a fused image that increases clinical applicability for diagnosis and treatment. Medical image fusion methods have been applied to images of human organs, including the brain, breast, and liver. 1,2 Researchers have paid increasing attention to fusing multimodal medical images in order to obtain a single image with more information for physicians. Medical image fusion can be performed through multiscale transform (MST) methods. 3,4 The Laplacian pyramid (LP) 5,6 and wavelet transform (WT) [7][8][9] are two kinds of classical MST methods.
Multiscale geometric analysis (MGA) tools have also been employed in image fusion, including the curvelet transform, 10 non-subsampled contourlet transform (NSCT), 11,12 and non-subsampled shearlet transform (NSST). 13,14 Moreover, the MST methods can be combined with other image processing techniques, such as sparse representation (SR) 15 and pulse coupled neural network (PCNN). 16 Two-scale image fusion methods decompose each input image into a base layer and a detail layer through a filtering method. The guided filter-based fusion method (GFF) 17 first extracts the average contrast to build the weight maps, then uses the guided filter to refine the weight maps of the two layers, and generates the final fused image by the weighted average method. In Ref. [18], the authors established a gradient domain guided filter-based fusion model, in which the weight maps are obtained by combining multiple visual features, including contrast, sharpness, and structure saliency measures.
Medical image fusion can also be regarded as an optimization problem. Variational image fusion methods typically contain two terms: a fidelity term that constrains the fused image to approximate the gray values of the input images, and a regularization constraint. In Refs. [19][20][21], the authors utilized the TV norm as the regularization term, while the fidelity term is chosen to be the quadratic norm or the L1 norm. As a higher-order regularization constraint, the total generalized variation (TGV) has been applied to image fusion because it can suppress the staircasing effect. 22,23 As two of the most commonly used medical imaging modalities, CT and MR images are frequently combined for medical image analysis.
CT is a density-based imaging modality and captures bone structures with high intensity. MR images capture clear soft tissue structure information, especially tumors, which have high intensity in MR-T2 images. For CT and MR image fusion, most existing methods extract the same features from the source images and use them to construct weight maps, which may lead to the "feature average" problem. In order to extract different and complementary features from the source CT and MR images, we propose a novel fusion method that keeps these extracted features enhanced in the fused image. For example, the proposed method can extract not only the high-intensity features (such as bone structures and tumors) from CT and MR brain images, but also the soft tissue texture structures from MR images. As a result, we obtain fused images that preserve the bone structures, tumor features, and soft tissue texture information of the CT and MR images.
In this study, a two-stage fusion framework is proposed for CT and MR images. First, the contrast-based method and the structure tensor are used to extract the salient intensity information and geometry structures of the medical images, respectively. The extracted intensity and structure features are used to construct weight maps and obtain an initial fused image. Then, a variational model is proposed to optimize the initial fused image. The variational framework consists of two terms: a fidelity term, which retains the salient intensity information, and a regularization term, which constrains the gradient of the fused image to approximate that of the MR image while keeping the fused image smooth. In the numerical experiments, the proposed method is compared with seven state-of-the-art medical fusion methods on several pairs of CT and MR images. The experimental results show that the proposed method preserves the clear bone structures, tumor features, and soft tissue texture information of the source images, and achieves good visual quality.
The rest of the paper is organized as follows. In Section 2, we introduce some related works about saliency detection and structure tensor. In Section 3, an initial fusion image is generated with the weighted fusion method, and then it is optimized by the proposed variational fusion model. Numerical experiments and discussion are shown in Section 4, and the fusion performance is evaluated by visual observation and quantitative evaluation. The conclusion is given in Section 5.

2 | RELATED WORKS

2.A | Saliency detection
The human visual perception system is highly sensitive to contrast in visual signals, such as color, intensity, and texture. Based on this observation, the saliency values of image pixels are defined using the color statistics of the input images. 24,25 The saliency value of a pixel I_k in an image I is

S(I_k) = Σ_{I_i ∈ I} D(I_i, I_k),    (1)

where D(I_i, I_k) denotes the color distance metric, calculated as the absolute value of the color difference between pixels I_k and I_i.
The saliency of a pixel is defined by using its color contrast to all other pixels in the image.
For gray input images, the color distance metric D(I_i, I_k) is calculated as the intensity difference. When the pixels of the image I are rearranged by intensity, all pixels with the same intensity value c_l are grouped together and share the same saliency value S(c_l). Let the intensity value of the pixel I_k be equal to c_l; then Eq. (1) can be rewritten as

S(I_k) = S(c_l) = Σ_{j=1}^{n} f_j D(c_l, c_j),    (2)

where n is the number of intensities and f_j is the probability of the pixel intensity value c_j appearing in the image I. The saliency value S(c_l) is calculated from gray-level differences, and the pixels with high intensity have high contrast, so the high-intensity regions can be detected globally in the saliency map S(I). In Fig. 1, we compare the saliency maps obtained by the different methods.
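The histogram form of Eq. (2) is straightforward to vectorize. The sketch below assumes 8-bit gray images (256 intensity levels); the function name is illustrative, not from the original paper:

```python
import numpy as np

def histogram_saliency(img):
    """Histogram-contrast saliency for a gray image with integer
    intensities in [0, 255]: S(c_l) = sum_j f_j * |c_l - c_j| (Eq. (2))."""
    img = np.asarray(img, dtype=np.int64)
    levels = np.arange(256)
    hist = np.bincount(img.ravel(), minlength=256)
    f = hist / img.size                       # probability of each intensity
    # pairwise absolute intensity differences, weighted by frequency
    sal_per_level = np.abs(levels[:, None] - levels[None, :]) @ f
    return sal_per_level[img]                 # map saliency back to pixels
```

Because the sum runs over intensity levels rather than pixels, the cost is O(L^2 + N) for L levels and N pixels, instead of O(N^2) for the pixel-pairwise definition of Eq. (1).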

2.B | Structure tensor
Given a gray image I : Ω → [0, 1], for every pixel x = (x_1, x_2), I(x) is the intensity of the input image I at x. The gradient, defined as

∇I = (∂I/∂x_1, ∂I/∂x_2)^T,

describes the contrast change. When the distance between two adjacent pixels in the image I becomes infinitesimal, the differential dI can be represented as

dI = ∇I · dx = I_{x_1} dx_1 + I_{x_2} dx_2,

and its square norm is

||dI||^2 = dx^T G dx.

G is called the structure tensor and can be expressed as

G = ∇I ∇I^T = ( I_{x_1}^2 , I_{x_1} I_{x_2} ; I_{x_1} I_{x_2} , I_{x_2}^2 ).

As a positive semi-definite matrix, the structure tensor G can be decomposed as

G = λ_1 θ_1 θ_1^T + λ_2 θ_2 θ_2^T,

where λ_1 ≥ λ_2 ≥ 0 are the eigenvalues and θ_1, θ_2 are the corresponding eigenvectors. The eigenvalues λ_1 and λ_2 represent the maximum and minimum contrast change rates of the gray image I, respectively. The structure feature at a pixel p is then measured from the eigenvalues, for example as

ST(p) = √(λ_1(p) + α λ_2(p)),

where α is a positive parameter; evaluating this measure at every pixel yields the structure feature map ST(I).
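The eigen-decomposition above can be computed in closed form for the symmetric 2x2 tensor. The sketch below is illustrative; note that the pointwise tensor G = ∇I ∇I^T is rank-1, so λ_2 is zero unless the tensor entries are smoothed over a neighborhood (as is common in practice):

```python
import numpy as np

def structure_tensor_eigenvalues(img):
    """Per-pixel structure tensor G = grad(I) grad(I)^T and its eigenvalues.
    Returns (lam1, lam2) with lam1 >= lam2 >= 0 at every pixel. Without
    neighborhood smoothing of g11, g12, g22, G is rank-1 and lam2 == 0."""
    Ix2, Ix1 = np.gradient(np.asarray(img, dtype=float))  # axis 0 = x2, axis 1 = x1
    g11, g12, g22 = Ix1 * Ix1, Ix1 * Ix2, Ix2 * Ix2       # entries of G
    # closed-form eigenvalues of a symmetric 2x2 matrix
    tr, det = g11 + g22, g11 * g22 - g12 * g12
    disc = np.sqrt(np.maximum((tr / 2) ** 2 - det, 0.0))
    return tr / 2 + disc, tr / 2 - disc
```

For a linear intensity ramp, lam1 equals the squared gradient magnitude and lam2 vanishes, matching the interpretation of lam1 and lam2 as maximum and minimum contrast change rates.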

3 | THE PROPOSED FUSION METHOD

3.A | Weighted fusion based on multiple features
In image processing, the contrast feature can be used to detect salient intensity information, and the structure tensor is an effective tool for describing image geometry structures. In this section, the contrast and structure features are detected to generate the weight maps for weighted image fusion. Given two input images I_ct and I_mr, the weighted fusion process is shown in Fig. 3, and the main steps to obtain the initial fused image F_0 are described in the following.

First, we use Eq. (2) to detect the high-intensity regions of the input images, and the saliency maps S(I_ct) and S(I_mr) are segmented to generate two binary weight maps S_w1 and S_w2. Meanwhile, the structure tensor is used to calculate the structure feature maps ST(I_ct) and ST(I_mr), and the two structure weight maps ST_w1 and ST_w2 are obtained by normalization, that is,

ST_w1 = ST(I_ct) / (ST(I_ct) + ST(I_mr)),  ST_w2 = ST(I_mr) / (ST(I_ct) + ST(I_mr)).

The goal of the weighted fusion is to extract the high-intensity regions from the input medical images and preserve the structures of the other regions, so we calculate the final weight maps as follows:

w_1 = S_w1 + (1 − S_w1 − S_w2) ST_w1,  w_2 = S_w2 + (1 − S_w1 − S_w2) ST_w2.

Finally, the weight maps w_1, w_2 are used to fuse the input medical images I_ct and I_mr, and the initial fused image F_0 is generated by the weighted average method, that is,

F_0 = w_1 I_ct + w_2 I_mr.
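The weighted-average step can be sketched as follows, assuming the binary saliency masks and structure feature maps have already been computed. The function name and the small epsilon guard are illustrative additions; inside the salient masks the corresponding source image is taken directly, and elsewhere the two sources are blended by the normalized structure weights:

```python
import numpy as np

def weighted_fusion(I_ct, I_mr, S_w1, S_w2, ST_ct, ST_mr, eps=1e-12):
    """Initial fusion F0: salient regions copied from the corresponding
    source, remaining regions blended by normalized structure weights."""
    ST_w1 = ST_ct / (ST_ct + ST_mr + eps)   # normalized structure weights
    ST_w2 = ST_mr / (ST_ct + ST_mr + eps)
    rest = 1.0 - S_w1 - S_w2                # regions outside both salient masks
    w1 = S_w1 + rest * ST_w1
    w2 = S_w2 + rest * ST_w2
    return w1 * I_ct + w2 * I_mr            # weighted average, w1 + w2 = 1
```

When the binary masks do not overlap, w_1 + w_2 = 1 everywhere, so F_0 stays within the intensity range of the two inputs.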

3.B | Variation-based fusion
Given the image domain Ω ⊂ R^2 and a pair of input images u_0, u_1, we propose the variational fusion method through the minimization problem

min_u  (1/2) ∫_Ω (u − u_0)^2 dx + λ ∫_Ω |∇u − ∇u_1| dx,    (13)

where the final fused image and the two input images are denoted by u, u_0, and u_1, respectively, and λ > 0 balances the two terms. The first term is a square norm used to constrain the intensity of the fused image u to approximate the input image u_0. The second term is a regularization term that keeps the fused image u smooth while preserving the gradient information of u_1.
In Section 3.A, the high-intensity information is well preserved in the initial fused image, so u_0 is set to F_0 and u_1 to the MR image. When the TV norm is used as the regularization term, we get a smoothed result, shown in Fig. 4. The solution of problem (13) is taken as the final fused result.

3.C | Algorithm
The proposed variational model (13) is convex and can be solved by the primal-dual algorithm. 26 By duality principles, the minimization problem (13) can be converted into the saddle-point problem

min_u max_{|p| ≤ λ}  (1/2) ∫_Ω (u − u_0)^2 dx + ∫_Ω p · (∇u − ∇u_1) dx,

where p is the dual variable. Let (p^{t+1}, u^{t+1}) be the approximate solutions in the iteration process. By applying the proximal point steps of the primal-dual algorithm, the proposed model can be solved as follows.

Algorithm 1: Numerical Algorithm
•  Initialization: choose σ > 0, τ > 0, set u^0 = ū^0 = u_0 and p^0 = 0.
•  Iteration (t ≥ 0): update p^{t+1}, u^{t+1}, ū^{t+1} as follows:

p^{t+1} = Proj_{|p| ≤ λ} ( p^t + σ (∇ū^t − ∇u_1) ),
u^{t+1} = ( u^t + τ div p^{t+1} + τ u_0 ) / (1 + τ),
ū^{t+1} = 2 u^{t+1} − u^t,

where Proj_{|p| ≤ λ}(q) = q / max(1, |q|/λ) is the pointwise projection onto the ball of radius λ.
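The iteration can be sketched compactly as below. The forward-difference gradient, its adjoint divergence, the step sizes σ = τ = 0.25, and the parameter values are standard illustrative choices for a primal-dual (Chambolle-Pock-type) scheme, not necessarily the authors' exact settings:

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary conditions."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]; dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]; dy[1:-1, :] = py[1:-1, :] - py[:-2, :]; dy[-1, :] = -py[-2, :]
    return dx + dy

def fuse_primal_dual(u0, u1, lam=0.5, sigma=0.25, tau=0.25, n_iter=200):
    """Primal-dual iteration for min_u 0.5||u-u0||^2 + lam*|grad u - grad u1|.
    sigma*tau*8 <= 1 satisfies the usual step-size condition."""
    u = u0.copy(); u_bar = u0.copy()
    px = np.zeros_like(u0); py = np.zeros_like(u0)
    g1x, g1y = grad(u1)
    for _ in range(n_iter):
        gx, gy = grad(u_bar)
        px += sigma * (gx - g1x); py += sigma * (gy - g1y)
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2) / lam)  # project |p| <= lam
        px /= norm; py /= norm
        u_old = u
        u = (u + tau * div(px, py) + tau * u0) / (1.0 + tau)  # prox of fidelity term
        u_bar = 2 * u - u_old                                  # extrapolation, theta = 1
    return u
```

As a sanity check, when u_1 = u_0 both terms are minimized at u = u_0, and the iteration leaves u_0 unchanged.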

4 | NUMERICAL EXPERIMENTS AND DISCUSSION

4.A | Visual observation
To evaluate the fusion performance, the fusion results of the proposed method are compared with those of seven state-of-the-art fusion methods, including TGV. 23

4.B | Quality assessment
For objective quality assessment, we use five metrics to evaluate the fusion quality: spatial frequency (SF), 28 mutual information (MI), QAB/F, QW, and VIFF. The five image quality metrics are introduced as follows.

The spatial frequency (SF) measures the overall gray difference in an image:

SF = √(RF^2 + CF^2),

where RF is the row frequency and CF is the column frequency.
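SF can be computed from first-order gray differences; the sketch below averages the squared horizontal and vertical differences (normalization conventions for RF and CF vary slightly across papers):

```python
import numpy as np

def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) from mean squared gray-level differences."""
    img = np.asarray(img, dtype=float)
    rf2 = np.mean((img[:, 1:] - img[:, :-1]) ** 2)  # row frequency (horizontal)
    cf2 = np.mean((img[1:, :] - img[:-1, :]) ** 2)  # column frequency (vertical)
    return np.sqrt(rf2 + cf2)
```

A constant image has SF = 0, while a 0/1 checkerboard attains SF = √2, the maximum for unit-range images.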
MI measures the degree of dependency between two variables X and Y, and is defined as

I(X; Y) = Σ_x Σ_y p_XY(x, y) log( p_XY(x, y) / (p_X(x) p_Y(y)) ),

where x and y are sampling variables, p_XY is the joint probability distribution of X and Y, and p_X and p_Y are the marginal probability distributions of X and Y, respectively. In addition, H(X) = I(X; X) is the entropy of X.
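In practice the distributions are estimated from image histograms. The sketch below uses a joint histogram with an illustrative bin count; it computes I(X; Y) directly from the definition above:

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """I(X;Y) estimated from the joint histogram of two images (natural log)."""
    pxy, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy /= pxy.sum()                    # joint probability p_XY
    px = pxy.sum(axis=1)                # marginal p_X
    py = pxy.sum(axis=0)                # marginal p_Y
    nz = pxy > 0                        # avoid log(0) on empty bins
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz]))
```

For identical inputs this returns the entropy H(X) = I(X; X); for independent inputs it returns (approximately) zero.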
For the fused image F and input images A and B, we calculate the two mutual information values I(F; A) and I(F; B); the normalized mutual information is then

MI = 2 ( I(F; A) / (H(F) + H(A)) + I(F; B) / (H(F) + H(B)) ).

The metric QAB/F measures the relative amount of edge information that is transferred from the input images A and B into the fused image F:

QAB/F = Σ_i Σ_j [ Q_G^AF(i, j) W^AF(i, j) + Q_G^BF(i, j) W^BF(i, j) ] / Σ_i Σ_j [ W^AF(i, j) + W^BF(i, j) ],

where Q_G^AF(i, j) and Q_G^BF(i, j) are the edge preservation values at the pixel location (i, j), and W^AF(i, j) and W^BF(i, j) are the weights indicating the importance of Q_G^AF(i, j) and Q_G^BF(i, j), respectively.

The metric QW measures how well the structural information of the input images is preserved. The metric is first calculated in a local window w,

λ(w) Q_0(A, F | w) + (1 − λ(w)) Q_0(B, F | w),

where Q_0 is the universal image quality index and λ(w) is a local weight indicating the relative importance of image A compared to image B.
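The normalized-MI formula can be sketched directly from its definition, using a histogram-based MI estimate as an internal helper (bin count and function names are illustrative):

```python
import numpy as np

def _mi(a, b, bins=32):
    """Histogram-based I(A;B); I(X;X) equals the entropy H(X)."""
    pxy, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))

def normalized_mi(F, A, B, bins=32):
    """MI = 2*( I(F;A)/(H(F)+H(A)) + I(F;B)/(H(F)+H(B)) )."""
    hF, hA, hB = _mi(F, F, bins), _mi(A, A, bins), _mi(B, B, bins)
    return 2 * (_mi(F, A, bins) / (hF + hA) + _mi(F, B, bins) / (hF + hB))
```

When F coincides with both inputs, each ratio equals 1/2 and the metric attains its maximum value of 2.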
The metric VIFF is a multi-resolution image fusion metric based on visual information fidelity. The comparison of the metric VIFF is shown in Fig. 16; the proposed method has higher values than the seven compared fusion methods.

4.C | Discussion
For visual observation, the proposed method can extract the salient features from the source CT and MR images, such as the high-intensity bone structures, tumor features, and soft tissue texture information, and these salient features are well preserved in the fused images, while the seven compared methods tend to produce averaged or blurred features during fusion. For quantitative quality assessment, the proposed method has the highest MI and QAB/F values. The VIFF values are also higher, while the SF values are influenced by the smoothing effect in the fusion process. In summary, the proposed method outperforms the compared methods in intensity contrast preservation and texture structure extraction. In addition, our experiments are performed on brain images; images of other organs will be considered in future work.

5 | CONCLUSION
In this paper, we propose a two-stage fusion method for CT and MR images. The intensity and structure features are extracted from the source images for an initial weighted fusion, and then a variational model is applied to optimize the initial fusion result. The comparison between the proposed method and seven state-of-the-art fusion methods is performed in the numerical experiments, and five metrics are used for objective assessment. The experimental results and discussion show that the proposed method preserves the intensity and texture structure information of the source images well, and achieves a good visual effect.

ACKNOWLEDGMENTS
This work is supported by National Natural Science Foundation of