Bleeding detection in wireless capsule endoscopy videos — Color versus texture features

Abstract Wireless capsule endoscopy (WCE) is an effective technology that can be used to diagnose various lesions and abnormalities in the gastrointestinal (GI) tract. Due to the long time required to pass through the GI tract, the resulting WCE data stream contains a large number of frames, which makes it a tedious job for clinical experts to visually check each and every frame of a complete patient's video footage. In this paper, an automated technique for bleeding detection based on color and texture features is proposed. The approach combines color information, which is an essential feature for the initial detection of frames with bleeding, with texture, which plays an important role in extracting more information from the lesions captured in the frames and allows the system to distinguish finely between borderline cases. The detection algorithm utilizes machine-learning-based classification methods; it can efficiently distinguish between bleeding and nonbleeding frames and perform pixel-level segmentation of bleeding areas in WCE frames. The performed experimental studies demonstrate the performance of the proposed bleeding detection method in terms of detection accuracy, where we are at least as good as the state-of-the-art approaches. In this research, we have conducted a broad comparison of a number of different state-of-the-art features and classification methods that allows building an efficient and flexible WCE video processing system.

The capsule, shaped like a normal pill, can be swallowed by the patient in the presence of clinical experts without any discomfort.
Unlike conventional endoscopy procedures, it explores the whole GI tract of the patient without pain, sedation, or air insufflation.
The Food and Drug Administration (FDA) approved the use of WCE in 2001 as a medical tool to examine the mucosa of the stomach and small intestine in order to detect various abnormalities and diseases. Until now, the WCE technology has assisted more than 1.6 million patients worldwide. Figure 1 shows the typical internal components of a WCE. Modern WCEs are pill-shaped (26 mm × 11 mm) devices, and they consist of light sources, a short focal length charge-coupled device (CCD) camera, a radio frequency transmitter, a battery-based power supply, and a few other electronic components. Once a patient swallows the capsule, the WCE starts capturing frames at 1-30 frames per second (FPS), depending on the device type and its purpose, and the frames are sent wirelessly to a recorder unit. This process usually takes 8-10 h before the WCE's battery is drained. During this time, the WCE produces around 50,000-80,000 frames for each patient. The captured video allows clinicians to diagnose and detect ulcers, tumors, bleedings, and other lesions within the GI tract later offline and to make diagnostic decisions. Although the WCE technology has many advantages, there is still room for research. For example, it is currently tough for clinicians to inspect the whole set of 50,000 or more frames to locate a disease. They might miss the disease at an early stage due to visual fatigue and the small size of the lesion area. Software was developed by Given Imaging, which aims to detect active blood automatically, but the reported sensitivity and specificity are very low.6 A new method is proposed in this paper, based on morphological operations and machine-learning-based classification, including a support vector machine (SVM), to differentiate between normal and abnormal frames with respect to bleeding findings.
As color and texture are the main features for exploring bleeding frame candidates, this paper focuses on color detection in the red-green-blue (RGB) color space and on various texture features. Experimental analysis shows that this method is capable of performing bleeding detection with a performance at least as good as the state-of-the-art techniques.
At the same time, this paper provides a broad comparison and analysis of different state-of-the-art features and classification methods in terms of their usability for building efficient and flexible WCE video processing systems. The remainder of this paper is organized as follows: Section 2 provides a short survey of the related work found in the literature. Section 3 describes our methodology and the proposed algorithm. Results and discussions are presented in Section 4. Finally, in Section 5, we present our conclusions and provide directions for future work.

2 | RELATED WORK
Bleeding is a very common abnormality found in the GI tract. Many researchers have contributed to detecting it with high-performance classifiers. It is crucial to detect bleeding at an early stage since it is a precursor for inflammatory bowel diseases such as Crohn's disease and UC. Figures 2(a) and 2(b) show normal mucosa and bleeding, respectively. Bleedings are not limited to the stomach; in fact, they can occur anywhere in the whole GI tract,7 and they can be considered a common anomaly detected by WCEs, often defined as "bleeding of unknown origin that recurs or persists or is visible after an upper endoscopy and/or negative endoscopy result".8 The primary challenge is that blood spots and residual traces do not have any typical shape and texture, and the color of blood might vary from light red to dark intense red and brown, which makes the blood challenging to differentiate from the intestinal content or other objects present in the intestine. This diversity of color might depend on the position of the camera capsule, the bleeding timing,9 and the surrounding condition of the intestinal content.10 Bleeding is not a single pathology, and it may be caused by a variety of small intestinal diseases, such as angiodysplasia, open wounds, ulcers, vascular lesions, tumors, and Crohn's disease. Both color and texture features have been used to discriminate pathology, and some related works are discussed in this section. Baopu Li11 incorporated an intelligent system to detect bleeding regions in WCE videos using chrominance-moments-based texture. Mathew and Gopi12 have presented a method for discriminating between bleeding and nonbleeding frames using a contourlet transform with two levels of decomposition of color and texture features into a coarse band and sub-bands. A rotation-invariant local binary pattern is applied to the coarse band and sub-bands.

FIG. 1. Composition of WCE and data acquisition setup.5
Liu and Gan13 have designed an algorithm using a joint diagonalization principal component analysis (PCA) combined with the color coherence vector (CCV), where no iterations, approximations, or inverting procedures are required. This method overcomes the problems of PCA and the "curse of dimensionality" of the original asymptotic PCA. Tuba et al.14 addressed the lack of distinguishable patterns and the manual crafting of SURF feature vectors by using a convolutional neural network to learn texture features from various abnormal endoscopic findings.
Pixel-level methods are expected to be more accurate in classifying bleeding and nonbleeding pixel samples. Yuan20 extracted color features from the pixels in WCE frames and used thresholding in the color space to identify bleeding regions. Jia21 presented an automated bleeding detection strategy which first discriminates between bleeding and nonbleeding frames and later segments the bleeding region using pattern recognition approaches. Moreover, in Ref. 22 the authors used super-pixel segmentation to reduce the computational complexity while preserving high diagnostic accuracy. In comparison with frame-level methods, pixel-based detection is more accurate. However, pixel-based methods are computationally demanding (more than 50,000 frames need to be examined for a single patient).
To summarize, researchers have analyzed each and every frame of WCE video sequences to detect frames with pathological alterations. These experiments have been performed using various image processing and pattern recognition techniques to generate proper frame characteristics, for example, computing color and texture features using various color models. These characteristics define the classification on the basis of frame pixels and frame regions for discrimination between normal and abnormal tissue structures. In our previous work, we developed an algorithm to extract color features for ulcers using statistical feature analysis.23
The contribution of this work is to explore color- and texture-dependent features. Most techniques extract color and texture features from WCE frames. Various methods deal with individual pixel values, although blocks of pixels have the potential to detect bleeding frames with high performance in terms of metrics such as sensitivity, specificity, and accuracy.

3 | BLEEDING DETECTION
The bleeding detection technique proposed in this paper starts with a format conversion to the RGB color space. Then, we remove the frame borders and over- and under-exposed pixel blocks to reduce the number of false detections in these areas. Next, a frame enhancement step is performed using edge masking and noise removal. Then, feature extraction and bleeding detection are performed using color and texture features. Finally, a classification step performs bleeding detection at the frame level and the pixel level with an appropriate classifier.

3.A | Removal of bright and dark blocks
In the GI tract, there can be areas that are either under- or over-illuminated and which, therefore, cannot be processed. For example, a large air bubble packet23 falls into this class. Luminance is computed as the square root of the sum of the squares of the individual RGB components, L(i, j) = √(R(i, j)² + G(i, j)² + B(i, j)²), where i and j are the horizontal and vertical pixel indices in the frame, respectively. We calculate this value for each 16 × 16 block of pixels.
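The block-wise luminance computation described above can be sketched as follows. The block size matches the 16 × 16 blocks in the text; the low/high cut-off thresholds in `mask_extreme_blocks` are illustrative assumptions, not values from the paper:

```python
import numpy as np

def block_luminance(frame, block=16):
    """Mean luminance L = sqrt(R^2 + G^2 + B^2) over each block x block
    pixel block. `frame` is an H x W x 3 float array in [0, 1]; H and W
    are assumed to be multiples of `block` for simplicity."""
    lum = np.sqrt((frame.astype(float) ** 2).sum(axis=2))  # per-pixel luminance
    h, w = lum.shape
    blocks = lum.reshape(h // block, block, w // block, block)
    return blocks.mean(axis=(1, 3))  # one luminance value per block

def mask_extreme_blocks(frame, low=0.15, high=1.5, block=16):
    """Boolean mask of blocks that are neither under- nor over-illuminated
    (the maximum possible luminance for a [1, 1, 1] pixel is sqrt(3))."""
    bl = block_luminance(frame, block)
    return (bl > low) & (bl < high)
```

Blocks flagged `False` by the mask are excluded from further feature extraction, which reduces false detections in dark cavities and specular highlights.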

3.B | Edge and noise removal
We encounter various false results due to the presence of edge information in frames, which may lead to wrong detections. These edges are basically intestinal folds, and the ambiguous edging is caused by the random orientation of the camera's view direction. To eliminate this information, we use a Canny operator.24 The parameters of the Canny algorithm allow recognizing edges with differing characteristics depending on the desired requirements. For this experimental setup, we have chosen a standard deviation of 0.35% and 35% of the pixels in the frame to reduce noise and to perform a robust detection of pixels at the edges. Hence, we have used the lower and upper threshold values of τ1 = 0.3 and τ2 = 0.7, respectively. Morphological dilation is then applied to dilate the detected information. If A is a frame after the masking operation and B is the structural element, then the dilation of A by B is defined as22 A ⊕ B = {z | (B̂)z ∩ A ≠ ∅}, that is, B is reflected about its origin to obtain B̂, and the reflection is shifted by z. Also, frame enhancement25 is required to highlight key data by removing auxiliary information in a frame. We remove Gaussian noise using wavelet denoising with three levels of decomposition. The db2 wavelet with soft thresholding is applied to reduce noise and enhance relevant information in bleeding frames.
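The morphological dilation A ⊕ B used above can be sketched in pure NumPy. This is a minimal illustration of the set definition (in practice, a library routine from an image processing package would be used instead):

```python
import numpy as np

def dilate(A, B):
    """Binary dilation A dilate B = {z | (reflected B shifted by z) hits A}.

    A is a 2-D boolean mask (e.g. the Canny edge map); B is a small
    boolean structuring element with odd side lengths.
    """
    A = A.astype(bool)
    B = np.flip(B.astype(bool))          # reflection of B about its origin
    ph, pw = B.shape[0] // 2, B.shape[1] // 2
    padded = np.pad(A, ((ph, ph), (pw, pw)))
    out = np.zeros_like(A)
    # Union of copies of A shifted by every active element of the reflected B.
    for di in range(B.shape[0]):
        for dj in range(B.shape[1]):
            if B[di, dj]:
                out |= padded[di:di + A.shape[0], dj:dj + A.shape[1]]
    return out
```

With a 3 × 3 all-ones structuring element, a single edge pixel grows into a 3 × 3 patch, which is the thickening effect used here to mask out the detected intestinal folds.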

3.C | Color features
Color is one of the most often used image features, and it can be specified using various color models. Once the color space is defined, the color features can be extracted from the frame or from a particular defined region. In the RGB color space, the optical frequency bands are 630-780 nm for the red (R) band, 490-560 nm for the green (G) band, and 450-490 nm for the blue (B) band. For bleeding, the red channel has a high reflectivity, whereas the green and blue channels have comparatively lower reflectivity and little difference between their values. Thus, we can detect a bleeding region by detecting highly red areas and by computing red-ratio features for individual pixels; the three components form the features C1, C2, and C3 shown in eqs. (3)-(5), respectively. The C3 feature is the proportion of the R channel among all three primary colors, C3 = R/(R + G + B), which is also called the chromaticity. Our fourth feature (C4) is the ratio of the red channel to the vector amplitude of the green and blue channels, C4 = R/√(G² + B²), as represented in eq. (6). The chroma value for bleeding is very high compared to normal mucosa, and the chroma is therefore used as another feature (C5), C5 = max(R, G, B) − min(R, G, B), as shown in eq. (7).

FIG. 3. The frame processing sequence of the proposed bleeding detection method.

FIG. 4. The example of the frame processing steps output for the frame-level bleeding detection procedure.
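The per-pixel color features can be sketched as below. C3, C4, and C5 follow the descriptions in the text; the exact definitions of C1 and C2 are not recoverable from this copy, so the red-to-green and red-to-blue ratios used here are assumptions:

```python
import numpy as np

def color_features(frame, eps=1e-8):
    """Per-pixel color feature stack [C1..C5] for an H x W x 3 (R, G, B)
    float frame. C1 and C2 are assumed red-ratio features; C3-C5 follow
    the textual definitions (chromaticity, R vs. (G, B) amplitude, chroma).
    """
    R, G, B = frame[..., 0], frame[..., 1], frame[..., 2]
    C1 = R / (G + eps)                          # red-to-green ratio (assumed)
    C2 = R / (B + eps)                          # red-to-blue ratio (assumed)
    C3 = R / (R + G + B + eps)                  # chromaticity: R share of all channels
    C4 = R / (np.sqrt(G**2 + B**2) + eps)       # R vs. amplitude of (G, B)
    C5 = frame.max(axis=2) - frame.min(axis=2)  # chroma
    return np.stack([C1, C2, C3, C4, C5], axis=-1)
```

For a saturated red pixel (1, 0, 0) the chromaticity C3 approaches 1 and the chroma C5 equals 1, while for gray mucosa-like pixels both stay low, which is what makes these features discriminative for blood.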

3.D | Color-based classification
The extracted color features are then used as input to an SVM supervised learning model. SVMs are accurate because appropriate kernels (implicit mappings of the inputs into high-dimensional feature spaces) work well even if the data are not linearly separable in the original feature space. Using the kernel functions of SVMs,24 one can perform a nonlinear classification more accurately by mapping the input to high-dimensional feature spaces.25 Various hyperplanes separate the input instances between a set of predefined classes (two in our use case). However, it is important to select the one that has the largest distance to the nearest data points of the two classes. Grid search25 is the conventional method of hyperparameter optimization, using a parameter sweep through a manually specified subset of the hyperparameters of a learning algorithm. The search must be guided by some performance metric, normally measured by evaluation on a held-out validation set or by cross-validation on the training dataset. In this article, we are using an SVM classifier with a radial basis function (RBF) kernel, which has two parameters (regularization constant C and kernel hyperparameter γ) that need to be tuned to achieve high performance on the testing data. For a binary classification problem with training data {(x1, y1), (x2, y2), …, (xk, yk)}, where xi ∈ Rn represents the n-dimensional feature vectors and yi ∈ {1, −1} is the corresponding class label, the SVM requires the solution of the following optimization problem: min over w, b, ε of (1/2)‖w‖² + C Σi εi, subject to yi(wᵀϕ(xi) + b) ≥ 1 − εi and εi ≥ 0, where εi is the slack variable for misclassified examples, C is the penalty parameter of the error term, and K(xi, xj) = ϕ(xi)ᵀϕ(xj) is the kernel function. Four kernel functions are commonly used for pattern recognition and classification: a linear kernel, a polynomial kernel, an RBF, and a sigmoid kernel.
We have adopted the RBF24 kernel in this paper, K(xi, xj) = exp(−γ‖xi − xj‖²), where γ is a parameter that must be carefully selected in the experiment. The optimal values for log2 C and log2 γ were selected from the range (−8, −7, −6, …, 6, 7, 8). The grid method25 was adopted as the searching procedure (a 0.8 step was used). Each (γ, C) pair was evaluated on the training data with tenfold cross-validation to assess the model performance.
Once the optimal values of γ and C were found, they were adopted to train a new SVM model.
The feature vector used as input for our SVM-based detection approach is defined as [C1, C2, C3, C4, C5]. After the removal of dark spots, as shown in Fig. 3(b), each pixel is classified as either a bleeding or a nonbleeding pixel. All the features are fed to the SVM, for which we consider three types of kernels: polynomial, linear, and RBF. The number of bleeding pixels is used as the threshold for frame classification, which decides whether the current frame shows bleeding or nonbleeding areas. A frame containing bleeding pixels is labeled as a bleeding sample; otherwise, it is labeled as a negative sample.
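The grid-searched RBF-SVM training described above can be sketched with scikit-learn. This is an illustrative setup, not the paper's exact code: the grid below uses a coarser step than the paper's 0.8, and the cross-validation fold count is a parameter:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_rbf_svm(X, y, cv=10):
    """Grid-search C and gamma for an RBF-kernel SVM over log2-spaced
    values in [-8, 8] (coarser, illustrative step of 2), scored by
    cross-validated accuracy as in the paper's tenfold setup."""
    grid = {
        "C": 2.0 ** np.arange(-8, 9, 2),      # log2 C in [-8, 8]
        "gamma": 2.0 ** np.arange(-8, 9, 2),  # log2 gamma in [-8, 8]
    }
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=cv)
    search.fit(X, y)  # refits the best (C, gamma) model on all of X
    return search.best_estimator_, search.best_params_
```

The returned estimator is already retrained with the optimal (C, γ) pair on the full training set, matching the step where the optimal values are adopted to train a new SVM model.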

3.E | Texture features
Texture is a very useful feature for a wide range of use cases in image processing and classification tasks. It is generally assumed that the human visual system uses textures for the recognition and interpretation of visual input. In general, color is a pixel property, while texture can only be measured over a group of pixels.26 One good candidate for texture analysis is the gray-level co-occurrence matrix (GLCM), a statistical method of examining texture that considers the spatial relationship of pixels. The GLCM, also known as the gray-level spatial dependence matrix, is defined over a frame as the distribution of co-occurring pixel values (grayscale values or colors) at a given offset. The GLCM functions characterize the texture of a frame by calculating how often pairs of pixels with specific values in a specified spatial relationship occur in a frame, creating a GLCM, and then extracting statistical measures from this matrix.27 The horizontal direction 0° with a range of 1 (nearest neighbor) was used in this work. The 22 texture descriptors extracted from each of the gray-tone spatial-dependence matrices are presented in Table 1. In the pre-defined formulas, p(i, j) is the (i, j)-th entry in a normalized gray-tone spatial-dependence matrix, px(i) is the i-th entry in the marginal probability matrix obtained by summing the rows of p(i, j), Ng is the number of distinct gray levels in the quantized frame, and HX and HY are the entropies of px and py.
All the textural features are extracted from the gray-tone spatial-dependence matrices. The equations defining the set of 22 textural feature measures are presented in Table 1. The features T1, T2, T5, T6, and T12-T19 are taken from the Haralick features,27 the features T3 and T8-T11 are inspired by Ref. 28, and the other features (T5, T21, and T22) are used from Ref. 29.
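A minimal NumPy sketch of the GLCM construction and a small subset of the descriptors (energy, contrast, homogeneity; the full set of 22 is in Table 1). The default offset corresponds to the 0°, distance-1 setup used in the text:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix p(i, j) for offset
    (dy, dx). `img` is a 2-D integer frame quantized to `levels` gray
    levels; the default offset is the 0-degree nearest neighbor."""
    h, w = img.shape
    P = np.zeros((levels, levels))
    for i in range(h - dy):
        for j in range(w - dx):
            P[img[i, j], img[i + dy, j + dx]] += 1
    return P / P.sum()  # normalize counts into probabilities

def haralick_subset(P):
    """Three of the classic GLCM descriptors computed from p(i, j)."""
    i, j = np.indices(P.shape)
    return {
        "energy": (P ** 2).sum(),                       # T1-style uniformity
        "contrast": ((i - j) ** 2 * P).sum(),           # local gray-level variation
        "homogeneity": (P / (1.0 + (i - j) ** 2)).sum(),
    }
```

A perfectly uniform frame concentrates all co-occurrence mass in one entry, so its energy is 1 and its contrast is 0, which matches the intuition behind the T1/Uniformity description below.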
The feature T1 is also called Energy or Uniformity, and it is a measure of the homogeneity of a frame. A homogeneous scene contains only a few gray levels, giving a GLCM with only a few, but relatively high, values of p(i, j); thus, the sum of squares is high. Cluster prominence (T10) is a measure of asymmetry.30 When the cluster prominence value is high, the frame is less symmetric; when it is low, the frame is more symmetric. The first group of pixels is associated with bleeding-related findings. These findings are typically colored in shades of red: from bright red for fresh blood to dark brown for old blood residuals. At first glance, the color features seem to be the best option for the detection of areas with bleeding, but, in general, this is not true because of the second group of pixels, which dominates the GI tract frames.
The second group of pixels is associated with findings that are not bleeding-related, like normal GI tract tissue, stool masses, food leftovers, bubbles, instruments, water, over- and under-illuminated areas, etc. All these "normal" findings can be colored in more or less random colors, but they can also be colored in shades of red, for example, food leftovers, some types of fecal masses, and some types of normal GI tract tissue. In contrast to the first group of pixels, for this group, texture is an important characteristic that allows the system to distinguish between different types of findings; in combination with the color information, the texture-related local characteristics are an essential input for machine-learning methods performing pixel-level classification.
Detailed pixel-perfect classification is essential because the bleeding areas can be both big and very small and, at the same time, the state-of-the-art WCE devices have a relatively low spatial resolution.
Thus, an evaluation of each pixel is essential for high-performance bleeding detection in this scenario.
In the proposed pixel-level classification approach, we use Random Tree (RT)30 and Random Forest (RF)30 classifiers.
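A hedged sketch of the Random Forest variant of the pixel-level classifier using scikit-learn. The per-pixel feature vectors here stand in for the concatenated color and texture features; the tree count and helper names are illustrative, not from the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_pixel_classifier(features, labels, n_trees=100, seed=0):
    """features: N x D per-pixel feature vectors (color + texture);
    labels: N boolean bleeding/nonbleeding ground truth taken from the
    expert-annotated segmentation masks."""
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(features, labels)
    return clf

def segment(clf, frame_features):
    """Classify every pixel of one frame; frame_features is H x W x D.
    Returns an H x W boolean bleeding mask."""
    h, w, d = frame_features.shape
    pred = clf.predict(frame_features.reshape(-1, d))
    return pred.reshape(h, w).astype(bool)
```

The resulting boolean mask is the pixel-level segmentation output; counting its `True` entries also yields the per-frame bleeding-pixel count used for frame-level thresholding.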

4.A | Performance metrics
FIG. 5. The sample wireless capsule endoscopy frames and their segmentation masks.

For the overall performance evaluation of the proposed methods, we use recall (REC, also called sensitivity), specificity (SPEC), accuracy (ACC), the F1 score (F1), and the Matthews correlation coefficient (MCC).
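The metrics above are all derived from the confusion-matrix counts, and can be sketched as follows (standard definitions, stated here for reference rather than taken from the paper's equations):

```python
import numpy as np

def metrics(tp, tn, fp, fn):
    """REC, SPEC, ACC, F1, and MCC from confusion-matrix counts."""
    rec = tp / (tp + fn)                      # recall / sensitivity
    spec = tn / (tn + fp)                     # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)                     # precision, needed for F1
    f1 = 2 * prec * rec / (prec + rec)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"REC": rec, "SPEC": spec, "ACC": acc, "F1": f1, "MCC": mcc}
```

Unlike accuracy and F1, the MCC stays informative on unbalanced datasets (it tends to zero for degenerate predictors), which is why it is emphasized in the comparisons below.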

4.B | Frame-level bleeding detection
As per data availability at UMMC, we have chosen 300 bleeding frames and 200 nonbleeding (normal) frames for the training dataset (500 frames). The testing dataset consists of 500 bleeding and 200 nonbleeding frames (700 frames). All these samples were randomly extracted from 27 different videos for the comparative experiments. All the bleeding frames were annotated by experienced endoscopists, who provided the bleeding-area segmentation masks with the true values assigned to the bleeding pixels.
The example output of the sequential frame processing steps is depicted in Fig. 4. The number of pixels considered as the positive bleeding detection threshold was set to 280 pixels in order to achieve the optimal bleeding detection performance metrics for this example and for all the frame-level bleeding detection experiments. This threshold value was selected based on our previous studies,33 which showed that below this number, the detection method could incorrectly detect angiodysplasia or other small dark patches as a bleeding region.
The experimental results depicted in Table 2 show the REC, SPEC, ACC, F1, and MCC metrics for the different classifiers. Among the three different SVM kernels, the RBF kernel shows the best classification results. A few frames were misclassified because they contained angiodysplasia, which is a small vascular malformation of the gut, also colored in red. The performance of our methods is compared to the best results reported in Ref. 10. MCC values greater than zero correspond to valid, better-than-random predictions. Thus, we can expect small but noticeable benefits from the proposed approach.

4.C | Pixel-level bleedings detection
The comparison of the color and texture features (marked as "F") for the bleeding (marked as "Bleeding") and normal (marked as "Nonbleeding") WCE samples with the corresponding bleeding-pixel detection tenfold cross-validation weighted-average MCC performance. The color and texture features are marked with "C" and "T" prefixes, respectively, followed by the texture identifier. All the feature output frames are range-normalized. Red color is used to mark pixels that are nonmeaningful or contain nonnumbers after the feature extraction: (a) all texture features (T1-T22), (b) the top-MCC texture features (T4, T6, T8, T13, and T14), and (c) the visually different texture features with the highest possible MCC (T4, T6, T9, T11, and T13).
(Table residue: Refs. 21 and 22 report partial metric values of 0.99/0.97/0.98 and 0.99/0.94/0.95, respectively; the remaining entries are not available.) For an unbalanced dataset, the MCC value becomes lower, with a limit of zero for a fully unbalanced dataset. Thereby, a comparative analysis of the F-measure and the MCC score can also be used to estimate the dataset balance for results obtained on nonpublic and not-well-described datasets.
Thus, the only metric that can efficiently be used for a direct performance comparison of different methods on different datasets is the MCC, and we therefore invite all researchers to report this metric, or the whole set of TP, TN, FP, and FN values, which enables the computation of all the metrics used for method comparison.

5 | CONCLUSION
In this paper, we presented an automated bleeding detection algorithm that detects frames with bleeding as well as the pixels associated with bleeding areas. We briefly described the related work and the base ideas of our detection approach. For future work, we plan to extend the sets of texture and color features used in our classification approach and to perform a more in-depth statistical analysis of the value of the different features for the classification performance. Next, we plan to extend the methods presented in this paper to WCE ulcer frame analysis in order to support the detection and localization of UC and inflamed areas.
Finally, using our previous successful experience21 in speeding up feature extraction using heterogeneous resources such as graphical processing units (GPUs), we plan to implement the feature extraction code on a GPU, which will allow a significant increase in the performance of our proposed detection approach in terms of frame processing speed.

ACKNOWLEDGMENT
This work is funded by the graduate assistantship (GA) scheme, Universiti Teknologi PETRONAS, Perak, Malaysia, and by the FRINATEK project "EONS" #231687, Norway.

CONFLICT OF INTEREST
The authors declare that they have no conflict of interests.