Evaluation of video compression methods for cone‐beam computerized tomography

Abstract
Purpose
Cone‐beam computerized tomography (CBCT) is routinely performed to verify patient position in radiotherapy. It produces a large amount of data, which calls for a compression method for efficient storage. In this study, three video compression algorithms were introduced and their performance was evaluated on real patient data.
Materials and methods
First, multiple sets of CBCT images of a patient were transferred from the reconstruction workstation or exported from the treatment planning system. The CBCT images were then sorted according to imaging time (time‐prioritized sequence) or imaging location (location‐prioritized sequence). Next, the sequence was processed by a video compression algorithm, resulting in a movie. Three representative video compression algorithms (Motion JPEG 2000, Motion JPEG AVI, and MPEG‐4) were employed, and their compression performance was evaluated on the CBCT data of 30 patients.
Results
Among the three video compression algorithms, Motion JPEG 2000 had the lowest compression ratio because it is a lossless compression algorithm. Motion JPEG AVI and MPEG‐4 had higher compression ratios than Motion JPEG 2000 but introduced some image loss. For MPEG‐4, location‐prioritized sequences showed a higher compression ratio than time‐prioritized sequences. In the clinical target verification application, the registration accuracy of CBCT after decompression was comparable to that of the original CBCT.
Conclusions
Video compression algorithms can provide a higher compression ratio than static image compression algorithms. Although compression causes some loss in the CBCT images, its impact on the registration accuracy of patient positioning is almost negligible. Video compression is an effective way to substantially reduce the size of CBCT images for storage.


2.A | Standards of image and video compression
Many working groups are dedicated to setting standards for image, audio, and video coding, compression, and transmission. [15][16][17] The best-known standards are JPEG, MPEG, and H.26x.
The Joint Photographic Experts Group (JPEG), created in 1986, is a joint committee of the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) and the International Telegraph and Telephone Consultative Committee (CCITT); it created the JPEG and JPEG 2000 standards. 18
Inter-frame prediction coding utilizes the strong correlation between successive frames and is the most popular video compression algorithm, adopted by many international standards such as MPEG-4 and H.264.
Video encoding techniques exploit certain characteristics of video signals, namely redundancy of information both within a frame (intra-frame, spatial redundancy) and between frames (inter-frame, temporal redundancy). 21 An intra-frame compression algorithm (Figure 1a), such as Motion JPEG or Motion JPEG 2000, begins by calculating the discrete cosine transform (DCT) or wavelet transform (WT) coefficients over small image blocks. This block-by-block processing takes advantage of the image's local spatial correlation properties. The DCT or WT process produces many 2D blocks of transform coefficients that are quantized (Q) to discard some of the trivial coefficients. The quantized coefficients are then encoded to form a video frame. On the other hand, inter-frame coding (Figure 1b), such as MPEG, exploits temporal redundancy by predicting future frames from previous reference frames. The motion estimator searches the reference frames for areas similar to those in the current frame. This search results in motion vectors, which are used by the motion compensator to form a prediction of the current frame from the reference frames. The difference image between the current frame and the predicted frame is then calculated. Since only the difference image and the motion vectors need to be encoded instead of the original image, inter-frame coding usually results in a significant reduction in video size.
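The inter-frame steps described above (motion estimation, motion compensation, difference image) can be illustrated with a minimal pure-Python sketch. It is not the MPEG-4 implementation: the exhaustive block search, the sum-of-absolute-differences cost, the 4x4 block size, the search range, and all function names are assumptions chosen for illustration, and the transform and quantization of the difference image are omitted.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a current and a reference block."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(size)
        for j in range(size)
    )


def estimate_motion(cur, ref, by, bx, size, search):
    """Exhaustive block matching: offset (dy, dx) into `ref` with lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_cost, best = sad(cur, ref, by, bx, by, bx, size), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by + dy, bx + dx
            inside = 0 <= ry <= height - size and 0 <= rx <= width - size
            if inside:
                cost = sad(cur, ref, by, bx, ry, rx, size)
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx)
    return best


def encode_inter(cur, ref, size=4, search=2):
    """Return motion vectors per block and the residual (current - predicted)."""
    height, width = len(cur), len(cur[0])  # assumed to be multiples of `size`
    vectors, residual = {}, [[0] * width for _ in range(height)]
    for by in range(0, height, size):
        for bx in range(0, width, size):
            dy, dx = estimate_motion(cur, ref, by, bx, size, search)
            vectors[(by, bx)] = (dy, dx)
            for i in range(size):
                for j in range(size):
                    predicted = ref[by + dy + i][bx + dx + j]
                    residual[by + i][bx + j] = cur[by + i][bx + j] - predicted
    return vectors, residual


def decode_inter(ref, vectors, residual, size=4):
    """Rebuild the current frame from the reference, vectors, and residual."""
    frame = [row[:] for row in residual]
    for (by, bx), (dy, dx) in vectors.items():
        for i in range(size):
            for j in range(size):
                frame[by + i][bx + j] += ref[by + dy + i][bx + dx + j]
    return frame


# Toy 8x8 frames: one bright pixel moves one column to the right.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
ref[5][5] = 100
cur[5][6] = 100

vectors, residual = encode_inter(cur, ref)
```

In this toy case the block containing the moving pixel is matched one column to the left in the reference frame, so the residual is entirely zero and only the motion vectors carry information. A real codec would additionally transform, quantize, and entropy-code the residual, which is where the loss in lossy modes comes from.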

2.C | Video compression for CBCT images
The work flow of video compression process for CBCT images is

2.D | Evaluation
The performance of three video compression algorithms as men-
The similarity of an image sequence is evaluated by the difference and the correlation between all successive images in the sequence.
A higher similarity value means that the sequence contains more redundant information to remove, so a larger compression ratio is expected. The image difference (DIFF) is calculated as the mean value of the image differences in a sequence, as defined below.
The image correlation (CORR) is calculated as the mean value of the image correlation coefficients in a sequence, as defined below.
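The defining equations for DIFF and CORR are not reproduced in this text, so the following is only a sketch of one plausible reading: DIFF as the mean absolute pixel difference between successive images, and CORR as the Pearson correlation coefficient between successive images, each averaged over the sequence. The function names and the choice of absolute (rather than signed or squared) difference are assumptions.

```python
from math import sqrt


def flatten(image):
    """Turn a 2D image (list of rows) into a flat list of pixel values."""
    return [pixel for row in image for pixel in row]


def mean_abs_difference(a, b):
    """Mean absolute pixel difference between two images of equal size."""
    pa, pb = flatten(a), flatten(b)
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)


def correlation(a, b):
    """Pearson correlation coefficient; undefined for constant images."""
    pa, pb = flatten(a), flatten(b)
    mean_a, mean_b = sum(pa) / len(pa), sum(pb) / len(pb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(pa, pb))
    var_a = sum((x - mean_a) ** 2 for x in pa)
    var_b = sum((y - mean_b) ** 2 for y in pb)
    return cov / sqrt(var_a * var_b)


def sequence_similarity(sequence):
    """Average DIFF and CORR over all pairs of successive images."""
    pairs = list(zip(sequence, sequence[1:]))
    diff = sum(mean_abs_difference(a, b) for a, b in pairs) / len(pairs)
    corr = sum(correlation(a, b) for a, b in pairs) / len(pairs)
    return diff, corr


# Three tiny 2x2 "images": the second is the first plus 1, the third repeats it.
sequence = [
    [[0, 1], [2, 3]],
    [[1, 2], [3, 4]],
    [[1, 2], [3, 4]],
]
diff, corr = sequence_similarity(sequence)
```

Under these assumptions, a sequence with more redundancy between neighbours has a lower DIFF and a CORR closer to 1.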
The performance of the video decompression algorithms is evaluated by decompression time, mean square error (MSE), peak signal-to-noise ratio (PSNR), and video quality metric (VQM). Decompression time is the average time for processing one image from the video. The MSE is calculated by comparing the original and decompressed images pixel by pixel, as defined below.
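As a minimal sketch, MSE and PSNR can be computed from their standard textbook definitions as follows. The 16-bit maximum pixel value matches the bit depth stated in this study; the function names and the nested-list image representation are assumptions made for illustration, and VQM is not covered.

```python
from math import inf, log10

MAX_16BIT = 2**16 - 1  # maximum pixel value for 16-bit images


def mse(original, decoded):
    """Mean square error over all pixels of two equally sized images."""
    count = sum(len(row) for row in original)
    total = sum(
        (o - d) ** 2
        for row_o, row_d in zip(original, decoded)
        for o, d in zip(row_o, row_d)
    )
    return total / count


def psnr(original, decoded, max_value=MAX_16BIT):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, decoded)
    if error == 0:
        return inf
    return 10 * log10(max_value**2 / error)


original = [[0, 100], [200, 300]]
decoded = [[0, 100], [200, 302]]  # one pixel is off by 2

error = mse(original, decoded)
quality_db = psnr(original, decoded)
```

A lossless codec reproduces every pixel exactly, so its MSE is zero and the PSNR is reported here as infinite rather than raising a division error.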
It is important to compare the error of an image with respect to the number of bits with which a pixel is encoded. PSNR is the ratio between the maximum power of a signal and the power of the corrupting noise that affects the fidelity of its representation. In this case it is defined as:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}\right)$$
Here, MAX is the maximum possible pixel value of the image, which is $2^{16} - 1$ in this study. Typical values for the PSNR of lossy image and video compression are between 60 and 80 dB, provided the bit depth is 16 bits. VQM is a metric to predict human-perceived video quality; the values of its parameters, a = 0.15 and b = 19.7818, have been set experimentally. The resulting VQM is interpreted on a qualitative scale such as "excellent" (VQM < 20%) or "good" (VQM < 40%). 22
The impact of image loss on positioning accuracy was assessed using a clinical image registration application, Offline Review (Varian Medical Systems, Palo Alto, CA, USA). First, the original CBCT images were automatically registered with the planning CT to determine the target offset for patient positioning. Next, CBCT images were com-

3.A | Video compression
The performance of three video compression algorithms was compared as shown in Table 1

3.B | Video decompression
The performance of three video decompression algorithms was compared as shown in Table 2

3.C | Registration accuracy
The discrepancy of positioning accuracy before and after compression for two lossy algorithms, AVI and MP4, are summarized in

4 | DISCUSSIONS
MP4 demonstrated superior compression capability over MJ2 (a lossless encoding algorithm) and AVI (a lossy intra-frame coding algorithm). This is attributed to its ability to remove more of the redundant information between successive images. The compression ratio of MP4 with the location-prioritized sequence is higher than that with the time-prioritized sequence. It was also observed that the DIFF of the location-prioritized sequence is lower than that of the time-prioritized sequence. Both facts indicate that the location-prioritized sequence may improve the compression ratio for MP4. The compression ratio varies to some degree across treatment sites: for all three video compression algorithms, it is highest for head-and-neck cases and lowest for pelvis cases. This implies that the variation of CBCT images between sessions may be smaller in head-and-neck cases than in pelvis cases. For cases with larger variation between sessions, the compression ratio of a video compression algorithm might be lower.
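The two slice orderings compared above can be sketched as two sort keys. This is one plausible reading, not the authors' documented procedure: it assumes that the time-prioritized sequence plays every slice of one imaging session before moving to the next, and that the other ordering places the same slice location from successive sessions next to each other. The `session` and `location` field names are hypothetical.

```python
def time_prioritized(slices):
    """Order by imaging session first, then by slice location within it."""
    return sorted(slices, key=lambda s: (s["session"], s["location"]))


def location_prioritized(slices):
    """Order by slice location first, then by imaging session."""
    return sorted(slices, key=lambda s: (s["location"], s["session"]))


def as_pairs(sequence):
    """Compact (session, location) view of an ordered sequence."""
    return [(s["session"], s["location"]) for s in sequence]


# Two imaging sessions with three slice locations each.
slices = [
    {"session": session, "location": location}
    for session in (1, 2)
    for location in (0, 1, 2)
]

by_time = as_pairs(time_prioritized(slices))
by_location = as_pairs(location_prioritized(slices))
```

Under this reading, neighbouring frames in the second ordering show the same anatomy on different days, which is consistent with the lower DIFF reported for that sequence.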