1. Introduction
Volumetric modulated arc therapy (VMAT) is a popular rotational radiotherapy (RT) technique because of its faster delivery, greater degrees of freedom for dose optimization, and improved dose conformity [1,2]. To detect intra-fractional anatomical changes without introducing extra imaging dose or cost, Poludniowski et al. [3] proposed reconstructing megavoltage (MV) computed tomography (CT) from electronic portal imaging device (EPID) images collected during VMAT, and named the technique VMAT-CT. Their reconstruction is a three-dimensional (3D) lambda tomography (LT) method based on the Feldkamp–Davis–Kress (FDK) algorithm and a lambda filter [3]. However, the poor image quality of VMAT-CT, caused by data insufficiency, truncation, and blurring, has hindered its clinical application. To improve the image quality of VMAT-CT, our group proposed an extrapolation scheme that extrapolates along collimator angles instead of the horizontal direction, preserving most of the useful information in EPID images [4]. Furthermore, we proposed systematic methods to preprocess EPID images, including online region-based active contouring, multi-leaf collimator (MLC) motion modeling, and outlier filtering, which significantly improved the image quality of VMAT-CT for multiple treatment sites (head and neck, lung, and esophagus) [5].
However, our methods still failed to reconstruct VMAT-CT from certain VMAT plans with extremely insufficient projection data. These failures result from an inherent limitation of the LT algorithm: reconstruction quality degrades dramatically if the projection sampling is sparse or the projection angle range is less than 180° plus the fan angle [6,7]. Although LT is non-quantitative, in that it does not require an exact and unique reconstruction [8,9], the reconstruction may still fail if the sampled angle range for certain voxels is smaller than the lowest acceptable cutoff threshold (1.57 radians, or 90°) [3].
Iterative reconstruction algorithms for incomplete projection data have been proposed to overcome these limitations of the LT algorithm. Iterative algorithms can introduce image constraints [10,11,12,13] that encode prior knowledge or realistic assumptions about the missing data, such as positivity of voxel values and bounds on image smoothness and voxel values, thereby protecting the reconstruction from unrealistic artifacts and distortions caused by data deficiency.
The concept of compressed sensing (CS) was proposed in 2006 [14]. According to CS theory, a sparse signal can be recovered from fewer samples than the Nyquist sampling theorem requires. Consequently, if an image can be sparsified by an operation such as a discrete gradient operator [15,16], it can be reconstructed from fewer samples. Sidky et al. proposed an iterative algorithm for CT reconstruction based on CS theory that incorporates total variation (TV) minimization [17], and later developed an iterative algorithm named adaptive-steepest-descent-projection-onto-convex-sets (ASD-POCS) [18]. Since then, other studies have adopted TV minimization in iterative reconstruction to address insufficient projection data [19,20,21].
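To make the alternating structure of ASD-POCS concrete, the following Python sketch runs a simplified, non-adaptive variant on a toy 1D problem. Each outer iteration performs one ART (Kaczmarz) sweep for data consistency, projects onto nonnegativity, and then takes a few normalized TV steepest-descent steps whose size is scaled by the change produced by the POCS phase. The functions `tv_gradient_1d` and `asd_pocs_like`, the random system matrix `A`, and all step parameters are illustrative assumptions; the published ASD-POCS additionally adapts the TV step size and uses tomographic projectors rather than a dense random matrix.

```python
import numpy as np

def tv_gradient_1d(x, eps=1e-8):
    """Gradient of the smoothed 1D TV, sum_i sqrt((x[i+1] - x[i])**2 + eps)."""
    d = np.diff(x)
    w = d / np.sqrt(d ** 2 + eps)
    g = np.zeros_like(x)
    g[:-1] -= w   # x[i] enters the difference d[i] with a minus sign
    g[1:] += w    # x[i+1] enters the difference d[i] with a plus sign
    return g

def asd_pocs_like(A, p, n_outer=50, n_tv=5, alpha=0.05):
    """Simplified, non-adaptive ASD-POCS-style loop for A x = p with x >= 0."""
    x = np.zeros(A.shape[1])
    row_norm2 = np.sum(A ** 2, axis=1)
    for _ in range(n_outer):
        x_start = x.copy()
        # POCS phase: one ART (Kaczmarz) sweep toward data consistency ...
        for i in range(A.shape[0]):
            x += (p[i] - A[i] @ x) / row_norm2[i] * A[i]
        # ... then projection onto the nonnegativity constraint.
        np.maximum(x, 0.0, out=x)
        dp = np.linalg.norm(x - x_start)   # size of the POCS update
        # TV phase: normalized steepest-descent steps scaled by dp.
        for _ in range(n_tv):
            g = tv_gradient_1d(x)
            gnorm = np.linalg.norm(g)
            if gnorm == 0.0:
                break
            x -= alpha * dp * g / gnorm
    return x

# Toy problem: a piecewise-constant signal seen through fewer measurements than unknowns.
rng = np.random.default_rng(0)
truth = np.concatenate([np.full(15, 1.0), np.full(15, 2.0), np.full(10, 0.5)])
A = rng.standard_normal((20, truth.size))
p = A @ truth
x = asd_pocs_like(A, p)
print(np.linalg.norm(A @ x - p) / np.linalg.norm(p))   # relative data residual
```

Scaling the TV step by `dp` keeps the regularization from overpowering the data-consistency step, which is the balancing idea that ASD-POCS formalizes with its adaptive rules.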
TV minimization assumes that the pixel values within each structure in a CT image are piecewise constant, so that in the TV (gradient) domain the non-zero signal concentrates on the boundaries of these structures [22,23]. Under this assumption, TV minimization can exploit gradient sparsity through L1 regularization, preserve the edges of internal structures, and smooth out noise within anatomical structures, which suits medical images with uniform intensity within a structure. However, if the image intensity fluctuates drastically because of complicated structures or large streaking artifacts, TV minimization may introduce staircase artifacts that degrade image quality, and it may fail to remove the streaking artifacts in the CT reconstruction [24]. Although many variants of TV minimization have been proposed to address this problem, such as edge-preserving TV [25], anisotropic TV [26], and adaptive-weighted TV [27], they are still not effective for images with poor contrast and require considerable tuning effort.
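The gradient-sparsity assumption can be checked numerically. The following Python sketch, using forward differences and an isotropic TV definition (both assumptions about the discretization), computes the TV of a piecewise-constant 2D square and counts how many pixels carry a non-zero gradient. Only the pixels along the square's boundary do, whereas added noise makes nearly every gradient non-zero.

```python
import numpy as np

def forward_gradients(img):
    """Forward differences; the last row/column is replicated, giving zero gradient at the border."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return gx, gy

def isotropic_tv(img):
    """Isotropic TV: sum over pixels of the gradient magnitude."""
    gx, gy = forward_gradients(img)
    return np.sum(np.sqrt(gx ** 2 + gy ** 2))

img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0                       # piecewise-constant "structure"
gx, gy = forward_gradients(img)
nnz = np.count_nonzero(np.hypot(gx, gy))
print(nnz, img.size)                        # 63 1024
print(round(isotropic_tv(img), 3))          # 63.414

rng = np.random.default_rng(0)
noisy = img + 0.05 * rng.standard_normal(img.shape)
print(np.count_nonzero(np.hypot(*forward_gradients(noisy))))  # nearly every pixel
```

In the clean image only about 6% of the gradient entries are non-zero, which is the sparsity that L1 (TV) regularization exploits; noise and streaks destroy this sparsity, so minimizing TV pushes the image back toward a piecewise-constant form.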
The block-matching and 3D filtering (BM3D) method [28,29] is an advanced image-denoising method that also encourages data sparsity. In this method, similar patches are grouped by block matching into 3D stacks, which are sparsified by a linear transform, such as a Fourier-type or discrete cosine transform, followed by hard thresholding of the transform coefficients (an L0-type sparsity step, as opposed to the soft thresholding associated with L1 regularization) [28]. While TV minimization uses gradient sparsity in the spatial domain, BM3D achieves a sparse representation in the transform domain, under the assumption that an anatomical structure recognized by block matching has a similar appearance throughout a medical image. Unlike TV minimization, BM3D does not expect an image to have uniform intensity within a structure, and thus avoids introducing staircase artifacts when the image intensity fluctuates. Therefore, several groups have proposed iterative algorithms for CT reconstruction using BM3D filters [30,31,32].
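The grouping-and-thresholding idea can be illustrated with a simplified, single-stage BM3D-like filter in Python. This is not the reference BM3D implementation, which adds a Wiener-filtering second stage, Kaiser-window aggregation, and a separable 2D + 1D transform. Here, for each reference patch, the most similar patches in a search window are collected, the stack is transformed with a 3D DCT, coefficients below `lam * sigma` are zeroed, and the overlapping inverse-transformed estimates are averaged. The function name `bm3d_like_hard_threshold` and all parameter values are assumptions for illustration.

```python
import numpy as np
from scipy.fft import dctn, idctn

def bm3d_like_hard_threshold(img, patch=8, step=4, search=16,
                             n_similar=8, sigma=0.1, lam=2.7):
    """Single-stage, BM3D-style collaborative hard thresholding (simplified sketch)."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    num = np.zeros_like(img)   # weighted sum of patch estimates
    den = np.zeros_like(img)   # sum of aggregation weights
    thr = lam * sigma          # hard threshold on orthonormal 3D-DCT coefficients
    for i in range(0, H - patch + 1, step):
        for j in range(0, W - patch + 1, step):
            ref = img[i:i + patch, j:j + patch]
            # Block matching: squared distance to every patch in the search window.
            cands = []
            for y in range(max(0, i - search), min(H - patch, i + search) + 1):
                for x in range(max(0, j - search), min(W - patch, j + search) + 1):
                    dist = np.sum((img[y:y + patch, x:x + patch] - ref) ** 2)
                    cands.append((dist, y, x))
            cands.sort(key=lambda c: c[0])
            group = [(y, x) for _, y, x in cands[:n_similar]]
            # Grouping: stack the most similar patches into a 3D array.
            stack = np.stack([img[y:y + patch, x:x + patch] for y, x in group])
            # Collaborative filtering: 3D transform, hard threshold, inverse transform.
            coef = dctn(stack, norm="ortho")
            keep = np.abs(coef) > thr
            est = idctn(np.where(keep, coef, 0.0), norm="ortho")
            # Aggregation: sparser groups (fewer kept coefficients) get larger weights.
            w = 1.0 / max(np.count_nonzero(keep), 1)
            for k, (y, x) in enumerate(group):
                num[y:y + patch, x:x + patch] += w * est[k]
                den[y:y + patch, x:x + patch] += w
    out = img.copy()
    covered = den > 0
    out[covered] = num[covered] / den[covered]
    return out

rng = np.random.default_rng(1)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = bm3d_like_hard_threshold(noisy, sigma=0.1)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))  # MSE before / after
```

Because the transform is orthonormal, white noise of standard deviation `sigma` stays white in the coefficient domain, so a threshold of a few `sigma` removes most noise coefficients while the strong coefficients shared by similar patches survive.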
The goal of this study was to develop a CS-based iterative algorithm for VMAT-CT reconstruction. The algorithm uses both TV minimization and BM3D denoising to further improve the image quality of VMAT-CT. The preprocessing methods previously developed by our group [5] were also used in this study to achieve the best results. To the best of our knowledge, this study developed the first iterative VMAT-CT reconstruction algorithm that outperforms FDK-based algorithms in terms of success rate and reconstructed image quality.
4. Discussion
The concept of VMAT-CT was proposed a decade ago but has not gained popularity because of multiple limitations and technical challenges. Because the daily portal images acquired during VMAT are highly blurred by beam modulation, and commercial software cannot reconstruct CT from these images, most clinics in the US, to our knowledge, neither collect nor use them. As a result, a large amount of image data, requiring no additional hardware, beam time, or imaging dose, goes unused even though it could support treatment monitoring and dose tracking. Some studies have investigated prostate localization during VMAT based on fiducial markers and portal images collected during VMAT [47,48], but this type of tracking cannot reveal patient anatomy or dose information.
In this study, we adopted the concept of CS theory, introduced TV and BM3D as regularization constraints, and developed a TV-BM3D iterative reconstruction algorithm to improve the image quality of VMAT-CT. Using iterative + preprocessing, we successfully reconstructed 48 of 50 phantom cases and 15 of 17 patient cases. In contrast, only 39 phantom cases and eight patient cases could be reconstructed with FDK, and 44 phantom cases and 11 patient cases with FDK + preprocessing. All phantom and patient cases showed improved image quality with the TV-BM3D iterative reconstruction algorithm. Our iterative algorithm can remove the irregular artifacts caused by insufficient projection data and reveal structures in VMAT-CT that remained hidden with the FDK-based algorithm.
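Expressed as success rates, these counts are easier to compare across methods. The short Python snippet below simply recomputes the rates from the reported numbers, rounded to one decimal place; the dictionary layout and the helper name `success_rate` are illustrative.

```python
# Reported success counts: (successful, total)
results = {
    "FDK": {"phantom": (39, 50), "patient": (8, 17)},
    "FDK + preprocessing": {"phantom": (44, 50), "patient": (11, 17)},
    "iterative + preprocessing": {"phantom": (48, 50), "patient": (15, 17)},
}

def success_rate(successful: int, total: int) -> float:
    """Success rate as a percentage."""
    return 100.0 * successful / total

for method, groups in results.items():
    rates = {group: round(success_rate(*counts), 1) for group, counts in groups.items()}
    print(method, rates)
# FDK {'phantom': 78.0, 'patient': 47.1}
# FDK + preprocessing {'phantom': 88.0, 'patient': 64.7}
# iterative + preprocessing {'phantom': 96.0, 'patient': 88.2}
```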
The BM3D denoising algorithm searches for patterns by extracting similar blocks within an image and grouping them into a few templates. Through collaborative filtering, which enhances the similarity between blocks in each template, BM3D can reconstruct structures such as bones, based on the assumption that these structures have a similar appearance throughout a medical image. TV minimization, on the other hand, assumes that the voxel values within a structure in a CT image are nearly uniform. BM3D and TV therefore exploit data sparsity under different assumptions, and both provide constraints that help the iterative algorithm solve the sparse-data problem in CT reconstruction, making the TV-BM3D iterative algorithm more effective than TV minimization or BM3D alone. The choice of block size and noise level in BM3D is crucial for denoising performance, and extra tuning effort is required to balance the denoising strength of the TV and BM3D regularizations.
Reconstruction from incomplete projection data, such as limited-angle or truncated field-of-view CT, is an ill-posed problem in which analytical algorithms such as FDK often produce strong streak artifacts and amplified noise because the full-sampling assumption is violated. Iterative reconstruction methods incorporating sparsity constraints have therefore become the state of the art for sparse or truncated CT data. In particular, TV regularization has been widely used to suppress streak artifacts and stabilize reconstruction from limited projections, although it may cause over-smoothing and loss of fine anatomical detail, particularly under severe data incompleteness [18]. Several studies have proposed improved regularization models to overcome these limitations. For example, compressed-sensing-based CT reconstruction frameworks demonstrated that sparse regularization can significantly improve image quality with reduced projection data, forming the theoretical foundation for many modern iterative CT reconstruction algorithms. Subsequent developments introduced adaptive or relative TV models to better preserve edges and textures under limited-angle acquisition. More advanced methods incorporate additional priors, such as non-local patch similarity or prior-image constraints, to better preserve structural information [49]. For example, the prior-image-constrained compressed sensing framework and non-local regularization approaches have demonstrated improved reconstruction accuracy for undersampled CT data [50].
Building on these developments, the proposed algorithm integrates local sparsity constraints (TV) with non-local self-similarity priors (BM3D) within a unified optimization framework solved using the Split Bregman method [28,39]. This hybrid regularization strategy improves both noise suppression and structural preservation compared with analytical reconstruction and TV-only methods. Quantitatively, the proposed method achieved CNR values ranging from 3.61 to 19.57 (mean ≈ 9.3) and SSIM values ranging from 0.087 to 0.782 (mean ≈ 0.27) across all 64 cases. While the mean SSIM appears lower than some reports in the literature, meaningful comparison requires careful consideration of acquisition conditions, reference definitions, and task difficulty, particularly in sparse-view and limited-angle CT reconstruction.
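The Split Bregman method handles the non-differentiable L1 (TV) term by introducing an auxiliary variable d constrained to equal Du and a Bregman variable b, then alternating a linear solve, a shrinkage step, and a Bregman update. The following Python sketch applies it to 1D anisotropic TV denoising only; the BM3D term and the CT forward projector of the proposed framework are omitted, and `split_bregman_tv_1d`, `mu`, `lam`, and the dense difference matrix are assumptions chosen for a small example.

```python
import numpy as np

def split_bregman_tv_1d(f, mu=10.0, lam=5.0, n_iter=100):
    """Solve min_u ||D u||_1 + (mu/2) ||u - f||_2^2 with Split Bregman (1D, anisotropic TV)."""
    f = np.asarray(f, dtype=float)
    n = f.size
    D = np.diff(np.eye(n), axis=0)           # (n-1) x n forward-difference operator
    lhs = mu * np.eye(n) + lam * D.T @ D     # u-subproblem matrix, fixed across iterations
    d = np.zeros(n - 1)                      # auxiliary variable, constrained to equal D u
    b = np.zeros(n - 1)                      # Bregman variable enforcing d = D u
    u = f.copy()
    for _ in range(n_iter):
        # 1) Quadratic u-subproblem: (mu I + lam D^T D) u = mu f + lam D^T (d - b)
        u = np.linalg.solve(lhs, mu * f + lam * D.T @ (d - b))
        Du = D @ u
        # 2) d-subproblem: soft thresholding (shrinkage) with threshold 1/lam
        v = Du + b
        d = np.sign(v) * np.maximum(np.abs(v) - 1.0 / lam, 0.0)
        # 3) Bregman update of the constraint residual
        b = b + Du - d
    return u

rng = np.random.default_rng(0)
clean = np.r_[np.zeros(50), np.ones(50)]
noisy = clean + 0.2 * rng.standard_normal(clean.size)
u = split_bregman_tv_1d(noisy)
print(np.abs(np.diff(noisy)).sum(), np.abs(np.diff(u)).sum())   # TV before / after
```

The appeal of this splitting is that each subproblem is easy: the u-step is a linear system with a fixed matrix, and the d-step is an elementwise shrinkage, which is why the method extends naturally to frameworks that add further regularizers.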
TV-based reconstruction remains a standard baseline for sparse-view and limited-angle CT. Conventional TV-based baselines have demonstrated SSIM values of approximately 0.812–0.960 and CNR values of 1.97–7.27 under sparse-view conditions (30–90 projections) on the Shepp–Logan phantom. More advanced TV-based variants, such as reinforced TV (rTV), have pushed performance further with SSIM up to 0.984 and CNR up to 14.26 under 90-projection sparse-view acquisition [51]. In more demanding limited-angle configurations, single-energy TV regularization has yielded SSIM ≈ 0.88 and CNR ≈ 2.8 on anthropomorphic phantoms [52]. More recent work, such as Xi et al. [53], reports SSIM values of approximately 0.85–0.93 for standard TV and up to ~0.90–0.97 for advanced high-order TV formulations, when evaluated against matched full-view reference images under moderate sparse-view conditions. However, these high SSIM values are largely attributable to matched-reference evaluation and moderate undersampling regimes. Importantly, TV-based methods inherently impose piecewise-constant assumptions, which suppress noise but also attenuate low-contrast features and fine textures, leading to moderate CNR improvement but reduced structural fidelity, particularly in highly undersampled scenarios.
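Because CNR definitions differ between studies, which is part of why such cross-study comparisons are delicate, the sketch below states one common definition explicitly: CNR = |mean(ROI) - mean(background)| / std(background). This definition, the function name `cnr`, and the toy masks are assumptions and may differ from the definition used for the values reported in this study; another common variant pools the ROI and background standard deviations in the denominator. SSIM can be computed with `skimage.metrics.structural_similarity`, where the choice of `data_range` and of the reference image likewise affects the result.

```python
import numpy as np

def cnr(image, roi_mask, bg_mask):
    """CNR = |mean(ROI) - mean(background)| / std(background) (one common definition)."""
    roi = image[roi_mask]
    bg = image[bg_mask]
    return abs(roi.mean() - bg.mean()) / bg.std()

img = np.array([[0, 2, 0, 2],
                [0, 2, 0, 2],
                [5, 5, 5, 5],
                [5, 5, 5, 5]], dtype=float)
roi_mask = np.zeros(img.shape, dtype=bool)
roi_mask[2:, :] = True        # bottom half: uniform "structure"
bg_mask = ~roi_mask           # top half: background with mean 1 and std 1
print(cnr(img, roi_mask, bg_mask))   # 4.0
```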
Recent studies have explored deep learning-based reconstruction or sinogram completion for limited-angle CT. Deep learning-based methods typically report SSIM values of ~0.80–0.95 under moderate sparse-view or low-dose conditions, depending on the similarity between training and testing distributions [54]. However, these methods are typically evaluated on datasets with consistent geometries, full angular sampling ranges, and reference reconstructions from dense-view filtered back-projection. Moreover, they often rely on large, well-matched training datasets and supervised learning with high-quality ground truth, and they show limited generalizability across imaging systems or treatment sites [55]. In contrast, the proposed approach operates directly on measured portal images without requiring training data, supporting the feasibility of VMAT-CT as a practical in-treatment imaging modality for treatment monitoring and adaptive radiotherapy.
What fundamentally distinguishes our study from the vast majority of the sparse-view and limited-angle CT reconstruction literature is the nature of the raw data. Nearly all benchmark studies, including those employing TV-based methods, advanced TV variants, and deep learning approaches, evaluate performance on kV CT datasets acquired under idealized or controlled conditions: well-calibrated geometries, consistent photon flux, and relatively predictable noise characteristics. In these studies, the ill-posedness stems almost entirely from angular undersampling or restricted scan ranges, while the underlying projection data remain otherwise coherent and physically well-behaved. In stark contrast, our data originate from MV portal imaging, which introduces compounded degradations absent from conventional benchmarks: inherently poor image quality due to high-energy photon physics, severe irregularity from MLC modulation that creates highly non-uniform fluence patterns, substantial blurring from both MLC motion during delivery and scatter-dominated signals, and extreme angular incompleteness far beyond typical limited-angle scenarios. Consequently, whereas standard sparse-view studies address reconstruction under ideal acquisition models with controlled subsampling, we confront a regime in which the forward model itself is corrupted by time-varying modulation, mechanical motion, and physical degradations that violate nearly all conventional assumptions. This places our reconstruction task in a considerably more challenging class of problems and makes direct quantitative comparisons of CNR and SSIM across studies inequitable without careful contextualization of the underlying data fidelity and acquisition physics. That being said, the proposed method demonstrates substantial performance gains, with multiple cases achieving CNR values above 14 (up to 19.57), exceeding the typical upper range reported for conventional iterative reconstruction.
Similarly, the upper range of SSIM values (0.4–0.8) approaches or surpasses those observed in early deep learning-based reconstruction frameworks. To our knowledge, this work represents the first demonstration of an iterative reconstruction framework specifically designed for VMAT-CT, and it highlights the potential of advanced regularization methods to overcome the severe data incompleteness inherent in treatment-time imaging.
This study has several limitations. First, the stopping threshold (0.005) was determined empirically from our VMAT-CT reconstruction trials. However, convergence is affected by the strength of BM3D denoising, which is tuned by the BM3D noise level, and by the strength of TV minimization, which is tuned by the steepest-descent step size and the number of inner-iteration steps. If the BM3D and TV regularizations are not balanced, VMAT-CT may be overly smoothed at each iteration step, so that the update parameter, which represents the change between successive VMAT-CT iterations, remains too large for the algorithm to converge. Second, some failed VMAT-CT cases remain unsolved even with the iterative algorithm. For VMAT-CT with extremely poor quality, tuning the block size and finding patterns for BM3D can be challenging, and correct templates may not be representable by blocks. One feasible approach is to decompose the regularization models in our algorithm and replace them with a convolutional neural network (CNN), relieving the need to tune the TV and BM3D regularization parameters. For example, some groups have introduced a deep CNN as the regularization term in an alternating direction method of multipliers (ADMM) iterative reconstruction algorithm to restore distorted limited-angle CT images, and found better image recovery than with iterative algorithms using TV regularization [56,57]. Finally, the iterative algorithm is relatively slow.
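The stopping rule discussed above can be sketched as follows, assuming the change between successive iterations is measured as the relative L2 norm ||x_k - x_(k-1)|| / ||x_(k-1)|| and compared with the 0.005 threshold. The helpers `relative_change` and `iterate_until_converged` are hypothetical, and the update parameter used in this study may be defined differently. The second example shows how an update that keeps changing the image never meets the threshold, which is the failure mode described for unbalanced regularization.

```python
import numpy as np

def relative_change(x_new, x_old, eps=1e-12):
    """Relative L2 change between successive iterates (assumed definition)."""
    return np.linalg.norm(x_new - x_old) / max(np.linalg.norm(x_old), eps)

def iterate_until_converged(update, x0, tol=0.005, max_iter=200):
    """Apply `update` until the relative change drops below `tol`.

    Returns (final iterate, iterations used, converged flag)."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_new = update(x)
        if relative_change(x_new, x) < tol:
            return x_new, k, True
        x = x_new
    return x, max_iter, False

# A contracting update converges ...
target = np.full(4, 10.0)
x, n_iter, ok = iterate_until_converged(lambda v: 0.5 * (v + target), np.ones(4))
print(n_iter, ok)            # 8 True

# ... whereas an update that keeps changing the image never meets the threshold.
_, n_iter_bad, ok_bad = iterate_until_converged(lambda v: -v, np.ones(4), max_iter=50)
print(n_iter_bad, ok_bad)    # 50 False
```

A `max_iter` cap is a common safeguard so that a non-converging case terminates and can be flagged rather than running indefinitely.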
Table 5 shows the overall computational time of the whole 3D VMAT-CT reconstruction. Compared with FDK and FDK + preprocessing, iterative + preprocessing takes the longest, between 6 and 10 min, with the spread reflecting differences in convergence speed among VMAT-CT cases. The computational bottleneck of the iterative reconstruction is BM3D denoising, which involves computationally demanding steps such as block matching, grouping, and aggregation. Several studies have reported GPU-based BM3D denoising, but they are limited to 2D images and require careful memory organization and thread cooperation for data exchange [58]. Future work on 3D GPU-based BM3D denoising in MATLAB could further accelerate the TV-BM3D iterative algorithm.
In the future, the framework of our TV-BM3D iterative algorithm could be revised to converge faster and require less tuning. Instead of optimizing in two alternating phases, the optimization problem may be solved in a single phase using the Barzilai–Borwein formulation [7]. Because the tuning parameters (the regularization weighting factor, the BM3D block size and noise level, and the TV-minimization step size and iteration number) are affected by characteristics of the VMAT plans, such as the MLC modulation complexity score [59], the small aperture score for aperture size [60], and the inherent CT contrast at the treatment site [61], tuning effort in the clinical workflow could be reduced by presetting these parameters as site-specific protocols, similar to the kV-CBCT protocols used in the clinic.
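As background for the Barzilai–Borwein idea, the following Python sketch shows gradient descent with the BB1 step size alpha_k = (s^T s) / (s^T y), where s and y are the differences of successive iterates and gradients, on a small ill-conditioned quadratic. It is a generic illustration under stated assumptions (the function name `bb_gradient_descent`, the initial step, and the fallback rule are hypothetical), not the single-phase VMAT-CT formulation, whose objective would combine the data-fidelity, TV, and BM3D terms.

```python
import numpy as np

def bb_gradient_descent(grad, x0, step0=1e-3, max_iter=500, tol=1e-10):
    """Gradient descent with the Barzilai-Borwein (BB1) step size."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    x = x_prev - step0 * g_prev                # one plain step to initialize s and y
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = x - x_prev                         # change in iterate
        y = g - g_prev                         # change in gradient
        sy = s @ y
        step = (s @ s) / sy if sy > 0 else step0   # BB1 step; fall back if curvature <= 0
        x_prev, g_prev = x, g
        x = x - step * g
    return x

# Quadratic f(x) = 0.5 x^T Q x - b^T x with condition number 100; minimizer solves Q x = b.
Q = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, -2.0, 3.0])
x_star = bb_gradient_descent(lambda v: Q @ v - b, np.zeros(3))
print(np.allclose(x_star, np.linalg.solve(Q, b), atol=1e-6))  # True
```

The BB step approximates the inverse curvature along the most recent step without a line search, which is what makes it attractive for reducing the number of hand-tuned step-size parameters.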