1. Introduction
Radiotherapy treatment planning has developed rapidly over the past century [1,2,3]. From the early 20th century, treatment plans were created manually using 2D patient contours and standard isodose charts derived from measured percentage depth-dose (PDD) data [1]. By 1972, the invention of X-ray computed tomography (CT) by Hounsfield and Cormack, along with computer-assisted calculation, enabled true 3D planning [2,3]. For the first time, CT scanning allowed 3D visualization of the tumor and normal organs, permitting accurate calculation and optimization of dose distribution in the target volume while protecting surrounding tissues. In the 1980s, three-dimensional conformal radiotherapy (3D-CRT) was developed to design beam configurations using the beam’s eye view visualization [4]. Treatment plans could be evaluated with dose uniformity overlays and dose–volume histograms (DVHs). Since the 1990s, inverse planning, in which algorithms optimize intensity-modulated radiotherapy (IMRT) and volumetric modulated arc radiotherapy (VMAT) plans, has been introduced [5]. These techniques enabled conformal dose sculpting by automatically optimizing small beamlets with computer algorithms, rather than relying on manual beam arrangement trials. As a result, radiation could be delivered with millimeter-level accuracy, and this approach remains the standard of practice today. Over time, radiation therapy simulation has evolved from a purely geometric setup into a comprehensive computerized process that includes imaging, contouring, dose calculation, and treatment plan optimization.
Imaging is fundamental to radiotherapy simulation because it enables accurate identification of patient anatomy for precise treatment planning [3,6,7]. CT remains the most widely used modality because it provides reliable geometric information in most cases. In clinical practice, additional imaging modalities, such as magnetic resonance imaging (MRI) and positron emission tomography (PET), are often combined with CT to address the limitations of any single technique and to improve the delineation of complex anatomical relationships [8].
Once imaging datasets are acquired, the next essential step is delineation of targets and organs at risk (OARs) to guide dose optimization and minimize normal-tissue toxicity. Contouring techniques have undergone tremendous development from the manual, wire-based body-outline tracings of the pre-CT era to the semi-automated and fully automated methods available today [9,10]. The introduction of CT has increased segmentation accuracy compared to earlier approaches, which involved tracing the patient’s body contour using mechanical aids such as flexible leads or solder wires to capture transverse outlines. However, manual delineation on CT remains a labor-intensive and time-consuming process, often requiring several hours for complete segmentation of all relevant structures [11]. Moreover, inconsistency between operators has always been an issue [12]. Therefore, early auto-contouring algorithms were developed, such as simple threshold-based methods [13], nearest-point line approaches [14], and atlas-based segmentation [10]. While successful for specific organs, these methods have encountered limitations in accuracy, generalizability across different anatomical regions, and fully automated performance. In recent years, artificial intelligence (AI) and deep learning approaches have transformed the contouring process [9,15,16]. Advanced neural network models trained on large datasets of expert-annotated scans are now capable of automatically delineating organs [17]. The integration of deep learning has significantly improved the quality and accessibility of auto-segmentation, producing contours that are far more accurate and consistent than earlier methods [18].
The primary objective of radiotherapy is to accurately deliver the prescribed dose to tumor tissues while minimizing exposure to surrounding healthy organs. Therefore, dose calculation plays a critical role in radiotherapy (RT) treatment planning [19,20]. The development of dose calculation has progressed from empirical approaches to sophisticated physics-based algorithms [21,22]. Early treatment planning systems (TPS) relied on measured beam data, such as percentage depth-dose (PDD) curves and tissue-air ratios (TAR), to manually estimate the dose at specific points, typically referencing only the central-axis depth [23]. Although these methods were simple, fast, and effective for cases involving flat and homogeneous tissue regions, they lacked the accuracy required for evaluating 3D dose distributions and tissue heterogeneities. The introduction of CT imaging, which provides a 3D patient-specific electron density map, greatly advanced the development of computerized dose calculation algorithms [20,24]. These developments included pencil-beam models and Clarkson integration techniques in the 1980s, followed by convolution/superposition algorithms in the 1990s. Around the same time, Monte Carlo (MC) simulation became the gold standard for dose calculation because of its high accuracy in modeling dose deposition. However, its clinical use was initially limited by high computational demands. Recent advances have improved calculation speed, making MC methods more practical. Overall, the choice of dose calculation method depends on the clinical scenario and must balance accuracy with computational efficiency [21,25].
Modern radiotherapy platforms now incorporate a wide range of advanced functionalities to improve treatment safety and streamline clinical workflows [26]. Beyond the core tasks of imaging, delineating organs, and calculating dose distributions, current systems also offer capabilities in sophisticated plan optimization, adaptive replanning, and data-driven decision support. However, these functions are often proprietary products of a few large vendors. The closed design and high computational cost can restrict access and hinder customization [27,28].
In this study, we developed a radiotherapy simulation system that enables automated segmentation, radiation dose estimation, and collision detection during treatment planning. The Point Transformer neural network model was implemented to segment OARs from CT medical imaging. To reduce training cost, the farthest point sampling (FPS) technique was applied on the input data. The AI-generated segmentation results were refined using post-processing algorithms, including k-d tree construction, outlier removal, marching cubes, and surface smoothing, to enhance the accuracy and anatomical fidelity of the segmented organs. Dose estimation at the tumor center can be performed in sync with beam selection, which is a foundational step in 3D-CRT and IMRT planning. Beam selection defines the geometric search space for optimization. In clinical planning, this process is primarily guided by the Beam’s Eye View (BEV), which helps identify beam angles that maximize target coverage while minimizing overlap with critical organs. Additionally, a collision detection module, based on a depth camera and the separating axis theorem algorithm, was integrated to evaluate potential interactions between the radiotherapy machine and the patient. All functionalities were implemented and simulated within a Unity 3D virtual environment to support precise and effective radiotherapy planning.
2. Materials and Methods
This study was reviewed and approved by the Institutional Review Board of Chiayi Christian Hospital (IRB2025011). Manually segmented CT images from patients who had undergone radiotherapy were employed. A radiotherapy simulation system was developed in a Unity 3D virtual environment. Its functions included automated organ segmentation, radiation dose estimation, and collision detection during treatment planning (Figure 1).
The software environment consists of Python 3.9.12 with PyTorch 1.10.2 for deep learning implementation, CUDA 11.x for GPU acceleration, and supporting libraries including SimpleITK and PyVista for medical image and point cloud processing, as well as NumPy, SciPy, and scikit-learn. The virtual simulation environment was developed using Unity 3D (version 2020.3.19f1). All experiments were conducted on a workstation equipped with an Intel® Core™ i7-9700KF CPU (3.60 GHz, 8 cores), an NVIDIA Tesla P40 GPU with 24 GB VRAM, and 31 GB of system memory (2666 MT/s).
2.1. Automated Image Segmentation for Organ Recognition and 3D Surface Reconstruction
Point Transformer [15] was employed for organ recognition using the WORD dataset [29], which provides annotated organ contours as ground truth. Two evaluation scenarios were considered: single-organ recognition (liver) and multi-organ recognition (liver and spleen). For training, the 3D volumetric CT scan data (.nii.gz) were converted into point-cloud-like sequential data (xyz coordinates, Hounsfield unit (HU) values, and labels). Specifically, SimpleITK [30] was used to extract the origin and voxel spacing, while PyVista [31] mapped voxel indices to physical coordinates (x, y, z). A dual-condition filtering strategy was applied to retain voxels that either belong to regions of interest (ROIs; predefined target labels) or fall within a specified HU range. Finally, label masking was performed to assign non-target voxels to the background (label 0), and the data were flattened into 1D arrays (coordinates, HU, labels). This process reduces background redundancy while preserving essential spatial and density information for downstream modeling.
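The conversion described above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: it uses plain nested lists instead of SimpleITK/PyVista objects, assumes the image axes are aligned with the physical axes (identity direction matrix), and the function name `volume_to_points`, the label value 1, and the HU window are hypothetical choices made for the example.

```python
from typing import List, Sequence, Set, Tuple

# One output row: (x, y, z, HU, label)
PointRow = Tuple[float, float, float, float, int]


def volume_to_points(
    hu: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[Sequence[Sequence[int]]],
    origin: Tuple[float, float, float],
    spacing: Tuple[float, float, float],
    target_labels: Set[int],
    hu_range: Tuple[float, float],
) -> List[PointRow]:
    """Flatten a CT volume indexed [z][y][x] into point rows.

    Assumption: image axes coincide with physical axes, so the physical
    coordinate is simply origin + index * spacing on each axis.
    """
    lo, hi = hu_range
    rows: List[PointRow] = []
    for k, (hu_slice, lab_slice) in enumerate(zip(hu, labels)):
        for j, (hu_row, lab_row) in enumerate(zip(hu_slice, lab_slice)):
            for i, (value, lab) in enumerate(zip(hu_row, lab_row)):
                is_roi = lab in target_labels
                # Dual-condition filter: keep ROI voxels OR voxels in the HU window.
                if not (is_roi or lo <= value <= hi):
                    continue
                x = origin[0] + i * spacing[0]
                y = origin[1] + j * spacing[1]
                z = origin[2] + k * spacing[2]
                # Label masking: every non-target voxel becomes background (0).
                rows.append((x, y, z, float(value), lab if is_roi else 0))
    return rows
```

With real data, the origin and spacing would come from the image header, and a non-identity direction matrix would have to be applied before adding the origin.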
To reduce computational cost, farthest point sampling (FPS) [32] was applied to downsample each point cloud to a uniform size of 32,000 points per case. These processed point clouds were then used as input to the Point Transformer model, which predicts the label information for each point.
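For readers unfamiliar with FPS, the following is a minimal pure-Python sketch of the greedy algorithm. It is illustrative only: the function name and the choice of the first point as the seed are assumptions, and a production implementation for 32,000 points would normally be vectorized or run on the GPU.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def farthest_point_sampling(points: Sequence[Point3], n_samples: int, start: int = 0) -> List[int]:
    """Return indices of points chosen by greedy farthest point sampling."""
    if not points or n_samples <= 0:
        return []
    n_samples = min(n_samples, len(points))
    selected = [start]
    # min_dist[i]: distance from point i to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_samples:
        # Pick the point that is farthest from the current sample set.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update the nearest-selected distances with the new sample.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

Each iteration costs O(N), so selecting M samples costs O(NM); this is the reason FPS itself can become a bottleneck on very large clouds.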
The point transformer architecture builds on self-attention mechanisms designed for irregular and unordered 3D data. This design enables the model to effectively capture local geometric structures and long-range dependencies. The model consists of an encoder–decoder structure with TransitionDown and TransitionUp modules for hierarchical feature extraction and reconstruction, respectively. Each stage of the encoder integrates a PointTransformerLayer, which applies spatially aware attention across neighboring points using learned positional encodings and attention weights. The decoder progressively upsamples features while fusing information from corresponding encoder layers, allowing fine-grained semantic predictions. This implementation provides multiple variants (Seg26, Seg38, Seg50) with varying depths, enabling trade-offs between performance and computational cost.
The dataset was partitioned into training (100 samples), validation (20 samples), and test (30 samples) subsets. The predicted results were compared with the ground truth annotations to evaluate model accuracy. In addition, a comparative analysis of prediction performance and visualized point clouds was conducted against PointNet++ [33] using the same dataset.
The raw outputs obtained by Point Transformer require further processing before being used in dose estimation. During both training and inference, point cloud data undergo sampling, which leads to a loss of fine surface details. To recover these missing features, a k-d tree-based nearest neighbor search algorithm [34] was employed to propagate labels from the downsampled predictions back to the original high-resolution point cloud. After that, a statistical outlier removal technique was applied to eliminate noisy points near the liver boundaries. For downstream applications, such as dose evaluation and radiation-object collision detection, pointwise operations are essential. However, performing these calculations directly on large-scale point clouds is computationally expensive and often insufficient for fully representing complex organ geometries. To overcome this, the labeled point clouds were voxelized and converted into surface meshes using the marching cubes algorithm [35]. Finally, Laplacian smoothing [36] was applied to enhance surface fidelity, as the marching cubes-reconstructed meshes are typically coarse.
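The label-propagation step can be illustrated with a small, self-contained k-d tree. This is a sketch under stated assumptions rather than the study's implementation: the helper names are hypothetical, the tree is a simple recursive median split, and in practice a library k-d tree (for example from SciPy) would typically replace it.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Item = Tuple[Point3, int]  # (coordinates, predicted label)


def build_kdtree(items: List[Item], depth: int = 0):
    """Build a k-d tree as nested tuples: (item, axis, left, right)."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, query: Point3, best: Optional[Tuple[float, int]] = None):
    """Return (squared distance, label) of the stored point closest to query."""
    if node is None:
        return best
    (point, label), axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(point, query))
    if best is None or d2 < best[0]:
        best = (d2, label)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def propagate_labels(sampled_points: Sequence[Point3],
                     sampled_labels: Sequence[int],
                     full_points: Sequence[Point3]) -> List[int]:
    """Give every full-resolution point the label of its nearest sampled point."""
    tree = build_kdtree(list(zip(sampled_points, sampled_labels)))
    return [nearest(tree, q)[1] for q in full_points]
```

The sketch assumes at least one sampled point; ties between equidistant neighbors are resolved by visit order.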
2.2. Dose Evaluation
The organ point cloud data extracted through Point Transformer were converted into surface triangular meshes in OBJ format. The ray casting method was employed to simulate interactions between radiation beams and organs. This process identifies the entry and exit points of radiation beams across the triangular meshes, enabling the calculation of the path length traversed within each organ. Based on this information, the absorbed dose delivered to the tumor was estimated. This process supports the optimization of radiotherapy treatment planning, as illustrated in
Figure 2.
2.2.1. Ray Casting for Intersection and Beam Path
In this study, ray casting [37] was applied to calculate the entry and exit points of the radiation beam along its path, that is, the points where rays intersect organ surfaces, enabling identification of the traversed mesh regions. As illustrated in Figure 3, a ray is projected from an origin point $\mathbf{O}$ in the direction of a vector $\mathbf{D}$. The ray intersects the surface of a triangular mesh, whose surface normal is denoted by $\mathbf{N}$, and a point on the plane is represented as $\mathbf{P}_0$. The intersection occurs at a distance $t$ from the ray origin, and the intersection point is computed using the ray’s parametric equation (Equation (1)). The scalar $t$ is determined by first formulating the plane equation (Equation (2)).

$$\mathbf{P}(t) = \mathbf{O} + t\,\mathbf{D} \tag{1}$$

$$\mathbf{N} \cdot (\mathbf{P} - \mathbf{P}_0) = 0 \tag{2}$$

From Equations (1) and (2), the ray-plane intersection condition gives:

$$t = \frac{\mathbf{N} \cdot (\mathbf{P}_0 - \mathbf{O})}{\mathbf{N} \cdot \mathbf{D}} \tag{3}$$

After solving for $t$, the intersection point $\mathbf{P}$ is calculated using Equation (1). To check if the intersection is inside the triangular mesh, a point-in-triangle test is performed. This test evaluates the directional consistency of the cross products of vectors from the intersection point to the triangle vertices. A point $\mathbf{P}$ is considered inside the triangle if the following condition holds:

$$\left[(\mathbf{V}_i - \mathbf{P}) \times (\mathbf{V}_{i+1} - \mathbf{P})\right] \cdot \mathbf{N} \geq 0 \tag{4}$$

where $i$ denotes the index of a triangle vertex ($i = 0, 1, 2$, with $\mathbf{V}_3 \equiv \mathbf{V}_0$). This condition ensures that the point lies within the triangle boundaries by confirming the relative orientation of the formed vectors.
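The ray-plane intersection and point-in-triangle test described above can be sketched as follows. This is an illustrative pure-Python version, not the platform's Unity implementation. The function names are hypothetical, the direction vector is assumed to be of unit length so that the ray parameter equals distance, and the path-length helper assumes a closed mesh whose hits alternate between entry and exit.

```python
from typing import Iterable, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin: Vec3, direction: Vec3, tri: Sequence[Vec3],
                 eps: float = 1e-9) -> Optional[float]:
    """Return the ray parameter of the hit on the triangle, or None if missed."""
    v0, v1, v2 = tri
    normal = cross(sub(v1, v0), sub(v2, v0))
    denom = dot(normal, direction)
    if abs(denom) < eps:          # ray parallel to the plane (or degenerate triangle)
        return None
    t = dot(normal, sub(v0, origin)) / denom
    if t < 0:                     # plane lies behind the ray origin
        return None
    p = tuple(o + t * d for o, d in zip(origin, direction))
    # Point-in-triangle: all cross products must point along the normal.
    for i in range(3):
        a = sub(tri[i], p)
        b = sub(tri[(i + 1) % 3], p)
        if dot(cross(a, b), normal) < 0:
            return None
    return t


def path_length(origin: Vec3, direction: Vec3,
                triangles: Iterable[Sequence[Vec3]]) -> float:
    """Sum the distances between successive entry/exit hits along the ray."""
    hits = sorted(t for t in (ray_triangle(origin, direction, tri)
                              for tri in triangles) if t is not None)
    return sum(hits[i + 1] - hits[i] for i in range(0, len(hits) - 1, 2))
```

Rays that pass exactly through a shared edge or vertex can register duplicate hits; a robust implementation would deduplicate them before pairing entries with exits.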
2.2.2. Model-Based Dose Estimation
PDD quantifies how the absorbed radiation dose varies with depth as a photon beam penetrates a medium. As illustrated in Figure 4, PDD is defined as the ratio of the absorbed dose $D_d$ (at depth $d$) to the maximum absorbed dose $D_{max}$ (at depth $d_{max}$), expressed as Equation (5).

$$\mathrm{PDD}(d) = \frac{D_d}{D_{max}} \times 100\% \tag{5}$$

The source-to-surface distance (SSD) is the distance between the radiation source and the surface of the medium. PDD values can be determined experimentally using a water phantom or tissue-equivalent material, or through mathematical modeling such as the Monte Carlo method.
In this study, the buildup-tail method [38] was employed to simulate the absorbed dose at various depths. This method combines a quadratic buildup function with an exponential attenuation tail, $e^{-\mu d}$, as given in Equation (6), where $d$ is the depth in water (cm), $\alpha$ is the diffusion parameter (dimensionless), and $\mu$ is the attenuation coefficient (cm$^{-1}$). By tuning $\alpha$ and $\mu$, the PDD curve can be fitted for different photon energies.
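Geometrically, the PDD curve is also influenced by the SSD. The dose at depth is governed by the inverse-square law, tissue attenuation, and scattering. The following Equations (7) and (8) describe PDD at two different SSDs, $\mathrm{SSD}_1$ and $\mathrm{SSD}_2$, where $K_s$ is the scatter factor between $d_{max}$ and $d$:

$$\mathrm{PDD}(d, \mathrm{SSD}_1) = 100 \left(\frac{\mathrm{SSD}_1 + d_{max}}{\mathrm{SSD}_1 + d}\right)^2 e^{-\mu (d - d_{max})}\, K_s \tag{7}$$

$$\mathrm{PDD}(d, \mathrm{SSD}_2) = 100 \left(\frac{\mathrm{SSD}_2 + d_{max}}{\mathrm{SSD}_2 + d}\right)^2 e^{-\mu (d - d_{max})}\, K_s \tag{8}$$

Combining Equations (7) and (8) gives the Mayneord Factor ($F$), which corrects the PDD for changes in SSD due to beam rotation:

$$F = \frac{\mathrm{PDD}(d, \mathrm{SSD}_2)}{\mathrm{PDD}(d, \mathrm{SSD}_1)} = \left(\frac{\mathrm{SSD}_2 + d_{max}}{\mathrm{SSD}_1 + d_{max}}\right)^2 \left(\frac{\mathrm{SSD}_1 + d}{\mathrm{SSD}_2 + d}\right)^2 \tag{9}$$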
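As a quick numerical illustration of the Mayneord correction, the sketch below evaluates the standard textbook form of the factor, in which the attenuation and scatter terms are assumed to cancel between the two SSDs. The function name and the sample values (80 cm to 100 cm SSD, 10 cm depth, 0.5 cm depth of maximum dose) are illustrative and not taken from the study.

```python
def mayneord_factor(ssd1: float, ssd2: float, depth: float, d_max: float) -> float:
    """Ratio PDD(depth, ssd2) / PDD(depth, ssd1), all lengths in cm.

    Only the inverse-square terms are kept; attenuation and scatter are
    assumed identical at both SSDs and therefore cancel.
    """
    return ((ssd2 + d_max) / (ssd1 + d_max)) ** 2 * ((ssd1 + depth) / (ssd2 + depth)) ** 2


# Illustrative values: moving from 80 cm to 100 cm SSD raises the PDD by about 4%.
factor = mayneord_factor(ssd1=80.0, ssd2=100.0, depth=10.0, d_max=0.5)
pdd_at_ssd2 = 58.4 * factor  # 58.4% is a made-up PDD at SSD1 for the example
```

Because scatter does change somewhat with SSD, the factor tends to be least accurate for large fields and low beam energies.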
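The relationship between the monitor ion chamber reading in monitor units (MUs) and the absorbed dose in the patient (cGy) is expressed in Equation (10). In this equation, $D$ denotes the dose to be delivered at the tumor center, $K$ represents the linear accelerator calibration dose (cGy/MU) at $d_{max}$, $S_c$ is the collimator scatter factor, $S_p$ is the phantom scatter factor, and the SSD factor accounts for the distance correction, evaluated by Equation (11).

$$\mathrm{MU} = \frac{D}{K \times (\mathrm{PDD}/100) \times S_c \times S_p \times \mathrm{SSD\ factor}} \tag{10}$$

$$\mathrm{SSD\ factor} = \left(\frac{\mathrm{SSD}_0 + d_{max}}{\mathrm{SSD} + d_{max}}\right)^2 \tag{11}$$

where $\mathrm{SSD}_0$ is the source-to-surface distance at which $K$ is specified.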
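The monitor-unit relationship can be sketched numerically as below. This follows the conventional SSD-technique hand calculation and is only an assumption about the exact form used in the platform; the function and parameter names are hypothetical, and the numbers in the usage lines are illustrative rather than clinical.

```python
def ssd_factor(ssd0: float, ssd: float, d_max: float) -> float:
    """Inverse-square correction from the calibration SSD (ssd0) to the treatment SSD."""
    return ((ssd0 + d_max) / (ssd + d_max)) ** 2


def monitor_units(dose_cgy: float, k_cgy_per_mu: float, pdd_percent: float,
                  s_c: float, s_p: float,
                  ssd0: float, ssd: float, d_max: float) -> float:
    """Monitor units needed to deliver dose_cgy at the point where PDD applies."""
    denominator = (k_cgy_per_mu * (pdd_percent / 100.0) * s_c * s_p
                   * ssd_factor(ssd0, ssd, d_max))
    return dose_cgy / denominator


# Illustrative call: 200 cGy at a depth where PDD = 80%, calibration conditions otherwise.
mu = monitor_units(dose_cgy=200.0, k_cgy_per_mu=1.0, pdd_percent=80.0,
                   s_c=1.0, s_p=1.0, ssd0=100.0, ssd=100.0, d_max=1.5)
```

In this example the result is about 250 MU; extending the SSD beyond the calibration distance lowers the SSD factor and therefore raises the required MU.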
The lungs are located among inhomogeneous anatomical structures, each characterized by different attenuation coefficients. These tissue inhomogeneities can lead to significant deviations in dose estimations when using the standard PDD for a homogeneous medium. To mitigate this limitation, a modified PDD that accounts for the heterogeneity of body tissues was employed. It is assumed that the effective thickness ($l_{eff}$) of an organ is equal to its actual physical thickness ($l$) scaled by a coefficient of equivalent attenuation relative to water ($k$), $l_{eff} = k \times l$. For a given material, $k$ is also proportional to its relative electron density with respect to water. Based on this relationship, the dose at a point beyond an inhomogeneity can be determined based on the effective depth ($d_{eff}$) along the ray joining the point and the electron source, as illustrated in Figure 4B.
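The effective-depth idea reduces to a weighted sum of the path lengths through each tissue, which could be supplied by the ray-casting step. The sketch below is illustrative: the function name is hypothetical, and the relative electron density of 0.25 used for lung is a typical round figure chosen for the example, not a value from this study.

```python
from typing import Iterable, Tuple


def effective_depth(segments: Iterable[Tuple[float, float]]) -> float:
    """Water-equivalent depth in cm.

    Each segment is (physical_thickness_cm, scaling_coefficient), where the
    coefficient is the tissue's attenuation relative to water, approximated
    here by its relative electron density.
    """
    return sum(thickness * coefficient for thickness, coefficient in segments)


# Illustrative beam path: 3 cm chest wall, 8 cm lung, 2 cm soft tissue to the tumor.
path = [(3.0, 1.0), (8.0, 0.25), (2.0, 1.0)]
d_eff = effective_depth(path)  # 7.0 cm instead of the 13 cm physical depth
```

The PDD would then be evaluated at `d_eff` rather than at the physical depth, which raises the estimated dose behind low-density tissue.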
2.2.3. Irradiated Volume Assessment
Normal tissue complication probability (NTCP) [39] is a widely used radiotherapy metric for evaluating the probability of radiation-induced side effects in healthy organs. It considers the absorbed dose, irradiated volume, and organ-specific radiosensitivity to predict the risk of complications. Standard NTCP estimation is commonly based on radiobiological dose–volume models, such as the Lyman–Kutcher–Burman (LKB) model and related formulations [40]. NTCP plays an important role in treatment planning by balancing effective tumor control with the minimization of harm to surrounding normal tissues, especially for radiation-sensitive organs such as the lungs or heart, where precise dose management is necessary.
The Effective Volume ($V_{eff}$) in the LKB model [39] is a mathematical concept used to convert a complex, non-uniform dose distribution (like those found in IMRT or VMAT plans) into a simpler equivalent partial volume irradiation. This concept is rooted in the physiological architecture of the organ, specifically how its Functional Sub-Units (FSUs) are arranged. In the LKB NTCP model, this dependence is quantified by the parameter $n$. Understanding how tissue behaves differently to irradiation is critical for determining which cost functions and constraints should be prioritized during the inverse planning optimization process. For example, for $n$ close to 1, the FSUs are arranged in parallel and are sensitive to the overall volume irradiated, demonstrating a pronounced volume effect. The formula transforms the DVH bins, each with fractional volume $v_i$ receiving dose $D_i$, into a single effective volume [41]:

$$V_{eff} = \sum_i v_i \left(\frac{D_i}{D_{max}}\right)^{1/n}$$

To relate $V_{eff}$ to NTCP, an intermediate variable $t$ is needed.

$$\mathrm{NTCP} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^2/2}\, dx, \qquad t = \frac{D_{max} - TD_{50}(V_{eff})}{m \cdot TD_{50}(V_{eff})}$$

In the preceding equation, $D_{max}$ is the maximal dose to the OAR. $TD_{50}$ is the dose at 50% probability of complication, here for partial-volume irradiation of $V_{eff}$. Derived experimentally, $m$ is inversely related to the slope of the dose–response curve.
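A compact numerical sketch of the LKB calculation is given below. It follows the commonly published Kutcher-Burman effective-volume reduction and the usual power-law volume dependence of the tolerance dose, which is an assumption about the exact variant used here. The function names and the parameter values in the usage lines are placeholders and should not be read as clinical values.

```python
import math
from typing import Sequence, Tuple


def effective_volume(dvh: Sequence[Tuple[float, float]], n: float) -> Tuple[float, float]:
    """Reduce a differential DVH to (V_eff, D_max).

    dvh holds (dose, fractional_volume) bins; fractional volumes sum to 1.
    """
    d_max = max(dose for dose, _ in dvh)
    v_eff = sum(vol * (dose / d_max) ** (1.0 / n) for dose, vol in dvh)
    return v_eff, d_max


def ntcp_lkb(dvh: Sequence[Tuple[float, float]], n: float, m: float, td50: float) -> float:
    """NTCP from the LKB model; td50 is the whole-organ 50% tolerance dose."""
    v_eff, d_max = effective_volume(dvh, n)
    td50_partial = td50 * v_eff ** (-n)      # assumed power-law volume dependence
    t = (d_max - td50_partial) / (m * td50_partial)
    # Standard normal cumulative distribution evaluated at t.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Placeholder parameters: half the organ at 50 Gy, the other half spared.
risk = ntcp_lkb([(50.0, 0.5), (0.0, 0.5)], n=0.87, m=0.18, td50=50.0)
```

A large `n` makes the partial-volume tolerance dose rise quickly as `V_eff` shrinks (parallel behavior), whereas a small `n` leaves it nearly unchanged (serial behavior).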
Organs with a serial FSU architecture, such as the spinal cord, brainstem, or optic chiasm, function as a chain in which the integrity of the entire organ depends on its weakest link. Damage to a small sub-volume in a serial organ can lead to total functional failure. Serial tissues demonstrate little or no volume effect: the tolerance dose remains roughly the same whether a small section or the entire organ is irradiated. When optimizing treatment plans for serial organs, the primary objective must be to strictly limit the maximum point dose. The proposed framework enables identification of beam intersections with serial OARs, as illustrated in Figure 5.
In contrast, organs with a parallel architecture, such as the lung, liver, or parotid glands, possess a profound volume effect, meaning they can tolerate high doses to small sub-volumes, provided the rest of the organ is spared. The irradiated volume can be assessed by tumor depth across a range of gantry angles.
Figure 6 shows how tumor depth varies with beam angle. The area under the curve provides a quantitative measure of the irradiated volume.
2.3. Radiation Therapy Simulation for Beam Arrangement and Collision Detection
The simulation was developed in the Unity 3D environment using an institutional treatment unit (Elekta Versa HD), enabling the creation of a virtual radiotherapy scenario (
Figure 7). The system provides two modes of operation. In the manual configuration mode, all treatment parameters are entered manually, allowing users to freely adjust beam arrangements and machine settings. In the RT plan mode, treatment parameters, including beam arrangements, dose rates, and collimator settings, are imported directly from an RT plan file, while patient table parameters are entered separately. In both modes, the system enables the detection of operational issues, such as collisions, and supports dose estimation throughout the treatment process. Consequently, it facilitates the selection and optimization of appropriate RT plans for effective treatment delivery.
To reconstruct realistic radiotherapy scenarios, a depth camera was used to capture point cloud data, while the structure and motion trajectories of the LINAC radiotherapy machine, along with the initial setup parameters, were obtained from manufacturer specifications and RT files. Spatial alignment between the treatment machine and the depth camera was achieved using the iterative closest point (ICP) algorithm combined with hand-eye calibration. Virtual markers from the camera were employed to assess and track discrepancies between planned and actual patient positions during treatment. The motion simulation and collision detection system was developed in our previous study [42], in which the separating axis theorem (SAT) algorithm was integrated with collision bodies defined by axis-aligned bounding boxes (AABBs). Optimization techniques, including bounding volume hierarchy (BVH) and collision pair analysis, were implemented to reduce the computational complexity of collision detection and improve simulation efficiency.
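For axis-aligned boxes, the separating axis theorem reduces to checking the three coordinate axes, which the sketch below illustrates. It is a simplified stand-in for the collision module, not its actual code: the function names are hypothetical, and touching boxes are treated as colliding, which is a conservative choice made for the example.

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]
AABB = Tuple[Vec3, Vec3]  # (min corner, max corner)


def aabb_from_points(points: Iterable[Vec3]) -> AABB:
    """Smallest axis-aligned box enclosing a point set (e.g. a depth-camera cloud)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def aabb_overlap(a: AABB, b: AABB) -> bool:
    """SAT for AABBs: the boxes are disjoint iff some coordinate axis separates them."""
    (a_min, a_max), (b_min, b_max) = a, b
    for axis in range(3):
        # A gap between the projected intervals means this axis separates the boxes.
        if a_max[axis] < b_min[axis] or b_max[axis] < a_min[axis]:
            return False
    return True


# Illustrative check between a gantry-head box and a patient box (made-up coordinates).
gantry = aabb_from_points([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)])
patient = ((0.5, 0.5, 0.5), (2.0, 2.0, 2.0))
collides = aabb_overlap(gantry, patient)
```

A bounding volume hierarchy applies the same test first to coarse parent boxes, so that most part pairs are rejected without examining their finer boxes.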
4. Discussion
This study presents a Unity 3D-based radiotherapy simulation platform integrating AI-driven organ segmentation, model-based dose estimation, and collision detection. Point Transformer achieved over 90% segmentation accuracy, while post-processing improved anatomical fidelity for dose estimation. Using a clinically oriented left lung tumor example, patient CT-derived anatomy was automatically segmented and reconstructed into refined 3D organ models. The resulting models were imported into the simulation platform. Candidate beam directions were then evaluated based on tumor depth, beam-path dose estimation, irradiated-volume effects in parallel and serial organs, and collision risk during treatment delivery. The platform therefore enhanced treatment visualization and supported collision risk assessment. It also provided relevant dose metrics and served as an interpretable pre-planning environment for early feasibility assessment, beam-angle screening, and safety evaluation before full dosimetric optimization. However, it was intended to complement rather than replace a commercial TPS.
The proposed Point Transformer model achieved markedly higher segmentation accuracy than the traditional PointNet++ baseline. In single-organ liver segmentation, it reached a Dice similarity coefficient (DSC) of 93.86 ± 1.50% versus 91.80 ± 6.47% for PointNet++ (
Table 1), reflecting its ability to capture both local and global geometric features through self-attention for precise and consistent contours. This result is consistent with broader point cloud benchmarks, where transformer-based networks outperform earlier point-based models by several percentage points in segmentation accuracy [
15]. As shown in
Figure 8, most Point Transformer errors occurred near organ boundaries, while PointNet++ misclassifications were more randomly distributed, indicating the transformer produced more coherent and anatomically faithful segmentations. In multi-organ segmentation (liver and spleen), the transformer still outperformed PointNet++ (91.86 ± 3.25% vs. 90.79 ±
5.78% Dice), though performance dropped slightly compared to the single-organ case, likely due to class overlap or smaller organ size. This process may introduce minor errors in the geometric representation of organs, which could subsequently affect the accuracy of dose estimation. This study presents a proof-of-concept framework for CT-based point-cloud segmentation, currently demonstrated for the liver and spleen. In addition, due to hardware limitations, the model was trained using only 32,000 points per case. Nevertheless, the proposed framework can be extended and adapted to newer and more efficient models in future work. Although the same approach may be applied to other OARs, further validation across a wider range of anatomical regions will be necessary to confirm its generalizability.
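For reference, the Dice scores quoted above compare predicted and ground-truth labels point by point. The following is a minimal sketch of a per-label Dice computation on flat label lists. The function name is hypothetical, and returning 1.0 when a label is absent from both lists is a convention chosen for the example.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for the points carrying the given label."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    total = sum(1 for p in pred if p == label) + sum(1 for t in truth if t == label)
    # Convention: a label absent from both prediction and truth scores 1.0.
    return 2.0 * intersection / total if total else 1.0


# Tiny example: one of two predicted liver points (label 1) is correct.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0], label=1)  # 2*1 / (2+1)
```

Reporting the mean and standard deviation of this score over the test cases gives figures in the same form as those in Table 1.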
In addition to segmentation, the absorbed dose analysis further demonstrates the clinical relevance of the proposed work. The system used ray-casting-based depth estimation with equivalent-depth PDD correction to generate dose distributions. For homogeneous organs such as the liver and brain, these distributions aligned well with those produced by a commercial TPS, with deviations below 6%. However, inhomogeneous tissues such as the lung showed a larger discrepancy (13.8%). Unlike convolution/superposition or Monte Carlo approaches, PDD does not explicitly model scatter, lateral electron transport, or complex heterogeneous media, and therefore its accuracy is reduced in highly inhomogeneous regions. Furthermore, geometric errors resulting from automated segmentation also contributed to the overall dose error, as previously characterized by Van Herk et al. [
46]. While analytical PDD-based correction provides a reasonable estimation for rapid dose assessment, further integration of advanced physics-based methods (e.g., Monte Carlo simulation or convolution-superposition algorithms) may be required to reduce uncertainty in aerated structures and improve the calculation accuracy, especially in heterogeneous tissues. Nevertheless, the proposed platform provides a flexible simulation framework that could be extended in future studies to incorporate more advanced physics-based dose calculation algorithms. However, due to high computational demands, convolution/superposition and Monte Carlo methods would more appropriately be implemented as native back-end solvers rather than as fully in-Unity C# routines, as in the current workflow.
Compared to commercial TPS such as RayStation and Varian Eclipse [
26], our platform emphasizes transparency, cost-efficiency, and customization. Commercial TPS offers advanced features, but relies on proprietary black-box algorithms tied to specific hardware and service contracts, limiting adaptability and driving high costs. In contrast, the proposed framework offers transparency and flexibility, enabling users to inspect, validate, and customize segmentation, dose estimation, and collision detection modules. This flexibility facilitates the integration of new AI models, alternative dose estimation methods, and institution-specific constraints without reliance on commercial licensing or hardware-specific platforms. The Unity-based virtual environment supports intuitive visualization and training, allowing systematic evaluation of beam–organ interactions, dosimetric effects, and collision risks. From a clinical workflow perspective, the proposed system is positioned as a supplementary clinical decision-support tool for beam-angle optimization, preliminary feasibility assessments, and safety evaluations. The proposed framework offers advantages in accessibility, customization, and workflow integration, with the potential to improve efficiency and cost-effectiveness. However, formal quantitative evaluation of these benefits compared with existing systems remains for future study. While not intended to replace commercial TPS for final dose prescription or definitive plan verification, it provides an extensible, low-cost platform for dosimetric analysis and educational applications in radiation oncology.