1. Introduction
Radiation estimation is a critical aspect of controlling radiation dispersion in accelerator facilities. The Department of Energy (DOE) mandates that radiation exposure in such facilities complies with legal standards to protect personnel, the public, and the environment [1]. At the Thomas Jefferson National Accelerator Facility (JLab), administrative control policies are implemented to keep radiation exposure as low as reasonably achievable (ALARA), limiting annual doses for unmonitored personnel and the public to 10% of the federal limit, that is, to 0.1 mSv (10 mrem). Therefore, minimizing radiation exposure for personnel within the facility without compromising operational efficiency is essential. Traditional methods of controlling radiation exposure primarily focus on controlling radiation sources and improving protective measures between sources and personnel. To ensure compliance with radiation safety regulations, facilities typically rely on active and passive real-time monitoring using radiation sensors to assess both on-site and off-site radiation fields. However, this approach often requires significant resources for effective radiation exposure management, which makes it difficult to optimize safety and operational efficiency together.
Deep learning (DL) models have demonstrated strong performance in radiation analysis across various radiation sources [2,3,4,5]. In our previous work, we developed a multi-task learning (MTL) framework based on accelerator data characteristics, utilizing long short-term memory (LSTM) and convolutional neural network (CNN) architectures as backbones for effective radiation estimation [6]. Leveraging DL models for automated radiation dose estimation reduces operational costs and enables real-time monitoring, supporting regulatory compliance across accelerator configurations. Transformers [7] have been extensively studied and have been shown to outperform traditional neural networks in time-series analysis, offering enhanced capabilities for capturing both short-term and long-term dependencies. However, to our knowledge, no existing Transformer-based model is specifically designed for radiation estimation in high-energy accelerator environments, which leaves a gap in benchmark comparisons. This paper develops an MTL deep learning model, named MTL_TX, based on the Transformer architecture to enhance radiation estimation.
MTL_TX is designed to capture correlations among historical readings across all sensors and provide synchronized, real-time radiation estimates across multiple sensor locations. It is important to clarify that the proposed framework is not intended to model the underlying radiation physics or radiation transport mechanisms of accelerator facilities. Instead, this work addresses the problem of multi-sensor radiation estimation, where reliable real-time estimation must be achieved under noisy, incomplete, and heterogeneous sensing conditions. Building on existing innovations in Transformer architectures, we introduce novel components tailored to the characteristics of the collected data to further improve radiation estimation performance. The proposed model is comprehensively compared with our previously developed deep learning frameworks and several other competing methods. Experimental results demonstrate that MTL_TX achieves state-of-the-art performance. Our main contributions are summarized as follows:
Radiation monitoring at high-energy accelerator facilities is formulated as a multi-sensor estimation problem, and a unified Transformer-based multi-task framework is constructed to jointly estimate radiation values at multiple sensor locations.
Two novel components, hierarchical feature embedding (HFE) and multi-level decomposition attention (MDA), are specifically tailored for radiation estimation.
Extensive experiments demonstrate the superior performance of the proposed model, particularly in estimating radiation doses on unseen datasets.
The HFE component integrates global variate embeddings and local patch embeddings into a hierarchical representation to capture inter-sensor dependencies, and the MDA component decomposes input sequences into trend and seasonal components to model multi-level temporal patterns. In summary, MTL_TX achieved R² = 0.8584 and RMSE = 0.2353 on unseen data from the same year, and an average R² = 0.8831 with RMSE = 0.2263 across different years.
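To make the trend and seasonal split concrete, the following is a minimal sketch of a moving-average decomposition applied at several window sizes. It is a generic illustration and not the MDA component itself, whose details are given in the methodology section; the function names, the window sizes, and the edge-padding choice are all assumptions made for this example.

```python
import numpy as np


def decompose(series: np.ndarray, window: int = 25):
    """Split a 1-D series into a moving-average trend and the remaining seasonal part.

    Hypothetical helper: the window size and edge padding are illustrative choices.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centred")
    pad = window // 2
    # Repeat the edge values so the trend has the same length as the input.
    padded = np.pad(series, pad, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = series - trend
    return trend, seasonal


def multi_level_decompose(series: np.ndarray, windows=(5, 25)):
    """Repeat the split with several window sizes; one (trend, seasonal) pair per level."""
    return [decompose(series, w) for w in windows]


# Two weeks of synthetic hourly readings: slow drift plus a daily cycle.
hours = np.arange(24 * 14)
readings = 0.01 * hours + np.sin(2 * np.pi * hours / 24)
levels = multi_level_decompose(readings)
```

Each level returns two arrays of the same length as the input, and their sum reproduces the original series, so no information is discarded by the split. Short windows leave more of the daily cycle in the trend, while long windows push it into the seasonal part.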
The remainder of this paper is organized as follows: Section 2 reviews the related work. Section 3 describes the proposed methodology. Section 4 outlines the experimental setup. Section 5 analyzes the experimental results. Section 6 discusses implications and limitations. Finally, Section 7 concludes the study and outlines directions for future research.
6. Discussions
This study developed a Transformer-based MTL model to simultaneously estimate radiation values for sensors within the JLab facility. Two novel components, hierarchical feature embedding (HFE) and multi-level decomposition attention (MDA), were integrated into the proposed model, and it was compared against multiple competing models to evaluate its estimation performance. All of the competing models were tested on datasets collected by JLab from 2016 to 2019. MTL_TX demonstrated the best overall estimation performance. On the 2018 dataset, it achieved average scores of RAE = 32.4656%, RSE = 22.0956%, MAE = 0.1464, RMSE = 0.2353, and R² = 0.8584. In addition, it exhibited superior generalization performance on unseen data from other years, with average scores of RAE = 31.2067%, RSE = 21.2411%, MAE = 0.1407, RMSE = 0.2263, and R² = 0.8831.
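For readers who want to reproduce scores of this kind, the sketch below computes MAE, RMSE, the coefficient of determination, and RAE for a single sensor using common textbook definitions. These definitions are an assumption: the exact formulas and the per-sensor averaging used in this paper are specified in the experimental setup, not here. RSE is left out because its definition varies between sources (relative squared error versus its square root). The function name is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Common error metrics for one sensor (textbook definitions, assumed here)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred          # estimation error
    dev = y_true - y_true.mean()   # deviation from the mean reading
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        # Coefficient of determination: 1 is perfect, 0 matches a mean-only predictor.
        "R2": float(1.0 - np.sum(err ** 2) / np.sum(dev ** 2)),
        # Relative absolute error: total error relative to a mean-only predictor.
        "RAE": float(np.sum(np.abs(err)) / np.sum(np.abs(dev))),
    }


perfect = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
mean_only = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
```

A predictor that always outputs the mean reading scores R2 = 0 and RAE = 1 under these definitions, which gives a useful reference point when reading the percentages above.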
Compared to single-task learning (STL) models, MTL_TX is substantially better at extracting latent correlations and features from multivariate time-series data, as shown in Figure 10. The STL models are limited to feature extraction for a single radiation sensor and cannot be directly generalized to achieve high-accuracy radiation estimation for other sensors. Each radiation sensor requires a separate STL model, leading to an inefficient deployment strategy that consumes substantial computational resources and increases costs. The primary advantage of the MTL framework lies in its ability to process data from all sensors deployed around Hall A and simultaneously estimate radiation values for multiple sensors. As a result, the proposed model offers reduced computational redundancy, enhanced feature extraction, and improved estimation accuracy.
All three components (ElasticNet, HFE, and MDA) play a crucial role in the proposed model. ElasticNet introduces constraints to mitigate overfitting. HFE integrates variate embedding and univariate patch embedding to capture both global and local information from the input time-series data. The global embeddings provide a comprehensive view of the dataset, while the local embeddings capture localized temporal information within each individual variable channel. Both embeddings enhance the MDA component's ability to learn multivariate correlations effectively. Results from the ablation study clearly demonstrate the significant contributions of ElasticNet (improving R² from 0.7792 → 0.8086), HFE (R² 0.8068 → 0.8377), and MDA (R² 0.8377 → 0.8584).
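The difference between the global and local views can be illustrated with a small shape-level sketch. It embeds each variable's whole history as one token (global) and each fixed-length patch of that history as a separate token (local), then places the global token in front of the patch tokens. This is only an illustration under stated assumptions: the random linear projections, the non-overlapping patches, the concatenation step, and every name below are hypothetical, and the actual HFE design is the one described in the methodology section.

```python
import numpy as np

rng = np.random.default_rng(0)


def variate_tokens(x, w_var):
    """Global view: one token per variable, built from its entire history.

    x: (time, variables), w_var: (time, d_model) -> (variables, d_model)
    """
    return x.T @ w_var


def patch_tokens(x, patch_len, w_patch):
    """Local view: one token per non-overlapping patch of each variable.

    x: (time, variables), w_patch: (patch_len, d_model)
    -> (variables, n_patches, d_model)
    """
    time, n_vars = x.shape
    n_patches = time // patch_len
    trimmed = x[: n_patches * patch_len]  # drop a ragged tail, if any
    patches = trimmed.T.reshape(n_vars, n_patches, patch_len)
    return patches @ w_patch


def hierarchical_tokens(x, patch_len, w_var, w_patch):
    """Place each variable's global token in front of its local patch tokens."""
    global_tok = variate_tokens(x, w_var)[:, None, :]  # (variables, 1, d_model)
    local_tok = patch_tokens(x, patch_len, w_patch)
    return np.concatenate([global_tok, local_tok], axis=1)


# 48 hourly steps from 6 hypothetical channels, embedded into 16 dimensions.
time_steps, n_vars, d_model, patch_len = 48, 6, 16, 12
x = rng.normal(size=(time_steps, n_vars))
w_var = rng.normal(size=(time_steps, d_model))
w_patch = rng.normal(size=(patch_len, d_model))
tokens = hierarchical_tokens(x, patch_len, w_var, w_patch)
```

With these sizes each channel yields one global token plus four patch tokens, so `tokens` has shape (6, 5, 16). In a trained model the two projection matrices would be learned rather than random.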
The novelty of this work lies in integrating Transformer-based multi-task learning into radiation monitoring by formulating the problem as a multi-sensor estimation task rather than radiation transport physics modeling. The proposed MTL_TX framework targets robust radiation estimation under noisy, incomplete, and heterogeneous sensor conditions encountered in operational accelerator environments. By jointly modeling multiple radiation sensors within a unified framework, MTL_TX enables effective cross-sensor information sharing and consistent state estimation that cannot be achieved by independent single-task models.
From a deployment perspective, the final MTL_TX configuration contains approximately trainable parameters and performs radiation estimation through a single forward pass without iterative decoding. In our implementation, a single forward pass requires approximately 10.596 ms. Given the hourly sampling interval, the inference cost is orders of magnitude smaller than the data acquisition interval, making it negligible in practice. As a result, MTL_TX readily satisfies near-real-time requirements while offering reliable, scalable, and low-overhead deployment for radiation monitoring systems.
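The size of this gap is easy to check with a short calculation. The sketch below uses only the two figures reported above (a 10.596 ms forward pass and an hourly sampling interval); the helper name is hypothetical.

```python
import math

FORWARD_PASS_S = 10.596e-3  # reported time for a single forward pass, in seconds


def headroom(sampling_interval_s: float) -> float:
    """How many forward passes would fit into one sampling interval."""
    return sampling_interval_s / FORWARD_PASS_S


hourly = headroom(60 * 60)          # hourly samples, as used in this study
orders_of_magnitude = math.log10(hourly)
```

For hourly data the ratio is roughly 3.4 × 10^5, a little over five orders of magnitude, which supports the statement that inference cost is negligible relative to the acquisition interval.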
MTL_TX has a substantial number of parameters, and this significantly contributes to its superior performance. However, we observed that increasing the number of encoder and decoder layers only yielded a slight improvement in performance while dramatically increasing the number of parameters and the training time. To balance model efficiency and performance, MTL_TX in this study comprises two encoder layers and two decoder layers. With advancements in hardware technology and increasing computational power of graphics processing units (GPUs), training time is expected to decrease significantly in the future [42]. Future work may involve adjusting the model architecture to fully exploit the potential of evolving hardware capabilities.
We conducted extensive experiments on both model architecture and input design to investigate the impact of structural choices and information sources on radiation estimation performance. Specifically, we explored encoder-only configurations, encoder–decoder variants with restricted information flow, and multiple decoder input conditioning strategies. Across all evaluations, the full encoder-decoder architecture consistently achieved the best performance. The results indicate that effective radiation estimation requires both explicit cross-sensor state representation in the encoder and conditioned temporal generation in the decoder. Cross-sensor radiation information plays a critical role in capturing shared operational states and inter-sensor dependencies, which cannot be reliably inferred from beam-related variables alone. These findings highlight the importance of jointly modeling beam conditions and cross-sensor radiation histories within a unified framework for robust multi-sensor radiation estimation.
This study has several limitations. First, the performance of MTL_TX depends on the quality and availability of archived multi-sensor data from JLab. The historical data are sparse and noisy and contain peaks. We used simple methods to remove these peaks as anomalies; more advanced preprocessing techniques will be explored in future work. Second, this study used hourly-sampled data for model training and testing. However, JLab also collects data with finer temporal resolution, such as 10 min or 1 min intervals, which capture more detailed radiation fluctuations. Future work will explore multiple temporal resolutions to evaluate the adaptability of MTL_TX to finer-grained time scales. Third, the proposed model was validated using sensor data from JLab's accelerator facility only. We will apply it to datasets from other DOE accelerator facilities to further evaluate its generalization capability.