1. Introduction
W7-X is the world’s largest stellarator-type fusion device [1], designed to demonstrate the feasibility of stellarators as a future power plant concept. Due to its stellarator-type structure, W7-X features a three-dimensional helical architecture with five-fold modular symmetry [2,3,4]. The main objective of W7-X is to achieve steady-state plasma operation at fusion-relevant parameters, addressing key challenges such as plasma confinement, stability, and heat exhaust. The protection of plasma-facing components during long-pulse plasma operation is also one of the main concerns for ITER operation and lifetime, as discussed in the ITER Physics Basis [5,6].
One of the critical aspects of ensuring the safe and efficient operation of W7-X is the development of reliable overload risk detectors [7]. The exhaust of thermal energy from the confined plasma is predominantly managed by an island divertor system, which channels the heat toward ten specialized divertor units, distributed equally between the upper and lower sections of the device. Each unit integrates vertical and horizontal targets, specifically engineered to tolerate intense thermal loads, with design limits reaching up to 10 MW/m². In Figure 1, one of the ten divertor units is shown.
The need for developing effective overload risk detectors is underscored by the potential risks associated with thermal overloads, which can compromise the structural integrity of key elements, particularly those in direct contact with the hot plasma [8].
In this context, a key challenge in nuclear fusion research is finding a good trade-off between safety [9] and high-performance plasmas. In fact, achieving high performance often involves operating at conditions close to the stability limits of the plasma, which increases the risk of thermal overloads [10]. Therefore, it is crucial to develop systems that can predict and mitigate these risks in real time [11], ensuring both the safety of the reactor and the achievement of high-performance plasma conditions. Additionally, thermal overload detection may entail monitoring the performance of the present cooling systems, which are active on the ten water-cooled divertors of W7-X, to guarantee the efficient dissipation of heat from critical components and maintain a safe operating temperature range [12,13].
Automated systems using deep learning algorithms, such as Cascade R-CNN and SORT [14,15], are under development in nuclear fusion devices, e.g., at WEST, to detect, track, and classify thermal events. These systems are trained on datasets of thermal events and are capable of identifying the hottest zones, which are critical for preventing damage to plasma-facing components (PFCs) [16].
An infrared (IR) image analysis system is currently under deployment at W7-X for real-time protection of the divertors. The system aims to prevent thermal overloads that could damage PFCs, leading to machine downtime and repair costs. The thermal overload detection (TOD) system processes calibrated IR images of PFCs, acquired every 10 ms, to produce an alarm signal for the Fast Interlock System (FIS) by thresholding an overload risk indicator computed across the different IR cameras. The calibration, made to map digital levels to apparent temperatures, includes correcting for non-uniformity, bad pixels, and temperature drift. Pre-processing filters out bad pixels and averages surface temperatures to reduce fluctuations. The detection stage adjusts the temperature thresholds to anticipate overloads and triggers the FIS to stop the heating systems if an overload is detected. The real-time constraint to produce an overload alarm is 0.11 s, which encompasses acquisition, calibration, detection, and interlock delays [11].
This study focuses on developing a machine learning (ML)-based tool for monitoring the overload risk at W7-X, specifically using the Self-Organizing Map (SOM) [17], a nonlinear dimensionality reduction technique that preserves input space similarities while mapping data onto a lower-dimensional (typically 2D) representation [18]. Two-dimensional mappings have been widely applied in nuclear fusion for classification and prediction tasks [19,20,21,22], including disruption prediction [23,24] and electromagnetic studies [25]. In the context of thermal overload detection, SOMs offer intuitive visual insights that are easily interpretable by experimentalists, aiding in the development of effective thermal safety and control policies.
The primary aim of this paper is to demonstrate that the multidimensional operational space of W7-X, as explored through the analyzed experimental campaigns, can be effectively mapped onto a 2D Self-Organizing Map and used as the basis for a real-time overload risk detector. To support this, a criterion is introduced to correlate the map’s composition with different overload risk levels. As a foundational step toward a probabilistic approach, a multi-level risk scale is defined to estimate the probability of overload for each experimental sample.
Furthermore, this paper highlights the added value of a SOM-based detector, particularly in terms of output interpretability. The 2D map enables tracking the operational point in relation to the evolution of input features, providing insight into the underlying causes of the detector’s response and helping to interpret classification errors. This level of interpretability is rarely achievable with conventional machine learning algorithms.
This paper is organized as follows. In Section 2, both the characteristics of the input space and the criticality ratio used to define the target are described. In Section 3, the fundamentals of the SOM algorithm are reported. In Section 4, the performance analysis is reported, and the results are discussed. In Section 5, the main causes of the classification errors are analyzed. Finally, the conclusions are drawn in Section 6.
2. Methodology
2.1. Experimental Dataset
A database of 17 zero-dimensional (0D) plasma parameters was developed for 69 experiments taken from the OP1.2a campaign. Among all the experiments available from the OP1.2a campaign, only those containing all the selected plasma parameters were retained.
The signals were selected based on their impact on the presence of a thermal overload. W7-X utilizes a variety of diagnostic and control input signals to monitor and influence plasma behavior. A key aspect is the magnetic configuration, which is governed by the currents in different sets of coils: the planar coils, non-planar coils, trim coils, and control coils. These coil systems collectively shape the magnetic field geometry, enabling W7-X to explore optimized stellarator configurations and control edge conditions, such as the islands exploited by the island divertor. In particular, the upper and lower divertor control coil currents, together with the toroidal current measured using Rogowski coils, affect the distribution of heat loads on the divertors while impacting the overall magnetic confinement and stability of the plasma [26,27,28]. While W7-X is primarily a current-less stellarator, any induced or driven toroidal current is relevant for stability and confinement studies.

Heating power is provided through two main systems: Electron Cyclotron Resonance Heating (ECRH) and Neutral Beam Injection (NBI). These systems inject energy into the plasma to raise its temperature and maintain the required conditions for fusion-relevant plasma behavior. To assess the plasma’s energy content, diamagnetic energy measurements are used. These are derived from magnetic diagnostics and provide a direct estimate of the total thermal energy stored in the plasma. Finally, the total radiated power is measured using bolometers positioned to provide both horizontal and vertical views of the plasma. These measurements are critical for energy balance calculations and for understanding radiative losses due to impurities and other processes within the plasma. Together, these input signals form the foundation of W7-X’s operational diagnostics, enabling accurate control, monitoring, and analysis of plasma discharges.

A time window for each experiment was defined based on the toroidal current value and by removing potential outliers and experiments with missing values. The features were then evaluated in order to discard correlated signals and reduce the space dimensionality. In particular, the control coil currents were aggregated into one feature by averaging all the upper and lower divertor control coil currents. Likewise, the additional heating powers (ECRH and NBI) were aggregated into a single feature providing the total heating power. Regarding the bolometers, the total radiated power calculated from both the horizontal (H) and the vertical (V) bolometer cameras was used. The list of features used to train the SOM-based detector is reported in Table 1.
As can be seen, the SOM does not receive explicit input regarding IR properties. The temperature data from the IR cameras is used solely to define the target for the overload detector.
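The feature aggregation described in this subsection can be sketched as follows. This is a minimal illustration assuming the per-shot 0D signals are available as NumPy arrays; the array names, shapes, and column counts are illustrative, not the actual W7-X database layout.

```python
import numpy as np

def aggregate_features(coil_currents, p_ecrh, p_nbi, other_features):
    """Aggregate correlated signals to reduce the input-space dimensionality.

    coil_currents : (n_samples, n_coils) upper/lower divertor control coil currents
    p_ecrh, p_nbi : (n_samples,) heating powers
    other_features: (n_samples, k) remaining 0D signals (e.g., toroidal current,
                    diamagnetic energy, radiated power from H and V bolometers)
    """
    i_cc_mean = coil_currents.mean(axis=1, keepdims=True)  # one averaged coil feature
    p_tot = (p_ecrh + p_nbi).reshape(-1, 1)                # total heating power
    return np.hstack([i_cc_mean, p_tot, other_features])

# Synthetic example: 100 time samples, 10 coil currents, 5 other signals.
rng = np.random.default_rng(0)
X = aggregate_features(rng.random((100, 10)), rng.random(100), rng.random(100),
                       rng.random((100, 5)))
print(X.shape)  # (100, 7)
```

Averaging the coil currents and summing the heating powers keeps the physically meaningful information while removing strongly correlated columns from the input space.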
2.2. Criticality
In this paper, the overload detector’s target is defined as the maximum criticality across all the different IR cameras. The criticality $C$ is an overload risk indicator, calculated for each IR camera image, pixel, and time instance, as the ratio between the pixel temperature $T(x,y,t)$ and a dynamic threshold $T_{\mathrm{th}}(x,y,t)$:

$$C(x,y,t) = \frac{T(x,y,t)}{T_{\mathrm{th}}(x,y,t)}, \tag{1}$$

where $(x,y)$ is the pixel position and $t$ is the time instant [8].
Before calculating the criticality and the related threshold $T_{\mathrm{th}}$, several filters and morphological operations are applied to the temperature taken from the IR camera in order to reduce the effect of noise peaks. The temperature derivative is then estimated as the temperature increment from the previous frame divided by the frame time $\Delta t_{\mathrm{frame}}$. After this, the temperature threshold $T_{\mathrm{th}}$ is computed to avoid a temperature overshoot of the temperature limit within the reaction time of the system $t_{\mathrm{react}}$. Given the high noise in the temperature derivative calculation, a protective mechanism is introduced, and the threshold is set as the minimum between the computed threshold and an upper bound $T_{\mathrm{max}}(x,y)$ dependent on the material properties:

$$T_{\mathrm{th}}(x,y,t) = \min\!\left(T_{\mathrm{limit}}(x,y) - \frac{\partial T}{\partial t}(x,y,t)\, t_{\mathrm{react}},\; T_{\mathrm{max}}(x,y)\right), \tag{2}$$

where $T_{\mathrm{limit}}(x,y)$ is the limit temperature of the component in the related pixel (see Figure 3 in [8]).
Finally, a moving average filter is applied to the criticality calculated in (1) in order to filter out the fast transients and spikes.
A criticality greater than or equal to 1 means that, if there are no heat-flux changes in the short term, an overload will occur within or before the reaction time, i.e., without the possibility of avoiding it in due time. Thus, a binary classifier based on criticality thresholding is employed: when $C \geq 1$, an alarm is triggered, and the experiment is interrupted.
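The per-frame criticality calculation and binary alarm can be sketched as follows. This is a minimal example assuming the threshold is lowered by the temperature derivative times the reaction time and capped by a material-dependent upper bound, as described in the text; the symbol names (`T_limit`, `T_max`, `t_react`) and the illustrative temperatures are assumptions, and the filtering and morphological cleaning applied to the raw IR frames are omitted.

```python
import numpy as np

def criticality(T, T_prev, dt_frame, t_react, T_limit, T_max):
    """Per-pixel overload risk indicator C = T / T_th for one IR frame."""
    dT_dt = (T - T_prev) / dt_frame          # frame-to-frame temperature derivative
    T_th = T_limit - dT_dt * t_react         # anticipate overshoot within t_react
    T_th = np.minimum(T_th, T_max)           # material-dependent upper bound
    return T / T_th

# Illustrative 4x4 frame heating by 50 K over one 10 ms frame.
T_prev = np.full((4, 4), 600.0)
T = np.full((4, 4), 650.0)
C = criticality(T, T_prev, dt_frame=0.01, t_react=0.11,
                T_limit=np.full((4, 4), 1200.0), T_max=np.full((4, 4), 1100.0))
alarm = bool((C >= 1.0).any())               # binary classifier: C >= 1 triggers
print(C.max(), alarm)
```

With these numbers the threshold drops to 650 K (1200 − 5000 × 0.11), so the criticality reaches 1 and the alarm fires.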
2.3. Machine Learning Target
To prevent overload during high-performance operations, a tool is proposed to monitor approaching overload by detecting increasing risk levels. This allows the control system to plan different avoidance actions depending on the detected risk level.
The diagnostic signals from 69 shots in the database were downsampled to a common time base, matching the IR camera’s sampling time (10 ms). This resulted in a dataset of 36,256 samples, with 70% used for training the SOM and the remaining 30% for performance testing.
In this paper, a scale of five risk levels for thermal overload is defined as the target. Compared to a simple binary classification of risk (i.e., safe operation vs. overload), this approach allows each risk level to incorporate preventive actions that are more aligned with those of neighboring levels. The discretization of criticality into a finite number of risk levels serves as an initial step toward a probabilistic framework, in which each sample will eventually be assigned an estimated overload probability.
The five risk levels were defined by equally distributing the dataset samples among five ranges of criticality, as summarized in
Table 2. Since both higher and lower criticality samples are typically scarce in each experiment, they were aggregated into two larger intervals.
Since a balanced dataset makes training an ML model easier, as it helps prevent the model from becoming biased towards one risk level, the training samples were selected so as to cover the five ranges of criticality equally, as shown in Figure 2. Only 15 shots reached the highest risk level of thermal overload. Note that the highest level, LV5, mostly covered the high overload range (i.e., $C \geq 1$) defined by the binary classifier used at W7-X [8].
3. Self-Organizing Maps
The Self-Organizing Map (SOM) [17,29] is a nonlinear dimensionality reduction technique that generates a low-dimensional representation of data while preserving the topology of the input space. A SOM is a neural network with two layers, i.e., the input layer and the output layer, with the nodes of the input layer directly connected to those of the output layer. The output layer is arranged as a bounded grid of neurons, referred to as prototypes, each having coordinates in both the original high-dimensional space and the lower-dimensional grid (see Figure 3). The SOM is trained iteratively using competitive unsupervised learning, where each data point is assigned to its Best Matching Unit (BMU), i.e., the closest prototype in the input space. The BMU and its neighbors are updated to move towards the data point, following an adaptive learning rule governed by a neighborhood function that decreases over time. This process allows the prototypes to gradually organize themselves within the data distribution while preserving topological relationships rather than strict Euclidean distances. Once training is complete, new data points can be mapped to the closest prototype, effectively clustering the input space. A SOM requires careful selection of parameters, such as the grid size, learning rate, and neighborhood function, to ensure a meaningful organization of the data.
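The training loop described above can be illustrated with a minimal NumPy implementation: find the BMU for a sample, then pull the BMU and its grid neighbors towards the sample with a shrinking Gaussian neighborhood and learning rate. This is a didactic sketch with illustrative hyperparameters, not the optimized toolbox used in this work.

```python
import numpy as np

def train_som(X, rows=10, cols=10, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.random((rows * cols, X.shape[1]))             # prototype vectors
    gy, gx = np.divmod(np.arange(rows * cols), cols)
    grid = np.stack([gy, gx], axis=1).astype(float)       # 2D grid coordinates
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1 - frac)                             # decaying learning rate
        sigma = max(rows, cols) / 2 * (1 - frac) + 0.5    # decaying neighborhood width
        x = X[rng.integers(len(X))]                       # random training sample
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))       # Best Matching Unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)        # grid distance to the BMU
        h = np.exp(-d2 / (2 * sigma ** 2))                # Gaussian neighborhood function
        W += lr * h[:, None] * (x - W)                    # adaptive update rule
    return W, grid

# Synthetic data: 500 samples, 4 features, mapped onto a 10x10 grid.
X = np.random.default_rng(1).random((500, 4))
W, grid = train_som(X)
print(W.shape)  # (100, 4)
```

Because each update is a convex combination of a prototype and a data point, the prototypes settle inside the data distribution while the grid distance term preserves the topology.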
Beyond dimensionality reduction, SOMs are widely used for unsupervised learning tasks, such as clustering, visualization, and anomaly detection. Unlike other clustering methods, SOMs provide an intuitive and interpretable 2D (or 3D) grid that highlights the relationships between clusters, such as the one shown in Figure 3. In addition, SOM interpretability can be enhanced by the so-called component planes, i.e., color-coded representations of the distribution of each input feature on the grid [30]. Plotting the component planes for all input features allows for correlating each variable’s behavior with the resulting clustering, helping to better understand the characteristics of specific map regions and to discover the underlying dependencies [31]. Indeed, by comparing the component planes, the input correlations can be easily investigated by visual inspection: if two component planes look similar, the corresponding inputs are strongly correlated.
This study used a 2D SOM, which is a commonly used tool for clustering and classification tasks [32]. The architecture was optimized by using the toolbox presented in [33].
5. Pulse Tracking
The projection of a pulse on the SOM can be visualized on the 2D map, allowing us to display the temporal evolution of the operating point across the macro-clusters with different risk levels. A pulse trajectory can be obtained by tracking the projection of the test samples onto the map over time. Each projection involves identifying the Best Matching Unit (BMU), i.e., the cluster with the closest prototype vector. On the map, each test sample is represented as a dot positioned on the best-matching cluster. As an example, Figure 10 reports the trajectory of the eXperiment Program (XP) 20171205.026, which evolves in the bottom-right corner of the selected SOM (see Figure 6). The yellow and black dots represent the starting and ending points of the trajectory, respectively. The shot starts in an LV2 region (light orange), passes through the LV3 (orange) and LV4 (red) regions, and finally ends in an LV5 cluster (dark red).
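The pulse tracking described above amounts to projecting each time sample of a shot onto its BMU, yielding a sequence of grid coordinates over time. A minimal sketch with synthetic data follows; the map size, feature count, and variable names are illustrative assumptions.

```python
import numpy as np

def trajectory(W, grid, X_shot):
    """Project each sample of a shot onto its BMU and return the path of
    2D grid coordinates (one point per time sample)."""
    d2 = ((X_shot[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # (n, n_prototypes)
    bmus = np.argmin(d2, axis=1)                                  # BMU index per sample
    return grid[bmus]                                             # (n, 2) path on the map

# Synthetic trained map: 10x10 grid of prototypes in a 4-feature space.
rng = np.random.default_rng(2)
W = rng.random((100, 4))
grid = np.stack(np.divmod(np.arange(100), 10), axis=1).astype(float)
X_shot = rng.random((50, 4))          # one shot, 50 time samples
path = trajectory(W, grid, X_shot)
print(path.shape)  # (50, 2)
```

Plotting `path` on top of the colored risk-level clusters reproduces the kind of trajectory visualization shown in Figure 10.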
Figure 11 compares the criticality behavior with the SOM output for the experiment program 20171205.026. Subplot (a) reports the target criticality (in red) together with the target risk levels (in black). Subplot (b) reports the SOM output (in black), and subplot (c) reports the classification error (in red), evaluated as the difference between the target and the SOM output. The criticality monotonically increases from 0.770 to 1.026 in the first part of the pulse and then monotonically decreases to 0.985. The SOM clearly identified the three levels of risk defined in this criticality range; indeed, the error signal is zero almost everywhere. However, the SOM output is not strictly monotonic: some spikes appear, mainly at the transitions between two risk levels, due to jumps of the trajectory across the boundary delimiting two macro-clusters with adjacent risk levels. As an example, Figure 12a shows a zoomed-in view of the pulse trajectory reported in Figure 10; only the trajectory of five samples around the transition time of 14.35 s is shown. It can be observed that the operating point oscillates between five adjacent clusters, with three clusters corresponding to LV4 and the remaining two to LV5.
Gaps between non-adjacent risk levels are observable at the beginning and the end of the experiment. At the end, a jump between LV5 and LV2 and then a jump back to LV5 occurs because the trajectory evolves over time across a region where a mix of clusters with low risk levels is present, as shown in Figure 12b, where the last 360 ms of the trajectory shown in Figure 10 is reported. The root of the errors can be investigated by analyzing the component planes, which show the distribution of the input features on the map. This allows us to directly link the risk levels defined on the map to the behavior of the inputs. Thus, the risk level assigned to the test points by the trajectory evolution in the map can be interpreted. As an example, the region in which the last 360 ms of the XP 20171205.026 trajectory evolves is characterized by low values of both the total input power ($P_{\mathrm{INP}}$) and the diamagnetic energy ($W_{\mathrm{DIA}}$), as clearly highlighted by the magenta square on the component planes shown in Figure 13a and Figure 13b, respectively. This behavior can be representative of both a low overload risk condition and the starting and/or ending phases of the experiments. This explains why those map regions are characterized by LV2 and why the last part of the experimental trajectory is projected right there. Indeed, Figure 12b visualizes the part of the experiment following the switch-off of the additional heating systems. This example shows the potential of the SOM output in terms of result interpretability.
Figure 14 reports the trajectory of XP 20171205.014, which evolves in the middle-top region of the selected SOM.
Figure 15 compares the criticality behavior with the SOM output. Subplot (a) reports the target criticality (in red) compared to the target risk levels (in black). Subplot (b) reports the SOM output (in black), and subplot (c) reports the classification error (in red), evaluated as the difference between the target and the SOM output. For this experiment, the criticality always remains below 0.766, starting from 0.596 and returning to 0.696 at the end of the shot. In this case, risk levels 1 to 3 are experienced. The shot trajectory evolves in a small region, well delimited from the rest of the map by a white cluster border. As for XP 20171205.026, the classification errors occur just around the transitions between two risk levels.
Figure 16 reports the trajectory evolution around the transition occurring at 1.45 s, between LV2 and LV3. The fluctuations in the SOM output are due to a repeated jump of the trajectory between a light-orange cluster and an orange cluster.
Generally speaking, the spikes in the SOM output and the oscillations at the transition times do not correspond to fast changes in the input variables but are caused by the quantization of the target, which results in a fragmentation of the 2D map into several adjacent macro-clusters.
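One possible way to limit these oscillations, in the spirit of the assertion time discussed in the conclusions, is a persistence filter that accepts a new risk level only after it has held for a number of consecutive frames. This is a hypothetical sketch (the function name and the `n_hold` parameter are illustrative), not the mechanism adopted at W7-X.

```python
def debounce(levels, n_hold=3):
    """Report a new risk level only after it persists for n_hold consecutive
    frames; otherwise keep the previously reported level."""
    out = [levels[0]]
    run, count = levels[0], 1           # current raw value and its run length
    for lv in levels[1:]:
        if lv == run:
            count += 1
        else:
            run, count = lv, 1          # raw value changed: restart the run
        out.append(run if count >= n_hold else out[-1])
    return out

# A single-frame spike to LV3 is suppressed; a sustained LV3 is accepted
# after three frames (i.e., an assertion time of 3 x 10 ms).
raw = [2, 2, 3, 2, 2, 2, 3, 3, 3, 2]
print(debounce(raw, n_hold=3))  # [2, 2, 2, 2, 2, 2, 2, 2, 3, 3]
```

The filter trades a fixed detection delay (`n_hold` frames) for the removal of single-frame jumps between macro-clusters, which would have to be weighed against the 0.11 s real-time constraint.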
6. Conclusions
In this paper, a method for mapping the W7-X operational space onto a 2D grid by means of Self-Organizing Maps (SOMs) was proposed. The peculiarities of SOMs in preserving the topological structure of the operational input space were exploited while representing the data on a discrete and interpretable 2D grid. This enables easy visual exploration of high-dimensional data on a 2D map, allowing us to track the experiments and monitor the overload risk during W7-X operation. At this preliminary stage, the so-called criticality was quantized into five risk levels in order to test the ability of the SOM to cluster experimental conditions with similar thermal loads.
The SOM proved to be an effective tool for detecting the overload risk, with good performance on both the training and test sets. It correctly identified the overload risk level for 87.52% of the samples, with less than 1% of the samples left unclassified. The most frequent error on the test set, occurring in 10.46% of the cases, was assigning the sample a risk level adjacent to the target one. Only 1.91% of the samples were misclassified with a non-adjacent risk level. These results are promising for the application of SOMs in the real-time detection system at W7-X, where accurate and timely predictions are crucial to prevent damage to the PFCs.
The analysis of the results allowed us to highlight the prospects for future work. The identified common errors suggest areas for improving the potential of this tool. The next step will focus on limiting the effect of the level transitions occurring between two contiguous risk levels. For this purpose, macro-regions with an assigned overload risk will be defined to limit the output oscillation among contiguous clusters in the same map region. In addition, the risk level definition could be optimized based on the control actions to be undertaken, limiting inaccurate responses of the control system when oscillation among adjacent classes occurs. Meanwhile, an assertion time can be tuned to suppress output oscillations among non-adjacent risk levels, in accordance with the overload detection needs, reducing severe risk level misclassifications. Implementing these actions could significantly enhance the reliability of the overload detection system at W7-X, contributing to the safe and efficient operation of the stellarator.