AI Evaluation of Imaging Factors in the Evolution of Stage-Treated Metastases Using Gamma Knife

Background: The study investigated whether three deep-learning models, namely, the CNN_model (trained from scratch), the TL_model (transfer learning), and the FT_model (fine-tuning), could predict the early response of brain metastases (BM) to radiosurgery using minimal pre-processing of the MRI images. The dataset consisted of 19 BM patients who underwent stereotactic radiosurgery (SRS) within 3 months. The images used included axial fluid-attenuated inversion recovery (FLAIR) sequences and high-resolution contrast-enhanced T1-weighted (CE T1w) sequences from the tumor center. The patients were classified as responders (complete or partial response) or non-responders (stable or progressive disease). Methods: A total of 2320 images from the regression class and 874 from the progression class were randomly assigned to training, testing, and validation groups. The DL models were trained using the training-group images and labels, and the validation dataset was used to select the best model for classifying the evaluation images as showing regression or progression. Results: Among the 19 patients, 15 were classified as “responders” and 4 as “non-responders”. The CNN_model achieved good performance for both classes, showing high precision, recall, and F1-scores. The overall accuracy was 0.98, with an AUC of 0.989. The TL_model achieved high recall for the “progression” class but with lower precision, while the “regression” class exhibited high precision but lower recall. The overall accuracy of the TL_model was 0.92, and the AUC was 0.936. The FT_model showed high recall for “progression” but low precision, and for the “regression” class, it exhibited high precision but lower recall. The overall accuracy of the FT_model was 0.83, with an AUC of 0.885. Conclusions: Among the three models analyzed, the CNN_model, trained from scratch, provided the most accurate predictions of SRS responses for unlearned BM images. 
This suggests that CNN models could potentially predict SRS prognoses from small datasets. However, further analysis is needed, especially in cases where class imbalances exist.


Introduction
Gamma Knife stereotactic radiosurgery (GKSRS) is a non-invasive technique used to treat brain tumors, vascular malformations, and other neurological conditions. Its history dates back to the 1940s, when neurosurgeon Lars Leksell developed the concept of radiosurgery. In the 1950s, Leksell and Börje Larsson created the first Gamma Knife prototype at the Karolinska Institute in Sweden. The machine uses 192 to 201 cobalt-60 sources, depending on the model, to deliver precise brain radiation [1].
The first clinical use of Gamma Knife treatment was in 1968, and it began gaining popularity in the 1970s and 1980s [2]. Initially, it treated hard-to-reach brain tumors and vascular malformations. Now, it also addresses neurological conditions like trigeminal neuralgia, epilepsy, and Parkinson's disease [3].
Over time, the Gamma Knife has seen technological improvements, including computerized treatment planning and image guidance. Today, it is used in over 700 medical centers worldwide and has been used to treat over one million patients [1][2][3][4].
Machine learning (ML) is a subset of AI that enables computer systems to learn from experience without explicit programming. ML uses algorithms trained on data to recognize patterns and make decisions as humans do [5,6].
The three main ML algorithm families are:
1. Supervised learning: maps input data to known output data for predictions, e.g., linear regression, decision trees, and neural networks.
2. Unsupervised learning: identifies patterns and relationships in data without supervision, e.g., clustering and principal component analysis.
3. Reinforcement learning: learns through trial and error, optimizing for cumulative rewards, e.g., Q-learning and deep reinforcement learning.
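As an illustration of the supervised paradigm described above, the following minimal sketch (not taken from the study) fits a linear model from labeled input-output pairs using NumPy's least-squares solver:

```python
import numpy as np

# Toy supervised-learning example: labeled pairs map inputs to known
# outputs, and the algorithm learns the mapping y = 2x + 1 from them.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0

# Design matrix with a bias column; solve the least-squares problem.
A = np.stack([X, np.ones_like(X)], axis=1)
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(slope, intercept)  # the learned parameters
```

The learned slope and intercept recover the generating rule, which is exactly the "map inputs to known outputs" behavior the list item describes.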
ML finds applications in natural language processing, image recognition, predictive maintenance, fraud detection, and recommendation systems [5].
Deep learning, a subfield of ML, employs multi-layer neural networks for complex tasks like image and speech recognition. It excels in computer vision, speech recognition, and natural language processing [7,8].
There are various types of deep learning algorithms for different problems [7,8]:
1. Convolutional neural networks (CNNs): used for image and video recognition and for analyzing local patterns.
2. Recurrent neural networks (RNNs): for sequential data, such as speech recognition and time series analysis.
3. Generative adversarial networks (GANs): generate data similar to a given dataset, such as realistic images.
4. Many other algorithm variations exist to solve diverse problems [7,8].
Transfer learning (TL) is a technique in which a pre-trained model kickstarts a new task. The model first learns general features from a large dataset, then trains on a smaller, task-specific sample. TL is beneficial with limited labeled data, leveraging the pre-trained model's knowledge [9,10]. It is used in computer vision, natural language processing, and speech recognition [11,12].
Some newer methods, such as the RCNN (region-based convolutional neural network) [13] and attention models [14], are two different techniques commonly used in computer vision tasks. While they might not be the most typical choices for addressing the problem of early response prediction of brain metastases, they could potentially be applied in creative ways to enhance the performance of predictive models for this specific medical imaging task.
RCNN is a family of methods used for object detection and localization in images. It involves dividing an image into multiple regions (proposals) and processing each region separately to detect and classify objects within them. In the context of brain metastases prediction, RCNN-like approaches could be adapted to identify and classify different regions of interest (ROIs) within medical images that indicate the presence of metastatic growth. These regions could correspond to areas with abnormal features indicative of metastases. By processing these ROIs separately, the model could potentially learn to detect early signs of metastases that might be overlooked by more traditional image analysis methods.
Attention mechanisms have gained popularity in various deep learning applications, including computer vision and natural language processing. They help models focus on the most relevant parts of the input data when making predictions. In the context of brain metastases prediction, attention mechanisms could be used to guide the model's focus to specific regions within the medical images that are more likely to contain early signs of metastases. This could be particularly helpful in identifying subtle patterns or anomalies that might not be immediately apparent to human observers or traditional image analysis methods.
It is important to note that neither RCNNs nor attention models are directly designed for the early response prediction of brain metastases. The typical approach for medical image analysis involves techniques such as convolutional neural networks (CNNs) and other deep learning architectures specifically tailored for image classification and segmentation tasks. However, the application of RCNN-like techniques and attention mechanisms in this context could be explored as part of a more advanced and innovative approach to improve the sensitivity and accuracy of early response prediction for brain metastases.
The success of these techniques would depend on factors such as the availability of labeled data, the complexity of the metastases detection task, and the computational resources available for model training and evaluation. It is recommended to work closely with domain experts in radiology and medical imaging when designing and evaluating such models to ensure their clinical relevance and efficacy.
ML also supports several stages of the SRS workflow:
1. Treatment planning: ML helps identify brain structures in medical images (MRI) for precise target delineation.
2. Dose optimization: ML optimizes radiation doses during planning, balancing efficacy and tissue protection.
3. Prediction of outcomes: ML predicts SRS treatment outcomes based on patient characteristics.
4. Quality assurance: ML automates error detection in treatment delivery, enhancing safety and efficacy.
5. Treatment evaluation: ML assesses treatment effectiveness through patient data, refining protocols.
In summary, machine learning and deep learning improve Gamma Knife stereotactic radiosurgery, leading to a more efficient healthcare system [18,19].
In this work, we utilize Google Colaboratory (Google Colab), a cloud-based platform for Python code development via Jupyter notebooks. It offers a free environment for researchers, data scientists, and ML practitioners to analyze data, perform machine learning tasks, etc. [20]. Google Colab boasts features such as access to free GPUs and TPUs for model training, integration with Google Drive for storage and notebook sharing, and real-time code cell execution with instant feedback. Popular Python libraries like TensorFlow, Keras, and PyTorch are supported [18,19].
Data augmentation is employed in ML and computer vision to expand training datasets with varied samples, addressing limited data and overfitting [21][22][23][24]. Common techniques include flipping, rotating, scaling, cropping, adding noise, and adjusting brightness/contrast. These transformations enhance model robustness and accuracy, with care taken to maintain data representativeness [24].
Deep learning employs early stopping and callback lists to improve performance and prevent overfitting. Early stopping halts training when validation performance degrades after a set number of epochs, guarding against overfitting. Callback lists execute functions during training, enabling customization and the implementation of techniques like model checkpointing and learning rate scheduling [25][26][27][28][29]. Keras supports both early stopping and callback lists, enhancing model training and performance.
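A Keras callback list of the kind described above might be sketched as follows; the monitored metric, patience values, and checkpoint filename are illustrative assumptions, not details taken from the study:

```python
import tensorflow as tf

# A minimal callback list for a Keras model compiled elsewhere.
# EarlyStopping halts training when the validation loss stops improving
# and restores the best weights; ModelCheckpoint keeps the best model on
# disk (the filename is hypothetical); ReduceLROnPlateau lowers the
# learning rate when progress stalls.
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.keras",
                                       monitor="val_loss",
                                       save_best_only=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                         patience=2),
]
# The list would then be passed to model.fit(..., callbacks=callbacks).
```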
In the following sections, we present an AI evaluation of prognostic factors in the evolution of stage-treated metastases based on medical imaging with the Gamma Knife treatment machine from our department, as depicted in the diagram in Figure 1.
Our present goals are as follows:
1. Gather and preprocess data: collect MRI scans from patients with stage-treated metastases who underwent Gamma Knife treatment. Preprocess the data to ensure its analysis readiness, addressing data imbalance using augmentation techniques like SMOTE.
2. Identify prognostic factors: use domain expertise and existing research to identify potential factors influencing metastases evolution, including tumor size, location, shape, patient demographics, and clinical history.
3. Develop the AI model: select a suitable deep learning algorithm and train it on preprocessed data to predict metastases evolution likelihood, considering the identified prognostic factors and Gamma Knife treatment specifics. Three methods are employed: a CNN model from scratch, a CNN model with transfer learning, and a CNN model with fine-tuning.
4. Evaluate model performance: test the AI model on separate data to assess its predictive capabilities using metrics like accuracy, sensitivity, specificity, the confusion matrix, and receiver operating characteristics.
5. Interpret results: analyze AI model outputs to identify the most important prognostic factors affecting metastases evolution in Gamma Knife-treated patients, using visualizations and statistical analysis to explore relationships between factors.
6. Validate findings: verify AI model results with additional data and compare predictions against outcomes of real Gamma Knife-treated patients.
7. Communicate results: present the AI evaluation findings clearly, highlighting crucial prognostic factors and their implications for metastases treatment with the Gamma Knife.
Exploring further research avenues will enhance the reliability, generalizability, and practicality of the proposed approach.

Ethics
All experiments were carried out in accordance with relevant guidelines and regulations. The study used only pre-existing medical data; therefore, patient consent was not required, and since the study was retrospective, approval from the Ethics Committee of the Clinical Emergency Hospital "Prof. Dr. Nicolae Oblu" Iasi was not needed.

Patients
From July 2022 to February 2023, in the Stereotactic Radiosurgery Laboratory of the Prof. Dr. N. Oblu Emergency Clinical Hospital, Iasi, 19 patients with single metastases were stage-treated according to a treatment scheme of 30 Gy administered in 3 sessions (S1, S2, S3) of 10 Gy at 2-week intervals. Among the 19 patients, 5 were female and 14 were male, aged between 43 and 80 years. All treated patients had a Karnofsky score of at least 70, and the initial tumor volumes before the first treatment session varied from 2 to 81 cm³, with an average of 16 cm³. The primary site of the 19 treated metastases was bronchopulmonary neoplasm in 14 cases, breast neoplasm in 3 cases, and laryngeal neoplasm and prostate neoplasm in one case each.
After the treatment, only one patient showed a clear regression of the lesion under the three-session treatment scheme, while another three showed lesions fluctuating between progression and regression.

MRI Data Acquisition
All MRI examinations were performed on a 1.5 Tesla whole-body scanner (GE SIGNA Explorer) equipped with a standard 16-channel head coil. The MRI study protocol consisted of:
1. The conventional anatomical MRI (cMRI) protocol for clinical routine diagnosis of brain tumors, which included, among others, an axial fluid-attenuated inversion recovery (FLAIR) sequence, as well as a high-resolution contrast-enhanced T1-weighted (CE T1w) sequence.
2. The advanced MRI (advMRI) protocol for clinical routine diagnosis of brain tumors, which was extended by an axial diffusion-weighted imaging (DWI; b values of 0 and 1000 s/mm²) sequence and a gradient echo dynamic susceptibility contrast (GE-DSC) perfusion MRI sequence, performed using 60 dynamic measurements during the administration of 0.1 mmol/kg bodyweight gadoterate meglumine.

Workflow

Basic Imports
In Python, we imported fundamental libraries for scientific computing, data manipulation (NumPy and Pandas), machine learning (TensorFlow), and data visualization (Seaborn and Matplotlib).
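A typical first notebook cell covering these libraries might look as follows; this is a generic sketch, as the study's exact import cell is not reproduced in the text:

```python
# Scientific computing and data handling
import numpy as np
import pandas as pd

# Deep learning
import tensorflow as tf

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

print(np.__version__, pd.__version__, tf.__version__)
```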

Image Data Processing
The BrainMet image dataset, consisting of three subfolders, TRAIN, TEST, and VAL, is stored in the cloud on Google Drive. For training, the train_path is used; for testing, the test_path; and for validation, the valid_path. The train subfolder contains 2865 MRI brain metastasis images, with 2083 images of regression class '1' and 782 images of progression class '0'. The test subfolder contains 315 images, with 230 images of regression class '1' and 85 images of progression class '0'. The val subfolder contains 14 images, with 7 images of regression class '1' and 7 images of progression class '0'.
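The three paths could be wired up along the following lines; the Drive mount point shown is hypothetical, as the actual location depends on the user's Drive layout:

```python
from pathlib import Path

# Hypothetical Google Drive mount point for the BrainMet dataset.
base = Path("/content/drive/MyDrive/BrainMet")
train_path = base / "TRAIN"
test_path = base / "TEST"
valid_path = base / "VAL"

# Each split folder holds one subfolder per class, the layout expected
# by Keras directory iterators:
#   TRAIN/0/  TRAIN/1/  TEST/0/  TEST/1/  VAL/0/  VAL/1/
for split in (train_path, test_path, valid_path):
    classes = [c.name for c in split.glob("*")]
    print(split, "->", classes or "not mounted here")
```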

Techniques to Overcome Insufficient Data from our Image Database
As discussed in Section 1, Introduction, data augmentation is a technique to artificially increase the size of a dataset by applying image augmentation methods to the existing training data. Its use is crucial when dealing with small imaging databases, as it improves the model's training ability by subjecting the data to various image processing techniques. This increases the model's accuracy and enhances its capability to predict cases. In the medical image recognition field, data augmentation plays a vital role by applying small transformations to existing data. This approach is especially important to address privacy regulations that may limit the sharing of medical data.
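A few of the transformations mentioned above can be sketched in NumPy on a stand-in grayscale slice; this is illustrative only and does not reproduce the study's actual augmentation pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # stand-in for one grayscale MRI slice

# Common augmentation transforms from the text, in NumPy form:
flipped = np.fliplr(image)                # horizontal flip
rotated = np.rot90(image)                 # 90-degree rotation
cropped = image[8:56, 8:56]               # central crop
noisy = np.clip(image + rng.normal(0, 0.05, image.shape), 0, 1)  # noise
brighter = np.clip(image * 1.2, 0, 1)     # brightness adjustment

# Each transform yields a new, plausible training sample derived from
# the same underlying scan, enlarging the effective dataset.
```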
Transfer learning is an alternative approach that utilizes pre-trained state-of-the-art CNN models, like those trained on the ImageNet dataset. It is known to achieve higher performance compared to training CNNs from scratch (full training) [30,31].
Table 1 shows the split of the BrainMet image dataset into the TRAIN, TEST, and VAL sets, and the number of images belonging to the two classes: PROGRESSION (class 0) and REGRESSION (class 1).
Our image dataset is highly unbalanced, with 2320 images from the regression class '1' and only 874 from the progression class '0'. To address this issue, we use class weighting (see Section 1, Introduction, for details).
Class weighting is a technique in deep learning that adjusts the contribution of different classes to the loss function during training. In classification tasks with imbalanced classes, the model may become biased towards the majority class, leading to poor performance on the minority class. To overcome this, class weights are assigned to give more importance to the minority class during training. In our study, the regression class receives a lower weight (0.69) than the progression class (1.83), as illustrated in the following output: {0: 1.83, 1: 0.69}. Class weights are typically used in the loss function calculation, which measures the difference between predicted and actual values and guides the model's parameter updates during training. By giving higher weights to the minority class, the loss function focuses more on correctly classifying these samples. Class weights can be manually specified or automatically computed based on the class frequencies in the training data.
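The weights quoted above can be reproduced from the overall class counts with the common "balanced" heuristic, total / (n_classes * class_count); a minimal sketch:

```python
# Reproducing the class weights from the class counts using the
# standard "balanced" heuristic: total / (n_classes * class_count).
counts = {0: 874, 1: 2320}  # progression '0', regression '1'
total = sum(counts.values())
class_weight = {c: round(total / (len(counts) * n), 2)
                for c, n in counts.items()}
print(class_weight)  # {0: 1.83, 1: 0.69}
```

In Keras, such a dictionary would be passed to model.fit via the class_weight argument, so that minority-class samples contribute more to the loss.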

Model-1: Convolutional Neural Network Model from Scratch (CNN_Model)
A convolutional neural network (CNN) is a variant of the multi-layer perceptron (MLP) designed for 2-D imaging tasks. It comprises three types of layers: convolutional, subsampling, and output. The convolutional layer passes results to the next layer through convolution operations, activation functions, and feature maps. Subsampling layers follow convolutional layers, reducing feature map size while preserving information between features.
We will briefly discuss the use of the CNN approach in addressing the problem of early response prediction of brain metastases.
Feature learning from images: CNNs are particularly effective in image analysis tasks due to their ability to automatically learn hierarchical features from raw pixel values. In the context of brain metastases prediction, CNNs can learn to identify patterns, textures, and shapes that are indicative of early metastatic growth. This feature learning process reduces the need for manual feature engineering, which can be complex and time-consuming.
End-to-end learning: CNNs enable end-to-end learning, meaning that the model learns a mapping directly from input images in order to make predictions. This is advantageous because it optimizes the entire pipeline simultaneously, without requiring intermediate steps. For the early response prediction of brain metastases, this end-to-end approach allows the model to learn complex relationships between image features and the likelihood of progression or stagnation.
Transfer learning: transfer learning involves using models pre-trained on large datasets and fine-tuning them for specific tasks. The use of a modified ResNet152V2 architecture, previously trained on colored images, illustrates this advantage. Leveraging a pre-trained model speeds up the convergence process and allows the model to capture general image features that could be relevant to brain metastases prediction.
Availability of labeled data: CNNs can perform well with a moderate amount of labeled data. In medical imaging, acquiring large datasets with accurate labels can be challenging due to the need for expert annotations. CNNs can still yield meaningful results, even with relatively smaller datasets, making them suitable for medical applications in which data availability might be limited.
Robustness to variability: CNNs are designed to handle various levels of variability in images, including changes in lighting, orientation, and scale. In the case of brain metastases, images can exhibit variations in terms of image quality, patient positioning, etc. The hierarchical features learned by CNNs allow them to capture relevant information despite these variations.
Interpretability and visualization: CNNs can provide insights into their decision-making process through techniques like feature visualization and heatmaps. This interpretability can be crucial in medical applications in which understanding why a model makes a particular prediction is important for gaining trust and clinical acceptance.
Accordingly, the main model in this project is a CNN network designed from scratch. The network uses convolution, max pooling, and dense layers, and it is trained on the input images.
The process starts with a lower filter count and gradually increases it layer-wise. The kernel/filter size is 3×3, and ReLU is used as the activation function. The input shape represents the dimensions of the MRI image (height, width, and color channels). Although MRI images are in grayscale, we assume a color channel of 3 because the network used (ResNet152V2) was previously trained on colored images; one could also choose 1 as the color channel for grayscale images.
After each convolution layer, a 2×2 max pooling layer is added to decrease data size and processing time. The network consists of three blocks: Block-1, with one 16-filter Conv2D layer and one MaxPooling2D layer; Block-2, with two 32-filter Conv2D layers and a MaxPooling2D layer after each convolution layer; and Block-3, with two 64-filter Conv2D layers and a MaxPooling2D layer after each convolution layer.
The final layers are a flatten layer, followed by dense layers for classification/prediction. The sigmoid function, with one output unit, is used in the last layer, since the problem is a binary classification (stagnation or progression).
To compile the model, three main parameters are required: (1) the learning rate (optimizer), (2) the loss function (binary_crossentropy), and (3) the metrics (accuracy) to evaluate the loss and accuracy of the training and validation sets. The Adam optimizer is mainly used in this case, as it provides adaptive learning rates for different parameters.
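A Keras sketch of this from-scratch architecture follows; the 224×224 input size and the 128-unit dense layer are assumptions, since the text only fixes the filter counts, the 3×3 kernels, the 2×2 pooling, and the sigmoid output:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of the from-scratch CNN described in the text.
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),  # assumed input size, 3 channels
    # Block-1: one 16-filter convolution + max pooling
    layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    # Block-2: two 32-filter convolutions, each followed by max pooling
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    # Block-3: two 64-filter convolutions, each followed by max pooling
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    # Classification head: flatten, dense, single sigmoid output unit
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```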
The model architecture we use is a convolutional neural network (CNN) based on the ResNet152V2 architecture, which we adapt for the task of early response prediction of brain metastases using MRI images. Our model selection, loss function, training strategy, and other relevant aspects are broken down as follows:
Model architecture: we are using a modified ResNet152V2 architecture, a deep CNN architecture known for its strong performance on various computer vision tasks. By customizing the architecture to our specific problem, we leverage the hierarchical feature extraction capabilities of the CNN layers. The gradual increase in filter count and the introduction of max pooling layers help to capture increasingly complex features from the MRI images.
Loss function: binary cross-entropy loss (binary_crossentropy) is a suitable choice for binary classification tasks like ours (regression or progression). It quantifies the difference between predicted probabilities and true labels, encouraging the model to produce higher probabilities for the correct class and lower probabilities for the incorrect class. This loss function is widely used in binary classification scenarios.
Training strategy: our training strategy involves using the Adam optimizer with a specified learning rate. Adam is an adaptive optimization algorithm that adjusts the learning rate for each parameter based on the historical gradients. This can help the model converge faster and find a good set of weights. Additionally, we use accuracy as a metric to evaluate the model's performance on both the training and validation sets. It is important to monitor not only the training accuracy but also the validation accuracy to detect overfitting, as we did.
Model layers and activation function: the use of ReLU activation functions after each convolutional layer is a common practice. ReLU helps introduce non-linearity into the model and can improve the network's ability to capture complex relationships in the data. Max pooling layers after each convolutional layer reduce the spatial dimensions of the feature maps, helping to decrease the computational load and retain essential features.
Flatten and dense layers: the final layers include a flatten layer, followed by dense layers for classification. This is a typical setup in which the spatial information is flattened and then passed through fully connected layers for classification. Using a sigmoid activation function in the last layer is appropriate for binary classification, as it produces an output in the range [0, 1], representing the probability of the positive class.
Overall, our model architecture, loss function, training strategy, and other design choices are reasonable for the task of early response prediction of brain metastases using MRI images.
The word "parameters" refers to the count of weights learned during the training process. These parameters play a crucial role in the model's predictive power, as they are updated layer-wise using the backpropagation method, driven by the optimization technique, which is Adam in this case. As seen in Table 2, there are three columns: (1) Layer (type), (2) Output Shape, and (3) Param #, which represents the parameters. For each layer, the parameters are calculated and listed. The input layer, which simply assigns the input image shape, is not listed in the table and has no learnable parameters.
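These per-layer counts follow simple closed forms, sketched below; the example values are generic illustrations of the formulas, not figures read from Table 2:

```python
# Closed-form parameter counts for the layer types used in the model.
# Conv2D: (kernel_h * kernel_w * in_channels + 1) * filters
# Dense:  (inputs + 1) * units   (the +1 in both is the bias term)

def conv2d_params(kh, kw, in_ch, filters):
    return (kh * kw * in_ch + 1) * filters

def dense_params(inputs, units):
    return (inputs + 1) * units

# First convolution of Block-1: 3x3 kernel, 3 input channels, 16 filters.
print(conv2d_params(3, 3, 3, 16))  # 448
# Pooling and flatten layers have no learnable parameters at all.
```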

Fitting and Evaluating the Model (CNN_Model)
In Figure 2, the training and validation loss and accuracy curves, along with the learning rate (see the legend), are displayed for the CNN_model trained from scratch. The testing accuracy of the CNN_model is 98.41%. In Figure 3, the confusion matrix for the CNN_model from scratch can be seen. It shows that, of the 85 progression cases, the model predicted all 85 correctly. Of the 230 regression cases, the model predicted 225 correctly, making only 5 mistakes, in which it predicted progression for cases actually belonging to the regression class (false negatives). The classification report for the CNN_model from scratch is depicted in Table 3. In Figure 4, images showing actual cases versus the predicted cases, with the probability of prediction, for the CNN_model from scratch on unseen images, are presented.

Model-2: Transfer Learning Model (TL_Model)

In our study, we employed transfer learning (TL) using a pretrained model (see Table 4), specifically ResNet152V2, which was pretrained on the ImageNet dataset.
Residual networks (ResNets) are deep networks that avoid vanishing-gradient issues through "skip connections". The ResNet family includes models with varying numbers of layers, such as ResNet50, ResNet50V2, ResNet101, ResNet101V2, ResNet152, and ResNet152V2.
In ResNet, convolution layers and other standard methods are used, but the key is the skip connection, which adds the original input to the output of the convolution block. This skips some layers, preventing the gradient from vanishing.
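The skip connection can be reduced to a single line; a toy sketch of the idea:

```python
import numpy as np

# The essence of a ResNet skip connection: the block output is
# F(x) + x, so even if the learned residual F collapses toward zero,
# the identity path keeps the signal (and its gradient) flowing.
def residual_block(x, f):
    return f(x) + x

x = np.array([1.0, -2.0, 3.0])
out = residual_block(x, lambda v: 0.1 * v)  # a small learned residual
print(out)
```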
Weights download for the ResNet model: 234545216/234545216 [==============================]-2 s 0 us/step As the featured learning layers are frozen, the parameters of these layers are also predetermined.Finally, the classified layer parameters are obtained by the retrained layer.All the parameter rules and formulas are similar to those mentioned in Model-1.The accuracy of the TL_model The testing accuracy is: 91.74603223800659%.The prediction of the TL_model timestamp 10/10 [==============================]-5 s 219 ms/step The confusion matrix of the TL_model: In Figure 6, the confusion matrix for the TL_model from scratch can be seen.The accuracy of the TL_model The testing accuracy is : 91.74603223800659%.The prediction of the TL_model timestamp 10/10 [==============================]-5 s 219 ms/step The confusion matrix of the TL_model: In Figure 6, the confusion matrix for the TL_model from scratch can be s From the confusion matrix for the TL_model, we can see that from the 85 cases, the model predicted 83 cases correctly, making only 2 mistakes, predic sion for cases actually belonging to progression class (false positives).From gression cases, the model predicted 206 cases correctly, making only 24 mistak ing progression for cases actually belonging to the regression class (false nega check the results in the classification report (Table5).From the confusion matrix for the TL_model, we can see that from the 85 progression cases, the model predicted 83 cases correctly, making only 2 mistakes, predicting regression for cases actually belonging to progression class (false positives).From the 230 regression cases, the model predicted 206 cases correctly, making only 24 mistakes, predicting progression for cases actually belonging to the regression class (false negatives).Also, check the results in the classification report (Table 5).In Figure 7, images showing actual cases versus the predicted cases, with the probability of prediction, for the TL_model on unseen 
images, are presented.
The fine-tuning technique is the third approach to solving this problem. FT is the most efficient and accurate technique because of its flexibility. In this model, every aspect is similar to the TL model (Model-2); the only change is the unfreezing of the last few layers of the feature-extraction step; all other aspects remain the same. This small change improves the model's predictions because the last few feature-learning layers are retrained. The pretrained model is the same as that used in Model-2, ResNet152V2 (see Table 6).
Calculation of parameters for the FT technique: the Model-3 layer summary is the same as for Model-2 because both employ the same layers up to this point; however, in this technique, the last 15 layers are unfrozen. Due to this change, the
number of trainable and non-trainable parameters in this model also changes. It should be noted that the parameters of the dense layers do not change. The total number of trainable parameters in this technique is 5,789,953, whereas in Model-2 it is 270,593.
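The unfreezing step described above can be sketched in Keras, which is presumably the framework used here given the layer summaries and progress logs; the pooling-plus-dense head and the input size below are illustrative assumptions, not the study's exact architecture.

```python
# Sketch of the fine-tuning (FT) setup described in the text, assuming a
# Keras workflow; only the "last 15 layers unfrozen" detail comes from the
# text, while the head and input size are illustrative assumptions.
import tensorflow as tf

base = tf.keras.applications.ResNet152V2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))

# Transfer learning (Model-2): all feature-learning layers frozen.
base.trainable = False

# Fine-tuning (Model-3): unfreeze only the last 15 layers of the base model.
base.trainable = True
for layer in base.layers[:-15]:
    layer.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # regression vs. progression
])

# A low learning rate is typical when retraining pretrained layers.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
```

Unfreezing the final feature-learning layers is what raises the trainable-parameter count from 270,593 (Model-2) to 5,789,953 (Model-3), since those convolutional weights are now updated alongside the dense head.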

Fitting and Evaluating the Model (FT)
In Figure 8, a plot of the training and validation loss and accuracy curves, along with the learning rate (see the legend) for the FT_model, is shown. From the confusion matrix for the FT_model, we can see that the model predicted all 85 progression cases correctly. From the 230 regression cases, the model predicted 177 cases correctly, making 53 mistakes, predicting progression for cases actually belonging to the regression class (false negatives). Please see Table 7 for the classification report. In Figure 10, images showing the actual cases versus the predicted cases, with the probability of prediction, for the FT_model on unseen images, are presented.
The receiver operating characteristic (ROC) curve is a graphical representation of a binary classifier's performance as the discrimination threshold varies. It plots the true positive rate (TPR) against the false positive rate (FPR) at different thresholds.
The area under the ROC curve (AUC) quantifies the binary classifier's performance over all possible thresholds. It ranges from 0 to 1, with 0.5 indicating random guessing and 1 indicating perfect classification.
To calculate the ROC curve and the AUC, we followed these steps. Predictions: first, the deep learning model makes a prediction for each instance in the dataset; these predictions are typically probability scores for the positive class.
Sorting: sort the instances by their predicted scores in descending order. Threshold variation: start with a threshold of 0 (considering all instances as negative) and gradually increase it, calculating the TPR and FPR at each threshold.
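The steps above can be illustrated with scikit-learn, and the confusion-matrix counts quoted earlier for the TL_model also let us sanity-check the reported testing accuracy. In this sketch, "regression" is treated as the positive class (an assumption consistent with the text), and the ROC example uses illustrative scores, not data from the study.

```python
# Sanity check of the TL_model accuracy plus a minimal ROC/AUC sketch,
# assuming scikit-learn; the ROC labels and scores are illustrative only.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# -- TL_model metrics from the confusion-matrix counts quoted in the text --
tp, fn = 206, 24   # regression cases: predicted correctly / as progression
tn, fp = 83, 2     # progression cases: predicted correctly / as regression

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy = {accuracy:.4%}")   # 91.7460%, matching the reported value

# -- ROC / AUC on illustrative scores (not study data) --
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                    # 1 = positive
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3])

# roc_curve sweeps the decision threshold over the sorted scores and
# returns the (FPR, TPR) pairs that trace out the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(f"AUC = {roc_auc_score(y_true, y_score):.4f}")  # → AUC = 0.9375
```

The same three steps (predict, sort, vary the threshold) are what `roc_curve` performs internally, so the library call is a direct implementation of the procedure described above.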

Discussion and Conclusions
In this paper, we used deep learning techniques to analyze imaging data from patients with stage-treated metastases who underwent Gamma Knife radiosurgery. Our results show that deep learning algorithms can accurately predict the evolution of metastases after treatment [32].
However, our study has limitations, including a relatively small sample size, a single-center design, and a retrospective nature, which may introduce biases and confounding factors.
Despite these limitations, our work provides essential insights into the use of deep learning techniques to predict treatment outcomes in patients with metastases, with clinical implications for treatment decision making and patient outcomes. Future studies should validate our models in larger patient cohorts and explore deep learning algorithms in other clinical contexts.
Radiomics, an image analysis technique used in oncology, enhances diagnosis, prognosis, and clinical decision making for precision medicine. In brain metastases, radiomics identifies smaller metastases, delineates multiple larger ones, predicts the local response after radiosurgery, and distinguishes radiation injury from metastasis recurrence. Radiomics approaches achieve high diagnostic accuracies of 80–90% [32].
Notable papers on radiomics and machine learning applications in stereotactic radiosurgery include a comprehensive review discussing brain tumor diagnostics, image segmentation, and distinguishing radiation injury from metastasis relapse [32]. Studies on predicting the response after radiosurgery show potential, using features such as the presence of a necrotic core, the fraction of contrast-enhancing tumor tissue, and the extent of perifocal edema [33][34][35][36][37][38].
Advances in radiomics and deep learning hold promise for precision medicine in brain metastases treatment, enabling precise diagnoses, prognoses, and treatment-response monitoring [39].
Quantitative imaging features correlate with outcomes after radiation therapy, enhancing personalized cancer care.A multidisciplinary approach integrating radiomics and deep learning is essential in the medical decision-making and radiation therapy workflow for bone metastasis [48].
Studies by Huang et al. and others explore significant radiomic features related to core volume and sphericity, predicting local tumor control after GKRS [49].Machine learning processes predict the brain metastasis response to GKRS, with promising accuracy [50].
Cha et al. developed a radiomics model based on a convolutional neural network to predict the response to SRT for brain metastases, achieving promising results with ensemble models [37].
The computational software for the applied fractal analysis used in this study was initiated and subsequently developed in earlier articles by some of the present authors [51][52][53][54][55].

Figure 1. Diagram of the AI workflow of prognostic factors in the evaluation of stage-treated metastasis, based on medical imaging with the Gamma Knife treatment machine from our department.

Figure 2. Plot of the training and validation loss and accuracy curves, along with the learning rate (see the legend), for the CNN_model from scratch.

Figure 3. Confusion matrix for the CNN_model from scratch.

Figure 4. Images showing the actual cases versus the predicted cases, with the probability of prediction, for the CNN_model from scratch on unseen images.

Fitting and Evaluating the Model (TL_Model)
In Figure 5, the plot of the training and validation loss and accuracy curves, along with the learning rate (see the legend) for the TL_model, is shown.

Figure 5. Plot of the training and validation loss and accuracy curves, along with the learning rate (see the legend), for the TL_model.

Table 1. The splitting of the BrainMet image dataset into TRAIN, TEST, and VAL, and the number of images belonging to the two classes.
Addressing the Unbalanced Dataset Issue Present in This Study
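The imbalance noted above (2320 regression images versus 874 progression images, per the dataset description) is commonly countered with inverse-frequency class weights; the sketch below shows that standard remedy under the assumption of a Keras-style training loop, without claiming it is the exact method used in the study.

```python
# Sketch of one common remedy for class imbalance: inverse-frequency class
# weights. The image counts come from the dataset description; whether the
# study used this remedy or another (e.g., augmentation) is an assumption.
n_regression, n_progression = 2320, 874
total = n_regression + n_progression

# Normalized so each class contributes equally to the loss:
# weight * class_count == total / 2 for both classes.
class_weight = {
    0: total / (2.0 * n_regression),   # regression (majority class)
    1: total / (2.0 * n_progression),  # progression (minority class)
}

# In Keras these would be passed as model.fit(..., class_weight=class_weight),
# upweighting the minority "progression" class during training.
print(class_weight)
```

With these weights, each progression image counts roughly 2.7 times as much as a regression image in the loss, compensating for the ~2.7:1 class ratio.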

Table 2. Parameters of the first model, the CNN model from scratch. Model: "sequential".

Table 3. Classification report for the CNN_model from scratch.

Table 4. Parameters of the second model, the TL model.

Table 5. Classification report for the TL_model.

Table 6. Parameters of the third model, the FT model.

Table 7. Classification report for the FT_model.