Article

Modeling Based on Machine Learning and Synthetic Generated Dataset for the Needs of Multi-Criteria Decision-Making Forensics

by
Aleksandar Aleksić
1,
Radovan Radovanović
2,
Dušan Joksimović
2,
Milan Ranđelović
3,
Vladimir Vuković
1,
Slaviša Ilić
1 and
Dragan Ranđelović
1,*
1
Faculty of Diplomacy and Security, University Union-Nikola Tesla, 11000 Belgrade, Serbia
2
University of Criminal Investigation and Police Studies, 11080 Belgrade, Serbia
3
Science Technology Park, 18104 Niš, Serbia
*
Author to whom correspondence should be addressed.
Symmetry 2025, 17(8), 1254; https://doi.org/10.3390/sym17081254
Submission received: 30 June 2025 / Revised: 26 July 2025 / Accepted: 1 August 2025 / Published: 6 August 2025
(This article belongs to the Special Issue Symmetry or Asymmetry in Machine Learning)

Abstract

Information is the primary driver of progress in today’s world, especially given the vast amounts of data available for extracting meaningful knowledge. The motivation for addressing the problem of forensic analysis—specifically the validity of decision making in multi-criteria contexts—stems from its limited coverage in the existing literature. Methodologically, machine learning and ensemble models represent key trends in this domain. Datasets used for such purposes can be either real or synthetic, with synthetic data becoming particularly valuable when real data is unavailable, in line with the growing use of publicly available Internet data. The integration of these two premises forms the central challenge addressed in this paper. The proposed solution is a three-layer ensemble model: the first layer employs multi-criteria decision-making methods; the second layer implements multiple machine learning algorithms through an optimized asymmetric procedure; and the third layer applies a voting mechanism for final decision making. The model is applied and evaluated through a case study analyzing the U.S. Army’s decision to replace the Colt 1911 pistol with the Beretta 92. The results demonstrate superior performance compared to state-of-the-art models, offering a promising approach to forensic decision analysis, especially in data-scarce environments.

1. Introduction

The widespread availability of digital data and information technologies (IT) across all areas of human activity at the beginning of the 21st century has enabled their extensive application for various purposes. One prominent example is the use of machine learning (ML), especially ensemble and hybrid methods, to solve numerous practical classification problems that humans face in daily life, including applications in medicine [1], economics [2], education [3], transportation [4], and beyond. Another important example, relevant to the problem discussed in this paper, is the possibility of using a synthetically generated dataset produced with suitable artificial intelligence (AI) tools, such as ChatGPT-4, a large language model (LLM).
The authors propose a three-layer ensemble model that uses multi-criteria methods in the first layer, an ensemble of multiple ML algorithms of different types in an optimized asymmetric procedure in the second layer, and a voting method for the final decision in the third layer.
In proposing the model described in this paper, the authors had in mind two fundamental elements that determine the quality of the solution: the dataset used and the methodology applied [5]. For this reason, in the first layer of the model, the well-known Analytic Hierarchy Process (AHP) and Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) multi-criteria methods are used to check the usability of the synthetically generated dataset, which the subsequent layers then use to determine the validity of the decision in the considered multi-criteria problem. In the second layer of the model, the authors propose a novel algorithm for determining the most important factors in the considered multi-criteria problem, i.e., the factors that determine the best alternative. The type of problem addressed in this paper can be formulated as a binary classification task, defined by two possible outcomes that can be represented as Yes or No, 1 or 0, etc. The authors opted for a stacking ensemble model, which, from a methodological perspective, enhances accuracy and reduces overfitting; moreover, it applies to a wide range of predictors and effectively handles potential class imbalances within the dataset, all of which distinguishes the proposed model from those found in the literature. By adopting this approach, the authors developed a model that capitalizes on the strengths of two different types of methods included in the ensemble while mitigating their separate weaknesses. Ensemble analysis, regarded as an optimization process for selecting significant subsets from a broader set of influencing factors [6], can be interpreted as a method for determining premises for intelligent decision making [7].
While the literature contains studies that apply various multi-criteria decision-making (MCDM) methods to assess factor importance in knowledge-driven decision-making processes [8], as well as a large number of hybrid ML and MCDM methods [9], such approaches are beyond the scope of this paper. The subject of this paper is an ensemble optimization procedure for the forensic validity of a decision already made: it uses the knowledge-driven MCDM methodology on one side, to check the usability of the generated synthetic dataset, and the data-driven ML methodology on the other, to optimize the procedure of determining important criteria and making a final decision. The third layer of the proposed model determines the final decision using a voting method.
As noted, the main subject of this paper is the forensic validity of the decision made. With this in mind, the authors propose a model that comprehensively addresses both aspects: dataset selection, using the MCDM methodology to validate the usability of the generated synthetic dataset as an input condition, and the application of ML as the core methodology for determining the most important criteria through asymmetric optimization and vote-based final decision making. It is well known from the literature that ML methodology is data driven, while MCDM methodology is knowledge driven, so using them together can be beneficial, since their advantages and disadvantages are complementary. Of the four known approaches to their combined use, the authors implemented the second, which applies MCDM methods first and then integrates ML methods [9]. It is also known that each classification problem depends not only on the selected methodology but also on the selected dataset, the types of input variables (categorical, numerical, or both), and the prevalence of particular classes. For this reason, the authors considered a stacking ensemble of ML algorithms expedient, because it is suitable for different criterion types and class prevalences and can thus address a variety of problems [6]. Finally, for the decision-making process regarding the selection of handguns, various MCDM methods are available, such as AHP, TOPSIS, fuzzy logic [10], and Elimination and Choice Translating Reality (ELECTRE) [11], among others. To assess the impacts of different criteria, in our case the handgun-related factors, whether quantitative or qualitative, various methodologies can be employed.
These range from classical statistical analysis techniques, such as regression, discriminant, and factor analyses, to more modern MCDM such as large group decision making based on interval rough integrated cloud models [12] and modern ML algorithms that have gained prominence since the early 21st century, including classification algorithms, neural networks, and different ensembles [13].
The proposed ensemble methodology was chosen to solve the problem of handgun selection, applied here through a specific case study analyzing the forensic validity of the decision to adopt a particular type of pistol in the U.S. Army at the end of the 20th century. This analysis may be of practical use to military and police forces worldwide, as they often face similar challenges in real-time decision making regarding the selection of various types of weapons. Moreover, the model proposed in this paper provides a structured approach not only for making such decisions but also for verifying their correctness through data-driven analysis.
Namely, in the armament of various military and police formations, a wide range of weapons is encountered, from so-called light to heavy weapons, but the most commonly used are handguns, i.e., pistols. Determining the influence of various factors on the quality, particularly the shooting accuracy, of handguns is important not only for ensuring their most effective use but also for making informed decisions when selecting among the pistol models available on the arms market to equip military or police units. Naturally, the choice of relevant factors may vary depending on the specific type and purpose of the formation. A comprehensive comparative analysis of handguns, both those currently in use and those planned for adoption by the U.S. Army, was provided by Jenkins and Lowrey [14], while an analysis of handgun selection for officers in the Yugoslav Army can be found in [15]. These factors fall into several types of related conditions [16,17], such as the following:
  • Atmospheric variables (temperature, humidity, pressure, wind speed, light, …);
  • Physical variables (target distance, trigger pull force, …);
  • Biomechanical (grip stability, …);
  • Behavioral variables (shooter experience, shooter heart rate, …).
We have addressed some of the most important of them: target distance, wind speed, shooter experience, grip stability, trigger pull force, light, heart rate, ammunition quality, temperature, humidity, and precision.
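As an illustration only, the kind of per-shot record implied by these factors can be sketched as a small synthetic table. The column names and value ranges below are assumptions made for this sketch, not the actual ChatGPT-4-generated dataset used in the study:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible
n = 200                          # illustrative sample size

# One synthetic record per shot attempt; columns mirror the factors listed
# above, but units and ranges are illustrative assumptions.
df = pd.DataFrame({
    "target_distance_m":    rng.uniform(5, 50, n),
    "wind_speed_ms":        rng.uniform(0, 10, n),
    "shooter_experience":   rng.integers(1, 11, n),   # 1 (novice) .. 10 (expert)
    "grip_stability":       rng.uniform(0, 1, n),
    "trigger_pull_force_N": rng.uniform(15, 35, n),
    "light_lux":            rng.uniform(50, 1000, n),
    "heart_rate_bpm":       rng.integers(60, 140, n),
    "ammo_quality":         rng.uniform(0, 1, n),
    "temperature_C":        rng.uniform(-5, 35, n),
    "humidity_pct":         rng.uniform(20, 90, n),
    "hit":                  rng.integers(0, 2, n),    # binary outcome: 1 = hit
})

print(df.shape)  # (200, 11)
```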
To assess the impact of the aforementioned handgun-related factors, whether quantitative or qualitative, various methodologies can be employed. These range from commonly used modern MCDM methods, such as fuzzy and large group decision making based on interval rough integrated cloud models, and classical statistical analysis techniques, such as regression, discriminant, and factor analysis, to more modern ML algorithms that have gained prominence since the early 21st century, including classification algorithms, neural networks, and their integrations, known as hybrid models [18,19].
The main goal of this paper is to explore the possibility of constructing an ensemble ML model capable of confirming or refuting the correctness of decisions regarding the selection of specific handgun types for use in military or police forces.
To achieve this goal, the authors conducted a study of the following:
  • First, we examined a multi-layer model that integrates MCDM and ML methods. MCDM methods are used in the initial layer to provide the model with a suitable dataset; after that, an ensemble methodology integrates two widely used approaches, regression and filter-based feature selection with classification algorithms, into a single ensemble ML model. In the third and final layer, a voting ensemble is applied to make the final decision. The proposed model is thus practically designed as an asymmetric optimization procedure. The integration aims to leverage the strengths of each method to improve overall performance. Given the complexity of the problem and the wide array of available algorithms and integration strategies for ensemble modeling, the proposed model provides a flexible foundation for future extensions and refinements. To the authors' knowledge, a model combining modern ML techniques and MCDM in the proposed configuration for the forensic analysis of handgun selection has not previously been addressed in the existing literature.
  • Second, the proposed methodology was evaluated through a case study involving the forensic assessment of the U.S. Army’s 1985 decision to adopt the Beretta 92 handgun in place of the previously used Colt 1911. To enhance the realism of the analysis, the Glock 17, the third most widely used handgun globally, was included in the evaluation [18]. Due to the impracticality of conducting real-world experiments and the lack of publicly available datasets on factors affecting handgun accuracy, the authors utilized synthetic data generated using ChatGPT-4, one of the most prominent AI-based text generation tools, for model evaluation.
The obtained results confirmed the correctness of the historical decision. In support of the research objectives, the authors formulated and addressed two key hypotheses:
  • First (conditional) hypothesis: It is possible to use synthetically generated data from the ChatGPT-4 AI tool for this type of forensic analysis.
  • Second (final) hypothesis: It is possible to construct a novel ensemble model that solves the problem of handgun type selection in a more effective manner than existing state-of-the-art approaches.
To achieve the stated objective, the remainder of this manuscript is organized as follows: After the Introduction presented in Section 1, Section 2, ‘Materials and Methods’, describes the methodologies employed in constructing the proposed model; this section includes a comprehensive review of the model in Section 2.1 and an overview of the data used in Section 2.2. Section 3, ‘Proposed Solution’, presents the novel model developed to address the defined problem. Section 4, ‘Results and Findings’, presents and discusses the results obtained from applying the proposed model to the case study, including an analysis of its performance. Finally, Section 5, ‘Conclusions’, summarizes the main contributions of the research and outlines directions for future work aimed at developing more efficient and accurate solutions to the problem explored in this study.

2. Materials and Methods

As mentioned in the previous section, the current trend in developing solutions for identifying the influence of individual factors on various processes, as well as for supporting intelligent decision making across different domains of human life, is increasingly centered on the use of advanced ML, especially ensemble algorithms and hybrid methodologies [19,20], alongside the traditionally most frequently used MCDM algorithms. The latter are also becoming more advanced, particularly in the direction of fuzzy logic and large group decision making with rough integrated asymmetric cloud models under multi-granularity linguistic environments. We also note that the use of synthetically generated data from various AI tools is a trend both in the considered type of forensics and in other analyses where real-world data is hard or impossible to obtain. Datasets are crucial for evaluating the reproducibility of a proposed methodology, and therefore, when real-world data cannot be obtained, we use the possibility of creating a synthetic dataset.
These approaches also facilitate the development of ensemble models for intelligent forensic analysis of past decisions [21,22]. Following this trend, the authors of this paper proposed a novel three-layer model for handgun type selection forensics, consisting of the following core components, as illustrated in Figure 1:
  • The Input Layer, which handles essential dataset preparation and validation for its suitability in solving the given problem, using MCDM methodology;
  • The Base Layer, which integrates two widely adopted approaches—classical regression techniques and modern ML-based feature selection—into a unified ensemble model through an asymmetric optimization procedure;
  • The Output Layer, which applies a simple voting mechanism to determine the final decision based on the number of shared significant factors across the different considered handgun types.
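The flow through the three layers can be sketched in outline. All function names and the toy selectors below are placeholders for illustration, not the authors' implementation:

```python
# Structural sketch of the three-layer ensemble (placeholder names only).

def input_layer(dataset, mcdm_check):
    """Validate dataset usability with an MCDM check (e.g., AHP/TOPSIS)."""
    if not mcdm_check(dataset):
        raise ValueError("synthetic dataset failed the MCDM usability check")
    return dataset

def base_layer(dataset, selectors):
    """Run each regression / feature-selection method; collect the set of
    significant factors each one reports."""
    return [set(select(dataset)) for select in selectors]

def output_layer(factor_sets):
    """Simple vote: keep factors flagged by a majority of base methods."""
    votes = {}
    for s in factor_sets:
        for f in s:
            votes[f] = votes.get(f, 0) + 1
    majority = len(factor_sets) / 2
    return {f for f, v in votes.items() if v > majority}

# Toy run with stand-in selectors over three candidate factors.
data = ["distance", "wind", "experience"]
selectors = [lambda d: d[:2], lambda d: d[1:], lambda d: d]
result = output_layer(base_layer(input_layer(data, lambda d: True), selectors))
print(sorted(result))  # every factor named by at least 2 of the 3 selectors
```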
To evaluate the proposed model, the authors conducted a case study focused on the forensic analysis of the U.S. Army’s decision, made at the end of the 20th century, to replace the Colt 1911 pistol with the Beretta 92 in 1985. Namely, it is well known that all armies—specifically their educational institutions and units—include a discipline called ‘weapons knowledge with shooting instruction’. One of the key reasons for studying this subject is to evaluate the potential for new tactical and technical requirements for weapons that may be used in the future. This involves considering possibilities from a constructive, technological, or techno-economic perspective. Given that this paper analyzes handguns—which are widely used in terms of quantity—as one of the tools for carrying out state functions, it is of paramount importance to apply all relevant models to support sound decision making regarding innovation or replacement with a new type or model of this technical asset used by the army and police. If the techno-economic analysis of the innovation supports the tactical and technical requirements satisfied by the new (replacement) weapon—derived from a comparison of current usage, the potential of new materials, modern production and maintenance technologies, and projections of future operational environments—then it becomes imperative to proceed with the replacement. Due to the unavailability of real data that informed the original, already mentioned decision and the lack of relevant datasets on prominent open data platforms such as Kaggle, Data.gov, and the UCI Machine Learning Repository, the authors opted to generate synthetic data using the ChatGPT-4 AI tool. This approach was intended to address the data scarcity by producing a dataset that reflects real-world-like scenarios [23,24].
Based on the general functional block diagram of the proposed model, the Methods subsection focuses on the methods used in the Base Layer, where a novel ensemble ML model is introduced for determining the significance of various factors influencing pistol shooting success. This subsection also includes the description of methods used in the Output Layer, which uses a simple voting-based stacking method to make the final decision. Practically, this subsection addresses the methodologies included in the proposed model, which help to examine the second (final) hypothesis stated in the Introduction, which asserts that it is possible to construct a novel ensemble model that effectively solves the problem of handgun type selection forensics. Meanwhile, the Materials subsection provides a description of the synthetic dataset used, along with details on the preprocessing steps necessary to prepare the data for use in the proposed model. This subsection also explains the application of the AHP and TOPSIS methodologies used to evaluate the usability of the preprocessed dataset. Practically, the descriptions of the used synthetic dataset as well as the decision-making methodology in this subsection are necessary to check the first (conditional) hypothesis presented in the Introduction, which states that the use of synthetic data generated by ChatGPT-4 is valid if the outcomes based on this data (i.e., pistol type selection) match the actual decisions made by the U.S. Army.

2.1. Methods

The ML algorithms employed by the authors to address the problem of handgun type selection forensics are designed to induce logical patterns that can later be interpreted and applied by humans to solve various tasks associated with the problem defined by the extracted dataset. The validity of the learned knowledge is assessed by splitting the dataset into a training set and a test set, where the performance of the learning process is measured using predictive accuracy (the proportion of correctly classified instances on previously unseen data).
To address the identified task, the authors propose an ensemble ML model (illustrated in Figure 1) that integrates several types of learners—binary regression, feature selection algorithms, and classification models—into a cohesive framework. This ensemble strategy harnesses the complementary strengths of each component to achieve enhanced performance compared to currently known state-of-the-art models.
In the subsections that follow, the authors briefly present the methodologies incorporated in the proposed ensemble model, as depicted in Figure 1. As previously discussed, this hybrid model combines the MCDM methodology at the first Input Layer with ML methodologies of logistic regression, feature selection, and classification algorithms at the second Base Layer and the voting algorithm at the third Output Layer, with all of them organized within a multi-layered ensemble architecture group.

2.1.1. Logistic (Binary) Regression

In classification, as a widely used ML technique, it is often necessary to develop probabilistic classifiers, models that, in addition to assigning an instance to the most likely class, also provide the probability of belonging to that class. In the binary classification context addressed in this paper (i.e., hit or miss), this involves calculating the probability of each outcome and visualizing the results using separate calibration plots for each class. A probabilistic classifier is considered well calibrated when the predicted probabilities closely match the actual observed frequencies.
This concept of model calibration has been widely adopted by researchers working on problems similar to the one considered in this study. One notable example is univariate calibration using logistic regression in binary classification settings [25].
In general, binary logistic regression models the relationship between categorical dependent variables and one or more independent variables, which can be continuous, binary, or categorical. Among several logistic regression types, the authors employed the Enter method for binary regression in this study. The primary objective of logistic regression is to construct a model that accurately describes the relationship between the outcome variable and the set of predictor variables.
One convenient form of the binary logistic regression model is given in (1):

$\mathrm{odds} = \dfrac{p}{1-p} = e^{b_0}\cdot e^{b_1 X_1}\cdot e^{b_2 X_2}\cdots e^{b_k X_k}$  (1)

From Formula (1), it is evident that when a variable $X_i$ increases by one unit and all other parameters are unchanged, the odds change by an amount proportional to $(e^{b_i}-1)$ (increasing for $b_i > 0$; decreasing for $b_i < 0$), as given in Equation (2):

$e^{b_i(1+X_i)} - e^{b_i X_i} = e^{b_i X_i}\,(e^{b_i}-1)$  (2)

Thus, the factor $e^{b_i}$ is the odds ratio (O.R.) of the corresponding independent variable $X_i$, which gives the relative amount by which the odds of the outcome decrease ($\mathrm{O.R.} < 1$) or increase ($\mathrm{O.R.} > 1$) when the independent variable increases by one unit.
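As an illustrative aside, the odds-ratio interpretation can be reproduced with any logistic regression implementation; the sketch below uses scikit-learn on toy data (a stand-in for exposition only, not the SPSS analysis performed in the study):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one strongly predictive feature, one pure-noise feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# exp(b_i) is the odds ratio O.R.: the multiplicative change in the odds
# of a hit when feature i increases by one unit, all else held fixed.
odds_ratios = np.exp(model.coef_[0])
print(odds_ratios)  # O.R. >> 1 for the predictive feature, near 1 for noise
```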
The authors selected IBM SPSS version 17, one of the most widely used statistical software tools, for conducting the binary regression analysis [26].

2.1.2. Classification Algorithms

Classification methodology is a widely used technique in ML that supports experts across various domains in extracting knowledge from large volumes of data. Classification algorithms belong to the category of supervised ML techniques and are primarily used for predictive modeling. They require labeled instances, where each object is associated with a specific class (attribute), and the goal is to predict the value of this categorical class using the values of other predictor attributes. Some of the most well-known classification methods include decision tree-based algorithms (e.g., ID3, C4.5, and Random Tree), regression-based algorithms (e.g., Linear Regression and logistic regression), Bayesian classifiers (e.g., BayesNet and Naive Bayes), neural network and kernel-based methods (e.g., Single/Multi-Layer Perceptron and Support Vector Machine), and rule-based classifiers (e.g., JRip and PART), among others [27,28,29].
For the problem addressed in this paper, the authors applied a binary classification approach in their proposed ensemble model, classifying data into two categories: positive (true, shooting hit) and negative (false, shooting miss).
The outcomes of the predictions are summarized using a confusion matrix, as shown in Table 1.
The total number of instances in the considered set shown in Table 1 is the sum of the positive and negative cases and is denoted by N, i.e., $TP + FP + TN + FN = N$. From the results in Table 1, for the considered two-class classifier, the most important measures of classification performance, namely accuracy, precision, recall, and F1, are given by Formulas (3)–(6), described in [28,30,31]:
$\mathrm{Accuracy} = (TP + TN)/N$  (3)

$\mathrm{Precision} = TP/(TP + FP)$  (4)

$\mathrm{Recall} = TP/(TP + FN)$  (5)

$F1 = 2 \times (\mathrm{Precision} \times \mathrm{Recall})/(\mathrm{Precision} + \mathrm{Recall})$  (6)
The Receiver Operating Characteristic (ROC) curve presents a binary classifier’s performance by relating sensitivity and specificity across every possible cut-off. The ROC curve is a graph with the x-axis showing 1 − specificity (the false positive fraction, FP/(FP + TN)) and the y-axis showing sensitivity (the true positive fraction, TP/(TP + FN)). The Area Under the ROC Curve (AUCROC) is the most commonly used metric for assessing the diagnostic accuracy of a model; AUCROC values greater than 0.60 (60%) are considered acceptable, as they indicate good classification performance. It is important to note that, as established in the literature, when the dataset is imbalanced, the Precision–Recall Curve (PRC) and its corresponding Area Under the Curve (AUCPRC) are preferred over the AUCROC for evaluating model performance [31].
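Formulas (3)–(6) can be verified by hand on a toy confusion matrix; the counts below are illustrative, not results from the case study:

```python
# Worked example of the accuracy, precision, recall, and F1 formulas
# for an arbitrary toy confusion matrix.
TP, FP, TN, FN = 40, 10, 35, 15
N = TP + FP + TN + FN            # N = 100

accuracy  = (TP + TN) / N        # (40 + 35) / 100 = 0.75
precision = TP / (TP + FP)       # 40 / 50 = 0.80
recall    = TP / (TP + FN)       # 40 / 55 ≈ 0.727
f1 = 2 * (precision * recall) / (precision + recall)   # ≈ 0.762

print(accuracy, precision, round(recall, 3), round(f1, 3))
```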
In the proposed model, the authors employed some of the most widely used classification algorithms, each from a different methodological group as categorized in one of the most popular tools for ML, Weka version 3.8.6 [32]. Specifically, they used
  • AdaBoost (AB) from the Meta group [33];
  • PART from the Rules group [34];
  • Random Forest (RF) from the Trees group [35].

2.1.3. Feature Selection Techniques

The dimensionality of the dataset used in various ML classification methods has a significant impact on the quality of results. It is well known that dimensionality reduction can lead to improved performance in solving complex problems. Selecting the most significant parameters (factors, criteria, etc.) to form an optimal sub-dataset prior to classification helps achieve better results by reducing noise, minimizing overfitting, and shortening training time [36]. The evaluation process applies various effective measures for eliminating redundant and irrelevant features, such as consistency measures [37] and correlation measures [38].
Feature selection methods are typically categorized into three main groups [39]:
  • Filter methods, such as Relief, CorrelationAttributeEval, InfoGain, etc.;
  • Wrapper methods, such as Greedy Stepwise, BestFirst, Genetic Search, etc.;
  • Embedded methods, which combine the strengths of filters and wrappers, for example, Ridge Regression and decision tree-based algorithms like NBTree.
In the proposed model presented in this paper, the first two groups, filter and wrapper methods, are used; therefore, a brief description of these two groups is provided in the subsequent paragraphs of this section. It is worth noting beforehand that while wrapper methods often provide better performance, they are typically more time consuming. Therefore, to balance the strengths and weaknesses of both approaches, the authors chose to use two filter methods and one wrapper method in the proposed asymmetric model.
  • Filter methods
For a given dataset, filter methods initiate the search process by defining an initial subset, which may be an empty set, a randomly selected subset, or the full feature set. The algorithm then explores the feature space using a defined search strategy. Each newly generated subset is evaluated using an independent evaluation measure and compared to the previously best-performing subset. If it performs better, it becomes the new best subset. This process continues until a predefined stopping criterion is satisfied. The final outcome of the method is the most recently identified best-performing subset.
Different combinations of search strategies and evaluation measures allow for the development of various algorithms within the filter model framework [40,41,42].
The software tool Weka, which offers built-in functionality for feature selection, was used to reduce the volume of data through the application of multiple algorithms. Because of its flexibility and reliability, Weka was employed to evaluate the proposed model on the selected case study. In practice, the application of these methods enabled the identification of significant factors influencing shooting success (i.e., a hit) across three different pistol types. This process also facilitated the development of a prediction model using the same methodology as described earlier for logistic regression.
For the proposed model, the authors used the following two filter-based feature selection algorithms: the Relief/ReliefF algorithm and the CorrelationAttributeEval algorithm.
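As a rough stand-in for such filter-based ranking (not Weka's implementation), a correlation filter in the spirit of CorrelationAttributeEval can be sketched as follows, scoring each feature by the absolute Pearson correlation with the binary class and keeping the top-k:

```python
import numpy as np

def correlation_filter(X, y, k):
    """Filter-style ranking: score each feature independently of any
    classifier, then return the indices of the k highest-scoring ones."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return list(np.argsort(scores)[::-1][:k])

# Toy data: only feature 2 actually drives the class label.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 2] > 0).astype(int)

print(correlation_filter(X, y, 2))  # feature index 2 ranked first
```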
  • Wrapper methods
This group of ML feature selection methods relies on suitable modeling algorithms to evaluate subsets of attributes based on how well they support classification or prediction. Such procedures are computationally intensive, primarily because the ML algorithm must be executed repeatedly. Evaluating the model’s performance for each subset becomes increasingly demanding, especially as the number of possible subsets grows exponentially with the number of attributes. To address this complexity, various search strategies are employed, most notably those based on greedy algorithms, for which the general working principle is illustrated in Algorithm 1 [43].
The classification frameworks [44] in which wrapper methods are applied can be categorized based on the attribute search methodology used. They are divided into two groups: randomized methods, which rely on stochastic search approaches, and deterministic wrapper methods.
Algorithm 1: The greedy search technique
 1. The solution set with suitable answers is empty at the beginning.
 2. In each step, an item is added to the set that represents the solution, continuing until a final solution is reached.
 3. The current item is kept only if the solution set is feasible.
 4. Else, the item is rejected and is never considered again.
The deterministic methods are further divided into subgroups, one of which employs a complete attribute space search using a sequential strategy. These approaches are time consuming but tend to yield the most accurate results. Due to their reliability, sequential search techniques are among the most commonly used in wrapper algorithms. For this reason, the authors selected the GreedyStepwise algorithm from this subgroup to be included in the proposed model.
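A GreedyStepwise-style forward wrapper can be sketched as below; this is an illustrative sketch using scikit-learn cross-validation, not Weka's implementation, and it follows the general greedy principle of Algorithm 1: repeatedly add the single feature that most improves cross-validated accuracy, stopping when no addition helps:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_forward(X, y, estimator):
    """Forward wrapper selection: the estimator itself evaluates each
    candidate subset (the defining trait of wrapper methods)."""
    remaining, selected, best_score = list(range(X.shape[1])), [], 0.0
    while remaining:
        trials = [(cross_val_score(estimator, X[:, selected + [j]], y,
                                   cv=5).mean(), j) for j in remaining]
        score, j = max(trials)
        if score <= best_score:      # stopping criterion: no improvement
            break
        best_score = score
        selected.append(j)
        remaining.remove(j)
    return selected

# Toy data: features 0 and 1 are informative, 2 and 3 are noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = ((X[:, 0] + X[:, 1]) > 0).astype(int)

print(greedy_forward(X, y, LogisticRegression(max_iter=1000)))
```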

2.1.4. Ensemble Methods

It is well established in the literature that ML ensemble methods are based on the principle that combining algorithms of different types can produce better results than any individual algorithm alone, as well as better performance than many existing state-of-the-art models. The most commonly used taxonomy of ensemble methods is as follows:
  • Bootstrap (Bagging);
  • Boosting;
  • Voting;
  • Stacking.
Ensemble learning can be briefly described as follows, based on the literature [45]:
  • Ensemble learning combines multiple ML and other types of algorithms into a single unified model to improve overall performance. Bagging primarily aims to reduce variance, boosting focuses on minimizing bias, while stacking seeks to enhance classification or prediction accuracy. For these reasons, the authors applied stacking in their proposed model to address the handgun type selection forensics problem, formulating it as a binary classification task.
  • Ensemble learning offers several key advantages, including improved accuracy, interpretability, robustness, and model combination.
  • It is important to note that in many real-world problems—including the case study presented in this paper—real-time computation is not a strict constraint. Moreover, the continuous advancement of computing technologies enables increasingly faster processing, making the additional computational demands of ensemble models less problematic.
  • Voting
Voting is a widely used ensemble learning technique, particularly effective for classification tasks, which is how the authors framed the problem addressed in this paper. However, voting can also be applied to regression problems, as illustrated in Figure 2.
In a voting ensemble, each individual algorithm within the ensemble independently produces a classification or prediction for the input dataset [46]. The final decision is then determined by combining these individual outputs using one of several possible voting schemes:
  • Majority Voting: Each model votes for a class label, and the class receiving the majority of votes is selected as the final classification or prediction. This is the approach used by the authors to make the final decision in their model. In the event of a tie (i.e., an equal number of votes), a tie-breaking rule is applied, such as selecting the class with the higher associated probability.
  • Weighted Voting: Each model’s vote is assigned a weight based on its individual performance, giving more influence to stronger models.
  • Soft Voting: This scheme is used when models provide probability estimates or confidence scores for each class label rather than discrete class predictions.
In the third (output) layer of the proposed model, the authors implemented majority voting to finalize the decision.
They compared the number of significant factors that matched across the three considered types of pistols (or m in the general case). To determine which of the other two (m − 1 in the general case) types should replace a given pistol type, they selected the one with the greatest number of identical significant factors, as previously identified in the second (base) layer of the model (see Figure 2).
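A minimal sketch of hard majority voting with scikit-learn is shown below; the three base classifiers are stand-ins (PART has no scikit-learn equivalent, so a decision tree is substituted), and the synthetic data is a placeholder:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hard (majority) voting: each base model casts one vote per instance and the
# most frequent class label wins, mirroring the scheme used in the output layer.
X, y = make_classification(n_samples=300, n_features=11, random_state=0)
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("ab", AdaBoostClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="hard",  # "soft" would instead average predicted class probabilities
)
accuracy = cross_val_score(ensemble, X, y, cv=5).mean()
print(round(accuracy, 3))
```

Switching `voting` to `"soft"` and supplying `weights` would yield the soft and weighted schemes listed above.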
  • Stacking
The stacking type of the ensemble algorithm, selected by the authors for the second (base) layer of the proposed model—designed to determine the significant factors contributing to successful shooting and to evaluate each of the three selected pistols—involves training multiple ML algorithms and then combining their outputs into a unified predictive or classification model (Figure 3).
This stacking approach typically demonstrates better performance than any of its individual component algorithms, including those considered state of the art for similar problems [47]. Stacking is applicable to both supervised and unsupervised learning tasks; in the presented case study, it is applied in a supervised learning context.
As illustrated in Figure 3, each algorithm within the ensemble is first trained on the available dataset. Subsequently, a meta-learner (combiner algorithm) is trained to make the final classification or prediction based on the outputs of the individual learners. To reduce the risk of overfitting, cross-validation is generally used during the training phase.
Although logistic regression is commonly used as the combiner algorithm in many stacking implementations, the proposed model takes a different approach: it uses classification algorithms as the combiner, while the base learners consist of a binary regression model and several feature selection algorithms. This integration enables more robust identification of critical factors influencing shooting success across multiple pistol types.
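A generic stacking setup can be sketched as follows. This is not the authors' exact configuration (which combines binary regression and feature selection algorithms under a classification combiner); it only illustrates the base-learner/meta-learner structure with assumed scikit-learn components:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stacking: base learners are fit first, then a meta-learner (combiner) is
# trained on their cross-validated outputs; the internal cv reduces overfitting.
X, y = make_classification(n_samples=300, n_features=11, random_state=0)
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier())],
    final_estimator=RandomForestClassifier(random_state=0),  # classifier as combiner
    cv=10,
)
accuracy = cross_val_score(stack, X, y, cv=5).mean()
print(round(accuracy, 3))
```

Here the combiner is a classifier rather than the usual logistic regression, echoing the design choice described above.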

2.2. Materials

The determination of the weights of the considered factors influencing pistol shooting accuracy in this study is based on synthetic data generated using ChatGPT-4, an LLM capable of independently analyzing and deriving insights from complex datasets. Recognizing that ChatGPT-4 can only generate data based on patterns learned during its training—and thus may not always reflect real-world distributions or accuracy—the authors designed a real-world use-case scenario comprising a set of expected inputs and outputs to guide the generation of a dataset suitable for the case study [23,24].
The resulting dataset is provided as Supplementary Materials under the following name:
ChatGPTsyntheticGeneratedDatasetHandgunShooting-Colt1911Bereta92Glock17.xls.
To be used with the proposed model (whose functional block diagram is shown in Figure 1), this dataset was processed into three separate datasets, one for each type of pistol, enabling model-specific analysis.
These individual datasets are also included as Supplementary Materials, named as follows:
  • ChatGPTsyntheticGeneratedDatasetHandgunShooting-Colt1911.xls;
  • ChatGPTsyntheticGeneratedDatasetHandgunShooting-Bereta92.xls;
  • ChatGPTsyntheticGeneratedDatasetHandgunShooting-Glock17.xls.
All four Supplementary Files are accessible on the corresponding author’s official website at:
To evaluate the quality and usability of the synthetically generated dataset for testing the proposed model in solving the considered problem, the authors applied the TOPSIS decision-making method. This was preceded by a critical step: validating whether the decision obtained using the synthetic data matched the historical decision made by the U.S. Army, based on its real-world dataset. This consistency served as the foundation for accepting the dataset’s relevance and suitability for further analysis.

2.2.1. Generation and Preprocessing of the Required Dataset

Given the practical unavailability of the original dataset used by the U.S. Army authority to decide on the selection of small arms based on shooting accuracy, which is one of the core problems considered in this paper, the authors opted to use, within their proposed model, synthetic data generated by ChatGPT-4, which belongs to the group of LLMs.
This synthetic dataset refers to artificially generated data that mimics the statistical patterns of real-world data but does not contain actual historical values. Instead, it is produced using algorithmic simulations that replicate realistic distributions and dependencies. Despite lacking real instances, such data can be effectively used for training, testing, and validating ML models, especially in cases like the one in this paper, where real-world data is limited or unavailable [48].
The dataset used to evaluate the novel ensemble model, as presented in this case study, is attached to this paper as Supplementary Materials and is available at:
This dataset was obtained by prompting ChatGPT-4 with the following query:
“I am looking for a database of shooting results using three different types of pistols, depending on 11 factors, with a total of 1000 entries. It must refer to exactly three specific, arbitrarily chosen types of pistols and exactly eleven arbitrarily chosen factors that influence the outcome of the shooting. The table needs to include, in the last column, the success of shooting—i.e., hit or miss.”
The resulting master dataset and its three derived subsets, each corresponding to a specific pistol type, are now ready for preprocessing. This step forms the first phase of the first layer in the proposed three-layer ensemble model (see Figure 1). The data is prepared for the application of binary regression and classification algorithms, which will be used in the second layer of the model.
The following Excel files are included as Supplementary Materials:
  • ChatGPTsyntheticGeneratedDatasetHandgunShooting-Colt1911Bereta92Glock17.xls;
  • ChatGPTsyntheticGeneratedDatasetHandgunShooting-Colt1911.xls;
  • ChatGPTsyntheticGeneratedDatasetHandgunShooting-Bereta92.xls;
  • ChatGPTsyntheticGeneratedDatasetHandgunShooting-Glock17.xls.

2.2.2. MCDM Evaluation of the Synthetic Generated Data to Solve the Considered Problem

To evaluate the suitability of the generated dataset for solving the considered problem, the authors used a free online TOPSIS tool [49] to determine the ranking order of the three considered pistol types. The goal was to verify whether the results obtained using the synthetically generated data matched those based on real-world data, specifically, the decision made by the U.S. Army.
As a preparatory step, the authors first applied the AHP method using a freely available AHP tool [50] to assess the relative importance of the individual factors. This assessment was based on the complete (non-reduced) synthesized dataset containing all three pistol types:
ChatGPTsyntheticGeneratedDatasetHandgunShooting-Colt1911Bereta92Glock17.xls.
In addition to the 11 factors originally included in the dataset, the authors incorporated a “price” factor—which was not part of the synthetic data—by using real-world pricing information and expert opinion to estimate its relative importance. This allowed for a more comprehensive analysis.
With the weights of all relevant criteria determined, the authors were able to construct a normalized multi-criteria decision matrix, which served as the input for the TOPSIS tool. This matrix was then used to evaluate the applicability of the synthetic dataset by checking whether it would produce the same decision—i.e., selecting the same optimal pistol type—as was made historically by the U.S. Army.
  • AHP
The essence of the AHP method can be summarized as follows: it structures a complex decision-making problem into a hierarchy that may involve multiple criteria, several alternatives, and potentially multiple decision makers (i.e., group decision making), distributed across several hierarchical levels. The method determines the weighting coefficients for both criteria and alternatives at each level, ultimately leading to the formation of a final ranking of alternatives [51].
According to linear algebra theory, any inconsistency or deviation in the values of the decision matrix coefficients leads to corresponding deviations in the matrix's eigenvalues. Therefore, the objective is to find an eigenvalue that is approximately equal to the trace of the matrix (denoted by n), as this condition indicates minimal judgment errors made by the decision maker. The eigenvalue that satisfies this condition is the largest eigenvalue, denoted λmax. The consistency index (CI) measures the deviation of λmax from n and is calculated according to Formula (7):
CI = (λmax − n)/(n − 1)
The AHP method considers an inconsistency of less than 10% acceptable, i.e., a CI value below 0.1.
  • TOPSIS
The problem of multi-criteria analysis with m alternatives and n criteria can also be represented as a geometric system of m points in an n-dimensional space. Starting from this fact, Hwang and Yoon [52] developed the TOPSIS method, based on the idea that the optimal alternative, i.e., the point representing it, should have the smallest distance from the positive-ideal solution (di+) and the largest distance from the negative-ideal solution (di−) in the geometric sense. However, satisfying one of these conditions does not guarantee the other. The TOPSIS method overcomes this problem by defining a proximity index, or relative closeness (Ci), to the positive-ideal solution, which combines the distance of the observed alternative from the positive-ideal solution with its distance from the negative-ideal solution. The alternative with the highest index of closeness to the positive-ideal solution is chosen as optimal.
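The TOPSIS steps just described can be sketched as follows; the decision matrix, weights, and criterion directions below are hypothetical illustrations, not the case-study data:

```python
import numpy as np

def topsis(decision_matrix, weights, benefit):
    """Rank alternatives by their relative closeness Ci to the positive-ideal
    solution; benefit[j] is True when larger values of criterion j are better.
    Returns (Ci for each alternative, index of the optimal alternative)."""
    D = np.asarray(decision_matrix, dtype=float)
    V = weights * D / np.linalg.norm(D, axis=0)              # normalize, weight
    a_pos = np.where(benefit, V.max(axis=0), V.min(axis=0))  # positive-ideal
    a_neg = np.where(benefit, V.min(axis=0), V.max(axis=0))  # negative-ideal
    d_pos = np.linalg.norm(V - a_pos, axis=1)                # distances di+
    d_neg = np.linalg.norm(V - a_neg, axis=1)                # distances di-
    c = d_neg / (d_pos + d_neg)                              # closeness Ci
    return c, int(np.argmax(c))                              # highest Ci wins

# Hypothetical 3 alternatives x 3 criteria: accuracy and reliability are
# benefit criteria, price is a cost criterion (not the paper's data).
D = [[8.0, 7.0, 550.0],
     [9.0, 8.0, 600.0],
     [7.5, 9.0, 500.0]]
c, best = topsis(D, np.array([0.5, 0.3, 0.2]), benefit=np.array([True, True, False]))
print(np.round(c, 3), "best alternative index:", best)
```

Note how Ci resolves the tension between the two distance criteria in a single ranking value.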

3. Proposed Ensemble Method for Handgun Type Selection Forensics

To address the problem outlined in this paper, following the necessary preprocessing of the synthetically generated dataset—including the evaluation of its applicability as an AI-generated dataset—the authors proceeded according to the block diagram of the proposed model presented in Figure 1, specifically its first part, the Input Layer. Subsequently, the authors introduce a novel ensemble algorithm, described in Algorithm 2 and illustrated through a block diagram in Figure 4, which constitutes the second part, the Base Layer of the proposed model (as shown in Figure 1). This layer is responsible for solving the sub-problem of identifying significant factors that influence shooting accuracy across three (or more, in general m) selected types of pistols.
Algorithm 2: Determining biomechanical and atmospheric predictors
Input: datasets with ni valid attributes; i = 1, …, m, where m is the number of pistol types
 * 1. Input preprocessed, verified data for one of the m types of pistols;
 check the balance of the uploaded data;
 IF the dataset is balanced, BM = AUC-ROC; GO TO next step
 ELSE BM = AUC-PRC; GO TO next step
 ** 2. Determine the best of the three classification algorithms (AB, PART, and RF), i.e., the one with the highest value of BM.
 Perform regression with ni attributes; check regression goodness of fit;
 IF regression OK (Hosmer–Lemeshow), the number of attributes is ni;
 GO TO next step ELSE GO TO step 5
 *** 3. Implement feature selection with two filter and one wrapper algorithm;
 using the intersection of the obtained results, determine li < ni factors;
 check regression goodness of fit with li factors;
 IF regression OK, the number of attributes is li; GO TO next step
 ELSE the number of attributes is ni; GO TO step 5
 **** 4. Determine BM1 with the best classifier determined in step 2; check BM1 > BM;
 IF yes, the number of attributes is li; GO TO next step
 ELSE the number of attributes is ni; GO TO step 5
 ***** 5. Check whether the algorithm has finished for all types of pistols;
 IF i < m, GO TO step 1 ELSE GO TO end
  • * The input consists of already preprocessed and validated data, where for each of the m types of pistols, ni attributes influencing shooting success are considered. The dataset is then checked for class balance. If the dataset is balanced, the primary performance metric for evaluating the proposed model will be BM = AUC-ROC; otherwise, BM = AUC-PRC will be used.
  • ** A binary logistic regression is performed using 11 biomechanical attributes (ni in the general case) as predictors. The dependent variable represents the binary outcome, either a hit or a miss. If any predictors exhibit unacceptable levels of multicollinearity, they must be excluded from the proposed model. A classification table is used to evaluate the model's classification accuracy and to compare it against the accuracy expected from random classification. To assess the proportion of variance explained by the model, the Cox–Snell R Square and Nagelkerke R Square statistics are calculated. Finally, the Hosmer and Lemeshow test is applied as the key indicator of the model's goodness of fit. If the goodness-of-fit test is positive, the algorithm continues with the next step; otherwise, it proceeds to step 5 without optimizing the number of factors, which remain the same ni factors. The authors propose that the classification performance of the model be evaluated using the AUC-ROC or AUC-PRC, depending on whether the dataset is balanced, in accordance with standard measures for binary classification problems [30,31]. To determine the best-performing classification algorithm, three algorithms representing distinct classification paradigms, namely Random Forest (from the Decision Tree group), PART (from the Rule-based group), and AdaBoost (from the Meta-learning group), as provided in the WEKA tool, are initially tested. The best-performing algorithm among these will be used again in Step 4, after attribute selection in Step 3 identifies the most relevant predictors using several feature selection methods.
  • *** Using three feature selection algorithms, two from the filter group (ReliefF and CorrelationAttributeEval) and one from the wrapper group (GreedyStepwise), attribute selection is performed. The intersection operation is applied to the three resulting sets obtained from these feature selection methods to identify the common attributes. At the end of this step, the quality of the selected attribute subset is evaluated using binary logistic regression, as described in Step 2. The regression model is now constructed with li < ni predictors, where li is the number of attributes retained after selection. Model evaluation is performed using a classification table, Cox–Snell R Square, Nagelkerke R Square, and the Hosmer and Lemeshow test, as previously explained. If the evaluation confirms the adequacy of the reduced model, the procedure continues using the optimized subset of li attributes; otherwise, it continues with the original ni attributes retained in the model.
  • **** The value of the performance measure BM, defined as in Step 1 and now denoted BM1, is determined using the most effective classifier identified in Step 2 of this algorithm. We check whether BM1 > BM; if so, the optimized subset consists of the li factors, and otherwise the subset retains all ni attributes.
  • ***** In the final step of the proposed algorithm, it is checked whether all three datasets (or m datasets in the general case) corresponding to the considered types of pistols have been processed. If this condition is met, the algorithm terminates; otherwise, it returns to Step 1 and continues the procedure.
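Step 1's balance check and metric choice can be sketched as follows. The 40% minority-class threshold and the scikit-learn stand-ins are assumptions, since the paper does not fix a numeric balance cutoff, and average precision is used here as a practical proxy for AUC-PRC:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import cross_val_predict

def choose_bm(y, threshold=0.4):
    """Step 1 of Algorithm 2: use AUC-ROC when the data is balanced,
    otherwise AUC-PRC (approximated by average precision). The 40%
    minority-share cutoff is an assumption; the paper gives no number."""
    minority_share = min(np.bincount(y)) / len(y)
    if minority_share >= threshold:
        return "AUC-ROC", roc_auc_score
    return "AUC-PRC", average_precision_score

# Roughly balanced synthetic data mimicking the 356-instance pistol datasets
X, y = make_classification(n_samples=356, n_features=11, weights=[0.53, 0.47],
                           random_state=0)
name, metric = choose_bm(y)
probs = cross_val_predict(RandomForestClassifier(random_state=0), X, y,
                          cv=10, method="predict_proba")[:, 1]
bm = round(metric(y, probs), 3)
print(name, bm)  # BM value for the candidate classifier
```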
Figure 4. Proposed stacking ensemble method for determination significant factors in handgun type selection forensics.
To complete the solution of the problem addressed in this paper, following the successful execution of the first and second parts, the third part—the Output Layer of the proposed model, as shown in Figure 1—is applied. In this final step, a voting approach is used to make the forensic decision regarding pistol type selection. The selected pistol type is determined based on which of the m considered pistol types shares the greatest number of significant factors with the identified optimal set.

4. Results and Findings

The proposed model was evaluated through a case study involving the forensic analysis of the decision made by the U.S. Army in 1985 to adopt the Beretta 92 handgun, replacing the previously used Colt 1911.
To enhance the realism of the analysis, the Glock 17, the third most widely used pistol globally, was also included in the evaluation.
Given the impossibility of conducting real-world experiments and the lack of publicly available datasets on factors influencing handgun shooting accuracy, the authors used synthetically generated data produced by the ChatGPT-4 LLM.

4.1. First Layer of the Proposed Model: Checking the Usability of the Synthetic Generated Dataset

As described in the introduction to Section 2 (Materials and Methods), the advancement of modern IT, particularly ML, enables the development of ensemble models for intelligent analysis, including forensic analysis, of previously made decisions, which is the focus of this paper. Leveraging this approach, the authors propose a novel three-layer model for handgun type selection forensics, composed of three fundamental parts, the Input Layer, Base Layer, and Output Layer, as illustrated in Figure 1.
To evaluate the proposed model, the authors conducted a case study focused on the forensic analysis of the U.S. Army’s decision, made in 1985, to replace the Colt 1911 with the Beretta 92 as its standard-issue pistol. As previously explained in Section 2.2 Materials, due to the unavailability of the actual data that informed the original decision as well as the absence of relevant datasets on major open data platforms, the authors opted to generate synthetic data using the ChatGPT-4 AI tool.
This approach was adopted to address the data scarcity in a manner that reflects realistic scenarios.
The data was created in response to the following prompt:
“I am looking for a database of shooting results using three different types of pistols, depending on 11 factors, with a total of 1000 entries. It must refer to exactly three specific, arbitrarily chosen types of pistols and exactly eleven arbitrarily chosen factors that influence the outcome of the shooting. The table needs to include in the last column, success of shooting, i.e., hit or miss”. A synthetic dataset was successfully generated, enabling the implementation and evaluation of the proposed model.
The process began with the execution of the Input Layer, as defined in the first step of the model. The AHP was applied to determine the weights of the 11 factors influencing shooting success for the three considered pistol types.
The results of this analysis are presented in Table 2.
By checking the consistency coefficient according to Equation (7), a value of CI = 0 was obtained, confirming the reliability of the conducted AHP analysis. It is important to note that in this first step of the Input Layer of the proposed model, the generated dataset was also divided into separate subsets—one for each of the considered pistol types—in preparation for use in the next Base Layer of the model.
In the second step of this Input Layer, based on the results obtained in the previous step, an evaluation of the usability of the synthetically generated data was conducted. This was achieved by applying the TOPSIS method and performing a multi-criteria analysis to determine the ranking of the three considered pistol types, as presented in Table 3.
The results obtained using the TOPSIS method are presented in Table 4, and they confirm that the decision made by the relevant U.S. Army authority to replace the Colt 1911 with the Beretta 92 was a correct one. The Beretta 92 was identified as the best alternative among the three pistol types considered, as shown in Table 4.
Namely, the results presented in Table 4 clearly show that preference should be given to the Beretta 92 alternative, as it ranks best on two of the three key parameters: it is the closest to the positive-ideal solution and, most importantly, it has the highest relative closeness index (Ci), even though it is not the farthest from the negative-ideal solution. In this way, the authors confirmed the usability of the synthetically generated dataset, as it led to the same conclusion as the actual decision made by the U.S. Army authority.

4.2. Second Layer of Proposed Model: Stacking Ensemble for Determining Most Important Factors

Since the results obtained at the end of the Input Layer of the proposed model clearly demonstrated that the synthetically generated dataset is suitable for constructing an effective ensemble ML algorithm, the authors proceeded with the application of the proposed stacking ensemble ML algorithm (Algorithm 2), as outlined in the Base Layer of the model and illustrated in Figure 4.
In the first step of this algorithm, the authors used the three datasets, one for each type of pistol, that had been prepared earlier in Section 4.1. This paper presents detailed results for two of the considered pistol types: Beretta 92, which was chosen to replace the previously used Colt 1911, and Colt 1911 itself, as the correctness of that replacement decision is being evaluated. For the third type, Glock 17, only summary results are provided.
A binary logistic regression procedure was performed using SPSS on the available datasets as part of the first step of Algorithm 2, starting with the dataset for the Beretta 92 pistol. All 11 biomechanical parameters were used as predictor variables, while the binary outcome variable—shooting success (miss = 0, hit = 1)—was used as the dependent variable.
The results of the applied binary regression analysis are presented in Table 5.
The obtained results show that the binary regression model, using all 11 considered biomechanical factors, explains 3.6% of the variance according to the Cox & Snell R2 and 2.7% according to the Nagelkerke R2, indicating a weak association with the data (values greater than 0 but less than 0.4) [53]. The Hosmer and Lemeshow test returned a significance value of 0.897, which is well above 0.05, indicating a good fit between the data and the model [54,55] and confirming that the model is well calibrated [56]. Additionally, none of the 11 factors were excluded due to multicollinearity, suggesting acceptable correlations among the predictors. The accuracy of random classification was calculated as (168/356)² + (188/356)² = 0.501578, which corresponds to approximately 50.16%. In contrast, the binary regression model achieved a classification accuracy of 56.7%, which is clearly better than random guessing [57]. Given that the dataset consists of 188 instances labeled as miss and 168 as hit, it can be considered balanced. This balance justifies the use of the AUC-ROC as the primary performance measure for evaluating binary classification models. Since model quality is significantly reflected by the AUC-ROC value [58], this metric is determined in the second step of Algorithm 2, where the analysis continues. The validity of proceeding is supported by the acceptable Hosmer and Lemeshow test result of 0.897.
In this second step, three classification algorithms, each representing a different category in the WEKA tool's taxonomy, were applied: Random Forest (Trees group), AdaBoost (Meta group), and PART (Rules group). Model performance was evaluated using the 10-fold cross-validation method [59].
The performance metrics for all three classification algorithms are presented in Table 6, which shows that Random Forest achieved the most accurate prediction results, especially considering the AUC-ROC as the most important evaluation metric. The results in Table 6 also show that the proposed asymmetric optimization model using feature selection has better characteristics than the other classifiers considered from the same groups of Weka classifiers, as applied in Step 2 of the algorithm.
In the third step of the proposed Algorithm 2, a feature selection procedure was applied using three algorithms available in the WEKA tool: two filter methods—ReliefF and CorrelationAttributeEval—and one wrapper method, GreedyStepwise. By performing an intersection operation on the sets of selected and non-selected attributes generated by each of the three feature selection algorithms, the authors identified the common attributes deemed important by all methods. As shown in Table 7, the following eight biomechanical factors were determined to be significant: target distance, shooter experience, trigger pull force, light, ammunition quality, temperature, humidity, and precision.
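The intersection step can be illustrated with ordinary set operations. The three per-algorithm attribute sets below are hypothetical, constructed so that their intersection matches the eight factors reported in Table 7; they are not the actual WEKA outputs:

```python
# Step 3 of Algorithm 2: intersect the attribute sets chosen by the three
# feature selection algorithms (hypothetical outputs for the Beretta 92 data).
relieff = {"target distance", "shooter experience", "trigger pull force",
           "light", "ammunition quality", "temperature", "humidity",
           "precision", "wind speed"}
correlation_eval = {"target distance", "shooter experience", "trigger pull force",
                    "light", "ammunition quality", "temperature", "humidity",
                    "precision", "heart rate"}
greedy_stepwise = {"target distance", "shooter experience", "trigger pull force",
                   "light", "ammunition quality", "temperature", "humidity",
                   "precision", "grip stability"}

common = relieff & correlation_eval & greedy_stepwise  # attributes kept by all three
print(sorted(common), len(common))  # the li = 8 retained factors
```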
In the fourth step of the proposed Algorithm 2, the model checks whether the most important performance measure, AUC-ROC, calculated using the Random Forest classifier (identified in Step 2 as the best-performing algorithm), improves when using the reduced and optimized set of eight selected factors. If the AUC-ROC value improves, the model then verifies the goodness of fit of the binary regression model constructed using these reduced factors. If both conditions are met, the procedure proceeds to Step 5 with the optimized set of factors, where Algorithm 2 is applied to the next dataset (i.e., for the next pistol type) following the same sequence of steps; otherwise, the procedure continues with the original set of factors, as no further optimization through factor reduction is possible.
In our case, the AUC-ROC value improved when using the reduced set of eight factors, as shown in Table 8, and the binary regression model also demonstrated good calibration, as evidenced by the results in Table 9. Therefore, we concluded the analysis for the Beretta 92 dataset using the optimized set of eight factors.
Since we have two additional datasets for Colt 1911 and Glock 17 remaining, we repeated Steps 1 through 5 of the proposed Algorithm 2 for these two pistol types, following the same methodology. The obtained results identified the following important factors:
  • For the Colt 1911 dataset, eight important factors: target distance, shooter experience, grip stability, trigger pull force, light, temperature, humidity, and precision.
  • For the Glock 17 dataset, seven important factors: wind speed, light, heart rate, ammunition quality, temperature, humidity, and precision.

4.3. Third Layer of Proposed Model: Final Decision Making Using Voting

In the third layer of the proposed model, a comparison was made based on the sets of significant factors identified in the previous step—i.e., in the Base Layer of the model—for each of the considered pistol types. By applying a majority voting approach, the model determined which of the pistol types shares the highest number of significant factors with the one being considered for replacement, in this case, the Colt 1911. The results of this comparison are presented in Table 10.
From the data presented in Table 10, the following is evident based on the application of majority voting [60]:
  • Only three of the considered factors are not important for successful shooting with any of the three pistol types considered: Colt 1911, Beretta 92, and Glock 17.
  • The Beretta 92 pistol shares seven out of eight significant factors with the Colt 1911, while only one factor is not common. In contrast, the Glock 17 shares only four significant factors with the Colt 1911, while the remaining four are not matched. A logical conclusion follows from this analysis: the decision made by the U.S. Army to replace the Colt 1911 with the Beretta 92 was well founded and justified.
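The majority-voting comparison in this layer reduces to counting shared significant factors, using the factor sets identified in the Base Layer (Section 4.2):

```python
# Third-layer majority voting: count the significant factors each candidate
# shares with the Colt 1911, using the factor sets reported in Section 4.2.
colt = {"target distance", "shooter experience", "grip stability",
        "trigger pull force", "light", "temperature", "humidity", "precision"}
beretta = {"target distance", "shooter experience", "trigger pull force",
           "light", "ammunition quality", "temperature", "humidity", "precision"}
glock = {"wind speed", "light", "heart rate", "ammunition quality",
         "temperature", "humidity", "precision"}

overlap = {name: len(colt & factors)
           for name, factors in {"Beretta 92": beretta, "Glock 17": glock}.items()}
replacement = max(overlap, key=overlap.get)  # candidate with most shared factors
print(overlap, "->", replacement)  # {'Beretta 92': 7, 'Glock 17': 4} -> Beretta 92
```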

4.4. Discussion of the Obtained Results

Through the described case study of the U.S. Army's choice of a standard-issue pistol at the end of the 20th century, and based on the results presented in the first three subsections of this Section 4, the authors have successfully achieved the main goal of this paper:
  • The results demonstrated the feasibility of constructing an ensemble ML model capable of forensically evaluating the correctness of a weapon selection decision.
  • The results proved that such a model can be constructed and is applicable to many similar forensic problems across different fields of human life.
Furthermore, the obtained results confirm both hypotheses stated in Section 1 of this paper:
  • The conditional hypothesis, asserting that it is possible to use synthetically generated data—such as that produced by ChatGPT-4 or similar tools—in decision analysis;
  • The final hypothesis, stating that it is possible to construct a novel ensemble model that addresses the problem of handgun type selection forensics more effectively than existing state-of-the-art models.
As the authors already mentioned in Section 1, they did not find a similar or equivalent multi-layered forensic decision model in the literature. However, in the middle of the three layers, the Base Layer, a new ensemble algorithm was proposed using an asymmetric optimization procedure that identifies the most important criteria in a multi-criteria decision-making problem, a task commonly encountered in both the literature and practice. The advantages of the proposed model, particularly the methodology of this layer, are presented in Table 8, and the results confirm the overall benefits of the entire model.
The limitations of the proposed model include the inherent constraints of synthetic datasets and the longer execution time required by the stacking ensemble algorithm in the Base Layer. However, neither limitation significantly impacts the practical applicability of the model, especially in scenarios involving incomplete or unavailable real-world data and the development of new analytical frameworks.
The 10-fold cross-validation technique is used in this paper as a valuable method for assessing the quality of synthetic data, as it helps determine how well the synthetic data represents real-world conditions [61]. Specifically, by dividing the dataset into subsets used for training and validation, cross-validation offers several advantages:
  • Reducing overfitting: It helps identify overfitting issues by evaluating model performance on data that was not seen during training.
  • Providing a realistic assessment of performance: By using different subsets for training and validation, it offers a more robust evaluation of how well the model generalizes to new, unseen data.
  • Detecting data quality issues: If the synthetic data does not accurately represent real-world data, poor performance on validation sets may reveal underlying quality issues.
  • Improving model generalization: By validating models on different data partitions, it ensures the model learns generalizable patterns rather than memorizing training data.
Cross-validation is especially important when working with synthetic data, as it helps ensure the synthetic dataset is fit for purpose. In our case, the alternative validation technique of benchmarking against real data is not feasible.
By 2025, it is expected that 70% of enterprises will be using synthetic data for AI and analytics, underscoring the critical role synthetic data plays in AI and machine learning development [61]. However, the success of AI models heavily depends on the quality of the training data. Ensuring data realism remains a major challenge.
According to [62], synthetic data is playing an increasingly significant role not only in forensics but also in other domains of human activity. For example, it is projected that by 2030, all training and test data for AI models could be synthetic. Furthermore, some authors already propose the use of synthetic data in digital traffic to address various security issues in digital communication [63].
The authors plan the following directions for future research:
In a methodological context:
  • To explore the possibility of using new synthetic multi-criteria decision analysis approaches, such as fuzzy methods and group decision-making techniques, to enhance the characteristics of the proposed model [64]. Of particular interest for the first layer of the model are approaches related to large-group decision making, such as the rough integrated asymmetric cloud model under a multi-granularity linguistic environment [65].
  • To include a broader set of classification and feature selection algorithms in order to introduce n-modular redundancy into the second layer of the proposed ensemble algorithm [66]. This enhancement aims to improve prediction performance in similar problem domains across various fields.
  • To incorporate an ablation study within the ensemble of machine learning (ML) and multi-criteria decision-making (MCDM) components. Such a study would systematically remove or disable components of the ensemble model to evaluate their individual contributions, thereby identifying which components are most impactful and guiding further performance improvements.
In the application context:
  • Given that the model is structurally organized for forensic decision analysis, it is worthwhile to explore its application in other domains beyond the choice of weapons. Potential application areas include transportation and traffic [67], biology [68], medicine [69], economics [70], and the public sector [71].

5. Conclusions

Besides the main task of verifying that a model for handgun type selection forensics can be constructed, the authors addressed two basic tasks in this paper. These tasks are directly related to testing the two hypotheses formulated to solve the problem described in the study. The results of the conducted research, obtained using the proposed ensemble model that incorporates the AHP and TOPSIS multi-criteria decision-making methods, confirmed the first (conditional) hypothesis and its associated research question: it is possible to use synthetically generated data, such as that produced by ChatGPT-4 or similar tools, when the ranking of the considered alternatives based on this synthetic data aligns with real-world decisions made using actual data.
The authors also addressed the second (final) hypothesis, demonstrating that it is possible to construct a hybrid model that solves the problem of handgun type selection forensics more effectively than existing state-of-the-art models. This was achieved by aggregating MCDM methods of AHP and TOPSIS with the ML ensemble of three specific classification methods and a binary regression algorithm into a single stacking ensemble ML model, which incorporates three distinct feature selection algorithms as part of its combiner strategy and a voting method at the end. Such an obtained and proposed model leads to superior performance compared to the individual application of each method or previously known models.
Both hypotheses were validated using a case study of the U.S. Army's decision at the end of the twentieth century to replace the Colt 1911 pistol with the Beretta 92. The analysis also included a third widely used pistol, the Glock 17, and was based on synthetic data generated via ChatGPT-4, an LLM. The results obtained using the proposed model, which is based on an asymmetric optimization procedure, confirmed that a model for handgun type selection forensics can be constructed. The results were evaluated using 10-fold cross-validation in WEKA software for each applied ML algorithm, and with the Hosmer–Lemeshow test, along with the Cox–Snell and Nagelkerke R2 tests, in SPSS software to assess the goodness of fit of the applied binary regression model.
The authors concluded that the proposed model demonstrated no significant limitations, apart from the fact that synthetic datasets are not directly derived from real-world data. However, this limitation is mitigated by the practical value of such synthetic data in evaluating newly proposed models for problems where real data is scarce or unavailable.
For future work, the authors plan to explore new synthetic multi-criteria decision analysis approaches, particularly fuzzy methods and group decision-making techniques, to enhance the characteristics of the proposed model, and to include a broader set of classification and feature selection algorithms, thereby introducing n-modular redundancy into the construction of the proposed ensemble algorithm.

Supplementary Materials

The following supporting information, which was used to generate the dataset, can be downloaded at: https://it.fdb.edu.rs/wp-content/uploads/2023/04/ChatGPTsyntheticGeneratedDataset.zip.

Author Contributions

Conceptualization, A.A. and R.R.; Methodology, D.R.; Software, V.V. and S.I.; Validation, M.R.; Formal analysis, A.A. and R.R.; Investigation, A.A. and R.R.; Resources, V.V. and S.I.; Data curation, V.V. and S.I.; Writing—original draft, A.A., R.R. and D.R.; Writing—review & editing, D.J.; Visualization, D.J.; Supervision, D.R.; Project administration, M.R.; Funding acquisition, M.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data supporting the reported results can be found at https://it.fdb.edu.rs/wp-content/uploads/2023/04/ChatGPTsyntheticGeneratedDataset.zip (accessed on 1 July 2025).

Acknowledgments

During the preparation of this manuscript/study, the authors used ChatGPT-4 for the purposes of generating the synthetic dataset. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MCDM	Multi-criteria decision making
ML	Machine learning
ROC	Receiver operating characteristic
PRC	Precision–recall curve
AUCROC	Area under the ROC curve
AUCPRC	Area under the PRC curve
AHP	Analytic hierarchy process
TOPSIS	Technique for order preference by similarity to ideal solution
LLM	Large language model
IT	Information technologies

References

  1. Aleksić, A.; Nedeljković, S.; Jovanović, M.; Ranđelović, M.; Vuković, M.; Stojanović, V.; Radovanović, R.; Ranđelović, M.; Ranđelović, D. Prediction of Important Factors for Bleeding in Liver Cirrhosis Disease Using Ensemble Data Mining Approach. Mathematics 2020, 8, 1887. [Google Scholar] [CrossRef]
  2. Kemiveš, A.; Ranđelović, M.; Barjaktarović, L.; Đikanović, P.; Čabarkapa, M.; Ranđelović, D. Identifying Key Indicators for Successful Foreign Direct Investment through Asymmetric Optimization Using Machine Learning. Symmetry 2024, 16, 1346. [Google Scholar] [CrossRef]
  3. Ranđelović, M.; Aleksić, A.; Radovanović, R.; Stojanović, V.; Čabarkapa, M.; Ranđelović, D. One Aggregated Approach in Multidisciplinary Based Modeling to Predict Further Students’ Education. Mathematics 2022, 10, 2381. [Google Scholar] [CrossRef]
  4. Aleksić, A.; Ranđelović, M.; Ranđelović, D. Using Machine Learning in Predicting the Impact of Meteorological Parameters on Traffic Incidents. Mathematics 2023, 11, 479. [Google Scholar] [CrossRef]
  5. Mikhaylova, S.S.; Grineva, N.V. Development of a binary classification model based on small data using machine learning methods. Econ. Probl. Leg. Pract. 2024, 20, 129–140. [Google Scholar] [CrossRef]
  6. Mišić, J.; Kemiveš, A.; Ranđelović, M.; Ranđelović, D. An Asymmetric Ensemble Method for Determining the Importance of Individual Factors of a Univariate Problem. Symmetry 2023, 15, 2050. [Google Scholar] [CrossRef]
  7. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  8. White, M.D. Identifying Situational Predictors of Police Shootings Using Multivariate Analysis. Polic. Int. J. Police Strateg. Manag. 2002, 25, 726–751. [Google Scholar] [CrossRef]
  9. Düzen, M.A.; Bölükbaşı, I.B.; Çalık, E. How to combine ML and MCDM techniques: An extended bibliometric analysis. J. Innov. Eng. Nat. Sci. 2024, 4, 642–657. [Google Scholar] [CrossRef]
  10. Dagdeviren, M.; Yavuz, S.; Kılınç, N. Weapon Selection Using the AHP and TOPSIS Methods under Fuzzy Environment. Expert Syst. Appl. 2009, 36, 8143–8151. [Google Scholar] [CrossRef]
  11. Ashari, H.; Parsaei, M. Application of the multi-criteria decision method ELECTRE III for the Weapon selection. Decis. Sci. Lett. 2014, 3, 511–522. [Google Scholar] [CrossRef]
  12. Jiang, J.; Liu, X.; Garg, H.; Zhang, S. Large group decision-making based on interval rough integrated cloud model. Adv. Eng. Inform. 2023, 56, 101964. [Google Scholar] [CrossRef]
  13. Kaya, V.; Tuncer, S.; Baran, A. Detection and Classification of Different Weapon Types Using Deep Learning. Appl. Sci. 2021, 11, 7535. [Google Scholar] [CrossRef]
  14. Jenkins, S.; Lowrey, D. A Comparative Analysis of Current and Planned Small Arms Weapon Systems; MBA Professional Report; Naval Postgraduate School: Monterey, CA, USA, 2004. [Google Scholar]
  15. Kukolj, M. Pištolj ili revolver za starješine JNA. Vojnoteh. Glas. 1991, 39, 24–34. [Google Scholar] [CrossRef]
  16. Mason, B.R.; Cowan, L.F.; Gonczol, T. Factors Affecting Accuracy in Pistol Shooting. In EXCEL Publication of the Australian Institute of Sport; Fricker, P., Telford, R., Eds.; Australian Institute of Sport: Canberra, Australia, 1990; Volume 6, pp. 2–6. [Google Scholar]
  17. Goonetilleke, R.S.; Hoffmann, E.R.; Lau, W.C. Pistol Shooting Accuracy as Dependent on Experience, Eyes Being Opened, and Available Viewing Time. Appl. Ergon. 2009, 40, 500–508. [Google Scholar] [CrossRef]
  18. Verma, G.K.; Dhillon, A. A Handheld Gun Detection using Faster R-CNN Deep Learning. In Proceedings of the 7th International Conference on Computer and Communication Technology, Allahabad, India, 24–26 November 2017; pp. 84–88. [Google Scholar] [CrossRef]
  19. Pugliese, R.; Regondi, S.; Marini, R. Machine Learning-Based Approach: Global Trends, Research Directions, and Regulatory Standpoints. Data Sci. Manag. 2021, 4, 19–29. [Google Scholar] [CrossRef]
  20. Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput. Sci. 2021, 2, 160. [Google Scholar] [CrossRef] [PubMed]
  21. Nayerifard, T.; Amintoosi, H.; Bafghi, A.; Dehghantanha, A. Machine Learning in Digital Forensics: A Systematic Literature Review. arXiv 2023, arXiv:2306.04965. [Google Scholar] [CrossRef]
  22. Krivchenkov, A.; Misnevs, B.; Pavlyuk, D. Intelligent Methods in Digital Forensics: State of the Art. In Reliability and Statistics in Transportation and Communication, RelStat 2018; Kabashkin, I., Yatskiv (Jackiva), I., Prentkovskis, O., Eds.; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2019; Volume 68. [Google Scholar] [CrossRef]
  23. Ubani, S.; Polat, S.O.; Nielsen, R. ZeroShotDataAug: Generating and Augmenting Training Data with ChatGPT. arXiv 2023, arXiv:2304.14334. [Google Scholar] [CrossRef]
  24. Lingo, R. Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study on Telematics Data with ChatGPT. arXiv 2023, arXiv:2306.13700. [Google Scholar] [CrossRef]
  25. Domínguez-Almendros, S.; Benítez-Parejo, N.; Gonzalez-Ramirez, A.R. Logistic Regression Models. Allergol. Immunopathol. 2011, 39, 295–305. [Google Scholar] [CrossRef]
  26. SPSS Statistics 17.0 Brief Guide. Available online: http://www.sussex.ac.uk/its/pdfs/SPSS_Statistics_Brief_Guide_17.0.pdf (accessed on 20 March 2025).
  27. Romero, C.; Ventura, S.; Espejo, P.; Hervas, C. Data Mining Algorithms to Classify Students. In Proceedings of the 1st International Conference on Educational Data Mining (EDM08), Montreal, QC, Canada, 20–21 June 2008; pp. 20–21. [Google Scholar]
  28. Witten, H.; Eibe, F. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed.; Morgan Kaufmann: Burlington, MA, USA, 2005. [Google Scholar]
  29. Benoit, G. Data Mining. Annu. Rev. Inf. Sci. Technol. 2002, 36, 265–310. [Google Scholar] [CrossRef]
  30. Gong, M. A Novel Performance Measure for Machine Learning Classification. Int. J. Manag. Inf. Technol. 2021, 13, 11–19. [Google Scholar] [CrossRef]
  31. Watson, D.; Reichard, K.; Isaacson, A. A Case Study Comparing ROC and PRC Curves for Imbalanced Data. Annu. Conf. PHM Soc. 2023, 15. [Google Scholar] [CrossRef]
  32. University of Waikato. WEKA. Available online: http://www.cs.waikato.ac.nz/ml/weka (accessed on 20 March 2025).
  33. Friedman, J.; Hastie, T.; Tibshirani, R. Additive Logistic Regression: A Statistical View of Boosting. Ann. Stat. 2000, 28, 337–407. [Google Scholar] [CrossRef]
  34. Loh, W.Y. Classification and Regression Trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 14–23. [Google Scholar] [CrossRef]
  35. Rucco, M.; Giannini, F.; Lupinetti, K.; Monti, M. A methodology for part classification with supervised machine learning. Artif. Intell. Eng. Des. Anal. Manuf. 2018, 33, 1–14. [Google Scholar] [CrossRef]
  36. Kumbhakar, S.C.; Lovell, C.A.K. Stochastic Frontier Analysis; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
  37. MacKay, D. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  38. Mitchell, T. Machine Learning; McGraw-Hill Science/Engineering/Math: New York, NY, USA, 1997. [Google Scholar]
  39. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach; Prentice Hall: Hoboken, NJ, USA, 2003. [Google Scholar]
  40. Urbanowicz, R.J.; Meeker, M.; La Cava, W.; Olson, R.S.; Moore, J.H. Relief-Based Feature Selection: Introduction and Review. arXiv 2018, arXiv:1711.08421. [Google Scholar] [CrossRef]
  41. Sugianela, Y.; Ahmad, T. Pearson Correlation Attribute Feature Selection Evaluation-based for Intrusion Detection System. In Proceedings of the International Conference on Smart Technology and Applications (ICoSTA) 2020, Surabaya, Indonesia, 20 February 2020; pp. 1–5. [Google Scholar] [CrossRef]
  42. Blessie, E.C.; Eswaramurthy, K. Sigmis: A Feature Selection Algorithm Using Correlation-Based Method. J. Algorithms Comput. Technol. 2012, 6, 385–394. [Google Scholar] [CrossRef]
  43. Programiz. Greedy Algorithm. Available online: https://www.programiz.com/dsa/greedy-algorithm (accessed on 15 March 2025).
  44. Girish, S.; Chandrashekar, F. A Survey on Feature Selection Methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
  45. Zhou, Z.H. Ensemble Methods Foundations and Algorithm; Chapman and Hall/CRC: New York, NY, USA, 2012. [Google Scholar] [CrossRef]
  46. Ahad, A. Vote-Based: Ensemble Approach. Sak. Univ. J. Sci. 2021, 25, 858–866. [Google Scholar] [CrossRef]
  47. Yousefi, Z.; Alesheikh, A.A.; Jafari, A.; Torktatari, S.; Sharif, M. Stacking Ensemble Technique Using Optimized Machine Learning Models with Boruta–XGBoost Feature Selection for Landslide Susceptibility Mapping: A Case of Kermanshah Province, Iran. Information 2024, 15, 689. [Google Scholar] [CrossRef]
  48. Goyal, M.; Mahmoud, Q.H. A Systematic Review of Synthetic Data Generation Techniques Using Generative AI. Electronics 2024, 13, 3509. [Google Scholar] [CrossRef]
  49. SANNA. Available online: https://nb.vse.cz/~jablon/sanna.htm (accessed on 15 May 2025).
  50. AHP Online System—AHP–OS. Available online: https://bpmsg.com/ahp/ahp.php (accessed on 15 May 2025).
  51. Saaty, T.L. The Analytic Hierarchy Process; McGraw-Hill: New York, NY, USA, 1980. [Google Scholar]
  52. Yoon, K.; Hwang, C.L. Multiple Attribute Decision Making: An Introduction; Sage Publications: Thousand Oaks, CA, USA, 1995; Volume 104. [Google Scholar] [CrossRef]
  53. Shah, M. Re: Could Someone Explain Me about Nagelkerke R Square in Logit Regression Analysis? 2023. Available online: https://www.researchgate.net/post/Could_someone_explain_me_about_Nagelkerke_R_Square_in_Logit_Regression_analysis/63f330cfe22cf468000037c9/citation/download (accessed on 10 May 2025).
  54. Hosmer, D.W.; Hosmer, T.; le Cessie, S.; Lemeshow, S. A comparison of goodness-of-fit tests for the logistic regression model. Stat. Med. 1997, 16, 965–980. [Google Scholar] [CrossRef]
  55. Hosmer, D.W.; Lemeshow, S. Applied Logistic Regression, 2nd ed.; John Wiley and Sons Inc.: New York, NY, USA, 2000. [Google Scholar]
  56. Monteiro, A.d.R.D.; Feital, T.d.S.; Pinto, J.C. A Numerical Procedure for Multivariate Calibration Using Heteroscedastic Principal Components Regression. Processes 2021, 9, 1686. [Google Scholar] [CrossRef]
  57. Hair, J.F.; Anderson, R.E.; Tatham, R.L.; Black, W.C. Multivariate Data Analysis; Prentice-Hall, Inc.: New York, NY, USA, 1998. [Google Scholar]
  58. Yang, T.; Ying, Y. AUC Maximization in the Era of Big Data and AI: A Survey. ACM Comput. Surv. 2022, 37, 1–37. [Google Scholar] [CrossRef]
  59. Gorriz, J.M.; Segovia, F.; Ramirez, J.; Ortiz, A.; Suckling, J. Is K-fold cross validation the best model selection method for Machine Learning? arXiv 2024, arXiv:2401.16407. [Google Scholar] [CrossRef]
  60. Emerson, P. Majority Voting—A Critique Preferential Decision-Making—An Alternative. J. Politics Law 2024, 17, 47–57. [Google Scholar] [CrossRef]
  61. Abramov, M. Ensuring Quality and Realism in Synthetic Data. Available online: https://keymakr.com/blog/ensuring-quality-and-realism-in-synthetic-data/ (accessed on 20 July 2025).
  62. Abbasi-Azar, M.; Teimouri, M.; Nikray, M. Blind protocol identification using synthetic dataset: A case study on geographic protocols. Forensic Sci. Int. Digit. Investig. 2025, 53, 301911. [Google Scholar] [CrossRef]
  63. Göbel, T.; Schäfer, T.; Hachenberger, J.; Türr, J.; Baier, H. A Novel Approach for Generating Synthetic Datasets for Digital Forensics. In Advances in Digital Forensics XVI. DigitalForensics 2020; Peterson, G., Shenoi, S., Eds.; IFIP Advances in Information and Communication Technology; Springer: Cham, Switzerland, 2020; Volume 589. [Google Scholar] [CrossRef]
  64. Anand, M.C.; Kalaiarasi, K.; Martin, N.; Ranjitha, B.; Priyadharshini, S.S.; Tiwari, M. Fuzzy C-Means Clustering with MAIRCA -MCDM Method in Classifying Feasible Logistic Suppliers of Electrical Products. In Proceedings of the 2023 First International Conference on Cyber Physical Systems, Power Electronics and Electric Vehicles (ICPEEV), Hyderabad, India, 28–30 September 2023; pp. 1–7. [Google Scholar] [CrossRef]
  65. Jiang, J.; Liu, X.; Wang, Z.; Ding, W.; Zhang, S.; Xu, H. Large group decision-making with a rough integrated asymmetric cloud model under multi-granularity linguistic environment. Inf. Sci. 2024, 678, 120994. [Google Scholar] [CrossRef]
  66. Lyons, E.; Vanderkulk, W. The Use of Triple-Modular Redundancy to Improve Computer Reliability. IBM J. Res. Dev. 1962, 6, 200–209. [Google Scholar] [CrossRef]
  67. Manzolli, J.A.; Yu, J.; Miranda-Moreno, L. Synthetic multi-criteria decision analysis (S-MCDA): A new framework for participatory transportation planning. Transp. Res. Interdiscip. Perspect. 2025, 31, 101463. [Google Scholar] [CrossRef]
  68. Sanduleanu, S.; Ersahin, K.; Bremm, J.; Talibova, N.; Damer, T.; Erdogan, M.; Kottlors, J.; Goertz, L.; Bruns, C.; Maintz, D.; et al. Feasibility of GPT-3.5 versus Machine Learning for Automated Surgical Decision-Making Determination: A Multicenter Study on Suspected Appendicitis. AI 2024, 5, 1942–1954. [Google Scholar] [CrossRef]
  69. Chowdhury, N.K.; Kabir, M.A.; Rahman, M.; Islam, S.M.S. Machine learning for detecting COVID-19 from cough sounds: An ensemble-based MCDM method. Comput. Biol. Med. 2022, 145, 105405. [Google Scholar] [CrossRef]
  70. Chowdhury, S.J.; Mahi, M.I.; Saimon, S.A.; Urme, A.N.; Nabil, R.H. An Integrated Approach of MCDM Methods and Machine Learning Algorithms for Employees’ Churn Predict. In Proceedings of the 2023 3rd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 7–8 January 2023; pp. 68–73. [Google Scholar] [CrossRef]
  71. Fischer-Abaigar, U.; Kern, C.; Barda, N.; Kreuter, F. Bridging the gap: Towards an expanded toolkit for AI-driven decision-making in the public sector. Gov. Inf. Q. 2024, 41, 101976. [Google Scholar] [CrossRef]
Figure 1. The functional block schema of a three-layer model for handgun type selection forensics.
Figure 2. An ensemble voting algorithm for determining the most similar of three (in the general case, m) types of guns based on previously determined significant shooting factors.
Figure 3. Stacking ensemble algorithm for determining significant factors for pistol shooting.
Table 1. The confusion matrix for the two-class classifier.

|                 | Predicted Positive  | Predicted Negative  |
| Actual Positive | TP (true positive)  | FN (false negative) |
| Actual Negative | FP (false positive) | TN (true negative)  |
Table 2. Using AHP to determine the importance of factors in the considered problem (upper triangle of the pairwise-comparison matrix; reciprocal values below the diagonal are implied).

| Criterion | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | Weights |
| Distance_to_Target/C1 | 1 | 1 | 7/5 | 7/3 | 7/3 | 7/3 | 7/5 | 7/5 | 7 | 7 | 7/9 | 1 | 7/9 | 0.107 |
| Wind_Speed/C2 | | 1 | 7/5 | 7/3 | 7/3 | 7/3 | 7/5 | 7/5 | 7 | 7 | 7/9 | 1 | 7/9 | 0.107 |
| Shooter_Experience/C3 | | | 1 | 5/3 | 5/3 | 5/3 | 1 | 1 | 5 | 5 | 5/9 | 5/7 | 5/9 | 0.076 |
| Grip_Stability/C4 | | | | 1 | 1 | 1 | 3/5 | 3/5 | 3 | 3 | 3/9 | 3/7 | 3/9 | 0.046 |
| Trigger_Pull_Force/C5 | | | | | 1 | 1 | 3/5 | 3/5 | 3 | 3 | 3/9 | 3/7 | 3/9 | 0.046 |
| Light_Conditions/C6 | | | | | | 1 | 3/5 | 3/5 | 3 | 3 | 3/9 | 3/7 | 3/9 | 0.046 |
| Heart_Rate/C7 | | | | | | | 1 | 1 | 5 | 5 | 5/9 | 5/7 | 5/9 | 0.076 |
| Ammo_Quality/C8 | | | | | | | | 1 | 5 | 5 | 5/9 | 5/7 | 5/9 | 0.076 |
| Temperature/C9 | | | | | | | | | 1 | 1 | 1/9 | 1/7 | 1/9 | 0.015 |
| Humidity/C10 | | | | | | | | | | 1 | 1/9 | 1/7 | 1/9 | 0.015 |
| Precision/C11 | | | | | | | | | | | 1 | 9/7 | 1 | 0.138 |
| Outcome/C12 | | | | | | | | | | | | 1 | 7/9 | 0.107 |
| Price/C13 | | | | | | | | | | | | | 1 | 0.138 |
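The weights in the last column of an AHP pairwise-comparison matrix such as Table 2 can be approximated by the row geometric-mean method. The sketch below uses a toy 3x3 reciprocal matrix with illustrative values, not the paper's 13x13 matrix; the eigenvector method used by AHP tools gives very similar weights for consistent matrices.

```python
import math

def ahp_weights(M):
    """Approximate AHP priority weights by normalized row geometric means."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in M]
    total = sum(gm)
    return [g / total for g in gm]

# Toy 3x3 reciprocal comparison matrix (illustrative values only):
# criterion 1 is judged 3x as important as criterion 2, 5x as criterion 3.
M = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w = ahp_weights(M)
print([round(x, 3) for x in w])  # weights sum to 1, in decreasing order here
```

The normalization step guarantees the weights sum to one, matching the convention used for the Weights column of Tables 2-4.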
Table 3. The multi-criteria decision matrix for the evaluation of the suitability of the generated data.

| Criterion | 1. Alternative Colt1911 | 2. Alternative Bereta92 | 3. Alternative Glock17 | Weights |
| Distance_to_Target | 31.09893188 | 28.89752522 | 30.10498801 | 0.10700 |
| Wind_Speed | 9.987965214 | 10.13798654 | 9.818743842 | 0.10700 |
| Shooter_Experience | 0.67721519 | 0.662921348 | 0.716463415 | 0.07600 |
| Grip_Stability | 5.48497407 | 5.583323259 | 5.342132363 | 0.04600 |
| Trigger_Pull_Force | 3.523007728 | 3.482858324 | 3.48476578 | 0.04600 |
| Light_Conditions | 560.5809053 | 540.6561038 | 547.8948968 | 0.04600 |
| Heart_Rate | 117.9697391 | 117.5425839 | 119.9949317 | 0.07600 |
| Ammo_Quality | 0.683544304 | 0.643258427 | 0.670731707 | 0.07600 |
| Temperature | 15.02015166 | 14.85945511 | 15.00268663 | 0.01500 |
| Humidity | 50.66014474 | 50.28772977 | 48.44931696 | 0.01500 |
| Precision | 74.45222776 | 74.78560132 | 72.75204747 | 0.13800 |
| Outcome | 0.522151899 | 0.471910112 | 0.548780488 | 0.10700 |
| Price | 74 | 111 | 100 | 0.13800 |
Table 4. TOPSIS results applied on a normalized decision matrix for evaluation of the suitability of the generated data.

| Criterion | 1. Alternative Colt1911 | 2. Alternative Bereta92 | 3. Alternative Glock17 | Weights |
| Distance_to_Target | 0.0643 | 0.0598 | 0.0623 | 0.107 |
| Wind_Speed | 0.0622 | 0.0631 | 0.0611 | 0.107 |
| Shooter_Experience | 0.0436 | 0.0427 | 0.0461 | 0.076 |
| Grip_Stability | 0.0268 | 0.0272 | 0.0261 | 0.046 |
| Trigger_Pull_Force | 0.0269 | 0.0266 | 0.0266 | 0.046 |
| Light_Conditions | 0.0272 | 0.0263 | 0.0266 | 0.046 |
| Heart_Rate | 0.0439 | 0.0438 | 0.0447 | 0.076 |
| Ammo_Quality | 0.0453 | 0.0426 | 0.0445 | 0.076 |
| Temperature | 0.0087 | 0.0086 | 0.0087 | 0.015 |
| Humidity | 0.0088 | 0.0088 | 0.0084 | 0.015 |
| Precision | 0.0807 | 0.0810 | 0.0788 | 0.138 |
| Outcome | 0.0630 | 0.0569 | 0.0662 | 0.107 |
| Price | 0.0616 | 0.0925 | 0.0833 | 0.138 |
| di− | 0.21318 | 0.03101 | 0.02405 | |
| di+ | 0.03114 | 0.01131 | 0.00999 | |
| ci | 0.213 | 0.732 | 0.706 | |
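The di−, di+, and ci rows of a TOPSIS table are computed as the Euclidean distances of each alternative to the negative and positive ideal solutions and the resulting closeness coefficients. The sketch below is a minimal illustration on a toy weighted normalized matrix (two criteria, three alternatives); for simplicity all criteria are treated as benefit criteria, which need not match the cost/benefit directions used in the paper.

```python
import math

def topsis_closeness(V):
    """Closeness coefficients ci = d-/(d- + d+) for a weighted normalized
    matrix V (rows = alternatives, columns = criteria); all criteria are
    treated as benefit criteria in this sketch."""
    ideal = [max(col) for col in zip(*V)]  # positive ideal solution A+
    anti = [min(col) for col in zip(*V)]   # negative ideal solution A-
    d_pos = [math.dist(row, ideal) for row in V]
    d_neg = [math.dist(row, anti) for row in V]
    return [dn / (dp + dn) for dp, dn in zip(d_pos, d_neg)]

# Toy weighted normalized values for three alternatives, two criteria.
V = [[0.064, 0.062],
     [0.060, 0.063],
     [0.062, 0.061]]
ci = topsis_closeness(V)
print([round(c, 3) for c in ci])  # the alternative with the largest ci ranks first
```

Each ci lies in [0, 1], and ranking the alternatives by decreasing ci reproduces the final TOPSIS ordering read off the last row of Table 4.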
Table 5. Binary regression: Bereta92 dataset with all 11 factors.

Binary Regression: Variables in the Equation
| Step 1 | B | S.E. | Wald | df | Sig. | Exp(B) | 95% C.I. Lower | 95% C.I. Upper |
| Distance_to_Target | −0.005 | 0.009 | 0.285 | 1 | 0.594 | 0.995 | 0.977 | 1.013 |
| Wind_Speed | −0.037 | 0.018 | 4.232 | 1 | 0.040 | 0.963 | 0.930 | 0.998 |
| Shooter_Experience | 0.017 | 0.224 | 0.006 | 1 | 0.939 | 1.017 | 0.656 | 1.577 |
| Grip_Stability | 0.096 | 0.158 | 0.368 | 1 | 0.544 | 1.101 | 0.807 | 1.500 |
| Trigger_Pull_Force | 0.188 | 0.114 | 2.729 | 1 | 0.099 | 1.206 | 0.966 | 1.507 |
| Light_Conditions | 0.000 | 0.000 | 0.159 | 1 | 0.690 | 1.000 | 0.999 | 1.001 |
| Heart_Rate | −0.001 | 0.003 | 0.090 | 1 | 0.765 | 0.999 | 0.992 | 1.006 |
| Ammo_Quality | 0.111 | 0.226 | 0.243 | 1 | 0.622 | 1.118 | 0.718 | 1.741 |
| Temperature | 0.009 | 0.008 | 1.341 | 1 | 0.247 | 1.009 | 0.994 | 1.024 |
| Humidity | 0.005 | 0.005 | 0.982 | 1 | 0.322 | 1.005 | 0.996 | 1.014 |
| Precision | −0.013 | 0.017 | 0.584 | 1 | 0.445 | 0.987 | 0.954 | 1.021 |

Classification Table a,b
| Observed | Predicted 0 | Predicted 1 | Percentage Correct |
| Step 1, Outcome = 0 | 128 | 60 | 68.1 |
| Step 1, Outcome = 1 | 94 | 74 | 44.0 |
| Overall Percentage | | | 56.7 |
a. Constant is included in the model. b. The cut-off value is 0.500.

Model Summary
| Step | −2 Log likelihood | Cox–Snell R Square | Nagelkerke R Square |
| 1 | 483.891 c | 0.036 | 0.027 |
c. Estimation terminated at iteration 3 because parameter estimates changed by less than 0.001.

Hosmer and Lemeshow Test
| Step | Chi-square | df | Sig. |
| 1 | 3.528 | 8 | 0.897 |
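The Cox–Snell and Nagelkerke R² values that SPSS reports in the Model Summary can be reproduced from the log-likelihoods of the null (intercept-only) and fitted models. The sketch below uses illustrative log-likelihood values and sample size, not the numbers behind Table 5.

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Cox-Snell and Nagelkerke pseudo-R^2 from the log-likelihoods of the
    null (intercept-only) and fitted models, for n observations."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)  # Cox-Snell's upper bound
    nagelkerke = cox_snell / max_cox_snell         # rescaled to reach 1
    return cox_snell, nagelkerke

# Illustrative log-likelihoods and sample size (not Table 5's values).
cs, nk = pseudo_r2(ll_null=-250.0, ll_model=-242.0, n=356)
print(round(cs, 3), round(nk, 3))
```

Because Nagelkerke's statistic divides Cox–Snell by its theoretical maximum, it is always at least as large as Cox–Snell and can reach 1 for a perfect fit.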
Table 6. Performance indicators for Bereta92: classification using all 11 factors.

| Classifier | Precision | Recall | F1 Measure | AUCROC |
| RandomForest | 0.479 | 0.483 | 0.479 | 0.461 |
| AdaBoost | 0.441 | 0.458 | 0.437 | 0.433 |
| PART | 0.422 | 0.466 | 0.409 | 0.444 |
Table 7. Feature selection algorithms Bereta92: selection of important factors.

| Factor | Majority Voting | Relief | CorrelationAttributeEval | GreedyStepwise |
| Distance_to_Target | | | | |
| Wind_Speed | | | | |
| Shooter_Experience | | | | |
| Grip_Stability | | | | |
| Trigger_Pull_Force | | | | |
| Light_Conditions | | | | |
| Heart_Rate | | | | |
| Ammo_Quality | | | | |
| Temperature | | | | |
| Humidity | | | | |
| Precision | | | | |
[The selection marks in this table did not survive text extraction; the eight factors retained by majority voting are those listed in Table 9.]
Table 8. Performance indicators for Bereta92: RandomForest classifier using 8 factors.

| Classifier | Precision | Recall | F1 Measure | AUCROC |
| RandomForest—8 factors | 0.502 | 0.506 | 0.502 | 0.489 |
| RandomForest—11 factors | 0.479 | 0.483 | 0.479 | 0.461 |
| JRip | 0.437 | 0.441 | 0.438 | 0.417 |
| Bagging | 0.440 | 0.444 | 0.441 | 0.444 |
| REPTree | 0.478 | 0.486 | 0.476 | 0.457 |
Table 9. Binary regression using the Bereta92 dataset with the reduced set of 8 factors.

Binary Regression: Variables in the Equation
| Step 1 | B | S.E. | Wald | df | Sig. | Exp(B) | 95% C.I. Lower | 95% C.I. Upper |
| Distance_to_Target | −0.009 | 0.009 | 0.951 | 1 | 0.329 | 0.991 | 0.974 | 1.009 |
| Shooter_Experience | 0.001 | 0.220 | 0.000 | 1 | 0.996 | 1.001 | 0.650 | 1.542 |
| Trigger_Pull_Force | 0.118 | 0.107 | 1.222 | 1 | 0.269 | 1.125 | 0.913 | 1.387 |
| Light_Conditions | 0.000 | 0.000 | 0.613 | 1 | 0.433 | 1.000 | 0.999 | 1.000 |
| Ammo_Quality | 0.043 | 0.222 | 0.037 | 1 | 0.847 | 1.044 | 0.675 | 1.614 |
| Temperature | 0.007 | 0.007 | 0.935 | 1 | 0.334 | 1.007 | 0.993 | 1.022 |
| Humidity | 0.003 | 0.004 | 0.388 | 1 | 0.534 | 1.003 | 0.994 | 1.011 |
| Precision | −0.005 | 0.004 | 1.217 | 1 | 0.270 | 0.995 | 0.987 | 1.004 |

Classification Table a,b
| Observed | Predicted 0 | Predicted 1 | Percentage Correct |
| Step 1, Outcome = 0 | 124 | 64 | 66.0 |
| Step 1, Outcome = 1 | 108 | 60 | 35.7 |
| Overall Percentage | | | 51.7 |
a. Constant is included in the model. b. The cut-off value is 0.500.

Model Summary
| Step | −2 Log likelihood | Cox–Snell R Square | Nagelkerke R Square |
| 1 | 489.011 c | 0.013 | 0.017 |
c. Estimation terminated at iteration 3 because parameter estimates changed by less than 0.001.

Hosmer and Lemeshow Test
| Step | Chi-square | df | Sig. |
| 1 | 9.335 | 8 | 0.315 |
Table 10. Final decision making: majority voting on which type has the most identical significant factors.

| Factor | Colt1911 | Bereta92 | Glock17 |
| Distance_to_Target | | | |
| Wind_Speed | | | |
| Shooter_Experience | | | |
| Grip_Stability | | | |
| Trigger_Pull_Force | | | |
| Light_Conditions | | | |
| Heart_Rate | | | |
| Ammo_Quality | | | |
| Temperature | | | |
| Humidity | | | |
| Precision | | | |
[The voting marks in this table did not survive text extraction; per the analysis in Section 4, the Beretta 92 shares seven of the Colt 1911's eight significant factors, while the Glock 17 shares four.]
