Article

Empirical Comparison of Neural Network Architectures for Prediction of Software Development Effort and Duration

by
Anca-Elena Iordan
Department of Computer Science, Technical University of Cluj-Napoca, 400027 Cluj-Napoca, Romania
Fractal Fract. 2025, 9(11), 702; https://doi.org/10.3390/fractalfract9110702 (registering DOI)
Submission received: 30 July 2025 / Revised: 21 October 2025 / Accepted: 26 October 2025 / Published: 31 October 2025

Abstract

Accurately estimating the effort and duration required for software development is one of the most important challenges in the field of software engineering. In a context where software projects are becoming increasingly complex, project managers face real difficulties in meeting established deadlines and staying within budget constraints. The purpose of this research study is to identify which type of artificial neural network is most suitable for estimating the effort and duration of software development, given the relatively small size of existing datasets. Four datasets were used for software effort and duration prediction: China, Desharnais, Kemerer, and Maxwell. Different types of artificial neural networks were compared: Multilayer Perceptron, Fractal Neural Network, Deep Fully Connected Neural Network, Extreme Learning Machine, and Hybrid Neural Network. Another goal of this research is to analyze the impact of a new hybrid architecture, which combines a Fractal Neural Network with Random Forests in the estimation process. Five metrics were used to compare the accuracy of the artificial neural networks: mean absolute error, median absolute error, root mean square error, coefficient of determination, and mean squared logarithmic error. The Python 3.11 programming language was used in combination with the TensorFlow, Keras, and Scikit-learn libraries to implement the artificial neural networks.

1. Introduction

One of the most complex and critical tasks faced by a project manager during software development is estimating the total effort and duration needed to meet the initial requirements. It is considered one of the major challenges in software engineering [1], as a more accurate estimation increases the chances of success in software project development, completion, and delivery within the specified budget and schedule.
The diversity of software projects has led to the use of many techniques for effort and duration estimation. To support project managers in their tasks, various algorithms (including those based on artificial intelligence [2]) have been used to increase the accuracy of software development effort and duration estimation. Using a dataset to build predictive models is essential for accurately estimating the effort and duration required in software engineering projects [3]. Currently, there is a wide range of datasets available, such as the Albrecht, COCOMO81, China, Desharnais, ISBSG, Kemerer, Kitchenham, Maxwell, Miyazaki, NASA, and Tukutuku datasets. Deep learning techniques [4] typically perform well on relatively large-scale datasets and have demonstrated strong capabilities in estimating target variables in classification and predictive modeling tasks. Since the size of the datasets listed above is relatively small, this study investigates the efficiency of different types of traditional artificial neural networks [5], as well as hybrid artificial neural network architectures [6] obtained by combining them with Random Forests [7], when applied to such datasets.
To support the full comprehension of this research study, the article structure was designed as follows:
  • Section 1 clarifies the motivation that led to the choice of the research topic.
  • Section 2 summarizes the current status and evolution of research on estimating software development effort and duration.
  • Section 3 presents the rationale for selecting the four datasets used, as well as a description of their structure.
  • Section 4 describes in detail the approach adopted for estimating the effort and duration associated with software development, adapted to the following three categories of existing datasets: small-sized, medium-sized, and large-sized.
  • Section 5 analyzes the results obtained by the artificial neural networks used (traditional and hybrid in combination with Random Forests) after the parameter tuning process. It also introduces a new hybrid architecture, referred to as FractalNN_RF, which combines a Fractal Neural Network with the Random Forests algorithm. Integrating these two paradigms is expected to yield a model that improves the accuracy of estimates, increases the stability of predictions, and generalizes better in contexts characterized by structural complexity, especially for medium-sized datasets.
  • Section 6 compares the implemented architectures on the five selected metrics and identifies which architecture is optimal for each type of dataset.
  • Section 7 summarizes the relevant conclusions, highlighting the implications of applying the proposed intelligent methods to datasets of different sizes.

2. Literature Survey

At present, numerous studies are dedicated to the estimation of software development effort and duration, each with its own strengths and limitations. Over time, various methods have been applied in these studies, including those from statistics [8], graph theory [9], heuristic approaches [10], fuzzy logic [11,12], evolutionary computation [13], machine learning [14], and artificial neural networks [15]. These studies rely either on public datasets or on private datasets belonging to specific organizations. The choice of dataset type significantly influences both the accuracy and applicability of the resulting estimations. Given the considerable number of studies dedicated to estimating the effort, duration, and cost of software development, the specialized literature includes several articles offering comparative analyses of the results obtained so far.
The research presented in [16] provides a comprehensive analysis of contemporary trends in the field of software effort estimation, with the objective of grounding future research directions. The paper presents a detailed comparison of relevant contributions, organized in reverse chronological order, highlighting the techniques used, the metrics applied, the reported methodological limitations, as well as the main conclusions drawn by various authors. Overall, the analyzed literature reveals the continuous evolution and a significant diversification of approaches in software effort estimation.
Study [17] investigates the application of machine learning techniques in estimating the effort required for software development, with a particular focus on the benefits brought by ensemble methods. The research initially identified 558 relevant papers in the field, from which, after a rigorous selection process based on quality criteria, 40 articles were retained for in-depth analysis. The study conclusions highlight that the integration of ensemble techniques, in both supervised and unsupervised learning, significantly contributes to improving the accuracy of software effort estimations.
The systematic review conducted in study [18] explores the use of ensemble learning techniques and other artificial-intelligence-based strategies in estimating the effort required for software projects. The review focuses on modern methods involving machine learning, neural networks, and large language models, with the primary goal of improving estimation accuracy. Through extensive research conducted in major scientific databases (ACM Digital Library, IEEE Xplore, ScienceDirect, and Scopus), 826 empirical and theoretical studies were identified, 66 of which were selected for detailed analysis. The findings highlight that machine-learning-based methods have become dominant, with most of the analyzed studies confirming their substantial contribution to increasing estimation accuracy and optimizing software project management. In contrast, the use of non-machine-learning artificial intelligence techniques, such as Bayesian networks, remains limited, and the adoption of large language models is still in its early stages of development and application.
The emergence of modern machine learning (ML) techniques and, more recently, automated machine learning (AutoML), has brought significant transformations to the field of software development effort estimation, contributing to increased accessibility, efficiency, and accuracy in the estimation process. Study [19] presents a systematic literature review on the application of ML and AutoML in software effort estimation, highlighting the relevance of the topic, the methods used, the identified advantages, and the volume of existing research. The adopted methodology involved selecting and analyzing 43 articles published in the last decade, based on the techniques implemented—either conventional machine learning or AutoML. The review findings indicate that in most of the analyzed studies, researchers employed ML techniques for software effort estimation, while the application of AutoML remained limited, thus revealing considerable potential for future research in this area.
The aim of the study presented in [20] is to identify the most effective method for estimating the effort required in software development, using the Long Short-Term Memory (LSTM) and Stacked Long Short-Term Memory (Stacked_LSTM) machine learning algorithms. The study employs six datasets: China, Kitchenham, Kemerer, COCOMO81, Albrecht, and Desharnais. It evaluates performance using three metrics: root mean squared error (RMSE), mean absolute error (MAE), and R-squared. The results indicate that the Stacked_LSTM algorithm provides the best performance across all metrics for the China, Kemerer, and Albrecht datasets. In contrast, the LSTM algorithm yielded better results for the Desharnais and Kitchenham datasets. For the China dataset, the Stacked_LSTM algorithm achieved 0.012 for MAE, 0.016 for RMSE, and 0.981 for R-squared. For the Desharnais dataset, the LSTM algorithm achieved 0.076 for MAE, 0.102 for RMSE, and 0.638 for R-squared. For the Kemerer dataset, the Stacked_LSTM algorithm achieved 0.170 for MAE, 0.301 for RMSE, and 0.336 for R-squared. These results indicate high accuracy and a strong ability to explain the variance in the data for the China dataset, with weaker fits for the Desharnais and Kemerer datasets.
Study [21] presents an analysis of the use of machine learning techniques to improve software effort estimation, based on empirical datasets. Five public datasets were employed: ISBSG, NASA93, COCOMO, Maxwell, and Desharnais. The data were preprocessed by handling missing values and transforming categorical features. Four machine learning regression methods were evaluated: Linear Regression, Gradient Boosting, Random Forests, and Decision Tree. Additionally, correlation-based feature selection was applied to identify relevant feature subsets and reduce dimensionality. The comparative analysis focused on two key metrics: R-squared and root mean squared error to evaluate prediction accuracy. The results show that Linear Regression and Random Forests models significantly outperformed the other approaches for the effort estimation task when correlation-based feature selection is applied. The conclusions suggest that correlation-based feature selection can enhance machine learning models for software effort estimation.
Study [22] employed a deep learning model to estimate the effort required for software development. The data preprocessing stage involved cleaning, normalization, and handling missing values, followed by their imputation. For prediction modeling, an innovative network (Multilayer Perceptron-assisted Honey Bidirectional Gated Recurrent Feed Forward Network) was developed, supported by an adaptive optimization algorithm (A-HBa), which adjusted the model parameters to achieve superior performance. The datasets used include the Albrecht, China, Desharnais, Kemerer, Kitchenham, and COCOMO81 datasets. The evaluation, based on mean absolute error, reported values such as 0.0763 for the China dataset, 0.0737 for the Desharnais dataset, and 0.0754 for the Kemerer dataset.
Article [23] introduces the NIVIM model, a method for imputing missing values based on variational autoencoders (VAE) and synthetic data. By combining contextual and similarity-based information, the model generates an extended dataset (SDEE) and applies contextual imputation to improve data quality. NIVIM stands out for its broad applicability as a preprocessing technique and for its superior performance compared to VAE, GAIN, kNN, and MICE methods. The proposed model brings statistically significant improvements across six benchmark datasets—ISBSG, Albrecht, COCOMO81, Desharnais, NASA, and UCP—achieving an average reduction in RMSE between 11.05% and 17.72%, and in MAE between 9.62% and 21.96%. For the Desharnais dataset, the performance of the NIVIM model is highlighted by the following evaluation metric values: MAE = 0.0699, RMSE = 0.1134, and CD = 0.6432.
Accurate effort and duration estimation in software development is one of the most challenging and widely debated issues in the field. It is essential for effective project management, yet its complexity makes it a particularly difficult subject of research.

3. Used Datasets

To accurately assess and estimate the effort and duration required for software product development, researchers in the field of software engineering rely on various datasets collected from real-world projects. Among the most well-known and frequently used datasets in software engineering are Albrecht, COCOMO81, China, Desharnais, ISBSG, Kemerer, Kitchenham, Maxwell, Miyazaki, NASA93, and Tukutuku. A detailed analysis of these datasets is presented in article [24].
Article [25] proposes a classification of datasets into three categories, based on the optimal spacing theorem formulated by Eubank [26]. According to this theorem, the quantile function of the density is divided into four intervals: Q1 (first quartile), Q2 (second quartile), Q3 (third quartile), and Q4 (fourth quartile). The first category corresponds to Q1, the second to Q2 and Q3, and the third to Q4. Based on this classification, an SEE (Software Engineering Estimation) dataset is considered small-sized if it includes, at most, 43 project instances, medium-sized if it contains between 44 and 146 instances, and large-sized if it contains 147 or more instances. In the present study, to accurately approximate software development effort and duration, four datasets were selected, one for each quartile, as follows: China (Q4), Desharnais (Q3), Kemerer (Q1), and Maxwell (Q2). The selection of these datasets was based on their relevance in the field of software engineering, the public availability of the data, the size of the datasets, and the diversity of the information included (including actual values for the software development effort and duration), thus ensuring a solid foundation for the comparative analysis and validation of the proposed methods. Table 1 presents both the number of projects analyzed in each dataset and the number of attributes used in this study. The last two columns of Table 1 include the units of measurement corresponding to the two output attributes. The unit of measurement used for the attribute representing the development duration of a software project is the calendar month.
For the China and Desharnais datasets, the effort required for software development is measured in person-hours, while for the Kemerer and Maxwell datasets, the unit of measurement for effort is person-months.
Each of the 499 projects included in the China dataset [27] contains a series of essential characteristics for the analysis and estimation of software development projects, represented by numerical values. In this study, fifteen attributes were used (thirteen as input data and two as output data). The meaning of these fifteen attributes, along with their numerical characteristics (minimum value, maximum value, mean, and standard deviation), is presented in Table 2.
The Desharnais dataset [28] contains information extracted from 81 completed software projects, including variables that describe the characteristics of the projects and the teams that developed them. The meaning and numerical characteristics of the ten attributes used in this study (eight input variables and two output variables) are presented in Table 3. Among the eight input variables, the last one, labeled Language, indicates the type of programming language used and is encoded as follows: 1 for first-generation programming languages (e.g., Assembly), 2 for third-generation programming languages (e.g., C++, Java), and 3 for fourth-generation programming languages (e.g., SQL, Oracle Forms).
The Kemerer dataset [29] is a classic dataset used in the estimation of software development effort and duration, built from the acquisition of seven characteristics collected from 15 real software projects. The meaning and numerical characteristics of the seven attributes used in this study (five input variables and two output variables) are presented in Table 4. This table provides details on the distribution of these attributes and their impact on the estimation of software development effort and duration.
Each project in the Maxwell dataset [30] includes a set of essential characteristics for the analysis and estimation of software development projects, represented by numerical values. The Maxwell dataset comprises a total of 62 distinct projects, each containing 26 attributes, of which 22 are independent and 4 are dependent. Out of the four dependent attributes, the following two were used in this study: the effort required to complete the project, measured in person-hours per month, and the total development duration, measured in months. Information about the 24 Maxwell dataset attributes used in this study is presented in Table 5.
With 499 projects, the China collection is considered a large-sized SEE dataset according to the classification provided in [25]. With 15 projects, the Kemerer collection is classified as small-sized according to the previously mentioned classification. Both the Desharnais dataset, which includes data from 81 projects, and the Maxwell dataset, containing data from 62 projects, are classified as medium-sized datasets.
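The size categories used above can be written as a short lookup. This is a minimal sketch: the function name `classify_see_dataset` is hypothetical, and it assumes that a dataset with exactly 147 projects counts as large-sized, since the medium-sized range ends at 146.

```python
def classify_see_dataset(n_projects: int) -> str:
    """Return the size category of an SEE dataset from its project count."""
    if n_projects < 1:
        raise ValueError("a dataset must contain at least one project")
    if n_projects <= 43:       # small-sized: at most 43 projects
        return "small"
    if n_projects <= 146:      # medium-sized: 44 to 146 projects
        return "medium"
    return "large"             # assumption: 147 or more projects


# Project counts reported for the four datasets in this section.
datasets = {"China": 499, "Desharnais": 81, "Kemerer": 15, "Maxwell": 62}
categories = {name: classify_see_dataset(n) for name, n in datasets.items()}
print(categories)
```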

4. Research Approach

4.1. Selected Artificial Neural Networks

To achieve a more accurate estimation of effort and duration required for the development of a software product, the following neural network architectures were used in this study: Multilayer Perceptron (MLP), Deep Fully Connected Neural Network (DFCNN), Fractal Neural Network (FractalNN), Kernel Extreme Learning Machine (KELM), and Hybrid Artificial Neural Networks.
MLP [31] is one of the most fundamental architectures of artificial neural networks, widely used in tasks such as classification, regression, and pattern recognition. Its structure consists of three categories of layers: the input layer, output layer, and one or more hidden layers. Data flows through the network in a unidirectional manner, without forming loops. The learning process is based on the backpropagation algorithm, and weight optimization is performed using the gradient descent method.
DFCNN [32] is an advanced extension of MLP architecture, characterized by a large number of hidden layers and dense connectivity between neurons. This type of network is successfully applied to complex tasks such as regression, classification, and functional modeling. Due to its architectural depth, DFCNN is capable of learning hierarchical and abstract data representations. In the context of software development effort and duration estimation, the model can identify and model sophisticated relationships between variables, such as code size, functional complexity, or team experience level, thus generating more accurate predictions compared to shallow architectures.
FractalNN represents a modern approach to deep learning that integrates the principles of fractal geometry into neural network architecture. Through its hierarchical structure and self-similarity, it effectively models complex patterns and nonlinear dependencies. This type of network was first introduced in study [33] as an alternative to residual neural networks, a class of convolutional neural networks, and was initially applied to classification tasks. Inspired by fractal geometry, which investigates self-repeating patterns across multiple scales, Fractal Neural Networks employ parallel convolutional branches operating at varying levels of abstraction. These branches are subsequently combined, typically via averaging or concatenation, to generate richer, multiscale feature representations. This multi-branch design reflects fractal self-similarity and enables the effective capture of patterns distributed across different temporal scales.
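The self-similar, multi-branch idea can be illustrated with a recursive expansion rule: a block with one column is a single layer, and a block with C + 1 columns averages a single layer with two stacked blocks of C columns. The sketch below is only an illustration under several assumptions: it uses dense ReLU layers on tabular input instead of convolutions, joins branches by a pairwise mean, runs an untrained forward pass, and uses the hypothetical names `make_dense` and `fractal_block`. It is not the configuration used in this study.

```python
import numpy as np


def make_dense(rng, width, registry):
    """Create one dense ReLU layer and record its weights in registry."""
    w = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
    b = np.zeros(width)
    registry.append((w, b))
    return lambda z: np.maximum(z @ w + b, 0.0)


def fractal_block(rng, width, columns, registry):
    """Build a block by the rule f_1 = layer, f_{C+1} = mean(f_C(f_C(z)), layer(z))."""
    shallow = make_dense(rng, width, registry)
    if columns == 1:
        return shallow
    first = fractal_block(rng, width, columns - 1, registry)
    second = fractal_block(rng, width, columns - 1, registry)

    def block(z):
        deep = second(first(z))            # deeper branch: two stacked sub-blocks
        return 0.5 * (deep + shallow(z))   # join the branches by averaging

    return block


rng = np.random.default_rng(0)
layers = []
block = fractal_block(rng, width=6, columns=3, registry=layers)
out = block(rng.random((4, 6)))   # 4 samples, 6 features
print(out.shape, len(layers))
```

With three columns the block holds seven layers in total, and its longest path passes through four of them, which is how the design mixes shallow and deep paths in one module.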
KELM [34] integrates kernel functions to facilitate the learning of nonlinear relationships in high-dimensional feature spaces. Unlike traditional backpropagation-based methods, KELM uses an analytical formulation without iterative training, which results in faster training. By employing kernel functions (Gaussian, polynomial, or linear), the algorithm projects data into higher-dimensional spaces, enhancing the model's generalization capability. In the field of software development effort and duration estimation, KELM enables the fast and accurate modeling of complex relationships between project variables such as functional complexity, code volume, or team expertise level, offering performance comparable to or even better than that of conventional neural networks.
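To make the "analytical formulation without iterative training" concrete, the sketch below fits a kernel ELM in closed form by solving (K + I/C) β = T, where K is the kernel matrix of the training inputs and T holds the targets. It assumes a Gaussian kernel and a regularization constant C; the class name `KernelELM`, the parameter values, and the synthetic data are illustrative assumptions, not the settings of this study.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class KernelELM:
    """Kernel ELM regressor trained by one linear solve (no epochs)."""

    def __init__(self, c=100.0, gamma=1.0):
        self.c = c          # regularization constant (assumed value)
        self.gamma = gamma  # Gaussian kernel width (assumed value)

    def fit(self, x, t):
        self.x_train_ = np.asarray(x, dtype=float)
        t = np.asarray(t, dtype=float)
        k = rbf_kernel(self.x_train_, self.x_train_, self.gamma)
        # Closed-form output weights: solve (K + I/C) beta = T.
        self.beta_ = np.linalg.solve(k + np.eye(k.shape[0]) / self.c, t)
        return self

    def predict(self, x):
        k = rbf_kernel(np.asarray(x, dtype=float), self.x_train_, self.gamma)
        return k @ self.beta_


# Synthetic stand-in for normalized inputs with two targets.
rng = np.random.default_rng(0)
x = rng.random((30, 5))
t = np.column_stack([x.sum(axis=1), x[:, 0] * x[:, 1]])

model = KernelELM(c=1e6, gamma=1.0).fit(x, t)
pred = model.predict(x)
train_mae = float(np.mean(np.abs(pred - t)))
print(pred.shape, train_mae)
```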
A hybrid artificial neural network [35] is an artificial intelligence model that combines neural networks with other machine learning algorithms in order to leverage the strengths of each and improve overall system performance. In this study, hybrid neural networks were designed in a cascade architecture, where an artificial neural network (such as MLP, DFCNN, FractalNN, or KELM) was used for automatic feature extraction from the input data. The resulting features were then passed as input to a Random Forests (RF) regressor, which performed the final prediction. This modular structure combines the deep representational capabilities of neural networks with the robustness and generalization power of the Random Forests algorithm, with the aim of improving prediction accuracy and stability in the presence of noise or variability in the data.
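The cascade described above can be sketched with Scikit-learn alone. The example assumes that the "features" are the activations of the last hidden layer of a fitted `MLPRegressor`, recomputed from its `coefs_` and `intercepts_` attributes; the helper `hidden_features`, the layer size, and the synthetic data are hypothetical choices, and the study's own feature extraction may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor


def hidden_features(mlp, x):
    """Activations of the last hidden layer of a fitted ReLU MLPRegressor."""
    a = np.asarray(x, dtype=float)
    # Every weight matrix except the last one belongs to a hidden layer.
    for w, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        a = np.maximum(a @ w + b, 0.0)
    return a


# Synthetic stand-in: 80 projects, 8 inputs, 2 outputs (effort, duration).
rng = np.random.default_rng(42)
x = rng.random((80, 8))
y = np.column_stack([x @ rng.random(8), x[:, 0] + 0.5 * x[:, 1]])

# Stage 1: the neural network learns a representation of the inputs.
mlp = MLPRegressor(hidden_layer_sizes=(20,), activation="relu",
                   solver="adam", max_iter=500, random_state=0)
mlp.fit(x, y)

# Stage 2: a Random Forests regressor makes the final prediction
# from the extracted features.
features = hidden_features(mlp, x)
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(features, y)
pred = rf.predict(hidden_features(mlp, x))
print(features.shape, pred.shape)
```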
The efficiency of the aforementioned artificial neural network architectures in estimating the effort and duration required for software product development fundamentally depends on the configuration of the hyperparameters specific to each architecture. The appropriate selection of these hyperparameters (such as the number of hidden layers, the number of neurons per layer, learning rate, activation functions, regularization methods, and the type of optimizer) directly influences the model's ability to generalize and to provide accurate and robust predictions.

4.2. Used Metrics

The accurate evaluation of the performance of selected artificial neural networks [36] is challenging due to imbalanced datasets. To achieve the aforementioned objective, the following five metrics were used: mean absolute error, median absolute error, root mean square error, coefficient of determination, and mean squared logarithmic error.
The mean absolute error (MAE) [37] represents the average of the absolute differences between predicted and actual values. The formula used to calculate MAE is presented in Equation (1).
$$\mathrm{MAE} = \frac{1}{m} \sum_{k=1}^{m} \left| x_k - x_k'' \right| \qquad (1)$$
In this formula, as well as in the next four, $m$ denotes the total number of data points, $x_k$ represents the true value, and $x_k''$ indicates the predicted value. The median absolute error (MdAE) [38] is the median of all absolute differences between the actual and the estimated values, as defined by the formula shown in Equation (2).
$$\mathrm{MdAE} = \operatorname{median} \left\{ \left| x_k - x_k'' \right| \right\}_{k=1}^{m} \qquad (2)$$
The root mean square error (RMSE) [39] measures the standard deviation of the prediction errors. The mathematical expression used to calculate the root mean square error is given in Equation (3).
$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{k=1}^{m} \left( x_k - x_k'' \right)^2} \qquad (3)$$
The coefficient of determination (CD) [40] is defined as one minus the ratio of the sum of squared residuals to the total sum of squares, as shown in Equation (4).
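$$\mathrm{CD} = 1 - \frac{\sum_{k=1}^{m} \left( x_k - x_k'' \right)^2}{\sum_{k=1}^{m} \left( x_k - \bar{x} \right)^2} \qquad (4)$$
where
$$\bar{x} = \frac{1}{m} \sum_{k=1}^{m} x_k. \qquad (5)$$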
The mean squared logarithmic error (MSLE) [41] measures the average squared difference between the logarithms of the predicted and actual values. This metric is useful when underestimates should be penalized more than overestimates, especially for data spanning several orders of magnitude. The mathematical formula for MSLE is presented in Equation (6).
$$\mathrm{MSLE} = \frac{1}{m} \sum_{k=1}^{m} \left( \log\left( x_k + 1 \right) - \log\left( x_k'' + 1 \right) \right)^2 \qquad (6)$$
Values closer to one for CD and values closer to zero for the other metrics indicate a higher prediction accuracy. The characteristic values for these five metrics were computed using functions from the sklearn.metrics module, which is part of the Scikit-learn library [42].
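As an illustration of how the five metrics map onto `sklearn.metrics`, the following sketch computes them for a small made-up vector of normalized values. The helper name `evaluate` and the sample numbers are hypothetical; RMSE is obtained as the square root of `mean_squared_error` so that the snippet does not depend on a particular Scikit-learn version.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_squared_log_error,
    median_absolute_error,
    r2_score,
)


def evaluate(y_true, y_pred):
    """Compute the five metrics of Equations (1) to (6)."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MdAE": median_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "CD": r2_score(y_true, y_pred),
        "MSLE": mean_squared_log_error(y_true, y_pred),
    }


# Made-up normalized values, for illustration only.
y_true = np.array([0.10, 0.40, 0.35, 0.80])
y_pred = np.array([0.12, 0.35, 0.40, 0.70])
scores = evaluate(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

When `y_true` has two columns (effort and duration), these functions average the per-column scores by default; whether the study aggregated the two outputs this way is not stated in the text.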

4.3. Software Design for Effort and Duration Estimation

To successfully achieve the proposed objectives, an intelligent software system based on artificial neural networks was developed. Its functionalities, which reflect the usual workflow across the stages of selection, training, testing, and comparison of neural networks, are represented in the UML use case diagram [43], shown in Figure 1. The use case diagram includes one actor (the user who interacts directly with the intelligent software system), eighteen use cases, and the functional relationships between them. Analyzing the eighteen use cases, the functionalities of the intelligent software system are embodied in the following main activities:
  • The selection of the dataset is followed by its normalization using the Min-Max scaling technique. The normalization process is implemented using the MinMaxScaler class from the sklearn.preprocessing module [42], which rescales the dataset values into the interval [0, 1]. After normalization, the dataset is partitioned into training and testing subsets, with approximately 80% of the data used for training and the remaining 20% for testing, ensuring a proper separation for model evaluation.
  • The selection of the neural network type is followed by the process of parameter tuning, training, and testing, based on the previously specified dataset.
  • The metric values of the trained and tested model are saved for the purpose of comparing neural networks and identifying the most efficient architecture for each dataset category, which are as follows: small-sized, medium-sized, and large-sized. All five used evaluation metrics were computed on the normalized datasets, after applying Min-Max normalization. This approach guarantees that differences in original units of effort across datasets (person-hours vs. person-months) do not affect the comparability of results.
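The preprocessing steps listed above can be sketched as follows, in the same order as described (normalize first, then split). The random array stands in for a real dataset, and the column layout, the seed, and the use of `train_test_split` with `test_size=0.2` are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Stand-in for a dataset with 81 projects, 8 inputs and 2 outputs.
rng = np.random.default_rng(7)
data = rng.uniform(1.0, 500.0, size=(81, 10))

# Rescale every column to the interval [0, 1] (the default feature_range).
scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)

# Assumed layout: inputs in the first 8 columns, effort and duration last.
x, y = scaled[:, :8], scaled[:, 8:]

# Hold out roughly 20% of the projects for testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
print(x_train.shape, x_test.shape)
```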
Before the parameter tuning process, the datasets were partitioned for training and testing purposes. For each dataset under analysis, an optimal splitting strategy was applied, leading to the selection of approximately 80% of the data for training and the remaining 20% for testing. Table 6 provides details on the number of software effort values (columns 2 and 7), the minimum value (columns 3 and 8), the maximum value (columns 4 and 9), the mean (columns 5 and 10), and the standard deviation (columns 6 and 11), corresponding to both the training and testing phases of the intelligent methods.
In a similar manner, Table 7 presents detailed statistics regarding the software development duration, including the number of values for software development duration (columns 2 and 7), minimum values (columns 3 and 8), maximum values (columns 4 and 9), mean values (columns 5 and 10), and standard deviations (columns 6 and 11), corresponding to both the training and testing phases of the intelligent methods.
To implement the functionalities described in the UML use case diagram shown in Figure 1, the Python programming language [44] was chosen, due to its versatility and extensive support for the development of applications based on artificial intelligence. Four specialized libraries were used, each with an essential role in the development and experimentation process. The Keras library [45] was used to define and train neural networks, providing a high-level, intuitive, and efficient interface for building complex models. TensorFlow 2.14.0 [46], on which Keras is based, was responsible for the efficient execution of numerical operations and for managing the computational graph, thus ensuring the scalability and performance required in the learning process. For the visualization of the results and the graphical analysis of model performance, the Matplotlib 3.9.4 library [47] was used, which allows for the generation of detailed and customizable graphs. Additionally, the Scikit-learn 1.6.1 library [42] was integrated for data preprocessing, feature selection, dataset partitioning, and model performance evaluation. Together, these tools provided a robust and flexible framework for implementing, testing, and validating the functionalities specified in the UML model.

5. Analysis of Implemented Artificial Neural Networks

In most artificial neural networks, parameters are essential variables used to learn the characteristics of the dataset and to adjust the learning process with the goal of achieving optimal performance. A parameter tuning procedure [48], aimed at identifying the configuration of each neural network that yields the most accurate predictions, was applied in this research to all the models under analysis.

5.1. Multilayer Perceptron

The MLP implementation was carried out using the MLPRegressor class from the sklearn.neural_network module, with multiple configurations tested based on different hyperparameter values. The MLP architecture used in this study is characterized by the following components:
  • The input layer contains a number of neurons automatically determined by the number of input attributes in the dataset: 13 neurons for the China dataset, 8 for the Desharnais dataset, 5 for the Kemerer dataset, and 22 for the Maxwell dataset.
  • The hidden layer consists of a variable number of neurons, ranging from 20 to 200, incremented in steps of 20.
  • The output layer includes two neurons, each corresponding to one of the two following estimated values: software development effort and duration.
  • The ReLU activation function is used for the hidden layer.
  • The model is trained using the Adam optimizer.
  • The number of epochs varies between 100 and 1000, in increments of 100.
The tuning process combined 10 values for parameter e (number of epochs) with 10 values for parameter n (number of neurons in the hidden layer), so 100 configurations of the MLP neural network were trained. The performance of each configuration was evaluated using the five selected metrics. In Table A1, the third column presents the optimal values obtained for the five metrics across the 100 MLP configurations. The fourth and fifth columns give the hyperparameter values of the configurations that achieved these optimal values for each metric. Columns six, seven, and eight present information on the estimated effort, while the last three columns provide details on the estimated duration, for the MLP model that produced the optimal metric values. For the China dataset, three distinct MLP configurations together account for the optimal values of the five metrics; the optimal values for RMSE, CD, and MSLE were produced by the same hyperparameter configuration. For the Desharnais dataset, two optimal MLP configurations were identified: one yielded the lowest values for MAE and MdAE, while the other led to the best results for RMSE, CD, and MSLE. For the Kemerer and Maxwell datasets, a single hyperparameter configuration yielded optimal results across all five metrics, suggesting greater model stability and robustness in these particular contexts.
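The tuning grid for the MLP can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the helper name `build_mlp`, the synthetic data, and the fixed seed are hypothetical, and the number of epochs is mapped to the `max_iter` argument, which Scikit-learn documents as the number of epochs for the Adam solver.

```python
import itertools
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Grid from the text: 10 hidden-layer sizes x 10 epoch budgets = 100 configurations.
HIDDEN_SIZES = range(20, 201, 20)    # n = 20, 40, ..., 200
EPOCHS = range(100, 1001, 100)       # e = 100, 200, ..., 1000
GRID = list(itertools.product(HIDDEN_SIZES, EPOCHS))


def build_mlp(n, e, seed=0):
    """One-hidden-layer MLP with ReLU and Adam (hypothetical helper)."""
    return MLPRegressor(hidden_layer_sizes=(n,), activation="relu",
                        solver="adam", max_iter=e, random_state=seed)


# Synthetic stand-in for a dataset with 5 attributes (as in Kemerer) and
# two targets (effort, duration). These values are NOT taken from the paper.
rng = np.random.default_rng(0)
X = rng.random((30, 5))
Y = np.column_stack([X.sum(axis=1), X[:, 0] * X[:, 1]])

with warnings.catch_warnings():
    # Short epoch budgets may stop before convergence; silence that warning.
    warnings.simplefilter("ignore", ConvergenceWarning)
    model = build_mlp(*GRID[0]).fit(X, Y)    # first configuration: n=20, e=100

predictions = model.predict(X)               # one column per target
```

In a full run, each of the 100 `(n, e)` pairs in `GRID` would be fitted and scored with the five metrics, and the best configuration per metric would be recorded.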

5.2. Deep Fully Connected Neural Network

DFCNN, a feedforward and fully connected network, was designed with an input layer, ten fully connected hidden layers, and an output layer to solve a multivariate regression problem with two continuous output variables. It was trained with the objective of identifying the most performant combination of hyperparameters, specifically, the number of epochs and the number of nodes, based on the values obtained for the evaluation metrics. The DFCNN architecture used in this study is characterized by the following components:
  • The input and output layers were designed with the same structure as in the MLP network.
  • Ten hidden layers were implemented using the Dense class from tensorflow.keras.layers, each with the ReLU activation function. The number of neurons per hidden layer is the same for all layers within a given configuration and varies across experimental runs, ranging from 20 to 200 neurons, in increments of 20.
  • The model is trained using the Adam optimizer algorithm, with MSE employed as the loss function.
  • The number of epochs varies between 100 and 1000, in increments of 100.
This deep architecture allows for a flexible representation of complex nonlinear relationships within the data, while the systematic hyperparameter tuning aims to identify robust configurations that generalize well across different software engineering datasets. Based on the hyperparameter tuning process, where 10 values were tested for the parameter e (number of training epochs) and 10 values for the parameter n (number of neurons in the hidden layers), a total of 100 distinct DFCNN configurations were trained. In Table A2, the third column reports the optimal values obtained for the five used metrics applied across these 100 configurations.
In the experiments conducted on the China, Desharnais, and Kemerer datasets, three distinct DFCNN configurations were identified, each leading to optimal values for the five analyzed performance metrics. It was observed that the same set of hyperparameters simultaneously yielded the best results for RMSE, CD, and MSLE, indicating the increased robustness of that particular configuration. For the Maxwell dataset, two optimal DFCNN configurations were identified. The first configuration resulted in the lowest values for MAE and MdAE, while the second configuration achieved superior results for RMSE, CD, and MSLE. These findings highlight the variability in model behavior depending on dataset characteristics, as well as the importance of selecting appropriate hyperparameter configurations to ensure optimal performance.
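To give a sense of scale for the DFCNN configurations described above, the sketch below counts the trainable parameters of a stack with ten equally sized hidden layers and shows how such a model could be assembled in Keras. The helper names are hypothetical, a bias term per neuron and a linear output layer are assumed, and TensorFlow is imported lazily so that the arithmetic can be run on its own.

```python
def dfcnn_param_count(n_inputs, n_hidden, n_layers=10, n_outputs=2):
    """Weights plus biases of a fully connected stack (bias per neuron assumed)."""
    first = (n_inputs + 1) * n_hidden                     # input -> hidden layer 1
    middle = (n_layers - 1) * (n_hidden + 1) * n_hidden   # hidden -> hidden
    last = (n_hidden + 1) * n_outputs                     # last hidden -> output
    return first + middle + last


def build_dfcnn(n_inputs, n_hidden, n_layers=10, n_outputs=2):
    """Keras model following the description in the text (hypothetical helper)."""
    # Lazy import: the parameter arithmetic above does not need TensorFlow.
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense, Input

    model = Sequential([Input(shape=(n_inputs,))])
    for _ in range(n_layers):
        model.add(Dense(n_hidden, activation="relu"))
    model.add(Dense(n_outputs))                  # linear output is an assumption
    model.compile(optimizer="adam", loss="mse")  # Adam and MSE, as stated in the text
    return model


# Smallest and largest models in the grid, for the narrowest and widest inputs.
smallest = dfcnn_param_count(n_inputs=5, n_hidden=20)     # Kemerer, n = 20
largest = dfcnn_param_count(n_inputs=22, n_hidden=200)    # Maxwell, n = 200
```

Under these assumptions the smallest configuration has 3942 parameters and the largest has 366,802, which illustrates how quickly the model capacity grows relative to the size of the datasets. The number of epochs e would be passed to `model.fit(..., epochs=e)`.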

5.3. Fractal Neural Network

The application of Fractal Neural Networks to regression problems remains insufficiently explored, although several studies from the past eight years have used this type of neural network in classification tasks. A recent study [49] proposes a hybrid variant for time series forecasting; however, in the field of software engineering, these architectures have not yet been utilized.
The FractalNN architecture proposed in this study combines a recursive fractal structure with 1D convolutional blocks, specific to convolutional neural networks [50], to address regression problems. It is composed of the following elements:
  • An input layer whose dimensionality corresponds to number of input attributes in the dataset (13 neurons for the China dataset, 8 for Desharnais, 5 for Kemerer, and 22 for Maxwell).
  • A fractal convolutional block, defined as a recursive structure controlled by a depth parameter (set to four), which generates two parallel branches at each level. The short branch applies a single Conv1D layer, while the long branch recursively applies two fractal blocks to the same input, illustrating self-similarity. The outputs of the two branches are merged via averaging, which integrates information across multiple scales. The convolutional layer was implemented using the Conv1D class from the tensorflow.keras.layers module [45].
  • Dense layers for regression, which are applied after the fractal convolutional block; the output is first flattened into a vector, then passed through a Dense layer with 64 neurons and ReLU activation, and finalized with a Dense layer with 2 neurons and linear activation, corresponding to a regression task with two continuous outputs.
  • As part of the hyperparameter tuning process, 10 values were evaluated for the number of training epochs (denoted e) and 6 values for the number of filters in Conv1D (denoted f), resulting in a total of 60 unique FractalNN configurations. The number of epochs varies between 100 and 1000, in increments of 100, while the number of filters in Conv1D takes six discrete values: 8, 16, 32, 64, 128, and 256.
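The recursive structure listed above can be sketched in Keras as follows. This is an illustrative reading of the description, not the author's implementation. Several details are assumptions because the text does not state them: the kernel size, the ReLU activation inside the convolutions, the single-convolution base case at depth one, the sequential composition of the two nested blocks on the long branch (as in the original FractalNet design), the reshaping of the attribute vector into a one-channel sequence, and the optimizer and loss. All helper names are hypothetical.

```python
def conv_layers_in_fractal_block(depth):
    """Number of Conv1D layers: one on the short branch plus two nested blocks."""
    if depth == 1:
        return 1
    return 1 + 2 * conv_layers_in_fractal_block(depth - 1)


def fractal_block(x, filters, depth, kernel_size=3):
    """Recursive fractal block applied to a Keras tensor (hypothetical helper)."""
    from tensorflow.keras.layers import Average, Conv1D  # lazy import

    short = Conv1D(filters, kernel_size, padding="same", activation="relu")(x)
    if depth == 1:
        return short                       # base case: a single convolution
    long = fractal_block(x, filters, depth - 1, kernel_size)
    long = fractal_block(long, filters, depth - 1, kernel_size)
    return Average()([short, long])        # merge the two branches by averaging


def build_fractalnn(n_inputs, filters, depth=4):
    """Fractal block followed by the dense regression head from the text."""
    from tensorflow.keras import Model
    from tensorflow.keras.layers import Dense, Flatten, Input, Reshape

    inputs = Input(shape=(n_inputs,))
    x = Reshape((n_inputs, 1))(inputs)     # attributes as a 1D sequence (assumption)
    x = fractal_block(x, filters, depth)
    x = Flatten()(x)
    x = Dense(64, activation="relu")(x)
    outputs = Dense(2, activation="linear")(x)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")   # optimizer and loss assumed
    return model


FILTERS = [8, 16, 32, 64, 128, 256]        # f
EPOCHS = list(range(100, 1001, 100))       # e
n_configs = len(FILTERS) * len(EPOCHS)
```

Under this reading, a block of depth four contains 15 Conv1D layers (the count follows 2^depth - 1), and the grid contains the 60 configurations mentioned in the text.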
Table A3 presents, in its third column, the optimal values obtained for the five metrics across these configurations. In the experiments performed on the China and Desharnais datasets, three distinct FractalNN configurations were found to yield the optimal results across the five performance metrics considered. One particular set of hyperparameters simultaneously produced the best values for RMSE, CD, and MSLE, indicating a higher degree of robustness for that configuration.
In contrast, for the Kemerer and Maxwell datasets, two optimal FractalNN configurations were identified. The first configuration delivered the best performance for MAE, RMSE, CD, and MSLE, while the second achieved the lowest MdAE value.

5.4. Kernel Extreme Learning Machine

The proposed KELM algorithm represents an extension of the traditional Extreme Learning Machine approach, in which the feature space is implicitly generated through the use of a kernel function. This strategy eliminates the need for explicitly optimizing the hidden layer weights and activations, thereby simplifying the training process. In the implemented architecture, the RBF (Radial Basis Function) kernel is employed to perform a nonlinear mapping of the input data, which enhances the separability of the data in the induced feature space. The main components of the KELM model are as follows:
  • The RBF kernel Gram matrix captures the pairwise similarities between training samples in the transformed feature space, enabling nonlinear modeling through kernel-based methods.
  • Hyperparameter γ represents the coefficient of the radial basis function kernel and controls the spread or influence of the kernel function. Its value remains consistent within a given configuration and varies across experimental cycles. In this study, γ was tested over a range of values, as follows: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, and 50.
  • Hyperparameter λ denotes the regularization coefficient, employed to stabilize the inversion of the Gram matrix in the presence of multicollinearity or noise. In this study, λ was evaluated across the following values: 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, and 500.
Within the hyperparameter optimization process of KELM architecture (Table A4), a grid search was conducted over ten distinct values for the radial coefficient (denoted γ) and ten values for the regularization coefficient (denoted λ).
This process yielded a total of 100 unique KELM configurations. The performance of each configuration was assessed using five metrics, and the optimal values identified across these configurations are summarized in the third column of Table A4. Experimental evaluations conducted on the China and Desharnais datasets led to the identification of two optimal KELM configurations. The first configuration yielded the best performance on four metrics (MAE, RMSE, CD, and MSLE), while the second configuration minimized MdAE. For the Kemerer dataset, a single hyperparameter configuration produced optimal values for all five metrics, indicating greater model stability and robustness within this specific data context. For the Maxwell dataset, three distinct KELM configurations together account for the optimal values of the five metrics. Notably, one of these configurations achieved the best values for RMSE, CD, and MSLE concurrently, suggesting a higher degree of reliability and generalization capacity for that particular hyperparameter setting.
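A kernel-based ELM of this kind admits a closed-form solution, which the following NumPy sketch illustrates. It assumes the common formulation in which the output weights solve (K + λI)β = Y, with K the RBF Gram matrix; the exact formulation used in the study is not given in the text, and the class name `KELM`, the helper `rbf_kernel`, and the synthetic data are hypothetical.

```python
import itertools

import numpy as np


def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))   # clip rounding noise


class KELM:
    """Kernel ELM with a closed-form regularized solution (illustrative sketch)."""

    def __init__(self, gamma=0.1, lam=1.0):
        self.gamma = gamma   # RBF coefficient
        self.lam = lam       # regularization coefficient

    def fit(self, X, Y):
        self.X_train = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        # Solve (K + lam * I) beta = Y; lam stabilizes the inversion of K.
        regularized = K + self.lam * np.eye(len(K))
        self.beta = np.linalg.solve(regularized, np.asarray(Y, dtype=float))
        return self

    def predict(self, X):
        K_new = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K_new @ self.beta


# Grid from the text: 10 values of gamma x 10 values of lambda.
GAMMAS = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50]
LAMBDAS = [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500]
GRID = list(itertools.product(GAMMAS, LAMBDAS))

# Synthetic data with two targets; NOT taken from the paper.
rng = np.random.default_rng(0)
X = rng.random((40, 5))
Y = np.column_stack([X.sum(axis=1), X[:, 0] * X[:, 1]])
model = KELM(gamma=1.0, lam=0.01).fit(X, Y)
train_mae = np.mean(np.abs(model.predict(X) - Y))
```

Because no hidden-layer weights are trained, each of the 100 grid points costs one linear solve, which makes an exhaustive search over γ and λ inexpensive on datasets of this size.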

5.5. Hybrid Artificial Neural Networks

To improve the metric values obtained for the four previously implemented artificial neural network architectures (MLP, DFCNN, FractalNN, and KELM), four corresponding hybrid neural network models were developed. Each hybrid model combines one type of neural network with the Random Forests machine learning algorithm, aiming to capitalize on the complementary strengths of both approaches and enhance predictive accuracy. The hybrid neural networks use a cascade architecture, in which an artificial neural network (MLP, DFCNN, FractalNN, or ELM) is employed for automatic feature extraction from the input data. These extracted features are then fed into the RF regressor, which leverages its ensemble learning capabilities to produce the final prediction. This combined framework is intended to improve robustness, generalization, and overall model performance beyond what either method could achieve independently.

5.5.1. Multilayer Perceptron Combined with Random Forests

The proposed hybrid artificial neural network architecture follows a cascade structure, integrating an MLP with a decision tree-based regressor (RF). This architectural combination (denoted as MLP_RF) aims to leverage the feature extraction capabilities of neural networks together with the robustness and generalization power of ensemble learning methods like RF, particularly in the context of multivariate regression tasks.
The first component of the proposed architecture is an MLP network, consisting of an input layer adapted to the dimensionality of each dataset, followed by one fully connected hidden layer using the ReLU activation function. The final dense layer of the MLP produces a low-dimensional latent representation, which serves as a compressed and abstract feature vector derived from the input data. This latent vector is then used as the input to the RF model.
The second component is a multi-output RF regressor, trained to simultaneously predict two output variables. RF model operates on the latent features extracted by MLP and is responsible for generating the final predictions. During the hyperparameter tuning process for MLP_RF architecture, the following five key hyperparameters were optimized:
  • Number of hidden layer nodes in MLP network (denoted as n), which varied within the interval [100, 1000] with an increment step of 100.
  • Number of training epochs for MLP (denoted as e), explored within the range [50, 500] with a step size of 50.
  • Number of estimators (denoted as s), controlling the total number of trees generated by RF model, ranging from 80 to 800 with an increment of 80.
  • Maximum depth of the trees (denoted as d), influencing the complexity of each individual decision tree.
  • Random seed (denoted as r), used to control the randomness of the RF training process.
During the hyperparameter tuning process, multiple combinations of predefined values for the five parameters of the MLP_RF architecture were explored. The number of neurons in the MLP hidden layer took 10 values ranging from 100 to 1000, and the number of training epochs took 10 values between 50 and 500. Additionally, 10 values between 80 and 800 were selected for the number of estimators in the RF model. For the maximum depth of the trees, the following five discrete values were tested: 4, 8, 16, 32, and 64. The random_state parameter was evaluated using four values (21, 42, 84, and 168) to ensure the reproducibility of the results. This led to a large number of unique MLP_RF models, each evaluated with five performance metrics for multivariate regression. Table A5 summarizes these results, giving, in the third column, the optimal values obtained for each metric. In addition, Table A5 reports the estimated software effort and duration (minimum, maximum, and mean) for each experimental configuration.
The experimental evaluations conducted on the China and Maxwell datasets led to the identification of three distinct configurations of the MLP_RF hybrid architecture, which together account for the optimal values of the five analyzed performance metrics. Among these, one configuration simultaneously achieved the best values for RMSE, CD, and MSLE, suggesting increased reliability and a high generalization capability for the specific parameter values used in that configuration. The best performance was observed on the China dataset, indicating a strong correlation between predicted and actual values. For the Desharnais and Kemerer datasets, two optimal MLP_RF configurations were identified. The first configuration demonstrated superior performance in four out of the five metrics (MAE, RMSE, CD, and MSLE), while the second configuration minimized the MdAE value.
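The cascade used by MLP_RF can be illustrated with Scikit-learn alone. The sketch below is one possible reading, not the study's code: it assumes that the latent features are the ReLU activations of the MLP hidden layer, recomputed from the fitted `coefs_` and `intercepts_` attributes, and that the five hyperparameters are combined exhaustively. The helper name `hidden_features` and the synthetic data are hypothetical.

```python
import warnings

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Search-space size if every listed value is combined with every other
# (an assumption; the text only reports "a large number" of models).
N_CONFIGS = 10 * 10 * 10 * 5 * 4     # n x e x s x d x r


def hidden_features(mlp, X):
    """ReLU activations of the single hidden layer of a fitted MLPRegressor."""
    return np.maximum(0.0, X @ mlp.coefs_[0] + mlp.intercepts_[0])


# Synthetic data with 8 attributes (as in Desharnais) and two targets;
# these values are NOT taken from the paper.
rng = np.random.default_rng(0)
X = rng.random((40, 8))
Y = np.column_stack([X.sum(axis=1), X[:, 0] * X[:, 1]])

# Stage 1: train the MLP (n = 100 hidden nodes, e = 50 epochs).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    mlp = MLPRegressor(hidden_layer_sizes=(100,), activation="relu",
                       solver="adam", max_iter=50, random_state=42).fit(X, Y)

# Stage 2: train the forest on the latent features (s = 80, d = 4, r = 42).
Z = hidden_features(mlp, X)
rf = RandomForestRegressor(n_estimators=80, max_depth=4, random_state=42).fit(Z, Y)
pred = rf.predict(Z)                  # columns: effort, duration
```

Under the exhaustive-grid assumption the search space contains 20,000 configurations, which explains why only the best configuration per metric is tabulated.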

5.5.2. Deep Fully Connected Neural Network Combined with Random Forests

Another proposed hybrid architecture integrates a DFCNN with an RF regressor in a modular and flexible design. In this configuration (denoted as DFCNN_RF), the DFCNN serves as a latent feature extractor, generating abstract and informative representations of the input data. These features are subsequently used by the RF regressor to perform the prediction, leveraging the DFCNN's ability to learn complex representations and the robustness of RF in regression tasks. To enable the simultaneous prediction of two target variables, the model uses the Scikit-learn MultiOutputRegressor wrapper, adapting the RF regressor to a multi-output regression setting. The entire ensemble is implemented as a custom Scikit-learn estimator, inheriting functionalities from BaseEstimator and RegressorMixin, thereby ensuring compatibility with hyperparameter optimization procedures. During the hyperparameter tuning process for the hybrid DFCNN_RF architecture, the same five parameters previously used in the MLP_RF hybrid architecture were applied. The use of these common hyperparameters facilitates a fair comparison between the two architectures under similar experimental conditions. The tuning process produced a large number of unique DFCNN_RF models, each evaluated with five performance metrics for multivariate regression. The results are summarized in Table A6, where the third column presents the optimal values associated with each metric.
The experimental evaluations conducted on the China, Desharnais, and Maxwell datasets led to the identification of three distinct configurations of hybrid DFCNN_RF architecture, each exhibiting optimal performance according to the five metrics used for multivariate regression tasks. One of these configurations stood out by simultaneously achieving the best values for RMSE, CD, and MSLE, indicating a high level of reliability and superior generalization capacity associated with the specific parameterization implemented.
For the Kemerer dataset, two DFCNN_RF configurations with notable performance were identified. The first achieved superior results for four out of the five analyzed metrics (MAE, RMSE, CD, and MSLE), while the second was distinguished by its efficiency in minimizing MdAE.
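The custom-estimator pattern described for DFCNN_RF can be sketched as below. To keep the example self-contained, a Scikit-learn `MLPRegressor` with ten hidden layers stands in for the Keras DFCNN; this substitution is an assumption made for illustration only, as are the class name `CascadeRegressor`, the choice of the last hidden layer as the latent representation, and the synthetic data. The mixin is listed before `BaseEstimator`, which is the inheritance order recommended by Scikit-learn.

```python
import warnings

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import ConvergenceWarning
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neural_network import MLPRegressor


class CascadeRegressor(RegressorMixin, BaseEstimator):
    """Deep fully connected feature extractor followed by a Random Forest."""

    def __init__(self, n=100, e=50, s=80, d=4, r=42, n_layers=10):
        self.n = n                  # neurons per hidden layer
        self.e = e                  # training epochs
        self.s = s                  # number of trees
        self.d = d                  # maximum tree depth
        self.r = r                  # random seed
        self.n_layers = n_layers    # hidden layers in the extractor

    def _latent(self, X):
        """Forward pass through the hidden layers only (ReLU activations)."""
        a = np.asarray(X, dtype=float)
        for W, b in zip(self.net_.coefs_[:-1], self.net_.intercepts_[:-1]):
            a = np.maximum(0.0, a @ W + b)
        return a

    def fit(self, X, y):
        self.net_ = MLPRegressor(hidden_layer_sizes=(self.n,) * self.n_layers,
                                 activation="relu", solver="adam",
                                 max_iter=self.e, random_state=self.r).fit(X, y)
        forest = RandomForestRegressor(n_estimators=self.s, max_depth=self.d,
                                       random_state=self.r)
        # One forest per target variable, wrapped for multi-output regression.
        self.rf_ = MultiOutputRegressor(forest).fit(self._latent(X), y)
        return self

    def predict(self, X):
        return self.rf_.predict(self._latent(X))


# Synthetic data with two targets; NOT taken from the paper.
rng = np.random.default_rng(0)
X = rng.random((40, 8))
Y = np.column_stack([X.sum(axis=1), X[:, 0] * X[:, 1]])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    model = CascadeRegressor().fit(X, Y)
pred = model.predict(X)
```

Because every constructor argument is stored unchanged as an attribute, `get_params` and `set_params` work as expected, so an estimator of this shape can be passed to tools such as `GridSearchCV`.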

5.5.3. Fractal Neural Network Combined with Random Forests

To address the multivariate regression problem, a new hybrid model named FractalNN_RF was developed, combining the feature extraction capabilities of the previously described FractalNN architecture with the robustness and generalization power of the RF regressor. This integration aims to leverage the strengths of both components to enhance prediction accuracy and stability. The proposed architecture is based on two main components. The FractalNN model acts as an extractor of latent features from the input data. These latent features are then used as input for the second component, an RF regressor trained in multi-output mode. The novelty of FractalNN_RF lies in the integration of a fractal architecture, capable of capturing multiscale patterns through recursive self-similarity, with the robustness and generalization strength of Random Forests. This combination was specifically motivated by the challenges of limited-size datasets, where complex nonlinear feature extraction must be balanced with stability against noise and variability. Together, these strengths are intended to provide a balanced model that enhances both accuracy and generalization in software effort and duration prediction.
During the hyperparameter optimization phase for the hybrid model FractalNN_RF, four out of the five hyperparameters previously used in the MLP_RF and DFCNN_RF hybrid architectures were retained, with the same values applied. Among these, one pertains to the neural network component, the number of training epochs, while the remaining three are associated with RF regressor: the number of estimators, the maximum tree depth, and the random seed values. The fifth hyperparameter, specific to the FractalNN model, is the number of filters in the Conv1D layer (denoted as f), for which six values from the discrete set {8, 16, 32, 64, 128, 256} were evaluated. The tuning process led to the development of a substantial number of unique FractalNN_RF configurations, each variant being assessed according to five relevant metrics of multivariate regression tasks. The evaluation results are summarized in Table A7, where the third column highlights the optimal values associated with each metric used.
Training on the China, Desharnais, and Kemerer datasets led to the identification of three distinct configurations of the FractalNN_RF hybrid architecture, which together account for the optimal values of the five metrics used to evaluate multivariate regression tasks. For the China dataset, one configuration achieved the lowest values for MAE and MSLE, another excelled in terms of RMSE and CD, while a third minimized MdAE. For the Desharnais and Kemerer datasets, one configuration distinguished itself by simultaneously delivering the best RMSE, CD, and MSLE. For the Maxwell dataset, two high-performing FractalNN_RF configurations were identified; the first showed superior results across four of the five metrics (MAE, RMSE, CD, and MSLE), while the second was notable for its efficiency in reducing MdAE.

5.5.4. Extreme Learning Machine Combined with Random Forests

The last proposed model employs a hybrid architecture (ELM_RF) that combines ELM for nonlinear feature extraction with an RF regressor responsible for predicting the target variables. The objective of this approach is to capture complex relationships within the data by projecting them into a latent feature space, followed by a robust and interpretable regression stage. The process begins with training the ELM model, where the number of neurons in the hidden layer is varied as a key parameter. This stage produces a latent representation of the data through a nonlinear transformation. The extracted features are then used as input for RF, which is trained to predict the target vector. The RF regressor is chosen for its robustness to noise and is fine-tuned by varying three hyperparameters: the number of estimators, the maximum tree depth, and the random seed value.
During the hyperparameter tuning process, multiple combinations of predefined values were investigated for the four previously specified parameters of the hybrid ELM_RF architecture. The number of neurons in the hidden layer of ELM model was varied using ten values ranging from 100 to 1000, in increments of 100 (Table A8).
Likewise, for the RF model, 10 values were selected for the number of estimators, ranging from 80 to 800, with a step size of 80. The maximum tree depth was tested using the following five discrete levels: 4, 8, 16, 32, and 64. The random_state parameter, used to control randomness and ensure reproducibility, was evaluated with the following four values: 21, 42, 84, and 168. This strategy for exploring the hyperparameter space led to the generation of a significant number of distinct ELM_RF configurations, each of which was assessed based on five multivariate regression performance metrics. The results of these experiments are summarized in Table A8, where the third column presents the optimal values obtained for each metric, thus highlighting the superior performance of the corresponding configuration.
Training the ELM_RF model on the China and Kemerer datasets led to the identification of two distinct configurations of the hybrid ELM_RF architecture for each dataset, which together account for the optimal values of the five metrics used to evaluate multivariate regression tasks. For the China dataset, the first configuration demonstrated superior results for four out of the five metrics (MAE, RMSE, CD, and MSLE), while the second stood out for its effectiveness in reducing the MdAE value. In the case of the Kemerer dataset, one configuration achieved the lowest values for MAE and MdAE, whereas the other excelled in terms of RMSE, CD, and MSLE. For the Desharnais dataset, one configuration stood out by simultaneously delivering the best RMSE, CD, and MSLE. For the Maxwell dataset, a single high-performing ELM_RF configuration was identified, showing superior results across all five evaluated metrics.
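The feature-extraction stage of ELM_RF can be sketched with NumPy as follows. This is a generic ELM under stated assumptions rather than the study's implementation: the hidden weights are drawn from a standard normal distribution, the activation is tanh, and the output weights are obtained with the Moore-Penrose pseudoinverse; none of these choices is specified in the text. The class name `ELM`, its `transform` method, and the synthetic data are hypothetical, and the configuration count assumes an exhaustive grid.

```python
import numpy as np


class ELM:
    """Basic Extreme Learning Machine with a random, untrained hidden layer."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.seed = seed

    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Hidden weights and biases are random and stay fixed (no backpropagation).
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self.transform(X)
        # Output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(H) @ np.asarray(Y, dtype=float)
        return self

    def transform(self, X):
        """Latent features: nonlinear random projection of the inputs."""
        return np.tanh(np.asarray(X, dtype=float) @ self.W + self.b)

    def predict(self, X):
        return self.transform(X) @ self.beta


# Search-space size if all listed values are combined (an assumption):
# hidden neurons x estimators x max depth x random seed.
N_CONFIGS = 10 * 10 * 5 * 4

# Synthetic data with two targets; NOT taken from the paper.
rng = np.random.default_rng(1)
X = rng.random((40, 5))
Y = np.column_stack([X.sum(axis=1), X[:, 0] * X[:, 1]])

elm = ELM(n_hidden=100).fit(X, Y)
H = elm.transform(X)                 # features that the RF stage would consume
train_mae = np.mean(np.abs(elm.predict(X) - Y))
```

The matrix returned by `transform` is what a multi-output Random Forest regressor would receive as input in the second stage of the cascade.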

6. Comparative Analysis of Implemented Artificial Neural Networks

To determine the most suitable estimation model based on the dataset size (small-sized, medium-sized, and large-sized), four types of artificial neural networks and four types of hybrid neural networks were compared using the values of five evaluation metrics: MAE, MdAE, RMSE, CD, and MSLE. Table 8 highlights the optimal metric values for the eight prediction methods applied to the four datasets: China (large-sized), Desharnais (medium-sized), Kemerer (small-sized), and Maxwell (medium-sized). The results are compared both among the proposed models and with the values reported in previous studies.
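For reference, the five metrics can be computed as in the NumPy sketch below. The definitions are the standard ones; how the study aggregates the two outputs (effort and duration) is not stated, so the uniform average over outputs used here is an assumption, as is the function name `regression_metrics`. MSLE is computed on log(1 + y), which requires non-negative values.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, MdAE, RMSE, CD (R^2), and MSLE, averaged uniformly over outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim == 1:                       # treat 1D input as a single output
        y_true = y_true.reshape(-1, 1)
        y_pred = y_pred.reshape(-1, 1)

    err = y_true - y_pred
    abs_err = np.abs(err)
    ss_res = np.sum(err ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    log_err = np.log1p(y_true) - np.log1p(y_pred)

    per_output = {
        "MAE": abs_err.mean(axis=0),
        "MdAE": np.median(abs_err, axis=0),
        "RMSE": np.sqrt(np.mean(err ** 2, axis=0)),
        "CD": 1.0 - ss_res / ss_tot,
        "MSLE": np.mean(log_err ** 2, axis=0),
    }
    # Uniform average across the output columns (assumption).
    return {name: float(np.mean(values)) for name, values in per_output.items()}


# Tiny hand-checkable example: the second output is predicted perfectly.
y_true = [[0.0, 0.0], [1.0, 1.0]]
y_pred = [[0.0, 0.0], [0.5, 1.0]]
scores = regression_metrics(y_true, y_pred)
```

In this example the first output has MAE 0.25 and CD 0.5 while the second has MAE 0 and CD 1, so the averaged scores are MAE = 0.125 and CD = 0.75. Lower is better for MAE, MdAE, RMSE, and MSLE, whereas higher is better for CD.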
For the China dataset, the best performance, reflected by the minimum values of the MAE, MdAE, RMSE, and MSLE metrics, as well as the maximum value of the CD, was achieved by the ELM_RF model. The MAE (0.0046) and RMSE (0.0137) values indicate a high level of accuracy, with very low prediction errors. Although the RMSE is about three times the MAE, suggesting a few instances of more pronounced errors, these remain limited overall. The minimum MdAE value (0.0013) confirms that model performance is not significantly affected by abrupt variations. Additionally, the very low MSLE value (0.0001) reflects an extremely small logarithmic error, which is a strong indicator of the model's robustness. The CD, with a value of 0.9834, indicates an excellent fit between the predicted and actual values.
ELM_RF architecture, which combines ELM and RF models, leverages both the generalization capability of ELM and the robustness of RF to data variability. Therefore, this method may be considered a suitable choice for large-sized datasets, such as the China dataset.
For the Desharnais and Maxwell datasets, the FractalNN_RF architecture achieves the best performance across all evaluated metrics. For the Desharnais dataset, the MdAE value (0.0236), being lower than the MAE (0.0573), indicates that most prediction errors are small, although the average is influenced by a few larger deviations. The RMSE (0.0777), while still low, is higher than the MAE, confirming the presence of a few isolated cases with more significant errors. The CD value of 0.7135 reflects a good, though not perfect, fit between the predicted and actual values, suggesting room for improvement in capturing data relationships, an aspect that is typically challenging to optimize given the medium size of the Desharnais dataset. The MSLE value (0.0036) confirms that the proportional errors in the predictions are very small. For the Maxwell dataset, the MAE value (0.0957) is moderate, while the lower MdAE value (0.0328) indicates that the majority of predictions are accurate, with a few outliers increasing the overall mean error. The RMSE (0.1629), being higher than the MAE, further confirms the presence of certain predictions with notable deviations. The CD, with a value of 0.5320, suggests that the model does not fully capture the underlying relationships within the data, a limitation that is often challenging to address in the context of medium-sized datasets. The MSLE value (0.0118) indicates low errors in the logarithmic space, reflecting good proportional prediction performance by the model.
Hybrid FractalNN_RF architecture, which combines Fractal Neural Network with the Random Forests algorithm, represents an advanced approach that merges deep learning capabilities with the robustness of ensemble techniques. It is ideal for medium-sized datasets, such as the Desharnais and Maxwell datasets, where complex nonlinear relationships and data variations can significantly impact model performance.
For the Kemerer dataset, the MLP architecture achieves the best performance across all five metrics, while the hybrid networks prove to be ineffective on this small-sized dataset. The MAE value (0.0173) highlights a high level of overall predictive accuracy, making the model suitable for applications with strict precision requirements. The MdAE (0.0202), being slightly higher than the MAE, suggests a relatively uniform distribution of errors, without significant outliers. The RMSE (0.0203), close to the MAE and nearly identical to the MdAE, indicates that there are no large errors distorting the average, thus confirming the consistency and balance of the model predictions. The CD (0.9274) represents an excellent score, suggesting that the model appropriately captures the relevant relationships between variables, even in the context of a small-sized dataset. The MSLE (0.0003), with an extremely low value, reflects a high proportional accuracy. Moreover, the comparative analysis of the CD values obtained for the eight proposed models on this dataset reveals that hybrid networks generally exhibit inferior performance compared to standard models. Taken together, the metric values show that the proposed MLP model delivers balanced and reliable predictions. These characteristics make this model a suitable candidate for estimating software development effort and duration, particularly when working with small-sized datasets, such as the Kemerer dataset.
The superior performance of the simpler MLP model on the small-sized Kemerer dataset is attributable to its ability to avoid overfitting, a common issue in complex architectures with numerous parameters. In contexts with reduced datasets, an MLP model, characterized by a lower complexity, strikes an optimal balance between expressiveness and generalization, effectively capturing underlying patterns without amplifying noise. This observation is consistent with the bias–variance trade-off theory; for small datasets, models with lower complexity typically generalize better, whereas for medium or large datasets, more complex architectures are able to exploit the richer information available and achieve superior accuracy.
To reinforce the previous observations, the Wilcoxon signed-rank test was applied to the small-sized Kemerer dataset in order to assess the statistical significance of the performance differences between the MLP architecture and the proposed hybrid architectures (MLP_RF, DFCNN_RF, FractalNN_RF, and ELM_RF). Each of the five models was trained over ten independent runs, and the Wilcoxon test was performed to compare their performance across five evaluation metrics: MAE, MdAE, RMSE, CD, and MSLE. The resulting p-values, presented in Table 9, range from 0.0005 to 0.0024, indicating that all observed performance differences are statistically significant at the 1% significance level (p < 0.01). These findings provide empirical evidence that the simpler MLP architecture consistently outperforms the more complex hybrid models on small-sized datasets, such as the Kemerer dataset.
This result supports the hypothesis that architectural simplicity enhances generalization capability under conditions of limited data availability, whereas hybrid models, due to their higher structural complexity, tend to be more prone to overfitting.
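A paired comparison of this kind can be run with `scipy.stats.wilcoxon`, as in the sketch below. The per-run MAE values are invented purely to show the mechanics and are not results from the study; pairing the runs by index is also an assumption, since the text does not describe how the ten runs of each model were matched.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical MAE values from ten paired runs; NOT results from the paper.
mae_mlp = np.array([0.0170, 0.0181, 0.0165, 0.0178, 0.0172,
                    0.0169, 0.0183, 0.0175, 0.0167, 0.0180])
mae_hybrid = np.array([0.0270, 0.0301, 0.0255, 0.0328, 0.0282,
                       0.0299, 0.0263, 0.0315, 0.0327, 0.0250])

# Two-sided Wilcoxon signed-rank test on the paired differences.
statistic, p_value = wilcoxon(mae_mlp, mae_hybrid)

# Decision at the 1% level used in the text.
significant = p_value < 0.01
```

The same call would be repeated for each metric and for each hybrid model compared against the MLP, giving one p-value per cell of a table such as Table 9.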
The proposed architectures were also compared with those presented in previous studies, as reviewed in Section 2. The optimal architectures developed for each dataset outperform the values reported in the existing literature. Table 8 highlights the superiority of the proposed hybrid architectures (ELM_RF for large-sized datasets and FractalNN_RF for medium-sized datasets) over traditional approaches and prior research results. In the case of small-sized datasets, the traditional MLP architecture still yields better performance.

7. Conclusions

This study focused on developing predictive models for estimating the effort and duration required to complete software projects, tailored to dataset size. An analysis of the software engineering domain revealed that open-source datasets within this field can be categorized into three main groups: small-sized, medium-sized, and large-sized datasets. In this study, four datasets were utilized: the large-sized China dataset, the medium-sized Desharnais and Maxwell datasets, and the small-sized Kemerer dataset.
For this purpose, eight artificial neural network architectures were proposed: four traditional models (MLP, DFCNN, FractalNN, and KELM) and four hybrid models, which were obtained by combining these with the RF algorithm, denoted as MLP_RF, DFCNN_RF, FractalNN_RF, and ELM_RF. The proposed architectures were analyzed and compared based on the following five evaluation metrics: MAE, MdAE, RMSE, CD, and MSLE.
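As an illustration of the five evaluation metrics, the sketch below computes them from their standard definitions using only the Python standard library. The function name `regression_metrics` and the sample values are hypothetical, and the formulas are assumed to match those used in the study: CD is taken as the usual coefficient of determination R², and MSLE uses log(1 + y), as scikit-learn's `mean_squared_log_error` does.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return MAE, MdAE, RMSE, CD (R^2) and MSLE for paired value lists."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

    # Total sum of squares around the mean of the observed values.
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)

    # Squared differences of log(1 + y); requires values greater than -1.
    sq_log_err = [(math.log1p(t) - math.log1p(p)) ** 2
                  for t, p in zip(y_true, y_pred)]

    return {
        "MAE": mean(abs_err),
        "MdAE": median(abs_err),
        "RMSE": math.sqrt(mean(sq_err)),
        "CD": 1 - sum(sq_err) / ss_tot,
        "MSLE": mean(sq_log_err),
    }


# Hypothetical observed and predicted values (illustrative only).
metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower values are better for MAE, MdAE, RMSE, and MSLE, whereas a CD closer to 1 indicates a better fit.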
The comparative analysis of the eight proposed architectures shows that the hybrid neural network model ELM_RF is the most effective on large-sized datasets. This conclusion is supported by its superior performance on the China dataset, where it achieved the best overall results across the evaluation metrics. The integration of the Extreme Learning Machine with the Random Forests algorithm appears to enhance the model's capacity for nonlinear pattern recognition and to reduce the tendency toward overfitting, resulting in improved generalization and predictive stability.
In contrast, for small-sized datasets, the traditional MLP architecture was the most suitable approach, as it yielded the best results on the Kemerer dataset. Furthermore, statistical validation using the Wilcoxon signed-rank test confirmed the robustness and significance of these results, reinforcing the conclusion that MLP provides a reliable and efficient solution for estimating effort and duration in small-sized software project datasets.
This paper also proposed a novel hybrid neural network, FractalNN_RF, which combines a Fractal Neural Network with the RF algorithm for regression tasks. Compared with the seven other prediction architectures, this optimized hybrid architecture demonstrated superior results on medium-sized datasets, yielding the highest accuracy on the Desharnais and Maxwell datasets.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available in: China dataset https://zenodo.org/record/268446 [27], Desharnais dataset [28], Kemerer dataset https://zenodo.org/record/268464 [29], and Maxwell dataset https://zenodo.org/records/268461 [30], all accessed on 29 July 2025.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

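Table A1. MLP results.
| Dataset | Metric | Optimum | Param. e | Param. n | Effort min | Effort max | Effort mean | Duration min | Duration max | Duration mean |
|---|---|---|---|---|---|---|---|---|---|---|
| China | MAE | 0.0317 | 300 | 140 | 1472.383 | 24,444.827 | 3366.847 | 4.786 | 20.541 | 8.681 |
| | MdAE | 0.0106 | 800 | 160 | 1159.986 | 20,928.541 | 3435.991 | 3.449 | 22.809 | 8.514 |
| | RMSE | 0.0649 | 200 | 200 | 1628.623 | 25,036.674 | 3390.352 | 4.744 | 22.722 | 8.258 |
| | CD | 0.6270 | | | | | | | | |
| | MSLE | 0.0025 | | | | | | | | |
| Desharnais | MAE | 0.0647 | 400 | 160 | 348.765 | 10,623.076 | 4443.913 | 7.035 | 18.732 | 11.543 |
| | MdAE | 0.0345 | | | | | | | | |
| | RMSE | 0.0887 | 500 | 120 | 1243.795 | 9210.437 | 4239.581 | 4.337 | 18.825 | 10.916 |
| | CD | 0.6267 | | | | | | | | |
| | MSLE | 0.0048 | | | | | | | | |
| Kemerer | MAE | 0.0173 | 200 | 40 | 51.659 | 286.898 | 190.357 | 8.006 | 13.138 | 11.261 |
| | MdAE | 0.0202 | | | | | | | | |
| | RMSE | 0.0203 | | | | | | | | |
| | CD | 0.9274 | | | | | | | | |
| | MSLE | 0.0003 | | | | | | | | |
| Maxwell | MAE | 0.1139 | 400 | 140 | 1758.471 | 16,327.258 | 8915.971 | 3.687 | 26.768 | 17.309 |
| | MdAE | 0.0730 | | | | | | | | |
| | RMSE | 0.2151 | | | | | | | | |
| | CD | 0.1840 | | | | | | | | |
| | MSLE | 0.0213 | | | | | | | | |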
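Table A2. DFCNN results.
| Dataset | Metric | Optimum | Param. e | Param. n | Effort min | Effort max | Effort mean | Duration min | Duration max | Duration mean |
|---|---|---|---|---|---|---|---|---|---|---|
| China | MAE | 0.0099 | 400 | 120 | 45.534 | 36,469.492 | 3280.766 | 0.996 | 42.135 | 7.783 |
| | MdAE | 0.0028 | 1000 | 160 | 63.133 | 29,416.503 | 3245.035 | 1.285 | 41.082 | 7.899 |
| | RMSE | 0.0205 | 1000 | 20 | 251.189 | 45,330.308 | 3356.681 | 1.329 | 44.121 | 8.524 |
| | CD | 0.9627 | | | | | | | | |
| | MSLE | 0.0003 | | | | | | | | |
| Desharnais | MAE | 0.0735 | 1000 | 60 | 816.708 | 13,523.441 | 4179.362 | 3.534 | 20.528 | 11.112 |
| | MdAE | 0.0298 | 300 | 80 | 975.229 | 14,300.391 | 4271.045 | 2.521 | 21.404 | 9.659 |
| | RMSE | 0.1088 | 900 | 40 | 803.161 | 14,015.801 | 4780.872 | 2.924 | 22.231 | 10.916 |
| | CD | 0.4383 | | | | | | | | |
| | MSLE | 0.0070 | | | | | | | | |
| Kemerer | MAE | 0.0683 | 900 | 200 | 74.731 | 250.584 | 138.352 | 10.633 | 17.550 | 14.263 |
| | MdAE | 0.0389 | 300 | 140 | 61.393 | 257.390 | 135.775 | 10.738 | 17.283 | 14.151 |
| | RMSE | 0.0923 | 900 | 20 | 35.217 | 204.015 | 122.328 | 7.791 | 16.698 | 13.173 |
| | CD | 0.4951 | | | | | | | | |
| | MSLE | 0.0064 | | | | | | | | |
| Maxwell | MAE | 0.1104 | 600 | 20 | 1158.093 | 21,672.662 | 7113.729 | 5.641 | 35.529 | 17.326 |
| | MdAE | 0.0412 | | | | | | | | |
| | RMSE | 0.2049 | 100 | 140 | 2592.819 | 25,223.751 | 11,619.255 | 7.241 | 37.881 | 22.179 |
| | CD | 0.2589 | | | | | | | | |
| | MSLE | 0.0200 | | | | | | | | |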
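Table A3. FractalNN results.
| Dataset | Metric | Optimum | Param. e | Param. f | Effort min | Effort max | Effort mean | Duration min | Duration max | Duration mean |
|---|---|---|---|---|---|---|---|---|---|---|
| China | MAE | 0.0135 | 300 | 64 | 127.883 | 31,725.602 | 3615.992 | 2.828 | 27.366 | 8.281 |
| | MdAE | 0.0028 | 400 | 128 | 114.725 | 29,842.343 | 3560.667 | 3.227 | 23.762 | 8.519 |
| | RMSE | 0.0388 | 200 | 32 | 112.775 | 31,570.025 | 3459.542 | 2.612 | 27.602 | 8.401 |
| | CD | 0.8666 | | | | | | | | |
| | MSLE | 0.0008 | | | | | | | | |
| Desharnais | MAE | 0.0749 | 900 | 32 | 514.856 | 14,540.015 | 4375.239 | 2.644 | 20.931 | 10.531 |
| | MdAE | 0.0306 | 600 | 128 | 564.006 | 15,068.696 | 4514.689 | 1.105 | 24.401 | 10.163 |
| | RMSE | 0.1158 | 200 | 64 | 449.581 | 12,471.724 | 4663.424 | 2.733 | 22.814 | 11.081 |
| | CD | 0.3635 | | | | | | | | |
| | MSLE | 0.0077 | | | | | | | | |
| Kemerer | MAE | 0.0410 | 100 | 16 | 45.903 | 259.081 | 164.537 | 8.357 | 14.438 | 11.961 |
| | RMSE | 0.0453 | | | | | | | | |
| | CD | 0.6403 | | | | | | | | |
| | MSLE | 0.0015 | | | | | | | | |
| | MdAE | 0.0291 | 400 | 64 | 43.223 | 233.819 | 144.382 | 11.268 | 16.675 | 13.753 |
| Maxwell | MAE | 0.1078 | 100 | 32 | 417.839 | 26,885.683 | 7444.051 | 1.238 | 34.265 | 20.154 |
| | RMSE | 0.1949 | | | | | | | | |
| | CD | 0.3299 | | | | | | | | |
| | MSLE | 0.0173 | | | | | | | | |
| | MdAE | 0.0438 | 500 | 256 | 137.777 | 16,442.828 | 6952.904 | 2.877 | 27.951 | 16.464 |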
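Table A4. KELM results.
| Dataset | Metric | Optimum | Param. γ | Param. λ | Effort min | Effort max | Effort mean | Duration min | Duration max | Duration mean |
|---|---|---|---|---|---|---|---|---|---|---|
| China | MAE | 0.0075 | 1 | 0.5 | 18.394 | 36,780.821 | 3368.434 | 1.598 | 20.773 | 7.913 |
| | RMSE | 0.0233 | | | | | | | | |
| | CD | 0.9518 | | | | | | | | |
| | MSLE | 0.0002 | | | | | | | | |
| | MdAE | 0.0014 | 10 | 10 | 10.848 | 11,169.811 | 2097.778 | 0.835 | 32.587 | 6.837 |
| Desharnais | MAE | 0.0704 | 5 | 0.05 | 1542.85 | 9829.575 | 4255.124 | 5.775 | 15.651 | 10.307 |
| | RMSE | 0.0955 | | | | | | | | |
| | CD | 0.5672 | | | | | | | | |
| | MSLE | 0.0055 | | | | | | | | |
| | MdAE | 0.0396 | 1 | 0.01 | 1542.85 | 9829.575 | 4255.124 | 5.775 | 15.651 | 10.307 |
| Kemerer | MAE | 0.0271 | 0.01 | 50 | 53.862 | 237.222 | 179.852 | 10.603 | 12.545 | 11.828 |
| | MdAE | 0.0262 | | | | | | | | |
| | RMSE | 0.0306 | | | | | | | | |
| | CD | 0.8352 | | | | | | | | |
| | MSLE | 0.0007 | | | | | | | | |
| Maxwell | MAE | 0.1066 | 1 | 0.1 | 1050.519 | 11,953.756 | 5720.109 | 4.981 | 24.895 | 13.491 |
| | MdAE | 0.0417 | 0.05 | 100 | 2387.231 | 2634.266 | 2522.291 | 7.203 | 7.698 | 7.517 |
| | RMSE | 0.2225 | 0.5 | 5 | 638.449 | 12,404.171 | 7152.556 | 6.285 | 23.827 | 15.346 |
| | CD | 0.1266 | | | | | | | | |
| | MSLE | 0.0227 | | | | | | | | |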
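Table A5. MLP_RF results.
| Dataset | Metric | Optimum | Param. e | Param. n | Param. s | Param. d | Param. r | Effort min | Effort max | Effort mean | Duration min | Duration max | Duration mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| China | MAE | 0.0129 | 500 | 700 | 480 | 32 | 21 | 141.075 | 29,762.025 | 3438.424 | 2.65 | 28.425 | 8.726 |
| | MdAE | 0.0022 | 400 | 700 | 720 | 64 | 84 | 116.516 | 29,317.1 | 3477.372 | 2.866 | 28.266 | 8.854 |
| | RMSE | 0.0373 | 300 | 600 | 320 | 32 | 42 | 102.8 | 31,664.045 | 3512.596 | 2.812 | 25.825 | 8.753 |
| | CD | 0.8766 | | | | | | | | | | | |
| | MSLE | 0.0007 | | | | | | | | | | | |
| Desharnais | MAE | 0.0660 | 50 | 500 | 640 | 32 | 42 | 817.026 | 9937.116 | 4798.493 | 3.668 | 16.952 | 10.563 |
| | RMSE | 0.0892 | | | | | | | | | | | |
| | CD | 0.6219 | | | | | | | | | | | |
| | MSLE | 0.0048 | | | | | | | | | | | |
| | MdAE | 0.0256 | 50 | 700 | 560 | 64 | 168 | 812.425 | 13,158.441 | 4535.952 | 3.012 | 20.832 | 11.121 |
| Kemerer | MAE | 0.0482 | 50 | 200 | 400 | 16 | 42 | 81.528 | 244.554 | 161.711 | 6.145 | 15.801 | 12.807 |
| | RMSE | 0.0652 | | | | | | | | | | | |
| | CD | 0.2535 | | | | | | | | | | | |
| | MSLE | 0.0032 | | | | | | | | | | | |
| | MdAE | 0.0379 | 500 | 800 | 720 | 32 | 84 | 220.814 | 225.819 | 223.311 | 10.312 | 12.312 | 11.562 |
| Maxwell | MAE | 0.1033 | 350 | 200 | 640 | 16 | 42 | 2363.051 | 15,093.803 | 7437.453 | 10.050 | 23.351 | 15.752 |
| | MdAE | 0.0355 | 500 | 800 | 800 | 64 | 84 | 2529.422 | 27,173.061 | 9318.676 | 9.588 | 30.211 | 17.203 |
| | RMSE | 0.2079 | 250 | 600 | 480 | 32 | 42 | 3046.750 | 19,160.983 | 8786.932 | 9.403 | 29.183 | 17.402 |
| | CD | 0.2373 | | | | | | | | | | | |
| | MSLE | 0.0206 | | | | | | | | | | | |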
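Table A6. DFCNN_RF results.
| Dataset | Metric | Optimum | Param. e | Param. n | Param. s | Param. d | Param. r | Effort min | Effort max | Effort mean | Duration min | Duration max | Duration mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| China | MAE | 0.0062 | 200 | 200 | 560 | 16 | 42 | 73.133 | 41,748.283 | 3444.124 | 2.025 | 28.151 | 8.019 |
| | MdAE | 0.0017 | 200 | 300 | 640 | 64 | 84 | 73.233 | 39,714.041 | 3448.259 | 1.937 | 28.325 | 8.092 |
| | RMSE | 0.0154 | 50 | 500 | 320 | 32 | 21 | 137.555 | 44,868.622 | 3466.066 | 1.788 | 24.977 | 8.251 |
| | CD | 0.9791 | | | | | | | | | | | |
| | MSLE | 0.0001 | | | | | | | | | | | |
| Desharnais | MAE | 0.0692 | 350 | 300 | 480 | 8 | 84 | 809.551 | 13,588.401 | 4465.083 | 3.001 | 23.901 | 11.421 |
| | MdAE | 0.0299 | 200 | 700 | 720 | 32 | 168 | 765.941 | 13,373.641 | 4068.451 | 3.381 | 21.951 | 9.733 |
| | RMSE | 0.1082 | 150 | 600 | 400 | 16 | 42 | 834.051 | 11,005.166 | 4337.528 | 3.833 | 22.866 | 11.275 |
| | CD | 0.4438 | | | | | | | | | | | |
| | MSLE | 0.0068 | | | | | | | | | | | |
| Kemerer | MAE | 0.0447 | 350 | 200 | 240 | 16 | 21 | 85.567 | 259.569 | 173.862 | 11.081 | 16.721 | 13.731 |
| | RMSE | 0.0637 | | | | | | | | | | | |
| | CD | 0.2865 | | | | | | | | | | | |
| | MSLE | 0.0029 | | | | | | | | | | | |
| | MdAE | 0.0219 | 200 | 400 | 480 | 32 | 84 | 79.811 | 225.368 | 152.586 | 8.401 | 17.528 | 13.162 |
| Maxwell | MAE | 0.0978 | 300 | 800 | 320 | 16 | 42 | 1076.275 | 23,005.712 | 7972.044 | 6.012 | 32.251 | 16.485 |
| | MdAE | 0.0403 | 250 | 900 | 560 | 64 | 168 | 1027.077 | 10,103.794 | 6935.732 | 6.044 | 29.711 | 17.336 |
| | RMSE | 0.1889 | 450 | 400 | 240 | 32 | 42 | 1016.891 | 32,674.305 | 11,638.192 | 5.992 | 35.421 | 20.936 |
| | CD | 0.3702 | | | | | | | | | | | |
| | MSLE | 0.0192 | | | | | | | | | | | |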
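Table A7. FractalNN_RF results.
| Dataset | Metric | Optimum | Param. e | Param. f | Param. s | Param. d | Param. r | Effort min | Effort max | Effort mean | Duration min | Duration max | Duration mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| China | MAE | 0.0072 | 400 | 128 | 640 | 32 | 42 | 357.579 | 37,267.269 | 3381.834 | 0.487 | 34.623 | 8.157 |
| | MSLE | 0.0002 | | | | | | | | | | | |
| | RMSE | 0.0206 | 250 | 64 | 480 | 16 | 21 | 84.741 | 41,627.597 | 3428.631 | 0.176 | 30.517 | 7.864 |
| | CD | 0.9623 | | | | | | | | | | | |
| | MdAE | 0.0024 | 400 | 256 | 800 | 32 | 84 | 4.243 | 35,487.945 | 3528.924 | 2.269 | 43.337 | 8.689 |
| Desharnais | MAE | 0.0573 | 450 | 64 | 560 | 32 | 84 | 1231.655 | 11,791.291 | 4515.252 | 6.455 | 17.315 | 10.765 |
| | MdAE | 0.0236 | 350 | 128 | 720 | 64 | 168 | 1076.951 | 11,757.902 | 4695.727 | 5.701 | 17.675 | 10.248 |
| | RMSE | 0.0777 | 50 | 32 | 400 | 16 | 21 | 1745.625 | 9594.725 | 4338.921 | 7.221 | 19.002 | 12.336 |
| | CD | 0.7135 | | | | | | | | | | | |
| | MSLE | 0.0036 | | | | | | | | | | | |
| Kemerer | MAE | 0.0520 | 300 | 16 | 240 | 8 | 42 | 67.796 | 306.349 | 182.484 | 9.251 | 17.166 | 13.233 |
| | MdAE | 0.0266 | 100 | 256 | 480 | 64 | 84 | 49.214 | 266.254 | 164.273 | 6.783 | 15.955 | 11.261 |
| | RMSE | 0.0587 | 100 | 64 | 320 | 16 | 21 | 131.558 | 260.103 | 181.311 | 10.491 | 15.762 | 13.055 |
| | CD | 0.3955 | | | | | | | | | | | |
| | MSLE | 0.0027 | | | | | | | | | | | |
| Maxwell | MAE | 0.0957 | 150 | 32 | 320 | 32 | 42 | 1535.398 | 27,359.247 | 10,670.121 | 5.025 | 30.826 | 18.061 |
| | RMSE | 0.1629 | | | | | | | | | | | |
| | CD | 0.5320 | | | | | | | | | | | |
| | MSLE | 0.0118 | | | | | | | | | | | |
| | MdAE | 0.0328 | 350 | 64 | 560 | 16 | 168 | 1530.592 | 30,992.084 | 7092.764 | 4.467 | 37.588 | 18.225 |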
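Table A8. ELM_RF results.
| Dataset | Metric | Optimum | Param. n | Param. s | Param. d | Param. r | Effort min | Effort max | Effort mean | Duration min | Duration max | Duration mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| China | MAE | 0.0046 | 700 | 560 | 16 | 21 | 110.801 | 42,085.178 | 3334.394 | 1.264 | 25.832 | 7.506 |
| | RMSE | 0.0137 | | | | | | | | | | |
| | CD | 0.9834 | | | | | | | | | | |
| | MSLE | 0.0001 | | | | | | | | | | |
| | MdAE | 0.0013 | 900 | 720 | 32 | 42 | 142.017 | 35,781.305 | 3241.709 | 1.525 | 22.849 | 7.721 |
| Desharnais | MAE | 0.0651 | 300 | 400 | 32 | 21 | 423.783 | 8425.424 | 4411.749 | 7.632 | 16.437 | 11.287 |
| | MdAE | 0.0277 | 200 | 640 | 64 | 84 | 725.282 | 10,394.393 | 4304.338 | 6.015 | 19.011 | 11.377 |
| | RMSE | 0.0942 | 400 | 320 | 16 | 42 | 172.922 | 8369.193 | 4409.763 | 7.544 | 16.546 | 11.394 |
| | CD | 0.5789 | | | | | | | | | | |
| | MSLE | 0.0051 | | | | | | | | | | |
| Kemerer | MAE | 0.0696 | 500 | 480 | 64 | 168 | 82.58 | 185.4105 | 138.961 | 7.55 | 15.95 | 11.762 |
| | MdAE | 0.0589 | | | | | | | | | | |
| | RMSE | 0.0831 | 300 | 240 | 32 | 84 | 81.535 | 167.012 | 134.947 | 8.002 | 16.154 | 12.649 |
| | CD | 0.2129 | | | | | | | | | | |
| | MSLE | 0.0051 | | | | | | | | | | |
| Maxwell | MAE | 0.0984 | 600 | 400 | 16 | 42 | 2764.9 | 11,393.7 | 5204.428 | 10.0 | 18.9 | 14.581 |
| | MdAE | 0.0391 | | | | | | | | | | |
| | RMSE | 0.2328 | | | | | | | | | | |
| | CD | 0.3437 | | | | | | | | | | |
| | MSLE | 0.0251 | | | | | | | | | | |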

References

  1. Anwar, A. Software Engineering a Journey Beyond Code. J. Comput. Sci. Technol. Stud. 2025, 7, 619–627.
  2. Rus, G.; Andras, I.; Vaida, C.; Crisan, N.; Gherman, B.; Radu, C.; Tucan, P.; Iakab, S.; Hajjar, N.A.; Pisla, D. Artificial intelligence-based hazard detection in robotic-assisted single-incision oncologic surgery. Cancers 2023, 15, 3387.
  3. Ran, D.; Wu, M.; Yang, W.; Xie, T. Foundation Model Engineering: Engineering Foundation Models Just as Engineering Software. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–18.
  4. Gupta, A. Machine Learning and Deep Learning: A Comprehensive Overview. Int. J. Res. Appl. Sci. Eng. Technol. 2025, 13, 1620–1626.
  5. Covaciu, F.; Crisan, N.; Vaida, C.; Andras, I.; Pusca, A.; Gherman, B.; Radu, C.; Tucan, P.; Al Hajjar, N.; Pisla, D. Integration of Virtual Reality in the Control System of an Innovative Medical Robot for Single-Incision Laparoscopic Surgery. Sensors 2023, 23, 5400.
  6. Panoiu, M.; Panoiu, C. Hybrid Deep Neural Network Approaches for Power Quality Analysis in Electric Arc Furnaces. Mathematics 2024, 12, 3071.
  7. Olaniran, O.; Alzahrani, A.R.; Alharbi, N.M.; Alzahrani, A.A. Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification. Mathematics 2025, 13, 1214.
  8. Muscalagiu, I.; Popa, H.E.; Negru, V. Improving the performances of asynchronous search algorithms in scale-free networks using the nogood processor technique. Comput. Inform. 2015, 34, 254–274.
  9. Iordan, A.E. Optimal Solution of the Guarini Puzzle Extension using Tripartite Graphs. IOP Conf. Ser. Mater. Sci. Eng. 2019, 477, 012046.
  10. Supiyandi, S.; Hasanunddin, M. Optimization of Computer Network Performance Using Heuristic Algorithms. J. Comput. Sci. Artif. Intell. Commun. 2025, 1, 12–17.
  11. Rus, G.; Gherman, B.; Nae, L.; Vaida, C.; Pisla, A.; Oprea, E.; Schonstein, C.; Antal, T.; Pisla, D. Fuzzy Logic Systems: From WisdomofAge Mentoring Platform to Medical Robots. In International Workshop on Medical and Service Robots; Springer Nature: Cham, Switzerland, 2023.
  12. Ghiormez, L.; Panoiu, M.; Panoiu, C. Fuzzy Logic Controller for Power Control of an Electric Arc Furnace. Mathematics 2024, 12, 3445.
  13. Tucan, P.; Ciocan, A.; Gherman, B.; Radu, C.; Vaida, C.; Al Hajjar, N.; Chablat, D.; Pisla, D. Design Optimization of a Parallel Robot for Laparoscopic Pancreatic Surgery Using a Genetic Algorithm. Appl. Sci. 2025, 15, 4383.
  14. Panoiu, M.; Panoiu, C.; Ivascanu, P. Power Factor Modelling and Prediction at the Hot Rolling Mills’ Power Supply Using Machine Learning Algorithms. Mathematics 2024, 12, 839.
  15. Covaciu, F.; Tucan, P.; Rus, G.; Pisla, A.; Zima, I.; Gherman, B. Positioning of a Surgical Parallel Robot Using Artificial Intelligence. In Proceedings of the 33rd International Conference on Robotics in Alpe-Adria-Danube Region, Cluj-Napoca, Romania, 5–7 June 2024.
  16. Rajput, Y.; Razi, M.H.; Sharma, A.K. A Comparative Analysis of Different Machine Learning Techniques used in Software Effort Estimation. In Proceedings of the International Conference on Computational Intelligence, Communication Technology and Networking, Ghaziabad, India, 6–7 February 2025.
  17. Hariyanti, E.; Paradista, M.A.; Goyayi, M.L.J.; Shabirina, D.A.; Nurjanah, E.; Husna, O.I.; Yahrani, F.A.S. The Implementation of Machine Learning for Software Effort Estimation: A Literature Review. Khazanah Inform. 2024, 10, 47–57.
  18. Rossi, B.B.; Fontoura, L.M. AI-Based Approaches for Software Tasks Effort Estimation: A Systematic Review of Methods and Trends. In Proceedings of the International Conference on Enterprise Information Systems, Porto, Portugal, 4–6 April 2025.
  19. Salihu, S.A.; Saliu, K.B.; Owoyemi, O.A. A Systematic Literature Review of Machine Learning and AutoML in Software Effort Estimation. In Proceedings of the International Conference on ICT for National Development and Its Sustainability, Ilorin, Nigeria, 21–25 May 2024; pp. 145–168.
  20. Farah, A.; Lahceb, I. Software effort estimation based on long short-term memory and stacked long short term memory. In Proceedings of the International Conference on Contemporary Information Technology and Mathematics, Mosul, Iraq, 30–31 August 2022.
  21. Nisa, M.; Saqlain, M.; Abid, M.; Awais, M.; Stevic, Z. Analysis of Software Effort Estimation by Machine Learning Techniques. Ing. Des Syst. D’Inf. 2023, 28, 1445–1457.
  22. Anitha, C.; Parveen, N. Deep artificial neural network based multilayer gated recurrent model for effective prediction of software development effort. Multimed. Tools Appl. 2024, 83, 66869–66895.
  23. Ali, S.S.; Ren, J.; Wu, J.; Zhang, K.; Chao, L. Advancing Software Project Effort Estimation: Leveraging a NIVIM for Enhanced Preprocessing. J. Softw. Evol. Process 2025, 37, e2745.
  24. Rahman, M.; Goncalves, T.; Sarwar, H. Review of Existing Datasets used for Software Effort Estimation. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 921–931.
  25. Abedu, S.; Mensah, S.; Boafo, F. An empirical Study on Small-Sized Datasets Based on Eubank’s Optimal Spacing Theorem. SN Comput. Sci. 2025, 6, 1.
  26. Eubank, R.L. A Density-Quantile Function Approach to Optimal Spacing Selection. Ann. Stat. 1981, 9, 494–500.
  27. Zenodo|China: Effort Estimation Dataset. Available online: https://zenodo.org/record/268446 (accessed on 10 May 2025).
  28. Desharnais, J.M. Analyse Statistique de la Productivitie des Projets Informatique a Partie de la Technique des Point des Function. Master’s Thesis, University of Montreal, Montreal, QC, Canada, 1999.
  29. Kemerer Zenodo|Kemerer. Available online: https://zenodo.org/record/268464 (accessed on 11 May 2025).
  30. Maxwell Zenodo|Maxwell. Available online: https://zenodo.org/records/268461 (accessed on 20 May 2025).
  31. Covaciu, F.; Gherman, B.; Vaida, C.; Pisla, A.; Tucan, P.; Caprariu, A.; Pisla, D. A Combined Mirror–EMG Robot-Assisted Therapy System for Lower Limb Rehabilitation. Technologies 2025, 13, 227.
  32. Iordan, A.E. Usage of Stacked Long Short-Term Memory for Recognition of 3D Analytic Geometry Elements. In Proceedings of the International Conference on Agents and Artificial Intelligence, Lisbon, Portugal, 3–5 February 2022.
  33. Larsson, G.; Maire, M.; Shakhnarovich, G. FractalNet: Ultra-Deep Neural Networks without Residuals. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
  34. Yu, S. An Efficient Kernel Extreme Learning Machine Approach for Bankruptcy Prediction. J. Comput. Sci. Artif. Intell. 2025, 2, 72–77.
  35. Lucchese, L.V.; Oliveira, G.; Pedrollo, O. A hybrid Random Forests and Artificial Neural Networks Bagging Ensemble for Landslide Susceptibility Modelling. Geocarto Int. 2022, 37, 16492–16511.
  36. Athya, T.; Singh, R. Use of Artificial Neural Network in Engineering: A Review. Int. J. Res. Appl. Sci. Eng. Technol. 2025, 13, 7200–7205.
  37. Bazilevskiy, M.P. Optimization Problem of Constructing Linear Regressions with a Minimum Value of the Mean Absolute Error on Test Sets. Model. Data Anal. 2024, 14, 91–103.
  38. Iordan, A.E. Development of an Interactive Environment used for Simulation of Shortest Paths Algorithms. Ann. Fac. Eng. Hunedoara 2012, 10, 97.
  39. Panoiu, C.; Militaru, G.; Panoiu, M. Real-Time Video Processing for Measuring Zigzag Length of Pantograph–Catenary Systems Based on GPS Correlation. Appl. Sci. 2024, 14, 9252.
  40. Das, M.; Das, P.; Akram, W.; Chatterjee, S. A Comparative Analysis of Linear Regression Techniques: Evaluating Predictive Accuracy and Model Effectiveness. Int. J. Innov. Sci. Res. Technol. 2025, 10, 127–139.
  41. Kumar, A.; Sen, S.; Sinha, S. Support vector machine-based prediction model for the compressive strength for concrete reinforced with waste plastic and fly ash. Asian J. Civ. Eng. 2025, 26, 1429–1447.
  42. Ali, A.; Amin, M.Z. Hands-On Machine Learning with Scikit-Learn, 1st ed.; Amazon Kindle Direct Publishing: Seattle, WA, USA, 2019.
  43. Iordan, A.; Savii, G.; Panoiu, M.; Panoiu, C. Development of a dynamical software for teaching plane analytical geometry. In Proceedings of the International Conference on Engineering Education, Heraklion, Greece, 22–24 July 2008.
  44. Palani, K.; Ankalagi, N.; Karunya, T.; Wankhede, D. Python Programming Essentials, 1st ed.; Global Scholars Press: Chiplun, India, 2025.
  45. Lafta, N.A. A Comprehensive Analysis of Keras: Enhancing Deep Learning Applications in Network Engineering. Babylon. J. Netw. 2023, 2023, 94–100.
  46. Ramchandani, M.; Khandare, H.; Singh, P.; Rajak, P.; Suryawanshi, N.; Jangde, A.S.; Arya, L.; Kumar, P.; Sahu, M. Survey: Tensorflow in Machine Learning. J. Phys. Conf. Ser. 2022, 2273, 012008.
  47. Hunt, J. Introduction to Matplotlib. In Advanced Guide to Python 3 Programming; Springer: Cham, Switzerland, 2019; Volume 5, pp. 35–42.
  48. Iordan, A.E. An optimized LSTM neural network for accurate estimation of software development effort. Mathematics 2024, 12, 200.
  49. Shakhovska, N.; Shymanskyi, V.; Prymachenko, M. FractalNet-LSTM Model for Time Series Forecasting. Comput. Mater. Contin. 2025, 82, 4469.
  50. Jiang, J. The eye of artificial intelligence—Convolutional Neural Networks. Appl. Comput. Eng. 2024, 76, 273–279.
Figure 1. UML use case diagram.
Table 1. Datasets dimensions.
| Dataset | Projects | Attributes | Duration unit | Effort unit |
|---|---|---|---|---|
| China | 499 | 15 | Months | Person-hours |
| Desharnais | 81 | 10 | Months | Person-hours |
| Kemerer | 15 | 7 | Months | Person-months |
| Maxwell | 62 | 27 | Months | Person-months |
Table 2. China dataset used attributes.
| Attribute | Description | Min | Max | Mean | Std |
|---|---|---|---|---|---|
| AFP | Adjusted function points | 9 | 17,518 | 486.857 | 1059.171 |
| Input | Function points of input | 0 | 9404 | 167.098 | 486.3386 |
| Output | Function points of external output | 0 | 2455 | 113.601 | 221.2744 |
| Enquiry | Function points of external output enquiry | 0 | 952 | 61.6012 | 105.4228 |
| File | Function points of internal logical files | 0 | 2955 | 91.2344 | 210.271 |
| Interface | Function points of external interface added | 0 | 1572 | 24.2344 | 85.041 |
| Added | Function points of added functions | 0 | 13,580 | 360.3547 | 829.8423 |
| Changed | Function points of changed functions | 0 | 5193 | 85.0621 | 290.857 |
| PDR_AFP | Productivity delivery rate | 0.3 | 83.8 | 11.77054 | 12.10565 |
| PDR_UFP | Productivity delivery rate | 0.3 | 96.6 | 12.07976 | 12.81871 |
| NPDR_AFP | Normalized productivity delivery rate | 0.4 | 101 | 13.26974 | 14.00984 |
| NPDU_UFP | Normalized productivity delivery rate | 0.4 | 108.3 | 13.62625 | 14.84342 |
| Resource | Team type | 1 | 4 | 1.458918 | 0.823729 |
| Duration | Total elapsed time for software completion | 1 | 84 | 8.719238 | 7.347058 |
| Effort | Summary work report | 26 | 54,620 | 3921.048 | 6480.856 |
Table 3. Desharnais dataset used attributes.
| Attribute | Description | Min | Max | Mean | Std |
|---|---|---|---|---|---|
| TeamExp | Development team experience | −1 | 4 | 2.185 | 1.415 |
| ManagerExp | Manager experience | −1 | 7 | 2.531 | 1.644 |
| Transactions | Number of the logical transactions | 9 | 886 | 182.123 | 144.035 |
| Entities | Number of logical files or data entities | 7 | 387 | 122.333 | 84.882 |
| PointsNonAdjust | Unadjusted function points | 73 | 1127 | 304.457 | 180.210 |
| Adjustment | Adjustment factor | 5 | 52 | 27.630 | 10.592 |
| PointsAdjust | Adjusted function points | 62 | 1116 | 289.235 | 185.761 |
| Language | Used programming language | 1 | 3 | 1.556 | 0.707 |
| Duration | Project schedule in months | 1 | 39 | 11.667 | 7.425 |
| Effort | Total effort for software completion | 546 | 23,940 | 5046.309 | 4418.767 |
Table 4. Kemerer dataset used attributes.
| Attribute | Description | Min | Max | Mean | Std |
|---|---|---|---|---|---|
| Language | Used programming language | 1 | 3 | 1.200 | 0.561 |
| Hardware | Hardware platform type or complexity | 1 | 6 | 2.333 | 1.676 |
| KSLOC | Size of the software project | 39 | 449.9 | 186.573 | 136.817 |
| AdjFP | Adjusted function points | 99.9 | 2306.8 | 999.140 | 589.592 |
| RAWFP | Raw function points | 97 | 2284 | 993.867 | 597.426 |
| Duration | Total elapsed time for software completion | 5 | 31 | 14.267 | 7.545 |
| Effort | Total amount of required effort | 23.2 | 1107.31 | 219.247 | 263.055 |
Table 5. Maxwell dataset used attributes.
| Attribute | Description | Min | Max | Mean | Std |
|---|---|---|---|---|---|
| App | Application type | 1 | 5 | 2.3548 | 0.9933 |
| Har | Hardware platform | 1 | 5 | 2.6129 | 0.9976 |
| Dba | Used database management system | 0 | 4 | 1.0322 | 0.4423 |
| Ifc | Used user interface technology | 1 | 2 | 1.9354 | 0.2476 |
| Source | Used source code management system | 1 | 2 | 1.8709 | 0.3379 |
| Nlan | Different programming languages used | 1 | 4 | 2.5483 | 1.0191 |
| T01 | Customer participation | 1 | 5 | 3.0483 | 0.9988 |
| T02 | Development environment adequacy | 1 | 5 | 3.0483 | 0.7112 |
| T03 | Staff availability | 2 | 5 | 3.0322 | 0.8864 |
| T04 | Standards use | 2 | 5 | 3.1935 | 0.6975 |
| T05 | Methods use | 1 | 5 | 3.0483 | 0.7112 |
| T06 | Tools use | 1 | 4 | 2.9032 | 0.6944 |
| T07 | Software’s logical complexity | 1 | 5 | 3.2419 | 0.8996 |
| T08 | Requirements volatility | 2 | 5 | 3.8064 | 0.9553 |
| T09 | Quality requirements | 2 | 5 | 4.0645 | 0.7437 |
| T10 | Efficiency requirements | 2 | 5 | 3.6129 | 0.8935 |
| T11 | Installation requirements | 2 | 5 | 3.4193 | 0.9842 |
| T12 | Staff analysis skills | 2 | 5 | 3.8225 | 0.6900 |
| T13 | Staff application knowledge | 1 | 5 | 3.0645 | 0.9559 |
| T14 | Staff tool skills | 1 | 5 | 3.2580 | 1.0071 |
| T15 | Staff team skills | 1 | 5 | 3.3387 | 0.7453 |
| Telonuse | Whether or not the project uses Telon tool | 0 | 1 | 0.2419 | 0.4317 |
| Duration | Project duration in months | 4 | 54 | 17.2096 | 10.6511 |
| Effort | Total amount of effort expended on the project | 583 | 63,694 | 8223.21 | 10,499.9 |
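Table 6. Real effort values.
| Dataset | Training number | Training min | Training max | Training mean | Training std | Testing number | Testing min | Testing max | Testing mean | Testing std |
|---|---|---|---|---|---|---|---|---|---|---|
| China | 374 | 26 | 54,620 | 4089.305 | 6684.16 | 125 | 89 | 49,034 | 3417.624 | 5826.511 |
| Desharnais | 60 | 651 | 23,940 | 5241.55 | 4714.124 | 21 | 546 | 14,987 | 4488.476 | 3478.961 |
| Kemerer | 11 | 23.2 | 1107.31 | 222.919 | 306.831 | 4 | 72 | 287 | 209.15 | 94.446 |
| Maxwell | 46 | 796 | 39,479 | 7461.609 | 3478.961 | 16 | 583 | 63,694 | 10,412.81 | 15,517.68 |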
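Table 7. Real duration values.
| Dataset | Training number | Training min | Training max | Training mean | Training std | Testing number | Testing min | Testing max | Testing mean | Testing std |
|---|---|---|---|---|---|---|---|---|---|---|
| China | 374 | 1 | 48 | 8.581 | 6.569 | 125 | 1 | 84 | 9.132 | 9.313 |
| Desharnais | 60 | 1 | 39 | 12.366 | 7.697 | 21 | 3 | 27 | 9.666 | 6.327 |
| Kemerer | 11 | 5 | 20 | 12.002 | 4.921 | 4 | 5 | 31 | 19.750 | 11.412 |
| Maxwell | 46 | 4 | 54 | 16.543 | 10.765 | 16 | 6 | 45 | 19.125 | 10.410 |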
Table 8. Optimal values of used metrics.
| Dataset | Method | MAE | MdAE | RMSE | CD | MSLE |
|---|---|---|---|---|---|---|
| China | MLP | 0.0317 | 0.0106 | 0.0649 | 0.6270 | 0.0025 |
| | DFCNN | 0.0099 | 0.0028 | 0.0205 | 0.9627 | 0.0003 |
| | FractalNN | 0.0135 | 0.0028 | 0.0388 | 0.8666 | 0.0008 |
| | KELM | 0.0075 | 0.0014 | 0.0233 | 0.9518 | 0.0002 |
| | MLP_RF | 0.0129 | 0.0022 | 0.0373 | 0.8766 | 0.0007 |
| | DFCNN_RF | 0.0062 | 0.0017 | 0.0154 | 0.9791 | 0.0001 |
| | FractalNN_RF | 0.0072 | 0.0024 | 0.0206 | 0.9623 | 0.0002 |
| | ELM_RF | 0.0046 | 0.0013 | 0.0137 | 0.9834 | 0.0001 |
| | Previous studies | 0.012 [20] | | 0.016 [20] | 0.981 [20] | |
| Desharnais | MLP | 0.0647 | 0.0345 | 0.0887 | 0.6267 | 0.0048 |
| | DFCNN | 0.0735 | 0.0298 | 0.1088 | 0.4383 | 0.0070 |
| | FractalNN | 0.0749 | 0.0306 | 0.1158 | 0.3635 | 0.0077 |
| | KELM | 0.0704 | 0.0396 | 0.0955 | 0.5672 | 0.0055 |
| | MLP_RF | 0.0660 | 0.0256 | 0.0892 | 0.6219 | 0.0048 |
| | DFCNN_RF | 0.0692 | 0.0299 | 0.1082 | 0.4438 | 0.0068 |
| | FractalNN_RF | 0.0573 | 0.0236 | 0.0777 | 0.7135 | 0.0036 |
| | ELM_RF | 0.0651 | 0.0277 | 0.0942 | 0.5789 | 0.0051 |
| | Previous studies | 0.0699 [23] | | 0.102 [20] | 0.6432 [23] | |
| Kemerer | MLP | 0.0173 | 0.0202 | 0.0203 | 0.9274 | 0.0003 |
| | DFCNN | 0.0683 | 0.0389 | 0.0923 | 0.4951 | 0.0064 |
| | FractalNN | 0.0410 | 0.0291 | 0.0453 | 0.6403 | 0.0015 |
| | KELM | 0.0271 | 0.0262 | 0.0306 | 0.8352 | 0.0007 |
| | MLP_RF | 0.0482 | 0.0379 | 0.0652 | 0.2535 | 0.0032 |
| | DFCNN_RF | 0.0447 | 0.0219 | 0.0637 | 0.2865 | 0.0029 |
| | FractalNN_RF | 0.0520 | 0.0266 | 0.0587 | 0.3955 | 0.0027 |
| | ELM_RF | 0.0696 | 0.0589 | 0.0831 | 0.2129 | 0.0051 |
| | Previous studies | 0.0754 [22] | | 0.301 [20] | 0.336 [20] | |
| Maxwell | MLP | 0.1139 | 0.0730 | 0.2151 | 0.1840 | 0.0213 |
| | DFCNN | 0.1104 | 0.0412 | 0.2049 | 0.2589 | 0.0200 |
| | FractalNN | 0.1078 | 0.0438 | 0.1949 | 0.3299 | 0.0173 |
| | KELM | 0.1066 | 0.0417 | 0.2225 | 0.1266 | 0.0227 |
| | MLP_RF | 0.1033 | 0.0355 | 0.2079 | 0.2373 | 0.0206 |
| | DFCNN_RF | 0.0978 | 0.0403 | 0.1889 | 0.3702 | 0.0192 |
| | FractalNN_RF | 0.0957 | 0.0328 | 0.1629 | 0.5320 | 0.0118 |
| | ELM_RF | 0.0984 | 0.0391 | 0.2328 | 0.3437 | 0.0251 |
Table 9. Wilcoxon test p-values.
| Methods | MAE | MdAE | RMSE | CD | MSLE |
|---|---|---|---|---|---|
| MLP vs. MLP_RF | 0.0013 | 0.0012 | 0.0014 | 0.0015 | 0.0019 |
| MLP vs. DFCNN_RF | 0.0014 | 0.0005 | 0.0013 | 0.0017 | 0.0017 |
| MLP vs. FractalNN_RF | 0.0019 | 0.0007 | 0.0010 | 0.0020 | 0.0015 |
| MLP vs. ELM_RF | 0.0023 | 0.0024 | 0.0022 | 0.0024 | 0.0023 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Iordan, A.-E. Empirical Comparison of Neural Network Architectures for Prediction of Software Development Effort and Duration. Fractal Fract. 2025, 9, 702. https://doi.org/10.3390/fractalfract9110702
