1. Introduction
One of the most complex and critical tasks faced by a project manager during software development is estimating the total effort and duration needed to meet the initial requirements. It is considered one of the major challenges in software engineering [1], as a more accurate estimation increases the chances of success in software project development, completion, and delivery within the specified budget and schedule.
The diversity of software projects has led to the use of many techniques for effort and duration estimation. To support project managers in their tasks, various algorithms (including those based on artificial intelligence [2]) have been used to increase the accuracy of software development effort and duration estimation. Using a dataset to build predictive models is essential for accurately estimating the effort and duration required in software engineering projects [3]. Currently, there is a wide range of datasets available, such as the Albrecht, COCOMO81, China, Desharnais, ISBSG, Kemerer, Kitchenham, Maxwell, Miyazaki, NASA, and Tukutuku datasets. Deep learning techniques [4] typically perform well on relatively large-scale datasets and have demonstrated strong capabilities in estimating target variables in classification and predictive modeling tasks. Since the size of the datasets listed above is relatively small, this study investigates the efficiency of different types of traditional artificial neural networks [5], as well as hybrid artificial neural network architectures [6] obtained by combining them with Random Forests [7], when applied to such datasets.
To support full comprehension of this research, the article is organized as follows:
Section 1 clarifies the motivation that led to the choice of the research topic.
Section 2 summarizes the current status and evolution of software development effort and duration estimation.
Section 3 includes the rationale for selecting the four datasets used, as well as a description of their structure.
Section 4 describes, in detail, the approach adopted for estimating the effort and duration associated with software development, adapted to the three categories of existing datasets: small-sized, medium-sized, and large-sized.
Section 5 analyzes the results obtained by the artificial neural networks used (traditional and hybrid, in combination with Random Forests) after the parameter tuning process. At the same time, a new hybrid architecture, referred to as FractalNN_RF, is introduced, obtained by combining a Fractal Neural Network with the Random Forests algorithm. By integrating these two paradigms, the resulting model is expected to improve the accuracy of the estimates, increase the stability of the predictions, and provide superior generalization in contexts characterized by structural complexity, especially for medium-sized datasets.
Section 6 compares the implemented architectures, based on five selected metrics, and identifies which architecture is optimal for each type of dataset.
Section 7 summarizes the relevant conclusions, highlighting the implications of applying the proposed intelligent methods to datasets of different sizes.
2. Literature Survey
At present, numerous studies are dedicated to the estimation of software development effort and duration, each with its own strengths and limitations. Over time, various methods have been applied in these studies, including those from statistics [8], graph theory [9], heuristic approaches [10], fuzzy logic [11,12], evolutionary computation [13], machine learning [14], and artificial neural networks [15]. These studies rely either on public datasets or on private datasets belonging to specific organizations. The choice of dataset type significantly influences both the accuracy and the applicability of the resulting estimations. Given the considerable number of studies dedicated to estimating the effort, duration, and cost of software development, the specialized literature includes several articles offering comparative analyses of the results obtained so far.
The research presented in [16] provides a comprehensive analysis of contemporary trends in the field of software effort estimation, with the objective of grounding future research directions. The paper presents a detailed comparison of relevant contributions, organized in reverse chronological order, highlighting the techniques used, the metrics applied, the reported methodological limitations, as well as the main conclusions drawn by various authors. Overall, the analyzed literature reveals the continuous evolution and significant diversification of approaches in software effort estimation.
Study [17] investigates the application of machine learning techniques in estimating the effort required for software development, with a particular focus on the benefits brought by ensemble methods. The research initially identified 558 relevant papers in the field, from which, after a rigorous selection process based on quality criteria, 40 articles were retained for in-depth analysis. The study's conclusions highlight that the integration of ensemble techniques, in both supervised and unsupervised learning, significantly contributes to improving the accuracy of software effort estimations.
The systematic review conducted in study [18] explores the use of ensemble learning techniques and other artificial-intelligence-based strategies in estimating the effort required for software projects. The review focuses on modern methods involving machine learning, neural networks, and large language models, with the primary goal of improving estimation accuracy. Through extensive research conducted in major scientific databases (ACM Digital Library, IEEE Xplore, ScienceDirect, and Scopus), 826 empirical and theoretical studies were identified, 66 of which were selected for detailed analysis. The findings highlight that machine-learning-based methods have become dominant, with most of the analyzed studies confirming their substantial contribution to increasing estimation accuracy and optimizing software project management. In contrast, the use of non-machine-learning artificial intelligence techniques, such as Bayesian networks, remains limited, and the adoption of large language models is still in its early stages of development and application.
The emergence of modern machine learning (ML) techniques and, more recently, of automated machine learning (AutoML) has brought significant transformations to the field of software development effort estimation, contributing to increased accessibility, efficiency, and accuracy in the estimation process. Study [19] presents a systematic literature review on the application of ML and AutoML in software effort estimation, highlighting the relevance of the topic, the methods used, the identified advantages, and the volume of existing research. The adopted methodology involved selecting and analyzing 43 articles published in the last decade, based on the techniques implemented, either conventional machine learning or AutoML. The review findings indicate that in most of the analyzed studies, researchers employed ML techniques for software effort estimation, while the application of AutoML remained limited, thus revealing considerable potential for future research in this area.
The aim of the study presented in [20] is to identify the most effective method for estimating the effort required in software development, using the Long Short-Term Memory (LSTM) and Stacked Long Short-Term Memory (Stacked_LSTM) machine learning algorithms. The study employs six datasets: China, Kitchenham, Kemerer, COCOMO81, Albrecht, and Desharnais. Additionally, it evaluates performance using three metrics: root mean squared error (RMSE), mean absolute error (MAE), and R-squared. The results indicate that the Stacked_LSTM algorithm provides the best performance across all metrics for the China, Kemerer, and Albrecht datasets. In contrast, the LSTM algorithm yielded better results for the Desharnais and Kitchenham datasets. For the China dataset, the performance of the Stacked_LSTM algorithm is demonstrated by the following evaluation metric values: 0.012 for MAE, 0.016 for RMSE, and 0.981 for R-squared. For the Desharnais dataset, the performance of the LSTM algorithm is demonstrated by the following evaluation metric values: 0.076 for MAE, 0.102 for RMSE, and 0.638 for R-squared. For the Kemerer dataset, the performance of the Stacked_LSTM algorithm is demonstrated by the following evaluation metric values: 0.170 for MAE, 0.301 for RMSE, and 0.336 for R-squared. These results demonstrate a high level of model accuracy and a strong ability to explain the variance in the data.
Study [21] presents an analysis of the use of machine learning techniques to improve software effort estimation, based on empirical datasets. Five public datasets were employed: ISBSG, NASA93, COCOMO, Maxwell, and Desharnais. The data were preprocessed by handling missing values and transforming categorical features. Four machine learning regression methods were evaluated: Linear Regression, Gradient Boosting, Random Forests, and Decision Tree. Additionally, correlation-based feature selection was applied to identify relevant feature subsets and reduce dimensionality. The comparative analysis focused on two key metrics, R-squared and root mean squared error, to evaluate prediction accuracy. The results show that the Linear Regression and Random Forests models significantly outperformed the other approaches for the effort estimation task when correlation-based feature selection is applied. The conclusions suggest that correlation-based feature selection can enhance machine learning models for software effort estimation.
Study [22] employed a deep learning model to estimate the effort required for software development. The data preprocessing stage involved cleaning, normalization, and the handling and imputation of missing values. For prediction modeling, an innovative network (Multilayer Perceptron-assisted Honey Bidirectional Gated Recurrent Feed Forward Network) was developed, supported by an adaptive optimization algorithm (A-HBa), which adjusted the model parameters to achieve superior performance. The datasets used include the Albrecht, China, Desharnais, Kemerer, Kitchenham, and COCOMO81 datasets. The evaluation, based on mean absolute error, reported values such as 0.0763 for the China dataset, 0.0737 for the Desharnais dataset, and 0.0754 for the Kemerer dataset.
Article [23] introduces the NIVIM model, a method for imputing missing values based on variational autoencoders (VAE) and synthetic data. By combining contextual and similarity-based information, the model generates an extended dataset (SDEE) and applies contextual imputation to improve data quality. NIVIM stands out for its broad applicability as a preprocessing technique and for its superior performance compared to the VAE, GAIN, kNN, and MICE methods. The proposed model brings statistically significant improvements across six benchmark datasets (ISBSG, Albrecht, COCOMO81, Desharnais, NASA, and UCP), achieving an average reduction in RMSE between 11.05% and 17.72%, and in MAE between 9.62% and 21.96%. For the Desharnais dataset, the performance of the NIVIM model is highlighted by the following evaluation metric values: MAE = 0.0699, RMSE = 0.1134, and CD = 0.6432.
Accurate effort and duration estimation in software development therefore remains one of the most challenging and widely debated issues in the field: it is essential for effective project management, yet its complexity makes it a particularly difficult subject of research.
3. Used Datasets
To accurately assess and estimate the effort and duration required for software product development, researchers in the field of software engineering rely on various datasets collected from real-world projects. Among the most well-known and frequently used datasets in software engineering are Albrecht, COCOMO81, China, Desharnais, ISBSG, Kemerer, Kitchenham, Maxwell, Miyazaki, NASA93, and Tukutuku; a detailed analysis of these datasets is presented in article [24].
Article [25] proposes a classification of datasets into three categories, based on the optimal spacing theorem formulated by Eubank [26]. According to this theorem, the quantile function of the density is divided into four intervals: Q1 (first quartile), Q2 (second quartile), Q3 (third quartile), and Q4 (fourth quartile). The first category corresponds to Q1, the second to Q2 and Q3, and the third to Q4. Based on this classification, an SEE (Software Engineering Estimation) dataset is considered small-sized if it includes at most 43 project instances, medium-sized if it contains between 44 and 146 instances, and large-sized if it contains 147 or more instances. In the present study, to accurately approximate software development effort and duration, four datasets were selected, one for each quartile, as follows: China (Q4), Desharnais (Q3), Kemerer (Q1), and Maxwell (Q2). The selection of these datasets was based on their relevance in the field of software engineering, the public availability of the data, the size of the datasets, and the diversity of the information included (including actual values for software development effort and duration), thus ensuring a solid foundation for the comparative analysis and validation of the proposed methods.
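For clarity, this classification rule can be expressed as a small helper function; the sketch below is purely illustrative and simply encodes the thresholds stated above.
```python
# Illustrative helper encoding the size classification described above
# (thresholds from the text: <= 43 small, 44-146 medium, >= 147 large).
def classify_see_dataset(num_projects: int) -> str:
    if num_projects <= 43:
        return "small-sized"
    if num_projects <= 146:
        return "medium-sized"
    return "large-sized"

# The four datasets used in this study
for name, size in [("China", 499), ("Desharnais", 81), ("Kemerer", 15), ("Maxwell", 62)]:
    print(name, classify_see_dataset(size))
```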
Table 1 presents both the number of projects analyzed in each dataset and the number of attributes used in this study. The last two columns of Table 1 include the units of measurement corresponding to the two output attributes. The unit of measurement used for the attribute representing the development duration of a software project is the calendar month.
For the China and Desharnais datasets, the effort required for software development is measured in person-hours, while for the Kemerer and Maxwell datasets, the unit of measurement for effort is person-months.
Each of the 499 projects included in the China dataset [27] contains a series of essential characteristics for the analysis and estimation of software development projects, represented by numerical values. In this study, fifteen attributes were used (thirteen as input data and two as output data). The meaning of these fifteen attributes, along with their numerical characteristics (minimum value, maximum value, mean, and standard deviation), is presented in Table 2.
The Desharnais dataset [28] contains information extracted from 81 completed software projects, including variables that describe the characteristics of the projects and the teams that developed them. The meaning and numerical characteristics of the ten attributes used in this study (eight input variables and two output variables) are presented in Table 3. Among the eight input variables, the last one, labeled Language, indicates the type of programming language used and is encoded as follows: 1 for first-generation programming languages (e.g., Assembly), 2 for third-generation programming languages (e.g., C++, Java), and 3 for fourth-generation programming languages (e.g., SQL, Oracle Forms).
The Kemerer dataset [29] is a classic dataset used in the estimation of software development effort and duration, built from the acquisition of seven characteristics collected from 15 real software projects. The meaning and numerical characteristics of the seven attributes used in this study (five input variables and two output variables) are presented in Table 4. This table provides details on the distribution of these attributes and their impact on the estimation of software development effort and duration.
Each project in the Maxwell dataset [30] includes a set of essential characteristics for the analysis and estimation of software development projects, represented by numerical values. The Maxwell dataset comprises a total of 62 distinct projects, each containing 26 attributes, of which 22 are independent and 4 are dependent. Out of the four dependent attributes, the following two were used in this study: the effort required to complete the project, measured in person-hours per month, and the total development duration, measured in months. Information about the 24 Maxwell dataset attributes used in this study is presented in Table 5.
With 499 projects, the China collection is considered a large-sized SEE dataset according to the classification provided in [25]. With 15 projects, the Kemerer collection is classified as small-sized according to the previously mentioned classification. Both the Desharnais dataset, which includes data from 81 projects, and the Maxwell dataset, containing data from 62 projects, are classified as medium-sized datasets.
5. Analysis of Implemented Artificial Neural Networks
In most artificial neural networks, parameters are essential variables used to learn the characteristics of the dataset and to adjust the learning process with the goal of achieving optimal performance. The parameter tuning procedure [48], aimed at identifying the ideal configuration for each neural network so that the predicted outcomes are as accurate and efficient as possible, was applied in this research to all the models under analysis.
5.1. Multilayer Perceptron
The MLP implementation was carried out using the MLPRegressor class from the sklearn.neural_network library, with multiple configurations tested based on different hyperparameter values. The MLP architecture used in this study is characterized by the following components:
The input layer contains a number of neurons automatically determined by the number of input attributes in the dataset: 13 neurons for the China dataset, 8 for the Desharnais dataset, 5 for the Kemerer dataset, and 22 for the Maxwell dataset.
The hidden layer consists of a variable number of neurons, ranging from 20 to 200, incremented in steps of 20.
The output layer includes two neurons, each corresponding to one of the two following estimated values: software development effort and duration.
The ReLU activation function is used for the hidden layer.
The model is trained using the Adam optimizer.
The number of epochs varies between 100 and 1000, in increments of 100.
In the parameter tuning process, 10 values were used for parameter e (the number of epochs) and 10 values for parameter n (the number of neurons in the hidden layer), so 100 configurations of the MLP neural network were trained. The performance of each configuration was evaluated using the five selected metrics. In Table A1, the third column presents the optimal values obtained for the five metrics applied to the 100 configurations of the MLP network. The fourth and fifth columns indicate the values of the hyperparameters corresponding to the configurations for which these optimal performances were achieved for each metric. Columns six, seven, and eight in Table A1 present information related to the estimated effort, while the last three columns provide details about the estimated duration, according to the MLP model for which the optimal values of the evaluation metrics were obtained. For the China dataset, three distinct MLP configurations were identified, each corresponding to the optimal values obtained for the five used metrics. It is noteworthy that the optimal values for RMSE, CD, and MSLE were produced by the same hyperparameter configuration. For the Desharnais dataset, two optimal MLP configurations were identified; one configuration yielded the lowest values for MAE and MdAE, while another led to the best results for RMSE, CD, and MSLE. For the Kemerer and Maxwell datasets, a single hyperparameter configuration simultaneously yielded optimal results across all five metrics, suggesting a higher degree of model stability and robustness in these particular contexts.
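For illustration, the grid search over the 100 MLP configurations described above can be sketched as follows; the variable names, the use of a single train/test split, and the selection by MAE alone are simplifying assumptions rather than the exact implementation used in this study.
```python
# Illustrative sketch of the MLP tuning loop described above.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

def tune_mlp(X_train, y_train, X_test, y_test):
    best = None
    for n in range(20, 201, 20):          # neurons in the hidden layer
        for e in range(100, 1001, 100):   # training epochs (max_iter for the Adam solver)
            model = MLPRegressor(hidden_layer_sizes=(n,),
                                 activation="relu",
                                 solver="adam",
                                 max_iter=e,
                                 random_state=42)
            model.fit(X_train, y_train)   # y has two columns: effort and duration
            mae = mean_absolute_error(y_test, model.predict(X_test))
            if best is None or mae < best[0]:
                best = (mae, n, e)
    return best                            # (MAE, n, e) of the best configuration
```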
5.2. Deep Fully Connected Neural Network
DFCNN, a feedforward and fully connected network, was designed with an input layer, ten fully connected hidden layers, and an output layer to solve a multivariate regression problem with two continuous output variables. It was trained with the objective of identifying the best-performing combination of hyperparameters, specifically the number of epochs and the number of nodes, based on the values obtained for the evaluation metrics. The DFCNN architecture used in this study is characterized by the following components:
The input and output layers were designed with the same structure as in the MLP network.
Ten hidden layers were implemented using the Dense class from tensorflow.keras.layers, each using the ReLU activation function. The number of neurons in each hidden layer is consistent within a given configuration and varies across experimental runs, ranging from 20 to 200 neurons, in increments of 20.
The model is trained using the Adam optimizer algorithm, with MSE employed as the loss function.
The number of epochs varies between 100 and 1000, in increments of 100.
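A minimal sketch of this DFCNN architecture, assuming the tensorflow.keras functional API, is given below; apart from the optimizer, loss, and layer structure stated above, the details are illustrative.
```python
# Illustrative DFCNN builder: ten ReLU hidden layers, two linear outputs.
from tensorflow.keras import layers, models

def build_dfcnn(n_inputs: int, n_neurons: int):
    inp = layers.Input(shape=(n_inputs,))
    x = inp
    for _ in range(10):                            # ten fully connected hidden layers
        x = layers.Dense(n_neurons, activation="relu")(x)
    out = layers.Dense(2, activation="linear")(x)  # outputs: effort and duration
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")    # Adam optimizer, MSE loss (as above)
    return model

# Example (Desharnais: 8 input attributes, 100 neurons per hidden layer, 500 epochs):
# model = build_dfcnn(8, 100)
# model.fit(X_train, y_train, epochs=500, verbose=0)
```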
This deep architecture allows for a flexible representation of complex nonlinear relationships within the data, while the systematic hyperparameter tuning aims to identify robust configurations that generalize well across different software engineering datasets. Based on the hyperparameter tuning process, where 10 values were tested for the parameter e (number of training epochs) and 10 values for the parameter n (number of neurons in the hidden layers), a total of 100 distinct DFCNN configurations were trained. In Table A2, the third column reports the optimal values obtained for the five used metrics applied across these 100 configurations.
In the experiments conducted on the China, Desharnais, and Kemerer datasets, three distinct DFCNN configurations were identified, each leading to optimal values for the five analyzed performance metrics. It was observed that the same set of hyperparameters simultaneously yielded the best results for RMSE, CD, and MSLE, indicating the increased robustness of that particular configuration. For the Maxwell dataset, two optimal DFCNN configurations were identified. The first configuration resulted in the lowest values for MAE and MdAE, while the second configuration achieved superior results for RMSE, CD, and MSLE. These findings highlight the variability in model behavior depending on dataset characteristics, as well as the importance of selecting appropriate hyperparameter configurations to ensure optimal performance.
5.3. Fractal Neural Network
The application of Fractal Neural Networks in regression problems remains insufficiently explored, although in the past eight years several studies have used this type of neural network in classification tasks. A recent study [49], published this year, proposes a hybrid variant for time series forecasting; however, in the field of software engineering, these architectures have not yet been utilized.
The innovative FractalNN architecture proposed in this study combines a recursive fractal structure with 1D convolutional blocks, specific to convolutional neural networks [50], for addressing regression problems.
The proposed architecture is composed of the following elements:
An input layer whose dimensionality corresponds to the number of input attributes in the dataset (13 neurons for the China dataset, 8 for Desharnais, 5 for Kemerer, and 22 for Maxwell).
A fractal convolutional block, which is defined as a recursive structure controlled by a depth parameter (equal to four), generating two parallel branches at each level. The short branch applies a single Conv1D layer, while the long branch recursively applies two fractal blocks to the same input, illustrating self-similarity. The outputs of the two branches are merged via averaging, facilitating the integration of information across multiple scales. The convolutional layers were implemented using the Conv1D class from the tensorflow.keras.layers library [45].
Dense layers for regression, which are applied after the fractal convolutional block; the output is first flattened into a vector, then passed through a Dense layer with 64 neurons and ReLU activation, and finalized with a Dense layer with 2 neurons and linear activation, corresponding to a regression task with two continuous outputs.
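An illustrative reconstruction of the fractal block and regression head described above is sketched below. The kernel size, the reshaping of tabular inputs into a (features, 1) sequence, and the sequential application of the two sub-blocks on the long branch (following the standard FractalNet formulation) are interpretive assumptions, not the exact original implementation.
```python
# Illustrative FractalNN builder with a recursive fractal Conv1D block (depth 4).
from tensorflow.keras import layers, models

def fractal_block(x, filters, depth):
    short = layers.Conv1D(filters, kernel_size=3, padding="same", activation="relu")(x)
    if depth == 1:
        return short
    # Long branch: two fractal sub-blocks applied recursively (self-similarity).
    long = fractal_block(x, filters, depth - 1)
    long = fractal_block(long, filters, depth - 1)
    return layers.Average()([short, long])          # merge the two branches by averaging

def build_fractalnn(n_inputs, filters):
    inp = layers.Input(shape=(n_inputs,))
    x = layers.Reshape((n_inputs, 1))(inp)          # treat attributes as a 1D sequence
    x = fractal_block(x, filters, depth=4)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(2, activation="linear")(x)   # outputs: effort and duration
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model
```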
As part of the hyperparameter tuning process, 10 values were evaluated for the number of training epochs (denoted e) and 6 values for the number of filters in Conv1D (denoted f), resulting in a total of 60 unique FractalNN configurations. The number of epochs varies between 100 and 1000, in increments of 100, while for the number of filters in Conv1D, six discrete values were tested: 8, 16, 32, 64, 128, and 256.
Table A3 presents, in its third column, the optimal values obtained for the five metrics applied across these configurations. In the experiments performed on the China and Desharnais datasets, three distinct FractalNN configurations were found to yield optimal results across the five performance metrics considered. It was observed that one particular set of hyperparameters simultaneously produced the best values for RMSE, CD, and MSLE, indicating a higher degree of robustness for that configuration.
In contrast, for the Kemerer and Maxwell datasets, two optimal FractalNN configurations were identified. The second configuration achieved the lowest MdAE value, while the first configuration delivered superior performance for MAE, RMSE, CD, and MSLE.
5.4. Kernel Extreme Learning Machine
The proposed KELM algorithm represents an extension of the traditional Extreme Learning Machine approach, in which the feature space is implicitly generated through the use of a kernel function. This strategy eliminates the need for explicitly optimizing the hidden layer weights and activations, thereby simplifying the training process. In the implemented architecture, the RBF (Radial Basis Function) kernel is employed to perform a nonlinear mapping of the input data, which enhances the separability of the data in the induced feature space. The main components of the KELM model are as follows:
RBF kernel Gram matrix captures the pairwise similarities between training samples in the transformed feature space, enabling nonlinear modeling through kernel-based methods.
Hyperparameter γ represents the coefficient of the radial basis function kernel and controls the spread or influence of the kernel function. Its value remains consistent within a given configuration and varies across experimental cycles. In this study, γ was tested over a range of values, as follows: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, and 50.
Hyperparameter λ denotes the regularization coefficient, employed to stabilize the inversion of the Gram matrix in the presence of multicollinearity or noise. In this study, λ was evaluated across the following values: 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, and 500.
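The closed-form training of such a model can be sketched as follows; the exact regularization convention (K + λI) and the class interface are assumptions made for illustration.
```python
# Minimal KELM sketch: RBF Gram matrix plus regularized closed-form solution.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

class KELM:
    def __init__(self, gamma=0.1, lam=1.0):
        self.gamma, self.lam = gamma, lam

    def fit(self, X, Y):
        self.X_train = X
        K = rbf_kernel(X, X, gamma=self.gamma)                        # Gram matrix
        self.beta = np.linalg.solve(K + self.lam * np.eye(len(X)), Y)  # regularized inversion
        return self

    def predict(self, X):
        K_test = rbf_kernel(X, self.X_train, gamma=self.gamma)
        return K_test @ self.beta                                     # two outputs: effort, duration
```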
Within the hyperparameter optimization process of the KELM architecture (Table A4), a grid search was conducted over ten distinct values for the radial coefficient (denoted γ) and ten values for the regularization coefficient (denoted λ). This process yielded a total of 100 unique KELM configurations. The performance of each configuration was assessed based on five metrics, and the optimal values identified across these configurations are summarized in the third column of Table A4. Experimental evaluations conducted on the China and Desharnais datasets led to the identification of two optimal configurations of KELM. The first configuration yielded superior performance across four metrics (MAE, RMSE, CD, and MSLE), while the second configuration minimized MdAE. In the case of the Kemerer dataset, a single configuration of hyperparameters simultaneously produced optimal values for all five metrics, indicating increased model stability and robustness within this specific data context. Regarding the Maxwell dataset, three distinct KELM configurations were found to yield optimal results across the five metrics assessed. Notably, one of these configurations achieved the best values for RMSE, CD, and MSLE concurrently, suggesting a higher degree of reliability and generalization capacity for that particular hyperparameter setting.
5.5. Hybrid Artificial Neural Networks
To improve the performance metrics obtained for the four previously implemented artificial neural network architectures (MLP, DFCNN, FractalNN, and KELM), four corresponding hybrid neural network models were developed. These hybrid models combine each type of neural network with the Random Forests machine learning algorithm, aiming to capitalize on the complementary strengths of both approaches and enhance predictive accuracy. The hybrid neural networks were designed using a cascade architecture, in which an artificial neural network (such as MLP, DFCNN, FractalNN, or ELM) is employed for automatic feature extraction from the input data. Subsequently, these extracted features are fed into the RF regressor, which leverages its ensemble learning capabilities to produce the final prediction. This combined framework is intended to improve robustness, generalization, and overall model performance beyond what either method could achieve independently.
5.5.1. Multilayer Perceptron Combined with Random Forests
The proposed hybrid artificial neural network architecture follows a cascade structure, integrating an MLP with a decision-tree-based regressor (RF). This architectural combination (denoted as MLP_RF) aims to leverage the feature extraction capabilities of neural networks together with the robustness and generalization power of ensemble learning methods such as RF, particularly in the context of multivariate regression tasks.
The first component of the proposed architecture is an MLP network, consisting of an input layer adapted to the dimensionality of each dataset, followed by one fully connected hidden layer using the ReLU activation function. The final dense layer of the MLP produces a low-dimensional latent representation, which serves as a compressed and abstract feature vector derived from the input data. This latent vector is then used as input to the RF model (a code sketch of this cascade is provided after the hyperparameter list below).
The second component is a multi-output RF regressor, trained to simultaneously predict two output variables. The RF model operates on the latent features extracted by the MLP and is responsible for generating the final predictions. During the hyperparameter tuning process for the MLP_RF architecture, the following five key hyperparameters were optimized:
Number of hidden layer nodes in MLP network (denoted as n), which varied within the interval [100, 1000] with an increment step of 100.
Number of training epochs for MLP (denoted as e), explored within the range [50, 500] with a step size of 50.
Number of estimators (denoted as s), controlling the total number of trees generated by RF model, ranging from 80 to 800 with an increment of 80.
Maximum depth of the trees (denoted as d), influencing the complexity of each individual decision tree.
Random seed (denoted as r), used to control the randomness of the RF training process.
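A minimal sketch of the cascade described above is given below; the latent dimension, default hyperparameter values, and variable names are illustrative assumptions rather than the exact implementation used in this study.
```python
# Illustrative MLP_RF cascade: MLP feature extractor followed by a multi-output RF.
from tensorflow.keras import layers, models
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

def fit_mlp_rf(X_train, y_train, n=200, e=100, s=160, d=16, r=42):
    # MLP with one ReLU hidden layer (n neurons) ending in a low-dimensional latent layer.
    inp = layers.Input(shape=(X_train.shape[1],))
    h = layers.Dense(n, activation="relu")(inp)
    latent = layers.Dense(16, activation="relu", name="latent")(h)
    out = layers.Dense(2, activation="linear")(latent)
    mlp = models.Model(inp, out)
    mlp.compile(optimizer="adam", loss="mse")
    mlp.fit(X_train, y_train, epochs=e, verbose=0)

    # Extract latent features and train the multi-output RF regressor on them.
    extractor = models.Model(inp, latent)
    Z_train = extractor.predict(X_train, verbose=0)
    rf = MultiOutputRegressor(
        RandomForestRegressor(n_estimators=s, max_depth=d, random_state=r))
    rf.fit(Z_train, y_train)
    return extractor, rf   # predict with: rf.predict(extractor.predict(X_new))
```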
During the hyperparameter tuning process, multiple combinations of predefined values for the five important parameters of the MLP_RF architecture were explored. The number of neurons in the MLP hidden layer was varied using 10 values ranging from 100 to 1000, while the number of training epochs was adjusted across 10 values between 50 and 500. Additionally, 10 values between 80 and 800 were selected for the number of estimators in the RF model. For the maximum depth of the trees, the following five discrete values were tested: 4, 8, 16, 32, and 64. The random_state parameter was evaluated using four values (21, 42, 84, and 168) to ensure the reproducibility of the results. This configuration led to the generation of a large number of unique MLP_RF models, each evaluated based on five performance metrics applied to multivariate regression tasks.
Table A5 summarizes these results, highlighting, in the third column, the optimal values obtained for each metric, thus reflecting the superior performance of the corresponding configuration. In addition, Table A5 reports the estimated software effort and duration (minimum, maximum, and mean) for each experimental configuration.
The experimental evaluations conducted on the China and Maxwell datasets led to the identification of three distinct configurations of the MLP_RF hybrid architecture, each yielding optimal results with respect to the five analyzed performance metrics. Among these, one configuration stood out by simultaneously achieving the best values for RMSE, CD, and MSLE, suggesting increased reliability and a high generalization capability associated with the specific parameter values used in that configuration. The best performance was observed on the China dataset, indicating a strong correlation between predicted and actual values. For the Desharnais and Kemerer datasets, two optimal MLP_RF configurations were identified. The first configuration demonstrated superior performance in four out of the five metrics (MAE, RMSE, CD, and MSLE), while the second configuration proved effective in minimizing the MdAE value.
5.5.2. Deep Fully Connected Neural Network Combined with Random Forests
Another proposed hybrid artificial neural network architecture involves the integration of a DFCNN with an RF regressor, within a modular and flexible design. In this configuration (denoted as DFCNN_RF), the DFCNN serves as a latent feature extractor, generating abstract and informative representations of the input data. These features are subsequently used by the RF regressor to perform the prediction, leveraging the DFCNN's ability to learn complex representations and the robustness of RF in regression tasks. To enable the simultaneous prediction of two target variables, the model uses the Scikit-learn MultiOutputRegressor wrapper, adapting the RF regressor to a multi-output regression setting. The entire ensemble is implemented as a custom Scikit-learn estimator, inheriting functionalities from BaseEstimator and RegressorMixin, thereby ensuring compatibility with hyperparameter optimization procedures. During the hyperparameter tuning process for the hybrid DFCNN_RF architecture, the same five parameters previously used in the MLP_RF hybrid architecture were applied. The use of these common hyperparameters facilitates a fair comparison between the two architectures, allowing for an objective evaluation of performance under similar experimental conditions. The hyperparameter tuning process led to the generation of a significant number of unique DFCNN_RF models, each variant being evaluated based on five performance metrics corresponding to multivariate regression tasks. The results are summarized in Table A6, where the third column presents the optimal values associated with each used metric.
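For illustration, the custom estimator mentioned above can be sketched as follows, assuming tensorflow.keras for the network and scikit-learn for the wrapper; the choice of the last hidden layer as the latent representation and the default hyperparameter values are assumptions.
```python
# Sketch of a custom scikit-learn estimator wrapping the DFCNN_RF cascade.
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from tensorflow.keras import layers, models

class DFCNNRF(BaseEstimator, RegressorMixin):
    """Cascade: DFCNN latent-feature extractor followed by a multi-output RF regressor."""
    def __init__(self, n=100, e=100, s=160, d=16, r=42):
        self.n, self.e, self.s, self.d, self.r = n, e, s, d, r

    def fit(self, X, y):
        inp = layers.Input(shape=(X.shape[1],))
        h = inp
        for _ in range(10):                                   # ten fully connected hidden layers
            h = layers.Dense(self.n, activation="relu")(h)
        out = layers.Dense(2, activation="linear")(h)
        net = models.Model(inp, out)
        net.compile(optimizer="adam", loss="mse")
        net.fit(X, y, epochs=self.e, verbose=0)
        self.extractor_ = models.Model(inp, h)                # last hidden layer as latent features
        self.rf_ = MultiOutputRegressor(RandomForestRegressor(
            n_estimators=self.s, max_depth=self.d, random_state=self.r))
        self.rf_.fit(self.extractor_.predict(X, verbose=0), y)
        return self

    def predict(self, X):
        return self.rf_.predict(self.extractor_.predict(X, verbose=0))
```
Because the wrapper follows the scikit-learn estimator interface, it can be passed directly to standard hyperparameter search utilities such as GridSearchCV.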
The experimental evaluations conducted on the China, Desharnais, and Maxwell datasets led to the identification of three distinct configurations of the hybrid DFCNN_RF architecture, each exhibiting optimal performance according to the five metrics used for multivariate regression tasks. One of these configurations stood out by simultaneously achieving the best values for RMSE, CD, and MSLE, indicating a high level of reliability and superior generalization capacity associated with the specific parameterization implemented.
Regarding the Kemerer dataset, two DFCNN_RF configurations with notable performance were identified. The first achieved superior results for four out of the five analyzed metrics (MAE, RMSE, CD, and MSLE), while the second was distinguished by its efficiency in minimizing MdAE.
5.5.3. Fractal Neural Network Combined with Random Forests
As part of addressing the multivariate regression problem, a new hybrid model named FractalNN_RF was developed, combining the feature extraction capabilities of the previously described FractalNN architecture with the robustness and generalization power of the RF regressor. This integration aims to leverage the strengths of both components to enhance prediction accuracy and stability. The proposed architecture is based on two main components. The FractalNN model acts as an extractor of latent features from the input data. The latent features extracted by the network are then used as input for the second component, an RF regressor trained in multi-output mode. The novelty of this hybrid model lies in the synergistic integration of a fractal architecture, capable of capturing multiscale patterns through recursive self-similarity, with the robustness and generalization strength of Random Forests. This combination was specifically motivated by the need to address the challenges of limited-size datasets, where complex nonlinear feature extraction must be balanced with stability against noise and variability. Together, these strengths provide a balanced model that enhances both accuracy and generalization in software effort and duration prediction.
During the hyperparameter optimization phase for the hybrid FractalNN_RF model, four out of the five hyperparameters previously used in the MLP_RF and DFCNN_RF hybrid architectures were retained, with the same values applied. Among these, one pertains to the neural network component, the number of training epochs, while the remaining three are associated with the RF regressor: the number of estimators, the maximum tree depth, and the random seed values. The fifth hyperparameter, specific to the FractalNN model, is the number of filters in the Conv1D layer (denoted as f), for which six values from the discrete set {8, 16, 32, 64, 128, 256} were evaluated. The tuning process led to the development of a substantial number of unique FractalNN_RF configurations, each variant being assessed according to five relevant metrics for multivariate regression tasks. The evaluation results are summarized in Table A7, where the third column highlights the optimal values associated with each metric used.
Training on the China, Desharnais, and Kemerer datasets led to the identification of three distinct configurations of the FractalNN_RF hybrid architecture, each demonstrating optimal performance across the five metrics used to evaluate multivariate regression tasks. For the China dataset, one configuration achieved the lowest values for MAE and MSLE, another excelled in terms of RMSE and CD, while a third stood out by minimizing MdAE. For the Desharnais and Kemerer datasets, one configuration clearly distinguished itself by simultaneously delivering top performance in RMSE, CD, and MSLE. Regarding the Maxwell dataset, two high-performing FractalNN_RF configurations were identified; the first showed superior results across four of the five metrics (MAE, RMSE, CD, and MSLE), while the second was notable for its efficiency in reducing MdAE.
5.5.4. Extreme Learning Machine Combined with Random Forests
The last proposed model employs a hybrid architecture (ELM_RF) that combines ELM for nonlinear feature extraction with an RF regressor responsible for predicting the target variables. The objective of this approach is to capture complex relationships within the data by projecting them into a latent feature space, followed by a robust and interpretable regression stage. The process begins with training the ELM model, where the number of neurons in the hidden layer is varied as a key parameter. This stage produces a latent representation of the data through a nonlinear transformation. The extracted features are then used as input for RF, which is trained to predict the target vector. The RF regressor is chosen for its robustness to noise and is fine-tuned by varying three hyperparameters: the number of estimators, the maximum tree depth, and the random seed value.
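The idea can be illustrated with the following minimal sketch, in which the ELM hidden layer is represented by a fixed random nonlinear projection; the activation function, the use of MultiOutputRegressor, and the parameter names are assumptions made for illustration.
```python
# Illustrative ELM_RF cascade: random nonlinear projection (ELM hidden layer) + multi-output RF.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

def fit_elm_rf(X_train, y_train, n_hidden=300, s=160, d=16, r=42):
    rng = np.random.default_rng(r)
    W = rng.normal(size=(X_train.shape[1], n_hidden))   # random input weights (not trained)
    b = rng.normal(size=n_hidden)                        # random biases

    def elm_features(X):
        return np.tanh(X @ W + b)                        # nonlinear latent feature space

    rf = MultiOutputRegressor(RandomForestRegressor(
        n_estimators=s, max_depth=d, random_state=r))
    rf.fit(elm_features(X_train), y_train)               # RF predicts effort and duration
    return elm_features, rf   # predict with: rf.predict(elm_features(X_new))
```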
During the hyperparameter tuning process, multiple combinations of predefined values were investigated for the four previously specified parameters of the hybrid ELM_RF architecture. The number of neurons in the hidden layer of the ELM model was varied using ten values ranging from 100 to 1000, in increments of 100 (Table A8). Likewise, for the RF model, 10 values were selected for the number of estimators, ranging from 80 to 800, with a step size of 80. The maximum tree depth was tested using the following five discrete levels: 4, 8, 16, 32, and 64. The random_state parameter, used to control randomness and ensure reproducibility, was evaluated with the following four values: 21, 42, 84, and 168. This strategy for exploring the hyperparameter space led to the generation of a significant number of distinct ELM_RF configurations, each of which was assessed based on five multivariate regression performance metrics. The results of these experiments are summarized in Table A8, where the third column presents the optimal values obtained for each metric, thus highlighting the superior performance of the corresponding configuration.
Training the ELM_RF model on the China and Kemerer datasets led to the identification of two distinct configurations of the hybrid ELM_RF architecture, each achieving optimal performance across the five metrics used to evaluate multivariate regression tasks. For the China dataset, two high-performing configurations were identified; the first demonstrated superior results for four out of the five metrics (MAE, RMSE, CD, and MSLE), while the second stood out for its effectiveness in reducing the MdAE value. In the case of the Kemerer dataset, one configuration achieved the lowest values for MAE and MdAE, whereas another configuration excelled in terms of RMSE, CD, and MSLE. For the Desharnais dataset, a single configuration clearly stood out by simultaneously delivering top-level performance in RMSE, CD, and MSLE. Regarding the Maxwell dataset, a single high-performing ELM_RF configuration was identified, showing superior results across all five evaluated metrics.
6. Comparative Analysis of Implemented Artificial Neural Networks
To determine the most suitable estimation model based on the dataset size (small-sized, medium-sized, and large-sized), four types of artificial neural networks and four types of hybrid neural networks were compared, using the values of five evaluation metrics: MAE, MdAE, RMSE, CD, and MSLE.
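For reference, the five metrics can be computed with standard scikit-learn functions, as sketched below (CD denotes the coefficient of determination, i.e., R-squared); variable names are illustrative.
```python
# Illustrative computation of the five evaluation metrics used in this comparison.
import numpy as np
from sklearn.metrics import (mean_absolute_error, median_absolute_error,
                             mean_squared_error, r2_score, mean_squared_log_error)

def evaluate(y_true, y_pred):
    return {
        "MAE":  mean_absolute_error(y_true, y_pred),
        "MdAE": median_absolute_error(y_true, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "CD":   r2_score(y_true, y_pred),
        "MSLE": mean_squared_log_error(y_true, y_pred),  # requires non-negative targets/predictions
    }
```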
Table 8 highlights the optimal values of the metrics used for the eight prediction methods applied to the four datasets: China (large-sized), Desharnais (medium-sized), Kemerer (small-sized), and Maxwell (medium-sized). The results obtained are compared both across the proposed models and with the values reported in previous studies.
For the China dataset, the best performance, reflected by the minimum values of the MAE, MdAE, RMSE, and MSLE metrics, as well as the maximum value of the CD, was achieved by the ELM_RF model, demonstrating its superior predictive capability. The MAE (0.0046) and RMSE (0.0137) values indicate a high level of accuracy, with very low prediction errors. Although the RMSE is slightly higher than the MAE, suggesting a few instances of more pronounced errors, these remain limited overall. The minimum MdAE value (0.0013) confirms that model performance is not significantly affected by abrupt variations. Additionally, the very low MSLE value (0.0001) reflects an extremely small logarithmic error, which is a strong indicator of the model's robustness. The CD, with a value of 0.9834, indicates an excellent fit between the predicted and actual values.
The ELM_RF architecture, which combines the ELM and RF models, leverages both the generalization capability of ELM and the robustness of RF to data variability. Therefore, this method may be considered a suitable choice for large-sized datasets, such as the China dataset.
For the Desharnais and Maxwell datasets, the innovative FractalNN_RF architecture achieves the best performance across all evaluated metrics. For the Desharnais dataset, the MdAE value (0.0236), being lower than the MAE (0.0573), indicates that most prediction errors are small, although the average is influenced by a few larger deviations. The RMSE (0.0777), while still low, is slightly higher than the MAE, confirming the presence of a few isolated cases with more significant errors. The CD value of 0.7135 reflects a good, though not perfect, fit between the predicted and actual values, suggesting room for improvement in capturing data relationships, an aspect that is typically challenging to optimize given the medium size of the Desharnais dataset. The MSLE value (0.0036) confirms that the proportional errors in the predictions are very small. For the Maxwell dataset, the MAE value (0.0957) is relatively moderate, while the lower MdAE value (0.0328) indicates that the majority of predictions are accurate, with a few outliers increasing the overall mean error. The RMSE (0.1629), being higher than the MAE, further confirms the presence of certain predictions with notable deviations. The CD, with a value of 0.5320, suggests that the model does not fully capture the underlying relationships within the data, a limitation that is often challenging to address in the context of medium-sized datasets. The MSLE value (0.0118) indicates low errors in the logarithmic space, reflecting good proportional prediction performance by the model.
The hybrid FractalNN_RF architecture, which combines a Fractal Neural Network with the Random Forests algorithm, represents an advanced approach that merges deep learning capabilities with the robustness of ensemble techniques. It is ideal for medium-sized datasets, such as the Desharnais and Maxwell datasets, where complex nonlinear relationships and data variations can significantly impact model performance.
For the Kemerer dataset, the MLP architecture achieves the best performance across all five metrics, while the hybrid networks prove to be ineffective on this small-sized dataset. The MAE value (0.0173) highlights a high level of overall predictive accuracy, making it suitable for applications with strict precision requirements. The MdAE (0.0202), being slightly higher than the MAE, suggests a relatively uniform distribution of errors, without significant outliers. The RMSE (0.0203), close to the MAE and nearly identical to the MdAE, indicates that there are no large errors distorting the average, thus confirming the consistency and balance of the model predictions. The CD (0.9274) represents an excellent score, suggesting that the model appropriately captures the relevant relationships between variables, even in the context of a small-sized dataset. The MSLE (0.0003), with an extremely low value, reflects high proportional accuracy. Moreover, the comparative analysis of the CD values obtained for the eight proposed models reveals that the hybrid networks generally exhibit inferior performance compared to the standard models. The analysis of the metric values highlights that the proposed MLP model demonstrates balanced performance and a high degree of reliability in its predictions, an aspect that is both rare and highly valuable. These characteristics make this model a suitable candidate for estimating software development effort and duration, particularly when working with small-sized datasets, such as the Kemerer dataset.
The superior performance of the simpler MLP model on the small-sized Kemerer dataset is attributable to its ability to avoid overfitting, a common issue in complex architectures with numerous parameters. In contexts with reduced datasets, an MLP model, characterized by a lower complexity, strikes an optimal balance between expressiveness and generalization, effectively capturing underlying patterns without amplifying noise. This observation is consistent with the bias–variance trade-off theory; for small datasets, models with lower complexity typically generalize better, whereas for medium or large datasets, more complex architectures are able to exploit the richer information available and achieve superior accuracy.
To reinforce the previous observations, the Wilcoxon signed-rank test was applied to the small-sized Kemerer dataset in order to assess the statistical significance of the performance differences between the MLP architecture and the proposed hybrid architectures (MLP_RF, DFCNN_RF, FractalNN_RF, and ELM_RF). Each of the five models was trained over ten independent runs, and the Wilcoxon test was performed to compare their performance across five evaluation metrics: MAE, MdAE, RMSE, CD, and MSLE. The resulting p-values, presented in Table 9, range from 0.0005 to 0.0024, indicating that all observed performance differences are statistically significant at the 1% significance level (p < 0.01). These findings provide strong empirical evidence that the simpler MLP architecture consistently outperforms the more complex hybrid models (MLP_RF, DFCNN_RF, FractalNN_RF, and ELM_RF) in the context of small-sized datasets, such as the Kemerer dataset.
This result supports the hypothesis that architectural simplicity enhances generalization capability under conditions of limited data availability, whereas hybrid models, due to their higher structural complexity, tend to be more prone to overfitting.
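The statistical comparison described above can be reproduced in principle with scipy.stats.wilcoxon, as sketched below; the listed per-run values are placeholders for illustration only and are not results from this study.
```python
# Illustrative Wilcoxon signed-rank comparison of two models over ten runs.
from scipy.stats import wilcoxon

# Placeholder per-run MAE values (illustration only, not results from the study).
mlp_mae    = [0.018, 0.017, 0.019, 0.016, 0.018, 0.017, 0.018, 0.019, 0.017, 0.018]
hybrid_mae = [0.031, 0.029, 0.034, 0.030, 0.028, 0.033, 0.032, 0.030, 0.031, 0.029]

stat, p_value = wilcoxon(mlp_mae, hybrid_mae)
print(f"Wilcoxon statistic = {stat}, p-value = {p_value:.4f}")
# A p-value below 0.01 would indicate a statistically significant difference at the 1% level.
```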
The proposed architectures were also compared with those presented in previous studies, as reviewed in Section 2. The optimal architectures developed for each dataset significantly outperform the values reported in the existing literature. Table 8 highlights the superiority of the proposed hybrid architectures (ELM_RF for large-sized datasets and FractalNN_RF for medium-sized datasets) compared to traditional approaches and prior research results. In the case of small-sized datasets, the traditional MLP architecture still yields better performance.
7. Conclusions
This study focused on developing predictive models for estimating the effort and duration required to complete software projects, tailored to dataset size. An analysis of the software engineering domain revealed that open-source datasets within this field can be categorized into three main groups: small-sized, medium-sized, and large-sized datasets. In this study, four datasets were utilized: the large-sized China dataset, the medium-sized Desharnais and Maxwell datasets, and the small-sized Kemerer dataset.
For this purpose, eight artificial neural network architectures were proposed: four traditional models (MLP, DFCNN, FractalNN, and KELM) and four hybrid models, which were obtained by combining these with the RF algorithm, denoted as MLP_RF, DFCNN_RF, FractalNN_RF, and ELM_RF. The proposed architectures were analyzed and compared based on the following five evaluation metrics: MAE, MdAE, RMSE, CD, and MSLE.
Following the comparative analysis of the results obtained from the eight proposed architectures, it becomes evident that the hybrid neural network model ELM_RF demonstrates the highest effectiveness when applied to large-sized datasets. This conclusion is supported by its superior performance on the China dataset, where it achieved the best overall results across multiple evaluation metrics. The integration of the Extreme Learning Machine with the Random Forests algorithm appears to enhance the model's capacity for nonlinear pattern recognition and reduce the tendency toward overfitting, resulting in improved generalization and predictive stability.
In contrast, for small-sized datasets, the traditional MLP architecture was the most suitable approach, as it yielded the best results on the Kemerer dataset. Furthermore, statistical validation using the Wilcoxon signed-rank test confirmed the robustness and significance of these results, reinforcing the conclusion that MLP provides a reliable and efficient solution for estimating effort and duration in small-sized software project datasets.
In this research paper, a new hybrid neural network called FractalNN_RF was proposed, which combines a Fractal Neural Network with the RF algorithm for regression tasks. When comparing the performance of this optimized hybrid architecture with that of the seven other prediction architectures, FractalNN_RF demonstrated superior results on medium-sized datasets and yielded the highest accuracy on the Desharnais and Maxwell datasets.