1. Introduction
Workload prediction, understood as the ability to accurately forecast the capacity required to perform a task, is a critical aspect of Production Planning and Control (PPC) processes, aiming to improve efficiency and ensure timely delivery. It has a significant impact on production efficiency because it makes it possible to foresee and formulate appropriate actions before production, cost-effectively improving system throughput [
1].
In small- and medium-sized enterprises (SMEs), this planning task is generally based on managers’ experience, which can be misleading and also becomes risky if key personnel leave [
2,
3]. New data-driven approaches that provide practical solutions to improving accuracy need to be developed [
4,
5]. However, SMEs often struggle to adopt Big Data Analytics, Artificial Intelligence (AI) and Machine Learning (ML), cloud computing, and Internet of Things solutions due to financial constraints and limited expertise and competence, relying instead on more traditional planning methods [
6,
7]. Adopting digital and smart solutions based on the use of data is crucial for growth and competitiveness in SMEs. What SMEs truly require are intuitive, lightweight systems that can be quickly adopted and seamlessly embedded into their already diverse and fragmented technological ecosystems [
4,
8,
9].
Manufacturing SMEs often struggle to keep pace with rapid technological progress and maintain their market competitiveness, especially when it comes to adopting advanced manufacturing systems and fully embracing the Industry 4.0 (I4.0) paradigm. Despite its potential, the integration of I4.0 digital technologies still represents a problem for many of these enterprises [
7,
10], although it holds significant promise, enabling enhanced productivity, lower operational costs, and improved product quality. Studies reveal that SMEs frequently underutilise the full range of technological tools available for I4.0 implementation [
11]. Focusing on AI and ML, the ability to generate valuable insights directly from operational data is a key enabler of the Industry 4.0 transformation [
12]. The growing use of AI and especially ML in manufacturing contexts is driven by their potential to extract actionable knowledge from complex datasets, delivering tangible improvements in efficiency, adaptability, and innovation [
13].
However, in SMEs, most academic and practical applications of these technologies have centred around maintenance and quality control, whereas production planning and operational control, on which this work focuses, remain underexplored areas in the literature [
11]. While AI and ML are gaining traction in dynamic manufacturing, their potential for planning tasks remains largely unexplored and underapplied [
14]. Systems incorporating AI possess analytical capabilities that emulate human cognition. ML, the most promising subfield of AI, specifically enables computer systems to recognise correlations in data and thereby make human-like decisions without explicitly defined rules. It is based on the generalisation of knowledge from data and can be realised with different methods such as classification, clustering, regression, and anomaly detection [
13,
15]. Three categories of ML can be distinguished based on how the models learn from the data: supervised, unsupervised, and reinforcement learning techniques [
15]. Supervised learning, which is the focus of this study, is generally applied when a goal is specified for a set of inputs: labelled data are used to train algorithms either to classify data (classification models, which predict a label) or to forecast outcomes (regression models, which predict a quantity). All tools and methods based on the use of data depend on the companies’ level of digitalisation [
8,
16]. Although digitalisation in planning processes is still a challenge for SMEs, the literature highlights an interesting new trend related to data analytics applications. ML is also increasingly leveraged in workload management within SMEs to predict production times more accurately. By using predictive analytics on production data, SMEs can improve scheduling and workforce allocation. Although often challenging to implement due to limited datasets, this approach helps SMEs adjust their workflows more dynamically, enhancing their responsiveness to demand changes and minimising downtime [
Depending on the specific application, companies need to be ready to implement “smart” solutions based on AI/ML [
18].
Regarding workload prediction, establishing the amount of time needed for an order task in manufacturing is essential for defining its overall load on the manufacturing system and its distribution over time on the resources. This helps in defining the overall order Lead Time (LT) and assigning a reasonable and reliable due date for the customer. The accuracy of this process not only allows for meeting customer expectations and becoming more competitive, but also affects shop floor management practices [
19,
20,
21]. For good planning, it is essential to know how much time a product might take to get through the manufacturing system; this allows high flexibility of processes and resources and makes scheduling more predictable, agile, and flexible [
20]. For industries with high product variability such as Engineer to Order (ETO), Make To Order (MTO), or Small Series (SS), despite the complexities, the degree to which the plan is executed depends largely upon the ability to accurately predict the amount of time needed for the execution of the order [
20,
22]. Accurate LT prediction, which is strictly dependent on the order’s overall workload, is essential for effective production planning, especially in contexts with high product variability [
2,
4,
23,
24].
In SMEs, recent advances in data analytics and AI offer promise, but integrating these with real-world data remains difficult [
24,
25]. Analytical approaches have been developed to forecast an accurate LT, even though they are limited to single-case applications [
26]. To improve this planning task, SMEs must overcome these barriers, moving away from reliance on tacit knowledge towards more systematic, data-driven approaches. However, fully connecting the shop floor to collect meaningful data and integrating automated retraining through Automated Machine Learning (AutoML), which provides methods and processes that make ML accessible to non-experts and improve its efficiency, remain ongoing challenges in SMEs [
4,
24].
Aim of the Study
This work aims to provide an application of AutoML for regression that compares several supervised ML models and identifies the best one for the prediction of order task workload, considering the characteristics of the products manufactured as features on which the workload depends. The application was developed using data from a mid-sized company. To evaluate the impact of different product types on workload prediction accuracy, two cases were analysed: (A) focusing on a single product category and (B) incorporating all the product types realised. This comparison helps determine whether product specificity enhances model performance. The innovative aspect of this work lies in offering a solution that is easy to implement. By starting with a manageable dataset, it becomes possible to enhance the planning process through more accurate workload estimations, all within an application built on open-source software. The primary advantage of this application lies in its ability to utilise the total predicted workload, based on product features, during the offer phase. This enables the sales department to formulate more precise offers, ensuring a more accurate budgeted cost. Additionally, the planning function benefits from a more precise workload estimation, allowing for the effective scheduling of work centres. In the context of Industry 4.0, this research work makes a significant contribution to the digital transformation of manufacturing SMEs by fostering the adoption of advanced technologies to enable data-driven decision-making in planning processes.
The structure of this paper is organised as follows.
Section 2 describes the methodology followed, also providing details of the case study company.
Section 3 presents the results achieved for each case and the prediction results as a function of the regression model evaluated, and
Section 4 discusses them. The main conclusions, limitations, and future research steps are drawn in
Section 5.
2. Materials and Methods
2.1. Company Details
The company Motortecnica s.r.l., located in San Cipriano Picentino (Salerno, Italy), is a medium-sized company, founded in 1989, that operates in the electromechanical sector, initially providing repair services for electrical machines and later extending its activities to the design and construction of motors and alternators of various types. It is an example of an “Engineer to Order” production system.
The company has collected all data related to the orders realised, the characteristics of the product/service, and data from planning and monitoring, thanks to two customised tools developed in Microsoft Access and Microsoft Excel. Following a project-management-based approach, each manufacturing order is decomposed into multiple constrained and interdependent tasks. For each of them, it is essential in the planning phase to establish the overall workload (number of hours) on the work centres. The company faced several challenges in planning and controlling production orders, which hindered operational efficiency and led to numerous delays and inefficiencies in the overall process. The company relied heavily on an experience-based approach to planning, where decisions were made based on the intuition and historical knowledge of employees rather than data-driven insights. Although useful, experience-based planning lacks the precision needed in today’s complex production contexts. It also made it difficult to scale operations or respond to shifting market demands and production challenges. As a result of these inefficiencies, the company often found itself in a situation where overtime was required to compensate for workload that had not been properly forecasted in the planning phase. This not only increased costs but also placed additional pressure mainly on operators, leading to potential burnout, reduced morale, and a decline in overall productivity.
2.2. Methodology
This study adopts the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework as its methodological foundation. This framework is one of the most popular methodologies for data analytics [
27] that consists of the following steps: (i) business understanding refers to the conversion of the business objective into a data mining problem; (ii) data understanding identifies the data source and obtains the variables related to the problem; (iii) data preparation uses several data cleaning and transforming techniques to produce a well-structured dataset before analysis; (iv) predictive modelling includes variable selection, model development, hyperparameter tuning, and validation; (v) model evaluation measures and compares the predictive performance of the models based on different predefined error measurements; and (vi) model deployment generates insights to assist managerial decision-making. The application developed in this research study, based on the CRISP-DM framework, has been detailed in
Figure 1.
For the implementation, KNIME Analytics Software (version 5.3) was used [
25,
28], an open-source data analytics, reporting, and integration platform that enables users to perform data processing, analysis, and visualisation tasks through a visual, drag-and-drop interface. The software allows complex workflows to be created by connecting various components or “nodes” that represent different data operations. A smart application was thus realised that uses the AutoML Regression component to predict the workload of tasks in the planning phase. Two different applications have been modelled:
Case A: Only the data related to orders for complete generators have been investigated. Most of the orders handled by the company fall into this category; for this reason, specific attention was devoted to this kind of product.
Case B: Data from all the orders and types of products were analysed. The objective was to demonstrate whether the typology of the product, considered as an explanatory variable rather than a predefined dataset constraint, affected the prediction models’ performance and to what extent. The product types examined include the full range of components associated with generators, which, for some customers, can also be sold as standalone products, such as rotor poles, stator windings, rotor bars, field coils, as well as other types of products like low-voltage and DC motors.
The developed workflow is depicted in
Figure 2. It includes a sub-workflow named “AutoML (Regression)”, which carries out the parameter optimisation and selects the best model according to a specified optimisation criterion.
After the data preprocessing phase, the entire dataset is partitioned into two parts: 80% of the data goes to the learner partition, used to train the models within the AutoML Regression component, and the remaining 20% goes to the predictor partition, used to evaluate them.
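For readers without access to KNIME, the same partitioning logic can be reproduced outside the platform. The following is a minimal sketch in Python, assuming the preprocessed data are exported to a CSV file with a hypothetical workload_h target column; the file name and column names are illustrative and not taken from the company dataset.

```python
# Minimal sketch of the 80/20 learner/predictor split used in the KNIME workflow.
# File name and column names are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("orders_preprocessed.csv")  # hypothetical export of the preprocessed dataset
X = data.drop(columns=["workload_h"])          # features describing the product/task
y = data["workload_h"]                         # target: task workload in hours

# 80% of the rows form the learner partition, 20% the predictor partition.
X_learn, X_pred, y_learn, y_pred = train_test_split(X, y, test_size=0.20, random_state=42)
print(f"Learner rows: {len(X_learn)}, predictor rows: {len(X_pred)}")
```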
Feature selection, i.e., the identification of the most significant data to be used for the regression, is a critical preprocessing step in ML modelling. It involves selecting a subset of relevant features while removing irrelevant and redundant ones to improve model performance, reduce the computational cost, and enhance data visualisation. Identifying the most important features can accelerate training and improve model interpretability [
29,
30]. In this case study, a correlation analysis between the dependent variable (workload) and the independent numeric variables was combined with discussions with the planner and the production functions. Pearson’s correlation, widely used to measure the degree of correlation between two variables [
31], was assessed. The correlation value ranges from −1 (strong negative correlation) to 1 (strong positive correlation), with 0 indicating no linear correlation. In addition, a
p-value, representing the probability of obtaining the observed result if the correlation coefficient were zero (null hypothesis), was calculated to assess statistical significance. When this probability falls below the conventional threshold of 5% (
p < 0.05), the correlation coefficient can be considered statistically significant [
32]. The results are presented in
Table 1, with Case A referring to the constrained dataset and Case B to the full dataset. Significant values identified in this preliminary analysis are highlighted in bold.
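As an illustration of how the values reported in Table 1 can be reproduced outside KNIME, the following sketch computes Pearson’s r and the associated p-value of each numeric feature against the workload; the column names are hypothetical placeholders, not the actual field names of the company dataset.

```python
# Sketch of the correlation screening: Pearson's r and p-value of each numeric
# feature against the workload target. Column names are illustrative assumptions.
import pandas as pd
from scipy.stats import pearsonr

data = pd.read_csv("orders_preprocessed.csv")
numeric_features = ["voltage_kV", "power_MVA", "speed_rpm", "polar_coil_length_mm"]

for feature in numeric_features:
    valid = data[[feature, "workload_h"]].dropna()
    r, p = pearsonr(valid[feature], valid["workload_h"])
    flag = "significant" if p < 0.05 else "not significant"
    print(f"{feature:>22}: r = {r:+.2f}, p = {p:.3f} ({flag})")
```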
Given the limited volume of available data, the correlation analysis, while helpful in providing some preliminary insights, proved insufficient on its own for supporting robust feature selection. In particular, the Pearson correlation coefficient did not fully capture the complex, often non-linear relationships between variables that influence workload in a real-world production environment. As such, we adopted a hybrid approach that combined quantitative analysis with expert judgement. Discussions with the planner and production functions were essential to this process. Their operational knowledge provided critical context, enabling us to evaluate the practical significance of certain features that, although not statistically prominent, were known to impact workload dynamics. This expert input helped to uncover underlying dependencies and process-specific nuances that the statistical model alone could not detect. Variables that showed weak or no correlation on paper were nonetheless retained in the feature set based on contextual relevance and domain experience. By integrating empirical data with operational expertise, a feature set was created that was not only data-informed but also grounded in practical reality. This hybrid methodology enhanced the reliability and applicability of the workload prediction model, aligning it more closely with actual production scenarios. The features identified, including the categorical variables that affect the workload for each task, with the related description, are reported in
Table 2.
Among 11 features, 5 are highly dependent on the specific product and may exhibit significant variability based on the unique characteristics and configuration associated with the “Type of product/service”. For example, in the case of generators, key parameters are represented by “Voltage [kV]”, “Power [MVA]”, “Rotational Speed [rpm]”, and the type of axis configuration (“Axis”). In contrast, when considering Case B, where data from a broader range of products and orders are analysed, there is considerable variability across different product types. For instance, in rotor poles, the factor that most affects workload is the internal length of the polar coils (“Polar Coil Length [mm]”). Meanwhile, for stator windings, the voltage level is a primary determinant, as it dictates the selection of insulating materials, which in turn influences the production technology employed. Specifically, the choice between Vacuum Pressure Impregnation (VPI) and Resin Rich (RR) processes is closely linked to the voltage requirements, with RR typically involving a higher workload compared to VPI.
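Since several of the selected features (e.g., “Type of product/service”, “Axis”, or the VPI/RR insulation process) are categorical, they need to be encoded before being passed to the regression learners. A minimal sketch of a possible encoding step, with purely illustrative column names, is shown below.

```python
# Sketch of encoding the categorical features alongside the numeric ones.
# Column names are hypothetical; the actual feature set is listed in Table 2.
import pandas as pd

data = pd.read_csv("orders_preprocessed.csv")
categorical = ["product_type", "task_type", "axis", "insulation_process"]  # e.g., VPI vs. RR
numeric = ["voltage_kV", "power_MVA", "speed_rpm", "quantity"]

# One-hot encode the categorical columns and keep the numeric ones unchanged.
X = pd.get_dummies(data[categorical + numeric], columns=categorical, drop_first=True)
y = data["workload_h"]
print(X.head())
```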
Finally, machine learning models for regression were compared. In particular, the models evaluated through the AutoML node are as follows:
Linear Regression, which establishes the relationship between two variables by fitting a linear equation to observed data. One variable is treated as the explanatory variable and the other as the dependent variable. It has been trained with default parameters in KNIME.
Polynomial Regression, which represents the relationship between the explanatory and dependent variables using an nth-degree polynomial, allows the capture of non-linear relationships. It has been trained with the optimised parameter “Polynomial degree”.
XGBoost Linear Ensemble, which has been trained with optimised parameters “alpha” and “lambda”.
H2O Generalised Linear Model, trained with the KNIME H2O Machine Learning Integration.
Regression Tree, trained with optimised parameter “Min number records per node”.
Random Forest, derived from decision tree algorithms, addresses classification and regression problems through ensemble learning, which combines multiple classifiers to solve complex tasks. This method generates numerous decision trees, aggregates their predictions, and averages the outputs to improve accuracy. In KNIME, it has been trained with optimised parameters “Tree Depth”, “Number of models”, and “Minimum child node size”.
Gradient Boosted Trees, which builds a “strong” model by combining multiple “weak” models (e.g., decision trees). At each step, the error of the strong model is predicted using a new weak model, and the result is subtracted to reduce the error. It has been trained with the optimised parameter “Number of trees”.
XGBoost Tree Ensemble, trained with optimised parameters “eta” and “max depth”.
H2O AutoML, trained with the KNIME H2O Machine Learning Integration.
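Outside KNIME, a much simplified version of this model comparison can be sketched with scikit-learn: each candidate is tuned on the learner partition via a small grid search and ranked by MAE. The grids below only mirror some of the parameters named above and are illustrative, not the ranges actually explored by the AutoML node; X_learn and y_learn refer to the 80% partition from the earlier sketch, with categorical features already encoded.

```python
# Simplified sketch of the model comparison performed by the AutoML (Regression)
# component: each model is tuned on the learner partition and ranked by MAE.
# Hyperparameter grids are illustrative, not those explored by KNIME.
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

candidates = {
    "Linear Regression": (LinearRegression(), {}),
    "Regression Tree": (DecisionTreeRegressor(random_state=0),
                        {"min_samples_leaf": [2, 5, 10]}),
    "Random Forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [100, 300], "max_depth": [4, 8, None]}),
    "Gradient Boosted Trees": (GradientBoostingRegressor(random_state=0),
                               {"n_estimators": [50, 100, 200]}),
}

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="neg_mean_absolute_error", cv=5)
    search.fit(X_learn, y_learn)         # learner partition from the 80/20 split
    results[name] = -search.best_score_  # cross-validated MAE on the learner partition

best = min(results, key=results.get)
print({k: round(v, 2) for k, v in results.items()}, "-> best:", best)
```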
Then, the predictions from all models have been compared, and performance metrics have been calculated to evaluate the accuracy of machine learning models, assess the discrepancy between predicted and actual results, and ensure both the reliability and overall effectiveness of the model [
33]. The performance metrics evaluated are as follows:
Mean Absolute Error (MAE), i.e., the average of absolute individual errors. On average, the predicted value is off by the MAE value. It defines the magnitude of errors without considering whether they are overestimations or underestimations. Unlike MSE, which squares the errors and can be influenced by outliers, MAE provides a more balanced error representation. It is generally used when the direction of errors is not critical.
Mean Squared Error (MSE), i.e., the average squared difference between the value observed in a statistical study and the values predicted from a model. A lower MSE indicates that the model’s predictions are closer to the true values, reflecting better overall performance.
Root Mean Squared Error (RMSE), i.e., the square root of the average squared differences between predicted and observed outcomes. By squaring, more weight is given to larger errors.
R-squared (R2), i.e., a measure of how well the independent variable(s) in a statistical model explain the variation in the dependent variable.
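For completeness, denoting by $y_i$ the observed workload, $\hat{y}_i$ the predicted workload, $\bar{y}$ the mean of the observed values, and $n$ the number of test samples, these metrics are defined as:

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|\\
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}\\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}\\
R^{2}         &= 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}
\end{align}
```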
To establish the best model within the AutoML tool, the minimisation of MAE was adopted as the selection criterion, as also reported in other similar applications in the literature [
4,
34]. At the end of the workflow, the system selects the best model according to the metric to be optimised. The results obtained for each case are detailed in the following sections. As a benchmark for the final model evaluation, and to establish a comparison with the actual conditions, a simple average (a rough estimate commonly used in industrial settings) and the manual prediction have been compared with the results obtained from the best ML model [
4,
35]. In particular, the simple average was determined based on the product type, the task type, and the quantity to be processed.
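The simple-average benchmark described above corresponds to a group-wise mean of the historical workload. A minimal sketch of how it can be computed, again with hypothetical column names, is:

```python
# Sketch of the simple-average benchmark: historical mean workload per
# product type, task type, and quantity, used as a rough estimate.
# Column names are illustrative assumptions.
import pandas as pd
from sklearn.metrics import mean_absolute_error

history = pd.read_csv("orders_preprocessed.csv")
baseline = (history.groupby(["product_type", "task_type", "quantity"])["workload_h"]
                   .mean()
                   .rename("avg_workload_h"))

# Attach the baseline estimate to a held-out sample and compare it with the true workload.
test = history.sample(frac=0.2, random_state=42)   # stand-in for the predictor partition
test = test.join(baseline, on=["product_type", "task_type", "quantity"])
print("Baseline MAE:", mean_absolute_error(test["workload_h"], test["avg_workload_h"]))
```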
4. Discussion
Due to the limited availability of data, it was not possible to develop models tailored to each specific product. However, when comparing the results obtained in the two cases (
Figure 7,
Figure 8 and
Figure 9), it is important to highlight that restricting the dataset to a specific product type (Case A) leads to significantly greater predictive accuracy. This trend holds consistently across all the applied models: in Case A, every performance indicator is markedly better, regardless of the model used.
In addition, comparing the best models obtained for the two cases, the Regression Tree model (Case A) outperformed the XGBoost Tree Ensemble (Case B), achieving reductions in MAE, MSE, and RMSE of 48%, 77%, and 52%, respectively, along with a 75% increase in R2.
Moreover, in assessing the workload associated with individual tasks, no consideration was given to the specific work centres to which these tasks are assigned or to their pre-existing workloads. This decision was made because, at the planning stage, the workload is intrinsically linked to the characteristics of the product, whereas during the scheduling phase it may vary according to the resources allocated to each task.
The AutoML approach demonstrated superior performance compared to both simple average and manual prediction. This method proved to be an effective and efficient way to predict the workload for order tasks accurately. Leveraging automation not only enhances prediction accuracy but also reduces the time and effort typically required for manual estimation, making it a valuable tool for improving planning processes.
A better estimation for single orders’ tasks allows for improved planning, leading to more efficient resource allocation and enhanced production planning. One of the most significant benefits of this application is the ability to utilise the predicted workload, based on product features, during the offer phase. This enables the sales department to establish a more accurate offer, aligning costs more closely with actual production requirements and reducing the risk of budget deviations. Furthermore, a precise workload estimation supports the planning function by providing a more reliable basis for defining the work centre schedules. This contributes to optimised resource utilisation, improved on-time delivery rates, and a more agile response to market demands.
In traditional methods generally used by SMEs, workload predictions are often made using simple averages or estimations based on limited data, leading to imprecise or overly optimistic forecasts. In contrast, the “smart” approach proposed in this article uses data analytics, historical data, and machine learning to create more reliable and refined projections. The case study application described allowed the workload of order tasks to be predicted more accurately in both the single-product and multi-product cases. The proposed AutoML-based approach minimises the guesswork that usually accompanies manual predictions. This not only results in a better understanding of the order’s needs but also allows for more efficient management of time. The approach can adapt to new data and changing conditions, improving prediction accuracy, and thereby offers a level of accuracy and reliability that traditional methods cannot match. Regarding the specific case study, the sales engineering evaluation of costs did not include an accurately predicted workload (from a planning point of view) in the offer, leading to significant discrepancies between actual costs and expectations. With a clear understanding of the predicted workload, costs can now be evaluated more accurately, leading to more reliable pricing and expectations for both the company and the customer. The more the dataset is restricted to a specific product type, the more accurate the predictions that can be obtained. In the future, it would be possible to integrate different ML models for the same output, according to the product.
As noted above, the AutoML approach demonstrated superior performance compared to both the simple average and the manual prediction. These predictions, based on the workload assigned to the identified resources, ensure a more reliable allocation of resources and improved planning. By leveraging automation, AutoML not only enhances prediction accuracy but also reduces the time and effort typically required for manual estimation, making it a valuable tool for optimising resource management and task scheduling. It is also worth noting that this application relies on data collected using Microsoft Excel and Microsoft Access (both from Microsoft Office 2021) and processed with KNIME Analytics Software (version 5.3), an open-source software platform for smart applications. Regardless of how the data collection is managed, the smart application can be implemented and managed internally without a significant initial investment.
Accurate data collection on the shop floor is crucial for optimising processes and making informed decisions. Nevertheless, measuring processing LTs and collecting data pose significant challenges that need to be addressed to improve data quality. Many systems, especially in SMEs, still rely on manual entry, which leads to errors and inconsistencies and hinders real-time insights. In SMEs, user-friendly dashboards generally allow managers and operators to track workload in real time, enhancing transparency and quick responses. When off-the-shelf solutions are not sufficient, customised tools can be developed to address specific production needs.
Manual solutions remain common in SMEs, but other technological solutions are available, although they may be challenging to implement in SMEs due to their complexity, low adaptability, and high cost. For example, existing Internet of Things (IoT) solutions, such as RFID tags and real-time location systems [
36,
37,
38], enable automated and accurate LT tracking, improving data reliability. SMEs can gradually adopt IoT-based solutions for data acquisition to enhance their production planning and control processes [
39]. Although technological solutions exist, the main challenge today is to make these systems adaptable and easily exploitable for SMEs [
40]. By investing in IoT and integrated systems, organisations can achieve accurate insights that enhance efficiency and competitiveness. Implementing effective shop floor control and data collection systems in manufacturing SMEs can lead to better decision-making and improvements in manufacturing operations, but it remains an open challenge given the context of SMEs. Not all SMEs have the same needs or difficulties. The monitoring of order LT and data collection strictly depend on the available technological resources. In this specific case study application, data have been collected manually through an integrated platform that associates each order task with the workload realised and its distribution over time. This straightforward solution provides the essential data needed for task workload prediction. However, it is worth noting that only a limited amount of data on completed orders could be collected over the last few years.
This preliminary application serves to define a data-driven approach that supports the decision maker without forcing choices that contradict their experience. In the future, as new data are collected, the already developed model should be reused and its overall performance re-verified. In practice, the more data the system collects, the more the estimates can improve. Combining AI/ML with the actions of the decision maker can ensure a more efficient planning system that “augments” rather than completely replaces human decisions. This type of approach is increasingly relevant for small enterprises, where the decision maker often plays more than one role and needs to control flows rather than be subjected to decisions. In the literature, performance indicators are often associated with model reliability, even though this association is not entirely accurate.
A critical factor in evaluating an ML algorithm’s quality is measuring its accuracy. However, while traditional metrics like MSE and RMSE assess overall model performance, they fail to capture localised insights into the expected prediction error for individual data points, especially when those points were not seen during training [
41]. MSE and RMSE are useful for evaluating overall performance in low-risk scenarios. However, in high-risk contexts such as medical diagnostics or manufacturing, relying exclusively on these metrics can be unsafe: general performance measures do not guarantee the safety or accuracy of the predictions for individual inputs, which could lead to severe consequences [
42,
43]. In practice, a prediction model may produce some reliable predictions while others are less so. Average accuracy alone does not offer insight into the reliability of individual predictions. The reliability of a model should be defined as the uncertainty of its predictions in a clear and meaningful probabilistic manner to help assess the confidence of individual predictions [
44]. In classification ML tasks, reliability can be defined as the probability that the predicted class is equal to the actual one [
45,
46]. The reliability of regression models, i.e., the degree of trust that a prediction is correct, is instead still poorly addressed in the literature [
42,
46]. Evaluating the performance of machine learning models solely with metrics such as MAE, MSE, and RMSE is therefore inadequate. Particularly in environments like manufacturing, there is an increasing need to assess the reliability of predictions. For regression algorithms, this issue remains challenging due to the complexity of the task. In the future, an extension of the current application could incorporate reliability indicators for each prediction model and consider them when selecting the optimal model. At present, the criterion for choosing the best model relies solely on the MAE, which only measures the absolute error with respect to the test set.
5. Conclusions
This study presents a practical and accessible ML-based approach for task workload prediction, delivering significantly improved accuracy over traditional average-based or manual estimation methods. It illustrates a smart, data-driven approach to managing order tasks in a real case study that uses collected, not simulated, data. For single-product modelling, the Regression Tree significantly outperformed both the simple average and the manual prediction, achieving reductions in MAE of 31% and 47%, respectively. When incorporating all available product data (multi-product case), despite a higher MAE than in the previous case, the XGBoost Tree Ensemble still achieved better performance than the simple average and the manual prediction (reductions in MAE of 32% and 42%, respectively). In both cases, ML models outperformed the traditional approaches.
Generally, predictions are often made using simple averages or estimations based on limited data, leading to imprecise or overly optimistic forecasts. In contrast, the “smart” approach uses data analytics, historical data, and advanced planning techniques to create more reliable and refined projections. For instance, by factoring in the specific capabilities and workloads of operators, as well as the integration of various departments, this approach minimises the guesswork that usually accompanies manual predictions. This not only results in a better understanding of the project’s needs but also allows for more efficient allocation of resources, better management of time, and the achievement of higher quality outcomes within the expected time frame. The advantage of this approach lies in its ability to adapt and refine predictions in real time based on new data and evolving project conditions. It offers a level of accuracy and reliability that traditional methods cannot match, ultimately leading to better project outcomes, reduced risks, and more successful project deliveries.
5.1. Limitations
One of the main limitations of this study lies in the small size of the dataset used to train and evaluate the machine learning models, a consequence of the limited data available from the company. A restricted sample size can affect the generalisability and robustness of the results [
47,
48], increasing the risk of overfitting and diminishing the reliability of performance metrics when the models are applied to new, unseen data.
In the current work, model validation was conducted using a basic train/test split. Although this is a common practice, it may not fully capture the variability inherent in the data, nor does it provide a comprehensive evaluation of model performance across different data subsets. To mitigate this limitation and enhance the reliability of the findings, future studies should consider employing more rigorous validation techniques, such as k-fold cross-validation. This method involves dividing the dataset into k subsets (or “folds”) and iteratively training the model on some folds while testing it on the remaining ones [
49]. Such an approach enables the model to be assessed across multiple data partitions, thereby offering a more robust estimate of its generalisation ability and reducing potential biases introduced by a single train/test split.
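A possible implementation of such a validation scheme is sketched below with scikit-learn, reusing the hypothetical feature matrix X and target y from the earlier examples; the choice of model and of k = 5 is purely illustrative.

```python
# Sketch of k-fold cross-validation as a more robust alternative to a single
# 80/20 train/test split; the model and the number of folds are illustrative.
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor(min_samples_leaf=5, random_state=0)
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# MAE estimated on each of the five held-out folds.
scores = -cross_val_score(model, X, y, cv=kfold, scoring="neg_mean_absolute_error")
print("MAE per fold:", scores.round(2), "| mean MAE:", round(scores.mean(), 2))
```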
Furthermore, efforts should be directed towards increasing the dataset size, either by collecting additional real-world data or by implementing data augmentation techniques to artificially expand the dataset where appropriate [
50,
51]. A larger and more diverse dataset would not only facilitate better model training but also improve the statistical power of the evaluation, leading to more credible and generalisable conclusions. While the current findings offer valuable insights, they must be interpreted with caution, given the constraints imposed by the limited dataset size and the validation strategy employed. Addressing these limitations in future research will be critical for the development of more robust and generalisable machine learning models. Another limitation concerns the features included in the models. Expanding the set of variables considered could enhance the overall performance of the model, making it more effective in processing the data and achieving more accurate results. Despite the complexity of collecting such data, ML-based approaches could offer significant benefits in integrating and correlating all these characteristics.
5.2. Future Research Steps
Moving forward, the goal should be to evaluate a combination of accuracy and reliability indicators, once suitable assessment strategies are defined, to select the best model for a given prediction task. In addition, the characteristics of machines and operators (those responsible for executing tasks on the ground) play a significant role in determining the overall workload for tasks in the scheduling phase, an aspect that was not analysed in this application. The specific skills, experience, and capacity of the operators can directly affect the efficiency and speed with which tasks are completed. For example, operators with advanced technical skills and experience are likely to execute their tasks more quickly and with fewer errors, reducing the overall workload and project duration. Conversely, operators with less experience or insufficient skills may need more time for training, supervision, and error correction, which can prolong the project’s timeline and increase the workload for other team members. In addition to skill levels, other factors, such as the number of operators for each task and their work schedules, can impact project timelines. A well-balanced team, where operators are assigned tasks that align with their strengths, can work more efficiently and meet deadlines more consistently. Moreover, understanding these characteristics and considering them when planning can lead to more accurate project estimations, better workload distribution, and improved overall project performance. Although there are complexities in this kind of data collection, it could be extremely beneficial to use ML-based approaches to integrate and correlate all these characteristics.
In the coming years, increasing attention will need to be given to the role of the workforce within production systems, along with the broader social aspects that influence planning activities. This is particularly true for SMEs, where the workforce plays a central role and where success often depends on the knowledge, expertise, and experience of its people. For this reason, planning should go beyond technical resources and consider individual skills and current working conditions when improving planning tasks such as workload prediction, aligning with Industry 5.0 principles by supporting human-centric planning. From this perspective, it becomes essential to shift toward a model that places the human being at the core of innovation and industrial design. Unlike Industry 4.0, which has primarily focused on automation and digital technologies, Industry 5.0 introduces a more human-centric philosophy, promoting collaboration between people and smart systems, and social sustainability alongside technological progress [
52]. By integrating workforce characteristics into workload forecasting and planning processes, companies can build more adaptive, inclusive, and resilient workplaces. This approach will not only enhance productivity but also support the needs and well-being of workers, reaffirming their fundamental role in the future of industry.