Article

Artificial Neural Network-Based Classification of Industrial Sustainability Profiles for Differentiated Fiscal Policy Design in Remanufacturing Processes

by Marta Lilia Eraña-Díaz 1,*, Juana Enríquez-Urbano 2,*, Beatriz Martínez-Bahena 2, Jazmin Yanel Juárez-Chávez 2, Alfonso D’Granda-Trejo 3 and Javier De-la-Rosa-Mondragon 2
1 Center for Research in Cognitive Sciences, Autonomous University of the State of Morelos, Cuernavaca 62210, Morelos, Mexico
2 Faculty of Chemical Sciences and Engineering, Autonomous University of the State of Morelos, Cuernavaca 62210, Morelos, Mexico
3 Faculty of Accounting, Administration & Informatics, Autonomous University of the State of Morelos, Cuernavaca 62210, Morelos, Mexico
* Authors to whom correspondence should be addressed.
Processes 2026, 14(9), 1501; https://doi.org/10.3390/pr14091501
Submission received: 28 February 2026 / Revised: 1 April 2026 / Accepted: 27 April 2026 / Published: 6 May 2026

Abstract

The design of differentiated fiscal instruments for industrial sustainability requires robust, data-driven tools capable of capturing the heterogeneity of environmental performance across manufacturing units—a challenge that conventional econometric approaches address only partially, given the non-linear nature of operational–environmental interactions in reconfigurable production systems. This study introduces a two-phase computational framework that integrates unsupervised machine learning and supervised classification to generate evidence-based sustainability profiles for fiscal policy targeting. Its principal contribution is the combination of K-Means clustering with a binary artificial neural network (ANN) classifier, operationalized through an accessible decision-support interface that enables differentiated incentive allocation without requiring programming expertise from policymakers. A dataset of 1000 manufacturing records comprising seven operational and technological input variables—material usage, production capacity, reconfiguration time, downtime, AI optimization, IoT connectivity, and predictive maintenance—and three environmental output indicators—energy consumption, carbon emissions, and waste generation—was analyzed. In Phase One, K-Means segmentation with k = 6, selected through multi-criteria convergence (Silhouette = 0.102; Elbow, Davies–Bouldin, and Calinski–Harabasz indices), identified six distinct sustainability profiles with marked environmental differentiation. In Phase Two, a binary ANN classifier (architecture: 7 → 64 → 32 → 1 neurons; ReLU and sigmoid activations) was trained to distinguish the reference cluster C0 (low environmental impact: energy 145.1 kWh, emissions 45.2 CO2-eq) from the high-impact cluster C1 (emissions 67.8 CO2-eq, waste 41.5 kg). 
The trained classifier achieved an overall accuracy of 75.4% and an AUC-ROC of 0.774 on the held-out test set, with a macro-averaged F1-score of 0.753 and a Cohen’s kappa coefficient of 0.508, indicating moderate agreement beyond chance. Class C1 (high-impact establishments) achieved a precision of 0.794 and a recall of 0.730, supporting reliable identification of manufacturing units that would most benefit from targeted fiscal support. The framework is deployed through a Gradio-based graphical interface incorporating a traffic-light sustainability classification (green/yellow/red), enabling direct and interactive application by tax authorities and industrial policymakers. The modular architecture supports adaptation to larger or sector-specific datasets, making it transferable across industrial policy contexts.

1. Introduction

The transition toward more sustainable industrial models represents one of the most pressing challenges of the 21st century, requiring not only technological innovation but also coherent and effective public policy frameworks. In this context, fiscal measures—including tax incentives, direct subsidies, value-added tax (VAT) reforms, and carbon pricing mechanisms—have emerged as pivotal instruments for steering organizational behavior toward environmentally responsible practices [1]. Industries engaged in remanufacturing, circular economy processes, and energy-efficient production are particularly sensitive to fiscal signals, as these activities often involve higher upfront costs and longer return horizons compared to conventional manufacturing [2].
A growing body of empirical and model-based evidence supports the effectiveness of fiscal support in promoting sustainable industrial behavior. Systematic analyses spanning Europe, China, Latin America, and other regions confirm that tax-based instruments and subsidy schemes can significantly stimulate carbon emission reductions, circular economy advancement, and energy efficiency improvements [3]. Studies from Spain associate a 0% VAT reform scenario with approximately €25 billion in savings and a measurable increase in repair demand, while model-based research from China demonstrates that carbon trading schemes can substitute or complement direct subsidies in driving remanufacturing output [4,5]. However, the effectiveness of these mechanisms is consistently found to be contingent upon stable, well-defined regulatory frameworks and the degree of market acceptance within the target sector [6].
Despite this body of evidence, three interconnected research gaps limit the capacity of current approaches to support differentiated fiscal policy design in manufacturing. First, the quantitative evaluation of fiscal policy impacts on organizational sustainability remains methodologically constrained: the multiplicity of interacting operational, environmental, and economic variables makes it difficult to isolate and measure the specific contribution of individual policy instruments. Second, the inherent heterogeneity of the industrial sector is rarely incorporated into fiscal instrument design—most policy frameworks treat manufacturing units as homogeneous targets, overlooking the marked variation in environmental performance across establishments. Third, traditional econometric and simulation models, while valuable, are structurally limited in their ability to capture the non-linear relationships among process variables and sustainability outcomes, particularly in reconfigurable production environments where Industry 4.0 technologies modulate environmental impact in complex, context-dependent ways [7,8]. Addressing these gaps calls for advanced computational tools capable of simultaneously profiling industrial heterogeneity and modeling non-linear operational–environmental interactions.
Based on these identified gaps, this study is guided by the following research hypotheses:
  • Manufacturing units operating in reconfigurable production environments can be segmented into identifiable groups with differential patterns of energy consumption, carbon emissions, and waste generation, suggesting the existence of heterogeneous sustainability profiles amenable to data-driven classification.
  • A binary ANN classifier trained on operational and technological variables can provide useful discriminative information for identifying manufacturing establishments with high environmental impact, offering a data-driven complement to qualitative assessments in fiscal targeting decisions.
  • The integration of machine learning-based sustainability profiling into an accessible decision-support interface represents a technically feasible and practically accessible approach to supporting—rather than replacing—expert judgment in the differentiated allocation of fiscal incentives in the manufacturing sector.
Artificial neural networks (ANNs) offer a powerful and flexible modeling paradigm for addressing this challenge. Unlike linear statistical models, ANNs can approximate complex, non-linear functional relationships between input variables—such as production capacity, material usage, reconfiguration time, and the adoption of enabling technologies (IoT, AI optimization, predictive maintenance)—and sustainability outputs, including energy consumption, carbon emissions, and waste generation [9]. Their ability to learn patterns from empirical data without requiring explicit functional specifications makes them particularly well-suited for environments where the causal mechanisms are incompletely understood or highly context-dependent [10].
This study proposes a two-phase computational framework applied to operational data from a reconfigurable manufacturing dataset comprising 1000 observations across multiple factories and machines. In Phase 1, an unsupervised K-Means clustering algorithm (k = 6) segments the industrial observations into behaviorally homogeneous sustainability profiles, enabling the identification of distinct groups with contrasting environmental and operational characteristics. In Phase 2, the two most environmentally contrasting clusters—Cluster 0 (low environmental impact, indicative of good practices) and Cluster 1 (high environmental impact, requiring fiscal support)—serve as the basis for training a binary ANN classifier. The resulting model enables the automated identification of the sustainability profile of any new industrial observation, providing a data-driven foundation for the differentiated design and targeting of fiscal incentive policies in remanufacturing.
To maximize its practical utility, the trained ANN is deployed through a Gradio-based graphical interface, enabling interactive use by policymakers, researchers, and industrial managers without requiring programming expertise. A traffic-light classification system (green/yellow/red) translates continuous model predictions into actionable sustainability assessments, facilitating the interpretation of results in real decision-making contexts.
The principal contributions of this work are:
  • A two-phase computational framework that operationalizes industrial sustainability profiling for fiscal policy purposes, combining unsupervised segmentation with supervised binary classification.
  • The application of multi-criteria K-Means clustering to identify distinct environmental sustainability profiles within a reconfigurable manufacturing dataset, demonstrating that industrial heterogeneity is structured and classifiable.
  • A binary ANN classifier with demonstrated discriminative capacity for distinguishing high-impact manufacturing units, providing a quantitative basis for differentiated fiscal targeting.
  • A Gradio-based decision-support interface incorporating a traffic-light sustainability classification system, designed for non-technical end-users in tax administration and industrial policy contexts.
  • An open, reproducible implementation deployed via Google Colaboratory, supporting validation and extension of the framework to other industrial datasets and policy contexts.
The remainder of this article is organized as follows. Section 2 presents the literature review. Section 3 presents the materials and methods, describing the dataset, the K-Means clustering procedure, and the binary ANN classifier architecture and training protocol. Section 4 reports the experimental results, including cluster characterization, intra-cluster traffic-light sustainability assessments, and ANN performance metrics. Section 5 discusses the findings in relation to the existing literature on fiscal policies for sustainable remanufacturing. Finally, Section 6 presents the main conclusions and outlines directions for future research.

2. Literature Review

2.1. Fiscal Instruments for Industrial Sustainability and Remanufacturing

The design of fiscal instruments to promote sustainable industrial practices has received sustained attention across policy-oriented and model-based research. Systematic reviews confirm that tax-based mechanisms—spanning carbon taxes, VAT reform, investment credits, and deduction schemes—and direct subsidy programs represent the dominant policy archetypes across European, East Asian, and Latin American contexts [1,3]. The effectiveness of these instruments is, however, consistently found to be sector-specific and contingent on the stability of the regulatory framework: retroactive policy changes significantly undermine private investment in energy-efficient and remanufacturing-intensive activities [4,6].
Remanufacturing operations are particularly sensitive to fiscal signals because they involve higher upfront capital requirements and longer return horizons than conventional manufacturing [2,5]. Model-based research from China demonstrates that neither pure carbon tax nor pure subsidy schemes dominate across all market structures; rather, hybrid or sequenced policy designs—where subsidies are phased out as carbon trading matures—tend to yield more robust remanufacturing output and social welfare outcomes [7]. Studies from Spain associate VAT reform targeted at repair and reuse services with substantial consumer cost savings and measurable demand shifts toward circular economy activities [6], while evidence from Latin American contexts underscores the heterogeneity of fiscal incentive effectiveness across industrial subsectors and firm sizes [11].
A recurrent finding across this literature is that uniform fiscal instruments applied across heterogeneous industrial targets systematically under-incentivize the units most in need of transition support—precisely those with the highest environmental footprint and lowest adoption of enabling technologies. This observation motivates the present study’s focus on data-driven sustainability profiling as a prerequisite for differentiated fiscal instrument design.

2.2. Machine Learning for Environmental Profiling and Sustainability Classification

The application of machine learning to environmental profiling and sustainability classification has expanded substantially since 2020, driven by the growing availability of operational and sensor-level industrial data [12]. Unsupervised clustering—particularly K-Means—has been employed to identify typologies of carbon dioxide emissions across heterogeneous national and regional contexts. Jiménez-Preciado et al. [13] combined K-Means with PCA and t-SNE dimensionality reduction to segment CO2 emission profiles among 208 countries, demonstrating that multi-criteria cluster validation (Calinski–Harabasz and Davies–Bouldin indices) identifies policy-relevant groupings even under modest silhouette separability. Similarly, Wang et al. [14] applied K-Means to provincial-level carbon emission data in China, showing that regionally differentiated CO2 typologies support more targeted decarbonization policy design than uniform national standards—a conclusion with direct methodological relevance to the present study’s application at the establishment level.
At the firm and process level, supervised classification approaches have been used to predict sustainability-relevant outcomes from operational variables. Artificial neural networks have demonstrated utility in modeling non-linear relationships between production parameters and environmental outputs, including energy consumption and emissions [15,16]. The application of binary classifiers to sustainability-related categorization tasks has precedent in adjacent domains. Sleem [17] demonstrated that a ResNet-based convolutional neural network can effectively distinguish between organic and recyclable waste categories from image datasets within a binary sustainability framework—a structural approach analogous to the classification proposed in the present study, applied to waste image data rather than manufacturing process variables. Similarly, Abdullah et al. [18] document AI’s demonstrated contribution to environmental monitoring and resource management across multiple UN Sustainable Development Goals, while Khan and Haq [16] provide a systematic taxonomy of machine learning techniques, confirming ANN as the central paradigm for non-linear classification in structured data environments. Recent work integrating ANN classifiers with explainable AI (XAI) techniques—specifically SHAP and LIME—has shown that interpretable model outputs are critical for regulatory acceptance and stakeholder trust in AI-assisted decision-making contexts [19]. The present study’s framework is positioned within this trend, proposing a binary ANN classifier whose outputs are made operationally accessible through a traffic-light interface and whose interpretability is identified as a priority direction for future research.
Table 1 presents a benchmark comparison of methodologically related studies in ML-based environmental and sustainability classification, providing the landscape against which the present framework can be situated.

2.3. Industry 4.0 Technologies and Eco-Efficiency in Reconfigurable Manufacturing

The adoption of Industry 4.0 technologies—including IoT-based real-time monitoring, AI-driven process optimization, and predictive maintenance systems—has been identified as a key determinant of eco-efficiency in advanced manufacturing environments [20,21,22]. IoT-enabled production systems reduce unplanned downtime and optimize energy consumption patterns through continuous feedback loops, while AI optimization applied to reconfiguration scheduling has been associated with measurable reductions in material waste and carbon intensity per unit of output [12]. Predictive maintenance, in particular, extends equipment lifespan and reduces the frequency of energy-intensive reconfiguration cycles, contributing to lower cumulative emissions per production run [23].
The present study operationalizes these technology adoption patterns as binary input variables (AI_Optimization_Applied, IoT_Enabled, Predictive_Maintenance) in the K-Means and ANN frameworks. This design choice reflects evidence that technology adoption status, even when measured dichotomously, is a meaningful discriminator of sustainability profile in heterogeneous manufacturing populations—a finding consistent with Industry 4.0 adoption surveys across emerging and developed economies [21].

2.4. Synthetic and Benchmark Datasets in Environmental and Policy ML Research

The use of structured synthetic or platform-curated datasets—including those distributed via open repositories such as Kaggle—has become a recognized methodological practice in machine learning research applied to industrial sustainability. Such datasets enable the development and validation of computational frameworks under controlled conditions, particularly in research contexts where access to proprietary firm-level environmental records is restricted by confidentiality agreements or regulatory constraints [17]. Several studies in the sustainability and operations research literature have employed publicly curated datasets to demonstrate proof-of-concept frameworks for classification, clustering, or policy optimization, treating reproducibility and methodological transparency as primary contributions rather than dataset novelty [16,17,18].
The dataset used in the present study—Reconfigurable_manufacturing.csv, distributed under a CC0 public domain license via Kaggle [24]—represents a structured operational environment consistent with Industry 4.0 reconfigurable manufacturing architectures. While its specific institutional and geographic provenance is not documented by the original contributor, its variable schema aligns with the operational and environmental indicators reported in the reconfigurable manufacturing literature [25]. The absence of documented provenance is acknowledged as a limitation; the framework’s transferability to institutionally validated datasets—including national manufacturing surveys—is identified as a priority direction for future research (see Section 5).

3. Materials and Methods

3.1. Data Source and Description

This study uses a structured dataset of 1000 records representing operational observations from a reconfigurable manufacturing environment comprising multiple factories and machines. The dataset—named Reconfigurable_manufacturing.csv and distributed under a CC0 public domain license via an open data repository [24]—contains 12 variables characterizing operational, technological, and environmental aspects of industrial production processes. Each record corresponds to a unique Factory–Machine combination identified by a Factory_ID and a Machine_ID. While the specific institutional and geographic provenance of the dataset has not been documented by the original contributor, its variable schema is consistent with the operational and environmental indicators reported in the reconfigurable manufacturing literature [25], and its Industry 4.0 technology adoption indicators align with the architectural components described by Hermann et al. [26] and Xu et al. [27]. The dataset’s undocumented provenance is acknowledged as a limitation of this study; a detailed discussion of this constraint and its implications for generalizability is provided in Appendix A. Future work should validate the proposed framework using institutionally documented datasets from national manufacturing surveys or industry consortia.
The variables used in this study were selected based on their documented relevance to industrial eco-efficiency and sustainability performance in reconfigurable manufacturing environments. Downtime and Reconfiguration_Time are established determinants of energy waste and carbon intensity in flexible production systems, as unplanned stoppages and reconfiguration cycles are associated with increased per-unit energy consumption and material waste [24,25]. Material_Usage and Production_Capacity reflect the resource intensity and operational efficiency of manufacturing units, which are primary drivers of environmental output variability [19,28]. The three binary technology adoption indicators—AI_Optimization_Applied, IoT_Enabled, and Predictive_Maintenance—capture the degree of Industry 4.0 integration, which has been empirically linked to reductions in energy consumption and waste generation through real-time monitoring and predictive control [12,13]. The three environmental output variables—Energy_Consumption (kWh), Carbon_Emissions (CO2-eq), and Waste_Generated (kg)—are standard indicators in industrial sustainability assessment frameworks [20,29]. Identifier columns (Factory_ID, Machine_ID) were excluded from the analytical models. Table 2 summarizes the variables used, their type, their role in the model, and a brief description.
Binary variables (AI_Optimization_Applied, IoT_Enabled, Predictive_Maintenance) were already encoded as integers (0 = absent, 1 = present) in the original dataset. Numeric variables were standardized prior to clustering using z-score normalization (mean = 0, standard deviation = 1) to prevent scale-dependent bias in distance-based algorithms.
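The preprocessing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are randomly generated stand-ins for the four numeric inputs and three binary indicators, and the helper name `zscore` is hypothetical. The population standard deviation (ddof = 0) is used, which is also what scikit-learn's `StandardScaler` computes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 4 numeric inputs (Material_Usage, Production_Capacity,
# Reconfiguration_Time, Downtime) and 3 binary technology indicators.
numeric = rng.normal(loc=[50.0, 500.0, 30.0, 5.0],
                     scale=[10.0, 100.0, 8.0, 2.0],
                     size=(1000, 4))
binary = rng.integers(0, 2, size=(1000, 3)).astype(float)


def zscore(columns):
    """Column-wise z-score: subtract the mean, divide by the population std."""
    mean = columns.mean(axis=0)
    std = columns.std(axis=0)  # ddof=0
    return (columns - mean) / std


# Only the numeric block is standardized; binary indicators keep their 0/1 encoding.
features = np.hstack([zscore(numeric), binary])
print(features.shape)
```

Leaving the binary columns untouched keeps them on a 0 to 1 range, while each standardized numeric column has unit variance.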
A detailed description of the dataset structure, variable schema, and Industry 4.0 technology indicators represented in the data is provided in Appendix A.

3.2. General Research Design

The methodological framework unfolds in two sequential phases, illustrated in Figure 1. In Phase 1, an unsupervised K-Means clustering algorithm is applied to segment the industrial observations into behaviorally homogeneous groups. In Phase 2, the two most contrasting clusters identified in Phase 1 are used to train a binary Artificial Neural Network (ANN) classifier that distinguishes organizations with good environmental performance (C0: Good Practices) from those that require fiscal support to improve their sustainability outcomes (C1: Requires Fiscal Support). This two-phase design is consistent with the study objective of providing a quantitative, data-driven tool to guide the differentiated allocation of fiscal incentives for sustainable remanufacturing.

3.3. Phase 1: Unsupervised Segmentation via K-Means Clustering

The clustering phase was conducted using the K-Means algorithm, a partition-based method that assigns each observation to the cluster with the nearest centroid by minimizing the within-cluster sum of squared distances. The analysis was performed using seven input features, including four numerical variables and three binary indicators. Prior to model fitting, all numerical variables were standardized using z-score normalization to ensure that differences in scale did not disproportionately influence the distance calculations. Binary variables were retained in their original 0/1 encoding and treated as pseudo-continuous indicators, which is a common approach in mixed-variable K-Means implementations.
Distances between observations and cluster centroids were computed using the squared Euclidean distance metric, the standard distance measure for K-Means in continuous feature spaces. The clustering algorithm was executed with a maximum of 300 iterations and 10 random initializations to reduce sensitivity to centroid initialization. A fixed random seed was used to ensure reproducibility of the clustering process.
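A minimal sketch of this configuration with scikit-learn's `KMeans` is given below. The feature matrix is random placeholder data (an assumption), `n_clusters=6` anticipates the value selected in the Results section, and `random_state=42` is the seed reported there. The last lines check that the model's inertia is the sum of squared Euclidean distances from each observation to its assigned centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 7))  # placeholder for the 7 preprocessed input features

# 300 maximum iterations, 10 random initializations, fixed seed for reproducibility.
kmeans = KMeans(n_clusters=6, n_init=10, max_iter=300, random_state=42)
labels = kmeans.fit_predict(X)

# Squared Euclidean distance of each observation to its own centroid.
sq_dist = ((X - kmeans.cluster_centers_[labels]) ** 2).sum(axis=1)
print(np.isclose(sq_dist.sum(), kmeans.inertia_, rtol=1e-4))
```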
The optimal number of clusters k was determined through a multi-criteria evaluation framework that compared candidate solutions across the range k = 2 to k = 10. Four complementary cluster-quality metrics were used for this purpose: the Elbow method based on inertia (within-cluster sum of squares), the Silhouette coefficient, the Davies–Bouldin index, and the Calinski–Harabasz index [30,31]. The joint analysis of these metrics was used to identify the partition that provided the best balance between intra-cluster cohesion and inter-cluster separation.
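The multi-criteria scan could be sketched as below, assuming scikit-learn's metric functions and random placeholder data. The dictionary `results` and the variable `best_k_by_silhouette` are illustrative names, not taken from the study. Higher Silhouette and Calinski–Harabasz values and lower Davies–Bouldin values indicate better partitions, while inertia is read through its elbow rather than its minimum.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 7))  # placeholder feature matrix

results = {}
for k in range(2, 11):  # candidate solutions k = 2 .. 10
    model = KMeans(n_clusters=k, n_init=10, max_iter=300, random_state=42).fit(X)
    results[k] = {
        "inertia": model.inertia_,
        "silhouette": silhouette_score(X, model.labels_),
        "davies_bouldin": davies_bouldin_score(X, model.labels_),
        "calinski_harabasz": calinski_harabasz_score(X, model.labels_),
    }

# One of the four criteria: the k with the highest Silhouette coefficient.
best_k_by_silhouette = max(results, key=lambda k: results[k]["silhouette"])
for k, row in results.items():
    print(k, {name: round(value, 3) for name, value in row.items()})
```

The final choice is meant to be read jointly from all four columns, so the single-criterion `best_k_by_silhouette` is only one input to that judgment.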
Once the optimal number of clusters was determined, the final cluster assignments were stored and appended to the dataset as a new categorical variable representing cluster membership. To support visualization and evaluate the separability of the identified clusters, Principal Component Analysis (PCA) was subsequently applied to reduce the dimensionality of the feature space while preserving the main variance structure of the data.
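As an illustration of the projection step, the sketch below applies scikit-learn's `PCA` to placeholder data and retains three components, the number reported in the Results section. The data are random, so the variance shares printed here will not match the study's values.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 7))  # placeholder for the 7 preprocessed features

pca = PCA(n_components=3)
scores = pca.fit_transform(X)            # coordinates of each observation on PC1..PC3
explained = pca.explained_variance_ratio_

print(scores.shape)
print(explained.round(3), round(float(explained.sum()), 3))
```

Plotting `scores` colored by cluster label gives the kind of low-dimensional view used to inspect separability.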
Finally, a traffic-light classification scheme was defined for the three environmental output variables using intra-cluster tercile thresholds (P33 and P66). This procedure allows sustainability performance to be evaluated relative to the distribution of observations within each cluster, thereby avoiding the use of global thresholds that may not adequately reflect the heterogeneity of the industrial segments analyzed.
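The intra-cluster tercile rule can be sketched as follows. This is an illustrative implementation under assumptions: the data are synthetic, the function name `traffic_light` is hypothetical, and boundary values are assigned to the lower class (`<=`), a detail the text does not specify. Lower values are treated as better, so the bottom third of each cluster is rated green.

```python
import numpy as np


def traffic_light(values):
    """Rate values as green/yellow/red using the P33 and P66 of the same sample."""
    p33, p66 = np.percentile(values, [33, 66])
    return np.where(values <= p33, "green",
                    np.where(values <= p66, "yellow", "red"))


rng = np.random.default_rng(0)
clusters = rng.integers(0, 6, size=600)        # placeholder cluster labels
emissions = rng.normal(55.0, 10.0, size=600)   # placeholder Carbon_Emissions values

# Thresholds are recomputed inside each cluster, not over the whole dataset.
ratings = np.empty(emissions.shape, dtype=object)
for c in np.unique(clusters):
    mask = clusters == c
    ratings[mask] = traffic_light(emissions[mask])

print(dict(zip(*np.unique(ratings.astype(str), return_counts=True))))
```

Because the cut-offs are relative, the same emission value can be green in one cluster and red in another.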

3.4. Phase 2: Binary ANN Classifier (C0 vs. C1)

Following the cluster characterization stage described in Phase 1, the two clusters exhibiting the most contrasting environmental profiles were selected to construct a supervised classification dataset. These clusters served as reference groups representing different levels of environmental performance, enabling the development of a predictive model capable of identifying manufacturing units with similar profiles.
A feedforward Artificial Neural Network (ANN) was implemented using the TensorFlow/Keras framework (version 2.13) in Python 3.10. The model input consisted of the same seven variables used in the clustering phase, including four standardized numerical variables and three binary indicators, forming a seven-dimensional feature vector. The target variable corresponded to cluster membership, representing the binary classification task.
The dataset containing the observations from both selected clusters was partitioned into training and testing subsets using stratified random sampling in order to preserve the class distribution across both partitions. The training subset was used to fit the neural network model, while the test subset was reserved for evaluating the model’s predictive performance.
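A stratified partition of this kind might look as follows with scikit-learn's `train_test_split`. The 80/20 ratio, the class sizes, and the seed are assumptions made for illustration, since the split proportion is not stated in this section.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 7))         # placeholder features for the two selected clusters
y = np.array([0] * 160 + [1] * 240)   # placeholder labels: 0 = C0, 1 = C1

# stratify=y keeps the C0/C1 proportion (here 40/60) in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(len(y_train), len(y_test), y_train.mean(), y_test.mean())
```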
The ANN was trained using the Adam optimizer and the binary cross-entropy loss function, which are standard choices for binary classification problems. During training, a validation subset was used to monitor generalization performance across epochs.
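To make the model structure concrete, the sketch below writes out the forward pass of the 7 → 64 → 32 → 1 architecture stated in the Abstract (ReLU in the hidden layers, sigmoid at the output), together with the binary cross-entropy loss. It uses plain NumPy with randomly initialized weights rather than the TensorFlow/Keras model used in the study, and it does not perform Adam training; the initialization scale and the clipping constant `eps` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [7, 64, 32, 1]

# Randomly initialized parameters (illustrative only, not trained values).
weights = [rng.normal(0.0, 0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]


def forward(x):
    """Forward pass: ReLU hidden layers, sigmoid output giving a probability."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    logits = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-logits))


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


X = rng.normal(size=(5, 7))                    # five placeholder observations
y = np.array([0, 1, 1, 0, 1], dtype=float)     # placeholder labels (1 = C1)
probs = forward(X).ravel()
loss = binary_cross_entropy(y, probs)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(probs.round(3), round(loss, 4), n_params)
```

With these layer sizes the network has 2625 trainable parameters: 512 in the first layer, 2080 in the second, and 33 in the output layer.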
Three evaluation metrics were tracked throughout the training process: training loss (binary cross-entropy), classification accuracy, and the Area Under the Receiver Operating Characteristic Curve (AUC–ROC) [32]. Early stopping was not applied in order to observe the complete learning dynamics; however, the learning curves were analyzed to detect potential signs of overfitting.
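The two performance metrics can be illustrated on a toy example with scikit-learn. The four labels and scores below are made up for illustration, and the 0.5 decision threshold is an assumption. Accuracy depends on that threshold, whereas AUC-ROC depends only on how the scores rank positives against negatives.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 0, 1, 1])            # toy labels (1 = C1)
y_prob = np.array([0.1, 0.4, 0.35, 0.8])   # toy sigmoid outputs

y_pred = (y_prob >= 0.5).astype(int)       # assumed 0.5 threshold
accuracy = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)

print(accuracy, auc)  # 0.75 0.75
```

Here three of four predictions are correct, and three of the four positive-negative pairs are ranked correctly, so both metrics equal 0.75.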

3.5. Software and Implementation

All analyses were implemented in Python 3.10 using Google Colaboratory as the computational environment. The principal libraries used were pandas (v2.0) and NumPy (v1.24) for data manipulation and numerical operations; scikit-learn (v1.3) for data preprocessing, clustering (K-Means), dimensionality reduction (PCA), and evaluation metrics; and TensorFlow/Keras (v2.13) for the construction and training of the Artificial Neural Network model. Data visualization and graphical analysis were performed using Matplotlib (v3.7) and Seaborn (v0.12).
To facilitate practical interaction with the trained classifier, a prototype graphical interface was developed using Gradio (v3.50.2). This interface allows non-technical users, such as policymakers and industrial managers, to input operational and technological parameters and obtain a prediction of the corresponding sustainability class, together with a traffic-light assessment of the three environmental indicators.

4. Results

This section presents the results of the two-phase analytical framework. Section 4.1 reports the outcomes of the K-Means clustering analysis, including the selection of the optimal number of clusters, the characterization of each cluster’s operational and environmental profile, and the intra-cluster traffic-light sustainability classification. Section 4.2 presents the performance of the binary ANN classifier trained to distinguish Cluster 0 (C0, good environmental practices) from Cluster 1 (C1, requires fiscal support).

4.1. Phase 1: K-Means Clustering Results

4.1.1. Selection of the Optimal Number of Clusters

The seven input features listed in Table 2 (four numeric and three binary) were used as clustering variables. All numeric features were standardized (z-score) prior to fitting the model. The distance between each observation and the cluster centroids was computed using the squared Euclidean distance, which is the standard metric for K-Means in continuous feature spaces [13]. Binary variables (AI_Optimization_Applied, IoT_Enabled, Predictive_Maintenance) were included in their original 0/1 encoding without additional transformation, following the convention for mixed-type K-Means implementations in which binary indicators are treated as pseudo-continuous [33]. Z-score normalization was applied exclusively to the four numeric variables to prevent features with larger absolute scales from dominating the distance calculation; binary variables, already bounded in [0, 1], were not normalized. The K-Means algorithm was executed with 300 maximum iterations and 10 random initializations (n_init = 10) using a fixed random seed (random_state = 42) to ensure reproducibility.
Figure 2 shows the four cluster-quality metrics evaluated for k = 2 to k = 10. The Elbow method (inertia) exhibits a discernible inflection point at k = 6, beyond which gains in compactness diminish substantially. The Silhouette coefficient reaches its global maximum (0.102) at k = 6, indicating that, among the candidate partitions, this one offers the best trade-off between intra-cluster cohesion and inter-cluster separation, although the low absolute value points to modest separability. The Davies–Bouldin index decreases monotonically but shows a relative stabilization after k = 6, while the Calinski–Harabasz index declines consistently, as expected for increasing k values. The elbow and the Silhouette maximum therefore both point to k = 6, and the Davies–Bouldin and Calinski–Harabasz trends are compatible with that choice, which together provide a multi-criteria justification for the selected partition.
The final cluster assignments were stored in the variable Cluster and appended to the dataset as an additional column. Table 3 reports the values of all four criteria across the range k = 2 to k = 10.
The selected solution k = 6 simultaneously satisfies the following conditions: a discernible inflection (elbow) in the inertia curve at k = 6; the global maximum of the Silhouette coefficient (0.102); a further reduction in the Davies–Bouldin index; and a progressive decline in the Calinski–Harabasz index consistent with expected behavior for larger k values. Principal Component Analysis (PCA) was subsequently applied to verify cluster separability in reduced dimensionality, retaining the three principal components that together explain 43.1% of the total variance (PC1 = 15.2%, PC2 = 14.5%, PC3 = 13.4%).
The traffic-light (semaphore) classification of the three environmental output variables was defined using intra-cluster tercile thresholds (P33 and P66), as described in Table 4. This approach ensures that the sustainability assessment is contextualized within each cluster’s own distribution, rather than applying global fixed thresholds that may be inappropriate for heterogeneous industrial segments.
The selection of the 33rd and 66th percentiles as classification thresholds reflects a data-driven tercile partition that divides each cluster’s environmental distribution into three equally sized segments of relative performance. This approach was adopted in preference to absolute fixed thresholds—such as regulatory emission limits or sector-specific benchmarks—for two methodological reasons. First, the dataset lacks documented geographic or sectoral provenance, making it impossible to anchor thresholds to jurisdiction-specific standards without risking systematic misclassification across heterogeneous manufacturing contexts. Second, intra-cluster percentile thresholds ensure that sustainability assessments are calibrated to the actual empirical distribution of each identified profile, preserving the discriminative value of the traffic-light classification within each sustainability segment. It is important to note that this choice implies a relative rather than absolute classification: a green rating indicates performance in the lower third of a cluster’s own distribution, not necessarily compliance with any external environmental standard. The sensitivity of policy recommendations to threshold choice is acknowledged as a direction for future work, particularly in applications where verified decarbonization targets or regulatory benchmarks are available to anchor the classification to absolute performance criteria.

4.1.2. Cluster Characterization and Environmental Profiles

Each resulting cluster was characterized by computing the arithmetic mean of all input and output variables per group, generating a heatmap of normalized centroids (Figure 3) and a radar profile (Figure 4). The K-Means algorithm with k = 6 partitioned the 1000 manufacturing observations into six clusters with the following sizes: C0 (n = 159), C1 (n = 182), C2 (n = 155), C3 (n = 143), C4 (n = 193), and C5 (n = 168). Table 5 reports the arithmetic mean of all ten analytical variables per cluster, with environmental output variables color-coded according to their intra-cluster traffic-light classification (green = favorable, yellow = intermediate, red = unfavorable). Figure 3 presents the corresponding normalized heatmap for visual inspection of relative magnitudes across clusters.
The cluster profiles reveal markedly differentiated sustainability signatures. C0 (n = 159) stands out as the cluster with the lowest energy consumption (mean = 145.1 kWh), the lowest carbon emissions (45.2 CO2-eq), and moderate waste generation (34.7 kg). This cluster is further characterized by the shortest reconfiguration time (25.9 min) and the lowest downtime (11.3 min), suggesting efficient and well-managed production processes. C0 is therefore designated as the reference cluster for good environmental and operational practices.
In contrast, C1 (n = 182) exhibits the highest carbon emissions (67.8 CO2-eq) and the highest waste generation (41.5 kg), accompanied by above-average energy consumption (208.1 kWh), longer reconfiguration times (41.5 min), and moderately high downtime (17.1 min). This profile identifies C1 as the cluster that would most benefit from targeted fiscal support mechanisms oriented toward decarbonization and process efficiency.
C5 (n = 168) records the highest energy consumption overall (237.4 kWh) and the highest material usage (400.2 kg), despite having a relatively short reconfiguration time and low downtime. The discrepancy between operational efficiency indicators and energy output suggests that C5 may represent energy-intensive manufacturing processes where fiscal incentives for energy substitution or efficiency upgrades would be particularly relevant. C2 (n = 155) presents an unusual profile: high energy consumption (213.7 kWh) coexists with the lowest carbon emissions across all clusters (40.7 CO2-eq), possibly indicative of cleaner energy sources or low-carbon fuels, alongside low waste generation (33.4 kg). C4 (n = 193) has the highest production capacity (259.9 units) and the highest rates of IoT and predictive maintenance adoption (0.58 and 0.60, respectively), yielding comparatively lower waste (24.5 kg) in proportion to output, suggesting that technology adoption partially mediates environmental performance. Finally, C3 (n = 143) achieves the lowest waste generation of all clusters (19.1 kg) and the lowest material usage (292.5 kg) but exhibits relatively high carbon emissions (61.0 CO2-eq) despite moderate energy consumption (184.4 kWh).

4.1.3. Distribution of Environmental Variables by Cluster

Figure 5 presents box-and-whisker plots for the three environmental output variables disaggregated by cluster. For Energy Consumption, C0 is clearly the least intensive cluster (interquartile range approximately 115–165 kWh), while C5 occupies the upper extreme (IQR approximately 220–270 kWh). The distributions of Carbon Emissions show the greatest inter-cluster contrast between C0 (IQR = 37–54 CO2-eq) and C1 (IQR = 64–74 CO2-eq). For Waste Generated, C3 and C5 are the most favorable clusters (low median waste), while C1 presents the widest spread and highest median. The presence of outliers in several clusters, particularly C0 for Energy Consumption, indicates that even within well-performing groups there are individual observations with anomalously high environmental outputs, reinforcing the case for observation-level fiscal targeting rather than aggregate sectoral incentives.

4.1.4. Technology Adoption by Cluster

Figure 6 displays the proportion of observations in each cluster that have adopted each of the three enabling technologies (AI optimization, IoT connectivity, and predictive maintenance). Adoption rates are broadly similar across clusters, ranging from 41% to 60%, indicating that the dataset does not exhibit strong technology-driven segmentation at the cluster level. The lowest AI adoption rate (41%) is observed in C2, while C4 records the highest rates of both IoT-enabled (58%) and predictive maintenance adoption (60%). The modest inter-cluster variation in technology adoption rates suggests that the environmental differentiation among clusters is driven primarily by operational variables (reconfiguration time, downtime, material usage) rather than by digital technology alone, which has implications for the design of fiscal incentives: policies targeting technology adoption may need to be complemented by process-oriented interventions to achieve meaningful environmental improvements.

4.1.5. Intra-Cluster Traffic-Light Sustainability Classification

Table 6 presents the intra-cluster traffic-light thresholds (tercile-based P33 and P66 percentiles) for the three environmental output variables across all six clusters. Observations below P33 within their cluster are classified as green (favorable); those between P33 and P66 as yellow (intermediate); and those above P66 as red (requiring intervention). The use of cluster-specific thresholds, rather than global cutoffs, ensures that the sustainability assessment is calibrated to the realistic operational range of each group, preventing systematic penalization of clusters that operate under inherently more energy-intensive production regimes.
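The tercile rule above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation. The sample energy values are hypothetical, the percentile uses linear interpolation between order statistics (the article does not state which percentile method was applied), and the treatment of values exactly equal to P33 or P66 is an assumption.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def traffic_light(value, p33, p66):
    """Green below P33, yellow from P33 to P66, red above P66."""
    if value < p33:
        return "green"
    if value <= p66:  # assumption: boundary values count as yellow
        return "yellow"
    return "red"

def classify_cluster(values):
    """Label every observation against its own cluster's tercile thresholds."""
    p33, p66 = percentile(values, 33), percentile(values, 66)
    return [traffic_light(v, p33, p66) for v in values]

# Hypothetical energy readings (kWh) for the members of a single cluster.
energy = [120, 130, 140, 150, 160, 170, 180]
lights = classify_cluster(energy)
```

Because the thresholds are recomputed per cluster, the same absolute reading can receive different colors in different clusters, which is the relative interpretation the text describes.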

4.1.6. PCA Validation of Cluster Separability

Figure 7 presents two-dimensional PCA projections of the cluster assignments (k = 6). The three retained principal components together explain 43.1% of the total variance (PC1 = 15.2%, PC2 = 14.5%, PC3 = 13.4%). As expected for a dataset with no dominant latent structure in low-dimensional space, the clusters exhibit substantial overlap in both the PC1–PC2 and PC1–PC3 planes. However, the cluster centroids (marked with stars) are visually separated in both projections, confirming that the K-Means algorithm has identified distinct mean profiles in the original 7-dimensional feature space. The moderate variance explained by three components reflects the high dimensionality and multi-factorial nature of the manufacturing sustainability data, which is consistent with prior literature on clustering of industrial datasets [13,14].

4.2. Phase 2: Binary ANN Classifier Results (C0 vs. C1)

Following the cluster characterization, Cluster 0 (C0, n = 159) and Cluster 1 (C1, n = 182) were selected for binary classification based on their contrasting environmental profiles. As shown in the heatmap and boxplot analyses (Figure 3 and Figure 5), C0 exhibits systematically lower values of Energy Consumption, Carbon Emissions, and Waste Generated compared to C1, making them suitable reference groups for the targeted application of fiscal incentive policies.
A feedforward ANN was implemented using the TensorFlow/Keras framework (version 2.x) in Python 3.10. The model architecture consists of an input layer, two hidden dense layers with Rectified Linear Unit (ReLU) activation, and a single-neuron output layer with sigmoid activation for binary probability estimation. The architecture is detailed in Table 7.
The input features for the ANN are the same seven variables used in the clustering phase (four numeric standardized, three binary), constituting a 7-dimensional feature vector. The target label is the binary cluster membership: 0 for C0 (good practices) and 1 for C1 (requires support). The combined subset (n = 341 records) was divided into training (80%) and test (20%) sets using stratified random sampling to preserve the class distribution in both partitions.
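As an illustration of the 7 → 64 → 32 → 1 architecture, the sketch below implements the forward pass in plain Python and counts the trainable parameters. It is not the TensorFlow/Keras model used in the study: the weights are random placeholders rather than trained values, the input vector is hypothetical, and the parameter count assumes that every dense layer has a bias term.

```python
import math
import random

LAYER_SIZES = [7, 64, 32, 1]  # input, two hidden layers, one output neuron

def init_params(sizes, seed=0):
    """Random placeholder weights; a trained model would supply learned values."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params

def count_params(sizes):
    """Weights plus biases of each dense layer (assumes biases are used)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

def forward(x, params):
    """ReLU on the hidden layers, sigmoid on the single output neuron."""
    activation = list(x)
    last = len(params) - 1
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        if i < last:
            activation = [max(0.0, v) for v in z]
        else:
            activation = [1.0 / (1.0 + math.exp(-v)) for v in z]
    return activation[0]

params = init_params(LAYER_SIZES)
# Hypothetical input: four standardized numeric features, then three binary flags.
x = [0.1, -0.3, -1.2, -0.8, 1.0, 1.0, 1.0]
p = forward(x, params)                # estimated probability of C1 membership
label = "C1" if p >= 0.5 else "C0"    # default decision threshold of 0.5
n_params = count_params(LAYER_SIZES)  # 512 + 2080 + 33
```

With untrained weights the probability is meaningless; the sketch only shows how a seven-feature vector flows through the layers to a single value that is then thresholded.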
The model was compiled with the Adam optimizer (learning rate = 0.001) and binary cross-entropy loss function. Training was carried out over 32 epochs with a mini-batch size of 16. A validation split of 20% of the training set was reserved for monitoring generalization during training. Three performance metrics were tracked per epoch: loss (binary cross-entropy), classification accuracy, and the Area Under the Receiver Operating Characteristic curve (AUC-ROC). Early stopping was not applied in order to observe the full learning dynamics; however, the learning curves were analyzed to detect potential overfitting.
The sigmoid output of the model produces a continuous probability estimate p ∈ [0, 1] representing the predicted likelihood of class membership in C1 (high environmental impact). Binary class labels are assigned using a decision threshold of 0.5: observations with p ≥ 0.5 are classified as C1 (Requires Fiscal Support), and those with p < 0.5 as C0 (Good Practices). This default threshold was retained for the primary evaluation; the sensitivity of classification performance to alternative threshold values is discussed in Section 5.
Model evaluation on the held-out test set (n = 69; C0 = 32, C1 = 37) was performed using a comprehensive set of classification metrics. The confusion matrix reports true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) for both classes, from which per-class precision, recall, and F1-score were derived. Overall accuracy and macro-averaged F1-score (F1-macro) were computed to characterize aggregate performance across both classes. Cohen’s kappa coefficient (κ) was included as a chance-corrected agreement measure that accounts for class imbalance and provides a more conservative estimate of classifier reliability than raw accuracy [32]. The AUC-ROC provides a threshold-independent measure of discriminative capacity, with values closer to 1.0 indicating better separation between classes.
Together, these metrics enable a rigorous assessment of both overall discriminative capacity and the specific performance on class C1—the high-impact establishments most relevant to fiscal policy targeting.

4.2.1. Environmental Contrast Between C0 and C1

Figure 8 confirms the environmental contrast between the two clusters selected for binary classification. For all three environmental output variables—Energy Consumption, Carbon Emissions, and Waste Generated—C1 (Requires Fiscal Support) exhibits systematically higher medians and wider distributions relative to C0 (Good Practices). The mean differences are: +63.0 kWh in energy consumption (+43.5%), +22.6 CO2-eq in carbon emissions (+50.0%), and +6.8 kg in waste (+19.6%). These differences confirm that C0 and C1 represent genuinely contrasting sustainability states, providing a meaningful binary classification target for the ANN.
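The mean differences can be recomputed from the cluster means quoted in the text, as in the short sketch below. The dictionary keys are illustrative names. Because the inputs are rounded means, the last digit can differ slightly from the reported percentages; for energy, the rounded means give roughly 43.4%.

```python
# Cluster means as reported in the text (rounded to one decimal).
c0 = {"energy_kwh": 145.1, "emissions_co2eq": 45.2, "waste_kg": 34.7}
c1 = {"energy_kwh": 208.1, "emissions_co2eq": 67.8, "waste_kg": 41.5}

def contrast(ref, other):
    """Absolute and relative (%) difference of `other` with respect to `ref`."""
    result = {}
    for key in ref:
        diff = other[key] - ref[key]
        result[key] = (round(diff, 1), round(100 * diff / ref[key], 1))
    return result

gaps = contrast(c0, c1)
for name, (absolute, relative) in gaps.items():
    print(f"{name}: +{absolute} ({relative:+.1f}%)")
```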

4.2.2. ANN Training Dynamics

Figure 9 presents the learning curves of the binary ANN classifier over 32 training epochs. The binary cross-entropy loss decreases steadily for both the training set (from ≈0.89 at epoch 1 to ≈0.31 at epoch 32) and the validation set (from ≈0.65 to ≈0.49). The growing gap between training and validation loss in later epochs indicates the onset of moderate overfitting, a common occurrence in small datasets (n = 341 for the C0–C1 subset). Despite this, the validation loss does not increase sharply, suggesting that the degree of overfitting is contained.
Classification accuracy follows a consistent improvement trajectory: training accuracy rises from ≈0.50 to ≈0.86, while validation accuracy stabilizes around 0.80–0.82 from epoch 5 onward. The AUC-ROC curves show a similar pattern, with training AUC reaching ≈0.93 and validation AUC stabilizing at approximately 0.87–0.89, demonstrating that the model possesses good discriminative capacity as assessed on the validation fold. The discrepancy between validation and held-out test performance (reported below) reflects the limited size of the test partition (n = 69) and typical variance in small-sample evaluations.

4.2.3. Classification Performance on the Test Set

Figure 10 presents the confusion matrix and Figure 11 presents the ROC curve evaluated on the held-out test set (n = 69 observations, stratified 80/20 split from the 341-observation C0–C1 subset). The confusion matrix shows 25 true negatives (C0 correctly classified), 27 true positives (C1 correctly classified), 7 false positives (C0 misclassified as C1), and 10 false negatives (C1 misclassified as C0).
Table 8 summarizes all performance metrics. The classifier achieved an overall accuracy of 75.4% and an AUC-ROC of 0.774, indicating useful discriminative capacity between the two sustainability profiles. The macro-averaged F1-score of 0.753 and Cohen’s kappa coefficient of κ = 0.508 indicate moderate agreement beyond chance [34], providing a more conservative assessment of classifier reliability than raw accuracy alone. Per-class performance is as follows. For class C0 (Good Practices): precision = 0.714, recall = 0.781, F1 = 0.746. For class C1 (Requires Fiscal Support): precision = 0.794, recall = 0.730, F1 = 0.761. Of the 37 C1 observations in the test set, 27 were correctly identified and 10 were missed (FN), yielding a C1 recall of 73.0%. This false-negative rate, which represents establishments with high environmental impact that the model fails to flag at the default threshold of 0.5, carries direct policy implications that are discussed in Section 5.
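The threshold-dependent metrics can be recomputed directly from the four confusion-matrix counts, as the following sketch shows. It uses only the reported counts and the standard definitions of precision, recall, F1, and Cohen's kappa; the variable names are illustrative. AUC-ROC is not included because it requires the per-observation probabilities, which are not reported.

```python
# Reported test-set counts, with C1 (Requires Fiscal Support) as the positive class.
TP, TN, FP, FN = 27, 25, 7, 10

def precision_recall_f1(tp, fp, fn):
    """Standard per-class precision, recall, and F1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

n = TP + TN + FP + FN
accuracy = (TP + TN) / n

c1_metrics = precision_recall_f1(TP, FP, FN)
# For C0 the roles swap: TN are its hits, FN its false alarms, FP its misses.
c0_metrics = precision_recall_f1(TN, FN, FP)
f1_macro = (c0_metrics[2] + c1_metrics[2]) / 2

# Cohen's kappa: observed agreement corrected for agreement expected by chance.
p_observed = accuracy
p_expected = ((TN + FP) * (TN + FN) + (TP + FN) * (TP + FP)) / n ** 2
kappa = (p_observed - p_expected) / (1 - p_expected)

print(f"accuracy={accuracy:.3f} f1_macro={f1_macro:.3f} kappa={kappa:.3f}")
print("C0 precision/recall/F1:", [round(m, 3) for m in c0_metrics])
print("C1 precision/recall/F1:", [round(m, 3) for m in c1_metrics])
```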
Overall, the results demonstrate that the two-phase framework—K-Means profiling followed by binary ANN classification—successfully identifies distinct sustainability clusters and provides a data-driven tool for differentiating industrial units that could benefit from fiscal incentive mechanisms. The C0/C1 contrast and the classifier’s moderate-to-good discriminative performance establish a foundation for operationalizing the fiscal policy targeting approach described in the Introduction.

4.2.4. Gradio-Based Deployment and Traffic-Light Interface

The trained binary ANN classifier was deployed as an interactive graphical interface using the Gradio framework (v3.50.2) within the Google Colaboratory environment. The interface accepts the seven operational and technological input parameters (Material Usage, Production Capacity, Reconfiguration Time, Downtime, AI Optimization Applied, IoT Enabled, and Predictive Maintenance) and returns two simultaneous outputs: (i) the predicted sustainability class (C0: Good Practices or C1: Requires Fiscal Support), together with its associated probability score, and (ii) a traffic-light assessment (green/yellow/red) for each of the three environmental indicators (Energy Consumption, Carbon Emissions, and Waste Generated), based on the intra-cluster tercile thresholds reported in Table 6. Figure 12 presents a representative screenshot of the deployed interface.
The interface is organized into two panels. The top panel provides numerical input fields for the four continuous variables (Material Usage, Production Capacity, Reconfiguration Time, Downtime), each pre-populated with the dataset mean as a default value to facilitate exploratory use, and toggle switches for the three binary technology adoption variables. Upon submission, the bottom panel renders the sustainability class prediction and its probability score in real time, followed by the three color-coded traffic-light indicators. The green indicator signals that the predicted value of the corresponding environmental variable falls below the P33 threshold of the assigned cluster; yellow indicates an intermediate range (P33–P66); and red signals a value above P66, denoting a priority intervention target for fiscal policy purposes. Each indicator is accompanied by the numerical predicted value and the applicable intra-cluster threshold range, enabling the user to understand the quantitative basis of the classification.
To illustrate the interface behavior, Table 9 presents two representative use cases derived from the C0 and C1 cluster means. In Case A, an establishment with input parameters consistent with the C0 centroid (Reconfiguration Time = 26 min; Downtime = 11 min; Production Capacity = 231 units; AI Optimization = 1; IoT Enabled = 1; Predictive Maintenance = 1; Material Usage = 357 units) receives a C0 classification with probability 0.83 and green traffic-light indicators across all three environmental outputs. In Case B, parameters aligned with the C1 centroid (Reconfiguration Time = 42 min; Downtime = 17 min; AI Optimization = 0; IoT Enabled = 0; Predictive Maintenance = 0; Material Usage = 350 units; Production Capacity = 225 units) yield a C1 classification with probability 0.79, with red indicators for Carbon Emissions and Waste Generated and a yellow indicator for Energy Consumption. These contrasting outputs demonstrate that the interface correctly captures the environmental distinction between the two training clusters and translates it into actionable, user-interpretable guidance for fiscal policy decisions.
The Gradio interface directly addresses the operationalization gap between advanced machine learning modeling and practical decision-making in industrial sustainability policy. By eliminating the need for programming expertise, it enables direct use by sustainability auditors, fiscal policy analysts, and industrial managers who can interact with the tool through a standard web browser. The real-time, observation-level output—as opposed to aggregate sector-level statistics—supports the granular targeting of fiscal incentives at the establishment level, consistent with the differentiated policy design rationale articulated in the Introduction. The combination of a probabilistic class prediction and a three-indicator traffic-light summary provides two complementary layers of decision-relevant information: the class prediction supports binary eligibility determinations (e.g., whether an establishment qualifies for a fiscal support program), while the indicator-level traffic-light assessment identifies which specific environmental dimensions require priority intervention, thereby informing the design of conditioned and performance-linked incentive structures.

5. Discussion

5.1. Interpretation of Cluster Profiles in the Context of Fiscal Policy Design

The K-Means clustering analysis identified six distinct sustainability profiles among the 1000 manufacturing observations, confirming the fundamental heterogeneity of environmental performance across industrial units even within a single operational context. This result aligns with a growing body of empirical and policy literature emphasizing that one-size-fits-all fiscal instruments are insufficient to drive sector-wide decarbonization, and that differentiated fiscal approaches are necessary to address the varying baseline conditions of industrial actors [1,2].
The C0 cluster (n = 159)—characterized by the lowest energy consumption (145.1 kWh), the lowest carbon emissions (45.2 CO2-eq), and the shortest reconfiguration and downtime values—represents a benchmark of operational and environmental excellence within the dataset. This profile is consistent with manufacturing units that have already internalized efficiency improvements, possibly in response to existing market signals or prior regulatory exposure. Importantly, the presence of this cluster demonstrates that high-productivity operations (Production Capacity = 231.0) are compatible with low environmental impact, validating the theoretical premise of eco-efficient manufacturing [3]. From a fiscal policy standpoint, these units may not require corrective incentives; rather, they could be positioned as recipients of performance-based rewards or could serve as models for technology transfer programs.
In contrast, C1 (n = 182) exhibits the highest carbon emissions (67.8 CO2-eq) and waste generation (41.5 kg), together with above-average energy consumption (208.1 kWh) and notably longer reconfiguration times (41.5 min). The 50% emissions gap between C0 and C1 (22.6 CO2-eq) suggests that targeted fiscal interventions directed at C1-type establishments could yield substantial environmental improvements. The operational characteristics of C1—particularly longer downtime (17.1 min) and extended reconfiguration periods—indicate that inefficiencies in process management, rather than inherently energy-intensive activities, may be the main drivers of its elevated environmental outputs. This interpretation aligns with evidence from the Latin American industrial sector, where fiscal incentive programs linked to environmental compliance—such as income tax deductions of up to 20% of net income for certified environmental investments—have proven effective in promoting process efficiency improvements rather than solely end-of-pipe emission reductions [4]. The analogy is instructive: just as Ecuador’s oil sector benefited from structured fiscal instruments that conditioned tax relief on verifiable environmental management systems, C1-type manufacturing units operating in reconfigurable environments could respond to similarly structured, outcome-linked incentives. Such a policy approach is further justified by evidence from comparative industrial studies reporting emission differentials of 30–60% between high- and low-efficiency manufacturing units in the remanufacturing sector, suggesting that uniform fiscal instruments applied across heterogeneous populations tend to under-incentivize the highest-impact establishments [7,10].
The C5 cluster (n = 168) presents a distinct policy challenge: it records the highest energy consumption (237.4 kWh) and material usage (400.2 kg) among all clusters, despite efficient operational metrics (short reconfiguration time, low downtime). This combination suggests that C5 represents genuinely energy-intensive production regimes—perhaps involving materials or products with high embedded energy—rather than inefficient management. The parallel with energy-intensive industrial segments studied in the literature (e.g., extractive industries, heavy manufacturing) is clear: for these sectors, conventional process efficiency incentives may be insufficient, and fiscal instruments specifically targeting energy substitution—such as accelerated depreciation for clean energy capital expenditures or VAT exemptions on renewable energy equipment—may be more appropriate [5,6].
C2 (n = 155) presents an analytically important anomaly: high energy consumption (213.7 kWh) coexists with the lowest carbon emissions across all clusters (40.7 CO2-eq). This decoupling of energy use from emissions suggests that C2 manufacturing units may already be operating with cleaner energy sources—natural gas, biomass, or electricity from low-carbon grids—or may be applying energy in processes that do not directly generate combustion-related CO2. This pattern mirrors findings in the EPS recycling literature, where the choice of treatment technology—specifically the use of green solvents such as d-limonene and p-cymene in tertiary chemical processes—substantially reduces the environmental footprint associated with energy-intensive operations while maintaining processing capacity [7]. The fiscal implication is notable: emissions-based carbon taxes or carbon credits would correctly reward C2-type establishments, while energy-consumption-based levies would inadvertently penalize them despite their low emissions profile. This underscores the importance of multi-indicator policy design and the limitations of single-metric fiscal instruments.
C4 (n = 193), with the highest production capacity (259.9) and the highest rates of IoT and predictive maintenance adoption (58% and 60%, respectively), achieves comparatively low waste generation (24.5 kg) despite moderate energy consumption. The waste-reduction benefit of technology adoption observed in C4 aligns with evidence from circular economy studies in manufacturing contexts: the systematic recovery and reuse of materials—analogous to the material recovery programs documented in maquiladora operations in northern Mexico, where structured circular economy practices have been shown to generate measurable environmental and economic co-benefits [11]—can deliver both environmental and economic dividends. For C4-type establishments, fiscal incentives linked to IoT adoption, digital infrastructure investment, or Industry 4.0 technology integration could yield further environmental improvements by extending predictive maintenance and materials tracking capabilities.

5.2. ANN Classifier Performance and Policy Operationalization

The binary ANN classifier achieved an overall accuracy of 75.4% and an AUC-ROC of 0.774 on the held-out test set (n = 69; C0 = 32, C1 = 37), demonstrating discriminative capacity meaningfully above chance and above the majority-class baseline of approximately 53.6%. The macro-averaged F1-score of 0.753 and Cohen’s kappa coefficient of κ = 0.508 indicate moderate agreement beyond chance [30], providing a more conservative assessment of classifier reliability than raw accuracy alone. While these performance levels are moderate in absolute terms, they are contextually appropriate for a classifier trained on a 341-record subset comprising two clusters from a heterogeneous manufacturing dataset. The low Silhouette coefficient of the K-Means solution (0.102) points to distributional overlap among the clusters, including C0 and C1, which likely limits the achievable binary separation [9].
Per-class performance reveals asymmetric profiles with direct relevance to fiscal policy design. Class C1 (Requires Fiscal Support) achieved a precision of 0.794 and a recall of 0.730: when the classifier flags a manufacturing unit as high-impact, it is correct approximately 79% of the time, implying a low rate of misallocation of fiscal resources toward units that do not genuinely require support. The C1 recall of 73.0%—representing 27 correctly identified units out of 37 in the test set—indicates that under the default decision threshold of 0.5, approximately 27% of high-impact establishments are not captured in a single classification pass (FN = 10). This false-negative rate is not trivial for policy completeness. Class C0 (Good Practices) achieved a precision of 0.714 and a recall of 0.781, indicating reliable identification of low-impact establishments that do not require fiscal intervention.
The trade-off between C1 precision (0.794) and C1 recall (0.730) reflects a fundamental decision-threshold problem with direct policy implications. In contexts where the cost of false negatives—missed high-impact establishments—is deemed higher than the cost of false positives—fiscal support allocated to units that do not strictly require it—policymakers may prefer to lower the classification threshold below 0.5 to prioritize recall. Conversely, when fiscal resources are highly constrained and precision of targeting is paramount, the default threshold or a higher value may be appropriate. This flexibility is operationally accessible through the Gradio interface, which allows authorized users to adjust the decision boundary interactively. Additional mitigation strategies include: (i) periodic re-application of the classifier as new operational data become available; (ii) ensemble methods combining the ANN output with complementary rule-based or regression models; and (iii) priority weighting of the C1 class during training to improve recall without modifying the threshold [10].
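The threshold trade-off can be illustrated with a small sketch that recomputes C1 precision and recall at several decision thresholds. The probabilities and labels below are hypothetical values chosen for illustration, not outputs of the study's classifier.

```python
def c1_precision_recall(probs, labels, threshold):
    """Precision and recall for class C1 (label 1) at a given decision threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predicted probabilities of C1 membership and true labels (1 = C1).
probs = [0.92, 0.81, 0.66, 0.58, 0.47, 0.41, 0.35, 0.22, 0.15, 0.08]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

sweep = {t: c1_precision_recall(probs, labels, t) for t in (0.3, 0.5, 0.7)}
for t, (precision, recall) in sweep.items():
    print(f"threshold={t:.1f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, lowering the threshold from 0.5 to 0.3 captures every C1 case at the cost of more false positives, while raising it to 0.7 removes the false positives but misses more C1 cases. This is the kind of trade-off a policymaker would weigh when choosing the cut-off.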
Analysis of the learning curves reveals stable convergence over 32 training epochs, with training and validation losses decreasing consistently from epoch 1 to approximately epoch 20, after which a modest divergence between training accuracy (≈87%) and test accuracy (75.4%) is observed. This gap is consistent with a degree of overfitting attributable to the limited size of the training partition (n = 217 records after stratified split). The model’s generalization capacity could be improved in future iterations through regularization techniques (including dropout layers and L2 weight penalties), cross-validation over the full C0/C1 subset, or data augmentation strategies that expand the effective training sample. These directions are identified as priorities for subsequent development of the framework.
The deployment of the trained classifier through a Gradio-based graphical interface represents a methodologically important contribution that extends beyond model development. By enabling policymakers, industrial managers, and sustainability auditors to interactively query the classifier using operational parameters—without requiring programming expertise—the interface operationalizes the model as a practical decision-support tool. This approach aligns with current best practices in applied machine learning for public policy, where accessibility and interpretability of model outputs are recognized as critical factors for adoption [35]. The traffic-light classification overlay (green/yellow/red) translates probabilistic outputs into categorical assessments familiar to non-technical users, bridging the gap between computational modeling and real-world decision contexts. It is important to note that the framework is designed to support—rather than replace—expert judgment in fiscal policy design: the classifier provides a data-driven signal that should be integrated into broader administrative and sectoral assessment processes.

5.3. Role of Technology Adoption Variables and Implications for Incentive Design

A notable finding of the cluster analysis is that the three binary technology adoption variables—AI optimization, IoT connectivity, and predictive maintenance—exhibit modest inter-cluster variation, with adoption rates ranging from 41% to 60% across clusters. This relatively uniform distribution indicates that technology adoption alone does not drive the observed environmental differentiation among clusters. Rather, operational variables such as reconfiguration time, downtime, and material usage appear to be the primary determinants of cluster membership. This finding has important implications for fiscal policy design: subsidies exclusively targeting technology adoption may not generate the expected environmental outcomes if not accompanied by complementary support for operational process redesign [29].
Nonetheless, the C4 cluster demonstrates that high technology adoption rates—particularly for IoT and predictive maintenance—correlate with the lowest waste generation among high-capacity clusters, suggesting that the environmental benefits of digitalization are conditioned on operational context. This conditional relationship is consistent with the broader Industry 4.0 literature, where the environmental impact of digital technologies is found to be mediated by how they are integrated into production workflows rather than by their mere presence [20]. Fiscal incentives structured around measurable outcomes—verified reductions in waste per unit of production, documented energy intensity improvements, or certified IoT-enabled monitoring systems—may therefore be more effective in steering environmental improvements across diverse cluster profiles than blanket technology adoption subsidies. This conclusion aligns with recent evidence from AI and ML applications in industrial sustainability, where the environmental benefit of digital technologies is consistently found to be conditional on integration depth and operational context rather than adoption status alone [12,18].

5.4. Limitations and Directions for Future Research

Several limitations of this study warrant acknowledgment. First, the dataset consists of observations from a reconfigurable manufacturing environment whose geographic and institutional provenance has not been explicitly documented by its author (see Appendix A for details). While its variable structure is consistent with industrial manufacturing contexts and aligns with prior literature on reconfigurable systems and Industry 4.0, it does not capture the full complexity of real-world industrial settings, including sector heterogeneity, geographic variation, regulatory exposure, and firm-level economic constraints. Future research should validate and extend the proposed methodology using empirical data from documented manufacturing facilities, ideally incorporating information on fiscal policy exposure and regulatory compliance status as additional covariates. In the Mexican context, the Encuesta Mensual de la Industria Manufacturera (EMIM) published by INEGI, as well as sectoral databases from SEMARNAT and SE, offer institutionally validated sources of manufacturing operational and environmental data that could serve as a basis for geographically anchored replications of this framework. The use of open-repository datasets for proof-of-concept ML frameworks, while a recognized methodological practice, should be accompanied by validation studies using documented sources as the field matures [16,17].
Second, the Silhouette coefficient of 0.102, while the highest achieved across the tested k range, indicates limited cluster compactness in the 7-dimensional feature space. This is partially a consequence of the multi-factorial, mixed-type nature of the data (continuous and binary variables), and future clustering approaches may benefit from alternative algorithms better suited to mixed-type data, such as k-prototypes or hierarchical clustering with Gower distance [33]. Third, the binary classification framework, which distinguishes only C0 from C1, does not directly address the policy-relevant status of the other four clusters (C2–C5). A multi-class extension of the ANN, or the development of separate binary classifiers for each cluster pair, would provide a more comprehensive policy profiling tool. Additionally, the integration of economic variables (such as firm size, sector, energy costs, and exposure to existing fiscal mechanisms) into both the clustering and classification phases would substantially enhance the policy relevance of the resulting profiles.
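To make the Gower distance concrete, the sketch below computes it for two records that mix continuous and binary variables. It assumes the symmetric treatment of binary features (simple mismatch); the records and the feature ranges are hypothetical, and only the column names follow Table A1.

```python
def gower_distance(a, b, numeric_ranges, binary_keys):
    """Gower distance between two records with numeric and binary features."""
    parts = []
    for key, value_range in numeric_ranges.items():
        # Numeric features: absolute difference scaled by the feature range.
        parts.append(abs(a[key] - b[key]) / value_range if value_range else 0.0)
    for key in binary_keys:
        # Binary features: simple mismatch (0 if equal, 1 otherwise).
        parts.append(0.0 if a[key] == b[key] else 1.0)
    return sum(parts) / len(parts)


# Hypothetical records and ranges, for illustration only.
unit_a = {"Reconfiguration_Time": 20.0, "Downtime": 10.0,
          "IoT_Enabled": 1, "Predictive_Maintenance": 0}
unit_b = {"Reconfiguration_Time": 40.0, "Downtime": 15.0,
          "IoT_Enabled": 1, "Predictive_Maintenance": 1}
ranges = {"Reconfiguration_Time": 50.0, "Downtime": 20.0}
binary = ["IoT_Enabled", "Predictive_Maintenance"]

distance = gower_distance(unit_a, unit_b, ranges, binary)
print(f"Gower distance = {distance:.4f}")
```

Because every per-feature term lies in [0, 1], binary and continuous variables contribute on a comparable scale, which is the property that makes the measure attractive for mixed-type data.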
A fourth limitation concerns model interpretability. The binary ANN classifier, while effective as a discriminative tool, does not natively expose the relative contribution of each input variable to its classification decisions. The integration of explainability methods—specifically SHAP (SHapley Additive exPlanations) values and LIME (Local Interpretable Model-agnostic Explanations)—represents a high-priority extension of this framework [19,36]. SHAP analysis would identify which operational variables (e.g., reconfiguration time, material usage, IoT adoption) most strongly drive C1 classification, providing actionable feature-level insights for fiscal instrument design: if, for instance, reconfiguration time is identified as the dominant predictor of high-impact classification, fiscal incentives could be specifically targeted at reducing reconfiguration cycles rather than offering broad technology adoption subsidies. This level of variable-level policy guidance represents the logical next step in the operationalization of the framework and is identified as the primary direction for immediate follow-on work.
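The attribution idea behind SHAP can be sketched without the library by computing exact Shapley values for a small model. The code below enumerates all feature coalitions, which is only feasible for a handful of features; the linear scoring function, the instance, and the baseline are hypothetical stand-ins for the trained ANN and real data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance by enumerating feature coalitions."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Coalition features take the instance value, the rest the baseline value.
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


def toy_score(x):
    """Hypothetical linear score standing in for the classifier output."""
    return (0.02 * x["Reconfiguration_Time"]
            + 0.01 * x["Downtime"]
            - 0.10 * x["IoT_Enabled"])


instance = {"Reconfiguration_Time": 40.0, "Downtime": 15.0, "IoT_Enabled": 1}
baseline = {"Reconfiguration_Time": 25.0, "Downtime": 12.0, "IoT_Enabled": 0}

phi = shapley_values(toy_score, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score of the instance and the score of the baseline, which is the additivity property that lets a per-variable attribution be read as a share of the prediction. For the seven-input ANN, an approximate estimator such as those provided by the SHAP library would normally replace this brute-force enumeration.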

6. Conclusions

This study developed and validated a two-phase computational framework for classifying industrial sustainability profiles in reconfigurable manufacturing environments, with the objective of supporting the differentiated design of fiscal incentive policies. The framework combined K-Means unsupervised clustering—applied to a 1000-observation dataset with seven operational and technological input variables—with a binary artificial neural network classifier trained to distinguish environmentally efficient manufacturing units (C0) from those requiring fiscal support for sustainability improvement (C1).
The K-Means analysis identified six statistically distinct clusters (k = 6, selected via multi-criteria convergence), with mean profiles ranging from the low-impact benchmark of C0 (Energy: 145.1 kWh; Emissions: 45.2 CO2-eq; Downtime: 11.3 min) to the high-impact profile of C1 (Emissions: 67.8 CO2-eq; Waste: 41.5 kg; Reconfiguration time: 41.5 min) and the energy-intensive C5 (Energy: 237.4 kWh; Material usage: 400.2 kg). The clustering revealed that environmental differentiation is primarily driven by operational variables—reconfiguration time, downtime, and material usage—rather than by the adoption of digital technologies alone, with important implications for the design of fiscal instruments.
The binary ANN classifier, trained on the C0/C1 contrast subset (n = 341), achieved an overall accuracy of 75.4% and an AUC-ROC of 0.774 on the held-out test set. The macro-averaged F1-score of 0.753 and Cohen’s kappa coefficient of κ = 0.508 indicate moderate discriminative agreement beyond chance, which supports the use of the classifier as a screening signal for fiscal targeting decisions. The precision for the high-impact C1 class (0.794) indicates that approximately 79% of model-flagged establishments belong to the cluster designated as requiring fiscal support, a practically useful targeting accuracy for policy allocation. A Gradio-based graphical interface enables non-technical users to apply the trained model to new operational inputs, returning a sustainability class prediction and a traffic-light assessment of energy, emissions, and waste indicators.
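The relationship between these metrics can be checked from a confusion matrix. The counts used below are an assumption: they are one set of integers that reproduces the reported accuracy, C1 precision, C1 recall, macro F1, and kappa to three decimals, and they are not read from Figure 10, which remains the authoritative source.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, C1 precision/recall, macro F1, and Cohen's kappa from counts."""
    n = tn + fp + fn + tp
    accuracy = (tp + tn) / n
    precision_c1 = tp / (tp + fp)
    recall_c1 = tp / (tp + fn)
    f1_c1 = 2 * tp / (2 * tp + fp + fn)
    f1_c0 = 2 * tn / (2 * tn + fn + fp)
    macro_f1 = (f1_c0 + f1_c1) / 2
    # Chance agreement: product of actual and predicted marginals per class.
    p_chance = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision_c1": precision_c1,
        "recall_c1": recall_c1,
        "macro_f1": macro_f1,
        "kappa": kappa,
    }


# Assumed counts (not taken from the paper) for a 69-record test set.
metrics = binary_metrics(tn=25, fp=7, fn=10, tp=27)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, kappa is computed as observed agreement minus chance agreement, divided by one minus chance agreement, which shows why an accuracy near 75% on a roughly balanced test set corresponds to a kappa near 0.5.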
From a policy perspective, the findings support the argument that fiscal instruments for industrial sustainability—including income tax deductions, VAT reforms on clean technologies, energy substitution subsidies, and carbon pricing mechanisms—should be calibrated to the environmental and operational profile of target establishments rather than applied uniformly across industrial sectors. The six-cluster taxonomy provides a data-driven basis for this differentiation: C1-type establishments may benefit most from process efficiency incentives and operational restructuring support; C5-type units from energy substitution and clean energy investment incentives; C2-type units from emissions-based rewards recognizing low-carbon energy use; and C4-type units from digital infrastructure incentives that extend the waste-reducing benefits of IoT and predictive maintenance integration.
The proposed framework contributes to the methodological literature on computational tools for sustainability assessment by demonstrating that unsupervised clustering combined with ANN classification can provide interpretable, actionable profiles for fiscal policy targeting. The Gradio interface and traffic-light classification system extend the practical utility of the model beyond the research context to real-world decision support, addressing a recognized gap between advanced analytics and policy operationalization. Future work should prioritize four directions: (i) validation of the framework on empirically documented industrial datasets, including institutionally sourced manufacturing surveys such as the EMIM (INEGI) or sectoral databases from national environmental agencies; (ii) extension of the binary classification to all six sustainability profiles through multi-class ANN architectures or pairwise classifiers; (iii) integration of SHAP and LIME explainability methods to identify the variable-level drivers of C0/C1 classification and translate them into actionable fiscal instrument design; and (iv) incorporation of economic and regulatory exposure variables—firm size, sector, existing fiscal policy exposure—into both the clustering and classification phases to enhance policy relevance and generalizability.

Author Contributions

Conceptualization and methodology, M.L.E.-D., J.E.-U. and J.Y.J.-C.; software, M.L.E.-D. and B.M.-B.; validation, A.D.-T. and B.M.-B.; formal analysis and investigation, M.L.E.-D., J.E.-U., J.Y.J.-C. and J.D.-l.-R.-M.; resources, M.L.E.-D., J.E.-U. and J.Y.J.-C.; data curation, B.M.-B. and J.E.-U.; writing (original draft preparation), M.L.E.-D., J.E.-U., B.M.-B. and J.D.-l.-R.-M.; writing (review and editing), M.L.E.-D., A.D.-T. and J.E.-U.; visualization, J.D.-l.-R.-M. and A.D.-T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset analyzed in this study is described in detail in Appendix A.1 (Dataset Overview and Source). The code developed for the two-phase computational framework is available from the corresponding author upon reasonable request.

Acknowledgments

During the preparation of this study, the authors used ChatGPT (OpenAI, GPT-4, November 2022 release) for the purposes of improving and refining scripts related to machine learning. The authors have carefully reviewed and edited all outputs generated by this tool and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
ANN: Artificial Neural Network
AUC-ROC: Area Under the Receiver Operating Characteristic Curve
CO2-eq: Carbon Dioxide Equivalent
FN: False Negative
FP: False Positive
IoT: Internet of Things
IQR: Interquartile Range
K-Means: K-Means Clustering Algorithm
kWh: Kilowatt-hour
LIME: Local Interpretable Model-agnostic Explanations
PC1/PC2/PC3: First/Second/Third Principal Component
PCA: Principal Component Analysis
ReLU: Rectified Linear Unit
ROC: Receiver Operating Characteristic
SHAP: SHapley Additive exPlanations
TN: True Negative
TP: True Positive
VAT: Value-Added Tax

Appendix A

Appendix A.1. Dataset Overview and Source

The dataset employed in this study, titled Sustainable Process Optimization in Industry—Reconfigurable Manufacturing System Efficiency Data, is publicly available on the Kaggle platform https://www.kaggle.com/datasets/programmer3/sustainable-process-optimization-in-industry (accessed on 25 February 2026). It was curated to support research and analysis in sustainable process planning strategies for Reconfigurable Manufacturing Systems (RMSs) in the context of Industry 4.0.
The dataset comprises 1000 records corresponding to unique Factory–Machine combinations across industrial environments. Each observation captures key operational metrics related to energy consumption, material usage, waste generation, carbon emissions, and production efficiency. The dataset also incorporates advanced manufacturing technology indicators—namely AI-based process optimization, IoT-enabled systems, and predictive maintenance—to support decision-making for process reconfiguration and sustainability assessment.
Table A1 presents the complete variable schema, including variable type, analytical category, and a brief operational description.
Table A1. Variable schema of the Reconfigurable Manufacturing System Efficiency dataset.

| Variable | Type | Category | Description |
|---|---|---|---|
| Energy_Consumption | Numerical | Environmental Output | Total energy consumed during production processes (kWh) |
| Material_Usage | Numerical | Operational Input | Amount of raw material consumed per production cycle (units) |
| Waste_Generated | Numerical | Environmental Output | Industrial waste produced per cycle (kg) |
| Carbon_Emissions | Numerical | Environmental Output | CO2 equivalent emissions per production cycle (CO2-eq) |
| Production_Capacity | Numerical | Operational Input | Maximum productive output capacity of the machine |
| Reconfiguration_Time | Numerical | Operational Input | Time required to reconfigure the production line (min) |
| Downtime | Numerical | Operational Input | Unplanned machine downtime per cycle (min) |
| AI_Optimization_Applied | Binary (0/1) | Industry 4.0 Technology | Whether AI-based process optimization is applied |
| IoT_Enabled | Binary (0/1) | Industry 4.0 Technology | Whether IoT sensors and connectivity are active |
| Predictive_Maintenance | Binary (0/1) | Industry 4.0 Technology | Whether predictive maintenance is in use |
| Factory_ID | Categorical | Identifier | Unique factory identifier (excluded from models) |
| Machine_ID | Categorical | Identifier | Unique machine identifier (excluded from models) |
Note: Binary variables (AI_Optimization_Applied, IoT_Enabled, Predictive_Maintenance) were pre-encoded as integers (0 = absent, 1 = present) in the original dataset. Identifier columns (Factory_ID, Machine_ID) were excluded from all analytical models.
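The variable roles in Table A1 translate into a simple column selection. The sketch below separates the seven model inputs from the three environmental outputs and ignores the identifier columns; the column names come from Table A1, while the sample row and the helper `split_record` are hypothetical.

```python
INPUT_FEATURES = [
    "Material_Usage",
    "Production_Capacity",
    "Reconfiguration_Time",
    "Downtime",
    "AI_Optimization_Applied",
    "IoT_Enabled",
    "Predictive_Maintenance",
]
OUTPUT_INDICATORS = ["Energy_Consumption", "Carbon_Emissions", "Waste_Generated"]
IDENTIFIERS = ["Factory_ID", "Machine_ID"]  # excluded from all models


def split_record(record):
    """Return (inputs, outputs) for one row, ignoring the identifier columns."""
    inputs = [float(record[name]) for name in INPUT_FEATURES]
    outputs = [float(record[name]) for name in OUTPUT_INDICATORS]
    return inputs, outputs


# Hypothetical row, for illustration only (values are not from the dataset).
row = {
    "Factory_ID": "F001", "Machine_ID": "M001",
    "Energy_Consumption": "150.0", "Material_Usage": "300.0",
    "Waste_Generated": "30.0", "Carbon_Emissions": "50.0",
    "Production_Capacity": "500.0", "Reconfiguration_Time": "25.0",
    "Downtime": "12.0", "AI_Optimization_Applied": "1",
    "IoT_Enabled": "0", "Predictive_Maintenance": "1",
}

inputs, outputs = split_record(row)
print(len(inputs), "inputs;", len(outputs), "outputs")
```

Values are parsed from strings because rows read with the standard `csv` module arrive as text; the binary indicators become 0.0 or 1.0 after conversion.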
It should be noted that the dataset does not provide explicit documentation of its geographic, sectoral, or institutional origin. The contributing author is identified as a Python developer on the Kaggle platform (username: programmer3), and the dataset appears to have been curated for research and benchmarking purposes in the context of reconfigurable manufacturing systems, rather than extracted from a specific industrial facility. The dataset is published under a CC0: Public Domain license, and was last updated approximately in early 2025. In the absence of verified provenance, findings derived from this dataset should be interpreted as methodological demonstrations subject to empirical validation using data from documented manufacturing contexts [24].

Appendix A.2. Industry 4.0 Technologies Represented in the Dataset

The inclusion of three binary technology indicators reflects the core enabling technologies of Industry 4.0 that are most directly relevant to sustainable manufacturing performance. Industry 4.0 refers to the fourth industrial revolution, characterized by the digital integration of physical production processes through cyber–physical systems, the Internet of Things (IoT), cloud computing, big data analytics, and artificial intelligence [23,24,25,26,27,28]. In the context of Reconfigurable Manufacturing Systems, these technologies enable dynamic adaptation of production lines in response to changing market conditions while minimizing resource consumption and environmental impact.
AI-based process optimization (AI_Optimization_Applied). Artificial intelligence applications in manufacturing include machine learning algorithms for process parameter optimization, computer vision for quality control, and reinforcement learning for adaptive scheduling. AI optimization directly reduces material waste, energy consumption, and carbon emissions by identifying more efficient operating configurations. In this dataset, the AI_Optimization_Applied variable captures whether any form of AI-driven process optimization is actively deployed on a given machine.
IoT-enabled systems (IoT_Enabled). The Internet of Things enables real-time data collection from sensors embedded in production equipment, providing continuous monitoring of operational parameters such as temperature, vibration, energy draw, and throughput. IoT connectivity is a prerequisite for data-driven sustainability management, as it provides the granular process data necessary to detect inefficiencies and trigger corrective actions. The IoT_Enabled variable records whether active sensor networks and connectivity infrastructure are present on the machine.
Predictive maintenance (Predictive_Maintenance). Predictive maintenance leverages sensor data and machine learning models to forecast equipment failures before they occur, enabling scheduled interventions that prevent unplanned downtime and extend equipment life. From a sustainability perspective, predictive maintenance reduces waste associated with catastrophic failures, lowers energy consumption from degraded equipment operating inefficiently, and decreases the material footprint of reactive replacement cycles. The Predictive_Maintenance variable indicates whether a predictive maintenance strategy is implemented for the corresponding machine.
Together, these three indicators operationalize the degree of technological sophistication of each Factory–Machine unit. Their inclusion in the sustainability classification model reflects the empirical evidence that Industry 4.0 technologies serve as enablers of eco-efficient manufacturing performance, making them relevant both as predictors in the ANN classifier and as potential targets for fiscal incentive design—particularly investment tax credits and accelerated depreciation programs aimed at accelerating technology adoption in lagging industrial segments.

Appendix A.3. Dataset Relevance for Fiscal Policy Research

The dataset’s combination of operational metrics, environmental outputs, and technology adoption indicators makes it well-suited for the dual analytical objectives of this study. The environmental output variables (Energy_Consumption, Carbon_Emissions, Waste_Generated) provide direct measures of ecological performance that correspond to the regulatory targets of carbon pricing, emission trading schemes, and environmental tax incentives. The operational variables (Material_Usage, Production_Capacity, Reconfiguration_Time, Downtime) capture the productive efficiency dimensions that fiscal policy literature identifies as determinants of investment capacity and technology adoption decisions.
From a policy simulation perspective, the dataset enables the construction of differentiated sustainability profiles that can serve as the basis for targeted fiscal intervention design—identifying which industrial segments demonstrate sufficient performance to qualify for fiscal recognition, and which require structured incentive support to achieve sustainability transitions. This alignment between dataset structure and policy objectives is a key rationale for its selection in the present research.

References

  1. Tossani, H.A.; da Silva, D.J.; Romano, A. Revisão Sistemática Sobre Tributação Ambiental: Critérios, Impactos Globais e Lições Para Políticas Públicas. Rev. Gestão Secr. 2025, 16, e4921.
  2. López Pérez, S.J.L.; Turnes Abelenda, J.A.; Vence Deza, X. Taxation and the circular economy in Spain: Current situation and potentialities of the use of tax benefits. Rev. Galega Econ. 2023, 32, 1–23.
  3. Chai, Q.; Sun, M.; Lai, K.; Xiao, Z. The Effects of Government Subsidies and Environmental Regulation on Remanufacturing. Comput. Ind. Eng. 2023, 181, 109126.
  4. Almada, L. La tributación como herramienta de preservación del medio ambiente. Aplicación empírica en empresas de inversión en energía renovable: 2015–2021. Ejes Econ. Soc. 2022, 6, 274–296.
  5. Zhang, Y.; Hong, Z.; Chen, Z.; Glock, C.H. Tax or subsidy? Design and selection of regulatory policies for remanufacturing. Eur. J. Oper. Res. 2020, 287, 885–900.
  6. Vence, X.; López Pérez, S.J.L. Reformar El IVA Para Impulsar Los Servicios de Reparación y La Economía Circular. Contaduría Adm. 2022, 67, e336.
  7. Zhao, S.; Xu, Y.; Liu, C.; Wei, F.; Mao, H. Carbon Tax vs. Carbon Trading in China: Which Is Better for Promoting Sustainable Development of Remanufacturing Companies? Environ. Sci. Pollut. Res. 2024, 31, 16710–16724.
  8. Nunes, S.P.P.; Nunes, R.D.C. Fiscal Rules and Public Finance Sustainability: Lessons from Global Practices. IOSR J. Bus. Manag. 2024, 26, 17–25.
  9. Bojarski, A. Life Cycle Thinking and General Modelling Contribution to Chemical Process Sustainable Design and Operation. Ph.D. Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2010.
  10. Gupta, A.; Khanna, A. A Holistic Approach to Sustainable Manufacturing: Rework, Green Technology, and Carbon Policies. Expert Syst. Appl. 2023, 244, 122943.
  11. López Rivera, M.L.; Cordero Díaz, M.C.; Marín Balcázar, G. Economía circular sostenible en una empresa maquiladora en Ciudad Juárez Chihuahua, México. FACE Rev. Fac. Cienc. Econ. Empres. 2024, 24, 117–129.
  12. Rane, N.; Kaya, Ö.; Rane, J. Artificial Intelligence, Machine Learning, and Deep Learning for Sustainable Industry 5.0; Deep Science Publishing: Mumbai, India, 2024.
  13. Jiménez-Preciado, A.L.; Cruz-Aké, S.; Venegas-Martínez, F. Identification of Patterns in CO2 Emissions among 208 Countries: K-Means Clustering Combined with PCA and Non-Linear t-SNE Visualization. Mathematics 2024, 12, 2591.
  14. Wang, H.; Yu, X. Carbon dioxide emission typology and policy implications: Evidence from machine learning. China Econ. Rev. 2023, 78, 101941.
  15. Ilie, M.; Ilie, C. Integrating AI and Statistical Modeling to Predict Key Sustainability Drivers of Climate Change Mitigation in Europe. Climate 2026, 14, 55.
  16. Khan, R.A.; Haq, H.B.U. Beyond the Horizon: A Comprehensive Study on the Evolution and Application of Machine Learning Techniques in Diverse Fields. Sustain. Mach. Intell. J. 2025, 11, 49–66.
  17. Sleem, A. Enhancing Sustainability through Automated Waste Classification: A Machine Intelligence Framework. Sustain. Mach. Intell. J. 2023, 5, 38–43.
  18. Abdullah, W.; Nawaz, M.; Nasir, B.; Ramzan, M.J. The Role of AI in Addressing the United Nations Sustainable Development Goals (SDGs): A Comprehensive Survey. Sustain. Mach. Intell. J. 2025, 13, 13–24.
  19. Olawumi, M.A.; Oladapo, B.I. AI-driven predictive models for sustainability. J. Environ. Manag. 2025, 373, 123472.
  20. Bonilla, S.H.; Silva, H.R.O.; Terra da Silva, M.; Gonçalves, R.F.; Sacomano, J.B. Industry 4.0 and Sustainability Implications: A Scenario-Based Analysis of the Impacts and Challenges. Sustainability 2018, 10, 3740.
  21. Ghobakhloo, M. Industry 4.0, digitization, and opportunities for sustainability. J. Clean. Prod. 2020, 252, 119869.
  22. Eraña-Díaz, M.L.; Cruz-Chávez, M.A.; Acosta-Flores, M.; Urbano, J.E.; Ruiz, N.L.; Gamba, J.P.O. Interdisciplinary Methodology for Resource Allocation Problems using Artificial Neural Networks and Software Robots. IEEE Access 2025, 13, 131141–131158.
  23. Luthra, S.; Mangla, S.K. Evaluating challenges to Industry 4.0 initiatives for supply chain sustainability in emerging economies. Process Saf. Environ. Prot. 2018, 117, 168–179.
  24. Programmer3. Sustainable Process Optimization in Industry—Reconfigurable Manufacturing System Efficiency Data; Kaggle: San Francisco, CA, USA, 2024; Available online: https://www.kaggle.com/datasets/programmer3/sustainable-process-optimization-in-industry (accessed on 15 August 2025).
  25. Bortolini, M.; Galizia, F.G.; Mora, C. Reconfigurable manufacturing systems: Literature review and research trend. J. Manuf. Syst. 2018, 49, 93–106.
  26. Hermann, M.; Pentek, T.; Otto, B. Design principles for Industrie 4.0 scenarios. In Proceedings of the 2016 49th Hawaii International Conference on System Sciences (HICSS), Koloa, HI, USA, 5–8 January 2016; pp. 3928–3937.
  27. Xu, L.D.; Xu, E.L.; Li, L. Industry 4.0: State of the art and future trends. Int. J. Prod. Res. 2018, 56, 2941–2962.
  28. MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967; Volume 1, pp. 281–297.
  29. Rennings, K. Redefining Innovation—Eco-Innovation Research and the Contribution from Ecological Economics. Ecol. Econ. 2000, 32, 319–332.
  30. Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
  31. Davies, D.L.; Bouldin, D.W. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 224–227.
  32. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
  33. Huang, Z. Extensions to the K-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Min. Knowl. Discov. 1998, 2, 283–304.
  34. Morakabatchiankar, S. A Contribution to Sustainable Management of Integrated Material/Energy Networks in Process Industries. Ph.D. Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2021.
  35. Landis, J.R.; Koch, G.G. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 1977, 33, 363–374.
  36. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems; von Luxburg, U., Guyon, I., Bengio, S., Wallach, H., Fergus, R., Eds.; Curran Associates: Red Hook, NY, USA, 2017; Volume 30, Available online: https://github.com/shap/shap (accessed on 25 February 2026).
Figure 1. General methodological workflow: data preprocessing, K-Means clustering (Phase 1), and binary ANN classification (Phase 2). In the K-Means scatter plot (Phase 1), filled blue circles represent observations assigned to Cluster 0 (favorable sustainability profile), filled orange triangles represent observations assigned to Cluster 1 (profile requiring fiscal intervention), and hollow circles indicate observations located near the cluster boundary. The dashed horizontal line denotes the centroid midpoint between the two clusters in the projected feature space.
Figure 2. Multi-criteria selection of the optimal number of clusters (k = 2 to k = 10): Elbow method (inertia); Silhouette coefficient; Davies–Bouldin index; Calinski–Harabasz index. The selected k = 6 is indicated by dashed vertical/horizontal lines.
Figure 3. Normalized heatmap of cluster centroids (values shown in cells are real means; colors represent normalized magnitudes from 0 to 1). Red indicates the highest relative values; green indicates the lowest.
Figure 4. Radar profiles of each cluster (k = 6), displaying normalized mean values of the seven input variables and three environmental output variables. Each vertex corresponds to one variable; larger polygon areas reflect higher normalized values.
Figure 5. Box-and-whisker plots of the three environmental output variables (Energy Consumption, Carbon Emissions, Waste Generated) disaggregated by cluster (k = 6). Red horizontal lines denote the intra-cluster median. Circles represent outliers beyond 1.5× the interquartile range.
Figure 6. Proportion of manufacturing units with AI optimization, IoT connectivity, and predictive maintenance adopted, disaggregated by cluster. Values above each bar indicate the percentage adoption rate.
Figure 7. PCA projections of the six clusters (k = 6): PC1 vs. PC2 (upper panel, 29.7% variance explained) and PC1 vs. PC3 (bottom panel, 28.6% variance explained). Colored dots represent individual observations; star markers indicate cluster centroids.
Figure 8. Comparative distribution of environmental indicators between C0 (Good Practices, n = 159, blue) and C1 (Requires Fiscal Support, n = 182, pink). Box plots show median (red line), interquartile range (IQR; Q1–Q3), whiskers extending to 1.5 × IQR, and open circles indicating individual outlier observations beyond this range.
Figure 9. Learning curves of the binary ANN classifier over 32 training epochs: (top) binary cross-entropy loss; (middle) classification accuracy; (bottom) AUC-ROC. Solid lines indicate training performance; dashed lines indicate validation performance.
Figure 10. Evaluation of the binary ANN classifier on the held-out test set: confusion matrix with overall accuracy = 0.754. Cell color intensity reflects the number of observations, with darker blue indicating higher counts.
Figure 11. ROC curve with AUC = 0.774 (solid green line) versus random classifier baseline (dashed line).
Figure 12. Screenshot of the Gradio-based graphical interface for sustainability classification. The top panel shows the seven input parameter fields; the bottom panel displays the predicted sustainability class (C0 or C1), the classification probability, and the traffic-light assessment (green/yellow/red) for Energy Consumption, Carbon Emissions, and Waste Generated.
Table 1. Benchmark comparison of methodologically related studies in ML-based environmental and sustainability classification.
| Study | Task | Method | Dataset | Performance |
|---|---|---|---|---|
| Jiménez-Preciado et al., Mathematics 2024 [13] | CO2 emission profile clustering, 208 countries | K-Means + PCA + t-SNE | World Bank open data | Cluster validation: CH/DB indices; Silhouette reported (n/a AUC; clustering task) |
| Wang et al., China Econ. Rev. 2023 [14] | CO2 emission typology for differentiated policy design, provincial level | K-Means unsupervised | China provincial panel 2000–2018 | 4-cluster solution; policy segment differentiation (n/a AUC) |
| Ilie et al., Climate 2026 [15] | Sustainability predictor ranking, EU renewable energy targets | ANN + statistical regression | EU Directive panel, n = 14 obs. (2010–2023) | R² = 0.91; interpretable feature hierarchy via ANN |
| Sleem, Sustain. Mach. Intell. J. 2023 [17] | Binary sustainability classification (Organic vs. Recyclable waste) | ResNet-CNN + transfer learning | Curated waste image dataset, n = 22,564 | High Acc; confusion matrix reported; binary classifier for sustainability |
| Olawumi et al., J. Environ. Manage. 2024 [18] | Industrial energy demand classification + optimization | ML ensemble + SHAP/LIME | Multi-sector industrial sensor data | Acc = 78%; AUC not reported; SHAP variable importance |
| Present study (this work) | Binary sustainability classification for differentiated fiscal policy targeting | K-Means + binary ANN | Reconfigurable manufacturing dataset, n = 1000 | Acc = 75.4%; AUC = 0.774; F1-macro = 0.753; Cohen-κ = 0.508; C1 precision = 0.794; C1 recall = 0.730 |
Table 2. Description of variables used in the analytical model.
| Variable | Type | Role | Description |
|---|---|---|---|
| Material_Usage | Numerical | Input | Amount of material consumed (units) |
| Production_Capacity | Numerical | Input | Productive capacity of the machine |
| Reconfiguration_Time | Numerical | Input | Time required for line reconfiguration (min) |
| Downtime | Numerical | Input | Unplanned machine downtime (min) |
| AI_Optimization_Applied | Binary (0/1) | Input | Whether AI optimization is applied |
| IoT_Enabled | Binary (0/1) | Input | Whether IoT sensors are active |
| Predictive_Maintenance | Binary (0/1) | Input | Whether predictive maintenance is in use |
| Energy_Consumption | Numerical | Output | Total energy consumed (kWh) |
| Carbon_Emissions | Numerical | Output | Carbon dioxide equivalent emissions (CO2-eq) |
| Waste_Generated | Numerical | Output | Industrial waste generated (kg) |
Table 3. Cluster selection metrics for k = 2 to k = 10. The selected k = 6 is indicated (✓).
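| k | Inertia | Silhouette | Davies–Bouldin | Calinski–Harabasz |
|---|---|---|---|---|
| 2 | 6965 | 0.099 | 2.90 | 119.3 |
| 3 | 6446 | 0.095 | 2.54 | 101.5 |
| 4 | 6032 | 0.094 | 2.33 | 94.8 |
| 5 | 5726 | 0.091 | 2.17 | 88.7 |
| 6 ✓ | 5424 | 0.102 | 2.05 | 85.9 |
| 7 | 5209 | 0.098 | 1.99 | 80.8 |
| 8 | 5028 | 0.097 | 1.97 | 75.3 |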
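As a reading aid for Table 3, the following minimal sketch re-enters the listed rows and reports the Silhouette optimum together with the successive inertia drops that an elbow heuristic would inspect. The dictionary name `TABLE_3` and the variable names are illustrative assumptions, not part of the original analysis; only the numbers come from the table.

```python
# Rows of Table 3 as listed (k = 2..8):
# k -> (inertia, silhouette, Davies-Bouldin, Calinski-Harabasz)
TABLE_3 = {
    2: (6965, 0.099, 2.90, 119.3),
    3: (6446, 0.095, 2.54, 101.5),
    4: (6032, 0.094, 2.33, 94.8),
    5: (5726, 0.091, 2.17, 88.7),
    6: (5424, 0.102, 2.05, 85.9),
    7: (5209, 0.098, 1.99, 80.8),
    8: (5028, 0.097, 1.97, 75.3),
}

# Silhouette: higher is better.
best_silhouette_k = max(TABLE_3, key=lambda k: TABLE_3[k][1])

# Elbow heuristic input: inertia reduction gained by moving from k-1 to k clusters.
inertia_drop = {
    k: TABLE_3[k - 1][0] - TABLE_3[k][0]
    for k in sorted(TABLE_3)
    if k - 1 in TABLE_3
}

print("Silhouette maximum at k =", best_silhouette_k)
for k, drop in inertia_drop.items():
    print(f"k = {k}: inertia drop = {drop}")
```

Over the listed rows the Davies–Bouldin and Calinski–Harabasz values change monotonically with k, so this sketch reports only the Silhouette optimum and the inertia drops.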
Table 4. Traffic-light sustainability classification thresholds by environmental output variable. Thresholds are computed intra-cluster using the 33rd percentile (P33) and 66th percentile (P66) of each variable’s distribution within its assigned cluster.
| Variable | Green (✓) | Yellow (⚠) | Red (✗) |
|---|---|---|---|
| Energy Consumption (kWh) | ≤P33 per cluster | P33–P66 per cluster | >P66 per cluster |
| Carbon Emissions (CO2-eq) | ≤P33 per cluster | P33–P66 per cluster | >P66 per cluster |
| Waste Generated (kg) | ≤P33 per cluster | P33–P66 per cluster | >P66 per cluster |
kWh = kilowatt-hours; CO2-eq = carbon dioxide equivalent; kg = kilograms. ✓ = satisfactory performance (low environmental impact); ⚠ = intermediate performance (moderate environmental impact); ✗ = critical performance (high environmental impact). Thresholds are computed independently within each cluster’s empirical distribution.
Table 5. Mean values of all analytical variables by cluster. Environmental outputs (Energy Consumption, Carbon Emissions, Waste Generated) are highlighted by traffic-light color relative to intra-cluster P33/P66 thresholds (green ≤ P33; yellow P33–P66; red > P66).
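| Cluster | n | Mat. Usage | Prod. Cap. | Reconf. Time | Downtime | AI | IoT | PM | Energy (kWh) | Carbon (CO2) | Waste (kg) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C0 | 159 | 357 | 231 | 25.9 | 11.3 | 0.48 | 0.52 | 0.53 | 145.1 | 45.2 | 34.7 |
| C1 | 182 | 349.9 | 224.8 | 41.5 | 17.1 | 0.54 | 0.52 | 0.53 | 208.1 | 67.8 | 41.5 |
| C2 | 155 | 333.5 | 200.6 | 43.5 | 22.8 | 0.41 | 0.48 | 0.57 | 213.7 | 40.7 | 33.4 |
| C3 | 143 | 292.5 | 241.7 | 45.8 | 16.6 | 0.55 | 0.49 | 0.48 | 184.4 | 61 | 19.1 |
| C4 | 193 | 341.9 | 259.9 | 21.8 | 21.9 | 0.49 | 0.58 | 0.6 | 213.1 | 52.7 | 24.5 |
| C5 | 168 | 400.2 | 185.1 | 31.2 | 12.9 | 0.52 | 0.58 | 0.49 | 237.4 | 57.3 | 23.5 |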
Note: AI = AI_Optimization_Applied; IoT = IoT_Enabled; PM = Predictive_Maintenance. Binary variables expressed as proportions (0–1).
Table 6. Intra-cluster traffic-light classification thresholds (percentile-based) for the three environmental output variables across all six clusters. Color coding denotes the traffic-light sustainability classification: green (≤P33) indicates favorable environmental performance; yellow (P33–P66) indicates moderate performance requiring monitoring; red (>P66) indicates critical performance requiring fiscal intervention.
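| Cluster | Environmental Variable | Green (≤P33) | Yellow (P33–P66) | Red (>P66) |
|---|---|---|---|---|
| C0 (n = 159) | Energy Consumption | ≤119.0 | 119.0–157.0 | >157.0 |
| C0 (n = 159) | Carbon Emissions | ≤38.0 | 38.0–49.0 | >49.0 |
| C0 (n = 159) | Waste Generated | ≤31.0 | 31.0–40.0 | >40.0 |
| C1 (n = 182) | Energy Consumption | ≤181.0 | 181.0–238.5 | >238.5 |
| C1 (n = 182) | Carbon Emissions | ≤65.0 | 65.0–73.5 | >73.5 |
| C1 (n = 182) | Waste Generated | ≤39.0 | 39.0–45.0 | >45.0 |
| C2 (n = 155) | Energy Consumption | ≤188.8 | 188.8–247.6 | >247.6 |
| C2 (n = 155) | Carbon Emissions | ≤35.0 | 35.0–44.0 | >44.0 |
| C2 (n = 155) | Waste Generated | ≤28.8 | 28.8–39.0 | >39.0 |
| C3 (n = 143) | Energy Consumption | ≤148.0 | 148.0–210.7 | >210.7 |
| C3 (n = 143) | Carbon Emissions | ≤55.0 | 55.0–67.0 | >67.0 |
| C3 (n = 143) | Waste Generated | ≤14.0 | 14.0–21.0 | >21.0 |
| C4 (n = 193) | Energy Consumption | ≤186.1 | 186.1–246.0 | >246.0 |
| C4 (n = 193) | Carbon Emissions | ≤46.0 | 46.0–59.0 | >59.0 |
| C4 (n = 193) | Waste Generated | ≤20.0 | 20.0–28.0 | >28.0 |
| C5 (n = 168) | Energy Consumption | ≤223.1 | 223.1–262.2 | >262.2 |
| C5 (n = 168) | Carbon Emissions | ≤50.0 | 50.0–64.0 | >64.0 |
| C5 (n = 168) | Waste Generated | ≤18.0 | 18.0–27.0 | >27.0 |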
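The percentile rule behind Table 6 can be illustrated with a minimal sketch. The thresholds below are the C0 values listed in the table; the function names, dictionary keys, and the example record are hypothetical and chosen only for illustration. Values exactly equal to P66 are treated as yellow, following the "red > P66" convention stated in the caption.

```python
# (P33, P66) thresholds for cluster C0, copied from Table 6.
C0_THRESHOLDS = {
    "energy_kwh": (119.0, 157.0),
    "carbon_co2eq": (38.0, 49.0),
    "waste_kg": (31.0, 40.0),
}

def traffic_light(value: float, p33: float, p66: float) -> str:
    """Green if value <= P33, yellow if P33 < value <= P66, red otherwise."""
    if value <= p33:
        return "green"
    if value <= p66:
        return "yellow"
    return "red"

def assess(record: dict, thresholds: dict) -> dict:
    """Apply the traffic-light rule to every output variable of one unit."""
    return {
        name: traffic_light(record[name], p33, p66)
        for name, (p33, p66) in thresholds.items()
    }

# Hypothetical manufacturing unit assigned to C0 (values are illustrative only).
example_unit = {"energy_kwh": 110.0, "carbon_co2eq": 45.0, "waste_kg": 42.5}
signals = assess(example_unit, C0_THRESHOLDS)
print(signals)
```

Because the thresholds are computed within each cluster, the same raw value can map to different colors in different clusters; a second dictionary with the C1 thresholds would be passed to `assess` in the same way.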
Table 7. Architecture of the binary ANN classifier.
| Layer | Neurons | Activation Function | Parameters |
|---|---|---|---|
| Input | 7 | | |
| Hidden 1 (Dense) | 64 | ReLU | 512 |
| Hidden 2 (Dense) | 32 | ReLU | 2080 |
| Output (Dense) | 1 | Sigmoid | 33 |
| Total trainable parameters | | | 2625 |
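The parameter counts in Table 7 follow from the usual rule for fully connected layers (one weight per input-output pair plus one bias per output neuron). A minimal sketch of that arithmetic, with the helper name `dense_params` chosen here for illustration:

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Trainable parameters of a dense layer: weights (n_in * n_out) plus biases (n_out)."""
    return n_in * n_out + n_out

# Layer widths from Table 7: 7 inputs -> 64 -> 32 -> 1 output.
layer_sizes = [7, 64, 32, 1]

# Pair each layer with the next one and count its parameters.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer, total_params)  # [512, 2080, 33] 2625
```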
Table 8. Summary of binary ANN classifier performance metrics on the held-out test set (n = 69).
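| Metric | Overall | C0 (Good Practices) | C1 (Requires Fiscal Support) |
|---|---|---|---|
| Overall Accuracy | 0.754 | | |
| AUC-ROC | 0.774 | | |
| Precision | | 0.714 | 0.794 |
| Recall (Sensitivity) | | 0.781 | 0.730 |
| F1-Score | | 0.746 | 0.761 |
| F1-macro | 0.753 | | |
| Cohen-κ | 0.508 | | |
| Correctly classified units of each class (TN/TP) | | 25 | 27 |
| Misclassified units of each class (FP/FN) | | 7 | 10 |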
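The metrics in Table 8 can be cross-checked from the four confusion-matrix counts. The sketch below is illustrative: it treats C1 as the positive class, and the assignment of the 7 and 10 misclassified units to true C0 and true C1, respectively, is inferred here from the reported precision and recall rather than stated explicitly in the table.

```python
# Test-set counts read from Table 8 (n = 69); C1 is treated as the positive class.
tn, fp = 25, 7    # true C0: predicted C0 / predicted C1
fn, tp = 10, 27   # true C1: predicted C0 / predicted C1
n = tn + fp + fn + tp

accuracy = (tp + tn) / n
precision_c1 = tp / (tp + fp)
recall_c1 = tp / (tp + fn)
precision_c0 = tn / (tn + fn)
recall_c0 = tn / (tn + fp)

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_c0 = f1(precision_c0, recall_c0)
f1_c1 = f1(precision_c1, recall_c1)
f1_macro = (f1_c0 + f1_c1) / 2

# Cohen's kappa: observed agreement corrected for chance agreement p_e,
# where p_e sums (true-class total * predicted-class total) over both classes.
p_e = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n**2
kappa = (accuracy - p_e) / (1 - p_e)

for name, value in [
    ("accuracy", accuracy), ("precision C0", precision_c0), ("recall C0", recall_c0),
    ("precision C1", precision_c1), ("recall C1", recall_c1),
    ("F1 C0", f1_c0), ("F1 C1", f1_c1), ("F1-macro", f1_macro), ("kappa", kappa),
]:
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, these values agree with the entries of Table 8. AUC-ROC cannot be recomputed this way because it depends on the predicted probabilities, not only on the counts.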
Table 9. Illustrative Gradio interface outputs for two representative input profiles (C0 and C1 cluster means).
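| Case | Reconfiguration Time | Downtime | AI | IoT | PM | Predicted Class | Probability | Energy Signal | Emissions Signal | Waste Signal |
|---|---|---|---|---|---|---|---|---|---|---|
| C0 | 26 min | 11 min | 1 | 1 | 1 | Good Practices | 0.83 | Green | Green | Green |
| C1 | 42 min | 17 min | 0 | 0 | 0 | Requires Support | 0.79 | Yellow | Red | Red |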
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Eraña-Díaz, M.L.; Enríquez-Urbano, J.; Martínez-Bahena, B.; Juárez-Chávez, J.Y.; D’Granda-Trejo, A.; De-la-Rosa-Mondragon, J. Artificial Neural Network-Based Classification of Industrial Sustainability Profiles for Differentiated Fiscal Policy Design in Remanufacturing Processes. Processes 2026, 14, 1501. https://doi.org/10.3390/pr14091501
