Author Contributions
Conceptualization and methodology, M.L.E.-D., J.E.-U. and J.Y.J.-C.; software, M.L.E.-D. and B.M.-B.; validation, A.D.-T. and B.M.-B.; formal analysis and investigation, M.L.E.-D., J.E.-U., J.Y.J.-C. and J.D.-l.-R.-M.; resources, M.L.E.-D., J.E.-U. and J.Y.J.-C.; data curation, B.M.-B. and J.E.-U.; writing (original draft preparation), M.L.E.-D., J.E.-U., B.M.-B. and J.D.-l.-R.-M.; writing (review and editing), M.L.E.-D., A.D.-T. and J.E.-U.; visualization, J.D.-l.-R.-M. and A.D.-T. All authors have read and agreed to the published version of the manuscript.
Figure 1.
General methodological workflow: data preprocessing, K-Means clustering (Phase 1), and binary ANN classification (Phase 2). In the K-Means scatter plot (Phase 1), filled blue circles represent observations assigned to Cluster 0 (favorable sustainability profile), filled orange triangles represent observations assigned to Cluster 1 (profile requiring fiscal intervention), and hollow circles indicate observations located near the cluster boundary. The dashed horizontal line denotes the centroid midpoint between the two clusters in the projected feature space.
Figure 2.
Multi-criteria selection of the optimal number of clusters (k = 2 to k = 10): Elbow method (inertia); Silhouette coefficient; Davies–Bouldin index; Calinski–Harabasz index. The selected k = 6 is indicated by dashed vertical/horizontal lines.
Figure 3.
Normalized heatmap of cluster centroids (values shown in cells are real means; colors represent normalized magnitudes from 0 to 1). Red indicates the highest relative values; green indicates the lowest.
Figure 4.
Radar profiles of each cluster (k = 6), displaying normalized mean values of the seven input variables and three environmental output variables. Each vertex corresponds to one variable; larger polygon areas reflect higher normalized values.
Figure 5.
Box-and-whisker plots of the three environmental output variables (Energy Consumption, Carbon Emissions, Waste Generated) disaggregated by cluster (k = 6). Red horizontal lines denote the intra-cluster median. Circles represent outliers beyond 1.5× the interquartile range.
Figure 6.
Proportion of manufacturing units with AI optimization, IoT connectivity, and predictive maintenance adopted, disaggregated by cluster. Values above each bar indicate the percentage adoption rate.
Figure 7.
PCA projections of the six clusters (k = 6): PC1 vs. PC2 (upper panel, 29.7% variance explained) and PC1 vs. PC3 (bottom panel, 28.6% variance explained). Colored dots represent individual observations; star markers indicate cluster centroids.
Figure 8.
Comparative distribution of environmental indicators between C0 (Good Practices, n = 159, blue) and C1 (Requires Fiscal Support, n = 182, pink). Box plots show median (red line), interquartile range (IQR; Q1–Q3), whiskers extending to 1.5 × IQR, and open circles indicating individual outlier observations beyond this range.
Figure 9.
Learning curves of the binary ANN classifier over 32 training epochs: (top) binary cross-entropy loss; (middle) classification accuracy; (bottom) AUC-ROC. Solid lines indicate training performance; dashed lines indicate validation performance.
Figure 10.
Evaluation of the binary ANN classifier on the held-out test set: confusion matrix with overall accuracy = 0.754. Cell color intensity reflects the number of observations, with darker blue indicating higher counts.
Figure 11.
ROC curve with AUC = 0.774 (solid green line) versus random classifier baseline (dashed line).
Figure 12.
Screenshot of the Gradio-based graphical interface for sustainability classification. The top panel shows the seven input parameter fields; the bottom panel displays the predicted sustainability class (C0 or C1), the classification probability, and the traffic-light assessment (green/yellow/red) for Energy Consumption, Carbon Emissions, and Waste Generated.
Table 1.
Benchmark comparison of methodologically related studies in ML-based environmental and sustainability classification.
| Study | Task | Method | Dataset | Performance |
|---|---|---|---|---|
| Jiménez-Preciado et al., Mathematics 2024 [13] | CO2 emission profile clustering, 208 countries | K-Means + PCA + t-SNE | World Bank open data | Cluster validation: CH/DB indices; Silhouette reported (AUC n/a; clustering task) |
| Wang et al., China Econ. Rev. 2023 [14] | CO2 emission typology for differentiated policy design, provincial level | K-Means unsupervised | China provincial panel 2000–2018 | 4-cluster solution; policy segment differentiation (AUC n/a) |
| Ilie et al., Climate 2026 [15] | Sustainability predictor ranking, EU renewable energy targets | ANN + statistical regression | EU Directive panel, n = 14 obs. (2010–2023) | R² = 0.91; interpretable feature hierarchy via ANN |
| Sleem, Sustain. Mach. Intell. J. 2023 [17] | Binary sustainability classification (Organic vs. Recyclable waste) | ResNet-CNN + transfer learning | Curated waste image dataset, n = 22,564 | High Acc; confusion matrix reported; binary classifier for sustainability |
| Olawumi et al., J. Environ. Manage. 2024 [18] | Industrial energy demand classification + optimization | ML ensemble + SHAP/LIME | Multi-sector industrial sensor data | Acc = 78%; AUC not reported; SHAP variable importance |
| Present study (this work) | Binary sustainability classification for differentiated fiscal policy targeting | K-Means + binary ANN | Reconfigurable manufacturing dataset, n = 1000 | Acc = 75.4% · AUC = 0.774 · F1-macro = 0.753 · Cohen-κ = 0.508 · C1 precision = 0.794 · C1 recall = 0.730 |
Table 2.
Description of variables used in the analytical model.
| Variable | Type | Role | Description |
|---|---|---|---|
| Material_Usage | Numerical | Input | Amount of material consumed (units) |
| Production_Capacity | Numerical | Input | Productive capacity of the machine |
| Reconfiguration_Time | Numerical | Input | Time required for line reconfiguration (min) |
| Downtime | Numerical | Input | Unplanned machine downtime (min) |
| AI_Optimization_Applied | Binary (0/1) | Input | Whether AI optimization is applied |
| IoT_Enabled | Binary (0/1) | Input | Whether IoT sensors are active |
| Predictive_Maintenance | Binary (0/1) | Input | Whether predictive maintenance is in use |
| Energy_Consumption | Numerical | Output | Total energy consumed (kWh) |
| Carbon_Emissions | Numerical | Output | Carbon dioxide equivalent emissions (CO2-eq) |
| Waste_Generated | Numerical | Output | Industrial waste generated (kg) |
Table 3.
Cluster selection metrics for k = 2 to k = 10. The selected k = 6 is indicated (✓).
| k | Inertia | Silhouette | Davies–Bouldin | Calinski–Harabasz |
|---|---|---|---|---|
| 2 | 6965 | 0.099 | 2.90 | 119.3 |
| 3 | 6446 | 0.095 | 2.54 | 101.5 |
| 4 | 6032 | 0.094 | 2.33 | 94.8 |
| 5 | 5726 | 0.091 | 2.17 | 88.7 |
| 6 ✓ | 5424 | 0.102 | 2.05 | 85.9 |
| 7 | 5209 | 0.098 | 1.99 | 80.8 |
| 8 | 5028 | 0.097 | 1.97 | 75.3 |
Table 4.
Traffic-light sustainability classification thresholds by environmental output variable. Thresholds are computed intra-cluster using the 33rd percentile (P33) and 66th percentile (P66) of each variable’s distribution within its assigned cluster.
| Variable | Green (✓) | Yellow (⚠) | Red (✗) |
|---|---|---|---|
| Energy Consumption (kWh) | ≤P33 per cluster | P33–P66 per cluster | >P66 per cluster |
| Carbon Emissions (CO2-eq) | ≤P33 per cluster | P33–P66 per cluster | >P66 per cluster |
| Waste Generated (kg) | ≤P33 per cluster | P33–P66 per cluster | >P66 per cluster |
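
The percentile rule in Table 4 can be illustrated with a minimal sketch. The function names (`percentile`, `cluster_thresholds`, `traffic_light`) are hypothetical and the linear-interpolation percentile is an assumption; the source does not state which percentile method was used. The example values are the C0 energy thresholds and mean reported in Tables 5 and 6.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100).

    Assumption: the paper does not say which percentile method it used.
    """
    data = sorted(values)
    if not data:
        raise ValueError("values must be non-empty")
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def cluster_thresholds(values):
    """Return (P33, P66) for one output variable within one cluster."""
    return percentile(values, 33), percentile(values, 66)


def traffic_light(value, p33, p66):
    """Map a value to green (<= P33), yellow (P33 to P66), or red (> P66)."""
    if value <= p33:
        return "green"
    if value <= p66:
        return "yellow"
    return "red"


# C0 energy thresholds as reported in Table 6 (119.0 and 157.0 kWh),
# applied to the C0 mean energy consumption from Table 5 (145.1 kWh).
signal = traffic_light(145.1, 119.0, 157.0)
print(signal)
```

Because the thresholds are computed within each cluster, the same absolute value can receive different signals in different clusters.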
Table 5.
Mean values of all analytical variables by cluster. Environmental outputs (Energy Consumption, Carbon Emissions, Waste Generated) are highlighted by traffic-light color relative to intra-cluster P33/P66 thresholds (green ≤ P33; yellow P33–P66; red > P66).
| Cluster | n | Mat. Usage | Prod. Cap. | Reconf. Time | Downtime | AI | IoT | PM | Energy (kWh) | Carbon (CO2) | Waste (kg) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C0 | 159 | 357 | 231 | 25.9 | 11.3 | 0.48 | 0.52 | 0.53 | 145.1 | 45.2 | 34.7 |
| C1 | 182 | 349.9 | 224.8 | 41.5 | 17.1 | 0.54 | 0.52 | 0.53 | 208.1 | 67.8 | 41.5 |
| C2 | 155 | 333.5 | 200.6 | 43.5 | 22.8 | 0.41 | 0.48 | 0.57 | 213.7 | 40.7 | 33.4 |
| C3 | 143 | 292.5 | 241.7 | 45.8 | 16.6 | 0.55 | 0.49 | 0.48 | 184.4 | 61 | 19.1 |
| C4 | 193 | 341.9 | 259.9 | 21.8 | 21.9 | 0.49 | 0.58 | 0.6 | 213.1 | 52.7 | 24.5 |
| C5 | 168 | 400.2 | 185.1 | 31.2 | 12.9 | 0.52 | 0.58 | 0.49 | 237.4 | 57.3 | 23.5 |
Table 6.
Intra-cluster traffic-light classification thresholds (percentile-based) for the three environmental output variables across all six clusters. Color coding denotes the traffic-light sustainability classification: green (≤P33) indicates favorable environmental performance; yellow (P33–P66) indicates moderate performance requiring monitoring; red (>P66) indicates critical performance requiring fiscal intervention.
| Cluster | Environmental Variable | Green (≤P33) | Yellow (P33–P66) | Red (>P66) |
|---|---|---|---|---|
| C0 (n = 159) | Energy Consumption | ≤119.0 | 119.0–157.0 | >157.0 |
| | Carbon Emissions | ≤38.0 | 38.0–49.0 | >49.0 |
| | Waste Generated | ≤31.0 | 31.0–40.0 | >40.0 |
| C1 (n = 182) | Energy Consumption | ≤181.0 | 181.0–238.5 | >238.5 |
| | Carbon Emissions | ≤65.0 | 65.0–73.5 | >73.5 |
| | Waste Generated | ≤39.0 | 39.0–45.0 | >45.0 |
| C2 (n = 155) | Energy Consumption | ≤188.8 | 188.8–247.6 | >247.6 |
| | Carbon Emissions | ≤35.0 | 35.0–44.0 | >44.0 |
| | Waste Generated | ≤28.8 | 28.8–39.0 | >39.0 |
| C3 (n = 143) | Energy Consumption | ≤148.0 | 148.0–210.7 | >210.7 |
| | Carbon Emissions | ≤55.0 | 55.0–67.0 | >67.0 |
| | Waste Generated | ≤14.0 | 14.0–21.0 | >21.0 |
| C4 (n = 193) | Energy Consumption | ≤186.1 | 186.1–246.0 | >246.0 |
| | Carbon Emissions | ≤46.0 | 46.0–59.0 | >59.0 |
| | Waste Generated | ≤20.0 | 20.0–28.0 | >28.0 |
| C5 (n = 168) | Energy Consumption | ≤223.1 | 223.1–262.2 | >262.2 |
| | Carbon Emissions | ≤50.0 | 50.0–64.0 | >64.0 |
| | Waste Generated | ≤18.0 | 18.0–27.0 | >27.0 |
Table 7.
Architecture of the binary ANN classifier.
| Layer | Neurons | Activation Function | Parameters |
|---|---|---|---|
| Input | 7 | — | — |
| Hidden 1 (Dense) | 64 | ReLU | 512 |
| Hidden 2 (Dense) | 32 | ReLU | 2080 |
| Output (Dense) | 1 | Sigmoid | 33 |
| Total trainable parameters | — | — | 2625 |
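
The parameter counts in Table 7 can be checked with a short sketch. It assumes each Dense layer is fully connected with one bias per neuron, which is consistent with the reported figures; the helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


# Layer widths from Table 7: 7 inputs -> 64 -> 32 -> 1 output.
layer_sizes = [7, 64, 32, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [512, 2080, 33]
print(total)      # 2625
```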
Table 8.
Summary of binary ANN classifier performance metrics on the held-out test set (n = 69).
| Metric | Overall | C0 (Good Practices) | C1 (Requires Fiscal Support) |
|---|---|---|---|
| Overall Accuracy | 0.754 | — | — |
| AUC-ROC | 0.774 | — | — |
| Precision | — | 0.714 | 0.794 |
| Recall (Sensitivity) | — | 0.781 | 0.730 |
| F1-Score | — | 0.746 | 0.761 |
| F1-macro | 0.753 | — | — |
| Cohen-κ | 0.508 | — | — |
| Correct predictions (TN for C0, TP for C1) | — | 25 | 27 |
| Misclassifications (FP for C0, FN for C1) | — | 7 | 10 |
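
The summary metrics in Table 8 follow from the four confusion-matrix counts, as the sketch below illustrates. It assumes C1 is the positive class and reads the reported counts as TN = 25, FP = 7, FN = 10, TP = 27; this mapping is an inference, chosen because it reproduces the reported precision and recall. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tn, fp, fn, tp):
    """Derive standard binary classification metrics from confusion-matrix counts."""
    n = tn + fp + fn + tp
    accuracy = (tp + tn) / n
    # Per-class F1 written in count form: 2*correct / (2*correct + all errors).
    f1_pos = 2 * tp / (2 * tp + fp + fn)
    f1_neg = 2 * tn / (2 * tn + fp + fn)
    # Chance agreement for Cohen's kappa: product of actual and predicted marginals.
    p_expected = ((tn + fp) * (tn + fn) + (tp + fn) * (tp + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "precision_c1": tp / (tp + fp),
        "recall_c1": tp / (tp + fn),
        "precision_c0": tn / (tn + fn),
        "recall_c0": tn / (tn + fp),
        "f1_c1": f1_pos,
        "f1_c0": f1_neg,
        "f1_macro": (f1_pos + f1_neg) / 2,
        "cohen_kappa": (accuracy - p_expected) / (1 - p_expected),
    }


# Assumed mapping of the Table 8 counts (C1 = positive class), n = 69.
metrics = binary_metrics(tn=25, fp=7, fn=10, tp=27)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this mapping the derived values agree with the table to three decimals, including accuracy 0.754, F1-macro 0.753, and Cohen-κ 0.508.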
Table 9.
Illustrative Gradio interface outputs for two representative input profiles (C0 and C1 cluster means).
| Case | Reconfiguration Time | Downtime | AI | IoT | PM | Predicted Class | Probability | Energy Signal | Emissions Signal | Waste Signal |
|---|---|---|---|---|---|---|---|---|---|---|
| C0 | 26 min | 11 min | 1 | 1 | 1 | Good Practices | 0.83 | Green | Green | Green |
| C1 | 42 min | 17 min | 0 | 0 | 0 | Requires Support | 0.79 | Yellow | Red | Red |