# Machine Learning to Predict the Adsorption Capacity of Microplastics


## Abstract

(log K_{d}) using two different approximations (based on the number of input variables). The best-selected machine learning models present, in general, correlation coefficients above 0.92 in the query phase, which indicates that these types of models could be used for the rapid estimation of the adsorption of organic contaminants onto microplastics.

## 1. Introduction

(log K_{d}) [17]. Since, according to the authors [17], the adsorption data currently available are limited, it would be useful to have a method to predict the K_{d} values under various conditions. For this reason, the application of quantitative structure–property relationships (QSPR) together with machine learning (ML) techniques is a possible alternative for the determination of these values.

#### 1.1. Random Forest

#### 1.2. Support Vector Machine

#### 1.3. Artificial Neural Networks

(M′_{w}), the n-octanol/water distribution coefficient at a specific pH condition (log D), and other quantum chemical descriptors) obtained from the literature [17]. These computational models will enable the quick prediction of the adsorption capacity of organic pollutants onto these three types of microplastics in aquatic environments.

## 2. Materials and Methods

#### 2.1. Experimental Data Used

(M′_{w}); and (iii) six different quantum chemical descriptors that allow the modeling of the microplastic/water partition coefficients (log K_{d}) for diverse organic compounds and the polyethylene/seawater–freshwater–pure water, polystyrene/seawater, and polypropylene/seawater systems [17]. The quantum chemical descriptors calculated by Li et al. (2020) were: (i) the molecular volume (V′); (ii) the most negative atomic charge (q^{−}); (iii) the most positive atomic charge on the H atom (qH^{+}); (iv) the ratio of average molecular polarizability to molecular volume (π); (v) the covalent basicity (ε_{β}); and (vi) the covalent acidity (ε_{α}).

#### 2.2. Models Implemented

#### 2.2.1. Random Forest Models
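The RF regressors of this section can be illustrated with a short sketch. The synthetic descriptor data, the forest size, and the use of scikit-learn are all assumptions made for illustration, not the authors' actual setup:

```python
# Illustrative sketch only: a random-forest regressor for log Kd.
# The descriptor values and hyperparameters below are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 6, size=(60, 2))          # toy columns standing in for, e.g., log D and M'w
y = 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.1, 60)  # synthetic log Kd values

rf = RandomForestRegressor(n_estimators=500, random_state=0)  # assumed forest size
rf.fit(X[:40], y[:40])                        # training phase
pred = rf.predict(X[40:])                     # query phase
rmse = float(np.sqrt(np.mean((pred - y[40:]) ** 2)))
```

In regression mode, each tree votes a numeric value and the forest averages them, which is the scheme depicted in Figure 1A.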

#### 2.2.2. Support Vector Machine Models

γ between ≈2^{−20} and 2^{8} using 28 steps on a linear or logarithmic scale; and C between ≈2^{−10} and 2^{20} using 30 steps on a linear or logarithmic scale (SVM and SVM_{log}). These values are an extension of the values proposed by Hsu et al. (2016) [63]. In addition to using the database on its real scale, the data were also normalized to the interval [−1,1], first normalizing only the input variables (SVM_{n} and SVM_{n log}) and then normalizing both the input and the output variables (SVM_{n2} and SVM_{n2 log}). The normalization was fitted to the training input data and later applied to the other phases. After the model selection, the output data were de-normalized to allow the real-scale comparison between all the developed models.
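The hyperparameter grid and the [−1,1] normalization described above can be sketched as follows (assuming, for illustration, uniform steps in the exponent and a min–max mapping fitted on the training inputs only):

```python
import numpy as np

# Logarithmic grids: 28 gamma values spanning [2^-20, 2^8] and 30 C values
# spanning [2^-10, 2^20]; uniform spacing in the exponent is an assumption.
gammas = 2.0 ** np.linspace(-20, 8, 28)
Cs = 2.0 ** np.linspace(-10, 20, 30)

def fit_minmax(train):
    """Per-column min/max learned on the TRAINING inputs only."""
    return train.min(axis=0), train.max(axis=0)

def to_unit_interval(x, lo, hi):
    """Map values to [-1, 1] using the training-phase ranges."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def from_unit_interval(x, lo, hi):
    """De-normalize model outputs back to the real scale."""
    return (x + 1.0) / 2.0 * (hi - lo) + lo
```

Fitting the ranges on the training data alone, then reusing them for the validation and query phases, avoids leaking information from the later phases into the normalization.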

#### 2.2.3. Artificial Neural Network Models

(ANN_{lin} and ANN_{log}). In addition, the decay parameter was studied in both true and false modes. The neural net operator used to develop the ANN models scales the values between −1 and 1 [66].
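As a rough analogue of the ANN setup described above, the sketch below scales inputs to [−1, 1] and uses an L2 penalty in place of the decay option; the library choice (scikit-learn) and every hyperparameter value are assumptions for illustration only:

```python
# Illustrative ANN sketch; not the operator or settings used in the paper.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
X = rng.uniform(-2, 6, size=(60, 2))
y = 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.05, 60)  # synthetic log Kd

# Scale inputs to [-1, 1], mirroring the neural net operator's internal scaling;
# the scaler is fitted on the training rows only.
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X[:40])
Xs = scaler.transform(X)

# alpha > 0 plays a role similar to weight decay ("decay = true"); its value is a placeholder.
ann = MLPRegressor(hidden_layer_sizes=(8,), alpha=1e-3, max_iter=5000, random_state=0)
ann.fit(Xs[:40], y[:40])
pred = ann.predict(Xs[40:])
```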

#### 2.2.4. Statistics Used to Analyze the Models
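The adjustment statistics reported throughout the results (RMSE, MAPE, and the correlation coefficient r) follow their standard definitions; a minimal reference implementation, assuming those standard formulas, is:

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error (%); observed values must be non-zero."""
    return 100.0 / len(obs) * sum(abs((o - p) / o) for o, p in zip(obs, pred))

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)
```

Lower RMSE and MAPE, and r close to 1, indicate better adjustment between experimental and predicted log K_{d} values.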

#### 2.2.5. Equipment and Software Used for the Development of the Models

Intel® Core™ i9-10900 at 2.80 GHz with 64 GB of RAM and Windows 10 Pro 21H1, and the second, an AMD Ryzen 7 3700X 8-core at 3.60 GHz with 32 GB of RAM and Windows 11 Pro 21H2.

## 3. Results and Discussion

#### 3.1. ML Models Using Input Variables Type 1

log K_{d}, developed with the same variable combination used by Li et al. (2020) [17].

(ANN_{log}) model (0.236), followed by the support vector machine (SVM_{n2 log}) model (0.248) and, finally, the random forest model (0.380). As can be seen, the three models present very high correlation coefficients for the validation phase, equal to or greater than 0.988; in addition, the mean absolute percentage error remains low, between 4.42% and 7.48%.

ε_{α}, and ε_{β}).

(ANN_{lin}) model, which presents an RMSE of 0.865, followed by the support vector machine (SVM_{n log}) model with a value of 0.770, with the best model being the random forest, which has a root mean square error of 0.744. In this case, it can be seen that the mean absolute percentage errors exceed those obtained by the ML models of the first block, varying between 11.14% and 13.67%.

M′_{w}) and the other one using only one input variable (PE/pure water—2 with log D).

K_{d} value, so this model lacks this case. As expected, the models offer different results depending on the input variables. When two input variables are used, the model that presents the best results for the validation phase is the support vector machine (SVM_{n log}) model, while when only one input variable is used, the best model is the random forest. It can be seen that the use of two input variables improves the adjustments in the training and validation phases (except for the RF model). For the query phase, the adjustments remain practically unchanged, except for the case of the ANN (ANN_{lin}) model, where the error, in terms of RMSE, drops from 0.729 to 0.431. As can be seen, the models developed with two input variables present low mean absolute percentage errors, between 2.06% and 3.92%, for the validation phase. This behavior worsens slightly for the training phase, rising to 4.92% and 5.93% for the ANN and SVM models, respectively, and 11.28% for the RF model. On the other hand, in the case of the query phase, the MAPE values are between 6.90% and 12.21%. Despite the increase in both the RMSE and the MAPE values, these models developed with two variables seem to behave adequately to predict log K_{d}.

(SVM_{n2 log}) model and the ANN (ANN_{lin}) model, which present RMSE values of 0.439 and 0.431, slightly improving on the results of the RF model for this phase.

log K_{d} in seawater, which present errors between 13.24% and 23.33%.

(SVM_{log}) model with an RMSE of 0.244 and, finally, the artificial neural network (ANN_{lin}) model (0.270). The other statistical parameters of the validation phase show favorable behavior, with MAPE values below 9% and correlation coefficients above 0.980. For the training phase, the adjustments are similar to those of the validation phase, although an increase in the MAPE value of the random forest model is observed; even so, it remains below 10%.

ε_{β}). The SVM model presents high generalization errors, which imply that it should not be used for prediction tasks. It should be noted that this SVM model, which has the lowest error for the validation phase among all the SVM models developed, is the one with the highest error for the query phase. Other SVM models with close RMSE values in the validation phase (0.255 and 0.262) subsequently showed better results in the query phase (0.287 and 0.266, respectively).

(SVM_{n2 log}) model (0.524), followed by the artificial neural network (ANN_{lin}) model (0.643) and the random forest model (0.794). Based on the results of the mean absolute percentage error, it can be affirmed that these models, intended to predict the adsorption for PS in seawater, present the worst adjustments for the validation phase, varying between 14.61% and 21.69%. Despite this, the correlation coefficients remain high, with values greater than 0.960, except for the random forest model, whose correlation coefficient falls to 0.883. For the query phase, the values in terms of RMSE remain close, except for the random forest model, with the MAPE values remaining above 15.1%.

#### 3.2. ML Models Using Input Variables Type 2

qH^{+} is not possible).

ε_{α} and ε_{β} were used, in this new PE/seawater model seven input variables were used (log D, M′_{w}, ε_{α}, ε_{β}, q^{−}, V′, π). It can be observed (Table 3), based on the RMSE value for the validation phase, that the best-developed machine learning model is the SVM (SVM_{n log}) model, which has a value of 0.243, followed by the ANN (ANN_{lin}) model (0.306) and the random forest model, which presents the highest RMSE value for this phase (0.373). It is clear that, for this phase, the three selected models present suitable adjustments. In addition, these models also present high values of the correlation coefficient, all greater than 0.990. These promising results are also obtained for the training phase, although the random forest model presents an important increase in RMSE (from 0.373 to 0.824).

M′_{w}, ε_{α}, ε_{β}, qH^{+}, q^{−}, V′, π). In this case, the best model, based on the RMSE value for the validation phase, corresponds to the ANN (ANN_{log}) model (0.446), followed by the SVM (SVM_{n}) model (0.473) and the RF model (0.697). These reasonable adjustments are reflected in the high correlation coefficients, all greater than 0.960. This behavior improves in all statistical parameters for the training phase, except for the mean absolute percentage error of the random forest model. In the case of the query phase, these new models present RMSE values between 0.210 and 0.392, maintaining high correlation coefficients, all higher than 0.980. Comparing the ML models developed using the input variables selection Type 2 with the previously developed models using the input variables selection Type 1, it can be said that the ML models developed with eight variables improve on the models developed with only one variable; the improvement is appreciable in all the parameters except three MAPE values.

M′_{w}, ε_{α}, ε_{β}, qH^{+}, q^{−}, V′, π) instead of the two or the one that were used by Li et al. (2020) and that were also used in the development of the previous ML models (Table 2). In this case, the optimization process carried out by the RF model involved the elimination of the variable V′ in the trees of the forest.

(SVM_{log}) model, which presents a value of 0.154, followed by the RF model (0.204) and the ANN (ANN_{log}) model (0.403). As in the previous models developed using the input variables selection Type 2, the correlation coefficients are high, all greater than 0.930. This good behavior for the validation phase is also observed in the training phase, although a small increase in the errors made by the models can be seen. For the query phase, the different models present RMSE values between 0.433 and 0.551, keeping the MAPE values at approximately 10% and the correlation coefficients greater than 0.920.

M′_{w}, ε_{α}, ε_{β}, q^{−}, V′, π).

(SVM_{log}) model (0.229), followed by the random forest model (0.245) and, finally, the artificial neural network (ANN_{lin}) model, which presents a higher error than the other two models (0.419). The correlation coefficients of the three models are greater than 0.975. This good behavior in the validation phase is also observed in the training phase for both the random forest model and the support vector machine model; however, it should be noted that the artificial neural network model presents an error of 0.029 in the training phase. The three models present RMSEs for the query phase between 0.215 and 0.494, with the support vector machine model offering the best results, as was the case in the validation phase.

M′_{w}, ε_{α}, ε_{β}, q^{−}, V′, π). In these new models, a significant improvement can be seen in the validation and query phase adjustment parameters. In fact, for the validation phase, the RMSE values are between 0.290 and 0.475 for the SVM (SVM_{n2 log}) model and the RF model, respectively, while in the Type 1 models, the RMSE values ranged between 0.524 and 0.794. Similar behavior is observed for the query phase, with RMSE values between 0.385 and 0.873. As can be seen in Table 3, the best model on this occasion is the support vector machine model, which also offers the best adjustment parameters for the query phase (0.385).

The SVM (SVM_{n2 log}) and ANN (ANN_{log}) models developed using seven input variables improve on the model developed by Li et al. (2020) with two input variables (0.385 and 0.407 vs. 0.714, respectively, in terms of RMSE values for the test phase).

log K_{d} for the best machine learning models, according to the RMSE in the validation phase, of each block shown in Table 3.

log K_{d} values.

- Regardless of the input variables chosen, there is always some machine learning model that improves on (in terms of RMSE for the query phase) the good adjustments provided by the models developed by Li et al. (2020).
- Including additional variables to develop the ML models does not always improve on the variable selection carried out by Li et al. (2020). This is especially evident in the ML models intended to predict PE/seawater, where no model developed using the input variables selection Type 2 improves on the Type 1 models. In this sense, it can be said that the selection of variables carried out by Li et al. (2020) is a good and reliable selection for model development.
- Both the models developed by Li et al. (2020) and the models developed in this research could be used to determine the log K_{d} values.
- To the best of the authors' knowledge, increasing the number of experimental cases for each microplastic/water group used to develop the models would be appropriate. Presumably, this increase would help the models achieve better adjustments.

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Lee, H.; Shim, J.E.; Park, I.H.; Choo, K.S.; Yeo, M.-K. Physical and Biomimetic Treatment Methods to Reduce Microplastic Waste Accumulation. Mol. Cell. Toxicol. **2022**, 19, 13–25.
- Ghosh, S.; Greenfeld, I.; Daniel Wagner, H. CNT Coating and Anchoring Beads Enhance Interfacial Adhesion in Fiber Composites. Compos. Part A Appl. Sci. Manuf. **2023**, 167, 107427.
- Ghosh, S.; Mondal, S.; Ganguly, S.; Remanan, S.; Singha, N.; Das, N.C. Carbon Nanostructures Based Mechanically Robust Conducting Cotton Fabric for Improved Electromagnetic Interference Shielding. Fibers Polym. **2018**, 19, 1064–1073.
- Jaiswal, K.K.; Dutta, S.; Banerjee, I.; Pohrmen, C.B.; Singh, R.K.; Das, H.T.; Dubey, S.; Kumar, V. Impact of Aquatic Microplastics and Nanoplastics Pollution on Ecological Systems and Sustainable Remediation Strategies of Biodegradation and Photodegradation. Sci. Total Environ. **2022**, 806, 151358.
- Singh, S.; Kumar Naik, T.S.S.; Anil, A.G.; Dhiman, J.; Kumar, V.; Dhanjal, D.S.; Aguilar-Marcelino, L.; Singh, J.; Ramamurthy, P.C. Micro (Nano) Plastics in Wastewater: A Critical Review on Toxicity Risk Assessment, Behaviour, Environmental Impact and Challenges. Chemosphere **2022**, 290, 133169.
- Ng, E.-L.; Huerta Lwanga, E.; Eldridge, S.M.; Johnston, P.; Hu, H.-W.; Geissen, V.; Chen, D. An Overview of Microplastic and Nanoplastic Pollution in Agroecosystems. Sci. Total Environ. **2018**, 627, 1377–1388.
- Vivekanand, A.C.; Mohapatra, S.; Tyagi, V.K. Microplastics in Aquatic Environment: Challenges and Perspectives. Chemosphere **2021**, 282, 131151.
- Matthews, S.; Mai, L.; Jeong, C.-B.; Lee, J.-S.; Zeng, E.Y.; Xu, E.G. Key Mechanisms of Micro- and Nanoplastic (MNP) Toxicity across Taxonomic Groups. Comp. Biochem. Physiol. Part C Toxicol. Pharmacol. **2021**, 247, 109056.
- Woods, J.S.; Verones, F.; Jolliet, O.; Vázquez-Rowe, I.; Boulay, A.-M. A Framework for the Assessment of Marine Litter Impacts in Life Cycle Impact Assessment. Ecol. Indic. **2021**, 129, 107918.
- Peano, L.; Kounina, A.; Magaud, V.; Chalumeau, S.; Zgola, M.; Boucher, J. Plastic Leak Project, Methodological Guidelines. 2020. Available online: https://quantis.com/report/the-plastic-leak-project-guidelines/ (accessed on 13 February 2023).
- Ramachandraiah, K.; Ameer, K.; Jiang, G.; Hong, G.-P. Micro- and Nanoplastic Contamination in Livestock Production: Entry Pathways, Potential Effects and Analytical Challenges. Sci. Total Environ. **2022**, 844, 157234.
- Abihssira-García, I.S.; Kögel, T.; Gomiero, A.; Kristensen, T.; Krogstad, M.; Olsvik, P.A. Distinct Polymer-Dependent Sorption of Persistent Pollutants Associated with Atlantic Salmon Farming to Microplastics. Mar. Pollut. Bull. **2022**, 180, 113794.
- Gouin, T. Addressing the Importance of Microplastic Particles as Vectors for Long-Range Transport of Chemical Contaminants: Perspective in Relation to Prioritizing Research and Regulatory Actions. Microplastics Nanoplastics **2021**, 1, 14.
- Ali, I.; Tan, X.; Li, J.; Peng, C.; Naz, I.; Duan, Z.; Ruan, Y. Interaction of Microplastics and Nanoplastics with Natural Organic Matter (NOM) and the Impact of NOM on the Sorption Behavior of Anthropogenic Contaminants—A Critical Review. J. Clean. Prod. **2022**, 376, 134314.
- Katsumiti, A.; Losada-Carrillo, M.P.; Barros, M.; Cajaraville, M.P. Polystyrene Nanoplastics and Microplastics Can Act as Trojan Horse Carriers of Benzo(a)Pyrene to Mussel Hemocytes In Vitro. Sci. Rep. **2021**, 11, 22396.
- Hu, L.; Zhao, Y.; Xu, H. Trojan Horse in the Intestine: A Review on the Biotoxicity of Microplastics Combined Environmental Contaminants. J. Hazard. Mater. **2022**, 439, 129652.
- Li, M.; Yu, H.; Wang, Y.; Li, J.; Ma, G.; Wei, X. QSPR Models for Predicting the Adsorption Capacity for Microplastics of Polyethylene, Polypropylene and Polystyrene. Sci. Rep. **2020**, 10, 14597.
- Kathuria, C.; Mehrotra, D.; Misra, N.K. A Novel Random Forest Approach to Predict Phase Transition. Int. J. Syst. Assur. Eng. Manag. **2022**, 13, 494–503.
- Varnek, A.; Baskin, I. Machine Learning Methods for Property Prediction in Chemoinformatics: Quo Vadis? J. Chem. Inf. Model. **2012**, 52, 1413–1437.
- Alduailij, M.; Khan, Q.W.; Tahir, M.; Sardaraz, M.; Alduailij, M.; Malik, F. Machine-Learning-Based DDoS Attack Detection Using Mutual Information and Random Forest Feature Importance Method. Symmetry **2022**, 14, 1095.
- Taoufik, N.; Boumya, W.; Achak, M.; Chennouk, H.; Dewil, R.; Barka, N. The State of Art on the Prediction of Efficiency and Modeling of the Processes of Pollutants Removal Based on Machine Learning. Sci. Total Environ. **2022**, 807, 150554.
- Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
- He, S.; Wu, J.; Wang, D.; He, X. Predictive Modeling of Groundwater Nitrate Pollution and Evaluating Its Main Impact Factors Using Random Forest. Chemosphere **2022**, 290, 133388.
- Saglam, C.; Cetin, N. Prediction of Pistachio (Pistacia vera L.) Mass Based on Shape and Size Attributes by Using Machine Learning Algorithms. Food Anal. Methods **2022**, 15, 739–750.
- Kang, B.; Seok, C.; Lee, J. Prediction of Molecular Electronic Transitions Using Random Forests. J. Chem. Inf. Model. **2020**, 60, 5984–5994.
- Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. **2003**, 43, 1947–1958.
- Bienefeld, C.; Kirchner, E.; Vogt, A.; Kacmar, M. On the Importance of Temporal Information for Remaining Useful Life Prediction of Rolling Bearings Using a Random Forest Regressor. Lubricants **2022**, 10, 67.
- Pang, A.; Chang, M.W.L.; Chen, Y. Evaluation of Random Forests (RF) for Regional and Local-Scale Wheat Yield Prediction in Southeast Australia. Sensors **2022**, 22, 717.
- Geppert, H.; Vogt, M.; Bajorath, J. Current Trends in Ligand-Based Virtual Screening: Molecular Representations, Data Mining Methods, New Application Areas, and Performance Evaluation. J. Chem. Inf. Model. **2010**, 50, 205–216.
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. **1995**, 20, 273–297.
- Rodríguez-Pérez, R.; Vogt, M.; Bajorath, J. Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction. ACS Omega **2017**, 2, 6371–6379.
- Liu, G.; Zhu, H. Displacement Estimation of Six-Pole Hybrid Magnetic Bearing Using Modified Particle Swarm Optimization Support Vector Machine. Energies **2022**, 15, 1610.
- Houssein, E.H.; Hosney, M.E.; Oliva, D. A Hybrid Seagull Optimization Algorithm for Chemical Descriptors Classification. In Proceedings of the 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 26–27 May 2021; pp. 381–386.
- Sareminia, S. A Support Vector Based Hybrid Forecasting Model for Chaotic Time Series: Spare Part Consumption Prediction. Neural Process. Lett. **2022**, 1–17.
- Orgeira-Crespo, P.; Míguez-Álvarez, C.; Cuevas-Alonso, M.; Doval-Ruiz, M.I. Decision Algorithm for the Automatic Determination of the Use of Non-Inclusive Terms in Academic Texts. Publications **2020**, 8, 41.
- Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support Vector Regression Machines. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 1–6 December 1997; pp. 155–161.
- Smola, A.J.; Schölkopf, B. A Tutorial on Support Vector Regression. Stat. Comput. **2004**, 14, 199–222.
- Prasanna, T.H.; Shantha, M.; Pradeep, A.; Mohanan, P. Identification of Polar Liquids Using Support Vector Machine Based Classification Model. IAES Int. J. Artif. Intell. **2022**, 11, 1507–1516.
- Liu, Z.; He, H.; Xie, J.; Wang, K.; Huang, W. Self-Discharge Prediction Method for Lithium-Ion Batteries Based on Improved Support Vector Machine. J. Energy Storage **2022**, 55, 105571.
- Elkorany, A.S.; Marey, M.; Almustafa, K.M.; Elsharkawy, Z.F. Breast Cancer Diagnosis Using Support Vector Machines Optimized by Whale Optimization and Dragonfly Algorithms. IEEE Access **2022**, 10, 69688–69699.
- Niazkar, H.R.; Niazkar, M. Application of Artificial Neural Networks to Predict the COVID-19 Outbreak. Glob. Health Res. Policy **2020**, 5, 50.
- Paturi, U.M.R.; Cheruku, S.; Reddy, N.S. The Role of Artificial Neural Networks in Prediction of Mechanical and Tribological Properties of Composites—A Comprehensive Review. Arch. Comput. Methods Eng. **2022**, 29, 3109–3149.
- McCulloch, W.S.; Pitts, W. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bull. Math. Biophys. **1943**, 5, 115–133.
- Khan, M.T.; Kaushik, A.C.; Ji, L.; Malik, S.I.; Ali, S.; Wei, D.-Q. Artificial Neural Networks for Prediction of Tuberculosis Disease. Front. Microbiol. **2019**, 10, 395.
- Mohamed, Z.E. Using the Artificial Neural Networks for Prediction and Validating Solar Radiation. J. Egypt. Math. Soc. **2019**, 27, 47.
- Dikshit, A.; Pradhan, B.; Santosh, M. Artificial Neural Networks in Drought Prediction in the 21st Century—A Scientometric Analysis. Appl. Soft Comput. **2022**, 114, 108080.
- Saikia, P.; Baruah, R.D.; Singh, S.K.; Chaudhuri, P.K. Artificial Neural Networks in the Domain of Reservoir Characterization: A Review from Shallow to Deep Models. Comput. Geosci. **2020**, 135, 104357.
- Hornik, K.; Stinchcombe, M.; White, H. Multilayer Feedforward Networks Are Universal Approximators. Neural Netw. **1989**, 2, 359–366.
- Shin-ike, K. A Two Phase Method for Determining the Number of Neurons in the Hidden Layer of a 3-Layer Neural Network. In Proceedings of SICE Annual Conference 2010, Taipei, Taiwan, 18–21 August 2010; pp. 238–242.
- Ujong, J.A.; Mbadike, E.M.; Alaneme, G.U. Prediction of Cost and Duration of Building Construction Using Artificial Neural Network. Asian J. Civ. Eng. **2022**, 23, 1117–1139.
- Salari, K.; Zarafshan, P.; Khashehchi, M.; Pipelzadeh, E.; Chegini, G. Modeling and Predicting of Water Production by Capacitive Deionization Method Using Artificial Neural Networks. Desalination **2022**, 540, 115992.
- Shi, C.-F.; Yang, H.-T.; Chen, T.-T.; Guo, L.-P.; Leng, X.-Y.; Deng, P.-B.; Bi, J.; Pan, J.-G.; Wang, Y.-M. Artificial Neural Network-Genetic Algorithm-Based Optimization of Aerobic Composting Process Parameters of Ganoderma Lucidum Residue. Bioresour. Technol. **2022**, 357, 127248.
- Hufnagl, B.; Steiner, D.; Renner, E.; Löder, M.G.J.; Laforsch, C.; Lohninger, H. A Methodology for the Fast Identification and Monitoring of Microplastics in Environmental Samples Using Random Decision Forest Classifiers. Anal. Methods **2019**, 11, 2277–2285.
- Hufnagl, B.; Stibi, M.; Martirosyan, H.; Wilczek, U.; Möller, J.N.; Löder, M.G.J.; Laforsch, C.; Lohninger, H. Computer-Assisted Analysis of Microplastics in Environmental Samples Based on μFTIR Imaging in Combination with Machine Learning. Environ. Sci. Technol. Lett. **2022**, 9, 90–95.
- Yang, J.; Gong, J.; Tang, W.; Shen, Y.; Liu, C.; Gao, J. Delineation of Urban Growth Boundaries Using a Patch-Based Cellular Automata Model under Multiple Spatial and Socio-Economic Scenarios. Sustainability **2019**, 11, 6159.
- Sarraf Shirazi, A.; Frigaard, I. SlurryNet: Predicting Critical Velocities and Frictional Pressure Drops in Oilfield Suspension Flows. Energies **2021**, 14, 1263.
- Moldes, Ó.A.; Morales, J.; Cid, A.; Astray, G.; Montoya, I.A.; Mejuto, J.C. Electrical Percolation of AOT-Based Microemulsions with n-Alcohols. J. Mol. Liq. **2016**, 215, 18–23.
- Zou, Z.-M.; Chang, D.-H.; Liu, H.; Xiao, Y.-D. Current Updates in Machine Learning in the Prediction of Therapeutic Outcome of Hepatocellular Carcinoma: What Should We Know? Insights Imaging **2021**, 12, 31.
- Yan, X.; Cao, Z.; Murphy, A.; Qiao, Y. An Ensemble Machine Learning Method for Microplastics Identification with FTIR Spectrum. J. Environ. Chem. Eng. **2022**, 10, 108130.
- Bifano, L.; Meiler, V.; Peter, R.; Fischerauer, G. Detection of Microplastics in Water Using Electrical Impedance Spectroscopy and Support Vector Machines. In Proceedings of the Sensors and Measuring Systems; 21st ITG/GMA-Symposium, Nuremberg, Germany, 10–11 May 2022; pp. 356–359.
- Chang, C.-C.; Lin, C.-J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. **2011**, 2, 27.
- Chang, C.C.; Lin, C.J. LIBSVM—A Library for Support Vector Machines. Available online: https://www.csie.ntu.edu.tw/~cjlin/libsvm/ (accessed on 17 October 2022).
- Hsu, C.-W.; Chang, C.-C.; Lin, C.-J. A Practical Guide to Support Vector Classification. Available online: https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf (accessed on 17 October 2022).
- Ng, W.; Minasny, B.; McBratney, A. Convolutional Neural Network for Soil Microplastic Contamination Screening Using Infrared Spectroscopy. Sci. Total Environ. **2020**, 702, 134723.
- Guo, X.; Wang, J. Projecting the Sorption Capacity of Heavy Metal Ions onto Microplastics in Global Aquatic Environments Using Artificial Neural Networks. J. Hazard. Mater. **2021**, 402, 123709.
- RapidMiner Documentation. Neural Net. Available online: https://docs.rapidminer.com/latest/studio/operators/modeling/predictive/neural_nets/neural_net.html (accessed on 17 October 2022).

**Figure 1.** Schemes of the different ML models developed in this research: (**A**) RF model (in regression mode), inspired by the figure of Yang et al. (2019) [55]; (**B**) SVM model, inspired by the figure of Sarraf Shirazi and Frigaard (2021) [56]; and (**C**) ANN model, inspired by the figures of Moldes et al. (2016) and Zou et al. (2021) [57,58].

**Figure 2.** Scatter plots of the experimental and predicted values of log K_{d} for the selected ML models developed using the input variables selection Type 2. The dashed line corresponds to the line with slope 1.

**Table 1.** Input variables (marked in purple) used according to the input variable selection to predict log K_{d}. Type 1 and Type 1 * are the configurations used by Li et al. (2020) [17], and Type 2 is the configuration used in this research. Polyethylene (PE), polystyrene (PS), and polypropylene (PP); the eight variables reported by Li et al. (2020) [17] are: (i) the n-octanol/water distribution coefficient at a specific pH condition (log D); (ii) the molecular mass (M′_{w}); (iii) the covalent acidity (ε_{α}); (iv) the covalent basicity (ε_{β}); (v) the most positive atomic charge on the H atom (qH^{+}); (vi) the most negative atomic charge (q^{−}); (vii) the molecular volume (V′); and (viii) the ratio of average molecular polarizability to molecular volume (π).

Model | Input Variables | Model | Input Variables | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

log D | M’_{w} | ε_{α} | ε_{β} | qH^{+} | q^{−} | V′ | π | log D | M’_{w} | ε_{α} | ε_{β} | qH^{+} | q^{−} | V′ | π | ||

PE/seawater | PP/seawater | ||||||||||||||||

Type 1 | Type 1 | ||||||||||||||||

Type 2 | Type 2 | ||||||||||||||||

PE/freshwater | PS/seawater | ||||||||||||||||

Type 1 | Type 1 | ||||||||||||||||

Type 2 | Type 2 | ||||||||||||||||

PE/pure water | |||||||||||||||||

Type 1 | |||||||||||||||||

Type 1 * | |||||||||||||||||

Type 2 |

**Table 2.**Adjustments for the different machine learning models developed using the input variables selection Type 1. RMSE is the root mean square error; MAPE corresponds to the mean absolute percentage error; and r is the correlation coefficient. RF is the random forest model; SVM is the support vector machine model; and ANN corresponds to the artificial neural network model. T, V, and Q are the training, validation, and query phases, respectively. The best models (regarding RMSE for the validation phase) are in bold.

T | V | Q | |||||||
---|---|---|---|---|---|---|---|---|---|

Model | RMSE | MAPE | r | RMSE | MAPE | r | RMSE | MAPE | r |

PE/seawater | |||||||||

RF | 0.525 | 18.67 | 0.983 | 0.380 | 7.48 | 0.988 | 0.523 | 13.38 | 0.979 |

SVM | 0.287 | 2.83 | 0.993 | 0.248 | 4.61 | 0.993 | 0.357 | 13.24 | 0.990 |

ANN | 0.257 | 3.13 | 0.994 | 0.236 | 4.42 | 0.994 | 0.561 | 23.33 | 0.979 |

PE/freshwater | |||||||||

RF | 0.549 | 8.08 | 0.973 | 0.744 | 13.67 | 0.944 | 0.565 | 7.23 | 0.963 |

SVM | 0.536 | 8.93 | 0.976 | 0.770 | 11.14 | 0.945 | 0.475 | 10.46 | 0.978 |

ANN | 0.489 | 6.79 | 0.978 | 0.865 | 13.20 | 0.932 | 0.464 | 8.59 | 0.974 |

PE/pure water—1 | |||||||||

RF | 0.471 | 11.28 | 0.968 | 0.176 | 3.31 | 0.992 | 0.531 | 9.48 | 0.929 |

SVM | 0.356 | 5.93 | 0.974 | 0.132 | 2.06 | 0.993 | 0.411 | 6.90 | 0.958 |

ANN | 0.309 | 4.92 | 0.981 | 0.225 | 3.92 | 0.982 | 0.729 | 12.21 | 0.937 |

PE/pure water—2 | |||||||||

RF | 0.410 | 7.79 | 0.967 | 0.132 | 2.25 | 0.993 | 0.526 | 8.59 | 0.936 |

SVM | 0.466 | 9.51 | 0.955 | 0.205 | 3.47 | 0.983 | 0.439 | 8.10 | 0.953 |

ANN | 0.409 | 6.45 | 0.965 | 0.231 | 4.23 | 0.981 | 0.431 | 7.72 | 0.955 |

PP/seawater | |||||||||

RF | 0.255 | 9.95 | 0.990 | 0.199 | 6.69 | 0.994 | 0.298 | 4.97 | 0.968 |

SVM | 0.260 | 5.12 | 0.989 | 0.244 | 6.92 | 0.988 | 0.779 | 7.32 | 0.817 |

ANN | 0.160 | 3.19 | 0.996 | 0.270 | 8.94 | 0.988 | 0.307 | 4.21 | 0.956 |

PS/seawater | |||||||||

RF | 0.221 | 5.28 | 0.996 | 0.794 | 14.61 | 0.883 | 1.003 | 15.11 | 0.820 |

SVM | 0.554 | 23.10 | 0.969 | 0.524 | 21.69 | 0.965 | 0.436 | 12.85 | 0.988 |

ANN | 0.337 | 9.21 | 0.988 | 0.643 | 15.69 | 0.972 | 0.773 | 15.07 | 0.956 |

**Table 3.**Adjustments for the different machine learning models developed using the input variables selection Type 2. RMSE is the root mean square error, MAPE corresponds to the mean absolute percentage error, and r is the correlation coefficient. RF is the random forest model, SVM is the support vector machine model, and ANN corresponds to the artificial neural network model. T, V, and Q are the training, validation, and query phases, respectively. The best models (regarding RMSE for the validation phase) are in bold.

T | V | Q | |||||||
---|---|---|---|---|---|---|---|---|---|

Model | RMSE | MAPE | r | RMSE | MAPE | r | RMSE | MAPE | r |

PE/seawater | |||||||||

RF | 0.824 | 38.89 | 0.954 | 0.373 | 7.69 | 0.988 | 0.693 | 26.80 | 0.970 |

SVM | 0.336 | 5.52 | 0.991 | 0.243 | 5.22 | 0.994 | 0.443 | 16.38 | 0.984 |

ANN | 0.040 | 0.56 | 1.000 | 0.306 | 5.46 | 0.989 | 0.762 | 15.28 | 0.946 |

PE/freshwater | |||||||||

RF | 0.424 | 16.78 | 0.991 | 0.697 | 8.78 | 0.962 | 0.392 | 11.86 | 0.986 |

SVM | 0.320 | 6.87 | 0.991 | 0.473 | 7.05 | 0.990 | 0.210 | 8.18 | 0.999 |

ANN | 0.289 | 4.94 | 0.992 | 0.446 | 7.10 | 0.991 | 0.272 | 10.40 | 0.997 |

PE/pure water | |||||||||

RF | 0.473 | 10.77 | 0.955 | 0.204 | 3.31 | 0.983 | 0.542 | 10.37 | 0.929 |

SVM | 0.306 | 5.34 | 0.981 | 0.154 | 2.56 | 0.990 | 0.433 | 7.25 | 0.956 |

ANN | 0.634 | 14.70 | 0.916 | 0.403 | 7.90 | 0.937 | 0.551 | 11.57 | 0.926 |

PP/seawater | |||||||||

RF | 0.295 | 6.44 | 0.988 | 0.245 | 9.42 | 0.994 | 0.215 | 3.36 | 0.983 |

SVM | 0.222 | 4.74 | 0.992 | 0.229 | 6.98 | 0.990 | 0.240 | 3.66 | 0.974 |

ANN | 0.029 | 0.54 | 1.000 | 0.419 | 12.20 | 0.979 | 0.494 | 8.20 | 0.938 |

PS/seawater | |||||||||

RF | 0.486 | 11.07 | 0.980 | 0.475 | 15.16 | 0.970 | 0.873 | 23.01 | 0.882 |

SVM | 0.248 | 4.72 | 0.994 | 0.290 | 8.50 | 0.986 | 0.385 | 12.05 | 0.976 |

ANN | 0.309 | 7.01 | 0.990 | 0.445 | 9.74 | 0.984 | 0.407 | 12.43 | 0.973 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Astray, G.; Soria-Lopez, A.; Barreiro, E.; Mejuto, J.C.; Cid-Samamed, A.
Machine Learning to Predict the Adsorption Capacity of Microplastics. *Nanomaterials* **2023**, *13*, 1061.
https://doi.org/10.3390/nano13061061

**AMA Style**

Astray G, Soria-Lopez A, Barreiro E, Mejuto JC, Cid-Samamed A.
Machine Learning to Predict the Adsorption Capacity of Microplastics. *Nanomaterials*. 2023; 13(6):1061.
https://doi.org/10.3390/nano13061061

**Chicago/Turabian Style**

Astray, Gonzalo, Anton Soria-Lopez, Enrique Barreiro, Juan Carlos Mejuto, and Antonio Cid-Samamed.
2023. "Machine Learning to Predict the Adsorption Capacity of Microplastics" *Nanomaterials* 13, no. 6: 1061.
https://doi.org/10.3390/nano13061061