Unraveling the Microbiome–Environmental Change Nexus to Contribute to a More Sustainable World: A Comprehensive Review of Artificial Intelligence Approaches
Abstract
1. Introduction
- What is the current state of the art in AI-driven prediction of environmental microbiome dynamics?
- How do various ML/DL algorithms compare in their ability to capture complex microbiome–environment interactions?
- To what extent can microbiome changes be accurately predicted using environmental parameters across different ecosystems?
- How might AI-based microbiome predictions contribute to a more sustainable world?
- What are the key methodological challenges and technical limitations affecting prediction accuracy?
- How effectively do current models capture the bidirectional relationship between microbiome alterations and environmental changes?
2. Materials and Methods
Document Search
3. Results
3.1. Environmental Impacts on Microbiome
3.2. High-Throughput Sequencing
3.3. Machine and Deep Learning Applications
3.4. Literature Search Results
Analysis of the Results
4. Discussion
4.1. Critical Analysis of the Selected Papers
4.2. Limitations
5. Conclusions
6. Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Song, D.; Huo, T.; Zhang, Z.; Cheng, L.; Wang, L.; Ming, K.; Liu, H.; Li, M.; Du, X. Metagenomic Analysis Reveals the Response of Microbial Communities and Their Functions in Lake Sediment to Environmental Factors. Int. J. Environ. Res. Public Health 2022, 19, 16870.
- Carles, L.; Wullschleger, S.; Joss, A.; Eggen, R.I.; Schirmer, K.; Schuwirth, N.; Stamm, C.; Tlili, A. Wastewater microorganisms impact microbial diversity and important ecological functions of stream periphyton. Water Res. 2022, 225, 119119.
- Shade, A.; Peter, H.; Allison, S.D.; Baho, D.L.; Berga, M.; Bürgmann, H.; Huber, D.H.; Langenheder, S.; Lennon, J.T.; Martiny, J.B.H.; et al. Fundamentals of Microbial Community Resistance and Resilience. Front. Microbiol. 2012, 3, 417.
- Philippot, L.; Raaijmakers, J.M.; Lemanceau, P.; van der Putten, W.H. Going back to the roots: The microbial ecology of the rhizosphere. Nat. Rev. Microbiol. 2013, 11, 789–799.
- Rittmann, B.E. Biofilms, active substrata, and me. Water Res. 2018, 132, 135–145.
- Urbanek, A.K.; Rymowicz, W.; Mirończuk, A.M. Degradation of plastics and plastic-degrading bacteria in cold marine habitats. Appl. Microbiol. Biotechnol. 2018, 102, 7669–7678.
- Cavicchioli, R.; Ripple, W.J.; Timmis, K.N.; Azam, F.; Bakken, L.R.; Baylis, M.; Behrenfeld, M.J.; Boetius, A.; Boyd, P.W.; Classen, A.T.; et al. Scientists’ warning to humanity: Microorganisms and climate change. Nat. Rev. Microbiol. 2019, 17, 569–586.
- Jansson, J.K.; Hofmockel, K.S. Soil microbiomes and climate change. Nat. Rev. Microbiol. 2019, 18, 35–46.
- Bøifot, K.O.; Gohli, J.; Moen, L.V.; Dybwad, M. Performance evaluation of a new custom, multi-component DNA isolation method optimized for use in shotgun metagenomic sequencing-based aerosol microbiome research. Environ. Microbiome 2020, 15, 1.
- McElhinney, J.M.W.R.; Catacutan, M.K.; Mawart, A.; Hasan, A.; Dias, J. Interfacing Machine Learning and Microbial Omics: A Promising Means to Address Environmental Challenges. Front. Microbiol. 2022, 13, 851450.
- Ghannam, R.B.; Techtmann, S.M. Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring. Comput. Struct. Biotechnol. J. 2021, 19, 1092–1107.
- Rodrigues, P.M.; Madeiro, J.P.; Marques, J.A.L. Enhancing Health and Public Health through Machine Learning: Decision Support for Smarter Choices. Bioengineering 2023, 10, 792.
- Lo, C.; Marculescu, R. MetaNN: Accurate classification of host phenotypes from metagenomic data using neural networks. BMC Bioinform. 2019, 20, 314.
- Shi, Y.; Zhang, L.; Peterson, C.B.; Do, K.A.; Jenq, R.R. Performance determinants of unsupervised clustering methods for microbiome data. Microbiome 2022, 10, 25.
- Papoutsoglou, G.; Tarazona, S.; Lopes, M.B.; Klammsteiner, T.; Ibrahimi, E.; Eckenberger, J.; Novielli, P.; Tonda, A.; Simeon, A.; Shigdel, R.; et al. Machine learning approaches in microbiome research: Challenges and best practices. Front. Microbiol. 2023, 14, 1261889.
- Johnson, C.N.; Balmford, A.; Brook, B.W.; Buettel, J.C.; Galetti, M.; Guangchun, L.; Wilmshurst, J.M. Biodiversity losses and conservation responses in the Anthropocene. Science 2017, 356, 270–275.
- Pecl, G.T.; Araújo, M.B.; Bell, J.D.; Blanchard, J.; Bonebrake, T.C.; Chen, I.C.; Clark, T.D.; Colwell, R.K.; Danielsen, F.; Evengård, B.; et al. Biodiversity redistribution under climate change: Impacts on ecosystems and human well-being. Science 2017, 355, eaai9214.
- Barnosky, A.D.; Matzke, N.; Tomiya, S.; Wogan, G.O.; Swartz, B.; Quental, T.B.; Marshall, C.; McGuire, J.L.; Lindsey, E.L.; Maguire, K.C.; et al. Has the Earth’s sixth mass extinction already arrived? Nature 2011, 471, 51–57.
- Ripple, W.J.; Wolf, C.; Newsome, T.M.; Galetti, M.; Alamgir, M.; Crist, E.; Mahmoud, M.I.; Laurance, W.F.; 15,364 Scientist Signatories from 184 Countries. World scientists’ warning to humanity: A second notice. BioScience 2017, 67, 1026–1028.
- Bellard, C.; Bertelsmeier, C.; Leadley, P.; Thuiller, W.; Courchamp, F. Impacts of climate change on the future of biodiversity. Ecol. Lett. 2012, 15, 365–377.
- Crist, E.; Mora, C.; Engelman, R. The interaction of human population, food production, and biodiversity protection. Science 2017, 356, 260–264.
- Panthee, B.; Gyawali, S.; Panthee, P.; Techato, K. Environmental and human microbiome for health. Life 2022, 12, 456.
- Staff, A. Microbes and Climate Change—Science, People & Impacts; Technical Report; American Society for Microbiology: Washington, DC, USA, 2022.
- Staff, A. FAQ: Microbes and Climate Change; Technical Report; American Society for Microbiology: Washington, DC, USA, 2017. Available online: https://www.ncbi.nlm.nih.gov/books/NBK513763/ (accessed on 6 August 2025).
- Flandroy, L.; Poutahidis, T.; Berg, G.; Clarke, G.; Dao, M.C.; Decaestecker, E.; Furman, E.; Haahtela, T.; Massart, S.; Plovier, H.; et al. The impact of human activities and lifestyles on the interlinked microbiota and health of humans and of ecosystems. Sci. Total Environ. 2018, 627, 1018–1038.
- Tiedje, J.M.; Bruns, M.A.; Casadevall, A.; Criddle, C.S.; Eloe-Fadrosh, E.; Karl, D.M.; Nguyen, N.K.; Zhou, J. Microbes and Climate Change: A Research Prospectus for the Future. mBio 2022, 13, e00800-22.
- Gandolfi, I.; Bertolini, V.; Bestetti, G.; Ambrosini, R.; Innocente, E.; Rampazzo, G.; Papacchini, M.; Franzetti, A. Spatio-temporal variability of airborne bacterial communities and their correlation with particulate matter chemical composition across two urban areas. Appl. Microbiol. Biotechnol. 2015, 99, 4867–4877.
- Uetake, J.; Tobo, Y.; Uji, Y.; Hill, T.C.; DeMott, P.J.; Kreidenweis, S.M.; Misumi, R. Seasonal changes of airborne bacterial communities over Tokyo and influence of local meteorology. Front. Microbiol. 2019, 10, 1572.
- Zhen, Q.; Deng, Y.; Wang, Y.; Wang, X.; Zhang, H.; Sun, X.; Ouyang, Z. Meteorological factors had more impact on airborne bacterial communities than air pollutants. Sci. Total Environ. 2017, 601, 703–712.
- Bertolini, V.; Gandolfi, I.; Ambrosini, R.; Bestetti, G.; Innocente, E.; Rampazzo, G.; Franzetti, A. Temporal variability and effect of environmental variables on airborne bacterial communities in an urban area of Northern Italy. Appl. Microbiol. Biotechnol. 2013, 97, 6561–6570.
- Liu, H.; Zhang, X.; Zhang, H.; Yao, X.; Zhou, M.; Wang, J.; He, Z.; Zhang, H.; Lou, L.; Mao, W.; et al. Effect of air pollution on the total bacteria and pathogenic bacteria in different sizes of particulate matter. Environ. Pollut. 2018, 233, 483–493.
- Retter, A.; Karwautz, C.; Griebler, C. Groundwater Microbial Communities in Times of Climate Change. Curr. Issues Mol. Biol. 2021, 41, 509–538.
- Danczak, R.E.; Johnston, M.D.; Kenah, C.; Slattery, M.; Wilkins, M.J. Microbial community cohesion mediates community turnover in unperturbed aquifers. mSystems 2018, 3, e00066-18.
- Šantl-Temkiv, T.; Gosewinkel, U.; Starnawski, P.; Lever, M.; Finster, K. Aeolian dispersal of bacteria in southwest Greenland: Their sources, abundance, diversity and physiological states. FEMS Microbiol. Ecol. 2018, 94, fiy031.
- Li, H.; Yang, Q.; Li, J.; Gao, H.; Li, P.; Zhou, H. The impact of temperature on microbial diversity and AOA activity in the Tengchong Geothermal Field, China. Sci. Rep. 2015, 5, 17056.
- Zhong, S.; Zhang, L.; Jiang, X.; Gao, P. Comparison of chemical composition and airborne bacterial community structure in PM2.5 during haze and non-haze days in the winter in Guilin, China. Sci. Total Environ. 2019, 655, 202–210.
- Sogin, M.L.; Morrison, H.G.; Huber, J.A.; Welch, D.M.; Huse, S.M.; Neal, P.R.; Arrieta, J.M.; Herndl, G.J. Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proc. Natl. Acad. Sci. USA 2006, 103, 12115–12120.
- Caporaso, J.G.; Lauber, C.L.; Walters, W.A.; Berg-Lyons, D.; Huntley, J.; Fierer, N.; Owens, S.M.; Betley, J.; Fraser, L.; Bauer, M.; et al. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 2012, 6, 1621–1624.
- Caracciolo, A.B.; Topp, E.; Grenni, P. Pharmaceuticals in the environment: Biodegradation and effects on natural microbial communities. A review. J. Pharm. Biomed. Anal. 2015, 106, 25–36.
- Caruso, G.; Azzaro, M.; Caroppo, C.; Decembrini, F.; Monticelli, L.S.; Leonardi, M.; Maimone, G.; Zaccone, R.; Ferla, R.L. Microbial community and its potential as descriptor of environmental status. ICES J. Mar. Sci. 2016, 73, 2174–2177.
- Kress, W.J.; García-Robledo, C.; Uriarte, M.; Erickson, D.L. DNA barcodes for ecology, evolution, and conservation. Trends Ecol. Evol. 2015, 30, 25–35.
- Valentini, A.; Pompanon, F.; Taberlet, P. DNA barcoding for ecologists. Trends Ecol. Evol. 2009, 24, 110–117.
- Johnson, J.S.; Spakowicz, D.J.; Hong, B.Y.; Petersen, L.M.; Demkowicz, P.; Chen, L.; Leopold, S.R.; Hanson, B.M.; Agresta, H.O.; Gerstein, M.; et al. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat. Commun. 2019, 10, 5029.
- Badotti, F.; de Oliveira, F.S.; Garcia, C.F.; Vaz, A.B.M.; Fonseca, P.L.C.; Nahum, L.A.; Oliveira, G.; Góes-Neto, A. Effectiveness of ITS and sub-regions as DNA barcode markers for the identification of Basidiomycota (Fungi). BMC Microbiol. 2017, 17, 42.
- Wagner, B.D.; Grunwald, G.K.; Zerbe, G.O.; Mikulich-Gilbertson, S.K.; Robertson, C.E.; Zemanick, E.T.; Harris, J.K. On the Use of Diversity Measures in Longitudinal Sequencing Studies of Microbial Communities. Front. Microbiol. 2018, 9, 1037.
- Qian, X.B.; Chen, T.; Xu, Y.P.; Chen, L.; Sun, F.X.; Lu, M.P.; Liu, Y.X. A guide to human microbiome research: Study design, sample collection, and bioinformatics analysis. Chin. Med. J. 2020, 133, 1844–1855.
- Walters, K.E.; Martiny, J.B.H. Alpha-, beta-, and gamma-diversity of bacteria varies across habitats. PLoS ONE 2020, 15, e0233872.
- Mande, S.S.; Mohammed, M.H.; Ghosh, T.S. Classification of metagenomic sequences: Methods and challenges. Briefings Bioinform. 2012, 13, 669–681.
- Sharpton, T.J. An introduction to the analysis of shotgun metagenomic data. Front. Plant Sci. 2014, 5, 209.
- Handelsman, J. Metagenomics: Application of Genomics to Uncultured Microorganisms. Microbiol. Mol. Biol. Rev. 2004, 68, 669–685.
- Ross, E.M.; Moate, P.J.; Bath, C.R.; Davidson, S.E.; Sawbridge, T.I.; Guthridge, K.M.; Cocks, B.G.; Hayes, B.J. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing. BMC Genet. 2012, 13, 53.
- Dahui, Q. Next-generation sequencing and its clinical application. Cancer Biol. Med. 2019, 16, 4–10.
- Lynch, M.D.J.; Neufeld, J.D. Ecology and exploration of the rare biosphere. Nat. Rev. Microbiol. 2015, 13, 217–229.
- Liu, S.; Moon, C.D.; Zheng, N.; Huws, S.; Zhao, S.; Wang, J. Opportunities and challenges of using metagenomic data to bring uncultured microbes into cultivation. Microbiome 2022, 10, 76.
- Haykin, S.O. Neural Networks and Learning Machines, 3rd ed.; Pearson: Upper Saddle River, NJ, USA, 2008.
- Sathya, R.; Abraham, A. Comparison of supervised and unsupervised learning algorithms for pattern classification. Int. J. Adv. Res. Artif. Intell. 2013, 2, 34–38.
- Kotsiantis, S.B.; Zaharakis, I.; Pintelas, P. Supervised machine learning: A review of classification techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 2007, 160, 3–24.
- Wilhelm, R.C.; van Es, H.M.; Buckley, D.H. Predicting measures of soil health using the microbiome and supervised machine learning. Soil Biol. Biochem. 2022, 164, 108472.
- Ribeiro, P.; Barbosa, M.I.; Sousa, C.; Rodrigues, P.M. Near-Infrared Spectroscopy Machine-Learning Spectral Analysis Tool for Blueberries (Vaccinium corymbosum) Cultivar Discrimination. Foods 2025, 14, 1428.
- Rodrigues, P.M.; Bispo, B.C.; Garrett, C.; Alves, D.; Teixeira, J.P.; Freitas, D. Lacsogram: A new EEG tool to diagnose Alzheimer’s disease. IEEE J. Biomed. Health Inform. 2021, 25, 3384–3395.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Biau, G.; Scornet, E. A random forest guided tour. Test 2016, 25, 197–227.
- Soman, K.; Loganathan, R.; Ajay, V. Machine Learning with SVM and Other Kernel Methods; PHI Learning Pvt. Ltd.: Delhi, India, 2009.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Fiannaca, A.; La Paglia, L.; La Rosa, M.; Lo Bosco, G.; Renda, G.; Rizzo, R.; Gaglio, S.; Urso, A. Deep learning models for bacteria taxonomic classification of metagenomic data. BMC Bioinform. 2018, 19, 61–76.
- Shalev-Shwartz, S.; Ben-David, S. Understanding Machine Learning; Cambridge University Press: Cambridge, UK, 2014.
- Houdt, G.V.; Mosquera, C.; Nápoles, G. A review on the long short-term memory model. Artif. Intell. Rev. 2020, 53, 5929–5955.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
- García-Jiménez, B.; Muñoz, J.; Cabello, S.; Medina, J.; Wilkinson, M.D. Predicting microbiomes through a deep latent space. Bioinformatics 2020, 37, 1444–1451.
- Prince, S.J. Understanding Deep Learning; The MIT Press: Cambridge, MA, USA, 2023.
- Hermans, S.M.; Buckley, H.L.; Case, B.S.; Curran-Cournane, F.; Taylor, M.; Lear, G. Using soil bacterial communities to predict physico-chemical variables and soil quality. Microbiome 2020, 8, 79.
- Chang, H.X.; Haudenshield, J.S.; Bowen, C.R.; Hartman, G.L. Metagenome-Wide Association Study and Machine Learning Prediction of Bulk Soil Microbiome and Crop Productivity. Front. Microbiol. 2017, 8, 519.
- Thompson, J.; Johansen, R.; Dunbar, J.; Munsky, B. Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition. PLoS ONE 2019, 14, e0215502.
- Sørensen, M.B.; Faurdal, D.; Schiesaro, G.; Jensen, E.D.; Jensen, M.K.; Clemmensen, L.K.H. Exploring crop health and its associations with fungal soil microbiome composition using machine learning applied to remote sensing data. Commun. Earth Environ. 2025, 6, 355.
- Hagen, M.; Dass, R.; Westhues, C.; Blom, J.; Schultheiss, S.J.; Patz, S. Interpretable machine learning decodes soil microbiome’s response to drought stress. Environ. Microbiome 2024, 19, 35.
- Novielli, P.; Magarelli, M.; Romano, D.; de Trizio, L.; Di Bitonto, P.; Monaco, A.; Amoroso, N.; Stellacci, A.M.; Zoani, C.; Bellotti, R.; et al. Climate Change and Soil Health: Explainable Artificial Intelligence Reveals Microbiome Response to Warming. Mach. Learn. Knowl. Extr. 2024, 6, 1564–1578.
- Chen, S.; Teng, Y.; Luo, Y.; Kuramae, E.; Ren, W. Threats to the soil microbiome from nanomaterials: A global meta and machine-learning analysis. Soil Biol. Biochem. 2024, 188, 109248.
- Xu, N.; Kang, J.; Ye, Y.; Zhang, Q.; Ke, M.; Wang, Y.; Zhang, Z.; Lu, T.; Peijnenburg, W.; Penuelas, J.; et al. Machine learning predicts ecological risks of nanoparticles to soil microbial communities. Environ. Pollut. 2022, 307, 119528.
- Chen, B.; Liu, M.; Zhang, Z.; Lv, B.; Yu, Y.; Zhang, Q.; Xu, N.; Yang, Z.; Lu, T.; Xia, S.; et al. Data-Driven Approach for Designing Eco-Friendly Heterocyclic Compounds for the Soil Microbiome. Environ. Sci. Technol. 2025, 59, 1530–1541.
- Ebrahimi, M.; Safari Sinegani, A.A.; Sarikhani, M.R.; Mohammadi, S.A. Comparison of artificial neural network and multivariate regression models for prediction of Azotobacteria population in soil under different land uses. Comput. Electron. Agric. 2017, 140, 409–421.
- Sadeghi, S.; Petermann, B.J.; Steffan, J.J.; Brevik, E.C.; Gedeon, C. Predicting microbial responses to changes in soil physical and chemical properties under different land management. Appl. Soil Ecol. 2023, 188, 104878.
- Yuan, J.; Wen, T.; Zhang, H.; Zhao, M.; Penton, C.R.; Thomashow, L.S.; Shen, Q. Predicting disease occurrence with high accuracy based on soil macroecological patterns of Fusarium wilt. ISME J. 2020, 14, 2936–2950.
- Xue, P.; Minasny, B.; Wadoux, A.M.J.; Dobarco, M.R.; McBratney, A.; Bissett, A.; de Caritat, P. Drivers and human impacts on topsoil bacterial and fungal community biogeography across Australia. Glob. Change Biol. 2024, 30, e17216.
- Smith, M.B.; Rocha, A.M.; Smillie, C.S.; Olesen, S.W.; Paradis, C.; Wu, L.; Campbell, J.H.; Fortney, J.L.; Mehlhorn, T.L.; Lowe, K.A.; et al. Natural Bacterial Communities Serve as Quantitative Geochemical Biosensors. mBio 2015, 6, e00326-15.
- Liu, B.; Sträuber, H.; Saraiva, J.; Harms, H.; Silva, S.G.; Kasmanas, J.C.; Kleinsteuber, S.; da Rocha, U.N. Machine learning-assisted identification of bioindicators predicts medium-chain carboxylate production performance of an anaerobic mixed culture. Microbiome 2022, 10, 48.
- Cordier, T.; Forster, D.; Dufresne, Y.; Martins, C.I.M.; Stoeck, T.; Pawlowski, J. Supervised machine learning outperforms taxonomy-based environmental DNA metabarcoding applied to biomonitoring. Mol. Ecol. Resour. 2018, 18, 1381–1391.
- Frühe, L.; Cordier, T.; Dully, V.; Breiner, H.W.; Lentendu, G.; Pawlowski, J.; Martins, C.; Wilding, T.A.; Stoeck, T. Supervised machine learning is superior to indicator value inference in monitoring the environmental impacts of salmon aquaculture using eDNA metabarcodes. Mol. Ecol. 2020, 30, 2988–3006.
- Janßen, R.; Zabel, J.; von Lukas, U.; Labrenz, M. An artificial neural network and Random Forest identify glyphosate-impacted brackish communities based on 16S rRNA amplicon MiSeq read counts. Mar. Pollut. Bull. 2019, 149, 110530.
- Hempel, C.A.; Buchner, D.; Mack, L.; Brasseur, M.V.; Tulpan, D.; Leese, F.; Steinke, D. Predicting environmental stressor levels with machine learning: A comparison between amplicon sequencing, metagenomics, and total RNA sequencing based on taxonomically assigned data. Front. Microbiol. 2023, 14, 1217750.
- Wijaya, J.; Park, J.; Yang, Y.; Siddiqui, S.I.; Oh, S. A metagenome-derived artificial intelligence modeling framework advances the predictive diagnosis and interpretation of petroleum-polluted groundwater. J. Hazard. Mater. 2024, 472, 134513.
- Wijaya, J.; Byeon, H.; Jung, W.; Park, J.; Oh, S. Machine learning modeling using microbiome data reveal microbial indicator for oil-contaminated groundwater. J. Water Process Eng. 2023, 53, 103610.
- Mo, Y.; Bier, R.; Li, X.; Daniels, M.; Smith, A.; Yu, L.; Kan, J. Agricultural practices influence soil microbiome assembly and interactions at different depths identified by machine learning. Commun. Biol. 2024, 7, 1349.
- Oh, S.; Byeon, H.; Wijaya, J. Machine learning surveillance of foodborne infectious diseases using wastewater microbiome, crowdsourced, and environmental data. Water Res. 2024, 265, 122282.
- Jing, Z.; Zhang, Y.; Liu, X.; Li, Q.; Hao, Y.; Li, Y.; Gao, H. Identifying human activities causing water pollution based on microbial community sequencing and source classifier machine learning. Environ. Int. 2025, 195, 109240.
- Kang, J.; Zhang, Z.; Chen, Y.; Zhou, Z.; Zhang, J.; Xu, N.; Zhang, Q.; Lu, T.; Peijnenburg, W.; Qian, H. Machine learning predicts the impact of antibiotic properties on the composition and functioning of bacterial community in aquatic habitats. Sci. Total Environ. 2022, 828, 154412.
- Wijaya, J.; Oh, S. Machine learning reveals the complex ecological interplay of microbiome in a full-scale membrane bioreactor wastewater treatment plant. Environ. Res. 2023, 222, 115366.
- Zhu, Z.; Ding, J.; Du, R.; Zhang, Z.; Guo, J.; Li, X.; Jiang, L.; Chen, G.; Bu, Q.; Tang, N.; et al. Systematic tracking of nitrogen sources in complex river catchments: Machine learning approach based on microbial metagenomics. Water Res. 2024, 253, 121255.
- Wang, L.; Lu, W.; Song, Y.; Liu, S.; Fu, Y.V. Using machine learning to identify environmental factors that collectively determine microbial community structure of activated sludge. Environ. Res. 2024, 260, 119635.
- Kim, Y.; Oh, S. Machine-learning insights into nitrate-reducing communities in a full-scale municipal wastewater treatment plant. J. Environ. Manag. 2021, 300, 113795.
- Larsen, P.E.; Field, D.; Gilbert, J.A. Predicting bacterial community assemblages using an artificial neural network approach. Nat. Methods 2012, 9, 621–625.
- Glasl, B.; Bourne, D.G.; Frade, P.R.; Thomas, T.; Schaffelke, B.; Webster, N.S. Microbial indicators of environmental perturbations in coral reef ecosystems. Microbiome 2019, 7, 94.
- Lambert, B.S.; Groussman, R.D.; Schatz, M.J.; Coesel, S.N.; Durham, B.P.; Alverson, A.J.; White, A.E.; Armbrust, E.V. The dynamic trophic architecture of open-ocean protist communities revealed through machine-guided metatranscriptomics. Proc. Natl. Acad. Sci. USA 2022, 119, e2100916119.
- Dubinsky, E.A.; Butkus, S.R.; Andersen, G.L. Microbial source tracking in impaired watersheds using PhyloChip and machine-learning classification. Water Res. 2016, 105, 56–64.
- Wang, C.; Mao, G.; Liao, K.; Ben, W.; Qiao, M.; Bai, Y.; Qu, J. Machine learning approach identifies water sample source based on microbial abundance. Water Res. 2021, 199, 117185.
- Liu, X.; Nie, Y.; Wu, X.L. Predicting microbial community compositions in wastewater treatment plants using artificial neural networks. Microbiome 2023, 11, 93.
- Hampton-Marcell, J.T.; Ghosh, A.; Gukeh, M.J.; Megaridis, C.M. A new approach of microbiome monitoring in the built environment: Feasibility analysis of condensation capture. Microbiome 2023, 11, 129.
- Choi, Y.; Kang, B.; Kim, D. Effective detection of indoor fungal contamination through the identification of volatile organic compounds using mass spectrometry and machine learning. Environ. Pollut. 2024, 363, 125195.
- Fang, R.; Chen, T.; Han, Z.; Ji, W.; Bai, Y.; Zheng, Z.; Su, Y.; Jin, L.; Xie, B.; Wu, D. From air to airway: Dynamics and risk of inhalable bacteria in municipal solid waste treatment systems. J. Hazard. Mater. 2023, 460, 132407.
- Lee, B.G.; Jeong, K.H.; Kim, H.E.; Yeo, M.K. Machine learning models for predicting indoor airborne fungal concentrations in public facilities utilizing environmental variables. Environ. Pollut. 2025, 368, 125684.
- Peng, S.; Luo, M.; Long, D.; Liu, Z.; Tan, Q.; Huang, P.; Shen, J.; Pu, S. Full-length 16S rRNA gene sequencing and machine learning reveal the bacterial composition of inhalable particles from two different breeding stages in a piggery. Ecotoxicol. Environ. Saf. 2023, 253, 114712.
- Lee, J.Y.; Miao, Y.; Chau, R.L.; Hernandez, M.; Lee, P.K. Artificial intelligence-based prediction of indoor bioaerosol concentrations from indoor air quality sensor data. Environ. Int. 2023, 174, 107900.
- Miao, Y.; Zhou, T.; Zheng, X.; Mahendra, S. Investigating Biodegradation of 1,4-Dioxane by Groundwater and Soil Microbiomes: Insights into Microbial Ecology and Process Prediction. ACS ES T Water 2023, 4, 1046–1060.
- Dully, V.; Balliet, H.; Frühe, L.; Däumer, M.; Thielen, A.; Gallie, S.; Berrill, I.; Stoeck, T. Robustness, sensitivity and reproducibility of eDNA metabarcoding as an environmental biomonitoring tool in coastal salmon aquaculture—An inter-laboratory study. Ecol. Indic. 2021, 121, 107049.
- Lee, J.Y.; Sadler, N.C.; Egbert, R.G.; Anderton, C.R.; Hofmockel, K.S.; Jansson, J.K.; Song, H.S. Deep learning predicts microbial interactions from self-organized spatiotemporal patterns. Comput. Struct. Biotechnol. J. 2020, 18, 1259–1269.
- Yang, Y.; Shen, Z.; Bissett, A.; Viscarra Rossel, R.A. Estimating soil fungal abundance and diversity at a macroecological scale with deep learning spectrotransfer functions. Soil 2022, 8, 223–235.
- Wang, Y.; Zou, Q. Deep learning meta-analysis for predicting plant soil-borne fungal disease occurrence from soil microbiome data. Appl. Soil Ecol. 2024, 202, 105532.
- Jiang, J.; Zhou, H.; Zhang, T.; Yao, C.; Du, D.; Zhao, L.; Cai, W.; Che, L.; Cao, Z.; Wu, X.E. Machine learning to predict dynamic changes of pathogenic Vibrio spp. abundance on microplastics in marine environment. Environ. Pollut. 2022, 305, 119257.
- Zeng, W.; Gautam, A.; Huson, D.H. DeepToA: An ensemble deep-learning approach to predicting the theater of activity of a microbiome. Bioinformatics 2022, 38, 4670–4676.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Adaptive Computation and Machine Learning; The MIT Press: Cambridge, MA, USA, 2016.
- Soper, D.S. Greed Is Good: Rapid Hyperparameter Optimization and Model Selection Using Greedy k-Fold Cross Validation. Electronics 2021, 10, 1973.
- Ortigara, A.; Kay, M.; Uhlenbrook, S. A Review of the SDG 6 Synthesis Report 2018 from an Education, Training, and Research Perspective. Water 2018, 10, 1353.
- Campbell, B.M.; Hansen, J.; Rioux, J.; Stirling, C.M.; Twomlow, S.; Wollenberg, E. Urgent action to combat climate change and its impacts (SDG 13): Transforming agriculture and food systems. Curr. Opin. Environ. Sustain. 2018, 34, 13–20.
- Recuero Virto, L. A preliminary assessment of the indicators for Sustainable Development Goal (SDG) 14 “Conserve and sustainably use the oceans, seas and marine resources for sustainable development”. Mar. Policy 2018, 98, 47–57.
- Ishtiaque, A.; Masrur, A.; Rabby, Y.W.; Jerin, T.; Dewan, A. Remote Sensing-Based Research for Monitoring Progress towards SDG 15 in Bangladesh: A Review. Remote Sens. 2020, 12, 691.
- Jiang, Y.; Luo, J.; Huang, D.; Liu, Y.; Li, D.-d. Machine Learning Advances in Microbiology: A Review of Methods and Applications. Front. Microbiol. 2022, 13, 925454.
- Wang, S.; Sun, A.; Yang, S.; Ni, R.; Lin, X.; Shu, W.; Price, G.; Song, L. Dominance of acetoclastic methanogenesis in municipal solid waste (MSW) decomposition despite high variability in microbial community composition: Insights from natural stable carbon isotope and metagenomic analyses. Energy Environ. Sustain. 2025, 1, 100018.
- Zhao, M.; Zou, G.; Li, Y.; Pan, B.; Wang, X.; Zhang, J.; Xu, L.; Li, C.; Chen, Y. Biodegradable microplastics coupled with biochar enhance Cd chelation and reduce Cd accumulation in Chinese cabbage. Biochar 2025, 7, 31.
- Chu, Y.; Zhang, X.; Tang, X.; Jiang, L.; He, R. Uncovering anaerobic oxidation of methane and active microorganisms in landfills by using stable isotope probing. Environ. Res. 2025, 271, 121139.
- Pang, Q.; Zhao, G.; Wang, D.; Zhu, X.; Xie, L.; Zuo, D.; Wang, L.; Tian, L.; Peng, F.; Xu, B.; et al. Water periods impact the structure and metabolic potential of the nitrogen-cycling microbial communities in rivers of arid and semi-arid regions. Water Res. 2024, 267, 122472.
- Teron, G.; Bordoloi, R.; Paul, A.; Singha, L.B.; Tripathi, O.P. Effect of altitude on soil physico-chemical properties and microbial biomass carbon in the Eaglenest Wildlife Sanctuary of Arunachal Pradesh. Geol. Ecol. Landscapes 2024, 1–19.
- Lin, J.; Cheng, Q.; Kumar, A.; Zhang, W.; Yu, Z.; Hui, D.; Zhang, C.; Shan, S. Effect of degradable microplastics, biochar and their coexistence on soil organic matter decomposition: A critical review. TrAC Trends Anal. Chem. 2025, 183, 118082.
Environment | Article | Sequencing Technique | Classification | Input Data | Number of Samples | Model | Metric | Novel Test Context |
---|---|---|---|---|---|---|---|---|
Terrestrial (different land uses) | Hermans, S. et al. (2020) [71] | 16S rRNA | Predicting soil quality and land use | Composition of soil bacterial communities, bacterial Operational Taxonomic Unit (OTU) tables | 3000 samples | RF (hold-out validation) | Accuracy = 85% | No |
Terrestrial (agricultural fields) | Chang, H. et al. (2017) [72] | Shotgun Metagenomics | Predicting soil productivity | OTU abundances, environmental, soil and crop productivity data | 12 samples | RF (leave-one-out cross-validation) | Accuracy = 79% | No |
Terrestrial (pine litter decomposition) | Thompson, J. et al. (2019) [73] | 16S rRNA | Predicting dissolved organic carbon (OC) concentration | OTU abundances, OC concentration | 302 samples | RF (hold-out validation) | R = 0.676 | No |
Terrestrial (agricultural soils across Europe) | Sørensen, M. B. et al. (2025) [74] | ITS | Predicting crop health | Abiotic and biotic data, normalized difference vegetation indexes | 885 samples | RF (5-fold cross-validation) | R² = 0.58, RMSE = 0.14 | No |
Terrestrial (farmland soils across the USA and Canada) | Wilhelm, R. C. et al. (2022) [58] | eDNA metabarcoding | Predicting ecological quality status | Amplicon sequence variant (ASV) abundance profiles, organic matter content, respiration, autoclaved citrate extractable (ACE) protein, active carbon, pH, phosphorus, potassium, minor elements, aggregate stability, available water capacity, surface and subsurface hardness, tillage status | 949 samples | RF, SVM (hold-out validation) | R² = 0.80 | Farmland vs. pastureland soils |
Terrestrial (soil, root, and rhizosphere) | Hagen, M. et al. (2024) [75] | 16S rRNA | Analyzing drought stress impact on microbiome | Relative abundances of bacterial taxa | 332 samples | RF (5-fold cross-validation) | Accuracy = 92.3% | Sorghum-Drought vs. Grass-Drought datasets |
Terrestrial (soil samples) | Novielli, P. et al. (2024) [76] | Not mentioned | Analyzing climate change impact on soil health | Environmental, soil microbiota, biochemical recalcitrance and mineral protection factors | 623 samples | RF, ETC, LoR, XGB, DT, SVM, KNN (10-fold cross-validation) | Accuracy = 92.3%, AUC = 0.964 | No |
Terrestrial (soil samples) | Chen, S. et al. (2024) [77] | Not mentioned | Predicting nanomaterials impact on the microbiome | Nanomaterial features (type, size, exposure dose), duration, pH, soil organic matter content, microbial diversity, biomass, enzyme activities | 2134 paired observations | RF, XGB (hold-out validation) | R² = 0.71 | No |
Terrestrial (globally distributed soil samples) | Xu, N. et al. (2022) [78] | 16S rRNA | Correlating nanoparticle properties with microbiome stability | Nanoparticle and soil characteristics | 365 samples | RF (10-fold cross-validation) | R² = 0.91 | No |
Terrestrial (acidic sandy loam) | Chen, B. et al. (2025) [79] | 16S rRNA | Predicting heterocyclic compounds impact on the microbiome | Chemical structure, environmental and microbial features | 156 samples | XGB (hold-out validation) | R² = 0.94, RMSE = 0.008 | No |
Terrestrial (different land uses) | Ebrahimi, M. et al. (2017) [80] | Not applicable | Predicting Azotobacteria population in soil | pH, electrical conductivity, calcium carbonate equivalent, OC, sand/silt/clay content, hot/cold water extractable OC, light/heavy fraction OC, basal respiration, substrate-induced respiration, bacteria, fungi and actinomycetes counts | 50 samples | ANN, Multivariate Linear Regression (MLR) (hold-out validation) | R² = 0.76, RMSE = 0.36 | No |
Terrestrial (global agroecosystems with Fusarium-susceptible crops) | Sadeghi, S. et al. (2023) [81] | Not applicable | Predicting soil microbial communities based on soil physical and chemical properties under different agricultural management systems and soil depths | Bulk density, sand, silt, clay content, ammonium, nitrate, phosphorus, potassium, electrical conductivity, pH, organic matter, total phospholipid fatty acid, management treatments and soil depths | 538 samples | Cubist algorithm (5-fold cross-validation combined with hold-out validation) | R² = 0.96 | No |
Terrestrial (six crops across nine countries/ regions) | Yuan, J. et al. (2020) [82] | 16S rRNA, ITS | Predicting the occurrence of Fusarium wilt disease in soils | OTUs relative abundances | 1549 samples | RF, SVM, LoR (hold-out validation) | Accuracy > 80% | Independent datasets and new field soil samples |
Terrestrial (diverse bioregions and land uses) | Xue, P. et al. (2024) [83] | Amplicon | Predicting spatial distributions of dominant bacterial and fungal phyla across Australia and identifying key environmental and human impacts on the microbiome | Relative abundances of dominant phyla, land use, soil type, climate factors (mean annual aridity index, annual precipitation, temperature range, solar radiation, etc.), OC, pH, total nitrogen, total phosphorus, total sulphur, electrical conductivity, cation exchange capacity, and clay content | 1384 samples | RF, QRF (10-fold cross-validation) | R² = 0.90 | No |
Aquatic (contaminated groundwater and water samples) | Smith, M. B. et al. (2015) [84] | 16S rRNA | Predicting water contamination and geochemical conditions | OTU relative abundances | 93 samples | RF (cross-validation) | Accuracy = 82% | No |
Aquatic (bioreactors) | Liu, B. et al. (2022) [85] | 16S rRNA | Predicting bioreactor production | OTU relative abundances | 54 samples | RF (hold-out validation) | Accuracy = 90% | No |
Aquatic (marine benthic) | Cordier, T. et al. (2018) [86] | SSU RNA | Predicting quality status associated with salmon farms | OTU relative abundances, reference biotic indices | 144 samples | RF (hold-out validation) | R² = 0.89 | No |
Aquatic (marine benthic) | Frühe, L. et al. (2020) [87] | SSU RNA | Predicting environmental quality of marine aquaculture | OTUs relative abundances | 152 samples | RF, SVM (hold-out validation) | R² = 0.72 | Independent datasets |
Aquatic (water column samples) | Janßen, R. et al. (2019) [88] | 16S rRNA | Predicting water contamination | Taxon count table | 32 samples | RF, ANNs (hold-out validation) | Accuracy = 97.10% | No |
Aquatic (stream mesocosms) | Hempel, C. A. et al. (2023) [89] | 16S rRNA, ITS, metagenomics, and total RNA sequencing | Predicting environmental stressor levels | Taxa relative abundances | 121 samples | KNN, SVML, Ridge, Lasso, RF, SVC, XGB (hold-out validation) | MCC = 0.45 | No |
Aquatic (groundwater) | Wijaya, J. et al. (2024) [90] | 16S rRNA, Metagenomics | Predicting groundwater pollution | Microbial families relative abundances, genes and pathways abundances | 35 samples | LoR, SVML, SVMRBF, RF, DT (hold-out validation) | Accuracy = 98%, AUC = 0.99 | No |
Aquatic (groundwater) | Wijaya, J. et al. (2023) [91] | 16S rRNA | Predicting groundwater pollution | OTUs relative abundances, pH, dissolved oxygen, oxidation-reduction potential, electrical conductivity, total petroleum hydrocarbon, temperature, turbidity | 42 samples | LoR, SVML, SVMRBF, RF (hold-out validation) | Accuracy = 99%, AUC = 0.99 | No |
Aquatic (single long-term agricultural field trial) | Mo, Y. et al. (2024) [92] | 16S rRNA, ITS | Identifying key agricultural factors for microbial community | Microbial community data relative abundances, environmental variables (soil pH, total carbon, total nitrogen, soil moisture and soil bulk density), agricultural practices (fertility source, tillage and cover crop application) | 96 samples | RF (10-fold cross-validation) | > 0.95, AUC = 0.996 | No |
Aquatic (three wastewater treatment plants) | Oh, S. et al. (2024) [93] | 16S rRNA, metagenomic and metatranscriptomic sequencing | Predicting Clostridium perfringens surveillance | Clostridium perfringens abundance, meteorological variables | 66 samples | OLR, LRLR, LRRR, SVRL, SVRRBF, RF, ABR, GBR (hold-out validation) | R² = 0.78 | No |
Aquatic (sediment water) | Jing, Z. et al. (2025) [94] | 16S rRNA | Tracing human activities causing water pollution | OTUs relative abundances, environmental and geographical indices (spatio-temporal, social development, meteorological, physicochemical indicators), microbiological indices (metacommunity type, Shannon diversity, Simpson diversity, ACE diversity metrics) | 915 samples | ANN, RF, XGB, LGBM, KNN, SVM (hold-out validation) | R² = 0.924 | No |
Aquatic (lake and river) | Kang, J. et al. (2022) [95] | 16S rRNA | Predicting the relations between antibiotic features and aquatic bacteria | Physical and chemical properties of antibiotics, microbial diversity indices, relative abundance of bacterial modules, functional pathways | Not mentioned | RF (10-fold cross-validation) | R² = 0.78 | No |
Aquatic (wastewater treatment plants) | Wijaya, J. and Oh, S. (2023) [96] | 16S rRNA | Identifying keystone taxa | OTU relative abundances, operational data | 38 samples | KNN, DT, LoR, SVML, SVMRBF, RF, LR, SVRL, SVRRBF, RFR (hold-out validation) | Accuracy ≥ 91.6%, R² = 0.98, MSE = 0.34 | No |
Aquatic (river catchments) | Zhu, Z. et al. (2024) [97] | 16S rRNA, shotgun metagenomics | Predicting river’s nitrogen pollution sources | Taxonomic composition and profiles, functional gene annotations, macroscopic characteristics (land use, soil type, elevation, river morphology (length, depth, slope, width)), physicochemical and sediment parameters | Not mentioned | RF (hold-out validation) | Accuracy = 84%, Kappa coefficient = 0.70 | Geographically distinct area |
Aquatic (wastewater treatment plant) | Wang, L. et al. (2024) [98] | 16S rRNA | Identifying the environmental factors that affect microbial communities | OTUs relative abundances, latitude, longitude, climate type, solids retention time, hydraulic retention time, liquor suspended solids, influent biochemical oxygen demand, total nitrogen, total phosphorus, pH, dissolved oxygen, temperature, precipitation | 1262 samples | Extremely Randomized Trees (hold-out validation) | Accuracy = 71.43% | Independent dataset |
Aquatic (wastewater treatment plant) | Kim, Y. and Oh, S. (2021) [99] | 16S rRNA | Predicting operational conditions and identifying key microbial taxa | OTU tables, relative abundance of microbial taxa, PCA-transformed coordinates | 18 samples | SVML, LoR, SVM, SVMRBF, RF, DT, KNN (hold-out validation) | Accuracy = 93%, AUC = 0.99 | No |
Aquatic (coastal marine area) | Larsen, P. E. et al. (2012) [100] | 16S rRNA | Predicting microbial community structure | Relative abundance of 24 bacterial orders, in situ and satellite-derived parameters | Not mentioned | ANN (hold-out validation) | Bray-Curtis similarity = 89.7 | Hypothetical environmental conditions |
Aquatic (seawater) | Glasl, B. et al. (2019) [101] | 16S rRNA | Identifying reef microbiomes to use as indicators of environmental conditions | OTUs relative abundance, sea surface temperature, chlorophyll a, total suspended solids, particulate OC and other water quality parameters | 381 samples | RF (hold-out validation) | Accuracy = 92%, Kappa coefficient = 88%, R² = 0.67, RMSE = 0.5 | No |
Aquatic (open-ocean habitats) | Lambert, B. S. et al. (2022) [102] | Transcriptomics | Predicting the trophic mode of protists in marine environments | Transcriptomes, gene families, nutrient availability, sea surface temperature, light levels, microbial biomass | >541 samples | RF, XGB (5-fold cross-validation) | Accuracy = 81%, Cohen’s κ = 0.90 | No |
Aquatic (river and creek) | Dubinsky, E. A. et al. (2016) [103] | DNA microarray | Identifying and distinguishing fecal contamination sources in water samples | 16S rRNA gene fragments | 134 samples | RF, SourceTracker (leave-one-out cross-validation) | AUC = 0.97, Sensitivity = 100%, Specificity = 100% | Challenge and field samples |
Aquatic (river) | Wang, C. et al. (2021) [104] | 16S rRNA, metagenomics | Predicting the source of water samples | Values of physicochemical indices, abundance data of microbial indices and combination of both | 252 samples | RF (hold-out validation) | Kappa Coefficient = 0.8694 | No |
Aquatic (wastewater treatment plants from different countries) | Liu, X. et al. (2023) [105] | 16S rRNA | Predicting microbial compositions | 48 environmental data points, relative abundances of bacterial/archaeal taxa, alpha diversity indices (Shannon–Wiener, Pielou’s evenness, species richness, Faith’s phylogenetic diversity), ASVs and functional groups | 777 samples | ANN (hold-out validation) | R² = 0.6286 | No |
Air (indoor environments) | Hampton-Marcell, J. T. et al. (2023) [106] | 16S rRNA | Identifying microbial taxa | ASV relative abundance | 38 samples | RF (validation not mentioned) | AUC = 0.60 | No |
Air (different substrates and materials) | Choi, Y. et al. (2024) [107] | Not mentioned | Differentiating microbial non-volatile and volatile organic compounds | Non-volatile and volatile organic compounds | 261 samples | RF (10-fold cross-validation) | Accuracy = 100% | No |
Air (municipal solid waste treatment) | Fang, R. et al. (2023) [108] | 16S rRNA | Identifying seasonal exposure biomarkers | ASVs, waste, throat swabs, temperature, humidity, PM2.5, PM10, O₃, SO₂, NO₂, CO, air quality index, waste type, seasonal and spatial factors | 71 samples | RF (10-fold cross-validation) | Accuracy = 100%, Precision = 1.00, AUC = 1.00 | No |
Air (diverse public facilities) | Lee, B. et al. (2025) [109] | Not mentioned | Predicting airborne fungal concentrations | Facility type, floor level, month, air temperature, relative humidity, coarse PM2.5–10, precipitation | 137 samples | ENR, KNN, SVR, RF, GB, XGB, DT (hold-out validation) | MAE = 0.42, MSE = 0.28, RMSE = 0.53, R² = 0.78 | No |
Air (two types of pig houses) | Peng, S. et al. (2023) [110] | 16S rRNA | Quantifying the influence of environmental factors | Air pollutants concentrations, OTU relative abundance | 48 samples | ABT (cross-validation) | R² = 0.969 | No |
Air (commercial office and shopping mall samples) | Lee, J.Y.Y. et al. (2023) [111] | Not applicable | Estimating concentrations of bioaerosols and particulate matter | Sensors’ physicochemical data, ultraviolet light-induced fluorescence (UV-LIF) observations, bioaerosol and PM concentrations | 30513 time-series data points | LR, Lasso Regression, RF, XGB (hold-out validation) | Willmott’s Index = 0.82 | No |
Terrestrial, Aquatic (field groundwater samples) | Miao, Y. et al. (2023) [112] | 16S rRNA, shotgun metagenomic | Predicting contaminant levels and duration | Microbial taxa relative abundance, 1,4-dioxane and chlorinated solvents concentration, dissolved oxygen, oxidation-reduction potential, pH, total OC, temperature, sampling depth, aquifer material, and injection of electron donor | 102 samples | RF, MLR, LGBM, AdaBoost, GBR, SVM, NB, KNN (6-fold cross-validation) | Accuracy = 57%, Kappa coefficient = 0.56, R² = 0.81 | Independent field datasets |
Terrestrial, Aquatic (sediment from coastal salmon aquaculture sites) | Dully, V. et al. (2021) [113] | eDNA metabarcoding | Evaluating eDNA metabarcoding for classifying sediment samples into environmental quality categories | ASVs, Infaunal Quality Index scores | 12 samples | RF (leave-one-out cross-validation) | R² = 0.91 | No |
Environment | Article | Sequencing Technique | Classification | Input Data | Number of Samples | Model | Metric | Novel Test Context |
---|---|---|---|---|---|---|---|---|
Terrestrial (simulated environments and real co-culture experiments) | Lee, J. et al. (2020) [114] | Not applicable | Predicting microbial interactions | Fluorescence microscopy | 35000 images | CNN, ResNet (5-fold cross-validation) | R² = 0.844 | No |
Terrestrial (Italian and Philippine rhizosphere) | García-Jiménez, B. et al. (2020) [69] | 16S rRNA | Predicting microbial composition from phenotypic features | Temperature, precipitation, plant age, maize line, and variety, microbial abundance profiles | 4724 samples | Autoencoder (hold-out validation) | R = 0.7348 | Hypothetical climate change scenarios |
Terrestrial (five ecosystem types) | Yang, Y. et al. (2022) [115] | ITS | Predicting fungal abundance and diversity | Visible/near-infrared spectra, soil properties, climate, vegetation and terrain data, fungal phyla relative abundances | 577 samples | CNN (10-fold cross-validation) | R² = 0.73 | No |
Terrestrial (different soil types) | Wang, Y. and Zou, Q. (2024) [116] | 16S rRNA, ITS | Predicting soil-borne fungal diseases | Bacterial and fungal ASV features | 6715 samples | DCA + MLP & RF (hold-out and leave-one-out cross-validation) | Accuracy > 90% | No |
Aquatic (estuary and mariculture samples) | Jiang, J. et al. (2022) [117] | 16S rRNA | Predicting the relative abundance of Vibrio spp. | Temperature, dissolved oxygen, salinity, pH, total nitrogen, total phosphorus and relative abundance of Vibrio spp. | 150 sets of experimental data | DNN, RF, SVR, ElasticNet, XGB (hold-out validation) | RMSE = 12.16, MAE = 6.67 | No |
Air (commercial office and shopping mall samples) | Lee, J.Y.Y. et al. (2023) [111] | Not applicable | Estimating concentrations of bioaerosols and particulate matter | Physical and chemical data from sensors, UV-LIF observations, bioaerosol/PM concentrations | 30513 time-series data points | LSTM, MLP and RNN (hold-out validation) | Willmott’s Index = 0.82 | No |
All (different “theaters of activity” samples) | Zeng, W. et al. (2022) [118] | Shotgun Metagenomics | Predicting the “theater of activity” of a microbiome | Taxonomic and functional profiles | 6048 samples | DeepToA (hold-out validation) | Accuracy = 98.30% | No |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Barbosa, M.I.; Silva, G.; Ribeiro, P.; Vieira, E.; Perrotta, A.; Moreira, P.; Rodrigues, P.M. Unraveling the Microbiome–Environmental Change Nexus to Contribute to a More Sustainable World: A Comprehensive Review of Artificial Intelligence Approaches. Sustainability 2025, 17, 7209. https://doi.org/10.3390/su17167209