Review

Machine Learning for Photocatalytic Materials Design and Discovery

by
David O. Obada
1,2,3,*,
Shittu B. Akinpelu
4,*,
Simeon A. Abolade
1,
Mkpe O. Kekung
3,
Emmanuel Okafor
5,
Syam Kumar R
1,4,
Aniekan M. Ukpong
2,6 and
Akinlolu Akande
1,4,*
1
Mathematical Modelling and Intelligent Systems for Health and Environment Research Group, School of Science, Atlantic Technological University, Ash Lane, Ballytivnan, F91 YW50 Sligo, Ireland
2
Theoretical and Computational Condensed Matter and Materials Physics Group (TCCMMP), School of Chemistry and Physics, University of KwaZulu-Natal, Pietermaritzburg 3201, South Africa
3
Department of Mechanical Engineering, Ahmadu Bello University, Zaria 810222, Nigeria
4
Modelling & Computation for Health and Society (MOCHAS), Atlantic Technological University, Ash Lane, Ballytivnan, F91 YW50 Sligo, Ireland
5
Center of Excellence in Artificial Intelligence, Naif Arab University for Security Sciences, Riyadh 11452, Saudi Arabia
6
National Institute for Theoretical and Computational Sciences (NITheCS), Pietermaritzburg 3201, South Africa
*
Authors to whom correspondence should be addressed.
Crystals 2025, 15(12), 1034; https://doi.org/10.3390/cryst15121034
Submission received: 15 August 2025 / Revised: 30 October 2025 / Accepted: 5 November 2025 / Published: 3 December 2025
(This article belongs to the Section Inorganic Crystalline Materials)

Abstract

Traditionally, the development and optimisation of photocatalytic materials have relied on experimental approaches and density functional theory (DFT) calculations. Although these methods have driven significant scientific progress, they are increasingly constrained by high computational costs, lengthy development cycles, and limited scalability. In recent years, machine learning (ML) has emerged as a powerful and sustainable alternative, offering a data-driven framework that accelerates materials discovery through rapid and accurate property prediction. This review highlights the essential components of the ML workflow (data collection, feature engineering, model selection, and validation) while exploring its application in predicting photocatalytic properties. It further discusses recent advances in forecasting key characteristics such as band edge positions, charge carrier mobility, and surface reactivity using both supervised and unsupervised ML techniques. Persistent challenges, including data scarcity, model interpretability, and generalisability, are also addressed, alongside potential strategies to improve the robustness and reliability of ML-driven materials design. By combining high prediction accuracy with superior computational efficiency, ML holds the potential to revolutionise high-throughput screening and guide the systematic development of next-generation photocatalysts.

1. Introduction

Materials have been the global cornerstone shaping technological advancement [1]. Many modern technologies owe their existence to the availability of suitable materials. With the increase in technological and innovation demands, materials science has become a pivotal field in science and technology [2]. The development of advanced materials such as photocatalysts, capable of reducing greenhouse gas emissions and degrading harmful pollutants, is vital for environmental sustainability [3,4]. Photocatalysts are transformative materials that harness sunlight to drive chemical reactions, making them invaluable in energy applications [3]. They play a crucial role in solar fuel production, such as enabling water splitting to generate hydrogen (H2) and reducing carbon dioxide (CO2) into hydrocarbons like methane and methanol [4]. These processes not only provide clean, renewable energy but also help reduce greenhouse gas emissions, addressing both energy and environmental challenges. Also, photocatalysts enable the storage of solar energy as chemical energy, ensuring a continuous energy supply even when sunlight is unavailable. By integrating with photovoltaic systems, photocatalysts can capture unused light wavelengths, enhancing the efficiency of solar energy systems [5].
Beyond energy applications, photocatalysts are highly effective in environmental remediation. They are widely used to degrade pollutants in water and air through photocatalytic oxidation processes [6]. Figure 1 presents the diverse applications of single and heterojunction photocatalysts in three key areas: water and wastewater treatment (pollutant degradation, microbial inactivation), air purification (pollutant removal, self-cleaning concrete), and energy conversion (water splitting, CO2 reduction). These examples demonstrate the remarkable versatility of photocatalysts in promoting environmental remediation and supporting sustainable energy production through effective pollutant control and resource recovery.
The photocatalytic cycle begins with photon absorption (hν ≥ Eg), creating electron–hole pairs. The band gap (Eg) must satisfy two competing requirements: it must be sufficiently wide to provide a driving force for redox reactions, yet narrow enough to absorb visible light (1.8–3.0 eV for solar applications) [7]. Beyond the band gap value, the absolute band edge positions relative to redox potentials are equally important. The conduction band minimum (CBM) must be more negative than the reduction potential (e.g., H+/H2 at 0 V vs. NHE), while the valence band maximum (VBM) must be more positive than the oxidation potential (e.g., O2/H2O at +1.23 V) [8]. This dual requirement of band gap and band edge alignment represents a unique challenge for ML models compared to single-property optimisation common in other material classes.
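The dual band-gap/band-edge criterion lends itself to a simple pre-screening filter. The sketch below is an illustrative toy, not a published protocol: the function name is invented, potentials are taken vs. NHE at pH 0, and the 1.8–3.0 eV visible-light window follows the range quoted above.

```python
# Toy screening filter for water-splitting photocatalysts.
# Potentials are vs. NHE at pH 0; thresholds are illustrative assumptions.

H_REDUCTION = 0.0    # H+/H2 reduction potential (V vs. NHE)
O_OXIDATION = 1.23   # O2/H2O oxidation potential (V vs. NHE)

def suitable_for_water_splitting(cbm, vbm, eg_min=1.8, eg_max=3.0):
    """Return True if the band edges straddle the water redox potentials
    and the gap lies in the visible-light window (1.8-3.0 eV)."""
    eg = vbm - cbm
    straddles = cbm < H_REDUCTION and vbm > O_OXIDATION
    return straddles and eg_min <= eg <= eg_max

# Anatase TiO2 (approximate literature values): CBM ~ -0.2 V, VBM ~ +3.0 V.
# Its edges straddle the potentials, but the ~3.2 eV gap exceeds the window.
print(suitable_for_water_splitting(-0.2, 3.0))  # False: gap too wide
```

In a real screening campaign the same logic would run over thousands of ML-predicted (CBM, VBM) pairs, discarding candidates before any expensive DFT follow-up.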
Following light absorption, photogenerated electrons and holes must separate and migrate to surface active sites before recombination occurs. The charge carrier lifetime and mobility thus become critical descriptors [9]. Materials with appropriate doping and favourable band structure curvature (effective mass) exhibit longer carrier lifetimes, typically in the nanosecond to microsecond range for efficient photocatalysts [10]. However, these properties are rarely reported in literature datasets, creating a fundamental challenge for ML models trained predominantly on static electronic structure data.
Surface chemistry determines the final reaction kinetics. The density and nature of active sites, surface hydroxyl groups, oxygen vacancies, and adsorption energies of reaction intermediates all influence photocatalytic rates [11]. For CO2 reduction, the binding energies of CO2, COOH, and CO intermediates must be optimised; binding that is too strong poisons the surface, while binding that is too weak provides insufficient stabilisation [12]. Photocatalysts can break down organic contaminants, dyes, and harmful microorganisms in wastewater, contributing to cleaner water resources [13]. In air purification, photocatalysts help eliminate volatile organic compounds (VOCs) [14] and other hazardous substances, improving indoor and outdoor air quality [15]. This dual functionality in energy conversion and pollutant degradation underscores the versatility of photocatalysts in addressing global sustainability challenges. The advancement of photocatalysis relies on innovative materials such as titanium dioxide (TiO2) [16] and perovskites [17]. The discovery of efficient photocatalysts hinges on understanding their electronic structures, band gaps, and surface chemistries [18]. These materials are engineered for optimal light absorption, stability, and efficiency. Despite challenges like low efficiency and limited visible light absorption, research leveraging machine learning (ML) and computational tools is accelerating the discovery of high-performance photocatalysts [19].
The guiding ideology of building materials for technological use can be summarised in four paradigms, as shown in Figure 2 [20]: empirical trial and error, physical and chemical laws, computer simulations, and big data-driven science. Traditionally, materials were discovered through experimental methods; early discoveries were made through experimentation without a theoretical framework. Later, systematic approaches grounded in physical and chemical principles emerged. The advent of computational techniques then allowed for the simulation and prediction of material properties [21]; in particular, density functional theory (DFT) has revolutionised the field by enabling the prediction of material properties before experimental validation. Despite their success, these methods are computationally expensive and unsuitable for exploring the vast chemical space. ML addresses these limitations by analysing large datasets from computational and experimental sources. ML techniques, including regression, classification, clustering, and neural networks, excel at uncovering complex relationships between material compositions, structures, and properties. This capability has significantly advanced materials discovery, particularly for photocatalysts and energy applications. Finally, integrating data mining and artificial intelligence (AI) has unified theory, experimentation, and simulation, ushering in a new era of materials discovery. This fourth paradigm, driven by big data and ML, has emerged as a powerful tool in materials science. ML was proposed by Samuel in 1959 and has since been widely applied in computer vision, general game playing, economics, data mining, and bioinformatics, among other areas [22].
A photocatalyst’s performance is dictated by key physicochemical properties which ML approaches increasingly help predict and optimise, enabling rapid screening of candidate materials [23]. ML models can predict band gaps from composition or structure, reducing reliance on expensive quantum calculations [24]. Band-edge alignment is equally critical: conduction and valence band positions must straddle the redox potentials of the target reaction with sufficient overpotential to drive, for example, H2 and O2 evolution in water splitting. ML tools screen materials for proper band edges using descriptors like electronegativities or multi-task models predicting both gaps and edges [25]. Charge carrier dynamics, which include mobility, lifetime, and recombination rates, determine whether photogenerated electrons and holes reach reactive sites, and ML models trained on spectroscopy data can identify structural or dopant factors that reduce recombination. Stability under illumination and reaction conditions is also essential. Thermodynamic, chemical, and operational stability must all be considered, as otherwise materials may photocorrode or deactivate. Photocatalyst stability under operational conditions (aqueous solutions, oxidative/reductive environments, UV irradiation) is paramount yet often overlooked in initial ML screening [26]. By addressing these limitations, photocatalysts are poised to revolutionise renewable energy systems and environmental remediation, offering sustainable solutions to global energy and ecological crises. Overall, ML accelerates identification of photocatalysts with the optimal combination of light absorption, redox driving force, charge dynamics, and stability. Models trained on formation energy datasets can estimate whether a composition resists decomposition, while classification models categorise materials as “stable” or “unstable” under specific pH or potential conditions based on past experimental data [25].
By filtering out unstable candidates or suggesting dopants and structural modifications, ML ensures that photocatalysts are both active and durable. Combined with predictions of band gaps and band-edge positions [23,24], these stability insights help define the ideal photocatalyst profile. With key properties predictable, material engineering strategies such as doping, alloying, and forming heterojunctions can be accelerated and refined using ML, unifying optimal properties with enhanced catalytic performance.
A major challenge in ML-driven materials discovery is publication bias in literature-derived datasets, a problem also observed for photocatalyst materials. Scientific journals preferentially report successful experiments, leading to a potential overrepresentation of high-performing materials while inactive or poor-performing photocatalysts remain undocumented. This “survivorship bias” prevents ML models from learning complete structure–property relationships, particularly the factors causing photocatalytic failure. To improve dataset representativeness, researchers should implement active learning workflows that experimentally validate both high- and low-predicted performers [27,28]. Complementary strategies include augmenting experimental data with high-throughput computational screening across broader chemical space [29]. Researchers must critically examine datasets before model development, recognising that ML algorithms cannot overcome fundamentally biased training data.
Thus, this review is motivated by the growing significance of ML in expediting the design and discovery of photocatalyst materials. It gives an overview of photocatalyst design principles and descriptors, the materials science databases, and a structured exploration of ML methodologies, tools, and applications in computational materials science. Emphasis is placed on key aspects, including property prediction and the integration of ML into photocatalytic applications, in addition to the intersection of ML and photocatalyst materials to provide sustainable pathways for future advancements in the field.

2. ML-Enhanced Photocatalyst Design Principles and Descriptors

Modern photocatalyst design combines structural, compositional, and surface engineering to optimise performance, with ML accelerating these efforts. Traditional strategies like doping, defect engineering, heterojunction construction, and alloying tune band gaps, band-edge positions, and charge separation, but rely heavily on trial-and-error approaches. ML now enables rapid evaluation of potential modifications, predicting how changes in composition or structure affect key photocatalytic properties [25]. Doping and defect introduction remain foundational, allowing control of light absorption, carrier mobility, and mid-gap states. ML models trained on experimental or computational datasets can forecast optimal dopant types and concentrations, reducing experimental iterations. Heterojunctions, including Type-II and Z-scheme architectures, further improve charge separation and broaden spectral utilisation. ML-assisted screening of thousands of 2D van der Waals heterostructures identified a small subset of Z-scheme candidates with ideal band alignments, demonstrating ML’s efficiency in narrowing search spaces and guiding rational design [30]. Alloying and solid-solution formation enable continuous tuning of electronic and optical properties, but experimental exploration of multicomponent systems is challenging. ML models, using DFT or experimental inputs, predict band gaps, stability, and optimal composition ranges, as shown in double perovskite alloys, facilitating the identification of compositions that balance light absorption and durability [31]. Surface reaction descriptors complement structural and compositional tuning by ensuring catalytic functionality. Adsorption energies of intermediates (e.g., H*, OH*, N2) can be predicted by ML models, guiding the selection of active sites, dopants, or single-atom catalysts for reactions such as water splitting, nitrogen reduction, CO2 conversion, and pollutant degradation [32]. 
Integrating these descriptors with ML-driven optimisation ensures that enhanced band structure, charge separation, and stability translate into real-world photocatalytic performance.

3. Overview of Materials Database

ML prediction is made feasible by leveraging vast datasets obtained from repositories such as NOMAD (http://nomad-coe.eu) [33], Materials Project (http://materialsproject.org) [34], Aflowlib (http://www.aflowlib.org) [35], and OQMD (http://oqmd.org) [36], as shown in Table 1. A fundamental challenge in ML for photocatalysis is the limited size of experimental datasets compared to the vast chemical space of potential photocatalysts. While computational databases like the Materials Project contain over 140,000 compounds [34], experimental photocatalytic data remain scarce, with most curated datasets containing far fewer materials [37], a tiny fraction of the photocatalysts that have been synthesised. This data scarcity becomes more acute when targeting specific applications; datasets for CO2 reduction photocatalysts typically contain fewer than 200 entries with complete characterisation data. Mining or learning from these resources, or other reliable existing data, can reveal previously unknown correlations between material properties, and enable the derivation of both qualitative and quantitative rules, often referred to as surrogate models. These models can predict material properties far more rapidly and cost-effectively, with substantially less human effort, than the benchmark simulations or experimental methods originally used to generate the data. ML can therefore predict properties, optimise synthesis processes, and identify novel materials, effectively complementing and enhancing traditional approaches.

4. Advances in Machine Learning Methods for Photocatalyst Discovery

Recent years have witnessed a paradigm shift in materials discovery driven by the rapid integration of ML into computational materials science. In photocatalysis research, ML has emerged as a transformative tool, capable of accelerating the identification and optimisation of candidate materials far beyond the pace of traditional experimentation or first-principles simulations. By leveraging vast datasets and advanced learning algorithms, ML models can uncover complex, non-linear relationships between structural, electronic, and catalytic properties that are often inaccessible to human intuition. This capability not only enhances predictive accuracy but also enables rational design of photocatalysts with tailored band structures, surface energetics, and reaction selectivity. The growing synergy between ML frameworks, high-throughput computation, and automated literature mining is redefining how photocatalysts are screened, understood, and optimised for sustainable energy applications.
ML is gradually becoming a cornerstone of modern computational science, enabling energy–material advancements. A robust framework for conducting ML studies ensures the development of accurate, generalisable, and efficient models. The foundation of any ML study is the dataset, which involves collecting, cleaning, and organising data to ensure reliability and relevance. For instance, in materials science, especially for energy studies, datasets might include properties of compounds derived from manual extraction from the literature [39], Natural Language Processing (NLP)-based automated literature mining [40], high-throughput computation [41], and extraction from materials databases [42]. Identifying relevant data sources and ensuring the dataset covers the intended scope of the study is critical, as data diversity and representativeness help avoid biases. After collating a dataset via NLP, for instance, there may be missing values, outliers, and inconsistencies, which can be handled through techniques like imputation or removal of erroneous entries to ensure data quality [43]. Dividing the dataset into training, validation, and testing subsets is typically performed in ratios such as 70-20-10 or 80-10-10. However, many studies focus solely on dividing the dataset into training and testing sets. This two-way division can be implemented in various proportions, but it is crucial to allocate between 85% and 95% of the dataset for training, depending on the total size of the dataset. This practice is essential to ensure the ML model achieves optimal performance. For small datasets, augmentation techniques such as scaling or synthetic data generation can expand the dataset size and improve model robustness [44].
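The three-way split described above can be sketched with scikit-learn by applying `train_test_split` twice. The feature matrix below is synthetic; in practice `X` would hold material descriptors and `y` a target property such as the band gap.

```python
# Sketch of an 80-10-10 train/validation/test split with scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))   # 1000 hypothetical materials, 8 descriptors
y = rng.normal(size=1000)        # hypothetical target property (e.g., band gap)

# First carve off the 10% test set, then split the remaining 90% in an
# 8:1 ratio (1/9 of 90% = 10% of the whole) into training and validation.
X_rest, X_train = None, None  # placeholders replaced on the next lines
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=1/9, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 800 100 100
```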
Feature extraction transforms raw data into numerical representations that ML models can process, bridging the gap between raw samples and model training. For crystalline materials, for example, it encodes information about the constituent elements, atomic positions, atomic interactions, and local structures to help the model capture material characteristics [45,46]. Identifying the most relevant features reduces dimensionality and enhances model performance, using filter methods like the Pearson correlation coefficient, wrapper methods like recursive feature elimination, and embedded approaches such as the Least Absolute Shrinkage and Selection Operator (LASSO) [47]. Standardising data to ensure uniform ranges for all features is achieved through normalisation and scaling techniques like Min-Max scaling and Z-score normalisation. Dimensionality reduction methods like Principal Component Analysis (PCA) [48] or t-SNE help reduce the feature space while retaining essential information [49], streamlining the model training process. Recent studies have highlighted the effectiveness of automated feature engineering tools, which can significantly reduce the time and expertise required for this step [50,51].
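The scaling and dimensionality-reduction steps above can be chained in a few lines of scikit-learn. The descriptor matrix here is a synthetic stand-in, and the choice of five principal components is arbitrary for illustration.

```python
# Sketch: standardise features, then reduce dimensionality with PCA.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 20))  # 200 samples, 20 descriptors

# Z-score normalisation followed by PCA keeping 5 components.
pipe = make_pipeline(StandardScaler(), PCA(n_components=5))
X_reduced = pipe.fit_transform(X)
print(X_reduced.shape)  # (200, 5)

# Min-Max scaling instead maps each feature into the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)
print(round(float(X_minmax.min()), 3), round(float(X_minmax.max()), 3))  # 0.0 1.0
```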
Model training involves selecting and optimising an algorithm to identify patterns in data while balancing model complexity, training time, and performance. Choosing an appropriate algorithm depends on the problem type and data characteristics. For example, decision trees are suitable for interpretable models, neural networks excel with high-dimensional data, and support vector machines are effective for classification tasks. Hyperparameter tuning through techniques like grid search, random search, or Bayesian optimisation enhances model performance [52,53]. Using k-fold cross-validation enhances model robustness and mitigates overfitting, while the choice of evaluation metrics depends on the task. For classification tasks, metrics like accuracy, precision, recall, and F1 score are commonly employed, whereas regression tasks typically use metrics such as the coefficient of determination (R²), mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). The workflow is diagrammatically represented in Figure 3. The figure represents a stepwise progression from raw data preparation to feature engineering, application of ML algorithms, model training, and finally evaluation, highlighting the sequential workflow in an ML process. Nonetheless, challenges such as high sensitivity to hyperparameter selection remain active areas of research [54].
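Grid search with k-fold cross-validation and the regression metrics listed above can be combined as follows. The dataset, model, and hyperparameter grid are all illustrative choices, not a recipe from any particular study.

```python
# Sketch: hyperparameter tuning via grid search with 5-fold cross-validation,
# then R2, MSE, RMSE, and MAE on a held-out test set.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    cv=5,                              # 5-fold cross-validation
    scoring="neg_mean_squared_error",  # grid search minimises MSE
)
grid.fit(X_tr, y_tr)

y_pred = grid.best_estimator_.predict(X_te)
mse = mean_squared_error(y_te, y_pred)
print("best params:", grid.best_params_)
print("R2  :", round(r2_score(y_te, y_pred), 3))
print("RMSE:", round(mse ** 0.5, 3))
print("MAE :", round(mean_absolute_error(y_te, y_pred), 3))
```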
ML encompasses various models, each suited to specific problem types. In supervised learning [55], ML models are trained on labelled data to predict outcomes, addressing tasks like classification and regression. Algorithms such as linear regression, logistic regression, support vector machines, and neural networks are commonly used, with applications like predicting photocatalytic properties of perovskites based on compositional features. Unsupervised learning, on the other hand, deals with unlabelled data, uncovering hidden patterns or structures through methods like K-means clustering, hierarchical clustering, Gaussian mixture models, and autoencoders [56]. This approach is particularly useful for grouping materials with similar photocatalytic properties, for example in exploratory analysis. Semi-supervised learning leverages both labelled and unlabelled data, reducing the dependency on extensive labelled datasets. Techniques such as self-training, graph-based methods, and semi-supervised support vector machines enhance model predictions by combining small labelled datasets with large unlabelled ones. Despite its promise, semi-supervised learning often faces challenges related to the quality and representativeness of the unlabelled data [57].
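The unsupervised grouping of materials mentioned above can be illustrated with K-means. The three "families" of materials below are synthetic, with deliberately well-separated descriptor means so that the clustering recovers them.

```python
# Sketch: unsupervised grouping of materials by descriptor similarity.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Three hypothetical material families with distinct descriptor means.
family_a = rng.normal(loc=[1.0, 0.5], scale=0.1, size=(30, 2))
family_b = rng.normal(loc=[3.0, 2.5], scale=0.1, size=(30, 2))
family_c = rng.normal(loc=[5.0, 0.5], scale=0.1, size=(30, 2))
X = StandardScaler().fit_transform(np.vstack([family_a, family_b, family_c]))

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(len(set(labels)))              # 3 clusters found
print(len(set(labels[:30].tolist())))  # the first family falls in one cluster
```

With real descriptor sets the cluster structure is rarely this clean, and the number of clusters itself must usually be chosen via metrics such as the silhouette score.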
Reinforcement learning (RL) [58] involves training autonomous agents to make sequential decisions by maximising cumulative rewards within a defined environment. Algorithms such as Q-learning, deep Q-networks (DQNs), and policy gradient methods have been successfully applied to optimise experimental parameters and guide the discovery of novel materials. However, implementing RL in real-world materials research often demands extensive computational resources, sophisticated environment modelling, and large numbers of iterations to achieve convergence [59]. Transfer learning (TL) complements reinforcement learning by enabling models to leverage knowledge gained from one domain or task to improve learning efficiency in another. In materials science, this can involve adapting a model pre-trained on general material datasets to predict specific photocatalytic properties such as those of double perovskites, thereby significantly reducing data and computational requirements [60]. Moreover, transfer learning can be integrated with reinforcement learning to accelerate policy optimisation, especially in data-scarce environments. For instance, an RL agent trained on simulated materials data can transfer its learned strategies to experimental conditions through domain adaptation techniques, improving generalisation across differing data distributions [61,62]. Active learning (AL) [63] further enhances this synergy by identifying the most informative data points for model retraining, thereby minimising the need for extensive labelling. In the context of ML interatomic potentials (MLIPs) [64], AL strategies such as uncertainty sampling, query-by-committee, and expected model change are employed to select configurations where prediction uncertainty is highest, which are then refined using high-fidelity quantum–mechanical calculations or experiments. 
Increasingly, AL is combined with semi-supervised and transfer learning frameworks to maximise efficiency and accuracy in developing MLIPs under limited data or computational resources [65].
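One common realisation of the uncertainty-sampling strategy described above uses the spread of per-tree predictions in a Random Forest as a cheap committee-style uncertainty estimate. The sketch below is illustrative: the data are synthetic placeholders, and forest spread is only one of several possible uncertainty proxies.

```python
# Sketch of one uncertainty-sampling step in an active-learning loop.
# The unlabelled candidate with the largest prediction spread is the one
# to send next for a high-fidelity calculation or experiment.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_labelled = rng.uniform(-1, 1, size=(40, 4))             # small labelled set
y_labelled = (X_labelled ** 2).sum(axis=1) + rng.normal(0, 0.05, size=40)
X_pool = rng.uniform(-1, 1, size=(200, 4))                # unlabelled pool

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_labelled, y_labelled)

# Std across individual trees approximates predictive uncertainty.
per_tree = np.stack([tree.predict(X_pool) for tree in model.estimators_])
uncertainty = per_tree.std(axis=0)

query_idx = int(uncertainty.argmax())
print("next candidate to label:", query_idx)
```

In a full loop, the queried point would be labelled (e.g., by a DFT calculation), appended to the labelled set, and the model retrained until the uncertainty or a budget threshold is reached.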
After discussing various ML methods, it is essential to highlight recent advancements in applying ML to predict the photocatalytic properties of materials. This area of research is still in its infancy, presenting a promising opportunity for further exploration. For this review, the literature is examined through the lens of the specific ML algorithms employed, providing a structured perspective on the methodologies and their applications in this emerging field.

4.1. Ensemble Methods

Ensemble methods are useful ML techniques that combine multiple models to improve prediction accuracy and robustness [66]. The two primary types of ensemble methods are bagging and boosting. Bagging, short for bootstrap aggregating, trains multiple models on different subsets of the data and averages their predictions to reduce variance [67]. Random Forest, a popular bagging method, has been widely applied in materials science to predict properties like band gaps and thermal conductivity. On the other hand, boosting trains models sequentially, with each model correcting the errors of its predecessor. Methods such as Gradient Boosting and XGBoost excel in capturing subtle patterns in the data, making them ideal for high-accuracy property predictions [68]. By aggregating the predictions of individual models, ensemble methods leverage the strengths of each while mitigating their weaknesses. This approach is particularly beneficial in complex domains like materials science, where data can be noisy or sparse. Several researchers have used ensemble learning to predict photocatalytic properties. Moharramzadeh [69] investigated the photocatalytic efficiency of cobalt and Co2O2 clusters supported on graphitic carbon nitride (g-C3N4) for CO2 reduction. ML was applied to identify descriptors influencing reaction transition state energies, such as adsorption energies and normalised valence electrons. The ML analysis revealed that these descriptors are important in understanding catalytic activity. The study concluded that single cobalt atoms show higher CO2 reduction activity compared to clusters, offering valuable insights into catalyst design for CO2 reduction and emphasising the role of ML in optimising catalytic performance. Also, Xu et al. [70] used a dataset of 350 random solid solutions, and applied various ML algorithms, with Random Forest Regressor (RFR) showing the best performance for predicting band gaps and band-edge positions.
The RFR model achieved R² values of 0.93, 0.97, and 0.98 for band gap, valence band maximum (VBM), and conduction band minimum (CBM), respectively. The ML approach uncovered microstructure–property relationships and guided the design of efficient photocatalysts. The study highlighted the power of ML in optimising material properties for enhanced photocatalytic activity. Moreover, Cheng et al. [71], in their review, highlighted the integration of ML with experimental and DFT approaches for CO2 photoreduction to CH4. XGBoost regression (XGBR) demonstrated its capability to reduce computational costs while maintaining accuracy in predicting photocatalyst properties. The study emphasised the potential of combining high-throughput screening and ML to accelerate the discovery of next-generation photocatalysts. Additionally, it stressed the importance of high-quality datasets for meaningful predictions, showcasing ML’s transformative role in photocatalyst development. Yan et al. [72] developed ML models, including Random Forest, XGBoost, and CatBoost, to correlate experimental parameters with the H2 production rate of element-doped C3N4 photocatalysts. The models identified key features influencing photocatalytic performance and revealed new insights into less-studied parameters. The study concluded that ML is a reliable tool for optimising doping strategies, enabling efficient and cost-effective catalyst design. This research highlights ML’s capability to enhance the understanding and performance of doped photocatalysts.
In an article by Liu et al. [73], they focused on a metal-free photocatalyst derived from cellulose for hydrogen peroxide production. ML models, including XGBoost and Gradient Boosted Decision Trees (GBDTs), identified six key features affecting catalytic performance. XGBoost achieved the best prediction accuracy, guiding catalyst synthesis optimisation. The study highlighted the synergy between ML and transient photovoltage technology in designing sustainable catalysts, demonstrating ML’s utility in advancing green chemistry applications. Arabacı et al. [74] employed ML models, such as XGBoost and ensemble methods, to predict hydrogen evolution rates using Cu/g-C3N4 catalysts. XGBoost achieved an R² value of 0.9942, demonstrating excellent predictive performance. The study concluded that ML could reliably forecast photocatalytic activity and optimise experimental conditions, enhancing the efficiency of renewable hydrogen production. This work exemplifies ML’s role in driving sustainable energy solutions. In contrast, Zong et al. [75] focused on improving the efficiency of photocatalysts for the nitrogen reduction reaction (NRR), a key process for sustainable ammonia production. They proposed a series of Single Atom Catalysts (SACs) with transition metals anchored on porous boron nitride (p-BN) nanosheets. The researchers used DFT and ML techniques to predict the catalytic performance of these materials. Specifically, they applied Sure Independence Screening and Sparsifying Operation (SISSO) algorithms and Random Forest to develop a linear relationship descriptor that links the catalytic activity to the central metal atom and its coordination environment. The study revealed that the Re-B3@p-BN catalyst demonstrated excellent catalytic activity, with a limiting potential of 0.31 V for nitrogen reduction. The ML models efficiently predicted materials with high catalytic activity and selectivity, significantly reducing the time and resources required for materials discovery.
The authors concluded that ML could accelerate the identification of high-performance photocatalysts, with Re-B3@p-BN emerging as a promising candidate for sustainable nitrogen fixation. This work demonstrates how ML can be leveraged to streamline the design and optimisation of catalysts, pushing the boundaries of catalytic performance. Zhao et al. [76] used ML models, including Support Vector Machines (SVMs) and XGBoost, to predict the type and band gap of GaN-based van der Waals heterojunctions. The classification model achieved an AUC of 0.93, and the regression model for band gap prediction had a mean absolute error of 0.24 eV. These findings demonstrated the efficiency of ML in screening 2D heterostructures for photocatalytic applications, emphasising its role in materials discovery and optimisation. Biswas et al. [77] used multi-fidelity ML to screen 150,000 halide perovskites for photocatalytic water splitting. Composition-based ML regression models, trained on DFT data, identified hundreds of stable compounds with suitable band gaps and edges. Validations with high-level DFT confirmed the predictions, showcasing ML’s role in navigating vast chemical spaces and accelerating photocatalyst discovery. This study underscored ML’s potential to transform material screening processes, paving the way for innovative photocatalyst development.
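The two evaluation metrics quoted for the heterojunction screening study, ROC-AUC for the type classifier and mean absolute error (in eV) for the band-gap regressor, can be computed as shown below. The labels and predictions here are toy values chosen for illustration, not data from the cited work.

```python
# Illustrative computation of the two screening metrics mentioned above:
# ROC-AUC for a binary classifier and MAE (eV) for a band-gap regressor.
# All numbers are toy values, not results from the cited studies.
import numpy as np
from sklearn.metrics import roc_auc_score, mean_absolute_error

y_true = np.array([0, 0, 1, 1, 1, 0])              # heterojunction type (toy labels)
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3])  # classifier scores
auc = roc_auc_score(y_true, y_score)

gap_true = np.array([1.2, 2.1, 3.0])               # reference band gaps (eV, toy)
gap_pred = np.array([1.4, 2.0, 2.7])               # model predictions (eV, toy)
mae = mean_absolute_error(gap_true, gap_pred)
```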

4.2. Neural Networks (NNs)

NNs are computational models inspired by the human brain, designed to learn complex relationships between input and output data [78,79]. Their versatility and scalability make them a cornerstone of modern ML, with applications spanning numerous fields, including materials science. At their core, neural networks consist of layers of interconnected nodes (neurons) as depicted in Figure 4. The input layer receives the data, while one or more hidden layers process it using weighted connections and activation functions to capture non-linear relationships. The output layer generates the final predictions. Basic neural networks are effective for simple tasks, but advanced architectures like deep neural networks (DNNs) extend their capabilities to handle intricate problems.
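The layer structure described above (inputs passed through weighted connections and a non-linear activation to produce an output) can be made concrete with a minimal forward pass. The weights below are random placeholders rather than trained values; this is a sketch of the computation, not a working predictor.

```python
# One forward pass through a single-hidden-layer network: weighted sums plus a
# non-linear activation (ReLU) map input descriptors to a scalar prediction.
# Weights are random placeholders, not trained values.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)                            # input descriptors
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # hidden layer (4 neurons)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)     # output layer

h = np.maximum(0.0, W1 @ x + b1)    # activation introduces non-linearity
y_pred = (W2 @ h + b2)[0]           # final scalar prediction
```

Training would adjust `W1`, `b1`, `W2`, `b2` by backpropagation to minimise a loss over known material data.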
In materials science, neural networks have been employed for tasks such as predicting mechanical properties [81], electronic band gaps [82,83], and optical behaviour [84]. Convolutional neural networks (CNNs) [85], specialised for spatial data, are particularly useful for analysing microscopy images to study defects or grain boundaries. Graph neural networks (GNNs) [86] represent materials as graphs, with atoms as nodes and bonds as edges, enabling predictions at the atomic scale. Furthermore, transfer learning, where pre-trained models are fine-tuned for specific tasks, has become a valuable tool for reducing computational costs in materials research. Neural networks also play a critical role in inverse design, where the goal is to generate material structures with target properties. By learning structure–property relationships, these models can suggest new material compositions or configurations, accelerating the discovery process. For photocatalytic applications, Tao et al. [87] utilised ML models such as Gradient Boosting Regression (GBR) and Backpropagation Artificial Neural Networks (BPANNs) to predict band gaps and hydrogen production rates of ABO3-type perovskites. BPANN exhibited superior prediction accuracy for hydrogen production rates, while GBR excelled for band gap predictions. From a pool of 30,000 candidates, 14 promising perovskites were identified, showcasing the effectiveness of ML in accelerating photocatalyst discovery. The study demonstrated the synergy between ML and high-throughput screening in identifying high-performance materials.
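The final screening step in studies like the one above (narrowing a large candidate pool by predicted properties) amounts to a simple filter. The sketch below uses synthetic predicted band gaps and an assumed visible-light window of 1.6–2.4 eV; both the values and the window are illustrative choices, not those of the cited work.

```python
# Sketch of a screening filter: keep only candidates whose model-predicted
# band gap falls in an assumed visible-light window (all values synthetic).
import numpy as np

rng = np.random.default_rng(2)
predicted_gap = rng.uniform(0.5, 4.0, size=1000)        # eV, stand-in predictions
mask = (predicted_gap >= 1.6) & (predicted_gap <= 2.4)  # illustrative window
candidates = np.flatnonzero(mask)                        # indices of survivors
```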

4.3. Graph Neural Networks for Photocatalyst Modelling

Graph neural networks (GNNs) are a class of deep learning models particularly well-suited to materials science because they can naturally represent atomic structures as graphs. In a GNN, nodes represent atoms, and edges represent bonds or neighbouring relationships, allowing the crystal or molecular structure of a photocatalyst to be encoded in graph form [88]. During model training, GNNs perform iterative “message passing” between connected atoms to learn an abstract representation of the material’s structure. Popular architectures include the Crystal Graph Convolutional Neural Network (CGCNN) and the Materials Graph Network (MEGNet), which have demonstrated the ability to predict properties like formation energies and band gaps directly from crystal structures [89,90]. CGCNNs have emerged as particularly effective models for predicting semiconductor band structures, with recent studies demonstrating their ability to predict the valence band maximum (VBM) and conduction band minimum (CBM) through transfer learning approaches for photocatalytic water splitting applications [91]. For example, a recent study used automated GNN models (CGCNN and MEGNet) to predict the band gaps of over 10,000 metal–organic frameworks (MOFs) with high accuracy [90]. These models were trained on databases of DFT-computed electronic properties and could generalise to new MOF structures, enabling rapid screening of MOF optoelectronic performance. GNN architectures are often enriched with advanced layers such as graph attention mechanisms or gated convolutions, which further refine how each atom’s environment is weighted in the prediction. Notably, integrating a graph attention network (GAT) architecture can improve accuracy by learning which neighbouring interactions are most important.
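The message-passing idea can be illustrated without any deep learning library. The toy below performs a single, untrained aggregation round on a three-atom graph with hand-picked features; real CGCNN/MEGNet layers learn the aggregation weights and stack many such rounds, but the core operation, updating each atom from its bonded neighbours and pooling to a material-level representation, is the same.

```python
# Toy single round of "message passing" on a 3-atom graph (NumPy only):
# each node is updated with the mean of its neighbours' features, then the
# node features are pooled into one material-level vector.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # adjacency: atom 0 bonded to atoms 1 and 2
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])               # per-atom feature vectors (hand-picked)

deg = A.sum(axis=1, keepdims=True)       # number of neighbours per atom
H_new = H + (A @ H) / deg                # aggregate neighbour "messages"
graph_repr = H_new.mean(axis=0)          # pooled representation of the material
```

A property head (e.g. a small dense network) would then map `graph_repr` to a band gap or formation energy.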
In one example relevant to photocatalysis, a GAT-based model (incorporating both molecular graph features and experimental conditions) achieved an R² of approximately 0.90 in predicting pollutant degradation rates on TiO2, outperforming standard graph convolutional networks [92]. A notable multi-task regression framework utilising deep neural networks has been developed to simultaneously predict the CBM, VBM, and solar-to-hydrogen (STH) efficiency from data comprising over 15,000 materials, achieving a mean squared error of 0.0001 and an R² of 0.8265 for STH predictions [93]. Recent AI-driven frameworks have integrated graph neural networks to predict band gap energy and photocatalytic efficiency with accuracies within ±0.05 eV, combining reinforcement learning and physics-informed neural networks for hydrogen production optimisation [94]. Furthermore, graph convolutional neural networks have demonstrated scalability in predicting HOMO–LUMO gaps in molecular systems with high accuracy, expanding the applicability of these methods beyond crystalline photocatalysts [95]. GNN models are typically trained via supervised learning (e.g., backpropagation), using known materials data (from either computations or experiments) as the training set. Through this process, the network’s millions of parameters adjust to encode chemical knowledge, yielding a model that can map atomic structure to target properties. One practical advantage of GNNs is that they eliminate the need for manual feature engineering. The model learns directly from connectivity and elemental attributes to construct an internal representation of the catalyst. This has proven powerful for complex materials where properties depend on subtle structural details. GNNs can capture effects of local geometry, bonding, and composition in a unified framework. Despite their strengths, standard GNNs produce point estimates and do not natively provide uncertainty quantification.
To address this, researchers often employ techniques like model ensembling, Monte Carlo dropout, or evidential regression on GNN outputs to estimate confidence intervals [88]. These measures are important in a scientific context to gauge the reliability of predictions (for instance, identifying when a predicted band gap or activity might be uncertain because a given material lies outside the training distribution). In summary, GNNs offer a powerful, structure-aware approach for learning complex relationships in photocatalyst materials data, making them ideal for high-throughput prediction tasks such as virtual screening of new semiconductor compositions or probing structure–activity trends.
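Ensemble-based uncertainty estimation, the simplest of the techniques just mentioned, can be sketched as follows. Here several independently trained models are emulated by perturbed linear predictors (an assumption made purely for illustration); in practice each ensemble member would be a GNN trained with a different random seed or data split.

```python
# Sketch of ensemble uncertainty: an "ensemble" of perturbed linear predictors
# stands in for independently trained GNNs. The spread of their predictions
# flags inputs where the model is unreliable.
import numpy as np

rng = np.random.default_rng(3)
x = np.array([0.2, 0.7])                              # descriptor of one material
ensemble_w = rng.normal(loc=[1.0, -0.5], scale=0.05, size=(10, 2))  # 10 "models"

preds = ensemble_w @ x            # 10 band-gap-like predictions for the same input
mean, std = preds.mean(), preds.std()   # report mean ± spread; large std = low trust
```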

4.4. Sparse Gaussian Process Regression for Modelling and Uncertainty

Gaussian Process Regression (GPR) is rooted in Bayesian statistics. This ML framework inherently provides predictions with uncertainty bounds, an attractive feature for guiding photocatalyst discovery, where knowing the confidence of a prediction is crucial [96]. A key advantage of GPR is its strong performance in data-scarce regimes. The model complexity grows flexibly with the data, and it does not require as large a dataset to begin making reasonable predictions. Indeed, a Gaussian process model has far fewer explicit parameters than a deep neural network, which helps avoid overfitting on small datasets and yields interpretable uncertainty estimates [96]. This makes GPR (and its extensions) highly relevant when only limited high-quality data on photocatalysts are available (for example, a few dozen measured quantum yields or reaction rates).
However, classical GPR comes with a computational cost that scales as O(N³) in the number of training points N, which becomes a bottleneck for larger datasets. Sparse Gaussian Process Regression (SGPR) techniques mitigate this issue by introducing approximations that make GPR tractable for larger N values without sacrificing much accuracy. One common strategy is to use a set of M representative inducing points (with M ≪ N) that summarise the behaviour of the full dataset, thereby reducing complexity to O(NM²). Another approach is the Bayesian Committee Machine (and its variants like the robust BCM), which partitions the dataset into subsets, trains a smaller GP model on each, and then combines their predictions in a principled Bayesian way. Such committee-based SGPR ensembles effectively perform a divide-and-conquer of the regression problem, enabling near-linear scaling with data and parallelisation of training. Recent research has demonstrated the power of these approaches in materials science: for instance, an SGPR-based “first-principles potential” was developed to model interatomic interactions across diverse systems (battery electrodes, solar cell materials, and catalysts) by training on quantum-mechanical calculations [97]. This work showed that a sparse GP, augmented with efficient rank-reduction techniques, can serve as a highly accurate surrogate for ab initio potential energy surfaces, enabling molecular dynamics and property predictions at a fraction of the direct DFT cost. Importantly, for photocatalysis, such GP-based potentials allow simulations of catalyst behaviour (surface reconstructions, reaction pathways, etc.) with quantified uncertainty, meaning the model can signal when a particular atomic configuration or reaction coordinate lies in an extrapolation regime (high uncertainty) and should be confirmed by higher-fidelity methods.
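The defining behaviour of GPR, tight uncertainty near training data and growing uncertainty in extrapolation, can be demonstrated on a one-dimensional toy function using scikit-learn's `GaussianProcessRegressor` (a common dense implementation; the sparse, inducing-point approximations discussed above are not shown here).

```python
# Minimal GPR sketch on a 1-D toy function: the predictive standard deviation
# is small between training points and grows far from them, the behaviour used
# to flag extrapolation in photocatalyst screening. Dense GPR, not sparse.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[0.0], [1.0], [2.0], [3.0]])   # training inputs
y = np.sin(X).ravel()                         # noise-free toy targets

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
mean_in, std_in = gpr.predict(np.array([[1.5]]), return_std=True)   # interpolation
mean_out, std_out = gpr.predict(np.array([[8.0]]), return_std=True) # extrapolation
```

`std_out` exceeding `std_in` is exactly the signal described above: a prediction at x = 8 lies outside the training distribution and should be confirmed by a higher-fidelity method.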
Gaussian Process Regression has demonstrated exceptional predictive capabilities in photocatalysis, with recent studies achieving R² values of 0.992 when coupled with genetic algorithms and particle swarm optimisation for predicting photocatalytic dye degradation based on band gap, dye concentration, photocatalyst dosage, and degradation rate constants [98]. GPR models have shown remarkable precision in predicting energy band gaps of anatase TiO2 photocatalysts, achieving correlation coefficients of 99.99% using lattice parameters and surface area as input features [99]. The exploration of various GPR kernel functions, including Matern, Exponential, Squared Exponential, and Rational Quadratic kernels, has revealed that the Exponential kernel can achieve a perfect R² of 1.0 for photocatalytic degradation predictions [100]. Machine learning models constructed from extensive datasets of 971 entries have successfully predicted hydrogen evolution reaction rates using active photon flux as a unifying feature, demonstrating GPR’s versatility in correlating operational parameters with photocatalytic performance [101]. GPR’s strength is its capacity to quantify uncertainty alongside predictions, which is especially useful for experimental design and optimisation in photocatalytic systems, where limited data often hinders model development.
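Comparing kernel functions, as in the studies above, amounts to swapping the kernel object and re-fitting. The sketch below compares three of the kernels mentioned (RBF, i.e. squared exponential; Matern; Rational Quadratic) on toy one-dimensional data; the resulting training-set scores carry no message about real photocatalytic datasets, where the best kernel must be chosen by cross-validation.

```python
# Illustrative kernel comparison for GPR on toy data. Which kernel wins is
# dataset-dependent; scores here are on synthetic data and purely schematic.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

X = np.linspace(0, 4, 20).reshape(-1, 1)
y = np.sin(2 * X).ravel()                 # noise-free toy targets

scores = {}
for name, kernel in [("RBF", RBF()),
                     ("Matern", Matern(nu=1.5)),
                     ("RationalQuadratic", RationalQuadratic())]:
    gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
    scores[name] = gpr.score(X, y)        # training-set R^2 for each kernel
```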

4.5. Machine Learning Interatomic Potentials (MLIPs)

MLIPs [102] represent a cutting-edge approach to modelling atomic interactions with near-quantum accuracy at a fraction of the computational cost. Traditional quantum–mechanical methods, such as DFT, are highly accurate but computationally expensive, limiting their use in large-scale simulations. MLIPs address this challenge by using ML models to approximate the potential energy surface (PES) of a system, enabling efficient and accurate predictions of energies and forces. The development of an MLIP begins with the generation of descriptors, numerical representations of atomic environments. These descriptors capture essential features, such as atomic positions and chemical identities, and serve as inputs to ML models like Gaussian Processes, Neural Networks, or Kernel Ridge Regression [103]. The trained MLIP can then predict the properties of new atomic configurations with high fidelity. Advanced techniques in MLIPs include active learning, which iteratively refines the training dataset by identifying configurations where the model is uncertain. This strategy ensures that the MLIP is trained on the most informative data points, reducing the need for extensive quantum–mechanical calculations. Additionally, hybrid MLIPs, which combine ML with physics-based models, improve transferability across diverse material systems.
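The descriptor-then-regressor pipeline just described can be sketched end to end with kernel ridge regression, one of the model families named above. Everything below is synthetic: the "DFT" energies are a made-up function of a deliberately crude pair-distance descriptor (real MLIPs use richer, symmetry-adapted descriptors and far larger training sets).

```python
# Toy MLIP pipeline: featurise atomic configurations with a crude pair-distance
# descriptor, then fit kernel ridge regression to synthetic "DFT" energies.
import numpy as np

rng = np.random.default_rng(4)

def descriptor(positions):
    """Sum of inverse pair distances: a crude, permutation-invariant feature."""
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    return np.array([np.sum(1.0 / d[np.triu_indices(len(positions), k=1)])])

configs = [rng.uniform(0, 3, size=(4, 3)) for _ in range(30)]   # 4-atom cells
X = np.vstack([descriptor(p) for p in configs])                 # (30, 1) features
E = -2.0 * X[:, 0] + 0.01 * rng.normal(size=30)                 # synthetic energies

# Kernel ridge regression with an RBF kernel, solved in closed form.
gamma, lam = 0.5, 1e-6
K = np.exp(-gamma * (X - X.T) ** 2)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), E)

def predict(x):
    """Predict the energy of a new configuration from its descriptor value."""
    return np.exp(-gamma * (x - X.ravel()) ** 2) @ alpha
```

Forces would follow by differentiating `predict` with respect to atomic positions; active learning would add configurations where the kernel model is least certain.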
The applications of MLIPs in materials science are vast [104,105,106,107]. They have recently been used in the prediction of photocatalyst materials; for instance, Allam et al. [108] delve into the challenges of optimising photocatalytic processes for water treatment, particularly in the presence of nontarget cosolutes such as natural organic matter. These substances can inhibit photocatalytic degradation, drastically reducing efficiency. The authors employed a combination of DFT and MLIPs to predict the inhibitory effects of small organic molecules during the TiO2 photocatalytic degradation of para-chlorobenzoic acid (pCBA). The study found that the spatial arrangement and electronic interactions of functional groups, rather than the functional group type alone, play a crucial role in influencing adsorption dynamics and inhibitory behaviour. By fine-tuning MLIPs derived from a larger dataset, the researchers were able to simulate adsorption behaviours over extended timescales, offering deeper insights into the dynamic interactions that govern photocatalytic reactions. This approach extended the capabilities of traditional ab initio molecular dynamics, enabling a more comprehensive understanding of molecular-level interactions. The authors concluded that MLIPs offer a robust framework for predicting the inhibitory effects of organic molecules, thus advancing photocatalytic technology for environmental applications.
Also, Hu et al. [109] investigated the band alignments at the CoO(100)–water and CoO(111)–water interfaces, which are crucial for photocatalytic water splitting. Cobalt monoxide (CoO) nanomaterials have attracted attention for their ability to perform water splitting without the need for an externally applied potential or co-catalyst. However, the precise roles of different CoO surfaces in the overall water splitting process were not fully understood. To address this, the authors employed a combination of ab initio molecular dynamics and ML-accelerated molecular dynamics simulations (MLIP). Their findings revealed that CoO(100) supports both the hydrogen evolution reaction (HER) and the oxygen evolution reaction (OER), while CoO(111) only facilitates HER. The intrinsic potential difference between the two surfaces promotes the migration of electrons toward CoO(100) and hole accumulation on CoO(111), which enhances the separation of photoexcited carriers and thus improves water splitting efficiency. By using MLIPs, the authors were able to speed up the simulation process and gain insights into the underlying mechanisms of photocatalysis. The study concluded that MLIP-based simulations could accelerate the understanding of complex surface interactions, offering a valuable tool for optimising photocatalytic materials for water splitting applications. By providing a balance between accuracy and efficiency, MLIPs are revolutionising computational materials science, making it possible to explore complex systems and phenomena that were previously computationally prohibitive.
Furthermore, the gradient-domain machine learning (GDML) approach is another ML paradigm that combines accuracy with computational efficiency when modelling molecular systems. GDML trains models on force data derived from coupled-cluster calculations, capturing detailed molecular interactions with high precision [110]. This method enables accurate simulations of molecular dynamics, crucial for understanding photocatalytic processes, where small changes in electronic structure and atomic configuration affect performance. Integrating ML with quantum chemistry can accelerate the design of efficient photocatalysts by enabling predictive modelling of reaction pathways and catalyst behaviour, bridging theory and experiment.
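The core idea of learning from forces and recovering the energy by integration can be illustrated in one dimension. The sketch below is conceptual only: a polynomial is fitted to forces computed from a Morse-like toy potential (standing in for coupled-cluster force data), and the energy is recovered up to a constant by integrating the fitted force; real GDML uses kernel models on full molecular geometries.

```python
# Conceptual, 1-D GDML-style sketch: fit a model to forces (the negative
# gradient of a Morse-like toy potential) and recover the energy, up to an
# additive constant, by integrating the fitted force model.
import numpy as np

x = np.linspace(0.8, 3.0, 40)
energy = (1 - np.exp(-(x - 1.2))) ** 2                 # toy reference potential
force = -np.gradient(energy, x, edge_order=2)          # "training" force data

coeffs = np.polyfit(x, force, deg=7)                   # fit the force field
force_fit = np.poly1d(coeffs)
energy_int = np.polyint(-force_fit)                    # energy = -integral of F dx
recovered = energy_int(x) - energy_int(x)[0] + energy[0]  # pin the constant
```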

5. Strategies for a Combination of ML Methods for Accelerated Photocatalyst Discovery

To accelerate the discovery of photocatalytic and energy materials, developing an effective ML framework requires a strategic approach that spans multiple stages (Figure 5). The process begins with careful dataset preparation, followed by feature engineering, which ensures that relevant material properties and reaction dynamics are captured. The application of diverse ML methods, such as supervised, unsupervised, and reinforcement learning, enables researchers to tackle various challenges in material discovery, and photocatalytic materials are no exception [111]. To optimise resource utilisation and speed up the discovery process, advanced techniques like transfer learning (leveraging pre-trained models from related tasks) and active learning (where the model actively selects the most informative data points for labelling) can significantly reduce computational costs and enhance model efficiency. Despite these advancements, several challenges remain. One is ensuring model interpretability, so that the decision-making process of ML models can be understood, especially when they are applied to complex material systems. Mitigating biases in training data and improving the scalability of models to handle larger datasets are also critical areas for improvement [28]. Future research should focus on integrating explainable AI (XAI) methods, which provide transparency and foster trust in ML models, making them more accessible for practical applications. Additionally, hybrid approaches that combine different learning paradigms, such as ensemble methods or multi-task learning, could offer improved performance by capturing a broader range of material behaviours and optimising multiple objectives simultaneously.
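The active-learning loop mentioned above (query the most uncertain candidate, label it, refit) can be sketched schematically. Here a toy analytic function stands in for the expensive DFT or experimental label, and an ensemble of bootstrapped quadratic fits emulates model disagreement; both are illustrative assumptions, not part of any cited framework.

```python
# Schematic active-learning loop: repeatedly label the pooled candidate with
# the largest ensemble disagreement, then refit. A toy function stands in for
# DFT/experiment; bootstrapped quadratic fits emulate model disagreement.
import numpy as np

rng = np.random.default_rng(5)
pool = np.linspace(0.0, 5.0, 200)        # candidate "materials" (1-D toy space)

def label(x):
    """Stand-in for an expensive DFT calculation or experiment."""
    return np.sin(x)

chosen = list(rng.choice(len(pool), size=4, replace=False))   # initial labels

for _ in range(6):                        # six acquisition rounds
    preds = []
    for _ in range(8):                    # small ensemble of quadratic fits
        sub = rng.choice(chosen, size=3, replace=False)
        c = np.polyfit(pool[sub], label(pool[sub]), deg=2)
        preds.append(np.polyval(c, pool))
    std = np.asarray(preds).std(axis=0)   # disagreement across the ensemble
    std[chosen] = -1.0                    # never re-query labelled candidates
    chosen.append(int(np.argmax(std)))    # acquire the most uncertain point
```

Each round spends the "labelling budget" where the surrogate is least certain, which is the mechanism by which active learning reduces the number of costly calculations or experiments.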

6. Future Perspective

The application of ML techniques across these studies has demonstrated transformative potential in predicting photocatalytic performance. Collectively, these studies emphasise the critical role of robust datasets, the careful selection of appropriate descriptors, and the integration of both experimental and computational data. These advancements highlight ML’s ability to accelerate materials discovery, reduce computational costs, and direct experimental efforts toward the development of next-generation perovskite materials. However, there is still significant room for further research, particularly in the use of neural networks for predicting material properties, an area that remains underexplored. The application of large language models (LLMs) and transformers also presents a promising avenue for enhancing materials design for photocatalytic applications. However, translating ML or computational predictions into experimentally validated materials remains challenging. A key limitation lies in the mismatch between idealised computational conditions and realistic experimental environments, such as variations in synthesis routes, surface morphology, and defect chemistry. Additionally, ML models trained solely on theoretical data often overlook thermodynamic stability, which is critical for experimental realisation. To bridge this gap, emerging frameworks now integrate active learning and closed-loop experimentation, where ML predictions guide synthesis, characterisation, and subsequent model refinement in an iterative manner [113]. The development of autonomous laboratories, combining high-throughput synthesis and in situ spectroscopy, further exemplifies this synergy. Nonetheless, ensuring data consistency across computational and experimental domains, as well as developing interpretable models that can reveal synthesis–structure–property relationships, remains an open challenge.
Addressing these issues will be important to achieving robust, experimentally validated ML-driven discovery of next-generation photocatalysts. These emerging techniques could offer new insights and capabilities, potentially revolutionising the development of advanced materials for sustainable energy solutions.

Author Contributions

Conceptualisation, D.O.O., S.B.A., S.A.A. and A.A.; writing—original draft preparation, D.O.O., S.B.A., S.A.A., A.M.U. and A.A.; writing—review and editing, D.O.O., S.B.A., S.A.A., M.O.K., E.O., A.M.U., S.K.R. and A.A.; supervision, A.M.U. and A.A.; project administration, A.A.; funding acquisition, D.O.O., A.M.U. and A.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge the funding support of Research Ireland granted to DOO with Project ID GOIPD/2021/28. SAA acknowledges Atlantic Technological University, Sligo, President Bursary Award for financial assistance. SBA appreciates the Modelling & Computation for Health and Society (MOCHAS) Group of Atlantic Technological University, Sligo, Ireland for financial support.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors acknowledge the use of Grammarly for grammar correction. All content modified through this tool was critically reviewed and edited by the authors. The authors accept full responsibility for the integrity, accuracy, and originality of the final manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ninduwezuor-Ehiobu, N.; Tula, O.A.; Daraojimba, C.; Ofonagoro, K.A.; Ogunjobi, O.A.; Gidiagba, J.O.; Egbokhaebho, B.A.; Banso, A.A. Exploring innovative material integration in modern manufacturing for advancing U.S. competitiveness in sustainable global economy. Eng. Sci. Technol. J. 2023, 4, 140–168.
  2. Callister, W.D., Jr.; Rethwisch, D.G. Materials Science and Engineering: An Introduction; John Wiley & Sons: Hoboken, NJ, USA, 2020; ISBN 978-1-119-72177-2.
  3. Askari, N.; Jamalzadeh, M.; Askari, A.; Liu, N.; Samali, B.; Sillanpaa, M.; Sheppard, L.; Li, H.; Dewil, R. Unveiling the Photocatalytic Marvels: Recent Advances in Solar Heterojunctions for Environmental Remediation and Energy Harvesting. J. Environ. Sci. 2025, 148, 283–297.
  4. Liu, J.; Li, S.; Dewil, R.; Vanierschot, M.; Baeyens, J.; Deng, Y. Water Splitting by MnOx/Na2CO3 Reversible Redox Reactions. Sustainability 2022, 14, 7597.
  5. Wang, Q.; Pornrungroj, C.; Linley, S.; Reisner, E. Strategies to Improve Light Utilization in Solar Fuel Synthesis. Nat. Energy 2022, 7, 13–24.
  6. Ganguly, P.; Mathew, S.; Clarizia, L.; Kumar, S.R.; Akande, A.; Hinder, S.; Breen, A.; Pillai, S.C. Theoretical and Experimental Investigation of Visible Light Responsive AgBiS2-TiO2 Heterojunctions for Enhanced Photocatalytic Applications. Appl. Catal. B Environ. 2019, 253, 401–418.
  7. Chen, X.; Shen, S.; Guo, L.; Mao, S.S. Semiconductor-Based Photocatalytic Hydrogen Generation. Chem. Rev. 2010, 110, 6503–6570.
  8. Xu, Y.; Schoonen, M.A.A. The Absolute Energy Positions of Conduction and Valence Bands of Selected Semiconducting Minerals. Am. Mineral. 2000, 85, 543–556.
  9. Kudo, A.; Miseki, Y. Heterogeneous Photocatalyst Materials for Water Splitting. Chem. Soc. Rev. 2009, 38, 253–278.
  10. Zhao, Y.; Chen, G.; Bian, T.; Zhou, C.; Waterhouse, G.I.N.; Wu, L.-Z.; Tung, C.-H.; Smith, L.J.; O’Hare, D.; Zhang, T. Defect-Rich Ultrathin ZnAl-Layered Double Hydroxide Nanosheets for Efficient Photoreduction of CO2 to CO with Water. Adv. Mater. 2015, 27, 7824–7831.
  11. Montoya, J.H.; Seitz, L.C.; Chakthranont, P.; Vojvodic, A.; Jaramillo, T.F.; Nørskov, J.K. Materials for Solar Fuels and Chemicals. Nat. Mater. 2017, 16, 70–81.
  12. Peterson, A.A.; Abild-Pedersen, F.; Studt, F.; Rossmeisl, J.; Nørskov, J.K. How Copper Catalyzes the Electroreduction of Carbon Dioxide into Hydrocarbon Fuels. Energy Environ. Sci. 2010, 3, 1311–1315.
  13. Paumo, H.K.; Dalhatou, S.; Katata-Seru, L.M.; Kamdem, B.P.; Tijani, J.O.; Vishwanathan, V.; Kane, A.; Bahadur, I. TiO2 Assisted Photocatalysts for Degradation of Emerging Organic Pollutants in Water and Wastewater. J. Mol. Liq. 2021, 331, 115458.
  14. Almaie, S.; Vatanpour, V.; Rasoulifard, M.H.; Koyuncu, I. Volatile Organic Compounds (VOCs) Removal by Photocatalysts: A Review. Chemosphere 2022, 306, 135655.
  15. Fermoso, J.; Sánchez, B.; Suarez, S. 5—Air Purification Applications Using Photocatalysis. In Nanostructured Photocatalysts; Boukherroub, R., Ogale, S.B., Robertson, N., Eds.; Micro and Nano Technologies; Elsevier: Amsterdam, The Netherlands, 2020; pp. 99–128. ISBN 978-0-12-817836-2.
  16. Peiris, S.; de Silva, H.B.; Ranasinghe, K.N.; Bandara, S.V.; Perera, I.R. Recent Development and Future Prospects of TiO2 Photocatalysis. J. Chin. Chem. Soc. 2021, 68, 738–769.
  17. Schanze, K.S.; Kamat, P.V.; Yang, P.; Bisquert, J. Progress in Perovskite Photocatalysis. ACS Energy Lett. 2020, 5, 2602–2604.
  18. Guo, J.; Li, X.; Liang, J.; Yuan, X.; Jiang, L.; Yu, H.; Sun, H.; Zhu, Z.; Ye, S.; Tang, N.; et al. Fabrication and Regulation of Vacancy-Mediated Bismuth Oxyhalide towards Photocatalytic Application: Development Status and Tendency. Coord. Chem. Rev. 2021, 443, 214033.
  19. Gusarov, S. Advances in Computational Methods for Modeling Photocatalytic Reactions: A Review of Recent Developments. Materials 2024, 17, 2119.
  20. Schleder, G.R.; Padilha, A.C.M.; Acosta, C.M.; Costa, M.; Fazzio, A. From DFT to Machine Learning: Recent Approaches to Materials Science—A Review. J. Phys. Mater. 2019, 2, 032001.
  21. Kang, J.; Zhang, X.; Wei, S.-H. Advances and Challenges in DFT-Based Energy Materials Design. Chin. Phys. B 2022, 31, 107105.
  22. Wei, J.; Chu, X.; Sun, X.-Y.; Xu, K.; Deng, H.-X.; Chen, J.; Wei, Z.; Lei, M. Machine Learning in Materials Science. InfoMat 2019, 1, 338–358.
  23. Tunala, S.; Zhai, S.; Wu, F.; Chen, Y.-H. Machine Learning in Photocatalysis: Accelerating Design, Understanding, and Environmental Applications. Sci. China Chem. 2025, 68, 3415–3428.
  24. Rehman, A.; Iqbal, M.A.; Haider, M.T.; Majeed, A. Artificial Intelligence-Guided Supervised Learning Models for Photocatalysis in Wastewater Treatment. AI 2025, 6, 258.
  25. Kumar, R.; Singh, A.K. Chemical Hardness-Driven Interpretable Machine Learning Approach for Rapid Search of Photocatalysts. npj Comput. Mater. 2021, 7, 197.
  26. Chen, S.; Huang, D.; Xu, P.; Xue, W.; Lei, L.; Cheng, M.; Wang, R.; Liu, X.; Deng, R. Semiconductor-Based Photocatalysts for Photocatalytic and Photoelectrochemical Water Splitting: Will We Stop with Photocorrosion? J. Mater. Chem. A 2020, 8, 2286–2322.
  27. Zhong, M.; Tran, K.; Min, Y.; Wang, C.; Wang, Z.; Dinh, C.-T.; Luna, P.D.; Yu, Z.; Rasouli, A.S.; Brodersen, P.; et al. Accelerated Discovery of CO2 Electrocatalysts Using Active Machine Learning. Nature 2020, 581, 178–183.
  28. Ge, L.; Ke, Y.; Li, X. Machine Learning Integrated Photocatalysis: Progress and Challenges. Chem. Commun. 2023, 59, 5795–5806.
  29. Raccuglia, P.; Elbert, K.C.; Adler, P.D.F.; Falk, C.; Wenny, M.B.; Mollo, A.; Zeller, M.; Friedler, S.A.; Schrier, J.; Norquist, A.J. Machine-Learning-Assisted Materials Discovery Using Failed Experiments. Nature 2016, 533, 73–76.
  30. Liu, X.; Li, Y.; Zhang, X.; Zhao, Y.-M.; Wang, X.; Zhou, J.; Shen, J.; Zhou, M.; Shen, L. High-Throughput Computation and Machine Learning Screening of van Der Waals Heterostructures for Z-Scheme Photocatalysis. J. Mater. Chem. A 2025, 13, 5649–5660.
  31. Sabagh Moeini, A.; Shariatmadar Tehrani, F.; Naeimi-Sadigh, A. Machine Learning-Enhanced Band Gaps Prediction for Low-Symmetry Double and Layered Perovskites. Sci. Rep. 2024, 14, 26736.
  32. Singh, A.N.; Anand, R.; Zafari, M.; Ha, M.; Kim, K.S. Progress in Single/Multi Atoms and 2D-Nanomaterials for Electro/Photocatalytic Nitrogen Reduction: Experimental, Computational and Machine Learning Developments. Adv. Energy Mater. 2024, 14, 2304106.
  33. Sbailò, L.; Fekete, Á.; Ghiringhelli, L.M.; Scheffler, M. The NOMAD Artificial-Intelligence Toolkit: Turning Materials-Science Data into Knowledge and Understanding. npj Comput. Mater. 2022, 8, 250.
  34. Jain, A.; Ong, S.P.; Hautier, G.; Chen, W.; Richards, W.D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G. Commentary: The Materials Project: A Materials Genome Approach to Accelerating Materials Innovation. APL Mater. 2013, 1, 011002.
  35. Wang, T.; Zhang, K.; Thé, J.; Yu, H. Accurate Prediction of Band Gap of Materials Using Stacking Machine Learning Model. Comput. Mater. Sci. 2022, 201, 110899.
  36. Pilania, G.; Balachandran, P.V.; Gubernatis, J.E.; Lookman, T. Learning with Large Databases. In Data-Based Methods for Materials Design and Discovery: Basic Ideas and General Methods; Pilania, G., Balachandran, P.V., Gubernatis, J.E., Lookman, T., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 59–86. ISBN 978-3-031-02383-5.
  37. Takanabe, K. Photocatalytic Water Splitting: Quantitative Approaches toward Photocatalyst by Design. ACS Catal. 2017, 7, 8006–8022.
  38. Cai, J.; Chu, X.; Xu, K.; Li, H.; Wei, J. Machine Learning-Driven New Material Discovery. Nanoscale Adv. 2020, 2, 3115–3130.
  39. Jacobsson, T.J.; Hultqvist, A.; García-Fernández, A.; Anand, A.; Al-Ashouri, A.; Hagfeldt, A.; Crovetto, A.; Abate, A.; Ricciardulli, A.G.; Vijayan, A.; et al. An Open-Access Database and Analysis Tool for Perovskite Solar Cells Based on the FAIR Data Principles. Nat. Energy 2022, 7, 107–115.
  40. Shon, Y.J.; Min, K. Extracting Chemical Information from Scientific Literature Using Text Mining: Building an Ionic Conductivity Database for Solid-State Electrolytes. ACS Omega 2023, 8, 18122–18127.
  41. Mannodi-Kanakkithodi, A.; Chan, M.K.Y. Data-Driven Design of Novel Halide Perovskite Alloys. Energy Environ. Sci. 2022, 15, 1930–1949.
  42. Cheng, G.; Gong, X.G.; Yin, W.J. Crystal Structure Prediction by Combining Graph Network and Optimization Algorithm. Nat. Commun. 2022, 13, 1492.
  43. Deekshith, A. Data Engineering for AI: Optimizing Data Quality and Accessibility for Machine Learning Models. Int. J. Manag. Educ. Sustain. Dev. 2021, 4, 1–33.
  44. Ma, B.; Wei, X.; Liu, C.; Ban, X.; Huang, H.; Wang, H.; Xue, W.; Wu, S.; Gao, M.; Shen, Q.; et al. Data Augmentation in Microscopic Images for Material Data Mining. npj Comput. Mater. 2020, 6, 125.
  45. Damewood, J.; Karaguesian, J.; Lunger, J.R.; Tan, A.R.; Xie, M.; Peng, J.; Gómez-Bombarelli, R. Representations of Materials for Machine Learning. Annu. Rev. Mater. Res. 2023, 53, 399–426.
  46. Li, S.; Liu, Y.; Chen, D.; Jiang, Y.; Nie, Z.; Pan, F. Encoding the Atomic Structure for Machine Learning in Materials Science. WIREs Comput. Mol. Sci. 2022, 12, e1558.
  47. Fitriani, S.A.; Astuti, Y.; Wulandari, I.R. Least Absolute Shrinkage and Selection Operator (LASSO) and k-Nearest Neighbors (k-NN) Algorithm Analysis Based on Feature Selection for Diamond Price Prediction. In Proceedings of the 2021 International Seminar on Machine Learning, Optimization, and Data Science (ISMODE), Jakarta, Indonesia, 29–30 January 2022; pp. 135–139.
  48. Hasan, B.M.S.; Abdulazeez, A.M. A Review of Principal Component Analysis Algorithm for Dimensionality Reduction. J. Soft Comput. Data Min. 2021, 2, 20–30.
  49. Silva, R.; Melo-Pinto, P. T-SNE: A Study on Reducing the Dimensionality of Hyperspectral Data for the Regression Problem of Estimating Oenological Parameters. Artif. Intell. Agric. 2023, 7, 58–68. [Google Scholar] [CrossRef]
  50. Zebari, R.; Abdulazeez, A.; Zeebaree, D.; Zebari, D.; Saeed, J. A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction. J. Appl. Sci. Technol. Trends 2020, 1, 56–70. [Google Scholar] [CrossRef]
  51. Li, H.; Chutatape, O. Automated Feature Extraction in Color Retinal Images by a Model Based Approach. IEEE Trans. Biomed. Eng. 2004, 51, 246–254. [Google Scholar] [CrossRef]
  52. Bischl, B.; Binder, M.; Lang, M.; Pielok, T.; Richter, J.; Coors, S.; Thomas, J.; Ullmann, T.; Becker, M.; Boulesteix, A.-L.; et al. Hyperparameter Optimization: Foundations, Algorithms, Best Practices, and Open Challenges. WIREs Data Min. Knowl. Discov. 2023, 13, e1484. Available online: https://wires.onlinelibrary.wiley.com/doi/full/10.1002/widm.1484 (accessed on 1 January 2025). [CrossRef]
  53. Yu, T.; Zhu, H. Hyper-Parameter Optimization: A Review of Algorithms and Applications. arXiv 2020, arXiv:2003.05689. [Google Scholar] [CrossRef]
  54. Ali, Y.A.; Awwad, E.M.; Al-Razgan, M.; Maarouf, A. Hyperparameter Search for Machine Learning Algorithms for Optimizing the Computational Complexity. Processes 2023, 11, 349. [Google Scholar] [CrossRef]
  55. Jiang, T.; Gradus, J.L.; Rosellini, A.J. Supervised Machine Learning: A Brief Primer. Behav. Ther. 2020, 51, 675–687. [Google Scholar] [CrossRef]
  56. Gentleman, R.; Carey, V.J. Unsupervised Machine Learning. In Bioconductor Case Studies; Hahne, F., Huber, W., Gentleman, R., Falcon, S., Eds.; Springer: New York, NY, USA, 2008; pp. 137–157. ISBN 978-0-387-77240-0. [Google Scholar]
  57. Taha, K. Semi-Supervised and Un-Supervised Clustering: A Review and Experimental Evaluation. Inf. Syst. 2023, 114, 102178. [Google Scholar] [CrossRef]
  58. Shakya, A.K.; Pillai, G.; Chakrabarty, S. Reinforcement Learning Algorithms: A Brief Survey. Expert Syst. Appl. 2023, 231, 120495. [Google Scholar] [CrossRef]
  59. Dulac-Arnold, G.; Levine, N.; Mankowitz, D.J.; Li, J.; Paduraru, C.; Gowal, S.; Hester, T. Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis. Mach. Learn. 2021, 110, 2419–2468. [Google Scholar] [CrossRef]
  60. Pan, S.J. Transfer Learning. In Data Classification; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014; ISBN 978-0-429-10263-9. [Google Scholar]
  61. Niu, S.; Liu, Y.; Wang, J.; Song, H. A Decade Survey of Transfer Learning (2010–2020). IEEE Trans. Artif. Intell. 2020, 1, 151–166. [Google Scholar] [CrossRef]
  62. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2021, 109, 43–76. [Google Scholar] [CrossRef]
  63. Yannier, N.; Hudson, S.E.; Koedinger, K.R.; Hirsh-Pasek, K.; Golinkoff, R.M.; Munakata, Y.; Brownell, S.E. Active Learning: “Hands-on” Meets “Minds-On”. Science 2021, 374, 26–30. [Google Scholar] [CrossRef]
  64. Novikov, I.S.; Gubaev, K.; Podryabinkin, E.V.; Shapeev, A.V. The MLIP Package: Moment Tensor Potentials with MPI and Active Learning. Mach. Learn. Sci. Technol. 2020, 2, 025002. [Google Scholar] [CrossRef]
  65. Miller, B.; Linder, F.; Mebane, W.R., Jr. Active Learning Approaches for Labeling Text: Review and Assessment of the Performance of Active Learning Approaches. Political Anal. 2020, 28, 532–551. [Google Scholar] [CrossRef]
  66. Zounemat-Kermani, M.; Batelaan, O.; Fadaee, M.; Hinkelmann, R. Ensemble Machine Learning Paradigms in Hydrology: A Review. J. Hydrol. 2021, 598, 126266. [Google Scholar] [CrossRef]
  67. Syam, N.; Kaul, R. Random Forest, Bagging, and Boosting of Decision Trees. In Machine Learning and Artificial Intelligence in Marketing and Sales: Essential Reference for Practitioners and Data Scientists; Emerald Publishing Limited: Leeds, UK, 2021; pp. 139–182. [Google Scholar]
  68. Okafor, E.; Obada, D.O.; Dodoo-Arhin, D. Ensemble Learning Prediction of Transmittance at Different Wavenumbers in Natural Hydroxyapatite. Sci. Afr. 2020, 9, e00516. [Google Scholar] [CrossRef]
  69. Moharramzadeh Goliaei, E. Photocatalytic Efficiency for CO2 Reduction of Co and Cluster Co2O2 Supported on g-C3N4: A Density Functional Theory and Machine Learning Study. Langmuir 2024, 40, 7871–7882. [Google Scholar] [CrossRef] [PubMed]
  70. Xu, J.; Wang, Q.; Yuan, Q.; Chen, H.; Wang, S.; Fan, Y. Machine Learning Predictions of Band Gap and Band Edge for (GaN)1−x(ZnO)x Solid Solution Using Crystal Structure Information. J. Mater. Sci. 2023, 58, 7986–7994. [Google Scholar] [CrossRef]
  71. Cheng, S.; Sun, Z.; Lim, K.H.; Gani, T.Z.H.; Zhang, T.; Wang, Y.; Yin, H.; Liu, K.; Guo, H.; Du, T.; et al. Emerging Strategies for CO2 Photoreduction to CH4: From Experimental to Data-Driven Design. Adv. Energy Mater. 2022, 12, 2200389. [Google Scholar] [CrossRef]
  72. Yan, L.; Zhong, S.; Igou, T.; Gao, H.; Li, J.; Chen, Y. Development of Machine Learning Models to Enhance Element-Doped g-C3N4 Photocatalyst for Hydrogen Production through Splitting Water. Int. J. Hydrogen Energy 2022, 47, 34075–34089. [Google Scholar] [CrossRef]
  73. Liu, Y.; Wang, X.; Zhao, Y.; Wu, Q.; Nie, H.; Si, H.; Huang, H.; Liu, Y.; Shao, M.; Kang, Z. Highly Efficient Metal-Free Catalyst from Cellulose for Hydrogen Peroxide Photoproduction Instructed by Machine Learning and Transient Photovoltage Technology. Nano Res. 2022, 15, 4000–4007. [Google Scholar] [CrossRef]
  74. Arabacı, B.; Bakır, R.; Orak, C.; Yüksel, A. Integrating Experimental and Machine Learning Approaches for Predictive Analysis of Photocatalytic Hydrogen Evolution Using Cu/g-C3N4. Renew. Energy 2024, 237, 121737. [Google Scholar] [CrossRef]
  75. Zong, J.; He, C.; Zhang, W.; Bai, M. Transition Metals Anchored on Two-Dimensional p-BN Support with Center-Coordination Scaling Relationship Descriptor for Spontaneous Visible-Light-Driven Photocatalytic Nitrogen Reduction. J. Colloid Interface Sci. 2023, 652, 878–889. [Google Scholar] [CrossRef]
  76. Zhao, Z.; Shen, Y.; Zhu, H.; Zhang, Q.; Zhang, Y.; Yang, X.; Liang, P.; Chen, L. Prediction Model of Type and Band Gap for Photocatalytic g-GaN-Based van Der Waals Heterojunction of Density Functional Theory and Machine Learning Techniques. Appl. Surf. Sci. 2023, 640, 158400. [Google Scholar] [CrossRef]
  77. Biswas, M.; Desai, R.; Mannodi-Kanakkithodi, A. Screening of Novel Halide Perovskites for Photocatalytic Water Splitting Using Multi-Fidelity Machine Learning. Phys. Chem. Chem. Phys. 2024, 26, 23177–23188. [Google Scholar] [CrossRef]
  78. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 6999–7019. [Google Scholar] [CrossRef]
  79. Thakur, A.; Konde, A. Fundamentals of Neural Networks. Int. J. Res. Appl. Sci. Eng. Technol. 2021, 9, 407–426. [Google Scholar] [CrossRef]
  80. Said, A.M.; Ibrahim, F.S. Comparative Study of Segmentation Techniques for Detection of Tumors Based on MRI Brain Images. Int. J. Biosci. Biochem. Bioinform. 2017, 8, 1–10. [Google Scholar] [CrossRef]
  81. Okasha, N.M.; Mirrashid, M.; Naderpour, H.; Ciftcioglu, A.O.; Meddage, D.P.P.; Ezami, N. Machine Learning Approach to Predict the Mechanical Properties of Cementitious Materials Containing Carbon Nanotubes. Dev. Built Environ. 2024, 19, 100494. [Google Scholar] [CrossRef]
  82. Obada, D.O.; Okafor, E.; Abolade, S.A.; Ukpong, A.M.; Dodoo-Arhin, D.; Akande, A. Explainable Machine Learning for Predicting the Band Gaps of ABX3 Perovskites. Mater. Sci. Semicond. Process. 2023, 161, 107427. [Google Scholar] [CrossRef]
  83. Saidi, W.A.; Shadid, W.; Castelli, I.E. Machine-Learning Structural and Electronic Properties of Metal Halide Perovskites Using a Hierarchical Convolutional Neural Network. npj Comput. Mater. 2020, 6, 36. [Google Scholar] [CrossRef]
  84. Jiang, J.; Chen, M.; Fan, J.A. Deep Neural Networks for the Evaluation and Design of Photonic Devices. Nat. Rev. Mater. 2021, 6, 679–700. [Google Scholar] [CrossRef]
  85. Ketkar, N.; Moolayil, J. Convolutional Neural Networks. In Deep Learning with Python: Learn Best Practices of Deep Learning Models with PyTorch; Ketkar, N., Moolayil, J., Eds.; Apress: Berkeley, CA, USA, 2021; pp. 197–242. ISBN 978-1-4842-5364-9. [Google Scholar]
  86. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph Neural Networks: A Review of Methods and Applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  87. Tao, Q.; Lu, T.; Sheng, Y.; Li, L.; Lu, W.; Li, M. Machine Learning Aided Design of Perovskite Oxide Materials for Photocatalytic Water Splitting. J. Energy Chem. 2021, 60, 351–359. [Google Scholar] [CrossRef]
  88. Ding, R.; Chen, J.; Chen, Y.; Liu, J.; Bando, Y.; Wang, X. Unlocking the Potential: Machine Learning Applications in Electrocatalyst Design for Electrochemical Hydrogen Energy Transformation. Chem. Soc. Rev. 2024, 53, 11390–11461. [Google Scholar] [CrossRef]
  89. Pandey, S.; Qu, J.; Stevanović, V.; John, P.S.; Gorai, P. Predicting Energy and Stability of Known and Hypothetical Crystals Using Graph Neural Network. Patterns 2021, 2, 100361. [Google Scholar] [CrossRef] [PubMed]
  90. Zhang, Z. Automated Graph Neural Networks Accelerate the Screening of Optoelectronic Properties of Metal–Organic Frameworks. J. Phys. Chem. Lett. 2023, 14, 1239–1245. [Google Scholar] [CrossRef] [PubMed]
  91. Haghshenas, Y.; Wong, W.P.; Sethu, V.; Amal, R.; Kumar, P.V.; Teoh, W.Y. Full Prediction of Band Potentials in Semiconductor Materials. Mater. Today Phys. 2024, 46, 101519. [Google Scholar] [CrossRef]
  92. Solout, M.V.; Ghasemi, J.B. Predicting Photodegradation Rate Constants of Water Pollutants on TiO2 Using Graph Neural Network and Combined Experimental-Graph Features. Sci. Rep. 2025, 15, 19156. [Google Scholar] [CrossRef]
  93. Chen, Z.; Hu, W.-J.; Xu, H.-K.; Xu, X.-F.; Chen, X.-Y. Multi-Task Regression Model for Predicting Photocatalytic Performance of Inorganic Materials. Catalysts 2025, 15, 681. [Google Scholar] [CrossRef]
  94. Belkhode, P.N.; Awatade, S.M.; Prakash, C.; Shelare, S.D.; Marghade, D.; Gajghate, S.S.; Noor, M.M.; Dennison, M.S. An Integrated AI-Driven Framework for Maximizing the Efficiency of Heterostructured Nanomaterials in Photocatalytic Hydrogen Production. Sci. Rep. 2025, 15, 24936. [Google Scholar] [CrossRef]
  95. Choi, J.Y.; Zhang, P.; Mehta, K.; Blanchard, A.; Lupo Pasini, M. Scalable Training of Graph Convolutional Neural Networks for Fast and Accurate Predictions of HOMO-LUMO Gap in Molecules. J. Cheminform. 2022, 14, 70. [Google Scholar] [CrossRef]
  96. Zhang, Y.; Yang, X.; Zhang, C.; Zhang, Z.; Su, A.; She, Y.-B. Exploring Bayesian Optimization for Photocatalytic Reduction of CO2. Processes 2023, 11, 2614. [Google Scholar] [CrossRef]
  97. Willow, S.Y.; Hajibabaei, A.; Ha, M.; Yang, D.C.; Myung, C.W.; Min, S.K.; Lee, G.; Kim, K.S. Sparse Gaussian Process Based Machine Learning First Principles Potentials for Materials Simulations: Application to Batteries, Solar Cells, Catalysts, and Macromolecular Systems. Chem. Phys. Rev. 2024, 5, 041307. [Google Scholar] [CrossRef]
  98. Ali, H.; Yasir, M.; Haq, H.U.; Guler, A.C.; Masar, M.; Khan, M.N.A.; Machovsky, M.; Sedlarik, V.; Kuritka, I. Machine Learning Approach for Photocatalysis: An Experimentally Validated Case Study of Photocatalytic Dye Degradation. J. Environ. Manag. 2025, 386, 125683. [Google Scholar] [CrossRef]
  99. Zhang, Y.; Xu, X. Machine Learning Band Gaps of Doped-TiO2 Photocatalysts from Structural and Morphological Parameters. ACS Omega 2020, 5, 15344–15352. [Google Scholar] [CrossRef]
  100. Salahshoori, I.; Yazdanbakhsh, A.; Baghban, A. Machine Learning-Powered Estimation of Malachite Green Photocatalytic Degradation with NML-BiFeO3 Composites. Sci. Rep. 2024, 14, 8676. [Google Scholar] [CrossRef] [PubMed]
  101. Haghshenas, Y.; Ping Wong, W.; Gunawan, D.; Khataee, A.; Keyikoğlu, R.; Razmjou, A.; Vijaya Kumar, P.; Ying Toe, C.; Masood, H.; Amal, R.; et al. Predicting the Rates of Photocatalytic Hydrogen Evolution over Cocatalyst-Deposited TiO2 Using Machine Learning with Active Photon Flux as a Unifying Feature. EES Catal. 2024, 2, 612–623. [Google Scholar] [CrossRef]
  102. Anstine, D.M.; Isayev, O. Machine Learning Interatomic Potentials and Long-Range Physics. J. Phys. Chem. A 2023, 127, 2417–2431. [Google Scholar] [CrossRef]
  103. Kulichenko, M.; Nebgen, B.; Lubbers, N.; Smith, J.S.; Barros, K.; Allen, A.E.A.; Habib, A.; Shinkle, E.; Fedik, N.; Li, Y.W.; et al. Data Generation for Machine Learning Interatomic Potentials and Beyond. Chem. Rev. 2024, 124, 13681–13714. [Google Scholar] [CrossRef]
  104. Wan, K.; He, J.; Shi, X. Construction of High Accuracy Machine Learning Interatomic Potential for Surface/Interface of Nanomaterials—A Review. Adv. Mater. 2024, 36, 2305758. [Google Scholar] [CrossRef]
  105. Shayestehpour, O.; Zahn, S. Efficient Molecular Dynamics Simulations of Deep Eutectic Solvents with First-Principles Accuracy Using Machine Learning Interatomic Potentials. J. Chem. Theory Comput. 2023, 19, 8732–8742. [Google Scholar] [CrossRef]
  106. Mortazavi, B.; Zhuang, X.; Rabczuk, T.; Shapeev, A.V. Atomistic Modeling of the Mechanical Properties: The Rise of Machine Learning Interatomic Potentials. Mater. Horiz. 2023, 10, 1956–1968. [Google Scholar] [CrossRef] [PubMed]
  107. Zhu, Y.; Dong, E.; Yang, H.; Xi, L.; Yang, J.; Zhang, W. Atomic Potential Energy Uncertainty in Machine-Learning Interatomic Potentials and Thermal Transport in Solids with Atomic Diffusion. Phys. Rev. B 2023, 108, 014108. [Google Scholar] [CrossRef]
  108. Allam, O.; Maghsoodi, M.; Jang, S.S.; Snow, S.D. Unveiling Competitive Adsorption in TiO2 Photocatalysis through Machine-Learning-Accelerated Molecular Dynamics, DFT, and Experimental Methods. ACS Appl. Mater. Interfaces 2024, 16, 36215–36223. [Google Scholar] [CrossRef] [PubMed]
  109. Hu, J.-Y.; Zhuang, Y.-B.; Cheng, J. Band Alignment of CoO(100)–Water and CoO(111)–Water Interfaces Accelerated by Machine Learning Potentials. J. Chem. Phys. 2024, 161, 134110. [Google Scholar] [CrossRef] [PubMed]
  110. Sauceda, H.E.; Chmiela, S.; Poltavsky, I.; Müller, K.-R.; Tkatchenko, A. Molecular Force Fields with Gradient-Domain Machine Learning: Construction and Application to Dynamics of Small Molecules with Coupled Cluster Forces. J. Chem. Phys. 2019, 150, 114102. [Google Scholar] [CrossRef] [PubMed]
  111. Jaison, A.; Mohan, A.; Lee, Y.-C. Machine Learning-Enhanced Photocatalysis for Environmental Sustainability: Integration and Applications. Mater. Sci. Eng. R Rep. 2024, 161, 100880. [Google Scholar] [CrossRef]
  112. Yang, X.; Zhou, K.; He, X.; Zhang, L. Methods and Applications of Machine Learning in Computational Design of Optoelectronic Semiconductors. Sci. China Mater. 2024, 67, 1042–1081. [Google Scholar] [CrossRef]
  113. Chow, V.; Phan, R.C.-W.; Ngo, A.C.L.; Krishnasamy, G.; Chai, S.-P. Data-Driven Photocatalytic Degradation Activity Prediction with Gaussian Process. Process Saf. Environ. Prot. 2022, 161, 848–859. [Google Scholar] [CrossRef]
Figure 1. Schematic illustration demonstrating the impactful applications of photocatalysis.
Figure 2. The four science paradigms: empirical, theoretical, computational, and data-driven. Each paradigm both benefits from and contributes to the others. Reproduced with permission from [20]. IOP SCIENCE.
Figure 3. General workflow of ML.
Figure 4. Architecture of a simple neural network. Adapted from [80].
Figure 5. Different strategies, tasks, and models for machine learning. Reproduced with permission from [112]. SpringerNature.
Table 1. Computational materials databases with description, data size, link, and institution. Adapted from [38].
Database | Description | Data Size | Link | Institution
AFLOW | Computational database of materials | 3.5 m | http://aflowlib.org | Duke University
Materials Project | Computational database of materials | 154 k | https://materialsproject.org | U.S. Department of Energy
OQMD | Computational database of materials | 1 m | http://oqmd.org | Northwestern University
CSD | Database of organic and inorganic materials searched from previous journal publications | 504 k | http://crystallography.net | University of Cambridge
NOMAD | Novel materials discovery project | 12 m | https://nomad-lab.eu/prod/rae/gui/search | Humboldt-Universität zu Berlin
Materials Cloud | A platform for open computational science | 29 m | https://www.materialscloud.org | École Polytechnique Fédérale de Lausanne
CEP | Harvard clean energy project | 2 m | http://cleanenergy.harvard.edu | Harvard University
OMDB | An electronic structure database for various organic and organometallic materials | 12.5 k | https://omdb.mathub.io | KTH Royal Institute of Technology and Stockholm University
PubChem | An open chemistry database for small molecules | 115 m | https://pubchem.ncbi.nlm.nih.gov | National Institutes of Health (NIH)
NREL MatDB | Computational materials database for renewable energy applications | 20 k | https://materials.nrel.gov | National Renewable Energy Laboratory
All links accessed on 4 November 2025.
Obada, D.O.; Akinpelu, S.B.; Abolade, S.A.; Kekung, M.O.; Okafor, E.; Kumar R, S.; Ukpong, A.M.; Akande, A. Machine Learning for Photocatalytic Materials Design and Discovery. Crystals 2025, 15, 1034. https://doi.org/10.3390/cryst15121034
