4.1. Prediction Accuracy
The proposed machine learning method achieved excellent prediction performance. Predictions were possible both with and without computational chemistry, which indicates that machine learning can be useful for understanding antioxidant capacity. Because various explanatory variables can be used, machine learning offers considerable flexibility to achieve the prediction accuracy that the user's objectives require, which is critical for practical applications. Each bit of the Morgan fingerprint takes a value of either 0 or 1, and only substructures extending at most two bonds from a central atom are used as features. Consequently, the fingerprint does not reflect the length of a conjugated system and cannot count how many times the same substructure occurs in a molecule. For these reasons, the predictions using the Morgan fingerprint were less accurate than those using molecular descriptors. LASSO regression was an exception to this trend, suggesting that, for LASSO, a larger number of variables can improve prediction accuracy. DNNs performed poorly on both datasets in this study, indicating that a simple DNN is not suitable for predicting antioxidant capacity from a small amount of data.
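The limitation of binary fingerprints described above can be illustrated with a minimal, library-free sketch. It assumes that each atom environment (up to two bonds from the central atom) has already been hashed to an integer identifier; the identifiers used below are hypothetical placeholders, not real Morgan identifiers, and the function `fold` is illustrative rather than part of any cheminformatics library. The same identifiers are folded into a fixed-length vector either as bits or as counts.

```python
from collections import Counter


def fold(identifiers, n_bits=2048, use_counts=False):
    """Fold hashed substructure identifiers into a fixed-length vector.

    With use_counts=False each position is 0 or 1 (a bit vector), so a
    substructure that occurs six times looks the same as one that occurs twice.
    With use_counts=True each position stores the number of occurrences.
    """
    vec = [0] * n_bits
    for ident, n in Counter(identifiers).items():
        pos = ident % n_bits  # hash folding; different identifiers may collide
        if use_counts:
            vec[pos] += n
        else:
            vec[pos] = 1
    return vec


# Hypothetical identifiers: 1234 stands for a repeated C=C-C environment inside
# a conjugated chain, 42 for an end group. Only the repeat count differs.
short_chain = [1234, 1234, 42]
long_chain = [1234] * 6 + [42]

print(fold(short_chain) == fold(long_chain))  # binary vectors are identical
print(fold(short_chain, use_counts=True) == fold(long_chain, use_counts=True))
```

Because both toy chains set exactly the same bits, a model trained on the binary vectors cannot distinguish them, whereas the count vectors differ at the repeated position.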
By comparing the RMSE obtained during cross-validation with the RMSE on the test data, we found that the machine learning models generally did not overfit, although the models trained on the Morgan fingerprint tended to overfit. Whether overfitting occurs depends on the size of the dataset, particularly the number of features relative to the number of samples.
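The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 20% tolerance used to flag a gap is an arbitrary choice rather than a criterion from this study, and the numbers in the usage lines are toy values, not results from this work.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def overfit_gap(cv_rmse, test_rmse, tolerance=0.2):
    """Return the test/CV RMSE ratio and whether it exceeds 1 + tolerance.

    A ratio near 1 means the error on unseen test data matches the
    cross-validation estimate; a much larger ratio hints at overfitting.
    The default tolerance (20 %) is an illustrative assumption.
    """
    ratio = test_rmse / cv_rmse
    return ratio, ratio > 1 + tolerance


# Toy values only
print(rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # about 0.141
print(overfit_gap(0.50, 0.55))  # ratio about 1.1, not flagged
print(overfit_gap(0.50, 0.80))  # ratio about 1.6, flagged
```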
4.2. Importance Analysis
The importance analysis of the Morgan fingerprint revealed that conjugated systems and carbon atoms with double and single bonds are of particular importance. As a conjugated chain becomes longer, the absorption due to the π–π* transition shifts to longer wavelengths [37]. This shift indicates that the HOMO–LUMO gap narrows, which is consistent with the high importance of the HOMO energy and the HOMO–LUMO gap in the molecular descriptor dataset.
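The link between a narrower gap and a longer absorption wavelength follows from the photon-energy relation λ = hc/ΔE. The sketch below treats the HOMO–LUMO gap as a rough proxy for the excitation energy, which is only an approximation (orbital-energy gaps generally differ from true excitation energies). The orbital energies in the usage lines are hypothetical values chosen for illustration, not computed results.

```python
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV·nm


def homo_lumo_gap(e_homo_ev, e_lumo_ev):
    """HOMO–LUMO gap in eV (LUMO energy minus HOMO energy)."""
    return e_lumo_ev - e_homo_ev


def gap_to_wavelength_nm(gap_ev):
    """Wavelength (nm) of a photon whose energy equals the gap: λ = hc / ΔE."""
    if gap_ev <= 0:
        raise ValueError("gap must be positive")
    return HC_EV_NM / gap_ev


# Hypothetical orbital energies (eV) for a shorter and a longer conjugated system
short_gap = homo_lumo_gap(-5.5, -2.0)  # 3.5 eV
long_gap = homo_lumo_gap(-5.2, -2.4)   # about 2.8 eV
print(gap_to_wavelength_nm(short_gap))  # about 354 nm
print(gap_to_wavelength_nm(long_gap))   # about 443 nm, i.e. shifted to longer wavelength
```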
The electronic energy transfer (EET) mechanism [17] is expressed as Equation (1), in which the quencher is excited to its triplet state as singlet oxygen is deactivated. Among the quenching mechanisms of singlet oxygen, EET exhibits a rate constant close to the diffusion-controlled limit.
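For reference, the energy-transfer quenching step described here is commonly written in the textbook form below, where Q denotes the quencher (antioxidant) and k_ET the energy-transfer rate constant; the notation is an assumption and may differ from that of the original Equation (1).

```latex
{}^{1}\mathrm{O}_2 + {}^{1}\mathrm{Q} \xrightarrow{\;k_{\mathrm{ET}}\;} {}^{3}\mathrm{O}_2 + {}^{3}\mathrm{Q},
\qquad k_{\mathrm{ET}} \approx k_{\mathrm{diff}}
```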
In this mechanism, a narrow HOMO–LUMO gap implies that energy is readily exchanged between the quencher and singlet oxygen. In the EET mechanism, singlet oxygen and an antioxidant in its singlet state form an encounter complex, and energy transfer is proposed to occur through intersystem crossing within the complex [38]. A higher HOMO energy suggests that the antioxidant is more likely to approach singlet oxygen, an electrophile, thus promoting energy transfer. A competing reaction mechanism reported for this process is shown in Scheme 1 [39]. In this pathway, the quencher and oxygen form a complex that acquires radical character; chemical quenching of oxygen then occurs, or peroxides and carbonyl compounds are formed. Therefore, the HOMO energy is expected to serve as an indicator of the nucleophilicity of the antioxidant toward oxygen during chemical quenching.
In the quenching of singlet oxygen by phenols, two types of reactions are known: physical quenching, in which electrons are transferred between the aromatic ring and oxygen in the transition state but no oxygen is consumed, and chemical quenching, in which peroxides are formed [40]. The EET mechanism is consistent with the machine learning inference because its reaction rate is close to the diffusion-controlled limit.
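To give a sense of what a rate "close to the diffusion rate" means numerically, the sketch below evaluates the Debye–Smoluchowski estimate k_diff = 8RT/(3η), which assumes neutral, equal-sized spherical reactants diffusing according to the Stokes–Einstein relation. The solvent (water at 25 °C, η ≈ 0.89 mPa·s) is an illustrative assumption, not the condition used in the cited studies.

```python
R = 8.314462618  # gas constant, J mol^-1 K^-1


def diffusion_limited_k(temperature_k, viscosity_pa_s):
    """Debye–Smoluchowski diffusion-controlled rate constant in M^-1 s^-1.

    Assumes neutral reactants of equal radius obeying Stokes–Einstein
    diffusion, for which the radii cancel and k = 8RT / (3η).
    """
    k_si = 8 * R * temperature_k / (3 * viscosity_pa_s)  # m^3 mol^-1 s^-1
    return k_si * 1000.0  # 1 m^3 = 1000 L, giving L mol^-1 s^-1 (M^-1 s^-1)


# Water at 298.15 K (viscosity about 0.89 mPa·s)
print(f"{diffusion_limited_k(298.15, 0.89e-3):.2e}")  # roughly 7.4e9 M^-1 s^-1
```

A quencher whose measured rate constant approaches this order of magnitude reacts on essentially every encounter, which is why a rate near k_diff is taken as a signature of efficient energy transfer.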
As mentioned earlier, singlet-oxygen-scavenging activity is correlated with the length of the conjugated chain, the length of the carbon chain, and the ground-state absorption wavelength. The Morgan fingerprint bit displayed in Figure 3 is a critical indicator because it can represent all or part of a conjugated system; like the HOMO energy and the HOMO–LUMO gap, its importance can be explained by the EET mechanism. Because fingerprints can be used directly as machine learning features, substructures of a compound could serve as an alternative indicator to the HOMO energy.
Because SlogP_VSA2, SlogP_VSA4, SlogP_VSA6, and PEOE_VSA7 were critical features in the molecular descriptor dataset, the distribution of atomic contributions to solubility (logP) and of partial charges over the molecular surface was important for predicting antioxidant capacity. Reorganizing the dataset and examining new descriptors of electron density or polarity could improve prediction performance and reveal electronic effects that are critical for studying antioxidants.
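Descriptors of the SlogP_VSA and PEOE_VSA families sum the approximate van der Waals surface area of the atoms whose per-atom property (a Crippen logP contribution or a Gasteiger partial charge, respectively) falls within a given range. The sketch below shows that binning idea in plain Python. The bin edges, per-atom values, and areas are hypothetical and do not reproduce the actual bin boundaries used by RDKit, and `binned_vsa` is an illustrative name, not a library function.

```python
from bisect import bisect_right


def binned_vsa(atom_values, atom_areas, edges):
    """Sum per-atom surface areas into bins defined on a per-atom property.

    Bin 0 collects values below edges[0]; bin i collects
    edges[i-1] <= value < edges[i]; the last bin collects values >= edges[-1].
    """
    if len(atom_values) != len(atom_areas):
        raise ValueError("one area is required per atom value")
    sums = [0.0] * (len(edges) + 1)
    for value, area in zip(atom_values, atom_areas):
        sums[bisect_right(edges, value)] += area
    return sums


# Hypothetical logP-like contributions and surface areas (Å^2) for five atoms
edges = [-0.4, -0.2, 0.0, 0.2]
values = [-0.5, -0.3, 0.1, 0.1, 0.3]
areas = [10.0, 5.0, 7.0, 3.0, 2.0]
print(binned_vsa(values, areas, edges))  # [10.0, 5.0, 0.0, 10.0, 2.0]
```

Each output element plays the role of one descriptor (one "_VSA" index), so a model can learn which property range, rather than which specific atom, matters.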
Analyzing the behavior of machine learning models through feature importance can help explain their prediction accuracy. Although feature importance does not establish causality, we interpreted it chemically by comparing it with previously known information. Testing hypotheses formulated by machine learning with computational chemistry and experiments is useful not only for efficiently evaluating properties that previously relied solely on experiments, such as antioxidant capacity, but also for verifying the validity of the evaluation.
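This passage does not specify how feature importance was computed, so the following is only one common, model-agnostic option: permutation importance, which measures how much the error grows when a single feature column is shuffled. It is shown on a toy linear model with made-up data and is not necessarily the method used in this study; all names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled.

    `predict` maps one row (a list of feature values) to a prediction.
    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled values, leaving other columns intact
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model: the target depends only on feature 0; feature 1 is ignored
X = [[i, (i * 7) % 5] for i in range(20)]
y = [2 * row[0] for row in X]
model = lambda row: 2 * row[0] + 0 * row[1]
print(permutation_importance(model, X, y))  # feature 0 > 0, feature 1 == 0.0
```

Like other importance scores, this quantifies how much the model depends on a feature, not whether the feature causes the property; correlated features can share or mask each other's importance.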