Article

DPPH Measurement for Phenols and Prediction of Antioxidant Activity of Phenolic Compounds in Food

Department of Chemistry and Life Science, Yokohama National University, Hodogaya-ku, Yokohama 240-8501, Japan
* Author to whom correspondence should be addressed.
Curr. Issues Mol. Biol. 2026, 48(1), 12; https://doi.org/10.3390/cimb48010012
Submission received: 14 November 2025 / Revised: 18 December 2025 / Accepted: 19 December 2025 / Published: 23 December 2025

Abstract

Consuming foods with high antioxidant capacity is considered beneficial to health, and predicting the antioxidant capacity of food components is important. In the 2,2-diphenyl-1-picrylhydrazyl (DPPH) assay, multiple reactions occur simultaneously, and because the experimental conditions are not standardized across studies, quantitative prediction of DPPH activity is difficult. In this study, we qualitatively and quantitatively predicted the DPPH activity of phenols in food using data obtained under unified experimental conditions and machine learning. We measured DPPH activity of 96 compounds to create a dataset comprising measurements of 274 compounds, including values previously reported by our laboratory. The classification model implemented using LightGBM showed high performance, achieving an accuracy of 0.88 and an F1 score of 0.86. The support vector regression model satisfied the Golbraikh–Tropsha criteria, with an R2test of 0.70, RMSEtest of 0.44, q2 of 0.61, and RMSEvalidation of 0.46. Furthermore, the chemical validity of the prediction was confirmed by comparing the results of the machine learning model with those of previous studies. This method provides a basis for the quantitative prediction of DPPH activity of numerous phenolic compounds in foods and is expected to contribute to the elucidation of the antioxidant capacity of foods.

1. Introduction

Reactive oxygen species (ROS) are highly reactive and promote lipid oxidation in foods. Furthermore, within the body, they are generated from oxygen taken up during respiration. When produced in excess, they attack biomolecules and cause various diseases [1,2,3]. Antioxidants suppress the generation and reactivity of ROS, thus preventing oxidation [4]. They are used as additives to prevent oxidation and quality deterioration of food. Antioxidants can be easily obtained from everyday meals. People who consume foods rich in antioxidants have a lower risk of mortality from cardiovascular, cardiac, and cerebrovascular diseases than those who do not [5,6]. Obtaining antioxidants through food is an important factor for extending healthy life expectancy. When antioxidants are obtained from food, it is necessary for the compounds to be absorbed into the body. Lipinski et al.’s Rule of Five [7] is often used as an indicator of whether a compound is easily absorbed by the body. Polyphenols are the primary antioxidants present in food. There is a correlation between the polyphenol content of commonly consumed foods and their antioxidant capacities [8]. Therefore, to elucidate the antioxidant capacity of food, it is important to understand the antioxidant properties of phenolic compounds that are easily absorbed by the body.
The 2,2-diphenyl-1-picrylhydrazyl (DPPH) assay is used to measure the ability of a substance to scavenge DPPH radicals [9]. Although various methods have been proposed to evaluate antioxidant capacity [10], no single universal method exists. The DPPH assay is widely used because it is inexpensive and provides rapid results. Although in vitro approaches to measuring antioxidant activity have conceptual and technical limitations, they are considered effective for confirming whether a compound possesses antioxidant activity [11]. When the DPPH activity of a food extract is compared with the cumulative DPPH activity of the antioxidants it contains, the sum of the activities of the individual components does not match the activity of the extract [12,13,14]. However, certain studies have shown a strong correlation between the content of antioxidants (particularly phenols) and the DPPH activity of food extracts [15,16,17,18], indicating that the cumulative DPPH activity of food components can be used, to some extent, to predict the overall DPPH activity of a food. Nevertheless, foods contain numerous components that are difficult to isolate for DPPH activity measurement. Therefore, predicting the DPPH activity of food components is important for understanding the DPPH activity of foods.
Regarding the structure–activity relationship (SAR) of antioxidants, several studies have reported correlations with thermodynamic indicators, such as the ionization potential (IP) [19]. Most SAR studies have compared compounds with common structures based on differences in their partial structures [20,21,22]. However, these studies have often reported inconsistent trends. One reason for this is the lack of standardized measurement conditions. When comparing DPPH assay results obtained under different experimental conditions, such as temperature and solvent, simple structural rules do not apply. As the measurement temperature increases, the observed DPPH activity increases owing to the higher kinetic energy of the reacting molecules and the greater degree of hydroxyl-group dissociation. In addition, the reaction mechanism depends strongly on solvent polarity: in nonpolar solvents, reactions via the single electron transfer (SET) or sequential proton-loss electron transfer (SPLET) mechanisms are unlikely, and the hydrogen atom transfer (HAT) mechanism tends to dominate, whereas in polar solvents, the SET or SPLET mechanism becomes more favorable, leading to changes in the measured values. The influence of these experimental variations on the measured values is particularly pronounced for molecules with weak DPPH activity [23]. Because experimental conditions have not been standardized, it has been difficult to compare results between laboratories [24]. Against this background, efforts have been made to standardize experimental conditions and minimize errors in the measurement results among different laboratories [25,26]. Building on this effort, a reagent company has produced a simple kit that has been on the market since 2019. Several studies have performed component analyses using this kit [27,28,29,30]. It should be noted that the reported trends only apply to a limited range of compounds.
Simple structural rules may not be applicable to complex molecules, resulting in different trends observed across multiple studies. For example, it is generally known that molecules with a greater number of hydroxyl groups exhibit a higher antioxidant activity [31]. However, carnosic acid, which has two phenolic hydroxyl groups, has been reported to show a higher antioxidant activity than rosmarinic acid, which contains four phenolic hydroxyl groups [32]. Furthermore, when phenols form dimers through o–o homocoupling, the number of phenolic hydroxyl groups increases, which generally enhances DPPH activity. Nevertheless, some dimers exhibit lower DPPH activity than their monomers [33]. We believe that applying machine learning to capture more complex patterns would be beneficial.
Properties of compounds have been predicted using SAR models, such as the Hansch and Hammett equations. In recent years, machine learning, represented by neural networks (NNs), has been used to predict nonlinear relationships [34,35,36,37,38]. The primary methods for improving prediction accuracy include selecting an appropriate machine learning model and choosing suitable explanatory variables. NNs are among the best-suited models for capturing nonlinear relationships. In predicting DPPH activity, some studies have collated experimental data from the literature and performed predictions using a multilayer perceptron NN [39]. However, owing to the lack of standardization in measuring DPPH activity, such literature data are not well suited for use as training data or target values in prediction. Furthermore, reports indicate that for data that are difficult to obtain in large quantities, non-deep learning models such as gradient boosting decision trees yield higher prediction accuracy than deep learning models such as NNs [40]. Additionally, NNs largely behave as black boxes, which makes it difficult to provide a chemical interpretation of the predictions [41]. Regarding the explanatory input variables, reports also show that incorporating quantum chemical calculation values, not just molecular structural information, improves prediction accuracy [42]. Therefore, to ensure prediction accuracy while enabling chemical interpretation, it is advisable to use molecular descriptors, including quantum chemical calculation values, as explanatory variables, and to employ machine learning models that support importance analysis.
The purpose of this study was to establish a foundation for understanding the antioxidant capacity of phenols in foods by analyzing numerous phenolic compounds using machine learning. An overview of the study workflow is presented in Figure 1. First, to provide experimental data for prediction, the DPPH activities of various phenols were measured under standardized experimental conditions, enabling comparison with DPPH activities previously reported by our group. Quantum chemical calculation values were obtained for the measured compounds and the compounds registered in FooDB [43], a database of food-component compounds. Furthermore, using the obtained molecular structural information and quantum chemical descriptors as explanatory variables, we predicted the presence or absence of DPPH activity of FooDB compounds using LightGBM [44] and the IC50 values of selected compounds using a support vector machine (SVM). By analyzing the machine learning models, we determined the factors influencing DPPH activity based on the calculated molecular descriptors, including compound substructures and quantum chemical calculation values input into the models during prediction. This research is expected to deepen our understanding of the antioxidant capacity of food components that are difficult to isolate by enabling antioxidant capacity prediction without conducting experiments.

2. Materials and Methods

2.1. Reagents and Synthesis

The reagents used in this study were purchased from Wako Pure Chemical Co. (Osaka, Japan), Tokyo Chemical Industry Co. (Tokyo, Japan), Sigma-Aldrich (St. Louis, MO, USA), Kanto Chemical Co., Inc. (Tokyo, Japan), and Thermo Fisher Scientific (Waltham, MA, USA). For compounds with only one hydroxyl group, reagents were selected to cover a range of compounds with structures similar to those registered in FooDB. The reagents were then analyzed without purification. Furthermore, as eight compounds could not be purchased directly, they were synthesized according to previously reported methods [45]. These compounds were selected to fill the gaps in the physical properties and structural space in our previously reported measurement data [19,33] and food component compounds, resulting in a total of 96 measured compounds. Some of these data are shown in Figure 2. The details of these compounds are provided in the Supplementary Information (SI.xlsx).

2.2. DPPH Radical-Scavenging Activity Measurement

The DPPH radical-scavenging activity was measured using the DPPH Antioxidant Assay Kit from Dojin Chemical Laboratory (Kumamoto, Japan) [46] following the manufacturer’s protocol. The total reaction volume was 200 µL, consisting of 100 µL (50% (v/v)) of DPPH solution in ethanol, 80 µL (40% (v/v)) of buffer solution provided with the kit, and 20 µL (10% (v/v)) of sample solution. The sample solutions were prepared in ethanol or, for ethanol-insoluble compounds, in dimethyl sulfoxide (DMSO). Absorbance measurements were performed using Thermo Scientific SkanIt™ software (Ver 7.0.2). For a preliminary experiment, sample solutions were prepared at concentrations of 1, 10, 100, and 1000 µg/mL. These solutions were reacted with specific amounts of DPPH solution to determine the optimal range of IC50. Subsequently, the DPPH radical-scavenging rate was measured at four points within the optimal concentration range. After confirming the linearity of the scavenging rate with concentration changes and the presence of the 50% scavenging rate point on the line, a regression line was drawn, and the IC50 (µg/mL) was determined via interpolation. For each experiment, the IC50 of 6-hydroxy-2,5,7,8-tetramethyl-3,4-dihydrochromene-2-carboxylic acid (Trolox) was measured to correct for interexperimental variation. The Trolox equivalent antioxidant capacity (TEAC) was calculated using Equation (1). Furthermore, pIC50 (Equation (2)), which is the common logarithm of the reciprocal of IC50 (mol/L), was calculated. For samples whose optimal concentration range exceeded 1000 µg/mL, accurate IC50 measurement was not performed, as the TEAC value was expected to be extremely low.
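$$\mathrm{TEAC} = \frac{\mathrm{IC}_{50}(\mathrm{Trolox})}{\mathrm{IC}_{50}(\mathrm{Sample})} \tag{1}$$

$$\mathrm{pIC}_{50} = -\log_{10} \mathrm{IC}_{50}\ (\mathrm{mol/L}) \tag{2}$$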
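The interpolation and the two equations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function names are hypothetical, the dilution series is invented, and the sketch assumes that TEAC is formed from two IC50 values expressed in the same units, which the text does not specify.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def ic50_from_points(conc_ug_ml, scavenging_pct):
    """Interpolate the concentration giving 50% scavenging on the fitted line."""
    # Interpolation only: the 50% point must lie inside the measured range.
    if not (min(scavenging_pct) <= 50.0 <= max(scavenging_pct)):
        raise ValueError("50% point lies outside the measured range")
    slope, intercept = fit_line(conc_ug_ml, scavenging_pct)
    return (50.0 - intercept) / slope

def teac(ic50_trolox, ic50_sample):
    """Equation (1): Trolox IC50 divided by sample IC50 (same units assumed)."""
    return ic50_trolox / ic50_sample

def pic50(ic50_ug_ml, mol_weight):
    """Equation (2): -log10 of the IC50 expressed in mol/L."""
    # ug/mL equals mg/L; multiply by 1e-3 for g/L, divide by g/mol for mol/L.
    ic50_mol_l = ic50_ug_ml * 1e-3 / mol_weight
    return -math.log10(ic50_mol_l)

# Hypothetical four-point dilution series (concentration in ug/mL, scavenging in %).
conc = [10.0, 20.0, 30.0, 40.0]
scav = [20.0, 40.0, 60.0, 80.0]
ic50_sample = ic50_from_points(conc, scav)
```

Because pIC50 is defined on a molar basis, the conversion needs the molecular weight of each compound, whereas the IC50 read from the regression line is in µg/mL.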

2.3. Dataset

The SMILES of the compounds contained in the foods were obtained from FooDB in December 2024. Neutral molecules with phenolic hydroxyl groups were extracted using the fr_phenol fragment descriptor in RDKit (ver. 2024.09.1) [47].
PubChemQC [48], a quantum chemistry database, was used to obtain quantum chemistry calculation values. PubChemQC is a database that compiles the results of structural optimization using the PM6 method and vibrational calculations using B3LYP/6-31G(d) for compounds with molecular weights less than 1000 among the compounds reported in PubChem [49]. For the compounds registered in FooDB as of December 2024, we retrieved the SMILES, highest occupied molecular orbital energy (E_HOMO), and lowest unoccupied molecular orbital energy (E_LUMO) values, yielding a total of 4547 compounds. Additionally, quantum chemistry calculation values were similarly obtained from PubChemQC for the 96 compounds whose DPPH activity was measured in this study and for the compounds whose DPPH activity was measured previously using the same assay kit (169 and 9 compounds, respectively) [19,33]. Hereinafter, E_HOMO and E_LUMO values obtained from PubChemQC will be referred to as E_HOMO_PubChemQC and E_LUMO_PubChemQC, respectively.

2.4. E_HOMO_calc and E_LUMO_calc Calculations

The values reported in PubChemQC were calculated at the B3LYP/6-31G(d)//PM6 level under vacuum conditions, without considering solvent effects. Therefore, we attempted to improve on this level of calculation by accounting for solvent effects, with the aim of obtaining quantum chemical values under conditions closer to the experimental system.
Molecular objects were generated from the isomeric SMILES using RDKit, followed by desalting and searching for three-dimensionally stable conformations. The ETKDG method [50] was used for conformer searches, generating 1000 conformers per compound. Each structure was optimized using the Merck Molecular Force Field (MMFF) [51], and the most stable structure obtained was used as the initial structure for quantum chemical calculations.
For the neutral molecules obtained from these structures, Gaussian16 [52] was used to perform structural optimization at B3LYP/6-31G(d), followed by vibrational analysis at M06-2X/6-311++G(d,p). The structural optimization and vibrational analyses were performed in aqueous solvents using SMD [53]. This yielded E_HOMO and E_LUMO values. Hereinafter, the E_HOMO and E_LUMO values obtained in this manner will be denoted as E_HOMO_calc and E_LUMO_calc, respectively. Furthermore, owing to the anticipated high computational cost, E_HOMO_calc and E_LUMO_calc were obtained only for the 274 compounds for which DPPH measurement data existed.

2.5. Machine Learning

2.5.1. Calculation Descriptor

For the 274 measured compounds, only molecules for which both the PubChemQC values and the values calculated in this study (Section 2.4) were available were considered. To account for toxicity to living organisms and membrane permeability within the body [7], the dataset (measured dataset) was narrowed down to compounds with a molecular weight of less than 500 and MolLogP between 0 and 5 (247 compounds). For each compound, molecular descriptors were calculated using RDKit. Furthermore, 2D and 3D molecular descriptors were calculated using Mordred software (ver. 1.2.0) [54]. For each compound, 1000 conformers were generated and optimized using MMFF. The most stable structure obtained was used to calculate the 3D molecular descriptors using Mordred. These molecular descriptors, along with E_HOMO_PubChemQC and E_LUMO_PubChemQC, formed an initial set of 1499 molecular descriptors.
For food ingredient compounds obtained from FooDB, we selected compounds with molecular weights less than 500 and MolLogP between 0 and 5. Using RDKit, we extracted compounds with fr_phenol ≥ 1 (2235 compounds), forming a dataset (FooDB dataset). Molecular descriptors were calculated using RDKit and Mordred as previously described. Structures for calculating the 3D molecular descriptors were obtained using the same method as for the molecules in the measured dataset. E_HOMO_PubChemQC and E_LUMO_PubChemQC were added to these to form the initial molecular descriptors.
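The filtering rule described above can be written as a small predicate. The sketch below is illustrative only: it works on precomputed descriptor values rather than calling RDKit, the `Compound` record and the example entries are hypothetical, and the half-open MolLogP interval (0 ≤ MolLogP < 5) follows the wording used later for the FooDB predictions.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_wt: float      # molecular weight (e.g. RDKit MolWt)
    mol_logp: float    # Crippen logP (e.g. RDKit MolLogP)
    n_phenol_oh: int   # phenolic OH count (e.g. RDKit fr_phenol)

def passes_filter(c, max_mw=500.0, logp_min=0.0, logp_max=5.0):
    """True if the compound is a phenol inside the MW and MolLogP window."""
    in_mw = c.mol_wt < max_mw
    in_logp = logp_min <= c.mol_logp < logp_max
    return in_mw and in_logp and c.n_phenol_oh >= 1

# Hypothetical records with made-up descriptor values.
library = [
    Compound("compound_A", 154.2, 1.1, 1),   # kept
    Compound("compound_B", 610.5, 0.4, 4),   # rejected: MW too high
    Compound("compound_C", 320.3, -0.8, 2),  # rejected: MolLogP below 0
    Compound("compound_D", 210.3, 2.9, 0),   # rejected: no phenolic OH
]
selected = [c.name for c in library if passes_filter(c)]
```

Keeping the thresholds as keyword arguments makes it straightforward to check how sensitive the dataset size is to the exact boundaries.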

2.5.2. Classification Model

For the classification model, the ECFP (radius = 3, 2048 bits) [55] was calculated using RDKit and used as the explanatory variable. A binary variable (Assay) was created, assigned a value of 0 if the IC50 was ≥1000 µg/mL and 1 otherwise, and used as the target variable. All compounds in the measurement dataset were used for model training and evaluation. Half of the measured compounds registered in FooDB (50 compounds) were used as test data, and the remaining compounds (197 compounds) were used as training data. The machine learning model was implemented using the LightGBM module [44]. Using Optuna [56], a hyperparameter search of 30 trials was performed based on the Tree-Structured Parzen Estimator (TPE) [57] to maximize the ROC AUC [58]. Within the search, leave-one-out cross-validation (LOOCV) was performed on the training data to determine the hyperparameters. Subsequently, LOOCV was performed on the training data using the determined hyperparameters, and the accuracy, F1 score [59], and Matthews correlation coefficient (MCC) [59] were calculated to evaluate the generalization performance of the model. Furthermore, we evaluated the prediction accuracy by calculating the accuracy, F1 score, and MCC on the test data. Finally, we performed an importance analysis using SHapley Additive exPlanations (SHAP) [60].
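As a concrete reading of the target variable and the three evaluation metrics, the following sketch builds the Assay label from an IC50 value and computes accuracy, F1 score, and MCC from predicted labels. It is a plain-Python illustration with hypothetical function names and invented labels; it does not reproduce the LightGBM model or the reported scores.

```python
import math

def assay_label(ic50_ug_ml, threshold=1000.0):
    """Assay = 0 if IC50 >= threshold (treated as inactive), else 1 (active)."""
    return 0 if ic50_ug_ml >= threshold else 1

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the active class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Accuracy, F1 score, and Matthews correlation coefficient."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return accuracy, f1, mcc

# Invented labels for eight compounds (not data from this study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, f1, mcc = classification_metrics(y_true, y_pred)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it the more conservative of the three when the active and inactive classes are unbalanced.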

2.5.3. Regression Model

The pIC50 was calculated using Equation (2) and used as the target variable. For compounds with IC50 values exceeding 1000 µg/mL, for which accurate IC50 values were not measured, the pIC50 was calculated assuming an IC50 of 2000 µg/mL so that they could be distinguished from compounds with IC50 values ≤ 1000 µg/mL. Boruta-Shap [61] was used to select molecular descriptors. Among the calculated molecular descriptors, those with absolute correlation coefficients exceeding 0.9 were removed beforehand. The remaining 586 molecular descriptors were input into Boruta-Shap for selection.
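The two preprocessing steps in this paragraph can be sketched as follows. The text does not say which member of a highly correlated pair was discarded, so the greedy keep-first rule below is an assumption, as are the function names and the toy descriptor table.

```python
import math

def pic50_target(ic50_ug_ml, mol_weight, inactive_ic50=2000.0, cutoff=1000.0):
    """pIC50 target; unmeasured (None) or above-cutoff IC50 uses the placeholder."""
    if ic50_ug_ml is None or ic50_ug_ml > cutoff:
        ic50_ug_ml = inactive_ic50
    return -math.log10(ic50_ug_ml * 1e-3 / mol_weight)

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(columns, threshold=0.9):
    """Greedy pruning (assumed rule): keep a descriptor only if its |r| with
    every descriptor kept so far does not exceed the threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy descriptor table: "b" is almost a multiple of "a"; "c" is weakly related.
descriptors = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],
    "c": [4.0, 1.0, 3.0, 2.0],
}
kept = drop_correlated(descriptors)
```

With the keep-first rule, the outcome depends on column order, so a fixed descriptor ordering is needed for the pruning to be reproducible.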
Using RDKit on the measurement dataset, we extracted compounds (162 in total) that contained one aromatic hydroxy group and used them for model training and evaluation. Half of the extracted compounds registered in FooDB (35 compounds) were used as test data, and the remainder (127 compounds) were used as training data. The machine learning model was constructed using the SVM implementation in scikit-learn [62]. During training, the molecular descriptors of the training data were standardized using scikit-learn's StandardScaler, and the molecular descriptors of the test data were transformed using the same scaling parameters. A hyperparameter search of 30 trials was performed using Optuna based on TPE to minimize the root mean squared error (RMSE). As with the classification model, the search involved LOOCV on the training data to determine the hyperparameters. Subsequently, LOOCV was performed on the training data using the determined hyperparameters, and the coefficient of determination (q2) and RMSE were calculated to evaluate the generalization performance of the model. Furthermore, the prediction accuracy was evaluated using the coefficient of determination (R2) and the RMSE of the test data. Finally, an importance analysis was performed using SHAP.
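To make the LOOCV statistics concrete, the sketch below computes q2 and RMSE from leave-one-out predictions. A one-descriptor least-squares line stands in for the support vector regression model purely to keep the example dependency-free; the q2 definition used here (1 minus PRESS over the total sum of squares about the mean of all training targets) is a common convention and is assumed rather than taken from the text. The data are invented.

```python
import math

def fit_predict_linear(train_x, train_y, x_new):
    """Stand-in model (NOT SVR): fit y = a*x + b on the fold, predict x_new."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    if sxx == 0:
        return mean_y  # degenerate fold: fall back to the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    slope = sxy / sxx
    return mean_y + slope * (x_new - mean_x)

def loocv_q2_rmse(xs, ys, fit_predict):
    """Leave each sample out once, predict it, and summarize the errors."""
    preds = []
    for i in range(len(xs)):
        fold_x = xs[:i] + xs[i + 1:]
        fold_y = ys[:i] + ys[i + 1:]
        preds.append(fit_predict(fold_x, fold_y, xs[i]))
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    q2 = 1.0 - press / ss_tot
    rmse = math.sqrt(press / len(ys))
    return q2, rmse

# Invented descriptor/target pairs with a little noise.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1]
q2, rmse_validation = loocv_q2_rmse(x_train, y_train, fit_predict_linear)
```

Any model can be plugged in through the `fit_predict` argument, provided that scaling and hyperparameters are fitted on the fold's training samples only, so that the held-out compound never influences its own prediction.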

3. Results

3.1. Results of DPPH Assay

As the DPPH activity data previously reported by our group [19,33] were insufficient in terms of both quantity and structural diversity for machine learning analysis, we measured DPPH radical-scavenging activity in this study. Preliminary experiments on 96 compounds revealed that 43 compounds had IC50 values of 1000 µg/mL or less. Additional experiments were conducted on these compounds to calculate their IC50 and TEAC values. No additional experiments were conducted on the remaining 53 compounds because their TEAC values were expected to be very small. In addition to phenols, compounds not included in FooDB were also measured. Some compounds were soluble only in DMSO and not in ethanol. Therefore, to investigate the effect of the solvent used to dissolve the samples, six compounds were measured both as ethanol solutions and as DMSO solutions. As all differences in TEAC were within the experimental error range, we considered that data obtained for samples dissolved in ethanol and in DMSO could be treated equivalently. Details are provided in the Supplementary Information. Furthermore, our previous study has shown that data obtained using the same assay kit are comparable [19]. This resulted in a dataset of measurements of 274 compounds, comprising values reported previously and those obtained in the present study. To the best of our knowledge, this is the largest dataset reported by a single laboratory.
To assess how much of the FooDB compound range would fall within the prediction range of a machine learning model built on the measurement dataset, we compared the compound range of the measurement dataset with that of FooDB. Figure 3a shows the MolLogP-MolWt plot for compounds with one phenolic hydroxyl group in the measurement dataset and compounds with one phenolic hydroxyl group in the FooDB dataset. Figure 3b shows that, with respect to MolLogP, the measurement dataset covers a wide range of the FooDB dataset. Figure 3c shows that, with respect to molecular weight, the number of measurements for compounds with molecular weights between 300 and 500 (representing the molecular weights of most compounds in the FooDB dataset) was small, and the coverage was not as wide as that for MolLogP. However, as measurements were performed across the entire molecular weight distribution, we believe that we were able to create a broad dataset of DPPH activity measurements.

3.2. Comparison of Calculation Level

Boruta-Shap was used to screen molecular descriptors and incorporate factors that significantly influenced DPPH activity into the explanatory variables input into the machine learning model. Figure 4a partially shows the results of feature selection and importance analysis using Boruta-Shap. The screening results suggest that E_HOMO is an important factor for predicting DPPH activity. The detailed results of the Boruta-Shap analysis are presented in the Supplementary Information. Figure 4b shows the distribution of E_HOMO_PubChemQC for compounds in the FooDB dataset. Compounds with higher E_HOMO values exhibited DPPH activity more frequently, whereas compounds with lower E_HOMO values were more likely to lack DPPH activity. This finding aligns with that of a previous study [63], indicating that E_HOMO influences DPPH activity, supporting the importance of E_HOMO in the reaction between DPPH and the compounds.
Next, we investigated whether the relationship between E_HOMO and pIC50 was affected by compound structure. Plots of pIC50-E_HOMO_PubChemQC for the compounds in the measurement dataset are shown in Figure 4c,d. Figure 4c shows the plot for compounds with one phenolic hydroxyl group and Figure 4d shows the plot for compounds with two or more phenolic hydroxyl groups. Figure 4c shows a narrower range of compounds with and those without DPPH activity, whereas Figure 4d shows a wider range of compounds with and those without DPPH activity. This is because compounds with two or more phenolic hydroxyl groups are more prone to subsequent reactions than those with one phenolic hydroxyl group [40] and factors other than E_HOMO are also significantly involved in the reaction. We attempted to determine a similar relationship using a higher computational level, E_HOMO_calc. There was no significant change in the plot of E_HOMO_PubChemQC vs. E_HOMO_calc, and a similar relationship was observed depending on the number of phenolic hydroxyl groups present. This finding could be attributed to the correlation between E_HOMO_PubChemQC and E_HOMO_calc (R2 = 0.75), which resulted in no significant difference in the overall shape of the plot. Details are provided in the Supplementary Information. Compounds with only one phenolic hydroxyl group had greater DPPH activity than compounds with two or more phenolic hydroxyl groups. Furthermore, even compounds with low E_HOMO values showed DPPH activity when the number of phenolic hydroxyl groups increased, indicating that the number of phenolic hydroxyl groups contributed to the DPPH activity in addition to E_HOMO.

3.3. Results of Machine Learning Analysis

3.3.1. Classification Results

To achieve high accuracy and chemical interpretability of DPPH activity prediction, we first constructed a classification model (LGBM_ECFP) using ECFP as the explanatory variable. The confusion matrix for the test data, composed solely of FooDB-registered compounds, is shown in Figure 5a. The accuracy, F1 score, and MCC were 0.88, 0.86, and 0.76, respectively. The LOOCV-based generalization performance evaluation of LGBM_ECFP is presented in the Supplementary Information. Furthermore, we constructed a model (LGBM_PubChemQC) in which the explanatory variable was changed from ECFP to molecular descriptors selected using Boruta-Shap. Although its prediction accuracy on the test data was lower than that of LGBM_ECFP (accuracy: 0.84, F1 score: 0.81, MCC: 0.68), the LOOCV generalization performance evaluation showed that it outperformed LGBM_ECFP by approximately 0.05 to 0.13 across all metrics. Furthermore, the model with a higher E_HOMO calculation level (LGBM_calc) achieved the best results for the test data (accuracy: 0.90, F1 score: 0.88, MCC: 0.80). In the LOOCV generalization performance evaluation, it also outperformed LGBM_ECFP by approximately 0.1 across all metrics. Details are provided in the Supplementary Information. These results demonstrate that, although the accuracy is lower than that achieved using molecular descriptors selected using Boruta-Shap, the DPPH activity can be predicted to a certain extent using partial structures based on ECFP.
As LGBM_ECFP can classify DPPH activity to some extent based on molecular substructures, SHAP analysis was performed to identify substructures important for DPPH activity. The SHAP values represent the contribution of each feature to the model output; for a given sample, they sum to the difference between the prediction for that sample and the average prediction over the background dataset. Positive and negative SHAP values indicate features that increase or decrease the predicted DPPH activity, respectively [60]. Figure 5b shows the results of the importance analysis for LGBM_ECFP and some of the partial structures predicted to contribute to the activity. Figure 5(c-1,c-2) shows some of the compounds predicted to have activity and Figure 5(d-1,d-2) shows some of the compounds predicted to lack activity. In Figure 5(c-1,c-2,d-1,d-2), the substructures that contribute positively according to the SHAP analysis are shown in red, those that contribute negatively are shown in blue, and those that contribute nothing are shown in gray. As shown in Figure 5b, the presence of the hydroxyl-containing substructure FP_202 contributes strongly and positively to the prediction, suggesting that the model learned chemically meaningful patterns. In Figure 5(c-1,c-2), the phenolic hydroxyl groups of the molecules predicted to be active are shown in red. In contrast, hydroxyl groups that are not directly bonded to the aromatic ring are not very important, suggesting that phenolic hydroxyl groups are important for DPPH activity and that phenols tend to act as antioxidants. The ortho-methoxy groups are shown in red, indicating that they contributed significantly. This observation supports the findings of a previous study [64] showing that ortho-methoxy groups affect activity. In contrast, Figure 5(d-1,d-2) shows that the molecules predicted to be inactive have carbon atoms on the benzene ring without substituents, which are displayed in blue. This finding suggests that the presence of substituents tends to increase the DPPH activity. Furthermore, the oxygen of the carboxyl group contributed negatively to the prediction. This finding is in agreement with the results of a previous study [64]. Thus, the important substructures obtained by analyzing LGBM_ECFP can be explained to some extent by previously reported findings, suggesting that the machine learning model performs meaningful learning and provides highly accurate predictions, while also indicating the possibility of obtaining new findings.
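The additive property of SHAP values described in this section can be checked on a toy example. The sketch below computes exact Shapley values by enumerating feature subsets, with absent features filled in from background rows; this brute-force approach is only feasible for a handful of features and is not the algorithm the SHAP library applies to tree models. The scoring function and the three-bit "fingerprints" are invented for illustration.

```python
from itertools import combinations
from math import factorial

def coalition_value(model, x, background, subset):
    """Mean model output when features in `subset` take the values of x and
    the remaining features are taken from each background row in turn."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)

def shapley_values(model, x, background):
    """Exact Shapley value of every feature of x (exponential cost)."""
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi += weight * gain
        phis.append(phi)
    return phis

def toy_model(bits):
    """Hypothetical activity score on three fingerprint bits."""
    return 2.0 * bits[0] + 1.0 * bits[1] * bits[2] - 0.5 * bits[2]

background = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
sample = [1, 1, 0]
phis = shapley_values(toy_model, sample, background)
baseline = sum(toy_model(row) for row in background) / len(background)
```

In this example the contributions add up to `toy_model(sample) - baseline`, and bit 0, which enters the score additively, receives a positive value, mirroring how a substructure bit that raises the predicted activity would be colored red in an atom-level visualization.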

3.3.2. Regression Results

The classification model can predict the presence or absence of DPPH activity; however, predicting the antioxidant capacity of a compound requires determining not only whether activity is present but also its strength. Therefore, we constructed a regression model that can predict the strength of activity as a continuous value. Initially, we applied the regression model to all molecules but did not obtain satisfactory prediction accuracy. The data provided in Figure 4c,d suggest that the number of phenolic hydroxyl groups affects the relationship between E_HOMO and pIC50, so we restricted the regression to molecules with only one phenolic hydroxyl group. Figure 6a shows the y–y plot (observed vs. predicted pIC50) of the model (SVM_PubChemQC) trained with features selected using Boruta-Shap. The R2test was 0.70, RMSEtest was 0.44, q2 was 0.61, and RMSEvalidation was 0.46, satisfying the Golbraikh–Tropsha criteria [65]. To improve the accuracy of the machine learning model, we constructed a new model (SVM_calc) by replacing E_HOMO_PubChemQC with E_HOMO_calc, which was calculated at a higher level. SVM_calc had an R2test of 0.71, an RMSEtest of 0.44, a q2 of 0.59, and an RMSEvalidation of 0.48. The model details are provided in the Supplementary Information. A potential reason why the prediction accuracy did not change significantly despite the higher E_HOMO calculation level is that E_HOMO_PubChemQC and E_HOMO_calc were correlated, so the relative ordering of E_HOMO values among compounds did not change significantly with the calculation level. These results suggest that DPPH activity can be predicted to a certain extent for molecules with only one phenolic hydroxyl group.
To investigate the factors influencing DPPH activity, SHAP analysis was performed using SVM_PubChemQC. We examined both the factors that were influential across the entire dataset and those that depended on activity strength. The SHAP summary plot of SVM_PubChemQC is presented in Figure 6b. The SHAP analysis results for 2,6-dimethoxyphenol, a compound with pIC50 ≥ 3, are shown in Figure 6c, and those for 4-hydroxybenzoic acid butyl ester, a compound with pIC50 < 2, are shown in Figure 6d. Figure 6b shows that E_HOMO contributed significantly to the overall prediction. This finding is consistent with prior research [63] indicating a large contribution from E_HOMO, as mentioned earlier. Comparing Figure 6c and 6d, E_HOMO contributes positively to the prediction in Figure 6c but negatively to that in Figure 6d. This observation indicates that, within the measured dataset, 2,6-dimethoxyphenol had a relatively high E_HOMO, whereas 4-hydroxybenzoic acid butyl ester had a low E_HOMO. Here, we considered the reaction mechanism between DPPH and these compounds. Three mechanisms have been proposed for the DPPH–compound reactions: HAT, ET, and SPLET [66]. Among these, the ET mechanism is thought to be related to the IP, and a strong correlation between IP and E_HOMO has been reported. Furthermore, as the ET mechanism involves a cationization reaction in which an electron is abstracted, it is considered to be within the applicability range of Koopmans’ theorem [67]. Consequently, the significant contribution of E_HOMO to the predictions of SVM_PubChemQC suggests that the ET mechanism, in which IP serves as a considerable driving force, is likely to occur in the reaction between DPPH and the compound, consistent with the findings of previous research [19,68].
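The sign behavior described for Figure 6c,d can be illustrated with an exact Shapley value calculation on a toy model. This is only a sketch of the attribution principle: it does not use the SHAP library or the SVM_PubChemQC model, and the linear toy model, its coefficients, the second feature (a methoxy count), and the baseline values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline input.

    Features outside the coalition are set to their baseline values.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical linear pIC50 model: features = [E_HOMO in eV, methoxy count]."""
    e_homo, n_methoxy = features
    return 3.0 + 0.8 * (e_homo + 6.0) + 0.1 * n_methoxy


baseline = [-6.0, 1.0]    # hypothetical dataset-average features
high_homo = [-5.5, 2.0]   # hypothetical compound with a relatively high E_HOMO
low_homo = [-6.5, 0.0]    # hypothetical compound with a relatively low E_HOMO

phi_high = shapley_values(toy_model, high_homo, baseline)
phi_low = shapley_values(toy_model, low_homo, baseline)
```

In this toy setting, an E_HOMO above the baseline yields a positive contribution and an E_HOMO below it yields a negative one, which mirrors the qualitative contrast between the two compounds. The contributions for each sample sum to the difference between its prediction and the baseline prediction.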

3.4. Prediction of FooDB Compounds

As the DPPH activity of foods can be roughly calculated as the sum of the DPPH activities of the antioxidants in the food, regression prediction of the DPPH activity of food constituents is considered important. First, classification prediction was performed using LGBM_ECFP for 2235 phenols included in FooDB with a molecular weight of less than 500 and a MolLogP of at least 0 and less than 5. Among these, 1225 phenols were predicted to have DPPH activity, whereas 1010 phenols were predicted to have none. Next, regression prediction was performed for 753 phenols with one phenolic hydroxyl group for which E_HOMO was obtained from PubChemQC. Among them, 148 compounds were predicted to exhibit high activity, with pIC50 exceeding 3. Examples of compounds predicted to have high pIC50 values, low pIC50 values, or no DPPH activity are shown in Figure 7. Details of the classification and regression predictions are presented in the Supplementary Information. By predicting the continuous value of pIC50 for food component compounds with one phenolic hydroxyl group, it is possible to predict both the presence and the strength of antioxidant activity, which has been challenging until now. We believe that quantitative prediction of antioxidant activity enables efficient exploration of a chemical space that includes expensive reagents that are difficult to test experimentally and food components that are difficult to extract. Furthermore, we have laid the foundation for the quantitative prediction of the antioxidant activity of food components.
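The two-stage screening described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' pipeline: the record fields, the function names, and the stand-in `classify` and `regress` callables are hypothetical, and descriptor values are assumed to have been computed beforehand (for example, with RDKit and PubChemQC). The regression is applied independently of the classification result, because the text does not state that it was limited to compounds classified as active.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Phenol:
    name: str
    mol_weight: float
    mol_logp: float
    n_phenolic_oh: int
    e_homo: Optional[float]  # None when no PubChemQC value is available


def in_screening_window(c: Phenol) -> bool:
    """Molecular weight < 500 and 0 <= MolLogP < 5, as stated in the text."""
    return c.mol_weight < 500 and 0 <= c.mol_logp < 5


def screen(compounds: List[Phenol],
           classify: Callable[[Phenol], bool],
           regress: Callable[[Phenol], float]) -> List[dict]:
    """Classify every in-window phenol; regress only mono-phenols with E_HOMO."""
    results = []
    for c in compounds:
        if not in_screening_window(c):
            continue
        entry = {"name": c.name, "active": classify(c), "pic50": None}
        if c.n_phenolic_oh == 1 and c.e_homo is not None:
            entry["pic50"] = regress(c)
        results.append(entry)
    return results


# Hypothetical records and stand-in models, used only to exercise the flow.
compounds = [
    Phenol("compound_a", 180.0, 1.5, 1, -5.6),
    Phenol("compound_b", 620.0, 2.0, 1, -5.4),   # outside the weight window
    Phenol("compound_c", 300.0, 2.2, 3, -5.2),   # more than one phenolic OH
    Phenol("compound_d", 150.0, 1.0, 1, None),   # no E_HOMO available
]
results = screen(
    compounds,
    classify=lambda c: c.mol_logp > 1.2,          # stand-in classifier
    regress=lambda c: 3.0 + (c.e_homo + 6.0),     # stand-in regressor
)
high_activity = [r["name"] for r in results
                 if r["pic50"] is not None and r["pic50"] > 3]
```

Keeping `pic50` as `None` for compounds outside the regression domain makes the applicability limit explicit rather than silently extrapolating.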

3.5. Limitations and Directions for Future Work

Each evaluation metric used for the classification models has inherent limitations. Accuracy is sensitive to class imbalance and may overestimate performance when the majority class dominates. The F1 score, defined as the harmonic mean of precision and recall, does not consider true negatives and therefore does not fully reflect the model’s ability to correctly exclude inactive compounds. Although the MCC incorporates all elements of the confusion matrix and is relatively robust to class imbalance, its value can become unstable when applied to relatively small datasets, such as those evaluated using LOOCV, and should therefore be interpreted with caution.
The regression model constructed in this study performed well for compounds containing only one phenolic hydroxyl group; however, its predictive accuracy could be improved by incorporating more complex descriptors, such as bond dissociation energy and acidity constant, as well as by expanding the dataset. Furthermore, the model cannot be applied to compounds with two or more phenolic hydroxyl groups, which are major contributors to antioxidant capacity. Therefore, to quantify the antioxidant capacities of foods more comprehensively, it is essential to include compounds with multiple hydroxyl groups in the model. The descriptors and non-deep learning models currently in use may not fully capture the DPPH activity of these compounds, whereas machine learning methods with strong nonlinear feature extraction capabilities, such as deep learning, are expected to enable accurate prediction for a wider variety of phenols, albeit at the cost of interpretability. The results of this study lay the foundation for such work, and in the future it may be possible to develop a platform for comprehensively predicting and comparing the antioxidant capacities of a wide range of phenolic compounds in foods.
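The limitations of the classification metrics noted in this subsection can be made concrete with a short calculation from confusion-matrix counts. The sketch below uses the standard definitions of accuracy, F1, and MCC; the function name and the example counts are hypothetical and are not taken from the study's confusion matrix.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, F1 score, and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty;
    # 0.0 is returned here by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# A classifier that labels every compound "active" on an imbalanced set:
# accuracy and F1 look strong, whereas MCC reveals no discriminating power.
always_active = classification_metrics(tp=90, fp=10, fn=0, tn=0)

# F1 is unchanged by the number of true negatives.
few_negatives = classification_metrics(tp=40, fp=10, fn=10, tn=0)
many_negatives = classification_metrics(tp=40, fp=10, fn=10, tn=1000)
```

Comparing `few_negatives` and `many_negatives` shows why F1 alone cannot reflect how well inactive compounds are excluded, which is the motivation for reporting MCC alongside it.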
Previous studies (e.g., [15]) have shown that the sum of the antioxidant capacities of food components is almost equal to the antioxidant capacity of the food itself. We believe that by predicting the antioxidant capacity of food components, it is possible to predict the antioxidant capacity of the food itself.
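Under this additivity assumption, a food-level estimate reduces to a content-weighted sum over its components. The following sketch is illustrative only: it assumes strict additivity with no interaction between components, expresses each component's capacity as a TEAC value, and uses hypothetical component names, contents, and TEAC values.

```python
def estimate_food_capacity(components):
    """Content-weighted sum of per-compound capacities.

    components maps a compound name to a tuple of
    (content in umol per 100 g of food, TEAC in mol Trolox per mol compound).
    The result is in umol Trolox equivalents per 100 g of food.
    """
    return sum(content * teac for content, teac in components.values())


# Hypothetical composition of a food sample.
sample_food = {
    "compound_a": (10.0, 0.5),
    "compound_b": (4.0, 2.0),
}
total_capacity = estimate_food_capacity(sample_food)
```

A predicted per-compound value could be substituted for each measured TEAC once it is converted to the same unit; that conversion is outside the scope of this sketch.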

4. Conclusions

In this study, we established a foundation for quantitatively predicting the antioxidant capacity of numerous phenolic compounds, which affect the antioxidant capacity of foods. We measured the DPPH activity of 96 compounds present as food components and added the measurements to our existing dataset. Including previously reported data, we created a dataset of 274 compounds, which, to our knowledge, is the largest single-laboratory dataset. Quantum chemical calculations were performed for these compounds, and the calculated values were compared with the TEAC and pIC50 values. In addition, a DPPH activity prediction model was constructed using machine learning to predict the DPPH activity more accurately. The performance of this prediction model was satisfactory, and the model satisfied the Golbraikh–Tropsha criteria. Furthermore, by obtaining quantum chemical calculation values from PubChemQC, the constructed model was applied to other food components. To our knowledge, this is the largest study to quantitatively predict the DPPH activity of food components. Each food contains numerous components, and they need to be extracted individually to perform quantitative measurements of antioxidant capacity, making prediction challenging. This study contributes to the quantitative elucidation of the antioxidant capacities of foods containing various components.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cimb48010012/s1.

Author Contributions

Conceptualization, R.K. and H.G.; methodology, R.K.; software, R.K. and Y.M.; validation, R.K., C.T. and M.Y.; formal analysis, R.K. and C.T.; investigation, R.K., C.T. and M.Y.; resources, R.K.; data curation, R.K.; writing—original draft preparation, R.K.; writing—review and editing, R.K.; visualization, R.K.; supervision, H.G.; project administration, R.K.; funding acquisition, H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Japan Society for the Promotion of Science (JSPS) KAKENHI, grant number JP24K08785.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DPPH: 2,2-Diphenyl-1-Picrylhydrazyl
ROS: Reactive Oxygen Species
SAR: Structure–Activity Relationship
IP: Ionization Potential
NNs: Neural Networks
DMSO: Dimethyl Sulfoxide
Trolox: 6-Hydroxy-2,5,7,8-Tetramethyl-3,4-Dihydrochromene-2-Carboxylic Acid
TEAC: Trolox Equivalent Antioxidant Capacity
E_HOMO: Highest Occupied Molecular Orbital Energy
E_LUMO: Lowest Unoccupied Molecular Orbital Energy
MMFF: Merck Molecular Force Field
TPE: Tree-Structured Parzen Estimator
LOOCV: Leave-One-Out Cross-Validation
MCC: Matthews Correlation Coefficient
SHAP: SHapley Additive exPlanations
SVM: Support Vector Machine
RMSE: Root Mean Squared Error

References

1. Gulcin, İ. Antioxidants and antioxidant methods: An updated overview. Arch. Toxicol. 2020, 94, 651–715.
2. Ashok, A.; Andrabi, S.S.; Mansoor, S.; Kuang, Y.; Kwon, B.K.; Labhasetwar, V. Antioxidant therapy in oxidative stress-induced neurodegenerative diseases: Role of nanoparticle-based drug delivery systems in clinical translation. Antioxidants 2022, 11, 408.
3. Gulcin, İ. Antioxidants: A comprehensive review. Arch. Toxicol. 2025, 99, 1893–1997.
4. Apak, R.; Gorinstein, S.; Böhm, V.; Schaich, K.M.; Özyürek, M.; Güçlü, K. Methods of measurement and evaluation of natural antioxidant capacity/activity (IUPAC Technical Report). Pure Appl. Chem. 2013, 85, 957–998.
5. Kashino, I.; Mizoue, T.; Serafini, M.; Akter, S.; Sawada, N.; Ishihara, J.; Kotemori, A.; Inoue, M.; Yamaji, T.; Goto, A.; et al. Higher dietary non-enzymatic antioxidant capacity is associated with decreased risk of all-cause and cardiovascular disease mortality in Japanese adults. J. Nutr. 2019, 149, 1967–1976.
6. Pellegrini, N.; Vitaglione, P.; Granato, D.; Fogliano, V. Twenty-five years of total antioxidant capacity measurement of foods and biological fluids: Merits and limitations. J. Sci. Food Agric. 2020, 100, 5064–5078.
7. Muchmore, S.W.; Edmunds, J.J.; Stewart, K.D.; Hajduk, P.J. Cheminformatic tools for medicinal chemists. J. Med. Chem. 2010, 53, 4830–4841.
8. Parejo, I.; Viladomat, F.; Bastida, J.; Rosas-Romero, A.; Flerlage, N.; Burillo, J.; Codina, C. Comparison between the radical scavenging activity and antioxidant activity of six distilled and nondistilled mediterranean herbs and aromatic plants. J. Agric. Food Chem. 2002, 50, 6882–6890.
9. Brand-Williams, W.; Cuvelier, M.E.; Berset, C. Use of a free radical method to evaluate antioxidant activity. LWT—Food Sci. Technol. 1995, 28, 25–30.
10. Gülçin, I. Antioxidant activity of food constituents: An overview. Arch. Toxicol. 2012, 86, 345–391.
11. Xie, J.; Schaich, K.M. Re-Evaluation of the 2,2-Diphenyl-1-Picrylhydrazyl free radical (DPPH) assay for antioxidant activity. J. Agric. Food Chem. 2014, 62, 4251–4260.
12. Monika, B.; Klaudia, S.; Vanja, T.; Barbara, K.; Wojciech, C.; Sladjana, S.; Agnieszka, B. Interactions between bioactive components determine antioxidant, cytotoxic and nutrigenomic activity of cocoa powder extract. Free Radic. Biol. Med. 2020, 154, 48–61.
13. Tripti, J.; Kartik, A.; Manan, M.; Deepa, P.R.; Pankaj, K. Measurement of antioxidant synergy between phenolic bioactives in traditional food combinations (legume/non-legume/fruit) of (semi) arid regions: Insights into the development of sustainable functional foods. Discov. Food 2024, 4, 11.
14. Matsufuji, H.; Sasa, R.; Honma, Y.; Miyajima, H.; Chino, M.; Yamazaki, T.; Shimamura, T.; Ukeda, H.; Matsui, T.; Matsumoto, K.; et al. 1,1-diphenyl-2-picrylhydrazyl radical scavenging activity of binary mixtures of antioxidants. J. Jpn. Soc. Food Preserv. Sci. 2009, 56, 129–136.
15. Shimamura, T. Food chemical study on verification of assay for antioxidant capacity of food additive and its application. J. Jpn. Soc. Food Preserv. Sci. 2018, 44, 33–36.
16. Lang, S.; Liu, L.; Li, Z.; Liu, S.; Liang, J.; Lu, L.; Wang, L. Untargeted metabolomics reveals phenolic compound dynamics during mung bean fermentation. Food Chem. X 2025, 31, 103189.
17. Luisa, P.; Teresa, G.; Andrea, R.; Vincenzo, L.; Stanislaw, W.; Ryszard, A.; Magdalena, K. Characterization of antioxidant and antimicrobial activity and phenolic compound profile of extracts from seeds of different Vitis species. Molecules 2023, 28, 4924.
18. Sushila, S. In vitro antioxidant activity and total phenolic content of Digera muricata leaves. World J. Biol. Pharm. Health Sci. 2023, 14, 105–112.
19. Yamauchi, M.; Kitamura, Y.; Nagano, H.; Kawatsu, J.; Gotoh, H. DPPH measurements and structure–activity relationship studies on the antioxidant capacity of phenols. Antioxidants 2024, 13, 309.
20. Ordoudi, S.A.; Tsimidou, M.Z.; Vafiadis, A.P.; Bakalbassis, E.G. Structure−DPPH• scavenging activity relationships: Parallel study of catechol and guaiacol acid derivatives. J. Agric. Food Chem. 2006, 54, 5763–5768.
21. Laguna, O.; Durand, E.; Baréa, B.; Dauguet, S.; Fine, F.; Villeneuve, P.; Lecomte, J. Synthesis and evaluation of antioxidant activities of novel Hydroxyalkyl esters and Bis-Aryl esters based on sinapic and caffeic acids. J. Agric. Food Chem. 2020, 68, 9308–9318.
22. Ogata, M.; Hoshi, M.; Urano, S.; Endo, T. Antioxidant activity of eugenol and related monomeric and dimeric compounds. Chem. Pharm. Bull. 2000, 48, 1467–1469.
23. Dawidowicz, A.L.; Olszowy, M.; Jóźwik-Dolęba, M. Importance of solvent association in the estimation of antioxidant properties of phenolic compounds by the DPPH method. J. Food Sci. Technol. 2015, 52, 4523–4529.
24. Landrum, G.A.; Riniker, S. Combining IC50 or Ki values from different sources is a source of significant noise. J. Chem. Inf. Model. 2024, 64, 1560–1567.
25. Shimamura, T.; Matsuura, R.; Tokuda, T.; Sugimoto, N.; Yamazaki, T.; Matsufuji, H.; Matsui, T.; Matsumoto, K.; Ukeda, H. Comparison of Conventional Antioxidants Assays for Evaluating Potencies of Natural Antioxidants as Food Additives by Collaborative Study. J. Jpn. Soc. Food Sci. Technol. 2007, 54, 482–487.
26. Shimamura, T.; Sumikura, Y.; Yamazaki, T.; Tada, A.; Kashiwagi, T.; Ishikawa, H.; Matsui, T.; Sugimoto, N.; Akiyama, H.; Ukeda, H. Applicability of the DPPH assay for evaluating the antioxidant capacity of food additives—Inter-laboratory evaluation study. Anal. Sci. 2014, 30, 717–721.
27. Muramatsu, D.; Uchiyama, H.; Higashi, H.; Kida, H.; Iwai, A. Effects of heat degradation of betanin in red beetroot (Beta vulgaris L.) on biological activity and antioxidant capacity. PLoS ONE 2023, 18, e0286255.
28. Sofian, F.F.; Kikuchi, N.; Koseki, T.; Kanno, Y.; Uesugi, S.; Shiono, Y. Antioxidant p-terphenyl compound, isolated from edible mushroom Boletopsis leucomelas. Biosci. Biotechnol. Biochem. 2022, 86, 300–304.
29. Harunari, E.; Imada, C.; Igarashi, Y. Konamycins A and B and rubromycins CA1 and CA2, aromatic polyketides from the tunicate-derived Streptomyces hyaluromycini MB-PO13T. J. Nat. Prod. 2019, 82, 1609–1615.
30. Ullah, A.; Sun, L.; Wang, F.; Fei, N.; Nawaz, H.; Yamashita, K.; Cai, Y.; Anwar, F.; Khan, M.Q.; Mayakrishnan, G.; et al. Eco-friendly bioactive β-caryophyllene/halloysite nanotubes loaded nanofibrous sheets for active food packaging. Food Packag. Shelf Life 2023, 35, 101028.
31. Villaño, D.; Fernández-Pachón, M.S.; Moyá, M.L.; Troncoso, A.M.; García-Parrilla, M.C. Radical scavenging ability of polyphenolic compounds towards DPPH free radical. Talanta 2007, 71, 230–235.
32. Erkan, N.; Ayranci, G.; Ayranci, E. Antioxidant activities of rosemary (Rosmarinus officinalis L.) extract, blackseed (Nigella sativa L.) essential oil, carnosic acid, rosmarinic acid, and sesamol. Food Chem. 2008, 110, 76–82.
33. Yamauchi, M.; Kitamura, Y.; Tada, C.; Kato, R.; Gotoh, H. Kinetic and thermodynamic evaluation of antioxidant reactions: Factors influencing the radical scavenging properties of phenolic compounds in foods. J. Sci. Food Agric. 2025, 105, 8186–8195.
34. Qin, Y.; Deng, H.; Yan, H.; Zhong, R. An accurate nonlinear QSAR model for the antitumor activities of chloroethylnitrosoureas using neural networks. J. Mol. Graph. Model. 2011, 29, 826–833.
35. Zhao, D.; Zhang, Y.; Chen, Y.; Li, B.; Zhou, W.; Wang, L. Highly accurate and explainable predictions of small-molecule antioxidants for eight in vitro assays simultaneously through an alternating multitask learning strategy. J. Chem. Inf. Model. 2024, 64, 9098–9110.
36. Inoue, N.; Shibata, T.; Tanaka, Y.; Taguchi, H.; Sawada, R.; Goto, K.; Momokita, S.; Aoyagi, M.; Hirao, T.; Yamanishi, Y. Revealing comprehensive food functionalities and mechanisms of action through machine learning. J. Chem. Inf. Model. 2024, 64, 5712–5724.
37. Fujimoto, T.; Gotoh, H. Prediction and chemical interpretation of singlet-oxygen-scavenging activity of small molecule compounds by using machine learning. Antioxidants 2021, 10, 1751.
38. Kato, Y.; Hamada, S.; Goto, H. Validation study of QSAR/DNN models using the competition datasets. Mol. Inform. 2020, 39, 1900154.
39. Jorge, E.G.; Rayar, A.M.; Barigye, S.J.; Rodríguez, M.E.J.; Veitía, M.S.I. Development of an in silico model of DPPH• free radical scavenging capacity: Prediction of antioxidant activity of coumarin type compounds. Int. J. Mol. Sci. 2016, 17, 881.
40. Deng, J.; Yang, Z.; Wang, H.; Ojima, I.; Samaras, D.; Wang, F. A systematic study of key elements underlying molecular property prediction. Nat. Commun. 2023, 14, 6395.
41. Balagopalan, A.; Zhang, H.; Hamidieh, K.; Hartvigsen, T.; Rudzicz, F.; Ghassemi, M. The road to explainability is paved with bias: Measuring the fairness of explanations. ACM Int. Conf. Proc. Ser. 2022, 22, 1194–1206.
42. Tagade, P.M.; Adiga, S.P.; Park, M.S.; Pandian, S.; Hariharan, K.S.; Kolake, S.M. Empirical relationship between chemical structure and redox properties: Mathematical expressions connecting structural features to energies of frontier orbitals and redox potentials for organic molecules. J. Phys. Chem. C 2018, 122, 11322–11333.
43. FooDB. Available online: https://foodb.ca/ (accessed on 29 October 2025).
44. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
45. Nishii, T.; Ichizawa, K.; Nagano, H.; Mukai, H.; Sakaguchi, D.; Gotoh, H. Predicting substrate reactivity in oxidative homocoupling of phenols using positive and unlabeled machine learning. ACS Omega 2025, 10, 49805–49815.
46. Antioxidant Ability Assay DPPH Antioxidant Assay Kit Dojindo. Available online: https://www.dojindo.com/JP-EN/products/D678/ (accessed on 29 October 2025).
47. RDKit. Available online: https://www.rdkit.org/ (accessed on 29 October 2025).
48. Nakata, M.; Maeda, T. PubChemQC B3LYP/6-31G*//PM6 Data Set: The electronic structures of 86 million molecules using B3LYP/6-31G* Calculations. J. Chem. Inf. Model. 2023, 63, 5734–5754.
49. PubChem. Available online: https://pubchem.ncbi.nlm.nih.gov/ (accessed on 29 October 2025).
50. Riniker, S.; Landrum, G.A. Better informed distance geometry: Using what we know to improve conformation generation. J. Chem. Inf. Model. 2015, 55, 2562–2574.
51. Halgren, T.A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 1996, 17, 490–519.
52. Frisch, M.J.; Trucks, G.W.; Schlegel, H.B.; Scuseria, G.E.; Robb, M.A.; Cheeseman, J.R.; Scalmani, G.; Barone, V.; Petersson, G.A.; Nakatsuji, H.; et al. Gaussian 16; Revision A; Gaussian Inc.: Wallingford, CT, USA, 2016.
53. Marenich, A.V.; Cramer, C.J.; Truhlar, D.G. Universal solvation model based on solute electron density and on a continuum model of the solvent defined by the bulk dielectric constant and atomic surface tensions. J. Phys. Chem. B 2009, 113, 6378–6396.
54. Moriwaki, H.; Tian, Y.S.; Kawashita, N.; Takagi, T. Mordred: A molecular descriptor calculator. J. Cheminform. 2018, 10, 4.
55. Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754.
56. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 25 July 2019; pp. 2623–2631.
57. Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for hyper-parameter optimization. Adv. Neural Inf. Process Syst. 2011, 24, 2546–2554.
58. Bradley, A.P. The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159.
59. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
60. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4766–4775.
61. Keany, E. BorutaShap: A wrapper feature selection method which combines the boruta feature selection algorithm with shapley values. (Version 1.1) [Software]. Zenodo 2020.
62. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
63. Vlocskó, R.B.; Mastyugin, M.; Török, B.; Török, M. Correlation of physicochemical properties with antioxidant activity in phenol and thiophenol analogues. Sci. Rep. 2025, 15, 73.
64. Chen, J.; Yang, J.; Ma, L.; Li, J.; Shahzad, N.; Kim, C.K. Structure-antioxidant activity relationship of methoxy, phenolic hydroxyl, and carboxylic acid groups of phenolic acids. Sci. Rep. 2020, 10, 2611; Correction in Sci. Rep. 2020, 10, 5666. https://doi.org/10.1038/s41598-020-62493-y.
65. Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model. 2002, 20, 269–276.
66. Craft, B.D.; Kerrihard, A.L.; Amarowicz, R.; Pegg, R.B. Phenol-based antioxidants and the in vitro methods used for their assessment. Compr. Rev. Food Sci. Food Saf. 2012, 11, 148–173.
67. Luo, J.; Xue, Z.Q.; Liu, W.M.; Wu, J.L.; Yang, Z.Q. Koopmans’ theorem for large molecular systems within density functional theory. J. Phys. Chem. A 2006, 110, 12005–12009.
68. Ichikawa, K.; Sasada, R.; Chiba, K.; Gotoh, H. Effect of side chain functional groups on the DPPH radical scavenging activity of bisabolane-type phenols. Antioxidants 2019, 8, 65.
Figure 1. Workflow of the study.
Figure 2. FooDB [43]-registered compounds analyzed in this study.
Figure 3. Relationship between compounds containing one phenolic hydroxyl group in the measurement dataset and compounds containing one phenolic hydroxyl group in the FooDB dataset. (a): Molecular weight-MolLogP plot, (b): MolLogP histogram, (c): molecular weight histogram. Histograms were stacked, with the total height representing the total number of samples.
Figure 4. (a): Partial representation of feature selection and importance analysis using Boruta-Shap. (b): Distribution of E_HOMO_PubChemQC in the measurement dataset and FooDB dataset. (c): pIC50-E_HOMO_PubChemQC plot for compounds with one phenolic hydroxyl group in the measurement dataset. (d): pIC50-E_HOMO_PubChemQC plot for compounds with two or more phenolic hydroxyl groups in the measurement dataset. Histograms are stacked, and the total height represents the total number of samples.
Figure 5. (a): Confusion matrix of LGBM_ECFP on test data. (b): SHAP (SHapley Additive exPlanations) [60] analysis of LGBM_ECFP and top four sub-structures identified as highly important. (c-1,c-2): Important structural analysis of molecules predicted as active by LGBM_ECFP. (d-1,d-2): Important structural analysis of molecules predicted as inactive by LGBM_ECFP. In panels (c-1,c-2,d-1,d-2), red indicates structural features contributing positively to the prediction of activity, whereas blue indicates features contributing negatively; darker colors correspond to larger contributions.
Figure 6. (a): YY plot of SVM_PubChemQC, (b): SHAP summary plot of SVM_PubChemQC, (c): SHAP analysis of 2,6-dimethoxyphenol, (d): SHAP analysis of 4-hydroxybenzoic acid butyl ester.
Figure 7. Some of the food component compounds whose DPPH activity was predicted.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kato, R.; Tada, C.; Yamauchi, M.; Matsumoto, Y.; Gotoh, H. DPPH Measurement for Phenols and Prediction of Antioxidant Activity of Phenolic Compounds in Food. Curr. Issues Mol. Biol. 2026, 48, 12. https://doi.org/10.3390/cimb48010012
