Author Contributions
Conceptualization, A.J.B.L., B.Z., D.W.L., J.J.L., S.E.J. and C.S.M.; methodology, A.J.B.L., B.Z., D.W.L. and C.S.M.; software, A.J.B.L. and C.S.M.; validation, A.J.B.L., B.Z., J.J.L., S.E.J., D.W.L. and C.S.M.; formal analysis, A.J.B.L., B.Z. and C.S.M.; investigation, D.W.L. and C.S.M.; resources, C.S.M.; data curation, A.J.B.L.; writing—original draft preparation, A.J.B.L., B.Z. and C.S.M.; writing—review and editing, A.J.B.L., B.Z., J.J.L., S.E.J., D.W.L. and C.S.M.; visualization, A.J.B.L., B.Z. and C.S.M.; supervision, C.S.M.; project administration, D.W.L., J.J.L. and C.S.M.; funding acquisition, C.S.M. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Rey Complex Figure (CF) scoring items. The left panel shows the Rey Complex Figure that participants reproduced during the copy, immediate recall, and delayed recall phases of the test. Each numbered element corresponds to a distinct structural component of the figure, summarized in the reference table on the right. The numbered labels are included to help readers identify component features and were not present on the version shown to participants.
Figure 2.
Pipeline for feature extraction and predictive analysis of CF scores. The top branch represents the statistical analysis pipeline, which included significance testing via t-tests and ANOVAs on the raw data, followed by linear regression to isolate the primary demographic variable of interest (education, sex, or age). The bottom branch illustrates the machine learning pipeline, which identifies the most important CF or MoCA features to assess the isolated effect of education, sex, or age.
Figure 3.
Correlation Matrix of Demographic Target Relationships. The correlation matrix shows near-zero associations: ρ (age, education) = 0.03, ρ (age, sex) = −0.05, and ρ (sex, education) = 0.01.
Figure 4.
Classifier Performance by Target and Metric. Panels (a–c) show the performance of three classifiers—SVM (a), Logistic Regression (b), and Random Forest (c)—in predicting demographic targets. Each panel displays classifier performance across Accuracy, Precision, and Recall for three target variables: Education (Ed), Sex, and Age. Logistic Regression (b) achieved the highest accuracy for predicting both Education and Age, while Random Forest (c) outperformed other classifiers in predicting Sex, showing both the highest accuracy and a marked increase in precision.
Figure 5.
Overlap of Top 15 Predictive Features Across Classification Targets. The Venn diagram illustrates the overlap of the top 15 features contributing to the classification of Education, Sex, and Age. Each circle represents the set of features most important for a given demographic classification task, with overlapping regions indicating features shared across tasks. Features placed in the center are predictive across all three targets, while features located in the pairwise overlaps are shared between two demographic targets. The external legend provides color-coded annotations for individual targets (Education, Sex, Age) as well as their intersections (Education and Sex, Education and Age, Sex and Age, and all three combined). Key for abbreviations in this figure: Imm_1 = Imm_Vertical_Cross, Imm_6 = Imm_Small_Rectangle, Imm_8 = Imm_Four_Parallel_Lines, Imm_11 = Imm_Circle_with_Three_Dots, Imm_17 = Imm_Horizontal_Cross, Imm_18 = Imm_Square_attached_to_Large_Rectangle, Del_9 = Del_Small_Triangle_above_Large_Rectangle, Del_12 = Del_Five_Parallel_Lines, Del_18 = Del_Square_attached_to_Large_Rectangle.
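The regions of a three-set Venn diagram like the one described above can be derived with plain set operations. The following is a minimal sketch, not the authors' code: the function name `venn_regions` is hypothetical, and the feature sets are placeholders (`f1` to `f7`) rather than the study's top-15 lists.

```python
def venn_regions(education: set, sex: set, age: set) -> dict:
    """Split three feature sets into the seven regions of a three-set Venn diagram."""
    return {
        "all_three": education & sex & age,
        "education_and_sex": (education & sex) - age,
        "education_and_age": (education & age) - sex,
        "sex_and_age": (sex & age) - education,
        "education_only": education - sex - age,
        "sex_only": sex - education - age,
        "age_only": age - education - sex,
    }


# Hypothetical placeholder feature sets (NOT the study's results).
top_education = {"f1", "f2", "f3", "f4"}
top_sex = {"f1", "f2", "f5", "f6"}
top_age = {"f1", "f3", "f5", "f7"}

regions = venn_regions(top_education, top_sex, top_age)
for name, features in regions.items():
    print(name, sorted(features))
```

Each feature lands in exactly one region, so the seven region sizes sum to the number of distinct features across the three targets.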
Table 1.
Copy, immediate recall, and delayed recall subscores with coded CF component identifiers. The table maps the coded variables to the Rey Complex Figure (CF) subscores for the three test phases: copy, immediate recall, and delayed recall. Numbers in parentheses correspond to the coded component identifiers shown in Figure 1. In the dataset and results, each CF subscore is denoted X_n, where the prefix X is Copy, Imm, or Del according to the test phase and n is the component number.
| Coded Variable X (Copy_, Imm_, Del_) | Actual CF Subscore |
|---|---|
| X_1 | Vertical Cross |
| X_2 | Large Rectangle |
| X_3 | Diagonal Cross |
| X_4 | Horizontal Midline of Large Rectangle (2) |
| X_5 | Vertical Midline of Large Rectangle (2) |
| X_6 | Small Rectangle |
| X_7 | Small Horizontal Line above Small Rectangle (6) |
| X_8 | Four Parallel Lines |
| X_9 | Small Triangle above Large Rectangle (2) |
| X_10 | Small Vertical Line within Large Rectangle (2) |
| X_11 | Circle with Three Dots |
| X_12 | Five Parallel Lines |
| X_13 | Sides of Large Triangle attached to Large Rectangle |
| X_14 | Diamond |
| X_15 | Vertical Line within Sides of Large Triangle (13) |
| X_16 | Horizontal Line within Sides of Large Triangle (13) |
| X_17 | Horizontal Cross |
| X_18 | Square attached to Large Rectangle (2) |
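The coding scheme in Table 1 can be sketched as a small lookup. This is a minimal illustration, not the authors' code: the function name `expand_code` is hypothetical, and the underscore-joined output format is assumed from the abbreviation key in the Figure 5 caption (for example, Imm_17 = Imm_Horizontal_Cross), which omits the parenthetical component references.

```python
# Component names from Table 1, with parenthetical cross-references removed.
CF_COMPONENTS = {
    1: "Vertical Cross",
    2: "Large Rectangle",
    3: "Diagonal Cross",
    4: "Horizontal Midline of Large Rectangle",
    5: "Vertical Midline of Large Rectangle",
    6: "Small Rectangle",
    7: "Small Horizontal Line above Small Rectangle",
    8: "Four Parallel Lines",
    9: "Small Triangle above Large Rectangle",
    10: "Small Vertical Line within Large Rectangle",
    11: "Circle with Three Dots",
    12: "Five Parallel Lines",
    13: "Sides of Large Triangle attached to Large Rectangle",
    14: "Diamond",
    15: "Vertical Line within Sides of Large Triangle",
    16: "Horizontal Line within Sides of Large Triangle",
    17: "Horizontal Cross",
    18: "Square attached to Large Rectangle",
}
PHASES = ("Copy", "Imm", "Del")


def expand_code(code: str) -> str:
    """Turn a coded identifier such as 'Imm_17' into 'Imm_Horizontal_Cross'."""
    phase, _, number = code.partition("_")
    if phase not in PHASES:
        raise ValueError(f"unknown test phase in {code!r}")
    name = CF_COMPONENTS[int(number)]
    return f"{phase}_{name.replace(' ', '_')}"


print(expand_code("Imm_17"))
print(expand_code("Del_9"))
```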
Table 2.
VIF and tolerance values for demographic variables.
| Variable | VIF | Tolerance |
|---|---|---|
| Age | 1.142 | 0.876 |
| Sex | 1.135 | 0.881 |
| Education | 1.110 | 0.901 |
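Tolerance is the reciprocal of the VIF, so the two columns of Table 2 carry the same information. The short check below recomputes tolerance from the tabulated VIF values; the variable names are illustrative, and the implied R² (one minus tolerance) is a derived quantity that the table does not report.

```python
# VIF values as reported in Table 2.
vif = {"Age": 1.142, "Sex": 1.135, "Education": 1.110}

# Tolerance = 1 / VIF, rounded to the table's three decimals.
tolerance = {name: round(1.0 / value, 3) for name, value in vif.items()}

# Implied R^2 of each predictor regressed on the others: R^2 = 1 - tolerance.
implied_r2 = {name: round(1.0 - 1.0 / value, 3) for name, value in vif.items()}

for name in vif:
    print(f"{name}: VIF={vif[name]:.3f}, tolerance={tolerance[name]:.3f}, "
          f"implied R^2={implied_r2[name]:.3f}")
```

The recomputed tolerances match the table, and every VIF sits close to 1, the value it takes when a predictor is uncorrelated with the others.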
Table 3.
Demographic features and sample sizes.
| Demographic Feature | Sample Size (N) | % of Study |
|---|---|---|
| Male | 271 | 29.27 |
| Female | 655 | 70.73 |
| White | 850 | 91.79 |
| Black | 67 | 7.24 |
| Asian | 9 | 0.97 |
| Age 1 (45–60 years) | 327 | 35.31 |
| Age 2 (60–67 years) | 326 | 35.21 |
| Age 3 (67–80 years) | 273 | 29.48 |
| Education 1 (10–16 years) | 152 | 16.41 |
| Education 2 (16–18 years) | 367 | 39.63 |
| Education 3 (18–22 years) | 407 | 43.95 |
| Total | 926 | |
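The percentages in Table 3 follow directly from the group counts and the total of 926. As a consistency check, the sketch below recomputes them and confirms that each stratification partitions the full sample; the helper name `percent_of_study` is illustrative.

```python
TOTAL = 926

# Group counts as reported in Table 3.
groups = {
    "Sex": {"Male": 271, "Female": 655},
    "Race": {"White": 850, "Black": 67, "Asian": 9},
    "Age": {"Age 1": 327, "Age 2": 326, "Age 3": 273},
    "Education": {"Education 1": 152, "Education 2": 367, "Education 3": 407},
}


def percent_of_study(count: int, total: int = TOTAL) -> float:
    """Share of the study sample, in percent, rounded to two decimals."""
    return round(100.0 * count / total, 2)


for factor, counts in groups.items():
    # Each stratification should account for every participant exactly once.
    assert sum(counts.values()) == TOTAL, factor
    for label, n in counts.items():
        print(f"{label}: N={n}, {percent_of_study(n):.2f}%")
```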
Table 4.
Descriptive statistics of CF scores, subtest scores, and subtest drawing times (copy, immediate recall, and delayed recall) across demographic groups. The table presents the mean ± standard deviation for Copy Sum, Immediate Sum, Delayed Sum, and MoCA scores, along with average completion times (in seconds) for the copy, immediate recall, and delayed recall tasks. Data are stratified by sex, race, age group, and education level.
| Demographic Group | Copy Sum | Immediate Sum | Delayed Sum | MoCA | Copy Time (s) | Immediate Time (s) | Delayed Time (s) |
|---|---|---|---|---|---|---|---|
| Male | 32.45 ± 3.08 | 18.70 ± 6.58 | 17.67 ± 6.48 | 26.79 ± 1.69 | 164.48 | 129.79 | 105.58 |
| Female | 32.21 ± 3.22 | 17.91 ± 6.05 | 16.78 ± 6.30 | 27.31 ± 1.78 | 160.00 | 129.69 | 103.14 |
| Age 1 | 32.51 ± 3.12 | 19.34 ± 6.19 | 18.10 ± 6.34 | 27.48 ± 1.77 | 166.28 | 135.64 | 106.66 |
| Age 2 | 32.23 ± 3.22 | 18.00 ± 6.06 | 16.99 ± 6.19 | 27.01 ± 1.77 | 156.53 | 125.61 | 102.65 |
| Age 3 | 32.08 ± 3.21 | 16.89 ± 6.20 | 15.84 ± 6.39 | 26.95 ± 1.73 | 161.06 | 127.53 | 101.93 |
| Education 1 | 31.43 ± 3.42 | 16.26 ± 6.53 | 15.47 ± 6.95 | 26.55 ± 1.73 | 161.53 | 128.22 | 102.30 |
| Education 2 | 32.42 ± 3.08 | 18.49 ± 6.15 | 17.40 ± 6.30 | 27.16 ± 1.77 | 159.32 | 129.29 | 105.14 |
| Education 3 | 32.48 ± 3.14 | 18.54 ± 6.04 | 17.30 ± 6.11 | 27.39 ± 1.74 | 163.02 | 130.67 | 103.27 |
Table 5.
p-values from statistical tests assessing the relationship between demographic variables (education, sex, age) and the CF summary scores (Copy_Sum, Imm_Sum, Del_Sum) and MoCA in the raw data. The raw data p-values show several statistically significant relationships (p < 0.05), particularly between education and all CF scores, as well as between age and immediate recall, delayed recall, and MoCA.
| Independent Variable | Dependent Variable | Raw Data p-Value |
|---|---|---|
| Education | Copy_Sum | 0.0027 |
| Education | Imm_Sum | 0.0006 |
| Education | Del_Sum | 0.0097 |
| Education | MoCA | 9.9 × 10⁻⁷ |
| Sex | Copy_Sum | 0.2671 |
| Sex | Imm_Sum | 0.0471 |
| Sex | Del_Sum | 0.0331 |
| Sex | MoCA | 0.0001 |
| Age | Copy_Sum | 0.0747 |
| Age | Imm_Sum | 4.85 × 10⁻⁷ |
| Age | Del_Sum | 6.49 × 10⁻⁶ |
| Age | MoCA | 0.0001 |
Table 6.
Ten highest-ranked features from random forest classifiers trained to predict education, sex, and age. Feature ranks were derived from model-specific importance scores, where lower rank values indicate greater importance. The “Overall Rank” column is based on the sum of each feature's ranks across all three tasks and highlights features that are consistently important for prediction. Notably, Copy_time, Imm_time, and MoCA emerged as the three most consistently influential features across the demographic classification targets.
| Feature | Rank in Ed | Rank in Sex | Rank in Age | Overall Rank |
|---|---|---|---|---|
| Copy_time | 2 | 2 | 2 | 1 |
| Imm_time | 1 | 5 | 1 | 2 |
| MoCA | 4 | 1 | 4 | 3 |
| Del_time | 5 | 6 | 3 | 4 |
| Imm_Sum | 3 | 8 | 5 | 5 |
| Del_Sum | 7 | 4 | 6 | 6 |
| MoCA_EF | 8 | 3 | 8 | 7 |
| Copy_Sum | 6 | 7 | 7 | 8 |
| MoCA_Mem | 9 | 10 | 9 | 9 |
| Del_17 | 15 | 13 | 10 | 10 |
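The rank aggregation described in the Table 6 caption can be reproduced from the per-task ranks. The sketch below assumes that the overall rank is simply the ordering by rank sum, which is consistent with the values shown but is not confirmed as the authors' exact procedure; the variable names are illustrative.

```python
# Per-task ranks (Education, Sex, Age) as reported in Table 6.
ranks = {
    "Copy_time": (2, 2, 2),
    "Imm_time": (1, 5, 1),
    "MoCA": (4, 1, 4),
    "Del_time": (5, 6, 3),
    "Imm_Sum": (3, 8, 5),
    "Del_Sum": (7, 4, 6),
    "MoCA_EF": (8, 3, 8),
    "Copy_Sum": (6, 7, 7),
    "MoCA_Mem": (9, 10, 9),
    "Del_17": (15, 13, 10),
}

# Sum the ranks across the three tasks; a lower sum means more consistent importance.
rank_sums = {feature: sum(r) for feature, r in ranks.items()}

# Order features by rank sum and assign overall ranks starting at 1.
ordered = sorted(rank_sums, key=rank_sums.get)
overall_rank = {feature: i for i, feature in enumerate(ordered, start=1)}

for feature in ordered:
    print(f"{overall_rank[feature]:>2}  {feature:<10} sum of ranks = {rank_sums[feature]}")
```

Under this assumption the computed ordering matches the "Overall Rank" column, with no ties among the ten rank sums.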