1. Introduction
Amino acids are the fundamental building blocks of proteins and play essential roles in numerous physiological processes, including protein synthesis, energy metabolism, immune function, and cellular signaling [
1]. Beyond their structural roles, individual amino acids have distinct metabolic functions and are involved in various physiological processes related to health and disease [
2,
3]. Accordingly, growing evidence suggests that the quantity and composition of dietary amino acids, rather than total protein intake alone, are associated with chronic disease risk, metabolic health, and aging-related outcomes [
4,
5,
6].
Accurate assessment of dietary amino acid intake is therefore essential in nutritional epidemiology. However, estimating amino acid intake at the population level remains methodologically challenging. Unlike energy or macronutrients, amino acid composition data are not comprehensively available for all foods, and protein values reported in food composition tables are often calculated using nitrogen conversion factors [
7]. As a result, total protein intake alone may be insufficient to accurately characterize dietary protein quality, as differences in individual amino acid composition and digestibility can influence the nutritional contribution of protein-containing foods [
8].
The construction of an amino acid database is a critical step in enabling amino-acid-specific dietary assessment. Such databases allow researchers to translate food intake data into quantitative estimates of individual amino acid intake, thereby facilitating more detailed analyses of diet–disease relationships [
9]. This need is particularly pronounced in large-scale epidemiological studies that aim to evaluate long-term dietary exposure, where systematic errors in nutrient estimation can influence observed associations [
10].
Early studies demonstrated the feasibility of estimating dietary amino acid intake through the development of amino acid composition databases linked to dietary intake data [
9]. More recent research has expanded this approach by integrating nationally representative dietary survey data with comprehensive food composition databases to estimate population-level amino acid intake and examine temporal trends and major food sources [
11,
12]. Despite these advances, relatively few studies have focused on the development of standardized, rule-based algorithms for systematically assigning amino acid composition data to dietary intake instruments, particularly in the context of food matching, substitution, and the treatment of incomplete amino acid composition data.
The Korean Genome and Epidemiology Study (KoGES) is a large, population-based cohort designed to investigate the incidence and determinants of chronic diseases through comprehensive assessments of dietary intake, lifestyle factors, clinical indicators, and genetic information. Although the KoGES food composition database has been periodically updated and partial amino acid information has been incorporated, neither a standardized, reproducible, algorithm-based framework for estimating amino acid intake from food frequency questionnaire (FFQ) data nor consistent coverage of a broad range of amino acids has yet been fully established.
Therefore, the present study aimed to develop and validate a standardized, rule-based algorithm for food matching and substitution of amino acid composition data based on the KoGES FFQ. Using this algorithm, we constructed a comprehensive amino acid database and evaluated its coverage and applicability by applying it to data from the KoGES Ansan and Ansung cohorts. This study provides a transparent and reproducible methodological framework for amino-acid-specific dietary assessment in large-scale cohort and dietary survey research.
2. Materials and Methods
2.1. Study Population and Dietary Assessment
The KoGES is a large, population-based cohort established to investigate the incidence and determinants of chronic diseases among Korean adults. The present study used data from the Ansan and Ansung cohorts, applying dietary information collected during the second follow-up (Phase 3) survey conducted in 2005–2006 to examine the applicability of the constructed amino acid database in cohort-based dietary assessment. This survey wave was selected because it was the first assessment to employ the expanded 106-item FFQ, which was revised in 2004 to improve dietary coverage compared with the original 103-item FFQ developed in 2001 [
13].
Dietary intake was assessed using a validated semi-quantitative FFQ developed for KoGES. The second follow-up survey employed the 106-item FFQ, designed to assess usual dietary intake over the previous year. Amino acid intake estimation was based on 475 food items derived from the KoGES FFQ recipe database, to which the constructed amino acid database was applied. This study received an exemption from the Institutional Review Board of Sangmyung University (IRB-SMU-S2024-1-005), in accordance with the Bioethics and Safety Act, as it involved the secondary analysis of de-identified data.
2.2. Algorithm for Food Matching and Substitution in Amino Acid Database Construction
An amino acid database was constructed to enable the estimation of dietary amino acid intake from the KoGES FFQ. A standardized, rule-based algorithm was applied to assign amino acid composition values to FFQ food items through a sequential decision process. The algorithm was based on established procedures for food composition database construction and on previous studies of nutrient matching and substitution, with adaptations to reflect the structure and characteristics of the KoGES FFQ [
9,
11,
12,
14,
15].
First, FFQ food items were matched to food composition database entries based on exact concordance of food names, which served as an initial screening criterion. When food names were concordant, the consistency of preparation and processing forms (e.g., raw, boiled, fried, or dried) was examined, as differences in preparation methods may substantially affect nutrient composition [
9,
14]. For food items with identical preparation forms, nutritional similarity was further evaluated by comparing energy, carbohydrate, protein, fat, and moisture contents. If differences in these components were within ±20%, analytically measured amino acid values were directly assigned to the corresponding FFQ food item [
9,
14,
15]. The ±20% threshold was selected based on previous studies and established practices in food composition database matching, which consider this range acceptable for identifying nutritionally comparable foods while accounting for variability due to food composition, preparation methods, and analytical measurement [
14].
If differences in energy or macronutrient composition exceeded ±20% despite identical preparation forms, the algorithm evaluated the availability of analytically measured amino acid data for nutritionally similar foods with different preparation forms. When such data were available, amino acid values were estimated using calculated values derived from analytically measured data of similar foods, in accordance with the predefined algorithmic rules. When neither directly matched analytical values nor analytically measured values from comparable foods were available, the assignment of amino acid values for the corresponding FFQ food item was temporarily deferred at this stage of the database construction process. Food items with incomplete or inconsistent compositional data were handled according to the predefined rule-based algorithm, which applies stepwise criteria for food matching, substitution, and calculation. This approach was intended to use the best available data while maintaining consistency in the assignment process.
When food names were not concordant, or when food names were concordant but preparation or processing forms were not identical, the algorithm next evaluated the availability of analytically measured amino acid data for similar food items. If analytical data for a similar food item were available and differences in energy, carbohydrate, protein, fat, and moisture contents were within ±20%, the corresponding amino acid values were directly assigned and classified as substituted values.
If analytical data for a similar food item were available but nutrient differences exceeded ±20%, amino acid values were estimated using calculated values derived from the analytical data of the similar food item, in accordance with the algorithm-defined decision rules. If no analytically measured amino acid data were available for any similar food item, amino acid database construction for the corresponding FFQ food item was temporarily deferred.
In this algorithm, similar foods were defined based on food characteristics and data availability. For natural foods, similarity was determined by comparing taxonomic characteristics (species, genus, or family) using domestic and international analytical data. For processed foods, similarity was determined based on comparable food types, with consideration of the primary ingredient and its protein content.
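The decision sequence described in this section can be sketched in a few lines of Python. This is an illustrative sketch rather than the study's implementation: the function names, the boolean inputs, and the choice to express each difference relative to the FFQ item's value are assumptions made here, since the text does not state the denominator used for the ±20% comparison.

```python
COMPONENTS = ("energy", "carbohydrate", "protein", "fat", "moisture")
TOLERANCE = 0.20  # the ±20% threshold described in the text


def within_tolerance(target: dict, candidate: dict, tol: float = TOLERANCE) -> bool:
    """Return True if every component of `candidate` lies within ±tol of `target`.

    Assumption: differences are taken relative to the FFQ item's (target) value.
    """
    for name in COMPONENTS:
        t, c = target[name], candidate[name]
        if t == 0:
            # A relative difference is undefined; require an exact match.
            if c != 0:
                return False
            continue
        if abs(c - t) / abs(t) > tol:
            return False
    return True


def assign_pathway(name_match: bool, same_preparation: bool,
                   match_within_tol: bool, similar_available: bool,
                   similar_within_tol: bool) -> str:
    """Classify one FFQ item following the sequential rules in Section 2.2.

    All five flags are hypothetical inputs that a matching step would supply.
    """
    if name_match and same_preparation:
        if match_within_tol:
            return "analytical"  # analytical values assigned directly
        # Nutrients differ by more than the threshold: fall back to similar foods.
        return "calculated" if similar_available else "deferred"
    # Name or preparation form not concordant: look for a similar food.
    if not similar_available:
        return "deferred"
    return "substituted" if similar_within_tol else "calculated"
```

Keeping the tolerance check separate from the pathway logic means the threshold or the set of compared components can be changed without touching the decision rules.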
2.3. Definition and Calculation of Total and Essential Amino Acids
Total amino acids (TAA) and essential amino acids (EAA) were calculated based on the definition and composition criteria adopted in the Korean Food Composition Table provided by the Rural Development Administration (RDA). Essential amino acids are defined as amino acids that cannot be synthesized endogenously in sufficient amounts and must therefore be obtained through the diet, as described in established nutritional guidelines. This approach was adopted because the Korean Food Composition Table represents the national standard for nutrient composition data and provided the highest proportion of available amino acid information among the data sources used in this study. Specifically, TAA was defined as the sum of the following 19 amino acids: isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, histidine, arginine, tyrosine, cysteine, alanine, aspartic acid, glutamic acid, glycine, proline, and serine. EAA was defined as the sum of isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, histidine, and arginine, consistent with the RDA classification, which includes histidine and arginine as essential amino acids during infancy and early growth; this classification was retained for consistency with the data source. For each FFQ food item, TAA and EAA were calculated by summing the available individual amino acid values. All individual amino acids, as well as TAA and EAA, were expressed in milligrams (mg), following the unit conventions of the Korean Food Composition Table.
When composition data for one or more constituent amino acids were unavailable for a given food item, TAA and EAA were computed using the sum of the remaining available amino acids, following the approach applied in the Korean Food Composition Table. As a result, TAA and EAA values were successfully derived for all 475 FFQ food items, yielding complete coverage for both indices despite partial missingness in specific amino acids such as taurine.
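The summation rule above, in which missing constituents are skipped rather than treated as a reason to drop the food item, can be illustrated as follows. The record layout (a mapping from amino acid name to a value in mg, with `None` marking a missing value) and the function names are assumptions for illustration; the EAA list is the one given in the text.

```python
from typing import Iterable, Mapping, Optional

# Essential amino acids as listed in the text (RDA classification).
EAA_NAMES = (
    "isoleucine", "leucine", "lysine", "methionine", "phenylalanine",
    "threonine", "tryptophan", "valine", "histidine", "arginine",
)


def sum_available(values: Mapping[str, Optional[float]],
                  names: Optional[Iterable[str]] = None) -> float:
    """Sum the amino acid values (mg) that are present, skipping missing ones.

    If `names` is None, all entries in `values` are summed; this assumes the
    record holds only the constituent amino acids of the index being computed.
    """
    keys = values.keys() if names is None else names
    return sum(v for v in (values.get(k) for k in keys) if v is not None)


# Hypothetical food record with one missing value.
record = {"leucine": 100.0, "lysine": 50.0, "tryptophan": None, "glycine": 30.0}
taa = sum_available(record)             # all available constituents
eaa = sum_available(record, EAA_NAMES)  # essential amino acids only
```

One consequence of this rule is that an index computed from an incomplete profile is a lower bound on the value that a complete profile would give.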
2.4. Amino Acid Data Sources and Nitrogen-Based Estimation
Amino acid composition data were obtained from multiple domestic and international sources, including the Korean Food Composition Table (10th revision and revision 10.3) [
16,
17], the Standard Marine Products Composition Table (2018) [
18], the Standard Processed Seafood Composition Table (2023) [
19], the Standard Tables of Food Composition in Japan (7th revised edition) [
20], the USDA FoodData Central database [
21], and relevant published literature [
22,
23,
24,
25,
26]. When multiple data sources were available for a given food item, analytically measured values from domestic databases were prioritized.
For food items without direct amino acid composition data, nitrogen content was derived from protein values using food-group-specific nitrogen-to-protein conversion factors. Nitrogen content was calculated as shown in Equation (1):
\[ \text{Nitrogen (g)} = \frac{\text{Protein (g)}}{\text{Nitrogen-to-protein conversion factor}} \tag{1} \]
The nitrogen-to-protein conversion factors applied in this study were based on those used in the 9th revision of the Korean Food Composition Table, which were derived from the Standard Tables of Food Composition in Japan (7th revised edition). When amino acid values were estimated by substitution, a nitrogen correction approach was applied using the following Equation (2):
For natural foods, substitution was performed based on domestic analytical data with consideration of species and cultivar similarity. For processed foods, substitution was conducted using nutritionally similar food types, with additional consideration of nitrogen content and processing characteristics.
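The two nitrogen steps in this section can be sketched as follows. The first function follows directly from the definition of a nitrogen-to-protein conversion factor (protein = nitrogen × factor). The second is an assumption: it takes the nitrogen correction to be a proportional scaling of the reference food's amino acid value by the ratio of target to reference nitrogen content, which is a common form of such corrections but is not confirmed by the text. The function names and the example values are hypothetical.

```python
def nitrogen_from_protein(protein_g: float, conversion_factor: float) -> float:
    """Nitrogen content (g) implied by a protein value and its conversion factor."""
    if conversion_factor <= 0:
        raise ValueError("conversion factor must be positive")
    return protein_g / conversion_factor


def nitrogen_corrected_value(aa_reference_mg: float,
                             nitrogen_reference_g: float,
                             nitrogen_target_g: float) -> float:
    """Scale a reference food's amino acid value to the target food's nitrogen.

    Assumed form: AA_target = AA_reference * (N_target / N_reference).
    """
    if nitrogen_reference_g <= 0:
        raise ValueError("reference nitrogen must be positive")
    return aa_reference_mg * (nitrogen_target_g / nitrogen_reference_g)


# Illustrative numbers only; 6.25 is the general-purpose factor, whereas the
# study applied food-group-specific factors.
n_target = nitrogen_from_protein(12.5, 6.25)
estimated_mg = nitrogen_corrected_value(500.0, nitrogen_reference_g=4.0,
                                        nitrogen_target_g=n_target)
```

Under this assumed form, the amino acid pattern of the reference food is preserved and only its overall level is adjusted to the nitrogen content of the target food.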
2.5. Database Quality Control and Application
Assigned amino acid values were classified into three categories according to their derivation: analytically measured values, calculated values, and substituted values. Coverage rates were calculated for each amino acid to assess database completeness, and all procedures were reviewed to minimize potential misclassification.
The constructed amino acid database was applied to dietary data from the KoGES Ansan and Ansung cohorts to estimate individual-level protein and amino acid intakes. Dietary intakes were assessed using a validated semi-quantitative FFQ, and amino acid intakes were calculated by linking FFQ food items to the constructed database. Total amino acid intake was defined as the sum of amino acids from all reported FFQ food items, while EAA intake was calculated as the sum of individual essential amino acids.
Participants were categorized into three age groups (30–49, 50–64, and 65–74 years) based on the Dietary Reference Intakes for Koreans (KDRI) age categories. Although the KDRI defines additional age groups, only these three categories were applicable to the age range of participants included in the present cohort analysis. Intake adequacy was evaluated by calculating the proportion of participants with protein and essential amino acid intakes below the Estimated Average Requirement (EAR), stratified by sex and age group. The use of EAR values from the KDRIs is appropriate for evaluating nutrient adequacy at the population level in generally healthy Korean adults, such as those included in the KoGES cohort, as the EAR is specifically designed for group-level assessment. The validity of the constructed database was assessed by examining coverage rates and the consistency of the estimated amino acid intakes across participants. In addition, the relationship between total amino acid intake and protein intake was evaluated to assess internal consistency, and the distribution of amino acid intakes was compared with ranges reported in previous studies. The protein EAR for Koreans aged 1 year and older is calculated on a body-weight basis as (0.66 g/kg/day ÷ 0.9) × body weight, with additional allowances for growth during periods of development [27].
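The adequacy evaluation described above reduces to two small calculations, sketched here under stated assumptions. The body-weight formula is the one quoted in the text; the growth allowance is omitted because the cohort consists of adults. The function names and the example intakes are hypothetical, and applying an individual body-weight-based requirement to each participant is one possible reading of the procedure rather than a confirmed detail.

```python
from typing import Sequence


def protein_ear_g_per_day(body_weight_kg: float) -> float:
    """Protein EAR (g/day) on a body-weight basis: (0.66 g/kg/day / 0.9) * weight."""
    return 0.66 / 0.9 * body_weight_kg


def proportion_below(intakes: Sequence[float],
                     requirements: Sequence[float]) -> float:
    """Share of participants whose intake is below their requirement."""
    if len(intakes) != len(requirements) or not intakes:
        raise ValueError("inputs must be non-empty and of equal length")
    below = sum(1 for intake, req in zip(intakes, requirements) if intake < req)
    return below / len(intakes)


# Hypothetical participants: body weights (kg) and protein intakes (g/day).
weights = [60.0, 60.0, 75.0, 50.0]
intakes = [40.0, 50.0, 60.0, 30.0]
ears = [protein_ear_g_per_day(w) for w in weights]
share_inadequate = proportion_below(intakes, ears)
```

In a stratified analysis, the same proportion would be computed separately within each sex and age group.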
2.6. Statistical Analysis
Descriptive statistics were used to summarize the composition and coverage of the amino acid database. Amino acid intake estimates are presented as means and standard deviations. All statistical analyses were performed using SAS Studio (SAS version 9.4; SAS Institute Inc., Cary, NC, USA) via remote secure access provided by the Clinical & Omics Data Archive (CODA) at the National Institute of Health, Korea. Individual-level KoGES data were accessed and analyzed within the CODA research environment under approved data use agreements, ensuring secure data handling in accordance with institutional and national guidelines.
3. Results
The developed rule-based algorithm was applied to all 475 food items included in the KoGES FFQ to evaluate its applicability and to quantify how food items were processed at each decision step.
Figure 1 illustrates the distribution of FFQ food items as they progressed through the algorithm, with the number of items assigned at each decision node indicated.
Among the 475 FFQ food items, the majority were assigned amino acid values through direct matching or evaluation of equivalent food properties. A subset of food items required additional assessment based on nutrient differences and the availability of analytical data for similar foods, resulting in the assignment of calculated or substituted amino acid values. No FFQ food items were excluded from amino acid database construction at the final stage of the algorithm.
Table 1 summarizes the composition of the constructed amino acid database according to the type of assigned values and data sources for the 475 FFQ food items, in relation to the outcomes of the rule-based algorithm.
As shown in
Figure 1, FFQ food items were first evaluated for the availability of analytically measured amino acid data and the presence of nutritionally comparable foods within the same food group. When foods with similar nutrient profiles were identified, amino acid values were assigned using either analytically measured values or substituted values derived from official food composition tables or validated analytical literature. Through this process, 30.9% (
n = 147) of FFQ food items were assigned analytically measured values.
When foods with similar characteristics were available but showed substantial differences in energy, carbohydrate, protein, fat, or moisture content, amino acid values were assigned using calculated values based on nitrogen content and food-group-specific estimation procedures. This pathway accounted for the largest proportion of FFQ food items, 64.2% (n = 305), reflecting the frequent need for compositional adjustment during database construction. Calculated values were primarily derived from the Korean Food Composition Table (version 10.3), with supplementary data from official seafood composition tables, the Standard Tables of Food Composition in Japan, the USDA FoodData Central database, and peer-reviewed literature.
A smaller subset of FFQ food items, 4.8% (n = 23), required substituted values, which were assigned when analytically measured amino acid data were unavailable for the target food despite the presence of nutritionally similar foods. These substitutions were based on officially published food composition tables and were applied in accordance with the final decision steps of the algorithm.
In addition to national and international food composition tables, peer-reviewed analytical studies were used only for food items lacking complete amino acid profiles in official databases. Specifically, amino acid composition data for leaves of Lactuca indica, Japanese kelp (Saccharina japonica), wasabi leaves (Wasabia japonica), pine (Pinus) leaves, and dog meat were obtained from studies employing officially recognized analytical methods. For dried kelp, overlapping analytical observations were available and incorporated into the database, resulting in a total of six analytical observations (n = 6).
Overall, the algorithm enabled the systematic assignment of amino acid values to all 475 FFQ food items (100%) by differentiating between analytical, calculated, and substituted values according to data availability and nutritional comparability, thereby ensuring comprehensive database coverage.
Table 2 summarizes the coverage of total and individual amino acids across the 475 FFQ food items included in the constructed amino acid database. Coverage was defined as the proportion of FFQ food items for which amino acid values were assigned using the rule-based algorithm. TAA and EAA were available for all FFQ food items, resulting in complete coverage. Among individual essential amino acids, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, valine, and histidine showed full coverage. Among non-essential amino acids, most also achieved complete coverage across FFQ food items. Tryptophan and tyrosine showed slightly lower coverage, with values assigned to 98.1% (
n = 466) and 99.2% (
n = 471) of FFQ food items, respectively. Taurine values were available for 77.1% (
n = 366) of FFQ food items. Overall, the coverage varied across individual amino acids, reflecting differences in data availability among amino acid composition sources.
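As a worked check of the coverage definition, the reported percentages follow from the item counts and the total of 475 FFQ food items. The function name is hypothetical; the counts are those given in the text.

```python
TOTAL_FFQ_ITEMS = 475


def coverage_percent(n_assigned: int, total: int = TOTAL_FFQ_ITEMS) -> float:
    """Coverage (%) = items with an assigned value / all FFQ items, to one decimal."""
    return round(100 * n_assigned / total, 1)


coverage = {
    "tryptophan": coverage_percent(466),  # 98.1
    "tyrosine": coverage_percent(471),    # 99.2
    "taurine": coverage_percent(366),     # 77.1
}
```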
Dietary protein and amino acid intakes estimated by applying the constructed amino acid database to the KoGES Ansan and Ansung cohort data are presented in
Table 3. When applied to cohort dietary data, the database produced distributions of protein and amino acid intakes that were comparable to previously reported ranges in similar adult populations, indirectly supporting the applicability and validity of the constructed amino acid database.
Among individual essential amino acids, leucine (3903.7 mg/day) and lysine (2806.0 mg/day) showed relatively higher intakes, whereas tryptophan intake (579.4 mg/day) was the lowest. Intakes of all essential amino acids were higher in men than in women. Among non-essential amino acids, glutamic acid (9625.4 mg/day) and aspartic acid (4875.5 mg/day) accounted for the highest intakes, while taurine intake was 136.8 mg/day. Overall, most individual amino acid intakes were higher in men than in women.
The mean protein intake among all participants was 57.0 g/day, and TAA intake corresponded to 86.7% of total protein intake (
Figure 2). The mean intake of EAA was 21,703.4 mg/day. By sex, mean protein intake was 60.8 g/day in men and 53.5 g/day in women. The corresponding TAA intakes were 54,026.9 mg/day in men and 45,165.3 mg/day in women, representing 88.8% and 84.5% of protein intake, respectively (
Figure 2). Mean EAA intake was 23,797.1 mg/day in men and 19,794.9 mg/day in women.
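The ratio of TAA to protein intake is a unit conversion followed by a division, sketched below with a hypothetical function name. Because the inputs here are the rounded means quoted in the text, the recomputed percentages can differ from the reported ones in the last digit.

```python
def taa_percent_of_protein(taa_mg: float, protein_g: float) -> float:
    """TAA intake as a percentage of protein intake (mg converted to g)."""
    return 100 * taa_mg / (protein_g * 1000)


men = taa_percent_of_protein(54026.9, 60.8)    # close to the reported 88.8%
women = taa_percent_of_protein(45165.3, 53.5)  # close to the reported 84.5%
```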
The proportions of participants with intakes below the EAR for protein and essential amino acids by age group and sex are shown in
Figure 3. In the total population, the prevalence of inadequate intake generally increased with age across most nutrients. Participants aged 65–74 years consistently exhibited the highest proportions of EAR inadequacy, followed by those aged 50–64 years and 30–49 years.
Protein inadequacy increased markedly with age, rising from 21.8% among adults aged 30–49 years to 44.7% in those aged 65–74 years. Similar age-related trends were observed for lysine (32.0% to 50.3%) and phenylalanine + tyrosine (10.4% to 22.5%). Branched-chain amino acids, including leucine, isoleucine, and valine, also showed progressively higher inadequacy rates in older age groups.
Sex-stratified analyses revealed that women consistently exhibited higher proportions of EAR inadequacy than men across most nutrients and age groups. Among women, the prevalence of protein intake below the EAR increased from 18.0% in the 30–49-year group to 42.3% in the 65–74-year group, whereas the corresponding values in men were 25.7% and 47.8%, respectively. Similar patterns were observed for lysine and other essential amino acids.
In contrast, tryptophan and histidine showed relatively low proportions of inadequacy across all age and sex groups, although older adults still demonstrated higher prevalence than younger adults. Overall, these findings indicate pronounced age-related increases in EAR inadequacy, with women generally exhibiting higher rates of inadequate protein and essential amino acid intakes compared with men.
4. Discussion
This study developed and implemented a standardized, rule-based algorithm to construct an amino acid database applicable to FFQ data and applied the resulting database to the KoGES Ansan and Ansung cohorts. The algorithm systematically integrated analytically measured, calculated, and substituted amino acid values using sequential decision rules based on food characteristics, preparation methods, and nutritional similarity, enabling transparent documentation of assignment pathways across all FFQ food items. Through this approach, amino acid values were successfully assigned to 100% of FFQ foods, achieving complete coverage for total amino acids and essential amino acids, with high coverage for most non-essential amino acids.
Application of the constructed database to the second follow-up (Phase 3) of the KoGES Ansan and Ansung community-based cohorts yielded plausible distributions of protein and amino acid intakes and enabled comprehensive population-level assessment of intake patterns and adequacy. Total amino acid intake accounted for approximately 86.7% of total protein intake, reflecting the expected relationship between protein and its constituent amino acids. Although the observed protein and amino acid intakes were lower than those reported in the Korea National Health and Nutrition Examination Survey [
12,
28], they were comparable to values reported in previous cohort-based studies, suggesting consistency with long-term dietary assessment in community-based populations [
14]. Among essential amino acids, leucine and lysine contributed the largest shares of intake, whereas tryptophan showed the lowest intake, consistent with known amino acid composition patterns of commonly consumed foods. Moreover, intakes of most amino acids were higher in men than in women. Among non-essential amino acids, glutamic acid showed the highest intake, which is consistent with previous reports reflecting its abundance in protein-rich foods and mixed diets [
12].
Importantly, the database facilitated evaluation of intake adequacy using EAR-based criteria, revealing substantial proportions of participants with inadequate intakes of protein and several essential amino acids. Pronounced age-related increases in EAR inadequacy were observed, with adults aged 65–74 years consistently exhibiting the highest prevalence, particularly for protein, lysine, phenylalanine + tyrosine, and branched-chain amino acids. Clear sex differences were also evident, with women generally showing higher proportions of EAR inadequacy across most nutrients and age groups. Together, these findings demonstrate that the constructed database not only supports estimation of dietary amino acid intake from FFQ data but also enables identification of nutritionally vulnerable subgroups within population-based cohorts. These observed disparities may be partly explained by differences in overall energy and protein intake across age and sex groups. Older adults may have lower total food intake, while women may have lower absolute protein intake compared to men, which could contribute to the higher prevalence of inadequacy observed among older adults and women, consistent with previous findings in Korean populations reporting age- and sex-related differences in nutrient intake and adequacy [
29]. Furthermore, given that the protein EAR for Koreans aged 1 year and older is calculated on a body weight basis (0.66 g/kg/day)/0.9 × body weight, with additional allowances for growth during periods of development, the higher prevalence of intakes below the EAR among older adults and women suggests that protein intake per kilogram of body weight tends to be lower with increasing age and in women [
27].
These patterns of inadequacy are particularly relevant given accumulating evidence that dietary amino acid intake is associated with metabolic health and chronic disease outcomes. Previous studies in Korean adult populations have reported associations between amino acid intake patterns and cardiometabolic risk factors, including dyslipidemia, metabolic syndrome, and insulin resistance [
3,
6,
30]. These findings underscore the relevance of amino-acid-specific dietary assessment beyond total protein intake and highlight the importance of reliable estimation of individual amino acid intake in epidemiological research.
A major strength of the present study lies in the use of a standardized, algorithm-based approach to guide amino acid value assignment from FFQ data. Dietary assessment in epidemiological research commonly relies on self-reported methods, such as FFQs, which are known to be subject to various sources of error, including limitations in food composition databases used to convert reported food intake into nutrient estimates [
31]. In this context, careful handling of food matching and nutrient assignment procedures is essential to improve the quality and interpretability of dietary intake data.
Previous studies have demonstrated the feasibility of estimating dietary amino acid intake by linking food composition databases to dietary assessment data [
9,
11]. However, food composition data are often derived from multiple sources, and amino acid information may be incomplete or unavailable for certain foods, necessitating the use of calculated or substituted values. Methodological research on food composition data emphasizes that, when such procedures are applied, the criteria and decision processes should be clearly defined and documented [
32].
In the present study, sequential decision rules were explicitly established to guide food matching, substitution, and estimation of amino acid values, considering food names, preparation methods, and nutritional similarity. By formalizing these procedures within a transparent framework, the algorithm-based approach adopted in this study enhances methodological clarity and reproducibility. Such a structured approach may support the application of amino acid database construction to other cohort studies and dietary assessment instruments, particularly in large-scale epidemiological settings where heterogeneous food composition data are commonly encountered [
31,
32].
Another notable feature of the constructed database is its comprehensive coverage across FFQ food items. Complete coverage was achieved for the TAA and EAA indices, and the individual essential and most non-essential amino acids showed full or near-complete coverage. These results reflect the effectiveness of the standardized algorithm in accommodating diverse food items and data sources. Similar challenges in achieving complete amino acid coverage have been reported in other amino acid database construction efforts; for example, a Japanese amino acid composition database contained values for only a fraction of foods in the standard table of food composition, necessitating extensive imputation for missing values [
9].
Lower coverage was observed for certain amino acids such as taurine, reflecting the limited availability of analytically measured taurine composition data in existing food composition databases. This limitation is inherent to the nature of available food composition data rather than the database construction process itself and highlights the need for continued expansion of analytical data for specific amino acids in food composition research.
Several limitations of this study should be acknowledged. First, the constructed database did not include all amino acids uniformly across all food items, reflecting constraints in the availability of analytically measured amino acid composition data. Although the standardized algorithm enabled extensive coverage, incomplete data for certain amino acids remained an inherent limitation of current food composition resources. Second, the amino acid database developed in this study was designed and applied specifically for use with cohort-based dietary data derived from the KoGES FFQ. Accordingly, its direct applicability to other dietary assessment tools or survey formats may be limited, and caution is warranted when extending the database beyond similar cohort settings. In addition, as the database was developed based on food items, dietary patterns, and food composition data specific to the Korean population, its direct application to other populations may be limited. However, the overall framework of the rule-based algorithm, including the stepwise approach to food matching, substitution, and estimation, may be adaptable to other populations with appropriate modification of food composition databases and context-specific criteria.
Finally, the present database primarily focused on foods represented in the cohort FFQ, and comprehensive amino acid composition data for a wide range of processed foods were not constructed. In addition, the applicability of the database was evaluated using FFQ-based dietary data only, and direct validation against short-term dietary assessment methods, such as 24 h dietary recalls or weighed food records, was not performed. Future studies incorporating expanded processed food databases and multiple dietary assessment methods would strengthen the robustness and generalizability of amino acid intake estimation.
Despite these limitations, the present study provides a transparent and reproducible methodological framework for constructing amino acid databases applicable to epidemiological research using FFQ data. By explicitly defining sequential decision rules for food matching, substitution, and estimation, the algorithm-based approach developed in this study enables systematic handling of heterogeneous and incomplete amino acid composition data in large-scale cohort settings.
Given the growing body of evidence linking dietary amino acid intake to chronic disease risk in Korean populations [
3,
6,
26], the constructed database and algorithmic framework offer a practical foundation for investigating amino acid–disease relationships beyond total protein intake. Furthermore, the successful application of the database to the KoGES Ansan and Ansung cohorts demonstrates its feasibility for population-level dietary assessment within long-term cohort studies. Future research may build upon this framework by incorporating additional analytically measured amino acid data, expanding coverage for underrepresented food groups—including processed foods—and applying the algorithm to other dietary assessment tools and population groups. Such efforts would further strengthen the validity, robustness, and generalizability of amino acid intake estimation in nutritional epidemiology.