Genetic Diversity and Nutritional Composition of Cottonseed: A Multi-Trait Analysis

Wang, Zhong; Liu, Huayuan; Zou, Ying; Zheng, Kai; Abdukerim, Sibanur; Wu, Shuaijun; Ma, Jingjing; Chen, Quanjia; Deng, Xiaojuan

doi:10.3390/agriculture16050514


by Zhong Wang, Huayuan Liu, Ying Zou, Kai Zheng, Sibanur Abdukerim, Shuaijun Wu, Jingjing Ma, Quanjia Chen and Xiaojuan Deng *

College of Agronomy, Xinjiang Agricultural University, Urumqi 830052, China

* Author to whom correspondence should be addressed.

Agriculture 2026, 16(5), 514; https://doi.org/10.3390/agriculture16050514

Submission received: 22 December 2025 / Revised: 7 February 2026 / Accepted: 20 February 2026 / Published: 26 February 2026

(This article belongs to the Section Crop Genetics, Genomics and Breeding)


Abstract

Cotton is one of the most significant economic crops cultivated worldwide. Cottonseed is a strategic reservoir of high-quality plant protein and an underexploited resource for the food and feed industries. To quantify nutritional diversity and identify superior germplasm, we evaluated 312 upland cotton (Gossypium hirsutum L.) accessions over two consecutive growing seasons and characterized 30 agronomic and nutritional traits. Protein content varied widely (29.6–48.8%), with a coefficient of variation of 7.5–11.7% and a two-year mean of 37.0%. Glutamic acid (Glu; 154.0 mg/g) and aspartic acid (Asp; 90.7 mg/g) were the most abundant amino acids, and lysine and arginine were relatively high among essential amino acids. Correlation analysis based on genotype best linear unbiased estimates (BLUEs) showed that most nutritional traits were positively or neutrally associated with key yield-related traits, particularly lint percentage (LP) (e.g., protein vs. LP: r = 0.18, p < 0.01), indicating the feasibility of simultaneous improvement in seed nutritional quality and lint yield potential. Using 29 core traits with complete two-year data, we developed an integrated evaluation framework combining principal component analysis (PCA), grey relational analysis (GRA), TOPSIS, and the analytic hierarchy process (AHP) to rank accessions comprehensively. This framework identified 10 elite germplasm lines with high protein content and favorable yield potential, exemplified by “Xinluzhong 34” (Rank 1; phenotypic comprehensive value, P_i = 0.733). These results provide a quantitative foundation for value-added cottonseed utilization and support breeding strategies aimed at developing cultivars with both high yield and enhanced nutritional quality.

Keywords: Gossypium hirsutum L.; cottonseed; protein; amino acids; agronomic traits; comprehensive evaluation

1. Introduction

Cotton (Gossypium spp.) underpins the global textile industry as the dominant natural fiber crop. Yet its main byproduct, cottonseed, is still undervalued and insufficiently utilized [1,2]. As demand for sustainable, high-quality plant protein continues to rise, reliance on conventional sources such as soybean and animal proteins is increasingly constrained [2]. Therefore, expanding non-traditional protein resources is crucial for food security and accelerating a circular bioeconomy [1].

Cottonseeds are nutrient-dense, typically containing 40.0–50.0% crude protein in defatted meals [3]. Comparative nutritional profiling indicates that the amino acid composition of cottonseed protein is analogous to that of soybean meal, offering a rich source of essential amino acids vital for human and livestock health [3,4,5]. Although lysine is generally the first limiting amino acid, cottonseed is exceptionally abundant in arginine, glutamic acid, and methionine, which are critical for immune regulation and muscle development [3]. Furthermore, cottonseed meal contains significant concentrations of essential minerals, including phosphorus (P), magnesium (Mg), and iron (Fe), often exceeding the levels found in cereal-based ingredients [3]. Historically, the utilization of cottonseed has been constrained by the presence of toxic gossypol. However, technological advancements have fundamentally shifted consumption paradigms, significantly enhancing the economic valorization of cottonseed proteins. The development of “ultra-low gossypol” varieties and modern detoxification processes has validated cottonseed protein [6] as a safe functional ingredient for food systems and a cost-effective alternative to fish meal and soybean protein in livestock and aquaculture industries [7]. Recent food science research has demonstrated that cottonseed protein isolate possesses superior oil-holding capacity and emulsification properties, facilitating its increasing application in meat analogs and fortified baked goods [1]. Moreover, the U.S. FDA’s regulatory approval of specific low-gossypol cottonseed varieties for human consumption has further solidified its market potential, positioning it as a “clean-label” alternative aligned with modern dietary trends [6]. 
Despite these prospects, the high-value utilization of cottonseed currently faces two primary bottlenecks: (1) a paucity of systematic fundamental data and (2) insufficient screening of elite germplasm resources, which impedes directional breeding and comprehensive utilization. Although preliminary evaluations of nutritional quality exist, previous studies have predominantly focused on isolated components, lacking a holistic assessment [8]. Furthermore, environmental factors, such as drought and temperature, significantly modulate the asynchronous accumulation of amino acids [9]. Given these phenotypic complexities, single-trait selection is inefficient, necessitating the adoption of a multivariate evaluation system.

Multivariate statistical analysis offers robust tools for dissecting complex phenotypic data. PCA is widely employed to mitigate data redundancy and has been proven effective in characterizing the genetic diversity of major crops, such as wheat and cotton, as demonstrated by Vijeth et al. [10,11,12]. The GRA assesses the geometric similarity between reference sequences and has been successfully applied to evaluate legume germplasm, including peanuts, by Zhao et al. [13,14]. Furthermore, AHP and TOPSIS provide structured frameworks for multi-criteria decision-making, demonstrating high efficacy in optimizing variety selection for crops such as sugarcane and foxtail millet within complex agricultural systems, as evidenced by Gbegbelegbe et al. [11,15,16]. Despite their individual potential, these methodologies have not yet been systematically integrated for a comprehensive evaluation of upland cotton.

To address these knowledge gaps, we hypothesized that (1) cottonseed nutritional traits (protein and amino acid composition) show substantial genotypic diversity among upland cotton accessions and (2) nutritional quality is not necessarily antagonistic to key yield-related traits, thereby allowing the simultaneous improvement of yield and quality through selection. Accordingly, the objectives of this study were to: (1) systematically quantify the variation and phenotypic distribution of 30 agronomic and nutritional traits in 312 upland cotton germplasm accessions evaluated across two consecutive growing seasons; (2) characterize the relationships between nutritional components and agronomic performance; and (3) integrate PCA, GRA, AHP, and TOPSIS to construct a comprehensive evaluation model for identifying elite germplasm with superior overall performance.

2. Materials and Methods

2.1. Experimental Material

A diverse panel of 312 upland cotton (Gossypium hirsutum L.) accessions was evaluated in this study. All germplasm resources were provided by the Cotton Innovation Team of Xinjiang Agricultural University. The panel comprised 108 conventional varieties from the Northwest Inland Region (NW), 101 from the Yellow River Basin (YR), and 34 from the Yangtze River Basin (YZ), along with 24 local varieties (Local) and 45 foreign introductions (Intro). Detailed information on these accessions is provided in Table S7.

2.2. Experimental Design

Field experiments were conducted during the 2023 and 2024 growing seasons (mid-April to late October) at the Experimental Station in Yuepuhu County, Kashgar Prefecture, Xinjiang Uygur Autonomous Region, China (39.243° N, 76.801° E; Figure 1A). Daily meteorological data for the experimental period (April–December) were obtained from the National Meteorological Information Center (https://data.cma.cn/), as shown in Figure 1B.

The field trial was conducted over two consecutive years on the same field using a randomized complete block design (RCBD). The field was divided into three blocks, and each block served as one biological replicate. The planting configuration followed the local standard high-yield cultivation model, which uses plastic film mulching with drip irrigation. Specifically, one mulch film covered six crop rows arranged in a wide-narrow row pattern (66 cm + 10 cm). Each experimental plot consisted of two 3.0 m rows representing a single genotype, with an intra-row plant spacing of 10 cm. All agronomic management practices were performed according to local standards to minimize environmental heterogeneity.

2.3. Analysis of Cottonseed Protein Content

The crude protein content was determined using the Biuret method [17], with Bovine Serum Albumin (BSA) as the standard.

2.3.1. Standard Curve Construction

A stock standard solution was prepared by dissolving 0.4 g of BSA in 0.1 mol/L NaOH to obtain a final volume of 25 mL. To construct a standard curve, a graded series of this solution (0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, and 1.4 mL) was transferred into separate colorimetric tubes. Each tube was supplemented with 8 mL of Biuret reagent (containing copper sulfate, potassium sodium tartrate, and NaOH) and diluted to a final volume of 10 mL with 0.1 mol/L NaOH solution. The mixtures were vortexed and incubated at room temperature for 40 min before use. The absorbance was measured at 555 nm using a spectrophotometer. A standard curve (Figure 2) was generated by plotting the absorbance (A) against the volume of the standard solution (V, mL).

2.3.2. Sample Preparation and Measurement

Cottonseed samples were collected from the same field over two consecutive years. Seeds harvested from the three field replicates (blocks) of each accession within a year were pooled to form one composite sample. The composite sample was then analyzed in triplicate as technical replicates, and the mean value was used for subsequent analyses. The samples were ground to pass through an 80-mesh sieve, and 200–500 mg of the powder was defatted with ether. The residue was extracted with 10 mL of 0.1 mol/L NaOH for 40 min and centrifuged at 3000 rpm for 10 min. A 2 mL aliquot of the supernatant was collected and subjected to the same colorimetric reaction and measurement procedure described in Section 2.3.1. Protein content (%) was calculated as Protein content (%) = [(C × V × 10)/(a × W)] × 100%, where C is the concentration of the BSA standard solution (mg/mL), V is the equivalent volume derived from the standard curve (mL), 10 is the total extraction volume (mL), a is the supernatant volume used for the assay (2 mL), and W is the sample mass (mg).
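The calculation in Sections 2.3.1 and 2.3.2 can be sketched as follows. This is a minimal illustration, not the authors' script: the absorbance readings are hypothetical values chosen to lie on a straight line, and the function names `fit_line` and `protein_percent` are assumptions introduced here. The standard concentration C follows from the text (0.4 g BSA in 25 mL, i.e., 16 mg/mL).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_percent(absorbance, slope, intercept, c_std, sample_mg,
                    extract_ml=10.0, aliquot_ml=2.0):
    """Protein (%) = (C * V * 10) / (a * W) * 100, as given in Section 2.3.2."""
    # V: equivalent volume of BSA standard (mL) read off the standard curve
    v_equiv = (absorbance - intercept) / slope
    return (c_std * v_equiv * extract_ml) / (aliquot_ml * sample_mg) * 100.0


# C: 0.4 g BSA dissolved to 25 mL gives 16 mg/mL
C_STD = 0.4 * 1000 / 25

# Standard volumes from Section 2.3.1; absorbances are HYPOTHETICAL readings
volumes = [0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
absorbances = [0.01 + 0.25 * v for v in volumes]

slope, intercept = fit_line(volumes, absorbances)

# Hypothetical sample: 200 mg of powder, assay absorbance 0.26
example = protein_percent(0.26, slope, intercept, C_STD, sample_mg=200.0)
print(round(example, 1))
```

With these made-up readings the sample absorbance corresponds to 1.0 mL of standard, so the aliquot holds 16 mg of protein, the full 10 mL extract holds 80 mg, and the result is 40% of the 200 mg sample.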

2.4. Determination of Amino Acid Content in Cottonseeds

Amino acid composition was determined by HPLC using an amino acid standard solution (AAS18-5 mL; Sigma-Aldrich Inc., St. Louis, MO, USA) for calibration and quantification. As for protein (Section 2.3.2), cottonseed samples were collected from the same field over two consecutive years, and seeds harvested from the three field replicates (blocks) of each accession within a year were pooled to form one composite sample. The composite sample was then analyzed in triplicate as technical replicates, and the mean value was used for subsequent analyses.

Acid Hydrolysis: Dehulled cottonseed powder (0.1000 g ± 0.0001 g) was weighed into a 15 mL centrifuge tube, mixed with 1.0 mL of 6 mol/L HCl, and homogenized by oscillation for 5 min. The tube was sealed and hydrolyzed at 105 °C for 24 h. After cooling to room temperature, the hydrolysate was neutralized with 6 mol/L NaOH, diluted to 2.0 mL with distilled water, mixed for 5 min, and centrifuged at 4000 rpm for 10 min. The supernatant was collected.

Derivatization: An aliquot of the supernatant (0.5 mL) was transferred to an amber tube. Sodium bicarbonate buffer (0.5 mL, 0.5 mol/L, pH 9.0) and DNFB reagent (0.5 mL) were added, and the mixture was incubated at 60 ± 0.5 °C for 60 min in the dark. The reaction was quenched with 3.5 mL phosphate buffer (pH 7.0), mixed thoroughly, left for 15 min in the dark, and filtered through a 0.22 μm membrane prior to injection.

HPLC Conditions: Analysis was performed on an Agilent 1100 HPLC system (Agilent Technologies, Inc., Santa Clara, CA, USA) equipped with a variable wavelength detector (VWD) and a Diamonsil AAA column (Dikma Technologies Inc., Foothill Ranch, CA, USA) maintained at 38 °C. The injection volume was 20 μL. The gradient elution program is shown in Table 1.

Tryptophan was quantified; however, because it was unavailable for one growing season, it was excluded from the across-year BLUE estimation and subsequent multivariate analyses.

2.5. Evaluation of Agronomic Traits

Field agronomic traits were systematically evaluated during the boll-opening stage (mid- to late August) in strict accordance with the Specifications and Data Standards for Cotton Germplasm Resource Description. Five consecutive plants exhibiting uniform growth were selected from each of the three field replicates (blocks) for each genotype. Morphological measurements included PH (distance from the cotyledonary node to the main stem apex), GH (distance from the cotyledonary node to the first fruiting branch node), and FFN. Additionally, BN and SBP were measured. The angle and length of the lower fruiting branches were measured using a protractor and measuring tape, respectively. Harvesting and laboratory analyses were conducted in early October. Twenty naturally opened bolls were collected from the five sampled plants in each replicate, pooled and ginned using a laboratory gin. The SC and Lint were determined gravimetrically. The LP was calculated using the following equation: LP (%) = (Lint/SC) × 100%. All measurements were carried out plot by plot to ensure statistical reliability.

2.6. Methods for Comprehensive Evaluation

To comprehensively evaluate the combined agronomic and nutritional performance of upland cotton germplasm, we implemented an integrated workflow consisting of: (1) data preparation and scaling, (2) objective weighting by PCA (Principal Component Analysis), (3) AHP (Analytic Hierarchy Process) weighting derived from PCA weights, and (4) multi-index scoring and consensus integration. This design allowed for (a) reducing redundancy among traits via PCA-based weighting, (b) obtaining an AHP weight system without subjective pairwise scoring, and (c) producing a single consensus ranking for germplasm screening from several commonly used multi-trait indices (Figure 3).

2.6.1. Data Preparation and Scaling

A total of N = 29 traits (19 nutritional and 10 agronomic traits) with complete two-year data were used for comprehensive evaluation. For these traits, sporadic missing observations were imputed using trait-wise mean substitution, and all traits were normalized to the [0, 1] range by min–max scaling. Tryptophan was quantified but excluded from the 29-trait matrix because it was unavailable in one season and thus was not used in BLUE estimation or subsequent multivariate analyses.

Z_{ij} = \frac{x_{ij} - \min(x_{j})}{\max(x_{j}) - \min(x_{j})}

(1)

where x_{ij} is the original value of trait j for accession i, and \min(x_{j}) and \max(x_{j}) are computed across all accessions for trait j.
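A minimal sketch of the data preparation step (trait-wise mean substitution followed by the min–max scaling of Equation (1)) is given below. The trait values are hypothetical, the helper names are introduced here for illustration, and mapping a constant trait to zero is an assumption that the text does not address.

```python
def impute_mean(column):
    """Replace missing observations (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Equation (1): Z = (x - min) / (max - min), computed across accessions."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant trait carries no ranking information
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# HYPOTHETICAL protein values (%) for four accessions, one missing
protein = [29.6, None, 48.8, 37.0]
scaled = min_max_scale(impute_mean(protein))
print([round(z, 3) for z in scaled])
```

Because the minimum and maximum are taken across all accessions, the scaled values are only comparable within the evaluated panel; adding or removing accessions can change every Z value for that trait.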

2.6.2. PCA-Based Objective Weights

PCA was performed to derive objective trait weights. The number k of retained components was chosen to ensure sufficient information preservation (cumulative explained variance reaching 70.2% in this dataset). Let ω_{i} denote the variance contribution (explained variance ratio) of the i-th retained component and V_{ji} denote the loading of trait j on component i. The PCA-based weight of trait j was computed as

W_{j}^{PCA} = \frac{\sum_{i=1}^{k} |V_{ji}| \times ω_{i}}{\sum_{i=1}^{k} ω_{i}}

(2)
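Equation (2) can be illustrated with a short sketch that takes the loadings and explained variance ratios as given inputs (the PCA itself is not recomputed here). The numbers are hypothetical and `pca_weights` is a name introduced for this example. The text does not state whether the weights were rescaled to sum to one, so the sketch applies Equation (2) as written.

```python
def pca_weights(loadings, explained):
    """Equation (2): variance-weighted mean of absolute loadings per trait.

    loadings[j][i] is the loading of trait j on retained component i;
    explained[i] is the explained variance ratio of component i.
    """
    total = sum(explained)
    return [
        sum(abs(v) * w for v, w in zip(row, explained)) / total
        for row in loadings
    ]


# HYPOTHETICAL loadings for two traits on two retained components
loadings = [[0.8, -0.1],
            [0.3, 0.6]]
explained = [0.5, 0.2]

weights = pca_weights(loadings, explained)
print([round(w, 4) for w in weights])
```

Taking absolute values means a trait with a strong negative loading receives as much weight as one with an equally strong positive loading; the direction of the trait enters only through the scaled data.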

2.6.3. Deriving AHP Weights from PCA Weights and AHP Scoring

To avoid subjective construction of the AHP judgment matrix, we generated the pairwise comparison matrix A = [a_{ij}] algorithmically from W^{PCA} using a mapped 1–9 scale:

a_{ij} = \begin{cases} 1 + \left(\frac{W_{i}^{PCA}}{W_{j}^{PCA}} - 1\right) \times 8, & \text{if } \frac{W_{i}^{PCA}}{W_{j}^{PCA}} \geq 1 \\ \frac{1}{1 + \left(\frac{W_{j}^{PCA}}{W_{i}^{PCA}} - 1\right) \times 8}, & \text{if } \frac{W_{i}^{PCA}}{W_{j}^{PCA}} < 1 \end{cases}

(3)

All a_{ij} were constrained to [1/9, 9]. The AHP weight vector W^{AHP} was obtained as the normalized eigenvector associated with the maximum eigenvalue of A. Consistency was accepted when the consistency ratio (CR) was below 0.10.
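The mapping in Equation (3) and the eigenvector step can be sketched as below, assuming a power-iteration solver in place of whatever eigen routine the authors used. The input weights are hypothetical, the function names are introduced here, and the random index of 0.58 is the commonly tabulated Saaty value for a 3 × 3 matrix; the random index applicable to the full trait set is not stated in the text.

```python
def pairwise_matrix(weights):
    """Equation (3): map weight ratios onto a clipped 1-9 comparison scale."""
    n = len(weights)
    a = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            ratio = weights[i] / weights[j]
            if ratio >= 1:
                value = 1 + (ratio - 1) * 8
            else:
                value = 1 / (1 + (1 / ratio - 1) * 8)
            a[i][j] = min(9.0, max(1 / 9, value))  # constrain to [1/9, 9]
    return a


def principal_eigen(matrix, iterations=200):
    """Power iteration: returns (lambda_max, eigenvector normalized to sum 1)."""
    n = len(matrix)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)          # v sums to 1, so sum(A v) estimates lambda_max
        v = [x / lam for x in w]
    return lam, v


def consistency_ratio(lambda_max, n, random_index):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    return (lambda_max - n) / (n - 1) / random_index


w_pca = [0.40, 0.35, 0.25]                 # HYPOTHETICAL PCA weights
A = pairwise_matrix(w_pca)
lam, w_ahp = principal_eigen(A)
cr = consistency_ratio(lam, len(w_pca), random_index=0.58)
print([round(w, 3) for w in w_ahp], round(cr, 3))
```

Because the mapping stretches ratios linearly and then clips them, the generated matrix is reciprocal but not automatically consistent, so the CR check described in the text remains a meaningful step rather than a formality.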

Using W^{AHP} and the scaled data Z_{ij}, the AHP comprehensive score of accession i was computed as

A_{i} = \sum_{j=1}^{N} Z_{ij} W_{j}^{AHP}

(4)

2.6.4. Multi-Index Scoring Methods Based on W^PCA

(1): Membership-function weighted score (MFM)

M_{i} = \sum_{j=1}^{N} Z_{ij} W_{j}^{PCA}

(5)

(2): Grey Relational Analysis (GRA)

For each trait j, the reference (ideal) value was defined as Z_{j}^{*} = \max_{i}(Z_{ij}). The absolute difference between accession i and the reference value for trait j is Δ_{ij} = |Z_{ij} − Z_{j}^{*}|. Let Δ_{min} = \min_{i,j}(Δ_{ij}) and Δ_{max} = \max_{i,j}(Δ_{ij}) denote the smallest and largest absolute differences across all accessions and traits. The grey relational coefficient is

ξ_{ij} = \frac{Δ_{min} + ρ Δ_{max}}{Δ_{ij} + ρ Δ_{max}}

(6)

where ρ is the distinguishing coefficient (set to 0.5). The weighted grey relational grade is

G_{i} = \sum_{j=1}^{N} ξ_{ij} W_{j}^{PCA}

(7)
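Equations (6) and (7) can be sketched as follows. The scaled matrix and weights are hypothetical, `grey_relational_grades` is a name introduced here, and the handling of a matrix in which every accession equals the ideal is an assumption added only to avoid division by zero.

```python
def grey_relational_grades(z, weights, rho=0.5):
    """Weighted grey relational grade G_i for each accession (row of z)."""
    n_traits = len(weights)
    # Reference sequence: best scaled value observed for each trait
    ideal = [max(row[j] for row in z) for j in range(n_traits)]
    delta = [[abs(row[j] - ideal[j]) for j in range(n_traits)] for row in z]
    flat = [d for row in delta for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Assumption: all accessions coincide with the ideal
        return [sum(weights)] * len(z)
    grades = []
    for row in delta:
        xi = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]  # Eq. (6)
        grades.append(sum(x * w for x, w in zip(xi, weights)))        # Eq. (7)
    return grades


# HYPOTHETICAL scaled data: three accessions, two traits
z = [[1.0, 0.5],
     [0.0, 1.0],
     [0.5, 0.0]]
weights = [0.6, 0.4]

grades = grey_relational_grades(z, weights)
print([round(g, 4) for g in grades])
```

With ρ = 0.5 the coefficient for the worst observed deviation is bounded below by one third when Δ_min is zero, which compresses the grades relative to the raw scaled values.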

(3): Technique for Order Preference by Similarity to Ideal Solution (TOPSIS)

A weighted normalized matrix was constructed as V_{ij} = Z_{ij} W_{j}^{PCA}. The positive and negative ideal solutions are V_{j}^{+} = \max_{i}(V_{ij}) and V_{j}^{-} = \min_{i}(V_{ij}). The Euclidean distances are

d_{i}^{+} = \sqrt{\sum_{j=1}^{N} (V_{ij} - V_{j}^{+})^{2}}, \quad d_{i}^{-} = \sqrt{\sum_{j=1}^{N} (V_{ij} - V_{j}^{-})^{2}}

(8)

The TOPSIS closeness coefficient is

T_{i} = \frac{d_{i}^{-}}{d_{i}^{+} + d_{i}^{-}}

(9)
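A compact sketch of Equations (8) and (9) is shown below, using a hypothetical scaled matrix and hypothetical weights. The function name is introduced here, and returning 0.5 when all accessions are identical is an assumption made only to keep the example well defined.

```python
import math


def topsis_closeness(z, weights):
    """TOPSIS closeness coefficient T_i for each accession (row of z)."""
    v = [[x * w for x, w in zip(row, weights)] for row in z]   # V_ij = Z_ij * W_j
    n_traits = len(weights)
    best = [max(row[j] for row in v) for j in range(n_traits)]   # V_j^+
    worst = [min(row[j] for row in v) for j in range(n_traits)]  # V_j^-
    scores = []
    for row in v:
        d_plus = math.sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        d_minus = math.sqrt(sum((x - w) ** 2 for x, w in zip(row, worst)))
        if d_plus + d_minus == 0:
            scores.append(0.5)  # Assumption: no spread among accessions
        else:
            scores.append(d_minus / (d_plus + d_minus))
    return scores


# HYPOTHETICAL scaled data: three accessions, two traits
z = [[1.0, 0.5],
     [0.0, 1.0],
     [0.5, 0.0]]
weights = [0.6, 0.4]

closeness = topsis_closeness(z, weights)
print([round(t, 4) for t in closeness])
```

An accession that is best on every trait would score 1 and one that is worst on every trait would score 0; here no accession dominates, so all three values fall strictly between those limits.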

2.6.5. Consensus Index: Phenotypic Comprehensive Value

To provide a single ranking index for germplasm screening, we integrated the four evaluation outputs (M_{i}, G_{i}, T_{i}, and A_{i}) using their arithmetic mean:

P_{i} = \frac{M_{i} + G_{i} + T_{i} + A_{i}}{4}

(10)

P_{i} was used as the final consensus score for internal comparison and screening of germplasm. The agreement among methods (e.g., correlations among indices) was assessed and reported in Section 3 as a post-hoc concordance check, rather than as a prerequisite for defining P_{i}.
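As an illustration of Equation (10), the sketch below averages four index vectors and sorts accessions by the resulting consensus score. The accession labels and index values are hypothetical, and both function names are introduced here.

```python
def consensus_scores(m, g, t, a):
    """Equation (10): arithmetic mean of the MFM, GRA, TOPSIS, and AHP scores."""
    return [(mi + gi + ti + ai) / 4 for mi, gi, ti, ai in zip(m, g, t, a)]


def rank_descending(names, scores):
    """Return (name, score) pairs ordered from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(names[i], scores[i]) for i in order]


# HYPOTHETICAL index values for three accessions
names = ["acc_A", "acc_B", "acc_C"]
m = [0.80, 0.40, 0.60]
g = [0.80, 0.44, 0.60]
t = [0.76, 0.36, 0.40]
a = [0.70, 0.30, 0.50]

p = consensus_scores(m, g, t, a)
ranking = rank_descending(names, p)
print([(name, round(score, 3)) for name, score in ranking])
```

The plain mean gives each method equal influence, and because the four indices are not on identical scales (grey relational grades, for example, are compressed toward the upper end), methods with a wider spread contribute more to differences in P_i.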

2.7. Statistical Analysis

Statistical analyses were performed in R (version 4.5.2). Field trials were conducted using a randomized complete block design (RCBD), with three blocks serving as field replicates each year, and agronomic traits were recorded plot by plot. For seed-quality traits (protein and amino acids), seeds from the three field replicates (blocks) of each accession within a year were pooled to form one composite sample. This composite sample was then analyzed in triplicate as technical replicates, and the mean value was used for subsequent mixed-model analysis.

To obtain genotype performance estimates across environments, Best Linear Unbiased Estimates (BLUEs) were calculated from the combined two-year dataset using linear mixed models. For seed-quality (laboratory) traits, the model was

Y_ij = μ + G_i + Y_j + (G × Y)_ij + ε_ij,

where Y_ij is the mean phenotypic value of genotype i in year j (averaged over technical replicates), μ is the overall mean, G_i is the fixed genotype effect (for BLUE estimation), Y_j is the random year effect, (G × Y)_ij is the random genotype-by-year interaction, and ε_ij is the residual. For field agronomic traits, the model additionally included blocks nested within year to account for the RCBD structure:

Y_ijk = μ + G_i + Y_j + (G × Y)_ij + B_k(j) + ε_ijk,

where Y_ijk is the plot-level observation of genotype i in year j and block k, and B_k(j) denotes the random effect of block k nested within year j. The resulting BLUEs were used as inputs for downstream multivariate analyses, including correlation analysis, hierarchical clustering, and PCA, implemented with the corrplot, ggtree, and factoextra packages. They were also used to identify the top 3% of genotypes (approximately 10 accessions).

In addition, trait means were compared using a gamma-distributed generalized linear model (GLM) followed by Tukey’s post-hoc test for multiple comparisons, where different lowercase letters denote statistically significant differences (p < 0.05).

3. Results

3.1. Genetic Diversity Analysis of Protein and Amino Acid Components in Upland Cottonseeds

Across 312 upland cotton accessions evaluated over two consecutive years (Table S1), protein content showed a broadly unimodal, approximately normal distribution (Figure 4A,B). The two-year mean protein content was 37.0%, with an observed range of 29.6–48.8% across accessions. The magnitude of variation was moderate but differed between years, as reflected by the coefficients of variation (CV): 7.5% in 2023 (mean 37.0%, SD 2.8) and 11.7% in 2024 (mean 37.0%, SD 4.3), indicating a stable central tendency but increased dispersion in 2024. Distribution descriptors further supported near-normality for protein (skewness 0.654–0.698; kurtosis −0.217 to 1.129).

Amino acid composition profiles (Figure 4C,D) showed that Glu, Asp, Lys, Arg, and Ala were consistently the most abundant amino acids. Glu was the dominant component, with a two-year mean of 154.0 mg/g and moderate year-dependent variability (CV 10.2% in 2023 vs. 20.8% in 2024). In contrast, Arg displayed the largest relative variability (two-year mean 26.6 mg/g; CV 16.0% in 2023 and 30.1% in 2024) and the strongest deviation from normality in 2024 (skewness 1.726; kurtosis 3.252), consistent with a right-skewed, heavy-tailed distribution. Data for tryptophan are provided in the Supplementary Materials but were excluded from multi-year analysis (see Section 2.6.1).

3.2. Differences in Seed Protein and Amino Acids Across Upland Cotton Germplasm

To assess geographic differentiation in cottonseed nutritional quality, the 312 upland cotton accessions were classified into five origin groups: YR, YZ, NW, Local, and Intro (Table S2). Accessions introduced from multiple foreign countries were pooled into the Intro group due to the limited number of accessions from any single country. This pooling increased statistical robustness and provided a stable composite reference group for multiple comparisons, rather than attempting to resolve country-specific differences within the introductions. Based on the data in Table S2, both protein content (%) and most amino acid contents (mg/g; excluding tryptophan) differed significantly among geographic groups (p < 0.05). Notably, the NW group consistently exhibited the highest protein content in both years (37.9% in 2023 and 38.5% in 2024), significantly surpassing the other groups (p < 0.05), where protein content ranged from 35.8% to 36.9% in both years. Similar geographic patterns were observed for several major amino acids. For instance, Asp in the NW group reached 86.8 mg/g in 2023 and increased to 104.4 mg/g in 2024, remaining significantly higher than the other groups in 2024 (e.g., 89.5 mg/g in the Intro group and 93.8 mg/g in the Local group). For Glu, no significant regional differences were detected in 2023 (ns; ~141.9–147.7 mg/g across groups), whereas clear differentiation emerged in 2024, with the NW group reaching 175.7 mg/g compared with 150.1–158.0 mg/g in the other groups. Overall, the NW group exhibited consistently superior nutritional performance, reflected by elevated protein content and higher levels of several key amino acids across both years (Table S2).

3.3. Variation Analysis of Agronomic Traits in Upland Cotton Germplasm

Ten key cotton traits, comprising three yield-related traits (Lint, SC, and LP) and seven architectural traits (PH, FFN, GH, BN, SBP, Angle, and Len), were evaluated in 2023–2024 (Table S3 and Figure 5A,B). Distribution statistics, including skewness, excess kurtosis, and coefficients of variation (CV), were calculated based on accession-level estimates (i.e., genotype means within each year) rather than raw plot observations. Overall, most agronomic traits exhibited approximately symmetric distributions with limited departures from normality (most |skewness| and |excess kurtosis| values < 1; Table S3), consistent with the smoothing effect of replicated field measurements and genotype-mean estimates. Variability differed markedly among traits: SBP was the most variable (CV = 48.1% in 2023; 28.9% in 2024), while Angle (9.7%) and LP (10.6%) were among the most stable (Table S3). Notably, year-to-year changes in CV were trait-dependent: CV decreased in 2024 for FFN, GH, and SBP, but increased for PH, BN, Lint, SC, and Angle, while remaining similar for LP and Len (Table S3). This suggests that traits differed in their environmental sensitivity, rather than variability being uniformly reduced.

3.4. Correlation Analysis of Cottonseed Protein and Amino Acids

To characterize the interrelationships between traits at the accession level, genotypic correlations were computed using Pearson’s correlation coefficients (r) based on the genotype BLUEs of 29 traits across 312 upland cotton accessions (Figure 6). Since the analysis was based on BLUEs, the resulting correlations reflect genotypic associations that are directly relevant for breeding. With a large sample size (N = 312), the statistical power was sufficient to detect even modest associations (approximately |r| ≈ 0.20 at p < 0.05). Protein content showed significant positive correlations with 17 amino acids, including Asp (r = 0.77, p < 0.001) and Glu (r = 0.75, p < 0.001), whereas Met was not significantly correlated with protein (r = 0.07, p > 0.05). Among agronomic traits, LP and GH displayed consistent associations with nutritional traits. LP was positively correlated with protein (r = 0.18, p < 0.01) and showed highly significant positive correlations with 16 amino acids (p < 0.001) (e.g., Asp, r = 0.30, p < 0.001; Glu, r = 0.32, p < 0.001). Similarly, GH was positively correlated with protein (r = 0.23, p < 0.001) and showed highly significant correlations with five amino acids, including Arg (r = 0.27, p < 0.001). Collectively, these genotypic associations suggest that key yield-related traits are positively or neutrally associated with cottonseed nutritional traits, supporting the feasibility of simultaneous improvement of both yield potential and nutritional quality in breeding programs.
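The significance levels quoted above can be checked approximately from r and N alone. The sketch below computes Pearson's r and converts it to a t statistic, t = r·sqrt((N − 2)/(1 − r²)); as an assumption made to stay within the Python standard library, the two-sided p-value uses a normal approximation to the t distribution, which is close for N = 312 but not exact. The function names are introduced here.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 (normal approximation, large n)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


N = 312
p_protein_lp = approx_p_value(0.18, N)   # protein vs. LP, as reported
p_protein_met = approx_p_value(0.07, N)  # protein vs. Met, as reported
print(p_protein_lp < 0.01, p_protein_met > 0.05)
```

Under this approximation the reported protein–LP correlation (r = 0.18) falls below p = 0.01 and the protein–Met correlation (r = 0.07) stays above p = 0.05, in line with the significance levels given in the text.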

3.5. Cluster Analysis of Seed Composition and Agronomic Traits in Upland Cotton

A cluster analysis was performed on the 312 upland cotton accessions using the BLUEs of 29 traits. A dendrogram was constructed using Ward’s hierarchical clustering method. To define a clear population structure, a Euclidean distance (tree height) of 25.2 was set as the cutoff threshold, partitioning the germplasm into four groups: Group I (N = 95), Group II (N = 84), Group III (N = 106), and Group IV (N = 27) (Figure 7). Detailed cluster membership and the corresponding group-level trait profiles are provided in Table S4, which facilitates direct comparison among groups. Group I was characterized by superior vegetative growth and SC production, exhibiting the highest SC (154.2 g) of all clusters. However, its fiber conversion efficiency was moderate, with a Lint of 61.5 g and an LP of 40.0%, both lower than those of Group IV. The protein content of this group was moderate at 36.5%. Group II performed poorly across most agronomic and quality traits, with the lowest protein content (34.9%) and the lowest Lint (51.4 g), SC (135.7 g), and LP (37.9%). Group III exhibited a highly balanced trait profile. Unlike Group I, Group III maintained a high LP (41.1%), comparable to elite lines, while achieving a Lint of 60.6 g and a relatively high protein content (38.1%). This combination suggests that Group III possesses a stable genetic background suitable for the simultaneous improvement of multiple traits. Group IV represented elite germplasm resources and displayed the most favorable comprehensive performance. It not only possessed the highest protein content (41.6%) and superior amino acid composition but also achieved the highest Lint (62.5 g) and LP (41.2%). Furthermore, Group IV exhibited superior growth vigor, as evidenced by the greatest PH (74.5 cm), BN (9.3), and SBP (11.0).

In summary, the multi-trait cluster analysis provided explicit guidance for targeted breeding programs: Group IV was identified as particularly suitable as core parental material for high-quality, high-yield breeding; Group I was effective for increasing seed cotton biomass; Group III served as valuable material for trait enhancement; and Group II represented a pool requiring systematic improvement.

3.6. Comprehensive Evaluation of Upland Cotton Germplasm Resources

To systematically identify elite upland cotton germplasm, a comprehensive evaluation framework was established by integrating Principal Component Analysis (PCA) with four multi-criteria decision-making models (MFM, GRA, TOPSIS, and AHP). This approach enabled a holistic assessment of 312 accessions using the BLUEs of 29 phenotypic traits. PCA was initially employed to mitigate multicollinearity; the first four principal components accounted for 70.2% of the total phenotypic variation (Figure 8A). PC1 was primarily defined by positive loadings on 17 amino acids, representing the nutritional dimension, whereas PC2 was predominantly driven by vegetative traits, such as PH and GH (Table S5). Subsequently, the Phenotypic Comprehensive Value (P_i) was derived by integrating the normalized scores from the four decision-making models. The P_i values for the 312 accessions ranged from 0.733 to 0.278, with the distribution shown in Figure 8C; the dashed blue and red vertical lines indicate the median and mean, respectively. The individual rankings generated by the four models are presented in Table S6. Spearman rank correlation analysis revealed a high degree of concordance among the rankings (r = 0.982–0.999), particularly between TOPSIS and AHP (r = 0.999), underscoring the reliability and stability of the evaluation system (Figure 8B). Targeted screening based on the top 3% of Pi rankings identified ten elite accessions with superior comprehensive traits (Table 2). Xinluzhong 34 (Rank 1, P_i = 0.733) emerged as the top-performing variety, distinguished by its exceptional balance of nutritional quality and yield potential, exhibiting high levels of protein (42.7%), Lys (82.6 mg/g), Lint (75.0 g), and LP (48.2%). Xinluzhong 62 (Rank 2, P_i = 0.695) was characterized by the highest protein content (43.4%) among the elite pool, alongside excellent LP (48.2%). 
Chang Kangmian (Rank 3, P_i = 0.670) featured a unique amino acid profile, peaking in Lys (90.7 mg/g) and Met (0.9 mg/g). Xinluzhong 63 (Rank 4, P_i = 0.660) displayed a balanced performance with notable protein (42.7%) and LP (45.2%). Regarding yield components, Xinluzhong 65 (Rank 5, P_i = 0.657) demonstrated outstanding potential, achieving high Lint (72.0 g) and SC (170.0 g). Pengze 4 (Rank 6, P_i = 0.649) was notable for its strong reproductive capacity, as evidenced by its high SBP (16.7), together with a high protein content (42.9%). 4133Bt (Rank 7, P_i = 0.638) excelled specifically in yield traits, recording high Lint (76.5 g) and SC (169.3 g). Xinpao 1 (Rank 8, P_i = 0.638) presented superior nutritional quality with high protein (41.7%) and Lys (83.9 mg/g). Xinluzhong 61 (Rank 9, P_i = 0.638) exhibited stable agronomic traits, supported by a high LP (45.3%) and Lint (71.9 g). Finally, Jin 34 (Rank 10, P_i = 0.638) showed stable performance across the key traits (protein, Lint, and LP), meeting the elite selection criteria. Because P_i values are rounded to three decimals, Ranks 7–10 appear tied at 0.638; see Table S6 for full precision.
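
As a rough illustration of how a consensus index such as P_i could be assembled from several model scores, the following Python sketch rescales each model's scores to [0, 1] and averages them per accession. The equal-weight average, the function names, and the toy scores are assumptions made for illustration only; they are not the formula or data of this study, which are defined in Section 2.6.5 and Table S6.

```python
def min_max(values):
    """Rescale a list of scores to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant scores carry no ranking information
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def consensus_index(scores_by_model):
    """Average the normalized scores of each accession across models.

    scores_by_model maps a model name to a list of raw scores, one per
    accession, with all lists in the same accession order.
    Assumption: every model contributes with equal weight.
    """
    normalized = [min_max(s) for s in scores_by_model.values()]
    n_models = len(normalized)
    return [sum(col) / n_models for col in zip(*normalized)]


# Toy scores for three hypothetical accessions (not data from this study).
toy_scores = {
    "MFM": [0.62, 0.48, 0.55],
    "GRA": [0.71, 0.60, 0.66],
    "TOPSIS": [0.58, 0.41, 0.52],
    "AHP": [0.80, 0.65, 0.70],
}
p_i = consensus_index(toy_scores)
# Accession indices ordered from highest to lowest consensus value.
ranking = sorted(range(len(p_i)), key=lambda i: p_i[i], reverse=True)
```

Normalizing before averaging keeps a model with a wider raw score range from dominating the consensus.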

4. Discussion

This study aims to address the critical gap highlighted in the introduction regarding the lack of systematic evaluation data for cottonseed, a resource that has historically been underutilized despite its abundance. Based on extensive germplasm evaluation, the present study not only clarifies the high nutritional potential of cottonseed but also reveals the possibility of synergistically improving lint yield and nutritional quality. Furthermore, the comprehensive evaluation system constructed in this study provides a practical basis for prioritizing candidate genotypes in breeding selection.

4.1. Re-Evaluation of Cottonseed Protein Value

Against the backdrop of rapid global population growth and the resulting demand for sustainable plant proteins, identifying alternatives to soybean has become a strategic priority in agricultural science [1,2]. The data from the current study indicate that the protein content of upland cottonseed varies widely, ranging from 29.6% to 48.8%, with a mean of 37.0%. These results are consistent with previous reports by He et al. [3] and Yan et al. [4] that cottonseed protein content is comparable to, and in specific elite germplasm even higher than, that of soybean (Glycine max). For instance, He et al. [3] noted that defatted cottonseed meal can reach protein contents of 40–50%, and the high-protein germplasm identified in the present study, such as “Xinluzhong 62” (43.4%), provides concrete genetic material for realizing this potential. Recent in vivo experiments have further shown that cottonseed meal can completely replace soybean meal in amino acid-balanced diets without compromising the growth performance of ruminants [18].

The nutritional efficacy of a protein is fundamentally determined by its amino acid profile. Cottonseed protein was found to be particularly rich in arginine (Arg) and glutamic acid (Glu), which is consistent with the reviews by Shang et al. [1] and He et al. [3]. Arginine plays a critical role in immune regulation and in maintaining cardiovascular health [3]. Furthermore, as highlighted by He et al. [3], glutamic acid serves as a vital precursor for neurotransmission and is the primary source of the “umami” taste, thereby enhancing the flavor profile of food formulations. Although lysine, as the first limiting amino acid, is present at relatively low levels [19], previous studies by Wu et al. [20] and Qiu et al. [7] have demonstrated that lysine bioavailability can be significantly improved in modern livestock and aquaculture systems through precise formulation with other protein sources or supplementation. Moreover, given that methionine is typically the first limiting amino acid in legume-based diets, the identification of germplasm with elevated methionine levels (e.g., “Chang Kangmian”, 0.9 mg/g) in the present study holds significant practical value for optimizing amino acid balance and facilitating soybean meal substitution [1,4]. Crucially, the toxicity of gossypol, which has historically limited cottonseed utilization, is being progressively reduced. Rathore et al. [6] reported that “ultra-low gossypol” (ULG) cotton, developed via gene-silencing technologies, has been approved for human consumption. Combined with the efficient protein extraction protocols developed by Kumar et al. [21] and the inherently superior emulsifying properties reported by Kumar et al. [22], the cottonseed industry is accelerating the transition to high-value-added food ingredients. Recent aquaculture studies have also demonstrated that degossypolized cottonseed protein concentrate can replace up to 50% of soybean meal without negatively affecting growth performance [23].

4.2. Synergy Between Yield and Quality

The negative correlation between yield and quality traits is a pervasive phenomenon in crop breeding, typically attributed to pleiotropy or linkage drag [24]. However, numerous studies have indicated that yield and quality traits can be improved synergistically [25,26,27,28,29]. In the current study, the large-scale genotypic correlation analysis (based on accession BLUEs) indicated that lint yield and lint percentage exhibited positive or neutral associations with protein content and most amino acid traits, suggesting that simultaneous improvement of yield and nutritional quality is feasible and that the classical trade-off is not inevitable in upland cotton germplasm. This conclusion is further supported by recent studies revealing that the trade-off between yield and quality is not absolute but may be regulated by complex genetic networks and can be partially decoupled through selection and allele recombination [30,31,32]. Importantly, the practical breeding value of this synergy is substantiated by the clustering results: Group IV (Cluster 4) represented an elite pool combining superior nutritional traits with favorable yield performance. Specifically, Group IV showed the highest mean protein content (41.6%) while maintaining the highest mean lint yield (62.5 g) and lint percentage (41.2%); representative accessions such as “Xinluzhong 34” further exemplified this dual advantage (high lint yield together with high protein content). These accessions therefore constitute valuable donor parents for breeding programs aiming to co-improve cottonseed nutritional quality and fiber yield.

4.3. Construction of Evaluation System

To address the current deficiency in comprehensive evaluation systems for cotton germplasm [8], a framework integrating PCA-based dimensionality reduction, objective weighting, and multi-model integration was constructed. With 29 complex traits to consider, single-method evaluations often fail to capture the full phenotypic landscape, a limitation noted in previous studies [3,8]. Therefore, four mathematical models with an established record in agricultural evaluation were applied in parallel and integrated to generate a robust consensus ranking.

Principal Component Analysis (PCA): As a classic dimensionality reduction tool, PCA effectively extracts core variation information. The efficacy of PCA in dissecting the genetic diversity of complex agronomic traits has been demonstrated by Kumar et al. [33] in Bt cotton, Venkatesan et al. [34] in Asiatic cotton, and Vijeth et al. [10] in wheat.

Grey Relational Analysis (GRA): GRA quantifies how close each genotype is to an ideal reference profile based on the normalized trait deviations, without relying on distributional or linear model assumptions. Zhao et al. [14] and Zhang et al. [13] successfully used GRA to screen elite germplasm in pea and peanut, respectively, proving the universality of the method in crop comprehensive evaluation.
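
The idea behind GRA can be sketched in a few lines of Python. The example below uses the conventional grey relational coefficient with a distinguishing coefficient of 0.5 and takes the best observed value of each trait as the ideal reference. The function name, the equal trait weights, and the toy matrix are assumptions for illustration; the weighting actually used in this study is described in Section 2.6.

```python
def grey_relational_grades(matrix, weights=None, rho=0.5):
    """Grey relational grade of each row against an ideal reference profile.

    matrix:  rows are genotypes, columns are traits, already scaled to
             [0, 1] so that larger values are better.
    weights: optional per-trait weights summing to 1 (equal if omitted).
    rho:     distinguishing coefficient, conventionally 0.5.
    """
    n_traits = len(matrix[0])
    if weights is None:
        weights = [1.0 / n_traits] * n_traits
    # Ideal reference: the best observed value of each trait.
    reference = [max(row[k] for row in matrix) for k in range(n_traits)]
    # Absolute deviation of every genotype from the reference, per trait.
    deltas = [[abs(reference[k] - row[k]) for k in range(n_traits)]
              for row in matrix]
    d_min = min(min(r) for r in deltas)
    d_max = max(max(r) for r in deltas)
    if d_max == 0:  # every genotype coincides with the reference
        return [1.0] * len(matrix)
    grades = []
    for r in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in r]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades


# Toy normalized data for three hypothetical genotypes and three traits.
toy = [
    [1.0, 1.0, 1.0],  # matches the ideal on every trait
    [0.5, 0.5, 0.5],
    [0.0, 0.0, 0.0],
]
grades = grey_relational_grades(toy)
```

A genotype that matches the reference on every trait receives a grade of 1, and the grade falls as the deviations grow.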

Membership Function Method (MFM): MFM applies min–max scaling to transform each trait to the [0, 1] range, enabling traits with different units to be integrated into a single composite score. The MFM technique was employed by Gyanagoudar et al. [12] for salinity tolerance in eggplants and by Zhang et al. [35] for pear seedling screening to quantify comprehensive performance. More recently, Tian et al. [36] applied MFM to evaluate the phenotypic diversity of apple germplasm resources.
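
A minimal Python sketch of the membership function idea follows. Each trait is min–max scaled to [0, 1], traits for which lower values are preferable are inverted, and the membership values are averaged into one composite score. The direction flags, the equal weights, and the toy values are assumptions for illustration and are not taken from this study.

```python
def membership_values(column, higher_is_better=True):
    """Min-max membership values of one trait across genotypes."""
    lo, hi = min(column), max(column)
    if hi == lo:  # trait does not discriminate among genotypes
        return [0.5] * len(column)
    scaled = [(x - lo) / (hi - lo) for x in column]
    # Invert the scale for traits where smaller values are preferable.
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def mfm_scores(traits, directions):
    """Equal-weight mean membership value per genotype.

    traits:     dict mapping trait name -> list of values per genotype.
    directions: dict mapping trait name -> True if higher is better.
    """
    columns = [membership_values(values, directions[name])
               for name, values in traits.items()]
    return [sum(col) / len(columns) for col in zip(*columns)]


# Toy values for three hypothetical genotypes (not data from this study).
toy_traits = {
    "protein_pct": [40.0, 35.0, 30.0],
    "lint_g": [70.0, 50.0, 60.0],
}
toy_directions = {"protein_pct": True, "lint_g": True}
scores = mfm_scores(toy_traits, toy_directions)
```

Because every trait is mapped to the same unitless scale, percentages and gram weights can be combined without one dominating the other.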

TOPSIS and Analytic Hierarchy Process (AHP): These methods provide structured decision support. Gbegbelegbe et al. [11] utilized TOPSIS for precise ranking of dryland agricultural technologies and maize variety adaptability; meanwhile, Schiavon et al. [15] and Tian et al. [37] validated the advantages of AHP in weight assignment and hierarchical decision-making.
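
For readers unfamiliar with TOPSIS, the textbook procedure can be sketched in Python as below: columns are vector-normalized and weighted, ideal and anti-ideal profiles are formed, and each genotype is scored by its relative closeness to the ideal. The weights here are placeholders standing in for whatever weighting scheme is used (for example, AHP-derived weights); the function name and toy data are likewise assumptions and not values from this study.

```python
import math


def topsis(matrix, weights, benefit):
    """Relative closeness of each row to the ideal solution (0 to 1).

    matrix:  rows are genotypes, columns are traits (raw values).
    weights: per-trait weights (placeholders in this sketch).
    benefit: per-trait flags, True if larger values are preferable.
    """
    n_cols = len(matrix[0])
    # Vector normalization; a zero column is left unscaled.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
         for row in matrix]
    cols = [[r[j] for r in v] for j in range(n_cols)]
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    closeness = []
    for r in v:
        d_plus = math.sqrt(sum((a - b) ** 2 for a, b in zip(r, best)))
        d_minus = math.sqrt(sum((a - b) ** 2 for a, b in zip(r, worst)))
        total = d_plus + d_minus
        # If all rows are identical, no row is closer to the ideal.
        closeness.append(d_minus / total if total > 0 else 0.5)
    return closeness


# Toy data: protein (%) and lint yield (g) for three hypothetical genotypes.
toy_matrix = [[40.0, 70.0], [35.0, 50.0], [30.0, 60.0]]
closeness = topsis(toy_matrix, weights=[0.5, 0.5], benefit=[True, True])
```

In this toy case the first genotype is best on both traits, so it coincides with the ideal profile and receives the maximum closeness.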

Despite the differing mathematical principles of the applied models, the results showed high concordance in the rankings of the 312 accessions (Spearman r = 0.982–0.999). The high inter-method correlation aligns with the findings of Long et al. [38] in maize hybrid evaluation and suggests that the consensus ranking is largely insensitive to the choice of method, so that multi-model integration helps limit method-specific bias and enhances decision-making robustness.
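
The concordance statistic reported above can be reproduced in principle with a few lines of Python. The sketch below computes Spearman's rank correlation as the Pearson correlation of average ranks, which handles ties. The helper names and the toy score lists are assumptions for illustration; the actual model scores are given in Table S6.

```python
import math


def average_ranks(values):
    """Ascending ranks (1 = smallest), with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation between two score lists."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores for four hypothetical accessions under three methods.
topsis_scores = [0.9, 0.7, 0.5, 0.3]
ahp_scores = [0.8, 0.75, 0.4, 0.2]   # same ordering as TOPSIS
gra_scores = [0.9, 0.5, 0.7, 0.3]    # two accessions swapped

rho_same = spearman(topsis_scores, ahp_scores)
rho_swap = spearman(topsis_scores, gra_scores)
```

Because only ranks enter the calculation, two methods with very different score scales still reach a coefficient of 1 when they order the accessions identically.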

4.4. Cottonseed Utilization and Breeding Prospects

This study substantiates the nutritional value of cottonseed as a strategic protein source [1,3] and demonstrates the feasibility of simultaneous improvement of yield and quality by identifying elite germplasm such as “Xinluzhong 34”. The constructed evaluation system provides a practical basis for breeding selection. In addition, a meta-analysis of comparable composite-ranking studies could further strengthen the generality of this approach. Future research should integrate multi-omics to elucidate the underlying genetic mechanisms [39,40] and prioritize high-value product development (e.g., peptides and artificial meat) to establish a complete industrial chain [6,22].

5. Conclusions

In the present study, phenotypic variation analysis was initially conducted on 30 agronomic and nutritional traits in 312 upland cotton germplasm resources. However, because tryptophan data were unavailable for one growing season, tryptophan was excluded from the subsequent multivariate modeling to ensure evaluation robustness. Consequently, the systematic comprehensive evaluation was performed using 29 core traits. The results indicated that the protein content exhibited extensive variation, with a mean of 37.0%, whereas Glutamic Acid and Arginine were the predominant amino acids. Furthermore, nutritional traits were positively or neutrally correlated with key yield-related traits (lint yield and lint percentage), demonstrating the feasibility of simultaneous improvement. Based on the comprehensive evaluation system established herein, 10 elite accessions, exemplified by “Xinluzhong 34”, were identified. These findings provide valuable data and lay a practical foundation for the high-value utilization of cottonseed and the synergistic breeding of varieties combining superior yield and nutritional quality.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/agriculture16050514/s1: Table S1: Descriptive statistics of the nutritional quality of 312 cottonseed samples; Table S2: Differences in Seed Protein and Amino Acids across Upland Cotton Germplasm; Table S3: Phenotypic Variation of Agronomic Traits; Table S4: Cluster analysis of cottonseed protein, amino acid components, and agronomic traits; Table S5: 312 Gossypium hirsutum germplasm resources for 10 principal components; Table S6: Scores of 312 Germplasms under Different Models; Table S7: List of 312 Upland Cotton Germplasm Accessions with Origins.

Author Contributions

Z.W., H.L. and Y.Z.: methodology, data curation, software, formal analysis, validation, writing—original draft. S.A., S.W. and J.M.: investigation, validation, supervision. K.Z., Q.C. and X.D.: investigation, supervision, resources, project administration, writing—review & editing, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Xinjiang Major Science and Technology Project (2024A02003-3) and the Xinjiang Key Research and Development Program (2024B02001-1/-2).

Data Availability Statement

The data presented in this study are included in the article and its Supplementary Materials. Additional data and inquiries can be directed to the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AHP	Analytic Hierarchy Process
GRA	Grey Relational Analysis
MFM	Membership Function Method
TOPSIS	Technique for Order Preference by Similarity to Ideal Solution
P_i	Phenotypic Comprehensive Value
PCA	Principal Component Analysis
BSA	Bovine Serum Albumin
BLUE	Best Linear Unbiased Estimate
RCBD	Randomized Complete Block Design
HPLC	High-Performance Liquid Chromatography
ULG	Ultra-Low Gossypol
GLM	Generalized Linear Model
CV	Coefficient of Variation
YR	Yellow River Basin
YZ	Yangtze River Basin
NW	Northwest Inland Region
Intro	Foreign Introductions
Local	Local Varieties
Asp	Aspartic Acid
Glu	Glutamic Acid
Hyp	Hydroxyproline
Ser	Serine
Arg	Arginine
Gly	Glycine
Thr	Threonine
Pro	Proline
Ala	Alanine
Val	Valine
Met	Methionine
Cys	Cysteine
Ile	Isoleucine
Leu	Leucine
His	Histidine
Phe	Phenylalanine
Lys	Lysine
Tyr	Tyrosine
Trp	Tryptophan
PH	Plant Height
FFN	First Fruiting Branch Node
GH	Height to First Branch
BN	Number of Fruiting Branches
SBP	Bolls Per Plant
Lint	Lint Yield
SC	Seed Cotton Yield
LP	Lint Percentage
Angle	Angle of Lower Fruiting Branches
Len	Length of Lower Fruiting Branches
DNFB	2,4-Dinitrofluorobenzene
VWD	Variable Wavelength Detector

References

1. Shang, X.; Li, S.; Lin, S.; Sun, S.; Liang, L.; Zhang, Y.; Yang, R. Cottonseed protein as a sustainable alternative plant protein: Basic characteristics, recent advancements, applications and limitations. Trends Food Sci. Technol. 2026, 169, 105531.
2. Henchion, M.; Hayes, M.; Mullen, A.M.; Fenelon, M.; Tiwari, B. Future Protein Supply and Demand: Strategies and Factors Influencing a Sustainable Equilibrium. Foods 2017, 6, 53.
3. He, Z.; Zhang, H.; Olk, D.C. Chemical Composition of Defatted Cottonseed and Soy Meal Products. PLoS ONE 2015, 10, e0129933.
4. Yan, L.; An, S.; Lv, X.; Lv, Z.; Zhang, B.; Choct, M.; Guo, Y.; Wang, Z.; Yan, B.; Li, Y. Effects of replacing soybean meal with cottonseed meal on growth performance, carcass trait, intestinal development and intestinal microbiota of broiler chickens. Poult. Sci. 2025, 104, 104653.
5. Tao, A.; Wang, J.; Luo, B.; Liu, B.; Wang, Z.; Chen, X.; Zou, T.; Chen, J.; You, J. Research progress on cottonseed meal as a protein source in pig nutrition: An updated review. Anim. Nutr. 2024, 18, 220–233.
6. Rathore, K.S.; Pandeya, D.; Campbell, L.M.; Wedegaertner, T.C.; Puckhaber, L.; Stipanovic, R.D.; Thenell, J.S.; Hague, S.; Hake, K. Ultra-Low Gossypol Cottonseed: Selective Gene Silencing Opens Up a Vast Resource of Plant-Based Protein to Improve Human Nutrition. Crit. Rev. Plant Sci. 2020, 39, 1–29.
7. Qiu, K.; Wang, X.C.; Wang, J.; Wang, H.; Qi, G.H.; Zhang, H.J.; Wu, S.G. Comparison of amino acid digestibility of soybean meal, cottonseed meal, and low-gossypol cottonseed meal between broilers and laying hens. Anim. Biosci. 2023, 36, 619–628.
8. Huang, Y.; Li, C.; Fu, S.; Wu, Y.; Zhou, D.; Huang, L.; Peng, J.; Kuang, M. Comprehensive Evaluation of Nutritional Quality Diversity in Cottonseeds from 259 Upland Cotton Germplasms. Foods 2025, 14, 2895.
9. Li, Y.; Zou, J.; Zhu, H.; He, J.; Setter, T.L.; Wang, Y.; Meng, Y.; Chen, B.; Zhao, W.; Wang, S.; et al. Drought deteriorated the nutritional quality of cottonseed by altering fatty acids and amino acids compositions in cultivars with contrasting drought sensitivity. Environ. Exp. Bot. 2022, 194, 104747.
10. Vijeth, G.M.; Uday, G.; Krishnappa, G.; Shashidhar, N. Genetic variability and principal component analysis in durum wheat germplasm for terminal drought stress. Indian J. Agric. Sci. 2025, 95, 522–528.
11. Gbegbelegbe, S.; Alene, A.; Swamikannu, N.; Frija, A. Multi-dimensional impact assessment for priority setting of agricultural technologies: An application of TOPSIS for the drylands of sub-Saharan Africa and South Asia. PLoS ONE 2024, 19, e0314007.
12. Gyanagoudar, H.S.; Hatiya, S.T.; Guhey, A.; Dharmappa, P.M.; Seetharamaiah, S.K. A comprehensive approach for evaluating salinity stress tolerance in brinjal (Solanum melongena L.) germplasm using membership function value. Physiol. Plant. 2024, 176, e14239.
13. Zhang, N.; Zhang, H.; Ren, J.Y.; Bai, B.Y.; Guo, P.; Lv, Z.H.; Kang, S.L.; Zhao, X.H.; Yu, H.Q.; Zhao, T.H. Characterization and Comprehensive Evaluation of Phenotypic and Yield Traits in Salt-Stress-Tolerant Peanut Germplasm for Conservation and Breeding. Horticulturae 2024, 10, 147.
14. Zhao, T.; Quan, W.; Du, Z.; Xie, Q.; Kang, Y.; Xue, W. A comprehensive evaluation of pea germplasm resources through cluster and gray relational analyses. Genet. Resour. Crop Evol. 2023, 70, 1135–1149.
15. Schiavon, L.L.P.; Lima, P.A.B.; Crepaldi, A.F.; Mariano, E.B. Use of the Analytic Hierarchy Process Method in the Variety Selection Process for Sugarcane Planting. Eng 2023, 4, 602–614.
16. Yu, J.; Bai, X.; Zhang, K.; Feng, L.; Yu, Z.; Jiao, X.; Guo, Y. Assessment of Breeding Potential of Foxtail Millet Varieties Using a TOPSIS Model Constructed Based on Distinctness, Uniformity, and Stability Test Characteristics. Plants 2024, 13, 2102.
17. Ma, L.H.; Tian, Y.; Jiang, Z.M.; Tian, B. Preparation of rapid detecting paper for total protein in milk powder by biuret method. Sci. Technol. Food Ind. 2012, 33, 327–329.
18. Yue, Y.; Lin, J.; Lv, G.; Liu, B.; Deng, X.; Li, Y.; Li, X.; Chen, K. Effects of replacing soybean meal with cottonseed meal in amino acid balanced diets on growth performance, apparent digestibility, ruminal fermentation, and microbial diversity in fattening Dorper × Hu crossbred sheep. Front. Vet. Sci. 2025, 12, 1681407.
19. Fisher, H. Unrecognized Amino Acid Deficiencies of Cottonseed Protein for the Chick. J. Nutr. 1965, 87, 9–12.
20. Wu, M.; Li, M.; Wen, H.; Yu, L.; Jiang, M.; Lu, X.; Tian, J.; Huang, F. Dietary lysine facilitates muscle growth and mediates flesh quality of Pacific white shrimp (Litopenaeus vannamei) reared in low-salinity water. Aquac. Int. 2023, 31, 603–625.
21. Kumar, M.; Potkule, J.; Patil, S.; Saxena, S.; Patil, P.G.; Mageshwaran, V.; Punia, S.; Varghese, E.; Mahapatra, A.; Ashtaputre, N.; et al. Extraction of ultra-low gossypol protein from cottonseed: Characterization based on antioxidant activity, structural morphology and functional group analysis. LWT 2021, 140, 110692.
22. Kumar, M.; Tomar, M.; Punia, S.; Grasso, S.; Arrutia, F.; Choudhary, J.; Singh, S.; Verma, P.; Mahapatra, A.; Patil, S.; et al. Cottonseed: A sustainable contributor to global protein requirements. Trends Food Sci. Technol. 2021, 111, 100–113.
23. Yan, Q.; Bu, X.; Liu, Y.; Yao, C.; Wang, Z.; Shi, M.; Zhang, Z.; Zhang, J.; Zhang, J.; Du, J.; et al. Effects of Replacing Soybean Meal with Degossypolized Cottonseed Protein on the Growth Performance, Protein Metabolism, Digestive Capacity, and Antioxidant Capacity of Hybrid Fish Hefang Bream. Aquac. Nutr. 2025, 2025, 4633901.
24. Clement, J.D.; Constable, G.A.; Stiller, W.N.; Liu, S.M. Negative associations still exist between yield and fibre quality in cotton breeding programs in Australia and USA. Field Crops Res. 2012, 128, 1–7.
25. Yuan, Y.C.; Wang, X.L.; Wang, L.Y.; Xing, H.X.; Wang, Q.K.; Saeed, M.; Tao, J.C.; Feng, W.; Zhang, G.H.; Song, X.L.; et al. Genome-Wide Association Study Identifies Candidate Genes Related to Seed Oil Composition and Protein Content in Gossypium hirsutum L. Front. Plant Sci. 2018, 9, 1359.
26. Sharma, A.; Xu, M.; Vitrakoti, D.; Patel, J.D.; Chee, P.W.; Paterson, A.H. Genetic basis and role of exotic accessions in cultivated cotton fiber quality improvement. Theor. Appl. Genet. 2025, 138, 260.
27. Chu, Q.; Fu, X.; Zhao, J.; Li, Y.; Liu, L.; Zhang, L.; Zhang, Y.; Guo, Y.; Pei, Y.; Zhang, M. Simultaneous improvement of fiber yield and quality in upland cotton (Gossypium hirsutum L.) by integration of auxin transport and synthesis. Mol. Breed. 2024, 44, 64.
28. Zeng, L.; Meredith, W.R. Associations among Lint Yield, Yield Components, and Fiber Properties in an Introgressed Population of Cotton. Crop Sci. 2009, 49, 1647–1654.
29. Li, S.Q.; Kong, L.L.; Xiao, X.H.; Li, P.T.; Liu, A.Y.; Li, J.W.; Gong, J.W.; Gong, W.K.; Ge, Q.; Shang, H.H.; et al. Genome-wide artificial introgressions of Gossypium barbadense into G. hirsutum reveal superior loci for simultaneous improvement of cotton fiber quality and yield traits. J. Adv. Res. 2023, 53, 1–16.
30. Parkash, V.; Snider, J.L.; Virk, G. Interactive effects of water deficit and nitrogen deficiency on photosynthesis, its underlying component processes, and carbon loss processes in cotton. Crop Sci. 2024, 64, 3480–3501.
31. Zhai, M.H.; Wei, X.W.; Pan, Z.L.; Xu, Q.Q.; Qin, D.L.; Li, J.H.; Zhang, J.; Wang, L.Z.; Wang, K.F.; Duan, X.Y.; et al. Optimizing plant density and canopy structure to improve light use efficiency and cotton productivity: Two years of field evidence from two locations. Ind. Crops Prod. 2024, 222, 119946.
32. Qin, A.; Aluko, O.O.; Liu, Z.; Yang, J.; Hu, M.; Guan, L.; Sun, X. Improved cotton yield: Can we achieve this goal by regulating the coordination of source and sink? Front. Plant Sci. 2023, 14, 1136636.
33. Kumar, S.; Jattan, M.; Kumar, D.; Sharma, A.; Malik, K.S.; Saini, A.K.; Mandhania, S. Genetic Diversity in Bt-Cotton (Gossypium hirsutum L.) Genotypes for Yield and Fibre Quality Traits using Multivariate Analysis. J. Adv. Biol. Biotechnol. 2025, 28, 1081–1088.
34. Venkatesan, T.; Anandan, K.; Ramakrishnan, S.H.; Nallathambi, P.; Ramadoss, B.R. Genetic variability of Asiatic cotton (Gossypium arboreum L.) germplasm for yield and surgical cotton properties. Ind. Crops Prod. 2024, 219, 119065.
35. Zhang, W.L.; Yuan, S.; Liu, N.; Zhang, H.X.; Zhang, Y.X. Evaluation and screening of dwarfing ‘Duli’ (Pyrus betulifolia Bunge) seedlings by principal component analysis and membership function analysis. Sci. Hortic. 2025, 345, 114128.
36. Tian, W.; Li, Z.; Wang, L.; Sun, S.; Wang, D.; Wang, K.; Wang, G.; Liu, Z.; Lu, X.; Feng, J.; et al. Comprehensive Evaluation of Apple Germplasm Genetic Diversity on the Basis of 26 Phenotypic Traits. Agronomy 2024, 14, 1264.
37. Tian, J.; Jiang, M.; Ji, G.; Zhang, J.; Luo, D.; Zhang, F.; Li, L.; Li, M. A comprehensive evaluation of appearance stability during rice storage based on analytic hierarchy process. J. Cereal Sci. 2025, 123, 104160.
38. Long, Y.; Zeng, Y.; Liu, X.; Yang, Y. Multivariate Analysis of Grain Yield and Main Agronomic Traits in Different Maize Hybrids Grown in Mountainous Areas. Agriculture 2024, 14, 1703.
39. Lee, S.; Van, K.; Sung, M.; Nelson, R.; LaMantia, J.; McHale, L.K.; Mian, M.A.R. Genome-wide association study of seed protein, oil and amino acid contents in soybean from maturity groups I to IV. Theor. Appl. Genet. 2019, 132, 1639–1659.
40. Umer, M.J.; Lu, Q.; Huang, L.; Batool, R.; Liu, H.; Li, H.; Wang, R.; Qianxia, Y.; Varshney, R.K.; Pandey, M.K.; et al. Genome-wide association study reveals the genetic basis of amino acids contents variations in Peanut (Arachis hypogaea L.). Physiol. Plant. 2024, 176, e14542.

Figure 1. Overview of the experimental site in Yuepuhu County and related experimental data. (A) Geographical location of the experimental sites. (B) Temporal variations in air temperature and precipitation during the 2023–2024 study period (meteorological data, April–December). The size of the blue dots is positively correlated with the amount of precipitation (larger dots indicate higher precipitation).

Figure 2. Protein quantification standard curve based on the Biuret method. Points A–H represent BSA standard solutions of increasing volume (0.00–1.40 mL) used for protein quantification.

Figure 3. Schematic diagram of the integrated comprehensive evaluation framework. The workflow illustrates the transition from the initial phenotypic data matrix of 312 accessions to the extraction of principal components (PCA), derivation of objective weights (AHP), parallel scoring through four decision-making models (MFM, GRA, TOPSIS, and AHP), and final consensus integration aimed at identifying elite germplasm. Green, blue, purple, and orange represent MFM, GRA, TOPSIS, and AHP models, respectively.

Figure 4. Analysis of 20 quality traits in 312 upland cotton germplasm resources. Subfigure (A) (2023) and subfigure (B) (2024) display the distribution of cottonseed protein content, while subfigure (C) (2023) and subfigure (D) (2024) illustrate the distributions of 19 amino acids: Asp (Aspartic Acid), Glu (Glutamic Acid), Hyp (Hydroxyproline), Ser (Serine), Arg (Arginine), Gly (Glycine), Thr (Threonine), Pro (Proline), Ala (Alanine), Val (Valine), Met (Methionine), Cys (Cysteine), Ile (Isoleucine), Leu (Leucine), His (Histidine), Phe (Phenylalanine), Lys (Lysine), Tyr (Tyrosine), and Trp (Tryptophan). GLM + Tukey multiple-comparison test; different lowercase letters indicate significant differences at p < 0.05.

Figure 5. Analysis of 10 agronomic traits in 312 upland cotton germplasm resources. Subfigure (A) (2023) and Subfigure (B) (2024) display the distributions of 10 agronomic traits, including: PH (Plant Height), FFN (First Fruiting Branch Node), GH (Height to First Branch), BN (Number of Fruiting Branches), SBP (Bolls Per Plant), Lint (Lint Yield), SC (Seed Cotton Yield), LP (Lint Percentage), Angle (Angle of Lower Fruiting Branches), and Len (Length of Lower Fruiting Branches). GLM + Tukey multiple-comparison test; different lowercase letters indicate significant differences at p < 0.05.

Figure 6. Genotypic (BLUE-based) correlation matrix of cottonseed protein, amino acid components, and agronomic traits. The upper triangle displays Pearson’s correlation coefficients (r). The lower triangle represents these correlations as ellipses, with statistical significance denoted by asterisks (* p < 0.05, ** p < 0.01, *** p < 0.001). Red and blue colors indicate positive and negative correlations, respectively. The color intensity and ellipse eccentricity are proportional to the magnitude of the correlation coefficients. Abbreviations: Asp (Aspartic Acid), Glu (Glutamic Acid), Hyp (Hydroxyproline), Ser (Serine), Arg (Arginine), Gly (Glycine), Thr (Threonine), Pro (Proline), Ala (Alanine), Val (Valine), Met (Methionine), Cys (Cysteine), Ile (Isoleucine), Leu (Leucine), His (Histidine), Phe (Phenylalanine), Lys (Lysine), Tyr (Tyrosine); Agronomic traits: PH (Plant Height), FFN (First Fruiting Branch Node), GH (Height to First Branch), BN (Number of Fruiting Branches), SBP (Bolls Per Plant), Lint (Lint Yield), SC (Seed Cotton Yield), LP (Lint Percentage), Angle (Angle of Lower Fruiting Branches), Len (Length of Lower Fruiting Branches).

Figure 7. Hierarchical cluster analysis of 312 upland cotton accessions based on the BLUEs of 29 phenotypic traits. A dendrogram was constructed using the Ward method based on the Euclidean distance. The cutoff distance of 25.2 divided the population into four distinct genetic groups (I, II, III, and IV). The outermost ring visualizes the distribution of cottonseed protein content across accessions.

Figure 8. Comprehensive evaluation results for 312 upland cotton accessions. (A) Principal Component Analysis (PCA) biplot of 29 phenotypic traits (19 nutritional and 10 agronomic traits), showing trait relationships and their contributions to variation. (B) Pairwise Spearman rank-agreement plot among the rankings produced by four multi-criteria decision-making methods (MFM, GRA, TOPSIS, and AHP), presented as a scatterplot matrix with correlation coefficients (*** p < 0.001). This panel is used to describe the concordance among methods rather than to validate predictive performance. (C) Frequency distribution of the phenotypic comprehensive value (P_i); dashed vertical lines indicate the median and mean.

Table 1. Gradient elution program for amino acid analysis.

Time (min)	Mobile Phase A (%)	Mobile Phase B (%)
0	14	86
39	40	60
44	70	30
44.01	14	86
50	14	86

Note: Mobile Phase A: acetonitrile:methanol (90:10, v/v); Mobile Phase B: 0.02 mol/L sodium phosphate buffer (NaH₂PO₄/Na₂HPO₄).

Table 2. Comprehensive evaluation model scores of participating varieties.

Rank	Material Name	P_i Value	Elite Characteristics and Potential Application
1	Xinluzhong 34	0.733	Comprehensive Elite: Ranked 1st in LP (48.2%) and SBP (18.6) and top-tier in Lint (75.0 g) and Protein (42.7%). A superior multi-purpose core parent.
2	Xinluzhong 62	0.695	High-Protein Type: Exhibited the highest Protein content (43.4%) in the population. Also features excellent LP (48.2%, Ranked 2nd). Ideal for quality improvement.
3	Chang Kangmian	0.670	Amino Acid Specialist: Possesses the highest Lysine (90.7 mg/g) and Met (0.9 mg/g) content. A unique genetic resource for nutritional enhancement.
4	Xinluzhong 63	0.660	Balanced Type: Shows consistent performance, ranking in the top 10% for both Protein (42.7%) and LP (45.2%). A stable donor for dual-trait improvement.
5	Xinluzhong 65	0.657	High-Yield Type: Positioned in the top 5 for both Lint (72.0 g) and SC (170.0 g). A robust source for yield trait improvement.
6	Pengze 4	0.649	Reproductive Potential: Ranks 2nd in SBP (16.7), significantly exceeding the population mean (11.6). Key parent for improving boll number.
7	4133Bt	0.638	Lint Yield Specialist: Achieves the 2nd highest Lint yield (76.5 g), demonstrating high efficiency in converting biomass to economic yield.
8	Xinpao 1	0.638	Quality Specialist: A distinct quality donor with high Protein (41.7%) and Lys (83.9 mg/g), surpassing yield-focused lines in biochemical traits.
9	Xinluzhong 61	0.638	High-LP Type: Characterized by a superior LP (45.3%), ranking 5th. Efficiently achieves high Lint (71.9 g) despite a moderate boll number.
10	Jin 34	0.638	Stable Type: Exhibits high stability with all key traits (Protein, Lint, LP) consistent with elite standards, showing minimal deviation. A reliable background for breeding.

Note: P_i, Phenotypic Comprehensive Value; Lys, Lysine; Met, Methionine; Lint, Lint Yield; SC, Seed Cotton Yield; LP, Lint Percentage; SBP, Bolls Per Plant. (Other traits mentioned: PH, Plant Height; GH, Height to First Branch).


2.6.4. Multi-Index Scoring Methods Based on W^PCA