Proteins in Scalp Hair of Preschool Children

Background. Early childhood experiences have long-lasting effects on subsequent mental and physical health, education, and employment. The measurement of these effects relies on insensitive behavioral signs, subjective assessments by adult observers, neuroimaging or neurophysiological studies, or retrospective epidemiologic outcomes. Despite intensive research, the underlying mechanisms of these long-term changes in development and health status remain unknown. Methods. We analyzed scalp hair from healthy children and their mothers using an unbiased proteomics platform combining tandem mass spectrometry, ultra-performance liquid chromatography, and collision-induced dissociation to reveal commonly observed hair proteins with a spectral count of 3 or higher. Results. We observed 1368 non-structural hair proteins in children and 1438 non-structural hair proteins in mothers, with 1288 proteins showing individual variability. Mothers showed higher numbers of peptide spectral matches and hair proteins compared to children, with important age-related differences between mothers and children. Age-related differences were also observed in children, with differential protein expression patterns between younger (2 years and below) and older children (3–5 years). We observed greater similarity in hair protein patterns between mothers and their biological children compared with mothers and unrelated children. The top 5% of proteins driving population variability represented biological pathways associated with brain development, immune signaling, and stress response regulation. Conclusions. Non-structural proteins observed in scalp hair include promising biomarkers to investigate the long-term developmental changes and health status associated with early childhood experiences.


Introduction
Early human development is extremely sensitive to parental, environmental, and societal influences that vary with the history of each individual (via genetic and epigenetic factors) and with their daily experiences. Variations in these factors, such as stress and social determinants of health, can singly or collectively introduce differences in developmental outcomes [1][2][3][4]. Such differences are then magnified in the higher-order cognitive and behavioral capacities of the human mind-brain-body connectome, which are built on a series of sequential or staggered developmental epochs that can enable or constrain individuals' future potential, their role(s) in society, and their mental and physical health [4][5][6][7][8].
Objective assessment of social, emotional, and other environmental inputs across multiple timescales is challenging in early childhood. These challenges result from most subjects being pre-verbal, coming from unknown environments, or being accompanied by unreliable, fearful, or distrusting historians [1,3,9,10]. Developmental timescales can also range from milliseconds to minutes (e.g., affecting acute neuromodulatory tone, neuronal oscillations, or neuroendocrine changes), days to weeks (e.g., affecting circadian rhythms, metabolic functions, or memory and learning), or months to years (e.g., affecting brain growth and brain plasticity or emerging cognitive, behavioral, or social capacities) [4,11]. Neurophysiological, neuroimaging, and observational studies have attempted to describe and quantify early developmental changes, but there remains a need for non-invasive, objective biomarkers that can be measured serially across the months and years of childhood development [12][13][14][15].
Human scalp hair from preschool children, derived from the neuroectoderm and mesoderm, grows constantly at about 1 cm/month and evolves through the prenatal lanugo, postnatal vellus, intermediate medullary, and terminal hair stages [16]. Hair is composed of 65-85% protein, 15-35% water, 1-9% lipids, and 0.1-5% pigments (such as melanin) and trace elements [17]. Constantly growing scalp hair incorporates both endogenous and exogenous proteins in a time-averaged chronological manner [18], unlike other biospecimens [19]. Therefore, hair is used routinely to monitor exposure to drugs, heavy metals, and other environmental toxins [20], and it may even reflect the social determinants of health [3].
Developmentally regulated hair proteins could offer biomarker candidates for the mind-brain-body connectome, with the potential to monitor health status in real time during early childhood development. However, all published data on hair proteins are limited to adult subjects, include relatively small sample sizes, and focus mainly on structural hair proteins. Lee et al. described 343 hair proteins from three adults, showing evidence for posttranslational modifications [21]. Laatsch et al. analyzed hair from 18 males and 3 females, reporting ethnic differences in keratins and keratin-associated proteins (KAPs) [22]. Carlson et al. characterized hair proteins from one adult with limited sample availability [23], whereas Wu et al. used hierarchical protein clustering to match 10 monozygotic twin pairs and differentiate them from unrelated individuals [24]. Parker et al. reported quantifiable measures [21] of identity discrimination and racial ancestry by detecting genetically variant peptides in the structural hair proteins for forensic purposes [25].
To fill these gaps in knowledge, we analyzed non-structural hair proteins in preschool children and their mothers using ultra-performance liquid chromatography tandem mass spectrometry (UPLC-MS/MS), and we conducted ELISA-based validation studies on a limited subset of the detected non-structural hair proteins. Our subjects were not exposed to early life adversity, as evidenced by parental income, household structure, health insurance, and parent education [5] as well as hair cortisol concentrations [4,26].

Materials and Methods
After IRB approval and parental consent, mothers and children aged 1-6 years were recruited from local preschool facilities. All children were developmentally appropriate, healthy, and belonged to stable nuclear families (Table 1). We excluded children with tinea capitis, alopecia areata, eczema, or other scalp conditions; those who had received any prescription or over-the-counter drugs or steroid therapy in the past 3 months; and those with chronic medical conditions, developmental delay, or chemical exposures to the hair prior to study entry. Hair samples from the posterior vertex (1 cm² area) were trimmed at 0.1 mm from the scalp and stored in Ziploc® bags at 4 °C.

Table 1 note. Mothers' hair (n = 8) had a significantly higher number of proteins (p = 0.001) and protein spectral matches (p = 0.0004) than children's hair (n = 32). Related children (n = 16) are grouped with their mothers, and unrelated children are listed below them (n = 16). Abbreviations: NH = Non-Hispanic, H = Hispanic, NA = Not Available.

Hair Protein Extraction
Proprietary methods were developed for extracting the soluble protein components of human scalp hair.

Proteomics Method
Protein pellets were resuspended in 50 mM ammonium bicarbonate containing 0.0015% ProteaseMAX (Promega, Madison, WI, USA), and the total protein amount was estimated with Pierce BCA assays (Thermo Fisher Scientific, San Jose, CA, USA) so that all samples could be loaded consistently. Proteins were digested overnight at 37 °C with 0.25 µg of Trypsin/LysC (Promega) at a 1:100 enzyme/substrate ratio. Proteolytic digestion was quenched with 1% formic acid; peptides were dried in a speed vac and then dissolved in 30 µL of reconstitution buffer (2% acetonitrile + 0.1% formic acid) to a concentration of 1 µg/µL, and 2 µL of this solution was injected into the MS instrument.
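As a worked check of the loading arithmetic above: at a 1:100 enzyme/substrate ratio, 0.25 µg of Trypsin/LysC corresponds to about 25 µg of protein, and a 2 µL injection at 1 µg/µL places about 2 µg of peptide on the column. The helper names below are illustrative only and are not part of the authors' workflow.

```python
def substrate_amount_ug(enzyme_ug: float, substrate_per_enzyme: float = 100.0) -> float:
    """Protein (substrate) mass implied by an enzyme mass at a 1:N enzyme/substrate ratio."""
    return enzyme_ug * substrate_per_enzyme


def injected_amount_ug(concentration_ug_per_ul: float, volume_ul: float) -> float:
    """Peptide mass loaded on the column for a given concentration and injection volume."""
    return concentration_ug_per_ul * volume_ul


if __name__ == "__main__":
    # 0.25 µg Trypsin/LysC at 1:100 -> protein amount digested
    print(substrate_amount_ug(0.25))
    # 2 µL of a 1 µg/µL peptide solution -> peptide amount injected
    print(injected_amount_ug(1.0, 2.0))
```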
Experiments were performed on an Orbitrap Fusion Tribrid mass spectrometer (Thermo Scientific) coupled with an ACQUITY M-Class ultra-performance liquid chromatography (UPLC) system (Waters Corporation, Milford, MA, USA). For a typical liquid chromatography/mass spectrometry (LC-MS) experiment, a flow rate of 450 nL/min was used, with mobile phase A consisting of 0.2% formic acid in water and mobile phase B of 0.2% formic acid in acetonitrile. Analytical columns were pulled from fused silica (I.D. 100 µm) and packed with Magic 1.8 µm, 120 Å UChrom C18 stationary phase (nanoLCMS Solutions) to a length of ~25 cm. Peptides were injected directly onto the analytical column and eluted with an 80 min gradient (2-45% B, followed by a high-B wash). The MS was operated in data-dependent mode, using collision-induced dissociation (CID) to generate MS/MS spectra, which were collected in the ion trap with the collision energy set at 35.
The *.raw data files were processed using Byonic v3.2.0 (ProteinMetrics, Cupertino, CA, USA) to infer protein isoforms from the Uniprot Homo sapiens database. Proteolysis with Trypsin/LysC was assumed to be semi-specific, allowing for N-ragged cleavage with up to 2 missed cleavage sites. Mass tolerances were set to 12 ppm for precursors and 0.4 Da for MS/MS fragments. Protein identifications were limited to a false discovery rate (FDR) of 1% or lower using standard target-decoy approaches [27], and only proteins with >3 spectral counts were selected for further data processing; keratins and KAPs were removed at this stage.
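The filtering steps above (precursor mass tolerance, target-decoy FDR, and the spectral-count and keratin filter) can be sketched as follows. This is not Byonic's implementation: it assumes peptide-spectrum matches are available as (score, is-decoy) pairs with higher scores being better, and the `KRT` gene-name prefix rule for removing keratins and KAPs is a hypothetical convenience.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical naming rule: keratins are KRT*, keratin-associated proteins are KRTAP*.
STRUCTURAL_PREFIXES = ("KRT",)


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_precursor_tolerance(observed_mz: float, theoretical_mz: float,
                               tol_ppm: float = 12.0) -> bool:
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fdr_score_threshold(psms: List[Tuple[float, bool]],
                        max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose estimated FDR (decoys / targets) stays <= max_fdr.

    Walking down the ranked list and keeping the last passing position is
    equivalent to accepting everything with a q-value <= max_fdr
    (ties in score are not treated specially in this sketch).
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def filter_proteins(spectral_counts: Dict[str, int],
                    min_exclusive: int = 3) -> Dict[str, int]:
    """Keep non-structural proteins with more than `min_exclusive` spectral counts."""
    return {
        gene: n
        for gene, n in spectral_counts.items()
        if n > min_exclusive and not gene.startswith(STRUCTURAL_PREFIXES)
    }
```

In practice the search engine reports FDR-filtered identifications directly; the sketch only makes the logic of the thresholds explicit.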

Generation of Age-Associated Proteomic Libraries
Initially, standard UPLC-MS/MS methods (Section 2.2) were employed to identify non-structural hair shaft proteins. Protein purification was used to remove keratins, and age-associated hair shaft proteomic libraries were established with pooled hair samples from 40 children of diverse races/ethnicities (Asian, White, mixed, or other races; Hispanic/non-Hispanic ethnicity) aged 1-5 years (mean ± SD = 44.5 ± 12.6 months) and 43 mothers, also of diverse races/ethnicities (aged 39 ± 5 years). Including many individuals of diverse races and ethnicities improved our ability to detect representative patterns of non-structural proteins incorporated in the hair shaft. We observed 1368 non-structural hair proteins in children and 1438 non-structural hair proteins in mothers, with 1288 proteins showing individual variability. The full set of age-associated proteins discovered in these libraries was also detected in analyses of 40 independent subjects who were not used to generate the libraries. Individual hair samples from 8 mothers with 16 biologically related children and 16 unrelated children were analyzed against the pooled hair protein libraries to create a master library of hair proteins. These data were deposited in the ProteomeXchange Consortium [29,30] via the PRIDE repository [28].

Statistical Analysis
Spectral counts were used to calculate Euclidean distances between individuals and to perform hierarchical clustering. A correlation matrix of Spearman's coefficients was also used to depict rank-based similarities between the individual hair proteomes.
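A minimal sketch of the two similarity measures, assuming each individual is represented by a vector of spectral counts over the same ordered protein list. Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles tied counts; the individual IDs in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError for a constant profile


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def distance_matrix(profiles: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Euclidean distances between individuals' spectral-count vectors."""
    return {a: {b: euclidean(pa, pb) for b, pb in profiles.items()}
            for a, pa in profiles.items()}
```

For example, `distance_matrix({"M1": [0, 0], "C1": [3, 4]})["M1"]["C1"]` is 5.0. The full matrix could then be converted to condensed form (e.g., with SciPy's `scipy.spatial.distance.squareform`) and passed to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage`; the linkage method used in the study is not stated.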
Principal component analysis (PCA) [32][33][34] was used to reduce the dimensionality of this rich dataset. PCA is a widely used technique that models the data as linear combinations of the original variables, which are called principal components [34]. The first principal component captures the largest proportion of the data variance, the second principal component captures the second largest proportion, and so on [32]. For the first five principal components from each PCA, we multiplied the loading score of each protein by the percent variance explained by the corresponding principal component; these weighted scores were summed for each protein to give its total loading score (TLS). On the basis of their TLS values, the top 5% of proteins were selected as the main drivers of variability in hair protein expression.
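The TLS calculation described above can be sketched as follows, assuming the per-protein loadings and the percent variance explained by each component have already been obtained from any PCA implementation. Whether signed or absolute loadings were summed is not stated; this sketch uses absolute loadings by default (an assumption) so that large negative loadings also count as influential. Rounding the top 5% up to at least one protein is likewise an assumption.

```python
import math
from typing import Dict, List, Sequence


def total_loading_scores(
    loadings: Dict[str, Sequence[float]],
    pct_variance: Sequence[float],
    n_components: int = 5,
    use_abs: bool = True,  # assumption: |loading| rather than signed loading
) -> Dict[str, float]:
    """TLS = sum over the first n PCs of (loading x percent variance explained)."""
    weights = list(pct_variance[:n_components])
    tls = {}
    for protein, vec in loadings.items():
        terms = zip(vec[:n_components], weights)
        tls[protein] = sum((abs(loading) if use_abs else loading) * w for loading, w in terms)
    return tls


def top_fraction(tls: Dict[str, float], fraction: float = 0.05) -> List[str]:
    """Proteins in the top `fraction` by TLS (rounded up, at least one)."""
    k = max(1, math.ceil(len(tls) * fraction))
    return sorted(tls, key=tls.get, reverse=True)[:k]
```

With loadings {"A": [0.9, 0.1], "B": [-0.8, 0.5], "C": [0.1, 0.9]} and 60% and 20% variance on two components, the TLS values are 56, 58, and 24, so "B" ranks first even though its PC1 loading is negative.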
Additionally, we used t-distributed stochastic neighbor embedding (tSNE), a nonlinear probabilistic approach [35,36], to visualize data points that are similar in high-dimensional space as neighbors in a low-dimensional map. Unlike PCA, which yields reproducible results, tSNE is probabilistic and can give somewhat different results for each computation. To avoid chance findings, we ran each computation at least 10 times to ensure reproducibility. For each computation, the maximum number of iterations to converge was set to 1000, and perplexity was set to the maximum permitted value. The statistical significance of the tSNE clustering was assessed by how often a given statistic was reproduced in 1000 simulations of permuted versions of the dataset.
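One way to implement the permutation assessment described above is a label-permutation test on already-computed 2-D tSNE coordinates. The text does not name the statistic that was reproduced, so the separation statistic here (mean between-group distance minus mean within-group distance) is an assumption, as is the add-one p-value; the embedding itself is not computed in this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def separation(coords: Sequence[Point], labels: Sequence[str]) -> float:
    """Mean between-group distance minus mean within-group distance.

    Assumes two groups, each with at least two members.
    """
    within: List[float] = []
    between: List[float] = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p_value(coords: Sequence[Point], labels: Sequence[str],
                        n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Share of label permutations whose separation is at least the observed one."""
    rng = random.Random(seed)
    observed = separation(coords, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved; only assignments change
        if separation(coords, shuffled) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed labeling itself
    return observed, (hits + 1) / (n_perm + 1)
```

Because tSNE runs differ, such a test would be repeated on each of the embeddings rather than on a single run.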
Boolean (presence/absence) profiles of the hair proteins were also compared between the original dataset (each mother paired with her own children) and 5000 simulated datasets, which were created by swapping mothers between families so that no mother was paired with her own children, while the two siblings remained together in all simulated datasets. Shorter intra-family Manhattan distances in the original dataset than in the simulated datasets could then be attributed to similarities in hair protein expression between each mother and her children.
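The mother-swapping simulation can be sketched as a derangement test. Each Boolean profile is represented here as the set of detected proteins, so the Manhattan distance between two 0/1 vectors is the size of the symmetric difference. The summary statistic (mean mother-child distance) and the add-one p-value are assumptions; sibling-sibling distances are left out because swapping mothers does not change them.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

Profile = FrozenSet[str]  # proteins detected in one individual


def manhattan(a: Profile, b: Profile) -> int:
    """For 0/1 vectors, Manhattan distance = proteins present in exactly one profile."""
    return len(a ^ b)


def mean_mother_child_distance(mothers: Sequence[Profile],
                               children: Sequence[Sequence[Profile]]) -> float:
    dists = [manhattan(m, c) for m, kids in zip(mothers, children) for c in kids]
    return sum(dists) / len(dists)


def random_derangement(n: int, rng: random.Random) -> List[int]:
    """Permutation with no fixed points, so no mother stays with her own children."""
    if n < 2:
        raise ValueError("need at least two families to swap mothers")
    while True:
        perm = list(range(n))
        rng.shuffle(perm)
        if all(i != p for i, p in enumerate(perm)):
            return perm


def mother_swap_test(mothers: Sequence[Profile], children: Sequence[Sequence[Profile]],
                     n_sim: int = 5000, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    observed = mean_mother_child_distance(mothers, children)
    hits = 0
    for _ in range(n_sim):
        perm = random_derangement(len(mothers), rng)
        swapped = [mothers[p] for p in perm]  # siblings stay together; only mothers move
        if mean_mother_child_distance(swapped, children) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)
```

With 5000 simulations and no simulated family as close as the real ones, this estimator returns 1/5001, just under 0.0002.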
For the top 5% of proteins in children (n = 32), we averaged spectral counts for girls and boys separately and divided the girls' average by the boys' average. The resulting ratios were log2-transformed. The same process was followed for spectral counts from mothers and children.
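The ratio described above can be written as a short sketch: for one protein, the log2 fold change is log2(mean of group A / mean of group B), so positive values indicate a higher mean in the numerator group. How proteins with a zero group mean were handled is not stated; this sketch raises an error unless a pseudocount (an assumption) is supplied.

```python
import math
from statistics import mean
from typing import Dict, Sequence, Tuple


def log2_fold_change(numerator: Sequence[float], denominator: Sequence[float],
                     pseudocount: float = 0.0) -> float:
    """log2(mean(numerator) / mean(denominator)) for one protein's spectral counts."""
    a = mean(numerator) + pseudocount
    b = mean(denominator) + pseudocount
    if a <= 0 or b <= 0:
        raise ValueError("ratio undefined when a group mean is zero; consider a pseudocount")
    return math.log2(a / b)


def fold_changes(counts: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                 pseudocount: float = 0.0) -> Dict[str, float]:
    """Apply log2_fold_change to {protein: (numerator_counts, denominator_counts)}."""
    return {p: log2_fold_change(a, b, pseudocount) for p, (a, b) in counts.items()}
```

For example, mean counts of 8 in girls and 2 in boys give log2(4) = 2.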
The log fold-change values of the top 5% of proteins were used as input for ingenuity pathway analysis (Qiagen; https://digitalinsights.qiagen.com/products/features/, accessed on 24 June 2019). We analyzed direct and indirect relationships between molecules on the basis of experimentally observed data, restricted to human databases in the Ingenuity Knowledge Base. We used random forest (RF) models for both classification (boys vs. girls, mothers vs. children) and regression (age prediction) tasks, with protein concentrations as model features and individuals as samples [37]. For classification, the model output was the probability of an individual being female (sex classification) or being a mother (person classification). For regression, the model output was the individual's predicted age.
Results were based on 10-fold cross-validation repeated 100 times. Members of the same family were always placed in the same set (training or test) to avoid information leaks due to familial similarities. For age prediction, we evaluated results using the R² coefficient of determination and the p-value of a linear model fitted to the predicted and observed data. For the classification tasks, we used the area under the ROC curve (AUC) and the Wilcoxon-Mann-Whitney test of the null hypothesis that one distribution is not stochastically greater than the other.
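The family-aware cross-validation and the AUC can be sketched as below. The random forest itself is not shown (a library model such as scikit-learn's `RandomForestClassifier` could be fitted on each training split, and scikit-learn's `GroupKFold` offers a ready-made grouped splitter). Treating each unrelated child as its own group is an assumption. The AUC is computed through its equivalence to the Mann-Whitney U statistic: AUC = U / (n_pos × n_neg).

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def grouped_kfold(groups: Sequence[str], k: int, rng: random.Random) -> Iterator[Split]:
    """One round of k-fold CV in which all members of a family share a fold."""
    unique = sorted(set(groups))
    rng.shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        if test:  # skip empty folds when there are fewer families than folds
            yield train, test


def repeated_grouped_kfold(groups: Sequence[str], k: int = 10, repeats: int = 100,
                           seed: int = 0) -> Iterator[Split]:
    rng = random.Random(seed)
    for _ in range(repeats):
        yield from grouped_kfold(groups, k, rng)


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Usage: with family labels such as `["F107", "F107", "F107", "U1", ...]`, every split from `repeated_grouped_kfold` keeps each family entirely in either the training or the test indices.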

Features of Hair Proteins
There were 3124 proteoforms representing the gene products of 2278 genes. The expression of protein isoforms, alternative splicing of messenger RNA (mRNA), and post-translational modifications resulted in a higher number of hair proteins than associated genes [21,38]. Hair proteins observed in individual mothers and children included 2269 unique 'proteoforms' or protein isoforms; 1438 proteins were commonly observed in mothers and 1368 in children, and 1288 hair proteins showed individual variability among mothers and children. Higher spectral counts (p = 0.0004) and higher numbers of proteins (p = 0.001) were observed in mothers than in children (Figure 1), perhaps reflecting a wider array of biological functions in adult females related to reproduction [39][40][41], aging [37,42], or disease states [43]. These age differences were explored further in subsequent analyses.

Figure 1. (B) The numbers of proteins observed (with spectral counts >3) were consistently higher (p = 0.001, Wilcoxon tests) in mothers (M; cyan) than in children (C; pink). Mothers and their biological children (family labels: F107, F123, F134, F142, F183, F218, F271, and F288) and unrelated children (U) are identified on the x-axis; all mothers except F134 and F218 had higher spectral counts and more hair proteins than their children.

Hair Protein Profiles in Individuals and Families
Peptide spectral matches for each protein were combined to compare protein expression across all individuals and to assess Spearman rank correlations. Hair proteins from the mothers were closely correlated with each other, whereas hair proteins in children showed correlations based on age and sex (Figure 2A). Euclidean distances were calculated for pairwise comparisons between individuals (Figure 2B) and used for hierarchical clustering to identify subjects with similar hair protein patterns (Figure 2C). Consistent with the correlation matrix, all mothers clustered close together; younger children (0-2 years) were mostly located in one cluster, whereas older children clustered with the mothers (Figure 2C). The Boolean profiles of the hair proteins for each mother and her two biological children showed significantly shorter intra-family Manhattan distances (p < 0.0002) than the 5000 'simulated' families with mismatched mothers and children (Figure 2D), revealing conservation of hair protein profiles within each family, whether hereditary or environmental in origin.

Age-and Sex-Related Differences in Hair Proteins
Both PCA [32][33][34] and tSNE [35,36] were used to reduce the data dimensionality and identify the major contributors to hair protein variability. Principal components 1-5 accounted for 61.6% of hair protein variability for all subjects, 57.5% for all children, 84.0% for all mothers, 60.8% for mothers and related children, and 62.3% for mothers and unrelated children. Age differences were observed by plotting the first two principal components (PC1 and PC2) and the tSNE dimensions (Figure 3). We observed two separate clusters for the younger children and the mothers, with the older children dispersed across these groups (Figure 3A). Similar clusters were observed for the remaining principal components. The tSNE projections also showed that mothers were located separately from the children (Figure 3B). Among the proteins driving these differences, spectral counts were higher in mothers than in children for SERPINB4 (serine protease inhibitor), POF1B (actin filament binder), PLEC (cytoskeleton binding protein), A2ML1 (α2-macroglobulin-like proteinase inhibitor), HIST1H3A (histone), UQCRQ (electron transfer from ubiquinol to cytochrome c), and AHCY (adenosylhomocysteine hydrolase). By contrast, mammaglobin-B (SCGB2A1), a heterodimerization protein that binds androgen and other steroids, was observed only in children (Table 2). Older children had higher spectral counts for PLEC (plectin), EIF3A (eukaryotic translation initiation factor 3), AHCY (adenosylhomocysteinase), HAL (histidine ammonia-lyase), and TUBA1C (tubulin alpha 1c), whereas younger children had higher spectral counts for SCGB2A1 (secretoglobin 2A member 1) and CSN2 (casein beta) (Table 3).
Girls showed slightly higher spectral counts than boys (p = 0.038), but the number of proteins did not differ by sex (Table 1). PCA and tSNE projections showed overlapping clusters of boys and girls (Figure 3C,D). When comparing individual proteins, higher spectral counts were observed for CSN2 (casein beta, p = 0.0184) in boys and ALMS1 (Alström syndrome protein 1, p = 0.0214) in girls (Table 3). To further characterize the effects of early childhood and adulthood on hair proteins, random forest regressions [44] were used to predict the participants' age from their hair protein profiles. This model predicted age differences between mothers and children (R² = 0.37, Figure 4A), and its performance improved (R² = 0.45) when mothers were removed and only children were included (Figure 4B). Random forest classifiers showed acceptable mean accuracy for classifying mothers and children based on their predicted vs. observed age (mean area under the ROC curve = 0.93, Figure 4C; Wilcoxon test p = 0.00011, Figure 4).

Note. Girls showed higher spectral counts than boys for several proteins (negative Expr Log Ratio), whereas boys had higher spectral counts for other proteins (positive Expr Log Ratio). CSN2 was significantly higher in boys, whereas ALMS1 was significantly higher in girls. Significance was based on Kruskal-Wallis ANOVA with post hoc Benjamini-Hochberg corrections for multiple comparisons (* p ≤ 0.05).
A random forest classifier trained to predict sex from hair protein profiles in children could not reliably differentiate boys from girls (mean area under the ROC curve = 0.6, Figure 4E; Wilcoxon test p = 0.1703, Figure 4F), but predictions improved when all participants, including mothers and children, were classified (area under the ROC curve = 0.73, Figure 4G; Wilcoxon test p = 0.0083, Figure 4H). The latter result probably reflects the age-based distinction between mothers and children, although sample size-related effects cannot be ruled out (25 vs. 17 females).
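The Benjamini-Hochberg correction named in the table note above can be sketched as follows; the Kruskal-Wallis tests that produce the raw p-values are not shown. Each adjusted value is p × m / rank, made monotone by taking a running minimum from the largest p-value downward and capped at 1.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR control), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For raw p-values [0.01, 0.04, 0.03, 0.005], the adjusted values are approximately [0.02, 0.04, 0.04, 0.02].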

Top Contributors to Hair Protein Variability
The top 5% of proteins identified as the most prominent contributors according to their total loading scores (TLS) explained 64.3% of hair protein variability in all individuals, 89.5% in all mothers, 57.5% in all children, 49.3% in mothers and related children, and 64.6% in mothers and unrelated children (Figure 5). A higher TLS indicates a greater influence of that protein on total variability. Keratins and KAPs are structural components of hair, but they are usually treated as contaminants in proteomics experiments because of their high abundance in common laboratory analyses. We therefore performed PCA for all individuals both including and excluding the keratins and KAPs (Figure 5A,B). Structural proteins contributed to hair protein variability, but they have limited biological significance. Separate PCAs performed to characterize the hair proteins observed in mothers (Figure 5C), children (Figure 5D), mothers and related children (Figure 5E), and mothers and unrelated children (Figure 5F) showed the same proteins as those ranked in all individuals and all children. Other than histones, no proteins were common between mothers and children. TUBA1C, PLEC, SERPINB4, and UQCRQ were observed in multiple subgroups.

Biological Role(s) of the Strongest Contributors to Hair Protein Variability
Using experimentally observed human data in the Ingenuity Knowledge Base, the log fold-change values of the top 5% of proteins from our dataset were used to analyze direct and indirect relationships between protein molecules. Protein networks were examined for the top 5% of hair proteins contributing to age-related differences between mothers and children (Figure 6) and to sex-related differences between girls and boys (Figure 7). Using these molecular relationships as input for ingenuity pathway analysis, we identified protein classes involved in cellular metabolism, such as the protein ubiquitination pathway, the sirtuin signaling pathway, 14-3-3-mediated signaling, the Wnt/Ca2+ pathway, histidine degradation, mitochondrial function, and oxidative phosphorylation (Figure 8). Other proteins were associated with immune responses (including phagosome maturation, IL-8 signaling, and the regulation of macrophages, fibroblasts, and endothelial cells) or were involved in the regulation of stress-related pathways, including corticotropin-releasing hormone signaling, glucocorticoid receptor signaling, and prolactin and aldosterone signaling. Finally, hair proteins associated with brain development, including axonal guidance and gap junction signaling, were also identified (Figure 8).


Figure 6. Protein network for the top 5% of hair proteins contributing to age-related differences between mothers and children. Note. Some hair proteins had higher spectral counts in children (orange), and others had higher spectral counts in mothers (blue); continuous lines show direct relationships, and interrupted lines denote indirect relationships. Mothers showed higher spectral counts mostly for 'enzymes' and 'peptidases' involved in cellular and metabolic processes, while proteins with higher spectral counts in children belonged to the 'other' group involved in growth and biological maturation.

Figure 7. Protein network for the top 5% of hair proteins contributing to sex differences between boys and girls. Note. Some proteins had higher spectral counts in girls (orange), and others had higher spectral counts in boys (blue); continuous lines show direct relationships, and interrupted lines denote indirect relationships. Girls showed higher protein spectral counts mostly for 'enzymes' or 'transporters' associated with cellular localization and metabolic processes. Proteins with higher spectral counts in boys are 'enzymes', like 'kinases' or 'peptidases', associated with biological regulation of cellular and metabolic processes.

ELISA Validation of Other Non-Structural Hair Proteins
Selected proteins of interest detected via standard UPLC-MS/MS methods were validated and quantified using commercially available ELISA kits. The first portions of the surplus volumes of individual protein extracts remaining after UPLC-MS/MS and hair cortisol concentration (HCC) measurements were pooled by low, intermediate, or high HCC values. Hair sample pools were used to quantify cortisol and arginine vasopressin (AVP), which potentiate the hypothalamic release of corticotropin-releasing hormone [45,46]; Cu/Zn superoxide dismutase (SOD1), an important cellular defense against reactive oxygen species [47,48]; HtrA serine peptidase 2 (HTRA2), a mitochondrial protease chaperone that regulates cellular proteostasis and cell-signaling events [49]; and glial fibrillary acidic protein (GFAP), a protein responsible for the cytoskeletal structure of glial cells [50,51] (Table 4).
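The text does not state how the low, intermediate, and high HCC cutoffs were chosen. The sketch below assumes an equal-thirds (tertile) split of samples ranked by HCC; the sample IDs and HCC values in the usage note are hypothetical.

```python
from typing import Dict, List


def pool_by_hcc(hcc: Dict[str, float]) -> Dict[str, List[str]]:
    """Split sample IDs into low / intermediate / high thirds by hair cortisol concentration.

    Assumption: tertiles of the ranked samples; any remainder goes to the high pool.
    """
    ordered = sorted(hcc, key=hcc.get)
    n = len(ordered)
    cut1, cut2 = n // 3, 2 * n // 3
    return {
        "low": ordered[:cut1],
        "intermediate": ordered[cut1:cut2],
        "high": ordered[cut2:],
    }
```

For six samples with HCC values {"s1": 5.0, "s2": 1.0, "s3": 9.0, "s4": 3.0, "s5": 7.0, "s6": 2.0}, this yields low = [s2, s6], intermediate = [s4, s1], and high = [s5, s3].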

Discussion
The chemical composition of hair [17][52][53][54][55][56] and its structural proteins (keratins, KAPs) have been well studied [22][23][24][25], but few data exist on non-structural hair proteins. This study represents the first description of non-structural hair proteins in mothers and young children. We found 2269 non-structural hair proteins, with important differences between mothers and children, age- and sex-related differences among preschool children, and conserved hair protein profiles within families. Hair proteins driving variability in different populations were found to play vital roles in functions beyond those of trichocytes in the hair follicle, including cellular metabolic pathways, brain development, immune signaling, and stress regulation.
We observed age-related hair protein profiles in children and mothers, with distinct patterns emerging across multiple analyses. Differences between mothers and children were largely driven by higher maternal spectral counts for SERPINB4, PLEC, and UQCRQ. SERPINB4 is a granzyme inhibitor linked to squamous cell carcinoma and chronic liver disease [57–59]; plectin (PLEC) mutations are linked to epidermolysis bullosa simplex, and PLEC may be a susceptibility gene for testicular germ cell tumors [60–62]; and UQCRQ is a nuclear-encoded subunit of mitochondrial respiratory chain complex III that is essential for brain development [63]. Mammaglobin-B (SCGB2A1), which is linked to familial febrile seizures in preschool children [64,65] and to chemoresistant cancers in adults [66], was observed only in children's hair.
We found minimal sex differences in early childhood, as confirmed by random forest predictive models. Pathways of cellular metabolism and innate immunity appeared more prominent in girls, whereas brain development and stress regulation pathways were more prominent in boys. Sex differences in hair proteins may become more pronounced after the onset of puberty [67]. Although hair protein profiles were conserved between mothers and their biological children, future studies of mother-child dyads and of monozygotic vs. dizygotic twins will be required to explore the gene × environment interactions that shape hair protein profiles [68].
Using ingenuity pathway analysis, we identified hair proteins associated with axonal guidance [69] and gap junction signaling [70], both important mechanisms in brain development. By cross-referencing the UniProt database (https://www.uniprot.org/ (accessed on 24 June 2019)) with the Allen Brain Atlas (https://human.brain-map.org/static/brainexplorer (accessed on 26 June 2019)) and the Human Brain Protein Atlas (https://www.proteinatlas.org/search/brain_category (accessed on 28 June 2019)), we identified 191 hair proteins that are regionally enriched in the brain. Further studies will examine whether hair proteomics can complement neuroimaging and neurophysiological studies of early brain development [12]. A study from Nepal reported specific plasma proteins associated with higher non-verbal intelligence and proinflammatory proteins associated with lower intelligence in children [71]. That study, however, used a false discovery rate (FDR) of 5%, whereas the FDR threshold for our analyses was set at 1% or lower. Future developmental studies with large sample sizes could correlate hair proteins with cognitive or behavioral outcomes, thereby investigating their role in brain development [72]. Unbiased or targeted protein profiles from serial hair samples (or sequential segments of the same hair sample) could then serve as probes for child development [73,74] or life-course studies [43,75,76].
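The cross-referencing step amounts to set operations on protein identifiers. The sketch below assumes that every list has already been mapped to a common identifier (for example, gene symbols), and it leaves open whether a protein had to appear in one or in all brain resources, which the source does not specify. The function name and the toy identifiers are hypothetical.

```python
from collections.abc import Iterable


def brain_enriched_hair_proteins(
    hair_proteins: Iterable[str],
    brain_lists: list[Iterable[str]],
    require_all: bool = False,
) -> list[str]:
    """Return hair proteins found in the brain-enrichment reference lists.

    require_all=False keeps proteins listed in ANY reference (union);
    require_all=True keeps only proteins listed in EVERY reference (intersection).
    Both behaviors are assumptions; the source does not say which was used.
    """
    hair = set(hair_proteins)
    references = [set(lst) for lst in brain_lists]
    if not references:
        return []
    if require_all:
        reference = set.intersection(*references)
    else:
        reference = set.union(*references)
    return sorted(hair & reference)


if __name__ == "__main__":
    # Hypothetical identifiers standing in for harmonized gene symbols
    hair = ["PROT_A", "PROT_B", "PROT_C"]
    atlas_one = ["PROT_A", "PROT_X"]
    atlas_two = ["PROT_A", "PROT_B"]
    print(brain_enriched_hair_proteins(hair, [atlas_one, atlas_two]))
    print(brain_enriched_hair_proteins(hair, [atlas_one, atlas_two], require_all=True))
```

The union is the more permissive choice and the intersection the more conservative one; the count of brain-enriched proteins depends directly on which rule is applied.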
These findings must be interpreted in light of three limitations. First, our sample of 32 children was too small to examine developmental differences at each age in the preschool period. We also selected healthy children from homogeneous socioeconomic environments who had not experienced adverse conditions; therefore, our data do not represent the full range of hair protein profiles present in the general population. Nevertheless, our sample was larger than those of most other hair proteomics studies in adults, and this is the first such study to include mothers and children. Our study design also allowed us to investigate differences in hair protein profiles between related and unrelated individuals, as well as between adults and children.
Second, our proteomics platform relied on peptide spectral matches, which provide only semi-quantitative data on the abundance of hair proteins in individuals. Because this is the first study of non-structural hair proteins in humans, we chose a 'shotgun' proteomics approach rather than targeted, more quantitative approaches. We did, however, orthogonally confirm the presence of specific hair proteins using well-validated ELISA assays. Having established the first hair protein libraries in mothers and children, future studies can be designed to quantify specific protein targets or protein groups. Third, we did not correlate hair proteins with children's developmental milestones or with their cognitive and behavioral data, because the limited sample size at each age would preclude generalizable conclusions from such analyses.
Despite these limitations, our initial findings reveal the potential importance of non-structural hair proteins as biomarkers of brain development and other cellular regulatory pathways, and they suggest that hair can provide a rich source of chronologically ordered information for life-course studies and early childhood development.

Conclusions
This research suggests that exposures to family adversity, chronic stress, parenting and caregiving practices, and early attachment could be monitored by serial hair sampling to assess children's health status, brain development, and physical and mental health. We found that hair protein profiles are related to age, sex, and family relationships. The top 5% of contributors to variability in hair protein patterns were associated with the regulation of (a) immune pathways (phagosome maturation, IL-8 signaling, PKR interferon induction, and regulation of fibroblasts, macrophages, and endothelial cells); (b) stress signaling pathways (corticotropin-releasing hormone, glucocorticoid receptor, prolactin, and aldosterone signaling); (c) brain development (axonal guidance and gap junction signaling); and (d) cellular metabolic pathways (oxidative phosphorylation, mitochondrial dysfunction, histidine degradation, and caveolar-mediated endocytosis, as well as the heat shock protein, 14-3-3 protein, sirtuin, and Wnt/Ca²⁺ signaling pathways). When combined with well-established methods for tracking changes in hair hormones, this approach may provide mechanistic explanations for the developmental sequences leading to HPA axis (dys)regulation in early life. Assessing parent-child synchrony, children's circadian rhythms, and positive and negative attachments need not depend on subjective questionnaires, invasive blood sampling, or neuroimaging. We propose that non-invasive hair sampling and tandem mass spectrometry can be used to compare the non-structural hair protein profiles of healthy children with those of subpopulations of children with confirmed exposures to toxic stress and/or adverse living conditions. Future studies will quantify and characterize panels of related hair proteins to probe changes in the immune system, stress regulation, brain development, and cellular metabolism, in order to monitor environmental influences on the health status and development of children.

Informed Consent Statement: Written informed consent was obtained from all subjects enrolled in the study.

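$$\text{Weighted Score}_{jk} = \text{Loading Score}_{jk} \times \text{Proportion of Variance}_{k}$$
$$\text{Total Loading Score (TLS)}_{j} = \sum_{k} \text{Weighted Score}_{jk}$$
where j indexes proteins and k indexes principal components.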

Figure 1. Hair proteins in mothers and children. Note. (A) Protein spectral counts (p = 0.0004) and (B) the numbers of proteins observed (with spectral counts >3) were consistently higher (p = 0.001, Wilcoxon tests) in mothers (M; cyan) than in children (C; pink). Mothers and their biological children (family labels: F107, F123, F134, F142, F183, F218, F271, and F288) and unrelated children (U) are identified on the X-axis; all mothers except F134 and F218 had higher spectral counts and more hair proteins than their children.
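A two-sided Wilcoxon rank-sum (Mann-Whitney U) comparison of per-person totals between mothers and children can be sketched in plain Python. This sketch assumes the unpaired rank-sum form, since mothers and children form independent groups of unequal size, and uses the large-sample normal approximation with a tie correction but no continuity correction. The source does not state which variant or software was used, so p-values from this sketch need not match those in the figure; the function names are hypothetical.

```python
import math


def rank_with_ties(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U for group x, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: subtract sum(t^3 - t) / (n(n - 1)) from (n + 1)
    counts: dict[float, int] = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values tied
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
```

For example, `wilcoxon_rank_sum(mother_totals, child_totals)` would compare per-person spectral count totals, where both argument names are hypothetical lists. With only 8 mothers, an exact permutation p-value would be a reasonable alternative to the normal approximation.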

Figure 2. Similarities in hair protein profiles of individuals and families. Note. (A) Spearman rank correlation matrix with high (purple) to low (orange) correlation coefficients; (B) Euclidean distances based on protein spectral counts, showing individuals who are closer to (red) or more distant from (grey) each other; (C) hierarchical cluster dendrogram based on log spectral counts, showing 7/8 mothers grouped into one cluster with older children (4–6 years) (mustard), mostly from White or Asian families, with the highest numbers of proteins and peptide spectral matches (PSMs); one mother in an adjacent cluster with children aged 2–4 years (pink), mostly from Hispanic families, with fewer
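The similarity measures in panels (A) and (B), and the log transform used for clustering in panel (C), can be sketched as below. Each individual is represented by a vector of spectral counts over the same ordered list of proteins. The log(1 + count) transform is an assumption made to keep zero counts finite, since the source says only "log spectral counts". The helper names are hypothetical, and the hierarchical clustering itself would be left to a standard library routine that this sketch does not reproduce.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a: list[float], b: list[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / (sa * sb)


def spearman(a: list[float], b: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed profiles."""
    return pearson(average_ranks(a), average_ranks(b))


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def log_profile(counts: list[float]) -> list[float]:
    """log(1 + count); assumed transform before clustering."""
    return [math.log1p(c) for c in counts]


def pairwise(profiles: dict[str, list[float]], metric) -> dict[tuple[str, str], float]:
    """metric(profile_i, profile_j) for every ordered pair of individuals."""
    names = list(profiles)
    return {(p, q): metric(profiles[p], profiles[q]) for p in names for q in names}
```

Calling `pairwise(profiles, spearman)` and `pairwise(profiles, euclidean)` on the same profiles yields the two matrices that panels (A) and (B) visualize, in whatever color scale is chosen.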

Psych 2024, 6

Figure 3. Age- and sex-related differences in hair proteins. Note. (A) The first two principal components, showing spatial separation by age, with children over 2 years old (pink) located between the children up to 2 years of age (blue, upper right) and the mothers (green, lower left). (B) The first two t-SNE dimensions by age, showing mothers (green) in the upper left quadrant, separate from the younger (0–2 years, blue) and older (3–5 years, pink) children. Higher spectral counts for 7/17 hair proteins occurred in mothers (SERPINB4, POF1B, PLEC, A2ML1, HIST1H3A, UQCRQ, and AHCY) and for one protein (SCGB2A1) in children (Kruskal-Wallis ANOVA with post hoc Benjamini-Hochberg corrections). (C) PCA of all children, showing overlapping circles for girls (blue) and boys (pink). (D) t-SNE dimensions by sex, showing overlap between boys (pink) and girls (blue). Higher spectral counts were observed for CSN2 (casein beta) in boys (p = 0.0184) and ALMS1 (Alström syndrome protein 1) in girls (p = 0.0214) (see Table 3).
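The Kruskal-Wallis comparison named in this caption can be sketched without external libraries. The sketch computes the H statistic with the usual tie correction, which matters because spectral counts are small integers with many ties, and a p-value from the large-sample chi-square approximation with k − 1 degrees of freedom. For mothers, younger children, and older children, k = 3. The function names are hypothetical, and this is not the authors' software.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{i < m} y^i / i!
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and add recurrence terms
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df + 1) // 2):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def kruskal_wallis(*groups: list[float]) -> tuple[float, float]:
    """Kruskal-Wallis H test with tie correction; returns (H, p)."""
    data = [v for g in groups for v in g]
    n = len(data)
    ranks = average_ranks(data)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_term += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n)
    counts: dict[float, int] = {}
    for v in data:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if correction == 0:
        return 0.0, 1.0  # every value tied: no evidence of a group difference
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

In a per-protein screen, this test would be run once per protein, and the resulting p-values would then be adjusted for multiple comparisons.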


Figure 4. Machine learning algorithms predict age and sex from hair proteins. Note. Mean scatterplots from 100 runs of random forest regression showing observed vs. predicted age (A) for mothers and children (R² = 0.37, p = 0.00005) and (B) for children only (R² = 0.45, p = 0.00004). (C,D) Random forest plot showing mean accuracy for classifying mothers and children based on hair proteins
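The "mean of 100 runs" summary can be illustrated by averaging each individual's predicted age across runs and then scoring the averaged predictions. The sketch below does not train a random forest. It assumes the per-run predictions are already available, for example as out-of-bag or held-out estimates, which the source does not specify. Because the source also does not say which R² definition was used, two common ones are shown: the squared Pearson correlation between observed and predicted values, and the coefficient of determination 1 − SS_res/SS_tot. Function names are hypothetical.

```python
from statistics import fmean


def mean_predictions(runs: list[list[float]]) -> list[float]:
    """Average each individual's prediction across runs (rows = runs, columns = individuals)."""
    return [fmean(per_individual) for per_individual in zip(*runs)]


def squared_correlation(observed: list[float], predicted: list[float]) -> float:
    """R^2 as the squared Pearson correlation between observed and predicted values."""
    mo, mp = fmean(observed), fmean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def coefficient_of_determination(observed: list[float], predicted: list[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot; unlike squared correlation, this penalizes biased predictions."""
    mo = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

The two definitions can disagree sharply. Predictions that are exactly twice the observed ages correlate perfectly, giving a squared correlation of 1, but they yield a negative 1 − SS_res/SS_tot.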


Figure 5. Top 5% of proteins contributing to hair protein variability. Note. The loading scores for each protein were weighted by the percent variance explained by the corresponding principal component and then summed to give the total loading score (TLS) for each protein. The top 5% of proteins based on their TLSs were identified as the most prominent contributors in each group. (A) All individuals (n = 40, 49% of hair protein variability); (B) all individuals, including keratins and KAPs (n = 40, 64.3% variability); (C) all mothers (n = 8, 89.5% variability); (D) all children (n = 32, 57.5% variability); (E) mothers (n = 8) and their biological children (n = 16) (49.3% variability); and (F) mothers (n = 8) and unrelated children (n = 16) (64.6% variability).
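The weighting and ranking described in this caption can be sketched as follows, assuming each protein's loadings are listed in the same component order as the variance proportions. The source does not state whether signed or absolute loadings were summed, or how the top-5% count was rounded, so both are exposed as labeled assumptions. The function names are hypothetical.

```python
import math


def total_loading_scores(
    loadings: dict[str, list[float]],
    variance_explained: list[float],
    use_abs: bool = False,
) -> dict[str, float]:
    """TLS per protein: sum over components of (loading score x proportion of variance).

    use_abs is an assumption toggle: the source does not say whether signed or
    absolute loadings were summed (signed loadings can cancel across components).
    """
    tls = {}
    for protein, scores in loadings.items():
        weighted = [
            (abs(s) if use_abs else s) * v for s, v in zip(scores, variance_explained)
        ]
        tls[protein] = sum(weighted)
    return tls


def top_fraction(tls: dict[str, float], fraction: float = 0.05) -> list[str]:
    """Proteins with the highest TLS; rounds the count up and keeps at least one (assumed)."""
    k = max(1, math.ceil(len(tls) * fraction))
    return sorted(tls, key=tls.get, reverse=True)[:k]
```

The loadings and variance proportions would come from a PCA of the spectral count matrix for each group in panels (A) to (F), so the top 5% is recomputed per group.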

Figure 7. Protein network for the top 5% of hair proteins contributing to sex differences between boys and girls. Note. Some proteins had higher spectral counts in girls (orange), and others had higher spectral counts in boys (blue); continuous lines show direct relationships, and interrupted lines denote indirect relationships. Proteins with higher spectral counts in girls were mostly 'enzymes' or 'transporters' associated with cellular localization and metabolic processes, whereas those with higher spectral counts in boys were 'enzymes', such as 'kinases' or 'peptidases', associated with the biological regulation of cellular and metabolic processes.


Figure 8. Canonical pathways. Note. Canonical pathways associated with biologically significant proteins from the top 5% of variables in all individuals (n = 40) contributing to age- and sex-related differences were identified using ingenuity pathway analysis. Most of these proteins are involved in cellular metabolism, immune responses, brain development, and stress regulatory pathways.

Pursuant to the Patent Cooperation Treaty, an international patent application was filed on 10 November 2022, identifiable in the United States Patent and Trademark Office by Application No. US2022/079619.

Author Contributions: K.J.S.A. designed the study, obtained grant funding, and supervised all aspects of the research; C.R.R. processed hair samples and performed the protein extractions, data analyses, and ELISA assays; K.S., R.D.L. and A.S.C. performed the proteomics experiments; K.S., R.D.L., C.R.R. and K.J.S.A. wrote initial drafts and edited the manuscript; K.S., R.D.L., C.R.R., D.D., M.X. and N.A. performed statistical analyses and created the figures. All authors have read and agreed to the published version of the manuscript.

Funding: The Maternal and Child Health Research Institute (MCHRI) at Stanford, the Eunice Kennedy Shriver National Institute for Child Health and Human Development (R01 HD099296), the National Cancer Institute (P30 CA124435) for the Stanford Cancer Institute Proteomics/Mass Spectrometry Shared Resource, and the National Institute of General Medical Sciences (R35GM138353) supported this research. The study sponsors had no role in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, and approval of or the decision to publish this manuscript.

Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Stanford University (eProtocol #41369; approved 16 June 2017).

Table 1. Demographic characteristics and hair protein data of all participants.
Note. Demographic data, total number of proteins, and peptide spectral matches observed in all 40 individuals.

Table 2. Hair proteins mediating differences between mothers and children.

Note. Mothers showed higher spectral counts than children for 7/17 hair proteins (negative Expr Log Ratio), whereas children had higher spectral counts for SCGB2A1 (positive Expr Log Ratio). Of these, SCGB2A1 showed the most prominent difference, with a >5-fold difference from the mothers. Significance was based on Kruskal-Wallis ANOVA with post hoc Benjamini-Hochberg corrections for multiple comparisons (* p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001).
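The Benjamini-Hochberg adjustment and the star coding in this note can be sketched in a few lines. This is the standard step-up procedure applied to a list of raw p-values, for example one Kruskal-Wallis p-value per protein. The source does not state whether the authors adjusted across all proteins or within each post hoc family, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone (and capped at 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_stars(p: float) -> str:
    """Star code from the table note: * p <= 0.05, ** p <= 0.01, *** p <= 0.001."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""
```

Applying `significance_stars` to the adjusted rather than the raw p-values matches the note's statement that significance was judged after correction.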

Table 3. Hair proteins mediating differences between preschool boys and girls.

Table 4. ELISA validation of proteins detected in human scalp hair via UPLC-MS/MS.
Note. Groups of children and parents were defined on the basis of low, moderate, or high HCC values. Each of the six sample pools was loaded in duplicate on the respective ELISA plates for testing arginine vasopressin (AVP), Cu/Zn superoxide dismutase (SOD1), HTrA serine peptidase 2 (HTRA2), and glial fibrillary acidic protein (GFAP). Each method met our criteria for low inter-assay (≤8% CV) and intra-assay (≤6% CV) variability.
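The variability criteria in this note can be checked with the usual coefficient-of-variation (CV) calculations, sketched below. The sketch assumes two common conventions: intra-assay CV is the mean of the per-pool CVs from duplicate wells, and inter-assay CV is computed from the same pool's mean across plates or runs. The source gives only the thresholds, and the function names are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values: list[float]) -> float:
    """Coefficient of variation (%) using the sample standard deviation."""
    return stdev(values) / mean(values) * 100


def intra_assay_cv(replicates_per_pool: list[list[float]]) -> float:
    """Mean CV across pools, each pool measured in replicate wells on one plate."""
    return mean(cv_percent(reps) for reps in replicates_per_pool)


def inter_assay_cv(pool_means_across_plates: list[float]) -> float:
    """CV of one pool's (or control's) mean concentration across plates or runs."""
    return cv_percent(pool_means_across_plates)


def passes_criteria(intra: float, inter: float,
                    intra_max: float = 6.0, inter_max: float = 8.0) -> bool:
    """Thresholds from the table note: intra-assay <= 6% CV, inter-assay <= 8% CV."""
    return intra <= intra_max and inter <= inter_max
```

With duplicate wells, each per-pool CV rests on only two readings, so averaging across the six pools gives a more stable intra-assay estimate than any single pool.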