Predicting Gene Expression Responses to Cold in Arabidopsis thaliana Using Natural Variation in DNA Sequence
Abstract
1. Introduction
2. Materials and Methods
2.1. Transcriptome and Genome Data
2.2. Searching for Sequence Motifs Using STREME
2.3. Random Forest Analysis
2.4. Training Convolutional Neural Networks
2.5. Determining Factors Influencing Accuracy of CNNs
2.6. Evaluating and Interpreting Model Training
3. Results
3.1. The Presence of Enriched Motifs in the Up-Regulated DE Gene Sequences Suggests Their Potential Contribution to Environmental Responses
3.2. Known Transcription Factor Binding Sites Within Col-0 Do Not Accurately Predict the Expression Response to Cold Using Random Forests
3.3. Predicting Gene Expression Regulation Based on Regulatory Regions Using Naive Methods Is Possible
3.4. The Number of Discovered Motifs Can Impact the Correct Prediction of Up-Regulated DEGs
3.5. CNNs Imperfectly Identify a Simple Artificial Signal Within the Regulatory Regions
3.6. Predicting Only Among Genotypes Sharing Alleles for Major Trans-eQTL Does Not Improve the Prediction of the Up-Regulated DEGs
3.7. Including Both Upstream and Downstream Regions Improves the Prediction of Up-Regulated DEG Class
3.8. Genetic Diversity Does Not Improve Prediction Accuracy for Differentially Expressed Genes
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
TFBS | Number of Discovered Motifs | Gene | Family |
---|---|---|---|
MA1159.1 | 4 | SGR5 | C2H2 zinc finger factors (class) |
MA1268.1 | 3 | CDF5 | DOF |
MA0556.1 | 3 | DOF1.7 | DOF |
MA0558.1 | 3 | FLC | MIKC |
MA0559.1 | 3 | PI (PISTILLATA) | MIKC |
MA0940.1 | 3 | AP1 | MADS box factors (AP1) |
MA1085.1 | 3 | WRKY40 | WRKY (class) |
MA1281.1 | 3 | DOF5.1 | DOF |
MA1367.1 | 3 | AT1G76870 | Trihelix |
MA1380.1 | 3 | TCX6 | CPP |
MA1182.1 | 5 | RVE8 | Myb-related |
MA1184.1 | 5 | RVE1 | Myb-related |
MA1380.1 | 5 | TCX6 | CPP |
MA1277.1 | 4 | DOF1.7 | DOF |
MA1281.1 | 4 | DOF5.1 | DOF |
MA1268.1 | 4 | CDF5 | DOF |
MA1278.1 | 4 | DOF3.4 | DOF |
MA1279.1 | 4 | DOF1.5 | DOF |
MA1190.1 | 4 | RVE5 | Myb-related |
MA0933.1 | 4 | AHL20 | HMGA factors |
MA1267.1 | 4 | DOF5.8 | DOF |
Parameter | Up-regulated DEGs | Down-regulated DEGs | Spiked Up-regulated DEGs | Upstream Regions of up-DEGs | Up-regulated DEGs of Only Col-0 | Up-regulated DEGs of Up-sampled Col-0 | Genotypes with All CBFs as DEGs |
---|---|---|---|---|---|---|---|
1st conv. filters | 128 | 64 | 128 | 64 | 128 | 128 | 128 |
2nd conv. filters | 128 | 64 | None | 128 | None | 128 | 128 |
3rd conv. filters | None | 128 | None | 128 | None | None | 64 |
Conv. width | 4 | 4 | 4 | 4 | 4 | 8 | 8 |
Pool width | 4 | 4 | 8 | 4 | 4 | 4 | 4 |
Pool stride | 4 | 8 | 4 | 4 | 8 | 4 | 8 |
Dropout | 0.25 | 0.25 | 0.25 | 0.1 | 0.1 | 0.1 | 0.25 |
1st dense layer units | 64 | 128 | 128 | 128 | 128 | 128 | 64 |
2nd dense units | 128 | None | 128 | None | 128 | 128 | None |
Number of conv. layers | 2 | 3 | 1 | 3 | 1 | 2 | 3 |
Number of dense layers | 3 | 2 | 3 | 2 | 3 | 3 | 2 |
Learning rate | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
Loss | 0.692 | 0.691 | 0.691 | 0.691 | 0.691 | 0.692 | 0.691 |
Accuracy | 0.621 | 0.682 | 0.673 | 0.666 | 0.692 | 0.636 | 0.672 |
Accuracy of DEGs | 0.368 | 0.365 | 0.480 | 0.339 | 0.347 | 0.368 | 0.341 |
Accuracy of non-DEGs | 0.693 | 0.684 | 0.665 | 0.671 | 0.697 | 0.681 | 0.681 |
prAUC test set | 0.701 | 0.712 | 0.704 | 0.701 | 0.721 | 0.703 | 0.711 |
prAUC training | 0.499 | 0.502 | 0.522 | 0.499 | 0.497 | 0.505 | 0.499 |
prAUC of the validation set | 0.501 | 0.502 | 0.578 | 0.509 | 0.485 | 0.517 | 0.506 |
Loss of the validation set | 0.693 | 0.693 | 0.692 | 0.693 | 0.693 | 0.693 | 0.693 |
Epoch of the best model | 37 | 50 | 49 | 29 | 49 | 48 | 49 |
Difference in accuracy of DEG classes | 0.325 | 0.319 | 0.185 | 0.332 | 0.350 | 0.312 | 0.340 |
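
The last row of the table appears to be the gap between the two per-class accuracies (non-DEG accuracy minus DEG accuracy). The table does not state this formula, so it is an inference from the numbers. The minimal Python sketch below recomputes the gap from the two accuracy rows and compares it with the reported row. It also compares the reported loss with ln 2 ≈ 0.693, the binary cross-entropy of a classifier that outputs 0.5 for every gene, assuming the loss is binary cross-entropy (the table does not say so). All variable names are illustrative.

```python
import math

# Values copied from the table above (rounded to three decimals), in column order.
columns = [
    "Up-regulated DEGs",
    "Down-regulated DEGs",
    "Spiked Up-regulated DEGs",
    "Upstream Regions of up-DEGs",
    "Up-regulated DEGs of Only Col-0",
    "Up-regulated DEGs of Up-sampled Col-0",
    "Genotypes with All CBFs as DEGs",
]
acc_deg = [0.368, 0.365, 0.480, 0.339, 0.347, 0.368, 0.341]
acc_non_deg = [0.693, 0.684, 0.665, 0.671, 0.697, 0.681, 0.681]
reported_gap = [0.325, 0.319, 0.185, 0.332, 0.350, 0.312, 0.340]
loss = [0.692, 0.691, 0.691, 0.691, 0.691, 0.692, 0.691]

# Assumed relation (inferred, not stated in the source):
# gap = accuracy of non-DEGs - accuracy of DEGs.
computed_gap = [round(n - d, 3) for n, d in zip(acc_non_deg, acc_deg)]

# Binary cross-entropy of a model that predicts 0.5 for every example
# (assumes the reported loss is binary cross-entropy).
chance_loss = math.log(2)

for name, gap, rep, model_loss in zip(columns, computed_gap, reported_gap, loss):
    print(
        f"{name}: computed gap {gap:.3f}, reported {rep:.3f}, "
        f"loss minus ln 2 = {model_loss - chance_loss:+.4f}"
    )
```

Under these assumptions, the recomputed gaps agree with the reported row to within rounding, and every reported loss lies within about 0.003 of ln 2.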
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Takou, M.; Bellis, E.S.; Lasky, J.R. Predicting Gene Expression Responses to Cold in Arabidopsis thaliana Using Natural Variation in DNA Sequence. Genes 2025, 16, 1108. https://doi.org/10.3390/genes16091108