Article

From Molecules to Fields: Mapping the Thematic Evolution of Intelligent Crop Breeding via BERTopic Text Mining

1 Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
2 Key Laboratory of Agricultural Blockchain Application, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
3 College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Agriculture 2025, 15(22), 2373; https://doi.org/10.3390/agriculture15222373 (registering DOI)
Submission received: 13 October 2025 / Revised: 1 November 2025 / Accepted: 12 November 2025 / Published: 16 November 2025

Abstract

The convergence of agricultural biotechnology and artificial intelligence is reshaping modern crop improvement. Despite a surge of studies integrating artificial intelligence and biotechnology, the rapidly expanding literature on intelligent crop breeding remains fragmented across molecular, phenotypic, and computational dimensions. Existing reviews often rely on traditional bibliometric or narrative approaches that fail to capture the deep semantic evolution of research themes. To address this gap, this study employs the BERTopic model to systematically analyze 1867 articles (1995–2025, WoS Core Collection), mapping the thematic landscape and temporal evolution of intelligent crop breeding and revealing how methodological and application-oriented domains have co-evolved over time. Eight core topics emerge, i.e., (T0) genomic prediction and genotype–environment modeling; (T1) UAV remote sensing and multimodal phenotyping; (T2) stress-tolerant breeding and root phenotypes; (T3) ear/pod counting with deep learning; (T4) grain trait representation and evaluation; (T5) CRISPR and genome editing; (T6) spike structure recognition and 3D modeling; and (T7) maize tassel detection and developmental staging. Topic-evolution analyses indicate a co-development pattern, where genomic prediction provides a stable methodological backbone, while phenomics (UAV/multimodal imaging, organ-level detection, and 3D reconstruction) propels application-oriented advances. Attention dynamics reveal increasing momentum in image-based counting (T3), grain quality traits (T4), and CRISPR-enabled editing (T5), alongside a plateau in traditional mainstays (T0, T1) and mild cooling in root phenotyping under abiotic stress (T2). Quality stratification (citation quartiles, Q1–Q4) shows high-impact concentration in T0/T1 and a growing tail of application-driven work across T3–T7. 
Journal analysis reveals a complementary publication ecosystem: Frontiers in Plant Science and Plant Methods anchor cross-disciplinary dissemination; Remote Sensing and Computers and Electronics in Agriculture host engineering-centric phenomics; genetics/breeding journals sustain T0/T2; and molecular journals curate T5. These findings provide an integrated overview of methods, applications, and publication venues, offering practical guidance for research planning, cross-field collaboration, and translational innovation in intelligent crop breeding.

1. Introduction

Intelligent crop breeding has emerged as a pivotal frontier in modern agricultural science, integrating artificial intelligence, big data, genomics, and phenomics to enhance breeding efficiency, precision, and adaptability [1]. This paradigm enables the rapid identification and precise editing of target genes, thereby shortening breeding cycles while reducing field trial costs and resource consumption [2]. The application of deep learning, computer vision, and knowledge graph technologies has further advanced the accurate prediction of complex agronomic traits, facilitating the targeted development of high-yield, high-quality, and resilient crop varieties [3]. In the context of accelerating climate change and growing food demand, intelligent crop breeding is increasingly recognized as a key strategic approach to achieving sustainable agricultural development and ensuring global food security [4].
Over the past decade, an expanding body of high-quality research has accelerated progress in crop breeding science. These studies span a wide spectrum—from elucidating molecular mechanisms at the cellular level to validating phenotypic traits at the field scale [5]—and from improving single agronomic traits to optimizing complex trait networks through coordinated genetic and computational modeling [6]. Despite these advancements, the rapid expansion of literature in this field has created new challenges for information retrieval, knowledge synthesis, and trend identification. Conventional bibliometric techniques—such as keyword co-occurrence and citation network analysis—remain limited by their dependence on predefined thematic frameworks, which often fail to detect emergent interdisciplinary frontiers or capture the deep semantic evolution of scientific knowledge [7]. This limitation is particularly evident in AI-driven breeding paradigms, where the convergence of biology, computer science, and data analytics generates highly complex and dynamic knowledge structures [8].
Recent bibliometric and topic-modeling studies have attempted to characterize research trends in agriculture and breeding. For example, Rejeb et al. [9] applied an LDA-based topic modeling framework to 1114 publications on machine-learning applications in agriculture, identifying six primary thematic clusters (precision/remote sensing, molecular/food composition, food-systems, quality/adulteration, financial/technological, and predictive modeling) and noting that many reviews lack a holistic, cross-domain perspective. Similarly, Lin et al. [10] used the BERTopic model to analyze 15,744 articles in functional agriculture (1995–2024) and demonstrated the value of embedding-based semantic clustering in capturing interdisciplinary dynamics. Earlier bibliometric surveys also mapped thematic shifts in digital agriculture and the broader use of topic modeling; for instance, Liu and Wan [11] identified 37 distinct topics in precision agriculture research via BERTopic. Despite this growth, most existing analyses focus on single crop systems or specific technologies, or rely on probabilistic or matrix-factorization models (e.g., LDA or NMF) with limited semantic depth. For example, Sott et al. [12] provided a bibliometric network analysis of digital agriculture but did not leverage embedding-based topic modeling. Thus, a comprehensive, transformer-based semantic analysis covering the full molecular-to-field continuum of intelligent crop breeding remains lacking, a gap this study seeks to fill.
Recent advances in natural language processing (NLP) and text mining provide a transformative pathway to overcome these limitations. Compared with traditional methods, these techniques can efficiently process large volumes of unstructured textual data, enabling more comprehensive analyses of disciplinary evolution and uncovering hidden thematic relationships. Among them, the BERTopic model proposed by Grootendorst [13] represents a new generation of neural topic modeling that combines transformer-based semantic embedding with class-based term weighting. This approach has demonstrated strong performance in capturing topic coherence and evolution patterns across multiple scientific domains, including artificial intelligence [14] and biological breeding [15]. Its interdisciplinary adaptability provides a robust methodological foundation for analyzing the evolving landscape of intelligent crop breeding.
Building upon these methodological advancements, this study employs the BERTopic model to systematically identify, classify, and interpret the thematic structure of intelligent crop breeding research from 1995 to 2025. The model’s transformer-based semantic representation enables the automatic discovery of latent knowledge structures without the need to predefine topic numbers, offering superior flexibility, interpretability, and efficiency compared with traditional probabilistic or matrix factorization approaches. By applying this framework, the study delineates the thematic landscape of intelligent crop breeding, explores its temporal evolution, evaluates topic-level research quality, and analyzes journal distribution patterns. These findings aim to provide a holistic understanding of the field’s development trajectory and offer strategic insights for future research planning, interdisciplinary collaboration, and innovation policy formulation.
Distinct from crop- or method-specific surveys, this study adopts a cross-scale perspective that jointly maps molecular, organ-level, and field-scale themes, enabling a holistic view from CRISPR-enabled genetic design to UAV-based 3D phenotyping and knowledge diffusion across journals. By integrating these multilevel dimensions, the research bridges the “from molecules to fields” continuum, highlighting how computational, biological, and engineering advances converge to reshape intelligent breeding.
Specifically, this study makes three contributions:
(i) It constructs a 1995–2025 cross-domain semantic atlas of intelligent crop breeding, capturing the evolution of research from molecular to field scales through BERTopic-based text mining.
(ii) It proposes a tri-axial evaluation framework that combines attention slope, citation-quartile stratification, and journal ecology to assess topic maturity, impact, and dissemination pathways.
(iii) It provides an actionable roadmap for research planning and journal selection, offering practical guidance for interdisciplinary collaboration and future trend identification in intelligent crop breeding.

2. Materials and Methods

2.1. Data Sources and Retrieval Strategy

The bibliographic data used in this study were obtained from the Web of Science (WoS) Core Collection (SCI-EXPANDED, SSCI), covering publications from 1995 to early 2025. WoS was selected for its rigorous indexing standards and comprehensive coverage of high-quality publications spanning agriculture, genetics, computer science, and related interdisciplinary fields.
An initial search using the Boolean query listed in Supplementary Table S1 (accessed on 5 April 2025) yielded 3579 records.
The search was limited to peer-reviewed Articles and Reviews published in English. Non-research documents such as conference abstracts, editorials, and news items were excluded. After this filtering, 3459 records remained. Removing 19 duplicates based on DOI, title, and author combinations yielded 3440 unique items.
Two domain experts independently reviewed the records for topical relevance, retaining 1867 publications directly related to intelligent crop breeding. The detailed query syntax, field tags, and inclusion/exclusion criteria are provided in Table S1. Conference papers and preprints were excluded due to incomplete metadata and inconsistent indexing; this limitation is noted for future integration of gray-literature sources.
This curated corpus provides a representative and high-quality dataset for subsequent BERTopic modeling and thematic analysis.
The overall retrieval and screening workflow, including identification, filtering, and expert validation, is summarized in Supplementary Figure S1. This process ensured that only publications directly relevant to intelligent crop breeding were included for subsequent analysis.
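The duplicate-removal step described above can be illustrated with a minimal Python sketch. The record fields (`doi`, `title`, `first_author`) and the rule of matching on DOI first and falling back to a normalized title and author pair are assumptions made for illustration; the text states only that duplicates were identified from DOI, title, and author combinations.

```python
import re


def _norm(text):
    """Lowercase and collapse every non-alphanumeric run into one space."""
    return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()


def dedup_key(rec):
    """Build a matching key: DOI when present, else title + first author.

    The field names and the fallback rule are hypothetical.
    """
    doi = (rec.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title_author", _norm(rec.get("title")), _norm(rec.get("first_author")))


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen, unique = set(), []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Applied to the 3459 filtered records, a rule of this kind would need to flag 19 items to reproduce the reported 3440 unique records.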

2.2. Text Preprocessing

To ensure both the accuracy and semantic robustness of topic modeling, the textual information from each publication—including the title, abstract, author keywords, and database-assigned Keywords—was integrated into a unified corpus. The preprocessing pipeline consisted of four main stages: text normalization, synonym harmonization, document filtering, and domain-specific lexicon construction.
(1) Text normalization:
All text was converted to lowercase, and stop words, punctuation marks, special characters, and standalone numeric expressions were removed to reduce noise and standardize the corpus structure.
(2) Synonym harmonization:
To standardize domain-specific terminology and ensure semantic consistency, high-frequency terms were automatically extracted using term-frequency and class-based TF-IDF (c-TF-IDF) analyses to identify potential lexical variants. Two domain experts in crop breeding and agricultural informatics then manually verified conceptual equivalence and contextual appropriateness (e.g., “genomic selection” ↔ “GS,” “LiDAR” ↔ “laser scanning”, “high-throughput phenotyping” ↔ “HTP”). Based on this review, a harmonization dictionary was constructed to unify abbreviations, acronyms, and spelling variants under standardized canonical forms. The complete list of synonym pairs and standardization rules is provided in Supplementary Table S2.
(3) Document filtering:
Records with insufficient textual information (e.g., abstracts containing fewer than 30 words) were removed to ensure adequate semantic content for topic modeling.
(4) Domain-specific lexicon construction:
A customized lexicon tailored to the intersection of agricultural science and artificial intelligence was then developed. This included an expanded stop-word list and a terminology normalization dictionary, thereby enhancing the precision and coherence of semantic representations during BERTopic modeling.
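The four preprocessing stages can be sketched in a few lines of Python. The stop-word set and the synonym dictionary below are short, hypothetical stand-ins for the study's full lexicon (Supplementary Table S2), and the choice of which variant serves as the canonical form is an assumption; only the 30-word abstract threshold and the example synonym pairs come from the text.

```python
import re

# Hypothetical, abbreviated resources standing in for the full lexicon.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "with", "is", "was"}
SYNONYMS = {  # variant -> canonical form (direction is an assumption)
    "gs": "genomic selection",
    "htp": "high-throughput phenotyping",
    "laser scanning": "lidar",
}
MIN_ABSTRACT_WORDS = 30  # threshold stated in the text


def harmonize(text):
    """Replace each known variant with its canonical form (whole words only)."""
    for variant, canonical in SYNONYMS.items():
        text = re.sub(rf"\b{re.escape(variant)}\b", canonical, text)
    return text


def normalize(text):
    """Lowercase, harmonize synonyms, then drop punctuation, numbers, stop words."""
    text = harmonize(text.lower())
    text = re.sub(r"[^\w\s-]", " ", text)  # keep hyphens inside compound terms
    tokens = [tok.strip("-") for tok in text.split()]
    kept = [t for t in tokens if t and not t.isdigit() and t not in STOP_WORDS]
    return " ".join(kept)


def build_corpus(records):
    """Merge title, abstract, and keywords; skip records with short abstracts."""
    docs = []
    for rec in records:
        if len(rec["abstract"].split()) < MIN_ABSTRACT_WORDS:
            continue
        fields = (rec["title"], rec["abstract"], rec.get("keywords", ""))
        docs.append(normalize(" ".join(fields)))
    return docs
```

Harmonization runs before punctuation removal so that multi-word and hyphenated variants are matched while they are still intact.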

2.3. Topic Modeling Method

This study employed the BERTopic model [13], which combines contextual embeddings derived from pre-trained language models with density-based clustering to automatically identify latent themes within textual corpora. The modeling workflow consisted of four major stages: semantic embedding, dimensionality reduction, topic clustering, and keyword extraction.
(1) Model implementation and parameterization:
Textual data were first transformed into dense semantic embeddings using sentence-transformers/all-MiniLM-L6-v2, with mean pooling and L2 normalization. The embeddings were reduced in dimensionality using UMAP (n_neighbors = 15, n_components = 5, min_dist = 0.0, metric = cosine, random_state = 13) to preserve semantic relationships while enhancing computational efficiency.
Clustering was performed in the reduced space using HDBSCAN (min_cluster_size = 30, min_samples = 10, cluster_selection_epsilon = 0.0, cluster_selection_method = “eom”). Representative keywords for each topic were then extracted using class-based TF-IDF (c-TF-IDF, ngram_range = (1, 1)) to highlight terms most distinctive to each cluster. Detailed parameter settings are provided in Supplementary Table S3.
(2) Validation and sensitivity analysis:
To ensure topic reliability, both quantitative and expert-based validation were conducted. Quantitatively, intra-topic consistency and keyword relevance (based on c-TF-IDF) were used to assess internal coherence (Supplementary Tables S4 and S5). Qualitatively, two independent crop-breeding experts evaluated each topic’s coherence, representativeness, and distinctiveness on a 5-point Likert scale; topics with mean scores below 3.0 were excluded from interpretation. Discrepancies were resolved through consensus discussion.
A sensitivity analysis was conducted by systematically varying UMAP parameters (n_neighbors = 10–50, n_components = 3–10), HDBSCAN parameters (min_cluster_size = 20–60), and c-TF-IDF n-gram ranges ((1, 1)–(1, 3)). The optimal configuration was selected based on the harmonic mean of topic coherence and topic diversity.
(3) Integration with bibliometric mapping:
This hybrid analytical design integrates semantic topic modeling with bibliometric mapping, aligning with recent methodological advances in combining structural and conceptual analyses of innovation landscapes [16].
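The reported parameterization and the selection criterion used in the sensitivity analysis can be written out as a minimal Python sketch. The hyperparameter values are those stated above; the function names, the local imports, the definition of topic diversity as the share of unique words among all topics' top words, and the handling of non-positive coherence are assumptions for illustration rather than the study's actual code.

```python
# Hyperparameters as reported in the text.
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
UMAP_PARAMS = dict(n_neighbors=15, n_components=5, min_dist=0.0,
                   metric="cosine", random_state=13)
HDBSCAN_PARAMS = dict(min_cluster_size=30, min_samples=10,
                      cluster_selection_epsilon=0.0,
                      cluster_selection_method="eom")
NGRAM_RANGE = (1, 1)


def build_topic_model(umap_params=UMAP_PARAMS, hdbscan_params=HDBSCAN_PARAMS,
                      ngram_range=NGRAM_RANGE):
    """Assemble a BERTopic pipeline from the settings above."""
    # Third-party imports stay local so the settings can be inspected
    # without the modeling stack installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer(EMBEDDING_MODEL),
        umap_model=UMAP(**umap_params),
        hdbscan_model=HDBSCAN(**hdbscan_params),
        vectorizer_model=CountVectorizer(ngram_range=ngram_range),
    )


def topic_diversity(topics_top_words):
    """Share of unique words among the top words of all topics (assumed metric)."""
    words = [w for topic in topics_top_words for w in topic]
    return len(set(words)) / len(words)


def selection_score(coherence, diversity):
    """Harmonic mean of coherence and diversity; 0.0 if either is non-positive."""
    if coherence <= 0 or diversity <= 0:
        return 0.0
    return 2 * coherence * diversity / (coherence + diversity)
```

In use, `build_topic_model().fit_transform(docs)` returns topic assignments and probabilities for a list of preprocessed documents. Each configuration in the sensitivity grid would be fitted in this way and then ranked by `selection_score`.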

2.4. Temporal Dynamics and Attention Metrics

To examine the dynamic evolution of research topics, the dataset was segmented into annual time slices, and the number of publications associated with each topic was calculated for every year. This enabled the construction of topic evolution curves that illustrate changes in research focus over time.
The popularity of each topic was further assessed by estimating the slope of its annual publication trend using linear regression. To highlight recent dynamics, slope values were calculated over the most recent period (2020–2025), providing a focused measure of short-term attention change while controlling for long-term publication growth. The slope served as an indicator of whether a topic was gaining or losing momentum within the research community.
Temporal evolution was examined on an annual basis, where the publication counts of each topic were aggregated per year. This one-year time slicing enabled the detection of fine-grained shifts in topic attention and the identification of emerging or declining trends.
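A minimal sketch of the attention-slope calculation follows. It assumes that the regression is fitted to each topic's annual share of all publications rather than to raw counts, which would be one way of controlling for overall publication growth; the text does not state the normalization, and the function names are illustrative.

```python
def annual_share(topic_counts, total_counts):
    """Topic's share of all publications per year (assumed normalization)."""
    return {
        year: topic_counts.get(year, 0) / total
        for year, total in total_counts.items()
        if total > 0
    }


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def attention_slope(topic_counts, total_counts, start=2020, end=2025):
    """Slope of the topic's annual share over the window [start, end]."""
    shares = annual_share(topic_counts, total_counts)
    years = [y for y in range(start, end + 1) if y in shares]
    return ols_slope(years, [shares[y] for y in years])
```

A positive value indicates that the topic is gaining share within the window, a negative value that it is losing share, and a value near zero that its attention is stable.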
In addition, the quality distribution of research outputs was evaluated by ranking all publications according to citation counts and dividing them into four quartiles (Q1–Q4). This allowed for a comparative assessment of the academic impact and quality differences across topics.
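The quartile stratification can be sketched as below. The sketch assumes that Q1 denotes the most-cited quarter of the corpus and that ties are broken by input order; neither convention is specified in the text, and the function names are illustrative.

```python
from collections import Counter, defaultdict


def assign_quartiles(citations):
    """Map each publication id to Q1..Q4 by citation rank (Q1 = most cited)."""
    ranked = sorted(citations, key=lambda pid: citations[pid], reverse=True)
    n = len(ranked)
    return {pid: f"Q{rank * 4 // n + 1}" for rank, pid in enumerate(ranked)}


def quartile_profile(topic_of, quartile_of):
    """Count publications per quartile within each topic."""
    profile = defaultdict(Counter)
    for pid, topic in topic_of.items():
        profile[topic][quartile_of[pid]] += 1
    return profile
```

Comparing the Q1 share across topics then gives a simple measure of where high-impact work is concentrated.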

2.5. Journal Distribution Mapping

To reveal the pathways of knowledge dissemination, this study further examined the distribution of research topics across different academic journals. The analysis considered not only the frequency of publications within each journal but also the classification of journals into four quartiles (Q1–Q4) and their interdisciplinary characteristics. By mapping topics to journals in this way, it was possible to identify the primary publication platforms for different research directions and to assess their relative academic influence.
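The topic-to-journal mapping amounts to a cross-tabulation, sketched here in plain Python. The record fields `topic` and `journal` are hypothetical names, and journal quartile labels are omitted because their source is not specified in the text.

```python
from collections import Counter, defaultdict


def topic_journal_counts(records):
    """Cross-tabulate publications by topic and journal."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["topic"]][rec["journal"]] += 1
    return counts


def journal_shares(counts, topic):
    """Share of a topic's papers in each journal, most frequent first."""
    total = sum(counts[topic].values())
    return {journal: n / total for journal, n in counts[topic].most_common()}
```

The leading entries of `journal_shares` for a topic identify its primary publication venues.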

2.6. Technical Environment and Tools

All data processing and modeling were conducted in a Python 3.10 environment. Several open-source packages were employed to support different stages of the analysis. For text mining and topic modeling, the study utilized BERTopic, sentence-transformers, umap-learn, and hdbscan. For data processing and statistical analysis, packages such as pandas, numpy, and scikit-learn were applied. For visualization, matplotlib and seaborn were used to generate figures and graphical representations.
Figure 1 illustrates the overall methodological workflow adopted in this study, which consists of four main stages: data collection, text preprocessing, topic modeling, and downstream analysis. Research articles and reviews were retrieved from the Web of Science Core Collection, followed by systematic text preprocessing and BERTopic-based topic modeling. The identified topics were subsequently analyzed across multiple dimensions, including their temporal evolution, popularity dynamics, and quality distribution. By mapping the topics to citation quartiles and journal categories, the study provided insights into both the developmental trajectory of intelligent crop breeding and the academic platforms driving its dissemination.

3. Results

3.1. Publication Trajectory

To comprehensively understand the research foundation and developmental trajectory of intelligent crop breeding, the annual publication trend of related studies was analyzed (Figure 2). From a temporal perspective, the evolution of research output exhibits a distinct phase-based growth pattern, which can be broadly divided into three stages:
(1) The Emergent Exploration Stage (1995–2010)
During this initial phase, intelligent crop breeding remained largely conceptual, with very limited research activity. The annual number of publications generally stayed below ten, reflecting a period of slow and incremental accumulation. Studies at this stage primarily focused on preliminary attempts to integrate crop genetic breeding with basic information technologies, but a systematic research framework had yet to take shape.
(2) The Steady Growth Stage (2011–2018)
Starting from 2011, the field entered a steady development phase, driven by technological breakthroughs in high-throughput sequencing, genomic selection (GS), and data processing techniques. The number of annual publications increased consistently, reaching approximately 90 papers by 2018. Research during this stage began to emphasize data-driven breeding design and decision-support systems. Meanwhile, methods such as machine learning and image recognition were gradually introduced into crop trait analysis and phenotypic identification, marking a transition toward more intelligent and quantitative approaches.
(3) The Rapid Expansion Stage (2019–Present)
Since 2019, intelligent crop breeding research has entered a phase of rapid expansion, characterized by exponential growth in publication output. The number of papers surpassed 200 in 2021 and peaked at nearly 300 in 2023, indicating growing attention from both academic and industrial communities. Research hotspots during this period have centered on cutting-edge topics such as deep learning–assisted phenotypic analysis, the development of intelligent breeding platforms, and the application of big data to enhance breeding efficiency. Although the annual output in 2024–2025 shows slight fluctuations, it remains at a historically high level, underscoring the establishment of intelligent crop breeding as one of the key directions in contemporary agricultural science and technology.
Overall, the field is undergoing a pivotal transition from information-assisted breeding to intelligent decision-making–driven breeding. This shift reflects the solidification of its research foundation and provides robust empirical support for the subsequent analyses of thematic evolution and frontier identification (see Section 3.2).

3.2. Eight Core Topics

Quantitative validation confirmed that the eight topics generated by BERTopic are both semantically coherent and well separated (Figure 3). As summarized in Supplementary Tables S4 and S5, the mean c-TF-IDF scores and inter-topic cosine distances demonstrate consistent internal relevance among keywords and clear boundaries between thematic clusters across the dataset.
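For readers unfamiliar with the two validation quantities, the following sketch computes class-based TF-IDF weights using the weighting from the BERTopic paper, W(t, c) = tf(t, c) · log(1 + A / tf(t)), where A is the average number of words per topic, and then the cosine distance between two topics' weight vectors. Normalizing term frequency by topic length and comparing topics in c-TF-IDF space (rather than embedding space) are assumptions; the text does not state which representation the reported distances use.

```python
import math
from collections import Counter


def c_tf_idf(topic_docs):
    """Return {topic: {term: weight}} from {topic: [token lists]}."""
    tf = {t: Counter(tok for doc in docs for tok in doc)
          for t, docs in topic_docs.items()}
    total_tf = Counter()
    for counts in tf.values():
        total_tf.update(counts)
    avg_words = sum(total_tf.values()) / len(tf)  # A: mean words per topic
    weights = {}
    for topic, counts in tf.items():
        n_words = sum(counts.values())
        weights[topic] = {
            term: (c / n_words) * math.log(1 + avg_words / total_tf[term])
            for term, c in counts.items()
        }
    return weights


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / (norm_u * norm_v)
```

Terms concentrated in one topic receive high weights there, and topics with little shared vocabulary have distances close to 1.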
Based on the BERTopic modeling results and representative highly cited publications, this study identifies and summarizes eight major research topics within the field of intelligent crop breeding. Each topic exhibits distinctive keyword clustering characteristics and demonstrates systematic accumulation in leading international journals of agricultural science and technology. A detailed interpretation of each topic is provided below.
Topic 0: Genomic Prediction and Genotype–Environment (G × E) Modeling
This topic focuses on keywords such as genomic prediction, marker, trait, and genotype, emphasizing the integration of genomic information with environmental variables to construct more adaptive crop trait prediction models. Representative studies include Fernandes et al. [17], who combined environmental variables with machine learning algorithms to improve prediction accuracy across multiple environments and varieties, and Thudi et al. [18], who systematically reviewed climate-adaptive breeding strategies based on genomic resources, marking a new leap forward following the Green Revolution.
Topic 1: UAV Remote Sensing and Multimodal Phenotyping
This topic centers on image, UAV, phenotype, and multispectral as core keywords, representing a crucial direction of high-throughput phenomics (HTP) in intelligent crop breeding. For instance, Shu et al. [19] proposed a multi-sensor fusion and ensemble learning approach to estimate dry biomass and leaf area index (LAI), while Zhou et al. [20] evaluated the robustness of various deep learning models for yield prediction across different crop varieties.
Topic 2: Stress-Tolerant Breeding and Root Phenotyping
Keywords such as root, drought, stress, and tolerance highlight this topic’s focus on crop phenotypic responses under drought or salinity stress. Representative studies include Shi et al. [21], who developed an enhanced SegFormer-based root image segmentation method that improves the identification efficiency of drought-responsive genes, and Kakar et al. [22], who designed a rapid screening device (mini-hoop) and established a drought-tolerance index to support practical breeding selection.
Topic 3: Deep Learning Methods for Ear and Pod Counting
This topic revolves around ear, pod, count, and deep learning, focusing on the automated detection and counting of crop ears and pods. Notable contributions include Sadeghi-Tehran et al. [23], who developed the DeepCount system for in-field spike detection, and Yang et al. [24], who constructed a lightweight network (LWDNet) to address density variations in ears across ground-based and UAV-based imaging platforms.
Topic 4: Grain Trait Characterization and Evaluation
Keywords such as grain, size, shape, and weight define this topic, which targets the digital representation and quality assessment of grain morphology. Representative works include Qin et al. [25], who used structured-light 3D point cloud analysis to identify filled and unfilled grains, and Liu et al. [26], who proposed a shadow-based image recognition method for rapid evaluation of grain filling degree.
Topic 5: CRISPR and Genome Editing
Characterized by keywords such as CRISPR, target, gRNA, and mutant, this topic represents one of the most transformative molecular tools in intelligent crop breeding. Key studies include Liu et al. [27] and Rasheed et al. [28], which reviewed the applications and challenges of CRISPR in major crops, providing essential foundations for functional gene improvement and non-transgenic variety development.
Topic 6: Spike Structure Recognition and 3D Modeling
Keywords such as spike, segmentation, LiDAR, and network indicate that this topic focuses on spike morphology extraction techniques combining LiDAR and deep neural networks. For example, Liu et al. [29] proposed the KP-CNN model integrating LiDAR data and convolutional neural networks to achieve high-precision 3D spike phenotype extraction, while Hasan et al. [30] developed the SpikesTrain dataset to improve spike detection performance under complex field conditions.
Topic 7: Maize Tassel Detection and Developmental Stage Analysis
This topic is characterized by tassel, detection, stage, and branch, with research primarily targeting tassel recognition and developmental stage identification in maize and related crops. Representative studies such as Zan et al. [31] and Gao et al. [32] constructed real-time detection models based on VGG16 and YOLOv5 architectures to automatically annotate pollination stages and assist breeding evaluation.
Collectively, these eight topics illustrate a full-chain integration trend that spans “from molecules to fields, from data to algorithms, and from experiments to tools.” The identified themes encompass multiple levels—including genes, phenotypes, environments, and organs—while demonstrating the broad application of advanced technologies such as deep learning, computer vision, UAV remote sensing, and CRISPR genome editing. Representative studies are distributed across journals from Q1 to Q4, including authoritative platforms such as Plant Phenomics and Plant Biotechnology Journal, reflecting both the high academic impact and the growing practical value of research in intelligent crop breeding.

3.3. Topic Evolution

To reveal the dynamic evolution of research hotspots in the field of intelligent crop breeding, this study conducted a time-series analysis of publication counts for the eight major topics identified by BERTopic between 1995 and 2025. Figure 4 illustrates the annual publication trends of each topic, capturing their emergence, evolution, and diffusion patterns throughout the development of intelligent crop breeding research.
To statistically validate the temporal patterns of topic evolution, we applied the non-parametric Mann–Kendall (MK) trend test to annual publication counts (1995–2025) for all eight identified topics (Supplementary Table S6). All topics exhibited statistically significant monotonic upward trends (Z > 1.96, p < 0.001), confirming that the observed increases reflect sustained development rather than random fluctuations. Foundational domains such as genomic prediction (T0) and UAV-based phenotyping (T1) showed the strongest upward momentum, while application-oriented topics (T3–T7) accelerated in recent years, indicating the progressive expansion of intelligent crop breeding across molecular, phenotypic, and computational dimensions.
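The Mann–Kendall statistic can be computed directly from a series of annual counts, as in the minimal sketch below. It uses the standard tie-corrected variance, the usual continuity correction, and a two-sided normal approximation; whether the study applied these particular options is an assumption.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Return (S, Z, two-sided p) for a monotonic trend in `values`."""
    n = len(values)
    # S sums the signs of all pairwise later-minus-earlier differences.
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S with the correction for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return s, z, p
```

Under this approximation, |Z| > 1.96 corresponds to two-sided significance at the 0.05 level, and |Z| above roughly 3.29 corresponds to p < 0.001.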
(1) Genomic Prediction and Genotype–Environment (G × E) Modeling (Topic 0): A Continuously Dominant Core Theme
Topic 0 represents the earliest and most enduringly influential research direction in this field. Since 2006, the number of publications has increased steadily, followed by a sharp rise after 2018. The research focus has gradually shifted from early marker-assisted selection to predictive modeling based on high-throughput genomic data and G × E interaction analysis. In 2023, this topic reached its publication peak (over 70 papers), underscoring its broad application potential in climate-resilient crop breeding worldwide.
(2) UAV Remote Sensing and Multimodal Phenotyping (Topic 1): A Rapidly Emerging Research Frontier
Topic 1 has exhibited a significant upward trend since 2018, with publication counts surging between 2021 and 2024 and peaking at nearly 60 papers. It has become one of the most active technological frontiers in intelligent crop breeding. The research under this topic emphasizes UAV-based imaging, multispectral remote sensing, image segmentation, and deep neural network modeling, which are widely applied to field phenotyping, yield estimation, and breeding efficiency evaluation.
(3) Stress-Tolerant Breeding and Root Phenotyping (Topic 2): Steadily Increasing Attention
This topic has shown a consistent upward trend since 2015, reaching its peak around 2023 (approximately 30 publications) before a slight decline. The main research emphasis lies in drought stress responses, root phenotyping, and the identification of resistance-related loci. As climate change intensifies, the need for stress-resilient crop varieties continues to grow, suggesting sustained interest and long-term relevance for this topic.
(4) Organ Recognition and Counting (Topics 3, 4, 6, and 7): A Convergence Point of Diverse Techniques
These four topics collectively focus on the automated recognition, classification, and counting of crop organs such as ears, grains, and tassels, though they differ in technical approach and application scope:
Topic 3 (Ear/Pod Counting) and Topic 4 (Grain Traits) show moderate but consistent growth, with research mainly centered on image recognition algorithms.
Topic 6 (Spike 3D Modeling) employs emerging sensing technologies such as LiDAR, demonstrating strong technical foresight.
Topic 7 (Tassel Detection) maintains a smaller yet steady output, closely tied to maize breeding applications.
Collectively, these studies have been widely deployed in high-throughput field phenotyping platforms, serving as essential tools for achieving precision breeding and intelligent crop management.
(5) CRISPR and Genome Editing (Topic 5): Stable but Gradually Maturing Development
As the only topic centered on molecular tools within intelligent crop breeding, Topic 5 entered a stable growth phase after 2020. Although its publication volume is lower than imaging- or data-driven themes, it exerts profound influence on precision breeding, functional gene analysis, and non-transgenic genetic improvement. With continued policy support and translational applications, this domain is expected to maintain upward momentum and play a key role in the molecular foundation of intelligent breeding systems.
Overall, the evolution of research topics in intelligent crop breeding reveals a pattern of co-development between theoretical foundations and technological applications. Genomic prediction and selection constitute the stable methodological core of the discipline, while the integration of remote sensing, image analytics, and deep learning has expanded the boundaries of breeding intelligence and efficiency. Looking ahead, the fusion of multimodal data, cross-scale modeling, and decision-support systems is expected to further deepen the interconnections among topics, advancing intelligent crop breeding toward a more integrated and higher-order stage of innovation.

3.4. Attention Dynamics

To further elucidate the activity level and changing attention trends among different research themes in intelligent crop breeding, this study calculated the slope values of publication trends for each topic in recent years and visualized them through a heat spectrum (Figure 5). The slope represents the rate of increase or decrease in annual publication counts, while the heat spectrum uses a color gradient to intuitively display the distribution of “hot” and “cold” research topics.
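As a rough illustration of the slope calculation described above, the following sketch fits an ordinary least-squares line to a yearly series for each topic and reports its slope. The article does not state the fitting window or whether counts were normalized; the small magnitude of the reported slopes suggests a normalized series, so the example assumes each topic's annual share of publications. The function name `trend_slope` and all numbers are hypothetical.

```python
from statistics import mean


def trend_slope(years, values):
    """Ordinary least-squares slope of `values` regressed on `years`."""
    if len(years) != len(values) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    y_bar, v_bar = mean(years), mean(values)
    sxx = sum((y - y_bar) ** 2 for y in years)
    sxy = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    return sxy / sxx


# Hypothetical annual publication shares (assumption: share of all papers per year).
years = [2020, 2021, 2022, 2023, 2024]
shares = {
    "rising": [0.05, 0.06, 0.07, 0.08, 0.09],
    "flat": [0.04, 0.04, 0.04, 0.04, 0.04],
    "cooling": [0.12, 0.11, 0.10, 0.09, 0.08],
}

# One slope per topic; the sign separates growing from cooling topics.
slopes = {name: trend_slope(years, vals) for name, vals in shares.items()}
```

Positive, near-zero, and negative slopes would then map to the warm, neutral, and cool ends of a heat spectrum such as the one in Figure 5.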
(1) Topics with Increasing Attention
Topic 3 (ear/pod counting) and Topic 5 (CRISPR genome editing) show the highest slope values—0.0117 and 0.0106, respectively—indicating that they are currently the most rapidly growing and active research areas.
The rise of Topic 3 reflects the maturation of, and expanding demand for, image-based counting technologies in field applications, driven in particular by advances in lightweight neural networks that adapt more readily across platforms.
The increasing trend of Topic 5 demonstrates the transition of CRISPR technology from laboratory research to field-oriented breeding practices, with studies shifting from theoretical breakthroughs toward the validation of improved varieties in practical contexts.
Topic 4 (grain traits) also exhibits positive growth (slope = 0.0066), suggesting that the digital characterization of grain quality traits has become an emerging focus of industrial and applied breeding research.
(2) Topics with Stable Attention
Topic 7 (tassel detection) and Topic 6 (spike 3D modeling) display nearly neutral slopes (approximately 0), indicating a relatively stable trend.
These topics mainly represent technological tools, such as visual detection methods based on VGG16 or YOLOv5 architectures, which retain practical value in specific crops and scenarios but show limited potential for broad expansion, resulting in relatively flat attention dynamics.
(3) Topics with Decreasing Attention
Topic 2 (stress-tolerant breeding and root phenotyping), Topic 0 (genomic prediction), and Topic 1 (UAV remote sensing and phenotyping) all exhibit negative slopes, suggesting that these previously dominant research areas are entering a relative cooling phase.
The more pronounced decline of Topic 2 may be associated with the intrinsic challenges of root trait acquisition and long experimental cycles, highlighting the need for methodological breakthroughs in phenotype extraction and functional validation.
Although Topic 0 remains a foundational domain, its decline likely reflects the maturation of modeling frameworks and diminishing marginal innovation.
The slight decrease in Topic 1 could stem from the growing standardization of phenotypic imaging technologies and the resulting research homogenization. Future growth in this area may depend on the integration of multimodal sensing, decision-support systems, and adaptive data fusion approaches.
In summary, the attention landscape of intelligent crop breeding research is undergoing a stage-wise transition—from model construction and theoretical exploration toward practical tools and application-oriented innovation. Image-based counting technologies and genome editing studies have emerged as high-attention frontiers, whereas traditional genomic prediction and stress-tolerance breeding topics are experiencing mild declines. This trend suggests that future research should not only sustain theoretical depth but also strengthen technological integration and system-level innovation in real-world breeding contexts.

3.5. Quality Distribution (Q1–Q4 by Topic)

To evaluate the academic influence and research quality across different topics within the field of intelligent crop breeding, all publications were ranked by citation count and divided into four quartiles (Q1–Q4). Q1 comprises the 25% most-cited papers, whereas Q4 comprises the 25% least-cited papers. Figure 6 illustrates the distribution of paper counts across quartiles for each topic.
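The quartile stratification can be sketched as follows. This is a minimal example under stated assumptions: quartiles are formed by rank over the whole corpus, ties are broken by input order, and the paper records, field names, and helper names (`assign_quartiles`, `quartile_counts_by_topic`) are hypothetical. The article does not describe how ties or differences in publication age were handled.

```python
from collections import Counter, defaultdict


def assign_quartiles(papers):
    """Rank papers by citations (descending) and label rank-based quartiles Q1..Q4."""
    ranked = sorted(papers, key=lambda p: p["citations"], reverse=True)
    n = len(ranked)
    labelled = []
    for rank, paper in enumerate(ranked):
        quartile = 4 * rank // n + 1  # rank 0 is the most-cited paper
        labelled.append({**paper, "quartile": f"Q{quartile}"})
    return labelled


def quartile_counts_by_topic(labelled):
    """Count papers per quartile within each topic."""
    counts = defaultdict(Counter)
    for paper in labelled:
        counts[paper["topic"]][paper["quartile"]] += 1
    return counts


# Hypothetical records for illustration only.
papers = [
    {"id": "a", "topic": 0, "citations": 120},
    {"id": "b", "topic": 0, "citations": 45},
    {"id": "c", "topic": 1, "citations": 80},
    {"id": "d", "topic": 1, "citations": 10},
    {"id": "e", "topic": 3, "citations": 5},
    {"id": "f", "topic": 3, "citations": 2},
    {"id": "g", "topic": 5, "citations": 60},
    {"id": "h", "topic": 2, "citations": 0},
]

labelled = assign_quartiles(papers)
counts = quartile_counts_by_topic(labelled)
```

Because raw citation counts favor older papers, a variant that ranks within publication-year cohorts may be worth considering when comparing recent topics with established ones.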
(1) Topic 0 (Genomic Prediction and Modeling): The Most Influential Core Area
Topic 0 contains 134 papers in Q1, the highest among all topics, and exhibits broad representation across all quartiles. This indicates its foundational and systematic role in the academic landscape. The methodologies encompassed—such as genomic selection (GS), genotype-by-environment (G × E) modeling, and machine learning—are widely applied in practical breeding, demonstrating high generalizability and sustained citation potential.
(2) Topic 1 (Remote Sensing and Phenotypic Recognition): Broad Quality Distribution
Topic 1 shows a well-balanced distribution across quartiles, particularly with strong representation in Q1 (88 papers) and Q2 (92 papers). This reflects its ability to combine methodological innovation with engineering applicability. With continued advancements in UAV-based sensing, multimodal imaging, and deep learning models, this topic is increasingly recognized as a hallmark of application-driven research.
(3) Topic 2 (Stress-Tolerant Breeding and Root Analysis): Moderate but Variable Quality
Although Topic 2 performs reasonably well in Q1 (38 papers) and Q2 (43 papers), it also has a relatively high proportion of Q4 papers (55), suggesting substantial variation in research quality. This dispersion may be attributed to the inherent challenges of root phenotyping—such as data acquisition difficulty and low experimental reproducibility—indicating the need for greater standardization and validation in this field.
(4) Topics 3, 4, 6, and 7 (Organ Detection and Counting): Emerging Potential
These newer, tool-oriented topics have fewer highly cited papers but a growing body of work in Q3 and Q4, suggesting expanding research activity whose citation impact is still accumulating.
Topic 3 (ear/pod counting) includes 41 papers across Q3 and Q4, reflecting rapid adoption in field-based high-throughput phenotyping.
Topic 4 (grain traits) has a sizeable presence in Q4 (28 papers), consistent with the rising demand for digital grain quality assessment.
Topics 6 (spike 3D modeling) and 7 (tassel detection) have smaller publication volumes but already feature several Q1 papers—such as spike detection studies in Plant Methods—indicating technological innovation with strong future potential.
(5) Topic 5 (CRISPR and Genome Editing): Concentration of High-Quality Research
Although the total number of publications under Topic 5 is modest, it shows a high density of impactful papers, with 8 in Q1 and 10 in Q2. These works, often published in journals such as International Journal of Molecular Sciences and Current Issues in Molecular Biology, highlight the “small but strong” nature of this research direction—characterized by high precision, innovation, and transformative potential.
In summary, the research quality distribution in intelligent crop breeding reveals two distinct patterns:
  • Foundational and methodological topics (e.g., Topics 0 and 1) exhibit concentrated high-impact publications and deep research accumulation.
  • Tool-oriented and emerging topics (e.g., Topics 3–7) remain in developmental stages but show a growing diffusion trend across lower quartiles, with some technical studies poised to become highly cited breakthroughs.
This pattern suggests that while consolidating existing methodological strengths, future research should place greater emphasis on high-quality technological integration and the exploration of engineering-oriented translational pathways to drive innovation in intelligent crop breeding.

3.6. Journal Landscape (By Topic)

To further elucidate the knowledge dissemination pathways and publication ecology of intelligent crop breeding research, this study analyzed the distribution of topic-specific papers across major academic journals (Figure 7). The results reveal the preferred publication outlets, disciplinary orientations, and influence patterns associated with different research directions.
(1) Core Comprehensive Journals: Frontiers in Plant Science and Plant Methods
Frontiers in Plant Science serves as the primary publication platform for intelligent crop breeding research, with over 100 papers identified. The majority belong to Topic 0 (genomic prediction and modeling, 44 papers) and Topic 1 (image-based phenotyping, 37 papers), highlighting the journal’s emphasis on data-driven and high-throughput breeding approaches.
Plant Methods focuses on methodological innovation and tool development, encompassing multiple imaging and field-detection topics (Topics 1, 2, 3, and 6). It has become a key venue for research on image recognition, field phenotyping, and precision measurement technologies.
(2) Engineering and Remote Sensing Journals: Remote Sensing and Computers and Electronics in Agriculture
Remote Sensing plays a vital role in advancing UAV-based imaging, multispectral sensing, and segmentation techniques. It shows the highest concentration of Topic 1 papers, demonstrating the journal’s strong alignment with sensor-based crop monitoring.
Computers and Electronics in Agriculture serves as a major outlet for applied AI and automation studies, covering multiple imaging-related topics (Topics 1, 3, and 7). Its focus on machine learning, object counting, and recognition algorithms underscores the transition of intelligent breeding from conceptual innovation to field-scale implementation.
(3) Genetics and Breeding Journals: Theoretical and Applied Genetics, Crop Science, and Molecular Breeding
These journals remain central to traditional breeding and genomic research, primarily publishing works from Topics 0 and 2. Theoretical and Applied Genetics tends to feature studies on genotype–environment interaction modeling and genomic prediction, whereas Crop Science emphasizes stress-resilient traits and root phenotyping. Collectively, they represent the theoretical and methodological backbone of breeding science.
(4) Cross-Disciplinary Platforms in Phenomics and Agronomy: Plant Phenomics, Agronomy-Basel, and Plant Genome
Plant Phenomics integrates artificial intelligence with image-based phenotyping, with strong representation from Topics 1 and 4. It serves as a frontier journal linking data acquisition and intelligent analysis in crop breeding.
Agronomy-Basel and Agriculture-Basel accommodate a wide range of applied studies across multiple topics, reflecting their interdisciplinary scope and openness to methodological diversity.
(5) Molecular Tool-Oriented Journals: International Journal of Molecular Sciences
Although publishing a smaller number of papers, this journal plays a representative role for Topic 5 (CRISPR and genome editing), serving as a key outlet for molecular tool development and functional gene editing studies.
In summary, the journal landscape of intelligent crop breeding reveals a three-tier structure characterized by the following:
  • Comprehensive hubs (Frontiers in Plant Science, Plant Methods) that act as central platforms for cross-domain integration;
  • Technical and engineering outlets (Remote Sensing, Computers and Electronics in Agriculture) that bridge AI, sensing, and phenomics applications;
  • Specialized molecular and genetic journals that provide focused publication channels for advanced breeding tools and genomic methodologies.
This ecosystem demonstrates that intelligent crop breeding research has evolved into a multidisciplinary and multi-level knowledge network, integrating biological, computational, and engineering sciences. The findings also offer strategic insights for journal selection, research positioning, and interdisciplinary collaboration in the era of AI-assisted agriculture.

4. Discussion

4.1. Methodological Comparison

From a methodological perspective, this study applies the BERTopic model—built upon the Transformer architecture—to analyze the thematic evolution of intelligent crop breeding. By integrating the semantic representation capability of pre-trained language models with the HDBSCAN hierarchical clustering algorithm, BERTopic effectively captures deeper contextual semantics and produces more interpretable topic structures. Compared with traditional “bag-of-words” models such as Latent Dirichlet Allocation (LDA) [33], Probabilistic Latent Semantic Analysis (PLSA) [34], and Non-negative Matrix Factorization (NMF) [35], the BERTopic approach demonstrates superior performance in handling complex scientific texts—addressing common challenges related to semantic expressiveness, hyperparameter sensitivity, and interpretability of results. Nevertheless, as with most topic models, BERTopic remains sensitive to data quality, domain adaptation, and parameter selection [36]. To mitigate these issues, this study implemented a multi-stage text preprocessing pipeline specifically designed for the technical and terminological features of intelligent crop breeding, accompanied by extensive parameter tuning and validation to optimize model performance and ensure result reliability.
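For readers unfamiliar with how BERTopic turns clusters into keyword lists: after embedding and clustering, the documents in each cluster are pooled and terms are ranked with a class-based TF-IDF score, tf(t, c) · log(1 + A / f(t)), where A is the average number of words per class and f(t) is the frequency of term t across all classes. The sketch below reproduces only that scoring step on toy data. It uses raw term counts and whitespace tokenization as simplifying assumptions, omits the embedding and HDBSCAN stages, and should not be read as the authors' pipeline; the topic names and documents are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Score terms per topic with class-based TF-IDF: tf(t, c) * log(1 + A / f(t))."""
    # Pool all documents of a topic into one "class document" and count its terms.
    tf = {topic: Counter(" ".join(docs).lower().split())
          for topic, docs in docs_by_topic.items()}
    # f(t): frequency of term t summed over all classes.
    total = Counter()
    for term_counts in tf.values():
        total.update(term_counts)
    # A: average number of words per class.
    avg_words = sum(total.values()) / len(tf)
    return {
        topic: {t: n * math.log(1 + avg_words / total[t]) for t, n in term_counts.items()}
        for topic, term_counts in tf.items()
    }


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per topic (ties broken alphabetically)."""
    return {
        topic: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for topic, s in scores.items()
    }


# Hypothetical clustered abstracts, already assigned to two topics.
docs_by_topic = {
    "genomic_prediction": ["genomic prediction model accuracy", "genomic selection model"],
    "uav_phenotyping": ["uav image model", "uav multispectral image"],
}

keywords = top_terms(c_tf_idf(docs_by_topic), k=3)
```

In this toy corpus the term "model" occurs in both classes, so its weight is discounted and it drops out of the top three keywords for the UAV topic, which is the behavior that makes the resulting keyword lists topic-specific.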
In addition, this study introduces a more comprehensive and systematic analytical framework that extends beyond conventional topic detection. By jointly examining thematic identification, temporal evolution, and journal distribution, the framework enables a multidimensional characterization of knowledge dynamics within the field. The analytical design incorporates fine-grained quantitative metrics to capture variations in topic attention and research quality distribution with precision. These methodological innovations enhance the analytical depth and breadth of topic evolution studies, addressing limitations in earlier works concerning analytical dimensions, granularity, and presentation.
Furthermore, in contrast to prior research that focused on individual crops—such as maize [37], wheat [38], or rice [39]—or on specific technologies such as synthetic biology [40], this study performs a panoramic analysis across the entire intelligent crop breeding domain based on large-scale scientific literature. It identifies topic-specific attention patterns, evaluates research quality differences, and reveals the temporal and publication-related characteristics of each theme. This multidimensional and data-driven approach offers a holistic perspective for understanding the developmental trajectory of intelligent crop breeding and provides methodological insights for future bibliometric and semantic analyses in agricultural informatics.
To validate the reliability of the topic structure, an auxiliary Latent Dirichlet Allocation (LDA) model was trained using the same corpus and preprocessing pipeline. LDA models with k = 1–15 topics were evaluated on coherence (C_v) and perplexity, both of which stabilized near k = 8 (Supplementary Figure S2). The resulting topics largely overlapped with those generated by BERTopic, encompassing major domains such as genomic prediction, remote sensing phenotyping, and genome editing.
However, differences in granularity and boundary clarity were observed. LDA tended to produce broader, high-level topics (e.g., molecular breeding), while BERTopic’s embedding-based clustering decomposed them into finer subthemes such as G × E modeling, CRISPR editing, and spike image segmentation. BERTopic also displayed smoother temporal evolution, reflecting the semantic continuity of transformer embeddings.
The supplementary LDA results provide an independent validation of the topic structure derived from BERTopic. Despite methodological differences, both models capture the same thematic cores—genomic prediction, phenotyping, and genome editing—indicating that the observed knowledge structure is not model-dependent but reflects genuine semantic organization within the intelligent crop breeding literature. The LDA coherence and perplexity metrics (Supplementary Table S7) further demonstrate model stability and interpretive alignment.
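One simple way to quantify the overlap between two topic models is to compare their top-keyword sets. The sketch below matches each topic from one model to its most similar topic in the other using Jaccard similarity. The article does not specify how overlap was measured, so the choice of Jaccard, the helper names, and the keyword lists are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(topics_a, topics_b):
    """For each topic in model A, find the most similar topic in model B."""
    matches = {}
    for name_a, words_a in topics_a.items():
        name_b, score = max(
            ((nb, jaccard(words_a, wb)) for nb, wb in topics_b.items()),
            key=lambda pair: pair[1],
        )
        matches[name_a] = (name_b, score)
    return matches


# Hypothetical top keywords from two models trained on the same corpus.
bertopic_topics = {
    "T0": ["genomic", "prediction", "selection", "environment"],
    "T5": ["crispr", "editing", "cas9", "genome"],
}
lda_topics = {
    "L1": ["genomic", "prediction", "selection", "marker"],
    "L2": ["crispr", "editing", "gene", "genome"],
    "L3": ["uav", "image", "yield", "canopy"],
}

matches = best_matches(bertopic_topics, lda_topics)
mean_overlap = sum(score for _, score in matches.values()) / len(matches)
```

The same comparison applies to repeated runs of one model with different random seeds, where a high mean overlap would indicate stable topics.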

4.2. Theme Identification

Based on the in-depth results of BERTopic modeling, this study systematically identifies eight core research themes in intelligent crop breeding, each characterized by a distinct set of representative keywords. Overall, the thematic structure of the field can be classified into three major technological directions: molecular breeding technologies, phenomics and sensing technologies, and stress-resilient breeding for quality improvement.
(1) Molecular Breeding Technologies: Genomic Prediction and CRISPR-Based Editing (Topics 0 and 5)
The first direction centers on genomic prediction and genotype–environment modeling (Topic 0) and CRISPR genome editing (Topic 5), representing the core of molecular breeding innovation. Genomic prediction integrates multi-omics datasets—such as genomics and epigenomics—with environmental variables to construct adaptive predictive models that significantly enhance the accuracy and robustness of trait estimation [41]. Meanwhile, CRISPR-based gene editing offers a revolutionary tool for targeted improvement of agronomic traits through precise and efficient genome modification [42]. The integration of these frontier techniques is driving a paradigm shift from traditional empirical breeding to data-driven, molecularly informed intelligent breeding, providing critical technological support for precision crop improvement.
(2) Phenomics and Sensing Technologies: UAV Imaging, Trait Evaluation, and 3D Modeling (Topics 1, 3, 4, 6, and 7)
The second major direction encompasses UAV-based remote sensing and multimodal phenotyping (Topic 1), grain trait evaluation (Topic 4), ear/pod counting (Topic 3), spike structure recognition (Topic 6), and tassel detection and developmental staging (Topic 7). These studies collectively constitute an integrated phenomics system that spans multiple spatial scales—from field-level crop populations to organ-level structural analysis—through the fusion of multimodal sensing and deep learning. UAV remote sensing enables dynamic monitoring of field-scale crop growth parameters, while computer vision-based methods facilitate automated counting of organs under complex environmental conditions. Furthermore, 3D point cloud reconstruction technologies allow accurate quantification of organ-level morphological traits [25,43,44]. Together, these approaches provide a technical foundation for deciphering genotype–phenotype–environment (G × P × E) interactions and improving the efficiency of high-throughput phenotyping.
(3) Stress-Resilient and Quality-Oriented Breeding: Multi-Omics Integration (Topic 2)
The third thematic direction focuses on stress-tolerant breeding (Topic 2), which aims to enhance crop quality and adaptability by elucidating molecular regulatory networks underlying responses to abiotic stresses such as drought and salinity [45]. Through the integration of genomics, transcriptomics, and metabolomics, this research direction establishes a comprehensive analytical chain linking genotypes to phenotypes, thus providing a theoretical and technical foundation for resilient and high-quality crop improvement.
In summary, the field of intelligent crop breeding has developed a relatively coherent and systematic methodological framework:
  • Molecular breeding technologies (e.g., genomic prediction and gene editing) establish the theoretical foundation for trait improvement;
  • Phenomics and AI-driven sensing technologies enable precise, scalable phenotypic analysis;
  • Stress-resilient breeding ensures the stability and applicability of improved varieties under challenging environments.
This multi-disciplinary and data-integrated innovation model is propelling crop breeding toward a new paradigm of designed breeding—transitioning from experience-based selection to intelligent, model-driven optimization. It provides a systematic technological roadmap for developing high-yield, high-quality, and stress-resilient crop varieties in the era of smart agriculture.
While the identified themes illustrate a coherent multi-level structure from molecular to field scales, their co-occurrence may not necessarily imply causal interdependence. An alternative explanation is that these research directions have evolved semi-independently within distinct epistemic communities—molecular genetics, computer vision, and agronomy—that occasionally intersect through shared data infrastructures. Distinguishing between genuine co-development and parallel advancement requires longitudinal citation or co-authorship network analyses, which could be explored in future work.

4.3. Topic Trends and Evolution

From the temporal analysis, research in intelligent crop breeding demonstrates a co-evolutionary trend between theoretical foundations and technological applications.
At the fundamental research level, genomic prediction and genotype–environment modeling (Topic 0) and CRISPR-based genome editing (Topic 5) have established a solid molecular foundation for intelligent breeding. Genomic selection technologies have evolved from traditional marker-assisted selection to genome-wide predictive modeling [46], while the maturation of CRISPR/Cas9 and related editing systems has driven the rapid advancement of precision breeding [47]. These molecular tools collectively underpin the transition from empirical to data-driven, design-oriented breeding strategies.
At the applied research level, remote sensing (Topic 1) and organ-level recognition and counting (Topics 3, 4, 6, and 7) have shown strong upward momentum. The integration of deep learning with phenomics has emerged as a key engine driving progress in intelligent breeding, enabling large-scale trait quantification, automated phenotypic evaluation, and enhanced decision-making support [48].
Notably, environmental pressures such as climate change have profoundly influenced the evolution of research themes. Stress-tolerant breeding and root phenotyping (Topic 2) peaked around 2024, underscoring the central role of environmental adaptability in breeding under changing climate conditions [49]. This trend highlights how environmental challenges shape the scientific trajectory and practical priorities of crop breeding research.
Furthermore, interdisciplinary integration has become a major driver of innovation in intelligent breeding. The fusion of computer vision and crop science has generated a new generation of phenotyping techniques [50], while the incorporation of Internet of Things (IoT) technologies has enabled intelligent monitoring and management of breeding processes [51]. These cross-disciplinary innovations not only expand the research frontiers but also accelerate the convergence of biotechnology, artificial intelligence, and big data into a unified paradigm of intelligent and sustainable crop breeding.
The observed “co-development pattern” between genomic prediction (T0) and phenomics (T1) may reflect two competing mechanisms: (i) methodological coupling, where advances in phenotype sensing drive genomic model refinement; or (ii) sequential diffusion, where one matured domain (e.g., phenomics) stabilizes while newer ones (e.g., CRISPR, T5) rise to prominence. Alternatively, the apparent synchronization could partially stem from publication practices—e.g., shifting editorial focus toward integrative AI-driven breeding studies. Testing these scenarios would require temporal cross-correlation analyses of topic co-occurrence and citation dynamics.
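A temporal cross-correlation test of the kind suggested above could start from a lagged Pearson correlation between two annual topic series. The sketch below is a minimal version under stated assumptions: the series are hypothetical yearly counts, the second series is constructed to follow the first by one year, and no detrending is applied. Real publication series share strong upward trends, so differencing or another form of detrending would likely be needed before interpreting any lag.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / (sxx * syy) ** 0.5


def lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in -max_lag..max_lag."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        result[lag] = pearson(xs, ys)
    return result


# Hypothetical annual counts: `genomic` repeats `phenomics` one year later.
phenomics = [1, 3, 2, 5, 4, 7, 6, 9]
genomic = [0, 1, 3, 2, 5, 4, 7, 6]

corr = lagged_correlation(phenomics, genomic, max_lag=2)
best_lag = max(corr, key=corr.get)
```

A peak at a positive lag would be consistent with the first series leading the second, whereas a peak at lag zero would point to synchronous movement; neither outcome establishes a causal mechanism on its own.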

4.4. Topic Attention Dynamics

The analysis of topic attention dynamics reveals that research in intelligent crop breeding is undergoing a shift toward an application-oriented paradigm driven by industrial and practical demands. Specifically, rapid growth is observed in CRISPR and genome editing (Topic 5), ear/pod counting (Topic 3), and grain trait evaluation (Topic 4), highlighting yield improvement and quality enhancement as the primary technological drivers of current innovation [52]. These emerging hotspots indicate that the focus of research has increasingly moved from theoretical development to problem-oriented technological implementation. Meanwhile, tassel detection and developmental staging (Topic 7) and spike structure recognition and 3D modeling (Topic 6) have maintained stable attention levels, providing sustained methodological support for precision phenotyping and morphological quantification [53]. These stable topics represent the consolidation of technical foundations that enable accurate and scalable analysis of crop phenotypes. In contrast, traditional core areas such as genomic prediction and genotype–environment modeling (Topic 0), stress-tolerant breeding and root phenotyping (Topic 2), and UAV-based phenotypic estimation (Topic 1) remain highly productive in terms of publication volume but exhibit declining momentum in attention. This trend suggests that these foundational methodologies have entered a stage of maturity, requiring new technological breakthroughs and cross-domain integration to reinvigorate research vitality and innovation potential.
Overall, the evolving pattern of topic attention reflects a systemic transition in intelligent crop breeding—from foundational methodological exploration toward application-driven research focused on key agronomic traits and production efficiency. This transformation underscores a clear industry-oriented trajectory, where scientific inquiry and technological advancement are increasingly guided by the practical needs of sustainable and precision agriculture.
The plateau observed in foundational topics (T0, T1) may indicate either methodological saturation or publication bias favoring emergent technologies. Similarly, the rapid rise of application-oriented topics (T3–T5) could represent genuine technological diffusion or transient enthusiasm tied to funding trends. Distinguishing these drivers will require triangulating bibliometric indicators with research funding data and field experiment adoption rates.

4.5. Quality Stratification by Topic

The analysis of research quality across topics reveals a clear pattern of quality stratification within the field of intelligent crop breeding. Genomic prediction and genotype–environment modeling (Topic 0) contains the largest number of Q1 papers, confirming its strong foundation in algorithmic innovation and its central role in shaping the academic direction of the field. UAV-based remote sensing and multimodal phenotyping (Topic 1) shows a balanced distribution across quartiles, reflecting the methodological maturity and general applicability of multimodal sensing and phenotypic interpretation technologies.
In contrast, stress-tolerant breeding and root phenotyping (Topic 2) displays a more dispersed quality distribution, suggesting that challenges remain regarding experimental standardization and result reproducibility. Notably, while CRISPR and genome editing (Topic 5) has a relatively small total publication volume, the high density of papers within the Q1–Q2 quartiles underscores its breakthrough potential and growing influence in molecular design breeding. Emerging image-based technologies—such as ear/pod counting, grain trait analysis, spike structure modeling, and tassel detection (Topics 3, 4, 6, and 7)—have accumulated a substantial number of studies within Q3–Q4, yet the limited presence of high-impact publications indicates that these application-oriented technologies still need to evolve from methodological innovation toward theoretical depth and generalization.
This pattern of quality distribution highlights both the diversity of technological pathways within the field and the differentiated developmental trajectories among research directions. Core algorithmic and model-oriented studies tend to generate high-quality outputs and academic influence, whereas emerging cross-disciplinary and application-oriented technologies require longer research cycles, iterative optimization, and sustained resource investment to realize their full scientific and industrial potential.
The quality stratification across topics might also be influenced by disciplinary gatekeeping—with journals in genetics and molecular biology maintaining stricter methodological standards than engineering-oriented outlets. Alternatively, the concentration of Q1 papers in T0–T1 may simply reflect the maturity and citation density of those subfields. Future longitudinal analyses could examine whether emergent application topics (T3–T7) achieve similar citation trajectories as they mature.

4.6. Journal Landscape and Thematic Distribution

The distribution of research topics across journals reveals the diverse pathways of knowledge production and dissemination in intelligent crop breeding. Distinct publication patterns can be observed between theoretical research and applied technological studies, reflecting a clear differentiation of journal orientations.
At the theoretical research level, genomic selection and predictive modeling (Topic 0) is primarily published in traditional breeding journals such as Theoretical and Applied Genetics, underscoring its foundational role in classical breeding theory and methodological innovation. At the technological application level, topics such as remote sensing and phenotypic recognition (Topic 1) are mainly featured in specialized journals like Remote Sensing, highlighting their strong engineering and applied focus. Similarly, CRISPR and genome editing (Topic 5), despite its relatively smaller publication volume, demonstrates high activity in molecular-focused outlets such as the International Journal of Molecular Sciences, reflecting its deep exploration at the molecular and cellular levels.
Of particular note, emerging imaging-based technologies—including ear/pod counting, grain trait analysis, spike structure recognition, and tassel detection (Topics 3, 4, 6, and 7)—are frequently published in cross-disciplinary journals such as Plant Methods. This pattern indicates that these subfields are currently in a stage of technological convergence and rapid development, where computational tools and biological applications are increasingly integrated. Moreover, Frontiers in Plant Science stands out as a comprehensive and integrative publication platform, encompassing all eight identified topics and serving as a key venue that bridges fundamental research with applied innovation in intelligent breeding.
Overall, the journal landscape of intelligent crop breeding exhibits a complementary relationship between specialized and comprehensive journals. Specialized journals ensure the depth, rigor, and continuity of domain-specific research, while comprehensive journals promote cross-disciplinary exchange and methodological integration. This division of labor and synergy between journal types not only reflects the interdisciplinary nature of intelligent crop breeding but also provides an observational lens for tracking disciplinary evolution and thematic convergence. In this sense, the interaction between specialized and integrative publishing ecosystems forms the backbone of knowledge diffusion, supporting the dynamic co-evolution of science, technology, and agricultural innovation.
These patterns should nonetheless be interpreted cautiously. Although BERTopic reveals synchronized growth of molecular and phenomic domains, the apparent “co-evolution” could emerge from shared methodological vocabulary (e.g., “prediction”, “segmentation”) rather than genuine knowledge transfer. Future multimodal analyses integrating full-text and citation contexts could disentangle conceptual from terminological coupling.

4.7. Future Research Directions and Testable Predictions

Building on the above interpretations, three testable hypotheses are proposed for future validation:
  • Convergence hypothesis: Application-oriented topics (T3–T7) will converge in growth rate toward foundational topics (T0–T1) by 2030, indicating field stabilization.
  • Hybridization hypothesis: Emerging cross-cutting topics combining molecular and imaging techniques (e.g., “multi-omics phenotyping”) will form new clusters detectable via longitudinal BERTopic or dynamic embedding models.
  • Bias correction hypothesis: Inclusion of non-English (especially Chinese) and gray literature will shift topic proportions toward stress-resilient and locally adaptive breeding research.
Testing these predictions through expanded multilingual and multi-source datasets will enhance the generalizability and foresight value of the present findings.
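As one way to make the convergence hypothesis operational, the following minimal sketch compares size-normalized growth rates between two topic groups over a fixed window. It is an illustration under stated assumptions rather than the procedure used in this study: the function names, the normalization by mean annual output, and all counts are hypothetical.

```python
def annual_slope(years, counts):
    """Ordinary least-squares slope of annual publication counts (papers per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    return sxy / sxx


def relative_growth(years, counts):
    """Slope divided by mean annual count, so groups of different size are comparable.

    Assumption: this normalization is chosen for illustration only.
    """
    mean_y = sum(counts) / len(counts)
    return annual_slope(years, counts) / mean_y


def growth_gap(years, applied_counts, foundational_counts):
    """Absolute difference in relative growth between the two topic groups."""
    return abs(relative_growth(years, applied_counts)
               - relative_growth(years, foundational_counts))


# Hypothetical annual counts, NOT taken from the study.
years = [2020, 2021, 2022, 2023, 2024, 2025]
applied = [10, 14, 18, 22, 26, 30]        # e.g., pooled T3–T7
foundational = [60, 62, 64, 66, 68, 70]   # e.g., pooled T0–T1

gap = growth_gap(years, applied, foundational)
print(f"relative growth gap: {gap:.3f}")
```

Under this reading, the hypothesis would be supported if the gap, recomputed over successive windows up to 2030, shrinks toward zero.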
Beyond scientific trajectories, the observed topic trends also reflect broader policy levers driving agricultural innovation. The accelerating integration of AI and genomics aligns with national and institutional digital-transition programs promoting data-driven research infrastructure. The expansion of image-based and computational phenotyping underscores the importance of reskilling initiatives for agricultural scientists and breeders to effectively adopt digital tools and analytical workflows. Moreover, the convergence of molecular, engineering, and informatics domains highlights the growing need for cross-sector collaboration mechanisms, linking public research institutes, technology firms, and policy agencies. Strengthening these enablers will be essential to sustain innovation momentum and ensure inclusive capacity building within the intelligent crop breeding ecosystem.

4.8. Limitations

Despite its comprehensive scope, this study has several methodological limitations that warrant consideration.
(1) Topic model reproducibility: Like most unsupervised clustering methods, BERTopic is sensitive to random initialization and stochastic sampling. To assess model stability, three additional runs with different random seeds were conducted, yielding an average keyword-overlap ratio of 0.85, which indicates moderate but not perfect reproducibility. This highlights the importance of reporting model configurations and stability metrics in future studies.
(2) Granularity trade-offs: The choice of eight topics balanced interpretability and semantic coverage but inevitably simplified internal thematic diversity. A finer configuration (e.g., 12–15 topics) might reveal subdomains such as multi-omics integration or deep phenotyping, whereas a coarser structure could obscure emerging boundaries. Dynamic topic modeling could help adapt granularity to capture conceptual evolution more effectively.
(3) Citation and data-source bias: The citation quartile classification (Q1–Q4) is subject to temporal bias, as older papers have had more time to accumulate citations. Moreover, the analysis relies primarily on English-language publications indexed in the Web of Science and uses abstracts rather than full texts, potentially underrepresenting local or emerging studies.
(4) Venue and labeling effects: Some apparent topic trends may partly result from publication venue effects rather than genuine scientific reorientation. For instance, the strong growth of phenomics-related studies may have been amplified by journal specialization (e.g., Remote Sensing, Plant Methods) emphasizing image-based research. Likewise, broad topic labels such as “deep learning” may group conceptually distinct approaches. Integrating multiple data sources (text, citation, and funding) could help disentangle such structural biases.
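The seed-stability check in item (1) can be illustrated with a small sketch. The exact definition of the keyword-overlap ratio is not stated here, so the code below assumes one plausible variant: each topic of a reference run is matched to its most similar topic in a rerun by Jaccard overlap of top keywords, and the scores are averaged. All function names and keyword lists are hypothetical.

```python
def jaccard(keywords_a, keywords_b):
    """Jaccard overlap between two keyword lists (1.0 if both are empty)."""
    a, b = set(keywords_a), set(keywords_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def run_overlap(reference, other):
    """Mean, over reference topics, of the best Jaccard match among topics in `other`.

    Best-match pairing is used because topic indices are not aligned across seeds.
    """
    return sum(max(jaccard(t, u) for u in other) for t in reference) / len(reference)


def mean_stability(reference, reruns):
    """Average run_overlap of the reference run against each rerun."""
    return sum(run_overlap(reference, r) for r in reruns) / len(reruns)


# Toy keyword lists, NOT taken from the study.
reference = [["genomic", "prediction", "selection", "marker"],
             ["uav", "multispectral", "canopy", "yield"]]
rerun_a = [["uav", "multispectral", "canopy", "biomass"],
           ["genomic", "prediction", "selection", "marker"]]
rerun_b = [["genomic", "prediction", "selection", "snp"],
           ["uav", "multispectral", "canopy", "yield"]]

score = mean_stability(reference, [rerun_a, rerun_b])
print(f"mean keyword overlap: {score:.2f}")
```

Other definitions (for example, the overlap coefficient, or an optimal one-to-one assignment of topics) would give different values, which is one reason to report the metric definition alongside the score.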
In addition, the exclusion of non-English and gray literature (e.g., conference papers and preprints) may bias the dataset toward mature, peer-reviewed studies. Although these documents often lack standardized abstracts or metadata, their omission could overlook early signals of emerging research themes. Future work may address these issues by incorporating multilingual corpora, open repositories, and dynamic topic-evolution models to improve robustness and temporal foresight.
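The temporal citation bias noted in item (3) could be reduced by ranking papers only against others published in the same year. The sketch below shows this idea in its simplest form; it is not the classification used in this study, and the record layout, function name, and citation counts are assumptions made for illustration.

```python
from collections import defaultdict


def within_year_quartiles(papers):
    """Assign Q1..Q4 by citation rank within each publication year.

    papers: iterable of (paper_id, year, citations).
    Q1 is the most-cited quarter of a year's papers, Q4 the least-cited.
    """
    by_year = defaultdict(list)
    for paper_id, year, citations in papers:
        by_year[year].append((citations, paper_id))

    labels = {}
    for items in by_year.values():
        # Most cited first; the paper id breaks ties deterministically.
        items.sort(key=lambda t: (-t[0], t[1]))
        n = len(items)
        for rank, (_citations, paper_id) in enumerate(items):
            labels[paper_id] = f"Q{rank * 4 // n + 1}"
    return labels


# Hypothetical records, NOT taken from the study.
papers = [
    ("a", 2015, 200), ("b", 2015, 150), ("c", 2015, 90), ("d", 2015, 40),
    ("e", 2024, 12), ("f", 2024, 8), ("g", 2024, 3), ("h", 2024, 1),
]
labels = within_year_quartiles(papers)
print(labels["d"], labels["e"])
```

In this toy example, paper "e" has fewer citations than every 2015 paper yet is the most cited of its own cohort, so a pooled ranking and a within-year ranking would place it at opposite ends.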
To partly mitigate these limitations, several validation procedures were conducted to support interpretive reliability, as summarized below.

4.9. External Validation of Topic Interpretation

To ensure the reliability of topic identification and interpretation, a three-step external validation was conducted.
(1) Independent expert labeling: Two domain experts in crop breeding and agricultural informatics independently reviewed the top 10 keywords and three representative papers for each topic. Using a blind-labeling protocol, they assigned thematic titles without access to the authors’ labels. The inter-rater agreement was high (Cohen’s kappa, κ = 0.82), indicating strong interpretive stability.
(2) Thesaurus cross-checking: All topic keywords were cross-referenced with two established agricultural thesauri, FAO AGROVOC and the CAB Thesaurus. Over 90% of top-ranked keywords were recognized entries, and their hierarchical positions (e.g., “genomic selection” under “plant breeding”) were consistent with domain standards, supporting the conceptual soundness of the BERTopic taxonomy.
(3) Qualitative corpus verification: For each topic, a random sample of 10–15 articles was manually inspected to verify semantic consistency between titles, abstracts, and assigned labels. Approximately 88% of samples showed strong alignment, while a small fraction displayed overlap between adjacent themes (e.g., phenotyping vs. stress-tolerant breeding).
Collectively, these validation procedures indicate that the BERTopic-derived clusters correspond closely to recognized disciplinary boundaries and reflect genuine semantic structures in the literature. Future research could extend this approach by including larger expert panels and automated ontology-matching pipelines to further enhance cross-domain interpretability.
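For readers who wish to reproduce the agreement statistic in step (1), Cohen's kappa can be computed directly from two raters' category assignments as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below assumes that the experts' free-text titles have already been mapped to a shared set of category labels; that mapping step and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items with nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels for eight topics, NOT the experts' actual assignments.
expert_1 = ["genomic", "uav", "stress", "counting", "grain", "crispr", "spike", "tassel"]
expert_2 = ["genomic", "uav", "stress", "counting", "grain", "crispr", "counting", "tassel"]

kappa = cohens_kappa(expert_1, expert_2)
print(f"kappa = {kappa:.3f}")
```

With only eight items, a single disagreement changes kappa considerably, which is consistent with the suggestion above to use larger expert panels in future work.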

5. Conclusions

This study systematically mapped the thematic evolution of intelligent crop breeding (1995–2025) using the BERTopic model, revealing its structural characteristics and development trajectory. The findings show that phenomics-oriented technologies—including UAV-based remote phenotyping, grain trait evaluation, ear/pod counting, spike structure recognition, and tassel detection—are emerging as the most dynamic research frontiers. Meanwhile, molecular breeding approaches, represented by genomic prediction, genotype–environment modeling, and CRISPR-based genome editing, remain the methodological core driving precision and efficiency in crop improvement. Stress-tolerant breeding that enhances environmental adaptability also continues to serve as a major research hotspot in response to global climate challenges.
From a temporal perspective, intelligent crop breeding has experienced three developmental stages: an exploratory phase (2010–2015), a steady growth phase (2016–2020), and a rapid expansion phase (2021–2025). Overall, the field demonstrates four defining features: (1) deep interdisciplinary integration linking molecular biology, computer science, and agricultural engineering; (2) a strong application-oriented focus driven by practical breeding needs; (3) high-impact outputs in core technologies; and (4) emerging cross-disciplinary methods that are still consolidating in quality and influence. The synergy between specialized and comprehensive journals further indicates the establishment of a supportive publication ecosystem for interdisciplinary knowledge exchange.
Methodologically, the integration of transformer-based topic modeling with bibliometric indicators provides a replicable framework for analyzing the semantic evolution of complex interdisciplinary domains. This unified BERTopic–bibliometric approach offers practical value for research planning, journal selection, and policy formulation. Future work should extend this framework by incorporating multilingual and open-access datasets, integrating citation and funding information for dynamic topic modeling, and linking bibliometric insights with real-world breeding outcomes to strengthen evidence-based innovation governance in intelligent agriculture.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture15222373/s1, Figure S1: Workflow of literature identification and screening based on the Web of Science Core Collection; Figure S2: Evaluation metrics of the LDA topic model; Table S1: Web of Science search strategy and query syntax for intelligent crop breeding; Table S2: Synonym harmonization dictionary for domain-specific terminology; Table S3: Key parameter settings of the BERTopic modeling pipeline; Table S4: Topic-level keyword statistics and c-TF-IDF scores; Table S5: Topic similarity matrix based on cosine similarity; Table S6: Mann–Kendall trend test results for annual publication counts by topic; Table S7: Coherence and perplexity metrics of LDA models with different topic numbers.

Author Contributions

Conceptualization, Y.W. and X.L.; methodology, Y.W. and X.L.; software, X.L., J.Z., and J.L. (Jiajia Liu); validation, Y.W. and X.L.; formal analysis, Y.W. and X.L.; investigation, Y.W., X.L., J.Z., and J.L. (Jie Lei); resources, Y.W., X.L., Q.W., and A.Z.; data curation, X.L. and J.Z.; writing—original draft preparation, Y.W. and X.L.; writing—review and editing, Y.W., X.L., J.Z., and A.Z.; visualization, X.L. and J.Z.; supervision, A.Z.; project administration, Y.W. and A.Z.; funding acquisition, A.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Biological Breeding National Science and Technology Major Project (No. 2022ZD04017), the Agricultural Science and Technology Innovation Program (CAAS-ZDRW202503), the Central Public-Interest Scientific Institution Basal Research Fund (No. JBYW-AII-2024-34; Y2025ZZ08), and the Knowledge Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2025-AII).

Data Availability Statement

The datasets generated and analyzed during the current study are available in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors would like to express their sincere gratitude to all co-authors and anonymous reviewers for their constructive suggestions and valuable feedback, which have significantly improved the quality and clarity of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, Y.; Huang, G.; Zhao, Y.; Lu, X.; Wang, Y.; Wang, C.; Guo, X.; Zhao, C. Revolutionizing crop breeding: Next-generation artificial intelligence and big data-driven intelligent design. Engineering 2025, 44, 245–255. [Google Scholar] [CrossRef]
  2. Zhu, W.; Duan, K.; Li, X.; Yu, K.; Shao, C. Rapid Detection of Key Phenotypic Parameters in Wheat Grains Using Linear Array Camera. Appl. Sci. 2025, 15, 5484. [Google Scholar] [CrossRef]
  3. Zhu, W.; Li, W.; Zhang, H.; Li, L. Big data and artificial intelligence-aided crop breeding: Progress and prospects. J. Integr. Plant Biol. 2025, 67, 722–739. [Google Scholar] [CrossRef]
  4. Shahzad, A.; Ullah, S.; Dar, A.A.; Sardar, M.F.; Mehmood, T.; Tufail, M.A.; Shakoor, A.; Haris, M. Nexus on climate change: Agriculture and possible solution to cope future climate change stresses. Environ. Sci. Pollut. Res. 2021, 28, 14211–14232. [Google Scholar] [CrossRef]
  5. Rashid, M.A.R.; Atif, R.M.; Zhao, Y.; Azeem, F.; Ahmed, H.G.M.-D.; Pan, Y.; Li, D.; Zhao, Y.; Zhang, Z.; Zhang, H. Dissection of genetic architecture for tiller angle in rice (Oryza sativa L.) by multiple genome-wide association analyses. PeerJ 2022, 10, e12674. [Google Scholar] [CrossRef]
  6. Riaz, M.; Yasmeen, E.; Saleem, B.; Hameed, M.K.; Saeed Almheiri, M.T.; Saeed Al Mir, R.O.; Alameri, G.; Khamis Alghafri, J.S.; Gururani, M.A. Evolution of agricultural biotechnology is the paradigm shift in crop resilience and development: A review. Front. Plant Sci. 2025, 16, 1585826. [Google Scholar] [CrossRef]
  7. Xie, Q.; Zhang, X.; Song, M. A network embedding-based scholar assessment indicator considering four facets: Research topic, author credit allocation, field-normalized journal impact, and published time. J. Informetr. 2021, 15, 101201. [Google Scholar] [CrossRef]
  8. Najafabadi, M.Y.; Jackson, S.A. Hybrid AI in synthetic biology: Next era in agriculture. Trends Plant Sci. 2025. [Google Scholar] [CrossRef] [PubMed]
  9. Rejeb, A.; Rejeb, K.; Hassoun, A. The impact of machine learning applications in agricultural supply chain: A topic modeling-based review. Discov. Food 2025, 5, 141. [Google Scholar] [CrossRef]
  10. Lin, Q.; Xin, Z.; Peng, S.; Zhao, R.; Nie, Y.; Chen, Y.; Yin, X.; Xian, G.; Zhang, Q. Research on Topic Mining and Evolution Trends of Functional Agriculture Based on the BERTopic Model. Agriculture 2024, 14, 1691. [Google Scholar] [CrossRef]
  11. Liu, Y.; Wan, F. Unveiling temporal and spatial research trends in precision agriculture: A BERTopic text mining approach. Heliyon 2024, 10, e36808. [Google Scholar] [CrossRef] [PubMed]
  12. Sott, M.K.; Nascimento, L.D.S.; Foguesatto, C.R.; Furstenau, L.B.; Faccin, K.; Zawislak, P.A.; Mellado, B.; Kong, J.D.; Bragazzi, N.L. A Bibliometric Network Analysis of Recent Publications on Digital Agriculture to Depict Strategic Themes and Evolution Structure. Sensors 2021, 21, 7889. [Google Scholar] [CrossRef] [PubMed]
  13. Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794. [Google Scholar]
  14. Raman, R.; Pattnaik, D.; Hughes, L.; Nedungadi, P. Unveiling the dynamics of AI applications: A review of reviews using scientometrics and BERTopic modeling. J. Innov. Knowl. 2024, 9, 100517. [Google Scholar] [CrossRef]
  15. Meier, F.; Dixon, T.; Williams, T.; Paulsen, I. Navigating the Frontier of Synthetic Biology: An AI-Driven Analytics Platform for Exploring Research Trends and Relationships. ACS Synth. Biol. 2023, 12, 3229–3241. [Google Scholar] [CrossRef]
  16. Pazhouhan, M.; Karimi Mazraeshahi, A.; Jahanbakht, M.; Rezanejad, K.; Rohban, M.H. Wave and Tidal Energy: A Patent Landscape Study. J. Mar. Sci. Eng. 2024, 12, 1967. [Google Scholar] [CrossRef]
  17. Fernandes, I.K.; Vieira, C.C.; Dias, K.O.G.; Fernandes, S.B. Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials. Theor. Appl. Genet. 2024, 137, 189. [Google Scholar] [CrossRef]
  18. Thudi, M.; Palakurthi, R.; Schnable, J.C.; Chitikineni, A.; Dreisigacker, S.; Mace, E.; Srivastava, R.K.; Satyavathi, C.T.; Odeny, D.; Tiwari, V.K.; et al. Genomic resources in plant breeding for sustainable agriculture. J. Plant Physiol. 2021, 257, 153351. [Google Scholar] [CrossRef]
  19. Shu, M.Y.; Fei, S.P.; Zhang, B.Y.; Yang, X.H.; Guo, Y.; Li, B.G.; Ma, Y.T. Application of UAV Multisensor Data and Ensemble Approach for High-Throughput Estimation of Maize Phenotyping Traits. Plant Phenomics 2022, 2022, 9802585. [Google Scholar] [CrossRef]
  20. Zhou, H.K.; Huang, F.D.; Lou, W.D.; Gu, Q.; Ye, Z.R.; Hu, H.; Zhang, X.B. Yield prediction through UAV-based multispectral imaging and deep learning in rice breeding trials. Agric. Syst. 2025, 223, 104214. [Google Scholar] [CrossRef]
  21. Shi, J.W.; Xie, S.Y.; Li, W.K.; Wang, X.; Wang, J.L.; Chen, Y.Y.; Chang, Y.Y.; Lou, Q.J.; Yang, W.N. RPT: An integrated root phenotyping toolbox for segmenting and quantifying root system architecture. Plant Biotechnol. J. 2025, 23, 2095–2109. [Google Scholar] [CrossRef]
  22. Kakar, N.; Jumaa, S.H.; Sah, S.K.; Redoña, E.D.; Warburton, M.L.; Reddy, K.R. Genetic Variability Assessment of Tropical Indica Rice (Oryza sativa L.) Seedlings for Drought Stress Tolerance. Plants 2022, 11, 2332. [Google Scholar] [CrossRef] [PubMed]
  23. Sadeghi-Tehran, P.; Virlet, N.; Ampe, E.M.; Reyns, P.; Hawkesford, M.J. DeepCount: In-Field Automatic Quantification of Wheat Spikes Using Simple Linear Iterative Clustering and Deep Convolutional Neural Networks. Front. Plant Sci. 2019, 10, 1176. [Google Scholar] [CrossRef]
  24. Yang, B.H.; Pan, M.; Gao, Z.W.; Zhi, H.B.; Zhang, X.X. Cross-Platform Wheat Ear Counting Model Using Deep Learning for UAV and Ground Systems. Agronomy 2023, 13, 1792. [Google Scholar] [CrossRef]
  25. Qin, Z.; Zhang, Z.; Hua, X.; Yang, W.; Liang, X.; Zhai, R.; Huang, C. Cereal grain 3D point cloud analysis method for shape extraction and filled/unfilled grain identification based on structured light imaging. Sci. Rep. 2022, 12, 3145. [Google Scholar] [CrossRef]
  26. Liu, T.; Wu, W.; Chen, W.; Sun, C.M.; Chen, C.; Wang, R.; Zhu, X.K.; Guo, W.S. A shadow-based method to calculate the percentage of filled rice grains. Biosyst. Eng. 2016, 150, 79–88. [Google Scholar] [CrossRef]
  27. Liu, H.; Chen, W.D.; Li, Y.S.; Sun, L.; Chai, Y.H.; Chen, H.X.; Nie, H.C.; Huang, C.L. CRISPR/Cas9 Technology and Its Utility for Crop Improvement. Int. J. Mol. Sci. 2022, 23, 10442. [Google Scholar] [CrossRef]
  28. Rasheed, A.; Gill, R.A.; Hassan, M.U.; Mahmood, A.; Qari, S.; Zaman, Q.U.; Ilyas, M.; Aamer, M.; Batool, M.; Li, H.J.; et al. A Critical Review: Recent Advancements in the Use of CRISPR/Cas9 Technology to Enhance Crops and Alleviate Global Food Crises. Curr. Issues Mol. Biol. 2021, 43, 1950–1976. [Google Scholar] [CrossRef] [PubMed]
  29. Liu, Z.H.; Jin, S.C.; Liu, X.Q.; Yang, Q.L.; Li, Q.; Zang, J.R.; Li, Z.F.; Hu, T.Y.; Guo, Z.F.; Wu, J.; et al. Extraction of Wheat Spike Phenotypes From Field-Collected Lidar Data and Exploration of Their Relationships With Wheat Yield. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
  30. Hasan, M.M.; Chopin, J.P.; Laga, H.; Miklavcic, S.J. Detection and analysis of wheat spikes using Convolutional Neural Networks. Plant Methods 2018, 14, 100. [Google Scholar] [CrossRef]
  31. Zan, X.L.; Zhang, X.L.; Xing, Z.Y.; Liu, W.; Zhang, X.D.; Su, W.; Liu, Z.; Zhao, Y.Y.; Li, S.M. Automatic Detection of Maize Tassels from UAV Images by Combining Random Forest Classifier and VGG16. Remote Sens. 2020, 12, 3049. [Google Scholar] [CrossRef]
  32. Gao, R.; Jin, Y.S.; Tian, X.; Ma, Z.; Liu, S.Q.; Su, Z.B. YOLOv5-T: A precise real-time detection method for maize tassels based on UAV low altitude remote sensing images. Comput. Electron. Agric. 2024, 221, 108991. [Google Scholar] [CrossRef]
  33. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
  34. Hofmann, T. Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 2001, 42, 177–196. [Google Scholar] [CrossRef]
  35. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791. [Google Scholar] [CrossRef] [PubMed]
  36. Li, T.; Cui, L.; Wu, Y.; Pandey, R.; Liu, H.; Dong, J.; Wang, W.; Xu, Z.; Song, X.; Hao, Y. Unveiling and advancing grassland degradation research using a BERTopic modelling approach. J. Integr. Agric. 2025, 24, 949–965. [Google Scholar] [CrossRef]
  37. Liu, H.-J.; Liu, J.; Zhai, Z.; Dai, M.; Tian, F.; Wu, Y.; Tang, J.; Lu, Y.; Wang, H.; Jackson, D. Maize2035: A decadal vision for intelligent maize breeding. Mol. Plant 2025, 18, 313–332. [Google Scholar] [CrossRef]
  38. Han, G.; Yan, H.; Li, L.; An, D. Advancing wheat breeding using rye: A key contribution to wheat breeding history. Trends Biotechnol. 2025, 43, 2170–2183. [Google Scholar] [CrossRef] [PubMed]
  39. Xu, F.; Yoshida, H.; Chu, C.; Matsuoka, M.; Sun, J. Seed dormancy and germination in rice: Molecular regulatory mechanisms and breeding. Mol. Plant 2025, 18, 960–977. [Google Scholar] [CrossRef]
  40. Zhou, Y.; Zhou, Z.; Shu, Q. Synthetic genomics in crop breeding: Evidence, opportunities and challenges. Crop Des. 2025, 4, 100090. [Google Scholar] [CrossRef]
  41. Misra, T.; Arora, A.; Marwaha, S.; Ranjan Jha, R.; Ray, M.; Kumar, S.; Kumar, S.; Chinnusamy, V. Yield-SpikeSegNet: An extension of SpikeSegNet deep-learning approach for the yield estimation in the wheat using visual images. Appl. Artif. Intell. 2022, 36, 2137642. [Google Scholar] [CrossRef]
  42. Ahmar, S.; Usman, B.; Hensel, G.; Jung, K.-H.; Gruszka, D. CRISPR enables sustainable cereal production for a greener future. Trends Plant Sci. 2024, 29, 179–195. [Google Scholar] [CrossRef] [PubMed]
  43. Skobalski, J.; Sagan, V.; Alifu, H.; Al Akkad, O.; Lopes, F.A.; Grignola, F. Bridging the gap between crop breeding and GeoAI: Soybean yield prediction from multispectral UAV images with transfer learning. ISPRS J. Photogramm. Remote Sens. 2024, 210, 260–281. [Google Scholar] [CrossRef]
  44. Falk, K.G.; Jubery, T.Z.; Mirnezami, S.V.; Parmley, K.A.; Sarkar, S.; Singh, A.; Ganapathysubramanian, B.; Singh, A.K. Computer vision and machine learning enabled soybean root phenotyping pipeline. Plant Methods 2020, 16, 5. [Google Scholar] [CrossRef]
  45. Gou, C.; Zafar, S.; Fatima, N.; Hasnain, Z.; Aslam, N.; Iqbal, N.; Abbas, S.; Li, H.; Li, J.; Chen, B. Machine and deep learning: Artificial intelligence application in biotic and abiotic stress management in plants. Front. Biosci.-Landmark 2024, 29, 20. [Google Scholar] [CrossRef]
  46. Gui, J.-F.; Tang, Q.; Li, Z.; Liu, J.; De Silva, S.S. Aquaculture in China: Success Stories and Modern Trends; John Wiley & Sons: Hoboken, NJ, USA, 2018. [Google Scholar]
  47. Kaur, N.; Qadir, M.; Francis, D.V.; Alok, A.; Tiwari, S.; Ahmed, Z.F. CRISPR/Cas9: A sustainable technology to enhance climate resilience in major Staple Crops. Front. Genome Ed. 2025, 7, 1533197. [Google Scholar] [CrossRef]
  48. Sandhu, K.S.; Merrick, L.F.; Sankaran, S.; Zhang, Z.; Carter, A.H. Prospectus of genomic selection and phenomics in cereal, legume and oilseed breeding programs. Front. Genet. 2022, 12, 829131. [Google Scholar] [CrossRef]
  49. Braun, H.-J.; Atlin, G.; Payne, T. Multi-location testing as a tool to identify plant response to global climate change. In Climate Change and Crop Production; CABI: Wallingford, UK, 2010; pp. 115–138. [Google Scholar]
  50. Jeon, D.; Kang, Y.; Lee, S.; Choi, S.; Sung, Y.; Lee, T.-H.; Kim, C. Digitalizing breeding in plants: A new trend of next-generation breeding based on genomic prediction. Front. Plant Sci. 2023, 14, 1092584. [Google Scholar] [CrossRef]
  51. Reynolds, D.; Ball, J.; Bauer, A.; Davey, R.; Griffiths, S.; Zhou, J. CropSight: A scalable and open-source information management system for distributed plant phenotyping and IoT-based crop management. Gigascience 2019, 8, giz009. [Google Scholar] [CrossRef] [PubMed]
  52. Hammers, M.; Winn, Z.J.; Ben-Hur, A.; Larkin, D.; Murry, J.; Mason, R.E. Phenotyping and predicting wheat spike characteristics using image analysis and machine learning. Plant Phenome J. 2023, 6, e20087. [Google Scholar] [CrossRef]
  53. Zhang-Biehn, S.; Fritz, A.K.; Zhang, G.; Evers, B.; Regan, R.; Poland, J. Accelerating wheat breeding for end-use quality through association mapping and multivariate genomic prediction. Plant Genome 2021, 14, e20164. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Overall workflow of topic modeling in intelligent crop breeding research. The analysis consists of four main stages: (1) Data Collection: Publications were retrieved from the Web of Science using crop- and smart-agriculture-related keywords (3579 initial records; 1867 selected after expert screening). (2) Natural Language Processing: Texts were processed via tokenization, stop word removal, lemmatization, and synonym unification using spaCy. (3) Topic Modeling: BERTopic was applied using transformer-based embeddings, UMAP for dimensionality reduction, and HDBSCAN clustering, followed by topic representation with class-based TF-IDF (c-TF-IDF). (4) Downstream Analysis: Identified topics were analyzed for temporal evolution, popularity trends (slope-based dynamics), and journal distribution patterns. The red dashed arrows indicate the sequential data flow across all stages.
Figure 2. Annual publication trend of intelligent crop breeding research from 1995 to 2025.
Figure 3. Keyword clusters of the eight major research topics in intelligent crop breeding identified by BERTopic. Each word cloud illustrates the representative keywords and their relative frequencies within a given topic cluster. Larger words indicate higher importance or frequency.
Figure 4. Temporal evolution of the eight major research topics in intelligent crop breeding from 1995 to 2025. The figure presents annual publication trends for each topic identified by the BERTopic model.
Figure 5. Five-year slope values and hotness spectrum of research topics in intelligent crop breeding. The bar chart illustrates the slope of annual publication trends (2020–2025) for each of the eight topics identified by the BERTopic model.
Figure 6. Distribution of research topics across citation quartiles (Q1–Q4) in intelligent crop breeding. The stacked bars represent the number of papers for each topic across four citation quartiles ranked by citation counts. Q1 includes the top 25% of highly cited papers, while Q4 contains the bottom 25%.
Figure 7. Distribution of topic-specific publications across major journals in intelligent crop breeding research. The figure illustrates the number of papers published under each research topic across leading journals from 1995 to 2025.

Share and Cite

MDPI and ACS Style

Liang, X.; Wu, Y.; Zhuang, J.; Liu, J.; Lei, J.; Wang, Q.; Zhou, A. From Molecules to Fields: Mapping the Thematic Evolution of Intelligent Crop Breeding via BERTopic Text Mining. Agriculture 2025, 15, 2373. https://doi.org/10.3390/agriculture15222373
