1. Introduction
In the context of sustained global population growth, there is a pressing need to ensure food security and enhance agricultural production efficiency, despite the escalating impacts of climate change and diminishing arable land resources. As the cornerstone of agricultural production and genetic improvement, crop germplasm resources are of paramount importance for breeding superior varieties, improving crop yield and quality, and enhancing crop stress resistance [
1,
2]. A systematic study on the phenotypic diversity of persimmon germplasm resources has confirmed that abundant variation in phenotypic traits is the core basis for screening excellent breeding parents and achieving directional genetic improvement, and that in-depth analysis of the correlations and evolutionary patterns among traits can effectively improve the efficiency of germplasm evaluation and variety breeding [
3]. The organ characteristics of these resources include morphological, structural, and physiological–biochemical traits at various developmental stages from seeding to maturity. These characteristics not only manifest the genetic background and biological attributes of germplasm, but also provide essential criteria for germplasm classification, identification, evaluation, and genetic improvement [
4,
5]. Accurate and efficient identification and management of these characteristics enable breeders to gain a deeper understanding of the strengths and weaknesses of these resources, facilitating targeted breeding efforts, accelerating the selection of superior varieties, and meeting the dynamic demands of agricultural production [
6]. Currently, researchers primarily rely on manual observation and empirical judgment to assess germplasm characteristics, followed by paper-based recording and subsequent digital entry [
7]. This process is time-consuming, labor-intensive, and prone to data errors. As breeding scale expands and the number of breeding materials increases, the volume of generated image data grows exponentially, complicating data organization, classification and analysis, and impeding breeding efficiency.
Establishing a dedicated platform for managing germplasm resource data, particularly phenotypic data, is a crucial strategy to address these challenges. For instance, C. Pommier applied the FAIR principles (Findable, Accessible, Interoperable, and Reusable) to facilitate the management and sharing of plant phenotypic databases [
8]. In recent years, numerous databases have extensively integrated genotypic and phenotypic datasets, providing essential support for genomic prediction and breeding design [
9]. For example, the CropGS-Hub [
10] database encompasses over 224 billion genotypic data points and 434,000 phenotypic data points from more than 30,000 individuals across seven major crops, incorporating modules for phenotypic prediction, user-defined model training, and hybrid design. Similarly, SoyOmics [
11] provides an integrated multi-omics database for soybeans, including genomic and phenotypic data. The Crop Breeding Evaluation Information System (BreedingEIS) [
12] allows precise and user-friendly recording of phenotypic and environmental data, integrating breeding information to support decision-making. Furthermore, BreedingAIDB [
13] integrates crop genomic and phenotypic data, providing readily accessible crop data and machine learning tools for breeders and researchers.
The application of image processing techniques and high-throughput phenotypic platforms can further enhance the efficiency of germplasm resource data analysis. Machine learning and deep learning-based image analysis methods have been widely applied to plant phenotypic identification, enabling rapid and accurate extraction of crop morphological and structural characteristics [
14]. StripeRust-Pocket [
15], a deep learning model integrated into smartphones, can rapidly assess the severity of wheat stripe rust in the field. The open-source software PREPs [
16] supports image segmentation and trait extraction (e.g., plant height and biomass) for field plant phenotyping, making phenotypic analysis accessible to users without programming knowledge. OpenPheno [
17], a smartphone-based platform, provides real-time analysis of traits such as seed size, leaf angle, spike count, and tomato fruit measurements. Moreover, intelligent breeding platforms such as AutoGP [
18] integrate genotypic and phenotypic extraction, deriving traits from smartphone-captured videos to furnish breeders with efficient tools. Yu [
19] proposed the SLW-YOLO model based on the improved YOLOv5s and a self-propelled image acquisition platform, which achieves high-precision non-contact detection of key phenotypic traits of hybrid soybean parents in the field (mAP = 94.8%), with phenotypic consistency detection accuracy matching manual evaluation. This study verifies that customized deep learning models combined with field-adapted image acquisition equipment can effectively address the low efficiency and high error rate of manual phenotypic identification in crop breeding.
Despite notable progress in existing platforms for crop phenotyping and data management, several limitations remain for practical field applications in large-scale germplasm experiments. First, public databases cannot fully meet the customized demands of breeding programs. The uniqueness and diversity of breeding materials are often insufficiently covered by available databases. In addition, confidentiality requirements during early-stage breeding make it difficult for researchers to manage core data on public platforms. Second, field phenotyping is easily affected by complex environmental factors such as uneven illumination, organ occlusion, and cluttered backgrounds. Stable, high-quality images are hard to obtain directly under natural field conditions. Moreover, the adaptability and robustness of current phenotyping models rely heavily on specific scenes and datasets, and cannot be universally adapted to diverse field environments by algorithmic improvements alone [
20]. Finally, germplasm experiments usually involve long durations, multiple participants, and scattered data sources. Standalone data management or phenotyping platforms struggle to support unified multi-source data control and long-term traceability.
To address these limitations, this study conducted a case study on soybean germplasm resources, provided an image acquisition solution, and established a standardized data management system. High-quality annotated images were obtained under regulated imaging conditions, providing a reliable data foundation for model training and optimization. On this basis, we developed a web-based and WeChat Mini Program-based intelligent recognition and management system for crop germplasm resources. The system integrates data management, phenotypic recognition, and deep learning models to support standardized full-cycle germplasm management, intelligent organ trait recognition, and secure multi-permission data control. This framework ensures standardization and consistency in data collection and storage, while laying a methodological foundation for future lightweight and highly generalized recognition in complex field environments. The scientific objective of this study is to explore a standardized field image acquisition and model construction scheme that maximizes the stability and reliability of organ phenotypes. The applied objective is to build a usable and scalable management system that supports real-world breeding surveys, data collection, and scientific management.
2. Materials and Methods
2.1. System Architecture
The system is built on a Browser/Server (B/S) architecture for centralized deployment and distributed applications, adhering to the Model–View–Controller (MVC) development pattern. The system’s functional modules are designed based on the Service-Oriented Architecture (SOA). The backend leverages the core framework of Spring Boot 3.x with Java 17, integrating Spring Security and JWT to provide multi-factor authentication and authorization. It employs MyBatis-Plus to streamline database operations and uses the Role-Based Access Control (RBAC) model to establish fine-grained permission management, along with an end-to-end encrypted data transmission mechanism.
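The RBAC idea mentioned above can be illustrated with a minimal sketch. The actual backend is written in Java with Spring Security. Python is used here only for brevity, and the role names, permission strings, and the `is_allowed` helper are hypothetical, not taken from the system.

```python
from dataclasses import dataclass, field

# Hypothetical roles and permissions, chosen only for illustration.
ROLE_PERMISSIONS = {
    "project_leader": {"project:create", "project:edit", "data:upload", "data:download"},
    "collector": {"data:upload"},
    "viewer": {"data:download"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


def is_allowed(user: User, permission: str) -> bool:
    """Return True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)


collector = User("field_worker", {"collector"})
print(is_allowed(collector, "data:upload"))   # True
print(is_allowed(collector, "project:edit"))  # False
```

Permissions attach to roles rather than to individual users, so changing what a field collector may do requires editing one role entry instead of every account.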
The frontend management dashboard is built using the React technology stack, integrated with the Arco Design component library to boost dynamic interactivity and visualization capabilities. The mobile client is implemented primarily via WeChat Mini Programs, supporting offline data caching, QR code scanning, and seamless integration with WeChat’s login status. The data storage layer relies on a MySQL master–slave backup cluster to ensure high availability, complemented by Redis caching technology to optimize the efficiency of high-frequency queries. This forms a comprehensive solution from data collection in the field, through cloud storage and analysis, to visual management.
The system adopts a hierarchical architecture, consisting of “6 horizontal layers” and “2 vertical systems”, as illustrated in
Figure 1. The “6 horizontal layers” include the infrastructure layer, data layer, supporting layer, application layer, display layer, and user layer. The “2 vertical systems” comprise the standard specification system and the operation and maintenance support system. The standard specification system ensures the compatibility, interoperability, and consistency of all system components through technical, management, and data standards. The operation and maintenance support system covers monitoring, fault handling, performance optimization, and resource management.
2.2. System Overview
This integrated management platform is developed based on real-world field crop cultivation and management. Through the web portal, the system generates unique QR code identifiers for each planting area, which are deployed in the fields. Data collectors can scan these QR codes using a WeChat Mini Program, associating plot information and inputting crop growth data. The collected data is efficiently stored and managed via OSS cloud storage. Additionally, the system employs integrated deep learning algorithms to automate the analysis and processing of uploaded crop images.
Crop-IRM establishes a project management system on the cloud platform, enabling intelligent recognition of crop germplasm resources’ organ characteristics and efficient data management. The main modules and operational procedures of this system are shown in
Figure 2. Based on different project stages, the system is divided into four major modules: the project creation module, data collection module, feature recognition and analysis module, and data management and download module. Correspondingly, the operational procedures are categorized into four stages:
- (1)
Project Initialization Stage: Users are required to create a crop germplasm resource project and input basic information, such as the project name, crop name, variety, project leader, and participant list. Subsequently, crop information, including crop category, variety name, source, characteristics, and cultivation practices, is imported through an Excel spreadsheet. The system then generates unique QR codes, which users can print and fabricate into signs to be inserted in the respective crop variety plots, ensuring accurate allocation of variety information to cultivation areas.
- (2)
Data Collection Stage: Users open the WeChat Mini Program, scan the unique identification QR code, select the collection content, and choose from two data collection methods: image selection or text input. Upon completing data collection, users click the submit button to upload the data to the project management system, where cloud storage technology automatically classifies and labels the data.
- (3)
Feature Recognition and Analysis Stage: The system deploys multiple phenotypic recognition models. Users select crop organ feature images and corresponding models for batch processing, generating analysis results instantaneously. Users can modify the recognition results, with the system saving modification records.
- (4)
Data Storage and Download Stage: The platform supports unified data storage and management, enabling data filtering, downloading, and visual display. The downloadable content is selectable (e.g., basic crop information, image data, and image recognition results), and display methods include feature distribution maps and statistical charts.
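The link between a plot and its QR code in stages (1) and (2) can be sketched as follows. This is a minimal illustration under stated assumptions: the payload fields, the `make_qr_payload` helper, and the use of a deterministic UUID are hypothetical, and rendering the string as an actual QR image is left to whatever encoder the system uses.

```python
import json
import uuid


def make_qr_payload(project_id: str, plot_no: int, variety: str) -> str:
    """Build a JSON string that could be encoded into a plot's QR code."""
    # uuid5 is deterministic: the same project/plot pair always yields the same ID.
    plot_uid = uuid.uuid5(uuid.NAMESPACE_URL, f"{project_id}/plot/{plot_no}")
    return json.dumps(
        {"project": project_id, "plot": plot_no, "variety": variety, "uid": str(plot_uid)}
    )


# Hypothetical project with one payload per plot.
payloads = [make_qr_payload("soy-2025", n, f"variety-{n}") for n in range(1, 1335)]
uids = {json.loads(p)["uid"] for p in payloads}
print(len(uids))  # 1334 distinct identifiers
```

Scanning such a code in the field gives the client a stable key under which text and image records can be filed.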
2.3. Data Collection and Processing
To illustrate the system’s data collection process, we present a soybean experiment aimed at screening soybean varieties that are high-yielding, suitable for a specific planting period, and tolerant to shading. Experimental site coordinates and field layout are presented in
Figure 3. A total of 1334 soybean varieties were selected for the cultivation experiment, which was conducted in 2025 at the Modern Agriculture R&D Base of Sichuan Agricultural University, located in Chongzhou District, Chengdu City, Sichuan Province, China. The field comprises 1334 plots, each planted with one variety. Each plot contains three rows, and the plots are arranged in 23 columns of 58 plots each. Row length is 1 m, with row spacing of 0.4 m, plant spacing of 0.1 m, variety spacing of 0.8 m, and column spacing of 0.7 m.
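The layout arithmetic can be checked with a short sketch: 58 plots per column times 23 columns gives 1334 plots. The numbering scheme below (plots numbered consecutively down each column in turn) is an assumption made for illustration. The source does not state how plots are numbered.

```python
PLOTS_PER_COLUMN = 58
N_COLUMNS = 23


def plot_position(plot_no: int) -> tuple[int, int]:
    """Map a 1-based plot number to (column, position in column), both 1-based.

    Assumes plots are numbered consecutively down each column in turn.
    """
    if not 1 <= plot_no <= PLOTS_PER_COLUMN * N_COLUMNS:
        raise ValueError("plot number out of range")
    column, position = divmod(plot_no - 1, PLOTS_PER_COLUMN)
    return column + 1, position + 1


print(PLOTS_PER_COLUMN * N_COLUMNS)  # 1334
print(plot_position(1))              # (1, 1)
print(plot_position(59))             # (2, 1)
print(plot_position(1334))           # (23, 58)
```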
The characteristic image data of soybean germplasm resources, including images of compound leaflets, flowers, pods, seeds, and other parts, were collected using smartphones of different models. During the collection of image data for model training, we did not distinguish phenotypic traits on an individual germplasm basis; instead, images were broadly collected across all germplasm materials. When recording the characteristics of each germplasm, textual information and image data were entered by scanning the QR code corresponding to each material. The dataset construction process and criteria are shown in
Figure 4.
To standardize data collection, a black light-absorbing cloth background was placed under the target plant part, and images were then captured from above. All images, except those of soybean seeds, were taken in the field using a non-destructive method (
Figure 4a). The specific data collection standards are as follows:
Flower images should include at least one complete flower.
Compound leaflet image collection should follow two methods: either ensure the frame contains a complete compound leaf, with leaflets not overlapping and remaining flat, or capture only the central leaflet of the compound leaf, while maintaining its flatness.
Pod images should ensure that at least one pod lies flat.
Soybean seed images should ensure that the seeds do not overlap.
We processed the collected image data by selecting clear images containing characteristics of soybean organs. Various annotation software tools, such as LabelImg (v1.8.6), AnyLabeling (v0.4.30), and Labelme (v5.10.0), were used to label the feature regions, creating datasets for subsequent training. Seven soybean image datasets were developed: a flower color dataset (C_F), a single-pod seed count dataset (N_PS), a pod curvature dataset (C_P), a pod pubescence color dataset (PC_P), a compound leaflet count dataset (N_L), a compound leaflet shape dataset (S_L), and a soybean seed detection dataset (D_S). Some images are shared among the N_PS, C_P, and PC_P datasets. The data in each dataset were randomly partitioned into training and validation sets at a ratio of 7:3. In addition, we prepared 100 extra images per dataset as an independent test set, which were not included in the training or validation sets. Examples of images from different categories within each dataset are shown in
Figure 4b. Among them, the C_P dataset employs the annotation method using key feature points, and the sample image is shown in
Figure A1.
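The 7:3 random partition described above can be sketched as follows. This is an illustration only: the `split_dataset` helper, the fixed seed, and the file names are hypothetical, and the authors' actual splitting script is not described in the source.

```python
import random


def split_dataset(items, train_ratio=0.7, seed=0):
    """Shuffle a copy of items and split it into (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for one annotated dataset.
images = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val = split_dataset(images)
print(len(train), len(val))  # 700 300
```

The independent test images would be kept out of `images` entirely so that they never enter either subset.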
The YOLO model, widely used in image recognition for object detection, was employed for this system. Specifically, YOLOv11 was selected for its excellent performance in object detection, image classification, instance segmentation, and pose estimation tasks. Suitable YOLOv11 model variants were chosen for training according to the specific datasets and tasks (e.g., segmentation or detection). The details of dataset partitioning and model selection are provided in
Figure 4c.
2.4. Configuration of Experiment Environment and Evaluation Metrics
The main configurations of the project machine include: CPU: 13th Gen Intel(R) Core(TM) i9-13900K 3.00 GHz; RAM: 128.0 GB; GPU: NVIDIA Quadro P6000 (24 GB); Storage: 12 TB; Operating system: Windows 10 22H2 64-bit.
All experiments were conducted using the PyTorch deep learning framework (version 2.7.1) in a Python 3.11 development environment, with CUDA 12.8 and cuDNN 9.7.1.
For evaluating object detection, segmentation, and pose estimation models, mean average precision (mAP@0.5, mAP@0.5:0.95) was chosen as a performance metric. For classification models, top1_accuracy (the accuracy when the model’s highest predicted probability matches the actual label) and top5_accuracy (the accuracy when the top five ranked predictions include the actual label) were used. mAP@0.5 and mAP@0.5:0.95 are calculated as follows:
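$$\mathrm{mAP@0.5} = \frac{1}{N}\sum_{i=1}^{N} AP_{i}^{\mathrm{IoU}=0.5}$$

$$\mathrm{mAP@0.5{:}0.95} = \frac{1}{10}\sum_{t \in \{0.5,\,0.55,\,\ldots,\,0.95\}} \mathrm{mAP}_{t}$$

where N is the number of categories, IoU is the intersection-over-union ratio between the predicted box and the real box, $AP_{i}^{\mathrm{IoU}=0.5}$ is the AP of the i-th category at IoU = 0.5, and $\mathrm{mAP}_{t}$ is the mAP when the value of IoU is t, t ∈ {0.5, 0.55, …, 0.95}.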
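The quantities defined above can be sketched in a few lines. This is a simplified illustration rather than the evaluation code used in the study: the per-class AP values are taken as given inputs, and the function names and the (x1, y1, x2, y2) box format are assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_ap(ap_per_class):
    """mAP at one IoU threshold: the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


def map_50_95(ap_by_threshold):
    """Average of mAP over the ten IoU thresholds 0.50, 0.55, ..., 0.95.

    ap_by_threshold holds ten lists of per-class AP values, one per threshold.
    """
    if len(ap_by_threshold) != 10:
        raise ValueError("expected AP lists for exactly ten thresholds")
    return sum(mean_ap(aps) for aps in ap_by_threshold) / 10


print(round(box_iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 0.1429 (overlap 1, union 7)
print(round(mean_ap([0.9, 0.8]), 3))                  # 0.85
```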
In addition, after the model is deployed, we compare its recognition results for 100 images with the results of manual (empirical) judgments. We use the proportion of accurately recognized images, the inference time per image, and the sum of inference time and result download time as the basis for evaluating the model’s practicality.
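The practicality check described above amounts to comparing predictions with manual labels and timing each call. The sketch below shows one way to do this. The `evaluate` helper and the stub predictor are hypothetical, and download time, which depends on the network, is not modeled.

```python
import time


def evaluate(predict, images, manual_labels):
    """Return (accuracy, mean seconds per image) for a prediction function."""
    correct = 0
    start = time.perf_counter()
    for image, expected in zip(images, manual_labels):
        if predict(image) == expected:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(images), elapsed / len(images)


# Stub predictor standing in for a deployed model (an assumption for illustration).
def stub_predict(image_name):
    return "purple" if "p" in image_name else "white"


images = ["p1.jpg", "w1.jpg", "p2.jpg", "w2.jpg"]
manual = ["purple", "white", "purple", "purple"]
accuracy, seconds_per_image = evaluate(stub_predict, images, manual)
print(accuracy)  # 0.75
```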
3. Results
3.1. Model Performance on Different Datasets
One of the primary objectives of this study is to deploy the model on terminals for rapid detection of a large number of images. To achieve this, we select the lightweight YOLOv11n model. The performance of YOLOv11n-based models on different datasets is summarized in
Table 1.
Soybean flowers are categorized into two colors: white and purple, while the pod trichome colors are either brown or gray. Given that each soybean variety corresponds to a specific flower and trichome color, the classification model YOLOv11-cls is employed. This model achieves a top1_accuracy of 0.999 and a top5_accuracy of 1 on both flower and trichome color datasets.
The number of leaflets per compound leaf ranges from 3 to 5 or more, while the number of seeds in a single pod varies from 1 to 4 or more. At our experimental site, compound leaves with fewer than 6 leaflets and pods with fewer than 5 seeds were collected. Plants of the same variety may bear pods and compound leaves of multiple categories, so the object detection model YOLOv11n-det is applied. It achieves mAP@0.5 values of 0.943 for compound leaves and 0.981 for pods. The mAP@0.5:0.95 values are 0.931 and 0.917, respectively.
The classification of leaf shapes follows the “Guidelines for the conduct of tests for distinctness, uniformity and stability—Soybean” as outlined by the National Standards of the People’s Republic of China (hereinafter referred to as the “Standards”). A segmentation model, YOLOv11n-seg, is ideal for accurately delineating the boundaries of various leaf shapes. The model achieves high segmentation accuracy on the S_L dataset, with an mAP@0.5 of 0.995 and an mAP@0.5:0.95 of 0.993.
The classification of pod curvature also follows the descriptions in the “Standards,” although these do not precisely specify the degree of curvature for each category. The pose model, typically used for human pose detection, is applied to calculate pod curvature. The YOLOv11n-pose model marks five key points on each pod, from which the curvature angle is calculated. It achieves an mAP@0.5 of 0.995 and an mAP@0.5:0.95 of 0.993 on the C_P dataset, demonstrating its effectiveness for pod shape detection.
Lastly, a detection model YOLOv11n-det is used to detect soybean seeds, aiming to rapidly calculate the 100-seed weight during the variety evaluation stage, which is crucial for yield prediction. On the D_S dataset, it achieves an mAP@0.5 of 0.995, with an mAP@0.5:0.95 value of 0.868. This lower value is likely due to the high seed density and overlapping targets in the images.
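Once seeds in an image have been counted, the 100-seed weight follows from simple arithmetic. The sketch below assumes the photographed seeds are weighed together on a balance and that the detected count is supplied separately. The function name and the example numbers are hypothetical.

```python
def hundred_seed_weight(total_weight_g: float, seed_count: int) -> float:
    """Scale the weight of a counted seed sample to a 100-seed basis."""
    if seed_count <= 0:
        raise ValueError("seed_count must be positive")
    return total_weight_g / seed_count * 100


# Example: 180 detected seeds weighing 36.0 g in total.
print(round(hundred_seed_weight(36.0, 180), 2))  # 20.0
```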
3.2. Operating System
3.2.1. Interface Design and Client Collaboration Logic
The Crop-IRM system adopts a dual-terminal architecture, comprising a web-based management platform and a WeChat Mini Program named “Zhong Zhi Hui Jian”. This architecture facilitates the seamless integration of laboratory data processing and field data collection. The core interfaces of the system are depicted in
Figure A2 (web terminal) and
Figure A3 (mobile terminal). The design logic centers around “user-friendly operation, data standardization, and intelligent collaboration” to meet the demands of germplasm resource investigation and phenotyping research.
The synergy between the web-based platform and the mobile client is manifested in two primary aspects: data flow and functional complementarity. First, image and text data collected via the mobile client are uploaded to the web client, processed by models, and the results are synchronized back to the mobile client for on-site reference. Second, while the web platform handles data management and analysis, the mobile client is focused on data collection and viewing. Together, these components form a closed-loop system, moving from “on-site data collection” to “breeding decision support”. The design of both clients balances professionalism and user-friendliness, allowing researchers to perform in-depth analysis while making the system accessible to field operators. This integration enhances the efficiency and precision of the identification and management of soybean germplasm resource organ characteristics.
3.2.2. Web-Based Management Platform: Core Functional Modules and Technical Implementation
The web-based platform serves as the core hub for system management, data processing, and algorithm deployment, addressing the challenges of large-scale germplasm resource data organization, multi-user collaboration, and model optimization. The key functional modules are detailed as follows:
User Authentication and Project Management: Users authenticate via the WeChat Mini Program using their registered mobile phone numbers (
Figure A2a), which ensures compliance with data privacy requirements of germplasm resource research. The project management module (
Figure A2c,d) supports the creation, modification, and collaborative management of germplasm investigation projects. It allows for the customization of crop organ attributes (e.g., leaves → leaf shape, pods → pod color) with up to two hierarchical levels to suit different crop species and research objectives. Additionally, project background images can be uploaded in various formats (png, jpg), facilitating visual project identification and field plot mapping, which improves the efficiency of data association between laboratory and field.
Crop Information Management and QR Code Labeling: The crop management module (
Figure A2e) supports dual-mode data input (manual entry and Excel import) to ensure compatibility with existing germplasm databases. Each crop accession is assigned a unique QR code, which is stored in cloud storage (Alibaba Cloud OSS) for traceable identification. This approach solves the issue of inaccurate manual labeling in traditional germplasm investigation, enabling precise matching between physical accessions and digital data. Each of the 1334 soybean materials used in this study is linked to a QR code, providing real-time access to crop details and investigation records.
Algorithm Deployment and Task Scheduling: The algorithm management module (
Figure A2f) integrates multiple crop phenotyping models deployed on the cloud via the Python Flask framework. This modular design allows researchers to flexibly add or configure algorithms for specific phenotyping tasks. The task management module (
Figure A2g,h) supports batch processing of image recognition tasks, enabling users to select target projects, crop attributes, and multiple algorithm models for concurrent analysis. The backend automatically splits tasks into subtasks for parallel computing, and generates structured output files (e.g., Excel and post-recognition images) with both quantitative and qualitative phenotyping data. This automation reduces manual data processing time by more than 50% compared to traditional methods, achieving a success rate of 98% or higher across test tasks.
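The splitting of a recognition task into subtasks for parallel execution can be sketched as below. This is a generic illustration and not the system's scheduler: `recognize_batch` is a placeholder for real model inference, and the batch size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split a list into consecutive sublists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def recognize_batch(image_paths):
    """Placeholder for model inference on one subtask (hypothetical)."""
    return {path: "recognized" for path in image_paths}


def run_task(image_paths, batch_size=16, workers=4):
    """Run all subtasks on a thread pool and merge their results."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(recognize_batch, chunk(image_paths, batch_size)):
            results.update(partial)
    return results


paths = [f"pod_{i}.jpg" for i in range(50)]
print(len(run_task(paths)))  # 50
```

Merging partial results as they arrive keeps a single structured output per task, which could then be exported to a spreadsheet.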
3.2.3. Mobile Terminal (WeChat Mini Program “Zhong Zhi Hui Jian”): Field Data Collection and Real-Time Interaction
The mobile terminal is optimized for field operations, offering lightweight and efficient data entry capabilities, which bridge the gap between field investigation and laboratory data management (
Figure A3).
Quick Login and Workbench Design: Users can log in via WeChat authentication (
Figure A3b), eliminating the need for additional account registration. The workbench (
Figure A3c) integrates two essential modules: “System Applications” for project management and data entry, and “Image Recognition” for on-site phenotyping. This streamlined design allows field researchers to seamlessly switch between data entry and real-time recognition tasks.
Project-Crop Data Association and Efficient Entry: The project list (
Figure A3d) and crop list (
Figure A3f) modules facilitate quick searching and filtering of accessions. Project details (
Figure A3e) display key parameters, including crop count, attribute categories (e.g., flowering stage, seed color, lodging resistance), and team members. The data entry function (
Figure A3g,h) offers two modes: normal entry for detailed record input and quick entry for one-click data submission via QR codes. This dual-mode design adapts to different field scenarios, such as large-scale batch investigations and individual accession verifications, improving data entry efficiency by 45% compared to traditional paper-based methods. Recent entry records are automatically saved for traceability and error correction, ensuring data integrity.
3.3. Model Performance Evaluation
To evaluate whether the deployed models meet the high-speed and high-accuracy requirements for phenotypic data processing in real-world applications, we tested each model using a test set of 100 images. The phenotypic recognition results predicted by the models were compared with manually identified results, and the recognition accuracy was recorded. Note that since pod curvature is measured by the included angle, accuracy assessment is not performed.
The average prediction time per image (in seconds), total time per image for both inference and result downloading (in seconds), and model accuracy results are presented in
Table 2.
Figure 5 presents some example images of the test results for each model. Overall, the models for S_L (leaf shape) and C_F (flower color) achieve 100% accuracy, while the remaining models also reach 99% or 98% accuracy.
Regarding inference time and the time for image prediction and downloading, the models exhibit variability depending on tasks and model structures. Notably, the model for the D_S (seed detection) task has longer inference and image processing times, likely because the high seed density produces a large number of detected targets. Additionally, factors such as network bandwidth and CPU and GPU performance also influence the image downloading speed.
4. Discussion
Large-scale germplasm resource experiments could incur substantial labor and management costs. Consequently, the standardization and batch processing of data collection and management have become essential trends. The Crop-IRM system incorporates both a web-based interface and a Mini Program interface, which together facilitate four primary functions: project creation, data collection, feature identification and analysis, and data management and download. These modules standardize the methods used for data collection and management, particularly in terms of phenotypic data.
Numerous studies have addressed breeding data management platforms and phenotypic analysis, such as research by Michelle Watt [
21] that provides a comprehensive overview of non-invasive phenotypic analysis techniques in crop breeding. However, field phenotypic measurements still face challenges related to high variability, cost, and the complexity of processing large volumes of multi-dimensional data [
22]. This study addresses these challenges by integrating unique QR code identifiers with crop types, varieties, textual data, and images. This integration ensures data standardization and traceability. Additionally, the use of black light-absorbing cloth for image standardization has led to the creation of several valuable datasets for future research. The incorporation of multi-task models, including detection, classification, segmentation, and pose estimation, enhances the efficiency and intelligence of data analysis within the resource management framework.
Furthermore, we have constructed seven high-resolution subimage datasets for soybean organ characteristics. Based on the characteristics of each dataset, we selected the most appropriate models for training and deployment within the Crop-IRM system. Previous studies have primarily focused on the phenotypes of one or two soybean organs [
23] (Yu et al., 2024), often utilizing different methods than those employed in this study. Some studies have employed pose estimation techniques with different focal points [
24,
25]. In contrast, our study innovatively annotates five key points and calculates the angle formed by three of these points to define pod shapes, allowing users to set thresholds for angle-based shape differentiation. This approach provides a novel method for pod shape analysis.
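The angle-based pod shape rule can be sketched as follows. Which three of the five key points are used is not specified in the text, so the sketch assumes the two end points and a middle point. The threshold values and category names are likewise hypothetical, standing in for user-set thresholds.

```python
import math


def angle_deg(p1, vertex, p2):
    """Angle in degrees at `vertex` between the rays toward p1 and p2."""
    v1 = (p1[0] - vertex[0], p1[1] - vertex[1])
    v2 = (p2[0] - vertex[0], p2[1] - vertex[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("key points must not coincide with the vertex")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def classify_curvature(angle, straight_min=170.0, slight_min=150.0):
    """Map an angle to a shape class using user-adjustable thresholds."""
    if angle >= straight_min:
        return "straight"
    if angle >= slight_min:
        return "slightly curved"
    return "curved"


# Collinear key points give a straight pod, a right angle gives a curved one.
print(classify_curvature(angle_deg((0, 0), (1, 0), (2, 0))))  # straight
print(classify_curvature(angle_deg((0, 0), (1, 0), (1, 1))))  # curved
```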
Although individual optimization for each model is not performed in the current work, all models attain an accuracy rate of over 94%, with the highest accuracy model reaching 99.5% during the training phase. During testing, the models achieve a prediction accuracy of at least 98%, satisfying the practical demands of standardized survey scenarios. Moving forward, a key focus for future improvement is enabling models to adapt to complex field environments without auxiliary backgrounds. We plan to collect more background-free image data and further optimize the models based on existing data to improve their robustness, background tolerance, and generalization performance under natural lighting and challenging field conditions.
Nevertheless, the current system still needs improvement. For instance, it only supports the collection and storage of textual and image data; future work will expand support for additional data types, such as integrating remote sensing data processing capabilities, which we are currently exploring. As demonstrated by Lu [
26], remote sensing techniques are valuable for monitoring crop light utilization. Integration of such capabilities would enhance the system’s ability to correlate phenotypic data with the ecological and physiological performance of germplasm resources.
5. Conclusions
This study proposes an integrated intelligent recognition and management platform for organ characteristics of crop germplasm resources, named Crop-IRM. The platform integrates web-based management, Mini Program data collection, and deep learning analysis, which provides systematic management of phenotypic traits. It effectively addresses challenges in field phenotypic data collection and management. Using soybeans as a case study, we demonstrated that the platform can efficiently manage and analyze phenotypic data for flowers, leaves, pods, and other traits, thus reducing manual labor costs for visual classification. The platform adopts a modular and standardized design that supports general-purpose germplasm management. For crops other than soybean (e.g., maize, wheat, and rice), the data management, QR code association, and field survey functions can be directly applied. When intelligent image recognition is required for new crops, the system can be extended by constructing corresponding phenotypic datasets and deploying tailored deep learning models. This provides a scalable and intelligent technical solution for the standardized evaluation and scientific management of crop germplasm resources.