Open Access Data Descriptor
DLCPD-25: A Large-Scale and Diverse Dataset for Crop Disease and Pest Recognition
by Heng-Wei Zhang 1,†, Rui-Feng Wang 2,3,†, Zhengle Wang 4 and Wen-Hao Su 1,*
1 College of Engineering, China Agricultural University, 17 Qinghua East Road, Haidian, Beijing 100083, China
2 Department of Agricultural and Biological Engineering, University of Florida, Gainesville, FL 32611, USA
3 Department of Crop and Soil Sciences, College of Agriculture and Environmental Sciences, University of Georgia, Tifton, GA 31793, USA
4 College of Information and Electrical Engineering, China Agricultural University, 17 Qinghua East Road, Haidian, Beijing 100083, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Sensors 2025, 25(22), 7098; https://doi.org/10.3390/s25227098 (registering DOI)
Submission received: 31 October 2025 / Revised: 14 November 2025 / Accepted: 18 November 2025 / Published: 20 November 2025
Abstract
The accurate identification of crop pests and diseases is critical for global food security, yet the development of robust deep learning models is hindered by the limitations of existing datasets. To address this gap, we introduce DLCPD-25, a new large-scale, diverse, and publicly available benchmark dataset. We constructed DLCPD-25 by integrating 221,943 images from both online sources and extensive field collections, covering 23 crop types and 203 distinct classes of pests, diseases, and healthy states. A key feature of this dataset is its realistic complexity, including images from uncontrolled field environments and a natural long-tail class distribution, which contrasts with many existing datasets collected under controlled conditions. To validate its utility, we pre-trained several state-of-the-art self-supervised learning models (MAE, SimCLR v2, MoCo v3) on DLCPD-25. The learned representations, evaluated via linear probing, demonstrated strong performance, with the SimCLR v2 framework achieving a top accuracy of 72.1% and an F1 score of 71.3% on a downstream classification task. Our results confirm that DLCPD-25 provides a valuable and challenging resource that can effectively support the training of generalizable models, paving the way for the development of comprehensive, real-world agricultural diagnostic systems.
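The abstract reports both accuracy and an F1 score on a dataset with a long-tail class distribution. The following minimal Python sketch illustrates why the two metrics can diverge when a few classes dominate. It assumes a macro-averaged F1 (the unweighted mean of per-class F1), which the abstract does not specify, and it uses hypothetical toy labels rather than any DLCPD-25 data; the function names `accuracy` and `macro_f1` are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true.

    Assumption: macro averaging. The source does not state which
    averaging scheme was used for its reported F1 score.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p gains a false positive
            fn[t] += 1  # true class t gains a false negative
    scores = []
    for c in sorted(set(y_true)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: one frequent class and two rare classes.
y_true = ["healthy"] * 8 + ["rust", "aphid"]
y_pred = ["healthy"] * 8 + ["healthy", "aphid"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

In this toy case a single error on a rare class barely moves accuracy but pulls the macro F1 down sharply, which is one reason a long-tailed benchmark is typically reported with both numbers.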