Article

Towards Data-Driven Decisions in Agriculture—A Proposed Data Quality Framework for Grains Trials Research

Centre for eResearch and Digital Innovation, Federation University, Ballarat, VIC 3350, Australia
* Author to whom correspondence should be addressed.
Data 2026, 11(1), 19; https://doi.org/10.3390/data11010019
Submission received: 29 October 2025 / Revised: 6 January 2026 / Accepted: 7 January 2026 / Published: 13 January 2026

Abstract

Future agriculture will depend on smart systems and digital technologies to improve food production and sustainability. Data-driven methods, such as artificial intelligence, will become integral to agricultural research and development, transforming how decisions are made and how sustainability goals are achieved. Reliable, high-quality data is essential to ensure that research users can trust their conclusions and decisions. To achieve this, a standard for assessing and reporting data quality is required to realise the full potential of data-driven agriculture. Two practical and empirical data quality assessment tools are proposed—a trial data quality test (primarily for data contributors) and a trial data quality statement (for data users). These tools report the assessed qualities of submitted trial data to contributors and to those seeking to use the data for decision support. An action case study using the Online Farm Trials platform illustrates their application. The proposed data quality framework (DQF) provides a consistent approach for evaluating trial quality and determining fitness for purpose. Flexible and adaptable, the DQF and its tools can be tailored to different agricultural contexts, strengthening confidence in data-driven decision-making and advancing sustainable agriculture.

1. Introduction

Global food security and the challenge of increasing yield while reducing humanity’s environmental footprint remain critical objectives for 21st century agricultural production systems. To meet this increase in global food demand [1] for both humans and livestock, cereal production is expected to increase to 3.1 billion metric tonnes by 2032: an increase of 320 million metric tonnes (Mt) from the 2020–2022 production estimates [2]. Oilseed production is also anticipated to increase 13% in the next decade (up 70 Mt to 604 Mt), with soybeans accounting for much of this increase. To achieve this increase in global grains production while simultaneously increasing profitability for grain and oilseed producers, novel field trials and experiments focused on multiple facets, including plant breeding, climate change adaptation, integration of emerging technologies, pest control and the optimisation of crop nutrition and soil amelioration, will be pivotal [3,4,5,6,7,8].
Experimental trials and demonstrations continue to serve as a mainstay in the advancement of agricultural systems across the globe. Future agriculture, or Agriculture 4.0, will rely on smart systems and digital technologies within agriculture to contribute towards developments for enhanced food production and sustainability [9,10]. Here, the emergence and integration of data-driven techniques into agricultural research is instigating a revolutionary shift in agrifood production systems worldwide [11,12]. However, an overall lack of data governance and standardisation has been systemic for agricultural data [13,14,15], inhibiting its expanded utility for analytical purposes.
In an era of digitalisation and digitisation of farming, data is central to these decisions [16,17] and modern scientific discovery. The extrinsic and intrinsic qualities of data [18] are crucial for scientific progress and trust in these outcomes. High-quality data in experimentation encourages data reuse, enables meta-analyses, preserves information for future interpretation and ensures users can have confidence in the conclusions they form and the decisions they make [19,20,21].
Higher-quality trial data can lead to improved data-driven decisions, whereas lower-quality data leads to difficulties in replication and reduced confidence in research outcomes and information dissemination [11]. The use of machine learning (ML), artificial intelligence (AI) and meta-analyses in agriculture is dependent on data: good-quality, consistent and complete data are fundamental to training better models and, consequently, to better decisions and optimisation [22]. This enables the full potential of the data collected to be harnessed through a variety of uses [17,23,24,25]. The substantial volume of data generated by research institutions, when amalgamated with advanced analytics and emerging technologies such as AI, provides the opportunity to deliver significant advantages, including step changes in yield improvement and global food production and security.
In this article, we use the term “data-driven decisions in agriculture” to refer to operational, tactical and strategic choices made by growers, advisers, researchers, industry and policy makers that are explicitly informed by quantified and analysed evidence from agriculture, rather than relying solely on experience or anecdotes. In grains systems, such decisions include the selection of crop varieties, fertiliser and input regimes, sowing and harvest timing, crop rotations and risk management strategies that are informed by trial results and their associated metadata. Because these decisions increasingly depend on aggregating and re-using trial data from multiple sources, regions and years, their robustness is directly constrained by the quality, completeness, coherence and interoperability of the underlying data. This dependency provides the rationale for a data quality framework (DQF) that is specifically tailored to grains trials research.
Transforming data into valuable insights, and assuring that this data, along with the relevant contextual information (metadata), is stored and managed in a manner that ensures its clarity and accessibility to both current and future analysts, is equally crucial [26]. To maximise the benefits of trial data for the agricultural industry, users of trial data and information, as well as those who commission these trials, need to have confidence in the rigour of the science undertaken and the reported findings. To guide and inform users and creators of these qualities in data, an overarching standard methodology to assess and report them, otherwise known as a data quality framework (DQF), is required. Implementing a DQF will be a pathway to improve data quality, which in turn would build trust in and the usability of the data [27]. Using a DQF, organisations can define best practices in data collection and presentation for decision-making while enhancing their internal data management processes [28], ensuring a continuous improvement cycle for the generation of data [29]. Within agricultural systems, the specific domain for this article is grains trial research. Hence, this article aims to answer the following questions:
  • What are the fundamental principles of data quality and how can they be represented within a data quality framework (DQF) for grains trial research?
  • Which data quality dimensions are the most useful for assessing and reporting data quality in grains trial research and how do we define those within a DQF?
  • How can these quality dimensions be translated into practical tools, e.g., a trial data quality test or data quality statement that can facilitate improved data reuse and decision support from contributors and other end-users?
  • How can the proposed DQF and its tools be implemented in a platform such as Online Farm Trials (https://www.farmtrials.com.au/, 6 January 2026) and what can we learn from these use cases about their impact?
This paper adopts a structured, evidence-informed approach to answer these questions. First, a synthesis of established and contemporary data quality frameworks is used to identify the fundamental principles and dimensions of data quality that are most relevant to grains trial research (Section 2). Building on this synthesis, we conceptualise a data quality framework (DQF) that is tailored to the operational and analytical requirements of grains trials, defining its core components and their interactions (Section 3). The framework is then operationalised through two practical tools—a trial data quality test for contributors and a trial data quality statement for users—which are designed to translate quality dimensions into actionable and transparent assessments (Section 4.1). Finally, the applicability and implications of the proposed DQF are demonstrated through its application to the Online Farm Trials platform, providing insights into implementation challenges, potential impacts on data reuse and trust and lessons for broader adoption (Section 4 and Section 5). Together, these elements show how a domain-specific yet transferable DQF can support more reliable, reusable and decision-ready agricultural trial data.

2. Data Quality: Its Dimensions and Principles for Grains Trials and Research

ISO 9001:2015 [27] (https://www.iso.org/standard/62085.html, 6 January 2026) defines quality as the degree to which a set of inherent characteristics fulfils a set of requirements. Data quality is defined in many ways, reflecting the ‘emphasis’ or ‘priority’ an organisation or person has for the ownership, distribution or use of the data. While data quality or information quality has various definitions, fitness for use, or potential use, is the most widely accepted [23,30,31,32,33]. The notion of fitness for purpose has also been noted, with quality labelled a multidimensional concept that also includes aspects such as relevance and interpretability [34].
Poor data quality can compromise trial research outcomes and decision-making by farmers where erroneous results or conclusions are formed. This can have major financial and reputational impacts on organisations delivering trials research. Poor or absent annotation of units of measurement (e.g., kg/ha; t/ha; lb/ac; hectograms/ac; g/m2) is one example where assumptions, or incorrect assignment, may lead to significant errors in analysis, modelling and conclusions. With the increased prevalence of machine learning and technologies, including sensors, data quality assurance for grains trial data will be even more critical to minimise error and bias in predictions and decisions, and to avoid compounding errors made on such data.
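The units-of-measurement problem above can be sketched in a few lines: normalising every yield observation to a canonical unit (kg/ha) before analysis makes a missing or unrecognised unit annotation fail loudly rather than silently corrupt downstream results. The function and dictionary names here are hypothetical, not part of any OFT implementation.

```python
# Hypothetical sketch: normalise yield observations to kg/ha before analysis,
# so mixed or missing unit annotations are caught early rather than assumed.
TO_KG_PER_HA = {
    "kg/ha": 1.0,
    "t/ha": 1000.0,
    "lb/ac": 1.12085,   # 1 lb/acre ≈ 1.12085 kg/ha
    "g/m2": 10.0,       # 1 g/m² = 10 kg/ha
}

def normalise_yield(value, unit):
    """Convert a yield observation to kg/ha, failing loudly on unknown units."""
    try:
        factor = TO_KG_PER_HA[unit]
    except KeyError:
        raise ValueError(f"Unannotated or unsupported unit: {unit!r}")
    return value * factor

print(normalise_yield(2.5, "t/ha"))  # → 2500.0
```

Rejecting unknown units outright, rather than defaulting to an assumed unit, is the design choice that prevents the "incorrect assignment" errors described above.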
A structured review and synthesis of foundational and contemporary data quality frameworks were conducted. Appendix A summarises the key publications analysed, representing four decades of work on data quality. Each dimension mentioned in these studies was extracted, tabulated and compared in terms of frequency, conceptual similarity and relevance to agricultural experimentation. Dimensions that appeared consistently across multiple independent frameworks (e.g., accuracy, completeness, timeliness, accessibility) were identified as core data-quality constructs. Other dimensions were evaluated for their applicability to grains trials, based on characteristics that are unique to agricultural research, including variability in trial methodologies, dependence on metadata, spatial–temporal comparability and the need for scientific interpretability.
Through this synthesis, seven dimensions emerged as both theoretically well-established and practically essential for assessing the utility of grains trial data: accessibility, accuracy, coherence, institutional environment, interpretability, relevance and timeliness. These dimensions represent the intersection of the most frequently cited dimensions across the literature, as highlighted in Appendix A, and the requirements of grains trial research, where decision-support value relies not only on statistical soundness but on metadata richness, methodological transparency and inter-trial comparability [35]. The final set of dimensions therefore reflects a convergence of evidence from the literature and the operational needs of digital agriculture and, specifically, grains industry stakeholders.
To deliver quality data for agriculture-related decisions, principles of quality should be applied to all phases of data management, including capture, digitisation, storage, analysis and accessibility. Improving data quality can be achieved by preventing errors in the data management process, but also by correcting data where it is required or found to be erroneous [36]. Reducing the potential for error in data management is considered a better solution than dealing with error detection (i.e., prevention is better than cure), which can be costly to implement and maintain. It is strongly recommended that a ‘vision’ is established and supported through a data ‘policy’ and implementation ‘strategy.’ Data quality then follows once these foundations for data quality assessment and reporting are established [31].

3. DQF Conceptualisation for Grains Trial Research

A framework for data quality includes all aspects of data quality management linked to the data quality vision. A proposed DQF for grains trials and experiments comprises different components (Figure 1), including a data quality strategy, data quality assessment (tool), reporting on data quality and a continuous cycle for data quality improvement. Here, it is assumed the data quality vision and policy has been set by industry and government authorities for regulatory and compliance purposes. A data quality strategy for grains trials is a formalised approach to address data quality that sets out the activities that the trial contributors should adopt to strengthen their approach to the collection, handling, use and dissemination of data and information. Data quality assessment tools are a set of measures to comprehensively assess data across quality dimensions for these trials and experiments, as listed in Table 1. Following the assessment of data quality, a ‘data quality statement’ is produced, which reports on key assessment criteria for data quality to inform contributors and users of trial data of relevant qualities for their purposes. The data quality report and statement then support and inform an over-arching continuous data quality improvement cycle to further promote and guide those in grains trials for all aspects of trial data quality management.
Data quality dimensions were adopted specifically for standardised, statistics-driven reporting. These dimensions also serve as the foundation of statistical collections that enable the production of high-quality outputs [34]. For implementation in grains trials, all seven adopted dimensions (Table 1) were considered for data quality assessment and reporting, as they form a comprehensive and relevant set of priorities for managing and improving data quality from grains trials research. The seven dimensions may not necessarily be equally weighted, as the importance of each dimension may vary, depending on the data source and context.
Accessibility: The concept of accessibility pertains to users’ convenience in reaching grains trial data, encompassing the simplicity of detecting information’s presence and the compatibility of the medium or format for information retrieval. This is equivalent to the findable and accessible principles of FAIR data [37].
Accuracy (and precision): The degree to which the data depict the phenomenon they were intended to measure is referred to as accuracy and precision (accuracy hereon). In grains trials, this is important to determine how accurately data reflect reality, which directly affects how helpful and significant the data will be in analysis and for interpretation, decision-making or subsequent research.
Coherence: In a broad context and over time, coherence refers to the internal consistency of a statistical collection, product or release, as well as its comparability with other sources of information. Coherence is used to confirm whether a dataset can be usefully compared with other spatially or temporally distant trials. Any elements that can alter the comparability of the data across time, such as weather conditions during the trial period or changes in the data or measurement types, must be described in data quality statements to ensure logical consistency.
Institutional environment: The institutional and organisational aspects of the trial contributor are referred to in this dimension. These quality aspects may have a substantial impact on the efficiency and authority of the organisation. The institution in which the data is generated should also be considered, as it allows for an evaluation of the external context, which may affect the data’s validity, reliability or suitability.
Interpretability: The availability of contextual information about the data collection process and data collected that provides insights into the data is referred to as interpretability. This contextual data could include (but not be limited to) description of the variables used and the availability of metadata including concepts, classifications and measures of accuracy. Availability of information regarding the data collected, such as soil measurements and observations (data) from various sample depths and paddock history aid in the interpretation of results and in decision-making.
Relevance: This dimension concerns how well the measured concept(s) and the represented population(s) suit the demands of the users of the data. It is crucial to consider the relevance of data, as it permits an evaluation of data fitness to address the issues that matter most to producers, agronomists, policymakers, researchers and the wider agriculture sector.
Timeliness: The term “timeliness” refers to the intervals between the reference period (to which the data apply) and the date at which they become available, in addition to the interval between the advertised date and the actual release date. These factors should be considered when evaluating quality, as significant gaps between the reference period and the data’s availability or between the anticipated and actual release dates may affect how current or reliable the data are or what can be inferred with confidence.

4. Action Case Study Using Online Farm Trials to Demonstrate Data Quality Assessment and Reporting

Online Farm Trials (OFT) is a web-based information system developed by the Centre for eResearch and Digital Innovation (CeRDI) at Federation University Australia, with funding from the Grains Research and Development Corporation (GRDC) [38,39]. Established in 2014, OFT aggregates over 7000 trial projects from over 90 contributors, ranging from government organisations and universities to grower groups, showing the diversity of data sources and formats in the grains sector [40,41]. Its primary purpose is to maximise access to current and past grains industry research data to improve the productivity and sustainability of Australian farming enterprises [38,39,40,42]. OFT functions through a Trial Explorer for searching and filtering trial projects, and a Map portal and linked library for accessing research documents [38]. Its central role in digitising legacy data, supporting the FAIR (findable, accessible, interoperable and reusable) data principles and addressing the historical challenges of data silos underlines its strategic importance in agricultural knowledge and decision-making systems [39,43,44].
The OFT platform serves as a robust case study on data quality in the agricultural sector, due to its scale, history and the complexities it embodies in managing agricultural data. However, there are some ongoing challenges, such as inconsistent data quality, varied metadata standards, and user trust issues, which highlight the real-world barriers to achieving high data integrity and interoperability [41,42]. These characteristics make OFT a practical case study for exploring how digital platforms in agriculture can both succeed and struggle in ensuring data quality across a fragmented (because organisations have their own data collection and management system) and evolving research ecosystem.
Despite the growing utility of the OFT system in Australian agriculture, significant concerns around the quality, completeness and trustworthiness of its data continue to hinder its wider adoption and utility. These challenges stem from inconsistencies in data contribution practices, lack of standardisation and issues in ensuring long-term usability and interoperability.
The main challenges related to data quality include the following:
  • Inconsistent and incomplete trial data submissions. Many trials have missing metadata or incomplete documentation, impacting their utility and trustworthiness for research and decision-making purposes [41,42]. Many legacy trial reports, for instance, do not contain all the necessary information to generate searchable metadata within OFT [20]. An interview participant noted, “Sometimes the data isn’t complete that you access, sometimes it’s hard to get the exact parameters for which the data is gone in …” [41].
  • Lack of standardisation across trial contributors. Contributors vary widely in how they collect, structure, report and label data, which impedes comparison and synthesis across trials [39,42]. An interview participant highlighted, “Good scientific rigour is that you should question all the data and make sure the statistics are right and the methodology makes sense to you to know yourself…you’ve got to take responsibility to know whether you trust the data or not” [42].
  • Difficulty in assessing comparability and scientific validity. Users face challenges in comparing trials due to inconsistencies in methodologies, limited access to raw data and the absence of clear, uniform quality ratings [38]. As one interviewee explained, “Unless there is the capacity to go back to the raw data then how do you check the scientific validity of what is being available. You need this to be sure in what you are deciding and whether what you are looking at in the summaries captures what the raw data found” [38].
  • Perceived trust and quality issues. Users repeatedly expressed concern about the accuracy and reliability of the available data, which affects their willingness to use OFT for critical decisions [42]. An interviewee made this statement about the quality of data and the trust in using it: “I think that, before you can use the data from OFT to make decisions or include it in planning, you need to have some way of making a choice about what the evidence is. Some way to indicate clearly level of quality and how reliable it really is. Then you can use it or not use it or use it with conditions” [38]. While enabling contributors to directly enter data improves efficiency, it has introduced challenges in maintaining consistent quality control, especially for non-mandatory metadata fields, as these may not be adequately monitored before publication [20].
  • Challenges with metadata and FAIR Principles. While OFT aligns with FAIR data principles, the implementation varies, particularly in metadata quality and interoperability across systems [41,44]. Enhancing the FAIRness of data, especially by improving interoperability and advancing along the machine-readability continuum, remains a significant challenge, yet it also offers a valuable opportunity for OFT and the wider grains and agricultural research community to boost data accessibility, integration and impact [43].

4.1. Proposed Reporting on Trial Data Quality

The goal of the proposed DQF for OFT is to provide users with a standard for assessing the ‘qualities’ of a trial and the degree to which it is fit for their intended purpose. For contributors, it is to identify areas for improvement in data submissions, as ultimately users will define if the data qualities fulfil their needs and requirements (reuse and utility). Devising and implementing an inclusive yet accurate scheme for the assessment of trial quality involves many elements, including understanding and communicating the importance of data quality principles to trial contributors; devising tests to determine if data quality is being upheld; providing guidance on best practice and supporting trial contributors to achieve improved data quality governance and control.
Two tools for data quality assessments are proposed—a trial data quality test (primarily for data contributors) and a trial data quality statement (for data users). These tools provide information on data qualities assessed for contributors of the submitted trial data and those seeking to use the data for decision support purposes. A summary report combining the data quality test (Section 4.1.1) and data quality statement (Section 4.1.2) is presented as Figure 2.

4.1.1. Trial Data Quality Test

The proposed trial record data quality test questionnaire (Appendix B) is divided into five sections, based on filters of increasing complexity, progressing from within variable, between variables, between records and between tables to between systems [45]. There are 65 data quality questions (including 102 sub-questions), with the result for each of the 65 questions being a pass, fail, warning or unchecked/missing data. All questions are equally weighted in this test. This validation of the quality of trial data provides contributors (and users) with a standard for assessing the ‘qualities’ of a trial that is desired for their purpose. An example output containing the ‘data quality test’ summary is provided in Figure 2, where of the 65 questions assessed, 59 were passed, 2 were failed and 4 were unchecked; warnings were issued for 8 of the passed questions. The total score is the number of questions passed without warnings, expressed as a percentage of all questions. In this example, 51 questions were passed without warnings, achieving a total score of 78% [(51/65) × 100]. This score is then assigned a categorical ranking based on the proposed scheme: 0–20% = very low; 20–40% = low; 40–60% = moderate; 60–80% = high; 80–100% = very high.
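The scoring and ranking scheme described above can be expressed as a short function. This is an illustrative sketch of the worked example only: the function and parameter names are hypothetical, and the category boundaries follow the scheme stated in the text (treating each upper bound as inclusive).

```python
# Illustrative sketch of the trial data quality test scoring described above.
def score_quality_test(passed, failed, unchecked, warnings, total=65):
    """Return (percentage score, categorical ranking) for a quality test."""
    assert passed + failed + unchecked == total  # every question has a result
    clean_passes = passed - warnings             # only warning-free passes count
    pct = round(clean_passes / total * 100)
    for upper, label in [(20, "very low"), (40, "low"), (60, "moderate"),
                         (80, "high"), (100, "very high")]:
        if pct <= upper:
            return pct, label

# Worked example from the text: 59 passed, 2 failed, 4 unchecked, 8 warnings.
print(score_quality_test(passed=59, failed=2, unchecked=4, warnings=8))
# → (78, 'high')
```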

4.1.2. Trial Data Quality Statement

The proposed trial data quality statement is based on questions from the seven data quality dimensions (Table 1). There are five questions for each dimension and, for each question, yes (✓) = 1 and no (X) = 0, with a maximum possible score of five for each dimension. The highest cumulative score is 35 (seven quality dimensions with a maximum score of five each). As with the data quality test, all questions in this statement are equally weighted, with the score thresholds defined by expert judgement. The overall score determines the data quality level rating for the trial, with the ratings being as follows: low (0–14), low–medium (15–21), medium–high (22–28) and high (29–35). The scores are presented as scale bars for each dimension (Figure 2).
Table 2 lists the five assessment questions for each of the seven data quality dimensions used to fulfil the trial data quality statement. The statement can be provided as a separate reportable or executable that a user can access should they choose to look at the questions and how well the trial meets their quality needs.
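The statement's rating logic can be sketched compactly. This is an illustrative implementation under the scoring rules stated above (five yes/no questions per dimension, thresholds read as inclusive upper bounds); the function name and dictionary structure are assumptions, not part of the OFT platform.

```python
# Illustrative sketch of the trial data quality statement rating: seven
# dimensions, each scored 0-5 from five yes/no questions, summed to 0-35.
DIMENSIONS = ["accessibility", "accuracy", "coherence",
              "institutional environment", "interpretability",
              "relevance", "timeliness"]

def rate_statement(dimension_scores):
    """dimension_scores: dict mapping each dimension name to a 0-5 score."""
    assert set(dimension_scores) == set(DIMENSIONS)
    total = sum(dimension_scores.values())  # 0..35
    if total <= 14:
        return total, "low"
    elif total <= 21:
        return total, "low-medium"
    elif total <= 28:
        return total, "medium-high"
    return total, "high"

example = dict.fromkeys(DIMENSIONS, 4)      # 4 of 5 questions met per dimension
print(rate_statement(example))              # → (28, 'medium-high')
```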

4.1.3. Process for Improving Data Quality in OFT

Based on the trial data quality test and the data quality statement, improvements are suggested to the system as part of the data’s continuous improvement cycle, including:
  • Expand the minimum trial metadata required for admission of a trial into OFT to include additional mandatory fields, enabling detailed and uniform assessment of trial utility. These additional data entry fields include the following: improved trial location coordinates, crop variety information (compliant with industry standards), trial design details (experimental plot design, number of replicates, blocking, plot randomisation and plot size), harvest and sowing dates, soil sampling details (date sampled, tests performed and sample depth), sowing details (tillage type and depth, row spacing, sowing depth), harvest details and treatment details (herbicide/insecticide/fungicide used, application timing and rate of application).
  • Provide more information about the trial contributor to check against the institutional environment assessment criteria.
  • Identify trial locations with a unique identifier, so multiple trials conducted at one site can be linked.
  • Implement common and globally accepted vocabulary and data standards for trial data (e.g., sowing rate as kg/ha rather than use of legacy measures and observations such as lb/acre).
  • Automate data input from key technologies (e.g., geographic information systems (GIS), GPS, online structured data collection forms) to remove transcription errors, enable real-time data processing and reduce the likelihood of other quality issues arising in data collection processes.
  • Validate dates and their logical sequence (e.g., sowing date prior to harvest date).
  • Assess trials according to their classification. A key consideration for grains trials is that not all trials have been designed with multiple users and purposes in mind, and a framework should not seek to discredit a trial for its intended purpose. For example, quality assessments will be different for demonstration trials compared with research trials.
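The date-sequence validation suggested in the list above can be sketched as a small check that returns human-readable errors for a contributor to correct before submission. The function and field names are hypothetical, and the assumption that sowing falls within the stated trial year would need refinement for cropping calendars that span a year boundary.

```python
# Minimal sketch of the suggested date-sequence validation: sowing must
# precede harvest, and sowing should fall within the stated trial year.
from datetime import date

def validate_trial_dates(sowing, harvest, trial_year):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    if sowing >= harvest:
        errors.append("Sowing date must precede harvest date.")
    if sowing.year != trial_year:
        errors.append(f"Sowing date {sowing} is outside trial year {trial_year}.")
    return errors

print(validate_trial_dates(date(2023, 5, 10), date(2023, 12, 1), 2023))  # → []
```

Returning all errors at once, rather than failing on the first, lets a submission form surface every problem to the contributor in a single pass.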

4.1.4. Validation and Refinement of the Proposed DQF

Future investigations will focus on validating and refining the proposed DQF through systematic stakeholder engagement and testing. This will involve consultation with growers, grower groups, researchers, industry organisations and data custodians to ensure the framework’s assessment criteria, scoring thresholds and metadata requirements align with practical needs and industry expectations. Pilot implementation of the DQF across a representative set of grains trials will enable empirical calibration of scoring thresholds, including the weighting of questions used in the data quality test and statement, the evaluation of usability and the identification of opportunities to streamline or enhance the assessment tools. Insights gained from this process will inform iterative improvements to the framework and support the development of guidelines, training materials and implementation pathways for broader adoption across the grains sector and other agricultural industries.

5. Discussion

There are many varied techniques for assessing data quality, offering numerous solutions for improving it. However, despite DQF methodologies accepting that data, and indeed data quality, needs will vary from organisation to organisation, there is consensus that the DQF methodology chosen should be defined by the purpose of the data. In other words, when it comes to DQFs, form must follow function.
In a review of 12 general DQFs, it was found that, whilst there were differences in the DQFs reviewed, the qualities of completeness, timeliness and accuracy appeared to be common across all [47]. These three dimensions are all integrated into the expansive DQF for grains trial research proposed here. It is also recognised that DQFs are not always generalisable and that specific domains may require special-purpose frameworks [47]. A DQF applied to grains trials will enhance the statistical relevance and reliability of the data, as it will provide an interpretation of data that is relevant to a series of key parameters, immediately informing users of potential trial data qualities. The DQF will improve system quality controls, transparency, usability and trial resource presentation in response to end-user requirements, which will in turn build trust in trial outputs. The trial data quality statement and the trial data quality test both provide assessments of the intrinsic qualities of the presented trial data. It is anticipated that a DQF will increase trust in the data and hence maximise the use of data to create valuable insights to benefit global agrifood production. The trial data quality tests can also guide organisations that are conducting or managing trials on principles to improve the rigour and utility of their investment in trials.
Improved data quality can facilitate open data sharing and cross-collaboration, which are arguably cornerstones for advancing research in this domain. To benefit from the volume of data produced today, researchers must share it in ways that promote interpretability and usability [48]. The DQF will also make it easier for researchers to assess a dataset and readily identify defective data for correction, ensuring high-quality datasets in use and reuse. Whilst high quality should be the aspiration for all data and collections, it is of particular importance for open-source data. It can be assumed that researchers and organisations who are aware that their data will be open-source, and potentially seen and judged by their peers, are incentivised to conduct higher-quality research that produces higher-quality data. High-quality data creates trust, and open sharing of high-quality, trusted data not only allows the verification of results but also reduces the need for duplication, allowing others to build on the studies that have come before. However, this does not guarantee translation into higher-quality research [49].
Based on the proposed DQF for grains trials, best practice would require adherence to a minimum standard to ensure consistency, utility and interoperability with other data, enabling future trial analysis and comparability [50]. Data quality is at the core of data-driven decision-making; hence, improvements in data quality and its monitoring will be an integral part of future research directions for global agriculture [51,52]. One potential research direction on the impact of data quality on decision-making is to investigate the application of ML algorithms to automate the detection of data quality issues and suggest improvements [53]. To maintain the authenticity and reliability of trials data, blockchain technology may offer solutions that enhance data quality, integrity and data trust [54,55]. The volume of data collected on farms is increasing every day through the use of various proximal and in-field sensors (e.g., soil moisture, weather and precision nutrient sensors), cameras, satellite platforms and drones. This big data brings unique challenges for which it is timely to prepare, including scalability, data cleaning, validation and integration from diverse heterogeneous sources [9,56,57]. With data being generated in real time, data quality and all its aspects, including monitoring, data cleaning and data validation, will also need to be managed in real time to perform the data analytics that are central to digital decision systems [54].
As data quality is defined as the degree to which a set of inherent characteristics of data fulfils requirements [58], it is important to explore how data quality requirements and standards can be adapted to the specific context of data usage. Research should focus on evaluating the effectiveness, efficiency and scalability of various data quality tools and techniques to help users choose the most suitable solutions for their data and purpose. Further consideration must be given to the opportunities and challenges of high-quality open-source data in informing current AI algorithms, as well as to future disruptive technologies. Another area that requires further investigation is that of AI capabilities in relation to high-quality data, particularly high-quality open-access data. AI capabilities have increased dramatically in recent years, with AI algorithms now able to mine raw data from text to images. This raises questions around the ethics of AI for data mining, which may affect the rate of hallucinations, fabrications and misinformation derived from mined data [59]. Further research will need to be conducted in this space in relation to high-quality open-source data.
Tools are being developed to assist in this process of data standardisation and harmonisation, making it easier to identify data quality issues in a given dataset. For example, the Data Risk Assessment Tool (DRAT), which helps to create appropriate data classification categories to identify and tag sensitive data and to determine the privacy associated with data [60], is now being adapted to provide a quality assessment of data. A robust data quality assessment means that the DRAT can support interoperability in the data harvested by data repositories. Whilst tools such as the DRAT are valuable, the effective use of any data quality tool will require integration with vocabularies across sectors and standardised practices, particularly in data curation, together with additional training and upskilling.
The proposed DQF can be systematically replicated because it operationalises data quality through explicit, standardised assessment questions that can be applied to any structured agricultural trial dataset. Both components of the framework—the data quality test and the data quality statement—provide clearly defined questions, scoring rules and decision criteria that enable independent users to reproduce the assessment process without reliance on subjective judgement or local context. While the framework has been designed for grains trials, its structure is intentionally non-domain-specific, allowing for the same questions and scoring methodology to be adapted for other agricultural enterprises such as cotton, sugarcane, horticulture or pasture systems. By adjusting crop-specific metadata fields while retaining the seven core dimensions and their underlying assessment logic, researchers and industry groups can replicate the DQF to evaluate the quality, completeness and comparability of their own trial data [35]. This transferability ensures that the framework not only strengthens data quality governance within the grains sector but also provides a scalable model for broader agricultural research domains seeking consistent and transparent data quality evaluation.
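As a concrete illustration of how the framework's standardised questions and scoring rules could be operationalised, the sketch below aggregates weighted yes/no answers into per-dimension scores and an overall rating. The weights, thresholds and rating labels are illustrative assumptions only; the calibrated values would come from the pilot implementation and stakeholder testing described earlier.

```python
# Minimal sketch of aggregating the trial data quality test's yes/no answers
# into per-dimension scores and an overall rating for the data quality
# statement. Question weights and thresholds are illustrative assumptions,
# not the empirically calibrated values the framework anticipates.

def dimension_score(answers):
    """answers: list of (passed, weight) pairs for one quality dimension."""
    total = sum(weight for _, weight in answers)
    passed = sum(weight for ok, weight in answers if ok)
    return passed / total if total else 0.0

def overall_rating(dimension_scores, high=0.8, medium=0.5):
    """Map the mean dimension score onto a simple three-level rating."""
    mean = sum(dimension_scores.values()) / len(dimension_scores)
    if mean >= high:
        return "High"
    if mean >= medium:
        return "Medium"
    return "Low"

# Two of the seven core dimensions, with hypothetical weighted answers.
scores = {
    "accuracy": dimension_score([(True, 2.0), (True, 1.0), (False, 1.0)]),  # 0.75
    "timeliness": dimension_score([(True, 1.0)]),                           # 1.0
}
print(overall_rating(scores))  # → High
```

Because the questions, weights and thresholds are explicit parameters, independent users can reproduce an assessment exactly, which is the replicability property argued for above.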
To illustrate the transferability of the proposed data quality framework (DQF), an example from horticultural trial research is used.
In horticultural systems, trials often involve high-value crops, shorter production cycles, multiple harvest events and greater sensitivity to microclimate and management interventions. Applying the DQF in this context would retain the same seven core data quality dimensions but would require adaptation of metadata requirements and assessment questions. For example, minimum metadata would need to capture cultivar-specific phenology, harvest frequency, post-harvest handling conditions and quality attributes such as size, colour or sugar content, in addition to yield. Coherence and interpretability assessments would place greater emphasis on temporal alignment across multiple harvests and on consistent quality-grading standards. Timeliness thresholds may also be tightened, as decision value in horticulture is often seasonally constrained and market-driven. No structural change to the DQF would be required; only domain-specific metadata fields, vocabularies and scoring thresholds would be adjusted.

5.1. Lessons Learned

The DQF proposed in this paper has relevance to many areas of agriculture where data-driven decisions will be central to future farming. An example is the Australian Agriculture Sustainability Framework (https://aasf.org.au/, accessed on 6 January 2026), where agricultural data in a broader data ecosystem will be central to the indicator assessments, measures and metrics required for reporting purposes. Without quality-assured data, the underpinnings of the AASF are tenuous and the policy implications are significant. Likewise, for agricultural market value chains, quality-certified data will be necessary to ensure that all stakeholders have transparency and trust in the outcomes and decisions built on underlying data from various agricultural sources. Precision agriculture, as another example, is dependent on reliable data, where imprecision and inaccuracy can have detrimental effects on crop and animal performance and impose economic costs on farmers and food consumers. Calibration, interpretation, data validation, and the handling, standardisation and processing of vast quantities of data [61] are some of the key management issues impacting data quality. Adopting digital technologies to reduce potential error in data handling can enhance the qualities of the trial data shared, providing more trust in the data for users and greater confidence from contributors in the reuse of this agricultural research.
It is necessary to ensure that the use and reuse of data can be assured, with confidence in its qualities and potential utility clearly explained. For data contributors and users, the benefits of sharing data via open-access platforms such as OFT support the potential for cross-collaboration among researchers and industry, as interested parties can now discover that the trial and its data exist and have a point of contact to engage on the research and areas of interest. Collaborative working is more likely when high-quality, trusted data is available, and open sharing platforms such as OFT can increase the scrutiny of data and ultimately lead to improved data accuracy and reliability. Furthermore, an open, high-quality data sharing platform allows access to data across multi-disciplinary domains, especially for researchers in under-resourced institutions or countries, which in turn can encourage novel applications and interdisciplinary insights. This provides a pathway to begin engagement with data sharing and reuse, where the qualities of the data will set the level of trust in its reuse and interpretation for other purposes. In current and future decision support systems, data quality is often assumed and rarely questioned. Given the growing dependency on good-quality agricultural data for future decisions, it is time for a broader discussion on the qualities of data used in global agrifood production and security.
The implementation of a DQF is not a path that, once trodden, is necessarily easy; rather, it will require ongoing support, conviction and a concise, clearly conveyed need that is understood by all stakeholders in the data's use. Enhancing data quality requires stronger co-design processes involving data contributors to develop standardised reporting templates and metadata protocols, ensuring uniformity across all submissions [20,38,39,40]. Addressing potential barriers to adoption, such as resistance to new practices or a lack of training, would make the framework more practical and actionable. Stakeholders (e.g., researchers, farmers, industry contributors) will need training and support, combined with increased industry engagement and visibility, to understand why data quality matters, how they can contribute to improved data quality and how their needs and uses connect to the quality assessments delivered through a DQF [39,41,42]. While there may be resistance to adopting a pathway for improved data quality, the growing demand from stakeholders across food supply chains for quality-assured and controlled data, for purposes such as traceability and biosecurity, will be driven by regulation and market access [62,63]. Furthermore, continuous improvement through dedicated service teams and clear guidelines for contributors, coupled with exploring options for mandated data uploads, has also been suggested to ensure sustained data quality and consistency [20,38,42].

5.2. Limitations

It must be noted that whilst this is a proposed framework for data quality assessment and communication with users and contributors of such trial data, there are limitations that need to be considered when reading this paper, most notably that the framework has not been validated at a large scale. Furthermore, OFT is a grains trial portal that specifically references growing regions and crop types but, by the nature of its scope, does not include every grain type or cropping region outside of Australia. Current data quality practices within grains trial research typically rely on implicit, ad hoc and highly variable procedures rather than a unified or standardised approach. The proposed DQF therefore represents a shift from these informal practices toward an explicit, structured and multi-dimensional assessment framework. However, as the DQF has not yet been implemented or tested operationally, a quantitative before-and-after evaluation of its effectiveness could not be undertaken in the present study. This type of empirical validation will form an important component of future work.

6. Conclusions

This study explicitly addresses the four research questions posed in the introduction and demonstrates how a data quality framework (DQF) can be systematically designed, operationalised and applied to grains trial research. The synthesis of existing frameworks in the data quality literature (Appendix A) has grounded the selected dimensions of accessibility, accuracy, coherence, institutional environment, interpretability, relevance and timeliness to represent the data quality needs and demands of grains industry stakeholders. The underlying fundamental principles of data quality that are applicable in this context are fitness-for-use, transparency and comparability.
The seven selected dimensions provide a scientifically defensible and practical basis for assessing grains trial data, as they directly support trial data reuse, synthesis and decision-making. The formation of this framework using these dimensions is translated into practical tools through a trial data quality test and a trial data quality statement, delivering explicit, reproducible assessments that support both contributor improvement and user trust. Finally, the application of the DQF within the Online Farm Trials platform demonstrates how such a framework can be implemented in a large-scale, heterogeneous data environment, and why this approach has scientific merit. It replaces ad hoc judgement with transparent, repeatable evaluation and enables continuous improvement through clearly defined standard assessment and reporting mechanisms.
While the framework is tailored to grains trial research, its structure allows it to serve as a transferable template for other agricultural industries by retaining the core dimensions while adapting domain-specific metadata, vocabularies and thresholds. The recommended next steps include large-scale testing of the framework with stakeholders, refinement through the proposed continuous improvement cycle and formal comparative evaluation against existing data management practices. Future work will involve implementing the DQF with trial contributors, assessing data quality before and after adoption, and quantifying improvements in completeness, consistency, interpretability and usability, relative to current ad hoc approaches. Such validation will provide evidence-based confirmation of the framework’s effectiveness and support its broader adoption across data-intensive agricultural research systems.

Author Contributions

Conceptualisation, N.R.; Methodology, N.R. and A.C.; Writing—original draft, N.R., A.C. and J.C.; Writing—review and editing, N.R., A.C. and J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Table of Data Quality Dimensions Used (X) in Key Publications on Data Quality (1983–2015)

Bailey and Pearson (1983) [64]; Ives et al. (1983) [65]; Ballou and Pazer (1985) [66]; Delone and McLean (1992) [67]; Wand and Wang (1996) [30]; Wang and Strong (1996) [68]; Redman (1997) [69]; Jarke (2002) [70]; Veregin (1999) [23]; Bovee (2003) [71]; Fisher and Kingma (2001) [72]; Pipino et al. (2002) [73]; Chapman (2005) [31]; Herzog et al. (2007) [74]; ABS (2009) [34]; Moges et al. (2013) [75]; Jayawardene et al. (2015) [76]
ACCESSIBILITY X X X XXXX X
ACCURACY AND PRECISIONXXXXXXXXXXX XXX X
ALIGNMENT X
APPROPRIATE AMOUNT OF DATA X XX X X
AVAILABILITY X X
BELIEVABILITY X X X
COMPARABILITY X
COMPLETENESSXXXXXXXXXXXXXX XX
CONCISE REPRESENTATION X X X
CONSISTENCY/REPRESENTATIONAL CONSISTENCY XXXXXXXXX XX X
CREDIBILITY X X X
EASE OF UNDERSTANDING X X X
FITNESS FOR USE X
FREE OF ERROR X
INTERPRETABILITY XXX X XX XXX
OBJECTIVITY X X
PORTABILITY X
RELEVANCY/RELEVANCEXX X XX XXXXXX
RELIABILITY XX X
REPUTATION X X X
RESPONSIVENESS/RESPONSE TIME X
SECURITY/ACCESS SECURITY X X X
TIME-RELATED DIMENSIONS (TIMELINESS)XXXXXXXX XXXXXX X
TRACEABILITY X
VALUE-ADDED X X

Appendix B. Questions for Assessment of Trial Data Quality

Within variable: Within variable tests a data value for its compliance with a domain (e.g., does a numerical value fall within a defined range; does a categorical value comply with a pre-existing reference list or standard), including when a missing value is appropriate.
Trial Dataset | Data Quality Questions | Data Quality Tests
1. Data resource1a. Does a disclaimer exist for the trial, including independence of the organisation?1a i. Disclaimer (Yes/No).
1a ii. Organisation classification.
1b. Has all treatment data been published/disclosed?1b i. Tick box (Yes/No).
1c. Does the organisation have satisfactory data quality control procedures and trained personnel?1c i. Organisation quality control procedures (Yes/No).
1c ii. Organisation with statistically trained personnel (Yes/No).
1d. Does the organisation have procedures in place to manage confidentiality and change in trial status?1d i. Confidentiality procedure (Yes/No).
2. Trial dates, date loaded, published and last processed 2a. What is the reference period for the trial (e.g., 2017 winter crop), and when was it loaded into OFT?2a. Difference between growing season/year and date loaded into OFT (<5 = H, 5–10 = M, >10 = L).
2b. Is there likely to be subsequent data from the same trial project (completeness)?2b. Grouped trials or subsequent trial.
2c. Date published on OFT.2c. Published/Not published.
2d. Have there been modifications or revisions of the trial data?2d. Has data against a trial in OFT changed (Yes/No).
2e. Is the date sequence logical?2e. Date last processed > date loaded >= growing year.
3. Trial number3a. Cross-check system assigned trial numbers with published number, as each trial on OFT is assigned a trial number.3a. Does system trial number = published number.
4. Access and display level4a. Is the trial data publicly accessible via OFT?4a. Tick box (Yes/No).
4b. Is the trial data accessible to organisations/individuals with different levels of permitted access?4b. Organisation control/grant permission access, e.g., login controlled profile.
4c. If the data is not publicly accessible, how is that represented/presented to users?4c. Directs to the trial contributor with their contact details.
5. Trial project code5a. Is the project code a valid project code for the funding organisation or trial contributor?5a. Cross-check funding organisation’s project code or trial contributor’s project code.
6. Trial project name6a. Does the project name align with the project code?6a. Check with the funding organisation/trial contributor’s project code and project name.
6b. Is the project name a valid project name for the funding organisation or trial contributor?6b. Cross-check funding organisation’s project name or trial contributor’s project name.
7. Grouped trials 7a. Trials should be grouped if trials are assigned the same project code. 7a. If project code = project code, group trials.
8. Trial site name 8a. If a trial site is used for >1 year, trial site is to be allocated to both with year postfix.8a. Example- location a _year 1,
location a _year 2.
8b. If a trial site is used for ≤1 year, all corner co-ordinates of the plot are documented.
A location of sequential assignment is added to document for cause and effect.
8b. System check for co-ordinates of all corners of the plot.
9. Locality9a. Does the locality exist? Is the spelling correct?9a. System check of locality name.
10. Co-ordinates10a. Latitude: Check for entry error in latitude, including what datum is used for this set of coordinates?10a. System check for latitude.
10b. Longitude: Check for entry error in longitude including what datum is used for this set of coordinates?10b. System check for longitude.
11. Trial site accuracy11a. Check that a class has been designated to reflect the accuracy of the trial site (i.e., nearest town or centre of the region)11a. Check for missing data.
12. State or territory12a. Spatial checks and queries against coordinates. These can be auto populated.12a. Spatial reference can be auto populated.
13. GRDC region13a. Check that a GRDC region has been allocated.13a. Check for missing data (e.g., Southern region).
14. GRDC sub-region14a. Check that a GRDC sub-region has been allocated.14a. Check for missing data (e.g., Mallee).
15. Researchers15a. Confirm key researchers.15a. Cross-reference with the funding organisation/trial contributor.
15b. Confirm spelling of researchers’ names.15b. Researcher spell check.
16. Related programs 16a. Check if any related program is listed.16a. All trial projects under a related program should be grouped together.
17. Lead research organisation17a. Check against the organisation database.17a. Cross-reference with organisation database.
17b. Check funding body/trial contributor database or contracts.17b. Cross-reference with funding body/trial contributor database.
18. Host research organisation18a. Check against the organisation database.18a. Cross reference with organisation database.
18b. Does the organisation have satisfactory data quality control procedures and trained personnel?18b i. Organisation quality control procedures (Yes/No).
18b ii. Organisation with statistically trained personnel (Yes/No).
18c. Does the organisation have procedures in place to manage confidentiality and change in trial status?18c i. Confidentiality procedure (Yes/No).
19. Other trial partners19a. Check against the organisation database.19a. Cross-reference with organisation database.
20. Funding sources20a. Check against funding sources.20a. Cross-reference with funding body/trial contributor database.
20b. Sum of funding allocation of all sources = 100%.20b. Sum of % test = 100 (Yes/No).
21. Trial aim 21a. Trial aim must be populated and spell-checked.21a i. Populated (Yes/No).
21a ii. Spell check.
21b. Cross reference with funding body or trial contributor reports/database to confirm aim/research question is comparable.21b. Cross-reference with funding body/trial contributor reports/database.
22. Key message22a. Key message must be populated and spell-checked.22a i. Populated (Yes/No).
22a ii. Spell check.
23. Acknowledgements23a. Spell check of inserted text.23a. Spell check.
24. Internal notes (login-controlled profile)24a. Spell check of inserted text.24a. Spell check.
25. Public notes25a. Spell check of inserted text.25a. Spell check.
26. Hyperlink26a. Regular test to confirm hyperlink/website and web content still exists.26a. Web link test.
27. Trial affected by adverse factors27a. Check if the trial has been affected by adverse factors (e.g., frost, flooding).27a. Populated (Yes/No).
If yes, test 27b should be conducted.
27b. What adverse factors affected the trial and the details of events that may influence the reliability of results (dates, damage estimate, crop growth stage at time of damage, etc.).27b. Spell check of inserted text.
28. Linked trials 28a. Cross-reference for trials at the same trial site in the same growing year.28a. Trials at same site (co-ordinates) in the same growing year.
28b. Cross-reference for trials of the same program and project code.28b. Trials of the same program/project.
28c. Cross-reference for trials at the same trial site through time.28c. Trials at same site (co-ordinates) but with different growing years.
29. Feature trial 29a. Feature trial must be in the set of linked trials.29a. Feature trial = in listed linked trials.
30. Crop type30a. Crop type = one or multiple from the defined list of crop types.
(defined list of crop types should reflect global standards).
30a. Populated (Yes/No).
31. Crop type variety31a. Variety information entered for crop type.31a. Populated (Yes/No).
31b. Crop type variety matches existing national and international standards, e.g., BRAPI, NVT, PBR.31b. Cross-check variety with BRAPI, NVT, etc.
31c. ‘Unreleased’ varieties linked to a variety vocabulary (managed by GRDC).31c. Cross-reference unreleased varieties with variety library
(https://www.ipaustralia.gov.au/plant-breeders-rights (accessed on 6 January 2026)).
32. Treatment type32a. Treatment type(s) entered for the trial. 32a. Populated (Yes/No).
32b. Do treatment types comply with global standards.32b. Cross-check BRAPI and crop ontology standards.
33. Sow rate 33a. Sow rate entered for each crop type. Should be a required field. 33a. Populated (Yes/No).
33b. Values are numeric and in kg/ha.33b. Validation for numeric characters.
34. Target density34a. Values are numeric and in plants/m2 (note: method to assess plant density is to be recorded in metadata, e.g., quadrats).34a i. Validation for numeric characters.
34a ii. Metadata note regarding the method used to assess plant density.
35. Sowing machinery35a. Sowing practice classified using international standards, e.g., direct drill, disc seeder, conventional tyne, point type, harrow chains, press wheel, etc.35a. Populated (Yes/No).
36. Sowing date36a. Sowing date entered for the trial.
If trial does not have a sowing date, then “not applicable” should be entered.
36a. Populated (Yes/No).
36b. Values for sowing date in standard time–date format (dd-mm-yyyy).36b. Validation for date format.
37. Sowing depth37a. Values are numeric and “mm depth from the soil surface”.37a i. Validation for numeric characters.
37a ii. Values converted from other measurement units, e.g., cm or inches to mm.
38. Harvest date38a. Harvest date entered for the trial.
If trial does not have a harvest date, then “not applicable” should be entered.
38a. Populated (Yes/No).
38b. Values for harvest date in standard time–date format (dd-mm-yyyy).38b. Validation for date format.
38c. Harvest date should be after the sowing date.38c. Harvest date > sowing date.
39. Plot size39a. Dimensions of plots in numeric metres, e.g., width = 2.40 m, length = 12.00 m.39a. Validation for numeric characters up to 2 decimal places.
40. Plot replication40a. Number of replicates (numeric, as a whole number).40a. Validation for numeric characters and whole number.
41. Plot randomisation41a. Yes/No test. If there is blocking and multifactorial then randomisation is required. 41a. Populated (Yes/No).
42. Plot blocking42a. Number of blocks (numeric as a whole number).42a. Validation for numeric characters and whole number.
43. Trial design43a. Trial design layout file provided as attached file in OFT.43a. Trial design layout file attached (Yes/No).
44. Paddock history44a. Year by treatment (crop, fertiliser, etc.).44a. Populated (Yes/No).
45. Details of fertiliser(s) used45a. Fertiliser used (Yes/No).45a. Populated (Yes/No).
If yes, test 45b should be conducted.
45b. Details of fertiliser(s) used:
fertiliser type(s), rate(s) of application, date(s) of application, method of application.
45b. Standard fields—fertiliser type, rate of application (kg/ha), date of application (dd-mm-yyyy), application method.
46. Details of fungicide(s) used46a. Fungicide used (Yes/No).46a. Populated (Yes/No).
If yes, test 46b should be conducted.
46b. Details of fungicide(s) used:
fungicide type(s), rate(s) of application, date(s) of application, method of application.
46b. Standard fields—fungicide type, rate of application (kg/ha), date of application (dd-mm-yyyy), application method.
47. Details of herbicide(s) used47a. Herbicide used (Yes/No).47a. Populated (Yes/No).
If yes, test 47b should be conducted.
47b. Details of herbicide(s) used:
herbicide type(s), rate(s) of application, date(s) of application, method of application.
47b. Standard fields—herbicide type, rate of application (kg/ha), date of application (dd-mm-yyyy), application method.
48. Details of insecticide(s) used48a. Insecticide used (Yes/No).48a. Populated (Yes/No).
If yes, test 48b should be conducted.
48b. Details of insecticide(s) used:
insecticide type(s), rate(s) of application, date(s) of application, method of application.
48b. Standard fields—insecticide type, rate of application (kg/ha), date of application (dd-mm-yyyy), application method.
49. Soil amelioration49a. Soil amelioration (Yes/No).49a. Populated (Yes/No).
If yes, test 49b should be conducted.
49b. Details of soil amelioration:
year/date, treatment type, rate (kg/ha), ingredient qualities (e.g., Deep MAP—nutrients kg/ha; lime with ENV%).
49b. Standard fields—treatment date (dd-mm-yyyy), treatment type, rate of application (kg/ha), ingredient quality.
50. Seed treatment50a. Any seed treatment performed (Yes/No). 50a. Populated (Yes/No).
If yes, test 50b should be conducted.
50b. Product listed with Australian Pesticides and Veterinary Medicines Authority (APVMA).50b. Product matches APVMA standards.
51. Inoculant51a. Any inoculant used (Yes/No).51a. Populated (Yes/No).
If yes, test 51b should be conducted.
51b. Product listed with APVMA.51b. Product matches APVMA standards.
52. Tillage52a. Any tillage performed on the paddock (Yes/No).52a. Populated (Yes/No).
If yes, test 52b should be conducted.
52b. Details of tillage performed:
date (dd-mm-yyyy), type, depth estimate.
52b. Standard fields—treatment date (dd-mm-yyyy), tillage type, depth of tillage (numeric value).
53. Trial results 53a. Do results and data types comply with measurement types?53a. Standard field for result data (numeric, text, etc.).
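The within-variable tests above (e.g., 30a, 33b, 36b and 39a) follow a common pattern of checking a single value against its domain. The sketch below illustrates that pattern; the field names and the crop-type reference list are assumptions for illustration, not OFT's actual schema or vocabularies.

```python
import re
from datetime import datetime

# Illustrative within-variable checks; this reference list is a stand-in for
# the defined, standards-based list of crop types the framework calls for.
CROP_TYPES = {"wheat", "barley", "canola", "lentil"}

def check_crop_type(value):
    # Test 30a: value must come from the defined list of crop types.
    return value in CROP_TYPES

def check_sow_rate(value):
    # Test 33b: sow rate must be a numeric value (kg/ha).
    try:
        return float(value) >= 0
    except (TypeError, ValueError):
        return False

def check_date_format(value):
    # Tests 36b/38b: dates must use the standard dd-mm-yyyy format.
    try:
        datetime.strptime(value, "%d-%m-%Y")
        return True
    except (TypeError, ValueError):
        return False

def check_plot_dimension(value):
    # Test 39a: plot dimensions are numeric, up to two decimal places.
    return bool(re.fullmatch(r"\d+(\.\d{1,2})?", str(value)))
```

Each check returns a simple pass/fail that can feed the Populated (Yes/No) and validation fields used throughout the table.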
Between variables: Between-variables tests check a data value against another variable for standard values and logical consistency (e.g., oil content and crop type, where an oil content reported for barley is non-logical).
Trial dataset | Data quality questions | Data quality tests
54. Are dates between variables sequential?54a. Do the dates for treatments, sowing, harvest, etc. all comply with a date sequence?54a. Check for sequence of the dates (e.g., sowing date < harvest date).
55. Measurement type data value55a. This occurs when a measurement type result value does not comply with the crop type or treatment type.55a. The measurement value should comply with the crop and measurement type (e.g., grain yield is measured in kg/ha, not lt/ha).
56. Trial site location56a. This occurs when the trial location details (coordinates) do not comply with other designated information (e.g., state, GRDC zone).56a. Verification checks for correct country, state and region. The same town can exist in multiple states or countries, so the trial site location should be selected correctly.
57. Corrupt trial result57a. This occurs when the trial result entered is corrupt.57a. System checks to test that the trial result is entered in a machine-readable format and is accessible to viewers.
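Tests 54a and 55a compare one value against another, as in the oil-content example in the section preamble. A minimal sketch follows, assuming a hypothetical unit table and oilseed list rather than OFT reference data.

```python
# Illustrative between-variables checks. EXPECTED_UNITS and OILSEED_CROPS are
# hypothetical lookup tables for this sketch, not OFT reference data.
EXPECTED_UNITS = {"grain yield": "kg/ha", "oil content": "%"}
OILSEED_CROPS = {"canola", "safflower", "linseed"}

def check_measurement_unit(measurement_type, unit):
    # Test 55a: the unit must comply with the measurement type
    # (e.g., grain yield in kg/ha, not lt/ha).
    return EXPECTED_UNITS.get(measurement_type) == unit

def check_oil_content_logic(crop_type, reports_oil_content):
    # Logical test from the preamble: oil content reported for barley
    # would be non-logical.
    return (not reports_oil_content) or crop_type in OILSEED_CROPS
```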
Between records: Between-records tests apply where a proper sequence can be expected in the data, where missing records may occur, or where data for a measurement type can be compared across records.
Trial dataset | Data quality questions | Data quality tests
58. Are dates between records sequential?58a. Do the dates for treatments, sowing, harvest, etc. all comply with a date sequence?58a. Check for sequence of the dates (e.g., sowing date < harvest date).
59. Trial site numbers59a. This test checks that trial site numbers are retained and that new numbers are assigned to new trial sites.59a. System check to validate each trial site has a unique trial site number (1 trial site number cannot be assigned to more than 1 trial site).
60. Grouping trials60a. Have trials been successfully grouped and links between trials maintained?60a. System checks to validate all the sub trials are linked to the trial project.
61. Trial site names61a. This checks that trial site names are unique and that trial sites with the same name are identified and corrections are made.61a. System check to validate that trial site names are unique and that trial sites with the same name are identified and exceptions are out.
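Uniqueness tests such as 59a and 61a reduce to finding values that occur in more than one record. A minimal sketch, assuming a hypothetical record layout:

```python
from collections import Counter

def duplicate_values(records, field):
    """Return the values of `field` that occur in more than one record."""
    counts = Counter(r[field] for r in records)
    return sorted(v for v, n in counts.items() if n > 1)

# Illustrative trial site records (field names are assumptions).
sites = [
    {"site_number": 101, "site_name": "Ballarat North"},
    {"site_number": 102, "site_name": "Ballarat North"},  # duplicate name
    {"site_number": 102, "site_name": "Horsham"},         # duplicate number
]
assert duplicate_values(sites, "site_number") == [102]
assert duplicate_values(sites, "site_name") == ["Ballarat North"]
```

Duplicate site names are not necessarily errors (test 61a flags them as exceptions for review), whereas a duplicate site number violates the uniqueness rule of test 59a outright.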
Between tables: These tests cover data values across two or more tables, where relationships between tables can be violated (e.g., labels or measurement types that do not match trial metadata and key messages). This could include dates, incorrect attachment of the trial report, incorrectly linked or grouped trials, trial contributor and researcher inconsistencies, incorrect site location, etc.
Trial dataset | Data quality questions | Data quality tests
62. Link identifiers integrity retained | 62a. Have the permanent identifiers for tables in OFT been maintained, and are they free from corruption? | 62a. Check that the integrity of the links between the tables is maintained.
63. Are dates between tables sequential? | 63a. Do the dates for treatments, sowing, harvest, etc. all comply with a date sequence? | 63a. Check the sequence of dates (e.g., sowing date < harvest date).
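The link-integrity test (62a) is essentially a referential-integrity check: every link identifier in a dependent table should resolve to a permanent identifier in its parent table. A sketch under assumed table and column names:

```python
def broken_links(child_rows, key, parent_ids):
    """Return child rows whose link identifier has no matching parent record."""
    return [row for row in child_rows if row[key] not in parent_ids]

# Hypothetical tables: permanent trial identifiers and dependent result rows.
trials = {"T-001", "T-002"}
results = [
    {"trial_id": "T-001", "value": 3.2},
    {"trial_id": "T-009", "value": 2.8},  # orphaned row: no parent trial
]
assert broken_links(results, "trial_id", trials) == [{"trial_id": "T-009", "value": 2.8}]
```

In a relational database the same constraint is normally enforced declaratively with a foreign key; a check like this is useful for auditing data that was loaded without such constraints.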
Between systems: These tests cover data values shared between OFT and external systems, where links or correspondence between systems can be broken or inconsistent (e.g., search engine indexing, external weather or research data services, funding body reports).
Trial dataset | Data quality questions | Data quality tests
64. Web referencing | 64a. Is the trial discoverable online through search engines (e.g., Google Search, Bing)? | 64a. Check search engine optimisation for each trial on OFT.
65. Interoperability test | 65a. Are data sourced from external interoperable systems for the same year as the trial (e.g., Bureau of Meteorology (BOM) data, CSIRO DAP)? | 65a. System check to validate that the year of the data export corresponds to the trial site.
 | 65b. Does the trial link with the funding body's/trial contributor's final reports? | 65b. Check that funding body/trial contributor project codes are linked.
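Test 65a reduces to checking that an external data export covers the trial's year. A minimal sketch with hypothetical structures:

```python
def covers_trial_year(trial_year, external_years):
    """Test 65a: the external export must include the trial's year."""
    return trial_year in external_years

# Illustrative: years covered by a weather export for a given trial site.
export_years = {2019, 2020, 2021}
assert covers_trial_year(2020, export_years)
assert not covers_trial_year(2022, export_years)
```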

References

  1. Fraser, E.D.; Campbell, M. Agriculture 5.0: Reconciling production with planetary health. One Earth 2019, 1, 278–280. [Google Scholar] [CrossRef]
  2. OECD; FAO. Cereals. In OECD-FAO Agricultural Outlook 2023–2032; OECD Publishing: Paris, France, 2023. [Google Scholar] [CrossRef]
  3. Anderson, R.; Bayer, P.E.; Edwards, D. Climate change and the need for agricultural adaptation. Curr. Opin. Plant Biol. 2020, 56, 197–202. [Google Scholar] [CrossRef]
  4. Briat, J.-F.; Gojon, A.; Plassard, C.; Rouached, H.; Lemaire, G. Reappraisal of the central role of soil nutrient availability in nutrient management in light of recent advances in plant nutrition at crop and molecular levels. Eur. J. Agron. 2020, 116, 126069. [Google Scholar] [CrossRef]
  5. Qaim, M. Role of New Plant Breeding Technologies for Food Security and Sustainable Agricultural Development. Appl. Econ. Perspect. Policy 2020, 42, 129–150. [Google Scholar] [CrossRef]
  6. Khan, N.; Ray, R.L.; Sargani, G.R.; Ihtisham, M.; Khayyam, M.; Ismail, S. Current Progress and Future Prospects of Agriculture Technology: Gateway to Sustainable Agriculture. Sustainability 2021, 13, 4883. [Google Scholar] [CrossRef]
  7. Awasthi, G.; Nagar, V.; Mandzhieva, S.; Minkina, T.; Sankhla, M.S.; Pandit, P.P.; Aseri, V.; Awasthi, K.K.; Rajput, V.D.; Bauer, T.; et al. Sustainable Amelioration of Heavy Metals in Soil Ecosystem: Existing Developments to Emerging Trends. Minerals 2022, 12, 85. [Google Scholar] [CrossRef]
  8. Gerhards, R.; Andujar Sanchez, D.; Hamouz, P.; Peteinatos, G.G.; Christensen, S.; Fernandez-Quintanilla, C. Advances in site-specific weed management in agriculture—A review. Weed Res. 2022, 62, 123–133. [Google Scholar] [CrossRef]
  9. Weersink, A.; Fraser, E.; Pannell, D.; Duncan, E.; Rotz, S. Opportunities and challenges for big data in agricultural and environmental analysis. Annu. Rev. Resour. Econ. 2018, 10, 19–37. [Google Scholar] [CrossRef]
  10. Araújo, S.O.; Peres, R.S.; Barata, J.; Lidon, F.; Ramalho, J.C. Characterising the Agriculture 4.0 Landscape—Emerging Trends, Challenges and Opportunities. Agronomy 2021, 11, 667. [Google Scholar] [CrossRef]
  11. Osinga, S.; Paudel, D.; Mouzakitis, S.; Athanasiadis, I. Big data in agriculture: Between opportunity and solution. Agric. Syst. 2022, 195, 103298. [Google Scholar] [CrossRef]
  12. Fenz, S.; Neubauer, T.; Johannes, H.; Jurgen, F.; Wohlmuth, M.-L. AI- and data-driven pre-crop values and crop rotation matrices. Eur. J. Agron. 2023, 150, 126949. [Google Scholar]
  13. Nativi, S.; Mazzetti, P.; Santoro, M.; Papeschi, F.; Craglia, M.; Ochiai, O. Big data challenges in building the global earth observation system of systems. Environ. Model. Softw. 2025, 68, 1–26. [Google Scholar] [CrossRef]
  14. Nandyala, C.S.; Kim, H.-K. Big and meta data management for U-Agriculture mobile services. Int. J. Softw. Eng. Its Appl. 2016, 10, 257–270. [Google Scholar] [CrossRef]
  15. Jouanjean, M.-A.; Casalini, F.; Wiseman, L.; Gray, E. Issues around data governance in the digital transformation of agriculture. In OECD Food, Agriculture and Fisheries Papers; No. 146; OECD Publishing: Paris, France, 2020. [Google Scholar]
  16. Provost, F.; Fawcett, T. Data science and its relationship to big data and data-driven decision making. Big Data 2013, 1, 51–59. [Google Scholar] [CrossRef]
  17. Maria, K.; Maria, B.; Andrea, K. Exploring actors, their constellations, and roles in digital agricultural innovations. Agric. Syst. 2021, 186, 102952. [Google Scholar] [CrossRef]
  18. Gonzalez-Vidal, A.; Ramallo-González, A.P.; Skarmeta, A.F. Intrinsic and extrinsic quality of data for open data repositories. ICT Express 2022, 8, 328–333. [Google Scholar] [CrossRef]
  19. Tenopir, C.; Rice, N.; Allard, S.; Baird, L.; Borycz, J.; Christian, L.; Grant, B.; Olendorf, R.; Sandusky, R. Data sharing, management, use, and re-use: Practices and perceptions of scientists worldwide. PLoS ONE 2020, 15, e0229003. [Google Scholar]
  20. Walters, J.; Light, K.; Robinson, N. Using agricultural metadata: A novel investigation of trends in sowing date in on-farm research trials using the Online Farm Trials database. F1000Research 2021, 9, 1305. [Google Scholar] [CrossRef]
  21. Nicholson, N.; Negrao Carvalho, R.; Štotl, I. A FAIR Perspective on Data Quality Frameworks. Data 2025, 10, 136. [Google Scholar] [CrossRef]
  22. Benos, L.; Tagarakis, A.C.; Dolias, G.; Berruto, R.; Kateris, D.; Bochtis, D. Machine Learning in Agriculture: A Comprehensive Updated Review. Sensors 2021, 21, 3758. [Google Scholar] [CrossRef] [PubMed]
  23. Veregin, H. Data quality parameters. Geogr. Inf. Syst. 1999, 1, 177–189. [Google Scholar]
  24. Phillips, P.W.B.; Relf-Eckstein, J.-A.; Jobe, G.; Wixted, B. Configuring the new digital landscape in western Canadian agriculture. NJAS-Wagening. J. Life Sci. 2019, 90–91, 100295. [Google Scholar] [CrossRef]
  25. Hatanaka, M.; Konefal, J.; Strube, J.; Glenna, L.; Conner, D. Data-Driven Sustainability: Metrics, Digital Technologies, and Governance in Food and Agriculture. Rural Sociol. 2022, 87, 206–230. [Google Scholar] [CrossRef]
  26. Roitsch, T.; Cabrera-Bosquet, L.; Fournier, A.; Ghamkhar, K.; Jimenez-Berni, J.; Pinto, F.; Ober, E. Review: New sensors and data-driven approaches- A path to next generation phenomics. Plant Sci. 2019, 282, 2–10. [Google Scholar] [CrossRef]
  27. Robinson, N.; Thompson, H.; Milne, R.; Wills, B.; Feely, P.; MacLeod, A.; Parker, J.; Walters, J. Online Farm Trials: Data Quality Framework for OFT Trial Resources; CeRDI Internal Report; CeRDI, Federation University: Ballarat, VIC, Australia, 2018; 60p. [Google Scholar]
  28. ISO 9001:2015; Quality Management Systems. Requirements. International Organisation for Standardisation: Geneva, Switzerland, 2015.
  29. Earley, S.; Henderson, D.; Sebastian-Coleman, L. DAMA-DMBOK: Data Management Body of Knowledge, 2nd ed.; Technics Publications, LLC: Bradley Beach, NJ, USA, 2017. [Google Scholar]
  30. Wand, Y.; Wang, R.Y. Anchoring data quality dimensions in ontological foundations. Commun. ACM 1996, 39, 86–95. [Google Scholar] [CrossRef]
  31. Chapman, A.D. Principles of Data Quality; Report for the Global Biodiversity Information Facility; Global Biodiversity Information Facility: Copenhagen, Denmark, 2005. [Google Scholar]
  32. Redman, T.C. Data quality management past, present, and future: Towards a management system for data. In Handbook of Data Quality: Research and Practice; Sadiq, S., Ed.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 15–40. [Google Scholar]
  33. ISO 8000-2; Data Quality. Vocabulary. International Organisation for Standardisation: Geneva, Switzerland, 2022.
  34. Australian Bureau of Statistics (ABS). 1520.0—ABS Data Quality Framework. 2009. Available online: https://www.abs.gov.au/ausstats/abs@.nsf/mf/1520.0 (accessed on 14 August 2023).
  35. Guillen-Aguinaga, M.; Aguinaga-Ontoso, E.; Guillen-Aguinaga, L.; Guillen-Grima, F.; Aguinaga-Ontoso, I. Data Quality in the Age of AI: A Review of Governance, Ethics, and the FAIR Principles. Data 2025, 10, 201. [Google Scholar] [CrossRef]
  36. Fan, W.; Geerts, F. Foundations of Data Quality Management; Springer Nature: Cham, Switzerland, 2022. [Google Scholar]
  37. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.-W.; da Silva Santos, L.B.; Bourne, P.E. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 1–9. [Google Scholar] [CrossRef] [PubMed]
  38. Murphy, A.; McKenna, K.; Corbett, J.; Taylor, M. Online Farm Trials Impact Research (First Wave) Extended Timeframe Research Study; Centre for eResearch and Digital Innovation, Federation University, Australia: Ballarat, VIC, Australia, 2016; 106p. [Google Scholar]
  39. Walters, J.; Milne, R.; Thompson, H. Online Farm Trials: A national web-based information source for Australia grains research, development and extension. Rural Ext. Innov. Syst. J. 2018, 14, 117–123. [Google Scholar]
  40. Robinson, N.; Dahlhaus, P.; Feely, P.; Light, K.; MacLeod, A.; Milne, R.; Parker, J.; Thompson, H.; Walters, J.; Wills, B. Online Farm Trials (OFT)—The past, present and future. In Cells to Satellites, Proceedings of the 19th Australian Society of Agronomy Conference, Wagga Wagga, NSW, Australia, 25–29 August 2019; Australian Society of Agronomy: Winthrop, WA, Australia, 2019. [Google Scholar]
  41. Ollerenshaw, A.; Robinson, N.; Chadha, A.; Channon, J. A smart agriculture information system delivering research data for the adoption by the Australian grains industry. Smart Agric. Technol. 2024, 9, 100610. [Google Scholar] [CrossRef]
  42. Ollerenshaw, A.; Murphy, A.; Walters, J.; Robinson, N.; Thompson, H. Use of digital technology for research data and information transfer within the Australian grains sector: A case study using Online Farm Trials. Agric. Syst. 2023, 206, 103591. [Google Scholar] [CrossRef]
  43. Wills, B.; Parker, J.; Robinson, N.; Wong, M. Improving the FAIRness of Australia’s grains research sector data. In Cells to Satellites, Proceedings of the 19th Australian Society of Agronomy Conference, Wagga Wagga, NSW, Australia, 25–29 August 2019; Australian Society of Agronomy: Winthrop, WA, Australia, 2019. [Google Scholar]
  44. Walters, J.R.; Light, K. The Australian digital Online Farm Trials database increases the quality of systematic reviews and meta-analyses in grains crop research. Crop Pasture Sci. 2021, 72, 789–800. [Google Scholar] [CrossRef]
  45. Nousak, P.; Phelps, R. A scorecard approach to improving data quality. In Proceedings of the Data Warehousing and Enterprise Solutions, Sugi-27, Orlando, FL, USA, 14–17 April 2002. [Google Scholar]
  46. Government of NSW. NSW Government Standard for Data Quality Reporting. 2015. Available online: https://www.finance.nsw.gov.au/ict/resources/data-quality-standard (accessed on 28 March 2018).
  47. Cichy, C.; Rass, S. An overview of data quality frameworks. IEEE Access 2019, 7, 24634–24648. [Google Scholar] [CrossRef]
  48. Borgman, C.L. The conundrum of sharing research data. J. Am. Soc. Inf. Sci. Technol. 2012, 63, 1059–1078. [Google Scholar] [CrossRef]
  49. van Vlokhoven, H. The effect of open access on research quality. J. Informetr. 2019, 13, 751–756. [Google Scholar] [CrossRef]
  50. Moore, E.K.; Kriesberg, A.; Schroeder, S.; Geil, K.; Haugen, I.; Barford, C.; Johns, E.M.; Arthur, D.; Sheffield, M.; Ritchie, S.M. Agricultural data management and sharing: Best practices and case study. Agron. J. 2022, 114, 2624–2634. [Google Scholar] [CrossRef]
  51. Wolfert, S.; Ge, L.; Verdouw, C.; Bogaardt, M.-J. Big data in smart farming—A review. Agric. Syst. 2017, 153, 69–80. [Google Scholar] [CrossRef]
  52. Chergui, N.; Kechadi, M.T. Data analytics for crop management: A big data view. J. Big Data 2022, 9, 1–37. [Google Scholar] [CrossRef]
  53. Gupta, N.; Patel, H.; Afzal, S.; Panwar, N.; Mittal, R.S.; Guttula, S.; Jain, A.; Nagalapatti, L.; Mehta, S.; Hans, S. Data Quality Toolkit: Automatic assessment of data quality and remediation for machine learning datasets. arXiv 2021, arXiv:2108.05935. [Google Scholar] [CrossRef]
  54. Bhat, S.A.; Huang, N.-F. Big data and ai revolution in precision agriculture: Survey and challenges. IEEE Access 2021, 9, 110209–110222. [Google Scholar] [CrossRef]
  55. Rouhani, S.; Deters, R. Data trust framework using blockchain technology and adaptive transaction validation. IEEE Access 2021, 9, 90379–90391. [Google Scholar] [CrossRef]
  56. Juddoo, S. Overview of data quality challenges in the context of Big Data. In Proceedings of the International Conference on Computing, Communication and Security (ICCCS), Pointe aux Piments, Mauritius, 4–6 December 2015. [Google Scholar]
  57. Gudivada, V.; Apon, A.; Ding, J. Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. Int. J. Adv. Softw. 2017, 10, 1–20. [Google Scholar]
  58. ISO 8000-8; Data Quality. Information and Data Quality: Concepts and Measuring. International Organization for Standardization: Geneva, Switzerland, 2015.
  59. Susarla, A.; Gopal, R.; Thatcher, J.B.; Sarker, S. The Janus Effect of Generative AI: Charting the Path for Responsible Conduct of Scholarly Activities in Information Systems. Inf. Syst. Res. 2023, 34, 399–408. [Google Scholar] [CrossRef]
  60. Sikorska, J.; Bradley, S.; Hodkiewicz, M.; Fraser, R. DRAT: Data risk assessment tool for university–industry collaborations. Data-Centric Eng. 2020, 1, e17. [Google Scholar] [CrossRef]
  61. Higgins, S.; Schellberg, J.; Bailey, J.S. Improving productivity and increasing the efficiency of soil nutrient management on grassland farms in the UK and Ireland using precision agriculture technology. Eur. J. Agron. 2019, 106, 67–74. [Google Scholar] [CrossRef]
  62. Charlebois, S.; Latif, N.; Ilahi, I.; Sarker, B.; Music, J.; Vezeau, J. Digital Traceability in Agri-Food Supply Chains: A Comparative Analysis of OECD Member Countries. Foods 2024, 13, 1075. [Google Scholar] [CrossRef]
  63. DAFF. National Agricultural Traceability Strategy 2023 to 2033; Department of Agriculture, Fisheries and Forestry (DAFF): Canberra, Australia, 2023. [Google Scholar]
  64. Bailey, J.E.; Pearson, S.W. Development of a tool for measuring and analyzing computer user satisfaction. Manag. Sci. 1983, 29, 530–545. [Google Scholar] [CrossRef]
  65. Ives, B.; Olson, M.H.; Baroudi, J.J. The measurement of user information satisfaction. Commun. ACM 1983, 26, 785–793. [Google Scholar] [CrossRef]
  66. Ballou, D.P.; Pazer, H.L. Designing information systems to optimize the accuracy-timeliness tradeoff. Inf. Syst. Res. 1995, 6, 51–72. [Google Scholar] [CrossRef]
  67. DeLone, W.H.; McLean, E.R. Information systems success: The quest for the dependent variable. Inf. Syst. Res. 1992, 3, 60–95. [Google Scholar] [CrossRef]
  68. Wang, R.Y.; Strong, D.M. Beyond accuracy: What data quality means to data consumers. J. Manag. Inf. Syst. 1996, 12, 5–33. [Google Scholar] [CrossRef]
  69. Redman, T.C. Data Quality for the Information Age; Artech House, Inc.: Norwood, MA, USA, 1997. [Google Scholar]
  70. Jarke, M.; Lenzerini, M.; Vassiliou, Y.; Vassiliadis, P. Fundamentals of Data Warehouses; Springer Science & Business Media: Berlin, Germany, 2002. [Google Scholar]
  71. Bovee, M.; Srivastava, R.P.; Mak, B. A conceptual framework and belief-function approach to assessing overall information quality. Int. J. Intell. Syst. 2003, 18, 51–74. [Google Scholar] [CrossRef]
  72. Fisher, C.W.; Kingma, B.R. Criticality of data quality as exemplified in two disasters. Inf. Manag. 2001, 39, 109–116. [Google Scholar] [CrossRef]
  73. Pipino, L.L.; Lee, Y.W.; Wang, R.Y. Data quality assessment. Commun. ACM 2002, 45, 211–218. [Google Scholar] [CrossRef]
  74. Herzog, T.N.; Scheuren, F.J.; Winkler, W.E. Data Quality and Record Linkage Techniques; Springer: New York, NY, USA, 2007. [Google Scholar]
  75. Moges, H.-T.; Dejaeger, K.; Lemahieu, W.; Baesens, B. A multidimensional analysis of data quality for credit risk management: New insights and challenges. Inf. Manag. 2013, 50, 43–58. [Google Scholar] [CrossRef]
  76. Jayawardene, V.; Sadiq, S.; Indulska, M. An Analysis of Data Quality Dimensions; School of Information Technology and Electrical Engineering, The University of Queensland: Brisbane City, Australia, 2015. [Google Scholar]
Figure 1. Key components of the proposed data quality framework for grains trial research, illustrating the overarching data quality vision supported by four interconnected elements: a data quality strategy, data quality assessment tool, data quality reports and a continuous improvement cycle.
Figure 2. Example of a trial record data quality assessment, combining the data quality test and the data quality statement.
Table 1. The seven dimensions of data quality and the standards for assessing and reporting on the quality of statistical information of research trials.
Data Quality Dimension | Tests and Reporting Assessments
Accessibility | Accessibility to the public; accessibility of data products
Accuracy | Coverage error; sample error; non-response error; response error; other sources of error; revisions to data
Coherence | Changes to data items; comparisons across data items; comparisons with previous releases; comparison with other available products
Institutional environment | Impartiality and objectivity; professional independence; data collection mandate; adequacy of resources; quality commitment; statistical confidentiality
Interpretability | Availability of information regarding the data; presentation of information
Relevance | Scope and coverage; reference period; geographic detail; main data outputs; classification and statistical standards; estimate variable types
Timeliness | Time lag between trial implementation and data availability; frequency of survey/trials
Table 2. Assessment questions for the proposed trial data quality statement. Adapted from the NSW Government Standard for Data Quality Reporting, 2015 [46].
Institutional Environment
Criterion met | Criterion not met
The contributor publishing this data to OFT is the recognised data custodian. | The contributor publishing this data to OFT is not the data custodian.
Data is collected or managed according to a data quality framework. | Data is not collected or managed according to a data quality framework.
Data governance roles and responsibilities are clearly assigned for this dataset or data source. | Data governance roles and responsibilities are not assigned for this dataset or data source.
Data collection is mandated or required by a law, regulation or agreement. | Data collection is not mandated or required by a law, regulation or agreement.
The trial data custodian has no commercial interest or conflict of interest in the data, or has declared any interest in the data. | No information is available regarding whether the trial data custodian has any commercial interest in or conflict of interest with the data.
Relevance
Criterion met | Criterion not met
The aim of the trial is included as trial metadata. | There is no trial aim included in the trial details.
Details on the trial location are entered for the actual trial site, OR details on the trial site location are entered (at regional or locality precision) pending the embargo period. | Trial location details are entered at only regional or locality precision, or are absent.
Information is available on the environmental variables (climate, soils), management practices (e.g., sowing) and trial metadata to help users evaluate the temporal relevance of the trial. | There is no further information available to help users evaluate the temporal relevance of the trial.
The data analysis methods and standards used to assess trial results (and the data reported) are available to users in the trial metadata. | Data analysis methods and standards are not included in the trial data or metadata.
The trial measures and results use current standards and classification schemes for contemporary cropping systems. | No details exist on the adoption of standard measures and classifications for the trial.
Timeliness
Criterion met | Criterion not met
The trial data is published with no embargo period enforced. | The trial data is under an embargo period.
The trial is part of a time series (e.g., crop rotation) with instructions informing users of impending crops and data to be collected. | No instructions or information on the following data collection are provided to users.
All or part of the trial result data is entered into the OFT system, with a trial report also published. | No trial results have been entered into the OFT system; however, a trial report may be published.
Details on the trial, including sowing date, harvest date and dates of trial observations and measurements, are recorded. | Details on trial dates are incomplete or absent.
The trial aim and key messages have been published online in OFT. | Details on the trial aim and key messages remain absent.
Accuracy
Criterion met | Criterion not met
This data has been subject to quality assurance processes, i.e., checking for errors at each stage of data collection and processing, or verifying data entry and making corrections if necessary. | This data has not been subject to any quality assurance processes, i.e., checking for errors at each stage of data collection and processing or verifying data entry and making corrections if necessary.
The data collection met the objectives of the primary user; the data correctly represents what it was designed to measure, monitor or report. | The data collection did not meet the objectives of the primary user; the data may not fully represent what it was designed to measure, monitor or report.
There are no known gaps in the data (for example: non-responses, missing records, data not collected), OR gaps are identified in caveats attached to the dataset or data source (metadata). | There is no information available about gaps in the data (for example: non-responses, missing records, data not collected; no metadata).
There have been no adjustments, changes or other factors that could impact the validity of the data (for example: weighting, rounding, de-identification of data; changes or flaws in data collection or verification methods), OR adjustments are identified in caveats attached to the dataset or data source (metadata). | There is no information available about adjustments, changes or other factors that could impact the validity of the data (for example: weighting, rounding, de-identification of data; changes or flaws in data collection or verification methods).
A revision policy exists: if errors are identified, data is revised, and the revision is publicised (including metadata). | There is no revision policy.
Coherence
Criterion met | Criterion not met
Standard definitions, common concepts, classifications and data recording practices have been used. | Standard definitions, common concepts, classifications and data recording practices have not been used.
Elements within the data can be meaningfully compared (e.g., statistical parameters). | Elements within the data cannot be meaningfully compared.
This data is consistent with similar or related data sources. | No similar or related data sources have been identified.
This dataset is a single collection and is not impacted by changes in methodology or external events over time, OR this data is part of a time series and there have not been any significant changes in the way data items are defined, classified or counted since the start of the series. | This data is part of a time series, and there have been significant changes in collection methodologies and the way data items are defined, classified or counted since the start of the collection or time series.
This data is consistent with previous releases; there have been no changes in methodology or external impacts since the last data release. | This data may not be consistent with previous releases; there have been changes in methodology or external impacts since the last release in this series.
Interpretability
Criterion met | Criterion not met
A data dictionary is available to explain the meaning of data elements, their origin, format and relationships. | There is no data dictionary available to explain the meaning of data elements, their origin, format and relationships.
Information is available about the primary data sources and methods of data collection (for example: instruments, forms, instructions). | There is no explanatory information available about the primary data sources and methods of data collection.
Information is available to help users evaluate the accuracy of the data and any level of error. | There is no further information available to help users evaluate the accuracy of the data and any level of error.
Information is available to explain concepts, help users correctly interpret the data and understand how it can be used (key messages/findings). | There is no further information available to explain concepts, help users correctly interpret the data and understand how it can be used.
Information is available to explain ambiguous or technical terms used in the data. | There is no further information available to explain ambiguous or technical terms used in the data.
Accessibility
Criterion met | Criterion not met
This dataset or data source is available online and is open and free to access. | This dataset or data source is subject to limiting or restrictive access conditions.
This dataset or data source is available in a machine-processable, structured format. | This dataset or data source is not available in a machine-processable, structured format.
This dataset or data source is available in a non-proprietary format. | This dataset or data source is not available in a non-proprietary format.
This dataset or data source is described using open standards and persistent identifiers. | This dataset or data source is not described using open standards and persistent identifiers.
This dataset or data source is linked to other data to provide context. | This dataset or data source is not linked to other data.
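One way to summarise such a statement, in the spirit of the scorecard approach [45], is to tally the criteria met within each dimension. The structure below is illustrative only; the paper does not prescribe a scoring formula, and the booleans are hypothetical inputs.

```python
def dimension_scores(assessment):
    """Return the fraction of assessment criteria met for each dimension."""
    return {
        dim: sum(answers) / len(answers)
        for dim, answers in assessment.items()
    }

# Illustrative assessment: one boolean per criterion, per dimension.
assessment = {
    "Institutional environment": [True, True, False, True, True],
    "Accessibility": [True, True, True, False, True],
}
scores = dimension_scores(assessment)
assert scores["Accessibility"] == 0.8
```

A data user could then read the per-dimension fractions alongside the statement itself when judging whether a trial is fit for their purpose.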

Share and Cite

MDPI and ACS Style

Chadha, A.; Robinson, N.; Channon, J. Towards Data-Driven Decisions in Agriculture—A Proposed Data Quality Framework for Grains Trials Research. Data 2026, 11, 19. https://doi.org/10.3390/data11010019
