Similarity-Based Decision Support for Improving Agricultural Practices and Plant Growth

Baraian, Iulia; Valean, Honoriu; Matei, Oliviu; Erdei, Rudolf

doi:10.3390/app15126936


¹ Department of Automation, Technical University of Cluj-Napoca, Memorandumului, 400000 Cluj-Napoca, Romania

² Department of Systems Engineering, Technical University of Cluj-Napoca, North University Centre of Baia Mare, 430083 Baia Mare, Romania

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Appl. Sci. 2025, 15(12), 6936; https://doi.org/10.3390/app15126936

Submission received: 23 March 2025 / Revised: 13 June 2025 / Accepted: 17 June 2025 / Published: 19 June 2025

(This article belongs to the Special Issue Biosystems Engineering: Latest Advances and Prospects)


Abstract

Similarity-based decision support systems have become essential tools for providing tailored and adaptive guidance across various domains. In agriculture, where managing extensive land areas poses significant challenges, the primary objective is often to maximize harvest yields while reducing costs, preserving crop health, and minimizing the use of chemical adjuvants. The application of similarity-based analysis enables the development of personalized farming recommendations, refined through shared data and insights, which contribute to improved plant growth and enhanced annual harvest outcomes. This study employs two algorithms, K-Nearest Neighbour (KNN) and Approximate Nearest Neighbour (ANN) using Locality Sensitive Hashing (LSH), and evaluates their effectiveness in agricultural decision-making. The results demonstrate that, under comparable farming conditions, KNN yields more accurate recommendations because it performs an exact neighbour search, whereas ANN provides a more scalable solution well-suited for large datasets. Both approaches support improved agricultural decisions and promote more sustainable farming strategies. While KNN is more effective for smaller datasets, ANN proves advantageous in real-time applications that demand fast response times. The implementation of these algorithms represents a significant advancement toward data-driven and efficient agricultural practices.


1. Introduction

Similarity-based decision support systems have gained prominence in recent years for their ability to provide personalised suggestions across a wide range of domains, from entertainment and e-commerce to education and healthcare [1]. Their integration into agriculture presents a novel opportunity to enhance decision-making by offering tailored advice to farmers based on environmental, operational, and regional contexts.

In agriculture, decision-making often entails processing diverse data under time constraints. Decision support systems built on similarity-based algorithms offer a pathway to reduce decision fatigue, improve recommendation speed, and support more informed agricultural practices. By lowering the threshold of prior knowledge required, these systems have the potential to empower new entrants into farming, addressing global food security concerns amid increasing population pressures and environmental challenges [2,3].

One of the key motivations for this study is to support small and resource-constrained farmers by demonstrating that classic and accessible algorithms, such as K-Nearest Neighbour and Approximate Nearest Neighbour with Locality Sensitive Hashing, can still yield valuable, personalised recommendations. Unlike complex or opaque models, these methods are interpretable, lightweight, and can be applied in settings with limited computational infrastructure.

Similarity-based methods identify patterns in user behaviour to generate personalised recommendations. In agriculture, this can translate to insights such as optimal crop selection, resource management strategies, and pest control interventions. A notable benefit of user–user similarity-based methods lies in their ability to compare the profiles of farmers operating under similar agro-environmental conditions and infer actionable practices [4,5]. This approach facilitates the dissemination of sustainable techniques and encourages innovation among smaller-scale farmers who might lack access to institutional research and advisory services.

Machine learning-based decision support systems have already been studied for their potential to optimise farming decisions using historical and contextual data [6]. Studies have demonstrated the role of factors such as soil properties in yield prediction and decision-making [7]. For instance, conservation practices have led to increases in corn yield of up to 41% due to improved soil health [8], while global crop losses from pests and diseases remain substantial, amounting to 20–40% annually with an economic impact of approximately USD 220 billion [9]. Such challenges underscore the value of data-driven support systems capable of mitigating risks and enhancing productivity.

Further, similarity-based algorithms can be used to manage emerging threats by aggregating shared information from regional farmer networks. If multiple users report pest outbreaks or adverse climate patterns, the system can suggest countermeasures based on previously observed outcomes. This real-time adaptability provides a robust advantage over static advisory models.

Among the core algorithms used in similarity-based systems are K-Nearest Neighbour (KNN) and Approximate Nearest Neighbour using Locality Sensitive Hashing (ANN-LSH). KNN is known for its interpretability and high accuracy in smaller datasets, while ANN-LSH offers scalability and faster processing for real-time applications. This research applies both algorithms to agricultural data, evaluating their performance on structured profiles generated with domain-relevant attributes [10]. A case study of SyrAgri, a web-based system deployed for Malian farmers, demonstrates the practicality of similarity-based decision support systems in agriculture, where contextual personalisation has been successfully implemented [11].

The novelty of this work lies in applying user–user similarity-based methods within an agricultural context, using both KNN and ANN-LSH to match farming profiles based on 13 environmental and operational attributes. Unlike traditional item-based models, this approach supports localised decision-making and lays the groundwork for future real-time capabilities. By integrating synthetically generated, literature-informed datasets, this study contributes a methodological foundation for building adaptable, intelligent agricultural similarity-based decision support systems.

The following sections present the structure of this paper: Section 2 presents the materials and methods employed in this study, including data generation and algorithmic implementation. Section 3 discusses the obtained results, highlighting algorithm performance and practical insights. Section 4 provides a detailed discussion of the findings in relation to agricultural decision-making and system scalability. Finally, Section 5 presents concluding remarks and outlines future research directions.

2. Materials and Methods

Similarity-based methods can be categorised into three main types relevant to agricultural recommendation systems: user-based, item-based, and hybrid approaches [12,13,14,15,16].

User-based filtering generates recommendations by analysing the practices of farmers operating under similar conditions, such as climate, crop type, and farming techniques. For example, if one farmer achieves success with a particular irrigation method, that method can be recommended to others in comparable environments. This approach relies on accurate data sharing and community engagement [17].

The item-based approach suggests treatments, seeds, and fertilisers based on the observed success of similar products. If an agricultural item is highly rated or frequently used, it can be inferred to be of high quality and, as a result, a safe recommendation.

Hybrid methods combine both user-based and item-based approaches to produce more complete recommendations. These systems adapt to varied farmer needs and improve decision-making across agricultural tasks. While traditional systems often emphasise user-based filtering, item-based filtering has been shown to improve scalability in large datasets by focusing on item–item similarities [18].

Beyond algorithmic function, similarity-based systems support community building by fostering farmer networks where users share experiences, solutions, and concerns. This collective knowledge helps individuals facing unfamiliar challenges learn from others with similar profiles, improving resilience and adaptability.

This study focuses on applying similarity-based methods to improve crop lifecycles and farm efficiency, with an emphasis on the user–user approach. Each farmer is represented as a user, and the algorithm identifies the most similar users based on attributes such as soil type, climate, and pest management.

The implementation was carried out in Java 11.0.17 (Oracle OpenJDK, Austin, TX, USA), chosen for its efficiency in handling both small and large datasets and for its support of the KNN and ANN-LSH algorithms. Farmer profiles were stored in CSV format, and Java collections were used to ensure efficient storage, retrieval, and neighbour computation.

2.1. Why Is User–User Filtering More Suitable?

Unlike item-based filtering, which focuses on individual tools or treatments, user–user filtering considers broader environmental and contextual factors influencing agricultural success, such as climate, soil type, and pest pressures.

Farmers operating under similar conditions are more likely to benefit from shared strategies. For instance, if a farmer with loamy soil in a temperate climate grows tomatoes successfully, the system will recommend similar practices to other farmers with matching conditions.

Moreover, user–user filtering allows dynamic adaptation to changing environments, whereas item-based filtering assumes static product effectiveness. For personalised agricultural recommendations that consider environmental variability, user–user filtering is the most suitable method [13,19,20].

2.2. Data Gathering

For this purpose, a synthetic dataset was programmatically generated using a custom Java application (AgricultureDatasetGenerator.java). This dataset simulates agricultural practices by combining realistic values sampled from predefined, literature-supported ranges. The full implementation, including the data generation code and CSV samples used in the experiments, is publicly available in the repository linked in the Data Availability section. The dataset includes 13 attributes that reflect key factors influencing agricultural decisions, such as the following:

Soil Type;
Fertilizer Type;
Climate Data;
Humidity Level;
Pest and Disease Management;
Plant Time;
Crop Harvested;
Water Temperature;
Harvest Colour;
Seed Supplier;
Season;
Distance to Retailer;
Harvest Yield (%).

Each feature captures elements influencing crop growth, resilience, and agricultural efficiency, establishing a robust basis for recognising commonalities among farming profiles.
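The sampling idea behind such a generator can be sketched as follows. This is a minimal illustration and not the authors' AgricultureDatasetGenerator.java: the class name, the value sets, and the numeric ranges are all hypothetical, and only five of the 13 attributes are shown to keep the example short.

```java
import java.util.List;
import java.util.Random;
import java.util.StringJoiner;

/** Illustrative sketch only; value sets and ranges below are assumptions. */
final class SyntheticFarmerSketch {
    // Hypothetical categorical value sets.
    static final List<String> SOIL = List.of("Clay", "Loamy", "Sandy");
    static final List<String> SEASON = List.of("Spring", "Summer", "Autumn", "Winter");

    private final Random rng;

    SyntheticFarmerSketch(long seed) {
        this.rng = new Random(seed); // fixed seed makes the dataset reproducible
    }

    private String pick(List<String> values) {
        return values.get(rng.nextInt(values.size()));
    }

    /** Uniform integer in the closed range [lo, hi]. */
    private int between(int lo, int hi) {
        return lo + rng.nextInt(hi - lo + 1);
    }

    /** Returns one CSV row: soilType,season,humidity,waterTemp,harvestYield. */
    String nextRow() {
        StringJoiner row = new StringJoiner(",");
        row.add(pick(SOIL));
        row.add(pick(SEASON));
        row.add(Integer.toString(between(30, 90)));  // humidity level, assumed range
        row.add(Integer.toString(between(10, 35)));  // water temperature in °C, assumed range
        row.add(Integer.toString(between(0, 100)));  // harvest yield (%)
        return row.toString();
    }

    public static void main(String[] args) {
        SyntheticFarmerSketch generator = new SyntheticFarmerSketch(42L);
        System.out.println("soilType,season,humidity,waterTemp,harvestYield");
        for (int i = 0; i < 3; i++) {
            System.out.println(generator.nextRow());
        }
    }
}
```

Seeding the random source is a design choice that keeps experiments repeatable across dataset sizes; the remaining attributes would be sampled in the same way from their own ranges.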

To ensure ease of use and minimise the burden on the farmer, the system is designed with future automation in mind, aiming to reduce user input through pre-populated data from reliable online sources. Although the current implementation relies largely on manual inputs and predefined ranges sourced from agricultural literature and validated datasets, each attribute has been carefully evaluated to determine the most appropriate and user-friendly input method. This design balances flexibility and usability, acknowledging that farmers are primarily focused on crop management and productivity rather than on time-consuming data entry.

For this proof of concept, many of the input values were manually inserted or defined using fixed ranges derived from agricultural literature and validated datasets. This approach ensured that the system operated with realistic parameters that accurately reflect farming scenarios. While automated and real-time data retrieval is planned for future iterations, the current method offers a reliable baseline and facilitates testing under practical and representative conditions.

Soil Type—Indicates the main soil category (e.g., clay, loamy, sandy), affecting moisture retention, nutrient levels, and crop suitability. This attribute informs soil-based crop and fertiliser recommendations, grounded in realistic values from agricultural sources [21,22].

Fertilizer Type—Identifies the fertilizer class most effective for a given soil and crop type (e.g., nitrogen-rich for leafy crops) [23].

Climate Data—By using historical and manually curated climate data, the system identifies crops best suited for specific weather conditions, helping reduce the risks associated with weather variability by aligning crop choices with climate patterns [24].

Humidity Level—Indicates atmospheric moisture, which affects disease risk and irrigation scheduling. High humidity increases fungal risk, while low humidity raises water demand [25].

Pest and Disease Management—Covers approaches like chemical (synthetic or organic) and biological controls. Combining multiple strategies improves crop resilience and supports targeted pest control [26].

Plant Time—Indicates the crop’s sowing period (month/week), which strongly influences yield. Timely planting can increase productivity, while delays often reduce it [27].

Crop Harvested—Identifies the harvested crop type (e.g., tomatoes, wheat), enabling tracking of yield trends and refinement of recommendations based on interactions between crop type, soil, and fertilisers.

Water Temperature—Affects crop development and disease vulnerability. Slightly elevated temperatures can stimulate plant growth, but excessive heat may reduce chlorophyll level and increase susceptibility to disease [28].

Harvest Colour—Indicates crop ripeness at harvest and helps determine market readiness. Especially important for crops like bananas and tomatoes, this assists in optimising harvest timing.

Seed Supplier—Identifies the source of seeds used. Well-adapted, high-quality seeds improve germination, growth uniformity, and yield potential across different climates [29].

Season—Denotes the planting or harvesting period (Spring, Summer, Autumn, Winter). Seasonal alignment improves crop health, yield forecasting, and risk management through informed scheduling [30].

Distance to Retailer—Measures proximity to markets. Longer distances discourage perishable crops due to increased transport risks and costs, guiding farmers in selecting suitable crop types [31].

Harvest Yield (%)—Represents the success rate of harvested crops, serving as a key performance indicator for farm practices under specific environmental conditions.

2.3. Algorithm Applied

The K-Nearest Neighbour (KNN) algorithm is applied to the structured dataset composed of farmer profiles, each defined by a fixed set of attribute values [4,14,32,33,34,35]. These attributes are categorized into categorical attributes (e.g., Soil Type, Climate, Pest Management) and numerical attributes (e.g., Water Temperature, Humidity Level), allowing the system to assess both qualitative and quantitative similarities between farmers.

To fine-tune the performance of the KNN model, we evaluated the impact of varying the number of neighbours, k. Several values were tested, including k = 5, 10, and 100, across datasets of different sizes. As shown in Table 1, while larger k values slightly increased stability, they also diluted the relevance of the neighbours by incorporating more dissimilar farmer profiles. Since our recommendation task values specificity and meaningful similarity, we selected k = 3 as the optimal choice. This value strikes a balance between accuracy and contextual relevance, ensuring recommendations are generated from the most similar farmers while maintaining computational efficiency.

To better understand the recommendation process, we outline below the steps involved in computing farmer similarity:

Step 1: Categorical Distance. The first step is to compare the categorical attributes of the selected farmer (the query point) with those of another farmer from the list. For each attribute, a distance of 0 is assigned if the two values match and 1 if they differ. Suppose the compared farmer has the same soil type and pest management but a different climate. This yields the Categorical Distance computed in Equation (1):

$$C_{d} = 0 + 0 + 1 = 1 \tag{1}$$

Step 2: Numerical Distance. For the numerical attributes, the Euclidean distance between two points $A = (x_{1}, y_{1}, \dots)$ and $B = (x_{2}, y_{2}, \dots)$ is used, as given by Equation (2):

$$d(A, B) = \sqrt{(x_{2} - x_{1})^{2} + (y_{2} - y_{1})^{2} + \dots} \tag{2}$$

Consider this equation applied to two users with water temperatures of 32 °C and 34 °C. The difference is $32 - 34 = -2$ and the squared difference is $(-2)^{2} = 4$. Applying the same logic to the humidity level, where the squared difference is 144, yields the Euclidean distance in Equation (3):

$$d(A, B) = \sqrt{4 + 144} \approx 12.17 \tag{3}$$

Step 3: Total Distance. The third step combines the Categorical and Numerical distances to obtain the Total Distance (Equation (4)). This computation is repeated between the selected user and every other user.

$$T_{d} = C_{d} + d(A, B) = 1 + 12.17 = 13.17 \tag{4}$$

This is a practical way to implement the similarity-based algorithm in the agricultural sector. By identifying these nearest neighbours, farmers can quickly find others facing similar challenges, allowing for knowledge exchange and more data-driven decisions [36].
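The three steps can be sketched in Java as follows. This is a minimal illustration under stated assumptions rather than the published implementation: the class and method names are hypothetical, and the humidity pair (60, 72) is assumed only so that its squared difference is 144, matching Equation (3).

```java
import java.util.Locale;

/** Illustrative sketch of the categorical + Euclidean distance described in Steps 1 to 3. */
final class MixedDistanceSketch {
    /** Step 1: simple matching distance, counting attributes whose values differ. */
    static int categoricalDistance(String[] a, String[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        int mismatches = 0;
        for (int i = 0; i < a.length; i++) {
            if (!a[i].equals(b[i])) mismatches++;
        }
        return mismatches;
    }

    /** Step 2: Euclidean distance over the numerical attributes. */
    static double euclideanDistance(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Step 3: total distance as the plain sum of both components. */
    static double totalDistance(String[] catA, double[] numA, String[] catB, double[] numB) {
        return categoricalDistance(catA, catB) + euclideanDistance(numA, numB);
    }

    public static void main(String[] args) {
        // Same soil type and pest management, different climate (labels are hypothetical).
        String[] catA = {"Loamy", "Biological", "Temperate"};
        String[] catB = {"Loamy", "Biological", "Arid"};
        // Water temperature 32 vs 34 °C from the text; humidity 60 vs 72 is an assumed pair.
        double[] numA = {32, 60};
        double[] numB = {34, 72};
        double total = totalDistance(catA, numA, catB, numB);
        System.out.println(String.format(Locale.ROOT, "T_d = %.2f", total)); // prints T_d = 13.17
    }
}
```

Because the two components are summed without scaling, numerical attributes with wide ranges can dominate the categorical mismatches; normalising the numerical values first would be one way to balance them, although the text above does not state that this is done.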

2.4. Example: Distance Calculation Between Two Farmers

To clarify how distance influences similarity analysis in the similarity-based method, we illustrate the step-by-step computation using two farmer profiles from the dataset. Farmer 1 and Farmer 2 have the attribute values shown in Table 2.

Step 1: Categorical Distance. The categorical attributes in this context are Soil Type, Fertiliser Type, Climate, Pest Management, Plant Time, Crop Harvested, Harvest Colour, Seed Supplier, and Season. Out of these nine attributes, Farmer 1 and Farmer 2 match only on Climate and differ on the remaining eight. We apply the simple matching distance, where a value of 0 indicates a match and 1 a mismatch:

$$\begin{aligned} C_{d} = {} & 1\,(\text{Soil}) + 1\,(\text{Fertiliser}) + 0\,(\text{Climate}) + 1\,(\text{Pest}) \\ & + 1\,(\text{Plant Time}) + 1\,(\text{Crop}) + 1\,(\text{Harvest Colour}) \\ & + 1\,(\text{Supplier}) + 1\,(\text{Season}) = 8 \end{aligned} \tag{5}$$

Step 2: Numerical Distance. The numerical attributes considered are Humidity Level, Water Temperature, Distance to Retailer, and Harvest Yield. These are compared using the Euclidean distance formula:

$$\begin{aligned} (83 - 76)^{2} & = 49 \quad (\text{Humidity Level}) \\ (21 - 20)^{2} & = 1 \quad (\text{Water Temperature}) \\ (131 - 133)^{2} & = 4 \quad (\text{Distance to Retailer}) \\ (50 - 45)^{2} & = 25 \quad (\text{Harvest Yield}) \\ d(A, B) & = \sqrt{49 + 1 + 4 + 25} = \sqrt{79} \approx 8.89 \end{aligned}$$

Step 3: Total Distance. The overall distance between the two farmers, used for neighbour ranking, combines both categorical and numerical components:

$$T_{d} = C_{d} + d(A, B) = 8 + 8.89 = 16.89 \tag{6}$$

Interpretation and Influence on Analysis. This total distance reflects a moderate similarity score. While numerical environmental parameters such as humidity, water temperature, and harvest yield are relatively close, the significant number of categorical mismatches highlights diverging agricultural practices. In the context of the similarity-based method, such a distance affects the recommendation strength: Farmer 2 may provide useful input on environmental conditions and operational metrics, but their advice on crop-specific strategies, planting periods, or pest control may be less reliable for Farmer 1. By quantifying both exact and approximate similarity, the system ensures more context-aware and informed decision-making for farmers.
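Once a total distance is available for every candidate, neighbour ranking reduces to selecting the k smallest values. The sketch below shows one way to do this with k = 3; the class name is hypothetical and, apart from 16.89, the distances in the example are made-up values for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch: rank candidates by total distance and keep the k closest. */
final class NearestNeighbourRankingSketch {
    /** Returns the indices of the k smallest distances, closest first (ties keep index order). */
    static List<Integer> kNearest(double[] distances, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < distances.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> distances[i]));
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }

    public static void main(String[] args) {
        // Index 0 holds the 16.89 from Equation (6); the other values are hypothetical.
        double[] totalDistances = {16.89, 4.2, 9.5, 2.1, 30.0};
        System.out.println(kNearest(totalDistances, 3)); // prints [3, 1, 2]
    }
}
```

With these values, the farmer at distance 16.89 falls outside the three nearest neighbours, which matches the interpretation above that many categorical mismatches weaken a profile's influence on the recommendation.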

2.5. Potential Solutions to Improve Scalability in Agricultural Similarity-Based Method

In the similarity-based method, the most similar users are determined by calculating the distance between the selected farmer and all the other users, as discussed above. This exhaustive comparison becomes difficult to complete promptly on large or real-time data. ANN algorithms offer a solution by trading off some accuracy for speed, making them well suited to real-time applications where quick decisions are essential [37]. ANN algorithms can quickly identify neighbours by focusing on approximate similarities rather than on exact distances. One of the most popular ANN methods is Locality Sensitive Hashing (LSH), which is widely used in high-dimensional data to identify similar items more efficiently [38,39,40,41,42].

LSH creates hash functions that map similar data points into the same “bucket” or hash space. The algorithm assigns each data point a hash value based on its features, so that points close in the feature space (similar users) are likely to fall into the same bucket. Table 3 provides an overview of how the number of hash functions and bands impact the balance between speed and accuracy in our dataset, helping to adjust the performance of the LSH algorithm.

Instead of comparing every single farmer, LSH allows the algorithm to search only within the bucket where the target farmer is located. This drastically reduces the search space, speeding up the recommendation process.

The key idea behind LSH is that it preserves locality: similar data points end up in the same or nearby buckets with high probability.

Applied in our similarity-based algorithm, this approach groups farmers who share, for example, the same environmental conditions more quickly, without the need to compare them one by one. This grouping allows the system to deliver recommendations faster, even as the dataset grows.
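The bucketing idea can be sketched as follows. The text does not specify which LSH family is used, so this example assumes a MinHash signature split into bands, which is one common scheme that has both a "number of hash functions" and a "number of bands"; the class name, the token encoding ("attribute=value"), and the hash coefficients are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Illustrative MinHash-with-banding sketch; an assumed scheme, not the published code. */
final class LshBandingSketch {
    private static final long PRIME = 2_147_483_647L; // 2^31 - 1
    private final long[] a;
    private final long[] b; // hash family h_i(x) = (a_i * x + b_i) mod PRIME
    private final int bands;
    private final int rowsPerBand;
    private final Map<String, List<Integer>> buckets = new HashMap<>();

    LshBandingSketch(int hashFunctions, int bands, long seed) {
        if (hashFunctions % bands != 0) {
            throw new IllegalArgumentException("bands must divide hashFunctions");
        }
        this.bands = bands;
        this.rowsPerBand = hashFunctions / bands;
        this.a = new long[hashFunctions];
        this.b = new long[hashFunctions];
        Random rng = new Random(seed);
        for (int i = 0; i < hashFunctions; i++) {
            a[i] = 1 + rng.nextInt(Integer.MAX_VALUE - 1);
            b[i] = rng.nextInt(Integer.MAX_VALUE);
        }
    }

    /** MinHash signature: for each hash function, the minimum hash over all tokens. */
    long[] signature(Set<String> tokens) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String token : tokens) {
            long x = token.hashCode() & 0x7fffffffL; // non-negative 31-bit value
            for (int i = 0; i < a.length; i++) {
                sig[i] = Math.min(sig[i], (a[i] * x + b[i]) % PRIME);
            }
        }
        return sig;
    }

    /** One bucket key per band, built from that band's slice of the signature. */
    private List<String> bandKeys(long[] sig) {
        List<String> keys = new ArrayList<>();
        for (int band = 0; band < bands; band++) {
            long[] slice = Arrays.copyOfRange(sig, band * rowsPerBand, (band + 1) * rowsPerBand);
            keys.add(band + ":" + Arrays.toString(slice));
        }
        return keys;
    }

    void add(int farmerId, Set<String> tokens) {
        for (String key : bandKeys(signature(tokens))) {
            buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(farmerId);
        }
    }

    /** Farmers sharing at least one band bucket with the query profile. */
    Set<Integer> candidates(Set<String> tokens) {
        Set<Integer> result = new LinkedHashSet<>();
        for (String key : bandKeys(signature(tokens))) {
            result.addAll(buckets.getOrDefault(key, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        LshBandingSketch index = new LshBandingSketch(12, 4, 42L);
        index.add(0, Set.of("soil=Loamy", "climate=Temperate", "season=Spring"));
        index.add(1, Set.of("soil=Sandy", "climate=Arid", "season=Summer"));
        // A profile identical to farmer 0 shares every band with it, so 0 is always returned.
        System.out.println(index.candidates(Set.of("soil=Loamy", "climate=Temperate", "season=Spring")));
    }
}
```

Only the returned candidates would then be ranked with the exact distance, which is where the reduction in search space comes from. Numerical attributes would need to be discretised into tokens (or handled by a different hash family) before they can be used in a set-based scheme like this one.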

3. Results

To evaluate the effectiveness of the similarity-based decision support system in agricultural contexts, both the K-Nearest Neighbour (KNN) and Approximate Nearest Neighbour using Locality Sensitive Hashing (ANN-LSH) algorithms were applied to synthetically generated datasets ranging in size from 500 to 100,000 farmer profiles. This setup enables a comparative assessment of how each method identifies and ranks peer profiles with similar agricultural conditions, thereby informing the design of scalable and responsive decision-support tools for farming. The corresponding attribute ranges are detailed in Appendix A.

By examining the ordering and selection of similar farmers, the relative strengths and limitations of exact (KNN) versus approximate (ANN-LSH) neighbour search approaches are highlighted. These findings are particularly pertinent for practical scenarios where computational efficiency and response time are of paramount importance.

Figure 1 and Figure 2 present the algorithmic workflows of the two algorithms, illustrating the internal mechanisms used to calculate similarity and rank neighbours [43].

While both methods return overlapping groups of similar farmers, the order of the recommendations differs. For instance, the top-ranked recommendation produced by KNN (Figure 3) may occupy a lower position in the ANN-LSH output (Figure 4). Such discrepancies are expected due to the algorithms’ differing strategies. For practitioners, the implications of these variations are significant, as recommendation order may influence which practices or peer cases are prioritised when making decisions.

To assess scalability, execution time was benchmarked at increasing dataset sizes. For 500 records, KNN required 0.0481 s, whereas ANN-LSH achieved a similar response of 0.0674 s. Initial runs included console-based debug output for neighbour retrieval, later disabled to ensure accurate timing results.

A notable performance difference appeared at 8000 farmers, where KNN exhibited a runtime of 6.19 s, while ANN-LSH required only 0.235 s. This divergence continued as the dataset scaled further.

When the dataset was expanded to 100,000 farmers, KNN’s execution time reached approximately 2043.278 s (over 34 min), while ANN-LSH completed in just 1.004 s. These trends are depicted in Figure 5 and Figure 6, underscoring ANN-LSH’s significant computational advantage for large-scale applications.
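To make the divergence easier to compare across dataset sizes, the reported timings can be expressed as a speed-up ratio. The symbol $S(n)$ is introduced here only for illustration; the values are derived directly from the execution times quoted above.

```latex
S(n) = \frac{t_{\mathrm{KNN}}(n)}{t_{\mathrm{ANN\text{-}LSH}}(n)}, \qquad
\begin{aligned}
S(500) &= \frac{0.0481}{0.0674} \approx 0.71, \\
S(8000) &= \frac{6.19}{0.235} \approx 26.3, \\
S(100{,}000) &= \frac{2043.278}{1.004} \approx 2035.
\end{aligned}
```

A ratio below 1 at 500 records indicates that KNN is marginally faster on the smallest dataset, while the ratio grows by roughly three orders of magnitude by 100,000 records.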

ANN-LSH performance is influenced by the configuration of its parameters, specifically the number of hash functions and bands. As shown in Table 3, lower values yield faster execution but lower accuracy, while higher values improve precision at the cost of increased computational overhead [44,45].

To further illustrate this trade-off, Table 4 presents execution times for various parameter configurations applied to a dataset of 100,000 farmers.

These results confirm that increasing the number of hash functions and bands enhances the approximation quality of ANN-LSH but also leads to slower performance. If configured inappropriately, the algorithm may fail to retrieve sufficiently similar neighbours, reducing the effectiveness of the recommendations delivered to farmers.
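One way to reason about this trade-off, assuming a MinHash-style banding scheme (the text does not state which LSH family is used), is the standard candidate-probability curve. With $n$ hash functions split into $b$ bands of $r = n / b$ rows each, two profiles with Jaccard similarity $s$ share at least one bucket with probability:

```latex
P(\text{candidate}) = 1 - \left(1 - s^{r}\right)^{b}
```

As a worked example with hypothetical settings $r = 5$ and $b = 20$ (so $n = 100$), a pair with $s = 0.8$ gives $1 - (1 - 0.8^{5})^{20} \approx 0.9996$, whereas a pair with $s = 0.2$ gives $1 - (1 - 0.2^{5})^{20} \approx 0.006$. Under this assumption, raising $b$ retrieves more true neighbours at the cost of more buckets to probe, and raising $r$ makes each bucket stricter.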

In summary, both algorithms offer viable approaches for generating peer-based agricultural recommendations. However, ANN-LSH demonstrates a significant advantage in terms of scalability and runtime efficiency, making it a promising candidate for real-time integration in future data-intensive agricultural advisory systems. Its flexibility in parameter tuning further enables trade-offs between speed and precision, thereby supporting practical deployment in diverse farming environments.

To further clarify what influences the differences between the two algorithms, Table 5 summarises the key parameters affecting their behaviour, including their impact on accuracy, similarity grouping, and execution time.

3.1. Sensitivity Analysis of Input Variables

To determine which input variables have the greatest influence on the classification results of both KNN and ANN-LSH models, we performed a sensitivity analysis inspired by the methodology used in [46]. For KNN, a one-at-a-time (OAT) perturbation approach was adopted, in which each attribute was varied individually while the others were kept constant, and the impact on classification accuracy was measured. For ANN-LSH, the connection weight method was used: the relative importance of each variable was inferred from the magnitude of its feature weights and their impact on the output layer.

The analysis shows that Water Temperature had the strongest influence on the KNN model. This is expected, as it is a key environmental variable that differentiates farmer profiles in the dataset. For ANN-LSH, Humidity Level was the most significant, suggesting that the hash functions were more sensitive to this feature in grouping similar farmers. Other high-impact variables included Soil Type and Pest Management, which relate directly to farming strategies and environmental adaptation, as can be seen in Table 6.

This sensitivity analysis helps highlight the robustness of each model in capturing the most discriminative features and supports a better understanding of how model-based similarities are determined, which in turn informs decision-making for agricultural recommendations.
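A one-at-a-time perturbation can be sketched as follows. Note the swapped-in metric: the study measures the change in classification accuracy, whereas this simplified example measures how much of the query's top-k neighbour set is replaced when a single numerical attribute is shifted. The class name, the data, and the perturbation size are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative OAT sketch: neighbour-set change per perturbed attribute (assumed metric). */
final class OatSensitivitySketch {
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Indices of the k rows closest to the query. */
    static Set<Integer> topK(double[][] data, double[] query, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> euclidean(data[i], query)));
        return new HashSet<>(order.subList(0, Math.min(k, order.size())));
    }

    /**
     * For each attribute j, shifts query[j] by delta while keeping the others fixed and
     * returns the fraction of baseline neighbours that drop out (0 = unchanged, 1 = all replaced).
     * Assumes data is non-empty.
     */
    static double[] sensitivity(double[][] data, double[] query, int k, double delta) {
        Set<Integer> baseline = topK(data, query, k);
        double[] result = new double[query.length];
        for (int j = 0; j < query.length; j++) {
            double[] perturbed = query.clone();
            perturbed[j] += delta;
            Set<Integer> kept = topK(data, perturbed, k);
            kept.retainAll(baseline);
            result[j] = 1.0 - (double) kept.size() / baseline.size();
        }
        return result;
    }

    public static void main(String[] args) {
        // Columns: humidity level, water temperature (hypothetical values).
        double[][] data = {{50, 20}, {51, 20}, {50, 30}, {51, 30}};
        double[] query = {50, 20};
        double[] s = sensitivity(data, query, 2, 10.0);
        System.out.println("humidity: " + s[0] + ", water temperature: " + s[1]);
        // prints humidity: 0.0, water temperature: 1.0
    }
}
```

In this toy dataset, the profiles differ mainly in water temperature, so shifting that attribute replaces both neighbours while shifting humidity replaces none.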

3.1.1. Benefits

The predictive models developed in this study directly support farmers by offering tailored insights derived from peers facing similar environmental and operational conditions. For example, a farmer experiencing reduced yield during spring planting can use the system to identify others with comparable soil, climate, and humidity profiles who achieved better results. By analysing what fertilisers or pest management strategies those peers used, the farmer can make more informed decisions for future crop cycles. Similarly, recommendations for optimal planting periods or seed suppliers can be extracted by identifying consistent patterns in successful neighbouring profiles. These applications demonstrate the model’s role in enabling data-driven, context-aware decisions on the ground.

Both KNN and ANN-LSH models enable the provision of personalised recommendations for farmers sharing similar climatic, soil, and crop conditions. The system can suggest agricultural practices associated with successful harvests or inputs (such as fertilisers or pest control methods) that have contributed to healthier crops. These data-driven recommendations can lead to improved productivity, higher-quality yields, and increased income.

KNN offers a straightforward approach that is easy to implement and does not require intensive training phases. It is particularly effective for basic similarity-based searches where transparency and interpretability are key.

On the other hand, ANN-LSH provides scalability and efficiency for real-time applications. It excels in scenarios where large datasets must be queried quickly, enabling the rapid identification of similar farmers without sacrificing too much accuracy. This makes it ideal for operational environments with continuous data updates.

In addition to supporting decision-making, the approach fosters knowledge sharing among farming communities. By building a network of similar profiles and outcomes, the system promotes the diffusion of best practices across farmers facing comparable challenges, ultimately leading to improved agricultural resilience and innovation.

3.1.2. Disadvantages

The KNN method struggles with large datasets, as it requires computing the distance from the selected point to all other data points. As the dataset grows, the time needed to compute recommendations increases accordingly.

While ANN-LSH improves speed and scalability, it may sometimes yield approximate rather than exact neighbours. This approximation can overlook subtle distinctions between farmers that would otherwise be captured by KNN.

Both algorithms rely on the availability and accuracy of the input data. If inconsistencies exist in the dataset, they can negatively affect the quality of the resulting recommendations [47].

Although KNN and ANN-LSH can effectively identify similar profiles, they may struggle to capture complex interactions, such as the combined effects of soil type and climate conditions on crop yield.

KNN provides a practical, easy-to-implement tool for recommending practices based on similarities between farmers. However, it performs best with clean, structured data and when applied to datasets of manageable size. When combined with other analytical methods or integrated into a broader system, KNN can significantly enhance decision-making, making it a valuable asset.

While the current implementation relies on synthetic and static input data, ANN-LSH offers the computational efficiency required for future real-time or large-scale deployment. These comparative advantages are summarised in Table 7.

4. Discussion

This study examined the application of K-Nearest Neighbour (KNN) and Approximate Nearest Neighbour using Locality Sensitive Hashing (ANN-LSH) in agricultural recommendation systems, analysing how effectively each algorithm identifies similar farming profiles and suggests contextually relevant practices. While KNN yielded highly accurate recommendations by exhaustively comparing data points, ANN-LSH offered a faster and more scalable alternative by using hash functions to approximate neighbours efficiently.

The results confirm that both methods successfully identify meaningful patterns in farmer profiles, aiding data-driven decision-making. In particular, KNN’s precision and ANN-LSH’s balance of speed and accuracy offer complementary benefits, depending on the demands of the use case. Despite algorithmic differences, both have proven useful in enabling farmers to optimise crop selection, identify reliable seed suppliers, and improve pest management strategies, ultimately enhancing productivity and resource efficiency.

While this study provides a promising proof of concept, several areas warrant further investigation to strengthen the system’s accuracy, flexibility, and real-world deployment potential. Future implementations should integrate real-time environmental data, such as live weather updates, soil conditions, and climate anomalies, to refine recommendations dynamically and enhance responsiveness. Moreover, combining KNN and ANN-LSH with deep learning techniques could lead to hybrid models that preserve accuracy while improving scalability.

Personalisation and adaptive learning also represent promising directions for improvement. Models that evolve based on individual farmer feedback and observed field conditions could improve the long-term relevance of recommendations. Additionally, the methodology could be extended to other agricultural sectors, including livestock management, greenhouse cultivation, and hydroponic systems, where similar decision-support challenges exist.

Finally, user-friendly deployment is essential for widespread adoption. A mobile-accessible and intuitive interface, potentially incorporating voice command functionality, would significantly improve usability for smallholder farmers, many of whom may have limited technological proficiency. Addressing these development opportunities could help transform the proposed system into a robust and intelligent decision-support tool, thereby contributing further to the digitalisation and sustainability of modern agriculture.

5. Conclusions

This paper presented a comparative study of exact (KNN) and approximate (ANN-LSH) neighbour search algorithms applied to agricultural recommendation systems. The innovation lies in adapting these machine learning techniques—traditionally used in domains such as e-commerce or document retrieval—to simulate farmer-to-farmer recommendations within a structured, domain-specific dataset.

A synthetically generated dataset of farming profiles, supported by literature-grounded attribute ranges, served as the foundation for testing and validating algorithmic performance. This approach ensured experimental control while maintaining real-world relevance.

This work makes two primary contributions. First, it demonstrates that both KNN and ANN-LSH are capable of generating meaningful and explainable recommendations for farmers. Second, it quantifies the performance trade-offs between accuracy and scalability. While KNN achieves high precision, its execution time increases considerably with larger datasets. By contrast, ANN-LSH provides a fast and scalable alternative, delivering approximate results with substantially lower computational cost.

This study demonstrates how similarity-based decision support systems can facilitate peer-informed agricultural decision-making, leading to improved yield optimisation, reduced waste, and enhanced sustainability. The adaptation of ANN-LSH to agricultural contexts represents a novel contribution to the field and sets a foundation for future research in hybrid or real-time recommender architectures.

As the agricultural sector continues its digital transition, such systems can serve as critical enablers of precision agriculture—linking farmers not only to data but also to each other in intelligent, actionable ways.

Author Contributions

Conceptualization, I.B. and O.M.; methodology, R.E.; software, I.B.; validation, I.B. and R.E.; formal analysis, H.V.; investigation, I.B.; resources, R.E.; data curation, I.B.; writing—original draft preparation, I.B.; writing—review and editing, R.E.; visualization, O.M.; supervision, H.V.; project administration, H.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the reported results are publicly available at https://github.com/BaraianIulia/agrirecsys- (accessed on 20 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

KNN	K-Nearest Neighbour
ANN	Approximate Nearest Neighbour
LSH	Locality Sensitive Hashing

Appendix A. Synthetic Attribute Value Ranges and Sources

To support reproducibility and transparency, this appendix presents the predefined value ranges used in generating the synthetic dataset. Each attribute was designed to reflect realistic agricultural conditions based on published sources or agronomic reports. Values were encoded programmatically using the AgricultureDatasetGenerator.java application. Ranges were grounded in domain-specific research where available.

Table A1. Synthetic attribute value ranges and sources.

Attribute	Sample Values / Range	Reference
Soil Type	Sandy, Clay, Saline, Loamy	[21]
Fertiliser Type	Organic, Chemical, Compost	[23]
Climate Data	Tropical, Temperate, Arid	[24]
Humidity Level (%)	60–85	[25]
Pest Management	Chemical–Synthetic, Chemical–Organic, Biological, Integrated	[26]
Plant Time	Early/Late Spring or Summer, Autumn, Winter	[30]
Crop Harvested	Tomato, Carrot, Wheat, Corn, Lettuce, Potato	[22]
Water Temperature (°C)	20–35	[28]
Harvest Colour	Rare, Medium-Rare, Medium, Medium-Well, Well-Done, Overdone	[48]
Seed Supplier	AgriSeeds Co., GreenFields, CropLife Solutions, FarmGrow Inc.	[29]
Season	Spring, Summer, Autumn, Winter	[30]
Distance to Retailer (km)	128–134	[31]
Harvest Yield (%)	20–100	[8]

Each attribute was synthetically generated using random sampling within the defined value ranges. Fictional names (e.g., seed suppliers) were included solely for simulation purposes and do not represent real entities. The Java source code and full dataset are available as described in the Data Availability Statement.
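As an illustration of sampling within the ranges of Table A1, the following sketch draws a few attributes at random. It is not the authors' AgricultureDatasetGenerator.java; the class `SyntheticProfileSketch`, the uniform distribution, and the use of integer values are assumptions made for this example.

```java
import java.util.Random;

/** Samples a few attributes within the Table A1 ranges (illustrative sketch only). */
final class SyntheticProfileSketch {

    static final String[] SOIL_TYPES = {"Sandy", "Clay", "Saline", "Loamy"};

    /** Picks one category uniformly at random. */
    static String pick(Random rng, String[] values) {
        return values[rng.nextInt(values.length)];
    }

    /** Draws an integer uniformly from the closed interval [min, max]. */
    static int between(Random rng, int min, int max) {
        return min + rng.nextInt(max - min + 1);
    }

    /** Builds one tab-separated row: soil type, humidity (%), water temperature (C), yield (%). */
    static String sampleRow(Random rng) {
        return String.join("\t",
                pick(rng, SOIL_TYPES),
                String.valueOf(between(rng, 60, 85)),
                String.valueOf(between(rng, 20, 35)),
                String.valueOf(between(rng, 20, 100)));
    }
}
```

Passing a seeded generator, for example `new Random(42)`, makes the sampled rows repeatable between runs.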

References

1. Kong, X.; Jiang, H.; Yang, Z.; Xu, Z.; Xia, F.; Tolba, A. Exploiting Publication Contents and Collaboration Networks for Collaborator Recommendation. PLoS ONE 2016, 11, e0148492.
2. Deshmukh, M.; Jaiswar, A.; Joshi, O.; Shedge, R. Farming Assistance for Soil Fertility Improvement and Crop Prediction using XGBoost. Int. Conf. Autom. Comput. Commun. 2022, 44, 03022.
3. Hochman, Z.; van Rees, H.; Carberry, P.S.; Hunt, J.R.; McCown, R.L.; Gartmann, A.; Holzworth, D.; Rees, S.v.; Dalgliesh, N.P.; Long, W.; et al. Re-inventing model-based decision support with Australian dryland farmers. 4. Yield Prophet® helps farmers monitor and manage crops in a variable climate. Crop Pasture Sci. 2009, 60, 1057–1070.
4. Jaiswal, S.; Kharade, T.; Kotambe, N.; Shinde, S. Collaborative recommendation system for agriculture sector. ITM Web Conf. 2020, 32, 03034.
5. Guha, R. Improving the performance of an artificial intelligence recommendation engine with deep learning neural nets. In Proceedings of the 2021 6th International Conference for Convergence in Technology (I2CT), Bangalore, India, 16–18 December 2021; pp. 1–7.
6. Ommane, Y.; Rhanbouri, M.A.; Chouikh, H.; Jbene, M.; Chairi, I.; Lachgar, M.; Benjelloun, S. Machine Learning-based Recommender Systems for Crop Selection: A Systematic Literature Review. In Machine Intelligence for Smart Applications: Opportunities and Risks; Springer: Cham, Switzerland, 2023; pp. 21–59.
7. Gurjar, G.N.; Ram, V.; Swami, S. Effect of organic mulches and planting date on soil chemo-biological properties and economics of rice-potato system in Meghalaya: A review. Int. J. Chem. Stud. 2019, 7, 779–783.
8. Smith, J.; Green, E.; Patel, R. Impact of Conservation Agriculture on Soil Properties and Crop Yields: A Meta-Analysis. Soil Tillage Res. 2020, 198, 104540.
9. Jones, D.R.; Brown, L.M.; Zhang, W. Global Crop Losses Due to Pests and Diseases: Economic and Agricultural Perspectives. Annu. Rev. Phytopathol. 2021, 58, 1–20.
10. Muninarayanappa, V.; Ranjan, R. Agriculture data analysis using parallel k-nearest neighbor classification algorithm. Int. J. Reconfig. Embed. Syst. 2024, 2089, 4864.
11. Konaté, J.; Diarra, A.G.; Diarra, S.O.; Diallo, A. SyrAgri: A Recommender System for Agriculture in Mali. Information 2020, 11, 561.
12. ZhuanSun, F.; Chen, J.; Chen, W.; Sun, Y. Analysis of Precision Service of Agricultural Product e-Commerce Based on Multimodal Collaborative Filtering Algorithm. Math. Probl. Eng. 2022, 2022, 8323467.
13. Ma, W.; Nowocin, K.; Marathe, N.; Chen, G.H. An interpretable produce price forecasting system for small and marginal farmers in India using collaborative filtering and adaptive nearest neighbors. In Proceedings of the Tenth International Conference on Information and Communication Technologies and Development, Ahmedabad, India, 4–7 January 2019; pp. 1–11.
14. Zhen, Z.; Wang, L.; Zhang, Y. Aquaculture information recommendation based on collaborative filtering algorithm and web logs. Trans. Chin. Soc. Agric. Eng. 2017, 33, 260–265.
15. Paradarami, T.K.; Bastian, N.D.; Wightman, J.L. A hybrid recommender system using artificial neural networks. Expert Syst. Appl. 2017, 83, 300–313.
16. Hassan, M.; Hamada, M. A neural networks approach for improving the accuracy of multi-criteria recommender systems. Appl. Sci. 2017, 7, 868.
17. Chirde, A.A.; Biradar, U.K. A survey on collaborative filtering in accordance with the agricultural application. Int. J. Comput. Appl. 2014, 975, 8887.
18. Resnick, P.; Iacovou, N.; Suchak, M.; Bergstrom, P.; Riedl, J. Grouplens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, Chapel Hill, NC, USA, 22–24 October 1994; pp. 175–186.
19. Yang, X.; Cao, D.; Chen, J.; Xiao, Z.; Daowd, A. AI and IoT-based Collaborative Business Ecosystem: A Case in Chinese Fish Farming Industry. Int. J. Technol. Manag. 2020, 82, 151–171.
20. Ghauth, K.I.; Abdullah, N.A. A Novel, Generalized Recommender System for Social Media Using the Collaborative-Filtering Technique. In Proceedings of the 2014 International Conference on Information Science & Applications (ICISA), Seoul, Republic of Korea, 28–30 April 2014; pp. 1–5.
21. Stenberg, B. Soil attributes as predictors of crop production under standardized conditions. Biol. Fertil. Soils 1998, 27, 104–112.
22. Barvin, M.S.; Sampradeepraj, R.; Anitha, J.; Suresh, S. Crop Recommendation Systems Based on Soil and Environmental Parameters: A Review. Proceedings 2021, 58, 97.
23. United States Department of Agriculture (USDA). Fertilizer Recommendations Guide. Available online: https://www.nrcs.usda.gov/sites/default/files/2023-06/EC750_2023.pdf (accessed on 16 February 2025).
24. United Nations Economic and Social Commission for Western Asia (UNESCWA). Guidelines on Climate Data for Agricultural Productivity; United Nations: New York, NY, USA, 2021; Available online: https://www.unescwa.org/sites/default/files/pubs/pdf/guidelines-climate-data-agricultural-productivity-english_0.pdf (accessed on 24 January 2025).
25. Kozik, A.; Kliebenstein, D.J. Humidity and its role in abiotic stress signaling. New Phytol. 2021, 232, 24–32.
26. United States Department of Agriculture. Pest Management Strategies in U.S. Agriculture; Economic Research Service: Washington, DC, USA, 2023. Available online: https://www.ers.usda.gov/topics/farm-practices-management/crop-livestock-practices/pest-management (accessed on 24 January 2025).
27. University of Minnesota. Planting Date is Just One Factor Affecting Yield Potential. In West Central Research and Outreach Center News; University of Minnesota: Minneapolis, MN, USA, 2019; Available online: https://wcroc.cfans.umn.edu/wcroc-news/planting-date (accessed on 4 February 2025).
28. Upadhyay, A.; Upadhyaya, J.; Upadhyaya, H. Effects of Elevated Water Temperature on Growth of Basil Using Nutrient Film Technique. HortScience 2022, 57, 925–930. Available online: https://journals.ashs.org/hortsci/view/journals/hortsci/57/8/article-p925.xml (accessed on 10 February 2025).
29. Colorado State University Extension. Improve Yield with High-Quality Seed. 2023. Available online: https://extension.colostate.edu/topic-areas/agriculture/improve-yield-with-high-quality-seed-0-303/ (accessed on 16 February 2025).
30. Iowa State University Extension and Outreach. Planting and Harvesting Times for Garden Vegetables. 2009. Available online: https://www.creighton.edu/fileadmin/user/health/wellness-council/docs/Programs/Planting__Harvesting_Times-ISU.pdf (accessed on 20 February 2025).
31. Buckmaster, A.D. Going the Distance: The Impact of Distance to Market on Smallholders Crop and Technology Choices. Master’s Thesis, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA, 2012. Available online: https://vtechworks.lib.vt.edu/handle/10919/33126 (accessed on 24 January 2025).
32. Syahminan, S.; Maknunah, J.; Dijaya, R.; Hindarto, H. KNN (K-Nearest Neighbor) for identifying agricultural land. J. Phys. Conf. Ser. 2019, 1402, 066059.
33. Hamada, M.; Hassan, M. Artificial neural networks and particle swarm optimization algorithms for preference prediction in multi-criteria recommender systems. Informatics 2018, 5, 25.
34. Adeniyi, D.A.; Wei, Z.; Yongquan, Y. Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method. Appl. Comput. Inform. 2016, 12, 90–108.
35. Singh, R.H.; Maurya, S.; Tripathi, T.; Narula, T.; Srivastav, G. Movie recommendation system using cosine similarity and KNN. Int. J. Eng. Adv. Technol. 2020, 9, 556–559.
36. Chen, Z.; Zhao, C.; Wu, H. Field information recommendation based on context-aware and collaborative filtering algorithm. In Computer and Computing Technologies in Agriculture XI; Springer: Cham, Switzerland, 2019; pp. 486–498.
37. Buaba, R. Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach. Ph.D. Thesis, North Carolina Agricultural and Technical State University, Greensboro, NC, USA, 2012.
38. Dai, H.; Zhu, M.; Gui, X. LSH Models in Federated Recommendation. Appl. Sci. 2024, 14, 4423.
39. Aytekin, A.M.; Aytekin, T. Real-time recommendation with locality sensitive hashing. J. Intell. Inf. Syst. 2019, 53, 1–26.
40. Chen, B.; Liu, Z.; Peng, B.; Xu, Z.; Li, J.L.; Dao, T.; Song, Z.; Shrivastava, A.; Re, C. Mongoose: A learnable LSH framework for efficient neural network training. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
41. Baluja, S.; Covell, M. Learning to hash: Forgiving hash functions and applications. Data Min. Knowl. Discov. 2008, 17, 402–430.
42. Covell, M.; Baluja, S. LSH banding for large-scale retrieval with memory and recall constraints. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; p. 1865.
43. Mucherino, A.; Papajorgji, P.J.; Pardalos, P.M. K-nearest neighbor classification. In Data Mining in Agriculture; Springer: New York, NY, USA, 2009; pp. 83–106.
44. Zhu, E.; Nargesian, F.; Pu, K.Q.; Miller, R.J. LSH ensemble: Internet-scale domain search. arXiv 2016, arXiv:1603.07410.
45. Herlocker, J.L.; Konstan, J.A.; Terveen, L.G.; Riedl, J.T. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 2004, 22, 5–53.
46. Hashemi, M.; Yost, M.; Holt, J. Field-Scale Evaluation of Low-Elevation and Mobile Drip Irrigation Systems. Agric. Water Manag. 2025, 314, 109502.
47. Burke, R.; Felfernig, A.; Göker, M.H. Recommender systems: An overview. AI Mag. 2011, 32, 13–18.
48. Smith, J.; Doe, A.; Brown, L. A Decision Support System for Crop Recommendation Using Machine Learning Techniques. Appl. Sci. 2024, 14, 1256.

Figure 1. KNN algorithmic flow.

Figure 2. ANN-LSH algorithmic flow.

Figure 3. KNN results.

Figure 4. ANN-LSH results.

Figure 5. KNN execution time.

Figure 6. ANN-LSH execution time.

Table 1. Effect of different k values on KNN performance.

k	Accuracy (Cross-Validation)	Execution Time (s)
3	91.2%	1.34
5	89.7%	1.45
10	87.1%	1.65
100	79.5%	2.40

Table 2. Attribute comparison between Farmer 1 and Farmer 2.

Attribute	Farmer 1	Farmer 2
Soil Type	Loamy	Clay
Fertiliser Type	Compost	Chemical
Climate	Tropical	Tropical
Humidity Level (%)	83	76
Pest Management	Integrated Pest Management	Chemical Control Organic Pesticides
Plant Time	Winter	Early Spring
Crop Harvested	Corn	Wheat
Water Temperature (°C)	21	20
Harvest Colour	Medium-Rare	Medium
Seed Supplier	GreenFields	AgriSeeds Co.
Season	Spring	Autumn
Distance to Retailer (km)	131	133
Harvest Yield (%)	50	45
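To show how a mixed set of categorical and numeric attributes such as those in Table 2 can be reduced to a single dissimilarity score, the sketch below applies a Gower-style distance. This is an assumption made for illustration and is not necessarily the distance used in Section 2.4: each categorical attribute contributes 0 when the two values match and 1 otherwise, each numeric attribute contributes its absolute difference divided by the range given in Table A1, and Harvest Colour is treated as an unordered category. The class name `GowerSketch` is hypothetical.

```java
/** Gower-style distance over mixed attributes (illustrative sketch only). */
final class GowerSketch {

    /** Mean of per-attribute dissimilarities: 0/1 for categories, range-scaled gap for numbers. */
    static double distance(String[] catA, String[] catB,
                           double[] numA, double[] numB, double[] min, double[] max) {
        double sum = 0.0;
        for (int i = 0; i < catA.length; i++) {
            sum += catA[i].equals(catB[i]) ? 0.0 : 1.0;
        }
        for (int i = 0; i < numA.length; i++) {
            sum += Math.abs(numA[i] - numB[i]) / (max[i] - min[i]);
        }
        return sum / (catA.length + numA.length);
    }

    /** Farmer 1 vs. Farmer 2, using the values in Table 2 and the ranges in Table A1. */
    static double table2Example() {
        String[] f1 = {"Loamy", "Compost", "Tropical", "Integrated Pest Management", "Winter",
                       "Corn", "Medium-Rare", "GreenFields", "Spring"};
        String[] f2 = {"Clay", "Chemical", "Tropical", "Chemical Control Organic Pesticides",
                       "Early Spring", "Wheat", "Medium", "AgriSeeds Co.", "Autumn"};
        // Humidity (%), water temperature (C), distance to retailer (km), harvest yield (%).
        double[] n1 = {83, 21, 131, 50};
        double[] n2 = {76, 20, 133, 45};
        double[] min = {60, 20, 128, 20};
        double[] max = {85, 35, 134, 100};
        return distance(f1, f2, n1, n2, min, max);
    }
}
```

Under these assumptions, eight of the nine categorical attributes differ and the four numeric terms sum to 0.7425, so `GowerSketch.table2Example()` evaluates to 8.7425 / 13, or about 0.67 on a scale from 0 (identical) to 1 (maximally different).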

Table 3. Hash function and band value interpretation.

Hash Functions	Bands	Use Case
10	5	Fastest but least accurate
20	10	Balanced speed and accuracy
50	25	More accuracy, slower
100	50	Closest to KNN, very slow
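The pattern in Table 3 is consistent with the standard banding analysis for MinHash-style LSH, where h hash functions are split into b bands of r = h / b rows and two profiles with similarity s become a candidate pair with probability 1 - (1 - s^r)^b. The sketch below evaluates that formula; it assumes MinHash-style banding, which the tables do not state explicitly, and the class name `LshBandingSketch` is hypothetical.

```java
/** Candidate-pair probability under LSH banding (illustrative sketch only). */
final class LshBandingSketch {

    /**
     * Probability that two profiles with similarity s agree on every row of at
     * least one band. Assumes hashFunctions is an exact multiple of bands.
     */
    static double candidateProbability(double s, int hashFunctions, int bands) {
        int rows = hashFunctions / bands; // rows per band
        double oneBandMatches = Math.pow(s, rows);
        return 1.0 - Math.pow(1.0 - oneBandMatches, bands);
    }
}
```

Every configuration in Table 3 has two rows per band, so only the number of bands changes. For a pair with similarity 0.5, the formula gives about 0.76 with 10 hash functions and 5 bands, and about 0.94 with 20 hash functions and 10 bands. More bands therefore admit more candidate pairs, which brings results closer to exact KNN at the cost of more comparisons.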

Table 4. Hash function and band value usage.

Hash Functions	Bands	Execution Time (s)
10	5	2.286
20	10	2.652
50	25	3.55
100	50	5.657

Table 5. Comparison of parameters influencing KNN and ANN-LSH.

Algorithm	Key Parameter	Impact on Accuracy & Similarity	Execution Time Impact
KNN	Number of Neighbours (k)	Small k improves relevance; large k dilutes similarity	Increases linearly with dataset size
ANN-LSH	Hash Functions	Higher number yields better approximation of similarity	Slightly increases
ANN-LSH	Bands	More bands improve accuracy (like smaller k), but slow down execution	Increases moderately

Table 6. Top influential variables identified through sensitivity analysis.

Rank	KNN Model	ANN-LSH Model
1	Water Temperature	Humidity Level
2	Soil Type	Pest Management
3	Humidity Level	Water Temperature

Table 7. Comparison of KNN vs. ANN-LSH.

Feature	KNN	ANN-LSH
Time Complexity	Linear (O(n)) for search	Sub-linear with hashing
Accuracy	High (exact neighbors)	High for close neighbors, lower for far ones
Scalability	Struggles with large datasets	Scales well to large datasets
Real-Time Suitability	Limited	Suitable

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

