Over 20 Years of Machine Learning Applications on Dairy Farms: A Comprehensive Mapping Study

Machine learning is becoming increasingly common in dairy farming decision support applications in areas such as feeding, animal husbandry, healthcare, animal behavior, milking and resource management. Thus, the objective of this mapping study was to collate and assess studies published in journals and conference proceedings between 1999 and 2021 that applied machine learning algorithms to dairy farming-related problems, in order to identify trends in the geographical origins of data, as well as the algorithms, features and evaluation metrics and methods used. This mapping study was carried out in line with PRISMA guidelines, with six pre-defined research questions (RQ) and a broad and unbiased search strategy that explored five databases. In total, 129 publications passed the pre-defined selection criteria, from which relevant data required to answer each RQ were extracted and analyzed. This study found that Europe (43% of studies) produced the largest number of publications (RQ1), while the largest number of articles were published in the Computers and Electronics in Agriculture journal (21%) (RQ2). The largest number of studies addressed problems related to the physiology and health of dairy cows (32%) (RQ3), while the most frequently employed feature data were derived from sensors (48%) (RQ4). The largest number of studies employed tree-based algorithms (54%) (RQ5), while RMSE (56%) (regression) and accuracy (77%) (classification) were the most frequently employed metrics, and hold-out cross-validation (39%) was the most frequently employed evaluation method (RQ6). Since 2018, there has been more than a sevenfold increase in the number of studies that focused on the physiology and health of dairy cows, compared to almost a threefold increase in the overall number of publications, suggesting an increased focus on this subdomain. In addition, a fivefold increase in the number of publications that employed neural network algorithms was identified since 2018, in comparison to a threefold increase in the use of both tree-based algorithms and statistical regression algorithms, suggesting an increasing utilization of neural network-based algorithms.


Introduction
Animal agriculture is responsible for 14.5% of global anthropogenic greenhouse gas emissions, 20% of which are due to dairy production [1]. With a 22% increase in global milk production forecast between 2018 and 2027 [2], it is essential that the dairy sector adequately addresses the significant challenges ahead to ensure the future sustainability of the global dairy industry. This is coupled with the rapid intensification of milk production systems that has taken place over the past 20 years. This intensification may be due to the principles associated with modern agricultural systems that define progress in terms of efficiency and productivity [3]. It has led to economies of scale throughout the dairy industry, with increasing herd sizes reducing fixed costs per unit output, coupled with an emphasis on maximizing output per hectare of farmland and per unit of input (e.g., concentrate feed). However, increased numbers of dairy livestock will naturally result in an increased workload for farmers, which may reduce income per hour worked or potentially reduce animal health and wellbeing, as farmers must care for increased numbers

Methodology
This mapping study followed three primary stages: planning, conducting and reporting, as outlined by Kitchenham et al. [11]. In the planning stage, the research questions were defined, suitable databases were identified and a robust search strategy was selected to identify the journal articles and conference papers (hereafter referred to as publications) that could be used to answer the research questions. The databases were selected based on institutional access, their use in prior systematic literature reviews in the dairy research domain [9,10,12] and whether publications could easily be downloaded in bulk. A heuristic approach was taken to identify a search string that provided a broad and unbiased search of the dairy literature without returning an unfeasible number of publications. During the conducting stage, the document search was carried out using the defined search string within each online database. The identified publications were filtered according to pre-determined selection criteria prior to analysis; no quality assessment was performed, in order to ensure maximum coverage. Relevant data required to answer each research question were then extracted from each publication and synthesised in the reporting stage via applicable charts, figures and tables.

Research Questions
The following research questions (RQ) were defined:
RQ1. What countries/regions are responsible for the largest number of publications?
RQ2. What journals and conference proceedings are research publications being published in?
RQ3. What problem areas are being addressed using machine learning in the dairy farming domain?
RQ4. What features are being employed to develop the machine learning models?
RQ5. What machine learning algorithms are being utilised to develop the models?
RQ6. Which evaluation metrics and methods are used?

Databases and Search Strategy
The literature search was carried out using five databases: Scopus, ScienceDirect, IEEE, Google Scholar and MDPI. These databases were selected as each allowed for the bulk downloading of publications (except for Google Scholar) while providing wide coverage of dairy-related research publications. Google Scholar returned a small number of publications; thus, bulk downloading was not required. A broad and unbiased search of the literature was undertaken to capture a wide range of publications from various areas within the dairy research domain [11] by using the search string "Dairy" AND ("machine learning" OR "artificial intelligence"). By default, each database's search function also searched for the approximate search phrase "machine-learning". This search string ensured that (1) preference was not given to any particular machine learning algorithm, and (2) a broad search of the literature was carried out without returning an unfeasible number of publications. Publications that contained the search string in their title, abstract and/or keywords fields were identified using each database's search function. However, Google Scholar did not allow for searches to be carried out on the abstract and keywords fields, so only the publication title field was used. Initially, the search strategy aimed to identify studies published between 1990 and 2021; however, no studies were found prior to 1999, so the search focused on studies published between 1999 and 2021. The last search was carried out on 9 June 2021. In total, 746 studies were identified across the Scopus (n = 382), ScienceDirect (n = 109), IEEE (n = 189), Google Scholar (n = 45) and MDPI (n = 21) databases.
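The Boolean search string described above can be illustrated with a minimal sketch. The function names are hypothetical, and joining the title, abstract and keywords into one searchable text is an assumption about how the database search fields behave, not a description of any database's actual search engine.

```python
import re

# Number of records returned by each database, as reported in the text.
HITS = {"Scopus": 382, "ScienceDirect": 109, "IEEE": 189,
        "Google Scholar": 45, "MDPI": 21}


def matches_search_string(text: str) -> bool:
    """Check: "Dairy" AND ("machine learning" OR "artificial intelligence")."""
    t = text.lower()
    has_dairy = "dairy" in t
    # Accept both "machine learning" and the hyphenated "machine-learning".
    has_ml = re.search(r"machine[\s-]learning", t) is not None
    has_ai = "artificial intelligence" in t
    return has_dairy and (has_ml or has_ai)


def record_matches(title: str, abstract: str = "", keywords: str = "") -> bool:
    """Assumption: the query is evaluated over the combined fields.

    For a title-only search (as with Google Scholar), pass the title alone.
    """
    return matches_search_string(" ".join([title, abstract, keywords]))


total_identified = sum(HITS.values())
```

The per-database counts sum to 746, which is the number of documents carried into the screening stage.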

Selection Criteria
To retain only the publications relevant to the research questions (defined in Section 2.1), exclusion and inclusion criteria were determined, similar to Slob et al. [10]. For a publication to be included in the study, all exclusion criteria must be false and all inclusion criteria must be true [11].
The exclusion criteria were:
1. The publication was not related to machine learning applied to dairy farming
2. The publication did not report empirical findings
3. The publication was not written in English
4. The publication was a duplicate study
5. There was no full text available
6. The publication was a review or survey study
7. The publication was published before 1999
The inclusion criteria were:
1. The publication features the development of machine learning models related to dairy farming
2. The publication is a primary study
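The selection rule (every exclusion criterion false, every inclusion criterion true) can be sketched as a simple predicate. The class and field names below are hypothetical labels for the criteria listed above; the screening in the study was performed by reading each publication, not by code.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    # Exclusion criteria: True means the criterion applies to the publication.
    not_ml_applied_to_dairy: bool = False
    no_empirical_findings: bool = False
    not_in_english: bool = False
    duplicate_study: bool = False
    no_full_text: bool = False
    review_or_survey: bool = False
    published_before_1999: bool = False
    # Inclusion criteria: True means the criterion is satisfied.
    develops_ml_model_for_dairy: bool = True
    primary_study: bool = True


def is_included(s: Screening) -> bool:
    """Include only if no exclusion criterion holds and all inclusion criteria hold."""
    exclusion = [
        s.not_ml_applied_to_dairy, s.no_empirical_findings, s.not_in_english,
        s.duplicate_study, s.no_full_text, s.review_or_survey,
        s.published_before_1999,
    ]
    inclusion = [s.develops_ml_model_for_dairy, s.primary_study]
    return not any(exclusion) and all(inclusion)
```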

Data Collection
Each publication identified by the search strategy outlined in Section 2.2 was analyzed relative to the exclusion and inclusion criteria (Section 2.3). The search strategy was carried out in line with PRISMA guidelines, as shown in Figure 1 [13]. The flow of documents from initial identification to the manuscript screening/filtering stage to the final subset of documents included in the mapping study is shown in Figure 1. The number of studies excluded due to each exclusion criterion is also highlighted at the screening stage. Of the 746 documents initially identified, 10 were not written in English, 78 were review/survey studies and 294 had no full text available for downloading from the database website. In addition, 210 publications were found to be outside the scope of developing machine learning models for dairy farming, while 32 documents were removed due to being duplicate studies. In addition, seven publications were included through snowballing, as employed by Slob et al. [10]. Cumulatively, 129 individual publications passed the selection criteria stage and were then included in the mapping study (Appendix E).
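The document flow reported above can be checked with a few lines of arithmetic. This is only a consistency check on the counts stated in the text; the variable names are illustrative.

```python
# Counts taken directly from the screening description.
identified = 746
excluded = {
    "not written in English": 10,
    "review or survey study": 78,
    "no full text available": 294,
    "outside the scope of ML for dairy farming": 210,
    "duplicate study": 32,
}
added_by_snowballing = 7

# Publications remaining after applying the exclusion criteria.
after_screening = identified - sum(excluded.values())

# Final set included in the mapping study.
included = after_screening + added_by_snowballing
```

With these figures, 122 publications remain after screening and 129 are included once the seven snowballed publications are added, which matches the reported total.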
Relevant data were extracted from each of the 129 studies to respond to each of the six research questions. This was carried out by reading each publication and extracting the following information: (1) the year of publication, (2) publication source (name of the journal or conference proceedings), (3) whether the publication was a journal article or conference paper, (4) the country of origin (identified as country or countries where data collection took place), (5) the dependent variable or variables used, (6) the problem type (e.g., classification, regression or clustering), (7) the features employed, (8) the machine learning algorithms utilised, (9) the evaluation metrics used for synthesising model performance and (10) the validation technique used to quantify model performance.

Data Analysis
To ease the synthesis of information, research categories, algorithm categories and feature categories were determined for each study. Categorisation was necessary to ensure each research question was addressed clearly and concisely. Firstly, each study was categorised according to its specific area of dairy research, whereby six categories were identified (RQ3) based on cognate review studies in the field: physiology and health, animal husbandry, milk, feeding, management and behavior analysis. Cockburn [9] employed physiology and health, animal husbandry, feeding, management and behavior analysis, while Slob et al. [10] assessed studies on milk disease detection, quantifying milk production and milk quality. The range of dependent variables that were used to determine which of the six categories of dairy research each study related to is shown in Appendix A. Secondly, the machine learning algorithms used within each study were categorised, whereby eight categories were identified (RQ5): trees (e.g., decision trees), statistical regression (e.g., multiple linear regression, ridge regression), neural networks (e.g., multi-layered perceptron, deep learning networks), Bayes (e.g., naïve Bayes, Bayesian-LASSO), meta (e.g., bagging, boosting), rule-based (e.g., Jrip, OneR), clustering (e.g., k-means, DBSCAN) and other (e.g., support vector machine, KNN). The full list of machine learning algorithms used and their corresponding categories is shown in Appendix C. Additionally, the features used within each study were categorised, whereby 11 categories were identified (RQ4): calving/pregnancy information, cow characteristics and clinical information, diet/feeding, farm characteristics and management, lactation information, meteorological conditions, milk characteristics, milking parameters, sensors, soil characteristics and other. The full list of features used and their corresponding categories is shown in Appendix B.
Lastly, the categorisation of journals and conference proceedings was also carried out to help improve data synthesis. The other journal category represented journals that had fewer than four published articles included in this study, while all conference papers were included in a conference paper category (RQ2). The full lists of journals and conference proceedings are shown in Appendix D.
Categorisation was straightforward when a publication focused on only one dependent variable. However, 13 publications focused on the prediction of multiple dependent variables. In these cases, the problem type, algorithms employed and features used were recorded for each dependent variable. Each dependent variable was categorised according to its specific area of dairy research. When a publication focused on the prediction of multiple dependent variables, each attributable to a different area of dairy research, each dependent variable was treated as a separate study. Otherwise, information would be excluded when assessing the frequency of studies published in different research areas over time (RQ3), when investigating the geographical locations attributed to different research areas (RQ1) and when evaluating the research problem type and popular journals and conference proceedings associated with different areas of dairy research (RQ2).
When a publication focused on the prediction of multiple dependent variables in the same dairy research area but utilising different features, each unique set of features was treated as a separate study. However, this was only applicable when addressing RQ4, whereby the features employed in different research areas, in conjunction with the machine learning algorithms used, were investigated. Otherwise, information related to the features used within each research area would be excluded.
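The categorisation scheme can be sketched as two small lookups. The dictionary below contains only the example algorithms named in the text (the full mapping is in Appendix C and is not reproduced here), and the function names and the journal counts in the usage note are hypothetical.

```python
from collections import Counter

# Example algorithms named in the text, mapped to the eight categories (RQ5).
ALGORITHM_CATEGORY = {
    "decision tree": "trees",
    "multiple linear regression": "statistical regression",
    "ridge regression": "statistical regression",
    "multi-layered perceptron": "neural networks",
    "naive bayes": "Bayes",
    "bayesian-lasso": "Bayes",
    "bagging": "meta",
    "boosting": "meta",
    "jrip": "rule-based",
    "oner": "rule-based",
    "k-means": "clustering",
    "dbscan": "clustering",
    "support vector machine": "other",
    "knn": "other",
}


def categorise_algorithm(name: str) -> str:
    """Return the category of an algorithm; unknown names raise KeyError."""
    return ALGORITHM_CATEGORY[name.strip().lower()]


def journal_category(article_counts: Counter, journal: str,
                     min_articles: int = 4) -> str:
    """Journals with fewer than `min_articles` included articles are pooled."""
    if article_counts[journal] >= min_articles:
        return journal
    return "other journals"
```

For example, with hypothetical counts `Counter({"Journal A": 5, "Journal B": 2})`, "Journal A" keeps its own category while "Journal B" falls into the other journals category.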
Three studies involved the collection of data in more than one country/region. In such instances, each country was treated as though it had independently carried out the study. This was applicable when assessing the geographical distribution of the publications (RQ1). Assessing the geographical locations of publications was carried out on an individual publication basis, irrespective of the number of dependent variables. Likewise, assessing the algorithms used (RQ5), validation methods and model performance metrics used (RQ6) throughout the literature were carried out on an individual publication basis, as these were found to be consistent throughout each publication irrespective of the number of dependent variables.
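The counting rules described above can be made concrete with a short sketch. The data classes, function names and the example publication are hypothetical; they only illustrate how one publication can contribute several counting units depending on the research question being addressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DependentVariable:
    name: str
    research_area: str


@dataclass
class Publication:
    title: str
    countries: list
    dependent_variables: list


def studies_by_research_area(pub: Publication) -> list:
    """One counting unit per distinct research area (used for RQ2 and RQ3)."""
    return sorted({dv.research_area for dv in pub.dependent_variables})


def country_entries(pub: Publication) -> list:
    """One counting unit per country that contributed data (used for RQ1)."""
    return list(dict.fromkeys(pub.countries))  # de-duplicate, keep order


# Hypothetical publication with three dependent variables in two research areas.
example = Publication(
    title="hypothetical example",
    countries=["Ireland", "Denmark"],
    dependent_variables=[
        DependentVariable("milk yield", "milk"),
        DependentVariable("lameness", "physiology and health"),
        DependentVariable("mastitis", "physiology and health"),
    ],
)
```

Here the example publication counts as two studies when research areas are tallied and as two country entries when the geographical distribution is tallied, while it remains a single publication for the algorithm and evaluation-method counts.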

Geographical Distribution
The geographical distribution of the publications included in this study is shown in Figure 2. The geographical location was determined by the origin of the data used for model development. In total, 30 countries contributed data to machine learning in the dairy farming research domain. Data originated from a single country for 126 of the studies, with the remaining three studies involving cross-border collaboration. These included collaborations between: (1) the United Kingdom, Italy, Sweden and Finland; (2) Australia, Canada, Denmark and Ireland; and (3) Belgium, Canada, Ireland, Denmark and Germany. In relation to RQ1, the largest number of studies utilised data originating from the United States (n = 19), followed by Ireland (n = 15), Germany (n = 13) and the United Kingdom (n = 13), and Australia (n = 10) and China (n = 10). The remaining 24 countries each contributed data to five or fewer research publications. However, from a continental perspective, Europe (n = 60) was by far the largest contributor of data, followed by Asia (n = 27), North America (n = 24), Oceania (n = 13), South America (n = 8) and Africa (n = 2). Data originating from Europe were used in studies focusing on the physiology and health of dairy cattle (n = 19), analysing animal behavior (n = 13), animal husbandry (n = 12), farm management (n = 8), milk (n = 5) and feeding (n = 3), as shown in Figure 3. Applying machine learning algorithms to assess the physiology and health of dairy cattle was also the most popular research category in North America (n = 10) and Asia (n = 8), and the joint most popular category in Oceania (n = 3) and South America (n = 3).

Publications Timeline
The number of research studies published per year from 1999 to 2021, categorised according to each research area, is shown in Figure 4. Prior to 2018, the animal husbandry category was the largest research area representing 35% of all publications in that period, followed by behavior analysis (19%), management (15%) and physiology and health (15%). A significant increase in the number of publications occurred in 2018, whereby a total of 15 journal articles and conference papers were published, representing a 114% increase compared to 2017. This trend continued in 2019 and 2020, whereby year-on-year increases of 80% and 41% were recorded, respectively. This resulted in 74% of the publications included in this mapping study being published after 2017, representing a threefold increase. On average, between 2018 and 2021, the physiology and health research category was the largest research area (38%) (up from 15% between 1999 and 2017), followed by research related to behavior analysis (19%) and animal husbandry (14%). The physiology and health research category represented the largest research area in each year between 2018 and 2021, representing 40%, 37%, 39% and 35% of publications, respectively. Behavior analysis was the second-largest research category in 2018 (27%), 2019 (22%) and the first five months of 2021 (24%), while animal husbandry was the second-largest research category in 2020 (21%).
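The growth figures quoted above follow from two simple calculations, sketched below. The function names are illustrative. The 2017 count of 7 publications is an inference that is consistent with the reported 114% increase to 15 publications in 2018; it is not a figure stated in the text.

```python
def percent_increase(previous: float, current: float) -> float:
    """Year-on-year change expressed as a percentage of the previous value."""
    return (current - previous) / previous * 100.0


def fold_change_from_shares(share_after: float, share_before: float) -> float:
    """Ratio of publications after a cut-off year to those before it."""
    return share_after / share_before


# Assumed 2017 count (7) against the reported 2018 count (15).
increase_2018 = percent_increase(7, 15)

# 74% of publications appeared after 2017, so 26% appeared up to 2017.
fold_after_2017 = fold_change_from_shares(74, 26)
```

Under these assumptions the year-on-year increase is roughly 114%, and the ratio of later to earlier publications is about 2.8, which the text rounds to a threefold increase.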

Publications Breakdown
The following section has two primary components: the first component provides a breakdown of the type of problems addressed in relation to the source journals that published the research studies and the areas of research that machine learning has been applied to throughout the literature. The second component provides a breakdown of each research area in relation to the features considered for model development and machine learning algorithms employed.

Problem Type, Journals/Conferences and Research Area
The flow of research studies from the type of problem addressed, to the publication destination, to the area of research carried out is shown in Figure 5. Overall, 65% of the research studies focused on addressing classification problems, 33% addressed regression problems, while 2% and 1% focused on clustering and tree analysis problems, respectively.
In relation to RQ2, the Computers and Electronics in Agriculture journal was responsible for publishing the largest number of research studies (21%), followed by the Journal of Dairy Science (16%). In addition, 27% of all research studies were published in other journals (Appendix D), whereby each journal was responsible for publishing fewer than four research articles included in this study, while 15% of all publications (20 conference papers) were published in 18 different conference proceedings. Consistent with Section 3.2, and in relation to RQ3, the largest share of studies focused on physiology and health research (32%), followed by animal husbandry (20%), behavior analysis (18%), milk (13%), management (11%) and feeding (6%). No clear trend or bias was found between the types of problems addressed and the publication sources: the most popular destination for both classification and regression problems was the other journals category, followed by the Computers and Electronics in Agriculture journal. Regarding the destination of each publication in relation to the research area, the largest number of research publications published in other journals and in the Computers and Electronics in Agriculture journal focused on physiology and health applications (n = 12 and n = 8, respectively). However, this differed for articles published in the Journal of Dairy Science, where the largest number of research articles focused on animal husbandry applications (n = 9).

Research Area, Features and Algorithms Used
The flow of research studies from a research category to the category of features considered to the category of machine learning algorithms is shown in Figure 6. Overall, 48% of research studies utilised sensor data for model development (RQ4), predominantly for physiology and health (n = 24) and behavior analysis (n = 24) applications. Accelerometer (n = 27), image (n = 7) and pedometer (n = 6) data were the three most frequently employed types of data collected by sensors, as shown in Appendix C. Sensor data were most frequently employed as feature data when developing artificial neural network models (n = 35), tree-based models (n = 32) and other model types (n = 31), whereby other models included the application of support vector machine and k-nearest neighbor algorithms (full list shown in Appendix C). In addition, cow characteristics (34%), milk characteristics (37%), calving information (23%) and lactation information (19%) were also commonly employed as feature data, followed by meteorological data (14%), diet and feeding (10%), farm characteristics (16%), milking parameters (10%), soil characteristics (1%) and other variables (7%). Regarding the algorithms employed (RQ5), tree-based algorithms were employed in the largest number of studies (54%), followed by neural network algorithms (50%), statistical regression-based algorithms (43%), other model types (37%), Bayes algorithms (17%), meta (10%), rule-based (4%) and clustering (1%) algorithms. A full breakdown of the specific algorithms employed within each algorithm category is shown in Appendix C, together with the number of studies in which each algorithm was employed.
The number of research studies published per year from 1999 to 2021, categorised according to each algorithm method, is shown in Figure 7. Prior to 2018, tree-based algorithms were the most frequently employed algorithm category (employed in 25% of all publications), followed by statistical regression-based algorithms (22%). Tree-based algorithms remained the most frequently employed category between 2018 and 2021, when the percentage of publications that employed them increased to 26%. However, the percentage of publications that employed statistical regression algorithms fell to 17%, while the percentage of publications that employed neural network-based algorithms increased to 25% during the 2018 to 2021 period (up from 16% between 1999 and 2017). This equated to a fivefold (5.2), or 420%, increase in the number of publications that employed neural network algorithms since 2018, compared with a threefold (3.3) increase for tree-based algorithms and a 2.5-fold increase for statistical regression algorithms.
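The relationship between the fold changes and the percentage increase quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes each fold change is the ratio of the later publication count to the earlier one; the function name `fold_to_percent_increase` is a hypothetical helper, not something used in the study.

```python
def fold_to_percent_increase(fold: float) -> float:
    """Convert a fold change (new count / old count) into a percentage increase."""
    return (fold - 1.0) * 100.0


# Fold changes reported in the text for publications since 2018.
reported_folds = {
    "neural network": 5.2,
    "tree-based": 3.3,
    "statistical regression": 2.5,
}

for name, fold in reported_folds.items():
    percent = fold_to_percent_increase(fold)
    print(f"{name}: {fold}-fold = {percent:.0f}% increase")
```

Under this assumption, a 5.2-fold change corresponds to the 420% increase stated for neural network algorithms.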

Validation Methods
In relation to RQ6, six evaluation methods were identified across the 127 studies that addressed classification, regression and clustering (n = 1) problems: hold-out cross-validation (n = 49), leave-out-one-animal (LOOA) (n = 4), leave-one-out cross-validation (LOOCV) (n = 3), nested cross-validation (Nested CV) (n = 7), train/validation/test (n = 17) and k-fold cross-validation (k-fold CV) (n = 30), as shown in Table 2. The k-fold cross-validation method was employed with a mean k value of 10, the hold-out method was employed with 71% of data used for training and 29% used for testing, while the train/validation/test method used 65%, 17% and 18% of data for training, validation and testing, respectively. In 21 research studies, these evaluation methods were carried out repeatedly to reduce the probability of biased results associated with a single hold-out, train/validation/test or k-fold CV split. The number of studies that repeated each evaluation method is shown in brackets in Table 2. On average, the hold-out method was repeated 38 times, the train/validation/test method was repeated 10 times and k-fold cross-validation was repeated 14 times. In addition, 16 research studies employed a combination of two evaluation methods to further separate the training and testing stages, which is particularly important when tuning hyper-parameters. For example, 15 studies employed k-fold CV during model training to select features and/or hyper-parameters, and then calculated prediction accuracy on separate test data using hold-out cross-validation. One study employed two different evaluation methods for two different dependent variables.
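The evaluation methods described above can be sketched with the Python standard library alone. The sketch below is illustrative only and is not taken from any of the reviewed studies: the function names, the seed, the sample count of 100 and the animal identifiers are all assumptions, while the 71%/29% split and k = 10 simply reuse the averages reported in the text.

```python
import random
from collections import defaultdict


def hold_out_split(n_samples, train_fraction=0.71, seed=0):
    """Shuffle sample indices once and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(indices, k=10):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def leave_one_animal_out(animal_ids):
    """Yield (animal, train, test) so that all records of one animal form the test set."""
    by_animal = defaultdict(list)
    for idx, animal in enumerate(animal_ids):
        by_animal[animal].append(idx)
    for animal, test in by_animal.items():
        train = [idx for idx, other in enumerate(animal_ids) if other != animal]
        yield animal, train, test


# Combined scheme: hold out a test set first, then run k-fold CV on the
# training part only, so hyper-parameter selection never sees the test data.
train_idx, test_idx = hold_out_split(n_samples=100)
inner_splits = list(k_fold_splits(train_idx, k=10))

# Hypothetical animal identifiers, one per record.
records = ["cow_a", "cow_a", "cow_b", "cow_c", "cow_c"]
for animal, train, test in leave_one_animal_out(records):
    print(animal, "train:", train, "test:", test)
```

Repeating the hold-out procedure, as some studies did, would amount to calling `hold_out_split` with different seeds and averaging the resulting scores. Grouping by animal keeps records from the same cow out of both the training and test sets at once.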
The number of research studies published per year from 1999 to 2021, categorised according to each validation method, is shown in Figure 8. Prior to 2018, the hold-out method was the most frequently employed validation method (employed in 43% of all publications), followed by k-fold cross-validation (30%) and the train/validation/test method (19%). This ordering persisted throughout the 2018 to 2021 period, when the percentage of publications that employed the hold-out method increased slightly to 46%, as did the use of k-fold cross-validation (33%). However, this period also saw a reduction in the percentage of publications that employed the train/validation/test method (10%). The hold-out cross-validation method was the most frequently employed method each year between 2014 and 2020, while the k-fold cross-validation method was the most frequently used method (45%) in the first five months of 2021. In 2019 and 2020, the use of the hold-out method increased by 100% and 19%, year-on-year, respectively, while the use of k-fold cross-validation increased by 80% and 33%, year-on-year, respectively.
Table 2 notes: LOOA = leave-out-one-animal; LOOCV = leave-one-out cross-validation; Nested CV = nested cross-validation; k-fold CV = k-fold cross-validation. a Values along the diagonal refer to the number of studies that used that particular evaluation method. Values not along the diagonal refer to the number of studies that used a combination of evaluation methods corresponding to the value's vertical and horizontal position. b Bracketed values represent the number of studies where that particular evaluation method was carried out repeatedly (i.e., more than once). c One study employed two different evaluation methods for two different dependent variables.

Discussion Overview
This study represents the largest and broadest systematic mapping review to date, focusing on published literature related to the application of machine learning algorithms in the dairy research domain. In total, 129 publications were included and assessed, made possible by a combination of broad search terms and an extended search period spanning over 21 years. However, it is still plausible that additional publications that focused on the application of machine learning algorithms on dairy farms were not captured by the search strategy employed. The search strategy involved five databases chosen to provide wide coverage of dairy-related research while allowing for the bulk downloading of publications. It is likely that some publications located in other databases were not included. Snowballing was carried out to help reduce the number of publications not included. However, the largest barrier to including publications in this study was the availability of a full text from the Scopus database. Full-text unavailability, which was due to restrictions on the publisher's side, accounted for 93% of the total number of excluded publications.
Throughout the 129 publications included in this mapping study, a wide range of dependent variables (n = 66), features (n = 251) and algorithms (n = 90) were employed across 35 journals and 18 conference proceedings. It was, therefore, necessary to categorise dependent variables, features, algorithms, journal articles and conference papers to ensure findings could be easily digested and each research question could be adequately addressed. Categorisation was based on the experience of the authors while considering the categorisation approaches employed in cognate studies. This included the categorisation of: (1) each dependent variable into one of six research categories, (2) each feature into one of 11 feature categories, (3) each algorithm into one of eight algorithm categories and (4) journals that published four or fewer articles included in this study into the other journals category, and all conference papers into a separate Conference Paper category. For full transparency, the full lists of dependent variables, features and algorithms employed and their respective categories, as well as the journal/conference proceedings, are presented in Appendices A-D, respectively.
All neural network-based models, including multilayer perceptron networks, convolutional neural networks and long short-term memory networks, were included in the Neural Network category to minimise the over-categorisation of algorithms. The number of studies that employed each neural network-based algorithm can be found in Appendix C.
The research categories, algorithm categories and validation methods employed per year were assessed between 1999 and 2021 to allow trends in research areas and methodologies to be identified over time. Firstly, regarding the research categories, the largest number of publications prior to 2018 were related to animal husbandry (35%). However, since 2018, the largest number of publications have been related to physiology and health (38%), with the percentage of publications focusing on animal husbandry research falling to 14%. This suggests a trend throughout this research domain, with studies moving away from animal husbandry-related problems to focus on improving the physiology and health of dairy cows. The number of studies that focused on the physiology and health of dairy cows has increased sevenfold since 2018. Concurrently, the smallest number of publications both prior to 2018 (6%) and since 2018 (6%) were related to feeding, suggesting an opportunity for future research to be carried out in this largely unexplored subdomain. Secondly, in relation to the types of algorithms employed, tree-based algorithms were the most frequently employed algorithm category, being used in 25% and 26% of studies prior to 2018 and since 2018, respectively. However, the use of statistical regression-based algorithms fell from 22% to 17% between the two periods, while the use of neural network-based algorithms increased from 16% to 25%. This suggests a move away from statistical regression-based algorithms towards neural network-based algorithms. Lastly, regarding the validation methods employed, hold-out cross-validation was the most frequently employed validation method both prior to 2018 and since 2018, being used in 43% and 46% of studies, respectively. In addition, the use of k-fold cross-validation increased from 30% to 33% between these periods. However, the percentage of studies that used the train/validation/test method fell from 19% to 10%, suggesting a trend away from the train/validation/test method towards hold-out and k-fold cross-validation.
This mapping study was carried out in line with PRISMA guidelines, with six predefined research questions outlined in Section 2.1. The search strategy produced results that adequately addressed each research question. In relation to RQ1, the country responsible for the greatest number of publications was the USA (n = 19); however, when the geographical location of studies was assessed on a continent basis, Europe was by far the most productive region, with 60 publications. Regarding RQ2, the greatest number of publications was published in the Computers and Electronics in Agriculture journal (21%), followed by the Journal of Dairy Science (16%). Additionally, 35 publications (27%) were published across 28 other journals that each published fewer than four papers included in this study, while the 20 conference papers were published in 18 different conference proceedings. RQ3 focused on determining which research areas were being addressed in the dairy research domain using machine learning methodologies; the results showed that the greatest number of studies addressed problems focused on the physiology and health of dairy cows (32%). In relation to RQ4, the most frequently employed feature data throughout the literature were derived from sensors (48%), with 27 studies employing accelerometer data. RQ5 focused on identifying the machine learning algorithms most frequently used throughout the dairy literature. The greatest number of studies employed tree-based algorithms (54%), followed by neural network-based algorithms (50%). Lastly, RQ6 focused on identifying the evaluation metrics and methods employed throughout the dairy literature. Assessing the literature showed that RMSE (56%) and R² (46%) were the most frequently employed metrics for regression problems, while accuracy (77%) and recall (66%) were the most frequently employed metrics for classification problems. In addition, hold-out cross-validation was the most frequently employed evaluation method throughout the literature.
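For readers less familiar with the four metrics named above, the following sketch gives their standard textbook definitions in plain Python. It assumes R² denotes the coefficient of determination (some studies may instead report a squared correlation coefficient), and all data values are made up purely for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that were predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    false_neg = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return true_pos / (true_pos + false_neg)


# Hypothetical regression example (e.g., predicted vs. observed daily yield).
observed = [20.0, 25.0, 30.0]
predicted = [22.0, 25.0, 28.0]
print("RMSE:", rmse(observed, predicted))
print("R2:", r_squared(observed, predicted))

# Hypothetical binary classification example (1 = event, 0 = no event).
labels = [1, 0, 1, 1, 0]
guesses = [1, 0, 0, 1, 1]
print("accuracy:", accuracy(labels, guesses))
print("recall:", recall(labels, guesses))
```

Accuracy and recall can diverge sharply on imbalanced data, which is one reason studies often report both.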

Conclusions
The results show that there has been a considerable increase in published literature applying machine learning algorithms to help solve problems on dairy farms, with 74% of the publications included in this study published since 2018. Europe was responsible for the production of data utilised in 45% of the research studies assessed, highlighting the need for more research studies in other regions, in particular Africa, Oceania and South America. In addition, 32% of the studies included in this review applied machine learning to problems related to the physiology and health of dairy cows, with a sevenfold increase in publications in this area occurring since 2018. Concurrently, this study has also highlighted a reduction in the percentage of studies that used statistical regression algorithms, coupled with an increased percentage of studies that used neural network-based algorithms since 2018, when compared with the 1999 to 2017 period. As machine learning algorithms are more frequently applied to problems in the dairy domain, it is important that best practice guidelines are followed to ensure their potential impact is realised. This mapping study may be used as the basis for future research in the dairy domain to identify studies that focused on a similar problem, for which an identical, similar or improved methodology may be suitable.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The specific dependent variables used per research category are shown in Table A1, with the number of studies that used each dependent variable presented in brackets next to each variable. The total number of dependent variables per category is also presented. Table A1. Specific dependent variables used per research category.

Appendix B
The specific features used per feature category are shown in Table A2, with the number of studies that utilised each feature presented in brackets next to each feature. The total number of features per category is also shown.

Appendix C
The specific algorithms used per algorithm category are shown in Table A3, with the number of studies that employed each algorithm presented in brackets next to the algorithm. The total number of algorithms per category is also shown. Table A3. Specific algorithms used per algorithm category.

Appendix D
The journals that published fewer than four studies included in this mapping study and all conference proceedings are shown in Table A4, with the number of studies published in each journal/conference presented in brackets next to each journal/conference. Table A4. List of journals that published fewer than four studies included in this study and all conference proceedings.

Appendix E
The specific feature data, dependent variables, machine learning algorithms, evaluation metrics and evaluation methods used per research category for each of the 129 publications included in this mapping study are shown in Table A5.