The Development of Smart Dairy Farm System and Its Application in Nutritional Grouping and Mastitis Prediction

Simple Summary
This study combined Internet of Things technology with dairy farm management to set up a smart dairy farm system (SDFS). Data from across the dairy farm are captured by various sensors and transmitted to the SDFS in time for integrated analysis. Nutritional grouping was shown to improve production performance and to reduce methane and carbon dioxide emissions, a topic of concern to both the public and researchers. Information from dairy herd improvement (DHI) analysis was used to predict the incidence of mastitis in dairy cows, which offers a new way to predict mastitis in individual cows. By fully interpreting the hidden value of dairy farm data, the SDFS could support better management of dairy farms and promote the application of intelligent systems in dairy farm production.

Abstract
To study the smart management of dairy farms, this study combined Internet of Things (IoT) technology with daily dairy farm management to form an intelligent dairy farm sensor network and set up a smart dairy farm system (SDFS), which could provide timely guidance for dairy production. To illustrate the concept and benefits of the SDFS, two application scenarios were sampled: (1) Nutritional grouping (NG): grouping cows according to their nutritional requirements by considering parity, days in lactation, dry matter intake (DMI), metabolizable protein (MP), net energy of lactation (NEL), etc. With feed supplied according to nutritional needs, milk production and methane and carbon dioxide emissions were compared with those of the original farm grouping (OG), which was based on lactation stage. (2) Mastitis risk prediction: using the dairy herd improvement (DHI) data of the previous 4 lactation months, logistic regression analysis was applied to predict which cows were at risk of mastitis in the following months so that suitable measures could be taken in advance. The results showed that, compared with OG, NG significantly increased milk production and reduced methane and carbon dioxide emissions of dairy cows (p < 0.05). The predictive value (area under the ROC curve) of the mastitis risk assessment model was 0.773, with an accuracy of 89.91%, a specificity of 70.2%, and a sensitivity of 76.3%. By applying the intelligent dairy farm sensor network and establishing an SDFS, intelligent analysis can make full use of dairy farm data to achieve higher milk production, lower greenhouse gas emissions, and advance prediction of mastitis in dairy cows.


Introduction
With the rapid development of modern science and technology and the continuous improvement of communication technology, the Internet of Things (IoT) has gradually penetrated every aspect of life [1]. It is estimated that about 2.5 billion bytes of data are generated every day, which is beyond the capacity

Data Collection in Dairy Farm
The data collected mainly included the following: (1) individual information: cattle pedigree information, body weight, appearance score, body condition score, etc.; (2) farm information: farm location, stalls, cattle barns, environment, etc.; (3) cattle management: recording routine events in herds, such as heat, insemination, calving, and disease prevention and control events; (4) feed: including diet composition, average delivery and feed residues per day, and dry matter intake (DMI); (5) dairy herd improvement (DHI) parameters: milk yield, milk fat percentage, milk protein percentage, fat/protein ratio, milk fat content, milk protein content, and somatic cell count (SCC) were recorded on monthly test days [21].
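As an illustration of how one monthly DHI test-day record from item (5) might be represented, the following is a minimal sketch in Python. The class name, field names, and units are hypothetical and are not taken from the SDFS itself; the fat/protein ratio is simply derived from the two percentages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DhiTestDay:
    """One monthly DHI test-day record (hypothetical field names and units)."""
    cow_id: str
    parity: int
    days_in_milk: int
    milk_yield_kg: float
    fat_pct: float
    protein_pct: float
    scc_per_ml: Optional[float] = None  # None when SCC was not recorded

    @property
    def fat_protein_ratio(self) -> float:
        # Ratio of milk fat percentage to milk protein percentage.
        return self.fat_pct / self.protein_pct


def drop_missing_scc(records: List[DhiTestDay]) -> List[DhiTestDay]:
    """Keep only the records that carry an SCC value."""
    return [r for r in records if r.scc_per_ml is not None]
```

Keeping SCC optional makes it straightforward to discard test days without a somatic cell count before any analysis that depends on it.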

Nutrition Grouping
Based on the individual cow information, DMI, and DHI data collected by the SDFS, Nutrient Dynamic System Professional software (NDS, developed by Rum&n srl), which adopts the Cornell Net Carbohydrate and Protein System (CNCPS 6.55) [22], was used to calculate metabolizable protein (MP) and metabolizable energy (ME) as two of the reference parameters for nutritional grouping. Two hundred seventy lactating cows were divided into 9 pens by cluster analysis as the nutritional group (NG). The control cows were assigned to 9 pens of 30 cows each according to the original grouping (OG; divided according to milk production of cows) method of the farm, in which cows were grouped according to lactation stage. The corresponding diets calculated by NDS were provided for each pen (Tables 1 and 2). Because the monitoring of methane and carbon dioxide is strongly influenced by the environment, space, and air flow, the stability of the data detected by the sensor needs further improvement. Therefore, the methane and carbon dioxide data in this study mainly used the values predicted by the CNCPS system, which guaranteed stability to a certain extent [23]. N intake was calculated and CH4 and CO2 emissions were estimated by NDS [24].
Diets 1-3 are for the early, middle, and late stages of first lactation, respectively; diets 4-6 are for the early, middle, and late stages of second lactation, respectively; diets 7-9 are for the early, middle, and late stages of third and later lactations, respectively.
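The cluster-analysis step can be illustrated with a small sketch. The study itself used the hclust function in R; the Python below is only a stand-in that assumes standardized features, Euclidean distance, and complete linkage, none of which are stated in the text. The function names and the six example cows (milk yield, DIM, MP, ME) are invented for illustration.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature so that no single unit dominates the distance."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def complete_linkage_clusters(rows, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    points = zscore_columns(rows)
    clusters = [[i] for i in range(len(points))]

    def distance(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Invented example: (milk kg/d, DIM, MP g/d, ME Mcal/d) for six cows.
example_rows = [
    (40, 30, 2800, 70), (42, 35, 2900, 72), (41, 28, 2850, 71),
    (22, 250, 1800, 50), (20, 260, 1750, 48), (21, 240, 1780, 49),
]
example_pens = complete_linkage_clusters(example_rows, n_clusters=2)
```

Hierarchical clustering does not by itself produce pens of equal size, so a practical grouping would still need to reconcile the clusters with pen capacity.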

Mastitis Prediction
The mastitis prediction was carried out using the original DHI records of the dairy farm from 1 January 2019 to 31 December 2021. In general, SCC greater than 200,000/mL was used as the criterion for subclinical mastitis in dairy cows [25]. Records without SCC were deleted from the original DHI records. Because the SCC values had a skewed distribution and heterogeneous variance, SCC was first transformed into somatic cell score (SCS), which was close to a normal distribution, before the subsequent analysis [26]. The conversion formula was as follows:

$$\mathrm{SCS} = \log_2\left(\frac{\mathrm{SCC}}{100{,}000}\right) + 3$$

where SCC is expressed in cells/mL. To improve the accuracy of model predictions, SCS values ranging from 0 to 10 were used [27]. According to the SCC values, cows were divided into a healthy group and a mastitis risk group (Table 3). In the DHI data, SCC ≤ 200,000/mL indicates a healthy condition, 200,000/mL < SCC ≤ 500,000/mL indicates subclinical mastitis, and SCC > 500,000/mL indicates clinical mastitis [28]. In this paper, both subclinical and clinical mastitis were assigned to the mastitis risk group. Altogether, 2555 DHI records were used for regression analysis. The following independent variables were chosen: parity, days in milk (DIM), and milk indicators of the previous four lactation months (milk yield, milk protein percentage, milk fat percentage, lactose percentage, fat/protein ratio). A training set (70%) and a validation set (30%) were created from the collected data. The training set was used to filter the independent variables (equation 1) by bidirectional elimination stepwise regression. The parameters with statistically significant effects on the prediction of mastitis in dairy cows were obtained (Table 4), and the corresponding coefficients were substituted into the logistic regression equation for further analysis using the validation set:

$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{26} X_{26}$$

where $P$ represents the probability of a positive result and $1 - P$ the probability of a non-positive result. $P/(1-P)$ is the odds of disease, from which the odds ratio (OR) is derived as a statistical indicator of the effect of each independent variable on disease. $X_1$ is parity; $X_2$-$X_5$ represent milk yield in lactation months 1-4; $X_6$-$X_9$ represent milk fat percentage in lactation months 1-4; $X_{10}$-$X_{13}$ represent milk protein percentage in lactation months 1-4; $X_{14}$-$X_{17}$ represent lactose percentage in lactation months 1-4; $X_{18}$-$X_{21}$ represent the fat-to-protein ratio in lactation months 1-4; $X_{22}$-$X_{26}$ represent the calendar (natural) months of lactation months 1-5; $\beta_0$ is a constant, and $\beta_1$-$\beta_{26}$ are the regression coefficients of the variables $X_1$-$X_{26}$.

The receiver operating characteristic (ROC) curve was plotted to reflect the prediction accuracy of the predictive variables in the dairy cow mastitis risk assessment model. The X-axis is specificity (the percentage of healthy cows that tested negative) and the Y-axis is sensitivity (the percentage of at-risk cows that correctly tested positive). The area under the curve (AUC) was calculated. An AUC between 0.50 and 0.70 indicates that the model's effect is average, an AUC between 0.70 and 0.90 indicates a good model, and an AUC higher than 0.90 indicates an excellent model [29,30].
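The SCC handling and the logistic model described in this section can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the commonly used conversion SCS = log2(SCC / 100,000) + 3 with SCC in cells/mL, the function names are hypothetical, and the coefficients passed to `mastitis_probability` are placeholders rather than the fitted values of Table 4.

```python
import math


def somatic_cell_score(scc_per_ml):
    """Convert SCC (cells/mL) to SCS using the assumed standard conversion."""
    return math.log2(scc_per_ml / 100_000) + 3


def scc_status(scc_per_ml):
    """Classify a test-day SCC using the thresholds given in the text."""
    if scc_per_ml <= 200_000:
        return "healthy"
    if scc_per_ml <= 500_000:
        return "subclinical"
    return "clinical"


def in_risk_group(scc_per_ml):
    """Subclinical and clinical cases together form the mastitis risk group."""
    return scc_status(scc_per_ml) != "healthy"


def mastitis_probability(intercept, coefficients, values):
    """Invert the logit: P = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-logit))
```

Under this conversion an SCC of 200,000 cells/mL corresponds to an SCS of 4, so the healthy/risk threshold can be expressed on either scale.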

Statistical Analysis
First, the data were preliminarily sorted using Excel and SPSS 21.0. Cluster analysis of the nutritional parameters (milk production, parity, DIM, MP, ME, etc.) was performed using the hclust function in the stats package of R x64 (version 4.0.5).
For mastitis prediction, the DHI data were analyzed using ANOVA in SPSS 21.0. Logistic regression analysis was performed using the generalized linear model (glm) function, and the ROC curve was drawn using the pROC package in R.
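For readers who want to see what the AUC measures, the following is a minimal sketch in Python rather than R. It is not the pROC implementation: it uses the rank-based definition of AUC, namely the probability that a randomly chosen at-risk cow receives a higher predicted score than a randomly chosen healthy cow, with ties counted as one half. The function names and the handling of the exact boundaries 0.70 and 0.90 are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def auc_rating(auc):
    """Map an AUC onto the bands quoted in the text (boundaries assumed)."""
    if auc > 0.90:
        return "excellent"
    if auc >= 0.70:
        return "good"
    if auc >= 0.50:
        return "average"
    return "below chance"
```

The pairwise loop is quadratic in the number of records, which is acceptable for a few thousand DHI records but would be replaced by a sort-based calculation for larger data sets.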

Nutrient Grouping
As shown in Table 5, compared with OG, the milk production of NG increased significantly (p < 0.05), except in the mid-lactation group of the second parity, where the difference was not significant. In general, NG significantly improved the milk yield of dairy cows at the same stage. As shown in Table 6, the N intake of dairy cows in NG was higher than that in OG, and N production and N efficiency in NG were significantly higher than those in OG (p < 0.01). Compared with OG, overall N efficiency in NG increased by 1.98%. The greenhouse gas (GHG) emissions are shown in Table 7: CH4 and CO2 emissions of dairy cows in NG were lower than those in OG.
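The text does not state how N efficiency was computed; NDS reports it directly. As a hedged illustration only, the sketch below uses a common definition, milk N output divided by N intake, and assumes the conventional factor of 6.38 g of milk protein per g of N. The function names and the example numbers are hypothetical and are not taken from Table 6.

```python
MILK_PROTEIN_PER_G_N = 6.38  # assumed conversion factor for milk protein


def milk_n_g_per_day(milk_kg_per_day, milk_protein_pct):
    """Milk N output (g/d) from milk yield and milk protein percentage."""
    protein_g = milk_kg_per_day * 1000 * milk_protein_pct / 100
    return protein_g / MILK_PROTEIN_PER_G_N


def n_efficiency_pct(milk_kg_per_day, milk_protein_pct, n_intake_g_per_day):
    """N efficiency (%) = milk N output / N intake * 100 (assumed definition)."""
    if n_intake_g_per_day <= 0:
        raise ValueError("N intake must be positive")
    milk_n = milk_n_g_per_day(milk_kg_per_day, milk_protein_pct)
    return 100 * milk_n / n_intake_g_per_day
```

Under this definition, efficiency can rise even when N intake increases, provided milk N output rises proportionally more.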

Mastitis Prediction
The risk factors related to mastitis in the experimental dairy farm are shown in Table 8. The Hosmer-Lemeshow test indicated that the model fitted the data well (p > 0.05). The results showed that milk yield in the second lactation month (p < 0.05), fat percentage in the first and third lactation months (p < 0.05), and the calendar (natural) month of the fifth lactation month (p < 0.05) had significant effects on mastitis risk in dairy cows. The predictive value (AUC) of the mastitis risk assessment model was 0.773. The accuracy was 89.9%, the specificity was 70.2%, and the sensitivity was 76.3% (Figure 2).
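Accuracy, specificity, and sensitivity all follow from a confusion matrix once a probability threshold is chosen. The sketch below shows the calculation; the threshold of 0.5 and the function names are assumptions, since the text does not state which cut-off was applied to the validation set.

```python
def confusion_counts(labels, probabilities, threshold=0.5):
    """Count true/false positives and negatives at the given threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probabilities):
        predicted_risk = p >= threshold
        if y == 1 and predicted_risk:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_risk:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity; both classes must be present."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # at-risk cows correctly flagged
        "specificity": tn / (tn + fp),  # healthy cows correctly cleared
    }
```

When all three figures are computed on the same records at the same threshold, accuracy is a prevalence-weighted average of sensitivity and specificity, so it helps to know which threshold and data set each reported value refers to.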

Discussion
Recently, the utilization of 5G and IoT technology has become a major trend in the development of animal husbandry [31]. This study formed a large sensing network by applying the concept of precision animal husbandry and installing various types of equipment on dairy farms. Data are collected through various sensors, and IoT facilitates the transmission of data from the network to the SDFS for data analysis, forming the basis for the transformation of intelligent dairy farms into smart management.

Nutrition Grouping
Currently, most dairy farms consider only the lactation stage when grouping cows to determine feed rations, which makes diet formulation less accurate [32]. Studies have shown that grouping dairy cows according to their actual nutritional needs can increase dietary N utilization [33] and milk production and thereby reduce GHG emissions [34].
In this study, for dairy cows in NG, milk production significantly increased (p < 0.05). The dairy cows with different needs were fed different TMR formulations, which greatly improved the N efficiency and increased milk production. This is in agreement with the results of Cabrera et al. [35]. Nutritional grouping of dairy cows could improve the nutritional accuracy of diet, prevent nutrient loss, reduce dietary costs, and increase milk production.
Nutritional grouping offered better theoretical nutritional accuracy, thereby reducing nutrient loss due to dietary and environmental influences [36]. GHG emissions (CH4, N2O, CO2) are currently a research hotspot worldwide [37,38]. In this study, dairy cows in NG had lower CH4 and CO2 emissions than those in OG, and N efficiency increased by 1.98% on average. Kalantari et al. [39] reported an increase in N efficiency of 2.7% when cows were divided into multiple nutritional groups and fed an average MP lower than requirements. The methane emission also decreased. The reason for this discrepancy could be that those authors fully considered the cow's body weight (BW), BCS (in the range of 2.0-4.5 to ensure the accuracy of the model), and NEL in their study, whereas these factors were not considered in this study. Therefore, the nutritional grouping of lactating dairy cows can take full account of the physical condition and nutritional needs of the cows and provide different diets for different groups, to better meet those needs and lower GHG emissions [40].
Animals 2023, 13, 804

Smart Prediction of Mastitis
As dairy cows produce ever more milk, mastitis has become one of the most important diseases restricting the development of the global dairy industry [41]. It is estimated that economic losses due to mastitis in dairy cows account for 38% of the total direct cost of common production diseases on dairy farms [42]. Risk assessment and timely prevention of mastitis are essential to ensure the health of cows and the consistent improvement of raw milk quality [43].
Recent studies on mastitis prediction have focused on factors such as year, month, and farm [44,45]. Individual information about cows has not been taken into account, making the prediction less practical. This study screened the factors that could predict the mastitis risk in the fifth lactation month based on the previous four lactation months using DHI data of individual cows. The milk yield in the second lactation month and fat percentage in the first and third lactation months were influential risk factors for mastitis and could accurately predict the risk of mastitis in the fifth lactation month. In this way, the incidence of mastitis in the dairy farm could be predicted two months in advance. This is expected to have a certain guiding significance for dairy farms.
The ROC curve is a comprehensive representation of a model's accuracy. Sensitivity and specificity are indispensable indicators of validity [46]. The area under the curve (AUC) represents the diagnostic value of the model: the larger the area, the better the diagnostic performance [47]. The predictive specificity of this study (70.2%) was similar to that reported by Cavero (74.9%), who predicted mastitis using neural networks and automated milking system data from 478 cows [18]. Sun et al. [48] reported 87% true positives using artificial neural networks, which was similar to this study. The predictive value (AUC) of this study was lower than the 0.93 reported by Jadhav et al. [49] based on a 214-cow dataset. This may be because those authors used more variables, such as mammary and bedding hygiene and milking methods. In practice, bedding hygiene status, teat shape, and udder hygiene also affect the occurrence of mastitis in dairy cows, interfering with the accuracy of mastitis risk assessment for individual cows [50].
However, the prediction process can continue to develop. With the continuous accumulation and aggregation of more data and real-time data streams, the accuracy of model prediction could improve over time. Logistic regression and the ROC curve make it possible to identify the risk indicators of dairy cow mastitis. Importantly, this equation does not apply to all cases; our team is continuing this work in the hope of finding a general equation that can be updated automatically by continuously merging past data, so that the incidence of disease in cows can be detected in time.

Conclusions
This study combined IoT technology with dairy farm management to set up an SDFS. Data from across the dairy farm are captured by various sensors and transmitted to the SDFS in time for integrated analysis. The applications of the SDFS were demonstrated in two aspects. NG according to the nutritional needs of dairy cows could improve nutritional accuracy, leading to increased milk production and N efficiency and reduced CH4 and CO2 emissions, and thus mitigating environmental impacts. A mastitis prediction model was established using DHI data to identify cows at risk of mastitis in advance, thus reducing economic losses. By fully interpreting the hidden value of dairy farm data, the SDFS could support better management of dairy farms and promote the application of intelligent systems in dairy farm production.
At present, our data mining of dairy farms is not comprehensive enough; we still need to continue to work hard.

Informed Consent Statement:
Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available.

Conflicts of Interest:
The authors declare no conflict of interest.