Neural Network-Aided Milk Somatic Cell Count Increase Prediction

Nagy, Sára Ágnes; Csabai, István; Varga, Tamás; Póth-Szebenyi, Bettina; Gábor, György; Solymosi, Norbert

doi:10.3390/vetsci12050420

by Sára Ágnes Nagy ¹, István Csabai ¹, Tamás Varga ², Bettina Póth-Szebenyi ³, György Gábor ² and Norbert Solymosi ¹,²,*

¹ Department of Physics of Complex Systems, Eötvös Loránd University, 1117 Budapest, Hungary

² Centre for Bioinformatics, University of Veterinary Medicine, 1078 Budapest, Hungary

³ Doctoral School of Animal Science, Hungarian University of Agriculture and Life Sciences, 7400 Kaposvár, Hungary

* Author to whom correspondence should be addressed.

Vet. Sci. 2025, 12(5), 420; https://doi.org/10.3390/vetsci12050420

Submission received: 11 February 2025 / Revised: 10 April 2025 / Accepted: 28 April 2025 / Published: 29 April 2025

(This article belongs to the Special Issue Ruminant Mastitis: Therapies and Control)

Simple Summary

Mastitis causes the biggest economic loss in the dairy industry worldwide. Its subclinical form, in the absence of visible symptoms, is difficult to diagnose under farm conditions without laboratory assistance. It is therefore particularly important to develop methods that reduce the incidence of subclinical mastitis in the herd. Somatic cell count is the most commonly used measure for monitoring subclinical mastitis. In our work, we used an artificial neural network (ANN) to predict increased somatic cell count from data recorded by automatic milking machines, thus helping to detect subclinical mastitis at herd level. For this purpose, we used milk yield and milk-related data that the milking machines of an average Hungarian dairy farm can make available to the owner. Our best ANN model has a sensitivity of 0.54 and a specificity of 0.77, which is comparable to the performance reported for the California Mastitis Test, the cow-side diagnostic method currently in wide use. Combined with the latter, the positive predictive value can be increased by up to 50%. This method may be a time-saving and cost-effective way to diagnose subclinical mastitis.

Abstract

Subclinical mastitis (SM) is the most economically damaging yet often visually undetectable disease of dairy cows. Early detection and treatment can reduce the loss caused by the disease; thus, the continuous improvement of SM diagnostic methods is necessary. Although milk’s somatic cell count (SCC) is commonly measured for diagnostic purposes, its direct determination is not widely used in everyday practice. The primary objective of our work was to investigate whether the predictive value of SM diagnostics can be improved by training artificial neural networks (ANNs) on data generated using typical conventional milking systems. The best ANN classifier had a sensitivity of 0.54 and a specificity of 0.77, which is comparable to the performance of various California Mastitis Tests (CMT) reported in the literature. Combining the two diagnostic tests, ANN and CMT, we concluded that the positive predictive value could be up to 50% higher than that of CMT alone. While implementing CMT is a labor-intensive process at herd level, in milking machines that automatically measure milk properties or milk yield data similar to those used in our work, SCC-increase predictions for all individuals could be obtained on a daily basis.

Keywords:

dairy cow; subclinical mastitis; somatic cell count; electrical conductivity; machine learning; artificial neural network

1. Introduction

Mastitis, or intramammary infection (IMI), is the most common infectious disease in the dairy cattle sector [1,2]. Its economic damage in the European Union is estimated at EUR 3 billion [3]. The financial impact comes, on the one hand, from reduced milk production [1,2,4], which can amount to 0.3–1.9 kg of milk per cow per day [4]. On the other hand, the additional costs of early culling, the loss of revenue due to reduced milk quality, and increased veterinary and pharmaceutical costs also reduce profitability [3]. Costs rise further when subclinical mastitis (SM) becomes clinical [5], which leads to higher expenses [6]. In addition to the financial implications, increased antibiotic use due to subclinical mastitis also raises the problem of the spread of antibiotic resistance. Some 60–70% of the antibiotics used on dairy farms are for the treatment of mastitis [2]. This opens up the possibility that more efficient SM detection could lead to more efficient antibiotic use and even prevent the development of clinical mastitis [6]. Given that mastitis treatment accounts for a significant part of the dairy sector’s antimicrobial consumption, a good SM diagnostic system could greatly help farmers meet the EU’s 20% antibiotic reduction target by 2030 [7].

Considering the significant economic, animal welfare, and One Health issues caused by mastitis [1,8], early detection and treatment of the affected animals is of utmost importance. Although SM occurs 15–40 times more often than clinical mastitis, it does not cause visible symptoms in the udder or milk, only a decrease in milk production and quality [2]. As no clinical signs exist, additional tests are required for SM diagnosis. Even though bacterial testing and PCR are currently the best methods to diagnose IMI, they are expensive and time-consuming [9].

As an increase in the number of cells in milk is an indicator of an inflammatory response, the most commonly used method to detect IMI and assess udder health is to determine the total number of somatic cells in milk [8,9,10]. A somatic cell count (SCC) <200,000 cells/mL is considered healthy in an individual cow [2]. However, Ruegg and Pantoja [1] suggest that milk loss is seen as early as an SCC of 100,000 cells/mL. At the same time, an SCC of 400,000 cells/mL is clearly considered an IMI. The use of SCC is limited because its results are affected not only by the presence of mastitis but also by other factors (e.g., age, season, diurnal variation, number of quarters with IMI) [1]. Furthermore, the frequent use of SCC determination as a diagnostic method is hampered by the fact that it requires trained experts and relatively high expenditure [10]. The most common laboratory cell counting method is based on staining cells in milk with a fluorescent dye and counting the fluorescent particles by machine [11]. There is also a portable on-farm device that can determine SCC. It also uses a fluoro-optical technique for cell counting but does not require professional laboratory equipment [11]. SCC measurement integrated into a milking robot is also possible. In the system described by Lusis et al., the milking robot deduces the SCC from the change in viscosity caused by adding a reagent to the milk and its reaction with the somatic cells [12]. Currently, only a few conventional milking systems can continuously estimate SCC [8]. In contrast, conventional milking systems that measure the electrical conductivity of composite milk samples are widespread [8]. Although the change in the electrical conductivity of mastitic milk is a long-established phenomenon [13,14], its use for IMI detection is less common [8,10].
Furthermore, readily available indirect tests include the California Mastitis Test (CMT) [15], which provides approximate information on the quantity of somatic cells in milk through the agglutination of immune cells’ DNA in milk [16,17]. While CMT has the advantage of being simple to perform on a single cow, its herd-level use is impractical. Moreover, CMT’s sensitivity and specificity are poor [16], and its evaluation is subjective [10].

The above-mentioned facts outline the need for a reliable, automatable, rapid SM detection method to reduce antimicrobial use and improve animal welfare parameters, the chance of recovery from mastitis, and economic indicators [8,10]. Several studies suggest that the combination of different indirect detection methods could yield better results in this field [9,16].

Machine learning models for predicting udder health are not unprecedented [6,18]. One reason is that they are particularly well suited to large and complex biological datasets [19]. Among machine learning models, artificial neural networks (ANNs) are the most widely used in agricultural research and also the most successful [20].

In our work, we investigated how the combination of automatically collected data related to lactation and milk characteristics, accompanied by CMT parameters, could improve the prediction of SCC increase. To this end, ANNs were trained to classify SCC increase using available, automatically recorded parameters. Subsequently, the resulting ANN-based test was combined with CMT for better performance.

2. Materials and Methods

2.1. Data Collection

Milk production data were collected from a large-scale Holstein Friesian herd in Hungary. The average milk yield is about 35 L per cow in milk per day. The herd of 850 milking cows is milked 3 times a day in a rotary milking parlour with 50 stalls. The milking machines collect data about milk and milk yield at cow level (e.g., blood traces in milk, conductivity, kick-off and air entry into liners). These data are detected by the sensors in the milking parlour and automatically registered by the software running the parlour (DeLaval ALPRO™, Stockholm, Sweden). The following data were collected: AverageFlow (average milk yield during milking, measured in kg/min), AvgBloodLevel (average amount of blood in the milk measured in ppm during milking), AvgCondLevel (average electrical conductivity of the milk during milking in mS), CowNo (the identification number of the cow), Duration (the length of the milking), GroupNumber (the identification of the group where the cow is found on the day of milking), HerdNo (the identification number of the herd), KickOff (sudden stop of milk flow during milking), MilkDateTime (the precise date and time of the milking), MPC (the identification number of the milking stall where the animal was milked), PeakBloodLevel (maximum amount of blood in the milk measured in ppm during milking), PeakCondLevel (maximum electrical conductivity of the milk during milking in mS), PeakFlow (maximum milk yield during milking, measured in kg/min), RelativeCond (expressing the change in electrical conductivity of the milk), Session (the milking shift: morning, midday or evening), SessionDateTime (date and exact time of the milking shift), Yield (amount of yielded milk in kg), YieldIsLow (binary indicator of a decrease in production compared to the individual’s own production) and YieldPercentage (milk yield relative to the individual’s own production).

On the farm, SCC measurements are taken once a month during a morning milking on all lactating animals that are not on antibiotic treatment. The SCC data are managed by a separate farm management software (Riska Farm Management System, Systo Ltd., Páty, Hungary), which also stores the complete history of each animal under a unique identifier. It holds information covering the animal’s whole life: pedigree, body-weight measurements, movements between groups, reproductive performance, medications, diagnostic test results, hoof trimmings, milk production data and removal from the herd. From this database, we used only the data related to the monthly SCC measurement: the date of the measurement, the measured SCC, days in milk when the measurement was taken, the number of the current lactation, and the identification number of the cow.

2.2. Data Preprocessing

The two datasets were linked via the unique identifier of the cows, and for each individual we retained the morning-milking records from up to 3 days before the collection of SCC data. SCC values were then used to create a binomial dependent variable, with a value of 1 if SCC was above 200,000 cells/mL and 0 if it was below 200,000 cells/mL. Using the functions of the caret package [21] in the R environment [22], we filtered out correlated explanatory variables and estimated the variable importance using a binomial generalized linear model. For neural network training, we kept explanatory variables with variable importance values greater than 3.
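The linking and labelling step can be illustrated with a minimal Python sketch. It is not the authors' R code: the record layout (dictionaries keyed by the column names CowNo, Session and MilkDateTime), the string "morning" for the session, the example values, and the choice to count days 1 to 3 before the SCC test (excluding the test day itself and treating exactly 200,000 cells/mL as not increased) are all assumptions made for illustration.

```python
from datetime import date, datetime

SCC_CUTOFF = 200_000  # cells/mL, the cut-off stated in the text


def scc_label(scc_cells_per_ml):
    """Return 1 for increased SCC (above the cut-off), otherwise 0.

    Assumption: a value of exactly 200,000 cells/mL is labelled 0.
    """
    return 1 if scc_cells_per_ml > SCC_CUTOFF else 0


def milkings_before_test(milkings, cow_no, scc_date, max_days=3):
    """Select one cow's morning milkings from 1..max_days days before scc_date."""
    selected = []
    for m in milkings:
        days_before = (scc_date - m["MilkDateTime"].date()).days
        if (m["CowNo"] == cow_no
                and m["Session"] == "morning"      # assumed session coding
                and 1 <= days_before <= max_days):  # assumed window definition
            selected.append(m)
    return sorted(selected, key=lambda m: m["MilkDateTime"])


# Made-up example records (not data from the study).
milkings = [
    {"CowNo": 101, "Session": "morning", "MilkDateTime": datetime(2023, 5, 6, 5, 30)},
    {"CowNo": 101, "Session": "morning", "MilkDateTime": datetime(2023, 5, 7, 5, 30)},
    {"CowNo": 101, "Session": "morning", "MilkDateTime": datetime(2023, 5, 8, 5, 30)},
    {"CowNo": 101, "Session": "morning", "MilkDateTime": datetime(2023, 5, 9, 5, 30)},
    {"CowNo": 101, "Session": "evening", "MilkDateTime": datetime(2023, 5, 9, 19, 0)},
    {"CowNo": 101, "Session": "morning", "MilkDateTime": datetime(2023, 5, 10, 5, 30)},
    {"CowNo": 102, "Session": "morning", "MilkDateTime": datetime(2023, 5, 9, 5, 40)},
]

window = milkings_before_test(milkings, cow_no=101, scc_date=date(2023, 5, 10))
label = scc_label(250_000)
```

In this sketch, `window` holds the three morning milkings of cow 101 on the three days preceding the SCC test, and `label` is the binary target attached to them.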

In practice, an increase in SCC above 200,000 cells/mL is generally evaluated as a sign of SM. Therefore, we used the SM prevalence values reported in the literature to estimate the pre-test probability of increased SCC. Prevalence data from publications published after 2000 were included in our analyses, with a case definition of subclinical mastitis based on a cut-off of 200,000 cells/mL. The prevalence values from different countries range widely (Australia: 28.9% [23]; Brazil: 45.4–49.6% [24]; Finland: 19.0–22.3% [25]; Indonesia: 68.2% [26]). Accordingly, predictive values were estimated for a pre-test probability range of 0.19–0.68.

The classification bias of CMT shows considerable variability in the literature. Sensitivity (SE) and specificity (SP) values were selected from publications published after 2000 that met the following criteria: SE and SP values were reported for the whole udder and not only for quarters; the cut-off value of 200,000 cells/mL was used in the case definition of SM; and SE and SP values were reported for all intramammary infections, not just for minor or major infections. The following value pairs were included in the study: SE: 0.69, SP: 0.72 [27]; SE: 0.70, SP: 0.48 [28]; SE: 0.95, SP: 0.78 [29]; SE: 0.71, SP: 0.57 [16]; SE: 0.95, SP: 0.81 [30].

2.3. Evaluation Metrics

In our study, as in many similar works, SCC above 200,000 cells/mL was considered IMI even in the absence of clinical symptoms [31]. The quality of detection was assessed by the predictive efficiency of CMT, of the ANN trained by us, or of a combination of these. The two measures we used for the evaluation are the negative predictive value (NPV = (1 − P) × SP/(P × (1 − SE) + (1 − P) × SP)) and the positive predictive value (PPV = P × SE/(P × SE + (1 − P) × (1 − SP))) [32]. In the formulas, P, SE, and SP are the pre-test probability, sensitivity and specificity of the test used, respectively.
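As a small illustration, the two predictive values can be written as Python functions. The sketch uses the standard definitions, in which NPV has (1 − P) × SP in its numerator; the function names and the handful of prevalence values printed are illustrative choices, while the SE and SP values are those reported for the best ANN.

```python
def ppv(p, se, sp):
    """Positive predictive value for pre-test probability p."""
    return p * se / (p * se + (1 - p) * (1 - sp))


def npv(p, se, sp):
    """Negative predictive value for pre-test probability p."""
    return (1 - p) * sp / (p * (1 - se) + (1 - p) * sp)


SE_ANN, SP_ANN = 0.54, 0.77  # best ANN, as reported in the Results

# A few pre-test probabilities inside the 0.19-0.68 range used in the paper.
for p in (0.19, 0.30, 0.50, 0.68):
    print(f"P={p:.2f}  PPV={ppv(p, SE_ANN, SP_ANN):.3f}  NPV={npv(p, SE_ANN, SP_ANN):.3f}")
```

At P = 0.5 the prevalence terms cancel, so PPV reduces to SE/(SE + 1 − SP) and NPV to SP/(SP + 1 − SE), which is a convenient sanity check for any implementation.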

2.4. Model Training

Artificial neural networks (Figure 1) represent a distinct category of machine learning with a structural similarity to biological neural networks [33]. The first ANNs were developed and applied based on the information processing and transmission model of neurons [34]. The basic building elements of ANNs are artificial neurons [34]. An artificial neuron is a mathematical function that maps (converts) inputs into outputs in a defined way [34]. Starting from this simple unit, neural networks consist of a large number of neurons (Figure 1) arranged in at least three, but usually more, layers [35]. The outputs of the neurons in one layer become the inputs of the next layer [36]. Just as a stimulus travels through biological neurons, the initial input moves from artificial neuron to neuron along the structure of the ANN [19,36]. Each connection between neurons represents a learnable weight [37]. These weights are modified by the model during training. The learning process is, in effect, the optimization of the weights for a given task [36]. After training on a large amount of data, the aim of the model is to generate meaningful output value(s) from a previously unseen input set [19].

In our work, the process can be translated as using data on milk and the cow producing it as the input information. Using this, we trained the neural network (i.e., optimized its weights) to obtain a prediction of the udder health status of the cow in question, whether or not she is affected by subclinical mastitis.
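To make the description of neurons and layers concrete, the following pure-Python sketch passes one input vector through a single hidden layer with ReLU activation and a sigmoid output neuron, the two activation functions named for the models in this study. The weights, biases and two-feature input are arbitrary illustrative numbers, not values from the trained network.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias, activation):
    """One artificial neuron: weighted sum of the inputs plus bias, then activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def forward(inputs, hidden_layer, output_neuron):
    """Forward pass: the outputs of the hidden layer become the inputs of the output neuron."""
    hidden = [neuron(inputs, w, b, relu) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron(hidden, w_out, b_out, sigmoid)


# Arbitrary example weights (hypothetical; training would optimise these).
hidden_layer = [([0.5, -0.2], 0.1), ([-0.3, 0.8], 0.0)]
output_neuron = ([1.0, -1.0], 0.0)

prob = forward([1.0, 2.0], hidden_layer, output_neuron)
print(f"predicted probability of increased SCC: {prob:.3f}")
```

The sigmoid output lies between 0 and 1 and can be read as the predicted probability of the positive class; thresholding it gives the binary increased/not increased classification.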

The dataset was split into two parts in a 70/30% ratio. The smaller part served as a test set. The larger part was further split into a training and a validation set, again in a 70/30% ratio. The number of hidden layers varied from 1 to 4, and the number of neurons per layer varied between 64 and 512 in increments of 64. ANNs were created using all possible combinations. The hidden layers of the models used the ReLU activation function, while the output layer used the sigmoid activation function, as our output was binomial (increased/not increased SCC).
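The nested split and the architecture grid can be sketched as follows. Several details are assumptions, since the text does not specify them: the split here is a plain random shuffle without stratification, the seed is arbitrary, and "all possible combinations" is read as every per-layer choice of width (if instead one width were shared by all layers of a network, the grid would contain only 4 × 8 = 32 architectures).

```python
import itertools
import random


def split_indices(n, test_frac=0.30, val_frac=0.30, seed=0):
    """Shuffle n record indices, hold out a test set, then split the rest into train/validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # arbitrary seed, for reproducibility only
    n_test = int(n * test_frac)
    test, rest = idx[:n_test], idx[n_test:]
    n_val = int(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


def architectures(max_layers=4, widths=range(64, 513, 64)):
    """Yield tuples of hidden-layer widths, e.g. (384,) or (128, 256)."""
    for depth in range(1, max_layers + 1):
        yield from itertools.product(widths, repeat=depth)


train, val, test = split_indices(7685)  # 7685 records, as reported in the Results
archs = list(architectures())
print(len(train), len(val), len(test), len(archs))
```

Under this reading, roughly 49% of the records are used for training, 21% for validation and 30% for testing, and the grid holds 8 + 8² + 8³ + 8⁴ = 4680 candidate architectures.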

To find the best model, these ANNs were trained, and their classification performance was evaluated. During training (50 epochs), we maximized the sensitivity while reducing the loss obtained on the validation dataset, with a specificity of 0.9 set in the callback. After each epoch, if the sensitivity exceeded the previous maximum, the weights of the network were saved. ANNs were trained using TensorFlow [38] on a Tesla V100 32GB GPU. Finally, using the saved weights, we performed the classification on the test set on an NVIDIA GeForce P8 2GB GPU and used the model with the best F1 score (2TP/(2TP + FP + FN)) in the subsequent analyses.
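The selection metrics can be computed from the confusion counts as in the sketch below. The function name and the ten example labels are made up for illustration; only the F1 formula 2TP/(2TP + FP + FN) and the standard definitions of sensitivity and specificity are taken as given.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and F1 score from binary labels (1 = increased SCC).

    The sketch assumes both classes occur in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


# Made-up labels and predictions for ten records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because F1 ignores true negatives, it rewards models that find positive cases without raising too many false alarms, which fits a setting where positives are the minority class.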

2.5. Combination of Tests

When combining ANN and CMT in the analysis of SCC increase, the combined sensitivity ($\mathrm{SE}_{\mathrm{parallel}}$, $\mathrm{SE}_{\mathrm{serial}}$) and specificity ($\mathrm{SP}_{\mathrm{parallel}}$, $\mathrm{SP}_{\mathrm{serial}}$) change. Estimation of classification bias in parallel testing is: $\mathrm{SE}_{\mathrm{parallel}} = 1 - (1 - \mathrm{SE}_{\mathrm{ANN}}) \times (1 - \mathrm{SE}_{\mathrm{CMT}})$, $\mathrm{SP}_{\mathrm{parallel}} = \mathrm{SP}_{\mathrm{ANN}} \times \mathrm{SP}_{\mathrm{CMT}}$. For sequential testing, it is: $\mathrm{SE}_{\mathrm{serial}} = \mathrm{SE}_{\mathrm{ANN}} \times \mathrm{SE}_{\mathrm{CMT}}$, $\mathrm{SP}_{\mathrm{serial}} = 1 - (1 - \mathrm{SP}_{\mathrm{ANN}}) \times (1 - \mathrm{SP}_{\mathrm{CMT}})$ [39]. Here, $\mathrm{SE}_{\mathrm{ANN}}$ and $\mathrm{SP}_{\mathrm{ANN}}$ are estimated from predictions on the test set using our trained ANN, and $\mathrm{SE}_{\mathrm{CMT}}$ and $\mathrm{SP}_{\mathrm{CMT}}$ are the sensitivity and specificity, respectively, gathered from the literature [16,27,28,29,30].
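The combination rules can be checked numerically with a short Python sketch. The function names are illustrative; the ANN values are those reported for the best model, and the CMT pair (SE 0.69, SP 0.72) is one of the literature pairs listed in Section 2.2 [27]. As in the formulas, the two tests are treated as independent.

```python
def parallel(se_a, sp_a, se_b, sp_b):
    """Parallel testing: positive if either test is positive."""
    return 1 - (1 - se_a) * (1 - se_b), sp_a * sp_b


def serial(se_a, sp_a, se_b, sp_b):
    """Serial testing: positive only if both tests are positive."""
    return se_a * se_b, 1 - (1 - sp_a) * (1 - sp_b)


def ppv(p, se, sp):
    """Positive predictive value for pre-test probability p."""
    return p * se / (p * se + (1 - p) * (1 - sp))


SE_ANN, SP_ANN = 0.54, 0.77  # best ANN on the test set
SE_CMT, SP_CMT = 0.69, 0.72  # one literature pair for CMT [27]

se_p, sp_p = parallel(SE_ANN, SP_ANN, SE_CMT, SP_CMT)
se_s, sp_s = serial(SE_ANN, SP_ANN, SE_CMT, SP_CMT)

# Relative PPV gain of the serial combination over CMT alone at 20% prevalence.
ratio = ppv(0.20, se_s, sp_s) / ppv(0.20, SE_CMT, SP_CMT)
print(f"parallel: SE={se_p:.4f} SP={sp_p:.4f}")
print(f"serial:   SE={se_s:.4f} SP={sp_s:.4f}  PPV ratio at P=0.20: {ratio:.2f}")
```

With this particular CMT pair, serial testing lowers sensitivity to about 0.37 but raises specificity to about 0.94, so that PPV at 20% prevalence rises from about 0.38 for CMT alone to about 0.59; the other literature pairs give different magnitudes.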

3. Results

Preprocessing of the databases resulted in a dataset containing 7685 records. After importance filtering, the following variables were kept for the modeling: PeakCondLevel, AvgCondLevel, RelativeCond, Yield, YieldIsLow, number of the current lactation, days in milk, and PeakCondLevel measured on the 1st, 2nd and 3rd day before the SCC measurement day.

During the analysis, a model with 384 neurons in one hidden layer gave the highest F1 score (0.42, with $\mathrm{SE}_{\mathrm{ANN}}$ = 0.54 and $\mathrm{SP}_{\mathrm{ANN}}$ = 0.77) on the test set among the ANNs tested with different architectures.

The SE and SP estimates of the CMT from the literature and of our best ANN, together with the NPV and PPV obtained for prevalence values of 19–68%, are presented in the top two panels of Figure 2.

The other two rows in the figure show the combined predictions of ANN and CMT. The second row of panels in the figure shows the NPV and PPV estimates of the parallel combination, and the third row shows the NPV and PPV estimates of the serial combination plotted against the prevalence. The median and interquartile range (IQR) of the same estimates are summarized in Table 1, which describes the expected overall performance of tests and test combinations over a given range of prevalence.
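A summary of this kind can be sketched in Python by evaluating a predictive value over a grid of prevalences and taking the median and IQR of the resulting curve. The 0.01 step of the grid and the inclusive quantile method are assumptions, since the text does not state how the range was sampled, so the numbers produced here are not expected to reproduce Table 1 exactly.

```python
import statistics


def ppv(p, se, sp):
    """Positive predictive value for pre-test probability p."""
    return p * se / (p * se + (1 - p) * (1 - sp))


def npv(p, se, sp):
    """Negative predictive value for pre-test probability p."""
    return (1 - p) * sp / (p * (1 - se) + (1 - p) * sp)


def median_iqr(values):
    """Median and interquartile range of a list of estimates."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, q3 - q1


# Assumed grid: prevalence from 0.19 to 0.68 in steps of 0.01.
prevalences = [p / 100 for p in range(19, 69)]

SE_ANN, SP_ANN = 0.54, 0.77  # best ANN, as reported above
ppv_curve = [ppv(p, SE_ANN, SP_ANN) for p in prevalences]
npv_curve = [npv(p, SE_ANN, SP_ANN) for p in prevalences]

ppv_median, ppv_iqr = median_iqr(ppv_curve)
npv_median, npv_iqr = median_iqr(npv_curve)
print(f"PPV median={ppv_median:.3f} IQR={ppv_iqr:.3f}")
print(f"NPV median={npv_median:.3f} IQR={npv_iqr:.3f}")
```

The same helper can be applied to the curves of the parallel and serial combinations, or to the difference and ratio of two curves.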

Ranking the medians of these estimates in descending order, ANN was ranked 5th in the NPV and 3rd in the PPV order. As shown in Table 1, the serial combination gives higher PPV values than the parallel one, so the former approach is more useful from the SM detection point of view.

To illustrate the predictive change that the serial combination of ANN and CMT can bring compared to CMT, the difference and the ratio of the NPV and PPV values obtained in both ways are shown in Figure 3. The median and IQR of the difference and ratio curves are reported in Table 2.

4. Discussion

Due to the damage caused by SM, it is usually a central issue in dairy farm management. Since SM diagnosis is mainly possible by measuring SCC, regular monitoring of this value is key to both profitability and animal welfare. However, direct SCC measurement is not a common practice. The primary objective of our work was to investigate whether ANNs can be used to improve the diagnosis of SM based on data that are available from typical conventional milking systems.

In our study, we kept in mind the practical usefulness of the model, namely that it could also be suitable for SM monitoring. In order to reduce the presence of subclinical mastitis at herd level, a test with a high positive predictive value should be used, one that can effectively detect affected animals even at lower prevalence [40]. The well-known phenomenon that prevalence affects PPV and NPV can be observed in each sub-figure of Figure 2: as prevalence increases, PPV increases while NPV decreases, and the reverse holds as prevalence declines.

Higher PPV can be achieved by increasing the specificity of the test [41]. Accordingly, the neural network weights were varied during training to increase sensitivity while keeping the specificity high.
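The dependence of PPV on specificity follows directly from the definition given in Section 2.3. As a short worked derivation (using the same symbols P, SE and SP), dividing numerator and denominator by P × SE and then differentiating with respect to SP gives:

```latex
\begin{align}
\mathrm{PPV}
  &= \frac{P \cdot SE}{P \cdot SE + (1 - P)(1 - SP)}
   = \frac{1}{1 + \dfrac{1 - P}{P} \cdot \dfrac{1 - SP}{SE}}, \\[6pt]
\frac{\partial\, \mathrm{PPV}}{\partial SP}
  &= \frac{P\,(1 - P)\, SE}{\bigl(P \cdot SE + (1 - P)(1 - SP)\bigr)^{2}} > 0
  \quad \text{for } 0 < P < 1,\; SE > 0 .
\end{align}
```

The test enters only through the ratio (1 − SP)/SE, the inverse of the positive likelihood ratio, which for the best ANN is 0.54/0.23 ≈ 2.35. At low prevalence the factor (1 − P)/P is large, so the false-positive rate 1 − SP dominates the denominator, which is why a gain in specificity pays off most in that regime.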

Improving SM detection with ANNs is not unprecedented in the literature [9,42,43]. Machine learning models, including ANNs, based on milking data (electrical conductivity, lactose, milk volume, etc.) have been used for SM detection in research. The predictive values of these models were generally found to be superior to those of traditional statistical approaches. For this reason, in addition to analyzing the ANN alone, we also combined ANN prediction with CMT, an indirect SM diagnostic method that is well known and widely used in practice. We did not find this combined approach in the literature.

The prediction curves plotted in the second and third rows of Figure 2 show that the parallel combination of ANN and CMT resulted in an improvement in NPV. In contrast, the serial combination resulted in an improvement in PPV. As one can see in these sub-figures, a higher PPV can be obtained by increasing the specificity, but it would be preferable to achieve this in such a way that the sensitivity is as high as possible. Accordingly, we varied the neural network weights during training to increase the sensitivity while keeping the specificity high. In general, the classification efficiency of models is evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) [9]. However, maximizing the AUC does not mean maximizing specificity.

Among other properties, Figure 3 shows that, especially for low prevalence, the ANN+CMT combination significantly increases the reliability of the positive prediction. Compared to CMT alone, the serial combination resulted in a 55% increase in PPV at a prevalence of 20% and a 39% increase at a prevalence of 30%. According to these results, the prediction performance of ANN is comparable to that of CMT for both predictive values. While implementing CMT is a labor-intensive process, we can obtain SCC-increase predictions for all individuals on a daily basis in milking parlors where milk properties or milk yield data similar to those used in our study can be measured automatically. This does not mean the neural network and the weights we have trained can be used directly on all farms. Nevertheless, we believe the pipeline presented here can be adapted quickly on farms with similar source data.

The combination of various tests is a solution that is often applied in epidemiology to improve the predictive value of diagnostic tests. When interpreting the results obtained from a combination of tests, it is essential to note the underlying assumption that the tests used are independent of each other. The two tests used in our work detect an increase in SCC on a different basis. CMT is based on higher amounts of DNA deriving from higher numbers of cells in milk, while ANN uses independent features of the milk. Since the two tests are thus uncorrelated, the predictions from their combination can be considered reasonable. A parallel or serial combination of two tests is possible. In the former case, the sensitivity of the combined test will be higher than that of the individual tests used; in the latter case, the specificity will be higher. For SM, this has been pointed out by other authors as well [16], who evaluated combinations of different tests for the indirect detection of IMI. According to their conclusion, the combination of SCC and CMT or milk electrical conductivity only resulted in modest improvements in diagnostics compared to the use of CMT or milk electrical conductivity alone [16]. However, no data were found on how the combination of ANN and CMT changes the predictive values. Figure 2 shows that the combination of ANN and CMT improves the predictive values compared to individual tests. If, as with model selection, our goal in combining tests is to increase PPV by increasing specificity, we can accomplish this by using serial testing.

Since pre-test probability is one of the parameters used to estimate predictive values, NPV and PPV depend on its value. Therefore, we used different prevalence values found in the literature in the estimations to show the extent of change in the predictive values under practically feasible circumstances. For the same reason, we collected data from the literature on the diagnostic reliability of CMT. Although we searched the literature for prevalence, sensitivity, and specificity data that utilized SCC cut-off values of 200,000 cells/mL, there is heterogeneity in the formulation, evaluation, and presentation of the included studies’ results. Including other automatically measurable parameters (e.g., feed intake, activity) could complete and improve the presented model. Further studies are needed to test our approach in practice and compare the predictions with gold-standard methods.

5. Conclusions

For subclinical mastitis, the prediction reliability of the trained artificial neural network has reached the level of the widely used but labor-intensive California Mastitis Test. Moreover, it is shown that the serial combination of the ANN and the CMT can significantly increase the positive predictive value for SM. Since the ANN uses data that are automatically registered at each milking, ANN-based pre-testing can be done on the whole herd every day. Animals flagged by the ANN as SM-affected can be further evaluated by CMT to clarify their mastitis status. Thanks to the serial test combination, the predictive value of the method is increased, and at the same time, the labour and cost requirements can be reduced, as only the pre-screened batch of cows needs to be tested. Further research is required to investigate the practical applicability and performance of the approach. However, our results suggest that relying on automated measurements can cost-effectively improve the early diagnosis of subclinical mastitis.

Author Contributions

I.C. and N.S. conceived the concept of the study. N.S. and S.Á.N. participated in the computing, statistical analysis, and drafting of the manuscript. N.S. takes responsibility for the data integrity and the accuracy of the data analysis. B.P.-S., G.G., I.C., N.S., S.Á.N. and T.V. carried out the manuscript’s critical revision for important intellectual content. All authors have read and agreed to the published version of the manuscript.

Funding

The study was supported by the European Union project RRF-2.3.1-21-2022-00004 within the framework of the MILAB Artificial Intelligence National Laboratory.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ruegg, P.; Pantoja, J. Understanding and using somatic cell counts to improve milk quality. Ir. J. Agric. Food Res. 2013, 52, 101–117.
2. Cobirka, M.; Tancin, V.; Slama, P. Epidemiology and classification of mastitis. Animals 2020, 10, 2212.
3. Leitner, G.; Blum, S.E.; Krifucks, O.; Lavon, Y.; Jacoby, S.; Seroussi, E. Alternative Traits for Genetic Evaluation of Mastitis Based on Lifetime Merit. Genes 2024, 15, 92.
4. Martins, L.; Barcelos, M.M.; Cue, R.I.; Anderson, K.L.; Dos Santos, M.V.; Gonçalves, J.L. Chronic subclinical mastitis reduces milk and components yield at the cow level. J. Dairy Res. 2020, 87, 298–305.
5. Liu, J.; Liu, H.; Cao, G.; Cui, Y.; Wang, H.; Chen, X.; Xu, F.; Li, X. Microbiota characterization of the cow mammary gland microenvironment and its association with somatic cell count. Vet. Sci. 2023, 10, 699.
6. Pakrashi, A.; Ryan, C.; Guéret, C.; Berry, D.; Corcoran, M.; Keane, M.T.; Mac Namee, B. Early detection of subclinical mastitis in lactating dairy cows using cow-level features. J. Dairy Sci. 2023, 106, 4978–4990.
7. Bindel, L.J.; Seifert, R. Most European countries will miss EU targets on antibacterial use by 2030: Historical analysis of European and OECD countries, comparison of community and hospital sectors and forecast to 2040. Naunyn-Schmiedeberg’s Arch. Pharmacol. 2025, 1–26.
8. Silva, S.R.; Araujo, J.P.; Guedes, C.; Silva, F.; Almeida, M.; Cerqueira, J.L. Precision technologies to address dairy cattle welfare: Focus on lameness, mastitis and body condition. Animals 2021, 11, 2253.
9. Bobbo, T.; Biffani, S.; Taccioli, C.; Penasa, M.; Cassandro, M. Comparison of machine learning methods to predict udder health status based on somatic cell counts in dairy cows. Sci. Rep. 2021, 11, 13642.
10. Wang, Y.; Kang, X.; He, Z.; Feng, Y.; Liu, G. Accurate detection of dairy cow mastitis with deep learning technology: A new and comprehensive detection method based on infrared thermal images. Animal 2022, 16, 100646.
11. Kawai, K.; Hayashi, T.; Kiku, Y.; Chiba, T.; Nagahata, H.; Higuchi, H.; Obayashi, T.; Itoh, S.; Onda, K.; Arai, S.; et al. Reliability in somatic cell count measurement of clinical mastitis milk using DeLaval cell counter. Anim. Sci. J. 2013, 84, 805–807.
12. Lusis, I.; Laurs, A.; Antane, V. Viscosity method in robotic milking system for detection of somatic cell count in milk. In Proceedings of the 18th International Scientific Conference Engineering for Rural Development, Jelgava, Latvia, 22–24 May 2019; pp. 22–24.
13. Janzekovic, M.; Brus, M.; Mursec, B.; Vinis, P.; Stajnko, D.; Cus, F. Mastitis detection based on electric conductivity of milk. J. Achiev. Mater. Manuf. Eng. 2009, 34, 39–46.
14. Ferrero, F.; Valledor, M.; Campo, J. Screening method for early detection of mastitis in cows. Measurement 2014, 47, 855–860.
15. Schalm, O.; Noorlander, D. Experiments and observations leading to development of the California Mastitis Test. J. Am. Vet. Med. Assoc. 1957, 130, 199–204.
16. Gohary, K.; McDougall, S. Predicting intramammary infection status at drying off using indirect testing of milk samples. N. Z. Vet. J. 2018, 66, 312–318.
17. Hisira, V.; Zigo, F.; Kadaši, M.; Klein, R.; Farkašová, Z.; Vargová, M.; Mudroň, P. Comparative analysis of methods for somatic cell counting in cow’s milk and relationship between somatic cell count and occurrence of intramammary bacteria. Vet. Sci. 2023, 10, 468.
18. Thompson, J.S.; Green, M.J.; Hyde, R.; Bradley, A.J.; O’Grady, L. The use of machine learning to predict somatic cell count status in dairy cows post-calving. Front. Vet. Sci. 2023, 10, 1297750.
19. Greener, J.G.; Kandathil, S.M.; Moffat, L.; Jones, D.T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 2022, 23, 40–55.
20. Benos, L.; Tagarakis, A.C.; Dolias, G.; Berruto, R.; Kateris, D.; Bochtis, D. Machine learning in agriculture: A comprehensive updated review. Sensors 2021, 21, 3758.
21. Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26.
22. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023.
23. Plozza, K.; Lievaart, J.; Potts, G.; Barkema, H. Subclinical mastitis and associated risk factors on dairy farms in New South Wales. Aust. Vet. J. 2011, 89, 41–46.
24. Busanello, M.; Rossi, R.S.; Cassoli, L.D.; Pantoja, J.C.; Machado, P.F. Estimation of prevalence and incidence of subclinical mastitis in a large population of Brazilian dairy herds. J. Dairy Sci. 2017, 100, 6545–6553.
25. Hiitiö, H.; Vakkamäki, J.; Simojoki, H.; Autio, T.; Junnila, J.; Pelkonen, S.; Pyörälä, S. Prevalence of subclinical mastitis in Finnish dairy cows: Changes during recent decades and impact of cow and herd factors. Acta Vet. Scand. 2017, 59, 1–14.
26. Khasanah, H.; Setyawan, H.B.; Yulianto, R.; Widianingrum, D.C. Subclinical mastitis: Prevalence and risk factors in dairy cows in East Java, Indonesia. Vet. World 2021, 14, 2102.
27. Dingwell, R.T.; Leslie, K.E.; Schukken, Y.H.; Sargeant, J.M.; Timms, L.L. Evaluation of the California mastitis test to detect an intramammary infection with a major pathogen in early lactation dairy cows. Can. Vet. J. 2003, 44, 413.
28. Sanford, C.; Keefe, G.P.; Sanchez, J.; Dingwell, R.; Barkema, H.; Leslie, K.; Dohoo, I.R. Test characteristics from latent-class models of the California Mastitis Test. Prev. Vet. Med. 2006, 77, 96–108.
Fosgate, G.T.; Petzer, I.M.; Karzis, J. Sensitivity and specificity of a hand-held milk electrical conductivity meter compared to the California mastitis test for mastitis in dairy cattle. Vet. J. 2013, 196, 98–102. [Google Scholar] [CrossRef]
Kandeel, S.; Megahed, A.; Arnaout, F.; Constable, P. Evaluation and Comparison of 2 On-Farm Tests for Estimating Somatic Cell Count in Quarter Milk Samples from Lactating Dairy Cattle. J. Vet. Intern. Med. 2018, 32, 506–515. [Google Scholar] [CrossRef]
Fernandes, L.; Guimaraes, I.; Noyes, N.; Caixeta, L.; Machado, V. Effect of subclinical mastitis detected in the first month of lactation on somatic cell count linear scores, milk yield, fertility, and culling of dairy cows in certified organic herds. J. Dairy Sci. 2021, 104, 2140–2150. [Google Scholar] [CrossRef]
Thrusfield, M.; Christley, R. Chapter Diagnostic testing. In Veterinary Epidemiology; John Wiley & Sons: Hoboken, NJ, USA, 2018. [Google Scholar]
Abdolrasol, M.G.; Hussain, S.S.; Ustun, T.S.; Sarker, M.R.; Hannan, M.A.; Mohamed, R.; Ali, J.A.; Mekhilef, S.; Milad, A. Artificial neural networks based optimization techniques: A review. Electronics 2021, 10, 2689. [Google Scholar] [CrossRef]
Dey, P. Artificial neural network in diagnostic cytology. CytoJournal 2022, 19, 27. [Google Scholar] [CrossRef] [PubMed]
Dastres, R.; Soori, M. Artificial neural network systems. Int. J. Imaging Robot. (IJIR) 2021, 21, 13–25. [Google Scholar]
Desai, M.; Shah, M. An anatomization on breast cancer detection and diagnosis employing multi-layer perceptron neural network (MLP) and Convolutional neural network (CNN). Clin. eHealth 2021, 4, 1–11. [Google Scholar] [CrossRef]
Hagenauer, J.; Helbich, M. A geographically weighted artificial neural network. Int. J. Geogr. Inf. Sci. 2022, 36, 215–235. [Google Scholar] [CrossRef]
Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: https://www.tensorflow.org (accessed on 28 April 2025).
Marshall, R.J. The predictive value of simple rules for combining two diagnostic tests. Biometrics 1989, 45, 1213–1222. [Google Scholar] [CrossRef]
Grimes, D.A.; Schulz, K.F. Uses and abuses of screening tests. Lancet 2002, 359, 881–884. [Google Scholar] [CrossRef]
Wong, H.B.; Lim, G.H. Measures of diagnostic accuracy: Sensitivity, specificity, PPV and NPV. Proc. Singap. Healthc. 2011, 20, 316–318. [Google Scholar] [CrossRef]
Mammadova, N.M.; Keskin, I. Application of neural network and adaptive neuro-fuzzy inference system to predict subclinical mastitis in dairy cattle. Indian J. Anim. Res. 2015, 49, 671–679. [Google Scholar] [CrossRef]
Ebrahimi, M.; Mohammadi-Dehcheshmeh, M.; Ebrahimie, E.; Petrovski, K.R. Comprehensive analysis of machine learning models for prediction of sub-clinical mastitis: Deep Learning and Gradient-Boosted Trees outperform other models. Comput. Biol. Med. 2019, 114, 103456. [Google Scholar] [CrossRef]

Figure 1. Scheme of a fully connected artificial neural network. The measured variables (x) stored in the databases form the input layer, whose number of nodes (n) equals the number of variables involved. The output layer (SM) contains one node per class of the target variable; in our case, the subclinical mastitis state of a given animal is either positive or negative. Between the input and output layers, the nodes (m) of the hidden layers (indexed by the superscript) store the parameters (weights, w) of the model. During training, the weights are updated iteratively to reach the best classification of the output. The arrows represent the direction of information flow in training and prediction. Among the architectures tested (differing in the number of hidden layers and nodes), the best model is chosen by model selection based on predictive performance.
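
The layered structure described in Figure 1 can be illustrated with a minimal forward pass. The sketch below uses plain NumPy rather than the TensorFlow library cited in the references, and the layer sizes, ReLU hidden activation, softmax output, and random weights are all illustrative assumptions, not the trained model of this study.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create random (weights, bias) pairs for consecutive layers.

    layer_sizes is hypothetical, e.g. [n_inputs, m1, m2, n_classes].
    """
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(params, x):
    """Propagate a batch x (rows = animals) from input to output layer."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # hidden layer with ReLU (assumed)
    w, b = params[-1]
    logits = h @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)  # softmax over the two classes

# Five input variables, two hidden layers, two output classes (all assumed).
params = init_mlp([5, 8, 4, 2])
x = np.random.default_rng(1).normal(size=(3, 5))
probs = forward(params, x)
print(probs)  # each row: class probabilities for one animal
```

Training would iteratively adjust the weights in `params`; model selection would then compare several `layer_sizes` choices on held-out data.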

Figure 2. Predictive values. The black curve represents the ANN-based negative (NPV) and positive (PPV) predictive values as a function of subclinical mastitis prevalence. The colored curves show the predictive values estimated from the sensitivity (SE) and specificity (SP) values of the California Mastitis Test (CMT) published by Dingwell et al. [27], Fosgate et al. [29], Gohary et al. [16], Kandeel et al. [30], and Sanford et al. [28]. In the first row of panels (one test), the curves represent the predictive values using the SE and SP of the individual tests (ANN or CMT). The following two rows show the predictive values obtained by combining ANN and CMT: the second row shows their parallel combination, and the third row their serial combination.
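
The curves in Figure 2 follow from SE, SP, and prevalence through Bayes' theorem. The sketch below shows one common way to compute them and to combine two tests, assuming the tests are conditionally independent given the true disease state; whether the study used exactly these combination rules is an assumption. The ANN values are those reported in the Simple Summary, while the CMT values are hypothetical placeholders, not figures from the cited papers.

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) for a test with sensitivity se and specificity sp."""
    tp = se * prevalence                   # true positive fraction
    fp = (1.0 - sp) * (1.0 - prevalence)   # false positive fraction
    tn = sp * (1.0 - prevalence)           # true negative fraction
    fn = (1.0 - se) * prevalence           # false negative fraction
    return tp / (tp + fp), tn / (tn + fn)

def parallel(se1, sp1, se2, sp2):
    """Positive if either test is positive (independence assumed)."""
    return 1.0 - (1.0 - se1) * (1.0 - se2), sp1 * sp2

def serial(se1, sp1, se2, sp2):
    """Positive only if both tests are positive (independence assumed)."""
    return se1 * se2, 1.0 - (1.0 - sp1) * (1.0 - sp2)

ANN_SE, ANN_SP = 0.54, 0.77  # reported for the best ANN model
CMT_SE, CMT_SP = 0.70, 0.80  # hypothetical CMT values for illustration

for p in (0.1, 0.3, 0.5):
    one = predictive_values(CMT_SE, CMT_SP, p)
    par = predictive_values(*parallel(ANN_SE, ANN_SP, CMT_SE, CMT_SP), p)
    ser = predictive_values(*serial(ANN_SE, ANN_SP, CMT_SE, CMT_SP), p)
    print(f"prevalence={p:.1f}  CMT PPV/NPV={one[0]:.2f}/{one[1]:.2f}  "
          f"parallel={par[0]:.2f}/{par[1]:.2f}  serial={ser[0]:.2f}/{ser[1]:.2f}")
```

Under these rules the parallel combination trades specificity for sensitivity, which tends to raise NPV, while the serial combination does the opposite and tends to raise PPV.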

Figure 3. Prediction changes by serial combination. The curves show the change in the predictive values of the serial ANN+CMT combination relative to CMT alone, as a function of subclinical mastitis prevalence. The color of the curves indicates the literature source of the CMT SE and SP values used (Dingwell et al. [27], Fosgate et al. [29], Gohary et al. [16], Kandeel et al. [30], and Sanford et al. [28]). The top row shows the absolute difference between the ANN+CMT and CMT predictive values, and the bottom row shows their ratio.

Table 1. Descriptive statistics of the negative (NPV) and positive (PPV) predictive values, given as median (IQR). For ANN, the median NPV was 0.69 (IQR: 0.22) and the median PPV was 0.65 (IQR: 0.23). The One Test columns give the predictive estimates based on the SE and SP values of CMT published by Dingwell et al. [27], Fosgate et al. [29], Gohary et al. [16], Kandeel et al. [30], and Sanford et al. [28]. The Combination columns show the estimates for the parallel and serial combinations of ANN and CMT testing.

Source	NPV (One Test)	NPV (Combination, Parallel)	NPV (Combination, Serial)	PPV (One Test)	PPV (Combination, Parallel)	PPV (Combination, Serial)
Dingwell	0.75 (0.19)	0.84 (0.14)	0.66 (0.22)	0.65 (0.23)	0.60 (0.24)	0.82 (0.15)
Fosgate	0.95 (0.05)	0.97 (0.03)	0.72 (0.20)	0.77 (0.18)	0.65 (0.23)	0.89 (0.10)
Gohary	0.72 (0.20)	0.81 (0.16)	0.66 (0.23)	0.56 (0.25)	0.54 (0.25)	0.75 (0.19)
Kandeel	0.95 (0.04)	0.97 (0.03)	0.72 (0.20)	0.79 (0.17)	0.67 (0.22)	0.90 (0.09)
Sanford	0.68 (0.22)	0.78 (0.17)	0.65 (0.23)	0.51 (0.25)	0.51 (0.25)	0.71 (0.21)

Table 2. Descriptive statistics of prediction changes by serial combination. Values are the median (IQR) of the changes in predictive values between the CMT estimates and the serial ANN+CMT combination estimates. The CMT SE and SP values were obtained from the papers of Dingwell et al. [27], Fosgate et al. [29], Gohary et al. [16], Kandeel et al. [30], and Sanford et al. [28].

Source	NPV Difference	PPV Difference	NPV Ratio	PPV Ratio
Dingwell	−0.09 (0.04)	0.16 (0.07)	0.88 (0.08)	1.25 (0.21)
Fosgate	−0.24 (0.16)	0.12 (0.08)	0.75 (0.18)	1.15 (0.14)
Gohary	−0.06 (0.02)	0.19 (0.05)	0.91 (0.06)	1.34 (0.26)
Kandeel	−0.24 (0.16)	0.11 (0.07)	0.75 (0.18)	1.14 (0.13)
Sanford	−0.03 (0.01)	0.20 (0.04)	0.96 (0.03)	1.40 (0.29)
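
As an illustration of how difference and ratio summaries of this kind can be obtained, the sketch below evaluates PPV for CMT alone and for a serial ANN+CMT combination over a prevalence grid and reports the median and IQR of the changes. The prevalence grid and the CMT SE and SP values are hypothetical, and conditional independence of the two tests is assumed, so the printed numbers are not expected to reproduce the table.

```python
import numpy as np

def ppv(se, sp, p):
    """Positive predictive value at prevalence p (scalar or array)."""
    return se * p / (se * p + (1.0 - sp) * (1.0 - p))

def median_iqr(values):
    """Return (median, interquartile range) of an array."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return med, q3 - q1

prevalence = np.linspace(0.05, 0.95, 19)  # hypothetical prevalence grid
ann_se, ann_sp = 0.54, 0.77               # reported for the best ANN model
cmt_se, cmt_sp = 0.70, 0.80               # hypothetical CMT values

# Serial combination: positive only if both tests are positive.
ser_se = ann_se * cmt_se
ser_sp = 1.0 - (1.0 - ann_sp) * (1.0 - cmt_sp)

ppv_cmt = ppv(cmt_se, cmt_sp, prevalence)
ppv_serial = ppv(ser_se, ser_sp, prevalence)

diff_med, diff_iqr = median_iqr(ppv_serial - ppv_cmt)
ratio_med, ratio_iqr = median_iqr(ppv_serial / ppv_cmt)
print(f"PPV difference: {diff_med:.2f} ({diff_iqr:.2f})")
print(f"PPV ratio:      {ratio_med:.2f} ({ratio_iqr:.2f})")
```

The same pattern applies to NPV by swapping in the corresponding formula.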

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
