Wearable Wireless Biosensor Technology for Monitoring Cattle: A Review

Simple Summary
Wearable wireless sensor systems play a crucial role in providing behavioral and physiological data for each individual animal in precision livestock farming. This article reviewed most types of sensor systems available on the market and summarized detailed information on these systems. Additionally, the accuracy of the parameters generated by the sensor systems was assessed through meta-analysis. The review showed that more than 60 sensor systems of various types have been developed and sold. Most of them generate behavioral and physiological parameters of cattle (e.g., eating time, ruminating time, lying time, and standing time) with excellent performance, with the exception of a few parameters (e.g., drinking time and walking time). The review also found that the same parameters predicted by sensor systems of the same brand showed different accuracies, but the origin of this difference could not be confirmed because the experimental conditions were not reported in sufficient detail in the literature. Therefore, this review suggests that guidelines on evaluation criteria are needed for research evaluating sensor performance.

Abstract
The review aimed to collect information about the wearable wireless sensor system (WWSS) for cattle and to conduct a systematic literature review on the accuracy of these systems in predicting physiological parameters. The WWSS was categorized into ear tag, halter, neck collar, rumen bolus, leg tag, tail-mounted, and vaginally mounted types. Information was collected from a web-based search on Google and then manually curated. We found about 60 WWSSs available on the market; most sensors included an accelerometer. The literature evaluating WWSS performance was collected through a keyword search in Scopus. Among the 1875 articles identified, 46 documents that met our criteria were selected for further meta-analysis. Meta-analysis was conducted on the performance values (e.g., correlation, sensitivity, and specificity) for physiological parameters (e.g., feeding, activity, and rumen conditions). The WWSS showed high performance in most parameters, although some parameters (e.g., drinking time) need to be improved, and considerable heterogeneity of performance levels was observed under various conditions (average I² = 76%). Nevertheless, some of the literature provided insufficient information on evaluation criteria, including experimental conditions and gold standards, to confirm the reliability of the reported performance. Therefore, guidelines for the evaluation criteria for studies evaluating WWSS performance should be drawn up.


Introduction
To increase the sustainability of the dairy industry, there has been an increased need for replacing traditional group-level management with precision dairy farming, which continuously monitors and manages individual productivity and health issues [1]. However, individual monitoring through direct observation of farm staff or video recordings is time-consuming, labor-intensive, difficult to detect accurately, and practically impossible

Search Strategy and Quality Evaluation of the Constructed Database
In this review, we collected information on the currently available wearable wireless biosensors for cattle and summarized the basic features of these sensors. The search was performed through a web-based search on Google using the following search terms: cattle AND sensor AND (ear OR halter OR neck OR rumen OR leg OR tail OR vagina). The inclusion criterion was that the product must be currently commercially available. The availability of the sensors was confirmed based on the information obtained from the respective web pages. Products marked as 'in development' or 'to be released soon' (concept solutions and prototypes) were excluded, so that only products currently available on the market were included in the study. The initial search lasted for three months (August 2019 to October 2019). It was conducted extensively to obtain a comprehensive market inventory and minimize the risk of missing any relevant products. While writing this review, the search was repeated (until April 2020) to prevent the omission of newly released products. During this iterative process, we double-checked whether any products were missing from the existing database.
Technical specifications and information on vendor websites were our primary sources of information, and business reports and research papers were additional sources. If we found further information about a product in scientific articles, we used it to update our product information. For an objective evaluation of database quality, our database was compared with an independent database, the sensor product database for dairy cattle provided by the Data Driven Dairy Decision for Farmers (4D4F) project (https://www.4d4f.eu/, last updated on 23 August 2019), funded by the European Union's Horizon 2020 research and innovation program.
Several wearable wireless biosensors that can be mounted externally on the animal body, such as on the ears, neck, legs, and tail, have been developed. Among these, ear-mounted sensors are mainly equipped with sensors that measure temperature and activity. They are mostly mounted in the middle of the ear and used to check the animal's health status using temperature data. Most ear tag products equipped with three-axis accelerometer sensors can additionally monitor the animal's ruminating, eating, resting, and activity. The management system connected to the sensor uses these data to diagnose an animal's estrous cycle and health issues.
Halter type sensors are attached to the cow's head, and they measure the cow's eating and ruminating behavior through a noseband pressure sensor and a three-axis accelerometer sensor. The currently available ear tag and halter type sensors are listed in Tables 1 and 2.

Neck Collars
The neck collar sensor system consists of a device with sensors attached to the strap hanging on a cow's neck. This type of sensor is the most commonly used in dairy farms; many companies manufacture it. Generally, neck collars have been widely used to control the amount of feed or measure individual feed intake through radio-frequency identification technology. Recently, accelerometer and microphone sensors have been added to neck collars to measure eating time, rumination time, and activity level. Some are equipped with temperature sensors to measure an animal's body temperature. These sensors provide farm managers with a cow's health and estrus information. Some neck collar sensors are used in combination with automatic milking systems. The currently available commercial neck collar tag sensors are listed in Tables 3 and 4.

Reticulo-Rumen Bolus Sensors
A rumen bolus system is inserted orally and placed in the reticulum, where it will remain throughout the animal's life. It is designed to continuously monitor a few rumen parameters (temperature and pH) and an animal's activity throughout the day. The bolus is equipped with an internal battery, a temperature sensor/pH sensor/accelerometer, and a transmitter for data transmission. Its battery can last for months to years and can transmit the data wirelessly at adjustable time intervals.
Bolus sensors are primarily designed to sense ruminal temperature changes, which can signal a shift in the animal's physiological state. A decrease in ruminal temperature reflects drinking and eating events, and an increase coincides with increased body temperature [5-7]. Monitoring changes in ruminal temperature and activity can facilitate early detection of abnormal behavior, estrus, and illnesses. Most boluses, however, are not equipped with a pH sensor because of its relatively short lifespan. The currently available commercial bolus sensor systems with a pH sensor have an operational lifetime of no more than a few months, since the stability of the pH probe is limited. Thus, rumen bolus systems with a pH sensor are mainly considered research tools. The currently available commercial bolus products are presented in Tables 5 and 6.

Leg Tags
Along with neck collar sensors, leg tag sensors are a popular sensor technology used in farms. Leg tag sensors are mainly equipped with three-axis accelerometers, which can measure animal activity, walking time, lying time, standing time, and the number of steps. They also provide farm managers with a cow's health and estrus information. Similar to the neck collar system, some leg tag sensors are used in combination with automatic milking systems. The currently available commercial leg tag products are presented in Tables 7 and 8.

Tail and Vagina Mounted Types
Both dystocia and stillbirth significantly impact animal productivity and farm profitability, often requiring a skilled assistant and immediate intervention at the moment of delivery [8]. To reduce the reliance on labor and aid animal management, sensors that detect the calving time without physical observation have been developed. These sensors are attached to the tail (or tail head), and they measure tail movement patterns triggered by labor contractions.
Among the sensors used to detect calving, some are inserted directly into a cow's vagina. Using the principle that a cow's body temperature decreases before calving [9-11], vaginally inserted sensors detect a reduction in vaginal temperature and send a calving alarm to farm managers. Another type of vaginally inserted sensor detects light. When the device is pushed out of the vagina as the cow's water breaks, it detects light and thereby recognizes that it is outside the cow's body. The device then sends a text message to the farm manager to signal the start of calving. The currently available commercial products of the abovementioned types are presented in Table 9.

Literature Review on the Evaluation of Parameters Generated by Wearable Wireless Biosensor Systems
Wearable wireless biosensors provide farm managers with physiological and behavioral data, such as eating, rumination, walking, and lying time. These data are generated by processing the raw data measured by the sensor with a specific algorithm. The units of the generated values depend on the sensor type and the algorithm used. As the computed physiological and behavioral parameters are used as predictor variables in health and estrus diagnostic models, they should accurately represent the actual state of individual animals. Several studies have been conducted to verify the performance of different sensors. The majority of these studies conducted correlation analyses between the sensor data and the gold standard (actual observations) and performance analyses (i.e., sensitivity, specificity, accuracy, and precision). We reviewed the literature on the evaluation of physiological and behavioral data generated by wearable wireless biosensors.

Search Strategy, Study Selection, and Quality Assessment
A literature search was conducted by a keyword search in Scopus. To avoid an excessive number of search results, we used specific keywords. The final query used to search for articles in the databases was (TITLE-ABS-KEY (correlation OR correlated OR regression OR sensitivity OR specificity OR precision OR accuracy)) AND (TITLE-ABS-KEY (cow OR cattle OR calf OR heifer OR buffalo)) AND ((TITLE-ABS-KEY (sensor* AND NOT sensory)) OR (TITLE-ABS-KEY (automat* OR *meter OR device OR tag))) AND (TITLE-ABS-KEY (detect* OR monitor* OR record*)) AND NOT (TITLE-ABS-KEY (genetic* OR chromatography OR follicle OR muscle OR meat OR DNA OR antibody OR serum OR patient OR assay OR spectro*)) AND (LIMIT-TO (DOCTYPE, 'ar')) AND (LIMIT-TO (LANGUAGE, 'English')). A total of 1875 articles were retrieved using this query (search date: 26 April 2020).
After the initial database search was completed, we screened the title and abstract of each retrieved article and decided on the suitability of each study for inclusion in this review. Articles were included in the final database if they (i) investigated the performance of wearable wireless biosensors for beef or dairy cattle, (ii) evaluated variables related to feeding behavior, moving behavior, or rumen status generated by the sensors, (iii) tested the performance of the sensors against independent reference measurements (i.e., the gold standard), such as real-time or recorded visual observations for the behavioral activities and manual pH or temperature measurements, and (iv) presented at least one quantitative evaluation measure, such as correlation, accuracy, precision, sensitivity, or specificity. A total of 46 articles met the above criteria and were selected for our systematic review. These studies evaluated the sensor's performance in monitoring the following three parameters: feeding behavior, activity behavior, and rumen status. The following information was extracted from the selected papers: target behavioral and physiological parameter (i.e., feeding behavior: eating time, ruminating time, drinking time; activity behavior: lying time, standing time, walking time, step count, active time, inactive time; rumen status: rumen pH and rumen temperature), sensor information (i.e., mounting position, product name, company, country), animal information (i.e., breed, gender, physiological stage), housing information (i.e., barn type, feeding method), gold standard information (i.e., method, number of observers, reliability between observers), data quantity (i.e., number of animals, total collection time, mean collection time per animal), and evaluation results (i.e., correlation coefficient: Pearson, Spearman, Concordance; diagnostic accuracy: sensitivity, specificity, precision, accuracy).

Evaluation of Wearable Wireless Biosensor Systems
In this study, feeding behavior was classified as eating, ruminating, or drinking. Feeding behavior is usually measured by a sensor located on the head of the cow, such as an ear tag, halter, or neck collar. Activity behavior was classified as lying, standing, walking, active, or inactive (resting). These activities are usually measured by leg tag sensors; however, other types of sensors (e.g., ear tags and neck collars) are also capable of recording daily active and inactive time. As the gold standard for evaluating the sensor, the total duration of the target behavior quantified through visual observation by an observer is used for the behavioral activities, while independent measurements are used for physiological parameters (rumen pH and temperature). During observation, the trained observer records the start time and end time of the target behavior and calculates its duration from this record. The target behavior is defined through an ethogram, and the observer is trained to identify the animal's behavior based on this definition before observation. Visual observation includes both real-time (live) observation and non-real-time observation (video recordings). Cases in which values derived from other wearable wireless sensors were used as the gold standard were excluded from this study.
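As a minimal sketch of how such an observation log could be turned into a gold-standard duration, the Python snippet below sums the intervals recorded for each behavior. The record layout (behavior label, start, end), the function name `total_duration_by_behavior`, and the example times are assumptions made for illustration; the reviewed studies do not prescribe a particular format.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def total_duration_by_behavior(records):
    """Sum the observed time per behavior.

    `records` is an iterable of (behavior, start, end) tuples, where start
    and end are datetime objects noted by the observer (hypothetical layout).
    """
    totals = defaultdict(timedelta)
    for behavior, start, end in records:
        if end < start:
            raise ValueError(f"end precedes start for {behavior!r}")
        totals[behavior] += end - start
    return dict(totals)


# Hypothetical log for one cow; the times are invented for illustration.
log = [
    ("eating", datetime(2020, 1, 1, 8, 0), datetime(2020, 1, 1, 8, 25)),
    ("lying", datetime(2020, 1, 1, 8, 30), datetime(2020, 1, 1, 9, 45)),
    ("eating", datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 20)),
]
observed = total_duration_by_behavior(log)

# Report the totals in minutes, the unit most sensors use for daily times.
print({name: d.total_seconds() / 60 for name, d in observed.items()})
```

The totals obtained this way are the reference values against which the sensor-reported durations would be compared.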
The correlation results, i.e., the values of Pearson's correlation coefficient (PCC), Spearman's rank correlation coefficient (SCC), and Lin's concordance correlation coefficient (CCC) were graded using the criteria of Hinkle et al. [12]. The grades were negligible (0.00-0.30), low (0.30-0.50), moderate (0.50-0.70), high (0.70-0.90), and very high (0.90-1.00). PCC and SCC can describe a linear relationship between a measured value and a value to be compared, and CCC can additionally explain the degree of agreement with the measured value as well as the linear relationship. In this review, along with correlation and CCC, the results of binary classification tests based on 2 × 2 contingency tables (true positives, false negatives, false positives, and true negatives) of the sensors presented in the articles are also discussed. The following performance results were considered: sensitivity (Se; true positives out of the sum of true positives and false negatives), specificity (Sp; true negatives out of the sum of true negatives and false positives), accuracy (Acc; true positives and true negatives out of the total number of tests), and precision (Pre; true positives out of the sum of true positives and false positives; positive predictive value).
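The definitions above can be sketched in a few lines of Python. The function names and the example counts are hypothetical, and the handling of values that fall exactly on a cut-off (assigned to the higher grade here) is an assumption, because the listed ranges share their boundary values.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return Se, Sp, Acc, and Pre from a 2 x 2 contingency table.

    Assumes every denominator is non-zero, i.e., the table contains
    actual positives, actual negatives, and predicted positives.
    """
    return {
        "Se": tp / (tp + fn),                    # sensitivity
        "Sp": tn / (tn + fp),                    # specificity
        "Acc": (tp + tn) / (tp + fn + fp + tn),  # accuracy
        "Pre": tp / (tp + fp),                   # precision (PPV)
    }


def grade_correlation(r):
    """Grade the size of a correlation using the ranges listed above.

    Assumption: a value on a cut-off (e.g., 0.70) gets the higher grade.
    """
    size = abs(r)
    if size >= 0.90:
        return "very high"
    if size >= 0.70:
        return "high"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "low"
    return "negligible"


# Hypothetical counts for a behavior that occupies a small share of the day.
metrics = diagnostic_metrics(tp=20, fn=80, fp=40, tn=9860)
print({name: round(value, 3) for name, value in metrics.items()})
print(grade_correlation(0.83))
```

With these invented counts, accuracy is 98.8% although sensitivity is only 20%, which illustrates why Acc alone can look favorable when the target behavior is rare and why Se and Pre are reported alongside it.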

Statistical Analysis
A meta-analysis was performed for the reported correlation coefficients (PCC, SCC, and CCC) and diagnostic accuracy (i.e., Se and Sp). The mean and 95% confidence intervals of the statistics were estimated through a random-effects model based on the DerSimonian-Laird estimator [13], which is generally considered the standard procedure in meta-analysis. Since the animal types, physiological stages of animals, feeding and housing conditions, and sensor products varied among the studies included in the meta-analysis, the random-effects model was selected instead of a fixed-effects model. Given the non-normality of correlation coefficients, point estimates were variance-stabilized using Fisher's z-transform [14]. The mean value from each study was weighted based on the inverse variance method using the study sample size (number of animals). We treated evaluations conducted under different conditions within the same article as separate individual studies. The analysis was not performed if there were no more than two independent study samples for one behavior. Heterogeneity was examined using τ², I², and Cochran's Q statistic, where τ² = 0 suggests no heterogeneity, and I² values of 25, 50, and 75% correspond to cut-off points for low, moderate, and high heterogeneity, respectively [15]. The differences in the correlation between sensor types were analyzed using analysis of variance. All the procedures of the meta-analysis were performed using the 'metacor' function in the 'meta' package of R version 4.0.3 [16]. Statistical significance was set at p < 0.05, and results characterized by 0.05 ≤ p < 0.1 were considered trends.
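To make the pooling procedure concrete, the following Python sketch combines correlation coefficients with Fisher's z-transform, inverse-variance weights, and the DerSimonian-Laird estimate of τ². It is not the code used in this review (which relied on 'metacor' in R) and may not reproduce that function's output in every detail. The function name `pool_correlations`, the variance 1/(n − 3) for Fisher's z, the normal quantile 1.96 for the 95% interval, and the example inputs are assumptions for illustration.

```python
import math


def pool_correlations(correlations, sample_sizes):
    """Random-effects pooling of correlations (DerSimonian-Laird sketch).

    Assumes |r| < 1 and n > 3 for every study, and at least two studies.
    """
    if len(correlations) != len(sample_sizes) or len(correlations) < 2:
        raise ValueError("need two or more studies with matching sizes")

    z = [math.atanh(r) for r in correlations]    # Fisher's z-transform
    v = [1.0 / (n - 3) for n in sample_sizes]    # assumed variance of z
    w = [1.0 / vi for vi in v]                   # inverse-variance weights
    sw = sum(w)

    # Fixed-effect mean and Cochran's Q.
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sw
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    df = len(z) - 1

    # DerSimonian-Laird between-study variance and I^2 (in percent).
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights, pooled z, and 95% confidence interval.
    w_re = [1.0 / (vi + tau2) for vi in v]
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    low, high = z_re - 1.96 * se, z_re + 1.96 * se

    # Back-transform from the z scale to the correlation scale.
    return {
        "r": math.tanh(z_re),
        "ci": (math.tanh(low), math.tanh(high)),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical study samples: correlations and numbers of animals.
result = pool_correlations([0.2, 0.6, 0.9], [50, 50, 50])
print(result)
```

Because the three invented correlations differ widely, the sketch returns τ² > 0 and an I² above the 75% cut-off, whereas identical correlations would give τ² = 0 and I² = 0%.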

Drinking Time
Drinking time is a variable that represents the amount of time a cow spends drinking water per day. In the literature, drinking behavior is defined as the behavior that cows exhibit when they put their muzzles into water bowls and swallow water [23-25,28,33]. The SCC value based on four independent study samples from three articles showed that the drinking time recorded by the sensors was poorly correlated with the actual observations (0.50, n = 142; Figure 5 and Supplementary Table S5) [24,25,28,33]. The same sensor product was used for the analysis of drinking time, but there were some differences in the animal type and feeding method (Supplementary Table S5), which showed high heterogeneity (I² = 79% and τ² = 0.14). The mean diagnostic accuracy of the wearable biosensors based on four independent study samples from three articles (four for Se, Sp, Acc, and Pre; Table 10 and Supplementary Table S6) showed an Se of 21.9%, an Sp of 99.9%, an Acc of 98.8%, and a Pre of 30.8% (n = 149); notably, Se and Pre were lower than those of the other feeding behavior variables [23,25,28].

Activity Behavior

Lying Time
Lying time is a variable that indicates how long an animal is lying on the ground per day. In the literature, lying time is defined as the time during which the body is not supported by the legs and is in contact with the ground [18,31-33,37,44-50]. The PCC and SCC values based on 10 independent study samples from eight articles (six for PCC and four for SCC; Supplementary Table S7) showed that the lying time recorded by the leg tag sensors was very highly correlated with the actual observations (PCC = 0.99, n = 180, I² = 0%; SCC = 1.00, n = 53, I² = 97%; Figure 6) [18,31,33,37,44-46,49]. The CCC value based on six independent study samples from three articles was also very high (1.00, n = 168, I² = 90%; Figure 6) [18,31,48]. Both the sensor product and the animal housing condition were different among the studies included in the meta-analysis (Supplementary Table S7), and very high heterogeneity was observed (I² = 94% and τ² = 1.69), with the exception of the analysis for PCC. The mean diagnostic accuracy of the wearable biosensors based on five independent study samples from three articles (five for Se and Sp and four for Pre; Table 10 and Supplementary Table S8) showed an Se of 99.8% (n = 53), an Sp of 99.9% (n = 53), and a Pre of 99.9% (n = 44) [32,47,50].

Standing Time
Standing time is a variable that represents the amount of time an animal spends standing per day. In the literature, standing behavior is defined as an animal's behavior when it is in an upright position with support from the legs but is not walking [31,33,44,45,47,48,50,51]. The SCC value based on four independent study samples from four articles showed that the standing time recorded by the leg tag sensors was very highly correlated with the actual observations (0.93, n = 56, I² = 57%; Figure 7 and Supplementary Table S9) [31,33,44,45]. In addition, the CCC value based on three independent study samples from two articles was 1.0 (n = 28, I² = 87%; Figure 7 and Supplementary Table S9) [31,48]. The sensor products and animal housing conditions used were different between the studies included in the meta-analysis of standing time (Supplementary Table S9), and moderate heterogeneity was observed (I² = 72% and τ² = 0.63). The mean diagnostic accuracy of wearable biosensors based on four independent study samples from three articles (four for Se and Sp and three for Pre; Table 10 and Supplementary Table S10) showed an Se of 95% (n = 53), an Sp of 98% (n = 53), and a Pre of 98% (n = 44) [47,50,51]. Only one study tested the performance of a neck sensor in estimating the standing time. The reported sensitivity of a neck sensor was approximately 30% lower than that of a leg sensor (Se = 63% and Sp = 98%) [51].

Walking Time
Walking time is a variable that represents the amount of time in which the animal walks per day. Walking time is typically defined as a period characterized by at least three consecutive strides in the forward or backward direction [31-33,44,45,47,48,50,51]. The SCC value based on four independent study samples from four articles showed that the walking time recorded by the sensors was highly correlated with the actual observations (0.83, n = 56, I² = 75%; Figure 8 and Supplementary Table S11) [31,33,44,45]. The CCC value based on three independent study samples from three articles was also high (0.80, n = 28, I² = 49%; Figure 8 and Supplementary Table S11) [31-33,44,45,48]. There were differences in the sensor products and the housing conditions used among the studies included in the analysis of the walking time (Supplementary Table S11), but the heterogeneity was moderate (I² = 62% and τ² = 0.21). The mean diagnostic accuracy of the wearable biosensors based on five independent study samples from four articles (five for Se and Sp and four for Pre; Table 10 and Supplementary Table S12) showed an Se of 34% (n = 53), an Sp of 98% (n = 53), and a Pre of 27% (n = 44); the Se and Pre were lower than those of the other activity behavior variables [32,47,50,51].

Step Count
Step count is a variable that represents the number of steps a cow makes per day. A step is defined as the phenomenon occurring when the rear foot is lifted completely off the ground and returned to the ground in any location with or without the movement of the entire body [45,48,52-54]. The CCC value based on three independent study samples from two articles showed that the step count measured by the sensors was moderately correlated with the actual observations (0.69, n = 22, I² = 0%; Figure 9 and Supplementary Table S13) [48,54]. Although there were differences in the sensor product, animal type, and housing condition among the studies included in the analysis of the step counts (Supplementary Table S13), no heterogeneity was observed (I² = 0% and τ² = 0).

Active Time
Active time is a variable that represents the total active time of a cow per day. It should be noted that the definition of active behavior varies in the literature. Bikker et al. [17] and Pereira et al. [30] defined active behavior as the process of moving the head or body and walking. Elischer et al. [37] defined active behavior as standing or walking behavior. Zambelis et al. [27] defined active behavior in detail as follows: exploring, drinking, urination, defecation, rising, lying down, head swinging, self-grooming, and social interaction. Swartz et al. [49] defined active behavior as a step activity in which the right rear leg is lifted off the floor while standing. The PCC and SCC values based on 10 independent study samples from eight articles (seven for PCC and four for SCC; Supplementary Table S14) showed that the active time recorded by the sensors was highly correlated with the actual observations (PCC = 0.80, n = 98, I 2 = 77%; SCC = 0.92, n = 146, I 2 = 0%; Figure 10) [17,25,27,28,30,31,37,49]. However, the CCC value based on three independent study samples from three articles showed that such correlation was moderate (0.57, n = 51, I 2 = 81%; Figure 10 and Supplementary Table S14) [17,30,31]. There were differences in the sensor products and the housing conditions used between the studies included in the analysis of active time (Supplementary Table S14), and high heterogeneity was observed (I 2 = 79% and τ 2 = 0.33), with the exception of SCC analysis. Unlike the other sensor types, the halter sensors (RumiWatch Noseband sensors) record active time in terms of movement of the muzzle that is not related to ingestion and drinking [25,28,31]. The active time variables evaluated in these studies showed a high correlation with the actual observed values (PCC = 0.87, SCC = 0.92, and CCC = 0.90) [25,28,31]. 
The diagnostic accuracy of the halter sensors based on three independent study samples from two articles (three for Se, Sp, Acc, and Pre; Table 10 and Supplementary Table S15) showed an Se of 93.1%, an Sp of 93.4%, an Acc of 93.4%, and a Pre of 89.9% (n = 134) [25,28].

Inactive Time
Inactive or idle time is a variable that represents the amount of time per day in which cows are not active. Inactive time is defined as the time spent lying or standing while resting without performing any other action (i.e., rumination, eating, or drinking) [17,19,21,27,29,30,32]. The PCC value based on seven independent study samples from seven articles was very high (0.94, n = 107, I² = 84%; Figure 11 and Supplementary Table S16) [17,19,21,27,29,30,32]. Although slightly lower than the PCC, the CCC value calculated from five independent study samples from five articles was also high (0.85, n = 81, I² = 83%; Figure 11 and Supplementary Table S16) [17,19,29,30,32]. There were differences in the sensor products used and the animal housing conditions between the studies included in the analysis (Supplementary Table S16), and high heterogeneity was observed (I² = 84% and τ² = 0.42). The mean diagnostic accuracy of the wearable biosensors based on three independent study samples from two articles (three for Se, Sp, and Pre; Table 10 and Supplementary Table S17) showed an Se of 59% (n = 53), an Sp of 98% (n = 53), and a Pre of 89% (n = 44) [29,32].
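For readers less familiar with the diagnostic accuracy terms used above, the sketch below shows how they are conventionally derived from a confusion matrix of sensor classifications against observer classifications. It assumes that Pre denotes precision (positive predictive value); the function name and the counts are hypothetical and only chosen to mimic a low-Se, high-Sp pattern.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute Se, Sp, Acc, and Pre from confusion-matrix counts.

    tp/fn: observed-positive intervals the sensor did / did not flag.
    tn/fp: observed-negative intervals the sensor did not / did flag.
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("at least one metric would be undefined")
    total = tp + fp + tn + fn
    return {
        "Se": tp / (tp + fn),    # sensitivity (recall)
        "Sp": tn / (tn + fp),    # specificity
        "Acc": (tp + tn) / total,  # overall accuracy
        "Pre": tp / (tp + fp),   # precision (assumed meaning of Pre)
    }


# Hypothetical counts: many missed positives, very few false alarms.
metrics = diagnostic_metrics(tp=59, fp=4, tn=196, fn=41)
for name, value in metrics.items():
    print(f"{name} = {value:.1%}")
```

In this invented example, precision stays high even though sensitivity is low, because precision only depends on how many of the flagged intervals were correct.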

Figure 11. Forest plot of the correlation coefficient of inactive time between wearable sensors and visual observation. (A,B) show Pearson's correlation coefficient and concordance correlation coefficient, respectively. Numbers in parentheses indicate individual studies applying different evaluation conditions within the same article. 'Total' means the sample size of each study and 'Weight' means the weight for the mean based on the sample size.

Rumen Status
Rumen pH and rumen temperature are variables measured using reticulo-rumen bolus sensors. In the case of rumen pH measured by the bolus sensors, the pH of the rumen fluid measured by a pH meter is used as the gold standard [55–58]. The PCC value of the correlation between the pH measured by these sensors and actual observations, based on six studies from four articles, was high (0.79, n = 40, I² = 0%; Figure 12) [55–58]. However, the CCC value based on two articles (four independent studies) indicated only a moderate correlation (0.62, n = 32, I² = 0%; Figure 12) [55,57]. There were differences in the sensor product and gold standard used between the studies included in the analysis (Supplementary Table S18), but heterogeneity was not observed (I² = 0% and τ² = 0). In the literature, the rumen temperature measured by the bolus sensors was compared with the rectal temperature measured using digital thermometers [56,59–62]. The PCC value from five articles (contributing to five independent study samples) showed that the rumen temperature measured by the bolus sensors was moderately correlated with the actual observations (PCC = 0.67, n = 456; Figure 12) [56,59–62]. There were differences in the sensor products between studies included in the analysis (Supplementary Table S18), but low heterogeneity was observed (I² = 42% and τ² = 0.01).
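The pooled correlations and the heterogeneity statistics (I² and τ²) quoted throughout this section can be made concrete with a small example. This section does not state the exact pooling model, so the sketch below is an assumption: it uses one common approach, Fisher's z transformation with DerSimonian-Laird random-effects weights. The function name and the input values are hypothetical.

```python
import math


def pool_correlations(rs, ns):
    """Pool correlations with a DerSimonian-Laird random-effects model.

    rs: per-study correlation coefficients (each strictly between -1 and 1).
    ns: per-study sample sizes (each > 3, since Var(z) = 1 / (n - 3)).
    """
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need at least two studies with matching sizes")
    if any(n <= 3 for n in ns):
        raise ValueError("each study needs n > 3")
    zs = [math.atanh(r) for r in rs]   # Fisher z transform
    ws = [n - 3 for n in ns]           # fixed-effect (inverse-variance) weights
    sw = sum(ws)
    z_fixed = sum(w * z for w, z in zip(ws, zs)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (z - z_fixed) ** 2 for w, z in zip(ws, zs))
    df = len(zs) - 1
    c = sw - sum(w * w for w in ws) / sw
    tau2 = max(0.0, (q - df) / c)      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    ws_re = [1.0 / (1.0 / w + tau2) for w in ws]
    z_re = sum(w * z for w, z in zip(ws_re, zs)) / sum(ws_re)
    return {"r": math.tanh(z_re), "Q": q, "I2": i2, "tau2": tau2}


# Hypothetical inputs: two studies that disagree strongly.
result = pool_correlations(rs=[0.50, 0.95], ns=[30, 30])
print({k: round(v, 3) for k, v in result.items()})
```

When the study-level correlations agree, Q falls to or below its degrees of freedom and both I² and τ² are truncated at zero, which matches the pattern reported for rumen pH; strongly divergent studies, as in the invented input here, push I² toward 100%.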

Summary and Implications
A wide variety of wearable wireless biosensor systems for health or estrus detection are currently available on the market. Most of these sensor systems measure acceleration using a three-axis accelerometer and use a customized algorithm to convert the signal into numeric values that quantify specific physiological parameters, such as eating time, rumination time, and resting time. The reporting methods (reporting frequency, data units, etc.) of the information generated by the sensors are also diverse. Important basic information on the sensors, such as the frequency of data measurement and the algorithm used for calculating the value of a specific variable from acceleration, was largely undisclosed because of company confidentiality.
To date, several studies have evaluated different parameters related to feeding behavior, moving behavior, and rumen status that were measured and calculated using sensor systems. These sensor systems showed high performance in measuring most of the physiological parameters. However, the sensor performance for some parameters (e.g., drinking time and walking time) needs to be improved [23–25,28,32,47,50,51], and a specific sensor type showed low performance for a particular behavior (i.e., walking time measured with a neck sensor) [32,51]. Moreover, the mounting position of an accelerometer-based sensor appears to be critical for detecting the specific behavior of interest, which is consistent with a previous report [63]. In particular, feeding behavior was classified more accurately by a neck-mounted than a leg-mounted accelerometer (Se 96% versus 80% and Pre 88% versus 79%, respectively), but the opposite was true for lying behavior (Se 95% versus 96% and Pre 82% versus 97%, respectively) [63].
A standardized guideline for reporting sensor evaluation is required. Different performance levels were reported under different conditions, which was reflected in the considerable heterogeneity of the meta-analysis (average I² = 76%). In some cases, the same brand of sensor was evaluated very differently in the literature, even under the same feeding and housing conditions [18,22,27,32,36]. Unfortunately, a number of articles reported their evaluation criteria in insufficient detail, which makes it impossible to ascertain which evaluation factor caused such differences in performance between the sensors. To clarify the factors affecting the accuracy of these sensors, more detailed information is required as follows: animal information (species, sex, physiological status, etc.), housing information (stall type, pen size, stocking density, etc.), data information (observation time per animal, number of observation points per day, total collection days, etc.), and gold-standard information (method, reliability within and between observers, etc.). In the medical field, the Standards for Reporting of Diagnostic Accuracy (STARD) statement is a guideline for writing papers that report the accuracy of a diagnostic method [64]. This guideline contains a list of essential reporting items that can be used as a checklist to ensure that a report of a diagnostic accuracy study contains the necessary information. Performing a meta-analysis on articles written according to such a guideline enables a detailed discussion of bias and heterogeneity among the studies. Therefore, it is necessary to establish reporting guidelines, similar to the STARD statement, that include the above-mentioned factors (i.e., animal, housing, data, and gold standard) for papers reporting the accuracy of wearable wireless biosensors.
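The reporting items listed above could be captured as a simple machine-checkable record. The following is only an illustrative sketch and not an existing standard: the class name, field names, and units are hypothetical and are derived from the four information groups named in the text (animal, housing, data, and gold standard).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SensorValidationReport:
    """Hypothetical checklist record for a sensor validation study."""
    # Animal information
    species: Optional[str] = None
    sex: Optional[str] = None
    physiological_status: Optional[str] = None
    # Housing information
    stall_type: Optional[str] = None
    pen_size_m2: Optional[float] = None          # unit is an assumption
    stocking_density: Optional[float] = None
    # Data information
    observation_time_per_animal_h: Optional[float] = None  # unit assumed
    observation_points_per_day: Optional[int] = None
    total_collection_days: Optional[int] = None
    # Gold-standard information
    gold_standard_method: Optional[str] = None
    intra_observer_reliability: Optional[float] = None
    inter_observer_reliability: Optional[float] = None


def missing_items(report):
    """Return the names of checklist items that were left unreported."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


# Hypothetical, partially completed report.
report = SensorValidationReport(species="cattle", stall_type="free stall")
print(missing_items(report))
```

A checklist of this kind would let a meta-analysis code each study's conditions consistently and use them as moderators when exploring heterogeneity.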

Conclusions
In conclusion, the present study showed that the wearable biosensors tested in the literature predict the targeted behavioral information with high accuracy. However, the algorithms used to generate some types of information, such as drinking time and walking time, need to be improved. Furthermore, since the accuracy of behavioral information is sensitive to the evaluation conditions, it is recommended that each sensor be evaluated using adequate and validated criteria and that these criteria be reported in detail.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/ani11102779/s1, Table S1: Evaluation results (correlation coefficient) for eating time (time spent eating) among feeding behavior variables generated by the sensors, Table S2: Evaluation results (diagnostic accuracy) for eating time (time spent eating) among feeding behavior variables generated by the sensors, Table S3: Evaluation results (correlation coefficient) for rumination time (time spent ruminating) among feeding behavior variables generated by the sensors, Table S4: Evaluation results (performance) for rumination time (time spent ruminating) among feeding behavior variables generated by the sensors, Table S5: Evaluation results (correlation) for drinking time (time spent for drinking) among feeding behavior variables generated by the sensors, Table S6: Evaluation results (performance) for drinking time (time spent for drinking) among feeding behavior variables generated by the sensors, Table S7: Evaluation results (correlation) for lying time (time spent lying) among activity behavior variables generated by the sensors, Table S8: Evaluation results (performance) for lying time (time spent lying) among activity behavior variables generated by the sensors, Table S9: Evaluation results (correlation) for standing time (time spent standing) among activity behavior variables generated by the sensors, Table S10: Evaluation results (performance) for standing time (time spent standing) among activity behavior variables generated by the sensors, Table S11: Evaluation results (correlation) for walking time (time spent walking) among activity behavior variables generated by the sensors, Table S12: Evaluation results (performance) for walking time (time spent walking) among activity behavior variables generated by the sensors, Table S13: Evaluation results (correlation) for step counts (the number of steps) among activity behavior variables generated by the sensors, Table S14: Evaluation results (correlation) for active time (time spent activity) among activity behavior variables generated by the sensors, Table S15: Evaluation results (performance) for active time (time spent activity) among activity behavior variables generated by the sensors, Table S16: Evaluation results (correlation) for inactive time (time spent inactivity) among activity behavior variables generated by the sensors, Table S17: Evaluation results (performance) for inactive time (time spent inactivity) among activity behavior variables generated by the sensors, Table S18: Evaluation results (correlation) for rumen pH and rumen temperature generated by the reticulo-rumen bolus sensors.