
Animals | Free Full-Text | The Prediction of Clinical Mastitis in Dairy Cows Based on Milk Yield, Rumination Time, and Milk Electrical Conductivity Using Machine Learning Algorithms



1. Introduction

Clinical mastitis in early lactation can reduce the productivity of cows, causing a temporary or permanent decrease in milk quality and production [1]. Mastitis significantly affects animal welfare at the individual and herd levels and threatens the sustainability of dairy farming [2]. Predicting mastitis at an early stage supports successful disease intervention and reduces the risk of pathogen transmission, thereby potentially reducing the use of antibiotics [3,4,5]. Furthermore, early intervention could alleviate pain or discomfort and hence improve the welfare of cows [6]. Measurement of somatic cell count (SCC) is the most frequently used method to indirectly evaluate subclinical and clinical mastitis at the herd or cow level [7,8]. On most farms, even large commercial ones, the SCC of a cow or herd is normally tested only once a month [9]. The electrical conductivity (EC) of milk, which is recorded at each milking, can also be used as a clinical predictor of mastitis in cows milked using rotary/robotic milkers [10,11,12,13]. Daily rumination time is significantly related to days relative to calving, twinning, subclinical hypocalcemia, subclinical ketosis, and retained fetal membrane [14]. Rumination and activity, coupled with milk yield and body weight, can be used to identify dairy cows with health disorders such as mastitis, metritis, and lameness [15,16,17] and ketosis [18], and rumination time during the peripartum period can be used to evaluate the severity of inflammatory conditions [19].
Various commercially available sensors on intensive farms facilitate the collection of large amounts of data that support estrus detection, synchronization protocol programming, and health management, which help in the efficient production and management of farms [20,21,22]. Recently, many researchers have explored the prediction or detection of subclinical/clinical mastitis in cows milked with automatic milking systems using classical machine learning algorithms such as support vector machine (SVM) [23,24], decision tree [25,26], random forest [25,27], gradient-boosted tree [25,27,28], Naïve Bayes [25,29], logistic regression [25,30], and neural networks [31,32,33,34,35,36]. Only a few studies on automatic milking systems have used random forest, Naïve Bayes, and extreme gradient boosting [3] or SVM [37] algorithms to develop prediction or detection models on real-time data collected from milking parlors and/or management software adopted on the farms.
Deep learning algorithms have been used to detect key body parts of dairy cows and lameness in dairy cows in the Yangling district [38,39]. However, data generated by the precision dairy technology adopted on commercial farms have not previously been used for early warning, detection, or diagnosis of mastitis in dairy cows in Northeast China. This study aimed to verify the ability of machine learning algorithms to efficiently predict clinical mastitis in dairy cows based on real-time data on rumination time and physical activity generated by monitoring collars, coupled with variables such as the EC of milk and milk yield from the rotary milking system.

2. Materials and Methods

This research was part of a large study aimed at developing technology to improve farm management using big data, with a special focus on continuous monitoring, prediction, and precise detection of health disorders in lactating cows and calves in commercial herds in Northeast China using precision dairy technology and rotary/side-by-side milking systems, along with environmental factors. Because mastitis is more likely to occur in early productive life and early lactation, cows were divided into subgroups according to their lactation stage, namely stage 1 (0–28 days in milk (DIM)), stage 2 (29–100 DIM), stage 3 (101–200 DIM), and stage 4 (201–305 DIM). We also accounted for parity to investigate differences in forecast results: the mastitic cows comprised 25, 63, 117, and 242 cows with parity one, two, three, and ≥ 4, respectively, and the healthy cows comprised 128, 753, 571, and 694 cows with parity one, two, three, and ≥ 4, respectively. To obtain reliable results and improve decision-making by farm managers, data were collected from healthy cows with similar DIM and daily milk yield and the same parity as the sick ones. All animal procedures were performed following the guidelines for the care and use of experimental animals at Heilongjiang Bayi Agricultural University (Daqing, China). The animal ethics committee of Heilongjiang Bayi Agricultural University approved the study protocol (FBD201603006).
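The lactation-stage grouping described above can be sketched as a small lookup. This is an illustrative Python sketch only (the study itself used R); the function name `lactation_stage` and the choice to return `None` outside 0–305 DIM are assumptions, not part of the original analysis. The parity counts are copied from the text.

```python
def lactation_stage(dim):
    """Map days in milk (DIM) to the lactation stage (1-4) defined in the text.

    Returns None for values outside 0-305 DIM (an assumption; the text does
    not say how such records were handled).
    """
    if dim < 0 or dim > 305:
        return None
    if dim <= 28:
        return 1  # stage 1: 0-28 DIM
    if dim <= 100:
        return 2  # stage 2: 29-100 DIM
    if dim <= 200:
        return 3  # stage 3: 101-200 DIM
    return 4      # stage 4: 201-305 DIM


# Parity counts as reported in the text.
MASTITIC_BY_PARITY = {"1": 25, "2": 63, "3": 117, ">=4": 242}
HEALTHY_BY_PARITY = {"1": 128, "2": 753, "3": 571, ">=4": 694}
```

Summing the two dictionaries gives 447 mastitic and 2146 healthy cows, which matches the final cohort sizes reported in Section 2.2.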

2.1. Animal Housing and Feeding

We collected original data over 2.5 years (January 2020 to June 2022) from five commercial farms in Northeast China, which serve as the practical base of our university. The farms were located in three cities, at latitudes of 47.42° N to 51.03° N and longitudes of 124.45° E to 129.18° E for the first city, 45.46° N to 46.55° N and 124.19° E to 125.12° E for the second city, and 44.04° N to 46.40° N and 125.42° E to 130.10° E for the third city.

Details about the environment of the animal houses, collars worn by the cows, rotary milking systems, feeding patterns, and management modes used on the five farms are as reported previously [40]. The farms fed cows a total mixed ration twice daily (0500 h and 1300 h), with the feed pushed up whenever necessary and fresh water available at all times. The cows were milked thrice daily (0300 h, 1100 h, and 1900 h) using a milking system (Data Flow, SCR Engineers Ltd., Netanya, Israel). The herds of all five farms were monitored using neck collars (Collar, SCR Engineers Ltd., Netanya, Israel). Three farms adopted the Yimu Cloud management system (Yimu Technology Ltd., Beijing, China), and the other two used the Data Flow Client management system (Data Flow, SCR Engineers Ltd., Netanya, Israel). Two farms raised 900–1000 cows per year on average during the experimental period, and the other three raised 1000–1200, 1100–1200, and 1600–1700 cows, respectively. Overall, the management modes and feeding patterns were similar among the considered herds.

2.2. Data Collection and Study Design

The data collection procedure and health-monitoring program are reported in detail in a companion manuscript [40]. The health-monitoring program was defined by our research team before the start of this study, and the farm staff (for each farm, 1 manager, 3 technicians, and 1 veterinarian with more than 15 years of experience monitoring cow health) were responsible for the daily health monitoring of dairy cows. Clinical signs of mastitis were examined every three days from calving until day 28 by observing the udder and milk (i.e., hard quarter, heat or swelling, clots in milk, flakes, lumps, or clear/yellow milk) and every seven days thereafter throughout lactation. The time from detection to diagnosis was not to exceed 6 h, and details of the animals, including cow identification number, quarter, date and time of diagnosis, and staff involved in the detection and diagnosis of disorders, were entered into the management system software within 5 min after diagnosis.
In the present study, we monitored the weekly records of 3031 healthy cows (without any disease during the experiment) and 587 cows suffering from naturally occurring clinical mastitis, with 685 mastitis events recorded from January 2020 to June 2022. The cows were initially grouped into two categories, mastitic cows and healthy cows, which were coded as 1 and 0, respectively, when used as the dependent variable. The day of diagnosis and treatment was considered d-0, and the original variables were collected from seven (d-7) or three days (d-3) before diagnosis. After data preprocessing, the final data included information about parity, DIM, age at the time of disorders, milk yield, activity, six variables related to rumination time (daily rumination time, rumination at daytime, rumination at nighttime, the ratio of rumination time at daytime to that at nighttime, rumination deviation every 2 h, and absolute values of the weighted rumination variation), and three variables related to the EC of milk (peak electrical conductivity of milk, daily percentage of the electrical conductivity of milk, and standard deviation of the largest change in conductivity over the last three shifts) for a total of 2146 healthy cows and 447 mastitic cows (Table 1 and Table S1).

2.3. Statistical Analyses

Statistical analyses were performed for all variables unless otherwise stated. For each variable, only values within the interval (QL − 1.5 IQR, QU + 1.5 IQR) were used in the statistical analyses and prediction models; values outside this interval were removed as outliers. Here, QL is the lower quartile of the variable, QU is the upper quartile, and IQR is the interquartile range (QU − QL). After removing missing data and outliers, descriptive statistics were computed to characterize the measures of location and variability by means of frequency distribution tables and histograms; thereafter, χ2 tests and t-tests were performed for categorical outcomes and continuous variables, respectively. Results were considered statistically significant at p < 0.05 (trends declared at 0.05 < p ≤ 0.10).
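The interquartile-range rule above can be illustrated with a minimal Python sketch. The function names `iqr_bounds` and `drop_outliers` are hypothetical, and the quartile method (`method="inclusive"` in `statistics.quantiles`) is an assumption, since the text does not state which quartile definition was used.

```python
from statistics import quantiles


def iqr_bounds(values):
    """Return (QL - 1.5*IQR, QU + 1.5*IQR) for a list of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points;
    # the "inclusive" interpolation method is an assumption.
    q_l, _, q_u = quantiles(values, n=4, method="inclusive")
    iqr = q_u - q_l
    return q_l - 1.5 * iqr, q_u + 1.5 * iqr


def drop_outliers(values):
    """Keep only the values strictly inside the open interval of the rule."""
    low, high = iqr_bounds(values)
    return [v for v in values if low < v < high]


# Hypothetical readings with one obvious outlier (100).
readings = [10, 11, 12, 13, 14, 15, 16, 17, 100]
cleaned = drop_outliers(readings)
```

For these made-up readings, the quartiles are 12 and 16, so the bounds are (6, 22) and only the value 100 is removed.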

As the dataset used in this study to forecast mastitis was collected from five intensive farms that used the same collars, rotary milking system, and management system, the dataset was transformed using Z-standardization (that is, the mean of each variable was subtracted from its values, and the result was divided by the standard deviation, such that each transformed variable had a mean of 0 and a standard deviation of 1) for stable and reliable generalization of the prediction model to the environment of other farms. Prediction models based on machine learning algorithms were applied to both the original and transformed datasets for the selection of the optimal prediction model.
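A minimal Python sketch of Z-standardization for a single variable follows. The function name `z_standardize` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def z_standardize(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mu = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(v - mu) / sd for v in values]


# Hypothetical values of one variable.
z_scores = z_standardize([2.0, 4.0, 6.0])
```

For these made-up values, the mean is 4 and the sample standard deviation is 2, so the standardized scores are −1, 0, and 1: centered on 0 with unit standard deviation.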

2.4. Machine Learning Algorithms

We employed nine machine learning algorithms, namely multilayer artificial neural net (MNET), binary logistic regression, SVM, Rpart, random forest, XGboost, AdaBoost, linear discriminant analysis (LDA), and Naïve Bayes, using the R software version 4.1.2 (R Core Team, 2021, https://www.r-project.org/, accessed on 12 June 2022). For each adopted algorithm, “set.seed()” was used to ensure the repeatability of our results, and we randomly divided the data according to the dependent variable “Species” (binary variable; “0” represented “healthy cows” and “1” represented “mastitic cows”) using the “createDataPartition” function. For each machine learning algorithm, a data subset consisting of 75% of the observations was selected as training data to construct the prediction models; the remaining 25% was used as testing data to assess the performance of the models. The parameters for the other eight machine learning algorithms used in this study were set as described previously [40].
The performance of each machine learning algorithm was assessed based on its sensitivity, specificity, accuracy, precision, Matthews correlation coefficient (MCC; an evaluation indicator for binary classification models that is especially suited to imbalanced classes), and area under the receiver operating characteristic (ROC) curve (AUC) value, which are defined as follows:
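The seeded 75/25 split stratified by the dependent variable can be sketched as follows. This is a plain-Python illustration of the general idea, not a reimplementation of the R `createDataPartition` function; the function name `stratified_split`, the default seed, and the rounding rule are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=42):
    """Split row indices into train/test sets, preserving class proportions.

    A fixed seed makes the split reproducible, in the spirit of set.seed() in R.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))  # rounding rule is an assumption
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical labels: 80 healthy cows (0) and 20 mastitic cows (1).
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
```

With these made-up labels, each class is split 75/25 separately, so the training set holds 60 healthy and 15 mastitic cows and the class ratio is the same in both subsets.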

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{Specificity} = \frac{TN}{TN + FP}, \quad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

For definitions of the abbreviations, see the companion article [40]. The description of the process of machine learning algorithms for developing the mastitis prediction model is shown in Figure 1.
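The performance measures defined above can be computed directly from the four cells of a confusion matrix. The following Python sketch uses the standard definitions, where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives; the function name `binary_metrics` and the example counts are hypothetical and are not results from this study.

```python
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, accuracy, precision, and MCC.

    Assumes every denominator except the MCC's is non-zero.
    """
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        # By convention, MCC is set to 0 when its denominator is 0.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical confusion-matrix counts, for illustration only.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these made-up counts, sensitivity is 0.80, specificity is 0.90, accuracy is 0.85, precision is about 0.89, and MCC is about 0.70.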

4. Discussion

In commercial dairy farms, mastitis, a disease of the udder that is typically the result of bacterial infection, is associated with potentially increased use of antimicrobials and associated resistance, thereby affecting the welfare of dairy cows and increasing the rate of culling and/or death. Therefore, accurate and efficient mastitis prediction is valuable for timely intervention, leading to the protection of animal welfare both at individual and herd levels and ensuring associated food safety.

Relative to traditional methods, machine learning algorithms can efficiently model high-dimensional and noisy data and have been successfully used to solve many biological problems. One example is the prediction and detection of subclinical mastitis using measurements gathered by automatic milking systems rather than laboratory tests, which reduces experimental costs without interfering with the daily working routine of farmers or disturbing the cows.

The EC of milk is a measure of its ability to conduct an electric current, that is, the reciprocal of its resistance to the current. Mastitis changes the blood capillary permeability [41], which alters the ion concentrations in milk and hence its conductivity. For decades, this change in the conductivity of milk has been used as an indicator for clinical mastitis [42,43,44], with the frequency of use increasing in the dairy industry. Some studies have reported that EC exceeding 5.5 mS/cm could indicate subclinical mastitis [12,45]. In this study, the average peak EC for mastitic cows in all four subgroups exceeded 5.5 mS/cm, with an average value of 6.02 mS/cm on d-3, 5.99 mS/cm on d-2, and 6.01 mS/cm on d-1. Similar results were observed for milk yield, which was higher in mastitic cows than in healthy cows before the onset of mastitis [46]. In the early lactation period, high-yield dairy cows often experience relatively severe metabolic stress and weakened immune function, which suggests that cows at this stage are at greater risk of mastitis [47].
We also observed a delayed DIM at peak milk yield (PMY) in mastitic cows (62.51 ± 28.87 days) vs. healthy cows (55.17 ± 22.07 days), which was consistent with the results reported by Peiter et al. [48], where the DIM at PMY for mastitic cows in the 0–28 and 29–100 DIM subgroups were 61.5 ± 30.34 and 62.65 ± 27.02 days, respectively.
Consistent with the results reported by Stangaferro et al. [16] and King et al. [49], the daily rumination time, rumination time at nighttime, and daily milk yield started to decrease a few days before the diagnosis of mastitis. In line with our previous report [40], rumination deviation per 2 h, the sum of absolute values of the weighted rumination variation, and peak EC of milk showed a similar pattern of gradual increase before the diagnosis of mastitis. We also considered the daily percentage of change in the EC of milk and the standard deviation of the largest change in conductivity over the last three shifts (provided by the rotary milking system and uploaded to the management software), which differed significantly between sick and healthy cows, signifying that these two variables can be considered when developing a practical mastitis alert system. Although the variance from d-3 to d-0 in the change in EC of milk over the last three shifts did not show as noticeable a trend as the absolute value of weighted rumination variation per 2 h, the two variables made similar contributions to the forecast models.
In first-lactation Holstein-Friesian or Dutch Friesian cows, Barkema et al. [50] found that 30% of clinical mastitis cases occurred in the first 14 DIM. In first-lactation Iranian Holsteins, Moosavi et al. [51] found that although more clinical mastitis cases occurred in the first 74 DIM of lactation than later on, the duration of clinical mastitis was shorter when it occurred during this period. In this study, we divided the cows into four stages, as described in Section 2. For each machine learning algorithm, we presented the results derived using original non-standardized and Z-standardized data for four specific lactation stages to find the optimal models for predicting mastitis in cows one or three days before the actual onset. We also conducted experiments on log-transformed and min–max normalized data, which gave very poor results (the sensitivity, specificity, and accuracy of the logistic regression, SVM, LDA, and Naïve Bayes algorithms were all lower than 0.35). Using seven machine learning algorithms, Ebrahimi et al. [25] developed a prediction model of subclinical mastitis with several milking features measured using an automated monitoring system and SCC measured using an inline detector. They analyzed both non-transformed and Z-standardized datasets and found no significant differences between the two datasets. Ebrahimi et al. [25] designated EC as the most important parameter for predicting subclinical mastitis. The sensitivity of all their results exceeded 90%, whereas the specificity was lower than 50%. In our study, the specificities of all nine machine learning algorithms were higher than their sensitivities. Only the precision of the MNET algorithm modeled using Z-standardized data for cows in the 29–100 DIM group reached 90.91%, which indicates that future studies need to mine more reliable features to keep the false positive rate as low as possible.
A previous study [52] reported that the risk of clinical mastitis increased with parity, which may be because increasing parity increases the chances of infection with pathogens [53], in addition to the natural loss of teat defense mechanisms or immunity with increasing years of service [54]. However, in the present study, parity was not confirmed as a significant indicator for naturally occurring mastitis. The activity of most cows at estrus was significantly affected; however, activity in mastitic cows in the four lactation stages did not display the expected difference, nor did it contribute significantly to any prediction model except the MNET algorithm for data of cows in the 29–100 DIM subgroup.

Because of differences in the type of milking system used on the five farms, some parameters related to milk, such as its composition, temperature, or optical properties, were not included in the current experiment. The low accuracy and precision of the test model, the exclusion of these milk parameters, and the small sample size of clinical mastitis cases pose challenges for the practical application of the prediction model. In future studies, in addition to the variables involved in the present study, other parameters should be mined, and algorithms with better performance should be explored to obtain more reliable and credible prediction outcomes.

Overall, these results indicated that models based on machine learning algorithms are useful tools in the development of prediction and/or detection models for mastitis in dairy cows monitored using collars and milked using a rotary milking system, which provides a broader understanding of some of the signs and symptoms of mastitis, leading to timely and efficient control and better management of this disorder.


