Such differences in marginal probabilities prevent the raw cell counts from being directly compared.

To overcome this issue, the LIT package by default provides entropy estimators based on the ML frequency estimates, but allows users to select from a range of bias-corrected frequency estimates available in the entropy package.

Not only can the impact of latent factors on either behavioral measure differ in magnitude, but we can also anticipate that responses may differ in both strength and direction for different subgroups within the herd. Such nonlinear dynamics are easily captured in a model-free MI test, but further inspection of the contingency table is needed to fully characterize such complex bivariate relationships between sensor outputs. If either marginal encoding has roughly the same number of observations in each bin, then the cell counts in the joint contingency table can be directly compared, as under the null we would expect each cell to be equiprobable. For empirically defined encodings, however, bin sizes can vary significantly to better capture the underlying geometry of the univariate data distribution. To better identify which cells in an empirically defined joint encoding are driving a significant overall relationship between two data streams, mutual information can be decomposed into pointwise mutual information (PMI) values. The LIT package provides users the option in the compareEncodings plotting utility to color cells in the joint contingency table by PMI estimate, to better facilitate direct visual comparisons of the encodings.
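The decomposition of mutual information into pointwise values can be illustrated with a minimal sketch. It is written in Python rather than as a call into the LIT package, it uses plug-in (ML) frequency estimates, and the function name and example table are hypothetical. Each cell's PMI is the log ratio of its joint frequency to the product of its marginal frequencies, and the MI estimate is the frequency-weighted sum of those values.

```python
import math


def pointwise_mutual_information(counts):
    """Return (pmi, mi) for a 2-D table of joint cell counts.

    pmi[i][j] = log(p_ij / (p_i * p_j)) using plug-in (ML) frequencies.
    Empty cells are reported as None and contribute nothing to mi.
    """
    n = sum(sum(row) for row in counts)
    row_p = [sum(row) / n for row in counts]
    col_p = [sum(col) / n for col in zip(*counts)]
    pmi, mi = [], 0.0
    for i, row in enumerate(counts):
        pmi_row = []
        for j, c in enumerate(row):
            if c == 0:
                pmi_row.append(None)
                continue
            p_ij = c / n
            value = math.log(p_ij / (row_p[i] * col_p[j]))
            pmi_row.append(value)
            mi += p_ij * value  # MI is the expectation of PMI over cells
        pmi.append(pmi_row)
    return pmi, mi


# Hypothetical joint encoding with unequal bin sizes.
table = [[30, 5, 5],
         [5, 10, 45]]
pmi, mi = pointwise_mutual_information(table)
```

Positive PMI values mark cells that are over-represented relative to the product of the marginals and negative values mark under-represented cells, which is what makes PMI a usable color scale when bin sizes are unequal.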

To further highlight cells of the joint probability distribution that differ significantly from the cell counts expected under the null, users may also specify a probability threshold above which PMI values are not displayed. Significance is determined here by simulating PMI estimates under the null, redrawing from a multinomial distribution whose cell probabilities are the outer product of the marginal distributions. Bivariate tree tests were applied between the time budget encodings, produced using both the noise- and plasticity-penalized dissimilarity metrics and pruned using the more conservative plasticity-penalized mimicry, and the encoding of parlor entry order data produced using data mechanics clustering in previous work. A 2:10 x 2:10 grid was used to determine the optimal resolution for the bivariate relationship, with the optimal metaparameters used to create visualizations of the joint encoding, wherein pointwise mutual information values were used to color cell counts that were significant at the alpha = 0.05 level. To further explore latent factors that might explain significant associations between entry position and time budgets, bivariate tree tests and pointwise mutual information tests were also applied separately to the encodings of both PLF data streams and health records.

Figure 2 provides a visual comparison of time budget encodings for the four candidate dissimilarity metrics. In each heatmap visualization, individual cows are arranged along the row axis and the mutually exclusive behaviors that comprise the overall time budget are ordered along the columns. Each cell within the heatmap is subsequently colored to reflect the proportion of time that a given cow is recorded by the accelerometer system to engage in a specific behavior over the observation window.
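The null simulation used to threshold PMI values, in which tables are redrawn from a multinomial whose cell probabilities are the outer product of the marginals, can be sketched as follows. This is an illustrative Python sketch under stated assumptions and not the LIT implementation: the function names are hypothetical, and the two-sided comparison on the absolute PMI and the add-one p-value correction are choices made here for the example.

```python
import math
import random


def pmi_table(counts):
    """Plug-in PMI for every cell; empty cells are assigned -inf."""
    n = sum(map(sum, counts))
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    return [[math.log(c * n / (rows[i] * cols[j])) if c > 0 else -math.inf
             for j, c in enumerate(r)]
            for i, r in enumerate(counts)]


def null_pmi_pvalues(counts, n_sim=500, seed=1):
    """Per-cell Monte Carlo p-values for PMI under independence."""
    rng = random.Random(seed)
    n = sum(map(sum, counts))
    n_rows, n_cols = len(counts), len(counts[0])
    row_p = [sum(r) / n for r in counts]
    col_p = [sum(c) / n for c in zip(*counts)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    # Null cell probabilities: outer product of the observed marginals.
    weights = [row_p[i] * col_p[j] for i, j in cells]
    observed = pmi_table(counts)
    exceed = [[0] * n_cols for _ in range(n_rows)]
    for _ in range(n_sim):
        sim = [[0] * n_cols for _ in range(n_rows)]
        # One multinomial redraw of n observations over the cells.
        for i, j in rng.choices(cells, weights=weights, k=n):
            sim[i][j] += 1
        sim_pmi = pmi_table(sim)
        for i, j in cells:
            if abs(sim_pmi[i][j]) >= abs(observed[i][j]):
                exceed[i][j] += 1
    # Add-one correction keeps every estimate strictly positive.
    return [[(exceed[i][j] + 1) / (n_sim + 1) for j in range(n_cols)]
            for i in range(n_rows)]


table = [[30, 5, 5],
         [5, 10, 45]]
pvals = null_pmi_pvalues(table)
# Cells whose p-value exceeds the chosen threshold would be left uncolored.
shown = [[p <= 0.05 for p in row] for row in pvals]
```

Because the marginals are re-estimated from each simulated table, the simulated PMI values carry the same plug-in bias as the observed ones, which keeps the comparison like-for-like.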

Few cows dedicated more than half of their time to any one behavioral axis, which is not surprising, given that total lying time in this system is split between the nonactive and rumination axes. Time recorded as eating and time recorded as ruminating were the highest magnitude behavioral axes, but time spent eating demonstrated far greater range and heterogeneity. Time spent nonactive was lower in overall magnitude but still showed a fair amount of heterogeneity across cows. The active and highly active axes, however, were both quite low in magnitude and generally demonstrated less systematic heterogeneity across the herd. The order of cows along the row axis in each heatmap is determined by the dendrogram calculated for each dissimilarity matrix. The dendrogram can be interpreted as an approximate 2D representation of the distribution of the cows within the 5D multinomial space of the time budget, and thus serves to bring out in the heatmap systematic differences in time budget across the herd. Gaps have here been added between rows to indicate branches that have been pruned, such that all cows within a given branch receive the same discrete value in the final time budget encoding.

A cursory appraisal of all four encodings summarized in Figure 2 reveals that, regardless of the dissimilarity metric utilized, there is a considerable amount of heterogeneity in the distribution of overall time budgets across this herd. Looking more closely at the clustering tree produced from the unweighted Euclidean dissimilarity metric in Figure 2A, we see that the higher magnitude eating and rumination axes entirely dominated the first handful of bifurcations of the dendrogram.

Even for users not accustomed to reading dendrograms, this dynamic is clearly animated by parsing through the grid of heatmap visualizations provided by the encodePlot utility. Heterogeneity in the moderate-magnitude nonactive axis appears to have been largely ignored in the first half-dozen bifurcations, with the first 10 clusters extracted from this dendrogram being ultimately quite variable in the nonactivity response. Nor is there clear evidence that either activity axis influenced the first 10 bifurcations of this tree. The Euclidean distance heatmap is also annotated on the row axis with a number of auxiliary data fields for each cow, which included: age, calving date, an estimate of peak lactation, nutrition supplementation treatment, and health status during the observation window. A cursory visual inspection reveals that most clusters appear to be fairly homogeneous with respect to cow age, tenure in the pen, and feed supplementation status. Sick cows, however, appear to be slightly overrepresented in some groups, namely the smaller branches representing the more extreme time budget tradeoffs.

Looking next at the hierarchical clustering results visualized in Figure 2B, the KL distance seems to have provided a slightly more holistic encoding of the data that better balances the input across the five behavioral axes. Again, extremes in eating and rumination drive the first few bifurcations of the tree structure, but tradeoffs between time spent eating and nonactive are considered much earlier in the bifurcation decisions within this tree. Some systematic heterogeneity is also revealed across the herd in the highly active axis, despite its lower magnitude. Unfortunately, the KL distance also appears to have over-stratified cows whose time budgets lie at the extremes.
In particular, the cows with extremely low time spent eating have been divided into clusters that are likely too small and narrowly defined to facilitate cross-sensor inferences in downstream analysis, and thus may obscure important behavioral dynamics in this dataset. The KL distance heatmap is also annotated on the row axis with the variance in observed daily time budgets for each behavioral axis. Given that time budgets have here been normalized and expressed as proportions, the resulting variance terms were quite small in magnitude, and so have here been re-expressed on a log scale, where a more negative value represents a smaller relative magnitude of variation. That all five axes ranged over several orders of magnitude in these variance estimates reveals that there was a significant amount of variability in the relative plasticity of time budgets across days. Visual appraisal revealed very few systematic patterns in this heteroskedasticity across clusters, however, suggesting that differences in relative plasticity in daily time budget observations may be attributed more to the individual than to any specific pattern in overall time budget.

The noise-penalized ensemble weighted distance, visualized in Figure 2C, displays clustering dynamics that fall somewhere in between the two extremes of Figures 2A and 2B. Time spent eating and ruminating still dominate bifurcations nearer the root of the tree, as with the unweighted Euclidean distance, pulling off the most extreme tradeoffs between these axes without over-cutting the tree as with the KL distance. In the later branches of the tree, however, cows with more moderate time budgets are divided with greater input from the nonactive and highly active axes.
While the ensemble-rescaled estimator does appear to have succeeded in curbing the rescaling of dissimilarity estimates at the extremes of the distribution, the noise-penalized ensemble distance did still bifurcate several cows with anomalously high values in the eating, rumination, and nonactive axes into their own clusters of size n = 1.
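The contrast drawn above between the unweighted Euclidean and KL dissimilarities can be made concrete with a small numerical sketch. The text does not state which form of KL distance was used, so the symmetrized form (the sum of both directed divergences) is an assumption here, and the three time budgets are made-up values chosen only so that the same 0.03 shift in proportion falls either on low-magnitude or on high-magnitude axes.

```python
import math


def euclidean(p, q):
    """Unweighted Euclidean distance between two time budgets."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def symmetric_kl(p, q):
    """Symmetrized KL divergence, KL(p||q) + KL(q||p).

    Assumes strictly positive proportions on every axis.
    """
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


# Hypothetical budgets: eating, rumination, nonactive, active, highly active.
base = [0.40, 0.40, 0.15, 0.04, 0.01]
low_axis_shift = [0.40, 0.40, 0.12, 0.04, 0.04]   # 0.03 moved onto a small axis
high_axis_shift = [0.43, 0.37, 0.15, 0.04, 0.01]  # 0.03 moved between large axes

d_euc_low = euclidean(base, low_axis_shift)
d_euc_high = euclidean(base, high_axis_shift)
d_kl_low = symmetric_kl(base, low_axis_shift)
d_kl_high = symmetric_kl(base, high_axis_shift)
```

In this example the Euclidean distance treats the two shifts identically, while the symmetrized KL divergence is roughly ten times larger for the shift involving the low-magnitude axis. That sensitivity to relative rather than absolute change is consistent with the KL encoding both surfacing heterogeneity in low-magnitude axes and over-stratifying budgets near the extremes.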

While isolating these animals into their own branches will effectively exclude them from cross-sensor inferences in downstream analysis, this encoding may still be appropriate if these datapoints represent authentic outliers that cannot be explained by typical variation in the sensor system. The heatmap was also annotated on the row axis with the ensemble variance terms used to penalize the squared distance estimates. We see that, as anticipated, the magnitude of error in the noise-penalized ensemble variance terms is substantially smaller than the variance in observed daily time budgets, confirming that, with so many samples over an extended observation window, measurement error is not contributing substantially to the overall uncertainty in observed time budgets. Closer appraisal of the clear systematic differences in these ensemble variance terms observed across clusters, however, confirms that these penalty terms appear to be effectively mimicking the intrinsic heteroskedasticity in this multinomial sampling space. Ensemble variances calculated for each cow via the plasticity-penalized simulation routine closely matched the variances in observed daily time budget estimates for all five time budget axes, thereby validating the efficacy of the jackknifing routine.

Figure 3 directly contrasts the first 10 clusters extracted from the dendrograms generated by the noise- and plasticity-penalized ensemble weighted distance measures. In this visualization, clusters are numbered in each heatmap from top to bottom, and so directly align with the row and column indices of the contingency table. For example, we can easily confirm from this graphic that the first three cows constituting the first two clusters in the noise-penalized heatmap are the same cows isolated into the third and fourth clusters in the plasticity-penalized heatmap, a determination that can be easily confirmed by zooming into this high-definition rendering to compare Cow ID values.
Further comparison reveals that cluster designations for cows with extremely high time spent eating, extremely low time spent ruminating, and relatively low time spent nonactive are virtually identical. In the plasticity-penalized dendrogram, the extremely low eating time cluster is shrunk by just a few animals compared to the noise-penalized encoding. Additionally, after penalizing for behavioral consistency, the cow with the highest time spent nonactive in the sample was not isolated as an outlier. This bifurcation was instead shifted to the cows with more moderate time budgets, serving to better distinguish between cows with relatively high and only moderate times spent eating. The plasticity-penalized dissimilarity estimator was also notably more generous in assigning cows to the cluster characterized by slightly higher rates of rumination while all other axes remained relatively low, and appeared to place greater emphasis on the nonactive axis to determine the remaining clusters. In spite of these differences, both ensemble weighted dissimilarity metrics have succeeded in producing encodings that provide a more holistic and balanced description of this dataset, and ultimately serve to better visualize heterogeneity in the tradeoffs between all five behavioral axes.

For all dendrograms pruned using the ensemble of simulations that accounted only for measurement noise, an extremely fine-grained encoding was returned. A total of 39 clusters were returned for the unweighted Euclidean distance, 31 for the KL distance, and 38 clusters for the noise-penalized dissimilarity metric. In Figure 3, the heatmap visualization of the noise-penalized encodings helps to illustrate just how far down each branch the pruning algorithm was able to penetrate before the signal was lost to simulated measurement error.
In fact, amongst the first dozen bifurcations in this dendrogram, the only branch not validated was that which would have isolated the cow with the highest observed time spent eating into her own branch. This result is not necessarily surprising, given the extended observational period over which sensor records were collected. With over 1500 minutes of observation for each cow, even using a relatively conservative simulation strategy that very likely overestimated the noise intrinsic to this sensor, we should expect by the CLT that the standard error attributable to measurement error would ultimately be quite small after averaging over so many sampled time points. Consequently, these results reinforce that the sensors themselves should impose few limitations on downstream inferences for this dataset, and that inconsistencies in the environment and the animals themselves should be the true limiting factor for the resolution of this encoding.
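The CLT argument above can be checked with a back-of-the-envelope calculation. The sketch below is a simplification rather than the simulation strategy used in the analysis: it treats each observed minute as an independent Bernoulli trial, which ignores the autocorrelation present in real sensor records, and it takes n = 1500 as the lower bound on observation minutes quoted in the text. The function name and the example proportions are hypothetical.

```python
import math


def proportion_standard_error(p, n):
    """Binomial standard error of an estimated proportion p over n trials."""
    return math.sqrt(p * (1.0 - p) / n)


n_minutes = 1500  # lower bound on per-cow observation minutes
# Hypothetical time budget proportions spanning low to high magnitude axes.
for p in (0.02, 0.15, 0.40):
    se = proportion_standard_error(p, n_minutes)
    print(f"p = {p:.2f}: standard error = {se:.4f}")
```

Even at p = 0.5, where the binomial variance is largest, the standard error at n = 1500 is only about 0.013 on the proportion scale, which illustrates why measurement noise alone leaves so many bifurcations resolvable.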