It is important to consider the implications associated with imputing or replacing missing data

All these clinical scenarios are suggested to contribute to the high number of missing data, which, as can be seen in further analyses, seems in part to be missing at random. However, when alcohol screen result was missing, mean total GCS was 14.13 compared to 12.38 when alcohol was present. This difference indicates that the data is not missing at random. When total GCS scores were missing, the means for age and alcohol screen result also differed, but less so. When total GCS was missing, mean age was 31.47 compared to 33.89 when total GCS was present. Similarly, mean alcohol screen result was .0720 when total GCS was missing compared to .0620 when it was present. The mean total GCS when age was missing was 31.47 compared to 33.89. When alcohol was missing, the mean was 13.05 compared to 12.22 when alcohol was present. The difference is small, which may indicate that the data is indeed missing at random. The means for age, alcohol, and total GCS were very similar when ethnicity was missing or present. Mean age was 33.42 when ethnicity was missing compared to 33.80 when it was present. Mean total GCS was 13.17 when ethnicity was missing compared to 13.69 when present. Similarly, the mean alcohol screen result was .0503 when ethnicity was missing compared to .0631 when present. This indicates the data is missing at random. Lastly, the means for age and total GCS differed depending on whether THC was missing. When THC was missing, mean age was 31.17 compared to 39.49 when THC was present, and mean total GCS was 14.03 compared to 12.62.

In contrast, the means for alcohol screen result were similar whether THC was present or missing. When THC was missing, mean alcohol screen result was .0615 compared to .0628 when present. As explained above, the larger differences in means may indicate that the missing data is not missing completely at random. However, it is important to consider that these differences cannot be solely attributed to the patient’s provision of information, as these are all clinical tests performed by hospital personnel. If data is missing, it is most likely due to the reasons mentioned above, and not necessarily because the patient was choosing to withhold information. The table of cross tabulations of categorical variables versus indicator variables shows information similar to that found in the separate-variance t test table. This table can help determine whether the amount of missing data differs among categories. Males had a documented value for alcohol screen result 30.4% of the time compared to 19.3% for females. This may indicate that there are differences in missing values between males and females. Similarly, males had a documented THC result 28.4% of the time compared to 22.1% of the time for females. This also suggests that the data may not be missing at random. Differences between males and females were smaller for the variables of total GCS and ethnicity, with males having a documented result for total GCS 94% of the time compared to 93.1% for females. Ethnicity was documented 93.2% of the time for male participants and 92.9% of the time for females. The small difference indicates that the data is missing at random. For the variable of race, no drastic differences were noted for ethnicity and THC Combo.

However, the variable of alcohol screen result was found to be largely different in the American Indian group when compared to the other groups. Looking at ethnicity, non-Hispanic patients had a value for alcohol screen result 27.5% of the time compared to 21.4% of the time for Hispanic or Latino patients. Non-Hispanic patients had a THC value documented 26% of the time compared to 23.3% of the time for Hispanic or Latino patients. Total GCS was present 93.8% of the time in the non-Hispanic group compared to 94.7% of the time in the Hispanic or Latino group. This shows that the missing data among these variables can be attributed to chance. When considering the cross tabulation for THC Combo, or THC presence, it was found that patients who had a negative test for THC were more likely to have missing data for alcohol screen result than those who tested positive. For those who tested negative, 55.8% had a value reported for alcohol screen result compared to 86.5% for those who tested positive. This aligns with the clinical scenario in that patients who had a blood sample drawn to test for substances had a higher chance of testing positive than those who did not get a blood sample drawn, as all substances are tested using the same sample and sample time. If a patient was having blood drawn to test for alcohol, they were also likely to be tested for other substances. The results were similar in the positive for drugs table. Patients who tested negative for all other substances were more likely to have missing data for alcohol screen result than those who had a positive test. For those who tested negative, a value was documented for alcohol 53.8% of the time compared to 83.7% of the time in the presence of a positive substance test.

This supports the idea that data for THC Combo may be missing if alcohol screen result is missing, which indicates that the missing values for THC may not be missing completely at random.

When patterns are requested in SPSS, a bar chart displaying the percentage of cases for each pattern is produced. The bar chart seen below in Table 13 shows that almost 40% of the cases in the dataset have Pattern 40, and the missing value patterns chart, as seen in Table 12, shows that this is the pattern for cases with a missing value on alcohol screen result and THC Combo. Pattern 49 represents cases with a missing value on age, alcohol screen result and THC Combo. The bar chart shows that almost 15% of the cases in the dataset have Pattern 1, and the missing value patterns chart shows that this is the pattern for cases with no missing values. Pattern 28 represents cases with a missing value on THC Combo. Pattern 14 represents cases with a missing value on alcohol screen result. The great majority of cases are represented by these four patterns. It is important to note that patterns 21, 51, 43, 45, and 53 are considerably smaller than the first four patterns, and they are similar in size. This means that the patterns of missingness across the variables are somewhat consistent, and that no dominant pattern to the missingness is readily seen.

Based on this extensive analysis, it was determined that the variables total GCS, alcohol screen result and THC Combo are not missing completely at random. When missing values account for less than 5% of a variable, those values can be considered missing at random and listwise deletion can be performed relatively safely. This holds true for all the variables except for THC Combo, positive for drugs, alcohol screen result, age in years, ethnicity and total GCS. These variables, three quantitative and three categorical, were found to have greater than 5% missing values.
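The pattern tabulation described above can be illustrated outside SPSS. The following is a minimal Python sketch, not the procedure used in this study: the function name `missing_patterns`, the variable names and the five example records are all hypothetical, and missing values are assumed to be coded as `None`. It groups cases by the set of variables that are missing and reports the percentage of cases in each group, which is the information the SPSS pattern bar chart conveys.

```python
from collections import Counter

def missing_patterns(records, variables):
    """Return {pattern: percent of cases}; a pattern is the tuple of missing variables."""
    counts = Counter(
        tuple(v for v in variables if rec.get(v) is None)  # variables missing in this case
        for rec in records
    )
    total = len(records)
    # most_common() orders the patterns from most to least frequent
    return {pattern: 100.0 * n / total for pattern, n in counts.most_common()}

# Hypothetical records for illustration only; these are not study data.
records = [
    {"age": 30, "alcohol": None, "thc": None},
    {"age": 41, "alcohol": None, "thc": None},
    {"age": 25, "alcohol": 0.08, "thc": "negative"},
    {"age": 52, "alcohol": 0.0, "thc": None},
    {"age": None, "alcohol": None, "thc": None},
]

patterns = missing_patterns(records, ["age", "alcohol", "thc"])
for pattern, percent in patterns.items():
    label = ", ".join(pattern) if pattern else "no missing values"
    print(f"{percent:5.1f}%  missing: {label}")
```

In this toy example the most frequent pattern is the one in which alcohol and THC are missing together. A pattern that is this frequent is the kind of result that suggests the two variables are not missing independently of each other.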
On observation of the missing value analysis, it was observed that most cases had two variables, alcohol screen result and THC Combo, as missing, perhaps suggesting a relationship, or an effect. Furthermore, Little’s MCAR test revealed that missing data may not be missing completely at random. Deleting cases with missing values can reduce the statistical power of the analysis and result in biased outcomes and estimates. Therefore, the use of multiple imputation is appropriate for this dataset and this study. Another method in SPSS that can be utilized is the Replacing Missing Values method, and within it the Linear Interpolation method will be utilized. The Linear Interpolation method is a simple statistical method used by SPSS which estimates a missing value from the valid values on either side of it, placing the estimate on the straight line that connects them. Using the Replacing Missing Values method in this study will help address the problem of bias and ensure that power is not decreased, because a large majority of the sample will be preserved. Multiple imputation or missing value replacement analyses will avoid bias only if enough variables predictive of missing values are included in the replacement method. If variables that may be predictive of the estimates are not included in the model, for example the effect of age on alcohol result, the replacement computation will underestimate these associations and bias the final analysis.

Therefore, it is preferable to include as many predictive variables as possible in the model when either imputation or replacing missing value methods are utilized. Replacing missing values was utilized to minimize the many problems associated with missing data. The absence of data reduces statistical power and can also lead to bias in the estimation of parameters and analyses. Finally, missing data can diminish the representativeness of the sample and cases. It is important to consider that though replacing or imputing data is a common approach to the problem of missing data, it still does not allow analyses of actual data that is provided by actual participants, or in this case, data entered by abstractors and hospital registry systems. In gaining a larger sample size, and perhaps a more representative sample, confidence is lost that the responses analyzed are the actual responses provided. It is important to note that methods used to account for missing data only provide researchers with the best estimate of what the actual data may have been had it been documented in the first place.

It is this reasoning that influenced the decision to impute only some of the variables with missing data. Replacing missing values is another form of imputation that was selected for this study. Though a multiple imputation process was utilized, it presented a complication in terms of the number of iterations and the subsequent analysis. Since the dependent variable, total GCS, was not selected for imputation or replacement, it was recommended and deemed appropriate to utilize the Replacing Missing Values function in SPSS to establish estimates for a select group of variables with missing data values. The Replacing Missing Values method, a different form of imputation, allows the creation of new variables from existing ones by replacing missing values with estimates computed with a variety of methods. For this study, the Linear Interpolation method was used.
This method utilizes the last valid value before the missing value and the first valid value after the missing value. The variables selected for missing value replacement were age and alcohol screen result. The variable age was selected due to its effect on traumatic brain injury incidence as well as post-TBI outcomes. Additionally, the use of alcohol and other substances is prevalent in young adults, with more than half of those who die from overdoses being younger than 50 years of age. The impact of age on TBI, substance abuse and outcomes could not be overlooked, and omitting this large percentage of cases would bias the analysis results. The variable of alcohol screen result was also important to replace because of the known impact and association alcohol abuse has on TBI incidence and outcomes. Alcohol and TBI are closely associated, with up to 50% of adults noted to drink more alcohol than recommended prior to their injury, and ultimately incurring worse outcomes. The variables of total GCS, THC Combo and positive other drugs were not included. Total GCS is the dependent variable, and having estimates instead of actual data seemed conceptually and logically inappropriate. As the main predictor variables, THC Combo and positive other drugs were also not included, in order to obtain a more accurate and true account of the effects they may have on TBI severity. The Replacing Missing Values method yielded 7872 entries for age, with only 3 missing cases. The mean for age in the new dataset with replaced values was 31.19 years with a standard deviation of 26.1, compared to 33.78 years with a standard deviation of 27.3 for the non-replaced dataset. The Replacing Missing Values method yielded 7822 valid entries for alcohol screen result, compared to 2087 entries in the non-replaced dataset. In the new dataset, alcohol screen result had a mean of .03 and a standard deviation of .0752, with a minimum value of .00 and a maximum value of .66.
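To make the replacement rule concrete, the following is a minimal Python sketch of linear interpolation between the last valid value before a gap and the first valid value after it. It is an illustration under stated assumptions and not the SPSS implementation: the function name `linear_interpolate` and the example values are hypothetical, missing values are assumed to be coded as `None`, and gaps at the very start or end of the series are assumed to stay missing because they lack a valid neighbour on one side.

```python
from typing import List, Optional

def linear_interpolate(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) on a straight line between the nearest valid neighbours."""
    filled = list(values)
    last_valid = None  # index of the most recent non-missing value
    for i, v in enumerate(values):
        if v is None:
            continue
        if last_valid is not None and i - last_valid > 1:
            start = values[last_valid]
            step = (v - start) / (i - last_valid)  # equal increment per case in the gap
            for j in range(last_valid + 1, i):
                filled[j] = start + step * (j - last_valid)
        last_valid = i
    return filled  # leading and trailing gaps remain None

# Hypothetical ages in file order; these are not study data.
ages = [20.0, None, None, 35.0, None]
print(linear_interpolate(ages))  # [20.0, 25.0, 30.0, 35.0, None]
```

Because each estimate depends only on the neighbouring cases in file order, a different sort order of the dataset would produce different replaced values, which is one of the implications to weigh when interpreting the replaced means reported above.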
