Dealing with Data Issues
If you guys remember, last week I pretty much sat in front of a computer for 35 hours and wrote down the heights and weights of around 630 patients. So basically, this week, along with some other exciting experiences, was mainly focused on "cleaning up the data".
So in the beginning of the week, I thought it would be a good idea to organize the data that had issues. And that means more Excel sheets!
If you look back at the week 2 post, we had 4 main issues: missing data, data outside our time parameter, patients who died within 1 year of transplant, and patients who received multiple transplants. I made a spreadsheet for the patients with each of these issues. Each one of these spreadsheets has an "Excel Number" column, which corresponds to the row the patient info was in on the original data collection sheet. I also had to black out patient names for obvious reasons.
It is not uncommon for a patient to receive an autologous stem cell transplant followed by an allogeneic stem cell transplant. Although patients who receive an auto transplant are still at a high risk for infection post-transplant, there are far fewer negative side effects compared to an allo transplant (no graft versus host disease for auto transplants!). Because patients who receive multiple allo transplants are usually in a much worse state going into their second or third transplant, this might screw up our data. Therefore, if a patient received an auto transplant prior to their allo transplant, we kept the data. But if a patient had multiple allo transplants, we only used the data from their first allo transplant. This lowered the number of patients we had from around 633 to 533.
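For anyone who would rather see that rule as code than as spreadsheet work, here is a minimal Python sketch. It is not what I actually did (I did this by hand in Excel); the `Transplant` record, its field names, and the example rows are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transplant:
    patient_id: int  # hypothetical ID (real names are blacked out)
    kind: str        # "auto" or "allo"
    when: date       # transplant date


def first_allo_per_patient(transplants):
    """Map each patient to their earliest allo transplant.

    Auto transplants are skipped, so a prior auto transplant
    never causes a patient's allo data to be dropped.
    """
    first = {}
    for t in transplants:
        if t.kind != "allo":
            continue
        current = first.get(t.patient_id)
        if current is None or t.when < current.when:
            first[t.patient_id] = t
    return first


# Made-up example rows, not real patient data.
records = [
    Transplant(1, "auto", date(2009, 3, 1)),
    Transplant(1, "allo", date(2010, 6, 15)),   # kept: auto came first
    Transplant(2, "allo", date(2011, 1, 10)),   # kept: first allo
    Transplant(2, "allo", date(2012, 8, 20)),   # dropped: second allo
]
kept = first_allo_per_patient(records)
for pid, t in sorted(kept.items()):
    print(pid, t.kind, t.when)
```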
Next, we have the patients who were missing data. Sadly, there is no way to work around this, so we were pretty much forced to remove them from the study. Since the data I was collecting was from 2008 to present, there were some patients who were missing 100-day or 1-year weights simply because it hasn't been 100 days or 1 year since their transplant. After taking these patients out, we went from 533 patients to 475.
This was the chart for patients who had data outside our time parameter. I decided to include how far off the data was; that way, we would know how far we had to expand our time parameter if we needed more data. Turns out, we have enough data even after taking out these patients, so we will most likely just not include them. This left us with 409 patients.
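To keep track of how many patients each cleanup step removed, here is a small Python tally using the counts from this post. The starting number is approximate ("around 633"), and the step labels are just my own shorthand.

```python
# Patient counts after each cleanup step, as reported in this post.
steps = [
    ("start (approximate)", 633),
    ("only first allo transplant kept", 533),
    ("missing data removed", 475),
    ("data outside time parameter removed", 409),
]

# Difference between consecutive counts = patients removed at that step.
removed = [
    (label, before - after)
    for (_, before), (label, after) in zip(steps, steps[1:])
]

for label, count in removed:
    print(f"{label}: -{count}")

total_removed = steps[0][1] - steps[-1][1]
print("total removed:", total_removed)
```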
I broke down BMI using the 4 main classes (underweight, normal, overweight, and obese), but I didn't break down the data into those subclasses, simply because we don't have enough data to break it down that far.
So this was the spreadsheet for patients who have received multiple transplants, including the transplant type and when this secondary transplant was received in relation to the one we care about. There are two main types of transplants: autologous and allogeneic. Autologous stem cell transplants involve giving a patient his/her own stem cells (which are first extracted, then frozen), and allogeneic stem cell transplants are when you receive healthy stem cells from an outside donor.
Finally, we have patients who died early. I sorted the patients who have passed into 4 categories, as you can see in the spreadsheet. We decided not to take these patients out of the study. We can at least calculate pre-transplant BMI for all of these patients, and if we see a trend among them, perhaps that would tell us we need to intervene if patient weights are too low pre-transplant.
Sorting the Data
So the last thing I had to do this week was determine the different BMI classes and percent change in body weight classes. First, let's talk about BMI. I decided to use a pre-existing BMI classification that the World Health Organization uses. If you wanna read more about exactly how BMI is calculated click here.
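In case it helps, here is a rough Python sketch of the calculation and the four main WHO classes (underweight, normal, overweight, obese). The function names are my own, and I'm assuming weight in kilograms and height in meters; the cutoffs (18.5, 25, 30) are the standard WHO ones for adults.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def who_bmi_class(value):
    """Sort a BMI value into one of the four main WHO classes."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


# Made-up example: a 70 kg patient who is 1.75 m tall.
example_bmi = bmi(70.0, 1.75)
print(round(example_bmi, 1), who_bmi_class(example_bmi))
```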
Next, I had to break down the percent BMI change (this is actually exactly the same as percent body weight change, since a patient's height doesn't change), and there is no "standard" for this. For now, taking X as the percent of weight lost, I broke it down into X<=0% (no weight loss or weight gained), 0%<X<5%, 5%<=X<15%, and X>=15%. I might have to adjust these depending on how much data falls into each of these categories.
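Here is a quick Python sketch of that binning, plus a check that percent BMI change really does match percent weight change when height stays the same. A few assumptions on my part: X is the percent of weight lost (so no loss or a gain is X <= 0), a loss of exactly 5% goes in the 5-15% group, and the letters B through D are just my labels following category A. The numbers are made up.

```python
import math


def percent_loss(pre, post):
    """Percent lost going from `pre` to `post` (negative means a gain)."""
    return (pre - post) / pre * 100.0


def loss_category(x):
    """Bin percent weight lost into four groups (cutoffs may change later)."""
    if x <= 0:
        return "A"  # no weight loss, or weight gained
    if x < 5:
        return "B"  # lost less than 5%
    if x < 15:
        return "C"  # lost 5% up to (not including) 15%
    return "D"      # lost 15% or more


def bmi(weight_kg, height_m):
    return weight_kg / height_m ** 2


# Made-up patient: 80 kg before transplant, 72 kg after, 1.80 m tall.
pre_kg, post_kg, height_m = 80.0, 72.0, 1.80
weight_loss = percent_loss(pre_kg, post_kg)
bmi_loss = percent_loss(bmi(pre_kg, height_m), bmi(post_kg, height_m))

# Height cancels out of the ratio, so the two percentages agree.
same = math.isclose(weight_loss, bmi_loss)
print(loss_category(weight_loss), same)
```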
I hope this longer post made up for the short post last week. Although I might do some work over spring break, I won't be posting again until week 5. Until then, bye!
Hey Justin! Nice job on placing the data collected into multiple categories; I am sure it makes things easier that way. Since you have multiple categories now, are you going to look at the correlation within every category and then summarize a big picture based on that, or only focus on the big picture? Great job... You have progressed a lot!
Well, to be honest, we have a statistician doing most of the actual number crunching, obtaining a p-value, etc. However, I do know that we have to look at the correlation within each category before we make any claims. We have to know specifically which initial BMI groups or %BMI lost groups correspond to which outcomes.
Yo Justin! I see your data collecting is coming along quite nicely. It's good that you're using an effective methodology. What inferences can you make so far with the data you've collected and organized?
Hey Spencer, I thought that it would be the patients who fell into category A (no weight loss or weight gained) who would have the most positive outcomes. I also predict, based on previous research, that patients who are obese coming into transplant have it worse off than those who are normal or underweight. Hope that helped!
Hi Justin! I like the methods you are using to sort your data. Your project is coming along nicely, and it looks like you are finding correlations effectively. I'm looking forward to hearing more about your project!
Thanks!
Hey Justin! Super cool! Congrats on finishing the data input! Do the people who did not finish their data have a consequence? I can't wait to see what happens with how you sort the BMIs!
As much as I would have liked to include those patients or perhaps even follow one along through the course of their transplant, I am limited by the time constraints of the project. So those patients will be taken out of the study.
Hey Justin! It seems like our data is really coming along which is really exciting. How are you planning on interpreting the data to draw conclusions from it?
Hi Jack, sadly the statistical analysis of the data is outside the time and scope of my project, so we have a statistician to do it for us. However, after the results come back in a few weeks, I will meet with the statistician again to discuss briefly how the data was analyzed and what conclusions can be drawn.
Hi Justin
It sounds like you've really got a handle on this. Your post is clear, sophisticated, and fabulously articulates all the nuances involved in your project.
Maybe you've already written this elsewhere, but what is the next step after data collection? Start looking at patterns? Will Excel help you do this?
Additionally, how will you be presenting all your information on presentation day (or whatever it's called)? A slideshow?
Hi Mr. Covalciuc, the next step would be to observe the analysis of the data after the statistician does his work. I forgot the name of the program the statisticians use when analyzing large data sets, so I will get back to you on that. And yes, I will be presenting my information in a PowerPoint on presentation day.
Hi Justin! It's so great to see how far your project has progressed. I really like how you walked through each step of your data sorting. What other difficulties did you encounter while going through the data? I am looking forward to next week and have a good spring break!
That's pretty much all the issues with data that have come up!
Hey Justin.
It has been nice reading your posts since they have been a nice length and have also been easy to comprehend. Also, nice work on your data and excluding those who would screw up your data. All of your data collection has been very neat and organized, keep up the work!
Thanks!
Hi Justin! Is it possible that infections could occur from the first transplant and if so how reliable could the data be then if you did not know which patients may have lost weight as a consequence?
Well, this is definitely one of the factors that we take into consideration. Patients post-transplant are neutropenic, so they are susceptible to all sorts of viruses and fungi after the transplant. This is basically the idea behind the study. We know patients suffer from these things post-transplant, so the question is at what point the weight loss becomes lethal and intervention is needed.
Hey Justin! It's nice to see you're still working hard. It's good to see you're lowering the numbers on your patients. However, could there be other variables that could affect weight loss in the patients?
Yes of course! We will never be able to pinpoint the exact cause of weight loss for all of these patients. But that's why we need a low p-value when doing the statistical analysis!
What kind of controls are you using for your experiments and why?
Hey Justin! I'm glad to see that you're keeping organized because collecting so much data probably wasn't that simple so keeping all of it organized is very important. Thanks!
Thanks!