Scaling It Down: Week 4

Friday, March 3, 2017

Week 4

Dealing with Data Issues

If you guys remember, last week I pretty much sat in front of a computer for 35 hours and wrote down the heights and weights of around 630 patients. So basically, this week, along with some other exciting experiences, was mainly focused on "cleaning up the data".

So in the beginning of the week, I thought it would be a good idea to organize the data that had issues. And that means more Excel sheets!

If you look back at the week 2 post, we had 4 main issues: missing data, data outside our time parameter, patients who died within 1 year of transplant, and patients who received multiple transplants. I made a spreadsheet for the patients with each of these issues. Each one of these spreadsheets has an "Excel Number" column, which corresponds to the row the patient's info was in on the original data collection sheet. I also had to black out patient names for obvious reasons.

So this was the spreadsheet for patients who have received multiple transplants, including the transplant type and when this secondary transplant was received in relation to the one we care about. There are two main types of transplants: autologous and allogeneic. Autologous stem cell transplants involve giving a patient his/her own stem cells (which are first extracted and then frozen), and allogeneic stem cell transplants are when you receive healthy stem cells from an outside donor.

It is not uncommon for a patient to receive an autologous stem cell transplant followed by an allogeneic stem cell transplant. Although patients who receive an auto transplant are still at a high risk for infection post-transplant, there are far fewer negative side effects when compared to an allo transplant (no graft versus host disease for auto transplants!). Because patients who receive multiple allo transplants are usually in a much worse state going into their second or third transplant, this might screw up our data. Therefore, if a patient received an auto transplant prior to their allo transplant, we kept the data. But if a patient had multiple allo transplants, we only used the data from their first allo transplant. This lowered the number of patients we had from around 633 to 533.
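
For readers who like to see a rule written out, here is a minimal Python sketch of the "first allo transplant only" rule. It is not the method used for the study (that work was done by hand in Excel); the `Transplant` record, its field names, and the sample patients are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transplant:
    patient_id: str       # hypothetical de-identified ID
    kind: str             # "auto" or "allo"
    performed_on: date


def first_allo_per_patient(transplants):
    """Return {patient_id: earliest allo transplant}, ignoring auto transplants."""
    first = {}
    # Walk the transplants in date order so the first allo seen is the earliest.
    for t in sorted(transplants, key=lambda t: t.performed_on):
        if t.kind == "allo" and t.patient_id not in first:
            first[t.patient_id] = t
    return first


# Made-up example records, not real patient data.
records = [
    Transplant("P1", "auto", date(2010, 1, 5)),   # auto before allo: P1 is kept
    Transplant("P1", "allo", date(2011, 3, 9)),
    Transplant("P2", "allo", date(2012, 6, 1)),   # only this allo is used for P2
    Transplant("P2", "allo", date(2014, 2, 20)),
]
kept = first_allo_per_patient(records)
```

In this sketch, a patient with an earlier auto transplant still contributes their allo transplant, and a patient with two allo transplants contributes only the earlier one.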

Next, we have the patients who were missing data. Sadly, there is no way to work around this, so we were pretty much forced to remove them from the study. Since the data I was collecting was from 2008 to present, there were some patients who were missing 100 day or 1 year weights simply because it hasn't been 100 days or 1 year past their transplant. After taking these patients out, we went from 533 patients to 475.
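
As a rough illustration only, the missing-data check could be sketched in Python like this. The column names (`height_cm`, `pre_weight_kg`, `day100_weight_kg`, `year1_weight_kg`, `transplant_date`), the reason labels, and the sample rows are assumptions made for the example, not the actual spreadsheet layout.

```python
from datetime import date

# Hypothetical column names; the real sheet may be organized differently.
REQUIRED = ("height_cm", "pre_weight_kg", "day100_weight_kg", "year1_weight_kg")


def exclusion_reason(row, today):
    """Return None if the row is complete, else a short reason for excluding it."""
    missing = [name for name in REQUIRED if row.get(name) is None]
    if not missing:
        return None
    days_since = (today - row["transplant_date"]).days
    # A weight can be missing simply because the follow-up date has not arrived yet.
    if "day100_weight_kg" in missing and days_since < 100:
        return "100 day follow-up not reached"
    if "year1_weight_kg" in missing and days_since < 365:
        return "1 year follow-up not reached"
    return "missing data"


today = date(2017, 3, 3)
# Made-up rows, not real patient data.
rows = [
    {"transplant_date": date(2010, 5, 1), "height_cm": 170, "pre_weight_kg": 80,
     "day100_weight_kg": 75, "year1_weight_kg": 78},
    {"transplant_date": date(2016, 9, 1), "height_cm": 165, "pre_weight_kg": 70,
     "day100_weight_kg": 66, "year1_weight_kg": None},
    {"transplant_date": date(2012, 2, 1), "height_cm": None, "pre_weight_kg": 90,
     "day100_weight_kg": 85, "year1_weight_kg": 88},
]
reasons = [exclusion_reason(r, today) for r in rows]
```

Keeping the reason alongside each excluded row makes it easy to count how many patients were dropped for each cause.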

This was the chart for patients who had data outside our time parameter. I decided to include how far off the data was; that way, we would know how far we had to expand our time parameter if we needed more data. Turns out, we have enough data even after taking out these patients, so we will most likely just not include them. This left us with 409 patients.

Finally, we have patients who died early. I sorted the patients who have passed into 4 categories, as you can see in the spreadsheet. We decided not to take these patients out of the study. We can at least calculate pre-transplant BMI for all of these patients, and if we see a trend among them, perhaps that would tell us we need to intervene if patient weights are too low pre-transplant.

Sorting the Data

So the last thing I had to do this week was determine the different BMI classes and percent change in body weight classes. First, let's talk about BMI. I decided to use a pre-existing BMI classification that the World Health Organization uses. If you wanna read more about exactly how BMI is calculated, click here.

I broke down BMI using the 4 main classes (underweight, normal, overweight, and obese), but I didn't break down the data into those subclasses, simply because we don't have enough data to break it down that far.
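
Here is a small Python sketch of the four main classes, using the standard WHO adult cut-offs (below 18.5 underweight, 18.5 up to 25 normal, 25 up to 30 overweight, 30 and above obese). The function names are made up for the example, and it assumes weight in kilograms and height in meters.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def who_bmi_class(value):
    """Map a BMI value onto the four main WHO classes (no subclasses)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


# Made-up example: 70 kg at 1.75 m is a BMI of about 22.9.
example_class = who_bmi_class(bmi(70, 1.75))
```
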

Next, I had to break down the percent BMI change (this is actually the exact same as percent body weight change) and there is no "standard" for this. For now, I broke it down into X>=0%, 0%<X<5%, 5%<X<15%, X>=15%. I might have to adjust these depending on how much data falls into each of these categories.
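
The sketch below, in Python, illustrates two things from the paragraph above. First, since BMI is weight divided by height squared and an adult's height stays fixed, the height cancels out and percent BMI change equals percent weight change. Second, it shows one possible reading of the four classes. That reading is an assumption: the first class is taken to mean weight gain or no change, the other three are taken as sizes of weight loss, and a loss of exactly 5% is placed in the 5% to 15% class because the classes as written do not assign it. The function names and numbers are made up.

```python
import math


def percent_change(before, after):
    """Signed percent change from before to after (negative means a loss)."""
    return (after - before) / before * 100


def change_class(pct):
    """Assumed reading of the four classes; see the caveats in the text above."""
    if pct >= 0:
        return "gain or no change"
    loss = -pct
    if loss < 5:
        return "loss under 5%"
    if loss < 15:   # assumption: exactly 5% lands here
        return "loss of 5% to under 15%"
    return "loss of 15% or more"


# Made-up patient: 1.70 m tall, 80 kg before transplant, 72 kg at follow-up.
height_m = 1.70
weight_pct = percent_change(80, 72)
bmi_pct = percent_change(80 / height_m ** 2, 72 / height_m ** 2)
same = math.isclose(weight_pct, bmi_pct)   # height cancels, so these agree
label = change_class(weight_pct)
```

If the bins turn out to be too empty or too full, only the thresholds inside `change_class` would need to move.
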

I hope this longer post made up for the short post last week. Although I might do some work over spring break, I won't be posting again until week 5. Until then, bye!

23 comments:

Unknown, March 5, 2017 at 6:14 AM
Hey Justin! Nice job on placing the data collected into multiple categories; I am sure it makes things easier that way. Since you have multiple categories now, are you going to look at the correlation within every category and then summarize a big picture based on that, or only focus on the big picture? Great job... You have progressed a lot!

Anonymous, March 6, 2017 at 12:10 PM
Yo Justin! I see your data collecting is coming along quite nicely. It's good that you're using an effective methodology. What inferences can you make so far with the data you've collected and organized?

Unknown, March 6, 2017 at 4:36 PM
Hi Justin! I like the methods you are using to sort your data. Your project is coming along nicely, and it looks like you are finding correlations effectively. I'm looking forward to hearing more about your project!

lo, March 6, 2017 at 9:12 PM
Hey Justin! Super cool! Congrats on finishing the data input! Do the people who did not finish their data have a consequence? I can't wait to see what happens with how you sort the BMIs!

Jack Barth, March 6, 2017 at 10:27 PM
Hey Justin! It seems like our data is really coming along which is really exciting. How are you planning on interpreting the data to draw conclusions from it?

Unknown, March 8, 2017 at 5:33 AM
Hi Justin

It sounds like you've really got a handle on this. Your post is clear, sophisticated, and fabulously articulates all the nuances involved in your project.

Maybe you've already written this elsewhere, but what is the next step after data collection? Start looking at patterns? Will Excel help you do this?

Additionally, how will you be presenting all your information on presentation day (or whatever it's called)? A slideshow?

Unknown, March 13, 2017 at 3:08 PM
Hi Justin! It's so great to see how far your project has progressed. I really like how you walked through each step of your data sorting. What other difficulties did you encounter while going through the data? I am looking forward to next week and have a good spring break!

Unknown, March 13, 2017 at 5:57 PM
Hey Justin.
It has been nice reading your posts since they have been a nice length and have also been easy to comprehend. Also, nice work on your data and excluding those who would screw up your data. All of your data collection has been very neat and organized, keep up the work!

Unknown, March 13, 2017 at 7:16 PM
Hi Justin! Is it possible that infections could occur from the first transplant, and if so, how reliable could the data be if you did not know which patients may have lost weight as a consequence?

Unknown, March 13, 2017 at 9:00 PM
Hey Justin! It's nice to see you're still working hard. It's good to see you're lowering the numbers on your patients. However, could there be other variables that could affect weight loss in the patients?

Dr. Sanghamitra Sahu, March 13, 2017 at 9:48 PM
What kind of controls are you using for your experiments and why?

Unknown, March 14, 2017 at 6:23 PM
Hey Justin! I'm glad to see that you're keeping organized. Collecting so much data probably wasn't that simple, so keeping all of it organized is very important. Thanks!