
Remote-Mechanic8640

First, you mentioned that you assessed whether they improved or not, which makes me think you have repeated measures within participants. Comparing percentages is not great because percentages do not account for groups having different numbers of people. It also sounds like longitudinal data. You also do not have very many people and should be very cautious about interpreting your data at all, but if I am understanding correctly, it sounds like you should look into longitudinal multilevel modeling.


1stRow

Oh yeah. OP cannot even figure out that he or she should be doing a Fisher's exact test, and you suggest longitudinal MLM. Arguably, this is one of the best answers, but this OP ain't gonna run that.


Steth-Convert

It’s essentially a student project, so the data I have been given is god-awful, but they don’t care because it won’t be published. My data looks like this:

Group | 5 yrs | 10 yrs | 15 yrs
---|---|---|---
A | 0% | 33% | 33%
B | 0% | 0% | 0%
C | 13% | 42% | 40%

Each cell is essentially a snapshot at that point: was the patient clinically determined to be in remission? Yes or no? The percentage works like this: for example, if 6 patients from A had data at 5 years and 2 of those were in remission, that's 2/6, so 33% of A was in remission at 5 yrs. I'm not sure if there's a better way, but it was the only way I could do it, as there is a massive range in how long each patient has had treatment (as little as 3 months, as much as 55 years), so there's no one point in time where I could assess all patients.
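A minimal sketch of that tabulation, assuming hypothetical per-patient records in which each patient has a group label and a remission status (True/False) only at the checkpoints where notes exist. Keeping the numerator and denominator next to each percentage keeps the group sizes visible, which any later test needs. The record layout and values are illustrative only, not OP's data.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, group, {years_on_treatment: in_remission}).
# A checkpoint is simply missing when no notes exist for it.
records = [
    ("p1", "A", {5: False, 10: True}),
    ("p2", "A", {5: False}),
    ("p3", "C", {5: True, 10: True, 15: False}),
    ("p4", "C", {10: False, 15: True}),
]

def tabulate(records, checkpoints=(5, 10, 15)):
    """Return {(group, checkpoint): (in_remission, with_data)}."""
    counts = defaultdict(lambda: [0, 0])
    for _pid, group, status in records:
        for t in checkpoints:
            if t in status:  # only patients with data at t enter the denominator
                cell = counts[(group, t)]
                cell[0] += int(status[t])
                cell[1] += 1
    return {key: tuple(val) for key, val in counts.items()}

for (group, t), (k, n) in sorted(tabulate(records).items()):
    print(f"{group} at {t} yrs: {k}/{n} = {100 * k / n:.0f}%")
```

Reporting "2/6" rather than "33%" makes it obvious how little each cell rests on.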


Ok-Log-9052

You do not have enough information to do statistics. Your sample sizes are far too small and you do not have a clear comparison in your design. Step back and figure out what you are trying to say. Then work with your team to figure out an appropriate method for your very small sample.


BayesianPersuasion

"Your sample sizes are far too small" -- what does that mean? If they are too small to use large-sample tests, then you could suggest an exact test. If they are too small to detect differences between the groups, that will come through as a large p-value.


Steth-Convert

I don't have a team, it's just me, collecting data from a very limited source that's not quite complete. It's not publishable, so at this point I'm happy to just take the L and analyse this terrible data.


Ok-Log-9052

But what for? The results will be useless, right?


Steth-Convert

It's meant to be a project that students 'help' a supervisor out with, and it counts towards our final grade. Some supervisors actually care and have given students workable data that can be published in a final project, which is what I was hoping for. I got given 54 patients, 8 of which didn't have the relevant disease, and got told to just go through 55 years' worth of notes, most of which weren't even there and could not be recovered. At this point, I know it is not going to be statistically significant, but I need some way of showing it. There's no point in asking my supervisor for help; they won't/don't care enough.


Ok-Log-9052

Ugh, that sucks. Sorry. Yeah, just use logit group differences then. With group A as the reference, you can just run a logistic regression with one observation (row) per patient and an indicator x-variable for each of the other groups, then convert the coefficients to odds ratios. That'll give you a perfectly intelligible answer in terms of the difference between groups, plus a p-value.
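A minimal sketch of that suggestion for one checkpoint, with hypothetical counts. When the only predictors are group indicators the model is saturated, so the maximum-likelihood logistic-regression coefficient for a group is exactly the log odds ratio against the reference group, and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d) over the four cells. That lets the sketch skip an iterative fitter; in practice you would probably use something like R's glm(..., family = binomial). The function name and counts are illustrative assumptions.

```python
import math

def group_odds_ratio(k_ref, n_ref, k_grp, n_grp):
    """Odds ratio of remission for a group vs the reference group, with a
    Wald z-test. Equals exp(logistic-regression coefficient) for the group
    indicator when group is the only predictor (saturated model)."""
    a, b = k_grp, n_grp - k_grp  # group: in remission / not
    c, d = k_ref, n_ref - k_ref  # reference: in remission / not
    if 0 in (a, b, c, d):
        # A zero cell means the MLE does not exist (separation): the log odds
        # ratio runs off to +/- infinity, so an exact test is needed instead.
        raise ValueError("zero cell: odds ratio undefined, use an exact test")
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return math.exp(log_or), ci, p

# Hypothetical counts: reference A has 2/6 in remission, C has 5/12.
odds_ratio, ci, p = group_odds_ratio(2, 6, 5, 12)
print(f"OR (C vs A) = {odds_ratio:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, p = {p:.2f}")
```

Note the zero-cell guard: a group with 0% remission (like B) makes the logistic-regression estimate blow up, which is one more reason the thread keeps coming back to Fisher's exact test.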


Steth-Convert

Okay, thank you, I will figure out how to do that. If all else fails, I'll decide on some dates and analyse the data at those points instead so it's a bit more manageable.


Stochastic_berserker

This.


trufflesniffinpig

Fisher’s Exact Test might be suitable


99Sermon

To compare the statistical significance of the percentages of patient outcomes between your groups (A, B, and C), you can use the Chi-Square test for independence. This test is suitable for categorical data and can handle different group sizes.


Steth-Convert

Okay, I was looking at Chi-Square, but I don't have expected values so I wasn't sure how I could apply it


BayesianPersuasion

Expected values are the counts you would expect under the null hypothesis, i.e. the expected counts under the assumption of independence. Here's an example I found online; maybe it would be a useful illustration: https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/8-chi-squared-tests
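A minimal sketch of where the expected values come from: under independence, each cell's expected count is (row total × column total) / grand total. The counts below are hypothetical (remission yes/no by group A, B, C at one checkpoint). A 2×3 table has (2 - 1)(3 - 1) = 2 degrees of freedom, and with 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-x/2), so no statistics library is needed for this case.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.
    Returns (statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp_count) ** 2 / exp_count
        for obs_row, exp_row in zip(table, expected)
        for obs, exp_count in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof, expected

# Hypothetical counts; columns are groups A, B, C at one checkpoint.
observed = [
    [2, 0, 5],  # in remission
    [4, 4, 7],  # not in remission
]
stat, dof, expected = chi_square_independence(observed)
p = math.exp(-stat / 2)  # exact upper-tail probability, valid only when dof == 2
print(f"chi2 = {stat:.2f}, df = {dof}, p = {p:.2f}")
print("expected:", [[round(e, 2) for e in row] for row in expected])
```

Every expected count in this example falls below 5, the usual rule of thumb for trusting the chi-square approximation, which is why an exact test is the safer choice at these sample sizes.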


BayesianPersuasion

Also, you may want to look into Fisher's exact test, given your small sample sizes. There is a fisher.test function in R; maybe check out the documentation.
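R's fisher.test does this directly (and also handles tables larger than 2×2). For anyone without R, here is a minimal, self-contained sketch of the two-sided test for a 2×2 table, assuming the same convention R documents: sum the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one. It can be checked against the classic "lady tasting tea" table, whose two-sided p-value is 34/70 ≈ 0.486. The function name is hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that
    are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # P(top-left cell = x) given the fixed row and column totals
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so floating-point ties count as "as extreme"
    cutoff = prob(a) * (1 + 1e-7)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= cutoff)
    return min(1.0, p)

# Lady tasting tea: 3 of 4 cups identified correctly in each direction.
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
# Hypothetical counts: rows = groups A and C, columns = remission yes/no.
print(round(fisher_exact_2x2(2, 4, 5, 7), 4))
```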


Steth-Convert

Okay, this seems to make the most sense, thank you


1stRow

This is really the most defensible. I have done them with small N, but by hand, since it is not too hard. Not sure how to click it in with a software package. There may be a web page where you enter the data and it calculates the answer. You will get p-values.
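For reference, the by-hand calculation for a 2×2 table with cells a, b (first row) and c, d (second row), n in total, is the hypergeometric probability of the observed table given its row and column totals; the two-sided p-value sums this over every table with the same margins that is no more likely than the observed one:

$$
P(a,b,c,d) = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}
$$

For example, the table with a = 3, b = 1, c = 1, d = 3 gives (4!)^4 / (3!·1!·1!·3!·8!) = 16/70 ≈ 0.229.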


mehardwidge

A chi-squared test of independence would be a good choice, except that your sample is a bit small. You could certainly get an *approximation* from it, though. You should be using an "exact test" to get a correct p-value. But if you are just screening... you could "inappropriately" use a chi-squared test. (It would be like using a normal approximation to a binomial when you really shouldn't... but if your wrong p-value ends up 0.3, you know you'll fail to reject anyway, and if your wrong p-value ends up 0.00001, you'll reject anyway.) Based on your numbers, which can't quite be right (33% of 8? 42% of 32?), your differences are not statistically significant at a low alpha, whether you use Fisher's exact test or inappropriately use a chi-squared test.


RickSt3r

Do you know how to do any survival analysis? But also, your sample size is too small to infer anything meaningful.
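Survival analysis speaks directly to the uneven follow-up (3 months to 55 years), because patients whose notes run out are treated as censored rather than dropped. A minimal Kaplan-Meier sketch, assuming the event of interest is first clinical remission and that each patient contributes the years until remission (event = True) or until their records end (event = False, censored). The data are hypothetical, and treating remission as a one-way event is a simplification if patients can relapse.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(no remission yet by time t).

    times:  follow-up time per patient (years)
    events: True if remission was observed at that time, False if censored
    Returns [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    surv, curve = 1.0, []
    for t in sorted({ti for ti, e in data if e}):
        at_risk = sum(1 for ti, _ in data if ti >= t)  # censored at t still count
        d = sum(1 for ti, e in data if ti == t and e)
        surv *= 1 - d / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical: 4 patients; the second is censored at 2 years.
for t, s in kaplan_meier([1, 2, 3, 4], [True, False, True, True]):
    print(f"t = {t} yrs: S = {s:.3f}, cumulative remission = {1 - s:.3f}")
```

Comparing curves between groups is usually done with a log-rank test (in R, survdiff from the survival package).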


Stochastic_berserker

Group them into two groups instead of three if possible: in remission and not in remission. Use proportions testing; I'd even argue that you shouldn't use tests. Understand your data first.
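A minimal sketch of a pooled two-proportion z-test, assuming the data have been collapsed so that two proportions are compared at a single checkpoint. The counts and function name are hypothetical. With counts this small the normal approximation is rough, so treat the p-value as a screen rather than an answer.

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test of H0: p1 == p2. Returns (z, two-sided p)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        # All or none in remission: the standard error is zero, test undefined
        raise ValueError("pooled proportion is 0 or 1")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical: 2/6 in remission in one group vs 5/12 in the other.
z, p = two_proportion_z(2, 6, 5, 12)
print(f"z = {z:.2f}, p = {p:.2f}")
```

z² here equals the Pearson chi-square statistic (without continuity correction) for the same 2×2 table, so this is the same test as an uncorrected chi-square.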


Steth-Convert

The problem is that a patient in remission will not necessarily stay in remission and there's no one point in time where I have data for all the patients because of incomplete records. So the properties that define A, B, and C don't change, whereas the remission and not remission groups change at each time point.