Hello AFNI experts,
How do I compare two nested models in AFNI?
More specifically: I have fitted a model with 3dMVM, using three between-subject variables (measured outside the scanner) as predictors. I want to test, for each predictor, whether it makes a unique contribution to the explained variability in the estimated BOLD contrast. So, for each voxel, I want a p-value that reflects the comparison between the full 3dMVM model and a nested model that is identical to the full one except that one predictor is removed. How do I do that?
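To be concrete about what I mean, here is a rough sketch (not an AFNI command, just plain Python/NumPy with made-up data) of the per-voxel nested-model comparison I have in mind, i.e., a partial F-test between the full model and the model with one predictor dropped:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n subjects, one BOLD contrast value per subject
# (a single voxel), and three between-subject predictors.
n = 40
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X_full @ np.array([1.0, 0.5, 0.3, 0.0]) + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# Nested model: same as the full model, but with the last predictor removed.
X_red = X_full[:, :3]

rss_full = rss(X_full, y)
rss_red = rss(X_red, y)

# Partial F-test: does the dropped predictor explain unique variability?
df1 = X_full.shape[1] - X_red.shape[1]   # number of parameters removed (1)
df2 = n - X_full.shape[1]                # residual df of the full model
F = ((rss_red - rss_full) / df1) / (rss_full / df2)
print(F)
```

(As an aside, when only a single predictor is dropped, this F-statistic equals the square of that predictor's t-statistic in the full model, so the two tests are equivalent in that case.)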
Thank you very much,
Unfortunately, there is no ideal way to compare two models under the conventional statistical framework. One simple approach is to check the full model and see whether the statistical evidence (e.g., the t-statistic) for the effect you're questioning is strong or weak, and use that as a rough indication.
Model comparison and validation can be systematically assessed through Bayesian modeling, but that is currently available only for region-based analysis: https://afni.nimh.nih.gov/afni/community/board/read.php?1,157054,157054#msg-157054
Thank you, Gang. I have mild collinearity between my predictors (around r=0.4); that's why I was thinking that model comparison would be a better choice than relying on the t-statistics.
I suppose I could identify ROIs based on clusters of t-statistics, and then validate the unique contribution of each regressor in that ROI using Bayesian modeling (which I will have to read about, from your link, as I have no experience). Does that make sense? Or is it double-dipping?
I have mild collinearity between my predictors (around r=0.4)
No, don’t worry about it: that’s not something you should lose sleep over.
I suppose I could identify ROIs based on clusters of t-statistics, and then validate the unique contribution of each regressor in that ROI using Bayesian modeling (which I will have to read about, from your link, as I have no experience). Does that make sense? Or is it double-dipping?
If you’re happy with the whole brain voxel-wise analysis results, don’t bother trying the region-based approach. It would be more appropriate to define the regions based on some information independent of the current data.
Thank you. I am not sure about the whole-brain analysis results, that’s why I want to make sure that my results are “real”.
This is how I understand it (and please correct me where I’m wrong). On the one hand, regression estimates should represent only the unique contributions of the predictors; shared variability is not considered anyway, so we need not worry about collinearity. On the other hand, with too much collinearity, the estimates are biased, so they no longer necessarily represent the unique contributions.

You say that with r=0.4 we can still trust the estimates. Could you please explain why, or direct me to a reference? I am worried that a future reviewer will point to the correlation between my predictors as a problem. Not to mention that I myself want to understand this and have an idea why, at r=0.4, we are still in the “safe zone”.
Also, assuming that this correlation does add some bias to the estimates, I wonder how it affects the cluster-wise results. Does it make it easier or more difficult to find significant clusters per predictor in a whole-brain analysis? That is, if I found a significant cluster, should I doubt it more, or rest assured that if I found it even under these conditions, it is all the more “real”?
I hope I was clear, thank you very much,
regression estimates should represent only unique contributions of the predictors.
No, that’s not true. Crudely speaking, if two predictors are correlated with each other to some extent, they “split” the amount of data variability they share. This splitting is what estimation by ordinary least squares or maximum likelihood does.
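A small simulation (illustrative only, not AFNI-specific) makes this concrete: with two predictors correlated at about r=0.4 and true coefficients 0.5 and 0.3, the OLS estimates still average out to the true values. Moderate collinearity does not bias the estimates; it only makes them noisier.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate many data sets with two predictors correlated at r ~ 0.4
# and check that the average OLS estimates recover the true coefficients.
n, n_sim, r = 50, 2000, 0.4
b1, b2 = 0.5, 0.3  # true coefficients

est = np.empty((n_sim, 2))
for i in range(n_sim):
    x1 = rng.normal(size=n)
    x2 = r * x1 + np.sqrt(1 - r**2) * rng.normal(size=n)  # corr(x1, x2) ~ r
    y = b1 * x1 + b2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    est[i] = beta[1:]  # keep the two slope estimates

print(est.mean(axis=0))  # close to the true values [0.5, 0.3]
```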
assuming that this correlation does add some bias to the estimates, I wonder how this affects the cluster-wise results.
The decision about which predictors to include should be based on prior information about the potential variables that may account for data variability. Whether some of the predictors are correlated to some extent is just a practical issue of statistical efficiency: if two predictors are heavily correlated, you may need to collect more data to parse apart their contributions than you would with less correlated predictors.
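This efficiency cost can be quantified with the variance inflation factor (VIF). For two predictors with correlation r, the sampling variance of each coefficient is multiplied by 1/(1 - r^2) relative to the uncorrelated case, so the standard error grows by 1/sqrt(1 - r^2):

```python
import numpy as np

# Variance inflation factor for two predictors with correlation r:
# Var(beta) is multiplied by 1/(1 - r**2) relative to uncorrelated
# predictors, so the standard error grows by 1/sqrt(1 - r**2).
for r in (0.0, 0.4, 0.8, 0.95):
    vif = 1.0 / (1.0 - r**2)
    se_inflation = np.sqrt(vif)
    print(f"r = {r:4.2f}  VIF = {vif:5.2f}  SE inflated by {se_inflation:4.2f}x")
```

At r=0.4 the VIF is about 1.19, i.e., the standard errors grow by only about 9%, which is why a correlation of that size is not a real concern; by r=0.8 or above, the inflation becomes substantial.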