3dMVM beginner question: one group, two qvars

Greetings, venerable AFNIstas-

I’m new to the 3dMVM game, so please feel free to direct me to a previous thread if you’ve already answered this. I just want to make sure that our coding matches our questions.

We have two quantitative variables per subject: 1) CSF concentration of the dopamine metabolite homovanillic acid (HVA); and 2) age, which tends to be weakly but significantly correlated with HVA. Within a single group, we’d like to look at the following in relation to some rsfMRI indices:

  1. HVA, with the relation between age and the rsfMRI DV accounted for
  2. age, with the relation between HVA and the rsfMRI DV accounted for

It seems that this could be handled by the following:


3dMVM \
-prefix Age_HVA \
-bsVars "Age+HVA" \
-qVars "Age,HVA" \
-jobs 4 \
-dataTable \
Subj Age HVA InputFile \
AdP-003-1 54 211 /halfpipe/sub-MDD003/func/task-rest/sub-MDD003_task-rest_feature-dualReg_map-FINDIca_component-12_stat-effect_statmap.nii.gz \
...

Aye or nay?

And, a follow-up question:

What if I’m interested not in how HVA and age might uniquely account for variability in the rsfMRI data but in the complement to that: how the shared variance in age and HVA could relate to the imaging indices? It's probably most efficient to handle this first on the side of the two predictor variables, i.e., calculate each subject’s projection onto the first principal component of age and HVA and then use that as a qVar in 3dMVM.
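A minimal sketch of that PCA step (hypothetical age/HVA values, plain NumPy); the per-subject PC1 scores could then be fed to 3dMVM as a qVar:

```python
import numpy as np

# Hypothetical age and HVA values for a handful of subjects
age = np.array([54.0, 61.0, 47.0, 58.0, 50.0])
hva = np.array([211.0, 180.0, 240.0, 195.0, 225.0])

# z-score each variable so neither dominates the PCA by scale
X = np.column_stack([age, hva])
Xz = (X - X.mean(axis=0)) / X.std(axis=0)

# First principal component via SVD of the standardized data
U, s, Vt = np.linalg.svd(Xz, full_matrices=False)
pc1_scores = Xz @ Vt[0]  # per-subject projection onto PC1
```

Each entry of `pc1_scores` would go in the subject's qVar column of the -dataTable.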

Sorry. I feel like this is turning into stats therapy more than AFNI advice. Feel free to direct me to other resources you’ve either developed or found useful.

Paul, your 3dMVM specification looks fine to me. However, you may consider using 3dttest++ or 3dRegAna for your next question.

What if I'm interested not in how HVA and age might uniquely account for variability in the rsfMRI data but in the complement to that: how the shared variance in age and HVA could relate to the imaging indices?

Check out this thread and see if it helps: sums of squares - Where is the shared variance between all IVs in a linear multiple regression equation? - Cross Validated

Thanks, Gang. The Venn diagram-based discussion was really nice.

One addition to the thread:

For each of the qvars, is it possible to get 3dMVM to output a t- or r-statistic?

Thanks much!

For each of the qvars, is it possible to get 3dMVM to output a t- or r-statistic?

Add something like the following to get t-statistic:

-num_glt 2 \
-gltLabel 1 Age -gltCode 1 'Age :' \
-gltLabel 2 HVA -gltCode 2 'HVA :' \

Then you can convert t-statistic to r afterward.
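For reference, that conversion is r = t / sqrt(t^2 + DF), where DF is the degrees of freedom of the t-statistic (squaring it gives the R^2 = t^2/(t^2 + DF) form that appears later in this thread); a one-function sketch:

```python
import math

def t_to_r(t, df):
    """Convert a regression t-statistic to the corresponding r,
    via r = t / sqrt(t^2 + df). The sign of t is preserved."""
    return t / math.sqrt(t * t + df)

# e.g. a t-statistic of 2.5 with 115 degrees of freedom
r = t_to_r(2.5, 115)
```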

Hi Gang--

Following up...

However, you may consider using 3dttest++ or 3dRegAna for your next question.

Do you have suggestions for how to implement a shared-variance analysis with 3dttest++ or 3dRegAna?

I would think it could be done with just 3dMVM, with a little work beforehand in Matlab or Excel. For example, say we have predictor variables X and Y that are correlated, and we want to see the relation of their shared variance to Z. What if I were to enter two qVars into 3dMVM: 1) X residualized against Y (i.e., X with no shared variance with Y in it); and 2) just X? Wouldn't the marginal effect of X in this model give me what I'm looking for? Check my thinking?
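A minimal sketch of the residualization step described above (synthetic data, plain NumPy): regress X on Y with an intercept and keep the residuals, which are X stripped of its shared variance with Y.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)
x = 0.6 * y + rng.normal(size=100)  # X correlated with Y

# Residualize X against Y: fit x ~ 1 + y by least squares, keep residuals
A = np.column_stack([np.ones(100), y])
beta, *_ = np.linalg.lstsq(A, x, rcond=None)
x_resid = x - A @ beta  # uncorrelated with y by construction
```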

Many thanks!

Paul

Hi Paul,

how the shared variance in age and HVA could relate to the imaging indices?

How to quantify the relatedness in this context? One approach is to use the coefficient of determination R^2 in a regression model that measures the proportion of the variation in the response variable Z that is predictable from the explanatory variable(s).

With two explanatory variables X and Y, you can construct three separate models for the response variable Z:

  1. Z ~ X,
  2. Z ~ Y, and
  3. Z ~ X + Y.

Then obtain R_0x^2 from model 1), R_0y^2 from model 2), and R_1x^2 and R_2y^2 from model 3). Presumably,

R_0x^2 + R_0y^2 ≥ R_1x^2 + R_2y^2

You can use 3dRegAna to obtain these R^2 values, which is why I previously suggested it. Check out the 3dRegAna help (and maybe the manual https://afni.nimh.nih.gov/afni/doc/manual/3dRegAnam.pdf) for details.

So, the following differences,

(R_0x^2 + R_0y^2) - (R_1x^2 + R_2y^2),
R_0x^2 - R_1x^2, and
R_0y^2 - R_2y^2,

should help you partition and assess each variable's unique and shared contributions in predicting Z.

Hi Gang–

Sorry for my late reply. Just getting back online following the holidays.

Thank you for turning me on to 3dRegAna. I just went over the manual. What a nice and flexible tool. The process you present implements the same approach I had in mind but much more elegantly. I also like how one can play around with different models…

Thanks, again, for the help!

Paul

Hi Gang--

Two years in the future now and playing with the question of partitioning unique and shared variance again...this time in the context of how two positively correlated variables--body mass index (BMI) and the ratio of bad to good cholesterol in the blood (Apo)--associate with voxel-wise brain volume (VBM).

Following up specifically on:

With two explanatory variables X and Y, you can construct three separate models for the response variable Z:

  1. Z ~ X,
  2. Z ~ Y, and
  3. Z ~ X + Y.

Then obtain R_0x^2 from model 1), R_0y^2 from model 2), and R_1x^2 plus R_2y^2 from model 3). Presumably,

R_0x^2 + R_0y^2 ≥ R_1x^2 + R_2y^2

Here's a summary of the code I'm using for models 1, 2, and 3, respectively, as you propose above.

Model 1 (VBM ~ Apo):
3dRegAna \
-rows 117 \
-cols 2 \
-xydata 0.814814815 23.7654321 T1_AdP-003_struc_GM_to_template_GM_mod.nii.gz \
-xydata 0.398734177 21.2244898 T1_AdP-004_struc_GM_to_template_GM_mod.nii.gz \
... \
-model 1 : 0

Model 2 (VBM ~ BMI)
3dRegAna \
-rows 117 \
-cols 2 \
-xydata 0.814814815 23.7654321 T1_AdP-003_struc_GM_to_template_GM_mod.nii.gz \
-xydata 0.398734177 21.2244898 T1_AdP-004_struc_GM_to_template_GM_mod.nii.gz \
... \
-model 2 : 0

Model 3 (VBM ~ Apo + BMI):
3dRegAna \
-rows 117 \
-cols 2 \
-xydata 0.814814815 23.7654321 T1_AdP-003_struc_GM_to_template_GM_mod.nii.gz \
-xydata 0.398734177 21.2244898 T1_AdP-004_struc_GM_to_template_GM_mod.nii.gz \
... \
-model 1 2 : 0

I'm hung up, however, when it comes to...

Presumably,
R_0x^2 + R_0y^2 ≥ R_1x^2 + R_2y^2

To get a volume corresponding to (R_0x^2 + R_0y^2) - (R_1x^2 + R_2y^2), I first add the output volumes corresponding to R^2 from Model 1 and R^2 from Model 2, and then subtract from that sum the volume corresponding to R^2 from Model 3. When I do so, I get both expected and unexpected things. First, as expected, I can see where the shared variance between Apo and BMI is associated with VBM in the striatum, which is what motivated this analysis. Second, unexpectedly, the subtraction volume has many (about half) negative values.

Which finally brings me to my question. How could the SSE(F) associated with R_0x^2 + R_0y^2 (Model 1 + Model 2) be more than the SSE(F) associated with R_1x^2 + R_2y^2 (Model 3)? Alternatively, perhaps there is something wrong with how I've implemented your suggestions?

Thanks for holding my hand through this...

Paul

Paul,

Time doesn’t merely fly; it zips back and forth!

Maybe worth considering a causal inference perspective to tackle the issue at hand? Check out this recent preprint and see if you could apply your domain knowledge in model construction.

Gang Chen

Hi Gang--

I'll take from your answer that my approach is okay and that the GLM has done some mal-culating. I also see some pointers in your preprint to the potential mechanics of this. I'll read and consider more.

The study at hand doesn't have much of a causal bent to it, but our current efforts align very well with what you present in the new article: deeper consideration of covariates, how they can both hide and induce effects, and how it's better to put everything on the table so we know better what to approach and what to discard going forward. I hope your article gets the consideration that it deserves. Our most recent bent has been to give up on the GLM and use propensity matching
(e.g., https://www.sciencedirect.com/science/article/pii/S2451902223003385),
but that's the privileged domain of big-data science, and there are lots of interesting small studies out there. So, sigh...

To continue the thread, I'd like to follow up with two questions:

  1. If I'd actually like to report the results from (R_{0x}^2 + R_{0y}^2) - (R_{1x}^2 + R_{2y}^2) should I divide the result by 2 since the overlapping contributions of x and y are represented twice in (R_{0x}^2 + R_{0y}^2)?

  2. With respect to R_{0x}^2 - R_{1x}^2, I take it that this represents the proportion of variance in z that is accounted for by the overlapping variance between x and y? Ostensibly, then, this should be the same as R_{0y}^2 - R_{2y}^2 and the same as (R_{0x}^2 + R_{0y}^2) - (R_{1x}^2 + R_{2y}^2). I can confirm or disconfirm this on my own, but could you tell me if there is a way in AFNI to get R_{1x}^2 and R_{2y}^2? It typically only likes to give R^2 for the full model, not the individual terms. No stress if not, as I can just pull the data and do it "by hand."

Thanks and congrats on the new article. Hope you're submitting it somewhere fancy, as it deserves high visibility.

Paul

Hi Paul,

Congratulations on the new paper! It’s commendable that you’ve incorporated propensity score matching to mitigate potential biases. I’m curious: Have you also considered the feasibility of using the inverse probability weighting method in your case?

Regarding my original formulas, I think I previously made some mistakes. Let me clarify the situation by deriving the formulas. First, suppose that R_{0x}^2 and R_{0y}^2 are the coefficients of determination for the models z ~ x and z ~ y, respectively, while R_{x}^2 and R_{y}^2 are the coefficients of partial determination for the model z ~ x + y.

Assume that the unique proportion of variability that variable x accounts for in z is p_x. Similarly, the unique proportion of variability that variable y contributes to z is p_y. Lastly, the shared (or common) proportion of variability that variables x and y contribute together to z is p_{xy}. With these three proportions conceptually represented by the following Venn diagram,

we have the following relationships:

\begin{aligned}
p_x + p_{xy} &= R_{0x}^2,\\
p_y + p_{xy} &= R_{0y}^2,\\
p_x + p_{xy} + p_y &= R_{x}^2 + R_{y}^2.
\end{aligned}

Solving the above simultaneous equations leads to the shared (or common) proportion of variability

p_{xy}=(R_{0x}^2+R_{0y}^2)-(R_{x}^2+R_{y}^2),

and the unique proportions of variability for x and y, respectively,

p_x = R_{x}^2 + R_{y}^2 - R_{0y}^2,
p_y = R_{x}^2 + R_{y}^2 - R_{0x}^2.

I hope I've finally gotten this straightened out! You can implement the three models with 3dMVM as well as 3dRegAna. To calculate the corresponding coefficient of determination R^2, use the following formula:

R^2 = t^2/(t^2 + DF).

Here, DF represents the degrees of freedom associated with the t-statistic.
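A small numeric sanity check of the solved proportions (synthetic data, plain NumPy; note the assumption here that the combined term R_{x}^2 + R_{y}^2 in the derivation is read as the full-model coefficient of determination for z ~ x + y, which makes the three proportions sum exactly to that R^2):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)           # correlated predictors
z = 0.8 * x + 0.6 * y + rng.normal(size=n)

def r2(resp, *preds):
    """Coefficient of determination of an OLS fit (with intercept)."""
    A = np.column_stack([np.ones(len(resp))] + list(preds))
    fit = A @ np.linalg.lstsq(A, resp, rcond=None)[0]
    return 1.0 - np.sum((resp - fit) ** 2) / np.sum((resp - resp.mean()) ** 2)

R0x, R0y, Rfull = r2(z, x), r2(z, y), r2(z, x, y)
p_xy = (R0x + R0y) - Rfull   # shared proportion
p_x = Rfull - R0y            # unique to x
p_y = Rfull - R0x            # unique to y
```

By construction, p_x + p_y + p_xy recovers the full-model R^2 exactly.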

Even if you don't have a strong causal interest in the context of variance partitioning, the approach still relies on certain underlying assumptions:

  1. Linearity: There is no nonlinearity present among the variables.
  2. Interaction: There is no interaction between the variables x and y.

Gang Chen

Ahhh, Gang,

What a thing of beauty. That checks out 100%. (Or 100% minus my previous assumption that

\begin{aligned}
R_{x}^2 + R_{y}^2 = p_x + p_y,
\end{aligned}

where the common proportion is factored out.)

Nope. I didn't even know about this. If you have recommended reading, I'll definitely take a look at this in contrast to standard propensity matching.

Our daycare is closed this week so I'll be in full-dad-mode but will run the numbers next week and report back.

Paul

Hi Gang--

Quick check on using 3dMVM to render t-maps for R_{x}^2 and R_{y}^2, the coefficients of partial determination.

I'm assuming my approach should look something like either...:

...just one model...

3dMVM \
-prefix X_and_Y.nii \
-bsVars "X+Y" \
-qVars "X,Y" \
-num_glt 2 \
-gltLabel 1 X -gltCode 1 'X :' \
-gltLabel 2 Y -gltCode 2 'Y :' \
...

...or, perhaps, two models...

...one for getting R_{x}^2...

3dMVM \
-prefix X_and_Y.nii \
-bsVars "X+Y" \
-qVars "X,Y" \
-num_glt 1 \
-gltLabel 1 X -gltCode 1 'X :' \
...

...and another for getting R_{y}^2

3dMVM \
-prefix Y_and_X.nii \
-bsVars "Y+X" \
-qVars "X,Y" \
-num_glt 1 \
-gltLabel 1 Y -gltCode 1 'Y :' \
...

Thanks in advance!

Paul,

Your first model is sufficient for calculating R_x^2 and R_y^2. Then, construct two separate models as follows:

  3dMVM \
    -prefix X.nii \
    -bsVars "X" \
    -qVars "X" \
    -num_glt 1 \
    -gltLabel 1 X -gltCode 1 'X :' \
    ...

and

  3dMVM \
    -prefix Y.nii \
    -bsVars "Y" \
    -qVars "Y" \
    -num_glt 1 \
    -gltLabel 1 Y -gltCode 1 'Y :' \
    ...

These two separate models will allow you to compute R_{0x}^2 and R_{0y}^2.

Gang Chen

Super. Thanks, Gang.