Questions about some details in MBA.R

Dear Dr Chen Gang,

I am reading the source code of MBA (MBA.R), and a few details have confused me. I would appreciate your help.

First, in the definition of the rp_summary function, why do you transform the original data in this way: sqrt(2) * (data - mean) + mean?

# obtain summary information of posterior samples for RPs
rp_summary <- function(ps, ns, nR) {
  mm <- apply(ps, c(2,3), mean)
  # inflate the spread of each region pair's samples by sqrt(2), keeping the mean
  for(ii in 1:nR) for(jj in 1:nR) ps[,ii,jj] <- sqrt(2)*(ps[,ii,jj] - mm[ii,jj]) + mm[ii,jj]
  RP <- array(NA, dim=c(nR, nR, 8))
  RP[,,1] <- apply(ps, c(2,3), mean)
  RP[,,2] <- apply(ps, c(2,3), sd)
  RP[,,3] <- apply(ps, c(2,3), cnt, ns)  # cnt: fraction of the ns samples that are positive (P+)
  RP[,,4:8] <- aperm(apply(ps, c(2,3), quantile, probs=c(0.025, 0.05, 0.5, 0.95, 0.975)), c(2,3,1))
  dimnames(RP)[[1]] <- dimnames(ps)[[2]]
  dimnames(RP)[[2]] <- dimnames(ps)[[3]]
  dimnames(RP)[[3]] <- c('mean', 'SD', 'P+', '2.5%', '5%', '50%', '95%', '97.5%')
  return(RP)
}


Secondly, in extracting the region pair effects, why is the reference level computed as "intercept - sum(other EOI levels)" (in code: psa[nl,,,] <- ps[nl,,,] - psa[nl,,,]  # reference level)? And why is each EOI level computed as "EOI + intercept" (in code: psa[jj,,,] <- ps[nl,,,] + ps[jj,,,])?

########## region pair effects #############
# for factor
if(any(!is.na(lop$EOIc) == TRUE)) for(ii in 1:length(lop$EOIc)) {
  lvl <- levels(lop$dataTable[[lop$EOIc[ii]]])   # levels
  nl  <- nlevels(lop$dataTable[[lop$EOIc[ii]]])  # number of levels: last level is the reference in deviation coding
  ps  <- array(0, dim=c(nl, ns, nR, nR))  # posterior samples
  for(jj in 1:(nl-1)) ps[jj,,,] <- region_pair(pe, ge, paste0(lop$EOIc[ii], jj), nR)
  ps[nl,,,] <- region_pair(pe, ge, 'Intercept', nR)
  psa <- array(0, dim=c(nl, ns, nR, nR))  # posterior samples adjusted
  for(jj in 1:(nl-1)) {
    psa[jj,,,] <- ps[nl,,,] + ps[jj,,,]   # non-reference level: intercept + level effect
    psa[nl,,,] <- psa[nl,,,] + ps[jj,,,]  # accumulate the non-reference effects
  }
  psa[nl,,,] <- ps[nl,,,] - psa[nl,,,]    # reference level: intercept - sum(other levels)
  dimnames(psa)[[3]] <- rL
  dimnames(psa)[[4]] <- rL


Thirdly, why can we get posterior samples for the diagonal even though we don't provide the diagonal values?
And if we provide a data table with diagonal values or duplicate entries (more than n*(n-1)/2 rows), will the final results be greatly affected?

In the definition of the rp_summary function, why do you transform the original data in this way: sqrt(2) * (data - mean) + mean?

The transformation sqrt(2) * (data - mean) + mean serves a specific purpose. Since the input data come from a symmetric matrix, each data point is utilized twice. That duplication artificially doubles the sample size and shrinks the posterior spread by a factor of sqrt(2), so the transformation re-inflates the standard deviation by sqrt(2) while leaving the mean unchanged.
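For intuition, here is a minimal sketch (with mock posterior samples, not MBA output) showing that the transform leaves the mean untouched and inflates the standard deviation by exactly sqrt(2):

```r
set.seed(1)
x  <- rnorm(10000, mean = 0.3, sd = 0.1)  # mock posterior samples
xs <- sqrt(2) * (x - mean(x)) + mean(x)   # MBA's adjustment
mean(xs) - mean(x)   # ~0: mean unchanged
sd(xs) / sd(x)       # sqrt(2): spread inflated
```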

In extracting the region pair effects, why is the reference level "intercept - sum(other EOI levels)" (in code: psa[nl,,,] <- ps[nl,,,] - psa[nl,,,]  # reference level)? And why is the EOI level "EOI + intercept" (in code: psa[jj,,,] <- ps[nl,,,] + ps[jj,,,])?

In the context of categorical variables (factors), MBA uses deviation (sum-to-zero) coding, in which the last level serves as the reference. Each non-reference level is estimated as a deviation from the overall mean (the intercept), and the reference level's deviation is the negative of the sum of the other levels' deviations. The formulas in the code simply reconstruct each level's full effect from the intercept and these deviations while maintaining the appropriate contrasts.
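The same arithmetic can be checked on toy data (this is a hedged sketch, not MBA code): with deviation coding (contr.sum), each level's mean is recovered exactly as in the MBA snippet, i.e. intercept + own deviation for the non-reference levels and intercept - sum(other deviations) for the reference level.

```r
set.seed(42)
g   <- factor(rep(c("a", "b", "c"), each = 20))
y   <- rnorm(60) + c(a = 1, b = 2, c = 3)[g]     # toy data with level shifts
fit <- lm(y ~ g, contrasts = list(g = contr.sum))
b   <- coef(fit)                     # (Intercept), g1, g2
m   <- c(b[1] + b[2],                # level "a": intercept + own deviation
         b[1] + b[3],                # level "b": intercept + own deviation
         b[1] - (b[2] + b[3]))       # level "c" (reference): intercept - sum(others)
all.equal(unname(m), unname(tapply(y, g, mean)))  # TRUE
```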

Why can we get posterior samples for the diagonal even though we don't provide the diagonal values?

Even if diagonal values are not explicitly provided, MBA can still generate region-level posterior samples. The reason lies in how MBA was formulated: it takes a correlation matrix for a set of regions as input, and in that context the diagonal carries no unique information. The region-level effects are assembled from the off-diagonal elements, which are assumed to carry the relevant information about the pairwise relationships between regions.

And if we provide a data table with diagonal values or duplicate entries (more than n*(n-1)/2 rows), will the final results be greatly affected?

In what context are you considering the diagonals as input for MBA?

Gang Chen

Dr Chen Gang,

Since the input data come from a symmetric matrix, each data point is utilized twice.

In my practice, I just use half of the matrix, without the diagonal, as the input to MBA; that is the recommended usage in its help file. So in your reply, "the input data" means the posterior distributions from MBA, not the original input to MBA, right?

In the definition of the region_pair function, I see that the region-specific effects and the region-interaction effects are each added twice, hence the sqrt(2) * (data - mean) + mean adjustment. If that is right, I wonder which mathematical principle this formula is based on. Could you provide me some references?

In what context are you considering the diagonals as input for MBA?

About the last question, I am just curious: if we could provide the full matrix with diagonals as the input to MBA, there would be no need to remove the duplicates (and diagonals) from the data table derived from a correlation matrix, which was a big hurdle for me.
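Incidentally, extracting each pair exactly once from a full matrix can be done with lower.tri(). This is only a sketch with a mock matrix; the column names (ROI1, ROI2, Y) are my assumptions and should be checked against MBA's help file for the exact table format it expects:

```r
set.seed(7)
nR <- 4
rL <- paste0("R", 1:nR)
C  <- cor(matrix(rnorm(50 * nR), ncol = nR))  # mock nR x nR correlation matrix
dimnames(C) <- list(rL, rL)
idx <- which(lower.tri(C), arr.ind = TRUE)    # off-diagonal, each pair once
dt  <- data.frame(ROI1 = rL[idx[, 1]],
                  ROI2 = rL[idx[, 2]],
                  Y    = C[idx])
nrow(dt)  # nR*(nR-1)/2 = 6 rows: no diagonals, no duplicates
```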

These formulas allow us to retrieve each level’s effect while maintaining the appropriate contrasts.

Here is my understanding, which may or may not be true.

Say we have a categorical variable with 3 levels: ctr, sham, and exp, where ctr is chosen as the control level. If we transform this categorical variable into a quantitative contrast variable, the codes for ctr, sham, and exp must sum to 0, so ctr's code must equal the negative of (sham + exp), like this:

| treat | treat_transformed |
|-------|-------------------|
| exp   |  1                |
| sham  |  1                |
| ctr   | -2                |
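As a point of comparison (a hedged aside, not a correction of the example above): the deviation coding that R itself generates for a 3-level factor spreads the contrast over two columns, with the last (reference) level coded -1 in each column, whereas the single-column (1, 1, -2) coding is a ctr-vs-others style contrast.

```r
contr.sum(3)
# rows are levels 1..3; the last (reference) level is -1 in every column:
#  1  0
#  0  1
# -1 -1
```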


MBA was devised with users contributing half of the off-diagonal elements as input, and internal duplication handles the rest. Therefore, the adjustment sqrt(2) * (data - mean) + mean serves as a mere scaling factor of sqrt(2) for the standard error, acknowledging that each data point is utilized twice.

Regarding the factor with three levels (ctr, sham, and exp), is it a within-individual or between-individual categorical variable? Which specific effects or contrasts are you interested in about these three levels?

My recent contemplation on correlation-based analyses in neuroimaging has led me to recognize inherent challenges and caveats. Conventional approaches appear to lack a robust foundation for network inferences. Regrettably, the methodology implemented in MBA is not an exception to this observation.

Gang Chen

Dr Chen Gang,

is it a within-individual or between-individual categorical variable?

That is a between-individual variable, and the contrasts I am interested in are sham_vs_ctr, exp_vs_ctr, and exp_vs_sham. However, it is not a realistic example; I just think it may help in understanding this method.

In my opinion, seed-based methods are based on biased inputs, so they indeed have inherent caveats. Do you remember we once discussed an example with RBA? When using MBA, I always prepare the whole-brain ROI-to-ROI correlation matrix (not just one based on a single seed, as in your paper), so this problem can be mitigated to a certain degree. In addition, we pay more attention to effect uncertainty than to effect magnitude when analyzing RBA results, because there is no magnitude reference. However, it is hard to ignore the magnitude contrast in the effect matrix plots from MBA.

Of course, we will not just rely on a piece of evidence to draw the final conclusion.

Ren Yiyuan

Yiyuan,

Correlations play a crucial role as an initial step in understanding relationships between regions. However, relying solely on correlations may not yield accurate results in network-based causal inference. This limitation/caveat prompts me to consider the modeling approach used in the program MBA as challenging to interpret from a causal inference standpoint. In contrast, the methodology employed by the program RBA would be more interpretable.

Gang Chen

Dr Chen Gang,

That is interesting. I have always thought of MBA as just an extension of RBA to matrix-like data; it is difficult for me to see the detailed differences between them in the statistical formulas. I hope you can write some introductory articles to clarify the distinctions for users (including me).

However, in my recent experience with MBA and RBA, MBA seems to provide more accurate results than RBA. Sorry, I cannot show the unpublished data publicly. Using RBA, I saw a positive correlation change between the target seed and a region in the cerebellum; with most other regions, the seed showed no statistically significant correlation changes. At first, I drew the conclusion that some treatment contributed to improving the connection from the seed to that cerebellar region. Then I used MBA to analyze the whole-brain ROI-to-ROI correlation matrix, and I noticed that the connections between that cerebellar region and nearly all other ROIs changed dramatically; the connection to the target seed is just a minor one among them. The results were confirmed by another RBA seeded on that cerebellar region. If I had not run MBA, I would have thought the cause of the connection change was the original seed, but thanks to MBA, I saw the whole picture.

Yiyuan

Yiyuan,

In my opinion, there are two big problems with correlation-based inference with fMRI data:

1. Estimation Bias: The estimated correlations can be prone to under- or over-estimation. Factors such as noise, sampling variability, and preprocessing steps may substantially impact the accuracy of these estimates.

2. Network-Level Relationships: While correlations provide valuable information about pairwise associations, they fall short in revealing the intricate cross-region relationships within neural networks. Understanding network-level dynamics requires more sophisticated methods beyond simple correlations.

Regarding the MBA results aligning more with your research hypothesis, it’s possible that the distortions introduced by the two problems above might counterbalance each other to some extent.

Gang

Dr Chen Gang: