Different results from AFNI and CONN

Hello AFNI experts,

I conducted a seed-to-whole-brain analysis on resting-state data using CONN and AFNI separately (the data were also preprocessed in CONN and AFNI, respectively), but the two software packages yielded different results for two seeds (i.e., no significant clusters from CONN, but three significant clusters from AFNI). I am just wondering: is it even possible for different software packages to give different results, and has this been studied before? Any information or help would be really appreciated!

Thank you in advance!
Jiaxu

Preprocessing makes a HUGE difference in any type of analysis, so you’d want to match these steps as closely as you can. Perhaps try using the preprocessed data from CONN in AFNI, or AFNI’s afni_proc.py output in CONN.

Beyond preprocessing, there are differences in how CONN and AFNI handle significance testing. CONN has options for Random Field Theory (RFT) and other methods, whereas AFNI recommends cluster-based thresholding. You’d want to match these choices as well.

Has it been systematically studied? There is plenty of published controversy over the choices for literally every step of fMRI processing. Even packages like fMRIPrep, which are meant to choose a “best of the best” set of methods, have changed programs and options over time. My recommendation is to process in whichever package you prefer, and then publish your results as clearly as you can alongside the data, so that others can replicate (or possibly not) the results. This, combined with conservative thresholds, should please most journals and reviewers.

Beyond this cheerleading, I’m not sure we can be much more help, other than to back up our choices of default processing options with our own publications or beliefs.

Hi, Jiaxu-

A) I will assume that you input the same data into both packages for seed-based correlation. Is that correct?

B) Should I also assume that for clustering you have a single, predetermined threshold that you are applying in both packages?

C) Importantly, can I ask what commands you used to do this in AFNI? Did you use the program 3dTcorr1D, or something else?
https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/programs/3dTcorr1D_sphx.html#ahelp-3dtcorr1d

It would be best to start by comparing the correlation maps before any thresholding; every layer of analysis could add a difference.
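As a first sanity check before differencing anything (a hypothetical command, with placeholder dataset names standing in for your actual output files), you can verify that the two maps are even on the same grid:

3dinfo -same_grid CORR_MAP_AFNI+tlrc CORR_MAP_CONN.nii

This prints 1 if the grids match and 0 otherwise; if they differ, a voxelwise comparison would need a resampling step first.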

At the clustering level, there can be differences between packages, such as what it means for two voxels to be neighbors: do they have to share a face, or just an edge, or just a corner? In AFNI, we refer to this settable parameter as NN=1, 2, or 3. See here for more (I don’t know what Conn uses):
https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/FATCAT/MakingROIs.html#getting-to-know-your-neighbors
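As an illustration (a sketch only, with placeholder dataset names, subbrick indices, and threshold values, not a recommendation for your data), the NN level is an explicit option to AFNI’s 3dClusterize, so you could set it to match whatever Conn uses:

3dClusterize -inset GROUP_STATS+tlrc -ithr 1 -idat 0 \
    -NN 2 -bisided -3.29 3.29 -clust_nvox 40 \
    -pref_map ClusterMap

Here -NN 2 means voxels sharing a face or an edge count as neighbors; -NN 1 (faces only) and -NN 3 (faces, edges, or corners) are the alternatives.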

–pt

Hello pt,

Thank you for the follow-up questions!

(A) Yes, I imported the same data into both packages for the seed correlation.

(B) Yes, I am using the same threshold for clustering (0.01), but as Peter mentioned earlier, the methods appear to differ between the two software packages (random field theory for CONN, while for AFNI I used 3dClustSim).

(C) I used the program 3dNetCorr in AFNI to calculate the correlation maps. Should I use 3dTcorr1D instead?

Thank you again,
Jiaxu

Thank you very much for the insight, Peter!

Hi, Jiaxu-

(I’m going to rearrange my order of questions, but the lettering matches the previous.)

(A) OK, good.

(C1) So, if you use 3dNetCorr to calculate a whole-brain correlation map, that is done by averaging the time series within each ROI and then correlating that average with every voxel. That is fine; is that what you want? (See the sketch after these questions.)
(C2) And did you perform an analogous calculation in Conn, or did you also put the output from C1 into the Conn clustering?
(C3) If you did calculate the whole-brain correlation map separately in Conn, were those pre-threshold results exactly the same?
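For C1, here is a minimal sketch of the kind of 3dNetCorr command I mean (the file names are placeholders; the -ts_wb_corr option writes, for each ROI, a whole-brain map of the correlation of that ROI’s average time series with every voxel):

3dNetCorr -inset REST_PROC+tlrc -in_rois SEED_ROIS+tlrc \
    -ts_wb_corr -ts_wb_Z -prefix SEED_CORR

The -ts_wb_Z option additionally writes out the Fisher Z-transformed version of each map.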

(B) Well, if you use different methods to estimate the clustering threshold, you will likely get different estimates; in theory, they should be close. If you are using different software, there might be other differences involved, some of which are controllable.
(B1) Is that threshold the p-value threshold for voxelwise thresholding? If so, that isn’t very small… I would expect larger differences between software for larger p-values.
(B2) I would guess that Conn uses a different neighborhood definition than AFNI’s default, as described here:
https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/FATCAT/MakingROIs.html#getting-to-know-your-neighbors
–but it is up to you to check this to be sure. This can make a difference of 10-20% in cluster size. But if the way clusters/neighborhoods are formed is different, then this is not really an “apples to apples” comparison.
(B3) The sidedness of testing and thresholding matters. In AFNI, you can use 1sided, 2sided, or bisided thresholding; by default, we use 2sided tests with 2sided or bisided thresholding. If you use a pair of 1sided tests without correction (which is unfortunately the default in many software packages; I am not sure about Conn), then you are artificially doubling your false positive rate, at a minimum. See:
https://pubmed.ncbi.nlm.nih.gov/30265768/
for more details.
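Note, too, that 3dClustSim reports a separate cluster-size table for each combination of sidedness and NN level, so you can (and should) pick the table that matches your actual test. A sketch, with placeholder values for the ACF parameters and the prefix:

3dClustSim -acf a b c -mask MASK_DSET+tlrc -prefix ClustTab

This writes one table per combination (e.g., ClustTab.NN2_bisided.1D); use the one whose NN and sidedness match your thresholding.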
(B4) What is your exact 3dClustSim command? Are you using the mixed ACF model, described here:
https://pubmed.ncbi.nlm.nih.gov/28420798/
https://pubmed.ncbi.nlm.nih.gov/28398812/
Based on a useful point in a paper on clustering, we adjusted the long-standing assumption that the spatial autocorrelation function (ACF) of noise in FMRI is well approximated by a Gaussian (essentially all software packages made this assumption for a long time, and it might still be made in some). We now estimate a more flexible “mixed ACF” function, which allows for the heavier tails that seem to be present. This could be another notable difference between the two programs, on top of the different underlying methods.

And again, seeing the AFNI commands used will clarify a lot here.

–pt

Hello pt, thank you again for these questions and sharing your insights!

(C1) So, if you use 3dNetCorr to calculate a whole-brain correlation map, that is done by averaging the time series within each ROI and then correlating that average with every voxel. That is fine; is that what you want?

Yes, I believe this corresponds to what I did in CONN.

(C2) And did you perform an analogous calculation in Conn, or did you also put the output from C1 into the Conn clustering?

I did not put the output from C1 into the Conn clustering, but to the best of my knowledge, I made an analogous calculation in CONN.

(C3) If you did calculate the whole-brain correlation map separately in Conn, were those pre-threshold results exactly the same?

I did not check this previously, but I will.

(B1) Is that threshold the p-value threshold for voxelwise thresholding? If so, that isn’t very small… I would expect larger differences between software for larger p-values.

Sorry, I didn’t make that clear in my last message. I used p<0.001 for voxelwise thresholding in each analysis, and p<0.01 for the cluster-wise threshold.

(B2) & (B3) I will double check!

(B4) What is your exact 3dClustSim command? Are you using the mixed ACF model, described here.

Yes, the command that I used is:


3dClustSim -acf 0.567915926 5.510608025 14.86245309 -mask GM_msk333_final+tlrc.

The values 0.567915926, 5.510608025, and 14.86245309 are the averages of the “blur estimates (ACF)” output from the preprocessing of all scans.

Did I use this correctly?

Thank you again!
Jiaxu

Hi, Jiaxu-

Great, that clarifies things.

I think C3 is the first priority: verifying that the seed-based maps from the averaged ROI time series, as calculated in AFNI and Conn, really do match. These should match very closely, voxel by voxel, to something like floating-point precision. If they differ, we need to investigate. You can load the analogous volumes of Pearson correlation (or Z-transformed) maps as overlay and underlay, and click around. Or, use 3dcalc to subtract the volumes and check the size of the difference:


3dcalc -a DSET_CORR_AFNI -b DSET_CORR_CONN -expr 'a-b' -prefix DIFF_CORR
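
To summarize the magnitude of those differences quickly (assuming DIFF_CORR is the output of the command above), something like:

3dBrickStat -min -max -mean DIFF_CORR+tlrc

should report values near zero if the two maps really are equivalent.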

Thanks for clarifying the thresholding values in B1.

For B4: that seems fine for 3dClustSim, yes, assuming that the same mask was used when calculating those smoothness values with 3dFWHMx. The ACF parameters represent the average spatial smoothness of the noise throughout a masked region (e.g., the brain or a GM mask). 3dClustSim then uses that smoothness information to estimate what kinds of clusters a “noise-only” dataset would produce in that same region; you then use your voxelwise p-value and desired FPR to pick your cluster threshold. But for consistency, you have to maintain the same region from which the ACF parameters were estimated.
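
Put together as a sketch (errts_DSET is a placeholder for your noise residual time series; the ACF numbers are the ones you reported), the consistency requirement looks like:

3dFWHMx -acf ACF_curve.1D -mask GM_msk333_final+tlrc -input errts_DSET+tlrc

3dClustSim -acf 0.567915926 5.510608025 14.86245309 -mask GM_msk333_final+tlrc

where the first command is run per scan, its (a, b, c) estimates are averaged across scans, and the second command uses that average together with the same mask.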

So, I hope the voxelwise correlation maps from the two software packages are essentially the same. At the next stage, clustering, I suspect there will be differences if the assumed smoothness distributions differ (again, everyone used to assume Gaussian, but AFNI changed to the mixed ACF to be more general, because that seemed a necessary and better approach), as well as from the fact that RFT and 3dClustSim’s brute-force simulations are simply different techniques. While the difference in techniques will surely create some discrepancy, I would expect the assumptions about the noise distribution to be a larger factor (if they differ), along with how a neighborhood is defined (note: in AFNI you can set this to whatever you want, so you can match whatever the other toolbox uses, but I suspect they are not the same ab initio; this is not an error on either side, just a different but reasonable choice by each).

Finally, verifying that the sidedness of testing is equivalent matters. In AFNI, the default is 2sided at the voxelwise level, and it probably makes the most sense to use bisided at the clusterwise level; these are appropriate for most hypotheses in the field, but surprisingly not so widely adopted as defaults (the reasons for the former, and the surprise about the latter, are laid out in the “A tail of two sides…” paper cited above).

–pt