Hi, Brady-
A couple of sidenotes to start, which I hope might be useful (apologies if they aren’t!):
From this recent paper:
https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/codex/fmri/2022_TaylorEtal.html
… there is github code for several processing examples of various AFNI commands:
https://github.com/afni/apaper_highlight_narps
… which might be useful. In particular, note the use of permutations for cluster-size estimation via the -Clustsim option in 3dttest++, which was described here (see Fig. 5):
https://pubmed.ncbi.nlm.nih.gov/28398812/
To do that, one could use 3dttest++ directly as you do, or gen_group_command.py; the GitHub code above shows a gen_group_command.py setup example (for a one-group comparison), which is essentially:
# an example of a one-group t-test, with Clustsim permutations
set some_path = # path to the top of tree of subject FMRI processing results
set all_subj = # list of subj IDs, like: ( sub-002 sub-004 ... )
set all_omit = # list of subj IDs to drop, like: ( sub-016 sub-030 ... )
gen_group_command.py \
-command 3dttest++ \
-write_script run.tt.equalRange.gain.tcsh \
-dsets ${some_path}/sub-*/sub-*.results/stats.sub-*_REML+tlrc.HEAD \
-dset_sid_list ${all_subj} \
-dset_sid_omit_list ${all_omit} \
-subj_prefix sub- \
-set_labels equalRange.gain \
-subs_betas Resp#1_Coef \
-verb 2 \
-options -mask group_mask.inter.nii.gz \
-Clustsim
… and this created a 3dttest++ command like:
set data_dir = /data/SSCC_NARPS/globus_sync/ademo_narps/data_23_ap_task_b
3dttest++ \
-prefix ttest++_result \
-setA equalRange.gain \
sub-002 "$data_dir/sub-002/sub-002.results/stats.sub-002_REML+tlrc.HEAD[Resp#1_Coef]" \
sub-004 "$data_dir/sub-004/sub-004.results/stats.sub-004_REML+tlrc.HEAD[Resp#1_Coef]" \
sub-006 "$data_dir/sub-006/sub-006.results/stats.sub-006_REML+tlrc.HEAD[Resp#1_Coef]" \
sub-008 "$data_dir/sub-008/sub-008.results/stats.sub-008_REML+tlrc.HEAD[Resp#1_Coef]" \
sub-010 "$data_dir/sub-010/sub-010.results/stats.sub-010_REML+tlrc.HEAD[Resp#1_Coef]" \
sub-014 "$data_dir/sub-014/sub-014.results/stats.sub-014_REML+tlrc.HEAD[Resp#1_Coef]" \
sub-018 "$data_dir/sub-018/sub-018.results/stats.sub-018_REML+tlrc.HEAD[Resp#1_Coef]" \
...
-mask group_mask.inter.nii.gz -Clustsim
… which is basically the same as what you have in your code (with different files, obviously). The main output of this is a set of text files, from which you get your appropriate cluster-size estimate. Sidenote: when you consider applying it and displaying your outputs, consider the transparent thresholding display, with which masking might not be so necessary:
https://www.biorxiv.org/content/10.1101/2022.10.26.513929v2
(but the 3dttest++ command should include a mask, defining the volume within which to estimate a cluster).
I often like using the gen_group_command.py form because of the option to include an “omit” (= drop or exclusion) list of subject IDs; also, as the tests get more complicated, or if you want to switch between different AFNI stats programs, the more compact gen_group_command.py form can be relatively convenient. But that is purely a style choice. You can see more examples of gen_group_command.py in the wild in the AFNI Codex:
https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/codex/main_toc.html
and/or by searching for “gen_group_command.py” in the search bar in the sidebar of the help pages.
--------- OK, to your questions:
- I am surprised about the runtime, though that will depend on:
- the number of subjects
- the number of voxels (so, high-res data will be more computationally expensive)
- the computer itself, both in terms of memory and CPUs
I wonder if that computer’s resources are being swamped, and it is using a lot of virtual memory, making it run much more slowly? That will be difficult to get around, since the data are the data, as it were…
I am curious: how many subjects do you have, and how many voxels are in your mask (e.g., from “3dROIstats -quiet -nomeanout -nzvoxels -mask DSET_MASK DSET_MASK”)?
But note that this program can inherently take advantage of multiple CPUs on your computer via OpenMP (as several AFNI programs do). To see the number of threads or CPUs on your machine, you can run:
afni_system_check.py -disp_num_cpu
To see how many AFNI is using for OpenMP-able commands, run:
afni_check_omp
By default (with no environment variable set), those should likely match; however, you can control it by setting the environment variable “OMP_NUM_THREADS” in your script or in the appropriate shell ~/.*rc file. To see whether it is set (in tcsh, this outputs 1 if set and 0 if not), you can type:
echo $?OMP_NUM_THREADS
… and if it is set, its value is displayed by:
echo $OMP_NUM_THREADS
To set it to 8 in a tcsh script, for example, you could put:
setenv OMP_NUM_THREADS 8
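As a sidenote, if you happen to work in bash/zsh rather than tcsh, the equivalent of that setenv line would look like the following (a minimal sketch; getconf is a standard system tool here, not part of AFNI):

```shell
# count the CPUs the OS reports, then let OpenMP-enabled AFNI
# programs use that many threads (POSIX sh/bash/zsh syntax):
ncpu=$(getconf _NPROCESSORS_ONLN)
export OMP_NUM_THREADS=$ncpu
echo "OpenMP threads set to: $OMP_NUM_THREADS"
```

Setting it to fewer than the total CPU count (e.g., half) can keep the machine responsive while 3dttest++ runs.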
I don’t think there is a magic number of permutations. “Enough” would be estimated as having reached convergence in the output: adding more doesn’t really alter the results. I am afraid I have not really played around with this enough to know whether, say, 5000 might often be sufficient; and obviously you probably don’t want to run 5000, 6000 and 7000 and compare how things look, because then you are back at the initial runtime issue.
-
Yes, the permutations will require a run per contrast, because the individual subject data are being sign-flipped or permuted.
-
The inputs to the program here should likely be the volumes output by your single-subject regression modeling; if you were using afni_proc.py, these would come from the “stats*HEAD” file in the results directory, selecting the sub-brick for your contrast of interest each time. All of your input datasets need to be in the same space and on the same grid; typically, that means they will be in standard space.
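One way to double-check those inputs with 3dinfo (the dataset names below are hypothetical, borrowed from the example above):

```shell
# list the sub-brick labels of one subject's stats dataset, to confirm
# which contrast label to select (e.g., Resp#1_Coef):
3dinfo -label stats.sub-002_REML+tlrc.HEAD

# check that all subjects' stats datasets sit on the same grid
# (prints a 1/0 flag per dataset, compared against the first one):
3dinfo -same_all_grid stats.sub-*_REML+tlrc.HEAD
```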
If you are performing a voxelwise analysis (which I assume you are, because clusters wouldn’t be used in an ROI-based one), then I might have expected blurring/smoothing to be performed as part of the processing?
If you were going to use ETAC, then blurring would not be applied during processing, because it gets applied at different stages within ETAC itself; but for standard t-tests or Clustsim-based ones, I would have thought blurring would typically be applied already (perhaps with multi-echo FMRI there would be very little, if any, blurring, due to the higher TSNR).
If you blurred your data during processing, I wouldn’t blur again; if your data were not blurred during processing, you could blur the stats before using 3dClustsim. Blurring will certainly affect the cluster size; note that you would then apply clustering and results-reporting to that blurred data.
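For that “blur the stats” case, one common approach is 3dmerge’s -1blur_fwhm option; a sketch (the filenames and the 6 mm FWHM value are just placeholders):

```shell
# blur all sub-bricks of a (hypothetical) stats dataset by 6 mm FWHM;
# only appropriate if no blurring was applied during processing:
3dmerge -1blur_fwhm 6 -doall \
        -prefix stats.sub-002_blur6 \
        stats.sub-002_REML+tlrc.HEAD
```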
–pt