3dttest++ input files and settings

Hi everyone!

I’m relatively new to AFNI and I’m trying to run a -ClustSim analysis using 3dttest++. I just have a few questions about parameters for the simulations.

  1. What is the recommended number of simulations? Running 1,000 simulations finished successfully after ~26 hours of runtime on our 2017 iMac with an i5 and 8GB of RAM. Our more recent run with 10,000 simulations has taken over 4 days so far and is only maybe halfway finished (although I realized afterwards this may have been due to the computer going to sleep at some point). We have been changing the number of simulations by adding -DAFNI_TTEST_NUMCSIM=10000 to the command line. Is there an ideal number of simulations and/or perhaps a faster way to run them?

  2. Does a simulation need to be run for each contrast in the data set? For example, we have multiple condition-contrast files set up for each participant. Will each one provide a unique minimum cluster size as a result of simulations? Presumably the answer is yes, since the mask and functional files show data from different conditions being contrasted, but in published papers that use this -ClustSim option I tend to see only a single minimum cluster size reported.

  3. Should the functional files that are used in the 3dttest++ function be tlrc files or should they instead be spatially smoothed files? For example, we used the following code to turn our normal (‘tlrc’) files into spatially smoothed (‘ss’) files for each participant: 3dmerge -prefix WNss -1blur_fwhm 6 WN+tlrc. Which is better for -ClustSim?

Here is our current 3dttest++ code for one of the contrasts:

3dttest++ -ClustSim -prefix DN-Con-Results                                   \
    -mask /Users/mafernan_admin/Desktop/anova_TEST/mask_union+tlrc.HEAD      \
    -setA DN_Con MM03  "/Users/mafernan_admin/Desktop/anova_TEST/p03_DN+tlrc"  \
                 MM04  "/Users/mafernan_admin/Desktop/anova_TEST/p04_DN+tlrc"  \
                 MM05  "/Users/mafernan_admin/Desktop/anova_TEST/p05_DN+tlrc"  \
                 MM06  "/Users/mafernan_admin/Desktop/anova_TEST/p06_DN+tlrc"  \
                 MM07  "/Users/mafernan_admin/Desktop/anova_TEST/p07_DN+tlrc"  \
                 MM08  "/Users/mafernan_admin/Desktop/anova_TEST/p08_DN+tlrc"  \
                 MM09  "/Users/mafernan_admin/Desktop/anova_TEST/p09_DN+tlrc"  \
                 MM10  "/Users/mafernan_admin/Desktop/anova_TEST/p10_DN+tlrc"  \
                 MM11  "/Users/mafernan_admin/Desktop/anova_TEST/p11_DN+tlrc"  \
                 MM14  "/Users/mafernan_admin/Desktop/anova_TEST/p14_DN+tlrc"  \
                 MM15  "/Users/mafernan_admin/Desktop/anova_TEST/p15_DN+tlrc"  \
                 MM16  "/Users/mafernan_admin/Desktop/anova_TEST/p16_DN+tlrc"  \
                 MM17  "/Users/mafernan_admin/Desktop/anova_TEST/p17_DN+tlrc"  \
                 MM18  "/Users/mafernan_admin/Desktop/anova_TEST/p18_DN+tlrc"  \
                 MM19  "/Users/mafernan_admin/Desktop/anova_TEST/p19_DN+tlrc"  \
                 MM20  "/Users/mafernan_admin/Desktop/anova_TEST/p20_DN+tlrc"  \
                 MM21  "/Users/mafernan_admin/Desktop/anova_TEST/p21_DN+tlrc"  \
                 MM101 "/Users/mafernan_admin/Desktop/anova_TEST/p101_DN+tlrc"

Hi, Brady-

A couple of sidenotes to start, which I hope might be useful (apologies if they aren’t!):

From this recent paper:
… there is github code for several processing examples of various AFNI commands:
… which might be useful. In particular, there is the use of permutations for cluster-size estimation with 3dClustSim in 3dttest++ that was described here (see Fig. 5):

To do that, one could use 3dttest++ directly as you do, or gen_group_command.py; the github code above shows a gen_group_command.py setup example (for a one-group comparison), which is essentially:

# an example of a one-group t-test, with Clustsim permutations

set some_path = # path to the top of tree of subject FMRI processing results
set all_subj  = # list of subj IDs, like: ( sub-002 sub-004 ... )
set all_omit  = # list of subj IDs to drop, like: ( sub-016 sub-030 ... )

gen_group_command.py                                                         \
    -command             3dttest++                                           \
    -write_script        run.tt.equalRange.gain.tcsh                         \
    -dsets               ${some_path}/sub-*/sub-*.results/stats.sub-*_REML+tlrc.HEAD \
    -dset_sid_list       ${all_subj}                                         \
    -dset_sid_omit_list  ${all_omit}                                         \
    -subj_prefix         sub-                                                \
    -set_labels          equalRange.gain                                     \
    -subs_betas          Resp#1_Coef                                         \
    -verb                2                                                   \
    -options             -mask group_mask.inter.nii.gz -Clustsim

… and this created a 3dttest++ command like:

set data_dir = /data/SSCC_NARPS/globus_sync/ademo_narps/data_23_ap_task_b

3dttest++                                                                                   \
   -prefix ttest++_result                                                                   \
   -setA equalRange.gain                                                                    \
      sub-002 "$data_dir/sub-002/sub-002.results/stats.sub-002_REML+tlrc.HEAD[Resp#1_Coef]" \
      sub-004 "$data_dir/sub-004/sub-004.results/stats.sub-004_REML+tlrc.HEAD[Resp#1_Coef]" \
      sub-006 "$data_dir/sub-006/sub-006.results/stats.sub-006_REML+tlrc.HEAD[Resp#1_Coef]" \
      sub-008 "$data_dir/sub-008/sub-008.results/stats.sub-008_REML+tlrc.HEAD[Resp#1_Coef]" \
      sub-010 "$data_dir/sub-010/sub-010.results/stats.sub-010_REML+tlrc.HEAD[Resp#1_Coef]" \
      sub-014 "$data_dir/sub-014/sub-014.results/stats.sub-014_REML+tlrc.HEAD[Resp#1_Coef]" \
      sub-018 "$data_dir/sub-018/sub-018.results/stats.sub-018_REML+tlrc.HEAD[Resp#1_Coef]" \
   -mask group_mask.inter.nii.gz -Clustsim

… which is basically the same as what you have in your code (different files, obviously). The main output of this is a set of text files from which you get your appropriate clustersize estimate. Sidenote: when you consider applying it and displaying your outputs, consider the transparent thresholding display, and that masking might not be so necessary:
(but the 3dttest++ command should include a mask, defining the volume within which to estimate a cluster).

I often like using the gen_group_command.py form because of the option of including an “omit” (= drop or exclusion) list of subject IDs. As the tests get more complicated, or if you want to switch between different AFNI stats programs, the more compact gen_group_command.py form can be relatively convenient—but that is purely a style choice. You can see more examples of gen_group_command.py in the wild in the AFNI Codex:
and/or by searching for “gen_group_command.py” in the search bar in the sidebar of the help page.

--------- OK, to your questions:

  1. I am surprised about the runtime, though that will depend on:
  • the number of subj
  • the number of voxels (so, high-res data will be more computationally expensive)
  • the computer itself, both in terms of memory and CPUs
    I wonder if that computer’s resources are being swamped, and it is using a lot of virtual memory, making it run much more slowly? That will be difficult to get around, since the data are the data, as it were…
    I am curious: how many subjects do you have, and how many voxels are in your mask (e.g., “3dROIstats -quiet -nomeanout -nzvoxels -mask DSET_MASK DSET_MASK”)?

But note that this program can take advantage of having multiple CPUs on your computer, inherently, using OpenMP (several AFNI programs do). To see the number of threads or CPUs on your machine, you can run:

afni_system_check.py -disp_num_cpu

To see how many threads AFNI will use for OpenMP-able commands, run:

afni_check_omp
By default (no environment variable set), those should likely match; however, you can control it by setting an environment variable “OMP_NUM_THREADS” in your script or appropriate shell ~/.*rc file. To see if that is set (yes/no as 1/0 output in tcsh), you can type:

echo $?OMP_NUM_THREADS
… and if it is set, its value is displayed by:

echo $OMP_NUM_THREADS
To set it to 8 in a tcsh script, for example, you could put:

setenv OMP_NUM_THREADS 8
I don’t think there is a magic number for permutations. “Enough” would be estimated as having reached convergence in the output: adding more doesn’t really alter the results. I am afraid I have not played around with this enough to know whether, say, 5000 might often be enough; and obviously you don’t want to run 5000, 6000, and 7000 and compare results, because then you are back at the original runtime issue.
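To illustrate what “convergence” means here, this is a toy Monte Carlo sketch in Python, not AFNI code; the “max of 1000 z-values per iteration” null distribution is entirely made up, just standing in for a max-statistic permutation null. The simulated critical value settles down as the iteration count grows:

```python
import numpy as np

# Toy null distribution: each "iteration" takes the max of 1000 independent
# z-values, a stand-in for the max statistic over the brain in one permutation.
def crit95(n_iter, rng):
    maxes = rng.standard_normal((n_iter, 1000)).max(axis=1)
    return np.quantile(maxes, 0.95)   # simulated critical value at 5% FWE

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    print(n, round(crit95(n, rng), 3))   # estimates stabilize as n grows
```

If successive estimates barely move as you add iterations, you have “enough” for practical purposes.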

  2. Yes, the permutations will require a run per contrast, because the individual subject data are being flipped or swapped for each particular contrast.

  3. The inputs to the program here should likely be the volumes from the output of your single-subject regression modeling; if you were using afni_proc.py, then these would be from the “stats*HEAD” file in the results directory, selecting subbricks from your contrast of interest each time. All your input datasets need to be in the same space and on the same grids—typically that means they will be in standard space.

If you are performing a voxelwise analysis (which I assume you are, because clusters wouldn’t be used in an ROI-based one), then I might have expected blurring/smoothing to be performed as part of the processing?

If you were going to use ETAC, then blurring would not be applied during processing, because it gets applied at different stages within ETAC itself; but for standard t-tests or Clustsim-style ones, I would expect blurring to have been applied already (perhaps with multi-echo FMRI there would be very little, if any, blurring, due to the higher TSNR).

If you blurred your data during processing, I wouldn’t blur it again; if your data were not blurred during processing, you could blur the stats before using 3dClustsim. Blurring will affect the cluster size, certainly—note that you would apply clustering and results-reporting on that blurred data, then.
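To illustrate that last point with a toy example (plain numpy/scipy on synthetic noise, nothing AFNI-specific; the grid size and smoothing sigma are arbitrary choices): smoothing makes suprathreshold islands in pure noise much larger, which is why cluster-size thresholds depend on the blur applied.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import norm

rng = np.random.default_rng(0)
noise = rng.standard_normal((40, 40, 40))

# Smooth, then rescale to unit variance so the same z-threshold is comparable.
smoothed = ndimage.gaussian_filter(noise, sigma=2.0)
smoothed = smoothed / smoothed.std()

def max_cluster_size(zvol, p=0.01):
    labels, n = ndimage.label(zvol > norm.isf(p))   # face-touching clusters
    return 0 if n == 0 else int(np.bincount(labels.ravel())[1:].max())

print("unsmoothed max cluster:", max_cluster_size(noise))
print("smoothed max cluster:  ", max_cluster_size(smoothed))
```

The smoothed field produces far bigger noise-only clusters at the same voxelwise threshold, so the simulation must see the same smoothness as the data you will cluster.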


Hi Brady,

Paul has many good things to say, of course. One thing to focus on is simply that your computer, in its current state, is not really good enough for that computation, even at 1000 iterations. Taking more than a day for 1000 iterations means it is already thrashing, spending most of its time swapping memory in and out because it does not have enough RAM for what it is doing. Doing 10 times as many iterations will possibly take far more than 10 times as long, because memory use will increase, and the thrashing will get worse.

Exactly how big are those inputs?

3dinfo -n4 -datum p04_DN+tlrc

For comparison, I went to AFNI_data6/group_results and modified s5.ttest.paired to be more like your command:

  • added -Clustsim and maybe -DAFNI_TTEST_NUMCSIM=1000
  • deleted the -setB line, so there would be 20 input dsets
    Running this took 5 minutes on my 16 GB RAM laptop, and 7 minutes on my weaker 8 GB RAM laptop. The data resolution might be lower than yours, but to put it another way, if your computer has enough resources, the time should be in that range (scaled by the resolution or the number of voxels, in some way).

Consider that for 1000 or 10000 iterations, the program is running t-tests and cluster simulations on the residuals. That means storing a time series of length 1000 or 10000 in RAM to do so. Going to 10000, my 8 GB RAM laptop started thrashing somewhere around 2000. There is no way it would make it to 10000.
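As a sanity check, the arithmetic lines up with the numbers in the 3dttest++ log quoted later in this thread (2,640,253 mask voxels, 10,000 iterations, 2 bytes per short-ized value in the .sdat files):

```python
# RAM/disk estimate for the 3dttest++ -Clustsim pseudo-data (.sdat) files
n_mask_vox = 2_640_253   # voxels in the group mask (from the log)
n_iter     = 10_000      # number of -Clustsim iterations
bytes_per  = 2           # .sdat files store short-ized (2-byte) values

total_bytes = n_mask_vox * n_iter * bytes_per
print(f"{total_bytes:,} bytes ~ {total_bytes / 1e9:.1f} GB")
```

That reproduces the “52,805,060,000 (53 billion) bytes” warning in the log, versus the ~8.6 GB of RAM on the iMac.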

Maybe there are other processes on it that could be shut down, such as web browsers or MS programs (Word, PowerPoint, etc.). Those use a lot of RAM. But it seems highly unlikely that your 8 GB iMac would be able to run to 10000. And if this data is higher resolution (my test was with 2 mm voxels), game over.

Because it needs to store those iteration results, the program will need substantially more RAM than a simple t-test. Do you have any better machine it could be run on?

  • rick

For reference, my 16 GB laptop cannot complete 10000 iterations, either. It gets about 2/3 of the way through (the randomsign step) before running out of memory and crashing. The number of threads does not seem to strongly affect this (some programs scale memory use with the number of threads; this one does not appear to).

  • rick

Hi ptaylor,

First of all thank you for the highly detailed reply, especially considering the holiday! I also appreciate the extra resources for transparent thresholding and the helpful Python commands. I’ll read more into the former and let you know if I have questions. As for the Python commands, unfortunately I can’t install Python on our machine because it’s an older iMac that requires an update to do so, and some of the people in the lab are apprehensive about updating the macOS when we are mid-analysis. So for now I suppose I’ll stick with going straight to the 3dttest++ command itself! Thanks again for the suggestions.

Here are my replies in turn:

  1. Great, so if the number of iterations doesn’t matter much then I’ll just play it safe and leave it on the default number (which I believe is 10,000). Since you asked, there are 18 subjects in the analysis and 2,640,253 voxels in the mask. I’ll paste the entire 3dttest++ output at the end of this post. I also confirmed that the iMac I’m using is a quad-core CPU and the commands you offered also confirm there are 4 cores being used. The analyses I was running earlier that took multiple days seemed to be because the computer was going to sleep after some time and pausing the analysis. Once I changed some system settings to disallow this, the computer was able to run the analysis in 2-3 hours. This is still longer than expected but is more doable. To be clear, the analysis is working without any issues, it just takes some time. I’ll reply to Rick’s posts with more details on this front.

  2. Gotcha - one simulation per contrast.

  3. As far as I know we don’t plan on doing ETAC, so I will try to use the smoothed data files. I’ll also try and look back in our procedure to see if any smoothing was done earlier (I inherited this project part way along the analysis pipeline).

I was following a post on Andy’s Brain Blog and I had a few new questions if you don’t mind. I’m referring to this guide: https://andysbrainbook.readthedocs.io/en/latest/AFNI/AFNI_Short_Course/AFNI_07_GroupAnalysis.html

  4. When creating a composite mask, is it best to use ‘union’ or ‘intersect’? In the linked tutorial, the author used the ‘-union’ option but named the resulting mask ‘intersection’.
  5. Also, I’m wondering how to use/read the output from the -ClustSim analysis. I understand the differences between NN1 vs. NN2, bisided vs. one-sided, etc., but what I can’t figure out is how to ‘read’ the table of cluster thresholds. From the blog guide linked above, it seems that you would typically want to set alpha to .05 by using the column labelled ‘.05000’. But then what are the row headers for? How can you use alpha = .05 and p < .001 at the same time? And further, why do smaller p-values in the row headers seem to require smaller cluster thresholds? At face value, it seems you can set lower p-values (which seems more robust) while at the same time having it be easier to find significant clusters, because the minimum cluster size gets smaller and smaller as the rows go down.

  6. Finally, when I load the clusterized mask and functional data into AFNI, I presumably use the zscore as the ULAY and OLAY rather than the ‘mean’ option. To produce images of the corrected data based on the newly found cluster thresholds, do I then use the ‘Clusterize’ button and set the minimum cluster size and NN option there too, then set p = .001? Or does the zscore data loaded into the OLAY and ULAY already account for cluster thresholding naturally without having to use the Clusterize button?

Finally, if it helps you to see more details, here is the code and output I got when using 3dttest++:

v1040-wn-rt-b-34-154:anova_TEST mafernan_admin$ AFNI_TTEST_NUMCSIM=10000
v1040-wn-rt-b-34-154:anova_TEST mafernan_admin$ DAFNI_TTEST_NUMCSIM=10000
v1040-wn-rt-b-34-154:anova_TEST mafernan_admin$ cd /Users/mafernan_admin/Desktop/anova_TEST
v1040-wn-rt-b-34-154:anova_TEST mafernan_admin$ 3dttest++ -ClustSim -prefix DW-Con-Results                    \
>            -mask /Users/mafernan_admin/Desktop/anova_TEST/mask_union+tlrc.HEAD \
>            -setA DW_Con MM03 "/Users/mafernan_admin/Desktop/anova_TEST/p03_DW+tlrc" \
>                          MM04 "/Users/mafernan_admin/Desktop/anova_TEST/p04_DW+tlrc" \
>                          MM05 "/Users/mafernan_admin/Desktop/anova_TEST/p05_DW+tlrc" \
>                          MM06 "/Users/mafernan_admin/Desktop/anova_TEST/p06_DW+tlrc" \
>                          MM07 "/Users/mafernan_admin/Desktop/anova_TEST/p07_DW+tlrc" \
>                          MM08 "/Users/mafernan_admin/Desktop/anova_TEST/p08_DW+tlrc" \
>                          MM09 "/Users/mafernan_admin/Desktop/anova_TEST/p09_DW+tlrc" \
>                          MM10 "/Users/mafernan_admin/Desktop/anova_TEST/p10_DW+tlrc" \
>                          MM11 "/Users/mafernan_admin/Desktop/anova_TEST/p11_DW+tlrc" \
>                          MM14 "/Users/mafernan_admin/Desktop/anova_TEST/p14_DW+tlrc" \
>                          MM15 "/Users/mafernan_admin/Desktop/anova_TEST/p15_DW+tlrc" \
>                          MM16 "/Users/mafernan_admin/Desktop/anova_TEST/p16_DW+tlrc" \
>                          MM17 "/Users/mafernan_admin/Desktop/anova_TEST/p17_DW+tlrc" \
>                          MM18 "/Users/mafernan_admin/Desktop/anova_TEST/p18_DW+tlrc" \
>                          MM19 "/Users/mafernan_admin/Desktop/anova_TEST/p19_DW+tlrc" \
>                          MM20 "/Users/mafernan_admin/Desktop/anova_TEST/p20_DW+tlrc" \
>                          MM21 "/Users/mafernan_admin/Desktop/anova_TEST/p21_DW+tlrc" \
>                          MM101 "/Users/mafernan_admin/Desktop/anova_TEST/p101_DW+tlrc"
++ 3dttest++: AFNI version=AFNI_19.2.24 (Sep 13 2019) [64-bit]
++ Authored by: Zhark++
++ Number of -Clustsim threads set to 4
 + Default clustsim prefix set to 'TTnew'
++ 2640253 voxels in -mask dataset
++ option -setA :: processing as LONG form (label label dset label dset ...)
++ have 18 volumes corresponding to option '-setA'
++ loading -setA datasets
*+ WARNING: 3dttest++ -setA :: 7150 vectors are constant
++ t-testing:0123456789.0123456789.0123456789.0123456789.0123456789.!
 + skipped 7150 voxels completely for having constant data
++ ---------- End of analyses -- freeing workspaces ----------
++ Creating FDR curves in output dataset
*+ WARNING: Smallest FDR q [1 DW_Con_Zscr] = 0.789136 ==> few true single voxel detections
 + Added 1 FDR curve to dataset
++ Output dataset ./DW-Con-Results+tlrc.BRIK
++ Output dataset ./TTnew.resid.nii
++ ================ Starting -ClustSim calculations ================
 + === temporary files will have prefix TTnew ===
 + === running 4 -randomsign jobs (2500 iterations per job) ===
 + === creating 52,805,060,000 (53 billion) bytes of pseudo-data in .sdat files ===
 + --- 3dClustSim reads .sdat files to compute cluster-threshold statistics ---
 + --- there is 8,589,934,592 (8.6 billion) bytes of memory on your system ---
*+ WARNING: --- runs may be slow (or crash) due to memory needs :( ---
++ 3dttest++: AFNI version=AFNI_19.2.24 (Sep 13 2019) [64-bit]
++ Authored by: Zhark++
++ 2640253 voxels in -mask dataset
++ option -setA :: processing as SHORT form (all values are datasets)
++ have 18 volumes corresponding to option '-setA'
++ random seeds are 964209600 1278368865
++ opened file ./TTnew.0000.sdat for output
++ loading -setA datasets
*+ WARNING: 3dttest++ -setA :: 7150 vectors are constant
++ t-test randomsign:0123456789.0123456789.0123456789.0123456789.0123456789.!
++ saving main effect t-stat MIN/MAX values in ./TTnew.0000.minmax.1D
++ output short-ized file ./TTnew.0000.sdat
 + 3dttest++ ===== simulation jobs have finished (600351.6 s elapsed)
 + 3dttest++ ===== starting 3dClustSim A: elapsed = 600362.7 s
++ 3dClustSim: AFNI version=AFNI_19.2.24 (Sep 13 2019) [64-bit]
++ Authored by: RW Cox and BD Ward
++ Loading -insdat datasets
++ -insdat had 10000 volumes = new '-niter' value
++ Startup clock time = 0.2 s
++ Using 4 OpenMP threads
++ Clock time now = 5279.9 s
++ Command fragment to put cluster results into a dataset header;
 + (also echoed to file TTnew.CSimA.cmd for your scripting pleasure)
 + Append the name of the datasets to be patched to this command:
 3drefit -atrstring AFNI_CLUSTSIM_NN1_1sided file:TTnew.CSimA.NN1_1sided.niml -atrstring AFNI_CLUSTSIM_NN2_1sided file:TTnew.CSimA.NN2_1sided.niml -atrstring AFNI_CLUSTSIM_NN3_1sided file:TTnew.CSimA.NN3_1sided.niml -atrstring AFNI_CLUSTSIM_NN1_2sided file:TTnew.CSimA.NN1_2sided.niml -atrstring AFNI_CLUSTSIM_NN2_2sided file:TTnew.CSimA.NN2_2sided.niml -atrstring AFNI_CLUSTSIM_NN3_2sided file:TTnew.CSimA.NN3_2sided.niml -atrstring AFNI_CLUSTSIM_NN1_bisided file:TTnew.CSimA.NN1_bisided.niml -atrstring AFNI_CLUSTSIM_NN2_bisided file:TTnew.CSimA.NN2_bisided.niml -atrstring AFNI_CLUSTSIM_NN3_bisided file:TTnew.CSimA.NN3_bisided.niml 
++ Environment variable AFNI_DONT_LOGFILE already set to 'NO'. Value of 'YES' from /Users/mafernan_admin/.afnirc is ignored. 
To kill such warnings Set AFNI_ENVIRON_WARNINGS to NO
++ 3drefit: AFNI version=AFNI_19.2.24 (Sep 13 2019) [64-bit]
++ Authored by: RW Cox
++ Processing AFNI dataset ./DW-Con-Results+tlrc.HEAD
 + atrcopy
++ applying attributes
++ 3drefit processed 1 datasets
++ 3dttest++ ----- Cleaning up intermediate files:
 + 3dttest++ =============== -ClustSim work is finished :) ===============
++ ----- 3dttest++ says so long, farewell, and happy trails to you :) -----

Hi Rick,

Thanks for your insight into why this analysis may be taking so long! To clarify, the analyses I run do work in the end, they just take some time. I also discovered that the computer was going to sleep after some time left unattended, so the multi-day analysis I mentioned earlier was a false alarm. After adjusting the system settings to avoid sleep mode, I was able to do an analysis in 1-3 hours. This is obviously still longer than the ~5-10 minutes you seem to be getting, but at least the analyses are working.

As I understand it, the default number of simulations in 3dttest++ is 10,000. I was able to run an analysis with this default overnight; it seemed to take about 2 hours and converged just fine in the end. The output of 3dttest++ does mention that it requires something like 52 GB of RAM for the analysis, and that my system only has around 8 GB (see the code pasted in my other reply to ptaylor). As for using a better machine, there are some logistical/ethical rules on our end in terms of data privacy, so we hope to keep the data on this one machine. For now, then, I’ll ask our IT department about some RAM upgrades to this iMac, and otherwise I will keep chugging along with an analysis finishing every 2-3 hours. I don’t have that many contrasts, so it shouldn’t take more than a few days.

  • Brady

Hi, Brady-

OK, understood about the hardware constraints.

Re. 1) Wow, that is a huge number of voxels in the mask! Can I ask what spatial resolution the output data has? For a 2.5 mm isotropic EPI output dataset in the Bootcamp example, there are only 68,809 voxels. So, are your output voxels around 0.75 mm isotropic or so?
→ and sorry to hear your computer got bored with AFNI running and would go to sleep, but glad that was resolvable and made the processing time reasonable.

Re. 4) There are different routes to take with forming a group mask. I guess the main idea is that you want something representative of the group. Two contending ways to do this might be:
A) use the strict intersection of masks across the group
B) use something less strict like voxels that overlap 70% of the group masks.
(“Union” might be a bit overly generous; a higher percentile overlap might be preferable, as above.) In both cases, you would probably be using the mask_epi_anat*HEAD dsets, if using afni_proc.py. I guess you might consider A preferable if the per-subject maps themselves are less strict, and B if they are more strict. This is a fuzzy response, perhaps, but there are likely multiple reasonable ways to go (“semi-arbitrary” choice).

  • in the NARPS processing in the “Highlight, Don’t Hide…” paper github code, the afni_proc.py processing had the ‘blur’ block precede the ‘mask’ block; making the mask there, we then used strict intersection to generate the ‘group level’ mask for cluster simulation:

3dmask_tool                                                                  \
    -prefix  group_mask.inter.nii.gz                                         \
    -frac    1.0                                                             \
    -input   ${all_mask}

  • in another recent processing project, the afni_proc.py processing had the ‘mask’ block precede the ‘blur’ block; we didn’t create a group-level mask for it, but we might lean toward a 70% overlap then:

3dmask_tool                                                                  \
    -prefix  group_mask.7.nii.gz                                             \
    -frac    0.7                                                             \
    -input   ${all_mask}

Re. 5) There is a bit of guidance on reading the table output here:
… in Fig. 2 (and a bit more in Fig. 3).

In the cluster-table output, there are simulation-based cluster-size thresholds (as voxel counts) for lots of different scenarios. A scenario in this case is: what per-voxel p-value do you want to threshold at, and then what FWE value (which is what “alpha” is) do you want the final (adjusted) result to approximately be at? The p-value choices are listed in the row labels, and the FWE (or alpha) values are listed in the column labels. Typically, people want an FWE of 5%, which is “0.05” there—that is pretty standard. On the p-value side, people often tend to like 0.001 because… well, because. The paper cited in #4 points out how adjustments need to be made if using software that performs pairs of one-sided tests separately; if you don’t adjust, you end up with a more-than-doubled false positive rate relative to what you think you have. Typically, bisided would be most appropriate for most analyses. (But again, the “Highlight, Don’t Hide…” paper discusses why visualizing sub-threshold results is still very important.)

The question of “Why do cluster thresholds get smaller as the p-value shrinks?” comes up a lot. It does seem counterintuitive, but recall what is being done here: we say that we want to threshold the voxels at some p-value (say, 0.001), but because we have so many voxels being thresholded at once, we don’t think that the 0.001 accurately reflects the false positive rate in our results. There is a call to do a “multiple comparisons adjustment”. Clustering is an ad hoc way of doing that, suggesting that clumps of low-p voxels should be more believable than individual ones (the latter being more likely due to scattered noise, say). So, this is a two-step process:

  • first, threshold by p-value and get “islands” of candidate locations
  • then, filter to those to keep only ones “big enough” to be believable at a given FWE
    So, the cluster size needed to achieve a given FWE depends on the first thresholding that was done. And as the thresholds in the first “cut” get stricter, all the islands get smaller; therefore, the clustersize threshold to choose among them also gets smaller.
    Thus, as you move “up” an FWE column with shrinking p-values, the clustersize threshold shrinks, because all the islands you are filtering have also shrunk.
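That two-step logic can be sketched on pure noise (a toy Python stand-in for the simulations; the grid size is arbitrary and the connectivity is NN1-like face-touching): stricter voxelwise thresholds leave only smaller islands, so the maximum surviving cluster shrinks too.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import norm

rng = np.random.default_rng(0)
zvol = rng.standard_normal((40, 40, 40))    # pure-noise "z-stat" volume

def max_cluster_size(zvol, p):
    zthr = norm.isf(p)                      # one-sided z cutoff for this p
    labels, n = ndimage.label(zvol > zthr)  # face-touching clusters (NN1-like)
    return 0 if n == 0 else int(np.bincount(labels.ravel())[1:].max())

for p in (0.01, 0.005, 0.001):
    print(p, max_cluster_size(zvol, p))     # stricter p -> smaller islands
```

Step 1 is the `zvol > zthr` cut; step 2 would then keep only clusters at least as big as the table value for that (p, alpha) pair.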

Re. 6) I’m a little confused by the terminology here. If I wanted to visualize the output dataset from “3dttest++ -Clustsim …”, I would

  • Use the UnderLay button to load the anatomical reference (say, template data) as underlay
  • Use the OverLay button to load the output dataset as the overlay; this has multiple subbricks in it, and I would:
    • go to the “OLay” selector in the overlay panel and use the mean volume as the overlay (the colors to view)
    • go to the “Thr” selector in the overlay panel and use the Zscr volume as the thresholding volume (to select voxels to see by significance).
  • Then you can select the threshold value based on your p-value threshold of choice (e.g., right-click on “Thr” next to the “A” and “B” buttons, and choose “Set p-value”)
  • Then click Clusterize and enter your chosen NN, sidedness, and cluster-output value from the table (consistent with your p-value threshold).
    That would be a standard view of your clusterized results. BUT, having read the “Highlight, Don’t Hide…” paper, I would also:
  • Turn on the “alpha” and “boxed” functionality by clicking the “A” and “B” buttons, because sub-threshold results matter, too.
  • and actually, I might consider calculating a straight-up t-test without masking, and visualizing its effect estimate and T-stat volume, applying the thresholding+clustering and “A” and “B” buttons, and viewing the results there, which should look the same except I could see everywhere in the whole FOV—that way, I would be more certain that no artifacts or anything are leaking into my brain results.
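On the “Set p-value” step: the GUI does the p-to-statistic conversion for you, but for reference the same conversion can be sketched with scipy (the numbers here are hypothetical; e.g., 18 subjects would give 17 degrees of freedom for a one-sample t-test):

```python
from scipy.stats import norm, t

p = 0.001                      # two-sided voxelwise p-value
z_thr = norm.isf(p / 2)        # threshold if thresholding on a Zscr volume
t_thr = t.isf(p / 2, 17)       # threshold for a t-stat with 17 dof
print(f"|z| >= {z_thr:.3f}, |t| >= {t_thr:.3f}")
```

Dividing p by 2 splits it across both tails, matching a two-sided/bisided question.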


Hi ptaylor,

Thanks once again for the detailed responses. It’s been super helpful.

Re 1) Sorry, but I am just now coming in to take over the project at the cluster-forming stage, so I’m not sure what spatial resolution the output data has. Is there code I can run to check?

Re 4) My (perhaps oversimplified) takeaway from this point was that if you go ‘mask then blur’, it’s reasonable to use intersection, and if you go ‘blur then mask’, it’s reasonable to use 70% overlap. Our processing pipeline was the latter scenario, so I’ve now created masks using 70% overlap, which certainly includes fewer voxels than ‘union’ did (1,700,999 voxels now, to be exact), and therefore the analysis runs a bit quicker. However, your exclamation at the number of voxels in the mask makes me wonder if something went wrong at an earlier stage of pre-processing. Are there any earlier stages I should perhaps check for errors?

Re 5) Okay great, thank you for all of these resources! In Figure 3 of the paper you linked, it talks about a conversion process when running two one-sided tests instead of a single bi-sided test. Am I correct in assuming that if one were to run a single bi-sided test (e.g., using the ‘TTnew.CSimA.NN2_bisided’ cluster simulation table output file) then no conversion is needed? Meaning, one can use the table as-is?

Re 6) Perfect, this was exactly what I needed to form the images! That paper you linked was also very interesting (Figure 1 and Box 1 were especially illuminating). I’ll certainly consider featuring images with transparent thresholding for the sake of reproducibility. I liked the approach of Figure 1 where you can show a ‘normal’, easy-to-read activation pattern, and another set of images that show all the data. Thanks for the suggestion!

Hi, Brady-

Re. 1) You can run:

3dinfo -ad3 DATASET

… to see the voxelsize. Also, you might check:

3dinfo -n4 DATASET

… to see the dimensions (3 spatial, 1 temporal). Perhaps just:

3dinfo DATASET

would be useful, to view lots of the header info/properties.

Re. 4) That takeaway sounds reasonable, as a general rule of thumb. I would always try to look at the output mask and see that it is sensible: overlay it on your reference template, and see that it has good coverage, esp. in regions you care about for your analysis.
Flipping through the 18 individual masks that comprise it would also be useful—those are shown in the APQC HTML output: in va2t if using a template (which you likely are); in ve2a if only using the subject anatomical as the final “space”; in vorig if not even using a subject anatomical (rare bird). See if any one pops out as a weird outlier, or if parts of the brain you focus on in your analysis don’t get included (e.g., due to signal dropout—looking at you, subcortical nuclei!). If you have a glob over all the masks you are using, you could use that same glob to open them in the AFNI GUI and flip through them, too.

In the APQC HTML, you can check pretty quickly on each subject’s EPI-anatomical alignment (“ve2a” block) and anatomical-template alignment (“va2t” block). I would definitely check those out.

Re. 5) In most of MRI, hypotheses are of the style “is X nonzero here?” or “is the response to stimulus A different than to stimulus B here?”. Each of those requires 2-sided testing to address: checking whether X>0 or X<0, or whether A>B or A<B, respectively. Some software packages do each of those subcomparisons separately, outputting two separate volumes. You wouldn’t be able to use a single “bisided” clustering approach on that output directly; you would have to merge the volumes together first, or (more dangerously) test each one separately with values adjusted from what you might otherwise expect, because of the separate testing. Grumble.
If you are using a stats program in AFNI, you should generally be getting a single volume out, and you could check it for either “side” of the comparison separately, but why? Using “bisided” is probably the most appropriate thing (it differs from 2sided only in that positive and negative clumps are considered separately, even if they share boundaries). In such a case, you asked a 2-sided question and used a 2-sided approach to answer it, and there should be no need for conversion (which only comes about when separating the question into pieces; grumble again).
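To make the bisided-vs-2sided distinction concrete, here is a tiny Python sketch on a made-up 1D line of t-values (hypothetical numbers, purely illustrative): touching suprathreshold voxels of opposite sign form one cluster under “2sided” but separate clusters under “bisided”.

```python
# Toy 1D clustering: 2sided merges any touching suprathreshold voxels;
# bisided starts a new cluster whenever the sign flips.

def clusters(tvals, thr, bisided):
    out, cur_sign = [], 0
    for t in tvals:
        if abs(t) < thr:
            cur_sign = 0        # subthreshold gap ends any cluster
            continue
        sign = 1 if t > 0 else -1
        # start a new cluster on a gap, or (bisided only) on a sign flip
        if cur_sign == 0 or (bisided and sign != cur_sign):
            out.append(0)
        out[-1] += 1
        cur_sign = sign
    return out                  # list of cluster sizes

tvals = [0.5, 3.1, 3.4, -3.2, -3.0, 0.2, 2.9, 3.3]
print(clusters(tvals, 2.5, False))  # 2sided: one mixed-sign cluster + one
print(clusters(tvals, 2.5, True))   # bisided: the mixed cluster splits in two
```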

Re. 6) Cool, glad that is useful.


Hi ptaylor,

Thanks once again for the help. We’re getting much closer to resolved!

Re 1): I ran the commands you provided on a sample subject data file in original space; here are the outputs:

3dinfo -ad3 DATASET


2.750000 2.750000 5.000000

3dinfo -n4 DATASET


80 80 26 1

3dinfo DATASET


++ 3dinfo: AFNI version=AFNI_19.2.24 (Sep 13 2019) [64-bit]

Dataset File: DW+orig
Identifier Code: AFN_Wz_yXl6e2Jv-HGG0NxFH6Q Creation Date: Fri Jun 24 15:34:38 2022
Template Space: ORIG
Dataset Type: Intensity (-fim)
Byte Order: LSB_FIRST [this CPU native = LSB_FIRST]
Storage Mode: BRIK
Storage Space: 665,600 (666 thousand) bytes
Geometry String: "MATRIX(2.75,0,0,-111.1003,0,2.75,0.04542,-112.6167,0,-0.024981,5,-12.63179):80,80,26"
Data Axes Tilt: Oblique (0.521 deg. from plumb)
Data Axes Approximate Orientation:
first (x) = Right-to-Left
second (y) = Anterior-to-Posterior
third (z) = Inferior-to-Superior [-orient RAI]
R-to-L extent: -111.100 [R] -to- 106.150 [L] -step- 2.750 mm [ 80 voxels]
A-to-P extent: -112.617 [A] -to- 104.633 [P] -step- 2.750 mm [ 80 voxels]
I-to-S extent: -12.632 [I] -to- 112.368 [S] -step- 5.000 mm [ 26 voxels]
Number of values stored at each pixel = 1
-- At sub-brick #0 '#0' datum type is float: -3.12815 to 3.19592

So altogether it looks like my original voxel size is 2.75 x 2.75 x 5 mm^3, which is in the same ballpark as the 2.5 mm example data set you mentioned earlier. Yet that example data set had 68,809 voxels, whereas my 70% intersection mask contains ~1.7 million voxels. Do you think something possibly went wrong in the processing pipeline that would lead to the ~1.7 million voxels I’m seeing in my 70% intersection mask?
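As a quick internal-consistency check on the header above: the voxel sizes from `3dinfo -ad3` should equal the column norms of the 3x4 affine inside the Geometry String. A small Python sketch using the numbers printed in that output:

```python
import math
import re

# Cross-check: voxel sizes from "3dinfo -ad3" vs. the column norms of the
# 3x4 affine in the Geometry String of the full 3dinfo dump above.
geom = ("MATRIX(2.75,0,0,-111.1003,0,2.75,0.04542,-112.6167,"
        "0,-0.024981,5,-12.63179):80,80,26")
nums = [float(x) for x in re.findall(r"-?\d+\.?\d*", geom.split(":")[0])]
rows = [nums[0:4], nums[4:8], nums[8:12]]   # three rows of the 3x4 affine
edges = [math.sqrt(sum(rows[i][j] ** 2 for i in range(3))) for j in range(3)]
print([round(e, 3) for e in edges])  # matches the -ad3 output: 2.75, 2.75, 5.0
```

The tiny off-diagonal terms are the 0.521-degree obliquity noted in the header; they barely change the edge lengths.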

Re 4): My predecessors did not use afni_proc.py to do the pre-processing. Rather, they used an old guide that was provided by researchers from Baycrest. So, I do not have the QC output from afni_proc.py. Is there a way I can obtain that without needing to redo everything?

If it helps, I did take your suggestion and checked that the intersection mask seems okay visually. We are actually not using an anatomical template, but rather an average anatomical image from the 18 subjects. I attached some images below showing that the 70% intersection mask fits well within the average anatomical image; when scrolling through the 3 planes in the AFNI GUI, I did not notice anything obviously missing or extending out past the skull.

Re 5): That makes complete sense now, thanks! Since we are comparing two conditions, I will stick with single, bisided tests without any conversion between tables.

Hi, Brady-

Cool, thanks.

Re 1) That must be an input dataset you included the 3dinfo output for? The mask+underlay images you show appear to have much, much higher resolution (and a spatially isotropic one, too). It is great to have the initial resolution information, but I suspect the processing pipeline included a very large upsampling factor, then? Frankly, those images look consistent with anatomical resolution or higher.

The 2.75x2.75x5.00 mm**3 voxels are about 2.4 times the volume of the ones I listed, so a similar whole-brain mask made at that resolution would have fewer than 30k voxels. Again, the images you show appear to have very, very high spatial resolution. I wonder if part of the reason for upsampling is to try to counter some of the anisotropy of the voxels?
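For the record, a quick back-of-envelope check of the numbers in this thread (pure arithmetic in Python; the 68,809-voxel mask is the 2.5 mm isotropic example mentioned earlier):

```python
# Back-of-envelope check on the voxel counts discussed above.
vox_acq = 2.75 * 2.75 * 5.0    # acquired voxel volume, mm^3
vox_ref = 2.5 ** 3             # 2.5 mm isotropic example, mm^3
ratio = vox_acq / vox_ref
print(round(ratio, 2))         # acquired voxels are ~2.4x larger

# A 68,809-voxel whole-brain mask at 2.5 mm iso, regridded to the
# acquired resolution, would have roughly:
print(round(68809 / ratio))    # fewer than 30k voxels

# ...whereas ~1.7 million mask voxels over about the same brain volume
# implies upsampling to sub-millimeter voxels:
implied_vox = 68809 * vox_ref / 1.7e6   # implied voxel volume, mm^3
print(round(implied_vox ** (1 / 3), 2), "mm (approx. isotropic edge)")
```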

While some software upsamples highly by default, I am not quite sure what purpose it would serve, esp. to this high a resolution: upsampling won’t add information, and it will greatly increase computational time and disk usage. Is there a particular analysis that would require this? It is surprising to me.

Re 4) Hmm, if you had a one-to-one replacement file for each item in the AP results directory that goes into the APQC HTML, it might be possible to kinda retrofit that with the generating script, but I doubt it can be done very easily… Do you still have a lot of intermediate/output files from the processing? Frankly, setting up a new AP command would likely be faster, because you just need the input data. But I understand that might not be a popular choice.

That mask does show quite reasonable coverage, indeed. (And using any anatomical reference volume to define a final space is OK, in terms of processing setup and execution).

Re 5) Sounds good.