Anomalistic z-scores

I have computed z-scores using 3dttest++ with the -Clustsim flag to correct for multiple comparisons. Everything seemed to be going well until I discovered that, for a few of my analyses, I am getting clusters with a peak z-score of 13. In these clusters, when I click around off the peak, I get more regular-seeming z-scores (e.g. not whole numbers), and while the other voxels in the cluster often have pretty high z-scores (greater than 7.0), I haven’t seen voxels with z-scores above 10. Finally, the area of a cluster where the z-score is 13 is sometimes quite large (100s of voxels), rather than the single voxel or couple of voxels that a peak usually is. All of the other analyses run with the same script seem fine, producing totally normal z-scores. Could it be possible that AFNI is maxing out z-scores at 13? Or that AFNI sets any z-score above a certain threshold to 13?

Thanks for your help!

The help for 3dttest++ says that is correct:

  • The largest Tstat that will be output is 99.
  • The largest Zscr that will be output is 13.
    ++ FYI: the 1-sided Gaussian tail probability of z=13 is 6.1e-39.
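For anyone who wants to check that tail probability independently, here is a minimal sketch using only the Python standard library. The function name `upper_tail` is made up for this illustration and is not part of AFNI.

```python
import math


def upper_tail(z):
    """One-sided Gaussian tail probability P(Z > z) for a standard normal."""
    # erfc keeps precision in the far tail, where 1 - cdf would round to 0.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p13 = upper_tail(13.0)
print(f"{p13:.1e}")  # 6.1e-39, matching the value quoted in the help text
```

Probabilities this small are far below anything a group analysis can meaningfully resolve, which is presumably why a cap is applied.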

I take your reference to the Gaussian probability as a subtle hint that I likely have an error. :slight_smile:

Thank you for your help!

After your low probability hint I did some troubleshooting to root out the source of the error and I am honestly stumped. Hoping you guys can help me figure it out.

Again, just to repeat the problem, for my group-level analyses, I am running t-tests using 3dttest++ with the -Clustsim flag to correct for multiple comparisons. For a subset of my results, I am getting voxels with a z-score of 13. I understand now that AFNI is just maxing out the z-score, so I am trying to figure out why I am getting these voxels with such a high z-score. Here’s what I’ve done so far:

  1. I thought perhaps the issue was that there was only 1 data point for the voxels where I am getting z=13, causing some kind of error when computing the standard deviation or standard error and leading to these maxed out z-scores. I double checked and there are 40 data points going into the analysis for these voxels (and for my study N=40).

  2. I thought perhaps for one subject the data at this voxel might just be an extreme outlier produced by some kind of error, thus leading to an extreme mean and an extreme z-score. However, I manually inspected all of the individual subjects' values and they are all reasonable. Moreover, the mean value across subjects for these voxels (where z=13) is comparable to the mean of surrounding voxels that have reasonable z-scores (and at times even slightly lower). For example, the mean value for a z=13 voxel is 13.39; the mean value for an adjacent voxel is 16.62, but the z-score for that voxel is only 8.08.

  3. This made me think that something was going wrong in the computation of the z-scores themselves. So I re-ran 3dttest++ without the -Clustsim flag to get t-stats. As far as I can tell, the t-stats are normal. At the voxel referenced above where z=13 and the mean is 13.39, the t-stat is 10.65. At the adjacent voxel, where the mean value is 16.62, the t-stat is 10.69.

All of this would suggest that my crazy z-scores are arising at some point during the cluster correction, when converting the t-scores to z-scores. But for the life of me, I can’t figure out what I am doing wrong. To make things more confounding, the vast majority of my analyses using this same script aren’t producing these wild z-scores.

I would really appreciate any thoughts or ideas you have and thank you so much for your help!

You might be reading more than I intended into my previous response, which was really only a report of the help output and numerical precision. It is useful, though, to take a close look at your data, and we will never recommend otherwise. If you need help with the analysis, we will need at least the exact command you used to start. It may be, in fact, that your results are not anomalous. I will say, however, that we tend to interpret the statistical test as a threshold, and not as the main result. This is in contrast to other software packages. We will generally look at beta coefficients, interpretable as percent signal change in our recommended pipelines for FMRI. So whether a p-value is less than some very small number or another infinitesimal number is less important than the effect size. Others may disagree.

For some experiments, the model fits very well. Our class data includes an example for a single subject where the threshold does reach these kinds of heights. That data is for a somatosensory task, and the EPI data is an excellent fit. For your data, you may want to look closely at the underlying data and the fit to the model to see if it does indeed fit well. For any particular voxel, without taking into account clustering, you can take the data from all the individual subjects and analyze them with separate statistical software to verify the basic t-test.
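As a rough sketch of that kind of independent check, the one-sample t-statistic against zero for a single voxel can be computed by hand. The function name `one_sample_t` and the five numbers below are placeholders invented for illustration; the real input would be the per-subject values extracted at the voxel of interest.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t, degrees of freedom) for a one-sample t-test against mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Placeholder data, NOT values from the thread.
t_stat, dof = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0])
print(t_stat, dof)
```

Besides the t-value itself, it is worth confirming that the degrees of freedom equal the number of subjects minus one, since that confirms how many values actually entered the test.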

Hi Daniel,

Thank you so much for getting back to me. I am still not convinced that these z-scores of 13 are not some kind of error, particularly because the t-values for these voxels produced by 3dttest++ do not seem to be extreme (or not more extreme than the t-values for neighboring voxels). For example, voxel A has a t-score of 10.65 and a z-score of 13. Its neighbor, voxel B, has a t-score of 10.23, but a z-score of 8.13.

For what it’s worth, the command I am running is as follows:
3dttest++ -prefix belvthink_belvdesthink_1 -setA shellgame__belvthink_resampled -mask belvdesthink_1_dilate1+tlrc -Clustsim 8

Just to make sure it was not some kind of issue being produced with the cluster correction, I reran this analysis without -Clustsim but including the -toz flag as follows:
3dttest++ -prefix belvthink_belvdesthink_1 -setA shellgame__belvthink_resampled -toz -mask belvdesthink_1_dilate1+tlrc
This produced the same aberrant z-scores so it seems as though the error is occurring in the t to z conversion that is done with -toz. Can you tell me a little bit more about what happens in -toz? How does AFNI convert the t-scores to z-scores?

Thank you again for your help!

At this point, I (the author of 3dttest++) am a little confused.
Rather than kibitz your problem, I’ll need to look at it.

For that purpose, I’ll need access to your data and the commands used to run the analyses.
Either you can put it on a place where I can get at it (e.g., DropBox), or you can upload it to our server – I believe the correct address for that is still

Please follow the instructions closely. If the upload fails (it’s been a while since I used that link), then I’ll have to get someone like Daniel to chime in on the correct upload page.

Hi Regan,

I sent you a PM with instructions for uploading to the afni server. Please let me know if you have any trouble.

- rick

Thanks all! Sending my data and code now!

Hi Regan,

Thanks for the data. It looks possible that the t to z conversion might have precision difficulty for large t-stats.
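To make the conversion concrete, here is a minimal sketch of one generic way to turn a t-statistic into a z-score: find the one-sided tail probability under Student's t, then find the z with the same tail probability under the standard normal. This is an assumption about the general technique, not a description of AFNI's actual code, and the names `t_upper_tail` and `t_to_z` are hypothetical. The example uses the t-value quoted in the thread (10.65) with df = 39, as expected for N = 40.

```python
import math
from statistics import NormalDist


def t_upper_tail(t, df, steps=4000):
    """One-sided P(T > t) for Student's t with df degrees of freedom.

    The substitution x = sqrt(df) * tan(theta) turns the tail integral into
    a finite integral of cos(theta) ** (df - 1), done with Simpson's rule.
    Adequate for moderate df; raise `steps` (kept even) for very large df.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(math.pi))
    a = math.atan(t / math.sqrt(df))
    h = (math.pi / 2 - a) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        # Clamp at 0 so rounding near pi/2 cannot give a negative base.
        total += weight * max(math.cos(a + i * h), 0.0) ** (df - 1)
    return math.exp(log_c) * total * h / 3


def t_to_z(t, df):
    """z-score with the same one-sided tail probability as t (sketch only)."""
    p = t_upper_tail(abs(t), df)
    # inv_cdf raises StatisticsError if p underflows to 0 for enormous t.
    z = -NormalDist().inv_cdf(p)
    return math.copysign(z, t)


print(t_to_z(10.65, 39))
```

The tail probabilities involved here are many orders of magnitude below 1e-10, so any intermediate step that rounds them (for example forming 1 - p in double precision) can distort the resulting z.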

However, the main problem is simpler than that. You have doubled the degrees of freedom in the commands, since the input is specified as:

-setA shellgame__belvthink_resampled

But these files are .HEAD and .BRIK pairs, so each dataset is picked up twice, once per file. The input should be specified as:

-setA shellgame__belvthink_resampled.HEAD

That will cut the DoF in half, and lead to more reasonable results.
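The effect can be reproduced in a scratch directory. This sketch assumes the -setA argument was a wildcard pattern that matched both files of every dataset; the `sub01`, `sub02`, `sub03` filenames are invented for illustration and are not the files from this analysis.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create three hypothetical datasets, each stored as a .HEAD/.BRIK pair.
    for subj in ("01", "02", "03"):
        for ext in (".HEAD", ".BRIK"):
            path = os.path.join(tmp, f"sub{subj}_resampled+tlrc{ext}")
            open(path, "w").close()

    # A pattern without a suffix matches both files of every pair.
    both = glob.glob(os.path.join(tmp, "sub*_resampled*"))
    # Restricting the pattern to .HEAD matches each dataset exactly once.
    head_only = glob.glob(os.path.join(tmp, "sub*_resampled*.HEAD"))

n_both, n_head = len(both), len(head_only)
print(n_both, n_head)  # 6 3
print("one-sample DoF:", n_both - 1, "vs", n_head - 1)
```

A quick way to catch this in practice is to count the expanded input list and compare it against the number of subjects before running the test.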

- rick