Motion/outlier censoring thresholds: uniform vs subject‑specific in afni_proc.py

Max · May 7, 2026, 10:27am

AFNI version info (afni -ver): Precompiled binary linux_ubuntu_22_64: Feb 03 2026 (Version AFNI_26.0.09 'Pupienus Maximus')

Hi AFNI experts,

I am preprocessing a dataset with 58 healthy participants, each with two acquisition sessions (T1 and T2), using afni_proc.py. I have a question about the most appropriate strategy for setting motion and outlier censoring thresholds.
I tested two approaches:

Subject‑specific thresholds

For some participants, I increased the motion censoring threshold (-regress_censor_motion) from 0.2 up to 0.3–0.5, and in a few cases I also increased the outlier censoring threshold (-regress_censor_outliers) from 0.05 to 0.1.
With this approach, only 3 acquisition sessions fail to process correctly.

Uniform thresholds for all participants

I used the same thresholds for everyone:

motion: 0.3
outliers: 0.1

With this approach, 11 acquisition sessions fail because too many TRs are censored.
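To make concrete what the two thresholds act on, here is a minimal Python sketch for a single run. It assumes (our understanding of afni_proc.py's defaults, not a verified reimplementation) that -regress_censor_motion compares the Euclidean norm of the TR-to-TR change in the six motion parameters against the limit and also censors the preceding TR, and that -regress_censor_outliers compares each TR's outlier fraction (e.g., from 3dToutcount) against its limit. The function names and toy values are hypothetical.

```python
import math


def enorm_per_tr(motion):
    """Euclidean norm of the TR-to-TR change in the 6 motion parameters.

    motion: one 6-element row per TR for a single run (assumed layout:
    3 rotations in degrees, 3 shifts in mm). The first TR has no previous TR,
    so its value is set to 0.
    """
    norms = [0.0]
    for prev, cur in zip(motion, motion[1:]):
        norms.append(math.sqrt(sum((c - p) ** 2 for c, p in zip(cur, prev))))
    return norms


def censor_mask(motion, outlier_frac, motion_limit, outlier_limit,
                censor_prev=True):
    """True where a TR would be censored under the given limits."""
    enorm = enorm_per_tr(motion)
    censored = [False] * len(enorm)
    for t, value in enumerate(enorm):
        if value > motion_limit:
            censored[t] = True
            if censor_prev and t > 0:
                censored[t - 1] = True  # also drop the TR before the jump
        if outlier_frac[t] > outlier_limit:
            censored[t] = True
    return censored


def censor_fraction(mask):
    return sum(mask) / len(mask)


# Toy run: a 0.5-unit jump at TR 3 and one outlier-heavy TR at the end
motion = [[0.0] * 6] * 3 + [[0.5, 0.0, 0.0, 0.0, 0.0, 0.0]] * 3
outliers = [0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
mask = censor_mask(motion, outliers, motion_limit=0.3, outlier_limit=0.1)
print(mask, censor_fraction(mask))
```

Under the uniform strategy every session is given the same motion_limit and outlier_limit; the subject-specific strategy amounts to changing those two arguments from session to session.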

My questions:

1. Is it methodologically acceptable to use subject‑specific thresholds to reduce the number of excluded acquisition sessions, or does this introduce bias at the group level?
2. Is it preferable to keep uniform thresholds and accept the exclusion of sessions that do not meet those criteria?
3. Is there a recommended percentage of censored TRs beyond which an acquisition session should be excluded?
4. Do you have suggestions for datasets with multiple acquisition sessions per participant, where preprocessing consistency is particularly important?

Thanks a lot for any guidance!

ptaylor · May 7, 2026, 1:00pm

Howdy-

I will start by saying that this is a very large, open and important topic. In the FMRI Open QC Project, I would say that dealing with motion was a very wide-ranging issue across teams, and it is worth reading about the different teams' approaches there. I hope other folks weigh in here, but here are my 2 cents' worth (NB: we might need a new analogy for this, now that pennies are gone).

I see the purpose and potential benefits of subject-wise thresholding:

If subject A moved a lot more than subjects B, C and D, forcing me to set a higher cross-group threshold to accommodate A, then I am wasting information in lots of other subjects just because of that one subject. So, having a subject-wise threshold would use more information from each subject.
Also, in practice, subjects that move more will probably have more "remainder effects" of motion even after censoring, because motion effects are so tricky to isolate. Therefore, a uniform threshold won't eliminate this bias anyway.

However, I have never seen per-subject ("bespoke") thresholding performed, for either task or rest FMRI, and my guess is that the fear of bias is too strong. If motion varies strongly, particularly in association with group membership or something in the task, then that will be its own problem. If a few random subjects have very high motion, then keeping processing uniform might mean ending up with a smaller group overall, because those subjects get excluded. If a lot of random subjects move, that will always be a problem; one might just need to use a higher threshold and acknowledge the higher risk of motion contamination in the effects (for example, in studies of children or of populations prone to movement). Piloting, practice scanning and careful study design are some ways to help reduce motion, but in some cases it will be nearly unavoidable.

Q1: So, I don't think it will be acceptable in current practice to use subject-specific thresholds. The fear of bias will be too strong.

Q2: Then yes, uniform thresholds are the way to go.

Q3: Rather than percentages, we worry about "degrees of freedom" (DF) in the model. The two are often closely related, because each censored time point uses up 1 DF, but other things also consume DFs, such as baseline regressors, motion regressors, any bandpassing, etc. We describe this in more detail in Reynolds et al. (2024).
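As a rough illustration of the DF bookkeeping, here is a hedged Python sketch. It assumes one baseline polynomial set per run whose degree follows the 3dDeconvolve "-polort A" rule (1 + floor(run duration / 150 s)), 6 motion regressors in total (per-run motion regressors would use more), and that every censored TR costs exactly 1 DF. The helper names are hypothetical; the DF table in the APQC HTML is the authoritative count for a real subject.

```python
import math


def polort_degree(run_duration_s):
    """Assumed '-polort A' rule: 1 + floor(run duration / 150 s)."""
    return 1 + math.floor(run_duration_s / 150.0)


def df_summary(n_trs_per_run, tr_s, n_censored, n_motion_regs=6,
               n_other_regs=0):
    """Rough DF bookkeeping: (DF remaining, fraction of total DF remaining)."""
    n_total = sum(n_trs_per_run)
    # a polort of degree d gives d + 1 baseline regressors in each run
    n_baseline = sum(polort_degree(n * tr_s) + 1 for n in n_trs_per_run)
    n_regs = n_baseline + n_motion_regs + n_other_regs
    df_left = n_total - n_censored - n_regs
    return df_left, df_left / n_total


# One 300-TR run at TR = 2 s (600 s, so polort 5), with 30 TRs censored
print(df_summary([300], 2.0, 30))
```

In this toy case 258 of 300 DF remain: 30 go to censoring and 12 to regressors (6 baseline, 6 motion), so the censor fraction (10%) and the DF cost (14%) are related but not identical.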

sidenote: if you are doing resting-state processing, the common step of low frequency fluctuation (LFF)-style bandpassing, keeping only ~0.01–0.1 Hz, will heeeeavily reduce your DF count, and we (along with several other groups) would argue that this is often unnecessary. Please see the above Reynolds et al. for discussion of that.
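A back-of-the-envelope sketch of why LFF bandpassing is so expensive, assuming an ideal (brick-wall) Fourier bandpass on a single, uncensored run. AFNI's actual bandpassing (which, as we understand it, projects out sinusoid regressors, e.g. in 3dTproject) will differ in detail; the function name is hypothetical.

```python
def bandpass_df_kept(n_trs, tr_s, f_low=0.01, f_high=0.1):
    """Approximate DF that survive an ideal bandpass on one uncensored run.

    The Fourier frequencies are k / (N * TR) for k = 0 .. N // 2. Each one
    strictly between 0 and Nyquist carries 2 DF (a sine and a cosine); the
    DC term and, for even N, the Nyquist term carry 1 DF each.
    """
    kept = 0
    for k in range(n_trs // 2 + 1):
        f = k / (n_trs * tr_s)
        if f_low <= f <= f_high:
            is_edge = (k == 0) or (n_trs % 2 == 0 and k == n_trs // 2)
            kept += 1 if is_edge else 2
    return kept


# 300 TRs at TR = 2 s: frequency spacing 1/600 Hz, Nyquist 0.25 Hz
print(bandpass_df_kept(300, 2.0))
```

With 300 TRs at TR = 2 s, only the 55 frequencies from 0.01 to 0.1 Hz survive, i.e. about 110 of 300 DF; roughly 190 DF are spent on the bandpass before any censoring or motion regressors are counted.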

We have some generic warning levels for censor fractions in the APQC HTML, which hopefully you are viewing (there is also a playlist about it on the AFNI Academy YouTube channel). There is also a DF table in the "regr" (regression) QC block, to help you see how your modeling uses up DFs for each subject. The above paper also shows how to use gen_ss_review_table.py to make group-wide summary tables of important processing/data properties like censor fraction, so you can quickly find the number and IDs of subjects that have, say, a high amount of motion censoring.
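Once per-session censor fractions have been gathered into a group table (for example with gen_ss_review_table.py), flagging the problematic sessions is a simple filter. This Python sketch does not reproduce that program's column labels: the row keys ("subj", "ses", "censor_frac"), the 0.2 warning level and all values are hypothetical. The second helper reflects the two-session design in the question, where losing either session affects a paired T1/T2 comparison.

```python
def flag_high_censoring(rows, warn_frac=0.2):
    """(subject, session, censor fraction) above warn_frac, worst first.

    rows: dicts with hypothetical keys 'subj', 'ses', 'censor_frac'.
    """
    flagged = [(r["subj"], r["ses"], r["censor_frac"])
               for r in rows if r["censor_frac"] > warn_frac]
    return sorted(flagged, key=lambda item: item[2], reverse=True)


def subjects_losing_a_session(flagged):
    """Subjects with at least one flagged session (matters for paired designs)."""
    return sorted({subj for subj, _, _ in flagged})


# Made-up values, for illustration only
rows = [
    {"subj": "sub-01", "ses": "T1", "censor_frac": 0.05},
    {"subj": "sub-01", "ses": "T2", "censor_frac": 0.31},
    {"subj": "sub-02", "ses": "T1", "censor_frac": 0.22},
    {"subj": "sub-02", "ses": "T2", "censor_frac": 0.10},
    {"subj": "sub-03", "ses": "T1", "censor_frac": 0.12},
    {"subj": "sub-03", "ses": "T2", "censor_frac": 0.08},
]
flagged = flag_high_censoring(rows)
print(flagged)
print(subjects_losing_a_session(flagged))
```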

We also recommend using GCOR as a way to see how much motion/global effect is still left in your data after censoring. The "corr_brain" images in the APQC HTML show, basically, the volumetric view of what gets summarized in GCOR (see the APQC HTML paper about that). If you have a huge amount of global correlation and/or odd, widespread patterns across tissue classes, then there is probably a problematic amount of motion left in. Funnily enough, some subjects with high amounts of motion censoring have relatively clean corr_brain maps and low GCOR, because their motion was limited to the events that were found and censored; others have high GCOR even with less censoring, due to slow motion, drift or other effects. Life is pretty hard with FMRI, but the above are some useful ways to check for odd effects in the data relatively systematically.
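For intuition about what GCOR summarizes, here is a minimal NumPy sketch of global correlation: the average correlation over all voxel pairs, computed as the squared length of the mean of the demeaned, unit-length voxel time series. We believe this matches how AFNI defines GCOR, but treat it as an illustration rather than AFNI's code; in real data it would be computed within a brain mask on the residual time series. The random "voxels" are synthetic.

```python
import numpy as np


def gcor(data):
    """Global correlation of a (n_voxels, n_timepoints) array.

    Each voxel time series is demeaned and scaled to unit length; GCOR is the
    squared length of the mean unit time series, which equals the average of
    the full voxel-by-voxel correlation matrix (diagonal included).
    Voxels with zero variance must be removed first (e.g., by a brain mask).
    """
    x = data - data.mean(axis=1, keepdims=True)
    u = x / np.linalg.norm(x, axis=1, keepdims=True)
    g = u.mean(axis=0)
    return float(g @ g)


rng = np.random.default_rng(0)
noise = rng.standard_normal((50, 200))    # 50 synthetic "voxels", 200 TRs
shared = 3.0 * rng.standard_normal(200)   # one fluctuation common to all voxels
print(gcor(noise), gcor(noise + shared))
```

Independent noise gives a GCOR near 1/n_voxels, while adding a single shared fluctuation (a crude stand-in for residual global motion effects) pushes it close to 1.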

In the end, I don't think there is a universally accepted or necessarily "safe" number to choose for a censored fraction. It's really hard to wager a guess. If more than 20% of time points were censored, one might start to be worried, esp. about the practical consideration of additional lingering effects. In Table 1 of Reynolds et al. (2023), from the Open QC Project, we proposed that having less than 60% DFs or more than 20% of time points censored would be reasonable exclusion criteria. We also included criteria based on the average amount of motion in the non-censored/remaining time points, as well as on maximum motion. And we also had a threshold for GCOR at 0.2 (though we considered even 0.15 a potential warning level) for participant exclusion. These all get at different attributes that motion (or other effects!) might produce in the data, which could make it problematic to interpret the BOLD response as physiological and neuronally driven, rather than driven by motion, etc.
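The numeric criteria quoted above can be collected into a simple per-session check. This Python sketch is hypothetical: it reads "less than 60% DFs" as fewer than 60% of the original DFs remaining (our interpretation), uses the 20% censoring and 0.2 GCOR thresholds stated above with 0.15 as a GCOR warning level, and leaves out the mean/max motion criteria because no values for them are given here.

```python
def review_session(censor_frac, df_frac_left, gcor,
                   max_censor=0.20, min_df_frac=0.60,
                   max_gcor=0.20, warn_gcor=0.15):
    """Return (exclusion reasons, warnings) for one session.

    df_frac_left: fraction of the original DF still available (assumed reading
    of the 60% DF criterion). Motion-based criteria are intentionally omitted.
    """
    exclude, warn = [], []
    if censor_frac > max_censor:
        exclude.append("censor fraction")
    if df_frac_left < min_df_frac:
        exclude.append("DF remaining")
    if gcor > max_gcor:
        exclude.append("GCOR")
    elif gcor > warn_gcor:
        warn.append("GCOR")
    return exclude, warn


print(review_session(0.25, 0.55, 0.10))
print(review_session(0.05, 0.90, 0.16))
```

Whatever values are chosen, the point of writing them down this way is that the same check is applied to every session.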

Q4: Similar to what was mentioned above, processing consistency matters because it avoids introducing a spurious source of differences across either groups or sessions. If you change processing with session, it will be impossible to disentangle processing differences from session differences, and there will always be a question about that. If there are strong differences in subject behavior in the scanner (e.g., motion) across sessions, that will also bring its own challenge about why that happened.
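One practical way to guarantee that consistency is to generate every subject's and session's command from a single template, so the censoring values cannot drift between T1 and T2. This Python sketch builds only the censoring-related part of an afni_proc.py command (the real command also needs -blocks, anatomical input and so on); the helper names, subject IDs and file paths are hypothetical.

```python
def afni_proc_args(subj, ses, dsets, motion_limit=0.3, outlier_limit=0.1):
    """Censoring-related part of an afni_proc.py command (hypothetical helper).

    The rest of a real command (-blocks, anatomical input, etc.) is omitted.
    """
    return ["afni_proc.py",
            "-subj_id", f"{subj}_{ses}",
            "-dsets", *dsets,
            "-regress_censor_motion", str(motion_limit),
            "-regress_censor_outliers", str(outlier_limit)]


def censor_settings(args):
    """Extract the two censoring values from an argument list."""
    def value_after(flag):
        return args[args.index(flag) + 1]
    return (value_after("-regress_censor_motion"),
            value_after("-regress_censor_outliers"))


subjects = ["sub-01", "sub-02"]  # hypothetical IDs
sessions = ["T1", "T2"]
commands = [afni_proc_args(s, t, [f"{s}/{t}/epi_run1.nii.gz"])  # hypothetical paths
            for s in subjects for t in sessions]
# Every subject and session shares one censoring configuration
assert len({censor_settings(c) for c in commands}) == 1
print(" ".join(commands[0]))
```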

Hopefully some of the above provides useful starting points for these considerations.

--pt

rwcox123 · May 7, 2026, 5:56pm

If you have a population with a significant fraction of "difficult" subjects, you might just have to budget for some of them resulting in data that's just not good enough. I know that people who do FMRI on pre-teen children figure that at least 30% of their subjects will have to be discarded for motion.
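Budgeting for an expected loss rate is simple arithmetic: to end with a target number of usable subjects, recruit at least target / (1 − loss rate). A minimal Python sketch with a hypothetical helper name, using integer ceiling division so exact cases are not thrown off by floating-point rounding:

```python
def n_to_recruit(n_usable, loss_percent):
    """Smallest N such that N * (100 - loss_percent) / 100 >= n_usable."""
    keep_percent = 100 - loss_percent
    return -(-n_usable * 100 // keep_percent)  # integer ceiling division


print(n_to_recruit(40, 30))  # → 58
```

For example, with the 30% loss mentioned for pre-teen children, ending up with 40 usable subjects means recruiting about 58.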