Preprocessing for intersubject correlation

Dear AFNI experts,

During my experiment, participants watch 36 short movie clips (average duration 32 seconds) over three runs (i.e. 12 clips in each run) and are asked to give ratings after each clip. The order of movie clips is pseudo-randomised across participants and runs.

I am planning to follow the preprocessing as described by Chen et al. (2016): and have some questions about it:

If I understand the description of the “-regress_apply_mot_types demean deriv” option correctly, this adds the demeaned motion regressors as well as the derivatives to the regression model. Is it correct that the derivatives model the potential scanner drift over time so that this is accounted for? Further, is it correct that only the motion parameters are demeaned, but not the BOLD time series itself?

After the preprocessing of the three runs, I combine them and then remove all volumes associated with fixation between the clips and ratings participants are asked to give after each clip. Additionally, the volumes are re-ordered so that the resulting time course reflects the same order of movie clip presentation across participants. This then results in a concatenated time series only associated with the display of the clips itself which is later used in the 3dTcorrelate command. I think that for computing the intersubject correlation, I should the option “-polort -1” given that my data is preprocessed and concatenated. Is that correct? But this assumes that the data is detrended within the preprocessing, so presumably with the “-regress_apply_mot_types demean deriv” option?

Given that I concatenate my data, would it be recommendable to demean the time course associated with each clip (using the mean of the volumes associated with each clip) before merging the volumes for all clips together?

Lastly, should I include bandpass filtering to the preprocessing? If so, at what stage would it be best implemented?

I am looking forward to your thoughts on the issues raised above, any help is highly appreciated!

Many thanks,

Hi, Stef-

Gang and I have chatted a bit about this, generating the following thoughts:

Your data set is a little bit different than the one from the Chen et al. work you have cited. In Chen et al., we were looking at a few longer clips of movies that were interspersed with rest. We spliced out the movie clips, and treated each as a separate run for processing (the clip presentation hadn’t been randomized, so we didn’t have to reassemble/unshuffle them); things like detrending were done “as normal” for each run separately.

In your case, you have several shorter clips back-to-back-to-back, forming individual runs. In order for trends across each run of several clips not to affect things (esp. since they are randomized), then it might be a good idea to first detrend the data to, say, 2nd order (because of their overall duration) using 3dDetrend, to remove the trends across each run. Then, splice the runs into the short clips; demean each short clip; unshuffle the clip order (so that movie clips match across subjects); and concatenate the short clips into a single long run. That final thing can be put into

You then shouldn’t have to use the usual polort degrees in the regression block later; you can just specify to include a constant in the model, via:

-regress_polort 0

In naturalistic scanning, just as in resting state, we don’t have the benefit of having task stimuli to model and use as our data of interest; we only can use nuisance regressors in the model, and then we take the remaining residuals (errts* file) as the output time series of interest. Motion has even larger influence than usual; therefore, we often include the derivs of the 3dvolreg motion estimates to try to account for motion to a higher degree than in task data. It seems a good idea to keep doing that here.

Re. bandpass filtering-- that does not seem necessary at all here. (In fact, in much of resting state processing, it might not even be necessary… but that is a separate story.)


Hi Paul,

Thank you very much for your thorough reply. To sum it up: you would advice to detrend first, then to cut the time series corresponding to each video clip, demean, reorder and concatenate them and run the preprocessing on the resulting data series. For the preprocessing, the code from the NI paper can be used, but -regress_polort 0 should be added. The resulting output (errts* file) can then be used to compute the inter subject correlation amongst subjects, but I would use the option -polort -1 as the data has been preprocessed already. Is this correct?

There is one more thing that occurred to me: My a priori defined regions of interest are subcortical. Would it still be recommendable to do FS surface reconstruction using the FS command recon-all -all or would it be better to replace it with the -autorecon1 and -autorecon2 option instead?

Many thanks,

Dear Paul,

I came across some problems in the pre-processing related to a very high number of volumes getting censored due to motion. I acquire my data in three runs and participants have breaks in between them, so it could be that they move their heads in the breaks but not during the EPI acquisition itself. But given that the current pre-processing pipeline firstly reassembles/unshuffles the video clips (after detrending) and then secondly performs the pre-processing for data concatenated from all three runs, it seems that using 3dvolreg with vr_base_min_outlier as base could see between session head movement as within session movement and censor too conservative. Hence, I was wondering whether it might be better to detrend each run, then preprocess each run separately as suggested, transform the resulting errts.* into a nifti file (or is it possible to do that by specifying -errts as eprefix.nii?), then concatenate and unshuffle the time series (whilst demeaning the time course of each video clip) and then use this file to calculate the intersubject correlation.

Again, thank you very much for your help, this is highly appreciated!
Best regards,

Hi, Stef-

The motion censoring criterion is based on the enorm (Euclidean norm) of the motion parameter values; that is calculated using volume-to-volume changes (the derivative of each motion parameter time series), so that I don’t think one fixed movement between scans should result in a huge amount of censoring (assuming that that displacement wouldn’t lead to large outlier values for the “outlier frac” censoring criterion, which I doubt it would.

The quality control (APQC) HTML output should show you where the censoring occurs in time-- you can see if it is where video clips are “glued together” or not.

If you look at the volumes in the AFNI GUI, can you see noticeable motion within a video clip’s duration?

I don’t think detrending separate from the rest of the regression model would be recommendable


Hi Paul,

Thank you for your reply. I tried to explore where in my pre-processing the high degree of censoring occurs. From how I understood your reply from April 12th, I firstly detrended each run, then took each each clip, demeaned it, unshuffled it and combined it into one nifti file which I then used as input for the pre-processing following the same code as used in Chen et al (2016, NI) with the modification of setting -regress_polort 0. Even though the raw runs didn’t show any severe motion by visual inspection (and I also verified this by using the raw files as input for the preprocessing, which for an example subject then had a censor fraction of 0.0018); however, if I follow the steps as described, I get a censor fraction of 0.8381 for the same subject.

I then tried to disentangle these strange results: as said before, if I just enter all three runs as they are, I barely have censored TRs. If I detrend them firstly (using 3dDetrend -polort 2; separate for each run), the censor fraction increases to 0.5145. If I combine all three runs into one run (without detrending in all following steps), just the step of combining them into a big run slightly increases the censor fraction to 0.0299 and the censored TRs are at the two intersections between run 1 and run 2 and run 2 and run 3, respectively. The same issue also occurred when I further concatenate the single clips into one run without unshuffling them (censor fraction 0.0337) and again, these censored TRs are where the runs are combined. If I then add the unshuffling to it, the censored TRs distribute (censor fraction 0.0236), but are at the beginning of a clip and the mot enorm plot shows several spikes aligning with the begin of a video clip. If I demean the video clips firstly before unshuffling them, this increases the censor fraction to 0.8145.

I am not sure on how to deal with this, as both, detrending and demeaning seem to impose severe problems for the later preprocessing. What would be your advice on this?

Again, many thanks for your support.
Best regards,

Dear Paul,

I am sorry for bothering you again. I have pre-processed my data, but wanted to make sure that what I did makes sense. Please see below for how I specified the – I modified the pre-processing a bit compared to what was suggested in the NI paper: I am pre-processing the data separately for each run whilst regressing out the polynomial degrees (up the third order given the average run length of 380 à default: 1 + floor(380 / 150.0) = 3) to cover scanner drifts. As recommended, I don’t do band pass filtering. I have further modified the -regress_censor_motion to be 0.3 instead of 0.2 based on some resting state recommendations I found in the help. Afterwards, I take the final output file (the residual files errts*) and concatenate and reshuffle the volumes relating to each video clip without demeaning them. The output of this step is then used to compute the ISC. Is this halfway reasonable? Do you think I should demean the volumes for each magic trick before reshuffling/concatenating?

# specify actual -subj_id $subj.magictrickwatching_perRun					\
    -blocks despike tshift align tlrc volreg blur mask regress          \
    -copy_anat $fsindir/$fsanat                                         \
    -anat_follower_ROI aaseg  anat $fsindir/aparc.a2009s+aseg_rank.nii  \
    -anat_follower_ROI aeseg  epi  $fsindir/aparc.a2009s+aseg_rank.nii  \
    -anat_follower_ROI FSvent epi  $fsindir/$fsvent                     \
    -anat_follower_ROI FSWMe  epi  $fsindir/$fswm                       \
    -anat_follower_erode FSvent FSWMe                                   \
    -dsets $epi_dpattern                                                \
    -tcat_remove_first_trs 0                                            \
    -tlrc_base /usr/share/afni/atlases/MNI152_T1_2009c+tlrc             \
    -tlrc_NL_warp                                                       \
    -volreg_align_to MIN_OUTLIER                                        \
    -volreg_align_e2a                                                   \
    -volreg_tlrc_warp                                                   \
    -regress_ROI_PC FSvent 3                                            \
    -regress_make_corr_vols aeseg FSvent                                \
    -regress_anaticor_fast                                              \
    -regress_anaticor_label FSWMe                                       \
    -regress_censor_motion 0.3                                          \
    -regress_censor_outliers 0.1                                        \
    -regress_apply_mot_types demean deriv                               \
    -regress_est_blur_epits                                             \
    -regress_est_blur_errts                                             \
    -regress_run_clustsim no											\
    -regress_polort 3	

One more question arises: In addition to the movie clip watching, we also did a pre and post task resting state scan and I am interested in changes in the resting state functional connectivity due to the task. Can I preprocess that resting state data in the same way? Would I still include both runs (i.e. pre and post task) into the same script or would I treat them independently?

Any help with these issues is highly appreciated!
Many thanks and best regards,

Hi, Stef-

I am not quite certain why the EPIs wouldn’t be processed together… Going back to the initial question you had asked about this, our thoughts were these:,161034,161061#msg-161061
… and I’m not sure why that isn’t appealing? One main thing to avoid is having drifts within a segment dominate the overall correlation after being concatenated together.

Having a lower motion censoring limit seems fine. Indeed, one basically does treat naturalistic scanning as resting state, from a processing point of view. The polynomial order seems to be what about the default is (though I think that one is based on time duration, and not just number of time points).

Re. last question about resting state: yes, you should be able to process your resting state in the same way. You should be able to put both runs into the same script.


Hi Paul,

Thank you very much for your reply.

As indicated here (,161034,161745#msg-161745) when doing what you’ve suggested I got an incredibly high number of motion censored volumes - even though the motion is not apparent in the original EPI files for each run which was very surprising for me. Hence, I tried to disentangle the effects each step of the pre-processing would have on my data (,161034,161922#msg-161922), especially using the quality check html file. I’m very new to pre-processing in AFNI, but from my layman point of view, it seems that detrending first as suggested in your initial answer (,161034,161061#msg-161061) imposes problems. Likewise does the demeaning. To circumvent the problems of motion censoring if there is de facto no motion (as observed in the original EPI), I thought it might be better to do the pre-processing firstly and do the concatenation/demeaning as a separate step.

This also has the additional benefit that it is easier to select different video clips to be included for the ISC calculation. I study memory, so I’m interested in comparing the ISC for remembered video clips (i.e. both members of a given pair have remembered) with the one for forgotten video clips (i.e. both members of a given pair have forgotten). However, this is different for each pair of subjects, so the concatenation has to be done individually (i.e. 2(memory outcome: remembered vs forgotten) * 1/2N(N-1) = 2450 concatenations given my N = 50) and for me, it seemed easier/more efficient to pre-process the data of 50 participants to then use the 50 *errts files to create those 2450 final time series by concatenation than to first concatenate and then pre-process 2450 time series.

Best regards,