@sswarper speed

Hello experts

I ran @sswarper on a subject [@sswarper $inputT1 $subid -deoblique]. It took about 11-12 hours for a single subject. The second subject is also taking just as long.

The T1 image dims are 174 x 213 x 213 (res is 1 x 0.94 x 0.94 mm).
afni -ver is slightly outdated: AFNI_17.2.07.
But the script is running on a server with 96 CPUs, and I set OMP_NUM_THREADS=48.
I have the terminal output (from a run without -verb) if needed. The output is fine… the images look alright.

Is this a typical amount of time for one image? Could the older AFNI version be the issue?

Thank you for the help


Hi, Shankar-

That seems excessively slow to me. I ran @SSwarper on a 1x1x1 mm**3 anatomical the other day using 5 CPUs and it took about an hour, I think.

Something to note-- that version of AFNI is veeery old, and there have been a lot of modifications to @SSwarper since then, as well as important ones to 3dQwarp. In particular, Bob pretty recently made the default 3dQwarp operation both faster and "lighter" memory-wise. I would definitely update your binaries. Done via:

@update.afni.binaries -d

… unless you are on a Mac and using the 10.7_local binaries (as evinced by running "afni -ver"), in which case you will need to update via:

@update.afni.binaries -d -package macos_10.12_local

because we haven’t maintained the Mac 10.7_local binaries for a long time now.

Additionally, it is possible that 48 CPUs might be too many, leading to thrashing in the parallelization-- the fighting among threads costs too much and slows things down. What if you tried, say, 12-16 CPUs?
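For example, a smaller thread count can be set via the standard OpenMP environment variable before the warping call (the commented @SSwarper line below is a placeholder mirroring the call style in the original post, not a verified command for that AFNI version):

```shell
# Cap the OpenMP thread count for all AFNI programs run afterward.
export OMP_NUM_THREADS=12

# Placeholder for the actual call, in the style of the original post:
# @SSwarper $inputT1 $subid -deoblique

echo "running with $OMP_NUM_THREADS OpenMP threads"
```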

Can you also verify that your computer is really using that many? If you run just the following in your script:

3dQwarp -hview

… it will open up the 3dQwarp help in a GUI text editor, but more importantly it will echo in the terminal the OpenMP thread count that it has, for example as:

++ OpenMP thread count = 1


Hi, again-

One more thought on this-- what is the output of:

3dinfo -extent  DSET

… where DSET is your input to @SSwarper? If your data set has weird coordinates that aren't approximately centered around (x,y,z) = (0,0,0), that could conceivably cause some issues for speed/memory, too.


I have updated the binaries and set threads to 15. I will post an update once a subject is done.

After the update 3dQwarp -hview shows ++ OpenMP thread count = 15

This is the output of 3dinfo -extent for a couple of random subjects:

-84.145836 90.854164 -116.968491 76.156509 -107.456924 91.293076

-71.374207 92.625793 -125.734436 88.953064 -113.288055 85.461945

Thanks a lot for the help.


Hi, Shankar-

OK, cool, let’s see how long that takes.

The reason that the extents might matter: the warping program at the heart of @SSwarper (the latter is actually a wrapper for a few AFNI programs, with 3dQwarp being the primary workhorse) has to make a grid encompassing both dsets on which to work; so, if the source and base dsets are very far apart, then it has to make a big grid, using lots of memory, and that might slow things down.

The extents you listed don't look super egregious; they might still be a bit far apart, though. For example, consider the extents of the MNI template for @SSwarper:

3dinfo -extent MNI152_2009_template_SSW.nii.gz 
-96.000000	96.000000	-96.000000	132.000000	-78.000000	114.000000

The brains you have might be shifted a fair amount along the anterior-posterior axis: the MNI FOV extent is [-96, 132], and one of your dsets' FOV extent is [-117, 76]-- that could be (much) more than a 20 mm offset; something similar might be happening in the inf-sup direction.
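As a rough sanity check, one can compute per-axis FOV centers from those extent values with a little awk; a sketch, using the first subject's extents posted above and the MNI template extents (the midpoint arithmetic is generic, not AFNI-specific):

```shell
# xmin xmax ymin ymax zmin zmax, as printed by "3dinfo -extent DSET"
subj="-84.145836 90.854164 -116.968491 76.156509 -107.456924 91.293076"
mni="-96.0 96.0 -96.0 132.0 -78.0 114.0"

# For each axis, compare the midpoint of the subject FOV with that of the
# template FOV; a large offset means the encompassing warp grid must be big.
offsets=$(echo "$subj $mni" | awk '{
  for (i = 1; i <= 3; i++) {
    cs = ($(2*i-1) + $(2*i)) / 2    # subject FOV center on this axis
    cm = ($(2*i+5) + $(2*i+6)) / 2  # template FOV center on this axis
    printf "axis %d: subject %7.2f  template %6.2f  offset %7.2f\n", i, cs, cm, cs - cm
  }
}')
echo "$offsets"
```

For the second (A-P) axis this works out to an offset of roughly -38 mm, in line with the "more than 20 mm" eyeballed above.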

If your current alignment doesn't go well, I think adding the "-giant_move" option to @SSwarper might help. That will do a center-of-mass pre-alignment (with a couple of other alterations) that might help get the grids closer at the start.



It seems fine now. Each subject is taking about 45 to 60 mins. The number of threads doesn't seem to make a difference: I ran a few subjects each with 12 and 48 threads, and they took about the same time.

The registration in a bunch of these subjects has turned out pretty well. But thanks for the tip about -giant_move; it will surely be handy.


Hi, Shankar-

Cool, glad that is working well.

I think with multithreading, there are diminishing returns after a while: doubling the number of threads doesn't halve the runtime, because 1) some part of the processing is not parallelized, and 2) there is a computational cost to multithreading (organizing jobs, separating and rejoining data, etc.). Eventually, adding more CPUs doesn't add more efficiency, because the cost from the second phenomenon grows so large. It might be that things top out around 12 or so in this case (that actually rings a bell with what Bob has mentioned previously, I think). But if running with 12 or 48 threads doesn't matter for you, you can run 4 jobs simultaneously, each with 12 threads-- so, "parallelizing" your analysis further in that sense still works.
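A minimal bash sketch of that last idea (the subject IDs and the commented @SSwarper call are placeholders, not commands from this thread):

```shell
# Run several subjects at once, each job limited to a modest thread count.
export OMP_NUM_THREADS=12

for sub in sub-01 sub-02 sub-03 sub-04 ; do
    # Placeholder for the real per-subject call, e.g.:
    # @SSwarper ${sub}_T1w.nii.gz $sub -deoblique > log_${sub}.txt 2>&1 &
    echo "launched $sub" &
done
wait    # block until all background jobs have finished
echo "all jobs done"
```

With 96 CPUs and ~12 threads per job, several jobs can run side by side without oversubscribing the machine.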



I ended up doing that - running multiple scripts to finish it faster.
However, to update my previous message: on my machine, subjects running with 48 threads completed 5-10 mins faster than those with 12 threads (~38-40 vs 45-50 mins).


Hi, Shankar-

Cool, good to know-- those run times seem consistent with the diminishing returns of extra threads per run after a while.