Using more than 32 cores in 3dDeconvolve


I am running 3dDeconvolve on a server that has 128 cores. To make it run faster, I want to use most of those cores. However, the maximum allowed value for “-jobs” is 32. Is there a way to increase the limit?
My dataset is huge (14000 volumes). I made 3dDeconvolve faster by adding “-mask”, but it still takes a full day to complete with 32 cores.

I would appreciate any guidance to help me address this issue. Thank you in advance for your assistance!

My suggestion would be to use the superior 3dREMLfit instead, and set the environment variable OMP_NUM_THREADS to the number of threads you think would be of use.
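As a rough sketch of what that could look like: the dataset names (all_runs+tlrc, mask+tlrc), the matrix file name X.xmat.1D, the output prefixes, and the thread count of 64 are all placeholders, not values from this thread. It assumes the design matrix was already written by 3dDeconvolve (for example with -x1D, optionally adding -x1D_stop so that 3dDeconvolve stops before doing the regression itself). The small run helper is only there so the sketch prints the command and skips it when AFNI is not installed.

```shell
# Number of OpenMP threads 3dREMLfit may use; 64 is an arbitrary example value.
export OMP_NUM_THREADS=64

# Helper: print the command, and run it only if the program is on the PATH.
run() {
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# Fit the model with 3dREMLfit, reusing the design matrix from 3dDeconvolve.
# All file names and prefixes below are placeholders.
run 3dREMLfit \
    -matrix X.xmat.1D \
    -input  all_runs+tlrc \
    -mask   mask+tlrc \
    -fout -tout \
    -Rbuck  stats_REML \
    -Rvar   stats_REMLvar \
    -verb
```

How much a higher thread count helps will depend on the data and the machine, so it is worth timing a couple of values before committing to a full run.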

Another possibility is to use 3dZcutup to slice your dataset into pieces before running either 3dDeconvolve or 3dREMLfit. The help for 3dZcutup has some useful notes on that.

Some more ideas to add to pmolfese's suggestions: we tend to use swarm scripts on a computing cluster, parallelizing over subjects or over different slices. Getting rid of extra space around the EPI dataset will also save time (with 3dAutobox, or by computing only within a mask).

3dDeconvolve doesn’t support OpenMP, so the 32 jobs are a hard-coded limit. The good news is that you can change the code if you want to try. I think that in past testing, the time savings were approaching an asymptote as the number of jobs increased, but it doesn’t hurt to try. 3dDeconvolve.c has the macro PROC_MAX defined to be 32. You can change that to something higher to see if it works for you. This assumes you’re comfortable building AFNI from source code. It’s not too hard, but if you run into problems, let us know.
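If you would rather script that edit than do it by hand, here is a minimal sketch. The helper name set_proc_max is made up for illustration, the path in the usage comment is a guess at where a source checkout might live, and the pattern assumes the macro is written as a plain "#define PROC_MAX <number>" line. AFNI still has to be rebuilt afterwards for the change to take effect.

```shell
# Rewrite the "#define PROC_MAX <n>" line in a C source file.
# A backup of the original is kept next to it with a .bak suffix.
# Usage: set_proc_max <path-to-3dDeconvolve.c> <new-limit>
set_proc_max() {
    src_file=$1
    new_max=$2
    sed -E -i.bak \
        "s/^([[:space:]]*#[[:space:]]*define[[:space:]]+PROC_MAX[[:space:]]+)[0-9]+/\1${new_max}/" \
        "$src_file"
    # Show the matching lines so the change can be checked by eye.
    grep -n 'PROC_MAX' "$src_file" | head -n 5
}

# Example (the path is an assumption about your source checkout):
# set_proc_max ~/afni/src/3dDeconvolve.c 128
```

Checking the grep output before rebuilding is worthwhile, since the sketch silently does nothing if the define line does not match the assumed form.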

You can see the past discussion of this here:

Thank you so much for your valuable suggestion! Greatly appreciated.
I’m going to give 3dREMLfit a try. I am curious to see how it enhances/affects my results.

Thank you so much for sharing this information! It is incredibly helpful.

hey Peter and Daniel,

if Ali ends up not using 3dREMLfit, do you think adding ‘-noFDR’ to the 3dDeconvolve command would also speed things up? i’ve found it helps a little, but maybe it would help even more if the process scales with number of volumes? just a thought; that would imply she’d use other recommended (and probably newer) approaches for multiple comparisons correction, too.

It couldn’t hurt, but the OP emphasized the number of volumes. The FDR calculation would vary with the number of voxels instead.

The FDR computation should scale with the number of statistical volumes output, rather than the number of EPI volumes input. So it depends how many betas and contrasts there are.

Also, on the chance that memory is slowing things down, note that there is no need to include both an -errts option and a -fitts option. If both are currently included, use only -errts in 3dDeconvolve. After the fact, use 3dcalc to compute fitts = all_runs - errts. This is what afni_proc.py does when using -regress_compute_fitts.
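A minimal sketch of that 3dcalc step, assuming the input time series is all_runs+tlrc and the residuals were saved as errts+tlrc (both names, the +tlrc view, and the output prefix are placeholders for your own files). The run helper just prints the command and skips it when AFNI is not installed.

```shell
# Helper: print the command, and run it only if the program is on the PATH.
run() {
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# fitts = all_runs - errts, computed at every voxel and time point.
# Dataset names and the output prefix are placeholders.
run 3dcalc \
    -a all_runs+tlrc \
    -b errts+tlrc \
    -expr 'a-b' \
    -prefix fitts
```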

  • rick

Rick’s point about memory is a good one. If you are running out of memory rather than CPUs, then the speed can be limited by disk thrashing, as virtual memory is swapped to and from the disk. Check during the processing with top and ps or similar tools.
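For example, something along these lines can be run in a second terminal while the job is going. This is a sketch for a typical Linux server: free and vmstat may not exist on other systems (hence the guards), and the column units depend on the ps implementation.

```shell
# Per-process memory for any running 3dDeconvolve / 3dREMLfit jobs.
# RSS is resident memory and VSZ is virtual size (typically in KiB).
ps -A -o pid,rss,vsz,etime,comm | awk 'NR == 1 || /3dDeconvolve|3dREMLfit/'

# System-wide memory and swap totals, if the tool is available.
if command -v free >/dev/null 2>&1; then free -h; fi

# Swap activity sampled 3 times at 1 second intervals, if available.
# Sustained non-zero values in the si/so columns suggest thrashing.
if command -v vmstat >/dev/null 2>&1; then vmstat 1 3; fi
```

If the resident size of the job is close to the machine's physical memory and swap activity stays high, reducing the output datasets or the number of jobs is likely to help more than adding cores.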

There are several previous threads with more good advice: