I have been using a computing cluster at our university to run 3dDeconvolve. Recently, I started working with a new dataset (EPI resolution 2x2x2mm) where I have to run trial-by-trial 3dDeconvolve models for 162 trials, and found that each trial's 3dDeconvolve run takes about 6 hours. In the past, I was able to bring down computation time by increasing the number of jobs specified with the -jobs flag in the 3dDeconvolve command to 24, and asking the job scheduler on the cluster to place the 3dDeconvolve processes on separate cores.
However, with the new dataset, I’m not able to increase the number of jobs beyond 12, because it seems that each process needs approximately 125GB of memory allocated to it (presumably because of the higher resolution of the EPI), and we have a limited number of nodes with that much available memory. I worked with the IT folks who maintain the cluster to get this running, and they said that the processes never actually use that much memory (i.e. 125GB), but the job terminates if that much memory is not allocated through the job scheduler.
So finally to my question: Is there any way that I can bring down the computing time required for 3dDeconvolve, other than increasing the number of jobs? Alternatively, is there some way to resolve this issue where 3dDeconvolve requires more memory to be allocated to the job than it actually ends up using? If I’m able to do that, I will be able to increase the number of jobs. The cluster will have sufficient resources if I’m not requesting 125GB for each job.
I’m not super familiar with these concepts, so please let me know if I’m missing something/if you need more information from me!
There are a couple of options, depending on your comfort with scripting.
1. Use 3dREMLfit instead of 3dDeconvolve, using 3dDeconvolve only to create the matrix file that passes the regression model to 3dREMLfit. This program has the option “-usetemp” which will use disk files for temporary storage; this option was added specifically for someone with a similar problem. You do not have to use the “-R” options to get the temporal autocorrelation corrected results, if you don’t want to; you can use the “-O” options to get Ordinary least squares results, more or less as calculated by 3dDeconvolve. Please read the output of “3dREMLfit -help” and pay attention to the notes for “-usetemp”; in particular, a solid-state disk (SSD) is best used for the temporary storage.
To use 3dDeconvolve as the matrix-generator only, give it the option “-x1D_stop”, which means it will exit/stop after it writes the matrix file out. 3dDeconvolve will write the 3dREMLfit command to stdout (the terminal), which you can then edit and use to your heart’s content.
2. Alternatively, you can script the program 3dDeconvolve to use one slice at a time. This is a little more complicated, as you have to do these steps:
   - 3dZcutup to break the inputs into 1-slice datasets
   - 3dDeconvolve run separately on each 1-slice dataset, to produce 1-slice output(s)
   - 3dZcat to assemble the 1-slice outputs back to 3D datasets
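The three slice-by-slice steps can be sketched roughly as follows. This is only an illustration in Python that builds the command lines rather than running them. The helper name `slice_commands`, the file names (`cut_000.nii.gz`, `stats_000.nii.gz`, and so on) and the placeholder model options are assumptions made for the example, not anything defined by AFNI. It also assumes that slices are numbered from 0 and that NIfTI prefixes are acceptable for the intermediate files.

```python
import shlex


def slice_commands(input_dset, n_slices, model_args, out_prefix="stats"):
    """Build per-slice 3dZcutup and 3dDeconvolve commands plus a final 3dZcat.

    model_args holds the user's own 3dDeconvolve model options
    (stimulus timing, -polort, etc.); they are passed through unchanged.
    """
    commands = []
    slice_outputs = []
    for k in range(n_slices):
        cut = f"cut_{k:03d}.nii.gz"            # hypothetical 1-slice input name
        out = f"{out_prefix}_{k:03d}.nii.gz"   # hypothetical 1-slice output name
        # Step 1: keep only slice k of the input dataset.
        commands.append(
            ["3dZcutup", "-keep", str(k), str(k), "-prefix", cut, input_dset]
        )
        # Step 2: run the regression on that single slice.
        commands.append(
            ["3dDeconvolve", "-input", cut, *model_args, "-bucket", out]
        )
        slice_outputs.append(out)
    # Step 3: stack the 1-slice outputs back into one 3D dataset, in slice order.
    commands.append(
        ["3dZcat", "-prefix", f"{out_prefix}_all.nii.gz", *slice_outputs]
    )
    return commands


if __name__ == "__main__":
    # "-polort 2" stands in for the real model options.
    for cmd in slice_commands("epi.nii.gz", 3, ["-polort", "2"]):
        print(shlex.join(cmd))
```

Each command list could be handed to `subprocess.run`, or printed and pasted into a batch script. The per-slice 3dDeconvolve calls do not depend on each other, so the scheduler can place them on separate cores. If a mask dataset is part of the model options, it would presumably need to be cut into matching slices as well.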
In either approach, there are details to get it running correctly. However, I personally would start with the 3dREMLfit method, as it is simpler. Note that “-usetemp” disables the use of multiple CPUs, as I didn’t make the effort to deal with multi-threaded I/O to the same temp files. For this reason, if the program is too slow, you’ll have to try the slice-and-dice method (which could be done with either 3dREMLfit or 3dDeconvolve).
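For the 3dREMLfit route, the two-stage shape looks roughly like this. Again this is a hedged sketch in Python that only assembles the command lines. The helper name `remlfit_commands`, the matrix file name `X.xmat.1D`, the output name and the placeholder model options are assumptions for illustration. In practice the 3dREMLfit command that 3dDeconvolve prints is the better starting point, and “3dREMLfit -help” remains the authority on the options.

```python
import shlex


def remlfit_commands(input_dset, model_args,
                     matrix="X.xmat.1D", out="stats_OLS.nii.gz",
                     use_temp=True):
    """Build a matrix-only 3dDeconvolve call followed by a 3dREMLfit call.

    model_args holds the user's own 3dDeconvolve model options;
    the matrix and output names are hypothetical.
    """
    # Stage 1: write the regression matrix and stop before any fitting.
    make_matrix = [
        "3dDeconvolve", "-input", input_dset, *model_args,
        "-x1D", matrix, "-x1D_stop",
    ]
    # Stage 2: fit with 3dREMLfit; -Obuck requests the ordinary least squares bucket.
    fit = ["3dREMLfit", "-matrix", matrix, "-input", input_dset, "-Obuck", out]
    if use_temp:
        # Trade RAM for temporary disk files (per the reply, this disables multi-CPU use).
        fit.append("-usetemp")
    return [make_matrix, fit]


if __name__ == "__main__":
    # "-polort 2" stands in for the real model options.
    for cmd in remlfit_commands("epi.nii.gz", ["-polort", "2"]):
        print(shlex.join(cmd))
```

Setting `use_temp=False` drops “-usetemp”, which may be worth trying first on a node that does have enough memory, since temporary disk files cost speed.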