3dDWUncert processing time

Hi (probably pt) :slight_smile:

Per your suggestion, I changed the number of iterations from 50 to 300 for 3dDWUncert. I guess it makes sense that this would now extend the processing time of this command. However, I’m wondering what a reasonable amount of time would be. For instance, I am running our subjects (n=330) in parallel across 32 cores. After 3 days, I have gone in to check and it’s only at ~20-30% for the first batch of 32 subjects. At this rate, I’ve calculated everything to be completed in 3 months! This seems abnormally long. Does my command below seem ok?

find SPN01* -type f | nohup parallel 'cd {} && cd afni && 3dDWUncert -inset …/data.nii.gz -prefix DWUncert -input 3dDWItoDT_ -bmatrix_FULL dwi_matA.txt -iters 300 -overwrite' ::: * > nohup_3dDWUncert.out &

Hi-

3dDWUncert is written to take advantage of multiple cores, if available. In your script, are you sure that 3dDWUncert is using multiple cores? Can you check the output of “afni_check_omp” while your script is running? That reports the number of CPUs an OpenMP-enabled command is set to use.

Going from 50 → 300 iterations should just increase the runtime by a factor of 6. If you can use multiple cores per job, that is much preferred. Also, if the data has not already been masked, specifying a “mask” of the brain (or brain plus a bit extra), if you have one, can speed up the run time.
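For concreteness, here is a sketch of how both suggestions might be combined -- fewer simultaneous jobs with more OpenMP threads each, plus a mask. The `-j 4` job count, the `mask.nii.gz` filename, and the directory layout (data.nii.gz sitting one level above each afni/ directory) are placeholder assumptions to adapt to your setup:

```shell
# run 4 subjects at once, 8 OpenMP threads each (4 x 8 = 32 cores)
export OMP_NUM_THREADS=8
afni_check_omp        # should now report 8

ls -d SPN01*/ | parallel -j 4 \
  'cd {}afni && 3dDWUncert -inset ../data.nii.gz -prefix DWUncert \
     -input 3dDWItoDT_ -bmatrix_FULL dwi_matA.txt                 \
     -mask mask.nii.gz -iters 300 -overwrite'
```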

Finally, how many gradients do you have, and what is your voxel size?

–pt

Hi pt,
I have been checking and it is using all 32 cores on our 32-core virtual machine, with one subject being run on each core at a time. Basically, the memory is being maxed out.

I have a binary mask - would that help?

bval = 1000, 47 gradient directions, 2mm voxel size.

If it comes down to it, could I decrease the # of iterations to an acceptable #, per your suggestion? thoughts?

I would try the binary mask -- if the data has not been masked at all, it should make a big difference (basically, voxels in noise/nonbrain zones will be included in the volume to calculate over, and that will slow things down veeery unnecessarily, esp. because they are noise dominated). Can you post a b=0 or DWI image for one subj?
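If a binary mask doesn’t already exist, one quick way to make an approximate one is AFNI’s 3dAutomask on the b=0 volume; the filenames here are just placeholders:

```shell
# rough whole-brain mask from the b=0 volume (placeholder filenames);
# -dilate 1 pads the boundary slightly ("brain + a bit extra")
3dAutomask -prefix mask.nii.gz -dilate 1 dwi_b0.nii.gz
```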

That number of grads and voxel sizes seems pretty typical, so I am surprised at the speed, esp. with that many processors.

It miiight be OK to lower the number, it is really a question of how many iterations it takes to be approx. converged to a representative distribution. But let’s check the first part, first.

–pt

attached!

Screenshot (232).png

Hi-

This does not look masked to me -- that means that much of your processing time is probably going into calculating uncertainty values for skull, air, and other non-brain material that doesn’t matter for the tracking. In fact, more than 2/3 of the FOV is probably nonbrain matter (and those noisy regions probably get re-fit many times because they are so noisy, meaning that more-than-usual time is spent on them), so masking should save a lot of time.

–pt

so I just ran 3dDWUncert with a binary mask.
Without the mask it took 5 hours for a single subject. With the mask it finished in less than a minute.
I’m shocked by the drastic decrease in processing time.

I am shocked, too! Thaaaaaat seems too big a change to be true… I mean, I expected it to be a lot faster, but not quite that fast (though, 32 CPUs is a lot, and the program is what is technically called “embarrassingly parallel”: the work is soooo splittable that very little readjustment or fanciness is needed to go from serial to parallel).

That being said, a large fraction of the data are outside the brain, so serious speedup happens from just not having to process those areas. But additionally, the program has to work harder to fit the uncertainty when noise is present and the assumptions of the tensor model itself can be violated-- that happens quite often in air/skull voxels. So, not only is the masking reducing the number of voxels to work in, it is reducing the most problematic/slowest ones.

But going from 5 hrs (= 300 min) to 1 min is a speedup by a factor of 300…
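A quick back-of-envelope check on those numbers (the 1/3 brain fraction is just the rough estimate from earlier in the thread):

```shell
# observed: ~300 min unmasked vs ~1 min masked
echo "observed speedup: $((300 / 1))x"
# if roughly 1/3 of the FOV is brain, voxel count alone predicts only ~3x,
# so most of the gain would have to come from skipping the slow,
# noise-dominated voxels
awk 'BEGIN { printf "voxel-count-only prediction: ~%.0fx\n", 1/(1/3) }'
```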

Would you mind uploading the dataset if I send you a link, juuuust to check it out?

–pt

Yes please, I would feel better if you could take a look at it to double check. Thank you!

Thanks, I have looked at the data now.

I first ran it with:

  • Niters = 5
  • OMP_NUM_THREADS = 8
  • no mask
    … and after 18 mins, it had finished only 50%, and I was bored, so I stopped it.

Then I ran it with:

  • Niters = 5
  • OMP_NUM_THREADS = 8
  • the WB mask
    … and after 0.07 mins, it had finished fully, and indeed things did look like they were on the right track.

Then I ran it with:

  • Niters = 300
  • OMP_NUM_THREADS = 8
  • the WB mask
    … and after 5.55 mins, it had finished fully, and I think the results do seem pretty normal.

So, indeed, I think a lot of the slowness is the extra voxels, being ones that require lots of refitting to work because they are basically noise.
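Those masked timings are also consistent with the runtime scaling roughly linearly in -iters -- a quick check using the numbers above:

```shell
# masked run: 0.07 min at 5 iters; linear scaling predicts for 300 iters:
awk 'BEGIN { printf "predicted: %.1f min\n", 0.07 * 300 / 5 }'
# observed: 5.55 min -- same ballpark, so cost is ~linear in iterations
```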

Note: I have now updated the 3dDWUncert help to try to make all of this clearer for future users (use a mask if the dset isn’t masked already; use 300 iterations). Sorry for the lack of clarity prior to this.

–pt