3dQwarp parallel processing problem


I’m trying to run 3dQwarp using multiple threads (N=2 versus N=6 in the script lines below). However, the warp fields calculated by 3dQwarp are not identical when I use different numbers of threads. As a result, afni_proc.py produces different outputs for identical inputs when only the number of threads differs (N=2 vs. N=6). Here is what I run:

3dQwarp -plusminus -pmNAMES Rev For \
-pblur 0.05 0.05 -blur -1 -1 \
-noweight -minpatch 9 \
-source rm.blip.med.masked.rev+orig \
-base rm.blip.med.masked.fwd+orig \
-prefix thereadsNum/blip_warp_$N

blip_warp_For_WARP+orig is then different for N=2 vs N=6.

Does anyone have an idea what the problem is, or how to solve it?

This is the AFNI version I use:
Precompiled binary linux_ubuntu_16_64: Dec 22 2017 (Version AFNI_17.3.09)

The image below shows the % difference between a volume warped using 2 threads and the same volume warped using 6 threads. The difference is of the same order of magnitude as the difference between two different volumes warped using the same number of threads.

I’m confused by your units. Is 0.0123 (the threshold) in percent? That is, a fractional change of 1.23 * 10^-4? That isn’t much to worry about.

The reason is probably that the calculations are carried out in a different order in different runs, so that the roundoff errors accumulate differently. And then the optimizer will stop at slightly different points in each stage (patch and level), and so the small changes will accumulate to some extent.
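To illustrate the roundoff point, here is a minimal Python sketch (not AFNI code) in which the same numbers are summed after being divided among a different number of hypothetical "threads". The function name `partitioned_sum` and the strided split are assumptions made for the illustration; how 3dQwarp actually divides its work is not shown here. The values are deliberately extreme so the effect is visible; in a real run the per-operation differences would be far smaller.

```python
def partitioned_sum(values, n_threads):
    """Sum `values` the way n_threads workers might: each worker adds up
    its own strided share, then the partial sums are added together.
    Hypothetical helper for illustration only."""
    partials = []
    for t in range(n_threads):
        acc = 0.0
        for v in values[t::n_threads]:  # worker t takes every n-th value
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


# Floating-point addition is not associative, so the grouping matters.
values = [1e16, 1.0, -1e16, 1.0]

one_thread = partitioned_sum(values, 1)   # ((1e16 + 1) - 1e16) + 1
two_threads = partitioned_sum(values, 2)  # (1e16 - 1e16) + (1 + 1)

print(one_thread, two_threads)  # 1.0 2.0
```

The result depends on the number of workers, but for a fixed number of workers the grouping is fixed, so repeating the call gives the same answer every time.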

Here’s a different visualization of the problem (I’m in the same lab): a GIF going back and forth between using 8 and 9 cores. The problem isn’t so much with changes in signal intensity as with changes in the pattern of distortions across different numbers of threads.

We’ve also confirmed that repeated executions using the same number of cores give identical results, which to me seems to argue against the possibility that this is due to different orders of operations across executions (but is consistent with there being a different order of operations across number of threads). We’ve also confirmed this is restricted to 3dQwarp - 3dNwarpApply executes identically across different numbers of cores.

Finally - if this is just a known issue with multi-threaded computations in 3dQwarp, is there a ‘correct’ number of threads? Is it safest to default to a single thread, hopefully avoiding some of these problems?
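Whichever thread count is chosen, fixing it explicitly keeps runs comparable, since repeated runs with the same count gave identical results for us. Below is a rough Python sketch of one way to do that. It assumes 3dQwarp takes its thread count from the standard OpenMP environment variable OMP_NUM_THREADS; the helper names `qwarp_command`, `pinned_env` and `run_qwarp`, and the prefix `blip_warp_1`, are hypothetical. The options are copied from the command earlier in this thread. This does not settle which thread count is "correct".

```python
import os
import shutil
import subprocess


def qwarp_command(prefix):
    """Build the 3dQwarp argument list used earlier in this thread."""
    return [
        "3dQwarp", "-plusminus", "-pmNAMES", "Rev", "For",
        "-pblur", "0.05", "0.05", "-blur", "-1", "-1",
        "-noweight", "-minpatch", "9",
        "-source", "rm.blip.med.masked.rev+orig",
        "-base", "rm.blip.med.masked.fwd+orig",
        "-prefix", prefix,
    ]


def pinned_env(n_threads):
    """Copy the current environment and fix the OpenMP thread count."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env


def run_qwarp(n_threads, prefix):
    """Run 3dQwarp with a fixed thread count; skip if AFNI is not on PATH."""
    if shutil.which("3dQwarp") is None:
        return None
    return subprocess.run(qwarp_command(prefix),
                          env=pinned_env(n_threads), check=True)


# Example: build (but do not run) a single-threaded invocation.
cmd = qwarp_command("blip_warp_1")
env = pinned_env(1)
```

Calling `run_qwarp(1, "blip_warp_1")` would then execute the command with one thread, provided the input datasets are in the working directory.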