'Chunked' 3dvolreg to reduce memory usage

Hello AFNI team, and 3dvolreg experts in particular,

Simultaneous multi-slice (SMS) imaging has increased both the spatial and temporal resolution of fMRI data to the point where motion correction can use quite a bit of memory. From my naive understanding, 3dvolreg reads the entire input dataset into memory, performs motion correction on all of it (producing an output dataset), and then writes the output to disk. In C-PAC we have reduced the amount of memory used by splitting the functional data into chunks, motion-correcting each chunk to a common base (all chunks share the same base), and then concatenating the results. This not only reduces peak memory use; when enough memory is available, the chunks can also be processed in parallel to speed things up.
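To make the idea concrete, here is a minimal Python sketch of the chunking scheme described above. It is an illustration only, not the actual C-PAC code: the helper names `chunk_ranges` and `build_commands` and all file names are hypothetical, and it assumes the base volume has already been written to its own file. It only builds the command lines (using AFNI's `[first..last]` sub-brick selector, `3dvolreg -base/-prefix/-1Dfile`, and `3dTcat -prefix`) and does not run them.

```python
def chunk_ranges(n_vols, n_chunks):
    """Split volume indices 0..n_vols-1 into contiguous, inclusive (first, last) ranges."""
    n_chunks = max(1, min(n_chunks, n_vols))  # never more chunks than volumes
    size, extra = divmod(n_vols, n_chunks)
    ranges = []
    first = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional volume each.
        length = size + (1 if i < extra else 0)
        ranges.append((first, first + length - 1))
        first += length
    return ranges


def build_commands(in_file, base_file, n_vols, n_chunks, out_prefix="volreg"):
    """Build one 3dvolreg command per chunk plus a final 3dTcat command.

    Every chunk is registered to the same base_file (hypothetical name),
    so the concatenated result is aligned to a single reference.
    """
    volreg_cmds = []
    chunk_outputs = []
    for i, (first, last) in enumerate(chunk_ranges(n_vols, n_chunks)):
        chunk_out = f"{out_prefix}_chunk{i:03d}.nii.gz"
        volreg_cmds.append([
            "3dvolreg",
            "-base", base_file,                            # same base for all chunks
            "-prefix", chunk_out,                          # registered chunk
            "-1Dfile", f"{out_prefix}_chunk{i:03d}.1D",    # motion parameters for this chunk
            f"{in_file}[{first}..{last}]",                 # AFNI sub-brick selector
        ])
        chunk_outputs.append(chunk_out)
    # Concatenate the registered chunks back into one time series, in order.
    tcat_cmd = ["3dTcat", "-prefix", f"{out_prefix}.nii.gz"] + chunk_outputs
    return volreg_cmds, tcat_cmd
```

Each list in `volreg_cmds` could be passed to `subprocess.run` (sequentially to limit memory, or concurrently when memory allows), followed by `tcat_cmd`; the per-chunk `.1D` motion files would also need to be concatenated in the same order.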

While this works well, it seems to me that it could be done more easily and much more efficiently inside 3dvolreg itself. Would it be possible (and reasonably simple) to modify 3dvolreg so that it motion-corrects the data one chunk at a time?

Kind regards,