3dDeconvolve stalls at "current memory mallocated"

Hi, I am a newbie in AFNI and this is my first time analysing fMRI data independently, so thank you in advance for your help. Please feel free to point out any folly that I have inadvertently fallen into, and let me know if I need to provide more info.

I am trying to run an analysis similar to one published in Mumford 2012 – a separate GLM for each trial in preparation for MVPA. I built my scripts and preprocessing based on the MVPA tutorial from Brown University’s Carney Institute for Brain Science.

I started with 16 GB of RAM and upgraded to 48 GB, but I still run into the same problem around the “current memory malloc-ated” messages. I suspect that my problem is that I have 3600 trials…

[size=medium]Data and Design[/size]:
BOLD data: 90 trials for each of 40 categories (3600 trials total), across 6 runs, collected from 10 participants. Dimensions: 104 x 104 x 92 voxels x 625 time points per run.
Data size: about 1.8 GB per participant (all 6 runs, brain extracted)

[size=medium]Resources[/size]:
Software: Ubuntu 18.04.3 LTS, AFNI_19.3.18 ‘Nero’
Hardware: Intel Core i7-7700 CPU @ 3.6GHz x 8, RAM = 48 GB, data disk available space = 187.8 GB

[size=medium]Bash Command[/size]
3dDeconvolve -input $fnames \    ← file names of the 6 BOLD runs for this subject
  -polort 1 \                    ← terminal suggests a minimum of 5; I used 1 here based on Brown’s tutorial, is that wrong?
  -local_times \
  -GOFORIT 4 \
  -num_stimts 40 \
  -stim_times_IM 1 $stim_dir/${subj}_1_Snake.txt 'BLOCK(1,1)' -stim_label 1 Snake \
  -stim_times_IM 2 $stim_dir/${subj}_2_Frog.txt 'BLOCK(1,1)' -stim_label 2 Frog \
  … altogether 40 lines like these; each one points to a stim timing file with 6 rows (one per run), 15 onsets per row, times given to 4 decimal places (format sketched just below)
  -bucket $reg_dir/id${subj}_bucket.nii
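
For reference, each timing file has this layout, as I understand it (the onset values below are invented just to show the format: one row per run, 15 onsets per row, in seconds from the start of that run):

  12.3456  18.9012  25.0005  …  (15 onsets for run 1)
  10.0210  17.4450  23.8765  …  (15 onsets for run 2)
  …  (6 rows in total)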

[size=medium]Issue (trimmed terminal output below)[/size]

  1. As the script starts, it loads about 20 GB into memory (I am guessing: 1.8 GB of data + the 18.3 GB the output says it needs for output bricks?).
  2. It then increases to about 38 GB.
  3. About 20 hours later:
  • the terminal says something about needing to “compute the hard way”
  • RAM usage maxes out at 100% (47.xx GB)
  • it prints the “current memory malloc-ated” message
  • it starts something about a vertex/voxel loop (?) that prints “0”, “1”, “2”, “3”, “4”, each appearing very slowly over the next few days.
  4. I accidentally quit it (Ctrl+C… sigh… while trying to copy the error for the forum…), but the terminal would not release the memory (still at 100% RAM used).
    *My apologies for the vagueness in #3; I had to reconstruct that part from memory, since with no RAM left I had to restart the machine. I am re-running it now, and I can post the actual messages in… 20 hours? (SIGH)

[size=medium]Four questions, if you don’t mind:[/size]

  1. The input data itself is not so large (1.8 GB). Is this really, ultimately, a RAM issue? If so, how do I compute how much RAM I actually need (I take a stab at a back-of-the-envelope just below this list)? Or did I do something wrong in the scripting?
  2. Even though 3dDeconvolve stalls, it does write some output before stalling. What are those outputs? Are they usable?
  3. 3dZcutup: I understand this to be a valid way to work around RAM issues. If I cut the data up along z, should I first concatenate the 6 runs? I have read 3dZcutup -help carefully, but I am still not entirely confident that I understand it well enough to put the pieces back together correctly. Can you please recommend some tutorials?
  4. I cannot find instructions for running 3dDeconvolve multithreaded to use my 8 cores. If I do use 3dZcutup, can I then run the pieces in parallel? E.g. core #1 runs slices 1, 9, 17…, core #2 runs slices 2, 10, 18…, etc. That should still need less RAM, right? I would be processing 8 slices at a time instead of all 92?
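
For question 1, the closest I could get to a formula is the guess below, using the voxel count from the terminal output further down; the “3,601 sub-bricks” part is purely my assumption (the 3,600 IM betas plus one extra?), so please correct me if this is not where the 18 GB figure comes from:

  1,274,880 voxels x 3,601 sub-bricks x 4 bytes per float = 18,363,371,520 bytes ≈ 18.4 GB
  echo $(( 1274880 * 3601 * 4 ))    ← prints 18363371520, matching the “Memory required for output bricks” line

And presumably the regression matrix and the per-sub-model statistics need working memory on top of that.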

[size=medium]Terminal output[/size] (before the 20 hours described in issues step 3)
9006 is all ready for GLM now! @ Wed Jan 22 12:57:43 EST 2020
++ '-stim_times_IM 1 [[stim timing dir]]/9006_1_Snake.txt' will have 90 regressors
++ '-stim_times_IM 2 [[stim timing dir]]/9006_2_Frog.txt' will have 90 regressors
[[ and so on for a total of 40 stim categories]]
++ 3dDeconvolve: AFNI version=AFNI_19.3.18 (Dec 27 2019) [64-bit]
++ Authored by: B. Douglas Ward, et al.
++ current memory malloc-ated = 1,456,926 bytes (about 1.5 million)
++ loading dataset [[6 preprocessed, brain extracted BOLD runs]]
++ current memory malloc-ated = 1,614,150 bytes (about 1.6 million)
++ Auto-catenated input datasets treated as multiple imaging runs
++ Auto-catenated datasets start at: 0 625 1250 1875 2500 3125
++ STAT automask has 153313 voxels (out of 1274880 = 12.0%)
++ Skipping check for initial transients
*+ WARNING: Input polort=1; Longest run=625.0 s; Recommended minimum polort=5
++ -stim_times using TR=1 s for stimulus timing conversion
++ -stim_times using TR=1 s for any -iresp output datasets
++ [you can alter the -iresp TR via the -TR_times option]
++ -stim_times_IM 1 using LOCAL times
++ -stim_times_IM 2 using LOCAL times
[[ and so on for a total of 40 stim categories]]
++ Number of time points: 3750 (no censoring)

 + Number of parameters: 3612 [12 baseline ; 3600 signal]
++ Memory required for output bricks = 18,363,371,520 bytes (about 18 billion)
++ Wrote matrix values to file [[output file name]]
++ ========= Things you can do with the matrix file =========
++ (a) Linear regression with ARMA(1,1) modeling of serial correlation:

  3dREMLfit -matrix [[output .xmat.1D]] \
    -input [[6 preprocessed, brain extracted BOLD runs]] \
    -Rbuck [[output dir]]/id9006_bucket_REMLvar -verb

++ N.B.: 3dREMLfit command above written to file [[output dir]]/id9006_bucket.REML_cmd
++ (b) Visualization/analysis of the matrix via ExamineXmat.R
++ (c) Synthesis of sub-model datasets using 3dSynthesize
++ ==========================================================
++ ----- Signal+Baseline matrix condition [X] (3750x3612): 1279.16 ++ OK ++
*+ WARNING: !! in Signal+Baseline matrix:
 * Largest singular value=2.92141
 * 84 singular values are less than cutoff=2.92141e-07
 * Implies strong collinearity in the matrix columns!
++ Signal+Baseline matrix singular values:
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 8.22803e-10 6.99588e-09
    7.61534e-09 1.17709e-08 1.44836e-08 1.65853e-08 1.69408e-08
    1.86453e-08 2.11361e-08 2.13231e-08 2.43178e-08 2.59434e-08
    [[and so on for a total of 723 lines]]
++ ----- Signal-only matrix condition [X] (3750x3600): 1233.42 ++ OK ++
*+ WARNING: !! in Signal-only matrix:
 * Largest singular value=2.75718
 * 84 singular values are less than cutoff=2.75718e-07
 * Implies strong collinearity in the matrix columns!
++ Signal-only matrix singular values:
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 7.31057e-09 8.75017e-09
    1.06977e-08 1.30859e-08 1.42085e-08 1.46287e-08 1.61204e-08
    1.71607e-08 1.9407e-08 1.95557e-08 2.10651e-08 2.23834e-08
    2.36478e-08 2.48074e-08 2.52718e-08 2.63005e-08 2.85352e-08
    [[and so on for a total of 720 lines]]
++ ----- Baseline-only matrix condition [X] (3750x12): 1 ++ VERY GOOD ++
++ ----- polort-only matrix condition [X] (3750x12): 1 ++ VERY GOOD ++

Thank you again for your time in reading and considering my questions; I really appreciate it!

Your model is huge. There are 3750 time points, but 3612 regressors. Even once you get the results, they will basically be noise.

It may seem surprising that this takes so much memory, but it comes from generating statistics for all of the sub-models. So having 3612 regressors makes a big difference there, and it also hurts the overall speed of the computation.

And the computation would be slow to begin with, but it seems you are using all of the RAM, in which case the program is probably thrashing (swapping memory in and out).

On to your questions…

  1. Since the command does not include -fout, -fitts or similar options, there is probably no good way to reduce RAM use, other than with something like 3dZcutup. On the flip side, I think you are over-modeling the data, and this analysis might not be so great to begin with.

  2. It would take some looking into, but you are probably better off not trying to use any of those temporary files.

  3. Are you running 3dDeconvolve per run now? Using 3dZcutup would just break the data into pieces that could be handed to 3dDeconvolve, and then put back together. The point is not to run a parallel analysis; it is to run sequentially on smaller datasets to save RAM (a rough sketch is at the end of this reply).

This is not a very convenient way to go.

  4. 3dDeconvolve has a -jobs option to use multiple threads. That is a good way to speed up the analysis, but it will not prevent the RAM problem. If the program has used all of the RAM and is thrashing, it will still be very slow.

It might be best to think more about this approach. There are 40 stim categories, each with 90 regressors? The output might not be very useful.
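
If you do go the 3dZcutup route, the overall flow could look roughly like the untested sketch below (loosely following the example in the 3dZcutup -help output; the dataset names are placeholders for your own, and you could add -jobs to the 3dDeconvolve step):

  nz=92        # number of slices in the grid
  slab=10      # slices per slab; smaller slabs need less RAM per 3dDeconvolve call

  for (( bot=0; bot<nz; bot+=slab )); do
      top=$(( bot + slab - 1 )); (( top >= nz )) && top=$(( nz - 1 ))
      zz=$(printf '%02d' $bot)              # zero-pad so the final glob sorts in z order

      # cut the same slab out of each run
      for run in 1 2 3 4 5 6; do
          3dZcutup -keep $bot $top -prefix slab${zz}_r${run} run${run}+orig
      done

      # <-- here: your existing 3dDeconvolve command (with -jobs if you like), but using
      #     -input slab${zz}_r1+orig ... slab${zz}_r6+orig  and  -bucket slab${zz}_bucket
  done

  # glue the per-slab buckets back together along z
  3dZcat -prefix id${subj}_bucket slab??_bucket+orig.HEAD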

- rick

Thank you, Rick, for the quick response!

Re #3: I am running all 6 runs together. I can definitely try running 1 run at a time.
I was under the impression that I needed to run them all together so that all the samples of the same category are present. Is that true? Or does it not matter, since the model yields an individual regressor for each trial anyway?

Regarding the over-modeling: maybe I am approaching this all wrong? My goal was to get 3600 betas and then run MVPA on those betas, as the Brown tutorial suggests. Would this not work because the stim presentations were too short?

Again, thank you for your time, and I apologise in advance for novice questions!

Actually… an even more novice question… does my 3dDeconvolve setup produce 1 beta per trial, or 1 beta per voxel per trial?

Hello,

If all of the stim classes are being modeled using IM, and if no motion or other regressors of no interest are included, then you should get identical betas modeling per run as across runs. The t-stats would be different, of course. But the basic point is that if every regressor is non-zero only within a single run, then it does not matter whether it is fit over one run or all of them; the betas would not change.

If that is the case, it might be an alternative way to save RAM, if your computer is not able to reasonably run the analysis as is.

Note that if you were NOT using IM, then indeed, we would analyze all runs at once, so that all samples in a category are present. But when using IM, it is like each event is its own category.
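
To be concrete, the per-run version could look roughly like this untested sketch (file names are placeholders; adjust to your own naming): pull out the row of each timing file that belongs to a given run, then run the same 3dDeconvolve command on that run alone.

  for run in 1 2 3 4 5 6; do
      # make one-row timing files for this run (row N of the 6-row files)
      for f in $stim_dir/${subj}_*.txt; do
          [[ $f == *_run[1-6].txt ]] && continue    # skip files made by a previous pass
          sed -n "${run}p" $f > ${f%.txt}_run${run}.txt
      done

      # <-- here: the same 40-regressor 3dDeconvolve command, but with
      #     -input run${run}+orig (this run only), the *_run${run}.txt timing files,
      #     and -bucket id${subj}_run${run}_bucket
  done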

But either way, I do not see what you could do with these results. With 3600 regressors over 3750 time points (which also means motion time points are not even being censored, and that will have a big effect on local betas), not only is this IM, but the responses to events must overlap heavily. So the betas for one class will be strongly affected by the betas for another. These results will probably be quite noisy.

Getting to your last question, the IM method will output one beta per event. But all of this is “per voxel”, too; the results are computed across the whole volume. So at each voxel there will be one beta per event, i.e. one beta per event per voxel.

- rick

I think I understand now, and will try to figure out some other way to analyse the data.

Thank you, Rick, you have been most kind and helpful!