Normalized data - abnormal size of the output file

boukes01 · August 16, 2023, 2:55pm

AFNI version info (21.1.20):

Dear all,

I am running data analyses on a task-based fMRI study and here is my issue.

I am using me-ica to preprocess the data, and apply the normalization to the generated data in their native space as follow:

I first warp my anatomical data to the MNI and then apply 3dWarpApply to the preprocessed data in their native space.

While the size of my native output is about 175.3 MB, after applying the normalization, the generated output is about 16.4 GB.

When I am applying the same normalization to a different set of resting state data, the issue does not occur and the output generated is of normal size (approximately 30 MB).

I don’t really understand why I obtain such a big file with my task-based fMRI data, and this prevents me from running the second level analyses (3dMVM) effectively as I took more than three days to be processed.

Any help would be greatly appreciated.

Thanks.

Here is the code used for normalization:

mni=/mypath

for i in subj_ID; 

echo "-------------------------------"
echo "Working on subject: $i at: $(date +%Y-%m-%d-%H:%M:%S)"
echo "-------------------------------"

sleep 2 # Two seconds pause

curr_warp=$mni/${i}/anatQQ.sswarp_anat2mni_WARP.nii
curr_aff12_1D=$mni/{i}/anatQQ.sswarp_anat2mni.aff12.1D

cd /path Subj_ID

for j in 1 ; do 

#3dNwarpApply -overwrite -nwarp "$curr_warp $curr_aff12_1D" -source ${I}_task${j}_T1c_medn_nat.nii.gz -ainterp NN -master $mni/template/mni+tlrc. -dxyz 1 -prefix ${i}_task${j}_T1c_medn_norm -verb

done
done

dglen · August 16, 2023, 4:12pm

Are you saying that itaskjT1c_medn_norm+tlrc is 16GB? It should have the same spatial grid as the master dataset here - which is the MNI template you specified. The number of volumes should be the same as the input, i_taskj_T1c_medn_nat.nii.gz. The data type can have a smaller effect. Check the inputs and the outputs with the "3dinfo" command.

A few other remarks:
The MNI template is not one provided with AFNI. There are different variants, so consider a specific one that is a useful alignment target and has a relevant atlas.

The letter "I" (eye) is in upper and lowercases in your script. That's usually a problem for most languages.

NN (nearest neighbor) interpolation is typically used for ROI masks and not for intensity-based datasets. The default is wsinc5 for 3dNwarpApply.

boukes01 · August 16, 2023, 11:44pm

That is correct, itaskjT1c_medn_norm+tlrc is 16GB.

My input i_taskj_T1c_medn_nat.nii.gz. includes 567 vols and my output itaskjT1c_medn_norm+tlrc also includes 567 vols.

I am using the MNI152_2009_template_SSW.nii.gz which I resampled to my input. Does that sound good to you? The “I” was a typo; in my script I am only using the lowercase. Thanks for the heads-up regarding nearest neighbor, I am going to use wsinc5 instead.

Thank you so much for your response.

ptaylor · August 17, 2023, 2:26am

Just to check: what is this for each of the input and output volume:

3dinfo -datum -n4 -ad3 -prefix DSET

?

Also, the source is gzipped, and is the output (either NIFTI or BRIK file) gzipped, when comparing the file sizes?

--pt

boukes01 · August 17, 2023, 4:23am

For the input, 3dinfo -datum -n4 -ad3 -prefix DSET gives a series of "float" and the numbers: 65 90 91 567 2.500000 2.500000 2.500000

And for the output, the command generates a series of "float" and the numbers: 182 218 182 567 1.00000 1.00000 1.00000

Sorry for the naive question, but what are the first three numbers, assuming that the last three numbers represent the voxel size?

When comparing the sizes, the inputs are 175.3 MB zipped and 1.2GB unzipped and the outputs are 16.4 GB unzipped.

Thank you so much!

ptaylor · August 17, 2023, 12:08pm

Howdy-

So, the "datum" of each subbrick is float---OK, that is the most disk-space-intensive type to have, but it is not uncommon. I wanted to see if there were a datatype difference between the dsets---and there isn't.

The "-n4" will output the number of elements along the i, j and k axes, as well as the time axis. So, the first dset is a 65x90x91 matrix with 567 time points. The second one is a 182x218x182 matrix, also having 567 time points.

The last three numbers are the voxel dims.

So, the voxel dims and matrix sizes tell a similar story: the dataset has been heavily upsampled to much finer resolution, and to having a looot more voxels. The first dset has 65x90x91 = 532,350 voxels, while the second has 7,221,032 voxels, which is 13.6 times more (!). That basically tells you the story of what is happening here---comparing the unzipped datasets, the ratio 16.4 GB to 1.2 GB is essentially that factor. You have increased the raw/uncompressed filesize of your data by over a factor of 10, by upsampling from 2.5mm isotropic to 1.0mm isotropic.

For reference, you can estimate the file size of the data by the following:
(number of voxels) * (number of time points) * (dtype_factor) bytes
... where dtype_factor is 1 for "byte" type, 2 for "short" and 4 for "float". In your upsampled data case, this is: 7221032 * 567 * 4, which is 16,377,300,576 bytes, AKA 16.4 GB.

If you gzip each file, (just the *.BRIK file for the BRIK/HEAD case, or the *.nii for the NIFTI case), they will each get smaller, but note that when you do data calculations, they get uncompressed and loaded into memory.

So, the big question is, do you need to upsample your data so much? I don't think that should be necessary in most cases (but if you need to, you need to). When we process FMRI data with afni_proc.py, our general guideline is to slightly upsample the EPI data, but only a bit; like maybe 2.5mm isotropic data to 2.0mm isotropic, or even 2.25 mm iso. But I don't see why going to 1mm iso would be necessary---often EPI data has even be blurred in processing (or, if not, will be averaged into ROIs), too, further decreasing the needs/benefits of upsampling. The data might look smoother, but it doesn't really increase information content.

--pt

dglen · August 17, 2023, 5:55pm

Given Paul's advice and your 3dinfo output, consider the -dxyz option for 3dNwarpApply

3dNwarpApply -dxyz 2.25 -master ...

boukes01 · August 18, 2023, 1:32pm

Hello Paul & Daniel,

Your posts are incredibly helpful, thank you so much for such clear explanations! I appreciate the detailed explanation, Paul.

Paul, the reason why we wanted to upsample our data so much is because we started to acquire data with a 2.5mm resolution and then switched to 1.5mm resolution. This is why we thought that we could upsample all data to 1mm iso.

What would be the best approach moving forward? Should I upsample the 2.5mm data to 2.5 and the 1.5mm to 1.5? Or should I upsample all the data in a similar way (1.5mm?)

According to Daniel’s advice, I played with the scripts and a few data, and ran the following commands:

3dNwarpApply -overwrite -nwarp "$curr_warp $curr_aff12_1D" -source ${i}_task${j}_medn_nat+tlrc. -ainterp wsinc5 -master MNI2009_to_MNI+tlrc -dxyz 1 -prefix ${i}_task${j}_norm_dxy1.nii -verb

3dNwarpApply -overwrite -nwarp "$curr_warp $curr_aff12_1D" -source ${i}_task${j}_medn_nat+tlrc. -ainterp wsinc5 -master MNI2009_to_MNI+tlrc -prefix ${i}_task${j}_norm_nodxyz.nii -verb

3dNwarpApply -overwrite -nwarp "$curr_warp $curr_aff12_1D" -source ${i}_task${j}_medn_nat+tlrc. -ainterp wsinc5 -master MNI2009_to_MNI+tlrc -dxyz 2 -prefix ${i}_task${j}_norm_dxy2.nii -verb

3dNwarpApply -overwrite -nwarp "$curr_warp $curr_aff12_1D" -source ${i}_task${j}_medn_nat+tlrc. -ainterp wsinc5 -master MNI2009_to_MNI+tlrc -dxyz 2.5 -prefix ${i}_task${j}_norm_dxy2.5.nii -verb

The output size for the first and the second command line are 17.GB, 2.2 GB for the third cmd line and 1.1 GB for the last one. So now I have a clear understanding of what is going on.

Thanks again!

ptaylor · August 18, 2023, 4:42pm

It is a little tricky to know what is the best way to merge the information in these two datasets.

You could us 1.5mm isotropic as a final resolution---that preserves the higher res data you have, and "only" increases the size from your 2.5 mm data by a factor of about (2.5/1.5)^3 = 4.6? How does that seem, memory-wise?

Also, we do have a demo for processing multiecho FMRI data with afni_proc.py (and @SSwarper for nonlinear registration to a template), using various methods for combining the echos, including the tedana MEICA; the poster describing is here:
OHBM 2022 APMULTI Poster
... and you can install the demo data+scripts with:

@Install_APMULTI_Demo1_rest

in case that is of use. You can set your final dataset resolution with the option -volreg_warp_dxyz .. (which might be explicitly done in the demo, too, already, and you can set the value to what you want).

--pt

boukes01 · August 23, 2023, 4:00am

Thank you for sharing the poster Paul. I realigned the anatomical data to the MNI first using @SSwarper and I suspect that the memory issues that I have might in fact be related to 3dNwarpApply.

These last days, I upsampled all data to 1.5mm with the hope that I will be able to run 3dDeconvolve and the second level analyses. The size of the normalized data is decreased a little bit as compared to when the data were upsampled to 1mm (4.8GB). However, the 3dDeconvolve failed and I got the following error:

Moreover, while re-running the 3dNwarpApply with -dxyz 2.5 it remains extremely slow, and I tried to re-running it with -ainterp wsinc5 and got the weird QC:

Which I don't have when I use -ainterp NN

Not sure how to proceed moving forward as the 3dNwarpApply using -dxyz 2.5 generates 1GB output files as well.

Thanks for your help

ptaylor · August 23, 2023, 9:05am

Hi-

Are you using afni_proc.py for the processing, or running 3dDeconvolve separately?

Re. -ainterp wsinc5: actually the results you show are fine and probably not an issue. The wsinc5 kernel is one that best preserves sharpness and edges when interpolating a dataset, but it does lead to a bit of tiny ringing, which can be seen against a zeroed out background or low-noise floor. The wsinc5 properties (preserve sharpness, induce a slight ringing) are describable from signal processing theory (the Fourier Transform of a tophat-shaped signal is a sinc() function, and vice versa, for what it's worth).

Anyways, those values that "appear" outside the brain are likely very tiny, and outside of the brain, and shouldn't affect analysis. You probably only really see them because tedana-MEICA masks out the part of the field of view outside the brain.

--pt

boukes01 · August 23, 2023, 11:22am

Hi Paul,

Thanks for clarifying re -ainterp wsinc5. I assumed that the underlay and overlay in afni should align perfectly, it's good to know that the ringing is not related to any issues.

I processed the data in several steps.
First I ran meica, then I separately aligned the anat to the MNI using @SSwarper and then applied 3dNwarpApply to the preprocessed data (meica output) in their native space. Separately, I applied 3dDeconvolve on the normalized data.

the meica cmd line looks like this:
$meicapath/meica.py --OVERWRITE -a aseg_mask_uni.nii.gz -e 9.5, 22.92, 36.34 -d "e1.nii, e2.nii, e3.nii" --MNI --qwarp --fres=1.5 --no_skullstrip --native --prefix task --cpus 4

I was not satisfied with the normalized data generated by meica that is why I added the extra-steps with @SSwarper to the non-normalized data.

Am I doing something totally wrong in one of those steps?

Many thanks again,
Sarah.

ptaylor · August 23, 2023, 1:30pm

Hi, Sarah-

MEICA processing fits into the larger set of FMRI processing, including motion reduction and various other alignment processes, regression/time series modeling, and more. It is certainly possible to do those steps in pieces and string them together in a custom pipeline, but that puts the burden of lots of details on you.

For example, alignments can be calculated separately, but should be concatenated before applying them to the EPI time series, to minimize blurring. The details of doing that are feasible and known, but not simple and require a lot of checks.

That is basically the reason that afni_proc.py exists---to take care of the mundane, known, mathematical details of processing, while still allowing the user to select options to control the processing in detail (such as blur values, censor values, templates of choice, ME-FMRI processing of choice, and more). I would strongly recommend using an existing pipeline tool for what you are doing, unless what you want to do cannot be done in any one (and if the latter is the case, perhaps ask and we often add in options/functionality).

In this case, it sounds like everything you are doing can be integrated within afni_proc.py. Our standard recommendation is to run @SSwarper for nonlinear alignment, and then to provide those warps to afni_proc.py to handle and concatenate. As shown in the demo linked up above, you can include several forms of ME-FMRI combination---including the tedana MEICA---within afni_proc.py directly, as well. Finally, you can specify whatever you want in a regression model (task, rest, etc.) and let afni_proc.py build your 3dDeconvolve command for you.

Importantly, afni_proc.py builds a complete, commented script of what the processing will be, so you can read it and have the full, detailed provenance of everything. There is also a built in quality control HTML, called the APQC HTML, that provides thorough images of all the processing steps and several quantitative checks (you can see many examples of it in action here).

So, I think there are several reasons to use a pipeline tool---added benefits, and reduction of some headaches. It is great that you already know in detail what you want, and have expectations of what should happen at various steps---that will make setting up your AP command and evaluating everything much better. But, again, I think using a pipeline tool will help multiple aspects. You can even start with one of the APMULTI demo examples from above as a starting point.

--pt

boukes01 · August 24, 2023, 12:56am

Thank you so much, Paul, I truly appreciate all the detailed explanations as well as the recommendations. I need to educate myself more on TEDANA and the pipeline, so thanks for sharing the relevant links that are extremely helpful.

I also just found out that applying -short to 3dNwarpApply is converting float data into short data and consequently the output generated is about 500MB. It looks like this option can be used with -ainterp so I hope this is something that I can consider to at least avoid memory/storage issues during the first and second level analyses.

Many thanks again for all the time spent answering my multiple questions and for providing all the resources.

Sarah.

ptaylor · August 24, 2023, 11:05am

Cool, that sounds good, just ping back any time with any questions/comments/etc.

--pt

boukes01 · August 30, 2023, 3:08am

Hi all,

I am back to this thread as I am running out of solutions to solve my issue with memory allocation that just persists whatever I am trying. I re-ran the analyses while resampling the data to 2.5mm and figured out how to decrease the size of my inputs which is now quite reasonable: app 500MB however, 3dDeconvolve fails again. Please, see below the error that I got:

I found similar error messages posted on the afni board but couldn't find a solution.

Do you have any idea on how I can solve this issue and run 3dDeconvolve? Or even, how to reduce the number of output sub-bricks?

Thanks for your help.

Sarah.

rickr · August 30, 2023, 2:48pm

Hi Sarah,

Exactly how much RAM do you have on that computer?
And what is the 3dDeconvolve command that you are using?

rick

boukes01 · August 31, 2023, 2:17am

Hi Rick,

I am working on a server that has over 500 GB of memory and 80 CPUS. When I am attempting to run this analysis, my process is attempting to use almost all of the system's resources.

Here is the 3dDeconvolve cmd that I am using:

Many thanks for your quick reply.

rickr · August 31, 2023, 8:39pm

Hi Sarah,

Assuming those 2 inputs are each 1.6 GB, then I would expect this command to require about 16 GB to execute (1.6 GB * 2 dsets * ~5 for inputs and outputs). That could be reduced to about 9.6 GB if you removed the -fitts option (which is not needed - it could come from 3dcalc all_runs - errts).

But even at 16 GB, I cannot fathom how it could be using a high fraction of the system's resources, particularly without a -jobs option.

Consider running free -h beforehand, to see how much RAM the computer actually has available. I doubt it is you alone going too far. If you are using -jobs, that might scale the RAM needs.

rick

boukes01 · September 7, 2023, 9:41pm

Dear Rick,

Thanks again. I stopped analyzing those data for a few days, and when I started over again it worked perfectly fine with both 1.5 and 2.5 datasets. It was apparently a server and memory issue that has been solved on the backend.

Thank you again for your help!

Best,
Sarah.