Smoothed resting-state fMRI file is unusually big

AFNI version info (afni -ver): AFNI_24.0.06

Dear AFNI experts,

I am rather new to AFNI and I have some doubts. I am preprocessing resting-state fMRI data. Data was already partially preprocessed (downloaded from OpenNeuro) and I just performed the final steps (that I reported below).

I noticed that the final smoothed file has a size of 1.2 GB, compared to 170.8 MB for the pre-smoothed file. This applies to all of the subjects (~200). It seems unusual that a smoothing operation would enlarge the file so much. What do you think?

All files are in NIfTI format and of dtype float32.
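(I checked this with something along these lines; the file name below is just a placeholder for any subject's file:)

# matrix size + number of volumes, voxel dimensions, and datum type
3dinfo -n4 -ad3 -datum sub-XXXX_task-rest_bold_preproc.nii.gz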

Here are the final preprocessing steps I performed:

  • removed the first 4 TRs (as I am following the protocol of Hearne et al., 2021); a sketch of this step is included after the smoothing command below
  • created the censor file
  • created the file with the regressors
  • ran 3dTproject
  • smoothed
3dTproject \
         -input $ts \
         -prefix $output \
         -censor $censor \
         -cenmode NTRP \
         -ort $regr_file \
         -polort 2 \
         -passband 0.01 0.1 \
         -TR 2 \
         -mask $mask \
         -norm \
         -verb \
         &> ${sub_folder}/log_${sub_id}_nuisance_regression.txt

and lastly, 4 mm smoothing:

3dBlurInMask -input "$input" -mask "$mask" -FWHM 4 -prefix "$output"
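(Regarding the first bullet: removing the initial 4 TRs amounts to a sub-brick selection, roughly along these lines; a minimal sketch with placeholder file names, not my exact command:)

# keep volume 4 through the last, dropping the first 4 TRs (volumes 0-3)
3dTcat -prefix "$output_4TRremoved" "$input"'[4..$]'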

Thank you for your time,

Gabriele

If it's like the example script we provide in our class data, the $ts variable may contain multiple datasets that then get filtered and combined into the output. Try this using the input datasets that go into $ts:

echo $ts
3dinfo -nv -prefix $ts $output

Dear dglen,

Thank you for your reply. I had thought this might be the issue and checked, but the files seemed fine. To rule out possible bugs, I even tried smoothing a single file from the command line, without variables or parallelization, and I still got a file of the same disproportionate size.

Gabriele

Hi, Gabriele-

That is odd, because the output of 3dBlurInMask will be masked, so if the datasets are compressed, then the latter should be highly compressible.

What is the OpenNeuro site?

Can you please copy+paste the output of this (where I have split $output into $output1 and $output2, to distinguish the 3dTproject output from the 3dBlurInMask output)?

nifti_tool -disp_hdr -infiles $ts $output1 $output2

Also, have you looked at the datasets in the GUI, to make sure nothing looks weird?

--pt

Dear pt,

Yes, I do inspect the data after each step of the preprocessing, usually with fsleyes. For this issue I also looked at it with the AFNI GUI. Everything seems fine.

Here's the output of the command you asked for:

N-1 header file '/home/gabridele/Desktop/irbio_folder/spreading_dynamics_clinical/derivatives/sub-10189/func/sub-10189_task-rest_bold_space-MNI152NLin2009cAsym_preproc_resampled_4RTremoved.nii.gz', num_fields = 43

all fields:
  name                offset  nvals  values
  ------------------- ------  -----  ------
  sizeof_hdr             0      1    348
  data_type              4     10    
  db_name               14     18    
  extents               32      1    16384
  session_error         36      1    0
  regular               38      1    r
  dim_info              39      1    0
  dim                   40      8    4 193 229 193 148 1 1 1
  intent_p1             56      1    0.0
  intent_p2             60      1    0.0
  intent_p3             64      1    0.0
  intent_code           68      1    0
  datatype              70      1    16
  bitpix                72      1    32
  slice_start           74      1    0
  pixdim                76      8    1.0 1.0 1.0 1.0 2.0 0.0 0.0 0.0
  vox_offset           108      1    352.0
  scl_slope            112      1    1.0
  scl_inter            116      1    0.0
  slice_end            120      1    0
  slice_code           122      1    0
  xyzt_units           123      1    10
  cal_max              124      1    0.0
  cal_min              128      1    0.0
  slice_duration       132      1    0.0
  toffset              136      1    0.0
  glmax                140      1    0
  glmin                144      1    0
  descrip              148     80    2203.12-dirty 2024-02-01T16:17:47+00:00
  aux_file             228     24    
  qform_code           252      1    1
  sform_code           254      1    1
  quatern_b            256      1    0.0
  quatern_c            260      1    0.0
  quatern_d            264      1    0.0
  qoffset_x            268      1    -96.0
  qoffset_y            272      1    -132.0
  qoffset_z            276      1    -78.0
  srow_x               280      4    1.0 0.0 0.0 -96.0
  srow_y               296      4    0.0 1.0 0.0 -132.0
  srow_z               312      4    0.0 0.0 1.0 -78.0
  intent_name          328     16    
  magic                344      4    n+1

N-1 header file '/home/gabridele/Desktop/irbio_folder/spreading_dynamics_clinical/derivatives/sub-10189/func/sub-10189_regressed_bp.nii.gz', num_fields = 43

all fields:
  name                offset  nvals  values
  ------------------- ------  -----  ------
  sizeof_hdr             0      1    348
  data_type              4     10    
  db_name               14     18    
  extents               32      1    0
  session_error         36      1    0
  regular               38      1    r
  dim_info              39      1    48
  dim                   40      8    4 193 229 193 148 1 1 1
  intent_p1             56      1    0.0
  intent_p2             60      1    0.0
  intent_p3             64      1    0.0
  intent_code           68      1    0
  datatype              70      1    16
  bitpix                72      1    32
  slice_start           74      1    0
  pixdim                76      8    1.0 1.0 1.0 1.0 2.0 0.0 0.0 0.0
  vox_offset           108      1    7632.0
  scl_slope            112      1    0.0
  scl_inter            116      1    0.0
  slice_end            120      1    192
  slice_code           122      1    0
  xyzt_units           123      1    10
  cal_max              124      1    0.0
  cal_min              128      1    0.0
  slice_duration       132      1    0.0
  toffset              136      1    0.0
  glmax                140      1    0
  glmin                144      1    0
  descrip              148     80    
  aux_file             228     24    
  qform_code           252      1    1
  sform_code           254      1    1
  quatern_b            256      1    0.0
  quatern_c            260      1    -0.0
  quatern_d            264      1    0.0
  qoffset_x            268      1    -96.0
  qoffset_y            272      1    -132.0
  qoffset_z            276      1    -78.0
  srow_x               280      4    1.0 -0.0 -0.0 -96.0
  srow_y               296      4    -0.0 1.0 -0.0 -132.0
  srow_z               312      4    0.0 0.0 1.0 -78.0
  intent_name          328     16    
  magic                344      4    n+1

N-1 header file '/home/gabridele/Desktop/irbio_folder/spreading_dynamics_clinical/derivatives/sub-10189/func/sub-10189_regressed_smoothed.nii.gz', num_fields = 43

all fields:
  name                offset  nvals  values
  ------------------- ------  -----  ------
  sizeof_hdr             0      1    348
  data_type              4     10    
  db_name               14     18    
  extents               32      1    0
  session_error         36      1    0
  regular               38      1    r
  dim_info              39      1    48
  dim                   40      8    4 193 229 193 148 1 1 1
  intent_p1             56      1    0.0
  intent_p2             60      1    0.0
  intent_p3             64      1    0.0
  intent_code           68      1    0
  datatype              70      1    16
  bitpix                72      1    32
  slice_start           74      1    0
  pixdim                76      8    1.0 1.0 1.0 1.0 2.0 0.0 0.0 0.0
  vox_offset           108      1    8000.0
  scl_slope            112      1    0.0
  scl_inter            116      1    0.0
  slice_end            120      1    192
  slice_code           122      1    0
  xyzt_units           123      1    10
  cal_max              124      1    0.0
  cal_min              128      1    0.0
  slice_duration       132      1    0.0
  toffset              136      1    0.0
  glmax                140      1    0
  glmin                144      1    0
  descrip              148     80    
  aux_file             228     24    
  qform_code           252      1    1
  sform_code           254      1    1
  quatern_b            256      1    0.0
  quatern_c            260      1    -0.0
  quatern_d            264      1    0.0
  qoffset_x            268      1    -96.0
  qoffset_y            272      1    -132.0
  qoffset_z            276      1    -78.0
  srow_x               280      4    1.0 -0.0 -0.0 -96.0
  srow_y               296      4    -0.0 1.0 -0.0 -132.0
  srow_z               312      4    0.0 0.0 1.0 -78.0
  intent_name          328     16    
  magic                344      4    n+1

Howdy-

The datatype=16, meaning it is float (32-bit) data, or 4 bytes per voxel.

The voxel size is quite fine for EPI: 1x1x1 mm^3. That is pretty surprising (was this data acquired at/near such high resolution, or just upsampled heavily during processing?). But more importantly for judging the size of the file, the matrix size is 193x229x193, and there are 148 time points per voxel.

Multiplying out 193x229x193x148x4 gives 5,049,772,432 bytes, or about 5 GB uncompressed for that dataset. It's quite big. When that gets compressed, it can/will shrink down, but the amount it shrinks depends on the data contents and their compressibility. None of this processing is changing the matrix size, the number of time points, or the datum type.
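If you want to double-check those numbers against what is on disk, a quick sketch (the file names are taken from your header output above):

# uncompressed size in bytes: nx * ny * nz * nt * (bytes per voxel)
echo $(( 193 * 229 * 193 * 148 * 4 ))   # -> 5049772432

# compressed sizes actually on disk
ls -lh sub-10189_regressed_bp.nii.gz sub-10189_regressed_smoothed.nii.gz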

It does surprise me that the compression for the final dataset would be so much less than the earlier data, since the mask appears to be applied in the later stages. I guess maybe the earlier stages are masked, too.
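One way to check the masking directly is to count non-zero voxels in the first volume of each stage, along these lines (a sketch using your file names from above):

3dBrickStat -count -non-zero sub-10189_task-rest_bold_space-MNI152NLin2009cAsym_preproc_resampled_4RTremoved.nii.gz'[0]'
3dBrickStat -count -non-zero sub-10189_regressed_bp.nii.gz'[0]'
3dBrickStat -count -non-zero sub-10189_regressed_smoothed.nii.gz'[0]'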

One thought that occurs to me is that the output of 3dTproject will be like a residual time series, which should probably be quite noisy. "Noise", with all its randomness, does not compress down well. The compressed file size increase is about an order of magnitude, which is quite large, but at least some of this expansion might be due to that. Blurring/smoothing would likely increase compressibility, decreasing *.nii.gz file size. Does that pattern bear out in your datasets?
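If you want to compare compressibility directly, here is a sketch (again using your file names from above); recompressing both stages at the same gzip level removes any difference due to the compression level each tool happened to use when writing the files:

# recompress at a fixed level and count the resulting bytes
for f in sub-10189_regressed_bp sub-10189_regressed_smoothed; do
    gzip -dc ${f}.nii.gz | gzip -6 -c | wc -c
done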

--pt


Dear pt,

Thanks a lot for your thorough explanation. I failed to mention that the time series was resampled to the T1's resolution, and that might have compromised the compressibility of the file. I will now take a few steps back and check if that's the case. Thanks again for your message.

Gabriele

It might be that the resampling step was done with nearest-neighbor interpolation, so voxel values would simply repeat rather than take on in-between values. That repeating arrangement compresses well, but smoothing reintroduces in-between values (albeit different ones than other interpolation methods would produce), and those smoothed values would not compress as well.
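If you want to test that idea, here is a sketch (assuming the native-resolution EPI is still available; the input/output names below are placeholders, and -rmode NN / Li select nearest-neighbor / linear interpolation):

# upsample a native-resolution EPI onto the 1 mm grid two ways, then compare .nii.gz sizes
3dresample -rmode NN -master sub-10189_regressed_bp.nii.gz \
           -input epi_native_res.nii.gz -prefix epi_1mm_NN.nii.gz
3dresample -rmode Li -master sub-10189_regressed_bp.nii.gz \
           -input epi_native_res.nii.gz -prefix epi_1mm_Li.nii.gz
ls -lh epi_1mm_NN.nii.gz epi_1mm_Li.nii.gz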

I will add that, in general, we don't really recommend large upsampling of EPI datasets. Consider having 3x3x3 mm^3 voxels to start: one can't really create higher-resolution information just by upsampling to 1x1x1 mm^3. Consider also that for voxelwise studies one typically blurs the data, which further reduces spatial specificity, so heavy upsampling really doesn't seem to be useful.

The final maps might come out with smoother edges, but that won't really represent the spatial specificity of the acquired data. Furthermore, upsampling quickly increases file size and processing time. Consider going from 3 mm iso to 1.5 mm iso: each original voxel is replaced by eight in the upsampled data, so there is almost an order of magnitude more data to analyze. In going from 3 mm iso to 1 mm iso, one has 27 times (!) as much data. That takes up vastly more disk space, computing resources, and processing time, without actually creating more information.
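Spelling out that voxel-count arithmetic (a trivial check):

# factor by which the number of voxels grows when upsampling
awk 'BEGIN { printf "3mm -> 1.5mm: x%g\n3mm -> 1mm:   x%g\n", (3/1.5)^3, (3/1)^3 }'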

I know some default pipelines do upsample heavily, but it doesn't seem worthwhile in practice. The afni_proc.py default is to slightly upsample/round based on the minimum voxel size (though of course you can control the final resolution yourself with -volreg_warp_dxyz ..).

--pt