Running analyses on an external RAID system?

storrisi · March 19, 2025, 7:00pm

General computation question:

I'm using an M1 Mac Studio desktop, and processing with afni_proc.py is fast, but I'm running out of disk space. Does anyone have experience running analyses directly on an external RAID drive? I'm looking specifically at the PROMISE Pegasus32 R4. Back in the day, I remember disk I/O slowing everything down, so I'm skittish, but I'm sure things have changed by now. Thanks for your thoughts!

-Sam

pmolfese · March 19, 2025, 9:36pm

Disk I/O is often the slowest part of data processing. RAIDs can work! But if you test it out and really notice a slowdown, I would recommend copying your files to the Mac Studio's internal drive, processing them there, and then copying the results over to the RAID.
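One rough way to "test it out" is a quick sequential-write probe, run once against the internal drive and once against the RAID. This is a minimal Python sketch, not an AFNI tool. The RAID mount point `/Volumes/Pegasus32` is a hypothetical placeholder, and one large sequential write only loosely resembles afni_proc.py's real mix of reads and writes, so timing an actual subject remains the better test.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, size_mb: int = 256, chunk_mb: int = 8) -> float:
    """Write about size_mb MB into a temp file under `directory` and return MB/s.

    Rough sequential-write probe only; AFNI's real read/write pattern differs.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)  # incompressible data
    n_chunks = max(1, size_mb // chunk_mb)
    fd, path = tempfile.mkstemp(dir=directory, prefix="io_probe_")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push data out of the OS page cache before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return n_chunks * chunk_mb / elapsed


if __name__ == "__main__":
    targets = {
        "internal": tempfile.gettempdir(),
        # "raid": "/Volumes/Pegasus32",  # hypothetical RAID mount point; uncomment and adjust
    }
    for label, directory in targets.items():
        print(f"{label}: {write_throughput_mb_s(directory):.0f} MB/s")
```

On macOS, `fsync` may not flush the drive's own cache, so treat the numbers as relative rather than absolute.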

We actually do this on a cluster as well. All of the cluster blades have internal SSD storage, so we copy the data over, process it, and then copy it back to the pooled (slower) storage. I usually put this back-and-forth copying directly in the script that calls afni_proc.py.
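The copy in, process, copy back pattern can be sketched as below. This is a minimal Python illustration, assuming a hypothetical `process` callable that runs your afni_proc.py-generated script on the local copy and returns the directory holding its results; the subject directory names in any usage are also hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def process_on_local_scratch(
    src: Path,
    dest: Path,
    process: Callable[[Path], Path],
    scratch_root: Optional[Path] = None,
) -> Path:
    """Copy `src` to local scratch, run `process` there, and copy its results into `dest`.

    `process` gets the local input directory and returns the directory it wrote results to.
    """
    with tempfile.TemporaryDirectory(dir=scratch_root, prefix="stage_") as tmp:
        local_in = Path(tmp) / src.name
        shutil.copytree(src, local_in)  # 1. slow/pooled storage -> fast local disk
        local_out = process(local_in)  # 2. the I/O-heavy work happens locally
        final = dest / local_out.name
        shutil.copytree(local_out, final, dirs_exist_ok=True)  # 3. results -> pooled storage
    return final  # the local scratch copy is deleted when the with-block exits
```

On the Mac Studio, `scratch_root` could point at the internal drive while `src` and `dest` live on the RAID; on a cluster, `scratch_root` would be the node's local SSD.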

rickr · March 20, 2025, 1:54pm

Running analyses serially should be okay, but if there is competition for the RAID, including from running multiple analyses at once, then Pete's approach of processing locally and copying the results back afterwards is a good way to go. That might be a good habit in any case if the RAID is shared. It is what we do in our demo scripts.

Note that on Biowulf, for example, such copying is no longer necessary for speed (it used to be). However, staying in that habit may be kinder to other users and to overall system performance.

On a remote file system, you may also need to set AFNI_NOMMAP. Some configurations make that important, though it does not seem common. It is not needed on Biowulf.
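If AFNI_NOMMAP does turn out to matter, one way to set it only for jobs that read from the remote volume is to pass an adjusted environment to the process that runs the proc script. A minimal Python sketch; the `on_remote_fs` flag and the `proc.sub-01` script name are hypothetical, and exporting the variable in your shell or .afnirc works just as well.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def afni_env(on_remote_fs: bool, base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy `base` (default: the current environment), adding AFNI_NOMMAP=YES for remote file systems."""
    env = dict(os.environ if base is None else base)
    if on_remote_fs:
        env["AFNI_NOMMAP"] = "YES"  # tell AFNI not to memory-map dataset files
    return env


def run_proc_script(cmd: Sequence[str], on_remote_fs: bool) -> int:
    """Run a command (e.g. ["tcsh", "-xef", "proc.sub-01"]) and return its exit code."""
    return subprocess.run(list(cmd), env=afni_env(on_remote_fs)).returncode
```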

-rick

dglen · March 20, 2025, 2:05pm

One more tip to add to these other useful ones: all of this depends on many factors, including other processes (especially those accessing the disk), file sizes, RAID configuration type, physical disk I/O speeds, disk file systems, memory usage, AFNI's memory mapping, RAID connection type and speed, dedicated networks, file compression, and so on. For time-critical applications like real-time feedback, some people use pigz instead of gzip, since pigz compresses data faster using parallel threads. On some systems it may be faster to leave data uncompressed; on others, compressing it is faster. AFNI and NIfTI formats are also read and written differently, so even the choice of format makes a difference.
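To check on a given system whether compression helps or hurts, one could time the same write raw and gzip-compressed on each volume. A minimal Python sketch using the standard-library gzip module, which is single-threaded like gzip (pigz would parallelize the compression step). The synthetic payload is a placeholder, and real MRI data will compress differently; the sketch also doesn't fsync, so OS caching can flatter the numbers.

```python
import gzip
import os
import tempfile
import time


def time_writes(directory: str, payload: bytes) -> dict:
    """Write `payload` raw and gzip-compressed into `directory`.

    Returns {label: (seconds, bytes_on_disk)}.
    """
    openers = {
        "raw": lambda path: open(path, "wb"),
        "gzip-1": lambda path: gzip.open(path, "wb", compresslevel=1),  # fastest level
        "gzip-6": lambda path: gzip.open(path, "wb", compresslevel=6),  # gzip command-line default
    }
    results = {}
    for label, opener in openers.items():
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)  # reopen by path via the chosen opener
        try:
            start = time.perf_counter()
            with opener(path) as f:
                f.write(payload)
            results[label] = (time.perf_counter() - start, os.path.getsize(path))
        finally:
            os.remove(path)
    return results


if __name__ == "__main__":
    payload = bytes(range(256)) * 40_000  # ~10 MB synthetic, highly compressible placeholder
    for label, (secs, size) in time_writes(tempfile.gettempdir(), payload).items():
        print(f"{label}: {secs:.3f} s, {size / 1e6:.1f} MB on disk")
```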

storrisi · March 20, 2025, 6:13pm

Thank you guys! Suuuper helpful. This will be for pretty 'standard' fMRI analyses, with resolutions generally ranging from 0.8 mm to 1.2 mm isotropic (FOVs scaling accordingly), so mostly just serial sswarper2 + afni_proc.py workflows. There's no real-time feedback, the RAID isn't shared, and permissions aren't an issue. As for compression, I have pigz installed and "AFNI_AUTOZIP = YES" in my .afnirc. The RAID system has been ordered and it'll be fun to try out (fingers crossed I won't need Pete's process-locally-then-copy approach, but it's good to know that's an option). I'll report back with general impressions. Thanks again!
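A back-of-the-envelope estimate shows why space (and I/O) grows quickly at the finer end of that range: voxel count scales with the inverse cube of voxel size, so over the same coverage 0.8 mm has (1.2/0.8)^3 = 3.375 times as many voxels as 1.2 mm. This Python sketch assumes a hypothetical 192 x 192 x 120 mm coverage, 300 volumes, and 4-byte float voxels for a single EPI run; afni_proc.py keeps several intermediate copies, so a full results directory will be larger.

```python
def dataset_bytes(fov_mm, voxel_mm, n_vols, bytes_per_voxel=4):
    """Size of a 4D dataset covering fov_mm (x, y, z) at isotropic voxel_mm resolution."""
    nx, ny, nz = (round(extent / voxel_mm) for extent in fov_mm)
    return nx * ny * nz * n_vols * bytes_per_voxel


FOV_MM = (192, 192, 120)  # hypothetical coverage, held fixed across resolutions
N_VOLS = 300  # hypothetical run length

for voxel in (1.2, 0.8):
    size = dataset_bytes(FOV_MM, voxel, N_VOLS)
    print(f"{voxel} mm iso: {size / 1e9:.1f} GB per run")
# 1.2 mm iso: 3.1 GB per run
# 0.8 mm iso: 10.4 GB per run
```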

dglen · March 20, 2025, 6:19pm

I turn off AFNI_AUTOZIP (set it to NO), because with it on, AFNI programs decide for each output whether to compress it, which makes scripting and outputs less predictable. I do set AFNI_COMPRESSOR to GZIP or PIGZ, though.
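For reference, a sketch of those settings as .afnirc text. The Python below only renders the text and picks PIGZ when pigz is on the PATH, falling back to GZIP otherwise. It assumes AFNI's `***ENVIRONMENT` section format for ~/.afnirc; the same variables can instead be exported in a shell startup file.

```python
import shutil


def afnirc_environment(settings: dict) -> str:
    """Render environment settings as an ***ENVIRONMENT section for ~/.afnirc."""
    lines = ["***ENVIRONMENT"]
    lines += [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


compressor = "PIGZ" if shutil.which("pigz") else "GZIP"  # fall back if pigz isn't installed
print(afnirc_environment({"AFNI_AUTOZIP": "NO", "AFNI_COMPRESSOR": compressor}), end="")
```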

storrisi · March 20, 2025, 7:22pm

Good to know, I'll try that, @dglen.

discoraj · March 26, 2025, 7:22pm

With the R8 you can use SSDs.
But it looks like the R4 can only use HDDs.

But you can connect it directly to your Mac with Thunderbolt 3.
That will help speed things up.
If you use RAID 0 (striping), that will speed things up too, but you lose redundancy.
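To illustrate the RAID 0 trade-off: data is split into fixed-size chunks dealt round-robin across the disks, so a large sequential read can pull from all disks at once, but there is no mirror or parity, so losing any one disk loses a slice of every large file. A toy Python model of the layout; real controllers work at the block level, and the 2-byte stripe size is only for readability.

```python
from typing import List


def stripe(data: bytes, n_disks: int, stripe_size: int) -> List[List[bytes]]:
    """Split data into stripe_size chunks and deal them round-robin across n_disks."""
    disks: List[List[bytes]] = [[] for _ in range(n_disks)]
    for i in range(0, len(data), stripe_size):
        disks[(i // stripe_size) % n_disks].append(data[i:i + stripe_size])
    return disks


def unstripe(disks: List[List[bytes]]) -> bytes:
    """Reassemble: chunk k lives on disk k % n at position k // n.

    With no parity or mirror, a missing disk means missing chunks (here, an IndexError).
    """
    n = len(disks)
    total = sum(len(d) for d in disks)
    return b"".join(disks[k % n][k // n] for k in range(total))


disks = stripe(b"ABCDEFGHIJ", n_disks=4, stripe_size=2)
print(disks)  # [[b'AB', b'IJ'], [b'CD'], [b'EF'], [b'GH']]
print(unstripe(disks))  # b'ABCDEFGHIJ'
```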

Some other brands (e.g., Synology) can use M.2 drives as a cache to speed up reads and writes.

The Synology connects over Ethernet only, though, with no Thunderbolt.
But it supports Ethernet bonding, so you can use its two 1 Gb Ethernet ports at the same time.
If you have a 2.5 Gb router or switch, you can probably max out that connection speed-wise.
That is what I use.

The Mac Studio already has a 10 Gb Ethernet port.
However, Thunderbolt should be faster.
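Some rough arithmetic on best-case transfer times, using nominal line rates (8 bits per byte) and ignoring protocol overhead and the disks themselves, which may well be the real bottleneck. Bonding usually spreads separate connections across the ports, so a single file copy may still top out near 1 Gb/s. The 10 GB figure is a hypothetical amount of data.

```python
# Nominal line rates in gigabits per second; real-world throughput is lower.
LINKS_GBPS = {
    "1 GbE": 1.0,
    "2 x 1 GbE bonded (aggregate)": 2.0,
    "10 GbE": 10.0,
    "Thunderbolt 3": 40.0,
}


def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Best-case seconds to move size_gb gigabytes over a link_gbps link (8 bits per byte)."""
    return size_gb * 8 / link_gbps


SIZE_GB = 10  # hypothetical amount of data to move
for name, gbps in LINKS_GBPS.items():
    print(f"{name}: {transfer_seconds(SIZE_GB, gbps):.0f} s")
# 1 GbE: 80 s
# 2 x 1 GbE bonded (aggregate): 40 s
# 10 GbE: 8 s
# Thunderbolt 3: 2 s
```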

storrisi · May 29, 2025, 2:49am

Update: it took me forever to return to this, but we got the Pegasus32 R4. With it connected via Thunderbolt, I re-ran afni_proc.py on an already-analyzed subject, with some high-res and nonlinear steps that taxed the system a bit. There was no discernible difference in processing time compared with running on the internal SSD. Sweet!!