I have AFNI locally on my desktop computer and on our analysis servers. The analysis servers get their large storage space by mounting our storage server. This is a bit of a trade-off: it is very expensive to have a hugely powerful server that also has vast amounts of disk space, but on the other hand files sometimes have to travel through the network, and that can slow some processes down. I think I have discovered one of these cases:
When running 3dFWHMx locally on my Ubuntu machine it is “fast” (quite a demanding procedure, but the one CPU core it uses runs at 100%).
When running 3dFWHMx on our RedHat server it is also fast, using 100% of one CPU core (I copied the data to the server's own disk space).
When running 3dFWHMx on our RedHat server with the data in the mounted directory (as we always do, since the users' home folders are there), it is really, really slow. It usually uses only 2-3% of one CPU core, occasionally jumping up to 100% for a short period and then dropping back to 2%.
I guess this indicates a network/mounting-related problem. 3dClustSim, on the other hand, is much faster on the server, even in the mounted directory, thanks to the server's many cores. So, does 3dFWHMx work in a way that is extra sensitive to slow transfer of data files? And is there anything I can do to get around this? I guess I could create a local directory on the server where users can perform smoothness estimates, and wipe it every night.
3dClustSim does not actually read data files (at least, the way you are using it) – the program just creates simulated data in memory and processes it. So it can use the CPUs at nearly 100%. In the jargon of computing science, it is “compute bound”.
3dFWHMx, on the other hand, does pretty simple calculations after reading the dataset in (the -acf computations are a little more complicated). It is “I/O bound” – and in your case, this is showing up painfully.
I don’t know how to help you in your situation, short of telling you to copy the input file for 3dFWHMx to a local disk, use it there, then throw it away. But if you are running it via afni_proc.py, that isn’t much help.
The NIH compute cluster has a similar problem, with fast Linux nodes and a slow networked filesystem. The solution they offer is that each Linux node has an SSD (solid state drive) local to it, which can be used only by the jobs running on that node – and when a job is done, its space on the SSD is erased. The way this speeds things up is to (1) copy all inputs to the SSD, (2) process them there, and (3) copy the outputs back to permanent storage. I am doing something like this right now on a series of jobs, and they run about 30-50% faster than they did before I discovered this “trick”. Perhaps something like this is available on your server?
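In case it helps, a rough sketch of that copy-in / process / copy-out pattern is below. Everything in it – the mount points, the scratch location, the dataset names, and the 3dFWHMx options – is just a placeholder for illustration, not something your setup requires:

#!/bin/bash
# Sketch of the copy-in / process / copy-out idea described above.
# All paths and dataset names are made up for this example.

NETDIR=/mnt/storage/study/subj01        # slow networked storage (placeholder)
SCRATCH=/local_ssd/$USER/fwhm_$$        # fast local scratch space (placeholder)
mkdir -p "$SCRATCH"

# (1) copy the inputs to the local disk
cp "$NETDIR"/errts.subj01+tlrc.* "$SCRATCH"/

# (2) run the I/O-heavy step against the local copy
cd "$SCRATCH"
3dFWHMx -detrend -acf acf.subj01.1D -input errts.subj01+tlrc > fwhm.subj01.1D

# (3) copy the (small) text outputs back to permanent storage, then clean up
cp acf.subj01.1D fwhm.subj01.1D "$NETDIR"/
cd /
rm -rf "$SCRATCH"

That way the only traffic over the network is one copy of the input dataset and a couple of small text files coming back.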
Thanks for the input, Bob! It feels good to have an explanation!
We are, compared to the NIH, very small and do not run a huge cluster. We have two separate, quite capable Linux servers (I guess these would correspond to two nodes at your place). A relatively large Windows storage server is mounted on each of them, and that is where people do their analyses (since analysis creates a lot of data).
Maybe we can just buy a TB-sized SSD for each of these servers, give all users read/write privileges, and let them go there to run smaller jobs or “I/O-bound” jobs like 3dFWHMx. Then I can set up a cron job to wipe it at midnight every other day or so. This would probably work well for us. Or do you see a major flaw in this?
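For the record, the nightly wipe could probably be as simple as a crontab entry along these lines (the /scratch_ssd mount point is a made-up name, and the age cutoff is just an example):

# at midnight, remove files not modified in roughly the last two days,
# then sweep out any directories left empty a few minutes later
0 0 * * *  find /scratch_ssd -mindepth 1 -mtime +1 -type f -delete
5 0 * * *  find /scratch_ssd -mindepth 1 -type d -empty -delete

Deleting by age rather than wiping everything would also avoid pulling files out from under a job that happens to still be running at midnight.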
Thanks again - This was great!