I have recently been using 3dsvm to analyze my data, and I am wondering whether there is a way to stop the program from writing the model file in training mode.
Below is my situation.
My general aim is to find the subregion of an ROI that contains the voxels most informative for the decoding algorithm.
The basic method is:
1) run linear svm analysis on the ROI with 3dsvm;
2) select the subset of voxels with large absolute weight values;
3) re-run the linear svm analysis on the selected voxels (i.e., return to Step 1);
4) when the selected voxel number reaches a threshold, do the prediction analysis.
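The loop in Steps 1) to 4) can be sketched in Python as below. This is only an illustration of the selection logic, not something 3dsvm provides: `train_fn` is a hypothetical callback standing in for "run 3dsvm in training mode on these voxels and read back the weight file", and `keep_fraction` and `min_voxels` are assumed parameters for how many voxels to keep per round and when to stop.

```python
def select_top_voxels(weights, keep_fraction=0.5):
    """Return the sorted positions of the voxels with the largest |weight|."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(len(weights) * keep_fraction))
    # Rank positions by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return sorted(ranked[:n_keep])


def eliminate(train_fn, voxels, min_voxels, keep_fraction=0.5):
    """Repeat train -> select until at most min_voxels voxels remain.

    train_fn(voxels) is a hypothetical callback: it must return one weight
    per voxel, in the same order as the voxel list it receives.
    """
    if min_voxels < 1:
        raise ValueError("min_voxels must be at least 1")
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must be in (0, 1) so the set shrinks")
    voxels = list(voxels)
    while len(voxels) > min_voxels:
        weights = train_fn(voxels)                         # Step 1
        kept = select_top_voxels(weights, keep_fraction)   # Step 2
        voxels = [voxels[i] for i in kept]                 # Step 3
    return voxels                                          # ready for Step 4
```

With this structure only the weights need to survive each round, which is why the model file is dead weight inside the loop.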
3dsvm handles this process without trouble, but here is my concern. In Step 1) above, 3dsvm runs in training mode. This mode usually produces two outputs: the model file computed during training, which is meant to be used for later prediction, and the weight file, which holds each voxel's weight in the linear model. In my case, in the recursive loop of Steps 1) and 2), I only care about the weight file, not the model file.
However, 3dsvm treats the model file as a mandatory output in training mode. Each model file is roughly 1 GB. Since I need to run the analysis for many ROIs and subjects, I have to keep writing and deleting model files on my disk. Writing these files takes a considerable amount of time given the number of ROIs and subjects, so I would like to suppress the model output to speed up my analysis.
So here is my question: is there a way to stop 3dsvm from writing the model file in training mode? Or is there another program I could use for this purpose?
Any advice will be appreciated.
I added the -nomodelfile option for this. It allows you to omit the -model option for training, and no model file will be written to disk. I think this should be compiled overnight and will be available for download tomorrow ET.
Give it a try and let me know if you run into any problems.
Since I could not wait for a reply to this thread, I modified the source code myself.
Anyway, I will test your option alongside my own change and compare their output.
There is one more problem, which I think is more severe.
When I use the prediction mode of 3dsvm, the command does not reproduce its earlier prediction files (pname files). Below is a test example.
The test command is:
3dsvm -testvol run2.nii.gz -testlabels var2.1D -model model+orig -nodetrend -predictions pred
I ran the command 100 times sequentially, and then 100 times in parallel (20 instances running simultaneously, repeated for 5 rounds). This generated 200 prediction files.
When I compared the prediction files pairwise, 12996 of the 19900 possible pairs differed. Among the sequential runs, 1294 of 4950 pairs differed. Among the parallel runs, 4454 of 4950 pairs differed.
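For anyone wanting to repeat this kind of check, a minimal Python sketch of the pairwise comparison is given below. It assumes the prediction files are compared byte for byte; the post does not say how the comparison was actually done, and the file paths passed to `read_outputs` are up to the reader. The constants at the end only re-derive the pair totals quoted above, plus the cross-mode numbers that follow from them by subtraction.

```python
from itertools import combinations
from math import comb
from pathlib import Path


def count_differing_pairs(outputs):
    """Return (differing_pairs, total_pairs) for a list of file contents."""
    differing = sum(1 for a, b in combinations(outputs, 2) if a != b)
    return differing, comb(len(outputs), 2)


def read_outputs(paths):
    """Read each prediction file as raw bytes (paths are the caller's choice)."""
    return [Path(p).read_bytes() for p in paths]


# Pair totals quoted in the post.
total_all = comb(200, 2)        # all 200 files
total_per_mode = comb(100, 2)   # 100 sequential or 100 parallel files

# Pairs made of one sequential and one parallel file, by subtraction.
cross_total = total_all - 2 * total_per_mode
cross_differing = 12996 - 1294 - 4454
```

By this arithmetic, 7248 of the 10000 sequential-versus-parallel pairs also differed.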
I don’t know the details of the calculation in 3dsvm, but shouldn’t the command produce consistent output? Did I make a mistake somewhere?
Thanks for running this and for letting us know! As long as the model file and test options are the same, the distance to the hyperplane for each observation in the predictions file(s) should agree to within single-precision float accuracy. However, I’m not sure why the number of differing files is not roughly the same between sequential and parallel execution (chance?). I will run something similar and check.
We were writing too many significant digits into the prediction output files. I think this is the reason for the differences you saw. I updated 3dsvm and ran a similar test using motor data. All prediction files were identical, regardless of whether the runs were executed sequentially or in parallel.
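To illustrate the effect in general terms, here is a small Python sketch. It is not the 3dsvm code, and the two values and the format widths are made up for the example: two numbers that differ only far below single-precision accuracy print differently when written with 17 significant digits, but identically when written with 6.

```python
# Two hypothetical classifier outputs that differ by 1e-10,
# far below what single-precision floats can resolve near 0.12.
a = 0.123456789
b = a + 1e-10

# Written with 17 significant digits, the noise reaches the file.
long_a = format(a, ".17g")
long_b = format(b, ".17g")

# Written with 6 significant digits, both values print the same text.
short_a = format(a, ".6g")
short_b = format(b, ".6g")

files_differ_with_long_format = long_a != long_b
files_differ_with_short_format = short_a != short_b
```

A byte-for-byte file comparison only sees the printed text, so the number of digits written decides whether such noise shows up as a "different" file.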
If you still see differences using the updated version of 3dsvm, please let me know.
I have another question about 3dsvm.
There is an option in prediction/test mode, "-nodetrend", which turns off the linear de-trending of the pname file. My question is: in what situations is the de-trending necessary?
Thanks.
I downloaded the updated source code of 3dsvm and compiled it.
Then I ran the command on my test data 100 times sequentially and checked whether the output pname files were identical.
Since my test data had 3 conditions to classify, the command generated pname files with the overall prediction values across all three conditions, as well as pname files for the pair-wise predictions.
I found that all the pair-wise prediction pname files were identical, but the overall_DAG.1D files still varied. There may still be some bugs in the DAG part. I did not check the vote method for multi-class prediction.
Another issue still concerns the model file output. Previously I asked for your help in adding a no-model-output option to training mode. Looking at the source code, I found that 3dsvm has a TRAIN AND TEST mode, which lets users supply the training and testing datasets together in a single command line. In this mode 3dsvm still writes the model files. However, I only care about the resulting prediction values, not about the model itself. In principle, the model produced by training could be passed to testing through RAM rather than through the hard disk. So may I ask for your help once more in modifying the program so that no model file is written in TRAIN AND TEST mode? I think it would speed up the data processing considerably.
Thank you very much for your help!
3dsvm was originally developed for classification of temporal and minimally pre-processed FMRI data. Thus, temporal de-trending of the classifier output is performed by default to correct for the temporal drift that we observed. Please take a look at LaConte, 2007 (Fig. 4) for more details. However, de-trending the classifier output is not the only way, and might not be the best way, to correct for drift of the FMRI signal/classifier output, so you need to decide whether it is necessary.
If you are NOT using temporal FMRI data (e.g. GLM beta maps, functional connectivity maps, structural data, etc.), please use the -nodetrend option to disable temporal de-trending.
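As a rough illustration of what linear de-trending of a classifier output time series means, here is a Python sketch using an ordinary least-squares line fit. This is an assumption about the general technique, not a description of the 3dsvm implementation; in particular, this version also removes the mean, which 3dsvm may or may not do.

```python
def linear_detrend(values):
    """Subtract the least-squares straight line from a 1-D series.

    Sketch only: the fitted line includes the intercept, so the
    residuals returned here have zero mean.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two time points to fit a line")
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    # Slope of the best-fit line over time indices 0..n-1.
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(values)]
```

For inputs with no time axis, such as GLM beta maps, the index `t` has no meaning, which is the reason to pass -nodetrend in that case.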
Thanks for running your test again. I’m glad to hear that the classifier output no longer differs when training and testing are performed multiple times with the same command-line options. Most people probably would not have noticed, since these differences are smaller than single-precision accuracy. I will take a look at the multi-class predictions (DAG, vote) and check why the output is not identical, but I suspect it’s for the same reason. I would not necessarily call this a bug, but rather reaching the precision limit of 3dsvm. That being said, the output should be identical, so thank you for letting us know.
I agree: if desired (-nomodelfile), no model file should be written to disk when performing training and testing simultaneously. This was always the plan, but we rarely used this mode and never finished coding it up. This should not take too long. I’ll update 3dsvm soon.
Jonathan