3dSVM Weighting Function

AnthonyA · September 19, 2022, 5:52pm

I am trying to understand exactly how 3dsvm produces its activation maps. The documentation for 3dsvm mentions that it outputs the sum of the weighted linear support vectors. However, it does not specify exactly how it weights the support vectors themselves. In the LaConte paper referenced in the documentation, the weighting was based on the distance from the margin, but the exact weighting function was not specified either. Can someone shed some light on the weighting function that the program uses? Thank you so much!

slaconte · September 27, 2022, 3:17am

Hi Anthony,

Sorry for the delay - thanks Gang for the heads-up and Brian for reinstating my message board account!

Quick answer:

The SVM decision boundary is defined by a weight vector W and a scalar bias term w_0 (svm-light, and thus 3dsvm, uses b instead of w_0). For the linear kernel case, W has one element per input voxel, so it is the same size as a single input volume. So if you are using whole brain data, you can overlay W as a whole-brain map. W is the SVM solution for whatever labeled training data you gave 3dsvm.

A bit more:

Let’s call your training voxels X_t (Real numbers) and your class labels y_t {-1, 1}. (Note under the hood 3dsvm takes 0, 1 labels and maps them to -1, 1). The weight vector, W = SUM_t(alpha_t * y_t * X_t). (Note * is just simple multiplication). I’m summing over t, but you could be doing other things, like summing over subjects.
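A minimal sketch of that sum in plain Python, in case it helps to see it spelled out. The data, the alpha values, and the helper names (to_signed, weight_vector) are all made up for illustration; this is not 3dsvm's code, just the formula above applied to a toy example with 4 time points and 3 voxels.

```python
# Toy data: 4 time points (volumes), 3 voxels each. All values are made up.
volumes = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [3.0, 1.0, 0.0],
    [2.0, 2.0, 1.0],
]
labels01 = [0, 0, 1, 1]          # class labels in 0/1 form
alphas = [0.5, 0.0, 0.25, 0.25]  # hypothetical non-negative multipliers


def to_signed(label):
    """Map a 0/1 class label to -1/+1."""
    return 2 * label - 1


def weight_vector(alphas, labels01, volumes):
    """Return W = SUM_t(alpha_t * y_t * X_t), one weight per voxel."""
    n_voxels = len(volumes[0])
    w = [0.0] * n_voxels
    for alpha_t, label, x_t in zip(alphas, labels01, volumes):
        y_t = to_signed(label)
        for v in range(n_voxels):
            w[v] += alpha_t * y_t * x_t[v]
    return w


W = weight_vector(alphas, labels01, volumes)
print(W)
```

The result has one entry per voxel, which is why W can be overlaid as a map.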

NOTE: If you specify -alpha alpha_file_name in 3dsvm, you will get a file that has a value for every t. These values are alpha_t * y_t, where each alpha_t itself is non-negative. Because alpha_t is multiplied by y_t, for the negative class (smaller class label in 3dsvm), you will get negative values in the alpha file.

For fun and some intuition, what if all of the alphas were equal and we broke up the summation into class 1 and class -1?
W = SUM_+(alpha_+ * X_+) - SUM_-(alpha_- * X_-) (here instead of _t, _+ is supposed to be all the t’s labeled +1 and _- is supposed to be all of the t’s labeled -1).

This is really similar to just taking the average of all of the class 1 volumes and subtracting the average of all of the class -1 volumes from it (up to a scale factor). You can try this in AFNI (e.g. with 3dcalc), and it will probably give you an OK looking map!
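Here is a small sketch of the equal-alpha case, assuming the two classes have the same number of volumes. The volumes, the alpha value, and the helper name mean_volume are invented for the example. With a single shared alpha, W comes out as the difference of class means times alpha times the number of volumes per class.

```python
# Two voxels, two volumes per class. All numbers are made up.
class_pos = [[3.0, 1.0], [5.0, 3.0]]  # volumes labeled +1
class_neg = [[1.0, 1.0], [1.0, 3.0]]  # volumes labeled -1
alpha = 0.25                          # one shared, hypothetical alpha


def mean_volume(volumes):
    """Voxel-wise average across a list of volumes."""
    n = len(volumes)
    return [sum(voxel_values) / n for voxel_values in zip(*volumes)]


# W = SUM_+(alpha * X_+) - SUM_-(alpha * X_-), with the same alpha everywhere.
w_equal = [
    alpha * (sum(pos_values) - sum(neg_values))
    for pos_values, neg_values in zip(zip(*class_pos), zip(*class_neg))
]

# Plain difference of class averages.
mean_diff = [
    p - n for p, n in zip(mean_volume(class_pos), mean_volume(class_neg))
]

scale = alpha * len(class_pos)  # alpha times volumes per class
print(w_equal, mean_diff, scale)
```

In this toy case only voxel 1 differs between the classes, so both maps are zero at voxel 2.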

Of course the SVM alphas are generally not the same for every X_t, so in a sense SVM is giving us a smart, weighted average. Larger magnitude alphas make that X_t “count” more. If alpha equals 0, then that X_t doesn’t “count” at all! Any X_t with an alpha > 0 (or abs(y*alpha) > 0) is a support vector.
You could try multiplying each volume by the value in the alpha file outside of 3dsvm to verify.
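A rough sketch of that check, with the signed values (alpha_t * y_t) typed in by hand rather than read from a real alpha file, since I am not assuming anything about the file layout here. The data and the helper name weighted_sum are hypothetical. It also shows that volumes with a zero value drop out of the sum entirely.

```python
# Signed values alpha_t * y_t, one per time point (made-up numbers).
signed_alphas = [-0.5, 0.0, 0.5, 0.0]

# Four toy volumes with two voxels each.
volumes = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [4.0, 1.0]]


def weighted_sum(weights, volumes):
    """Return SUM_t(weights[t] * volumes[t]) as a per-voxel list."""
    n_voxels = len(volumes[0])
    return [
        sum(w_t * x_t[v] for w_t, x_t in zip(weights, volumes))
        for v in range(n_voxels)
    ]


# Support vectors are the time points with a non-zero value.
support = [t for t, a in enumerate(signed_alphas) if a != 0.0]

W_all = weighted_sum(signed_alphas, volumes)
W_sv = weighted_sum(
    [signed_alphas[t] for t in support],
    [volumes[t] for t in support],
)
print(support, W_all, W_sv)
```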

Even more detail (sure to be either too much or not enough – could it be both simultaneously?):

The alphas are Lagrange multipliers for solving the SVM margin constraints. The SVM approach is to minimize the norm of W subject to y_t*( dot(X_t,W) + b ) ge 1. Here dot(u,v) is the dot product of vectors u and v and ge is “greater than or equal to.” Going back to your question, the exact weighting function comes from this. The alphas*y are the weights. Any X_t that has a non-zero alpha_t is a support vector.
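To make the constraint concrete, here is a toy check in Python. The four 2-voxel volumes, W, and b are chosen by hand so that the hard-margin constraint holds, using the sign convention written above (dot(X_t,W) + b); the names are hypothetical and nothing here is taken from 3dsvm output. Points that sit exactly on the margin (value equal to 1) are the support vectors.

```python
# Toy training set: two voxels per volume, labels already in -1/+1 form.
volumes = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [4.0, 1.0]]
labels = [-1, -1, 1, 1]

# Hand-picked solution for this toy set (an assumption of the example).
W = [1.0, 0.0]
b = -2.0


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Left-hand side of the constraint y_t * (dot(X_t, W) + b) ge 1.
margins = [y_t * (dot(x_t, W) + b) for x_t, y_t in zip(volumes, labels)]

all_satisfied = all(m >= 1.0 for m in margins)
on_margin = [t for t, m in enumerate(margins) if m == 1.0]
print(margins, all_satisfied, on_margin)
```

For this data, W equals 0.5 * X_2 - 0.5 * X_0, so the two on-margin points carry alpha = 0.5 and the other two carry alpha = 0.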

One more try:

There are lots of good tutorials that set up the SVM quadratic programming problem, which ultimately lead to the Lagrange multipliers. I’ve never seen anyone call this a weighted sum of the training data, but it is and I hope that gives you some intuition. The other shortcoming of tutorials is that it is convenient to work in 2D and let the reader generalize their thoughts to higher dimensions. In those 2D plots, think of each point as a single fMRI volume. Thus you would have tens of thousands of dimensions instead of 2, but each volume would still just be a single point in that N-dimensional space. Now also think about some extremely simple cases in 2D (with class 0 as “#” and class 1 as “^”, plotted as points in 2D space):

Voxel 2        hyperplane
   |               |
   |    #          |    ^  ^  ^
   |  #   #        |
   |    #          |    ^  ^
   |_______________|______________ Voxel 1
   |               |

For example, the above shows 2 voxels and 9 TRs.

And

Voxel 2
   |     #    #     #   #
---|------------------------------------------ hyperplane
   |    ^^^     ^    ^
   |_______________________ Voxel 1
   |

In the first case, what I am attempting to illustrate is a hyperplane that is perpendicular to the voxel 1 axis and parallel to the voxel 2 axis. Voxel 1 is doing all of the work! (Remember you are mapping back to voxels and trying to understand how that relates to the decision boundary.) In the second case, the hyperplane is horizontal and voxel 2 should be more prominent in a weight vector brain map.
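A small sketch of those two cases, reduced to one support vector per class so the arithmetic is easy to follow. The coordinates, the alpha of 0.5, and the helper names are invented to mimic the pictures (they are not read off the diagrams), and class 1 ("^") is treated as the +1 class.

```python
def weight_from_pair(alpha, x_pos, x_neg):
    """W = alpha * X_pos - alpha * X_neg for one support vector per class."""
    return [alpha * p - alpha * n for p, n in zip(x_pos, x_neg)]


def prominence(w):
    """Each voxel's share of the total absolute weight."""
    total = sum(abs(w_v) for w_v in w)
    return [abs(w_v) / total for w_v in w]


# Case 1: classes differ only along voxel 1 (vertical hyperplane).
W_case1 = weight_from_pair(0.5, x_pos=[3.0, 2.0], x_neg=[1.0, 2.0])

# Case 2: classes differ only along voxel 2 (horizontal hyperplane),
# with the "^" class below the "#" class.
W_case2 = weight_from_pair(0.5, x_pos=[2.0, 1.0], x_neg=[2.0, 3.0])

print(W_case1, prominence(W_case1))
print(W_case2, prominence(W_case2))
```

The sign in case 2 is negative because the +1 class has the lower voxel 2 values; in a brain map that would show up as a negative weight at voxel 2 and nothing at voxel 1.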

Hope this helps,
Steve LaConte