# Cutting a time-series after vs. before preprocessing

Dear all,

I would like to know if cutting a time-series with 3dTcat (e.g., removing the first 300 TRs in a 3000 TR time-series) yields a minimally different time-series if the cutting is done before vs. after using AFNI proc concerning the power spectrum.

Let me make it more clear with two cases.

1. I cut the time-series before preprocessing with 3dTcat (or I simply use the “-tcat_remove_first_trs” option in AFNI proc) to remove the first x number of TRs.
2. I preprocess the time-series with AFNI proc, and cut it afterwards with 3dTcat.

Would the resulting power spectrum, e.g., via using 3dPeriodogram, be identical?

I read that a fundamental or basic wave can serve as a reference for Fourier analysis. That wave is the longest wave available and its usually the length of the record (in our case, the length of the fMRI run – all sample points). If AFNI proc handles preprocessing like this, I could imagine that the power spectra between case 1 and 2 could differ, even if the difference is only minimal.

Let me know what you think,
Philipp

Philipp-

There are a couple different aspects here (well, probably more than just a couple…).

Fourier analysis is a pair of operations, where you transform bases from time to frequency and vice versa.

• the analysis equation describes how you take an entire time series, and integrate/sum over it to get the estimate of one frequency coefficient magnitude and phase. “Analysis” comes from the Greek for cutting—we often picture this as cutting up a time series into its component frequency parts.
• the synthesis equation describes taking one entire frequency spectrum (with both frequency and phase information per frequency) and estimating one time point value. “Synthesis” comes from the Greek for combining or putting together—we often picture this as summing up the components to better approximate and reconstruct a time series.
(Sidenote that the Fourier relations are essentially symmetric, so we could have started from a frequency point of view for naming, but we don’t.)
But the above means that, to estimate the power spectrum of a single frequency, you need to integrate or sum over an entire time series. So, if you chop away time points, by necessity you are changing your estimate of a frequency.

That being said, if your entire time series has essentially constant frequency/statistical content throughout, for example being pure white noise or repetitive over very small time scales throughout, then chopping away initial time points might not actually affect your final coefficient magnitude values very much over a given interval of the frequency spectrum on average.

But with that being said, note that the exact physical frequencies you estimate values for depend on the number of time points you have (and the sampling rate, though let’s assume that is constant). The maximum frequency you can estimate with Fourier relations (for the unique band that includes the baseline) is determined by the sampling rate (so, here, constant), and it is called the Nyquist frequency, Fnyquist. But the number of frequencies between 0 and Fnyquist that you can estimate is determined by your number of time points. So, if you change the length of your time series slightly, you are estimating slightly different frequency components. (These are called harmonics, because they are factors of a fundamental frequency.)

And also, power spectrum estimates of our noisy datasets should also be considered statistical estimates—it’s not just the estimate that should matter, but the uncertainty associated with it, as well. In many applications, people don’t just estimate power of a single frequency, with might be a bit unstable (i.e., have high uncertainty), but instead of some range of frequencies, to have smaller uncertainty/more stability. From that point of view, if you are averaging over frequency windows/bands, and your time series has constant statistical properties throughout, then chopping away time points might not matter so much.

This has been a long answer, but it is not a question that has a simple yes/no. Constancy of frequency properties depends on what exactly you are measuring, choosing a stable way to measure it (i.e., using a frequency window, not single frequency), and the details of the time series themselves.

–pt

Paul,

thank you for the elaborate answer. What you explained about the properties of a time-series and how these properties can affect the power distribution in the frequency-domain when the time-series is cut at the beginning or end is clear to me.

For example, you wrote that “From that point of view, if you are averaging over frequency windows/bands, and your time series has constant statistical properties throughout, then chopping away time points might not matter so much.”

I understand this. My question was related to another aspect that is rather related to the internal computations that AFNI proc does, and not related to theoretical considerations, I think.

I am not wondering if the frequency-domain of a time-series can be affected when we compare the Fourier results between a BOLD time-series that was (1) not cut vs. (2) cut. Of course, the resulting frequency-domain (the distribution of power along the spectrum) can change (from minimal to extreme changes), depending on the properties of the time-series and how the cut out part deviates from the rest of the time-series.

Instead, I was wondering if cutting a time-series after preprocessing (after AFNI proc) vs. cutting the time-series before preprocessing (before AFNI proc) can result in different power spectra of the very same time-series. This is to say that the length of the time-series in case 1 is the same as in case 2. Case 1 is a cut time-series, and case 2 is a cut time-series. But in one case the cutting was done before preprocessing, and in the second case it was done after preprocessing.

And now I wonder if THAT can make a difference, precisely because preprocessing in AFNI proc maybe chooses a different fundamental wave or frequency (maybe somehow related to bandpassing or any of the vast amount of other computations), because the run was longer if not cut before preprocessing. The harmonics between case 1 and 2 maybe differ, since the fundamental frequency would not be the same when providing a time-series that is cut vs. non cut into AFNI proc.

Does that make sense? Otherwise, maybe I don’t understand if the answer to that question is already provided in your post. Please let me know if I didn’t understand you.

Philipp

Hi, Philipp-

Like the Fourier Transform, regression modeling uses an entire time series. (In fact, the FT is a special case of linear regression modeling—just using specific regressors—which is why afni_proc.py does bandpassing as a regression, to do mathematically correct processing when wanting to both bandpass and regress a model.)

Your modified question still leads to an “it depends” answer. Baseline modeling takes place during processing, and this will generally be different between your two cases (chop then process, vs process then chop). Now, in special cases of certain time series properties, the baseline model might be essentially the same between the two cases—such as if there aren’t large fluctuations of certain orders. But again, that depends on the specific time series properties and the relative size of chopping.

Re. fundamental frequencies: I think the more relevant point is the Nyquist, which depends on TR. Having more or fewer time points (for a constant Nyquist) means that your frequency spectrum is mroe or less fine, respectively, sure, but if averaging over the same band of frequencies, then the “smearing” should reduce that a bit. MRI time series are so noisy, I don’t think one would want to focus in on one, specific frequency very often unless one is looking for some mechanical effect, say. Even breathing rate won’t be perfectly constant, so one might estimate breathing effects over a small range of frequencies (though that frequency is typically above Nyquist, and only then aliased in amongst other ones).

Do you have a specific application or case for these considerations?

–pt

Hi Paul,

I am not interested in one specific frequency/oscillation; I am not interested in one particular periodicity of the time-series that would then “pop out” to be easily detected in the frequency-domain. Instead, I am interested in the whole frequency spectrum, that is, how the power is distributed, say from 0.01 Hz to the Nyquist frequency.

Now let’s get specific (maybe I should have been specific right from the beginning, but I thought it was easier to explain the topic more generally):

I have very long sleep runs up to 3000 time points with a TR of 2.16. The sleep onset heavily diverges between the single subjects. Since the sleep recordings are very long, preprocessing took days. I later realized that I should have cut out the first ~50-400 TRs where subjects were awake, because I am only interested in how the power spectrum looks in sleep. Since the power shifts to slower frequencies in sleep (compared to awake), the power spectra awake vs. sleep look different.

Because I wanted to proceed methodologically correct, I didn’t want to cut the initial “awake” TRs of the preprocessed .errts file out now, without further asking you guys if that could lead to a power spectrum that would look different compared to a time-series that was already cut before running AFNI proc on it.
Of course, the power spectrum is not computed by AFNI proc, but since there are so many computations going on in AFNI proc, I was not so sure that one of thos (or the combination and interactive effects of many of those) can subsequently affect the results of 3dPeriodogram.

The bandpassing was 0.01 - 1 Hz. The polort option was set to 2. After running 3dPeriodogram, I cut the power-spectra so that only the frequency range 0.01 - 0.225 remains.

But I already get the feeling it is more safe to cut the data with 3dTcat before preprocessing, and running AFNI proc again on those. Looking forward to your answer/ideas.

Philipp

Hi, Philipp-

With this kind of resting state-like processing, the regression model is basically full of regressors of no interest. The “output” signal for further use is then the leftover residuals. From your applied case, it seems like the question is: is the quality of fitting in the first 400 time points different than that of the remaining ones, such that the residuals would have very different properties? While sleep and rest do have different frequency content and time series characteristics, I would suspect that the quality of time series fit wouldn’t be appreciably different (as long as there is no big change in motion profiles, say). I suspect that it should be reasonable to take your full, processed output/residual/errts time series and now, afterwards, chop out the first 400 time points and estimate a power spectrum without too much difference from having chopped ahead of time and processed the “sleep” segment separately. Your errts for time points with indices 400…3000 will likely be quite similar whether you process the full 3000 time points and chop, or chop and then process—with the important caveat that there is nothing “weird” happening or a major difference between the first 400 and later ones (e.g., lots of motion differences, etc.).

I suspect your choice of polynomial regressor and bandpassing was due to this AP note:

``````
-regress_polort DEGREE  : specify the polynomial degree of baseline

e.g. -regress_polort 2
default: 1 + floor(run_length / 150.0)

3dDeconvolve models the baseline for each run separately, using
Legendre polynomials (by default).  This option specifies the
degree of polynomial.  Note that this will create DEGREE * NRUNS
regressors.

The default is computed from the length of a run, in seconds, as
shown above.  For example, if each run were 320 seconds, then the
default polort would be 3 (cubic).

* It is also possible to use a high-pass filter to model baseline
drift (using sinusoids).  Since sinusoids do not model quadratic
drift well, one could consider using both, as in:

-regress_polort 2         \
-regress_bandpass 0.01 1

Here, the sinusoids allow every frequency from 0.01 on up to pass
(assuming the Nyquist frequency is <= 1), modeling the lower
frequencies as regressors of no interest, along with 3 terms for
polort 2.

``````

so that sounds good, as well. That should mean that a pretty reasonable baseline should be fit for the full time series, hopefully.

You could re-process one subject as a “quick” check/verification, I suppose, but I don’t see that as necessary, providing other time series checks.

–pt

Paul, you summed the lines over lines of my question up into one sentence: “is the quality of fitting in the first 400 time points different than that of the remaining ones, such that the residuals would have very different properties?”.

Thanks for this one.

This is basically what I was thinking about. Normally, I don’t use bandpassing and leave the polort option to the automatic standard calculation. I only used

``````
-regress_polort 2         \
-regress_bandpass 0.01 1

``````

on this particular dataset because my laptop never seemed to finish preprocessing the first subject in the pipeline when AFNI set the polort option to 34 by default. Rick then suggestet to use the bandpass and polort settings above.

Just as a side note here: with the default polort option and no bandpassing, the file folder size of the AFNI proc output for one subject was around 30 GB (at the point where it was detrending the data - baseline funcs …), and now, using bandpassing 0.01-1 with polort 2, the final folder size per subject reduced to around 12 GB.

As for your answer: thanks, this helps me out. I will also make a comparison for one or two subjects, just to be sure that the results of cutting before vs. after AFNI proc look the same.

Philipp

The size of the output directory is smaller now because it has finished processing and removed some of the temporary datasets. The final size would probably be the same using polort only.

• rick