Spikes in timeseries from clustered acquisition

Hello AFNI gurus!
tl;dr: What is the best way to remove spikes in fMRI time series that occur as a result of using a clustered acquisition protocol?

We are running an auditory study using a clustered acquisition protocol. Each trial consists of a 10s scanner OFF period, followed by a 16s scanner ON period. The auditory stimulus that we will be modeling is presented 1500ms into the scanner OFF period, and lasts 7s.
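To make the timing concrete, here is a minimal sketch of one trial's layout. It assumes a TR of 2 s, which I have not stated explicitly above but which follows from "2 TRs (4s)" below. The names `trial_timing`, `first_vol_time` and `first_vol_index` are only illustrative.

```python
# Assumed TR of 2 s (inferred from "2 TRs (4s)"); adjust if different.
TR = 2.0
OFF_DUR = 10.0    # scanner OFF period per trial, in seconds
ON_DUR = 16.0     # scanner ON period per trial, in seconds
STIM_DELAY = 1.5  # stimulus onset relative to the start of the OFF period
STIM_DUR = 7.0    # stimulus duration, in seconds


def trial_timing(n_trials):
    """Return volumes per trial and per-trial timing (seconds, volume index)."""
    vols_per_trial = int(round(ON_DUR / TR))
    trial_len = OFF_DUR + ON_DUR
    rows = []
    for k in range(n_trials):
        t0 = k * trial_len  # real (wall-clock) start of trial k
        rows.append({
            "stim_onset": t0 + STIM_DELAY,
            "stim_offset": t0 + STIM_DELAY + STIM_DUR,
            # real time of the first volume acquired in this trial
            "first_vol_time": t0 + OFF_DUR,
            # index of that volume in the concatenated dataset
            "first_vol_index": k * vols_per_trial,
        })
    return vols_per_trial, rows


vols_per_trial, timing = trial_timing(3)
```

Under these assumptions each trial contributes 8 volumes, and the stimulus ends 1.5 s before the first volume of the ON period. Note that volume indices in the saved dataset are contiguous while real time has a 10 s gap between ON periods.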

A large spike occurs at the start of each scanner ON period and lasts up to 2 TRs (4s) until the signal reaches steady state. This spike is what I am trying to remove.

3dDespike doesn’t seem to help much (DespikedTS.png attached). So now I’m looking for a way to regress out the spike from the raw time series. I have a couple of silent trials, in which no stimulus is presented during the scanner OFF period. The spike in these trials is (presumably) “pure” and not mixed with any task-related activity. My plan is to average the shape of this pure spike across the silent trials and use that average as a regressor.

Consequently, I have a couple of questions:

  1. Is it reasonable to treat this averaged pure spike as an “HRF” to use for convolution at the start of each acquisition period? If so, is there a way to do this (apart from using the -WAV option in 3dDeconvolve, and making the HRF generated by WAV look like the spike)? Using 3dTfitter, perhaps?
  2. The spike shape differs considerably across voxels. If I regress the spike out, is it possible to use a voxel-specific estimate of the spike as the regressor, rather than a single shape shared by all voxels?
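To illustrate what I mean in question 2, here is a rough NumPy sketch of a voxel-by-voxel version. It is not an AFNI command and makes several assumptions: the data are already loaded as a `(n_voxels, n_volumes)` array with the ON periods concatenated, the spike is confined to the first `spike_len` volumes of each ON period, and the remaining volumes of that period serve as the steady-state baseline. The function name `remove_spike_voxelwise` and its arguments are hypothetical.

```python
import numpy as np


def remove_spike_voxelwise(data, vols_per_trial, silent_trials, spike_len=2):
    """Fit and subtract a per-voxel spike template at the start of each ON period.

    data           : (n_voxels, n_volumes) array, ON periods concatenated
    vols_per_trial : number of volumes in one ON period
    silent_trials  : indices of trials with no stimulus (assumed "pure" spike)
    spike_len      : number of volumes the spike is assumed to last
    """
    data = np.asarray(data, dtype=float)
    n_vox, n_vol = data.shape
    if n_vol % vols_per_trial != 0:
        raise ValueError("n_volumes must be a multiple of vols_per_trial")
    n_trials = n_vol // vols_per_trial
    blocks = data.reshape(n_vox, n_trials, vols_per_trial).copy()

    # Steady-state level per voxel and trial: mean of the post-spike volumes.
    baseline = blocks[:, :, spike_len:].mean(axis=2, keepdims=True)
    # Deviation of the first spike_len volumes from that level.
    dev = blocks[:, :, :spike_len] - baseline

    # Per-voxel template: average deviation across the silent trials.
    template = dev[:, silent_trials, :].mean(axis=1)  # (n_vox, spike_len)

    # Least-squares amplitude of the template for every voxel and trial.
    denom = (template ** 2).sum(axis=1)
    denom = np.where(denom > 0, denom, 1.0)  # avoid division by zero
    beta = np.einsum("vtk,vk->vt", dev, template) / denom[:, None]

    # Subtract the fitted spike from the first spike_len volumes only.
    blocks[:, :, :spike_len] -= beta[:, :, None] * template[:, None, :]
    return blocks.reshape(n_vox, n_vol), template
```

One concern with this sketch is that in stimulus trials the first volumes also contain task-related signal, so a freely fitted amplitude could absorb some of the effect of interest. I would appreciate advice on whether fitting the spike jointly with the task regressors is the safer route.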

Thank you,
Mrinmayi