How to Specify Individual-Level (Random) Effects in Hierarchical Modeling?

How to Specify Individual-Level (Random) Effects in Hierarchical Modeling?

Gang Chen

Preface

Hierarchical (also known as mixed-effects or multilevel) modeling is a powerful analytical tool designed to handle complex data structures. However, its power is often matched by the challenges it poses, both at conceptual and specification levels. The motivation behind this blog post stems from the common practice of specifying hierarchical models with a simple varying (random) intercept at the individual level. However, this approach can pose challenges, especially when multiple within-individual variables are taken into account. In this post, we explore several population-level scenarios in neuroimaging data analysis that can be valuable templates. We plan to expand this list by incorporating additional scenarios as they demonstrate typified patterns.

We wish to underscore the following key points:

  • Within-individual variables. The term "within-individual variables" (or "repeated-measures") is a conventional notion used in the context of ANOVA. It denotes situations where the measurement unit, such as an experimental participant, experiences all levels of a categorical variable (factor) or multiple instances of a quantitative variable (e.g., time) in longitudinal studies.
  • Counterbalance. The models outlined in this discussion for scenarios involving one or more within-individual factors are not the most intricate ones that could potentially be employed. In other words, there exists the possibility of utilizing more sophisticated models to capture nuanced variance-covariance patterns. However, we believe that the parsimonious models introduced here strike a delicate equilibrium between effectively accounting for data variability and managing computational complexities, rendering them well-suited for many practical scenarios.

  • Between-individual variables. The treatment of between-individual variables requires thorough consideration. The discourse here primarily revolves around within-individual variables, particularly categorical variables (factors). Variables such as sex, patient/control status, and age, which encompass differences between individuals, can also be integrated into a model. However, the process of variable selection is intricate and will be addressed separately at another juncture.

  • Implementation. The specifications provided here are especially relevant for population-level analyses conducted using the AFNI program 3dLMEr. This program is renowned for its flexibility and is preferred over its predecessor, 3dLME. Additionally, when compared to the ANCOVA program 3dMVM, 3dLMEr demonstrates greater adaptability.

In the scenarios discussed below, we assume:

  • y is the response variable (e.g., BOLD response) in a hierarchical data structure;
  • Subj serves as the unit of measurement (e.g., experiment participant);
  • A, B, and so forth, signify within-individual factors (e.g., emotion, congruency, and more). These categorical variables are commonly utilized in designed experiments to investigate their causal effects on neural response.

1) One within-individual factor

For situations with just a single within-individual (or repeated-measures) factor A, the process is relatively straightforward. Assuming indices a and i represent the factor levels and individuals respectively, we typically define the following hierarchical model for data y_{ai}:

\begin{aligned} y_{ai}&\sim N(\mu_{ai}, ~\sigma^2)\\ \mu_{ai}&=m_a+\delta_i\\ \delta_i &\sim N(0,~\tau^2) \end{aligned}

Here, m_a denotes the population-level effect associated with the a-th factor level, \delta_i signifies the effect linked with the i-th individual (commonly known as a random effect), and \sigma^2 and \tau^2 represent the population- and individual-level variances respectively. This formulation is commonly referred to as a linear mixed-effects model with random (or varying) intercepts.

Mapping this model into the program 3dLMEr is straightforward:

-model  'A+(1|Subj)'

In this case, A corresponds to m_a, while (1|Subj) corresponds to \delta_i.

(1.A) incorporation of within-individual quantitative variables

The above model can be extended to include individual-level slopes. For example, when rating score r_{ai} is available for the i-th individual at the a-th level of the factor, we may modify the model to

\begin{aligned} y_{ai}&\sim N(\mu_{ai}, ~\sigma^2)\\ \mu_{ai}&=m_a+s_a r_{ai}+\delta_i+\theta_i r_{ai}\\ \begin{bmatrix} \delta_i\\ \theta_i \end{bmatrix} &\sim N(\begin{bmatrix} 0\\ 0 \end{bmatrix}, ~ \begin{bmatrix} \tau_1^2 & \rho \tau_1\tau_2 \\ \rho \tau_1\tau_2 & \tau_2^2 \end{bmatrix}) \end{aligned}

where s_a and \theta_i are the slopes at the population- and individual-levels, respectively. This formulation is commonly referred to as a linear mixed-effects model with both random intercepts and random slopes.

The specification in 3dLMEr is now updated to

-model  'A*rating+(1+rating|Subj)'

To improve the interpretability of differences among the levels of the factor, it may be essential to center the variable rating within each level of the factor.

(1.B) incorporation of multiple samples

Another extension to the hierarchical model above with one within-individual factor A is the scenario with multiple samples. Suppose that the response variable (e.g., BOLD response) is measured across N samples (e.g., scanning runs). With an extra index n for samples (n=1,2,...,N), the original model is now extended to

\begin{aligned} y_{ain}&\sim N(\mu_{ai}, ~\sigma^2)\\ \mu_{ai}&=m_a+\delta_i+\theta_{ai}\\ \delta_i &\sim N(0,~\tau^2)\\ \theta_{ai} &\sim N(0,~\pi^2) \end{aligned}

This extended model can be implemented through 3dLMEr as

-model  'A+(1|Subj)+(1|A:Subj)'

2) Two within-individual factors

Now, let us extend our discussion to scenarios with two within-individual (or repeated-measures) factors, say A and B . Here, indices a, b, and i denote the factor levels for A , B , and individuals respectively. In such cases, a random-intercept model would not be appropriate. Instead, we consider the following hierarchical model for the data y_{abi}:

\begin{aligned} y_{abi}&\sim N(\mu_{abi}, ~\sigma^2)\\ \mu_{abi}&=m_{ab}+\delta_i+\alpha_{ai}+\beta_{bi}\\ \delta_i &\sim N(0,~\tau_1^2)\\ \alpha_{ai} &\sim N(0,~\tau_2^2)\\ \beta_{bi} &\sim N(0,~\tau_3^2)\\ \end{aligned}

For mapping this model into the program 3dLMEr , we use the following specification:

-model  'A*B+(1|Subj)+(1|A:Subj)+(1|B:Subj)'

If a within-individual quantitative variable like rating is available across all the levels of factors A and B, consider the following specification:

-model  'A*B*rating+(1+rating|Subj)+(1+rating|A:Subj)+(1+rating|B:Subj)'

Again, proper centering might be essential for the interpretability of some effects.

3) Three within-individual factors

Extending our approach from two within-individual factors to three, the case involving factors A , B , and C should come naturally. Indices a, b, c, and i represent the levels of factors A , B , C , and individuals respectively. We adopt the following hierarchical model for the data y_{abci}:

\begin{aligned} y_{abci}&\sim N(\mu_{abci},~\sigma^2)\\ \mu_{abci}&=m_{abc}+\delta_i+\alpha_{ai}+\beta_{bi}+\gamma_{ci}+\xi_{abi}+\eta_{aci}+\zeta_{bci} \end{aligned}

The distribution assumptions for individual-specific effects like \delta_i, \alpha_{ai}, etc., are similar to those in the two within-individual factors case, and are thus not repeated here.

The mapping for this model to the program 3dLMEr becomes:

-model  'A*B*C+(1|Subj)+(1|A:Subj)+(1|B:Subj)+(1|C:Subj)+(1|A:B:Subj)+(1|A:C:Subj)+(1|B:C:Subj)'

What if there are more than three within-individual factors?

In this scenario, we'd like to emphasize two key points. Firstly, if an investigator plans to design an experiment with such a high level of complexity, they should anticipate and include strategies for managing the resulting model complexity as part of their planning process. Secondly, extending the modeling process beyond the case with three within-individual factors, as demonstrated above, is not significantly more challenging, although it may involve increased technical intricacy and computational cost.

1 Like

Thanks for this very helpful explainer on how to specific the LMEr models. I have a complex data set with 1 within-subjects factor and 2 within-subjects quantitative variable, as well as a covariate.

3dLMEr -prefix 3dLMEr_output_Longit_Format_Quantity_Linear \
-jobs 8 -mask /Users/andrewlynn/Documents/Projects/HomeMath_RSA/ref/mask/BN_Atlas_HaskinsPeds_NL_template_2.5_mask+tlrc.HEAD \
-model "age*format*quantity+motion+(1|Subj)+(1+quantity|format:Subj)+(1+quantity|age:Subj)" \
-qVars "age,quantity,motion" \
-qVarCenters "7.03,4.5,0.13" \
-SS_type 3 \
-dataTable \
Subj    age    format  quantity   motion   InputFile \

I can get the model to run, but I receive a singular fit

boundary (singular) fit: see help('isSingular')

Do you have any advice on how to improve this model in 3dLMEr? I think there are optimizers in the R packages -- is there a way to do this in 3dLMEr? I see that you suggest a random intercept only model isn't correct, but if the model is singular isn't that a better option?

Thanks in advance for your help,

Andrew

Andrew,

Is format a within-individual factor while the quantitative variables of quantity and motion vary within each individual? If so, do quantity and motion differ to some extent across the levels of format?

Gang Chen

Yes, format (digit, dot) is within-individual. But quantity (2,3,4,5,6,7) is the same for both levels of format. Motion differs across Age, but is the same across format and quantity for each Age (of which there are 2 timepoints for most individuals, and some missing data).

The model is not specified properly. But first, what are the effects of interest? If covariates are involved, check out this discussion.

The effects of interest are format, quantity, format x quantity, format x age, quantity x age, and format x quantity x age. Motion is a covariate of no interest seeking to control for between- and within-individual differences of in-scanner motion.

Andrew,

I assume age does not vary within each individual. Consider the following model:

-model "age*format*quantity+motion+(1+quantity|Subj)" \

Gang Chen

The study is longitudinal. age does vary (2 timepoints). But quantity is the same across format and age.

Currently I've been working with this: format*quantity*age+motion+(1|Subj)+(1|age:Subj)

But perhaps I need: format*quantity*age+motion+(1|Subj)+(1+quantity|Subj)(1+age |Subj)

In that case, try

-model "age*format*quantity+motion+(1+age+quantity|Subj)" \

Gang Chen

That model is singular again, but the previous one I tried isn't: format*quantity*age+motion+(1|Subj)+(1+quantity|Subj)+(1+age|Subj)

Do the results differ much between the two models?

The results appear to be the same -- but the fixed effects couldn't changed based on the random effects structure, right?

I'm looking into it -- but I'm not sure I understand the difference between (1+age|Subj) and (1|age:Subj) Does it just depend on the type of variable?

Differences between your specification (1|Subj)+(1+quantity|Subj)+(1+age|Subj) and my suggestion (1+age+quantity|Subj):

  • This part (1|Subj) in your specification is redundant because it is already included in latter terms.

  • Your specification assumes no correlation between age and quantity. In contrast, the correlation in my suggestion is estimated from the data. With only two age values per participant, it is possible that the correlation estimation could fail at some voxels. However, such singularity warning would not matter for the effects of interests.

In the random-effects formula (A|B), B should be combinations of categorical variables while A can be any variable types. Thus, (1|age:Subj) would not be a meaningful specification even if the program does not complain.

Gang Chen

I think I understand. I've decided to go with (quantity*age|Subj) to allow for both slope estimates for age & quantity and to allow them to vary as a function of one another.

Thanks again for all your help!!

Hello. I hope I have found the right place to ask questions.
I was trying to run the 3dLMEr model and have some questions about the model specification and run time.
As background, we have a dot probe task: 3 (Groups: SR/DP/HC) X 3 (Cond1: death/life/neutral) X 3 (Cond2: congruent/incongruent/neutral). For the model specification, I only selected the DP group and estimated the group difference in the "Cond2" condition while including age and "Cond1" condition as covariates.

-model 'Cond2+Age+(1|Subj)+(1|Cond1)'	\
-qVars 'Age'	\
-SS_type 3	\
-bounds -2 2\
-gltCode Con 'Cond2 : 1*con Age :'	\
-gltCode Incon 'Cond2 : 1*incon Age :'	\
-gltCode Neut 'Cond2 : 1*neut Age :'	\
-gltCode Con-Incon 'Cond2 : 1*con -1*incon Age :'	\
-gltCode Incon-Neu 'Cond2 : 1*incon -1*neut Age :'	\
-gltCode Con-Neu 'Cond2 : 1*con -1*neut Age :'	\
-gltCode eff1 'Cond2 : 0.5*con +0.5*incon -1*neut Age :'	\
-gltCode eff2 'Cond2 : 1*con -1*incon & 1*con -1*neut Age :'	\
-mask MNI_N27_mask+tlrc		\
-jobs 16		\
-dataTable		\
319	DP	death	con	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_1_resampled.nii	\
319	DP	death	incon	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_2_resampled.nii	\
319	DP	life	con	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_3_resampled.nii	\
319	DP	life	incon	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_4_resampled.nii	\
319	DP	neut	neut	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_5_resampled.nii	\
321	DP	death	con	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_1_resampled.nii	\
321	DP	death	incon	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_2_resampled.nii	\
321	DP	life	con	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_3_resampled.nii	\
321	DP	life	incon	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_4_resampled.nii	\
321	DP	neut	neut	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_5_resampled.nii	\
322	DP	death	con	63.3	ffespmeeg_322_sdotprobe-f_1_t0_1000_f30_58_1_resampled.nii	\
322	DP	death	incon	63.3	ffespmeeg_322_sdotprobe-f_1_t0_1000_f30_58_2_resampled.nii	\
...

I have three questions:

  1. Do you think my model specification is adequate?
  2. If then, how long it is supposed to take for 3dLMEr analysis with 94 nifti files? This is because my run has taken more than three hours, which seems not right to me.
  3. When I ran the model, the model sometime needs lots of memory allocation. Is there anyway to resolve this memory issue?

Thank you for your consideration in advance!

  1. Do you think my model specification is adequate?

First, you need to add a header to the data table:

Subj	Group  Cond1	Cond2	Age	InputFile	\

If Cond1 is not part of the current model, consider

-model 'Cond2+Age+(1|Subj)'	\

or

-model 'Cond2*Age+(1|Subj)'	\

Gang Chen

1 Like

This makes sense. Thank you so much!