How to Specify Individual-Level (Random) Effects in Hierarchical Modeling?

How to Specify Individual-Level (Random) Effects in Hierarchical Modeling?

Gang Chen

Preface

Hierarchical, or mixed-effects, modeling is a powerful analytical tool for handling complex data structures. However, its advantages often come with challenges, particularly in terms of conceptual understanding and model specification. The motivation behind this post stems from the common practice of specifying hierarchical models with only a random intercept at the individual level. While this approach is widely used, it can present limitations, especially when accounting for multiple within-individual variables. In this post, we explore several population-level scenarios in neuroimaging data analysis that serve as useful templates, with plans to expand the list as additional relevant patterns emerge.

Key points to consider:

  • Within-individual variables. Also known as repeated measures, these are variables for which all levels (e.g., conditions or time points) are experienced by the same experimental unit, such as a participant. This term is familiar from ANOVA, where it applies to categorical factors or longitudinal quantitative variables (like time).
  • Counterbalancing. While the models discussed here focus on simpler within-individual factors, they are not the most sophisticated that could be used. More complex models could account for nuanced variance-covariance structures. However, the parsimonious models introduced here balance data variability and computational complexity, making them suitable for many practical situations.

  • Between-individual variables. Though the focus here is on within-individual factors, between-individual variables, such as sex, patient/control group, or age, are also important. Their can be incorporated in models without major changes to the specifications.

  • Implementation. The models outlined are particularly relevant for population-level analyses using the AFNI program 3dLMEr, which offers more flexibility than its predecessor 3dLME. Compared to 3dMVM, an ANCOVA-like program, 3dLMEr provides greater adaptability for handling complex experimental designs.

In the scenarios discussed below, we assume:

  • y is the response variable (e.g., BOLD response) in a hierarchical data structure;
  • Subj serves as the unit of measurement (e.g., experiment participant);
  • A, B, and so forth, represent within-individual factors (e.g., emotion, congruency, and more). These categorical variables are commonly utilized in designed experiments to investigate their causal effects on the response variable.

1) One within-individual factor

For situations with a single within-individual (or repeated-measures) factor A, the process is relatively straightforward. Assuming indices a and i represent the factor levels and individuals respectively, we typically define the following hierarchical model for data y_{ai}:

\begin{aligned} y_{ai}&\sim N(\mu_{ai}, ~\sigma^2)\\ \mu_{ai}&=m_a+\delta_i\\ \delta_i &\sim N(0,~\tau^2) \end{aligned}

Here, m_a denotes the population-level effect associated with the a-th factor level, \delta_i signifies the effect linked with the i-th individual (commonly known as a random effect), and \sigma^2 and \tau^2 represent the population- and individual-level variances respectively. This formulation is commonly referred to as a linear mixed-effects model with random (or varying) intercepts.

Mapping this model into the program 3dLMEr is straightforward:

-model  'A+(1|Subj)'

In this case, A corresponds to m_a, while (1|Subj) corresponds to \delta_i.

2) Two within-individual factors

Now, let us extend our discussion to scenarios with two within-individual (or repeated-measures) factors, say A and B . Here, indices a, b, and i denote the factor levels for A , B , and individuals respectively. In such cases, a random-intercept model would not be appropriate. Instead, we consider the following hierarchical model for the data y_{abi}:

\begin{aligned} y_{abi}&\sim N(\mu_{abi}, ~\sigma^2)\\ \mu_{abi}&=m_{ab}+\delta_i+\alpha_{ai}+\beta_{bi}\\ \delta_i &\sim N(0,~\tau_1^2)\\ \alpha_{ai} &\sim N(0,~\tau_2^2)\\ \beta_{bi} &\sim N(0,~\tau_3^2)\\ \end{aligned}

To map this model into the program 3dLMEr , we use the following specification:

-model  'A*B+(1|Subj)+(1|A:Subj)+(1|B:Subj)'

3) Three within-individual factors

Extending our approach from two within-individual factors to three, the case involving factors A , B , and C should come naturally. Indices a, b, c, and i represent the levels of factors A , B , C , and individuals respectively. We adopt the following hierarchical model for the data y_{abci}:

\begin{aligned} y_{abci}&\sim N(\mu_{abci},~\sigma^2)\\ \mu_{abci}&=m_{abc}+\delta_i+\alpha_{ai}+\beta_{bi}+\gamma_{ci}+\xi_{abi}+\eta_{aci}+\zeta_{bci} \end{aligned}

The distribution assumptions for individual-specific effects like \delta_i, \alpha_{ai}, etc., are similar to those in the two within-individual factors case, and are thus not repeated here.

The mapping for this model to the program 3dLMEr becomes:

-model  'A*B*C+(1|Subj)+(1|A:Subj)+(1|B:Subj)+(1|C:Subj)+(1|A:B:Subj)+(1|A:C:Subj)+(1|B:C:Subj)'
1 Like

Thanks for this very helpful explainer on how to specific the LMEr models. I have a complex data set with 1 within-subjects factor and 2 within-subjects quantitative variable, as well as a covariate.

3dLMEr -prefix 3dLMEr_output_Longit_Format_Quantity_Linear \
-jobs 8 -mask /Users/andrewlynn/Documents/Projects/HomeMath_RSA/ref/mask/BN_Atlas_HaskinsPeds_NL_template_2.5_mask+tlrc.HEAD \
-model "age*format*quantity+motion+(1|Subj)+(1+quantity|format:Subj)+(1+quantity|age:Subj)" \
-qVars "age,quantity,motion" \
-qVarCenters "7.03,4.5,0.13" \
-SS_type 3 \
-dataTable \
Subj    age    format  quantity   motion   InputFile \

I can get the model to run, but I receive a singular fit

boundary (singular) fit: see help('isSingular')

Do you have any advice on how to improve this model in 3dLMEr? I think there are optimizers in the R packages -- is there a way to do this in 3dLMEr? I see that you suggest a random intercept only model isn't correct, but if the model is singular isn't that a better option?

Thanks in advance for your help,

Andrew

Andrew,

Is format a within-individual factor while the quantitative variables of quantity and motion vary within each individual? If so, do quantity and motion differ to some extent across the levels of format?

Gang Chen

Yes, format (digit, dot) is within-individual. But quantity (2,3,4,5,6,7) is the same for both levels of format. Motion differs across Age, but is the same across format and quantity for each Age (of which there are 2 timepoints for most individuals, and some missing data).

The model is not specified properly. But first, what are the effects of interest? If covariates are involved, check out this discussion.

The effects of interest are format, quantity, format x quantity, format x age, quantity x age, and format x quantity x age. Motion is a covariate of no interest seeking to control for between- and within-individual differences of in-scanner motion.

Andrew,

I assume age does not vary within each individual. Consider the following model:

-model "age*format*quantity+motion+(1+quantity|Subj)" \

Gang Chen

The study is longitudinal. age does vary (2 timepoints). But quantity is the same across format and age.

Currently I've been working with this: format*quantity*age+motion+(1|Subj)+(1|age:Subj)

But perhaps I need: format*quantity*age+motion+(1|Subj)+(1+quantity|Subj)(1+age |Subj)

In that case, try

-model "age*format*quantity+motion+(1+age+quantity|Subj)" \

Gang Chen

That model is singular again, but the previous one I tried isn't: format*quantity*age+motion+(1|Subj)+(1+quantity|Subj)+(1+age|Subj)

Do the results differ much between the two models?

The results appear to be the same -- but the fixed effects couldn't changed based on the random effects structure, right?

I'm looking into it -- but I'm not sure I understand the difference between (1+age|Subj) and (1|age:Subj) Does it just depend on the type of variable?

Differences between your specification (1|Subj)+(1+quantity|Subj)+(1+age|Subj) and my suggestion (1+age+quantity|Subj):

  • This part (1|Subj) in your specification is redundant because it is already included in latter terms.

  • Your specification assumes no correlation between age and quantity. In contrast, the correlation in my suggestion is estimated from the data. With only two age values per participant, it is possible that the correlation estimation could fail at some voxels. However, such singularity warning would not matter for the effects of interests.

In the random-effects formula (A|B), B should be combinations of categorical variables while A can be any variable types. Thus, (1|age:Subj) would not be a meaningful specification even if the program does not complain.

Gang Chen

I think I understand. I've decided to go with (quantity*age|Subj) to allow for both slope estimates for age & quantity and to allow them to vary as a function of one another.

Thanks again for all your help!!

Hello. I hope I have found the right place to ask questions.
I was trying to run the 3dLMEr model and have some questions about the model specification and run time.
As background, we have a dot probe task: 3 (Groups: SR/DP/HC) X 3 (Cond1: death/life/neutral) X 3 (Cond2: congruent/incongruent/neutral). For the model specification, I only selected the DP group and estimated the group difference in the "Cond2" condition while including age and "Cond1" condition as covariates.

-model 'Cond2+Age+(1|Subj)+(1|Cond1)'	\
-qVars 'Age'	\
-SS_type 3	\
-bounds -2 2\
-gltCode Con 'Cond2 : 1*con Age :'	\
-gltCode Incon 'Cond2 : 1*incon Age :'	\
-gltCode Neut 'Cond2 : 1*neut Age :'	\
-gltCode Con-Incon 'Cond2 : 1*con -1*incon Age :'	\
-gltCode Incon-Neu 'Cond2 : 1*incon -1*neut Age :'	\
-gltCode Con-Neu 'Cond2 : 1*con -1*neut Age :'	\
-gltCode eff1 'Cond2 : 0.5*con +0.5*incon -1*neut Age :'	\
-gltCode eff2 'Cond2 : 1*con -1*incon & 1*con -1*neut Age :'	\
-mask MNI_N27_mask+tlrc		\
-jobs 16		\
-dataTable		\
319	DP	death	con	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_1_resampled.nii	\
319	DP	death	incon	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_2_resampled.nii	\
319	DP	life	con	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_3_resampled.nii	\
319	DP	life	incon	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_4_resampled.nii	\
319	DP	neut	neut	24	ffespmeeg_319_sdotprobe-f_1_t0_1000_f30_58_5_resampled.nii	\
321	DP	death	con	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_1_resampled.nii	\
321	DP	death	incon	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_2_resampled.nii	\
321	DP	life	con	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_3_resampled.nii	\
321	DP	life	incon	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_4_resampled.nii	\
321	DP	neut	neut	22.4	ffespmeeg_321_sdotprobe-f_1_t0_1000_f30_58_5_resampled.nii	\
322	DP	death	con	63.3	ffespmeeg_322_sdotprobe-f_1_t0_1000_f30_58_1_resampled.nii	\
322	DP	death	incon	63.3	ffespmeeg_322_sdotprobe-f_1_t0_1000_f30_58_2_resampled.nii	\
...

I have three questions:

  1. Do you think my model specification is adequate?
  2. If then, how long it is supposed to take for 3dLMEr analysis with 94 nifti files? This is because my run has taken more than three hours, which seems not right to me.
  3. When I ran the model, the model sometime needs lots of memory allocation. Is there anyway to resolve this memory issue?

Thank you for your consideration in advance!

  1. Do you think my model specification is adequate?

First, you need to add a header to the data table:

Subj	Group  Cond1	Cond2	Age	InputFile	\

If Cond1 is not part of the current model, consider

-model 'Cond2+Age+(1|Subj)'	\

or

-model 'Cond2*Age+(1|Subj)'	\

Gang Chen

1 Like

This makes sense. Thank you so much!