How to Specify Individual-Level (Random) Effects in Hierarchical Modeling?
Gang Chen
Preface
Hierarchical, or mixed-effects, modeling is a powerful analytical tool for handling complex data structures. However, its advantages often come with challenges, particularly in terms of conceptual understanding and model specification. The motivation behind this post stems from the common practice of specifying hierarchical models with only a random intercept at the individual level. While this approach is widely used, it can present limitations, especially when accounting for multiple within-individual variables. In this post, we explore several population-level scenarios in neuroimaging data analysis that serve as useful templates, with plans to expand the list as additional relevant patterns emerge.
Key points to consider:
- Within-individual variables. Also known as repeated measures, these are variables for which all levels (e.g., conditions or time points) are experienced by the same experimental unit, such as a participant. This term is familiar from ANOVA, where it applies to categorical factors or longitudinal quantitative variables (like time).
- Counterbalancing. The models discussed here are not the most sophisticated that could be used; more complex models could account for more nuanced variance-covariance structures. However, the parsimonious models introduced here balance data variability against computational complexity, making them suitable for many practical situations.
- Between-individual variables. Though the focus here is on within-individual factors, between-individual variables, such as sex, patient/control group, or age, are also important. They can be incorporated into the models without major changes to the specifications.
- Implementation. The models outlined are particularly relevant for population-level analyses using the AFNI program 3dLMEr, which offers more flexibility than its predecessor 3dLME. Compared to 3dMVM, an ANCOVA-like program, 3dLMEr provides greater adaptability for handling complex experimental designs.
In the scenarios discussed below, we assume:
- y is the response variable (e.g., BOLD response) in a hierarchical data structure;
- Subj serves as the unit of measurement (e.g., experiment participant);
- A, B, and so forth represent within-individual factors (e.g., emotion, congruency, and more). These categorical variables are commonly utilized in designed experiments to investigate their causal effects on the response variable.
1) One within-individual factor
For situations with a single within-individual (or repeated-measures) factor A, the process is relatively straightforward. With indices a and i denoting the factor levels and individuals, respectively, we typically define the following hierarchical model for data y_{ai}:
\begin{aligned} y_{ai}&\sim N(\mu_{ai}, ~\sigma^2)\\ \mu_{ai}&=m_a+\delta_i\\ \delta_i &\sim N(0,~\tau^2) \end{aligned}
Here, m_a denotes the population-level effect associated with the a-th factor level, \delta_i signifies the effect linked with the i-th individual (commonly known as a random effect), and \sigma^2 and \tau^2 represent the population- and individual-level variances respectively. This formulation is commonly referred to as a linear mixed-effects model with random (or varying) intercepts.
Mapping this model into the program 3dLMEr is straightforward:
-model 'A+(1|Subj)'
In this case, A corresponds to m_a, while (1|Subj) corresponds to \delta_i.
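One way to build intuition for what the random intercept does is to simulate from the model and look at the dependence it induces. The sketch below is only an illustration: the function names and the parameter values (two levels of A, \tau = 1.0, \sigma = 0.5) are hypothetical choices, not part of 3dLMEr. Because every observation from individual i shares the same \delta_i, the covariance between two factor levels should be close to \tau^2, and the variance at each level close to \tau^2+\sigma^2.

```python
import random
from statistics import mean

def simulate_random_intercept(m, tau, sigma, n_subj, seed=0):
    """Draw y[i][a] = m[a] + delta_i + noise for n_subj individuals."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subj):
        delta = rng.gauss(0.0, tau)  # individual-level effect delta_i
        data.append([m_a + delta + rng.gauss(0.0, sigma) for m_a in m])
    return data

def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / (len(x) - 1)

# Hypothetical settings: two levels of A, tau^2 = 1.0, sigma^2 = 0.25
data = simulate_random_intercept(m=[0.5, 1.0], tau=1.0, sigma=0.5, n_subj=20000)
level1 = [row[0] for row in data]
level2 = [row[1] for row in data]

cov_between_levels = covariance(level1, level2)  # expected near tau^2
var_level1 = covariance(level1, level1)          # expected near tau^2 + sigma^2
icc = cov_between_levels / var_level1            # expected near tau^2 / (tau^2 + sigma^2)
```

The ratio computed at the end is the intraclass correlation implied by the model; under the random-intercept specification it is the same for every pair of levels of A.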
2) Two within-individual factors
Now, let us extend our discussion to scenarios with two within-individual (or repeated-measures) factors, say A and B. Here, indices a, b, and i denote the levels of factor A, the levels of factor B, and individuals, respectively. In such cases, a model with only a random intercept would not be appropriate, because a single individual-level term cannot capture how each individual's response varies across the levels of A and of B. Instead, we consider the following hierarchical model for the data y_{abi}:
\begin{aligned} y_{abi}&\sim N(\mu_{abi}, ~\sigma^2)\\ \mu_{abi}&=m_{ab}+\delta_i+\alpha_{ai}+\beta_{bi}\\ \delta_i &\sim N(0,~\tau_1^2)\\ \alpha_{ai} &\sim N(0,~\tau_2^2)\\ \beta_{bi} &\sim N(0,~\tau_3^2) \end{aligned}
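Here, m_{ab} is the population-level effect for the combination of the a-th level of A and the b-th level of B, while \alpha_{ai} and \beta_{bi} are the effects specific to the i-th individual at the a-th level of A and the b-th level of B, respectively. To map this model into the program 3dLMEr, we use the following specification, in which A*B corresponds to m_{ab}, (1|Subj) to \delta_i, (1|A:Subj) to \alpha_{ai}, and (1|B:Subj) to \beta_{bi}: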
-model 'A*B+(1|Subj)+(1|A:Subj)+(1|B:Subj)'
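It can help to write out the dependence structure that this specification implies. The following is a sketch under one added assumption that is not stated explicitly above: \delta_i, \alpha_{ai}, \beta_{bi}, and the residuals are mutually independent. For a\neq a' and b\neq b' within the same individual i:

\begin{aligned} \mathrm{Var}(y_{abi})&=\sigma^2+\tau_1^2+\tau_2^2+\tau_3^2\\ \mathrm{Cov}(y_{abi},~y_{ab'i})&=\tau_1^2+\tau_2^2\\ \mathrm{Cov}(y_{abi},~y_{a'bi})&=\tau_1^2+\tau_3^2\\ \mathrm{Cov}(y_{abi},~y_{a'b'i})&=\tau_1^2 \end{aligned}

Two observations that share a level of A (or of B) are therefore more strongly related than two observations that share only the individual. A model with (1|Subj) alone would force all three covariances to be equal.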
3) Three within-individual factors
Extending our approach from two within-individual factors to three, the case involving factors A, B, and C should come naturally. Indices a, b, c, and i represent the levels of factors A, B, C, and individuals, respectively. We adopt the following hierarchical model for the data y_{abci}:
\begin{aligned} y_{abci}&\sim N(\mu_{abci},~\sigma^2)\\ \mu_{abci}&=m_{abc}+\delta_i+\alpha_{ai}+\beta_{bi}+\gamma_{ci}+\xi_{abi}+\eta_{aci}+\zeta_{bci} \end{aligned}
The distribution assumptions for individual-specific effects like \delta_i, \alpha_{ai}, etc., are similar to those in the two within-individual factors case, and are thus not repeated here.
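For readers who prefer to see them spelled out, one way to write these assumptions follows. The variance labels \tau_4^2 through \tau_7^2 are our own notation for this illustration, continuing the numbering of the two-factor case:

\begin{aligned} \delta_i &\sim N(0,~\tau_1^2)\\ \alpha_{ai} &\sim N(0,~\tau_2^2)\\ \beta_{bi} &\sim N(0,~\tau_3^2)\\ \gamma_{ci} &\sim N(0,~\tau_4^2)\\ \xi_{abi} &\sim N(0,~\tau_5^2)\\ \eta_{aci} &\sim N(0,~\tau_6^2)\\ \zeta_{bci} &\sim N(0,~\tau_7^2) \end{aligned}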
The mapping for this model to the program 3dLMEr becomes:
-model 'A*B*C+(1|Subj)+(1|A:Subj)+(1|B:Subj)+(1|C:Subj)+(1|A:B:Subj)+(1|A:C:Subj)+(1|B:C:Subj)'
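The three specifications above follow a visible pattern: the population-level part crosses all factors, and the individual-level part contains one (1|...:Subj) term for every combination of factors short of the full set. The helper below is a minimal sketch that generates the -model string from a list of factor names. The function name lmer_model and its arguments are hypothetical, and applying the same rule to four or more factors is our extrapolation rather than something stated in this post.

```python
from itertools import combinations

def lmer_model(factors, subj="Subj"):
    """Build a 3dLMEr-style model string for the given within-individual factors.

    Assumption: individual-level terms cover every proper subset of the
    factors (including the empty subset, i.e. the plain random intercept).
    """
    fixed = "*".join(factors)  # e.g. 'A*B*C'
    random_terms = []
    for k in range(len(factors)):  # subset sizes 0 .. len(factors) - 1
        for combo in combinations(factors, k):
            random_terms.append("(1|" + ":".join(combo + (subj,)) + ")")
    return "+".join([fixed] + random_terms)

one_factor = lmer_model(["A"])
two_factors = lmer_model(["A", "B"])
three_factors = lmer_model(["A", "B", "C"])
```

With these inputs, the three variables hold exactly the strings given in scenarios 1), 2), and 3), which can then be passed to the -model option.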