How to Specify Individual-Level (Random) Effects in Hierarchical Modeling?

Gang Chen

Preface

Hierarchical (also known as mixed-effects or multilevel) modeling is a powerful analytical tool designed to handle complex data structures. Its power, however, is often matched by the challenges it poses at both the conceptual and the specification levels. This blog post is motivated by the common practice of specifying hierarchical models with only a simple varying (random) intercept at the individual level. That approach can be inadequate, especially when multiple within-individual variables are involved. In this post, we explore several population-level scenarios in neuroimaging data analysis that can serve as templates. We plan to expand this list with additional scenarios as typical patterns emerge.

We wish to underscore the following key points:

  • Within-individual variables. The term "within-individual variables" (or "repeated-measures") is a conventional notion used in the context of ANOVA. It denotes situations where the measurement unit, such as an experimental participant, experiences all levels of a categorical variable (factor) or multiple instances of a quantitative variable (e.g., time) in longitudinal studies.
  • Counterbalance. The models outlined in this discussion for scenarios involving one or more within-individual factors are not the most intricate ones that could potentially be employed. In other words, there exists the possibility of utilizing more sophisticated models to capture nuanced variance-covariance patterns. However, we believe that the parsimonious models introduced here strike a delicate equilibrium between effectively accounting for data variability and managing computational complexities, rendering them well-suited for many practical scenarios.

  • Between-individual variables. The treatment of between-individual variables requires thorough consideration. The discourse here primarily revolves around within-individual variables, particularly categorical variables (factors). Variables such as sex, patient/control status, and age, which encompass differences between individuals, can also be integrated into a model. However, the process of variable selection is intricate and will be addressed separately at another juncture.

  • Implementation. The specifications provided here are especially relevant for population-level analyses conducted using the AFNI program 3dLMEr. This program is more flexible than its predecessor, 3dLME, and is preferred over it. It is also more adaptable than the ANCOVA program 3dMVM.

In the scenarios discussed below, we assume:

  • y is the response variable (e.g., BOLD response) in a hierarchical data structure;
  • Subj serves as the unit of measurement (e.g., experiment participant);
  • A, B, and so forth, signify within-individual factors (e.g., emotion, congruency, and more). These categorical variables are commonly utilized in designed experiments to investigate their causal effects on neural response.

1) One within-individual factor

For situations with just a single within-individual (or repeated-measures) factor A, the process is relatively straightforward. Assuming indices a and i represent the factor levels and individuals respectively, we typically define the following hierarchical model for data y_{ai}:

\begin{aligned} y_{ai}&\sim N(\mu_{ai}, ~\sigma^2)\\ \mu_{ai}&=m_a+\delta_i\\ \delta_i &\sim N(0,~\tau^2) \end{aligned}

Here, m_a denotes the population-level effect associated with the a-th factor level, and \delta_i signifies the effect linked with the i-th individual (commonly known as a random effect). The variance \sigma^2 describes the residual variability of the data around \mu_{ai}, while \tau^2 describes the variability of the individual-level effects \delta_i across individuals. This formulation is commonly referred to as a linear mixed-effects model with random (or varying) intercepts.

Mapping this model into the program 3dLMEr is straightforward:

-model  'A+(1|Subj)'

In this case, A corresponds to m_a, while (1|Subj) corresponds to \delta_i.
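One consequence of the random-intercept model is worth making explicit: because every observation from individual i shares the same \delta_i, any two factor levels within an individual are correlated at \tau^2/(\tau^2+\sigma^2). The following minimal sketch simulates data under the model to illustrate this. The function names, the parameter values, and the two-level factor are illustrative assumptions and are not part of 3dLMEr.

```python
import math
import random

def simulate_random_intercept(m, tau, sigma, n_subj, seed=0):
    """Simulate y[i][a] = m[a] + delta_i + noise for n_subj individuals.

    m     : list of population-level effects, one per level of factor A
    tau   : standard deviation of the individual-level intercepts delta_i
    sigma : standard deviation of the residual noise
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_subj):
        delta = rng.gauss(0.0, tau)  # one intercept shared by all levels
        data.append([m_a + delta + rng.gauss(0.0, sigma) for m_a in m])
    return data

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: two levels of A and tau = sigma = 1, so the implied
# within-individual correlation is tau^2 / (tau^2 + sigma^2) = 0.5.
tau, sigma = 1.0, 1.0
y = simulate_random_intercept(m=[0.5, 1.0], tau=tau, sigma=sigma, n_subj=5000)
implied = tau**2 / (tau**2 + sigma**2)
observed = pearson([row[0] for row in y], [row[1] for row in y])
print(f"implied correlation:  {implied:.3f}")
print(f"observed correlation: {observed:.3f}")
```

The observed value should land close to the implied one, up to sampling noise. With a single factor this equal-correlation assumption is often tolerable, which is part of why the random-intercept specification is a reasonable default here.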

(1.A) Incorporation of within-individual quantitative variables

The above model can be extended to include individual-level slopes. For example, when a rating score r_{ai} is available for the i-th individual at the a-th level of the factor, we may modify the model to

\begin{aligned} y_{ai}&\sim N(\mu_{ai}, ~\sigma^2)\\ \mu_{ai}&=m_a+s_a r_{ai}+\delta_i+\theta_i r_{ai}\\ \begin{bmatrix} \delta_i\\ \theta_i \end{bmatrix} &\sim N(\begin{bmatrix} 0\\ 0 \end{bmatrix}, ~ \begin{bmatrix} \tau_1^2 & \rho \tau_1\tau_2 \\ \rho \tau_1\tau_2 & \tau_2^2 \end{bmatrix}) \end{aligned}

where s_a and \theta_i are the slopes at the population- and individual-levels, respectively. This formulation is commonly referred to as a linear mixed-effects model with both random intercepts and random slopes.

The specification in 3dLMEr is now updated to

-model  'A*rating+(1+rating|Subj)'

To improve the interpretability of differences among the levels of the factor, it may be essential to center the variable rating within each level of the factor.
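As a minimal sketch of what centering within each level means in practice, the snippet below subtracts the level-specific mean rating from every row of a long-format table. The column names (Subj, A, rating), the level labels, and the numbers are hypothetical; the actual centering would be done on whatever table feeds the analysis.

```python
from collections import defaultdict

def center_within_level(rows, factor="A", var="rating"):
    """Return copies of rows with `var` centered at the mean of its factor level."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[factor]] += row[var]
        counts[row[factor]] += 1
    means = {level: sums[level] / counts[level] for level in sums}
    # Build new dicts so the input rows are left unmodified.
    return [{**row, var: row[var] - means[row[factor]]} for row in rows]

# Hypothetical long-format data: two individuals, two levels of factor A.
rows = [
    {"Subj": "s1", "A": "pos", "rating": 6.0},
    {"Subj": "s2", "A": "pos", "rating": 8.0},
    {"Subj": "s1", "A": "neg", "rating": 2.0},
    {"Subj": "s2", "A": "neg", "rating": 4.0},
]
centered = center_within_level(rows)
for row in centered:
    print(row["Subj"], row["A"], row["rating"])
```

Without centering, the effects of A in 'A*rating' are evaluated at rating = 0, which may lie outside the observed range. After centering within each level, they are evaluated at the average rating of that level.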

(1.B) Incorporation of multiple samples

Another extension to the hierarchical model above with one within-individual factor A is the scenario with multiple samples. Suppose that the response variable (e.g., BOLD response) is measured across N samples (e.g., scanning runs). With an extra index n for samples (n=1,2,...,N), the original model is now extended to

\begin{aligned} y_{ain}&\sim N(\mu_{ai}, ~\sigma^2)\\ \mu_{ai}&=m_a+\delta_i+\theta_{ai}\\ \delta_i &\sim N(0,~\tau^2)\\ \theta_{ai} &\sim N(0,~\pi^2) \end{aligned}

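Here, \theta_{ai} is the effect specific to the a-th factor level within the i-th individual, with variance \pi^2. This extended model can be implemented through 3dLMEr as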

-model  'A+(1|Subj)+(1|A:Subj)'
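To see why the term (1|A:Subj) is only worth adding once there are multiple samples, it helps to write out the covariance that the model implies for two observations from the same individual i. This is a standard consequence of the model above rather than anything specific to 3dLMEr:

\begin{aligned} \mathrm{Var}(y_{ain})&=\tau^2+\pi^2+\sigma^2\\ \mathrm{Cov}(y_{ain},~y_{ain'})&=\tau^2+\pi^2 \quad (n\neq n')\\ \mathrm{Cov}(y_{ain},~y_{a'in'})&=\tau^2 \quad (a\neq a') \end{aligned}

With a single sample per level (N=1), the second line never applies, so \pi^2 and \sigma^2 enter only through their sum and cannot be separated. With N>1, samples from the same level of the same individual are more alike than samples from different levels, and that difference is what identifies \pi^2.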

2) Two within-individual factors

Now, let us extend our discussion to scenarios with two within-individual (or repeated-measures) factors, say A and B. Here, indices a, b, and i denote the levels of factors A and B and the individuals, respectively. In such cases, a model with only a random intercept would not be appropriate, because it assumes the same correlation between any two of an individual's A-by-B conditions. Instead, we consider the following hierarchical model for the data y_{abi}:

\begin{aligned} y_{abi}&\sim N(\mu_{abi}, ~\sigma^2)\\ \mu_{abi}&=m_{ab}+\delta_i+\alpha_{ai}+\beta_{bi}\\ \delta_i &\sim N(0,~\tau_1^2)\\ \alpha_{ai} &\sim N(0,~\tau_2^2)\\ \beta_{bi} &\sim N(0,~\tau_3^2) \end{aligned}

To map this model into the program 3dLMEr, we use the following specification:

-model  'A*B+(1|Subj)+(1|A:Subj)+(1|B:Subj)'
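What the two extra terms buy can be seen from the covariance that the model implies among one individual's A-by-B cells: observations sharing a level of A (or of B) are more alike than those sharing neither. The sketch below tabulates that implied covariance. The function name and the variance values are hypothetical, chosen purely for illustration.

```python
from itertools import product

def implied_cov(n_a, n_b, tau1_sq, tau2_sq, tau3_sq, sigma_sq):
    """Covariance among the n_a * n_b cells of one individual under
    mu_abi = m_ab + delta_i + alpha_ai + beta_bi.

    Cells are ordered as (a, b) with b varying fastest.
    """
    cells = list(product(range(n_a), range(n_b)))
    cov = []
    for a1, b1 in cells:
        row = []
        for a2, b2 in cells:
            c = tau1_sq                # delta_i is shared by every cell
            if a1 == a2:
                c += tau2_sq           # alpha_ai is shared within a level of A
            if b1 == b2:
                c += tau3_sq           # beta_bi is shared within a level of B
            if (a1, b1) == (a2, b2):
                c += sigma_sq          # residual variance on the diagonal
            row.append(c)
        cov.append(row)
    return cells, cov

# Hypothetical variances for a 2 x 2 design.
cells, cov = implied_cov(2, 2, tau1_sq=1.0, tau2_sq=0.5, tau3_sq=0.25, sigma_sq=1.0)
for cell, row in zip(cells, cov):
    print(cell, row)
```

With (1|Subj) alone, every off-diagonal entry would be forced to the single value \tau_1^2. The terms (1|A:Subj) and (1|B:Subj) allow three distinct off-diagonal values, depending on whether two cells share a level of A, a level of B, or neither.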

If a within-individual quantitative variable like rating is available across all the levels of factors A and B, consider the following specification:

-model  'A*B*rating+(1+rating|Subj)+(1+rating|A:Subj)+(1+rating|B:Subj)'

Again, proper centering might be essential for the interpretability of some effects.

3) Three within-individual factors

Extending our approach from two within-individual factors to three, the case involving factors A, B, and C should come naturally. Indices a, b, c, and i represent the levels of factors A, B, and C and the individuals, respectively. We adopt the following hierarchical model for the data y_{abci}:

\begin{aligned} y_{abci}&\sim N(\mu_{abci},~\sigma^2)\\ \mu_{abci}&=m_{abc}+\delta_i+\alpha_{ai}+\beta_{bi}+\gamma_{ci}+\xi_{abi}+\eta_{aci}+\zeta_{bci} \end{aligned}

The distribution assumptions for individual-specific effects like \delta_i, \alpha_{ai}, etc., are similar to those in the two within-individual factors case, and are thus not repeated here.

The mapping for this model to the program 3dLMEr becomes:

-model  'A*B*C+(1|Subj)+(1|A:Subj)+(1|B:Subj)+(1|C:Subj)+(1|A:B:Subj)+(1|A:C:Subj)+(1|B:C:Subj)'

What if there are more than three within-individual factors?

In this scenario, we'd like to emphasize two key points. Firstly, if an investigator plans to design an experiment with such a high level of complexity, they should anticipate and include strategies for managing the resulting model complexity as part of their planning process. Secondly, extending the modeling process beyond the case with three within-individual factors, as demonstrated above, is not significantly more challenging, although it may involve increased technical intricacy and computational cost.
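As a rough illustration of the second point, the specifications shown above for one, two, and three factors follow a regular pattern: the full factorial of the factors at the population level, plus a varying intercept for the individual and for the individual crossed with every combination of fewer than all the factors. The sketch below generates the -model string from that pattern. The helper name is hypothetical, and the four-factor output is an extrapolation of the pattern, not a specification verified with 3dLMEr.

```python
from itertools import combinations

def lmer_model_spec(factors, subj="Subj"):
    """Build a -model string following the pattern used for one to three factors."""
    fixed = "*".join(factors)
    random_terms = [f"(1|{subj})"]
    # Cross the individual with every proper subset of the factors,
    # from single factors up to (k-1)-way combinations.
    for order in range(1, len(factors)):
        for combo in combinations(factors, order):
            group = ":".join(combo + (subj,))
            random_terms.append(f"(1|{group})")
    return "+".join([fixed] + random_terms)

spec1 = lmer_model_spec(["A"])
spec2 = lmer_model_spec(["A", "B"])
spec3 = lmer_model_spec(["A", "B", "C"])
spec4 = lmer_model_spec(["A", "B", "C", "D"])  # extrapolated, see note above
for spec in (spec1, spec2, spec3, spec4):
    print(spec)
```

The number of individual-level terms grows quickly (4 for two factors, 8 for three, 16 for four, counting the residual), which is one concrete way to see the added complexity and computational cost.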
