Advice on repeated-measures analysis


Hi guys,

I am struggling with deciding how to analyze the following data. I am investigating a new method to estimate hearing thresholds using EEG. About twenty subjects participated in the study. Each of these subjects underwent three types of tests:

  • TLA: this is the true hearing threshold
  • TB: this is the traditional EEG measurement for hearing threshold estimation
  • NB: this is the new EEG measurement for hearing threshold estimation

For each test, the threshold could be obtained for 0.5, 1, 2, and 4 kHz. However, due to time constraints, it was impossible to obtain the thresholds for all types of tests and all frequencies in each subject. Therefore, I randomly selected frequencies. Although I did not manage to test every combination, for the frequencies that were tested, I do have the thresholds for each test (TLA, TB, NB).

There are two research questions:

  1. For each frequency, is NB a good estimator for the TLA?
  2. For each frequency, is NB a better estimator for the TLA than the TB?

My question is: can I do a repeated-measures ANOVA for each frequency separately? Or do I need to fit a generalized linear mixed model? I have never fitted generalized linear mixed models and I am new to R, so it would be nice if someone could help me out.

Many thanks,



Hi Lindsey,

I’ll give it a shot…

Repeated measures is a generic term used to describe a variety of statistical models that account for the fact that multiple measurements have been taken on each subject (in your case, a person). Let’s represent these subject effects using a categorical variable ID, which takes a different level for each subject. Because subjects may differ in how they respond to experimental treatments, if we don’t include ID in our model, the residuals will be correlated within subjects, thus violating the independence assumption of the model. Repeated measures models either implicitly account for or explicitly incorporate ID effects. Such models can be implemented using classic least-squares ANOVA, but likelihood-based approaches (LMM, GLMM) are used more frequently these days because they offer greater flexibility and can be used with unbalanced designs.

Because you do not have a fully crossed design (i.e. you have not tested all combinations of tests and frequencies for every person), I would suggest that you begin by considering each frequency separately. If TB and NB were estimated simultaneously (i.e. both of these estimates correspond to exactly the same TLA estimate), and if you have only a single measurement at each frequency for each person, you don’t need a repeated measures model. Instead, you can fit a simpler model such as:

m1 <- lm(TLA ~ NB)
m2 <- lm(TLA ~ TB)
anova(m1) # addresses research question 1
AIC(m1,m2) # addresses research question 2

The models above are just standard OLS regressions that assume that TLA is linearly related to NB and to TB, and that the residuals about the fitted models are independent and normally distributed. If these assumptions are not reasonable, you could try a GLM.
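To make the "one frequency at a time" idea concrete, here is a minimal sketch that fits both regressions within each frequency and collects the results. Everything about the data is an assumption for illustration: the data frame `d`, its column names (ID, freq, TLA, NB, TB), and the simulated values are made up and are not your measurements. The helper `fit_one` is hypothetical too.

```r
# Simulated, hypothetical data: 20 subjects, each tested at a random subset
# of the four frequencies, with at most one row per subject-frequency pair.
set.seed(1)
grid <- expand.grid(ID = factor(1:20), freq = c(0.5, 1, 2, 4))
d <- grid[sample(nrow(grid), 50), ]
n <- nrow(d)
d$TLA <- rnorm(n, mean = 30, sd = 15)          # made-up "true" thresholds
d$NB  <- d$TLA + rnorm(n, mean = 5,  sd = 5)   # made-up NB estimates
d$TB  <- d$TLA + rnorm(n, mean = 10, sd = 10)  # made-up TB estimates

# Fit TLA ~ NB and TLA ~ TB on the rows of a single frequency.
fit_one <- function(dat) {
  m1 <- lm(TLA ~ NB, data = dat)
  m2 <- lm(TLA ~ TB, data = dat)
  data.frame(freq   = dat$freq[1],
             n      = nrow(dat),
             r2_NB  = summary(m1)$r.squared,
             r2_TB  = summary(m2)$r.squared,
             AIC_NB = AIC(m1),
             AIC_TB = AIC(m2))
}

# One row of results per frequency.
res <- do.call(rbind, lapply(split(d, d$freq), fit_one))
print(res)
```

Within a frequency, the lower AIC points to the better-fitting predictor. The comparison is only fair because both models are fitted to the same TLA values. Calling plot(m1) on an individual fit is a quick way to look at the linearity and normality assumptions.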

If you have multiple paired estimates of TLA, NB, and TB for each person, then you have a repeated measures design and could fit a linear mixed model (or GLMM) instead. For the linear mixed model, the code would be

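library(lme4)
# REML = FALSE (ML fit) so that AIC is comparable between models with different fixed effects
m1 <- lmer(TLA ~ NB + (1|ID), REML = FALSE)
m2 <- lmer(TLA ~ TB + (1|ID), REML = FALSE)
drop1(m1, test = "Chisq") # addresses research question 1
AIC(m1,m2) # addresses research question 2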

By including ID in this way, you account for any differences between people in the intercepts of the linear relationships of TLA to NB, and of TLA to TB. More complicated models are also possible if you want to account for slope differences between people as well.
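As a rough sketch of what such a "more complicated" model could look like, the code below adds a per-person random slope for NB and compares it with the intercept-only model. It assumes the lme4 package is installed. The data are simulated purely for illustration (six hypothetical paired measurements per person), and the object names are my own.

```r
library(lme4)  # assumed to be installed

# Simulated, hypothetical data: 20 people with 6 paired measurements each.
set.seed(2)
id <- rep(1:20, each = 6)
d  <- data.frame(ID = factor(id), NB = runif(120, min = 0, max = 80))
b0 <- rnorm(20, mean = -5, sd = 4)    # person-specific intercepts
b1 <- rnorm(20, mean = 1,  sd = 0.1)  # person-specific slopes
d$TLA <- b0[id] + b1[id] * d$NB + rnorm(120, sd = 3)

# Random intercept only: people differ in the offset between NB and TLA.
m_int <- lmer(TLA ~ NB + (1 | ID), data = d, REML = FALSE)

# Random intercept and slope: people may also differ in how TLA scales with NB.
m_slope <- lmer(TLA ~ NB + (1 + NB | ID), data = d, REML = FALSE)

# Likelihood-ratio comparison of the two random-effects structures.
cmp <- anova(m_int, m_slope)
print(cmp)
```

With real data, the random-slope model needs several measurements per person to be estimable; with only a few, lme4 may report a singular fit or a convergence warning, in which case the simpler random-intercept model is probably the safer choice.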