1 Exploring mixed effects models with R

title: Exploring mixed-effects models
author: Jose
date: 09/10/2020
output: html_document

knitr::opts_chunk$set(echo = TRUE)

Exploring mixed models

1. Learn about basic exploration of a linear model

All data and information about the tutorial 1 and 2 comes from:

Winter, B. (2013). Linear models and linear mixed effects models in R with linguistic applications. arXiv:1308.5499. [http://arxiv.org/pdf/1308.5499.pdf]

Complementary information came from:

Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

library(car)
library(lattice)
library(HLMdiag)
library(ggplot2)

Creating objects:

pitch <- c(233,204,242,130,112,142)
Voice.Pitch <- c(252,244,240,233,212,204)
sex <- c(rep("female",3),rep("male",3))
Age <- c(14,23,35,48,52,67)
my.df <- data.frame(sex, pitch)

my.df

Proceed to linear model

summary(xmdl <- lm(pitch ~ sex, data = my.df))
Anova(xmdl)

“We constructed a linear model of pitch as a function of sex. This model was significant (F(1,4)=46.61, p<0.01). (...)”

Note that the p-value for the overall model was p=0.002407, which is the same as the p-value on the right-hand side of the coefficients table in the row that starts with “sexmale”. This is because your model had only one fixed effect (namely, “sex”) and so the significance of the overall model is the same as the significance for this coefficient.

mean(my.df [my.df$sex == "female",]$pitch)

... you'll get the mean of the female voice pitch values, and you'll see that this value is very similar to the estimate in the "(Intercept)" row.

Next, note that the estimate for "sexmale" is negative. Because the intercept is the female mean, adding this negative estimate to the intercept (226.33 − 98.33) gives 128, which is the mean of the male voice pitches (you can verify this by repeating the above command with "male" in place of "female").

mean(my.df[my.df$sex == "male", ]$pitch)

To sum up, the estimate for "(Intercept)" is the estimate for the female category, and the estimate for "sexmale" is the estimate for the difference between the female and the male categories.
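
A quick numerical check of this, using the model fitted above:

coef(xmdl)                                          # intercept (female mean) and sexmale difference
coef(xmdl)["(Intercept)"] + coef(xmdl)["sexmale"]   # about 128, the male mean
mean(my.df[my.df$sex == "male", ]$pitch)            # same value, computed directly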

Create data for exploring if age affects voice pitch:

print(my.df_2 <- data.frame(Age, Voice.Pitch))
scatterplot(my.df_2$Age, my.df_2$Voice.Pitch)
qplot(my.df_2$Age, my.df_2$Voice.Pitch)

Run the second model

summary(xmdl_2 <- lm(Voice.Pitch ~ Age, my.df_2))

Again, the significance of the intercept is not very interesting. Remember that the p-value in each row is simply a test of whether the coefficient to its left is significantly different from zero. The intercept (267.0765) here is the predicted pitch value for people at age 0. This intercept doesn't make much sense, because people who are not born yet don't really have a voice pitch.

What really interests us is "Age", which emerges as a significant "predictor" of voice pitch. The way to read the output for Age ("-0.9099") is that for every increase of age by one year, voice pitch decreases by 0.9099 Hertz. Easy-peasy: just go one step to the right in your graph (in your unit of measurement, here: age in years) and one step down (in your unit of measurement, here: voice pitch in Hz).
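
As a quick sanity check of this interpretation, you can compare the model's predictions for two ages one year apart; the difference between them is exactly the slope:

predict(xmdl_2, newdata = data.frame(Age = c(50, 51)))  # the second value is about 0.9099 Hz lower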

You might want to remedy the above-discussed situation that the intercept is meaningless. One way of doing this would be to simply subtract the mean age from each age value, as is done below:

plot(xmdl_2)
my.df_2$Age.c <- my.df_2$Age - (mean(my.df_2$Age))

Here, we just created a new column "Age.c" that is the age variable with its mean subtracted.

mean(my.df_2$Age)
my.df_2
summary(xmdl_3 <- lm(Voice.Pitch ~ Age.c, data = my.df_2))

This is the resulting coefficient table from running a linear model analysis of this "centered" data:

xmdl_3
mean(my.df_2$Voice.Pitch)

Note that while the intercept estimate has changed from 267.0765 (predicted voice pitch at age 0) to 230.8333 (predicted voice pitch at the average age), the slope hasn't changed, and neither has the significance associated with the slope or with the full model. That is, you haven't messed at all with the nature of your model; you just changed the metric so that the intercept is now the mean voice pitch. So, by centering our variable we made the intercept more meaningful.

Say you measured two factors ("age" and "sex"); you could put them in the same model. Your formula would then be:

pitch ~ sex + age + ε

Or, you could add dialect as an additional factor:

pitch ~ dialect + sex + age + ε

And so on and so on. The only thing that changes is the following: the p-value at the bottom of the output will be the p-value for the overall model. This means that the p-value considers how well all of your fixed effects together account for variation in pitch. The coefficient output will then have p-values for the individual fixed effects.
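
For illustration only (our toy vectors from above happen to line up, so this just shows the syntax, not a meaningful analysis with six data points), a model with two fixed effects could be fit like this:

my.df$Age <- Age                               # attach the age vector created earlier
summary(lm(pitch ~ sex + Age, data = my.df))   # two fixed effects in one model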

2. Assumptions of a linear model

  • Linearity
  • Absence of collinearity for fixed effects
  • Homoskedasticity
  • Normality of residuals
  • Independence

A commentary about "Normality of residuals":

The normality of residuals assumption is the least important one. Interestingly, many people seem to think it is the most important one, but it turns out that linear models are relatively robust against violations of the normality assumption. Researchers differ with respect to how much weight they put on checking it. For example, Gelman and Hill (2007), a famous book on linear models and mixed models, does not even recommend diagnostics of the normality assumption (ibid. 46).

This is a residual plot. The fitted values (the predicted means) are on the horizontal line (at y=0). The residuals are the vertical deviations from this line.

This view is just a rotation of the actual data (compare the residual plot with the scatterplot to see this). To construct the residual plot for yourself, simply type:

plot(fitted(xmdl_2),residuals(xmdl_2))

A comment about heteroskedasticity: "Being able to pronounce 'heteroskedasticity' several times in a row in quick succession will make you a star at your next cocktail party, so go ahead and rehearse pronouncing it now!"

Important consideration:

There aren't really that many data points here to tell whether this is homoscedastic. In this case, I would conclude that there isn't enough data to safely determine whether there is or isn't heteroskedasticity. Usually, I would construct models for much larger data sets anyway.

It's a good idea to generate some random data to see what a plot with roughly equal variances looks like. You can do so using the following command:

plot(rnorm(100),rnorm(100))

This creates two sets of 100 normally distributed random numbers with a mean of 0 and a standard deviation of 1. If you type this in multiple times to create multiple plots, you can get a feel for what a "normal" residual plot should look like.
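
For example, a minimal sketch that draws four such reference plots at once:

par(mfrow = c(2, 2))                        # arrange plots in a 2-by-2 grid
for (i in 1:4) plot(rnorm(100), rnorm(100))
par(mfrow = c(1, 1))                        # reset the plotting layout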

Normality of residuals:

hist(residuals(xmdl_3))
qqPlot(residuals(xmdl_3))
qqnorm(residuals(xmdl_3))

Checking for absence of influential data points. Here's a useful R function, dfbeta(), that you can use on a model object like our xmdl_3 from above.

dfbeta(xmdl_3)

More concretely, let's look at the Age.c column in the output above. The first row means that the coefficient for age (which, if you remember, was -0.9099) has to be adjusted by 0.06437573 if data point 1 is excluded. That means that the coefficient of the model without that data point would be -0.9742451 (which is -0.9099 minus 0.06437573; if the slope is negative, DFbeta values are subtracted, and if it's positive, they are added).

There's a little bit of room for interpretation in what constitutes a large or a small DFbeta value. One thing you can say for sure: any value that changes the sign of the slope is definitely an influential point that warrants special attention, because excluding that point would change the interpretation of your results.

What I then do is eyeball the DFbetas and look for values that are at least half of the absolute value of the slope. Say my slope were 2; then a DFbeta value of 1 or -1 would be alarming to me. If the slope were -4, a DFbeta value of 2 or -2 would be alarming to me.
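
A minimal sketch of that eyeballing rule in code (the one-half threshold is just the heuristic described above, not a formal test):

dfb <- dfbeta(xmdl_3)                        # DFbeta values for each data point
slope <- coef(xmdl_3)["Age.c"]
which(abs(dfb[, "Age.c"]) > abs(slope) / 2)  # rows whose exclusion changes the slope a lot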

Commentary on Independence:

When you violate the independence assumption, all hell breaks loose. The other assumptions mentioned above are important as well, but the independence assumption is by far the most important one. Violating independence may greatly inflate your chance of finding a spurious result, and it results in a p-value that is completely meaningless.

Extra tools for exploring a linear model:

You can test for heteroskedasticity using the function ncvTest in the car (Companion to Applied Regression) package, which implements the Breusch-Pagan test. Just run the function on your model:

ncvTest(your_model_here)
?ncvTest

If the chi-squared value is significant, with a p-value below an appropriate threshold (e.g., p < 0.05), then the null hypothesis of homoskedasticity is rejected and heteroskedasticity is assumed.
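
For example, applied to the age model from above:

ncvTest(xmdl_2)   # a small p-value would suggest heteroskedasticity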

You can test for autocorrelation in a model using the function durbinWatsonTest in the car package, which implements the Durbin-Watson test. Just run the function on your model:

durbinWatsonTest(your_model_here)

?durbinWatsonTest

d = 2 indicates no autocorrelation. The value of d always lies between 0 and 4. If the Durbin-Watson statistic is substantially less than 2, there is evidence of positive serial correlation.
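
Applied to the age model from above, for example:

durbinWatsonTest(xmdl_2)   # d close to 2 means little evidence of autocorrelation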

Second part of tutorial

Fixed and random effects

pitch ~ age + ε

We called "age" a fixed effect, and ε was our "error term" representing the deviations from our predictions due to "random" factors that we cannot control experimentally. You could call this the "probabilistic" or "stochastic" part of the model. Now we'll unpack this "ε" and add complexity to it. That is, we change the random aspect of our model, essentially leaving the systematic part unchanged. In mixed models, everything in the "systematic" part of your model works just like with the linear models in tutorial 1.

pitch ~ politeness + sex + ε

Our design was such that we took multiple measures per subject. That is, each subject gave multiple polite responses and multiple informal responses. If we go back to the discussion of the assumptions of the linear model in tutorial 1, we can immediately see that this would violate the independence assumption: multiple responses from the same subject cannot be regarded as independent from each other.

The way we're going to deal with this situation is to add a random effect for subject. This allows us to resolve the non-independence by assuming a different "baseline" pitch value for each subject.

Loading second dataset:

Please see the tutorial at [http://arxiv.org/pdf/1308.5499.pdf] for details on the data file.

library(readr)
politeness_data <- read_csv("politeness_data.csv")
politeness_data
Boxplot(frequency ~ subject, data = politeness_data)

Let's look at the relationship between politeness and pitch by means of a boxplot, using the "car" library:

Boxplot(frequency ~ attitude*gender, data = politeness_data,
        col = c("white", "lightgray"), notch = TRUE)

What do we see? In both cases, the median line (the thick line in the middle of the boxplot) is lower for the polite than for the informal condition. However, there may be a bit more overlap between the two politeness categories for males than for females.

You immediately see that males have lower voices than females (as is to be expected). But on top of that, within the male and the female groups, you see lots of individual variation, with some people having relatively higher values for their sex and others having relatively lower values.

We can model these individual differences by assuming different random intercepts for each subject. That is, each subject is assigned a different intercept value, and the mixed model estimates these intercepts for you.

In the mixed model, we add one or more random effects to our fixed effects. These random effects essentially give structure to the error term “ε”. In the case of our model here, we add a random effect for “subject”, and this characterizes idiosyncratic variation that is due to individual differences.

Updated formula looks like this:

pitch ~ politeness + sex + (1|subject) + ε

table(politeness_data$attitude, politeness_data$subject)
library(lme4)
summary(politeness.model <- lmer(frequency ~ attitude + (1|subject) + (1|scenario),
                                 data = politeness_data))
Anova(politeness.model)
confint(politeness.model)

Let's add gender as an additional fixed effect:

summary(politeness.model_2 <- lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario),
                                   data = politeness_data))
Anova(politeness.model_2)
confint(politeness.model_2)

I'm already using the R-typical notation format here. What this is saying is "assume an intercept that's different for each subject", and "1" stands for the intercept here.

But we're not done yet. In the design that we used in Winter and Grawunder (2012), there's an additional source of non-independence that needs to be accounted for: we had different items. One item, for example, was an "asking for a favor" scenario. Here, subjects had to imagine asking a professor for a favor (polite condition), or asking a peer for a favor (informal condition).

Another item was an “excusing for coming too late” scenario, which was similarly divided between polite and informal. In total, there were 7 such different items.

Here's a visual representation of the by-item variability:

Boxplot(frequency ~ scenario, data = politeness_data)

The variation between items isn't as big as the variation between subjects, but there are still noticeable differences, and we had better account for them in our model. We do this by adding an additional random effect:

pitch ~ politeness + sex + (1|subject) + (1|item) + ε

So, on top of different intercepts for different subjects, we now also have different intercepts for different items.
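
You can inspect these estimated intercepts directly; for the model fitted above, ranef() lists each subject's and each scenario's deviation from the overall intercept:

ranef(politeness.model)   # by-subject and by-scenario intercept offsets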

Statistical significance

Comparing models using the likelihood ratio test

A. Constructing a null model:

summary(politeness.null <- lmer(frequency ~ gender + (1|subject) + (1|scenario),
                                data = politeness_data, REML = FALSE))
summary(politeness.model_3 <- lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario),
                                   data = politeness_data, REML = FALSE))

Now you have two models to compare with each other: one with the effect in question, one without. We perform the likelihood ratio test using the anova() function:

anova(politeness.null,politeness.model_3)

You would report this result the following way:

"... politeness affected pitch (χ²(1) = 11.62, p = 0.00065), lowering it by about 19.7 Hz ± 5.6 (standard errors) ..."

We could have also compared the following two models:

full model: frequency ~ attitude + gender

reduced model: frequency ~ 1

It might be a good idea to try out different likelihood comparisons with the data provided above, say "attitude*gender" versus "attitude + gender" versus simply "1" (the intercept-only model), as sketched below. Remember to always set REML=FALSE when creating your models.
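
A sketch of those comparisons (the new object names here are just examples):

politeness.interaction <- lmer(frequency ~ attitude*gender + (1|subject) + (1|scenario),
                               data = politeness_data, REML = FALSE)
politeness.intercept <- lmer(frequency ~ 1 + (1|subject) + (1|scenario),
                             data = politeness_data, REML = FALSE)
anova(politeness.intercept, politeness.model_3)    # do attitude and gender help at all?
anova(politeness.model_3, politeness.interaction)  # does the interaction improve the fit?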

Random slopes and random intercepts

Coefficients of the model by subject and by item:

coef(politeness.model_2)

So, what we need is a random slope model, where subjects and items are not only allowed to have differing intercepts, but where they are also allowed to have different slopes for the effect of politeness. This is how we would do this in R:

summary(politeness.model_4 <- lmer(frequency ~ attitude + gender + (1+attitude|subject) + 
                             (1+attitude|scenario), data=politeness_data, REML=FALSE))

Note that the only thing that we changed is the random effects, which now look a little more complicated. The notation “(1+attitude|subject)” means that you tell the model to expect differing baseline-levels of frequency (the intercept, represented by 1) as well as differing responses to the main factor in question, which is “attitude” in this case. You then do the same for items.

Take a look at the coefficients of this model (random slopes and intercepts):

coef(politeness.model_4)

Now, the column with the by-subject and by-item coefficients for the effect of politeness (“attitudepol”) is different for each subject and item.

Have a look at the column for gender. Here, the coefficients do not change. That is because we didn't specify random slopes for the by-subject or by-item effect of gender.

Another important consideration about random slopes:

Moreover, researchers in ecology (Schielzeth & Forstmeier, 2009), psycholinguistics (Barr, Levy, Scheepers, & Tily, 2013) and other fields have shown via simulations that mixed models without random slopes are anti-conservative or, in other words, have a relatively high Type I error rate (they tend to find a lot of significant results that are actually due to chance).

The write-up

You need to describe the model to such an extent that people can reproduce the analysis. So, a useful heuristic for writing up your results is to ask yourself "Would I be able to re-create the analysis given the information that I provided?" If the answer is "yes", your write-up is good.

Some extra considerations (beyond the tutorial):

You have many options for assessing the mixed model.

For example, you may want to profile the model:

Profile of the model

A. Create an object for the graphs (choose any name you want; in this example it is named "prof_mix"):

prof_mix <- profile(your_model_here)

Then apply a series of graphs (from the lattice package; remember to load it first with library(lattice)):

densityplot(prof_mix)

splom(prof_mix)

xyplot(prof_mix)

You might also want to explore the model more deeply using the HLMdiag package:

Typing HLMdiag:: in RStudio will list the functions the package exports.
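
For example, a sketch assuming a recent version of HLMdiag (hlm_resid() and hlm_influence() are the current names, but they have changed across releases, so check the package documentation):

library(HLMdiag)
hlm_resid(politeness.model_4)       # residuals at multiple levels of the model
hlm_influence(politeness.model_4)   # influence diagnostics such as Cook's distance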

If you used the lme4 package, please cite:

citation("lme4")

For more information about the lme4 package you may explore this excellent article:

Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

Making predictions about a parameter of the model:

Posterior predictive simulation

A. Create an object and apply the function to your model.

In this case you want to know the predictive value of the model for the mean of the outcome:

meanvec <- sapply(simulate(your_model_here, nsim = 1000), mean)

Compute the same statistic (here, the mean) from your observed response variable:

obsval <- mean(your_response_here, na.rm = TRUE)

Assess where the observed mean falls within the simulated distribution:

post.pred.p <- mean(obsval >= c(obsval, meanvec))

Look at your answer:

post.pred.p

Doing an analysis of deviance table for fixed effects:

Anova(your_model_here)

Extracting info from the model:

extractAIC(your_model_here)

logLik(your_model_here)

model.matrix(your_model_here)

terms(your_model_here)

VarCorr(your_model_here)

fixef(your_model_here)

...and so on.

Diagnostic graphics:

plot(your_model_here)

plot(your_model_here, type = c("p", "smooth"))

(For merMod objects, plot() dispatches to lme4's method, which mirrors plot.lme from nlme.)

ranef(your_model_here)

Quantile-quantile plots (from lattice):

qqmath(your_model_here, id = 0.05)

plot(your_model_here, sqrt(abs(resid(.))) ~ fitted(.))

plot(your_model_here, sqrt(abs(resid(.))) ~ fitted(.), type = c("p", "smooth"))