Within Subjects ANOVA

Lecture 17

Dave Brocker

Farmingdale State College

What is ANOVA?

ANOVA (Analysis of Variance), like t-tests, compares the means of different groups to determine if they differ significantly from one another:

Independent variable: Categorical / Nominal
Dependent variable: Continuous

When to Use Repeated Measures ANOVA

Use Repeated Measures ANOVA when:

You have the same participants measured more than once (e.g., pre-test/post-test).
You’re interested in changes over time or under different conditions.
Example: Testing whether students’ enjoyment of statistics improves after receiving lollipops for 2 weeks.

Between vs. Within Subjects ANOVA

Between-Subjects ANOVA:

Compares means between groups (e.g., Group A vs. Group B).
Error comes from individual differences between people.

Within-Subjects (Repeated Measures) ANOVA:

Compares means within the same participants over time.
Controls for individual differences.

🍕 Pizza and Productivity Study

A Repeated-Measures Design

Objective: Test whether productivity changes across three timepoints:
1. Before lunch
2. 10 minutes after pizza
3. 1 hour after pizza
Same participants measured at each timepoint
DV = Number of emails written in 10 minutes

What type of test compares means across more than two repeated timepoints?

📊 Visualizing the Data

Code

library(tidyverse)

set.seed(123)
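data <- tibble(
  # One row per participant per timepoint, stacked timepoint by timepoint
  # so each block of 30 simulated scores lines up with its own timepoint
  id = rep(1:30, times = 3),
  time = factor(
    rep(c("Pre", "10minPost", "1hrPost"), each = 30),
    levels = c("Pre", "10minPost", "1hrPost")  # chronological order on the x-axis
  ),
  productivity = c(
    rnorm(30, mean = 5, sd = 1),    # Pre
    rnorm(30, mean = 7, sd = 1.5),  # 10minPost
    rnorm(30, mean = 4.5, sd = 1)   # 1hrPost
  )
)

data %>%
  ggplot(aes(x = time, y = productivity, group = id)) +
  geom_line(alpha = 0.3) +  # one faint line per participant
  stat_summary(aes(group = 1), fun = mean, geom = "line", color = "red", linewidth = 1.5) +
  labs(title = "Individual and Mean Productivity Over Time") +
  theme_minimal()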

🧪 Why Repeated-Measures ANOVA?

Running several uncorrected paired-samples t-tests inflates the overall Type I error rate
Repeated-Measures ANOVA allows us to:
- Compare within-subject changes across time
- Control for individual variability
- Test for an overall time effect
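
To put a rough number on the Type I error problem, here is a small sketch. It assumes the three pairwise tests are independent, which is not exactly true in a repeated-measures design, so treat the result as an approximation rather than an exact rate.

```r
alpha   <- 0.05                 # per-test Type I error rate
k       <- 3                    # number of timepoints in the pizza study
n_tests <- choose(k, 2)         # number of pairwise comparisons (3)

# Chance of at least one false positive across all tests,
# under the simplifying assumption that the tests are independent
familywise <- 1 - (1 - alpha)^n_tests
round(familywise, 3)            # 0.143
```

With three uncorrected tests, the chance of at least one false positive is roughly 14% rather than 5%.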

📐 Assumptions of RM ANOVA

Normality of differences
Sphericity: Equal variances of pairwise differences
- Test: Mauchly’s Test
Violation → Use Greenhouse-Geisser correction

Always check sphericity if more than two levels of the within-subject factor
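
One way to run Mauchly's test in base R is sketched below. The data are simulated, and the object names (`scores`, `subject`, `sph`) are made up for this illustration; the test needs the scores in wide format, one column per timepoint.

```r
set.seed(123)
n <- 30
subject <- rnorm(n)   # hypothetical stable differences between people

# Hypothetical wide-format data: one row per participant, one column per timepoint
scores <- cbind(
  Pre       = 5   + subject + rnorm(n),
  Post10min = 7   + subject + rnorm(n),
  Post1hr   = 4.5 + subject + rnorm(n)
)

fit <- lm(scores ~ 1)               # multivariate, intercept-only model
sph <- mauchly.test(fit, X = ~ 1)   # sphericity of the within-subject contrasts
sph$p.value                         # a small p-value suggests sphericity is violated
```

If the p-value is below .05, the usual advice is to report a corrected test (for example Greenhouse-Geisser) instead of the uncorrected F.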

🧮 Running the Analysis in R
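
A minimal sketch of the analysis using base R's `aov()`. The data are simulated here so the block runs on its own; `pizza`, `subject`, and `fit` are illustrative names, and the numbers will not match the example output on the next slide.

```r
set.seed(123)
n <- 30
times <- c("Pre", "10minPost", "1hrPost")
subject <- rnorm(n)   # hypothetical stable differences between people

# Long format: one row per participant per timepoint
pizza <- data.frame(
  id   = factor(rep(1:n, times = 3)),
  time = factor(rep(times, each = n), levels = times),
  productivity = c(5   + subject + rnorm(n),
                   7   + subject + rnorm(n),
                   4.5 + subject + rnorm(n))
)

# Error(id) tells aov() that the scores are nested within participants
fit <- aov(productivity ~ time + Error(id), data = pizza)
summary(fit)
```

The "Error: Within" part of the summary holds the test of time, with df = (2, 58) for 3 timepoints and 30 participants.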

✅ Interpretation

Example output:

F(2, 58) = 12.3, p < .001
→ Significant effect of time on productivity

Conclusion: Productivity changes significantly across timepoints

What would be your next step?

🔍 Follow-Up: Post-hoc Comparisons

Run paired t-tests with Bonferroni correction
Which timepoints differ?
- Pre vs. 10minPost
- 10minPost vs. 1hrPost
- Pre vs. 1hrPost
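
A sketch of these three comparisons with base R's `pairwise.t.test()`. The data are simulated and the variable names are illustrative. Note that `paired = TRUE` matches scores by position, so the rows must be in the same participant order within every timepoint.

```r
set.seed(123)
n <- 30
times <- c("Pre", "10minPost", "1hrPost")
subject <- rnorm(n)   # hypothetical stable differences between people

# Scores stacked timepoint by timepoint, participants in the same order each time
productivity <- c(5   + subject + rnorm(n),
                  7   + subject + rnorm(n),
                  4.5 + subject + rnorm(n))
time <- factor(rep(times, each = n), levels = times)

posthoc <- pairwise.t.test(productivity, time,
                           paired = TRUE,
                           p.adjust.method = "bonferroni")
posthoc$p.value   # matrix of Bonferroni-adjusted p-values, one per pair
```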

ANOVA

Between subjects ANOVA compares means BETWEEN groups to determine if they differ significantly from one another.
Within subjects ANOVA compares pre-manipulation means to post-manipulation means to determine if there is a difference WITHIN participants over time.

Within subjects ANOVA

Within subjects ANOVA examines differences within participants over time.
Within subjects ANOVA is also known as Repeated Measures ANOVA, because it looks at one measure repeated over time.

Within subjects ANOVA

The independent variable is time.

Before intervention: baseline or pre-test
After intervention: post-test

Within Subjects ANOVA

Within subjects ANOVA can look at 2 or more time points.
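
With exactly two time points, within subjects ANOVA and the paired-samples t-test give the same answer: F equals t squared. The sketch below checks this on simulated data (all names and values are made up for illustration).

```r
set.seed(42)
n <- 20
pre  <- rnorm(n, mean = 5)           # hypothetical pre-test scores
post <- pre + rnorm(n, mean = 0.5)   # hypothetical post-test scores

# Paired-samples t-test
t_val <- unname(t.test(post, pre, paired = TRUE)$statistic)

# The same data in long format for a within subjects ANOVA
long <- data.frame(
  id    = factor(rep(1:n, times = 2)),
  time  = factor(rep(c("pre", "post"), each = n), levels = c("pre", "post")),
  score = c(pre, post)
)
fit   <- aov(score ~ time + Error(id), data = long)
f_val <- summary(fit)[["Error: Within"]][[1]][["F value"]][1]

all.equal(t_val^2, f_val)   # F from the ANOVA matches t squared
```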

WITHIN Subjects ANOVA

Code

library(ggplot2)
library(dplyr)
library(gt)
library(gtsummary)
library(broom)
library(afex)

# Generate data for two normal distributions
x <- seq(-6, 6, length = 100)
null_dist <- dnorm(x, mean = 0, sd = 1)  # density under the null hypothesis
alt_dist <- dnorm(x, mean = 2, sd = 1)   # density under the alternative hypothesis

data <- data.frame(
  x = rep(x, 2),
  y = c(null_dist, alt_dist),
  # Set the levels explicitly so each curve keeps its own label in the legend
  hypothesis = factor(
    rep(c("Null Hypothesis", "Alternative Hypothesis"), each = length(x)),
    levels = c("Null Hypothesis", "Alternative Hypothesis")
  )
)

# Create the plot: one curve per hypothesis, distinguished by line type

plt <- 
  ggplot(data, aes(x = x, y = y, 
                 lty = hypothesis)) +
  geom_line(color = "purple") +
  theme_linedraw() + 
  labs(
    x = "",
    y = "",
    lty = "Hypotheses"
  ) + 
  theme(
    panel.grid = element_blank(),
    legend.position = "inside",  # needs ggplot2 3.5.0 or later
    legend.position.inside = c(.2,.5),
    legend.background = element_rect(color = "black")
  )

plt

Between subjects ANOVA

In between subjects ANOVA, we analyze two types of variance:

Between Group variance: How the data varies between the groups
Error variance: How the data varies among participants within each group

WITHIN subjects ANOVA

In within subjects ANOVA the variance is more complicated, because data will differ in 3 ways:

Over time
Error variance: How the data varies across all participants
Per participant

WITHIN subjects ANOVA Example

In within subjects ANOVA the variance is more complicated, because data will differ in 3 ways:

Over time: Mood scores before watching Dark are more similar to one another than they are to mood scores after watching Dark.
Error variance: Mood will have natural variance across participants and time.
Per participant: Each participant’s scores will be more like their own other scores than like the scores of another participant.

WITHIN subjects ANOVA

In within subjects ANOVA the variance is more complicated, because data will differ in 3 ways:

Time
Error
Subjects

WITHIN subjects ANOVA

Code

flowchart LR
      A(Total Variability) --> B(Time Variability)
      A --> C(Within-Groups Variability)
      C --> D(Subject Variability)
      C --> E(Error Variability)

Calculating F

WITHIN subjects ANOVA

What is the F ratio in between subjects ANOVA?

\(F = \frac{MS_{BG}}{MS_{Error}}\)

\(MS_{BG} = \frac{SS_{BG}}{df_{BG}}\)

\(MS_{Error} = \frac{SS_{Error}}{df_{Error}}\)

WITHIN subjects ANOVA

We want to measure 2 sources of variance:

\(SS_{Time}\): this is just like the SS from between subjects ANOVA
\(SS_{Error}\): this is harder to calculate in within subjects ANOVA, because of the subject variance.

WITHIN subjects ANOVA

\(SS_{WG} = SS_{Subjects} + SS_{Error}\)

\(SS_{Error} = SS_{WG} - SS_{Subjects}\)

Calculating mean squares (MS)

We calculate a SS for the variance between the time points (\(SS_{Time}\)).
This is just like the \(SS_{BG}\) from between subjects ANOVA.
We calculate a SS for the within group variance (\(SS_{WG}\)).
We calculate a SS for the variance by person (\(SS_{Subjects}\)).

Calculating mean squares (MS)

To find \(SS_{Error}\), we subtract \(SS_{Subjects}\) from \(SS_{WG}\).

Calculating mean squares (MS)

\(MS_{Time} = \frac{SS_{Time}}{df_{Time}}\)
\(MS_{Error} = \frac{SS_{WG} - SS_{Subjects}}{df_{Error}}\)

Degrees of freedom

Time: \(df_{Time} = k - 1\)
Error: \(df_{Error} = (n - 1) \times (k - 1)\)
- k is the number of time points
- n is the number of participants

Calculating mean squares (MS)

\(MS_{Time} = \frac{SS_{Time}}{df_{Time}} = \frac{SS_{Time}}{k - 1}\)
\(MS_{Error} = \frac{SS_{WG} - SS_{Subjects}}{df_{Error}} = \frac{SS_{WG} - SS_{Subjects}}{(n - 1) \times (k - 1)}\)

WITHIN subjects ANOVA

The F ratio in within subjects ANOVA:

\(F = \frac{MS_{Time}}{MS_{Error}}\)
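
The formulas above can be checked by hand on a small simulated dataset. This is only a sketch: the data and object names are invented, and the last lines compare the hand-computed F with the one from `aov()`.

```r
set.seed(1)
n <- 10   # participants
k <- 3    # time points
subject <- rnorm(n, sd = 2)   # hypothetical stable differences between people

# Rows = participants, columns = time points
scores <- sapply(c(10, 8, 7), function(m) m + subject + rnorm(n))
grand  <- mean(scores)

ss_time     <- n * sum((colMeans(scores) - grand)^2)      # between time points
ss_wg       <- sum(sweep(scores, 2, colMeans(scores))^2)  # within each time point
ss_subjects <- k * sum((rowMeans(scores) - grand)^2)      # between participants
ss_error    <- ss_wg - ss_subjects

ms_time  <- ss_time  / (k - 1)
ms_error <- ss_error / ((n - 1) * (k - 1))
F_val    <- ms_time / ms_error

# Same analysis with aov() on the long-format version of the data
long <- data.frame(
  id    = factor(rep(1:n, times = k)),
  time  = factor(rep(1:k, each = n)),
  score = as.vector(scores)
)
aov_F <- summary(aov(score ~ time + Error(id), data = long))[["Error: Within"]][[1]][["F value"]][1]
all.equal(F_val, aov_F)
```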

ANOVA

Prof. Brocker wants to know if giving lollipops to statistics students improves their opinion of statistics. He asks 100 students to rate how much they enjoy statistics on a scale of 1 (not at all) to 10 (extremely). Then he gives each student a lollipop at the start of statistics class for 2 weeks. After two weeks of conditioning, Prof. Brocker asks participants to rate how much they enjoy statistics again.

What’s k?
What’s n?

Practice

Prof. Brocker wants to know if giving lollipops to statistics students improves their opinion of statistics. He asks 100 students to rate how much they enjoy statistics on a scale of 1 (not at all) to 10 (extremely). Then he gives each student a lollipop at the start of statistics class for 2 weeks. After two weeks of conditioning, Prof. Brocker asks participants to rate how much they enjoy statistics again.

Code

tribble(
~Source, ~SS, ~df, ~MS, ~F,
"Time","49", "","","",
"Within Group","912","","","",
"Subjects","219","","","",
"Error","","","",""
) |> 
  gt() |> 
    cols_align(align = "center",
             columns = SS:F)

Source	SS	df	MS	F
Time	49
Within Group	912
Subjects	219
Error

Practice

Code

tribble(
~Source, ~SS, ~df, ~MS, ~F,
"Time","49", "1","49","7.00",
"Within Group","912","198","","",
"Subjects","219","99","2.21","",
"Error","693","99","7.00",""
) |> 
  gt() |> 
    cols_align(align = "center",
             columns = SS:F)

Source	SS	df	MS	F
Time	49	1	49	7.00
Within Group	912	198		
Subjects	219	99	2.21	
Error	693	99	7.00	
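
The arithmetic for this practice problem, worked step by step from the SS values in the table and the formulas from the earlier slides (n = 100 students, k = 2 time points). Variable names are illustrative.

```r
n <- 100   # students
k <- 2     # time points (before and after the lollipops)

ss_time     <- 49
ss_wg       <- 912
ss_subjects <- 219

ss_error <- ss_wg - ss_subjects    # 912 - 219 = 693
df_time  <- k - 1                  # 1
df_error <- (n - 1) * (k - 1)      # 99 * 1 = 99

ms_time  <- ss_time / df_time      # 49
ms_error <- ss_error / df_error    # 693 / 99 = 7
F_val    <- ms_time / ms_error     # 49 / 7 = 7
F_val
```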

ANOVA

Dr. Grace believes watching Shameless makes Zoomers happy. She asks 50 Zoomers to rate their happiness on a scale of 1 (no happies) to 10 (all the happies). Then she shows them an episode of Shameless. After each Zoomer watches one episode of Shameless, she asks them to rate their happiness again on the same scale of 1 to 10.

What’s k?
What’s n?

Practice

ANOVA

Prof. Brocker measures the aging anxiety of 500 middle-aged men. He then shows them images from anti-aging advertisements featuring young men. After they view the images, participants report their aging anxiety once again. Two weeks later, participants are asked to report their aging anxiety one last time.

What’s k?
What’s n?

Practice

Interpreting F

\(MS_{Time}\) is made up of \(MS_{Error}\) plus the theoretical difference over time. If there is no change over time:

\(MS_{Time} = \text{change over time} + MS_{Error}\)
\(MS_{Time} = 0 + MS_{Error}\)
\(MS_{Time} = MS_{Error}\)
\(F = 1\)

Interpreting F

When F <= 1, it’s not likely to be significant.

Interpreting F

If there is a change over time (an effect), \(MS_{Time}\) is larger than \(MS_{Error}\):

\(MS_{Time} = \text{change over time} + MS_{Error}\)
\(MS_{Time} = \text{effect} + MS_{Error}\)
\(F > 1\)

Hypothesis testing: ANOVA

F <= 1 → Fail to reject the Null Hypothesis
F > 1 → Refer to the p-value → Reject the Null Hypothesis only if p < .05; otherwise, fail to reject
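
In R, the p-value for an F can be looked up with `pf()`, and the cutoff F with `qf()`. The values below are illustrative (F = 7 with 1 and 99 degrees of freedom), and alpha = .05 is an assumption of this sketch.

```r
F_val    <- 7    # illustrative F value
df_time  <- 1
df_error <- 99
alpha    <- 0.05

# Probability of an F at least this large if the Null Hypothesis is true
p_val <- pf(F_val, df_time, df_error, lower.tail = FALSE)

# Smallest F that would be significant at this alpha
f_crit <- qf(1 - alpha, df_time, df_error)

p_val < alpha      # TRUE means reject the Null Hypothesis
F_val > f_crit     # the same decision, made by comparing F to the cutoff
```

Both comparisons always lead to the same decision; the p-value is just the more common thing to report.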

How many F’s do you Get?

One-way Between Subjects?
- 1 for 1 IV
Two-way Between Subjects with a 2x2 design?
- 3 (One for each IV and one for the Interaction)
Within Subjects?
- 1

Reporting F

If asked to report findings in terms of the Null Hypothesis (H0), you should report findings as:

Reject H₀, or
Fail to Reject H₀

Reporting F

If asked to report findings in general or for publication, you need to report:

F(df time, df error) = F-value, p-value

Reporting F

Mood after watching Dark was not significantly different from mood before watching Dark, F(1, 499) = 1.02, p = .31.

There was no significant change in mood over time, F(1, 499) = 1.02, p = .31.

Reporting F

Participants’ mood was significantly better after watching Dark compared to before watching it, F(1, 499) = 7.12, p < .05.

Reporting F

If asked to report findings in general or for publication, you need to report:

F(df time, df error) = F-value, p-value
If the results are significant, we have to also report the means and standard deviations for each time point.

Reporting F

Participants’ anxiety differed significantly across time, F(2, 198) = 3.744, p = .025. Anxiety was highest at pre-test (M = 19.58, s = 0.643). Anxiety was lowest at baseline (M = ___, s = ___). Anxiety at post-test was M = ___, s = ___.
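
A small helper for building the F(df time, df error) = F-value, p-value string can cut down on copying mistakes. `report_F()` is a hypothetical function written for this slide, not part of any package; it computes the p-value from F and the two df with `pf()`.

```r
# Hypothetical helper: format an F test for a write-up
report_F <- function(F_val, df_time, df_error) {
  p <- pf(F_val, df_time, df_error, lower.tail = FALSE)
  p_txt <- if (p < .001) {
    "p < .001"
  } else {
    # Drop the leading zero, e.g. "p = 0.250" becomes "p = .250"
    sub("0.", ".", sprintf("p = %.3f", p), fixed = TRUE)
  }
  sprintf("F(%d, %d) = %.2f, %s", df_time, df_error, F_val, p_txt)
}

report_F(3.744, 2, 198)
```

Means and standard deviations for each time point still have to be added by hand when the result is significant.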

Practice

Was anxiety just before the Stats Exam significantly higher than baseline anxiety?

Data Manipulation

When collected, data can be in either wide or long format.

Below, the same design (five participants, three time points) is shown in each format. The two tables are simulated separately, so the values differ:

Code

data.frame(
  Participant = factor(1:5),
  Immediate = rnorm(5, mean = 8, sd = 1.5),
  After24Hours = rnorm(5, mean = 6, sd = 1.5),
  After1Week = rnorm(5, mean = 4, sd = 1.5)
) |> 
  gt() |> 
  tab_header(title = "Wide Data",
             subtitle = "Each row represents an individual participant")

Wide Data
Each row represents an individual participant
Participant	Immediate	After24Hours	After1Week
1	7.629962	3.498087	1.573176
2	7.478686	5.429660	3.916657
3	6.572572	7.378495	4.779111
4	7.932458	5.136980	4.451730
5	6.822643	6.911946	4.158514

Code

data.frame(
  Participant = factor(1:5),
  Immediate = rnorm(5, mean = 8, sd = 1.5),
  Day = rnorm(5, mean = 6, sd = 1.5),
  Week = rnorm(5, mean = 4, sd = 1.5)
) |> 
  tidyr::pivot_longer(cols = !Participant,
                      names_to = "Time",
                      values_to = "Stress") |> 
  gt() |> 
  fmt_auto() |> 
  tab_header(title = "Long Data",
             subtitle = "Each row represents an individual participant's score on each level of the treatment")

Long Data
Each row represents an individual participant's score on each level of the treatment
Participant	Time	Stress
1	Immediate	10.807
1	Day	5.637
1	Week	3.411
2	Immediate	8.904
2	Day	7.676
2	Week	4.011
3	Immediate	6.851
3	Day	7.777
3	Week	0.258
4	Immediate	7.07
4	Day	8.47
4	Week	2.534
5	Immediate	9.185
5	Day	6.289
5	Week	4.943
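
To see that wide and long really are two views of the same data, the sketch below reshapes one simulated dataset to long format and back again with `tidyr`. The object names (`wide`, `long`, `back`) and the `Score` column are illustrative.

```r
library(tidyr)

set.seed(100)
wide <- data.frame(
  Participant  = factor(1:5),
  Immediate    = rnorm(5, mean = 8, sd = 1.5),
  After24Hours = rnorm(5, mean = 6, sd = 1.5),
  After1Week   = rnorm(5, mean = 4, sd = 1.5)
)

# Wide -> long: one row per participant per time point
long <- pivot_longer(wide,
                     cols = !Participant,
                     names_to = "Time",
                     values_to = "Score")

# Long -> wide: back to one row per participant
back <- pivot_wider(long, names_from = Time, values_from = Score)

nrow(wide)   # 5 rows: one per participant
nrow(long)   # 15 rows: 5 participants x 3 time points
all.equal(as.data.frame(back), wide)
```

Most repeated-measures functions in R, including `aov()` with an `Error()` term, expect the long format.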

More Examples!

1. Memory Recall Study

Code

set.seed(100)

# Memory Recall Study
memory_data <- data.frame(
  Participant = factor(1:30),
  Immediate = rnorm(30, mean = 8, sd = 1.5),
  After24Hours = rnorm(30, mean = 6, sd = 1.5),
  After1Week = rnorm(30, mean = 4, sd = 1.5)
)

# Make Data Longer
memory_data_long <- 
  memory_data |> 
  tidyr::pivot_longer(cols = !Participant,
                      names_to = "Time",
                      values_to = "Memory")


aov(Memory ~ Time + Error(Participant), data = memory_data_long) |> 
  tidy() |> 
  mutate(
    Source = term,
    SS = sumsq,
    MS = meansq,
    # Residual rows have no F or p, so show a placeholder instead of NA
    `F` = ifelse(is.na(statistic),"--",statistic |> round(3)),
    # Show "<.05" for significant results, otherwise the rounded p-value
    p = ifelse(p.value < .05,"<.05",as.character(round(p.value, 3))),
    p = ifelse(is.na(p),"--",p)
  ) |> 
  select(Source, df, SS, MS, `F`, p) |> 
  gt() |> 
  fmt_auto()

Source	df	SS	MS	F	p
Residuals	29	60.124	2.073	--	--
Time	2	253.722	126.861	53.001	<.05
Residuals	58	138.826	2.394	--	--

Code

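memory_data_long |> 
  # Keep the time points in chronological order on the x-axis
  mutate(Time = factor(Time, levels = c("Immediate", "After24Hours", "After1Week"))) |> 
  ggplot(aes(Time,Memory, color = Time)) + 
  # Black line connecting the mean at each time point
  stat_summary(
    fun.data = "mean_se",
    geom = "line",
    group = 1,
    color = "black"
  ) + 
  geom_jitter(alpha = .2) + 
  # Error bars: mean plus or minus one standard error
  stat_summary(
    geom = "errorbar",
    fun.data = "mean_se",
    width = .2
  ) + 
  theme_minimal()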

2. Clinical Trial

Code

set.seed(100)

# Clinical Trial Study
clinical_data <- data.frame(
  Participant = factor(1:30),
  LowDose = rnorm(30, mean = 5, sd = 1),
  MediumDose = rnorm(30, mean = 6.5, sd = 1),
  HighDose = rnorm(30, mean = 8, sd = 1)
)

clinical_data_long <- 
  clinical_data |> 
  tidyr::pivot_longer(
    cols = !Participant,
    names_to = "Dosage",
    values_to = "DV"
  )

aov(DV ~ Dosage + Error(Participant), data = clinical_data_long) |> 
  tidy() |> 
  gt()

stratum	term	df	sumsq	meansq	statistic	p.value
Participant	Residuals	29	26.72174	0.9214392	NA	NA
Within	Dosage	2	128.73321	64.3666071	60.50637	6.394066e-15
Within	Residuals	58	61.70033	1.0637988	NA	NA

Code

clinical_data_long |> 
  # Order the doses from low to high on the x-axis
  mutate(Dosage = factor(Dosage, levels = c("LowDose", "MediumDose", "HighDose"))) |> 
  ggplot(aes(Dosage,DV, color = Dosage)) + 
  # Black line connecting the mean at each dose
  stat_summary(
    fun.data = "mean_se",
    geom = "line",
    group = 1,
    color = "black"
  ) + 
  geom_jitter(alpha = .2) + 
  # Error bars: mean plus or minus one standard error
  stat_summary(
    geom = "errorbar",
    fun.data = "mean_se",
    width = .2
  ) + 
  theme_minimal()

3. Behavioral Therapy Progress

Code

set.seed(100)

# Behavioral Therapy Study
therapy_data <- data.frame(
  Participant = factor(1:30),
  Baseline = rnorm(30, mean = 10, sd = 2),
  MidTreatment = rnorm(30, mean = 7, sd = 2),
  PostTreatment = rnorm(30, mean = 5, sd = 2)
)

therapy_data_long <- 
  therapy_data |> 
  tidyr::pivot_longer(
    cols = !Participant,
    names_to = "Treatment_Group",
    values_to = "DV"
  )

aov_ez(
  "Participant",
  "DV",
  therapy_data_long, 
  within = "Treatment_Group",
  anova_table = list(correction = "none"),
  return = "nice"
) |> gt()

Effect	df	MSE	F	ges	p.value
Treatment_Group	2, 58	4.26	46.88 ***	.530	<.001

Code

therapy_data_long |> 
  ggplot(aes(Treatment_Group,DV,color = Treatment_Group)) +
  geom_jitter(alpha = .2) + 
  stat_summary(
    fun.data = "mean_se",
    geom = "errorbar",
    width = .2
  ) + 
  stat_summary(
    fun.data = "mean_se",
    geom = "line",
    group = 1,
    color = "black"
  ) + 
  theme_minimal()

4. Consumer Product Ratings

Code

set.seed(100)

# Consumer Product Study
product_data <- data.frame(
  Participant = factor(1:30),
  ProductA = rnorm(30, mean = 6, sd = 1),
  ProductB = rnorm(30, mean = 7, sd = 1),
  ProductC = rnorm(30, mean = 5, sd = 1)
)

product_data_long <- 
  product_data |> 
  tidyr::pivot_longer(cols = !Participant,
                      names_to = "products",
                      values_to = "ratings")

aov(ratings ~ products + Error(Participant), data = product_data_long) |>
  tidy() |> 
  gt()

stratum	term	df	sumsq	meansq	statistic	p.value
Participant	Residuals	29	26.72174	0.9214392	NA	NA
Within	products	2	68.46662	34.2333084	32.18025	3.961105e-10
Within	Residuals	58	61.70033	1.0637988	NA	NA

Code

product_data_long |> 
  ggplot(aes(products,ratings,fill = products)) + 
  stat_summary(
    fun = "mean",
    geom = "bar",
    width = .5
  ) +
  stat_summary(
    fun.data = "mean_se",
    geom = "errorbar",
    width = .2
  ) + 
  theme_minimal()

5. Physiological Stress Study

Code

set.seed(100)

# Physiological Stress Study
stress_data <- data.frame(
  Participant = factor(1:30),
  Rest = rnorm(30, mean = 50, sd = 5),
  Task = rnorm(30, mean = 70, sd = 5),
  Recovery = rnorm(30, mean = 55, sd = 5)
)

stress_data_long <- 
  stress_data |> 
  tidyr::pivot_longer(cols = !Participant,
                      names_to = "Time",
                      values_to = "Stress")

aov(Stress ~ Time + Error(Participant), data = stress_data_long) |>
  tidy() |> 
  gt() |> 
  fmt_auto()

stratum	term	df	sumsq	meansq	statistic	p.value
Participant	Residuals	29	668.043	23.036	NA	NA
Within	Time	2	6,802.717	3,401.358	127.895	5.456 × 10⁻²²
Within	Residuals	58	1,542.508	26.595	NA	NA

Code

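stress_data_long |> 
  # Keep the phases in the order they occurred on the x-axis
  mutate(Time = factor(Time, levels = c("Rest", "Task", "Recovery"))) |> 
  ggplot(aes(Time,Stress,color = Time)) + 
  geom_jitter(alpha = .2) + 
  # Error bars: mean plus or minus one standard error
  stat_summary(
    fun.data = "mean_se",
    geom = "errorbar",
    width = .2
  ) + 
  # Line connecting the mean at each phase
  stat_summary(
    fun.data = "mean_se",
    geom = "line",
    group = 1) + 
  theme_minimal()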