We use a t-test to test whether two samples have the same mean. More specifically, we test the null hypothesis that there is no difference between the means.
Similarly, ANOVA tells us whether the means of three or more samples are the same. ANOVA is an omnibus test: it tells us whether the means differ, but when they do, it does not tell us which means are unequal.
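To see how the two tests relate, here is a minimal sketch. It assumes R's built-in PlantGrowth dataset (not the labs data used in this post), restricted to two of its groups. With exactly two groups, one-way ANOVA and the equal-variance t-test should give the same p-value, and the F statistic should equal the square of the t statistic.

```r
# Illustration only: PlantGrowth ships with R and is NOT the labs data from this post.
two <- droplevels(subset(PlantGrowth, group != "ctrl"))  # keep trt1 and trt2 only

# Classical two-sample t-test with pooled variance (var.equal = TRUE)
tt <- t.test(weight ~ group, data = two, var.equal = TRUE)

# One-way ANOVA on the same two groups
fit <- aov(weight ~ group, data = two)
tab <- summary(fit)[[1]]  # the ANOVA table as a data frame

t_stat <- unname(tt$statistic)
f_stat <- tab[["F value"]][1]
p_anova <- tab[["Pr(>F)"]][1]

# Both comparisons should come out TRUE
all.equal(t_stat^2, f_stat)
all.equal(tt$p.value, p_anova)
```

With three or more groups the t-test no longer applies directly, which is where the omnibus ANOVA takes over.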
In this post we will perform an ANOVA test on a dataset and find out where the difference lies.
Download the data here
setwd("e:/r") #I have kept the data file in this location
d <- read.csv("labs.csv")
d
boxplot(d)
s <- stack(d) #This step is needed to prepare the data.
s #See how the arrangement of the data has changed
names(s) <- c("measure", "lab")
s
diff <- aov(measure ~ lab, data = s) #response variable (measure) comes first
summary(diff)
#We reject the null hypothesis of no difference because the p-value indicates
#a significant difference across the 3 labs.
#If there were no significant difference, the investigation would have ended here.
#Since there is a significant difference in this case, we need to find out where
#it lies, so we perform a pairwise comparison test. We have many
#options for this. One of them is Tukey's HSD test, which gives us confidence intervals.
#The classical form of this test assumes a balanced design, in other
#words, the number of data points for each lab must be the same (R's TukeyHSD
#applies an adjustment for mildly unbalanced designs). In our case we have a balanced design.
#Run the test
tk <- TukeyHSD(diff)
tk #print the result
#Output
#  Tukey multiple comparisons of means
#    95% family-wise confidence level
#
#Fit: aov(formula = measure ~ lab, data = s)
#
#$lab
#          diff        lwr     upr     p adj
#lab2-lab1 1.75 -1.8842297 5.38423 0.4584177
#lab3-lab1 6.25  2.6157703 9.88423 0.0008182
#lab3-lab2 4.50  0.8657703 8.13423 0.0137160
#Each of the last 3 rows contains the result of one pairwise comparison.
#Look at row 1: the diff column shows that the difference between the means of lab2 and lab1 is 1.75
#The lwr column gives the lower limit of the 95% confidence interval for the difference
#The upr column gives the upper limit of the 95% confidence interval for the difference
#Last column: p adj gives the adjusted p-value 0.4584177; we cannot reject the null hypothesis of no difference between lab2 and lab1
#Rows 2 and 3 indicate (p-values less than 0.05) that there are significant differences between lab3 & lab1, and between lab3 & lab2
#Let's take a look at a plot of these intervals
plot(tk)
#Take a look at the plot. The dotted line represents zero. Zero lies within the 95% confidence interval of the difference
#between lab2 and lab1, indicating that there is no significant difference between lab2 and lab1.
#But there are significant differences between lab3 & lab1, and between lab3 & lab2.
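The steps above can also be sketched end to end on a dataset that ships with R, which may help if labs.csv is not at hand. This assumes the built-in PlantGrowth data (three groups of plant weights, not the labs data), and the 0.05 cutoff and the variable names are illustrative choices rather than part of the original analysis.

```r
# Illustration only: PlantGrowth is a built-in dataset, not the labs data from this post.
pg <- PlantGrowth

# Check whether the design is balanced (same number of observations per group)
counts <- table(pg$group)
balanced <- length(unique(as.vector(counts))) == 1
counts
balanced

# Omnibus test: do the group means differ at all?
fit <- aov(weight ~ group, data = pg)
summary(fit)

# Pairwise comparisons with Tukey's HSD
tk_pg <- TukeyHSD(fit)
res <- tk_pg$group  # matrix with columns diff, lwr, upr, p adj

# Keep only the pairs whose adjusted p-value is below 0.05
sig <- res[res[, "p adj"] < 0.05, , drop = FALSE]
sig
```

A pair shows up in sig exactly when its interval in plot(tk_pg) does not cross the zero line, so the table and the plot give the same answer in two different forms.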
