IES 612/STA 4-573/STA 4-576
Spring 2005
Week 6 IES612-week06-lecture.doc
ANOVA MODELS: models for comparing means of different treatments or populations
Recall your old friend the two-group pooled variance t-test
H0: μ1 = μ2 [two populations do NOT differ in mean response]
Ha: μ1 ≠ μ2
Assumptions/Data?
Data from population 1: (y11, y12, . . . , y1n1)
Data from population 2: (y21, y22, . . . , y2n2)
Assume Yij ~ independent N(μi, σ²)
i = 1, 2
j = 1, 2, . . . , ni
Another way of writing this is Yij = μi + εij with εij ~ independent N(0, σ²)
In other words, the response of the jth observation in the ith population can be written in terms of the mean of the ith population + how this observation differs from the mean. Does this look familiar?
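Test Statistic?
$$ t_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
where
$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$
is the pooled variance and $s_i^2$ is the sample variance in group i. Under H0, $t_{obs}$ follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom.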
The pooled variance looks like something from regression. What?
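As a quick numerical check of the pooled-variance t statistic, here is a minimal Python sketch. The function name `pooled_t` is hypothetical, and the data are the vacuum and mixed log counts used in the SAS example later in these notes.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(y1, y2):
    """Pooled-variance two-sample t statistic, its df, and the pooled variance."""
    n1, n2 = len(y1), len(y2)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    t = (mean(y1) - mean(y2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, sp2

mixed = [7.41, 7.33, 7.04]
vacuum = [5.26, 5.44, 5.80]
t_obs, df, sp2 = pooled_t(mixed, vacuum)
print(round(t_obs, 2), df, round(sp2, 5))
```

The pooled variance returned here is the same quantity that appears as the error mean square in the regression and ANOVA output for these data.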
How about your new friend, regression?
Can we test H0: μ1 = μ2 [two populations do NOT differ in mean response] vs. Ha: μ1 ≠ μ2?
Assumptions/Data?
Data from population 1: (y11, y12, . . . , y1n1)
Data from population 2: (y21, y22, . . . , y2n2)
Assume Yij ~ independent N(μi, σ²), i = 1, 2, j = 1, 2, . . . , ni
Another way of writing this is Yij = μi + εij with εij ~ independent N(0, σ²)
Let X = 1 (if group 2) and X = 0 (if group 1) and Y = β0 + β1 X + ε
Then
Group 1: Y = β0 + ε
Group 2: Y = β0 + β1 + ε
Implying μ1 = β0 and μ2 = β0 + β1, so β1 = μ2 - μ1. Thus, H0: μ1 = μ2 and H0: β1 = 0 test the same hypothesis.
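The indicator-variable argument can be checked numerically. The sketch below fits the simple least squares line by hand (the helper `simple_ols` is a hypothetical name, not part of any library) and confirms that the intercept is the group 1 mean and the slope is the difference in group means. Treating vacuum as group 1 (X = 0) and mixed as group 2 (X = 1) is an assumption made here to match the SAS example that follows.

```python
from statistics import mean

def simple_ols(x, y):
    """Least squares intercept and slope for a single predictor."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

group1 = [5.26, 5.44, 5.80]  # vacuum, coded X = 0
group2 = [7.41, 7.33, 7.04]  # mixed, coded X = 1
x = [0] * len(group1) + [1] * len(group2)
y = group1 + group2

b0, b1 = simple_ols(x, y)
print(round(b0, 4), round(b1, 4))
print(round(mean(group1), 4), round(mean(group2) - mean(group1), 4))
```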
Example: Comparing Two-group T-test, Regression test and one-way ANOVA test
options ls=80 formdlim='-' nocenter nodate;

data meat;
  input condition $ logcount @@;
  * imix = 1 for the mixed condition, 0 for vacuum;
  imix = (condition='mixed');
  datalines;
vacuum 5.26 vacuum 5.44 vacuum 5.80
mixed 7.41 mixed 7.33 mixed 7.04
;

ods html;
title 'Log(bacteria count) for different packaging conditions';

proc boxplot data=meat;
  title2 'Boxplots of log(count)';
  plot logcount*condition;
run;

proc ttest data=meat;
  title2 'T-test comparing mix to vacuum conditions';
  class condition;
  var logcount;
run;

proc reg data=meat;
  title2 'Regression with indicator variable for mix condition';
  model logcount = imix;
run;

proc glm data=meat;
  title2 'One-way anova model';
  class condition;
  model logcount = condition;
run;

ods html close;
T-test comparing mix to vacuum conditions
The TTEST Procedure

Statistics
Variable | condition | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err | Minimum | Maximum
logcount | mixed | 3 | 6.7764 | 7.26 | 7.7436 | 0.1014 | 0.1947 | 1.2235 | 0.1124 | 7.04 | 7.41
logcount | vacuum | 3 | 4.817 | 5.5 | 6.183 | 0.1432 | 0.275 | 1.728 | 0.1587 | 5.26 | 5.8
logcount | Diff (1-2) | | 1.22 | 1.76 | 2.3 | 0.1427 | 0.2382 | 0.6845 | 0.1945 | |

T-Tests
Variable | Method | Variances | DF | t Value | Pr > |t|
logcount | Pooled | Equal | 4 | 9.05 | 0.0008
logcount | Satterthwaite | Unequal | 3.6 | 9.05 | 0.0013

Equality of Variances
Variable | Method | Num DF | Den DF | F Value | Pr > F
logcount | Folded F | 2 | 2 | 1.99 | 0.6678
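The Satterthwaite degrees of freedom and the folded F statistic in the output above can be reproduced from the two sample variances. This is a minimal sketch using the standard Satterthwaite approximation and the larger-over-smaller variance ratio; the function names are hypothetical.

```python
from statistics import variance

def satterthwaite_df(y1, y2):
    """Approximate df for the unequal-variance two-sample t-test."""
    n1, n2 = len(y1), len(y2)
    a, b = variance(y1) / n1, variance(y2) / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def folded_f(y1, y2):
    """Folded F: larger sample variance divided by the smaller one."""
    v1, v2 = variance(y1), variance(y2)
    return max(v1, v2) / min(v1, v2)

mixed = [7.41, 7.33, 7.04]
vacuum = [5.26, 5.44, 5.80]
df_satt = satterthwaite_df(mixed, vacuum)
f_fold = folded_f(mixed, vacuum)
print(round(df_satt, 1), round(f_fold, 2))
```

With equal sample sizes the standard error is the same under both methods, so only the degrees of freedom (and hence the p-value) change.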
Regression with indicator variable for mix condition
The REG Procedure
Model: MODEL1
Dependent Variable: logcount

Number of Observations Read | 6
Number of Observations Used | 6

Analysis of Variance
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
Model | 1 | 4.64640 | 4.64640 | 81.87 | 0.0008
Error | 4 | 0.22700 | 0.05675 | |
Corrected Total | 5 | 4.87340 | | |

Root MSE | 0.23822 | R-Square | 0.9534
Dependent Mean | 6.38000 | Adj R-Sq | 0.9418
Coeff Var | 3.73390 | |

Parameter Estimates
Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > |t|
Intercept | 1 | 5.50000 | 0.13754 | 39.99 | <.0001
imix | 1 | 1.76000 | 0.19451 | 9.05 | 0.0008
One-way anova model
The GLM Procedure

Class Level Information
Class | Levels | Values
condition | 2 | mixed vacuum

Number of Observations Read | 6
Number of Observations Used | 6

Dependent Variable: logcount
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
Model | 1 | 4.64640000 | 4.64640000 | 81.87 | 0.0008
Error | 4 | 0.22700000 | 0.05675000 | |
Corrected Total | 5 | 4.87340000 | | |

R-Square | Coeff Var | Root MSE | logcount Mean
0.953421 | 3.733896 | 0.238223 | 6.380000

Source | DF | Type I SS | Mean Square | F Value | Pr > F
condition | 1 | 4.64640000 | 4.64640000 | 81.87 | 0.0008

Source | DF | Type III SS | Mean Square | F Value | Pr > F
condition | 1 | 4.64640000 | 4.64640000 | 81.87 | 0.0008
Summary of the three analyses
Test | Test statistic | P-value | Comment
T-test | tobs=9.05 | 0.0008 | Test of μ1 = μ2. Note: the unequal-variance t-test has the same test statistic value [because the sample sizes are equal], but slightly modified degrees of freedom (3.6 instead of 4).
Regression | tobs=9.05, Fobs=81.87 | 0.0008 | Test of β1 = 0 in the model logcount = β0 + β1 I[condition=mixed] + ε
One-way ANOVA | Fobs=81.87 | 0.0008 | Test of μ1 = μ2
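The agreement among the three analyses is not a coincidence: with two groups, the square of the pooled t statistic equals the F statistic. A quick check using the slope estimate (1.76) and its standard error (0.19451) from the regression output:

```latex
t_{obs}^2 = \left(\frac{1.76}{0.19451}\right)^2 \approx (9.0484)^2 \approx 81.87 = F_{obs}
```

Both statistics also share the same p-value (0.0008), since a t distribution with 4 df squared is an F distribution with 1 and 4 df.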
A more general formulation
Numeric data: samples obtained from t populations
(y11, y12, . . . , y1n1) = {y1j}, j = 1, . . . , n1
(y21, y22, . . . , y2n2) = {y2j}, j = 1, . . . , n2
. . .
(yt1, yt2, . . . , ytnt) = {ytj}, j = 1, . . . , nt
Assume Yij ~ independent N(μi, σ²)
ni = number of observations from the ith population
i = 1, 2, . . . , t (populations or treatments)
j = 1, 2, . . . , ni (observations)

Terminology
* Designed experiments versus observational studies
* Completely Randomized Designs (CRD)
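H0: μ1 = μ2 = μ3 = . . . = μt
Ha: μi ≠ μj for some i ≠ j [at least two population means differ]
Assumptions/Data?
Assume Yij ~ independent N(μi, σ²)
i = 1, 2, . . . , t
j = 1, 2, . . . , ni
Test Statistic?
$$ F_{obs} = \frac{MSB}{MSW} $$
where the between (among) group variability is
$$ MSB = \frac{SSB}{t-1} = \frac{1}{t-1} \sum_{i=1}^{t} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 $$
and the within group variability is
$$ MSW = \frac{SSW}{n_T - t} = \frac{1}{n_T - t} \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 $$
with $n_T = n_1 + n_2 + \cdots + n_t$ the total number of observations.
Reject H0 if
$$ F_{obs} > F_{\alpha,\, t-1,\, n_T - t} $$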
AOV Table
Source | SS | df | MS | Fobs
Between | SSB | t-1 | MSB = SSB/(t-1) | MSB/MSW
Within | SSW | nT-t | MSW = SSW/(nT-t) |
Totals | TSS | nT-1 | |
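The entries of the AOV table can be computed directly from the group data. The following is a minimal Python sketch (the function `one_way_anova` is a hypothetical helper) applied to the four-condition meat data from the example that follows.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SSB, SSW, df_between, df_within, F) for a dict of samples."""
    all_y = [y for ys in groups.values() for y in ys]
    grand = mean(all_y)
    t, n_total = len(groups), len(all_y)
    # Between: squared deviations of group means from the grand mean.
    ssb = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    # Within: squared deviations of observations from their own group mean.
    ssw = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    msb, msw = ssb / (t - 1), ssw / (n_total - t)
    return ssb, ssw, t - 1, n_total - t, msb / msw

meat = {
    "plastic": [7.66, 6.98, 7.80],
    "vacuum": [5.26, 5.44, 5.80],
    "mixed": [7.41, 7.33, 7.04],
    "CO2": [3.51, 2.91, 3.66],
}
ssb, ssw, df_b, df_w, f_obs = one_way_anova(meat)
print(round(ssb, 4), round(ssw, 4), df_b, df_w, round(f_obs, 2))
```

The p-value is not computed here because the standard library has no F distribution; SAS reports it in the PROC GLM output.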
Example: Bacteria growth in meat under different packaging conditions (revisited)
*--------------------------------------------------------------------;
title 'One-way ANOVA/CRD example + contrasts + multiple comparisons';
title2 'Bacteria in meat data';

data meat;
  input condition $ logcount @@;
  * indicator variables, with plastic as the reference condition;
  ivac = (condition='vacuum');
  imix = (condition='mixed');
  iCO2 = (condition='CO2');
  cards;
plastic 7.66 plastic 6.98 plastic 7.80
vacuum 5.26 vacuum 5.44 vacuum 5.80
mixed 7.41 mixed 7.33 mixed 7.04
CO2 3.51 CO2 2.91 CO2 3.66
;

proc print data=meat;
run;

proc sort data=meat out=smeat;
  by condition;
run;

proc univariate data=smeat plot;
  by condition;
  title3 'summary statistics and boxplot';
  var logcount;
run;

proc reg data=meat;
  title3 'Regression with indicators';
  model logcount = ivac imix iCO2;
run;

* order=data keeps the levels as plastic, vacuum, mixed, CO2;
* the contrast coefficients below rely on that order;
proc glm data=meat order=data;
  title3 'One-way anova + contrast + model adequacy';
  class condition;
  model logcount = condition;
  output out=new p=yhat r=resid;
  contrast 'plastic vs. rest' condition 3 -1 -1 -1;
  estimate 'plastic vs. rest' condition 3 -1 -1 -1;
  contrast 'CO2 vs. plastic' condition -1 0 0 1;
  estimate 'CO2 vs. plastic' condition -1 0 0 1;
  contrast 'CO2 vs. vacuum' condition 0 -1 0 1;
  estimate 'CO2 vs. vacuum' condition 0 -1 0 1;
  contrast 'CO2 vs. mixed' condition 0 0 -1 1;
  estimate 'CO2 vs. mixed' condition 0 0 -1 1;
  lsmeans condition / stderr pdiff;
  means condition / lsd clm;
  means condition / bon scheffe tukey;
  means condition / bon tukey cldiff;
run;

proc plot data=new;
  plot logcount*condition yhat*condition='p' / overlay;
  plot resid*condition resid*yhat / vref=0;
run;

proc univariate data=new plot;
  var resid;
run;

* construct the normal scores - Z[(i-.375)/(n+.25)];
* note not multiplied by sqrt(mse);
proc rank data=new normal=blom out=rnew;
  var resid;
  ranks nscore;
run;

* generate plot analogous to the normal prob. plot from proc univariate;
proc plot data=rnew;
  plot resid*nscore;
run;

data moremeat;
  set meat;
  count = exp(logcount);
run;

title3 'raw count data analyzed';
proc glm data=moremeat;
  class condition;
  model count = condition;
  output out=mnew p=yhat r=resid;
  lsmeans condition / stderr pdiff;
  * means condition / clm bon scheffe lsd tukey snk;
run;

proc plot data=mnew;
  plot count*condition yhat*condition='p' / overlay;
  plot resid*condition resid*yhat / vref=0;
run;

proc univariate data=mnew plot;
  var resid;
run;

proc rank data=mnew normal=blom out=rnew;
  var resid;
  ranks nscore;
run;

proc plot data=rnew;
  plot resid*nscore;
run;

proc print data=meat;
run;
Obs    condition    logcount    ivac    imix    iCO2
  1    plastic          7.66       0       0       0
  2    plastic          6.98       0       0       0
  3    plastic          7.80       0       0       0
  4    vacuum           5.26       1       0       0
  5    vacuum           5.44       1       0       0
  6    vacuum           5.80       1       0       0
  7    mixed            7.41       0       1       0
  8    mixed            7.33       0       1       0
  9    mixed            7.04       0       1       0
 10    CO2              3.51       0       0       1
 11    CO2              2.91       0       0       1
 12    CO2              3.66       0       0       1
proc sort out=smeat; by condition;
proc univariate plot; by condition;
title3 summary statistics and boxplot;
var logcount;
run;
The UNIVARIATE Procedure
Variable: logcount
Schematic Plots
[Side-by-side schematic boxplots of logcount (vertical axis from 2 to 8) by condition: CO2, mixed, plastic, vacuum.]
proc reg data=meat;
title3 Regression with indicators;
model logcount = ivac imix iCO2;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: logcount

Analysis of Variance
                           Sum of         Mean
Source            DF      Squares       Square   F Value   Pr > F
Model              3     32.87280     10.95760     94.58   <.0001
Error              8      0.92680      0.11585
Corrected Total   11     33.79960

Root MSE          0.34037    R-Square    0.9726
Dependent Mean    5.90000    Adj R-Sq    0.9623
Coeff Var         5.76894

Parameter Estimates
                  Parameter    Standard
Variable    DF     Estimate       Error   t Value   Pr > |t|
Intercept    1      7.48000     0.19651     38.06     <.0001
ivac         1     -1.98000     0.27791     -7.12     <.0001
imix         1     -0.22000     0.27791     -0.79     0.4514
iCO2         1     -4.12000     0.27791    -14.83     <.0001
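With plastic as the reference condition (all three indicators equal to 0), the regression estimates can be read as the plastic mean and the differences of the other group means from it. A worked check using the group means from the data (plastic 7.48, vacuum 5.50, mixed 7.26, CO2 3.36) and the error mean square 0.11585, with 3 observations per group:

```latex
\begin{aligned}
\hat{\beta}_0 &= \bar{y}_{plastic} = 7.48 \\
\hat{\beta}_{ivac} &= \bar{y}_{vacuum} - \bar{y}_{plastic} = 5.50 - 7.48 = -1.98 \\
\hat{\beta}_{imix} &= \bar{y}_{mixed} - \bar{y}_{plastic} = 7.26 - 7.48 = -0.22 \\
\hat{\beta}_{iCO2} &= \bar{y}_{CO2} - \bar{y}_{plastic} = 3.36 - 7.48 = -4.12 \\
SE(\hat{\beta}_0) &= \sqrt{MSE/3} = \sqrt{0.11585/3} \approx 0.1965 \\
SE(\hat{\beta}_{ivac}) &= \sqrt{2\,MSE/3} = \sqrt{2(0.11585)/3} \approx 0.2779
\end{aligned}
```

The same standard error, 0.2779, applies to all three indicator coefficients because every group has 3 observations.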
proc glm data=meat order=data;
title3 One-way anova + contrast + model adequacy;
class condition;
model logcount=condition;
output out=new p=yhat r=resid;
The GLM Procedure
Class Level Information
Class        Levels    Values
condition         4    plastic vacuum mixed CO2
Number of observations    12

The GLM Procedure
Dependent Variable: logcount
                                 Sum of
Source              DF          Squares     Mean Square   F Value   Pr > F
Model                3      32.87280000     10.95760000     94.58   <.0001
Error                8       0.92680000      0.11585000
Corrected Total     11      33.79960000

R-Square    Coeff Var    Root MSE    logcount Mean
0.972580     5.768940    0.340367         5.900000

Source              DF        Type I SS     Mean Square   F Value   Pr > F
condition            3      32.87280000     10.95760000     94.58   <.0001

Source              DF      Type III SS     Mean Square   F Value   Pr > F
condition            3      32.87280000     10.95760000     94.58   <.0001

Contrast            DF      Contrast SS     Mean Square   F Value   Pr > F
plastic vs. rest     1       9.98560000      9.98560000     86.19   <.0001
CO2 vs. plastic      1      25.46160000     25.46160000    219.78   <.0001
CO2 vs. vacuum       1       6.86940000      6.86940000     59.30   <.0001
CO2 vs. mixed        1      22.81500000     22.81500000    196.94   <.0001
contrast 'plastic vs. rest' condition 3 -1 -1 -1;
estimate 'plastic vs. rest' condition 3 -1 -1 -1;
contrast 'CO2 vs. plastic' condition -1 0 0 1;
estimate 'CO2 vs. plastic' condition -1 0 0 1;
contrast 'CO2 vs. vacuum' condition 0 -1 0 1;
estimate 'CO2 vs. vacuum' condition 0 -1 0 1;
contrast 'CO2 vs. mixed' condition 0 0 -1 1;
estimate 'CO2 vs. mixed' condition 0 0 -1 1;
Dependent Variable: logcount
                                     Standard
Parameter               Estimate        Error   t Value   Pr > |t|
plastic vs. rest      6.32000000   0.68073490      9.28     <.0001
CO2 vs. plastic      -4.12000000   0.27790886    -14.83     <.0001
CO2 vs. vacuum       -2.14000000   0.27790886     -7.70     <.0001
CO2 vs. mixed        -3.90000000   0.27790886    -14.03     <.0001
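The contrast estimates, standard errors and sums of squares above follow from the group means, the contrast coefficients and the error mean square. A minimal sketch, assuming the standard formulas for a linear contrast of independent group means; the function `contrast` is a hypothetical helper, and the means are listed in the ORDER=DATA order (plastic, vacuum, mixed, CO2).

```python
from math import sqrt

def contrast(coefs, means, ns, mse):
    """Estimate, standard error, t statistic and sum of squares of a contrast."""
    est = sum(c * m for c, m in zip(coefs, means))
    # Variance of the contrast is MSE * sum(c_i^2 / n_i).
    scale = sum(c * c / n for c, n in zip(coefs, ns))
    se = sqrt(mse * scale)
    return est, se, est / se, est ** 2 / scale

means = [7.48, 5.50, 7.26, 3.36]  # plastic, vacuum, mixed, CO2
ns = [3, 3, 3, 3]
mse = 0.11585                     # error mean square from the ANOVA table

est, se, t_val, ss = contrast([3, -1, -1, -1], means, ns, mse)
print(round(est, 2), round(se, 4), round(t_val, 2), round(ss, 4))

est2, se2, t2, ss2 = contrast([0, -1, 0, 1], means, ns, mse)
print(round(est2, 2), round(se2, 4), round(t2, 2), round(ss2, 4))
```

Dividing a contrast sum of squares by the error mean square gives the F value in the CONTRAST output, which is the square of the t value in the ESTIMATE output.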
lsmeans condition / stderr pdiff;
The GLM Procedure
Least Squares Means
               logcount      Standard                LSMEAN
condition        LSMEAN         Error    Pr > |t|    Number
plastic      7.48000000    0.19651124      <.0001         1
vacuum       5.50000000    0.19651124      <.0001         2
mixed        7.26000000    0.19651124      <.0001         3
CO2          3.36000000    0.19651124      <.0001         4

Least Squares Means for effect condition
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: logcount
i/j         1         2         3         4
  1              <.0001    0.4514    <.0001
  2    <.0001              0.0002    <.0001
  3    0.4514    0.0002              <.0001
  4    <.0001    <.0001    <.0001
NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
means condition / lsd clm;
t Confidence Intervals for logcount
Alpha                                    0.05
Error Degrees of Freedom                    8
Error Mean Square                     0.11585
Critical Value of t                   2.30600
Half Width of Confidence Interval    0.453156

condition    N      Mean    95% Confidence Limits
plastic      3    7.4800      7.0268      7.9332
mixed        3    7.2600      6.8068      7.7132
vacuum       3    5.5000      5.0468      5.9532
CO2          3    3.3600      2.9068      3.8132
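The half width reported above can be reproduced by hand from the critical t value, the error mean square and the common group size of 3. For example, for the plastic condition:

```latex
\begin{aligned}
\text{half width} &= t_{0.025,\,8}\sqrt{\frac{MSE}{n_i}} = 2.306\sqrt{\frac{0.11585}{3}} \approx 2.306 \times 0.19651 \approx 0.4532 \\
\bar{y}_{plastic} \pm \text{half width} &= 7.48 \pm 0.4532 = (7.0268,\ 7.9332)
\end{aligned}
```

Note that each interval uses the pooled error mean square on 8 df, not the variance of the individual group.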
means condition / bon scheffe tukey;
Tukey's Studentized Range (HSD) Test for logcount
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha                                     0.05
Error Degrees of Freedom                     8
Error Mean Square                      0.11585
Critical Value of Studentized Range    4.52880
Minimum Significant Difference            0.89

Means with the same letter are not significantly different.
Grouping      Mean    N    condition
       A    7.4800    3    plastic
       A
       A    7.2600    3    mixed
       B    5.5000    3    vacuum
       C    3.3600    3    CO2

Bonferroni (Dunn) t Tests for logcount
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha                                     0.05
Error Degrees of Freedom                     8
Error Mean Square                      0.11585
Critical Value of t                    3.47888
Minimum Significant Difference          0.9668

Means with the same letter are not significantly different.
Grouping      Mean    N    condition
       A    7.4800    3    plastic
       A
       A    7.2600    3    mixed
       B    5.5000    3    vacuum
       C    3.3600    3    CO2

Scheffe's Test for logcount
NOTE: This test controls the Type I experimentwise error rate.
Alpha                                     0.05
Error Degrees of Freedom                     8
Error Mean Square                      0.11585
Critical Value of F                    4.06618
Minimum Significant Difference          0.9706

Means with the same letter are not significantly different.
Grouping      Mean    N    condition
       A    7.4800    3    plastic
       A
       A    7.2600    3    mixed
       B    5.5000    3    vacuum
       C    3.3600    3    CO2
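The three minimum significant differences differ only in the critical value applied to the standard error of a difference in means. The sketch below recomputes them from the critical values printed in the SAS output (they are copied from the output, not computed here, since the standard library has no studentized range or F quantile functions). The formulas are the usual ones for equal group sizes and are stated as an assumption about how SAS arrives at its values.

```python
from math import sqrt

mse, n, t_groups = 0.11585, 3, 4   # error mean square, group size, number of groups
q_crit = 4.52880                   # studentized range critical value (Tukey)
t_bon = 3.47888                    # Bonferroni critical t
f_crit = 4.06618                   # critical F (Scheffe)

se_diff = sqrt(2 * mse / n)        # standard error of a difference of two means

msd_tukey = q_crit * sqrt(mse / n)
msd_bon = t_bon * se_diff
msd_scheffe = sqrt((t_groups - 1) * f_crit) * se_diff

print(round(msd_tukey, 4), round(msd_bon, 4), round(msd_scheffe, 4))
```

For all pairwise comparisons Tukey gives the smallest minimum significant difference here, with Bonferroni and Scheffe slightly more conservative; all three lead to the same groupings for these data.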
means condition / bon tukey cldiff;
Tukey's Studentized Range (HSD) Test for logcount
NOTE: This test controls the Type I experimentwise error rate.
Alpha                                     0.05
Error Degrees of Freedom                     8
Error Mean Square                      0.11585
Critical Value of Studentized Range    4.52880
Minimum Significant Difference            0.89

Comparisons significant at the 0.05 level are indicated by ***.
                    Difference
condition              Between     Simultaneous 95%
Comparison               Means    Confidence Limits
plastic - mixed         0.2200    -0.6700    1.1100
plastic - vacuum        1.9800     1.0900    2.8700   ***
plastic - CO2           4.1200     3.2300    5.0100   ***
mixed - plastic        -0.2200    -1.1100    0.6700
mixed - vacuum          1.7600     0.8700    2.6500   ***
mixed - CO2             3.9000     3.0100    4.7900   ***
vacuum - plastic       -1.9800    -2.8700   -1.0900   ***
vacuum - mixed         -1.7600    -2.6500   -0.8700   ***
vacuum - CO2            2.1400     1.2500    3.0300   ***
CO2 - plastic          -4.1200    -5.0100   -3.2300   ***
CO2 - mixed            -3.9000    -4.7900   -3.0100   ***
CO2 - vacuum           -2.1400    -3.0300   -1.2500   ***
Bonferroni (Dunn) t Tests for logcount
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than Tukey's for all pairwise comparisons.
Alpha                                     0.05
Error Degrees of Freedom                     8
Error Mean Square                      0.11585
Critical Value of t                    3.47888
Minimum Significant Difference          0.9668

Comparisons significant at the 0.05 level are indicated by ***.
                    Difference
condition              Between     Simultaneous 95%
Comparison               Means    Confidence Limits
plastic - mixed         0.2200    -0.7468    1.1868
plastic - vacuum        1.9800     1.0132    2.9468   ***
plastic - CO2           4.1200     3.1532    5.0868   ***
mixed - plastic        -0.2200    -1.1868    0.7468
mixed - vacuum          1.7600     0.7932    2.7268   ***
mixed - CO2             3.9000     2.9332    4.8668   ***
vacuum - plastic       -1.9800    -2.9468   -1.0132   ***
vacuum - mixed         -1.7600    -2.7268   -0.7932   ***
vacuum - CO2            2.1400     1.1732    3.1068   ***
CO2 - plastic          -4.1200    -5.0868   -3.1532   ***
CO2 - mixed            -3.9000    -4.8668   -2.9332   ***
CO2 - vacuum           -2.1400    -3.1068   -1.1732   ***
options ls=70;
proc plot data=new;
plot logcount*condition yhat*condition='p' /overlay;
plot resid*condition resid*yhat / vref=0;
run;
Plot of resid*condition. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot: resid (vertical axis, -0.6 to 0.4) versus condition (CO2, mixed, plastic, vacuum).]
Plot of resid*yhat. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot: resid (vertical axis, -0.6 to 0.4) versus yhat (horizontal axis, 3 to 8).]
proc univariate plot;
var resid;
* construct the normal scores - Z[(i-.375)/(n+.25)];
* note not multiplied by sqrt(mse);
proc rank data=new normal=blom out=rnew;
var resid;
ranks nscore;
* generate plot analogous to univariate's normal prob. plot;
proc plot;
plot resid*nscore;
Plot of resid*nscore. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot: resid (vertical axis, -0.6 to 0.4) versus nscore, the rank-based normal score for resid (horizontal axis, -2 to 2).]
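The normal scores requested with NORMAL=BLOM are the standard normal quantiles of (i - .375)/(n + .25), where i is the rank of the residual, as noted in the program comment. A minimal Python sketch of that formula follows; the function `blom_scores` and the example values are hypothetical, and ties are simply broken by position, which may differ from how PROC RANK treats tied residuals.

```python
from statistics import NormalDist

def blom_scores(values):
    """Blom normal scores: z-quantile of (rank - 0.375) / (n + 0.25)."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])  # positions, smallest first
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = NormalDist().inv_cdf((rank - 0.375) / (n + 0.25))
    return scores

# Hypothetical residuals, used only to illustrate the calculation.
resid = [0.3, -0.1, 0.0, 0.2, -0.4]
nscore = blom_scores(resid)
for r, z in sorted(zip(resid, nscore)):
    print(r, round(z, 3))
```

Plotting the residuals against these scores gives a normal probability plot; points falling close to a straight line are consistent with normally distributed errors.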
data moremeat; set meat;
count = exp(logcount);
title3 raw count data analyzed;
proc glm data=moremeat;
class condition;
model count=condition;
output out=mnew p=yhat r=resid;
lsmeans condition / stderr pdiff;
* means condition / clm bon scheffe lsd tukey snk;
run;
The GLM Procedure
Dependent Variable: count
                                 Sum of
Source              DF          Squares     Mean Square   F Value   Pr > F
Model                3      7282652.348     2427550.783     16.56   0.0009
Error                8      1172820.616      146602.577
Corrected Total     11      8455472.964

R-Square    Coeff Var    Root MSE    count Mean
0.861294     42.54159    382.8872      900.0303

Source              DF        Type I SS     Mean Square   F Value   Pr > F
condition            3      7282652.348     2427550.783     16.56   0.0009

Source              DF      Type III SS     Mean Square   F Value   Pr > F
condition            3      7282652.348     2427550.783     16.56   0.0009
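The raw-count fit has a lower R-Square (0.861) than the log-scale fit (0.973). One way to explore why, offered here only as a sketch, is to compare the within-group standard deviations on the two scales: the ANOVA model assumes a common variance across groups. The variable names below are hypothetical.

```python
from math import exp
from statistics import stdev

logcount = {
    "plastic": [7.66, 6.98, 7.80],
    "vacuum": [5.26, 5.44, 5.80],
    "mixed": [7.41, 7.33, 7.04],
    "CO2": [3.51, 2.91, 3.66],
}

# Within-group standard deviations on the log scale and on the raw count scale.
sd_log = {g: stdev(ys) for g, ys in logcount.items()}
sd_raw = {g: stdev([exp(y) for y in ys]) for g, ys in logcount.items()}

# Ratio of largest to smallest group SD: values near 1 support equal variances.
ratio_log = max(sd_log.values()) / min(sd_log.values())
ratio_raw = max(sd_raw.values()) / min(sd_raw.values())

for g in logcount:
    print(g, round(sd_log[g], 3), round(sd_raw[g], 1))
print(round(ratio_log, 1), round(ratio_raw, 1))
```

On the raw scale the spread grows with the group mean, so the largest group standard deviation is many times the smallest, while on the log scale the standard deviations are of similar size.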
proc plot data=mnew;
plot count*condition yhat*condition='p' /overlay;
plot resid*condition resid*yhat / vref=0;
run;
Plot of resid*condition. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot: resid (vertical axis, -1000 to 1000) versus condition (CO2, mixed, plastic, vacuum).]
Plot of resid*yhat. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot: resid (vertical axis, -1000 to 1000) versus yhat (horizontal axis, 0 to 2000).]
proc rank data=mnew normal=blom out=rnew;
var resid;
ranks nscore;
proc plot;
plot resid*nscore;
run;
Plot of resid*nscore. Legend: A = 1 obs, B = 2 obs, etc.
[Character plot: resid (vertical axis, -1000 to 1000) versus nscore, the rank-based normal score for resid (horizontal axis, -2 to 2).]