IES 612/STA 4-573/STA 4-576
Spring 2005
Week 9 IES612-week09-lecture.doc
ANOVA MODELS for standard designs
i. Completely Randomized Design (CRD)
ii. Random Complete Block Design (RCBD)
iii. Latin Squares (LS)
CRD with a single factor
Numeric data samples from t populations obtained
Assume $y_{ij} \sim$ independent $N(\mu_i, \sigma^2)$
$n_i$ = number of observations from the ith population
i = 1, 2, ..., t (populations or treatments)
j = 1, 2, ..., $n_i$ (observations)
$N = n_T = n_1 + n_2 + \cdots + n_t$
CRD MODEL: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
where
$\mu$ = overall mean (with constraint $\sum_i \alpha_i = 0$)
$\alpha_i$ = treatment effect
$\varepsilon_{ij}$ = random error ~ independent $N(0, \sigma^2)$
CRD ANOVA Table
| Source | SS | df | MS | Fobs |
|---|---|---|---|---|
| Treatment | SSTr | t-1 | MSTr = SSTr/(t-1) | MSTr/MSE |
| Error | SSE | N-t | MSE = SSE/(N-t) | |
| Total | TSS | N-1 | | |
Notational warning: book uses single summation for multiple sums
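Why does an F-test work? Expected Mean Squares
$E(MSE) = \sigma^2$ and $E(MSTr) = \sigma^2 + \frac{1}{t-1}\sum_{i=1}^{t} n_i \alpha_i^2$
H0: $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_t$, which is equivalent to H0: $\alpha_1 = \alpha_2 = \alpha_3 = \cdots = \alpha_t = 0$
⇓
$\sum_i n_i \alpha_i^2 = 0$
⇓
E(MSTr) = E(MSE)
So under H0 the ratio MSTr/MSE should be near 1, and large values of Fobs are evidence against H0.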
CRD Advantages:
1. easy to construct
2. easy to analyze
3. can be used for any number of treatments
CRD Disadvantages:
1. Best suited for relatively few treatments
2. EUs must be as homogeneous as possible [may need more observations in a CRD to detect a particular effect size when compared to an RCBD or other designs]
Example: Bacteria growth in meat under different packaging conditions (revisited)
title "One-way ANOVA";
title2 "Bacteria in meat data";
data meat;
input condition $ logcount @@;
cards;
plastic 7.66 plastic 6.98 plastic 7.80
vacuum 5.26 vacuum 5.44 vacuum 5.80
mixed 7.41 mixed 7.33 mixed 7.04
CO2 3.51 CO2 2.91 CO2 3.66
;
proc glm data=meat order=data;
title3 "One-way anova + contrast + model adequacy";
class condition;
model logcount=condition;
run;
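The entries of the CRD ANOVA table can be checked by hand for the meat data. The following is a minimal sketch in Python (not part of the course's SAS code); the function name `one_way_anova` and the dictionary layout are illustrative choices, and the formulas are the usual between-group and within-group sums of squares.

```python
# One-way (CRD) ANOVA by hand for the meat data listed in the SAS program.
groups = {
    "plastic": [7.66, 6.98, 7.80],
    "vacuum": [5.26, 5.44, 5.80],
    "mixed": [7.41, 7.33, 7.04],
    "CO2": [3.51, 2.91, 3.66],
}


def one_way_anova(groups):
    """Return (SSTr, SSE, df_trt, df_err, Fobs) for a dict of samples."""
    all_y = [y for ys in groups.values() for y in ys]
    n_total = len(all_y)          # N
    t = len(groups)               # number of treatments
    grand_mean = sum(all_y) / n_total
    # Between-treatment sum of squares: sum of n_i * (ybar_i - ybar)^2
    ss_trt = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Within-treatment (error) sum of squares
    ss_err = sum(
        (y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys
    )
    df_trt, df_err = t - 1, n_total - t
    f_obs = (ss_trt / df_trt) / (ss_err / df_err)
    return ss_trt, ss_err, df_trt, df_err, f_obs


ss_trt, ss_err, df_trt, df_err, f_obs = one_way_anova(groups)
print(f"SSTr={ss_trt:.4f} (df={df_trt})  SSE={ss_err:.4f} (df={df_err})  Fobs={f_obs:.2f}")
```

The same numbers should appear in the Treatment and Error rows of the PROC GLM output for `model logcount=condition`.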
RCBD with a single factor
* Design for comparing t treatments in b blocks
* Block = homogeneous unit formed in advance and treatments are randomly assigned within blocks (if t units in each block then RCBD)
RCBD MODEL: $y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
i = 1, ..., t (treatments)
j = 1, ..., b (blocks)
where
$\mu$ = overall mean (with constraints $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$)
$\alpha_i$ = ith treatment effect
$\beta_j$ = jth block effect
$\varepsilon_{ij}$ = random error ~ independent $N(0, \sigma^2)$
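| E(yij) | Block 1 | Block 2 | ... | Block b |
|---|---|---|---|---|
| Treatment 1 | $\mu + \alpha_1 + \beta_1$ | $\mu + \alpha_1 + \beta_2$ | ... | $\mu + \alpha_1 + \beta_b$ |
| Treatment 2 | $\mu + \alpha_2 + \beta_1$ | $\mu + \alpha_2 + \beta_2$ | ... | $\mu + \alpha_2 + \beta_b$ |
| ... | ... | ... | ... | ... |
| Treatment t | $\mu + \alpha_t + \beta_1$ | $\mu + \alpha_t + \beta_2$ | ... | $\mu + \alpha_t + \beta_b$ |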
Notice: Means in the same block differ only through the treatment effects $\alpha_i$; the block effect $\beta_j$ cancels in any within-block difference.
RCBD ANOVA Table
| Source | SS | df | MS | Fobs |
|---|---|---|---|---|
| Treatment | SSTr | t-1 | MSTr = SSTr/(t-1) | MSTr/MSE |
| Block | SSB | b-1 | MSB = SSB/(b-1) | |
| Error | SSE | (b-1)(t-1) | MSE = SSE/[(b-1)(t-1)] | |
| Total | TSS | bt-1 | | |
Comments:
i. RCBD has N = b*t total observations because we form b blocks with t units each.
ii. We spend b-1 of the error degrees of freedom on blocks [potential COST] in hopes of achieving a smaller residual error for testing treatment effects.
TESTS:
H0: $\alpha_1 = \alpha_2 = \alpha_3 = \cdots = \alpha_t = 0$
Test Statistic: Fobs = MSTr/MSE
RR: Reject H0 if $F_{obs} > F_{\alpha,\, t-1,\, (b-1)(t-1)}$
P-value: $P(F_{t-1,\,(b-1)(t-1)} > F_{obs})$
* Some argue that blocks should not be tested since no randomization basis for test
RCBD Advantages:
RCBD Disadvantages:
Efficiency of RCBD relative to CRD
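$RE(RCBD, CRD) = \frac{MSE_{CRD}}{MSE_{RCBD}}$, which is estimated from the RCBD ANOVA table by
$RE = \frac{(b-1)\,MSB + b(t-1)\,MSE}{(bt-1)\,MSE}$
If r is the number of observations per treatment that a CRD would need to match the precision of an RCBD with b blocks, then r = b × RE.
If RE > 1, then r > b (the blocking design is more efficient).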
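As a worked illustration, the relative efficiency can be computed for the insecticide RCBD analyzed later in these notes (b = 4 plots, t = 3 insecticides, MSB = 146, MSE = 26/6). This is a sketch in Python that assumes the textbook estimator RE = [(b-1)MSB + b(t-1)MSE] / [(bt-1)MSE]; the function name is hypothetical.

```python
def relative_efficiency(msb, mse, b, t):
    """Estimated efficiency of an RCBD relative to a CRD.

    Assumes the estimator RE = [(b-1)*MSB + b*(t-1)*MSE] / [(b*t-1)*MSE],
    where MSB and MSE come from the RCBD ANOVA table.
    """
    return ((b - 1) * msb + b * (t - 1) * mse) / ((b * t - 1) * mse)


b, t = 4, 3                  # blocks (plots) and treatments (insecticides)
msb, mse = 146.0, 26.0 / 6   # block and error mean squares from the RCBD fit
re = relative_efficiency(msb, mse, b, t)
r_needed = b * re            # CRD observations per treatment for equal precision
print(f"RE = {re:.2f}; a CRD would need about {r_needed:.0f} observations per treatment")
```

An RE far above 1, as here, says that blocking on plot removed a large share of the variation that a CRD would have left in the error term.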
Latin Squares Design with a single factor
- 2 sources of extraneous variation controlled
- t x t LS has t rows and t columns (t treatments are randomly assigned to EUs within rows and columns so that every treatment appears in every row and column)
e.g. t=3
Three possible 3 x 3 squares:

|   | 1 | 2 | 3 |
|---|---|---|---|
| 1 | A | B | C |
| 2 | B | C | A |
| 3 | C | A | B |

|   | 1 | 2 | 3 |
|---|---|---|---|
| 1 | A | B | C |
| 2 | C | A | B |
| 3 | B | C | A |

|   | 1 | 2 | 3 |
|---|---|---|---|
| 1 | B | A | C |
| 2 | A | C | B |
| 3 | C | B | A |
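The defining property (every treatment appears exactly once in every row and every column) is easy to verify mechanically. A small Python sketch, with the three t = 3 squares above typed in as strings; the function name `is_latin_square` is illustrative.

```python
def is_latin_square(square):
    """True if each symbol occurs exactly once in every row and every column."""
    t = len(square)
    symbols = set(square[0])
    if len(symbols) != t:
        return False
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


# The three 3 x 3 squares from the notes, one string per row.
squares = [
    ["ABC", "BCA", "CAB"],
    ["ABC", "CAB", "BCA"],
    ["BAC", "ACB", "CBA"],
]
results = [is_latin_square(sq) for sq in squares]
not_latin = is_latin_square(["ABC", "ABC", "CAB"])  # treatment A repeats in column 1
print(results, not_latin)
```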
* ..\classes\spring03\rcbd-factorial-other-02mar03;
* updated: 23 mar 04;
options nodate nocenter;
ods rtf;
title "RCBD - block=plot trt=insecticide";
title2 "Ott/Longnecker p. 868 - example 15.2";
data drcbd;
input insecticide plot yseedling @@;
datalines;
1 1 56 1 2 48 1 3 66 1 4 62
2 1 83 2 2 78 2 3 94 2 4 93
3 1 80 3 2 72 3 3 83 3 4 85
;
proc plot;
plot yseedling*insecticide=plot;
run;
proc glm;
class plot insecticide;
model yseedling = plot insecticide;
means insecticide / tukey;
run;
proc glm;
class insecticide;
model yseedling = insecticide;
run;
[Plot of yseedling*insecticide. Symbol is value of plot. Vertical axis: yseedling (40 to 100); horizontal axis: insecticide (1, 2, 3).]
RCBD
$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$ where $\varepsilon_{ij} \sim$ ind. $N(0, \sigma^2)$
RCBD - block=plot trt=insecticide
Ott/Longnecker p. 868 - example 15.2
The GLM Procedure

Class Level Information
| Class | Levels | Values |
|---|---|---|
| plot | 4 | 1 2 3 4 |
| insecticide | 3 | 1 2 3 |

Number of Observations Read: 12
Number of Observations Used: 12

Dependent Variable: yseedling
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 5 | 2270.000000 | 454.000000 | 104.77 | <.0001 |
| Error | 6 | 26.000000 | 4.333333 | | |
| Corrected Total | 11 | 2296.000000 | | | |

| R-Square | Coeff Var | Root MSE | yseedling Mean |
|---|---|---|---|
| 0.988676 | 2.775555 | 2.081666 | 75.00000 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| plot | 3 | 438.000000 | 146.000000 | 33.69 | 0.0004 |
| insecticide | 2 | 1832.000000 | 916.000000 | 211.38 | <.0001 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| plot | 3 | 438.000000 | 146.000000 | 33.69 | 0.0004 |
| insecticide | 2 | 1832.000000 | 916.000000 | 211.38 | <.0001 |
Tukey's Studentized Range (HSD) Test for yseedling
Note: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha: 0.05
Error Degrees of Freedom: 6
Error Mean Square: 4.333333
Critical Value of Studentized Range: 4.33902
Minimum Significant Difference: 4.5162

Means with the same letter are not significantly different.
| Tukey Grouping | Mean | N | insecticide |
|---|---|---|---|
| A | 87.000 | 4 | 2 |
| B | 80.000 | 4 | 3 |
| C | 58.000 | 4 | 1 |
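The Minimum Significant Difference reported by Tukey's procedure for equal sample sizes is q × sqrt(MSE/n), where q is the critical value of the studentized range and n is the number of observations behind each mean. A short Python sketch that reproduces the value from the quantities printed in the output; the critical value 4.33902 is copied from the SAS listing, not computed, and the function name is illustrative.

```python
from math import sqrt


def tukey_msd(q_crit, mse, n_per_mean):
    """Minimum significant difference for equal-n Tukey HSD comparisons."""
    return q_crit * sqrt(mse / n_per_mean)


# Values copied from the RCBD output: q(0.05; 3 means, 6 error df), MSE, n = 4 plots
msd = tukey_msd(4.33902, 4.333333, 4)

# Insecticide means from the Tukey table; flag pairs whose gap exceeds the MSD
means = {2: 87.0, 3: 80.0, 1: 58.0}
pairs = [(a, b) for a in means for b in means if a < b]
different = {(a, b): abs(means[a] - means[b]) > msd for a, b in pairs}
print(round(msd, 4), different)
```

Every pairwise gap exceeds the MSD, which is why each insecticide receives its own letter in the grouping table.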
Comment:
The following is a hypothetical analysis illustrating how the analysis
might have changed if a CRD had been conducted instead of an RCBD. Compare the MSE between the previous
analysis and this analysis.
The GLM Procedure

Class Level Information
| Class | Levels | Values |
|---|---|---|
| insecticide | 3 | 1 2 3 |

Number of Observations Read: 12
Number of Observations Used: 12

Dependent Variable: yseedling
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 1832.000000 | 916.000000 | 17.77 | 0.0007 |
| Error | 9 | 464.000000 | 51.555556 | | |
| Corrected Total | 11 | 2296.000000 | | | |

| R-Square | Coeff Var | Root MSE | yseedling Mean |
|---|---|---|---|
| 0.797909 | 9.573626 | 7.180220 | 75.00000 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| insecticide | 2 | 1832.000000 | 916.000000 | 17.77 | 0.0007 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| insecticide | 2 | 1832.000000 | 916.000000 | 17.77 | 0.0007 |
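The two analyses differ only in where the block sum of squares ends up: the RCBD removes SSB from the error term, while the CRD analysis leaves it in. A Python sketch of the decomposition for the insecticide data (rows are insecticides 1 to 3, columns are plots 1 to 4); the variable names are illustrative.

```python
# Seedling counts: rows = insecticide 1..3, columns = plot (block) 1..4
y = [
    [56, 48, 66, 62],
    [83, 78, 94, 93],
    [80, 72, 83, 85],
]
t, b = len(y), len(y[0])
grand = sum(map(sum, y)) / (t * b)
trt_means = [sum(row) / b for row in y]
blk_means = [sum(col) / t for col in zip(*y)]

tss = sum((v - grand) ** 2 for row in y for v in row)
ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
ss_blk = t * sum((m - grand) ** 2 for m in blk_means)

# RCBD: block variation is removed from the error term
mse_rcbd = (tss - ss_trt - ss_blk) / ((b - 1) * (t - 1))
# CRD: block variation stays in the error term
mse_crd = (tss - ss_trt) / (t * b - t)

f_rcbd = (ss_trt / (t - 1)) / mse_rcbd
f_crd = (ss_trt / (t - 1)) / mse_crd
print(f"RCBD: MSE={mse_rcbd:.3f}, F={f_rcbd:.2f};  CRD: MSE={mse_crd:.3f}, F={f_crd:.2f}")
```

The treatment sum of squares is identical in both fits; only the denominator of the F ratio changes.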
Factorial Designs
Treatment Structure in a CRD
* Structure for the analysis of multiple factor studies
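Factorial MODEL: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
i = 1, ..., a (Factor A levels)
j = 1, ..., b (Factor B levels)
k = 1, ..., $n_{ij}$
where
$\mu$ = overall mean (with constraints $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, $\sum_i (\alpha\beta)_{ij} = 0$, $\sum_j (\alpha\beta)_{ij} = 0$)
$\alpha_i$ = main effect of Factor A
$\beta_j$ = main effect of Factor B
$(\alpha\beta)_{ij}$ = interaction of Factors A and B
$\varepsilon_{ijk}$ = random error ~ independent $N(0, \sigma^2)$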
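| E(yijk) | Factor B: 1 | 2 | ... | b |
|---|---|---|---|---|
| Factor A: 1 | $\mu+\alpha_1+\beta_1+(\alpha\beta)_{11}$ | $\mu+\alpha_1+\beta_2+(\alpha\beta)_{12}$ | ... | $\mu+\alpha_1+\beta_b+(\alpha\beta)_{1b}$ |
| 2 | $\mu+\alpha_2+\beta_1+(\alpha\beta)_{21}$ | $\mu+\alpha_2+\beta_2+(\alpha\beta)_{22}$ | ... | $\mu+\alpha_2+\beta_b+(\alpha\beta)_{2b}$ |
| ... | ... | ... | ... | ... |
| a | $\mu+\alpha_a+\beta_1+(\alpha\beta)_{a1}$ | $\mu+\alpha_a+\beta_2+(\alpha\beta)_{a2}$ | ... | $\mu+\alpha_a+\beta_b+(\alpha\beta)_{ab}$ |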
Notice: Means at the same level of Factor B differ by the $\alpha_i$ AND the interaction terms $(\alpha\beta)_{ij}$.
Twoway Factorial ANOVA (in a CRD) Table
| Source | SS | df | MS | Fobs |
|---|---|---|---|---|
| Factor A | SSA | a-1 | MSA = SSA/(a-1) | MSA/MSE |
| Factor B | SSB | b-1 | MSB = SSB/(b-1) | MSB/MSE |
| Interaction | SSAB | (a-1)(b-1) | MSAB = SSAB/[(a-1)(b-1)] | MSAB/MSE |
| Error | SSE | N-ab | MSE = SSE/(N-ab) | |
| Total | TSS | N-1 | | |
TESTS:
H0: $(\alpha\beta)_{ij} = 0$ for all i, j (No interaction)
Test Statistic: Fobs = MSAB/MSE
RR: Reject H0 if $F_{obs} > F_{\alpha,\,(a-1)(b-1),\,N-ab}$
P-value: $P(F_{(a-1)(b-1),\,N-ab} > F_{obs})$
H0: $\alpha_1 = \alpha_2 = \alpha_3 = \cdots = \alpha_a = 0$ (No A Main Effect)
Test Statistic: Fobs = MSA/MSE
RR: Reject H0 if $F_{obs} > F_{\alpha,\,a-1,\,N-ab}$
P-value: $P(F_{a-1,\,N-ab} > F_{obs})$
H0: $\beta_1 = \beta_2 = \beta_3 = \cdots = \beta_b = 0$ (No B Main Effect)
Test Statistic: Fobs = MSB/MSE
RR: Reject H0 if $F_{obs} > F_{\alpha,\,b-1,\,N-ab}$
P-value: $P(F_{b-1,\,N-ab} > F_{obs})$
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$ where $\varepsilon_{ijk} \sim$ ind. $N(0, \sigma^2)$
title "Factorial example: Factor A=pesticide Factor B=variety";
title2 "Ott/Longnecker p. 901 - example 15.8";
data dfact;
input variety pesticide yield @@;
datalines;
1 1 49 1 1 39 1 2 50 1 2 55 1 3 43 1 3 38 1 4 53 1 4 48
2 1 55 2 1 41 2 2 67 2 2 58 2 3 53 2 3 42 2 4 85 2 4 73
3 1 66 3 1 68 3 2 85 3 2 92 3 3 69 3 3 62 3 4 85 3 4 99
;
* Generate the Mean Profile plot;
proc sort; by variety pesticide;
proc means noprint; by variety pesticide;
var yield;
output out=factmean mean=ymean;
run;
proc print;
run;
proc plot data=factmean;
plot ymean*pesticide=variety;
run;
* Test components of the Two-way anova model;
proc glm data=dfact;
class pesticide variety;
model yield = variety pesticide variety*pesticide;
means variety / tukey;
means pesticide / tukey;
run;
[Plot of ymean*pesticide. Symbol is value of variety. Vertical axis: ymean (40 to 100); horizontal axis: pesticide (1, 2, 3, 4).]
COMMENT: This plot suggests NO INTERACTION between pesticide and variety. It also suggests a difference in both PESTICIDE levels and VARIETY levels. Let's see if this is observed in the formal hypothesis tests.
Factorial example: Factor A=pesticide Factor B=variety
Ott/Longnecker p. 901 - example 15.8
The GLM Procedure

Class Level Information
| Class | Levels | Values |
|---|---|---|
| pesticide | 4 | 1 2 3 4 |
| variety | 3 | 1 2 3 |

Number of Observations Read: 24
Number of Observations Used: 24

Dependent Variable: yield
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 11 | 6680.458333 | 607.314394 | 14.36 | <.0001 |
| Error | 12 | 507.500000 | 42.291667 | | |
| Corrected Total | 23 | 7187.958333 | | | |

| R-Square | Coeff Var | Root MSE | yield Mean |
|---|---|---|---|
| 0.929396 | 10.58149 | 6.503204 | 61.45833 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| variety | 2 | 3996.083333 | 1998.041667 | 47.24 | <.0001 |
| pesticide | 3 | 2227.458333 | 742.486111 | 17.56 | 0.0001 |
| pesticide*variety | 6 | 456.916667 | 76.152778 | 1.80 | 0.1817 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| variety | 2 | 3996.083333 | 1998.041667 | 47.24 | <.0001 |
| pesticide | 3 | 2227.458333 | 742.486111 | 17.56 | 0.0001 |
| pesticide*variety | 6 | 456.916667 | 76.152778 | 1.80 | 0.1817 |
COMMENT: We would fail to reject the (null) hypothesis of NO INTERACTION between pesticide and variety (P=0.1817). The main effects of VARIETY and PESTICIDE are both significant, with P-values of <.0001 and .0001, respectively. Thus, we conclude that YIELD differs among varieties and among pesticides; however, these factors do not interact.
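For balanced data the four sums of squares in the two-way table can be computed directly from the marginal and cell means. The following Python sketch does this for the variety-pesticide data (two replicates per cell); the dictionary layout and helper names are illustrative, and the formulas used are only valid when every cell has the same number of observations.

```python
# Replicate yields keyed by (variety, pesticide); balanced, n = 2 per cell
cells = {
    (1, 1): [49, 39], (1, 2): [50, 55], (1, 3): [43, 38], (1, 4): [53, 48],
    (2, 1): [55, 41], (2, 2): [67, 58], (2, 3): [53, 42], (2, 4): [85, 73],
    (3, 1): [66, 68], (3, 2): [85, 92], (3, 3): [69, 62], (3, 4): [85, 99],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


varieties = sorted({v for v, _ in cells})
pesticides = sorted({p for _, p in cells})
n = 2  # replicates per cell (balanced design assumed)

grand = mean(y for ys in cells.values() for y in ys)
v_mean = {v: mean(y for (vv, _), ys in cells.items() if vv == v for y in ys)
          for v in varieties}
p_mean = {p: mean(y for (_, pp), ys in cells.items() if pp == p for y in ys)
          for p in pesticides}
cell_mean = {k: mean(ys) for k, ys in cells.items()}

ss_v = n * len(pesticides) * sum((v_mean[v] - grand) ** 2 for v in varieties)
ss_p = n * len(varieties) * sum((p_mean[p] - grand) ** 2 for p in pesticides)
ss_vp = n * sum((cell_mean[(v, p)] - v_mean[v] - p_mean[p] + grand) ** 2
                for v in varieties for p in pesticides)
ss_e = sum((y - cell_mean[k]) ** 2 for k, ys in cells.items() for y in ys)

mse = ss_e / (len(cells) * (n - 1))
df_vp = (len(varieties) - 1) * (len(pesticides) - 1)
f_vp = (ss_vp / df_vp) / mse
print(f"SS variety={ss_v:.3f} pesticide={ss_p:.3f} interaction={ss_vp:.3f} error={ss_e:.3f}")
print(f"Interaction F = {f_vp:.2f}")
```

With unequal cell sizes these shortcut formulas no longer apply, and the sums of squares depend on the order or adjustment used (Type I versus Type III).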
COMMENT:
TYPE III table = TYPE I table if the nij are the same in all
factor level combinations (balanced data).
TYPE I corresponds to sequential tests (test of term given
all terms above it) while TYPE III corresponds to partial/adjusted
tests (test of term given all other terms are in the model). It is usually recommended that you consider
the TYPE III tests.
Tukey's Studentized Range (HSD) Test for yield
Note: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha: 0.05
Error Degrees of Freedom: 12
Error Mean Square: 42.29167
Critical Value of Studentized Range: 3.77278
Minimum Significant Difference: 8.6745

Means with the same letter are not significantly different.
| Tukey Grouping | Mean | N | variety |
|---|---|---|---|
| A | 78.250 | 8 | 3 |
| B | 59.250 | 8 | 2 |
| C | 46.875 | 8 | 1 |
Comment:
The TUKEY procedure is comparing means of VARIETY levels that are pooled
across levels of the PESTICIDE factor.
This makes sense if the factors do not interact.
Tukey's Studentized Range (HSD) Test for yield
Note: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha: 0.05
Error Degrees of Freedom: 12
Error Mean Square: 42.29167
Critical Value of Studentized Range: 4.19852
Minimum Significant Difference: 11.147

Means with the same letter are not significantly different.
| Tukey Grouping | Mean | N | pesticide |
|---|---|---|---|
| A | 73.833 | 6 | 4 |
| A | 67.833 | 6 | 2 |
| B | 53.000 | 6 | 1 |
| B | 51.167 | 6 | 3 |
COMMENT: If you have significant interactions present, then you may want to analyze the study as a one-way anova. In the variety-pesticide study, you have 3*4 = 12 unique factor level combinations that define the treatments. We can reanalyze these data using a one-way anova with 12 levels. ASIDE: This is mainly a pedagogical exercise. Since the FACTORS did not interact, there is no strong reason to do this unless you want to identify the variety-pesticide combination that leads to the maximal response.
COMMENT: Variety = 1, 2, 3 and Pesticide = 1, 2, 3, 4, so defining
COMBO = 10*variety + 1*pesticide
yields a treatment with levels
11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34.
title "Factorial - Factor A=pesticide Factor B=variety";
title2 "Ott/Longnecker p. 901 - example 15.8";
title3 "redo as a one-way anova";
data dfact;
input variety pesticide yield @@;
combo = 10*variety + 1*pesticide; * coding of combined treatment;
datalines;
1 1 49 1 1 39 1 2 50 1 2 55 1 3 43 1 3 38 1 4 53 1 4 48
2 1 55 2 1 41 2 2 67 2 2 58 2 3 53 2 3 42 2 4 85 2 4 73
3 1 66 3 1 68 3 2 85 3 2 92 3 3 69 3 3 62 3 4 85 3 4 99
;
proc glm;
class combo;
model yield = combo;
means combo / tukey;
run;
Factorial - Factor A=pesticide Factor B=variety
Ott/Longnecker p. 901 - example 15.8
redo as a one-way anova
The GLM Procedure

Class Level Information
| Class | Levels | Values |
|---|---|---|
| combo | 12 | 11 12 13 14 21 22 23 24 31 32 33 34 |

Number of Observations Read: 24
Number of Observations Used: 24

Dependent Variable: yield
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 11 | 6680.458333 | 607.314394 | 14.36 | <.0001 |
| Error | 12 | 507.500000 | 42.291667 | | |
| Corrected Total | 23 | 7187.958333 | | | |

| R-Square | Coeff Var | Root MSE | yield Mean |
|---|---|---|---|
| 0.929396 | 10.58149 | 6.503204 | 61.45833 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| combo | 11 | 6680.458333 | 607.314394 | 14.36 | <.0001 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| combo | 11 | 6680.458333 | 607.314394 | 14.36 | <.0001 |
Tukey's Studentized Range (HSD) Test for yield
Note: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha: 0.05
Error Degrees of Freedom: 12
Error Mean Square: 42.29167
Critical Value of Studentized Range: 5.61464
Minimum Significant Difference: 25.819

Means with the same letter are not significantly different.
| Tukey Grouping | Mean | N | combo |
|---|---|---|---|
| A | 92.000 | 2 | 34 |
| A B | 88.500 | 2 | 32 |
| A B C | 79.000 | 2 | 24 |
| A B C D | 67.000 | 2 | 31 |
| B C D E | 65.500 | 2 | 33 |
| C D E | 62.500 | 2 | 22 |
| D E | 52.500 | 2 | 12 |
| D E | 50.500 | 2 | 14 |
| D E | 48.000 | 2 | 21 |
| D E | 47.500 | 2 | 23 |
| D E | 44.000 | 2 | 11 |
| E | 40.500 | 2 | 13 |
Suppose your data are not balanced. This will often be the case even if the design starts out as balanced (beakers break, algal blooms kill all organisms in an aquarium, etc.). What will this do to the output of a factorial analysis? HINT: compare the TYPE I and TYPE III tables.
title "Factorial - Factor A=pesticide Factor B=variety";
title2 "Ott/Longnecker p. 901 - example 15.8";
title3 "what if missing data in a couple of cells";
data dfact;
input variety pesticide yield @@;
datalines;
1 1 49 1 1 39 1 2 . 1 2 55 1 3 43 1 3 38 1 4 53 1 4 48
2 1 55 2 1 41 2 2 67 2 2 58 2 3 53 2 3 42 2 4 85 2 4 .
3 1 . 3 1 68 3 2 85 3 2 92 3 3 69 3 3 62 3 4 85 3 4 99
;
proc glm;
class pesticide variety;
model yield = variety pesticide variety*pesticide;
run;
Factorial - Factor A=pesticide Factor B=variety
Ott/Longnecker p. 901 - example 15.8
what if missing data in a couple of cells
The GLM Procedure

Class Level Information
| Class | Levels | Values |
|---|---|---|
| pesticide | 4 | 1 2 3 4 |
| variety | 3 | 1 2 3 |

Number of Observations Read: 24
Number of Observations Used: 21

Dependent Variable: yield
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 11 | 6480.809524 | 589.164502 | 12.59 | 0.0004 |
| Error | 9 | 421.000000 | 46.777778 | | |
| Corrected Total | 20 | 6901.809524 | | | |

| R-Square | Coeff Var | Root MSE | yield Mean |
|---|---|---|---|
| 0.939002 | 11.16858 | 6.839428 | 61.23810 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| variety | 2 | 4108.666667 | 2054.333333 | 43.92 | <.0001 |
| pesticide | 3 | 1864.336975 | 621.445658 | 13.29 | 0.0012 |
| pesticide*variety | 6 | 507.805882 | 84.634314 | 1.81 | 0.2035 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| variety | 2 | 3096.800000 | 1548.400000 | 33.10 | <.0001 |
| pesticide | 3 | 2096.211538 | 698.737179 | 14.94 | 0.0008 |
| pesticide*variety | 6 | 507.805882 | 84.634314 | 1.81 | 0.2035 |
I showed you an ANCOVA analysis where the assumptions were violated (the slopes were not equal when comparing Tahoe Keys to Eagle Lake with respect to log(DO) depth relationships). The next example is one where the traditional ANCOVA assumption holds.
ANCOVA
$y_{ij} = \mu + \alpha_i + \beta x_{ij} + \varepsilon_{ij}$ where $\varepsilon_{ij} \sim$ ind. $N(0, \sigma^2)$
title "ANCOVA - Factor =Fertilizer Covariate=height";
title2 "Ott/Longnecker p. 947 - example 16.1";
data dancova;
input fertilizer $ yield height @@;
datalines;
C 12.2 45 C 12.4 52 C 11.9 42 C 11.3 35 C 11.8 40 C 12.1 48
C 13.1 60 C 12.7 61 C 12.4 50 C 11.4 33
S 16.6 63 S 15.8 50 S 16.5 63 S 15.0 33 S 15.4 38 S 15.6 45
S 15.8 50 S 15.8 48 S 16.0 50 S 15.8 49
F 9.5 52 F 9.5 54 F 9.6 58 F 8.8 45 F 9.5 57 F 9.8 62
F 9.1 52 F 10.3 67 F 9.5 55 F 8.5 40
;
proc plot;
plot yield*height=fertilizer;
run;
proc glm;
class fertilizer;
model yield = height|fertilizer;
run;
proc glm;
class fertilizer;
model yield = height fertilizer;
lsmeans fertilizer / pdiff;
run;
[Plot of yield*height. Symbol is value of fertilizer. Vertical axis: yield (8 to 17); horizontal axis: height (30 to 70). NOTE: 2 obs hidden.]
Notice: The yield is linearly related to the covariate (height) in each fertilizer group.
ANCOVA - Factor =Fertilizer Covariate=height
Ott/Longnecker p. 947 - example 16.1
The GLM Procedure

Class Level Information
| Class | Levels | Values |
|---|---|---|
| fertilizer | 3 | C F S |

Number of Observations Read: 30
Number of Observations Used: 30

Dependent Variable: yield
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 5 | 214.4372247 | 42.8874449 | 2887.70 | <.0001 |
| Error | 24 | 0.3564420 | 0.0148517 | | |
| Corrected Total | 29 | 214.7936667 | | | |

| R-Square | Coeff Var | Root MSE | yield Mean |
|---|---|---|---|
| 0.998341 | 0.978334 | 0.121868 | 12.45667 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| height | 1 | 0.4721494 | 0.4721494 | 31.79 | <.0001 |
| fertilizer | 2 | 213.9038045 | 106.9519022 | 7201.30 | <.0001 |
| height*fertilizer | 2 | 0.0612708 | 0.0306354 | 2.06 | 0.1491 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| height | 1 | 6.65321124 | 6.65321124 | 447.97 | <.0001 |
| fertilizer | 2 | 6.69631934 | 3.34815967 | 225.44 | <.0001 |
| height*fertilizer | 2 | 0.06127080 | 0.03063540 | 2.06 | 0.1491 |
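The height*fertilizer line is the test of equal slopes, and it can be reproduced as an extra-sum-of-squares F test comparing the separate-slopes fit (SSE = 0.3564420 on 24 df) with the common-slope fit from the second PROC GLM in the program (SSE = 0.4177128 on 26 df, as reported in its output). A Python sketch with those values typed in by hand; the variable names are illustrative.

```python
# Full model: separate slopes per fertilizer (model yield = height|fertilizer)
sse_full, df_full = 0.3564420, 24
# Reduced model: one common slope (model yield = height fertilizer)
sse_reduced, df_reduced = 0.4177128, 26

# Extra-sum-of-squares F statistic for H0: all slopes are equal
extra_ss = sse_reduced - sse_full
extra_df = df_reduced - df_full
f_slopes = (extra_ss / extra_df) / (sse_full / df_full)
print(f"extra SS = {extra_ss:.7f} on {extra_df} df, F = {f_slopes:.2f}")
```

The extra sum of squares matches the height*fertilizer SS, and the F statistic matches the 2.06 shown in the table, so the common-slope ANCOVA model is a reasonable simplification here.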
The GLM Procedure
Dependent Variable: yield
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 214.3759539 | 71.4586513 | 4447.85 | <.0001 |
| Error | 26 | 0.4177128 | 0.0160659 | | |
| Corrected Total | 29 | 214.7936667 | | | |

| R-Square | Coeff Var | Root MSE | yield Mean |
|---|---|---|---|
| 0.998055 | 1.017537 | 0.126751 | 12.45667 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| height | 1 | 0.4721494 | 0.4721494 | 29.39 | <.0001 |
| fertilizer | 2 | 213.9038045 | 106.9519022 | 6657.08 | <.0001 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| height | 1 | 6.6932872 | 6.6932872 | 416.62 | <.0001 |
| fertilizer | 2 | 213.9038045 | 106.9519022 | 6657.08 | <.0001 |
The GLM Procedure
Least Squares Means
| fertilizer | yield LSMEAN | LSMEAN Number |
|---|---|---|
| C | 12.3141728 | 1 |
| F | 9.1700172 | 2 |
| S | 15.8858099 | 3 |
Comment:
The LSMEANS compares the yields for the different fertilizer groups
after adjusting for the covariate.
Least Squares Means for effect fertilizer
| i/j | 1 | 2 | 3 |
|---|---|---|---|
| 1 | | <.0001 | <.0001 |
| 2 | <.0001 | | <.0001 |
| 3 | <.0001 | <.0001 | |

Note: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
Finally, suppose that the factor levels were not FIXED but were sampled from some population of factor levels. This naturally leads to a random (or mixed) effects model. Here is a simple illustration.
Random Effects Models
$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ where $\alpha_i \sim N(0, \sigma_\alpha^2)$ and $\varepsilon_{ij} \sim$ ind. $N(0, \sigma^2)$
title "Random effect";
title2 "Ott/Longnecker p. 981 - example 17.1";
data draneff;
input station intensity @@;
datalines;
1 20 1 1050 1 3200 1 5600 1 50
2 4300 2 70 2 2560 2 3650 2 80
3 100 3 7700 3 8500 3 2960 3 3340
;
proc glm;
class station;
model intensity=station;
random station;
run;
ods html close;
Random effect
Ott/Longnecker p. 981 - example 17.1
The GLM Procedure

Class Level Information
| Class | Levels | Values |
|---|---|---|
| station | 3 | 1 2 3 |

Number of Observations Read: 15
Number of Observations Used: 15

Dependent Variable: intensity
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 20259573.3 | 10129786.7 | 1.38 | 0.2884 |
| Error | 12 | 87989600.0 | 7332466.7 | | |
| Corrected Total | 14 | 108249173.3 | | | |

| R-Square | Coeff Var | Root MSE | intensity Mean |
|---|---|---|---|
| 0.187157 | 94.06622 | 2707.853 | 2878.667 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| station | 2 | 20259573.33 | 10129786.67 | 1.38 | 0.2884 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| station | 2 | 20259573.33 | 10129786.67 | 1.38 | 0.2884 |

| Source | Type III Expected Mean Square |
|---|---|
| station | Var(Error) + 5 Var(station) |
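The expected mean square line suggests simple method-of-moments estimates of the two variance components: set MSE equal to Var(Error) and the station mean square equal to Var(Error) + 5 Var(station), then solve. A Python sketch using the mean squares from the output; truncating a negative estimate at zero is a common convention assumed here, not something SAS reports in this listing.

```python
# Mean squares copied from the PROC GLM output
ms_station = 10129786.67
ms_error = 7332466.67
n_per_station = 5  # coefficient on Var(station) in the expected mean square

# Method-of-moments estimates:
#   E(MS_error)   = Var(Error)
#   E(MS_station) = Var(Error) + 5 * Var(station)
var_error = ms_error
var_station = max(0.0, (ms_station - ms_error) / n_per_station)

# Share of total variance attributable to station-to-station differences
share_station = var_station / (var_station + var_error)
print(f"Var(Error) = {var_error:.1f}, Var(station) = {var_station:.1f}, "
      f"station share = {share_station:.3f}")
```

The small station share is consistent with the non-significant F test for station (P = 0.2884).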