IES 612/STA 4-573/STA 4-576
Spring 2005
Week 10 – IES612-week10-lecture.doc
Topics:
i. Fractional Factorial
ii. Response Surface Methods
iii. Nesting/Crossing and Split-Plots
iv. Fixed/Random/Mixed Effects Models
i. Fractional Factorials
Notes from Dr. R. Schaefer (C:\MyDocs\Class\IES 612\Bailer's 612 Winter 2005\Fractional Factorials.doc)
GOAL: To determine which factors, from among a very large number, are the most important to investigate further, more thoroughly and completely, in a factorial analysis.
EXAMPLE
Suppose we want to optimize the operation of a Waste Water Treatment Plant (WWTP). A WWTP is made up of three operations: the Physical Unit Operation, the Chemical Unit Operation, and the Biological Unit Operation. Each of these units is made up of five to seven processes. The Physical Unit Operation comprises screening, comminution, flow equalization, sedimentation, flotation, and granular-medium filtration. The Chemical Unit Operation comprises chemical precipitation, adsorption, disinfection, dechlorination, and other chemical applications, such as grease removal. Finally, the Biological Unit Operation comprises the activated sludge process, aerated lagoons, trickling filters, rotating biological contactors, pond stabilization, anaerobic digestion, and biological nutrient removal.
Suppose the manager of such a facility wants to optimize the facility. Above we have identified eighteen potential variables (6 + 5 + 7) that could be investigated in an experimental study. To simplify further, let's assume that each variable will be investigated at only two levels. The dilemma is that a complete investigation of these 18 factors would require a response measurement for every combination of factor levels. This would entail 2^18 or 262,144 observations! Suppose further that the WWTP needs to be run for a week at each experimental combination for the system to stabilize. Completing the experiment would then require 262,144 weeks, or roughly 5,000 years! Clearly, this is not a very effective use of time or money.
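The size of the full experiment can be checked with a few lines of arithmetic. The sketch below assumes one week per run and 52 weeks per year; the variable names are illustrative only.

```python
# Size of a full two-level factorial in 18 factors (assumption: one week per run).
n_factors = 18
runs = 2 ** n_factors            # every combination of 18 two-level factors
weeks_per_run = 1                # stabilization time assumed in the example
total_weeks = runs * weeks_per_run
total_years = total_weeks / 52   # assumption: 52 weeks per year

# Sizes of the half and quarter fractions, for comparison.
half_fraction = runs // 2
quarter_fraction = runs // 4

print(runs, total_weeks, round(total_years), half_fraction, quarter_fraction)
```

Even a quarter fraction of this design is far too large to run, which is why screening designs use much smaller fractions.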
The solution is to eliminate many, if not most, of the factors in a preliminary screening process that identifies the variables with the greatest impact on the WWTP.
The method used is known as Fractional Factorial design. Such designs attempt to accomplish two goals:
1. Reduce the number of experimental observations by observing ONLY a fraction (say, ½, ¼, etc.) of the entire 2^18 design.
2. Rather than focus on all possible effects normally investigated in a factorial design (the main effects and all interactions), fractional factorial designs allow the investigator to focus on the main effects and selected "lower-order" interaction effects, usually up to degree three. Recall that in a design with 18 factors, one COULD investigate main effects, two-way interactions, three-way interactions, four-way interactions, all the way up to the eighteen-way interaction.
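To see how quickly the number of possible effects grows, the following sketch counts the effects of each order for 18 factors. It uses only the standard binomial coefficient; the dictionary name is an illustrative choice.

```python
from math import comb

k = 18  # number of two-level factors in the WWTP example

# Number of effects of each order: order 1 = main effects, order 2 = two-way interactions, ...
effects_by_order = {order: comb(k, order) for order in range(1, k + 1)}
total_effects = sum(effects_by_order.values())

print(effects_by_order[1], effects_by_order[2], effects_by_order[3], total_effects)
```

Main effects plus two-way and three-way interactions account for 18 + 153 + 816 = 987 of the 262,143 possible effects, which is why restricting attention to low-order effects saves so many runs.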
BACKGROUND
We first need to recall or define several concepts and terms.
Two-Way ANOVA Model
Recall that in a Two-Way Model there are two factors, say A and B. If we observe the response at every combination of the levels of A and B AND the experiment is replicated (so that each combination is observed more than once), then we can investigate the effects of A and B (the MAIN EFFECTS) as well as the A*B effect (the INTERACTION EFFECT).
Multi-Way ANOVA Model
If we take the Two-Way ANOVA and extend it to a Multi-Way ANOVA, a Factorial ANOVA with many factors (say there are "p" factors), we can again investigate the MAIN EFFECTS and INTERACTIONS. In this Multi-Way case there are MANY INTERACTIONS. There are two-way interactions, in which the interaction between two of the factors is investigated. There are three-way interactions, in which the interaction among three factors is investigated, and so on, all the way to the p-way interaction, the interactive effect of all p factors.
Now, while we may be able to investigate such interactions, the interpretation of higher-order interactions (beyond two- or three-way interactions) is difficult. Hence, higher-order interactions are typically not included in Multi-Way Factorial ANOVAs, since in many cases their effects are much smaller than the main and lower-order interaction effects.
2^k ANOVA Model
In this special case of a Multi-Way Factorial Model, we have k factors, each with exactly 2 levels. The levels are usually denoted High and Low but can represent any two values; there must, however, be exactly two.
For such designs there are 2^k design points. A design point is a combination of the factors at given levels. For example, in a 2^3 design with factors labeled A, B, and C, there are 8 different design points. Further assume that the two levels of each factor are High (H) and Low (L). The eight different design points are listed in the following table. The next-to-last column gives a convention for concisely labeling each point.
| Design Point Number | Level of Factor A | Level of Factor B | Level of Factor C | Label | Y |
| 1 | L | L | L | (1) | Y(1) |
| 2 | H | L | L | a | Ya |
| 3 | L | H | L | b | Yb |
| 4 | H | H | L | ab | Yab |
| 5 | L | L | H | c | Yc |
| 6 | H | L | H | ac | Yac |
| 7 | L | H | H | bc | Ybc |
| 8 | H | H | H | abc | Yabc |
Note! DO NOT CONFUSE THE DESIGN POINT LABEL WITH AN EFFECT. For example, design point "a" is NOT the point in the design that measures the effect of factor A. It is simply a labeling convention.
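The labeling convention can be generated mechanically: a design point's label is the string of lowercase letters for the factors set to High, with "(1)" when every factor is Low. The sketch below is one way to list the points in the order used in the table (first factor changing fastest); the function name is hypothetical.

```python
from itertools import product

def design_points(factors):
    """Return (levels, label) pairs for a 2^k design, first factor varying fastest."""
    points = []
    for reversed_levels in product("LH", repeat=len(factors)):
        levels = reversed_levels[::-1]  # reverse so the first factor changes fastest
        # Label = letters of the factors at High; "(1)" if all factors are Low.
        label = "".join(f for f, lv in zip(factors, levels) if lv == "H") or "(1)"
        points.append((levels, label))
    return points

points = design_points("abc")
labels = [label for _, label in points]
for number, (levels, label) in enumerate(points, start=1):
    print(number, *levels, label)
```

Calling `design_points("abcde")` in the same way would list the 32 points of a 2^5 design.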
Now, to estimate the effect of a factor, we compare the average of the Y's at the High value of the factor to the average of the Y's at the Low value of the factor. For example, the "effect" of factor A would be
$$\hat{A} = \frac{Y_a + Y_{ab} + Y_{ac} + Y_{abc}}{4} - \frac{Y_{(1)} + Y_b + Y_c + Y_{bc}}{4}$$
Confounding in a Design
Two effects are CONFOUNDED in an analysis if the effect of one is not separable from the effect of the other.
Example: Suppose we wish to compare two brands of gasoline (say SuperAmerica and Mobil) with respect to Miles Per Gallon (MPG). In our study, we use eight automobiles (4 large sedans and 4 small compacts). The four large sedans use Mobil exclusively for a month and the 4 compacts use SuperAmerica for a month. After the month, we compare the MPG for the two brands.
In this context, note that if we observe a difference in mean MPG for the two brands, WE CANNOT CONCLUDE that the difference is due to the brands, since the SIZE of the car is confounded with gasoline brand. Any difference we observe could be due to the different brands of gas OR to the different sizes of cars. We do not know which!
Now, while confounding is usually not desirable, it can serve a purpose when the number of factors is large.
FRACTIONAL FACTORIAL DESIGN
A Fractional Factorial Design is an experiment in which there are k factors of interest, each factor having exactly two levels, and only a fraction of the entire design is used in the analysis. Various fractional designs will choose certain combinations of the design points, resulting in
1. a smaller number of observations to be run (½, ¼, etc.) and
2. certain higher-order interaction effects confounded with main effects and lower-order interaction effects.
For example, suppose we decide to take a ½ fraction of the 2^3 design seen earlier, with our goal of investigating the Main Effects of A, B, and C. Further suppose that the 4 (= ½ of 8) design points we choose are 1, 4, 5, and 8.
| Design Point Number | Level of Factor A | Level of Factor B | Level of Factor C | Label | Y |
| 1 | L | L | L | (1) | Y(1) |
| 4 | H | H | L | ab | Yab |
| 5 | L | L | H | c | Yc |
| 8 | H | H | H | abc | Yabc |
While we need only 4 observations in this fractional design rather than the original 8, we have not met our goal; we have confounded the effects of factors A and B. Notice that when we estimate the effect of factor A we obtain:
$$\hat{A} = \frac{Y_{ab} + Y_{abc}}{2} - \frac{Y_{(1)} + Y_c}{2}$$
And if we estimate the effect of factor B we obtain:
$$\hat{B} = \frac{Y_{ab} + Y_{abc}}{2} - \frac{Y_{(1)} + Y_c}{2}$$
Note that our estimates of the effects of factors A and B are the same. Hence, if we use the four design points above, we have confounded the main effects of factors A and B, which is not a desirable result.
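The confounding of A and B in this half fraction can be seen by coding Low as -1 and High as +1 and comparing the design columns. The sketch below does this for the points (1), ab, c, and abc; the response values are hypothetical and serve only to illustrate the arithmetic.

```python
# Half fraction {(1), ab, c, abc} of the 2^3 design, coded -1 = Low, +1 = High.
fraction = {
    "(1)": (-1, -1, -1),
    "ab": (1, 1, -1),
    "c": (-1, -1, 1),
    "abc": (1, 1, 1),
}
# Hypothetical responses (not from the notes), used only to show the calculation.
y = {"(1)": 10.0, "ab": 18.0, "c": 12.0, "abc": 21.0}

def effect(factor_index):
    """Average response at High minus average response at Low for one factor."""
    high = [y[p] for p, lv in fraction.items() if lv[factor_index] == 1]
    low = [y[p] for p, lv in fraction.items() if lv[factor_index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

effect_a, effect_b, effect_c = effect(0), effect(1), effect(2)
col_a = [lv[0] for lv in fraction.values()]
col_b = [lv[1] for lv in fraction.values()]

print(col_a == col_b, effect_a, effect_b, effect_c)
```

Because the A and B columns are identical, the two estimates agree for any set of responses, not just the hypothetical ones used here.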
The goal of fractional factorial designs
is to choose the design points to be used in the experiment, so that effects of
interest (main and lower-order interactions) are confounded with higher order
interactions which are usually small and relatively uninteresting in this
preliminary screening phase of the analysis.
DESIGN RESOLUTIONS
Fractional factorial designs are categorized by their Resolution. The resolution of a fractional factorial design determines the degree of confounding. The most common resolutions are III, IV, and V, defined as follows:
Resolution III: No main effect is confounded with another main effect. However, main effects are confounded with two-way interactions, and two-way interactions may be confounded with other two-way interactions. Higher-order interactions are also confounded with the main effects and two-way interaction effects.
Resolution IV: No main effect is confounded with another main effect or with any two-way interaction. Two-way interactions are confounded with other two-way interactions.
Resolution V: No main effect or two-way interaction is confounded with any other main effect or two-way interaction; these effects are confounded only with three-way and higher-order interactions.
Of these three, the "best" design in terms of confounding is a resolution V design.
EXAMPLE
Suppose now that, instead of the above design, we use the following design. Like the one above, it has only 4 points, so this design is also a ½ fractional factorial of the 2^3, or a 2^(3-1) fractional factorial.
| Design Point Number | Level of Factor A | Level of Factor B | Level of Factor C | Label | Y |
| 2 | H | L | L | a | Ya |
| 3 | L | H | L | b | Yb |
| 5 | L | L | H | c | Yc |
| 8 | H | H | H | abc | Yabc |
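Notice that when we estimate the effects of factors A, B, and C we obtain:
$$\hat{A} = \frac{Y_a + Y_{abc}}{2} - \frac{Y_b + Y_c}{2}$$
$$\hat{B} = \frac{Y_b + Y_{abc}}{2} - \frac{Y_a + Y_c}{2}$$
$$\hat{C} = \frac{Y_c + Y_{abc}}{2} - \frac{Y_a + Y_b}{2}$$
Note that each is different, so we have not confounded any main effect with any other main effect. But now the question arises: what effect is confounded with each main effect?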
To investigate this question, let’s
consider our present design of four observations in a simple Two-Way ANOVA with
factors A and B. Presenting the data in
terms of 2x2 table of factors A and B alone we obtain:
| | B = L | B = H |
| A = L | Yc | Yb |
| A = H | Ya | Yabc |
Notice that the Yc value is an observation where Factor C was observed at its High value, while Factors A and B were observed at their Low values.
Further recall that an interaction effect between two factors is said to exist if the effect of one factor differs across the levels of the other factor. In the table above, A and B would have an interactive effect if the effect of B (response at High B minus response at Low B) is different for Low and High values of A.
The estimate of this interaction effect between A and B is
$$\widehat{AB} = \frac{(Y_{abc} - Y_a) - (Y_b - Y_c)}{2} = \frac{Y_c + Y_{abc}}{2} - \frac{Y_a + Y_b}{2}$$
Comparing this value to the estimated C effect, we note that they are identical. So in this case, the C effect is confounded with the AB interaction. Since we have a main effect confounded with a two-way interaction, the four design points we used form a Resolution III design. Unfortunately, when k = 3, this is the best a fraction of the design can do in terms of resolution. For larger values of k, one can find (and should prefer) designs with higher resolution.
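The full alias pattern of the half fraction {a, b, c, abc} can be checked by coding Low as -1 and High as +1: two effects are confounded when their coded columns are identical on the design points. The sketch below is a minimal check under that coding; the helper name is hypothetical.

```python
# Half fraction {a, b, c, abc} of the 2^3 design, coded -1 = Low, +1 = High.
fraction = {
    "a": (1, -1, -1),
    "b": (-1, 1, -1),
    "c": (-1, -1, 1),
    "abc": (1, 1, 1),
}
# One coded column per factor, in design-point order.
col = {name: [lv[i] for lv in fraction.values()] for i, name in enumerate("ABC")}

def product_column(f1, f2):
    """Coded column of the two-way interaction f1*f2."""
    return [x * y for x, y in zip(col[f1], col[f2])]

# Each main effect is aliased with the interaction of the other two factors.
c_aliased_with_ab = product_column("A", "B") == col["C"]
b_aliased_with_ac = product_column("A", "C") == col["B"]
a_aliased_with_bc = product_column("B", "C") == col["A"]
main_effects_distinct = col["A"] != col["B"] != col["C"] != col["A"]

print(c_aliased_with_ab, b_aliased_with_ac, a_aliased_with_bc, main_effects_distinct)
```

All three main effects are separable from one another but each is aliased with a two-way interaction, which is exactly the Resolution III pattern.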
While your present text does not contain such designs, many more advanced design texts contain tables of designs of varying resolution for different values of k. One such text is:
Design and Analysis of Experiments, Douglas C. Montgomery, Wiley.
Alternatively, if you provide SAS with the value of k and the desired resolution of your design, SAS will, if such a design is possible, provide you with the design points you need to observe. As we saw above, if k = 3, no fractional designs of resolution IV or higher are possible. However, as k, the number of factors, increases, higher-resolution designs become possible.
SUMMARY
Fractional Factorial Designs are used to screen a large number of factors and identify those factors to be used in a traditional factorial design. Once the important factors are identified, a factorial model containing only these factors is fit to the data from the fractional design, and the effects can then be formally tested using ANOVA techniques.
FRACTIONAL FACTORIAL ANALYSIS USING SAS
OPTIONS LS=110 PS=60 NODATE PAGENO=1;
TITLE 'FRACTIONAL.SAS';
TITLE2 'MM EXAMPLE 4.2';
DATA ONE;
  INPUT RESIST A B C D E;
  CARDS;
 15.1 -1 -1 -1 -1  1
 20.6  1 -1 -1 -1 -1
 68.7 -1  1 -1 -1 -1
101.0  1  1 -1 -1  1
 32.9 -1 -1  1 -1 -1
 46.1  1 -1  1 -1  1
 87.5 -1  1  1 -1  1
119.0  1  1  1 -1 -1
 11.3 -1 -1 -1  1 -1
 19.6  1 -1 -1  1  1
 62.1 -1  1 -1  1  1
103.2  1  1 -1  1 -1
 27.1 -1 -1  1  1  1
 40.3  1 -1  1  1 -1
 87.7 -1  1  1  1 -1
128.3  1  1  1  1  1
;
PROC PRINT;
PROC FACTEX;
  TITLE3 'STEP 1: CREATING A FRACTIONAL FACTORIAL DESIGN';
  FACTORS A B C D E;
  SIZE DESIGN=16;
  MODEL RESOLUTION=5;
  EXAMINE ALIASING;
PROC GLM DATA=ONE;
  TITLE3 'STEP 2: ANALYZING THE FRACTIONAL FACTORIAL USING PROC GLM';
  MODEL RESIST=A|B|C|D|E @2/SOLUTION;
PROC GLM DATA=ONE;
  TITLE3 'STEP 3: ANALYZING ONLY THE IMPORTANT EFFECTS';
  MODEL RESIST=A B A*B C/SOLUTION;
RUN;
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FRACTIONAL.SAS (page 1)
MM EXAMPLE 4.2
Obs   RESIST    A    B    C    D    E
  1     15.1   -1   -1   -1   -1    1
  2     20.6    1   -1   -1   -1   -1
  3     68.7   -1    1   -1   -1   -1
  4    101.0    1    1   -1   -1    1
  5     32.9   -1   -1    1   -1   -1
  6     46.1    1   -1    1   -1    1
  7     87.5   -1    1    1   -1    1
  8    119.0    1    1    1   -1   -1
  9     11.3   -1   -1   -1    1   -1
 10     19.6    1   -1   -1    1    1
 11     62.1   -1    1   -1    1    1
 12    103.2    1    1   -1    1   -1
 13     27.1   -1   -1    1    1    1
 14     40.3    1   -1    1    1   -1
 15     87.7   -1    1    1    1   -1
 16    128.3    1    1    1    1    1
FRACTIONAL.SAS
2
MM EXAMPLE 4.2
STEP 1: CREATING A FRACTIONAL FACTORIAL DESIGN
The FACTEX Procedure
Aliasing Structure
A
B
C
D
E
A*B = C*D*E
A*C = B*D*E
A*D = B*C*E
A*E = B*C*D
B*C = A*D*E
B*D = A*C*E
B*E = A*C*D
C*D = A*B*E
C*E = A*B*D
D*E = A*B*C
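The aliasing structure above can be reproduced by hand. The sketch below assumes the defining relation I = ABCDE, which is consistent with the data listing (in every run, E equals the product A*B*C*D); under that assumption, the alias of an effect is the set of letters in ABCDE that do not appear in the effect. The function name is illustrative.

```python
from itertools import combinations

factors = "ABCDE"
defining_word = set("ABCDE")  # assumption: defining relation I = ABCDE

def alias(effect):
    """Alias of an effect under I = ABCDE: letters of the word not in the effect."""
    return "".join(sorted(defining_word ^ set(effect)))

two_way = ["".join(pair) for pair in combinations(factors, 2)]
aliases = {effect: alias(effect) for effect in two_way}

for effect, partner in aliases.items():
    print("*".join(effect), "=", "*".join(partner))

# Main effects are aliased with four-way interactions, e.g. A with B*C*D*E.
main_effect_alias = alias("A")
```

Since the shortest word in the defining relation has five letters, main effects are aliased only with four-way interactions and two-way interactions only with three-way interactions, which is the Resolution V pattern requested in the FACTEX step.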
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FRACTIONAL.SAS (page 3)
MM EXAMPLE 4.2
STEP 2: ANALYZING THE FRACTIONAL FACTORIAL USING PROC GLM
The GLM Procedure
Number of Observations Read 16
Number of Observations Used 16
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FRACTIONAL.SAS (page 4)
MM EXAMPLE 4.2
STEP 2: ANALYZING THE FRACTIONAL FACTORIAL USING PROC GLM
The GLM Procedure
Dependent Variable: RESIST
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             15      23260.21937    1550.68129         .        .
Error              0          0.00000             .
Corrected Total   15      23260.21937
R-Square   Coeff Var   Root MSE   RESIST Mean
1.000000           .          .      60.65625
Source   DF     Type I SS   Mean Square   F Value   Pr > F
A         1    2155.28062    2155.28062         .        .
B         1   18530.01563   18530.01563         .        .
A*B       1     693.00563     693.00563         .        .
C         1    1749.33063    1749.33063         .        .
A*C       1       7.98063       7.98063         .        .
B*C       1       3.70563       3.70563         .        .
D         1       7.98062       7.98062         .        .
A*D       1      26.78063      26.78063         .        .
B*D       1      28.89063      28.89063         .        .
C*D       1       3.15062       3.15062         .        .
E         1       0.60063       0.60063         .        .
A*E       1      26.78063      26.78063         .        .
B*E       1       0.39063       0.39063         .        .
C*E       1      14.25063      14.25063         .        .
D*E       1      12.07563      12.07563         .        .
Source   DF   Type III SS   Mean Square   F Value   Pr > F
A         1    2155.28063    2155.28063         .        .
B         1   18530.01563   18530.01563         .        .
A*B       1     693.00563     693.00563         .        .
C         1    1749.33063    1749.33063         .        .
A*C       1       7.98063       7.98063         .        .
B*C       1       3.70563       3.70563         .        .
D         1       7.98062       7.98062         .        .
A*D       1      26.78063      26.78063         .        .
B*D       1      28.89063      28.89063         .        .
C*D       1       3.15062       3.15062         .        .
E         1       0.60062       0.60062         .        .
A*E       1      26.78063      26.78063         .        .
B*E       1       0.39063       0.39063         .        .
C*E       1      14.25063      14.25063         .        .
D*E       1      12.07563      12.07563         .        .
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The GLM Procedure
Dependent Variable: RESIST
Parameter Estimate Standard Error t Value Pr > |t|
Intercept 60.65625000 . . .
A 11.60625000 . . .
B 34.03125000 . . .
A*B 6.58125000 . . .
C 10.45625000 . . .
A*C 0.70625000 . . .
B*C 0.48125000 . . .
D -0.70625000 . . .
A*D 1.29375000 . . .
B*D 1.34375000 . . .
C*D 0.44375000 . . .
E 0.19375000 . . .
A*E 1.29375000 . . .
B*E -0.15625000 . . .
C*E 0.94375000 . . .
D*E -0.86875000 . . .
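Because the design columns are coded -1/+1 and are mutually orthogonal, each estimate in the table above is simply the sum of (coded column times response) divided by the number of runs, and the corresponding sum of squares is 16 times the squared estimate. The sketch below recomputes a few of them from the data listing; it is a hand check under that orthogonality argument, not a reproduction of what PROC GLM does internally.

```python
# (RESIST, A, B, C, D, E) for the 16 runs, copied from the data step.
rows = [
    (15.1, -1, -1, -1, -1, 1), (20.6, 1, -1, -1, -1, -1),
    (68.7, -1, 1, -1, -1, -1), (101.0, 1, 1, -1, -1, 1),
    (32.9, -1, -1, 1, -1, -1), (46.1, 1, -1, 1, -1, 1),
    (87.5, -1, 1, 1, -1, 1), (119.0, 1, 1, 1, -1, -1),
    (11.3, -1, -1, -1, 1, -1), (19.6, 1, -1, -1, 1, 1),
    (62.1, -1, 1, -1, 1, 1), (103.2, 1, 1, -1, 1, -1),
    (27.1, -1, -1, 1, 1, 1), (40.3, 1, -1, 1, 1, -1),
    (87.7, -1, 1, 1, 1, -1), (128.3, 1, 1, 1, 1, 1),
]
names = "ABCDE"
n = len(rows)

def coefficient(effect):
    """Estimate for an effect such as 'A' or 'AB' with -1/+1 coded columns."""
    total = 0.0
    for y, *x in rows:
        sign = 1
        for letter in effect:
            sign *= x[names.index(letter)]
        total += sign * y
    return total / n

intercept = sum(y for y, *_ in rows) / n
estimates = {e: coefficient(e) for e in ("A", "B", "AB", "C")}
sum_sq_a = n * estimates["A"] ** 2  # sum of squares for A

print(intercept, estimates, sum_sq_a)
```

Note that with this coding each estimate is half of the "High average minus Low average" effect defined earlier in these notes.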
FRACTIONAL.SAS (page 6)
MM EXAMPLE 4.2
STEP 3: ANALYZING ONLY THE IMPORTANT EFFECTS
The GLM Procedure
Number of Observations Read 16
Number of Observations Used 16
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FRACTIONAL.SAS (page 7)
MM EXAMPLE 4.2
STEP 3: ANALYZING ONLY THE IMPORTANT EFFECTS
The GLM Procedure
Dependent Variable: RESIST
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4      23127.63250    5781.90813    479.69   <.0001
Error             11        132.58687      12.05335
Corrected Total   15      23260.21937
R-Square   Coeff Var   Root MSE   RESIST Mean
0.994300    5.723720   3.471794      60.65625
Source   DF     Type I SS   Mean Square   F Value   Pr > F
A         1    2155.28062    2155.28062    178.81   <.0001
B         1   18530.01563   18530.01563   1537.33   <.0001
A*B       1     693.00563     693.00563     57.49   <.0001
C         1    1749.33063    1749.33063    145.13   <.0001
Source   DF   Type III SS   Mean Square   F Value   Pr > F
A         1    2155.28063    2155.28063    178.81   <.0001
B         1   18530.01563   18530.01563   1537.33   <.0001
A*B       1     693.00563     693.00563     57.49   <.0001
C         1    1749.33063    1749.33063    145.13   <.0001
Parameter      Estimate   Standard Error   t Value   Pr > |t|
Intercept   60.65625000       0.86794845     69.88     <.0001
A           11.60625000       0.86794845     13.37     <.0001
B           34.03125000       0.86794845     39.21     <.0001
A*B          6.58125000       0.86794845      7.58     <.0001
C           10.45625000       0.86794845     12.05     <.0001
ii. Response Surface Methods
From Dr. R. Noble
Goal: Determine factor level combination that leads to an optimal response.
Natural units: X1: L1 to H1, X2: L2 to H2
Conversion to coded units from natural units: $C_j = \dfrac{2X_j - L_j - H_j}{H_j - L_j}$
Conversion to natural units from coded units: $X_j = \dfrac{C_j (H_j - L_j) + L_j + H_j}{2}$
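The two conversions can be written as a pair of small functions. The sketch below uses the same formula as the `code1` and `code2` assignments in the SAS data steps in this section; the function names are hypothetical.

```python
def to_coded(x, low, high):
    """Map a natural value to coded units: low -> -1, midpoint -> 0, high -> +1."""
    return (2 * x - low - high) / (high - low)

def to_natural(c, low, high):
    """Inverse mapping from coded units back to natural units."""
    return (c * (high - low) + low + high) / 2

# Phase 1 ranges used in this section: x1 from 11 to 16, x2 from 25 to 32.
coded_corners = [to_coded(11, 11, 16), to_coded(16, 11, 16)]
coded_center = to_coded(13.5, 11, 16)
x2_center = to_natural(0, 25, 32)

print(coded_corners, coded_center, x2_center)
```

The design center (13.5, 28.5) maps to (0, 0) in coded units, which is why the center runs carry coded values of zero.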
C1   C2   d   X1            X2
-1   -1   0   L1            L2
 1   -1   0   H1            L2
-1    1   0   L1            H2
 1    1   0   H1            H2
 0    0   1   0.5(L1+H1)    0.5(L2+H2)
 0    0   1   0.5(L1+H1)    0.5(L2+H2)
 0    0   1   0.5(L1+H1)    0.5(L2+H2)
 0    0   1   0.5(L1+H1)    0.5(L2+H2)
Model
$$Y_i = \beta_0 + \beta_1 C_{1i} + \beta_2 C_{2i} + \beta_3 d_i + \varepsilon_i, \quad \text{where } \varepsilon_i \sim N(0, \sigma^2)$$
/*
x1: 11 to 16
x2: 25 to 32 */
data phase1;
  input x1 x2 center y;
  code1 = (2*x1 - 11 - 16)/(16 - 11);
  code2 = (2*x2 - 25 - 32)/(32 - 25);
  cards;
11   25   0 7.52
11   32   0 5.01
16   25   0 6.01
16   32   0 9.27
13.5 28.5 1 5.45
13.5 28.5 1 8.6
13.5 28.5 1 7.11
13.5 28.5 1 5.12
;
proc reg;
  model y = code1 code2 center;
run;
------------------------------------------------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3          2.32386       0.77462      0.19   0.8964
Error              4         16.09263       4.02316
Corrected Total    7         18.41649
Root MSE          2.00578   R-Square    0.1262
Dependent Mean    6.76125   Adj R-Sq   -0.5292
Coeff Var        29.66583
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              6.95250          1.00289      6.93     0.0023
code1        1              0.68750          1.00289      0.69     0.5307
code2        1              0.18750          1.00289      0.19     0.8608
center       1             -0.38250          1.41830     -0.27     0.8007
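Gradient vector (estimated slopes in coded units, scaled to unit length):
$$\frac{(b_1,\ b_2)}{\sqrt{b_1^2 + b_2^2}} = \frac{(0.6875,\ 0.1875)}{\sqrt{0.6875^2 + 0.1875^2}} = [0.9648,\ 0.2631]$$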
Multiplier   C1       C2       X1     X2     Y
2            1.9295   0.5262   18.3   30.3   14.83
3            2.8943   0.7894   20.7   31.3   11.75
4            3.8591   1.0525   23.1   32.2   13.39
5            4.8238   1.3156   25.6   33.1   20.23
6            5.7886   1.5787   28.0   34.0   31.15
7            6.7533   1.8418   30.4   34.9   38.81
8            7.7181   2.1049   32.8   35.9   43.55
9            8.6829   2.3681   35.2   36.8   41.49
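The path-of-steepest-ascent table can be reproduced from the Phase 1 slope estimates. The sketch below scales the estimated slopes (0.6875 and 0.1875, from the PROC REG output) to unit length, steps along that direction in coded units, and converts each step back to natural units using the Phase 1 ranges. Function and variable names are illustrative; the observed Y values come from running the experiment and cannot be computed here.

```python
from math import hypot

b1, b2 = 0.68750, 0.18750          # Phase 1 slope estimates for code1 and code2
norm = hypot(b1, b2)
direction = (b1 / norm, b2 / norm)  # unit-length gradient in coded units

def to_natural(c, low, high):
    """Convert a coded value back to natural units."""
    return (c * (high - low) + low + high) / 2

path = []
for multiplier in range(2, 10):
    c1 = multiplier * direction[0]
    c2 = multiplier * direction[1]
    x1 = to_natural(c1, 11, 16)     # Phase 1 range for x1
    x2 = to_natural(c2, 25, 32)     # Phase 1 range for x2
    path.append((multiplier, round(c1, 4), round(c2, 4), round(x1, 1), round(x2, 1)))

for step in path:
    print(*step)
```

In the table above the observed response peaks at multiplier 8 and then declines, so that point becomes the center of the next design.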

Phase 2 center: X1 = 32.8, X2 = 35.9
/*
x1: 32.8 +/- 4
x2: 35.9 +/- 4 */
data phase2;
  input x1 x2 center y;
  /* coded units use the actual design ranges 28.8 to 36.8 and 31.9 to 39.9 */
  code1 = (2*x1 - 28.8 - 36.8)/(36.8 - 28.8);
  code2 = (2*x2 - 31.9 - 39.9)/(39.9 - 31.9);
  cards;
32.8 35.9 1 43.55
28.8 31.9 0 21.89
36.8 31.9 0 17.22
28.8 39.9 0 66.87
36.8 39.9 0 32.89
32.8 35.9 1 41.64
32.8 35.9 1 47.43
32.8 35.9 1 44.54
;
proc reg;
  model y = code1 code2 center;
run;
------------------------------------------------------------------------------
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3       1476.32676     492.10892      8.48   0.0330
Error              4        232.26122      58.06531
Corrected Total    7       1708.58799
Root MSE          7.62006   R-Square   0.8641
Dependent Mean   39.50375   Adj R-Sq   0.7621
Coeff Var        19.28946
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             34.71750          3.81003      9.11     0.0008
code1        1             -9.66250          3.81003     -2.54     0.0642
code2        1             15.16250          3.81003      3.98     0.0164
center       1              9.57250          5.38820      1.78     0.1503

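Gradient vector (estimated slopes in coded units, scaled to unit length):
$$\frac{(b_1,\ b_2)}{\sqrt{b_1^2 + b_2^2}} = \frac{(-9.6625,\ 15.1625)}{\sqrt{9.6625^2 + 15.1625^2}} = [-0.5374,\ 0.8433]$$
Multiplier   C1        C2       X1     X2     Y
2            -1.0748   1.6866   28.5   42.6   70.84
3            -1.6122   2.5300   26.4   46.0   67.84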

Phase 3 center: X1 = 28.5, X2 = 42.6
/*
x1: 28.5 +/- 4
x2: 42.6 +/- 4 */
data phase3;
  input x1 x2 center y;
  code1 = (2*x1 - 24.5 - 32.5)/(32.5 - 24.5);
  code2 = (2*x2 - 38.6 - 46.6)/(46.6 - 38.6);
  cards;
28.5 42.6 1 70.84
24.5 38.6 0 34.47
32.5 38.6 0 57.62
24.5 46.6 0 53.78
32.5 46.6 0 47.79
28.5 42.6 1 69.68
28.5 42.6 1 75.45
28.5 42.6 1 71.08
;
proc reg;
  model y = code1 code2 center;
run;
------------------------------------------------------------------------------
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3       1186.29551     395.43184      6.83   0.0472
Error              4        231.53617      57.88404
Corrected Total    7       1417.83169
Root MSE          7.60816   R-Square   0.8367
Dependent Mean   60.08875   Adj R-Sq   0.7142
Coeff Var        12.66153
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             48.41500          3.80408     12.73     0.0002
code1        1              4.29000          3.80408      1.13     0.3225
code2        1              2.37000          3.80408      0.62     0.5670
center       1             23.34750          5.37978      4.34     0.0123
Significant curvature
Run axial runs in order to estimate quadratic effects
C1        C2        X1     X2
-1.4142    0        22.8   42.6
 1.4142    0        34.2   42.6
 0        -1.4142   28.5   36.9
 0         1.4142   28.5   48.3
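The axial points in the table sit at a coded distance of sqrt(2), about 1.4142, from the design center along each axis. The sketch below converts them to natural units, assuming the Phase 3 center (28.5, 42.6) and a half-range of 4 in each variable, as stated in the Phase 3 data step comment.

```python
from math import sqrt

center = (28.5, 42.6)   # Phase 3 design center in natural units
half_range = 4.0        # one coded unit corresponds to 4 natural units
alpha = sqrt(2)         # coded axial distance used in the table

coded_axial = [(-alpha, 0.0), (alpha, 0.0), (0.0, -alpha), (0.0, alpha)]
natural_axial = [
    (round(center[0] + c1 * half_range, 1), round(center[1] + c2 * half_range, 1))
    for c1, c2 in coded_axial
]

for (c1, c2), (x1, x2) in zip(coded_axial, natural_axial):
    print(round(c1, 4), round(c2, 4), x1, x2)
```

Adding these four runs to the factorial and center runs gives each variable more than two distinct levels, which is what makes the pure quadratic terms estimable.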
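Model
$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i}^2 + \beta_4 x_{1i} x_{2i} + \beta_5 x_{2i}^2 + \varepsilon_i, \quad \text{where } \varepsilon_i \sim N(0, \sigma^2)$$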
data quadratic;
  input x1 x2 y;
  cards;
28.5 42.6 70.84
24.5 38.6 34.47
32.5 38.6 57.62
24.5 46.6 53.78
32.5 46.6 47.79
28.5 42.6 69.68
28.5 42.6 75.45
28.5 42.6 71.08
22.8 42.6 30.69
34.2 42.6 41.58
28.5 36.9 46.21
28.5 48.3 63.77
;
proc rsreg;
  model y = x1 x2 / nocode;
run;
------------------------------------------------------------------------------
The RSREG Procedure
Response Surface for Variable y
Response Mean              55.246667
Root MSE                    3.186475
R-Square                      0.9761
Coefficient of Variation      5.7677
Regression     DF   Type I Sum of Squares   R-Square   F Value   Pr > F
Linear          2              280.145763     0.1099     13.80   0.0057
Quadratic       2             1996.161673     0.7830     98.30   <.0001
Crossproduct    1              212.284900     0.0833     20.91   0.0038
Total Model     5             2488.592337     0.9761     49.02   <.0001
Residual      DF   Sum of Squares   Mean Square
Total Error    6        60.921730     10.153622
Parameter   DF       Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   -2285.208774       205.175353    -11.14     <.0001
x1           1      80.792050         6.146719     13.14     <.0001
x2           1      54.857762         7.222680      7.60     0.0003
x1*x1        1      -1.059339         0.077885    -13.60     <.0001
x2*x1        1      -0.455313         0.099577     -4.57     0.0038
x2*x2        1      -0.479006         0.077885     -6.15     0.0008
Factor   DF   Sum of Squares   Mean Square   F Value   Pr > F
x1        3      2223.109257    741.036419     72.98   <.0001
x2        3       744.013683    248.004561     24.43   0.0009
The RSREG Procedure
Canonical Analysis of Response Surface
Factor   Critical Value
x1            28.765411
x2            43.590782
Predicted value at stationary point: 72.445880
                   Eigenvectors
Eigenvalues        x1          x2
  -0.400358   -0.326531    0.945186
  -1.137986    0.945186    0.326531
Stationary point is a maximum.
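The stationary point reported in the canonical analysis can be recovered from the parameter estimates by setting both partial derivatives of the fitted quadratic to zero and solving the resulting 2x2 linear system. The sketch below does this with the rounded estimates printed above, so its results agree with the SAS values only to a few decimal places; variable names are illustrative.

```python
from math import sqrt

# Rounded estimates from the RSREG parameter table.
b0, b1, b2 = -2285.208774, 80.792050, 54.857762
b11, b12, b22 = -1.059339, -0.455313, -0.479006

# Setting the gradient to zero gives:
#   2*b11*x1 +   b12*x2 = -b1
#     b12*x1 + 2*b22*x2 = -b2
det = (2 * b11) * (2 * b22) - b12 * b12
x1_star = ((-b1) * (2 * b22) - b12 * (-b2)) / det
x2_star = ((2 * b11) * (-b2) - b12 * (-b1)) / det

y_star = (b0 + b1 * x1_star + b2 * x2_star
          + b11 * x1_star ** 2 + b12 * x1_star * x2_star + b22 * x2_star ** 2)

# Eigenvalues of the symmetric matrix [[b11, b12/2], [b12/2, b22]].
half_trace = (b11 + b22) / 2
radius = sqrt(((b11 - b22) / 2) ** 2 + (b12 / 2) ** 2)
eigenvalues = (half_trace + radius, half_trace - radius)

print(round(x1_star, 3), round(x2_star, 3), round(y_star, 2), eigenvalues)
```

Both eigenvalues are negative, so the fitted surface curves downward in every direction from the stationary point, which is why it is classified as a maximum.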
Summary:
D1 = Design 1
P1 = Path of steepest ascent 1
D2 = Design 2
P2 = Path of steepest ascent 2

D3 = Design 3
Truth (always unknown in practice)


ε ~ N(0, 4)

Predicted max @ X1 = 28.765411, X2 = 43.590782
Actual max @ X1 = 28.80, X2 = 43.97
iii. Nesting/Crossing and Split-Plot Designs
Factorial treatment structures represent CROSSING of
factors – all levels of one factor are combined with all levels of other
factors.
If Factors are NESTED, then levels of some factor
occur within levels of another factor.
Classic example: Split-Plot Designs (OL 17.6, p. 1014+)
One treatment can only be applied to a WHOLE PLOT, while other treatments can be randomly assigned to SUB-PLOTS.
Factor A: fertilizer (2 levels, A1 and A2) – can only be applied to WHOLE PLOTS
Factor T: varieties (3 levels, T1, T2, and T3) – can be randomly assigned to SUB-PLOTS
Fertilizer applied to WHOLE PLOTS
| A=A1 | A=A2 | A=A2 | A=A1 |
| T2 | T3 | T1 | T3 |
| T1 | T2 | T3 | T1 |
| T3 | T1 | T2 | T2 |
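Treatments/varieties randomly assigned to SUB-PLOTS within the WHOLE PLOTS
Notice that the VARIETY sub-plots are NESTED within the FERTILIZER whole plots. A model for this design (p. 1015) is
$$y_{ijk} = \mu + \alpha_i + \tau_j + (\alpha\tau)_{ij} + \delta_{ik} + \varepsilon_{ijk}$$
where $\sum_i \alpha_i = \sum_j \tau_j = \sum_i (\alpha\tau)_{ij} = \sum_j (\alpha\tau)_{ij} = 0$, $\delta_{ik} \sim$ ind. $N(0, \sigma_\delta^2)$, and $\varepsilon_{ijk} \sim$ ind. $N(0, \sigma_\varepsilon^2)$ [$\delta_{ik}$ and $\varepsilon_{ijk}$ independent of each other]. Here k indexes the whole plots that receive fertilizer level i.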
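* Error terms may change for testing hypotheses about model parameters:
Between WHOLE PLOTS: H0: $\alpha_1 = \alpha_2 = \dots = \alpha_a = 0$, tested by Fobs = MSA/MS(A), where MS(A) denotes the whole-plot error mean square
Within WHOLE PLOTS: H0: $(\alpha\tau)_{ij} = 0$ for all i and j, tested by Fobs = MSAT/MSE
Within WHOLE PLOTS: H0: $\tau_1 = \tau_2 = \dots = \tau_t = 0$, tested by Fobs = MST/MSE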
*
You could also define WHOLE PLOTS within BLOCKS as in Example 17.11 (p. 1017).
* This
type of design might be appropriate for the analysis of artificial mesocosms/ponds. For
example, you may need to apply a sediment treatment to an entire pond that is
then subdivided into sections for other treatments.
* MORE
MAY BE ADDED
iv. Fixed/Random/Mixed Effects Models
Fixed Effect = levels of the Factor are of particular interest for inference [how do mean responses differ at different factor levels?]
Random Effect = levels of the Factor are selected from a population of possible factor levels; the levels themselves are not of particular interest [is this factor an important source of variation in the distribution of responses?]
Fixed Effects models = comprised of ONLY Fixed factors
e.g. $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ where $\sum_i \alpha_i = 0$ and $\varepsilon_{ij} \sim$ ind. $N(0, \sigma_\varepsilon^2)$
H0: $\alpha_1 = \alpha_2 = \dots = \alpha_t = 0$
$E[y_{ij}] = \mu + \alpha_i$
$V[y_{ij}] = \sigma_\varepsilon^2$
Random Effects models = comprised of ONLY Random factors
e.g. $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ where $\alpha_i \sim N(0, \sigma_\alpha^2)$ and $\varepsilon_{ij} \sim$ ind. $N(0, \sigma_\varepsilon^2)$
H0: $\sigma_\alpha^2 = 0$
$E[y_{ij}] = \mu$
$V[y_{ij}] = \sigma_\alpha^2 + \sigma_\varepsilon^2$
Mixed Effects models = contain both fixed and random factors
Random Effects Models
$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ where $\alpha_i \sim N(0, \sigma_\alpha^2)$ and $\varepsilon_{ij} \sim$ ind. $N(0, \sigma_\varepsilon^2)$
title "Random effect";
title2 "Ott/Longnecker p. 981 - example 17.1";
data draneff;
  input station intensity @@;
  datalines;
1 20 1 1050 1 3200 1 5600 1 50
2 4300 2 70 2 2560 2 3650 2 80
3 100 3 7700 3 8500 3 2960 3 3340
;
proc glm;
  class station;
  model intensity=station;
  random station;
run;
ods html close;
Random effect
Ott/Longnecker p. 981 - example 17.1
The GLM Procedure
Class Level Information
| Class | Levels | Values |
| Station | 3 | 1 2 3 |
| Number of Observations Read | 15 |
| Number of Observations Used | 15 |
The GLM Procedure
Dependent Variable: intensity
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 2 | 20259573.3 | 10129786.7 | 1.38 | 0.2884 |
| Error | 12 | 87989600.0 | 7332466.7 | | |
| Corrected Total | 14 | 108249173.3 | | | |
| R-Square | Coeff Var | Root MSE | intensity Mean |
| 0.187157 | 94.06622 | 2707.853 | 2878.667 |
| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
| Station | 2 | 20259573.33 | 10129786.67 | 1.38 | 0.2884 |
| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
| Station | 2 | 20259573.33 | 10129786.67 | 1.38 | 0.2884 |
The GLM Procedure
| Source | Type III Expected Mean Square |
| Station | Var(Error) + 5 Var(station) |
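The expected mean square Var(Error) + 5 Var(station) suggests a simple method-of-moments estimate of the station variance component: subtract the error mean square from the station mean square and divide by 5. The sketch below recomputes the two mean squares from the data and applies that formula; this is an illustration of the expected-mean-square arithmetic, not output that PROC GLM prints.

```python
# Intensity readings by station, copied from the data step.
data = {
    1: [20, 1050, 3200, 5600, 50],
    2: [4300, 70, 2560, 3650, 80],
    3: [100, 7700, 8500, 2960, 3340],
}
n_per_station = 5
n_stations = len(data)
n_total = n_per_station * n_stations

grand_mean = sum(sum(values) for values in data.values()) / n_total
means = {s: sum(values) / n_per_station for s, values in data.items()}

ss_station = n_per_station * sum((m - grand_mean) ** 2 for m in means.values())
ss_error = sum((y - means[s]) ** 2 for s, values in data.items() for y in values)

ms_station = ss_station / (n_stations - 1)
ms_error = ss_error / (n_total - n_stations)

# Method of moments: E[MS_station] = Var(Error) + 5 Var(station).
var_station = (ms_station - ms_error) / n_per_station
f_value = ms_station / ms_error

print(round(ms_station, 2), round(ms_error, 2), round(var_station, 1), round(f_value, 2))
```

The estimated station variance component is small relative to the error variance, in line with the non-significant F test (p = 0.2884) in the ANOVA table.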
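Mixed Effects Models
$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$ where $\sum_i \alpha_i = 0$, $\beta_j \sim$ ind. $N(0, \sigma_\beta^2)$, and $\varepsilon_{ij} \sim$ ind. $N(0, \sigma_\varepsilon^2)$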
Repeated measurements examples:
1. Respondents in the same household
2. Students in the same classroom
3. Pups in the same litter
4. Multiple measurements of the same individual [repeated measurements, longitudinal, time series, growth curves]
options nocenter formdlim="-";
/*
data from Verbeke and Molenberghs 2.5
age (yrs); dist (center of pituitary to maxillary fissure)
*/
data growth;
  input girl age dist @@;
  datalines;
 1 8 21    1 10 20    1 12 21.5  1 14 23
 2 8 21    2 10 21.5  2 12 24    2 14 25.5
 3 8 20.5  3 10 24.0  3 12 24.5  3 14 26
 4 8 23.5  4 10 24.5  4 12 25.0  4 14 26.5
 5 8 21.5  5 10 23    5 12 22.5  5 14 23.5
 6 8 20    6 10 21    6 12 21    6 14 22.5
 7 8 21.5  7 10 22.5  7 12 23.0  7 14 25
 8 8 23    8 10 23    8 12 23.5  8 14 24
 9 8 20    9 10 21    9 12 22    9 14 21.5
10 8 16.5 10 10 19   10 12 19   10 14 19.5
11 8 24.5 11 10 25   11 12 28   11 14 28
;
/* spaghetti plot: one line per girl */
proc gplot;
  title "Growth data for 11 girls - pituitary to maxillary fissure";
  plot dist*age=girl;
run;
proc reg data=growth;
  title2 "OLS ignoring multiple measurements per girl";
  model dist = age;
run;
proc mixed data=growth;
  title2 "Random intercept for each girl";
  class girl;
  model dist = age / solution;
  random intercept / type=un subject=girl;
run;
proc mixed data=growth;
  title2 "Random intercept and slope for each girl";
  class girl;
  model dist = age / solution;
  random intercept age / type=un subject=girl;
run;
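Before fitting the mixed models, it can help to look at how much the girls differ from one another. The sketch below fits a separate least-squares line to each girl's four measurements and lists the intercepts and slopes. This is an exploratory check only, not the mixed-model fit that PROC MIXED performs, and it assumes that the third and fourth records on the girl-5 data line belong to girl 5.

```python
ages = [8, 10, 12, 14]
# dist at ages 8, 10, 12, 14 for each girl, from the data step
# (assumption: the last two records on the girl-5 line belong to girl 5).
dist = {
    1: [21, 20, 21.5, 23], 2: [21, 21.5, 24, 25.5], 3: [20.5, 24.0, 24.5, 26],
    4: [23.5, 24.5, 25.0, 26.5], 5: [21.5, 23, 22.5, 23.5], 6: [20, 21, 21, 22.5],
    7: [21.5, 22.5, 23.0, 25], 8: [23, 23, 23.5, 24], 9: [20, 21, 22, 21.5],
    10: [16.5, 19, 19, 19.5], 11: [24.5, 25, 28, 28],
}

mean_age = sum(ages) / len(ages)
sxx = sum((a - mean_age) ** 2 for a in ages)

def fit_line(y):
    """Ordinary least-squares intercept and slope for one girl's measurements."""
    mean_y = sum(y) / len(y)
    slope = sum((a - mean_age) * (v - mean_y) for a, v in zip(ages, y)) / sxx
    return mean_y - slope * mean_age, slope

fits = {girl: fit_line(y) for girl, y in dist.items()}
slopes = {girl: slope for girl, (_, slope) in fits.items()}

for girl, (intercept, slope) in fits.items():
    print(girl, round(intercept, 2), round(slope, 3))
```

Noticeable spread in the per-girl intercepts motivates the random-intercept model, and spread in the per-girl slopes motivates adding a random slope for age.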
* MORE
MAY BE ADDED