IES 612/STA 4-573/STA 4-576
Spring 2005
Week 14 IES612-week14-lecture.doc
1. Describe
all of the outcomes of an experiment.
2. Characterize
the probabilities associated with each outcome.
3. Match
these probabilities up with what is produced by some random number generator.
4. Generate
a large number of random experiments according to this rule.
Examples:
1. Estimating p using MC methods
2. Sex ratio in family with 4 kids
3. Impact of violating
the equal variance assumption in a 2 group t-test
EXAMPLE: Using SAS data step to do Monte Carlo
Integration
/*
Problem: Estimate PI using
Monte Carlo Integration
Strategy: Equation of a
circle with radius=1: x2 + y2
= 1
which can be written y = sqrt(1-x2)
Area of this
circle = p
Area of this
circle in the first quadrant = p/4
Generate Ux ~ Uniform(0,1) and Uy ~ Uniform(0,1)
Check to see if Uy <= sqrt(1-Ux2)
The proportion of
generated points when this
Condition is
true is an estimate of p/4.
*/
data MCint;
/* initialize seed */
seed1 = 12345;
do
isim = 1 to 10000;
/* generate point in
first quadrant */
Ux
= ranuni(seed1);
Uy
= ranuni(seed1);
/* check to see if point
under the circle */
Under = (Uy <= sqrt(1-Ux**2));
/* output simulation
result */
drop
isim;
output;
end;
* ODS TRACE ON;
ODS OUTPUT
OneWayFreqs=data_freq;
proc freq;
table Under;
run;
ODS OUTPUT CLOSE;
* ODS TRACE OFF;
proc print data=data_freq; run;
data summary; set data_freq;
if under = 1;
PI_est =
4*Percent/100;
prop_est =
Percent/100;
SE_PI_EST = 4*sqrt(prop_est
* (1 prop_est)/10000 );
PI_CI_Low = PI_est 2*SE_PI_EST;
PI_CI_Up = PI_est + 2*SE_PI_EST;
put
-----------------------------------------------;
put | MC
Integration estimate of PI |;
put
-----------------------------------------------;
put PI (estimate) = PI_est;
put SE [PI
(estimate)] = SE_PI_EST;
put Approx. 95% CI = PI_CI_Low
, PI_CI_Up;
put ;
put Based on 10,000
simulated points. ;
put -----------------------------------------------;
run;
From the SAS LOG file
-----------------------------------------------
| MC
Integration estimate of PI |
-----------------------------------------------
SE [PI
(estimate)] = 0.0166662792
Approx. 95% CI = 3.0722674415
, 3.1389325585
Based
on 10,000 simulated points.
-----------------------------------------------
ODS RTF
file='D:\baileraj\Classes\Fall
2003\sta402\SAS-programs\week6-MC-fig.rtf'
proc gplot data=MCint;
plot Uy*Ux=Under;
run;
ODS RTF CLOSE;

EXAMPLE: Simulating sex ratio in family with 4 kids
data
monte2;
title2 Problem 2: #boys & girls in
families of size 4;
array x
x1-x4; * array containing family
of 4 kids;
do j=1 to 1000; * generate 1000 familes;
numboys = 0; * initialize counters of #boys and
#girls;
numgirls=0;
do
over x; * generate a family of 4
kids;
p = ranuni(0);
if
p<.5 then do;
numboys = numboys
+ 1;
x = 0;
end;
else
do;
numgirls = numgirls
+ 1;
x = 1;
end;
end; * end of the loop for KIDS;
outcome
= 0;
if
numboys=numgirls then
outcome=1;
drop
j p;
output;
end; * end of the loop for
FAMILIES;
proc print;
proc freq;
table
outcome numboys*numgirls;
run;
Obs x1
x2 x3 x4
numboys
numgirls
outcome
1 1 0 0 0 3 1 0
2
0 0 0 1
3 1 0
3
0 0 0 1
3 1 0
4
0 0 1
1
2 2 1
5
0 1 0
1 2 2 1
6
1 1 0
1 1 3 0
7
1 0 0 1
2 2 1
8
1 1 1 1 0 4 0
9
1 0 1
0 2 2 1
10
0 0 0 0 4 0 0
11 0 0 0 1
3 1
0
. . .
990
0 0 1
0 3 1 0
991
1 1 1 0
1 3 0
992
0 0 1
1
2 2 1
993
1 1 1 0
1 3 0
994
1 0 1
0 2 2 1
995
1 1 0
0
2 2 1
996
0 1 0
0
3 1 0
997
1 0 1
1 1 3 0
998 0 0 0 1 3 1 0
999
0 0 1
0 3 1 0
1000
1 1 0
0
2 2 1
The
FREQ Procedure
Cumulative Cumulative
outcome Frequency Percent
Frequency Percent
0
620 62.00 620 62.00
1
380 38.00 1000 100.00
Table
of numboys by numgirls
numboys numgirls
Frequency
Percent
Row
Pct
0
0 0 0
0 60 60
0.00 0.00 0.00
0.00 6.00 6.00
0.00 0.00 0.00
0.00 100.00
0.00 0.00 0.00
0.00 100.00
1
0 0 0
235 0
235
0.00 0.00 0.00 23.50 0.00
23.50
0.00 0.00 0.00 100.00 0.00
0.00 0.00 0.00 100.00 0.00
2
0 0 380
0 0 380
0.00 0.00 38.00 0.00
0.00 38.00
0.00 0.00 100.00 0.00
0.00
0.00 0.00 100.00 0.00
0.00
3
0 263 0
0 0 263
0.00 26.30
0.00 0.00
0.00 26.30
0.00 100.00 0.00 0.00
0.00
0.00 100.00 0.00 0.00
0.00
4
62 0 0
0 0 62
6.20 0.00 0.00
0.00 0.00 6.20
100.00 0.00
0.00 0.00 0.00
100.00 0.00
0.00 0.00 0.00
Total 62 263
380 235 60
1000
6.20 26.30
38.00 23.50 6.00
100.00
EXAMPLE: Is the t-test really robust to violating the
equal variance assumption?
/*
Problem: Explore whether
t-test really is robust to
violations of the equal variance assumption
Strategy: See if the
t-test operates at the nominal
Type I error rate when the unequal
variance
assumption is violated
Test case: n1=n2=10
Population
1: N(0,1)
Population
2: N(0,4)
*/
data twogroup;
array x{10}
x1-x10;
array y{10}
y1-y10;
do isim = 1 to 10000;
/*
generate samples X~N(0,1) Y~N(0,4) - normal case */
do isample = 1 to 10;
x{isample} = rannor(0);
y{isample} = 2*rannor(0);
end;
/*
calculate the t-statistic
*/
xbar
= mean(of x1-x10);
ybar
= mean(of y1-y10);
xvar
= var(of x1-x10);
yvar
= var(of y1-y10);
s2p = (9*xvar + 9*yvar)/18;
tstat
= (xbar-ybar)/sqrt(s2p*(2/10));
Pvalue = 2*(1-probt(abs(tstat),18));
Reject05 = (Pvalue
<= 0.05);
keep xbar ybar xvar
yvar s2p tstat Pvalue Reject05;
output;
end; * end of the
simulation loop;
/*
proc print;
run;
*/
proc freq;
table Reject05;
run;
Cumulative Cumulative
Reject05 Frequency Percent
Frequency Percent
0 9443 94.43 9443 94.43
1 557 5.57 10000
100.00
So, a nominal error rate of 5% was specified and we rejected 5.57% of the time in the
10000 simulated samples.
IES 612
1. Estimate the area under a standard normal curve [i.e. N(0,1)] between -2 and 2 using Monte Carlo Techniques. Place a bound on the error of estimation.
2. Randomly break a stick at some point along its length. Form the ratio of the length of the long end to the short end. Describe the distribution of this ratio. (Some relevance to ideas in Linkage groups in genetics.)
3. A Different Kind of Gambling (from How to Model It)
You are a good student, but your performance in examinations depends very much on your mood.
You tend to be in a good mood 4 days out of 6. When you are in a good mood, there is a 0.9 probability that you will score 100 percent in an examination and only a 0.1 probability that you will score 80 percent.
When you are in a bad mood there is a 0.2 probability that you will score 100 percent, a 0.3 probability of scoring 80 percent, and a 0.5 probability of scoring only 50 percent.
Your teacher offers you a choice. Either you write two equally weighted examinations (a midquarter and final) or else you write ten equally weighted tests (one each week).
Analyze this proposition and explain your choice.
4. Does a confidence interval for the mean of
exponentially distributed quantity based upon a normal interval [
± t(1-a/2)*s/√
n ] cover the mean with the stated confidence coefficient 1-a?
5. Regular, Aggregated or Completely Spatially Random?
Data were gathered on two plots of trees. For ease of manipulation, assume the plots were square and thus tree locations in the squares could be scaled to be between 0 and 1. Let the South-West corner of the plot correspond to the "origin" of the plot [ the point (0,0) on a coordinate system scaled over a unit square ] and let the North-East corner correspond to the point farthest away from the origin in the plot [ the point (1,1) on a coordinate system scaled over a unit square ].
One measure of spatial relationships is
constructed from measuring distances to nearest neighbors. The distance between two points (x1, y1) and (x2, y2) can be found
by d = . The Nearest Neighbor distance to a particular
tree is the smallest of all of the distances from this tree to all other
trees. One measure of tree closeness might
be the average Nearest Neighbor distance.
If this average is too "small", then this suggests
aggregation. If this average is too
"large", then this suggests a regular pattern. Use
In plot 1, 4 trees were located at points (.25,.75), (.75, .75), (.25,.25), and (.75, .25).

In plot 2, 4 trees were located at points (.25,.75), (.30, .85), (.20, .80), and (.22, .70).
