Homework [* All assignments are tentative until a date is assigned *] :
NOTE: Since Collett doesn't have homework problems, I will be selecting problems from other sources.

  1. Select an issue of a recent biomedical/public health journal (e.g. American Journal of Public Health, New England Journal of Medicine, American Journal of Epidemiology, etc.). Look at the METHODS section of each research paper in this issue. Provide a table with a separate row for each research paper and the following information in the columns:
    1. Column 1 (title of the paper)
    2. Column 2 (was one or more of the authors a biostat/stat person? Y/N)
    3. Column 3 (list the stat methods used - multiple regression, logistic regression, Poisson regression, Cox models, relative risk models, etc.)
    4. Column 4 (computer software used to do the analysis)
    5. Column 5 (any comments/reactions to the data analysis/displays)
    (Due: 16 Jan 07)

  2. Let T1, ..., Tn be a random sample from an exponential distribution with mean theta. Determine: i) the likelihood function L(theta); ii) the score function u(theta); iii) the MLE of theta; iv) the expected information for theta; v) the observed information i(theta); vi) tests of H0: theta = 1 vs. H1: theta != 1 using a) the score test; b) the LRT; and c) the Wald test. (Due: 23 Jan 07)
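As a numerical check on the hand derivations in problem 2, the sketch below evaluates the three test statistics for an uncensored exponential sample parameterized by its mean (called `theta` in the code). The sample values are hypothetical and the function name is illustrative; the formulas in the comments are the standard ones for this model, so treat the output as a way to verify your own algebra rather than as a worked solution.

```python
import math

def exp_mean_tests(times, theta0=1.0):
    """Score, Wald and LR statistics for H0: mean = theta0 (exponential, no censoring)."""
    n = len(times)
    total = sum(times)
    theta_hat = total / n                          # MLE of the mean

    def loglik(theta):
        return -n * math.log(theta) - total / theta

    def score(theta):                              # derivative of loglik
        return -n / theta + total / theta ** 2

    def expected_info(theta):                      # Fisher information for the mean
        return n / theta ** 2

    def chi2_1_pvalue(x):                          # upper tail of chi-square, 1 df
        return math.erfc(math.sqrt(x / 2.0))

    stats = {
        "score": score(theta0) ** 2 / expected_info(theta0),
        "wald": (theta_hat - theta0) ** 2 * expected_info(theta_hat),
        "lrt": 2.0 * (loglik(theta_hat) - loglik(theta0)),
    }
    return theta_hat, {k: (v, chi2_1_pvalue(v)) for k, v in stats.items()}

# Hypothetical survival times, for illustration only.
sample = [0.5, 1.2, 2.3, 0.8, 3.1, 1.7, 0.4, 2.0]
theta_hat, results = exp_mean_tests(sample, theta0=1.0)
print("MLE of mean:", theta_hat)
for name, (stat, p) in results.items():
    print(f"{name:5s} statistic = {stat:.4f}, p = {p:.4f}")
```

The three statistics generally differ in small samples even though they share the same chi-square(1) reference distribution, which is part of what the problem asks you to notice.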
  3. Present solutions to the following problems
    1. Given S(t) = exp(-t^gamma), derive f(t), h(t), H(t) and F(t). [Lee]
    2. h(t) = beta/(t + gamma) + delta*t can be used to describe a "bathtub" shaped hazard function. Examine this model and show how it is able to describe such a hazard. Comment on the flexibility of shapes that can be accommodated by this model. [Lawless]
    (Due: 30 Jan 07)
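One way to explore the hazard in problem 3.2 before writing up the argument is to evaluate it on a grid. The sketch below assumes beta, gamma and delta are all positive; the parameter values and function names are hypothetical. It uses the fact that the derivative h'(t) = -beta/(t + gamma)^2 + delta changes sign at t = sqrt(beta/delta) - gamma, so a bathtub shape appears only when that point is positive.

```python
import math

def hazard(t, beta, gamma, delta):
    """h(t) = beta/(t + gamma) + delta*t, for t >= 0."""
    return beta / (t + gamma) + delta * t

def turning_point(beta, gamma, delta):
    """Time where h'(t) = 0, or None if the hazard increases for all t >= 0."""
    t_star = math.sqrt(beta / delta) - gamma
    return t_star if t_star > 0 else None

# Hypothetical parameter values chosen so that a bathtub shape appears.
params = dict(beta=2.0, gamma=0.5, delta=0.5)
t_star = turning_point(**params)
print("hazard minimum at t =", t_star)
for t in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 5.0):
    print(f"t = {t:4.1f}  h(t) = {hazard(t, **params):.3f}")

# With a small beta the decreasing phase disappears entirely.
print(turning_point(beta=0.1, gamma=1.0, delta=1.0))
```

Varying the three parameters in this way gives a feel for which shapes the model can and cannot produce.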
  4. Logistic Regression problem - O-ring failure as a function of temperature. Fit a logistic regression model to the number of O-ring failures as a function of temperature. Your solution should include the following information:
    1. Estimated logistic regression coefficients and standard errors.
    2. A test of whether the probability of ring failure is related to temperature. Include null and alternative hypothesis, test statistics and P-values. Include both Wald and Likelihood Ratio tests.
    3. Provide an estimate of the odds ratio of failure associated with a 5 deg. F decrease in temperature (include a 95% CI).
    4. Include a plot of the predicted probability of failure along with the observed failure proportions.
    (Due: 06 Feb. 07)
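The quantities requested in problem 4 can be computed with any logistic regression routine. For readers who want to see the mechanics, here is a minimal Newton-Raphson sketch for grouped binomial data. The temperatures and counts below are hypothetical and are NOT the O-ring data, and the function names are illustrative.

```python
import math

def fit_logistic(x, failures, trials, n_iter=25):
    """Newton-Raphson fit of logit(p) = b0 + b1*x to grouped binomial counts."""
    x_bar = sum(x) / len(x)
    xc = [xi - x_bar for xi in x]            # center x to keep the iteration stable
    a, b1 = 0.0, 0.0                         # a = intercept on the centered scale
    for _ in range(n_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0      # score vector and information matrix
        for xi, yi, ni in zip(xc, failures, trials):
            p = 1.0 / (1.0 + math.exp(-(a + b1 * xi)))
            resid, w = yi - ni * p, ni * p * (1.0 - p)
            u0 += resid
            u1 += resid * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 ** 2
        a += (i11 * u0 - i01 * u1) / det     # beta <- beta + I^{-1} u
        b1 += (i00 * u1 - i01 * u0) / det
    se_b1 = math.sqrt(i00 / det)             # from the information at convergence
    return a - b1 * x_bar, b1, se_b1

def loglik(b0, b1, x, failures, trials):
    """Binomial log-likelihood, dropping the constant binomial coefficients."""
    ll = 0.0
    for xi, yi, ni in zip(x, failures, trials):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        ll += yi * math.log(p) + (ni - yi) * math.log(1.0 - p)
    return ll

# Hypothetical grouped data: temperature (deg. F), failures out of 6 rings.
temps = [55, 60, 65, 70, 75, 80]
trials = [6, 6, 6, 6, 6, 6]
failures = [3, 2, 2, 1, 1, 0]

b0, b1, se_b1 = fit_logistic(temps, failures, trials)
wald_z = b1 / se_b1
wald_p = math.erfc(abs(wald_z) / math.sqrt(2.0))

p_null = sum(failures) / sum(trials)         # intercept-only model
b0_null = math.log(p_null / (1.0 - p_null))
lrt = 2.0 * (loglik(b0, b1, temps, failures, trials)
             - loglik(b0_null, 0.0, temps, failures, trials))
lrt_p = math.erfc(math.sqrt(lrt / 2.0))

# Odds ratio for a 5 deg. F DECREASE in temperature, with a Wald 95% CI.
or_5 = math.exp(-5.0 * b1)
ci_lo = math.exp(-5.0 * b1 - 1.96 * 5.0 * se_b1)
ci_hi = math.exp(-5.0 * b1 + 1.96 * 5.0 * se_b1)
print(f"b1 = {b1:.4f} (SE {se_b1:.4f}), Wald p = {wald_p:.4f}, LRT p = {lrt_p:.4f}")
print(f"OR per 5 deg. F decrease = {or_5:.3f} ({ci_lo:.3f}, {ci_hi:.3f})")
```

The interval for the odds ratio is built on the log-odds scale and then exponentiated, which is the usual convention; the plot of fitted versus observed proportions is left to the software you use for the assignment.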
  5. Refer to the "Data+" link and the "Survival-data-analysis Folder." The columns in the data file "T3.1.prn" correspond to PATIENT AGE GENDER (1=M; 2=F) -- 4 indicator variables for INITIAL_STAGE -- TREATMENT REMISSION_DURATION OBS_REM (1=yes; 0=censor) SURVIVAL_TIME OBS_SUR (see "T3.1.txt" for description of this file).
    1. Compute and plot the KM estimates of the survivorship function for the two treatment groups.
    2. Compute the variance of the estimated survival probabilities for every observed event time.
    3. Estimate the median survival times of the two groups.
    4. Which treatment appears to be associated with better survival and why?
    5. Repeat these steps for the remission durations of the two treatment groups.
    6. (Based on Lee 4.1, 4.2)
    (Due: 15 Feb. 07)
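The calculations in problem 5 (KM estimate, its variance, and the median) can be sketched in a few lines. The version below is a minimal illustration, assuming Greenwood's formula is the intended variance estimator and taking the median as the first event time at which the estimate drops to 0.5 or below. The data are hypothetical, not the values in "T3.1.prn", and the function names are illustrative.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t), Greenwood variance)] at each distinct event time.

    events[i] is 1 for an observed event and 0 for a right-censored time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, gw_sum = 1.0, 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(1 for tt, e in data if tt == t and e == 1)   # events at t
        m = sum(1 for tt, e in data if tt == t)              # all leaving the risk set
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            if n_at_risk > d:                                # term undefined once S(t) = 0
                gw_sum += d / (n_at_risk * (n_at_risk - d))
            out.append((t, surv, surv ** 2 * gw_sum))
        n_at_risk -= m
        i += m
    return out

def median_survival(km):
    """Smallest event time with S(t) <= 0.5, or None if the curve stays above 0.5."""
    for t, s, _ in km:
        if s <= 0.5:
            return t
    return None

# Hypothetical data for one treatment group.
times = [2, 3, 3, 5, 7, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
km = kaplan_meier(times, events)
for t, s, v in km:
    print(f"t = {t:2d}  S(t) = {s:.3f}  var = {v:.5f}")
print("median:", median_survival(km))
```

Applying the same two functions separately to each treatment group, once for survival times and once for remission durations, covers the computational parts of the problem.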
  6. For the survival times given in the last homework set:
    1. Display the KM estimates of the survivorship function for the two treatment groups on a single plot.
    2. Formally test the equality of S(t) between the two treatments using the log-rank and the Wilcoxon tests.
    3. Prepare a table with estimated median survival times (and 95% CIs) for the two treatments.
    4. Prepare a paragraph summarizing the differences between the treatments.
    5. Suppose you have the following survival time (wks) of brain tumor patients receiving four different treatments. Are the four treatments equally effective? Include appropriate graphical displays along with hypothesis tests.
      trt 1: 4, 5, 9, 12, 20+, 25, 30+
      trt 2: 1, 4, 9, 12, 15, 23, 30
      trt 3: 3, 7, 14, 20, 27, 30, 32+, 50+
      trt 4: 5, 15, 20, 31, 39, 47, 55+, 67+
    6. (Based on Lee 5.1, 5.2, 5.12)
    (Due: 20 Feb. 07)
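To make the log-rank computation in problem 6 concrete, the sketch below compares just two of the four brain tumor treatment groups (trt 1 and trt 2) listed above. It assumes a trailing "+" marks a right-censored time, uses the usual hypergeometric variance, and refers the statistic to a chi-square distribution with 1 df. The four-group test the problem asks for needs the multivariate version of the same idea; this two-group function is only an illustration, and its name is hypothetical.

```python
import math

def logrank(times1, events1, times2, events2):
    """Two-sample log-rank chi-square statistic (1 df) and its p-value."""
    pooled = ([(t, e, 1) for t, e in zip(times1, events1)]
              + [(t, e, 2) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in pooled if e == 1})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n1 = sum(1 for tt, _, g in pooled if g == 1 and tt >= t)   # at risk, group 1
        n2 = sum(1 for tt, _, g in pooled if g == 2 and tt >= t)   # at risk, group 2
        d1 = sum(1 for tt, e, g in pooled if g == 1 and tt == t and e == 1)
        d2 = sum(1 for tt, e, g in pooled if g == 2 and tt == t and e == 1)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n                               # observed minus expected
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Survival times (wks) from the problem statement; event = 0 where the time carries a "+".
trt1_times = [4, 5, 9, 12, 20, 25, 30]
trt1_events = [1, 1, 1, 1, 0, 1, 0]
trt2_times = [1, 4, 9, 12, 15, 23, 30]
trt2_events = [1, 1, 1, 1, 1, 1, 1]

chi2, p = logrank(trt1_times, trt1_events, trt2_times, trt2_events)
print(f"log-rank chi-square = {chi2:.3f}, p = {p:.3f}")
```

The Wilcoxon (Gehan) version differs only in weighting each term by the number at risk at that event time.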
  7. Refer to the same data as first analyzed in Homework 5. In particular, follow the "Data+" link and the "Survival-data-analysis Folder." The columns in the data file "T3.1.prn" correspond to PATIENT AGE SEX (1=M; 2=F) -- 4 indicator variables for INITIAL_STAGE -- TREATMENT REMISSION_DURATION OBS_REM (1=yes; 0=censor) SURVIVAL_TIME OBS_SUR (see "T3.1.txt" for description of this file).
    1. Use a Cox PH model to test whether survival differs between the treatments.
    2. Fit a Cox PH model to predict survival as a function of explanatory variables including treatment along with AGE and SEX. Provide estimates of the coefficients and related standard errors. Interpret the coefficients in terms of impact on the hazard ratio.
    3. Fit a Cox PH model that includes the treatment-by-AGE and treatment-by-SEX interactions. Do you have evidence that the effect of the treatment differs as a function of AGE or SEX?
    4. Prepare a figure displaying the survival curves for the 4 SEX-TREATMENT combinations.
    (Due: 22 Mar. 07)
  8. Suppose ties are possible in a collection of observations with "r" unique times [i.e. dj >= 1 for some t(j), j=1, ..., r]. Using the Breslow approximation to the partial likelihood (Collett Eq. 3.9), derive a score test for H0: beta = 0 for comparing 2 treatments. Will this score test be more conservative than the log-rank test? (Due: 03 Apr. 07)
  9. Consider data from STATLIB on length of stay in nursing homes (ch12.dat [data], ch12.sas [pgm to read the data], ch12.txt [description of the variables] from http://lib.stat.cmu.edu/datasets/csb/ ).
    1. Determine the best set of predictors for modeling resident length of stay ("lstay") as a function of age ("age"), treatment ("trt"), marital status ("marstat"), and health status ("hlstat").
    2. For the model you select, interpret the coefficients in terms of impact on the hazard ratio.
    3. Conduct appropriate model adequacy assessments for the model you select.
    (Due: 10 Apr. 07)
  10. Testing for Exponential distributions along with comparing t(p) CI procedures.
    1. Fit a Weibull model to the data generated in "sta685-fitexp-04apr07.sas" (see the "Program Files" link). Estimate the scale and shape parameters. Construct a likelihood ratio test of the H0: shape=1 (i.e. a test of exponentiality).
    2. Conduct a small simulation study to compare the two methods for constructing confidence intervals for median survival time [t(50)] when sampling from exponential distributions. Use the lambda from the "sta685-fitexp-04apr07.sas" file and the censoring pattern described therein. Consider n=20, 50, 100 and report coverage probabilities and a summary of CI widths. Which of the two CI methods would you recommend?
    (Due: 19 Apr. 07)
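The simulation in problem 10.2 follows a generate, fit, check-coverage loop. The sketch below shows that loop for ONE candidate interval only: a Wald interval for log(lambda), transformed to the median log(2)/lambda. The two methods the problem intends are not spelled out here, and lambda = 1 and the fixed censoring time of 2 are hypothetical stand-ins for the values in "sta685-fitexp-04apr07.sas", so substitute the real settings and add the second interval method before drawing conclusions.

```python
import math
import random

def median_ci(times, events, z=1.96):
    """Wald CI for the exponential median, built on the log(lambda) scale."""
    r = sum(events)                          # number of observed events
    lam_hat = r / sum(times)                 # MLE of the rate under right censoring
    med_hat = math.log(2.0) / lam_hat
    half = z / math.sqrt(r)                  # approx. SE of log(lam_hat) is 1/sqrt(r)
    return med_hat * math.exp(-half), med_hat * math.exp(half)

def coverage(n, lam, censor_time, n_rep=2000, seed=685):
    """Estimated coverage probability and mean width of the interval above."""
    rng = random.Random(seed)
    true_median = math.log(2.0) / lam
    hits, widths = 0, []
    for _ in range(n_rep):
        raw = [rng.expovariate(lam) for _ in range(n)]
        times = [min(t, censor_time) for t in raw]
        events = [1 if t <= censor_time else 0 for t in raw]
        if sum(events) == 0:                 # no events: interval undefined, skip
            continue
        lo, hi = median_ci(times, events)
        hits += lo <= true_median <= hi
        widths.append(hi - lo)
    return hits / len(widths), sum(widths) / len(widths)

# Hypothetical settings: lambda = 1, every subject censored at time 2.
results = {n: coverage(n, lam=1.0, censor_time=2.0) for n in (20, 50, 100)}
for n, (cov, width) in results.items():
    print(f"n = {n:3d}  coverage = {cov:.3f}  mean width = {width:.3f}")
```

Reporting the spread of the widths (for example quartiles) alongside the mean would match the "summary of CI widths" the problem requests.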
  11. Problems related to AFT ideas.
    1. Let T ~ Exp(lambda=1) and let Y=log(T). Derive the density of Y. Hint: Use the method of transformations.
    2. For the Nelson data on the time to breakdown of a type of electrical insulating fluid (see below), do the following: (a) test whether breakdown times are identically distributed across voltage levels; (b) present evidence to suggest that a Weibull model might be reasonable; (c) use an AFT regression model to describe breakdown times as a function of voltage levels. Interpret the model parameter estimates; (d) repeat step c using a Cox PH model; and (e) summarize your analysis. The Nelson (1970) data (rounded to the nearest 0.1 minute, all observed/no censoring) are given as (voltage condition): breakdown times.
    (Due: 26 Apr. 07)
  12. Lee 5.9 (Due: dd mmm 07)
  13. Lee 6.1, 6.2, 6.3, 6.4, 6.7, 6.8, 8.1ab, 8.6 (Due: dd mmm 07)
  14. Obtain MLEs for lambda1 and delta for a random sample of n1 from Exp(lambda1) and an independent random sample of n2 from Exp(lambda1 + delta). Assume that r1 (r2) events are observed in the first (second) sample and n1-r1 (n2-r2) are right censored. (Due: dd mmm 07)
  15. Lee 7.9 (using LRT); 9.5 (using LRT) (Due: dd mmm 07)
  16. Does survival differ among genotypes for CDCL exposed Ceriodaphnia? a) Is different survival associated with different genotypes? b) Is a proportional hazards assumption reasonable for these data? (Due: dd mmm 07)
  17. For the Nelson data on the time to breakdown of a type of electrical insulating fluid, do the following: (a) test whether breakdown times are identically distributed across voltage levels; (b) present evidence to suggest that a Weibull model might be reasonable; (c) use an AFT regression model to describe breakdown times as a function of voltage levels. Interpret the model parameter estimates; (d) repeat step c using a Cox PH model; and (e) summarize your analysis. (Due: dd mmm 07)
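The AFT formulation used in the Nelson problem rests on the distribution of log(T), which is what the derivation of the density of Y = log(T) for T ~ Exp(1) establishes. Once you have a candidate density from the method of transformations, a simulation like the one below can confirm it. The sketch assumes the candidate is the standard (minimum) extreme value form f(y) = exp(y - exp(y)); if your derivation gives something else, swap in your own functions. All names are illustrative.

```python
import math
import random

def candidate_cdf(y):
    """CDF implied by the candidate density: P(Y <= y) = 1 - exp(-exp(y))."""
    return 1.0 - math.exp(-math.exp(y))

def candidate_pdf(y):
    """Candidate density of Y = log(T) when T ~ Exp(1)."""
    return math.exp(y - math.exp(y))

rng = random.Random(11)
n = 20000
# Guard against the (practically impossible) draw of exactly 0 before taking logs.
ys = sorted(math.log(max(rng.expovariate(1.0), 1e-300)) for _ in range(n))

# Largest gap between the empirical CDF and the candidate CDF.
max_gap = max(abs((i + 1) / n - candidate_cdf(y)) for i, y in enumerate(ys))
print(f"max |ECDF - candidate CDF| = {max_gap:.4f}")

# Crude Riemann-sum check that the candidate density integrates to about 1.
step = 0.001
area = sum(candidate_pdf(-20.0 + k * step) * step for k in range(25000))
print(f"integral of candidate density over [-20, 5) = {area:.4f}")
```

A small maximum gap supports the derived density; the same comparison on log breakdown times within a voltage level is one informal way to judge whether a Weibull model is reasonable.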