STA 402/502 - Statistical Programming
| Course (Section) | STA 402/502 (A) |
|---|---|
| Meeting Time: | 8:00-8:50 MWF (plus other make-up times to be arranged in consultation with students) |
| Meeting Location | 106 Bachelor Hall |
| Prerequisites: | STA 401/501; STA 671; or permission of the instructor. Willingness to work. |
| Professor: | Dr. John Bailer |
| E-mail: | |
| URL: | |
| Office (phone) | 292 or 122B Bachelor Hall (529-3538) FAX: 529-1493 |
| Office Hours | 10:00 - 11:30 M W F (other hours by appointment; don't be shy!) |
Purpose of Course:
To introduce the use of computers to process and analyze data. Techniques and strategies for managing, manipulating and analyzing data are discussed. Emphasis is on the use of the SAS system. SAS data steps including infile, input, merge, set, looping structures, conditional execution (if-then), etc. are presented. SAS mathematical, statistical and data functions are discussed, along with macro construction, extensive matrix manipulation and programming (PROC IML) and graphics procedures. Other quantitative programming environments (e.g. R) are considered for constructing specialized statistical analysis functions and graphical displays. Statistical computing topics, such as random number generation, randomization tests and Monte Carlo simulation, will be used to illustrate these programming ideas.
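To give a flavor of the statistical computing topics named above, here is a minimal sketch of a Monte Carlo simulation that estimates π with a seeded random number generator. It is written in Python purely for illustration; the course itself works in SAS and R, and the function name `estimate_pi` and the default seed are hypothetical choices, not course material.

```python
import random


def estimate_pi(n_points, seed=402):
    """Estimate pi by Monte Carlo integration over the unit square.

    Draws n_points uniform points in [0, 1) x [0, 1) and counts the
    fraction that falls inside the quarter circle of radius 1, whose
    area is pi / 4.
    """
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_points


if __name__ == "__main__":
    # The estimate should settle toward pi as the number of points grows.
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n))
```

Fixing the seed makes a simulation repeatable, which is what lets you debug it and lets someone else check your results.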
Course Objectives:
Develop programming and computing skills to address data analysis problems using statistical programming tools.
Texts:
Required (and provided):
* Bailer AJ. 2010. Statistical Programming in SAS. SAS Institute. Cary, NC. ISBN: 978-1-59994-656-6
Recommended (will discuss in class; wait to purchase):
* SAS certification guide: SAS Certification Prep Guide: Base Programming for SAS 9. ISBN-10: 159047922X / ISBN-13: 978-1590479223. Approx. 22 chapters with CD.
Other books that you might like:
* Delwiche LD and Slaughter SJ. 2003. The Little SAS Book: A Primer, 3rd edition. SAS Institute. Cary, NC. ISBN 1-59047-333-7
* Cody R and Pass R. 1995. SAS Programming by Example. SAS Institute. Cary, NC. ISBN 1-55544-681-7
* Cody R. 2004. SAS Functions by Example. SAS Institute. Cary, NC. ISBN 1-59047-378-7. Lots of great examples; worth browsing to see what you can do with functions in SAS.
* [BM] Braun WJ and Murdoch DJ. 2007. A First Course in Statistical Programming with R. Cambridge University Press. Cambridge, UK. ISBN 978-0-521-69424-7
Belief and Style:
You learn programming by doing it. Actually, you tend to learn a lot more from failing and fixing code than from getting it right the first time. So, you will get the most out of this class by trying the various code “displays” and the suggested exercises. This class does not follow a simple linear trajectory in which topics are not used until they are fully defined and developed. In programming, you may find that you need functions, procedures, etc. long before you have learned about them in any formal way. Thus, I unapologetically, brashly and frustratingly use ideas that might be formally defined in later discussions if it helps tell a more interesting programming story earlier. In addition, many of the homework problems may require that you dig for additional programming information in order to complete the assignment. Problems will be assigned early in the discussion of a topic, and I will then serve as your “consultant” over the next class periods as you work on the assignments.
Other resources:
| SAS docs | |
|---|---|
| SAS | www.sas.com |
| R | www.r-project.org |
Grading:
Homework and projects will contribute to the final grade. Homework will contribute 80% of the grade, while a mid-term project report and a final project report will together contribute 20%. Homework will be typed on a computer with appropriate output included and annotated. Programs are expected to be internally documented with adequate comments. Homework hint: start early! Programming projects always take longer than you estimate.
Expectations for independent work:
I expect you to struggle with implementing solutions to homework problems. During this struggle, you will ask me questions, and you will talk to your classmates about sticky programming issues. Talking about programming projects is an opportunity to learn. Helping others with debugging and coding is useful. HOWEVER, copying code and changing variable names is not equivalent to struggling with the work. Don’t cheat. Don’t plagiarize. When you are caught, your academic career could be ruined.
* STA 502 Project: Students enrolled in STA 502 will be required to complete an additional project, and grades will be assigned separately in 402 and 502. This extra project will involve one of: 1) an additional simulation study in which the impact of violating (at least one) assumption underlying a statistical inference procedure is investigated; 2) a large-scale data management project; or 3) a description of statistical methods or ideas not discussed in class but implemented in SAS (e.g. power and sample size planning, MCMC and Bayesian analyses, incorporating survey sampling weights in an analysis). A written report detailing this project is due Nov. 20. Feel free to discuss possible projects with other faculty or me.
* Homework must be in my mailbox by 4 p.m. on the assigned due date in order to be considered.
Calendar:
Course Outline (rough guide to 402/502)
| # weeks | Tentative topics (associated with estimated # weeks!) |
|---|---|
| 1 | BASIC CONCEPTS (Ch. 1): Review basic concepts of statistical computing and research data management; Introduce SAS data sets; Review the form of SAS statements and SAS names; Introduce SAS procedures; Review the structure of SAS programs; Describe SAS data libraries and what they can contain; Show documenting SAS programs using comments; Illustrate running SAS programs and basic debugging |
| 1-2 | Constructing a data set for analysis: reading, combining, managing and manipulating data sets (Ch. 2) 1. Temporary versus permanent status of data sets – LIBNAME |
| 1-2 | Using SAS procedures (Ch. 3) 1. SAS system options – options |
| 1 | Complex table construction and output control (i.e., “pretty” output) (Ch. 4) 1. PROC TABULATE |
| 1 | Basic models in SAS (Ch. 5) 1. Overview of modeling |
| 1-2 | Producing Statistical Graphics (Ch. 6) 1. Old School (device-based) / New School (template-based) SAS graphics |
| 1-2 | Formatting, basic DATA step manipulations and programming (Ch. 7) 1. Internal representations and output displays |
| 1-2 | Programming in a DATA step (Ch. 8) 1. Storage bins for collections of values - ARRAYS |
| 1 | MACRO programming (Ch. 9) 0. What is a macro and why would you use it? |
| 1-2 | Programming with matrices and vectors – IML (Ch. 10) 1: Basic matrix definition + subscripting; 2: Diagonal matrices and stacking matrices; 3: Repeating, element-wise operations and matrix multiplication; 4: Importing SAS data sets into IML and exporting matrices from IML to data sets; 4.1: Creating matrices from SAS data sets and vice versa; 5: CASE STUDY 1: Monte Carlo integration to estimate π; 6: CASE STUDY 2: Bisection root finder; 7: CASE STUDY 3: Randomization test using matrices imported from PLAN; 8: CASE STUDY 4: IML module to implement Monte Carlo integration to estimate π; 8.1: Storing and loading IML modules; 9: SAS/IML Studio; 9.1: CASE STUDY 1: Dynamic and interactive analysis of the SMSA country data set; 9.2: CASE STUDY 2: Multiple-linked graphics windows; 9.3: CASE STUDY 3: IML matrix manipulations and invocations of SAS/Stat procedures; 9.4: CASE STUDY 4: Calling R library to generate bootstrap confidence intervals for mean MPG |
| 2-5 | TOPICS IN STATISTICAL PROGRAMMING (varies): Introduction to quantitative programming in R (objects: vectors, lists, matrices, dataframes; reading data [scan, read.table, sas.get]; summarizing data sets [mean, var, summary, table]; graphical displays [plot, pairs, coplot]; writing functions) |