STA 402/502 - Statistical Programming

Fall 2008

Course (Section):

STA 402/502 (A)

Meeting Time:

10:00-10:50 MWF (plus other make-up times to be arranged in consultation with students)

Meeting Location:

264 Bachelor Hall

Prerequisites:

STA 401/501; STA 671; or permission of the instructor

Professor:

Dr. John Bailer

E-mail:

baileraj@muohio.edu

URL:

http://www.users.muohio.edu/baileraj

Office (phone):

292 Bachelor Hall (529-3538)

FAX: 529-1493

Office Hours:

12:30-1:45 M, F

11:00-12:00 W

(other hours by appointment; don't be shy!)

Purpose of Course:

To introduce the use of computers to process and analyze data. Techniques and strategies for managing, manipulating, and analyzing data are discussed, with emphasis on the SAS system. SAS DATA steps are presented, including the INFILE, INPUT, MERGE, and SET statements, looping structures, and conditional execution (IF-THEN). SAS mathematical, statistical, and data functions are discussed, along with macro construction, extensive matrix manipulation and programming (PROC IML), and graphics procedures. Other quantitative programming environments (e.g., R and S-Plus) are considered for constructing specialized statistical analysis functions and graphical displays. Statistical computing topics, such as random number generation, randomization tests, and Monte Carlo simulation, are used to illustrate these programming ideas.

Course Objectives:

Develop programming and computing skills to address data analysis problems using statistical programming tools.

Texts:

Required:

[DS] Delwiche LD and Slaughter SJ. 2003. The Little SAS Book: A Primer, 3rd edition. SAS Institute. Cary, NC. ISBN 1-59047-333-7

[BM] Braun WJ and Murdoch DJ. 2007. A First Course in Statistical Programming with R. Cambridge University Press. Cambridge, UK. ISBN 978-0-521-69424-7

Other books that you might like:

[CP] Cody R and Pass R. 1995. SAS Programming by Example. SAS Institute. Cary, NC. ISBN 1-55544-681-7

[C] Cody R. 2004. SAS Functions by Example. SAS Institute. Cary, NC. ISBN 1-59047-378-7

- Lots of great examples; worth browsing to see what you can do with functions in SAS.

[KO] Krause A and Olson M. 2004. The Basics of S and S-Plus. Springer-Verlag. New York, NY. ISBN 0-387-95456-2

Other resources:

SAS documentation via MU

www.muohio.edu/quantapps

http://support.sas.com/onlinedoc/912/docMainpage.jsp

http://www.units.muohio.edu/doc/sassystem/sasonlinedocv8/sasdoc/sashtml/main.htm

SAS

www.sas.com

support.sas.com

R

www.r-project.org

S-Plus

http://www.insightful.com/support/doc_splus_win.asp

www.insightful.com/products/splus/default.asp

Grading:

Homework and projects will determine the final grade. Homework will contribute 75% of the grade, while a mid-term project report and a final project report will together contribute the remaining 25%. Homework must be typed, with appropriate output included and annotated. Programs are expected to be internally documented with adequate comments.

* STA 502 Project: Students enrolled in STA 502 will be required to complete either 1) an additional simulation study investigating the impact of violating at least one assumption underlying a statistical inference procedure, or 2) a large-scale data management project. A written report detailing this project is due Dec. 5. Feel free to discuss possible projects with me or with other faculty.

* Homework must be in my mailbox by 4 p.m. on the assigned due date in order to be considered.

Other dates of interest:

Sept. 1

LABOR DAY, no classes.

Oct. 17-19

Mid-term break

Nov. 27-30

Thanksgiving break

Dec. 12

Classes end

Course Outline (rough guide to 402/502)

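For each week, the tentative topics are listed first, followed by the corresponding chapters (or sections) of [DS], [CP], and [BM]; "n/a" means that text has no corresponding reading.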

Week 1: BASIC CONCEPTS

* Review basic concepts of statistical computing and research data management

* Introduce SAS data sets

* Review the form of SAS statements and SAS names

* Introduce SAS procedures

* Review the structure of SAS programs

* Describe SAS data libraries and what they can contain

* Show how to document SAS programs using comments

* Illustrate running SAS programs and basic debugging

Readings: [DS] 1, 2; [CP] 1; [BM] n/a

Week 2: USING SAS PROCEDURES

* Introduce the idea of SAS system options

* Briefly review statements that can be used with most procedures (BY, WHERE, TITLE, FOOTNOTE, LABEL, FORMAT)

* PROC CONTENTS for describing a data set

* PROC PRINT for listing the observations in a data set

* PROC CHART and PROC PLOT for producing low-resolution graphs

* PROC FREQ for one-way frequency tables and n-way cross-tabulations

* PROC UNIVARIATE for descriptive statistics and distributional information

* PROC MEANS for descriptive statistics

* PROC SORT for sorting a data set

* SAS documentation and the online help system

Readings: [DS] 4; [CP] 9, 10, 12, 13; [BM] n/a

Week 3: REPORT WRITING

* Introduce the Output Delivery System (ODS) for customizing procedure output

* PROC TABULATE for producing nicely-formatted tables

Readings: [DS] 4, 5; [CP] none listed; [BM] n/a

Week 4: AN INTRODUCTION TO STATISTICAL MODELING

* PROC REG for linear modeling (a very basic introduction)

* PROC GLM for ANOVA models

Readings: [DS] 8; [CP] none listed; [BM] n/a

Week 5: HIGH-RESOLUTION GRAPHICS AND FORMATS

* Introduce concepts related to high-resolution graphs

* PROC GCHART and PROC GPLOT for producing high-resolution graphs

* SAS-supplied formats and PROC FORMAT for user-defined formats

Readings: [DS] none listed; [CP] none listed; [BM] n/a

Week 6: TRANSFORMING SAS DATA SETS

* Creating SAS data sets with DATA steps: flow of execution, including the program data vector

* Creating variables in DATA steps with assignment statements

* Statements: DATA, SET, OUTPUT, RETURN, WHERE, IF, DROP, KEEP, LENGTH

* Subsetting observations and variables

* Using SAS functions and operators

* Working with SAS date values (also time and date-time)

* Introduction to missing values

Readings: [DS] 3; [CP] none listed; [BM] n/a

Week 7: SAS PROGRAMMING

* Declarative vs. executable statements

* Statements: RETAIN, RENAME, LABEL, FORMAT, SUM

* Using formats in DATA steps

* Conditional execution

* DO groups

* Arrays

* More on missing values

Readings: [DS] 4; [CP] none listed; [BM] n/a

Week 8: COMBINING AND MANAGING SAS DATA SETS

* SET statement for concatenation and interleaving

* MERGE statement for joining observations

* UPDATE statement for updating a master file (maybe)

* Special variables: IN=, END=, FIRST., and LAST.

* Creating multiple data sets in one DATA step

* Reshaping data sets

* Managing data sets using PROC COPY and PROC DATASETS

* Transporting data sets between hosts

Readings: [DS] 6; [CP] 3; [BM] n/a

Week 9: WRITING EXTERNAL FILES

* Statements: FILE, PUT

* Using DATA _NULL_

* PUT function

* Creating customized reports using DATA steps

Readings: [DS] 9; [CP] none listed; [BM] n/a

Week 10: MACRO LANGUAGE

* Why use macros?

* Macro variables: system-defined and user-defined

* Macros

* Macro parameters

* Macro functions

* Conditional execution and DO loops

* CALL SYMPUT

Readings: [DS] 7; [CP] none listed; [BM] n/a

Week 11: SAS/IML PROGRAMMING

* Basic matrix concepts: rows, columns, scalars

* Matrix operators

* Subscripting

* Matrix functions

* Creating matrices from data sets and vice versa

* Sample applications

Readings: [DS] none listed; [CP] none listed; [BM] n/a

Weeks 12-15: TOPICS IN STATISTICAL PROGRAMMING (varies)

* Introduction to quantitative programming in S-Plus (objects: vectors, lists, matrices, data frames; reading data [scan, read.table, sas.get]; summarizing data sets [mean, var, summary, table]; graphical displays [plot, pairs, trellis displays]; writing functions)

Sub-topics and readings (all from [BM]; none listed for [DS] or [CP]):

* R/S-Plus: Intro. & GUI (S-Plus). Readings: [BM] 1, 2

* R/S-Plus: Data structures. Readings: [BM] 2.2.7, 2.2.15, 2.2.16, 6.1

* R/S-Plus: Basic graphics. Readings: [BM] 3

* R/S-Plus: Programming (flow control, functions, etc.). Readings: [BM] 4

* R/S-Plus: Simulation. Readings: [BM] 5

* R/S-Plus: Other topics? Readings: [BM] 6, 7

FAQs

  1. Where can I run SAS on campus? A: Various labs have SAS (for more information, see http://www.units.muohio.edu/mcs/academictechnology/learning_technologies/), and SAS is also available on the RedHawk cluster (redhawk.hpc.muohio.edu). I will request RedHawk accounts for all 402/502 students. Dave Woods of IT Services (woodsdm2@muohio.edu) will lead a class session on running SAS on the cluster.
  2. Can I get SAS on my personal computer? A: Yes, if you have a Windows machine or can run Windows on your Mac (via VMware Fusion or Parallels Desktop). You can purchase the SAS Learning Edition for $60 (from http://support.sas.com/learn/le/start_student.html). Other solutions may be available from Miami; stay tuned. The print center may provide SAS on disk for $30 (still being resolved as of 19 Aug. 2008).
  3. How do I download R? A: Go to www.r-project.org and follow the download link to a CRAN mirror near us (e.g., StatLib at CMU). From the mirrors you can download precompiled binary distributions of the base system and contributed packages for Linux, Mac OS X, or Windows.
  4. Can I get formal certification in SAS? A: Yes. There are different levels of certification (e.g., Base and Advanced), and students can take these exams for $90 (half price). See http://support.sas.com/certify/faq.html#fee for more information.
  5. When should I join professional societies? A: Now! Check out www.amstat.org (you can join the ASA for $10) or www.enar.org ($27).