## Using SAS for self-controlled case series studies

**The macros**

The SAS macros are written to run under SAS v8.0 or above.

There are two main macros for fitting the self-controlled case series method (both parametric and semi-parametric) in SAS:

- sccs.sas: creates the dataset for the case series analysis.
- poisreg.sas: fits a Poisson (case series) regression model, with the option of eliminating a factor from the model to reduce computation time (similar to ELIMINATE in GLIM or the GROUPS option of MODEL in GENSTAT). Estimation uses the Newton-Raphson algorithm.
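The standard profile-likelihood argument below shows why a factor can be eliminated in the case series model. It is a sketch that assumes a log-linear rate with one nuisance parameter $\phi_i$ per subject (the factor passed as elim). It is not taken from the macro code.

The notation is as follows:

- $n_{ik}$ is the number of events of subject $i$ in interval $k$.
- $e_{ik}$ is the interval length (the offset, whose log is l_off).
- $x_{ik}$ is the covariate vector (risk indicators, age and season categories).

$$\mu_{ik} = e_{ik}\exp(\phi_i + x_{ik}^{\top}\beta), \qquad \ell(\phi,\beta) = \sum_i \sum_k \left[ n_{ik}\log\mu_{ik} - \mu_{ik} \right]$$

Setting $\partial\ell/\partial\phi_i = \sum_k (n_{ik} - \mu_{ik}) = 0$ gives

$$e^{\hat\phi_i(\beta)} = \frac{n_{i\cdot}}{\sum_k e_{ik}\exp(x_{ik}^{\top}\beta)}, \qquad n_{i\cdot} = \sum_k n_{ik},$$

and substituting back leaves a function of $\beta$ only (the SCCS conditional log-likelihood, up to a constant):

$$\ell_p(\beta) = \sum_i \sum_k n_{ik}\left[ x_{ik}^{\top}\beta - \log \sum_{k'} e_{ik'}\exp(x_{ik'}^{\top}\beta) \right] + \text{const}.$$

A Newton-Raphson step is then

$$\beta^{(t+1)} = \beta^{(t)} + \left[-\nabla^{2}\ell_p(\beta^{(t)})\right]^{-1}\nabla\ell_p(\beta^{(t)}),$$

iterated until $\max_j \lvert \partial\ell_p/\partial\beta_j \rvert$ falls below a tolerance such as eps.

Because the $\phi_i$ drop out, the iterations only involve the dimension of $\beta$, not the number of subjects. This is why elimination saves computation time. The resulting $\hat\beta$ is the same as from a fit with an explicit parameter for every subject.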

These macros, together with element.sas, can be read into SAS with the %INCLUDE statement (change the directory to the location of the macros):

%INCLUDE "c:\temp\sccs\sccs.sas";

%INCLUDE "c:\temp\sccs\poisreg.sas";

%INCLUDE "c:\temp\sccs\element.sas";

The following global macro variables must be defined (for example with %LET) before calling sccs.sas or poisreg.sas:

- agerange: age range of observation, e.g., 0 730 (from 0 to 730 days of age). The observation period for each subject is determined from agerange together with the variables passed to sccs.sas as startst and endst.
- risk: risk periods after vaccination, e.g., 0 6 7 14 (a first period from 0 to 6 days and a second period from 7 to 14 days).
- age: cutpoints for the age categories in the model, e.g., 90 180 270 360. Leave empty if no age categories are used. This variable is not used in a semi-parametric model.
- season: cutpoints for the season categories in the model, e.g., 31MAR 30JUN 30SEP 31DEC. Leave empty if no season categories are used.
- semi: set to Y to fit a semi-parametric model. In this case the global macro variable age is not used, and the output dataset wk_sccs does not contain the variables start, offset and l_off, which are irrelevant.
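The following is a minimal sketch of the global settings, using the illustrative values from the list above. The values are placeholders. Setting semi to N for a parametric analysis is an assumption: the list only says to use Y for a semi-parametric one.

```sas
/* Global macro variables read by sccs.sas and poisreg.sas.
   Values are placeholders taken from the examples in the list. */
%let agerange = 0 730;                    /* observe from 0 to 730 days of age  */
%let risk     = 0 6 7 14;                 /* risk periods 0-6 and 7-14 days     */
%let age      = 90 180 270 360;           /* age cutpoints (unused if semi=Y)   */
%let season   = 31MAR 30JUN 30SEP 31DEC;  /* season cutpoints                   */
%let semi     = N;                        /* assumed: anything but Y = parametric */

/* To switch off an optional categorisation, define it as empty, e.g.
   %let season = ;                                                       */
```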

Note that dates can be specified either on the age scale (i.e., number of days since birth) or on the calendar scale (i.e., SAS date value: number of days since 01JAN1960).

The macro sccs.sas has the following parameters:

- data: input dataset containing the variables used in the analysis, in cross-sectional format (i.e., one row per subject)
- pid: variable in dataset data that contains the subject ID number
- dob_raw: numeric variable in dataset data that contains the date of birth (on the calendar scale). Leave this parameter empty if the dates of events and vaccinations are specified on the age scale
- events: numeric variables in dataset data that contain the dates (on age or calendar scale) of the different events. If events are specified on the calendar scale, dob_raw must also be specified
- vacc: numeric variables in dataset data that contain the dates (on age or calendar scale) of the different vaccinations. If vaccinations are specified on the calendar scale, dob_raw must also be specified
- startst: numeric variable in dataset data that contains the study start for each individual (on age or calendar scale)
- endst: numeric variable in dataset data that contains the study end for each individual (on age or calendar scale)
- covars: covariates in dataset data to be copied to the output dataset so that they can be used in the analysis
- overlap: Y/N flag that allows overlapping risk intervals (default=N)
- outdata: output dataset (default=wk_sccs)
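The following is a hypothetical call of sccs.sas on a toy dataset. It assumes that sccs.sas defines a macro named %sccs with keyword parameters named as in the list above. The dataset cases and its variables (id, evday1, vacday, obs_start, obs_end) are invented for illustration. All dates are on the age scale, so dob_raw is left empty.

The first step shows a related point. SAS date values count days, so a calendar date converts to the age scale by subtracting the date of birth.

```sas
/* Calendar to age scale: both are day counts, so subtract the birth date */
data _null_;
  dob      = '01JAN1990'd;
  ev_date  = '15MAR1991'd;
  age_days = ev_date - dob;
  put age_days=;                 /* writes age_days=438 to the log */
run;

/* Hypothetical input: one row per subject, all dates in days of age */
data cases;
  input id evday1 vacday obs_start obs_end;
  datalines;
1 400 380 366 730
2 610 450 366 730
3 500 470 366 730
;
run;

/* Assumed macro name %sccs with keyword parameters (see lead-in);
   the global macro variables must already be defined */
%sccs(data=cases, pid=id, dob_raw=, events=evday1, vacc=vacday,
      startst=obs_start, endst=obs_end, covars=, overlap=N,
      outdata=wk_sccs);
```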

The macro sccs.sas outputs a dataset (default name wk_sccs) with several rows per subject (one per interval) and the following variables:

- Variable containing subject ID number (same as in pid)
- Variable containing date of birth (same as in dob_raw)
- start: beginning of each interval (on age scale). Not available when a semi-parametric model is used
- stop: end of each interval (on age scale). If a semi-parametric model is used, this variable contains the timings at which events occurred
- offset: difference between start and stop. Not available when a semi-parametric model is used
- l_off: natural log of offset. Not available when a semi-parametric model is used
- nevt: number of events per interval
- age: class variable containing the age category of each interval (categories defined by the global macro variable age)
- season: class variable containing the season category of each interval (categories defined by the global macro variable season)
- Several indicator variables for the risk periods:
  - Risk = 1 if the interval lies in a risk period (over all doses and all risk periods)
  - RiskVi = 1 if the interval lies in a risk period after dose i
  - RiskRj = 1 if the interval lies in the jth risk period (over all doses)
  - Riskk = 1 if the interval lies in the kth risk period (each risk period after each dose has its own indicator variable)
  - When overlap=Y, intervals can overlap and these variables should be used with caution
- int: intercept (equal to 1)
- The covariates specified in covars
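A quick way to inspect the interval dataset is sketched below. It assumes wk_sccs came from a parametric run, so that offset and l_off exist. Only variable names from the list above are used.

```sas
/* Show the first intervals (one row per subject and interval) */
proc print data=wk_sccs(obs=20);
run;

/* Total events and total follow-up, split by the overall risk indicator */
proc means data=wk_sccs sum;
  class risk;
  var nevt offset;
run;

/* Consistency check: l_off should equal log(offset) */
data _null_;
  set wk_sccs end=last;
  retain maxdiff 0;
  maxdiff = max(maxdiff, abs(l_off - log(offset)));
  if last then put "Largest |l_off - log(offset)|: " maxdiff;
run;
```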

The macro poisreg.sas has the following parameters:

- data: input dataset (e.g., wk_sccs from macro sccs.sas)
- y: response variable for the Poisson regression (e.g., nevt)
- covar: covariates in the model that are not in class or elim
- class: class variables in the model (e.g., age)
- elim: variable to be eliminated from the model; it is in the model but its parameters are not estimated (e.g., pid)
- offset: offset variable (leave empty when fitting a semi-parametric model)
- beta0: starting values; if empty, all parameters are set to 0
- outdata: dataset with the parameter estimates (default=out)
- eps: convergence criterion (default=1e-08); the algorithm stops when the maximum absolute value of the first derivatives is smaller than eps
- alpha: significance level for the confidence intervals (default=0.05)
- prntyn: print results (default=Y)
- covb: if Y, print the variance-covariance matrix and save it to the SAS dataset _covb (default=N)
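The following is a hypothetical call of poisreg.sas. It makes these assumptions:

- the file defines a macro named %poisreg with the keyword parameters listed above;
- wk_sccs was built with risk=0 6 7 14 and a season categorisation;
- the subject identifier is called id (an illustrative name).

The two risk periods enter through RiskR1 and RiskR2, age and season enter as class variables, and the subject factor is eliminated.

```sas
/* Two risk periods (pooled over doses), age and season classes,
   subject factor eliminated rather than estimated */
%poisreg(data=wk_sccs, y=nevt, covar=riskr1 riskr2, class=age season,
         elim=id, offset=l_off, beta0=, outdata=est,
         eps=1e-08, alpha=0.05, prntyn=Y, covb=N);

/* Parameter estimates are written to the dataset named in outdata */
proc print data=est;
run;
```

The model is log-linear, so the coefficients are on the log scale. Exponentiating a risk coefficient gives the relative incidence for that risk period.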

**Examples**

The following examples, taken from the tutorial paper (Statistics in Medicine, 2006; 25(10): 1768-1797), show how the macros can be used to fit the self-controlled case series model in SAS.

In each of the examples, three global macro variables are defined which contain the following directories:

- macdir, the directory containing the macros,
- outdir, the directory where the output will be stored,
- indir, the directory with the input datasets.
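A minimal sketch of how such directory variables can be set up and used follows. The paths are placeholders. Routing output with PROC PRINTTO is one possible choice, not necessarily what the example programs do.

```sas
/* Placeholder paths: point these at your own folders */
%let macdir = c:\temp\sccs;
%let outdir = c:\temp\sccs\output;
%let indir  = c:\temp\sccs\data;

/* Double quotes are required so that &macdir resolves */
%include "&macdir\sccs.sas";
%include "&macdir\poisreg.sas";
%include "&macdir\element.sas";

/* Optionally send the listing output to a file in outdir */
proc printto print="&outdir\oxford.lst" new;
run;

/* ... analyses ... */

/* Restore the default output destination */
proc printto;
run;
```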

**MMR and meningitis in Oxford example**

The SAS batch oxford.sas contains the case series analysis of the Oxford dataset from Miller et al. (The Lancet, 1993; 341: 979-982). The results can be found in oxford.lst.

Additional details and comments are given in the SAS program itself.

Both the original analysis and the semi-parametric analysis are illustrated.

oxford_cov.sas contains the case series analysis of the Oxford dataset and displays the variance-covariance matrix. The COVB option of the MODEL statement in PROC GENMOD is also included for comparison. The covariance matrix is saved in matrix_covb and printed when the parameter covb=Y.
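The sketch below shows both routes to the covariance matrix. It assumes:

- %poisreg as the macro name;
- wk_sccs with a subject identifier id (illustrative);
- the output dataset name _covb given in the poisreg.sas parameter list.

The oxford_cov.sas description mentions matrix_covb, so check which name your copy uses. PROC GENMOD estimates a separate parameter for every subject, so it is slower. Its risk estimate should still agree with the eliminated fit, although age estimates can differ through the choice of reference level.

```sas
/* Macro route: covb=Y prints the matrix and saves it as _covb */
%poisreg(data=wk_sccs, y=nevt, covar=risk, class=age, elim=id,
         offset=l_off, covb=Y);

proc print data=_covb;
run;

/* GENMOD route: explicit subject effects; COVB prints the covariance
   matrix of all parameters, subject effects included */
proc genmod data=wk_sccs;
  class id age;
  model nevt = risk age id / dist=poisson link=log offset=l_off covb;
  estimate 'Relative incidence, risk' risk 1 / exp;
run;
```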

**MMR and ITP examples**

The SAS batch ITP and MMR.sas contains the case series analysis of the ITP and MMR dataset from Miller et al. (Archives of Disease in Childhood 2001;84:227-229). The results can be found in ITP and MMR.lst.

The data is imported from itp.dat using the INFILE statement.

The SAS program contains:

- the analysis of all events (multiple events per subject),
- the analysis using only the first event for each subject, and
- the semi-parametric analysis.
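The following is a hypothetical outline of the first two analyses. The column layout of itp.dat is not described here, so the following are all illustrative:

- the INPUT statement;
- the variable names (id, ev1 to ev3, mmr, obs_start, obs_end);
- the assumption that sccs.sas accepts missing values for subjects with fewer events.

%sccs and %poisreg are assumed macro names, and &indir is the input directory macro variable described above. The input has one row per subject, with event ages assumed to be stored in chronological order. Restricting to the first event therefore amounts to passing only the first event variable.

```sas
/* Hypothetical layout: adjust INPUT to the real contents of itp.dat */
data itp;
  infile "&indir\itp.dat";
  input id ev1 ev2 ev3 mmr obs_start obs_end;
run;

/* (a) All events per subject */
%sccs(data=itp, pid=id, events=ev1 ev2 ev3, vacc=mmr,
      startst=obs_start, endst=obs_end, outdata=wk_all);
%poisreg(data=wk_all, y=nevt, covar=risk, class=age, elim=id,
         offset=l_off);

/* (b) First event per subject only: pass just the first event variable */
%sccs(data=itp, pid=id, events=ev1, vacc=mmr,
      startst=obs_start, endst=obs_end, outdata=wk_first);
%poisreg(data=wk_first, y=nevt, covar=risk, class=age, elim=id,
         offset=l_off);
```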

**OPV and IS examples**

The SAS batch IS and OPV.sas contains the case series analysis of the IS and OPV dataset from Andrews et al. (European Journal of Epidemiology 2001; 17:701-706). The results can be found in IS and OPV.lst.

The data is imported from intus.dat using the INFILE statement.

The SAS program contains the different analyses that are performed in the tutorial:

- Analysis 1: Standard analysis, including only Dose 3 (with and without gender interaction)
- Analysis 2: Observation period starts at Dose 3
- Analysis 3: pre-Dose 3 period
- Analysis 4: All doses with no dose effect
- Analysis 5: All doses with dose effect
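The sketch below shows how analyses 1, 2, 4 and 5 map onto the macro parameters described earlier. Analysis 3 is not sketched. The following are illustrative assumptions, not the tutorial's actual code:

- the variable names (id, evday, opv1 to opv3, male, obs_start, obs_end);
- the layout of intus.dat;
- the macro names %sccs and %poisreg.

The mapping relies on three points from the parameter lists:

- An observation period starting at dose 3 is obtained by passing the dose 3 variable as startst.
- A model with no dose effect uses the overall Risk indicator.
- A model with a dose effect uses the RiskVi indicators.

```sas
/* Hypothetical layout of intus.dat */
data intus;
  infile "&indir\intus.dat";
  input id evday opv1 opv2 opv3 male obs_start obs_end;
run;

/* Analysis 1: dose 3 only, with and without a gender interaction.
   male is constant within a subject, so its main effect is absorbed by
   the eliminated subject factor; only the interaction is estimable. */
%sccs(data=intus, pid=id, events=evday, vacc=opv3,
      startst=obs_start, endst=obs_end, covars=male, outdata=wk_a1);
data wk_a1;
  set wk_a1;
  risk_male = risk * male;
run;
%poisreg(data=wk_a1, y=nevt, covar=risk, class=age, elim=id,
         offset=l_off);
%poisreg(data=wk_a1, y=nevt, covar=risk risk_male, class=age, elim=id,
         offset=l_off);

/* Analysis 2: observation period starts at dose 3 (startst = opv3) */
%sccs(data=intus, pid=id, events=evday, vacc=opv3,
      startst=opv3, endst=obs_end, outdata=wk_a2);
%poisreg(data=wk_a2, y=nevt, covar=risk, class=age, elim=id,
         offset=l_off);

/* Analyses 4 and 5: all three doses */
%sccs(data=intus, pid=id, events=evday, vacc=opv1 opv2 opv3,
      startst=obs_start, endst=obs_end, outdata=wk_doses);

/* 4: no dose effect, one indicator over all doses */
%poisreg(data=wk_doses, y=nevt, covar=risk, class=age, elim=id,
         offset=l_off);

/* 5: dose effect, one indicator per dose (RiskVi) */
%poisreg(data=wk_doses, y=nevt, covar=riskv1 riskv2 riskv3, class=age,
         elim=id, offset=l_off);
```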

**SCCS for censored, perturbed or curtailed post event exposures**

The macros provided below can be used to fit the SCCS extension described in the paper:

Farrington, Whitaker and Hocine. Case series analysis for censored, perturbed, or curtailed post-event exposures. Biostatistics, 2009, 10(1): 3-16.

**SAS code for the paper: Hua et al. 'A Simulation Study to Compare Three Self-Controlled Case Series Approaches: Correction for Violation of Assumption and Evaluation of Bias', Pharmacoepidemiology and Drug Safety**

The macro pseudo_multiple_exposures.SAS implements the pseudo-likelihood method for censored, perturbed or curtailed post-event exposures (Farrington et al., Biostatistics 2009) with multiple exposures.

**SAS code for the paper: Cohet et al. 'Effect of the adjuvanted (AS03) A/H1N1 2009 pandemic influenza vaccine on the risk of rejection in solid organ transplant recipients in England: a self-controlled case series'. BMJ Open 2016.**

Two new macros were created for this paper, along with a document describing them. The first, SCCSDTA.SAS, formats the data and is more flexible than SCCS.SAS. The second, SCCS_MOD.SAS, fits the adapted SCCS method for censored, perturbed or curtailed post-event exposures (Farrington et al. 2009) and produces sandwich variance estimates, which are likely to be quicker to obtain and more reliable than bootstrapping. Read the description carefully and refer to the study to understand the data formatting. The example uses dates, but these could be replaced with ages. SCCSDTA.SAS creates an extra variable, Dose (the dose that precedes a period), which is required by SCCS_MOD.

**SAS code for use when the end of the exposure risk periods varies between individuals**

Kindly provided by Cécile Landais and Aurélie Bardet, Roche

This macro extends the method to the situation in which the end of the exposure risk period varies between individuals. It adds the option of specifying the value of the variable ENDVA… directly in the input dataset, so that different risk durations can be studied.

macro_sccs.sas