Survival Analysis Using SAS, ebook

SAS Institute - Paul D. Allison


217 pages

English


Description

Easy to read and comprehensive, Survival Analysis Using SAS: A Practical Guide, Second Edition, by Paul D. Allison, is an accessible, data-based introduction to methods of survival analysis. Researchers who want to analyze survival data with SAS will find just what they need with this fully updated new edition that incorporates the many enhancements in SAS procedures for survival analysis in SAS 9. Although the book assumes only a minimal knowledge of SAS, more experienced users will learn new techniques of data input and manipulation. Numerous examples of SAS code and output make this an eminently practical book, ensuring that even the uninitiated become sophisticated users of survival analysis. The main topics presented include censoring, survival curves, Kaplan-Meier estimation, accelerated failure time models, Cox regression models, and discrete-time analysis. Also included are topics not usually covered in survival analysis books, such as time-dependent covariates, competing risks, and repeated events.
Survival Analysis Using SAS: A Practical Guide, Second Edition, has been thoroughly updated for SAS 9, and all figures are presented using ODS Graphics. This new edition also documents major enhancements to the STRATA statement in the LIFETEST procedure; includes a section on the PROBPLOT command, which offers graphical methods to evaluate the fit of each parametric regression model; introduces the new BAYES statement for both parametric and Cox models, which allows the user to do a Bayesian analysis using MCMC methods; demonstrates the use of the counting process syntax as an alternative method for handling time-dependent covariates; contains a section on cumulative incidence functions; and describes the use of the new GLIMMIX procedure to estimate random-effects models for discrete-time data.
This book is part of the SAS Press program.

Topics

Getting Started

Bayesian inference

Statistics

Information

Published by	SAS Institute
Publication date	March 29, 2010
Number of reads	0
EAN13	9781599948843
Language	English
File size	4 MB


Excerpt

Survival Analysis Using SAS®

A Practical Guide, Second Edition

Paul D. Allison

The correct bibliographic citation for this manual is as follows: Allison, Paul D. 2010. Survival Analysis Using SAS®: A Practical Guide, Second Edition. Cary, NC: SAS Institute Inc.

Survival Analysis Using SAS®: A Practical Guide, Second Edition

Copyright © 2010, SAS Institute Inc., Cary, NC, USA

ISBN 978-1-59994-884-3

All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others' rights is appreciated.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414

March 2014

SAS Institute Inc. provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Books Web site at support.sas.com/bookstore or call 1-800-727-3228.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTENTS

PREFACE

Chapter 1 Introduction
    What Is Survival Analysis?
    What Is Survival Data?
    Why Use Survival Analysis?
    Approaches to Survival Analysis
    What You Need to Know
    Computing Notes

Chapter 2 Basic Concepts of Survival Analysis
    Introduction
    Censoring
    Describing Survival Distributions
    Interpretations of the Hazard Function
    Some Simple Hazard Models
    The Origin of Time
    Data Structure

Chapter 3 Estimating and Comparing Survival Curves with PROC LIFETEST
    Introduction
    The Kaplan-Meier Method
    Testing for Differences in Survivor Functions
    The Life-Table Method
    Life Tables from Grouped Data
    Testing for Effects of Covariates
    Log Survival and Smoothed Hazard Plots
    Conclusion

Chapter 4 Estimating Parametric Regression Models with PROC LIFEREG
    Introduction
    The Accelerated Failure Time Model
    Alternative Distributions
    Categorical Variables and the CLASS Statement
    Maximum Likelihood Estimation
    Hypothesis Tests
    Goodness-of-Fit Tests with the Likelihood-Ratio Statistic
    Graphical Methods for Evaluating Model Fit
    Left Censoring and Interval Censoring
    Generating Predictions and Hazard Functions
    The Piecewise Exponential Model
    Bayesian Estimation and Testing
    Conclusion

Chapter 5 Estimating Cox Regression Models with PROC PHREG
    Introduction
    The Proportional Hazards Model
    Partial Likelihood
    Tied Data
    Time-Dependent Covariates
    Cox Models with Nonproportional Hazards
    Interactions with Time as Time-Dependent Covariates
    Nonproportionality via Stratification
    Left Truncation and Late Entry into the Risk Set
    Estimating Survivor Functions
    Testing Linear Hypotheses with CONTRAST or TEST Statements
    Customized Hazard Ratios
    Bayesian Estimation and Testing
    Conclusion

Chapter 6 Competing Risks
    Introduction
    Type-Specific Hazards
    Time in Power for Leaders of Countries: Example
    Estimates and Tests without Covariates
    Covariate Effects via Cox Models
    Accelerated Failure Time Models
    Alternative Approaches to Multiple Event Types
    Conclusion

Chapter 7 Analysis of Tied or Discrete Data with PROC LOGISTIC
    Introduction
    The Logit Model for Discrete Time
    The Complementary Log-Log Model for Continuous-Time Processes
    Data with Time-Dependent Covariates
    Issues and Extensions
    Conclusion

Chapter 8 Heterogeneity, Repeated Events, and Other Topics
    Introduction
    Unobserved Heterogeneity
    Repeated Events
    Generalized R²
    Sensitivity Analysis for Informative Censoring

Chapter 9 A Guide for the Perplexed
    How to Choose a Method
    Conclusion

Appendix 1 Macro Programs
    Introduction
    The LIFEHAZ Macro
    The PREDICT Macro

Appendix 2 Data Sets
    Introduction
    The MYEL Data Set: Myelomatosis Patients
    The RECID Data Set: Arrest Times for Released Prisoners
    The STAN Data Set: Stanford Heart Transplant Patients
    The BREAST Data Set: Survival Data for Breast Cancer Patients
    The JOBDUR Data Set: Durations of Jobs
    The ALCO Data Set: Survival of Cirrhosis Patients
    The LEADERS Data Set: Time in Power for Leaders of Countries
    The RANK Data Set: Promotions in Rank for Biochemists
    The JOBMULT Data Set: Repeated Job Changes

References

Index

PREFACE

When the first edition of Survival Analysis Using SAS was published in 1995, my goal was to provide an accessible, data-based introduction to methods of survival analysis, one that focused on methods available in SAS and that also used SAS for the examples. The success of that book confirmed my belief that statistical methods are most effectively taught by showing researchers how to implement them with familiar software using real data. Of course, the downside of a software-based statistics text is that the software often changes more rapidly than the statistical methodology. In the 15 years that the first edition of the book has been in print, there have been so many changes to the features and syntax of SAS procedures for survival analysis that a new edition has been long overdue. Indeed, I have been working on this second edition for several years, but got partially sidetracked by a four-year term as department chair. So, it’s a great relief that I no longer have to warn potential readers about out-of-date SAS code. Although the basic structure and content of the book remain the same, there are numerous small changes and several large changes. One global change is that all the figures use ODS Graphics. Here are the other major changes and additions:

■ Chapter 3, “Estimating and Comparing Survival Curves with PROC LIFETEST.” This chapter documents some major enhancements to the STRATA statement, which now offers several alternative tests for comparing survivor functions. It also allows for pairwise comparisons and for adjustment of p-values for multiple comparisons. In the first edition, I demonstrated the use of a macro called SMOOTH, which I had written to produce smoothed graphs of hazard functions. That macro is no longer necessary, however, because the PLOTS option (combined with ODS Graphics) can now produce smoothed hazard functions using a variety of methods.

■ Chapter 4, “Estimating Parametric Regression Models with PROC LIFEREG.” This chapter now includes a section on the PROBPLOT command, which offers graphical methods to evaluate the fit of each model. The last section introduces the new BAYES statement, which (as the name suggests) makes it possible to do a Bayesian analysis of any of the parametric models using MCMC methods.

■ Chapter 5, “Estimating Cox Regression Models with PROC PHREG.” The big change here is the use of the counting process syntax as an alternative method for handling time-dependent covariates. When I wrote the first edition, the counting process syntax had just been introduced, and I did not fully appreciate its usefulness for handling predictor variables that vary over time. Another new topic is the use of the ASSESS statement to evaluate the proportional hazards assumption. Finally, there is a section that describes the BAYES statement for estimating Cox models and piecewise exponential models.

■ Chapter 6, “Competing Risks.” This chapter now contains a section on cumulative incidence functions, which are a popular alternative approach to competing risks.

■ Chapter 7, “Analysis of Tied or Discrete Data with PROC LOGISTIC.” The first edition also used the PROBIT and GENMOD procedures to do discrete-time analysis, but PROC LOGISTIC has been enhanced to the point where the other procedures are no longer needed for this application.

■ Chapter 8, “Heterogeneity, Repeated Events, and Other Topics.” For repeated events and other kinds of clustered data, the WLW macro that I described in the first edition has been superseded by the built-in option COVSANDWICH. In this chapter, I also describe the use of the new GLIMMIX procedure to estimate random-effects models for discrete-time data.

Please note that I use the following convention for presenting SAS programs. All words that are part of the SAS language are shown in uppercase. All user-specified variable names and data set names are in lowercase. In the main text itself, both SAS keywords and user-specified variables are in uppercase.

I am most grateful to my editor, George McDaniel, for his patient persistence in getting me to finish this new edition.

CHAPTER 1 Introduction

    What Is Survival Analysis?
    What Is Survival Data?
    Why Use Survival Analysis?
    Approaches to Survival Analysis
    What You Need to Know
    Computing Notes

WHAT IS SURVIVAL ANALYSIS?

Survival analysis is a class of statistical methods for studying the occurrence and timing of events. These methods are most often applied to the study of deaths. In fact, they were originally designed for that purpose, which explains the name. That name is somewhat unfortunate, however, because it encourages a highly restricted view of the potential applications of these methods. Survival analysis is extremely useful for studying many different kinds of events in both the social and natural sciences, including disease onset, equipment failures, earthquakes, automobile accidents, stock market crashes, revolutions, job terminations, births, marriages, divorces, promotions, retirements, and arrests.

Because these methods have been adapted, and sometimes independently discovered, by researchers in several different fields, they also go by several different names: event history analysis (sociology), reliability analysis (engineering), failure time analysis (engineering), duration analysis (economics), and transition analysis (economics). These different names don’t imply any real difference in techniques, although different disciplines may emphasize slightly different approaches. Because survival analysis is the name that is most widely used and recognized, it is the name I use here.

This book is about doing survival analysis with SAS. I have also written an introduction to survival analysis that is not oriented toward a specific statistical package (Allison, 1984), but I prefer the approach taken here. To learn any kind of statistical analysis, you need to see how it’s actually performed in some detail. And to do that, you must use a particular computer program. But which one? Although I have performed survival analysis with many different statistical packages, I am convinced that SAS currently has the most comprehensive set of full-featured procedures for doing survival analysis. When I compare SAS with any of its competitors in this area, I invariably find some crucial capability that SAS has but that the other package does not. When you factor in the extremely powerful tools that SAS provides for data management and manipulation, the choice is clear. On the other hand, no statistical package can do everything, and some methods of survival analysis are not available in SAS. I occasionally mention such methods, but the predominant emphasis in this book is on those things that SAS can actually do.

I don’t intend to explain every feature of the SAS procedures discussed in this book. Instead, I focus on those features that are most widely used, most potentially useful, or most likely to cause problems and confusion. It’s always a good idea to check the official SAS documentation or online help file.

WHAT IS SURVIVAL DATA?

Survival analysis was designed for longitudinal data on the occurrence of events. But what is an event? Biostatisticians haven’t written much about this question because they have been overwhelmingly concerned with deaths. When you consider other kinds of events, however, it’s important to clarify what is an event and what is not. I define an event as a qualitative change that can be situated in time. By a qualitative change, I mean a transition from one discrete state to another. A marriage, for example, is a transition from the state of being unmarried to the state of being married. A promotion consists of the transition from a job at one level to a job at a higher level. An arrest can be thought of as a transition from, say, two previous arrests to three previous arrests.

To apply survival analysis, you need to know more than just who is married and who is not married. You need to know when the change occurred. That is, you should be able to situate the event in time. Ideally, the transitions occur virtually instantaneously, and you know the exact times at which they occur. Some transitions may take a little time, however, and the exact time of onset may be unknown or ambiguous. If the event of interest is a political revolution, for example, you may know only the year in which it began. That’s all right so long as the interval in which the event occurs is short relative to the overall duration of the observation.

You can even treat changes in quantitative variables as events if the change is large and sudden compared to the usual variation over time. A fever, for example, is a sudden, sustained elevation in body temperature. A stock market crash could be defined as any single-day loss of more than 20 percent in some market index. Some researchers also define events as occurring when a quantitative variable crosses a threshold. For example, a person is said to have fallen into poverty when income goes below some designated level. This practice may not be unreasonable when the threshold is an intrinsic feature of the phenomenon itself or when the threshold is legally mandated. But I have reservations about the application of survival methods when the threshold is arbitrarily set by the researcher. Ideally, statistical models should reflect the process generating the observations. It’s hard to see how such arbitrary thresholds can accurately represent the phenomenon under investigation.

For survival analysis, the best observation plan is prospective. You begin observing a set of individuals at some well-defined point in time, and you follow them for some substantial period of time, recording the times at which the events of interest occur. It’s not necessary that every individual experience the event. For some applications, you may also want to distinguish between different kinds of events. If the events are deaths, for example, you might record the cause of death. Unlike deaths, events like arrests, accidents, or promotions are repeatable; that is, they may occur two or more times to the same individual. While it is definitely desirable to observe and record multiple occurrences of the same event, you need specialized methods of survival analysis to handle these data appropriately.

You can perform survival analysis when the data consist only of the times of events, but a common aim of survival analysis is to estimate causal or predictive models in which the risk of an event depends on covariates. If this is the goal, the data set must obviously contain measurements of the covariates. Some of these covariates, like race and sex, may be constant over time. Others, like income, marital status, or blood pressure, may vary with time. For time-varying covariates, the data set should include as much detail as possible on their temporal variation.

Survival analysis is frequently used with retrospective data in which people are asked to recall the dates of events like marriages, child births, and promotions. There is nothing intrinsically wrong with this approach as long as you recognize the potential limitations. For one thing, people may make substantial errors in recalling the times of events, and they may forget some events entirely. They may also have difficulty providing accurate information on time-dependent covariates. A more subtle problem is that the sample of people who are actually interviewed may be a biased subsample of those who may have been at risk of the event. For example, people who have died or moved away will not be included. Nevertheless, although prospective data are certainly preferable, much can be learned from retrospective data.
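The earlier definition of a stock market crash as any single-day loss of more than 20 percent in some market index shows how a change in a quantitative variable can be turned into dated events. The following is a minimal sketch of that rule, written in Python rather than SAS purely for illustration. The function name crash_days, the max_loss parameter, and the closing values are all hypothetical and do not come from the book.

```python
from typing import List, Sequence


def crash_days(closes: Sequence[float], max_loss: float = 0.20) -> List[int]:
    """Return the positions (0-based) of days on which the index closed
    more than max_loss (a fraction) below the previous day's close."""
    events = []
    for day in range(1, len(closes)):
        previous, current = closes[day - 1], closes[day]
        if previous <= 0:
            continue  # a proportional loss is undefined; skip this day
        loss = (previous - current) / previous
        if loss > max_loss:
            events.append(day)  # the event is situated in time at this day
    return events


# Made-up daily closing values for a hypothetical market index.
index_closes = [100.0, 98.0, 75.0, 76.0, 60.0, 61.0]
print(crash_days(index_closes))
```

With these made-up values, the rule flags two days, so the crash is a repeatable event in this series. Changing max_loss changes which days count as events, which is one way to see the concern raised above about thresholds set arbitrarily by the researcher.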