One of the reports that is run for quality assurance and audit purposes is our “Missing and Unknown

9 pages

English

One of the reports that is run for quality assurance and audit purposes is our “Missing and Unknown

Entau - Julie Ann Kilburn

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

9 pages

English

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

A propos
Informations
Extrait

Description

An Example of Using Differing Methods to Create Excel Output from SAS Julie Kilburn, Rizvan Mamet, Anna Ter Veer, Ann Vanderplas, Rebecca Ottesen, Joyce Niland, City of Hope, Duarte, CA ABSTRACT DDE, ODS HTML, ODS XML with tagsets…Which is best? The answer depends entirely on the customer’s desired output. This paper will review, by example, the use of DDE to create various Excel reports. It will then explain in detail a more effective way to move data from SAS to Excel for our particular need. In this case, the customers were clinical research associates (CRAs) at 39 institutions collecting data for the National Comprehensive Cancer Network (NCCN) Outcomes Database. The desired output was a detailed report summarizing the number and proportion of patients with missing or unknown categories for each data element in the NCCN database, and institution-specific patient listings for the CRA to review for quality assurance purposes. The pros and cons of each approach and the rationale for our selected approach in this challenging multi-center environment will be described. INTRODUCTION The National Comprehensive Cancer Network (NCCN) Outcomes Database collects outcomes data on breast cancer, non-Hodgkin’s lymphoma, non-small cell lung, ovarian, and colorectal cancers from 19 NCCN member institutions, and 20 community sites affiliated with a NCCN member institution, across the United States. Each n has one or more clinical research ...

Informations

Publié par	Entau
Nombre de lectures	41
Langue	English

Extrait

An Example of Using Differing Methods to Create Excel Output from SAS

Julie Kilburn, Rizvan Mamet, Anna Ter Veer, Ann Vanderplas, Rebecca Ottesen,

Joyce Niland, City of Hope, Duarte, CA

ABSTRACT

DDE, ODS HTML, ODS XML with tagsets…Which is best?

The answer depends entirely on the customer’s desired

output.

This paper will review, by example, the use of DDE to create various Excel reports.

It will then explain in

detail a more effective way to move data from SAS to Excel for our particular need.

In this case, the customers were

clinical research associates (CRAs) at 39 institutions collecting data for the National Comprehensive Cancer Network

(NCCN) Outcomes Database.

The desired output was a detailed report summarizing the number and proportion of

patients with missing or unknown categories for each data element in the NCCN database, and institution-specific

patient listings for the CRA to review for quality assurance purposes.

The pros and cons of each approach and the

rationale for our selected approach in this challenging multi-center environment will be described.

INTRODUCTION

The National Comprehensive Cancer Network (NCCN) Outcomes Database collects outcomes data on breast

cancer, non-Hodgkin’s lymphoma, non-small cell lung, ovarian, and colorectal cancers from 19 NCCN member

institutions, and 20 community sites affiliated with a NCCN member institution, across the United States.

Each

institution has one or more clinical research associates (CRAs) assigned to collecting eligible patient data from

electronic or paper medical records, and entering the data into a network-based database.

For institutions that have

more programming savvy, data can be entered into their legacy system and sent to the NCCN database via quarterly

FTP submission.

The data management of the NCCN Outcomes Database includes rigorous quality assurance (QA)

processes.

These QA processes include initial and follow-up data management training of the CRAs, edit checks

during data entry, logic checks of the data elements and tables that are relational, and quarterly QA reports for each

participating institution across the five cancer sites.

In addition to the quarterly QA reports, annual audits of a random

sample of patients comparing source documents against entered data takes place at each participating institution.

The Data Coordinating Center (DCC) of the NCCN Outcomes Database is located at City of Hope, and consists of a

team of database programmers and statisticians lead by the DCC principal investigator.

The DCC oversees the

database operations and data analysis, with the assistance of the Scientific Office (SO) at the Dana-Farber Cancer

Institute.

One of the many analytic functions of the DCC is to program QA reports for each institution and disease

site, and post the reports on the NCCN Outcomes Database webpage for institution review.

An additional function of

the DCC is to program outcomes reports, including an annual report of benchmarked data, the focus of two other

papers by ter Veer and Ottesen 2009 presented at WUSS

1,2

This paper introduces two examples of QA reports, both of which are output into Excel.

The focus of this paper is to

discuss three different methods of outputting data from SAS to Excel: 1) Dynamic Data Exchange, 2) ODS HTML,

and 3) ODS XML with tagsets.

For each method, a screenshot of a partial Excel report is shown, along with the SAS

code that was used to generate the Excel report.

The advantages and disadvantages (pros and cons) of each

method will be described, including the rationale for choosing a preferred method for the final resulting QA reports.

METHOD 1:

DYNAMIC DATA EXCHANGE (DDE)

To determine how complete the data collection is at a certain date, the CRAs examine results in the “Missing and

Unknown Summary Report”.

The report comes in two main sections: the first part lists a summary of the percentages

of missing and unknown for each variable in a data table, the second section lists the patients with missing and

unknown data in each table so that they CRA can review the patient level data.

This Excel report is distributed to the

CRAs on a quarterly basis for each center and across disease sites.

Clearly, with so many reports to distribute, the

programming and output needs to be automated.

In this case, the one method we will discuss for outputting and

automating the Excel reports is DDE.

Figure 1 is an example of two tables within the summary section of the Missing and Unknown Summary Report,

specifically the Demographics and Study Accession Data tables.

The report lists every data element collected in the

tables across all tables in the database.

Each portion of the summary section represents a data table contained in

the database.

The report across disease sites is not completely identical, as each disease site has additional data

tables and data elements that are collected for that type of cancer only.

The first column, named Variable, lists the

data element description.

The Missing column shows the percent of patients that do not have data entered, that is,

the CRA has not collected data for that percent of patients.

The Unknown column shows the percent of patients that

have an entry of unknown, that is, the CRA reviewed the chart but the answer for that data element is not listed in the

chart .

For example, of the 1725 patients that had data for the Study Accession Table, 0% were missing education

status, and 22% had an unknown education status, as this information was not listed in the chart.

The red font

signifies a percent above 25%; the green font signifies a percent from 10% to <25%.

This color classification helps

underscore the data elements that require further scrutiny by the CRAs.

Additionally rules are built into the

programming so that data elements will be checked for missing/unknown status when it makes sense to do so.

For

example racial background other, an open text field in the Demographics table, is only checked for the N=12 patients

who indicated that their race was other.

Figure 1. An Example of Missing and Unknown Summary Report

The process for creating this first section of the Missing and Unknown Summary Report requires creating an Excel

template, manipulating the data in SAS to get a dataset that contains the desired output, and outputting the data from

SAS to Excel using DDE.

The initial task is to create a template of the desired report in Excel, and is typically

completed by the project manager at the SO and does not require the knowledge of SAS programming.

All of the

formatting is defined in Excel, including header and footer information, borders, shading, color coding, and page

numbering.

The next task is to create a SAS dataset for the desired output.

In turn, the SAS variables from this

dataset are output from SAS to Excel using DDE.

The DDE portion of these tasks are programmed in the sample

SAS code listed below.

SAMPLE SAS CODE

Initially the coding for the Missing and Unknown Summary report involves running a series of PROCs to review the

patient level data and calculate the number of missing and unknown per variable in each data table.

The coding for

this portion is beyond the scope of this paper.

Once a data table level data set has been created which contains the

number and percentages for missing and unknown per variable we can output this information via DDE to the report.

In this data set we have the following variables:

The SAS System

The CONTENTS Procedure

Variables in Creation Order

Variable

Type

Len

var

Char

longname

Char

denom

Num

pcntmiss

Num

pcntunk

Num

total

Num

** Macro to output data to report via DDE;

%macro dumpnow(dsn);

data

_null_; set &dsn;

file alldump notab;

** Outputs each desired variable and adds text formatting to denom variable;

put longname

' (N=' denom ')' '09'x pcntmiss '09'x pcntunk '09'x total '09'x;

run;

%mend;

** Wakes up Excel and opens the template file for the summary report;

options noxsync noxwait;

X ' c:\nccn\WUSS\Templates\Missing

_Unknown_Summary DDE.xls';

** DDE triplet points to location in Excel where to store Demographics output;

filename alldump DDE

'Excel| c:\nccn\WUSS\Templates\

[Missing_Unknown_Summary DDE.xls]all!R3C1:R17C4';

** Calling output macro for Demographics results;

%dumpnow(MUDemographics);

** DDE triplet points to location in Excel where to store Demographics output;

filename alldump DDE

'Excel| c:\nccn\WUSS\Templates\[Missing_Unknow

n_Summary DDE.xls]all!R23C1:R43C4';

** Calling output macro for Demographics results;

%dumpnow(MUStudyacc);

In summary, for this first section of the missing/unknown report the use of Dynamic Data Exchange along with SAS

macros simplifies the creation of multiple reports across centers and disease sites from existing templates.

Almost all

of the monthly and quarterly reports generated can be run with these tools.

Table 1 lists the pros and cons of

choosing DDE for Excel reports.

Table 1. Pros and Cons for Choosing DDE for Excel Reports

DDE Pros:

Get the ‘look’ we want by creating template

Can pinpoint very specific data ranges (can output to one cell if needed)

DDE Cons:

Initially labor intensive

Changes must be updated in template and programming

Must know exactly where data will be placed

METHOD 2: ODS HTML

After the CRAs have reviewed the Missing and Unknown summary section, the next step is for the CRAs to review

the actual list of patients that have a high percent of missing and/or unknown data elements.

This way, the CRA can

request the charts on these patients and complete the data collection for the missing data elements.

To add this

level of detail we had to consider another method for automating the Excel reports.

This is because DDE would no

longer work for automation purposes as the number of patients from any given table and institution would vary.

Therefore we decided to use ODS HTML as our first attempt.

Figure 2 shows an example of a Missing and Unknown Patient List for the variables racial background other and

hispanic, data element s that are collected for all five disease sites.

This list is specific to a center, and the center is

listed under the Center ID column.

The patient identifiers are listed under the Patient ID column.

The last column

lists whether the patient record was missing or unknown for racial background and then again for hispanic.

In this

example, all the listed patient records have a racial background other, collected as a character variable, of missing

and Hispanic, collected as numeric, is unknown.

Figure 2. Example of Missing and Unknown Patient List

As previously mentioned the DDE method is not ideal for generating the Missing Unknown Patient List.

To begin

with, generating the specific patient list can be tedious, and does not lend itself to use for multiple centers, since each

center may need to see different data elements.

The DDE method requires the making of a template for the one-time

request of a handful of data elements. Because of the requirement to specifically place the incoming data, the

DDE/Template method necessitates the creation of one worksheet per data element (we could have any number of

patients from the database with a missing or unknown value, from zero to potentially thousands).

A complete

workbook would have hundreds of sheets, many of which would contain no data.

The ODS HTML method allows the flexibility of one or multiple PRINT procedures to be output to Excel. Only the lists

with more than zero observations are shown. The programming to move the data from SAS to Excel is simple.

However, the report can be unwieldy, with all lists dumping one after the other in one worksheet.

The look of the

report is controlled only with the STLYE option in the ODS statement. However, the header and footer cannot be

controlled from SAS.

As exemplified in Figure 2, the SAS-generated title outputs to the first cell and causes the list

spacing to be distorted for the Center ID column.

The sample SAS code is listed below.

SAMPLE SAS CODE

** Including a format to distinguish for variables with missing or unknown;

proc format;

value mu

.='Missing'

-1='Unknown'

other='Valid';

run;

** Close the listing destination and open an html destination;

** Note the resulting file is saved as xls;

ods listing close;

ods html file='c:\nccn\WUSS\output\MU_pt_list.xls' style=journal;

proc print data=demoX

(where=(pid^=. and compstat in(.,-1))) label noobs;

title 'Missing Unknown Patient List: Completion Status';

var cid pid compstat;

format compstat mu.;

label cid='Center ID'

pid='Patient ID'

compstat='Completion Status';

run;

** Note -1 for dob is a valid value because it is a date;

proc print data=demoX

(where=(pid^=. and dob in(.))) label noobs;

title 'Missing Unknown Patient List: Date of Birth';

var cid pid dob;

label cid='Center ID'

pid='Patient ID'

dob='Date of Birth';

run;

proc print data=demoX

(where=(compstat^=. and racebg in(.,-

1))) label noobs;

title 'Missing Unknown Patient List: Racial Background';

var cid pid racebg;

format racebg mu.;

label cid='Center ID'

pid='Patient ID'

racebg='Racial Background';

run;

** Racebgother has a subcondition for patients who mark other as their race;

** Because this is character data missing is the only value searched;

proc print data=demoX

(where=(compstat^=. and racebg=0 and racebgother=' ')) label

noobs;

title 'Missing Unknown Patient List: Racial Background Other';

var cid pid racebgother;

label cid='Center ID'

pid='Patient ID'

racebgother='Racial Background Other';

run;

proc print data=demoX (where=(compstat^=. and hispanic in(.,-1))) label noobs;

title 'Missing Unknown Patient List: Spanish/Hispanic Background';

var cid pid hispanic;

format hispanic mu.;

label cid='Center ID'

pid='Patient ID'

hispanic='Spanish/Hispanic Background';

run;

/* ... and so on for all data elements ... */

** Closing the html destination and reactivating the listing window;

ods html close;

ods listing;

In summary, the use of ODS HTML to create the desired Excel output for the Missing Unknown Patient Listing

section is in its simplicity.

All that is needed to designate an Excel file is the .xls extension within the ODS HTML

FILE statement.

Table 2 lists the pros and cons of choosing ODS HTML for Excel reports.

Table 2. Pros and Cons for Choosing ODS HTML for Excel Reports

ODS HTML Pros:

Easy to program

Quick turn around time

ODS HTML Cons

Little control over formatting

Output unwieldy

METHOD 3: ODS XML WITH TAGSET EXCELXP

In contrast to the ODS HTML method, the ODS XML with tagset ExcelXP can easily create multiple sheet Excel

workbooks, with control over tab names, formatting, and even header and footer information. A tagset is a special

type of ODS destination (HTML, RTF, etc.).

Tagsets can be modified to meet specific needs by using the

TEMPLATE procedure.

The tagset we use here is called “ExcelXP.”

It creates XML output that can be opened using

Excel.

A comprehensive lesson on what can be done with this tagset, can be found in DelGobbo’s WUSS 2008

Presentation.

The sample programs and data used in his workshop are available at the SAS Presents Web site.

Within the downloadable package is the ExcelXP.sas program, and a sample of how to incorporate it is in the SAS

program CompleteCode.sas

Figure 3 shows an example of a Missing and Unknown Patient List Report for health insurance data elements, which

are collected for all five disease sites.

This list is specific to a center, and the center is listed under the Center ID

column.

The patient identifiers are listed under the Patient ID column.

The assessment identifiers are listed under

the Assessment ID column.

The last column lists whether the patient record was Missing or Unknown for change in

health insurance.

In this example, all the listed patient records have a change in health insurance of Missing.

Also

note that there are three separate worksheets, one per data table.

With ExcelXP, the output can be divided at any

point by using an additional ODS ExcelXP statement.

Figure 4 calls attention to the header that we would like to appear.

Note the use of all sections of the header, as well

as the formatting such as bold, italic, font styles and sizes. The worksheet has a title within the header and titles for

each of the output blocks in the body. This is all handled in the SAS program. Coding for this header and footer (not

shown) is found in the sample code below.

Figure 3. Missing and Unknown Patient List Report

Figure 4. Print Preview of Missing and Unknown Patient List Report

For the Missing and Unknown Patient List Report, the data is manipulated into a format that is ready for output.

For

the sake of discussion, a simplified version of the actual programming has been created.

We are using a PROC

PRINT that refers to data in the W9 directory.

The output file is an XML file that can be opened in Excel, and then

can be saved as an Excel Workbook.

The sample SAS code is included below.

SAMPLE SAS CODE

** include a version of the ExcelXP ODS tagset;

%include "&SAMPDIR.ExcelXP.sas";

** Close listing destination, and open ExcelXP destination;

ods listing close;

ods tagsets.ExcelXP path="C:\nccn\WUSS\output\" file="&outfile" style=journal;

**Set global options;

**Descriptions of the many tagset ExcelXP options can be found in

DelGobbo’s WUSS 2008 paper as well;

ods tagsets.ExcelXP options

(embedded_titles='yes'

suppress_bylines='yes'

Print_Header='&L&"Arial,Bold Italic"&10CONFIDENTIAL

&C&"Arial,Bold Italic"&10BCA Outcomes Project Missing and

Unknown Report: Patient Listing &RFor Internal Use Only'

&C&8June 17, 2009 &R&8&P'

Center_Horizontal='Yes');

** Create a new worksheet for each data table by using an ODS statement;

ods tagsets.ExcelXP options(sheet_interval='none' sheet_name='Demographics');

proc print data=w9.racial noobs label split='*';

var cid pid racebg;

title 'Demographics: Racial Background';

format racebg mu.;

label

cid='Center*ID'

pid='Patient*ID'

racebg='Racial*Background';

run;

** continue to print other Demographic table variables here…;

** switch to Health Insurance data and worksheet;

ods tagsets.ExcelXP options(sheet_interval='none' sheet_name='Insurance');

proc print data=w9.hichange noobs label split='*';

var cid pid assessid inschange;

title 'Insurance: Change in Health Insurance';

format inschange mu.;

label

cid='Center*ID'

pid='Patient*ID'

assessid='Assesssment*ID'

inschange='Change in*Health*Insurance';

run;

proc print data=w9.primaryt noobs label split='*';

var cid pid assessid ins1type;

title 'Insurance: Primary Insurance Type';

format ins1type mu.;

label

cid='Center*ID'

pid='Patient*ID'

assessid='Assesssment*ID'

ins1type='Primary Health*Ins Type';

run;

** continue to print other Health Insurance table variables here…;

** switch to Study Accession data and worksheet;

ods tagsets.ExcelXP options(sheet_interval='none' sheet_name='Study

Accession');

proc print data=w9.education noobs label split='*';

var cid pid edustat;

title 'Study Accession: Education Status';

format edustat mu.;

label

cid='Center*ID'

pid='Patient*ID'

edustat='Education*Status';

run;

** continue to print other StudyAcc table variables here…;

** Close ExcelXP destination and reopen the listing destination;

ods tagsets.ExcelXP close;

ods listing;

In summary, the use of ODS XML with tagset ExcelXP to create the desired Excel output is a much better method for

the patient listings of missing and unknown data elements.

We are able to group worksheets by data table and have

better control over what is output to each worksheet. Table 3 lists the pros and cons of choosing ODS XML with

tagsets ExcelXP for Excel reports.

Table 3. Pros and Cons for Choosing ODS XML with tagsets ExcelXP for Excel Reports

TAGSET EXCELXP Pros:

Creating template not required

Can accommodate multiple worksheets

Changes are made in SAS program

Will only create worksheets where there is output available

TAGSET EXCELXP Cons:

Must save as an Excel file

May be some format editing post-creation

CONCLUSION

There are many ways to move data from SAS to Excel.

Each has its advantages and disadvantages, and each is

quite useful depending on what the output actually contains and on the need of the client.

If a quick query is

requested, a simple ODS HTML output will do.

If the report needs to be a very specific set of tables or a specific

layout, then the DDE/template method will work quite well. If flexibility is important, but so also is style, then ODS

ExcelXP is a great option.

REFERENCES

1) Ottesen R,

Custom Outcomes Benchmarking Reports Using Dynamic Data Exchange, The Only Way To Go

Proceedings of the WUSS 2009 Conference, HOR-Ottesen.

2) Ter Veer A,

Reporting Quality of Cancer for the NCCN – A SAS Bridge to Excel

, Proceedings of the WUSS 2009

Conference, HOR-terVeer.

DelGobbo ,

Tips and Tricks for Creating Multi-Sheet Excel Workbooks the Easy Way with SAS

Proceedings of the WUSS 2008 Conference, HOW07.

DelGobbo,

Tips and Tricks for Creating Multi-Sheet Excel Workbooks the Easy Way with SAS

http://support.sas.com/rnd/papers/index.html#excel2008.

CONTACT INFORMATION

Your comments and questions are valued and encouraged.

Please contact the author at:

Julie Kilburn

559.783.9660

jkilburn@coh.org

City of Hope

1500 E. Duarte Road

Duarte, CA 91010

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS

Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.

Univers
Ebooks
Livres audio
Presse
Podcasts
BD
Documents

Livre audio en ligne - Développement personnel Livre en ligne Tout le catalogue Tous les Intérêts

One of the reports that is run for quality assurance and audit purposes is our “Missing and Unknown

YouScribe

Le catalogue

Le service

Les conditions