April 20, 2011

Joint Committee
Standards for Educational and Psychological Testing
Attn: Dianne Schneider, Ph.D./Marianne Ernesto
American Psychological Association
750 First Street NE
Washington, DC 20002-4242
http://www.teststandards.net/

RE: Comments on the AERA, APA, NCME Standards for Educational and Psychological Testing (Revised Draft)

Dear Dr. Schneider and Ms. Ernesto:

The Equal Employment Advisory Council (EEAC) welcomes the opportunity to file the following comments on the Revised Draft of the forthcoming revisions to the 1999 Standards for Educational and Psychological Testing. EEAC would like to thank the Joint Committee for providing us with an opportunity to participate in this process.

EEAC is a national nonprofit association of major employers formed in 1976 to promote sound approaches to the elimination of employment discrimination. EEAC’s membership is comprised of approximately 300 of the nation’s largest private sector companies, collectively providing employment to millions of people throughout the United States alone. EEAC’s directors and officers include many of the nation’s leading experts in the field of equal employment opportunity. Their combined experience gives EEAC an unmatched depth of knowledge of the practical, as well as legal, considerations relevant to the proper interpretation of fair employment policies and practices. EEAC’s members are firmly committed to the principles of equal employment opportunity.

Because of EEAC’s members’ interest in employment selection testing, EEAC participated throughout the revision process that culminated in the 1999 Standards. EEAC is very grateful for the thoughtful consideration given to our comments at that time. We welcome the opportunity to participate once again.

Our comments below are presented in Chapter order, with specific reference to the appropriate lines.
Joint Committee Standards for Educational and Psychological Testing April 20, 2011 Page 2

Introduction
Lines 5-8, 60, 103-108, 240-250
The Introduction appropriately takes a positive tone.

The Introduction appropriately uses a positive tone to address the use of testing. All too often, and particularly in the employment context, testing is criticized unfairly, when indeed, a well-constructed test can be much more objective, above reproach, and free from accusations of bias than other selection methods. EEAC appreciates the Joint Committee’s efforts to convey “the positive effects of well-constructed tests,” and that, properly developed and conducted, testing is a proper, and frequently superior, method of informing decisions about employment, selection, placement, and promotion.

The draft Standards appropriately convey the primary role of professional judgment.

Here and throughout the draft, the professional judgment of the expert involved in developing and implementing test use is recognized as a key component of the process. EEAC strongly supports the inclusion of such references to professional judgment.

The draft Standards correctly distinguish between assessment techniques that are properly subject to the Standards and those that are not.

EEAC strongly supports the draft’s confirmation that “it would be overreaching expectations to require individuals making day-to-day decisions using informal assessment techniques, e.g. job interviews and performance evaluations, to follow the Standards.” The language in this section appropriately conveys a proper scope for the Standards.

The draft Standards properly do not seek to express or interpret legal requirements.

Except as noted below, the draft takes the correct approach for a document outlining professional standards that coexist with legal requirements. By noting for informational purposes that there are legal requirements regarding a certain point, but not seeking to interpret those requirements, the draft maintains its integrity as a professional standards document. The draft properly states that legal advice should be sought where appropriate. EEAC strongly supports the draft’s approach.
Chapter 1. Validity
Lines 249, 333-382, 440-499
The point that “differential item functioning is not always a flaw or weakness” is an important one that should be retained.

As we said in our comments on the 1999 Standards, EEAC is very concerned that the language in the draft could be read to imply that studies of differential item functioning are sometimes required. The research literature does not support a requirement that such studies be conducted. Ideally, EEAC would like to see this reference deleted entirely. That said, the discussion on differential item functioning does correctly point out that it is “not always a flaw or weakness.” As the draft goes on to explain, there may be legitimate reasons why differential item functioning occurs. EEAC supports this statement and recommends that it be included in the final Standards if the discussion of differential item functioning appears there.

The discussion of validity generalization is critical and should be retained.

EEAC strongly supports the draft’s discussion of the value and utility of validity generalization. We note that the discussion contains only minor changes from the 1999 Standards. EEAC respectfully requests that this critical discussion be retained in its entirety.

The draft’s revisions to the discussion of unintended consequences provide additional clarity and information, but additional revisions are recommended.

EEAC observes that the draft’s discussion of unintended consequences contains some significant revisions. In EEAC’s view, the revisions generally provide considerable clarity and add helpful information.
The key points in this discussion are that (1) “it is important to distinguish between evidence that is directly relevant to validity and evidence that may inform decisions about social policy but falls outside the realm of validity,” and (2) “[a]lthough information about the consequences of testing may influence decisions about test use, such consequences do not in and of themselves detract from the validity of intended interpretations of the test scores.” As revised, the draft generally explains and illustrates these points much better than the 1999 Standards did, and EEAC supports the revisions, with one key exception.
As we pointed out in our comments on the 1999 Standards, EEAC is extremely concerned with the adoption of the legal concept of “adverse impact” in the Standards. This legal theory, which not incidentally addresses test consequences, has no place in a technical standards document. While the phrase “differential impact” was removed from earlier drafts of the 1999 Standards, as we recommended, the concept remains in the use of the phrase “unintended consequences.”

As employers, EEAC’s member companies have a strong commitment to avoiding both intentional and unintentional discrimination in employment on the basis of irrelevant characteristics. As employers subject to federal, state and local antidiscrimination laws, and as federal contractors subject to contractual pledges to ensure equal opportunity and take affirmative action, EEAC’s members are subject to various legal and contractual requirements for monitoring their selection procedures to identify areas in which those procedures may have a disparate impact on individuals who are members of a protected class.

Accordingly, the issue of whether the use of a particular element of a selection procedure, such as a test, or particular tests within a test battery, disproportionately selects one group over another is a matter of extreme importance to EEAC’s members. When this occurs, EEAC’s members attempt to ensure that the selection procedure meets appropriate legal requirements such as those embodied in Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq., as amended.

Such considerations, while extremely important to employers, are simply out of place in a technical standards document, unless and until research identifies a technical basis for their occurrence. Thus, we agree with the statement at lines 468-470 that “If a test covers most of the relevant content domain but omits some areas, the content coverage might be judged adequate for some purposes.”
The next statement, however, at lines 470-474, that “if it is found that excluding some components has a noticeable impact on selection rates for groups of interest (e.g., subgroup differences are found to be smaller on excluded components than on included components), the intended interpretation of test scores as predicting job performance in a comparable manner for all groups of applicants would again be rendered invalid,” ventures too far into the realm of legal requirements. Moreover, the statement does not follow logically from the previous example (as the word “again” would suggest) and also contradicts the key principle that test consequences do not in and of themselves detract from the validity of intended interpretations of the test scores. EEAC strongly recommends that this sentence be deleted.
Standard 1.2
Lines 564-566

As noted previously, EEAC strongly supports the draft’s retention of the critical discussion of validity generalization. The parallel endorsement here should be retained as well.

Chapter 3. Fairness in Testing
Lines 1-655
EEAC supports the Joint Committee’s decision to combine discussions of fairness and test modifications into a single chapter.

EEAC agrees with the Joint Committee’s decision to combine the formerly separate chapters on Fairness in Testing and Test Use, Testing Individuals of Diverse Linguistic Backgrounds, and Testing Individuals with Disabilities into a single chapter. For professionals as well as end users, the concepts embodied in these chapters are so interrelated as to cause confusion and potential conflict when treated separately. At the same time, EEAC strongly urges the Joint Committee to consider and maintain, throughout this Chapter, the appropriate balance between the needs of the test taker and the needs of the organization sponsoring the employment test.

The draft’s definition of “fairness” is succinct and helpful.

The draft states that, “A test that is fair does not unduly advantage or disadvantage certain test takers because of individual characteristics that are irrelevant to the construct being measured.” For employers who are the end users of employment selection tests, it is critical that the tests they depend upon are designed and constructed, as much as possible, to minimize construct-irrelevant individual characteristics. Accordingly, the Standards’ emphasis on this point is helpful and should be maintained.

The draft properly acknowledges that test modification affects validity, but inappropriately suggests that significant alterations to standardized procedures, or even to the construct to be measured, are required.

The draft makes the following statement: “At the same time, being sensitive to the needs of individuals from relevant subgroups raises an important tension in achieving validity and fairness: on the one hand, standardized procedures traditionally have been fundamental to achieving comparability of score meaning from one individual to the next, a foundational principle of test validity and
fairness. However, responding to the needs of some individuals and subgroups may require some alteration to standardized procedures or even to the construct to be measured. These issues are expanded upon in the sections below.”

In so doing, the draft properly recognizes that standardized procedures are critical to both validity and fairness. We respectfully suggest that this principle should guide the rest of the chapter, at least as it pertains to employment.

Contrary to this principle, however, the draft goes on to state that modifying standardized test procedures or even the construct to be measured may be required in order to meet the needs of some test-takers. This assertion is inappropriate on several levels, at least in the employment context. First, such a broad statement is likely to mislead the lay reader to believe that testing professionals condone setting aside standardized procedures, validity principles, and even the construct being measured, in favor of making modifications for some individuals. Second, this language could, and likely will, be used by lawyers representing plaintiffs in employment cases to argue that an employer failed to follow the applicable technical standards by not making significant modifications.

EEAC strongly recommends that the final Standards recognize and acknowledge the substantial differences between the purposes of accommodations in employment testing and those in other settings, e.g., educational and diagnostic testing. Moreover, employers are subject to Title I of the Americans with Disabilities Act, 42 U.S.C. § 12101 et seq., including specific requirements regarding testing accommodations. Ideally, as we said in our comments on the 1999 Standards, EEAC strongly recommends that the final Standards simply urge full compliance with the ADA and otherwise exempt employment testing applications entirely from any standards or discussion on disability-related accommodations.
The test developer should offer guidance to the test user on making accommodations.

The section on “Fairness in Treatment During the Testing Process” states that “Although standardization has been a fundamental principle . . . sometimes flexibility is necessary to provide essentially equivalent opportunities for some test takers.” The draft goes on to state that accommodations themselves should be standardized. As we mentioned in our comments on the 1999 Standards, EEAC remains skeptical that standardized accommodations are even possible, much less lawful. EEAC recommends that, at least for tests designed to be used in the employment context, the Standards require the test developer to provide the user with considerable guidance on how to make accommodations, with emphasis on methods for making accommodations without jeopardizing validity.
Further, the draft goes on to make the important point, drawn from the 1999 Standards, that accommodations may be required by law in some situations and prohibited in others. This significant point should be retained in the final version.

In addition, the statement in the 1999 Standards that “In all cases, standardized procedures should be followed for all examinees unless explicit, documented accommodations have been made” has been replaced with the very different statement that “Where testing accommodations are necessary and permitted, they should be carefully documented by the test developer and adhered to by the test administrators.” EEAC recommends that the statement from the 1999 Standards be included in the revised version as well.

Preserving alternative methods for investigating possibilities of measurement bias is particularly important in the employment context.

At the outset, EEAC notes, as we did in our comments on the 1999 Standards, that in employment testing, there is little reason to suspect a well-constructed measure just because there is a mean difference among groups. Guion, R.M. (1998). Assessment, measurement and prediction for personnel decisions. A mean group difference in test scores alone should not trigger the need for an investigation of bias.

Moreover, as we mentioned in our comments on Chapter 1: Validity, that Chapter correctly points out that differential item functioning is “not always a flaw or weakness.” EEAC recommends that this acknowledgement also be included in this Chapter.

That said, EEAC agrees with the statement that “Although it is important to guard against the possibility of measurement bias for all relevant subgroups in the intended test population, it may not be feasible to fully investigate all possibilities,” assuming that it means that while test developers should guard against construct-irrelevant variations among subgroups, an exhaustive investigation may not be possible.
In our view, this concern is particularly applicable in the employment context. We recommend that this statement, followed by the clause, “particularly in the employment context,” be included in the final Standards.

Further, EEAC agrees that small sample size may be one of the reasons that such investigation would not be feasible, and that prior research, construct-based rationale and/or data from similar tests may be used to address concerns of bias in measurement. We recommend that this discussion be included in the final Standards.

The acknowledgment that certain accessibility concerns are more prominent in educational and psychological testing is significant and should be retained.

EEAC notes and appreciates the statement that “The dictate that all intended test takers have a full opportunity to demonstrate their standing on the construct being
measured has given rise, particularly in educational and psychological testing, to concerns for accessibility in testing.” This language properly reflects our concern that the Standards demonstrate recognition that there is a significant difference between the purpose of employment testing and that of educational or psychological testing. We recommend that this proviso be included in the final Standards.

The draft properly excludes any discussion of equality of outcome.

The current draft contains the critical statement that “In closing this section on the meanings of fairness, note that the Standards’ measurement perspective explicitly excludes one common view of fairness in public discourse: fairness as the equality of testing outcomes for test taker subgroups defined by race, ethnicity, gender, disability, or other characteristics.” Indeed, it would be wholly inappropriate for a technical standards document to require equality of outcome, or even to imply that equality of outcome is a proper goal, particularly in the employment context. EEAC further endorses and appreciates the concluding statement that “group differences in outcomes do not in themselves indicate that a testing application is biased or unfair.” We strongly recommend that both of these statements be included in the final draft.

The draft properly exempts employment testing from the discussion of “differentially interesting” material.

The draft states that “Except in employment testing where relevance to the job is the primary concern, material that is likely to be differentially interesting should be balanced to appeal broadly to the targeted testing population.” The employment testing exception to this statement is critical and should be retained in the final version.

If non-standardized procedures are to be used, the test developer should be required to provide substantial guidance for the end user.
In the discussion of Construct Irrelevant Components of Test Content, the draft states that one way of making test content more accessible would be by “providing extended administration time when speed is not relevant to the construct being measured. . . .” As noted earlier in these comments, as a representative of end users, EEAC is extremely concerned with the potential impact of non-standardized procedures not only on the validity of the inferences to be drawn from test scores, but also on the usability of scores obtained under non-standardized conditions. EEAC strongly recommends that where non-standardized procedures are contemplated, test developers be required to provide substantial guidance for the end user on both of these issues. Indeed, where non-standardized procedures would compromise validity, reliability, or usability, the test materials should discourage their use and explain the reasons for doing so.

Lines 364-66, 469-484

Language and vocabulary specific to workplace and employment testing is properly excluded from a general discussion of concerns over language proficiency.

In a discussion on language issues in testing, the draft properly cautions, “Note that this concern may not apply when the construct of interest is defined as a particular kind of language proficiency (e.g., academic language of the kind found in text books, language and vocabulary specific to workplace and employment testing).” This exclusion is correct and should be retained in the final version.

The Standards should require test developers to maintain comparability.

The discussion of Test Accommodations: Comparable Measures of the Intended Construct properly acknowledges that comparability of scores is “the defining feature of test accommodations.” If test scores obtained under accommodated conditions are not comparable to those obtained under standardized conditions, the usability of the test itself will be compromised severely. Once again, EEAC strongly recommends that the final Standards require test developers to provide substantial guidance for test users regarding making accommodations without jeopardizing either validity or utility. In particular, as noted at lines 498-501, any adaptation of the test offered as an accommodation should, and indeed must, maintain the intended construct measured by the test for the scores to be considered comparable.

A test modification that changes the intended construct yields a different test.

The discussion of Test Modifications: Non-Comparable Measures of the Intended Construct properly acknowledges that if a test modification results in changing the intended construct, the result is, in fact, a different test. EEAC recommends that this discussion be included in the final version.
The Standards should maintain that in the employment setting, modifications that would alter the constructs being measured are not appropriate.

The discussion of Appropriate Use of Accommodations or Modifications properly notes that “Depending on the construct to be measured and test purpose, there are some testing situations where accommodations are not needed or modifications are not appropriate.” It notes that “English language skills or a disability may, in fact, be directly relevant to the focal construct” and gives the following crucial example:

For example, in employment testing, it would be inappropriate to make changes to the test if the test is designed to assess essential skills required for the
job and the test changes would fundamentally alter the constructs being measured; a customer service work sample for a job that requires fluent communications in English would not be translated into another language.

This critical point, and the example, should be retained in the final version.

EEAC further agrees with the statement that “Professional judgment necessarily plays a substantial role in decisions about changes to the test or testing situation.” Once again, EEAC strongly recommends that the final Standards require test developers to provide substantial guidance with respect to potential changes to the test or testing situation. At least in the employment context, where such changes would compromise validity, reliability, or usability, the test materials should discourage their use and explain the reasons for doing so.

The final Standards should require test developers to provide procedures and guidance for using the results of altered tests.

The draft properly acknowledges the important contextual differences between the individualized use of tests, and group or large-scale testing, such as that done in the employment context. The draft also notes appropriately that “alteration should not put those taking an altered test at an undue advantage over those tested under regular conditions.”

Further, EEAC strongly agrees with the statement that “There is a need for clear, well documented procedures for making decisions about the selection and use of alterations for the assessment. Decision makers should be knowledgeable about the effects of the characteristics of the individual and the alterations on performance and should seek psychometric advice when necessary.” The final Standards should retain this language, requiring test developers to supply such procedures with ample guidance for the test user.

Test developers should design tests in a way that minimizes or even eliminates any need to modify either the test or testing conditions.
The discussion on draft Standard 3.1 highlights the need to address concerns of construct-irrelevant barriers as part of the design process. In particular, EEAC strongly supports the statement that “The test design process should focus on minimizing the need to change the test or testing conditions to accommodate the characteristics of specific individuals that may interfere with the construct that the assessment is measuring.” This statement should be included in the final Standards.

Language proficiency concerns should be addressed at the design stage.
Lines 710-711, 718-721, 747-757
EEAC fully agrees that linguistic concerns are best addressed during the design process. In particular, EEAC agrees with the statement in the commentary that “The level of language proficiency required by the test should be kept to the minimum required to meet work and credentialing requirements and/or to represent the target construct(s). In work situations, the modality in which language proficiency is assessed should be comparable to that required on the job.” In the employment context, employers who use selection tests seek instruments that will test the target construct. These tests should be designed intentionally to minimize language proficiency effects that are construct-irrelevant.

Test developers should address the appropriateness of tests for different subgroups at the design stage and provide relevant information to the test user.

EEAC strongly agrees that test developers should evaluate the appropriateness of tests for individuals from different subgroups at the design stage and seek to eliminate any problems they find.

Subgroup analysis may be delayed until sufficient data is available.

EEAC strongly agrees that where sample sizes initially do not permit subgroup analysis, operational results may be collected and maintained so that the analysis may be performed once sufficient data is available.

The technical manual should document the test developer’s efforts to minimize construct irrelevant barriers and provide administration, scoring and reporting procedures.

EEAC strongly agrees that in the technical manual, test developers should document how construct irrelevant barriers were minimized in the test development process, report any studies done regarding reliability, validity, and comparability of scores, and provide test administration and test scoring and reporting procedures. This discussion should be included in the final version of the Standards.
The test developer must shoulder the responsibility to investigate possible causes of subgroup mean differences.

As we noted in our comments on the 1999 Standards, the term “subgroup mean differences” is another way of expressing the legal concept of “adverse impact” and as such does not belong in a technical standards document. That said, EEAC agrees with the statement that “Subgroup mean differences do not in and of themselves indicate lack of fairness.” EEAC also agrees that where subgroup mean differences occur, a test developer has the responsibility to identify possible