© 2006 Hardin Daniel and Betsy Wheeler

The Uses of Benchmark Tests to Improve Student Learning

Hardin Daniel and Betsy Wheeler

ThinkLink Learning/Discovery Education

The Uses of Benchmark Tests to Improve Student Learning

Michael K. Smith and Jacqueline Shrago

ThinkLink Learning/Discovery Education

The use of formative assessment to improve student learning has gained support in recent years from meta-analyses of research studies showing its effectiveness (Black & Wiliam, 1998; Black, Harrison, Lee, Marshall, & Wiliam, 2003) and theoretical discussions articulating its principles (Sadler, 1989). At the same time, there are ongoing disputes about the definition or definitions of formative assessments. Recently, the Council of Chief State School Officers formed a special interest group to sort out these various constructs. Benchmark tests can be considered an aspect of formative assessments. Shrago and Smith (2006; see also Smith, 2006) presented evidence that online benchmark tests dramatically improved student learning. Herman and Baker (2005) outlined six criteria for “Making Benchmark Testing Work.” According to Herman and Baker, these six criteria should apply to both commercial and school-produced benchmark tests.

ThinkLink Learning/Discovery Education has pioneered the use of high-quality benchmark testing to help schools improve student learning as defined by state content standards and as measured by state assessments. This article describes how ThinkLink benchmark tests address each of the six criteria outlined by Herman and Baker. Furthermore, this article outlines how high-quality, commercially developed benchmark tests can help schools improve student learning and address accountability issues raised by such laws as the No Child Left Behind Act.

Alignment

Herman and Baker suggest, “Unless benchmark tests reflect state standards and assessments, their results tell us little about whether students are making adequate progress toward achieving the standards and performing well on the assessment.” They also caution against benchmark tests that “mimic the content and format of annual state tests” without also addressing state standards.

ThinkLink Learning/Discovery Education benchmark tests are aligned with state standards. Benchmark test development begins by examining the curriculum standards in each state, particularly the larger objectives articulated in these standards. Then the standards that are most frequently assessed are also examined. The resulting benchmark tests are a synthesis of the important state standards that are listed in curriculum standards and measured on the state’s assessment program.

For benchmark tests to be helpful to teachers, they must provide useful, timely information that can be utilized in a reasonable amount of time. Benchmark tests must not interfere with instruction. At the same time, benchmark tests must truly assess the most crucial skills. All ThinkLink/Discovery benchmark tests are designed to be administered within a single class period. Results are provided immediately for online users and within two weeks for students who take paper-based versions of these tests.

ThinkLink/Discovery benchmark tests are completely aligned with state standards and assessments in the following states: Alabama, Florida, Illinois, Indiana, Kentucky, Michigan, New York, North Carolina, Ohio, Tennessee, Virginia, and Wisconsin. ThinkLink/Discovery also produces benchmark assessments that blend state criterion-referenced standards and norm-referenced standards in the following states: Arkansas, Colorado, Mississippi, Missouri, New Mexico, Oklahoma, South Carolina, and West Virginia.

One goal of ThinkLink/Discovery is the development of high-quality benchmark tests aligned to the state standards in all states.

Mapping Content

Herman and Baker suggest that “The alternative to aligning benchmark tests with the specific content and format of state assessments is to align them with priority content and performance expectations implicit in state standards.” This has not been necessary in ThinkLink/Discovery’s approach for the 12 states that are fully aligned. It is, however, precisely the approach that has been used for the “blended tests” available in the other 8 states described above.

Herman and Baker also suggest that if schools did develop their own benchmark tests, these tests could map “standards in terms of specific content knowledge that students need to acquire” such as the “four cognitive levels suggested by Norman Webb (1997).” These four levels include recall, conceptual understanding, problem solving, and extended and strategic thinking. While this approach can be useful for determining long-term curriculum changes, teachers generally want a comparison against current state-level standards so that they can adequately teach to those standards.

Since the inception of ThinkLink Learning in 2000, ThinkLink/Discovery benchmark tests have addressed levels of cognitive complexity such as those articulated by Bloom (Bloom et al., 1956; Bloom et al., 1971; Anderson et al., 2001). Furthermore, levels of understanding and depths of knowledge articulated by specific state standards have also been addressed (see, for instance, information on Kentucky benchmark tests). During the 2006-2007 school year, ThinkLink/Discovery won the contract to develop the benchmark testing program for the Milwaukee City Schools grades 3-9 in the subject areas of reading and mathematics. All of these assessments are aligned with the principles articulated by Webb.

Focusing on Big Ideas

Herman and Baker note, “By incorporating the key principles that underlie state or district standards into benchmark assessments, educators have a reasonable strategy for addressing the breadth of these standards.” To incorporate this suggestion into benchmark testing, the definition of “key principles” or “big ideas” must be addressed.

ThinkLink/Discovery benchmark tests have always focused on the key principles articulated in state standards and measured by a state’s assessment program. The benchmark tests are designed to provide a snapshot of student performance on these key principles. When the “key principles” have been articulated by a specific school district, ThinkLink/Discovery has helped that district develop benchmark tests to address its specific priorities (see information on Jefferson County Public Schools in Kentucky, for instance).

More recently, these “key principles” for mathematics and science have been articulated by national organizations and research groups, in documents such as the National Council of Teachers of Mathematics’ Curriculum Focal Points for Prekindergarten through Grade 8 Mathematics (NCTM, 2006) and the National Research Council’s Taking Science to School: Learning and Teaching Science in Grades K-8 (Duschl, Schweingruber, & Shouse, 2006).

Herman and Baker also suggest, “Despite the ease of scoring multiple-choice items, benchmark tests should employ many different formats to enable students to reveal the depth of their understanding.” Currently, ThinkLink/Discovery benchmark tests are strictly multiple-choice. This is primarily a result of the financial limitations that schools and districts face in paying for the scoring of non-multiple-choice items. On the other hand, practice items for these benchmark tests have always included open-ended items with sample scoring rubrics and student examples when appropriate. The practice pool allows teachers to help students deepen their understanding of key concepts through practice in a variety of formats.

Diagnostic Value

Herman and Baker note, “A test has diagnostic value to the extent that it provides useful feedback for instructional planning for individuals and groups. A test with high diagnostic value will tell us not only whether students are performing well but also why students are performing at certain levels and what to do about it.”

ThinkLink/Discovery benchmark tests provide detailed feedback on student performance in a format that is easily used by teachers, parents, and even students. Further, these benchmark tests are released for teacher and student use after administration. Unlike most standardized tests, these benchmark test questions can be examined by teachers and students. Therefore, the diagnostic value is increased as students and teachers discuss correct and incorrect responses.

Feedback reports provide a variety of diagnostic information. Diagnostic information is summarized for every state objective and subskill at the class and student level. Teachers use the test items and the reports to provide immediate feedback to students and determine the specific concept within a subskill that students are not mastering. The unique advantage of the ThinkLink/Discovery approach is that parents, students, and teachers become colleagues in analyzing test items. Self-assessment and peer assessment methods are used to understand why students selected certain correct and incorrect responses. Feedback analysis and utilization are demonstrated and supported through such services as professional development sessions, support telephone calls, and manuals, and collectively as classroom teachers, parents, and administrators examine the reports.
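As a rough illustration of how summaries by objective at the student and class level could be tallied, the sketch below computes percent correct per objective from item-level responses. The data layout, the function name `percent_correct_by_objective`, and the objective labels are hypothetical; they are not taken from ThinkLink/Discovery reports.

```python
from collections import defaultdict


def percent_correct_by_objective(responses, item_objective):
    """Summarize percent correct per objective for each student and for the class.

    responses: {student: {item_id: True/False}}  (hypothetical layout)
    item_objective: {item_id: objective label}   (hypothetical mapping)
    """
    # Each tally is a two-element list: [number correct, number attempted].
    student_totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    class_totals = defaultdict(lambda: [0, 0])

    for student, answers in responses.items():
        for item_id, correct in answers.items():
            objective = item_objective[item_id]
            for tally in (student_totals[student][objective], class_totals[objective]):
                tally[0] += int(correct)
                tally[1] += 1

    by_student = {
        student: {obj: 100.0 * c / n for obj, (c, n) in objectives.items()}
        for student, objectives in student_totals.items()
    }
    by_class = {obj: 100.0 * c / n for obj, (c, n) in class_totals.items()}
    return by_student, by_class
```

A teacher-facing report could then list the objectives with the lowest class percentages first, which is one simple way to surface the gaps discussed above.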

Finally, professional development activities constitute an integral part of the use of these benchmark tests. No professional development is needed to administer the tests. Teachers know how to do this. However, once the data is available (immediately for online testing or within a few days for paper administration), teachers often collaborate with a ThinkLink/Discovery professional to devise classroom teaching strategies to overcome the prominent gaps. This professional development is typically organized within a school during a class period, with one grade group team meeting together. This structure allows flexibility in embedding the professional development into the learning year and also fosters ongoing learning communities within the school. Professionals work together to examine what works and what doesn’t work. Such sharing continues after the professional development day in meetings with an instructional leader. Evaluations support the effectiveness of these professional development activities.

Fairness

As Herman and Baker suggest, “Fair benchmark tests provide an accurate assessment of diverse subgroups.” All ThinkLink/Discovery benchmark test items undergo standard fairness reviews to eliminate bias related to gender, race, and other categories. Teachers are encouraged to apply high-stakes accommodations to the formative area, including time allowances and individual testing. Large print options are provided at no cost for any student. If requested, ThinkLink/Discovery produces Braille tests and audio recordings. These accommodations are appropriate, but some schools find the costs to be burdensome. Schools that have invested in software such as Read&Write Gold can routinely adapt a ThinkLink/Discovery test to address most testing accommodations.

Technical Quality

Herman and Baker note, “Tests with high technical quality provide accurate and reliable information about student performance.” From its beginning, ThinkLink/Discovery has developed benchmark tests that address the Standards for Educational and Psychological Testing (1999). All benchmark tests have the following technical characteristics:

Content Validity: Subject matter experts align all tests with state standards.

Reliability: All tests have reliabilities of .85 or greater as measured by Cronbach’s alpha.

Criterion Validity: Correlations of scores on benchmark tests with scores on state tests provide evidence that these tests predict performance on key indices reported on state assessments.

Predictive Validity: Benchmark tests also predict proficiency levels reported on state tests.

Equated Tests: Scores on benchmark tests are equated across time periods.
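For readers unfamiliar with the reliability and criterion validity statistics named above, the following minimal sketch shows the standard textbook formulas: Cronbach’s alpha computed from item and total score variances, and the Pearson correlation between two sets of scores. The function names and data layout are illustrative assumptions and do not describe ThinkLink/Discovery’s actual analysis procedures.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (one row per student, one column per item).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two students")
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson correlation, e.g. between benchmark scores and state test scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Under this sketch, a test would meet the reliability threshold stated above when `cronbach_alpha(scores)` returns .85 or greater.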

ThinkLink/Discovery is working with the Milwaukee City School System to develop benchmark tests for grades 3-9 in reading and mathematics. These benchmark tests will be scored using the Rasch measurement model (Wright and Stone, 1979) and will be horizontally and vertically equated across all grades. This project is the first of its kind to use state-of-the-art item response theory measurement techniques to develop equated and linked benchmark tests across multiple grades.
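In the Rasch model, the probability of a correct response depends only on the difference between a student’s ability and an item’s difficulty, both expressed on the same logit scale. The sketch below shows that relationship in its standard dichotomous form; the function names and example values are illustrative assumptions, and the sketch does not cover the estimation or equating steps used in the Milwaukee project.

```python
from math import exp


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    P = exp(ability - difficulty) / (1 + exp(ability - difficulty)),
    with ability and difficulty both measured in logits.
    """
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


def expected_raw_score(ability, difficulties):
    """Expected number of items answered correctly on a test with these item difficulties."""
    return sum(rasch_probability(ability, b) for b in difficulties)
```

Because items from different grade-level tests can be placed on one difficulty scale, a single ability estimate can in principle be compared across tests, which is the idea behind the vertical equating mentioned above.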

Utility

Herman and Baker define utility as “the extent to which intended users find the test results meaningful and are able to use them to improve teaching and learning.” Furthermore, “To make benchmark tests useful, schools must put the results in intended users’ hands quickly and train them to interpret the information correctly.”

ThinkLink/Discovery benchmark tests provide timely reports. Results of online benchmark tests are available immediately. ThinkLink/Discovery scores and reports results within two weeks for paper-based benchmark assessments.

Professional development activities train teachers to interpret the information from benchmark tests. (See comments above on typical professional development incorporated into the ThinkLink/Discovery system.)

Feasibility

Herman and Baker note, “Benchmark testing should be worth the time and money that schools invest in it.” ThinkLink/Discovery has helped schools by developing benchmark tests that can be used by schools without the additional costs of development.

Finally, Herman and Baker suggest that to make benchmark testing worthwhile, “educators ultimately need to look at the results . . . . Are they actually improving student learning?” ThinkLink/Discovery routinely implements research studies that demonstrate how the use of these benchmark tests improves student performance on state assessments and enables schools to meet adequate yearly progress (AYP) criteria.

Final Comments

The use of formative assessments is being widely researched in many educational settings. The uses of benchmark tests by ThinkLink/Discovery offer one method for schools to quickly and efficiently administer benchmarks that are high-quality assessments of student learning. These assessments provide valuable information for teacher, parent, self, and peer feedback to facilitate student learning.

References

Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., Raths, J., and Wittrock, M. C. (Eds.) (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s Taxonomy of Educational Objectives. Abridged Edition. New York: Longman.

Black, P., and Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy, and Practice, 5(1), 7-73.

Black, P., Harrison, C., Lee, C., Marshall, B., and Wiliam, D. (2003). Assessment for learning: Putting it into practice. Maidenhead, UK: Open University Press.

Bloom, B. S. (Ed.), Engelhart, M. D., Furst, E. J., Hill, W. H., and Krathwohl, D. R. (1956). Taxonomy of educational objectives: Handbook I: Cognitive domain. New York: David McKay.

Bloom, B. S., Hastings, J. T., and Madaus, G. F. (1971). Handbook on formative and summative evaluation of student learning. New York: McGraw-Hill.

Duschl, R. A., Schweingruber, H. A., and Shouse, A. W. (Eds.) (2006). Taking science to school: Learning and teaching science in grades K-8. National Research Council. Washington, D.C.: National Academies Press.

Herman, J. L., and Baker, E. L. (2005). Making benchmark testing work. Educational Leadership, 63(3), 48-54.

National Council of Teachers of Mathematics. (2006). Curriculum focal points for prekindergarten through grade 8 mathematics: A quest for coherence. Reston, VA: NCTM.

Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119-144.

Shrago, J. B., and Smith, M. K. (2006). Online assessments in the K-12 classroom: A formative assessment model for improving student performance on standardized tests. In S. L. Howell and M. Hricko (Eds.), Online assessment and measurement: Case studies from higher education, K-12, and corporate (pp. 181-194). Hershey, PA: Information Science Publishing.

Smith, M. K. (2006). How can a large scale formative assessment be research-based and valid? Paper presented at the CCSSO conference, San Francisco, CA.

Standards for educational and psychological testing. (1999). Washington, D.C.: American Educational Research Association.

Webb, N. L. (1997). Criteria for alignment of expectations and assessments in mathematics and science education. Madison, WI: University of Wisconsin, National Institute for Science Education.

Wright, B. D., and Stone, M. H. (1979). Best test design. Chicago: MESA Press.

The following ThinkLink research articles provide more detailed information on the

technical aspects of these benchmark tests:

What is Predictive Assessment in Tennessee?

What is Predictive Assessment in Kentucky?

What is Predictive Assessment in Alabama?

What is Predictive Assessment in Florida?

What is Predictive Assessment in Illinois?

Improving Learning in Birmingham: A Controlled Group Comparison

Research with Multi-State Examples

Research example: IL

Research example: TN

What Research is Available on ThinkLink Learning?

How Can a Large Scale Formative Assessment Be Research-Based and Valid?

Peer Review:

ThinkLink’s tests and results have been incorporated and analyzed in the following peer-reviewed work:

Dr. Elizabeth Vaughn-Neely, Associate Professor, Dept. of Leadership & Counselor Education, Ole Miss University, eiv@olemiss.edu, 662-915-5771

Dr. Marjorie Reed, Associate Professor, Dept. of Psychology, Oregon State University

Peer-reviewed findings based on their joint research and work with schools using ThinkLink Learning were presented at:

Society for Research in Child Development, Atlanta, GA, April 2005

Kappa Delta Pi Conference, Orlando, FL, November 2005

Two dissertations for Ed.S. studies have been published:

Dr. Juanita Johnson, Union University, johnsonj4@k12tn.net

Dr. Monica Eversole, Richmond, KY, meversol@madison.k12.ky.us
