Tuesday, December 2, 2008

A response to a question about the possible uses of benchmark data

A recent comment on one of our prior posts raised some questions that I thought warranted at least a couple of posts. The questions concerned whether it is desirable to take formative and benchmark data that were originally collected to inform instruction and apply them to a different purpose, such as assigning grades or making retention decisions.

The reason that one might want to do this sort of thing is pretty clear. It’s an issue of efficiency. Why use all kinds of different tests for retention and grading when you already have all of this data available online? Even though it was intended to guide instruction, why not make use of benchmark data for a grade or a placement? The short answer to this question is that it is generally not a good idea to use a test for purposes other than the purposes for which it was designed. Benchmark assessments are designed to provide information that can be used to guide instruction. They are not designed to guide program placement decisions or to determine course grades.

Evaluating the possible use of benchmark data for program placement or for grading would require careful consideration of the validity of the data for these purposes. It is worth noting that the requirement to address the intended purpose of the assessment was not always a part of the process of establishing test validity. This may have contributed to the lasting tendency to ignore the question of whether a test is valid for purposes other than those it was intended to serve. There was a time when questions regarding test validity focused heavily on how well the test correlated with other measures to which it should be related. Sam Messick argued that validity discussions must consider the use of the test score. He indicated that consideration must be given both to the validity of the score as a basis for the intended action and to the consequences that result from the score’s use for that purpose. Today the Standards for Educational and Psychological Testing, published jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), include the notion that the use of a test score, and the consequences of that use, both intended and otherwise, must be considered in evaluating validity.

Applying this notion to the use of benchmark and formative data for grades, the question that would need to be asked is whether the scores that are produced actually serve the intended purpose of the score on the report card. Report card grades are typically designed as a “bottom line measure” of the best information available on a student’s knowledge and skills related to a given body of instructional content following a defined period of instruction. This sort of summative function will, in many cases, dictate a different selection of standards and items than one would choose if the goal were to provide a score intended to guide subsequent instruction. For example, a test designed to guide instruction might include a number of questions on a concept that had been introduced prior to test administration but was intended to be a focus of later lessons. Students would be quite justified in considering this unfair on a test used for a grade. Conversely, restricting the test to content that has already been fully taught would limit its usefulness as a guide for instruction.

The case against using benchmark assessments as program placement tests is particularly clear. As the author of the comment rightly points out, retention is a high-stakes decision. The consequences for a student are serious. Moreover, the consequences may not be perceived by students and parents as being beneficial. For example, a student who has learned that he or she is being retained in grade based on benchmark test results may not be happy about that fact. Because grade retention is a high-stakes issue, it would be reasonable for the student’s parents to question the validity of the placement decision. Benchmark tests are designed to provide valid and reliable data on those standards that are the focus of instruction during a particular time period. They are not intended to provide an all-encompassing assessment of the student’s mastery of all the standards that students are expected to know by the end of the school year. Moreover, validation of benchmark assessments does not generally include evidence that benchmark failure one year is very likely to indicate failure the following year. Thus, the parent would have good reason to challenge the placement decision.

1 comment:

Michael Verola Ph.D. said...

I love using the Galileo tool, because it allows me to chart growth in my 3- to 4-year-old pre-K children. Michael Verola, Ph.D.