The last post discussed some of the purposes of assessment. It was written in response to a comment we received about the ways one might use assessment data and the wisdom of using an assessment for something it was not designed to accomplish. As we discussed, this must be approached with caution, since an assessment that is valid for one purpose might not be valid for another.
With this post I am going to use that question as a springboard for a broader discussion that will, I hope, address an issue our clients have raised with us. The sort of dynamic intervention system that we have been discussing in this blog and in other papers positions assessment as serving the purpose of informing instructional decision making. It exists to answer questions like the following: Who are the students whose test scores indicate that they still need more help on key skills? What are the specific areas on which the additional work should focus?
The topic I would like to focus on here is the larger picture into which these questions fit. How might the instructional questions of which children need assistance, and what that assistance should be, fit into the type of intervention system we are trying to introduce through this blog and the other writings we have posted?
It is rather interesting that in many discussions of curriculum implementation, intervention is treated as something additional, applied on top of normally planned instruction. It is intended to fix problems, not integrated into how things are done for everyone. We would like to discuss the possibility that it could be something different. Rather than a way to fix problems, intervention can be a different way to handle the implementation of a curriculum, one that puts districts in a position to be extremely agile and responsive to what the data say about the success of what has been planned. Rather than being separate and apart from the plan, intervening is part of the plan. A plan is developed and then implemented. Right from the start, data are collected that speak to whether things are working. Decisions are made, plans are revised, and then the cycle starts anew. The US Department of Education has defined educational intervention in this way in documents generated to assist schools in conducting evidence-based practice.
What does such an approach require? The first requirement is a system in which assessment data are immediately available. Ideally the system should indicate whether an individual lesson is working. These data should be available right away and tracked to determine whether what has been planned is succeeding. Similarly, the means of delivering the content should be flexible enough that plans can be readily changed. If a group of 4th grade students is partway through a set of lessons and still struggling with the skills those lessons target, the teacher should be able to modify the plan easily, without waiting weeks or months for a planned course of instruction to be completed. To make this kind of decision making possible, it is critical that the assessment data collected be not only timely but also valid. Questions must be written so that they cover the intended skills and minimize measurement error.
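To make those requirements a little more concrete, here is a minimal sketch of the kind of computation such a system might run after a lesson's quick check: given item-level results tagged to skills, flag the skills on which the group as a whole is still struggling, and the individual students who need more help on a given skill. The data layout, the function names, and the 70 percent mastery threshold are illustrative assumptions on my part, not a description of any particular product.

```python
from collections import defaultdict

# Hypothetical item-level results: (student_id, skill, correct).
# The layout, names, and 0.70 threshold are illustrative assumptions only.
MASTERY_THRESHOLD = 0.70


def skills_needing_reteaching(results, threshold=MASTERY_THRESHOLD):
    """Return {skill: group percent correct} for skills below the threshold."""
    correct, attempted = defaultdict(int), defaultdict(int)
    for _student_id, skill, is_correct in results:
        attempted[skill] += 1
        correct[skill] += int(is_correct)
    return {
        skill: correct[skill] / attempted[skill]
        for skill in attempted
        if correct[skill] / attempted[skill] < threshold
    }


def students_needing_help(results, skill, threshold=MASTERY_THRESHOLD):
    """Return the students scoring below the threshold on one skill."""
    correct, attempted = defaultdict(int), defaultdict(int)
    for student_id, item_skill, is_correct in results:
        if item_skill == skill:
            attempted[student_id] += 1
            correct[student_id] += int(is_correct)
    return [sid for sid in attempted
            if correct[sid] / attempted[sid] < threshold]


# A tiny example: three students, items tagged to two skills.
results = [
    ("s1", "fractions.compare", True), ("s1", "fractions.compare", False),
    ("s2", "fractions.compare", False), ("s2", "decimals.place_value", True),
    ("s3", "fractions.compare", True), ("s3", "decimals.place_value", True),
]
print(skills_needing_reteaching(results))                    # {'fractions.compare': 0.5}
print(students_needing_help(results, "fractions.compare"))   # ['s1', 's2']
```

The point of the sketch is simply the turnaround: as soon as a lesson's results are in, the teacher can see which skills and which students need the plan adjusted, rather than waiting for the end of a unit.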
It is our hope that the discussion in this blog, and in the forum that surrounds it, will surface many innovative possibilities for how intervention might look. How have interventions been applied in your district? What obstacles have come up in making them work?
Tuesday, December 2, 2008
A response to a question about the possible uses of benchmark data
A recent comment on one of our prior posts asked some questions that I thought warranted at least a couple of posts in reply. The questions concerned whether it is desirable to take formative and benchmark data that were originally collected as a means to inform instruction and apply them to a different purpose, such as the assignment of grades or retention decisions.
The reason that one might want to do this sort of thing is pretty clear. It’s an issue of efficiency. Why use all kinds of different tests for retention and grading when you already have all of this data available online? Even though it was intended to guide instruction, why not make use of benchmark data for a grade or a placement? The short answer to this question is that it is generally not a good idea to use a test for purposes other than the purposes for which it was designed. Benchmark assessments are designed to provide information that can be used to guide instruction. They are not designed to guide program placement decisions or to determine course grades.
Evaluating the possible use of benchmark data for program placement or for grading would require careful consideration of the validity of the data for those purposes. It is worth noting that the requirement to address the intended purpose of the assessment was not always part of the process of establishing test validity. This may have contributed to the lasting tendency to ignore the question of whether a test is valid for purposes other than those it was intended to serve. There was a time when questions regarding test validity focused heavily on how well the test correlated with other measures to which it should be related. Samuel Messick argued that validity discussions must consider the use of the test score. He indicated that consideration must be given both to the validity of the score as a basis for the intended action and to the consequences that result from the score's use for that purpose. Today the Standards for Educational and Psychological Testing, published jointly by AERA, APA, and the National Council on Measurement in Education (NCME), include the notion that the use of a test score, and the consequences of that use, both intended and otherwise, must be considered in evaluating validity.
Applying this notion to the use of benchmark and formative data for grades, the question that needs to be asked is whether the scores produced actually serve the intended purpose of the score on the report card. Report card grades are typically designed as a “bottom line” measure: the best available information on a student’s knowledge and skills related to a given body of instructional content following a defined period of instruction. This sort of summative function will, in many cases, dictate a different selection of standards and items than one would choose if the goal were to provide a score intended to guide subsequent instruction. For example, a test designed to guide instruction might include a number of questions on a concept that had only been introduced prior to test administration but was intended to be a focus of later lessons. Students would be quite justified in considering this unfair on a test used for a grade. Tilting the balance in the other direction would limit the usefulness of the test as a guide for instruction.
The case against using benchmark assessments as program placement tests is particularly clear. As the author of the comment rightly points out, retention is a high-stakes decision. The consequences for the student are extremely high, and students and parents may not perceive those consequences as beneficial. A student who learns that he or she is being retained in grade on the basis of benchmark test results, for example, may not be happy about that fact. Because grade retention is a high-stakes issue, it would be reasonable for the student’s parents to question the validity of the placement decision. Benchmark tests are designed to provide valid and reliable data on the standards that are the focus of instruction during a particular time period. They are not intended to provide an all-encompassing assessment of the student’s mastery of all the standards that students are expected to learn over the course of the school year. Moreover, validation of benchmark assessments does not generally include evidence that failure on a benchmark in one year is very likely to indicate failure the following year. Thus, the parent would have good reason to challenge the placement decision.