Tuesday, August 31, 2010

Adaptive Testing

While adaptive testing has been around for a long time, it has recently been gaining in popularity. Many have been looking at it as a magic bullet to meet the data-driven decision-making requirements of the modern accountability era. The approach can greatly shorten the time required to deliver an assessment with an acceptable level of measurement precision. While adaptive testing certainly has a number of potential benefits that earn it a place in the assessment bag of tools, it also has a number of limiting characteristics that have to be considered in selecting it for a particular use.

How does an adaptive test adapt? The approaches that are currently used fall into two basic categories based on whether adaptation decisions are made after every item or after a group of items. In the first case, a cumulative score is calculated each time a student completes an item. The next item is typically selected by finding the item that will provide the most information about the student’s ability, given the current score. Once the score has reached an adequate level of measurement precision, the assessment is ended. Typically this endpoint is reached after far fewer questions than would be required to achieve a comparable level of precision on a standard or “linear” test, thus making the approach extremely efficient.
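To make the item by item loop concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product: it assumes a Rasch (one-parameter) IRT model, a standard normal prior on ability, an ability estimate computed as a posterior mean over a grid, and a stopping rule based on a hypothetical target standard error. The names `run_adaptive_test`, `se_target`, and `max_items` are made up for this example.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability of a correct answer at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

# Ability grid from -4 to 4 in steps of 0.1 (an assumption of this sketch).
GRID = [g / 10.0 for g in range(-40, 41)]

def estimate_ability(responses):
    """Posterior mean and SD of ability, given [(difficulty, correct), ...]."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # standard normal prior (unnormalized)
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_adaptive_test(bank, answer, se_target=0.4, max_items=30):
    """Give one item at a time until the estimate is precise enough.

    bank: list of item difficulties (hypothetical item bank)
    answer: callable taking a difficulty and returning True if answered correctly
    """
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean and prior SD
    while remaining and len(responses) < max_items and se > se_target:
        # Select the unused item that is most informative at the current estimate.
        b = max(remaining, key=lambda d: item_information(theta, d))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, se = estimate_ability(responses)
    return theta, se, len(responses)
```

Calling `run_adaptive_test(bank, answer)` with a list of difficulties and a function that records the student's response returns the final ability estimate, its standard error, and the number of items actually administered. The last value is where the efficiency shows up: the loop stops as soon as the precision target is met rather than working through the whole bank.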

In the second case, decisions are made after administering preconfigured blocks of items rather than on an item by item basis. A typical design might specify a routing test followed by two stages, each of which involves a decision between pre-constructed item blocks optimized for lower, average, and high ability. In this design, students would complete the routing test and then be assigned to the appropriate second stage item block based on their score. After completing the second stage they would again be routed to the appropriate block of items in stage three based on their cumulative score on the routing test and the second stage. After completing the third stage the assessment would be ended and an overall score determined based on all the items completed.
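The routing logic in a design like this can be sketched in a few lines of Python. This is only an illustration: the cut points (`low_cut`, `high_cut`), the function names, and the use of cumulative proportion correct as the routing score are all assumptions made for the example. Because students on different paths see items of different difficulty, raw proportion correct is not comparable across paths, so an operational design would typically route and score on an IRT-based estimate instead.

```python
def route(cumulative_correct, cumulative_total, low_cut=0.4, high_cut=0.7):
    """Choose a block label from the cumulative proportion correct.

    The cut points are hypothetical values chosen for illustration.
    """
    proportion = cumulative_correct / cumulative_total
    if proportion < low_cut:
        return "low"
    if proportion >= high_cut:
        return "high"
    return "average"

def run_multistage_test(routing_block, stage_blocks, answer):
    """Administer a routing block, then one routed block per later stage.

    routing_block: list of item ids given to every student
    stage_blocks: one dict per later stage, mapping "low", "average", and
        "high" to pre-constructed lists of item ids
    answer: callable taking an item id and returning True if answered correctly
    """
    correct = 0
    total = 0
    path = []

    # Every student takes the same routing test.
    for item in routing_block:
        correct += 1 if answer(item) else 0
        total += 1

    # Each later stage is chosen from the cumulative score so far.
    for blocks in stage_blocks:
        label = route(correct, total)
        path.append(label)
        for item in blocks[label]:
            correct += 1 if answer(item) else 0
            total += 1

    return path, correct, total
```

The returned `path` records which block the student was routed to at each stage, which is the piece that makes the multi-stage design predictable: the set of possible paths, and therefore the content each student can see, is fixed before anyone sits down to test.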

What are the benefits of adaptive testing? One of the most notable is its efficiency. This is particularly true with the item by item approach. As I mentioned, an adequately precise measure of a student’s ability may typically be achieved with far fewer items than would be required for a typical “linear” test. This efficiency can make the approach far easier to use when large groups of students must be tested in a short time.

What considerations go along with the benefits? Two of the most fundamental are content control and item exposure. Both are controlled to some degree by the multi-stage design. The multi-stage design is not quite as efficient as the item by item design. However, in those cases in which content control and/or item exposure are important considerations, the multi-stage approach may be preferred.

Content control is especially important in standards-based education. While the item by item approach is extremely efficient, it may compromise the ability to ensure that all students are exposed to content reflecting the standards targeted for instruction. Johnny, Suzy, and Javier may all sit down together and not be exposed to any of the same items. Suzy may have to multiply fractions while Johnny and Javier don’t. This may not be cause for concern if the test is being used for purposes of screening, but it may limit the utility of the measure if the results are being used for very specific decisions about student instructional needs. For instance, if there are decisions to be made about providing instructional coaching on multiplication of fractions for Johnny and Javier, it would likely be most informative if they had actually been asked to respond to the questions that Suzy was given. While Item Response Theory (IRT) may be used to estimate the probability that students have mastered these skills even if they haven’t been given the questions, actual administration of the items affords the additional benefit of allowing mistakes to be analyzed. The pre-constructed design of the multi-stage approach affords control of this issue at the expense of some efficiency.

Item exposure is particularly important when item security is a concern. The focus of the item selection algorithms on picking the optimal item for a given ability level can mean that the same limited set of items comes to the top of the heap repeatedly. This is particularly true for the item by item approach. This issue can limit the utility of the assessment for repeated use because students are likely to see the same items again and again. In high stakes situations, the door is open for cheating because the items that are likely to show up become predictable. As with the content consideration, the multi-stage approach provides some control over this issue at the expense of efficiency.
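A small simulation can show how this concentration arises. The sketch below is deliberately simplified and everything in it is an assumption for illustration: a Rasch model, a hypothetical bank of item difficulties, a fixed test length, and a crude "staircase" score update (move up after a correct answer, down after an incorrect one, halving the step each time) standing in for a real ability estimate. Under the Rasch model the most informative item is the one whose difficulty is closest to the current estimate, so that is the selection rule used here.

```python
import math
import random
from collections import Counter

def p_correct(theta, b):
    """Rasch model: probability of a correct answer at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate_exposure(bank, abilities, test_length=5, seed=0):
    """Return the share of simulated students who saw each item in the bank.

    bank: list of item difficulties (hypothetical)
    abilities: list of true abilities, one per simulated student
    """
    rng = random.Random(seed)
    counts = Counter()
    for true_theta in abilities:
        theta, step = 0.0, 1.0  # every student starts at the same estimate
        remaining = list(bank)
        for _ in range(test_length):
            # Most informative Rasch item: difficulty closest to the estimate.
            b = min(remaining, key=lambda d: abs(d - theta))
            remaining.remove(b)
            counts[b] += 1
            correct = rng.random() < p_correct(true_theta, b)
            # Simplified staircase update, not a real IRT estimate.
            theta += step if correct else -step
            step /= 2
    return {b: counts[b] / len(abilities) for b in bank}
```

Because every simulated student starts from the same estimate, the item nearest that starting point is seen by all of them, while items far from any reachable estimate are never used. Sorting the returned dictionary by exposure rate makes the "top of the heap" visible.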

The combination of strengths and unique considerations involved with adaptive testing points to some particular uses for which it would be a strong choice. One of the most notable is screening. The efficiency with which an adaptive test can provide a reliable measure of ability makes it a good choice when the issue at hand is deciding whether a student should enter into a particular instructional group or class. The degree of content control provided by a multi-stage approach could provide more granular information when the decision at hand is whether a child should be given additional help on a specific skill or set of skills.

ATI is currently working on a module to support adaptive testing. The module will support both the item by item and multi-stage approaches. Provision of these options will allow districts to take advantage of the strengths provided by adaptive testing when they are designing an assessment program that will best meet their needs.
