ATI Town Hall Blog: Raw scores are not what they seem

The score reports from state testing have arrived. Your class has scored an average of 680 in math and 700 in reading. Anyone who has been involved in education, whether as a teacher, a student, or a student’s parent, has seen testing reports that include scores like these. What do these scores mean? Why are all the decisions based on state tests such as AIMS or MCAS made using scores like these instead of something more straightforward, such as percent correct? Why not simply tell students that they got 85% correct?

Unfortunately, scores like percent correct aren’t as straightforward as they seem. Say, for instance, that a new math curriculum has been implemented in the 5th grade. It’s very reasonable to want to track its effectiveness by looking at scores on district tests. What would we make of an observation that the average score on district exams was 65% correct before the new curriculum was implemented and 85% correct afterward? It would be very tempting to conclude that the new approach was a success. However, the two exams are quite likely not equivalent in difficulty. The difference in scores may be entirely the result of easier test questions rather than more skilled students.
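To make the point concrete, here is a minimal Python sketch that assumes the simplest IRT model, the Rasch (one-parameter logistic) model, in which the chance of a correct answer depends only on the gap between a student’s ability and an item’s difficulty, both measured in logits. The two exams and their item difficulties are hypothetical. The same student, with unchanged ability, is expected to earn very different percent-correct scores on the two exams.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability that a student of ability theta answers
    an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_percent_correct(theta: float, difficulties: list[float]) -> float:
    """Expected percent correct for one student across a whole test."""
    return 100.0 * sum(p_correct(theta, b) for b in difficulties) / len(difficulties)


# Hypothetical item difficulties in logits; the "after" exam is easier on average.
exam_before = [-0.5, 0.0, 0.5, 1.0, 1.5]
exam_after = [-2.0, -1.5, -1.0, -0.5, 0.0]

theta = 0.0  # the SAME student ability on both occasions
before = expected_percent_correct(theta, exam_before)
after = expected_percent_correct(theta, exam_after)
print(f"before: {before:.0f}% correct, after: {after:.0f}% correct")
# prints: before: 39% correct, after: 71% correct
```

Nothing about the student changed between the two administrations; only the items did.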

The scoring approach applied to statewide testing has an answer to this problem. Wrapped up in the complicated-sounding label of Item Response Theory (IRT) is a technique for analyzing test results that makes it simpler to answer questions such as whether there is a measurable change in learning from one year to the next. Item difficulty is taken into account, so that 90% correct on a harder test results in a higher score than 90% correct on an easier test. Difficulty can also be accounted for in a way that places scores from two different tests on the same scale, making them directly comparable. When this is done, one can address the question of just how effective that new curriculum actually is by comparing test scores directly. Otherwise, you don’t really know what you have.
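The following sketch, again assuming a Rasch model and hypothetical item difficulties, estimates ability by maximum likelihood from a number-correct score. It is not a description of how AIMS, MCAS, or Galileo compute their scores; operational programs may use other IRT models and typically convert ability estimates into reported scale scores such as 680.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_ability(num_correct: int, difficulties: list[float], tol: float = 1e-8) -> float:
    """Maximum-likelihood ability (theta) for a number-correct score.

    Under the Rasch model the number correct is a sufficient statistic, so the
    MLE solves sum_i P(theta, b_i) = num_correct; Newton-Raphson finds the root.
    """
    n = len(difficulties)
    if not 0 < num_correct < n:
        raise ValueError("the MLE does not exist for zero or perfect scores")
    # Start from the log-odds of the raw proportion, centred on the mean difficulty.
    theta = sum(difficulties) / n + math.log(num_correct / (n - num_correct))
    for _ in range(100):
        probs = [p_correct(theta, b) for b in difficulties]
        expected = sum(probs)                            # expected number correct at theta
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        step = (num_correct - expected) / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical 10-item test; the harder form is the easier form shifted up by 2 logits.
base = [-1.0, -0.75, -0.5, -0.25, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0]
easier_form = [b - 1.0 for b in base]
harder_form = [b + 1.0 for b in base]

# 9 of 10 correct (90%) on each form.
theta_easy = rasch_ability(9, easier_form)
theta_hard = rasch_ability(9, harder_form)
print(f"90% on easier form: theta = {theta_easy:.2f}")
print(f"90% on harder form: theta = {theta_hard:.2f}")
# Every item was shifted by the same 2 logits, so the estimates differ by exactly 2.
```

Both students answered 9 of 10 items correctly, yet the one who took the harder form receives the higher ability estimate. Here the two forms differ by a known, uniform shift, so their difficulties are already on one scale; with real test forms, that common scale has to be established through linking, for example with anchor items shared by both tests.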

In addition to providing information about difficulty and making it possible to place scores on the same scale, IRT also provides information about which skills children likely need to master first in order to progress to the next level. Imagine that results from that 5th grade math test indicated that students were struggling with probability and fractions. IRT offers a way of examining performance that uses a student’s results on all the other items on a test to estimate the likelihood that the student will succeed on a given skill. This means that all the available information can be brought to bear on the question of what should be planned next.
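A rough sketch of this idea follows, under the same Rasch assumption. It combines a standard normal prior on ability with a student’s responses to every other item, then averages the chance of success on a target fractions item over the resulting posterior. The item names, difficulties, and prior are hypothetical choices for illustration, and this posterior-predictive calculation is one possible approach, not necessarily the one used in Galileo.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def predict_success(responses: dict[str, int], difficulties: dict[str, float], target: str) -> float:
    """Probability of answering `target` correctly, given responses to the other items.

    Ability is integrated over a grid (posterior predictive), using a standard
    normal prior on theta and the Rasch likelihood of every observed response.
    """
    if target in responses:
        raise ValueError("the target item should not be among the observed responses")
    grid = [-4.0 + 0.05 * k for k in range(161)]  # theta from -4 to 4
    posterior = []
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # N(0, 1) prior, unnormalized
        for item, correct in responses.items():
            p = p_correct(theta, difficulties[item])
            weight *= p if correct else 1.0 - p
        posterior.append(weight)
    total = sum(posterior)
    b_target = difficulties[target]
    return sum(w * p_correct(t, b_target) for t, w in zip(grid, posterior)) / total


# Hypothetical item bank (difficulties in logits).
difficulties = {
    "whole_numbers": -1.5,
    "decimals": -0.5,
    "geometry": 0.0,
    "probability": 0.8,
    "fractions_2": 1.0,
    "fractions_1": 0.5,  # the skill we want a prediction for
}

# 1 = correct, 0 = incorrect on each observed item.
student_1 = {"whole_numbers": 1, "decimals": 1, "geometry": 1, "probability": 0, "fractions_2": 1}
student_2 = {"whole_numbers": 1, "decimals": 0, "geometry": 0, "probability": 0, "fractions_2": 0}

p1 = predict_success(student_1, difficulties, "fractions_1")
p2 = predict_success(student_2, difficulties, "fractions_1")
print(f"student 1: P(correct on fractions_1) = {p1:.2f}")
print(f"student 2: P(correct on fractions_1) = {p2:.2f}")
```

Student 1 answered four of the five observed items correctly and student 2 only one, so the model predicts a higher chance of success on the target fractions item for student 1.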

All of these benefits are why we have chosen to make extensive use of IRT for scoring assessments within Galileo. Our objective in designing the reports and tools that use IRT-based scores is to make the benefits of IRT for simplifying educational decision making apparent. Toward that end, we are designing an increasing number of graphical presentations, as well as a number of planning tools that make it easy to bring together all the information at hand to assist in planning.

Monday, June 27, 2011
