Consider the following scenario. You are an elementary school principal and on the most recent statewide assessment, 62% of your 4th grade students passed the math portion of the assessment. The target Annual Measurable Objective (AMO) for 4th grade math in your state is that 63% must pass. If the school is to avoid NCLB sanctions, it is crucial that a greater percentage of 4th-graders pass the math portion of the statewide assessment in the coming year. Together with your 4th-grade teachers you’ve reviewed and revised the math curriculum and put new intervention procedures in place. How do you know whether it’s working? Can you tell before it is too late?
The good news is that it is reasonable to expect a certain degree of student learning across the year under almost any circumstances. The bad news is that the rate of student growth last year obviously wasn’t enough. If the rate of student growth remains the same, then you can expect that by the end of the year approximately 62% of the students will be prepared to pass the statewide assessment. All of the hard work of the students and teachers will have resulted in merely maintaining the status quo and, in this case, the status quo is not enough. You need to make sure students are making gains at a pace that is better than status quo.
The first step in monitoring whether an intervention is effectively preparing a greater number of students to pass a statewide assessment is to identify the status quo rate of growth. Once this is known you can compare the growth rate of your own students and determine whether they are out-pacing the status quo. One obvious way to monitor student growth is to administer periodic benchmark assessments.
When comparing student performance from one benchmark assessment to the next, it is important to use scaled scores rather than raw scores, and it is important that the scores on the assessments under consideration be on the same scale so that comparisons are meaningful. Scaled scores, such as those derived via Item Response Theory (IRT) analysis, take into account differences in the degree of difficulty of assessments, whereas raw scores do not. If a student earns a raw score of 20 on the first benchmark assessment and a score of 25 on the next, you do not know whether the student improved or the 2nd assessment was easier. If the comparison is made in terms of scaled scores, however, and if both assessments have been placed on the same scale, then the relative difficulty of the assessments has been factored into the calculation of the scaled scores and an increased score can be attributed to student growth.
ATI clients that administer Galileo K-12 Online benchmark assessments can use the Aggregate Multi-Test Report to monitor student growth and to identify whether the rate of growth is greater than the rate that is likely to result in maintaining the status quo. Galileo K-12 Online benchmark Development Level (DL) scores are scaled scores that take difficulty into account and all assessments within a given grade and subject are placed on the same scale, so that comparisons across assessments are meaningful. In addition, beginning with the 2007-08 school year, the cut scores on the benchmark assessments that are aligned to the relevant statewide assessment cut scores also provide an indication of the status quo growth rate.
In Galileo K-12 Online, the cut score for passing (e.g. “Meets benchmark goals” in Arizona, “Proficient” in California, and so on) that is applied to the first benchmark assessment of the year is tailored for each district using equipercentile equating (Kolen & Brennan, 2004). The cut score is aligned to the performance of that district’s students on the previous year’s statewide assessment. For subsequent benchmark assessments, the passing cut score represents an increase over the previous cut score at an expected growth rate that is likely to maintain the status quo. The expected growth rate for each grade and subject is based on a review of the data from approximately 250,000 students in grades 1 through 12 who took Galileo benchmark assessments in math and reading during the 2007-08 school year. Districts and schools that are seeking to improve the percent of students passing the statewide assessment should aim for an increase in average DL scores that is greater than the increase in the cut score for passing the assessments.
The following graphs illustrate a district that is showing growth at the expected rate and maintaining the status quo, and another that is showing growth that is better than the expected rate and which can expect to show improvement over the previous year with regard to the percent of students passing the statewide assessment.
District A: Showing growth but maintaining the status quo
District B: Showing growth AND improvement
Different rates for different grade levels
There has been a great deal of research regarding the amount of growth in terms of scaled scores that can be expected within and across various grade levels and the results have been mixed. When IRT methods are used in calculating scaled scores, it has generally been found that the relative amount of change in scaled scores from one grade level to the next tends to decrease at the higher grade levels (Kolen & Brennan, 2004). ATI applies IRT methodology and has also observed that the rate of increase in student performance in terms of scaled scores tends to decrease at the higher grade levels. The graph that follows presents the mean scaled score on the first and third benchmark assessments within the 2007-08 school year for grades, 1, 3, 5, and 8. The sample consisted of approximately 25,000 students per grade.
It should be noted that the slower rate of growth at the higher grade levels does not necessarily imply that students’ ability to learn decreases at the upper grades. The decrease in the growth rate may, for example, be a side effect of the methodology that is used in the raw-to-scale score conversion (Kolen, 2006). Regardless, the pattern is a stable one that provides a reliable measure against which to compare the growth of individual students or groups of students.
I’d love to hear other thoughts on monitoring intervention efforts. What has been helpful? What has not? What might be helpful in the future?
Kolen, M.J. (2006). Scaling and norming. In R.L. Brennan (Ed.), Educational Measurement (pp. 155-186). Westport, CT: American Council on Education and Praeger Publishers, jointly.
Kolen, M.J. & Brennan, R.L. (2004). Test equating, scaling, and linking: methods and practices (2nd ed.). New York: Springer.