Friday, May 1, 2009

The Calculations behind Forecasting Risk and Making Predictions in Galileo K-12 Online

Thanks for the comment on my previous post, Gerardo! I’ll work through the calculations you requested in this response, but the real work is in making sure that the benchmark assessments are properly aligned to state standards and that student scores on the benchmark assessments correlate well with their scores on the statewide assessment. The validity of Galileo K-12 Online benchmark assessments, both in terms of the alignment of content and the correlations with the state test scores, has been well established (see the Galileo K-12 Online Technical Manual), so we are free to use a straightforward and accurate approach to forecasting student performance on statewide assessments like AIMS.

The first step in forecasting student performance on a statewide assessment is establishing cut scores on the benchmark assessment that correspond to the cut scores on the statewide assessment. To do this, we use equipercentile equating (e.g., Kolen & Brennan, 2004). With equipercentile equating, you start with the distribution of student scores on the target assessment. In the example I’ll work through here, the target assessment, the one we want to make predictions about, is the 3rd grade math AIMS assessment (the statewide assessment in Arizona). The distribution of scores used for the equating process is the set of scores from that particular district’s students on the previous year’s assessment. In this case, 25% of the district’s third-graders had fallen below the cut score for meeting the standard on the spring 2007 AIMS assessment, and so for the 2007-08 3rd grade math benchmark assessments, the cut score for Meets the Standard was set at the 25th percentile of benchmark scores. The same approach was used for the other two cut scores (Approaches and Exceeds), but for the purposes of this discussion we are only concerned with the cut score for Meets, which is essentially the pass/fail cut score.
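For readers who like to see the arithmetic, here is a minimal sketch of the percentile-matching idea in Python. It is a simplification, not the Galileo implementation: full equipercentile equating as described by Kolen & Brennan involves more than this, and the function name, the tie-handling convention, and the benchmark scores below are all assumptions made up for illustration.

```python
import math


def benchmark_cut_score(benchmark_scores, proportion_below_state_cut):
    """Return the benchmark score at the percentile that matches the
    proportion of students who fell below the statewide cut score.

    Hypothetical helper: the convention used here (the cut is the lowest
    score that is NOT in the bottom group) is an assumption.
    """
    if not benchmark_scores:
        raise ValueError("need at least one benchmark score")
    if not 0.0 <= proportion_below_state_cut <= 1.0:
        raise ValueError("proportion must be between 0 and 1")

    ordered = sorted(benchmark_scores)
    # How many students should fall below the benchmark cut score.
    n_below = math.floor(proportion_below_state_cut * len(ordered))
    # Guard against a proportion of 1.0 running off the end of the list.
    index = min(n_below, len(ordered) - 1)
    return ordered[index]


# Made-up benchmark scores for 20 students (illustrative only).
scores = [12, 15, 9, 22, 18, 25, 14, 30, 27, 11,
          19, 21, 16, 24, 28, 13, 20, 17, 23, 26]

# 25% of students fell below the statewide cut score the previous year.
cut = benchmark_cut_score(scores, 0.25)
below = [s for s in scores if s < cut]
```

With these made-up scores, five of the twenty students (25%) fall below the resulting cut score, mirroring the proportion who fell below the cut on the statewide assessment.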

Once the cut scores for benchmark assessments are established, they can be used to estimate each student’s degree of risk of not meeting the standard on the next statewide assessment. As stated in my original blog post on this topic, we have found that observing a student’s pattern of performance across multiple benchmark assessments yields more accurate forecasts of likely performance on the statewide assessment than does looking at the student’s score on one assessment in isolation. Classification depends on whether the student scored above or below the cut score for Meets the Standard on each of three benchmark assessments. If the student scored above the cut score for Meets on all three, then she is said to be On Course for demonstrating mastery on the statewide assessment. A student who scores below the cut score on all three assessments is classified as being at High Risk of not demonstrating mastery. Scoring above the cut score on two out of three assessments earns the classification of Low Risk, and scoring above it on only one out of three earns the classification of Moderate Risk.
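The classification rule can be written as a small lookup. This is only a sketch of the rule as described in this post; the function name and the choice to pass in three True/False flags (one per benchmark assessment) are assumptions for illustration.

```python
# Risk level by number of benchmarks on which the student scored
# above the cut score for Meets the Standard.
RISK_BY_BENCHMARKS_PASSED = {
    3: "On Course",
    2: "Low Risk",
    1: "Moderate Risk",
    0: "High Risk",
}


def risk_level(scored_above_cut):
    """Classify a student from three booleans, one per benchmark.

    Hypothetical helper: each flag is True if the student scored above
    the Meets cut score on that benchmark assessment.
    """
    if len(scored_above_cut) != 3:
        raise ValueError("expected results for exactly three benchmarks")
    passed = sum(1 for flag in scored_above_cut if flag)
    return RISK_BY_BENCHMARKS_PASSED[passed]


# Which two benchmarks were passed does not matter: both are Low Risk.
example_a = risk_level([True, True, False])
example_b = risk_level([False, True, True])
```

Because only the count of benchmarks passed is used, any pattern with two passes lands in the same risk group.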

In Galileo K-12 Online, the reports that indicate student risk levels are linked directly to instructional materials and other tools to support intervention efforts with students who are at risk. This support for intervention efforts is the primary purpose of the risk classification scheme. But it is important to demonstrate that the classification scheme is accurate, which brings us to the data summary that Gerardo asked about.

The table below presents the data for the example I’ve been working through here.

The data are from a particular district’s 2007-08 benchmark assessments in 3rd grade math. The panel on the left shows the different possible patterns of performance on the series of three benchmark assessments: the first row represents students who scored above the cut score on all three benchmarks, and so on. The next column indicates the Risk Level classification for each pattern of performance. Note that scoring above the cut score on two out of three benchmark assessments leads to the same Risk Level classification, regardless of which two assessments the student passed. The number of students who showed each pattern of performance is indicated, as is the number of students in each pattern who did and did not demonstrate mastery on the AIMS assessment. For example, there were 238 students who scored above the cut score for Meets on all three benchmark assessments. Of these students, 234, or 98%, also met the standard when they took the AIMS assessment at the end of the year. The percent who met the standard on AIMS for each of the other risk groups was calculated in a similar manner. For the Low Risk group, which spans three patterns of performance, it was simply a matter of adding up the number of students who passed the AIMS assessment (15+5+26 = 46), dividing by the total number of students in that risk group (19+9+34 = 62), and then multiplying by 100 to get 74%.
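Here is the same arithmetic as a short Python sketch, using only the counts quoted in this post. The function name is made up for illustration, and the counts are listed in the order given above without tying them to particular patterns of performance.

```python
def percent_met(met_counts, total_counts):
    """Percent of a risk group that met the standard on AIMS.

    Each list holds one entry per pattern of performance in the group.
    """
    return 100 * sum(met_counts) / sum(total_counts)


# On Course: one pattern, 234 of 238 students met the standard.
on_course_pct = percent_met([234], [238])

# Low Risk: three patterns pooled, (15+5+26) of (19+9+34) students.
low_risk_pct = percent_met([15, 5, 26], [19, 9, 34])

print(round(on_course_pct), round(low_risk_pct))
```

Note that the Low Risk figure pools the students across the three patterns before dividing, rather than averaging three separate percentages.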

The Percent Met Standard column in the table presented here corresponds to the data in the table at the end of my previous post (“Forecasting Risk and Making Predictions about AMOs”). In that table, I presented averages for each Risk Group that were based on data from 7 school districts in a pilot investigation. The averages are collapsed across all of those districts and all grade levels and content areas. We have plans to investigate this further, and I will keep the readers of this blog posted on any developments.

The final column in the table indicates the accuracy of forecasting performance on AIMS for each of the Risk Groups. For the On Course and Low Risk groups, the prediction is that they will most likely meet the standard on AIMS. For these groups the calculation of the percent accuracy is the same as the calculation of the percent who Met the Standard in the previous column. For the other two groups, the prediction is that they will most likely NOT meet the standard on AIMS, and accuracy here refers to the percent of students who, as predicted, did not meet the standard on AIMS. For the High Risk group, accuracy was at 87% because 13% of these students met the standard on AIMS in spite of their failure to do so on any of the benchmark assessments. Even more interesting is the Moderate Risk group, for which accuracy was only 59% because 41% passed AIMS in spite of our prediction. At ATI, we actually like to be less accurate in these two categories. If our prediction is wrong with these students, it suggests that they and their teachers worked very hard, and they managed to pass the statewide assessment against the odds. We hope that the Galileo K-12 Reports and instructional materials were a part of that effort. That’s what it’s all about.
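The accuracy calculation can be sketched the same way. The names here are assumptions for illustration; the percentages are the ones quoted in this post.

```python
# Whether each risk group is predicted to meet the standard on AIMS.
PREDICTED_TO_MEET = {
    "On Course": True,
    "Low Risk": True,
    "Moderate Risk": False,
    "High Risk": False,
}


def forecast_accuracy(risk_group, percent_met_standard):
    """Percent of students in a risk group whose outcome matched the forecast."""
    if PREDICTED_TO_MEET[risk_group]:
        # Forecast was "will meet", so accuracy is the percent who met.
        return percent_met_standard
    # Forecast was "will not meet", so accuracy is the percent who did not.
    return 100 - percent_met_standard


high_risk_accuracy = forecast_accuracy("High Risk", 13)          # 13% met
moderate_risk_accuracy = forecast_accuracy("Moderate Risk", 41)  # 41% met
on_course_accuracy = forecast_accuracy("On Course", 98)          # 98% met
```

For the two groups predicted not to meet the standard, accuracy is simply the complement of the Percent Met Standard column.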


Kolen, M.J. & Brennan, R.L. (2004). Test equating, scaling, and linking: methods and practices. New York: Springer.
