Monday, November 10, 2008
A number of important audience questions were raised concerning the use of technology in dynamic intervention systems. One question was whether all interventions should be implemented online. My answer was that interventions need not, and should not be required to, be implemented online. There are two reasons for this. First, not all districts have the technology needed to support online interventions. Second, instruction does not and should not always occur on a machine. That said, technology can still be helpful. For example, online technology can be useful in documenting interventions that take place offline. Questionnaires and records of lesson plans and assignments can document implementation of an intervention. Data of this kind has been used effectively in many studies of intervention implementation.
Another audience question raised the possibility of automating experimental designs that could be implemented to assess intervention effects. There is no substitute for a skilled researcher when the task at hand is experimental design. Nonetheless, automated packaged designs could be useful in supporting the kinds of short experiments that we have proposed for use in dynamic intervention systems. We are currently working on the design and development of technology to automate experimental design.
Wednesday, November 5, 2008
This rather dry topic would likely not have been well known outside the ivory tower were it not for the growing question of merit pay for educators. One of my statistics professors used to love to say that “Statistics isn’t a topic for polite conversation.” The introduction of pay into the conversation definitely casts it in a different light. In NYC, consideration is being given to using value added analysis in tenure decisions for principals. Value added models have been used for determining teacher bonus pay in Tennessee and Texas. Michelle Rhee has argued for using a value added approach to determining teacher performance in the DC school system. One might say it is all the rage, both for the size of the spotlight shining its way and the emotion that its use for this purpose has brought forth.
I will not be using this post to venture into the turbulent waters of discussing who should be getting paid based on results and who shouldn’t. I’ll leave it to others to opine on that very difficult and complicated question. My purpose here is to introduce the idea that the type of questions one asks from a value added perspective, the mindset if you will, can greatly inform instructional decision making through creative application. The thoughts that I will write about here are not intended to say that current applications of the value added type approach are wrong or misguided. I intend only to offer a different twist for everyone’s consideration.
The fundamental question in the value added mindset is whether something that has been added to the classroom positively impacts student learning above and beyond the status quo. One could easily ask this question of new instructional strategies introduced to the classroom that are intended to teach a certain skill. For instance, one might evaluate a new instructional activity designed to teach finding the lowest common denominator of two fractions. Given the limited scope of the activity, this evaluation could be conducted efficiently and in a very short time by administering a few test questions. This kind of evaluation provides data that can be used immediately to guide instruction. If the activity is successful, then teachers can move on to the next topic. If it is unsuccessful, then a new approach may be tried. The immediacy of the results puts one in a position to make decisions informed by data without having to wait for the year or the semester to end.
Conducting short-term, small-scale evaluations differs from the typical value added analysis, which is concerned with impact over a long period. The question of long-term impact could just as easily be asked of a collection of instructional activities or lessons. In an earlier post, Christine Burnham discusses some of the ways that impact over time could be tested.
As always, we look forward to hearing your thoughts about these issues.
Tuesday, November 4, 2008
Consider the following scenario. You are an elementary school principal and on the most recent statewide assessment, 62% of your 4th grade students passed the math portion of the assessment. The target Annual Measurable Objective (AMO) for 4th grade math in your state is that 63% must pass. If the school is to avoid NCLB sanctions, it is crucial that a greater percentage of 4th-graders pass the math portion of the statewide assessment in the coming year. Together with your 4th-grade teachers you’ve reviewed and revised the math curriculum and put new intervention procedures in place. How do you know whether it’s working? Can you tell before it is too late?
The good news is that it is reasonable to expect a certain degree of student learning across the year under almost any circumstances. The bad news is that the rate of student growth last year obviously wasn’t enough. If the rate of student growth remains the same, then you can expect that by the end of the year approximately 62% of the students will be prepared to pass the statewide assessment. All of the hard work of the students and teachers will have resulted in merely maintaining the status quo and, in this case, the status quo is not enough. You need to make sure students are making gains at a pace that is better than the status quo.
The first step in monitoring whether an intervention is effectively preparing a greater number of students to pass a statewide assessment is to identify the status quo rate of growth. Once this is known you can compare the growth rate of your own students and determine whether they are out-pacing the status quo. One obvious way to monitor student growth is to administer periodic benchmark assessments.
When comparing student performance from one benchmark assessment to the next, it is important to use scaled scores rather than raw scores, and it is important that the scores on the assessments under consideration be on the same scale so that comparisons are meaningful. Scaled scores, such as those derived via Item Response Theory (IRT) analysis, take into account differences in the degree of difficulty of assessments, whereas raw scores do not. If a student earns a raw score of 20 on the first benchmark assessment and a score of 25 on the next, you do not know whether the student improved or the 2nd assessment was easier. If the comparison is made in terms of scaled scores, however, and if both assessments have been placed on the same scale, then the relative difficulty of the assessments has been factored into the calculation of the scaled scores and an increased score can be attributed to student growth.
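To make the raw-score problem concrete, here is a minimal sketch that assumes a Rasch (one-parameter IRT) model. The item difficulties and the student ability are made-up values chosen for illustration; they are not Galileo parameters, and the function name is hypothetical.

```python
import math

def expected_raw_score(ability, item_difficulties):
    """Expected number of items correct under a Rasch (one-parameter IRT) model."""
    return sum(1.0 / (1.0 + math.exp(-(ability - b))) for b in item_difficulties)

# Two hypothetical 30-item benchmarks placed on a common scale.
harder_test = [0.5] * 30    # every item has difficulty 0.5
easier_test = [-0.5] * 30   # every item has difficulty -0.5

ability = 1.0  # the SAME student ability on both occasions

raw_harder = expected_raw_score(ability, harder_test)
raw_easier = expected_raw_score(ability, easier_test)
print(round(raw_harder, 1), round(raw_easier, 1))
```

With these assumed values, the same ability yields an expected raw score of roughly 19 on the harder test and roughly 25 on the easier one. The raw score rises even though the student has not grown, which is why the comparison needs to be made on a common scale.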
ATI clients that administer Galileo K-12 Online benchmark assessments can use the Aggregate Multi-Test Report to monitor student growth and to identify whether the rate of growth is greater than the rate that is likely to result in maintaining the status quo. Galileo K-12 Online benchmark Development Level (DL) scores are scaled scores that take difficulty into account and all assessments within a given grade and subject are placed on the same scale, so that comparisons across assessments are meaningful. In addition, beginning with the 2007-08 school year, the cut scores on the benchmark assessments that are aligned to the relevant statewide assessment cut scores also provide an indication of the status quo growth rate.
In Galileo K-12 Online, the cut score for passing (e.g., “Meets benchmark goals” in Arizona, “Proficient” in California, and so on) that is applied to the first benchmark assessment of the year is tailored for each district using equipercentile equating (Kolen & Brennan, 2004). The cut score is aligned to the performance of that district’s students on the previous year’s statewide assessment. For subsequent benchmark assessments, the passing cut score represents an increase over the previous cut score at an expected growth rate that is likely to maintain the status quo. The expected growth rate for each grade and subject is based on a review of the data from approximately 250,000 students in grades 1 through 12 who took Galileo benchmark assessments in math and reading during the 2007-08 school year. Districts and schools that are seeking to increase the percentage of students passing the statewide assessment should aim for an increase in average DL scores that is greater than the increase in the cut score for passing the assessments.
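The core idea of aligning a cut score to last year's pass rate can be sketched as simple percentile matching. This is a deliberately simplified illustration and not ATI's actual procedure: full equipercentile equating, as described by Kolen and Brennan, builds a complete score conversion and typically involves smoothing. The function name and the scores below are hypothetical.

```python
def equipercentile_cut(benchmark_scores, state_pass_rate):
    """Return the benchmark score that roughly state_pass_rate of students meet or exceed."""
    if not benchmark_scores:
        raise ValueError("benchmark_scores must not be empty")
    ranked = sorted(benchmark_scores, reverse=True)
    # Number of students who would pass if the benchmark mirrored the state test.
    n_passing = round(state_pass_rate * len(ranked))
    n_passing = min(max(n_passing, 1), len(ranked))
    # The lowest score among those students becomes the cut score.
    return ranked[n_passing - 1]

# Ten made-up scaled scores on the first benchmark of the year.
scores = [410, 455, 470, 480, 495, 500, 510, 525, 540, 560]

# Assume 62% of these students passed last year's statewide assessment.
cut = equipercentile_cut(scores, 0.62)
print(cut)
```

In this toy example the cut lands at the score that six of the ten students meet or exceed, the closest match to a 62% pass rate that ten students allow.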
The following graphs illustrate a district that is showing growth at the expected rate and maintaining the status quo, and another that is showing growth that is better than the expected rate and which can expect to show improvement over the previous year with regard to the percent of students passing the statewide assessment.
District A: Showing growth but maintaining the status quo
District B: Showing growth AND improvement
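The comparison behind the two districts can be written as a simple check: did the mean DL score rise by more than the passing cut score did? The sketch below uses made-up numbers and a hypothetical function name; it is not output from the Aggregate Multi-Test Report.

```python
def outpaces_status_quo(mean_dl_scores, cut_scores):
    """True if the gain in mean DL score exceeds the gain in the passing cut score."""
    if len(mean_dl_scores) < 2 or len(mean_dl_scores) != len(cut_scores):
        raise ValueError("need matching score lists with at least two benchmarks")
    mean_gain = mean_dl_scores[-1] - mean_dl_scores[0]
    cut_gain = cut_scores[-1] - cut_scores[0]
    return mean_gain > cut_gain

# Hypothetical values across three benchmark assessments in one year.
cut_scores = [800, 820, 840]   # passing cut rises at the expected growth rate
district_a = [805, 825, 845]   # mean DL score grows at the same pace as the cut
district_b = [805, 832, 860]   # mean DL score grows faster than the cut

print(outpaces_status_quo(district_a, cut_scores))
print(outpaces_status_quo(district_b, cut_scores))
```

Under these assumed numbers, the first district gains exactly as much as the cut score and so only maintains the status quo, while the second gains more and can expect a higher pass rate.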
Different rates for different grade levels
There has been a great deal of research regarding the amount of growth in terms of scaled scores that can be expected within and across various grade levels, and the results have been mixed. When IRT methods are used in calculating scaled scores, it has generally been found that the relative amount of change in scaled scores from one grade level to the next tends to decrease at the higher grade levels (Kolen & Brennan, 2004). ATI applies IRT methodology and has also observed that the rate of increase in student performance in terms of scaled scores tends to decrease at the higher grade levels. The graph that follows presents the mean scaled score on the first and third benchmark assessments within the 2007-08 school year for grades 1, 3, 5, and 8. The sample consisted of approximately 25,000 students per grade.
It should be noted that the slower rate of growth at the higher grade levels does not necessarily imply that students’ ability to learn decreases at the upper grades. The decrease in the growth rate may, for example, be a side effect of the methodology that is used in the raw-to-scale score conversion (Kolen, 2006). Regardless, the pattern is a stable one that provides a reliable measure against which to compare the growth of individual students or groups of students.
I’d love to hear other thoughts on monitoring intervention efforts. What has been helpful? What has not? What might be helpful in the future?
Kolen, M.J. (2006). Scaling and norming. In R.L. Brennan (Ed.), Educational Measurement (pp. 155-186). Westport, CT: American Council on Education and Praeger Publishers, jointly.
Kolen, M.J. & Brennan, R.L. (2004). Test equating, scaling, and linking: methods and practices (2nd ed.). New York: Springer.
Saturday, November 1, 2008
Unfortunately, the educational research community has not responded adequately to the call for experimental studies. Indeed, the number of experimental studies conducted in education in the United States has been declining for some time. Professor Joel Levin of the Department of Educational Psychology at the University of Arizona is one of a number of scholars who have played an important role in providing evidence that the decline is real. Moreover, he and others have been effective in pointing out how the decline damages the potential impact of educational research on interventions designed to improve learning.
There are many possible reasons for the decline. One obvious reason is that the conduct of experimental research in schools can be very expensive. Expense is particularly problematic in experiments taking place over an extended time span involving large numbers of students and teachers. Many university professors, particularly young scholars, do not have access to the funding resources needed to conduct experimental studies of this kind.
Fortunately there is much to be gained from short experiments, which are inexpensive to conduct. Much of our knowledge regarding student learning, memory, cognition, and motivation has come from studies that generally require less than an hour of time from each of a small number of research subjects. Thus, while it may be beneficial to assess the effects of an entire curriculum on learning over the course of the school year, it may also be useful to assess the effects of experimental variables implemented in a single lesson or small number of lessons.
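As one illustration of how little machinery a short experiment needs, here is a minimal sketch of an exact permutation test on a handful of quiz scores. It assumes students were randomly assigned to the two lessons; the scores, group sizes, and function name are all hypothetical.

```python
from itertools import combinations

def permutation_p_value(treatment, control):
    """One-sided exact permutation test for a higher mean score in the treatment group."""
    pooled = treatment + control
    n_t = len(treatment)
    n_c = len(control)
    total = sum(pooled)
    observed = sum(treatment) / n_t - sum(control) / n_c

    at_least_as_large = 0
    n_splits = 0
    # Enumerate every way the pooled scores could have been split into the two groups.
    for idx in combinations(range(len(pooled)), n_t):
        t_sum = sum(pooled[i] for i in idx)
        diff = t_sum / n_t - (total - t_sum) / n_c
        if diff >= observed - 1e-12:  # small tolerance for floating-point comparison
            at_least_as_large += 1
        n_splits += 1
    return at_least_as_large / n_splits

# Made-up scores (number correct out of 5 questions) after a single lesson.
new_activity = [4, 5, 5, 4, 5]
usual_lesson = [2, 3, 3, 2, 4]

p = permutation_p_value(new_activity, usual_lesson)
print(round(p, 4))
```

The p-value is the share of all possible group assignments that would produce a difference at least as large as the one observed. With groups this small, every assignment can be enumerated, so no distributional assumptions are required.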
Focusing research on short experimental interventions provides much needed flexibility in implementing school-based interventions in the dynamic world of the 21st century. The educational landscape is in a constant state of flux. Standards, curriculums, and assessment practices all change rapidly in the current educational environment. Schools have to be able to adjust intervention practices quickly to achieve their goals. Short experiments provide the flexibility needed to support rapid change in educational practice. I think we need more of them. In fact, I think we need thousands of them coming from researchers across the nation. We now have the technology to manage the massive amounts of information that large numbers of short experiments can provide. The task ahead is to apply that technology in ways that support the continuing efforts of the educational community to promote student learning.