Tuesday, April 27, 2010

Finding Balance: Designing a Balanced (and Effective) Assessment System

Anyone following recent education publications, whether intended for educators or the general public, will frequently have come across the notion that an assessment system used in K-12 education should be balanced. The concept of a "balanced" assessment system isn't new: a quick Google search will turn up writings dating back to the early '90s on the first page. While the notion did not have its genesis in this modern era of No Child Left Behind (NCLB) and Race to the Top (RTT), these initiatives have certainly contributed to its rise in prominence. My purpose with this post is to discuss what makes an assessment system balanced and, critically, what is needed to make a balanced assessment system effective.

Starting a discussion of a balanced assessment system raises the question of what exactly must be balanced with what. In short, the balanced assessment notion argues that assessment systems serve many different purposes and that those purposes must be balanced without any one overwhelming the others. The need for an administrator to evaluate the efficacy of a school within the district must be balanced with the teacher's need to plan based on information about his/her students' mastery of standards, which in turn must be balanced with a student's need for feedback on his/her work. NCLB and RTT also add state and federal government oversight of schools to the mix.

Balancing all of these different components naturally requires a system that contains different types of assessments. No single type of assessment can adequately serve all the different needs. An assessment designed to provide an overview of progress during the entire year must necessarily cover a wide range of topics. The number of items required for that kind of coverage would be, to say the least, a bit unwieldy as a tool to provide data for classroom planning. In contrast, informing the teacher's decision making requires an instrument targeted to the specific topics that are the focus of instruction at that moment.

When an assessment system involves multiple different types of assessments, each instrument must be specifically designed for its function within the bigger picture. Using an instrument for a purpose other than the one it was designed for, while at times tempting, is a sure route toward making decisions based on invalid data. While different instruments are required, they must play well together in order to provide the balance that is sought. This part of the design of an assessment system can be the trickiest, as there are many potential pitfalls. One of the most common sources of difficulty is the lack of a common underlying framework dictating the kinds of assessments to be used and the purposes served by each type of assessment. Put another way, it would make little sense for the formative component to be aligned to a different instructional plan than the quarterly benchmark assessments. An administrator who is following student progress by observing results on formative assessments could be in for quite a surprise when state tests are administered. In order for the different measures of student proficiency that are part of a balanced assessment system to present a complete and sensible picture, they must be aligned to the standards targeted for instruction.

It is also important that all components of the assessment system provide valid and reliable results. If classroom formatives aren't producing reliable results, then they don't contribute to the overall picture in a dependable way and can produce a misleading impression. It should be noted that the reliability of an assessment must be evaluated on an ongoing basis. It has been well established in the literature that items that perform one way for one group of students can behave entirely differently for a different group of students.

ATI advocates and supports a balanced approach to the implementation of an assessment system. A complete and balanced picture of student achievement that meets the needs of administrators, teachers, students, and parents is fundamental to increasing student achievement. Districts design benchmarks to be specifically aligned to the scope and sequence of their curriculum. A similar process is used to ensure that formatives are aligned to district instructional objectives. Benchmarks and formatives may be assembled together to form a complete picture of student learning that is well suited to determining whether students are making progress overall in a school or in a particular class. Item performance is also continually evaluated in order to determine how the instruments are functioning with the students who are assessed.

How has the notion of a balanced assessment system impacted your district’s assessment practices?

Thursday, April 8, 2010

Instructional Dialogs and Formative Items for Enrichment and Remediation

When working with students to teach and reinforce mastery of state standards, it is often the case that some students quickly pick up the concept. Other students need a little more instruction and guided practice, while still others may need more fundamental instruction to succeed.

How can the needs of all three groups be addressed? One option for teachers using Galileo Online Instructional Dialogs and formative assessments is to identify student proficiency on an Instructional Dialog or quiz drawing on their current grade level. Those students who struggle with the concept may then be assigned Instructional Dialogs and quizzes drawing on materials from earlier grades, while the students who have demonstrated proficiency appropriate to their current grade can be given opportunities to enrich and expand their understanding with work on similar objectives at the next grade level.

As an example, a third grade standard asking students to know multiplication and division facts through 10 could be complemented by remediation in the second grade standard of multiplication by 1, 2, 5, and 10, and expanded, for those students who are ready, with fourth grade work multiplying two-digit numbers by two-digit numbers and multiple-digit numbers by one-digit numbers. No new materials, licenses, or texts are needed. The standards for all grades in each state are assigned across the district. Teachers have access allowing them to reach forward or back within the state standards to find the Instructional Dialogs and quizzes that can help reinforce core knowledge or introduce concepts and challenges appropriate to students' current development and knowledge.
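The decision rule described above can be sketched in a few lines of code. This is only an illustration: the grade-level materials are taken from the multiplication example above, but the cut scores and the function itself are hypothetical and are not part of Galileo.

```python
# A minimal sketch of the remediation/enrichment routing described above.
# The cut scores (60% and 85%) are hypothetical illustrations, not
# Galileo settings.
def assign_material(percent_correct):
    """Route a student to prior-grade remediation, on-grade practice,
    or next-grade enrichment based on quiz performance."""
    if percent_correct < 60:
        # Struggling: reach back to the second grade standard.
        return "grade 2: multiplication by 1, 2, 5, and 10"
    elif percent_correct < 85:
        # On track: continue with the current third grade standard.
        return "grade 3: multiplication and division facts through 10"
    else:
        # Proficient: reach forward to fourth grade enrichment.
        return "grade 4: multi-digit multiplication"

for score in (45, 72, 93):
    print(score, "->", assign_material(score))
```

In practice a teacher would make this call using the proficiency reports for each student, but the underlying logic is the same: one assessment result routes each student to materials one grade back, on grade, or one grade forward.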

This opportunity is not limited to math. In English language arts, the recurring standards are engaged by providing grade-appropriate reading content. Students may be instructed and assessed on similar standards of theme, characterization, and vocabulary from context, but a teacher may enhance learning by assigning each student texts and questions on these common elements with readings best suited to that individual student's current vocabulary proficiency and reading fluency. By exploring the opportunities afforded by using common standards across multiple grade levels, teachers can increase the variety and developmental appropriateness of instructional content and assessment in the effort to foster student success.

To get more information on how to utilize multi-grade resources for instruction and assessment, please contact the Educational Management Services Department at Assessment Technology, Incorporated.

Monday, April 5, 2010

Why did student scores go down on the most recent assessment?

As the year progresses, student Development Level (DL) scores generally go up from one assessment to the next. This is a reflection of student growth, and it is a rewarding state of affairs for all involved. However, sometimes student DL scores go down from one benchmark or interim assessment to the next. When that happens, school or district administrators are understandably quite concerned, and they sometimes contact us looking for an explanation. There are many things that can contribute to a decrease in DL scores, and no two situations are exactly alike, but a basic understanding of how DL scores are calculated can help to solve the mystery or, at least, to focus the investigation in a more fruitful direction.

Galileo DL scores are scale scores, and in Galileo K-12 Online the DL scores for interim assessments within a given grade and subject are all put on the same scale, so that the scores on different assessments can be compared meaningfully. The way this is accomplished is by relying on the item parameter estimates for the items that appear on the assessments. Item parameters (described in a previous blog) summarize various aspects of the way an item performs when students encounter it on an assessment, including its relative difficulty. When student DL scores for an interim assessment are calculated, the algorithm takes into account both the number of items the student got correct and the relative difficulty of the items on the assessment.
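One common way to formalize the relationship between ability, item difficulty, and the chance of a correct answer is the Rasch model from item response theory. The sketch below uses that model purely as an illustration; the exact scoring model Galileo uses is not specified in this post.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability that a student with ability theta
    answers an item of difficulty b correctly. This is a standard IRT
    assumption used here for illustration only."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# The same student (ability theta = 0.5) facing items of increasing
# difficulty: the probability of a correct answer falls as b rises.
for b in (-1.0, 0.0, 1.0):
    print(f"difficulty {b:+.1f}: P(correct) = {p_correct(0.5, b):.2f}")
```

Because the probability of success depends on both the student's ability and each item's difficulty, a scoring algorithm built on item parameters can place results from differently difficult assessments on one common scale.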

Consider a situation in which two students take two different assessments in 5th grade math. Both students get 65% correct, but by chance Assessment A, which was taken by Student A, was much more difficult than Assessment B, taken by Student B. If you know that there was a difference in the difficulty of the assessments, it doesn’t seem fair that both students should get the same Development Level score based on the score of 65% correct. The algorithm that calculates DL scores takes the relative difficulty of the assessments into account, and in this scenario Student A would end up with a higher DL score than Student B.
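The two-student scenario can be worked through numerically. Under the Rasch model, the raw score is a sufficient statistic for ability, so one can estimate each student's ability by solving for the theta whose expected raw score matches the observed one. Again, this is an illustrative sketch under an assumed model, not the actual Galileo algorithm; the item difficulties below are invented.

```python
import math

def rasch_ability(num_correct, difficulties, tol=1e-6):
    """Estimate ability (theta) from a raw score and per-item Rasch
    difficulties via Newton-Raphson: find theta such that the expected
    number correct equals the observed number correct."""
    theta = 0.0
    for _ in range(100):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        expected = sum(probs)
        # Test information = derivative of expected score w.r.t. theta.
        info = sum(p * (1.0 - p) for p in probs)
        step = (num_correct - expected) / info
        theta += step
        if abs(step) < tol:
            break
    return theta

# Two hypothetical 20-item assessments, both scored 13/20 (65% correct).
hard_items = [1.0] * 20    # Assessment A: harder items (Student A)
easy_items = [-1.0] * 20   # Assessment B: easier items (Student B)

theta_a = rasch_ability(13, hard_items)
theta_b = rasch_ability(13, easy_items)
# Same raw score, but Student A's harder assessment yields the higher
# ability estimate.
print(f"Student A: {theta_a:.2f}, Student B: {theta_b:.2f}")
```

The estimates differ even though the percent-correct scores are identical, which is exactly the fairness point made above: identical raw scores on assessments of different difficulty do not reflect identical achievement.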

Now apply this situation to a series of interim assessments taken by the same set of students. Suppose, for example, the average student raw score was 65% correct on both the first and second interim or benchmark assessments. If the second assessment was more difficult, student DL scores will go up relative to the first. In fact, it’s possible for student DL scores to go up even if their raw scores drop a bit relative to the first assessment if the second one was much more difficult than the first. And, sometimes, student DL scores will go down on a second assessment. This can happen if it was easier, in terms of item parameters, than the first assessment and student raw scores either stayed the same, went down a bit, or only improved slightly but not enough to account for the relative ease of the second assessment.

That brings us to the most important point. If student DL scores drop rather than increase, it is important to identify which skills or concepts the students are having difficulty with and then to do something about it, via re-teaching, intervention, or whatever the best course of action seems to be given the situation and the available resources. There are at least three Galileo K-12 Online reports that are helpful in this regard: the Intervention Alert, the Development Profile, and the Item Analysis report, especially when run with the ‘detailed analysis’ display. All of these indicate student performance on specific skills and therefore provide a starting place for improving student performance.