Friday, June 26, 2009

On the Assessment of Writing

One of the topics being considered in many states is how best to assess students' writing skills. The use of multiple-choice items to assess students' writing ability has become more popular in recent years. Among states where Galileo K-12 Online is currently used, California and Massachusetts both use multiple-choice items to assess some aspects of writing. Arizona is reportedly adding multiple-choice writing items to the AIMS in the next round of pilot testing, and we expect to see those items supporting the revised Arizona English and Language Arts standards, which will be adopted in 2010-2011.

It is not surprising that multiple-choice holds a certain appeal for those wishing to assess writing. Multiple-choice items take less time away from instruction, can be scored using automated procedures such as those available to users of Galileo K-12 Online, and are scored consistently because each item has a single correct answer rather than relying on evaluators to score to a rubric. These advantages make multiple-choice a compelling option, but other considerations limit the usefulness and effectiveness of multiple-choice items in the assessment of writing. Thomas M. Haladyna explains these limits in Developing and Validating Multiple-Choice Test Items:

The most direct measure would be a performance-based writing prompt. MC items might measure knowledge of writing or knowledge of writing skills, but they would not provide a direct measure (p.11).

Therefore a crucial concern is the logical connection between item formats and desired interpretations. For instance, an MC test of writing skills would have low fidelity to actual writing. A writing sample would have much higher fidelity (p.12).

To assess writing, it is necessary to apply a standardized rubric to responses to a writing prompt. The prompt must allow students to respond in a way that accurately represents their ability to compose, convey, and communicate: to fulfill the designated purpose of a text and to use the relevant information they possess about the topic.

Multiple-choice reading items addressing an analysis standard may not require the student to compose a full analysis, but they do require the student to use the same analytical processes to distinguish the correct analysis from the distractors provided. Writing is different. The ability to identify the best compositional example does not accurately reflect the skills and abilities inherent in good writing, because recognizing persuasive, informative, or expressive quality does not show that the student can create written content of the same quality.

Galileo provides content for writing assessments that use prompts, the most authentic measure of student writing, while also covering writing knowledge and skills in multiple-choice items that help establish data for basic skills measurement and for test reliability in predicting standardized test performance.

Text Referenced

Haladyna, T.M. (2004). Developing and Validating Multiple-Choice Test Items (3rd ed.). Mahwah, N.J.: Lawrence Erlbaum Associates.

Thursday, June 18, 2009

Care must be taken when administering benchmark assessments to subsets of students or to students from multiple grade levels

Galileo K-12 Online benchmark assessments serve two functions simultaneously. One is to provide teachers with timely feedback regarding which standards their students have and have not mastered. The other is to forecast the students’ likely performance on a high-stakes statewide assessment such as AIMS in Arizona or MCAS in Massachusetts. Both of these functions are equally important, and in most cases a single benchmark assessment achieves both goals in harmony. However, there are some cases where the two goals are in conflict. In today’s post, I want to alert district administrators to a potential problem and to give them a way to avoid it when planning benchmark assessments.

In the typical scenario, a benchmark assessment is given to all students in the district in a given grade level. For example, all fifth-graders in the district might take a fifth-grade math benchmark assessment. It is expected that all of these students will also take the fifth-grade math high-stakes statewide assessment. This is important because the benchmark assessment must be aligned to the statewide assessment in order to generate cut scores for performance levels and to forecast student performance on the statewide assessment. If the same set of students is expected to take both the benchmark assessment and the statewide assessment, then the comparison between the two assessments is essentially a comparison of apples to apples, and all is well. The cut scores that are calculated for the benchmark assessment should provide accurate forecasts of student performance on the statewide assessment and, in fact, the accuracy rate for Galileo K-12 Online benchmark assessments is quite high (see the Galileo K-12 Online Technical Manual).

There are cases, however, where the set of students taking a benchmark assessment is not the same as the set that will be taking the statewide assessment. In these cases, the calculation of accurate cut scores for benchmark assessments becomes more complicated. A common scenario is one in which advanced 8th-graders are taking a high school algebra course and, quite reasonably, they take the high school math benchmark assessments instead of the 8th grade math benchmark assessments. This makes perfect sense for the first goal of benchmark assessments: providing feedback to teachers regarding student mastery of state standards. It does, however, create problems for the goal of forecasting student performance on the statewide assessment. In most cases these students will be taking the 8th grade statewide assessment, and not the high school statewide assessment, and so the comparison when calculating cut scores becomes one of apples to oranges.

In order to calculate accurate cut scores for the high school math benchmark assessment in the above scenario, the scores from the 8th grade students must be removed from the data set, so that the set of students on the benchmark assessment will be the same as the set of students who will be taking the high school statewide assessment. Additionally, care must be taken when calculating the cut scores for the 8th grade math benchmark assessment. This is because a specific region of the student distribution, the advanced students, will not be present in the distribution of scores for the 8th grade benchmark. If no adjustment is made to account for the absence of the advanced students, then the cut scores that are calculated will be too low, and too many students will be classified as being likely to pass the statewide assessment. This, of course, will result in rude surprises when the statewide assessment results come in.
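
The effect described above can be seen in a toy calculation. The sketch below is a minimal illustration, not ATI's actual procedure: it assumes cut scores are set purely by percentile matching, uses made-up scores from 1 to 100, and assumes that every advanced student missing from the 8th grade benchmark would have met the standard on the statewide assessment. The helper names `percentile_cut` and `adjusted_proportion` are hypothetical.

```python
def percentile_cut(scores, proportion_below):
    """Return the score below which roughly `proportion_below` of `scores` fall."""
    ordered = sorted(scores)
    index = round(proportion_below * len(ordered))
    return ordered[min(index, len(ordered) - 1)]


def adjusted_proportion(p_not_meeting, p_advanced_absent):
    """Rescale the proportion not meeting the standard for a group missing its top students.

    Hypothetical assumption: every absent advanced student would have met the
    standard, so all students who fall short are still in the benchmark group.
    """
    return p_not_meeting / (1 - p_advanced_absent)


# Hypothetical grade: 100 students scoring 1..100; 25% did not meet the standard.
all_students = list(range(1, 101))
# The top 20 students take the high school benchmark instead of the 8th grade one.
benchmark_group = all_students[:80]

full_cut = percentile_cut(all_students, 0.25)       # cut if everyone had tested
naive_cut = percentile_cut(benchmark_group, 0.25)   # cut with no adjustment
adjusted_cut = percentile_cut(benchmark_group, adjusted_proportion(0.25, 0.20))
print(full_cut, naive_cut, adjusted_cut)            # prints: 26 21 26
```

Without the adjustment, the cut falls from 26 to 21, so students scoring 21 through 25 would be classified as likely to meet the standard even though, in the full grade, they fall in the bottom 25%.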

The take-home message, then, is to be clear about who will be taking each benchmark assessment when you plan it. Steps can be taken in cases such as the one described here to make sure that the cut scores on benchmark assessments are accurate, but only if ATI knows about the unusual circumstances in advance. If you are designing benchmark assessments in Galileo K-12 Online and there will be any out-of-grade testing, or if the set of students taking a benchmark assessment will not be the same as the set taking the corresponding statewide assessment, please let your Field Services or Educational Management Services representative know right away. Forearmed with as much information as possible, ATI can work with your district to make sure that the benchmark assessments provide accurate forecasts of student performance on statewide assessments as well as timely feedback to classroom teachers regarding the mastery of standards.

Wednesday, June 3, 2009

Help for Math Teachers

The purpose of this thread is to give math teachers information and a place to converse with each other about specific state standards, including interpretations of state-provided language and ideas about how to teach these standards to students.

Please comment on posts or add new posts including questions, ideas, and answers about how to teach math standards.

High School: Post #1

AZ-MCW-S3C4-PO10. Determine an effective retirement savings plan to meet personal financial goals including IRAs, ROTH accounts, and annuities.

AZ provided connection: MCWR-S5C2-09. Use mathematical models to represent and analyze personal and professional situations.

AZ provided explanation: An IRA is an “Individual Retirement Account,” and a ROTH is a specific type of IRA, with a more complex tax-advantaged structure.

I have searched for formulas or information about how to calculate returns, compare the advantages of each account type, and determine how much to invest to reach a retirement goal, but I have found only calculators, not the formulas behind them.

What materials or formulas do you plan to use to teach students how to figure this information?
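
For anyone looking for formulas rather than calculators, the standard future-value-of-an-annuity formula covers the core arithmetic: depositing P at the end of each period at periodic rate r for n periods grows to FV = P((1 + r)^n - 1)/r, and solving that formula for P gives the deposit needed to reach a goal. The sketch below assumes a constant rate of return, end-of-period deposits, and monthly compounding by default. It does not model taxes, fees, inflation, or contribution limits, so the tax treatment that distinguishes a traditional IRA from a Roth is left out. The function names are illustrative.

```python
def future_value(deposit, annual_rate, years, periods_per_year=12):
    """Value of equal end-of-period deposits (an ordinary annuity) after `years`."""
    r = annual_rate / periods_per_year  # periodic interest rate
    n = years * periods_per_year        # number of deposits
    if r == 0:
        return deposit * n              # no growth: just the sum of deposits
    return deposit * ((1 + r) ** n - 1) / r


def required_deposit(goal, annual_rate, years, periods_per_year=12):
    """Periodic deposit needed to reach `goal`, from solving the formula above for P."""
    r = annual_rate / periods_per_year
    n = years * periods_per_year
    if r == 0:
        return goal / n
    return goal * r / ((1 + r) ** n - 1)


# Example: $200 per month for 30 years at an assumed 6% annual return.
balance = future_value(200, 0.06, 30)
print(f"Balance after 30 years: ${balance:,.2f}")
print(f"Monthly deposit to reach $500,000: ${required_deposit(500_000, 0.06, 30):,.2f}")
```

Students can check one function against the other: feeding the balance from `future_value` into `required_deposit` should return the original monthly deposit.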

Middle School: Post #1

AZ-M06-S2C4-01. Investigate properties of vertex-edge graphs
· Hamilton paths,
· Hamilton circuits, and
· shortest route.

How do you teach students to check their answers on the vertex-edge graph items?

How do you know if you found all possible paths on a vertex-edge graph?


AZ provided explanation: A Hamilton path in a vertex-edge graph is a path that starts at some vertex in the graph and visits every other vertex of the graph exactly once. Edges along this path may be repeated. A Hamilton circuit is a Hamilton path that ends at the starting vertex. The shortest route may or may not be a Hamilton path. Depending upon the constraints of a problem, each vertex may not need to be visited.
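
One way to check answers, and to know whether every path has been found, is systematic enumeration: start at each vertex in turn, extend the path one unvisited neighbor at a time, and back up when stuck. The sketch below uses the standard definition in which no vertex is revisited. The graph format (a dictionary of neighbor lists) and the function names are illustrative choices. Each path is found once in each direction, so the last helper merges a path with its reverse.

```python
def hamilton_paths(graph):
    """List every Hamilton path (as a vertex list) in an undirected graph."""
    paths = []

    def extend(path, visited):
        if len(path) == len(graph):  # every vertex visited exactly once
            paths.append(list(path))
            return
        for neighbor in graph[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                path.append(neighbor)
                extend(path, visited)
                path.pop()  # back up and try the next neighbor
                visited.remove(neighbor)

    for start in graph:
        extend([start], {start})
    return paths


def hamilton_circuits(graph):
    """Hamilton paths whose last vertex connects back to the first."""
    return [p for p in hamilton_paths(graph) if p[0] in graph[p[-1]]]


def distinct_paths(paths):
    """Treat a path and its reverse as the same route."""
    return sorted({min(tuple(p), tuple(reversed(p))) for p in paths})


# A square: A-B-C-D-A
square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
print(distinct_paths(hamilton_paths(square)))
```

For the square, the search finds eight directed paths, which reduce to four distinct routes (one for each edge left out of the cycle). All eight are also circuits; they are the same cycle read from different starting points and directions.
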

Elementary School: Post #1

AZ-M02-S5C2-03. Select from a variety of problem-solving strategies and use one or more strategies to arrive at a solution.

What problem-solving strategies do you think are appropriate to teach primary students?

Which problem-solving strategies are your students’ favorites?



Monday, June 1, 2009

Share Your Lessons With Others

Have you created a lesson that you are incredibly proud of? Do you wish there was an easier way to let your colleagues access the lesson to use with their students? With Galileo, sharing is easy. To share your content, you will want to attach it to a Dialog. Don’t worry. You needn’t recreate your lesson in a Dialog. We recommend that you do the following:

  1. Link your Dialog to state standards. Most of your colleagues will search for lessons based on standards.
  2. Give your Dialog a title and add any notes that will be relevant to other users.
  3. Add a description. The words you place in the description box will be searchable by other users once you share your lesson. Examples of keywords could include: emerging language learners, hand-held responders, or teacher-facilitated.
  4. Attach the lesson as a resource.
  5. Automatically generate a follow-up quiz. This is optional and only necessary if you’d like to use Galileo’s Formative Test Reports to evaluate students’ learning of the lesson.
  6. Publish your lesson.

Once your Dialog is published, you can share it in two ways. First, a Share Dialog button will appear; click it to add your lesson to the community bank. Sharing your Dialog to the community bank allows Galileo users in your district and in other districts to find your Dialog when searching, and they can schedule it and use it with their students. Second, if you would prefer to share your content only with colleagues in your district, you can give them access to your Dialog Library or copy your Dialog into their libraries. ATI will be more than happy to show you how this is done. For more information on sharing your lessons, e-mail ATI’s Professional Development staff at professionaldevelopment@ati-online.com or call us at 1-800-367-4762 ext. 132.

Wednesday, May 27, 2009

Benchmark Results by Groups

ATI released a new report this month called the Benchmark Results by Groups report. This report breaks out, by subgroup, the students who passed the benchmark goals. Combined with customizable forms, it gives the user flexibility in reporting. It can be run on an individual class, with comparisons to school and district data, or on all schools or all classes. In all cases you will receive an “Overall” column for comparison purposes (see screen shot below). At present this report is available to district-level users only; however, ATI plans to make it available at all user access levels.

If you are interested in learning more about this report or other components of the system, a WebEx can be set up. A WebEx is a guided tour of the system over the Internet. Please contact the Field Services department at 800-367-4762 Ext. 124 to obtain more information.

For those districts that are current clients, please contact your Field Service Coordinator if you have questions on this report or other components of Galileo.


Wednesday, May 6, 2009

Questions to complement Value Added Measurement

A short while back I wrote a post about the use of Value Added Measurement (VAM) within the context of educational reforms. I mentioned President Obama speaking of the need to reward effective teachers financially. Indeed, the impact of effective teachers is well established. There is certainly value in identifying which instructors have the most impact on the learning of their students. However, just as with any set of tools that might be applied to the ultimate task of raising student achievement, VAM and merit pay have certain limitations as a source of guidance for policy. One of the most notable is that VAM provides no insight into what effective teaching actually looks like. My task here is to make good on my promise from the last post to talk a bit about some additional tools that can be added to the arsenal. As I said before, it makes sense to use any tool we can to tackle this important task.

VAM asks the question: who is most effective in the classroom? What if we added some additional questions, such as: What is the most effective way of teaching a given skill? What are the specific needs of students who need additional help? How are they progressing as they receive instruction? These might be thought of as “bottom up” questions, as opposed to the “top down” type of inquiries that characterize VAM. The notion is that specific identification of the components of effective instruction will support the construction of a larger program. This type of approach could provide a nice complement to the gains that can be achieved from VAM.

What is needed to effectively ask these types of questions? One necessary ingredient is the ability to work collaboratively on the implementation of common objectives, assessments, and instructional approaches across different classes and schools. It must be possible to distribute necessary materials to all the teachers who need them. It must be possible to monitor the delivery of that instruction so that differences across teachers can be identified and, where necessary, they can be addressed. Highly consistent implementation is needed in order to make strong conclusions about what works or doesn’t work.

It also must be possible to gather accurate and reliable assessment data on a frequent basis. Assessments must be shared so that information may be reliably aggregated. Assessments should also be well integrated into instruction so that the picture of learning is highly detailed.

This approach is a nice complement to VAM because it positions us to answer the question of what can be done when differences in outcomes are identified across classrooms. The implicit assumption is that teacher effectiveness can be taught once the components of effective instruction are identified. In a recent article, Stephen Raudenbush describes the successful implementation of a literacy program based on what he terms a shared systemic approach to instruction. Central to the approach are shared goals, instructional content, and assessments. Differences in teacher expertise are expected and the system encourages mentoring by those whose skills are more advanced. Raudenbush argues that this sort of collaborative approach is key to effectively identifying and then implementing the kinds of systemic changes that will ultimately advance instruction and improve schools.

The tools within Galileo have been designed to support the process of determining what strategies are effective in helping students to meet goals. As we described in our recent seminar, the intervention model positions districts to do that sort of collaborative work. We would be interested in hearing responses from those who have worked in a district where such an approach was implemented. How did it seem to work? What sort of approach was taken to implementation? What kinds of problems came up?

Friday, May 1, 2009

The Calculations behind Forecasting Risk and Making Predictions in Galileo K-12 Online

Thanks for the comment on my previous post, Gerardo! I’ll work through the calculations you requested in this response, but the real work is in making sure that the benchmark assessments are properly aligned to state standards and that student scores on the benchmark assessments correlate well with their scores on the statewide assessment. The validity of Galileo K-12 Online benchmark assessments, both in terms of the alignment of content and the correlations with the state test scores, has been well-established (see the Galileo K-12 Online Technical Manual), and so now we are free to engage in a very straightforward, easy, and accurate approach to forecasting student performance on statewide assessments like AIMS.

The first step in forecasting student performance on a statewide assessment is establishing cut scores on the benchmark assessment that correspond to the cut scores on the statewide assessment. To do this we use equipercentile equating (e.g., Kolen & Brennan, 2004). With equipercentile equating, you start with the distribution of student scores on the target assessment. In the example I’ll work through here, the target assessment (the one we want to make predictions about) is the 3rd grade math AIMS assessment, the statewide assessment in Arizona. The distribution of scores used for the equating process is the set of scores from that particular district’s students on the previous year’s assessment. In this case, 25% of the district’s third-graders had fallen below the critical cut score for meeting the standard on the spring 2007 AIMS assessment, and so for the 2007-08 3rd grade math benchmark assessments, the cut score for Meets the Standard was set at the 25th percentile. The same approach was used for the other two cut scores (Approaches and Exceeds), but for the purposes of this discussion we are only concerned with the cut score for Meets, which is essentially the pass/fail cut score.
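
The percentile-matching step can be sketched in a few lines. This is a simplified illustration, not ATI's implementation: full equipercentile equating as described by Kolen & Brennan (2004) works with continuous percentile ranks and often applies smoothing, both of which are omitted here. The score lists and function names are made up for the example.

```python
def proportion_not_meeting(statewide_scores, meets_cut):
    """Share of last year's students who scored below the statewide Meets cut."""
    below = sum(1 for score in statewide_scores if score < meets_cut)
    return below / len(statewide_scores)


def benchmark_cut(benchmark_scores, proportion_below):
    """Benchmark score at the matching percentile of the benchmark distribution."""
    ordered = sorted(benchmark_scores)
    index = round(proportion_below * len(ordered))
    return ordered[min(index, len(ordered) - 1)]


# Hypothetical data: 5 of 20 students fell below a statewide cut of 400 last year.
last_year = [350] * 5 + [450] * 15
p = proportion_not_meeting(last_year, 400)  # 0.25, i.e. the 25th percentile
cut = benchmark_cut(list(range(1, 41)), p)  # 40 made-up benchmark scores
print(p, cut)                               # prints: 0.25 11
```

With the cut at 11, exactly 10 of the 40 hypothetical benchmark scores (25%) fall below it, mirroring last year's statewide proportion.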

Once the cut scores for benchmark assessments are established, they can be used to estimate each student’s degree of risk of not meeting the standard on the next statewide assessment. As stated in my original blog post on this topic, we have found that observing a student’s pattern of performance across multiple benchmark assessments yields more accurate forecasts of likely performance on the statewide assessment than looking at the student’s score on one assessment in isolation. Classification depends on whether the student scored above or below the cut score for Meets the Standard on each benchmark assessment. If the student scored above the cut score for Meets on all three, then she is said to be On Course for demonstrating mastery on the statewide assessment. A student who scores below the cut score on all three assessments is classified as being at High Risk of not demonstrating mastery. Scoring above the cut on two out of three assessments earns the classification of Low Risk, and scoring above it on only one out of three earns the classification of Moderate Risk.
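
The classification rule itself is simple enough to state as code. In this sketch the function name and score values are hypothetical, and a score equal to the cut is assumed to count as meeting it, since the post does not say how ties are handled.

```python
RISK_LEVELS = {3: "On Course", 2: "Low Risk", 1: "Moderate Risk", 0: "High Risk"}


def risk_level(scores, meets_cuts):
    """Classify a student from three benchmark scores and the three Meets cut scores."""
    if len(scores) != 3 or len(meets_cuts) != 3:
        raise ValueError("expected exactly three benchmark assessments")
    # Count benchmarks at or above the Meets cut (the tie rule is an assumption).
    passed = sum(score >= cut for score, cut in zip(scores, meets_cuts))
    return RISK_LEVELS[passed]


cuts = [25, 30, 35]
print(risk_level([28, 33, 40], cuts))  # On Course: above the cut on all three
print(risk_level([20, 33, 40], cuts))  # Low Risk: above the cut on two of three
```

Because only the count of benchmarks passed matters, any two passes out of three lead to Low Risk, regardless of which two.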

In Galileo K-12 Online, the reports that indicate student risk levels are linked directly to instructional materials and other tools to support intervention efforts with students who are at risk. This support for intervention efforts is the primary purpose of the risk classification scheme. But it is important to demonstrate that the classification scheme is accurate, which brings us to the data summary that Gerardo asked about.

The table below presents the data for the example I’ve been working through here.


The data are from a particular district’s 2007-08 benchmark assessments in 3rd grade math. The panel on the left shows the different possible patterns of performance on the series of three benchmark assessments: the first row represents students who scored above the cut score on all three benchmarks, and so on. The next column indicates the Risk Level classification for each pattern of performance. Note that scoring above the cut score on two out of three benchmark assessments leads to the same Risk Level classification, regardless of which two assessments were passed by the student. The number of students who showed each pattern of performance is indicated, as is the number of students in each pattern who did and did not demonstrate mastery on the AIMS assessment. For example, there were 238 students who scored above the cut score for Meets on all three benchmark assessments. Of these students, 234, or 98%, also met the standard when they took the AIMS assessment at the end of the year. The percent who met the standard in AIMS for each of the other risk groups was calculated in a similar manner. For the Low Risk group, it was simply a matter of adding up the number of students who passed the AIMS assessment (15+5+26), dividing by the total number of students in that risk group (19+9+34) and then multiplying by 100 to get 74%.
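
The Percent Met Standard calculation described above can be written out directly. The counts below are the ones quoted in this post; the function name is illustrative.

```python
def percent_met(patterns):
    """Percent of a risk group meeting the standard, pooling its performance patterns.

    `patterns` is a list of (number_met, number_of_students) pairs, one per pattern.
    """
    met = sum(number_met for number_met, _ in patterns)
    students = sum(total for _, total in patterns)
    return 100 * met / students


on_course = [(234, 238)]                 # above the cut on all three benchmarks
low_risk = [(15, 19), (5, 9), (26, 34)]  # above the cut on two of three
print(round(percent_met(on_course)), round(percent_met(low_risk)))  # prints: 98 74
```

Pooling the counts before dividing (46 of 62) weights each pattern by its number of students, which is what the calculation in the text does.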

The Percent Met Standard column in the table presented here corresponds to the data in the table at the end of my previous post (“Forecasting Risk and Making Predictions about AMOs”). In that table, I presented averages for each Risk Group that were based on data from 7 school districts in a pilot investigation. Those averages are collapsed across all of those districts and across all grade levels and content areas. We plan to investigate this further, and I will keep the readers of this blog posted on any developments.

The final column in the table indicates the accuracy of forecasting performance on AIMS for each of the Risk Groups. For the On Course and Low Risk groups, the prediction is that they will most likely meet the standard on AIMS. For these groups the calculation of the percent accuracy is the same as the calculation of the percent who Met the Standard in the previous column. For the other two groups, the prediction is that they will most likely NOT meet the standard on AIMS, and accuracy here refers to the percent of students who, as predicted, did not meet the standard on AIMS. For the High Risk group, accuracy was at 87% because 13% of these students met the standard on AIMS in spite of their failure to do so on any of the benchmark assessments. Even more interesting is the Moderate Risk group, for which accuracy was only 59% because 41% passed AIMS in spite of our prediction. At ATI, we actually like to be less accurate in these two categories. If our prediction is wrong with these students, it suggests that they and their teachers worked very hard, and they managed to pass the statewide assessment against the odds. We hope that the Galileo K-12 Reports and instructional materials were a part of that effort. That’s what it’s all about.
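
The accuracy rule can be summarized the same way: for groups predicted to meet the standard, accuracy is the percent who met it; for groups predicted not to meet it, accuracy is the percent who did not. In this sketch the High Risk and Moderate Risk inputs are expressed per 100 students, because only percentages are reported above for those groups. The function and table names are illustrative.

```python
PREDICTED_TO_MEET = {
    "On Course": True,
    "Low Risk": True,
    "Moderate Risk": False,
    "High Risk": False,
}


def forecast_accuracy(risk_level, number_met, number_of_students):
    """Percent of a risk group whose AIMS outcome matched the forecast."""
    percent_met = 100 * number_met / number_of_students
    if PREDICTED_TO_MEET[risk_level]:
        return percent_met
    return 100 - percent_met


print(round(forecast_accuracy("On Course", 234, 238)))     # counts quoted above: 98
print(round(forecast_accuracy("High Risk", 13, 100)))      # 13% met: 87
print(round(forecast_accuracy("Moderate Risk", 41, 100)))  # 41% met: 59
```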

Reference

Kolen, M.J. & Brennan, R.L. (2004). Test Equating, Scaling, and Linking: Methods and Practices (2nd ed.). New York: Springer.