Monday, April 20, 2015

How to Use Item Parameters to Make Decisions During Test Review

Assessment Technology’s use of Item Response Theory (IRT) provides clients with rich information about items. This information can be available for items written by district professionals as well as those written by ATI item writers.

Explanations of these data points are provided below.

Parameter Definitions
Understanding IRT Item Parameters
The best way to understand what item parameters refer to is to look at an Item Characteristic Curve. On an item characteristic curve, which presents the data for one specific item, student ability (based on performance on the assessment as a whole) is plotted on the horizontal axis, with a mean of 0 and a standard deviation of 1. The probability of answering the item correctly is plotted on the vertical axis. Typically, the probability of answering correctly is relatively low for low-ability students and relatively high for students of higher ability.
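To make the curve concrete, here is a minimal sketch of how an item characteristic curve can be computed under the standard three-parameter logistic (3PL) model. The function name and the sample parameter values are illustrative only, and some programs also include a scaling constant of about 1.7, which is omitted here.

```python
import math

def icc_probability(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta -- student ability (mean 0, SD 1 on the ability scale)
    a     -- discrimination (steepness of the curve)
    b     -- difficulty / location (point on the ability scale)
    c     -- guessing (lower asymptote of the curve)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# The probability of a correct answer rises with ability:
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(icc_probability(theta, a=1.0, b=0.0, c=0.20), 3))
```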


Difficulty Parameter
The first example (Item 4) is a great item as far as the parameters go. The b-value (difficulty) for that item was 0.689, which is a bit on the difficult side, but not too bad. The important point here is that b-values (item difficulty) are on the same scale as student ability. What this example tells us is that students at or above 0.689 standard deviations above the mean are likely to get the answer correct, while students below that point on the ability scale are more likely to answer incorrectly. The b-parameter is also known as the location parameter because it locates the point on the ability scale where students start demonstrating mastery of the concept.

 
Figure 1
Item 4 with a difficult b-value.


Tip: This parameter should have a wide range, generally between -3 and +3, across a test.
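As a rough check of the location idea, the sketch below evaluates the 3PL curve for parameters like Item 4's (a = 1.459, b = 0.689); the guessing value is assumed for illustration, since it is not quoted above. Near theta = b the probability of a correct answer passes the halfway point between the guessing floor and certainty.

```python
import math

def icc_probability(theta, a, b, c):
    # 3PL item characteristic curve (no 1.7 scaling constant)
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

a, b, c = 1.459, 0.689, 0.20   # c is an assumed guessing value for illustration

# Just below, at, and just above the item's location:
for theta in (b - 1.0, b, b + 1.0):
    p = icc_probability(theta, a, b, c)
    print(f"theta = {theta:+.3f}  P(correct) = {p:.3f}")

# At theta = b the probability is exactly (1 + c) / 2, i.e. halfway
# between the guessing floor c and 1.0.
```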

Discrimination Parameter
The a-value (discrimination) refers to how well the item discriminates between different ability levels. It corresponds to how steep the rise is in the curve showing the probability of answering correctly. Ideally, there is a nice, steep rise in the probability of answering correctly, like the one for Item 4. That indicates a dramatic change, pinpointed within a very narrow range of the ability scale, in how likely it is that a student has mastered the concept. You can be pretty confident that students above 0.689 standard deviations above the mean “get it” and that students below that point generally don’t. The discrimination parameter for Item 4 is 1.459.

The next example, Item 5, shows an item that doesn’t discriminate quite as well as Item 4. The a-value on that one is 0.53. It’s also a pretty easy item, with a b-value of -1.07. So, on this one, most students are likely to get it correct, unless they’re more than one standard deviation below the mean of the ability scale.
 
Figure 2
Item 5 with a lower b-value.


It’s not necessarily the case that an easy item automatically has poor discrimination. The final example, Item 8, is an easy item that discriminates very well. Although students at most ability levels are likely to select the correct answer, there is still a dramatic increase in likelihood of answering correctly within a relatively narrow range of the ability scale.
 

Figure 3
Item 8 is an easy item that discriminates well.


Tip:  This parameter should be near 1 or above.
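One way to see what the a-value does is to compare how quickly two curves rise around their own locations. The sketch below contrasts parameters like Item 4's and Item 5's (the guessing values are assumed); the more discriminating item changes much more over the same one-SD window around its b-value.

```python
import math

def icc_probability(theta, a, b, c):
    # 3PL item characteristic curve
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

items = {
    "Item 4 (a = 1.459)": dict(a=1.459, b=0.689, c=0.20),  # c assumed
    "Item 5 (a = 0.53)":  dict(a=0.53,  b=-1.07, c=0.20),  # c assumed
}

# Change in P(correct) across half a standard deviation either side of b:
for name, p in items.items():
    low  = icc_probability(p["b"] - 0.5, **p)
    high = icc_probability(p["b"] + 0.5, **p)
    print(f"{name}: rise of {high - low:.3f} across one SD around b")
```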

Guessing Parameter
The guessing parameter (the c-value) is the probability of getting the item correct by just guessing. It defines the lower limit of the item characteristic curve. For a multiple choice item with four answer choices, the guessing parameter should be around 0.25 or, preferably, a bit lower.


Tip:  This parameter should be .25 or below for a four alternative multiple-choice test item.
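The c-value is easiest to see at the far left of the curve: for a very low-ability student, the 3PL probability flattens out at roughly c. A minimal sketch, with assumed values for a four-choice item:

```python
import math

def icc_probability(theta, a, b, c):
    # 3PL item characteristic curve
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

a, b, c = 1.0, 0.0, 0.25   # assumed values for a four-choice item

# Far below the item's difficulty the curve settles near the guessing floor:
for theta in (-5.0, -4.0, -3.0):
    print(theta, round(icc_probability(theta, a, b, c), 3))
# Each probability stays close to 0.25 -- the chance of guessing correctly
# from among four answer choices.
```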

As the Director of Educational Management Services, I have spent the past nine years working with teachers and reviewers to create assessments using the information provided by IRT. I have found that this information, although important, is not the only consideration I use when I complete a test review. I find that the best use of item parameters is to be informed about what the data means while also using knowledge of a district’s students, teachers, and curriculum to pick the items best suited to a specific population’s needs.

How do I use item parameters to inform my decisions during test review? I believe that the best use of this data is to inform the opinions and choices of a test reviewer. Let me give you an example.

Suppose I receive a comment from an initial review saying that an item is too difficult for students. I check the b-value provided on the Test Review page. If the item’s difficulty is 3.00 or above, the reviewer’s intuition about the item is confirmed. I then look at the b-values of the other items on the assessment. If I find that there are already other items with high b-values on this test, I may decide to replace the item with an easier one.

On the other hand, if I check the b-value and find that the item has a b-value closer to 1.00, I know that other students have handled this item without a lot of difficulty. In this case, I may decide to leave the item on the assessment and see how students do on this item.
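To make that workflow concrete, here is a hypothetical sketch of the rule of thumb described above. The function name, the flag, and the cutoffs are my own illustrative labels, not part of ATI’s Test Review page.

```python
def review_difficulty(b_value, reviewer_says_too_hard, other_b_values,
                      hard_cutoff=3.00, comfortable_cutoff=1.00):
    """Hypothetical rule of thumb for acting on a 'too difficult' comment.

    b_value                -- difficulty of the flagged item
    reviewer_says_too_hard -- initial reviewer's comment
    other_b_values         -- difficulties of the other items on the test
    """
    if not reviewer_says_too_hard:
        return "keep"
    if b_value >= hard_cutoff:
        # The data backs the reviewer up. If the test already carries other
        # hard items, swap this one for an easier item.
        already_hard = sum(1 for b in other_b_values if b >= hard_cutoff)
        return "replace with an easier item" if already_hard else "keep and monitor"
    if b_value <= comfortable_cutoff:
        # Other students have handled the item; leave it on and watch results.
        return "keep and monitor"
    return "discuss with the reviewer"

print(review_difficulty(3.2, True, [0.5, 1.1, 3.4]))   # replace with an easier item
print(review_difficulty(0.9, True, [0.5, 1.1, 3.4]))   # keep and monitor
```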

In other words, I believe that reviewer opinions and data should be used equally to inform decisions during test review.

Karyn White                                                        
Director of Educational Management Services
   
