ATI Town Hall Blog: English Language Arts Test Design

English tests, by their nature, require students to do a lot of reading. When designing these tests, how much reading is reasonable? How can we best assess students’ reading comprehension abilities without creating tests that are too long and have too many texts?

The answer comes at the very beginning of the process, in assessment planning. District pacing guides are meant to ensure that all classes are being instructed and students are making progress toward mastering essential standards. In designing a benchmark assessment, districts often use their pacing guides to plan assessments, but those pacing guides, while useful in tracking progress toward state standards mastery, rarely reflect the full scope of instruction that is occurring in the classroom during a benchmark period.

Do English teachers have their students read an entire story only to teach the idea of a main character? Of course not. They teach about plot, the author’s use of language, the context and setting of the story, how it relates to the author’s life experiences, and all of the other elements of literature that compose a novel, short story, poem, or dramatic work. However, pacing guides may only emphasize one of these standards in a particular assessment period.

The pacing guide approach to assessment design has the benefit of matching the district’s plans for instructing and assessing standards. However, it tends to narrow the focus of instruction so much that the assessment requires a large number of texts to measure very specific aspects of each text, while leaving students’ holistic comprehension of a text unmeasured. When the purpose of assessment is to measure student progress, this seems like a missed opportunity.

Measurement reliability is best served by long tests: other things being equal, the longer the test, the greater the reliability. Given our purposes and the realities of the class time available for assessment, we recommend 35-50 items per test.
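The link between test length and reliability can be illustrated with the Spearman-Brown formula, a standard psychometric result that this post does not itself cite. The sketch below is an illustration only: the function name, the 25-item starting point, and the starting reliability of 0.70 are assumptions chosen for the example, not figures from any ATI assessment, and the formula assumes that any added items are comparable in quality to the existing ones.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after changing test length by length_factor.

    length_factor is new length divided by old length (2.0 = twice as long).
    Assumes the added items are parallel to the existing ones.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical starting point: a 25-item test with reliability 0.70.
base_items = 25
base_reliability = 0.70

for items in (25, 35, 50):
    predicted = spearman_brown(base_reliability, items / base_items)
    print(f"{items} items: predicted reliability {predicted:.2f}")
```

Under these assumptions the predicted reliability rises as items are added, but each additional item contributes a little less than the one before it, which is one reason a practical range such as 35-50 items can be a sensible compromise.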

When a pacing guide emphasizes a few core standards, it helps to clarify expectations for everyone, but when a test measures only a narrow range of standards, many more questions are required per standard. When these standards are spread across multiple genres, or focused on comparing or synthesizing texts, we start to see unintended and undesirable characteristics, namely the inclusion of too many texts on an assessment.

For example, to address reliability with a 35-50 item test, a pacing guide of five learning standards would require seven to ten items per standard. That doesn’t seem like many items to fully measure a standard, until we consider a concept like main character. How many “main” characters will a short story have? Standards like this often require a new text for each question. Seven to ten texts seems like an awful lot to read to assess whether a student understands the concept of a main character.
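The arithmetic above can be sketched in a few lines. The function names are hypothetical, and the worst case shown assumes that every item on a single-instance standard, such as main character, needs its own text.

```python
def items_per_standard(total_items: int, standards: int) -> int:
    """Items available per standard when items are split evenly."""
    return total_items // standards


def texts_needed(items: int, items_per_text: int = 1) -> int:
    """Texts required if each text can support items_per_text items."""
    return -(-items // items_per_text)  # ceiling division


# The 35-50 item range and five standards come from the example above.
for total in (35, 50):
    per_standard = items_per_standard(total, standards=5)
    texts = texts_needed(per_standard)  # one text per item (worst case)
    print(f"{total} items: {per_standard} items per standard, "
          f"up to {texts} texts for a one-item-per-text standard")
```

Raising `items_per_text` shows how quickly the reading load falls once a single text can support several questions.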

How about comparing and contrasting two texts? When the student is asked to compare and contrast across genres, or two different authors’ explorations of a similar theme, it’s an opportunity to see students demonstrate analytical skills and synthesis of information, the higher-order thinking skills we want them to develop through reading. One or two questions that compare two texts, plus five to ten more questions that require in-depth analysis of each text, can better measure these higher-order or holistic skills than forcing students to read ten different texts to answer five questions focused only on comparison.
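To make the contrast concrete, the two designs described above can be compared by counting texts and items. This is a rough sketch: the class and field names are hypothetical, and the paired-text design takes the upper figures from the paragraph above, read here as two comparison questions plus ten in-depth questions in total.

```python
from dataclasses import dataclass


@dataclass
class AssessmentDesign:
    """A hypothetical summary of one way to build the reading section."""
    name: str
    texts: int
    items: int

    @property
    def items_per_text(self) -> float:
        # How much measurement each text the student reads provides.
        return self.items / self.texts


# Five comparison-only questions, each with its own pair of texts.
comparison_only = AssessmentDesign("comparison only", texts=10, items=5)

# One pair of texts: two comparison questions plus ten in-depth questions.
paired_in_depth = AssessmentDesign("paired texts in depth", texts=2, items=2 + 10)

for design in (comparison_only, paired_in_depth):
    print(f"{design.name}: {design.texts} texts, {design.items} items, "
          f"{design.items_per_text:.1f} items per text")
```

Items per text is only a crude indicator, but it shows how the second design gets more measurement out of far less reading.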

Some pacing guides strive for balance, incorporating elements of fiction and nonfiction in each benchmark period. This is beneficial for students and teachers, allowing them to explore different forms of reading and writing, often in relation to each other. Complications in assessment occur when too few standards or standards without any overlap of genres are implemented on the same assessment. Measuring a single standard five to eight times on the morals of folklore and mythology with the rest of the test addressing nonfiction standards will result in a number of folklore texts with very few items per text and no possibility of overlap with the nonfiction standards.

So how do we address these concerns? Assessment Technology Incorporated has worked with a number of districts to develop a text packaging system that allows districts to still emphasize the essential standards that they want reflected in each benchmark, but to reduce the number of texts that appear on the tests. This is accomplished by including other learning standards that the teachers are instructing that may not appear on that benchmark period’s pacing guide but are an important component in measuring students’ overall reading comprehension. The package also balances the number of items per standard based on the occurrence of the skill in everyday reading. For example, the main character standard would get one item, compare and contrast maybe a couple, and elements of literature might get three or four, a reasonable distribution of the types of information students would see in a normal short work of fiction.

Another approach is to focus on one or two specific genres in an assessment rather than trying to address poetry, short story, persuasive text, informational text, and dramatic works in one assessment. By testing by genre, or by limiting how often we repeat performance objectives (POs) that involve compare-and-contrast, cross-genre, cross-cultural, and single-instance items, we can reduce the number of texts.

The table below shows a test created without the text packaging approach for 2010-11, and the same test adjusted by the methods outlined above. Note that with text packaging the number of words that students had to read was cut nearly in half, while the number of items on the two tests remained nearly the same. There are many benefits to fully utilizing texts as we are doing with the text packaging approach: fewer texts for the students to read on the assessment, a more thorough demonstration of understanding of each text, less repetitive questioning, and, because each test uses fewer texts, more texts to choose from in later assessments.

*GO is a graphic organizer. It does not have a word count.
**Note that this is the total number of items on the test, not the sum of the column, because some items have more than one text attached.

If you are interested in exploring the text packaging options available in Galileo K-12 Online, please contact your Field Service coordinator or Karyn White in Educational Management Services to learn more.

Thursday, October 27, 2011
