Tuesday, August 31, 2010

Adaptive Testing

While adaptive testing has been around for a long time, it has recently been gaining in popularity. Many have been looking at it as a magic bullet to meet the data-driven decision-making requirements of the modern accountability era. The approach can greatly shorten the time required to deliver an assessment with an acceptable level of measurement precision. While it is certainly true that adaptive testing has a number of potential benefits that earn it a place in the assessment toolbox, it also has a number of limiting characteristics that have to be considered in selecting it for a particular use.

How does an adaptive test adapt? The approaches that are currently used fall into two basic categories based on whether adaptation decisions are made after every item or after a group of items. In the first case, a cumulative score is calculated each time a student completes an item. The next item is typically selected by determining the item that will provide the optimal level of information regarding the student’s ability. Once a score has been reached that provides an adequate level of measurement precision, the assessment is ended. Typically this endpoint is reached after far fewer questions than would be required to achieve a comparable level of precision on a standard or “linear” test, thus making the approach extremely efficient.
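
For readers who like to see the mechanics, the item by item loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular product: it assumes a two-parameter logistic IRT model, scores the student with an expected a posteriori (EAP) estimate under a standard normal prior, picks the unused item with the most Fisher information at the current estimate, and stops once the posterior standard deviation falls below a hypothetical `target_sd` or a hypothetical `max_items` cap is reached. The function names, the item bank layout, and the default thresholds are all invented for the example.

```python
import math

def p_correct(theta, b, a=1.0):
    """2PL model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, b, a=1.0):
    """Fisher information a 2PL item provides at ability theta."""
    p = p_correct(theta, b, a)
    return a * a * p * (1.0 - p)

# Ability values from -4.0 to 4.0 used to approximate the posterior.
GRID = [g / 10.0 for g in range(-40, 41)]

def estimate_ability(responses):
    """Return (EAP estimate, posterior SD) for [(b, a, correct), ...],
    assuming a standard normal prior on ability."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior
        for b, a, correct in responses:
            p = p_correct(theta, b, a)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_adaptive_test(bank, answer, target_sd=0.4, max_items=30):
    """bank maps item_id -> (difficulty b, discrimination a).
    answer is a callable item_id -> bool that delivers the item.
    target_sd and max_items are illustrative stopping rules."""
    remaining = dict(bank)
    responses, administered = [], []
    theta, sd = 0.0, 1.0  # start at the prior mean and SD
    while remaining and len(administered) < max_items and sd > target_sd:
        # Select the unused item that is most informative right now.
        item_id = max(remaining,
                      key=lambda i: item_information(theta, *remaining[i]))
        b, a = remaining.pop(item_id)
        responses.append((b, a, answer(item_id)))
        administered.append(item_id)
        # Rescore after every item, as in the item by item approach.
        theta, sd = estimate_ability(responses)
    return theta, sd, administered
```

In this sketch the score is recomputed after every response, which is what distinguishes the item by item approach from the block-based approach described next.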

In the second case, decisions are made after administering preconfigured blocks of items rather than on an item by item basis. A typical design might specify a routing test followed by two stages, each of which involves a decision between pre-constructed item blocks optimized for lower, average, and high ability. In this design, students would complete the routing test and then be assigned to the appropriate second stage item block based on their score. After completing the second stage they would again be routed to the appropriate block of items in stage three based on their cumulative score on the routing test and the second stage. After completing the third stage the assessment would be ended and an overall score determined based on all the items completed.
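
The routing logic in this design can be sketched as follows. Again, this is a simplified illustration rather than a definitive implementation: the cut scores (`low_cut`, `high_cut`), the function names, and the use of cumulative proportion correct for routing and for the overall score are all assumptions made for the example. An operational test would more likely route and score on an IRT-based ability estimate.

```python
def route(proportion_correct, low_cut=0.4, high_cut=0.7):
    """Map a cumulative proportion correct to a block level.
    The cut scores are hypothetical."""
    if proportion_correct < low_cut:
        return "low"
    if proportion_correct < high_cut:
        return "average"
    return "high"

def run_multistage_test(routing_block, stage_blocks, answer):
    """routing_block: list of item ids given to every student.
    stage_blocks: one dict per later stage, mapping
        "low" / "average" / "high" to a pre-constructed list of item ids.
    answer: callable item_id -> bool that delivers the item.
    Returns (path taken, total correct, total items administered)."""
    n_administered = len(routing_block)
    n_correct = sum(1 for item in routing_block if answer(item))
    path = []
    for blocks in stage_blocks:
        # The routing decision uses the cumulative score so far.
        level = route(n_correct / n_administered)
        path.append(level)
        block = blocks[level]
        n_correct += sum(1 for item in block if answer(item))
        n_administered += len(block)
    return path, n_correct, n_administered
```

Because every block is assembled in advance, the content each student sees at a given level is known before the test is delivered.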

What are the benefits of adaptive testing? One of the most notable is its efficiency. This is particularly true with the item by item approach. As I mentioned, an adequately precise measure of a student’s ability may typically be achieved with far fewer items than would be required for a typical “linear” test. This efficiency can make the approach far easier to use when large groups of students must be tested in a short time.

What considerations go along with the benefits? Two of the most fundamental are content control and item exposure. Both are controlled to some degree by the multi-stage design, which is not quite as efficient as the item by item design. However, in those cases in which content control and/or item exposure are important considerations, the multi-stage approach may be preferred.

Content control is especially important in standards-based education. While the item by item approach is extremely efficient, the ability to ensure that all students are exposed to content reflecting standards targeted for instruction may be compromised. Johnny, Suzy, and Javier may all sit down together and not be exposed to any of the same items. Suzy may have to multiply fractions while Johnny and Javier don’t. This may not be cause for concern if the test is being used for purposes of screening, but it may limit the utility of the measure if the results are being used for very specific decisions about student instructional needs. For instance, if decisions are to be made about providing instructional coaching on multiplication of fractions for Johnny and Javier, it would likely be most informative if they had actually been asked to respond to the questions that Suzy was given. While Item Response Theory (IRT) may be used to estimate the probability that students have mastered these skills even if they haven’t been given the questions, actual administration of the items can afford the additional benefit of allowing mistakes to be analyzed. The pre-constructed design of the multi-stage approach affords control of this issue at the expense of some efficiency.

Item exposure is particularly important when item security is a concern. The focus of the item selection algorithms on picking the optimal item for a given ability level can mean that the same limited set of items comes to the top of the heap repeatedly. This is particularly true for the item by item approach. This issue can limit the utility of the assessment for repeated use because students are likely to see the same items again and again. In high stakes situations, the door is open for cheating because the items that are likely to show up become predictable. As with the content consideration, the multi-stage approach provides some control over this issue at the expense of efficiency.
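
One simple way to see whether a small set of items is coming to the top of the heap is to compute an exposure rate for each item, meaning the share of test administrations in which it appeared. The sketch below assumes administration records are available as lists of item ids; the function name and data layout are hypothetical.

```python
from collections import Counter

def exposure_rates(administrations, bank_ids):
    """administrations: one list of administered item ids per test session.
    bank_ids: every item id in the bank, including items never used.
    Returns a dict mapping item id -> fraction of sessions that saw it."""
    if not administrations:
        raise ValueError("at least one administration is required")
    n = len(administrations)
    # set() guards against counting an item twice within one session.
    counts = Counter(item for items in administrations for item in set(items))
    return {item: counts[item] / n for item in bank_ids}
```

Items with rates near 1.0 are the ones students are most likely to see again and again, while items with rates near 0.0 are contributing little to the assessment.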

The combination of strengths and unique considerations involved with adaptive testing points to some particular uses for which it would be a strong choice. One of the most notable is screening. The efficiency with which an adaptive test can provide a reliable measure of ability makes it a good choice when the issue at hand is deciding whether a student should enter into a particular instructional group or class. The degree of content control provided by a multi-stage approach could provide more granular information when the decision at hand is whether a child should be given additional help on a specific skill or set of skills.

ATI is currently working on a module to support adaptive testing. The module will support the item by item and multi-stage approaches. Provision of these options will allow districts to take advantage of the strengths provided by adaptive testing when they are designing an assessment program that will best meet their needs.

Tuesday, August 17, 2010

ATI Develops New Technological Capabilities to Assess Literacy in Early Childhood

During the period from early childhood to third grade, children are developing the skills necessary to become successful readers, such as an awareness of how the sounds in spoken language correspond to written language. As children progress to fourth grade and beyond, their ability to successfully read and comprehend text becomes critical for learning in other domains like math, science, and social studies. Unfortunately, although literacy is a major focus of early childhood education, not all children become successful readers. According to the 2009 National Assessment of Educational Progress (NAEP), one in three fourth-graders nationwide cannot read above a basic level.

Over the last decade, childhood literacy has become an important topic at the national and state levels. In 1997 and 2002, Congress convened two national panels (the National Reading Panel and the National Early Literacy Panel) to provide research-based recommendations on how to improve reading achievement in early childhood. The reports issued by these panels are available at www.nationalreadingpanel.org and www.nifl.gov. Individual states are also implementing initiatives related to early childhood literacy. For example, Arizona recently passed a bill that calls upon a statewide task force to provide recommendations for a set of statewide assessments to measure students’ reading abilities in grades one and two. The bill also requires school districts to screen students in preschool through second grade for reading deficiencies, providing an opportunity for early intervention during this critical time period. Along with these evaluations, the bill requires that students reading far below grade level at the end of third grade not be promoted to fourth grade. Retained students must be provided with targeted intervention such as summer school reading instruction, online reading instruction, a different reading teacher, or intensive reading instruction during the next academic year.

To support national and state early childhood literacy initiatives, ATI has been working to develop a set of assessments targeting the critical aspects of reading for grades K through three. By providing valid, reliable, standards-aligned assessments, ATI can assist districts and schools by identifying students with reading deficiencies and suggesting appropriate areas for intervention. ATI also supports targeted interventions and online reading instruction through Instructional Dialogs and other intervention materials that allow students to practice early literacy skills. Assessing the early literacy skills of very young children presents some unique challenges. For example, many standards related to early literacy are not best assessed using a standard text-only, multiple-choice item. In addition, very young children often cannot read even simple text. For these reasons, the development of new technological capabilities and innovative item types has been an important part of ATI’s work.

One new and exciting technological capability developed by ATI for these assessments is the ability to include audio material in items. The audio capability makes it possible to create computer-administered items for children who are not yet able to read. Instead of the teacher reading the item aloud, the child can listen to pre-recorded instructions, questions, and answer choices. This capability enables a more standardized presentation of items and makes the assessment easier to administer. In addition, the audio capability allows the direct assessment of standards that address the awareness or manipulation of the sounds in spoken language (phonemic awareness) without teacher involvement.

Another newly developed ATI technological capability is the ability to present text for a predetermined time period. This capability has been crucial to the development of an innovative item type designed to assess reading fluency (a concept related to reading rate). Previously, assessment options for reading fluency were limited to one-on-one testing and scoring procedures requiring subjective judgment. This new capability represents a major advance in ATI’s coverage of early childhood standards by enabling online assessment of reading fluency and automated scoring.

Two more technological capabilities have been developed by ATI to tailor the assessment process for very young children. First, to keep children engaged and to encourage them to progress through the assessment, visually engaging scenes have been developed that slowly appear piece by piece as a reward after each question is answered. Second, the new assessments and item types have been designed to be compatible with portable tablet computers that employ touch screen technology as well as standard desktop computers. By facilitating the assessment of young children, these new technological capabilities developed by ATI will support districts and schools in implementing literacy assessments and interventions in early childhood.