Pretests and posttests are among the most familiar forms of assessment in education, and interest in their use is rising as the nation becomes increasingly aware of the importance of assessing student academic growth over time. Familiar as they are, the ways in which they can be used and misused are sufficiently unfamiliar to deserve discussion. Determining how best to design and implement pretests and posttests is complex because these assessments can be used effectively in many ways, and each way introduces the possibility of misuse. When misuse occurs, the potential value of the information that these assessments can provide is compromised. The purpose of this blog is to outline the uses of pretests and posttests and to discuss issues of use and misuse that bear on preserving the considerable information value these assessments can offer when they are implemented effectively.
Uses of Pretests and Posttests
As its name implies, in education a pretest is an examination given prior to the onset of instruction. By contrast, a posttest measures proficiency following instruction. Pretests and posttests may serve a number of useful purposes. These include determining student proficiency before and after instruction, measuring student progress during a specified period of instruction, and comparing the performance of different groups of students before and after instruction.
Determining Proficiency Before or After Instruction
A pretest may be administered without a posttest to determine the initial level of proficiency attained by students prior to the beginning of instruction. Information on initial proficiency may be used to guide early instructional planning. For example, initial proficiency may indicate the capabilities that need special emphasis to promote learning during the early part of the school year. A posttest may be administered without a pretest to determine proficiency following instruction. For example, statewide assessments are typically administered toward the end of the school year to determine student proficiency for the year.
The design of pretests and posttests should be informed by the purposes that the assessments are intended to serve. For example, if a pretest is intended to identify enabling skills that are likely to assist students in mastering the content to be covered during the current year, then it should sample skills taught previously that support that new learning. Similarly, if a posttest is intended to provide a broad overview of the capabilities taught during the school year, then it should cover the full range of objectives addressed during that period. For instance, the posttest might be designed to cover the full range of objectives addressed in the state blueprint.
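A simple way to audit this kind of coverage is to compare the set of objectives a draft posttest assesses against the blueprint. The Python sketch below is purely illustrative; the objective identifiers are hypothetical, and a real list would come from the state blueprint documents.

```python
# Hypothetical objective identifiers; a real list would come from the state blueprint.
blueprint = {"NUM.1", "NUM.2", "ALG.1", "ALG.2", "GEO.1", "GEO.2"}
posttest  = {"NUM.1", "NUM.2", "ALG.1", "GEO.1", "GEO.2"}

# Objectives in the blueprint that no posttest item addresses.
missing = blueprint - posttest
if missing:
    print("Objectives not covered by the posttest:", sorted(missing))
else:
    print("Posttest covers the full blueprint.")
```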
Measuring Progress
A pretest accompanied by a posttest can support the measurement of progress from the beginning of an instructional period to the end. For example, teachers may use information on progress during the school year to determine whether proficiency is advancing rapidly enough for students to meet the standard on the upcoming statewide assessment. If the pretest and posttest are to be used to measure progress, it is often useful to place pretest scores on a common scale with the posttest scores. When the assessments are on a common scale, progress can be assessed with posttest items that differ from the items on the pretest. This effectively addresses the problem of teaching to the test: because the item sets for the two tests are different, it cannot be claimed that students improved merely because they memorized the answers to the specific questions on the pretest. Item Response Theory (IRT) provides one of a number of possible approaches for placing pretests and posttests on a common scale. When IRT is used, the scaling process can be integrated into the task of estimating item parameters, which reduces computing time and complexity. For these reasons, ATI uses IRT to place scores from pretests, posttests, and other forms of assessment on a common scale.
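To make the common-scale idea concrete, here is a minimal sketch using the Rasch model, the simplest IRT model. It assumes the item difficulties for both tests have already been calibrated onto one linked scale (the linking step itself is omitted), and all numbers are hypothetical; the specific scaling procedures used in Galileo are not shown here.

```python
import math

def rasch_ability(responses, difficulties, max_iter=50, tol=1e-6):
    """Maximum-likelihood ability estimate under the Rasch (1PL) model."""
    theta = 0.0  # starting ability, in logits
    for _ in range(max_iter):
        # Model probability of a correct response to each item at theta.
        p = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        grad = sum(x - pi for x, pi in zip(responses, p))   # dlogL/dtheta
        info = sum(pi * (1.0 - pi) for pi in p)             # test information
        step = grad / info                                  # Newton-Raphson step
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical item difficulties, already calibrated onto one linked scale.
pre_b  = [-1.0, -0.5, 0.0, 0.5]
post_b = [-0.5, 0.0, 0.5, 1.0, 1.5]   # a harder, non-overlapping item set

theta_pre  = rasch_ability([1, 1, 0, 0], pre_b)
theta_post = rasch_ability([1, 1, 1, 0, 0], post_b)
print(f"estimated growth: {theta_post - theta_pre:+.2f} logits")
```

Because both ability estimates are expressed in the same logit metric, their difference is an estimate of growth even though the student never saw the same item twice.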
Comparing Groups
A pretest may be given to support the adjustments needed to compare groups on subsequent instructional outcomes measured by a posttest. Group comparisons may be implemented for a number of reasons. For example, they are generally required in experimental studies: in the prototypical experiment, students are assigned at random to different conditions, and learning outcomes for the conditions are then compared. Group comparisons may also occur when there is an interest in identifying highly successful groups or groups needing additional resources, such as highly successful classes or schools. They may be made to determine the extent to which instruction is effective in meeting the needs of NCLB subgroups. Finally, comparisons involving students assigned to different teachers or administrators may be made when a district implements a performance-based pay initiative in which student outcomes play a role in determining staff compensation.
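One common way to use the pretest for such adjustments is analysis of covariance (ANCOVA): regress the posttest score on the pretest score plus a group indicator, so that the group coefficient reflects the difference between groups after accounting for where students started. The sketch below, with hypothetical scale scores, illustrates the idea; it is not the specific adjustment method of any particular program.

```python
import numpy as np

# Hypothetical scale scores for two groups of students.
pre_a  = np.array([430., 445., 428., 451., 440.])
post_a = np.array([460., 478., 455., 483., 470.])
pre_b  = np.array([433., 447., 429., 452., 442.])
post_b = np.array([472., 490., 468., 495., 484.])

pre   = np.concatenate([pre_a, pre_b])
post  = np.concatenate([post_a, post_b])
group = np.concatenate([np.zeros(len(pre_a)), np.ones(len(pre_b))])

# ANCOVA-style adjustment: regress posttest on pretest and a group indicator.
X = np.column_stack([np.ones_like(pre), pre, group])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
print(f"pretest-adjusted group difference: {beta[2]:+.1f} scale-score points")
```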
If a pretest and posttest are used to support comparisons among groups, a number of factors related to test design, test scheduling, and test security must be considered. Central design concerns involve content coverage and test difficulty. Both the pretest and the posttest should cover the content areas targeted for instruction. For example, if a particular set of objectives is covered on the pretest, then those objectives should also be addressed on the posttest. Targeting content in this way increases the likelihood that the assessments will be sensitive to the effects of instruction. Both the pretest and the posttest should include a broad range of items varying in difficulty. Moreover, when the posttest follows the pretest by several months, the overall difficulty of the posttest generally should be higher than that of the pretest. Variation in difficulty increases the likelihood that instructional effects will be detected. For example, if both the pretest and the posttest are very easy, the likelihood of detecting effects will be reduced. In the extreme case in which all students receive a perfect score on each test, there will be no difference among the groups being compared.
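A quick simulation with hypothetical numbers shows how an overly easy test masks real differences: when scores pile up at the maximum, the observed gap between groups shrinks even though the true gap is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" posttest standings; group B genuinely outperforms group A.
true_a = rng.normal(70, 10, 1000)
true_b = rng.normal(80, 10, 1000)

for ceiling in (100, 85):                  # an appropriately hard test vs. a very easy one
    obs_a = np.minimum(true_a, ceiling)    # observed scores cannot exceed the maximum
    obs_b = np.minimum(true_b, ceiling)
    print(f"max score {ceiling}: observed gap = {obs_b.mean() - obs_a.mean():.1f}")
```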
When group comparisons are of interest, care should be taken to ensure that the pretest is administered at approximately the same time in all groups. Likewise, the posttest should be administered at approximately the same time in all groups. Time on task affects the amount learned, so when the time between assessments varies among groups, comparisons may be spuriously affected by temporal factors.
Test security assumes special importance when group comparisons are made. Security is particularly important when comparisons involve high-stakes decisions. Security requires controlled access to tests and test items. Secure tests generally should not be accessible either before or after the time during which the assessment is scheduled. Galileo K-12 Online includes security features that restrict access to items on tests requiring high levels of security. Security imposes a number of requirements related to the handling of tests. When an assessment is administered online, the testing window should be as brief as possible. After the window is closed, students who have completed the test should not have the opportunity to log back into the testing environment and change their answers. Special provisions must be made for students who have missed the initial testing window and are taking the test during a subsequent period. When a test is administered offline, testing materials should be printed as close to the scheduled period for taking the assessment as possible. Materials available prior to the time scheduled for administration should be stored in a secure location. After testing, materials should either be stored in a secure location or destroyed.
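The access rules described above can be summarized in a small gatekeeping routine. The sketch below is purely illustrative: the function, window times, and make-up flag are all hypothetical and are not Galileo's actual security implementation.

```python
from datetime import datetime

# Hypothetical testing window; Galileo's actual security controls are not shown here.
WINDOW_OPEN  = datetime(2010, 5, 3, 8, 0)
WINDOW_CLOSE = datetime(2010, 5, 3, 15, 0)

def may_access_test(now, already_submitted, in_approved_makeup=False):
    """Gate access to a secure online test."""
    if already_submitted:
        # Once a test is submitted, no logging back in to change answers.
        return False
    if WINDOW_OPEN <= now <= WINDOW_CLOSE:
        return True
    # Outside the window, access only during an approved, proctored make-up session.
    return in_approved_makeup

print(may_access_test(datetime(2010, 5, 3, 10, 0), already_submitted=False))  # True
print(may_access_test(datetime(2010, 5, 4, 10, 0), already_submitted=False))  # False
```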
Misuse of Pretests and Posttests
The misuse of tests generally occurs when tests are used for purposes other than those for which they are intended. This is true for pretests and posttests as well as for other kinds of assessments. As the previous discussion has shown, pretests and posttests are designed to serve a limited number of specific purposes. When these assessments are used for other purposes, there is a risk that the value of the information that they provide will be compromised. For example, if a pretest or posttest were to be used as a customized benchmark test, assessment results could be misleading. Conversely, if a customized benchmark assessment were used as a pretest or posttest, the credibility of assessment results could be compromised.
The central purpose of customized benchmark tests is to inform instruction. This purpose carries with it implications for test design and implementation that are generally not compatible with the purposes served by pretests and posttests. Benchmark assessments are interim assessments occurring during the school year following specified periods of instruction. Benchmark assessments provide a measure of what has been taught and an indication of what needs to be taught to promote further learning. Pretests and posttests are generally not well suited to serve as benchmarks because they often include constraints that limit their use in informing instruction. For instance, it is useful for teachers to analyze performance on benchmark assessment items to determine the kinds of mistakes made by students. This information is subsequently used to guide instruction. Pretests and posttests often call for high levels of security that curtail the analysis of specific items for purposes of informing instruction.
The temptation to use pretests and posttests as benchmarks may stem from the laudable motive of reducing the amount of time and resources devoted to testing students. Reductions in testing time increase the time available for instruction and reduce the costs associated with assessment. These are desirable outcomes. Unfortunately, there is often a heavy cost associated with using assessments for purposes for which they are not intended: the validity of the assessments is compromised, and when validity is compromised, results can be misleading. Pretests and posttests are valuable assessment tools. When they are appropriately designed, they can make a highly significant contribution to the success of an assessment program.
Wednesday, May 5, 2010
What Constitutes an Effective, Research-Based Instructional Improvement System?
The question above is the million-dollar, or perhaps more appropriately the multi-million-dollar, question inherent in responding effectively to one of the key areas of education reform called for in the Race to the Top (RTTT) Application for Phase 2 Funding (CFDA Number: 84.395A). The fact of the matter is that local, technology-based instructional improvement systems that measure student growth and success, and that inform teachers and principals about how to improve instruction, are rapidly becoming an integral part of the educational landscape and a growing necessity for school districts seeking to meet local, state, and federal requirements for elevating student achievement.
According to the Phase 2 RTTT application, “instructional improvement systems are technology-based tools and other strategies that provide teachers, principals, and administrators with meaningful support and actionable data to systemically manage continuous instructional improvement, including such activities as: instructional planning; gathering information through formative assessments, interim assessments, summative assessments, and looking at student work and other student data; analyzing information with the support of rapid-time reporting, using this information to inform decisions on appropriate next instructional steps, and evaluating the effectiveness of the actions taken. Such systems promote collaborative problem-solving and action planning; they may also integrate instructional data with student level data such as attendance, discipline, grades, credit accumulation, and student survey results to provide early warning indicators of a student’s risk of educational failure.”
As part of our efforts to support states and districts in addressing the challenges and opportunities inherent in the selection and implementation of an instructional improvement system, we have prepared a resource document. The resource document is located at http://www.ati-online.com/pdfs/researchK12/Meeting_IIS_Requirements.pdf. The material within the document is intended to assist local education agencies (LEAs) and State Departments of Education (SDEs) currently using or seeking to use an instructional improvement system to clearly define for themselves and in grant writing the attributes of the system and the contributions the system can make to LEA instructional improvement efforts.
We encourage you to take a look at the resource document; see whether the information it contains is addressed by your current system; and use the document as a guide in your ongoing efforts to improve the impact of your education initiatives on student learning. If you have questions or ideas that you would like to share, please do so in this blog and/or contact me at Assessment Technology Incorporated.
Jason K. Feld, Ph.D.
1-800-367-4762 (Ext 121)
Jason@ati-online.com