ATI Town Hall Blog: Generalizing Experimental Findings across Groups

One of the comments following Chuck Brainerd’s Forum presentation on experimental research in standards-based education raised a number of interesting questions about generalizing experimental findings across groups. The fundamental question is whether findings obtained in an experiment with one sample of students will hold when the treatment is used in an intervention with another sample of students. A related question is whether a large-scale experiment involving several hundred students is more likely to yield findings that generalize across groups than a small-scale study.

These are complicated questions. To begin the discussion, let’s make some simplifying assumptions. Assume that the experiment involves one experimental treatment group and one control group, and that the results reveal a significant difference favoring the treatment group. Assume further that the experimental treatment can be applied without modification as the instructional procedure in the intervention. Under these conditions, if the two samples of students were drawn at random from the same population, we would expect the level of learning in the intervention group to be similar to that observed for the treatment group in the experiment.

Given an appropriate sampling procedure, a large-scale experiment would have no advantage over a small-scale study with respect to generalizability. In fact, the small-scale study would have an advantage, because random selection from the population would be less costly and less time-consuming for a small sample than for a large one. As Chuck pointed out, the only advantage associated with large numbers is an increase in statistical power. Because the power curve approaches its asymptote (a power of 1.0) quickly, the benefits of increased sample size are soon outweighed by the added cost and time of large-scale studies.
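To make the power-curve point concrete, here is a minimal sketch that approximates the power of a two-group comparison as the number of students per group grows. It assumes a two-sided, two-sample z-test (a normal approximation, which slightly overstates power relative to an exact t-test when groups are small), a hypothetical effect size of 0.5 standard deviations, and a significance level of 0.05; none of these values comes from Chuck's presentation, and the function name `approx_power` is illustrative.

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test for a
    standardized mean difference d with n_per_group students per group.
    This is a normal approximation, not an exact t-test calculation."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic: d divided by the standard error of the
    # difference between two group means in SD units, sqrt(2 / n).
    ncp = d * (n_per_group / 2) ** 0.5
    # Probability of rejecting in either tail.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


if __name__ == "__main__":
    # Hypothetical medium effect size (0.5 SD); not taken from any study.
    for n in (10, 20, 40, 64, 100, 200, 400):
        print(f"n per group = {n:4d}  power = {approx_power(0.5, n):.3f}")
```

Running it shows power climbing quickly over the first few dozen students per group and then flattening near 1.0, so doubling a sample that already has high power buys very little additional power.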

What can we conclude in the absence of an appropriate sampling procedure? It is no secret that participants in experiments are rarely, if ever, selected at random from a defined population. Experimental studies are generally carried out using samples of convenience. In the best of circumstances, participants are assigned at random to experimental and control conditions. However, the step of selecting all participants at random from a defined population is almost always omitted. When this step is omitted, the basis for assuming that findings generalize across groups is compromised. Moreover, there is no reason to assume that the degree of compromise is mitigated by increasing sample size: a larger sample drawn in the same way reduces random error, but not the systematic difference between the sample and the population.
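The claim that a larger sample does not repair a convenience sample can be illustrated with a small simulation. Everything in it is hypothetical: a population made of two equally sized subgroups, A and B, with different true treatment effects, and a "convenience" sample drawn entirely from subgroup A. Random assignment within the sample is kept, so each simulated experiment is internally sound; only random selection from the population is missing. The names `run_experiment` and `mean_estimate` and all effect values are illustrative assumptions.

```python
import random
from statistics import fmean

# Hypothetical population: two equally sized subgroups whose true treatment
# effects differ (in standard-deviation units). Values are illustrative only.
EFFECTS = {"A": 0.8, "B": 0.2}
POPULATION_EFFECT = fmean(EFFECTS.values())  # 0.5 when subgroups are equal in size


def run_experiment(n: int, subgroups: list[str], rng: random.Random) -> float:
    """Sample n students uniformly from `subgroups`, randomly assign half to
    treatment, and return the treatment-minus-control difference in means."""
    students = [rng.choice(subgroups) for _ in range(n)]
    rng.shuffle(students)  # random assignment: first half treated, rest control
    half = n // 2
    treated, control = students[:half], students[half:]
    treated_scores = [rng.gauss(EFFECTS[g], 1.0) for g in treated]
    # Control scores are assumed to have mean 0 in both subgroups.
    control_scores = [rng.gauss(0.0, 1.0) for _ in control]
    return fmean(treated_scores) - fmean(control_scores)


def mean_estimate(n: int, subgroups: list[str], reps: int = 200, seed: int = 1) -> float:
    """Average the estimated effect over many replications of the same design."""
    rng = random.Random(seed)
    return fmean(run_experiment(n, subgroups, rng) for _ in range(reps))


if __name__ == "__main__":
    for n in (50, 200, 800):
        convenience = mean_estimate(n, ["A"])          # subgroup A only
        representative = mean_estimate(n, ["A", "B"])  # random draw from population
        print(f"n={n:4d}  convenience={convenience:.2f}  "
              f"random sample={representative:.2f}  "
              f"population effect={POPULATION_EFFECT:.2f}")
```

As n grows, the convenience estimate settles near subgroup A's effect rather than the population effect; the added students shrink the noise around the wrong target but leave the gap itself unchanged.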

In the case of small-scale studies, the question of whether findings from a particular group will generalize to other groups is often addressed by conducting multiple studies involving a variety of groups. This approach is supported by the ease of conducting small-scale studies and by the invariable need for additional research to address questions stemming from initial findings. As the number of replications of the initial findings rises, evidence for generalizability mounts. Historically, multiple small-scale studies have been highly successful in establishing the generalizability of initial findings. For example, small-scale studies of observational learning have revealed generalizability not only across different groups of people but also across species: dolphins and monkeys, as well as people, can learn simply by observing the behavior of a model.

In the case of large-scale studies, conducting multiple studies is generally impractical. As a consequence, replication does not provide an effective approach for establishing generalizability.

When research is closely linked to practice through an intervention system, the problem of generalizability is likely to be manageable for a small-scale study even when evidence regarding generalizability is lacking. If a small experimental study reveals a benefit for a particular treatment, our best guess is that the treatment will also be beneficial in practice. Thus, it is reasonable to implement the experimental treatment in a small-scale intervention. In an intervention system, implementation is monitored to ensure fidelity of the treatment. Results are also monitored to determine whether students are mastering the material presented at expected levels. If expected levels are achieved, intervention performance is consistent with expectations based on the experimental findings. If they are not achieved, performance is inconsistent with those expectations. In that case, the intervention is analyzed, and because the intervention is short, adjustments can typically be made. Analyses of small-scale interventions may inform future research and guide practice toward the achievement of intervention goals.
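The monitoring step amounts to comparing observed mastery with the level the experiment leads us to expect. The sketch below is one hypothetical way to express that comparison; the expected rate, the tolerance of 0.05, and the names `check_mastery` and `MonitoringResult` are all illustrative assumptions, not part of any particular intervention system.

```python
from dataclasses import dataclass


@dataclass
class MonitoringResult:
    observed_rate: float
    expected_rate: float
    consistent: bool


def check_mastery(mastered: int, enrolled: int, expected_rate: float,
                  tolerance: float = 0.05) -> MonitoringResult:
    """Compare an intervention's observed mastery rate with the rate expected
    from the experimental findings. Performance is treated as consistent if it
    falls no more than `tolerance` below the expected rate (hypothetical rule)."""
    if enrolled <= 0:
        raise ValueError("enrolled must be a positive number of students")
    observed = mastered / enrolled
    return MonitoringResult(observed, expected_rate,
                            observed >= expected_rate - tolerance)


if __name__ == "__main__":
    result = check_mastery(mastered=22, enrolled=30, expected_rate=0.80)
    if result.consistent:
        print("Consistent with experimental expectations; continue as planned.")
    else:
        print(f"Observed {result.observed_rate:.2f} vs expected "
              f"{result.expected_rate:.2f}: analyze and adjust the intervention.")
```

In this example, 22 of 30 students reaching mastery gives an observed rate of about 0.73, which falls below the 0.75 threshold, so the check flags the intervention for analysis.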

As indicated at the beginning of this post, I have simplified the problems of linking experimental research to practice. For example, I have ignored issues often addressed through hierarchical linear modeling such as the nesting of school effects within districts and the nesting of class effects within schools. I have also avoided discussion of the contributions of generalizability theory to cross-group generalization. These issues I leave to another day.

Thursday, February 19, 2009
