blog posts and news stories

How Efficacy Studies Can Help Decision-makers Decide if a Product is Likely to Work in Their Schools

We and our colleagues have been working on translating the results of rigorous studies of the impact of educational products, programs, and policies for the people in school districts who decide whether to purchase a product or even just try it out in a pilot. We are influenced by Stanford University methodologist Lee Cronbach, especially his seminal book (1982) and article (1975). In the article he concludes, “When we give proper weight to local conditions, any generalization is a working hypothesis, not a conclusion…positive results obtained with a new procedure for early education in one community warrant another community trying it. But instead of trusting that those results generalize, the next community needs its own local evaluation” (1975, p. 125). In other words, we consider even the best-designed experiment to be something like a case study: when we interpret the causal effect of the program, the experiment tells us as much about the local, moderating role of context as it does about the treatment.

Following this focus on context, we can treat characteristics of the people and of the institution where the experiment was conducted as co-causes of the result that deserve full attention, even though, technically, only the treatment, which was randomly assigned, was controlled. Here we argue that any generalization from a rigorous study, where the question is whether the product is likely to be worth trying in a new district, must consider the full context of the study.

Technically, in the language of evaluation research, these differences in who or where the product or “treatment” works for are called “interaction effects” between the treatment and the characteristic of interest (e.g., subgroups of students by demographic category or achievement level, teachers with different skills, or bandwidth available in the building). The characteristic of interest can be called a “moderator,” since it changes, or moderates, the impact of the treatment. An interaction reveals whether there is a differential impact and whether a group with a particular characteristic is advantaged, disadvantaged, or unaffected by the product.
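To make the idea of an interaction concrete, here is a minimal sketch in Python (assuming NumPy) that estimates a treatment-by-moderator interaction with ordinary least squares on simulated, hypothetical data. The function name `estimate_interaction` and all the numbers are illustrative, not drawn from any study. A real analysis of a cluster randomized trial would also need to account for clustering when computing standard errors.

```python
import numpy as np

def estimate_interaction(y, treat, moderator):
    """OLS fit of y = b0 + b1*treat + b2*moderator + b3*treat*moderator.

    b1 is the impact when moderator = 0; b3 (the interaction) is how much
    the impact changes when the moderator goes from 0 to 1.
    """
    X = np.column_stack([np.ones_like(y), treat, moderator, treat * moderator])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "treatment", "moderator", "interaction"], coef))

# Hypothetical simulated data: the impact is 5 points when moderator = 0
# and 5 - 4 = 1 point when moderator = 1.
rng = np.random.default_rng(0)
n = 4000
treat = rng.integers(0, 2, n).astype(float)
moderator = rng.integers(0, 2, n).astype(float)
y = 50 + 5 * treat - 3 * moderator - 4 * treat * moderator + rng.normal(0, 10, n)

est = estimate_interaction(y, treat, moderator)
for name, value in est.items():
    print(f"{name}: {value:.2f}")
```

In this made-up example, the program adds about 5 points for students without the characteristic and about 1 point for students with it. The sample-wide average impact of roughly 3 points therefore describes neither group well.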

The rules set out by the U.S. Department of Education’s What Works Clearinghouse (WWC) focus on the validity of the experimental conclusion: Did the program work on average compared to a control group? Their reviews do not report whether the program works better for poor kids than for middle-class kids or better for uncertified teachers than for veteran teachers, or whether it widens or closes a gap between English learners and English-proficient students. But these differences are exactly what buyers need in order to understand whether the product is a good candidate for a population like theirs. If a program works substantially better for English-proficient students than for English learners, and the purchasing school serves mostly English learners, it is important that the school administrator know the context for the research and the result.

When an experimental finding is applied to a new setting, its accuracy depends on the impact not being moderated by conditions that differ between the study and the new setting. Recent methods of generalization (Tipton, 2013) recognize this. They essentially apply non-experimental adjustments to experimental results to make them more accurate and more relevant to specific local contexts.
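The core idea behind such adjustments can be sketched as reweighting. First, estimate the impact separately for subgroups in the study. Then combine those estimates using the target district’s mix of students rather than the study’s. The Python sketch below is a deliberately simplified poststratification over a single moderator. It is not Tipton’s propensity score subclassification, which forms strata from many covariates at once. The function name, subgroup labels, effect sizes, and shares are all hypothetical, and the sketch assumes that each subgroup’s impact carries over unchanged to the new district.

```python
def reweighted_impact(subgroup_effects, target_shares):
    """Average subgroup-specific impact estimates using a population's mix.

    subgroup_effects: dict mapping subgroup label -> impact estimated in the study
    target_shares: dict mapping the same labels -> share of that population
    """
    if set(subgroup_effects) != set(target_shares):
        raise ValueError("subgroup labels must match")
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(subgroup_effects[g] * target_shares[g] for g in subgroup_effects)

# Hypothetical effect sizes by English-learner status
effects = {"english_learner": 0.05, "english_proficient": 0.30}
study_mix = {"english_learner": 0.20, "english_proficient": 0.80}
target_mix = {"english_learner": 0.70, "english_proficient": 0.30}

print(f"study-sample average:   {reweighted_impact(effects, study_mix):.3f}")
print(f"projected for district: {reweighted_impact(effects, target_mix):.3f}")
```

With these invented numbers, the study-sample average is 0.25. The projection for a district where most students are English learners is only 0.125, half as large.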

Work by Jaciw (2016a, 2016b) takes this one step further.

First, he confirms the result that if the impact of the program is moderated, and if moderators are distributed differently between sites, then an experimental result from one site will yield a biased inference for another site. This would be the case, for example, if the impact of a program depends on individual socioeconomic status, and there is a difference between the study and inference sites in the proportion of individuals with low socioeconomic status. Conditions for this “external validity bias” are well understood, but the consequences are addressed much less often than the usual selection bias. Experiments can yield accurate results about the efficacy of a program for the sample studied, but that average may not apply either to a subgroup within the sample or to a population outside the study.
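A simple case shows why both conditions are needed. Suppose there is one binary moderator, low socioeconomic status (SES) versus not. Let the program’s impact be $\delta_L$ for low-SES students and $\delta_H$ for other students, and assume these impacts are the same at both sites. Let $p_S$ and $p_I$ be the shares of low-SES students at the study site and the inference site. Then $\Delta_S$ and $\Delta_I$ are the average impacts at the two sites. This notation is ours and is introduced only for illustration.

$$
\begin{aligned}
\Delta_S &= p_S\,\delta_L + (1 - p_S)\,\delta_H \\
\Delta_I &= p_I\,\delta_L + (1 - p_I)\,\delta_H \\
\Delta_S - \Delta_I &= (p_S - p_I)\,(\delta_L - \delta_H)
\end{aligned}
$$

The bias from carrying $\Delta_S$ over to the inference site is zero in only two cases: when the impact does not depend on SES ($\delta_L = \delta_H$), or when the two sites have the same share of low-SES students ($p_S = p_I$). Otherwise it grows with both the size of the moderation and the difference in composition.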

Second, he uses results from a multisite trial to show empirically that there is potential for significant bias when generalizing experimental results from one subset of sites to the other sites in the study; however, moderators can account for much of the variation in impact across sites. Average impact findings from experiments provide a summary of whether a program works, but they leave the consumer guessing about the boundary conditions for that effect, that is, the limits beyond which the average effect ceases to apply. Cronbach was highly aware of this, titling a chapter in his 1982 book “The Limited Reach of Internal Validity”. Using terms like “unbiased” to describe impact findings from experiments is correct in a technical sense (i.e., the point estimate, on hypothetical repeated sampling, is centered on the true average effect for the sample studied), but it can impart an incorrect sense of the external validity of the result: that it applies beyond the instance of the study.

The work cited has three implications. First, it is possible to unpack marginal impact estimates through subgroup and moderator analyses to arrive at more accurate inferences for individuals. Second, we should do so: why obscure differences by attending only to the grand mean impact estimate for the sample? Third, we should plan carefully which subgroups to assess impacts for in each individual experiment.

Local decision-makers’ primary concern should be whether a program will work with their specific population. They should therefore ask for causal evidence that accounts for local conditions through the moderating role of student, teacher, and school attributes. Looking at finer differences in impact may draw the criticism that it introduces another type of uncertainty, specifically random sampling error. That error may be minimal for overall impacts estimated with large samples, but influential when estimating differences in impact across more, and smaller, subsamples. This is a fair criticism, but differential effects may be less susceptible to random perturbations (low power) than assumed. This is especially true if subgroups are identified at the individual level in cluster randomized trials (e.g., individual student-level SES, as opposed to school-average SES) (Bloom, 2005; Jaciw, Lin, & Ma, 2016).
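The intuition behind that last point can be checked with a small Monte Carlo simulation. The Python sketch below (assuming NumPy) randomizes 40 hypothetical schools to treatment. It then compares how much the estimated treatment-by-moderator interaction varies from sample to sample in two cases: when the moderator is measured on individual students, and when it is a school-level attribute. The function name, school and student counts, and variance parameters are illustrative assumptions. To keep the example simple, the sketch also assumes that the subgroup gap does not itself vary randomly across schools.

```python
import numpy as np

def interaction_sd(moderator_level, n_schools=40, n_per_school=50,
                   school_sd=0.5, student_sd=1.0, reps=500, seed=1):
    """Monte Carlo SD of an estimated treatment-by-moderator interaction.

    Treatment is randomized by school. All true effects are zero, so the
    spread of the estimates reflects sampling error only.
    moderator_level="student": half the students in every school have the trait.
    moderator_level="school":  half the schools in each arm have the trait.
    """
    if moderator_level not in ("student", "school"):
        raise ValueError("moderator_level must be 'student' or 'school'")
    rng = np.random.default_rng(seed)
    half = n_schools // 2
    treat = np.repeat([1, 0], half)        # first half of schools are treated
    school_trait = np.tile([1, 0], half)   # alternating; balanced per arm with 40 schools
    m = n_per_school // 2                  # first m students in a school have the trait
    estimates = np.empty(reps)
    for r in range(reps):
        school_effect = rng.normal(0.0, school_sd, n_schools)  # random intercepts
        y = school_effect[:, None] + rng.normal(0.0, student_sd, (n_schools, n_per_school))
        if moderator_level == "student":
            # Within-school subgroup gap: the school intercept cancels out.
            gap = y[:, :m].mean(axis=1) - y[:, m:].mean(axis=1)
            estimates[r] = gap[treat == 1].mean() - gap[treat == 0].mean()
        else:
            # Comparing groups of schools: school intercepts do not cancel.
            means = y.mean(axis=1)
            cell = {(t, x): means[(treat == t) & (school_trait == x)].mean()
                    for t in (0, 1) for x in (0, 1)}
            estimates[r] = (cell[1, 1] - cell[1, 0]) - (cell[0, 1] - cell[0, 0])
    return estimates.std(ddof=1)

print(f"student-level moderator: SD = {interaction_sd('student'):.3f}")
print(f"school-level moderator:  SD = {interaction_sd('school'):.3f}")
```

Under these settings, the analytic standard deviations are about 0.09 for the student-level moderator and about 0.33 for the school-level one, and the simulated values should be close to these. The difference arises because comparing subgroups within the same school cancels the school-level random intercept, whereas comparing groups of schools does not.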

References:
Bloom, H. S. (2005). Randomizing groups to evaluate place-based programs. In H. S. Bloom (Ed.), Learning more from social experiments. New York: Russell Sage Foundation.

Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30(2), 116-127.

Cronbach, L. J. (1982). Designing evaluations of educational and social programs. San Francisco, CA: Jossey-Bass.

Jaciw, A. P. (2016a). Applications of a within-study comparison approach for evaluating bias in generalized causal inferences from comparison group studies. Evaluation Review, 40(3), 241-276. Retrieved from https://journals.sagepub.com/doi/abs/10.1177/0193841X16664457

Jaciw, A. P. (2016b). Assessing the accuracy of generalized inferences from comparison group studies using a within-study comparison approach: The methodology. Evaluation Review, 40(3), 199-240. Retrieved from https://journals.sagepub.com/doi/abs/10.1177/0193841x16664456

Jaciw, A., Lin, L., & Ma, B. (2016). An empirical study of design parameters for assessing differential impacts for students in group randomized trials. Evaluation Review. Retrieved from https://journals.sagepub.com/doi/10.1177/0193841X16659600

Tipton, E. (2013). Improving generalizations from experiments using propensity score subclassification: Assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics, 38, 239-266.

2018-01-16

Spring 2018 Conference Season is Taking Shape


We’ll be on the road again this spring.

SREE

Andrew Jaciw and Denis Newman will be in Washington, DC, for the annual spring conference of the Society for Research on Educational Effectiveness (SREE), the premier conference on rigorous research. Andrew Jaciw will present his paper, Leveraging Fidelity Data to Make Sense of Impact Results: Informing Practice through Research. His presentation will be part of Session 2I: Research Methods - Post-Random Assignment Models: Fidelity, Attrition, Mediation & More, from 8 to 10 a.m. on Thursday, March 1.

SXSW EDU

In March, Denis Newman will attend the SXSW EDU Conference & Festival in Austin, TX, where he will present on a panel with Malvika Bhagwat, Jason Palmer, and Karen Billings titled Can Evidence Even Keep Up with EdTech? The panel will address how researchers and companies can produce evidence that products work in time for educators and administrators to make informed buying decisions under accelerating timelines.

AERA

Empirical staff will be presenting in four sessions at the annual conference of the American Educational Research Association (AERA) in NYC in April, all under Division H (Research, Evaluation, and Assessment in Schools).

  1. For Quasi-experiments on Edtech Products, What Counts as Being Treated?
  2. Teacher evaluation rubric properties and associations with school characteristics: Evidence from the Texas evaluation system
  3. Indicators of Successful Teacher Recruitment and Retention in Oklahoma Rural Schools
  4. The Challenges and Successes of Conducting Large-scale Educational Research

In addition to these presentations, we are planning another of our celebrated receptions in NYC, so stay tuned for details.

ISTE

A panel on our Research Guidelines has been accepted at ISTE, a major convention considered the epicenter of edtech, with thousands of users and hundreds of companies. This year it will be held in Chicago, June 24–27.

2017-12-18

APPAM doesn’t stand for A Pretty Pithy Abbreviated Meeting

APPAM does stand for excellence, critical thinking, and quality research.

The 2017 fall research conference of APPAM (the Association for Public Policy Analysis and Management) kept reminding me of one recurrent theme: bridging the chasms between researchers, policymakers, and practitioners.


Linear processes don’t work. Participatory research is critical!

Another hot topic is generalizability! There is a lot of work to be done here. What works? For whom? Why?


Lots of food for thought!


2017-11-06

IES Publishes our Recent REL Southwest Teacher Studies

The U.S. Department of Education’s Institute of Education Sciences published two reports of studies we conducted for REL Southwest! We are thankful for the support and engagement we received from the Educator Effectiveness Research Alliance and the Oklahoma Rural Schools Research Alliance throughout the studies. The collaboration with the research alliances and educators aligns well with what we set out to do in our core mission: to support K-12 systems and empower educators in making evidence-based decisions.

The first study was published earlier this month and identified factors associated with successful recruitment and retention of teachers in Oklahoma rural school districts, in order to highlight potential strategies to address Oklahoma’s teaching shortage. This correlational study covered a 10-year period (the 2005-06 to 2014-15 school years) and used data from the Oklahoma State Department of Education, the Oklahoma Office of Educational Quality and Accountability, federal non-education sources, and publicly available geographic information systems from Google Maps. The study found that teachers who are male, those who have higher postsecondary degrees, and those who have more teaching experience are harder than others to recruit and retain in Oklahoma schools. In addition, for teachers in rural districts, higher total compensation and increased responsibilities in job assignment are positively associated with successful recruitment and retention. In order to provide context, the study also examined patterns of teacher job mobility between rural and non-rural school districts. The rate of teachers in Oklahoma rural schools reaching tenure is slightly lower than the rates for teachers in non-rural areas. Also, rural school districts in Oklahoma had consistently lower rates of success in recruiting teachers than non-rural school districts from 2006-07 to 2011-12.

This most recent study, published last week, examined data from the 2014-15 pilot implementation of the Texas Teacher Evaluation and Support System (T-TESS). In 2014-15 the Texas Education Agency piloted the T-TESS in 57 school districts. During the pilot year, teachers’ overall ratings were based solely on rubric ratings on 16 dimensions across four domains.

The study examined the statistical properties of the T-TESS rubric to explore the extent to which it differentiates teachers on teaching quality and to investigate its internal consistency and efficiency. It also explored whether certain types of schools have teachers with higher or lower ratings. Using data from the pilot for more than 8,000 teachers, the study found that the rubric differentiates teacher effectiveness at the overall, domain, and dimension levels; domain and dimension ratings on the observation rubric are internally consistent; and the observation rubric is efficient, with each dimension making a unique contribution to a teacher’s overall rating. In addition, findings indicated that T-TESS rubric ratings varied slightly in relation to some school characteristics that were examined, such as socioeconomic status and percentage of English Language Learners. However, there is little indication that these characteristics introduced bias in the evaluators’ ratings.
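One widely used statistic for internal consistency is Cronbach’s alpha. It compares the variances of the individual rubric items with the variance of the total score. The Python sketch below (assuming NumPy) computes alpha from a teachers-by-dimensions table of ratings. It is an illustration only: we are not asserting that the T-TESS study relied on this particular statistic, and the ratings and the function name are hypothetical.

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for an (n_teachers, n_items) array of rubric ratings.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]
    if k < 2:
        raise ValueError("need at least two rubric items")
    item_variances = ratings.var(axis=0, ddof=1)
    total_variance = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical ratings (1-5 scale) for six teachers on four rubric dimensions
ratings = [
    [3, 3, 4, 3],
    [2, 2, 2, 3],
    [4, 4, 5, 4],
    [3, 4, 3, 3],
    [5, 4, 5, 5],
    [2, 3, 2, 2],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")  # prints alpha = 0.93
```

Values near 1 mean the dimensions tend to rank teachers similarly. Internal consistency is a separate question from whether each dimension adds unique information, which the study also examined.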

2017-10-30

New Article Published on the Processes Involved with Scaling-Up or Abandoning an Innovation

Our study of scaling up an innovation that challenges conventional approaches to research is being published in the Peabody Journal of Education and is now available online at Taylor & Francis.

The article, “School Processes That Can Drive Scaling-Up of an Innovation or Contribute to Its Abandonment”, looks at the drivers of school-level processes that predict the growth or the attrition of a school’s team implementing an innovation. We looked for the factors that helped to explain the school-level success or failure of a high school academic literacy framework, Reading Apprenticeship, developed by WestEd’s Strategic Literacy Initiative (SLI). The work was funded by an i3 validation grant for which we were the independent evaluators. SLI had an innovative strategy for scaling up, involving school-based cross-disciplinary teacher teams, and brought the framework to 274 schools across five states. This strategy follows research literature that views scale-up as increasing local ownership and depth of commitment. In this study, we show that there are factors working both for and against the increase of teachers and schools joining and staying in an innovation. Given wide variation in teacher uptake, we were able to identify processes present in the initial year that predicted gains and losses of participants.

Clicking on this link will allow you to read the abstract (and the full article if you subscribe to the journal). If you don’t already subscribe, but you would like to read the article, send us an email, and we will share with you a link that will grant you a free download of the article.

2017-10-20

Sure, the edtech product is proven to work, but will it work in my district?

It’s a scenario not uncommon in your district administrators’ office. They’ve received sales pitches and demos of a slew of new education technology (edtech) products, each one accompanied by “evidence” of its general benefits for teachers and students. But underlying the administrators’ decision is a question often left unanswered: Will this work in our district?

In the conventional approach to research advocated, for example, by the U.S. Department of Education and the Every Student Succeeds Act (ESSA), the finding that is reported and used in the review of products is the overall average impact across all the students, teachers, or schools in the study sample. In our own research, we have repeatedly seen that who a product works for, and under what conditions, can be more important than the average impact. There are products that are effective on average but don’t work for an important subgroup of students, and, conversely, products that are not effective on average but do work for some students. Some examples:

  • A math product, while found to be effective overall, was effective for white students but ineffective for minority students. This effect would be relevant to any district wanting to close (rather than further widen) an achievement gap.
  • A product that did well on average performed very well in elementary grades but poorly in middle school. This has obvious relevance for a district, as well as for the provider who may modify its marketing target.
  • A teacher PD product greatly benefited uncertified teachers but didn’t help veteran teachers do any better than their peers using the conventional textbook. This product may be useful for new teachers but a poor choice for others.

As a research organization, we have been looking at ways to efficiently answer these kinds of questions for products. Especially now, with the evidence requirements built into ESSA, school leaders can ask the edtech salesperson: “Does your product have evidence that ESSA calls for?” They may well hear an affirmative answer supported by an executive summary of a recent study. But there’s a fundamental problem with what ESSA is asking for. ESSA doesn’t ask for evidence that the product is likely to work in your specific district. This is not the fault of ESSA’s drafters. The problem is built into the conventional design of research on “what works”. The U.S. Department of Education’s What Works Clearinghouse (WWC) bases its evidence rating only on an average; if there are different results for different subgroups of students, that difference is not part of the rating. Since ESSA adopts the WWC approach, that’s the law of the land. Hence, your district’s most pressing question is left unanswered: will this work for a district like mine?

Recently, the Software & Information Industry Association, the primary trade association of the software industry, released a set of guidelines for research explaining to its member companies the importance of working with districts to conduct research that will meet the ESSA standards. As the lead author of this report, I can say it was our goal to foster an improved dialog between the schools and the providers about the evidence that should be available to support buying these products. As an addendum to the guidelines aimed at arming educators with ways to look at the evidence and questions to ask the edtech salesperson, here are three suggestions:

  1. It is better to have some information than no information. The fact that research found the product worked somewhere gives you a working hypothesis that it could be a better-than-average bet to try out in your district. In this respect, you can use the study ratings from the WWC and from newer sites such as Evidence for ESSA as a screening tool: they will point you to valid studies about the product you’re interested in. But you should treat previous research as a working hypothesis rather than proof.
  2. Look at where the research evidence was collected. You’ll want to know whether the research sites and populations in the study resemble your local conditions. WWC has gone to considerable effort to code the research by the population in the study and provides a search tool so you can find studies conducted in districts like yours. And if you download and read the original report, it may tell you whether it will help reduce or increase an achievement gap of concern.
  3. Make a deal with the salesperson. In exchange for your help in organizing a pilot and allowing them to analyze your data, you get the product for a year at a steep discount and a good ongoing price if you decide to implement the product on a full scale. While you’re unlikely to get results from a pilot (e.g., based on spring testing) in time to support a decision, you can at least lower your cost for the materials, and you’ll help provide a neighboring district (with similar populations and conditions) with useful evidence to support a strong working hypothesis as to whether it is likely to work for them as well.

2017-10-15

National Forum to Advance Rural Education 2017


We are participating in two discussions at the National Forum to Advance Rural Education, organized by Battelle for Kids, on Thursday, October 12, 2017.

THURSDAY, OCTOBER 12 | 1:15–2:15pm
Quality Teachers in Rural Schools: Lessons Learned in Oklahoma
Join a discussion with the Regional Educational Laboratory Southwest (REL Southwest) and practitioners in the Oklahoma Rural Schools Research Alliance about their research focused on two areas of high need in rural schools: teacher recruitment and retention, and professional development. This informal discussion with the researchers and Oklahoma practitioners will focus on how you can use the information from these studies in your own state and school district.
Presenters:
Pia Peltola (REL Southwest, American Institutes for Research)
Susan Pinson (Oklahoma State Department of Education)
Kathren Stehno (Office of Educational Quality & Accountability)
Megan Toby (Empirical Education)
Haidee Williams (REL Southwest, American Institutes for Research)

Rosa Ailbouni Room, Third Floor

THURSDAY, OCTOBER 12 | 2:30–3pm
Recruiting and Retaining Quality Teachers in Oklahoma
Learn about research conducted in partnership with the Regional Educational Laboratory Southwest (REL Southwest) and practitioners in the Oklahoma Rural Schools Research Alliance. The research identified teacher, district, and community characteristics that are predictors of successful teacher recruitment and retention in rural Oklahoma, which can inform future policy and practice. Join the researchers and alliance members who guided the research and discover how you can use the information in your school district.
Presenters:
Kathren Stehno (Office of Educational Quality & Accountability)
Megan Toby (Empirical Education)
Haidee Williams (REL Southwest, American Institutes for Research)

Great Hall Meeting Room 2, First Floor


2017-10-04

Join Our Webinar: Measuring Ed Tech impact in the ESSA Era

Tuesday, November 7, 2017 … 2:00 - 3:00pm PT

Our CEO, Denis Newman, will be collaborating with Andrew Coulson (Chief Strategist, MIND Research Institute) and Bridget Foster (Senior VP and Managing Director, SIIA) to bring you an informative webinar next month!

This free webinar (co-hosted by edWeb.net and MCH Strategic Data) will introduce you to a new approach to evidence about which edtech products really work in K-12 schools. ESSA has changed the game when it comes to what counts as evidence. This webinar builds on the Education Technology Industry Network’s (ETIN) recent publication of Guidelines for EdTech Impact Research, which explains the new ground rules.

The presentation will explore how we can improve the conversation between edtech developers and vendors (providers), and the school district decision makers who are buying and/or piloting the products (buyers). ESSA has provided a more user-friendly definition of evidence, which facilitates the conversation.

  • Many buyers are asking providers if there’s reason to think their product is likely to work in a district like theirs.
  • For providers, the new ESSA rules let them start with simple studies showing that their product is promising, without having to invest in expensive trials to prove it will work everywhere.

The presentation brings together two experts: Andrew Coulson, a developer who has conducted research on their products and is concerned with improving the efficacy of edtech, and Denis Newman, a researcher who is the lead author of the ETIN Guidelines. The presentation will be moderated by Bridget Foster, a long-time educator who now directs the ETIN at SIIA. This edWebinar will be of interest to edtech developers, school and district administrators, education policy makers, association leaders, and any educator interested in the evidence of efficacy in edtech.

If you would like to attend, click here to register.

2017-09-28

We Are Participating in the Upcoming REL Webinar on Teacher Mobility

Join Regional Educational Laboratories Midwest and Southwest for a free webinar on October 4 to learn how states can address teacher demand and mobility trends. As a partner in REL Southwest, we will be reporting on our work on teacher recruitment and retention in rural Oklahoma.

Teachers and administrators change schools for a variety of reasons. Mobility can be a positive if an educator moves to a position that is a better fit, but it can also have serious implications for states. Mobility may harm schools that serve high-need populations, and mobility can also create additional recruitment and hiring costs for districts.

This webinar focuses on research addressing the teacher pipeline and the mobility of teachers between schools and districts. Presenters will discuss two published REL Midwest research studies on teacher mobility trends and strategies for estimating teacher supply and demand. Following each presentation, leaders from the Oklahoma State Department of Education and the Minnesota Department of Education will respond to the presentations and share state initiatives to meet teacher staffing needs. Presenters also will briefly highlight two upcoming REL Southwest studies related to teacher supply and demand that are expected to be released later this year.

The studies that will be discussed are:
- An examination of the movement of educators within and across three Midwest Region states (REL Midwest, AIR)
- Strategies for estimating teacher supply and demand using student and teacher data (REL Midwest, AIR)
- Indicators of successful teacher recruitment and retention in Oklahoma rural schools (REL Southwest, Empirical Education)
- Teacher mobility in Texas: Trends and associations with student, teacher, and school characteristics (REL Southwest, AIR, Empirical Education)

This webinar is designed for state education staff, administrators in schools and districts with significant American Indian populations, American Indian community leaders, research alliance and community of practice members, and education researchers. If you cannot attend the live event, register at the link below to be notified when a recording of the webinar is available online.

Exploring Educator Movement Between Districts
October 4, 2017
10:00–11:30 a.m. PT

The Regional Educational Laboratories (RELs) build the capacity of educators to use data and research to improve student outcomes. Each REL responds to needs identified in its region and makes learning opportunities and other resources available to educators throughout the United States. The REL program is a part of the Institute of Education Sciences (IES) in the U.S. Department of Education. To receive regular updates on REL work, including events and reports, follow IES on Facebook and Twitter.

You can register for this event on the REL website.

2017-09-21

Partnering with SRI and CAST on an RCT

Empirical Education and CAST are excited to announce a new partnership under an Investing in Innovation (i3) grant.

We’ll evaluate the Enhanced Units program, which SRI and CAST wrote as a development proposal. The project aims to integrate content enhancement routines with learning and collaboration strategies, enhancements intended to improve students’ content learning, higher-order reasoning, and collaboration.

We will conduct the experiment in up to three school districts in California and Virginia, working with high school science and social studies teachers. This is our first project with CAST, and it builds on our extensive experience conducting large-scale, rigorous, experimental impact studies, as well as formative and process evaluations.

For more information on our evaluation services and our work on i3 projects, please visit our i3/EIR page and/or contact Robin Means.

2017-07-27