
Multi-Arm Parallel Group Design Explained

What do unconventional arm wrestling and randomized trials have in common?

Each can have many arms.

What is a 3 arm RCT?

Multi-arm trials (or multi-arm RCTs) are randomized experiments in which individuals are randomly assigned to one of several arms: usually two or more treatment variants plus a control (a trial with two treatment arms and a control is a 3-arm RCT).

They can be referred to in a number of ways.

  • multi-arm trials
  • multi-armed trials
  • multiarm trials
  • multiarmed trials
  • multi arm RCTs
  • 3-arm, 4-arm, 5-arm, etc. RCTs
  • multi-factorial design (a type of multi-arm trial)

a figure illustrating a 2-arm trial, with one arm labeled treatment and one labeled control

a figure illustrating a 3-arm trial, with arms labeled treatment 1, treatment 2, and control

When I think of a multiarmed wrestling match, I imagine a mess. Can’t you say the same about multiarmed trials?

Quite the contrary. They can become messy, but not if they’re done with forethought and consultation with stakeholders.

I had the great opportunity to be the guest editor of a special issue of Evaluation Review on the topic of Multiarmed Trials, where experts shared their knowledge.

Special Issue: Multi-armed Randomized Control Trials in Evaluation and Policy Analysis

We were fortunate to receive five valuable contributions. I hope the issue will serve as a go-to reference for evaluators who want to explore options beyond the standard two-armed (treatment-control) arrangement.

The first three articles are by pioneers of the method.

  • Larry L. Orr and Daniel Gubits: Some Lessons From 50 Years of Multi-armed Public Policy Experiments
  • Joseph Newhouse: The Design of the RAND Health Insurance Experiment: A Retrospective
  • Judith M. Gueron and Gayle Hamilton: Using Multi-Armed Designs to Test Operating Welfare-to-Work Programs

They cover a wealth of ideas essential for the successful conduct of multi-armed trials.

  • Motivations for study design and the choice of treatment variants, and their relationship to real-world policy interests
  • The importance of reflecting the complex ecology and political reality of the study context to get stakeholder buy-in and participation
  • The importance of patience and deliberation in selecting sites and samples
  • The allocation of participants to treatment arms with a view to statistical power

Should I read this special issue before starting my own multi-armed trial?

Absolutely! It’s easy to go wrong with this design, but done right, it can yield more information than a 2-armed trial. Sample allocation depends on the question you want to answer. In a 3-armed trial, do you want 33.3% of the sample in each of the three conditions (two treatment conditions and control), or 25% in each treatment arm and 50% in control? The answer depends on the contrast and research question of primary interest, so the design forces you to think more deeply about what question it is you want to answer.
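To make the allocation question concrete, here is a minimal sketch (hypothetical total sample size and an assumed common outcome standard deviation; it is not from the special issue) of how the two allocations affect the precision of different pairwise contrasts.

```python
# A rough sketch (hypothetical numbers): how splitting a fixed total sample
# across three arms affects the precision of different pairwise contrasts,
# assuming a common outcome SD of 1.
import math

def contrast_se(n_a, n_b, sd=1.0):
    # Standard error of a simple difference in means between two arms
    return sd * math.sqrt(1.0 / n_a + 1.0 / n_b)

total_n = 900  # hypothetical total sample size

allocations = {
    "equal thirds (300/300/300)": (300, 300, 300),
    "25/25/50 (225/225/450)": (225, 225, 450),
}

for label, (n_t1, n_t2, n_c) in allocations.items():
    print(label)
    print("  SE of treatment 1 vs control:    ", round(contrast_se(n_t1, n_c), 4))
    print("  SE of treatment 1 vs treatment 2:", round(contrast_se(n_t1, n_t2), 4))
```

With these hypothetical numbers, the treatment-versus-control contrast is about equally precise under either allocation, but the head-to-head treatment contrast is noticeably more precise with equal thirds; which contrast you care about most should drive the split.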

This sounds risky. Why would I ever want to run a multi-armed trial?

In short, a multi-armed trial allows a head-to-head test of alternatives to determine which provides a larger or more immediate return on investment. It also sets up the question of whether certain alternatives work better for certain beneficiaries.

The next two articles make this clear. One study randomized treatment sites to one of several enhancements to assess the added value of each. The other used a nifty multifactorial design to simultaneously test several dimensions of a treatment.

  • Laura Peck, Hilary Bruck, and Nicole Constance: Insights From the Health Profession Opportunity Grant Program’s Three-Armed, Multi-Site Experiment for Policy Learning and Evaluation Practice
  • Randall Juras, Amy Gorman, and Jacob Alex Klerman: Using Behavioral Insights to Market a Workplace Safety Program: Evidence From a Multi-Armed Experiment

More About 3 Arm RCTs

The special issue of Evaluation Review helped motivate the design of a multiarmed trial conducted through the Regional Educational Laboratory (REL) Southwest in partnership with the Arkansas Department of Education (ADE). We co-authored this study through our role on REL Southwest.

In this study with ADE, we randomly assigned 700 Arkansas public elementary schools to one of eight conditions determining how communication was sent to their households about the Reading Initiative for Student Excellence (R.I.S.E.) state literacy website.

The treatments varied on these dimensions.

  1. Mode of communication (email only or email and text message)
  2. The presentation of information (no graphic or with a graphic)
  3. Type of sender (generic sender or known sender)

In January 2022, households with children in these schools were sent three rounds of communications with information about literacy and a link to the R.I.S.E. website. The study examined the impact of these communications on whether parents and guardians clicked the link to visit the website (click rate). We also conducted an exploratory analysis of differences in how long they spent on the website (time on page).

How do you tell the effects apart?

It all falls out nicely if you imagine the conditions as branches, or cells in a cube (both are pictured below).

In the branching representation, there are eight possible pathways from left to right representing the eight conditions.

In the cube representation, the eight conditions correspond to the eight distinct cells.

In the study, we evaluated the impact of each dimension across levels of the other dimensions: for example, whether the click rate increases when email is accompanied by a text message, compared with email alone, irrespective of who the sender is or whether the graphic is used.
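As a rough illustration (simulated data only, not the study’s actual data or code), a main effect in this kind of 2×2×2 design can be read off by comparing mean click rates between the two levels of one dimension while pooling over the other two; the sketch below ignores the clustering of households within schools for simplicity.

```python
# A minimal sketch with simulated data: main effects in a 2x2x2 factorial,
# estimated as the difference in mean click rates between the two levels of
# each dimension, pooling over the other two dimensions.
import itertools
import random

random.seed(1)

def click_prob(text, graphic, known):
    # Hypothetical "true" click probabilities: small boosts for text, graphic, known sender
    return 0.05 + 0.03 * text + 0.01 * graphic + 0.02 * known

# Simulate households spread evenly across the 8 conditions
records = []
for text, graphic, known in itertools.product([0, 1], repeat=3):
    for _ in range(500):
        clicked = int(random.random() < click_prob(text, graphic, known))
        records.append((text, graphic, known, clicked))

def main_effect(dim_index):
    # Mean click rate at level 1 minus level 0 of one dimension, pooled over the others
    level1 = [r[3] for r in records if r[dim_index] == 1]
    level0 = [r[3] for r in records if r[dim_index] == 0]
    return sum(level1) / len(level1) - sum(level0) / len(level0)

for name, idx in [("email + text vs email only", 0),
                  ("graphic vs no graphic", 1),
                  ("known vs generic sender", 2)]:
    print(f"{name}: {main_effect(idx):+.3f}")
```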

We also tested the impact on click rates of the “deluxe” version (email + text, with known sender and graphic, which is the green arrow path in the branch diagram [or the red dot cell in the cube diagram]) versus the “plain” version (email only, generic sender, and no graphic, which is the red arrow path in the branch diagram [or the green dot cell in the cube diagram]).

a figure illustrating the multi arms of the RCT and what intervention each of them received

a figure of a cube illustrating multi-armed trials

That’s all nice and dandy, but have you ever heard of the KISS principle: Keep it Simple Sweetie? You are taking some risks in design, but getting some more information. Is the tradeoff worth it? I’d rather run a series of two-armed trials. I am giving you a last chance to convince me.

Two-armed trials will always be the staple approach. But consider the following.

  • Knowing what works among educational interventions is a starting point, but it does not go far enough.
  • The last 5-10 years have witnessed the prioritization of questions and methods for addressing what works for whom and under which conditions.
  • However, even this may not go far enough to get to the question at the heart of what people on the ground want to know. We agree with Tony Bryk that practitioners typically want to answer the following question.

What will it take to make it (the program) work for me, for my students, and in my circumstances?

There are plenty of qualitative, quantitative, and mixed methods to address this question. There also are many evaluation frameworks to support systematic inquiry to inform various stakeholders.

We think multi-armed trials help to tease out the complexity in the interactions among treatments and conditions and so help address the more refined question Bryk asks above.

Consider our example above. One question we explored was how response rates varied between rural and urban schools. One might speculate the following.

  • Rural schools are smaller, allowing principals to get to know parents more personally
  • Rural and non-rural households may have different kinds of usage and connectivity with email versus text and with MMS versus SMS

If these moderating effects matter, then the study, as conducted, may help with customizing communications, providing a rationale for improving connectivity, and optimizing the costs of communication overall.

Multi-armed trials, done well, increase the yield of actionable information to support both researcher and on-the-ground stakeholder interests!

Well, thank you for your time. I feel well-armed with information. I’ll keep thinking about this and wrestle with the pros and cons.

2023-05-31

New Research Project Evaluating the Impact of EVERFI’s WORD Force Program on Early Literacy Skills

Empirical Education and EVERFI from Blackbaud are excited to announce a new partnership. Researchers at Empirical will evaluate the impact and implementation of the WORD Force program, a literacy adventure for K-2 students.

The WORD Force program is designed to be engaging and interactive, using games and real-world scenarios to teach students key reading and literacy skills and how to use them in context. It also provides students with personalized feedback and support, allowing them to work at their own pace and track their progress.

We will conduct the experiment within up to four school districts—working with elementary school teachers. This is our second project with EVERFI, and it builds on our 20 years of extensive experience conducting large-scale, rigorous randomized controlled trial (RCT) studies. (Read EVERFI’s press release about our first project with them.)

In our current work together, we plan to answer these five research questions.

  1. What is the impact of WORD Force on early literacy achievement, including on spoken language, phonological awareness, phonics, word building, vocabulary, reading fluency, and reading comprehension, for students in grades K–2?
  2. What is the impact of WORD Force on improving early literacy achievement for students in grades K–2 from low- to middle-income households, English Language Learner (ELL) students, by grade, and depending on teacher background (e.g., years of teaching experience, or responses to a baseline survey about orientation to literacy instruction)?
  3. What is the impact of WORD Force on improving early literacy achievement for students in grades K–2 who struggle with reading (i.e., those in greatest need of reading intervention) as determined through a baseline assessment of literacy skills?
  4. What are realized levels of implementation/usage by teachers and students, and are they associated with achievement outcomes?
  5. Do impacts on intermediate instructional/implementation outcomes mediate impacts on achievement?

Using a matched-pairs design, we will pair teachers who are similar in terms of years of experience and other characteristics. Then, from each pair, we will randomize one teacher to the WORD Force group and the other to the business-as-usual (BAU) control group. This RCT design will allow us to evaluate the causal impact of WORD Force on student achievement outcomes as contrasted with BAU. EVERFI will offer WORD Force to the teachers in BAU as soon as the experiment is over. EVERFI will be able to use these findings to identify implementation factors that influence student outcomes, such as the classroom literacy environment, literacy block arrangements, and teachers’ characteristics. This study will also contribute to the growing corpus of literature around the efficacy of educational technology usage in early elementary classrooms.
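For illustration only, here is a minimal sketch of matched-pair random assignment using a single hypothetical characteristic (years of experience); the actual study pairs teachers on several characteristics and follows its own procedures.

```python
# A minimal sketch (hypothetical roster): sort teachers by a matching
# characteristic, pair adjacent teachers, then randomize within each pair.
import random

random.seed(42)

# Hypothetical teacher roster: (teacher_id, years_of_experience)
teachers = [("T01", 2), ("T02", 15), ("T03", 3), ("T04", 14), ("T05", 7), ("T06", 8)]

# Pair teachers with the most similar years of experience
teachers_sorted = sorted(teachers, key=lambda t: t[1])
pairs = [teachers_sorted[i:i + 2] for i in range(0, len(teachers_sorted), 2)]

assignments = {}
for a, b in pairs:
    # Randomly assign one member of each pair to treatment, the other to control
    treated, control = random.sample([a, b], 2)
    assignments[treated[0]] = "WORD Force"
    assignments[control[0]] = "business as usual"

for teacher_id, condition in sorted(assignments.items()):
    print(teacher_id, "->", condition)
```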

For more information on our evaluation services, please visit our research services page and/or contact us.

All research Empirical Education has conducted for EVERFI can be found on our EVERFI webpage.

2023-04-13

Meet Our Newest Researchers

The Empirical Research Team is pleased to announce the addition of 3 new team members. We welcome Rebecca Dowling, Lindsay Maurer, and Mayah Waltower as our newest researchers!

Rebecca Dowling, Research Manager

Rebecca (box 8 in the pet matching game) is taking on the role of project manager for two evaluations. One is the EVERFI WORD Force project, working with Mayah Waltower. The other is the How Are The Children project. Rebecca’s PhD in Applied Developmental Psychology with a specialization in educational contexts of development lends expertise to both of these projects. Her education is complemented by her experience managing evaluations before joining Empirical Education. Rebecca works out of her home office in Utah. Can you guess which pet works at home with her?

Lindsay Maurer, Research Assistant

Lindsay (box 6 in the pet matching game) assists Sze-Shun Lau with the CREATE project, a teacher residency program in Atlanta Public Schools invested in expanding equity in education by developing critically conscious, compassionate, and skilled educators. Lindsay’s experience as a research assistant studying educational excellence and equality at the University of California Davis is an asset to the CREATE project. Lindsay works out of her home office in San Francisco, CA. Can you guess which pet works at home with her?

Mayah Waltower, Research Assistant

Mayah (box 1 in the pet matching game) has taken on assisting Rebecca with the EVERFI WORD Force and the How Are The Children projects. Mayah also assists Sze-Shun Lau with the CREATE project, a teacher residency program in Atlanta Public Schools invested in expanding equity in education by developing critically conscious, compassionate, and skilled educators. Mayah works out of her home office in Atlanta, GA. Can you guess which pet works at home with her?

To get to know them better, we’d like to invite you to play our pet matching game. The goal of the game is to correctly match each new team member with their pet (yes, plants can be pets too). To submit your answers and see if you’re right, post your guesses to twitter and tag us @empiricaled.

2023-03-17

Rock Island-Milan School District, Connect with Kids, and Empirical Education Win an EIR Grant

The Rock Island-Milan School District #41 (RIMSD) in partnership with CWK Network, Inc. (Connect with Kids) and Empirical Education just announced that they were awarded an EIR grant to develop and evaluate a program called How Are The Children? (HATC). This project-based social emotional curriculum intends to foster students’ social emotional competence, increase student engagement, and ameliorate the long-term social emotional impacts of the COVID-19 pandemic.

We will be conducting an independent evaluation of the effectiveness of HATC through a randomized controlled trial and formative evaluation. Our findings will inform the improvement of the program, as well as foster the expansion of the curriculum into other schools and districts.

For more details on this grant and the project, see the press announcement and our EIR grant proposal.

2023-03-14

We Won a SEED Grant in 2022 with Georgia State University

Empirical Education began serving as a program evaluator of the teacher residency program, Collaboration and Reflection to Enhance Atlanta Teacher Effectiveness (CREATE), in 2015 under a subcontract with Atlanta Neighborhood Charter Schools (ANCS) as part of their Investing in Innovation (i3) Development grant. In 2018, we extended this work with CREATE and Georgia State University through the Supporting Effective Educator Development (SEED) Grant Program, through the U.S. Department of Education. In 2020, we were awarded additional SEED grants to further extend our work with CREATE.

Last month, in October 2022, we were notified that this important work will receive continued funding through SEED. CREATE has proposed the following goals with this continued funding.

  • Goal 1: Recruit, support, and retain compassionate, skilled, anti-racist educators via residency
  • Goal 2: Design and enact transformative learning opportunities for experienced educators, teacher educators, and local stakeholders
  • Goal 3: Sustain effective and financially viable models for educator recruitment, support, and retention
  • Goal 4: Ensure all research efforts are designed to benefit partner organizations

Empirical remains deeply committed to designing and executing a rigorous and independent evaluation that will inform partner organizations, local stakeholders, and a national audience of the potential impact and replicability of a multifaceted program that centers equity and wellness for educators and students. With this new grant, we are also committed to integrating more mixed method approaches to better align our evaluation with CREATE’s antiracist mission, and to contribute to recent conversations about what it means to conduct educational effectiveness work with an equity and social justice orientation.

Using a quasi-experimental design and mixed-methods process evaluation, we aim to understand the impact of CREATE on teachers’ equitable and effective classroom practices, student achievement, and teacher retention. We will also explore key mediating impacts, such as teacher well-being and self-compassion, and conduct a cost-effectiveness and cost-benefit analysis. Importantly, we want to explore the cost-benefit CREATE offers to local stakeholders, centering this work in the Atlanta community. This funding allows us to extend our evaluation through CREATE’s 10th cohort of residents, and to continue exploring the impact of CREATE on Cooperating Teachers and experienced educators in Atlanta Public Schools.

2023-02-06

Two New Studies Completed for the Regional Educational Laboratory (REL) Southwest

Student Group Differences in Arkansas Indicators of Postsecondary Readiness and Success

It is well documented that students from historically excluded communities face more challenges in school. They are often less likely to obtain postsecondary education and, as a result, see less upward social mobility. Educational researchers and practitioners have developed policies aimed at disrupting this cycle. However, an important factor necessary to make these policies work is the ability of school administrators to identify students who are at risk of not reaching certain academic benchmarks and/or who exhibit certain behavioral patterns that are correlated with future postsecondary success.

Arkansas Department of Education (ADE), like education authorities in many other states, is tracking K-12 students’ college readiness and enrollment and collecting a wide array of student progress indicators meant to predict their postsecondary success. A recent study by the Regional Educational Laboratory (REL) Southwest showed that a logistic regression model that uses a fairly small number of such indicators, measured as early as seventh or eighth grade, predicts with a high degree of accuracy whether students will enroll in college four or five years later (Hester et al., 2021). But does this predictive model – and the entire “early warning” system that could rely on it – work equally well for all student groups? In general, predictive models are designed to reduce average prediction error. So, when the dataset used for predictive modeling covers several substantially different populations, the models tend to make more accurate predictions for the largest subset and less accurate ones for the rest of the observations. In other words, if the sample your model relies on is mostly White, it will most accurately predict outcomes for White students. In addition, the predictive strength of some indicators may vary across student groups. In practice, this means that such a model may turn out to be less useful for forecasting the outcomes of the very students who should benefit most from it.

Researchers from Empirical Education and AIR teamed up to complete a study for REL Southwest that focuses on the differences in predictive strength and model accuracy across student groups. It was a massive analytical undertaking based on nine years of tracking two statewide cohorts of sixth graders: close to 80,000 records and hundreds of variables, including student characteristics (gender, race/ethnicity, eligibility for the National School Lunch Program, English learner student status, disability status, age, and district locale), middle and high school academic and behavioral indicators, and their interactions. First, we found that several student groups—including Black and Hispanic students, students eligible for the National School Lunch Program, English learner students, and students with disabilities—were substantially less likely (by 10 percentage points or more) to be ready for or enroll in college than students without these characteristics. However, our main finding, and a reassuring one, is that the model’s predictive power and the predictive strength of most indicators are similar across student groups. In fact, the model often does a better job predicting postsecondary outcomes for those student groups in most need of support.

Let’s talk about what “better” means in a study like this. It is fair to say that statistical model quality is seldom of particular interest in educational research and is often relegated to a footnote showing the value of R² (the proportion of variation in the outcome explained by the independent variables). It can tell us something about the amount of “noise” in the data, but it is hardly something that policy makers are normally concerned with. When the model’s ability to predict a binary outcome—whether or not the student went to college—is the primary concern, there is a clear need for an easily interpretable and actionable metric. We just need to know how often the model is likely to predict the future correctly based on current data.

Logistic regression, which is used for predicting binary outcomes, produces probabilities of outcomes. When the predicted probability (say, of college enrollment) is above fifty percent, we say the model predicts success (“yes, this student will enroll”); otherwise, it predicts failure (“no, they will not enroll”). When the actual outcomes are known, we can evaluate the accuracy of the model: counting the cases in which the predicted outcome coincides with the actual one and dividing by the total number of cases yields the overall model accuracy. Model accuracy is a useful metric that is typically reported in predictive studies with binary outcomes. We found, for example, that the model accuracy in predicting college persistence (students completing at least two years of college) is 70% when only middle school indicators are used as predictors, and it goes up to 75% when high school indicators are included. These statistics vary little across student groups, by no more than one or two percentage points. Although it is useful to know that outcomes two years after high school graduation can be predicted with decent accuracy as early as eighth grade, the ultimate goal is to ensure that students at risk of failure are identified while schools can still provide them with the necessary support. Unfortunately, overall model accuracy is not particularly helpful for that purpose.
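A minimal illustration of that calculation, with made-up predicted probabilities rather than the study’s actual model output:

```python
# A minimal sketch (hypothetical values): turn predicted probabilities from a
# logistic regression into binary predictions and compute overall accuracy.
predicted_probs = [0.82, 0.35, 0.64, 0.20, 0.55, 0.48, 0.91, 0.30]
actual = [1, 0, 1, 0, 0, 1, 1, 0]  # 1 = enrolled in college, 0 = did not

# Classify at the 50% threshold
predictions = [1 if p > 0.5 else 0 for p in predicted_probs]

# Share of cases where the predicted outcome matches the actual one
accuracy = sum(int(pred == obs) for pred, obs in zip(predictions, actual)) / len(actual)
print(f"Overall accuracy: {accuracy:.2f}")
```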

Instead, a metric called “model specificity” in the parlance of predictive analytics lets us view the data from a different angle. It is calculated as the proportion of correctly predicted negative outcomes alone, ignoring the positive ones. Model specificity turns out to vary a lot across student groups in our study, but the nature of this variation validates ADE’s system: for the student groups in most need of support, the specificity is higher than for the rest of the data. For some student groups, the model can detect that a student is not on track to postsecondary success with near certainty. For example, failure to attain college persistence is correctly predicted from middle school data in 91 percent of cases for English learner students, compared to 65 percent for non-English learner students. Adding high school data into the mix narrows the gap—to 88 versus 76 percent—but specificity is still higher for English learner students, and this pattern holds across all other student groups.
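And a correspondingly small sketch of specificity, again with made-up values: it restricts attention to the students whose actual outcome was negative and asks how often the model predicted that correctly.

```python
# A minimal sketch (hypothetical values) of model specificity: among cases
# whose actual outcome was negative, the share the model also predicted as negative.
predictions = [1, 0, 1, 0, 1, 0, 1, 0]  # model output at the 50% threshold
actual = [1, 0, 1, 0, 0, 1, 1, 0]       # 1 = persisted in college, 0 = did not

negatives = [(pred, obs) for pred, obs in zip(predictions, actual) if obs == 0]
specificity = sum(1 for pred, obs in negatives if pred == 0) / len(negatives)
print(f"Specificity: {specificity:.2f}")  # here, 3 of 4 actual negatives are correctly predicted
```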

The predictive model used in the ADE study can certainly power an efficient early warning system. However, we need to keep in mind what those numbers mean. For some students from historically excluded communities, their early life experiences create significant obstacles down the road. Some high schools are not doing enough to put these students on a new track that would ensure college enrollment and graduation. It is also worth noting that while this study provides evidence that ADE has developed an effective system of indicators, the observations used in the study come from two cohorts of students who were sixth graders in 2008–09 and 2009–10. Many socioeconomic conditions have changed since then. Thus, the only way to ensure that the models remain accurate is to move from isolated studies to building “live” predictive tools that update the models as soon as a new annual batch of outcome data becomes available.

Read the complete report, titled “Student Group Differences In Arkansas’ Indicators of Postsecondary Readiness and Success,” here.

Early Progress and Outcomes of a Grow Your Own Grant Program for High School Students and Paraprofessionals in Texas

Teacher shortages and high turnover are problems that rural schools face across the nation. Empirical Education researchers have contributed to the search for solutions to this problem several times in recent years, including two studies completed for REL Southwest (Sullivan et al., 2017; Lazarev et al., 2017). While much of the policy research is focused on ways to recruit and retain credentialed teachers, some states are exploring novel methods to create new pathways into the profession that would help build local teacher cadres. One such promising initiative is the Grow Your Own (GYO) program funded by the Texas Education Agency (TEA). Starting in 2019, TEA has provided grants to schools and districts that intend to expand the local teacher labor force through one or both of the following pathways. The first pathway offers high school students an early start in teacher preparation through a sequence of education and training courses. The second pathway aims to help paraprofessionals already employed by schools transition into teaching positions by covering tuition for credentialing programs, as well as offering a stipend for living expenses.

In a joint project with AIR, Empirical Education researchers explored the potential of the first pathway, for high school students, to address teacher shortages in rural communities and to increase the diversity of teachers. Since this is a new program, our study was based on the first two years of implementation. We found promising evidence that GYO can positively impact rural communities and increase teacher diversity. For example, GYO grants were allocated primarily to rural and small-town communities, and programs were implemented in smaller schools with higher percentages of Hispanic students and economically disadvantaged students. Participating schools also had higher enrollment in the teacher preparation courses. In short, GYO seems to be reaching rural areas with smaller and more diverse schools and is boosting enrollment in teacher preparation courses in these areas. However, we also found that fewer than 10% of students in participating districts completed at least one education and training course, and fewer than 1% of students completed the full sequence of courses. Additionally, White and female students are overrepresented in these courses. These and other preliminary results will help the state education agency fine-tune the program and work toward a successful final result: a greater number and increased diversity of effective teachers who are from the community in which they teach. We look forward to continuing research on the impact of “Grow Your Own.”

Read the complete report, titled “Early Progress and Outcomes of a Grow Your Own Grant Program for High School Students and Paraprofessionals in Texas,” here.

All research Empirical Education has conducted for REL Southwest can be found on our REL-SW webpage.

References

Hester, C., Plank, S., Cotla, C., Bailey, P., & Gerdeman, D. (2021). Identifying Indicators That Predict Postsecondary Readiness and Success in Arkansas. REL 2021-091. Regional Educational Laboratory Southwest. https://eric.ed.gov/?id=ED613040

Lazarev, V., Toby, M., Zacamy, J., Lin L., & Newman, D. (2017). Indicators of Successful Teacher Recruitment and Retention in Oklahoma Rural Schools (REL 2018–275). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest. https://ies.ed.gov/ncee/rel/Products/Publication/3872.

Sullivan, K., Barkowski, E., Lindsay, J., Lazarev, V., Nguyen, T., Newman, D., & Lin, L. (2017). Trends in Teacher Mobility in Texas and Associations with Teacher, Student, and School Characteristics (REL 2018–283). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest. https://ies.ed.gov/ncee/rel/Products/Publication/3883

2023-01-10

Evidentally Rejoins Empirical Education

On 10/31/18, we formed Evidentally from a set of projects that Empirical Education called Evidence as a Service (EaaS). The idea was to build a set of products that would automate much of the labor-intensive portions of the research process, e.g., statistical analysis, data cleaning, and similar efforts. Building on education technology (edtech) efficiencies, particularly the collection of edtech usage data, would make it possible for non-researchers such as school administrators to conduct efficacy research.

By lowering the cost of research and making the research process easier, we could increase the number of valid studies that could be combined in a meta-analysis for generalizable results. The notion was that, as a product company, Evidentally could attempt to get investments unavailable to services companies such as Empirical Education. Unfortunately, for several reasons, Evidentally was unable to get that investment.

The intellectual property and projects of Evidentally have been returned to Empirical Education as of 12/7/22, and Evidentally (as its own entity) was dissolved. While the team is still committed to building the Evidence as a Service suite of tools, this work will be conducted as a project of Empirical Education, under the branding of the Evidentally Evidence Suite. The Evidentally product is just one piece of Empirical Education’s education evidence offerings for when an education application or curriculum needs evidence of efficacy meeting any of the Every Student Succeeds Act (ESSA) Tiers of evidence.

2022-12-21

Happy New Year from Empirical Education

To ring in the new year, we want to share this two-minute video with you. It comprises highlights from 2022 from each person on our team. We hope you like it. Cheers to a healthy and prosperous 2023!

My colleagues appear in this order in the video.

Happy New Year photo by Sincerely Media

2022-12-15

Studying the Impacts of CAPIT Reading: An Early Literacy Program in Oklahoma

Empirical Education’s Evidentally recently conducted a study to evaluate the impact of CAPIT Reading on student early literacy achievement. The study utilized a quasi-experimental comparison group design using data from 12 elementary schools in a suburban school district in Oklahoma during the 2019–20 school year.

CAPIT Reading is a comprehensive PK–2 literacy solution that includes a digital phonics curriculum and teacher professional development. The program is a teacher-led phonemic awareness and phonics curriculum that includes lesson plans, built-in assessments, and ongoing support.

Four schools used CAPIT to supplement their literacy instruction for kindergarten students (treatment group) while eight schools did not (comparison group). The study linked CAPIT usage data and district demographic and achievement data to estimate the impact of CAPIT on the Letter Word Sounds Fluency (LWSF) and Early Literacy Composite scores of the aimsweb reading assessment, administered by the district in August and January.

We found a positive impact of CAPIT Reading on student early reading achievement on the aimsweb assessment for kindergarten students. This positive impact was estimated at 4.4 test score points for the aimsweb Early Literacy Composite score (effect size = 0.17; p = 0.01) and 7.8 points for the LWSF score (effect size = 0.29; p < 0.001). This impact on the LWSF score is equivalent to a 29% increase in growth for the average CAPIT student from the fall to winter tests.

We found limited evidence of differential impact favoring student subgroups, meaning that this positive impact for CAPIT users generally did not vary according to student characteristics such as eligibility for free and reduced-price lunch, race, or gender. We did find that the impact on aimsweb overall was marginally greater for special education students by 4.9 points (p = 0.09) and that the impact on LWSF scores was marginally greater for English Language Learners by 7.4 points (p = 0.09). The impact of CAPIT Reading did not vary significantly across other student groups.

Read the CAPIT Reading Student Impact Report for more information on this early literacy research.

2022-12-07

SREE 2022 Annual Meeting

When I read the theme of the 2022 SREE Conference, “Reckoning to Racial Justice: Centering Underserved Communities in Research on Educational Effectiveness”, I was eager to learn more about the important work happening in our community. The conference made it clear that SREE researchers are becoming increasingly aware of the need to swap individual-level variables for system-level variables that better characterize issues of systematic access and privilege. I was also excited that many SREE researchers are pulling from the fields of mixed methods and critical race theory to foster more equity-aligned study designs, such as those that center participant voice and elevate counter-narratives.

I’m excited to share a few highlights from each day of the conference.

Wednesday, September 21, 2022

Dr. Kamilah B. Legette, University of Denver

Dr. Kamilah B Legette masked and presenting at SREE

Dr. Kamilah B. Legette from the University of Denver discussed their research exploring the relationship between a student’s race and teacher perceptions of the student’s behavior as a) severe, b) inappropriate, and c) indicative of patterned behavior. In their study, 22 teachers were asked to read vignettes describing non-compliant student behaviors (e.g., disrupting storytime) where student identity was varied by using names that are stereotypically gendered and Black (e.g., Jazmine, Darnell) or White (e.g., Katie, Cody).

Multilevel modeling revealed that while student race did not predict teacher perceptions of behavior as severe, inappropriate, or patterned, students’ race was a moderator of the strength of the relationship between teachers’ emotions and perceptions of severe and patterned behavior. Specifically, the relationship between feelings of frustration and severe behavior was stronger for Black children than for White children, and the relationship between feelings of anger and patterned behavior showed the same pattern. Dr. Legette’s work highlighted a need for teachers to engage in reflective practices to unpack these biases.

Dr. Johari Harris, University of Virginia

In the same session, Dr. Johari Harris from the University of Virginia shared their work with the Children’s Defense Fund Freedom Schools. Learning for All (LFA), one Freedom School for students in grades 3-5, offers a five-week virtual summer literacy program with a culturally responsive curriculum based on developmental science. The program aims to create humanizing spaces that (re)define and (re)affirm Black students’ racial-ethnic identities, while also increasing students’ literacy skills, motivation, and engagement.

Dr. Harris’s mixed methods research found that students felt LFA promoted equity and inclusion, and reported greater participation, relevance, and enjoyment within LFA compared to in-person learning environments prior to COVID-19. They also felt their teachers were culturally engaging, and reported a greater sense of belonging, desire to learn, and enjoyment.

While it’s often assumed that young children of color are not fully aware of their racial-ethnic identity or how it is situated within a White supremacist society, Dr. Harris’s work demonstrated the importance of offering culturally affirming spaces to upper-elementary aged students.

Thursday, September 22, 2022

Dr. Krystal Thomas, SRI

Dr. Krystal Thomas presenting at SREE

On Thursday, I attended a talk by Dr. Krystal Thomas from SRI International about the potential of open education resource (OER) programming to further culturally responsive and sustaining practices (CRSP). Their team developed a rubric to analyze OER programming, including materials and professional development (PD) opportunities. The rubric combined principles of OER (free and open access to materials, student-generated knowledge) and CRSP (critical consciousness, student agency, student ownership, inclusive content, classroom culture, and high academic standards).

Findings suggest that while OER offers access to quality instructional materials, it does not necessarily develop teacher capacity to employ CRSP. The team also found that some OER developers charge for CRSP PD, which undermines a primary goal of OER (i.e., open access). One opportunity this talk provided was eventual access to a rubric to analyze critical consciousness in program materials and professional learning (Dr. Thomas said these materials will be posted on the SRI website in upcoming months). I believe this rubric may support equity-driven research and evaluation, including Empirical’s evaluation of the antiracist teacher residency program, CREATE (Collaboration and Reflection to Enhance Atlanta Teacher Effectiveness).

Dr. Rekha Balu, Urban Institute; Dr. Sean Reardon, Stanford University; Dr. Beth Boulay, Abt Associates

left to right: Dr. Beth Boulay, Dr. Rekha Balu, Dr. Sean Reardon, and Titilola Harley on stage at SREE

The plenary talk, featuring discussants Dr. Rekha Balu, Dr. Sean Reardon, and Dr. Beth Boulay, offered suggestions for designing equity- and action-driven effectiveness studies. Dr. Balu urged the SREE community to undertake “projects of a lifetime”. These are long-haul initiatives that push for structural change in search of racial justice. Dr. Balu argued that we could move away from typical thinking about race as a “control variable”, towards thinking about race as an experience, a system, and a structure.

Dr. Balu noted the necessity of mixed methods and participant-driven approaches to serve this goal. Along these same lines, Dr. Reardon felt we need to consider system-level inputs (e.g., school funding) and system-level outputs (e.g., rate of high school graduation) in order to understand disparities in opportunity, rather than just focusing on individual-level factors (e.g., teacher effectiveness, student GPA, parent involvement) that distract from larger forces of inequity. Dr. Boulay noted the importance of causal evidence to persuade key gatekeepers to pursue equity initiatives and called for more high quality measures to serve that goal.

Friday, September 23, 2022

The tone of the conference on Friday was to call people in (a phrase used in opposition to “call people out”, which is often ego-driven, alienating, and counter-productive to motivating change).

Dr. Ivory Toldson, Howard University

Dr. Ivory Toldson at a podium presenting at SREE

In the morning, I attended the Keynote Session by Dr. Ivory Toldson from Howard University. What stuck with me from Dr. Toldson’s talk was their argument that we tend to use numbers as a proxy for people in statistical models, but to avoid some of the racism inherent in our profession as researchers, we must see numbers as people. Dr. Toldson urged the audience to use people to understand numbers, not numbers to understand people. In other words, by deriving a statistical outcome, we do not necessarily know more about the people we study. However, we are equipped with a conversation starter. For example, if Dr. Toldson hadn’t invited Black boys to voice their own experience of why they sometimes struggle in school, they may have never drawn a potential link between sleep deprivation and ADHD diagnosis: a huge departure from the traditional deficit narrative surrounding Black boys in school.

Dr. Toldson also challenged us to consider what our choice in the reference group means in real terms. When we use White students as the reference group, we normalize Whiteness and we normalize groups with the most power. This impacts not only the conclusions we draw, but also the larger framework in which we operate (i.e., White = standard, good, normal).

I also appreciated Dr. Toldson’s commentary on the need for “distributive trust” in schools. They questioned why the people furthest from the students (e.g., superintendents, principals) are given the most power to name best practices, rather than empowering teachers to do what they know works best and to report back. This thought led me to wonder, what can we do as researchers to lend power to teachers and students? Not in a performative way, but in a way that improves our research by honoring their beliefs and first-hand experiences; how can we engage them as knowledgeable partners who should be driving the narrative of effectiveness work?

Dr. Deborah Lindo, Dr. Karin Lange, Adam Smith, EF+Math Program; Jenny Bradbury, Digital Promise; Jeanette Franklin, New York City DOE

Later in the day, I attended a session about building research programs on a foundation of equity. Folks from EF+Math Program (Dr. Deborah Lindo, Dr. Karin Lange, and Dr. Adam Smith), Digital Promise (Jenny Bradbury), and the New York City DOE (Jeanette Franklin) introduced us to some ideas for implementing inclusive research, including a) fostering participant ownership of research initiatives; b) valuing participant expertise in research design; c) co-designing research in partnership with communities and participants; d) elevating participant voice, experiential data, and other non-traditional effectiveness data (e.g., “street data”); and e) putting relationships before research design and outcomes. As the panel noted, racism and inequity are products of design and can be redesigned. More equitable research practices can be one way of doing that.

Saturday, September 24, 2022

Dr. Andrew Jaciw, Empirical Education

Dr. Andrew Jaciw at a podium presenting at SREE

On Saturday, I sat in on a session that included a talk given by my colleague Dr. Andrew Jaciw. Instead of relaying my own interpretation of Andrew’s ideas and the values they bring to the SREE community, I’ll just note that he will summarize the ideas and insights from his talk and subsequent discussion in an upcoming blog. Keep your eyes open for that!

See you next year!

Dr. Chelsey Nardi and Dr. Leanne Doughty

2022-11-29