Blog Posts and News Stories

AERA 2024 Annual Meeting

We had an inspiring trip to Philadelphia last month! The AERA conference theme was Dismantling Racial Injustice and Constructing Educational Possibilities: A Call to Action. We presented our latest research from the CREATE study, spent time with our CREATE partners, and attended several captivating sessions on topics including intersectionality, QuantCrit methodology, survey development, race-focused survey research, and SEL. We came away from the conference energized and eager to apply this new learning to our current studies, and we are already looking forward to AERA 2025!

Thursday, April 11, 2024

Kimberlé Crenshaw 2024 AERA Annual Meeting Opening Plenary—Fighting Back to Move Forward: Defending the Freedom to Learn In the War Against Woke

Kimberlé Crenshaw stands on stage delivering the opening plenary. Attendees fill the chairs in a large room, and some sit on the floor.

Kimberlé Crenshaw’s opening plenary explored the relationship between our education system and our democracy, including censorship issues and what Crenshaw describes as a “violently politicized nostalgia for the past.” She drew on her personal experience of recent years, during which she has watched terms that she coined, including “intersectionality,” being weaponized. She encouraged AERA attendees to fight against censorship in our institutions and suggested that attendees check out the African American Policy Forum (AAPF) and the Freedom to Learn Network. To learn more, check out Intersectionality Matters!, an AAPF podcast hosted by Kimberlé Crenshaw.

Friday, April 12, 2024

Reconciling Traditional Quantitative Methods With the Imperative for Equitable, Critical, and Ethical Research

Five panelists sit on stage with a projector screen to their right. The heading on the projector screen reads Dialogue with Parents. Eleven attendees are pictured in the audience.

We were particularly excited to attend a panel on Reconciling Traditional Quantitative Methods With the Imperative for Equitable, Critical, and Ethical Research, as our team has been diving into the QuantCrit literature and interrogating our own quantitative methodology in our evaluations. The panelists embraced quantitative research but emphasized that numbers are not neutral, and that the choices quantitative researchers make in their research design are critical to conducting equitable research.

Nichole M. Garcia (Rutgers University) discussed her book project on intersectionality. Nancy López (University of New Mexico) encouraged researchers to consider additional questions about “street race,” including “What race do you think others assume you are?”, to better understand the role that the social construction of race plays in participants’ experiences. Jennifer Randall (University of Michigan) encouraged researchers to administer justice-oriented assessments, emphasizing that assessments are not objective, but rather subjective tools that reflect what we value and have historically contributed to educational inequalities. Yasmiyn Irizarry (University of Texas at Austin) encouraged researchers to do the work of citing QuantCrit literature when reporting quantitative research. (Check out #QuantCritSyllabus for resources compiled by Yasmiyn Irizarry and other QuantCrit scholars.)

This panel gave us food for thought and pushed us to think through our own evaluation practices. As we look forward to AERA 2025, we hope to engage in conversations with evaluators on specific questions that come up in evaluation research, such as how to put What Works Clearinghouse (WWC) standards into conversation with QuantCrit methodology.

The Impact of the CREATE Residency Program on Early Career Teachers’ Well-Being

The Empirical Education team who presented at AERA in 2024.

Andrew Jaciw, Mayah Waltower, and Lindsay Maurer presented on The Impact of the CREATE Residency Program on Early Career Teachers’ Well-Being, focusing on our evaluation of the CREATE program. The CREATE Program at Georgia State University is a federally and philanthropically funded project that trains and supports educators across their career trajectory. In partnership with Atlanta Public Schools, CREATE includes a three-year residency model for prospective and early career teachers who are committed to reimagining classroom spaces for deep joy, liberation, and flourishing.

CREATE has been awarded several grants from the U.S. Department of Education, in partnership with Empirical Education as the independent evaluators. The grants include those from Investing in Innovation (i3), Education Innovation and Research (EIR), and Supporting Effective Educator Development (SEED). CREATE is currently recruiting the 10th cohort of residents.

During our presentation, we looked back on promising results from CREATE’s initial program model (2015–2019), shared recent results suggesting possible explanatory links between mediators and outcomes (2021–22), and discussed CREATE’s evolving program model and how to identify and align more relevant measures (2022–present).

The following are questions that we continue to ponder.

  • What additional considerations should we take into account when thinking about measuring the well-being of Black educators?
  • Certain measures of well-being, such as the Maslach Burnout Inventory for Educators, reflect a narrower definition of teacher well-being. Are there measures of teacher well-being that reflect the context of the school that teachers are in and/or that are more responsive to different educational contexts?
  • Are there culturally responsive measures of teacher well-being?
  • How can we measure the impacts of concepts relating to racial and social justice in the current political context?

Please reach out to us if you have any resources to share!

Survey Development in Education: Using Surveys With Students and Parents

Much of what I do as a Research Assistant at Empirical Education is to support the design and development of surveys, so I was excited to have the chance to attend this session! The authors’ presentations were all incredibly informative, but there were three in particular that I found especially relevant. The first was a paper presented by Jiusheng Zhu (Beijing Normal University) that analyzed the impact of “information nudges” on students’ academic achievement. This paper demonstrated how personalized, specific information nudges about short-term impacts can encourage students to modify their behavior.

Jin Liu (University of South Carolina) presented a paper on the development and validation of an ultra-short survey scale aimed at assessing the quality of life for children with autism. Through the use of network analysis and strength centrality estimations, the scale, known as Quality of Life for Children with Autism Spectrum Disorder (QOLASD-C3), was condensed to a much shorter version that targets specific dimensions of interest. I found this topic particularly interesting, as we are always refining our own survey development processes. Finding ways to boost response rates and minimize participant fatigue is crucial in ensuring the effectiveness of research efforts.

In the third paper, Jennifer Rotach and Davie Store (Kent ISD) demonstrated how demographics play a role in how students score on assessments. The authors explained that disaggregating the data is sometimes necessary to ensure that all students’ voices are heard: in many cases, school and district decisions are driven by average scores, often leading to the exclusion of those who are above or below the average. In such cases, disaggregating survey data by demographics (such as race, gender, or disability status) may uncover a different story than the “average” alone will tell.

— Mayah

Sunday, April 14, 2024

Conducting Race-Focused Survey Research in the P–20 System During the Anti-Woke Political Revolt

A presentation slide titled Researcher Positionality Conceptual Framework shows an image of a brain, with thought bubbles that say Researching the Self, Researching the Self in Relation to Others, Engaged Reflection and Representation, and Shifting from the Self to the System.

The four presentations in the symposium titled Conducting Race-Focused Survey Research in the P–20 System During the Anti-Woke Political Revolt focused on tensions, challenges, and problem-solving throughout the process of developing the Knowledge, Beliefs, and Mindsets (KBMs) about Equity in Educators and Educational Leadership Survey. On the CREATE project, where we are constantly working to improve our surveys and center racial equity in our work, we are wrestling with similar dilemmas around sociopolitical context. It was therefore eye-opening to hear the panelists talk through their decision-making across the entire survey development process. The North Carolina State We-LEED research team walked through their process step by step: from conceptualization to the grounding literature and conceptual framing, from instrument development to cognitive interviews, and from sample selection to recruitment strategies.

I particularly enjoyed hearing about cognitive interviews, in which researchers asked participants to voice their inner monologue while taking the survey so that the researchers could understand participant feedback and be responsive to participant needs. It was also very helpful to hear the panelists reflect on their positionality and how it connected to their research. I am eagerly anticipating reviewing this survey when it is finalized!

— Lindsay

Contemporary Approaches to Evaluating Universal School-Based Social Emotional Learning Programs: Effectiveness for Whom and How?

A screen projects a slide titled Contemporary Approaches to Evaluating SEL Programs. On the screen is a Venn diagram with three circles, labeled Skills-Based SEL, Adult Development SEL, and Justice Focused SEL. At the intersection of these three circles are bullet points with the words competencies, pedagogies, implementation, and outcomes.

I was excited to attend a session focused on Social Emotional Learning (SEL), a topic that directly relates to the projects I am currently involved in. The symposium featured four papers that all highlighted the importance of conducting high-quality evaluations of Universal School-Based (USB) SEL initiatives.

In the first paper, Christina Cipriano (Yale University) presented a meta-analysis of studies focusing on SEL. The meta-analysis showed that, among the studies reviewed, SEL programs delivered by teachers produced greater improvements in SEL skills than programs delivered by other providers. The paper also provided evidence that programs that taught intrapersonal skills before teaching interpersonal skills showed greater effectiveness.

The second paper was presented by Melissa Lucas (Yale University) and underscored the necessity of including multilingual students in USB SEL evaluations, emphasizing the importance of considering these students when designing and implementing interventions.

Cheyeon Ha (Yale University) presented recommendations from the third paper, which underscored this point for me. The third paper was a meta-analysis of USB SEL studies in the U.S., and it showed that fewer than 15% of the studies it reviewed included student English Language Learner (ELL) status. Because students with different primary languages may respond to SEL interventions differently, understanding how these programs work for students based on ELL status is an important part of understanding an SEL program.

The final paper (presented by Christina Cipriano) provided methodological guidance, which I found particularly intriguing and thought-provoking. It highlighted the importance of utilizing mixed methods research, advocating for open data practices, and ensuring data accessibility and transparency for a wide range of stakeholders.

As we continue to work on projects aimed at implementing SEL and enhancing students’ social-emotional skills, the insights shared in this symposium will undoubtedly prove valuable in our efforts to conduct high-quality evaluations of SEL programs.

— Mayah

2024-05-30

The Rebel Alliance is Growing

The rebellion against the old NCLB way of doing efficacy research is gaining force. A growing community of edtech developers, funders, researchers, and school users has been meeting in an attempt to reach consensus on an alternative built on ESSA.

This effort is being helped by openness at IES about the directions it is currently pursuing. In fact, we are moving into a new phase marked by two-way communication with the regime. While the rebellion hasn’t yet handed over its lightsabers, it is encouraged by the level of interest from prominent researchers.

From these ongoing discussions, there have been some radical suggestions inching toward consensus. A basic idea now being questioned is this:

The difference between the average of the treatment group and the average of the control group is a valid measure of effectiveness.

There are two problems with this:

  1. In schools, there’s no “placebo,” that is, something that looks like a useful program but is known to have zero effectiveness. Whatever is going on in the schools, or classes, or with the teachers and students in the control condition has some usefulness or effectiveness. The activities in the control classes or schools may be more useful or less useful than the activities being evaluated in the study. The study may find that the “effectiveness” of the activities being studied is positive, negative, or too small to be discerned statistically. In any case, the size (negative or positive) of the effect is determined as much by what’s being done in the control group as by what’s being done in the treatment group.
  2. Few educational activities have the same level of usefulness for all teachers and students. Looking only at the average will obscure the differences. For example, we ran a very large study for the U.S. Department of Education of a STEM program where we found that, on average, the program was effective. What the department didn’t report was that it only worked for the white kids, not the Black kids. The program widened, rather than reduced, the existing achievement gap. If you are considering adopting this STEM program, the impact on the different subgroups is relevant: a high-minority school district may want to avoid it. Also, to make the program better, the developers need to know where it works and where it doesn’t. Again, the average impact is not just uninformative; it can be misleading. The sketch below illustrates how a positive overall average can hide very different effects for two subgroups.
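
As a minimal illustration (in Python, with invented numbers rather than data from the study described above), here is how a positive overall difference can coexist with essentially no effect, or a negative effect, for one subgroup:

```python
import pandas as pd

# Hypothetical student scores; the numbers are invented for illustration only.
df = pd.DataFrame({
    "group":    ["treatment"] * 6 + ["control"] * 6,
    "subgroup": ["A", "A", "A", "B", "B", "B"] * 2,
    "score":    [80, 82, 84, 70, 71, 72,   # treatment students
                 74, 75, 76, 71, 72, 73],  # control students
})

# Overall "effect": difference between treatment and control means
overall = (df[df.group == "treatment"].score.mean()
           - df[df.group == "control"].score.mean())

# Subgroup effects: the same difference computed within each subgroup
means = df.groupby(["subgroup", "group"]).score.mean().unstack()
subgroup_effects = means["treatment"] - means["control"]

print(f"Overall effect: {overall:+.1f}")   # +3.0
print(subgroup_effects)                    # subgroup A: +7.0, subgroup B: -1.0
```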

A solution to the overuse of the average difference from studies is to conduct a lot more studies. The price ED paid for our large study could have paid for 30 studies of the kind we are now conducting of the same program in the same state, each completed in about 10% of the time of the original study. If we had 10 different studies of each program, conducted in different school districts with different populations and levels of resources, the “average” across these studies would start to make sense. Importantly, the average across these 10 studies for each of the subgroups would give a valid picture of where, how, and with which students and teachers the program tends to work best. This kind of averaging is called meta-analysis; it combines the many small differences found across studies, building on the power of each study to generate reliable findings.
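
As a rough sketch of this kind of averaging (not the actual models we use, and with invented numbers), a simple fixed-effect meta-analysis pools each study’s effect estimate, weighting by the inverse of its variance so that more precise studies count more:

```python
import numpy as np

# Hypothetical effect sizes and standard errors from ten small studies of the
# same program run in different districts (all numbers are invented).
effects = np.array([0.25, 0.10, -0.05, 0.30, 0.15, 0.05, 0.20, 0.00, 0.12, 0.18])
ses     = np.array([0.12, 0.15,  0.10, 0.20, 0.14, 0.11, 0.13, 0.16, 0.12, 0.15])

weights   = 1.0 / ses**2                      # inverse-variance weights
pooled    = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))    # standard error of the pooled estimate

print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
# The same pooling can be done within subgroups across studies to see where,
# and for whom, the program tends to work best.
```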

If developers or publishers of the products being used in schools took advantage of their hundreds of implementations to gather data, and if schools were prepared to share student data for this research, we could have research findings that both help schools decide what will likely work for them and help developers improve their products.

2018-09-21

New Project with ALSDE to Study AMSTI

Empirical Education is excited to announce a new study of the Alabama Math, Science, and Technology Initiative (AMSTI), commissioned by the Alabama legislature. AMSTI is the Alabama State Department of Education’s initiative to improve math and science teaching statewide. The program, which started over 20 years ago, operates in over 900 schools across the state and has been validated by many external evaluators.

Researchers here at Empirical Education, directed by Chief Scientist Andrew Jaciw, published a study of AMSTI in 2012. The cluster randomized controlled trial (CRCT) involved 82 schools and approximately 700 teachers. It assessed the efficacy of AMSTI over a three-year period and showed an overall positive effect (Newman et al., 2012).

The new study that we are embarking on will use a quasi-experimental matched comparison group design, taking advantage of existing data available from the Alabama State Department of Education and the AMSTI program. By comparing schools using AMSTI to matched schools not using AMSTI, we can estimate the impact of the program on math and science achievement for students in grades 3 through 8. Our report will also include differential impacts of the program on important student subgroups. Using Improvement Science principles, we will examine the school climate conditions under which the program has a greater or reduced impact.
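
As a simplified sketch of the general approach (not our actual analysis plan; the variable names and data below are hypothetical), schools using the program can be matched to non-participating schools with similar prior achievement and demographics:

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical school-level data: prior achievement, percent free/reduced lunch,
# and a flag for whether the school uses the program.
schools = pd.DataFrame({
    "school_id":    range(8),
    "uses_program": [1, 1, 1, 0, 0, 0, 0, 0],
    "prior_math":   [0.2, -0.5, 0.9, 0.1, -0.4, 1.0, -1.2, 0.3],
    "pct_frl":      [55, 80, 30, 60, 78, 28, 90, 50],
})

covars = ["prior_math", "pct_frl"]
# Standardize covariates so each contributes comparably to the matching distance
z = (schools[covars] - schools[covars].mean()) / schools[covars].std()

treated    = schools[schools.uses_program == 1]
comparison = schools[schools.uses_program == 0]

# For each participating school, find the most similar non-participating school
nn = NearestNeighbors(n_neighbors=1).fit(z.loc[comparison.index])
_, idx = nn.kneighbors(z.loc[treated.index])
matches = comparison.iloc[idx.ravel()]

print(pd.DataFrame({
    "treated_school": treated.school_id.values,
    "matched_school": matches.school_id.values,
}))
# Outcomes for the matched pairs would then be compared to estimate program
# impact, overall and for student subgroups.
```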

At the conclusion of the study, we will distribute the report to select committees of the Alabama state legislature, the Governor and the Alabama State Board of Education, and the Alabama State Department of Education. Empirical Education researchers will travel to Montgomery, AL to present the study findings and recommendations for improvement to the Alabama legislature.

2018-07-13

A Rebellion Against the Current Research Regime

Finally! There is a movement to make education research more relevant to educators and edtech providers alike.

At various conferences, we’ve been hearing about a rebellion against the “business as usual” of research, which fails to answer the question, “Will this product work in this particular school or community?” For educators, the motive is to find edtech products that best serve their students’ unique needs. For edtech vendors, the issue is whether research can be cost-effective while still identifying a product’s impact and helping to maximize product/market fit.

The “business as usual” approach against which folks are rebelling is that of the U.S. Education Department (ED). We’ll call it the regime. As established by the Education Sciences Reform Act of 2002 and the Institute of Education Sciences (IES), the regime anointed the randomized controlled trial (or RCT) as the gold standard for demonstrating that a product, program, or policy caused an outcome.

Let us illustrate two ways in which the regime fails edtech stakeholders.

First, the regime is concerned with the purity of the research design, but not whether a product is a good fit for a school given its population, resources, etc. For example, in an 80-school RCT that the Empirical team conducted under an IES contract on a statewide STEM program, we were required to report the average effect, which showed a small but significant improvement in math scores (Newman et al., 2012). The table on page 104 of the report shows that while the program improved math scores on average across all students, it didn’t improve math scores for minority students. The graph that we provide here illustrates the numbers from the table and was presented later at a research conference.

bar graph representing math, science, and reading scores for minority vs non-minority students

IES had reasons, couched in experimental design, for downplaying anything but the primary, average finding; however, this ignores the needs of educators with large minority student populations, as well as those of edtech vendors that wish to better serve minority communities.

Our RCT was also expensive and took many years, which illustrates the second failing of the regime: conventional research is too slow for fast-moving edtech development cycles, and too expensive to conduct enough research to address the thousands of products out there.

These issues of irrelevance and impracticality were highlighted last year in an “academic symposium” of 275 researchers, edtech innovators, funders, and others convened by the organization now called Jefferson Education Exchange (JEX). A popular rallying cry coming out of the symposium is to eschew the regime’s brand of research and begin collecting product reviews from front-line educators. This would become a Consumer Reports for edtech. Factors associated with differences in implementation are cited as a major target for data collection. Bart Epstein, JEX’s CEO, points out: “Variability among and between school cultures, priorities, preferences, professional development, and technical factors tend to affect the outcomes associated with education technology. A district leader once put it to me this way: ‘a bad intervention implemented well can produce far better outcomes than a good intervention implemented poorly’.”

Here’s why the Consumer Reports idea won’t work. Good implementation of a program can translate into gains on outcomes of interest, such as improved achievement, fewer discipline referrals, and better staff retention, but only if the program is effective. Evidence that the product caused a gain on the outcome of interest is needed; otherwise, all you are measuring is ease of implementation and student engagement. You wouldn’t know whether the teachers and students were wasting their time with a product that doesn’t work.

We at Empirical Education are joining the rebellion. The guidelines for research on edtech products that we recently prepared for the industry and made available here are a step toward showing an alternative to the regime while adopting important advances in the Every Student Succeeds Act (ESSA).

We share the basic concern that established ways of conducting research do not answer the basic question that educators and edtech providers have: “Is this product likely to work in this school?” But we have a different way of understanding the problem. From years of working on federal contracts (often as a small business subcontractor), we understand that ED cannot afford to oversee a large number of small contracts. When there is a policy or program to evaluate, they find it necessary to put out multi-million-dollar, multi-year contracts. These large contracts suit university researchers, who are not in a rush, and large research companies that have adjusted their overhead rates and staffing to perform on these contracts. As a consequence, the regime becomes focused on perfection in the design, conduct, and reporting of the single study that is intended to give the product, program, or policy a thumbs-up or thumbs-down.

photo of students in a classroom on computers

There’s still a need for a causal research design that can show how conditions such as resources, demographics, or teacher effectiveness alter a program’s impact on the educational outcomes of interest. In research terminology, these conditions are called “moderators,” and most causal study designs can measure their influence.
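
As a minimal sketch of what measuring a moderator can look like (with simulated data and hypothetical variable names, not the model from any study mentioned above), an interaction term in a regression lets the estimated program effect differ across a subgroup:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated student-level data in which the program helps one subgroup more;
# all numbers are invented for illustration.
rng = np.random.default_rng(0)
n = 400
treatment = rng.integers(0, 2, n)
minority  = rng.integers(0, 2, n)
score = (50 + 3 * treatment - 2 * minority
         - 3 * treatment * minority          # smaller effect for minority students
         + rng.normal(0, 5, n))
df = pd.DataFrame({"score": score, "treatment": treatment, "minority": minority})

# The treatment:minority coefficient estimates how much the program's effect
# differs for minority students relative to non-minority students.
model = smf.ols("score ~ treatment * minority", data=df).fit()
print(model.params[["treatment", "treatment:minority"]])
# A real cluster-randomized study would also account for the clustering of
# students within schools (e.g., cluster-robust standard errors or multilevel models).
```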

The rebellion should be driving an increase in the number of studies by lowering their cost and turnaround time. Given our recent experience with studies of edtech products, this reduction can reach a factor of 100: instead of one study that costs $3 million and takes 5 years, think in terms of a hundred studies that cost $30,000 each and are completed in less than a month. If 5 to 10 studies are combined for each product, they would provide enough variation, and enough students and schools, to detect differences across kinds of schools, kinds of students, and patterns of implementation, so as to find where the product works best. As each new study is added, our understanding of how the product works and with whom improves.

It won’t be enough to have reviews of product implementation. We need an independent measure of whether—when implemented well—the intervention is capable of a positive outcome. We need to know that it can make (i.e., cause) a difference AND under what conditions. We don’t want to throw out research designs that can detect and measure effect sizes, but we should stop paying for studies that are slow and expensive.

Our guidelines for edtech research detail multiple ways that edtech providers can adapt research to better work for them, especially in the era of ESSA. Many of the key recommendations are consistent with the goals of the rebellion:

  • The usage data collected by edtech products from students and teachers gives researchers very precise information on how well the program was implemented in each school and class. It identifies the schools and classes where implementation met the threshold for which the product was designed. This is a key to lowering cost and turn-around time.
  • ESSA offers four levels of evidence that form a developmental sequence. The base level is grounded in existing learning science and provides a rationale for why a school should try the product. The next level looks for a correlation between an important element in the rationale (measured through usage of that part of the product) and a relevant outcome. This is accepted by ESSA as evidence of promise, informs the developers about how the product works, and helps product marketing teams get the right fit to the market (a minimal sketch of this kind of correlational check follows this list).
  • The ESSA level that provides moderate evidence that the product caused the observed impact requires a comparison group matched to the students or schools that were identified as the users. The regime requires researchers to report only the difference between the user and comparison groups on average. Our guidelines insist that researchers must also estimate the extent to which an intervention is differentially effective for different demographic categories or implementation conditions.
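
Referring back to the correlational “evidence of promise” level mentioned above, here is a minimal sketch of that kind of check (the column names and data are hypothetical; a real analysis would also adjust for prior achievement and other covariates):

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical class-level records: usage of a key product feature (minutes per
# week) and an outcome of interest (growth on a relevant assessment).
classes = pd.DataFrame({
    "feature_minutes_per_week": [10, 35, 20, 50, 5, 40, 25, 60, 15, 45],
    "assessment_growth":        [1.0, 2.5, 1.8, 3.0, 0.5, 2.2, 2.0, 3.4, 1.2, 2.8],
})

r, p = pearsonr(classes["feature_minutes_per_week"], classes["assessment_growth"])
print(f"correlation r = {r:.2f}, p = {p:.3f}")
# A positive, statistically reliable correlation between usage of the element
# named in the product's rationale and the outcome is the kind of pattern ESSA
# treats as evidence of promise; it does not by itself establish causation.
```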

From the point of view of the regime, nothing in these guidelines actually breaks the rules and regulations of ESSA’s evidence standards. Educators, developers, and researchers should feel empowered to collect data on implementation, calculate subgroup impacts, and use their own data to generate evidence sufficient for their own decisions.

A version of this article was published in the Edmarket Essentials magazine.

2018-05-09

Recognizing Success

When the Obama-Duncan administration approaches teacher evaluation, the emphasis is on recognizing success. We heard that clearly in Arne Duncan’s comments on the release of teacher value-added modeling (VAM) data for LA Unified by the LA Times. He’s quoted as saying, “What’s there to hide? In education, we’ve been scared to talk about success.” Since VAM is often thought of as a method for weeding out low-performing teachers, Duncan’s statement referencing success casts the use of VAM in a more positive light. Therefore we want to raise the issue here: how do you know when you’ve found success? The general belief is that you’ll recognize it when you see it. But sorting through a multitude of variables is not a straightforward process, and that’s where research methods and statistical techniques can be useful. Below we illustrate how this plays out in teacher evaluation and in program evaluation.

As we report in our news story, Empirical is participating in the Gates Foundation project called Measures of Effective Teaching (MET). This project is known for its focus on value-added modeling (VAM) of teacher effectiveness. It is also known for having collected over 10,000 videos from over 2,500 teachers’ classrooms—an astounding accomplishment. Research partners from many top institutions hope to be able to identify the observable correlates for teachers whose students perform at high levels as well as for teachers whose students do not. (The MET project tested all the students with an “alternative assessment” in addition to using the conventional state achievement tests.) With this massive sample that includes both data about the students and videos of teachers, researchers can identify classroom practices that are consistently associated with student success. Empirical’s role in MET is to build a web-based tool that enables school system decision-makers to make use of the data to improve their own teacher evaluation processes. Thus they will be able to build on what’s been learned when conducting their own mini-studies aimed at improving their local observational evaluation methods.

When the MET project recently had its “leads” meeting in Washington DC, the assembled group of researchers, developers, school administrators, and union leaders were treated to an after-dinner speech and Q&A by Joanne Weiss. Joanne is now Arne Duncan’s chief of staff, after having directed the Race to the Top program (and before that was involved in many Silicon Valley educational innovations). The approach of the current administration to teacher evaluation—emphasizing that it is about recognizing success—carries over into program evaluation. This attitude was clear in Joanne’s presentation, in which she declared an intention to “shine a light on what is working.” The approach is part of their thinking about the reauthorization of ESEA, where more flexibility is given to local decision-makers to develop solutions, while the federal legislation is more about establishing achievement goals such as being the leader in college graduation.

Hand in hand with providing flexibility to find solutions, Joanne also spoke of the need to build “local capacity to identify and scale up effective programs.” We welcome the idea that school districts will be free to try out good ideas and identify those that work. This kind of cycle of continuous improvement is very different from the idea, incorporated in NCLB, that researchers will determine what works and disseminate these facts to the practitioners. Joanne spoke about continuous improvement, in the context of teachers and principals, where on a small scale it may be possible to recognize successful teachers and programs without research methodologies. While a teacher’s perception of student progress in the classroom may be aided by regular assessments, the determination of success seldom calls for research design. We advocate for a broader scope, and maintain that a cycle of continuous improvement is just as much needed at the district and state levels. At those levels, we are talking about identifying successful schools or successful programs where research and statistical techniques are needed to direct the light onto what is working. Building research capacity at the district and state level will be a necessary accompaniment to any plan to highlight successes. And, of course, research can’t be motivated purely by the desire to document the success of a program. We have to be equally willing to recognize failure. The administration will have to take seriously the local capacity building to achieve the hoped-for identification and scaling up of successful programs.

2010-11-18