Blog Posts and News Stories

Updated Research on the Impact of Alabama’s Math, Science, and Technology Initiative (AMSTI) on Student Achievement

We are excited to release the findings of a new round of work continuing our investigation of AMSTI, Alabama’s specialized training program for math and science teachers. The program began over 20 years ago and now reaches over 900 schools across the state. As the program constantly evolves to meet the demands of new standards and new assessment systems, the AMSTI team and the Alabama State Department of Education continue to support research to evaluate its impact. Our new report builds on the work undertaken last year to answer three new research questions.

  1. What is the impact of AMSTI on reading achievement? We found a positive impact of AMSTI for students on the ACT Aspire reading assessment equivalent to 2 percentile points. This replicates a finding from our earlier 2012 study. This analysis used students of AMSTI-trained science teachers, as the training purposely integrates reading and writing practices into the science modules.
  2. What is the impact of AMSTI on early-career teachers? We found positive impacts of AMSTI for partially trained math teachers and fully trained science teachers. The sample for this analysis comprised teachers in their first three years of teaching, with varying levels of AMSTI training.
  3. How can AMSTI continue program development to better serve ELL students? Our earlier work found a negative impact of AMSTI training for ELL students in science. Building upon these results, we were able to identify a small subset of “model ELL AMSTI schools” where AMSTI had a positive impact on ELL students and where that impact was larger than any school-level effect on ELL students relative to the entire sample. By looking at the site-specific best practices these schools use to support ELL students in science and across the board, the AMSTI team can start to incorporate those strategies into the program at large.

All research Empirical Education has conducted on AMSTI can be found on our AMSTI webpage.

2020-04-06

Report Released on the Effectiveness of SRI/CAST's Enhanced Units

Summary of Findings

Empirical Education has released the results of a semester-long randomized experiment on the effectiveness of SRI/CAST’s Enhanced Units (EU). This study was conducted in cooperation with one district in California and two districts in Virginia, and was funded through a competitive Investing in Innovation (i3) grant from the U.S. Department of Education. EU combines research-based content enhancement routines, collaboration strategies, and technology components for secondary history and biology classes. The goal of the grant is to improve student content learning and higher-order reasoning, especially for students with disabilities. EU was developed during a two-year design-based implementation process in which teachers and administrators co-designed the units with the developers.

The evaluation employed a group randomized controlled trial in which classes were randomly assigned within teachers to receive the EU curriculum or to continue with business as usual. All teachers were trained in Enhanced Units. Overall, the study involved three districts, five schools, 13 teachers, 14 randomized blocks, and 30 classes (15 in each condition, with 18 in biology and 12 in U.S. History). This was an intent-to-treat design: impact estimates were generated by comparing average student outcomes for classes randomly assigned to the EU group with average student outcomes for classes assigned to the control group, regardless of the level of participation in, or teacher implementation of, EU instructional approaches after random assignment.
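
For a concrete picture of what such an intent-to-treat estimate can look like, here is a minimal sketch in Python, assuming hypothetical file, variable, and column names; it is not the study’s actual analysis code.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical student-level analysis file: one row per student, with the
    # student's class, the teacher-level randomization block, the assigned
    # condition, and pre/post scores.
    df = pd.read_csv("eu_student_data.csv")

    # Intent-to-treat: model the assigned condition (not actual participation),
    # with fixed effects for randomization blocks and a random intercept per class.
    model = smf.mixedlm(
        "posttest ~ assigned_to_EU + pretest + C(block)",
        data=df,
        groups=df["class_id"],
    ).fit()
    print(model.summary())  # the coefficient on assigned_to_EU is the ITT impact estimate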

Overall, we found a positive impact of EU on student learning in history, but not in biology or across the two domains combined. Within biology, we found a greater impact for the Evolution unit than for the Ecology unit. These findings support a theory developed by the program developers that EU works especially well with content that progresses in a sequential and linear way. We also found a positive differential effect favoring students with disabilities, which is an encouraging result given the goal of the grant.

Final Report of CAST Enhanced Units Findings

The full report for this study can be downloaded using the link below.

Enhanced Units final report

Dissemination of Findings

2023 Dissemination

In April 2023, the U.S. Department of Education’s Office of Innovation and Early Learning Programs (IELP) within the Office of Elementary and Secondary Education (OESE) compiled cross-project summaries of completed Investing in Innovation (i3) and Education Innovation and Research (EIR) projects. Our CAST Enhanced Units study is included in one of the cross-project summaries. Read the 16-page summary using the link below.

Findings from Projects with a Focus on Serving Students with Disabilities

2020 Dissemination

Hannah D’Apice presented these findings at the 2020 virtual conference of the Society for Research on Educational Effectiveness (SREE) in September 2020. Watch the recorded presentation using the link below.

Symposium Session 9A. Unpacking the Logic Model: A Discussion of Mediators and Antecedents of Educational Outcomes from the Investing in Innovation (i3) Program

2019-12-26

Come and See Us in 2020

For the 13th consecutive year, we will be presenting research topics of interest at the annual meeting of the American Educational Research Association (AERA). This year, the meeting will be held in our very own San Francisco. Some of our presentation topics include: Strategies for Teacher Retention, Impact Evaluation of a Science Teacher Professional Learning Intervention, and Combining Strategic Instruction Model Routines with Technology to Improve Academic Outcomes for Students with Disabilities. We’ll also be making our first appearance at AERA’s sister conference, the National Council on Measurement in Education (NCME). Our topic will be connecting issues of measurement to the accuracy of impact estimates.

In addition to our numerous presentations at AERA and NCME, we will also be traveling to Washington DC in March to present at the annual conference of the Society for Research on Educational Effectiveness (SREE). We are included in three presentations as part of a symposium on Social and Emotional Learning in Educational Settings & Academic Learning, and we have one presentation and a poster that report the results of a randomized trial conducted as part of an i3 validation grant and that address methodological challenges we have faced in conducting RCTs generally. In all, we will be disseminating results from three i3 projects and discussing approaches to the technical challenges they raised. We have either presented at or attended the SREE conference for the past 14 years, and we look forward to the rich program that SREE is bound to put together for us in 2020.

We would be delighted to see you in either San Francisco or Washington DC. Please let us know if you plan to attend either conference.

2019-12-16

Findings from our Recent Research on Learning A-Z’s Raz-Plus

Learning A-Z contracted with Empirical Education to conduct a study on their personalized reading solution: Raz-Plus. In 2019, Raz-Plus was honored by SIIA with a CODiE Award in the category of Best Reading/Writing/Literature Instructional Solution for Grades PreK-8!

We are excited to release the results of our recent study of Raz-Plus in Milwaukee Public Schools. Raz-Plus is a literacy program that includes leveled books, skills practice, and digital activities and assessments.

The quasi-experimental study was conducted using data from the 2016-17 school year and examined the impact of Raz-Plus usage on student achievement for 3rd, 4th, and 5th grade students using the STAR Reading (STAR) assessment. Nearly 25,000 students across 120 schools in the district completed over 3 million Raz-Plus activities during the study year. There were three main findings from the study:

  1. STAR scores for students in classes of teachers who actively used Raz-Plus were better than those for comparison students. The result had an effect size of .083 (p < .01), which corresponds to a 3-percentile-point gain on the STAR test, adjusting for differences in student demographics and pretest between Raz-Plus and comparison students (see the conversion sketch after this list).
  2. The positive impact of Raz-Plus was replicated across many student subgroups, including Asian, African-American, and Hispanic students, as well as economically disadvantaged students and English Language Learners.
  3. Several Raz-Plus usage metrics were positively associated with STAR outcomes, most notably the number of quizzes assigned (p < .01). The average student could expect to see a 1-percentile-point gain in their STAR score for every 21 quizzes assigned.
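
For readers curious how an effect size maps onto the percentile-point gain reported in finding 1, here is a minimal sketch of the standard conversion, assuming approximately normally distributed scores.

    from scipy.stats import norm

    d = 0.083                        # reported effect size, in standard deviation units
    percentile = 100 * norm.cdf(d)   # where a student at the comparison median would land after a gain of d
    print(f"a student at the comparison median moves to percentile {percentile:.1f}, "
          f"a gain of about {percentile - 50:.1f} percentile points")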

This study adds to a growing body of evidence, from Milwaukee Public Schools and other districts around the country, demonstrating the effectiveness of Learning A-Z’s independent leveled curriculum products for literacy. You can download the report using the link below.

Read the summary and find a link to download the report here.

2019-12-05

The Power of Logic Models

The Texas Education Agency (TEA) has developed initiatives aimed at reducing the number of low-performing public schools in Texas. As part of Regional Educational Laboratory (REL) Southwest’s School Improvement Research Partnership (SWSI) with TEA and Texas districts, researchers from REL Southwest planned a series of logic model training sessions that support TEA’s school improvement programs, such as the System of Great Schools (SGS) Network initiative.

To ensure that programs are successful and on track, program developers often use logic models to deepen their understanding of the relationships among program components (that is, resources, activities, outputs, and outcomes) and how these interact over time. A logic model is a graphical depiction of the logical relationship among the resources, activities, and intended outcomes of a program, with a series of if-then statements connecting the components. The value of using a logic model to undergird programs is that it helps individuals and groups implementing an initiative to articulate the common goals of the effort. Additionally, it helps to ensure that the strategies, resources, and supports provided to key stakeholders are aligned with the articulated goals and outcomes, providing a roadmap that creates clear connections across these program components. Finally, over the course of implementation, logic models can facilitate decisionmaking about how to adjust implementation and make changes to the program that can be tested to ensure they align with overall goals and outcomes identified in the logic model.

The logic model training is designed to provide TEA with a hands-on experience to develop logic models for the state’s school improvement strategy. Another overarching goal is to build TEA’s capacity to support local stakeholders with the development of logic models for their school improvement initiatives aligned with Texas’s strategy and local context.

The first training session, titled “School Improvement Research Partnership: Using Logic Modeling for Statewide School Improvement Efforts,” was held earlier this year. SWSI partners focused on developing a logic model for the SGS initiative. It was an in-person gathering aimed at teaching participants how to create logic models by addressing the following:

  • Increasing knowledge of general concepts, purposes, and uses of logic models
  • Increasing knowledge and understanding of the components that make up a logic model
  • Building capacity in understanding links between components of school improvement initiatives
  • Providing hands-on opportunities to develop logic models for local school improvement initiatives

The timing of the logic model workshop was helpful because it allowed the district-focused SGS leaders at TEA to organize the developed SGS framework into a logic model that enables TEA to plan and guide implementation, lay the foundation for the development of an implementation rubric, and serve as a resource to continuously improve the strategy. TEA also plans to use the logic model to communicate with districts and other stakeholders about the sequence of the program and intended outcomes.

REL Southwest will continue to provide TEA with training and technical support and will engage local stakeholders as the logic models are finalized. These sessions will focus on refining the logic models and ensuring that TEA staff are equipped to develop logic models on their own for current and future initiatives and programs.

This blog post was co-published with REL Southwest.

2019-08-07

Conference Season 2019

Are you staying warm this winter? Can’t wait for spring? Neither can we, with spring conference season right around the corner! Find our Empirical team traveling bicoastally in the upcoming months.

We’re starting the season right in our backyard at the Bay Area Learning Analytics (BayLAN) Conference at Stanford University on March 2, 2019! CEO Denis Newman will be presenting on a panel on the importance of efficacy with Jeremy Roschelle of Digital Promise. Senior Research Scientist Valeriy Lazarev will also be attending the conference.

The next day, the team will be off to SXSW EDU in Austin, Texas! Our goal is to talk to people about the new venture, Evidentally.

Then we’re headed to Washington, D.C. to attend the annual Society for Research on Educational Effectiveness (SREE) Conference! Andrew Jaciw will be presenting “A Study of the Impact of the CREATE Residency Program on Teacher Socio-Emotional and Self-Regulatory Outcomes” on Friday, March 8, 2:30 PM - 4:00 PM, during the “Social and Emotional Learning in Education Settings” sessions in Ballroom 1. Denis will also be attending and, with Andrew, meeting with many research colleagues. If you can’t catch us in D.C., you can find Andrew back in the Bay Area at the sixth annual Carnegie Foundation Summit.

For the last leg of spring conferences, we’ll be back at the American Educational Research Association (AERA) Annual Meeting in Toronto, Canada from April 6th to 9th. There you’ll be able to hear more about the CREATE Teacher Residency Research Study, presented by Andrew Jaciw, joined by Vice President of Research Operations Jenna Zacamy and our new Research Manager, Audra Wingard. And for the first time in 10 years, you won’t find Denis at AERA… Instead he’ll be at the ASU GSV Summit in San Diego, California!

2019-02-12

Evidentally, a New Company Taking on Edtech Efficacy Analytics

Empirical Education has launched Evidentally, Inc., a new company that specializes in helping edtech companies and their investors make more effective products. Founded by Denis Newman, CEO, and Val Lazarev, Chief Product Architect, the company conducts rapid cycle evaluations that meet the federal Every Student Succeeds Act standards for moderate and promising evidence. The efficacy analytics leverage the edtech product’s usage metrics to efficiently identify states and districts with sufficient usage to make impact studies feasible. Evidentally is actively servicing and securing initial clients, and is seeking seed funding to prepare for expansion. In the meantime, the company is being incubated by Empirical Education, which has transferred intellectual property relating to its R&D prototypes of the service and is providing staffing through a services agreement. The Evidentally team will be meeting with partners and investors at SXSW EDU, EdSurge Immersion, ASU GSV Summit, and ISTE. Let’s talk!

2019-02-06

Research on AMSTI Presented to the Alabama State Board of Education

On January 13, Dr. Eric Mackey, Alabama’s new State Superintendent of Education, presented our rapid cycle evaluation of the Alabama Math, Science, and Technology Initiative (AMSTI) to the State Board of Education. The study is based on results for the 2016-17 school year, for which outcome data were available at the time the Alabama State Department of Education (ALSDE) contracted with Empirical in July 2018.

AMSTI is ALSDE’s initiative to improve math and science teaching statewide; the program, which started over 20 years ago, now operates in over 900 schools across the state.

Our current project, led by Val Lazarev, compares classes taught by teachers who were fully trained in AMSTI with matched classrooms taught by teachers with no AMSTI training. The overall results, shown in the graph above, were similar in magnitude to Empirical’s 2012 study directed by Denis Newman and designed by Empirical’s Chief Scientist, Andrew Jaciw. That cluster-randomized trial, which involved 82 schools and approximately 700 teachers, showed AMSTI had a small overall positive effect. The earlier study also showed that AMSTI may be exacerbating the achievement gap between black and white students. Since ALSDE was also interested in information that could improve AMSTI, the current study examined a number of subgroup impacts. In this project we did not find a difference between the value of AMSTI for black and white students. We did find a strong benefit for females in science. And for English learners, there was a negative effect of being in a science class of an AMSTI-trained teacher. The state board expressed concern and a commitment to using the results to guide improvement of the program.

Download both of the reports here.

2019-01-22

View from the West Coast: Relevance is More Important than Methodological Purity

Bob Slavin published a blog post in which he argues that evaluation research can be damaged by using the cloud-based data routinely collected by today’s education technology (edtech). We see serious flaws in this argument, which directly opposes the position we have taken in a number of papers and postings and discussed as part of the west coast conversations about education research policy: namely, that using the usage data routinely collected by edtech can greatly improve the relevance and usefulness of evaluations.

Bob’s argument is that if you use data collected during the implementation of the program to identify students and teachers who used the product as intended, you introduce bias. The case he is concerned with is a matched comparison study (or quasi-experiment) in which the researcher has to find students or classes that match the students using the edtech. The key point he makes is:

“students who used the computers [or edtech product being evaluated] were more motivated or skilled than other students in ways the pretests do not detect.”

That is, there is an unmeasured characteristic, let’s call it motivation, that both explains the student’s desire to use the product and explains why they did better on the outcome measure. Since the characteristic is not measured, you don’t know which students in the control classes have this motivation. If you select the matching students only on the basis of their having the same pretest level, demographics, and other measured characteristics but you don’t match on “motivation”, you have biased the result.

The first thing to note about this concern is that there may not be a factor such as motivation that explains both edtech usage and the favorable outcome. There is only a theoretical possibility that such a variable is driving the result. The bias may or may not be there, and to reject a method because of an unverifiable possibility of bias is an extreme move.
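
To make the concern concrete, here is a minimal simulation sketch, with entirely made-up numbers, of how an unmeasured trait that drives both usage and outcomes can bias an estimate that adjusts only for the observed pretest; whether anything like this operates in a real study is exactly the unverifiable question at issue.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    pretest = rng.normal(size=n)     # observed, available for matching
    motivation = rng.normal(size=n)  # unmeasured confounder
    # More motivated (and higher-pretest) students are more likely to use the product
    used_product = (0.5 * pretest + 0.5 * motivation + rng.normal(size=n)) > 0
    true_effect = 0.0                # in this sketch the product does nothing
    outcome = pretest + motivation + true_effect * used_product + rng.normal(size=n)

    # Adjust only for the observed pretest, as pretest-based matching would
    X = np.column_stack([np.ones(n), used_product, pretest])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    print(f"estimated 'effect' of usage (true value is 0): {coef[1]:.2f}")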

Second, it is interesting that he uses an example that seems concrete but is not at all specific to the bias mechanism he’s worried about.

“Sometimes teachers use computer access as a reward for good work, or as an extension activity, in which case the bias is obvious.”

This isn’t a problem of an unmeasured variable at all. The problem is that the usage didn’t cause the improvement; rather, the improvement caused the usage. This would be a problem even in a randomized “gold standard” experiment. The example makes it sound like the problem is “obvious” and concrete, when Bob’s concern is purely theoretical. This example is a good argument for having the kind of implementation analyses that ISTE is doing in its Edtech Advisor and that Jefferson Education Exchange has embarked on.

What is most disturbing about Bob’s blog post is that he makes a statement that is not supported by the ESSA definitions or U.S. Department of Education regulations or guidance. He claims that:

“In order to reach the second level (“moderate”) of ESSA or Evidence for ESSA, a matched study must do everything a randomized study does, including emphasizing ITT [Intent To Treat, i.e., using all students in the pre-identified schools or classes where administrators intended to use the product] estimates, with the exception of randomizing at the start.”

It is true that Bob’s own site, Evidence for ESSA, will not accept any study that does not follow the ITT protocol, but ESSA itself does not require that constraint.

Essentially, Bob is throwing away relevance to school decision-makers in order to maintain an unnecessary purity of research design. School decision-makers care whether the product is likely to work with their school’s population and available resources. Can it solve their problem (e.g., reduce achievement gaps among demographic categories) if they can implement it adequately? Disallowing efficacy studies that consider compliance with a pre-specified level of usage in selecting the “treatment group” is to throw out relevance in favor of methodological purity. Yes, there is a potential for bias, which is why ESSA considers matched-comparison efficacy studies to be “moderate” evidence. But school decisions aren’t made on the basis of which product has the largest average effect when all the non-users are included. A measure of subgroup differences, when the implementation is adequate, provides more useful information.

2018-12-27

Classrooms and Districts: Breaking Down Silos in Education Research and Evidence

I just got back from Edsurge’s Fusion conference. The theme, aimed at classroom and school leaders, was personalizing classroom instruction. This is guided by learning science, which includes brain development and the impact of trauma, as well as empathetic caregiving, as Pamela Cantor beautifully explained in her keynote. It also leads to detailed characterizations of learner variability being explored at Digital Promise by Vic Vuchic’s team, which is providing teachers with mappings between classroom goals and tools and strategies that can address learners who vary in background, cognitive skills, and socio-emotional character.

One of the conference tracks that particularly interested me was the workshops and discussions under “Research & Evidence”. Here is where I experienced a disconnect between Empirical’s research-policy-oriented work interpreting ESSA and Fusion’s focus on improving the classroom.

  • The Fusion conference is focused at the classroom level, where teachers along with their coaches and school leaders are making decisions about personalizing the instruction to students. They advocate basing decisions on research and evidence from the learning sciences.
  • Our work, also using research and evidence, has been focused on the school district level where decisions are about procurement and implementation of educational materials including the technical infrastructure needed, for example, for edtech products.

While the classroom and district levels have different needs and resources and look to different areas of scientific expertise, they need not form conceptual silos. But the differences need to be understood.

Consider the different ways we look at piloting a new product.

  • The Digital Promise edtech pilot framework attempts to move schools toward a more planful approach by getting them to identify and quantify the problem for which the product being piloted could be a solution. Success in the pilot classrooms is evaluated by the teachers, whose detailed understanding doesn’t call for statistical comparisons. Their framework points to tools such as the RCE Coach that can help with the statistics to support local decisions.
  • Our work looks at pilots differently. Pilots are excellent for understanding implementability and classroom acceptance (and working with developers to improve the product), but even with rapid cycle tools, the quantitative outcomes are usually not available in time for local decisions. We are more interested in how data can be accumulated nationally from thousands of pilots so that teachers and administrators can get information on which products are likely to work in their classrooms given their local demographics and resources. This is where review sites (like Edsurge product reviews or Noodle’s ProcureK12) could be enhanced with evidence about for whom, and under what conditions, the products work best. With over 5,000 edtech products, an initial filter to help choose what a school should pilot will be necessary.

A framework that puts these two approaches together is promulgated in the Every Student Succeeds Act (ESSA). ESSA defines four levels of evidence, based on the strength of the causal inference about whether the product works. More than just a system for rating the scientific rigor of a study, it is a guide to developing a research program with a basis in learning science. The base level says that the program must have a rationale. This brings us back to the Digital Promise edtech pilot framework needing teachers to define their problem. The ESSA level 1 rationale is what the pilot framework calls for. Schools must start thinking through what the problem is that needs to be solved and why a particular product is likely to be a solution. This base level sets up the communication between educators and developers about not just whether the product works in the classroom, but how to improve it.

The next level in ESSA, called “correlational,” is considered weak evidence, because it shows only that the product has “promise” and is worth studying with a stronger method. However, this level is far more useful as a way for developers to gather information about which parts of the program are driving student results, and which patterns of usage may be detrimental. Schools can see if there is an amount of usage that maximizes the value of the product (rather than depending solely on the developer’s rationale). This level 2 calls for piloting the program and examining quantitative results. To get correlational results, the pilot must have enough students and may require going beyond a single school. This is a reason that we usually look for a district’s involvement in a pilot.
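
As an illustration of the kind of level 2 analysis described above, here is a minimal sketch, with hypothetical file and column names, of relating an outcome to a usage metric and asking whether there is a usage level at which predicted gains peak.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical pilot data: one row per student with a pretest, a posttest,
    # and a usage metric (e.g., activities completed)
    df = pd.read_csv("pilot_usage_outcomes.csv")
    df["usage_sq"] = df["usage"] ** 2

    # Allow a curvilinear relationship so predicted gains can level off or peak
    model = smf.ols("posttest ~ pretest + usage + usage_sq", data=df).fit()
    print(model.summary())

    b1, b2 = model.params["usage"], model.params["usage_sq"]
    if b2 < 0:
        print(f"predicted outcomes peak at a usage level of about {-b1 / (2 * b2):.1f}")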

The top two levels in the ESSA scheme involve comparisons of students and teachers who use the product to those who do not. These are the levels where it begins to make sense to combine a number of studies of the same product from different districts in a statistical process called meta-analysis so we can start to make generalizations. At these levels, it is very important to look beyond just the comparison of the program group and the control group and gather information on the characteristics of schools, teachers, and students who benefit most (and least) from the product. This is the evidence of most value to product review sites.
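
As a sketch of what combining studies can look like, here is a minimal inverse-variance (fixed-effect) meta-analysis with made-up district-level effect sizes; a real synthesis would also model moderators such as the school, teacher, and student characteristics mentioned above.

    import numpy as np

    # Made-up results: one effect size and standard error per district study
    effect_sizes = np.array([0.08, 0.15, 0.02])
    std_errors = np.array([0.05, 0.09, 0.04])

    weights = 1.0 / std_errors**2  # precision (inverse-variance) weights
    pooled = np.sum(weights * effect_sizes) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    print(f"pooled effect = {pooled:.3f} (SE = {pooled_se:.3f})")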

When it comes to characterizing schools, teachers, and students, the “classroom” and the “district” approach have different, but equally important, needs.

  • The learner variability project has very fine-grained categories that teachers are able to establish for the students in their class.
  • For generalizable evidence, we need characteristics that are routinely collected by the schools. To make data analysis for efficacy studies a common occurrence, we have to avoid expensive surveys and testing of students that are used only for the research. Furthermore, the research community must reach consensus on a limited number of variables that will be used in research. Fortunately, another aspect of ESSA is the broadening of routine data collection for accountability purposes, so that information on improvements in socio-emotional learning or school climate will be usable in studies.

Edsurge and Digital Promise are part of a west coast contingent of researchers, funders, policymakers, and edtech developers that has been discussing these issues. We look forward to continuing this conversation within the framework provided by ESSA. When we look at the ESSA levels as not just vertical but building out from concrete classroom experience to more abstract and general results from thousands of school districts, then learning science and efficacy research are combined. This strengthens our ability to serve all students, teachers, and school leaders.

2018-10-08