Blog Posts and News Stories

View from the West Coast: Relevance is More Important than Methodological Purity

Bob Slavin published a blog post in which he argues that evaluation research can be damaged by using the cloud-based data routinely collected by today’s education technology (edtech). We see serious flaws in this argument, which directly opposes the position we have taken in a number of papers and postings and have discussed as part of the west coast conversations about education research policy. Namely, we’ve argued that the usage data routinely collected by edtech can greatly improve the relevance and usefulness of evaluations.

Bob’s argument is that if you use data collected during the implementation of the program to identify students and teachers who used the product as intended, you introduce bias. The case he is concerned with is a matched comparison study (or quasi-experiment), in which the researcher has to find the right comparison students or classes to match to those using the edtech. The key point he makes is:

“students who used the computers [or edtech product being evaluated] were more motivated or skilled than other students in ways the pretests do not detect.”

That is, there is an unmeasured characteristic, let’s call it motivation, that explains both the students’ desire to use the product and why they did better on the outcome measure. Since the characteristic is not measured, you don’t know which students in the control classes have this motivation. If you select the matching students only on the basis of their having the same pretest level, demographics, and other measured characteristics, but you don’t match on “motivation”, you have biased the result.

The first thing to note about this concern is that there may not be a factor such as motivation that explains both edtech usage and the favorable outcome. There is only a theoretical possibility that such a variable is driving the result. The bias may or may not be there, and rejecting a method because there is an unverifiable possibility of bias is an extreme move.
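
To make the mechanism concrete, here is a minimal simulation sketch of our own (the variable names and numbers are invented for illustration and are not taken from any study). It matches users to non-users on the pretest alone and shows that the estimate recovers the true effect when no hidden “motivation” factor drives usage, and overstates it when one does.

```python
# Illustrative simulation of the unmeasured-confounder concern (all numbers invented).
import numpy as np

rng = np.random.default_rng(0)
n = 20000
true_effect = 5.0  # points actually gained by using the product

def matched_estimate(confounding):
    """Match users to non-users within pretest bins; return the estimated usage effect."""
    motivation = rng.normal(0, 1, n)        # unmeasured characteristic
    pretest = rng.normal(50, 10, n)         # measured; used for matching
    # Usage depends on motivation only when confounding > 0
    used = rng.random(n) < 1 / (1 + np.exp(-confounding * motivation))
    # Outcome reflects the pretest, the true usage effect, and the unmeasured motivation
    posttest = pretest + true_effect * used + 4.0 * motivation + rng.normal(0, 5, n)
    bins = np.digitize(pretest, np.arange(30, 71, 2))
    diffs, weights = [], []
    for b in np.unique(bins):
        in_bin = bins == b
        if used[in_bin].sum() > 10 and (~used[in_bin]).sum() > 10:
            diffs.append(posttest[in_bin & used].mean() - posttest[in_bin & ~used].mean())
            weights.append(in_bin.sum())
    return np.average(diffs, weights=weights)

print(f"No hidden motivation factor:   {matched_estimate(0.0):.1f}")  # close to 5.0
print(f"With hidden motivation factor: {matched_estimate(1.5):.1f}")  # noticeably above 5.0
```

The gap between the two estimates is the bias Bob is worried about; whether any particular study has such a hidden factor at work is an empirical question, not a certainty.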

Second, it is interesting that he uses an example that seems concrete but is not at all specific to the bias mechanism he’s worried about.

“Sometimes teachers use computer access as a reward for good work, or as an extension activity, in which case the bias is obvious.”

This isn’t a problem of an unmeasured variable at all. The problem is that the usage didn’t cause the improvement; rather, the improvement caused the usage. This would be a problem even in a randomized “gold standard” experiment. The example makes the problem sound “obvious” and concrete, when Bob’s concern is purely theoretical. It is, however, a good argument for the kind of implementation analyses that ISTE is doing in its Edtech Advisor and that the Jefferson Education Exchange has embarked on.

What is most disturbing about Bob’s blog post is that he makes a statement that is not supported by the ESSA definitions or U.S. Department of Education regulations or guidance. He claims that:

“In order to reach the second level (“moderate”) of ESSA or Evidence for ESSA, a matched study must do everything a randomized study does, including emphasizing ITT [Intent To Treat, i.e., using all students in the pre-identified schools or classes where administrators intended to use the product] estimates, with the exception of randomizing at the start.”

It is true that Bob’s own site, Evidence for ESSA, will not accept any study that does not follow the ITT protocol, but ESSA itself does not require that constraint.

Essentially, Bob is throwing away relevance to school decision-makers in order to maintain an unnecessary purity of research design. School decision-makers care whether the product is likely to work with their school’s population and available resources. Can it solve their problem (e.g., reduce achievement gaps among demographic categories) if they can implement it adequately? Disallowing efficacy studies that consider compliance with a pre-specified level of usage in selecting the “treatment group” is to throw out relevance in favor of methodological purity. Yes, there is a potential for bias, which is why ESSA considers matched-comparison efficacy studies to be “moderate” evidence. But school decisions aren’t made on the basis of which product has the largest average effect when all the non-users are included. A measure of subgroup differences, when the implementation is adequate, provides more useful information.
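
As a back-of-the-envelope illustration (the numbers are our own invention, not from any study), the sketch below shows how an ITT average that folds in non-implementing classrooms can understate the effect that matters to a school planning an adequate implementation.

```python
# Invented numbers illustrating how an ITT average dilutes the effect of interest
# for a school that expects to implement the product adequately.
share_adequate = 0.4     # fraction of assigned classrooms that used the product as intended
effect_adequate = 8.0    # test-score gain where implementation was adequate
effect_minimal = 0.5     # gain where the product sat largely unused

itt_average = share_adequate * effect_adequate + (1 - share_adequate) * effect_minimal
print(f"ITT average over all assigned classrooms: {itt_average:.1f} points")      # 3.5
print(f"Adequate-implementation subgroup:         {effect_adequate:.1f} points")  # 8.0
```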

2018-12-27

Updated Research Guidelines Will Improve Education Technology Products and Provide More Value to Schools

Recommendations include 16 best practices for the design, implementation, and reporting of Usable Evidence for Educators

Palo Alto, CA (April 25, 2018) – Empirical Education Inc. and the Education Technology Industry Network (ETIN) of SIIA released an important update to the “Guidelines for Conducting and Reporting Edtech Impact Research in U.S. K-12 Schools” today.

Authored by Empirical Education researchers, Drs. Denis Newman, Andrew Jaciw, and Valeriy Lazarev, the Guidelines detail 16 best practices for the design, implementation, and reporting of efficacy research of education technology. Recommendations range from completing the product’s logic model before fielding it to disseminating a study’s results in accessible and non-technical language.

The Guidelines were first introduced in July 2017 at ETIN’s Edtech Impact Symposium to address the changing demand for research. They responded to new challenges driven by the accelerated pace of edtech development and product releases, the movement of new software to the cloud, and the passage of the Every Student Succeeds Act (ESSA). The authors committed to making regular updates to keep pace with technical advances in edtech and research methods.

“Our collaboration with ETIN brought the right mix of practical expertise to this important document,” said Denis Newman, CEO of Empirical Education and lead author of the Guidelines. “ETIN provided valuable expertise in edtech marketing, policy, and development. With over a decade of experience evaluating policies, programs, and products for the U.S. Department of Education, major research organizations, and publishers, Empirical Education brought a deep understanding of how studies are traditionally performed and how they can be improved in the future. Our experience with our Evidence as a Service™ offering to investors and developers of edtech products also informed the guidelines.”

The current edition advocates for analysis of usage patterns in the data collected routinely by edtech applications. These patterns help to identify classrooms and schools with adequate implementation and lead to lower-cost, faster-turnaround research. So rather than investing hundreds of thousands of dollars in a single large-scale study, developers should consider multiple small-scale studies. The authors point to the advantages of subgroup analysis for better understanding how and for whom the product works best, thus more directly answering common educator questions. Issues with quality of implementation are addressed in greater depth, and the visual design of the Guidelines has been refined for improved readability.

“These guidelines may spark a rebellion against the research business as usual, which doesn’t help educators know whether an edtech product will work for their specific populations. They also provide a basis for schools and developers to partner to make products better,” said Mitch Weisburgh, Managing Partner of Academic Business Advisors, LLC and President of ETIN, who has moderated panels and webinars on edtech research.

Empirical Education, in partnership with a variety of organizations, is conducting webinars to help explain the updates to the Guidelines, as well as to discuss the importance of these best practices in the age of ESSA. The updated Guidelines are available here: https://www.empiricaleducation.com/research-guidelines/.

2018-04-25

A Conversation About Building State and Local Research Capacity

John Q Easton, director of the Institute of Education Sciences (IES), came to New Orleans recently to participate in the annual meeting of the American Educational Research Association. At one of his stops, he was the featured speaker at a meeting of the Directors of Research and Evaluation (DRE), an organization composed of school district research directors. (DRE is affiliated with AERA and was recently incorporated as a 501(c)(3)). John started his remarks by pointing out that for much of his career he was a school district research director and felt great affinity to the group. He introduced the directions that IES was taking, especially how it was approaching working with school systems. He spent most of the hour fielding questions and engaging in discussion with the participants. Several interesting points came out of the conversation about roles for the researchers who work for education agencies.

Historically, most IES research grant programs have been aimed at university or other academic researchers. It is noteworthy that even in a program for “Evaluation of State and Local Education Programs and Policies,” grants have been awarded only to universities and large research firms. There is no expectation that researchers working for the state or local agency would be involved in the research beyond the implementation of the program. The RFP for the next generation of Regional Education Labs (REL) contracts may help to change that. The new RFP expects the RELs to work closely with education agencies to define their research questions and to assist alliances of state and local agencies in developing their own research capacity.

Members of the audience noted that, as district directors of research, they often spend more time reviewing research proposals from students and professors at local colleges who want to conduct research in their schools than answering questions initiated by the district. Funded researchers treat the districts as the “human subjects,” paying incentives to participants and sometimes paying for data services. But the districts seldom participate in defining the research topic, conducting the studies, or benefiting directly from the reported findings. The new mission of the RELs to build local capacity will be a major shift.

Some in the audience pointed out reasons to be skeptical that this REL agenda would be possible. How can we build capacity if research and evaluation departments across the country are being cut? In fact, very little is known about the number of state or district practitioners whose capacity for research and evaluation could be built by applying the REL resources. (Perhaps, a good first research task for the RELs would be to conduct a national survey to measure the existing capacity.)

John made a good point in reply: IES and the RELs have to work with the district leadership, not just the R&E departments, to make this work. The leadership has to have a more analytic view. They need to see the value of having an R&E department that goes beyond test administration and is able to obtain evidence to support local decisions. Cultivating a research culture in the district would allow evaluation to be routinely built into new program implementations from the beginning. The value of the research would be demonstrated in the improvements resulting from informed decisions. Without a district leadership team that values research to find out what works for the district, internal R&E departments will not be seen as an important capacity.

Some in the audience pointed out that in parallel to building a research culture in districts, it will be necessary to build a practitioner culture among researchers. It would be straightforward for IES to require that research grantees and contractors engage the district R&E staff in the actual work, rather than just having them review the research plan and sign the FERPA agreement. Practitioners ultimately hold the expertise in how the programs and the research can be implemented successfully in the district; engaging them would improve the overall quality and relevance of the research.

2011-04-20