Blog Posts and News Stories

Doing Something Truly Original in the Music of Program Evaluation

Is it possible to do something truly original in science?

How about in Quant evaluations in the social sciences?

The operative word here is "truly". I have in mind contributions that are "outside the box".

I would argue that standard Quant provides limited opportunity for originality. Yet, QuantCrit forces us to dig deep to arrive at original solutions - to reinterpret, reconfigure, and in some cases reinvent Quant approaches.

That is, I contend that QuantCrit asks the kinds of questions that force us outside the box of conventional assumptions, toward instrumentation and solutions that are broader and better. Yet I qualify this by saying (and some will disagree) that doing so does not require us to give up the core assumptions at the foundation of Quant evaluation methods.

I find that developments and originality in jazz closely parallel what I have in mind in discussing the evolution of genres in Quant evaluations, and what it means to conceive of and address problems and opportunities outside the box. (You can skip this section and go straight to the final thoughts, but I would love to share my ideas with you here.)

An Analogy for Originality in the Artistry of Herbie Hancock

Last week I took my daughter, Maya, to see the legendary keyboardist Herbie Hancock perform live with Lionel Loueke, Terence Blanchard, and others. CHILLS along my spine is how I would describe it. I found myself fixating on Hancock’s hand movements on the keys, and how he swiveled between the grand piano and the KORG synthesizer, and asking: "The improvisation is on point all the time – how does he know to go right there?"

Hancock, winner of an Academy Award and 14 Grammys, is a (if not the) major force in the evolution of jazz over the last 60 years, up to the contemporary scene.

He got his main start in the 1960s as the pianist in Miles Davis's Second Great Quintet. (When Hancock was dispirited, Davis famously advised him, "Don't play the butter notes.") Check out the band's 1967 performance of Wayne Shorter's composition "Footprints" – note the symbiosis among the group and Hancock's respectful treatment of the melody.

In the 1970s, Hancock developed styles of jazz fusion and funk with the Headhunters (e.g., "Chameleon").

Then in the 1980s, Hancock explored electro styles, capped by the song "Rockit" – a smash that straddled jazz, pop, and hip-hop. It featured scratching and became a mainstay for breakdancing (in upper elementary school I co-created a truly amateurish school play that ended in an ensemble "Rockit" dance with the best breakdancers in our school). Here's Hancock's Grammy performance.

Below is a picture of Hancock from the other night with the strapped synth popularized through the song Rockit.

Hancock and Synth

Hancock did plenty more besides what I mention here, but I narrowed his contributions to just a couple to help make my point.

His direction, especially with funk fusion and "Rockit", ruffled the feathers of more than a few jazz purists. He did not mind. His response: "I have to be true to myself… it was something that I needed to do… because it takes courage to work outside the box… and yet, that's where the growth lies."

He also recognized that the need for progression was not just about satisfying his creative direction, but about keeping the audience listening – that is, keeping jazz alive and relevant. If someone asserts that "Rockit" was a betrayal of jazz that sacrilegiously crossed over into pop and hip-hop, I would counter that it opened up the world of jazz to a whole generation of pop listeners (including me). (I recognize similar developments in the recent genre-crossing works of Robert Glasper.)

Hancock is a perfect case study of an artist executing his craft (a) fearlessly, (b) not with the goal of pleasing everyone, (c) with the purpose of connecting with and reaching new audiences, (d) by being open to alternative influences, (e) to achieve a harmonious melodic fusion (moving between his KORG synth and a grand piano), and (f) with constant appreciation of, and reflection on, the roots and fundamentals.

Hancock and Band

Coming Back to the Idea of the Fusion of Quant with Quant Crit in Program Evaluation

Society today presents us with situations that require critical examination of how we use the instruments on which we are trained, and an audit of their effects, both intended and unintended. It also requires that we adapt the applications of methods we have honed for years. The contemporary situation poses the question: How can we expand the range of what we can do with these instruments, given the solutions that society needs today, recognizing that any application has social ramifications? I have in mind the need to prioritize problems of equity and social and racial justice. How do we look past conventional applications that limit the recognition, articulation, and development of solutions to important and vexing problems in society?

Rather than feeling powerless and overwhelmed, the Quant evaluator is very well positioned to do this work. I greatly appreciate the observation by Frances Stage on this point:

"…as quantitative researchers we are uniquely able to find those contradictions and negative assumptions that exist in quantitative research frames."

This is analogous to saying that a dedicated pianist in classic jazz is very well positioned to expand the progressions and reach harmonies that reflect contemporary opportunities, needs, and interests. It may also require the Quant evaluator to expand their arrangements and instrumentation.

As Quant researchers and evaluators, we are most familiar with the "rules of playing" that reinforce "the same old song" that needs questioning. QuantCrit can give us the momentum to push the limits of our instruments and apply them in new ways.

In making these points I feel a welcome alignment with Hancock's approach: recognizing the need to break free from cliché and convention, to keep meaningful discussion going, to maximize relevance, to get to the core of evaluation purpose, to reach new audiences and seed/facilitate new collaborations.

Over the next year I'll be posting a few creations, and striking in some new directions, with syncopations and chords that try to maneuver around and through the orthodoxy – "switching up" between the "KORG and the baby grand" so to speak.

Please stay tuned.

The Band on Stage

2024-10-15

SREE 2024: On a Mission to Deepen my Quant and Equity Perspectives

I am about to get on the plane to SREE.

I am excited, but also somewhat nervous.

Why?

I'm excited
to immerse myself in the conference – my goal is to try to straddle the paradigms of criticality and the quant tradition. SREE historically has championed empirical findings using rigorous statistical methods.

I'm excited
because I will be discussing intersectionality – a topic of interest that emerged from attending a series of Critical Perspectives webinars hosted by SREE in the last few years. I want to try to pay it back by moving the conversation forward and contributing to the critical discussion.

I'm nervous
because the topic of intersectionality is new to me. The idea cuts across many areas – law, sociology, epidemiology, education – a vast subject with multiple literature streams. It also gets at social justice issues that I am not used to talking about, and I want to express them clearly and accurately. I understand the power and privilege of my words and presentation, and I want the audience to continue to inquire and move the conversation forward.

I'm nervous
because issues of quantitative criticality require a person to confront their deeper philosophical commitments, assumptions, and theory of knowledge (epistemology). I have no problem with that; however, a few of my experimentalist colleagues have expressed a deep resistance to philosophy. One described it as merely a “throat-clearing exercise”. (I wonder: will those with a positivist bent leave my talk in droves?)

Andrew staring at clock

What is intersectionality anyway, and why was I attracted to the idea? It originates in the legal scholarship of Kimberlé Crenshaw. She describes a court case filed against GM:

"In DeGraffenreid, the court refused to recognize the possibility of compound discrimination against Black women and analyzed their claim using the employment of white women as the historical base. As a consequence, the employment experiences of white women obscured the distinct discrimination that Black women experienced."
The court's refusal to "acknowledge that Black women encounter combined race and sex discrimination implies that the boundaries of sex and race discrimination doctrine are defined respectively by white women's and Black men's experiences."

The court refused to recognize that GM's hiring practices compounded discrimination at specific intersections of socially recognized categories (i.e., Black women). The issue can be made concrete with an example. Imagine the following distribution of equally qualified candidates. The court's judgment would not have recognized this situation of compound discrimination:

graphic of gender and race
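The same pattern can be sketched numerically. The figures below are invented purely for illustration (they are not from the actual case): each single-axis comparison of the kind the court relied on shows parity, while only the intersection reveals the compound discrimination.

```python
# Hypothetical hiring data for equally qualified candidates.
# The numbers are invented purely for illustration.
applicants = {
    ("white", "man"):   (10, 5),  # (applied, hired)
    ("white", "woman"): (10, 5),
    ("Black", "man"):   (10, 5),
    ("Black", "woman"): (10, 0),
}

def hire_rate(race=None, sex=None):
    """Hiring rate among candidates matching the given race and/or sex."""
    applied = hired = 0
    for (r, s), (a, h) in applicants.items():
        if (race is None or r == race) and (sex is None or s == sex):
            applied += a
            hired += h
    return hired / applied

# Single-axis comparisons of the kind the court used suggest no problem:
print(hire_rate(race="white", sex="woman"))  # 0.5 -- matches men: "no sex discrimination"
print(hire_rate(race="Black", sex="man"))    # 0.5 -- matches whites: "no race discrimination"

# Only the intersection exposes the compound discrimination:
print(hire_rate(race="Black", sex="woman"))  # 0.0
```

Analyzed marginally along either axis alone, the disadvantage to Black women is diluted or hidden entirely; it surfaces only when the intersection itself is treated as the unit of analysis.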

Why did intersectionality pique my interest in the first place? In the course of the SREE Critical Perspectives seminars, it occurred to me that intersectionality bridged what I know with what I want to know.

I like representing problems and opportunities in education in quantitative terms. I use models. However, I also prioritize understanding the limits of our models, with reality serving as the ultimate check on the validity of the representation. Intersectionality, as a concept, pits our standard models against a reality that is both complex and socially urgent.

Intersectionality as a bridge:

graphic on intersectionality

Intersectionality presents an opportunity to reconcile two worlds, which is a welcome puzzle to work on.

picture of a puzzle

Here’s how I organized my talk. (See the postscript for how it went.)

  1. My positionality: I discussed my background "where I am coming from": including that most of my training is in quant methods, that I am interested in problems of causal generalizability, that I don’t shy away from philosophy, and that my children are racialized as mixed-race and their status inspired my first hypothetical example.
  2. I summarized intersectionality as originally conceived. I reviewed the idea as it was developed by Crenshaw.
  3. I reviewed some of the developments in intersectionality among quantitative researchers who describe their work and approaches as "quantitative intersectionality".
  4. I explored an extension of the idea of intersectionality through the concept of "unique to group" variables: I argued for the need to diversify our models of outcomes and impacts to take into account moderators of impact that are relevant only to specific groups and that respect the uniqueness of their experiences. (I will discuss this more in a forthcoming blog post.)
  5. I provided two examples, one hypothetical, and one real that clarified what I mean by the role of "unique to group" variables.
  6. I summarized the lessons.
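To give a flavor of point 4, here is a minimal simulated sketch of what a "unique to group" moderator might look like in an impact model. Everything here is my own toy setup, not the examples from the talk: the moderator `M` is meaningful only for members of a focal group `G`, and the treatment-by-moderator interaction is therefore estimable only within that group.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

T = rng.integers(0, 2, n).astype(float)  # treatment assignment
G = rng.integers(0, 2, n).astype(float)  # membership in the focal group
# Moderator that is defined only for members of group G (coded 0 elsewhere)
M = np.where(G == 1, rng.normal(size=n), 0.0)

# Hypothetical data-generating model: an overall impact of 0.30,
# plus within-group moderation of the impact (0.50 * T * M)
Y = 0.30 * T + 0.50 * T * M + rng.normal(scale=1.0, size=n)

# Ordinary least squares with the "unique to group" interaction term included
X = np.column_stack([np.ones(n), T, G, M, T * M])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
for name, b in zip(["intercept", "T", "G", "M", "T*M"], beta):
    print(f"{name:>9}: {b:+.2f}")
```

With this setup the estimates for T and T*M should land near their true values (0.30 and 0.50), while G and M pick up no main effects: the moderation is visible only through the group-specific interaction, which a model omitting M would average away.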

picture of a streetlight

There were some other exceptional talks that I attended at SREE, including:

  1. Promoting and Defending Critical Work: Navigating Professional Challenges and Perceptions
  2. Equity by Design: Integrating Criticality Principles in Special Education Research
  3. An excellent Hedges Lecture by Neil A. Lewis "Sharing What we Know (and What Isn’t So) to Improve Equity in Education"
  4. Design and Analysis for Causal Inferences in and across Studies

Postscript: How it went!

The other three talks in my session (Unpacking Heterogeneous Effects: Methodological Innovations in Educational Research) were excellent. They included work by Peter Halpin on a topic that has puzzled me for a while: how item-level information can be leveraged to assess program impacts. We almost always assess impacts on scale scores from “ready-made” tests that are based on calibrations of item-level scores. An experiment effectively introduces variance into the testing situation, and I have wondered what it means for impacts to register at the item level, since the treatment effect will likely interact with individual items. So “hats off” to linking psychometrics and construct validity to the discussion of impacts.

As for my presentation, I was deeply moved by the sentiments expressed by several conference-goers who came up to me afterwards. One comment was "you are on the right track". Others voiced an appreciation for my addressing the topic. I did feel THE BRIDGING between paradigms that I hoped to at least set in motion. This was especially true when one of the other presenters in the session, who had addressed the topic of effect heterogeneity across studies, commented: “Wow, you’re talking about some of the very same things that I am thinking”. It felt good to know that this convergence happened even though the two talks could seem very different at the surface level. (And no, people did not leave in droves.)

Thank you Baltimore! I feel more motivated than ever. Thank you SREE organizers and participants.

Picture of Baltimore.

Treating myself afterwards…

Picture of a dessert case

A special shoutout to Jose Blackorby. In the end, I did hang up my tie. But I haven’t given up on the idea – just need to find one from a hot pink or aqua blue palette.

Andrew standing by the SREE banner

2024-10-04

Happy New Year from Empirical Education

To ring in the new year, we want to share this two-minute video with you. It comprises highlights from 2022 from each person on our team. We hope you like it. Cheers to a healthy and prosperous 2023!

My colleagues appear in this order in the video.

Happy New Year photo by Sincerely Media

2022-12-15

Carnegie Summit 2017 Recap

If you’ve never been to Carnegie Summit, we highly recommend it.

This was our first year attending the Carnegie Foundation’s annual conference in San Francisco, and we only wish we had checked it out sooner. Chief Scientist Andrew Jaciw attended on behalf of Empirical Education and took over our Twitter account for the duration of the event. Below is a recap of his live tweeting, interspersed with additional thoughts too verbose for Twitter’s strict character limits.

Day 1


Curious about what I will learn. On my mind: Tony Bryk’s distinction between evidence-based practice and practice-based evidence. I am also thinking of how the approaches to be discussed connect to ideas of Lee Cronbach - he was very interested in timeliness and relevance of research findings and the limited reach of internal validity.

I enjoyed T. Bryk’s talk. These points resonated.


Improvement Science involves a hands-on approach to identifying systemic sources of predictable failure. This is appealing because it puts problem solving at the core, while realizing the context-specificity of what will actually work!

Day 2

Jared Bolte - Great talk! Improvement Science contrasts with traditional efficacy research by jumping right in to solve problems, instead of waiting. This raises an important question: What is the cost of delaying action to wait for efficacy findings? I am reminded of Lee Cronbach’s point: the half-life of empirical propositions is short!

This was an excellent session with Tony Bryk and John Easton. There were three important questions posed.

Day 3

Excited to learn about PDSA cycles.

2017-04-27

SREE Spring 2017 Conference Recap

Several Empirical Education team members attended the annual SREE conference in Washington, DC, from March 4th–5th. This year’s conference theme, “Expanding the Toolkit: Maximizing Relevance, Effectiveness and Rigor in Education Research,” included a variety of sessions focused on partnerships between researchers and practitioners, classroom instruction, education policy, social and emotional learning, education and life cycle transitions, and research methods. Andrew Jaciw, Chief Scientist at Empirical Education, chaired a session on Advances in Quasi-Experimental Design. Jaciw also presented a poster on a “systems check” for efficacy studies under development. For more information on this diagnostic approach to evaluation, watch this Facebook Live video of Andrew’s discussion of the topic.

Other highlights of the conference included Sean Reardon’s keynote address highlighting uses of “big data” in creating context and generating hypotheses in education research. Based on data from the Stanford Education Data Archive (SEDA), Sean shared several striking patterns of variation in achievement and achievement gaps among districts across the country, as well as correlations between achievement gaps and socioeconomic status. Sean challenged the audience to consider how to expand this work and use this kind of “big data” to address critical questions about inequality in academic performance and education attainment. The day prior to the lecture, our CEO, Denis Newman, attended a workshop led by Sean and colleagues (Workshop C) that provided a detailed overview of the SEDA data and how it can be used in education research. The psychometric work to generate equivalent scores for every district in the country, the basis for his findings, was impressive, and we look forward to their solving the daunting problem of extending the database to encompass individual schools.

2017-03-24

Empirical Education Publication Productivity


Empirical Education’s research group, led by Chief Scientist Andrew Jaciw, has been busy publishing articles that address key concerns of educators and researchers.

Our article describing the efficacy trial of the Math in Focus program, accepted by JREE earlier this year, arrived in print at our Palo Alto office a couple of weeks ago. If you subscribe to JREE, it’s the very first article in the current issue (volume 9, number 4). If you don’t subscribe, we have a copy in our lobby if anyone would like to stop by and check it out.

Another article from the analysis team is called “An Empirical Study of Design Parameters for Assessing Differential Impacts for Students in Group Randomized Trials.” It has recently been accepted for publication in Evaluation Review, in an issue that should be printed any day now. The paper grows out of our work on many cluster randomized trials and our interest in the differential impacts of programs. We believe that the question of “what works” has limited meaning without systematic exploration of “for whom” and “under what conditions”. The common perception is that these latter concerns are secondary and that our designs have too little power to assess them. We challenge these notions and provide guidelines for addressing these questions.

In another issue of Evaluation Review, we published two companion articles:

Assessing the Accuracy of Generalized Inferences From Comparison Group Studies Using a Within-Study Comparison Approach: The Methodology


Applications of a Within-Study Comparison Approach for Evaluating Bias in Generalized Causal Inferences from Comparison Groups Studies

This work further extends our interest in issues of external validity and equips researchers with a strategy for testing the limits of generalizations from randomized trials. Written for a technical audience, it extends an approach commonly used to assess selection bias in estimates from non-experimental studies to examine bias in generalized inferences from experiments and non-experiments.

It’s always exciting for our team to share the findings from our experiments, as well as the things we learn during the analysis that can help the evaluation community provide more productive evidence for educators. Much of our work is done in partnership with other organizations and if you’re interested in partnering with us on this kind of work, please email us.

2016-11-18

Math in Focus Paper Published in JREE

Chief Scientist Andrew Jaciw’s paper entitled Assessing Impacts of Math in Focus, a “Singapore Math” Program was accepted by the Journal of Research on Educational Effectiveness. The paper reports the results of an RCT conducted in Clark County (Las Vegas, NV) by a team that included Whitney Hegseth, Li Lin, Megan Toby, Denis Newman, Boya Ma, and Jenna Zacamy. From the abstract (available online here):

Twenty-two grade-level teams across twelve schools were randomized to the program or business as usual. Measures included indicators of fidelity to treatment, and student mathematics learning. Impacts on mathematics achievement ranged between .11 and .15 standard deviation units, with no differential impact based on level of pretest [or] minority status.

2016-03-30

Evaluation Concludes Aspire’s PD Tools Show Promise to Impact Classroom Practice

Empirical Education Inc. has completed an independent evaluation (read the report here) of a set of tools and professional development opportunities developed and implemented by Aspire Public Schools under an Investing in Innovation (i3) grant. Aspire was awarded the development grant in the 2011 funding cycle and put the system, Transforming Teacher Talent (t3), into operation in 2013 in their 35 California schools. The goal of t3 was to improve teacher practice as measured by the Aspire Instructional Rubric (AIR) and thereby improve student outcomes on the California Standards Test (CST), the state assessment. Some of the t3 components connected the AIR scores from classroom observations to individualized professional development materials building on tools from BloomBoard, Inc.

To evaluate t3, Empirical’s principal investigator, Andrew Jaciw, and his team designed the strongest feasible evaluation. Since it was not possible to run two versions of Aspire’s technology infrastructure supporting t3, splitting the schools into two groups for a randomized experiment or other comparison-group design was not feasible. Working with the National Evaluation of i3 (NEi3) team, Empirical developed a correlational design comparing two years of teacher AIR scores and student CST scores; that is, from the 2012-13 school year to the first year of implementation, 2013-14. Because the state was transitioning to new Common Core tests, the evaluation was unable to collect student outcomes systematically. The AIR scores, however, provided evidence of substantial overall improvement, with an effect size of 0.581 standard deviations (p < .001). The evidence meets the standards for “evidence-based” as defined in the recently enacted Every Student Succeeds Act (ESSA), which requires, at the least, that the test of the intervention “demonstrates a statistically significant effect on improving…relevant outcomes based on…promising evidence from at least 1 well designed and well-implemented correlational study with statistical controls for selection bias.” A demonstration of promise can assist in obtaining federal and other funding.

2016-03-07

SREE Spring 2016 Conference Presentations

We are excited to be presenting two topics at the annual Spring Conference of The Society for Research on Educational Effectiveness (SREE) next week. Our first presentation addresses the problem of using multiple pieces of evidence to support decisions. Our second presentation compares the context of an RCT with schools implementing the same program without those constraints. If you’re at SREE, we hope to run into you, either at one of these presentations (details below) or at one of yours.

Friday, March 4, 2016 from 3:30 - 5PM
Roosevelt (“TR”) - Ritz-Carlton Hotel, Ballroom Level

6E. Evaluating Educational Policies and Programs
Evidence-Based Decision-Making and Continuous Improvement

Chair: Robin Wisniewski, RTI International

Does “What Works”, Work for Me?: Translating Causal Impact Findings from Multiple RCTs of a Program to Support Decision-Making
Andrew P. Jaciw, Denis Newman, Val Lazarev, & Boya Ma, Empirical Education

Saturday, March 5, 2016 from 10AM - 12PM
Culpeper - Fairmont Hotel, Ballroom Level

Session 8F: Evaluating Educational Policies and Programs & International Perspectives on Educational Effectiveness
The Challenge of Scale: Evidence from Charters, Vouchers, and i3

Chair: Ash Vasudeva, Bill & Melinda Gates Foundation

Comparing a Program Implemented under the Constraints of an RCT and in the Wild
Denis Newman, Valeriy Lazarev, & Jenna Zacamy, Empirical Education

2016-02-26

Empirical's Chief Scientist co-authored a recently released NCEE Reference Report

Together with researchers from Abt Associates, Andrew Jaciw, Chief Scientist of Empirical Education, co-authored a recently released report entitled “Estimating the Impacts of Educational Interventions Using State Tests or Study-Administered Tests”. The full report, released by The National Center for Education Evaluation and Regional Assistance (NCEE), can be found on the Institute of Education Sciences (IES) website. The NCEE Reference Report examines and identifies factors that could affect the precision of program evaluations when they are based on state assessments instead of study-administered tests. The authors found that using the same test for both the pre- and post-test yielded more precise impact estimates; that using two pre-test covariates, one from each type of test (state assessment and study-administered standardized test), yielded more precise impact estimates; and that using as the dependent variable the simple average of the post-test scores from the two types of tests yielded more precise impact estimates and smaller sample size requirements than using post-test scores from only one of the two types of tests.

2011-11-02