blog posts and news stories

Does 1 teacher = 1 number? Some Questions About the Research on Composite Measures of Teacher Effectiveness

We are all familiar with approaches that combine student growth metrics and other measures to generate a single measure that can be used to rate teachers for the purpose of personnel decisions. For example, as an alternative to using seniority as the basis for reducing the workforce, a school system may want to base such decisions, at least in part, on a ranking derived from several measures of teacher effectiveness. One of the reports released January 8 by the Measures of Effective Teaching (MET) project addressed approaches to creating a composite (i.e., a single number that averages various aspects of teacher performance) from multiple measures such as value-added modeling (VAM) scores, student surveys, and classroom observations. Working with the thousands of data points in the MET longitudinal database, the researchers were able to try out multiple statistical approaches to combining measures. The important recommendation from this research for practitioners is that, while there is no single best way to weight the various measures that go into the composite, balancing the weights more evenly tends to increase the composite's reliability.
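To make the weighting point concrete, here is a minimal sketch in Python of how the reliability of a weighted composite can be computed from the component measures' reliabilities and intercorrelations, using the standard formula for the reliability of a weighted sum. The correlations, reliabilities, and weights below are hypothetical placeholders, not MET estimates; the only point is to show how spreading the weights more evenly can raise the composite's reliability.

```python
import numpy as np

def composite_reliability(weights, corr, reliabilities):
    """Reliability of a weighted sum of standardized measures.

    Uses the standard composite-reliability formula (Mosier):
    1 - sum_i w_i^2 * (1 - r_i) / (w' R w),
    where R is the observed correlation matrix and r_i are the
    single-measure reliabilities.
    """
    w = np.asarray(weights, dtype=float)
    R = np.asarray(corr, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    composite_var = w @ R @ w              # variance of the weighted composite
    error_var = np.sum(w**2 * (1.0 - r))   # error variance contributed by each measure
    return 1.0 - error_var / composite_var

# Hypothetical inputs for three measures: VAM, student survey, observation.
corr = np.array([[1.00, 0.25, 0.25],
                 [0.25, 1.00, 0.30],
                 [0.25, 0.30, 1.00]])
reliabilities = [0.45, 0.85, 0.65]   # illustrative single-measure reliabilities

vam_heavy = [0.8, 0.1, 0.1]          # composite dominated by the VAM score
balanced  = [1/3, 1/3, 1/3]          # weights spread evenly across measures

print("VAM-heavy composite reliability:",
      round(composite_reliability(vam_heavy, corr, reliabilities), 2))
print("Balanced composite reliability: ",
      round(composite_reliability(balanced, corr, reliabilities), 2))
```

With these made-up numbers, the evenly weighted composite comes out noticeably more reliable than the VAM-heavy one, which is the pattern the MET report describes; the exact values depend entirely on the inputs.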

While acknowledging the value of these analyses, we want to take a step back in this commentary. Here we ask whether agencies may sometimes be jumping to the conclusion that a composite is necessary when the individual measures (and even the components of these measures) may have greater utility than the composite for many purposes.

The basic premise behind creating a composite measure is the idea that there is an underlying characteristic that the composite can more or less accurately reflect. The criterion for a good composite is the extent to which the result accurately identifies a stable characteristic of the teacher’s effectiveness.

A problem with this basic premise is that, in focusing on the common factor, the aspects of each measure that are unrelated to the common factor get left out, treated as noise in the statistical equation. But what if observations and student surveys measure things that are unrelated to what the teacher's students are able to achieve in a single year under her tutelage (the basis for a VAM score)? What if there are distinct domains of teacher expertise that have little relation to VAM scores? By definition, the multifaceted nature of teaching gets reduced to a single value in the composite.

This single value does have a use in decisions that require an unequivocal ranking of teachers, such as some personnel decisions. For most purposes, however, a multifaceted set of measures would be more useful. The single measure has little value for directing professional development, whereas the detailed output of the observation protocols is designed for just that. Consider a principal deciding which teachers to assign as mentors, or a district administrator deciding which teachers to move toward a principalship. Might it be useful, in such cases, to have several measures representing the different dimensions of ability relevant to success in those particular roles?

Instead of collapsing the multitude of data points from achievement, surveys, and observations, consider an approach that makes maximum use of those data points to identify several distinct characteristics. In the usual method for constructing a composite (and in the MET research), the results for each measure (e.g., the survey or observation protocol) are first collapsed into a single number, and then these values are combined into the composite. This approach already obscures a large amount of information. The Tripod student survey provides scores on the seven Cs; an observation framework may have a dozen characteristics; and even VAM scores, usually thought of as a summary number, can be broken down (with some statistical limitations) into success with low-scoring versus high-scoring students (or any other demographic category of interest). Analyzing dozens of these data points for each teacher can potentially identify several distinct facets of a teacher's overall ability, as in the sketch below. Not all facets will be strongly correlated with VAM scores, but they may be related to a teacher's ability to inspire students in subsequent years to take more challenging courses, stay in school, and engage parents in ways that show up years later.
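As a rough illustration of this alternative, the sketch below (Python, using scikit-learn) takes a teacher-by-indicator matrix, for example the seven Tripod Cs, a dozen observation dimensions, and a couple of VAM breakdowns, and extracts several latent facets with a factor analysis instead of collapsing everything to one number. The indicator names, the simulated data, and the choice of three facets are all assumptions made for illustration; a real analysis would determine the number and interpretation of facets empirically.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical item-level indicators for 500 simulated teachers:
# 7 Tripod "C" scores, 12 observation-rubric dimensions, 2 VAM breakdowns.
indicator_names = (
    [f"tripod_c{i+1}" for i in range(7)]
    + [f"obs_dim{i+1}" for i in range(12)]
    + ["vam_low_scoring", "vam_high_scoring"]
)
n_teachers, n_indicators = 500, len(indicator_names)

# Simulate data driven by a few underlying facets plus noise (purely illustrative).
true_facets = rng.normal(size=(n_teachers, 3))
loadings = rng.normal(scale=0.6, size=(3, n_indicators))
X = true_facets @ loadings + rng.normal(scale=0.8, size=(n_teachers, n_indicators))

# Standardize the indicators, then extract three latent facets.
X_std = StandardScaler().fit_transform(X)
fa = FactorAnalysis(n_components=3, random_state=0)
facet_scores = fa.fit_transform(X_std)   # a 3-number profile per teacher, not 1

# Inspect which indicators load most heavily on each facet.
for k, row in enumerate(fa.components_):
    top = np.argsort(np.abs(row))[::-1][:5]
    print(f"Facet {k+1}:", [indicator_names[i] for i in top])
```

The output is a small profile of facet scores for each teacher rather than a single ranking, which is the kind of multidimensional result that could inform mentoring assignments or professional development in ways a composite cannot.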

Creating a single composite measure of teaching has value for a range of administrative decisions. However, the mass of teacher data now being collected is only beginning to be tapped for improving teaching and developing schools as learning organizations.

2013-02-14

REL West Releases Report of RCT on Problem-Based Economics Conducted with Empirical Ed Help

Three years ago, Empirical Education began assisting the Regional Educational Laboratory West (REL West) housed at WestEd in conducting a large-scale randomized experiment on the effectiveness of the Problem-Based Economics (PBE) curriculum.

Today, the Institute of Education Sciences released the final report, which indicates a significant impact of the program for students in 12th grade as measured by the Test of Economic Literacy. In addition to the primary focus on student achievement outcomes, the study examined changes in teachers' content knowledge in economics, their pedagogical practices, and their satisfaction with the curriculum. The report, Effects of Problem-Based Economics on High School Economics Instruction, is available on the IES website.

Eighty Arizona and California school districts participated in the study, which encompassed 84 teachers and over 8,000 students. Empirical Education was responsible for major aspects of research operations: collecting, tracking, scoring, and warehousing all data, including rosters and student records from the districts, as well as distributing the PBE curricular materials, assessments, and student and teacher surveys. To handle the high volume and multiple administrations of surveys and assessments, we built a detail-oriented operation, with schedules for following up on outstanding surveys, and achieved response rates of over 95% for both teacher and student surveys. Our experienced team of research managers, research assistants, and data warehouse engineers maintained a rigorous three-day turnaround for gathering end-of-unit exams and sending score reports to each teacher. The complete, documented dataset was delivered to the researchers at WestEd as our contribution to this REL West achievement.

2010-07-30

Reports Released on the Effect of Carnegie Learning’s Cognitive Tutor

The Maui School District has released results from a study of the effect of Carnegie Learning’s Cognitive Tutor (CT) on long-term course selections and grade performance. Building upon two previous randomized experiments on the impact of CT on student achievement in Algebra I and Pre-Algebra, the study followed the same groups of students in the year following their exposure to CT. The research did not find evidence of an impact of CT on either course selection or course grade performance for students in the following school year. The study also found no evidence that variation among ethnicities in both the difficulty of course taken and course grade received depended on exposure to CT.

A concurrent study was conducted on the successes and challenges of program implementation with the teachers involved in the previous CT studies. The study took into account teachers’ levels of use and length of exposure to CT; the descriptive data comprised surveys, classroom observations, and interviews. The major challenges to implementation included a lack of access to resources, limited support for technology, and other technological difficulties. After three years of implementation, teachers reported that these initial barriers had been resolved; however, teachers had yet to establish the fully collaborative classroom environment described in the Carnegie Learning implementation model.

Maui School District is Empirical Education’s first MeasureResults subscriber. A similar research initiative is being conducted at the community college level with the Maui Educational Consortium. The report for that study will be announced later this year.

2008-12-10

Two-Year Study on Effectiveness of Graphing Calculators Released

Results are in from a two-year randomized controlled trial of the effectiveness of graphing calculators for Algebra and Geometry achievement. Two reports are now available for this project, which was sponsored by Texas Instruments. In the first year, we contrasted business as usual in the math classes of two California school districts with classes equipped with sets of graphing calculators and led by teachers who received training in their use. In the second year, we contrasted calculator-only classrooms with those also equipped with a calculator-based wireless networking system.

The project tracked achievement through state and other standardized test scores, and tracked implementation through surveys and observations. For the most part, the experiment could not discern an impact of providing the equipment and teacher training. Data from the surveys and observations made clear that the technology was not used extensively (and by some teachers, not at all), suggesting that training, usability, and alignment issues must be addressed when adopting this kind of program. There were modest effects, especially for Geometry, but these were often not found consistently across the two measurement scales. In one case, contradictory results for the two school districts suggest that researchers should use caution in combining data from different settings.

2008-03-01

REL West Calls on Empirical for Assistance with Experiment on New Economics Program

WestEd, which holds the Regional Educational Laboratory contract for the western region (REL West), has contracted with Empirical Education for the operations of a large randomized experiment involving over 75 high school economics teachers throughout California and Arizona. With a September 2007 start, the project has called for a very rapid start-up by Empirical Education staff, including delivering and processing approximately 40,000 student tests and surveys, acquiring data from over 50 school districts, and conducting web-based surveys with 84 teachers. The Problem-Based Economics program, developed by the Buck Institute for Education, is being tested in the context of a single-semester course. During the 2007-2008 school year, two cohorts of students will take the Test of Economic Literacy (TEL) as well as performance assessments and attitudinal surveys to test the impact of the new program compared with the materials in use in the classrooms of control teachers.

2007-09-06