
A Better Way to Evaluate Undergraduate Teaching

Tomorrow's Teaching and Learning


The current methods for evaluating the quality of teaching have serious deficiencies.  They do not allow an instructor or an institution to objectively determine the quality of teaching in a course nor show how it can be improved.  I present the requirements for an optimum method for evaluating teaching.  I then present a new approach based on a detailed inventory of the teaching practices used in a course that allows a quantitative determination of the extent of use of practices that research has shown result in improved student learning.  This approach takes little time and provides: a high level of discrimination; guidance for how to achieve improvement; and clear evidence when there is improvement.   



The posting below looks at a new approach to evaluating teaching. It is by Carl Wieman, a professor of physics and of the Graduate School of Education at Stanford University. He is the founder of the Carl Wieman Science Education Initiative (CWSEI) at the University of British Columbia and the Science Education Initiative at the University of Colorado. He is a Nobel Laureate in Physics and served as the Associate Director for Science in the White House Office of Science and Technology Policy. Phone: 650-497-3491.

The posting is a condensed version of a substantially longer article that appeared in the January 2015 issue of Change Magazine.


Rick Reis

UP NEXT: Case Study: Is Service Learning Worth the Effort?


---------- 2,153 words ----------

A Better Way to Evaluate Undergraduate Teaching


A major problem in higher education is the lack of a good way to measure the quality of teaching. This lack makes it very difficult for faculty to objectively determine the quality of their teaching and systematically work to improve it.  It also makes it difficult for them to document that quality, either within the institution or for external accreditors. Institutions are unable to incentivize improvement, track improvement, or demonstrate to external stakeholders or accreditation agencies the quality of teaching that they provide.

Criteria for evaluation of teaching quality

We propose a simple, operational definition of teaching quality: the effectiveness with which the teacher is producing the desired student outcomes for the given student population. The biggest difficulty in evaluating the quality of teaching is that the attainment of these outcomes in an absolute sense is highly dependent on the backgrounds of the students and on the specific outcomes as defined by the course and instructor. Thus, meaningful measures of teaching quality must separate out the impact of the teacher from the many other factors that affect attainment of educational outcomes.

If improvement in teaching is to happen, the evaluation of teaching must become a credible component of the institutional incentive system, and to become a credible component it must meet certain criteria of quality, criteria that the corresponding metrics of research quality already meet.

1.     Validity. The most important criterion is that the measures of teaching quality must correlate with the achievement of the desired student outcomes which define quality.

2.     Meaningful comparisons.  Individual instructors need a standard they can compare themselves with to know how well they are doing and what they might do to improve.  They also need a way to compare their performance with the standards of their department and their institution.  Department Chairs need to be able to compare the teaching of all the individuals in a department, and Deans and Provosts need to compare the performance of similar departments at the same or at different institutions.

3.     Fairness.  This requires that the method can be used across nearly all courses/instructors with nearly equal validity.  This will be true only if the correlation between the measured values of “teaching quality” and the measures of student outcomes is greater than the correlation between measured values of “teaching quality” and other factors that are not under the instructor’s control (e.g., class size, level, subject, institutional type).

4.     Practicality.  It must be possible to obtain the measures of quality for instructors on an annual basis without requiring substantial investments of time and/or money.

5.     Improvement. The measure needs to provide clear guidance to an instructor as to how well they are doing and how they can improve.
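The fairness criterion (item 3) amounts to comparing two correlations, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson` and `passes_fairness_check` are hypothetical, the use of the Pearson coefficient and of absolute values is our choice rather than something the article specifies, and the numbers are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def passes_fairness_check(quality, outcomes, confounds):
    """Return True if the quality measure correlates more strongly with
    student outcomes than with every factor outside the instructor's
    control. Correlations are compared by absolute value (an assumption)."""
    r_outcome = abs(pearson(quality, outcomes))
    return all(
        r_outcome > abs(pearson(quality, values))
        for values in confounds.values()
    )


# Made-up illustrative numbers for five courses, NOT data from the article.
quality = [10, 20, 30, 40, 50]            # measured "teaching quality"
outcomes = [52, 58, 67, 71, 80]           # some measure of student outcomes
confounds = {"class_size": [120, 40, 200, 60, 150]}

fair = passes_fairness_check(quality, outcomes, confounds)
```

With these invented numbers the quality measure tracks outcomes closely and class size only weakly, so `fair` is `True`. A real check would need many more courses and an estimate of the uncertainty in each correlation.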

Faculty almost universally express great cynicism about student evaluations, the predominant way that undergraduate teaching is currently evaluated, and correspondingly, about the institutional commitment to teaching quality when student evaluations are the dominant measure of quality. 

The Teaching Practices Inventory

Here we offer a different method for evaluating teaching that does meet the above criteria, at least for the STEM disciplines.  It would likely also work for the social sciences with some modest changes, and an analogous instrument could be developed for the humanities. The design principle used to create this method was to first develop an instrument that could characterize as completely as possible all the teaching practices in use in nearly all STEM courses, while requiring little time and involving little subjective judgment in collecting the necessary information.  Knowing the full range of teaching practices used in any given course, it is then possible to determine the extent of use of practices that research has shown produce consistent improvements in student outcomes when compared to possible alternatives.  The quantitative measure of extent of use of practices that correlate with improved student outcomes is our measure of teaching quality.

It may seem surprising to evaluate the quality of teaching by looking only at the practices used by an instructor. However, measuring practices as a proxy for a difficult-to-measure ultimate outcome is quite common when there are substantial correlations between the two.  The example most relevant to this discussion is the routine measurement of a science faculty member’s research contributions for annual review.  In the natural sciences this is typically based primarily on the number of papers published in reputable journals and the number of research grants that a faculty member has had in the past one to three years, data that can be quickly and easily collected.  This system works quite well because, while having a relatively large number of grants and publications does not guarantee substantial research contributions, the two tend to be well correlated.  Correspondingly, a faculty member in the sciences who does not get research money and does not publish papers is very unlikely to be making significant research contributions.  Using effective, research-based teaching practices as a proxy for the desired student outcomes is based on much the same concept.

This use of such a proxy is only meaningful because of all the research in the past few decades establishing strong correlations between the type of STEM teaching practices used and both the amount of student learning achieved and course completion rates.  These correlations have been shown to hold across a large range of different instructors and institutions.  Those practices that are linked to improved learning in STEM are also consistent with empirically-grounded theoretical principles for the acquisition of complex expertise.  This explains the consistency of the results across disciplines, topics, and level of students, and provides confidence that those practices will be similarly effective in situations for which there is not yet research. 

The teaching practices inventory (TPI) characterizes all elements of how a course is taught.  The current version of the inventory was developed over a six-year period, during which it underwent numerous reviews by faculty and experts in undergraduate STEM education, and several rounds of real-world testing. This is discussed in detail in Wieman and Gilbert (2014).  We have now used the final version of the inventory to collect data on the teaching of more than 200 course offerings at UBC, from across the disciplines of biology, computer science, earth sciences, mathematics, physics, and statistics. It has also recently been used on a limited basis by several other institutions.  Most instructors complete the inventory in less than ten minutes.

Table 1. Teaching practices inventory categories

I. Course information provided: Information about the course, such as list of topics and organization of the course, and learning goals/objectives.

II. Supporting materials provided: Materials provided that support learning of the course material, such as notes, video, and targeted references or readings.

III. In-class features and activities: What is done in the classroom, including the range of different types of activities that the instructor might do or have the students do.

IV. Assignments: Nature and frequency of the homework assignments in the course.

V. Feedback and testing: Testing and grading in the course, and the feedback to students and feedback from students to instructor.

VI. Other: Assorted items covering diagnostics, assessment, new methods, and student choice and reflection.

VII. Training and guidance of teaching assistants: What selection criteria and training are used for course teaching assistants, and how their efforts are coordinated with other aspects of the course.

VIII. Collaboration or sharing in teaching: Collaboration with other faculty, use of relevant education research literature, and use of educational materials from other sources.

Scoring rubric

The inventory responses provide a detailed picture of how a particular offering of a course is taught, and, when data are aggregated, how a department teaches.  We have created a rubric that converts the raw inventory data for a course into an “extent of use of research-based teaching practices (ETP) score” for each of the eight TPI categories, and for the course as a whole.  This is a measure of the extent of use of practices that research has shown are most educationally effective.  ETP points are given for each practice for which there is research showing that the practice improves learning. The distribution of points is shown on the inventory in Wieman and Gilbert (2014) along with references to the research that is the basis of the scoring.  
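The conversion from inventory responses to ETP scores can be pictured with a short Python sketch. Everything specific in it is an assumption made for illustration: the practice names, the point values, and the function names `etp_scores` and `percent_of_max` are hypothetical, and only three of the eight categories are shown. The actual items and point distribution are those published in Wieman and Gilbert (2014).

```python
# Hypothetical rubric: category -> {practice: ETP points}.
# Practice names and point values are invented for illustration only.
RUBRIC = {
    "I. Course information provided": {
        "learning_goals_listed": 2,
        "topic_list_provided": 1,
    },
    "III. In-class features and activities": {
        "small_group_discussion": 2,
        "clicker_questions": 2,
    },
    "IV. Assignments": {
        "graded_homework": 1,
    },
}


def etp_scores(responses, rubric=RUBRIC):
    """Return (per_category, total) ETP scores.

    `responses` is the set of practice names an instructor reported using.
    A practice earns its points only if it appears in the rubric.
    """
    per_category = {
        category: sum(
            points for practice, points in practices.items()
            if practice in responses
        )
        for category, practices in rubric.items()
    }
    return per_category, sum(per_category.values())


def percent_of_max(total, rubric=RUBRIC):
    """Express a total ETP score as a percentage of the maximum possible."""
    max_total = sum(sum(practices.values()) for practices in rubric.values())
    return 100.0 * total / max_total


# One hypothetical course offering.
responses = {"learning_goals_listed", "clicker_questions"}
per_category, total = etp_scores(responses)
share = percent_of_max(total)
```

For this invented course the sketch gives 2 points in category I, 2 in category III, and 0 in category IV, for a total of 4 out of a possible 8, or 50%. A simple lookup like this treats every practice as used or not used; it does not attempt to capture items that might be scored by degree.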

Figure 1. Histograms of the ETP (“extent of use of research-based teaching practices”) scores for the courses in three of the five math and science departments for which we collected extensive data and tested the TPI (from Wieman and Gilbert 2014).  Histogram bins are 5 wide (±2 around the central value). ETP scores are integers.
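The binning described in the caption (integer scores, bins 5 wide, ±2 around a central value) can be sketched as follows. The caption does not say where the bin centers sit, so this sketch assumes they fall on multiples of 5; the function names `bin_center` and `histogram` and the sample scores are hypothetical.

```python
from collections import Counter


def bin_center(score, width=5):
    """Map an integer score to the center of its bin.

    Assumes an odd bin width and centers at multiples of `width`, so with
    width=5 the bin centered on 10 holds the scores 8 through 12.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so that bins are symmetric")
    half = width // 2
    return ((score + half) // width) * width


def histogram(scores, width=5):
    """Count how many scores fall in each bin, keyed by bin center."""
    counts = Counter(bin_center(score, width) for score in scores)
    return dict(sorted(counts.items()))


# Made-up ETP scores, NOT data from the article.
sample_scores = [3, 7, 8, 12, 13, 30]
counts = histogram(sample_scores)
```

For the made-up scores above, `counts` is `{5: 2, 10: 2, 15: 1, 30: 1}`.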


Figure 1 illustrates the ability of the TPI to readily identify the extent to which effective teaching practices are used by the different faculty within a department, and the differences between departments.  It is somewhat startling to see the actual range of use of effective teaching practices within a typical department: scores differ by factors of 4 to 5.  High-scoring courses incorporate many different mutually beneficial practices across all categories that support and encourage student learning, while low-scoring courses have very few.




Figure 2. Department D3: comparison of the 2006/07 and 2012/13 academic years, shown as % of total possible ETP score.


Benefits of Using the TPI

Use of the TPI for evaluating teaching has benefits for instructors, institutions, and students.  Faculty can now readily see how they can improve their teaching.  Simply by looking at the TPI and its scoring rubric, faculty can see the range of teaching practices that are in relatively common use and which of those practices the research indicates will have an impact on student learning.  Comparing their own TPI results with those of others shows them their relative strengths and weaknesses.  The TPI provides them with a way to objectively document the quality of, and improvement in, their teaching, and can free them from the capricious, frustrating, and sometimes quite mean-spirited tyranny of student evaluations.

Departments could use TPI data to benchmark the quality of their teaching, identify targets for improvement, and measure the results of improvement efforts.  An example of this use is shown in Figure 2, for a department that worked on improving its teaching between 2007 and 2012.  It takes little time to obtain data that can be used for institutional or accreditation reviews.  Institutions could use TPI results in the same way to track their overall improvement and to compare the quality of their teaching with that of peer institutions, much as they now routinely compare research productivity and salaries across institutions.

Limitations of the TPI and scoring rubric for evaluating teaching

We were not successful in creating an inventory that was appropriate for use in all undergraduate STEM courses, in that it does not work for instructional labs, project-based courses, or seminar (largely student driven) courses, because such courses tend to be quite idiosyncratic. 

The obvious concern with the TPI data and scoring rubric as a measure of teaching quality is that they measure the use of particular practices, not how well those practices are being used.  It is important to remember that the relevant comparison here is not with perfection, but rather with alternative methods of evaluation. There are numerous research studies showing strong correlation between undergraduate STEM learning and the teaching methods used, independent of other characteristics of the teaching. For example, many studies have compared the same instructor using different methods (research-based active learning vs. traditional lecture).  We have done experiments at the CWSEI showing that it is extremely difficult to rigorously measure how well practices are being used, and likely impossible to do so on a routine basis in undergraduate courses.


Current methods of evaluating teaching at colleges and universities fail to encourage, guide, or document teaching that leads to improved student learning outcomes. Here we present an alternative that provides a measure of teaching quality that is correlated with desired student outcomes, is free from large confounding variables, and can be used to make meaningful comparisons of the quality of teaching between faculty members, departments, and institutions. In contrast with current methods for evaluating undergraduate teaching, the teaching practices inventory is appropriate to incorporate into the institutional incentive system. It also provides a way for faculty members, departments, and institutions to support claims of teaching commitment, improvement, and quality.


References

Berk R (2005). “Survey of 12 strategies to measure teaching effectiveness.” International Journal of Teaching and Learning in Higher Education 17, 48–62.

Clayson D (2009). “Student evaluations of teaching: are they related to what students learn? A meta-analysis and review of the literature.” Journal of Marketing Education 31(1), 16–29.

Freeman S, Eddy SL, McDonough M, Smith MK, Wenderoth MP, Okoroafor N, Jordt H (2014). “Active learning increases student performance in science, engineering, and mathematics.” Proc Natl Acad Sci, doi: 10.1073/pnas.1319030111.

Singer S, Nielsen N, Schweingruber H (2012). Discipline-Based Education Research: Understanding and Improving Learning in Undergraduate Science and Engineering. Washington, DC: National Academies Press.

Wieman C, Gilbert S (2014). “The teaching practices inventory: a new tool for characterizing college and university teaching in mathematics and science.” CBE-Life Sciences Education, Fall 2014.