Reliability and validity are two key measures used in almost all social science and education experiments. The posting below gives a nice explanation of each of these measures. It is from Chapter 4, Assessment Planning and Implementation, in Assessing Academic Programs in Higher Education, by Mary J. Allen, California State University, Institute for Teaching and Learning. Anker Publishing Company, Inc., 176 Ballville Road, P.O. Box 249, Bolton, MA 01740-0249 USA. [www.ankerpub.com] Copyright © 2004 by Anker Publishing Company, Inc. All rights reserved. ISBN 1-882982-67-3. Reprinted with permission.
RELIABILITY AND VALIDITY
Assessment results should be trustworthy, and a traditional way to examine this is to ask if results are reliable and valid (Allen & Yen, 2002). Reliability refers to measurement precision and stability, and reliability can be examined in a number of ways (see Figure 4.1). Conclusions about individuals are consistent when measurements are reliable. Reliability often is summarized with a correlation coefficient. If results are determined at random, the reliability coefficient is zero; and if identical results are obtained each time individuals are assessed, the reliability is 1.0. No procedure is perfectly reliable, but longer tests tend to be more reliable than shorter tests, procedures that assess abilities tend to be more reliable than ones that assess opinions or personalities, and objectively scored procedures tend to be more reliable than subjectively scored procedures.
Figure 4.1 MAJOR TYPES OF RELIABILITY
Test-retest reliability: A reliability estimate based on assessing a group of people twice and correlating the two scores. This coefficient measures score stability.
Parallel forms reliability (or alternate forms reliability): A reliability estimate based on correlating scores collected using two versions of the procedure. This coefficient indicates score consistency across the two versions.
Inter-rater reliability: How well two or more raters agree when decisions are based on subjective judgments.
Internal consistency reliability: A reliability estimate based on how highly parts of a test correlate with each other.
Coefficient alpha: An internal consistency reliability estimate based on correlations among all items on a test.
Split-half reliability: An internal consistency reliability estimate based on correlating two scores, each calculated on half of a test.
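Coefficient alpha, listed in Figure 4.1, has a standard computational form: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that formula to an invented respondent-by-item score matrix; the data and scale are hypothetical, not from the book.

```python
# Sketch: coefficient (Cronbach's) alpha for a small item-score matrix.
# Rows are respondents, columns are test items; all data are invented.
from statistics import pvariance

scores = [
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
]

k = len(scores[0])                                    # number of items
item_vars = [pvariance(col) for col in zip(*scores)]  # variance of each item
total_var = pvariance([sum(row) for row in scores])   # variance of total scores

# High alpha means the items vary together, i.e. the test is internally
# consistent; alpha near zero means the items behave independently.
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))
```

Because these invented items rise and fall together across respondents, alpha comes out high.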
Validity refers to how well a procedure assesses what it is supposed to be assessing. A valid assessment of a learning objective tells us how well students have mastered that objective, and it should provide useful formative information. Figure 4.2 describes some major ways to evaluate a procedure's validity. Valid procedures avoid bias; that is, systematic underestimates or overestimates of what is being assessed. Bias and unreliability undermine validity because results are less trustworthy. Formative validity (i.e., how well the procedure yields findings that are useful for improving what is being assessed) is of primary importance for program assessment.
Reliability and validity are sometimes confused, but an absurd example should help clarify the difference. Imagine that we measure adult information literacy by multiplying people's head circumferences by 10. Repeated measurements would yield nearly identical results, so the procedure is highly reliable; yet head circumference tells us nothing about information literacy, so the procedure is completely invalid.
Figure 4.2 MAJOR TYPES OF VALIDITY
Construct validity is examined by testing predictions based on the theory (or
construct) underlying the procedure. For example, faculty might predict that
scores on a test that assesses knowledge of anthropological terms will increase
as anthropology students progress in their major. We have more confidence in
the test's construct validity if predictions are empirically supported.
Criterion-related validity indicates how well results predict a phenomenon of
interest, and it is based on correlating assessment results with this criterion.
For example, scores on an admissions test can be correlated with college GPA to
demonstrate criterion-related validity.
Face validity is assessed by subjective evaluation of the measurement procedure.
This evaluation may be made by test takers or by experts in what is being assessed.
Formative validity is how well an assessment procedure provides information
that is useful for improving what is being assessed.
Sampling validity is how well the procedure's components, such as test items,
reflect the full range of what is being assessed. For example, a valid test of
content mastery should assess information across the entire content area, not
just isolated segments.
Allen, M.J., & Yen, W.M. (2002). Introduction to measurement theory. Prospect Heights, IL: Waveland.