# Reliability in psychometrics: what is it and how is it estimated in tests?

**The reliability of a test is a very important concept in psychometrics and in research in general.**

If you have studied psychology or a related field, the concept of reliability is surely familiar to you. But what exactly does it consist of? **Reliability in psychometrics is a quality or property of measurement instruments** (e.g. tests) that indicates whether they are accurate, consistent and stable in their measurements.

In this article we tell you what this property consists of, give some examples to clarify the concept and explain the different ways of calculating the reliability coefficient in psychometrics.

## What is reliability in psychometrics?

Reliability is a concept that belongs to psychometrics, the discipline concerned with measuring human psychological variables through different techniques, methods and tools. Reliability is thus a psychometric property that **implies the absence (or, in practice, a low level) of measurement error in a given instrument (e.g., a test).**

It is also known as the degree of consistency and stability of the scores obtained in different measurements with the same instrument or test. **Another synonym for reliability in psychometrics is "precision."** Thus, we say that a test is reliable when it is accurate, largely free of error, and its measurements are stable and consistent over repeated administrations.

Beyond psychology, the concept of reliability is also used in other fields, such as social research and education.

### Examples

To better illustrate what this psychometric concept consists of, let us consider the following example: we use a thermometer to measure the daily temperature in a classroom. We take the measurement at ten o'clock in the morning every day for a week.

We will say that the thermometer is reliable (has high reliability) if, when the temperature is more or less the same every day, its readings reflect this (i.e., the measurements are close to each other, with no big jumps or differences).

On the other hand, **if the measurements differ widely from each other** (while the actual temperature stays more or less the same every day), the instrument is not reliable, because its measurements are not stable or consistent over time.

Another example to understand the concept of reliability in psychometrics: imagine weighing a basket of three apples every day for several days and recording the results. If these results vary greatly over successive measurements, this would indicate that the reliability of the scale is poor, since the measurements would be inconsistent and unstable (the opposite of what reliability requires).

Thus, a reliable instrument is one that **shows consistent and stable results in repeated measurements of a given variable.**

## Measurement variability

How do we know if an instrument is reliable? For example, from the variability of its measurements. That is, if the scores we obtain (measuring the same thing repeatedly) with such an instrument are highly variable among themselves, we will consider that their values are not accurate, and therefore the instrument does not have a good reliability (it is not reliable).

Extrapolating this to psychological tests: if a subject answered the same test repeatedly under the same conditions, the variability of their scores **would provide us with an indicator of the reliability of the test.**
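As a rough illustration of this idea, the sketch below compares the spread of repeated readings from two thermometers, following the classroom example above. The readings are invented for illustration, and using the sample standard deviation as the measure of variability is an assumption of this sketch, not a formal reliability coefficient.

```python
from statistics import mean, stdev

# Hypothetical readings (in °C) from two thermometers in the same classroom,
# taken at the same time on seven days with a roughly constant temperature.
thermometer_a = [21.0, 21.2, 20.9, 21.1, 21.0, 21.1, 20.9]
thermometer_b = [21.0, 24.5, 18.2, 22.9, 19.4, 23.8, 17.6]


def spread(readings):
    """Return the sample standard deviation of a list of repeated readings."""
    return stdev(readings)


spread_a = spread(thermometer_a)
spread_b = spread(thermometer_b)

print(f"Thermometer A: mean={mean(thermometer_a):.2f}, sd={spread_a:.2f}")
print(f"Thermometer B: mean={mean(thermometer_b):.2f}, sd={spread_b:.2f}")
```

Under these made-up numbers, thermometer A gives readings that stay close together, while thermometer B jumps around even though the quantity being measured is assumed to be stable, so A would be considered the more reliable instrument.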

## The calculation: reliability coefficient

How do we calculate reliability in psychometrics? Through the reliability coefficient, which can be obtained from procedures involving either two applications of the test or just one. Let's see the different ways of calculating it, within these two large blocks:

### 1. Two applications

In the first group we find the different procedures that **allow us to calculate the reliability coefficient from two applications of a test.** Let's get to know them, as well as their disadvantages:

1.1. Parallel or equivalent forms

This method yields a measure of reliability that, in this case, is also called "equivalence". It consists of applying two tests on the same occasion: X (the original test) and X' (the equivalent test that we have created). The disadvantages of this procedure are basically two: the fatigue of the examinee and the need to construct two tests.

1.2. Test-retest

The second method that calculates the reliability coefficient from two applications is the test-retest, which allows us to estimate the stability of the test's scores over time. It basically consists of **applying a test X, allowing a lapse of time to pass, and reapplying the same test X to the same sample.**

The disadvantages of this procedure include the learning or memory effects that the subject may carry over from the first application, and changes in the person during the interval, either of which may alter the results.
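A minimal sketch of how a test-retest coefficient could be computed, assuming (as is common practice) that it is estimated as the Pearson correlation between the scores of the same people on the two applications. The scores and the helper name `pearson_r` are hypothetical and used only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both lists")
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores of six people on test X, at two points in time.
test_scores = [12, 15, 9, 20, 17, 11]
retest_scores = [13, 14, 10, 19, 18, 10]

r_test_retest = pearson_r(test_scores, retest_scores)
print(f"Test-retest coefficient: {r_test_retest:.3f}")
```

The same correlation could be applied to the scores on X and X' to sketch the parallel-forms coefficient; only the meaning of the second list changes.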

1.3. Test-retest with alternative forms

Finally, another way of calculating reliability in psychometrics is based on the test-retest with alternative forms. **This is a combination of the two previous procedures.** Although it may be useful in certain cases, it combines the disadvantages of both.

The procedure consists of administering test X, allowing a period of time to elapse, and then administering test X' (i.e., the equivalent test created from the original, X).

### 2. A single application

On the other hand, the procedures for calculating reliability in psychometrics (reliability coefficient) from a single application of the test or measuring instrument are divided into two subgroups: the two halves and the covariance between items. Let's see it in more detail, for a better understanding:

2.1. Two halves

In this case, **simply put, the test is divided into two halves.** Within this approach we find three procedures, depending on what is assumed about the relationship between the two halves:

- Parallel halves: the Spearman-Brown formula is applied.
- Tau-equivalent halves: the Rulon or Guttman-Flanagan formula is applied.
- Congeneric halves: the Raju formula is applied.
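The first two formulas in this list can be sketched as follows. This is an illustrative example under stated assumptions: the response matrix is invented, the test is split into odd and even items (one common way of halving a test, not the only one), and the Raju formula is left out. The Spearman-Brown estimate is computed as 2r / (1 + r), where r is the correlation between the halves; the Rulon estimate as 1 minus the variance of the differences between halves divided by the variance of the total scores; and the Guttman-Flanagan estimate as 2(1 - (sum of the two half variances) / (total variance)).

```python
from math import sqrt
from statistics import variance

# Hypothetical responses: one row per person, one column per item (scored 1-5).
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
    [3, 4, 3, 3, 4, 4],
]


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Split the test into two halves: items 1, 3, 5 versus items 2, 4, 6.
odd_half = [sum(row[0::2]) for row in responses]
even_half = [sum(row[1::2]) for row in responses]
totals = [sum(row) for row in responses]

# Spearman-Brown: steps the half-test correlation up to full test length.
r_halves = pearson_r(odd_half, even_half)
spearman_brown = 2 * r_halves / (1 + r_halves)

# Rulon: based on the variance of the differences between the two halves.
differences = [a - b for a, b in zip(odd_half, even_half)]
rulon = 1 - variance(differences) / variance(totals)

# Guttman-Flanagan: based on the variances of the two halves.
guttman_flanagan = 2 * (
    1 - (variance(odd_half) + variance(even_half)) / variance(totals)
)

print(f"Spearman-Brown:   {spearman_brown:.3f}")
print(f"Rulon:            {rulon:.3f}")
print(f"Guttman-Flanagan: {guttman_flanagan:.3f}")
```

The Rulon and Guttman-Flanagan expressions are algebraically equivalent, so they return the same value for the same split; they only differ from Spearman-Brown when the two halves have unequal variances.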

2.2. Covariance between items

Inter-item covariance **involves analyzing the relationship between all test items.** Within it, we also find three methods or formulas specific to psychometrics:

- Cronbach's alpha coefficient: its value usually ranges between 0 and 1.
- Kuder-Richardson (KR20): applied when the items are dichotomous (i.e., when they have only two values).
- Guttman.
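A minimal sketch of the first two coefficients, using invented right/wrong (1/0) answers. It assumes the usual textbook formulas: alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and KR20 = k/(k-1) × (1 − sum of p·q / variance of total scores), where k is the number of items, p is the proportion of correct answers to an item and q = 1 − p. Population variances are used throughout, which is a choice of this sketch; the function names are hypothetical.

```python
from statistics import pvariance

# Hypothetical dichotomous responses: one row per person, one column per item.
responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]


def cronbach_alpha(rows):
    """Cronbach's alpha from a persons-by-items matrix of scores."""
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    total_variance = pvariance([sum(row) for row in rows])
    if k < 2 or total_variance == 0:
        raise ValueError("need at least 2 items and varying total scores")
    item_variances = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variances / total_variance)


def kr20(rows):
    """Kuder-Richardson 20 for items scored 0 or 1."""
    k = len(rows[0])
    n = len(rows)
    total_variance = pvariance([sum(row) for row in rows])
    if k < 2 or total_variance == 0:
        raise ValueError("need at least 2 items and varying total scores")
    # p = proportion answering each item correctly, q = 1 - p.
    proportions = [sum(item) / n for item in zip(*rows)]
    sum_pq = sum(p * (1 - p) for p in proportions)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


alpha = cronbach_alpha(responses)
kr20_value = kr20(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
print(f"KR20:             {kr20_value:.3f}")
```

With dichotomous items the variance of each item equals p·q, so both functions return the same value here; KR20 can be read as the special case of alpha for two-valued items.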

### 3. Other methods

Beyond the procedures involving one or two applications of the test to calculate the reliability coefficient, we find other methods, such as inter-rater reliability (which measures the consistency between different evaluators scoring the same responses), Hoyt's method, etc.

(Updated at Apr 12 / 2024)