# Regression to the mean: definition and examples

**Let's look at regression to the mean, a key concept in statistics and psychometrics.**

In research, whatever the subject, extreme values are known to be unusual and rarely persist. Getting an extreme score on a math test, a medical exam, or even a roll of the dice is a rare event that, as measurements are repeated, tends to be followed by values closer to the mean.

**Regression to the mean is the name given to this tendency toward central values.** Below we explain the concept and give some examples of it.

## What is regression to the mean?

In statistics, regression to the mean, historically called reversion to the mean or reversion to mediocrity, is the phenomenon whereby **if a variable yields an extreme value the first time it is measured, it will tend to be closer to the mean on the second measurement.** Paradoxically, the reverse also holds: if the second measurement turns out to be extreme, the variable will tend to have been closer to the mean on the first.

Let us imagine that we have two dice and we throw them. The sum of the numbers obtained on each roll will be between 2 and 12, those two numbers being the extreme values, while 7 is the central value.

If, for example, on the first roll we obtain a sum of 12, it is unlikely that on the second roll we will have the same luck again. If we throw the dice many times we will see that, on the whole, we obtain values closer to 7 than to the extremes, which, represented graphically, gives a roughly bell-shaped distribution centered on 7; that is, the results tend toward the mean.
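The dice experiment above is easy to check by simulation. The sketch below, in plain Python with an illustrative number of rolls, throws two dice 10,000 times and counts how often the sum lands near the central value 7 versus at the extremes 2 and 12:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def roll_two_dice():
    """Return the sum of two fair six-sided dice: a value from 2 to 12."""
    return random.randint(1, 6) + random.randint(1, 6)

rolls = [roll_two_dice() for _ in range(10_000)]

mean_sum = sum(rolls) / len(rolls)
extreme = sum(1 for s in rolls if s in (2, 12))   # extreme sums
central = sum(1 for s in rolls if s in (6, 7, 8))  # sums near the mean

print(f"mean of sums: {mean_sum:.2f}")          # close to 7
print(f"rolls summing to 2 or 12: {extreme}")   # rare
print(f"rolls summing to 6-8: {central}")       # common
```

Central sums turn up far more often than extreme ones (their theoretical probabilities are 16/36 versus 2/36), so a run that starts with a 12 will almost certainly be followed by values nearer to 7.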

The idea of regression to the mean **is very important in research, since it must be considered in the design of scientific experiments and in the interpretation of the data collected in order to avoid making wrong inferences.**

### History of the concept

The concept of regression to the mean **was popularized by Sir Francis Galton at the end of the 19th century.** He discussed the phenomenon in his paper "Regression towards mediocrity in hereditary stature".

Francis Galton observed that extreme characteristics, in the case of his study the height of the parents, did not seem to follow the same extreme pattern in the offspring. The children of very tall parents and the children of very short parents, instead of being respectively as tall or as short, had heights that tended toward mediocrity, a term that at the time simply meant the average. Galton had the feeling that **it was as if nature were looking for a way to neutralize the extreme values.**

He quantified this tendency and, in doing so, invented linear regression analysis, thus laying the foundation for much of modern statistics. Since then, the term "regression" has taken on a wide variety of meanings, and modern statisticians may also use it to describe phenomena of sampling bias.

## Importance of regression to the mean in statistics

As we have already mentioned, regression to the mean is a phenomenon of great importance to be taken into account in scientific research. To understand why, let's look at the following case.

**Let's imagine 1,000 people of the same age who have been screened to assess their risk of having a heart attack.** Among these 1,000 people, very varied scores have been observed, as expected; however, the focus has been placed on the 50 people who obtained a maximum-risk score. Based on this, a special clinical intervention has been proposed for these people, introducing dietary changes, increased physical activity and pharmacological treatment.

Let us imagine that, despite the efforts that have gone into developing the therapy, it turns out to have no real effect on the health of the patients. Even so, in the second physical examination, performed some time after the first, some patients are reported to show some kind of improvement.

This improvement would be nothing more than the phenomenon of regression to the mean: this time, **instead of giving values suggesting an elevated risk of heart attack, these patients show a slightly lower risk.** The research group could fall into the error of believing that their therapeutic plan has worked, when it has not.

The best way to avoid this effect would be to select patients and randomly assign them into two groups: one group to receive the treatment and another group to act as a control. Based on the results obtained with the treatment group compared to the control group, improvements can be attributed, or not, to the effect of the therapeutic plan.
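This selection effect can be reproduced with a minimal simulation. The model below is hypothetical and all the numbers are illustrative: each person has a stable true risk score, and each screening reads it with independent measurement noise. We select the 50 highest scorers from the first screening and re-screen them with no treatment at all:

```python
import random

random.seed(0)  # reproducible run
N = 1000

# Hypothetical model: a stable true risk per person (mean 50, sd 10),
# read by each screening with independent noise (sd 10).
true_risk = [random.gauss(50, 10) for _ in range(N)]

def screen(risk):
    """One noisy measurement of a person's true risk."""
    return risk + random.gauss(0, 10)

first = [screen(r) for r in true_risk]

# Select the 50 people with the highest first-screening scores.
top50 = sorted(range(N), key=lambda i: first[i], reverse=True)[:50]

# Second screening, with NO intervention applied.
second = [screen(true_risk[i]) for i in top50]

avg_first = sum(first[i] for i in top50) / len(top50)
avg_second = sum(second) / len(top50)

print(f"top-50 average, first screening:  {avg_first:.1f}")
print(f"top-50 average, second screening: {avg_second:.1f}")
# The second average is lower even though nothing was treated:
# the top scorers were partly selected for lucky (high) noise.
```

The group's second average drops purely because of regression to the mean, which is exactly why a randomized control group is needed before attributing any drop to the therapy.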

## Fallacies and examples of regression to the mean

Many phenomena are attributed to the wrong causes when regression to the mean is not taken into account.

### 1. The case of Horace Secrist

An extreme example is what Horace Secrist thought he saw in his 1933 book *The Triumph of Mediocrity in Business*. This professor of statistics compiled hundreds of data series to prove that **profit rates in companies with competitive businesses tended to move toward the mean over time.** In other words, they started out very high but then declined, either because of burnout or because the tycoon had taken too many risks and become overconfident.

**In truth, this was not the real phenomenon.** The variability of profit rates was constant over time; what Secrist observed was regression to the mean, mistaking it for a genuine natural tendency of businesses with initially high profits to stagnate over time.

### 2. Massachusetts schools

Another, more modern example is what happened with the evaluation of educational questionnaires in Massachusetts in 2000. In the previous year, schools in the state had been assigned educational goals to achieve. This basically meant that **the school's grade point average, among other factors, had to be above a value agreed upon by the education authorities.**

After the year had passed, the education department obtained information on all academic test results administered in the state's schools, tabulating the difference achieved by students between 1999 and 2000. The analysts were surprised to see that the schools that had done worst in 1999, failing to meet that year's targets, managed to meet them the following year. This was interpreted to mean that the state's new education policies were having an effect.

However, this was not the case. Confidence that the educational improvements were effective was shaken when it was found that the schools that had obtained the best scores in 1999 worsened their performance the following year. The issue was debated, and the apparent improvement in the schools that had scored poorly in 1999 was dismissed as a case of regression to the mean, indicating that the educational policies had not helped much.
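The schools case follows the same pattern as the screening example, and a small sketch makes it concrete. The model is again hypothetical with illustrative numbers: each school has a stable underlying quality, and each year's test average is that quality plus independent year-to-year noise. No policy change is simulated at all:

```python
import random

random.seed(1)  # reproducible run
N_SCHOOLS = 200

# Hypothetical model: stable school quality (mean 70, sd 5) plus
# independent yearly noise (sd 5) in each year's test average.
quality = [random.gauss(70, 5) for _ in range(N_SCHOOLS)]
score_1999 = [q + random.gauss(0, 5) for q in quality]
score_2000 = [q + random.gauss(0, 5) for q in quality]

# Rank schools by their 1999 results only.
ranked = sorted(range(N_SCHOOLS), key=lambda i: score_1999[i])
bottom, top = ranked[:20], ranked[-20:]

def avg_change(group):
    """Average change in test score from 1999 to 2000 for a group."""
    return sum(score_2000[i] - score_1999[i] for i in group) / len(group)

print(f"1999 bottom-20 schools, average change: {avg_change(bottom):+.1f}")  # improves
print(f"1999 top-20 schools,    average change: {avg_change(top):+.1f}")     # worsens
```

With no intervention in the model, the worst 1999 schools improve and the best ones worsen, simply because extreme yearly scores are partly luck.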


(Updated at Apr 12 / 2024)