Kolmogorov-Smirnov test: what is it and how is it used in statistics?
This non-parametric test is widely used in inferential statistics. Let's see how it works.
In statistics, both parametric and nonparametric tests are widely used. A widely used non-parametric test is the Kolmogorov-Smirnov test, which makes it possible to check whether the scores in a sample follow a normal distribution.
It belongs to the group of so-called goodness-of-fit tests. In this article we will learn about its characteristics, what it is used for and how it is applied.
Nonparametric tests
The Kolmogorov-Smirnov test is a type of nonparametric test. Nonparametric tests (also called distribution-free tests) are used in inferential statistics and have the following characteristics:
- Hypotheses on goodness of fit, independence, etc. are put forward.
- The level of measurement of the variables is low (ordinal).
- They do not have excessive restrictions.
- They are applicable to small samples.
- They are robust.
Kolmogorov-Smirnov test: features
The Kolmogorov-Smirnov test belongs to inferential statistics, the branch of statistics that aims to draw conclusions about populations from sample data.
It is a goodness-of-fit test: it is used to verify whether the scores obtained from the sample follow a normal distribution (or any other specified theoretical distribution). In other words, it measures the degree of agreement between the distribution of a set of data and a specific theoretical distribution. Its objective is to indicate whether the data come from a population that has the specified theoretical distribution, i.e., it tests whether the observations could reasonably come from that distribution.
The Kolmogorov-Smirnov test addresses the following question: Do the observations in the sample come from some hypothesized distribution?
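To make this concrete, here is a minimal sketch of a one-sample Kolmogorov-Smirnov test in Python, assuming SciPy is available and using invented data. The sample is compared against a fully specified standard normal distribution:

```python
# Sketch: do these observations plausibly come from N(0, 1)?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=200)  # hypothetical data

# Compare the sample against the fully specified theoretical N(0, 1).
statistic, p_value = stats.kstest(sample, "norm", args=(0.0, 1.0))
print(f"D = {statistic:.4f}, p = {p_value:.4f}")
```

A small p-value would indicate that the sample is unlikely to come from the hypothesized distribution.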
Null hypothesis and alternative hypothesis
As a goodness-of-fit test, it answers the question: "Does the sample (empirical) distribution fit the population (theoretical) distribution?". In this case, the null hypothesis (H0) states that the empirical distribution is consistent with the theoretical one (the null hypothesis is the one we are not trying to reject). In other words, the null hypothesis states that the observed frequency distribution is consistent with the theoretical distribution, and that there is therefore a good fit.
In contrast, the alternative hypothesis (H1) states that the observed frequency distribution is not consistent with the theoretical distribution (poor fit). As in other hypothesis tests, the symbol α (alpha) indicates the significance level of the test.
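The decision rule can be sketched as follows (assuming SciPy; the data and the hypothesized distribution are invented for illustration): reject H0 when the p-value falls below α.

```python
# Sketch of the decision rule at significance level alpha.
import numpy as np
from scipy import stats

alpha = 0.05
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=150)  # clearly non-normal data

# H0: the sample comes from N(1, 1), a fully specified theoretical distribution.
statistic, p_value = stats.kstest(sample, "norm", args=(1.0, 1.0))
if p_value < alpha:
    print("Reject H0: the observed distribution does not fit the theoretical one.")
else:
    print("Fail to reject H0: the fit is acceptable at this significance level.")
```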
How is it calculated?
The result of the Kolmogorov-Smirnov test is represented by the letter Z. Z is calculated from the largest difference (in absolute value) between the theoretical and the observed (empirical) cumulative distribution functions.
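In other words, the statistic is the largest absolute gap between the empirical cumulative distribution function and the theoretical one. The following sketch (assuming SciPy; variable names are illustrative) computes that gap by hand and checks it against the library result:

```python
# Sketch: compute the largest absolute gap between the empirical CDF
# and the theoretical CDF, checking both sides of each jump.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.sort(rng.normal(size=100))
n = x.size

theoretical_cdf = stats.norm.cdf(x)       # F(x) under the hypothesized N(0, 1)
ecdf_upper = np.arange(1, n + 1) / n      # empirical CDF just after each point
ecdf_lower = np.arange(0, n) / n          # empirical CDF just before each point

d_plus = np.max(ecdf_upper - theoretical_cdf)
d_minus = np.max(theoretical_cdf - ecdf_lower)
D = max(d_plus, d_minus)
print(f"D = {D:.4f}")
print("scipy agrees:", stats.kstest(x, "norm").statistic)
```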
Assumptions
In order to apply the Kolmogorov-Smirnov test correctly, a number of assumptions must hold. First, the test assumes that the parameters of the theoretical distribution have been specified in advance; in practice, however, statistical software often estimates these parameters from the sample itself.
For example, the sample mean and the sample standard deviation are the parameters of a normal distribution; the minimum and maximum values of the sample define the range of a uniform distribution; the sample mean is the parameter of a Poisson distribution; and the sample mean is also the parameter of an exponential distribution.
When the parameters are estimated from the sample in this way, the ability of the Kolmogorov-Smirnov test to detect deviations from the hypothesized distribution can be severely diminished. To test against a normal distribution with estimated parameters, the Lilliefors variant of the K-S test should be considered.
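A minimal sketch of that corrected test, assuming the statsmodels package is installed and using invented scores:

```python
# Sketch: Lilliefors-corrected K-S test for normality when the mean and
# standard deviation are estimated from the same sample.
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(7)
sample = rng.normal(loc=10.0, scale=2.0, size=80)  # hypothetical scores

ks_stat, p_value = lilliefors(sample, dist="norm")
print(f"Lilliefors D = {ks_stat:.4f}, p = {p_value:.4f}")
```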
Application
The Kolmogorov-Smirnov test can be applied to a sample to test whether a variable (e.g., academic grades or income in euros) is normally distributed. This is often necessary to check, since many parametric tests require that the variables they use follow a normal distribution.
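For instance, a sketch of such a normality check on a small set of invented academic grades (0-10 scale), assuming SciPy; since the mean and standard deviation are estimated from the data, the Lilliefors caveat discussed above applies:

```python
# Sketch: check whether hypothetical grades look normally distributed
# before running a parametric test.
import numpy as np
from scipy import stats

grades = np.array([5.5, 6.0, 7.2, 4.8, 6.9, 8.1, 5.9, 6.3, 7.5, 5.2,
                   6.6, 7.0, 4.5, 6.1, 6.8])

# Standardize with the sample estimates and compare against N(0, 1).
z = (grades - grades.mean()) / grades.std(ddof=1)
result = stats.kstest(z, "norm")
print(f"D = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```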
Advantages
Some of the advantages of the Kolmogorov-Smirnov test are:
- It is more powerful than the Chi-square (χ²) test (also a goodness-of-fit test).
- It is easy to calculate and use, and does not require grouping the data into classes.
- The statistic is independent of the expected frequency distribution; it depends only on the sample size.
Differences with parametric tests
Parametric tests, unlike nonparametric tests such as the Kolmogorov-Smirnov test, have the following characteristics:
- They test hypotheses about population parameters.
- The level of measurement of the variables is at least quantitative.
- A series of assumptions must be met.
- They do not lose information.
- They have a high statistical power.
Some examples of parametric tests are the t-test for a difference of means and ANOVA.
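For contrast, a sketch of one such parametric test, an independent-samples t-test, assuming SciPy and using invented group scores; this test assumes, among other things, roughly normal data in each group:

```python
# Sketch: independent-samples t-test on two hypothetical groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.5, scale=1.0, size=30)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```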