Compare three or more independent groups without assuming normality. The nonparametric alternative to one-way ANOVA, based on ranked observations rather than raw means.
The Kruskal-Wallis H test, introduced by William Kruskal and W. Allen Wallis in 1952, is a rank-based nonparametric statistical procedure designed to determine whether three or more independent samples originate from populations with the same distribution. The test generalizes the Wilcoxon rank-sum test to more than two groups and serves as the nonparametric counterpart to the one-way analysis of variance (ANOVA).
Unlike parametric ANOVA, which operates on group means and requires the assumption of normally distributed populations with homogeneous variances, the Kruskal-Wallis test converts all observations into ranks pooled across all groups. The test then evaluates whether the rank distributions differ systematically among groups. Because the procedure relies on ordinal rank information rather than the actual numerical values of the data, it is robust to violations of normality and is appropriate for ordinal-scale measurements, skewed distributions, and data sets where the number of observations is small.
The null hypothesis of the Kruskal-Wallis test states that the population distributions of all groups are identical. A statistically significant result warrants rejection of this null hypothesis and supports the conclusion that at least one group's distribution differs from the others. Critically, a significant Kruskal-Wallis result does not identify which specific pairs of groups differ; post-hoc pairwise comparisons using procedures such as the Dunn test with Bonferroni correction are required to localize the source of the detected difference.
All N observations across all groups are jointly ranked from 1 to N. Tied observations receive the average of the ranks they would have occupied if they had been distinct. Let k denote the number of groups, n_i the number of observations in group i, N the total sample size, and R_i the sum of ranks assigned to group i. The test statistic H is computed as:

H = [12 / (N(N + 1))] × Σ (R_i² / n_i) − 3(N + 1)

where the sum runs over the k groups. When ties are present, H is divided by the correction factor 1 − ΣT / (N³ − N), where T = t³ − t for each set of t tied observations.
Under the null hypothesis, H approximately follows a chi-squared distribution with k − 1 degrees of freedom; the approximation is considered adequate when each group contains at least five observations. For smaller samples, exact critical values derived from the permutation distribution of H are more appropriate.
The test statistic H attains larger values when the rank sums R_i deviate substantially from what would be expected if all observations were drawn from a common population. Under the null hypothesis, the expected rank sum for group i is n_i(N+1)/2. The H statistic is a scaled measure of the squared deviations of the observed rank sums from these expectations.
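The pooled-ranking computation described above can be sketched in a few lines of Python. The three groups below are illustrative data, and `scipy.stats.kruskal` is used only as a cross-check (SciPy applies a tie correction, which changes nothing here because all values are distinct):

```python
import numpy as np
from scipy import stats

# Three hypothetical independent groups (illustrative values, no ties)
g1 = [1.2, 1.9, 1.5, 1.4, 2.1]
g2 = [3.8, 2.7, 4.0, 2.4]
g3 = [5.8, 5.4, 4.7, 6.2, 5.0]

data = np.concatenate([g1, g2, g3])
ranks = stats.rankdata(data)            # pooled ranks; ties would get average ranks
sizes = [len(g1), len(g2), len(g3)]
N = len(data)

# Rank sum R_i for each group
bounds = np.cumsum(sizes)[:-1]
rank_sums = [r.sum() for r in np.split(ranks, bounds)]

# H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
H = 12.0 / (N * (N + 1)) * sum(R**2 / n for R, n in zip(rank_sums, sizes)) - 3 * (N + 1)

# Cross-check against SciPy; also gives the chi-squared p-value (df = k - 1)
H_scipy, p = stats.kruskal(g1, g2, g3)
print(f"H = {H:.4f}  (scipy: {H_scipy:.4f}), p = {p:.4f}")
```

With well-separated groups like these, the rank sums deviate strongly from the expected value n_i(N + 1)/2 per group, so H is large and the chi-squared p-value is small.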
While the Kruskal-Wallis test requires far fewer assumptions than one-way ANOVA, several conditions must be satisfied for valid inference.
| Assumption | Description | Required |
|---|---|---|
| Independence | Observations within and between groups must be independent of one another. Repeated measures or matched designs violate this assumption. | Required |
| Ordinal or continuous scale | The dependent variable must be measured on at least an ordinal scale so that meaningful ranks can be assigned to the observations. | Required |
| Similar distributional shape | For the test to function as a test of median equality (location), the shapes of the population distributions across groups should be similar, differing only in location. If shapes differ, the test compares stochastic dominance rather than medians. | Required for median interpretation |
| Minimum sample size | Each group should contain at least 5 observations for the chi-squared approximation of H to be reliable. With smaller groups, exact p-values should be used. | Recommended |
| Normality of dependent variable | Normality of the dependent variable within each group is not required. This is the primary advantage of the Kruskal-Wallis test over one-way ANOVA. | Not required |
| Homogeneity of variance | Equal variances across groups are not required. The test is based on ranks and does not use group variance estimates in its computation. | Not required |
The Kruskal-Wallis test is the appropriate choice in the following situations:

- When the dependent variable is markedly skewed, contains extreme outliers, or fails a formal normality test such as Shapiro-Wilk or Kolmogorov-Smirnov, particularly with small sample sizes.
- When the dependent variable is measured on an ordinal scale, such as Likert-type ratings, ranked preferences, or ordered categories where equal intervals between values cannot be assumed.
- When group sample sizes are small (n less than 30 per group) or substantially unequal across groups, making parametric distributional assumptions unverifiable or implausible.
- When comparing three or more independent groups simultaneously. For exactly two groups, the Mann-Whitney U test (Wilcoxon rank-sum) is the appropriate nonparametric alternative.
A significant Kruskal-Wallis test establishes only that at least one group differs from the others; it does not identify which pairs are responsible for the overall significance. Post-hoc pairwise comparisons are therefore required when the omnibus test is significant and the research question concerns specific group differences.
Dunn's test (1964) is the most widely used post-hoc procedure following a significant Kruskal-Wallis test. It performs pairwise comparisons using the same rank-based logic as the Kruskal-Wallis test, using the pooled variance estimate from all groups rather than computing separate variances for each pair. The test statistic for comparing groups i and j is:

z_ij = (R̄_i − R̄_j) / √( [N(N + 1)/12 − ΣT / (12(N − 1))] × (1/n_i + 1/n_j) )

where R̄_i = R_i / n_i is the mean rank of group i and ΣT is the tie-correction term (zero when no ties are present). Each z_ij is referred to the standard normal distribution.
The Bonferroni correction divides the nominal significance level by the number of pairwise comparisons, controlling the familywise error rate at the expense of statistical power. The Holm-Bonferroni procedure provides a less conservative alternative that maintains familywise error control with greater power. Both approaches are implemented in the calculator below.
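Dunn's pairwise z statistics with a Bonferroni adjustment can be computed directly from the pooled ranks. The sketch below uses illustrative data; because the values contain no ties, the tie-correction term in the standard error is omitted:

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Hypothetical groups (illustrative values, no ties)
groups = {
    "A": [1.2, 1.9, 1.5, 1.4, 2.1],
    "B": [3.8, 2.7, 4.0, 2.4],
    "C": [5.8, 5.4, 4.7, 6.2, 5.0],
}

data = np.concatenate(list(groups.values()))
ranks = stats.rankdata(data)
N = len(data)

# Mean rank and size for each group
mean_ranks, sizes = {}, {}
start = 0
for name, vals in groups.items():
    n = len(vals)
    mean_ranks[name] = ranks[start:start + n].mean()
    sizes[name] = n
    start += n

# Dunn's z for each pair, with Bonferroni-adjusted two-sided p-values
m = len(groups) * (len(groups) - 1) // 2       # number of pairwise comparisons
results = {}
for a, b in combinations(groups, 2):
    se = np.sqrt(N * (N + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
    z = (mean_ranks[a] - mean_ranks[b]) / se
    p = 2 * stats.norm.sf(abs(z))              # two-sided p from the standard normal
    p_adj = min(1.0, p * m)                    # Bonferroni correction
    results[(a, b)] = (z, p_adj)
    print(f"{a} vs {b}: z = {z:+.3f}, adjusted p = {p_adj:.4f}")
```

Libraries such as scikit-posthocs provide ready-made implementations of Dunn's test (including tie handling and Holm adjustment), but the hand computation above makes the pooled-variance logic explicit.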
Statistical significance alone is insufficient for meaningful interpretation. Effect size quantifies the practical magnitude of the difference among groups, independent of sample size. For the Kruskal-Wallis test, two effect size measures are commonly reported: eta-squared based on H, η²[H] = (H − k + 1) / (N − k), and epsilon-squared, ε² = H / ((N² − 1) / (N + 1)), which simplifies to H / (N − 1).

Epsilon-squared (ε²) is less biased than eta-squared when sample sizes are small and is increasingly recommended in contemporary methodological literature. Both measures range from 0 to 1, where 0 indicates no relationship between group membership and the dependent variable, and 1 indicates that group membership perfectly determines rank order.
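Both measures can be computed in one line each from H. The sketch below uses illustrative data and the common conventions η²[H] = (H − k + 1)/(N − k) and ε² = H/(N − 1):

```python
from scipy import stats

# Hypothetical three-group data (illustrative values)
g1 = [1.2, 1.9, 1.5, 1.4, 2.1]
g2 = [3.8, 2.7, 4.0, 2.4]
g3 = [5.8, 5.4, 4.7, 6.2, 5.0]

H, p = stats.kruskal(g1, g2, g3)
k = 3                                  # number of groups
N = len(g1) + len(g2) + len(g3)        # total sample size

eta_squared = (H - k + 1) / (N - k)    # eta-squared based on H
epsilon_squared = H / (N - 1)          # epsilon-squared: H / ((N^2 - 1)/(N + 1))
print(f"eta²[H] = {eta_squared:.3f}, ε² = {epsilon_squared:.3f}")
```

Note that η²[H] can come out slightly negative when H is smaller than k − 1 (i.e., a near-zero effect); such values are conventionally reported as 0.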
Enter your data below. Results include H statistic, p-value, effect size, post-hoc comparisons, and a full narrative report.