Compare three or more independent groups without assuming normality. The nonparametric alternative to one-way ANOVA, based on ranked observations rather than raw means.
The Kruskal-Wallis H test, introduced by William Kruskal and W. Allen Wallis in 1952, is a rank-based nonparametric statistical procedure designed to determine whether three or more independent samples originate from populations with the same distribution. The test generalizes the Wilcoxon rank-sum test to more than two groups and serves as the nonparametric counterpart to the one-way analysis of variance (ANOVA).
Unlike parametric ANOVA, which operates on group means and requires the assumption of normally distributed populations with homogeneous variances, the Kruskal-Wallis test converts all observations into ranks pooled across all groups. The test then evaluates whether the rank distributions differ systematically among groups. Because the procedure relies on ordinal rank information rather than the actual numerical values of the data, it is robust to violations of normality and is appropriate for ordinal-scale measurements, skewed distributions, and data sets where the number of observations is small.
The null hypothesis of the Kruskal-Wallis test states that the population distributions of all groups are identical. A statistically significant result warrants rejection of this null hypothesis and supports the conclusion that at least one group's distribution differs from the others. Critically, a significant Kruskal-Wallis result does not identify which specific pairs of groups differ; post-hoc pairwise comparisons using procedures such as the Dunn test with Bonferroni correction are required to localize the source of the detected difference.
All N observations across all groups are jointly ranked from 1 to N. Tied observations receive the average of the ranks they would have occupied if they had been distinct. Let k denote the number of groups, n_i the number of observations in group i, N the total sample size, and R_i the sum of ranks assigned to group i. The test statistic H is computed as:

H = [12 / (N(N + 1))] × Σ (R_i² / n_i) − 3(N + 1)

where the sum runs over the k groups. When ties are present, H is divided by the correction factor 1 − ΣT / (N³ − N), where T = t³ − t for each set of t tied observations.
Under the null hypothesis, H approximately follows a chi-squared distribution with k − 1 degrees of freedom; the approximation is considered adequate when each group contains at least five observations. For smaller samples, exact critical values derived from the permutation distribution of H are more appropriate.
The test statistic H attains larger values when the rank sums R_i deviate substantially from what would be expected if all observations were drawn from a common population. Under the null hypothesis, the expected rank sum for group i is n_i(N+1)/2. The H statistic is a scaled measure of the squared deviations of the observed rank sums from these expectations.
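The pooled-ranking computation described above can be sketched in a few lines of Python. The three groups below are illustrative data, and `scipy.stats.kruskal` is used only as a cross-check (SciPy applies a tie correction, which changes nothing here because all values are distinct):

```python
import numpy as np
from scipy import stats

# Three hypothetical independent groups (illustrative values, no ties)
g1 = [1.2, 1.9, 1.5, 1.4, 2.1]
g2 = [3.8, 2.7, 4.0, 2.4]
g3 = [5.8, 5.4, 4.7, 6.2, 5.0]

data = np.concatenate([g1, g2, g3])
ranks = stats.rankdata(data)            # pooled ranks; ties would get average ranks
sizes = [len(g1), len(g2), len(g3)]
N = len(data)

# Rank sum R_i for each group
bounds = np.cumsum(sizes)[:-1]
rank_sums = [r.sum() for r in np.split(ranks, bounds)]

# H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
H = 12.0 / (N * (N + 1)) * sum(R**2 / n for R, n in zip(rank_sums, sizes)) - 3 * (N + 1)

# Cross-check against SciPy; also gives the chi-squared p-value (df = k - 1)
H_scipy, p = stats.kruskal(g1, g2, g3)
print(f"H = {H:.4f}  (scipy: {H_scipy:.4f}), p = {p:.4f}")
```

With well-separated groups like these, the rank sums deviate strongly from the expected value n_i(N + 1)/2 per group, so H is large and the chi-squared p-value is small.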
While the Kruskal-Wallis test requires far fewer assumptions than one-way ANOVA, several conditions must be satisfied for valid inference.
| Assumption | Description | Required |
|---|---|---|
| Independence | Observations within and between groups must be independent of one another. Repeated measures or matched designs violate this assumption. | Required |
| Ordinal or continuous scale | The dependent variable must be measured on at least an ordinal scale so that meaningful ranks can be assigned to the observations. | Required |
| Similar distributional shape | For the test to function as a test of median equality (location), the shapes of the population distributions across groups should be similar, differing only in location. If shapes differ, the test compares stochastic dominance rather than medians. | Required for median interpretation |
| Minimum sample size | Each group should contain at least 5 observations for the chi-squared approximation of H to be reliable. With smaller groups, exact p-values should be used. | Recommended |
| Normality of dependent variable | Normality of the dependent variable within each group is not required. This is the primary advantage of the Kruskal-Wallis test over one-way ANOVA. | Not required |
| Homogeneity of variance | Equal variances across groups are not required. The test is based on ranks and does not use group variance estimates in its computation. | Not required |
The Kruskal-Wallis test is the appropriate choice in the following situations:

- When the dependent variable is markedly skewed, contains extreme outliers, or fails a formal normality test such as Shapiro-Wilk or Kolmogorov-Smirnov, particularly with small sample sizes.
- When the dependent variable is measured on an ordinal scale, such as Likert-type ratings, ranked preferences, or ordered categories where equal intervals between values cannot be assumed.
- When group sample sizes are small (n less than 30 per group) or substantially unequal across groups, making parametric distributional assumptions unverifiable or implausible.
- When comparing three or more independent groups simultaneously. For exactly two groups, the Mann-Whitney U test (Wilcoxon rank-sum) is the appropriate nonparametric alternative.
A significant Kruskal-Wallis test establishes only that at least one group differs from the others; it does not identify which pairs are responsible for the overall significance. Post-hoc pairwise comparisons are therefore required when the omnibus test is significant and the research question concerns specific group differences.
Dunn's test (1964) is the most widely used post-hoc procedure following a significant Kruskal-Wallis test. It performs pairwise comparisons using the same rank-based logic as the Kruskal-Wallis test, using the pooled variance estimate from all groups rather than computing separate variances for each pair. The test statistic for comparing groups i and j is:

z_ij = (R̄_i − R̄_j) / √( [N(N + 1)/12 − ΣT / (12(N − 1))] × (1/n_i + 1/n_j) )

where R̄_i = R_i / n_i is the mean rank of group i and ΣT is the tie-correction term (zero when no ties are present). Each z_ij is referred to the standard normal distribution.
The Bonferroni correction divides the nominal significance level by the number of pairwise comparisons, controlling the familywise error rate at the expense of statistical power. The Holm-Bonferroni procedure provides a less conservative alternative that maintains familywise error control with greater power. Both approaches are implemented in the calculator below.
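Dunn's pairwise z statistics with a Bonferroni adjustment can be computed directly from the pooled ranks. The sketch below uses illustrative data; because the values contain no ties, the tie-correction term in the standard error is omitted:

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Hypothetical groups (illustrative values, no ties)
groups = {
    "A": [1.2, 1.9, 1.5, 1.4, 2.1],
    "B": [3.8, 2.7, 4.0, 2.4],
    "C": [5.8, 5.4, 4.7, 6.2, 5.0],
}

data = np.concatenate(list(groups.values()))
ranks = stats.rankdata(data)
N = len(data)

# Mean rank and size for each group
mean_ranks, sizes = {}, {}
start = 0
for name, vals in groups.items():
    n = len(vals)
    mean_ranks[name] = ranks[start:start + n].mean()
    sizes[name] = n
    start += n

# Dunn's z for each pair, with Bonferroni-adjusted two-sided p-values
m = len(groups) * (len(groups) - 1) // 2       # number of pairwise comparisons
results = {}
for a, b in combinations(groups, 2):
    se = np.sqrt(N * (N + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
    z = (mean_ranks[a] - mean_ranks[b]) / se
    p = 2 * stats.norm.sf(abs(z))              # two-sided p from the standard normal
    p_adj = min(1.0, p * m)                    # Bonferroni correction
    results[(a, b)] = (z, p_adj)
    print(f"{a} vs {b}: z = {z:+.3f}, adjusted p = {p_adj:.4f}")
```

Libraries such as scikit-posthocs provide ready-made implementations of Dunn's test (including tie handling and Holm adjustment), but the hand computation above makes the pooled-variance logic explicit.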
Statistical significance alone is insufficient for meaningful interpretation. Effect size quantifies the practical magnitude of the difference among groups, independent of sample size. For the Kruskal-Wallis test, two effect size measures are commonly reported: eta-squared based on H, η²[H] = (H − k + 1) / (N − k), and epsilon-squared, ε² = H / ((N² − 1) / (N + 1)), which simplifies to H / (N − 1).

Epsilon-squared (ε²) is less biased than eta-squared when sample sizes are small and is increasingly recommended in contemporary methodological literature. Both measures range from 0 to 1, where 0 indicates no relationship between group membership and the dependent variable, and 1 indicates that group membership perfectly determines rank order.
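Both measures can be computed in one line each from H. The sketch below uses illustrative data and the common conventions η²[H] = (H − k + 1)/(N − k) and ε² = H/(N − 1):

```python
from scipy import stats

# Hypothetical three-group data (illustrative values)
g1 = [1.2, 1.9, 1.5, 1.4, 2.1]
g2 = [3.8, 2.7, 4.0, 2.4]
g3 = [5.8, 5.4, 4.7, 6.2, 5.0]

H, p = stats.kruskal(g1, g2, g3)
k = 3                                  # number of groups
N = len(g1) + len(g2) + len(g3)        # total sample size

eta_squared = (H - k + 1) / (N - k)    # eta-squared based on H
epsilon_squared = H / (N - 1)          # epsilon-squared: H / ((N^2 - 1)/(N + 1))
print(f"eta²[H] = {eta_squared:.3f}, ε² = {epsilon_squared:.3f}")
```

Note that η²[H] can come out slightly negative when H is smaller than k − 1 (i.e., a near-zero effect); such values are conventionally reported as 0.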
Enter your data below. Results include H statistic, p-value, effect size, post-hoc comparisons, and a full narrative report.