The Friedman Test: Conceptual Foundation
The Friedman test is a non-parametric statistical procedure for evaluating whether three or more related groups or repeated measurement conditions differ significantly from one another. It was introduced by Milton Friedman in 1937 in the Journal of the American Statistical Association as a method that avoids the normality assumption inherent in parametric analysis of variance.
The procedure works by ranking observations within each block (subject or matched group) from lowest to highest. If no true differences exist among conditions, the ranks across blocks would distribute randomly, producing approximately equal column rank totals. Systematic differences, in contrast, produce consistently high or low ranks in certain columns, generating a large test statistic.
The Friedman test is the direct non-parametric counterpart of the one-way repeated measures ANOVA. It is appropriate when the dependent variable is measured at the ordinal level, or when interval or ratio data violate the normality assumption required by parametric tests.
Statistical Formula
The Friedman test statistic Q is computed from rank sums across conditions. For N subjects and k conditions, each row of data is ranked from 1 to k, and the column rank totals Rj are then combined into the test statistic.
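In symbols, the standard rank-sum form of the statistic is:

```latex
Q = \frac{12}{N\,k\,(k+1)} \sum_{j=1}^{k} R_j^{2} \;-\; 3N(k+1)
```

Under the null hypothesis, Q is compared against the chi-square distribution with k − 1 degrees of freedom.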
When to Use the Friedman Test
| Condition | Requirement | Example |
|---|---|---|
| Number of groups | Three or more related conditions | Pre-test, Post-test 1, Post-test 2 |
| Measurement level | Ordinal, interval, or ratio | Likert ratings, pain scores, reaction times |
| Sample structure | Repeated measures on the same subjects, or randomised blocks of matched subjects | Same participants rated under three conditions |
| Normality | Not assumed; violations are acceptable | Skewed distributions, small samples |
| Independence of blocks | Each row (subject) is independent of other rows | No cluster effects across subjects |
Assumptions
The Friedman test requires four conditions to hold. The dependent variable must be measured at the ordinal level or higher. The k groups must represent related samples — either repeated observations from the same individuals, or matched subjects in randomised blocks. There must be a minimum of three conditions; the test is not applicable for two-group comparisons, for which the Wilcoxon signed-rank test is appropriate. Finally, the blocks (rows) are assumed to be mutually independent of one another — the scores within a block come from the same subject and are related by design, but no subject's data should influence another's, and no carryover or order effects should systematically influence the data.
Effect Size: Kendall's Coefficient of Concordance (W)
Kendall's W quantifies the degree of agreement among blocks (subjects) regarding the ordering of conditions. It is the effect size measure for the Friedman test and ranges from 0 (no agreement) to 1 (perfect agreement). It is computed as W = Q / [N(k − 1)].
| W Range | Interpretation | Reference |
|---|---|---|
| < 0.10 | Negligible concordance | Landis & Koch (1977) |
| 0.10 – 0.29 | Weak concordance | Landis & Koch (1977) |
| 0.30 – 0.49 | Moderate concordance | Landis & Koch (1977) |
| 0.50 – 0.69 | Strong concordance | Legendre (2005) |
| 0.70 – 1.00 | Very strong concordance | Legendre (2005) |
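As an illustration, the whole computation — the Friedman Q statistic, its p-value, and Kendall's W — can be run in a few lines with SciPy. The data values below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical scores: rows = subjects, columns = conditions.
data = np.array([
    [4, 6, 9],
    [5, 7, 8],
    [6, 5, 9],
    [3, 6, 7],
    [5, 8, 9],
], dtype=float)
n_subjects, k = data.shape

# SciPy expects one array per condition, so pass the columns.
q_stat, p_value = stats.friedmanchisquare(*data.T)

# Kendall's W from the Friedman statistic: W = Q / [N(k - 1)].
w = q_stat / (n_subjects * (k - 1))
print(f"Q = {q_stat:.3f}, p = {p_value:.4f}, W = {w:.3f}")
```

Because the same subjects appear in every column, the columns must stay aligned row by row; shuffling any single column would break the repeated-measures structure.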
Friedman Test Calculator
Configure the study design, build the data table, enter your observed scores, and run the test. Results include the Friedman Q statistic, p-value, Kendall's W effect size, Bonferroni-corrected post-hoc comparisons, and four reporting narratives.
Each row represents one subject. Each column represents one condition. Enter the score that subject received under that condition at the intersection of their row and column. This orientation is the same as that used by SPSS and matches how most published Friedman test tutorials present data. If you enter data column-first (all subjects for Condition 1, then all subjects for Condition 2), the calculator will rank within the wrong groupings and produce incorrect results.
Example for 3 subjects and 3 conditions: Row 1 contains Subject 1's scores under Condition A, Condition B, and Condition C. Row 2 contains Subject 2's scores. Row 3 contains Subject 3's scores.
Ranks are assigned within each row (subject). Tied values receive the average of their tied ranks. The Friedman test statistic is computed from these rank assignments.
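This within-row, tie-averaging ranking is exactly what `scipy.stats.rankdata` does by default (`method="average"`):

```python
from scipy.stats import rankdata

# One subject's scores under four conditions, with a tie between two of them.
row = [12.0, 15.0, 12.0, 20.0]

# Tied values share the average of the ranks they would have occupied:
# the two 12s would take ranks 1 and 2, so each receives 1.5.
ranks = rankdata(row)
print(ranks)  # [1.5, 3.0, 1.5, 4.0]
```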
Frequently Asked Questions
Both are non-parametric alternatives to ANOVA, but they address different study designs. The Kruskal-Wallis test applies to independent groups — different participants in each condition. The Friedman test applies to related groups — the same participants measured under each condition, or matched subjects in randomised blocks. Choosing the wrong test when groups are related inflates error variance and reduces statistical power.
A non-significant result means the data do not provide sufficient evidence to conclude that any condition differs from the others at the chosen significance level. Post-hoc comparisons are not warranted following a non-significant omnibus test, and individual pairwise comparisons should not be conducted or reported. The interpretation should state that no statistically significant differences were observed across conditions, acknowledge the possibility of a Type II error, and note that the conclusion is limited to the observed sample.
When two or more values within the same row are equal, they receive the average of the ranks they would have occupied had they been distinct. The uncorrected Friedman statistic is then divided by a correction factor that accounts for the degree of tying present in the data. The correction factor equals one minus the sum of all tie correction terms divided by the product of N, k, and k squared minus one. In the absence of ties, the correction factor equals one and the corrected statistic is identical to the uncorrected statistic. This calculator applies the ties correction automatically.
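The verbal description of the correction factor above can be sketched in a few lines of Python. The helper name `tie_correction_factor` is ours, and the data values are hypothetical:

```python
import numpy as np

def tie_correction_factor(data):
    """C = 1 - sum(t^3 - t) / (N * k * (k^2 - 1)),
    where t runs over the sizes of tie groups within each row."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    tie_sum = 0.0
    for row in data:
        # Count how many times each distinct value occurs in the row.
        _, counts = np.unique(row, return_counts=True)
        tie_sum += float(np.sum(counts**3 - counts))
    return 1.0 - tie_sum / (n * k * (k**2 - 1))

# First row has one tie group of size 2, so C falls below 1.
data = [[1.0, 2.0, 2.0],
        [3.0, 1.0, 2.0]]
c = tie_correction_factor(data)
```

Dividing the uncorrected Friedman statistic by this factor yields the tie-corrected statistic; with no ties, C = 1 and the two statistics coincide.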
The minimum is three subjects (N = 3) with three conditions (k = 3), but the chi-square approximation becomes increasingly accurate as sample size grows. For three conditions, N of at least 10 is generally recommended for the chi-square approximation to be reliable. For larger k, the approximation holds at smaller N. Exact critical values from Friedman's original tables are available for small samples (N < 10 with k = 3, or N < 6 with k = 4) and should be consulted when sample sizes are very small. This calculator uses the chi-square approximation, which is standard practice in published research.
For k conditions, there are m = k(k − 1)/2 possible pairwise comparisons. The Bonferroni correction adjusts the significance threshold to α/m and multiplies each raw p-value by m (capping at 1.000) to produce an adjusted p-value. This controls the familywise error rate — the probability of making at least one Type I error across all comparisons — at the chosen α level. The pairwise z-statistic for each comparison is derived from the difference in column rank totals divided by the standard error, which equals the square root of N times k times k plus one divided by six.
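A sketch of these post-hoc calculations, under the same rank-total notation as above (the data values are hypothetical):

```python
import math
from itertools import combinations

import numpy as np
from scipy.stats import norm, rankdata

# Hypothetical scores: rows = subjects, columns = conditions.
data = np.array([
    [4, 6, 9],
    [5, 7, 8],
    [6, 5, 9],
    [3, 6, 7],
    [5, 8, 9],
], dtype=float)
n, k = data.shape

# Rank within each row, then total the ranks per condition (column).
ranks = np.apply_along_axis(rankdata, 1, data)
rank_totals = ranks.sum(axis=0)

m = k * (k - 1) // 2                   # number of pairwise comparisons
se = math.sqrt(n * k * (k + 1) / 6.0)  # SE of a rank-total difference

results = []
for i, j in combinations(range(k), 2):
    z = abs(rank_totals[i] - rank_totals[j]) / se
    p_raw = 2 * norm.sf(z)             # two-tailed p from the normal tail
    p_adj = min(1.0, m * p_raw)        # Bonferroni adjustment, capped at 1
    results.append((i, j, z, p_adj))
```

Each adjusted p-value is then compared against the original α, which is equivalent to comparing the raw p-value against α/m.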