The Friedman Test: Conceptual Foundation
The Friedman test is a non-parametric statistical procedure for evaluating whether three or more related groups or repeated measurement conditions differ significantly from one another. It was introduced by Milton Friedman in 1937 in the Journal of the American Statistical Association as a method that avoids the normality assumption inherent in parametric analysis of variance.
The procedure works by ranking observations within each block (subject or matched group) from lowest to highest. If no true differences exist among conditions, the ranks across blocks would distribute randomly, producing approximately equal column rank totals. Systematic differences, in contrast, produce consistently high or low ranks in certain columns, generating a large test statistic.
The Friedman test is the direct non-parametric counterpart of the one-way repeated measures ANOVA. It is appropriate when the dependent variable is measured at the ordinal level, or when interval or ratio data violate the normality assumption required by parametric tests.
Statistical Formula
The Friedman test statistic Q is computed from rank sums across conditions. For N subjects and k conditions, each row of data is ranked from 1 to k, and the column rank totals Rj are then combined into the test statistic.
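In symbols, the standard rank-sum form of the statistic is:

```latex
Q = \frac{12}{N\,k\,(k+1)} \sum_{j=1}^{k} R_j^{2} \;-\; 3N(k+1)
```

Under the null hypothesis, Q is compared against the chi-square distribution with k − 1 degrees of freedom.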
When to Use the Friedman Test
| Condition | Requirement | Example |
|---|---|---|
| Number of groups | Three or more related conditions | Pre-test, Post-test 1, Post-test 2 |
| Measurement level | Ordinal, interval, or ratio | Likert ratings, pain scores, reaction times |
| Sample structure | Repeated measures on the same subjects, or randomised blocks of matched subjects | Same participants rated under three conditions |
| Normality | Not assumed; violations are acceptable | Skewed distributions, small samples |
| Independence of blocks | Each row (subject) is independent of other rows | No cluster effects across subjects |
Assumptions
The Friedman test requires four conditions to hold. The dependent variable must be measured at the ordinal level or higher. The k groups must represent related samples — either repeated observations from the same individuals, or matched subjects in randomised blocks. There must be a minimum of three conditions; the test is not applicable for two-group comparisons, for which the Wilcoxon signed-rank test is appropriate. Finally, the blocks (rows) are assumed to be mutually independent of one another — the scores within a block come from the same subject and are related by design, but no subject's data should influence another's, and no carryover or order effects should systematically influence the data.
Effect Size: Kendall's Coefficient of Concordance (W)
Kendall's W quantifies the degree of agreement among blocks (subjects) regarding the ordering of conditions. It is the effect size measure for the Friedman test and ranges from 0 (no agreement) to 1 (perfect agreement). It is computed as W = Q / [N(k − 1)].
| W Range | Interpretation | Reference |
|---|---|---|
| < 0.10 | Negligible concordance | Landis & Koch (1977) |
| 0.10 – 0.29 | Weak concordance | Landis & Koch (1977) |
| 0.30 – 0.49 | Moderate concordance | Landis & Koch (1977) |
| 0.50 – 0.69 | Strong concordance | Legendre (2005) |
| 0.70 – 1.00 | Very strong concordance | Legendre (2005) |
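As an illustration, the whole computation — the Friedman Q statistic, its p-value, and Kendall's W — can be run in a few lines with SciPy. The data values below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical scores: rows = subjects, columns = conditions.
data = np.array([
    [4, 6, 9],
    [5, 7, 8],
    [6, 5, 9],
    [3, 6, 7],
    [5, 8, 9],
], dtype=float)
n_subjects, k = data.shape

# SciPy expects one array per condition, so pass the columns.
q_stat, p_value = stats.friedmanchisquare(*data.T)

# Kendall's W from the Friedman statistic: W = Q / [N(k - 1)].
w = q_stat / (n_subjects * (k - 1))
print(f"Q = {q_stat:.3f}, p = {p_value:.4f}, W = {w:.3f}")
```

Because the same subjects appear in every column, the columns must stay aligned row by row; shuffling any single column would break the repeated-measures structure.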
Friedman Test Calculator
Configure the study design, build the data table, enter your observed scores, and run the test. Results include the Friedman Q statistic, p-value, Kendall's W effect size, Bonferroni-corrected post-hoc comparisons, and four reporting narratives.
Each row represents one subject. Each column represents one condition. Enter the score that subject received under that condition at the intersection of their row and column. This orientation is the same as that used by SPSS and matches how most published Friedman test tutorials present data. If you enter data column-first (all subjects for Condition 1, then all subjects for Condition 2), the calculator will rank within the wrong groupings and produce incorrect results.
Example for 3 subjects and 3 conditions: Row 1 contains Subject 1's scores under Condition A, Condition B, and Condition C. Row 2 contains Subject 2's scores. Row 3 contains Subject 3's scores.
Ranks are assigned within each row (subject). Tied values receive the average of their tied ranks. The Friedman test statistic is computed from these rank assignments.
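This within-row, tie-averaging ranking is exactly what `scipy.stats.rankdata` does by default (`method="average"`):

```python
from scipy.stats import rankdata

# One subject's scores under four conditions, with a tie between two of them.
row = [12.0, 15.0, 12.0, 20.0]

# Tied values share the average of the ranks they would have occupied:
# the two 12s would take ranks 1 and 2, so each receives 1.5.
ranks = rankdata(row)
print(ranks)  # [1.5, 3.0, 1.5, 4.0]
```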
Frequently Asked Questions
Both are non-parametric alternatives to ANOVA, but they address different study designs. The Kruskal-Wallis test applies to independent groups — different participants in each condition. The Friedman test applies to related groups — the same participants measured under each condition, or matched subjects in randomised blocks. Choosing the wrong test when groups are related inflates error variance and reduces statistical power.
A non-significant result means the data do not provide sufficient evidence to conclude that any condition differs from the others at the chosen significance level. Post-hoc comparisons are not warranted following a non-significant omnibus test, and individual pairwise comparisons should not be conducted or reported. The interpretation should state that no statistically significant differences were observed across conditions, acknowledge the possibility of a Type II error, and note that the conclusion is limited to the observed sample.
When two or more values within the same row are equal, they receive the average of the ranks they would have occupied had they been distinct. The uncorrected Friedman statistic is then divided by a correction factor that accounts for the degree of tying present in the data. The correction factor equals one minus the sum of all tie correction terms divided by the product of N, k, and k squared minus one. In the absence of ties, the correction factor equals one and the corrected statistic is identical to the uncorrected statistic. This calculator applies the ties correction automatically.
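The verbal description of the correction factor above can be sketched in a few lines of Python. The helper name `tie_correction_factor` is ours, and the data values are hypothetical:

```python
import numpy as np

def tie_correction_factor(data):
    """C = 1 - sum(t^3 - t) / (N * k * (k^2 - 1)),
    where t runs over the sizes of tie groups within each row."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    tie_sum = 0.0
    for row in data:
        # Count how many times each distinct value occurs in the row.
        _, counts = np.unique(row, return_counts=True)
        tie_sum += float(np.sum(counts**3 - counts))
    return 1.0 - tie_sum / (n * k * (k**2 - 1))

# First row has one tie group of size 2, so C falls below 1.
data = [[1.0, 2.0, 2.0],
        [3.0, 1.0, 2.0]]
c = tie_correction_factor(data)
```

Dividing the uncorrected Friedman statistic by this factor yields the tie-corrected statistic; with no ties, C = 1 and the two statistics coincide.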
The minimum is three subjects (N = 3) with three conditions (k = 3), but the chi-square approximation becomes increasingly accurate as sample size grows. For three conditions, N of at least 10 is generally recommended for the chi-square approximation to be reliable. For larger k, the approximation holds at smaller N. Exact critical values from Friedman's original tables are available for small samples (N < 10 with k = 3, or N < 6 with k = 4) and should be consulted when sample sizes are very small. This calculator uses the chi-square approximation, which is standard practice in published research.
For k conditions, there are m = k(k − 1)/2 possible pairwise comparisons. The Bonferroni correction adjusts the significance threshold to α/m and multiplies each raw p-value by m (capping at 1.000) to produce an adjusted p-value. This controls the familywise error rate — the probability of making at least one Type I error across all comparisons — at the chosen α level. The pairwise z-statistic for each comparison is derived from the difference in column rank totals divided by the standard error, which equals the square root of N times k times k plus one divided by six.
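A sketch of these post-hoc calculations, under the same rank-total notation as above (the data values are hypothetical):

```python
import math
from itertools import combinations

import numpy as np
from scipy.stats import norm, rankdata

# Hypothetical scores: rows = subjects, columns = conditions.
data = np.array([
    [4, 6, 9],
    [5, 7, 8],
    [6, 5, 9],
    [3, 6, 7],
    [5, 8, 9],
], dtype=float)
n, k = data.shape

# Rank within each row, then total the ranks per condition (column).
ranks = np.apply_along_axis(rankdata, 1, data)
rank_totals = ranks.sum(axis=0)

m = k * (k - 1) // 2                   # number of pairwise comparisons
se = math.sqrt(n * k * (k + 1) / 6.0)  # SE of a rank-total difference

results = []
for i, j in combinations(range(k), 2):
    z = abs(rank_totals[i] - rank_totals[j]) / se
    p_raw = 2 * norm.sf(z)             # two-tailed p from the normal tail
    p_adj = min(1.0, m * p_raw)        # Bonferroni adjustment, capped at 1
    results.append((i, j, z, p_adj))
```

Each adjusted p-value is then compared against the original α, which is equivalent to comparing the raw p-value against α/m.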