What Is the Single Sample T-Test?
The single sample t-test (also called the one-sample t-test) is a parametric inferential statistical procedure that tests whether the mean of a single sample differs significantly from a known or hypothesized population mean (μ₀). Its defining characteristic — and the feature that distinguishes it from the single sample z-test — is that the population standard deviation (σ) is unknown and must be estimated from the sample data itself using the sample standard deviation (s).
William Sealy Gosset formally derived the test in 1908, publishing under the pseudonym "Student" because of his employer's confidentiality restrictions. The resulting distribution — the Student's t-distribution — accounts for the additional uncertainty introduced by estimating σ from a finite sample, producing heavier tails than the standard normal and yielding more conservative (less extreme) p-values, particularly at small sample sizes.
As sample size increases, the t-distribution converges to the standard normal distribution. At n → ∞, the t-test and z-test yield numerically identical results — a relationship that illuminates why the t-test is the preferred default in practice: it is the z-test generalized to the realistic condition of unknown population variance.
Single Sample T-Test vs. Z-Test
Understanding the relationship between the t-test and z-test is essential at the doctoral level, as the choice between them is not merely computational but reflects a fundamental epistemological distinction about what is known versus estimated.
| Criterion | Single Sample Z-Test | Single Sample T-Test |
|---|---|---|
| Population σ | Known (fixed, from external source) | Unknown — estimated by sample s |
| Reference Distribution | Standard Normal N(0,1) | Student's t with df = n − 1 |
| Critical Values | Fixed (e.g., ±1.960 at α=.05, two-tailed) | Depend on degrees of freedom (df) |
| Tail Weight | Thin tails | Heavier tails — more conservative at small n |
| Practical Applicability | Rare (σ seldom truly known) | Common — the default in most research |
| Asymptotic Behavior | — | Converges to z-test as n → ∞ |
| Standard Error | σ / √n (exact) | s / √n (estimated) |
When to Use the Single Sample T-Test
- The population σ is unknown and must be estimated from the sample — the most common real-world scenario.
- The variable is continuous, measured at interval or ratio scale.
- You are comparing a single sample mean to a known reference or normative value (μ₀).
- The sample is drawn randomly and independently from the population of interest.
- The distribution is approximately normal, or n is sufficiently large (≥ 30) to invoke the Central Limit Theorem.
- Classic examples: comparing a class mean score to a national norm; testing whether a sample's average differs from a clinical threshold.
When Not to Use the Single Sample T-Test
- Two independent groups are compared — use the independent samples t-test (Welch's or Student's).
- Observations are paired (pre-post, matched) — use paired samples t-test.
- The dependent variable is categorical or ordinal — use a chi-square goodness-of-fit test (categorical) or the Wilcoxon signed-rank test (ordinal).
- The population σ is genuinely known — use the z-test for greater statistical power.
- Data are severely non-normal with very small n (< 10) — consider Wilcoxon signed-rank or bootstrapped CI.
- Three or more group means are compared — use ANOVA or Kruskal-Wallis.
Assumptions & Their Diagnostic Tests
The validity of inferences from the single sample t-test depends on the tenability of the following assumptions. Doctoral-level practice demands explicit assumption checking and, where violations are detected, discussion of remedial strategies or alternative approaches.
| Assumption | Description | Diagnostic Procedure | Robustness |
|---|---|---|---|
| Normality | The sampling distribution of the mean must be approximately normal. For n ≥ 30, CLT generally guarantees this regardless of population shape. For small n, the population itself should be approximately normal. | Shapiro-Wilk (n < 50), Kolmogorov-Smirnov, D'Agostino-Pearson, Q-Q plots, skewness/kurtosis indices (|skew| < 2, |kurt| < 7 as heuristics). | HIGH Robust for n ≥ 30 via CLT. |
| Independence | Each observation must be statistically independent. Violations — such as clustering, repeated measures, or autocorrelation — inflate or deflate Type I error rates unpredictably. | Durbin-Watson statistic (time series), ICC for clustered data, design-based review of sampling protocol. | CRITICAL Not robust; violates test logic. |
| Scale of Measurement | The dependent variable must be at least interval-level to justify computing a meaningful arithmetic mean and standard deviation. | Conceptual/theoretical evaluation. Likert-type scales require explicit argumentation. Ordinal scales → non-parametric alternatives. | MODERATE Debated in methodological literature. |
| Random Sampling | The sample must constitute a probability sample from the defined target population to support valid generalization of inferential conclusions. | Review of sampling frame design, recruitment procedure, non-response bias analysis, representativeness assessment. | MODERATE Convenience samples require explicit caveats. |
| No Outliers | Extreme outliers disproportionately influence the sample mean and standard deviation, distorting both the t-statistic and its associated p-value. | Boxplots, z-score screening (|z| > 3.29 at α = .001), Grubbs' test, visual inspection of histograms and stem-and-leaf plots. | MODERATE Use Winsorization or trimmed means if outliers are verified. |
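Two of the diagnostics in the table — the Shapiro-Wilk normality test and the |z| > 3.29 outlier screen — can be run directly with scipy. A minimal sketch, assuming a hypothetical simulated sample `scores` (n = 25):

```python
# Hedged sketch of two assumption checks from the table above, using
# scipy; `scores` is a hypothetical simulated sample (n = 25).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=52.0, scale=8.0, size=25)

# Normality: Shapiro-Wilk, recommended for n < 50
w_stat, p_sw = stats.shapiro(scores)

# Outlier screen: standardized scores beyond |z| > 3.29 (alpha = .001)
z = (scores - scores.mean()) / scores.std(ddof=1)
outliers = np.abs(z) > 3.29

print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {p_sw:.3f}")
print(f"Observations flagged as outliers: {int(outliers.sum())}")
```

A non-significant Shapiro-Wilk p-value (p > .05) is consistent with, though it does not prove, approximate normality; graphical checks (Q-Q plots) should accompany it.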
The T-Test Formula & Its Decomposition
t = (x̄ − μ₀) / (s / √n)

- x̄: Observed sample mean
- μ₀: Hypothesized population mean (null value)
- s: Sample standard deviation (estimated from data)
- n: Sample size
- s/√n: Estimated standard error of the mean (SEM)
- df: Degrees of freedom, df = n − 1

s = √[ Σ(xᵢ − x̄)² / (n − 1) ]

- xᵢ: Individual observation
- x̄: Sample mean
- n − 1: Bessel's correction (unbiased denominator)
The denominator in computing the sample variance uses n − 1 rather than n — a correction known as Bessel's correction. Because the sample mean x̄ is used in computing s (and is therefore not independent of the deviations), one degree of freedom is consumed. Using n − 1 produces an unbiased estimator of the population variance σ².
Degrees of freedom (df = n − 1) also govern the shape of the t-distribution used to evaluate the t-statistic. With fewer df, the distribution has heavier tails, requiring a more extreme t-value to achieve the same α level. This reflects the greater uncertainty in σ estimation from small samples — a principled statistical conservatism.
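The unbiasedness claim behind Bessel's correction can be checked empirically. A small simulation sketch (the population parameters are illustrative, not from the text):

```python
# Small simulation sketch of Bessel's correction: dividing the sum of
# squared deviations by n underestimates sigma^2 on average, while
# dividing by n − 1 is approximately unbiased over many replications.
import random

random.seed(0)
true_var = 4.0              # hypothetical population variance (sigma = 2)
n, reps = 5, 20_000

biased_sum = unbiased_sum = 0.0
for _ in range(reps):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    biased_sum += ss / n          # divides by n
    unbiased_sum += ss / (n - 1)  # Bessel-corrected

print(f"mean of ss/n     = {biased_sum / reps:.3f}  (underestimates {true_var})")
print(f"mean of ss/(n-1) = {unbiased_sum / reps:.3f}  (close to {true_var})")
```

With n = 5, the n-denominator estimator converges to (n − 1)/n × σ² = 3.2 rather than 4.0, making the bias easy to see.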
The t-statistic, like the z-statistic, is a signal-to-noise ratio: the numerator (x̄ − μ₀) measures the raw discrepancy from the null value, while the denominator (s/√n) scales this discrepancy by the estimated variability in sample means.
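The decomposition above can be computed by hand and checked against scipy's built-in test. A minimal sketch; the data and μ₀ are hypothetical:

```python
# Minimal sketch: compute t by hand from the decomposition above and
# verify against scipy.stats.ttest_1samp; data and mu0 are hypothetical.
import math
from scipy import stats

scores = [48, 52, 55, 50, 49, 53, 51, 54, 47, 56]
mu0 = 50.0

n = len(scores)
xbar = sum(scores) / n                                         # sample mean
s = math.sqrt(sum((x - xbar) ** 2 for x in scores) / (n - 1))  # Bessel-corrected s
sem = s / math.sqrt(n)                                         # estimated SEM
t_manual = (xbar - mu0) / sem                                  # signal-to-noise ratio

t_scipy, p_two = stats.ttest_1samp(scores, popmean=mu0)
print(f"t = {t_manual:.3f} (scipy: {t_scipy:.3f}), df = {n - 1}, p = {p_two:.3f}")
```

The manual and library results agree to machine precision, confirming that `ttest_1samp` implements exactly this formula.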
Two-Tailed Test (Non-Directional)
H₀: μ = μ₀
H₁: μ ≠ μ₀
No directional prediction. The rejection region is split equally across both tails (α/2 per tail). This is the standard and recommended choice for most research designs without strong prior directional theory.
p = 2 × P(T(df) ≥ |t|)

One-Tailed Test (Directional)
H₀: μ = μ₀
H₁: μ > μ₀ or μ < μ₀
A directional prediction must be grounded in theory and registered prior to data collection. One-tailed tests increase power for the predicted direction but preclude detection of effects in the opposite direction.
p = P(T(df) ≥ t) for H₁: μ > μ₀, or p = P(T(df) ≤ t) for H₁: μ < μ₀
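These tail-probability formulas map directly onto scipy's survival function for the t-distribution. A sketch, with a hypothetical observed t and df:

```python
# Sketch of how the tail-probability formulas above map onto scipy's
# survival function; t_obs and df are hypothetical values.
from scipy import stats

t_obs, df = 2.10, 24

p_two = 2 * stats.t.sf(abs(t_obs), df)   # p = 2 × P(T(df) ≥ |t|)
p_upper = stats.t.sf(t_obs, df)          # H1: mu > mu0 → P(T(df) ≥ t)
p_lower = stats.t.cdf(t_obs, df)         # H1: mu < mu0 → P(T(df) ≤ t)

print(f"two-tailed p = {p_two:.4f}, upper-tail p = {p_upper:.4f}, "
      f"lower-tail p = {p_lower:.4f}")
```

Note that for a positive t_obs the two-tailed p is exactly twice the upper-tail p, which is why a "significant" one-tailed result can correspond to a non-significant two-tailed one.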
The Student's t-Distribution
The Student's t-distribution is a family of symmetric, bell-shaped probability distributions parameterized by degrees of freedom (df = n − 1). Unlike the standard normal distribution, the t-distribution has heavier tails — a mathematical consequence of the additional uncertainty introduced by estimating σ from the sample.
As df increases (i.e., as n grows), the t-distribution approaches the standard normal distribution asymptotically. For practical purposes, the difference becomes negligible beyond approximately df = 120, which explains why some textbooks use z-critical values for large samples even when σ is unknown — though the t-distribution remains technically more appropriate.
| df | Critical t (α = .05, two-tailed) | Critical t (α = .01, two-tailed) | Comparable z (α = .05, two-tailed) |
|---|---|---|---|
| 5 | 2.571 | 4.032 | ±1.960 |
| 10 | 2.228 | 3.169 | ±1.960 |
| 20 | 2.086 | 2.845 | ±1.960 |
| 30 | 2.042 | 2.750 | ±1.960 |
| 60 | 2.000 | 2.660 | ±1.960 |
| 120 | 1.980 | 2.617 | ±1.960 |
| ∞ | 1.960 | 2.576 | ±1.960 |
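The critical values in the table can be reproduced with the inverse CDF (percent-point function) of scipy's t and normal distributions, which also makes the convergence toward z visible:

```python
# Reproduce the critical-value table above via scipy's inverse CDF;
# the z value is the df → ∞ limit of the t critical values.
from scipy import stats

for df in (5, 10, 20, 30, 60, 120):
    t_05 = stats.t.ppf(1 - 0.05 / 2, df)   # alpha = .05, two-tailed
    t_01 = stats.t.ppf(1 - 0.01 / 2, df)   # alpha = .01, two-tailed
    print(f"df = {df:>3}: t*(.05) = {t_05:.3f}, t*(.01) = {t_01:.3f}")

z_05 = stats.norm.ppf(1 - 0.05 / 2)        # ±1.960
print(f"z*(.05) = {z_05:.3f}")
```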
Cohen's d and Practical Significance
For the single sample t-test, Cohen's d is computed using the sample standard deviation s rather than the population σ (which is unknown). This version is sometimes called the sample-based or estimated Cohen's d:
d = |x̄ − μ₀| / s

- |x̄ − μ₀|: Absolute mean difference
- s: Sample standard deviation (Bessel-corrected)
Some researchers prefer Hedges' g as a small-sample correction to Cohen's d, computed as d × J(df), where J(df) is a correction factor that accounts for positive bias in d at small sample sizes. For n ≥ 20, the difference is negligible; for n < 20, Hedges' g is the more defensible choice.
By Cohen's conventional benchmarks:
- d ≈ 0.2 (small): Subtle effect, detectable only with adequately powered large samples.
- d ≈ 0.5 (medium): Moderately visible effect; meaningful in most applied research contexts.
- d ≈ 0.8 (large): Substantial, practically important effect, often visible without statistical analysis.
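Both effect sizes are straightforward to compute; a sketch using the common approximation J(df) ≈ 1 − 3/(4·df − 1) for the Hedges correction factor (the data and μ₀ are hypothetical):

```python
# Sketch of sample-based Cohen's d and Hedges' g, using the common
# approximation J(df) ≈ 1 − 3/(4·df − 1); data and mu0 are hypothetical.
import statistics

scores = [12.1, 10.4, 13.2, 11.8, 9.9, 12.7, 11.0, 10.6]
mu0 = 10.0

xbar = statistics.fmean(scores)
s = statistics.stdev(scores)      # Bessel-corrected sample SD
d = abs(xbar - mu0) / s           # estimated Cohen's d

df = len(scores) - 1
J = 1 - 3 / (4 * df - 1)          # small-sample correction factor
g = d * J                         # Hedges' g

print(f"d = {d:.3f}, g = {g:.3f} (J = {J:.4f})")
```

With n = 8 the correction is noticeable (J ≈ 0.89), illustrating why g is preferred below n = 20.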
Confidence Intervals for the T-Test
The confidence interval for the single sample t-test is constructed using the t-critical value (not z*), reflecting the estimation of σ from sample data:

CI(1−α) = x̄ ± t*(df, α/2) × (s / √n)

- t*(df, α/2): Critical t-value for df = n − 1 and chosen α
- s/√n: Estimated standard error of the mean
Because the t-critical value is always ≥ the z-critical value (converging only at df → ∞), t-based confidence intervals are always wider than z-based intervals for the same data. This widening accurately reflects the greater uncertainty from estimating σ — a principled form of statistical honesty.
The confidence interval and the hypothesis test are mathematically dual: if and only if μ₀ falls outside the (1−α) CI does the t-test reject H₀ at that α level. This duality makes CI reporting not merely supplementary but informationally equivalent to (and arguably superior to) binary significance decisions.
APA 7th edition mandates reporting confidence intervals for all primary inferential results. Researchers should specify both the confidence level and whether the interval was computed using t-critical or z-critical values.
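The CI construction and the test-CI duality can be demonstrated in a few lines; the data and μ₀ below are hypothetical:

```python
# Sketch of the t-based CI and its duality with the two-tailed test:
# H0 is rejected at alpha exactly when mu0 falls outside the (1 − alpha)
# CI. The data and mu0 are hypothetical.
import math
from scipy import stats

scores = [48, 52, 55, 50, 49, 53, 51, 54, 47, 56]
mu0, alpha = 50.0, 0.05

n = len(scores)
xbar = sum(scores) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in scores) / (n - 1))
sem = s / math.sqrt(n)

t_star = stats.t.ppf(1 - alpha / 2, n - 1)        # t*(df, alpha/2)
ci = (xbar - t_star * sem, xbar + t_star * sem)

_, p = stats.ttest_1samp(scores, popmean=mu0)
reject = p < alpha
outside = not (ci[0] <= mu0 <= ci[1])
print(f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]; reject H0: {bool(reject)}; "
      f"mu0 outside CI: {outside}")
```

Here μ₀ lies inside the interval and H₀ is retained; the two decisions always coincide for a two-tailed test at the same α.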
Interpreting & Reporting Results
A complete report of a single sample t-test includes the t-statistic, degrees of freedom, exact p-value, effect size, and confidence interval. Recommended APA format:

t(df) = X.XX, p = .XXX, d = X.XX, 95% CI [LL, UL]

Note the degrees of freedom in parentheses after t. Never report p = .000; use p < .001 instead. Report exact p-values to three decimal places when p ≥ .001.
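These reporting rules can be encoded in a small helper; `apa_report` is a hypothetical function name, and the data are illustrative:

```python
# Hypothetical helper (`apa_report`) that renders a one-sample t-test
# result in APA style: df in parentheses, no leading zero on p, and
# "p < .001" instead of "p = .000". The data are illustrative.
import math
from scipy import stats

def apa_report(scores, mu0):
    n = len(scores)
    xbar = sum(scores) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in scores) / (n - 1))
    d = abs(xbar - mu0) / s                       # sample-based Cohen's d
    t, p = stats.ttest_1samp(scores, popmean=mu0)
    # APA: drop the leading zero and never report p = .000
    p_str = "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("= 0.", "= .")
    return f"t({n - 1}) = {t:.2f}, {p_str}, d = {d:.2f}"

print(apa_report([48, 52, 55, 50, 49, 53, 51, 54, 47, 56], mu0=50.0))
```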
Common Reporting Mistakes
- Using σ notation when reporting the sample SD — always use s or SD.
- Equating a non-significant result with H₀ being true — retain, never "accept" H₀.
- Using z-critical values instead of t-critical values when constructing CIs from sample data.
- Omitting degrees of freedom from the t-test reporting.
- Reporting only the p-value without effect size (Cohen's d) or confidence interval.
- Adopting one-tailed tests retrospectively to convert a non-significant result to significant.
- Interpreting a large Cohen's d as clinically significant without domain-specific benchmarking.