Stratified Random Sampling: A Doctoral-Level Reference

Section 01 — Foundational Theory

Epistemological & Theoretical Foundations

Stratified Random Sampling (StRS) is a probability sampling design in which the population is first partitioned into non-overlapping, exhaustive subgroups, called strata, and an independent probability sample is then drawn from each stratum. The combined sample typically yields more precise estimates than simple random sampling (SRS) of the same size whenever stratum means differ and within-stratum units are relatively homogeneous.

Formal Definition

Stratified random sampling is a method of selecting a probability sample from a population of N units by first partitioning the population into H mutually exclusive and collectively exhaustive strata of sizes N₁, N₂, …, N_H (where ΣN_h = N), and then independently drawing a probability sample of size n_h from each stratum h (where Σn_h = n), using any legitimate within-stratum design — most commonly, simple random sampling without replacement.

— Cochran, W.G. (1977). Sampling Techniques (3rd ed.). John Wiley & Sons, pp. 89–91; Kish, L. (1965). Survey Sampling. John Wiley & Sons, p. 75.

Illustration: Population N = 400 partitioned into H = 4 strata — independent SRS drawn within each

Each coloured segment represents one stratum. The width is proportional to stratum size N_h. Independent SRS is applied within each stratum, not across the combined population. This independence is what makes exact variance estimation tractable — a critical advantage over systematic sampling.

Historical Development

The theoretical foundations of stratified sampling were established by Jerzy Neyman in his landmark 1934 paper "On the Two Different Aspects of the Representative Method," published in the Journal of the Royal Statistical Society. Neyman's paper did two things of enduring importance: it placed stratified sampling within a rigorous probabilistic framework, and it derived the allocation formula that minimises the variance of the stratified estimator for a fixed sample size — what is now universally called Neyman optimal allocation. Before Neyman's contribution, the dominant approach was A.L. Bowley's proportional allocation (1926), which had sound practical credentials but no proof of optimality.

William G. Cochran's (1977) Sampling Techniques provided the definitive treatment of stratified sampling for the survey research era, establishing the comparison framework between SRS, stratified SRS, and systematic designs that remains standard in doctoral methodology coursework. Leslie Kish (1965) extended the framework to complex multi-stage designs and developed the design effect (DEFF) measure that quantifies the efficiency gain (or loss) relative to SRS, permitting researchers to communicate the practical value of stratification in terms directly interpretable by applied users.

The Core Mechanic: Why Stratification Reduces Variance

The fundamental insight is that the total variance in any population can be decomposed into two components: variance between strata (differences among stratum means) and variance within strata (differences among units sharing the same stratum). SRS averages across both. Stratified sampling eliminates the between-strata component from the sampling variance entirely, because the allocation across strata is fixed by design rather than left to chance. Only the within-stratum variance contributes to the sampling variance of the stratified estimator.

This is not merely a theoretical result — it is the practical argument for stratification. If you know that hospitals in your study vary dramatically by size (small, medium, large) and that your outcome variable (say, patient readmission rate) differs substantially across size categories, then a purely random SRS could, by chance, over-represent large hospitals. Stratified sampling guarantees that small, medium, and large hospitals are each represented in exactly the proportion intended, and the variance of the estimator reflects only the uncertainty about within-stratum variation — not the certainty about between-stratum differences.
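
The decomposition can be checked numerically. The following is a minimal sketch on a made-up population of hospitals grouped by size (the values and the names `population`, `total_ss`, `within_ss`, and `between_ss` are illustrative assumptions, not data from any study). It confirms that the total sum of squares about the overall mean splits into a within-stratum part and a between-stratum part.

```python
from statistics import fmean

# Hypothetical readmission rates (%) for hospitals grouped by size.
population = {
    "small": [18.0, 20.0, 19.0, 21.0],
    "medium": [14.0, 15.0, 13.0, 16.0, 15.0],
    "large": [9.0, 10.0, 11.0],
}

all_values = [y for ys in population.values() for y in ys]
grand_mean = fmean(all_values)

# Total sum of squares about the overall mean.
total_ss = sum((y - grand_mean) ** 2 for y in all_values)

# Within-stratum part: deviations from each stratum's own mean.
within_ss = sum(
    sum((y - fmean(ys)) ** 2 for y in ys) for ys in population.values()
)

# Between-stratum part: stratum means about the overall mean, weighted by N_h.
between_ss = sum(
    len(ys) * (fmean(ys) - grand_mean) ** 2 for ys in population.values()
)

print(f"total            = {total_ss:.3f}")
print(f"within + between = {within_ss + between_ss:.3f}")
print(f"share between strata = {between_ss / total_ss:.1%}")
```

In this toy population most of the variation lies between strata, which is the situation in which stratification pays off most.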

Why Stratified Sampling Is Used

Reason 01

Variance Reduction

When strata means differ substantially and within-stratum units are relatively homogeneous, stratification reduces the sampling variance below what SRS can achieve for the same total sample size n. The efficiency gain is proportional to the between-strata variance — the more heterogeneous the strata means, the larger the gain.

Reason 02

Domain (Subgroup) Estimation

When reliable estimates are needed for specific population subgroups — gender, region, age band, institution type — stratification guarantees an adequate sample within each subgroup. With SRS, small subgroups may receive too few observations for precise estimation. Disproportionate stratified sampling resolves this by oversampling small but important strata.

Reason 03

Administrative Convenience

When a population is naturally organised into administrative units — schools within districts, wards within hospitals, branches within a company — it is often operationally efficient to treat each administrative unit as a stratum and sample independently within each. This matches the data collection infrastructure to the sampling design, reducing logistical error.

Reason 04

Exact Variance Estimation

Unlike systematic sampling, stratified sampling permits an exact, design-based unbiased estimator of the sampling variance. Because n_h ≥ 2 units are drawn independently within each stratum, within-stratum sample variances s_h² are computable and directly usable in the variance formula — no approximation or assumption about the frame ordering is required.

Reason 05

Protection Against Imbalanced Samples

SRS cannot guarantee that a rare but theoretically important subgroup will be adequately represented in the sample. If 3% of a population are recent immigrants and your research question concerns their experience, an SRS of n = 200 yields an expected 6 cases. The chance of zero cases is small (about 0.2% under a binomial approximation), but the chance of three or fewer is roughly 15%, and even six cases are far too few for subgroup analysis. Stratification guarantees a pre-specified minimum n_h.
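
The size of this risk can be quantified with a binomial approximation, which is reasonable when the population is large relative to n. A minimal sketch; the helper `binom_cdf` is introduced here purely for illustration.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p = 200, 0.03           # SRS size and subgroup share from the example above
expected = n * p           # expected number of subgroup members in the sample
p_zero = binom_cdf(0, n, p)
p_three_or_fewer = binom_cdf(3, n, p)

print(f"expected cases      = {expected:.0f}")
print(f"P(no cases)         = {p_zero:.4f}")
print(f"P(3 or fewer cases) = {p_three_or_fewer:.3f}")
```

Whatever the exact probabilities, the subgroup count under SRS is left to chance, whereas a stratified design fixes it in advance.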

Reason 06

Cost Efficiency via Optimal Allocation

When data collection costs vary across strata — e.g., conducting face-to-face interviews in rural areas costs more per unit than in urban areas — Neyman-extended cost-optimal allocation directs more of the sample budget toward cheaper strata and less toward expensive ones, minimising total cost for a fixed target precision, or minimising variance for a fixed total cost (Cochran, 1977, pp. 96–98).

The Stratification Variable: The Most Consequential Design Decision

The efficiency gain from stratification depends entirely on the choice of stratification variable(s). The ideal stratification variable is one that (1) is known for every population element before sampling, (2) is strongly correlated with the outcome variable of interest, and (3) can be used to form groups with high within-stratum homogeneity. Stratifying on a variable uncorrelated with the outcome produces no variance reduction below SRS — it merely adds administrative complexity. Stratifying on a variable highly correlated with the outcome (e.g., stratifying a study of school achievement by prior year test score band) can yield very large efficiency gains. The rule of thumb is: a good stratification variable is one you would want to control for in the analysis anyway (Cochran, 1977, pp. 127–131; Kish, 1965, pp. 90–95).

Section 02 — Mathematical Theory

Estimators, Allocation Methods & Variance Theory

Like SRS, and unlike systematic sampling, stratified sampling has both an unbiased point estimator and an exact, design-based unbiased variance estimator. This mathematical tractability, combined with the variance reduction from stratification, makes it a preferred design for precision-critical research.

1. Notation and Setup

Population and Sample Notation

H strata; N_h = size of stratum h; N = ΣN_h
W_h = N_h / N (stratum weight)
n_h = sample size in stratum h; n = Σn_h
f_h = n_h / N_h (within-stratum sampling fraction)
S_h² = (1/(N_h−1)) · Σᵢ(yᵢₕ − Ȳ_h)² (stratum population variance)
s_h² = (1/(n_h−1)) · Σᵢ(yᵢₕ − ȳ_h)² (stratum sample variance)

Ȳ_h = true stratum mean (population parameter) · ȳ_h = stratum sample mean (estimated)
All summations over h run from 1 to H unless otherwise noted.
The key convention: W_h = N_h/N are the population stratum weights, used for proper weighting in estimation.

2. The Stratified Estimator of the Population Mean

Stratified Mean Estimator — Unbiased

ȳ_st = Σ_h W_h · ȳ_h = Σ_h (N_h/N) · ȳ_h

This is a weighted average of the stratum sample means, where the weight of each stratum equals its share of the total population.
E(ȳ_st) = Ȳ — the estimator is unbiased regardless of the allocation method used, provided within-stratum sampling is itself unbiased (e.g., SRS within each stratum).
Proof: E(ȳ_st) = Σ W_h · E(ȳ_h) = Σ W_h · Ȳ_h = (1/N)·Σ N_h·Ȳ_h = (1/N)·ΣΣ yᵢₕ = Ȳ. ✓

3. True Variance of the Stratified Estimator

Exact Design-Based Variance — Cochran (1977), §5.3

V(ȳ_st) = Σ_h W_h² · (1 − f_h) · S_h² / n_h

W_h² = (N_h/N)² is the squared stratum weight; (1 − f_h) is the finite population correction (FPC) for stratum h; S_h²/n_h is the stratum sampling variance.
This formula is exact — not an approximation — when SRS without replacement is used within each stratum.
When f_h = n_h/N_h is small (as it usually is in large populations), the FPC ≈ 1 and V(ȳ_st) ≈ Σ W_h² · S_h²/n_h.
Critical advantage over systematic sampling: This variance can be estimated unbiasedly from the data. Substitute s_h² for S_h² — no approximation required.

4. Unbiased Variance Estimator (from the sample)

Estimated Variance — Exact, Unbiased

v̂(ȳ_st) = Σ_h W_h² · (1 − f_h) · s_h² / n_h

s_h² = within-stratum sample variance = Σᵢ(yᵢₕ − ȳ_h)² / (n_h − 1)
Requires: n_h ≥ 2 in every stratum (at least two observations per stratum are needed to compute s_h²). This is the minimum requirement for variance estimation — single-unit strata are "collapsed" with adjacent strata before estimation.
E[v̂(ȳ_st)] = V(ȳ_st) — exactly unbiased. No assumptions about the population structure are required.
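
The point estimator and its variance estimator translate directly into code. The following is a minimal sketch, assuming SRS without replacement within each stratum; the function name `stratified_estimate` and the sample data are hypothetical.

```python
from statistics import fmean, variance

def stratified_estimate(samples, stratum_sizes):
    """Return (y_st, v_hat) for SRS without replacement within each stratum.

    samples: dict mapping stratum label -> list of sampled y values (n_h >= 2)
    stratum_sizes: dict mapping stratum label -> N_h
    """
    N = sum(stratum_sizes.values())
    y_st = 0.0
    v_hat = 0.0
    for h, ys in samples.items():
        n_h = len(ys)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs n_h >= 2 to estimate s_h^2")
        N_h = stratum_sizes[h]
        W_h = N_h / N
        f_h = n_h / N_h
        y_st += W_h * fmean(ys)
        # statistics.variance uses the n_h - 1 divisor, i.e. s_h^2.
        v_hat += W_h**2 * (1 - f_h) * variance(ys) / n_h
    return y_st, v_hat

# Hypothetical data: three strata with known sizes.
stratum_sizes = {"A": 500, "B": 300, "C": 200}
samples = {
    "A": [12.0, 15.0, 11.0, 14.0],
    "B": [22.0, 27.0, 25.0],
    "C": [40.0, 31.0, 37.0],
}
y_st, v_hat = stratified_estimate(samples, stratum_sizes)
print(f"y_st = {y_st:.3f}, SE = {v_hat ** 0.5:.3f}")
```

The explicit check on n_h mirrors the requirement stated above: a stratum with a single observation is rejected rather than silently contributing zero variance.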

5. Allocation Methods

The allocation of the total sample size n across the H strata is, after the choice of stratification variable, the next most consequential design decision in stratified sampling. Four allocation strategies are in common use, each with a different optimality criterion:

Equal Allocation

n_h = n / H for all h = 1, 2, …, H

Each stratum receives the same number of observations regardless of stratum size or variance.
Use case: When all H strata are of equal scientific interest, each stratum's estimate is needed at equal precision, and stratum sizes are similar.
Drawback: Inefficient when strata differ substantially in size — large strata are undersampled, small strata are oversampled. Requires post-hoc weighting to produce an unbiased population estimate.
Never EPSEM unless all N_h are equal.

Proportional Allocation — Bowley (1926)

n_h = n · (N_h / N) = n · W_h

The within-stratum sampling fraction is the same in every stratum: f_h = n_h/N_h = n/N = f for all h.
This produces an EPSEM design — every element has inclusion probability πᵢ = n/N = f regardless of stratum membership. The sample is self-weighting (no post-hoc weights needed).
V(ȳ_st,prop) = (1/n) · Σ W_h S_h² − (1/N) · Σ W_h S_h²
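Approximately ≤ V(ȳ_SRS): when terms of order 1/N_h are negligible, proportional allocation performs no worse than SRS (Cochran, 1977, pp. 104–105). With small strata whose means barely differ, it can be marginally less precise.
Gain over SRS: V(ȳ_SRS) − V(ȳ_st,prop) ≈ ((1 − f)/n) · Σ W_h(Ȳ_h − Ȳ)² ≥ 0, which is (1/n) · Σ W_h(Ȳ_h − Ȳ)² when the FPC is ignored.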
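
The comparison with SRS can be verified on a small finite population. This is a minimal sketch with a made-up population (all values and variable names are illustrative assumptions). It computes the exact variances under SRS and under proportional allocation, then compares the exact gain with the between-strata approximation that ignores terms of order 1/N_h.

```python
from statistics import fmean, variance

# Hypothetical finite population: y values listed by stratum.
population = {
    "h1": [float(v) for v in range(10, 60)],    # N_1 = 50
    "h2": [float(v) for v in range(40, 70)],    # N_2 = 30
    "h3": [float(v) for v in range(80, 100)],   # N_3 = 20
}
n = 20
all_y = [y for ys in population.values() for y in ys]
N = len(all_y)
f = n / N
Y_bar = fmean(all_y)

# SRS: V = (1 - f) * S^2 / n, with S^2 using the N - 1 divisor.
v_srs = (1 - f) * variance(all_y) / n

# Proportional allocation: V = ((1 - f) / n) * sum_h W_h * S_h^2.
v_prop = (1 - f) / n * sum(
    len(ys) / N * variance(ys) for ys in population.values()
)

# Approximate gain, ignoring terms of order 1/N_h.
approx_gain = (1 - f) / n * sum(
    len(ys) / N * (fmean(ys) - Y_bar) ** 2 for ys in population.values()
)

print(f"V_SRS = {v_srs:.3f}, V_prop = {v_prop:.3f}")
print(f"exact gain = {v_srs - v_prop:.3f}, approximate gain = {approx_gain:.3f}")
```

The two gain figures agree closely here because the strata are reasonably large and their means differ a great deal.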

Neyman Optimal Allocation — Neyman (1934)

n_h = n · (N_h S_h) / Σ_j (N_j S_j) = n · (W_h S_h) / Σ_j (W_j S_j)

This minimises V(ȳ_st) for fixed total sample size n — it is the theoretically optimal allocation when all within-stratum data collection costs are equal.
Larger strata (larger N_h) and more variable strata (larger S_h) receive proportionally larger samples.
V(ȳ_st,opt) = (1/n) · [Σ W_h S_h]² − (1/N) · Σ W_h S_h² ≤ V(ȳ_st,prop); in turn, V(ȳ_st,prop) ≤ V(ȳ_SRS) whenever terms of order 1/N_h are negligible.
Gain of Neyman over proportional: V(ȳ_st,prop) − V(ȳ_st,opt) = (1/n) · Σ W_h(S_h − S̄_W)² where S̄_W = Σ W_h S_h
This gain is large when stratum variances S_h differ substantially — the practical case for Neyman allocation.
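
A minimal sketch of Neyman allocation follows, together with a check that the closed-form gain over proportional allocation matches the difference of the two variances. The function names and the design inputs are hypothetical, and allocations are left as real numbers rather than rounded.

```python
def neyman_allocation(n, N_h, S_h):
    """Real-valued Neyman allocation: n_h proportional to N_h * S_h."""
    products = [N * S for N, S in zip(N_h, S_h)]
    total = sum(products)
    return [n * p / total for p in products]

def stratified_variance(n_h, N_h, S_h):
    """V(y_st) = sum_h W_h^2 * (1 - f_h) * S_h^2 / n_h."""
    N = sum(N_h)
    return sum(
        (Nh / N) ** 2 * (1 - nh / Nh) * Sh**2 / nh
        for nh, Nh, Sh in zip(n_h, N_h, S_h)
    )

# Hypothetical design inputs (S_h would come from a pilot or a prior survey).
N_h = [4000, 1000]
S_h = [5.0, 25.0]
n = 100

prop = [n * Nh / sum(N_h) for Nh in N_h]
opt = neyman_allocation(n, N_h, S_h)
v_prop = stratified_variance(prop, N_h, S_h)
v_opt = stratified_variance(opt, N_h, S_h)

# Closed-form gain from the text: (1/n) * sum_h W_h * (S_h - S_bar_W)^2.
W_h = [Nh / sum(N_h) for Nh in N_h]
S_bar_W = sum(W * S for W, S in zip(W_h, S_h))
gain = sum(W * (S - S_bar_W) ** 2 for W, S in zip(W_h, S_h)) / n

print("Neyman n_h:", [round(x, 1) for x in opt])
print(f"V_prop = {v_prop:.4f}, V_opt = {v_opt:.4f}, gain = {gain:.4f}")
```

In practice the S_h are estimates, the n_h must be rounded to integers, and any n_h that exceeds N_h has to be capped, so the realised gain is usually somewhat smaller than the formula suggests.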

Cost-Optimal Allocation — Cochran (1977), §5.5

n_h = n · (N_h S_h / √c_h) / Σ_j (N_j S_j / √c_j)

c_h = per-unit data collection cost in stratum h
This minimises total survey cost for a fixed target variance, or equivalently minimises variance for a fixed total budget C = c₀ + Σ c_h n_h (where c₀ is a fixed overhead cost).
Logic: Expensive strata receive fewer units (high c_h → low n_h); cheap strata receive more. Also assigns more units to high-variance strata (high S_h → high n_h).
Reduces to Neyman allocation when all c_h are equal. Reduces to proportional allocation when all c_h and S_h are equal.
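
For the fixed-budget case, the total n is not chosen in advance but follows from spending the variable budget C − c₀ exactly. A minimal sketch under that assumption; the function name `cost_optimal_allocation` and all inputs are hypothetical.

```python
from math import sqrt

def cost_optimal_allocation(budget, c0, N_h, S_h, c_h):
    """Allocation minimising V(y_st) subject to c0 + sum_h c_h * n_h = budget.

    n_h is proportional to N_h * S_h / sqrt(c_h); the total n follows from
    spending the variable budget (budget - c0) exactly.
    """
    shares = [N * S / sqrt(c) for N, S, c in zip(N_h, S_h, c_h)]
    # Cost of one "unit" of the share vector: sum_h c_h * share_h.
    cost_per_unit = sum(c * s for c, s in zip(c_h, shares))
    scale = (budget - c0) / cost_per_unit
    return [scale * s for s in shares]

# Hypothetical inputs: rural interviews (stratum 2) cost four times urban ones.
N_h = [6000, 4000]
S_h = [10.0, 10.0]
c_h = [1.0, 4.0]
n_h = cost_optimal_allocation(budget=1200.0, c0=200.0, N_h=N_h, S_h=S_h, c_h=c_h)
spent = 200.0 + sum(c * x for c, x in zip(c_h, n_h))

print("n_h:", [round(x, 1) for x in n_h])
print("total cost:", round(spent, 2))
```

With equal S_h, the only departures from proportional allocation come from cost: quadrupling the unit cost halves a stratum's share, because cost enters through its square root.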

6. The Variance Inequality Chain

Efficiency Ordering — Cochran (1977), §5.6

V(ȳ_st,opt) ≤ V(ȳ_st,prop) ≤ V(ȳ_SRS) ≤ V(ȳ_st,equal) [last: not always]

The first inequality holds exactly when the true S_h are used in the allocation. The second holds when terms of order 1/N_h are negligible, which is the usual case; with very small strata and nearly identical stratum means, proportional allocation can be marginally worse than SRS.
Equal allocation may be worse than SRS when stratum sizes differ greatly: large strata are undersampled relative to their weight, so their imprecise means dominate the W_h²-weighted variance.
The gain from stratification is zero when all stratum means are identical (Ȳ_h = Ȳ for all h) — stratification on an unrelated variable adds complexity without reducing variance.

Visualising Allocation Methods: H = 3 Strata (N₁=500, N₂=300, N₃=200; S₁=10, S₂=20, S₃=30; c₁=1, c₂=2, c₃=4)

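Bar lengths represent n_h for n = 120 total. Proportional allocation gives 60, 36, and 24. Neyman allocation gives about 35, 42, and 42: because N₂S₂ = N₃S₃ = 6000, Strata 2 and 3 receive equal samples, so Stratum 3 (smallest in size but highest in variance) gets well above its proportional share. Cost-optimal allocation (about 49, 42, and 29) then adjusts Stratum 3 downward because it is also the most expensive to sample.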
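
The bar lengths can be reproduced from the three allocation formulas. A minimal sketch using the stated configuration; the helper `allocate` is introduced here for illustration, and allocations are left unrounded with n held at 120 in all three cases.

```python
from math import sqrt

N_h = [500, 300, 200]
S_h = [10, 20, 30]
c_h = [1, 2, 4]
n = 120

def allocate(n, scores):
    """Split n in proportion to the given per-stratum scores."""
    total = sum(scores)
    return [n * s / total for s in scores]

proportional = allocate(n, N_h)
neyman = allocate(n, [N * S for N, S in zip(N_h, S_h)])
cost_optimal = allocate(n, [N * S / sqrt(c) for N, S, c in zip(N_h, S_h, c_h)])

for name, alloc in [("proportional", proportional),
                    ("Neyman", neyman),
                    ("cost-optimal", cost_optimal)]:
    print(f"{name:>13}: " + ", ".join(f"{x:5.1f}" for x in alloc))
```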

7. Number and Boundaries of Strata

In general, increasing the number of strata H reduces variance, but with rapidly diminishing returns. Cochran (1977, pp. 127–130) demonstrates that with a uniformly distributed population and proportional allocation, most of the variance reduction achievable through stratification is captured with H = 4 to 6 strata. Beyond H = 6, additional strata produce minimal further variance reduction while substantially increasing design and operational complexity. The widely cited cum √f rule (Dalenius and Hodges, 1959) gives approximately optimal stratum boundaries when the population distribution of the stratification variable is approximately known: boundaries are placed so that the cumulative square root of the frequency distribution is divided into H equal parts.
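
The cum √f rule can be sketched as follows. This is an illustrative implementation only: the function name, the equal-width histogram with 50 bins, and the simulated right-skewed variable are all assumptions, and the result depends on the binning chosen. It assumes the values are not all identical.

```python
from math import sqrt
import random

def cum_sqrt_f_boundaries(values, H, bins=50):
    """Approximate Dalenius-Hodges boundaries from an equal-width histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        idx = min(int((x - lo) / width), bins - 1)  # put x == hi in the last bin
        counts[idx] += 1

    # Cumulative sum of sqrt(frequency) across bins.
    cum = []
    running = 0.0
    for c in counts:
        running += sqrt(c)
        cum.append(running)

    # Cut where the cumulative value first reaches k/H of its total.
    boundaries = []
    for k in range(1, H):
        target = k * cum[-1] / H
        idx = next(i for i, v in enumerate(cum) if v >= target)
        boundaries.append(lo + (idx + 1) * width)  # upper edge of that bin
    return boundaries

# Hypothetical right-skewed stratification variable (e.g. firm size).
rng = random.Random(42)
x = [rng.lognormvariate(3.0, 0.8) for _ in range(5000)]
cuts = cum_sqrt_f_boundaries(x, H=4)
print([round(c, 1) for c in cuts])
```

For skewed variables the resulting strata are narrow where the data are dense and wide in the tail, which is the behaviour the rule is designed to produce.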

8. Post-Stratification

When stratum membership cannot be determined before sampling (for example, when the age group or region of a respondent is unknown until the survey interview), the researcher can apply post-stratification: draw an SRS of size n from the full population and, after data collection, weight each observation by the expansion weight N_h/n_h, where n_h is the realised sample count in stratum h. Relative to the uniform SRS weight N/n, this is an adjustment factor of W_h / (n_h/n). The post-stratified estimator is approximately unbiased, with variance approximately equal to the stratified estimator variance plus a small additional component due to random variation in the realised n_h values (Cochran, 1977, pp. 134–135). Post-stratification is widely used in political polling and large-scale social surveys to align achieved sample distributions with known population benchmarks (census-based weights).
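
A minimal sketch of the post-stratified mean, assuming the population counts N_h are known from an external benchmark; the function name, stratum labels, and data are hypothetical.

```python
from statistics import fmean

def poststratified_mean(sample, stratum_sizes):
    """Post-stratified mean: sum_h W_h * (mean of sampled units found in stratum h).

    sample: list of (stratum_label, y) pairs from one SRS of the whole population
    stratum_sizes: dict of known population counts N_h (e.g. census benchmarks)
    """
    N = sum(stratum_sizes.values())
    by_stratum = {}
    for h, y in sample:
        by_stratum.setdefault(h, []).append(y)
    missing = set(stratum_sizes) - set(by_stratum)
    if missing:
        raise ValueError(f"no sampled units in strata: {sorted(missing)}")
    return sum(stratum_sizes[h] / N * fmean(ys) for h, ys in by_stratum.items())

# Hypothetical SRS in which the "young" group is under-represented.
stratum_sizes = {"young": 4000, "old": 6000}
sample = [("young", 30.0), ("young", 34.0),
          ("old", 50.0), ("old", 54.0), ("old", 52.0), ("old", 48.0),
          ("old", 51.0), ("old", 45.0)]
raw_mean = fmean(y for _, y in sample)
ps_mean = poststratified_mean(sample, stratum_sizes)
print(f"unweighted mean = {raw_mean:.2f}, post-stratified mean = {ps_mean:.2f}")
```

The sketch raises an error when a stratum has no sampled units, since the estimator is undefined in that case and strata would need to be combined first.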

Section 03 — Interactive Learning Tool

Stratified Sampling Simulator

Configure the number of strata, their sizes, variances, and the total sample size. Observe how three allocation methods distribute n_h across strata, watch the within-stratum sampling execute, and compare estimated variances in real time.

[Interactive simulator: controls for the number of strata (H), total sample size (n), and allocation method; a per-stratum table showing N_h, W_h, S_h, allocated n_h, f_h = n_h/N_h, sample ȳ_h, and the contribution to v̂; a summary panel showing total N, total n, H, ȳ_st, exact SE(ȳ_st), and the gain V_SRS / V_st; and a histogram of the sampling distribution of ȳ_st across simulation runs.]

What the Simulator Demonstrates

Draw Sample: Displays each stratum's population as dots; selected units are highlighted in the stratum colour. Each stratum's n_h is independently drawn. The allocation table shows the exact n_h under the chosen method, the achieved ȳ_h, and each stratum's contribution to v̂(ȳ_st).

Allocation Method toggle: Switching between Proportional, Neyman, and Equal allocation updates n_h instantly. Notice that Neyman allocation concentrates observations in high-variance strata, potentially yielding a dramatically smaller SE(ȳ_st) than proportional allocation when stratum variances differ.

Run Simulation: Executes multiple independent stratified samples and plots the sampling distribution of ȳ_st. The spread of this distribution is the empirical standard error — compare it with the theoretical SE(ȳ_st) shown in the stats panel. Convergence between the two confirms the exactness of the design-based variance formula.
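
The same comparison can be run offline. The following is a minimal sketch, not the simulator's own code: it builds a made-up population of three strata, computes the exact design-based variance from the population values, and compares it with the empirical variance of ȳ_st over repeated stratified draws. The population parameters, the seed, and the number of runs are arbitrary assumptions.

```python
import random
from statistics import fmean, pvariance, variance

rng = random.Random(2024)

# Hypothetical population: three strata with different means and spreads.
population = [
    [rng.gauss(50, 5) for _ in range(500)],
    [rng.gauss(70, 10) for _ in range(300)],
    [rng.gauss(100, 20) for _ in range(200)],
]
n_h = [12, 7, 5]          # rounded proportional allocation of n = 24
N = sum(len(stratum) for stratum in population)

# Exact design-based variance: sum_h W_h^2 * (1 - f_h) * S_h^2 / n_h.
v_exact = sum(
    (len(s) / N) ** 2 * (1 - k / len(s)) * variance(s) / k
    for s, k in zip(population, n_h)
)

def draw_y_st():
    """One stratified sample: independent SRS without replacement per stratum."""
    return sum(len(s) / N * fmean(rng.sample(s, k)) for s, k in zip(population, n_h))

estimates = [draw_y_st() for _ in range(20000)]
v_empirical = pvariance(estimates)

print(f"theoretical SE = {v_exact ** 0.5:.3f}")
print(f"empirical SE   = {v_empirical ** 0.5:.3f}")
```

The two standard errors should agree up to simulation noise, and the agreement tightens as the number of runs grows.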

Section 04 — Critical Evaluation

Assumptions, Conditions & Limitations

Stratified sampling carries a specific and consequential set of assumptions. Five of these — exhaustive partitioning, known stratum sizes, adequate within-stratum sample sizes, the relevance of the stratification variable, and the independence of within-stratum sampling — require explicit justification and documentation in doctoral research.

Formal Assumptions

Assumption	Technical Requirement	Violation Consequence	Diagnostic / Remedy
Exhaustive, Non-Overlapping Strata	Every element of the population belongs to exactly one stratum: the strata are pairwise disjoint, their union is the whole population, and ΣN_h = N	Elements counted in multiple strata produce biased estimates; elements in no stratum are excluded (coverage error)	Conduct a pre-sampling frame audit; resolve dual membership by assignment rules pre-specified in the protocol; track excluded elements as coverage error
Known Stratum Sizes N_h	N_h must be known for all h before sampling to compute weights W_h = N_h/N and allocation n_h	Unknown N_h requires estimation; estimated weights introduce bias in ȳ_st proportional to the estimation error in W_h	Use administrative records, census data, or pilot enumeration to determine N_h; document the data source and vintage of the N_h estimates
n_h ≥ 2 in Every Stratum	Minimum two sampled units per stratum are required to compute the within-stratum sample variance s_h²	Single-unit strata leave s_h² undefined, so the variance estimator cannot be computed; treating s_h² as 0 instead severely underestimates the variance	Merge "thin" strata with adjacent strata (collapsed stratum method) before analysis; set minimum n_h = 2 as a hard constraint in allocation
Independent Sampling Across Strata	The sample drawn in stratum h must be statistically independent of the sample in stratum j ≠ h	Correlated selections (e.g., shared field worker who systematically selects similar units across strata) invalidate the variance formula and understate true uncertainty	Assign separate and independent randomisation procedures to each stratum; use audited PRNG with different seeds per stratum; document randomisation in the protocol
Stratification Variable Known Pre-Sampling	The variable used to assign units to strata must be known for every population element before the sample is drawn	If stratum membership is unknown until after contact, true stratified sampling is impossible; only post-stratification is feasible — with its additional variance component	Use administrative records, prior survey data, or observable proxies for stratum assignment; document the basis and date of stratum membership determination
SRS Within Strata (or Known Design)	The within-stratum design must be a legitimate probability design with known inclusion probabilities	Non-probability within-stratum selection (convenience, voluntary) invalidates design-based inference for the entire stratified estimator	Use randomised within-stratum selection; document the randomisation procedure; compute and report within-stratum inclusion probabilities πᵢ|h

Core Limitations

Stratified sampling requires a substantially richer sampling frame than SRS or systematic sampling. Not only must every population element be listed, but stratum membership must be recorded for each element before sampling commences. In many research contexts, this is operationally demanding: hospital databases may not categorise patients by the clinical variable of interest; school registers may not record the socioeconomic indicators needed to stratify by deprivation band; company employee lists may lack the departmental or seniority classifications required for domain analysis.

When stratum membership cannot be determined from the frame, researchers must either (a) conduct a preliminary screening survey to assign stratum membership — adding cost and time — or (b) fall back on post-stratification, which provides approximately equivalent statistical properties but adds an additional variance component and requires that stratum membership can be determined after contact (Cochran, 1977, pp. 134–135).

Doctoral researchers must therefore document not only the stratification variable chosen, but the source and quality of the stratum membership data, including its currency (how recently was the membership information updated?) and its accuracy (what is the error rate in stratum classification?).

When the allocation is disproportionate (either because Neyman or equal allocation is used, or because small strata were deliberately oversampled for domain estimation), the raw sample data cannot be directly averaged to produce an unbiased estimate of the population mean. Differential inclusion probabilities mean that units from oversampled strata are over-represented in the raw sample; each unit must instead be weighted by the inverse of its inclusion probability, 1/πᵢ, which within stratum h equals N_h/n_h = 1/f_h.

This weighting is theoretically straightforward but creates practical complications: (1) survey software packages that do not support complex survey design specifications will compute incorrect standard errors because they treat the data as an SRS; (2) subgroup analyses must account for the sampling weights, or estimates will be biased toward oversampled strata; (3) the design effect (DEFF = V(ȳ_st) / V(ȳ_SRS at same n)) may be greater than 1 when disproportionate allocation is applied to outcomes weakly correlated with the stratification variable, so that the design is less efficient than SRS for those outcomes (Kish, 1965, pp. 94–95; Groves et al., 2009, pp. 101–104).

Doctoral researchers must specify all sampling weights in their data documentation (codebook), confirm that analysis software correctly applies the design weights, and report analyses with and without weights as a sensitivity check if the weighting effect is substantial.
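
The effect of the design weights can be seen in a small example. A minimal sketch with made-up data for a design that oversamples a small stratum (stratum names and values are hypothetical): weighting each unit by N_h/n_h reproduces the stratified estimator, while the unweighted mean is pulled toward the oversampled stratum.

```python
from statistics import fmean

# Hypothetical disproportionate design: the small stratum is oversampled.
stratum_sizes = {"large_firms": 100, "small_firms": 900}
samples = {
    "large_firms": [80.0, 90.0, 100.0, 110.0, 120.0],   # n_h = 5, f_h = 0.05
    "small_firms": [10.0, 12.0, 8.0, 9.0, 11.0],        # n_h = 5, f_h about 0.0056
}

# Design weight for every sampled unit in stratum h: 1 / pi_i = N_h / n_h.
weights = {h: stratum_sizes[h] / len(ys) for h, ys in samples.items()}

weighted_sum = sum(weights[h] * y for h, ys in samples.items() for y in ys)
weight_total = sum(weights[h] * len(ys) for h, ys in samples.items())
weighted_mean = weighted_sum / weight_total

unweighted_mean = fmean(y for ys in samples.values() for y in ys)

N = sum(stratum_sizes.values())
y_st = sum(stratum_sizes[h] / N * fmean(ys) for h, ys in samples.items())

print(f"unweighted = {unweighted_mean:.1f}, "
      f"weighted = {weighted_mean:.1f}, y_st = {y_st:.1f}")
```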

When a stratum contains only a single sampled unit (n_h = 1), the within-stratum sample variance s_h² is undefined — one cannot estimate variance from a single observation. This scenario, which Wolter (2007) calls the "lonely PSU" problem in cluster sampling, has the equivalent "lonely stratum unit" form in stratified sampling. It arises most commonly when strata are very small (N_h is small) and the allocation assigns n_h = 1 to save sample budget.

The standard remedy is the collapsed stratum estimator: pair adjacent strata (e.g., stratum 3 and stratum 4 are merged for variance estimation purposes, though the means are still estimated separately). This produces a conservative (upwardly biased) variance estimate for the merged pair. The degree of conservatism is bounded and predictable, making the collapsed stratum estimator methodologically preferable to any ad hoc variance imputation. The procedure, its rationale, and the stratum pairs involved must be documented in the analysis report (Cochran, 1977, pp. 136–138; Wolter, 2007, pp. 158–162).

Prevention is preferable to remedy: in the allocation phase, enforce a minimum n_h ≥ 2 across all strata. This constraint slightly reduces efficiency relative to unconstrained Neyman allocation but ensures variance estimability for every stratum. In large-scale government surveys, n_h ≥ 2 is a mandatory design requirement (Groves et al., 2009).
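
The collapsed stratum estimator described above can be sketched in its simplest form, in which single-unit strata are paired and the estimated stratum totals within each pair are contrasted. This sketch assumes pairs of roughly equal size and ignores finite population corrections; more refined versions exist, and the function name and data are hypothetical.

```python
def collapsed_stratum_variance(pairs, N):
    """Collapsed-stratum variance estimate for y_st with one unit per stratum.

    pairs: list of ((N_a, y_a), (N_b, y_b)) for each collapsed pair of strata
    N: total population size
    Finite population corrections are ignored in this sketch.
    """
    # With two strata per group, the estimator for the total reduces to
    # (N_a * y_a - N_b * y_b)^2 per pair; dividing by N^2 gives the mean.
    return sum(
        (N_a * y_a - N_b * y_b) ** 2 for (N_a, y_a), (N_b, y_b) in pairs
    ) / N**2

# Hypothetical design: four strata of similar size, one sampled unit each.
pairs = [((250, 12.0), (250, 14.0)), ((260, 30.0), (240, 33.0))]
N = 1000
y_st = sum(N_h * y for pair in pairs for N_h, y in pair) / N
v_cs = collapsed_stratum_variance(pairs, N)
print(f"y_st = {y_st:.3f}, collapsed-stratum SE = {v_cs ** 0.5:.3f}")
```

Because differences between the true stratum means within a pair inflate the contrast, the estimate tends to overstate the variance, which is the conservatism noted above. Pairing strata that are expected to be similar keeps that overstatement small.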

If the stratification variable has low or zero correlation with the primary outcome variable, stratified sampling with proportional allocation provides no variance reduction over SRS, and with disproportionate allocation it may actively increase variance. The gain from stratification, as shown in the large-stratum approximation V(ȳ_SRS) − V(ȳ_st,prop) ≈ (1/n)·Σ W_h(Ȳ_h − Ȳ)², is zero if and only if all stratum means Ȳ_h are equal to the population mean Ȳ. If the stratification variable is genuinely uncorrelated with the outcome, this condition holds approximately, and stratification produces a design that is no more efficient than SRS while being more complex to implement and analyse.

In multi-outcome surveys — the norm in social science research — this is a ubiquitous practical dilemma: the stratification variable may be strongly correlated with one outcome (justifying stratification for that outcome) but weakly correlated with another (providing no efficiency benefit for that outcome). The researcher must prioritise the primary outcome for stratification purposes and acknowledge that estimates of secondary outcomes may not benefit from the stratification efficiency gain (Kish, 1965, pp. 90–95; Cochran, 1977, pp. 122–127).

Non-response within strata is more complex in stratified designs than in SRS, because it may differ systematically across strata. If response rates vary by stratum — a nearly universal finding in stratified surveys — the effective achieved sample in each stratum diverges from the planned n_h. If within-stratum non-response is Missing at Random (MAR) conditional on stratum membership, the stratified estimator remains approximately unbiased. If non-response is Missing Not at Random (MNAR) — i.e., the probability of non-response depends on the outcome variable even within stratum — the estimator will be biased regardless of how well the strata were constructed.

The recommended approach: compute within-stratum response rates; apply non-response weighting adjustment factors r_h = n_h,total/n_h,respondents within each stratum; document these adjustments in the analysis report; conduct sensitivity analyses to assess the potential magnitude of MNAR bias. The AAPOR (2016) response rate formulas must be calculated and reported separately for each stratum in a transparent survey methodology report (Little & Rubin, 2002; Groves et al., 2009, pp. 208–213).

Section 05 — Comparative Analysis

Stratified Sampling vs. Other Probability Designs

Stratified random sampling is often the most statistically efficient probability design when strata can be meaningfully defined and the stratification variable is substantially correlated with the outcome. Understanding exactly where it excels and where it is outperformed is essential for methodologically justified design selection.

Criterion	Stratified RS	SRS	Systematic RS	Cluster Sampling
Frame Requirements	Complete frame with stratum membership for all elements	Complete list; no auxiliary info required	Ordered list; no strata required	Only cluster list required; no element list needed
Statistical Efficiency	Highest: eliminates between-strata variance from SE	Baseline benchmark	Better than SRS (ordered frame); worse (periodic)	Lowest: DEFF > 1 due to within-cluster homogeneity
Variance Estimation	Exact, unbiased (requires n_h ≥ 2 per stratum)	Exact, unbiased	Approximate only (fundamental theoretical limitation)	Complex; requires DEFF or linearisation
Domain Estimation	Excellent: guaranteed n_h per subgroup by design	Unreliable for small subgroups	Limited: random start may miss small subgroups	Feasible if clusters align with domains
EPSEM Property	Only under proportional allocation; not under Neyman or equal allocation	Always EPSEM	Always EPSEM (random or non-periodic frame)	With PPS selection; otherwise not EPSEM
Operational Simplicity	Moderate: requires stratum membership, allocation calculation, separate randomisation per stratum	Moderate: requires list and random number generation	High: single random start; fieldwork simple	High in field: enumerate and sample within clusters only
Continuous Populations	Not feasible: stratum membership must be known pre-sampling	Impossible: no frame exists	Ideal for real-time sequential populations	Possible with pre-defined time clusters
Best Used When	Population has known, heterogeneous subgroups; domain estimates needed; variance reduction is a priority; complete frame with stratum info available	Homogeneous population; complete frame; no subgroup requirements; simplest possible design needed	Ordered frame available; sequential/continuous population; no periodic frame structure; operational simplicity required	No complete element frame; geographically dispersed; cost constraints; multi-stage design necessary
Foundational Reference	Neyman (1934); Cochran (1977) Chs. 5–6; Kish (1965) Ch. 3	Cochran (1977) Ch. 2	Madow & Madow (1944); Cochran (1977) Ch. 8	Kish (1965) Ch. 5; Hansen, Hurwitz & Madow (1953)

The Design Effect (DEFF) of Stratified Sampling

The design effect of any complex design relative to SRS is defined as DEFF = V(ȳ_complex) / V(ȳ_SRS,same n). For stratified sampling with proportional allocation, DEFF_st,prop = V(ȳ_st,prop) / V(ȳ_SRS) ≤ 1 (ignoring terms of order 1/N_h), with equality when all stratum means are identical. The effective sample size is therefore n_eff = n / DEFF_st ≥ n: a stratified sample of size n is statistically equivalent to an SRS of size n/DEFF_st. Under Neyman allocation, DEFF is no larger for the variable whose S_h drove the allocation. This is the fundamental argument for stratification: you get "more than n" in statistical efficiency terms (Kish, 1965, pp. 258–260).
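
As a worked illustration, the design effect and effective sample size follow directly from the two variances. A minimal sketch with made-up variance values; the function names are hypothetical.

```python
def design_effect(v_design, v_srs):
    """DEFF = variance under the actual design / variance under SRS of the same n."""
    return v_design / v_srs

def effective_sample_size(n, deff):
    """Size of the SRS that would give the same variance as the design."""
    return n / deff

# Hypothetical variances for a sample of n = 400.
n = 400
v_srs = 2.50
v_st_prop = 1.75
deff = design_effect(v_st_prop, v_srs)
n_eff = effective_sample_size(n, deff)
print(f"DEFF = {deff:.2f}, effective n = {n_eff:.0f}")
```

Here a DEFF of 0.70 means the stratified sample of 400 carries about as much information as an SRS of roughly 571.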

When Stratified Sampling Is the Optimal Choice

Condition 01

Substantive Heterogeneity Among Subgroups

When the population contains meaningfully different subgroups — by institution type, geographic region, demographic category, industry sector, or any variable known to correlate with the primary outcome — stratification is the statistically principled design. The larger the difference between stratum means (Ȳ_h − Ȳ), the larger the variance reduction, and the stronger the justification for stratification over SRS.

Condition 02

Mandatory Subgroup Estimates

When the research design requires separate reliable estimates for each subgroup — by regulatory requirement, multi-site mandate, or stakeholder accountability — stratified sampling is the only probability design that guarantees a minimum n_h for each subgroup by design. SRS cannot provide this guarantee: a rare subgroup comprising 5% of the population will yield an expected 0.05n observations, with substantial variance around this expectation.

Condition 03

Known, Stable Stratum Membership

Stratification is most effective — and its theoretical guarantees hold most cleanly — when stratum membership is known from reliable administrative records, is stable over the study period, and is verifiable for every element in the frame. Administrative databases in healthcare, education, and government contexts often provide exactly this, making them ideal settings for stratified sampling designs.

Condition 04

Cost Differentials Across Subgroups

When data collection costs vary substantially across subgroups — e.g., rural versus urban respondents, inpatient versus outpatient clinical records, international versus domestic firms — cost-optimal allocation makes stratified sampling uniquely able to minimise total survey cost for a target precision. No other common probability design incorporates cost differentials directly into the allocation formula.

Section 06 — Procedural Guide

Implementation Protocol for Doctoral Research

Rigorous implementation of stratified random sampling requires explicit documentation of every methodologically consequential decision: the stratification variable, the source and accuracy of stratum size data, the allocation method and its justification, and the within-stratum randomisation procedure. The following seven-step protocol meets the reporting standards of APA 7th Edition, STROBE, and CONSORT-equivalent guidelines.

Define Population, Frame & Strata

Write inclusion/exclusion criteria. Obtain the frame. Identify the stratification variable and confirm it is recorded for every element. Document frame source, date, and N_h per stratum.

Audit Frame for Errors

Check for duplicate records, elements belonging to multiple strata, and elements missing stratum classification. Resolve ambiguities using pre-specified rules before any sampling occurs.

Determine n and Allocation

Compute total n using Cochran's formula. Select allocation method: proportional (EPSEM), Neyman (minimum variance), or cost-optimal. Enforce n_h ≥ 2 in all strata. Document choice with justification.

Assign Sequential IDs Within Strata

Number elements 1 to N_h within each stratum. The numbering is the within-stratum frame. The ordering within each stratum does not affect the stratified estimator (SRS within each stratum is used).

Draw Independent SRS Within Each Stratum

For each stratum h, use a validated PRNG with a documented seed to select n_h units without replacement from the N_h elements. Record seeds, software version, and the full list of selected IDs per stratum.

Contact, Collect & Handle Non-Response

Apply the pre-specified non-response protocol. Record response and refusal outcomes per stratum using AAPOR disposition codes. Compute within-stratum response rates. Apply non-response weighting adjustments if warranted.

Estimate & Report

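Compute ȳ_st = ΣW_h ȳ_h. Compute v̂(ȳ_st) = ΣW_h²(1−f_h)s_h²/n_h. Report the allocation, the sampling weights, within-stratum response rates, and the standard error from this stratified variance estimator; do not substitute the SRS standard error as a proxy.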
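The selection and estimation steps of the protocol above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed implementation: the frame, stratum labels, seed, and outcome values are hypothetical, a single seeded generator is used for all strata (per-stratum seeds would work equally well), and the function names are introduced here for the example.

```python
import random
from statistics import mean, variance


def draw_stratified_srs(frame, n_h, seed):
    """Draw an independent SRS without replacement in each stratum.

    frame: dict mapping stratum label -> list of unit IDs (1..N_h)
    n_h:   dict mapping stratum label -> within-stratum sample size
    """
    rng = random.Random(seed)  # documented seed, so the draw is reproducible
    # Strata are visited in sorted order so the same seed gives the same sample.
    return {h: sorted(rng.sample(frame[h], n_h[h])) for h in sorted(frame)}


def stratified_estimate(y_sample, N_h):
    """Return (ybar_st, v_hat) from observed values grouped by stratum."""
    N = sum(N_h.values())
    ybar_st, v_hat = 0.0, 0.0
    for h, ys in y_sample.items():
        n = len(ys)
        if n < 2:
            raise ValueError(f"stratum {h!r} needs n_h >= 2 to estimate s_h^2")
        W = N_h[h] / N   # stratum weight W_h
        f = n / N_h[h]   # sampling fraction f_h
        ybar_st += W * mean(ys)
        v_hat += W**2 * (1 - f) * variance(ys) / n  # variance() uses n - 1
    return ybar_st, v_hat


# Hypothetical frame: sequential IDs 1..N_h within each stratum
frame = {"rural": list(range(1, 21)), "urban": list(range(1, 31))}
selected = draw_stratified_srs(frame, {"rural": 2, "urban": 3}, seed=20240101)

# Hypothetical observed outcomes for the selected units
y_sample = {"urban": [1.0, 2.0, 3.0], "rural": [10.0, 14.0]}
N_h = {"urban": 30, "rural": 20}
ybar_st, v_hat = stratified_estimate(y_sample, N_h)
se = v_hat ** 0.5
print(selected, ybar_st, v_hat, se)
```

Recording the seed, the software version, and the returned ID lists is what makes the draw auditable. The estimator refuses strata with fewer than two observations rather than silently setting s_h² to zero.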

Allocation Method Selection Guide

Research Condition	Recommended Allocation	Key Justification	Weighting Required?
No strong prior knowledge of S_h; equal strata importance	Proportional (Bowley)	EPSEM; self-weighting; V(ȳ_st) ≤ V(ȳ_SRS) to the usual large-N_h approximation; simple to communicate	No: self-weighting
S_h known or estimable from pilot; single primary outcome	Neyman Optimal	Minimises V(ȳ_st) for fixed n; substantial efficiency gain when S_h vary	Yes: base weights N_h/n_h (= 1/f_h)
Per-unit costs differ across strata; budget constraint binding	Cost-Optimal	Minimises cost for target precision; practically important in multi-site and international studies	Yes: base weights N_h/n_h (= 1/f_h)
Guaranteed minimum precision per domain required	Equal or minimum-n_h constrained	Ensures each domain estimate meets precision target regardless of stratum size	Yes: disproportionate allocation
S_h completely unknown; pilot data unavailable	Proportional (safe default)	Essentially never worse than SRS; requires no prior variance estimates; robust to misspecification	No: self-weighting

Computing S_h for Neyman Allocation

The S_h Estimation Challenge in Practice

Neyman allocation requires S_h, the within-stratum population standard deviation, before the sample is drawn. In practice, S_h is rarely known exactly and must usually be estimated. Four approaches are in common use:

(1) Pilot survey: A small preliminary sample (n_h,pilot ≈ 20–30 per stratum) drawn before the main survey provides S_h estimates. Most reliable, but adds time and cost.

(2) Prior survey data: Within-stratum variances from a previous survey of the same or a similar population. Valid when the population is stable.

(3) Administrative records: When the outcome variable (or a proxy) is available for all N_h elements, S_h for that variable can be computed exactly. Increasingly feasible with linked administrative datasets.

(4) Range estimation: For bounded, roughly bell-shaped variables, S_h ≈ Range_h/4. Crude, but sometimes sufficient for allocation planning.

Cochran (1977, pp. 105–106) notes that moderately inaccurate S_h estimates still yield near-optimal allocations: the loss of efficiency from imprecise S_h estimates is generally small when the ratio of the largest to the smallest S_h is less than 3:1.
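A minimal sketch of Neyman allocation from estimated S_h follows, here using the range/4 heuristic as the source of the estimates. The stratum sizes and ranges are hypothetical, the function names are assumptions, and largest-remainder rounding is one reasonable choice among several rather than a prescribed rule.

```python
from math import floor


def sd_from_range(low, high):
    """Crude planning value: S_h ~ range / 4."""
    return (high - low) / 4


def neyman_allocation(N_h, S_h, n, min_n=2):
    """Integer Neyman allocation, n_h proportional to N_h * S_h.

    Uses largest-remainder rounding so the unclamped n_h sum to n, then
    clamps each n_h to [min_n, N_h]. If clamping binds, the total can
    differ from n and should be re-checked by the analyst.
    """
    products = [N * S for N, S in zip(N_h, S_h)]
    total = sum(products)
    raw = [n * p / total for p in products]
    alloc = [floor(r) for r in raw]
    leftover = n - sum(alloc)
    # Give the remaining units to the strata with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return [min(N, max(min_n, a)) for N, a in zip(N_h, alloc)]


# Hypothetical strata with outcome ranges taken from prior knowledge
N_h = [500, 300, 200]
S_h = [sd_from_range(0, 40), sd_from_range(0, 80), sd_from_range(0, 120)]
n_h = neyman_allocation(N_h, S_h, n=170)
print(S_h, n_h)
```

In this example the smallest stratum is sampled at the highest rate because its estimated standard deviation is three times that of the largest stratum.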

Reporting Requirements for Stratified Sampling in Peer-Reviewed Research

(a) Stratification variable: Identify the variable(s) used to define strata; justify its relevance as a correlate of the primary outcome variable; document the source, date, and accuracy of the stratum membership data.

(b) Stratum sizes: Report N, H, and all N_h values; document the administrative source from which N_h were obtained; note whether N_h were exact or estimated, and if estimated, describe the estimation method.

(c) Allocation method: Name the allocation method (proportional, Neyman, equal, or cost-optimal); provide the computed n_h for each stratum; justify the allocation choice in terms of the research objectives (precision per domain? minimum variance? cost efficiency?).

(d) Within-stratum randomisation: Specify the software package and version used; report the random seed(s) used for within-stratum selection; confirm SRS without replacement was applied within each stratum.

(e) Sampling weights: Report the base weights N_h/n_h (= 1/f_h) and every subsequent weight adjustment, including non-response adjustments and any raking or calibration weights applied. Provide the weight variable in the dataset with documentation in the codebook.

(f) Variance estimation: Explicitly state that the stratified variance formula v̂(ȳ_st) = ΣW_h²(1−f_h)s_h²/n_h was used; name the survey analysis software (e.g., R survey package, Stata svyset, SAS PROC SURVEYMEANS, SPSS Complex Samples). Do not report standard errors computed assuming SRS — this is a prevalent and consequential error in applied research.

(g) Non-response by stratum: Report within-stratum response rates per AAPOR standards; document the non-response protocol; describe any non-response weighting applied and the assumptions underpinning it.
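As a small check on the weighting requirement in item (e), the sketch below builds base weights as N_h/n_h and confirms that the weighted mean of the unit-level data reproduces ȳ_st = ΣW_h ȳ_h. The records and stratum sizes are hypothetical, and the function names are assumptions for illustration; non-response and calibration adjustments are not shown.

```python
from collections import Counter


def base_weights(N_h, n_h):
    """Base (design) weight per stratum: N_h / n_h = 1 / f_h."""
    return {h: N_h[h] / n_h[h] for h in N_h}


def weighted_mean(records, weights):
    """Weighted mean of (stratum, y) records using per-stratum weights."""
    num = sum(weights[h] * y for h, y in records)
    den = sum(weights[h] for h, _ in records)
    return num / den


# Hypothetical respondent-level records: (stratum label, outcome)
records = [("a", 1.0), ("a", 2.0), ("a", 3.0), ("b", 10.0), ("b", 14.0)]
N_h = {"a": 30, "b": 40}
n_h = dict(Counter(h for h, _ in records))  # realised n_h per stratum

w = base_weights(N_h, n_h)
ybar_w = weighted_mean(records, w)

# Direct stratified estimator for comparison: sum of W_h * ybar_h
N = sum(N_h.values())
ybar_st = (N_h["a"] / N) * 2.0 + (N_h["b"] / N) * 12.0
print(w, ybar_w, ybar_st)
```

The base weights sum to N over the sample, which is a quick sanity check worth including in the codebook.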

Survey Software Commands for Stratified Design Specification

Software	Design Specification Syntax	Notes
R (survey package)	svydesign(ids=~1, strata=~stratum_var, weights=~wt, fpc=~N_h, data=df)	Lumley (2010); svymean(), svytotal() for estimates
Stata	svyset [pweight=wt], strata(stratum_var) fpc(N_h)	Then: svy: mean outcome_var; svy: proportion categorical_var
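SAS	PROC SURVEYMEANS DATA=df TOTAL=stratum_totals; STRATA stratum_var; WEIGHT wt; VAR outcome_var; RUN;	TOTAL= names a dataset holding the stratum variable and a _TOTAL_ column of N_h; outputs stratified estimates, SE, and CL automatically
SPSS Complex Samples	CSPLAN … STRATA stratum_var / INCLPROB wt_var.	CSSELECT draws the sample from the plan; CSDESCRIPTIVES or CSTABULATE produce the analysis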
Python (samplics)	TaylorEstimator(param="mean").estimate(y=y, samp_weight=wt, stratum=stratum_var)	samplics library; supports Taylor linearisation for variance estimation; argument names can differ across samplics versions, so check the installed release

Section 07 — Knowledge Assessment

Doctoral-Level Self-Assessment

These questions require application of theoretical and mathematical concepts, not rote recall. Questions are calibrated to doctoral comprehensive examination standard and emphasise the properties of stratified sampling that distinguish it from SRS, systematic sampling, and cluster designs.

Self-Assessment Quiz — Stratified Random Sampling

Select the best answer for each item.

Question 01 of 06

A researcher stratifies a population of N = 1,200 into three strata: N₁ = 600, N₂ = 360, N₃ = 240. She selects n = 120 using proportional allocation. What are n₁, n₂, n₃, and what is the inclusion probability πᵢ for every element in the population?

A. n₁ = 40, n₂ = 40, n₃ = 40; πᵢ = 1/3 for all elements.

B. n₁ = 60, n₂ = 36, n₃ = 24; πᵢ = 120/1200 = 0.10 for every element.

C. n₁ = 20, n₂ = 33, n₃ = 50; πᵢ differs by stratum.

D. Proportional allocation cannot produce integer n_h values with these stratum sizes; rounding introduces non-EPSEM deviations.

Question 02 of 06

A researcher computes Neyman optimal allocation for a study with H = 2 strata. Stratum 1: N₁ = 800, S₁ = 5. Stratum 2: N₂ = 200, S₂ = 25. Total n = 100. What is the Neyman allocation, and what does the result reveal about the relationship between stratum size and stratum variance in driving allocation?

A. n₁ = 80, n₂ = 20: proportional to stratum size only.

B. n₁ ≈ 44, n₂ ≈ 56: although Stratum 2 is four times smaller, its much higher variability (S₂ = 25) gives it the larger allocation under Neyman optimisation.

C. n₁ = 50, n₂ = 50: Neyman allocation defaults to equal allocation when strata sizes differ dramatically.

D. n₁ = 0, n₂ = 100: Neyman allocation assigns all observations to the highest-variance stratum.

Question 03 of 06

A thesis committee member argues that stratified sampling "always produces a smaller standard error than SRS" and therefore should always be preferred. A doctoral candidate challenges this claim. Who is correct, and under what precise condition is the claim false?

A. The committee member is correct: stratified sampling with any allocation method always produces SE ≤ SE(SRS).

B. The doctoral candidate is correct. Under proportional allocation the efficiency gain is approximately (1/n)·Σ W_h(Ȳ_h − Ȳ)² (ignoring the finite population correction), which is zero when stratum means are identical. Disproportionate allocation can produce higher variance than SRS for outcomes unrelated to the stratification variable.

C. Both are partially correct: stratification always helps for large samples (n > 100) but may hurt for small samples.

D. The doctoral candidate is correct only if H is very large: with many fine strata, the design becomes equivalent to SRS and precision is lost.

Question 04 of 06

A researcher reports a survey result with a standard error calculated using the formula SE = √(s²/n), as if the data were from an SRS, despite the fact that the data were collected using stratified sampling with disproportionate allocation and sampling weights ranging from 1.2 to 4.8. What is the nature and likely direction of this error?

A. This is a minor procedural error with no systematic directional bias: the SRS SE may be slightly too high or slightly too low depending on chance.

B. This is a substantial error. The SRS formula ignores the stratum weights and the within-stratum variance structure. With disproportionate allocation and weights of 1.2–4.8, the unweighted s² is a biased estimate of the population variance. The SE is therefore misestimated; when the disproportionate allocation serves domain oversampling rather than variance minimisation, the unequal weights inflate the true variance of the overall estimate, so the SRS formula typically understates the SE.

C. This produces an overestimated SE: stratified sampling always produces smaller variance than SRS, so the SRS formula is always conservative.

D. The SE itself is correct; the error only affects the point estimate ȳ_st, which should have been weighted before reporting.

Question 05 of 06

A researcher conducting a national study of school performance stratifies schools by ownership type (public, private non-profit, private for-profit) and geographic region (urban, rural) — yielding H = 6 strata. After data collection, she finds that one stratum (rural private for-profit) has only n_h = 1 unit. How should she address the variance estimation problem, and what does this situation imply for the study design?

A. Set s_h² = 0 for the single-unit stratum: one unit cannot contribute to variance, so this stratum is excluded from the SE calculation.

B. Apply the collapsed stratum method: merge the single-unit stratum with a substantively similar adjacent stratum and compute variance from the pooled units. This produces a conservative estimate. The design implication is that n_h ≥ 2 should have been enforced as a hard constraint in the allocation phase.

C. Remove the stratum from the analysis entirely and re-weight the remaining strata to sum to 1: the single-unit stratum is too small to matter for the overall estimate.

D. Select one additional unit from the rural private for-profit stratum as a nearest-neighbour substitute and treat n_h = 2 as if it had been planned from the start.

Question 06 of 06

A researcher uses post-stratification on a large SRS to adjust her achieved sample to match known population proportions by age group. She argues that post-stratification produces estimates with exactly the same variance as a pre-planned stratified sample of the same total size. Is this claim correct?

A. The claim is incorrect. Post-stratification adds a variance component due to random variation in the realised cell counts n_h, which are fixed by design in pre-planned stratification. The post-stratified estimator has slightly larger variance, though the difference is small for large samples.

B. The claim is correct: post-stratification and pre-planned stratification have identical variance when total n is the same.

C. The claim is moot: post-stratification on an SRS converts the design to a non-probability sample and should not be used.

D. The claim is incorrect: post-stratification always produces larger variance than both SRS and pre-planned stratification because the weights introduce additional variability.

Section 08 — Scholarly References

Primary Scholarly References

All content in this reference is grounded in peer-reviewed foundational literature in survey sampling methodology. References are formatted per APA 7th Edition.

Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4), 558–625. [The foundational derivation of optimal (Neyman) allocation; establishes the probabilistic framework for stratified sampling and proves optimality over proportional allocation when stratum variances differ.]
Cochran, W. G. (1977). Sampling techniques (3rd ed.). John Wiley & Sons. [Chapters 5 and 5A provide the definitive doctoral-level treatment of stratified sampling theory, all allocation methods, variance formulas, the gain from stratification, post-stratification, and comparison with SRS and systematic designs.]
Kish, L. (1965). Survey sampling. John Wiley & Sons. [Chapter 3 covers stratified sampling design, EPSEM conditions, design effects, and the efficiency comparison framework between stratified and other probability designs.]
Bowley, A. L. (1926). Measurements of the precision attained in sampling. Bulletin de l'Institut International de Statistique, 22, 1–62. [The originating paper for proportional allocation — establishes the equivalence between the stratified sample fraction and the population fraction within each stratum, laying the groundwork for Neyman's later optimal allocation proof.]
Lohr, S. L. (2010). Sampling: Design and analysis (2nd ed.). Brooks/Cole. [Chapters 3–4 provide a rigorous and accessible treatment of stratified sampling, post-stratification, collapsing strata, and software implementation of complex stratified designs in R.]
Wolter, K. M. (2007). Introduction to variance estimation (2nd ed.). Springer. [Chapters 3 and 6 cover variance estimation in stratified designs, the collapsed stratum estimator for sparse strata, jackknife and balanced repeated replication methods as alternatives to Taylor linearisation.]
Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2009). Survey methodology (2nd ed.). John Wiley & Sons. [Total survey error framework applied to stratified designs; non-response within strata; frame coverage error and stratum membership misclassification; post-stratification calibration weighting.]
Hansen, M. H., Hurwitz, W. N., & Madow, W. G. (1953). Sample survey methods and theory (Vols. 1–2). John Wiley & Sons. [Comprehensive design-based inference framework; mathematical proofs of the variance inequality chain for equal, proportional, and optimal allocation in the finite-population context.]
Dalenius, T., & Hodges, J. L. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54(285), 88–101. [Establishes the cumulative square root frequency rule for optimal stratum boundary determination when the population distribution of the stratification variable is known or estimable.]
Särndal, C.-E., Swensson, B., & Wretman, J. (1992). Model assisted survey sampling. Springer. [Sections 3.5–3.7 provide advanced model-assisted theory for stratified designs; the GREG estimator under stratification; calibration estimators as extensions of post-stratification.]
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). John Wiley & Sons. [MCAR/MAR/MNAR taxonomy applied to within-stratum non-response; multiple imputation and maximum likelihood methods for non-response adjustment in stratified samples.]
Lumley, T. (2010). Complex surveys: A guide to analysis using R. John Wiley & Sons. [Practical implementation of stratified sampling designs using R's survey package; svydesign(), svymean(), svytotal(), and domain estimation with strata specification; variance estimation via Taylor linearisation.]
American Association for Public Opinion Research (AAPOR). (2016). Standard definitions: Final dispositions of case codes and outcome rates for surveys (9th ed.). AAPOR. [Mandatory reference for within-stratum response rate computation and reporting; non-response classification codes applicable to stratified survey designs.]
