Simple Random Sampling: A Doctoral-Level Reference

Section 01 — Foundational Theory

Epistemological & Theoretical Foundations

Simple Random Sampling (SRS) constitutes the purest form of probability sampling, wherein every element in the sampling frame possesses an equal, non-zero, and calculable probability of selection.

Formal Definition

Simple Random Sampling is a probabilistic selection method in which a sample of n units is drawn from a finite population of N units such that every possible sample of size n has an equal probability of selection, specifically 1/C(N,n), where C(N,n) is the number of possible combinations.

— Cochran, W.G. (1977). Sampling Techniques (3rd ed.). John Wiley & Sons, p. 18.

Historical Lineage

The formal mathematical treatment of SRS emerges from the foundational work of Jerzy Neyman (1934), whose seminal paper "On the Two Different Aspects of the Representative Method" established the probabilistic basis of modern sampling theory. Neyman distinguished between purposive (quota) sampling and random sampling, demonstrating that only the latter permits legitimate inferential statements about population parameters. This epistemological distinction—between descriptive adequacy and inferential validity—remains central to doctoral-level research methodology today.

W.G. Cochran's (1977) comprehensive formalization and F. Yates's earlier contributions in agricultural experimentation further cemented SRS within the classical frequentist paradigm. The method's design-based inferential logic—where randomness resides in the selection mechanism rather than in any assumed probability model for the outcome variable—distinguishes it from model-based approaches (Royall, 1970).

Two Varieties: With vs. Without Replacement

SRS-WOR · Without Replacement

Simple Random Sampling Without Replacement

Each element may be selected at most once. This is the standard form in social, behavioural, and health sciences research. Each draw reduces the available pool, so successive draws are not independent; the sample mean nonetheless remains an unbiased estimator of the population mean. The finite population correction (FPC) applies. The vast majority of applied survey research employs SRSWOR.

SRS-WR · With Replacement

Simple Random Sampling With Replacement

Elements are returned to the population before the next draw, so the same unit may appear multiple times in the sample. The draws are statistically independent and identically distributed. While less common in practice, SRSWR simplifies variance derivation and forms the basis of bootstrapping and other resampling methods (Efron & Tibshirani, 1993).
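
The difference between the two varieties can be seen in a few lines of Python. This is a minimal sketch using only the standard library; the frame of ten unit IDs and the seed value are illustrative assumptions, not part of the source material.

```python
import random

# Hypothetical frame of N = 10 unit IDs; the seed value is arbitrary.
frame = list(range(1, 11))
rng = random.Random(2024)

# SRSWOR: each unit can appear at most once in the sample.
srswor = rng.sample(frame, k=4)

# SRSWR: units are "returned" after each draw, so repeats are possible.
srswr = rng.choices(frame, k=4)

print("SRSWOR:", srswor)
print("SRSWR: ", srswr)
```

random.sample never repeats a unit, whereas random.choices may, which is the behaviour bootstrap resampling relies on.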

EPSEM: The Core Principle

Equal Probability of Selection Method (EPSEM)

SRS is the canonical EPSEM design. Kish (1965) formalized EPSEM as a design property ensuring self-weighting samples, where each sampled unit represents the same number of population units. In SRSWOR, P(element i selected) = n/N for all i ∈ {1,…,N}. The sample mean is therefore an unbiased estimator of the population mean with no post-stratification weighting required.

Key Properties at a Glance

Property 01

Unbiasedness

The sample mean ȳ is an unbiased estimator of the population mean Ȳ. E(ȳ) = Ȳ for any population distribution, by virtue of the randomisation mechanism alone.

Property 02

Consistency

As n → N, the sample mean converges to the population mean: the variance of the estimator shrinks to zero as the sampling fraction f = n/N approaches 1, and at n = N (a census) ȳ equals Ȳ exactly.

Property 03

Design-Based Validity

Inferential validity derives from the randomisation mechanism, not from assumptions about the population distribution (Hansen, Hurwitz & Madow, 1953).

Property 04

Reproducibility

With a documented random seed, the selection process is fully replicable — a critical requirement for peer-reviewed research transparency (AAPOR, 2016).

Property 05

Minimum Variance

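Among linear unbiased estimators whose weights do not depend on unit labels, the sample mean ȳ (equivalently, the Horvitz–Thompson estimator of the total divided by N) has the smallest design variance under SRSWOR. This is a design-based optimality result; the Cramér–Rao lower bound belongs to parametric, model-based inference and does not apply here.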

Property 06

CLT Applicability

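A finite-population version of the Central Limit Theorem gives approximate normality of the sampling distribution of ȳ when both n and N − n are sufficiently large, enabling normal-theory confidence interval construction for a wide range of population shapes. Highly skewed populations or extreme outliers require a larger n before the approximation is adequate.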

Section 02 — Mathematical Theory

Probability, Estimation & Variance Theory

SRS rests on a rigorous mathematical framework encompassing combinatorial probability, unbiased estimation, and variance decomposition. Mastery of these derivations is essential for doctoral-level critical appraisal of sampling designs.

1. Selection Probability

For a finite population of N elements, the number of distinct samples of size n (without replacement) is the binomial coefficient C(N,n). Under SRS, each such sample is equally probable:

Probability of Any Specific Sample

P(S) = 1 / C(N, n) = (n! · (N-n)!) / N!

P(S) = probability of selecting a specific sample S of size n
N = total population size
n = required sample size
C(N,n) = "N choose n" — number of possible combinations

The marginal (first-order inclusion) probability that a specific element i is included in the sample is:

First-Order Inclusion Probability (SRSWOR)

πᵢ = P(i ∈ s) = n / N for all i = 1, 2, …, N

πᵢ = marginal inclusion probability for element i
n = sample size · N = population size
Crucially, πᵢ is identical for all elements — the EPSEM property.
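
For a population small enough to enumerate, both probabilities can be checked exactly. The sketch below assumes toy values N = 5 and n = 2 (chosen only so that every sample can be listed) and uses exact fractions rather than floating point.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, n = 5, 2  # toy sizes, assumed for illustration
population = range(1, N + 1)

samples = list(combinations(population, n))
p_sample = Fraction(1, comb(N, n))  # P(S) = 1 / C(N, n)

# Inclusion probability of unit i = total probability of the samples containing i.
inclusion = {
    i: sum(p_sample for s in samples if i in s) for i in population
}

print(len(samples), p_sample)  # 10 1/10
print(inclusion[1])            # 2/5, i.e. n/N
```

Every unit appears in C(N-1, n-1) of the C(N, n) samples, which is why the ratio reduces to n/N for all i.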

2. The Horvitz–Thompson Estimator

The general Horvitz–Thompson (H-T) estimator of the population total T is:

Horvitz–Thompson Estimator of Population Total

T̂ₕₜ = Σᵢ∈ₛ (yᵢ / πᵢ) = (N/n) · Σᵢ∈ₛ yᵢ

Under SRSWOR: πᵢ = n/N, so the H-T estimator simplifies to N·ȳ
yᵢ = observed value for sampled element i
This estimator is unbiased: E(T̂ₕₜ) = T
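
The unbiasedness claim can be verified by brute force on a small population. In this sketch the five population values are hypothetical, and the function name ht_total is an illustrative choice rather than a library API; averaging the estimator over all equally likely samples should return the true total.

```python
from fractions import Fraction
from itertools import combinations

def ht_total(sample_values, pis):
    """Horvitz-Thompson estimate: sum of y_i / pi_i over the sampled units."""
    return sum(Fraction(y) / pi for y, pi in zip(sample_values, pis))

# Hypothetical population values (N = 5) and sample size n = 2.
Y = [3, 7, 8, 12, 20]
N, n = len(Y), 2
pi = Fraction(n, N)  # SRSWOR inclusion probability, identical for all units

# Evaluate the estimator on every possible sample; all are equally likely.
estimates = [ht_total(s, [pi] * n) for s in combinations(Y, n)]
expected = sum(estimates) / len(estimates)

print(expected, sum(Y))  # 50 50
```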

3. Sample Mean & Unbiasedness Proof

Unbiased Estimator of Population Mean

ȳ = (1/n) · Σᵢ∈ₛ yᵢ

E(ȳ) = Ȳ = (1/N) · Σᵢ₌₁ᴺ Yᵢ ✓

ȳ = sample mean (estimator)
Ȳ = population mean (target parameter)
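Proof: E(ȳ) = E[(1/n)Σyᵢ] = (1/n)·n·Ȳ = Ȳ. The middle step uses the fact that, at each draw, every one of the N population units is equally likely to be selected, so each sampled value yᵢ has expectation Ȳ.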

4. Variance of the Sample Mean

Variance of ȳ — SRSWOR

V(ȳ) = (1 - f) · (S²/n)

f = n/N = sampling fraction
(1 - f) = Finite Population Correction (FPC) factor
S² = population variance = [1/(N-1)] · Σ(Yᵢ - Ȳ)²
n = sample size
As N→∞ or f→0: V(ȳ) → S²/n, the familiar σ²/n formula for independent draws
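
The variance formula can be checked against the exact design variance obtained by listing every possible sample. The population values below are hypothetical and deliberately tiny; this is a sketch of the check, not a derivation.

```python
from fractions import Fraction
from itertools import combinations

Y = [3, 7, 8, 12, 20]  # hypothetical population, N = 5
N, n = len(Y), 2

Y_bar = Fraction(sum(Y), N)
S2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)  # population variance, divisor N - 1
f = Fraction(n, N)
formula_var = (1 - f) * S2 / n                   # (1 - f) * S^2 / n

# Exact design variance: average squared deviation of the sample mean over all samples.
means = [Fraction(sum(s), n) for s in combinations(Y, n)]
design_var = sum((m - Y_bar) ** 2 for m in means) / len(means)

print(formula_var, design_var)  # 249/20 249/20
```

The two quantities agree exactly, which confirms that the FPC factor belongs in the formula even for very small populations.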

When to Apply the Finite Population Correction (FPC)

The FPC factor (1 - f) is theoretically required whenever sampling is without replacement from a finite population. In practice, the correction is negligible when f ≤ 0.05 (i.e., the sample constitutes less than 5% of the population). When f exceeds this threshold — common in organizational studies, census sub-studies, or small-population clinical trials — ignoring FPC produces systematically inflated standard errors and overly conservative confidence intervals. Cochran (1977, p. 25) provides the formal derivation.
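
To see why 5% is a common working threshold, the relative overstatement of the standard error when the FPC is omitted can be tabulated. The helper name se_inflation and the chosen sampling fractions are illustrative assumptions.

```python
import math

def se_inflation(f):
    """Ratio of the SE computed without the FPC to the SE computed with it."""
    return 1.0 / math.sqrt(1.0 - f)

for f in (0.01, 0.05, 0.10, 0.25, 0.50):
    overstatement = (se_inflation(f) - 1.0) * 100
    print(f"f = {f:.2f}: SE overstated by {overstatement:.1f}%")
```

The overstatement is about 2.6% at f = 0.05 and about 5.4% at f = 0.10, and it grows rapidly beyond that.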

5. Estimated Variance (Unknown S²)

Since the population variance S² is unknown in practice, it is estimated from the sample:

Estimated Variance of the Sample Mean

v̂(ȳ) = (1 - f) · (s²/n)

s² = [1/(n-1)] · Σᵢ∈ₛ (yᵢ - ȳ)²

s² = sample variance (unbiased estimator of S²)
E(s²) = S²; the denominator (n-1) is Bessel's correction, which removes the downward bias of the divisor-n estimator
SE(ȳ) = √v̂(ȳ) = standard error of the mean
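
As a minimal sketch of the calculation, the function below computes SE(ȳ) from raw sample data. The eight observations and the frame size N = 40 are hypothetical, and the function name is an illustrative choice.

```python
import math
import statistics

def srswor_standard_error(sample, N):
    """Estimated SE of the sample mean under SRSWOR, with the FPC applied."""
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    f = n / N                         # sampling fraction
    return math.sqrt((1 - f) * s2 / n)

# Hypothetical data: n = 8 observations from a frame of N = 40 units.
sample = [12, 15, 9, 20, 14, 11, 17, 14]
print(round(srswor_standard_error(sample, N=40), 3))  # 1.095
```

Here s² = 12 and f = 0.2, so the estimated variance is 0.8 · 12 / 8 = 1.2.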

6. Confidence Interval Construction

95% Confidence Interval for Population Mean

CI: ȳ ± z_{α/2} · SE(ȳ)

= ȳ ± 1.96 · √[(1 - n/N) · (s²/n)]

z_{α/2} = 1.96 for α = 0.05 (large n, CLT applies)
For small n: substitute t_{α/2, n-1} (Student's t-distribution)
The 95% CI has the design-based interpretation: in repeated sampling from the same frame, 95% of such intervals will contain Ȳ.
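
The interval can also be computed directly from summary statistics. The values used here (ȳ = 47, s² = 144, n = 100, N = 1,000) are assumed for illustration only, and the z-based form is appropriate only when n is large; for small n the critical value would come from the t-distribution instead.

```python
import math

def srswor_ci(y_bar, s2, n, N, z=1.96):
    """Large-sample confidence interval for the population mean under SRSWOR."""
    se = math.sqrt((1 - n / N) * s2 / n)
    return y_bar - z * se, y_bar + z * se

# Hypothetical summary statistics.
lower, upper = srswor_ci(y_bar=47.0, s2=144.0, n=100, N=1000)
print(round(lower, 2), round(upper, 2))  # 44.77 49.23
```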

7. Optimal Sample Size Determination

Cochran's (1977) Sample Size Formula

n₀ = (z²_{α/2} · p · q) / e²

n = n₀ / (1 + (n₀ - 1)/N)

p = estimated proportion (use 0.5 for maximum conservatism)
q = 1 - p · e = desired margin of error
n = finite-population corrected final sample size
Example: N=10,000, p=0.5, e=0.05, z=1.96 → n₀≈384 → n≈370
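
The worked example above can be reproduced with a short script. The function names are illustrative; rounding the final value up to the next whole unit is a common convention assumed here rather than something the formula itself prescribes.

```python
import math

def cochran_n0(p=0.5, e=0.05, z=1.96):
    """Initial sample size for estimating a proportion (no finite-population adjustment)."""
    return (z ** 2) * p * (1 - p) / e ** 2

def cochran_n(N, p=0.5, e=0.05, z=1.96):
    """Sample size adjusted for a finite population of N units."""
    n0 = cochran_n0(p, e, z)
    return n0 / (1 + (n0 - 1) / N)

n0 = cochran_n0()
n = cochran_n(10_000)
print(round(n0, 2), math.ceil(n))  # 384.16 370
```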

Section 03 — Interactive Learning Tool

Simple Random Sampling Simulator

Engage directly with the sampling mechanism. Adjust population size (N), sample size (n), and observe selection probabilities, the sampling distribution of the mean, and the effect of the FPC.

SRS Monte Carlo Simulator

Visualises selection mechanics and the Central Limit Theorem

Default settings: Population Size N = 42, Sample Size n = 10, Simulation Runs = 100. These give an inclusion probability πᵢ = n/N = 23.8% and an FPC factor (1 − f) = 0.762. The sample mean ȳ and standard error SE(ȳ) are reported once a sample has been drawn.

What the Simulator Demonstrates

Draw Sample: Animates the random selection of n units from N, illustrating the uniform inclusion probability πᵢ = n/N for every population element. Each element's identification number serves as its value yᵢ.

Run Simulation: Executes the specified number of independent samples and builds the empirical sampling distribution of ȳ. This illustrates the Central Limit Theorem: although the population distribution is rectangular, the distribution of sample means is approximately normal for moderate n. Increasing the number of runs sharpens the empirical histogram; it is the sample size n, not the number of runs, that governs how close the sampling distribution is to normal.
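
The simulation can be approximated offline. The sketch below is not the simulator's own code; it assumes the default settings (N = 42, n = 10, 100 runs), uses each unit's ID as its value, and picks an arbitrary seed.

```python
import math
import random
import statistics

N, n, runs = 42, 10, 100                 # default settings of the simulator
population = list(range(1, N + 1))       # y_i = unit ID
rng = random.Random(12345)               # arbitrary seed, recorded for reproducibility

# One SRSWOR sample per run; keep only its mean.
means = [statistics.fmean(rng.sample(population, n)) for _ in range(runs)]

Y_bar = statistics.fmean(population)
S2 = statistics.variance(population)     # population variance, divisor N - 1
theory_se = math.sqrt((1 - n / N) * S2 / n)

print(f"mean of sample means: {statistics.fmean(means):.2f} (Y-bar = {Y_bar})")
print(f"SD of sample means:   {statistics.stdev(means):.2f} (theory = {theory_se:.2f})")
```

With these settings Ȳ = 21.5 and the theoretical standard error is about 3.39; the empirical values should fall close to both, and closer still as the number of runs grows.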

Section 04 — Critical Evaluation

Assumptions, Conditions & Limitations

Doctoral-level engagement with SRS requires critical evaluation of its conditions of applicability. Uncritical deployment of SRS without examining underlying assumptions constitutes a methodological error.

Formal Assumptions

Assumption	Technical Statement	Violation Consequence	Diagnostic / Remedy
Complete Sampling Frame	Every population element must be listed and accessible in the frame	Coverage bias; non-representativeness; exclusion of unlisted subgroups	Frame audit; dual-frame designs (Hartley, 1962)
Equal Inclusion Probability	πᵢ = n/N for all i — no element privileged or excluded	Biased H-T estimator; invalid standard errors	Verify frame completeness; use PPS sampling if needed
Independence of Selection	SRSWR: draws are i.i.d.; SRSWOR: controlled dependence via FPC	Under-estimated variance if clustering ignored	Design effect (DEFF) calculation; complex survey SE methods
Finite Population	N must be known and fixed; super-population model not assumed	FPC miscalculation; over/under-coverage	Census or registry enumeration; model-assisted estimation
Non-zero Response Rate	Selected units must respond / be measurable	Non-response bias if missingness is non-random (MNAR)	Propensity weighting; multiple imputation (Little & Rubin, 2002)
True Randomisation	Selection must use a validated random mechanism (PRNG or table)	Selection bias; pseudo-random artifacts	Cryptographically secure PRNG (NIST SP 800-90A)

Frequently Cited Limitations in the Literature

SRS is only as valid as its sampling frame. Groves et al. (2009) distinguish between the target population (conceptually defined) and the frame population (operationally accessible). Any discrepancy between these two — termed coverage error — produces systematic bias that cannot be corrected through any design-based estimator.

Common frame deficiencies include: undercoverage (unlisted elements, e.g., homeless populations in household surveys), overcoverage (duplicate entries, out-of-scope units), and clustering (groups of elements represented by a single entry). Doctoral researchers must report frame construction procedures explicitly and assess coverage error magnitude.

SRS implicitly treats the population as homogeneous with respect to variability. When the population contains distinct subgroups (strata) with markedly different means or variances, SRS may yield disproportionate representation by chance. Neyman (1934) showed that optimal allocation in stratified random sampling minimises the variance of the estimator for a fixed total sample size; the size of the gain over SRS depends on the degree of between-stratum heterogeneity.

For example, in a study of household income across urban and rural regions with vastly different income distributions, SRS risks severely under-representing one stratum. Stratified random sampling with proportional or optimal allocation (Neyman allocation) is the methodologically superior choice in such scenarios.

SRS ignores geographical structure entirely. When selected units are dispersed across a large geographic area, data collection costs escalate dramatically. Cluster sampling — grouping population elements into natural clusters and selecting clusters randomly — trades increased variance (measured by the design effect, DEFF = 1 + (b̄ - 1)ρ, where ρ is the intracluster correlation coefficient and b̄ is the average cluster size) for dramatically reduced travel and administrative costs. Kish (1965) provides the definitive treatment of this trade-off.
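
The variance side of this trade-off is easy to quantify. In the sketch below the design (50 clusters of 20 respondents, ρ = 0.05) is hypothetical, and the effective sample size is taken as n divided by DEFF.

```python
def cluster_deff(avg_cluster_size, icc):
    """Design effect for cluster sampling: DEFF = 1 + (b - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc

def effective_sample_size(n, deff):
    """Size of the SRS that would give the same precision as the actual design."""
    return n / deff

# Hypothetical design: 50 clusters of 20 respondents each, rho = 0.05.
deff = cluster_deff(avg_cluster_size=20, icc=0.05)
n_eff = effective_sample_size(50 * 20, deff)
print(round(deff, 2), round(n_eff, 1))  # 1.95 512.8
```

Even a modest intracluster correlation nearly halves the information in the sample once clusters reach 20 units.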

When the research interest lies in estimating parameters for a rare subgroup (e.g., a 2% prevalence condition), SRS requires an enormously large total sample to produce adequate domain estimates. If the target subgroup prevalence is P = 0.02 and a minimum domain sample of n_d = 100 is required, the total SRS sample must be approximately n = 100/0.02 = 5,000. Oversampling designs, disproportionate stratification, or targeted sampling methods (Watters & Biernacki, 1989) are more cost-efficient in such contexts.
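
The figure of 5,000 makes the expected domain count equal to 100, which is not the same as guaranteeing it. The sketch below reproduces the arithmetic and, as an added assumption, uses a binomial approximation (reasonable when N is much larger than n) to estimate the chance of actually reaching 100 domain cases. Function names are illustrative.

```python
import math

def total_n_for_domain(n_domain, prevalence):
    """Total SRS size whose expected domain count equals n_domain."""
    return math.ceil(n_domain / prevalence)

def prob_domain_at_least(n_total, prevalence, n_domain):
    """P(X >= n_domain) for X ~ Binomial(n_total, prevalence), summed in log space."""
    log_p, log_q = math.log(prevalence), math.log1p(-prevalence)
    below = 0.0
    for k in range(n_domain):
        log_pmf = (math.lgamma(n_total + 1) - math.lgamma(k + 1)
                   - math.lgamma(n_total - k + 1)
                   + k * log_p + (n_total - k) * log_q)
        below += math.exp(log_pmf)
    return 1.0 - below

n_total = total_n_for_domain(100, 0.02)
print(n_total)  # 5000
print(round(prob_domain_at_least(n_total, 0.02, 100), 2))
```

Under this approximation the probability of obtaining at least 100 domain cases is only about one half, so a researcher who needs the minimum with high confidence would have to draw more than 5,000 units.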

Randomisation of the selection mechanism does not protect against bias introduced by differential non-response. If non-response is related to the outcome variable — Missing Not At Random (MNAR) in the terminology of Rubin (1976) — the effective realised sample is no longer a probability sample in the strict sense. The response propensity model (Rosenbaum & Rubin, 1983) offers a partial remedy, but the assumption of ignorable non-response remains untestable without auxiliary data. This is one of the most consequential practical limitations of SRS in field research contexts.

Section 05 — Comparative Analysis

SRS versus Other Probability Sampling Designs

No single sampling design is universally optimal. The appropriate design is determined by population structure, research objectives, budgetary constraints, and acceptable levels of design complexity.

Criterion	SRS	Stratified RS	Cluster S.	Systematic S.
Requires Full Frame	YES	YES	Cluster list only	YES
Statistical Efficiency	Baseline	Higher (if strata are internally homogeneous)	Lower (DEFF > 1)	Equal or higher (absent periodicity in the frame)
Subgroup Analysis	Poor (rare groups)	Excellent (proportional/optimal alloc.)	Feasible	Limited
Cost / Logistics	Moderate–High	Moderate–High	Low (geographically clustered)	Moderate
Design-Based Validity	Full	Full	Full (with DEFF)	Full (if periodicity is not a concern)
Variance Estimation	Simple closed-form	Moderately complex	Complex (sandwich/BRR/Jackknife)	Conservative (requires assumptions)
Best Used When	Homogeneous population, accessible frame, sufficient budget	Known heterogeneous subgroups exist	No complete frame; geographic dispersion	Ordered frame exists; periodicity acceptable
Foundational Reference	Cochran (1977)	Neyman (1934)	Kish (1965)	Madow & Madow (1944)

The Design Effect (DEFF) as a Comparative Metric

The design effect, defined by Kish (1965) as DEFF = V_design(ȳ) / V_SRS(ȳ), quantifies the variance inflation (or deflation) of an alternative design relative to SRS. DEFF > 1 indicates that the alternative design is less efficient than SRS (e.g., cluster sampling with high intracluster correlation). DEFF < 1 indicates superior efficiency (e.g., stratified sampling with high between-stratum variance). The effective sample size is n_eff = n / DEFF. All published survey analyses should report DEFF to enable cross-study comparison and meta-analytic integration.

When SRS is the Optimal Choice

Condition 01

Complete, Accurate Frame Available

SRS is most appropriate when a comprehensive, up-to-date, unduplicated sampling frame exists — e.g., a student registry, employee database, or national health record system — with full coverage of the target population.

Condition 02

Population Relatively Homogeneous

When within-population variance on the key outcome variable is relatively uniform (low between-subgroup heterogeneity), SRS achieves near-optimal efficiency and stratification yields minimal gains.

Condition 03

No Domain Estimation Required

When research objectives concern the overall population mean or total — not subgroup-specific estimates — SRS provides unbiased estimation with minimal design complexity.

Condition 04

Theoretical Baseline Required

In methods research, pilot studies, and simulation studies evaluating estimator properties, SRS serves as the canonical design-based benchmark against which alternative procedures are evaluated.

Section 06 — Procedural Guide

Implementation Protocol for Doctoral Research

Rigorous implementation of SRS requires systematic adherence to a documented protocol. Each step must be reported with sufficient detail to satisfy peer-review and research ethics board requirements.

Define Target Population

Specify inclusion and exclusion criteria with precision. Ambiguous boundaries cause coverage error.

Construct Sampling Frame

Enumerate all N population elements. Audit for duplicates, out-of-scope entries, and undercoverage.

Determine Sample Size n

Apply Cochran's formula for estimation objectives, specifying α and the margin of error. For hypothesis-testing objectives, specify α, desired power, and expected effect size instead.

Assign Unique IDs

Assign sequential identifiers (1 to N) to each population element. Document the assignment.

Generate Random Numbers

Use a validated PRNG or random number table. Record the seed for reproducibility and audit trail.

Select & Contact Units

Select the n elements corresponding to generated numbers. Document all contact attempts.
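
The ID assignment, random number generation, and selection steps can be sketched in a few lines. The frame of twelve employee records, the seed value, and the function name draw_srswor are hypothetical; Python's standard random module is used here for brevity and is not a cryptographic generator.

```python
import random

def draw_srswor(frame, n, seed):
    """Return the (ID, record) pairs selected by an SRSWOR draw of n units."""
    ids = list(range(1, len(frame) + 1))       # Assign Unique IDs: 1..N
    rng = random.Random(seed)                  # Generate Random Numbers: seeded PRNG
    selected_ids = sorted(rng.sample(ids, n))  # Select: without replacement
    return [(i, frame[i - 1]) for i in selected_ids]

# Hypothetical frame of N = 12 records; the seed is recorded for the audit trail.
frame = [f"employee_{k:03d}" for k in range(1, 13)]
SEED = 20240501
selection = draw_srswor(frame, n=4, seed=SEED)

for unit_id, record in selection:
    print(unit_id, record)
```

Re-running the function with the same frame order and seed returns the same selection, which is what makes the draw auditable.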

Randomisation Tools: Standards and Best Practices

Method	Standard	Acceptable For	Reproducibility
Cryptographic PRNG	NIST SP 800-90A (e.g., CTR_DRBG)	All research levels, including RCTs	Seed-dependent — record seed
R: sample() / set.seed()	Mersenne Twister (MT19937)	Academic / social science research	Full reproducibility with set.seed()
Python: random.sample() / numpy	PCG-64 (NumPy ≥1.17) or MT19937	Computational and applied research	numpy.random.default_rng(seed)
SPSS: Random Cases	Wichmann–Hill algorithm	Behavioural sciences	Seed via SET SEED command
Physical Random Number Table	Rand Corporation (1955) Million Random Digits	Historical benchmark; small N studies	Document start row and column

Reporting Requirements (APA 7th / CONSORT / STROBE)

(a) Population and frame: Full description of target population, frame source, frame date, coverage rate estimate, and any known frame deficiencies.

(b) Sample size determination: Report n, the formula used (Cochran or equivalent), assumed parameter values (α, power, effect size or margin of error), and any adjustment for anticipated non-response or design effect.

(c) Randomisation procedure: Software/algorithm used, version number, and random seed. State explicitly whether sampling was with or without replacement.

(d) Response rate: Report final response rate per AAPOR Response Rate Definitions. Provide comparison of respondent vs. non-respondent characteristics on available auxiliary variables.

(e) Variance estimation: State whether FPC was applied and justify the decision. Report standard errors, not merely standard deviations.

Section 07 — Knowledge Assessment

Doctoral-Level Self-Assessment

These questions require application of theoretical concepts, not rote recall. Questions are calibrated to doctoral comprehensive examination standard.

Self-Assessment Quiz — Simple Random Sampling

Question 01 of 06

In a SRSWOR design with N = 500 and n = 50, a researcher omits the finite population correction factor when calculating the standard error. What is the direction and magnitude of the resulting error?

Question 02 of 06

A researcher uses SRS to study employees' job satisfaction in an organisation with 1,200 employees across three departments: 800 in Operations, 300 in Finance, and 100 in HR. The sample of n=120 by chance yields 98 from Operations, 18 from Finance, and 4 from HR. Which methodological concern is MOST pressing?

Question 03 of 06

Which of the following constitutes a violation of the EPSEM principle in a nominally SRS design?

Question 04 of 06

A researcher reports a 95% CI for ȳ as [42.3, 51.7] based on SRS. What is the correct design-based interpretation?

Question 05 of 06

Using Cochran's (1977) formula with p=0.5, e=0.05, α=0.05, what adjusted sample size n is needed from a population of N = 800?

Question 06 of 06

A survey achieves 60% response rate (n_r = 240 of n = 400 selected). The researcher compares respondents vs. non-respondents on administrative records and finds no significant difference on key covariates. What is the MOST defensible conclusion?

Section 08 — Scholarly References

Primary Scholarly References

All content in this resource is grounded in peer-reviewed foundational literature. References are formatted per APA 7th Edition.

Cochran, W. G. (1977). Sampling techniques (3rd ed.). John Wiley & Sons. [The definitive doctoral-level reference for SRS theory, variance derivation, and sample size determination.]
Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4), 558–625. [Foundational paper establishing probability sampling on rigorous mathematical grounds.]
Kish, L. (1965). Survey sampling. John Wiley & Sons. [Classic treatment of EPSEM, design effects, cluster sampling, and complex survey analysis.]
Lohr, S. L. (2010). Sampling: Design and analysis (2nd ed.). Brooks/Cole. [Contemporary graduate-level survey sampling textbook; recommended for variance estimation chapters.]
Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2009). Survey methodology (2nd ed.). John Wiley & Sons. [Authoritative reference on non-response bias, coverage error, and total survey error.]
Horvitz, D. G., & Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260), 663–685. [Original H-T estimator derivation; foundational for weighted estimation under unequal probability sampling.]
Hansen, M. H., Hurwitz, W. N., & Madow, W. G. (1953). Sample survey methods and theory (Vols. 1–2). John Wiley & Sons. [Comprehensive two-volume reference; essential for design-based inference theory.]
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Chapman & Hall. [Establishes SRSWR as the basis of bootstrap resampling; relevant for variance estimation via Monte Carlo methods.]
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). John Wiley & Sons. [Rubin's MCAR/MAR/MNAR taxonomy and its implications for non-response in SRS surveys.]
Royall, R. M. (1970). On finite population sampling theory under certain linear regression models. Biometrika, 57(2), 377–387. [Model-based vs. design-based inference debate; critical for doctoral-level epistemological positioning.]
American Association for Public Opinion Research (AAPOR). (2016). Standard definitions: Final dispositions of case codes and outcome rates for surveys (9th ed.). AAPOR. [Mandatory reference for response rate reporting standards.]
Madow, W. G., & Madow, L. H. (1944). On the theory of systematic sampling. Annals of Mathematical Statistics, 15(1), 1–24. [Foundational paper on systematic sampling as an alternative to SRS.]
