What is Sampling? Foundational Concepts
Sampling is the process of selecting a subset of individuals, units, or observations from a larger population with the objective of drawing inferences about that population. Because it is rarely feasible to study every member of a population, sampling allows researchers to obtain representative data efficiently, economically, and ethically. The validity of any inference drawn from sample data depends on two conditions: the sample must be sufficiently large to detect the effects of interest, and the sample must be selected in a manner that is representative of the target population.
The sample must be large enough to provide stable estimates and powerful enough to detect meaningful effects, yet small enough to remain practical. Both under-sampling and over-sampling carry costs: under-sampling increases the risk of Type II error; over-sampling wastes resources and may create ethical issues in studies involving human participants.
Core Terminology
| Term | Definition |
|---|---|
| Population (N) | The complete set of individuals or units sharing a defined characteristic that the researcher wishes to study |
| Sample (n) | The subset of the population actually selected for observation or measurement |
| Sampling Frame | A complete list or representation of all elements in the population from which the sample is drawn |
| Margin of Error (e) | The maximum acceptable difference between the sample estimate and the true population value, expressed as a proportion; typically 0.05 (5%) |
| Confidence Level | The long-run proportion of confidence intervals, constructed the same way across repeated samples, that would contain the true population parameter; typically 95% |
| Sampling Error | The difference between the sample statistic and the true population parameter, quantified through the standard error |
| Standard Error (SE) | The standard deviation of the sampling distribution of a statistic; for a sample mean, calculated as sigma divided by the square root of n |
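
As a rough illustration of how the standard error and the margin of error relate, the sketch below computes both for a sample mean. The function names and the example inputs (sigma = 15, n = 100) are illustrative assumptions and do not come from the source.

```python
import math
from statistics import NormalDist


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the confidence interval for a mean at the given confidence level."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * standard_error(sigma, n)


# Hypothetical inputs: sigma = 15, n = 100.
print(standard_error(15, 100))             # 1.5
print(round(margin_of_error(15, 100), 2))  # 2.94
```

Quadrupling n halves the standard error, which is why small gains in precision can require large increases in sample size.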
Probability Sampling Methods
In probability sampling, every member of the population has a known, non-zero probability of being selected. This property allows researchers to calculate sampling error, construct confidence intervals, and make statistically defensible generalisations from the sample to the population. Probability sampling is the standard requirement for quantitative research with inferential statistical objectives.
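A minimal sketch of two probability designs, simple random and systematic sampling, is given below. It assumes the sampling frame is available as a Python list; the function names, the `seed` parameter, and the numbered frame are hypothetical and used only for illustration. In both functions every unit has the same known selection probability, n / N.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit has probability n / N."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Take every k-th unit after a random start, with k = N / n."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)
    interval = len(frame) / n            # fractional interval keeps probability n / N
    start = rng.random() * interval      # random start within the first interval
    last = len(frame) - 1
    # min() guards against a floating-point index landing one past the end.
    indices = [min(int(start + i * interval), last) for i in range(n)]
    return [frame[i] for i in indices]


# Hypothetical frame of 1,000 numbered units.
frame = list(range(1, 1001))
print(len(simple_random_sample(frame, 100, seed=1)))
print(systematic_sample(frame, 100, seed=1)[:3])
```

If the frame is ordered in a way that repeats at the sampling interval, the systematic version inherits that pattern, which is the periodicity risk associated with this design.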
Non-Probability Sampling Methods
In non-probability sampling, the probability of selection is unknown or cannot be calculated for each population member. These methods do not allow for the calculation of sampling error, and findings from non-probability samples cannot be statistically generalised to the broader population. They are appropriate for qualitative research, exploratory inquiry, pilot testing, and the study of hard-to-reach populations.
Comparative Overview: All Sampling Methods
| Method | Type | Frame or Population Information Needed | Generalisable | Primary Use | Key Limitation |
|---|---|---|---|---|---|
| Simple Random | Probability | Complete list required | Yes | Quantitative, homogeneous populations | Requires complete sampling frame |
| Systematic Random | Probability | Ordered list required | Yes | Ordered populations, administrative data | Periodicity risk in ordered lists |
| Stratified Random | Probability | Subgroup data required | Yes | Subgroup comparisons, heterogeneous populations | Requires accurate stratum data |
| Cluster | Probability | Cluster list required | Yes (with limitations) | Geographically dispersed populations | Higher sampling error than SRS |
| Multi-Stage | Probability | Hierarchical structure needed | Yes | National or regional surveys | Complex design effect calculations |
| Convenience | Non-Probability | Not required | No | Pilot testing, exploratory studies | High selection bias |
| Purposive | Non-Probability | Not required | No (analytic) | Qualitative, expert sampling | Researcher judgement influences selection |
| Snowball | Non-Probability | Not required | No | Hidden or hard-to-reach populations | Referral and network bias |
| Quota | Non-Probability | Subgroup proportions needed | Limited | Survey research without probability frame | Non-random within-quota selection |
| Theoretical | Non-Probability | Not required | Theoretical | Grounded theory research only | Applicable to one methodology only |
Sample Size Calculators
The four calculators below cover the most widely used sample size formulas in quantitative research. Each returns a mathematically verified result with step-by-step output, an APA 7th edition narrative, and a downloadable report. Select the formula appropriate for your study design using the summaries below and the criteria in the Formula Comparison and Selection Criteria table.
Cochran's Formula: Unknown or very large populations measuring proportions or categories.
Slovin's Formula: Known finite population, straightforward proportion studies, most common in Philippine social science research.
Krejcie-Morgan: Known finite population, alternative to Slovin with a chi-square basis, typically yields slightly smaller samples than Slovin (278 versus 286 for N = 1,000 at a 5% margin of error).
Continuous Variable: Studies measuring a continuous dependent variable when the population standard deviation is known or estimated from prior research.
Cochran's Formula
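Cochran's (1977) formula is the most widely accepted method for determining sample size when the population is large or unknown. For proportional data, the formula is:
n0 = (Z² × p × q) / e²
where Z = Z-value for the chosen confidence level (1.96 for 95%), p = expected population proportion, q = 1 − p, e = margin of error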
With finite population correction: n = n0 / (1 + (n0 − 1) / N)
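The calculation can be sketched in Python as follows, assuming the standard form n0 = Z² × p × q / e² followed by the finite population correction. The function name and parameter defaults are illustrative assumptions. Rounding n0 up before applying the correction is also an assumption, chosen because it reproduces the 385 and 279 figures quoted on this page.

```python
import math
from statistics import NormalDist


def cochran_sample_size(e=0.05, p=0.5, confidence=0.95, N=None):
    """Cochran sample size for a proportion, with optional finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 at 95%
    # Round up to a whole respondent; the tiny offset absorbs floating-point noise.
    n0 = math.ceil(z**2 * p * (1 - p) / e**2 - 1e-9)
    if N is None:
        return n0
    # Correction applied to the rounded n0 (an assumption; correcting the
    # unrounded value can give a result that is one smaller).
    return math.ceil(n0 / (1 + (n0 - 1) / N) - 1e-9)


print(cochran_sample_size())          # 385
print(cochran_sample_size(N=1000))    # 279
```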
Slovin's Formula
Slovin's Formula (1960) is a straightforward method for determining the sample size from a known finite population. It is widely used in social science research, particularly in the Philippines and Southeast Asia. The formula assumes a 95% confidence level and uses only the population size and an acceptable margin of error:
n = N / (1 + N × e²)
where N = population size, e = margin of error
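A minimal Python sketch of this calculation, assuming the usual form n = N / (1 + N × e²) and rounding up to a whole respondent; the function name is hypothetical:

```python
import math


def slovin_sample_size(N, e=0.05):
    """Slovin sample size for a known finite population N and margin of error e."""
    # The tiny offset keeps floating-point noise from pushing an exact integer up by one.
    return math.ceil(N / (1 + N * e**2) - 1e-9)


print(slovin_sample_size(1000))   # 286
```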
Krejcie-Morgan Formula
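Krejcie and Morgan (1970) derived sample size requirements from the chi-square distribution for finite populations. This formula underlies the widely cited Krejcie-Morgan table published in Educational and Psychological Measurement. It applies a chi-square value with one degree of freedom at the desired confidence level:
s = (X² × N × P × (1 − P)) / (d² × (N − 1) + X² × P × (1 − P))
where s = required sample size, X² = chi-square value for one degree of freedom (3.841 at 95% confidence), N = population size, P = population proportion (0.5 gives the maximum sample size), d = degree of accuracy expressed as a proportion (0.05)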
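The sketch below implements the commonly cited form s = X² N P (1 − P) / (d² (N − 1) + X² P (1 − P)). The function name and defaults are illustrative assumptions. Rounding up is likewise an assumption made for consistency with the other formulas on this page; the published table may round differently for some population sizes, so results can differ from it by one.

```python
import math


def krejcie_morgan_sample_size(N, P=0.5, d=0.05, chi_sq=3.841):
    """Krejcie-Morgan sample size; chi_sq is the 1-df chi-square value (3.841 at 95%)."""
    numerator = chi_sq * N * P * (1 - P)
    denominator = d**2 * (N - 1) + chi_sq * P * (1 - P)
    # Round up to a whole respondent (an assumption, see the note above).
    return math.ceil(numerator / denominator - 1e-9)


print(krejcie_morgan_sample_size(1000))   # 278
```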
Continuous Variable Sample Size
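When the outcome variable is continuous and the population standard deviation is known or can be estimated from prior research or a pilot study, the following formula applies. This method is common in experimental, quasi-experimental, and health science research:
n0 = (Z² × σ²) / e²
where Z = Z-value for the chosen confidence level, σ = population standard deviation, e = margin of error in the units of the outcome variable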
With finite population correction: n = n0 / (1 + (n0 − 1) / N)
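As an illustration, the sketch below assumes the standard form n0 = Z² × σ² / e² with e expressed in the units of the outcome variable. The function name and the example inputs (σ = 15, e = 2) are hypothetical and not taken from the source.

```python
import math
from statistics import NormalDist


def continuous_sample_size(sigma, e, confidence=0.95, N=None):
    """Sample size for estimating a mean, with optional finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 at 95%
    n0 = math.ceil((z * sigma / e) ** 2 - 1e-9)           # round up to a whole respondent
    if N is None:
        return n0
    # Correction applied to the rounded n0 (an assumption about rounding order).
    return math.ceil(n0 / (1 + (n0 - 1) / N) - 1e-9)


# Hypothetical inputs: sigma = 15 points, margin of error = 2 points.
print(continuous_sample_size(15, 2))           # 217
print(continuous_sample_size(15, 2, N=1000))   # 179
```

Because e enters the formula squared, halving the margin of error roughly quadruples the required sample.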
Stratified Proportional Allocation
After determining the total sample size using any of the above formulas, researchers using stratified sampling must allocate the total sample proportionally across strata. The proportional allocation formula ensures each stratum's sample is proportional to its share of the population:
nh = (Nh / N) × n
where nh = stratum sample size, Nh = stratum population, N = total population, n = total sample size
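Proportional allocation rarely yields whole numbers, so some rounding rule is needed. The sketch below assumes nh = (Nh / N) × n and uses largest-remainder rounding so that the stratum samples add up exactly to n. That rounding rule, the function name, and the stratum names and sizes are assumptions for illustration only.

```python
def proportional_allocation(strata, n):
    """Allocate total sample n across strata in proportion to stratum size.

    strata maps a stratum name to its population Nh.
    """
    N = sum(strata.values())
    exact = {name: n * Nh / N for name, Nh in strata.items()}
    # Start from the floor of each exact share.
    allocation = {name: int(share) for name, share in exact.items()}
    shortfall = n - sum(allocation.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - allocation[k], reverse=True)
    for name in by_remainder[:shortfall]:
        allocation[name] += 1
    return allocation


# Hypothetical strata with N = 1,000 and a total sample of n = 279.
strata = {"Grade 7": 400, "Grade 8": 350, "Grade 9": 250}
print(proportional_allocation(strata, 279))
# {'Grade 7': 111, 'Grade 8': 98, 'Grade 9': 70}
```

Rounding each stratum independently would give 112 + 98 + 70 = 280 here, one more than the target, which is the situation the largest-remainder step avoids.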
Formula Comparison and Selection Criteria
| Formula | Population Required | Variable Type | Key Inputs | Resulting n (N = 1,000, e = 5%) | Recommended for |
|---|---|---|---|---|---|
| Cochran (1977) | Not required | Proportions | Z, p, e; optional N | 279 (with FPC) / 385 (infinite) | Unknown or very large populations; rigorous quantitative studies |
| Slovin (1960) | Required (N) | Proportions | N, e | 286 | Known finite populations; Philippine social science research |
| Krejcie-Morgan (1970) | Required (N) | Proportions | N, chi-square, P, d | 278 | Educational and psychological research; replicates KM table values |
| Continuous Variable | Optional | Continuous | Z, sigma, e; optional N | Varies by sigma and e | Experimental research with known or estimated population standard deviation |
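Attrition adjustment: Inflate the calculated n to account for non-response, dropout, or invalid responses; allowances of 10 to 20 percent are typical. Apply the adjustment by dividing the required n by (1 − expected attrition rate) rather than multiplying by (1 + rate): if you expect a 15 percent attrition rate, divide the required n by 0.85.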
Slovin vs. Cochran: Slovin's Formula implicitly assumes a proportion of 0.5 and a 95% confidence level. It cannot accommodate different confidence levels without modification. Cochran's Formula is more flexible and statistically transparent.
Pilot study standard deviation: For continuous variable studies, a standard deviation estimated from a pilot study of at least 30 participants is acceptable when population data are unavailable.
Frequently Asked Questions
What is the difference between Cochran's Formula and Slovin's Formula?
The two formulas differ in their underlying assumptions. Cochran's Formula requires the researcher to specify the expected population proportion (p) and the exact confidence level through its corresponding Z-value. Slovin's Formula implicitly assumes a population proportion of 0.5 and a 95% confidence level, and it does not incorporate these as explicit parameters. For a population of 1,000 with a 5% margin of error, Cochran's Formula produces 385 for an infinite population and 279 after applying the finite population correction, while Slovin's produces 286. The gap between 385 and 286 reflects the finite population adjustment that is built into Slovin's Formula; the smaller gap between 279 and 286 arises mainly because Slovin's Formula effectively uses Z = 2 rather than 1.96.
When should the finite population correction be applied?
The finite population correction (FPC) factor is applied when the sample size constitutes more than 5% of the total population. Without correction, Cochran's Formula and the continuous variable formula treat the population as effectively infinite, which overestimates the required sample size for small finite populations. The correction formula is: n (corrected) = n0 divided by (1 + (n0 minus 1) divided by N). For a population of 1,000 where the uncorrected estimate is 385, the corrected sample size is 279, a reduction of roughly 28%. For large populations (N greater than 10,000), the correction has little effect and may be omitted.
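To see how quickly the correction fades as the population grows, the sketch below applies n = n0 / (1 + (n0 − 1) / N) to an uncorrected estimate of 385 for several population sizes and flags whether the 5% rule of thumb is met. The function names and the chosen population sizes are illustrative assumptions.

```python
import math


def apply_fpc(n0, N):
    """Finite population correction, rounded up to a whole respondent."""
    return math.ceil(n0 / (1 + (n0 - 1) / N) - 1e-9)


def fpc_recommended(n0, N, threshold=0.05):
    """True when the uncorrected sample exceeds the given share of the population."""
    return n0 / N > threshold


n0 = 385
for N in (500, 1000, 5000, 10000, 100000):
    print(N, apply_fpc(n0, N), fpc_recommended(n0, N))
# 500 218 True
# 1000 279 True
# 5000 358 True
# 10000 371 False
# 100000 384 False
```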
What value of p should be used when the population proportion is unknown?
Use p = 0.5. The product p times q (which equals p times 1 minus p) is maximised when p = 0.5, producing the largest and therefore most conservative sample size estimate. This protects the study against an undersized sample in situations where the true population proportion is unknown. If prior research or a pilot study provides a reasonable estimate of the proportion, that estimate may be used to yield a smaller, more efficient sample. However, using p = 0.5 is the widely accepted conservative default in the absence of prior data.
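The effect of p on the required sample can be checked numerically. The sketch below assumes Cochran's uncorrected form n0 = Z² × p × (1 − p) / e² with Z = 1.96 and e = 0.05; the function name and the grid of p values are illustrative assumptions.

```python
import math


def n0_for_proportion(p, z=1.96, e=0.05):
    """Uncorrected sample size for an assumed proportion p, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / e**2 - 1e-9)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, n0_for_proportion(p))
# 0.1 139
# 0.3 323
# 0.5 385
# 0.7 323
# 0.9 139
```

The required sample is symmetric around p = 0.5 and falls off on either side, so a credible prior estimate far from 0.5 can reduce the sample substantially.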
How should the sample size be adjusted for expected non-response?
Divide the statistically required sample size by the expected response rate. If the required n is 300 and you expect a 20% non-response rate (response rate of 0.80), the adjusted sample to recruit is 300 divided by 0.80, which equals 375. The adjustment ensures that even after accounting for non-responses, invalid questionnaires, or participant dropout, the minimum required sample size is still achieved. The expected non-response rate should be based on similar studies in the literature or documented institutional experience, and must be justified in the methodology chapter.
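The adjustment can be sketched as follows. The function name is hypothetical; the second example, which compares division by the response rate with simply adding the same percentage, is an illustration added here and uses the corrected Cochran figure of 279 with an assumed 15% non-response rate.

```python
import math


def adjust_for_nonresponse(n_required, nonresponse_rate):
    """Number of participants to recruit so that n_required usable responses remain."""
    if not 0 <= nonresponse_rate < 1:
        raise ValueError("nonresponse_rate must be in [0, 1)")
    # The tiny offset keeps floating-point noise from adding a spurious extra unit.
    return math.ceil(n_required / (1 - nonresponse_rate) - 1e-9)


print(adjust_for_nonresponse(300, 0.20))   # 375

# Dividing by 0.85 versus adding 15 percent to a required n of 279:
recruit_divide = adjust_for_nonresponse(279, 0.15)   # 329
recruit_add = math.ceil(279 * 1.15)                  # 321
print(recruit_divide, recruit_add)
print(recruit_add * 0.85 >= 279)                     # False: adding 15% falls short
```

Dividing by the response rate always recruits at least as many participants as adding the percentage, and the difference grows with the expected non-response rate.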
Can non-probability sampling be used in quantitative research?
This is one of the most debated methodological questions in applied research. Statistically, inferential conclusions from non-probability samples cannot be validly generalised to the broader population because the probability of inclusion is unknown, making sampling error incalculable. However, in practice, many quantitative studies use non-probability sampling due to constraints of access, resources, or population structure. When this occurs, researchers must explicitly acknowledge the limitation, restrict their conclusions to the sample rather than the population, and contextualise findings within available literature. Convenience sampling should be a last resort in quantitative studies and must be thoroughly justified.