What is the correct formula for sample size calculation?

The appropriate formula depends on whether the population size is known and on the type of variable being measured. Cochran's Formula is used when the population is unknown or very large, and it calculates n = Z squared times p times q, divided by e squared. Slovin's Formula is used for known finite populations and calculates n = N divided by (1 plus N times e squared). The Krejcie-Morgan formula is based on the chi-square distribution and also applies to finite populations. For studies measuring continuous variables with a known standard deviation, the formula n = (Z times sigma divided by e) squared applies.

What is Slovin's Formula and when should I use it?

Slovin's Formula calculates sample size using n equals N divided by the quantity (1 plus N multiplied by e squared), where N is the total population and e is the acceptable margin of error, typically 0.05 for 5 percent. It is appropriate when the total population is known and finite, and when the study involves proportions or categorical variables. It assumes a 95 percent confidence level implicitly and is most widely used in social science research in the Philippines and Southeast Asia.

What is the difference between probability and non-probability sampling?

Probability sampling gives every member of the population a known, non-zero chance of being selected, allowing generalisability and the calculation of sampling error. Non-probability sampling does not guarantee equal or known chances of selection, making generalisation beyond the sample difficult. Probability sampling is preferred for quantitative research requiring statistical inference. Non-probability sampling is appropriate for qualitative research, pilot studies, hard-to-reach populations, and exploratory inquiry.

When should I use stratified random sampling?

Stratified random sampling is appropriate when the population contains distinct subgroups (strata) that differ significantly on the variable of interest, when you need to ensure representation from each subgroup, or when comparisons between subgroups are a research objective. Examples include studies comparing departments within an organisation, grade levels within a school, or income brackets within a community.

How does Cochran's Formula differ from Slovin's Formula?

Cochran's Formula is more flexible and statistically rigorous. It requires specification of the expected population proportion (p), the desired confidence level through its Z-value, and the margin of error. It applies to unknown or very large populations and can be adjusted for finite populations using a correction factor. Slovin's Formula is simpler, requires only population size and margin of error, assumes a proportion of 0.5 implicitly, and does not directly incorporate the confidence level as a parameter.

Sample Size Calculator | Research Innovation Hub

What is Sampling? Foundational Concepts

Sampling is the process of selecting a subset of individuals, units, or observations from a larger population with the objective of drawing inferences about that population. Because it is rarely feasible to study every member of a population, sampling allows researchers to obtain representative data efficiently, economically, and ethically. The validity of any inference drawn from sample data depends on two conditions: the sample must be sufficiently large to detect the effects of interest, and the sample must be selected in a manner that is representative of the target population.

Fundamental Principle

The sample must be large enough to provide stable estimates and powerful enough to detect meaningful effects, yet small enough to remain practical. Both under-sampling and over-sampling carry costs: under-sampling increases the risk of Type II error; over-sampling wastes resources and may create ethical issues in studies involving human participants.

Core Terminology

Term	Definition
Population (N)	The complete set of individuals or units sharing a defined characteristic that the researcher wishes to study
Sample (n)	The subset of the population actually selected for observation or measurement
Sampling Frame	A complete list or representation of all elements in the population from which the sample is drawn
Margin of Error (e)	The maximum acceptable difference between the sample estimate and the true population value, typically 0.05 for 5%
Confidence Level	The long-run proportion of confidence intervals, constructed by the same procedure, that would contain the true population parameter, typically 95%
Sampling Error	The difference between the sample statistic and the true population parameter, quantified through the standard error
Standard Error (SE)	The standard deviation of the sampling distribution of a statistic; for the sample mean, calculated as sigma divided by the square root of n

Probability Sampling Methods

In probability sampling, every member of the population has a known, non-zero probability of being selected. This property allows researchers to calculate sampling error, construct confidence intervals, and make statistically defensible generalisations from the sample to the population. Probability sampling is the standard requirement for quantitative research with inferential statistical objectives.

Type 01

Simple Random Sampling

When to use

Use when a complete sampling frame exists, the population is relatively homogeneous, and no subgroup comparisons are required. Each member has an equal and independent chance of selection, implemented through a random number generator, lottery, or table of random numbers.
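
As a minimal sketch of the random number generator approach, assuming the sampling frame is available as a Python list (the frame of ID numbers, the function name, and the seed are hypothetical, chosen only for illustration):

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical frame: ID numbers 1..1000 standing in for a population list.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 286, seed=42)
print(len(sample), len(set(sample)))  # 286 286
```

Recording the seed alongside the frame lets the selection be re-created later, which may help when documenting the procedure in a methodology chapter.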

Type 02

Systematic Random Sampling

When to use

Use when a complete ordered list of the population is available and periodicity is not a concern. Choose a random starting point within the first k elements, then select every k-th element, where k equals the population size divided by the desired sample size. It is computationally simpler than simple random sampling (SRS) and produces similar results when the list has no periodic pattern.
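
The every-k-th-element rule can be sketched as follows. This is an illustration under stated assumptions: the frame is a Python list, the interval k is rounded down to a whole number, and the draw begins at a random position within the first interval. The function name and example frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th element after a random start, where k = N // n."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    k = len(frame) // n          # sampling interval, rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)     # random 0-based start within the first interval
    return frame[start::k][:n]   # every k-th element, capped at n items

# Hypothetical ordered frame of 1,000 IDs; k = 1000 // 100 = 10.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, seed=7)
print(len(sample), sample[1] - sample[0])  # 100 10
```

When N is not a multiple of n, rounding k down and capping at n leaves the tail of the list unsampled, so this simple version is best suited to cases where N is close to a multiple of n.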

Type 03

Stratified Random Sampling

When to use

Use when the population contains distinct, non-overlapping subgroups (strata) that differ significantly on the variable of interest, or when subgroup comparisons are research objectives. Divide the population into strata, then apply simple random sampling within each stratum. Improves precision over simple random sampling.

Type 04

Cluster Sampling

When to use

Use when the population is geographically dispersed, a complete sampling frame is unavailable, and travel costs are a constraint. Divide the population into naturally occurring clusters (e.g., schools, barangays, hospitals), randomly select clusters, and survey all or a sample of members within selected clusters.

Type 05

Multi-Stage Sampling

When to use

Use for large-scale national or regional surveys where no single-stage method is practical. Combines two or more sampling methods across successive stages: for example, first randomly selecting provinces, then randomly selecting municipalities within those provinces, then randomly selecting households within those municipalities.

Non-Probability Sampling Methods

In non-probability sampling, the probability of selection is unknown or cannot be calculated for each population member. These methods do not allow for the calculation of sampling error, and findings from non-probability samples cannot be statistically generalised to the broader population. They are appropriate for qualitative research, exploratory inquiry, pilot testing, and the study of hard-to-reach populations.

Type 01

Convenience Sampling

When to use

Use for pilot studies, classroom-based research, or when resources and time are severely limited. Participants are selected based on their availability and willingness to participate. Results cannot be generalised, and selection bias is a significant risk. Appropriate as a starting point for exploratory inquiry.

Type 02

Purposive Sampling

When to use

Use in qualitative research where the researcher requires participants with specific characteristics, experiences, or expertise relevant to the study. Also called judgement sampling. Common in case studies, phenomenological research, and expert consultation studies. Participants are selected because they can provide information-rich data.

Type 03

Snowball Sampling

When to use

Use when studying hidden, marginalised, or hard-to-reach populations where no sampling frame exists. Initial participants recruit subsequent participants from their networks. Appropriate for research on stigmatised groups, illegal behaviours, undocumented populations, or rare conditions. Introduces referral bias that must be acknowledged.

Type 04

Quota Sampling

When to use

Use when proportional representation of known subgroups is required but probability sampling is not feasible. Researchers establish quotas for each subgroup (e.g., 50% female, 30% from urban areas) and select participants until quotas are met. Resembles stratified sampling but selection within quotas is non-random, introducing selection bias.

Type 05

Theoretical Sampling

When to use

Used exclusively in grounded theory research. The researcher samples additional cases or data sources based on emerging theoretical concepts during concurrent data collection and analysis. Sampling continues until theoretical saturation is reached: the point at which no new categories or properties emerge from additional data.

Comparative Overview: All Sampling Methods

Method	Type	Sampling Frame Requirement	Generalisable	Primary Use	Key Limitation
Simple Random	Probability	Complete list required	Yes	Quantitative, homogeneous populations	Requires complete sampling frame
Systematic Random	Probability	Ordered list required	Yes	Ordered populations, administrative data	Periodicity risk in ordered lists
Stratified Random	Probability	Subgroup data required	Yes	Subgroup comparisons, heterogeneous populations	Requires accurate stratum data
Cluster	Probability	Cluster list required	Yes (with limitations)	Geographically dispersed populations	Higher sampling error than SRS
Multi-Stage	Probability	Hierarchical structure needed	Yes	National or regional surveys	Complex design effect calculations
Convenience	Non-Probability	Not required	No	Pilot testing, exploratory studies	High selection bias
Purposive	Non-Probability	Not required	No (analytic)	Qualitative, expert sampling	Researcher judgement influences selection
Snowball	Non-Probability	Not required	No	Hidden or hard-to-reach populations	Referral and network bias
Quota	Non-Probability	Subgroup proportions needed	Limited	Survey research without probability frame	Non-random within-quota selection
Theoretical	Non-Probability	Not required	Theoretical	Grounded theory research only	Applicable to one methodology only

Sample Size Calculators

The four calculators below cover the most widely used sample size formulas in quantitative research. Each returns a mathematically verified result with step-by-step output, an APA 7th edition narrative, and a downloadable report. Select the formula appropriate for your study design using the Formula Selection Guide below.

Formula Selection Guide

Cochran's Formula: Unknown or very large populations measuring proportions or categories.

Slovin's Formula: Known finite population, straightforward proportion studies, most common in Philippine social science research.

Krejcie-Morgan: Known finite population, alternative to Slovin with a chi-square basis, yields slightly smaller samples.

Continuous Variable: Studies measuring a continuous dependent variable when the population standard deviation is known or estimated from prior research.

Cochran's Formula

Cochran's (1977) formula is the most widely accepted method for determining sample size when the population is large or unknown. For proportional data, the formula is:

      n₀ = (Z² × p × q) / e²      where q = 1 − p
      
      With finite population correction:  n = n₀ / (1 + (n₀ − 1) / N)

Confidence Level: Most studies use 95%.

Expected Proportion (p): Use 0.5 when unknown (most conservative).

Margin of Error (e): 0.05 = 5%, 0.01 = 1%.

Population Size N (optional): Applies the finite population correction.
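
Cochran's formula and its finite population correction can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's own code: the function name and defaults are assumptions, the Z-value is derived from the confidence level with the standard library's `NormalDist`, and the correction is applied to the rounded-up n₀, which reproduces the figures of 385 and 279 quoted elsewhere on this page.

```python
import math
from statistics import NormalDist

def cochran_sample_size(confidence=0.95, p=0.5, e=0.05, population=None):
    """Cochran's sample size for a proportion, with optional finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided Z, about 1.96 at 95%
    n0 = math.ceil(z**2 * p * (1 - p) / e**2)           # round up to whole participants
    if population is None:
        return n0
    # Assumption: the correction is applied to the rounded-up n0.
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(cochran_sample_size())                  # 385
print(cochran_sample_size(population=1000))   # 279
```

Applying the correction to the unrounded value of about 384.15 instead would give 278 for N = 1,000, so the rounding convention should be stated when reporting the result.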


Slovin's Formula

Slovin's Formula (1960) is a straightforward method for determining the sample size from a known finite population. It is widely used in social science research, particularly in the Philippines and Southeast Asia. The formula assumes a 95% confidence level and uses only the population size and an acceptable margin of error:

      n = N / (1 + N × e²)
    

Population Size (N): Total number of individuals in the population.

Margin of Error (e): 0.05 for 5%, 0.02 for 2%.
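
A minimal Python sketch of Slovin's formula, assuming the result is rounded up to the next whole participant (the function name is hypothetical):

```python
import math

def slovin_sample_size(population, e=0.05):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population / (1 + population * e**2))

print(slovin_sample_size(1000))          # 286
print(slovin_sample_size(1000, e=0.02))  # 715
```

For N = 1,000 and e = 0.05 the unrounded value is 1000 / 3.5, about 285.7, which rounds up to 286.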


Krejcie-Morgan Formula

Krejcie and Morgan (1970) derived sample size requirements from the chi-square distribution for finite populations. This formula underlies the widely cited Krejcie-Morgan table published in Educational and Psychological Measurement. It applies a chi-square value with one degree of freedom at the desired confidence level:

      n = (χ² × N × P × (1 − P)) / (d² × (N − 1) + χ² × P × (1 − P))
    

Population Size (N): Total population size.

Confidence Level: Chi-square value at df = 1.

Population Proportion (P): Use 0.5 for maximum sample size.

Degree of Accuracy (d): 0.05 standard; 0.01 for high precision.
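
The Krejcie-Morgan formula can be sketched as below. To stay within the Python standard library, this sketch relies on the identity that the chi-square critical value with one degree of freedom equals the square of the two-sided Z-value (about 3.841 at 95%). The function name and the choice to round up are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def krejcie_morgan_sample_size(population, confidence=0.95, p=0.5, d=0.05):
    """Krejcie-Morgan sample size for a finite population."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    chi2 = z**2  # chi-square critical value at df = 1 is the squared two-sided Z
    numerator = chi2 * population * p * (1 - p)
    denominator = d**2 * (population - 1) + chi2 * p * (1 - p)
    return math.ceil(numerator / denominator)  # assumption: round up

print(krejcie_morgan_sample_size(1000))  # 278
```

For N = 1,000 the unrounded value is about 277.7, so rounding up and rounding to the nearest integer both give 278.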


Continuous Variable Sample Size

When the outcome variable is continuous and the population standard deviation is known or can be estimated from prior research or a pilot study, the following formula applies. This method is common in experimental, quasi-experimental, and health science research:

      n₀ = (Z × σ / e)²
      
      With finite population correction:  n = n₀ / (1 + (n₀ − 1) / N)

Confidence Level: Determines the Z-value.

Population Std. Dev. (σ): From prior research or a pilot study.

Acceptable Error (e): In the same unit as the variable.

Population Size N (optional): For the finite population correction.
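
A short Python sketch of the continuous-variable formula follows. The inputs in the example (a standard deviation of 15 and an acceptable error of 2 units) are hypothetical, the function name is an assumption, and the finite population correction is applied to the rounded-up initial estimate.

```python
import math
from statistics import NormalDist

def continuous_sample_size(sigma, e, confidence=0.95, population=None):
    """Sample size for estimating a mean, with optional finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided Z-value
    n0 = math.ceil((z * sigma / e) ** 2)                # initial estimate, rounded up
    if population is None:
        return n0
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(continuous_sample_size(sigma=15, e=2))                   # 217
print(continuous_sample_size(sigma=15, e=2, population=1000))  # 179
```

Because sigma and e must be in the same unit, halving the acceptable error roughly quadruples the required sample size.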


Stratified Proportional Allocation

After determining the total sample size using any of the above formulas, researchers using stratified sampling must allocate the total sample proportionally across strata. The proportional allocation formula ensures each stratum's sample is proportional to its share of the population:

      n_h = (N_h / N) × n
      
      where N_h = stratum population, N = total population, n = total sample size

Total Sample Size (n): From Cochran, Slovin, or Krejcie-Morgan above.

Total Population (N): Sum of all stratum populations.
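
Proportional allocation rarely yields whole numbers, so the stratum samples must be rounded. The sketch below uses largest-remainder rounding so that the stratum samples sum exactly to n. That rounding rule is an assumption made for illustration, not something the formula prescribes, and the department names and sizes are hypothetical.

```python
def proportional_allocation(strata, n):
    """Allocate n across strata in proportion to stratum size (largest-remainder rounding)."""
    total = sum(strata.values())
    # Whole-number part of each exact share n * N_h / N, and the leftover numerator.
    allocation = {name: size * n // total for name, size in strata.items()}
    remainders = {name: size * n % total for name, size in strata.items()}
    shortfall = n - sum(allocation.values())
    # Give the remaining units to the strata with the largest fractional parts.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        allocation[name] += 1
    return allocation

strata = {"Department A": 500, "Department B": 300, "Department C": 200}
print(proportional_allocation(strata, 286))
# {'Department A': 143, 'Department B': 86, 'Department C': 57}
```

Here the exact shares are 143.0, 85.8, and 57.2, so the single leftover unit goes to Department B.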


Formula Comparison and Selection Criteria

Formula	Population Required	Variable Type	Key Inputs	Result (N = 1,000, e = 5%)	Recommended for
Cochran (1977)	Not required	Proportions	Z, p, e; optional N	279 (with FPC) / 385 (infinite)	Unknown or very large populations; rigorous quantitative studies
Slovin (1960)	Required (N)	Proportions	N, e	286	Known finite populations; Philippine social science research
Krejcie-Morgan (1970)	Required (N)	Proportions	N, chi-square, P, d	278	Educational and psychological research; replicates KM table values
Continuous Variable	Optional	Continuous	Z, sigma, e; optional N	Varies by sigma and e	Experimental research with known or estimated population standard deviation

Critical Notes for Doctoral Research

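Attrition adjustment: Inflate the calculated n to account for non-response, dropout, or invalid responses; allowances of 10 to 20 percent are common. Divide the required n by the expected retention rate rather than simply adding the percentage: for an expected attrition rate of 15 percent, divide the required n by 0.85.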

Slovin vs. Cochran: Slovin's Formula implicitly assumes a proportion of 0.5 and a 95% confidence level. It cannot accommodate different confidence levels without modification. Cochran's Formula is more flexible and statistically transparent.

Pilot study standard deviation: For continuous variable studies, a standard deviation estimated from a pilot study of at least 30 participants is acceptable when population data are unavailable.

Frequently Asked Questions

Cochran's and Slovin's formulas differ in their underlying assumptions. Cochran's Formula requires the researcher to specify the expected population proportion (p) and the exact confidence level through its corresponding Z-value. Slovin's Formula implicitly assumes a population proportion of 0.5 and a 95% confidence level, and it does not incorporate these as explicit parameters. For a population of 1,000 with a 5% margin of error, Cochran's Formula produces 385 for an infinite population and 279 after applying the finite population correction, while Slovin's produces 286. The drop from 385 to 279 reflects the finite population correction; the small remaining gap between 279 and 286 arises because Slovin's Formula corresponds to a Z-value of approximately 2 rather than 1.96.

The finite population correction (FPC) is applied when the sample size exceeds 5% of the total population. Without the correction, Cochran's Formula and the continuous variable formula treat the population as effectively infinite, which overestimates the required sample size for small finite populations. The correction formula is: n (corrected) = n0 divided by (1 + (n0 minus 1) divided by N). For a population of 1,000 where the uncorrected estimate is 385, the corrected sample size is 279, a reduction of about 28%. For large populations (N greater than 10,000), the correction has negligible effect and may be omitted.

When the population proportion is unknown, use p = 0.5. The product p times q (which equals p times 1 minus p) is maximised when p = 0.5, producing the largest and therefore most conservative sample size estimate. This protects the study against an undersized sample when the true population proportion is unknown. If prior research or a pilot study provides a reasonable estimate of the proportion, that estimate may be used to yield a smaller, more efficient sample. In the absence of prior data, p = 0.5 is the widely accepted conservative default.
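
The effect of p on the required sample can be checked numerically. The sketch below evaluates Cochran's uncorrected formula at several proportions, assuming Z = 1.96 and a 5% margin of error; the function name is hypothetical.

```python
import math

def cochran_n0(p, z=1.96, e=0.05):
    """Uncorrected Cochran sample size for proportion p, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / e**2)

sizes = {p: cochran_n0(p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(sizes)  # {0.1: 139, 0.3: 323, 0.5: 385, 0.7: 323, 0.9: 139}
```

The required sample peaks at p = 0.5 and falls symmetrically on either side, which is why 0.5 is the conservative choice.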

To adjust for expected non-response, divide the statistically required sample size by the expected response rate. If the required n is 300 and you expect a 20% non-response rate (a response rate of 0.80), the adjusted number to recruit is 300 divided by 0.80, which equals 375. The adjustment ensures that the minimum required sample size is still achieved after non-responses, invalid questionnaires, or participant dropout. The expected non-response rate should be based on similar studies in the literature or documented institutional experience, and must be justified in the methodology chapter.
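
The adjustment is a single division, sketched here in Python. The function name is hypothetical, and the response rate is converted to an exact fraction so that floating-point rounding cannot push a whole-number result up by one.

```python
import math
from fractions import Fraction

def recruitment_target(required_n, response_rate):
    """Number of participants to recruit so that required_n responses are expected."""
    rate = Fraction(str(response_rate))  # exact decimal, e.g. 0.8 becomes 4/5
    return math.ceil(required_n / rate)

print(recruitment_target(300, 0.80))  # 375
print(recruitment_target(300, 0.85))  # 353
```

Dividing by the response rate gives a slightly larger target than adding the same percentage on top: adding 20% to 300 gives 360, whereas dividing by 0.80 gives 375.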

Whether non-probability samples can support quantitative inference is one of the most debated methodological questions in applied research. Statistically, inferential conclusions from non-probability samples cannot be validly generalised to the broader population because the probability of inclusion is unknown, making sampling error incalculable. In practice, however, many quantitative studies use non-probability sampling because of constraints of access, resources, or population structure. When this occurs, researchers must explicitly acknowledge the limitation, restrict their conclusions to the sample rather than the population, and contextualise findings within the available literature. Convenience sampling should be a last resort in quantitative studies and must be thoroughly justified.