
Section 01

Definition and Nature of Survey Research

Survey research is among the most widely used methodological approaches in the social, behavioral, and health sciences. At its foundation, it is a systematic method of collecting data from a defined group of people—often called respondents or participants—with the aim of describing, comparing, or explaining the distribution of attributes, attitudes, behaviors, and opinions within a population.

Formal Definition

Survey research is "the collection of information from a sample of individuals through their responses to questions" designed to generalize findings to a broader population of interest (Fowler, 2014, p. 1). It combines the logic of probability sampling with the systematic measurement of variables through standardized instruments.

What distinguishes survey research from casual questioning is its rigorous structure. Questions are prepared in advance, administered consistently across all respondents, and analyzed using statistical procedures that allow the researcher to draw inferences about populations that are far larger than the sample itself. This inferential power, rooted in probability theory and careful sampling design, is what makes survey research a cornerstone of quantitative inquiry.

Survey research is positioned squarely within the positivist tradition, which assumes that social reality can be measured, quantified, and analyzed objectively. However, contemporary researchers recognize that survey data always reflect respondent perceptions, which means that careful attention to wording, context, and measurement error is not optional—it is essential to the integrity of findings.

💡 Key Characteristic

Survey research does not manipulate variables. Unlike experimental designs, it observes and measures variables as they naturally exist within participants' lives, making it descriptive and correlational by nature rather than causal, unless longitudinal or quasi-experimental designs are employed.

The scope of survey research is remarkably broad. Psychologists use surveys to measure personality traits and mental health status; sociologists deploy them to track social attitudes across decades; educational researchers apply them to evaluate student learning experiences; and public health researchers rely on them to monitor health behaviors at the population level. Their versatility, combined with relatively manageable costs compared to experimental designs, makes surveys a method of enduring relevance.

Section 02

Historical Background

The roots of survey research extend deeper into history than many researchers realize. Charles Booth's monumental social survey of London's poor, conducted between 1886 and 1903, is widely recognized as one of the earliest systematic social surveys. Booth's team gathered household-level data on living conditions, employment, and poverty across the entire city—a methodological ambition that remains impressive to this day.

In the United States, the development of public opinion polling in the 1930s marked a decisive acceleration in survey methodology. George Gallup correctly predicted the result of the 1936 U.S. presidential election, contradicting the far larger but methodologically flawed Literary Digest poll. Gallup's 1936 poll relied on quota sampling rather than probability sampling, but its success demonstrated dramatically that a smaller sample designed to represent the electorate outperforms a massive convenience or volunteer sample. That moment shifted the entire discipline's understanding of what made a survey scientifically credible (Converse, 1987).

The postwar decades saw significant investment in survey methodology through institutions such as the Survey Research Center at the University of Michigan, established in 1946. Scholars like Leslie Kish formalized probability sampling theory; Rensis Likert, Hadley Cantril, and Louis Guttman contributed foundational work on attitude measurement. By the 1960s and 1970s, large-scale longitudinal surveys—such as the Panel Study of Income Dynamics (begun 1968)—demonstrated that survey data could capture social change over time with extraordinary depth.

The late twentieth and early twenty-first centuries brought two fundamental transformations. First, computer-assisted interviewing—initially telephone-based (CATI) and later web-based—dramatically increased the efficiency and reach of data collection. Second, rapidly declining response rates, the proliferation of online survey platforms, and the emergence of nonprobability sampling approaches prompted serious methodological debate about the foundations of inference from survey data (Tourangeau et al., 2013; Groves et al., 2011).

Today, survey research grapples simultaneously with unprecedented access—anyone with a smartphone can reach thousands of potential respondents within hours—and persistent threats to data quality including self-selection bias, satisficing behavior, and algorithmic filtering that may skew sample composition in ways researchers cannot easily detect or correct.

Section 03

Types of Survey Research

Survey research is not a monolithic method. Several distinct types exist, each suited to different research objectives, time horizons, and epistemological commitments. A thorough researcher selects the type that best matches the research question, not merely the type that is most convenient to administer.

By Time Frame

Cross-Sectional Survey

Data are collected from participants at a single point in time. This is the most commonly used type in social research. It provides a snapshot of the population's characteristics, attitudes, or behaviors as they exist at the moment of data collection. Suitable for descriptive and correlational research questions but cannot establish temporal precedence.

By Time Frame

Longitudinal Survey

Data are collected from the same population across multiple time points. Enables the researcher to track change over time, examine developmental trajectories, and establish temporal ordering of variables—a precondition for causal inference in nonexperimental designs. Subdivided into panel, cohort, and trend studies.

Longitudinal Subtype

Panel Study

The same individuals are surveyed repeatedly over time. Provides the richest data for studying individual change but is vulnerable to panel attrition—the systematic loss of participants that can bias findings if those who drop out differ meaningfully from those who remain. The Panel Study of Income Dynamics is a classic example.

Longitudinal Subtype

Cohort Study

Individuals sharing a defining characteristic (e.g., birth year, graduation year, employment entry) are followed over time. Not all cohort members need be the same individuals at each wave; representative samples from the cohort may be drawn instead. Widely used in epidemiology and educational research.

Longitudinal Subtype

Trend Study

Different samples from the same general population are surveyed at different time points. Unlike panel studies, trend studies replace, rather than retain, participants across waves. Suitable for monitoring aggregate change in public opinion, attitudes, or behaviors over extended periods. The Gallup Poll is a classic example.

By Purpose

Descriptive Survey

Aims to describe the characteristics or attributes of a specific population as they exist at a point in time. Does not seek to explain relationships among variables. Census surveys are the paradigmatic case. Results are typically reported as frequency distributions and percentages.

By Purpose

Analytical Survey

Goes beyond description to examine relationships and test hypotheses about associations among variables. Regression analysis, structural equation modeling, and multilevel modeling are among the techniques applied. Most doctoral research using survey methods employs an analytical framework.

By Administration

Census Survey

Data are collected from every member of the defined population rather than from a sample. Eliminates sampling error but, for most researchers, is feasible only for small or institutionally bounded populations (e.g., all teachers in a single school district). National population censuses are the exception: government-administered efforts conducted at enormous scale and cost.

Section 04

The Survey Design Process

Designing a rigorous survey is an iterative, multilayered process that requires careful planning before a single questionnaire item is written. Researchers who begin by drafting questions—before articulating their research questions, conceptual framework, or measurement model—invariably produce instruments that are difficult to analyze and whose results are difficult to interpret. The following process reflects best practices from the field's leading methodologists (Dillman et al., 2014; Fowler, 2014; Krosnick & Presser, 2010).

  1. Define the Research Problem and Objectives

    Articulate precisely what the survey is meant to learn. Formulate specific research questions or hypotheses. Identify the conceptual variables of interest and their theoretical relationships. This step determines everything that follows; vague research questions produce vague survey instruments.

  2. Review Existing Literature and Instruments

    Search systematically for published studies addressing the same or related research questions. Identify validated instruments that have demonstrated psychometric properties. Using existing, validated instruments—or adapting them with proper citation and pilot testing—is almost always preferable to constructing new measures from scratch.

  3. Define the Target Population and Sampling Frame

    Specify the population to which findings will be generalized with precision: not "teachers in the Philippines" but "public elementary school teachers employed in DepEd-managed schools in Region VII during School Year 2024–2025." Then identify or construct the sampling frame—the list or mechanism from which the sample will be drawn.

  4. Select the Sampling Method and Compute Sample Size

    Choose a probability or nonprobability sampling strategy appropriate to the research questions and available resources. Compute the required sample size using power analysis or accepted formulas, accounting for expected response rates and desired precision. Under-powered surveys yield unreliable estimates regardless of how well the questionnaire is designed.

  5. Design the Questionnaire

    Write, review, and refine questionnaire items following evidence-based guidelines for question wording, response option design, and questionnaire structure. Ensure every item maps directly to a research question or conceptual variable. Review the instrument for double-barreled questions, leading language, ambiguous terminology, and inappropriate reading-level assumptions.

  6. Establish Validity and Reliability

    Subject the draft instrument to expert review for content validity. Conduct cognitive interviewing with members of the target population to assess comprehension. Pilot test the instrument with a small sample (typically 30–50 participants) and compute reliability statistics (e.g., Cronbach's alpha). Revise as needed.

  7. Obtain Ethical Approval

    Submit the study protocol to the Institutional Review Board (IRB) or Ethics Review Committee before any data collection begins. Prepare informed consent procedures, data storage protocols, and participant confidentiality protections in accordance with institutional and national guidelines.

  8. Administer the Survey

    Deploy the instrument to the sample using the chosen administration mode. Implement a contact strategy (initial contact, reminders, and follow-ups) to maximize response rates without coercion. The Tailored Design Method (Dillman et al., 2014) provides an evidence-based framework for optimizing response rates across administration modes.

  9. Process, Clean, and Analyze Data

    Code, clean, and screen data systematically before analysis. Assess patterns of missing data and apply an appropriate handling strategy (e.g., listwise deletion or multiple imputation). Conduct analyses aligned with the research questions, reporting effect sizes and confidence intervals alongside significance tests.

  10. Interpret, Report, and Disseminate Findings

    Interpret results within the conceptual framework and in light of the study's limitations. Prepare reports and manuscripts that transparently describe sampling procedures, response rates, instrument properties, and analytical decisions. Disseminate findings in appropriate academic and practitioner venues.

Section 05

Sampling Methods

The credibility of survey research rests substantially on the adequacy of the sampling strategy. Sampling theory provides the mathematical justification for generalizing from a subset of a population to the whole; without a defensible sampling design, even the most carefully worded questionnaire produces findings of doubtful generalizability.

Sampling methods divide into two broad families: probability sampling, in which every member of the population has a known, nonzero probability of being selected, and nonprobability sampling, in which selection probabilities are unknown and generalizability depends on argument rather than mathematical proof.

Simple Random Sampling

Probability

Every member of the sampling frame has an equal probability of selection. Requires a complete and accurate sampling frame. Produces unbiased estimates with known precision. Computationally straightforward but logistically demanding for large, dispersed populations.

Systematic Sampling

Probability

Every kth element is selected from an ordered list after a random start. The sampling interval is k = N/n, where N is the population size and n is the desired sample size. Approximately equivalent to simple random sampling unless the list has a periodic structure that coincides with the sampling interval.
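
The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the function name `systematic_sample` and the frame of 1,000 teacher IDs are hypothetical, and the interval is rounded down to an integer, which means that when N is not a multiple of n the last few frame elements can never be selected.

```python
import random

def systematic_sample(frame, n, rng=None):
    """Select every k-th element from an ordered frame after a random start."""
    rng = rng or random.Random()
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    k = N // n                  # sampling interval k = N/n, rounded down (assumption)
    start = rng.randrange(k)    # random start position in [0, k)
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame: teacher IDs 1..1000, desired sample of 50 (k = 20).
frame = list(range(1, 1001))
sample = systematic_sample(frame, 50, random.Random(42))
print(sample[:5])
```

Passing a seeded `random.Random` makes the draw reproducible, which is useful when the sampling procedure has to be documented in a methodology chapter.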

Stratified Sampling

Probability

The population is divided into homogeneous subgroups (strata) and simple random samples are drawn from each stratum independently. Guarantees adequate representation of key subgroups and typically yields more precise estimates than simple random sampling when strata differ on the outcome variable.
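
A rough sketch of stratified selection follows. It assumes proportional allocation (each stratum's share of the sample matches its share of the population), which is only one of several allocation rules and is not prescribed by the text above; the function name and the urban/rural frame are hypothetical. Because stratum sizes are rounded, the total can differ from n by a unit or two in less tidy cases.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, n, rng=None):
    """Draw a simple random sample within each stratum, proportionally allocated."""
    rng = rng or random.Random()
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    N = len(units)
    sample = []
    for members in strata.values():
        # Proportional allocation (assumption): n_h = n * N_h / N, at least 1 per stratum.
        n_h = min(len(members), max(1, round(n * len(members) / N)))
        sample.extend(rng.sample(members, n_h))  # SRS without replacement in the stratum
    return sample

# Hypothetical frame: 600 urban and 400 rural teachers, total sample of 100.
units = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = stratified_sample(units, lambda u: u[0], 100, random.Random(7))
counts = Counter(label for label, _ in sample)
print(counts)
```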

Cluster Sampling

Probability

The population is divided into clusters (usually naturally occurring groups such as schools, barangays, or hospitals). A random sample of clusters is selected and all members within chosen clusters are surveyed. Reduces data collection costs substantially but typically increases standard errors relative to simple random sampling.

Multistage Sampling

Probability

Combines multiple sampling methods in sequence. A large-scale example: first select provinces at random, then municipalities within selected provinces, then barangays within selected municipalities, then households within selected barangays. Used by virtually all large national surveys.

Convenience Sampling

Nonprobability

Participants are selected on the basis of availability and willingness. Efficient but prone to severe selection bias. Common in student-centered research and online surveys. Limits generalizability; findings should be interpreted cautiously and labeled as preliminary or exploratory.

Purposive Sampling

Nonprobability

Participants are selected based on specific characteristics deemed relevant to the research question. Useful when the researcher requires participants with particular expertise, experience, or attributes. The researcher must explicitly justify selection criteria in the methodology section.

Quota Sampling

Nonprobability

The researcher sets quotas for participant categories (e.g., 50 male teachers, 50 female teachers) and recruits participants until quotas are filled. Superficially resembles stratified sampling but lacks random selection within strata; therefore does not yield probability-based estimates.

Snowball Sampling

Nonprobability

Initial participants are recruited, then asked to refer additional participants from their social networks. Used when the target population is hidden, hard to reach, or stigmatized (e.g., undocumented workers, individuals with rare conditions). Not recommended as the primary strategy in doctoral quantitative research unless the population is genuinely inaccessible by other means.

Computing Sample Size

One of the most consequential—and most frequently mishandled—decisions in survey research is sample size determination. Underpowered studies cannot detect true effects; overpowered studies waste resources. The appropriate approach depends on the analytical technique and the effect size the researcher expects to detect.

Slovin's Formula (Simplified for Descriptive Surveys)

n = N / (1 + Ne²)

Where n = required sample size, N = population size, e = margin of error (typically .05 or .10). Note: This formula implicitly assumes a proportion of 0.50 and a confidence level of roughly 95% (Z ≈ 2), and should be used for descriptive purposes only.

Cochran's Formula (For Unknown or Large Populations)

n₀ = (Z² × p × q) / e²

Where Z = z-value for confidence level (1.96 for 95%), p = estimated proportion of the attribute (0.50 if unknown), q = 1 – p, e = desired margin of error. If the population is finite: n = n₀ / (1 + (n₀ – 1)/N).
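
The two formulas above translate directly into code. The sketch below is illustrative: the function names are hypothetical, results are rounded up to whole respondents, and the final helper, which divides the required n by an expected response rate to decide how many people to invite, is a common rule of thumb added here as an assumption rather than part of either formula.

```python
import math
from statistics import NormalDist

def slovin(N, e):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up."""
    return math.ceil(N / (1 + N * e ** 2))

def cochran(e, p=0.5, confidence=0.95, N=None):
    """Cochran's formula, with the finite population correction if N is given."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95% confidence
    n0 = z ** 2 * p * (1 - p) / e ** 2
    if N is not None:
        n0 = n0 / (1 + (n0 - 1) / N)                    # finite population correction
    return math.ceil(n0)

def invitations_needed(n, expected_response_rate):
    """Rule-of-thumb inflation (assumption): invite n / rate people."""
    return math.ceil(n / expected_response_rate)

# Hypothetical population of 2,000 teachers, 5% margin of error.
print(slovin(2000, 0.05))             # 334
print(cochran(0.05))                  # 385 (large or unknown population)
print(cochran(0.05, N=2000))          # 323 (finite population)
print(invitations_needed(323, 0.60))  # 539
```

The Slovin and corrected Cochran results land close together here because Slovin's formula behaves like Cochran's with p = 0.50 and Z rounded to 2.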

⚠️ Important Caution for Doctoral Students

Slovin's formula, while widely taught and used in Philippine graduate education, has important limitations. It was designed for descriptive surveys estimating proportions. For analytical surveys testing relationships among variables using regression or SEM, power analysis using Cohen's (1988) framework or G*Power software provides more defensible sample size estimates. Your thesis or dissertation committee may rightly question a sample size justified solely by Slovin's formula for a correlational or predictive study.
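
To give a sense of what a power-based calculation looks like, the sketch below estimates the sample size needed to detect a single Pearson correlation, using the Fisher z approximation. This is a swapped-in textbook approximation, not the exact routine G*Power runs, so results may differ from that software by a respondent or two; the function name and default values are assumptions for illustration.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n to detect a population correlation r (two-tailed test).

    Fisher z approximation: n = ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    fisher_z = math.atanh(r)   # 0.5 * ln((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)

# Cohen's (1988) conventional medium correlation, r = .30.
print(n_for_correlation(0.30))   # 85
```

Smaller expected effects raise the requirement steeply, which is why the expected effect size should be justified from prior literature rather than chosen for convenience.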

Section 06

Questionnaire Construction

The questionnaire is the primary instrument through which survey data are gathered. Its quality determines the quality of the data. A poorly designed questionnaire introduces systematic measurement error—bias that cannot be corrected at the analysis stage, no matter how sophisticated the statistical techniques applied. Questionnaire design is therefore a technical and intellectual discipline in its own right, not merely an administrative task.

Question Types

| Question Type | Description | Best Used For | Limitations |
| --- | --- | --- | --- |
| Closed-ended (fixed) | Respondents choose from predetermined response options | Quantifiable data; comparisons across respondents; statistical analysis | May not capture full range of responses; response options may not fit all respondents |
| Open-ended | Respondents provide answers in their own words | Exploratory research; capturing nuance; generating hypotheses | Difficult to quantify; requires content analysis; lower completion rates |
| Dichotomous | Two mutually exclusive options (Yes/No; True/False) | Simple factual questions; demographic screening | Oversimplifies complex attitudes or behaviors |
| Multiple choice | Select one or more options from a list | Categorical variables with known options | Exhaustive and mutually exclusive options required |
| Rating scale | Respond on a numeric or labeled scale | Measuring intensity, frequency, agreement, or satisfaction | Prone to response biases (e.g., acquiescence, central tendency) |
| Ranking | Order options from most to least preferred or important | Establishing relative preferences | Cognitively demanding; difficult to analyze statistically |
| Matrix | Multiple items rated on the same scale in a grid format | Efficient collection of scale items | Prone to straight-lining (respondents check same column without reading) |

Principles of Effective Question Wording

The phrasing of individual questions is where much measurement error originates. Krosnick and Presser (2010) summarize decades of experimental evidence on question wording effects that all survey researchers should internalize.

Every word in a survey question must be comprehensible to all members of the target population. Avoid technical jargon unless it is established vocabulary within the population. When in doubt, choose the simpler word: "start" instead of "commence," "use" instead of "utilize." Reading-level analyses (e.g., Flesch-Kincaid) can assist but do not replace cognitive interviewing with actual target participants.

A double-barreled question asks about two distinct things simultaneously, making it impossible to know which aspect the respondent is addressing. Example of a flawed item: "The school administration is responsive and supportive of teachers' needs." A respondent might find administration responsive but not supportive, or vice versa. Each construct must occupy its own item.

Leading questions embed an implied correct answer or socially desirable response. Example: "Don't you agree that teachers deserve higher salaries?" Loaded questions use emotionally charged language. Both systematically distort responses. Questions should be phrased in balanced, neutral language that does not presuppose any answer.

For closed-ended questions, response options must cover all plausible answers (exhaustive) and must not overlap (mutually exclusive). Including "Other (please specify)" addresses exhaustiveness when not all options can be anticipated. For age categories, ranges must be clearly defined: "25–34" and "35–44" rather than "25–35" and "35–45," which overlap.

Questions that require respondents to recall events from distant memory introduce substantial recall error. "How many times have you visited a health center in the past 12 months?" is more reliable than "How many times have you visited a health center in the past five years?" Bounded recall periods, which specify a concrete, recent time window, substantially reduce telescoping and forward-memory errors (Tourangeau et al., 2000).

Question order can systematically influence responses through context effects. General questions should precede specific ones (the funnel approach) to avoid artificially inflating the salience of specific topics. Sensitive or personally invasive questions should appear toward the end of the instrument, after rapport has been established, and should be preceded by less threatening questions.
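
The mutual-exclusivity rule for numeric response categories, illustrated above with the age ranges, is easy to check mechanically before fielding an instrument. The helper below is a small sketch under the assumption that categories are inclusive integer ranges; the function name is hypothetical.

```python
def check_ranges(ranges):
    """Return (overlaps, gaps) for inclusive integer (low, high) categories."""
    ordered = sorted(ranges)
    overlaps, gaps = [], []
    for (lo1, hi1), (lo2, hi2) in zip(ordered, ordered[1:]):
        if lo2 <= hi1:
            # The next category starts before the previous one ends.
            overlaps.append(((lo1, hi1), (lo2, hi2)))
        elif lo2 > hi1 + 1:
            # Some values between the two categories have no option.
            gaps.append((hi1 + 1, lo2 - 1))
    return overlaps, gaps

print(check_ranges([(25, 34), (35, 44)]))  # ([], [])
print(check_ranges([(25, 35), (35, 45)]))  # ([((25, 35), (35, 45))], [])
```

A gap signals a failure of exhaustiveness within the covered span; whether the lowest and highest categories reach far enough still has to be judged against the target population.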

Section 07

Measurement Scales in Survey Research

The choice of measurement scale determines the statistical analyses that are permissible, the amount of information captured per item, and the cognitive demands placed on respondents. Survey researchers must understand both the psychometric properties and the practical limitations of each scale type.

The Likert Scale

The Likert scale, developed by Rensis Likert in his 1932 doctoral dissertation, remains the most widely used scale in survey research. A Likert item presents a statement and asks respondents to indicate their degree of agreement using a symmetric agree–disagree response format. A full Likert scale consists of multiple such items that are summed or averaged to produce a composite score representing an underlying latent construct.

📘 Likert Item vs. Likert Scale

A single Likert-type item is not a Likert scale. A true Likert scale is a composite of multiple items measuring the same latent variable. Treating a single item as if it were a scale is a common and consequential methodological error in graduate theses and dissertations. The scale must also demonstrate internal consistency (typically assessed via Cronbach's alpha) before scores can be meaningfully interpreted.

Example: 5-Point Likert Item

Statement: "My school's administration provides adequate support for professional development activities."

1 = Strongly Disagree
2 = Disagree
3 = Neither Agree nor Disagree
4 = Agree
5 = Strongly Agree
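
Composite scoring of several such items can be sketched as follows. The example assumes a hypothetical four-item scale scored 1 to 5, with every item worded in the same direction and no missing answers; the function name is illustrative, and averaging (rather than summing) is chosen so the composite stays on the original 1 to 5 metric.

```python
from statistics import mean

def composite_scores(responses, low=1, high=5):
    """Average each respondent's item scores into one composite score."""
    scores = []
    for items in responses:
        if any(not (low <= x <= high) for x in items):
            raise ValueError(f"item score outside {low}-{high}: {items}")
        scores.append(mean(items))
    return scores

# Hypothetical answers from three respondents to a four-item support scale.
responses = [
    [4, 5, 4, 4],
    [2, 1, 2, 3],
    [3, 3, 3, 3],
]
print(composite_scores(responses))
```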

Common Scale Types Compared

| Scale Type | Inventor/Source | Structure | Key Feature | Typical Use |
| --- | --- | --- | --- | --- |
| Likert Scale | Rensis Likert (1932) | Agreement statements, 4–7 points | Balanced agree–disagree options; composite scoring | Attitudes, perceptions, beliefs |
| Semantic Differential | Osgood et al. (1957) | Bipolar adjective pairs, 7-point | Measures connotative meaning along bipolar dimensions | Brand attitudes, person perception |
| Thurstone Scale | Thurstone (1928) | Judge-rated statements | Items assigned scale values through expert judgment | Historical; rarely used in contemporary practice |
| Guttman Scale | Guttman (1944) | Hierarchically ordered items | Cumulative: endorsing a harder item implies endorsing all easier ones | Measuring ordered abilities or behaviors |
| Visual Analog Scale | Clinical research | Continuous line with endpoints | Captures fine-grained gradation; requires special scoring | Pain intensity, emotion intensity |
| Net Promoter Score | Reichheld (2003) | 0–10 single-item | Classifies promoters, passives, detractors | Customer satisfaction; organizational loyalty |

Levels of Measurement and Statistical Implications

Stevens's (1946) taxonomy of measurement levels—nominal, ordinal, interval, and ratio—remains foundational to understanding the statistical analyses that are appropriate for a given scale. Whether Likert scale data should be treated as ordinal or interval-level has generated substantial methodological debate.

| Level | Properties | Example in Surveys | Appropriate Statistics |
| --- | --- | --- | --- |
| Nominal | Categories only; no rank or distance | Sex, region, school type | Mode, chi-square, frequency counts |
| Ordinal | Ranked categories; distances not equal | Likert items (technically); education level | Median, mode, Spearman's rho, Mann-Whitney U |
| Interval | Equal distances; no true zero | Temperature in Celsius; IQ scores | Mean, standard deviation, Pearson's r, t-test, ANOVA |
| Ratio | Equal distances; true zero | Income, age, years of experience | All parametric statistics; geometric mean |

Section 08

Data Collection Methods

The mode of survey administration—how the questionnaire is delivered to respondents—has significant implications for response rates, data quality, cost, coverage, and the nature of social desirability effects. No single mode is optimal for all research contexts; the choice requires careful analysis of the target population, budget, timeline, and the sensitivity of the topic.

| Mode | Description | Advantages | Disadvantages | Response Rate (Typical) |
| --- | --- | --- | --- | --- |
| Face-to-Face Interview | Trained interviewer administers questions in person | Highest data quality; complex questions possible; can probe; low item non-response | Expensive; interviewer effects; geographic constraints | 60–80% |
| Telephone Interview (CATI) | Interviews conducted by phone using computerized scripts | Moderate cost; centralized supervision; wide geographic reach | Cell-phone-only households; declining response rates; no visual aids | 25–45% |
| Mail (Postal) Survey | Paper questionnaire mailed to sampled addresses | Low cost per completed survey; no interviewer effect; respondent controls pace | Slow; low response rates; literacy required; limited length | 20–40% |
| Web Survey (CAWI) | Online questionnaire accessed via link (e.g., Google Forms, Qualtrics) | Low cost; fast; multimedia possible; automatic data entry; wide reach | Coverage bias (digital divide); low response rates; data quality concerns; no verifiable sampling frame | 10–30% |
| Group-Administered | Questionnaire distributed to a captive group (e.g., students in a classroom) | Very high response rate; efficient; researcher can answer questions | Limited to accessible groups; social desirability in shared settings | 90–100% |
| Mixed-Mode | Two or more modes used within the same study | Higher coverage; higher response rates; cost-efficient | Mode effects can introduce measurement error across groups | Varies by combination |

📋 The Problem of Non-Response

Non-response is not merely an inconvenience; it is a serious threat to the validity of survey findings whenever respondents and non-respondents differ systematically on the variables being measured. This condition, called non-response bias, can occur even when overall response rates appear adequate. Researchers should always report response rates and, where possible, compare early and late respondents or use available administrative data to assess the likely direction and magnitude of non-response bias (Groves & Peytcheva, 2008).

Section 09

Validity and Reliability of Survey Instruments

Validity and reliability are the twin pillars of measurement quality in survey research. An instrument that measures nothing real is invalid; an instrument that measures something real but inconsistently is unreliable. Ideally, an instrument is both valid and reliable—though reliability is a necessary but not sufficient condition for validity.

Validity

| Type of Validity | Definition | How to Establish |
| --- | --- | --- |
| Content Validity | Instrument items adequately represent the full domain of the construct | Expert panel review; content validity ratio (Lawshe, 1975); systematic item mapping to construct dimensions |
| Face Validity | Instrument appears, on its surface, to measure what it claims to measure | Review by subject matter experts and members of the target population; not a substitute for empirical validation |
| Construct Validity | Instrument measures the theoretical construct it purports to measure | Confirmatory factor analysis (CFA); convergent and discriminant validity; nomological network testing |
| Convergent Validity | Items measuring the same construct correlate with each other and with measures of related constructs | Average Variance Extracted (AVE) ≥ .50; correlations with established measures of the same construct |
| Discriminant Validity | Instrument does not correlate too highly with measures of conceptually distinct constructs | AVE for each construct exceeds the squared correlation between constructs; Heterotrait-Monotrait ratio (HTMT) |
| Criterion-Related Validity | Instrument scores predict or correlate with an external criterion | Concurrent validity (correlation with simultaneous criterion); Predictive validity (correlation with future criterion) |
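
The AVE-based checks in the table can be illustrated with a short calculation. The sketch assumes standardized factor loadings already estimated elsewhere (for example, from a CFA), computes AVE as the mean of the squared loadings, and applies the comparison of AVE against the squared inter-construct correlation. The loadings, the correlation of .62, and the function names are made-up illustrations, not results from any study.

```python
def ave(loadings):
    """Average Variance Extracted: mean of squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def discriminant_ok(ave_a, ave_b, corr_ab):
    """True when both AVEs exceed the squared correlation between the constructs."""
    return min(ave_a, ave_b) > corr_ab ** 2

# Hypothetical standardized loadings for two constructs.
support = [0.78, 0.82, 0.71, 0.75]
satisfaction = [0.69, 0.73, 0.80]

ave_support = ave(support)
ave_satisfaction = ave(satisfaction)
print(round(ave_support, 3), round(ave_satisfaction, 3))
print(ave_support >= 0.50 and ave_satisfaction >= 0.50)      # convergent check
print(discriminant_ok(ave_support, ave_satisfaction, 0.62))  # discriminant check
```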

Reliability

| Reliability Type | Definition | Assessment Method | Acceptable Threshold |
| --- | --- | --- | --- |
| Internal Consistency | Items within a scale produce consistent responses | Cronbach's alpha; McDonald's omega | α ≥ .70 (Nunnally & Bernstein, 1994); α ≥ .80 preferred for clinical or high-stakes decisions |
| Test-Retest Reliability | Scores are stable over time when the construct itself has not changed | Pearson's r or ICC between time points (2–4 weeks apart is typical) | r or ICC ≥ .70; ≥ .80 preferred |
| Inter-Rater Reliability | Different raters produce consistent scores when rating the same responses | Cohen's kappa (κ); ICC; percent agreement | κ ≥ .70; ≥ .80 preferred |
| Parallel Forms Reliability | Two equivalent forms of the instrument produce consistent scores | Correlation between forms administered to the same group | r ≥ .80 typically expected |
| Split-Half Reliability | Two halves of the instrument correlate with each other | Spearman-Brown prophecy formula applied to correlation of halves | Corrected r ≥ .70 |

🔬 Beyond Cronbach's Alpha

Cronbach's alpha has been the default reliability statistic in social science research for decades, but it has well-documented limitations: it assumes tau-equivalence (equal factor loadings) across items, is sensitive to the number of items, and can be inflated by item redundancy. Contemporary measurement scholars recommend McDonald's omega (ω) as a superior reliability estimate for multidimensional or non-tau-equivalent scales (McNeish, 2018). Doctoral students using structural equation modeling should report both alpha and omega.
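
For readers who want to see what alpha actually computes, here is a minimal sketch using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The five-respondent, three-item dataset is invented and far too small for real use, and the function name is hypothetical. Omega is not computed here because it requires factor loadings from a fitted measurement model.

```python
from statistics import variance

def cronbach_alpha(data):
    """Cronbach's alpha for complete data: rows are respondents, columns are items."""
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in data]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical pilot data: 5 respondents x 3 items on a 1-5 scale.
pilot = [
    [4, 5, 4],
    [2, 1, 2],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(pilot), 3))  # 0.951
```

A value this high on only three items would itself deserve scrutiny, since near-duplicate items inflate alpha, which is one of the limitations noted above.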

Section 10

Data Analysis in Survey Research

The analytical stage transforms raw survey data into interpretable findings. The specific techniques employed depend on the research questions, the measurement levels of the variables, the sampling design, and the theoretical framework. A critical error made by many novice researchers is selecting statistical tests without consideration of the underlying assumptions or the study's inferential goals.

Preliminary Analysis Steps

  1. Data Screening and Cleaning

    Inspect the dataset for data entry errors, out-of-range values, and implausible response patterns (e.g., all items rated identically, also called straight-lining). Document all cleaning decisions in a data management log that can be reviewed by collaborators or external auditors.

  2. Missing Data Analysis

    Determine the extent and pattern of missing data. Data missing completely at random (MCAR) can be handled by listwise deletion with minimal bias. Data missing at random (MAR) or missing not at random (MNAR) require more sophisticated strategies: multiple imputation is the recommended approach under MAR. Report the percentage of missing data per variable and the imputation strategy employed.

  3. Assumption Testing

    Test the assumptions of the planned statistical procedures before applying them. For parametric tests, assess normality (Shapiro-Wilk test; histograms with normal curves), homogeneity of variance (Levene's test), and linearity. For regression, additionally assess independence of residuals (Durbin-Watson), absence of multicollinearity (VIF < 10; tolerance > .10), and homoscedasticity.

  4. Descriptive Statistics

    Report frequency distributions, measures of central tendency (mean, median, mode), and measures of dispersion (standard deviation, range, interquartile range) for all key variables. Descriptive statistics provide essential context for interpreting inferential results and should always be reported regardless of the study's primary analytical focus.
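The screening, missing-data, and descriptive steps above can be sketched in a few lines of Python. The records, the item names (`q1` to `q4`), and the valid range of 1 to 5 are hypothetical; a real project would read the data from file and record each flagged case in the data management log.

```python
from statistics import mean, median, stdev

# Hypothetical responses: one dict per respondent; None marks a missing answer.
responses = [
    {"id": 1, "q1": 4, "q2": 5, "q3": 3, "q4": 4},
    {"id": 2, "q1": 3, "q2": 3, "q3": 3, "q4": 3},      # straight-liner
    {"id": 3, "q1": 2, "q2": None, "q3": 1, "q4": 2},
    {"id": 4, "q1": 5, "q2": 4, "q3": 7, "q4": 4},      # 7 is out of range
    {"id": 5, "q1": 1, "q2": 2, "q3": None, "q4": 2},
]
ITEMS = ["q1", "q2", "q3", "q4"]
VALID = range(1, 6)  # assumed 5-point response scale


def out_of_range(rows):
    """Return (id, item, value) for every non-missing value outside the scale."""
    return [(r["id"], q, r[q]) for r in rows for q in ITEMS
            if r[q] is not None and r[q] not in VALID]


def straight_liners(rows):
    """Return ids of respondents who gave the same answer to every item."""
    flagged = []
    for r in rows:
        answers = [r[q] for q in ITEMS]
        if None not in answers and len(set(answers)) == 1:
            flagged.append(r["id"])
    return flagged


def percent_missing(rows):
    """Percentage of missing answers per item."""
    return {q: 100 * sum(r[q] is None for r in rows) / len(rows) for q in ITEMS}


def describe(rows, item):
    """Descriptive statistics for one item, using valid values only."""
    values = [r[item] for r in rows if r[item] in VALID]
    return {"n": len(values), "mean": mean(values),
            "median": median(values), "sd": stdev(values)}
```

Flagging a case is not the same as deleting it: a straight-lined response can be genuine, so the decision to exclude should be made and documented separately.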

Inferential Statistical Techniques

Research Question Type | Technique | Key Assumptions
Difference between two groups on a continuous variable | Independent samples t-test | Normality; homogeneity of variance; independent observations
Difference among three or more groups | One-way ANOVA; post-hoc tests | Normality; homogeneity of variance; independent groups
Association between two categorical variables | Chi-square test of independence | Expected cell frequency ≥ 5; independence of observations
Correlation between two continuous variables | Pearson's r (interval); Spearman's ρ (ordinal) | Linearity; bivariate normality (Pearson); no extreme outliers
Predicting one outcome from multiple predictors | Multiple linear regression | Linearity; normality of residuals; homoscedasticity; no multicollinearity
Predicting a binary outcome | Binary logistic regression | No multicollinearity; adequate cell sizes; linearity of log-odds
Testing a theoretical measurement model | Confirmatory Factor Analysis (CFA) | Multivariate normality; adequate sample size; identified model
Testing a theoretical structural model with latent variables | Structural Equation Modeling (SEM) | Large samples (n ≥ 200 typically); multivariate normality; model identification
Nested data (students within schools) | Multilevel Modeling (HLM) | Sufficient cluster sizes and number of clusters; random effects specification
📌
Statistical Significance vs. Practical Significance Statistical significance (p < .05) indicates only that a result at least as extreme as the one observed would be unlikely if the null hypothesis were true. It says nothing about the magnitude or practical importance of an effect. Researchers must report effect sizes alongside p-values: Cohen's d for mean differences, η² or ω² for ANOVA, r² or R² for regression. The APA Publication Manual (7th ed., 2020) and most reputable journals now require effect size reporting. A statistically significant result with a negligible effect size (e.g., d = 0.05) may have no practical implications whatsoever.
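To make the effect sizes named above concrete, here is a minimal sketch of Cohen's d (using the pooled standard deviation) and of the proportion of variance explained by a correlation. The function names and the two small score lists are hypothetical and serve only as illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def variance_explained(r):
    """Proportion of variance shared by two variables correlated at r."""
    return r ** 2


# Hypothetical scores for two small groups.
d = cohens_d([2, 4, 6], [1, 3, 5])   # means 4 and 3, pooled SD 2
shared = variance_explained(0.10)    # a correlation of .10 shares 1% of variance
```

A correlation of .10 can reach p < .05 in a large sample while accounting for only about 1% of the variance, which is why the p-value and the effect size have to be read together.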

Section 11

Ethical Considerations in Survey Research

The ethical conduct of survey research is not a bureaucratic formality. It reflects the researcher's fundamental respect for the autonomy, dignity, and wellbeing of the people who provide the data on which all findings rest. Without ethical integrity, scientific credibility is impossible to sustain.

Core Principle

Informed Consent

Participants must receive clear, comprehensible information about the study's purpose, procedures, risks, benefits, and their right to withdraw without penalty before agreeing to participate. Consent must be voluntary and free from coercion. In the Philippines, this requirement is governed by the National Ethical Guidelines for Health and Health-Related Research (PHREB, 2017).

Core Principle

Confidentiality and Anonymity

Confidentiality means the researcher knows participants' identities but will not disclose them. Anonymity means the researcher does not know who provided which responses. Most surveys guarantee confidentiality; anonymity requires that no identifying information be collected. Both must be actively protected through secure data storage, access controls, and data de-identification procedures.

Core Principle

Minimizing Harm

Researchers must anticipate and mitigate potential harms from participation, including psychological distress, breach of privacy, or social stigma. Questions about sensitive topics (e.g., mental health, domestic violence, substance use) require particular care. Referral resources should be provided when questions touch on distressing experiences.

Core Principle

Data Integrity

Survey data must be collected, processed, and reported honestly and accurately. Fabrication of data (creating fictional responses), falsification (manipulating genuine responses), and selective reporting of results are serious breaches of research integrity with severe professional and legal consequences. Raw data should be preserved and available for audit.

Core Principle

Justice and Equity in Sampling

Sampling decisions should not systematically exclude groups in ways that deny them the potential benefits of research participation or that concentrate research burdens unfairly. Historical exclusions of women, minorities, and marginalized communities from research samples have produced a body of knowledge with significant blind spots that we are still working to correct.

Core Principle

IRB/Ethics Review

All survey research involving human participants must be reviewed and approved by an Institutional Review Board (IRB) or equivalent ethics committee before data collection begins. No exceptions apply on the grounds that the survey is "merely" descriptive or that the questions seem harmless. Ethics review protects both participants and researchers.

Section 12

Common Limitations and How to Address Them

No research method is without limitations, and survey research has several that researchers must acknowledge explicitly in their reports and dissertations. Identifying limitations is not an admission of failure—it is evidence of methodological maturity and scholarly honesty. Crucially, the researcher should not merely identify limitations but explain why the study's conclusions remain defensible despite them.

Social Desirability Bias: Respondents may answer in ways they believe are socially acceptable rather than truthfully. This is especially problematic for sensitive topics (e.g., discriminatory attitudes, risky behaviors, income). Mitigation strategies include: guaranteeing anonymity; using indirect question formats; employing validated social desirability scales (e.g., Marlowe-Crowne) and controlling for them statistically; using list experiments or randomized response techniques for particularly sensitive items.
Common Method Bias: When both predictor and outcome variables are measured using the same self-report instrument administered at the same time, correlations among them may be inflated by shared method variance rather than reflecting true construct relationships. Procedural remedies include: temporal separation of predictor and outcome measurement; using different response formats for different constructs; obtaining data from multiple sources. Statistical remedies include Harman's single-factor test (limited utility) and the common latent factor technique within SEM (Podsakoff et al., 2012).
Acquiescence Bias: Some respondents systematically agree with items regardless of their content, a tendency known as acquiescence or "yea-saying." Including both positively and negatively worded items (reverse-keyed items) within a scale is a common strategy to detect and partially control for this bias. However, researchers should note that reverse-keyed items introduce their own psychometric complications, and the evidence that they effectively reduce acquiescence is mixed (Weijters et al., 2013).
Non-Response Bias: Individuals who choose to respond may differ systematically from those who do not, biasing results. Response rates have declined dramatically across all modes over the past two decades (Brick & Williams, 2013). Mitigation includes: using multiple contact attempts; offering incentives; comparing respondent characteristics to population parameters; conducting wave analysis comparing early and late respondents as a proxy for non-respondents.
Limited Causal Inference: Cross-sectional survey research cannot establish causal relationships, only associations. Even in longitudinal designs, establishing causation requires ruling out alternative explanations, which is a demanding analytical and design challenge. Researchers must be cautious in the language used to describe findings: "X was associated with Y" rather than "X caused Y" unless a truly experimental or quasi-experimental design was employed.
Satisficing: Rather than engaging carefully with each question (optimizing), some respondents exert minimal cognitive effort, selecting the first plausible response option, choosing the midpoint consistently, or not reading questions fully (Krosnick, 1991). Satisficing is more common in long surveys, among less motivated participants, and in self-administered formats where no interviewer is present to encourage engagement. Design remedies: shorter surveys; engaging introductions; attention check items; avoiding response formats that facilitate satisficing (e.g., scales with many points).
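Of the remedies listed above for social desirability bias, the randomized response technique is the least intuitive, so a small sketch may help. It assumes a forced-response design, which is one of several variants: a private randomizing device tells each respondent to answer truthfully with probability `p_truth`, or to say "yes" regardless of the truth with probability `p_forced_yes` (and "no" otherwise). The function name and the probabilities in the example are illustrative assumptions.

```python
def forced_response_estimate(prop_yes_observed, p_truth, p_forced_yes):
    """Estimate the prevalence of a sensitive trait under a forced-response design.

    Observed P(yes) = p_truth * prevalence + p_forced_yes,
    so prevalence = (observed - p_forced_yes) / p_truth.
    """
    if not 0 < p_truth <= 1:
        raise ValueError("p_truth must be in (0, 1]")
    estimate = (prop_yes_observed - p_forced_yes) / p_truth
    return min(1.0, max(0.0, estimate))   # keep the estimate inside [0, 1]


# Hypothetical die-roll design: 1 -> forced "yes", 6 -> forced "no", 2-5 -> truthful.
prevalence = forced_response_estimate(0.30, p_truth=4 / 6, p_forced_yes=1 / 6)
```

Because no individual "yes" can be traced to a truthful answer, respondents have less reason to misreport; the price is a larger sampling variance than a direct question would have at the same sample size.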

Section 13

Practical Examples of Survey Research

Abstract methodological principles become fully comprehensible only when grounded in concrete research scenarios. The following examples illustrate how survey research is designed and executed across different disciplines and contexts—including settings familiar to Filipino researchers and educators.

Example 01 — Educational Research

Teacher Burnout and Organizational Support in Philippine Public Elementary Schools

Research Question: To what extent does perceived organizational support predict emotional exhaustion among public elementary school teachers in Region VII?

Design: Descriptive-correlational survey; cross-sectional design.

Population & Sampling: All 4,820 public elementary school teachers in the Cebu City Division. Stratified random sampling by school cluster; n = 370 (Cochran's formula, p = .50, e = .05, 95% confidence level).

Instruments: (1) Survey of Perceived Organizational Support (SPOS; Eisenberger et al., 1986; 8-item version), Cronbach's α = .92 in pilot study; (2) Maslach Burnout Inventory – Educators Survey (MBI-ES; Maslach et al., 2017), emotional exhaustion subscale (9 items), α = .91. Both administered as paper questionnaires distributed through school coordinators.

Analysis: Descriptive statistics (means, SDs, frequency distributions); Pearson's r to test the bivariate relationship; simple linear regression to estimate the predictive relationship. Demographic covariates, if added, would require a multiple regression model.

Key Finding (Illustrative): Perceived organizational support explained 28% of the variance in emotional exhaustion scores (R² = .28, F(1, 368) = 143.7, p < .001), with higher perceived support associated with lower emotional exhaustion (β = −.53, p < .001).
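The sample size reasoning in Example 01 can be checked with a short calculation. This is a sketch of Cochran's formula followed by the finite population correction, using the example's inputs (p = .50, e = .05, N = 4,820) and z = 1.96 for the 95% confidence level; the function names are illustrative.

```python
from math import ceil


def cochran_n0(z=1.96, p=0.5, e=0.05):
    """Cochran's sample size for a proportion, assuming a very large population."""
    return (z ** 2) * p * (1 - p) / e ** 2


def finite_population_correction(n0, population):
    """Adjust n0 downward when the population is small relative to n0."""
    return n0 / (1 + (n0 - 1) / population)


n0 = cochran_n0()                                         # about 384.16
n_min = ceil(finite_population_correction(n0, 4820))      # minimum after correction
```

Under these inputs the corrected minimum is 356, so the example's n = 370 sits above the minimum the formula requires. Any allowance for anticipated non-response would be added on top of this figure.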

Example 02 — Public Health Research

COVID-19 Vaccine Hesitancy and Health Information Sources Among Adults in a Philippine Urban City

Research Question: What are the predictors of COVID-19 vaccine hesitancy among adults in Cebu City, and which health information sources are associated with lower hesitancy?

Design: Analytical cross-sectional survey.

Population & Sampling: Adults aged 18–65 residing in Cebu City. Multistage cluster sampling: first randomly select barangays (n = 30 of 80), then randomly select households, then randomly select one eligible adult per household; target n = 600.

Instrument: Adapted WHO SAGE Working Group Vaccine Hesitancy Survey questionnaire; additional items on media use and health information seeking behavior. Administered face-to-face by trained interviewers.

Analysis: Binary logistic regression with vaccine hesitancy (hesitant/not hesitant) as the dependent variable; predictors include age, sex, education, income, trust in health authorities, and primary health information source.
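The three-stage selection described in Example 02 can be sketched as follows. The sampling frame here is entirely hypothetical (80 barangays with 50 households each and one to four eligible adults per household), as is the choice of 20 households per barangay, which is simply what 30 barangays need to reach the target of 600.

```python
import random


def multistage_sample(frame, n_clusters, households_per_cluster, seed=None):
    """Select clusters, then households, then one eligible adult per household.

    `frame` maps barangay -> {household id -> list of eligible adult ids}.
    """
    rng = random.Random(seed)
    selected = []
    for barangay in rng.sample(sorted(frame), n_clusters):            # stage 1
        households = frame[barangay]
        k = min(households_per_cluster, len(households))
        for household in rng.sample(sorted(households), k):           # stage 2
            adult = rng.choice(households[household])                 # stage 3
            selected.append((barangay, household, adult))
    return selected


# Hypothetical frame for illustration only.
frame_rng = random.Random(0)
frame = {
    f"brgy_{b:02d}": {
        f"hh_{h:03d}": [f"adult_{i}" for i in range(frame_rng.randint(1, 4))]
        for h in range(50)
    }
    for b in range(80)
}

sample = multistage_sample(frame, n_clusters=30, households_per_cluster=20, seed=42)
```

Selecting one adult per household gives adults in larger households a lower chance of selection, so analyses of such a sample normally apply weights and account for clustering when computing standard errors.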

Example 03 — Organizational Research

Remote Work Flexibility and Job Satisfaction Among Knowledge Workers in the BPO Sector

Research Question: Does work schedule flexibility mediate the relationship between remote work arrangements and job satisfaction among BPO employees in the Philippines?

Design: Analytical cross-sectional survey; mediation analysis framework.

Instrument: Online survey administered via Microsoft Forms: (1) Telework and Flexibility Scale (adapted from Golden & Veiga, 2005); (2) Job Satisfaction Survey (Spector, 1985; 9-dimension scale); (3) Schedule Flexibility Scale (adapted from Baltes et al., 1999). All scales validated in pilot with 50 BPO employees; Cronbach's alphas ranged from .79 to .91.

Analysis: PROCESS macro (Hayes, 2022) for mediation analysis; bootstrapped confidence intervals (5,000 iterations) to test the indirect effect of remote work on job satisfaction through schedule flexibility.
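The logic behind a bootstrapped indirect effect can be illustrated without the PROCESS macro. The sketch below is not PROCESS and does not reproduce its output format; it is a plain percentile bootstrap of the product a × b, where a is the slope of the mediator on the predictor and b is the mediator's coefficient in a regression of the outcome on both. The simulated data, variable names, and true coefficients are assumptions made purely for illustration.

```python
import random


def _cov(u, v):
    """Sample covariance (n - 1 denominator); _cov(u, u) is the variance."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """a * b, with a from M ~ X and b from Y ~ X + M (ordinary least squares)."""
    var_x, var_m = _cov(x, x), _cov(m, m)
    cov_xm, cov_xy, cov_my = _cov(x, m), _cov(x, y), _cov(m, y)
    a = cov_xm / var_x
    b = (cov_my * var_x - cov_xm * cov_xy) / (var_x * var_m - cov_xm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=5000, alpha=0.05, seed=None):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]     # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated data (hypothetical): x = remote days per week, m = schedule
# flexibility, y = job satisfaction; the true indirect effect is 0.5 * 0.6 = 0.3.
rng = random.Random(7)
x = [rng.randint(0, 5) for _ in range(200)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.6 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
lower, upper = bootstrap_ci(x, m, y, n_boot=1000, seed=7)   # 5,000 in a real analysis
```

The indirect effect is treated as supported when the interval excludes zero. The bootstrap is used because the sampling distribution of a product of two coefficients is generally not normal.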

Example 04 — Longitudinal Survey Design

Academic Self-Efficacy Trajectories in Senior High School Students

Research Question: How does academic self-efficacy change from Grade 11 to Grade 12, and what school-level and student-level factors predict trajectories of change?

Design: Two-wave panel study; data collected at Grade 11 (Time 1) and Grade 12 (Time 2, 12 months later). Multilevel growth modeling to account for nesting of students within schools.

Instrument: Academic Self-Efficacy Scale (Bandura, 2006); 7-point response scale (1 = "cannot do at all" to 7 = "highly certain can do").

Note on Attrition: Of 850 students surveyed at Time 1, 763 (89.8%) provided complete data at Time 2. Attrition analysis showed no significant difference between retained and lost participants on Time 1 self-efficacy scores, sex, or school type, suggesting attrition was approximately random and unlikely to bias findings substantially.

Test Your Understanding

15 questions covering all major topics in this module. Record your responses, then check your answers.

Question 1 of 15

Which of the following most accurately describes the fundamental purpose of survey research as a quantitative method?

Question 2 of 15

A researcher surveys Grade 12 students in 2020 and again surveys a different but representative sample of Grade 12 students from the same school in 2025 to track changes in career aspirations. This is an example of which type of survey design?

Question 3 of 15

In a study of 10,000 public school teachers, a researcher lists all teachers alphabetically and selects every 20th teacher starting from a random point between 1 and 20. This sampling technique is called:

Question 4 of 15

Which of the following is an example of a double-barreled questionnaire item?

Question 5 of 15

A researcher uses Cochran's formula and determines a required sample size of n₀ = 384. The actual population size is N = 1,200. What should the researcher do?

Question 6 of 15

A Cronbach's alpha of .64 is obtained from a pilot test of a new 12-item attitude scale. What is the most methodologically appropriate next step?

Question 7 of 15

Which type of validity is assessed by examining whether an instrument's scores correlate with scores on another well-established measure of the same construct administered simultaneously?

Question 8 of 15

Non-response bias is most appropriately defined as:

Question 9 of 15

A dissertation study found a statistically significant positive correlation between teaching experience and student achievement (r = .08, p = .02, n = 620). Which interpretation is most appropriate?

Question 10 of 15

Which sampling technique is most commonly used in large-scale national surveys to balance cost efficiency with representativeness?

Question 11 of 15

According to Stevens's (1946) levels of measurement, which statistical measure is appropriate for nominal-level data?

Question 12 of 15

A researcher conducting a survey on domestic violence attitudes is concerned about social desirability bias. Which technique specifically designed to address this concern for sensitive survey items would be most effective?

Question 13 of 15

Common method bias is most likely to be a serious concern when:

Question 14 of 15

Which analytical technique is most appropriate when a researcher has student-level data (n = 800) nested within school-level data (k = 40 schools) and wants to examine both student-level and school-level predictors of academic performance?

Question 15 of 15

Under the National Ethical Guidelines for Health and Health-Related Research in the Philippines (PHREB, 2017), when must a survey study receive IRB approval?


Section 15

Classroom Activities for Teachers

The following structured activities are designed for instructors teaching research methodology at the graduate or advanced undergraduate level. Each activity engages students as active producers of knowledge rather than passive consumers of information, consistent with constructivist principles of adult learning (Merriam & Bierema, 2014). All activities can be adapted for face-to-face, blended, or fully online modalities.

1
The Survey Autopsy: Diagnosing a Flawed Questionnaire
Individual + Group 60–90 min
Learning Objectives
  • Identify common questionnaire design errors in real instruments
  • Articulate how specific errors threaten measurement validity
  • Revise flawed items to meet professional design standards
Materials
  • A deliberately flawed questionnaire prepared by the instructor (containing double-barreled, leading, ambiguous, and response-mismatched items)
  • Questionnaire Design Checklist (instructor-provided rubric)
Procedure
  • Step 1 (15 min, individual): Each student receives the flawed questionnaire and reviews it independently, flagging problematic items and naming the specific error type.
  • Step 2 (20 min, pairs): Students compare their analyses with a partner, resolve disagreements, and together revise each flawed item.
  • Step 3 (20 min, full class): Pairs share their revisions. The class discusses competing revisions and reaches consensus on best-practice versions of each item.
  • Step 4 (15 min, individual reflection): Each student writes a brief reflection on which error type they found most difficult to detect and why.
Debrief Questions
  • Which questionnaire flaw is most likely to go undetected during instrument development? Why?
  • How does cognitive interviewing address the limitations of expert review alone?
2
Sampling Simulation: Who Gets Selected and Who Does Not?
Group Activity 45–60 min
Learning Objectives
  • Differentiate among probability sampling methods through direct application
  • Compute sampling intervals and select samples from a given frame
  • Evaluate sampling methods in terms of cost, representativeness, and feasibility
Materials
  • A printed list of 100 fictional faculty members with demographic attributes (sex, department, years of service, rank)
  • Random number table or random number generator (phone app)
  • Sampling comparison worksheet
Procedure
  • Groups of 4–5 students each draw a sample of n = 20 using a different assigned method: simple random, systematic, stratified by sex, cluster by department.
  • Each group computes the proportion of women, average years of service, and proportion of full professors in their sample.
  • Groups compare results: which method produced the sample most representative of the full list? Which method would be most efficient if the researcher had limited time?
  • Instructor facilitates a whole-class comparison of how different methods affect sample composition on these known characteristics.
Extension for Advanced Students
  • Calculate the design effect (DEFF) for the cluster sampling results to illustrate the efficiency trade-off.
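For instructors who want to demonstrate the selection procedures on screen, the sketch below draws n = 20 from a frame of 100 by three of the assigned methods and includes Kish's approximation for the design effect, DEFF = 1 + (m − 1)ρ. The frame, its 60/40 sex split, and the intraclass correlation passed to `design_effect` are hypothetical values chosen for illustration.

```python
import random


def simple_random(frame, n, rng):
    """Every subset of size n is equally likely."""
    return rng.sample(frame, n)


def systematic(frame, n, rng):
    """Every k-th unit after a random start within the first interval."""
    k = len(frame) // n                 # sampling interval
    start = rng.randrange(k)            # random start in 0..k-1
    return [frame[start + i * k] for i in range(n)]


def stratified(frame, n, key, rng):
    """Proportionate allocation; rounding may need manual adjustment
    when stratum shares do not divide evenly."""
    strata = {}
    for unit in frame:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, share))
    return sample


def design_effect(avg_cluster_size, icc):
    """Kish's approximation: variance inflation relative to simple random sampling."""
    return 1 + (avg_cluster_size - 1) * icc


# Hypothetical frame: 100 faculty members, 60 women and 40 men.
frame = [{"id": i, "sex": "F" if i < 60 else "M"} for i in range(100)]
rng = random.Random(2024)

srs = simple_random(frame, 20, rng)
sys_sample = systematic(frame, 20, rng)
strat = stratified(frame, 20, key=lambda unit: unit["sex"], rng=rng)
deff = design_effect(avg_cluster_size=20, icc=0.05)
```

With 20 units per cluster and an intraclass correlation of .05, the design effect is 1.95, meaning the cluster sample would need to be nearly twice as large to match the precision of a simple random sample.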
3
Mini Survey Project: From Concept to Data Report
Group Project 2–3 Weeks Capstone
Learning Objectives
  • Experience the complete survey research process from design to reporting
  • Apply instrument design, sampling, and data analysis skills in an integrated manner
  • Communicate findings in an academic report format
Overview

Groups of 3–4 students design and conduct a small-scale survey study within their academic community. The research question must be approved by the instructor. The project unfolds in three phases:

Phase 1 — Design (Week 1)
  • Define a research question and identify the target population within the institution
  • Select a validated instrument or construct a 10–15 item scale; conduct a cognitive interview with 3 classmates not in the group
  • Determine sampling method and compute sample size; prepare IRB protocol (simplified ethical clearance form)
Phase 2 — Data Collection (Week 2)
  • Administer the survey; implement at least one follow-up contact for non-respondents
  • Compute and report the response rate
Phase 3 — Analysis and Report (Week 3)
  • Clean data; compute descriptive statistics and at least one inferential test
  • Submit a 2,000-word research report in APA 7th edition format, including method, results, discussion, limitations, and references
  • Present findings in a 10-minute class presentation
4
The Response Bias Lab: A Live Classroom Experiment
Full Class 30–40 min
Learning Objectives
  • Observe question-wording and context effects on survey responses in real time
  • Understand why experimental evidence, not intuition, drives questionnaire design guidelines
Procedure
  • Split version: Divide the class randomly into two groups. Give Group A a questionnaire asking about "freedom of speech" before a question about a controversial topic. Give Group B the same controversial question, but ask about "restrictions on hate speech" instead of "freedom of speech." Both versions address the same underlying attitude.
  • Collect responses anonymously (e.g., on folded paper slips or by anonymous sticky-note counts).
  • Reveal and compare the response distributions between groups.
  • Debrief: How did framing (question wording) shape responses? What does this imply for instrument design?
Instructor Note

This activity is modeled on the classic question-wording experiments (Schuman & Presser, 1981). Students are often genuinely surprised by the magnitude of the effect, making it a memorable illustration of why careful wording matters.
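When comparing the two groups' response distributions, a chi-square test of independence on the 2 × 2 table gives the class a formal answer to whether the difference exceeds what chance would produce. The sketch below uses no continuity correction, and the counts are hypothetical, chosen only so that every expected cell frequency is at least 5.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for the table [[a, b], [c, d]].

    Rows are the two groups; columns are the two response options.
    Returns the statistic and its p-value (df = 1, no continuity correction).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))      # upper tail of chi-square with df = 1
    return chi2, p_value


# Hypothetical class of 40: Group A has 14 agree and 6 disagree,
# Group B has 7 agree and 13 disagree.
chi2, p_value = chi_square_2x2(14, 6, 7, 13)
```

With classes much smaller than this, expected cell counts fall below 5 and an exact test becomes the safer choice, which is itself a useful discussion point for the debrief.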

5
Ethics Deliberation: The Informed Consent Dilemma
Discussion + Role Play 45 min
Learning Objectives
  • Apply ethical principles (autonomy, beneficence, non-maleficence, justice) to realistic survey research scenarios
  • Articulate the researcher's ethical responsibilities to participants and to the scientific community
Scenario Cards (Distribute one per small group)
  • Scenario A: A researcher studying stigma toward persons with disabilities wants to withhold the study's true purpose in the consent form (deception) to prevent socially desirable responding. How should informed consent be handled?
  • Scenario B: A graduate student conducts an online survey on student mental health and discovers one respondent's answers suggest suicidal ideation. What are the researcher's ethical obligations given that the survey is anonymous?
  • Scenario C: A department chair distributes a survey on employee satisfaction and asks subordinates to complete it during a required meeting. What ethical concerns does this raise, and how should they be addressed?
  • Scenario D: A researcher collects survey data and finds that results do not support the sponsor's expected conclusions. The sponsor requests that these findings be omitted from the report. What should the researcher do?
Procedure
  • Groups deliberate for 15 minutes, applying the Belmont Report principles and PHREB guidelines to their scenario.
  • Each group presents their decision and rationale (5 minutes each).
  • Full-class debrief: Where did groups disagree? What ethical tensions are irresolvable through procedural rules alone?

Section 16

References

All references follow APA 7th Edition format. URLs included where publicly accessible. Citations prioritize publications from 2010–2026 to reflect current methodological practice.

  1. American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). American Psychological Association. https://doi.org/10.1037/0000165-000
  2. Baltes, B. B., Briggs, T. E., Huff, J. W., Wright, J. A., & Neuman, G. A. (1999). Flexible and compressed workweek schedules: A meta-analysis of their effects on work-related criteria. Journal of Applied Psychology, 84(4), 496–513. https://doi.org/10.1037/0021-9010.84.4.496
  3. Bandura, A. (2006). Guide for constructing self-efficacy scales. In F. Pajares & T. Urdan (Eds.), Self-efficacy beliefs of adolescents (Vol. 5, pp. 307–337). Information Age Publishing.
  4. Brick, J. M., & Williams, D. (2013). Explaining rising nonresponse rates in cross-sectional surveys. The ANNALS of the American Academy of Political and Social Science, 645(1), 36–59. https://doi.org/10.1177/0002716212456834
  5. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
  6. Converse, J. M. (1987). Survey research in the United States: Roots and emergence 1890–1960. University of California Press.
  7. Dillman, D. A., Smyth, J. D., & Christian, L. M. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method (4th ed.). Wiley.
  8. Eisenberger, R., Huntington, R., Hutchison, S., & Sowa, D. (1986). Perceived organizational support. Journal of Applied Psychology, 71(3), 500–507. https://doi.org/10.1037/0021-9010.71.3.500
  9. Fowler, F. J. (2014). Survey research methods (5th ed.). SAGE Publications.
  10. Golden, T. D., & Veiga, J. F. (2005). The impact of extent of telecommuting on job satisfaction: Resolving inconsistent findings. Journal of Management, 31(2), 301–318. https://doi.org/10.1177/0149206304271768
  11. Groves, R. M., & Peytcheva, E. (2008). The impact of nonresponse rates on nonresponse bias: A meta-analysis. Public Opinion Quarterly, 72(2), 167–189. https://doi.org/10.1093/poq/nfn011
  12. Groves, R. M., Fowler, F. J., Jr., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2011). Survey methodology (2nd ed.). Wiley.
  13. Hayes, A. F. (2022). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (3rd ed.). Guilford Press.
  14. Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213–236. https://doi.org/10.1002/acp.2350050305
  15. Krosnick, J. A., & Presser, S. (2010). Question and questionnaire design. In P. V. Marsden & J. D. Wright (Eds.), Handbook of survey research (2nd ed., pp. 263–314). Emerald Group Publishing.
  16. Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
  17. Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1–55.
  18. Maslach, C., Leiter, M. P., & Jackson, S. E. (2017). Maslach Burnout Inventory manual (4th ed.). Mind Garden.
  19. McNeish, D. (2018). Thanks coefficient alpha, we'll take it from here. Psychological Methods, 23(3), 412–433. https://doi.org/10.1037/met0000144
  20. Merriam, S. B., & Bierema, L. L. (2014). Adult learning: Linking theory and practice. Jossey-Bass.
  21. Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.
  22. Philippine Health Research Ethics Board (PHREB). (2017). National ethical guidelines for health and health-related research. Department of Health, Republic of the Philippines. https://phreb.doh.gov.ph
  23. Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63, 539–569. https://doi.org/10.1146/annurev-psych-120710-100452
  24. Reichheld, F. F. (2003). The one number you need to grow. Harvard Business Review, 81(12), 46–55.
  25. Schuman, H., & Presser, S. (1981). Questions and answers in attitude surveys: Experiments on question form, wording, and context. Academic Press.
  26. Spector, P. E. (1985). Measurement of human service staff satisfaction: Development of the Job Satisfaction Survey. American Journal of Community Psychology, 13(6), 693–713. https://doi.org/10.1007/BF00929796
  27. Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680. https://doi.org/10.1126/science.103.2684.677
  28. Tourangeau, R., Rips, L. J., & Rasinski, K. (2000). The psychology of survey response. Cambridge University Press.
  29. Tourangeau, R., Conrad, F. G., & Couper, M. P. (2013). The science of web surveys. Oxford University Press.
  30. Weijters, B., Cabooter, E., & Schillewaert, N. (2013). The effect of rating scale format on response styles: The number of response categories and response category labels. International Journal of Research in Marketing, 27(3), 236–247. https://doi.org/10.1016/j.ijresmar.2010.02.007