Survey Research: A Comprehensive Doctoral-Level Guide

Section 01

Definition and Nature of Survey Research

Survey research is among the most widely used methodological approaches in the social, behavioral, and health sciences. At its foundation, it is a systematic method of collecting data from a defined group of people—often called respondents or participants—with the aim of describing, comparing, or explaining the distribution of attributes, attitudes, behaviors, and opinions within a population.

Formal Definition

Survey research is "the collection of information from a sample of individuals through their responses to questions" designed to generalize findings to a broader population of interest (Fowler, 2014, p. 1). It combines the logic of probability sampling with the systematic measurement of variables through standardized instruments.

What distinguishes survey research from casual questioning is its rigorous structure. Questions are prepared in advance, administered consistently across all respondents, and analyzed using statistical procedures that allow the researcher to draw inferences about populations that are far larger than the sample itself. This inferential power, rooted in probability theory and careful sampling design, is what makes survey research a cornerstone of quantitative inquiry.

Survey research is positioned squarely within the positivist tradition, which assumes that social reality can be measured, quantified, and analyzed objectively. However, contemporary researchers recognize that survey data always reflect respondent perceptions, which means that careful attention to wording, context, and measurement error is not optional—it is essential to the integrity of findings.

💡 Key Characteristic: Survey research does not manipulate variables. Unlike experimental designs, it observes and measures variables as they naturally exist within participants' lives. Its findings are therefore descriptive and correlational rather than causal, unless the survey is embedded in a longitudinal or quasi-experimental design that supports stronger inference.

The scope of survey research is remarkably broad. Psychologists use surveys to measure personality traits and mental health status; sociologists deploy them to track social attitudes across decades; educational researchers apply them to evaluate student learning experiences; and public health researchers rely on them to monitor health behaviors at the population level. Their versatility, combined with relatively manageable costs compared to experimental designs, makes surveys a method of enduring relevance.

Section 02

Historical Background

The roots of survey research extend deeper into history than many researchers realize. Charles Booth's monumental social survey of London's poor, conducted between 1886 and 1903, is widely recognized as one of the earliest systematic social surveys. Booth's team gathered household-level data on living conditions, employment, and poverty across the entire city—a methodological ambition that remains impressive to this day.

In the United States, the development of public opinion polling in the 1930s marked a decisive acceleration in survey methodology. George Gallup correctly predicted the result of the 1936 U.S. presidential election using a comparatively small quota sample, whereas the far larger Literary Digest poll, which relied on a self-selected sample drawn largely from telephone directories and automobile registrations, called the race wrongly. The episode demonstrated dramatically that how a sample is selected matters more than how large it is, and it prepared the ground for the later adoption of probability-based sampling as the standard for scientifically credible surveys (Converse, 1987).

The postwar decades saw significant investment in survey methodology through institutions such as the Survey Research Center at the University of Michigan, established in 1946. Scholars like Leslie Kish formalized probability sampling theory; Rensis Likert, Hadley Cantril, and Louis Guttman contributed foundational work on attitude measurement. By the 1960s and 1970s, large-scale longitudinal surveys—such as the Panel Study of Income Dynamics (begun 1968)—demonstrated that survey data could capture social change over time with extraordinary depth.

The late twentieth and early twenty-first centuries brought two fundamental transformations. First, computer-assisted interviewing—initially telephone-based (CATI) and later web-based—dramatically increased the efficiency and reach of data collection. Second, rapidly declining response rates, the proliferation of online survey platforms, and the emergence of nonprobability sampling approaches prompted serious methodological debate about the foundations of inference from survey data (Tourangeau et al., 2013; Groves et al., 2011).

Today, survey research grapples simultaneously with unprecedented access—anyone with a smartphone can reach thousands of potential respondents within hours—and persistent threats to data quality including self-selection bias, satisficing behavior, and algorithmic filtering that may skew sample composition in ways researchers cannot easily detect or correct.

Section 03

Types of Survey Research

Survey research is not a monolithic method. Several distinct types exist, each suited to different research objectives, time horizons, and epistemological commitments. A thorough researcher selects the type that best matches the research question, not merely the type that is most convenient to administer.

By Time Frame

Cross-Sectional Survey

Data are collected from participants at a single point in time. This is the most commonly used type in social research. It provides a snapshot of the population's characteristics, attitudes, or behaviors as they exist at the moment of data collection. Suitable for descriptive and correlational research questions but cannot establish temporal precedence.

By Time Frame

Longitudinal Survey

Data are collected from the same population across multiple time points. Enables the researcher to track change over time, examine developmental trajectories, and establish temporal ordering of variables—a precondition for causal inference in nonexperimental designs. Subdivided into panel, cohort, and trend studies.

Longitudinal Subtype

Panel Study

The same individuals are surveyed repeatedly over time. Provides the richest data for studying individual change but is vulnerable to panel attrition—the systematic loss of participants that can bias findings if those who drop out differ meaningfully from those who remain. The Panel Study of Income Dynamics is a classic example.

Longitudinal Subtype

Cohort Study

Individuals sharing a defining characteristic (e.g., birth year, graduation year, employment entry) are followed over time. The same individuals need not be surveyed at each wave; a fresh representative sample may be drawn from the cohort each time. Widely used in epidemiology and educational research.

Longitudinal Subtype

Trend Study

Different samples from the same general population are surveyed at different time points. Unlike panel studies, trend studies replace, rather than retain, participants across waves. Suitable for monitoring aggregate change in public opinion, attitudes, or behaviors over extended periods. The Gallup Poll is a classic example.

By Purpose

Descriptive Survey

Aims to describe the characteristics or attributes of a specific population as they exist at a point in time. Does not seek to explain relationships among variables. Census surveys are the paradigmatic case. Results are typically reported as frequency distributions and percentages.

By Purpose

Analytical Survey

Goes beyond description to examine relationships and test hypotheses about associations among variables. Regression analysis, structural equation modeling, and multilevel modeling are among the techniques applied. Most doctoral research using survey methods employs an analytical framework.

By Administration

Census Survey

Data are collected from every member of the defined population rather than from a sample. Eliminates sampling error but is feasible only for small or institutionally bounded populations (e.g., all teachers in a single school district). National population censuses are government-administered examples conducted at enormous scale and cost.

Section 04

The Survey Design Process

Designing a rigorous survey is an iterative, multilayered process that requires careful planning before a single questionnaire item is written. Researchers who begin by drafting questions—before articulating their research questions, conceptual framework, or measurement model—invariably produce instruments that are difficult to analyze and whose results are difficult to interpret. The following process reflects best practices from the field's leading methodologists (Dillman et al., 2014; Fowler, 2014; Krosnick & Presser, 2010).

1. Define the Research Problem and Objectives

Articulate precisely what the survey is meant to learn. Formulate specific research questions or hypotheses. Identify the conceptual variables of interest and their theoretical relationships. This step determines everything that follows; vague research questions produce vague survey instruments.

2. Review Existing Literature and Instruments

Search systematically for published studies addressing the same or related research questions. Identify validated instruments that have demonstrated psychometric properties. Using existing, validated instruments (or adapting them with proper citation and pilot testing) is almost always preferable to constructing new measures from scratch.

3. Define the Target Population and Sampling Frame

Specify with precision the population to which findings will be generalized: not "teachers in the Philippines" but "public elementary school teachers employed in DepEd-managed schools in Region VII during School Year 2024–2025." Then identify or construct the sampling frame, that is, the list or mechanism from which the sample will be drawn.

4. Select the Sampling Method and Compute Sample Size

Choose a probability or nonprobability sampling strategy appropriate to the research questions and available resources. Compute the required sample size using power analysis or accepted formulas, accounting for expected response rates and desired precision. Underpowered surveys yield unreliable estimates regardless of how well the questionnaire is designed.

5. Design the Questionnaire

Write, review, and refine questionnaire items following evidence-based guidelines for question wording, response option design, and questionnaire structure. Ensure every item maps directly to a research question or conceptual variable. Review the instrument for double-barreled questions, leading language, ambiguous terminology, and inappropriate reading-level assumptions.

6. Establish Validity and Reliability

Subject the draft instrument to expert review for content validity. Conduct cognitive interviewing with members of the target population to assess comprehension. Pilot test the instrument with a small sample (typically 30–50 participants) and compute reliability statistics (e.g., Cronbach's alpha). Revise as needed.

7. Obtain Ethical Approval

Submit the study protocol to the Institutional Review Board (IRB) or Ethics Review Committee before any data collection begins. Prepare informed consent procedures, data storage protocols, and participant confidentiality protections in accordance with institutional and national guidelines.

8. Administer the Survey

Deploy the instrument to the sample using the chosen administration mode. Implement a contact strategy (initial contact, reminders, and follow-ups) to maximize response rates without coercion. The Tailored Design Method (Dillman et al., 2014) provides an evidence-based framework for optimizing response rates across administration modes.

9. Process, Clean, and Analyze Data

Code, clean, and screen data systematically before analysis. Assess patterns of missing data and apply appropriate handling strategies (e.g., listwise deletion, multiple imputation). Conduct analyses aligned with the research questions, reporting effect sizes and confidence intervals alongside significance tests.

10. Interpret, Report, and Disseminate Findings

Interpret results within the conceptual framework and in light of the study's limitations. Prepare reports and manuscripts that transparently describe sampling procedures, response rates, instrument properties, and analytical decisions. Disseminate findings in appropriate academic and practitioner venues.
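Step 4 calls for accounting for expected response rates when fixing the sample size. A minimal Python sketch of that arithmetic follows; the function name `invitations_needed` and the example figures (385 completed responses, a 40% response rate) are illustrative assumptions, not values from any particular study.

```python
import math

def invitations_needed(required_completes: int, expected_response_rate: float) -> int:
    """Number of people to invite so that, at the expected response rate,
    the number of completed questionnaires reaches the required sample size."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(required_completes / expected_response_rate)

# Hypothetical study: 385 completes needed, 40% of invitees expected to respond.
print(invitations_needed(385, 0.40))  # 963
```

The calculation only protects precision; it does nothing about non-response bias, which depends on who responds rather than how many.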

Section 05

Sampling Methods

The credibility of survey research rests substantially on the adequacy of the sampling strategy. Sampling theory provides the mathematical justification for generalizing from a subset of a population to the whole; without a defensible sampling design, even the most carefully worded questionnaire produces findings of doubtful generalizability.

Sampling methods divide into two broad families: probability sampling, in which every member of the population has a known, nonzero probability of being selected, and nonprobability sampling, in which selection probabilities are unknown and generalizability depends on argument rather than mathematical proof.

Simple Random Sampling

Probability

Every member of the sampling frame has an equal probability of selection. Requires a complete and accurate sampling frame. Produces unbiased estimates with known precision. Computationally straightforward but logistically demanding for large, dispersed populations.

Systematic Sampling

Probability

Every kth element is selected from an ordered list after a random start. The sampling interval is k = N/n, where N is the population size and n is the desired sample size. The result is approximately equivalent to simple random sampling unless the list has a periodic structure that coincides with the sampling interval.
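The selection rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `systematic_sample` is hypothetical, the frame is a plain list of IDs, and the interval is taken as the integer part of N/n so that list positions stay whole numbers.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n elements from an ordered sampling frame using a random start
    and a fixed interval k = N // n (integer division is an assumption made
    here so that positions stay whole numbers)."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    k = N // n                      # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)        # random start position in [0, k)
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 1001))        # hypothetical frame of 1,000 ID numbers
sample = systematic_sample(frame, n=100, seed=42)
print(len(sample), sample[1] - sample[0])  # 100 10
```

If the frame were sorted so that every tenth entry shared some trait (for example, a department head listed first in each block of ten), an interval of 10 would reproduce that pattern in the sample, which is the periodicity risk noted above.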

Stratified Sampling

Probability

The population is divided into homogeneous subgroups (strata) and simple random samples are drawn from each stratum independently. Guarantees adequate representation of key subgroups and typically yields more precise estimates than simple random sampling when strata differ on the outcome variable.
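One common way to implement this is proportional allocation, where each stratum contributes to the sample in proportion to its share of the population. The sketch below assumes that allocation rule (the text above does not prescribe one); the function name and the urban/rural strata are hypothetical.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """strata: dict mapping stratum name -> list of units.
    Allocates n across strata in proportion to stratum size (rounded),
    then draws a simple random sample independently within each stratum."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)   # proportional allocation for stratum h
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical population: 600 urban and 400 rural teachers.
strata = {"urban": list(range(600)), "rural": list(range(600, 1000))}
sample = proportional_stratified_sample(strata, n=100, seed=1)
print({name: len(units) for name, units in sample.items()})  # {'urban': 60, 'rural': 40}
```

Because each stratum's allocation is rounded separately, the total can differ from n by a unit or two when the proportions do not divide evenly; a production design would reconcile that explicitly.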

Cluster Sampling

Probability

The population is divided into clusters (usually naturally occurring groups such as schools, barangays, or hospitals). A random sample of clusters is selected and all members within chosen clusters are surveyed. Reduces data collection costs substantially but typically increases standard errors relative to simple random sampling.
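The loss of precision from clustering is often summarized with the design effect. The sketch below uses the standard approximation deff = 1 + (m - 1)ρ, which assumes equal cluster sizes m and an intraclass correlation ρ; this formula is not stated in the text above, and the values shown (clusters of 20, ρ = 0.05) are purely illustrative.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate design effect for equal-sized clusters:
    deff = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Size of the simple random sample that would give the same precision."""
    return n / design_effect(cluster_size, icc)

# Hypothetical design: 1,000 respondents in clusters of 20, ICC of 0.05.
print(round(design_effect(20, 0.05), 2))               # 1.95
print(round(effective_sample_size(1000, 20, 0.05)))    # 513
```

Under these assumed values, a clustered sample of 1,000 carries roughly the information of a simple random sample of 513, which is why cluster designs usually need larger nominal samples.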

Multistage Sampling

Probability

Combines multiple sampling methods in sequence. A large-scale example: first select provinces at random, then municipalities within selected provinces, then barangays within selected municipalities, then households within selected barangays. Used by virtually all large national surveys.

Convenience Sampling

Nonprobability

Participants are selected on the basis of availability and willingness. Efficient but prone to severe selection bias. Common in student-centered research and online surveys. Limits generalizability; findings should be interpreted cautiously and labeled as preliminary or exploratory.

Purposive Sampling

Nonprobability

Participants are selected based on specific characteristics deemed relevant to the research question. Useful when the researcher requires participants with particular expertise, experience, or attributes. The researcher must explicitly justify selection criteria in the methodology section.

Quota Sampling

Nonprobability

The researcher sets quotas for participant categories (e.g., 50 male teachers, 50 female teachers) and recruits participants until quotas are filled. Superficially resembles stratified sampling but lacks random selection within strata; therefore does not yield probability-based estimates.

Snowball Sampling

Nonprobability

Initial participants are recruited, then asked to refer additional participants from their social networks. Used when the target population is hidden, hard to reach, or stigmatized (e.g., undocumented workers, individuals with rare conditions). Not recommended as the primary strategy in doctoral quantitative research unless the population is genuinely inaccessible by other means.

Computing Sample Size

One of the most consequential—and most frequently mishandled—decisions in survey research is sample size determination. Underpowered studies cannot detect true effects; overpowered studies waste resources. The appropriate approach depends on the analytical technique and the effect size the researcher expects to detect.

Slovin's Formula (Simplified for Descriptive Surveys)

n = N / (1 + Ne²)

Where n = required sample size, N = population size, and e = margin of error (typically .05 or .10). Note: This formula implicitly assumes maximum variability (a proportion of 0.50) and a confidence level of roughly 95%, and it should be used for descriptive purposes only.
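As a worked illustration, the formula can be evaluated directly. The function name `slovin` and the population of 2,000 are assumptions chosen for the example; the result is rounded up because a fractional respondent cannot be sampled.

```python
import math

def slovin(N: int, e: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up."""
    return math.ceil(N / (1 + N * e ** 2))

# Hypothetical population of 2,000 with a 5% margin of error:
# 2000 / (1 + 2000 * 0.0025) = 2000 / 6 = 333.3, rounded up to 334.
print(slovin(2000, 0.05))  # 334
```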

Cochran's Formula (For Unknown or Large Populations)

n₀ = (Z² × p × q) / e²

Where Z = z-value for confidence level (1.96 for 95%), p = estimated proportion of the attribute (0.50 if unknown), q = 1 – p, e = desired margin of error. If the population is finite: n = n₀ / (1 + (n₀ – 1)/N).
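The two steps (the initial estimate n₀, then the finite population adjustment) can be sketched as follows. Function names are hypothetical, and the z-value is obtained from Python's standard-library `statistics.NormalDist` rather than hard-coded.

```python
import math
from statistics import NormalDist

def cochran_n0(e: float, p: float = 0.5, confidence: float = 0.95) -> float:
    """Cochran's initial sample size n0 = Z^2 * p * q / e^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z ** 2 * p * (1 - p) / e ** 2

def cochran_finite(n0: float, N: int) -> float:
    """Finite population adjustment n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / N)

n0 = cochran_n0(e=0.05)
print(math.ceil(n0))                           # 385
print(math.ceil(cochran_finite(n0, N=2000)))   # 323
```

For the same hypothetical population of 2,000, Cochran's adjusted figure (323) is close to Slovin's (334); the two differ mainly because Slovin's formula implicitly rounds the z-value up to about 2.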

⚠️ Important Caution for Doctoral Students: Slovin's formula, while widely taught and used in Philippine graduate education, has important limitations. It was designed for descriptive surveys estimating proportions. For analytical surveys testing relationships among variables using regression or SEM, power analysis using Cohen's (1988) framework or G*Power software provides more defensible sample size estimates. Your thesis or dissertation committee may rightly question a sample size justified solely by Slovin's formula for a correlational or predictive study.
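To show how a power-based calculation differs in kind from Slovin's formula, here is a minimal sketch for a single bivariate correlation. It uses the Fisher z approximation, which is an assumption of this example and not G*Power's exact routine, so results may differ from G*Power by a case or two. The function name and default values (two-tailed α = .05, power = .80) are illustrative.

```python
import math
from statistics import NormalDist

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size to detect a population correlation r
    with a two-tailed test, via the Fisher z transformation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    c = 0.5 * math.log((1 + r) / (1 - r))   # Fisher z-transform of r
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)

print(n_for_correlation(0.30))  # 85
print(n_for_correlation(0.10))  # 783
```

Note that the required n here depends on the expected effect size and desired power, not on the population size N, which is the opposite of how Slovin's formula behaves.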

Section 06

Questionnaire Construction

The questionnaire is the primary instrument through which survey data are gathered. Its quality determines the quality of the data. A poorly designed questionnaire introduces systematic measurement error—bias that cannot be corrected at the analysis stage, no matter how sophisticated the statistical techniques applied. Questionnaire design is therefore a technical and intellectual discipline in its own right, not merely an administrative task.

Question Types

Question Type	Description	Best Used For	Limitations
Closed-ended (fixed)	Respondents choose from predetermined response options	Quantifiable data; comparisons across respondents; statistical analysis	May not capture full range of responses; response options may not fit all respondents
Open-ended	Respondents provide answers in their own words	Exploratory research; capturing nuance; generating hypotheses	Difficult to quantify; requires content analysis; lower completion rates
Dichotomous	Two mutually exclusive options (Yes/No; True/False)	Simple factual questions; demographic screening	Oversimplifies complex attitudes or behaviors
Multiple choice	Select one or more options from a list	Categorical variables with known options	Exhaustive and mutually exclusive options required
Rating scale	Respond on a numeric or labeled scale	Measuring intensity, frequency, agreement, or satisfaction	Prone to response biases (e.g., acquiescence, central tendency)
Ranking	Order options from most to least preferred or important	Establishing relative preferences	Cognitively demanding; difficult to analyze statistically
Matrix	Multiple items rated on the same scale in a grid format	Efficient collection of scale items	Prone to straight-lining (respondents check same column without reading)

Principles of Effective Question Wording

The phrasing of individual questions is where much measurement error originates. Krosnick and Presser (2010) summarize decades of experimental evidence on question wording effects that all survey researchers should internalize.

Every word in a survey question must be comprehensible to all members of the target population. Avoid technical jargon unless it is established vocabulary within the population. When in doubt, choose the simpler word: "start" instead of "commence," "use" instead of "utilize." Reading-level analyses (e.g., Flesch-Kincaid) can assist but do not replace cognitive interviewing with actual target participants.

A double-barreled question asks about two distinct things simultaneously, making it impossible to know which aspect the respondent is addressing. Example of a flawed item: "The school administration is responsive and supportive of teachers' needs." A respondent might find administration responsive but not supportive—or vice versa. Each construct must occupy its own item.

Leading questions embed an implied correct answer or socially desirable response. Example: "Don't you agree that teachers deserve higher salaries?" Loaded questions use emotionally charged language. Both systematically distort responses. Questions should be phrased in balanced, neutral language that does not presuppose any answer.

For closed-ended questions, response options must cover all plausible answers (exhaustive) and must not overlap (mutually exclusive). Including "Other (please specify)" addresses exhaustiveness when not all options can be anticipated. For age categories, ranges must be clearly defined: "25–34" and "35–44" rather than "25–35" and "35–45," which overlap.
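Checks of this kind are easy to automate before fielding an instrument. The sketch below is a hypothetical helper, not part of any survey platform; it assumes categories are expressed as inclusive whole-number ranges and flags adjacent categories that overlap or leave a gap.

```python
def check_categories(ranges):
    """ranges: list of (low, high) integer bounds, inclusive.
    Returns a list of problems found between adjacent categories."""
    problems = []
    ordered = sorted(ranges)
    for (lo1, hi1), (lo2, hi2) in zip(ordered, ordered[1:]):
        if lo2 <= hi1:
            problems.append(f"overlap: {lo1}-{hi1} and {lo2}-{hi2}")
        elif lo2 > hi1 + 1:
            problems.append(f"gap: {lo1}-{hi1} and {lo2}-{hi2}")
    return problems

print(check_categories([(25, 34), (35, 44)]))  # []
print(check_categories([(25, 35), (35, 45)]))  # ['overlap: 25-35 and 35-45']
```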

Questions that require respondents to recall events from distant memory introduce substantial recall error. "How many times have you visited a health center in the past 12 months?" is more reliable than "How many times have you visited a health center in the past five years?" Bounded recall periods, which specify a concrete and recent time window, substantially reduce telescoping (misplacing an event in time, typically by remembering it as more recent than it was) and other memory errors (Tourangeau et al., 2000).

Question order can systematically influence responses through context effects. General questions should precede specific ones (the funnel approach) to avoid artificially inflating the salience of specific topics. Sensitive or personally invasive questions should appear toward the end of the instrument, after rapport has been established, and should be preceded by less threatening questions.

Section 07

Measurement Scales in Survey Research

The choice of measurement scale determines the statistical analyses that are permissible, the amount of information captured per item, and the cognitive demands placed on respondents. Survey researchers must understand both the psychometric properties and the practical limitations of each scale type.

The Likert Scale

The Likert scale, developed by Rensis Likert in his 1932 doctoral dissertation, remains the most widely used scale in survey research. A Likert item presents a statement and asks respondents to indicate their degree of agreement using a symmetric agree–disagree response format. A full Likert scale consists of multiple such items that are summed or averaged to produce a composite score representing an underlying latent construct.

📘 Likert Item vs. Likert Scale: A single Likert-type item is not a Likert scale. A true Likert scale is a composite of multiple items measuring the same latent variable. Treating a single item as if it were a scale is a common and consequential methodological error in graduate theses and dissertations. The scale must also demonstrate internal consistency, typically assessed via Cronbach's alpha, before scores can be meaningfully interpreted.

Example: 5-Point Likert Item

Statement: "My school's administration provides adequate support for professional development activities."

1 = Strongly Disagree
2 = Disagree
3 = Neither Agree nor Disagree
4 = Agree
5 = Strongly Agree
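A composite Likert scale score is typically the sum or mean of several such items. The sketch below computes a mean score for one respondent. The item names (`q1` to `q4`) are hypothetical, and the reverse-coding of a negatively worded item is an added assumption for illustration: it is common practice, but not something the example item above requires.

```python
def reverse_code(score, scale_min=1, scale_max=5):
    """Flip a score on a scale_min..scale_max scale (e.g., 2 becomes 4 on 1..5)."""
    return scale_max + scale_min - score

def composite_score(responses, reverse_items=(), scale_min=1, scale_max=5):
    """responses: dict of item name -> score.
    Returns the mean item score after reverse-coding the listed items."""
    adjusted = [
        reverse_code(value, scale_min, scale_max) if item in reverse_items else value
        for item, value in responses.items()
    ]
    return sum(adjusted) / len(adjusted)

# Hypothetical respondent; q3 is assumed to be negatively worded.
respondent = {"q1": 4, "q2": 5, "q3": 2, "q4": 4}
print(composite_score(respondent, reverse_items={"q3"}))  # 4.25
```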

Common Scale Types Compared

Scale Type	Inventor/Source	Structure	Key Feature	Typical Use
Likert Scale	Rensis Likert (1932)	Agreement statements, 4–7 points	Balanced agree–disagree options; composite scoring	Attitudes, perceptions, beliefs
Semantic Differential	Osgood et al. (1957)	Bipolar adjective pairs, 7-point	Measures connotative meaning along bipolar dimensions	Brand attitudes, person perception
Thurstone Scale	Thurstone (1928)	Judge-rated statements	Items assigned scale values through expert judgment	Historical; rarely used in contemporary practice
Guttman Scale	Guttman (1944)	Hierarchically ordered items	Cumulative: endorsing a harder item implies endorsing all easier ones	Measuring ordered abilities or behaviors
Visual Analog Scale	Clinical research	Continuous line with endpoints	Captures fine-grained gradation; requires special scoring	Pain intensity, emotion intensity
Net Promoter Score	Reichheld (2003)	0–10 single-item	Classifies promoters, passives, detractors	Customer satisfaction; organizational loyalty
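The Net Promoter Score in the last row is scored differently from the other scales in the table: ratings of 9 or 10 count as promoters, 7 or 8 as passives, and 0 through 6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. A short sketch with made-up ratings:

```python
def net_promoter_score(ratings):
    """ratings: iterable of integers from 0 to 10.
    NPS = % promoters (9-10) minus % detractors (0-6); passives (7-8) are ignored."""
    ratings = list(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical ratings: 4 promoters, 3 passives, 3 detractors.
ratings = [10, 9, 9, 8, 7, 7, 6, 5, 10, 3]
print(net_promoter_score(ratings))  # 10.0
```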

Levels of Measurement and Statistical Implications

Stevens's (1946) taxonomy of measurement levels—nominal, ordinal, interval, and ratio—remains foundational to understanding the statistical analyses that are appropriate for a given scale. Whether Likert scale data should be treated as ordinal or interval-level has generated substantial methodological debate.

Level	Properties	Example in Surveys	Appropriate Statistics
Nominal	Categories only; no rank or distance	Sex, region, school type	Mode, chi-square, frequency counts
Ordinal	Ranked categories; distances not equal	Likert items (technically); education level	Median, mode, Spearman's rho, Mann-Whitney U
Interval	Equal distances; no true zero	Temperature in Celsius; IQ scores	Mean, standard deviation, Pearson's r, t-test, ANOVA
Ratio	Equal distances; true zero	Income, age, years of experience	All parametric statistics; geometric mean

Section 08

Data Collection Methods

The mode of survey administration—how the questionnaire is delivered to respondents—has significant implications for response rates, data quality, cost, coverage, and the nature of social desirability effects. No single mode is optimal for all research contexts; the choice requires careful analysis of the target population, budget, timeline, and the sensitivity of the topic.

Mode	Description	Advantages	Disadvantages	Response Rate (Typical)
Face-to-Face Interview	Trained interviewer administers questions in person	Highest data quality; complex questions possible; can probe; low item non-response	Expensive; interviewer effects; geographic constraints	60–80%
Telephone Interview (CATI)	Interviews conducted by phone using computerized scripts	Moderate cost; centralized supervision; wide geographic reach	Cell-phone-only households; declining response rates; no visual aids	25–45%
Mail (Postal) Survey	Paper questionnaire mailed to sampled addresses	Low cost per completed survey; no interviewer effect; respondent controls pace	Slow; low response rates; literacy required; limited length	20–40%
Web Survey (CAWI)	Online questionnaire accessed via link (e.g., Google Forms, Qualtrics)	Low cost; fast; multimedia possible; automatic data entry; wide reach	Coverage bias (digital divide); low response rates; data quality concerns; no verifiable sampling frame	10–30%
Group-Administered	Questionnaire distributed to a captive group (e.g., students in a classroom)	Very high response rate; efficient; researcher can answer questions	Limited to accessible groups; social desirability in shared settings	90–100%
Mixed-Mode	Two or more modes used within the same study	Higher coverage; higher response rates; cost-efficient	Mode effects can introduce measurement error across groups	Varies by combination

📋 The Problem of Non-Response: Non-response is not merely an inconvenience; it is a serious threat to the validity of survey findings whenever respondents and non-respondents differ systematically on the variables being measured. This condition, called non-response bias, can occur even when overall response rates appear adequate. Researchers should always report response rates and, where possible, compare early and late respondents or use available administrative data to assess the likely direction and magnitude of non-response bias (Groves & Peytcheva, 2008).
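The two checks mentioned above can be sketched as follows. Both functions are simplified illustrations: `response_rate` divides completes by eligible sampled cases and ignores the finer distinctions (partials, unknown eligibility) that formal reporting standards make, and the early/late comparison uses a standardized mean difference (Cohen's d with a pooled standard deviation) as one possible summary. All data are invented.

```python
from statistics import mean, stdev

def response_rate(completed: int, eligible_sampled: int) -> float:
    """Simplified response rate: completed questionnaires / eligible sampled cases."""
    return completed / eligible_sampled

def standardized_mean_difference(early, late):
    """Cohen's d between early and late respondents, using the pooled SD."""
    n1, n2 = len(early), len(late)
    pooled_var = ((n1 - 1) * stdev(early) ** 2 + (n2 - 1) * stdev(late) ** 2) / (n1 + n2 - 2)
    return (mean(early) - mean(late)) / pooled_var ** 0.5

print(f"{response_rate(312, 800):.1%}")  # 39.0%

# Hypothetical scale scores for first-wave vs. post-reminder respondents.
early = [4, 5, 4, 3, 4]
late = [3, 4, 3, 2, 3]
print(round(standardized_mean_difference(early, late), 2))  # 1.41
```

A large difference between early and late respondents, as in this contrived example, would suggest that those who never responded may differ as well.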

Section 09

Validity and Reliability of Survey Instruments

Validity and reliability are the twin pillars of measurement quality in survey research. An instrument that measures nothing real is invalid; an instrument that measures something real but inconsistently is unreliable. Ideally, an instrument is both valid and reliable—though reliability is a necessary but not sufficient condition for validity.

Validity

Type of Validity	Definition	How to Establish
Content Validity	Instrument items adequately represent the full domain of the construct	Expert panel review; content validity ratio (Lawshe, 1975); systematic item mapping to construct dimensions
Face Validity	Instrument appears, on its surface, to measure what it claims to measure	Review by subject matter experts and members of the target population; not a substitute for empirical validation
Construct Validity	Instrument measures the theoretical construct it purports to measure	Confirmatory factor analysis (CFA); convergent and discriminant validity; nomological network testing
Convergent Validity	Items measuring the same construct correlate with each other and with measures of related constructs	Average Variance Extracted (AVE) ≥ .50; correlations with established measures of the same construct
Discriminant Validity	Instrument does not correlate too highly with measures of conceptually distinct constructs	AVE for each construct exceeds the squared correlation between constructs; Heterotrait-Monotrait ratio (HTMT)
Criterion-Related Validity	Instrument scores predict or correlate with an external criterion	Concurrent validity (correlation with simultaneous criterion); Predictive validity (correlation with future criterion)
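Two of the indices in the table reduce to short formulas. Lawshe's content validity ratio is CVR = (nₑ - N/2) / (N/2), where nₑ is the number of experts rating an item "essential" and N is the panel size. AVE is the mean of the squared standardized factor loadings for a construct. The sketch below applies both to invented numbers; the function names are hypothetical.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2); ranges from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half

def average_variance_extracted(loadings) -> float:
    """AVE = mean of squared standardized factor loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical: 9 of 10 experts rate an item essential.
print(content_validity_ratio(9, 10))  # 0.8

# Hypothetical standardized loadings from a CFA for one construct.
print(round(average_variance_extracted([0.72, 0.80, 0.68, 0.75]), 3))  # 0.546
```

With these assumed loadings the AVE of .546 clears the .50 threshold listed for convergent validity.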

Reliability

Reliability Type	Definition	Assessment Method	Acceptable Threshold
Internal Consistency	Items within a scale produce consistent responses	Cronbach's alpha; McDonald's omega	α ≥ .70 (Nunnally & Bernstein, 1994); α ≥ .80 preferred for clinical or high-stakes decisions
Test-Retest Reliability	Scores are stable over time when the construct itself has not changed	Pearson's r or ICC between time points (2–4 weeks apart is typical)	r or ICC ≥ .70; ≥ .80 preferred
Inter-Rater Reliability	Different raters produce consistent scores when rating the same responses	Cohen's kappa (κ); ICC; percent agreement	κ ≥ .70; ≥ .80 preferred
Parallel Forms Reliability	Two equivalent forms of the instrument produce consistent scores	Correlation between forms administered to the same group	r ≥ .80 typically expected
Split-Half Reliability	Two halves of the instrument correlate with each other	Spearman-Brown prophecy formula applied to correlation of halves	Corrected r ≥ .70
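For readers who want to see what the internal consistency and split-half statistics compute, here is a sketch using only the standard library. Cronbach's alpha is calculated as k/(k-1) × (1 - Σ item variances / variance of total scores), and the Spearman-Brown correction as 2r/(1 + r). The data are invented (three items, five respondents); in practice a statistical package would be used.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of item-score lists, one inner list per item,
    with respondents in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]      # total score per respondent
    item_var_sum = sum(variance(item) for item in items)  # sample variances
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

def spearman_brown(r_half: float) -> float:
    """Step up a half-test correlation to full-length reliability."""
    return 2 * r_half / (1 + r_half)

# Hypothetical responses to three items from five respondents.
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
print(round(cronbach_alpha(items), 3))   # 0.886
print(round(spearman_brown(0.70), 3))    # 0.824
```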

🔬 Beyond Cronbach's Alpha: Cronbach's alpha has been the default reliability statistic in social science research for decades, but it has well-documented limitations: it assumes tau-equivalence (equal factor loadings) across items, is sensitive to the number of items, and can be inflated by item redundancy. Contemporary measurement scholars recommend McDonald's omega (ω) as a superior reliability estimate for multidimensional or non-tau-equivalent scales (McNeish, 2018). Doctoral students using structural equation modeling should report both alpha and omega.
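Omega can be computed from the factor loadings of a fitted measurement model. The sketch below covers only the simplest case, and its assumptions should be read carefully: a single factor, standardized loadings, and uncorrelated errors, so that each item's error variance is 1 - λ². Under those assumptions ω = (Σλ)² / ((Σλ)² + Σ(1 - λ²)). The loadings are invented, and multidimensional scales require other omega variants not shown here.

```python
def mcdonald_omega(loadings) -> float:
    """Omega for a one-factor model with standardized loadings and
    uncorrelated errors (both are assumptions of this sketch)."""
    common = sum(loadings) ** 2                       # variance due to the factor
    error = sum(1 - l ** 2 for l in loadings)         # summed unique variances
    return common / (common + error)

# Hypothetical standardized loadings for a four-item scale.
print(round(mcdonald_omega([0.72, 0.80, 0.68, 0.75]), 3))  # 0.827
```

Because the loadings enter individually rather than being assumed equal, omega does not depend on tau-equivalence in the way alpha does.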

Section 10

Data Analysis in Survey Research

The analytical stage transforms raw survey data into interpretable findings. The specific techniques employed depend on the research questions, the measurement levels of the variables, the sampling design, and the theoretical framework. A critical error made by many novice researchers is selecting statistical tests without consideration of the underlying assumptions or the study's inferential goals.

Preliminary Analysis Steps

1. Data Screening and Cleaning

Inspect the dataset for data entry errors, out-of-range values, and implausible response patterns (e.g., all items rated identically, also called straight-lining). Document all cleaning decisions in a data management log that can be reviewed by collaborators or external auditors.

2. Missing Data Analysis

Determine the extent and pattern of missing data. Data missing completely at random (MCAR) can be handled by listwise deletion without introducing bias, although at the cost of reduced sample size and statistical power. Data missing at random (MAR) or missing not at random (MNAR) require more sophisticated strategies. Multiple imputation is the recommended approach under MAR; under MNAR, the missingness mechanism must be modeled explicitly or examined through sensitivity analysis. Report the percentage of missing data per variable and the handling strategy employed.

3. Assumption Testing

Test the assumptions of the planned statistical procedures before applying them. For parametric tests, assess normality (Shapiro-Wilk test; histograms with normal curves), homogeneity of variance (Levene's test), and linearity. For regression, additionally assess independence of residuals (Durbin-Watson), absence of multicollinearity (VIF < 10; tolerance > .10), and homoscedasticity.
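
Two of the regression diagnostics named above have simple closed forms, sketched here under stated assumptions. The Durbin-Watson statistic is computed from a list of residuals, and the VIF is shown only for the special case of two predictors, where it reduces to 1 / (1 − r²). All data are hypothetical; with more predictors, each VIF requires regressing one predictor on all the others.

```python
from math import sqrt

def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF and tolerance when the model has exactly two predictors."""
    tolerance = 1 - pearson_r(x1, x2) ** 2
    return 1 / tolerance, tolerance

dw = durbin_watson([1, -1, 1, -1])          # strongly alternating residuals
vif, tol = vif_two_predictors([1, 2, 3, 4], [1, 2, 4, 3])
print(dw, vif, tol)
```

Durbin-Watson values near 2 suggest uncorrelated residuals, while values toward 0 or 4 suggest positive or negative autocorrelation respectively.
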
4

Descriptive Statistics

Report frequency distributions, measures of central tendency (mean, median, mode), and measures of dispersion (standard deviation, range, interquartile range) for all key variables. Descriptive statistics provide essential context for interpreting inferential results and should always be reported regardless of the study's primary analytical focus.
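
The statistics listed above can be produced with the Python standard library, as in this minimal sketch. The scores are hypothetical, and the interquartile range depends on the quantile method used (here the default of `statistics.quantiles`), so values may differ slightly from those of other software.

```python
import statistics as st
from collections import Counter

def describe(values):
    """Central tendency and dispersion for one variable."""
    q1, _, q3 = st.quantiles(values, n=4)    # quartile cut points
    return {
        "n": len(values),
        "mean": st.mean(values),
        "median": st.median(values),
        "mode": st.mode(values),
        "sd": st.stdev(values),              # sample standard deviation
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

scores = [2, 3, 3, 4, 4, 4, 5, 5, 1, 4]      # hypothetical 5-point item
summary = describe(scores)
frequencies = Counter(scores)                # frequency distribution
print(summary)
print(sorted(frequencies.items()))
```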

Inferential Statistical Techniques

Research Question Type	Technique	Key Assumptions
Difference between two groups on a continuous variable	Independent samples t-test	Normality; homogeneity of variance; independent observations
Difference among three or more groups	One-way ANOVA; post-hoc tests	Normality; homogeneity of variance; independent groups
Association between two categorical variables	Chi-square test of independence	Expected cell frequency ≥ 5; independence of observations
Correlation between two continuous variables	Pearson's r (interval); Spearman's ρ (ordinal)	Linearity; bivariate normality (Pearson); no extreme outliers
Predicting one outcome from multiple predictors	Multiple linear regression	Linearity; normality of residuals; homoscedasticity; no multicollinearity
Predicting a binary outcome	Binary logistic regression	No multicollinearity; adequate cell sizes; linearity of log-odds
Testing a theoretical measurement model	Confirmatory Factor Analysis (CFA)	Multivariate normality; adequate sample size; identified model
Testing a theoretical structural model with latent variables	Structural Equation Modeling (SEM)	Large samples (n ≥ 200 typically); multivariate normality; model identification
Nested data (students within schools)	Multilevel Modeling (HLM)	Sufficient cluster sizes and number of clusters; random effects specification

Statistical Significance vs. Practical Significance

Statistical significance (p < .05) indicates only that a result at least as extreme as the one observed would be unlikely if the null hypothesis were true. It says nothing about the magnitude or practical importance of an effect. Researchers must report effect sizes alongside p-values: Cohen's d for mean differences, η² or ω² for ANOVA, and r² or R² for correlation and regression. The APA Publication Manual (7th ed., 2020) and most reputable journals now require effect size reporting. A statistically significant result with a negligible effect size (e.g., d = 0.05) may have no practical implications whatsoever.
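
As an illustration of effect size reporting, Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. The sketch below assumes the pooled-variance form of d; the group scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

d = cohens_d([2, 4, 6], [1, 3, 5])   # hypothetical scores for two groups
print(f"d = {d:.2f}")
```

By Cohen's (1988) conventions, values near 0.2, 0.5, and 0.8 are commonly described as small, medium, and large, though these benchmarks should be interpreted in the context of the field.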

Section 11

Ethical Considerations in Survey Research

The ethical conduct of survey research is not a bureaucratic formality. It reflects the researcher's fundamental respect for the autonomy, dignity, and wellbeing of the people who provide the data on which all findings rest. Without ethical integrity, scientific credibility is impossible to sustain.

Core Principle

Informed Consent

Participants must receive clear, comprehensible information about the study's purpose, procedures, risks, benefits, and their right to withdraw without penalty before agreeing to participate. Consent must be voluntary and free from coercion. In the Philippines, this requirement is governed by the National Ethical Guidelines for Health and Health-Related Research (PHREB, 2017).

Core Principle

Confidentiality and Anonymity

Confidentiality means the researcher knows participants' identities but will not disclose them. Anonymity means the researcher does not know who provided which responses. Most surveys guarantee confidentiality; anonymity requires that no identifying information be collected. Both must be actively protected through secure data storage, access controls, and data de-identification procedures.

Core Principle

Minimizing Harm

Researchers must anticipate and mitigate potential harms from participation, including psychological distress, breach of privacy, or social stigma. Questions about sensitive topics (e.g., mental health, domestic violence, substance use) require particular care. Referral resources should be provided when questions touch on distressing experiences.

Core Principle

Data Integrity

Survey data must be collected, processed, and reported honestly and accurately. Fabrication of data (creating fictional responses), falsification (manipulating genuine responses), and selective reporting of results are serious breaches of research integrity with severe professional and legal consequences. Raw data should be preserved and available for audit.

Core Principle

Justice and Equity in Sampling

Sampling decisions should not systematically exclude groups in ways that deny them the potential benefits of research participation or that concentrate research burdens unfairly. Historical exclusions of women, minorities, and marginalized communities from research samples have produced a body of knowledge with significant blind spots that we are still working to correct.

Core Principle

IRB/Ethics Review

All survey research involving human participants must be reviewed and approved by an Institutional Review Board (IRB) or equivalent ethics committee before data collection begins. No exceptions apply on the grounds that the survey is "merely" descriptive or that the questions seem harmless. Ethics review protects both participants and researchers.

Section 12

Common Limitations and How to Address Them

No research method is without limitations, and survey research has several that researchers must acknowledge explicitly in their reports and dissertations. Identifying limitations is not an admission of failure—it is evidence of methodological maturity and scholarly honesty. Crucially, the researcher should not merely identify limitations but explain why the study's conclusions remain defensible despite them.

Social desirability bias arises when respondents answer in ways they believe are socially acceptable rather than truthfully. This is especially problematic for sensitive topics (e.g., discriminatory attitudes, risky behaviors, income). Mitigation strategies include guaranteeing anonymity; using indirect question formats; employing validated social desirability scales (e.g., Marlowe-Crowne) and controlling for them statistically; and using list experiments or randomized response techniques for particularly sensitive items.
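
To make the randomized response idea concrete, the sketch below uses Warner's classic design, chosen here only as one example of the technique. Each respondent privately uses a randomizing device: with probability p they answer the sensitive statement, and otherwise its complement. The observed proportion of "yes" answers λ then satisfies λ = pπ + (1 − p)(1 − π), so the prevalence π can be estimated without knowing which statement any individual answered. The counts and the value of p are hypothetical.

```python
from math import sqrt

def warner_estimate(yes_count, n, p):
    """Estimate sensitive-trait prevalence under Warner's randomized response design.

    p is the probability that the device directs a respondent to the
    sensitive statement (it must differ from 0.5).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes the prevalence unidentifiable")
    lam = yes_count / n                                  # observed share of 'yes'
    pi_hat = (lam + p - 1) / (2 * p - 1)                 # solve lam = p*pi + (1-p)*(1-pi)
    se = sqrt(lam * (1 - lam) / (n * (2 * p - 1) ** 2))  # approximate standard error
    return pi_hat, se

pi_hat, se = warner_estimate(yes_count=220, n=500, p=0.7)
print(f"estimated prevalence = {pi_hat:.3f} (SE = {se:.3f})")
```

The privacy protection comes at a cost: the standard error is larger than it would be under direct questioning with the same sample size, and it grows as p approaches 0.5.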

Common method bias arises when both predictor and outcome variables are measured using the same self-report instrument administered at the same time: correlations among them may be inflated by shared method variance rather than reflecting true construct relationships. Procedural remedies include temporal separation of predictor and outcome measurement; using different response formats for different constructs; and obtaining data from multiple sources. Statistical remedies include Harman's single-factor test (limited utility) and the common latent factor technique within SEM (Podsakoff et al., 2012).

Some respondents systematically agree with items regardless of their content—a tendency known as acquiescence or "yea-saying." Including both positively and negatively worded items (reverse-keyed items) within a scale is a common strategy to detect and partially control for this bias. However, researchers should note that reverse-keyed items introduce their own psychometric complications, and the evidence that they effectively reduce acquiescence is mixed (Weijters et al., 2013).

Nonresponse bias arises when individuals who choose to respond differ systematically from those who do not. Response rates have declined dramatically across all modes over the past two decades (Brick & Williams, 2013). Mitigation includes using multiple contact attempts; offering incentives; comparing respondent characteristics to population parameters; and conducting wave analysis comparing early and late respondents as a proxy for non-respondents.

Cross-sectional survey research cannot establish causal relationships, only associations. Even in longitudinal designs, establishing causation requires ruling out alternative explanations—a demanding analytical and design challenge. Researchers must be cautious in the language used to describe findings: "X was associated with Y" rather than "X caused Y" unless a truly experimental or quasi-experimental design was employed.

Satisficing occurs when respondents, rather than engaging carefully with each question (optimizing), exert minimal cognitive effort: selecting the first plausible response option, choosing the midpoint consistently, or not reading questions fully (Krosnick, 1991). Satisficing is more common in long surveys, among less motivated participants, and in self-administered formats where no interviewer is present to encourage engagement. Design remedies include shorter surveys; engaging introductions; attention check items; and avoiding response formats that facilitate satisficing (e.g., scales with many points).

Section 13

Practical Examples of Survey Research

Abstract methodological principles become fully comprehensible only when grounded in concrete research scenarios. The following examples illustrate how survey research is designed and executed across different disciplines and contexts—including settings familiar to Filipino researchers and educators.

Example 01 — Educational Research

Teacher Burnout and Organizational Support in Philippine Public Elementary Schools

Research Question: To what extent does perceived organizational support predict emotional exhaustion among public elementary school teachers in the Cebu City Division (Region VII)?

Design: Descriptive-correlational survey; cross-sectional design.

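Population & Sampling: All 4,820 public elementary school teachers in the Cebu City Division. Stratified random sampling by school cluster; n = 370 (Yamane's formula with e = .05, which assumes p = .50 and z ≈ 2, roughly a 95% confidence level; Cochran's formula with a finite population correction gives a slightly smaller minimum of 356).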
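
For a population of N = 4,820, the minimum sample size depends on which formula is applied. The sketch below compares Cochran's formula with a finite population correction against Yamane's simplified formula, both at p = .50 and e = .05. The function names are illustrative only.

```python
from math import ceil

def cochran_n(N, p=0.5, e=0.05, z=1.96):
    """Cochran's sample size with finite population correction."""
    n0 = z ** 2 * p * (1 - p) / e ** 2        # infinite-population sample size
    return ceil(n0 / (1 + (n0 - 1) / N))      # corrected for population size N

def yamane_n(N, e=0.05):
    """Yamane's simplified formula (assumes p = .50 and z of about 2)."""
    return ceil(N / (1 + N * e ** 2))

N = 4820
n_cochran = cochran_n(N)
n_yamane = yamane_n(N)
print(n_cochran, n_yamane)   # 356 370
```

The two results differ because Yamane's formula rounds z up to 2. Either figure is a minimum for analysis, so the number of questionnaires distributed should be inflated to allow for nonresponse.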

Instruments: (1) Survey of Perceived Organizational Support (SPOS; Eisenberger et al., 1986; 8-item version), Cronbach's α = .92 in pilot study; (2) Maslach Burnout Inventory – Educators Survey (MBI-ES; Maslach et al., 2017), emotional exhaustion subscale (9 items), α = .91. Both administered as paper questionnaires distributed through school coordinators.

Analysis: Descriptive statistics (means, SDs, frequency distributions); Pearson's r to test the bivariate relationship; simple linear regression to estimate the predictive relationship. Demographic variables can be added as covariates in a follow-up multiple regression.

Key Finding (Illustrative): Perceived organizational support explained 28% of the variance in emotional exhaustion scores (R² = .28, F(1, 368) = 143.7, p < .001), with higher perceived support associated with lower emotional exhaustion (β = −.53, p < .001).
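
Readers can cross-check reported regression statistics for internal consistency. With k predictors and n cases, F = (R² / k) / ((1 − R²) / (n − k − 1)), and in a simple regression R² equals the squared standardized coefficient. The sketch below applies this check to the illustrative figures above; the function name is hypothetical.

```python
def f_from_r2(r2, n, k):
    """F statistic and degrees of freedom implied by R-squared."""
    df1, df2 = k, n - k - 1
    f = (r2 / df1) / ((1 - r2) / df2)
    return f, df1, df2

# Using the rounded R-squared (.28) and the squared beta (.53 ** 2 = .2809)
f_rounded, df1, df2 = f_from_r2(0.28, n=370, k=1)
f_from_beta, _, _ = f_from_r2(0.53 ** 2, n=370, k=1)
print(df1, df2, round(f_rounded, 2), round(f_from_beta, 2))
```

Both values fall within rounding distance of the reported F(1, 368) = 143.7, and the degrees of freedom confirm that the model contains a single predictor and n = 370.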

Example 02 — Public Health Research

COVID-19 Vaccine Hesitancy and Health Information Sources Among Adults in a Philippine Urban City

Research Question: What are the predictors of COVID-19 vaccine hesitancy among adults in Cebu City, and which health information sources are associated with lower hesitancy?

Design: Analytical cross-sectional survey.

Population & Sampling: Adults aged 18–65 residing in Cebu City. Multistage cluster sampling: first randomly select barangays (n = 30 of 80), then randomly select households, then randomly select one eligible adult per household; target n = 600.

Instrument: Adapted WHO SAGE Working Group Vaccine Hesitancy Survey questionnaire; additional items on media use and health information seeking behavior. Administered face-to-face by trained interviewers.

Analysis: Binary logistic regression with vaccine hesitancy (hesitant/not hesitant) as the dependent variable; predictors include age, sex, education, income, trust in health authorities, and primary health information source.

Example 03 — Organizational Research

Remote Work Flexibility and Job Satisfaction Among Knowledge Workers in the BPO Sector

Research Question: Does work schedule flexibility mediate the relationship between remote work arrangements and job satisfaction among BPO employees in the Philippines?

Design: Analytical cross-sectional survey; mediation analysis framework.

Instrument: Online survey administered via Microsoft Forms: (1) Telework and Flexibility Scale (adapted from Golden & Veiga, 2005); (2) Job Satisfaction Survey (Spector, 1985; 9-dimension scale); (3) Schedule Flexibility Scale (adapted from Baltes et al., 1999). All scales were pilot-tested with 50 BPO employees; Cronbach's alpha ranged from .79 to .91.

Analysis: PROCESS macro (Hayes, 2022) for mediation analysis; bootstrapped confidence intervals (5,000 iterations) to test the indirect effect of remote work on job satisfaction through schedule flexibility.
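
The logic of a bootstrapped indirect effect can be sketched in plain Python. This is not the PROCESS macro itself but a minimal percentile-bootstrap illustration under stated assumptions: the data are simulated, the variable names are hypothetical, and 1,000 resamples are used instead of 5,000 only to keep the example fast.

```python
import random

def cross_sum(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def indirect_effect(x, m, y):
    """a*b: slope of M on X times the slope of Y on M controlling for X."""
    sxx, smm = cross_sum(x, x), cross_sum(m, m)
    sxm, sxy, smy = cross_sum(x, m), cross_sum(x, y), cross_sum(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, seed=1):
    """95% percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]      # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    return estimates[round(0.025 * n_boot)], estimates[round(0.975 * n_boot) - 1]

# Simulated data: x = remote work (0/1), m = schedule flexibility, y = job satisfaction
sim = random.Random(42)
n = 300
x = [i % 2 for i in range(n)]
m = [1.0 * xi + sim.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.1 * xi + sim.gauss(0, 1) for xi, mi in zip(x, m)]

estimate = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y)
print(f"indirect effect = {estimate:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

The indirect effect is judged statistically distinguishable from zero when the bootstrap interval excludes zero. Because the design is cross-sectional, such a result supports a mediation model statistically but does not establish the causal ordering of the variables.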

Example 04 — Longitudinal Survey Design

Academic Self-Efficacy Trajectories in Senior High School Students

Research Question: How does academic self-efficacy change from Grade 11 to Grade 12, and what school-level and student-level factors predict trajectories of change?

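Design: Two-wave panel study; data collected at Grade 11 (Time 1) and Grade 12 (Time 2, 12 months later). Multilevel modeling of change to account for nesting of students within schools. With only two waves, change can be modeled as a difference score or as Time 2 self-efficacy adjusted for Time 1; estimating the shape of individual growth trajectories would require at least three waves.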

Instrument: Academic Self-Efficacy Scale (Bandura, 2006); 7-point response scale (1 = "cannot do at all" to 7 = "highly certain can do").

Note on Attrition: Of 850 students surveyed at Time 1, 763 (89.8%) provided complete data at Time 2. Attrition analysis showed no significant difference between retained and lost participants on Time 1 self-efficacy scores, sex, or school type, suggesting attrition was approximately random and unlikely to bias findings substantially.

Section 15

Classroom Activities for Teachers

The following structured activities are designed for instructors teaching research methodology at the graduate or advanced undergraduate level. Each activity engages students as active producers of knowledge rather than passive consumers of information, consistent with constructivist principles of adult learning (Merriam & Bierema, 2014). All activities can be adapted for face-to-face, blended, or fully online modalities.

The Survey Autopsy: Diagnosing a Flawed Questionnaire

Individual + Group 60–90 min

Learning Objectives

Identify common questionnaire design errors in real instruments
Articulate how specific errors threaten measurement validity
Revise flawed items to meet professional design standards

Materials

A deliberately flawed questionnaire prepared by the instructor (containing double-barreled, leading, ambiguous, and response-mismatched items)
Questionnaire Design Checklist (instructor-provided rubric)

Procedure

Step 1 (15 min, individual): Each student receives the flawed questionnaire and reviews it independently, flagging problematic items and naming the specific error type.
Step 2 (20 min, pairs): Students compare their analyses with a partner, resolve disagreements, and together revise each flawed item.
Step 3 (20 min, full class): Pairs share their revisions. The class discusses competing revisions and reaches consensus on best-practice versions of each item.
Step 4 (15 min, individual reflection): Each student writes a brief reflection on which error type they found most difficult to detect and why.

Debrief Questions

Which questionnaire flaw is most likely to go undetected during instrument development? Why?
How does cognitive interviewing address the limitations of expert review alone?

Sampling Simulation: Who Gets Selected and Who Does Not?

Group Activity 45–60 min

Learning Objectives

Differentiate among probability sampling methods through direct application
Compute sampling intervals and select samples from a given frame
Evaluate sampling methods in terms of cost, representativeness, and feasibility

Materials

A printed list of 100 fictional faculty members with demographic attributes (sex, department, years of service, rank)
Random number table or random number generator (phone app)
Sampling comparison worksheet

Procedure

Groups of 4–5 students each draw a sample of n = 20 using a different assigned method: simple random, systematic, stratified by sex, cluster by department.
Each group computes the proportion of women, average years of service, and proportion of full professors in their sample.
Groups compare results: which method produced the sample most representative of the full list? Which method would be most efficient if the researcher had limited time?
Instructor facilitates a whole-class comparison of how different methods affect sample composition on these known characteristics.
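
Instructors who prefer a digital version of this activity could adapt the sketch below, which draws n = 20 from a frame of 100 using each of the four assigned methods. The frame is randomly generated and entirely hypothetical (five departments of 20 members each), and the function names are illustrative.

```python
import random

rng = random.Random(7)
departments = ["Education", "Nursing", "Engineering", "Business", "Arts"]

# Hypothetical sampling frame: 100 faculty members, 20 per department
frame = [
    {"id": i, "sex": rng.choice(["F", "M"]), "dept": departments[(i - 1) // 20]}
    for i in range(1, 101)
]

def simple_random(frame, n):
    return rng.sample(frame, n)

def systematic(frame, n):
    k = len(frame) // n                    # sampling interval
    start = rng.randrange(k)               # random start within the first interval
    return frame[start::k][:n]

def stratified_by_sex(frame, n):
    sample = []
    for sex in ("F", "M"):
        stratum = [p for p in frame if p["sex"] == sex]
        share = round(n * len(stratum) / len(frame))   # proportional allocation
        sample.extend(rng.sample(stratum, share))
    return sample

def cluster_by_dept(frame, n_clusters):
    chosen = rng.sample(departments, n_clusters)       # select whole departments
    return [p for p in frame if p["dept"] in chosen]

def prop_female(people):
    return sum(p["sex"] == "F" for p in people) / len(people)

samples = {
    "simple random": simple_random(frame, 20),
    "systematic": systematic(frame, 20),
    "stratified": stratified_by_sex(frame, 20),
    "cluster": cluster_by_dept(frame, 1),
}
print(f"frame: {prop_female(frame):.2f}")
for name, s in samples.items():
    print(f"{name}: n = {len(s)}, proportion female = {prop_female(s):.2f}")
```

Rerunning with different seeds shows the stratified sample tracking the frame's sex distribution closely by construction, while the cluster sample is the most variable on any attribute that differs between departments.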

Extension for Advanced Students

Calculate the design effect (DEFF) for the cluster sampling results to illustrate the efficiency trade-off.
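
One common approximation for this extension is Kish's formula, DEFF = 1 + (m − 1)ρ, where m is the average cluster size and ρ is the intraclass correlation. The sketch below assumes that approximation; the value of ρ is hypothetical and would in practice be estimated from the data.

```python
def design_effect(avg_cluster_size, icc):
    """Kish's approximate design effect for cluster sampling."""
    return 1 + (avg_cluster_size - 1) * icc

def effective_sample_size(n, deff):
    """Size of a simple random sample with equivalent precision."""
    return n / deff

deff = design_effect(avg_cluster_size=20, icc=0.05)   # icc of 0.05 is assumed
n_eff = effective_sample_size(20, deff)
print(f"DEFF = {deff:.2f}, effective n = {n_eff:.1f}")
```

Even a modest intraclass correlation nearly halves the effective sample size when a whole department of 20 is sampled as one cluster, which is the efficiency trade-off the extension is meant to expose.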

Mini Survey Project: From Concept to Data Report

Group Project 2–3 Weeks Capstone

Learning Objectives

Experience the complete survey research process from design to reporting
Apply instrument design, sampling, and data analysis skills in an integrated manner
Communicate findings in an academic report format

Overview

Groups of 3–4 students design and conduct a small-scale survey study within their academic community. The research question must be approved by the instructor. The project unfolds in three phases:

Phase 1 — Design (Week 1)

Define a research question and identify the target population within the institution
Select a validated instrument or construct a 10–15 item scale; conduct a cognitive interview with 3 classmates not in the group
Determine sampling method and compute sample size; prepare IRB protocol (simplified ethical clearance form)

Phase 2 — Data Collection (Week 2)

Administer the survey; implement at least one follow-up contact for non-respondents
Compute and report the response rate

Phase 3 — Analysis and Report (Week 3)

Clean data; compute descriptive statistics and at least one inferential test
Submit a 2,000-word research report in APA 7th edition format, including method, results, discussion, limitations, and references
Present findings in a 10-minute class presentation

The Response Bias Lab: A Live Classroom Experiment

Full Class 30–40 min

Learning Objectives

Observe question-wording and context effects on survey responses in real time
Understand why experimental evidence, not intuition, drives questionnaire design guidelines

Procedure

Split-ballot version: Divide the class randomly into two groups. Give Group A a questionnaire that asks about "freedom of speech" before a question on a controversial topic. Give Group B the same controversial question, preceded instead by an item about "restrictions on hate speech." Both versions address the same underlying attitude, so any difference between the groups' answers can be attributed to the framing.
Collect responses anonymously (e.g., folded paper slips or anonymous sticky-note counts); a show of hands would expose individual answers.
Reveal and compare the response distributions between groups.
Debrief: How did framing (question wording) shape responses? What does this imply for instrument design?
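
If the class wants to go beyond eyeballing the two distributions, a 2 × 2 chi-square test of independence can be computed by hand or with the sketch below. It assumes responses are collapsed to agree versus disagree and uses no continuity correction; the counts are hypothetical.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (df = 1)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))     # upper-tail probability for chi-square, df = 1
    return chi2, p_value

# Hypothetical counts: rows = Group A, Group B; columns = agree, disagree
chi2, p_value = chi_square_2x2(15, 5, 7, 13)
print(f"chi-square(1) = {chi2:.2f}, p = {p_value:.3f}")
```

With small classes, expected cell frequencies may fall below 5, in which case the chi-square approximation is unreliable and the result should be treated as a demonstration only.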

Instructor Note

This activity is modeled on the classic question-wording and context experiments reported by Schuman and Presser (1981). Students are often genuinely surprised by the magnitude of the effect, making it a memorable illustration of why careful wording matters.

Ethics Deliberation: The Informed Consent Dilemma

Discussion + Role Play 45 min

Learning Objectives

Apply ethical principles (autonomy, beneficence, non-maleficence, justice) to realistic survey research scenarios
Articulate the researcher's ethical responsibilities to participants and to the scientific community

Scenario Cards (Distribute one per small group)

Scenario A: A researcher studying stigma toward persons with disabilities wants to use deception, withholding the study's true purpose in the consent form, to prevent socially desirable responding. How should informed consent be handled?
Scenario B: A graduate student conducts an online survey on student mental health and discovers one respondent's answers suggest suicidal ideation. What are the researcher's ethical obligations given that the survey is anonymous?
Scenario C: A department chair distributes a survey on employee satisfaction and asks subordinates to complete it during a required meeting. What ethical concerns does this raise, and how should they be addressed?
Scenario D: A researcher collects survey data and finds that results do not support the sponsor's expected conclusions. The sponsor requests that these findings be omitted from the report. What should the researcher do?

Procedure

Groups deliberate for 15 minutes, applying the Belmont Report principles and PHREB guidelines to their scenario.
Each group presents their decision and rationale (5 minutes each).
Full-class debrief: Where did groups disagree? What ethical tensions are irresolvable through procedural rules alone?

Section 16

References

All references follow APA 7th Edition format. URLs included where publicly accessible. Citations prioritize publications from 2010–2026 to reflect current methodological practice.

American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). American Psychological Association. https://doi.org/10.1037/0000165-000
Baltes, B. B., Briggs, T. E., Huff, J. W., Wright, J. A., & Neuman, G. A. (1999). Flexible and compressed workweek schedules: A meta-analysis of their effects on work-related criteria. Journal of Applied Psychology, 84(4), 496–513. https://doi.org/10.1037/0021-9010.84.4.496
Bandura, A. (2006). Guide for constructing self-efficacy scales. In F. Pajares & T. Urdan (Eds.), Self-efficacy beliefs of adolescents (Vol. 5, pp. 307–337). Information Age Publishing.
Brick, J. M., & Williams, D. (2013). Explaining rising nonresponse rates in cross-sectional surveys. The ANNALS of the American Academy of Political and Social Science, 645(1), 36–59. https://doi.org/10.1177/0002716212456834
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Converse, J. M. (1987). Survey research in the United States: Roots and emergence 1890–1960. University of California Press.
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method (4th ed.). Wiley.
Eisenberger, R., Huntington, R., Hutchison, S., & Sowa, D. (1986). Perceived organizational support. Journal of Applied Psychology, 71(3), 500–507. https://doi.org/10.1037/0021-9010.71.3.500
Fowler, F. J. (2014). Survey research methods (5th ed.). SAGE Publications.
Golden, T. D., & Veiga, J. F. (2005). The impact of extent of telecommuting on job satisfaction: Resolving inconsistent findings. Journal of Management, 31(2), 301–318. https://doi.org/10.1177/0149206304271768
Groves, R. M., & Peytcheva, E. (2008). The impact of nonresponse rates on nonresponse bias: A meta-analysis. Public Opinion Quarterly, 72(2), 167–189. https://doi.org/10.1093/poq/nfn011
Groves, R. M., Fowler, F. J., Jr., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2011). Survey methodology (2nd ed.). Wiley.
Hayes, A. F. (2022). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (3rd ed.). Guilford Press.
Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213–236. https://doi.org/10.1002/acp.2350050305
Krosnick, J. A., & Presser, S. (2010). Question and questionnaire design. In P. V. Marsden & J. D. Wright (Eds.), Handbook of survey research (2nd ed., pp. 263–314). Emerald Group Publishing.
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1–55.
Maslach, C., Leiter, M. P., & Jackson, S. E. (2017). Maslach Burnout Inventory manual (4th ed.). Mind Garden.
McNeish, D. (2018). Thanks coefficient alpha, we'll take it from here. Psychological Methods, 23(3), 412–433. https://doi.org/10.1037/met0000144
Merriam, S. B., & Bierema, L. L. (2014). Adult learning: Linking theory and practice. Jossey-Bass.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.
Philippine Health Research Ethics Board (PHREB). (2017). National ethical guidelines for health and health-related research. Department of Health, Republic of the Philippines. https://phreb.doh.gov.ph
Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63, 539–569. https://doi.org/10.1146/annurev-psych-120710-100452
Reichheld, F. F. (2003). The one number you need to grow. Harvard Business Review, 81(12), 46–55.
Schuman, H., & Presser, S. (1981). Questions and answers in attitude surveys: Experiments on question form, wording, and context. Academic Press.
Spector, P. E. (1985). Measurement of human service staff satisfaction: Development of the Job Satisfaction Survey. American Journal of Community Psychology, 13(6), 693–713. https://doi.org/10.1007/BF00929796
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680. https://doi.org/10.1126/science.103.2684.677
Tourangeau, R., Rips, L. J., & Rasinski, K. (2000). The psychology of survey response. Cambridge University Press.
Tourangeau, R., Conrad, F. G., & Couper, M. P. (2013). The science of web surveys. Oxford University Press.
Weijters, B., Cabooter, E., & Schillewaert, N. (2013). The effect of rating scale format on response styles: The number of response categories and response category labels. International Journal of Research in Marketing, 27(3), 236–247. https://doi.org/10.1016/j.ijresmar.2010.02.007
