The Histogram in Statistical and Academic Practice

The histogram is a graphical representation of the frequency distribution of a single continuous or discrete variable. Unlike the bar chart, which displays values for distinct categorical groups separated by visible gaps, the histogram encodes data through contiguous rectangular bars whose widths span equal intervals of the measurement scale and whose heights are proportional to the count, relative frequency, or density of observations falling within each interval. The contiguity of bars is not a stylistic choice but a mathematical statement: the absence of gaps signals that the underlying variable is continuous and that adjacent intervals share a common boundary.

The histogram was formally introduced by Karl Pearson in his 1895 paper in the Philosophical Transactions of the Royal Society, where he used it to visualize probability density from empirical data. Pearson's contribution was not merely graphical; it was epistemological. He recognised that any finite set of observations is a noisy approximation of an underlying population distribution. In modern terms, the histogram is one of the earliest systematic approaches to nonparametric density estimation from sample data.

Distinction from the Bar Chart
A histogram and a bar chart are not interchangeable. A histogram represents the frequency distribution of a single quantitative variable; its bars are contiguous because adjacent intervals on the measurement scale share boundaries. A bar chart compares values across discrete categorical groups; its bars are separated by gaps to signal the absence of continuity. Applying a histogram to categorical data, or a bar chart to a continuous distribution, is an error of graphical representation that journal reviewers commonly flag as grounds for revision.

Bin Width Selection: The Central Methodological Decision

The bin width, also called the class width or interval width, is the most consequential parameter in histogram construction. It determines the resolution at which the data are displayed. A bin width that is too wide collapses the distribution into a few featureless bars, obscuring multimodality, skewness, and outlier structure. A bin width that is too narrow produces a jagged, noisy display in which random sampling variation dominates the visible pattern. The problem of optimal bin width is an active area of research in nonparametric statistics and is analogous to the bandwidth selection problem in kernel density estimation.

Three rules have achieved canonical status in statistical practice. Each encodes a different philosophical stance about the nature of the data and the goals of the visualization.

Formula Reference: The Three Standard Rules

| Rule | Formula | Output | Assumption | Sample range | Best used when |
|---|---|---|---|---|---|
| Sturges (1926) | $k = 1 + \log_2 n$ | $k$ = number of bins | Bin counts follow binomial coefficients, approximating a normal distribution | $n$ = 30 to 200 | Exploratory analysis of moderate, roughly symmetric samples (default in R's hist()) |
| Freedman-Diaconis (1981) | $h = 2 \times \mathrm{IQR} \times n^{-1/3}$ | $h$ = bin width | Uses the IQR; robust to outliers; no normality assumed | $n \ge 100$ recommended | Large samples, skewed distributions, or data with outliers |
| Scott (1979) | $h = 3.5 \times \mathrm{SD} \times n^{-1/3}$ | $h$ = bin width | Minimises MISE; assumes approximately normal data | Any $n$; optimal under normality | Normally distributed data; built-in option in R and MATLAB |

Converting bin width to bin count: $k = \lceil (\max - \min) / h \rceil$; this applies to Freedman-Diaconis and Scott only. IQR = interquartile range; SD = sample standard deviation; $n$ = sample size; MISE = mean integrated squared error.
Sturges (1926): Number of Bins
$$k = 1 + \log_2 n, \qquad h = \frac{\max - \min}{k}$$

Derived by assuming the bin counts follow binomial coefficients, which approximate a normal distribution. For $n = 50$: $k = 1 + \log_2 50 \approx 6.64$, rounded up to $k = 7$. The rule tends to produce too few bins for large or skewed samples.
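A minimal Python sketch of Sturges' rule follows. The function names are hypothetical, and rounding $k$ up with a ceiling is an assumption chosen to match the $n = 50$ example above.

```python
import math


def sturges_bins(n: int) -> int:
    """Number of bins under Sturges' rule, k = 1 + log2(n), rounded up."""
    if n < 1:
        raise ValueError("n must be a positive sample size")
    return math.ceil(1 + math.log2(n))


def sturges_width(data: list[float]) -> float:
    """Bin width implied by Sturges' rule: h = (max - min) / k."""
    k = sturges_bins(len(data))
    return (max(data) - min(data)) / k


# Worked example from the text: 1 + log2(50) is about 6.64, rounded up to 7.
print(sturges_bins(50))  # 7
```

Because $k$ grows only logarithmically, doubling the sample size adds roughly one bin, which is why the rule becomes coarse for large samples.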

Freedman-Diaconis (1981): Bin Width
$$h = 2 \times \mathrm{IQR} \times n^{-1/3}, \qquad k = \left\lceil \frac{\max - \min}{h} \right\rceil$$

Uses the IQR rather than the SD, making it robust against outliers. For $n = 100$ with IQR = 10: $h = 2 \times 10 \times 100^{-1/3} \approx 4.31$. For large samples it typically produces more bins than Sturges.
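The Freedman-Diaconis width and the width-to-count conversion can be sketched as below. Both function names are hypothetical, and the floor of one bin (for data with zero range) is an added assumption, not part of the published rule.

```python
import math


def freedman_diaconis_width(iqr: float, n: int) -> float:
    """Freedman-Diaconis bin width: h = 2 * IQR * n^(-1/3)."""
    return 2 * iqr * n ** (-1 / 3)


def bins_from_width(data_min: float, data_max: float, h: float) -> int:
    """Bin count k = ceil((max - min) / h), with at least one bin."""
    return max(1, math.ceil((data_max - data_min) / h))


# Worked example from the text: n = 100 and IQR = 10.
h = freedman_diaconis_width(10, 100)
print(round(h, 2))  # 4.31

# Hypothetical data range of 0 to 100 for the same sample.
print(bins_from_width(0, 100, h))  # 24
```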

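Scott (1979): Bin Width
$$h = 3.5 \times \mathrm{SD} \times n^{-1/3}, \qquad k = \left\lceil \frac{\max - \min}{h} \right\rceil$$

Minimises the mean integrated squared error (MISE) under normality. For $n = 100$ with SD = 15: $h = 3.5 \times 15 \times 100^{-1/3} \approx 11.31$. Available in R as hist(x, breaks = "Scott") and in MATLAB as histogram(x, 'BinMethod', 'scott'); note that R's hist() defaults to Sturges.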
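A sketch of Scott's rule, assuming the SD is the sample standard deviation with an $n - 1$ denominator (as Python's statistics.stdev computes it). The function names are hypothetical.

```python
import statistics


def scott_width(data: list[float]) -> float:
    """Scott's bin width: h = 3.5 * SD * n^(-1/3), SD with n - 1 denominator."""
    n = len(data)
    return 3.5 * statistics.stdev(data) * n ** (-1 / 3)


def scott_width_from_sd(sd: float, n: int) -> float:
    """Same rule when only the summary statistics are known."""
    return 3.5 * sd * n ** (-1 / 3)


# Worked example from the text: n = 100 with SD = 15.
print(round(scott_width_from_sd(15, 100), 2))  # 11.31
```

Because the SD is inflated by extreme values, a single outlier widens Scott's bins more than it widens Freedman-Diaconis bins, which rely on the IQR.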

Reporting Requirement
There is no universally optimal bin width. Each rule encodes different assumptions about the data distribution. Researchers must report which rule was applied and justify the choice in the figure caption or methods section. When the three rules produce substantially different histograms, this discrepancy is itself informative about the distributional properties of the data and should be discussed rather than concealed by selecting the most visually convenient result.
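A minimal sketch of the side-by-side comparison this requirement calls for: compute $k$ under all three rules for the same sample. The function compare_bin_rules and the sample are hypothetical; quartiles use linear interpolation and counts are rounded up, as elsewhere in this section.

```python
import math
import statistics


def compare_bin_rules(data: list[float]) -> dict[str, int]:
    """Bin counts k under Sturges, Freedman-Diaconis, and Scott for one sample."""
    n = len(data)
    data_range = max(data) - min(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    h_fd = 2 * (q3 - q1) * n ** (-1 / 3)
    h_scott = 3.5 * statistics.stdev(data) * n ** (-1 / 3)

    def to_bins(width: float) -> int:
        # A zero width (e.g. IQR = 0 for heavily tied data) falls back to one bin.
        return 1 if width == 0 else max(1, math.ceil(data_range / width))

    return {
        "Sturges": math.ceil(1 + math.log2(n)),
        "Freedman-Diaconis": to_bins(h_fd),
        "Scott": to_bins(h_scott),
    }


# Hypothetical right-skewed sample: a compact body plus a long right tail.
sample = [float(x) for x in range(1, 41)] + [100.0, 150.0, 200.0, 400.0]
for rule, k in compare_bin_rules(sample).items():
    print(f"{rule}: k = {k}")
```

On skewed data like this, the outliers inflate the SD (fewer Scott bins) but barely move the IQR (many Freedman-Diaconis bins), which is exactly the kind of disagreement worth discussing in the methods section.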

Descriptive Statistics Formulas

The following formulas define the descriptive statistics computed by this tool. Where applicable, the sample formulas use an n − 1 denominator, which makes the sample variance an unbiased estimate of the population variance; the sample standard deviation, as its square root, remains slightly biased downward.

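Central Tendency
Mean
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Median
Middle value of the sorted data (or the average of the two middle values when n is even)
Spread and Variability
Sample Variance and Standard Deviation
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}$$
Standard Error and 95% Confidence Interval
$$\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad 95\%\ \mathrm{CI} = \bar{x} \pm t_{0.025,\,n-1} \times \mathrm{SE}$$
where $t_{0.025,\,n-1}$ is the upper 2.5% critical value of the t distribution with n − 1 degrees of freedom.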
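The central-tendency and spread formulas above can be sketched with Python's standard library. The Python standard library has no t-distribution quantile function, so this hypothetical helper takes the critical value as a parameter; the sample and the table value $t_{0.025,9} \approx 2.262$ are supplied for illustration.

```python
import math
import statistics


def describe_center_and_spread(data: list[float], t_crit: float) -> dict[str, float]:
    """Mean, median, sample SD (n - 1), standard error, and a 95% CI.

    t_crit must be the upper 2.5% critical value of the t distribution
    with n - 1 degrees of freedom (taken from a table or another library).
    """
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # n - 1 denominator
    se = s / math.sqrt(n)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "sd": s,
        "se": se,
        "ci_low": mean - t_crit * se,
        "ci_high": mean + t_crit * se,
    }


# Hypothetical sample of n = 10; t(0.025, 9) is about 2.262 in standard tables.
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 14.0, 13.0]
summary = describe_center_and_spread(sample, t_crit=2.262)
print({key: round(value, 3) for key, value in summary.items()})
```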
Shape: Skewness and Kurtosis
Sample Skewness (Fisher g1, bias-corrected)
$$g_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$
• |g1| ≤ 0.5: approximately symmetric
• 0.5 < |g1| ≤ 1.0: moderately skewed
• |g1| > 1.0: substantially skewed
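Sample Excess Kurtosis (Fisher g2, bias-corrected)
$$g_2 = \frac{n-1}{(n-2)(n-3)} \left[ (n+1) K_4 + 6 \right]$$
where $K_4$ is the uncorrected moment estimate of excess kurtosis:
$$K_4 = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^2} - 3$$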
• g2 = 0: mesokurtic (normal tail weight)
• g2 > 0: leptokurtic (heavy tails)
• g2 < 0: platykurtic (light tails)
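The two bias-corrected shape statistics can be sketched directly from the formulas, with $s$ computed using the $n - 1$ denominator. The function names are hypothetical; the inputs are small illustrative lists.

```python
import statistics


def sample_skewness(data: list[float]) -> float:
    """Bias-corrected skewness g1 = n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # n - 1 denominator, as in the formula
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def sample_excess_kurtosis(data: list[float]) -> float:
    """Bias-corrected excess kurtosis g2 = (n-1)/((n-2)(n-3)) * ((n+1) K4 + 6)."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 observations")
    mean = statistics.fmean(data)
    sum2 = sum((x - mean) ** 2 for x in data)
    sum4 = sum((x - mean) ** 4 for x in data)
    k4 = n * sum4 / sum2 ** 2 - 3  # uncorrected moment estimate K4
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * k4 + 6)


print(round(sample_skewness([1, 2, 3, 4, 10]), 3))       # 1.697
print(round(sample_excess_kurtosis([1, 2, 3, 4, 5]), 3))  # -1.2
```

These are the same adjusted estimators that spreadsheet functions such as SKEW and KURT report, so results can be cross-checked against them.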
Order Statistics and IQR
Quartiles and Interquartile Range
Q1 = 25th percentile (first quartile)
Q2 = 50th percentile (median)
Q3 = 75th percentile (third quartile)
IQR = Q3 − Q1
Computed using linear interpolation between order statistics.
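One way to reproduce linear-interpolation quartiles is Python's statistics.quantiles with method="inclusive", which treats the minimum and maximum as the 0th and 100th percentiles (the same convention as R's default type 7 quantiles). The wrapper name is hypothetical.

```python
import statistics


def quartiles_and_iqr(data: list[float]) -> tuple[float, float, float, float]:
    """Q1, Q2, Q3 by linear interpolation between order statistics, plus IQR."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


print(quartiles_and_iqr([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.75, 4.5, 6.25, 3.5)
```

Other software may default to a different quantile definition, so small samples can yield slightly different quartiles and IQR values across tools.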
Vertical Axis Quantities
Frequency, Relative Frequency, and Density
Frequency: $f_j$ = count of observations in bin $j$
Relative frequency: $rf_j = f_j / n$, so that $\sum_j rf_j = 1$
Density: $d_j = rf_j / h$, so that $\sum_j d_j h = 1$

Density is required when overlaying a fitted distribution curve. The area of each density bar equals the relative frequency of that interval, so areas sum to 1 regardless of bin width.
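The three vertical-axis quantities can be computed together as sketched below, assuming equal-width, left-closed bins $[a, a + h)$ with the maximum value placed in the last bin. The function name and data are hypothetical.

```python
import math


def histogram_table(data: list[float], h: float) -> list[tuple[float, int, float, float]]:
    """Per-bin (left edge, frequency, relative frequency, density) for width h."""
    n = len(data)
    lo = min(data)
    k = max(1, math.ceil((max(data) - lo) / h))
    counts = [0] * k
    for x in data:
        j = min(int((x - lo) // h), k - 1)  # clamp the maximum into the last bin
        counts[j] += 1
    return [
        (lo + j * h, f, f / n, f / (n * h))  # density = relative frequency / h
        for j, f in enumerate(counts)
    ]


data = [1.0, 1.5, 2.2, 2.8, 3.1, 3.3, 3.9, 4.0]
width = 1.0
rows = histogram_table(data, width)
for left, freq, rel, dens in rows:
    print(f"[{left:.1f}, {left + width:.1f}): f={freq}, rf={rel:.3f}, d={dens:.3f}")
print(sum(rel for _, _, rel, _ in rows))            # relative frequencies sum to 1
print(sum(dens * width for _, _, _, dens in rows))  # bar areas sum to 1
```

Changing the width changes the frequencies and densities bar by bar, but the total density area stays at one.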

What the Histogram Communicates

A well-constructed histogram communicates four distributional features that no summary statistic can convey alone. Central tendency is visible as the approximate location of the tallest bars. Spread is visible as the horizontal extent of the distribution. Shape is visible as symmetry or asymmetry, the presence of a single mode or multiple modes, and the thickness of the tails. Outliers appear as isolated bars at the extreme ends of the distribution, separated from the main body of observations.

These features correspond directly to the summary statistics produced by descriptive analysis. Positive skewness produces a right tail that extends further than the left and typically pulls the mean above the median; negative skewness reverses this pattern. Bimodal distributions, which appear as two distinct peaks in the histogram, may indicate that the sample is drawn from two subpopulations with different characteristics. No single mean or standard deviation can capture such a finding, and it bears directly on the appropriateness of parametric tests that assume a single homogeneous population.

Skewness Interpretation
Skewness quantifies the asymmetry of a distribution. A value between -0.5 and 0.5 indicates approximate symmetry. Values between 0.5 and 1.0 (or -0.5 to -1.0) indicate moderate skewness. Values beyond 1.0 or -1.0 indicate substantial skewness. In practice, skewness above 2.0 in absolute value raises serious concerns about the validity of statistical procedures that assume normality.
Excess Kurtosis Interpretation
Excess kurtosis measures tail heaviness relative to the normal distribution. A value of zero indicates normal tail weight (mesokurtic). Positive excess kurtosis (leptokurtic) indicates heavier tails, often accompanied by a sharper peak, and is common in financial returns and reaction time data. Negative excess kurtosis (platykurtic) indicates lighter tails and typically a flatter peak. Values above 3.0 suggest substantial departure from normality; strongly negative values are rare, because the excess kurtosis of any distribution cannot fall below −2.
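The rule-of-thumb thresholds in these two paragraphs can be encoded as a small labelling helper. The function name and label wording are hypothetical; the cut-offs are the ones stated in the text, and exact zero is treated as mesokurtic only because the text gives no tolerance band.

```python
def interpret_shape(g1: float, g2: float) -> list[str]:
    """Verbal labels for skewness g1 and excess kurtosis g2 (rule-of-thumb cut-offs)."""
    notes = []
    if abs(g1) <= 0.5:
        notes.append("approximately symmetric")
    elif abs(g1) <= 1.0:
        notes.append("moderately skewed")
    else:
        notes.append("substantially skewed")
    if abs(g1) > 2.0:
        notes.append("skewness large enough to question normality-based procedures")
    if g2 > 0:
        notes.append("leptokurtic (heavier tails than normal)")
    elif g2 < 0:
        notes.append("platykurtic (lighter tails than normal)")
    else:
        notes.append("mesokurtic (normal tail weight)")
    if g2 > 3.0:
        notes.append("tail weight suggests substantial departure from normality")
    return notes


print(interpret_shape(1.697, -1.2))
```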

Frequency, Relative Frequency, and Density

The vertical axis of a histogram can represent three distinct quantities, each appropriate for different purposes. Frequency (count) displays the raw number of observations in each bin and is appropriate when the absolute count is the quantity of interest. Relative frequency (proportion) divides each count by the total sample size, producing values that sum to one and allowing comparison across samples of different sizes. Density divides the relative frequency by the bin width, so that the area of each bar equals the relative frequency of its interval and the total area is one whatever bin width is chosen. Density is therefore the correct vertical axis when the histogram is intended as a visual estimate of the probability density function, and it is required when overlaying a fitted parametric distribution curve.

Design Requirements for Academic Publication

The APA Publication Manual (7th edition, 2020) specifies that the figure number appear in bold above the image, with the figure title in italic title case on the line below the number. Explanatory material goes in a note placed below the image. For a histogram, the note should describe the variable displayed, identify the bin-width rule used, state the sample size, and define any abbreviations. The figure must be referenced in the body text before it appears. For black-and-white printing, the bar fill should be a single grey shade distinct from the axis lines and gridlines, with no gradient fills or three-dimensional effects, since these distort the reading of bar heights.

APA 7th Edition Requirements for Histograms
  1. Label. Figure 1 (bold) appears above the image.
  2. Title. The figure title in italic title case appears on the next line, also above the image.
  3. Note. A note below the image names the variable, states the bin-width rule, reports the sample size, and ends with a period.
  4. In-text reference. The figure must be cited in the text (for example, "Figure 1 displays..." or "see Figure 1") before it appears.
  5. Zero baseline. The frequency or density axis must begin at zero. Any other origin constitutes graphical misrepresentation.

Normality Testing and the Histogram

The histogram is frequently used as a preliminary visual check for normality before applying parametric statistical procedures. A normally distributed sample produces a histogram that is approximately bell-shaped, symmetric, and unimodal. However, visual assessment is unreliable for small samples because random sampling variation produces substantial apparent deviations from normality even when the population is exactly normal. For sample sizes below 50, formal tests such as the Shapiro-Wilk test provide more reliable evidence. For samples above 200, formal tests become hypersensitive and routinely reject normality for distributions that are practically indistinguishable from normal for the purposes of the analysis. In these cases, the histogram, supplemented by skewness and kurtosis statistics, provides more actionable guidance than the p-value of a normality test.
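The small-sample problem can be illustrated with a quick simulation: draw repeated samples from an exactly normal population and measure how much the sample skewness varies by chance alone. This is a sketch under assumed settings (500 replications, fixed seed, hypothetical function names); it is not a formal normality test. A Shapiro-Wilk test is available in SciPy as scipy.stats.shapiro.

```python
import random
import statistics


def skewness(data: list[float]) -> float:
    """Bias-corrected sample skewness g1 (same formula as in the shape section)."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def skewness_spread(n: int, replications: int = 500, seed: int = 1) -> float:
    """SD of g1 across samples of size n from a standard normal population.

    Larger values mean more apparent non-normality produced purely by
    sampling variation.
    """
    rng = random.Random(seed)
    values = [
        skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replications)
    ]
    return statistics.stdev(values)


for n in (20, 50, 200):
    print(f"n = {n}: SD of sample skewness approx. {skewness_spread(n):.2f}")
```

Small samples routinely show skewness values that would look alarming in a histogram, even though every sample came from a perfectly normal population.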

Selected Methodological Questions

When should density be used instead of frequency on the vertical axis?

Density should replace frequency or relative frequency on the vertical axis whenever the researcher intends to overlay a fitted probability distribution curve (such as a normal or gamma distribution), because density ensures that the area under each bar equals the relative frequency regardless of bin width. Without density, overlaid distribution curves appear to have the wrong scale relative to the histogram bars. For purely descriptive purposes where the count or proportion is the quantity of interest, frequency or relative frequency is more intuitive for a general audience.
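The scaling issue can be made concrete: a fitted density curve matches density bars directly, but to sit on frequency bars it must be multiplied by $n \times h$. The sketch below assumes a normal fit via Python's statistics.NormalDist.from_samples (mean and $n - 1$ standard deviation); the function name, sample, and bin edges are hypothetical, and the same scaling applies to any fitted density.

```python
import statistics


def normal_overlay(data: list[float], edges: list[float]) -> list[tuple[float, float, float]]:
    """Per bin: (midpoint, curve height on density scale, height on frequency scale)."""
    fit = statistics.NormalDist.from_samples(data)  # mean and SD (n - 1)
    n = len(data)
    rows = []
    for left, right in zip(edges, edges[1:]):
        h = right - left
        mid = (left + right) / 2
        density = fit.pdf(mid)
        rows.append((mid, density, density * n * h))  # frequency scale = density * n * h
    return rows


sample = [4.8, 5.1, 5.5, 5.9, 6.0, 6.2, 6.4, 6.7, 7.1, 7.6]
edges = [4.5, 5.5, 6.5, 7.5, 8.5]
rows = normal_overlay(sample, edges)
for mid, dens, freq_scale in rows:
    print(f"midpoint {mid:.1f}: density {dens:.3f}, frequency scale {freq_scale:.3f}")
```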

Can histograms be used for ordinal data?

Histograms are technically appropriate only for continuous or discrete quantitative variables where a meaningful distance exists between values. Ordinal variables such as Likert-scale ratings lack this property: the distance between "agree" and "strongly agree" is not equivalent to the distance between any other adjacent pair, and representing them as a histogram implies equal interval spacing that does not exist. The appropriate display for ordinal data is a bar chart with separated bars and clearly labelled categories. Using a histogram for ordinal data constitutes a measurement-level violation that misrepresents the structure of the data.

How should multiple histograms be compared?

When comparing histograms across groups or time points, the vertical and horizontal axes must use identical scales across all histograms in the comparison. Differences in axis scaling between panels constitute a form of visual confounding that makes equal distributions appear different and different distributions appear similar. Overlaid or faceted histograms with a shared axis are preferable to separate figures when the purpose is direct distributional comparison. Density on the vertical axis is preferred over frequency when groups have different sample sizes, because it normalises for sample size differences.
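A minimal sketch of a fair comparison: build one set of bin edges from the pooled range of all groups, then compute densities for each group on those shared edges so that groups of different sizes are directly comparable. Function names and data are hypothetical; bins are left-closed except the last, which is closed on both ends.

```python
import math


def shared_edges(groups: list[list[float]], h: float) -> list[float]:
    """Common bin edges of width h spanning the pooled range of all groups."""
    lo = min(min(g) for g in groups)
    hi = max(max(g) for g in groups)
    k = max(1, math.ceil((hi - lo) / h))
    edges = [lo + j * h for j in range(k + 1)]
    edges[-1] = max(edges[-1], hi)  # guard against floating-point shortfall at the top
    return edges


def densities(data: list[float], edges: list[float]) -> list[float]:
    """Density per bin, count / (n * width), on the given edges."""
    n = len(data)
    out = []
    for j, (left, right) in enumerate(zip(edges, edges[1:])):
        last = j == len(edges) - 2
        count = sum(1 for x in data if left <= x < right or (last and x == right))
        out.append(count / (n * (right - left)))
    return out


group_a = [2.0, 3.1, 3.4, 4.2, 5.0]  # n = 5
group_b = [1.0, 2.5, 2.9, 3.3, 3.6, 3.8, 4.1, 4.4, 4.9, 6.0]  # n = 10
edges = shared_edges([group_a, group_b], h=1.0)
print(edges)
print(densities(group_a, edges))
print(densities(group_b, edges))
```

Because both groups use the same edges and each density profile integrates to one, differences between the two rows reflect distributional shape rather than sample size or binning.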