The Histogram in Statistical and Academic Practice
The histogram is a graphical representation of the frequency distribution of a single continuous or discrete variable. Unlike the bar chart, which displays values for distinct categorical groups separated by visible gaps, the histogram encodes data through contiguous rectangular bars whose widths span equal intervals of the measurement scale and whose heights are proportional to the count, relative frequency, or density of observations falling within each interval. The contiguity of bars is not a stylistic choice but a mathematical statement: the absence of gaps signals that the underlying variable is continuous and that adjacent intervals share a common boundary.
The histogram was formally introduced by Karl Pearson in his 1895 paper in the Philosophical Transactions of the Royal Society, where he developed it as a tool for visualizing probability density from empirical data. Pearson's contribution was not merely graphical; it was epistemological. He recognised that any finite set of observations is a noisy approximation of an underlying population distribution, and that the histogram represents one of the earliest systematic approaches to nonparametric density estimation from sample data.
Bin Width Selection: The Central Methodological Decision
The bin width, also called the class width or interval width, is the most consequential parameter in histogram construction. It determines the resolution at which the data are displayed. A bin width that is too wide collapses the distribution into a few featureless bars, obscuring multimodality, skewness, and outlier structure. A bin width that is too narrow produces a jagged, noisy display in which random sampling variation dominates the visible pattern. The problem of optimal bin width is an active area of research in nonparametric statistics and is analogous to the bandwidth selection problem in kernel density estimation.
Three rules have achieved canonical status in statistical practice. Each encodes a different philosophical stance about the nature of the data and the goals of the visualization.
Formula Reference: The Three Standard Rules
| Rule | Formula | Output | Assumption | Sample Range | Best Used When |
|---|---|---|---|---|---|
| Sturges (1926) | k = 1 + log2(n) | k = number of bins | Data follow a binomial distribution; approximate normality | n = 30 to 200 | Exploratory analysis of moderate, roughly symmetric samples |
| Freedman-Diaconis (1981) | h = 2 × IQR × n^(−1/3) | h = bin width | Uses IQR; robust to outliers; no normality assumed | n ≥ 100 recommended | Large samples, skewed distributions, or data with outliers |
| Scott (1979) | h = 3.5 × SD × n^(−1/3) | h = bin width | Minimises MISE; assumes approximately normal data | Any n; optimal under normality | Normally distributed data; available as a built-in option in R and MATLAB |

Converting bin width to bin count: k = ceil( (max − min) / h ); this applies to Freedman-Diaconis and Scott only. IQR = interquartile range; SD = sample standard deviation; n = sample size; MISE = mean integrated squared error.
Sturges. Bin width: h = (max − min) / k. Derived from the binomial distribution assumption. For n = 50: k = 1 + log2(50) = 6.64, rounded up to k = 7. Underestimates the number of bins for large or skewed samples.
Freedman-Diaconis. Bin count: k = ceil( (max − min) / h ). Uses IQR rather than SD, making it robust against outliers. For n = 100 with IQR = 10: h = 2 × 10 × 100^(−1/3) = 4.31. Typically produces more bins than Sturges.
Scott. Bin count: k = ceil( (max − min) / h ). Minimises mean integrated squared error (MISE) under normality. For n = 100 with SD = 15: h = 3.5 × 15 × 100^(−1/3) = 11.31. Available as a built-in option in R (hist(x, breaks = "Scott")) and in MATLAB (the 'scott' bin method); note that R's hist() itself defaults to Sturges.
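The three rules can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names are illustrative and do not come from any particular package, and the data range of 0 to 100 in the final line is a hypothetical value chosen only to show the width-to-count conversion.

```python
import math


def sturges_bins(n: int) -> int:
    """Number of bins from Sturges' rule, k = 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))


def freedman_diaconis_width(iqr: float, n: int) -> float:
    """Bin width h = 2 * IQR * n^(-1/3)."""
    return 2 * iqr * n ** (-1 / 3)


def scott_width(sd: float, n: int) -> float:
    """Bin width h = 3.5 * SD * n^(-1/3)."""
    return 3.5 * sd * n ** (-1 / 3)


def width_to_bins(data_min: float, data_max: float, h: float) -> int:
    """Convert a bin width to a bin count: k = ceil((max - min) / h)."""
    return math.ceil((data_max - data_min) / h)


print(sturges_bins(50))                              # 7
print(round(freedman_diaconis_width(10, 100), 2))    # 4.31
print(round(scott_width(15, 100), 2))                # 11.31

# Hypothetical data range of 0 to 100 with the Freedman-Diaconis width above.
print(width_to_bins(0, 100, freedman_diaconis_width(10, 100)))  # 24
```

Because n enters through n^(−1/3), both width-based rules shrink the bins slowly as the sample grows: multiplying n by eight only halves the bin width.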
Descriptive Statistics Formulas
The following formulas define the descriptive statistics computed by this tool. All estimates use the sample formulas with an n − 1 denominator where applicable, which gives an unbiased estimate of the population variance.
s = √(s²)
95% CI = x̄ ± t(0.025, n−1) × SE
• |g1| ≤ 0.5: approximately symmetric
• 0.5 < |g1| ≤ 1.0: moderately skewed
• |g1| > 1.0: substantially skewed
where K4 = [n ∑(xi − x̄)^4 / s^4] − 3
• g2 = 0: mesokurtic (normal tail weight)
• g2 > 0: leptokurtic (heavy tails)
• g2 < 0: platykurtic (light tails)
Q2 = 50th percentile (median)
Q3 = 75th percentile (third quartile)
IQR = Q3 − Q1
Computed using linear interpolation between order statistics.
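Several interpolation conventions exist, and the text does not say which one is used. The sketch below assumes the common convention that places the p-th percentile at position (n − 1) × p in the sorted data (the convention used by NumPy's default percentile method and R's type 7 quantiles). The function name and the sample values are illustrative only.

```python
import math


def percentile(sorted_values, p):
    """Percentile for p in [0, 1], interpolating linearly between the two
    order statistics that surround position (n - 1) * p."""
    n = len(sorted_values)
    pos = (n - 1) * p
    lower = math.floor(pos)
    upper = min(lower + 1, n - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


data = sorted([2, 4, 4, 5, 7, 9, 10, 12])  # illustrative sample
q1 = percentile(data, 0.25)
q2 = percentile(data, 0.50)
q3 = percentile(data, 0.75)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 4.0 6.0 9.25 5.25
```

Other conventions give slightly different quartiles for small samples, so the IQR fed into the Freedman-Diaconis rule can differ between software packages.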
Relative frequency: rf_j = f_j / n (∑ rf_j = 1)
Density: d_j = rf_j / h (∑ d_j × h = 1)
Density is required when overlaying a fitted distribution curve. The area of each density bar equals the relative frequency of that interval, so areas sum to 1 regardless of bin width.
What the Histogram Communicates
A well-constructed histogram communicates four distributional features that no summary statistic can convey alone. Central tendency is visible as the approximate location of the tallest bars. Spread is visible as the horizontal extent of the distribution. Shape is visible as symmetry or asymmetry, the presence of a single mode or multiple modes, and the thickness of the tails. Outliers appear as isolated bars at the extreme ends of the distribution, separated from the main body of observations.
These features correspond directly to the summary statistics produced by descriptive analysis. Positive skewness produces a right tail that extends further than the left, pulling the mean above the median. Negative skewness reverses this pattern. Bimodal distributions, which appear as two distinct peaks in the histogram, may indicate that the sample is drawn from two distinct subpopulations with different characteristics, a finding that no single mean or standard deviation can capture and that has direct implications for the appropriateness of parametric tests that assume a single homogeneous population.
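The link between a long right tail, a mean above the median, and a positive skewness statistic can be illustrated with a small made-up sample. The sketch below assumes the adjusted Fisher-Pearson form of sample skewness, which may differ from the estimator a given tool reports; the interpretation thresholds are the ones listed in the formula section above, and the function names are hypothetical.

```python
import math
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (an assumed choice of estimator)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)


def describe_skew(g1):
    """Apply the |g1| thresholds of 0.5 and 1.0."""
    size = abs(g1)
    if size <= 0.5:
        return "approximately symmetric"
    if size <= 1.0:
        return "moderately skewed"
    return "substantially skewed"


# Illustrative right-skewed sample: one large value stretches the right tail.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
mean = statistics.fmean(sample)
median = statistics.median(sample)
g1 = sample_skewness(sample)

print(mean, median)        # 4.7 3.0
print(describe_skew(g1))   # substantially skewed
```

The single value of 20 moves the mean well above the median, which is the same asymmetry the histogram would show as an isolated bar on the right.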
Frequency, Relative Frequency, and Density
The vertical axis of a histogram can represent three distinct quantities, each appropriate for different purposes. Frequency (count) displays the raw number of observations in each bin and is appropriate when the absolute count is the quantity of interest. Relative frequency (proportion) divides each count by the total sample size, producing values that sum to one and allowing comparison across samples of different sizes. Density divides the relative frequency by the bin width, producing an estimate of the probability density function that is independent of the chosen bin width in the sense that the area of each bar equals the relative frequency of that interval. Density is the correct vertical axis when the histogram is intended as a visual estimate of the probability density function, and it is required when overlaying a fitted parametric distribution curve.
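The three vertical-axis quantities can be computed side by side. This is a minimal sketch with an illustrative sample and hypothetical bin settings (start at 0, width 2.5, four bins); it assumes every value lies inside the binned range and treats the last bin as closed on the right.

```python
def bin_counts(values, start, width, k):
    """Count observations in k equal-width bins beginning at `start`.
    Assumes all values fall within [start, start + k * width]."""
    counts = [0] * k
    for x in values:
        j = min(int((x - start) // width), k - 1)  # maximum goes in the last bin
        counts[j] += 1
    return counts


values = [1, 2, 2, 3, 5, 6, 6, 7, 8, 10]  # illustrative sample
h = 2.5
n = len(values)

frequency = bin_counts(values, start=0.0, width=h, k=4)
relative_frequency = [f / n for f in frequency]
density = [rf / h for rf in relative_frequency]

print(frequency)           # [3, 1, 4, 2]
print(relative_frequency)  # [0.3, 0.1, 0.4, 0.2]

# Bar areas (density * width) sum to 1, up to floating-point rounding.
total_area = sum(d * h for d in density)
```

Changing the bin width changes the counts and the density heights, but the total area under the density bars stays at 1.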
Design Requirements for Academic Publication
The APA Publication Manual (7th edition, 2020) specifies that figures must be labelled with a sequential Arabic numeral in bold above the image, followed on the next line by the figure title in italic title case. The caption must describe the variable displayed, identify the bin-width rule used, state the sample size, and define any abbreviations. The figure must be referenced in the body text before it appears. Journals that print in black and white require that the histogram bar fill be a single grey shade distinct from the axis lines and gridlines, with no gradient fills or three-dimensional effects that distort the accurate reading of bar heights.
- Label. Figure 1 (bold) appears above the image.
- Title. The figure title in italic title case appears on the next line.
- Caption. The caption names the variable, states the bin-width rule, reports the sample size, and ends with a period.
- In-text reference. The figure must be cited in the text (for example, "Figure 1 displays" or "see Figure 1") before it appears.
- Zero baseline. The frequency or density axis must begin at zero. Any other origin constitutes graphical misrepresentation.
Normality Testing and the Histogram
The histogram is frequently used as a preliminary visual check for normality before applying parametric statistical procedures. A normally distributed sample produces a histogram that is approximately bell-shaped, symmetric, and unimodal. However, visual assessment is unreliable for small samples because random sampling variation produces substantial apparent deviations from normality even when the population is exactly normal. For sample sizes below 50, formal tests such as the Shapiro-Wilk test provide more reliable evidence. For samples above 200, formal tests become hypersensitive and routinely reject normality for distributions that are practically indistinguishable from normal for the purposes of the analysis. In these cases, the histogram, supplemented by skewness and kurtosis statistics, provides more actionable guidance than the p-value of a normality test.
Selected Methodological Questions
When should density be used instead of frequency on the vertical axis?
Density should replace frequency or relative frequency on the vertical axis whenever the researcher intends to overlay a fitted probability distribution curve (such as a normal or gamma distribution), because density ensures that the area under each bar equals the relative frequency regardless of bin width. Without density, overlaid distribution curves appear to have the wrong scale relative to the histogram bars. For purely descriptive purposes where the count or proportion is the quantity of interest, frequency or relative frequency is more intuitive for a general audience.
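The scale mismatch is easy to see numerically. The sketch below fits a normal distribution to an illustrative sample with Python's statistics.NormalDist and compares the fitted curve at the bin centres with both count and density bar heights. The sample and the bin settings are hypothetical, and the normal fit is used only as an example of a parametric overlay.

```python
from statistics import NormalDist, fmean, stdev

values = [1, 2, 2, 3, 5, 6, 6, 7, 8, 10]  # illustrative sample
n = len(values)
start, h, k = 0.0, 2.5, 4                 # hypothetical bin settings

# Count observations per bin; the last bin is closed on the right.
counts = [0] * k
for x in values:
    counts[min(int((x - start) // h), k - 1)] += 1

density = [c / (n * h) for c in counts]

# Fitted normal curve evaluated at each bin centre.
fitted = NormalDist(mu=fmean(values), sigma=stdev(values))
centres = [start + (j + 0.5) * h for j in range(k)]
pdf_at_centres = [fitted.pdf(c) for c in centres]

# On a count axis the curve would have to be multiplied by n * h to line up.
scaled_curve = [p * n * h for p in pdf_at_centres]

print(counts)                                # [3, 1, 4, 2]
print([round(d, 2) for d in density])        # [0.12, 0.04, 0.16, 0.08]
print([round(p, 2) for p in pdf_at_centres])
```

The density heights and the fitted curve are on the same scale, whereas the raw counts are larger by a factor of n × h, here 25.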
Can histograms be used for ordinal data?
Histograms are technically appropriate only for continuous or discrete quantitative variables where a meaningful distance exists between values. Ordinal variables such as Likert-scale ratings lack this property: the distance between "agree" and "strongly agree" is not equivalent to the distance between any other adjacent pair, and representing them as a histogram implies equal interval spacing that does not exist. The appropriate display for ordinal data is a bar chart with separated bars and clearly labelled categories. Using a histogram for ordinal data constitutes a measurement-level violation that misrepresents the structure of the data.
How should multiple histograms be compared?
When comparing histograms across groups or time points, the vertical and horizontal axes must use identical scales across all histograms in the comparison. Differences in axis scaling between panels constitute a form of visual confounding that makes equal distributions appear different and different distributions appear similar. Overlaid or faceted histograms with a shared axis are preferable to separate figures when the purpose is direct distributional comparison. Density on the vertical axis is preferred over frequency when groups have different sample sizes, because it normalises for sample size differences.
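One way to enforce identical scales is to derive a single set of bin edges from the pooled range and compute density for each group against those edges. The following is a minimal sketch with made-up groups; the helper names are hypothetical, and the second group deliberately repeats the first three times so that the two have the same shape but different sample sizes.

```python
def shared_edges(groups, k):
    """k equal-width bins spanning the pooled range of all groups."""
    lo = min(min(g) for g in groups)
    hi = max(max(g) for g in groups)
    width = (hi - lo) / k
    return [lo + j * width for j in range(k + 1)]


def density_heights(values, edges):
    """Density bar heights for `values` on the given equal-width edges."""
    k = len(edges) - 1
    width = edges[1] - edges[0]
    counts = [0] * k
    for x in values:
        j = min(int((x - edges[0]) // width), k - 1)  # maximum goes in the last bin
        counts[j] += 1
    n = len(values)
    return [c / (n * width) for c in counts]


group_a = [1, 2, 3, 4, 5, 6, 7, 8]
group_b = group_a * 3  # same shape, three times the sample size

edges = shared_edges([group_a, group_b], k=4)
density_a = density_heights(group_a, edges)
density_b = density_heights(group_b, edges)

print(edges)  # [1.0, 2.75, 4.5, 6.25, 8.0]
```

On a frequency axis the second group's bars would be three times taller; on a density axis with shared edges the two sets of bar heights coincide, which matches the fact that the distributions are the same.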