What is the difference between a histogram and a bar chart?

A histogram displays the frequency distribution of a single continuous variable by grouping observations into adjacent intervals called bins. Its bars are contiguous because the underlying variable is continuous and the intervals share boundaries. A bar chart, by contrast, displays values for distinct categorical groups whose bars are separated by gaps to signal the absence of a continuous scale. Applying a histogram to categorical data, or a bar chart to a continuous distribution, constitutes an error of graphical representation.

What is Sturges' rule for bin selection?

Sturges' rule (1926) specifies the number of histogram bins as k equals 1 plus log base 2 of n, where n is the sample size. It is derived from the assumption that the data follow a binomial distribution and is appropriate for moderate sample sizes between 30 and 200 observations. For very large or highly skewed samples, Sturges' rule tends to underestimate the optimal number of bins.

What is the Freedman-Diaconis rule?

The Freedman-Diaconis rule (1981) determines bin width as two times the interquartile range divided by the cube root of n. Unlike Sturges' rule, it is robust against outliers because it uses the IQR rather than the range or standard deviation. It is recommended for large samples and skewed distributions where outlier sensitivity is a concern.

What is Scott's rule for histogram bin width?

Scott's rule (1979) determines bin width as 3.5 times the sample standard deviation divided by the cube root of n. It is derived to minimise the mean integrated squared error between the histogram and the true density under the assumption that the data are approximately normally distributed. It is available as a named binning option in many statistical software packages, including R and MATLAB.

How should a histogram be reported in APA 7th edition?

In APA 7th edition, a histogram is labelled "Figure" with a sequential Arabic numeral appearing in bold above the image, followed by the figure title in italic title case on the next line. The caption describes the variable displayed, identifies the bin-width rule used, and states the sample size. The figure is placed as close as possible to the first in-text reference and must be mentioned in the text, for example "Figure 1 displays" or "see Figure 1".

Histogram Maker and Reference Guide | Statistical Data Visualization Tool

The Histogram in Statistical and Academic Practice

The histogram is a graphical representation of the frequency distribution of a single continuous or discrete variable. Unlike the bar chart, which displays values for distinct categorical groups separated by visible gaps, the histogram encodes data through contiguous rectangular bars whose widths span equal intervals of the measurement scale and whose heights are proportional to the count, relative frequency, or density of observations falling within each interval. The contiguity of bars is not a stylistic choice but a mathematical statement: the absence of gaps signals that the underlying variable is continuous and that adjacent intervals share a common boundary.

The histogram was formally introduced by Karl Pearson in his 1895 paper in the Philosophical Transactions of the Royal Society, where he developed it as a tool for visualizing probability density from empirical data. Pearson's contribution was not merely graphical; it was epistemological. He recognised that any finite set of observations is a noisy approximation of an underlying population distribution, and that the histogram represents one of the earliest systematic approaches to nonparametric density estimation from sample data.

Distinction from the Bar Chart

A histogram and a bar chart are not interchangeable. A histogram represents the frequency distribution of a single continuous variable; its bars are contiguous because the variable itself is continuous and intervals share boundaries. A bar chart compares values across discrete categorical groups; its bars are separated by gaps to signal the absence of continuity. Applying a histogram to categorical data, or a bar chart to a continuous distribution, constitutes an error of graphical representation that reviewers of peer-reviewed journals regularly identify as grounds for revision.

Bin Width Selection: The Central Methodological Decision

The bin width, also called the class width or interval width, is the most consequential parameter in histogram construction. It determines the resolution at which the data are displayed. A bin width that is too wide collapses the distribution into a few featureless bars, obscuring multimodality, skewness, and outlier structure. A bin width that is too narrow produces a jagged, noisy display in which random sampling variation dominates the visible pattern. The problem of optimal bin width is an active area of research in nonparametric statistics and is analogous to the bandwidth selection problem in kernel density estimation.

Three rules have achieved canonical status in statistical practice. Each encodes a different philosophical stance about the nature of the data and the goals of the visualization.

Formula Reference: The Three Standard Rules

Rule	Formula	Output	Assumption	Sample Range	Best Used When
Sturges (1926)	k = 1 + log₂(n)	k = number of bins	Data follow a binomial distribution; approximate normality	n = 30 to 200	Exploratory analysis of moderate, roughly symmetric samples
Freedman-Diaconis (1981)	h = 2 × IQR × n^(−1/3)	h = bin width	Uses IQR; robust to outliers; no normality assumed	n ≥ 100 recommended	Large samples, skewed distributions, or data with outliers
Scott (1979)	h = 3.5 × SD × n^(−1/3)	h = bin width	Minimises MISE; assumes approximately normal data	Any n; optimal under normality	Normally distributed data; available as a named option in R and MATLAB
Converting bin width to bin count: k = ceil( (max − min) / h ). This conversion applies to Freedman-Diaconis and Scott only. IQR = interquartile range; SD = sample standard deviation; n = sample size; MISE = mean integrated squared error.

Sturges (1926)

Number of Bins

k = 1 + log₂(n)
Bin width: h = (max − min) / k

Derived from the binomial distribution assumption. For n = 50: k = 1 + log2(50) = 6.64, so k = 7. Underestimates bins for large or skewed samples.
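
The Sturges calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own implementation; the function names sturges_bins and sturges_width are hypothetical, and rounding up to the next whole bin is assumed, as in the worked example.

```python
import math


def sturges_bins(n):
    """Number of bins by Sturges' rule, rounded up to a whole bin."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + math.log2(n))


def sturges_width(data):
    """Bin width implied by Sturges' rule: data range divided by bin count."""
    k = sturges_bins(len(data))
    return (max(data) - min(data)) / k


# 1 + log2(50) is about 6.64, which rounds up to 7 bins
print(sturges_bins(50))
```

Because k grows only with the logarithm of n, doubling the sample size adds a single bin, which is why the rule yields too few bins for large samples.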

Freedman-Diaconis (1981)

Bin Width

h = 2 × IQR × n^(−1/3)
k = ceil( (max − min) / h )

Uses IQR rather than SD, making it robust against outliers. For n = 100 with IQR = 10: h = 2 × 10 × 100^(−1/3) = 4.31. Typically produces more bins than Sturges.
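
A minimal Python sketch of the Freedman-Diaconis calculation follows. The function names are hypothetical, and the quartiles are computed with the standard library's "inclusive" method, which is one of several linear-interpolation conventions; a different convention would give a slightly different IQR and bin width.

```python
import math
import statistics


def fd_width_from_summary(iqr, n):
    """Freedman-Diaconis bin width from a known IQR and sample size."""
    return 2 * iqr * n ** (-1 / 3)


def fd_bin_width(data):
    """Freedman-Diaconis bin width computed directly from raw data."""
    # Assumption: quartiles by linear interpolation ("inclusive" method)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return fd_width_from_summary(q3 - q1, len(data))


def bins_from_width(data, h):
    """Convert a bin width into a bin count: k = ceil((max - min) / h)."""
    if h <= 0:
        raise ValueError("bin width must be positive (IQR may be zero)")
    return math.ceil((max(data) - min(data)) / h)


# Worked example from the text: n = 100, IQR = 10
print(round(fd_width_from_summary(10, 100), 2))  # 4.31
```

The guard in bins_from_width matters in practice: heavily tied data can have an IQR of zero, in which case the rule gives no usable width.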

Scott (1979)

Bin Width

h = 3.5 × SD × n^(−1/3)
k = ceil( (max − min) / h )

Minimises mean integrated squared error (MISE) under normality. For n = 100 with SD = 15: h = 3.5 × 15 × 100^(−1/3) = 11.31. Available in R as hist(x, breaks = "Scott") and in MATLAB as the 'scott' bin method; note that R's hist() itself defaults to Sturges.
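
Scott's rule can be sketched the same way. The function names below are hypothetical, and the sketch uses the rounded constant 3.5 and the sample standard deviation (n − 1 denominator) as stated in this guide; some software uses a slightly different constant or the population standard deviation, so results may differ in the second decimal place.

```python
import math
import statistics


def scott_width_from_summary(sd, n):
    """Scott bin width from a known standard deviation and sample size."""
    return 3.5 * sd * n ** (-1 / 3)


def scott_bin_width(data):
    """Scott bin width computed directly from raw data."""
    return scott_width_from_summary(statistics.stdev(data), len(data))


def scott_bins(data):
    """Bin count implied by Scott's width: k = ceil((max - min) / h)."""
    h = scott_bin_width(data)
    if h <= 0:
        raise ValueError("bin width must be positive (data may be constant)")
    return math.ceil((max(data) - min(data)) / h)


# Worked example from the text: n = 100, SD = 15
print(round(scott_width_from_summary(15, 100), 2))  # 11.31
```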

Reporting Requirement

There is no universally optimal bin width. Each rule encodes different assumptions about the data distribution. Researchers must report which rule was applied and justify the choice in the figure caption or methods section. When the three rules produce substantially different histograms, this discrepancy is itself informative about the distributional properties of the data and should be discussed rather than concealed by selecting the most visually convenient result.

Descriptive Statistics Formulas

The following formulas define the descriptive statistics computed by this tool. All estimates use the sample formulas with an n minus 1 denominator where applicable, which makes the sample variance an unbiased estimate of the population variance.

Central Tendency

Mean

x-bar = (1/n) × ∑ x_i

Median

Middle value of sorted data (or average of two middle values for even n)

Spread and Variability

Sample Variance and Standard Deviation

s² = ∑(x_i − x-bar)² / (n − 1)
s = √s²

Standard Error and 95% Confidence Interval

SE = s / √n
95% CI = x-bar ± t_{0.025, n−1} × SE
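
The mean, standard deviation, standard error, and confidence interval formulas above can be combined in one short sketch. The function name mean_with_ci and the sample data are hypothetical. The critical value t_{0.025, n−1} is passed in by the caller because the Python standard library has no t-distribution quantile function; 2.365 is the tabulated value for 7 degrees of freedom.

```python
import math
import statistics


def mean_with_ci(data, t_crit):
    """Return mean, sample SD, standard error, and confidence bounds.

    t_crit is the two-sided critical value t_{0.025, n-1}, supplied by
    the caller (for example from a t table).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)      # n - 1 denominator
    se = s / math.sqrt(n)
    half_width = t_crit * se
    return {
        "mean": mean,
        "sd": s,
        "se": se,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical data, n = 8
result = mean_with_ci(sample, t_crit=2.365)    # t for 7 degrees of freedom
print(result)
```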

Shape: Skewness and Kurtosis

Sample Skewness (Fisher g1, bias-corrected)

g1 = [ n / ((n−1)(n−2)) ] × ∑( (x_i−x-bar) / s )³
• |g1| ≤ 0.5: approximately symmetric
• 0.5 < |g1| ≤ 1.0: moderately skewed
• |g1| > 1.0: substantially skewed
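
As an illustration, the sketch below computes bias-corrected sample skewness using the adjusted Fisher-Pearson form, with the factor n / ((n−1)(n−2)) applied to the sum of cubed standardised deviations, and then applies the three interpretation bands listed above. The function names and the sample data are hypothetical.

```python
import statistics


def adjusted_skewness(data):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson form)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)      # n - 1 denominator
    if s == 0:
        raise ValueError("skewness is undefined for constant data")
    total = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total


def describe_skewness(g1):
    """Map a skewness value onto the interpretation bands in the text."""
    size = abs(g1)
    if size <= 0.5:
        return "approximately symmetric"
    if size <= 1.0:
        return "moderately skewed"
    return "substantially skewed"


right_skewed = [1, 1, 1, 1, 10]     # hypothetical data with one large value
g1 = adjusted_skewness(right_skewed)
print(round(g1, 3), describe_skewness(g1))
```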

Sample Excess Kurtosis (Fisher g2, bias-corrected)

g2 = [(n−1)/((n−2)(n−3))] × [(n+1)K₄ + 6]
where K₄ = [n∑(x_i−x-bar)⁴ / (∑(x_i−x-bar)²)²] − 3
• g2 = 0: mesokurtic (normal tail weight)
• g2 > 0: leptokurtic (heavy tails)
• g2 < 0: platykurtic (light tails)
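
A minimal sketch of the bias-corrected excess kurtosis follows. Here K₄ is taken to be the uncorrected excess kurtosis, that is, the fourth central moment divided by the squared second central moment (both with an n denominator), minus 3. The function name is hypothetical.

```python
import statistics


def adjusted_excess_kurtosis(data):
    """Bias-corrected sample excess kurtosis."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least four observations")
    mean = statistics.fmean(data)
    ss2 = sum((x - mean) ** 2 for x in data)    # sum of squared deviations
    ss4 = sum((x - mean) ** 4 for x in data)    # sum of fourth-power deviations
    if ss2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    k4 = n * ss4 / ss2 ** 2 - 3                 # uncorrected excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * k4 + 6)


# Evenly spaced values have lighter tails than a normal distribution
print(round(adjusted_excess_kurtosis([1, 2, 3, 4, 5]), 3))  # -1.2
```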

Order Statistics and IQR

Quartiles and Interquartile Range

Q1 = 25th percentile (first quartile)
Q2 = 50th percentile (median)
Q3 = 75th percentile (third quartile)
IQR = Q3 − Q1
Computed using linear interpolation between order statistics.
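
Several linear-interpolation conventions exist for quartiles, and this guide does not say which one the tool uses. The sketch below assumes the common convention that places the p-th quantile at position (n − 1) × p in the sorted data, which is also the default in several statistical packages; the function names are hypothetical.

```python
import math


def quantile_linear(sorted_data, p):
    """Quantile by linear interpolation at position (n - 1) * p."""
    if not sorted_data:
        raise ValueError("data must not be empty")
    pos = (len(sorted_data) - 1) * p
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_data) - 1)
    frac = pos - lower
    return sorted_data[lower] + frac * (sorted_data[upper] - sorted_data[lower])


def quartiles_and_iqr(data):
    """Return (Q1, Q2, Q3, IQR) for unsorted input data."""
    ordered = sorted(data)
    q1 = quantile_linear(ordered, 0.25)
    q2 = quantile_linear(ordered, 0.50)
    q3 = quantile_linear(ordered, 0.75)
    return q1, q2, q3, q3 - q1


print(quartiles_and_iqr([8, 3, 1, 6, 2, 7, 5, 4]))  # (2.75, 4.5, 6.25, 3.5)
```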

Vertical Axis Quantities

Frequency, Relative Frequency, and Density

Frequency: f_j = count of observations in bin j
Relative frequency: rf_j = f_j / n (∑ rf_j = 1)
Density: d_j = rf_j / h (∑ d_j × h = 1)

Density is required when overlaying a fitted distribution curve. The area of each density bar equals the relative frequency of that interval, so areas sum to 1 regardless of bin width.
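
The three vertical-axis quantities can be computed together, as in the following sketch. The function name and sample data are hypothetical, and the binning convention (left-closed bins, with the last bin also including its right edge) is an assumption rather than something this guide specifies.

```python
def histogram_quantities(data, lo, h, k):
    """Return (frequency, relative frequency, density) for k equal bins.

    Bins are [lo + j*h, lo + (j+1)*h); the last bin also includes its
    right edge so that the maximum value is counted.
    """
    counts = [0] * k
    for x in data:
        if x < lo or x > lo + k * h:
            raise ValueError(f"{x} lies outside the binned range")
        j = min(int((x - lo) / h), k - 1)
        counts[j] += 1
    n = len(data)
    rel_freq = [c / n for c in counts]
    density = [rf / h for rf in rel_freq]
    return counts, rel_freq, density


sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]     # hypothetical data
counts, rel_freq, density = histogram_quantities(sample, lo=0, h=2, k=5)
print(counts)                               # [1, 5, 3, 0, 1]
print(sum(d * 2 for d in density))          # total bar area, approximately 1
```

Changing h rescales the density values but leaves the total bar area at 1, which is what allows a fitted density curve to share the same axis.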

What the Histogram Communicates

A well-constructed histogram communicates four distributional features that no summary statistic can convey alone. Central tendency is visible as the approximate location of the tallest bars. Spread is visible as the horizontal extent of the distribution. Shape is visible as symmetry or asymmetry, the presence of a single mode or multiple modes, and the thickness of the tails. Outliers appear as isolated bars at the extreme ends of the distribution, separated from the main body of observations.

These features correspond directly to the summary statistics produced by descriptive analysis. Positive skewness produces a right tail that extends further than the left, pulling the mean above the median. Negative skewness reverses this pattern. Bimodal distributions, which appear as two distinct peaks in the histogram, may indicate that the sample is drawn from two distinct subpopulations with different characteristics, a finding that no single mean or standard deviation can capture and that has direct implications for the appropriateness of parametric tests that assume a single homogeneous population.

Skewness Interpretation

Skewness quantifies the asymmetry of a distribution. A value between -0.5 and 0.5 indicates approximate symmetry. Values between 0.5 and 1.0 (or -0.5 to -1.0) indicate moderate skewness. Values beyond 1.0 or -1.0 indicate substantial skewness. In practice, skewness above 2.0 in absolute value raises serious concerns about the validity of statistical procedures that assume normality.

Excess Kurtosis Interpretation

Excess kurtosis measures tail heaviness relative to the normal distribution. A value of zero indicates normal tail weight (mesokurtic). Positive excess kurtosis (leptokurtic) indicates heavier tails and a sharper peak, common in financial returns and reaction time data. Negative excess kurtosis (platykurtic) indicates lighter tails and a flatter peak. Values beyond 3.0 in absolute value suggest substantial departure from normality.

Frequency, Relative Frequency, and Density

The vertical axis of a histogram can represent three distinct quantities, each appropriate for different purposes. Frequency (count) displays the raw number of observations in each bin and is appropriate when the absolute count is the quantity of interest. Relative frequency (proportion) divides each count by the total sample size, producing values that sum to one and allowing comparison across samples of different sizes. Density divides the relative frequency by the bin width, producing an estimate of the probability density function that is independent of the chosen bin width in the sense that the area of each bar equals the relative frequency of that interval. Density is the correct vertical axis when the histogram is intended as a visual estimate of the probability density function, and it is required when overlaying a fitted parametric distribution curve.

Design Requirements for Academic Publication

The APA Publication Manual (7th edition, 2020) specifies that figures must be labelled with a sequential Arabic numeral appearing in bold above the image, followed by the figure title in italic title case. The caption must describe the variable displayed, identify the bin-width rule used, state the sample size, and define any abbreviations. The figure must be referenced in the body text before it appears. Journals that print in black and white require that the histogram bar fill be a single grey shade distinct from the axis lines and gridlines, with no gradient fills or three-dimensional effects that distort the accurate reading of bar heights.

APA 7th Edition Requirements for Histograms

Label. Figure 1 (bold) appears above the image.
Title. The figure title in italic title case appears on the next line.
Caption. The caption names the variable, states the bin-width rule, reports the sample size, and ends with a period.
In-text reference. The figure must be cited in the text, for example "Figure 1 displays" or "see Figure 1", before it appears.
Zero baseline. The frequency or density axis must begin at zero. Any other origin constitutes graphical misrepresentation.

Normality Testing and the Histogram

The histogram is frequently used as a preliminary visual check for normality before applying parametric statistical procedures. A normally distributed sample produces a histogram that is approximately bell-shaped, symmetric, and unimodal. However, visual assessment is unreliable for small samples because random sampling variation produces substantial apparent deviations from normality even when the population is exactly normal. For sample sizes below 50, formal tests such as the Shapiro-Wilk test provide more reliable evidence. For samples above 200, formal tests become hypersensitive and routinely reject normality for distributions that are practically indistinguishable from normal for the purposes of the analysis. In these cases, the histogram, supplemented by skewness and kurtosis statistics, provides more actionable guidance than the p-value of a normality test.

Selected Methodological Questions

When should density be used instead of frequency on the vertical axis?

Density should replace frequency or relative frequency on the vertical axis whenever the researcher intends to overlay a fitted probability distribution curve (such as a normal or gamma distribution), because density ensures that the area under each bar equals the relative frequency regardless of bin width. Without density, overlaid distribution curves appear to have the wrong scale relative to the histogram bars. For purely descriptive purposes where the count or proportion is the quantity of interest, frequency or relative frequency is more intuitive for a general audience.

Can histograms be used for ordinal data?

Histograms are technically appropriate only for continuous or discrete quantitative variables where a meaningful distance exists between values. Ordinal variables such as Likert-scale ratings lack this property: the distance between "agree" and "strongly agree" is not equivalent to the distance between any other adjacent pair, and representing them as a histogram implies equal interval spacing that does not exist. The appropriate display for ordinal data is a bar chart with separated bars and clearly labelled categories. Using a histogram for ordinal data constitutes a measurement-level violation that misrepresents the structure of the data.

How should multiple histograms be compared?

When comparing histograms across groups or time points, the vertical and horizontal axes must use identical scales across all histograms in the comparison. Differences in axis scaling between panels constitute a form of visual confounding that makes equal distributions appear different and different distributions appear similar. Overlaid or faceted histograms with a shared axis are preferable to separate figures when the purpose is direct distributional comparison. Density on the vertical axis is preferred over frequency when groups have different sample sizes, because it normalises for sample size differences.
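
One way to meet the shared-scale requirement is to derive a single set of bin edges from the pooled range of all groups and compute each group's density on those edges. The sketch below illustrates the idea; the function names, the choice of k, and the sample data are all hypothetical.

```python
def shared_edges(groups, k):
    """Common bin edges spanning the pooled range of all groups."""
    lo = min(min(g) for g in groups)
    hi = max(max(g) for g in groups)
    if hi == lo:
        raise ValueError("pooled data have zero range")
    h = (hi - lo) / k
    return [lo + j * h for j in range(k + 1)]


def density_on_edges(data, edges):
    """Density per bin for one group, using equally spaced shared edges."""
    k = len(edges) - 1
    lo = edges[0]
    h = (edges[-1] - edges[0]) / k
    counts = [0] * k
    for x in data:
        j = min(int((x - lo) / h), k - 1)   # last bin includes its right edge
        counts[j] += 1
    n = len(data)
    return [c / (n * h) for c in counts]


group_a = [0, 1, 2, 3]                      # hypothetical group, n = 4
group_b = [2, 4, 6, 8, 8, 8, 6, 4]          # hypothetical group, n = 8
edges = shared_edges([group_a, group_b], k=4)
density_a = density_on_edges(group_a, edges)
density_b = density_on_edges(group_b, edges)
print(edges)
print(density_a)
print(density_b)
```

Because both density lists refer to the same edges and each integrates to 1, the two groups can be plotted on one pair of axes despite their different sample sizes.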
