Histogram breaks in R

The hist function uses the Sturges method by default to determine the number of bins of the histogram. This choice matters: too many bins make the plot noisy, because each bar reflects only a few observations, while too few bins group the data so coarsely that the shape of the distribution is hidden.

breaks argument

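The breaks argument controls the number of bars, cells or bins of the histogram. By default breaks = "Sturges". When you pass a single number, R treats it as a suggestion and rounds the breakpoints to "pretty" values, so the final number of bins can differ from the one requested.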

Sturges method (default)

The default method is a reasonable choice in most cases.

# Sample data
set.seed(2)
x <- rnorm(2000)

# Histogram
hist(x,
     main = "Sturges")

Histogram in R with Sturges method
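
As a rough check of what the default does, Sturges' rule takes the number of bins as ceiling(log2(n) + 1), where n is the sample size. The sketch below reuses the sample data from above and compares that formula with nclass.Sturges from grDevices, the helper that hist calls when breaks = "Sturges". The names k_manual and k_sturges are only illustrative.

```r
# Sample data
set.seed(2)
x <- rnorm(2000)

# Sturges' rule computed by hand
n <- length(x)
k_manual <- ceiling(log2(n) + 1)

# Same rule as implemented in grDevices
k_sturges <- nclass.Sturges(x)

# Both values should agree
print(c(manual = k_manual, nclass = k_sturges))
```

The histogram can still show a different number of bars, because hist passes this value to pretty() to obtain rounded breakpoints.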

Too many bins

If you specify the number of breaks manually, make sure the number is not too high.

# Sample data
set.seed(2)
x <- rnorm(2000)

# Histogram
hist(x, breaks = 80,
     main = "Too many bins")

Histogram in R with too many bins

Not enough bins

The number of bins can also be too small in some cases.

# Sample data
set.seed(2)
x <- rnorm(2000)

# Histogram
hist(x, breaks = 5,
     main = "Not enough bins")

Few bins histogram
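
Because a single number passed to breaks is only a suggestion, it can help to inspect how many bins hist actually used. The following sketch does that with plot = FALSE, and then shows one possible way to force an exact number of bins by passing explicit breakpoints. The names h_many, h_few, edges and h_exact are illustrative.

```r
# Sample data
set.seed(2)
x <- rnorm(2000)

# A single number is only a suggestion: inspect the bins actually used
h_many <- hist(x, breaks = 80, plot = FALSE)
h_few  <- hist(x, breaks = 5,  plot = FALSE)
length(h_many$counts)  # bins actually used for breaks = 80
length(h_few$counts)   # bins actually used for breaks = 5

# Explicit breakpoints: 6 equally spaced edges give exactly 5 bins
edges <- seq(min(x), max(x), length.out = 6)
h_exact <- hist(x, breaks = edges, plot = FALSE)
length(h_exact$counts)
```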

Scott method

In addition to the Sturges method, the breaks argument also supports the Scott method.

# Sample data
set.seed(2)
x <- rnorm(2000)

# Histogram
hist(x, breaks = "Scott",
     main = "Scott")

Histogram with Scott method in R
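
To see where the Scott bins come from, the sketch below computes Scott's bin width, 3.5 * sd(x) * n^(-1/3), by hand and compares the resulting number of bins with nclass.scott from grDevices, which hist uses for breaks = "Scott". The variable names are illustrative.

```r
# Sample data
set.seed(2)
x <- rnorm(2000)

# Scott's bin width: 3.5 * sd * n^(-1/3)
n <- length(x)
h_scott <- 3.5 * sd(x) * n^(-1/3)

# Number of bins needed to cover the range of x
k_manual <- ceiling(diff(range(x)) / h_scott)

# Helper used by hist() when breaks = "Scott"
k_scott <- nclass.scott(x)

print(c(bin_width = h_scott, manual = k_manual, nclass = k_scott))
```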

Freedman-Diaconis (FD) method

The Freedman-Diaconis algorithm can be selected by passing "Freedman-Diaconis" or "FD" to the breaks argument.

# Sample data
set.seed(2)
x <- rnorm(2000)

# Histogram
hist(x, breaks = "Freedman-Diaconis",
     main = "Freedman-Diaconis")
hist(x, breaks = "FD", # Equivalent
     main = "Freedman-Diaconis") 

Histogram Freedman-Diaconis method
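
The Freedman-Diaconis rule replaces the standard deviation with the interquartile range, using a bin width of 2 * IQR(x) * n^(-1/3), which makes it less sensitive to outliers. The sketch below computes that width by hand and compares the implied number of bins with nclass.FD from grDevices. The variable names are illustrative, and small differences are possible because recent versions of nclass.FD round the data before taking the IQR.

```r
# Sample data
set.seed(2)
x <- rnorm(2000)

# Freedman-Diaconis bin width: 2 * IQR * n^(-1/3)
n <- length(x)
h_fd <- 2 * IQR(x) * n^(-1/3)

# Approximate number of bins implied by that width
k_manual <- ceiling(diff(range(x)) / h_fd)

# Helper used by hist() when breaks = "FD"
k_fd <- nclass.FD(x)

print(c(bin_width = h_fd, manual = k_manual, nclass = k_fd))
```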

You can also pass a vector giving the breakpoints between bins, or a function that computes either the number of bins or the vector of breakpoints.
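
A minimal sketch of both options follows. The breakpoint vector uses bins of width 0.5, an arbitrary choice for this example, and the function sqrt_rule is a hypothetical helper implementing the square-root rule, which is not one of the built-in methods.

```r
# Sample data
set.seed(2)
x <- rnorm(2000)

# Vector of breakpoints: bins of width 0.5 spanning the whole range of x
edges <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
h_vec <- hist(x, breaks = edges,
              main = "Breakpoints vector")

# Function returning a number of bins (square-root rule, for illustration).
# As with any single number, hist() treats the result as a suggestion
sqrt_rule <- function(v) ceiling(sqrt(length(v)))
h_fun <- hist(x, breaks = sqrt_rule,
              main = "Custom function")
```

The breakpoints must span the full range of the data, otherwise hist stops with an error.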

Plug-in selection

An alternative to the Sturges method and to choosing the breaks by hand is the plug-in method, which estimates an optimal bin width from the data (Wand, 1995). This method is implemented in the dpih function of the KernSmooth package, and you can use it as follows.

Plug in bins selection in R

# Sample data
set.seed(2)
x <- rnorm(2000)

# install.packages("KernSmooth")
library(KernSmooth)

# Bin width selected by the plug-in method
bin_width <- dpih(x)

# Breakpoints spaced by bin_width, padded to cover the range of x
breakpoints <- seq(min(x) - bin_width,
                   max(x) + bin_width,
                   by = bin_width)

# Histogram
hist(x, breaks = breakpoints,
     main = "Plug-in method")