Sample data set

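The built-in USArrests data set contains, for each US state, the number of arrests per 100,000 residents for murder, assault and rape in 1973, together with the percentage of the population living in urban areas. For illustration purposes, the examples below use only its first twenty rows.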

df <- USArrests[1:20, ]

Basic dendrogram

To create a dendrogram in R, first compute the distance matrix of your data with dist, then run hierarchical clustering on that matrix with hclust, and finally plot the result.

Option 1

Plot the hierarchical clustering object with the plot function.

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d)

# Dendrogram
plot(hc)

Option 2

Convert the hierarchical clustering output to the dendrogram class with as.dendrogram before plotting. This usually gives a cleaner visualization, with all the leaves aligned at the bottom of the tree.

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d)

# Dendrogram
plot(as.dendrogram(hc))

# The previous line is similar to:
# plot(hc, hang = -1)

By default, the dist function computes the distance matrix using the Euclidean distance, but other measures are available through its method argument: "maximum", "manhattan", "canberra", "binary" or "minkowski". Choose the one that best fits your data.
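As a minimal sketch of switching the distance measure, the example below builds the same dendrogram from Manhattan distances. The object names (d_euc, d_man, d_mink, hc_man) and the choice of p = 3 are illustrative assumptions, not part of the original tutorial.

```r
# Same sample as above: first twenty rows of the built-in USArrests data
df <- USArrests[1:20, ]

# Euclidean distance is the default; other measures go through `method`
d_euc <- dist(df)
d_man <- dist(df, method = "manhattan")

# The Minkowski distance takes its power from `p` (p = 2 is the Euclidean case)
d_mink <- dist(df, method = "minkowski", p = 3)

# The rest of the workflow does not change
hc_man <- hclust(d_man)
plot(as.dendrogram(hc_man), main = "manhattan")
```

Because the variables of USArrests are on different scales, the choice of measure can change which states end up merged first.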

Clustering methods

Besides the distance measure, you can also choose the hierarchical clustering method through the method argument of hclust. The default is "complete", and the alternatives are "ward.D", "ward.D2", "single", "average", "mcquitty", "median" and "centroid". Type ?hclust for details about each method and the other arguments.

In the examples below you can see the difference using each method.

ward.D

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d, method = "ward.D")

# Dendrogram
plot(as.dendrogram(hc), main = "ward.D")

ward.D2

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d, method = "ward.D2")

# Dendrogram
plot(as.dendrogram(hc), main = "ward.D2")

single

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d, method = "single")

# Dendrogram
plot(as.dendrogram(hc), main = "single")

average

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d, method = "average")

# Dendrogram
plot(as.dendrogram(hc), main = "average")

mcquitty

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d, method = "mcquitty")

# Dendrogram
plot(as.dendrogram(hc), main = "mcquitty")

median

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d, method = "median")

# Dendrogram
plot(as.dendrogram(hc), main = "median")

centroid

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d, method = "centroid")

# Dendrogram
plot(as.dendrogram(hc), main = "centroid")
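
To compare the methods side by side, the per-method examples above can be condensed into a loop that draws every dendrogram in a single figure. This is only a sketch: the methods vector, the fits list and the 2 x 4 layout are assumptions made for illustration.

```r
df <- USArrests[1:20, ]
d <- dist(df)

# All methods discussed above, including the default "complete"
methods <- c("complete", "ward.D", "ward.D2", "single",
             "average", "mcquitty", "median", "centroid")

# One hclust fit per method, stored in a named list
fits <- lapply(setNames(methods, methods),
               function(m) hclust(d, method = m))

# 2 x 4 grid of dendrograms; restore the previous layout afterwards
op <- par(mfrow = c(2, 4))
for (m in methods) {
  plot(as.dendrogram(fits[[m]]), main = m)
}
par(op)
```

Computing the distance matrix once and reusing it keeps the comparison limited to the linkage method.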

Adding rectangles around hierarchical clusters

Setting the number of clusters

The rect.hclust function draws rectangles around the clusters of a dendrogram that has already been plotted. Use the k argument to set the number of clusters to display.

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d)

# Dendrogram with 3 clusters
plot(as.dendrogram(hc))
rect.hclust(hc, k = 3)
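
The rectangles only mark the clusters on the plot. If you also need to know which state belongs to which cluster, one option is cutree from the stats package, sketched below with the same k = 3. The groups variable is an illustrative name, and the cluster labels returned by cutree do not necessarily follow the left-to-right order of the rectangles.

```r
df <- USArrests[1:20, ]
hc <- hclust(dist(df))

# Cluster label (1 to 3) for each state
groups <- cutree(hc, k = 3)

# Size of each cluster
table(groups)

# States assigned to the cluster labelled 1
names(groups)[groups == 1]
```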

Note that you can draw only some of the rectangles by passing their positions to the which argument, where clusters are numbered from left to right in the tree. In this example only the rectangles of the first and third clusters are added.

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d)

# 3 clusters, two rectangles
plot(as.dendrogram(hc))
rect.hclust(hc, k = 3,
            which = c(1, 3))

Clusters based on height

You can also define the clusters by cutting the tree at a given height with the h argument. Here h = 150, which produces two clusters.

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d)

# Dendrogram clusters based on height
plot(as.dendrogram(hc))
rect.hclust(hc, h = 150)
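
To check how many clusters a given height produces, you can capture the value that rect.hclust returns invisibly: a list with one element per rectangle drawn. The sketch below does this and repeats the cut with cutree. The names clusters and groups are illustrative assumptions.

```r
df <- USArrests[1:20, ]
hc <- hclust(dist(df))

plot(as.dendrogram(hc))

# rect.hclust invisibly returns a list with one element per rectangle
clusters <- rect.hclust(hc, h = 150)

# Number of clusters produced by the cut
length(clusters)

# The same cut expressed as a membership vector
groups <- cutree(hc, h = 150)
table(groups)
```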

Color of the rectangles

The color of each rectangle can be customized with the border argument. You can pass a single color or as many colors as there are rectangles.

# Distance matrix
d <- dist(df)

# Hierarchical clustering
hc <- hclust(d)

# Cluster colors
plot(as.dendrogram(hc))
rect.hclust(hc, k = 2,
            border = 3:4)
