In order to create a kernel density plot you will need to estimate the kernel density. For that purpose you can use the density
function and then pass the density object to the plot
function.
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data)
# Kernel density plot
plot(d, lwd = 2, main = "Default kernel density plot")
The kernel
argument of the density
function uses the gaussian kernel by default (kernel = "gaussian"
), but there are more kernel types available, such as "rectangular"
, "triangular"
, "epanechnikov"
, "biweight"
, "cosine"
and "optcosine"
. The selection will depend on your data, but in most scenarios the default value is the most recommended.
Rectangular kernel
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data,
kernel = "rectangular")
# Kernel density plot
plot(d, lwd = 2, main = "Rectangular kernel")
Triangular kernel
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data,
kernel = "triangular")
# Kernel density plot
plot(d, lwd = 2, main = "Triangular kernel")
Epanechnikov kernel
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data,
kernel = "epanechnikov")
# Kernel density plot
plot(d, lwd = 2, main = "Epanechnikov kernel")
Biweight kernel
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data,
kernel = "biweight")
# Kernel density plot
plot(d, lwd = 2, main = "Biweight kernel")
Cosine kernel
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data,
kernel = "cosine")
# Kernel density plot
plot(d, lwd = 2, main = "Cosine kernel")
The bw
argument of the density
function allows changing the smoothing bandwidth used. You can pass either a value or a string giving a selection rule or a function. The default value is "nrd0"
(or bw.nrd0(.)
), which implements a rule-of-thumb approach. Other available options are:
Rule-of-thumb variation given by Scott (1992)
"nrd"
or bw.nrd(.)
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data,
bw = "nrd")
# Kernel density plot
plot(d, lwd = 2, main = "nrd bandwidth")
Unbiased cross-validation
"ucv"
or bw.ucv(.)
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data,
bw = "ucv")
# Kernel density plot
plot(d, lwd = 2, main = "ucv bandwidth")
Biased cross-validation
"bcv"
or bw.bcv(.)
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data,
bw = "bcv")
# Kernel density plot
plot(d, lwd = 2, main = "bcv bandwidth")
Methods by Sheather & Jones (1991)
"SJ"
or bw.SJ(.)
# Data
set.seed(14012021)
data <- rnorm(200, mean = 4)
# Kernel density estimation
d <- density(data,
bw = "SJ")
# Kernel density plot
plot(d, lwd = 2, main = "SJ bandwidth")
The bandwidth must be chosen carefully. A small bandwidth will create a very overfitted curve while a too big bandwidth will create an oversmoothed curve.
See also