Scatter plot with regression line or curve in R

Scatter plot based on a model

You can draw a scatter plot of data generated from a theoretical model and overlay the model curve with the lines function. The following block of code illustrates this with data simulated from Y = X ^ 2 plus normal noise.

# Data. Model: Y = X ^ 2
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

# Scatter plot and underlying model
plot(x, y, pch = 16)
lines(x, x ^ 2, col = 2, lwd = 3)

# Text
text(2, 70, expression(Y == X ^ 2))

Underlying model of the scatter plot
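As an alternative to lines, base R's curve function can draw the model from an expression in x. The block below is a minimal sketch of that variant, reusing the simulated data from the example above. The name theory is only illustrative.

```r
# Same simulated data as in the example above
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

plot(x, y, pch = 16)

# curve() evaluates the expression on a grid of x values (n = 101 by default)
# and, with add = TRUE, draws it on top of the existing plot.
# It invisibly returns the grid it used as a list with elements x and y.
theory <- curve(x ^ 2, from = min(x), to = max(x),
                add = TRUE, col = 2, lwd = 3)
```

This is convenient when the model is a simple formula, since you do not need to compute the y values yourself.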

Scatter plot with linear regression

You can add a regression line to a scatter plot by passing an lm object to the abline function. Recall that coef returns the coefficients of an estimated linear model, which you can use to label the plot with the fitted equation.

Scatter plot with linear regression line

# Data. Model: Y = X ^ 2
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

# Linear model
model <- lm(y ~ x)

# Scatter plot and linear regression line
plot(x, y, pch = 16)
abline(model, col = 4, lwd = 3)

# Text with the fitted equation
# (stored as 'coefs' to avoid masking the coef function)
coefs <- round(coef(model), 2)
text(2, 70, paste("Y =", coefs[1], "+", coefs[2], "x"))
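The abline function also accepts an intercept and a slope through its a and b arguments, so the same line can be drawn from the coefficients themselves. The following sketch assumes the same simulated data. The name b_hat is only illustrative.

```r
# Same simulated data as in the example above
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

model <- lm(y ~ x)

# coef() returns a named vector: "(Intercept)" and "x"
b_hat <- coef(model)

plot(x, y, pch = 16)

# Draws the same line as abline(model): a is the intercept, b is the slope
abline(a = b_hat[["(Intercept)"]], b = b_hat[["x"]], col = 4, lwd = 3)
```

Passing the model object is shorter, but the explicit form is useful when the intercept and slope come from somewhere other than lm.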

You can also display the error terms or residuals, that is, the differences between the observed and the estimated values, with the segments function. Pass the variable x to the x0 and x1 arguments, the variable y to y0, and the estimates of the model, which are calculated with predict, to y1.

Linear regression with error terms or residuals in R

# Data. Model: Y = X ^ 2
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

# Linear model
modelo <- lm(y ~ x)

# Scatter plot
plot(x, y, pch = 16)

# Segments with error terms
segments(x0 = x, x1 = x, y0 = y, y1 = predict(modelo),
         lwd = 1, col = "red") 

# Regression line
abline(modelo, col = 4, lwd = 3)

# Paint the points again over the segments
points(x, y, pch = 16)

# Text with the fitted equation
# (stored as 'coefs' to avoid masking the coef function)
coefs <- round(coef(modelo), 2)
text(2, 70, paste("Y =", coefs[1], "+", coefs[2], "x"))
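Each segment runs from an observed value to its estimate, so its signed length is the residual of that observation. The short check below, a sketch on the same simulated data, confirms that y minus predict matches what the residuals function returns. The names fitted_y and res are only illustrative.

```r
# Same simulated data as in the example above
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

model <- lm(y ~ x)

fitted_y <- predict(model)  # estimates at the observed x values
res <- y - fitted_y         # signed length of each segment

# residuals() returns the same quantities (names dropped before comparing)
same <- all.equal(unname(res), unname(residuals(model)))
print(same)
```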

Scatter plot with LOWESS regression curve

The LOWESS smoother uses locally weighted polynomial regression. This non-parametric regression can be estimated with the lowess function.

# Data. Model: Y = X ^ 2
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

# Scatter plot and LOWESS regression curve
plot(x, y, pch = 16)
lines(lowess(x, y), col = 3, lwd = 3)

LOWESS regression scatter plot in R
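The amount of smoothing is controlled by the f argument of lowess, the proportion of points that influence the fit at each value (2/3 by default). The sketch below overlays three spans on the same simulated data. The values 0.1 and 1 are arbitrary choices for illustration.

```r
# Same simulated data as in the example above
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

plot(x, y, pch = 16, col = "gray")

# Smaller f follows the data more closely; larger f gives a smoother curve
lines(lowess(x, y, f = 0.1), col = 2, lwd = 2)
lines(lowess(x, y), col = 3, lwd = 3)  # default, f = 2/3
lines(lowess(x, y, f = 1), col = 4, lwd = 2)

legend(x = "topleft",
       legend = c("f = 0.1", "f = 2/3 (default)", "f = 1"),
       lty = 1, col = c(2, 3, 4), lwd = 2)
```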

In this scenario, LOWESS approximates the theoretical model better than the linear regression does. You can check this by comparing the theoretical model with the estimate of the LOWESS smoother:

# Data. Model: Y = X ^ 2
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

plot(x, y, pch = 16)

# Estimated model
lines(lowess(x, y), col = 3, lwd = 3)

# Theoretical model
lines(x, x ^ 2, col = 2, lwd = 3)

# Legend
legend(x = "topleft", # Position
       legend = c("LOWESS", "Theoretical"), # Texts
       lty = 1,       # Line types
       col = c(3, 2), # Colors
       lwd = 2)       # Line width

Comparing the estimate of the regression model and the theoretical model in R
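The visual comparison can be complemented with a number. The following sketch computes the root mean squared error of each estimate against the theoretical values X ^ 2 on the same simulated data. The helper rmse and the other names are hypothetical and introduced only for this illustration. A lower value means the estimate is closer to the theoretical model.

```r
# Same simulated data as in the example above
set.seed(54)
x <- seq(0, 10, by = 0.05)
y <- x ^ 2 + rnorm(length(x), sd = 20)

truth <- x ^ 2

# lowess() returns its estimates ordered by x; x is already sorted here,
# so the estimates line up with 'truth'
lowess_fit <- lowess(x, y)$y
linear_fit <- predict(lm(y ~ x))

# Root mean squared error against the theoretical model
rmse <- function(estimate) sqrt(mean((estimate - truth) ^ 2))

errors <- c(LOWESS = rmse(lowess_fit), Linear = rmse(linear_fit))
print(errors)
```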