What is Regression?

Linda
5 min read · Jan 28, 2023

Regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.

In its most common form, it predicts the value of a continuous outcome variable (the dependent variable) from the values of one or more predictor variables (the independent variables). Some variants, such as logistic regression, predict a categorical outcome instead.

Image (Wikipedia): Regression line for 50 random points in a Gaussian distribution around the line y=1.5x+2 (not shown). The regression line (shown) that best fits these points is actually y=1.533858x+2.129333.

There are several types of regression analysis:

  1. Linear regression models the relationship between a continuous dependent variable and one or more independent variables by fitting a linear equation to the observed data.
  2. Polynomial regression models the relationship between the independent variable x and the dependent variable y as an nth degree polynomial. Because the model is still linear in its coefficients, it is a special case of multiple linear regression.
  3. Multiple regression is used when several independent variables together predict the value of a dependent variable.
  4. Logistic regression predicts the probability of a binary outcome (1 / 0, Yes / No, True / False) given a set of independent variables.
  5. Ridge regression, lasso regression, and elastic net are regularized forms of linear regression. They are useful when a model overfits the data or when the independent variables are highly correlated (multicollinearity).
  6. Decision tree regression, random forest regression, and gradient boosting regression are examples of non-linear regression techniques.
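To make the logistic regression entry more concrete, here is a minimal sketch in plain Python. It assumes a single predictor and fits the model by gradient descent on the log-loss; the function names, the learning rate, the step count, and the "hours studied vs. passed" data are all made up for illustration. Libraries typically use more sophisticated solvers, so treat this as a teaching aid rather than a reference implementation.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, learning_rate=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) by gradient descent on the mean log-loss.

    learning_rate and steps are illustrative defaults, not tuned values.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_b0 = 0.0
        grad_b1 = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(b0 + b1 * x) - y  # predicted probability minus label
            grad_b0 += error
            grad_b1 += error * x
        b0 -= learning_rate * grad_b0 / n
        b1 -= learning_rate * grad_b1 / n
    return b0, b1


def predict_probability(b0, b1, x):
    """Predicted probability that the outcome is 1 for input x."""
    return sigmoid(b0 + b1 * x)


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
print("P(pass | 0.5 hours):", predict_probability(b0, b1, 0.5))
print("P(pass | 4.0 hours):", predict_probability(b0, b1, 4.0))
```

Note that the output is a probability, not a continuous measurement. To get a Yes / No prediction, the probability is usually compared against a threshold such as 0.5.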

Linear Regression

Linear regression is a statistical method that is used to model the relationship between a dependent variable (also called the outcome or response variable) and one or more independent variables (also called predictor or explanatory variables).

The goal of linear regression is to find the best-fitting line through the data points, that is, the line that minimizes the difference between the predicted values (based on the line) and the actual values. With a single independent variable, the line is written as:

y = b0 + b1*x

Where y is the dependent variable, x is the independent variable, b0 is the y-intercept (the point where the line crosses the y-axis), and b1 is the slope of the line (the change in y for a unit change in x).

The slope and y-intercept are estimated using the least squares method, which minimizes the sum of the squared differences between the predicted and actual values. Linear regression can be used to make predictions about future values of the dependent variable, given new values of the independent variable(s).
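For a single independent variable, the least squares estimates have a well-known closed form: the slope is the sum of the products of the x and y deviations from their means, divided by the sum of the squared x deviations, and the intercept follows from the means. The sketch below shows this in plain Python. The function names and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) that minimize the sum of squared residuals for y = b0 + b1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of products of x and y deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through the point of means
    return b0, b1


def predict(b0, b1, x):
    """Predicted y for a new x."""
    return b0 + b1 * x


# Made-up data lying exactly on the line y = 2 + 1.5*x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.5, 5.0, 6.5, 8.0]

b0, b1 = fit_line(xs, ys)
print(b0, b1)                # 2.0 1.5
print(predict(b0, b1, 5.0))  # 9.5
```

Because the sample points lie exactly on a line, the fit recovers the intercept and slope exactly. With noisy data the estimates would only be close to the underlying values, as in the Wikipedia example above.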

Image (Wikipedia): Illustration of linear regression on a data set

Polynomial regression

Polynomial regression is a type of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial. It extends straight-line regression: instead of fitting a straight line to the data, a polynomial function is fit. The model is still linear in its coefficients, so the same least squares method can be used to estimate them.

The equation of a polynomial function of degree n is represented by:

y = b0 + b1*x + b2*x² + … + bn*x^n

Where b0, b1, b2, …, bn are the coefficients of the polynomial function and x, x², …, x^n are the powers of the independent variable. The goal is to find the values of the coefficients that minimize the sum of the squared differences between the predicted values (based on the polynomial function) and the actual values.

Polynomial regression can be used when the relationship between the independent and dependent variables is not linear. It can also be useful for modeling complex phenomena or for fitting a model to non-uniformly sampled data.
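One way to see why least squares still works here is to treat each power of x as its own column and solve the resulting linear system. The sketch below does this in plain Python using the normal equations and a small Gaussian elimination routine. The function names and the data are made up for illustration, and the normal equations are chosen only because they are easy to read; numerical libraries generally prefer more stable methods, especially for high degrees.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: try a lower degree or more distinct x values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Return [b0, b1, ..., bn] that minimize the sum of squared residuals."""
    # Each row of the design matrix is [1, x, x^2, ..., x^n].
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    size = degree + 1
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve(xtx, xty)


def evaluate(coeffs, x):
    """Evaluate b0 + b1*x + ... + bn*x^n."""
    return sum(b * x ** k for k, b in enumerate(coeffs))


# Made-up data generated from y = 1 + 2*x + 3*x^2, with no noise.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)
print(coeffs)                 # approximately [1.0, 2.0, 3.0]
print(evaluate(coeffs, 4.0))  # approximately 57.0
```

Setting degree=1 in the same function gives an ordinary straight-line fit, which shows how closely the two methods are related.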

Advantages of using polynomial regression include:

  • Flexibility: Polynomial regression can fit a wide range of curves, because the degree of the polynomial can be chosen to match the shape of the data.
  • Handling Non-Linear Relationships: Polynomial regression can capture non-linear relationships between the independent and dependent variables, which a straight-line fit cannot do.
  • Handling Non-Uniformly Sampled Data: Polynomial regression can be useful for fitting a model to non-uniformly sampled data, because it provides a smooth fit.

Disadvantages of using polynomial regression include:

  • Overfitting: One of the major disadvantages of polynomial regression is that it can easily overfit the data, particularly when the degree of the polynomial is high. This means that the model may fit the noise in the data instead of the underlying relationship.
  • Lack of Interpretability: Polynomial regression models can be difficult to interpret, especially when the degree of the polynomial is high.
  • Model Selection: Selecting the right degree of polynomial can be challenging, and it usually requires some trial and error.
  • High Computational Cost: Fitting a polynomial regression model can become computationally expensive for high-degree polynomials and large datasets, and high-degree fits can also be numerically unstable.
  • Sensitivity to Outliers: Polynomial regression models are sensitive to outliers, which can have a large impact on the estimated coefficients and the overall fit of the model.
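The overfitting problem is easiest to see in the extreme case. With n data points, a polynomial of degree n-1 can pass through every point exactly, so its training error is zero, yet it may predict new points very badly. The sketch below compares a straight line with such a polynomial on made-up data. Two things here are assumptions for illustration: the data (y = x plus alternating noise of +0.5 and -0.5), and the use of Lagrange interpolation to evaluate the degree n-1 polynomial, which gives the same curve a least squares fit of that degree would produce on distinct x values.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def interpolate(xs, ys, x):
    """Evaluate the degree n-1 polynomial through all n points (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Training data: y = x plus made-up alternating noise of +0.5 and -0.5.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.5, 0.5, 2.5, 2.5, 4.5]
# Held-out points that follow the underlying relationship y = x.
test_x = [0.5, 1.5, 2.5, 3.5, 5.0]
test_y = test_x[:]

b0, b1 = fit_line(train_x, train_y)
line_train = mse([b0 + b1 * x for x in train_x], train_y)
line_test = mse([b0 + b1 * x for x in test_x], test_y)

poly_train = mse([interpolate(train_x, train_y, x) for x in train_x], train_y)
poly_test = mse([interpolate(train_x, train_y, x) for x in test_x], test_y)

print("line:       train MSE", line_train, " test MSE", line_test)
print("degree n-1: train MSE", poly_train, " test MSE", poly_test)
```

The high-degree polynomial wins on the training data and loses badly on the held-out points, which is exactly the pattern to watch for when choosing the degree.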

Linear regression and polynomial regression are both types of regression analysis that are used to model the relationship between a dependent variable and one or more independent variables. However, there are some key differences between the two methods:

  • Linear regression models the relationship between the independent and dependent variables as a straight line, while polynomial regression models it as an nth degree polynomial.
  • Linear regression is best used for modeling linear relationships, while polynomial regression is better suited for modeling non-linear relationships.
  • Linear regression is simpler and has fewer coefficients to estimate, so it is less computationally expensive than polynomial regression, whose cost grows with the degree of the polynomial.
  • Linear regression is less prone to overfitting than polynomial regression, which can easily overfit the data, particularly when the degree of the polynomial is high.
  • Linear regression is more interpretable than polynomial regression, which can be difficult to interpret, especially when the degree of the polynomial is high.
  • Linear regression assumes the relationship between the independent and dependent variables is linear, while polynomial regression assumes it can be described by a polynomial of the chosen degree.

by- R.Thigan
