Ridge And Lasso Regression Made Easy

Manoj Gadde
Published in Analytics Vidhya
Mar 24, 2021 · 3 min read

Photo by Greg Rakozy on Unsplash

In this article, let's understand two different techniques in regularised regression:

  1. Ridge Regression
  2. Lasso Regression

Let's start our learning by understanding what regularisation is.

What is Regularisation?
We often face situations in ML where a model performs very well on training data but fails to generalize to unseen data (an overfitting, or low-bias and high-variance, condition). Regularisation is a method to reduce the complexity of the model, or in other words, a process to create an optimally complex model. A model's complexity increases with large coefficient values and a high number of features; the goal of regularisation is to handle both of these problems.

  1. Ridge Regression (L2)

The regularised cost function has two components: an error term and a regularisation term.

Cost = Σ(yᵢ − ŷᵢ)² + λ Σβⱼ²

Here the loss is simply the error term followed by the regularisation term. 'λ' (lambda) is a hyperparameter to tune, or a balancing factor, and the penalty is the sum of the squared coefficients 'β'. By changing λ we control the penalty term: if λ = 0 we are using only the error term without any penalty, while a large λ means a bigger penalty and therefore a stronger reduction in the magnitude of the coefficients. It is important to choose an optimal λ value.

Essentially, the goal of ridge regression is to reduce model complexity by shrinking the magnitude of the coefficients. We can also say that ridge regression takes care of the multicollinearity problem.
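To make this concrete, here is a minimal sketch using scikit-learn (my choice of library, not something specified above), where λ is exposed as the alpha parameter:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV

# Synthetic toy data, for illustration only
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=42)

# A larger alpha (our lambda) means a bigger penalty and smaller coefficients
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: sum of |coefficients| = {np.abs(model.coef_).sum():.1f}")

# Cross-validation is one standard way to pick an optimal lambda
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0]).fit(X, y)
print("best alpha:", ridge_cv.alpha_)
```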

2. Lasso Regression (L1)

LASSO stands for 'least absolute shrinkage and selection operator'. It is similar to ridge regression; the only difference is that lasso penalises the absolute values of the coefficients, and it drives the coefficients of redundant features to zero, which means it helps to select the best features.

Unlike ridge, even for relatively small values of λ, lasso can shrink some coefficients all the way to zero, so we still need to choose the optimal λ value.
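A similar sketch with scikit-learn's Lasso (again an assumption about tooling; alpha plays the role of λ) shows coefficients of uninformative features being driven to exactly zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

# 20 features, but only 5 actually carry signal
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print("non-zero coefficients:", int(np.sum(lasso.coef_ != 0)), "of", X.shape[1])

# Cross-validated choice of the optimal lambda
lasso_cv = LassoCV(cv=5).fit(X, y)
print("best alpha:", lasso_cv.alpha_)
```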

Note: we can also use elastic net, which is a combination of both ridge and lasso regression. Say we have a situation with 500 variables: if we use ridge, it only reduces the magnitude of the coefficients, and if we use lasso, it reduces the coefficients of redundant features to zero (feature selection), which can lead to a loss of information. Here it's good to use elastic net, and it performs well on large datasets.
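As a rough sketch of that idea in scikit-learn (the l1_ratio parameter balances the lasso (L1) and ridge (L2) penalties; the numbers below are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

# A wide dataset in the spirit of the 500-variable example above
X, y = make_regression(n_samples=200, n_features=500, n_informative=50,
                       noise=5.0, random_state=1)

# l1_ratio=0.5 mixes the L1 and L2 penalties equally
enet = ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=10000).fit(X, y)
print("non-zero coefficients:", int((enet.coef_ != 0).sum()), "of", X.shape[1])
```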

This concludes our article on regularised regression. I hope it helped you understand how these techniques work.

This is my first Medium post. If you liked this article, please show your support by clapping below; it really motivates me to write more interesting articles. If you have any questions, leave a comment and I'd love to hear from you.

I'm a data science enthusiast, currently pursuing a PG Diploma in Machine Learning & AI at the International Institute of Information Technology, Bangalore.

You can also find me at www.linkedin.com/in/manoj-gadde.
