# Regression With Scikit Learn (Part 4)

This is Day 32 of the #100DaysOfPython challenge.

This post will demonstrate how to use regression with regularization using `scikit-learn`.

We will be working from the code written in part three.

Source code can be found on my GitHub repo `okeeffed/regression-with-scikit-learn-part-four`.

## Prerequisites

- Familiarity with the Conda package, dependency and virtual environment manager. A handy additional reference for Conda is the blog post "The Definitive Guide to Conda Environments" on "Towards Data Science".
- Familiarity with JupyterLab. See here for my post on JupyterLab.
- These projects will also run Python notebooks on VSCode with the Jupyter Notebooks extension. If you do not use VSCode, it is expected that you know how to run notebooks (or alter the method for what works best for you).

## Getting started

Let's first clone the code from part three into the `regression-with-scikit-learn-part-four` directory.

```shell
# Clone the part three code into the `regression-with-scikit-learn-part-four` directory
$ git clone https://github.com/okeeffed/regression-with-scikit-learn-part-three.git regression-with-scikit-learn-part-four
$ cd regression-with-scikit-learn-part-four
```

We can now begin adding code to our notebook at `docs/linear_regression.ipynb`.

## What is Regularized Regression?

"Regularization" is a method to give a penalty to the model in order to prevent overfitting. The penalty is a function of the model's complexity. The more complex the model, the higher the penalty.

"Coefficient estimates are constrained to zero. The size (or magnitude) of the coefficient, as well as the error term, are penalized." - `statisticshowto.com`

Two commonly used methods of regularization are:

- Ridge Regression
- Lasso Regression

We will be exploring both today.
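For reference, both methods add a penalty term to the ordinary least squares loss; they differ only in how the coefficients are penalized (writing the coefficients as beta and the regularization strength as alpha):

```latex
\text{Ridge (L2):}\quad \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} \beta_j^2
\qquad
\text{Lasso (L1):}\quad \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} |\beta_j|
```

The squared (L2) penalty shrinks coefficients smoothly, while the absolute-value (L1) penalty can push them exactly to zero.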

## Ridge Regression

Ridge regression tunes a model that is used to analyze data that has **multicollinearity**.

Multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. - Wikipedia

When the issue of multicollinearity occurs, the least-squares estimates are still unbiased, but their variances are large. As a result, the predicted values can end up far from the actual values.

By using regularization, we can reduce the variance of our predictions. It shrinks the parameters to counteract multicollinearity, and it will also reduce model complexity by shrinking coefficients.

To use `Ridge Regression` in our work, we need to pick the value of **alpha** ourselves. It is similar to picking the value of `k` for k-Nearest Neighbors.

The process of finding the **alpha** that works best is known as **Hyperparameter Tuning**.
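As a sketch of what that tuning can look like, scikit-learn's `RidgeCV` cross-validates over a list of candidate alphas and keeps the best one. The synthetic dataset below is an assumption for illustration, standing in for our notebook's data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic data stands in for the notebook's dataset
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)

# RidgeCV fits one ridge model per candidate alpha and keeps
# the best performer under cross-validation
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])
ridge_cv.fit(X, y)

print(ridge_cv.alpha_)  # the winning alpha
```

We will look at hyperparameter tuning more generally later in the series; `RidgeCV` is simply the shortest path to a tuned alpha for ridge specifically.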

The value of alpha controls model complexity. With `alpha = 0` we get back OLS, which can lead to overfitting. A large alpha means coefficients are heavily penalized, which can lead to underfitting.
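To see both ends of that trade-off, a quick sketch can sweep a few alphas on a synthetic dataset (an assumption for illustration; the exact scores will differ on our notebook data):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scores = {}
for alpha in [0.0001, 1.0, 1000.0]:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train, y_train)
    scores[alpha] = ridge.score(X_test, y_test)

# A tiny alpha behaves like OLS; a huge alpha shrinks the
# coefficients so hard that the model under-fits
print(scores)
```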

In a new cell, let's add the following:

```python
from sklearn.linear_model import Ridge

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# normalize=True centres each regressor and scales it by its L2 norm before fitting
ridge = Ridge(alpha=0.1, normalize=True)
ridge.fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)
ridge.score(X_test, y_test)  # 0.6996938275127313
```

By running our code, we can see that the `R^2` value is 0.6996938275127313.

## Lasso Regression

Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, like the mean. This type is very useful when you have high levels of multicollinearity or when you want to automate certain parts of model selection, like variable selection/parameter elimination.

Applying `Lasso` regression is similar to `Ridge`. In a new cell, let's add the following:

```python
# Lasso regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# normalize=True centres each regressor and scales it by its L2 norm before fitting
lasso = Lasso(alpha=0.1, normalize=True)
lasso.fit(X_train, y_train)
lasso_pred = lasso.predict(X_test)
lasso.score(X_test, y_test)  # 0.5950229535328551
```

Scoring our split gives us an `R^2` value of 0.5950229535328551.

## Lasso for feature selection

One of the important aspects of Lasso regression is using it to select important features of a dataset.

It does this because it tends to shrink less important features down to zero. Coefficients that are not shrunk to zero are considered more important by the algorithm.
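A minimal sketch of that shrink-to-zero behaviour, assuming a synthetic dataset where only three of ten features actually drive the target:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 3 of the 10 features are informative by construction
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# The uninformative features tend to be shrunk exactly to zero,
# leaving only the informative ones with non-zero coefficients
print(lasso.coef_)
print("kept features:", [i for i, c in enumerate(lasso.coef_) if c != 0])
```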

To demonstrate, let's add one last cell:

```python
# Shrinking less important features
from sklearn.linear_model import Lasso
from sklearn.datasets import load_boston
import matplotlib.pyplot as plt

plt.style.use('ggplot')

boston = load_boston()
X = boston.data
y = boston.target

lasso = Lasso(alpha=0.1)

# Extract coef_
lasso_coef = lasso.fit(X, y).coef_

# Plot each feature's coefficient by name
_ = plt.plot(range(len(boston.feature_names)), lasso_coef)
_ = plt.xticks(range(len(boston.feature_names)), boston.feature_names, rotation=60)
_ = plt.ylabel('Coefficients')
plt.show()
```

The output is as follows:

*Lasso coefficients assigned to features*

This diagram works as a great sanity check to confirm what we had worked out earlier: that the number of rooms is an important feature in predicting the price.

## Summary

Today's post looked into both `Ridge` and `Lasso` regression, as well as how to apply those methods using Scikit Learn.


## Dennis O'Keeffe

Melbourne, Australia
