 Regression – Find relation between Multiple Inputs and Target variable

One Input variable : When there is only one input variable and one output variable, a scatter chart is useful for finding the relationship between X1 and y. The diagram below shows an example of such a dataset.

Multiple Input variables : Quite often we see regression datasets with multiple input variables. With n input variables and one output variable, an (n+1)-dimensional graph would need to be plotted, and it is difficult to draw a plot with more than three dimensions. Linear Regression provides a way to summarise this multi-dimensional relationship in a two-dimensional plot.

A graph of residuals ( target value – predicted value ) against fitted values ( predicted values ) helps reveal the relation between the multiple input variables and the output variable. The steps are as below.

1. Fit a Linear Regression model.
2. Predict values for all records.
3. Calculate residuals ( target value – predicted value ).
4. Plot a graph of residuals against predicted values.
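
The four steps above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: it assumes ordinary least squares fitted with NumPy, the function names residuals_vs_fitted and plot_residuals_vs_fitted are hypothetical, the data is synthetic, and the optional plotting helper assumes matplotlib is installed.

```python
import numpy as np


def residuals_vs_fitted(X, y):
    """Return (fitted, residuals) for an ordinary least-squares linear fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: fit y ~ b0 + b1*x1 + ... + bn*xn (a column of ones gives the intercept).
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Step 2: predict a value for every record.
    fitted = A @ coef
    # Step 3: residual = target value - predicted value.
    residuals = y - fitted
    return fitted, residuals


def plot_residuals_vs_fitted(fitted, residuals):
    """Step 4: scatter residuals against fitted values (assumes matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(fitted, residuals)
    ax.axhline(0.0, linestyle="--")  # reference line where the error is zero
    ax.set_xlabel("Fitted value (prediction)")
    ax.set_ylabel("Residual (target - prediction)")
    return ax


# Synthetic example with three input variables (made-up data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

fitted, residuals = residuals_vs_fitted(X, y)
```

Calling plot_residuals_vs_fitted(fitted, residuals) would then draw the two-dimensional plot, however many input columns X has.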

How does it work? The logic is simple: when we fit a straight line through the data, the pattern of the ERRORS (target minus predicted value) plotted against the PREDICTIONS reflects the part of the relation between the INPUT and OUTPUT variables that the line did not capture.

Let’s try to understand the concept with one input variable first.

In the above plot, every prediction lies on the “fitted line”, so the two statements below hold true:

(a) The “data points” far from the “fitted line” (their predicted value) will have a large error (positive or negative).
(b) The “data points” close to the “fitted line” (their predicted value) will have a small error (positive or negative).
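
Statements (a) and (b) can be checked on a tiny one-input example. The sketch below is only an illustration: the five data points are made up, and the straight line is fitted with np.polyfit, which is one of several ways to do it.

```python
import numpy as np

# Made-up data: four points lie near a straight line, the middle one sits well above it.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 10.0, 8.0, 10.0])

# Fit a straight line y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, 1)

predicted = slope * x + intercept  # every prediction lies on the fitted line
residuals = y - predicted          # target value - predicted value

# The residual is the signed vertical distance from the fitted line,
# so the point farthest from the line has the largest absolute error.
farthest = int(np.argmax(np.abs(residuals)))

for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x={xi:.0f}  y={yi:5.2f}  predicted={pi:5.2f}  residual={ri:+5.2f}")
```

Here the third point, which lies farthest from the fitted line, ends up with the largest residual, while the other four points have smaller errors.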

Now let’s look at the Residuals vs Fitted Value graph.