Linear Curve Fitting

  • Find the equation of the best-fit line for a scatter plot using least-squares regression.

A straight line is described generically by f(x) = ax + b. In linear curve fitting, the goal is to identify the coefficients ‘a’ and ‘b’ such that f(x) ‘fits’ the given data well. The error between the actual points and the ‘best-fit’ line has to be minimized, so that the line is the most accurate representation of the points. The measure of error should satisfy two requirements:

  1. Positive and negative errors of the same size count equally (whether the data point is above or below the line).

  2. Larger errors are weighted more heavily.

    Linear Regression

Both requirements are met by considering the squares of the errors rather than the plain values of the errors. Writing \(d_i = y_i - f(x_i)\) for the vertical distance of point i from the line,
\[\mathrm{error} = \Sigma (d_i)^2\]
\[\mathrm{error} = (y_1 - f(x_1))^2 + (y_2 - f(x_2))^2 + (y_3 - f(x_3))^2 + \cdots\]
Our fit is a straight line, so substitute f(x) = ax + b:
\[\mathrm{error} = \sum_{i=1}^{n} (y_i - f(x_i))^2 = \sum_{i=1}^{n} (y_i - (ax_i + b))^2\]
To minimize the error, we use the concept of maxima/minima from calculus: take the derivative and equate it to zero. Since there are two unknowns, we take the partial derivative twice, once with respect to a and once with respect to b.
\[\frac{\partial(\mathrm{error})}{\partial a} = -2 \sum_{i=1}^{n} x_i(y_i - (ax_i + b)) = 0\]
\[\frac{\partial(\mathrm{error})}{\partial b} = -2 \sum_{i=1}^{n} (y_i - (ax_i + b)) = 0\]
Dividing by -2 and rearranging gives two linear equations in a and b:
\[a \sum x_i^2 + b\sum x_i = \sum x_i y_i\]
\[a \sum x_i + b \times n = \sum y_i\]
These equations can also be put in matrix form,
\[\begin{bmatrix} n & \sum x_i\\ \sum x_i & \sum x_i^2 \end{bmatrix}\begin{bmatrix} b\\a \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}\]
We first calculate all the sums \(\Sigma\) from the given data, and then use the matrix form above to get the values of a and b.
Then the best-fit line is
\[y = ax + b\]
This line can later be used to predict future values, such as population estimates, or, where scientific data are missing, to give a reasonable value by interpolation.

Note that the solved examples below write the line as Y = a + bX, so there a is the intercept and b is the slope.
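The procedure above (compute the sums, then solve the 2×2 system) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `fit_line` and the sample data are assumptions made for the example, and the system is solved with Cramer's rule rather than a matrix library.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line f(x) = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # The four sums that appear in the normal equations.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of [[n, sum_x], [sum_x, sum_x2]]; zero if all x are equal.
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are identical; slope is undefined")

    a = (n * sum_xy - sum_x * sum_y) / det       # slope
    b = (sum_y * sum_x2 - sum_x * sum_xy) / det  # intercept
    return a, b


# Hypothetical data lying exactly on y = 2x + 1 (not taken from the text).
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
a, b = fit_line(xs, ys)
print(a, b)
```

Because the sample points lie exactly on a line, the fit recovers that line's slope and intercept; with noisy data the same function returns the line that minimizes the sum of squared errors.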

Solved Examples

Solved Example:

10-1-01

Use the least-squares method to develop a linear trend equation for the following data. State the equation and forecast a trend value for the population in the year 2022.

10.1-01



Solution:


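The years are coded so that \(\Sigma X = 0\) (X = 0 for 2019). With \(\Sigma X = 0\), the two least-squares equations decouple and reduce to
\[a = \dfrac{\Sigma Y}{n} = \dfrac{77}{5} = 15.4, \quad b = \dfrac{\Sigma XY}{\Sigma X^2}= \dfrac{18}{10} =1.8\]
The forecasting equation is of the form Y = a + bX:
Y = 15.4 + 1.8 X (for year 2019, X = 0).
For year 2022, the code is X = 3:
$Y_{2022}=15.4+1.8 \times 3 = 20.8$
Hence the forecast population is 20.8 thousand, or 20,800.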

Correct Answer: B
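The arithmetic of this example can be checked with a short Python sketch. It uses only the totals quoted in the solution (n = 5, ΣY = 77, ΣXY = 18, ΣX² = 10); the helper name `forecast` and the coding X = year − 2019 are assumptions inferred from "for year 2019, X = 0" and "for year 2022, code=3".

```python
# Totals quoted in the solution; the raw data table is not used here.
n = 5
sum_y = 77
sum_xy = 18
sum_x2 = 10

# With coded years that sum to zero, the formulas simplify:
a = sum_y / n         # intercept: trend value in the base year
b = sum_xy / sum_x2   # slope: change per year


def forecast(year, base_year=2019):
    """Trend value (in thousands) for a given year, with X = year - base_year."""
    x = year - base_year
    return a + b * x


print(round(forecast(2022), 1))  # 20.8
```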

Solved Example:

10-1-02

Determine a forecast for engine oil shipments when the number of cars sold is 30,000.

10.1-02



Solution:


\begin{align*}
\bar{X} &= \dfrac{\Sigma X}{n} = \dfrac{144}{6} = 24\\
\bar{Y} &= \dfrac{\Sigma Y}{n} = \dfrac{65}{6} = 10.833
\end{align*}
\begin{align*}
b &= \dfrac{\Sigma XY - n \bar{X} \bar{Y}}{\Sigma X^2 - n {\bar{X}}^2}\\
&= \dfrac{1831 - 6 (24) (10.833)}{4156 - 6 {(24)}^2}\\
&= \dfrac{271}{700} = 0.3871 \approx 0.39\\
a &= \bar{Y} - b\bar{X}= 10.833 - 0.3871 \times (24)= 1.54
\end{align*}

The regression equation is Y = 1.54 + 0.39 X (X = cars sold, in thousands; Y = engine oil shipments).

Then, letting X = 30,

$Y = 1.54 + 0.39(30) = 13.24$, which is $\approx$ 13 shipments.

Correct Answer: B
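As a check on the numbers above, the same slope and intercept formulas can be evaluated in Python from the totals quoted in the solution. The function name `fit_from_sums` is an assumption made for this sketch; it follows the example's convention Y = a + bX.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for Y = a + b*X, with a the intercept and b the slope."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b


# Totals from the solution: X = cars sold (thousands), Y = oil shipments.
a, b = fit_from_sums(n=6, sum_x=144, sum_y=65, sum_xy=1831, sum_x2=4156)

# Forecast for 30,000 cars sold, i.e. X = 30.
forecast = a + b * 30
print(round(b, 4), round(a, 3), round(forecast, 2))
```

Carrying full precision gives a forecast of roughly 13.16 rather than 13.24, because the text rounds the slope to 0.39 before multiplying; both values round to 13.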

Solved Example:

10-1-03

You are a car dealer specializing in the sale of the 2016 Toyota Camry. The following data are for seven different cars you have sold recently. Assume all cars in your shop are in the same condition and only the odometer reading (km) decides the selling price. For a person who can afford to spend only $9000, what odometer reading (km) can he expect for his future car?

10.1-03



Solution:


Let X be the odometer reading in thousands of km and Y the selling price in thousands of dollars.
\[\bar{X} = \dfrac{\Sigma X}{n} = \dfrac{259}{7} = 37\]
\[\bar{Y} = \dfrac{\Sigma Y}{n} = \dfrac{96.7}{7} \approx 13.8\]
\begin{align*}
b&= \dfrac{\Sigma XY - n \bar{X}\bar{Y}}{\Sigma X^2 - n \bar{X}^2}\\
&= \dfrac{3533.2 - 7 \times 37 \times 13.8}{9791 - 7 \times 37^2}\\
&= \dfrac{-41}{208}\\
&= -0.197
\end{align*}
\[a= \bar{Y} - b \bar{X} = 13.8 - (-0.197) \times 37 = 21.089\]
Setting Y = 9 (a price of $9000) and solving for X:
\begin{align*}
Y &= a + bX \\
9 &= 21.089 + (-0.197) X\\
X &= 61.365
\end{align*}

So he can expect a car with about 61,000 km on the odometer.

Correct Answer: D
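This example runs the regression line backwards: the price Y is known and the odometer reading X is solved for. A small Python sketch of that step follows, using the totals quoted in the solution. The helper names `fit_from_means` and `x_for_y` are assumptions made for illustration. The first calculation deliberately reproduces the rounding used in the text (Ȳ rounded to 13.8, slope rounded to −0.197); the second keeps full precision for comparison.

```python
def fit_from_means(n, x_bar, y_bar, sum_xy, sum_x2):
    """Return (a, b) for Y = a + b*X given the means and the two sums."""
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b


def x_for_y(a, b, y):
    """Invert Y = a + b*X to find the X that gives a target Y."""
    if b == 0:
        raise ValueError("a horizontal line cannot be inverted")
    return (y - a) / b


# Totals from the solution: X in thousand km, Y in thousand dollars.
n, sum_x, sum_y = 7, 259, 96.7
sum_xy, sum_x2 = 3533.2, 9791
x_bar = sum_x / n

# Same rounding as the text: mean to 1 decimal, slope to 3 decimals.
y_bar_rounded = round(sum_y / n, 1)
_, b_text = fit_from_means(n, x_bar, y_bar_rounded, sum_xy, sum_x2)
b_text = round(b_text, 3)
a_text = y_bar_rounded - b_text * x_bar
km_text = x_for_y(a_text, b_text, 9)

# Full precision, with no intermediate rounding.
a_full, b_full = fit_from_means(n, x_bar, sum_y / n, sum_xy, sum_x2)
km_full = x_for_y(a_full, b_full, 9)

print(round(km_text, 1), round(km_full, 1))
```

With the text's rounding the sketch gives about 61.4 thousand km, matching the solution. Without intermediate rounding the estimate comes out near 59.4 thousand km, which shows how sensitive the slope is to rounding Ȳ when the numerator is a small difference of large numbers.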