Linear regression with a single variable to predict profits
Problem statement
Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities. You would like to use this data to help you select which city to expand to next.
Load data
The file data.txt contains the dataset for our linear regression problem. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss. The dataset is loaded from the data file into the variables x and y:
data = load('data.txt'); % read comma separated data
x = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples
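As a quick optional check that the file loaded as expected, we can print the first few rows and the number of training examples:
disp(data(1:5, :)); % first five (population, profit) pairs
fprintf('Number of training examples: %d\n', m);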
Plot data
Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, we can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). (Many other problems in real life are multi-dimensional and can’t be plotted on a 2-d plot.)
Next, let's create a scatter plot of the data by calling the plotData function; its definition is provided at the end of this document.
plotData(x, y);
title('Scatter plot of training data');
Cost function and gradient descent
The objective of linear regression is to minimize the cost function

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where the hypothesis $h_\theta(x)$ is given by the linear model

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$

The parameters of the model are the values $\theta_j$; these are the values we will adjust to minimize the cost $J(\theta)$. One way to do this is the batch gradient descent algorithm, in which each iteration performs the update

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \quad \text{(simultaneously update } \theta_j \text{ for all } j\text{)}$$

With each step of gradient descent, the parameters $\theta_j$ come closer to the optimal values that achieve the lowest cost $J(\theta)$.
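The update rule follows from differentiating the cost with respect to each parameter; using the fact that $x_0^{(i)} = 1$ for the intercept term, the partial derivative is

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

so each step moves $\theta_j$ opposite to the gradient, scaled by the learning rate $\alpha$.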
Initialization
We store each example as a row in the X matrix. To take the intercept term $\theta_0$ into account, we add an additional first column to X and set it to all ones; this lets us treat $\theta_0$ as simply another 'feature'. In the following lines, we add this extra dimension to our data to accommodate the intercept term, initialize the parameters θ to 0, set the learning rate alpha to 0.01, and fix the number of iterations at 1500.
X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters
iterations = 1500;
alpha = 0.01;
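As a small optional sanity check (not part of the original exercise), we can confirm that X now has one row per training example and two columns:
fprintf('X is %d x %d\n', size(X, 1), size(X, 2)); % expect m x 2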
We compute the initial cost by calling the computeCost function; its definition is given at the end of this document.
fprintf('Initial cost is %f\n',computeCost(X, y, theta));
Gradient descent
We optimize the cost over the parameter vector θ by calling the gradientDescent function; again, the definition is given at the end of the document. A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it decreases with each step. To keep the output short, we print the cost only after every hundred iterations. Note that the cost is parameterized by the vector θ, not by X and y: we minimize the value of J(θ) by changing the values of the vector θ, not by changing X or y. The value of J(θ) should never increase, and it should converge to a steady value by the end of the algorithm.
[theta, J_history] = gradientDescent(X, y, theta, alpha, iterations);
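Since J_history records the cost at every iteration, we can also verify convergence visually with a quick plot (an optional sketch):
figure;
plot(1:iterations, J_history, '-');
xlabel('Iteration'); ylabel('Cost J(\theta)');
title('Convergence of gradient descent');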
Let's take a look at how the fitted regression line matches the training data.
plotData(x, y);
hold on; % keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
title('Training data with linear regression fit');
hold off % don't overlay any more plots on this figure
Prediction
Let's look at the value of the optimized θ parameter and use it to predict profits in areas of 35,000 and 70,000 people. Since the training populations are expressed in units of 10,000 people and profits in units of $10,000, the inputs are 3.5 and 7, and the predictions are scaled back up by a factor of 10,000.
% print theta to screen
fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));
% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n', predict1 * 10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n', predict2 * 10000);
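The same computation extends to any number of population sizes at once; here is a minimal sketch (the variable names populations and profits are illustrative), with the factors of 10,000 following the dataset's units:
populations = [3.5; 7.0]; % population sizes in units of 10,000 people
profits = [ones(size(populations)), populations] * theta; % add intercept column, then predict
for k = 1:length(populations)
fprintf('Population %d: predicted profit of %f\n', populations(k) * 10000, profits(k) * 10000);
end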
Visualizing J(θ)
To understand the cost function better, we will now plot the cost J(θ) over a 2-dimensional grid of $\theta_0$ and $\theta_1$ values.
% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);
% initialize J vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));
% Fill out J vals
for i = 1:length(theta0_vals)
for j = 1:length(theta1_vals)
t = [theta0_vals(i); theta1_vals(j)];
J_vals(i,j) = computeCost(X, y, t);
end
end
After these lines are executed, we will have a 2-D array of J(θ) values. We will then use these values to produce surface and contour plots of J(θ) with the surf and contour commands.
% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals);
xlabel('\theta_0'); ylabel('\theta_1');
title('Surface plot of J(\theta)');
% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
title('Contour plot of J(\theta)');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
Implementation of functions
Implementation of plotData
function plotData(x, y)
%PLOTDATA Plots the data points x and y into a new figure
% PLOTDATA(x,y) plots the data points and gives the figure axes labels of
% population and profit.
figure; % open a new figure window
plot(x, y, 'rx', 'MarkerSize', 10); % Plot the data
ylabel('Profit in $10,000s'); % Set the y-axis label
xlabel('Population of City in 10,000s'); % Set the x-axis label
end
Implementation of computeCost
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
% J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
predicted_values = X * theta; % hypothesis h_theta(x) for every example
difference = predicted_values - y; % prediction errors
J = sum(difference .^ 2) / (2 * m); % halved mean of squared errors
end
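As a quick optional check of computeCost with made-up toy values: when y equals X*theta exactly, the cost must be zero.
% Toy sanity check (illustrative values): a perfect fit has zero cost
X_toy = [1 1; 1 2; 1 3];
theta_toy = [0.5; 2];
y_toy = X_toy * theta_toy;
fprintf('Cost of a perfect fit: %f\n', computeCost(X_toy, y_toy, theta_toy)); % prints 0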
Implementation of gradientDescent
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
% theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% Compute predictions and errors using the current theta
predicted_values = X * theta;
difference = predicted_values - y;
% Simultaneously update every theta_j:
% theta_j := theta_j - (alpha/m) * sum((h_theta(x^(i)) - y^(i)) * x_j^(i))
theta = theta - (alpha / m) * (X' * difference);
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
% Print the cost after every 100 iterations
if mod(iter, 100) == 0
fprintf('Cost after iteration %d: %f\n', iter, J_history(iter));
end
end
end
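As an optional cross-check (not part of the original walkthrough), the θ found by gradient descent can be compared with the closed-form least-squares solution from the normal equation; with enough iterations the two should agree closely:
% Normal equation: theta = (X'X)^(-1) X'y, computed with a backslash solve
theta_exact = (X' * X) \ (X' * y);
fprintf('Normal equation theta: %f %f\n', theta_exact(1), theta_exact(2));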