Logistic Regression to predict student admission

Problem statement

Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant’s scores on two exams and the admissions decision. Your task is to build a classification model that estimates an applicant’s probability of admission based the scores from those two exams

Load Data

The file applicant_data.txt contains the dataset for our logistic regression problem. The first two columns contains the exam scores and the third column contains a label which indicateds admission verdict. The dataset is loaded from the data file into the variables X and y:

data = load('applicant_data.txt');
X = data(:, [1, 2]); y = data(:, 3);

Plot Data

Before starting to implement any learning algorithm, it is always good to visualize the data if possible. We start the exercise by first plotting the data to understand the problem we are working with.

% Implementaion of plotData is at the end of document

plotData(X, y);

% Put some labels

hold on;

% Labels and Legend

xlabel('Exam 1 score');

ylabel('Exam 2 score');

% Specified in plot order

legend('Admitted', 'Not admitted');

title('Scatter plot of training data');

hold off;

Sigmoid function

The logistic regression hypothesis is defined as:

where function g is the sigmoid function. The sigmoid function is defined as:

For large positive values of x, the sigmoid should be close to 1, while for large negative values, the sigmoid should be close to 0. for 0 it should be exactly 0.5. Implementation of sigmoid function is given at the end of document.

Cost function and gradient

The cost function in logistic regression is

and the gradient of the cost is a vector of the same length as θ where the

element (for j = 0, 1,...,n) is defined as follows:

Let's compute initial cost and gradient

%  Setup the data matrix appropriately, and add ones for the intercept term
[m, n] = size(X);
% Add intercept term to x and X_test
X = [ones(m, 1) X];
% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);
% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);
%Implementation of costFunction that computes cost and gradient is at the end of the document.
fprintf('Cost at initial theta (zeros): %f\n', cost);
Cost at initial theta (zeros): 0.693147
fprintf('Gradient at initial theta (zeros): \n');
Gradient at initial theta (zeros): 
fprintf(' %f \n', grad);
 -0.100000 
 -12.009217 
 -11.262842 

Learning parameters using builtin function

Octave/MATLAB’s fminunc is an optimization solver that finds the minimum of an unconstrained function. For logistic regression, we want to optimize the cost function J(θ) with parameters θ.

Concretely, we are going to use fminunc to find the best parameters θ for the logistic regression cost function, given a fixed dataset (of X and y values) we will pass to fminunc the following inputs:

The initial values of the parameters we are trying to optimize.
A function that, when given the training set and a particular θ, computes the logistic regression cost and gradient with respect to θ for the dataset (X, y).

We already implemented everything needed to use the builtin function so let's use that

%  Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);
%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost 
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);
Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the optimality tolerance.

<stopping criteria details>
% Print theta to screen
fprintf('Cost at theta found by fminunc: %f\n', cost);
Cost at theta found by fminunc: 0.203498
fprintf('theta: \n');
theta: 
fprintf(' %f \n', theta);
 -25.161343 
 0.206232 
 0.201472 

In this code snippet, we first defined the options to be used with fminunc. Specifically, we set the GradObj option to on, which tells fminunc that our function returns both the cost and the gradient. This allows fminunc to use the gradient when minimizing the function. Furthermore, we set the MaxIter option to 400, so that fminunc will run for at most 400 steps before it terminates. To specify the actual function we are minimizing, we use a "short-hand" for specifying functions with the

. This creates a function, with argument t, which calls your costFunction. This allows us to wrap the costFunction for use with fminunc.

Using costFunction fminunc will converge on the right optimization parameters and return the final values of the cost and θ. By using fminunc, we did not have to write any loops ourself, or set a learning rate like you did for gradient descent. This is all done by fminunc: we only needed to provide a function calculating the cost and the gradient.

Plot decision boundary

Using final θ value let's plot the decision boundary on the training data.

% Plot Boundary

plotDecisionBoundary(theta, X, y);

% Implementation is given at the end of document

% Put some labels

hold on;

% Labels and Legend

xlabel('Exam 1 score');

ylabel('Exam 2 score');

% Specified in plot order

legend('Admitted', 'Not admitted');

hold off;

Evaluating logistic regression

After learning the parameters, we can use the model to predict whether a particular student will be admitted.

prob = sigmoid([1 45 85] * theta);
fprintf('For a student with scores 45 and 85, we predict an admission probability of %f\n\n', prob);
For a student with scores 45 and 85, we predict an admission probability of 0.776291

Another way to evaluate the quality of the parameters we have found is to see how well the learned model predicts on our training set.

% Compute accuracy on our training set
p = predict(theta, X);
%Implementation is given at the end of document
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
Train Accuracy: 89.000000
fprintf('\nProgram paused. Press enter to continue.\n');
Program paused. Press enter to continue.

Implementation of functions

Implementation of plotData

function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure 
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.
% Create New Figure
figure; hold on;
pos = find(y==1); neg = find(y == 0);
% Plot Examples
plot(X(pos, 1), X(pos, 2), 'k+','LineWidth', 2,'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y','MarkerSize', 7);
% =========================================================================
hold off;
end

Implementation of sigmoid function

The code works with vectors and matrices. For a matrix, function performs the sigmoid function on every element

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.
g = zeros(size(z));
for row_index = 1:size(z,1)
    for col_index = 1: size(z,2)
    val = z(row_index, col_index);
    g(row_index, col_index) = 1/(1 + exp(-val));
    end
end
end

Implementation of costFunction

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
J = 0;
for index = 1:m
   y_i = y(index);
   x_i = X(index,:);
   h_theta_i = sigmoid(x_i * theta);
   J = J +  ( -y_i * log(h_theta_i) - (1-y_i) * log(1-h_theta_i ) );
end
J = J/m;
grad = zeros(size(theta));
for j = 1:size(theta)
    sum = 0;
    for i = 1:m
    x_i = X(i,:);
    predicted = 1/ (1 + exp (- (x_i*theta)));
    y_i = y(i);
    sum = sum + (predicted - y_i)* X(i,j);
    end
    grad(j) = sum/m;
end
% Note: grad should have the same dimensions as theta
end

Implementation of plotDecisionBoundary

function plotDecisionBoundary(theta, X, y)
%PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with
%the decision boundary defined by theta
%   PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the 
%   positive examples and o for the negative examples. X is assumed to be 
%   a either 
%   1) Mx3 matrix, where the first column is an all-ones column for the 
%      intercept.
%   2) MxN, N>3 matrix, where the first column is all-ones
% Plot Data
plotData(X(:,2:3), y);
hold on
if size(X, 2) <= 3
    % Only need 2 points to define a line, so choose two endpoints
    plot_x = [min(X(:,2))-2,  max(X(:,2))+2];
    % Calculate the decision boundary line
    plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));
    % Plot, and adjust axes for better viewing
    plot(plot_x, plot_y);
    
    % Legend, specific for the exercise
    legend('Admitted', 'Not admitted', 'Decision Boundary');
    axis([30, 100, 30, 100]);
else
    % Here is the grid range
    u = linspace(-1, 1.5, 50);
    v = linspace(-1, 1.5, 50);
    z = zeros(length(u), length(v));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            z(i,j) = mapFeature(u(i), v(j))*theta;
        end
    end
    z = z'; % important to transpose z before calling contour
    % Plot z = 0
    % Notice you need to specify the range [0, 0]
    contour(u, v, z, [0, 0], 'LineWidth', 2);
end
hold off
end

Implementation of predict

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic 
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a 
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)
m = size(X, 1); % Number of training examples
p = zeros(m, 1);
for index = 1:m
   x_i = X(index,:);
   h_theta_i = sigmoid(x_i * theta);
   if(h_theta_i<.5)
       p(index) = 0;
   else
       p(index) = 1;
   end
end
% =========================================================================
end