Logistic Regression to predict student admission

Problem statement

Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant’s scores on two exams and the admissions decision. Your task is to build a classification model that estimates an applicant’s probability of admission based on the scores from those two exams.

Load Data

The file applicant_data.txt contains the dataset for our logistic regression problem. The first two columns contain the exam scores and the third column contains a label that indicates the admission decision. The dataset is loaded from the data file into the variables X and y:
data = load('applicant_data.txt');
X = data(:, [1, 2]); y = data(:, 3);

Plot Data

Before starting to implement any learning algorithm, it is always good to visualize the data if possible. We start the exercise by first plotting the data to understand the problem we are working with.
% Implementation of plotData is at the end of the document
plotData(X, y);
% Put some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score');
ylabel('Exam 2 score');
% Specified in plot order
legend('Admitted', 'Not admitted');
title('Scatter plot of training data');
hold off;

Sigmoid function

The logistic regression hypothesis is defined as:
$$h_\theta(x) = g(\theta^T x)$$
where the function g is the sigmoid function. The sigmoid function is defined as:
$$g(z) = \frac{1}{1 + e^{-z}}$$
For large positive values of z, the sigmoid should be close to 1, while for large negative values it should be close to 0. For z = 0 it should be exactly 0.5. The implementation of the sigmoid function is given at the end of the document.
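The limiting behaviour described above can be checked with a few sample inputs. This is a minimal sketch that uses an inline anonymous function g (an illustrative stand-in, not the sigmoid function file from the end of the document) and arbitrarily chosen test values:

```matlab
% Inline stand-in for the sigmoid; ./ and exp work element-wise.
g = @(z) 1 ./ (1 + exp(-z));

at_zero        = g(0);           % exactly 0.5, since exp(0) = 1
large_positive = g(10);          % close to 1
large_negative = g(-10);         % close to 0
on_a_matrix    = g([-10 0; 0 10]); % applied to every element
```

Because the operators are element-wise, the same expression accepts scalars, vectors, and matrices.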
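
Cost function and gradient

The cost function in logistic regression is
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
and the gradient of the cost is a vector of the same length as θ where the j-th element (for j = 0, 1, ..., n) is defined as follows:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
Let's compute the initial cost and gradient: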
% Setup the data matrix appropriately, and add ones for the intercept term
[m, n] = size(X);
% Add the intercept term (a column of ones) to X
X = [ones(m, 1) X];
% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);
% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);
%Implementation of costFunction that computes cost and gradient is at the end of the document.
fprintf('Cost at initial theta (zeros): %f\n', cost);
Cost at initial theta (zeros): 0.693147
fprintf('Gradient at initial theta (zeros): \n');
Gradient at initial theta (zeros):
fprintf(' %f \n', grad);
-0.100000
-12.009217
-11.262842
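The initial cost of 0.693147 can be sanity-checked by hand. With θ set to all zeros, the hypothesis is 0.5 for every example, so each term of the cost equals -log(0.5) = log(2) whatever the label is. A minimal sketch, using a small made-up label vector rather than the real dataset:

```matlab
% Hypothetical labels; the result does not depend on their values.
y = [1; 0; 1; 1; 0];

% With theta = 0, sigmoid(0) = 0.5 for every training example.
h = 0.5 * ones(size(y));

% Average cross-entropy cost, as in the logistic regression cost function.
J = mean(-y .* log(h) - (1 - y) .* log(1 - h));

fprintf('Cost with all-zero theta: %f (log(2) = %f)\n', J, log(2));
```

A value other than log(2) at this point would suggest a bug in the cost computation.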

Learning parameters using a built-in function

Octave/MATLAB’s fminunc is an optimization solver that finds the minimum of an unconstrained function. For logistic regression, we want to minimize the cost function J(θ) with respect to the parameters θ.
Concretely, we are going to use fminunc to find the best parameters θ for the logistic regression cost function, given a fixed dataset (of X and y values). We will pass the following inputs to fminunc:
- The initial values of the parameters we are trying to optimize.
- A function that, when given the training set and a particular θ, computes the logistic regression cost and gradient with respect to θ for the dataset (X, y).
We have already implemented everything needed to use the built-in function, so let's use it:
% Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);
Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the optimality tolerance.

<stopping criteria details>
% Print theta to screen
fprintf('Cost at theta found by fminunc: %f\n', cost);
Cost at theta found by fminunc: 0.203498
fprintf('theta: \n');
theta:
fprintf(' %f \n', theta);
-25.161343
0.206232
0.201472
In this code snippet, we first defined the options to be used with fminunc. Specifically, we set the GradObj option to on, which tells fminunc that our function returns both the cost and the gradient. This allows fminunc to use the gradient when minimizing the function. Furthermore, we set the MaxIter option to 400, so that fminunc will run for at most 400 iterations before it terminates. To specify the actual function we are minimizing, we use a shorthand for specifying functions: @(t)(costFunction(t, X, y)). This creates an anonymous function, with argument t, which calls costFunction with the fixed X and y. This allows us to wrap costFunction for use with fminunc.
Given costFunction, fminunc converges on the optimal parameters and returns the final values of the cost and θ. By using fminunc, we did not have to write any loops ourselves or set a learning rate as gradient descent would require. This is all done by fminunc: we only needed to provide a function that calculates the cost and the gradient.
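The wrapping step is easier to see in isolation. The sketch below uses a hypothetical three-argument function, squaredError, as a stand-in for costFunction, with made-up values of A and b playing the role of X and y. The anonymous function captures A and b when it is created, leaving a function of t alone:

```matlab
% Hypothetical three-argument function (stand-in for costFunction).
squaredError = @(t, A, b) sum((A * t - b) .^ 2);

% Made-up fixed data (playing the role of X and y).
A = [1 0; 0 1];
b = [2; 3];

% Wrap it: A and b are captured now, so only t remains as an argument.
wrapped = @(t) squaredError(t, A, b);

value_at_zero     = wrapped([0; 0]);  % 2^2 + 3^2
value_at_solution = wrapped([2; 3]);  % A * t equals b here
```

An optimizer that expects a one-argument function can call wrapped directly, which is the role @(t)(costFunction(t, X, y)) plays in the fminunc call.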

Plot decision boundary

Using the final value of θ, let's plot the decision boundary on the training data.
% Plot Boundary
plotDecisionBoundary(theta, X, y);
% Implementation is given at the end of document
% Put some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score');
ylabel('Exam 2 score');
% Specified in plot order
legend('Admitted', 'Not admitted', 'Decision Boundary');
hold off;
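The boundary drawn here is the set of points where the predicted probability is exactly 0.5, that is, where θ(1) + θ(2)·x1 + θ(3)·x2 = 0. Solving for x2 gives the line that plotDecisionBoundary plots. A minimal sketch, assuming the rounded θ values printed by fminunc above and an arbitrarily chosen Exam 1 score:

```matlab
% Parameters as printed in the fminunc output (rounded to six decimals).
theta = [-25.161343; 0.206232; 0.201472];

% Inline stand-in for the sigmoid function.
g = @(z) 1 ./ (1 + exp(-z));

exam1 = 45;  % hypothetical Exam 1 score

% Solve theta(1) + theta(2)*x1 + theta(3)*x2 = 0 for x2.
exam2_on_boundary = -(theta(1) + theta(2) * exam1) / theta(3);

% A point on the boundary should get probability 0.5 (up to rounding).
prob_on_boundary = g([1, exam1, exam2_on_boundary] * theta);

fprintf('Exam 2 score on the boundary: %f\n', exam2_on_boundary);
fprintf('Predicted probability there:  %f\n', prob_on_boundary);
```

Applicants whose Exam 2 score lies above this value, for the same Exam 1 score, are predicted as admitted.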

Evaluating logistic regression

After learning the parameters, we can use the model to predict whether a particular student will be admitted.
prob = sigmoid([1 45 85] * theta);
fprintf('For a student with scores 45 and 85, we predict an admission probability of %f\n\n', prob);
For a student with scores 45 and 85, we predict an admission probability of 0.776291
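The same prediction can be broken into its two steps: a linear score z = θᵀx followed by the sigmoid. This sketch assumes the rounded θ values printed by fminunc above, so the result may differ from the printed probability in the last digits:

```matlab
% Parameters as printed in the fminunc output (rounded to six decimals).
theta = [-25.161343; 0.206232; 0.201472];

% Feature vector: intercept term, Exam 1 score, Exam 2 score.
x = [1 45 85];

z = x * theta;              % linear score; positive means probability > 0.5
prob = 1 / (1 + exp(-z));   % sigmoid of the score

fprintf('z = %f, probability = %f\n', z, prob);
```

The sign of z already determines the predicted class; the sigmoid only converts the score into a probability.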
Another way to evaluate the quality of the parameters we have found is to see how well the learned model predicts on our training set.
% Compute accuracy on our training set
p = predict(theta, X);
%Implementation is given at the end of document
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
Train Accuracy: 89.000000
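The thresholding and accuracy computation can also be written without a loop. The following is a sketch on a tiny made-up dataset (the values of theta, X, and y below are hypothetical and chosen only to make the arithmetic easy to follow):

```matlab
% Hypothetical parameters and data; the first column of X is the intercept term.
theta = [-1; 1];
X = [1 0; 1 2; 1 3];
y = [0; 1; 0];

% Probabilities for all examples at once, then threshold at 0.5.
h = 1 ./ (1 + exp(-X * theta));
p = h >= 0.5;

% Percentage of examples whose predicted label matches the true label.
accuracy = mean(double(p == y)) * 100;
fprintf('Accuracy: %f\n', accuracy);
```

Here two of the three predictions match the labels. Note that accuracy measured on the training set tends to be optimistic compared with accuracy on unseen data.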

Implementation of functions

Implementation of plotData
function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
% PLOTDATA(X, y) plots the data points with + for the positive examples
% and o for the negative examples. X is assumed to be an Mx2 matrix.
% Create New Figure
figure; hold on;
pos = find(y==1); neg = find(y == 0);
% Plot Examples
plot(X(pos, 1), X(pos, 2), 'k+','LineWidth', 2,'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y','MarkerSize', 7);
% =========================================================================
hold off;
end

Implementation of sigmoid function
The code works with scalars, vectors, and matrices. For a matrix, the function applies the sigmoid to every element.
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
% g = SIGMOID(z) computes the sigmoid of z.
% Element-wise operators (./ and exp) apply the sigmoid to every entry of z.
g = 1 ./ (1 + exp(-z));
end

Implementation of costFunction
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
% J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
% parameter for logistic regression and the gradient of the cost
% w.r.t. the parameters.
% Initialize some useful values
m = length(y); % number of training examples

% Cost: average cross-entropy over the training examples
J = 0;
for index = 1:m
    y_i = y(index);
    x_i = X(index, :);
    h_theta_i = sigmoid(x_i * theta);
    J = J + (-y_i * log(h_theta_i) - (1 - y_i) * log(1 - h_theta_i));
end
J = J / m;

% Gradient: one partial derivative per parameter
grad = zeros(size(theta));
for j = 1:numel(theta) % size(theta) would return a vector, not a count
    total = 0; % named total to avoid shadowing the built-in sum
    for i = 1:m
        predicted = sigmoid(X(i, :) * theta);
        total = total + (predicted - y(i)) * X(i, j);
    end
    grad(j) = total / m;
end
% Note: grad has the same dimensions as theta
end
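The loops in costFunction follow the formulas term by term. The same quantities can also be computed with matrix operations, which is usually shorter and faster in Octave/MATLAB. The sketch below is an alternative formulation, not the implementation used above, and runs on a small made-up dataset (theta, X, and y are hypothetical):

```matlab
% Hypothetical data; the first column of X is the intercept term.
X = [1 1; 1 2; 1 3];
y = [0; 0; 1];
theta = [0.1; -0.2];
m = numel(y);

% Hypothesis for all examples at once.
h = 1 ./ (1 + exp(-X * theta));

% Cost: the inner products sum the per-example terms.
J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m;

% Gradient: X' * (h - y) stacks the sums over i for every parameter j.
grad = X' * (h - y) / m;
```

Both versions should agree to within floating-point rounding, so the vectorized form can be used to cross-check the loop version.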

Implementation of plotDecisionBoundary
function plotDecisionBoundary(theta, X, y)
%PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with
%the decision boundary defined by theta
% PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the
% positive examples and o for the negative examples. X is assumed to be
% either
% 1) Mx3 matrix, where the first column is an all-ones column for the
% intercept.
% 2) MxN, N>3 matrix, where the first column is all-ones
% Plot Data
plotData(X(:,2:3), y);
hold on
if size(X, 2) <= 3
% Only need 2 points to define a line, so choose two endpoints
plot_x = [min(X(:,2))-2, max(X(:,2))+2];
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));
% Plot, and adjust axes for better viewing
plot(plot_x, plot_y);
% Legend, specific for the exercise
legend('Admitted', 'Not admitted', 'Decision Boundary');
axis([30, 100, 30, 100]);
else
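% Note: mapFeature is not defined in this document. This branch is only
% reached when X has more than three columns, which is not the case here.
% Here is the grid range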
u = linspace(-1, 1.5, 50);
v = linspace(-1, 1.5, 50);
z = zeros(length(u), length(v));
% Evaluate z = theta*x over the grid
for i = 1:length(u)
for j = 1:length(v)
z(i,j) = mapFeature(u(i), v(j))*theta;
end
end
z = z'; % important to transpose z before calling contour
% Plot z = 0
% Notice you need to specify the range [0, 0]
contour(u, v, z, [0, 0], 'LineWidth', 2);
end
hold off
end

Implementation of predict
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
% p = PREDICT(theta, X) computes the predictions for X using a
% threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)
m = size(X, 1); % Number of training examples
p = zeros(m, 1);
for index = 1:m
x_i = X(index,:);
h_theta_i = sigmoid(x_i * theta);
if(h_theta_i<.5)
p(index) = 0;
else
p(index) = 1;
end
end
% =========================================================================
end