How did I learn Machine Learning : part 3 - Implement a simple neural network from scratch II

in #technology5 years ago (edited)

Hi,

In my previous post, I have discussed how does a simple neural network work and what are the notations we are going to use in this series of blogs. Today I am going to present to you how do we implement a simple neural network from scratch. I expect that you have a basic understanding of calculus as we are going to work with some mathematical equations. If you don't have such knowledge, please keep on reading and trying out the examples. I guarantee that you will become an expert in machine learning after the end of this series of blogs. I am trying to explain all the questions as much as simple. Also, I am not going to use any programming frameworks (such as TensorFlow, Keras, etc.) because I need to explain to you how to implement the machine learning stuff without using those programming frameworks. Then you can deeply understand how these things are working. If you do not understand any notation, derivation or expression please refer to the previous blog or put some comment. Let's start ...

In this neural network, we are trying to recognize cats vs. non-cats. For that, we are given a training dataset of 209 images and a test dataset of 50 images. The size of one image is 64*64 pixels, and since those images are colored images there are three channels. We are feeding images one by one to the neural network. There are several ways of feeding an image to a neural network. In this example, a flattened image is fed to the neural network and as a result of that, the number of neurons in the input layer equals 12288 (64pixels * 64pixels * 3channels).


In this network, we don't have any hidden layer and the output layer is the first and only layer. In the output layer, we expect a value between 0 and 1. If that value is greater than 0.5, it is considered as a cat. Therefore we have only one neuron in the output layer. Please refer to the attached image to get an idea about the network architecture.

When we consider the implementation of the neural network, we need to write equations for the forward propagation, cost, backward propagation and updating weights and bias. Forward propagation equations are mentioned below.

eqn (1) : z1(i) = w1x1(i) + w2x2(i) + ... + w12288x12288(i) + b
eqn (2) : (i) = a1(i) = g(z1(i))

Now we should calculate the cost per data(L) and cost for the whole dataset(J).

eqn (3) : L(a1(i), y(i))(i) = -y(i)log(a1(i)) - (1 - y(i))log(1 - a1(i))
eqn (4) : J = (Σi=1mL(a1(i), y(i))(i))/m

Now we should calculate the derivative for each weight and bias(∂J/∂w1, ∂J/∂w2, ∂J/∂b2, ... ) such that the cost should be reduced in each iteration. Let's try to calculate the ∂J/∂w1 and I will write ∂J/∂w1 as ∂w1. By the chain rule we can calculate the ∂w1 as follows.

eqn (5.1) : ∂w1 = ∂J/∂w1 = (Σi=1m∂L(i)/∂w1)/m
eqn (5.2) : ∂L(i)/∂w1 = ∂L(i)/∂a1(i)*∂a1(i)/∂w1
eqn (5.3) : ∂a1(i)/∂w1 = ∂a(i)/∂z1(i)*∂z1(i)/∂w1

Using eqn (5.2) and eqn (5.3),
eqn (5.4) : ∂L(i)/∂w1 = ∂L(i)/∂a1(i)*∂a(i)/∂z1(i)*∂z1(i)/∂w1

By getting the derivative of the eqn (3) w.r.t. ∂a1(i),
eqn (5.5) : ∂L(i)/∂a1(i) = - y(i)/a1(i) + (1-y(i))/(1-a1(i))

Let's assume activation function g() is sigmoid function. Therefore ∂a(i)/∂z1(i) equals the derivative of sigmoid function.
eqn (5.6) : ∂a(i)/∂z1(i) = a1(i)(1 - a1(i))

By getting the derivative of the eqn (1) w.r.t. dw1,
eqn (5.7) : ∂z1(i)/∂w1 = x1(i)

By substituting eqn (5.5), eqn (5.6) and eqn (5.7) into eqn (5.4) and simplifying the expression,
eqn (5.8) : ∂L(i)/∂w1 = (a1(i) - y(i))x1(i)

By substituting eqn (5.8) into eqn (5.1),
eqn (6) : ∂w1 = (Σi=1m(a1(i) - y(i))x1(i))/m

Following the eqn (6), we can write derivate for all weights.

∂w2 = (Σi=1m(a1(i) - y(i))x2(i))/m ...
∂w12288 = (Σi=1m(a1(i) - y(i))x12288(i))/m

Since ∂z1(i)/∂b = 1,
eqn (7) : ∂b = ∂J/∂b = (Σi=1m(a1(i) - y(i)))/m

Now we have to update the weights and bias as shown below.

eqn (8) : w1 = w1 - α*∂w1 ...
w12288 = w12288 - α*∂w12288
eqn (9) : b = b - α*∂b

Finally, we have derived all the questions we need to implement our neural network. In the next blog post, I will explain to you how to implement our neural network using theses equations. Please feel free to raise any concerns/suggestions on this blog post. Let's meet in the next post.

Sort:  

Congratulations @boostyslf! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s) :

You published more than 10 posts. Your next target is to reach 20 posts.
You got more than 10 replies. Your next target is to reach 50 replies.

You can view your badges on your Steem Board and compare to others on the Steem Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

To support your work, I also upvoted your post!

Vote for @Steemitboard as a witness to get one more award and increased upvotes!