Overview

Before diving into the implementation, I would like to introduce the terminology we are going to follow for the network.

It is a simpler version of a typical neural network, with two hidden layers of 3 neurons each and an output layer with one neuron (the bias vectors are not shown in the diagram for the sake of simplicity).

Here, $\bold I$ is the input layer, $\bold H_1$ and $\bold H_2$ are the hidden layers, and the last layer is the output layer, which we can call $\bold O$.

$\bold X$ is the input matrix, which contains all of the input vectors (dimensions will be discussed later), $\bold{\hat Y}$ is the predicted vector, and $\bold Y$ is the true vector.

We have a loss function $\bold{L(\hat Y, Y)}$, which is simply the log loss in this case, since we are using a single neuron for binary classification.
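For a single example with true label $y \in \{0, 1\}$ and predicted probability $\hat y$, the log loss can be written as:

$$
L(\hat y, y) = -\big(y \log \hat y + (1 - y)\log(1 - \hat y)\big)
$$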

Even if the output layer has more than one neuron, we can use cross-entropy as our loss function; cross-entropy for two classes is exactly equivalent to the log loss.
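To see why, treat the two class probabilities as $(y, 1 - y)$ for the true distribution and $(\hat y, 1 - \hat y)$ for the prediction; the cross-entropy then expands to

$$
-\sum_{c=1}^{2} p_c \log q_c = -\big(y \log \hat y + (1 - y)\log(1 - \hat y)\big),
$$

which is exactly the log loss above.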

Lastly, the connections between consecutive layers are associated with weights, which we will learn using backpropagation. Instead of representing them as individual weights, we will compose them into one bigger matrix per layer, just to make the computation easier during forward and backward propagation.
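As a minimal sketch of this idea (the variable names, the input size of 4, and the column-vector convention are assumptions for illustration only; the exact dimensions are covered in the next chapter), the weights between each pair of consecutive layers can be stored in a single matrix:

```python
import numpy as np

# Hypothetical sketch: one weight matrix (and bias vector) per pair of consecutive layers.
# The network described in the text has two hidden layers of 3 neurons each and one output neuron.
# The input size of 4 and the (out, in) matrix orientation are assumptions, not the author's fixed choice.
n_input, n_h1, n_h2, n_out = 4, 3, 3, 1

rng = np.random.default_rng(0)

W1 = rng.standard_normal((n_h1, n_input))  # connections I  -> H1
b1 = np.zeros((n_h1, 1))                   # bias for H1
W2 = rng.standard_normal((n_h2, n_h1))     # connections H1 -> H2
b2 = np.zeros((n_h2, 1))                   # bias for H2
W3 = rng.standard_normal((n_out, n_h2))    # connections H2 -> O
b3 = np.zeros((n_out, 1))                  # bias for O
```

Grouping the weights this way lets forward and backward propagation be written as a few matrix multiplications instead of loops over individual connections.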

In the next chapter, we will make sense of the dimensions of the input matrix, the weight matrices, and the output. Once you understand how these are represented, we can easily finish the forward propagation.
