Jan 5, 2026

[6.S191][note] Introduction to Deep Learning


Why activation functions?
They introduce non-linearities into the network.


The loss of a network measures the cost incurred from incorrect predictions.

The empirical loss measures the total loss over the entire dataset.


Cross-entropy loss can be used with models that output a probability between 0 and 1.
f() here is the activation function; it returns a value in [0, 1].
- The log of a probability in (0, 1) is negative, so we apply a minus sign to flip the loss to a positive value.
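The sign-flip point above can be sketched in a few lines (a minimal binary cross-entropy, not the course's implementation):

```python
import math

# Binary cross-entropy for a single prediction (a minimal sketch).
# y is the true label (0 or 1); p is the model's predicted probability.
# log(p) is negative for p in (0, 1), so the leading minus sign
# flips the loss to a positive value.
def binary_cross_entropy(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confident correct prediction gives a small loss;
# a confident wrong prediction gives a large one.
low = binary_cross_entropy(1, 0.9)
high = binary_cross_entropy(1, 0.1)
```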


Mean squared error loss can be used with regression models that output continuous real numbers.
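A minimal sketch of mean squared error over a small dataset (illustrative only; the function name and sample values are made up):

```python
# Mean squared error: average of the squared differences
# between predictions and targets.
def mse(y_true, y_pred):
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

loss = mse([3.0, -0.5, 2.0], [2.5, 0.0, 2.0])
# ((0.5)**2 + (0.5)**2 + 0**2) / 3
```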


Training the network

Focus on Loss Optimization
We want to find the network weights that achieve the lowest loss.



Here's where gradient descent comes in (the gradients are computed via backpropagation).

The gradient is computed with respect to every weight.
Using the derivative (i.e. the slope m), we check whether m is > 0 or < 0. If m > 0, we lower the weight; if m < 0, we increase the weight.
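The sign logic above amounts to subtracting the gradient (a minimal sketch of one update step; the function name is ours):

```python
# One gradient-descent step on a single weight.
# If the gradient (slope) is > 0, the loss rises as w rises, so we lower w;
# if it is < 0, we raise w. Subtracting the gradient handles both cases.
def step(w, grad, lr=0.1):
    return w - lr * grad

w_down = step(2.0, 3.0)   # positive slope -> weight decreases
w_up = step(2.0, -3.0)    # negative slope -> weight increases
```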


So how big a step should we take? (the learning rate)
Set too small: convergence is slow; set too large: training becomes unstable.
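The small/large trade-off can be seen on the toy loss loss(w) = w**2, whose gradient is 2*w and whose minimum is at w = 0 (an illustrative sketch, not from the lecture):

```python
# Run gradient descent on loss(w) = w**2 (gradient: 2*w) from w = 5.0.
def descend(w, lr, steps=20):
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

w_small = descend(5.0, 0.001)  # too small: barely moves toward 0
w_good = descend(5.0, 0.1)     # reasonable: converges toward 0
w_big = descend(5.0, 1.5)      # too large: overshoots, |w| blows up
```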


Methods:







