Why activation functions?
They introduce non-linearities into the network. Without them, a stack of linear layers collapses into a single linear transformation.
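As a minimal sketch (plain Python, function names my own), two common activations:

```python
import math

def relu(x):
    # ReLU: keeps positive inputs, zeroes out negatives
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-2.0), relu(3.0))   # negative inputs are clipped to 0
print(sigmoid(0.0))            # 0.5 at the midpoint
```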
The loss of a network measures the cost incurred by incorrect predictions.
Cross entropy loss can be used with models that output a probability between 0 and 1.
- the log of a probability is negative, so we apply a minus sign to flip the loss to a positive value.
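A sketch of binary cross entropy showing why the minus sign is needed (helper name is my own):

```python
import math

def binary_cross_entropy(y_true, p):
    # log(p) <= 0 for p in (0, 1], so the leading minus sign
    # makes the loss a positive quantity
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# a confident correct prediction costs little;
# a confident wrong prediction costs a lot
print(binary_cross_entropy(1, 0.9))
print(binary_cross_entropy(1, 0.1))
```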
Mean squared error loss can be used with regression models that output continuous real numbers.
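And a corresponding sketch of mean squared error for regression targets:

```python
def mse(y_true, y_pred):
    # mean of squared differences between targets and predictions
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit -> 0.0
print(mse([0.0, 0.0], [1.0, 3.0]))            # (1 + 9) / 2 = 5.0
```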
Training the network
Focus on Loss Optimization
We want to find the network weights that achieve the lowest loss.
This is where gradient descent comes in (with gradients computed via backpropagation).
Thus, by looking at the derivative (i.e. m, the slope of the loss with respect to a weight), we can check whether m > 0 or m < 0. If m > 0, we lower the weight; if m < 0, we increase it.
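That sign rule reduces to a single update, w ← w − lr·m; a sketch (names my own):

```python
def gd_step(w, m, lr=0.5):
    # if m > 0 the loss rises as w grows, so subtracting lr*m lowers w;
    # if m < 0 the same subtraction increases w
    return w - lr * m

print(gd_step(1.0, 2.0))   # positive slope -> weight decreases
print(gd_step(1.0, -2.0))  # negative slope -> weight increases
```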
So how big a step should we take? (learning rate)
Set it too small and training is slow; set it too large and training becomes unstable and can diverge.
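The trade-off shows up even when minimizing a toy function f(w) = w², whose gradient is 2w (a sketch, names my own):

```python
def minimize(lr, steps=20, w=1.0):
    # repeated gradient descent steps on f(w) = w**2 (gradient 2*w)
    for _ in range(steps):
        w -= lr * (2 * w)
    return w

print(minimize(0.01))  # too small: still far from the minimum at 0
print(minimize(0.5))   # well chosen: reaches 0
print(minimize(1.1))   # too large: overshoots and diverges
```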
Methods: