Jan 24, 2016

[ML] Neural Networks

Neural networks:

Idea:
There could be more than 2 features, i.e. x1, x2, x3, ..., xn.
We could have ϴ0, ϴ1, ..., ϴv, where 'v' is the number of
    terms in the polynomial.
 
Think about letting a computer recognize a picture of a car.

A 50 x 50 pixel image -> 2500 pixels

x = [
pixel 1 intensity   (0-255)
pixel 2 intensity   (0-255)
...
pixel 2500 intensity    (0-255)]

So, if we still want to include all the quadratic features, i.e. every
product xi * xj, there would be about 3 million features. How's that?!

Way too many features for the previous learning algorithms.
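
A quick sanity check of that figure (a rough sketch, counting every quadratic term xi * xj with i <= j):

    # Count the quadratic features of a 50 x 50 image (2500 pixels).
    n = 2500
    quadratic_features = n * (n + 1) // 2   # every product xi * xj with i <= j
    print(quadratic_features)               # 3126250, roughly 3 million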

Here's where neural networks come in...

-------
NN:
Algorithms that try to mimic the brain.
They diminished in the late 90's; however, thanks to distributed/powerful computing,
NNs have come alive again.

Auditory cortex:
Used for hearing. However, if we cut off the hearing input and reroute
the input signal from seeing, the auditory cortex will start learning to see.

Idea:
Plug any sensor into the brain, and the brain will start to learn.

---
NN model representation I:

Neuron:
input wire: Dendrite
output wire: Axon


-----
Artificial neuron design:
Neuron model: logistic unit

Input:
   x0(bias unit, always equal to 1),  x1, x2, x3 (as features)

processing:
    1 neuron

output:
    h(x)
 
Sigmoid (logistic) activation function.
i.e:
    g(z) = 1 / ( 1+e^(-z) )
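
As a minimal sketch (Python/NumPy, not from the original notes), one logistic unit looks like this:

    import numpy as np

    def g(z):
        """Sigmoid (logistic) activation function."""
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(theta, x):
        """One logistic unit: prepend the bias x0 = 1, then h(x) = g(theta . x)."""
        x = np.concatenate(([1.0], x))   # add the bias unit x0 = 1
        return g(theta @ x)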

----
Now, let's talk about a network of neurons.

input: (layer 1)
    x0(bias unit, always equal to 1),  x1, x2, x3 (as features)
 
processing: (layer 2, aka hidden layer)
    multiple neurons (also with an a0 neuron as the bias unit)

output: (layer 3)
    h(x)


---
So,
ai^(j) = "activation" of unit i in layer j
ϴ^(j) = matrix of weights controlling function mapping from
    layer j to layer j + 1


---
Vectorize the computation:

 a1^(2) = g( z1^(2) )
 where z1^(2) = ϴ10^(1) x0 + ϴ11^(1) x1 + ϴ12^(1) x2 + ϴ13^(1) x3


 
 In general, for each layer j:
     z^(j+1) = ϴ^(j) a^(j)      (with a^(1) = x)
     a^(j+1) = g( z^(j+1) )
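
A minimal sketch of this vectorized forward pass for a 3-layer network (Python/NumPy; the layer sizes and weight shapes here are illustrative assumptions, not from the notes):

    import numpy as np

    def g(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, Theta1, Theta2):
        """Forward propagation: input x -> hidden layer -> output h(x)."""
        a1 = np.concatenate(([1.0], x))      # layer 1: input plus bias x0 = 1
        z2 = Theta1 @ a1                     # z^(2) = ϴ^(1) a^(1)
        a2 = np.concatenate(([1.0], g(z2)))  # layer 2: activations plus bias a0 = 1
        z3 = Theta2 @ a2                     # z^(3) = ϴ^(2) a^(2)
        return g(z3)                         # h(x) = a^(3)

    # e.g. 3 input features and 3 hidden units:
    # Theta1 has shape (3, 4), Theta2 has shape (1, 4)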



 ----
A neural network learning its own features:

Rather than taking the sensor input from layer 1,
    i.e. x1, x2, x3,
the next layer uses a1, a2, a3 as its new input.


-----

Ok, let's unmask layer 1.
The inputs a1, a2, a3 feeding a^(i) are no longer the raw x1, x2, x3; they are
learned from the previous layer via its weights ϴ1^(i-1), ϴ2^(i-1), ϴ3^(i-1). Neat!


-----
A neural network can be composed into different architectures:
i.e. multiple layers, but still
only 1 input layer and 1 output layer; the others are hidden layers.


-----
Examples:

Non-linear classification example: XOR/XNOR



AND function:
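
The diagram for this didn't make it into the notes. As a sketch (the weights are the classic illustrative choice, assumed here), a single logistic unit with ϴ = [-30, 20, 20] computes x1 AND x2, since g(-30) ≈ 0 and g(10) ≈ 1. Reusing the neuron() helper sketched above:

    # AND as one logistic unit; only (1, 1) pushes z above 0.
    theta_and = np.array([-30.0, 20.0, 20.0])
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, round(neuron(theta_and, np.array([x1, x2]))))
    # prints: 0 0 0 / 0 1 0 / 1 0 0 / 1 1 1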



OR function:


-------
The parameters ϴ are also called 'weights'.

Negation:


---
Pipeline them together! (See the combined sketch after the XNOR section below.)

(NOT x1) AND (NOT x2)

---
x1 XNOR x2
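
Combining the gates above gives x1 XNOR x2 = (x1 AND x2) OR ((NOT x1) AND (NOT x2)). A sketch with the classic weight choices assumed (again reusing neuron() from earlier):

    # XNOR built from single logistic units pipelined into a 2-layer network.
    theta_and     = np.array([-30.0,  20.0,  20.0])   # x1 AND x2
    theta_not_not = np.array([ 10.0, -20.0, -20.0])   # (NOT x1) AND (NOT x2)
    theta_or      = np.array([-10.0,  20.0,  20.0])   # x1 OR x2

    def xnor(x1, x2):
        x  = np.array([x1, x2])
        a1 = neuron(theta_and, x)                      # hidden unit 1
        a2 = neuron(theta_not_not, x)                  # hidden unit 2
        return neuron(theta_or, np.array([a1, a2]))    # output unit

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, round(xnor(x1, x2)))
    # prints: 0 0 1 / 0 1 0 / 1 0 0 / 1 1 1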



---

NN representation of Multi-class classification:

e.g. the digits 0-9: there are 10 categories.

example:
Pedestrian
Car
Motorcycle
Truck

Here, h(x) ∈ R^4.

For Pedestrian:
[ 1
  0
  0
  0]

For car:
[ 0
  1
  0
  0]

For Motorcycle:
[ 0
  0
  1
  0]

For truck:
[ 0
  0
  0
  1]



Re-representing y ∈ {1, 2, 3, 4} as:

y^(i) is one of [1;0;0;0], ..., [0;0;0;1]
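
A small sketch of this re-representation (a hypothetical helper, not from the notes):

    import numpy as np

    def one_hot(y, num_classes=4):
        """Re-represent a class label y in {1, ..., num_classes} as a unit vector."""
        v = np.zeros(num_classes)
        v[y - 1] = 1.0
        return v

    print(one_hot(1))   # [1. 0. 0. 0.]  -> Pedestrian
    print(one_hot(3))   # [0. 0. 1. 0.]  -> Motorcycle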

 

Non-linear hypotheses.
