Deep Learning
Breakthrough results in
- Image classification
- Speech Recognition
- Machine Translation
- Multi-modal learning
Deep Neural Network
- Problem: training networks with many hidden layers doesn't work very well.
- Local minima; very slow training if the weights are initialized to zero.
- Diffusion of gradient.
- Hierarchical representations help represent complex functions.
- NLP: character -> word -> chunk -> clause -> sentence
- Image: pixel -> edge -> texton -> motif -> part -> object
- Deep Learning: Learning a hierarchy of internal representations
- Learned internal representations at the hidden layers (a trainable feature extractor)
- Feature learning
Unsupervised Pre-training
We will use greedy, layer-wise pre-training (sketched in code after this list):
- Train one layer at a time
- Fix the parameters of previous hidden layers
- Previous layers are viewed as feature extractors
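A minimal sketch of this procedure, assuming PyTorch and stacked autoencoders as the layer-wise objective; the layer sizes, random data, and loop lengths are placeholders and not part of the original notes. Each round trains one new encoder against a throwaway decoder, then freezes it and passes its activations up as inputs for the next layer.

```python
import torch
import torch.nn as nn

# Hypothetical layer sizes and unlabeled data; replace with your own.
sizes = [784, 256, 64]                      # input -> hidden 1 -> hidden 2
X = torch.rand(1000, sizes[0])              # unlabeled training examples

encoders = []
inputs = X
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc = nn.Linear(d_in, d_out)            # the one layer being trained now
    dec = nn.Linear(d_out, d_in)            # throwaway decoder, used only for reconstruction
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(50):                     # a few full-batch steps, for brevity
        recon = dec(torch.sigmoid(enc(inputs)))
        loss = nn.functional.mse_loss(recon, inputs)
        opt.zero_grad()
        loss.backward()
        opt.step()
    encoders.append(enc)
    # Fix this layer's parameters; its outputs become the next layer's inputs.
    with torch.no_grad():
        inputs = torch.sigmoid(enc(inputs))
```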
Tuning the Classifier
After pre-training the layers:
- Add output layer
- Train the whole network using supervised learning (backpropagation), as sketched below
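Continuing the hypothetical sketch above, fine-tuning could look like this: stack the pre-trained encoders, append a fresh output layer, and backpropagate a supervised loss through the whole network. The labels and class count are placeholders.

```python
# Continuing the sketch above: `encoders` holds the pre-trained hidden layers.
num_classes = 10                                     # hypothetical supervised task
y = torch.randint(0, num_classes, (X.shape[0],))     # placeholder labels

layers = []
for enc in encoders:
    layers += [enc, nn.Sigmoid()]
layers.append(nn.Linear(sizes[-1], num_classes))     # newly added output layer
model = nn.Sequential(*layers)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # all weights are now trainable
for _ in range(50):
    loss = nn.functional.cross_entropy(model(X), y)  # supervised loss, backprop end to end
    opt.zero_grad()
    loss.backward()
    opt.step()
```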
Deep neural network architectures
- Feed-forward NN
- Stacked autoencoders (multi-layer neural nets with target output = input)
- Stacked restricted Boltzmann machines
- Convolutional neural networks
- Output layer: predicts the supervised target.
- Hidden layers: learn more abstract representations as you move up the network.
- Input layer: raw sensory inputs.
A Neural Network
Training: Back Propagation of Error
- Calculate total error at the top
- Calculate contributions to error at each step going backwards
- The weights are modified as the error is propagated (a numpy sketch follows below)
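A minimal numpy sketch of these three steps for a one-hidden-layer network with sigmoid units and squared error; the data, shapes, and learning rate are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                  # toy input batch
y = rng.normal(size=(32, 1))                  # toy regression targets
W1 = rng.normal(size=(4, 8)) * 0.1            # input -> hidden weights
W2 = rng.normal(size=(8, 1)) * 0.1            # hidden -> output weights
lr = 0.1

for _ in range(100):
    # Forward pass
    h = sigmoid(X @ W1)
    out = h @ W2
    # 1. Total error at the top (gradient of 0.5 * ||out - y||^2)
    err = out - y
    # 2. Contributions to the error at each step, going backwards
    d_W2 = h.T @ err
    d_h = err @ W2.T
    d_W1 = X.T @ (d_h * h * (1 - h))          # sigmoid derivative
    # 3. Modify the weights as the error is propagated
    W2 -= lr * d_W2 / len(X)
    W1 -= lr * d_W1 / len(X)
```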
Training Deep Networks
Difficulties of supervised training of deep networks
1. Early layers of an MLP do not get trained well
- Diffusion of gradient: the error signal attenuates as it propagates back to earlier layers (illustrated in the sketch after this list)
- This leads to very slow training
- The error reaching earlier layers drops quickly because the top layers can "mostly" solve the task on their own
2. Deep networks tend to have more local-minima problems than shallow networks during supervised training
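A small numpy illustration of the gradient-diffusion point, assuming sigmoid hidden units: backpropagating through each layer multiplies the error signal by the sigmoid derivative (at most 0.25) and the transposed weights, so its norm shrinks rapidly on the way to the earlier layers. The depth, width, and weight scale here are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
depth, width = 10, 10
Ws = [rng.normal(size=(width, width)) * 0.5 for _ in range(depth)]

# Forward pass: record each layer's activations.
a, acts = rng.normal(size=width), []
for W in Ws:
    a = sigmoid(W @ a)
    acts.append(a)

# Backward pass: watch the error signal shrink toward the earlier layers.
grad = np.ones(width)                            # error at the output
for layer in range(depth - 1, -1, -1):
    a = acts[layer]
    grad = Ws[layer].T @ (grad * a * (1 - a))    # sigmoid derivative <= 0.25
    print(f"layer {layer}: |grad| = {np.linalg.norm(grad):.3e}")
```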
Training of neural networks
- Forward propagation: inputs are fed forward through the network layer by layer, each layer computing a weighted sum of its inputs and applying an activation function.
Activation Functions
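The original post does not list the functions themselves; as a sketch, here are three common choices (sigmoid, tanh, ReLU) in numpy, applied in a single feed-forward step with placeholder weights.

```python
import numpy as np

# Common activation functions.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

# One feed-forward step: a weighted sum of the inputs followed by a nonlinearity.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # placeholder input vector
W, b = rng.normal(size=(3, 4)), np.zeros(3)  # placeholder weights and biases
print(sigmoid(W @ x + b), tanh(W @ x + b), relu(W @ x + b))
```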
Autoencoder
An unlabeled set of training examples
{ x^(1), x^(2), x^(3), ... }, x^(i) ∈ R^n
Set the target values to be equal to the inputs: y^(i) = x^(i).
The network is trained to output its input (to learn the identity function):
h_{W,b}(x) ≈ x
The solution may be trivial unless the network is constrained (e.g., a hidden layer narrower than the input).
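A minimal PyTorch sketch of a single autoencoder trained so that h_{W,b}(x) ≈ x; the sizes and data are placeholders. Making the hidden layer narrower than the input is one standard way to keep the learned mapping from collapsing to a trivial identity.

```python
import torch
import torch.nn as nn

n, bottleneck = 64, 8                        # hypothetical sizes: inputs in R^n, narrow code
X = torch.rand(500, n)                       # unlabeled examples {x^(1), x^(2), ...}

autoencoder = nn.Sequential(
    nn.Linear(n, bottleneck), nn.Sigmoid(),  # encoder: compress the input
    nn.Linear(bottleneck, n),                # decoder: reconstruct the input
)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

for _ in range(200):
    loss = nn.functional.mse_loss(autoencoder(X), X)  # target output = input
    opt.zero_grad()
    loss.backward()
    opt.step()
```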