Agenda:
- Why Not Feedforward Networks?
- What Is A Recurrent Neural Network?
- Issues With Recurrent Neural Networks
- Vanishing And Exploding Gradients
- How To Overcome These Challenges?
- Long Short Term Memory Units
- LSTM Use-Case
Why Not Feedforward Networks?
A trained feedforward network can be exposed to any random collection of photographs, and the first photograph it is exposed to will not necessarily alter how it classifies the second.
Seeing a photograph of a dog will not lead the net to perceive an elephant next.
How To Overcome This Challenge?
What Is A Recurrent Neural Network?
Recurrent neural networks are a type of artificial neural network designed to recognize patterns in sequences of data, such as text, genomes, handwriting, the spoken word, or numerical time series data emanating from sensors, stock markets and government agencies.
Example:
Suppose your gym trainer has made a schedule for you.
➝ The exercises repeat every third day.
First Day ⟶ Shoulder Exercises
Second Day ⟶ Biceps Exercises
Third Day ⟶ Cardio Exercises
Predicting the type of exercise:
Using a feedforward net, each day's prediction is made independently, so the network has no way of knowing which exercise came the day before. Using a recurrent net, today's prediction can also use the information carried over from the previous day, so the three-day cycle can be learned. A minimal sketch of such a recurrent step follows.
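Here is a minimal NumPy sketch of a single recurrent step; the weight names (W_xh, W_hh, b_h), the sizes, and the random initialization are illustrative, not taken from the slides. The hidden state h is what carries information about previous days forward, which is exactly what a feedforward net lacks.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The new hidden state depends on today's input AND yesterday's hidden state.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Illustrative sizes: 3 exercise types, hidden state of size 4 (hypothetical).
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)                           # no memory before day 1
for day in [0, 1, 2]:                     # shoulder=0, biceps=1, cardio=2
    x = np.eye(3)[day]                    # one-hot encoding of the exercise
    h = rnn_step(x, h, W_xh, W_hh, b_h)   # h now "remembers" the sequence so far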
Training A Recurrent Network
Recurrent neural nets use the backpropagation algorithm, but it is applied at every time stamp. This is commonly known as Backpropagation Through Time (BPTT).
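Schematically (a standard way of writing BPTT, not copied from the slides), the network is unrolled over T time stamps and the gradient of the total loss L with respect to a shared weight W is a sum over time stamps:

\frac{\partial L}{\partial W} = \sum_{t=1}^{T} \sum_{k=1}^{t} \frac{\partial L_t}{\partial h_t} \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right) \frac{\partial h_k}{\partial W}

The product of Jacobians \partial h_j / \partial h_{j-1} is what causes the two problems described next.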
Let's look at the issues with backpropagation through time (illustrated below):
- Vanishing gradient
- Exploding gradient
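The root cause is the repeated product above. As a rough numeric illustration (the factors 0.9 and 1.1 and the 50 time stamps are made up for illustration), if each factor in the product is slightly below or slightly above 1:

0.9^{50} \approx 0.005 \quad (\text{vanishing}) \qquad 1.1^{50} \approx 117 \quad (\text{exploding})

so gradients from far-away time stamps either fade to almost nothing or blow up.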
How To Overcome These Challenges?
Exploding gradients:
- Truncated BPTT: instead of backpropagating from the last time stamp all the way back to the start, we backpropagate through a smaller window, e.g. 10 time stamps (we lose the temporal context beyond those 10 time stamps).
- Clip gradients at a threshold: clip the gradient whenever its norm goes above a chosen threshold (see the sketch after this list).
- RMSprop: adjust the learning rate for each parameter automatically.
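A minimal NumPy sketch of the "clip at a threshold" idea (the threshold of 5.0 is an arbitrary illustration, not a value from the slides):

import numpy as np

def clip_by_norm(grad, threshold=5.0):
    # Rescale the gradient so its norm never exceeds the threshold.
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

exploding = np.array([80.0, -60.0])   # norm = 100, far above the threshold
print(clip_by_norm(exploding))        # direction kept, norm reduced to 5.0

Deep learning frameworks ship the same idea as built-in gradient-clipping utilities, so in practice you rarely write this by hand.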
Vanishing gradients:
- ReLU activation function: we can use activation functions like ReLU, whose gradient is 1 for positive inputs, so repeated multiplication does not shrink the gradient (see the sketch after this list).
- RMSprop: adapt the learning rate per parameter, so that weights with consistently small gradients still receive useful updates.
- LSTMs, GRUs: network architectures that have been specially designed to combat this problem.
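A small sketch of the point behind the ReLU remedy (illustrative only): the derivative of the sigmoid never exceeds 0.25, so multiplying it across many time stamps shrinks the gradient, while the derivative of ReLU is exactly 1 for positive inputs.

import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)              # maximum value is 0.25, at x = 0

def relu_grad(x):
    return float(x > 0)               # 1 for positive inputs, 0 otherwise

print(sigmoid_grad(0.0) ** 50)        # ~7.9e-31: repeated products vanish
print(relu_grad(3.0) ** 50)           # 1.0: repeated products do not shrink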
Long Short Term Memory Networks
Long Short Term Memory networks - usually just called "LSTMs" - are a special kind of RNN.
They are capable of learning long-term dependencies.
The repeating module in a standard RNN contains a single layer, whereas the repeating module in an LSTM contains four interacting layers.
Step - 1 :
The first step in the LSTM is to identify the information that is not required and will be thrown away from the cell state. This decision is made by a sigmoid layer called the forget gate layer.
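In the standard LSTM notation (the usual way these equations are written, not copied from the slides), the forget gate looks at h_{t-1} and the current input x_t and outputs a number between 0 and 1 for every entry of the old cell state C_{t-1}:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

A 1 means "keep this completely" and a 0 means "get rid of this completely".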
Step - 2 :
The next step is to decide what new information we are going to store in the cell state. This has two parts: a sigmoid layer called the "input gate layer" decides which values will be updated, and a tanh layer creates a vector of new candidate values that could be added to the state.
In the next step, we will combine these two to update the state.
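In the same notation, the two parts of this step are:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)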
Step - 3 :
Now we will update the old cell state, Ct-1, to the new cell state Ct. First, we multiply the old state (Ct-1) by ft, forgetting the things we decided to forget earlier. Then, we add it * C̃t; these are the new candidate values, scaled by how much we decided to update each state value.
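In the same notation, the update is:

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t

where * denotes element-wise multiplication.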
Step - 4 :
We will run a sigmoid layer which decides what parts of the cell state we are going to output. Then, we put the cell state through tanh (push the values to be between -1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
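In the same notation:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t * \tanh(C_t)

and h_t is both the output at this time stamp and the hidden state passed to the next one.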
Long Short Term Memory Networks Use-Case
We will feed the LSTM with correct sequences from the text, 3 symbols as inputs and 1 labeled symbol (the next symbol) as the target; eventually the neural network will learn to predict the next symbol correctly.
A unique integer value is assigned to each symbol, because LSTM inputs can only understand real numbers. A minimal sketch of this encoding is shown below.
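A minimal Python sketch of the encoding step, assuming a short hypothetical training text (the sentence, the window size of 3, and the dictionary names are illustrative, not from the original use-case):

text = "the gym trainer made a schedule and the exercises repeat every third day"
symbols = text.split()

# Assign a unique integer to each symbol; keep the reverse mapping to decode predictions.
dictionary = {word: i for i, word in enumerate(sorted(set(symbols)))}
reverse_dictionary = {i: word for word, i in dictionary.items()}

# Inputs are windows of 3 integer-encoded symbols; the label is the code of the next symbol.
window = 3
inputs = [[dictionary[w] for w in symbols[i:i + window]] for i in range(len(symbols) - window)]
labels = [dictionary[symbols[i + window]] for i in range(len(symbols) - window)]

print(inputs[0], "->", labels[0])   # codes for "the gym trainer" -> code for "made"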