In this blog, we will learn about Recurrent Neural Networks (RNNs) and study the most important concepts related to them. Along with the theory, we will use images for a better representation and understanding of Recurrent Neural Networks.
Introduction to Recurrent Neural Networks
Generally, a recurrent neural network is an advanced type of artificial neural network that involves directed cycles in memory. This gives the network the ability to build on earlier types of networks, which work only with fixed-size input vectors and output vectors.
Understanding the Recurrent Neural Networks
Let's say we have a task: to predict the next word in a sentence. To accomplish it, we will first try to use a multilayer perceptron (MLP). An MLP has three kinds of layers: an input layer, a hidden layer, and an output layer. The input layer receives the input, the hidden layer applies its activations, and we finally receive the output.
With more hidden layers, these activations are passed on to the next hidden layer, and the successive activations together produce the output. Each hidden layer is characterized by its own weights and biases.
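As a minimal sketch of this setup (the layer sizes and variable names here are illustrative assumptions, not taken from the post), a forward pass through a single-hidden-layer MLP for next-word prediction could look like this:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One forward pass of a simple MLP: input -> hidden -> output."""
    h = np.tanh(W1 @ x + b1)            # hidden layer with its own weights and bias
    logits = W2 @ h + b2                # output layer with its own weights and bias
    e = np.exp(logits - logits.max())
    return e / e.sum()                  # softmax: probabilities for the next word

rng = np.random.default_rng(0)
vocab, hidden = 10, 8                   # toy sizes
x = rng.normal(size=vocab)              # encoding of the current word
W1, b1 = rng.normal(size=(hidden, vocab)), np.zeros(hidden)
W2, b2 = rng.normal(size=(vocab, hidden)), np.zeros(vocab)
print(mlp_forward(x, W1, b1, W2, b2))
```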
The hidden layers behave independently, since each has its own weights and activations. But our main objective is to identify the relationship between successive inputs. So, can we supply the inputs directly to the hidden layers? Yes, we can!
Here, the hidden layers are all different, because their weights and biases are different. Since each layer is independent, we can't simply combine them; to combine the hidden layers, they need the same weights and biases.
Once all the layers share the same weights and biases, they can be combined together, and we can roll all these hidden layers into a single recurrent layer.
So it is like supplying the input to the same hidden layer at every step. At all time steps, the weights of the recurrent neuron are the same, since it is a single neuron now. A recurrent neuron therefore stores the state of the previous input and combines it with the current input, thereby preserving the relationship of the current input with the previous ones.
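As a minimal NumPy sketch (the weight names and sizes are my own illustrative assumptions), one step of such a recurrent neuron combines the previous hidden state with the current input, reusing the same weights at every time step:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One recurrent step: mix the current input with the previous hidden state."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
Wx = rng.normal(size=(hidden_size, input_size))    # input-to-hidden weights
Wh = rng.normal(size=(hidden_size, hidden_size))   # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                          # initial state
for x_t in rng.normal(size=(5, input_size)):       # a toy sequence of 5 inputs
    h = rnn_step(x_t, h, Wx, Wh, b)                # the SAME weights are reused every step
print(h)                                           # final state carries information from all inputs
```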
What can RNNs do?
RNNs have shown great success in many NLP tasks. The most common type of RNN we use is the LSTM, which is much better at capturing long-term dependencies than vanilla RNNs are.
Why Recurrent Neural Networks?
Recurrent connections offer several advantages. They are very helpful in image recognition and in capturing context information: as the time steps increase, a unit gets influenced by a larger and larger neighborhood. With that information, recurrent networks can watch large regions of the input space; in a CNN this ability is limited to units in the higher layers. Furthermore, recurrent connections increase the network depth while keeping the number of parameters low through weight sharing. Reducing the number of parameters is also a modern trend in CNN architectures.
Additionally, recurrent connections give the network the ability to handle sequential data, which is very useful for many tasks. Recurrent connections between neurons are also biologically inspired and are used for many tasks in the brain, so using such connections can enhance artificial networks and bring interesting behaviors. The last big advantage is that an RNN offers some kind of memory, which can be used in many applications.
Training RNNs
Generally, training an RNN is similar to training a traditional neural network: we also use the backpropagation algorithm. However, because the parameters are shared by all time steps in the network, the gradient at each output depends not only on the calculations of the current time step but also on those of the previous time steps.
For example:
In order to calculate the gradient at a given time step, we would need to backpropagate three steps back and sum up the gradients. This is called Backpropagation Through Time (BPTT). If this doesn't make a whole lot of sense yet, don't worry, we'll have a whole post on the gory details. For now, just note that vanilla RNNs trained with BPTT have difficulties learning long-term dependencies, due to what is called the vanishing/exploding gradient problem. There is some machinery to deal with these problems, and certain types of RNNs (like LSTMs) were specifically designed to get around them.
The training of almost all networks is done by backpropagation, but with recurrent connections it has to be adapted. This is done by unfolding the net over time. Suppose the network consists of one recurrent layer and one feed-forward layer; the recurrent layer can then be unfolded into k instances of itself.
In the example, the network is unfolded with a depth of k = 3. After unfolding, the network can be trained in the same way as a feed-forward network with backpropagation, except that each epoch has to run through each unfolded layer. The algorithm for recurrent nets is then called Backpropagation Through Time (BPTT).
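Below is a minimal NumPy sketch of BPTT for a vanilla RNN unfolded over k = 3 steps (the weight names and sizes are illustrative assumptions, not taken from the post). The forward pass reuses the same weights at every step; the backward pass walks back through the unfolded steps and sums the gradients for those shared weights:

```python
import numpy as np

rng = np.random.default_rng(0)
k, input_size, hidden_size, output_size = 3, 4, 5, 2   # unfolding depth k = 3 (toy sizes)

# Shared parameters, reused at every time step.
Wx = rng.normal(scale=0.1, size=(hidden_size, input_size))
Wh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
Wy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b, c = np.zeros(hidden_size), np.zeros(output_size)

xs = rng.normal(size=(k, input_size))        # toy input sequence
targets = rng.normal(size=(k, output_size))  # toy targets

# ---- Forward pass through the unfolded network ----
hs = [np.zeros(hidden_size)]                 # h_0
ys = []
for t in range(k):
    hs.append(np.tanh(Wx @ xs[t] + Wh @ hs[-1] + b))
    ys.append(Wy @ hs[-1] + c)
loss = 0.5 * sum(np.sum((ys[t] - targets[t]) ** 2) for t in range(k))

# ---- Backward pass (BPTT): walk back through time, summing the gradients ----
dWx, dWh, dWy = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(Wy)
db, dc = np.zeros_like(b), np.zeros_like(c)
dh_next = np.zeros(hidden_size)              # gradient flowing in from later time steps
for t in reversed(range(k)):
    dy = ys[t] - targets[t]
    dWy += np.outer(dy, hs[t + 1]); dc += dy
    dh = Wy.T @ dy + dh_next                 # local gradient + gradient from the future
    dz = dh * (1.0 - hs[t + 1] ** 2)         # backprop through tanh
    dWx += np.outer(dz, xs[t]); dWh += np.outer(dz, hs[t]); db += dz
    dh_next = Wh.T @ dz                      # pass the gradient one step further back

print("loss:", loss)
```

Because Wx, Wh, and Wy are shared across all unfolded instances, their gradients are accumulated (summed) over the k time steps, which is exactly what BPTT does.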
RNN Extensions
Over the years, researchers have developed more sophisticated types of RNNs to deal with some of the shortcomings of the vanilla RNN model.
a. Bidirectional RNNs
These are based on the idea that the output at a given time may depend not only on the previous elements in the sequence, but also on future elements.
For example:
To predict a missing word in a sequence, you want to look at both the left and the right context. Bidirectional RNNs are quite simple: they are just two RNNs stacked on top of each other, one processing the sequence forward and one backward. The output is then computed based on the hidden states of both RNNs.
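As a rough NumPy sketch of the idea (names and sizes are my own illustrative assumptions), we run one recurrent pass left-to-right and one right-to-left, then combine the two hidden states at each position:

```python
import numpy as np

def rnn_pass(xs, Wx, Wh, b):
    """Run a simple recurrent layer over a sequence, returning all hidden states."""
    h, states = np.zeros(Wh.shape[0]), []
    for x_t in xs:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        states.append(h)
    return states

rng = np.random.default_rng(0)
T, input_size, hidden_size = 6, 4, 3
xs = rng.normal(size=(T, input_size))

# Two independent RNNs: one reads the sequence forward, one reads it backward.
params_fwd = (rng.normal(size=(hidden_size, input_size)),
              rng.normal(size=(hidden_size, hidden_size)), np.zeros(hidden_size))
params_bwd = (rng.normal(size=(hidden_size, input_size)),
              rng.normal(size=(hidden_size, hidden_size)), np.zeros(hidden_size))

h_fwd = rnn_pass(xs, *params_fwd)
h_bwd = rnn_pass(xs[::-1], *params_bwd)[::-1]    # reverse back so indices line up

# The combined state at position t sees both the left and the right context.
h_bi = [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]
print(len(h_bi), h_bi[0].shape)                  # 6 positions, each of size 2 * hidden_size
```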
b. Deep (Bidirectional) RNNs
These
are similar to Bidirectional RNNs, only that we now have multiple
layers per time step. In practice, this gives us a higher learning
capacity.
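A minimal sketch of the stacking part of this idea (with made-up names and sizes): the hidden states produced by one layer at each time step become the inputs of the next layer at the same time step. In a deep bidirectional RNN, each of these layers would additionally run in both directions as shown above.

```python
import numpy as np

def rnn_layer(xs, Wx, Wh, b):
    """One recurrent layer: maps a sequence of inputs to a sequence of hidden states."""
    h, states = np.zeros(Wh.shape[0]), []
    for x_t in xs:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
T, input_size, hidden_size = 5, 4, 3
xs = rng.normal(size=(T, input_size))

# Layer 1 reads the raw inputs; layer 2 reads layer 1's hidden states at each time step.
layer1 = (rng.normal(size=(hidden_size, input_size)),
          rng.normal(size=(hidden_size, hidden_size)), np.zeros(hidden_size))
layer2 = (rng.normal(size=(hidden_size, hidden_size)),
          rng.normal(size=(hidden_size, hidden_size)), np.zeros(hidden_size))

h1 = rnn_layer(xs, *layer1)       # first layer over the input sequence
h2 = rnn_layer(h1, *layer2)       # second layer stacked on top, per time step
print(h2.shape)                   # (5, 3): one state per time step from the top layer
```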
c. LSTM networks
LSTMs don't have a fundamentally different architecture from RNNs, but they use a different function to compute the hidden state.
The memory in LSTMs is held in cells. Internally, these cells decide what to keep in memory and then combine the previous state, the current memory, and the input. It turns out that these types of units are very efficient at capturing long-term dependencies. LSTMs can be quite confusing in the beginning, but if you're interested in learning more, this post has an excellent explanation.
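A minimal NumPy sketch of a single LSTM cell step (the gate names follow the standard formulation; the weight shapes here are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: the gates decide what to forget, what to write, and what to output."""
    z = W @ np.concatenate([h_prev, x_t]) + b    # all four gates computed in one matrix product
    H = h_prev.size
    f = sigmoid(z[0:H])          # forget gate: what to erase from the cell memory
    i = sigmoid(z[H:2*H])        # input gate: what new information to write
    g = np.tanh(z[2*H:3*H])      # candidate memory content
    o = sigmoid(z[3*H:4*H])      # output gate: what part of the memory to expose
    c = f * c_prev + i * g       # combine previous memory with the current input
    h = o * np.tanh(c)           # new hidden state
    return h, c

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)

h = c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):     # toy sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h, c)
```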
Advantages of RNN
a. Store Information
An RNN can use its feedback connections to store information over time in the form of activations. This ability is significant for many applications; for this reason, recurrent networks are often described as having some form of memory.
b. Learn Sequential Data
An RNN can handle sequential data of arbitrary length. A default feed-forward network can only map one fixed-size input to one fixed-size output. With the recurrent approach, one-to-many, many-to-one, and many-to-many mappings from inputs to outputs are also possible.
One example of a one-to-many network is labeling an image with a sentence. The many-to-one approach could take a sequence of images and produce one sentence for it. Finally, many-to-many approaches can be used for language translation; another use case would be labeling each image of a video sequence.
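As a small NumPy sketch of the many-to-one pattern (all names and sizes here are illustrative assumptions), the network reads a whole sequence and emits a single output from the final hidden state:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def many_to_one(xs, Wx, Wh, Wy, b, c):
    """Read a whole sequence, then produce a single output from the last hidden state."""
    h = np.zeros(Wh.shape[0])
    for x_t in xs:                       # many inputs...
        h = np.tanh(Wx @ x_t + Wh @ h + b)
    return softmax(Wy @ h + c)           # ...one output

rng = np.random.default_rng(0)
T, input_size, hidden_size, n_classes = 7, 4, 3, 2
xs = rng.normal(size=(T, input_size))    # e.g. feature vectors for a sequence of frames
Wx = rng.normal(size=(hidden_size, input_size))
Wh = rng.normal(size=(hidden_size, hidden_size))
Wy = rng.normal(size=(n_classes, hidden_size))
print(many_to_one(xs, Wx, Wh, Wy, np.zeros(hidden_size), np.zeros(n_classes)))
```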
Applications of RNN
RNNs are particularly useful for training on any type of sequential data. For example:
It would make sense to use an RNN for tasks such as image/video captioning, word prediction, translation, and image processing. However, an RNN can also be trained on non-sequential data in a non-sequential manner. Not too long ago I implemented an RNN for a computational neuroscience project. In case you want to implement your very first RNN, here are some tips:
a. Unfold your network
This allows you to visualize how the network interacts with itself at adjacent time steps, and it also allows you to visualize how the error is back-propagated through the system (BPTT).
The rule of thumb is: any connection at time step 't' that isn't feed-forward should be connected to the next time step at 't+1'.
b. Keep track of your back-propagated errors
Don't duplicate parameters: use one set of weights for all your states (time steps). This ensures that you are using a minimal amount of memory and that the weights are the same across all states.
c. RNNs are used in speech processing, non-Markovian control, and music composition. In addition, RNNs have been used successfully for sequential data such as handwriting recognition and speech recognition.
d. The advantage in comparison to feed-forward networks is that an RNN can handle sequential data.
A single RNN can be used for sequence labeling; the most successful applications of RNNs are tasks like handwriting recognition and speech recognition.
e. They are also used in clinical decision support systems, for example with a network based on the Jordan/Elman neural network. Furthermore, a recurrent fuzzy network for the control of dynamic systems has been proposed as a newer application that combines an RNN with a CNN.
f. A great application is in Natural Language Processing (NLP). Many people on the internet have demonstrated that an RNN can represent a language model. These language models can take as input a large corpus such as Shakespeare's poems, and after training they can generate their own Shakespearean poems that are very hard to differentiate from the originals!