RNNs suffer from short-term memory → if a sequence is very long, they have a hard time carrying information from earlier time steps to later ones
RNNs suffer from the vanishing gradient problem → the gradient shrinks as it backpropagates through time; once it gets very small, it no longer contributes much to learning
layers that get a small gradient update stop learning → usually the earlier layers; because of this, RNNs forget what happened early on in longer sequences, giving them short-term memory (see the toy example below)
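A minimal toy sketch of why the gradient vanishes (my own illustration, not from the source notes): backprop through time multiplies the gradient by the recurrent derivative once per step, so a factor with magnitude below 1 shrinks it exponentially. The scalar weight `w_rec` and length `T` are hypothetical.

```python
import numpy as np

# Toy illustration of the vanishing gradient in backprop through time.
# Each step multiplies the gradient by the recurrent factor; if |w_rec| < 1,
# the signal reaching early time steps decays exponentially.
T = 50            # sequence length (hypothetical)
w_rec = 0.5       # recurrent weight/derivative, |w_rec| < 1 (hypothetical)
grad = 1.0        # gradient at the final time step
for t in range(T):
    grad *= w_rec # one step of backpropagation through time
print(f"gradient after {T} steps: {grad:.2e}")  # ~8.9e-16, effectively zero
```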
LSTMs were developed as a solution to short-term memory and use mechanisms called gates to regulate the flow of information
gates learn which data in a sequence is important to keep or throw away → pass relevant information down long sequences to make predictions (see the sketch below)
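A minimal sketch of one LSTM step using the standard forget/input/output gate equations, written in plain numpy. The function name `lstm_step` and the parameter names `W`, `U`, `b`, and the sizes in the usage example are my own assumptions for illustration, not from the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters for the
    forget (f), input (i), candidate (g), and output (o) gates."""
    z = W @ x_t + U @ h_prev + b      # all gate pre-activations at once
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)                    # forget gate: what to throw away from memory
    i = sigmoid(i)                    # input gate: what new info to keep
    g = np.tanh(g)                    # candidate values for the cell state
    o = sigmoid(o)                    # output gate: what to expose as h_t
    c_t = f * c_prev + i * g          # update the cell state (the "memory")
    h_t = o * np.tanh(c_t)            # hidden state passed down the sequence
    return h_t, c_t

# toy usage with hypothetical sizes
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden_size, input_size))
U = rng.normal(size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(rng.normal(size=input_size), h, c, W, U, b)
```

The gating is what keeps memory usable over long spans: the cell state `c_t` is updated additively (forget some of the old state, add some new candidate), so relevant information can flow through many steps without being repeatedly squashed.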
LSTMs and GRUs work for speech recognition, speech synthesis, text generation, and more
Status Quo: RNNs: