• RNNs suffer from short-term memory → if a sequence is very long, they have a hard time carrying information from earlier time steps to later ones

    • processing a paragraph of text to make predictions is difficult with RNNs, as they can leave out important information from the beginning
  • RNNs suffer from the vanishing gradient problem → the gradient shrinks as it backpropagates through time and eventually contributes little to learning

    • weight updates become very small, so learning effectively stalls
  • layers that receive a small gradient update stop learning → these are usually the earlier layers; because of this, RNNs forget what happens early on in longer sequences, leaving them with short-term memory (sketched below)
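
  • a minimal, illustrative sketch of the effect (toy code, not from these notes; `W_hh`, `hidden_size`, and the 0.9 scaling are assumed values) → each step of backpropagation through time multiplies the gradient by another Jacobian, and when the recurrent weights and tanh derivatives keep those factors below 1, the gradient norm decays roughly geometrically:

```python
# Toy demonstration (assumed, not from the source notes): how the gradient
# signal shrinks as it is carried back through many time steps of a simple RNN.
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 8
T = 50  # number of time steps to backpropagate through

# Hypothetical recurrent weight matrix, scaled so its largest singular value
# is below 1 (a common regime in which gradients vanish).
W_hh = rng.standard_normal((hidden_size, hidden_size))
W_hh *= 0.9 / np.linalg.svd(W_hh, compute_uv=False)[0]

grad = np.ones(hidden_size)  # gradient arriving at the final time step
for t in range(T):
    h_pre = rng.standard_normal(hidden_size)   # stand-in pre-activation values
    tanh_prime = 1.0 - np.tanh(h_pre) ** 2     # tanh derivative, always <= 1
    grad = W_hh.T @ (tanh_prime * grad)        # one step of backprop through time
    if (t + 1) % 10 == 0:
        print(f"after {t + 1:2d} steps  ||grad|| = {np.linalg.norm(grad):.2e}")
```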

  • LSTMs were developed as a solution to short-term memory and use mechanisms called gates to regulate the flow of information

  • gates learn which data in a sequence is important to keep or throw away → they pass only the relevant information down the chain of sequence steps to make predictions (see the sketch below)

  • LSTMs and GRUs are used for speech recognition, speech synthesis, text generation, and more
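
  • a minimal sketch of one LSTM step (illustrative code, not from these notes; the weight names `W_f`, `W_i`, `W_o`, `W_c` and all sizes are assumed) → the forget, input, and output gates are sigmoids between 0 and 1 that decide how much old memory to erase from the cell state, how much new information to write into it, and how much of it to expose as the next hidden state:

```python
# Sketch of a single LSTM cell step with explicit gates (assumed toy code).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell (weights assumed to be given)."""
    z = np.concatenate([h_prev, x])                        # previous hidden state + current input
    f = sigmoid(params["W_f"] @ z + params["b_f"])         # forget gate: how much old memory to keep
    i = sigmoid(params["W_i"] @ z + params["b_i"])         # input gate: how much new info to write
    o = sigmoid(params["W_o"] @ z + params["b_o"])         # output gate: how much memory to reveal
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate values for the cell state
    c = f * c_prev + i * c_tilde                           # updated cell state (long-term memory)
    h = o * np.tanh(c)                                     # updated hidden state (working memory)
    return h, c

# Tiny usage example with random, untrained weights.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
params = {}
for name in ("f", "i", "o", "c"):
    params[f"W_{name}"] = 0.1 * rng.standard_normal((hidden_size, hidden_size + input_size))
    params[f"b_{name}"] = np.zeros(hidden_size)

h = c = np.zeros(hidden_size)
for x in rng.standard_normal((5, input_size)):   # a short sequence of 5 input vectors
    h, c = lstm_step(x, h, c, params)
print("final hidden state:", h)
```

  • because the cell state is updated additively (`c = f * c_prev + i * c_tilde`), information and gradients can flow along it over many steps, which is what counters the short-term memory problem described above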

  • Status Quo: RNNs:

    • an RNN processes the words as a sequence of vectors one by one, passing the previous hidden state to the next step of the sequence
    • the hidden state acts as the neural network's memory and holds information on the previous data the network has seen (see the loop sketch below)
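
  • a minimal sketch of that loop (illustrative code, not from these notes; `W_xh`, `W_hh`, and the sizes are assumed, untrained values) → each word vector is fed in one at a time, and the hidden state computed at each step is passed forward as the network's memory of everything seen so far:

```python
# Sketch of a vanilla RNN loop over a sequence of word vectors (assumed toy code).
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 6

# Hypothetical, untrained weights; in a real model these are learned.
W_xh = 0.1 * rng.standard_normal((hidden_size, input_size))
W_hh = 0.1 * rng.standard_normal((hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

word_vectors = rng.standard_normal((7, input_size))  # stand-in for a 7-word sentence
h = np.zeros(hidden_size)                            # initial hidden state (empty memory)

for t, x in enumerate(word_vectors):
    # The new hidden state mixes the current input with the previous hidden state,
    # so information from earlier words is carried forward step by step.
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    print(f"step {t}: hidden state norm = {np.linalg.norm(h):.3f}")
```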