Attention and Transformers

S2S (sequence-to-sequence) models take a sequence of items and output another sequence of items
the model is made up of an encoder and a decoder → the encoder processes each item in the input sequence and compiles the information into a vector (called the context vector) → the decoder receives the context vector, which it uses to produce the output sequence item by item
in the classic setup the encoder and decoder are both recurrent neural networks (RNNs) → the context is a vector of floats: it is the encoder RNN's final hidden state, so its length equals the number of hidden units in the encoder RNN
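
The data flow above can be sketched in plain Python. This is only an illustration under loud assumptions: the weights are random and untrained (so the outputs are meaningless), the sizes `HIDDEN_SIZE` and `INPUT_SIZE`, the helper names, and the fixed `num_steps` are all hypothetical, and a real decoder would normally stop when it emits an end-of-sequence item rather than after a fixed count.

```python
import math
import random

random.seed(0)  # untrained random weights: only the data flow matters here

HIDDEN_SIZE = 4  # number of hidden units, and therefore the context vector length
INPUT_SIZE = 3   # length of each item vector (hypothetical)

def random_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def step(hidden, x, w_hh, w_xh):
    # Toy recurrent update: combine the previous hidden state with the current item.
    return [math.tanh(a + b) for a, b in zip(matvec(w_hh, hidden), matvec(w_xh, x))]

ENC_W_HH = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE)
ENC_W_XH = random_matrix(HIDDEN_SIZE, INPUT_SIZE)
DEC_W_HH = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE)
DEC_W_XH = random_matrix(HIDDEN_SIZE, INPUT_SIZE)
DEC_W_OUT = random_matrix(INPUT_SIZE, HIDDEN_SIZE)

def encode(items):
    hidden = [0.0] * HIDDEN_SIZE
    for x in items:
        hidden = step(hidden, x, ENC_W_HH, ENC_W_XH)
    return hidden  # the final hidden state serves as the context vector

def decode(context, num_steps):
    hidden = context            # the decoder starts from the context vector
    prev = [0.0] * INPUT_SIZE   # stand-in for a start-of-sequence item
    outputs = []
    for _ in range(num_steps):
        hidden = step(hidden, prev, DEC_W_HH, DEC_W_XH)
        prev = matvec(DEC_W_OUT, hidden)  # emit one output item per step
        outputs.append(prev)
    return outputs

source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
context = encode(source)
outputs = decode(context, num_steps=2)
```

Note that three input items go in and two output items come out: the whole input is squeezed through the fixed-length `context`, so the output length does not have to match the input length.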
an RNN takes 2 inputs at each time step → a hidden state and the current input item → when the item is a word, it is represented as a vector using a word embedding
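
A single RNN step with its two inputs might look like the sketch below. Everything concrete here is an assumption made for illustration: the three-word vocabulary, the 3-dimensional embedding values, the hand-picked weight matrices, and the `tanh` update are hypothetical stand-ins for what a real model would learn.

```python
import math

# Hypothetical 3-dimensional word embeddings (real ones are learned and much larger).
EMBEDDINGS = {
    "i": [0.1, 0.0, 0.2],
    "am": [0.0, 0.3, 0.1],
    "here": [0.2, 0.1, 0.0],
}

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_step(hidden, x, w_hh, w_xh, bias):
    """One step: the two inputs are the previous hidden state and the current input."""
    from_hidden = matvec(w_hh, hidden)
    from_input = matvec(w_xh, x)
    return [math.tanh(a + b + c) for a, b, c in zip(from_hidden, from_input, bias)]

HIDDEN_SIZE = 2
W_HH = [[0.5, 0.0], [0.0, 0.5]]            # hidden-to-hidden weights, 2x2
W_XH = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # input-to-hidden weights, 2x3
BIAS = [0.0, 0.0]

hidden = [0.0] * HIDDEN_SIZE  # initial hidden state
for word in ["i", "am", "here"]:
    # Look up the word's embedding, then feed it in together with the hidden state.
    hidden = rnn_step(hidden, EMBEDDINGS[word], W_HH, W_XH, BIAS)
```

After the loop, `hidden` has been updated once per word, so it carries information from the whole sequence seen so far.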