Short notes - Mogrifier LSTM
Overall objective of the paper: how do we inject context into the input embeddings of an LSTM? Note that the hidden state of the LSTM already carries context-specific information; the question is how to have the hidden state play a larger role in shaping the embeddings the LSTM consumes.
The paper takes a fresh look at the traditional LSTM and the “gates” in there. First, the vanilla LSTM:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
j_t = tanh(W_j x_t + U_j h_{t-1} + b_j)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ j_t
h_t = o_t ⊙ tanh(c_t)
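To make the vanilla cell concrete, here is a minimal numpy sketch of one LSTM step. The parameter layout (a dict mapping gate name to (W, U, b)) and the dimensions are my own illustrative choices, not anything from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One vanilla LSTM step. params maps gate name -> (W, U, b)."""
    W_f, U_f, b_f = params["f"]
    W_i, U_i, b_i = params["i"]
    W_o, U_o, b_o = params["o"]
    W_j, U_j, b_j = params["j"]
    f = sigmoid(W_f @ x + U_f @ h_prev + b_f)   # forget gate
    i = sigmoid(W_i @ x + U_i @ h_prev + b_i)   # input gate
    o = sigmoid(W_o @ x + U_o @ h_prev + b_o)   # output gate
    j = np.tanh(W_j @ x + U_j @ h_prev + b_j)   # candidate cell state
    c = f * c_prev + i * j                      # new cell state
    h = o * np.tanh(c)                          # new hidden state
    return h, c

# tiny smoke test with random weights (hypothetical sizes)
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = {g: (rng.normal(size=(d_h, d_in)),
              rng.normal(size=(d_h, d_h)),
              np.zeros(d_h)) for g in "fioj"}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), params)
```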
Some insights:
- the i gate essentially “scales” the rows of the weight matrix W_j. What? Look at the equation: the update c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_j x_t + U_j h_{t-1} + b_j) multiplies the candidate elementwise by i_t. Ignoring the tanh and bias for a moment, i_t ⊙ (W_j x_t) = diag(i_t) W_j x_t, i.e., each row of W_j (and U_j) gets rescaled by the corresponding entry of i_t.
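The row-scaling claim is easy to check numerically. A quick numpy demo (ignoring the tanh and bias, as above; sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d_h, d_in = 4, 3
W_j = rng.normal(size=(d_h, d_in))
x = rng.normal(size=d_in)
i_gate = rng.uniform(size=d_h)          # stand-in for a sigmoid output in (0, 1)

# elementwise gating of the pre-activation ...
gated = i_gate * (W_j @ x)
# ... equals multiplying x by a copy of W_j with its rows rescaled
row_scaled = (np.diag(i_gate) @ W_j) @ x
```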
- motivated by the above, equip the LSTM with gates that scale the columns of the weight matrices instead: gating the input elementwise, x_t → g ⊙ x_t, gives W (g ⊙ x_t) = (W diag(g)) x_t, a column rescaling of W. Concretely, the Mogrifier has x_t and h_{t-1} take turns gating each other for a few rounds before the usual LSTM update, so the input embedding is modulated by the current context.
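The column-scaling identity and the mutual-gating idea can be sketched as below. Caveat: the paper uses a separate pair of matrices per round; for brevity this sketch shares a single Q and R across all rounds, and the names `mogrify`, `Q`, `R` are mine.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
d_h, d_in = 4, 3

# gating the *input* is a column rescaling of the weight matrix
W = rng.normal(size=(d_h, d_in))
x = rng.normal(size=d_in)
g = rng.uniform(size=d_in)
col_ok = np.allclose(W @ (g * x), (W @ np.diag(g)) @ x)

def mogrify(x, h, Q, R, rounds=5):
    """x and h alternately rescale each other before the LSTM update.
    Simplified sketch: Q, R shared across rounds (the paper uses one pair per round)."""
    for r in range(1, rounds + 1):
        if r % 2 == 1:                    # odd rounds update x from h
            x = 2 * sigmoid(Q @ h) * x
        else:                             # even rounds update h from x
            h = 2 * sigmoid(R @ x) * h
    return x, h

Q = rng.normal(size=(d_in, d_h))
R = rng.normal(size=(d_h, d_in))
h0 = rng.normal(size=d_h)
x_m, h_m = mogrify(x, h0, Q, R)          # then feed x_m, h_m into the LSTM step
```

The factor of 2 keeps the expected scale of the gated vectors roughly unchanged, since sigmoid outputs average around 0.5.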