Language Modeling, Recurrent Neural Nets and Seq2Seq
Topics
This post covers the third lecture in the course: “Language Modeling, Recurrent Neural Nets and Seq2Seq.”
This lecture will cover the history of language modeling prior to transformers. Even though you should typically use a transformer-based language model, this lecture will provide valuable context for understanding NLP and how the field has evolved. It will also introduce sequence-to-sequence models and attention, key prerequisites for understanding transformers and various applications.
The lecture is broken up into three parts:
- Language Modeling, covering early approaches to language modeling such as n-gram models (see the sketch after this list)
- Word Embeddings, covering GloVe, word2vec, etc.
- Seq2Seq, covering sequence-to-sequence models (LSTMs)
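As a quick illustration of the early, count-based approach to language modeling, here is a minimal bigram model sketch in Python, in the spirit of the Jurafsky and Martin chapter listed in the references; the toy corpus and the `bigram_prob` helper are illustrative assumptions, not code from the lecture.

```python
# A minimal sketch of a count-based bigram language model (illustrative only).
from collections import Counter

# Toy corpus (an assumption for illustration), with sentence boundary markers.
corpus = ["<s> the cat sat on the mat </s>",
          "<s> the dog sat on the rug </s>"]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    """P(word | prev) estimated by maximum likelihood from the toy corpus."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(bigram_prob("the", "cat"))  # 0.25: "the" occurs 4 times, "the cat" once
```

Real n-gram models add smoothing to handle unseen word pairs, which is where the cited chapter picks up.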
Lecture Videos
Language Modeling
Word Embeddings
Sequence to Sequence Models
Lecture notes: Language Modeling, Word Embeddings, Seq2Seq
References Cited in Lecture 3: Language Modeling, Recurrent Neural Nets and Seq2Seq
Academic Papers
- Jurafsky, Daniel, and James Martin. “N-Gram Language Models.” In Speech and Language Processing, 2020.
- Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. “Learning long-term dependencies with gradient descent is difficult.” IEEE Transactions on Neural Networks 5, no. 2 (1994): 157-166.
- Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural Computation 9, no. 8 (1997): 1735-1780.
- Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. “LSTM: A search space odyssey.” IEEE Transactions on Neural Networks and Learning Systems 28, no. 10 (2016): 2222-2232.
- Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. “Empirical evaluation of gated recurrent neural networks on sequence modeling.” arXiv preprint arXiv:1412.3555 (2014).
- Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. “Sequence to sequence learning with neural networks.” arXiv preprint arXiv:1409.3215 (2014).
- Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. “Neural machine translation by jointly learning to align and translate.” arXiv preprint arXiv:1409.0473 (2014).
- Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. “Distributed representations of words and phrases and their compositionality.” Advances in Neural Information Processing Systems 26 (2013): 3111-3119. (Note: this paper developed the word2vec model commonly used in NLP.)
- Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. “GloVe: Global vectors for word representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543. 2014.
Other Resources
- Olah, Chris, and Shan Carter. “Attention and Augmented Recurrent Neural Networks.” Distill 1, no. 9 (2016): e1.
- Olah, Christopher. “Deep Learning, NLP, and Representations,” 2014.
Code Bases
- PyTorch Examples: Classifying MNIST digits with an RNN (see the sketch below)
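For a sense of what that example involves, here is a minimal sketch, assuming PyTorch is installed, that treats each 28x28 MNIST digit as a sequence of 28 rows and classifies it with an RNN. The model name, hyperparameters, and the random-input smoke test are assumptions for illustration, not the linked example's actual code.

```python
import torch
import torch.nn as nn

class RowRNNClassifier(nn.Module):
    """Read a 28x28 image row by row with an RNN, then classify from the last hidden state."""
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, images):
        # images: (batch, 28, 28) -> a length-28 sequence of 28-dimensional rows
        outputs, _ = self.rnn(images)
        return self.fc(outputs[:, -1, :])  # logits from the final time step

model = RowRNNClassifier()
logits = model(torch.randn(8, 28, 28))  # a batch of 8 fake "digits" as a smoke test
print(logits.shape)                     # torch.Size([8, 10])
```

The linked example adds the data loading and training loop; the sketch only shows the sequence-style view of an image that makes an RNN applicable.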
Image Source: https://thorirmar.com/post/insight_into_lstm/