Basics of Training and Optimizing Neural Networks
Topics
This post covers the seventh lecture in the course: “Basics of Optimizing Neural Networks; Classification.”
This is a kitchen-sink lecture about optimizing neural nets. The Goodfellow, Bengio, and Courville textbook is a very helpful general reference. The references below provide useful additional insight into specific topics; a short code sketch after each group illustrates the idea in question:
Lecture Video
References Cited in Lecture 7: Basics of Optimizing Neural Networks; Classification
Momentum
- Goh, Gabriel. “Why momentum really works.” Distill 2, no. 4 (2017): e6. (An excellent treatment of momentum.)
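The Goh article analyzes the heavy-ball momentum update. As a concrete reminder of the rule being analyzed, here is a minimal Python sketch on a toy ill-conditioned quadratic; the step size and momentum coefficient are illustrative choices, not tuned recommendations.

```python
import numpy as np

def gd_with_momentum(grad, w0, lr=0.02, beta=0.9, steps=300):
    """Heavy-ball momentum: the velocity v accumulates an
    exponentially decaying sum of past gradients, and the
    weights move along the velocity, not the raw gradient."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)   # accumulate gradient history
        w = w - lr * v           # step along the velocity
    return w

# Toy ill-conditioned quadratic f(w) = 0.5 * w @ A @ w with
# curvatures 1 and 100; momentum damps the oscillation along
# the steep direction that slows plain gradient descent.
A = np.diag([1.0, 100.0])
print(gd_with_momentum(lambda w: A @ w, w0=[1.0, 1.0]))  # -> near [0, 0]
```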
Batch Normalization
- Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” In International Conference on Machine Learning, pp. 448-456. PMLR, 2015.
- Santurkar, Shibani, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. “How does batch normalization help optimization?” Advances in Neural Information Processing Systems (2018).
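For reference, the operation these two papers study is simple to state. Here is a minimal NumPy sketch of the training-time forward pass (per-feature normalization over the mini-batch, followed by a learned scale and shift); the running statistics used at inference time are omitted:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm for a (batch, features) input.

    Each feature is normalized to zero mean and unit variance
    over the mini-batch, then rescaled by the learned gamma and
    shifted by the learned beta (Ioffe & Szegedy, 2015)."""
    mu = x.mean(axis=0)    # per-feature batch mean
    var = x.var(axis=0)    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 5.0 + 3.0   # a badly scaled batch
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.std(axis=0))     # ~0 and ~1 per feature
```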
Other Refinements
- He, Tong, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. “Bag of tricks for image classification with convolutional neural networks.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 558-567. 2019.
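As one concrete example of the refinements cataloged in this paper, here is a sketch of cosine learning-rate decay (the paper pairs it with a linear warmup, omitted here):

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine learning-rate decay, one of the refinements in He
    et al. (2019): anneal the rate from lr_max down to lr_min
    along a half cosine over the course of training."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# e.g. cosine_lr(0, 100, 0.1) -> 0.1, cosine_lr(100, 100, 0.1) -> 0.0
```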
Hyperparameter Tuning
- Li, Hao, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. “Visualizing the loss landscape of neural nets.” Advances in Neural Information Processing Systems (2018).
- Li, Lisha, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. “Hyperband: A novel bandit-based approach to hyperparameter optimization.” The Journal of Machine Learning Research 18, no. 1 (2017): 6765-6816.
- Falkner, Stefan, Aaron Klein, and Frank Hutter. “BOHB: Robust and efficient hyperparameter optimization at scale.” In International Conference on Machine Learning, pp. 1437-1446. PMLR, 2018.
- Frankle, Jonathan, and Michael Carbin. “The lottery ticket hypothesis: Finding sparse, trainable neural networks.” arXiv preprint arXiv:1803.03635 (2018).
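Hyperband and BOHB are both built on successive halving: evaluate many configurations on a small budget, keep the best fraction, and re-evaluate the survivors with more budget. A minimal sketch, where `evaluate(config, budget)` is a stand-in for training a configuration for `budget` epochs and returning its validation loss (the toy evaluator below is purely illustrative):

```python
import math
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Core subroutine of Hyperband (Li et al., 2017): score all
    configs on a small budget, keep the best 1/eta fraction, and
    repeat with eta times the budget until one config remains."""
    budget = min_budget
    while len(configs) > 1:
        scored = [(evaluate(c, budget), c) for c in configs]
        scored.sort(key=lambda pair: pair[0])    # lowest loss first
        keep = max(1, len(configs) // eta)       # keep the best 1/eta
        configs = [c for _, c in scored[:keep]]
        budget *= eta                            # promote with more budget
    return configs[0]

# Toy demo: pretend the best learning rate is 10**-2.5, and that
# more budget gives a less noisy estimate of a config's loss.
def toy_evaluate(config, budget):
    true_loss = (math.log10(config["lr"]) + 2.5) ** 2
    return true_loss + random.gauss(0.0, 1.0 / budget)

configs = [{"lr": 10 ** random.uniform(-4, -1)} for _ in range(27)]
print(successive_halving(configs, toy_evaluate))
```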
Other Resources
- Use Weights & Biases
- Weights and Biases: Running Hyperparameter Sweeps to Pick the Best Model
- Weights and Biases: Tune Hyperparameters
- Weights and Biases: Parameter Importance
Image Source: https://wandb.ai/
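To make the sweep workflow in the resources above concrete, here is a rough sketch using the W&B Python client: define a sweep config (search method, target metric, parameter ranges), register it, and launch an agent. The project name, parameter ranges, and the placeholder training loop are all illustrative; see the linked W&B docs for the authoritative config schema.

```python
import wandb

sweep_config = {
    "method": "random",  # or "grid" / "bayes"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [32, 64, 128]},
    },
}

def train():
    with wandb.init() as run:
        # run.config holds the values the sweep picked for this trial.
        lr, batch_size = run.config.lr, run.config.batch_size
        # ... build the model and train it with lr and batch_size ...
        val_loss = 0.0  # placeholder: log the real validation loss here
        run.log({"val_loss": val_loss})

sweep_id = wandb.sweep(sweep_config, project="lecture7-sweeps")  # hypothetical project
wandb.agent(sweep_id, function=train, count=10)                  # run 10 trials
```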