What Can Deep Learning Learn from Linear Regression?
Training large-scale deep neural networks for pattern recognition requires hundreds of hours on clusters of GPUs to achieve state-of-the-art performance. Improved optimization algorithms could enable faster industrial prototyping and make training contemporary models more accessible.
In this talk, I will attempt to distill the key difficulties in optimizing large, deep neural networks for pattern recognition. In particular, I will emphasize that many of the popularized notions of what makes these problems “hard” are not true impediments at all. I will show not only that it is easy to globally optimize neural networks, but that such global optimization remains easy when fitting completely random data.
I will argue instead that the source of difficulty in deep learning is a lack of understanding of generalization---namely understanding behavior on new and unseen data. By appealing to standard concepts from linear regression, I will describe why certain popular theories of generalization fail to explain the success of large neural nets. I will close with some possible approaches to patching this theory and guiding the engineering of deep learning models with enormous capacity.
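To make the linear-regression analogy concrete, here is a minimal sketch (an illustration assumed from the abstract's framing, not taken from the talk itself): an overparameterized linear model, with more features than samples, can fit even completely random targets exactly via the minimum-norm least-squares solution. Classical capacity-based theories would predict such a model cannot generalize, yet this is precisely the regime large neural networks operate in.

```python
# Hypothetical illustration: overparameterized linear regression
# interpolates arbitrary (even random) labels exactly.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100          # 20 samples, 100 features: overparameterized
X = rng.normal(size=(n, d))
y = rng.normal(size=n)  # completely random targets

# Minimum-norm interpolating solution: w = pinv(X) @ y
w = np.linalg.pinv(X) @ y

# Training error is essentially zero despite the labels carrying no signal.
train_error = np.max(np.abs(X @ w - y))
print(train_error)
```

The same phenomenon, a model with enormous capacity driving training error to zero on noise, is what makes uniform-convergence-style explanations of generalization uninformative for large networks.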
BIO: Benjamin Recht is an Associate Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Ben's research group studies the theory and practice of optimization algorithms, with a particular focus on applications in machine learning and data analysis. Ben is the recipient of a Presidential Early Career Award for Scientists and Engineers, an Alfred P. Sloan Research Fellowship, the 2012 SIAM/MOS Lagrange Prize in Continuous Optimization, the 2014 Jamon Prize, and the 2015 William O. Baker Award for Initiatives in Research.