Challenges for deep learning
- 10:00, 25th October 2013 (Week 2, Michaelmas Term 2013), Lecture Theatre B
Deep learning research has been successful beyond expectations in the last few years, both in terms of academic impact and industrial adoption, with companies such as Google, Microsoft, Baidu, and Facebook investing heavily and releasing products based on deep learning. However, many fundamental challenges on the road to AI remain, and they will be the focus of this talk. We outline the numerical optimization issue (training larger and larger models becomes more and more difficult, and yet the larger models are the most successful ones) and the computational resources issue. For the latter we propose to develop learning algorithms based on distributed conditional computation, in which only some of the parameters need to be touched for any particular example, thus constructing dynamically structured networks. We raise the question of what makes a good representation and how to help the learner disentangle the underlying factors of variation using broad priors. Finally, we discuss a fundamental challenge for probabilistic models involving many random variables (for unsupervised or structured output learning, for example), having to do with the marginalization required during training and inference. We suggest that current approximations (e.g., based on MCMC or variational methods) may be fundamentally insufficient for complex inputs where the number of major modes (even when conditioning on the input) is very large and these modes are separated by vast low-density regions. We propose a novel alternative to maximum likelihood, called Generative Stochastic Networks, which include dependency networks as a special case and which try to learn a simpler (but conditional) distribution whose normalization constant is easier to approximate. That conditional distribution is the transition operator of a Markov chain whose stationary distribution is the one estimated by the learner.
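To make the conditional-computation idea above slightly more concrete, here is a minimal Python sketch, not the architecture discussed in the talk: a cheap gating function scores a set of parameter blocks and only the top-scoring blocks are used for a given example, so most parameters are never touched. All names and sizes (gated_layer, n_blocks, k_active, the random weights) are illustrative assumptions.

```python
# Illustrative sketch of conditional computation: a gater selects a few
# parameter blocks per example, so most weights are never touched.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_blocks, block_size, k_active = 64, 16, 32, 2
W = rng.normal(0, 0.1, size=(n_blocks, n_in, block_size))  # per-block weights
b = np.zeros((n_blocks, block_size))                       # per-block biases
G = rng.normal(0, 0.1, size=(n_in, n_blocks))              # cheap gater weights

def gated_layer(x):
    """Compute hidden activations using only the k_active most relevant blocks."""
    scores = x @ G                           # one gating score per block
    active = np.argsort(scores)[-k_active:]  # indices of the blocks to "touch"
    h = np.zeros(n_blocks * block_size)
    for j in active:                         # only these parameters are used
        h[j * block_size:(j + 1) * block_size] = np.maximum(0, x @ W[j] + b[j])
    return h, active

x = rng.normal(size=n_in)
h, active = gated_layer(x)
print("active blocks:", active, "nonzero units:", np.count_nonzero(h))
```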
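The Generative Stochastic Network idea at the end of the abstract can likewise be sketched, under the simplifying assumption that the learned conditional is a denoising reconstruction distribution: corrupt the current state, sample a reconstruction, and iterate, so that corrupt-then-reconstruct acts as the transition operator of a Markov chain whose stationary distribution approximates the data distribution. The reconstruct_mean function below is a hand-written stand-in for a trained model, not an actual GSN.

```python
# Sketch of GSN-style sampling: learn (or here, fake) a simple conditional
# reconstruction distribution and run it as a Markov chain transition operator.
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_std=0.5):
    """C(x_tilde | x): add Gaussian noise to the current state."""
    return x + rng.normal(0, noise_std, size=x.shape)

def reconstruct_mean(x_tilde):
    """Stand-in for a learned P(x | x_tilde); here it just shrinks toward 0."""
    return 0.8 * x_tilde

def gsn_chain(x0, n_steps=1000, recon_std=0.1):
    """Iterate corrupt-then-reconstruct; collect states visited by the chain."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x_tilde = corrupt(x)                                              # corruption step
        x = reconstruct_mean(x_tilde) + rng.normal(0, recon_std, size=x.shape)  # sampled reconstruction
        samples.append(x.copy())
    return np.array(samples)

samples = gsn_chain(np.zeros(2))
print("sample mean after burn-in:", samples[200:].mean(axis=0))
```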
Speaker bio
Yoshua Bengio received his PhD in Computer Science from McGill University in 1991. After two post-doctoral years, one at M.I.T. and one at AT&T Bell Laboratories, he became a Professor in the Department of Computer Science and Operations Research at Université de Montréal. He is the author of two books and of more than 150 publications, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, and pattern recognition. Since 2000, Dr. Bengio has held a Canada Research Chair in Statistical Learning Algorithms, and he also holds an NSERC Industrial Chair.

Dr. Bengio is a recipient of the Urgel-Archambault 2009 prize, a Fellow of the Centre Inter-universitaire de Recherche en Analyse des Organisations (CIRANO), Action Editor of the Journal of Machine Learning Research, Associate Editor of Foundations and Trends in Machine Learning and of Computational Intelligence, and former Associate Editor of Machine Learning and of the IEEE Transactions on Neural Networks. He is also Director of the Canadian research group on Inference from High-Dimensional Data within the MITACS network of centers of excellence, and Founder and head of the Laboratoire d’Informatique des Systèmes Adaptatifs.
Dr. Bengio's current interests include fundamental questions on learning deep architectures, the geometry of generalization in high-dimensional spaces, biologically inspired learning algorithms, and challenging applications of statistical machine learning in artificial intelligence tasks.