Real World Benchmarks for Deep Probabilistic AI
Bayesian deep learning (BDL) is an emerging sub-field of Deep Probabilistic AI which stands at the core of probabilistic programming languages such as Edward. BDL offers a pragmatic approach to combining Bayesian probability theory with deep learning models in practical and scalable ways, giving tools to quantify what deep models “know”. Although the number of applications making use of BDL is increasing quickly, the development of the field itself is impeded by the lack of realistic benchmarks to guide research. Evaluating new inference techniques on real applications requires expert domain knowledge, so researchers developing new inference tools for BDL currently fall back on MNIST-like toy benchmarks instead, ignoring development cost and scalability. To make significant progress in the deployment of BDL and new deep AI inference tools, those tools must scale to real-world settings, and for that researchers must be able to evaluate their inference and iterate quickly on real-world benchmark tasks.
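To make the idea of quantifying what a deep model “knows” concrete, the sketch below illustrates one common BDL approximation, Monte Carlo dropout: keeping dropout active at prediction time and averaging over stochastic forward passes gives both a predictive mean and an uncertainty estimate. This is a minimal illustrative example, not part of the proposed benchmarks; the network, weights, and sizes are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network with fixed random weights (illustrative only).
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1))

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with dropout sampled on the hidden layer."""
    h = np.maximum(x @ W1, 0.0)          # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop  # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)        # inverted dropout scaling
    return h @ W2

x = np.array([[0.3]])
samples = np.stack([stochastic_forward(x) for _ in range(200)])

# The spread across stochastic passes is the model's uncertainty
# about this input under the dropout approximation.
mean, std = samples.mean(), samples.std()
print(f"prediction {mean:.3f} +/- {std:.3f}")
```

A deterministic network would return a single number here; the dropout ensemble instead reports how much its predictions disagree, which is exactly the kind of output a benchmark for BDL inference must evaluate.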
This project aims to develop a set of real-world, probabilistic-programming-language-agnostic evaluations and datasets to benchmark probabilistic AI inference techniques, and BDL in particular. It will do this by refining applications which already make use of BDL, and by developing additional benchmarks together with Intel Labs, making use of Intel Xeon servers. Additionally, the project will include the design of a public competition based on the developed benchmarks, to be hosted at the Bayesian Deep Learning workshop at the NIPS (Neural Information Processing Systems) Conference.
The benchmarks will make testing new inference techniques radically easier, leading to rapid development of new tools. With the community competing on the benchmarks, this should lead to significant advancements in the reliability of existing and new deep probabilistic AI tools in real-world applications.