Multimodal Pretrained Models for Sequential Decision-Making: Synthesis, Verification, Grounding, and Perception
- 14:00, 16th January 2024 (Hilary Term 2024), Strachey Seminar Room (003), Robert Hooke Building, Department of Computer Science, OX1 3QD
The rapidly increasing capabilities of multimodal pretrained generative models in text generation, question answering, and image annotation offer new opportunities in the design and operation of autonomous systems. On the one hand, generative language models can help remove a series of restrictive assumptions in the existing design flow, e.g., the availability of complete knowledge of the features, actions, and observations necessary for delivering a task. Concurrently, multimodal language and vision models provide powerful perceptual capabilities by efficiently segmenting and textually labeling given images. On the other hand, these models have their own limitations. They act as black boxes whose input-output transformations are, at best, not well understood. Their outputs are not amenable to direct integration into the design flow of autonomous systems and, at times, may be unintuitive or even conflict with common sense. Their accuracy is questionable, and the uncertainty in their outputs does not currently admit quantifiable, interpretable, and actionable characterizations. The main hypothesis of this presentation is that multimodal pretrained generative models, when (and only when) equipped with algorithms (and supporting theory) that account for the characteristics of autonomous systems, will facilitate the design flow for increasingly autonomous systems operating in dynamic, uncertain, and possibly adversarial environments. I will support this hypothesis through a discussion of a series of algorithms to (1) integrate multimodal pretrained generative models into the synthesis and execution of sequential (i.e., long-horizon) decision-making strategies for autonomous systems, (2) characterize and account for the uncertainties in the outputs of these models that are relevant to the task to be delivered, and (3) proactively reduce these uncertainties to improve task performance and reduce safety risks.