
Exploring Uncertainty in Focus Instruction Tuning: The Role of Feature Specification in Model Confidence and Uncertainty

Supervisor

Suitable for

MSc in Advanced Computer Science
Computer Science, Part C

Abstract

Existing uncertainty quantification (UQ) methods, such as MC Dropout [1], Deep Ensembles [2], predictive entropy, and semantic entropy [3], estimate model confidence and uncertainty but do not explicitly investigate how focusing on or ignoring specific features influences that uncertainty. In many cases, models express high certainty when relying on spurious correlations, yet their uncertainty may increase significantly when they are instructed to ignore those features. Understanding how uncertainty shifts when the model is forced to focus on different aspects of the input is crucial for evaluating robustness and generalization under distribution shifts. Focus Instruction Tuning (FIT) [4] provides a natural framework for this investigation: by explicitly controlling which features the model attends to, it makes it possible to analyze how uncertainty behaves under different feature specifications.
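As a concrete illustration of the kind of probe FIT enables, the following is a minimal sketch (our own, not code from [4]) that compares the predictive entropy of an instruction-tuned language model under a prompt that focuses on a causal feature versus one that focuses on a spurious feature. The model name, prompt templates, label set, and single-token label readout are all illustrative assumptions.

```python
# Minimal sketch: predictive entropy under different feature-specification prompts.
# Model name, prompts, and labels are placeholders, not the FIT setup from [4].
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder: any instruction-tuned causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LABELS = ["positive", "negative"]  # assumed binary sentiment task

def label_entropy(prompt: str) -> float:
    """Predictive entropy over the label set, read off the next-token distribution."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    # Approximate each label by the first token of " <label>".
    label_ids = [tokenizer.encode(" " + l, add_special_tokens=False)[0] for l in LABELS]
    probs = torch.softmax(logits[label_ids], dim=-1)  # renormalise over label tokens
    return -(probs * probs.log()).sum().item()

review = "The plot was thin, but the soundtrack was breathtaking."
focus_causal = (
    "Focus only on the sentiment expressed about the film, ignoring the music.\n"
    f"{review}\nSentiment:"
)
focus_spurious = (
    "Focus only on whether music is mentioned in the text.\n"
    f"{review}\nSentiment:"
)

print("entropy (causal feature):  ", label_entropy(focus_causal))
print("entropy (spurious feature):", label_entropy(focus_spurious))
```

Comparing the two entropies across a dataset with a known spurious correlation is one simple way to benchmark how elicited uncertainty responds to feature specification.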

This project has two key aims. First, we seek to benchmark standard UQ measures within the FIT framework, examining how a model's elicited uncertainty changes when it is instructed to focus on or disregard specific features. This is particularly relevant in cases where models learn spurious correlations, as they may appear confident when attending to irrelevant features but exhibit increased uncertainty when forced to rely on causal signals. The second aim is to develop a Bayesian-style UQ methodology within FIT, incorporating feature priors and posterior inference to quantify uncertainty over specific feature attributions. This approach would allow for a more interpretable and structured form of UQ, in which uncertainty is explicitly linked to individual features, providing deeper insight into model confidence and failure modes. By disentangling uncertainty across features, this method could offer a more robust alternative for evaluating and improving uncertainty quantification under distribution shifts and biased correlations.
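One way the second aim could be made concrete (a sketch under our own assumptions; the latent feature variable f, the prior p(f), and the decomposition below are illustrative, not a formulation taken from the project description or from [4]) is to treat the feature the model is instructed to rely on as a latent variable with prior $p(f)$, where $p(y \mid x, f)$ denotes the model's predictive distribution under the FIT prompt "focus on f". Assuming a feature prior that does not depend on the input, the predictive distribution, the feature posterior, and an entropy decomposition take the form

$$p(y \mid x) = \sum_{f} p(y \mid x, f)\, p(f), \qquad p(f \mid x, y) \propto p(y \mid x, f)\, p(f),$$

$$\mathcal{H}\!\left[p(y \mid x)\right] \;=\; \underbrace{\mathrm{I}(y; f \mid x)}_{\text{uncertainty over which feature to use}} \;+\; \underbrace{\mathbb{E}_{p(f)}\,\mathcal{H}\!\left[p(y \mid x, f)\right]}_{\text{uncertainty given a fixed feature}}.$$

Under this sketch, the posterior $p(f \mid x, y)$ links uncertainty explicitly to individual features, which is the structured, interpretable form of UQ the second aim targets.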

 

[1] Gal, Yarin, and Zoubin Ghahramani. "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks." Advances in Neural Information Processing Systems 29 (2016).

[2] Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles." Advances in Neural Information Processing Systems 30 (2017).

[3] Kuhn, Lorenz, Yarin Gal, and Sebastian Farquhar. "Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation." The Eleventh International Conference on Learning Representations (2023).

[4] Lamb, Tom A., et al. "Focus On This, Not That! Steering LLMs With Adaptive Feature Specification." arXiv preprint arXiv:2410.22944 (2024).