Robust Semantic Uncertainty Estimation in Open-Ended Text Generation
Supervisor
Suitable for
Abstract
Semantic Entropy (SE) [1] provides a principled approach to quantifying uncertainty in open-ended language generation: rather than relying solely on token-level confidence scores, it measures the variability of generated responses at the semantic level. By clustering model outputs according to semantic equivalence, SE captures uncertainty in a way that reflects meaningful differences between responses. Despite its promise, however, SE rests on ad hoc methodological choices, particularly in how it defines semantic similarity and clusters responses. A major limitation is its use of Natural Language Inference (NLI) models to judge semantic equivalence, which can fail on long-form responses, nuanced context dependencies, or domain-specific knowledge. These failure modes introduce noise into SE-based uncertainty estimates, potentially leading to unreliable or inconsistent confidence assessments.
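To ground the discussion, the following minimal Python sketch (our illustration, not the authors' reference implementation) clusters sampled answers by mutual entailment and computes the entropy of the resulting cluster distribution. The entails predicate is a hypothetical placeholder for an NLI model call; Kuhn et al. additionally weight clusters by sequence likelihoods, whereas this sketch uses empirical frequencies.

    import math

    def entails(a: str, b: str) -> bool:
        # Placeholder for an NLI entailment check (e.g., a DeBERTa-MNLI
        # classifier); here approximated by normalized string equality.
        return a.strip().lower() == b.strip().lower()

    def semantic_clusters(answers):
        # Two answers share a cluster iff they entail each other in both
        # directions (the bidirectional-entailment criterion of SE).
        clusters = []
        for ans in answers:
            for cluster in clusters:
                rep = cluster[0]
                if entails(ans, rep) and entails(rep, ans):
                    cluster.append(ans)
                    break
            else:
                clusters.append([ans])
        return clusters

    def semantic_entropy(answers):
        # Entropy of the empirical distribution over semantic clusters.
        n = len(answers)
        probs = [len(c) / n for c in semantic_clusters(answers)]
        return -sum(p * math.log(p) for p in probs)

    samples = ["Paris", "paris", "The capital is Lyon", "Paris"]
    print(round(semantic_entropy(samples), 3))  # 0.562: mostly consistent

Note that every failure of the entailment check (e.g., on long or domain-specific answers) directly changes the cluster structure, and hence the entropy, which is precisely the fragility this project targets.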
A follow-up work, Kernel Language Entropy (KLE) [2], addresses some of these issues by replacing hard clustering with a kernel-based similarity function and deriving more fine-grained uncertainty estimates via the von Neumann entropy. While KLE improves on SE by accounting for pairwise semantic dependencies, it inherits several methodological limitations, including sensitivity to the choice of kernel and reliance on similarity metrics that may not generalize across diverse natural language generation tasks.
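The von Neumann entropy at the heart of KLE can likewise be sketched in a few lines. In the NumPy snippet below, the RBF kernel over toy embedding vectors is our own stand-in for a semantic kernel (KLE itself builds kernels from NLI-derived semantic graphs); the essential step is normalizing the positive semi-definite kernel matrix to unit trace so that its eigenvalues form a probability distribution.

    import numpy as np

    def von_neumann_entropy(K: np.ndarray) -> float:
        # Normalize the PSD kernel to unit trace, then compute
        # -sum(lam * log(lam)) over its eigenvalues.
        rho = K / np.trace(K)
        lam = np.linalg.eigvalsh(rho)   # real eigenvalues, ascending
        lam = lam[lam > 1e-12]          # discard numerical zeros
        return float(-(lam * np.log(lam)).sum())

    # Toy stand-in: RBF similarities among 4 response embeddings.
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]])
    sq = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / 0.5)                  # PSD by construction

    print(von_neumann_entropy(K))          # low: responses largely agree
    print(von_neumann_entropy(np.eye(4)))  # log(4): all responses distinct

A nearly rank-one kernel (all responses semantically alike) yields entropy close to zero, while the identity kernel attains the maximum log(n), mirroring SE's behavior in the hard-clustering limit; everything in between depends on the kernel, which is exactly the sensitivity noted above.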
This project aims to further refine the methodology behind Semantic Entropy and its successors by addressing these open challenges. Specifically, we seek to improve how semantic similarity is measured, reduce reliance on entailment models designed for short texts, and develop uncertainty quantification techniques that generalize across a wider range of tasks, including long-context and domain-specific text generation. By investigating more robust and scalable approaches, this work will contribute to more reliable semantic uncertainty estimation in language models, ultimately improving their trustworthiness in safety-critical applications.
[1] Kuhn, Lorenz, Yarin Gal, and Sebastian Farquhar. "Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation." The Eleventh International Conference on Learning Representations (ICLR), 2023.
[2] Nikitin, Alexander, et al. "Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities." arXiv preprint arXiv:2405.20003, 2024.