
AI Safety and Theoretical Computer Science

Scott Aaronson (University of Texas at Austin & OpenAI)

Title: AI Safety and Theoretical Computer Science

Abstract: Progress on AI safety and alignment, like the current AI revolution more generally, has been almost entirely empirical.  In this talk, however, I'll survey a few areas where I think theoretical computer science can contribute to AI safety, including:

- How can we robustly watermark the outputs of Large Language Models and other generative AI systems, to help identify academic cheating, deepfakes, and AI-enabled fraud?  I'll explain my proposal and its basic mathematical properties, as well as what remains to be done (an illustrative sketch follows this list).

- Can one insert undetectable cryptographic backdoors into neural nets, for good or ill?  In what senses can those backdoors also be unremovable?  How robust are they against fine-tuning?

- Should we expect neural nets to be "generically" interpretable?  I'll discuss a beautiful formalization of that question due to Paul Christiano, along with some initial progress on it, and an unexpected connection to quantum computing.
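To make the watermarking bullet concrete, here is a minimal, hypothetical sketch of a pseudorandom watermarking scheme of the kind the talk discusses: a secret-keyed pseudorandom function biases which token the model emits in a way that, in expectation over the key, leaves the output distribution unchanged, while a detector who knows the key can later score a text for that bias.  The names (prf, watermarked_sample, detection_score), the hash-based PRF, and the context-window parameter are illustrative assumptions, not details of the actual proposal.

import hashlib
import math
from typing import Sequence

def prf(key: bytes, context: Sequence[int], token_id: int) -> float:
    # Hypothetical PRF: map (secret key, recent context, candidate token)
    # to a deterministic pseudorandom value strictly inside (0, 1).
    data = key + repr((tuple(context), token_id)).encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def watermarked_sample(probs: dict, key: bytes, context: Sequence[int]) -> int:
    # Pick the token t maximizing r_t ** (1 / p_t), where r_t = prf(...) and
    # p_t is the model's probability for t.  Over a random key, each token is
    # still chosen with probability p_t, so the output distribution is
    # unchanged in expectation, yet the choice is strongly correlated with r_t.
    return max(probs, key=lambda t: prf(key, context, t) ** (1.0 / probs[t]))

def detection_score(tokens: Sequence[int], key: bytes, window: int = 4) -> float:
    # Detector's statistic: sum of -ln(1 - r_t) over the observed tokens.
    # For text generated without the key this behaves like a sum of Exp(1)
    # variables (mean about 1 per token); watermarked text scores higher.
    score = 0.0
    for i, t in enumerate(tokens):
        r = prf(key, tokens[max(0, i - window):i], t)
        score += -math.log(1.0 - r)
    return score

# Toy usage with a made-up 3-token vocabulary and fixed next-token probabilities.
if __name__ == "__main__":
    key = b"secret-detector-key"
    probs = {0: 0.5, 1: 0.3, 2: 0.2}
    generated = []
    for _ in range(50):
        generated.append(watermarked_sample(probs, key, generated[-4:]))
    print("detection score:", detection_score(generated, key))

In this toy run, the detector's score on the watermarked token sequence should sit well above the roughly 50 expected for unwatermarked text of the same length; a real deployment would compare the score against a calibrated threshold to control false positives.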

