
Developing a Novel Pipeline for Pose-Conditioned Image/Video Generation with Diffusion Models

Supervisors

Suitable for

MSc in Advanced Computer Science
Mathematics and Computer Science, Part C
Computer Science and Philosophy, Part C
Computer Science, Part B
Computer Science, Part C

Abstract

Supervisors: Oishi Deb; Philip Torr

This project focuses on developing a novel pipeline for pose-conditioned image generation using diffusion models. The approach is designed to preserve identity and viewpoint across azimuthal angles, and it has significant applications in downstream tasks such as 3D object articulation and 3D mesh deformation. The generated images will serve as pseudo-2D targets for 3D mesh deformation, effectively eliminating the need for 3D targets. These images will retain the identity of the reference frames while adopting the target pose specified through textual prompts, which requires a degree of disentanglement between identity and pose information.

The project will leverage state-of-the-art diffusion models, such as Stable Diffusion 3, which offer high fidelity to user-supplied text prompts. The framework can also be extended to animation by using Stable Video Diffusion models for target generation, paving the way for applications in animated 3D content creation.

This project is a collaboration between VGG and TVG. It is well scoped and is expected to lead to a publication soon; a codebase has already been developed, so it is well suited to students who would like to get published.
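
To illustrate how 2D targets can stand in for 3D supervision, the following is a minimal sketch of a 2D reprojection loss. It is not the project's actual method: the orthographic camera, the keypoint-based loss, and the function names `project_orthographic` and `reprojection_loss` are all assumptions made for illustration. The idea is that mesh keypoints are rotated to a given azimuth, projected to the image plane, and compared against 2D keypoints taken from a generated image, so no 3D target is required.

```python
import math


def project_orthographic(points_3d, azimuth_deg):
    """Rotate points about the vertical (y) axis by the azimuth, then drop depth.

    Assumption: an orthographic camera looking along the z axis.
    """
    a = math.radians(azimuth_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    projected = []
    for x, y, z in points_3d:
        x_cam = cos_a * x + sin_a * z  # horizontal image coordinate after rotation
        projected.append((x_cam, y))   # depth is discarded by the projection
    return projected


def reprojection_loss(points_3d, targets_2d, azimuth_deg):
    """Mean squared 2D distance between projected keypoints and pseudo-2D targets."""
    projected = project_orthographic(points_3d, azimuth_deg)
    if len(projected) != len(targets_2d):
        raise ValueError("points_3d and targets_2d must have the same length")
    total = sum(
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(projected, targets_2d)
    )
    return total / len(projected)


# Hypothetical data: three mesh keypoints and 2D targets for the frontal view.
keypoints = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
frontal_targets = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
loss_at_front = reprojection_loss(keypoints, frontal_targets, azimuth_deg=0.0)
```

In a real deformation loop the mesh vertices would be optimised to reduce this loss across several azimuths at once, most likely with a differentiable renderer and an image-space loss rather than hand-picked keypoints.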

References:

1. Stable Diffusion 3 - paper
2. Stable Video Diffusion - paper
3. MVDream - paper
4. Qianyi Deng*, Oishi Deb*, Amir Patel, Christian Rupprecht, Philip Torr, Niki Trigoni, and Andrew Markham. “Towards multi-modal animal pose estimation: An in-depth analysis”, 2024 - https://arxiv.org/pdf/2410.09312
5. Oishi Deb, Diffusion Models for 3D Reconstruction and Articulation of Deformable Objects, Oxford Computer Science Conference, 2024, Link.
6. Personalized Image Animator (PIA) - https://github.com/open-mmlab/PIA?tab=readme-ov-file