Refinement for feed-forward SfM models

Supervisor

Suitable for

MSc in Advanced Computer Science
Mathematics and Computer Science, Part C
Computer Science and Philosophy, Part C
Computer Science, Part B
Computer Science, Part C

Abstract

Advances in machine learning have had a major impact on 3D computer vision and have changed the relationship between visual geometry and deep neural networks. For a long time, the prevailing belief was that accurate 3D reconstruction should be derived from visual geometry principles, by solving systems of equations or by optimizing energy functions, as in bundle adjustment (BA). In this view, machine learning was relegated to the role of a pre-processor, addressing tasks such as feature matching and tasks that geometry alone cannot handle, such as monocular depth prediction.

Later, as machine learning methods matured, they became more deeply integrated into visual geometry pipelines, culminating in methods like VGGSfM, which uses differentiable BA to achieve state-of-the-art results in Structure from Motion (SfM). Even so, visual geometry still plays a major role in these pipelines, which increases their complexity and computational cost.

As increasingly powerful foundation models emerge in computer vision, 3D tasks can now be solved directly by a neural predictor, eschewing visual geometry almost entirely. Recent contributions like DUSt3R and its successor MASt3R have shown promising results in this direction.

In this project, we will explore geometric post-optimization of the predictions of feed-forward 3D models and analyse how much these neural predictions can be improved with classical multi-view geometry approaches.
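
For orientation, the classical objective that such post-optimization typically builds on is the reprojection error minimized in bundle adjustment. The formulation below is a sketch in standard textbook notation; the symbols are assumptions introduced here and do not come from the project description: R_i, t_i and K_i are the rotation, translation and intrinsics of camera i, X_j is the j-th 3D point, x_{ij} is its observed 2D location in image i, V is the set of visible (camera, point) pairs, and \pi is the perspective division.

```latex
\min_{\{R_i,\, t_i\},\ \{X_j\}} \;
\sum_{(i,j) \in \mathcal{V}}
\left\| \, \pi\!\left( K_i \left( R_i X_j + t_i \right) \right) - x_{ij} \, \right\|^2,
\qquad
\pi\!\left( [a,\, b,\, c]^\top \right) = \left[ a / c,\ b / c \right]^\top
```

In a refinement setting, the feed-forward predictions would serve as the initial values of the cameras and points rather than as the final answer.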

Goals:

  • Set up an evaluation framework for both deep and traditional 3D reconstruction methods to measure progress.
  • Use gradient descent optimization to update the predictions from feed-forward 3D models, making them more geometry-aligned.
  • Test several hypotheses for bias in multiple different models.
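
As an illustration of the second goal, the following is a minimal sketch of gradient-descent refinement, not the project's prescribed method. It assumes known camera poses, noise-free 2D observations in normalized image coordinates, and a synthetic "prediction" obtained by perturbing ground-truth points; all function names (`rot_y`, `project`, `loss_and_grad`, `refine`) and parameter values are hypothetical and chosen only for this example. In practice the initial points would come from a feed-forward model, and the observations from a feature matcher.

```python
import numpy as np

def rot_y(theta):
    """Rotation matrix about the y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(R, t, X):
    """Project world points X (N, 3) to normalized image coordinates (N, 2)."""
    Xc = X @ R.T + t                      # world -> camera frame
    return Xc[:, :2] / Xc[:, 2:3], Xc

def loss_and_grad(cams, X, obs):
    """Squared reprojection error (summed over points, averaged over cameras)
    and its analytic gradient with respect to the 3D points X."""
    loss, grad = 0.0, np.zeros_like(X)
    for (R, t), uv_obs in zip(cams, obs):
        uv, Xc = project(R, t, X)
        r = uv - uv_obs                   # residuals, shape (N, 2)
        loss += np.sum(r ** 2)
        z = Xc[:, 2]
        # Chain rule through the perspective division u = x / z, v = y / z.
        g_cam = np.stack([
            2.0 * r[:, 0] / z,
            2.0 * r[:, 1] / z,
            -2.0 * (r[:, 0] * uv[:, 0] + r[:, 1] * uv[:, 1]) / z,
        ], axis=1)
        grad += g_cam @ R                 # rotate gradient back to the world frame
    return loss / len(cams), grad / len(cams)

def refine(cams, X_init, obs, lr=2.0, steps=300):
    """Plain gradient descent on the 3D points; cameras are kept fixed."""
    X = X_init.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(cams, X, obs)
        X -= lr * grad
    return X

# Synthetic scene (assumption): 50 points in a cube, 4 cameras orbiting at distance 5.
rng = np.random.default_rng(0)
X_true = rng.uniform(-1.0, 1.0, size=(50, 3))
angles = np.linspace(0.0, 2.0 * np.pi, 4, endpoint=False)
cams = [(rot_y(a), np.array([0.0, 0.0, 5.0])) for a in angles]
obs = [project(R, t, X_true)[0] for R, t in cams]

# Stand-in for a feed-forward prediction: ground truth plus Gaussian noise.
X_pred = X_true + 0.1 * rng.normal(size=X_true.shape)
X_ref = refine(cams, X_pred, obs)

loss_before = loss_and_grad(cams, X_pred, obs)[0]
loss_after = loss_and_grad(cams, X_ref, obs)[0]
err_before = np.linalg.norm(X_pred - X_true, axis=1).mean()
err_after = np.linalg.norm(X_ref - X_true, axis=1).mean()
print(f"reprojection loss: {loss_before:.3e} -> {loss_after:.3e}")
print(f"mean 3D error:     {err_before:.3e} -> {err_after:.3e}")
```

The sketch optimizes only the points so that the gradient can be written by hand. A fuller version would also update the camera poses and intrinsics, would likely rely on automatic differentiation, and would need to handle noisy or outlier correspondences, for example with a robust loss.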

Stretch Goals:

  • Define your own objective function that improves the results even further.
  • Analyse the capabilities and limitations of feed-forward 3D models.

References:

Leroy, Vincent, Yohann Cabon, and Jérôme Revaud. "Grounding Image Matching in 3D with MASt3R." European Conference on Computer Vision. Springer, Cham, 2025.

Wang, Shuzhe, et al. "DUSt3R: Geometric 3D Vision Made Easy." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

Wang, Jianyuan, et al. "VGGSfM: Visual Geometry Grounded Deep Structure From Motion." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

Hartley, Richard, and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.

 

Pre-requisites: Machine Learning