Recent advances in computer vision have predominantly relied on data-driven approaches that leverage deep learning and large-scale datasets. Deep neural networks have achieved remarkable success in tasks such as stereo matching and monocular depth reconstruction. However, these methods lack explicit models of 3D geometry that can be directly analyzed, transferred across modalities, or systematically modified for controlled experimentation. We investigate the role of Gaussian curvature in 3D surface modeling. Gaussian curvature is invariant under changes of observer or coordinate system, and we demonstrate on the Middlebury stereo dataset that it also offers a sparse and compact description of 3D surfaces. Furthermore, we show a strong correlation between the performance rank of top state-of-the-art stereo and monocular methods and the low total absolute Gaussian curvature of their reconstructions. We propose that this property can serve as a geometric prior to improve future 3D reconstruction algorithms.
In this paper we study the importance of Gaussian curvature in 3D vision by analyzing the 3D reconstructions produced by stereo algorithms on the Middlebury benchmark.
We observe that the best-performing techniques tend to reconstruct surfaces with lower Gaussian curvature and more consistent normals. Selected experiments are shown below:
This table presents the Middlebury benchmark ranking for the 15 training images, with techniques listed in descending LGC order. Superscripts indicate each method’s rank among the compared techniques for the metrics AvgError, RMS, Bad 2.0, and Bad 4.0. Darker cell shading marks better performance: the technique ranks in the Top 1, Top 3, or Top 5. Notably, the top-performing methods (Group A) generally exhibit higher LGC, i.e., lower Gaussian curvature (GC).
Ground truth (GT) vs. SOTA approaches: a point-wise curvature analysis for the "Piano" image. Black points mark values of $|K| > 1{,}000m^{-2}$. For the GT, black points also mark NaN values, which arise from measurement inconsistencies in the Middlebury disparity estimation. In the second row, we smoothed the 3D point cloud with $\sigma = 2\, m$ before computing the GC.
Interpretation: Foundation-Stereo estimates lower Gaussian curvature than Selective-IGEV.
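As a rough illustration of the point-wise curvature maps described above, the following Python sketch computes Gaussian curvature for a surface given as a height field $z(x, y)$ on a regular grid, using the standard Monge-patch formula $K = (z_{xx} z_{yy} - z_{xy}^2) / (1 + z_x^2 + z_y^2)^2$. It is not the authors' pipeline, which operates on the 3D point cloud: the function names, the uniform `spacing`, and the orthographic sampling of the surface are assumptions made for illustration. The default threshold simply mirrors the $|K| > 1{,}000m^{-2}$ cutoff from the caption.

```python
import numpy as np

def gaussian_curvature(z, spacing=1.0):
    """Gaussian curvature K of a height field z(x, y) on a regular grid.

    Assumption: rows index y, columns index x, and samples are `spacing`
    apart in both directions (orthographic sampling, in metres).
    """
    zy, zx = np.gradient(z, spacing)      # first derivatives along y and x
    zxy, zxx = np.gradient(zx, spacing)   # derivatives of z_x along y and x
    zyy, _ = np.gradient(zy, spacing)     # derivative of z_y along y
    return (zxx * zyy - zxy**2) / (1.0 + zx**2 + zy**2) ** 2

def high_curvature_mask(K, thresh=1000.0):
    """True where |K| exceeds `thresh` or K is NaN (invalid measurement)."""
    return np.isnan(K) | (np.abs(K) > thresh)

def total_absolute_curvature(z, spacing=1.0):
    """Discrete approximation of the integral of |K| over the surface."""
    K = gaussian_curvature(z, spacing)
    zy, zx = np.gradient(z, spacing)
    # Surface area element of the graph z(x, y) for each grid cell.
    dA = np.sqrt(1.0 + zx**2 + zy**2) * spacing**2
    return np.nansum(np.abs(K) * dA)      # NaN samples are ignored
```

On a synthetic sphere cap of radius $R$, `gaussian_curvature` should return values close to $1/R^2$ away from the grid border, which is a convenient sanity check before applying it to real depth data.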
This table presents techniques listed in ascending Normal Average Error (NormAvg) order for the 15 Middlebury training images. For each of the 15 images, superscripts mark the techniques with the five lowest Normal Error (NormalsErr) values, ranked 1 to 5. Darker cell shading marks lower NormalsErr: the technique is among the Top 1, Top 3, or Top 5 approaches with the lowest NormalsErr. Notably, the top-performing methods are from Group A and predict surface normals that are closely aligned with the ground truth.
Normals analysis: qualitative comparison of the reconstructed normals for the "Piano" image.
Interpretation: Selective-IGEV reconstructs noisier surfaces, while Foundation-Stereo and BLMT-Stereo produce consistent normals. BLMT-Stereo also yields sharper results near edges (depth discontinuities).
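To make the normals comparison concrete, the sketch below estimates unit normals from a height field and measures the mean angular error against a reference. The exact definition of NormalsErr and NormAvg is not given here, so treating the error as a mean angle in degrees is an assumption, as are the function names and the regular-grid (orthographic) sampling.

```python
import numpy as np

def normals_from_height(z, spacing=1.0):
    """Unit surface normals of a height field z(x, y) on a regular grid.

    Assumption: rows index y, columns index x, uniform `spacing`.
    """
    zy, zx = np.gradient(z, spacing)
    n = np.stack([-zx, -zy, np.ones_like(z)], axis=-1)   # (-z_x, -z_y, 1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def mean_angular_error_deg(n_pred, n_gt):
    """Mean angle (degrees) between predicted and reference unit normals.

    Pixels where either normal is NaN are ignored.
    """
    cos = np.clip(np.sum(n_pred * n_gt, axis=-1), -1.0, 1.0)
    return np.nanmean(np.degrees(np.arccos(cos)))
```

For example, a plane tilted by 45 degrees compared against a fronto-parallel plane gives an error of about 45 degrees, and comparing a surface with itself gives an error near zero.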
In summary, our work enhances the interpretability and understanding of 3D vision by highlighting Gaussian curvature as an intrinsic geometric prior for indoor 3D surfaces. Grounded in data from modern deep learning methods, our approach underscores the importance of 3D geometric modeling in capturing critical visual information and can guide the development of next-generation vision systems.
Towards Understanding 3D Vision: the Role of Gaussian Curvature
Sherlon Almeida da Silva, Davi Geiger, Luiz Velho, and Moacir Antonelli Ponti
If you have any questions, please reach out to Sherlon Almeida: sherlon@usp.br.
@inproceedings{dasilva2026gausscurv,
title={Towards Understanding 3D Vision: the Role of Gaussian Curvature},
author={da Silva, Sherlon Almeida and Geiger, Davi and Velho, Luiz and Ponti, Moacir Antonelli},
booktitle={21st International Conference on Computer Vision Theory and Applications},
year={2026},
organization={SCITEPRESS}
}