
Toward 3D scene reconstruction from locally scale-aligned monocular video depth

Cited by: 1
Abstract: Monocular depth estimation methods have achieved excellent robustness on diverse scenes, usually by predicting affine-invariant depth, up to an unknown scale and shift, rather than metric depth, because it is much easier to collect large-scale affine-invariant depth training data. However, in some video-based scenarios such as video depth estimation and 3D scene reconstruction, the unknown scale and shift residing in each per-frame prediction may cause the predicted depth to be inconsistent across frames. To tackle this problem, we propose a locally weighted linear regression method that recovers the scale and shift maps from very sparse anchor points, which ensures consistency along consecutive frames. Extensive experiments show that our method can significantly reduce the Rel error (relative error) of existing state-of-the-art approaches on several zero-shot benchmarks. Besides, we merge 6.3 million RGBD images to train robust depth models. By locally recovering scale and shift, our ResNet50-backbone model even outperforms the state-of-the-art DPT ViT-Large model. Combined with geometry-based reconstruction methods, we formulate a new dense 3D scene reconstruction pipeline, which benefits from both the scale consistency of sparse points and the robustness of monocular methods. By performing simple per-frame prediction over a video, accurate 3D scene geometry can be recovered.
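To illustrate the alignment step the abstract describes, below is a minimal Python sketch (not the authors' released code) of locally weighted linear regression for recovering per-pixel scale and shift maps from sparse metric anchors. The Gaussian weighting, the bandwidth parameter sigma, and the function name align_depth are illustrative assumptions, not taken from the paper.

    # Minimal sketch of locally weighted linear regression for depth alignment.
    # Assumption: anchors are (row, col, metric_depth) triples, e.g. from sparse
    # SfM points; sigma controls how local each per-pixel fit is.
    import numpy as np

    def align_depth(pred, anchors, sigma=32.0, eps=1e-6):
        """Align an affine-invariant depth map to sparse metric anchors.

        pred    : (H, W) affine-invariant depth prediction for one frame
        anchors : (N, 3) array of (row, col, metric_depth)
        Returns scale_map * pred + shift_map, a metric-aligned depth map.
        """
        H, W = pred.shape
        rows = anchors[:, 0].astype(int)
        cols = anchors[:, 1].astype(int)
        z = anchors[:, 2]                      # metric depth at anchors
        d = pred[rows, cols]                   # predicted depth at anchors

        scale = np.empty((H, W))
        shift = np.empty((H, W))
        for i in range(H):                     # per-pixel local fit (slow but clear)
            for j in range(W):
                # Gaussian spatial weights: nearby anchors dominate the fit
                w = np.exp(-((rows - i) ** 2 + (cols - j) ** 2) / (2.0 * sigma ** 2))
                # Closed-form weighted least squares for z ~ a * d + b
                sw, swd, swz = w.sum(), (w * d).sum(), (w * z).sum()
                swdd, swdz = (w * d * d).sum(), (w * d * z).sum()
                det = sw * swdd - swd ** 2
                a = (sw * swdz - swd * swz) / (det + eps)
                b = (swz - a * swd) / (sw + eps)
                scale[i, j], shift[i, j] = a, b
        return scale * pred + shift

In the full pipeline the abstract outlines, such anchors would come from the sparse points of a geometry-based reconstruction method, and the per-frame aligned depths would then be fused into a consistent 3D scene.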
Authors: Guangkai Xu; Feng Zhao (National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China)
Source: JUSTC (《中国科学技术大学学报》, Journal of University of Science and Technology of China), 2024, Issue 4, pp. 13-22, 12, 66 (12 pages). Indexed by: CAS, CSCD, PKU Core.
Funding: Supported by the Anhui Provincial Natural Science Foundation (2108085UD12).
Keywords: 3D scene reconstruction; monocular depth estimation; locally weighted linear regression