Generating a high-fidelity 3D facial texture from a single in-the-wild image is a challenging task. While existing methods have made significant progress in recovering color and illumination, they still struggle to reconstruct mid-to-high-frequency texture details. The main cause is the scarcity of real facial UV-texture datasets: most models are trained on synthetic UV texture maps, which, lacking ground-truth supervision, inevitably deviate from real UV textures and lead the model to learn an incorrect texture distribution. Motivated by this observation, we use the detailed textures in the original image space to guide the generation of UV texture maps in UV space, and propose a two-stage training scheme to alleviate the loss of personalized details caused by training solely on synthetic UV texture maps. In addition, leveraging the strong performance of diffusion models on image-generation tasks, we design a cross-domain guided diffusion model that encodes detail information from both the spatial and frequency domains into high-level semantic conditions to guide the diffusion process, enabling near-exact reconstruction. Finally, we embed the cross-domain guided diffusion model into a 3D reconstruction framework as the UV-texture generator to reconstruct high-fidelity 3D facial textures. Experimental results show that the proposed cross-domain guided diffusion model generates mid-to-high-frequency texture details well and clearly outperforms other 3D facial-texture generation methods in both quantitative and qualitative analyses.
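The abstract does not specify the architecture of the cross-domain condition encoder; the following is a minimal PyTorch sketch of the core idea only, in which spatial-domain features from the face image and frequency-domain features from its log-amplitude FFT spectrum are fused into a single condition embedding. The backbone, layer widths, and embedding size here are all illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class CrossDomainConditionEncoder(nn.Module):
    """Illustrative sketch: fuse spatial- and frequency-domain detail
    cues from the input face image into one condition embedding.
    All architecture choices below are assumptions for illustration."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # Spatial branch: a small CNN over the RGB face crop.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Frequency branch: same shape of CNN over the log-amplitude spectrum.
        self.frequency = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Project the concatenated features to the condition embedding.
        self.proj = nn.Linear(64 + 64, embed_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) face crop in image (not UV) space.
        spec = torch.fft.fft2(image)        # per-channel 2-D FFT
        amp = torch.log1p(torch.abs(spec))  # log-amplitude spectrum
        feats = torch.cat([self.spatial(image), self.frequency(amp)], dim=1)
        return self.proj(feats)             # (B, embed_dim) condition
```

The frequency branch operates on the log-amplitude spectrum because mid-to-high-frequency texture detail is exposed directly there, which is exactly the signal the abstract says existing methods fail to recover.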
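To make the conditioning concrete, below is a hedged sketch of a standard DDPM noise-prediction loss driven by such an embedding. Here `eps_model` is a hypothetical conditional denoiser taking the noisy UV map, the timestep, and the condition; the paper's actual network, noise schedule, and two-stage training details are not given in the abstract.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(eps_model, uv_gt, cond, T: int = 1000):
    """Standard DDPM noise-prediction objective, conditioned on the
    cross-domain embedding `cond`. `eps_model(x_t, t, cond)` is an
    assumed interface that predicts the noise added at step t."""
    b = uv_gt.shape[0]
    t = torch.randint(0, T, (b,), device=uv_gt.device)
    # Linear beta schedule -> cumulative alpha-bar at the sampled steps.
    betas = torch.linspace(1e-4, 0.02, T, device=uv_gt.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t].view(b, 1, 1, 1)
    noise = torch.randn_like(uv_gt)
    # Forward-diffuse the ground-truth UV map to step t.
    x_t = alpha_bar.sqrt() * uv_gt + (1.0 - alpha_bar).sqrt() * noise
    # Train the denoiser to recover the injected noise given the condition.
    return F.mse_loss(eps_model(x_t, t, cond), noise)
```

In a two-stage schedule like the one the abstract describes, this same objective could first be optimized on synthetic UV maps to learn the coarse texture prior, then continued with conditions drawn from real in-the-wild images to restore personalized detail; the exact split is an assumption here.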