Image aesthetic quality evaluation method based on self-supervised vision Transformer
Abstract: Existing image aesthetic quality evaluation methods generally use Convolutional Neural Networks (CNNs) to extract image features. Because of the local receptive field mechanism, CNNs have difficulty extracting global features from an image, so aesthetic attributes such as global composition relations and global color matching are lost. To solve this problem, an image aesthetic quality evaluation method based on the Self-Supervised Vision Transformer (SSViT) model is proposed. The self-attention mechanism is used to establish long-distance dependencies among local patches of the image and to adaptively learn the correlations between them, extracting global features that characterize the aesthetic attributes of the image. In addition, three aesthetic quality perception tasks, namely image degradation classification, image aesthetic quality ranking, and image semantic reconstruction, are designed to pre-train the Vision Transformer (ViT) in a self-supervised manner on unlabeled image data, which strengthens the representation ability of the global features. Experimental results on the AVA (Aesthetic Visual Assessment) dataset show that the SSViT model reaches 83.28%, 0.7634, and 0.7462 on aesthetic quality classification accuracy, Pearson Linear Correlation Coefficient (PLCC), and Spearman Rank-order Correlation Coefficient (SRCC), respectively. These results indicate that the SSViT model evaluates image aesthetic quality with relatively high accuracy.
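The three metrics reported in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy scores, and the score threshold of 5 used to split images into high and low aesthetic quality are assumptions made for illustration (the abstract does not state how the classification labels are derived).

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation coefficient (SRCC): PLCC of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def binary_accuracy(pred, truth, threshold=5.0):
    """Fraction of images whose predicted and true scores fall on the same
    side of `threshold` (the threshold value is an assumption)."""
    hits = sum((p > threshold) == (t > threshold) for p, t in zip(pred, truth))
    return hits / len(pred)

# Hypothetical mean aesthetic scores for five images (not data from the paper).
truth = [3.2, 4.8, 5.5, 6.1, 7.0]
pred = [3.9, 5.2, 5.1, 6.4, 6.6]

print("accuracy:", binary_accuracy(pred, truth))
print("PLCC:", round(pearson(pred, truth), 4))
print("SRCC:", round(spearman(pred, truth), 4))
```

PLCC measures how linearly the predicted scores track the human scores, SRCC measures only whether the ordering of images is preserved, and the accuracy reduces each score to a binary high/low decision.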
Authors: HUANG Rong (黄荣); SONG Junjie (宋俊杰); ZHOU Shubo (周树波); LIU Hao (刘浩) (College of Information Science and Technology, Donghua University, Shanghai 201620, China; Engineering Research Center of Digitalized Textile & Fashion Technology, Ministry of Education (Donghua University), Shanghai 201620, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 4, pp. 1269-1276 (8 pages)
Funding: National Natural Science Foundation of China (62001099, 61803372); Fundamental Research Funds for the Central Universities (2232023D-30).
Keywords: image aesthetic quality evaluation; Vision Transformer (ViT); self-supervised learning; global feature; self-attention mechanism
