Abstract
With the development of digital media and related technologies, barrage comments, a new type of commentary system, have emerged and become increasingly popular. The system lets viewers post comments on a video's plot in real time and also helps them understand its content. Barrage comment text provides new material for short-text processing and real-time data processing. Studying the characteristics of barrage comments and the sentiment they express helps us better understand a video's plot. Studying the similarity between barrage comments, and from it the associations between users, gives insight into the characteristics of the users and reveals potential connections between different videos; it can also support a more accurate choice of target audience during video production. We first collect and preprocess the barrage comment text and then calculate its sentiment values. Because barrage comments are colloquial and loosely structured in syntax and grammar, we build a dictionary of words commonly used in barrage comments. We then adapt the classic k-means clustering algorithm to classify all users who post barrage comments by their sentiment values. This classification helps us understand the emotional similarities and differences among viewers watching a particular type of video.
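The pipeline outlined in the abstract (score each comment against a word dictionary, average the scores per user, then cluster users on those values) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the lexicon entries, function names, and sample data are all hypothetical, plain Lloyd's k-means stands in for the paper's improved variant (whose details the abstract does not give), and whitespace tokenization replaces the Chinese word segmentation that real barrage comments would need.

```python
from collections import defaultdict

# Hypothetical mini-lexicon (token -> polarity), invented for illustration.
# The paper builds its own dictionary of common barrage words.
LEXICON = {"233": 1.0, "awesome": 1.0, "lol": 0.5, "boring": -1.0, "sad": -0.5}


def comment_score(text):
    """Mean polarity of the lexicon tokens in a comment (0.0 if none match)."""
    hits = [LEXICON[t] for t in text.lower().split() if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def user_scores(comments):
    """Average comment score per user; comments is a list of (user, text)."""
    per_user = defaultdict(list)
    for user, text in comments:
        per_user[user].append(comment_score(text))
    return {u: sum(v) / len(v) for u, v in per_user.items()}


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on scalars, seeded with evenly spaced sorted values."""
    pts = sorted(values)
    centers = [pts[i * (len(pts) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in values:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center when a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers


def cluster_users(comments, k=2):
    """Assign each user the index of the nearest sentiment-cluster center."""
    scores = user_scores(comments)
    centers = kmeans_1d(list(scores.values()), k)
    return {u: min(range(k), key=lambda j: abs(s - centers[j]))
            for u, s in scores.items()}
```

Calling `cluster_users` on a list of `(user, text)` pairs returns a cluster index per user. In this sketch, users whose comments lean positive and users whose comments lean negative end up in different clusters when `k=2`.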
Authors
洪庆
王思尧
赵钦佩
李江峰
饶卫雄
HONG Qing; WANG Si-yao; ZHAO Qin-pei; LI Jiang-feng; RAO Wei-xiong (School of Software Engineering, Tongji University, Shanghai 200092, China)
Source
《计算机工程与科学》
CSCD
PKU Core (Peking University Core Journals)
2018, No. 6, pp. 1125-1139 (15 pages)
Computer Engineering & Science
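Funding
National Natural Science Foundation of China (61572365, 61503286, 61702372)
Natural Science Foundation of Shanghai (15ZR1443000)
Shanghai Sailing Program for Young Science and Technology Talents (15YF1412600)
Science and Technology Commission of Shanghai Municipality project (14DZ1118700)
Fundamental Research Funds for the Central Universities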
Keywords
barrage comment system
short text analysis
time series
sentiment analysis
user classification