Abstract
To meet the needs of online public opinion monitoring, this paper designs a model for automatically discovering hot events in online public opinion. The model comprises public opinion information collection, Chinese word segmentation, feature selection, text segmentation, and clustering analysis. The K-means algorithm is improved to reduce its sensitivity to outliers and to lower its time and space complexity. The improved K-means algorithm is compared with the traditional K-means algorithm using the F1 value, and the results demonstrate the feasibility and effectiveness of the model.
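The abstract does not say how the F1 value is computed for clusters. As an illustration only, the sketch below uses one common convention: each reference class is matched to the cluster that gives it the highest F1, and the per-class scores are averaged with class-size weights. The function name `clustering_f1` and its arguments are hypothetical, and the paper's own definition may differ.

```python
from collections import Counter


def clustering_f1(labels_true, labels_pred):
    """Class-size-weighted best-match F1 between reference classes and clusters.

    Assumed convention (not taken from the paper): for every reference class,
    take the best F1 over all clusters, then weight by the class's share of
    the documents.
    """
    n = len(labels_true)
    class_sizes = Counter(labels_true)      # documents per reference class
    cluster_sizes = Counter(labels_pred)    # documents per produced cluster
    overlap = Counter(zip(labels_true, labels_pred))  # joint counts

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue  # no shared documents, F1 is zero for this pair
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total
```

Under this convention, the same function could be applied to the output of both the improved and the traditional K-means runs on a labeled corpus, so that the two scores are directly comparable.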
Source
《计算机与现代化》
2014, Issue 4, pp. 143-147 (5 pages)
Computer and Modernization
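Funding
Humanities and Social Sciences Fund Project of the Ministry of Education (10YJAZH069)
Jiangsu Province "Six Talent Peaks" High-Level Talent Project (XXRJ-013)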