Abstract
To improve the accuracy and efficiency of text information retrieval and to enhance readers' reading experience, this paper studies a text information retrieval algorithm based on feature clustering. First, PCA is used to reduce the dimensionality of high-dimensional text information, removing redundant data from complex text. An improved K-Means algorithm is then used to cluster the dimension-reduced text. Retrieval accuracy and retrieval time serve as the two evaluation metrics in comparisons against several other algorithms. The results show that retrieval time is reduced by 13.3% and 25.7%, respectively, relative to the comparison algorithms, and that retrieval accuracy also improves to some extent.
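The two-stage pipeline described in the abstract (dimensionality reduction, then clustering, then retrieval inside the matching cluster) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy corpus, the term-frequency features, the power-iteration PCA, and the function names are hypothetical, and plain K-Means with farthest-first initialization stands in for the paper's improved K-Means, whose details the abstract does not give.

```python
import math
import random
from collections import Counter

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def tf_vectors(docs, vocab=None):
    """Term-frequency vectors; builds the vocabulary unless one is given."""
    tokens = [d.lower().split() for d in docs]
    if vocab is None:
        vocab = sorted({w for t in tokens for w in t})
    counts = [Counter(t) for t in tokens]
    return vocab, [[float(c[w]) for w in vocab] for c in counts]

def pca_fit(X, n_components, iters=200, seed=0):
    """Top principal directions via power iteration with deflation."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    R = [[x - m for x, m in zip(row, mean)] for row in X]  # centered data
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        start_norm = math.sqrt(dot(v, v))
        v = [x / start_norm for x in v]
        for _ in range(iters):
            scores = [dot(row, v) for row in R]
            # w = R^T (R v): the scatter matrix applied to v
            w = [sum(s * row[j] for s, row in zip(scores, R)) for j in range(d)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        components.append(v)
        # Deflate: remove the direction just found from every row.
        scores = [dot(row, v) for row in R]
        R = [[x - s * vj for x, vj in zip(row, v)] for row, s in zip(R, scores)]
    return mean, components

def pca_transform(x, mean, components):
    centered = [a - m for a, m in zip(x, mean)]
    return [dot(centered, c) for c in components]

def kmeans(points, k, iters=50):
    """Plain Lloyd iterations with farthest-first initial centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical toy corpus with two topics.
docs = [
    "cat dog pet animal",
    "dog pet animal vet",
    "cat pet animal food",
    "stock market trade price",
    "market trade price bank",
    "stock trade price fund",
]
vocab, X = tf_vectors(docs)
mean, comps = pca_fit(X, n_components=2)
points = [pca_transform(x, mean, comps) for x in X]
centroids, labels = kmeans(points, k=2)

def search(query, top_n=2):
    """Rank only the documents in the cluster nearest to the query."""
    _, (q,) = tf_vectors([query], vocab)
    qp = pca_transform(q, mean, comps)
    nearest = min(range(len(centroids)), key=lambda c: dist2(qp, centroids[c]))
    candidates = [i for i, l in enumerate(labels) if l == nearest]
    return sorted(candidates, key=lambda i: dist2(points[i], qp))[:top_n]

print(search("dog pet"))
```

Restricting the ranking step to one cluster is what would reduce retrieval time in a scheme like this, since a query is compared against a fraction of the corpus rather than every document. Whether the paper routes queries this way is not stated in the abstract.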
Authors
杨宇环
张开生
YANG Yu-huan; ZHANG Kai-sheng (Information Department of the Library, Shaanxi University of Science & Technology, Xi'an 710021, China; School of Electrical and Control Engineering, Shaanxi University of Science & Technology, Xi'an 710021, China)
Source
Journal of Shaanxi University of Science & Technology (《陕西科技大学学报》)
Peking University Core Journal (北大核心)
2022, No. 4, pp. 178-182 (5 pages)
Funding
University-level Self-selected Scientific Research Project of Shaanxi University of Science & Technology (ZX14-25).
Keywords
text information
feature dimensionality reduction
feature clustering
improved K-Means
algorithm evaluation