Research and Implementation of Tibetan Text Classification Based on the Logistic Regression Model (cited by 8)
Abstract: Text classification is a core research topic in information processing and is widely used in areas such as automatic retrieval and text filtering. This study classifies Tibetan texts with a classifier based on the Logistic regression model. The approach has three steps: first, a Tibetan corpus is collected and preprocessed, and text features are selected and extracted using the information gain algorithm and Euclidean distance, respectively; second, a Logistic regression classifier is constructed; finally, the classification accuracy, recall, and F1 score are tested and analyzed. The classification performance of the Logistic algorithm is also compared with that of the Gaussian NB algorithm, and the results show that the Logistic algorithm achieves better classification performance.
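A minimal sketch of the pipeline the abstract outlines, in plain Python: information-gain scoring to select features, a binary logistic regression classifier trained by batch gradient descent, and precision/recall/F1 evaluation. This is not the authors' code. Assumptions: documents are already segmented into token sets (the toy English tokens are hypothetical stand-ins for segmented Tibetan words), features are binary presence indicators, the task is two-class, and the Euclidean-distance extraction step and the Gaussian NB comparison are not reproduced. All function names and data below are illustrative.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], t = term present in a document."""
    n = len(labels)
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(
        len(part) / n * entropy(list(Counter(part).values()))
        for part in (with_t, without_t) if part
    )
    return entropy(list(Counter(labels).values())) - conditional

def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]

def vectorize(doc, features):
    """Binary presence vector over the selected features (assumption; TF-IDF is also common)."""
    return [1.0 if t in doc else 0.0 for t in features]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    """Linear score w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression fitted by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # d(log-loss)/dz for one sample
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(score(w, b, x)) >= 0.5 else 0

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy corpus: token sets standing in for segmented Tibetan documents.
docs = [
    {"match", "team", "score"}, {"team", "goal", "coach"}, {"score", "goal", "league"},  # label 1
    {"market", "price", "trade"}, {"price", "bank", "trade"}, {"market", "bank", "growth"},  # label 0
]
labels = [1, 1, 1, 0, 0, 0]

features = select_features(docs, labels, k=6)
X = [vectorize(d, features) for d in docs]
w, b = train_logistic(X, labels)
preds = [predict(w, b, x) for x in X]
print(features)
print(precision_recall_f1(labels, preds))
```

Terms that occur in only one document (such as "match") receive lower information gain than terms that occur in two documents of the same class, so they are dropped first. Handling several Tibetan text categories would need one-vs-rest training or a softmax output; the abstract does not say which variant the paper uses.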
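Authors: Qun Nuo (群诺), Jia Hongyun (贾宏云), Academy of Information Science and Technology, Tibet University, Lhasa, Tibet 850000, China
Source: Information & Computer (《信息与电脑》), 2018, No. 5, pp. 70-73 (4 pages)
Funding: Major Science and Technology Special Project of the Tibet Autonomous Region Science and Technology Plan (Project No. ZDZX2017000136); Tibet University "Zhufeng Scholars Talent Development Support Program" project
Keywords: Tibetan text classification; Logistic regression model; feature selection and extraction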
