Abstract
This paper proposes an efficient Support Vector Machine for text categorization, the Semantic Support Vector Machine (Semantic SVM). It exploits two properties: SVMs retain good generalization ability even with a small training set, and the features of texts belonging to the same category are distributed in clusters in feature space. In a Semantic SVM, a set of semantic centers replaces the original training set, serving both as the training samples and as the support vectors. The paper gives the steps for generating the semantic center set from the training texts, a framework for on-line learning (on-line accumulation of classification knowledge) with Semantic SVMs, and an implementation of the on-line learning algorithm based on Sequential Minimal Optimization (SMO). Experiments on a real-life corpus indicate that Semantic SVMs are promising: on-line learning and classification are tens of times faster than the standard SVM and its simple incremental variant, and classification accuracy is slightly better.
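The core idea, training on a few per-class centers instead of on every sample, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not describe how the semantic centers are generated or how SMO is adapted, so plain k-means stands in for center generation and a Pegasos-style subgradient solver without a bias term stands in for SMO. The names `class_centers`, `train_linear_svm`, and `predict`, and the toy 2-D data, are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def class_centers(points, k, iters=10, seed=0):
    """Plain k-means on one class (assumed stand-in for semantic centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the previous center when a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (not SMO, no bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularization shrink
            if margin < 1:                            # hinge loss is active
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy data: two clusters per class, mirrored through the origin.
pos = [[2.0, 2.1], [2.2, 1.9], [1.8, 2.0], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
neg = [[-a, -b] for a, b in pos]

# Twelve original samples are reduced to four centers before training.
centers = class_centers(pos, 2) + class_centers(neg, 2)
labels = [1, 1, -1, -1]
w = train_linear_svm(centers, labels)
print(predict(w, [2.5, 2.0]), predict(w, [-2.0, -2.5]))
```

Because the solver only ever sees the centers, training cost scales with the number of centers rather than with the size of the original corpus, which is the source of the speedup the abstract reports.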
Source
《计算机工程与应用》
Indexed in: CSCD
Peking University Core Journals (北大核心)
2004, No. 36, pp. 11-14, 57 (5 pages)
Computer Engineering and Applications
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60272088)