Abstract
To address the difficulty of obtaining class-labeled samples when building dynamic Bayesian network (DBN) classifiers, a semi-supervised active DBN learning algorithm based on EM and classification loss is proposed. The EM algorithm used in semi-supervised learning can exploit unlabeled examples to learn a DBN classifier, but incorrect class assignments are easily introduced during its iterations, which degrades the classifier's accuracy. Incorporating classification-loss-based active learning into EM lets the learner itself select informative unlabeled examples and ask the user to label them; adding these examples to the training set maximally reduces the model's uncertainty in classifying the remaining unlabeled examples. Experimental results show that the proposed algorithm significantly improves the efficiency and accuracy of the DBN learner and converges quickly to the predetermined classification accuracy.
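The loop the abstract describes (semi-supervised EM, then a query for the label of the most useful unlabeled example) can be illustrated with a minimal sketch. This is not the paper's algorithm. It rests on several assumptions made only for illustration: a Gaussian naive Bayes model with one shared diagonal variance stands in for the DBN classifier, "classification loss" is simplified to the expected 0/1 loss of a single example (one minus its largest class posterior), and the names `fit`, `posterior`, `semi_supervised_em`, `active_em`, and `oracle` are hypothetical.

```python
import numpy as np

def fit(X, R):
    """M-step: class priors, class means, and one shared diagonal variance
    (a stand-in model; the paper uses a DBN classifier instead)."""
    Nk = R.sum(axis=0) + 1e-9                      # soft counts per class
    prior = Nk / Nk.sum()
    mean = (R.T @ X) / Nk[:, None]                 # shape (k, d)
    sq = (X[:, None, :] - mean) ** 2               # shape (n, k, d)
    var = (R[:, :, None] * sq).sum(axis=(0, 1)) / R.sum() + 1e-3
    return prior, mean, var

def posterior(X, params):
    """E-step: class posteriors P(c | x) for every row of X, shape (n, k)."""
    prior, mean, var = params
    sq = (X[:, None, :] - mean) ** 2
    loglik = -0.5 * (sq / var + np.log(2 * np.pi * var)).sum(axis=2)
    logp = loglik + np.log(prior)
    logp -= logp.max(axis=1, keepdims=True)        # numerical stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def semi_supervised_em(X, y, labeled, k, n_iter=20):
    """EM over labeled and unlabeled rows; known labels stay clamped."""
    onehot = np.eye(k)[y]
    params = fit(X[labeled], onehot[labeled])      # initialise from labeled rows only
    for _ in range(n_iter):
        R = posterior(X, params)                   # soft labels for all rows
        R[labeled] = onehot[labeled]               # never overwrite known labels
        params = fit(X, R)
    return params

def active_em(X, oracle, labeled, k, n_queries):
    """Alternate EM with label queries; `oracle` plays the role of the user."""
    labeled = labeled.copy()
    y = np.where(labeled, oracle, 0)               # unlabeled entries are placeholders
    for _ in range(n_queries):
        params = semi_supervised_em(X, y, labeled, k)
        loss = 1.0 - posterior(X, params).max(axis=1)  # expected 0/1 loss per example
        loss[labeled] = -1.0                       # do not re-query labeled rows
        q = int(np.argmax(loss))                   # most uncertain unlabeled example
        y[q] = oracle[q]                           # ask the "user" for its label
        labeled[q] = True
    return semi_supervised_em(X, y, labeled, k), labeled
```

Clamping the known labels in the E-step is what keeps the EM iterations from overwriting trusted class information. The query step then spends labeling effort only on the examples the current model classifies least confidently.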
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2007, No. 4, pp. 656-660 (5 pages)
Journal of Chinese Computer Systems
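Funding
Young Teachers' Research Fund of the Central University for Nationalities (中央民族大学)
Key Discipline Co-construction Project of the Beijing Municipal Education Commission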
Keywords
dynamic Bayesian networks
semi-supervised learning
active learning
expectation-maximization (EM) algorithm