

Heuristic Bayesian Causal Inference based on Information Entropy Function
Abstract: Bayesian network classifiers (BNC) have been widely used in data mining, artificial intelligence, and other fields because of their excellent classification performance and interpretability. Information theory has laid a solid mathematical foundation for their rapid development; for example, conditional mutual information is widely used to measure the conditional dependence between attributes in the topology of a BNC. However, although Bayesian networks are also called causal networks, research on causality in Bayesian networks remains a controversial topic in artificial intelligence and related fields, and the definition of causality between attributes is far more complex and subtle than that of correlation. Conditional mutual information may not be suitable for measuring how well the global topology of a BNC fits the data, and the symmetry of its expression means that it can only describe undirected correlation between attributes, not directed causality. This paper carries out an exploratory study of causal relationships in Bayesian networks from the perspective of information entropy. Starting from the log-likelihood function, it first defines the mapping between the joint entropy function and the joint probability distribution encoded by the network topology, and then proposes class conditional entropy and local conditional entropy functions, built on the joint entropy, to identify causal relationships between attributes in the topology. Finally, a class-label-driven heuristic structure learning method is proposed to build a BNC, named HBN, that balances fitting labeled data with generalizing to unlabeled data. Experimental evaluation on 35 datasets from the UCI machine learning repository shows that the proposed algorithm has significant advantages in classification performance over other state-of-the-art algorithms. For example, in terms of 0-1 loss, HBN beats the correlation-based feature weighting filter for naive Bayes (CFWNB) on 17 datasets and loses on 5, beats the selective k-dependence Bayesian classifier (SKDB) on 14 and loses on 5, and beats attribute and instance weighted naive Bayes (AIWNB) on 17 and loses on 7. In terms of bias, HBN beats CFWNB on 26 datasets and loses on 6, beats SKDB on 10 and loses on 5, and beats AIWNB on 22 and loses on 6. Compared with ensemble algorithms, HBN also achieves significant advantages over the weighted average of one-dependence estimators (WAODE: 11 wins and 2 losses in 0-1 loss; 15 wins and 7 losses in bias) and random forest (RF: 19 wins and 9 losses in 0-1 loss; 29 wins and 4 losses in bias). In terms of variance, CFWNB, WAODE, and AIWNB perform no structure learning, so their topologies are unaffected by perturbations of the training data and their variance is significantly lower than that of the other algorithms. The local topology of HBN can reflect the causality implicit in test instances, which mitigates to some extent the negative impact of overfitting the training data; as a result, HBN has significant advantages in variance over SKDB (20 wins and 9 losses) and RF (26 wins and 3 losses). Compared with the other algorithms, HBN improves the average 0-1 loss and bias results by about 6.06% and 12.65%, respectively; compared with SKDB and RF, it improves the average variance results by about 16.49%. HBN provides an effective and feasible approach to uncertain knowledge representation and reasoning.
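The abstract's argument rests on two standard information-theoretic facts: with maximum-likelihood parameters, the log-likelihood of a Bayesian network decomposes into empirical conditional entropies, and conditional mutual information is symmetric in the two attributes it relates. The identities below state these textbook results for reference; they are background for the abstract, not the paper's own definitions of class conditional entropy or local conditional entropy.

```latex
% Textbook identities only; not the paper's class/local conditional entropy.
% (1) With maximum-likelihood parameters, the log-likelihood of a Bayesian
%     network B with parent sets \Pi_i over a dataset D of N instances
%     decomposes into empirical conditional entropies H_D:
\[
  \mathrm{LL}(B \mid D)
  = \sum_{j=1}^{N} \log P_B\bigl(x^{(j)}\bigr)
  = -N \sum_{i=1}^{n} H_D\bigl(X_i \mid \Pi_i\bigr).
\]
% (2) Conditional mutual information is symmetric, so it cannot orient an
%     edge between X and Y:
\[
  I(X;Y \mid Z) = H(X \mid Z) - H(X \mid Y,Z)
                = H(Y \mid Z) - H(Y \mid X,Z) = I(Y;X \mid Z).
\]
```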
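As a rough illustration of how an entropy-based score could drive class-label-guided heuristic structure learning, the sketch below greedily adds a parent to an attribute only when doing so lowers the attribute's entropy conditioned on the class label and the parents chosen so far. This is a minimal sketch under stated assumptions, not the authors' HBN procedure: the function names (entropy, conditional_entropy, greedy_parents), the greedy acceptance rule, and the max_parents cap are illustrative choices.

```python
# Minimal sketch of class-label-guided, entropy-driven parent selection for a
# Bayesian network classifier. This is NOT the paper's HBN algorithm: the
# greedy criterion and all names here are illustrative assumptions.
from collections import Counter, defaultdict
from math import log2

def entropy(values):
    """Empirical entropy H(V) in bits of a list of discrete outcomes."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def conditional_entropy(target, given):
    """Empirical H(target | given); `given` is a parallel list of hashable keys."""
    n = len(target)
    groups = defaultdict(list)
    for key, t in zip(given, target):
        groups[key].append(t)
    return sum(len(ts) / n * entropy(ts) for ts in groups.values())

def greedy_parents(data, class_col, attr_cols, max_parents=1):
    """For each attribute, greedily pick extra parents (besides the class)
    that reduce H(attribute | class, parents); a stand-in for the paper's
    class/local conditional entropy criteria."""
    column = lambda c: [row[c] for row in data]
    cls = column(class_col)
    parents = {}
    for a in attr_cols:
        target, chosen = column(a), []
        current = conditional_entropy(target, list(zip(cls)))
        candidates = [b for b in attr_cols if b != a]
        while len(chosen) < max_parents and candidates:
            def score(b):
                keys = list(zip(cls, *(column(p) for p in chosen + [b])))
                return conditional_entropy(target, keys)
            best = min(candidates, key=score)
            if score(best) >= current - 1e-9:   # stop when no parent helps
                break
            current = score(best)
            chosen.append(best)
            candidates.remove(best)
        parents[a] = chosen
    return parents

# Toy usage with a hand-made weather-style dataset.
data = [
    {"outlook": "sunny", "windy": "no",  "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]
print(greedy_parents(data, "play", ["outlook", "windy"]))
```

On this toy dataset the call prints at most one chosen parent per attribute; in a real BNC the selected parents, together with the class node, would define the local topology over which conditional probability tables are estimated.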
Authors: LIU Yang (刘洋), WANG Li-Min (王利民), SUN Ming-Hui (孙铭会). Affiliations: College of Computer Science and Technology, Jilin University, Changchun 130012; Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012.
Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CAS, CSCD, Peking University Core), 2021, No. 10, pp. 2135-2147 (13 pages).
Funding: Supported by the National Key Research and Development Program of China (No. 2019YFC1804804) and the Jilin Province Science and Technology Development Program (No. 20200201281JC).
Keywords: Bayesian network classifier; log-likelihood function; joint entropy; conditional entropy; cross entropy

