Abstract
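This paper proposes CEEP, a new EP-based classification method. CEEP builds its classifier from the shortest EPs only (essential EPs, eEPs) and uses a scoring criterion that differs from those of earlier EP-based classifiers such as CAEP. The paper also discusses efficient mining of eEPs and the adaptive selection of the minimum support and minimum growth-rate thresholds. Experiments on 12 datasets from the UCI Machine Learning Repository show that the proposed method achieves good classification accuracy. How to ensure that the eEPs provide sufficient coverage, and how to classify instances of rare classes, remain open questions for further study. Combining the ideas of bagging and boosting with CEEP to further improve classification accuracy is another direction worth investigating.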
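The level-wise search for the shortest EPs described above can be illustrated with a minimal Python sketch. It is not the paper's mining algorithm and rests on stated assumptions: support is the fraction of transactions that contain an itemset; the growth rate from the background class to the target class is the ratio of the two supports (infinite when the itemset never occurs in the background); and an itemset counts as an eEP when its target-class support reaches `min_sup`, its growth rate reaches `min_rate`, and no proper subset already meets the growth-rate threshold. The function names `support`, `growth_rate` and `mine_shortest_eps` are hypothetical.

```python
from itertools import combinations
from math import inf


def support(itemset, transactions):
    """Fraction of transactions (sets of items) that contain every item in itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def growth_rate(itemset, background, target):
    """Target-class support divided by background support (inf if absent from background)."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0:
        return inf if s_tg > 0 else 0.0
    return s_tg / s_bg


def mine_shortest_eps(target, background, min_sup, min_rate):
    """Level-wise search returning {itemset: growth rate} for minimal EPs of the target class."""
    items = set().union(*target)
    eeps = {}
    candidates = {frozenset([i]) for i in items}
    k = 1
    while candidates:
        frontier = set()  # frequent itemsets of size k that are not (yet) EPs
        for c in candidates:
            if support(c, target) < min_sup:
                continue  # support is anti-monotone: no superset can reach min_sup
            gr = growth_rate(c, background, target)
            if gr >= min_rate:
                eeps[c] = gr  # an EP: record it and do not extend it
            else:
                frontier.add(c)
        k += 1
        # Apriori-style join: keep a size-k candidate only if all of its
        # (k-1)-subsets are frequent non-EPs, so no proper subset is an EP.
        candidates = set()
        for a, b in combinations(list(frontier), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in frontier for s in combinations(u, k - 1)):
                candidates.add(u)
    return eeps
```

Because an itemset is extended only while it is frequent but not yet an EP, every branch of the search stops at its first (shortest) EP, which illustrates how restricting attention to minimal patterns can prune much of the search space. The thresholds are fixed here; the adaptive threshold selection discussed in the paper is not modeled.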
Emerging patterns (EPs) are itemsets whose supports change significantly from one data class to another. They have been shown to be useful for constructing accurate classifiers. However, existing EP-based classifiers may suffer from two major deficiencies: (1) they use a large number of EPs, which can lead to high processing overhead; and (2) their scoring methods, which combine growth rate and support, may reduce the contribution of EPs that have high differentiating power but low support, which can lead to misclassification. This work proposes a novel classification method, CEEP (Classification by Essential Emerging Patterns), which constructs classifiers from a special kind of EP, called essential Emerging Patterns (eEPs), together with a growth-rate-based scoring method. Mining eEPs is much easier than mining EPs, and eEPs alone are sufficient to construct accurate classifiers. Our experimental study on 12 benchmark datasets from the UCI Machine Learning Repository shows that CEEP performs comparably to other state-of-the-art classification methods, such as NB, C5.0, CBA, CMAR, CAEP, and BCEP, in terms of overall predictive accuracy.
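The abstract does not give CEEP's scoring formula, so the following Python sketch only illustrates the general idea of growth-rate-based scoring under an assumed rule: each eEP of a class that is contained in a test instance contributes gr/(gr+1) (1 for an infinite growth rate), with no support factor, and the instance is assigned to the class with the highest total. The weight is borrowed from CAEP's aggregate score with its support term dropped; the weighting, the function names `gr_weight`, `score` and `classify`, and the example eEPs are all hypothetical.

```python
from math import isinf


def gr_weight(gr):
    """Map a positive growth rate into (0, 1]; an infinite growth rate maps to 1."""
    return 1.0 if isinf(gr) else gr / (gr + 1.0)


def score(instance, eeps):
    """Sum the weights of this class's eEPs that are contained in the instance."""
    return sum(gr_weight(gr) for pattern, gr in eeps.items() if pattern <= instance)


def classify(instance, eeps_by_class):
    """Return the label with the highest score together with all per-class scores."""
    scores = {label: score(instance, eeps) for label, eeps in eeps_by_class.items()}
    return max(scores, key=scores.get), scores


# Hypothetical eEPs for two classes, as {itemset: growth rate}.
eeps_by_class = {
    "P": {frozenset({"x", "y"}): float("inf"), frozenset({"z"}): 3.0},
    "N": {frozenset({"w"}): float("inf")},
}
label, scores = classify({"x", "y", "z"}, eeps_by_class)
# label is "P": class P scores 1.0 + 0.75 = 1.75, class N scores 0
```

Dropping the support factor means that a low-support eEP with a high growth rate counts as much as a frequent one, which is the point of deficiency (2) above. A practical classifier would probably also need to normalize scores when classes have very different numbers of eEPs, as CAEP does with its base scores; that step is omitted from this sketch.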
Source
Computer Science (《计算机科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2004, No. 11, pp. 211-214 (4 pages)
Funding
Natural Science Foundation of Henan Province (Project No. 0211050100)