Abstract
This study examines the problem of filling in missing data values during preprocessing for big data analysis, along with existing solutions, and proposes an imputation algorithm for missing values. The algorithm builds on MissForest and incorporates the idea of K-fold cross-validation. Imputation experiments at different missing rates show that the algorithm yields a smaller imputation error than the traditional CNN imputation algorithm, has better running-time complexity than the basic MissForest algorithm, and generalizes well.
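The abstract does not say how K-fold cross-validation is combined with MissForest, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes numeric data, scikit-learn's `RandomForestRegressor` and `KFold`, a fixed number of passes in place of MissForest's stopping criterion, and at least one observed value in every column. The function names `missforest_impute` and `kfold_imputation_rmse` are hypothetical. The first function imputes column by column with random forests; the second hides observed cells fold by fold and measures how well they are recovered.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def missforest_impute(X, n_iter=3, n_trees=20, seed=0):
    """MissForest-style imputation for numeric data (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: fill each missing cell with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    # Visit columns from fewest to most missing values.
    order = np.argsort(missing.sum(axis=0))
    for _ in range(n_iter):  # assumption: fixed passes, no stopping rule
        for j in order:
            rows = missing[:, j]
            if not rows.any() or rows.all():
                continue  # nothing to impute, or nothing to learn from
            others = np.delete(filled, j, axis=1)
            forest = RandomForestRegressor(n_estimators=n_trees,
                                           random_state=seed)
            # Train on rows where column j is observed, predict the rest.
            forest.fit(others[~rows], X[~rows, j])
            filled[rows, j] = forest.predict(others[rows])
    return filled


def kfold_imputation_rmse(X, k=5, seed=0):
    """Estimate imputation error by hiding observed cells in K folds.

    Assumption: this masking scheme is one possible way to apply K-fold
    cross-validation to imputation; the source does not specify it.
    """
    X = np.asarray(X, dtype=float)
    observed = np.argwhere(~np.isnan(X))  # (row, col) of known cells
    splitter = KFold(n_splits=k, shuffle=True, random_state=seed)
    errors = []
    for _, held_idx in splitter.split(observed):
        held = observed[held_idx]
        X_masked = X.copy()
        X_masked[held[:, 0], held[:, 1]] = np.nan  # hide this fold
        imputed = missforest_impute(X_masked, seed=seed)
        diff = imputed[held[:, 0], held[:, 1]] - X[held[:, 0], held[:, 1]]
        errors.append(np.sqrt(np.mean(diff ** 2)))
    return float(np.mean(errors))
```

Under these assumptions, the averaged fold error could be used to compare settings such as `n_trees` or `n_iter` without access to the true missing values.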
Authors
华南
马春萍
朱彦霞
刘惠萍
曹彦
王飞
张利鹏
HUA Nan; MA Chunping; ZHU Yanxia; LIU Huiping; CAO Yan; WANG Fei; ZHANG Lipeng (China Radio and Television Henan Network Co., Ltd., Zhengzhou 450000, China; The First Affiliated Hospital of Henan University of CM, Zhengzhou 450000, China; Henan General Hospital, Zhengzhou 450002, China; College of Information Engineering, Xuchang University, Xuchang 461000, China; Henan University of Animal Husbandry and Economy, Zhengzhou 450000, China)
Source
《河南科技》
2022, Issue 3, pp. 18-21 (4 pages)
Henan Science and Technology
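Funding
2020 Henan Province Medical Science and Technology Research Program, Joint Construction Project (LHGJ20200242)
2021 Henan Province Key R&D and Promotion Special Project (Science and Technology Research) (212102311002, 212102210138, 212102311000)
2022 Key Project of Henan Province Higher Education Institutions (22B520023, 22A520040)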
Keywords
missing data
machine learning
imputation of missing values
random forest