Prognosis prediction modeling of heart failure based on Logistic regression and random forest (cited by: 4)
Abstract Objective To build models that predict in-hospital death and death within 6 months in patients with heart failure, using a clinical database built from structured electronic medical record data and machine learning methods for data preprocessing and feature selection, so as to help identify high-risk patients and guide therapeutic intervention. Methods The data source was an open-access dataset published on PhysioNet containing clinical information on heart failure patients hospitalized at Zigong Fourth People's Hospital, Sichuan Province, from December 2016 to June 2019. Data preprocessing, feature selection, and fitting of Logistic regression and random forest prognostic models were carried out in Python. The models were optimized to maximize the area under the ROC curve (AUC) and were validated on the test set using AUC, accuracy, precision, recall, and F1 score. Results After preprocessing, 146 features were available for the in-hospital mortality model and 155 features for the 6-month mortality model. Random forest performed best for in-hospital mortality, with an AUC of 0.8931. For 6-month mortality, feature selection combining LASSO with recursive feature elimination (RFE) yielded 10 features: discharge destination (healthcare facility, home, or unknown), admission ward (general ward), discharge department (cardiology), Killip class (Ⅰ, Ⅱ, and Ⅲ), myocardial infarction, and congestive heart failure. Logistic regression on these 10 features reached an AUC of 0.8336, comparable to random forest fitted on all features (AUC=0.8460). Conclusion This study worked out a procedure for data preprocessing, feature engineering, machine learning modeling, and model validation on structured clinical data from an electronic medical record system, and used real-world data to build heart failure prognostic models that balance prediction accuracy with the detection rate (recall) of high-risk individuals.
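The five evaluation measures named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' code. The function names `roc_auc` and `classification_metrics`, the 0.5 and 0.25 thresholds, and the eight toy patients are all assumptions made for illustration. The sketch uses only the Python standard library and computes AUC from its rank interpretation: the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 after thresholding the predicted risk."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy data: 1 = died, 0 = survived; scores are predicted risks.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    print("AUC:", roc_auc(toy_labels, toy_scores))
    for t in (0.5, 0.25):
        print("threshold", t, classification_metrics(toy_labels, toy_scores, threshold=t))
```

On this toy data the AUC is 0.8. Lowering the threshold from 0.5 to 0.25 raises recall from 2/3 to 1.0 while precision falls from 2/3 to 0.6. This is the trade-off between overall accuracy and detection of high-risk patients that the conclusion refers to. AUC does not depend on the threshold, which is one reason it is a convenient optimization target.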
Authors TONG Rui; KAN Li-hong; ZHU Zhong-sheng (Department of Cardiology, Shanghai Pudong Hospital-Fudan University Pudong Medical Center, Shanghai 201399, China)
Source Fudan University Journal of Medical Sciences (复旦学报(医学版)), 2022, No. 5, pp. 656-664 (9 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding Key Discipline Project of Pudong Hospital Affiliated to Fudan University (Zdxk2020-06); Key Specialty Construction Project of the Shanghai Pudong New Area Health System (PWZzk2017-17).
Keywords heart failure; prognosis prediction; machine learning