Abstract
To further improve the accuracy of PM_(2.5) concentration prediction, a variable-weight combination model, XGBoost-LSTM(Variable), was developed from XGBoost and an LSTM network for short-term (1-hour) PM_(2.5) concentration prediction. First, correlation analysis of the predictors was used to assess how other air pollutants and meteorological factors influence PM_(2.5) concentration, to select the optimal predictors, and to analyze variable importance. The preprocessed dataset was then fed to an LSTM model and an XGBoost model, which produced separate predictions, and an adaptive variable-weight combination method based on residual improvement was used to obtain the final prediction. The results show that pollutant variables have higher relative importance than meteorological variables: current PM_(2.5) and CO concentrations rank high, while average wind speed and relative humidity rank low. The RMSE, MAE, and MAPE of the XGBoost-LSTM(Variable) model are 1.75, 1.12, and 6.06, respectively, better than those of the LSTM, XGBoost, SVR, XGBoost-LSTM(Equal), and XGBoost-LSTM(Residual) models. Season-by-season results show that the XGBoost-LSTM(Variable) model is most accurate in spring and less accurate in summer. The model's high accuracy is attributed to its combining the strengths of the two component models: it accounts for both the time-series characteristics of the data and the nonlinear relationships among features.
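The abstract names the evaluation metrics and a residual-based variable-weight combination but does not give the weighting formula. The following is a minimal illustrative sketch, not the authors' method: it assumes that each model's weight at time t is inversely proportional to its mean squared residual over the previous `window` steps, and that MAPE is reported in percent. The function names, the `window` parameter, and the `eps` guard are all hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero observations)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def variable_weight_combine(y_true, pred_a, pred_b, window=3, eps=1e-8):
    """Combine two forecasts with time-varying weights.

    Assumption (not from the paper): the weight of each model at step t is
    inversely proportional to its mean squared residual over the previous
    `window` steps. Only past observations are used, so the weight for
    step t never depends on y_true[t]. Step 0 falls back to equal weights.
    """
    combined, weights_a = [], []
    for t in range(len(y_true)):
        if t == 0:
            w_a = 0.5  # no residual history yet
        else:
            lo = max(0, t - window)
            n = t - lo
            mse_a = sum((y_true[i] - pred_a[i]) ** 2 for i in range(lo, t)) / n + eps
            mse_b = sum((y_true[i] - pred_b[i]) ** 2 for i in range(lo, t)) / n + eps
            w_a = (1.0 / mse_a) / (1.0 / mse_a + 1.0 / mse_b)
        weights_a.append(w_a)
        combined.append(w_a * pred_a[t] + (1.0 - w_a) * pred_b[t])
    return combined, weights_a


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the study.
    observed = [35.0, 38.0, 36.0, 40.0, 42.0]
    lstm_like = [34.0, 39.0, 35.5, 41.0, 41.0]
    xgb_like = [37.0, 36.0, 38.0, 38.5, 44.0]
    fused, w = variable_weight_combine(observed, lstm_like, xgb_like)
    print("weights for first model:", [round(x, 3) for x in w])
    print("RMSE:", rmse(observed, fused))
    print("MAE:", mae(observed, fused))
    print("MAPE (%):", mape(observed, fused))
```

Under this assumed scheme, whichever model has tracked the recent observations more closely dominates the next combined prediction, which is the general idea behind replacing fixed (equal) weights with variable ones.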
Authors
康俊锋
谭建林
方雷
肖亚来
KANG Jun-feng; TAN Jian-lin; FANG Lei; XIAO Ya-lai (School of Civil and Surveying & Mapping Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China; Department of Environmental Science and Engineering, Fudan University, Shanghai 200433, China)
Source
《中国环境科学》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2021, No. 9, pp. 4016-4025 (10 pages)
China Environmental Science
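Funding
National Key Research and Development Program of China (2016YFC08033105)
China Scholarship Council Funded Project (201808360065)
Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ150661)
National Natural Science Foundation of China, Young Scientists Fund (41701462).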