Abstract
During near-infrared (NIR) spectrum acquisition, environmental differences (temperature, humidity, illumination) and operating deviations strongly affect the reliability of the spectral data. This study proposes a peaks-density based spatial clustering of applications with noise (P-DBSCAN) method to automatically screen and remove abnormal spectra. P-DBSCAN uses the silhouette coefficient of the DBSCAN clustering result as feedback to adjust the two DBSCAN parameters, neighborhood radius and density threshold, and builds an automatic abnormal-spectrum removal algorithm around the characteristic features of NIR spectral bands. Constructed data (temperature anomalies and angle anomalies) and experimental data were each used to test the effectiveness of P-DBSCAN, and the results were compared with three traditional outlier-removal methods: isolation forest (IF), Monte Carlo cross-validation (MCCV), and Mahalanobis distance (MD). P-DBSCAN was further applied to predictive modeling of soil organic matter (OM) content. The results show that P-DBSCAN combined with partial least squares regression (P-DBSCAN-PLS) had the strongest predictive ability. Compared with IF, MCCV, and MD, P-DBSCAN is self-adaptive. Compared with basic DBSCAN, the proposed method of determining the initial values of the key parameters from spectral peaks reduces the dependence of the search result on key-parameter selection, significantly reduces the search workload, and improves adaptability to high-dimensional data sets and data sets of non-uniform density.
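The silhouette-guided parameter adjustment described in the abstract can be illustrated with a minimal sketch. This is not the authors' P-DBSCAN implementation: the abstract does not specify how spectral peaks set the initial parameter values, so a small hypothetical grid (`eps_grid`, `min_pts_grid`) stands in for that step, and the toy five-point "spectra", the function names, and the choice to treat noise as its own group when scoring are all assumptions made for illustration. It uses only the Python standard library; in practice a library such as scikit-learn (`sklearn.cluster.DBSCAN`, `sklearn.metrics.silhouette_score`) would be the more usual choice.

```python
import math
from itertools import product

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: 0..k-1 for clusters, -1 for noise."""
    n = len(points)
    # Neighborhoods include the point itself (distance 0).
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:
            labels[i] = -1              # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:
                queue.extend(nbrs[j])   # j is a core point, keep expanding
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient. Assumption: noise (-1) is scored as its own group."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    if len(groups) < 2:
        return -1.0                     # undefined for a single group; rank it last
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue                    # a singleton contributes 0
        a = sum(math.dist(points[i], points[j]) for j in own) / (len(own) - 1)
        b = min(sum(math.dist(points[i], points[j]) for j in g) / len(g)
                for other, g in groups.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

def tune_dbscan(points, eps_grid, min_pts_grid):
    """Keep the (eps, min_pts) pair whose clustering has the highest silhouette."""
    best = None
    for eps, min_pts in product(eps_grid, min_pts_grid):
        labels = dbscan(points, eps, min_pts)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, eps, min_pts, labels)
    return best

# Toy data (hypothetical): 20 near-identical "spectra" plus two constructed anomalies.
base = [1.0, 0.8, 0.6, 0.9, 0.7]
spectra = [[v + 0.01 * math.sin(i + k) for k, v in enumerate(base)] for i in range(20)]
spectra.append([v + 0.5 for v in base])   # baseline offset, e.g. a temperature effect
spectra.append([v * 0.4 for v in base])   # intensity scaling, e.g. an angle effect

score, eps, min_pts, labels = tune_dbscan(spectra, [0.05, 0.2, 0.8, 3.0], [3, 5])
outliers = [i for i, lab in enumerate(labels) if lab == -1]
print(eps, min_pts, outliers)
```

In this sketch the largest radius merges every spectrum into a single cluster, which the silhouette score ranks last, so the search settles on a radius that leaves the two constructed anomalies as noise. Spectra labeled -1 would then be dropped before fitting a regression model such as PLS.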
Authors
Li Haoxuan (李昊翾)
Zhao Xiaoyu (赵肖宇)
College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319
Source
Journal of Heilongjiang Bayi Agricultural University (《黑龙江八一农垦大学学报》)
2025, Issue 1, pp. 112-118, 126 (8 pages in total)
Funding
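Natural Science Foundation of Heilongjiang Province (LH2022C061)
Young Innovative Talents Training Program of Heilongjiang Bayi Agricultural University (ZRCQC202205)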