Abstract
Economic statistical data typically carry many attributes. To study the inherent structure of such data, such as clustering and distribution, researchers need dimension-reduction methods that map the multi-dimensional information into a space of three or fewer dimensions, so that it can be visualized and clustered. Support Vector Machine (SVM) shows distinctive advantages in small-sample, nonlinear, and high-dimensional pattern recognition, and in theory it can reach a globally optimal solution. However, SVM is a supervised classification method and requires a known sample set for training. High-dimensional economic statistical data usually lack known cluster centers, and choosing a small sample set as cluster centers from the clustering results of other methods is highly subjective. Spatial autocorrelation analysis can reveal regions of high spatial aggregation and randomly dispersed regions, and can identify the spatial aggregation pattern of each region, which offers a feasible way to select the known small sample set. Taking the 2007 economic statistical data of Sichuan Province from the provincial statistical yearbook as an example, this paper clusters the data with Principal Component Analysis (PCA) and Nonlinear Mapping (NLM), then feeds the resulting cluster centers, as well as the high-aggregation targets revealed by spatial autocorrelation analysis, into SVM as known sample sets. The differences among the classification results are then analyzed against the state of economic development in Sichuan. The conclusions are: (1) the SVM classification results obtained from the two known sample sets derived from PCA and NLM differ considerably, which shows that the choice of known sample set is highly subjective; (2) spatial autocorrelation analysis substantially reduces the size of the characteristic sample set, which simplifies the SVM classification process while still yielding results that accurately reflect the actual development of Sichuan.
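The sample-selection idea in the abstract can be illustrated with a small sketch. The abstract does not say which spatial autocorrelation statistic the authors used, so the code below assumes local Moran's I with row-standardized binary contiguity weights. The function names, the toy values, and the chain-shaped neighbor layout are hypothetical and are not taken from the paper. Significance testing (for example by permutation) is omitted.

```python
from statistics import mean


def local_morans_i(values, neighbors):
    """Return (deviation, spatial lag, local I) for each unit.

    Assumes row-standardized binary weights: each neighbor of unit i
    gets weight 1 / len(neighbors[i]). Values must not all be equal.
    """
    n = len(values)
    xbar = mean(values)
    dev = [v - xbar for v in values]
    m2 = sum(d * d for d in dev) / n  # population variance of the attribute
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        # Spatial lag: average deviation of the neighboring units.
        lag = sum(dev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        results.append((dev[i], lag, dev[i] * lag / m2))
    return results


def high_high_units(values, neighbors):
    """Indices of units above the mean whose neighbors are also above it."""
    stats = local_morans_i(values, neighbors)
    return [i for i, (z, lag, _) in enumerate(stats) if z > 0 and lag > 0]


def global_morans_i(values, neighbors):
    """With row-standardized weights, global I is the mean of the local I values."""
    stats = local_morans_i(values, neighbors)
    return sum(local_i for _, _, local_i in stats) / len(stats)


# Hypothetical toy data: six units arranged in a chain, one attribute each.
values = [10.0, 9.0, 8.0, 2.0, 1.0, 1.5]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

candidates = high_high_units(values, neighbors)
print(candidates)
```

In a workflow like the one the abstract describes, units flagged this way (and, symmetrically, low-low units) would be candidates for the known sample set handed to an SVM trainer. How the authors labeled and filtered them is not stated in this record.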
Source
Geography and Geo-Information Science (《地理与地理信息科学》)
CSCD
PKU Core Journal (北大核心)
2014, No. 4, pp. 36-41 (6 pages)
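Funding
National Natural Science Foundation of China (40901191)
Fundamental Research Funds for the Central Universities (ZD20140203)
Science Research Program of Higher Education Institutions of Hebei Province (ZD20140203)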
Keywords
Support Vector Machine
spatial autocorrelation
spatial clustering
dimension reduction
Principal Component Analysis
Nonlinear Mapping