Funding: The National High Technology Research and Development Program of China (No. 863-511-930-009).
Abstract: Support Vector Clustering (SVC) is a kernel-based unsupervised clustering method. Its main drawback is the high computational cost of building the adjacency matrix that describes the connectivity of each pair of points. Building on the proximity graph model [3], the Euclidean distance in Hilbert space is computed with a Gaussian kernel and used as the criterion for generating a minimum spanning tree (MST) with Kruskal's algorithm. The cost of connectivity estimation is then reduced by checking only the linkages between the edges that form the main stem of the MST; a non-compatibility degree is introduced to support edge selection during linkage estimation. The new approach is analyzed experimentally. The results show that the revised algorithm outperforms the proximity graph model, with faster speed, better clustering quality, and strong noise suppression, which makes SVC scalable to large data sets.
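The first two steps described in this abstract (kernel-induced distance, then Kruskal's algorithm) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the kernel width `q`, and the sample points are assumptions, and the main-stem selection and non-compatibility degree are not shown because the abstract does not define them. The sketch relies on the standard identity that, for a Gaussian kernel K(x, y) = exp(-q‖x − y‖²), the squared distance between the images of x and y in feature space is K(x, x) + K(y, y) − 2K(x, y) = 2 − 2K(x, y).

```python
import math

def feature_space_distance(x, y, q=1.0):
    """Distance between the feature-space images of x and y under a
    Gaussian kernel with width parameter q (q is an assumed name)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    k = math.exp(-q * sq)
    # ||phi(x) - phi(y)||^2 = K(x,x) + K(y,y) - 2K(x,y) = 2 - 2K(x,y)
    return math.sqrt(max(0.0, 2.0 - 2.0 * k))

def kruskal_mst(points, q=1.0):
    """Build an MST over all point pairs with Kruskal's algorithm,
    using the kernel-induced distance as the edge weight."""
    n = len(points)
    edges = sorted(
        (feature_space_distance(points[i], points[j], q), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different components
            parent[ri] = rj
            mst.append((i, j, w))
    return mst

# Hypothetical sample: two well-separated groups of points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
mst = kruskal_mst(points, q=1.0)
```

In this toy example the tree has four edges, and only one of them is long (it bridges the two groups); edges of that kind are the natural candidates for a connectivity check.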
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60972077 and partially under Grant No. 70921061; the National Science and Technology Major Program under Grant No. 2010ZX03003-003-01; the Natural Science Foundation of Beijing under Grant No. 9092009; and the Fundamental Research Funds for the Central Universities under Grant No. 2011RC0212.
Abstract: Support vector clustering (SVC) is an important boundary-based clustering algorithm, used in many applications for its ability to handle arbitrary cluster shapes. However, its adoption is limited by its high time complexity and poor labeling performance. To overcome these problems, we present a novel, efficient, and robust convex decomposition based cluster labeling (CDCL) method that exploits the topological properties of the dataset. CDCL decomposes each implicit cluster into convex hulls, each of which consists of a subset of support vectors (SVs). A robust algorithm applied to the nearest neighboring convex hulls builds the adjacency matrix of convex hulls, from which the connected components are found; each remaining data point is then assigned the label of its nearest convex hull. The validity of the approach is supported by geometric proofs. Time complexity analysis and comparative experiments suggest that CDCL significantly improves both efficiency and clustering quality.
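The last two labeling steps in this abstract (connected components of the hull adjacency matrix, then nearest-hull assignment) can be sketched as below. This is only an illustration under stated assumptions: the adjacency matrix is taken as given, since the abstract does not specify the test applied between neighboring hulls, and the distance from a point to a hull is approximated by the distance to its nearest vertex, which is a simplification rather than the paper's method. All names and sample data are hypothetical.

```python
import math

def connected_components(adjacency):
    """Label each hull with a component id via depth-first search
    over a symmetric 0/1 adjacency matrix."""
    n = len(adjacency)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already reached from an earlier hull
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if adjacency[u][v] and labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

def nearest_hull(point, hulls):
    """Index of the hull whose nearest vertex is closest to point.
    Assumption: vertex distance stands in for true point-to-hull distance."""
    return min(
        range(len(hulls)),
        key=lambda h: min(math.dist(point, v) for v in hulls[h]),
    )

# Hypothetical hulls, each given by its vertices (support vectors).
hulls = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(1.2, 0.0), (2.0, 0.0), (1.2, 1.0)],
    [(9.0, 9.0), (10.0, 9.0), (9.0, 10.0)],
]
# Assumed result of the pairwise hull test: hulls 0 and 1 are connected.
adjacency = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
hull_labels = connected_components(adjacency)

def label_point(point):
    """Give a remaining data point the cluster label of its nearest hull."""
    return hull_labels[nearest_hull(point, hulls)]
```

Here hulls 0 and 1 fall into the same component, so a point near hull 1 receives the same cluster label as points near hull 0.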
Funding: Supported by the National Natural Science Foundation of China Key Project under Grant No. 70933003 and the National Natural Science Foundation of China under Grant Nos. 70871109 and 71203247.
Abstract: The probability of default (PD) is the key element in the New Basel Capital Accord and the most essential factor in financial institutions' risk management. To obtain good PD estimates, practitioners and academics have put forward numerous default prediction models. However, how to use multiple models together to enhance overall default prediction performance remains largely unexplored. In this paper, a combination of parametric and non-parametric models is proposed. First, a binary logistic regression model (BLRM), a support vector machine (SVM), and a decision tree (DT) are each used to establish models with relatively stable and high performance. Second, to further improve overall performance, a combination model is constructed using multiple discriminant analysis (MDA). In this way, the coverage rate of the combination model is greatly improved and the risk of misclassification is effectively reduced. Lastly, the results of the combination model are analyzed using K-means clustering, and the clustering distribution is consistent with a normal distribution. The results show that combining parametric and non-parametric models can effectively enhance overall default prediction performance.