Support vector clustering (SVC) is an important boundary-based clustering algorithm in multiple applications for its capability of handling arbitrary cluster shapes.However,SVC's popularity is degraded by its highl...Support vector clustering (SVC) is an important boundary-based clustering algorithm in multiple applications for its capability of handling arbitrary cluster shapes.However,SVC's popularity is degraded by its highly intensive time complexity and poor label performance.To overcome such problems,we present a novel efficient and robust convex decomposition based cluster labeling (CDCL) method based on the topological property of dataset.The CDCL decomposes the implicit cluster into convex hulls and each one is comprised by a subset of support vectors (SVs).According to a robust algorithm applied in the nearest neighboring convex hulls,the adjacency matrix of convex hulls is built up for finding the connected components;and the remaining data points would be assigned the label of the nearest convex hull appropriately.The approach's validation is guaranteed by geometric proofs.Time complexity analysis and comparative experiments suggest that CDCL improves both the efficiency and clustering quality significantly.展开更多
The support vector machine (SVM) is a novel machine learning tool in data mining. In this paper, the geometric approach based on the compressed convex hull (CCH) with a mathematical framework is introduced to solv...The support vector machine (SVM) is a novel machine learning tool in data mining. In this paper, the geometric approach based on the compressed convex hull (CCH) with a mathematical framework is introduced to solve SVM classification problems. Compared with the reduced convex hull (RCH), CCH preserves the shape of geometric solids for data sets; meanwhile, it is easy to give the necessary and sufficient condition for determining its extreme points. As practical applications of CCH, spare and probabilistic speed-up geometric algorithms are developed. Results of numerical experiments show that the proposed algorithms can reduce kernel calculations and display nice performances.展开更多
Towards optimal k-prototype discovery,k-means-like algorithms give us inspirations of central samples collection,yet the unstable seed samples selection,the hypothesis of a circle-like pattern,and the unknown K are st...Towards optimal k-prototype discovery,k-means-like algorithms give us inspirations of central samples collection,yet the unstable seed samples selection,the hypothesis of a circle-like pattern,and the unknown K are still challenges,particularly for non-predetermined data patterns.We propose an adaptive k-prototype clustering method(kProtoClust)which launches cluster exploration with a sketchy division of K clusters and finds evidence for splitting and merging.On behalf of a group of data samples,support vectors and outliers from the perspective of support vector data description are not the appropriate candidates for prototypes,while inner samples become the first candidates for instability reduction of seeds.Different from the representation of samples in traditional,we extend sample selection by encouraging fictitious samples to emphasize the representativeness of patterns.To get out of the circle-like pattern limitation,we introduce a convex decomposition-based strategy of one-cluster-multiple-prototypes in which convex hulls of varying sizes are prototypes,and accurate connection analysis makes the support of arbitrary cluster shapes possible.Inspired by geometry,the three presented strategies make kProtoClust bypassing the K dependence well with the global and local position relationship analysis for data samples.Experimental results on twelve datasets of irregular cluster shape or high dimension suggest that kProtoClust handles arbitrary cluster shapes with prominent accuracy even without the prior knowledge K.展开更多
基金supported by the National Natural Science Foundation of China under Grant No. 60972077 and partially under Grant No. 70921061the National Science and Technology Major Program under Grant No. 2010ZX03003-003-01+1 种基金the Natural Science Foundation of Beijing under Grant No. 9092009the Fundamental Research Funds for the Central Universities under Grant No.2011RC0212
文摘Support vector clustering (SVC) is an important boundary-based clustering algorithm in multiple applications for its capability of handling arbitrary cluster shapes.However,SVC's popularity is degraded by its highly intensive time complexity and poor label performance.To overcome such problems,we present a novel efficient and robust convex decomposition based cluster labeling (CDCL) method based on the topological property of dataset.The CDCL decomposes the implicit cluster into convex hulls and each one is comprised by a subset of support vectors (SVs).According to a robust algorithm applied in the nearest neighboring convex hulls,the adjacency matrix of convex hulls is built up for finding the connected components;and the remaining data points would be assigned the label of the nearest convex hull appropriately.The approach's validation is guaranteed by geometric proofs.Time complexity analysis and comparative experiments suggest that CDCL improves both the efficiency and clustering quality significantly.
基金Supported by the National Natural Science Foundation of China (No. 30571059)the National High-Tech Research and Development Program of China (No. 2006AA02Z190)Shanghai Leading Academic Discipline Project (No. 530405)
文摘The support vector machine (SVM) is a novel machine learning tool in data mining. In this paper, the geometric approach based on the compressed convex hull (CCH) with a mathematical framework is introduced to solve SVM classification problems. Compared with the reduced convex hull (RCH), CCH preserves the shape of geometric solids for data sets; meanwhile, it is easy to give the necessary and sufficient condition for determining its extreme points. As practical applications of CCH, spare and probabilistic speed-up geometric algorithms are developed. Results of numerical experiments show that the proposed algorithms can reduce kernel calculations and display nice performances.
基金supported by the National Natural Science Foundation of China under Grant No.62162009the Key Technologies R&D Program of He’nan Province under Grant No.242102211065+1 种基金the Scientific Research Innovation Team of Xuchang University under GrantNo.2022CXTD003Postgraduate Education Reform and Quality Improvement Project of Henan Province under Grant No.YJS2024JD38.
文摘Towards optimal k-prototype discovery,k-means-like algorithms give us inspirations of central samples collection,yet the unstable seed samples selection,the hypothesis of a circle-like pattern,and the unknown K are still challenges,particularly for non-predetermined data patterns.We propose an adaptive k-prototype clustering method(kProtoClust)which launches cluster exploration with a sketchy division of K clusters and finds evidence for splitting and merging.On behalf of a group of data samples,support vectors and outliers from the perspective of support vector data description are not the appropriate candidates for prototypes,while inner samples become the first candidates for instability reduction of seeds.Different from the representation of samples in traditional,we extend sample selection by encouraging fictitious samples to emphasize the representativeness of patterns.To get out of the circle-like pattern limitation,we introduce a convex decomposition-based strategy of one-cluster-multiple-prototypes in which convex hulls of varying sizes are prototypes,and accurate connection analysis makes the support of arbitrary cluster shapes possible.Inspired by geometry,the three presented strategies make kProtoClust bypassing the K dependence well with the global and local position relationship analysis for data samples.Experimental results on twelve datasets of irregular cluster shape or high dimension suggest that kProtoClust handles arbitrary cluster shapes with prominent accuracy even without the prior knowledge K.