Many classical clustering algorithms perform well when their prerequisites are met but do not scale well to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) is proposed to address this problem. DP cuts the source data set into data blocks and extracts the eigenvector of each data block to form a local feature set. The local feature set is then used in a second round of feature aggregation over the source data to find the global eigenvector. Finally, the data set is assigned according to the global eigenvector by the minimum-distance criterion. Experimental results show that DP is more robust than conventional clustering methods. Its insensitivity to data dimensionality, data distribution, and the number of natural clusters gives it a wide range of applications in clustering VLDS.
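The divide, extract, aggregate, and assign pipeline described in this abstract can be sketched roughly as follows. The abstract does not say how the per-block "eigenvector" is computed, so this sketch reads it loosely as a set of representative feature vectors and uses plain k-means as a stand-in for both extraction rounds; that choice, the function names `nearest`, `kmeans`, and `dp_cluster`, and the parameters `n_blocks` and `local_k` are all assumptions made for illustration, not the authors' method.

```python
import random

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means, used here only as a stand-in feature extractor."""
    centers = [list(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def dp_cluster(data, k, n_blocks, local_k):
    """Hypothetical DP-style pipeline; every block needs at least local_k points."""
    # Division: cut the source data into contiguous blocks.
    size = -(-len(data) // n_blocks)  # ceiling division
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    # Local step: a small set of feature vectors per block.
    local_features = [c for block in blocks for c in kmeans(block, local_k)]
    # Second round: aggregate the local features into k global feature vectors.
    global_features = kmeans(local_features, k)
    # Assignment: each point goes to its nearest global feature vector.
    labels = [nearest(p, global_features) for p in data]
    return global_features, labels
```

Under these assumptions, only one block and the much smaller local feature set are clustered at a time, which is where the scalability argument in the abstract would come from; the final assignment is a single pass over the data.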
Monotonic regression (MR) is a least distance problem with monotonicity constraints induced by a partially ordered data set of observations. In our recent publication [In Ser. Nonconvex Optimization and Its Applications, Springer-Verlag, (2006) 83, pp. 25-33], the Pool-Adjacent-Violators algorithm (PAV) was generalized from completely to partially ordered data sets (posets). The new algorithm, called CPAV, is characterized by very low computational complexity, which is of second order in the number of observations. It treats the observations in consecutive order, and it can follow any arbitrarily chosen topological order of the poset of observations. The CPAV algorithm produces a sufficiently accurate solution to the MR problem, but the accuracy depends on the chosen topological order. Here we prove that there exists a topological order for which the resulting CPAV solution is optimal. Furthermore, we present results of extensive numerical experiments, from which we draw conclusions about the most and the least preferable topological orders.
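For orientation, the classical PAV algorithm that CPAV generalizes can be sketched for the completely ordered case. This is a minimal sketch that assumes a least-squares distance and unit weights, neither of which is stated in the abstract; it does not implement CPAV or handle partial orders, and the function name `pav` is chosen here for illustration.

```python
def pav(values):
    """Least-squares non-decreasing fit for a completely ordered sequence."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Pool adjacent blocks while the earlier block's mean exceeds the later one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    # Every observation in a block takes the block mean as its fitted value.
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)
    return fit

print(pav([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

Here the violating pair (3, 2) is pooled into its mean 2.5, after which the sequence is non-decreasing.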
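This paper proposes a method for rapidly building a rough data model (RDM) using supervised fuzzy clustering in the input-output product space. The method combines the classification-quality performance index of the RDM with the Gustafson-Kessel (G-K) clustering algorithm, which has favorable properties. By introducing the concept of the presumed membership degree of a data point in a fuzzy cluster, it gives a way to convert a fuzzy clustering model into a rough data model. On this basis, an effective algorithm is designed that obtains the RDM by iteratively solving the two necessary-condition equations for minimizing the objective function. The multidimensional search of Kowalczyk's method is thereby reduced to a one-dimensional search parameterized by the number of clusters, which greatly reduces the optimization time. Compared with traditional rough set theory and Kowalczyk's method, the proposed method has better data summarization and noise handling capabilities. Finally, experiments on different data sets demonstrate the effectiveness of the method.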
Funding: Supported by the National Natural Science Foundation of China (60675039), the National High Technology Research and Development Program of China (863 Program) (2006AA04Z217), and the Hundred Talents Program of the Chinese Academy of Sciences.
Funding: Projects (60903082, 60975042) supported by the National Natural Science Foundation of China; Project (20070217043) supported by the Research Fund for the Doctoral Program of Higher Education of China.