Funding: supported by the National Natural Science Foundation of China (60736021) and the Joint Funds of NSFC-Guangdong Province (U0735003)
Abstract: Kernel-based methods work by embedding the data into a feature space and then searching for a linear hypothesis among the embedded data points. Performance depends largely on which kernel is used, and a promising approach is to learn the kernel from the data automatically. A general regularized risk functional (RRF) criterion for kernel matrix learning is proposed. Compared with the RRF criterion, the general RRF criterion takes into account the geometric distributions of the embedded data points. It is proven that the distance between different geometric distributions can be estimated by their centroid distance in the reproducing kernel Hilbert space. Using this criterion for kernel matrix learning leads to a convex quadratically constrained quadratic programming (QCQP) problem. For several commonly used loss functions, the corresponding mathematical formulations are given. Experimental results on a collection of benchmark data sets demonstrate the effectiveness of the proposed method.
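The centroid-distance idea above can be illustrated with a short sketch: the squared distance between the RKHS centroids (means) of two samples is computable purely from kernel evaluations. The Gaussian kernel, bandwidth, and sample sizes below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def centroid_distance_sq(X, Y, gamma=1.0):
    """Squared RKHS distance between the centroids of two samples:
    ||mu_X - mu_Y||^2 = mean k(x,x') - 2 mean k(x,y) + mean k(y,y')."""
    kxx = rbf_kernel(X, X, gamma).mean()
    kxy = rbf_kernel(X, Y, gamma).mean()
    kyy = rbf_kernel(Y, Y, gamma).mean()
    return kxx - 2 * kxy + kyy

rng = np.random.default_rng(0)
near = rng.normal(0.0, 1.0, (200, 2))
far = rng.normal(5.0, 1.0, (200, 2))
same = rng.normal(0.0, 1.0, (200, 2))

# Samples drawn from distant distributions have distant RKHS centroids.
print(centroid_distance_sq(near, far) > centroid_distance_sq(near, same))
```

A sample compared with itself has centroid distance exactly zero, so the quantity behaves like a (squared) metric between empirical distributions.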
Abstract: Polar codes defined by a kernel matrix are a class of codes with low encoding-decoding complexity that can achieve the Shannon limit. In this paper, a novel method to construct the 2<sup>n</sup>-dimensional kernel matrix is proposed. It is based on primitive BCH codes and makes use of interception, the direct sum, and the addition of a row and a column. To ensure polarization of the kernel matrix, a solution is also put forward for the case in which the partial distances of the constructed kernel matrix exceed their upper bound, and the lower bound on the exponent of the 2<sup>n</sup>-dimensional kernel matrix is obtained. This lower bound on the exponent of the constructed kernel matrix is tighter than the Gilbert-Varshamov (G-V) type bound, and the scaling exponent is better in the 16-dimensional case.
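The partial distances and exponent that govern polarization of a kernel matrix can be computed directly from their standard definitions in the polarization literature (this is a generic brute-force sketch over GF(2), not the paper's BCH-based construction), shown here for Arikan's original 2×2 kernel:

```python
import numpy as np
from itertools import product

def partial_distances(G):
    """Partial distances D_i: Hamming distance from row i of the binary
    kernel G to the GF(2) linear span of the rows below it."""
    G = np.asarray(G) % 2
    l = len(G)
    D = []
    for i in range(l):
        below = G[i + 1:]
        best = G.shape[1] + 1
        # enumerate every GF(2) combination of the rows below row i
        for coeffs in product([0, 1], repeat=len(below)):
            if len(below):
                v = (np.array(coeffs) @ below) % 2
            else:
                v = np.zeros(G.shape[1], dtype=int)
            best = min(best, int(((G[i] + v) % 2).sum()))
        D.append(best)
    return D

def exponent(G):
    """E(G) = (1/l) * sum_i log_l(D_i), the rate of polarization."""
    D = partial_distances(G)
    l = len(D)
    return sum(np.log(d) / np.log(l) for d in D) / l

G2 = [[1, 0], [1, 1]]          # Arikan's original 2x2 kernel
print(partial_distances(G2))   # [1, 2]
print(exponent(G2))            # 0.5
```

The exponent 0.5 of the 2×2 kernel is the baseline that larger, carefully constructed kernels (such as the BCH-based ones above) aim to exceed.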
Funding: supported in part by the National Natural Science Foundation of China under Grant No. 60678049 and the Natural Science Foundation of Tianjin under Grant No. 07JCYBJC14600
Abstract: We study support vector machines (SVMs) for which the kernel matrix is not specified exactly and is only known to belong to a given uncertainty set. We consider uncertainties that arise from two sources: (i) data measurement uncertainty, which stems from the statistical errors of the input samples; (ii) kernel combination uncertainty, which stems from the weights of the individual kernels that need to be optimized in the multiple kernel learning (MKL) problem. Previous work has studied uncertainty sets that allow the corresponding SVMs to be reformulated as semi-definite programs (SDPs), which are, however, very computationally expensive. Our focus in this paper is to identify uncertainty sets that allow the corresponding SVMs to be reformulated as second-order cone programs (SOCPs), since both the worst-case complexity and the practical computational effort required to solve SOCPs are at least an order of magnitude less than those needed to solve SDPs of comparable size. In the main part of the paper we propose four uncertainty sets that meet this criterion. Experimental results are presented to confirm the validity of these SOCP reformulations.
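The kernel-combination source of uncertainty rests on a basic fact that can be checked numerically: any convex combination of positive semi-definite base kernels is itself a valid (PSD) kernel, which is what makes the MKL weight vector a legitimate decision variable. The two base kernels below (linear and Gaussian) are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))

# Two base kernel matrices on the same data: linear and Gaussian.
K_lin = X @ X.T
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_rbf = np.exp(-0.5 * sq)

def is_psd(K, tol=1e-9):
    """Check positive semi-definiteness via the smallest eigenvalue."""
    return np.linalg.eigvalsh((K + K.T) / 2).min() >= -tol

# Every weight vector mu on the simplex yields a valid combined kernel.
for mu in [(1.0, 0.0), (0.3, 0.7), (0.5, 0.5)]:
    K = mu[0] * K_lin + mu[1] * K_rbf
    print(is_psd(K))
```

Because the PSD cone is convex, searching over the simplex of weights never leaves the set of valid kernels; the uncertainty sets proposed in the paper exploit structure of this kind to stay within SOCP-representable constraints.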
Abstract: The kernel energy method (KEM) has been shown to provide fast and accurate molecular energy calculations for molecules at their equilibrium geometries. KEM breaks a molecule into smaller subsets, called kernels, for the purposes of calculation. The results from the kernels are summed according to an expression characteristic of KEM to obtain the full-molecule energy. A generalization of the kernel expansion to density matrices provides the full-molecule density matrix and orbitals. In this study, the kernel expansion for the density matrix is examined in the context of density functional theory (DFT) Kohn-Sham (KS) calculations. A kernel expansion for the one-body density matrix, analogous to the kernel expansion for the energy, is defined and then converted into a normalized projector by using the Clinton algorithm. Such normalized projectors are factorizable into linear combination of atomic orbitals (LCAO) matrices that deliver full-molecule Kohn-Sham molecular orbitals in the atomic orbital basis. Both straightforward KEM energies and energies from a normalized, idempotent density matrix, obtained from a density matrix kernel expansion to which the Clinton algorithm has been applied, are compared to reference energies obtained from calculations on the full system without any kernel expansion. Calculations were performed both for a simple proof-of-concept system consisting of three atoms in a linear configuration and for a water cluster consisting of twelve water molecules. In the case of the proof-of-concept system, calculations were performed using the STO-3G and 6-31G(d,p) bases over a range of atomic separations, some very far from equilibrium. The water cluster was calculated in the 6-31G(d,p) basis at an equilibrium geometry. The normalized projector energies are more accurate than the straightforward KEM energy results in nearly all cases. In the case of the water cluster, the energy of the normalized projector is approximately four times more accurate than the straightforward KEM energy result. The KS density matrices of this study are applicable to quantum crystallography.
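The KEM summation rule can be illustrated on a toy model: with the standard double-kernel expansion, the full energy is reconstructed as the sum of all double-kernel energies minus (n-2) times the sum of single-kernel energies. The "energy" below is a hypothetical stand-in (random one-body plus pairwise terms) for an actual quantum chemistry calculation; for such pairwise-only energies the reconstruction is exact, which is why KEM is accurate when higher-order interactions are small.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n = 5                        # number of kernels (molecular fragments)
e1 = rng.normal(size=n)      # toy one-body fragment energies
e2 = np.triu(rng.normal(size=(n, n)), 1)  # toy pairwise interaction energies

def energy(subset):
    """Toy energy of a fragment subset: one-body plus pairwise terms only."""
    s = sum(e1[a] for a in subset)
    s += sum(e2[a][b] for a, b in combinations(sorted(subset), 2))
    return s

# KEM reconstruction from single- and double-kernel calculations:
# E ~ sum_{a<b} E_ab - (n - 2) * sum_a E_a
kem = sum(energy({a, b}) for a, b in combinations(range(n), 2)) \
      - (n - 2) * sum(energy({a}) for a in range(n))

print(np.isclose(kem, energy(set(range(n)))))
```

Each double-kernel energy counts its two one-body terms once, so across all n-1 pairs containing a given fragment its one-body energy is counted n-1 times; subtracting (n-2) copies leaves exactly one, while every pairwise term appears exactly once.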
Funding: supported by the Science Foundation of Zhejiang Province (Y604003)
Abstract: This paper concerns norm estimates for Mercer kernel matrices. Lower and upper bound estimates of the Rayleigh entropy numbers are obtained for some Mercer kernel matrices on [0, 1] × [0, 1] based on the Bernstein-Durrmeyer operator kernel. Using these estimates together with the approximation property of the Bernstein-Durrmeyer operator, lower and upper bounds on the Rayleigh entropy number and the l<sub>2</sub>-norm for general Mercer kernel matrices on [0, 1] × [0, 1] are provided.
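A small numerical sketch of the kind of quantity estimated here: for a symmetric Gram matrix built from a Mercer kernel on [0, 1], the matrix 2-norm equals the largest eigenvalue (the maximal Rayleigh quotient) and is bounded above by the trace. The kernel below, K(x, y) = min(x, y) (the Brownian-motion covariance), is a simple illustrative Mercer kernel, not the Bernstein-Durrmeyer kernel studied in the paper.

```python
import numpy as np

# Sample points in [0, 1] and the Gram matrix of a Mercer kernel.
x = np.linspace(0.01, 1.0, 100)
K = np.minimum(x[:, None], x[None, :])   # K(x, y) = min(x, y)

eig = np.linalg.eigvalsh(K)              # ascending real eigenvalues

print(eig.min() >= -1e-10)                        # Mercer kernel => PSD Gram matrix
print(np.isclose(np.linalg.norm(K, 2), eig[-1]))  # 2-norm = top eigenvalue
print(eig[-1] <= np.trace(K) + 1e-10)             # crude trace upper bound
```

Entropy-number and norm bounds of the kind proved in the paper sharpen such crude spectral bounds using the specific structure of the kernel.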
Funding: supported by the Research Grants Council of Hong Kong under Grant No. 17301214, HKU CERG Grants, the Fundamental Research Funds for the Central Universities, the Research Funds of Renmin University of China, the Hung Hing Ying Physical Research Grant, and the Natural Science Foundation of China under Grant No. 11271144
Abstract: Driven by the challenge of integrating large amounts of experimental data, classification has emerged as one of the major and most popular tools in computational biology and bioinformatics research. Machine learning methods, especially kernel methods such as Support Vector Machines (SVMs), are very popular and effective tools. From the perspective of the kernel matrix, a technique called Eigen-matrix translation has been introduced for protein data classification. The Eigen-matrix translation strategy has many nice properties that deserve further exploration. This paper investigates the major role of Eigen-matrix translation in classification. The authors propose that its importance lies in the dimension reduction of the predictor attributes within the data set, which matters most when the feature dimension is huge. Numerical experiments on real biological data sets show that the proposed framework is crucial and effective in improving classification accuracy. It can therefore serve as a novel perspective for future research on dimension reduction problems.
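The paper's exact Eigen-matrix translation is not reproduced here, but the general mechanism of translating a kernel matrix along one of its eigen-directions can be sketched: a rank-one shift along an eigenvector moves only the corresponding eigenvalue and leaves the rest of the spectrum untouched, which is how such translations can selectively emphasize or suppress individual feature directions. All quantities below (data, shift size tau) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 10))
K = X @ X.T                      # a PSD kernel (Gram) matrix

w, U = np.linalg.eigh(K)         # ascending eigenvalues and eigenvectors
u = U[:, -1]                     # leading eigenvector

tau = 2.0
K_t = K + tau * np.outer(u, u)   # rank-one "translation" along u

w_t = np.linalg.eigvalsh(K_t)
# Only the leading eigenvalue moves by tau; the rest are unchanged.
print(np.isclose(w_t[-1], w[-1] + tau))
print(np.allclose(w_t[:-1], w[:-1]))
```

Suppressing (rather than boosting) the dominant directions in this way is one interpretation of how a kernel-matrix translation can act as an implicit dimension reduction of the predictor attributes.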
Abstract: We exploit the theory of reproducing kernels to deduce a matrix inequality for the inverse of the restriction of a positive definite Hermitian matrix.
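The flavor of such an inequality can be checked numerically. The sketch below assumes the classical form of the result (provable via the Schur complement): the principal block of the inverse dominates the inverse of the principal block, in the positive semi-definite order. The matrix sizes and block choice are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
A = M @ M.conj().T + 6 * np.eye(6)   # a positive definite Hermitian matrix

k = 3
A11 = A[:k, :k]                      # restriction: leading principal block
B = np.linalg.inv(A)[:k, :k]         # the same block of the full inverse

# (A^{-1})_{11} - (A_{11})^{-1} should be positive semi-definite:
# inverting then restricting dominates restricting then inverting.
diff = B - np.linalg.inv(A11)
print(np.linalg.eigvalsh((diff + diff.conj().T) / 2).min() >= -1e-10)
```

Equality holds exactly when the off-diagonal block coupling the restriction to the rest of the matrix vanishes.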