Funding: Project supported by the National Basic Research Program (973) of China (No. 2002CB312200) and the Center for Bioinformatics Program Grant of the Harvard Center of Neurodegeneration and Repair, Harvard Medical School, Harvard University, Boston, USA
Abstract: In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables, the small number of samples, and the non-linearity of the data. It is difficult to obtain satisfactory results with conventional linear statistical methods. Recursive feature elimination based on support vector machines (SVM RFE) is an effective algorithm that integrates gene selection and cancer classification into a consistent framework. In this paper, we propose a new method for selecting the parameters of this algorithm when it is implemented with Gaussian kernel SVMs: instead of the common practice of picking the apparently best parameters, a genetic algorithm searches for a pair of optimal parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative datasets, hereditary breast cancer and acute leukaemia. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
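A minimal sketch of the recursive feature elimination loop that this abstract builds on, under stated assumptions: a linear SVM trained by batch subgradient descent stands in for the Gaussian kernel SVM, the genetic-algorithm parameter search is omitted, and every name (`train_linear_svm`, `svm_rfe`, the toy data) is hypothetical rather than taken from the paper.

```python
# Illustrative only: linear-SVM RFE on toy data, not the paper's method.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * |w|^2 + mean hinge loss by batch subgradient descent.

    Assumption: no bias term, fixed learning rate, labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the regulariser
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y):
    """Return feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        # Retrain on the surviving features only.
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Linear-SVM ranking criterion: drop the feature with the smallest w_j^2.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # last eliminated = most important


# Toy data: feature 0 determines the label, features 1 and 2 carry no signal.
X = [[2.0, 0.0, 0.1], [2.0, 0.0, -0.1], [-2.0, 0.0, 0.1], [-2.0, 0.0, -0.1]]
y = [1, 1, -1, -1]
ranking = svm_rfe(X, y)
print(ranking)
```

One feature is removed per round here for clarity; with thousands of genes, removing a fraction of the remaining features per round is a common way to cut the number of retraining passes.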
Abstract: Stroke and cerebral haemorrhage are the second leading cause of death in the world after ischaemic heart disease. In this work, a dataset containing medical, physiological and environmental tests for stroke was used to evaluate machine learning methods, while deep learning and a hybrid of deep learning and machine learning were evaluated on a Magnetic Resonance Imaging (MRI) dataset for cerebral haemorrhage. In the first dataset (medical records), two features, namely diabetes and obesity, were created on the basis of the values of the corresponding features. The t-Distributed Stochastic Neighbour Embedding algorithm was applied to represent the high-dimensional dataset in a low-dimensional data space. Meanwhile, the Recursive Feature Elimination (RFE) algorithm was applied to rank the features according to priority and their correlation to the target feature and to remove the unimportant features. The features were fed into various classification algorithms, namely Support Vector Machine (SVM), K Nearest Neighbours (KNN), Decision Tree, Random Forest, and Multilayer Perceptron. All algorithms performed well. The Random Forest algorithm achieved the best performance amongst the algorithms; it reached an overall accuracy of 99%. This algorithm classified stroke cases with Precision, Recall and F1 score of 98%, 100% and 99%, respectively. In the second dataset, the MRI images were evaluated by using the AlexNet model and the AlexNet+SVM hybrid technique. The hybrid AlexNet+SVM model performed better than the AlexNet model; it reached accuracy, sensitivity, specificity and Area Under the Curve (AUC) of 99.9%, 100%, 99.80% and 99.86%, respectively.
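As a quick consistency check on the stroke metrics reported above: F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two figures. The helper name `f1_from_precision_recall` is illustrative only and does not come from the paper.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Reported Random Forest figures for stroke cases: precision 98%, recall 100%.
f1 = f1_from_precision_recall(0.98, 1.00)
print(f"F1 = {f1:.4f}")
```

The result is 1.96 / 1.98, which rounds to the 99% stated in the abstract.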
Funding: Supported by a grant from the National Natural Science Foundation of China (Grant No. 61373057) and a grant from the Zhejiang Provincial Natural Science Foundation of China (Grant No. Y1110763)
Abstract: Objective: Identification of colorectal cancer (CRC) metastasis genes is one of the most important issues in CRC research. To mine CRC metastasis-associated genes, an integrated analysis of microarray data is presented, combined with evidence from comparative genomic hybridization (CGH) data. Methods: Gene expression profile data of CRC samples were obtained from the Gene Expression Omnibus (GEO) website. The 15 important chromosomal aberration sites detected by CGH technology were used for integrated genomic and transcriptomic analysis. Significance Analysis of Microarrays (SAM) was used to detect significantly differentially expressed genes across the whole genome. The overlapping genes were selected in their corresponding chromosomal aberration regions and analyzed by using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Finally, the SVM-T-RFE gene selection algorithm was applied to identify metastasis-associated genes in CRC. Results: A minimum gene set was obtained with the minimum number (14) of genes and the highest classification accuracy (100%) in both the PRI and META datasets. A fraction of the selected genes are associated with CRC or its metastasis. Conclusions: Our results demonstrate that integrated analysis is an effective strategy for mining cancer-associated genes.