
A fast alignment-free method for protein sequence similarity and evolution analysis
Abstract This paper proposes a new fast, alignment-free method for analyzing protein sequence similarity and evolution. To characterize a protein sequence, 10 physicochemical properties of the amino acids are first condensed into 6 principal components by principal component analysis, and the principal component scores are averaged with the amino acid counts in each protein sequence as weights. The amino acid position information is then fused in to form a 26-dimensional feature vector for each protein sequence. Finally, the Euclidean distance between feature vectors is used to measure the similarity and evolutionary relationship between protein sequences. Tests on three protein sequence datasets show that the method clusters every protein sequence accurately and is simple and fast, which demonstrates its effectiveness.
Authors AI Liang (艾亮); FENG Jie (冯杰) (School of Science, Minzu University of China, Beijing 100081, China)
Source Chinese Journal of Bioinformatics (《生物信息学》), 2023, No. 3, pp. 179-186 (8 pages)
Keywords Protein sequences; Principal component analysis; Similarity; Phylogenetic trees
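The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the `PC_SCORES` table holds placeholder numbers rather than real principal component scores (in the paper these come from a PCA of 10 physicochemical properties), and the position encoding (mean relative position of each of the 20 amino acids) is only one plausible way to arrive at 6 + 20 = 26 dimensions, since the abstract does not specify how position information is fused. All function names are hypothetical.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_COMPONENTS = 6

# PLACEHOLDER values (assumption): stand-ins for the 6 principal component
# scores of each amino acid. Real scores would come from a PCA of the
# 10 physicochemical properties; these numbers carry no chemical meaning.
_rng = random.Random(0)
PC_SCORES = {
    aa: [_rng.uniform(-1.0, 1.0) for _ in range(N_COMPONENTS)]
    for aa in AMINO_ACIDS
}


def composition_part(seq):
    """Average the PC scores, weighted by amino acid counts (6 values)."""
    counts = Counter(seq)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [
        sum(counts[aa] * PC_SCORES[aa][k] for aa in AMINO_ACIDS) / total
        for k in range(N_COMPONENTS)
    ]


def position_part(seq):
    """Assumed encoding: mean relative position per amino acid (20 values)."""
    sums = dict.fromkeys(AMINO_ACIDS, 0.0)
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for i, aa in enumerate(seq, start=1):
        if aa in sums:  # non-standard residues are skipped
            sums[aa] += i / len(seq)
            counts[aa] += 1
    # Amino acids absent from the sequence contribute 0.0.
    return [sums[aa] / counts[aa] if counts[aa] else 0.0 for aa in AMINO_ACIDS]


def feature_vector(seq):
    """Concatenate both parts into a 26-dimensional vector."""
    seq = seq.upper()
    return composition_part(seq) + position_part(seq)


def euclidean_distance(seq_a, seq_b):
    """Euclidean distance between the feature vectors of two sequences."""
    return math.dist(feature_vector(seq_a), feature_vector(seq_b))


if __name__ == "__main__":
    # Toy sequences for illustration only, not taken from the paper's datasets.
    print(len(feature_vector("MKTAYIAKQR")))
    print(euclidean_distance("MKTAYIAKQR", "MKTAYIAKQK"))
```

Because every sequence maps to a fixed-length vector, a pairwise distance matrix over a dataset needs no alignment step; such a matrix could then be passed to a standard clustering or tree-building routine to obtain a phylogenetic tree.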