Abstract
Data mining techniques are used to investigate the inherent regularities of DNA sequence data, and an algorithm for DNA sequence classification is presented. Taking into account both the occurrence probability of base triplets and the relationship between adjacent amino acids, the features of DNA sequence segments of known class are extracted from the codon distribution density of each segment. Stepwise discriminant analysis is then applied to remove variables without significant discriminating power, yielding the discriminant functions for DNA sequence classification. Simulation results show that the algorithm has a simple classification formula and gives accurate classification results.
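The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes that "codon distribution density" means the relative frequency of each of the 64 base triplets, read as non-overlapping triplets from a fixed reading frame; the abstract does not give the exact definition, so this is an assumption. The function name `codon_frequencies` and the `frame` parameter are hypothetical, and the stepwise discriminant analysis itself is not shown.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 possible base triplets, in a fixed order (assumed feature layout).
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def codon_frequencies(seq, frame=0):
    """Return the relative frequency of each triplet in `seq`.

    Assumption: triplets are non-overlapping and start at offset `frame`.
    Triplets containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    valid = [t for t in triplets if set(t) <= set(BASES)]
    counts = Counter(valid)
    total = len(valid)
    if total == 0:
        # No usable triplets: return an all-zero feature vector.
        return {c: 0.0 for c in CODONS}
    return {c: counts[c] / total for c in CODONS}


if __name__ == "__main__":
    freqs = codon_frequencies("ATGATGCCC")
    # Print only the triplets that actually occur in the example segment.
    print({c: f for c, f in freqs.items() if f > 0})
```

Each segment of known class would yield one such 64-value vector; a stepwise procedure could then retain only the triplets that contribute significantly to separating the classes.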
Source
《生物数学学报》
CSCD
PKU Core Journal (北大核心)
2009, No. 2, pp. 363-368 (6 pages)
Journal of Biomathematics