Abstract: DNA N6-methyladenine (6mA) is an important epigenetic modification involved in biological processes such as gene regulation, DNA replication, and DNA repair, and it is also relevant to disease research. Accurately identifying DNA 6mA sites is therefore crucial for understanding their functions and mechanisms. Although existing DNA 6mA site prediction methods have been largely successful, there is still room for improvement in prediction accuracy and cross-species generalization. This paper proposes a hybrid deep learning model (BiLSTM→CNN) that combines a bidirectional long short-term memory network (BiLSTM) with a convolutional neural network (CNN) to improve DNA 6mA site prediction. The model first encodes DNA sequences with three schemes (one-hot, EIIP, and DNA dimer encoding) and is then tuned across different network architectures, numbers of layers, and optimizers. In extensive experiments on datasets from Rosaceae, rice, and Arabidopsis thaliana, the BiLSTM→CNN model achieves an accuracy (ACC) of 94.5% on Rosaceae, 93.8% on rice, and 86.6% on Arabidopsis. Compared with other methods, the BiLSTM→CNN model performs well in 6mA site prediction for all three plant species and shows excellent cross-species generalization.
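To make the encoding step concrete, the following is a minimal sketch of how the three schemes named in the abstract could be computed for one sequence. It is not the authors' implementation. The EIIP values are the commonly cited per-nucleotide constants. The abstract does not specify the dimer scheme, so a 16-way indicator over adjacent base pairs is assumed here. The function names (`one_hot`, `eiip`, `dimer_one_hot`, `encode`) and the choice to concatenate all features per position are likewise illustrative assumptions.

```python
from itertools import product

BASES = "ACGT"

# Commonly cited electron-ion interaction pseudopotential (EIIP) values.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# All 16 dinucleotides in lexicographic order: AA, AC, ..., TT.
DIMERS = ["".join(p) for p in product(BASES, repeat=2)]
DIMER_INDEX = {d: i for i, d in enumerate(DIMERS)}


def one_hot(seq):
    """One 4-element indicator vector per base; unknown bases map to all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def eiip(seq):
    """One EIIP value per base; unknown bases map to 0.0."""
    return [EIIP.get(base, 0.0) for base in seq.upper()]


def dimer_one_hot(seq):
    """Assumed dimer scheme: a 16-element indicator for each adjacent base pair.

    Returns len(seq) - 1 rows; pairs containing an unknown base are all zeros.
    """
    seq = seq.upper()
    rows = []
    for i in range(len(seq) - 1):
        row = [0] * len(DIMERS)
        idx = DIMER_INDEX.get(seq[i:i + 2])
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def encode(seq):
    """Concatenate features per position: 4 one-hot + 1 EIIP + 16 dimer = 21.

    The last position has no following base, so its dimer part is all zeros.
    """
    oh, ei, di = one_hot(seq), eiip(seq), dimer_one_hot(seq)
    di.append([0] * len(DIMERS))
    return [o + [e] + d for o, e, d in zip(oh, ei, di)]
```

Calling `encode` on a sequence of length L yields an L × 21 feature matrix, which is the kind of per-position input a recurrent layer followed by a convolutional layer could consume. How the original model combines or separates the three encodings is not stated in the abstract.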