Abstract
Aim To explore new methods for data processing in metabonomic studies. Methods Three preprocessing steps are proposed for metabonomic data: robust PCA is used to diagnose outlying samples; each variable's within-class difference is compared with its between-class difference to identify non-conserved (unstable) metabolic components, which are excluded before data analysis; and scale unification is applied to remove differences in scale among variables. As an example, metabonomic data from plants of four genotypes of Arabidopsis thaliana were preprocessed with these methods and then analyzed by PCA. Results and Conclusion Applying these three preprocessing methods markedly improved the clustering results and the accuracy and comprehensiveness of biomarker identification in the bioinformatic analysis of metabonomic data.
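As a rough illustration of two of the preprocessing steps named in the abstract, the sketch below filters variables by comparing within-class and between-class variation and then scales the remaining columns. It is not the authors' implementation. The abstract gives neither the exact comparison criterion nor the scaling formula, so the variance ratio, the `max_ratio` threshold, the use of autoscaling (mean-centering to unit variance), the function names, and the toy data are all assumptions made for illustration. The robust PCA outlier step is left out because the abstract does not say which variant was used.

```python
from statistics import mean, pstdev, pvariance


def within_between_ratio(values, labels):
    """Ratio of pooled within-class variance to between-class variance (assumed criterion)."""
    classes = sorted(set(labels))
    groups = [[v for v, lab in zip(values, labels) if lab == c] for c in classes]
    grand_mean = mean(values)
    n = len(values)
    within = sum(len(g) * pvariance(g) for g in groups) / n
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / n
    return float("inf") if between == 0 else within / between


def keep_stable_variables(data, labels, max_ratio=1.0):
    """Return column indices whose within/between ratio does not exceed max_ratio."""
    keep = []
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        if within_between_ratio(column, labels) <= max_ratio:
            keep.append(j)
    return keep


def autoscale(data):
    """Mean-center each column and divide by its standard deviation (assumed scaling)."""
    columns = list(zip(*data))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in data
    ]


# Hypothetical toy data: 4 samples x 3 variables, two classes.
# Column 1 varies strongly inside each class but not between classes.
data = [
    [1.0, 5.0, 100.0],
    [1.2, 1.0, 110.0],
    [3.0, 4.8, 300.0],
    [3.2, 1.2, 310.0],
]
labels = ["A", "A", "B", "B"]

stable = keep_stable_variables(data, labels)
filtered = [[row[j] for j in stable] for row in data]
scaled = autoscale(filtered)
```

In this toy case the second variable is dropped because it does not separate the classes, and the two remaining columns end up on a common scale despite differing by two orders of magnitude in the raw data.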
Source
《药学学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2006, Issue 1, pp. 47-53 (7 pages)
Acta Pharmaceutica Sinica
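Funding
Supported by the National Key Basic Research Development Program (973 Program) of the Ministry of Science and Technology of China (2004CB518902)
Supported by the National High Technology Research and Development Program of China (863 Program) (2003AA2Z347D)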