
Sampling method using optimizable K-dissimilarity for distributed data mining

Cited by: 5
Abstract: Distributed data mining (DDM) methods that rely on centralized, client-server processing have well-known shortcomings. To carry out DDM tasks effectively, a technique is needed that can obtain a diverse, representative sample from distributed data sources. This paper proposes a new data sampling algorithm for DDM environments, OptiSim-DDM. Its core is data selection based on optimizable K-dissimilarity: it combines mobile agent technology with an extended optimizable K-dissimilarity method for selecting diverse representative subsets, and it builds a diverse, representative sample of the global dataset by selecting from the distributed data sites in turn. By reducing the size of the dataset to be mined, the method lowers the time and space complexity of the mining algorithm, reduces network communication cost, and improves the efficiency of the mining task. It is suited to DDM tasks in which the data held at the different sites are interrelated and interdependent. Experimental results confirm that the method is feasible and effective.
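Source: Journal of Southeast University (Natural Science Edition), 2008, No. 3, pp. 385-389 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (70371015); Doctoral Program Research Fund for Higher Education, Ministry of Education (20040286009); Science and Technology Project of the Fujian Provincial Department of Education (JB06142).
Keywords: distributed data mining (DDM); optimizable K-dissimilarity selection method; Agent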
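
The abstract describes the selection idea only in outline, so the following is a minimal sketch under stated assumptions, not the authors' OptiSim-DDM implementation. It follows the general shape of optimizable K-dissimilarity selection: draw a subsample of up to K candidates that all lie outside an exclusion radius of the points already chosen, keep the candidate farthest from the chosen set, and recycle the rest. The mobile agent is approximated by a plain round-robin loop over in-memory lists, one list per site. The function names, the `radius` and `max_selected` parameters, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math
import random


def min_dist(point, selected):
    """Distance from `point` to its nearest already-selected point."""
    return min(math.dist(point, s) for s in selected)


def draw_subsample(pool, selected, k, radius, rng):
    """Pop random candidates from `pool` until k of them pass the radius test.

    A candidate closer than `radius` to any selected point is discarded for good.
    """
    subsample = []
    while pool and len(subsample) < k:
        cand = pool.pop(rng.randrange(len(pool)))
        if min_dist(cand, selected) >= radius:
            subsample.append(cand)
    return subsample


def optisim_round_robin(sites, k, radius, max_selected, seed=0):
    """Select a diverse sample by visiting the sites in turn (hypothetical helper).

    sites: list of lists of coordinate tuples, one list per data site.
    The round-robin visit stands in for the mobile agent (an assumption).
    """
    rng = random.Random(seed)
    pools = [list(site) for site in sites]   # local candidate pools (copies)
    recycle = [[] for _ in sites]            # per-site recycling bins
    nonempty = [i for i, p in enumerate(pools) if p]
    if not nonempty or max_selected < 1:
        return []

    # Seed the sample with one random point from the first non-empty site.
    i = nonempty[0]
    selected = [pools[i].pop(rng.randrange(len(pools[i])))]

    idle = 0  # consecutive visits to sites that are completely exhausted
    while len(selected) < max_selected and idle < len(sites):
        i = (i + 1) % len(sites)
        sub = draw_subsample(pools[i], selected, k, radius, rng)
        if not sub and recycle[i]:
            # Pool ran dry: reuse the recycled candidates once more.
            pools[i], recycle[i] = recycle[i], []
            sub = draw_subsample(pools[i], selected, k, radius, rng)
        if not sub:
            idle += 1                        # nothing usable left at this site
            continue
        idle = 0
        # Keep the candidate that is most dissimilar to the current sample.
        best = max(sub, key=lambda c: min_dist(c, selected))
        sub.remove(best)
        selected.append(best)
        recycle[i].extend(sub)               # the rest may be reconsidered later
    return selected


if __name__ == "__main__":
    data_rng = random.Random(1)
    demo_sites = [[(data_rng.random(), data_rng.random()) for _ in range(50)]
                  for _ in range(3)]
    sample = optisim_round_robin(demo_sites, k=5, radius=0.1, max_selected=10)
    print(len(sample), "points selected")
```

In this sketch a small K behaves more like random sampling (more representative), while a large K approaches a maximum-dissimilarity pick (more diverse). Only the selected points travel between sites, which is the property the abstract credits with lowering communication cost.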

