
An Application Research of Benford's Law in Hydrological Data Quality Mining (Chinese title: Benford法则在水文数据质量挖掘中的应用研究)

Cited by: 3
Abstract: To identify and remedy quality problems in data, this paper applies Benford's Law to data quality mining. The method checks whether data are plausible by analyzing the distribution of their digits, which supports data quality control. Precipitation records, a type of hydrological data, are used as the sample to validate the method. Experimental results show that the method can effectively identify anomalous records in the dataset, improves the quality of hydrological data, and shows promise for practical application.
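The abstract does not say which statistic the authors use to compare the observed digit distribution with Benford's Law, so the following is only an illustrative sketch under an assumption: a first-digit chi-square test, which is a common choice in Benford analysis but not necessarily the paper's method. Benford's Law gives the expected probability of leading digit d as P(d) = log10(1 + 1/d) for d = 1, ..., 9; the statistic is the sum over d of (O_d - E_d)^2 / E_d, compared here with the 5% critical value for 8 degrees of freedom (about 15.507). The names `BENFORD`, `CHI2_CRIT_8DF_05`, `first_digit`, and `benford_test` are hypothetical. Zero readings (for example, dry days in a precipitation series) are skipped because they have no leading digit.

```python
import math
from collections import Counter
from decimal import Decimal

# Benford's Law: P(first significant digit = d) = log10(1 + 1/d), d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Upper 5% critical value of the chi-square distribution with 8 degrees of freedom
# (9 digit classes minus 1). Assumed significance level, chosen for illustration.
CHI2_CRIT_8DF_05 = 15.507


def first_digit(x):
    """Return the first significant digit of x, or None if x is zero, NaN, or infinite."""
    x = abs(float(x))
    if x == 0.0 or not math.isfinite(x):
        return None
    # repr() yields the shortest decimal string that round-trips, so the leading
    # coefficient digit matches the written value (0.3 -> 3, not 2 from 0.2999...).
    return Decimal(repr(x)).as_tuple().digits[0]


def benford_test(values):
    """Compare observed first-digit counts with Benford's Law (hypothetical helper).

    Returns (observed counts, chi-square statistic, True if conformity is rejected at 5%).
    """
    digits = [d for d in (first_digit(v) for v in values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero finite values to test")
    counts = Counter(digits)
    # Expected count for digit d is n * P(d); Counter returns 0 for absent digits.
    chi2 = sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())
    return counts, chi2, chi2 > CHI2_CRIT_8DF_05
```

A hypothetical use would be calling `benford_test` on one station's precipitation totals and then inspecting the digits whose observed counts depart most from n·P(d). Benford's Law is generally expected to hold only for data spanning several orders of magnitude, so a rejection flags a series for closer review rather than proving that individual values are wrong.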
Source: Microelectronics & Computer (《微电子学与计算机》), 2011, No. 8, pp. 180-183, 186 (5 pages). Indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (51079040); Ministry of Water Resources "948" Program (201016)
Keywords: hydrological data; data quality; data mining; Benford's Law



