A Cube Analytical Mining Framework for Stream Data (一种流数据立方体分析挖掘框架)
Cited by: 2
Abstract: Stream data is now an important form of data, and applying OLAM (online analytical mining) to it gives analysts multi-level views of the data. OLAM, however, requires aggregating data at different granularities, whereas stream data is temporal and arrives continuously, so it cannot be scanned repeatedly. Traditional OLAP (online analytical processing) methods therefore cannot build partial materialized views directly from a stream, and because streams are very large, limited storage makes it impossible to keep every data cell. To address these problems, the paper proposes sketch cube, a sketch-based OLAM framework for stream data. Sketch cube maps any combination of dimensions to a unique natural number, prunes dimension combinations according to an upper- and lower-bound monotonicity principle, stores the remaining valid data cells in a sub-linear data structure, and builds a time-series index (an inverted index over tiled time windows) to speed up retrieval. A theoretical analysis gives the preconditions for using sketch cube, and experiments on a large real-world mobile stream data set indicate that its effectiveness, storage efficiency and accuracy can meet the requirements of real-time mining.
Source: Telecommunications Science (《电信科学》), PKU Core Journal, 2014, No. 9, pp. 61-71 (11 pages)
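Funding: Zhejiang Provincial Natural Science Foundation (No. LQ14F020002); Academic Climbing Fund for Young and Middle-aged Discipline Leaders of Zhejiang Undergraduate Universities (No. PD2013453)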
Keywords: stream data; sketch cube; online analytical mining; real-time analysis
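
The abstract says that sketch cube maps any combination of dimensions to a unique natural number but does not give the mapping. As a minimal sketch of one way this could work, the Python below folds the Cantor pairing function over a fixed-length tuple of non-negative dimension-value ids. The choice of Cantor pairing and the names `pair`, `unpair`, `encode` and `decode` are assumptions made for illustration, not the paper's actual method.

```python
from math import isqrt


def pair(a: int, b: int) -> int:
    """Cantor pairing: a bijection from pairs of naturals to naturals."""
    if a < 0 or b < 0:
        raise ValueError("ids must be non-negative integers")
    s = a + b
    return s * (s + 1) // 2 + b


def unpair(z: int) -> tuple[int, int]:
    """Inverse of pair(): recover (a, b) from the paired number z."""
    w = (isqrt(8 * z + 1) - 1) // 2  # w = a + b
    b = z - w * (w + 1) // 2
    return w - b, b


def encode(ids) -> int:
    """Fold pair() over a non-empty tuple of ids (hypothetical helper).

    The result is unique among tuples of the same length.
    """
    it = iter(ids)
    code = next(it)
    for x in it:
        code = pair(code, x)
    return code


def decode(code: int, n: int) -> tuple:
    """Invert encode() for a tuple known to have n elements."""
    out = []
    for _ in range(n - 1):
        code, x = unpair(code)
        out.append(x)
    out.append(code)
    return tuple(reversed(out))


# Example: three dimension-value ids, e.g. (region, device, hour).
key = encode((3, 1, 4))
print(key, decode(key, 3))
```

Because the codes grow roughly quadratically with each fold, a practical system would likely bound the id ranges or use a more compact pairing; the point here is only that the mapping is invertible, so one integer key can stand in for a whole dimension combination.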