Abstract
This paper studies interval-valued symbolic data obtained by packaging large-scale individual data. Because the individuals within an interval usually do not follow a uniform distribution, it investigates descriptive statistics and analysis methods for generally distributed interval symbolic data, in which the individuals may follow an arbitrary distribution within each interval. Symbolic data analysis is first outlined, and generally distributed interval variables are defined. The empirical distribution function and the empirical joint distribution function of such variables are then studied, and on this basis the computation of their descriptive statistics is discussed. Finally, a numerical example is given, and an application study set in the Chinese stock market is carried out using factor analysis for generally distributed interval symbolic data. The results show that the descriptive statistics computed in earlier work under the uniform-distribution assumption are a special case of the formulas given here. Moreover, because the method rests on empirical distribution theory, it does not require the explicit form of the distribution function that the individuals follow within an interval, and it makes full use of the individual-level information in each interval during computation.
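As a rough illustration of why the uniform assumption matters, the sketch below compares the mean of an interval variable computed from interval midpoints (which is what a uniform within-interval distribution implies) with a mean computed from the packaged individuals themselves. The function names, the toy data, and the equal weighting of units are assumptions made for this illustration; they are not the formulas given in the paper.

```python
from statistics import fmean

def uniform_mean(intervals):
    """Mean under a uniform within-interval assumption: the average of midpoints."""
    return fmean((a + b) / 2 for a, b in intervals)

def empirical_mean(units):
    """Mean using the individuals packed into each unit.

    Assumption for this sketch: every unit carries equal weight, and the
    within-unit distribution is the empirical distribution of its individuals.
    """
    return fmean(fmean(obs) for obs in units)

def empirical_cdf(units, x):
    """Equal-weight mixture of the per-unit empirical distribution functions at x."""
    return fmean(sum(v <= x for v in obs) / len(obs) for obs in units)

# Hypothetical data: two units, each a bag of individual observations.
units = [[1.0, 1.0, 1.0, 5.0], [2.0, 6.0]]
# Packaging keeps only the range of each unit as an interval [min, max].
intervals = [(min(obs), max(obs)) for obs in units]

print(uniform_mean(intervals))
print(empirical_mean(units))
print(empirical_cdf(units, 2.0))
```

In this toy data the first unit is concentrated near its lower bound, so the midpoint-based mean (3.5) overstates the mean obtained from the individuals (3.0). When the individuals are spread evenly across each interval, the two values coincide, which is consistent with the abstract's remark that the uniform case is a special case.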
Source
《系统工程理论与实践》
EI
CSSCI
CSCD
Peking University Core Journals (北大核心)
2011, Issue 12, pp. 2367-2372 (6 pages)
Systems Engineering-Theory & Practice
Funding
National Natural Science Foundation of China (70701026)
Tianjin Philosophy and Social Science Research Planning Project (TJGL11-099)
Keywords
symbolic data analysis
interval-valued data
descriptive statistics
general distribution