Abstract
Mining and discovering associations is an important foundation of big data mining and analysis. Most existing association mining methods analyze data statistically and offer little ability to judge associations in unseen data: they cannot learn from known data and generalize to new instances. This paper attempts to mine associations from a learning perspective. It gives a formal definition of association learning together with related concepts, and constructs a learning data set according to that definition, namely the two-class associated image data sets (TAID). Convolutional neural networks are used to extract association features, and associations are then discriminated with either the softmax function or the K-nearest neighbor algorithm. On this basis, three association discriminators are proposed: the associated image convolutional neural network discriminator (AICNN) and the associated image LeNet discriminator (AILeNet), which are trained end to end and use the softmax function for discrimination, and the associated image K-nearest neighbor discriminator (AIKNN), which applies the K-nearest neighbor algorithm to the association features extracted by a convolutional neural network. The three discriminators are tested on TAID. AICNN reaches a discrimination accuracy of 0.8217 with 90000 training samples of 64×64 pixels; AILeNet and AIKNN reach 0.8456 and 0.8664, respectively, with 22500 training samples of 256×256 pixels. These results demonstrate the feasibility of mining associations from a learning perspective.
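As a rough illustration of the K-nearest neighbor discrimination step described in the abstract, the sketch below votes among the nearest training pairs in feature space. It is a toy under stated assumptions, not the authors' AIKNN: the hand-written vectors stand in for CNN-extracted association features, and the helper names `pair_feature` and `knn_discriminate`, the concatenation encoding of an image pair, the Euclidean distance, and `k=3` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def pair_feature(feat_a, feat_b):
    """Concatenate two per-image feature vectors into one pair feature.

    Assumption: the abstract does not say how a pair is encoded;
    concatenation is used here only for illustration.
    """
    return list(feat_a) + list(feat_b)


def knn_discriminate(train_feats, train_labels, query, k=3):
    """Return the majority label among the k nearest training pairs.

    Labels: 1 = associated, 0 = not associated (hypothetical convention).
    """
    # Sort training pairs by Euclidean distance to the query feature.
    dists = sorted(
        (math.dist(feat, query), label)
        for feat, label in zip(train_feats, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for features that a CNN would extract from image pairs.
train_feats = [
    pair_feature([0.0, 0.1], [0.0, 0.2]),
    pair_feature([0.1, 0.0], [0.2, 0.1]),
    pair_feature([0.9, 1.0], [0.1, 0.0]),
    pair_feature([1.0, 0.9], [0.0, 0.1]),
]
train_labels = [1, 1, 0, 0]

query = pair_feature([0.05, 0.05], [0.1, 0.1])
prediction = knn_discriminate(train_feats, train_labels, query, k=3)
print(prediction)
```

In a setting like the one the abstract describes, the feature vectors would come from a trained convolutional network rather than being written by hand; the voting step itself would stay the same.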
Authors
钱宇华
张明星
成红红
Qian Yuhua; Zhang Mingxing; Cheng Honghong (Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006; School of Computer and Information Technology, Shanxi University, Taiyuan 030006)
Source
《计算机研究与发展》
EI
CSCD
PKU Core (Peking University core journal list)
2020, No. 2, pp. 424-432 (9 pages)
Journal of Computer Research and Development
Funding
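National Natural Science Foundation of China (61672332)
Shanxi Province Support Program for Top Innovative Talents
Shanxi Province Sanjin Scholars Program
Shanxi Province Research Project for Returned Overseas Scholars (2017023)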
Keywords
association (关联关系)
association learning (关联学习)
association discriminator (关联判别器)
association image data sets (关联图像数据集)
association learning criteria (关联学习准则)