[目的]本文对国内外已经发表的自然语言处理领域有关文本嵌入的研究进行较深入的分析和对比,详细描述文本嵌入的知识结构和发展脉络,以及针对不同领域、不同数据集的模型改进方法,讨论流行的嵌入模型,比较每个模型在文本嵌入中的优缺点...[目的]本文对国内外已经发表的自然语言处理领域有关文本嵌入的研究进行较深入的分析和对比,详细描述文本嵌入的知识结构和发展脉络,以及针对不同领域、不同数据集的模型改进方法,讨论流行的嵌入模型,比较每个模型在文本嵌入中的优缺点,同时指出文本嵌入所面临的挑战,提出可能的解决方案。[方法]检索Web of Science数据库、CNKI数据库和万方数据,获取国内外文本嵌入研究的相关文献,运用内容分析法对文献做系统梳理分析,对这些文献中利用的文本嵌入技术以及改进方案、建模思想、生成过程等方面进行对比与分析。[结果]经过去重和合并,保留内容最相关的61篇文献。文本嵌入方法可以归纳为三类:基于频率的文本嵌入、基于神经网络的文本嵌入和基于主题建模的文本嵌入。针对语料库的规模大小、多义词嵌入、通用嵌入的域适应等文本嵌入所面临的挑战,从被调查的研究文章中提出了可能的解决方案。展开更多
In the era of Big Data,we are faced with an inevitable and challenging problem of“overload information”.To alleviate this problem,it is important to use effective automatic text summarization techniques to obtain th...In the era of Big Data,we are faced with an inevitable and challenging problem of“overload information”.To alleviate this problem,it is important to use effective automatic text summarization techniques to obtain the key information quickly and efficiently from the huge amount of text.In this paper,we propose a hybrid method of extractive text summarization based on deep learning and graph ranking algorithms(ETSDG).In this method,a pre-trained deep learning model is designed to yield useful sentence embeddings.Given the association between sentences in raw documents,a traditional LexRank algorithm with fine-tuning is adopted fin ETSDG.In order to improve the performance of the extractive text summarization method,we further integrate the traditional LexRank algorithm with deep learning.Testing results on the data set DUC2004 show that ETSDG has better performance in ROUGE metrics compared with certain benchmark methods.展开更多
More and more embedded devices, such as mobile phones, tablet PCs and laptops, are used in every field, so huge files need to be stored or backed up into cloud storage. Optimizing the performance of cloud storage is v...More and more embedded devices, such as mobile phones, tablet PCs and laptops, are used in every field, so huge files need to be stored or backed up into cloud storage. Optimizing the performance of cloud storage is very important for Internet development. This paper presents the performance evaluation of the open source distributed storage system, a highly available, distributed, eventually consistent object/blob store from Open Stack cloud computing components. This paper mainly focuses on the mechanism of cloud storage as well as the optimization methods to process different sized files. This work provides two major contributions through comprehensive performance evaluations. First, it provides different configurations for Open Stack Swift system and an analysis of how every component affects the performance. Second, it presents the detailed optimization methods to improve the performance in processing different sized files. The experimental results show that our method improves the performance and the structure. We give the methods to optimize the object-based cloud storage system to deploy the readily available storage system.展开更多
文摘[目的]本文对国内外已经发表的自然语言处理领域有关文本嵌入的研究进行较深入的分析和对比,详细描述文本嵌入的知识结构和发展脉络,以及针对不同领域、不同数据集的模型改进方法,讨论流行的嵌入模型,比较每个模型在文本嵌入中的优缺点,同时指出文本嵌入所面临的挑战,提出可能的解决方案。[方法]检索Web of Science数据库、CNKI数据库和万方数据,获取国内外文本嵌入研究的相关文献,运用内容分析法对文献做系统梳理分析,对这些文献中利用的文本嵌入技术以及改进方案、建模思想、生成过程等方面进行对比与分析。[结果]经过去重和合并,保留内容最相关的61篇文献。文本嵌入方法可以归纳为三类:基于频率的文本嵌入、基于神经网络的文本嵌入和基于主题建模的文本嵌入。针对语料库的规模大小、多义词嵌入、通用嵌入的域适应等文本嵌入所面临的挑战,从被调查的研究文章中提出了可能的解决方案。
文摘In the era of Big Data,we are faced with an inevitable and challenging problem of“overload information”.To alleviate this problem,it is important to use effective automatic text summarization techniques to obtain the key information quickly and efficiently from the huge amount of text.In this paper,we propose a hybrid method of extractive text summarization based on deep learning and graph ranking algorithms(ETSDG).In this method,a pre-trained deep learning model is designed to yield useful sentence embeddings.Given the association between sentences in raw documents,a traditional LexRank algorithm with fine-tuning is adopted fin ETSDG.In order to improve the performance of the extractive text summarization method,we further integrate the traditional LexRank algorithm with deep learning.Testing results on the data set DUC2004 show that ETSDG has better performance in ROUGE metrics compared with certain benchmark methods.
基金performed by key technology of networking media broadcast based on cloud computing in"China Twelfth Five-Year"Plan for Science&Technology Project(Grant No.:2013BAH65F01-2013BAH65F04)NSFC(Grant No.:61472144)+3 种基金National science and technology support plan(Grant No.:2013BAH65F03,2013BAH65F04)GDSTP(Grant No.:2013B010202004,2014A010103012)GDUPS(2011)Research Fund for the Doctoral Program of Higher Education of China(Grant No.:20120172110023)
文摘More and more embedded devices, such as mobile phones, tablet PCs and laptops, are used in every field, so huge files need to be stored or backed up into cloud storage. Optimizing the performance of cloud storage is very important for Internet development. This paper presents the performance evaluation of the open source distributed storage system, a highly available, distributed, eventually consistent object/blob store from Open Stack cloud computing components. This paper mainly focuses on the mechanism of cloud storage as well as the optimization methods to process different sized files. This work provides two major contributions through comprehensive performance evaluations. First, it provides different configurations for Open Stack Swift system and an analysis of how every component affects the performance. Second, it presents the detailed optimization methods to improve the performance in processing different sized files. The experimental results show that our method improves the performance and the structure. We give the methods to optimize the object-based cloud storage system to deploy the readily available storage system.