Funding: This work was supported by the Project of Industry and University Cooperative Research of Jiangsu Province, China (No. BY2019051). Ma, J. would like to thank the Jiangsu Eazytec Information Technology Company (www.eazytec.com) for their financial support.
Abstract: Traditional topic models have been widely used to analyze semantic topics in electronic documents. However, the topic words they produce suffer from poor readability and consistency; often only domain experts can guess their meaning. In fact, phrases are the main unit people use to express semantics. This paper presents Distributed Representation-Phrase Latent Dirichlet Allocation (DR-Phrase LDA), a phrase-level topic model in which the semantic information of phrases is enhanced via distributed representations. Experimental results show that the topics produced by our model are more readable and consistent than those of similar topic models.
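For orientation, here is a minimal sketch of a phrase-based topic-modeling pipeline of the kind the abstract describes: mine candidate phrases, give them distributed representations, then run LDA over phrase tokens. DR-Phrase LDA's own inference is not public here, so standard gensim components stand in for each stage (an assumption), and the corpus is a toy.

```python
# Sketch: phrase mining -> phrase embeddings -> LDA over phrase tokens.
from gensim.models.phrases import Phrases, Phraser
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

docs = [
    ["topic", "models", "analyze", "semantic", "topics"],
    ["distributed", "representation", "enhances", "semantic", "topics"],
    ["phrase", "topic", "models", "improve", "readability"],
] * 10  # toy corpus; real input is tokenized documents

# Stage 1: detect collocations so e.g. "topic models" becomes one token.
bigrams = Phraser(Phrases(docs, min_count=2, threshold=1.0))
phrase_docs = [bigrams[d] for d in docs]

# Stage 2: distributed representations of phrase tokens (Skip-gram).
w2v = Word2Vec(phrase_docs, vector_size=50, min_count=1, sg=1, epochs=20)

# Stage 3: plain LDA over the phrase-token corpus.
dictionary = Dictionary(phrase_docs)
corpus = [dictionary.doc2bow(d) for d in phrase_docs]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.show_topics(num_topics=2, num_words=5, formatted=False):
    print(topic_id, [w for w, _ in words])
```

The point of the design is that topic-word lists then consist of multi-word phrases, which is what the paper argues makes topics readable.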
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 71171187, 71371107, and 61473284.
Abstract: The risk classification of BBS posts is important to evaluating the societal risk level within a given period. Using posts collected from the Tianya forum as the data source, the authors adopt societal risk indicators from social psychology and conduct document-level multi-class societal risk classification of BBS posts. To effectively capture the semantics and word order of documents, a shallow neural network, Paragraph Vector, is applied to obtain distributed vector representations of the posts. Based on the document vectors, the authors apply the KNN classification method to identify the societal risk category of each post. The experimental results reveal that Paragraph Vector achieves much faster training and at least a 10% improvement in F-measure over Bag-of-Words for document-level societal risk classification. Furthermore, Paragraph Vector also outperforms edit distance and a Lucene-based search method. This work is the first attempt to combine a document embedding method with social psychology research in the public opinion area.
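A minimal sketch of the pipeline the abstract describes: Paragraph Vector (Doc2Vec) document embeddings followed by a KNN classifier. The corpus, labels, and hyperparameters below are illustrative assumptions, not the paper's actual settings.

```python
# Sketch: Doc2Vec document vectors -> KNN risk-category classifier.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.neighbors import KNeighborsClassifier

posts = [("house prices keep rising in the city", "economy"),
         ("food safety scandal at a local market", "daily life"),
         ("new traffic rules announced downtown", "public order")] * 20

tagged = [TaggedDocument(words=text.split(), tags=[i])
          for i, (text, _) in enumerate(posts)]

# Train Paragraph Vector on the tokenized posts.
model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

X = [model.dv[i] for i in range(len(posts))]
y = [label for _, label in posts]

# KNN over the document vectors, as in the paper.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
new_vec = model.infer_vector("market food prices rising".split())
print(knn.predict([new_vec])[0])
```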
Funding: Supported by the Innovation Project of Graduate Students of Jiangsu Province, China under Grants No. CXZZ12_0466 and No. CXZZ11_0390; the National Natural Science Foundation of China under Grants No. 61071091, No. 61271240, No. 61201160, and No. 61172118; the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China under Grant No. 12KJB510019; the Science and Technology Research Program of Hubei Provincial Department of Education under Grants No. D20121408 and No. D20121402; and the Program for Research Innovation of Nanjing Institute of Technology under Grant No. CKJ20110006.
Abstract: Video reconstruction quality largely depends on the ability of the employed sparse domain to adequately represent the underlying video in Distributed Compressed Video Sensing (DCVS). In this paper, we propose a novel dynamic global Principal Component Analysis (PCA) sparse representation algorithm for video, based on the sparse-land model and nonlocal similarity. First, grouping by matching is performed at the decoder on previously recovered key frames. Second, we apply PCA to each group (sub-dataset) to compute the principal components from which a sub-dictionary is constructed. Finally, the non-key frames are reconstructed from random measurements using a Compressed Sensing (CS) reconstruction algorithm with sparse regularization. Experimental results show that our algorithm outperforms the DCT and K-SVD dictionaries.
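A minimal numpy sketch of the per-group PCA sub-dictionary step the abstract describes: gather patches similar to a reference patch from recovered key frames, then take the group's principal components as its dictionary atoms. Patch extraction, the measurement operator, and the CS solver are out of scope; sizes and the grouping rule are illustrative assumptions.

```python
# Sketch: group similar patches, PCA per group -> sub-dictionary.
import numpy as np

rng = np.random.default_rng(0)
patches = rng.standard_normal((500, 64))  # 500 key-frame patches, 8x8 each

def group_by_matching(patches, ref_idx, k=40):
    """Return the k patches closest (L2) to a reference patch."""
    d = np.linalg.norm(patches - patches[ref_idx], axis=1)
    return patches[np.argsort(d)[:k]]

def pca_sub_dictionary(group, n_atoms=16):
    """Principal components of a patch group, used as dictionary atoms."""
    centered = group - group.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the group.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_atoms].T  # shape (64, n_atoms)

group = group_by_matching(patches, ref_idx=0)
D = pca_sub_dictionary(group)
# Coding a patch against D is trivial here because the atoms are
# orthonormal: coefficients = D.T @ patch.
coeffs = D.T @ patches[1]
print(D.shape, coeffs.shape)
```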
Abstract: In surveying data processing, we generally assume that the observational errors are normally distributed, in which case the method of least squares gives the minimum-variance unbiased estimate of the parameters. Least squares is not robust, however, so it becomes unsuitable when a few measurements contaminated by gross errors are mixed with the others. Robust estimation methods can avoid the influence of gross errors and require no knowledge of the exact distribution of the observations, but they cause other difficulties, such as hypothesis testing for the estimated parameters when the sample size is not large. For non-normally distributed measurements we can assume that they obey the p-norm distribution. The p-norm distribution is a distributional class that includes the most frequently used distributions, such as the Laplace, normal, and rectangular distributions. It is symmetric and has a kurtosis between 3 and -6/5 when p is larger than 1. Using the p-norm distribution to describe the statistical character of the errors, the only assumption is that the error distribution is a symmetric, unimodal curve, so the method is in a sense self-adapting. However, the density function of the p-norm distribution is so complex that it makes theoretical analysis difficult, and the troublesome calculations make the method unsuitable for practice. The research of this paper indicates that the p-norm distribution can be approximately represented by a linear combination of the Laplace and normal distributions, or of the normal and rectangular distributions; which representation is taken depends on whether p lies between 1 and 2 or is larger than 2. The approximate distribution has the same first four moments as the exact one, i.e., the same mathematical expectation, variance, skewness, and kurtosis as the p-norm distribution. Because every density function used in the approximate formulae has a simple form, replacing the p-norm density with the approximate one obviously simplifies the processing of p-norm distributed data.
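For reference, a hedged sketch of the quantities involved, written in one standard parameterization of the p-norm (generalized normal) family; the paper's own notation may differ.

```latex
% One standard parameterization of the p-norm density (an assumption):
\[
  f_p(x;\mu,\alpha) \;=\; \frac{p}{2\alpha\,\Gamma(1/p)}
  \exp\!\left(-\frac{|x-\mu|^{p}}{\alpha^{p}}\right), \qquad p \ge 1,
\]
% Laplace at p=1, normal at p=2, rectangular as p -> infinity.
% Its excess kurtosis runs from 3 (p=1) down to -6/5 (p -> infinity),
% the range quoted in the abstract:
\[
  \gamma_2(p) \;=\; \frac{\Gamma(1/p)\,\Gamma(5/p)}{\Gamma(3/p)^{2}} - 3.
\]
% The approximation described above is a moment-matched two-component
% combination, with the weight w and the component scales chosen so the
% first four moments agree with f_p:
\[
  f_p(x) \;\approx\;
  \begin{cases}
    w\,f_{\mathrm{Laplace}}(x) + (1-w)\,f_{\mathrm{Normal}}(x), & 1 < p \le 2,\\[2pt]
    w\,f_{\mathrm{Normal}}(x) + (1-w)\,f_{\mathrm{Rect}}(x), & p > 2.
  \end{cases}
\]
```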
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant 61872134, in part by the Natural Science Foundation of Hunan Province under Grant 2018JJ2062, in part by the Science and Technology Development Center of the Ministry of Education under Grant 2019J01020, and in part by the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province.
Abstract: Text classification has long been a crucial topic in natural language processing. Traditional text classification methods based on machine learning have many disadvantages, such as dimension explosion, data sparsity, and limited generalization ability. Focusing on deep learning text classification, this paper presents an extensive study of text classification models, including Convolutional Neural Network-based (CNN-based), Recurrent Neural Network-based (RNN-based), and attention mechanism-based models. Many studies have shown that deep learning methods outperform traditional methods when processing large-scale, complex datasets, mainly because they avoid the cumbersome feature extraction process and achieve higher prediction accuracy on large sets of unstructured data. We also summarize the shortcomings of traditional text classification methods and introduce the deep learning text classification process: text preprocessing, distributed representation of text, construction of a deep learning classification model, and performance evaluation.
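A minimal PyTorch sketch of the CNN-based variant of the pipeline the abstract outlines: token ids, an embedding layer (the distributed representation step), 1-D convolutions with max pooling, and a linear classifier. Vocabulary size, dimensions, and class count are illustrative assumptions.

```python
# Sketch: embedding -> Conv1d + max pooling -> linear classifier.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, num_classes=4,
                 kernel_sizes=(3, 4, 5), channels=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, channels, k) for k in kernel_sizes)
        self.fc = nn.Linear(channels * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                   # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)   # (batch, embed, seq)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))    # (batch, num_classes)

model = TextCNN()
batch = torch.randint(0, 5000, (8, 40))  # 8 fake documents, 40 tokens each
logits = model(batch)
print(logits.shape)  # torch.Size([8, 4])
```

The multiple kernel sizes play the role that hand-crafted n-gram features play in traditional methods, which is why the feature extraction step can be dropped.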
Abstract: As a key technology for rapid, low-cost drug development, drug repositioning is gaining popularity. In this study, a text mining approach to discovering unknown drug-disease relations was tested. Using a word embedding algorithm, the senses of over 1.7 million words were well represented in sufficiently short feature vectors. The feasibility of our approach was tested through various analyses, including clustering and classification. Finally, our trained classification model achieved 87.6% accuracy in predicting drug-disease relations in cancer treatment and discovered novel drug-disease relations that were subsequently reported in recent studies.
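A minimal sketch of the approach the abstract describes: learn word embeddings from text, then train a classifier on (drug, disease) vector pairs to predict whether a treatment relation holds. The corpus, pairs, labels, and pair-feature construction here are toy assumptions, not the study's data or exact design.

```python
# Sketch: word2vec embeddings -> classifier over drug/disease pairs.
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
import numpy as np

sentences = [["tamoxifen", "treats", "breast", "cancer"],
             ["metformin", "treats", "diabetes"],
             ["aspirin", "relieves", "headache"]] * 30
w2v = Word2Vec(sentences, vector_size=32, min_count=1, epochs=50)

pairs = [("tamoxifen", "cancer", 1), ("metformin", "diabetes", 1),
         ("aspirin", "diabetes", 0), ("metformin", "headache", 0)]

# Feature for a pair: concatenated drug and disease vectors (assumption).
X = np.array([np.concatenate([w2v.wv[d], w2v.wv[s]]) for d, s, _ in pairs])
y = np.array([lbl for _, _, lbl in pairs])

clf = LogisticRegression(max_iter=1000).fit(X, y)
probe = np.concatenate([w2v.wv["tamoxifen"], w2v.wv["headache"]])
print(clf.predict_proba([probe])[0])  # [P(no relation), P(relation)]
```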
Funding: Supported by the National High Technology Research and Development Program of China (No. 2009AA04Z220) and the National Natural Science Foundation of China (No. 61075092).
Abstract: A new environment representation and object localization scheme is proposed to accomplish object operation tasks more efficiently in an intelligent space. First, a distributed environment representation method is put forward to reduce the storage burden and improve the system's stability. Layered topological maps are stored separately in different landmarks attached to key positions of the intelligent space, so that the robot can search for the landmarks, read the map information from their QR codes, and build the environment map autonomously. Map building is an important prerequisite for object search, so an object search scheme based on RFID and vision is proposed. RFID tags are attached to the target objects and to reference objects in the indoor environment. A fixed RFID system monitors the rough position (room and local area) of a target, and a mobile RFID system detects targets outside the coverage of the fixed system. The area containing the target is determined from the time sequence of reference-tag and target-tag readings, and the accurate position is then obtained by the onboard vision system at short range. Experiments demonstrate that the distributed environment representation fully meets the requirements of object localization, and that the positioning scheme offers high search efficiency, high localization accuracy and precision, and strong anti-interference ability in complex indoor environments.
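A minimal sketch of the coarse-localization idea: infer which reference tag's area the target occupies from the time sequence of RFID reads as the mobile reader sweeps past. The tag names, areas, and the nearest-in-time assignment rule are illustrative assumptions about the scheme, not the paper's exact algorithm.

```python
# Sketch: assign a target tag to the area of the reference tag whose
# read is closest in time to the target-tag read.
from dataclasses import dataclass

@dataclass
class Read:
    tag: str     # tag id
    t: float     # timestamp of the read (s)

# Reference tags at known areas; reads from one sweep of the reader.
reference_areas = {"ref_door": "doorway", "ref_desk": "desk area",
                   "ref_shelf": "shelf area"}
reads = [Read("ref_door", 0.4), Read("ref_desk", 2.1),
         Read("target_cup", 2.3), Read("ref_shelf", 4.0)]

def coarse_area(reads, target_tag):
    """Area of the reference tag read nearest in time to the target."""
    t_target = next(r.t for r in reads if r.tag == target_tag)
    ref_reads = [r for r in reads if r.tag in reference_areas]
    nearest = min(ref_reads, key=lambda r: abs(r.t - t_target))
    return reference_areas[nearest.tag]

print(coarse_area(reads, "target_cup"))  # -> "desk area"
```

The onboard vision system would then refine this coarse area estimate at short range, as the abstract describes.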
Funding: Projects (61203330, 61104009, 61075092) supported by the National Natural Science Foundation of China; Project (2013M540546) supported by the China Postdoctoral Science Foundation; Projects (ZR2012FM031, ZR2011FM011, ZR2010FM007) supported by the Shandong Provincial Natural Science Foundation, China; Projects (2011JC017, 2012TS078) supported by the Independent Innovation Foundation of Shandong University, China; Project (201203058) supported by the Shandong Provincial Postdoctoral Innovation Foundation, China.
Abstract: Quick response (QR) code-based artificial labels are applied to provide semantic concepts and relations about the surroundings, overcoming the complexity and limitations of semantic recognition of scenes with the robot's vision alone. By imitating the human spatial cognition mechanism, the robot continually receives information from artificial labels at cognitive-guide points across a wide, structured environment to achieve environmental perception and navigation. An immune network algorithm is used to form an environmental awareness mechanism with "distributed representation". Color recognition and SIFT feature matching are fused to achieve memory and cognition of scenario tags, and a cognition-guide-action based cognitive semantic map is built. As the map grows, the robot no longer needs to rely on the artificial labels and can plan paths and navigate freely. Experimental results show that the artificial labels designed in this work improve the robot's cognitive ability, enable navigation in semi-unknown environments, and support building the cognitive semantic map.
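A minimal OpenCV sketch of the SIFT feature-matching component the abstract mentions for recognizing scenario tags. The file names, ratio-test threshold, and match-count rule are illustrative assumptions; the paper's fusion with color recognition is not reproduced here.

```python
# Sketch: SIFT keypoints + ratio-test matching for tag recognition.
import cv2

img_tag = cv2.imread("stored_tag.png", cv2.IMREAD_GRAYSCALE)     # assumed file
img_scene = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE) # assumed file

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img_tag, None)
kp2, des2 = sift.detectAndCompute(img_scene, None)

# Match descriptors and keep pairs passing Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# A simple recognition rule: the tag is "seen" if enough matches survive.
print(f"{len(good)} good matches; tag recognized: {len(good) >= 10}")
```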
Funding: This work has been supported by the ELLIIT Network Organization for Information and Communication Technology, Sweden (Project B09), and the Swedish Foundation for Strategic Research SSF (Smart Systems Project RIT15-0097). The first author is also supported by an RExperts Program Grant 2020A1313030098 from the Guangdong Department of Science and Technology, China, in addition to a Sichuan Province International Science and Technology Innovation Cooperation Project Grant 2020YFH0160.
Abstract: In the context of collaborative robotics, distributed situation awareness is essential for supporting collective intelligence in teams of robots and human agents, where it can be used for both individual and collective decision support. This is particularly important in applications pertaining to emergency rescue and crisis management. During operational missions, data and knowledge are gathered incrementally and in different ways by heterogeneous robots and humans. We describe this as the creation of Hastily Formed Knowledge Networks (HFKNs). The focus of this paper is the specification and prototyping of a general distributed system architecture that supports the creation of HFKNs by teams of robots and humans. The information collected ranges from low-level sensor data to high-level semantic knowledge, the latter represented in part as RDF Graphs. The framework includes a synchronization protocol and associated algorithms that allow for the automatic distribution and sharing of data and knowledge between agents. This is done through the distributed synchronization of RDF Graphs shared between agents. High-level semantic queries specified in SPARQL can be used by robots and humans alike to acquire both knowledge and data content from team members. The system is empirically validated and complexity results of the proposed algorithms are provided. Additionally, a field robotics case study is described, where a 3D mapping mission was executed using several UAVs in a collaborative emergency rescue scenario using the full HFKN Framework.
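A minimal rdflib sketch of the kind of RDF/SPARQL interaction the abstract describes: agents share knowledge as RDF graphs and query it with SPARQL. The namespace, triples, and query are illustrative assumptions; the HFKN synchronization protocol itself is not shown.

```python
# Sketch: shared knowledge as an RDF graph, queried with SPARQL.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/hfkn/")  # assumed namespace
g = Graph()

# Knowledge contributed by one agent: a UAV observed a blocked road.
g.add((EX.uav1, EX.observed, EX.road42))
g.add((EX.road42, EX.status, Literal("blocked")))

# Another agent (human or robot) asks which roads are blocked.
query = """
PREFIX ex: <http://example.org/hfkn/>
SELECT ?road WHERE {
    ?agent ex:observed ?road .
    ?road ex:status "blocked" .
}
"""
for row in g.query(query):
    print(row.road)  # -> http://example.org/hfkn/road42
```

In the framework described, it is the graph `g` that the synchronization protocol would keep consistent across agents, so any team member can pose such queries locally.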
Funding: The National Natural Science Foundation of China (Grant Nos. 61751201 and 61672162), the Shanghai Municipal Science and Technology Major Project (Grant No. 2018SHZDZX01), and ZJLab.
Abstract: Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs with a taxonomy built on four different perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is intended to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.
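A minimal sketch of the adaptation step the survey covers: load a pre-trained encoder with a fresh classification head and fine-tune it on a downstream task. This uses the Hugging Face transformers API; the checkpoint name, label count, and toy batch are illustrative assumptions.

```python
# Sketch: fine-tune a pre-trained model on a downstream classification task.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "bert-base-uncased"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tokenizer(["a great movie", "a dull movie"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One fine-tuning step: the new head is trained and the encoder adapted.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
print(float(loss))
```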
Funding: Partially supported by the National Natural Science Foundation of China (Nos. 61302077, 61520106007, 61421061, and 61602048).
Abstract: Knowledge graph representation has been a long-standing goal of artificial intelligence. In this paper, we consider a method for knowledge graph embedding of hyper-relational data, which are commonly found in knowledge graphs. Previous models such as Trans(E, H, R) and CTransR are either insufficient for embedding hyper-relational data or focus on projecting an entity into multiple embeddings, which might not be effective for generalization nor accurately reflect real knowledge. To overcome these issues, we propose the novel model TransHR, which transforms the hyper-relations between a pair of entities into an individual vector serving as a translation between them. We experimentally evaluate our model on two typical tasks: link prediction and triple classification. The results demonstrate that TransHR significantly outperforms Trans(E, H, R) and CTransR, especially for hyper-relational data.
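For background, a minimal numpy sketch of the translation-based scoring idea this family of models shares: relations act as vector translations between entity embeddings, and a triple (h, r, t) is plausible when h + r is close to t. This is the classic TransE score that TransHR builds on; TransHR's own hyper-relation-to-vector transformation is not reproduced here, and the embeddings below are random rather than trained.

```python
# Sketch: TransE-style translation scoring of knowledge graph triples.
import numpy as np

rng = np.random.default_rng(0)
dim = 16
entities = {name: rng.standard_normal(dim) for name in
            ["beijing", "china", "paris", "france"]}
relations = {"capital_of": rng.standard_normal(dim)}

def score(h, r, t):
    """Negative translation error; higher means more plausible."""
    return -np.linalg.norm(entities[h] + relations[r] - entities[t])

# With trained embeddings, true triples would outscore corrupted ones:
print(score("beijing", "capital_of", "china"),
      score("beijing", "capital_of", "france"))
```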