When the similar single-difference methodology (SSDM) is used to solve for the deformation values of monitoring points, the resulting deformation information series is sometimes unstable. To overcome this shortcoming, a Kalman filtering algorithm for the series is established, and its correctness and validity are verified with test data obtained on a platform movable in a plane. The results show that Kalman filtering can improve the correctness, reliability and stability of the deformation information series.
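As a rough illustration of how Kalman filtering can stabilize a noisy deformation series, the sketch below applies a scalar filter with a random-walk state model. The state model, the function name `kalman_smooth`, and the variances `process_var` and `meas_var` are assumptions made for this example; the abstract does not describe the authors' actual filter design.

```python
def kalman_smooth(series, process_var=1e-4, meas_var=1e-2):
    """Filter a 1-D series with a scalar random-walk Kalman filter.

    Assumed model (illustrative only): the true deformation changes slowly
    (variance process_var per epoch) and each epoch's solution is the true
    value plus noise of variance meas_var.
    """
    if not series:
        return []
    x = series[0]   # state estimate, initialized from the first observation
    p = meas_var    # variance of the state estimate
    filtered = [x]
    for z in series[1:]:
        p = p + process_var      # predict: uncertainty grows between epochs
        k = p / (p + meas_var)   # Kalman gain, always between 0 and 1
        x = x + k * (z - x)      # update: move toward the new observation
        p = (1.0 - k) * p        # variance shrinks after the update
        filtered.append(x)
    return filtered


# Hypothetical deformation values that jump around from epoch to epoch.
noisy = [0.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(kalman_smooth(noisy))
```

A larger `meas_var` relative to `process_var` makes the filter trust each epoch's solution less and smooth more strongly; both values would need tuning against real test data.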
A new similar single-difference mathematical model (SSDM) and its corresponding algorithm are proposed to solve for the deformation of a monitoring point directly in a single epoch. The method for building the SSDM is introduced in detail, the main error sources affecting the accuracy of deformation measurement are analyzed briefly, and the basic algorithm and steps for solving the deformation are discussed. To validate the correctness and accuracy of the similar single-difference model, a test with five dual-frequency receivers was carried out in Feb. 2001 on a slideway that moved in a plane. Five sessions were observed in the test. The numerical results of the test data show that the proposed model is correct.
The K-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. The improved algorithm avoids the random selection of initial center points, so it provides effective initial points for the clustering process and reduces the fluctuation in clustering results caused by the choice of initial points, yielding better clustering quality. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and the clustering results are more stable.
The molecular similarity of 139 organic compounds was calculated by the topological index method, and the flexible super-ball algorithm was used to scan for similar molecules and structures. The results show that the properties of organic compounds estimated by this method are reliable.
The fundamental problem of similarity studies, in the framework of data mining, is to examine and detect similar items in very large articles, papers, and books. In this paper, we are interested in the probabilistic, statistical, and algorithmic aspects of the study of texts. We use the approach of k-shinglings, a k-shingling being defined as a sequence of k consecutive characters extracted from a text (k ≥ 1). The main challenge in this field is to find accurate and fast algorithms that compute the similarity in a short time. This is achieved by using approximation methods. The first approximation method is statistical and is based on the Glivenko-Cantelli theorem. The second is the banding technique. The third is a modification of the algorithm proposed by Rajaraman et al. ([1]), denoted here as (RUM). The Jaccard index is the similarity measure used in this paper. Finally, we illustrate the results of the paper on the four Gospels. The results are very conclusive.
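The definitions in this abstract can be made concrete with a small sketch that builds the set of k-shinglings of a text and computes the exact Jaccard index between two such sets. This is the exact baseline only, not any of the three approximation methods the paper studies. The function names `shingles` and `jaccard` are made up for this example, and treating two empty sets as fully similar is an assumed convention.

```python
def shingles(text, k):
    """Return the set of all runs of k consecutive characters in text."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


# "abcd" and "bcde" share the 2-shinglings "bc" and "cd" out of four distinct ones.
print(jaccard(shingles("abcd", 2), shingles("bcde", 2)))  # prints 0.5
```

For book-sized texts the shingle sets become very large, which is the cost that approximation methods such as the ones named in the abstract aim to reduce.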
In this paper, we propose an improved hybrid semantic matching algorithm that combines Input/Output (I/O) semantic matching with text lexical similarity. It addresses a weakness of existing semantic matching algorithms in semantic web service discovery: because they perform only I/O-based service signature matching, they cannot distinguish services with the same I/O. The improved algorithm consists of two steps. The first is logic-based I/O concept ontology matching, which yields the candidate service set. The second is service name matching by lexical similarity against the candidate service set, which yields the final precise matching result. We tested our hybrid algorithm on the Ontology Web Language for Services (OWL-S) test collection and compared it with OWL-S Matchmaker-X (OWLS-MX). The experimental results show that the proposed algorithm can pick out the advertised service most suitable for the user's request from among very similar ones, and that it provides better matching precision and efficiency than OWLS-MX.
The Borda sorting algorithm is an improvement on the weighted position sorting algorithm. It is mainly suited to search results with high duplication; for independent search results its effect is not very good. In addition, the relative score in the Borda sorting algorithm is computed according to a linear rule, but the position relationship cannot fully represent changes in correlation. To address this drawback, this paper proposes a new sorting algorithm, named the PMS-Sorting algorithm. First, the position scores of the returned results are standardized; the similarity between the retrieval word string and the query results is then incorporated into the algorithm, and the similarity calculation method is also improved. Experiments show that the improved algorithm is superior to the traditional sorting algorithm.
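For orientation, the sketch below shows a classic Borda count used to merge several ranked result lists, which is the kind of position-based linear scoring the abstract criticizes. It is not the PMS-Sorting algorithm, whose details the abstract does not give. The function name `borda_merge`, the scoring of `n - position` points per list, and the alphabetical tie-break are assumptions made for this example.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge ranked lists with a classic Borda count.

    In a list of n items, the item at 0-based position pos earns n - pos
    points; an item missing from a list earns nothing from that list.
    Each item is assumed to appear at most once per list.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos
    # Highest total first; ties are broken alphabetically (an arbitrary choice).
    return sorted(scores, key=lambda item: (-scores[item], item))


# Three hypothetical search engines returning the same three documents.
print(borda_merge([["a", "b", "c"], ["b", "c", "a"], ["b", "a", "c"]]))
# prints ['b', 'a', 'c']
```

Because a document only accumulates points from the lists that contain it, this scoring favors results returned by several sources, which matches the abstract's remark that Borda works best when the result lists overlap heavily.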
Pattern discovery from time series is of fundamental importance. Most pattern discovery algorithms for time series capture the values of the series based on some kind of similarity measure. Because they are affected by scale and baseline, value-based methods cause problems when the objective is to capture the shape. Thus, a shape-based similarity measure, the Sh measure, is proposed, and the properties of this similarity measure and the corresponding proofs are given. Then a time series shape pattern discovery algorithm based on the Sh measure is put forward. The proposed algorithm terminates in a finite number of iterations with given computational and storage complexity. Finally, experiments on synthetic datasets and sunspot datasets demonstrate that the time series shape pattern algorithm is valid.
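In the Web 2.0 information age, the volume of information is growing rapidly while the speed of information retrieval has dropped markedly. How to improve the automatic classification and management of information, and how to obtain valuable information and knowledge from massive data efficiently, accurately, and quickly, have become problems that smart libraries urgently need to study and solve. This article proposes applying a new swarm-intelligence analysis method for text clustering in digital library services. The algorithm improves the computation of semantic similarity between texts and combines the advantages of the K-means clustering algorithm and the ant colony clustering algorithm (Ant Colony Optimization, ACO): in the initial classification, K-means is used for fast classification, and the classification results guide the update of the pheromone on each of the ants' paths and the ants' subsequent choice of clustering paths, improving the running efficiency of clustering. Because the method needs no category information and can group texts automatically, it is well suited to library resource recommendation and retrieval services. Experiments on a library digital text database show that the hybrid ant colony clustering algorithm achieves better clustering results than either K-means or ACO alone, which demonstrates the effectiveness of the algorithm.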
In order to mine production and security information from security supervising data, and to ensure security and safety in production and decision-making, a clustering analysis algorithm for security supervising data based on a semantic description in coal mines is studied. First, a hybrid semantic and numerical description method for security supervising data in coal mines is described. Second, similarity measurement methods for semantic and numerical data are given separately, and a weight-based hybrid similarity measurement method for the security supervising data based on a semantic description in coal mines is presented. Third, taking the hybrid similarity measurement method as the distance criterion and borrowing from the grid methodology, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a security supervising data set from coal mines validate the efficiency of the algorithm.
This paper presents a fuzzy logic approach to efficiently perform unsupervised character classification, improving the robustness, correctness and speed of a character recognition system. The characters are first split into eight typographical categories. The classification scheme uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. The fuzzy unsupervised character classification, which is natural in the repre...
Funding: the National Land and Resource Bureau Science and Technology Foundation (No. 20001020304).
Funding: the National Natural Science Foundation of China (Grant No. 29767001).
Funding: Supported by the National Natural Science Foundation of China (No. 60872018), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20070293001), and the 973 Project (No. 2007CB310607).
Funding: This work was funded by the National Natural Science Foundation of China (Grant Nos. 61772152 and 61502037), the Basic Research Project (Nos. JCKY2016206B001, JCKY2014206C002 and JCKY2017604C010), and the Technical Foundation Project (No. JSQB2017206C002).
Funding: the National Natural Science Foundation of China (No. 50674086), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508), and the Postdoctoral Scientific Program of Jiangsu Province (No. 0701045B).