Abstract: Data analysis and automatic processing are often interpreted as knowledge acquisition. In many cases it is necessary to classify data or to find regularities in them. In intelligent data analysis applications, the regularities found are mostly represented as IF-THEN rules, which are then used for tasks such as prediction, classification, and pattern recognition. Different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods) make it possible to extract rules that characterize the data in an understandable language. This allows the data to be interpreted, relationships in the data to be found, and new rules that characterize them to be extracted. In this paper, knowledge acquisition is defined as the process of extracting knowledge from numerical data in the form of rules. Rule extraction here is based on two clustering methods, K-means and fuzzy C-means. With the assistance of the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in the fuzzy rule-based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set, and the effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
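The idea of turning clusters into IF-THEN rules can be sketched as follows. This is a minimal illustration only, not the paper's procedure: it skips the neural network and fuzzy C-means stages and simply reads one interval rule per K-means cluster from the minimum and maximum of each feature. The function names `kmeans` and `interval_rules`, the feature names, and the toy measurements are all assumptions made for the example; the numbers are invented and are not taken from the Iris data set.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the old center if a cluster is empty
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, labels


def interval_rules(points, labels, names):
    """One IF-THEN rule per cluster, built from per-feature min/max bounds."""
    rules = {}
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        bounds = [(min(col), max(col)) for col in zip(*members)]
        conds = " AND ".join(f"{lo} <= {n} <= {hi}" for n, (lo, hi) in zip(names, bounds))
        rules[c] = f"IF {conds} THEN cluster {c}"
    return rules


# Toy, made-up measurements (hypothetical; not real Iris samples).
toy = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
_, toy_labels = kmeans(toy, k=2)
for rule in interval_rules(toy, toy_labels, ["petal_length", "petal_width"]).values():
    print(rule)
```

Interval bounds are only one way to phrase a rule; they are chosen here because they read directly as IF-THEN conditions. Evaluating the effectiveness of such rules, as the abstract describes, would require labeled samples.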
Abstract: One of the most important problems in clustering is defining the number of classes. It is not easy to find an appropriate method for measuring whether a cluster configuration is acceptable. In this paper we propose a possible, non-automatic solution that considers different clustering criteria and compares their results. In this way, robust structures in the analyzed dataset can often be identified, and an optimal cluster configuration that presents a meaningful association may be defined. We also focus on the variables used in cluster analysis, because variables that contain little clustering information can cause misleading and non-robust results. Three algorithms are therefore employed in this study: K-means partitioning, Partitioning Around Medoids (PAM), and the Heuristic Identification of Noisy Variables (HINoV). The results are compared with those of robust methods.
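Comparing the results of different clustering methods needs some measure of agreement between two partitions. The abstract does not say which measure the authors use, so the sketch below assumes the Rand index (the fraction of point pairs that both partitions treat the same way) purely as an illustration; `rand_index` is a hypothetical helper name.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two points
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two points: nothing to disagree on
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Because the index looks only at pairs, it does not depend on how clusters are numbered, so labelings produced by, say, K-means and PAM can be compared directly. A cluster count on which several methods agree closely would be one candidate for a robust structure in the sense described above.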
Funding: Projects (41001260, 61173122, 61573380) supported by the National Natural Science Foundation of China; Project (11JJ5044) supported by the Hunan Provincial Natural Science Foundation of China
Abstract: It is illegal to spread and transmit pornographic images over the Internet, whether real or artificial. Traditional methods are designed to identify real pornographic images and are less efficient in dealing with artificial ones. Criminals therefore turn to releasing artificial pornographic images in specific settings, e.g., social networks. To identify artificial pornographic images efficiently, a novel bag-of-visual-words based approach is proposed in this work. In the bag-of-words (BoW) framework, speeded-up robust features (SURF) are first adopted for feature extraction; a visual vocabulary is then constructed through K-means clustering and images are represented by an improved BoW encoding method; finally, the visual words are fed into a learning machine for training and classification. Unlike the traditional BoW method, the proposed method sets a weight on each visual word according to the number of features that each cluster contains. Moreover, a non-binary encoding method and a cross-matching strategy are utilized to improve the discriminative power of the visual words. Experimental results indicate that the proposed method outperforms the traditional method.
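A weighted, non-binary BoW encoding of the kind described can be sketched as follows. The abstract does not give the weighting formula, so this example assumes, purely for illustration, that each visual word's weight is its cluster's share of the training features; the real method may use a different function or even weight in the opposite direction. The cross-matching strategy is not modeled, and `nearest_word` and `weighted_bow` are hypothetical names.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda w: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[w])),
    )


def weighted_bow(descriptors, vocabulary, cluster_sizes):
    """Encode one image as a weighted count histogram over visual words.

    cluster_sizes[w] is the number of training features assigned to word w
    when the vocabulary was built. Assumed weight: cluster_sizes[w] / total.
    """
    total = sum(cluster_sizes)
    weights = [size / total for size in cluster_sizes]
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1  # counts, not 0/1 flags
    return [c * w for c, w in zip(counts, weights)]
```

In a full pipeline the descriptors would come from a feature extractor such as SURF and the vocabulary from K-means over the training descriptors; the resulting vectors would then be passed to the classifier.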