Funding: Supported by the National Natural Science Foundation of China under Grant No. 62025202 and the Fundamental Research Funds for the Central Universities under Grant No. 020214380102.
Abstract: During software development, developers tend to tangle multiple concerns into a single commit, resulting in many composite commits. This paper studies the problem of detecting and untangling composite commits, so as to improve the maintainability and understandability of software. Our approach is built upon the observation that both the textual content of code statements and the dependencies between code statements are helpful in comprehending a code commit. Based on this observation, we first construct an attributed graph for each commit, where code statements and various code dependencies are modeled as nodes and edges, respectively, and the textual bodies of code statements are maintained as node attributes. Based on the attributed graph, we propose graph-based learning algorithms that first detect whether a given commit is a composite commit and then untangle the composite commit into atomic ones. We evaluate our approach on nine C# projects, and the results demonstrate its effectiveness and efficiency.
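The attributed graph described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the statement IDs, and the `"data"` dependency label are hypothetical, and plain connected components stand in for the learned untangling step, which the abstract does not specify.

```python
from collections import defaultdict


def build_attributed_graph(statements, dependencies):
    """Build a commit graph: statements are nodes, dependencies are edges.

    statements: {node_id: statement text}; the text is kept as a node attribute.
    dependencies: iterable of (src, dst, kind) triples (kind is illustrative).
    """
    nodes = {sid: {"text": text} for sid, text in statements.items()}
    adjacency = defaultdict(set)
    for src, dst, _kind in dependencies:
        # Treat dependencies as undirected for grouping purposes.
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return nodes, adjacency


def connected_components(nodes, adjacency):
    """Naive stand-in for untangling: group statements linked by dependencies."""
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            current = stack.pop()
            group.append(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


statements = {
    "s1": "int total = Sum(items);",
    "s2": "Log(total);",
    "s3": "title = title.Trim();",
}
dependencies = [("s1", "s2", "data")]
nodes, adjacency = build_attributed_graph(statements, dependencies)
groups = connected_components(nodes, adjacency)
print(groups)  # [['s1', 's2'], ['s3']]
```

In this toy commit, more than one group suggests a composite commit. A learned model would additionally use the statement text stored on each node, which this baseline ignores.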
Funding: Supported by the National Natural Science Foundation of China (NSFC) under grant number 61873274.
Abstract: Contrastive self-supervised representation learning on attributed graph networks with graph neural networks has attracted considerable research interest recently. However, two challenges remain. First, most real-world systems are multi-relational: entities are linked by different types of relations, and each relation is a view of the graph network. Second, the rich multi-scale information (structure-level and feature-level) of the graph network can serve as self-supervised signals but is not fully exploited. This study presents a novel contrastive self-supervised representation learning framework on attributed multiplex graph networks with multi-scale information, named CoLM²S. It contains two main components: intra-relation contrastive learning and inter-relation contrastive learning. Specifically, a contrastive self-supervised representation learning framework on attributed single-layer graph networks with multi-scale information (CoLMS) is introduced first; it uses a graph convolutional network as the encoder to capture intra-relation information with multi-scale structure-level and feature-level self-supervised signals. The structure-level information includes the edge structure and the sub-graph structure, and the feature-level information is the output of the different graph convolutional layers. Second, following the consensus assumption among relations, the CoLM²S framework is proposed to jointly learn the various graph relations in an attributed multiplex graph network and obtain a global consensus node embedding. The proposed method can fully distil the graph information. Extensive experiments on unsupervised node clustering and graph visualisation tasks demonstrate the effectiveness of the methods, which outperform existing competitive baselines.
Abstract: The identification of design pattern instances is important for program understanding and software maintenance. Aiming at the mining of design patterns in existing systems, this paper proposes a subgraph isomorphism approach that discovers several design patterns in a legacy system at a time. The attributed relational graph is used to describe both design patterns and legacy systems. The subgraph isomorphism approach consists of a decomposition process and a composition process. During decomposition, the graphs corresponding to the design patterns are decomposed into subgraphs, some of which correspond to elemental design patterns. The composition process then tries to derive a subgraph isomorphism for the matched graph once a subgraph isomorphism has been obtained for each of its subgraphs. Because design patterns share common structures, the proposed approach reduces the number of times entities and relations must be matched. Compared with existing methods, the proposed algorithm is not linearly dependent on the number of design pattern graphs.
Key words: design pattern mining; attributed relational graph; subgraph isomorphism
CLC number: TP 311.5
Foundation item: Supported by the National Natural Science Foundation of China (60273075) and the Science Foundation of Naval University of Engineering (HGDJJ03019)
Biography: LI Qing-hua (1940-), male, Professor, research direction: parallel computing.
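As a rough illustration of matching a design pattern against a system when both are attributed relational graphs, the Python sketch below uses plain backtracking. It is an assumption-laden toy, not the decomposition/composition algorithm of the paper: the graph encoding (node kinds in a dict, labelled directed edges keyed by node pairs), the function name `find_matches`, and the example class names are all hypothetical.

```python
def find_matches(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Enumerate injective mappings of pattern nodes onto target nodes.

    *_nodes: {node: kind}; *_edges: {(src, dst): relation}.
    A mapping is valid when node kinds agree and every pattern edge
    exists in the target with the same relation label.
    """
    order = list(pattern_nodes)
    matches = []

    def consistent(mapping):
        # Check only the pattern edges whose endpoints are both mapped.
        for (p_src, p_dst), relation in pattern_edges.items():
            if p_src in mapping and p_dst in mapping:
                found = target_edges.get((mapping[p_src], mapping[p_dst]))
                if found != relation:
                    return False
        return True

    def extend(mapping):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        p_node = order[len(mapping)]
        for t_node, kind in target_nodes.items():
            if kind == pattern_nodes[p_node] and t_node not in mapping.values():
                mapping[p_node] = t_node
                if consistent(mapping):
                    extend(mapping)
                del mapping[p_node]  # backtrack

    extend({})
    return matches


# Pattern: class B inherits from class A.
pattern_nodes = {"A": "class", "B": "class"}
pattern_edges = {("B", "A"): "inherits"}

# Toy "legacy system" with three classes.
target_nodes = {"Shape": "class", "Circle": "class", "Canvas": "class"}
target_edges = {
    ("Circle", "Shape"): "inherits",
    ("Canvas", "Shape"): "associates",
}

matches = find_matches(pattern_nodes, pattern_edges, target_nodes, target_edges)
print(matches)  # [{'A': 'Shape', 'B': 'Circle'}]
```

Running such a matcher once per pattern makes the cost grow with the number of patterns; the abstract's point is that sharing sub-graph matches across patterns avoids repeating this work.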
Funding: The National Basic Research Program of China (973 Program) (2010CB328104, 2009CB320501); the National Natural Science Foundation of China (No. 60773103, 90912002); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 200802860031); the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9).
Abstract: A phishing detection system comprising a client-side filtering plug-in, an analysis center, and protected sites is proposed. An image-based similarity detection algorithm is devised to calculate the similarity of two web pages. The web pages are first converted into images and then divided into sub-images by iterated dividing and shrinking. After that, the attributes of the sub-images, including color histograms, gray histograms, and size parameters, are computed to construct the attributed relational graph (ARG) of each page. To match two ARGs, the inner earth mover's distances (EMD) between every pair of nodes, one from each ARG, are computed first; the similarity of the web pages is then obtained from the outer EMD between the two ARGs and used to detect phishing web pages. The experimental results show that the proposed architecture and algorithm are robust and scalable and can effectively detect phishing.
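To make the earth mover's distance concrete, here is a small Python sketch for the simplest case: two one-dimensional histograms with the same bins, where the EMD of the normalised histograms equals the sum of absolute differences between their cumulative distributions (taking the distance between adjacent bins as 1). This only illustrates the kind of node-to-node histogram comparison the abstract mentions; the function name `emd_1d` is hypothetical, and the paper's inner/outer EMD between whole ARGs is not reproduced here.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b):
    """EMD between two 1-D histograms over the same equally spaced bins.

    Both histograms are normalised to total mass 1, so the result is the
    amount of mass times the number of bins it has to travel.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total mass")
    cdf_a = accumulate(h / total_a for h in hist_a)
    cdf_b = accumulate(h / total_b for h in hist_b)
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# All mass moves from the first bin to the last bin: two bins of travel.
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
# Identical shapes (after normalisation) have zero distance.
print(emd_1d([1, 1], [2, 2]))  # 0.0
```

Unlike a bin-by-bin difference, this distance grows with how far mass has to move, so two gray histograms that are slightly shifted versions of each other remain close.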
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51175287) and the National Science and Technology Major Project of China (Grant No. 2011ZX02403).
Abstract: CAD model retrieval based on functional semantics is more significant than content-based 3D model retrieval during the mechanical conceptual design phase. However, the relevant research has not yet been fully explored. Therefore, a functional semantic-based CAD model annotation and retrieval method is proposed to support mechanical conceptual design and design reuse, inspire designer creativity through existing CAD models, shorten the design cycle, and reduce costs. Firstly, a CAD model functional semantic ontology is constructed to formally represent the functional semantics of CAD models and describe the mechanical conceptual design space comprehensively and consistently. Secondly, an approach to represent CAD models as attributed adjacency graphs (AAG) is proposed, in which the geometry and topology data are extracted from STEP models. On the basis of the AAG, the functional semantics of CAD models are annotated semi-automatically by matching them against CAD models containing partial features whose functional semantics have been annotated manually, thereby constructing a CAD Model Repository that supports model retrieval based on functional semantics. Thirdly, a CAD model retrieval algorithm that supports multi-function extended retrieval is proposed to explore more potential creative design knowledge at the semantic level. Finally, a prototype system, called Functional Semantic-based CAD Model Annotation and Retrieval System (FSMARS), is implemented. A case study demonstrates that FSMARS can successfully obtain multiple potential CAD models that conform to the desired function. The proposed research addresses actual needs and presents a new way to acquire CAD models in the mechanical conceptual design phase.
Funding: Supported by the National Natural Science Foundation of China [61702147] and the Zhejiang Provincial Science and Technology Program in China [2021C03137].
Abstract: Kinematic semantics is often an important part of a CAD model (here, a single part/solid model) in many applications, but it is usually not stored with the model, especially for a model retrieved from a common database. Moreover, effective and automatic methods for reconstructing this information for a CAD model are still rare. To address this issue, this paper proposes an approach that identifies each assembly interface on every CAD model, since the assembly interface is the fundamental and key element in reconstructing kinematic semantics. First, as the geometry of an assembly interface is formed by one or more adjacent faces on a model, a face-attributed adjacency graph integrated with a face structure fingerprint is proposed. This descriptor can describe each CAD model and its assembly interfaces uniformly. After that, aided by this descriptor, an improved graph attention network is developed based on a new dual-level anti-interference filtering mechanism, which gives it the potential to identify, with high accuracy, all representative kinds of assembly interface faces that have various geometric shapes but consistent kinematic semantics. Moreover, based on the above graph and the face-adjacency relationships, each assembly interface on a model can be identified. Finally, experiments on representative CAD models are carried out to verify the effectiveness and characteristics of the proposed approach. The results show that the average assembly-interface-face identification accuracy of the proposed approach reaches 91.75%, which is about 2%–5% higher than that of recent representative graph neural networks. In addition, compared with state-of-the-art methods, the approach is better suited to identifying assembly interfaces (with various shapes) on individual CAD models that have typical kinematic pairs.
Abstract: Graph transformation systems have become a general formal modeling language for describing many models in the software development process. Behavioral modeling of dynamic systems and model-to-model transformations are only a few examples in which graphs have been applied to software development. But even a perfect graph transformation system must be equipped with automated analysis capabilities to let users understand whether such a formal specification fulfills their requirements. In this paper, we present a new solution for verifying graph transformation systems using the Bogor model checker. Attributed graph grammars (AGG)-like graph transformation systems are translated to the Bandera intermediate representation (BIR), the input language of Bogor, and Bogor verifies the model against properties of interest defined by combining linear temporal logic (LTL) and special-purpose graph rules. Experimental results are encouraging, showing that in most cases our solution improves on existing approaches in terms of both performance and expressiveness.
Funding: The National Natural Science Foundation of China (No. 61379074) and the Zhejiang Provincial Natural Science Foundation of China (No. LZ12F02003).
Abstract: Network modeling is an important approach to analyzing complex systems in many fields. Recently, a new series of methods has emerged that use the Kronecker product and similar tools to model real systems. One such approach is the multiplicative attribute graph (MAG) model, which generates networks based on categorical attributes of nodes. In this paper we extend this model to a continuous one, give an overview of its properties, and discuss some special cases related to real-world networks, as well as the influence of the attribute distribution and of the affinity function.
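For readers unfamiliar with the discrete MAG model that this abstract extends, the Python sketch below shows its core rule under common assumptions: each node carries K binary attributes, each attribute k has a 2×2 affinity matrix Θ_k, and the probability of a directed edge (u, v) is the product over k of Θ_k[a_k(u)][a_k(v)]. The function names and the example affinity values are illustrative, self-loops are skipped as a simplification, and the continuous extension studied in the paper is not shown.

```python
import random


def edge_probability(attrs_u, attrs_v, affinities):
    """Product of per-attribute affinities for the ordered pair (u, v)."""
    probability = 1.0
    for a_u, a_v, theta in zip(attrs_u, attrs_v, affinities):
        probability *= theta[a_u][a_v]
    return probability


def sample_mag(attributes, affinities, rng):
    """Sample a directed graph: each ordered pair is an independent coin flip."""
    edges = []
    for u, attrs_u in enumerate(attributes):
        for v, attrs_v in enumerate(attributes):
            if u == v:
                continue  # simplification: no self-loops in this sketch
            if rng.random() < edge_probability(attrs_u, attrs_v, affinities):
                edges.append((u, v))
    return edges


# Two binary attributes, both homophilous: equal values link more easily.
theta = [[0.9, 0.1], [0.1, 0.9]]
affinities = [theta, theta]
attributes = [(0, 1), (0, 1), (1, 0)]

print(round(edge_probability(attributes[0], attributes[1], affinities), 2))  # 0.81
print(round(edge_probability(attributes[0], attributes[2], affinities), 2))  # 0.01

edges = sample_mag(attributes, affinities, random.Random(0))
```

Because the edge probability is a product over attributes, nodes that disagree on even one attribute are penalised multiplicatively, which is what ties the model to Kronecker-product constructions.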
Funding: Supported by the General Research Fund from the Research Grant Council of Hong Kong (Project No. CUHK4180/10E) and the National Basic Research Program of China (973 Program) (No. 2009CB825404).
Abstract: A paper in a preceding issue of this journal introduced Bayesian Ying-Yang (BYY) harmony learning from the perspective of problem solving, parameter learning, and model selection. In a complementary role, this paper provides further insights from another perspective: a co-dimensional matrix pair (co-dim matrix pair for short) forms a building unit, and a hierarchy of such building units sets up the BYY system. BYY harmony learning is re-examined by exploring the nature of a co-dim matrix pair, which leads to improved learning performance with refined model selection criteria and a modified mechanism that coordinates automatic model selection and sparse learning. Besides updating typical algorithms of factor analysis (FA), binary FA (BFA), binary matrix factorization (BMF), and nonnegative matrix factorization (NMF) to share such a mechanism, we are also led to (a) a new parametrization that embeds a de-noising nature into Gaussian mixture and local FA (LFA); (b) an alternative formulation of graph Laplacian based linear manifold learning; (c) a co-decomposition of data and covariance for learning regularization and data integration; and (d) a co-dim matrix pair based generalization of temporal FA and the state space model. Moreover, with the help of a co-dim matrix pair in Hadamard product, we are led to a semi-supervised formulation for regression analysis and a semi-blind learning formulation for temporal FA and the state space model. Furthermore, we note that these advances provide new tools for network biology studies, including learning transcriptional regulatory networks, protein-protein interaction network alignment, and network integration.