Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
The legacy of United States cluster munition use in Laos and Cambodia during the Second Indochina War is residual bomblets that unexpectedly detonate years later, killing and injuring children, farmers, and other civilians. Cluster munitions release dozens of smaller bomblets that rain deadly ammunition on troops, armored tanks, and vegetation, effectively striking broad sections of war zone landscapes in one launch. While many bomblets detonate immediately, others fail to detonate and can lie dormant on the ground for years. The primary objectives of this study were to document the long-term consequences and impacts of the US Air Force bombing of Laos and Cambodia during the Second Indochina War (1959 to 1973). The historical lessons learned by the United States should be shared with the governments and militaries of Russia and Ukraine. These countries need to discontinue the use of cluster bombs to prevent additional people living along the Russia-Ukraine border from having to live and die with the consequences of unexploded ordnance, including cluster bombs, for the next century.
Reliable Cluster Head (CH) selection-based routing protocols are necessary for increasing packet transmission efficiency through optimal path discovery without degrading transmission reliability. In this paper, the Hybrid Golden Jackal and Improved Whale Optimization Algorithm (HGJIWOA) is proposed as an effective routing protocol that guarantees efficient routing of data packets over the paths established between the CHs and the movable sink. HGJIWOA includes the phases of a Dynamic Lens-Imaging Learning Strategy and Novel Update Rules for determining the reliable route needed for data packet broadcasting, which is attained through CH selection based on fitness measure estimation. CH selection using the Golden Jackal Optimization Algorithm (GJOA) depends on the factors of maintainability, consistency, trust, delay, and energy. The adopted GJOA plays a dominant role in determining the optimal routing path based on reduced delay and minimal distance. The protocol further uses the Improved Whale Optimization Algorithm (IWOA) to forward data from the chosen CHs to the BS via an optimized route that depends on energy and distance. It also includes a reliable route maintenance process that helps decide whether data are transmitted through the selected route or re-routed. Simulations of the proposed HGJIWOA mechanism with different numbers of sensor nodes confirmed a mean throughput improvement of 18.21%, a residual energy improvement of 19.64%, and an end-to-end delay reduction of 21.82% relative to competing CH selection approaches.
Customer segmentation according to load-shape profiles using smart meter data is an increasingly important application, vital to the planning and operation of energy systems and to enabling citizens' participation in the energy transition. This study proposes an innovative multi-step clustering procedure to segment customers based on load-shape patterns at the daily and intra-daily time horizons. Smart meter data is split between daily and hourly normalized time series to assess monthly, weekly, daily, and hourly seasonality patterns separately. The dimensionality reduction implicit in the splitting allows a direct approach to clustering raw daily energy time series data. The intraday clustering procedure sequentially identifies representative hourly day-unit profiles for each customer and the entire population. For the first time, a step function approach is applied to reduce time series dimensionality. Customer attributes embedded in surveys are employed to build external clustering validation metrics using Cramér's V correlation factors and to identify statistically significant determinants of load shape in energy usage. In addition, a time series feature engineering approach is used to extract 16 relevant demand flexibility indicators that characterize customers and corresponding clusters along four different axes: available Energy (E), Temporal patterns (T), Consistency (C), and Variability (V). The methodology is implemented on a real-world electricity consumption dataset of 325 Small and Medium-sized Enterprise (SME) customers, identifying 4 daily and 6 hourly easy-to-interpret, well-defined clusters. The application of the methodology includes selecting key parameters via grid search and a thorough comparison of clustering distances and methods to ensure the robustness of the results. Further research can test the scalability of the methodology to larger datasets from various customer segments (households and large commercial) and locations with different weather and socioeconomic conditions.
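The external validation step in the abstract above relies on Cramér's V, which measures the association between two categorical variables such as cluster label and a survey attribute. The following is a minimal sketch, not the study's implementation: the function name `cramers_v` and the toy contingency tables are assumptions for illustration, and the code assumes every row and column total is positive.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for an r x c contingency table (list of rows of counts).

    Assumes every row total and column total is positive, so that all
    expected counts are non-zero. Hypothetical helper for illustration.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalize by n * (min(r, c) - 1) so that V lies in [0, 1].
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Rows: cluster labels; columns: categories of a survey attribute (toy data).
perfect = [[10, 0], [0, 10]]   # attribute fully determines the cluster
unrelated = [[5, 5], [5, 5]]   # attribute carries no information
print(cramers_v(perfect), cramers_v(unrelated))
```

A value near 1 suggests the attribute aligns closely with the clustering, while a value near 0 suggests no association. The statistical significance mentioned in the abstract would additionally require a chi-square test, which this sketch does not perform.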
An aluminoborate, Na_(2.5)Rb[Al{B_(5)O_(10)}{B_(3)O_(5)}]·0.5NO_(3)·H_(2)O (1), was synthesized under hydrothermal conditions and is built from mixed oxoboron clusters and AlO_(4) tetrahedra. In the structure, the [B_(5)O_(10)]^(5-) and [B_(3)O_(7)]^(5-) clusters are alternately connected to form 1D [B_(8)O_(15)]_(n)^(6n-) chains, which are further linked by AlO_(4) units to form a 2D monolayer with 7-membered ring and 10-membered ring windows. Two adjacent monolayers with opposite orientations further form a porous layered structure with six channels through B-O-Al bonds. Compound 1 was characterized by single-crystal X-ray diffraction, powder X-ray diffraction (PXRD), IR spectroscopy, UV-Vis diffuse reflectance spectroscopy, and thermogravimetric analysis (TGA). UV-Vis diffuse reflectance analysis indicates that compound 1 shows a wide transparency range with a short cutoff edge of 201 nm, suggesting potential applications in the UV region. CCDC: 2383923.
Domaining is a crucial process in geostatistics, particularly when significant spatial variations are observed within a site, as these variations can significantly affect the outcomes of spatial modeling. This study investigates the application of hard and fuzzy clustering algorithms for domain delineation, using geological and geochemical data from two exploration campaigns at the eastern Kahang deposit in central Iran. The dataset includes geological layers (lithology, alteration, and mineral zones), geochemical layers (Cu, Mo, Ag, and Au grades), and borehole coordinates. Six clustering algorithms, namely K-means, hierarchical, affinity propagation, self-organizing map (SOM), fuzzy C-means, and Gustafson-Kessel, were applied to determine the optimal number of clusters, which ranged from 3 to 4. The fuzziness and weighting parameters were found to range from 1.1 to 1.3 and 0.1 to 0.3, respectively, based on the evaluation of various hard and fuzzy cluster validity indices. Directional variograms were computed to assess spatial anisotropy, and the anisotropy ellipsoid for each domain was defined to identify the model with the highest level of anisotropic discrimination among the domains. The SOM algorithm, which incorporated both qualitative and quantitative data, produced the best model, resulting in the identification of three distinct domains. These findings underscore the effectiveness of combining clustering techniques with variogram analysis for accurate domain delineation in geostatistical modeling.
To solve the problems of short network lifetime and high data transmission delay in data gathering for wireless sensor networks (WSNs) caused by uneven energy consumption among nodes, a hybrid energy-efficient clustering routing method based on a firefly and pigeon-inspired algorithm (FF-PIA) is proposed to optimize the data transmission path. Once the optimal number of cluster head (CH) nodes has been obtained, it is taken as the basis for producing the initial population of the FF-PIA algorithm. The Lévy flight mechanism and adaptive inertia weighting are employed in the algorithm iteration to balance global search against local search. Moreover, a Gaussian perturbation strategy is applied to update the optimal solution, ensuring that the algorithm can escape local optima. For WSN data gathering, a one-dimensional signal reconstruction algorithm model is developed using dilated convolution and residual neural networks (DCRNN). We conducted experiments on the National Oceanic and Atmospheric Administration (NOAA) dataset. The results show that the DCRNN model-driven data reconstruction algorithm improves both reconstruction accuracy and reconstruction time. Co-simulation of FF-PIA and DCRNN clustering routing reveals that the proposed algorithm can effectively extend the network lifetime and reduce data transmission delay.
Clustered heavy precipitation (CHP) events can severely impact human society, infrastructure, and natural ecosystems. Consequently, short-term climate prediction of CHP events is vital for the prevention and mitigation of associated hazards. Employing year-to-year increment (DY) and multiple linear regression approaches, this study developed a seasonal prediction model for pre-summer (i.e., May and June) CHP frequency in South China (SC) during 1981–2022. Three robust predictors were identified: March sea surface temperature in the Southwestern Atlantic, early-winter snow depth in East Europe, and winter soil moisture in Central Asia. These three predictors exert substantial impacts on pre-summer precipitation in SC via modulation of an anomalous anticyclone (cyclone) over the (subtropical) western North Pacific. In the leave-one-out cross-validation test during 1981–2022, the prediction model exhibited reasonable performance in predicting the interannual and interdecadal variations and trends of CHP days. The temporal correlation coefficient (TCC) between the observations and predictions was 0.66. In the independent hindcast for 2013–2022, the TCC was as high as 0.85. Moreover, coherent covariations were observed between the frequency and the amounts of CHP, with a TCC of 0.99 for 1981–2022. The three predictors also show good performance in forecasting CHP amounts over SC, with a TCC of 0.68 between the predictions and observations in the cross-validation test during 1981–2022 and of 0.86 in the independent hindcasts during 2013–2022. Notably, the predictors also showed good predictive skill for years with high CHP occurrence (e.g., 1998 and 2019). The predicted high-incidence areas of heavy precipitation days were highly consistent with observations, with a pattern correlation coefficient of 0.44 (0.55) for 1998 (2019). This study provides valuable insights to improve seasonal prediction of pre-summer CHP frequency in SC.
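The evaluation chain described in the abstract above (year-to-year increments, regression, leave-one-out cross-validation, and a temporal correlation coefficient) can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's model: it uses a single predictor instead of the three-predictor multiple regression, takes the TCC to be the Pearson correlation, and all function names and numbers are hypothetical.

```python
from math import sqrt


def year_to_year_increment(series):
    """DY: difference between each year's value and the previous year's."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_predictions(x, y):
    """Leave-one-out: predict each year from a model fitted without it."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


def pearson(a, b):
    """Pearson correlation, used here as the temporal correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# Toy yearly series (made-up numbers, not data from the study).
predictor = [0.2, 0.9, 0.4, 1.3, 0.7, 1.6, 1.0]
chp_days = [3.0, 6.0, 4.0, 8.0, 5.0, 9.0, 7.0]

dx = year_to_year_increment(predictor)
dy = year_to_year_increment(chp_days)
tcc = pearson(dy, loo_predictions(dx, dy))
print(round(tcc, 2))
```

Fitting on increments rather than raw values is what the DY approach refers to. A predicted increment would be added to the previous year's observed value to recover a forecast of the variable itself.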
Container-based virtualization technology has recently become more widely used in edge computing environments due to its lighter resource occupation, faster startup, and better resource utilization efficiency. To meet the diverse needs of tasks, it is usually necessary to instantiate multiple network functions in the form of containers and interconnect the generated containers to build a Container Cluster (CC). CCs are then deployed on edge service nodes with relatively limited resources. However, the increasingly complex and time-varying nature of tasks poses great challenges to the optimal placement of CCs. This paper regards the charges for the various resources occupied in providing services as revenue, and the service efficiency and energy consumption as cost, and thus formulates a Mixed Integer Programming (MIP) model to describe the optimal placement of CCs on edge service nodes. Furthermore, an Actor-Critic based Deep Reinforcement Learning (DRL) framework incorporating Graph Convolutional Networks (GCN), named RL-GCN, is proposed to solve the optimization problem. The framework obtains an optimal placement strategy through self-learning according to the requirements and objectives of CC placement. In particular, the introduction of GCN allows the features of the association relationships between multiple containers in CCs to be extracted effectively, improving the quality of placement. The experimental results show that, under different scales of service nodes and task requests, the proposed method improves system performance in terms of placement error ratio, time efficiency of solution output, and cumulative system revenue compared with other representative baseline methods.
We investigated the ionization and dissociation processes of ammonia clusters ranging from dimer to pentamer induced by 800-nm femtosecond laser fields. Time-of-flight (TOF) mass spectra of the ammonia clusters were recorded over a range of laser intensities from 2.1×10^(12) W/cm^(2) to 5.6×10^(12) W/cm^(2). The protonated ion signals dominate the spectra, which is consistent with the stability of the geometric structures. The ionization and dissociation channels of ammonia clusters are discussed. The competition and switching among observed dissociation channels are revealed by analyzing the variations in the relative ionic yields of specific protonated and unprotonated clusters under different laser intensities. These results indicate that, when the laser intensity is above approximately 4×10^(12) W/cm^(2), ionization of the neutral multiple-ammonia units produced through the dissociation of cluster ions may start to contribute, along with additional laser-induced processes that consume protonated ions and/or produce unprotonated ions. These findings provide deeper insights into the ionization and dissociation dynamics in multi-photon ionization experiments involving ammonia clusters.
We study the structural and dynamical properties of A209 based on Chandra and XMM-Newton observations. We obtain detailed temperature, pressure, and entropy maps with the contour binning method, and find a hot region in the NW direction. The X-ray brightness residual map and corresponding temperature profiles reveal a possible shock front in the NW direction and a cold front feature in the SE direction. Combined with the galaxy luminosity density map, we propose a weak merger scenario. A young sub-cluster passing from the SE to the NW direction could explain the optical subpeak, the intracluster medium temperature map, the X-ray surface brightness excess, and the X-ray peak offset together.
Clustering a social network is a process of grouping social actors into clusters where intra-cluster similarities among actors are higher than inter-cluster similarities. Clustering approaches, e.g., k-medoids or hierarchical clustering, use a distance function to measure the dissimilarities among actors. These distance functions need to fulfill various properties, including the triangle inequality (TI). However, in some cases the triangle inequality might be violated, impacting the quality of the resulting clusters. Through experiments, this paper explains how TI is violated when traditional clustering techniques (k-medoids, hierarchical, DENGRAPH, and spectral clustering) are applied to social networks and how the violation of TI affects the quality of the resulting clusters.
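The triangle inequality discussed in the abstract above requires d(i, k) <= d(i, j) + d(j, k) for every triple of actors. A minimal sketch of how violations could be detected in a precomputed dissimilarity matrix follows; the function name `triangle_violations` and the toy matrices are assumptions for illustration and are not taken from the paper.

```python
from itertools import permutations


def triangle_violations(dist, tol=1e-12):
    """Return ordered triples (i, j, k) where dist[i][k] > dist[i][j] + dist[j][k].

    dist is a square matrix (list of lists) of pairwise dissimilarities;
    tol guards against floating-point noise. Hypothetical helper.
    """
    n = len(dist)
    return [
        (i, j, k)
        for i, j, k in permutations(range(n), 3)
        if dist[i][k] > dist[i][j] + dist[j][k] + tol
    ]


# Actors 0 and 2 are far apart, yet both are close to actor 1.
non_metric = [
    [0, 1, 5],
    [1, 0, 1],
    [5, 1, 0],
]
print(triangle_violations(non_metric))  # [(0, 1, 2), (2, 1, 0)]
```

The check is cubic in the number of actors, so for large networks it would typically be run on a sample of triples rather than exhaustively.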
In recent years, many unknown protocols have been emerging, bringing severe challenges to network security and network management. Existing unknown protocol recognition methods suffer from weak feature extraction ability and cannot thoroughly mine the discriminating features of the protocol data. To address this issue, we propose an unknown application layer protocol recognition method based on deep clustering. Deep clustering, which consists of a deep neural network and a clustering algorithm, can automatically extract the features of the input and cluster the data based on the extracted features. Compared with traditional clustering methods, deep clustering offers higher clustering accuracy. The proposed method uses network-in-network (NIN), channel attention, spatial attention, and Bidirectional Long Short-Term Memory (BLSTM) to construct an autoencoder that extracts the spatial-temporal features of the protocol data, and uses an unsupervised clustering algorithm to recognize the unknown protocols based on those features. The method first extracts the application layer protocol data from the network traffic and transforms the data into a one-dimensional matrix. Second, the autoencoder is pretrained, the protocol data is compressed into a low-dimensional latent space by the autoencoder, and the initial clustering is performed with K-Means. Finally, the clustering loss is calculated and the classification model is optimized according to the clustering loss. The classification results are obtained once the classification model is optimal. Compared with existing unknown protocol recognition methods, the proposed method uses deep clustering to cluster the unknown protocols, and it can mine the key features of the protocol data and recognize the unknown protocols accurately. Experimental results show that the proposed method can effectively recognize the unknown protocols and that its performance is better than that of other methods.
Multi-view Subspace Clustering (MVSC) has emerged as an advanced clustering method, designed to integrate diverse views to uncover a common subspace, enhancing the accuracy and robustness of clustering results. The low-rank prior is significant in MVSC because it captures the global data structure across views for improved performance. However, it faces challenges with outlier sensitivity due to its reliance on the Frobenius norm for error measurement. To address this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps in selecting the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This leads to a more efficient and effective representation of the data, improving clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse can effectively handle outlier sensitivity, which is a common challenge in traditional MVSC methods relying solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) algorithm is then employed to solve the proposed optimization problems. Our comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
Open clusters (OCs) serve as invaluable tracers for investigating the properties and evolution of stars and galaxies. Despite recent advancements in machine learning clustering algorithms, accurately discerning such clusters remains challenging. We revisited the 3013 samples generated with a hybrid clustering algorithm of FoF and pyUPMASK. A multi-view clustering (MvC) ensemble method was applied, which analyzes each member star of the OC from three perspectives (proper motion, spatial position, and composite views) before integrating the clustering outcomes to deduce more reliable cluster memberships. Based on the MvC results, we further excluded cluster candidates with fewer than ten member stars and obtained 1256 OC candidates. After isochrone fitting and visual inspection, we identified 506 candidate OCs in the Milky Way. In addition to the 493 previously reported candidates, we finally discovered 13 high-confidence new candidate clusters.
The evolution of dislocation loops in austenitic steels irradiated with Fe^(+) is investigated using cluster dynamics (CD) simulations based on a newly developed CD model. The number density and average diameter of the dislocation loops obtained from the CD simulations are in good agreement with experimental data reported in the literature from transmission electron microscopy (TEM) observations of Fe^(+)-irradiated Solution Annealed 304, Cold Worked 316, and HR3 austenitic steels. The CD simulation results demonstrate that the diffusion of in-cascade interstitial clusters plays a major role in the dislocation loop density and dislocation loop growth; in particular, for the HR3 austenitic steel, the CD model has verified the effect of temperature on the density and size of the dislocation loops.
Developing highly active alloy catalysts that surpass the performance of platinum group metals in the oxygen reduction reaction (ORR) is critical in electrocatalysis. Gold-based single-atom alloy (AuSAA) clusters are gaining recognition as promising alternatives due to their potential for high activity. However, enhancing the activity of AuSAA clusters remains challenging due to limited insight into their actual active sites in alkaline environments. Herein, we studied a variety of Au_(54)M_(1) SAA cluster catalysts and revealed that the operando-formed MO_(x)(OH)_(y) complex acts as the crucial active site for catalyzing the ORR under basic solution conditions. The observed volcano plot indicates that Au_(54)Co_(1), Au_(54)M_(1), and Au_(54)Ru_(1) clusters can be the optimal Au_(54)M_(1) SAA cluster catalysts for the ORR. Our findings offer new insights into the actual active sites of AuSAA cluster catalysts, which will inform rational catalyst design in experimental settings.
Traditional Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) clustering algorithms are data-driven, and their objective function minimization process is based on the available numeric data. Recently, knowledge hints have been introduced to form knowledge-driven clustering algorithms, which reveal a data structure that considers not only the relationships between data but also the compatibility with knowledge hints. However, these algorithms cannot produce the optimal number of clusters by themselves; they require the assistance of evaluation indices. Moreover, knowledge hints are usually used as part of the data structure (directly replacing some clustering centers), which severely limits the flexibility of the algorithm and can lead to knowledge misguidance. To solve this problem, this study designs a new knowledge-driven clustering algorithm called PCM clustering with High-density Points (HP-PCM), in which domain knowledge is represented in the form of so-called high-density points. First, a new data density calculation function is proposed. The Density Knowledge Points Extraction (DKPE) method is established to filter out high-density points from the dataset to form knowledge hints. Then, these hints are incorporated into the PCM objective function so that the clustering algorithm is guided by high-density points to discover the natural data structure. Finally, the initial number of clusters is set to be greater than the true one based on the number of knowledge hints, and the HP-PCM algorithm automatically determines the final number of clusters during the clustering process through a cluster elimination mechanism. Experimental studies, including comparative analyses, highlight the effectiveness of the proposed algorithm, such as its increased success rate in clustering, its ability to determine the optimal cluster number, and its faster convergence speed.
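For background on the abstract above, the typicality update of standard PCM (the baseline that HP-PCM extends) can be sketched as follows. This shows only the classical possibilistic membership formula; it does not implement the density function, DKPE, or the cluster elimination mechanism of HP-PCM, and the function name and parameter names are assumptions for illustration.

```python
def pcm_typicality(sq_dist, eta, m=2.0):
    """Typicality of a point for one cluster in standard PCM.

    sq_dist: squared distance from the point to the cluster center.
    eta:     positive scale parameter of the cluster (distance at which
             typicality drops to 0.5).
    m:       fuzzifier, must be greater than 1.
    Hypothetical helper; not part of HP-PCM itself.
    """
    if eta <= 0 or m <= 1:
        raise ValueError("eta must be positive and m must exceed 1")
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))


# Typicality decays with distance and equals 0.5 when sq_dist == eta.
for d2 in (0.0, 1.0, 4.0, 16.0):
    print(d2, round(pcm_typicality(d2, eta=4.0), 3))
```

Unlike FCM memberships, these typicalities are computed per cluster and are not forced to sum to one across clusters, which is why PCM-style methods can assign low values to outliers.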
To enhance the accuracy of Air Traffic Control (ATC) cybersecurity attack detection, this paper designs a new clustering detection method for ATC network security attacks. The feature set for ATC cybersecurity attacks is constructed by setting the feature states, adding recursive features, and determining the feature criticality. The expected information and entropy of the feature data are computed to determine the information gain of the feature data and reduce the interference of similar feature data. An autoencoder is introduced into the artificial intelligence (AI) algorithm to encode and decode the characteristics of ATC network security attack behavior, reducing the dimensionality of the attack behavior data. Based on this processing, an unsupervised learning algorithm for clustering detection of ATC network security attacks is designed. First, the distance between clusters of ATC network security attack behavior characteristics is determined, the clustering threshold is calculated, and the initial clustering centers are constructed. Then, the average value of all feature objects in each cluster is recalculated as the new cluster center. Next, all objects in each cluster of ATC network security attack behavior feature data are traversed. Finally, the cluster detection of ATC network security attack behavior is completed by computing the objective function. The experiment used three groups of attack behavior datasets as test objects, took the detection rate, false detection rate, and recall rate as test indicators, and selected three similar methods for comparison. The experimental results show that the detection rate of this method is about 98%, the false positive rate is below 1%, and the recall rate is above 97%. This indicates that the method can improve the detection performance for security attacks in air traffic control networks.
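The feature screening step in the abstract above rests on entropy and information gain. A minimal sketch of those two quantities for a discrete feature follows, assuming Shannon entropy in bits; the function names, feature values, and labels are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after splitting
    the samples by the value of one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    expected = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - expected


# Toy traffic records: one discrete feature and an attack/normal label.
labels = ["attack", "attack", "normal", "normal"]
informative = ["a", "a", "b", "b"]     # separates the classes completely
uninformative = ["a", "b", "a", "b"]   # tells nothing about the class
print(information_gain(informative, labels))    # 1.0
print(information_gain(uninformative, labels))  # 0.0
```

Features with near-zero gain contribute little to separating attack behavior from normal traffic, which is one plausible basis for filtering out redundant feature data before dimensionality reduction.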
In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm, aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data, particularly in the field of medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to effectively utilize the inherent prior knowledge in medical images. Our MAS-DSC algorithm incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. Additionally, the multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data, resulting in more accurate clustering performance. To address the difficulty of hyperparameter selection in deep subspace clustering, this paper employs a Bayesian optimization algorithm for adaptive tuning of hyperparameters related to subspace clustering, prior knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of the MAS-DSC algorithm. The results show that with its multi-scale network structure and Bayesian hyperparameter optimization, MAS-DSC achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and enhanced clustering performance within a semi-supervised learning framework.
Funding: Supported in part by NIH grants R01NS39600, U01MH114829, and RF1MH128693 (to GAA).
Abstract: Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
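The stopping principle in the abstract above (cells must differ more between than within clusters) can be illustrated with a minimal sketch. This is not the authors' protocol: the permutation test, the function names `separation` and `split_is_supported`, the significance level, and the toy two-dimensional points are all assumptions chosen for illustration. The idea is that a candidate split of one cluster into two is kept only if between-cluster distances exceed within-cluster distances more than label shuffling would produce by chance.

```python
import random
from itertools import combinations
from math import dist


def separation(points, labels):
    """Mean between-cluster distance minus mean within-cluster distance."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


def split_is_supported(points, labels, n_perm=999, alpha=0.05, seed=0):
    """Permutation test (an assumed stand-in for the statistical test):
    accept the split only if shuffled labels rarely separate as well."""
    rng = random.Random(seed)
    observed = separation(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if separation(points, shuffled) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return p_value < alpha


# Hypothetical feature vectors for ten cells forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
supported = split_is_supported(points, labels)
```

In a full pipeline, a check of this kind would be applied at each node of the dendrogram, and subdivision would stop at the first split that fails it.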
Abstract: The legacy of United States cluster munition use in Laos and Cambodia during the Second Indochina War is residual bomblets that unexpectedly detonate years later, killing and injuring children, farmers, and other civilians. Cluster munitions release dozens of smaller bomblets that rain deadly ammunition on troops, armored tanks, and vegetation, effectively striking broad sections of war zone landscapes in one launch. While many bomblets detonate immediately, others fail to detonate and can lie dormant on the ground for years. The primary objectives of this study were to document the long-term consequences and impacts of the US Air Force bombing of Laos and Cambodia during the Second Indochina War (1959 to 1973). The historical lessons learned by the United States should be shared with the governments and militaries of Russia and Ukraine. These countries need to discontinue the use of cluster bombs to prevent additional people living along the Russia-Ukraine border from having to live and die with the consequences of unexploded ordnance, including cluster bombs, for the next century.
Abstract: Reliable Cluster Head (CH) selection-based routing protocols are necessary for increasing packet transmission efficiency through optimal path discovery without degrading transmission reliability. In this paper, the Hybrid Golden Jackal and Improved Whale Optimization Algorithm (HGJIWOA) is proposed as an effective and optimal routing protocol that guarantees efficient routing of data packets over the routes established between the CHs and the movable sink. HGJIWOA includes a Dynamic Lens-Imaging Learning Strategy and Novel Update Rules for determining the reliable route needed for broadcasting data packets, which is attained through CH selection based on fitness measure estimation. CH selection, performed with the Golden Jackal Optimization Algorithm (GJOA), depends on the factors of maintainability, consistency, trust, delay, and energy. The adopted GJOA plays a dominant role in determining the optimal routing path based on reduced delay and minimal distance. The protocol further uses the Improved Whale Optimisation Algorithm (IWOA) to forward data from the chosen CHs to the base station (BS) via an optimized route that depends on energy and distance. It also includes a reliable route maintenance process that helps decide the route through which data need to be transmitted or re-routed. Simulation outcomes of the proposed HGJIWOA mechanism with different sensor nodes confirmed an improved mean throughput of 18.21%, sustained residual energy of 19.64%, and a minimized end-to-end delay of 21.82% relative to competitive CH selection approaches.
Funding: Supported by the Spanish Ministry of Science and Innovation under Projects PID2022-137680OB-C32 and PID2022-139187OB-I00.
Abstract: Customer segmentation according to load-shape profiles using smart meter data is an increasingly important application, vital to the planning and operation of energy systems and to enabling citizens' participation in the energy transition. This study proposes an innovative multi-step clustering procedure to segment customers based on load-shape patterns at the daily and intra-daily time horizons. Smart meter data is split between daily and hourly normalized time series to assess monthly, weekly, daily, and hourly seasonality patterns separately. The dimensionality reduction implicit in the splitting allows a direct approach to clustering raw daily energy time series data. The intraday clustering procedure sequentially identifies representative hourly day-unit profiles for each customer and the entire population. For the first time, a step function approach is applied to reduce time series dimensionality. Customer attributes embedded in surveys are employed to build external clustering validation metrics using Cramér's V correlation factors and to identify statistically significant determinants of load shape in energy usage. In addition, a time series feature engineering approach is used to extract 16 relevant demand flexibility indicators that characterize customers and corresponding clusters along four different axes: available Energy (E), Temporal patterns (T), Consistency (C), and Variability (V). The methodology is implemented on a real-world electricity consumption dataset of 325 Small and Medium-sized Enterprise (SME) customers, identifying 4 daily and 6 hourly easy-to-interpret, well-defined clusters. The application of the methodology includes selecting key parameters via grid search and a thorough comparison of clustering distances and methods to ensure the robustness of the results. Further research can test the scalability of the methodology to larger datasets from various customer segments (households and large commercial) and locations with different weather and socioeconomic conditions.
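The external validation step mentioned in the abstract above relies on Cramér's V, which measures the association between two categorical variables, here a cluster label and a survey attribute. The following is a minimal sketch of the standard, uncorrected statistic V = sqrt(chi² / (n · (min(r, c) − 1))); the function name `cramers_v` and the toy survey attribute are assumptions for illustration, not the study's code or data.

```python
from collections import Counter
from math import sqrt


def cramers_v(a, b):
    """Cramér's V between two equal-length categorical sequences.

    Assumes each variable takes at least two distinct values.
    """
    n = len(a)
    joint = Counter(zip(a, b))   # observed counts per (row, column) cell
    row, col = Counter(a), Counter(b)
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k))


# Hypothetical cluster labels and a hypothetical survey attribute.
clusters = [0, 0, 1, 1]
sector_aligned = ["retail", "retail", "office", "office"]
sector_unrelated = ["retail", "office", "retail", "office"]

v_aligned = cramers_v(clusters, sector_aligned)      # perfect association
v_unrelated = cramers_v(clusters, sector_unrelated)  # no association
```

Values near 1 suggest that the attribute explains the cluster assignment well, while values near 0 suggest it does not.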
Abstract: An aluminoborate, $\mathrm{Na_{2.5}Rb[Al\{B_5O_{10}\}\{B_3O_5\}]\cdot 0.5NO_3\cdot H_2O}$ (1), was synthesized under hydrothermal conditions and is built from mixed oxoboron clusters and $\mathrm{AlO_4}$ tetrahedra. In the structure, the $\mathrm{[B_5O_{10}]^{5-}}$ and $\mathrm{[B_3O_7]^{5-}}$ clusters are alternately connected to form 1D $\mathrm{[B_8O_{15}]_n^{6n-}}$ chains, which are further linked by $\mathrm{AlO_4}$ units to form a 2D monolayer with 7-membered ring and 10-membered ring windows. Two adjacent monolayers with opposite orientations further form a porous layered structure with six channels through B-O-Al bonds. Compound 1 was characterized by single-crystal X-ray diffraction, powder X-ray diffraction (PXRD), IR spectroscopy, UV-Vis diffuse reflectance spectroscopy, and thermogravimetric analysis (TGA). UV-Vis diffuse reflectance analysis indicates that compound 1 shows a wide transparency range with a short cutoff edge of 201 nm, suggesting potential applications in the UV region. CCDC: 2383923.
Abstract: Domaining is a crucial process in geostatistics, particularly when significant spatial variations are observed within a site, as these variations can significantly affect the outcomes of spatial modeling. This study investigates the application of hard and fuzzy clustering algorithms for domain delineation, using geological and geochemical data from two exploration campaigns at the eastern Kahang deposit in central Iran. The dataset includes geological layers (lithology, alteration, and mineral zones), geochemical layers (Cu, Mo, Ag, and Au grades), and borehole coordinates. Six clustering algorithms (K-means, hierarchical, affinity propagation, self-organizing map (SOM), fuzzy C-means, and Gustafson-Kessel) were applied to determine the optimal number of clusters, which ranged from 3 to 4. The fuzziness and weighting parameters were found to range from 1.1 to 1.3 and 0.1 to 0.3, respectively, based on the evaluation of various hard and fuzzy cluster validity indices. Directional variograms were computed to assess spatial anisotropy, and the anisotropy ellipsoid for each domain was defined to identify the model with the highest level of anisotropic discrimination among the domains. The SOM algorithm, which incorporated both qualitative and quantitative data, produced the best model, resulting in the identification of three distinct domains. These findings underscore the effectiveness of combining clustering techniques with variogram analysis for accurate domain delineation in geostatistical modeling.
Funding: Partially supported by the National Natural Science Foundation of China (62161016), the Key Research and Development Project of Lanzhou Jiaotong University (ZDYF2304), and the Beijing Engineering Research Center of High-velocity Railway Broadband Mobile Communications (BHRC-2022-1), Beijing Jiaotong University.
Abstract: To address the short network lifetime and high data transmission delay in data gathering for wireless sensor networks (WSNs) caused by uneven energy consumption among nodes, a hybrid energy-efficient clustering routing scheme based on the firefly and pigeon-inspired algorithm (FF-PIA) is proposed to optimise the data transmission path. Once the optimal number of cluster head (CH) nodes has been obtained, it is used as the basis for producing the initial population of the FF-PIA algorithm. The Lévy flight mechanism and adaptive inertia weighting are employed during iteration to balance global search against local search. Moreover, a Gaussian perturbation strategy is applied to update the optimal solution, ensuring that the algorithm can escape local optima. For WSN data gathering, a one-dimensional signal reconstruction model is developed using dilated convolution and residual neural networks (DCRNN). Experiments were conducted on the National Oceanic and Atmospheric Administration (NOAA) dataset. They show that the DCRNN model-driven data reconstruction algorithm improves both reconstruction accuracy and reconstruction time. Co-simulation of FF-PIA and DCRNN clustering routing reveals that the proposed algorithm can effectively extend the network lifetime and reduce data transmission delay.
Funding: Guangdong Major Project of Basic and Applied Basic Research (2020B0301030004); Science and Technology Development Plan in Jilin Province of China (20230203135SF); National Natural Science Foundation of China (41875119); Special Fund for Innovative Development of China Meteorological Administration (CXFZ2022J007).
Abstract: Clustered heavy precipitation (CHP) events can severely impact human society, infrastructure, and natural ecosystems. Consequently, short-term climate prediction of CHP events is vital for the prevention and mitigation of associated hazards. Employing year-to-year increment (DY) and multiple linear regression approaches, this study developed a seasonal prediction model for pre-summer (i.e., May and June) CHP frequency in South China (SC) during 1981–2022. Three robust predictors were identified: March sea surface temperature in the Southwestern Atlantic, early-winter snow depth in East Europe, and winter soil moisture in Central Asia. These three predictors exert substantial impacts on pre-summer precipitation in SC via modulation of an anomalous anticyclone (cyclone) over the (subtropical) western North Pacific. In a leave-one-out cross-validation test during 1981–2022, the prediction model exhibited reasonable performance in predicting the interannual and interdecadal variations and trends of CHP days. The temporal correlation coefficient (TCC) between the observations and predictions was 0.66. In the independent hindcast for 2013–2022, the TCC was as high as 0.85. Moreover, coherent covariations were observed between the frequency and the amounts of CHP, with a TCC of 0.99 for 1981–2022. The three predictors also perform well in forecasting CHP amounts over SC, with a TCC of 0.68 between the predictions and observations in the cross-validation test during 1981–2022 and of 0.86 in the independent hindcasts during 2013–2022. Notably, the predictors also showed good predictive skill for years with high CHP occurrence (e.g., 1998 and 2019). The predicted high-incidence areas of heavy precipitation days were highly consistent with observations, with a pattern correlation coefficient of 0.44 (0.55) for 1998 (2019). This study provides valuable insights to improve seasonal prediction of pre-summer CHP frequency in SC.
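The evaluation chain described in the abstract above (year-to-year increments, regression, leave-one-out cross-validation, temporal correlation coefficient) can be sketched as follows. This is a simplified illustration, not the study's model: it uses a single predictor rather than multiple linear regression, and all function names and numbers are hypothetical.

```python
def increments(series):
    """Year-to-year increment (DY): value in year t minus value in year t-1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_predictions(x, y):
    """Leave-one-out: predict each year from a model fitted on all other years."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


def pearson(a, b):
    """Pearson correlation, used here as the temporal correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


# Hypothetical yearly series: one predictor and an exactly linear response.
predictor = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
chp_days = [10.0, 14.0, 12.0, 18.0, 16.0, 20.0]

dx, dy = increments(predictor), increments(chp_days)
preds = loo_predictions(dx, dy)
tcc = pearson(dy, preds)
```

Because the toy response is an exact linear function of the predictor, the cross-validated TCC here is essentially 1; real data would give lower values such as those reported in the abstract.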
Abstract: Container-based virtualization technology has recently become more widely used in edge computing environments due to its lighter resource occupation, faster startup, and better resource utilization efficiency. To meet the diverse needs of tasks, it is usually necessary to instantiate multiple network functions in the form of containers and interconnect the generated containers to build a Container Cluster (CC). CCs are then deployed on edge service nodes with relatively limited resources. However, the increasingly complex and time-varying nature of tasks poses great challenges to the optimal placement of CCs. This paper regards the charges for the various resources occupied in providing services as revenue and the service efficiency and energy consumption as cost, and thus formulates a Mixed Integer Programming (MIP) model to describe the optimal placement of CCs on edge service nodes. Furthermore, an Actor-Critic based Deep Reinforcement Learning (DRL) framework incorporating Graph Convolutional Networks (GCN), named RL-GCN, is proposed to solve the optimization problem. The framework obtains an optimal placement strategy through self-learning according to the requirements and objectives of CC placement. In particular, through the introduction of GCN, the features of the association relationships between multiple containers in CCs can be effectively extracted to improve the quality of placement. The experimental results show that, under different scales of service nodes and task requests, the proposed method achieves improved system performance in terms of placement error ratio, time efficiency of solution output, and cumulative system revenue compared with other representative baseline methods.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 92261201, 12134005, 12334011).
Abstract: We investigated the ionization and dissociation processes of ammonia clusters ranging from dimer to pentamer induced by 800-nm femtosecond laser fields. Time-of-flight (TOF) mass spectra of the ammonia clusters were recorded over a range of laser intensities from $2.1\times10^{12}$ W/cm$^2$ to $5.6\times10^{12}$ W/cm$^2$. The protonated ion signals dominate the spectra, which is consistent with the stability of the geometric structures. The ionization and dissociation channels of ammonia clusters are discussed. The competition and switching among observed dissociation channels are revealed by analyzing the variations in the relative ionic yields of specific protonated and unprotonated clusters under different laser intensities. These results indicate that, when the laser intensity is above approximately $4\times10^{12}$ W/cm$^2$, the ionization of neutral multiple-ammonia units produced through the dissociation of cluster ions may start to contribute, along with additional laser-induced processes that consume protonated ions and/or produce unprotonated ions. These findings provide deeper insights into the ionization and dissociation dynamics in multi-photon ionization experiments involving ammonia clusters.
Funding: Supported by the National Natural Science Foundation of China (grant Nos. U2038104 and 11703014) and the Bureau of International Cooperation, Chinese Academy of Sciences (GJHZ1864).
Abstract: We study the structural and dynamical properties of A209 based on Chandra and XMM-Newton observations. We obtain detailed temperature, pressure, and entropy maps with the contour binning method, and find a hot region in the NW direction. The X-ray brightness residual map and corresponding temperature profiles reveal a possible shock front in the NW direction and a cold front feature in the SE direction. Combined with the galaxy luminosity density map, we propose a weak merger scenario. A young sub-cluster passing from the SE to the NW direction could explain the optical subpeak, the intracluster medium temperature map, the X-ray surface brightness excess, and the X-ray peak offset together.
Abstract: Clustering a social network is the process of grouping social actors into clusters in which intra-cluster similarities among actors are higher than inter-cluster similarities. Clustering approaches, e.g., k-medoids or hierarchical clustering, use a distance function to measure the dissimilarities among actors. These distance functions need to fulfill various properties, including the triangle inequality (TI). However, in some cases the triangle inequality might be violated, impacting the quality of the resulting clusters. Through experiments, this paper explains how TI is violated when traditional clustering techniques (k-medoids, hierarchical, DENGRAPH, and spectral clustering) are applied to social networks, and how the violation of TI affects the quality of the resulting clusters.
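The triangle inequality discussed in the abstract above requires d(i, k) ≤ d(i, j) + d(j, k) for every triple of actors. A minimal sketch of how violations could be detected in a dissimilarity matrix follows; the function name `triangle_violations` and the three-actor matrix are hypothetical and not taken from the paper.

```python
from itertools import permutations


def triangle_violations(d, tol=1e-12):
    """Return every ordered triple (i, j, k) with d[i][k] > d[i][j] + d[j][k]."""
    n = len(d)
    return [(i, j, k)
            for i, j, k in permutations(range(n), 3)
            if d[i][k] > d[i][j] + d[j][k] + tol]


# Hypothetical dissimilarities: actors 0 and 2 are far apart (5),
# yet both are close (1) to actor 1, which breaks the inequality.
d = [[0, 1, 5],
     [1, 0, 1],
     [5, 1, 0]]
violations = triangle_violations(d)
```

The check is cubic in the number of actors, so on large networks it would typically be run on a sample of triples.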
Funding: This work is supported by the National Key R&D Program of China (2017YFB0802900).
Abstract: In recent years, many unknown protocols have been emerging, bringing severe challenges to network security and network management. Existing unknown protocol recognition methods suffer from weak feature extraction ability and cannot thoroughly mine the discriminating features of protocol data. To address this issue, we propose an unknown application layer protocol recognition method based on deep clustering. Deep clustering, which combines a deep neural network with a clustering algorithm, can automatically extract the features of the input and cluster the data based on the extracted features. Compared with traditional clustering methods, deep clustering offers higher clustering accuracy. The proposed method uses network-in-network (NIN), channel attention, spatial attention, and Bidirectional Long Short-Term Memory (BLSTM) to construct an autoencoder that extracts the spatial-temporal features of the protocol data, and uses an unsupervised clustering algorithm to recognize the unknown protocols based on these features. First, the method extracts the application layer protocol data from the network traffic and transforms the data into a one-dimensional matrix. Second, the autoencoder is pretrained, the protocol data is compressed into a low-dimensional latent space by the autoencoder, and the initial clustering is performed with K-Means. Finally, the clustering loss is calculated and the classification model is optimized according to the clustering loss. The classification results are obtained once the classification model is optimal. Compared with existing unknown protocol recognition methods, the proposed method uses deep clustering to cluster the unknown protocols, and it can mine the key features of the protocol data and recognize the unknown protocols accurately. Experimental results show that the proposed method can effectively recognize unknown protocols and that its performance is better than that of other methods.
Abstract: Multi-view Subspace Clustering (MVSC) has emerged as an advanced clustering method, designed to integrate diverse views to uncover a common subspace, enhancing the accuracy and robustness of clustering results. The low-rank prior is significant in MVSC because it captures the global data structure across views for improved performance. However, MVSC is sensitive to outliers due to its reliance on the Frobenius norm for error measurement. To address this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps in selecting the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This leads to a more efficient and effective representation of the data, improving clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse can effectively handle outlier sensitivity, which is a common challenge in traditional MVSC methods relying solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) algorithm is then employed to solve the proposed optimization problems. Our comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFF0711500), the National Natural Science Foundation of China (NSFC, Grant No. 12373097), the Basic and Applied Basic Research Foundation Project of Guangdong Province (No. 2024A1515011503), and the Guangzhou Science and Technology Funds (2023A03J0016).
Abstract: Open clusters (OCs) serve as invaluable tracers for investigating the properties and evolution of stars and galaxies. Despite recent advancements in machine learning clustering algorithms, accurately discerning such clusters remains challenging. We revisited the 3013 samples generated with a hybrid clustering algorithm of FoF and pyUPMASK. A multi-view clustering (MvC) ensemble method was applied, which analyzes each member star of the OC from three perspectives (proper motion, spatial position, and composite views) before integrating the clustering outcomes to deduce more reliable cluster memberships. Based on the MvC results, we further excluded cluster candidates with fewer than ten member stars and obtained 1256 OC candidates. After isochrone fitting and visual inspection, we identified 506 candidate OCs in the Milky Way. In addition to the 493 previously reported candidates, we finally discovered 13 high-confidence new candidate clusters.
Funding: Supported by the National Natural Science Foundation of China (No. U1967212), the Fundamental Research Funds for the Central Universities (No. 2021MS032), and the Nuclear Materials Innovation Foundation (No. WDZC-2023-AW-0305).
Abstract: The evolution of dislocation loops in austenitic steels irradiated with Fe$^+$ is investigated using cluster dynamics (CD) simulations based on a newly developed CD model. The CD predictions are compared with experimental results in the literature. The number density and average diameter of the dislocation loops obtained from the CD simulations are in good agreement with experimental data from transmission electron microscopy (TEM) observations of Fe$^+$-irradiated Solution Annealed 304, Cold Worked 316, and HR3 austenitic steels reported in the literature. The CD simulation results demonstrate that the diffusion of in-cascade interstitial clusters plays a major role in the dislocation loop density and dislocation loop growth; in particular, for the HR3 austenitic steel, the CD model has verified the effect of temperature on the density and size of the dislocation loops.
Abstract: Developing highly active alloy catalysts that surpass the performance of platinum group metals in the oxygen reduction reaction (ORR) is critical in electrocatalysis. Gold-based single-atom alloy (AuSAA) clusters are gaining recognition as promising alternatives due to their potential for high activity. However, enhancing the activity of AuSAA clusters remains challenging due to limited insight into their actual active sites in alkaline environments. Herein, we studied a variety of $\mathrm{Au_{54}M_1}$ SAA cluster catalysts and revealed that the operando-formed $\mathrm{MO}_x\mathrm{(OH)}_y$ complex acts as the crucial active site for catalyzing the ORR under basic solution conditions. The observed volcano plot indicates that $\mathrm{Au_{54}Co_1}$, $\mathrm{Au_{54}M_1}$, and $\mathrm{Au_{54}Ru_1}$ clusters can be the optimal $\mathrm{Au_{54}M_1}$ SAA cluster catalysts for the ORR. Our findings offer new insights into the actual active sites of AuSAA cluster catalysts, which will inform rational catalyst design in experimental settings.
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFB3304400), the National Natural Science Foundation of China (Nos. 6230311, 62303111, 62076060, 61932007, and 62176083), and the Key Research and Development Program of Jiangsu Province of China (No. BE2022157).
Abstract: Traditional Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) clustering algorithms are data-driven, and their objective function minimization process is based on the available numeric data. Recently, knowledge hints have been introduced to form knowledge-driven clustering algorithms, which reveal a data structure that considers not only the relationships between data but also the compatibility with knowledge hints. However, these algorithms cannot produce the optimal number of clusters by themselves; they require the assistance of evaluation indices. Moreover, knowledge hints are usually used as part of the data structure (directly replacing some clustering centers), which severely limits the flexibility of the algorithm and can lead to knowledge misguidance. To solve this problem, this study designs a new knowledge-driven clustering algorithm called PCM clustering with High-density Points (HP-PCM), in which domain knowledge is represented in the form of so-called high-density points. First, a new data density calculation function is proposed. The Density Knowledge Points Extraction (DKPE) method is established to filter out high-density points from the dataset to form knowledge hints. Then, these hints are incorporated into the PCM objective function so that the clustering algorithm is guided by high-density points to discover the natural data structure. Finally, the initial number of clusters is set to be greater than the true one based on the number of knowledge hints. The HP-PCM algorithm then automatically determines the final number of clusters during the clustering process through a cluster elimination mechanism. Experimental studies, including comparative analyses, highlight the effectiveness of the proposed algorithm, such as an increased success rate in clustering, the ability to determine the optimal cluster number, and faster convergence.
Funding: National Natural Science Foundation of China (U2133208, U20A20161); National Natural Science Foundation of China (No. 62273244); Sichuan Science and Technology Program (No. 2022YFG0180).
Abstract: To enhance the accuracy of Air Traffic Control (ATC) cybersecurity attack detection, this paper designs a new clustering detection method for ATC network security attacks. The feature set for ATC cybersecurity attacks is constructed by setting the feature states, adding recursive features, and determining the feature criticality. The expected information and entropy of the feature data are computed to determine the information gain of the feature data and reduce the interference of similar feature data. An autoencoder is introduced into the artificial intelligence (AI) algorithm to encode and decode the characteristics of ATC network security attack behavior, reducing the dimensionality of the attack behavior data. Based on this processing, an unsupervised learning algorithm for clustering detection of ATC network security attacks is designed. First, the distance between clusters of ATC network security attack behavior characteristics is determined, the clustering threshold is calculated, and the initial clustering centers are constructed. Then, the mean of all feature objects in each cluster is recalculated as the new cluster center. Next, all objects in each cluster of attack behavior feature data are traversed. Finally, the cluster detection of ATC network security attack behavior is completed by computing the objective function. The experiment used three groups of experimental attack behavior datasets as test objects, took the detection rate, false detection rate, and recall rate as test indicators, and selected three similar methods for comparison. The experimental results show that the detection rate of this method is about 98%, the false positive rate is below 1%, and the recall rate is above 97%. The research shows that this method can improve the detection performance for security attacks in air traffic control networks.
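The feature-screening step in the abstract above ranks features by information gain, that is, the entropy of the class labels minus the expected entropy after splitting on a feature. The sketch below shows the textbook computation for discrete features; the function names and the toy traffic records are assumptions for illustration, and the paper's exact formulation may differ.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the expected entropy after splitting on feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    expected = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - expected


# Hypothetical records: one informative flag and one uninformative flag.
labels = ["attack", "attack", "normal", "normal"]
informative_flag = [1, 1, 0, 0]
uninformative_flag = [1, 0, 1, 0]

gain_informative = information_gain(informative_flag, labels)
gain_uninformative = information_gain(uninformative_flag, labels)
```

Features with gain near zero carry little information about the attack label and are natural candidates for removal before dimensionality reduction.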
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62171203, in part by the Jiangsu Province "333 Project" High-Level Talent Cultivation Subsidized Project, in part by the Suzhou Key Supporting Subjects for Health Informatics under Grant SZFCXK202147, in part by the Changshu Science and Technology Program under Grants CS202015 and CS202246, and in part by the Changshu Key Laboratory of Medical Artificial Intelligence and Big Data under Grants CYZ202301 and CS202314.
Abstract: In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm, aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data, particularly in the field of medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to effectively utilize the inherent prior knowledge in medical images. Our MAS-DSC algorithm incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. Additionally, the multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data, resulting in more accurate clustering performance. To address the difficulty of hyperparameter selection in deep subspace clustering, this paper employs a Bayesian optimization algorithm for adaptive tuning of hyperparameters related to subspace clustering, prior knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of the MAS-DSC algorithm. The results show that with its multi-scale network structure and Bayesian hyperparameter optimization, MAS-DSC achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and enhanced clustering performance within a semi-supervised learning framework.