Machine learning (ML) is increasingly applied to medical image processing with appropriate learning paradigms. These applications include analyzing images of various organs, such as the brain, lung, and eye, to identify specific flaws or diseases for diagnosis. The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification. Most of the extracted image features are irrelevant and lead to an increase in computation time. Therefore, this article uses an analytical learning paradigm to design a Congruent Feature Selection Method that selects the most relevant image features. This process trains the learning paradigm using similarity and correlation-based features over different textural intensities and pixel distributions. The similarity between the pixels over the various distribution patterns with high indexes is recommended for disease diagnosis. Later, the correlation based on intensity and distribution is analyzed to improve the feature selection congruency. The more congruent pixels are then sorted in descending order of selection, which identifies better regions than the distribution. Next, the learning paradigm is trained using intensity and region-based similarity to maximize the chances of selection. As a result, the probability of feature selection is improved regardless of the textures and medical image patterns. This process enhances the performance of ML applications for different medical image processing tasks. The proposed method improves the accuracy, precision, and training rate by 13.19%, 10.69%, and 11.06%, respectively, compared to other models for the selected dataset. The mean error and selection time are also reduced by 12.56% and 13.56%, respectively, compared to the same models and dataset.
This article constructs statistical selection procedures for exponential populations that may differ only in their threshold parameters. The scale parameters of the populations are assumed common and known. The independent samples drawn from the populations are taken to be of the same size. The best population is defined as the one associated with the largest threshold parameter. If more than one population shares the largest threshold, one of these is tagged at random and denoted the best. Two procedures are developed for choosing a subset of the populations such that the chosen subset contains the best population with a prescribed probability. One procedure is based on the sample minimum values drawn from the populations, and the other is based on the sample means from the populations. An "Indifference Zone" (IZ) selection procedure is also developed based on the sample minimum values. The IZ procedure asserts that the population with the largest test statistic (e.g., the sample minimum) is the best population. With this approach, the sample size is chosen so as to guarantee that the probability of a correct selection is no less than a prescribed probability in the parameter region where the largest threshold is at least a prescribed amount larger than the remaining thresholds. Numerical examples are given, and the R code for all calculations is given in the Appendices.
Non-orthogonal multiple access (NOMA) is a promising technology for next-generation wireless communication networks. The benefits of this technology can be further enhanced through deployment in conjunction with multiple-input multiple-output (MIMO) systems. Antenna selection plays a critical role in MIMO–NOMA systems as it has the potential to significantly reduce the cost and complexity associated with radio frequency chains. This paper considers antenna selection for downlink MIMO–NOMA networks with a multiple-antenna base station (BS) and multiple-antenna user equipments (UEs). An iterative antenna selection scheme is developed for a two-user system, and a power estimation method is also proposed to determine the initial power required for this selection scheme. The proposed algorithm is then extended to a general multiuser NOMA system. Numerical results demonstrate that the proposed antenna selection algorithm achieves near-optimal performance with much lower computational complexity in both two-user and multiuser scenarios.
This study provides a systematic investigation into the influence of feature selection methods on cryptocurrency price forecasting models employing technical indicators. In this work, over 130 technical indicators (covering momentum, volatility, volume, and trend) are subjected to three distinct feature selection approaches: mutual information (MI), recursive feature elimination (RFE), and random forest importance (RFI). By extracting an optimal set of 20 predictors, the proposed framework aims to mitigate redundancy and overfitting while enhancing interpretability. These feature subsets are integrated into support vector regression (SVR), Huber regressors, and k-nearest neighbors (KNN) models to forecast the prices of three leading cryptocurrencies, Bitcoin (BTC/USDT), Ethereum (ETH/USDT), and Binance Coin (BNB/USDT), across horizons ranging from 1 to 20 days. Model evaluation employs the coefficient of determination (R²) and the root mean squared logarithmic error (RMSLE), alongside a walk-forward validation scheme to approximate real-world trading contexts. Empirical results indicate that incorporating momentum and volatility measures substantially improves predictive accuracy, with particularly pronounced effects observed at longer forecast windows. Moreover, indicators related to volume and trend provide incremental benefits in select market conditions. Notably, an 80%–85% reduction in the original feature set frequently maintains or enhances model performance relative to the complete indicator set. These findings highlight the critical role of targeted feature selection in addressing high-dimensional financial data challenges while preserving model robustness. This research advances the field of cryptocurrency forecasting by offering a rigorous comparison of feature selection methods and their effects on multiple digital assets and prediction horizons. The outcomes highlight the importance of dimension-reduction strategies in developing more efficient and resilient forecasting algorithms. Future efforts should incorporate high-frequency data and explore alternative selection techniques to further refine predictive accuracy in this highly volatile domain.
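The evaluation protocol named in the abstract above (R², RMSLE, and walk-forward validation) can be illustrated with a minimal standard-library sketch. The function names, the expanding training window, and the split sizes are assumptions made for illustration; the abstract does not state how the study configured its walk-forward scheme.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (inputs must be > -1)."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Assumption: each fold trains on everything before the test block,
    so no future observation leaks into training.
    """
    start = initial_train
    while start + horizon <= n_samples:
        yield list(range(start)), list(range(start, start + horizon))
        start += horizon
```

Every test block lies strictly after its training block, which is the property that lets walk-forward validation approximate a live forecasting setting.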
Feature selection methods rooted in rough sets confront two notable limitations: high computational complexity and sensitivity to noise, which render them impractical for large-scale and noisy datasets. The primary issue stems from these methods' undue reliance on all samples. To overcome these challenges, we introduce the concept of cross-similarity grounded in a robust fuzzy relation and design a rapid and robust feature selection algorithm. First, we construct a robust fuzzy relation by introducing a truncation parameter. Then, based on this fuzzy relation, we propose the concept of cross-similarity, which emphasizes the sample-to-sample similarity relations that uniquely determine feature importance, rather than considering all such relations equally. After studying the manifestations and properties of cross-similarity across different fuzzy granularities, we propose a forward greedy feature selection algorithm that uses cross-similarity as the foundation for information measurement. This algorithm reduces the time complexity from O(m²n²) to O(mn²). Experimental findings reveal that the average runtime of five state-of-the-art comparison algorithms is roughly 3.7 times longer than that of our algorithm, while our algorithm achieves an average accuracy that surpasses those of the five comparison algorithms by approximately 3.52%. This underscores the effectiveness of our approach. This paper paves the way for applying feature selection algorithms grounded in fuzzy rough sets to large-scale gene datasets.
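The forward greedy strategy mentioned in the abstract above follows a generic pattern that can be sketched as below. This is only the generic skeleton: the `score` callable is a hypothetical stand-in for the paper's cross-similarity measure, which is not reproduced here, and this naive version makes no claim to the reported complexity reduction.

```python
def forward_greedy_select(n_features, score, min_gain=1e-9):
    """Add one feature at a time, always taking the largest score gain.

    `score` maps a list of feature indices to a number (higher is better).
    Selection stops when no remaining feature improves the score by more
    than `min_gain`.
    """
    selected = []
    best = score(selected)
    remaining = set(range(n_features))
    while remaining:
        gains = {f: score(selected + [f]) - best for f in remaining}
        # sorted() makes tie-breaking deterministic (lowest index wins).
        f_best = max(sorted(gains), key=gains.get)
        if gains[f_best] <= min_gain:
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best += gains[f_best]
    return selected


# Toy score (hypothetical): number of distinct samples "covered" by the subset.
COVER = {0: {1, 2, 3}, 1: {3, 4}, 2: {1, 2}, 3: {5}}


def toy_score(subset):
    covered = set()
    for f in subset:
        covered |= COVER[f]
    return len(covered)


print(forward_greedy_select(4, toy_score))  # feature 2 is redundant and is skipped
```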
Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. This hybrid approach addresses key challenges in the field, such as high dimensionality, redundant features, and classification performance, by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditional models. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
In the task of Facial Expression Recognition (FER), data uncertainty has been a critical factor affecting performance, typically arising from the ambiguity of facial expressions, low-quality images, and the subjectivity of annotators. Tracking the training history reveals that misclassified samples often exhibit high confidence and excessive uncertainty in the early stages of training. To address this issue, we propose an uncertainty-based robust sample selection strategy, which combines confidence error with RandAugment to improve image diversity, effectively reducing overfitting caused by uncertain samples during deep learning model training. To validate the effectiveness of the proposed method, extensive experiments were conducted on public FER benchmarks. The accuracies obtained were 89.08% on RAF-DB, 63.12% on AffectNet, and 88.73% on FERPlus.
In recent years, particle swarm optimization (PSO) has received widespread attention in feature selection due to its simplicity and potential for global search. However, in traditional PSO, particles primarily update based on two extreme values, the personal best and the global best, which limits the diversity of information. Ideally, particles should learn from multiple advantageous particles to enhance interactivity and optimization efficiency. Accordingly, this paper proposes a PSO that simulates the evolutionary dynamics of species survival in mountain peak ecology (PEPSO) for feature selection. Based on the pyramid topology, the algorithm simulates the features of mountain peak ecology in nature and the competitive-cooperative strategies among species. According to the principles of the algorithm, the population is first adaptively divided into many subgroups based on the fitness level of particles. Then, particles within each subgroup are divided into three different types based on their evolutionary levels, employing different adaptive inertia weight rules and dynamic learning mechanisms to define distinct learning modes. Consequently, all particles play their respective roles in promoting the global optimization performance of the algorithm, similar to different species in the ecological pattern of mountain peaks. Experimental validation of the PEPSO performance was conducted on 18 public datasets. The experimental results demonstrate that the PEPSO outperforms other PSO variant-based feature selection methods and mainstream feature selection methods based on intelligent optimization algorithms in terms of overall performance in global search capability, classification accuracy, and reduction of feature space dimensions. The Wilcoxon signed-rank test also confirms the excellent performance of the PEPSO.
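For context, the traditional PSO update that the abstract above describes as limited (each particle learns only from its personal best and the global best) can be sketched as follows. This is the textbook baseline, not PEPSO; the coefficient values and the 0.5 threshold used to turn a position into a feature mask are common choices assumed here for illustration.

```python
import random


def pso_step(position, velocity, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One canonical PSO update for a single particle.

    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v
    """
    new_x, new_v = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        v_next = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        new_v.append(v_next)
        new_x.append(x + v_next)
    return new_x, new_v


def to_feature_mask(position, threshold=0.5):
    """Interpret a continuous position as a feature subset (assumed encoding)."""
    return [x > threshold for x in position]
```

Because only `pbest` and `gbest` appear in the update, every particle is pulled toward the same two attractors, which is the information bottleneck the abstract refers to.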
The principle of genomic selection (GS) entails estimating breeding values (BVs) by summing all the SNP polygenic effects. The visible/near-infrared spectroscopy (VIS/NIRS) wavelength and abundance values can directly reflect the concentrations of chemical substances, and the measurement of meat traits by VIS/NIRS is similar to the processing of genomic selection data by summing all "polygenic effects" associated with spectral feature peaks. Therefore, it is meaningful to investigate the incorporation of VIS/NIRS information into GS models to establish an efficient and low-cost breeding model. In this study, we measured 6 meat quality traits in 359 Duroc×Landrace×Yorkshire pigs from Guangxi Zhuang Autonomous Region, China, and genotyped them with high-density SNP chips. According to the completeness of the information for the target population, we proposed 4 breeding strategies applied to different scenarios: Ⅰ, only spectral and genotypic data exist for the target population; Ⅱ, only spectral data exist for the target population; Ⅲ, only spectral and genotypic data but with different prediction processes exist for the target population; and Ⅳ, only spectral and phenotypic data exist for the target population. The 4 scenarios were used to evaluate the genomic estimated breeding value (GEBV) accuracy by increasing the VIS/NIR spectral information. In the results of the 5-fold cross-validation, the genetic algorithm showed remarkable potential for preselection of feature wavelengths. The breeding efficiency of Strategies Ⅱ, Ⅲ, and Ⅳ was superior to that of traditional GS for most traits, and the GEBV prediction accuracy was improved by 32.2%, 40.8%, and 15.5%, respectively, on average. Among them, the prediction accuracy of Strategy Ⅱ for fat (%) improved by as much as 50.7% compared to traditional GS. The GEBV prediction accuracy of Strategy Ⅰ was nearly identical to that of traditional GS, and the fluctuation range was less than 7%. Moreover, the breeding cost of the 4 strategies was lower than that of traditional GS methods, with Strategy Ⅳ being the lowest as it did not require genotyping. Our findings demonstrate that GS methods based on VIS/NIRS data have significant predictive potential and are worthy of further research to provide a valuable reference for the development of effective and affordable breeding strategies.
To address the complex issue of emergency resource distribution center site selection in uncertain environments, this study comprehensively considers factors such as uncertain parameters and the urgency of demand at disaster-affected sites. First, urgency cost, economic cost, and transportation distance cost were identified as key objectives. The study applied fuzzy theory to construct a triangular fuzzy multi-objective site selection decision model. Next, defuzzification theory was used to transform the fuzzy decision model into a precise one. Subsequently, an improved Chaotic Quantum Multi-Objective Harris Hawks Optimization (CQ-MOHHO) algorithm was proposed to solve the model. Case studies showed that the CQ-MOHHO algorithm rapidly produces high-quality Pareto front solutions and identifies optimal site selection schemes for emergency resource distribution centers. This outcome verified the feasibility and efficacy of the site selection decision model and the CQ-MOHHO algorithm. To further assess CQ-MOHHO's performance, Zitzler-Deb-Thiele (ZDT) test functions, commonly used in multi-objective optimization, were employed. Comparisons with Multi-Objective Harris Hawks Optimization (MOHHO), Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Multi-Objective Grey Wolf Optimizer (MOGWO) using Generational Distance (GD), Hypervolume (HV), and Inverted Generational Distance (IGD) metrics showed that CQ-MOHHO achieved superior global search ability, faster convergence, and higher solution quality. The CQ-MOHHO algorithm efficiently achieved a balance between multiple objectives, providing decision-makers with satisfactory solutions and a valuable reference for researching and applying emergency site selection problems.
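Two building blocks mentioned in the abstract above, defuzzifying a triangular fuzzy number and extracting a Pareto front, can be sketched in a few lines. The centroid rule is one common defuzzification choice and is an assumption here, since the abstract does not say which rule the study used; all objectives are treated as costs to be minimized.

```python
def centroid_defuzzify(tfn):
    """Crisp value of a triangular fuzzy number (a, b, c) by the centroid rule."""
    a, b, c = tfn
    return (a + b + c) / 3.0


def dominates(u, v):
    """True if u is no worse than v in every objective and better in one."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def pareto_front(points):
    """Keep the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (urgency cost, economic cost) pairs for four candidate sites.
sites = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(sites))  # (3, 3) is dominated by (2, 2)
```

A multi-objective solver returns a set like this instead of a single optimum, leaving the final trade-off to the decision-maker.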
In this paper, a feature selection method for determining input parameters in antenna modeling is proposed. In antenna modeling, the input features of the artificial neural network (ANN) are geometric parameters. The selection criteria are the correlation and sensitivity between each geometric parameter and the electromagnetic (EM) response. The maximal information coefficient (MIC), an exploratory data mining tool, is introduced to evaluate both linear and nonlinear correlations. The EM response range is used to evaluate the sensitivity: a wide response range over varying values of a parameter implies that the parameter is highly sensitive, and a narrow response range suggests that the parameter is insensitive. Only parameters that are both highly correlated and sensitive are selected as inputs of the ANN, so the sampling space of the model is greatly reduced. The modeling of a wideband circularly polarized antenna is studied as an example to verify the effectiveness of the proposed method. The number of input parameters decreases from 8 to 4. The testing errors of |S_(11)| and axial ratio are reduced by 8.74% and 8.95%, respectively, compared with the ANN with no feature selection.
Landslide susceptibility prediction (LSP) is significantly affected by the uncertainty in selecting landslide-related conditioning factors. However, most of the literature only performs comparative studies on a particular conditioning factor selection method rather than systematically studying this uncertainty issue. To this end, this study aims to systematically explore how various commonly used conditioning factor selection methods influence LSP, and on this basis to propose a universally applicable principle for the optimal selection of conditioning factors. An'yuan County in southern China is taken as an example, considering 431 landslides and 29 types of conditioning factors. Five commonly used factor selection methods, namely correlation analysis (CA), linear regression (LR), principal component analysis (PCA), rough set (RS), and artificial neural network (ANN), are applied to select the optimal factor combinations from the original 29 conditioning factors. The factor selection results are then used as inputs of four types of common machine learning models to construct 20 types of combined models, such as CA-multilayer perceptron and CA-random forest. Additionally, multifactor-based multilayer perceptron and random forest models that select conditioning factors based on the proposed principle of "accurate data, rich types, clear significance, feasible operation and avoiding duplication" are constructed for comparison. Finally, the LSP uncertainties are evaluated by the accuracy, susceptibility index distribution, etc. Results show that: (1) multifactor-based models generally have higher LSP performance and lower uncertainties than factor selection-based models; (2) the influence of different machine learning models on LSP accuracy is greater than that of different factor selection methods. In conclusion, the above commonly used conditioning factor selection methods are not ideal for improving LSP performance and may complicate the LSP process. In contrast, a satisfactory combination of conditioning factors can be constructed according to the proposed principle.
The increasing prevalence of diabetes has led to a growing population of end-stage kidney disease (ESKD) patients with diabetes. Currently, kidney transplantation is the best treatment option for ESKD patients; however, it is limited by the lack of donors. Therefore, dialysis has become the standard treatment for ESKD patients. However, the optimal dialysis method for diabetic ESKD patients remains controversial. ESKD patients with diabetes often present with complex conditions and numerous complications. Furthermore, these patients face a high risk of infection and technical failure, are more susceptible to malnutrition, have difficulty establishing vascular access, and experience more frequent blood sugar fluctuations than the general population. Therefore, this article reviews nine critical aspects: survival rate, glucose metabolism disorder, infectious complications, cardiovascular events, residual renal function, quality of life, economic benefits, malnutrition, and volume load. This study aims to assist clinicians in selecting individualized treatment methods by comparing the advantages and disadvantages of hemodialysis and peritoneal dialysis, thereby improving patients' quality of life and survival rates.
In classification problems, datasets often contain a large number of features, but not all of them are relevant for accurate classification. In fact, irrelevant features may even hinder classification accuracy. Feature selection aims to alleviate this issue by minimizing the number of features in the subset while simultaneously minimizing the classification error rate. Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter, but the results obtained depend on the value of the parameter. To eliminate this parameter's influence, the problem can be reformulated as a multi-objective optimization problem. The Whale Optimization Algorithm (WOA) is widely used in optimization problems because of its simplicity and easy implementation. In this paper, we propose a multi-strategy assisted multi-objective WOA (MSMOWOA) to address feature selection. To enhance the algorithm's search ability, we integrate multiple strategies such as Levy flight, Grey Wolf Optimizer, and adaptive mutation into it. Additionally, we use an external repository to store non-dominated solution sets, and grid technology is used to maintain diversity. Results on fourteen University of California Irvine (UCI) datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance. The source code can be accessed from the website: https://github.com/zc0315/MSMOWOA.
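The parameter dependence of single-objective aggregate functions noted in the abstract above can be made concrete with a small sketch. The weighted-sum form below is a commonly used aggregate and is assumed here for illustration; the two candidate subsets and their error rates are invented numbers, not results from the paper.

```python
def aggregate_fitness(error_rate, n_selected, n_total, alpha):
    """Weighted sum of classification error and relative subset size.

    Lower is better; alpha in [0, 1] sets the trade-off (assumed form).
    """
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


# Hypothetical candidates over 10 features: (error rate, features kept).
candidate_a = (0.10, 8)  # more accurate, larger subset
candidate_b = (0.15, 2)  # less accurate, much smaller subset

for alpha in (0.90, 0.99):
    fa = aggregate_fitness(candidate_a[0], candidate_a[1], 10, alpha)
    fb = aggregate_fitness(candidate_b[0], candidate_b[1], 10, alpha)
    print(alpha, "prefers", "A" if fa < fb else "B")
```

The preferred subset flips from B to A as alpha moves from 0.90 to 0.99. Neither candidate dominates the other, so a multi-objective formulation would retain both instead of letting the parameter decide.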
In vehicle edge computing (VEC), asynchronous federated learning (AFL) is used, where the edge receives a local model and updates the global model, effectively reducing the global aggregation latency. Because vehicles differ in their amounts of local data, computing capabilities, and locations, updating the global model with the same weight for every vehicle is inappropriate. These factors affect the local computation time and the upload time of the local model, and a vehicle may also be subject to Byzantine attacks, which degrade the vehicle's data. Using deep reinforcement learning (DRL), we can consider these factors comprehensively to eliminate poorly performing vehicles as far as possible and to exclude vehicles that have suffered Byzantine attacks before AFL. At the same time, when aggregating in AFL, we can focus on vehicles with better performance to improve the accuracy and safety of the system. In this paper, we propose a vehicle selection scheme based on DRL in VEC. In this scheme, vehicle mobility, time-varying channel conditions, time-varying computational resources, different data amounts, the transmission channel status of vehicles, and Byzantine attacks are taken into account. Simulation results show that the proposed scheme effectively improves the safety and accuracy of the global model.
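The asynchronous update described in the abstract above, where the edge folds in each local model as it arrives, is often written as a convex mix of the global and local parameters. The sketch below assumes that form; the mixing weight `beta` is a hypothetical per-vehicle value, and how the paper derives its weights from data volume, delay, or DRL output is not reproduced.

```python
def async_update(global_model, local_model, beta):
    """Blend one arriving local model into the global model.

    global <- (1 - beta) * global + beta * local, with beta in [0, 1].
    A beta of 0 ignores the vehicle, which is one simple way to model
    excluding a vehicle that was not selected.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [(1.0 - beta) * g + beta * l for g, l in zip(global_model, local_model)]


# Hypothetical example: two parameters, a trusted vehicle and a rejected one.
model = [0.0, 1.0]
model = async_update(model, [1.0, 1.0], beta=0.5)   # accepted with weight 0.5
model = async_update(model, [9.0, -9.0], beta=0.0)  # excluded vehicle, no effect
print(model)
```

Using a vehicle-specific `beta` instead of one shared constant is the point the abstract makes about not updating the global model with the same weight for every vehicle.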
The selection of important factors in machine learning-based susceptibility assessments is crucial to obtaining reliable susceptibility results. In this study, metaheuristic optimization and feature selection techniques were applied to identify the most important input parameters for mapping debris flow susceptibility in the southern mountain area of Chengde City in Hebei Province, China, by using machine learning algorithms. In total, 133 historical debris flow records and 16 related factors were selected. The support vector machine (SVM) was first used as the base classifier, and then a hybrid model was introduced by a two-step process. First, the particle swarm optimization (PSO) algorithm was employed to select the SVM model hyperparameters. Second, two feature selection algorithms, namely principal component analysis (PCA) and PSO, were integrated into the PSO-based SVM model, which generated the PCA-PSO-SVM and FS-PSO-SVM models, respectively. Three statistical metrics (accuracy, recall, and specificity) and the area under the receiver operating characteristic curve (AUC) were employed to evaluate and validate the performance of the models. The results indicated that the feature selection-based models exhibited the best performance, followed by the PSO-based SVM and SVM models. Moreover, the performance of the FS-PSO-SVM model was better than that of the PCA-PSO-SVM model, showing the highest AUC, accuracy, recall, and specificity values in both the training and testing processes. It was found that the selection of optimal features is crucial to improving the reliability of debris flow susceptibility assessment results. Moreover, the PSO algorithm was found to be not only an effective tool for hyperparameter optimization, but also a useful feature selection algorithm for improving the prediction accuracy of debris flow susceptibility by machine learning algorithms. The high and very high debris flow susceptibility zones cover approximately 38.01% of the study area, where debris flow may occur under intensive human activities and heavy rainfall events.
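The three statistical metrics named in the abstract above follow directly from the binary confusion matrix. A minimal sketch, assuming labels are coded 1 for debris flow and 0 for non-debris flow (the coding and function name are illustrative):

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, recall, specificity) for 0/1 labels.

    recall      = TP / (TP + FN)   share of true events that were caught
    specificity = TN / (TN + FP)   share of non-events correctly rejected
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, recall, specificity


# Hypothetical labels: TP=2, FN=1, TN=2, FP=1.
print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Reporting recall and specificity alongside accuracy matters for susceptibility mapping because missed events and false alarms carry different costs.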
Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, the least absolute shrinkage and selection operator (LASSO), and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than variable selection based on empirical knowledge alone and in most cases outperforms the baseline method. The results showed that, on the three datasets, the hybrid variable selection strategy combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478, and 1.647, respectively, which were significantly lower than the values of 1.959, 3.718, and 2.181 obtained from multiple linear regression using only 12 empirical variables. The LASSO-PLSR model with empirical support and 20 selected variables exhibited significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962, and 1.647 to 1.483, 3.086, and 1.567, respectively. Such results demonstrate that using empirical knowledge to support data-driven variable selection can be a viable approach to improving the accuracy and reliability of LIBS quantification.
In this study, we address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization (GWO) with Harris Hawks Optimization (HHO) for feature selection. The motivation for using GWO and HHO stems from their bio-inspired nature and their demonstrated success in optimization problems. We aim to leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification. We selected leave-one-out cross-validation (LOOCV) to evaluate the performance of two widely used classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), on high-dimensional cancer microarray data. The proposed method is extensively tested on six publicly available cancer microarray datasets, and a comprehensive comparison with recently published methods is conducted. Our hybrid algorithm demonstrates its effectiveness in improving classification performance, surpassing alternative approaches in terms of precision. The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification, thereby advancing the development of more efficient treatment strategies. The proposed hybrid method offers a promising solution to the gene selection problem in microarray-based cancer classification. It improves the accuracy and efficiency of cancer diagnosis and treatment, and its superior performance compared to other methods highlights its potential applicability in real-world cancer classification tasks. By harnessing the complementary search mechanisms of GWO and HHO, we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
This review updates the present status of the field of molecular markers and marker-assisted selection (MAS), using the example of drought tolerance in barley. The accuracy of selected quantitative trait loci (QTLs), candidate genes and suggested markers was assessed in the barley genome cv. Morex. Six common strategies are described for molecular marker development, candidate gene identification and verification, and their possible applications in MAS to improve the grain yield and yield components in barley under drought stress. These strategies are based on the following five principles: (1) molecular markers are designated as genomic 'tags', and their 'prediction' is strongly dependent on their distance from a candidate gene on genetic or physical maps; (2) plants react differently under favourable and stressful conditions or depending on their stage of development; (3) each candidate gene must be verified by confirming its expression in the relevant conditions, e.g., drought; (4) the molecular marker identified must be validated for MAS for tolerance to drought stress and improved grain yield; and (5) the small number of molecular markers realized for MAS in breeding, from among the many studies targeting candidate genes, can be explained by the complex nature of drought stress, and multiple stress-responsive genes in each barley genotype that are expressed differentially depending on many other factors.
Birds, a fascinating and diverse group occupying various habitats worldwide, exhibit a wide range of life-history traits, reproductive methods, and migratory behaviors, all of which influence their immune systems. The association between major histocompatibility complex (MHC) genes and certain ecological factors in response to pathogen selection has been extensively studied; however, the role of the co-working molecule T cell receptor (TCR) remains poorly understood. This study aimed to analyze the copy numbers of TCR-V genes, the selection pressure (ω value) on MHC genes using available genomic data, and their potential ecological correlates across 93 species from 13 orders. The study was conducted using the publicly available genome data of birds. Our findings suggested that phylogeny influences the variability in TCR-V gene copy numbers and MHC selection pressure. The phylogenetic generalized least squares regression model revealed that TCR-Vαδ copy number and MHC-I selection pressure were positively associated with body mass. Clutch size was correlated with MHC selection pressure, and migration was correlated with TCR-Vβ copy number. Further analyses revealed that the TCR-Vβ copy number was positively correlated with MHC-IIB selection pressure, while the TCR-Vγ copy number was negatively correlated with MHC-I peptide-binding region selection pressure. Our findings suggest that TCR-V diversity is significant in adaptive evolution and is related to species' life-history strategies and immunological defenses, and they provide valuable insights into the mechanisms underlying TCR-V gene duplication and MHC selection in avian species.
Funding: The Deanship of Scientific Research at King Khalid University funded this work through a Large Group Research Project under grant number RGP2/421/45; supported via funding from Prince Sattam bin Abdulaziz University, project number (PSAU/2024/R/1446); supported by the Researchers Supporting Project Number (UM-DSR-IG-2023-07), Almaarefa University, Riyadh, Saudi Arabia; supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2021R1F1A1055408).
Abstract: Machine learning (ML) is increasingly applied for medical image processing with appropriate learning paradigms. These applications include analyzing images of various organs, such as the brain, lung, eye, etc., to identify specific flaws/diseases for diagnosis. The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification. Most of the extracted image features are irrelevant and lead to an increase in computation time. Therefore, this article uses an analytical learning paradigm to design a Congruent Feature Selection Method to select the most relevant image features. This process trains the learning paradigm using similarity and correlation-based features over different textural intensities and pixel distributions. The similarity between the pixels over the various distribution patterns with high indexes is recommended for disease diagnosis. Later, the correlation based on intensity and distribution is analyzed to improve the feature selection congruency. Therefore, the more congruent pixels are sorted in the descending order of the selection, which identifies better regions than the distribution. Now, the learning paradigm is trained using intensity and region-based similarity to maximize the chances of selection. Therefore, the probability of feature selection, regardless of the textures and medical image patterns, is improved. This process enhances the performance of ML applications for different medical image processing. The proposed method improves the accuracy, precision, and training rate by 13.19%, 10.69%, and 11.06%, respectively, compared to other models for the selected dataset. The mean error and selection time are also reduced by 12.56% and 13.56%, respectively, compared to the same models and dataset.
Abstract: This article constructs statistical selection procedures for exponential populations that may differ only in their threshold parameters. The scale parameters of the populations are assumed common and known. The independent samples drawn from the populations are taken to be of the same size. The best population is defined as the one associated with the largest threshold parameter. In case more than one population shares the largest threshold, one of these is tagged at random and denoted the best. Two procedures are developed for choosing a subset of the populations having the property that the chosen subset contains the best population with a prescribed probability. One procedure is based on the sample minimum values drawn from the populations, and another is based on the sample means from the populations. An "Indifference Zone" (IZ) selection procedure is also developed based on the sample minimum values. The IZ procedure asserts that the population with the largest test statistic (e.g., the sample minimum) is the best population. With this approach, the sample size is chosen so as to guarantee that the probability of a correct selection is no less than a prescribed probability in the parameter region where the largest threshold is at least a prescribed amount larger than the remaining thresholds. Numerical examples are given, and the R code for all calculations is given in the Appendices.
Abstract: Non-orthogonal multiple access (NOMA) is a promising technology for the next generation wireless communication networks. The benefits of this technology can be further enhanced through deployment in conjunction with multiple-input multiple-output (MIMO) systems. Antenna selection plays a critical role in MIMO–NOMA systems as it has the potential to significantly reduce the cost and complexity associated with radio frequency chains. This paper considers antenna selection for downlink MIMO–NOMA networks with a multiple-antenna base station (BS) and multiple-antenna user equipments (UEs). An iterative antenna selection scheme is developed for a two-user system, and to determine the initial power required for this selection scheme, a power estimation method is also proposed. The proposed algorithm is then extended to a general multiuser NOMA system. Numerical results demonstrate that the proposed antenna selection algorithm achieves near-optimal performance with much lower computational complexity in both two-user and multiuser scenarios.
Abstract: This study provides a systematic investigation into the influence of feature selection methods on cryptocurrency price forecasting models employing technical indicators. In this work, over 130 technical indicators (covering momentum, volatility, volume, and trend) are subjected to three distinct feature selection approaches: mutual information (MI), recursive feature elimination (RFE), and random forest importance (RFI). By extracting an optimal set of 20 predictors, the proposed framework aims to mitigate redundancy and overfitting while enhancing interpretability. These feature subsets are integrated into support vector regression (SVR), Huber regressors, and k-nearest neighbors (KNN) models to forecast the prices of three leading cryptocurrencies, Bitcoin (BTC/USDT), Ethereum (ETH/USDT), and Binance Coin (BNB/USDT), across horizons ranging from 1 to 20 days. Model evaluation employs the coefficient of determination (R²) and the root mean squared logarithmic error (RMSLE), alongside a walk-forward validation scheme to approximate real-world trading contexts. Empirical results indicate that incorporating momentum and volatility measures substantially improves predictive accuracy, with particularly pronounced effects observed at longer forecast windows. Moreover, indicators related to volume and trend provide incremental benefits in select market conditions. Notably, an 80%–85% reduction in the original feature set frequently maintains or enhances model performance relative to the complete indicator set. These findings highlight the critical role of targeted feature selection in addressing high-dimensional financial data challenges while preserving model robustness. This research advances the field of cryptocurrency forecasting by offering a rigorous comparison of feature selection methods and their effects on multiple digital assets and prediction horizons. The outcomes highlight the importance of dimension-reduction strategies in developing more efficient and resilient forecasting algorithms. Future efforts should incorporate high-frequency data and explore alternative selection techniques to further refine predictive accuracy in this highly volatile domain.
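The two evaluation metrics and the walk-forward scheme named in the abstract above can be sketched in a few lines. This is only an illustration: the abstract does not say whether the training window expands or rolls, so the expanding-window splitter below is an assumption, and the function names `r2_score`, `rmsle`, and `walk_forward_splits` are hypothetical helpers rather than the authors' code.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x) on both series."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) pairs in time order.

    Assumption: an expanding training window; each test block of length
    `horizon` is appended to the training data for the next split.
    """
    start = initial_train
    while start + horizon <= n_samples:
        yield range(0, start), range(start, start + horizon)
        start += horizon


splits = list(walk_forward_splits(n_samples=10, initial_train=4, horizon=3))
```

With 10 samples, an initial window of 4 and a horizon of 3, the splitter produces two splits, and no test index ever precedes a training index, which is the property that makes walk-forward validation suitable for time series.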
Funding: Supported by the Anhui Provincial Department of Education University Research Project (2024AH051375); Research Project of Chizhou University (CZ2022ZRZ06); Anhui Province Natural Science Research Project of Colleges and Universities (2024AH051368); Excellent Scientific Research and Innovation Team of Anhui Colleges (2022AH010098).
Abstract: Feature selection methods rooted in rough sets confront two notable limitations: their high computational complexity and sensitivity to noise, rendering them impractical for managing large-scale and noisy datasets. The primary issue stems from these methods' undue reliance on all samples. To overcome these challenges, we introduce the concept of cross-similarity grounded in a robust fuzzy relation and design a rapid and robust feature selection algorithm. Firstly, we construct a robust fuzzy relation by introducing a truncation parameter. Then, based on this fuzzy relation, we propose the concept of cross-similarity, which emphasizes the sample-to-sample similarity relations that uniquely determine feature importance, rather than considering all such relations equally. After studying the manifestations and properties of cross-similarity across different fuzzy granularities, we propose a forward greedy feature selection algorithm that leverages cross-similarity as the foundation for information measurement. This algorithm significantly reduces the time complexity from O(m²n²) to O(mn²). Experimental findings reveal that the average runtime of five state-of-the-art comparison algorithms is roughly 3.7 times longer than our algorithm, while our algorithm achieves an average accuracy that surpasses those of the five comparison algorithms by approximately 3.52%. This underscores the effectiveness of our approach. This paper paves the way for applying feature selection algorithms grounded in fuzzy rough sets to large-scale gene datasets.
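The forward greedy search mentioned in the abstract above follows a generic pattern that can be sketched independently of the scoring measure. The abstract does not define the cross-similarity measure, so the sketch below takes an arbitrary `score` callable as a placeholder; `forward_greedy_select`, `min_gain`, and `toy_score` are hypothetical names, and the toy score is not the paper's measure.

```python
def forward_greedy_select(n_features, score, min_gain=1e-9):
    """Generic forward greedy selection.

    `score(subset)` returns the quality of a list of feature indices
    (higher is better). At each step the feature with the largest gain
    is added; the search stops when no feature improves the score by
    more than `min_gain`.
    """
    selected = []
    best = score(selected)
    remaining = set(range(n_features))
    while remaining:
        gains = {f: score(selected + [f]) - best for f in remaining}
        # Iterate in sorted order so ties resolve to the lowest index.
        f_best = max(sorted(gains), key=gains.get)
        if gains[f_best] <= min_gain:
            break
        selected.append(f_best)
        best += gains[f_best]
        remaining.remove(f_best)
    return selected


def toy_score(subset):
    """Placeholder measure: only features 0 and 2 carry information."""
    return len(set(subset) & {0, 2})


chosen = forward_greedy_select(4, toy_score)
```

On the toy measure the search picks features 0 and 2 and then stops, because neither remaining feature adds any gain.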
Abstract: Heart disease prediction is a critical issue in healthcare, where accurate early diagnosis can save lives and reduce healthcare costs. The problem is inherently complex due to the high dimensionality of medical data, irrelevant or redundant features, and the variability in risk factors such as age, lifestyle, and medical history. These challenges often lead to inefficient and less accurate models. Traditional prediction methodologies face limitations in effectively handling large feature sets and optimizing classification performance, which can result in overfitting, poor generalization, and high computational cost. This work proposes a novel classification model for heart disease prediction that addresses these challenges by integrating feature selection through a Genetic Algorithm (GA) with an ensemble deep learning approach optimized using the Tunicate Swarm Algorithm (TSA). GA selects the most relevant features, reducing dimensionality and improving model efficiency. The selected features are then used to train an ensemble of deep learning models, where the TSA optimizes the weight of each model in the ensemble to enhance prediction accuracy. This hybrid approach addresses key challenges in the field, such as high dimensionality, redundant features, and classification performance, by introducing an efficient feature selection mechanism and optimizing the weighting of deep learning models in the ensemble. These enhancements result in a model that achieves superior accuracy, generalization, and efficiency compared to traditional methods. The proposed model demonstrated notable advancements in both prediction accuracy and computational efficiency over traditional models. Specifically, it achieved an accuracy of 97.5%, a sensitivity of 97.2%, and a specificity of 97.8%. Additionally, with a 60-40 data split and 5-fold cross-validation, the model showed a significant reduction in training time (90 s), memory consumption (950 MB), and CPU usage (80%), highlighting its effectiveness in processing large, complex medical datasets for heart disease prediction.
Abstract: In the task of Facial Expression Recognition (FER), data uncertainty has been a critical factor affecting performance, typically arising from the ambiguity of facial expressions, low-quality images, and the subjectivity of annotators. Tracking the training history reveals that misclassified samples often exhibit high confidence and excessive uncertainty in the early stages of training. To address this issue, we propose an uncertainty-based robust sample selection strategy, which combines confidence error with RandAugment to improve image diversity, effectively reducing overfitting caused by uncertain samples during deep learning model training. To validate the effectiveness of the proposed method, extensive experiments were conducted on FER public benchmarks. The accuracies obtained were 89.08% on RAF-DB, 63.12% on AffectNet, and 88.73% on FERPlus.
Abstract: In recent years, particle swarm optimization (PSO) has received widespread attention in feature selection due to its simplicity and potential for global search. However, in traditional PSO, particles primarily update based on two extreme values: personal best and global best, which limits the diversity of information. Ideally, particles should learn from multiple advantageous particles to enhance interactivity and optimization efficiency. Accordingly, this paper proposes a PSO that simulates the evolutionary dynamics of species survival in mountain peak ecology (PEPSO) for feature selection. Based on the pyramid topology, the algorithm simulates the features of mountain peak ecology in nature and the competitive-cooperative strategies among species. According to the principles of the algorithm, the population is first adaptively divided into many subgroups based on the fitness level of particles. Then, particles within each subgroup are divided into three different types based on their evolutionary levels, employing different adaptive inertia weight rules and dynamic learning mechanisms to define distinct learning modes. Consequently, all particles play their respective roles in promoting the global optimization performance of the algorithm, similar to different species in the ecological pattern of mountain peaks. Experimental validation of the PEPSO performance was conducted on 18 public datasets. The experimental results demonstrate that PEPSO outperforms other PSO variant-based feature selection methods and mainstream feature selection methods based on intelligent optimization algorithms in terms of overall performance in global search capability, classification accuracy, and reduction of feature space dimensions. The Wilcoxon signed-rank test also confirms the excellent performance of PEPSO.
Funding: Supported by the National Natural Science Foundation of China (32160782 and 32060737).
Abstract: The principle of genomic selection (GS) entails estimating breeding values (BVs) by summing all the SNP polygenic effects. The visible/near-infrared spectroscopy (VIS/NIRS) wavelength and abundance values can directly reflect the concentrations of chemical substances, and the measurement of meat traits by VIS/NIRS is similar to the processing of genomic selection data by summing all 'polygenic effects' associated with spectral feature peaks. Therefore, it is meaningful to investigate the incorporation of VIS/NIRS information into GS models to establish an efficient and low-cost breeding model. In this study, we measured 6 meat quality traits in 359 Duroc×Landrace×Yorkshire pigs from Guangxi Zhuang Autonomous Region, China, and genotyped them with high-density SNP chips. According to the completeness of the information for the target population, we proposed 4 breeding strategies applied to different scenarios: Ⅰ, only spectral and genotypic data exist for the target population; Ⅱ, only spectral data exist for the target population; Ⅲ, only spectral and genotypic data but with different prediction processes exist for the target population; and Ⅳ, only spectral and phenotypic data exist for the target population. The 4 scenarios were used to evaluate the genomic estimated breeding value (GEBV) accuracy by increasing the VIS/NIR spectral information. In the results of the 5-fold cross-validation, the genetic algorithm showed remarkable potential for preselection of feature wavelengths. The breeding efficiency of Strategies Ⅱ, Ⅲ, and Ⅳ was superior to that of traditional GS for most traits, and the GEBV prediction accuracy was improved by 32.2, 40.8 and 15.5%, respectively, on average. Among them, the prediction accuracy of Strategy Ⅱ for fat (%) even improved by 50.7% compared to traditional GS. The GEBV prediction accuracy of Strategy Ⅰ was nearly identical to that of traditional GS, and the fluctuation range was less than 7%. Moreover, the breeding cost of the 4 strategies was lower than that of traditional GS methods, with Strategy Ⅳ being the lowest as it did not require genotyping. Our findings demonstrate that GS methods based on VIS/NIRS data have significant predictive potential and are worthy of further research to provide a valuable reference for the development of effective and affordable breeding strategies.
Abstract: Addressing the complex issue of emergency resource distribution center site selection in uncertain environments, this study comprehensively considers factors such as uncertainty parameters and the urgency of demand at disaster-affected sites. Firstly, urgency cost, economic cost, and transportation distance cost were identified as key objectives. The study applied fuzzy theory integration to construct a triangular fuzzy multi-objective site selection decision model. Next, defuzzification transformed the fuzzy decision model into a precise one. Subsequently, an improved Chaotic Quantum Multi-Objective Harris Hawks Optimization (CQ-MOHHO) algorithm was proposed to solve the model. Case studies showed that the CQ-MOHHO algorithm rapidly produces high-quality Pareto front solutions and identifies optimal site selection schemes for emergency resource distribution centers. This outcome verified the feasibility and efficacy of the site selection decision model and the CQ-MOHHO algorithm. To further assess CQ-MOHHO's performance, Zitzler-Deb-Thiele (ZDT) test functions, commonly used in multi-objective optimization, were employed. Comparisons with Multi-Objective Harris Hawks Optimization (MOHHO), Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Multi-Objective Grey Wolf Optimizer (MOGWO) using Generational Distance (GD), Hypervolume (HV), and Inverted Generational Distance (IGD) metrics showed that CQ-MOHHO achieved superior global search ability, faster convergence, and higher solution quality. The CQ-MOHHO algorithm efficiently achieved a balance between multiple objectives, providing decision-makers with satisfactory solutions and a valuable reference for researching and applying emergency site selection problems.
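The step that turns a triangular fuzzy model into a crisp one can be illustrated with a single cost value. The abstract does not state which defuzzification rule is used, so the centroid rule below is an assumption chosen for illustration; `defuzzify_centroid` and the example numbers are hypothetical.

```python
def defuzzify_centroid(tfn):
    """Crisp value of a triangular fuzzy number (low, mode, high).

    Assumption: centroid defuzzification, which for a triangular
    membership function reduces to the mean of the three vertices.
    """
    low, mode, high = tfn
    if not low <= mode <= high:
        raise ValueError("expected low <= mode <= high")
    return (low + mode + high) / 3.0


# A transport cost estimated as "about 3, somewhere between 2 and 7".
crisp_cost = defuzzify_centroid((2.0, 3.0, 7.0))
```

Here the crisp cost is 4.0, pulled above the most likely value of 3 by the long upper tail. Other rules, such as a graded mean that weights the mode more heavily, would give a different crisp value.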
Funding: National Natural Science Foundation of China (62161048); Sichuan Science and Technology Program (2022NSFSC0547, 2022ZYD0109).
Abstract: In this paper, a feature selection method for determining input parameters in antenna modeling is proposed. In antenna modeling, the input features of the artificial neural network (ANN) are geometric parameters. The selection criteria are the correlation and sensitivity between the geometric parameter and the electromagnetic (EM) response. Maximal information coefficient (MIC), an exploratory data mining tool, is introduced to evaluate both linear and nonlinear correlations. The EM response range is utilized to evaluate the sensitivity. A wide response range corresponding to varying values of a parameter implies the parameter is highly sensitive, and a narrow response range suggests the parameter is insensitive. Only parameters that are both highly correlated and sensitive are selected as inputs of the ANN, and the sampling space of the model is greatly reduced. The modeling of a wideband and circularly polarized antenna is studied as an example to verify the effectiveness of the proposed method. The number of input parameters decreases from 8 to 4. The testing errors of |S_(11)| and axial ratio are reduced by 8.74% and 8.95%, respectively, compared with the ANN with no feature selection.
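The two-criterion rule described above (keep a parameter only if it is both correlated with the response and sensitive) can be sketched as follows. Note the substitution: the paper uses the maximal information coefficient, whereas this sketch uses the absolute Pearson correlation as a simpler stand-in that only captures linear dependence. The parameter names, thresholds, and sweep data are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def select_parameters(sweeps, min_corr, min_range):
    """Keep parameters that pass both the correlation and the range test.

    `sweeps` maps a parameter name to (parameter_values, responses).
    Sensitivity is measured as the spread of the response over the sweep.
    """
    chosen = []
    for name, (values, responses) in sweeps.items():
        corr = abs(pearson(values, responses))  # stand-in for MIC
        spread = max(responses) - min(responses)
        if corr >= min_corr and spread >= min_range:
            chosen.append(name)
    return chosen


# Hypothetical sweeps of two geometric parameters against one EM response.
sweeps = {
    "patch_length": ([1, 2, 3, 4], [10.0, 20.0, 30.0, 40.0]),
    "feed_gap": ([1, 2, 3, 4], [5.0, 5.1, 5.0, 5.1]),
}
inputs = select_parameters(sweeps, min_corr=0.8, min_range=1.0)
```

Only `patch_length` survives: `feed_gap` barely moves the response (spread 0.1) and is weakly correlated with it, so it would be dropped from the ANN inputs.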
Funding: Funded by the Natural Science Foundation of China (Grant Nos. 42377164 and 41972280); the Badong National Observation and Research Station of Geohazards (Grant No. BNORSG-202305).
Abstract: Landslide susceptibility prediction (LSP) is significantly affected by the uncertainty in selecting landslide-related conditioning factors. However, most of the literature only performs comparative studies on a particular conditioning factor selection method rather than systematically studying this uncertainty issue. To this end, this study aims to systematically explore how various commonly used conditioning factor selection methods influence LSP and, on this basis, to propose a universally applicable principle for the optimal selection of conditioning factors. An'yuan County in southern China is taken as an example, considering 431 landslides and 29 types of conditioning factors. Five commonly used factor selection methods, namely correlation analysis (CA), linear regression (LR), principal component analysis (PCA), rough set (RS) and artificial neural network (ANN), are applied to select the optimal factor combinations from the original 29 conditioning factors. The factor selection results are then used as inputs of four types of common machine learning models to construct 20 types of combined models, such as CA-multilayer perceptron and CA-random forest. Additionally, multifactor-based multilayer perceptron and random forest models that select conditioning factors based on the proposed principle of "accurate data, rich types, clear significance, feasible operation and avoiding duplication" are constructed for comparison. Finally, the LSP uncertainties are evaluated by the accuracy, susceptibility index distribution, etc. Results show that: (1) multifactor-based models have generally higher LSP performance and lower uncertainties than those of factor selection-based models; (2) the influence of different machine learning models on LSP accuracy is greater than that of different factor selection methods. In conclusion, the above commonly used conditioning factor selection methods are not ideal for improving LSP performance and may complicate the LSP processes. In contrast, a satisfactory combination of conditioning factors can be constructed according to the proposed principle.
Funding: Supported by the Science and Technology Department of Jilin Province, No. YDZJ202201ZYTS110 and No. 20200201352JC.
Abstract: The increasing prevalence of diabetes has led to a growing population of end-stage kidney disease (ESKD) patients with diabetes. Currently, kidney transplantation is the best treatment option for ESKD patients; however, it is limited by the lack of donors. Therefore, dialysis has become the standard treatment for ESKD patients. However, the optimal dialysis method for diabetic ESKD patients remains controversial. ESKD patients with diabetes often present with complex conditions and numerous complications. Furthermore, these patients face a high risk of infection and technical failure, are more susceptible to malnutrition, have difficulty establishing vascular access, and experience more frequent blood sugar fluctuations than the general population. Therefore, this article reviews nine critical aspects: survival rate, glucose metabolism disorder, infectious complications, cardiovascular events, residual renal function, quality of life, economic benefits, malnutrition, and volume load. This study aims to assist clinicians in selecting individualized treatment methods by comparing the advantages and disadvantages of hemodialysis and peritoneal dialysis, thereby improving patients' quality of life and survival rates.
Funding: Supported in part by the Natural Science Youth Foundation of Hebei Province under Grant F2019403207; in part by the PhD Research Startup Foundation of Hebei GEO University under Grant BQ2019055; in part by the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing under Grant KLIGIP-2021A06; in part by the Fundamental Research Funds for the Universities in Hebei Province under Grant QN202220; in part by the Science and Technology Research Project for Universities of Hebei under Grant ZD2020344; in part by the Guangxi Natural Science Fund General Project under Grant 2021GXNSFAA075029.
Abstract: In classification problems, datasets often contain a large number of features, but not all of them are relevant for accurate classification. In fact, irrelevant features may even hinder classification accuracy. Feature selection aims to alleviate this issue by minimizing the number of features in the subset while simultaneously minimizing the classification error rate. Single-objective optimization approaches employ an evaluation function designed as an aggregate function with a parameter, but the results obtained depend on the value of the parameter. To eliminate this parameter's influence, the problem can be reformulated as a multi-objective optimization problem. The Whale Optimization Algorithm (WOA) is widely used in optimization problems because of its simplicity and easy implementation. In this paper, we propose a multi-strategy assisted multi-objective WOA (MSMOWOA) to address feature selection. To enhance the algorithm's search ability, we integrate multiple strategies such as Levy flight, Grey Wolf Optimizer, and adaptive mutation into it. Additionally, we utilize an external repository to store non-dominated solution sets, and grid technology is used to maintain diversity. Results on fourteen University of California Irvine (UCI) datasets demonstrate that our proposed method effectively removes redundant features and improves classification performance. The source code can be accessed from the website: https://github.com/zc0315/MSMOWOA.
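The external repository described above stores only non-dominated solutions for the two objectives being minimized (number of features, classification error). A minimal sketch of that dominance test is given below; it is not taken from the linked repository, and the function names and candidate values are hypothetical.

```python
def dominates(a, b):
    """True if solution a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions):
    """Return the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions if other is not s)
    ]


# Each candidate is (number_of_selected_features, classification_error).
candidates = [(3, 0.10), (5, 0.08), (5, 0.12), (8, 0.08)]
front = non_dominated(candidates)
```

The front keeps `(3, 0.10)` and `(5, 0.08)`: the first uses fewer features, the second has lower error, and neither beats the other on both objectives. The other two candidates are each dominated by one of these. This is why the multi-objective formulation needs no weighting parameter: the trade-off is returned as a set rather than collapsed into one number.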
Funding: Supported in part by the National Natural Science Foundation of China (No. 61701197); in part by the National Key Research and Development Program of China (No. 2021YFA1000500(4)); in part by the 111 Project (No. B23008).
Abstract: In vehicle edge computing (VEC), asynchronous federated learning (AFL) is used, where the edge receives a local model and updates the global model, effectively reducing the global aggregation latency. Because vehicles differ in their amounts of local data, computing capabilities and locations, updating the global model with the same weight for every vehicle is inappropriate. These factors affect the local computation time and the upload time of the local model, and a vehicle may also be subject to Byzantine attacks, which degrade its data. Using deep reinforcement learning (DRL), however, we can consider these factors comprehensively to eliminate vehicles with poor performance as much as possible and exclude vehicles that have suffered Byzantine attacks before AFL. At the same time, when aggregating in AFL, we can focus on those vehicles with better performance to improve the accuracy and safety of the system. In this paper, we propose a vehicle selection scheme based on DRL in VEC. In this scheme, vehicle mobility, time-varying channel conditions, time-varying computational resources, differing data amounts, the transmission channel status of vehicles, and Byzantine attacks are taken into account. Simulation results show that the proposed scheme effectively improves the safety and accuracy of the global model.
Funding: Supported by the Second Tibetan Plateau Scientific Expedition and Research Program (Grant no. 2019QZKK0904); Natural Science Foundation of Hebei Province (Grant no. D2022403032); S&T Program of Hebei (Grant no. E2021403001).
Abstract: The selection of important factors in machine learning-based susceptibility assessments is crucial to obtain reliable susceptibility results. In this study, metaheuristic optimization and feature selection techniques were applied to identify the most important input parameters for mapping debris flow susceptibility in the southern mountain area of Chengde City in Hebei Province, China, by using machine learning algorithms. In total, 133 historical debris flow records and 16 related factors were selected. The support vector machine (SVM) was first used as the base classifier, and then a hybrid model was introduced by a two-step process. First, the particle swarm optimization (PSO) algorithm was employed to select the SVM model hyperparameters. Second, two feature selection algorithms, namely principal component analysis (PCA) and PSO, were integrated into the PSO-based SVM model, which generated the PCA-PSO-SVM and FS-PSO-SVM models, respectively. Three statistical metrics (accuracy, recall, and specificity) and the area under the receiver operating characteristic curve (AUC) were employed to evaluate and validate the performance of the models. The results indicated that the feature selection-based models exhibited the best performance, followed by the PSO-based SVM and SVM models. Moreover, the performance of the FS-PSO-SVM model was better than that of the PCA-PSO-SVM model, showing the highest AUC, accuracy, recall, and specificity values in both the training and testing processes. It was found that the selection of optimal features is crucial to improving the reliability of debris flow susceptibility assessment results. Moreover, the PSO algorithm was found to be not only an effective tool for hyperparameter optimization, but also a useful feature selection algorithm to improve prediction accuracies of debris flow susceptibility by using machine learning algorithms. The high and very high debris flow susceptibility zones cover approximately 38.01% of the study area, where debris flow may occur under intensive human activities and heavy rainfall events.
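The three statistical metrics named in the abstract above have standard definitions in terms of confusion-matrix counts, sketched here for binary labels (1 = debris flow, 0 = no debris flow). The function name and the sample labels are invented for illustration and are not data from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity) and specificity for 0/1 labels.

    Assumes both classes occur in y_true; otherwise recall or
    specificity would involve a division by zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

For these labels the counts are 3 true positives, 1 false negative, 4 true negatives and 2 false positives, giving an accuracy of 0.7, a recall of 0.75 and a specificity of about 0.667. Reporting recall and specificity alongside accuracy matters in susceptibility mapping because missed events and false alarms carry different costs.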
Funding: Financial support from the National Natural Science Foundation of China (No. 62205172), Huaneng Group Science and Technology Research Project (No. HNKJ22-H105), Tsinghua University Initiative Scientific Research Program, and the International Joint Mission on Climate Change and Carbon Neutrality.
Abstract: Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It performs significantly better than variable selection based on empirical knowledge alone and in most cases outperforms the baseline method. The results showed that on all three datasets the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than the 1.959, 3.718 and 2.181 obtained from multiple linear regression using only 12 empirical variables. The LASSO-PLSR model with empirical support and 20 selected variables exhibited a significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. Such results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
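The idea of combining data-driven ranking with empirical variables can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it uses only Pearson's correlation coefficient (one of the four selectors the abstract lists), omits the PLSR step, and the channel names, intensities, and the function `hybrid_select` are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def hybrid_select(columns, target, empirical, top_k):
    """Rank variables by |r| with the target, keep the top_k, then append
    any empirical variables that the ranking did not already pick."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    data_driven = ranked[:top_k]
    selected = data_driven + [v for v in empirical if v not in data_driven]
    return selected, scores


# Hypothetical spectral channel intensities for five samples.
channels = {
    "ch_a": [2.0, 4.0, 6.0, 8.0, 10.0],   # tracks the target exactly
    "ch_b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "ch_c": [1.0, 1.0, 1.0, 1.0, 1.0],    # constant, carries no correlation
    "ch_d": [3.0, 1.0, 4.0, 1.0, 5.0],
}
ash_content = [1.0, 2.0, 3.0, 4.0, 5.0]

# "ch_c" plays the role of a variable kept for empirical reasons.
selected, scores = hybrid_select(channels, ash_content, empirical=["ch_c"], top_k=1)
```

Here `selected` holds the top-ranked channel followed by the empirically motivated one, which the correlation filter alone would have discarded. The selected variables would then be passed to a regression model such as PLSR.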
Funding: The Deputyship for Research and Innovation, “Ministry of Education” in Saudi Arabia, funded this research (IFKSUOR3-014-3).
Abstract: In this study, our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization (GWO) with Harris Hawks Optimization (HHO) for feature selection. The motivation for utilizing GWO and HHO stems from their bio-inspired nature and their demonstrated success in optimization problems. We aim to leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification. We selected leave-one-out cross-validation (LOOCV) to evaluate the performance of two widely used classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), on high-dimensional cancer microarray data. The proposed method is extensively tested on six publicly available cancer microarray datasets, and a comprehensive comparison with recently published methods is conducted. Our hybrid algorithm demonstrates its effectiveness in improving classification performance, surpassing alternative approaches in terms of precision. The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification, thereby advancing the development of more efficient treatment strategies. The proposed hybrid method offers a promising solution to the gene selection problem in microarray-based cancer classification. It improves the accuracy and efficiency of cancer diagnosis and treatment, and its superior performance compared to other methods highlights its potential applicability in real-world cancer classification tasks. By harnessing the complementary search mechanisms of GWO and HHO, we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
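The evaluation protocol named in this abstract, LOOCV with a KNN classifier, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the two-feature samples and their labels are hypothetical toy data, and the GWO-HHO gene selection step that would precede evaluation is not shown.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict a label by majority vote among the k nearest training points
    (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(xs, ys, k=1):
    """Leave-one-out cross-validation: hold out each sample once, train on
    the rest, and report the fraction of held-out samples predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical expression values for two selected genes in six samples.
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
           [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

accuracy_k1 = loocv_accuracy(samples, labels, k=1)
accuracy_k3 = loocv_accuracy(samples, labels, k=3)
```

LOOCV suits microarray data because such datasets typically have few samples, so holding out one at a time uses nearly all of the data for training in every fold.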
Funding: Supported by Bolashak International Fellowships, Center for International Programs, Ministry of Education and Science, Kazakhstan; AP14869777, supported by the Ministry of Education and Science, Kazakhstan; and Research Projects BR10764991 and BR10765000, supported by the Ministry of Agriculture, Kazakhstan.
Abstract: This review updates the present status of the field of molecular markers and marker-assisted selection (MAS), using the example of drought tolerance in barley. The accuracy of selected quantitative trait loci (QTLs), candidate genes and suggested markers was assessed in the barley genome cv. Morex. Six common strategies are described for molecular marker development, candidate gene identification and verification, and their possible applications in MAS to improve the grain yield and yield components in barley under drought stress. These strategies are based on the following five principles: (1) molecular markers are designated as genomic ‘tags’, and their ‘prediction’ is strongly dependent on their distance from a candidate gene on genetic or physical maps; (2) plants react differently under favourable and stressful conditions or depending on their stage of development; (3) each candidate gene must be verified by confirming its expression in the relevant conditions, e.g., drought; (4) the molecular marker identified must be validated for MAS for tolerance to drought stress and improved grain yield; and (5) the small number of molecular markers realized for MAS in breeding, from among the many studies targeting candidate genes, can be explained by the complex nature of drought stress and by the multiple stress-responsive genes in each barley genotype that are expressed differentially depending on many other factors.
Funding: Supported by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (No. 2022C04014) and the Zhejiang Science and Technology Major Program on Agricultural New Variety Breeding (No. 2021C02068-10).
Abstract: Birds, a fascinating and diverse group occupying various habitats worldwide, exhibit a wide range of life-history traits, reproductive methods, and migratory behaviors, all of which influence their immune systems. The association between major histocompatibility complex (MHC) genes and certain ecological factors in response to pathogen selection has been extensively studied; however, the role of the co-working molecule T cell receptor (TCR) remains poorly understood. This study aimed to analyze the copy numbers of TCR-V genes, the selection pressure (ω value) on MHC genes using available genomic data, and their potential ecological correlates across 93 species from 13 orders. The study was conducted using the publicly available genome data of birds. Our findings suggested that phylogeny influences the variability in TCR-V gene copy numbers and MHC selection pressure. The phylogenetic generalized least squares regression model revealed that TCR-Vαδ copy number and MHC-I selection pressure were positively associated with body mass. Clutch size was correlated with MHC selection pressure, and migration was correlated with TCR-Vβ copy number. Further analyses revealed that the TCR-Vβ copy number was positively correlated with MHC-IIB selection pressure, while the TCR-Vγ copy number was negatively correlated with MHC-I peptide-binding region selection pressure. Our findings suggest that TCR-V diversity is significant in adaptive evolution and is related to species’ life-history strategies and immunological defenses, and they provide valuable insights into the mechanisms underlying TCR-V gene duplication and MHC selection in avian species.