To study the atmospheric aging of acrylic coatings, a two-year aging exposure experiment was conducted in 13 representative climatic environments in China. An atmospheric aging evaluation model of acrylic coatings was developed based on aging data including 11 environmental factors from 567 cities. A hybrid method of random forest and Spearman correlation analysis was used to reduce the redundancy and multicollinearity of the data set by dimensionality reduction. A semi-supervised co-training regression model was developed with the environmental factors as input and the low-frequency impedance modulus of the electrochemical impedance spectra of acrylic coatings in 3.5 wt% NaCl solution as output. The model improves accuracy compared with a supervised learning model (a support vector machine model). The model provides a new method for the rapid evaluation of the aging performance of acrylic coatings, and may also serve as a reference for evaluating the aging performance of other organic coatings.
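The abstract does not specify how the random forest and Spearman steps are combined. The following is a minimal Python sketch of one common pattern, assuming random-forest importance scores have already been computed (the hypothetical `importances` dict) and that, of any two features whose absolute Spearman correlation exceeds a threshold (0.8 here, an arbitrary choice), only the more important one is kept. Feature names and values are toy data, not the study's.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # A constant feature has no defined correlation; treat it as uncorrelated.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def prune_redundant(features, importances, threshold=0.8):
    """Visit features by descending importance; keep one only if its |rho|
    with every already-kept feature is at most the threshold."""
    kept = []
    for name in sorted(features, key=lambda f: importances[f], reverse=True):
        if all(abs(spearman(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

features = {
    "temperature": [10, 15, 20, 25, 30],
    "humidity": [80, 70, 65, 60, 50],   # perfectly anti-monotone with temperature
    "rainfall": [3, 1, 4, 1, 5],
}
importances = {"temperature": 0.5, "humidity": 0.3, "rainfall": 0.2}
print(prune_redundant(features, importances))  # ['temperature', 'rainfall']
```

Ranking by importance first means that when two features carry the same monotone information, the one the forest found more useful survives.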
In engineering practice, it is often necessary to determine functional relationships between dependent and independent variables. These relationships can be highly nonlinear, and classical regression approaches cannot always provide sufficiently reliable solutions. Machine Learning (ML) techniques, which offer advanced regression tools for complicated engineering problems, have been developed and widely explored. This study investigates selected ML techniques to evaluate their suitability for describing the hot deformation behavior of metallic materials. The ML-based regression methods Artificial Neural Networks (ANNs), Support Vector Machine (SVM), Decision Tree Regression (DTR), and Gaussian Process Regression (GPR) are applied to mathematically describe experimentally acquired hot flow stress curve datasets for a medium-carbon steel. Although GPR has not previously been used for such a regression task, the results showed that its performance was the most favorable: neither the ANN method nor the other studied ML techniques described the flow stress curves as precisely.
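To make the GPR approach concrete, here is a minimal sketch of the Gaussian process posterior in pure Python. It assumes a zero-mean prior, a squared-exponential (RBF) kernel with fixed, hand-picked hyperparameters, and a single input variable; a real flow-stress model would have several inputs (e.g. strain, strain rate, temperature) and would fit the hyperparameters by maximizing the marginal likelihood. The training points are illustrative values, not the study's data.

```python
import math

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = s^2 * exp(-(a - b)^2 / (2 l^2))."""
    return variance * math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gpr_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at each test input."""
    K = [[rbf(a, b, length_scale, variance) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    alpha = solve(K, y_train)                      # (K + noise*I)^-1 y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf(xs, a, length_scale, variance) for a in x_train]
        means.append(sum(k * w for k, w in zip(k_star, alpha)))
        v = solve(K, k_star)                       # (K + noise*I)^-1 k_*
        variances.append(rbf(xs, xs, length_scale, variance)
                         - sum(k * vi for k, vi in zip(k_star, v)))
    return means, variances

# Illustrative (input, output) pairs, not measured flow-stress data.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 0.8, 0.9, 0.1]
means, variances = gpr_predict(x_train, y_train, [0.0, 1.0, 2.0, 3.0, 10.0])
print(means, variances)
```

Near the training inputs the mean reproduces the data and the variance collapses; far from them (x = 10) the prediction reverts to the prior mean 0 and prior variance 1, which is the built-in uncertainty estimate that distinguishes GPR from the other regressors.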
Determination of the shear bond strength (SBS) at the interlayer of double-layer asphalt concrete is crucial in flexible pavement structures. This study used three Machine Learning (ML) models, K-Nearest Neighbors (KNN), Extra Trees (ET), and Light Gradient Boosting Machine (LGBM), to predict SBS from easily determined input parameters. Grid Search was employed for hyper-parameter tuning of the ML models, and cross-validation and learning curve analysis were used during training. The models were built on a database of 240 experimental results with three input variables: temperature, normal pressure, and tack coat rate. Model validation was performed using three statistical criteria: the coefficient of determination (R²), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE). SHAP (SHapley Additive exPlanations) analysis was also used to assess the importance of the input variables in predicting SBS. The results show that these models predict SBS accurately, with LGBM performing best. SHAP analysis of the LGBM model indicates that temperature is the most influential factor on SBS. The proposed ML models can therefore predict SBS between two layers of asphalt concrete quickly and accurately, serving practical applications in flexible pavement structure design.
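The three validation criteria follow directly from their definitions. A minimal sketch (the function name `regression_metrics` is hypothetical, not from the study):

```python
import math

def regression_metrics(y_true, y_pred):
    """Coefficient of determination, root mean square error and mean absolute error."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }

print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

R² is undefined when all true values are equal (zero total sum of squares), so production code should guard that case.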
To study the characteristics of pure fly ash-based geopolymer concrete (PFGC) conveniently, we used machine learning methods that can quantify the influence of each feature to predict its compressive strength. In this study, 505 groups of data were collected to construct a new database of the compressive strength of PFGC. To establish an accurate compressive strength prediction model, five different types of machine learning models were compared. All five models showed good compressive strength prediction performance on PFGC. Among them, the R², MSE, RMSE and MAE of the decision tree (DT) model are 0.99, 1.58, 1.25 and 0.25, respectively, while those of the random forest (RF) model are 0.97, 5.17, 2.27 and 1.38, respectively. These two models have high prediction accuracy and outstanding generalization ability. To enhance the interpretability of model decisions, we used importance ranking to obtain each machine learning model's sensitivity to 13 variables. These variables include the chemical composition of fly ash (SiO₂/Al₂O₃, Si/Al), the ratio of alkaline liquid to binder, curing temperature, curing duration inside the oven, fly ash dosage, fine aggregate dosage, coarse aggregate dosage, extra water dosage and sodium hydroxide dosage. Curing temperature, specimen age and curing duration inside the oven have the greatest influence on the prediction results, indicating that curing conditions influence the compressive strength of PFGC more strongly than that of ordinary Portland cement concrete. Owing to the low reactivity of pure fly ash, the importance of the curing conditions of PFGC even exceeds that of the concrete mix proportions.
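As a quick consistency check on the reported figures (a worked calculation, not part of the original study), RMSE is by definition the square root of MSE, so each model's pair of values should agree:

```latex
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \\
\text{DT: } \sqrt{1.58} &\approx 1.257, \qquad \text{RF: } \sqrt{5.17} \approx 2.274
\end{aligned}
```

Both agree with the reported RMSE values (1.25 and 2.27) to within rounding.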
Machine learning (ML) is increasingly applied to medical image processing with appropriate learning paradigms. These applications include analyzing images of various organs, such as the brain, lung, and eye, to identify specific flaws or diseases for diagnosis. The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification. Most extracted image features are irrelevant and increase computation time. Therefore, this article uses an analytical learning paradigm to design a Congruent Feature Selection Method that selects the most relevant image features. The learning paradigm is trained using similarity- and correlation-based features over different textural intensities and pixel distributions. Pixel similarity across the various distribution patterns with high indexes is recommended for disease diagnosis. The correlation based on intensity and distribution is then analyzed to improve the congruency of the feature selection. The more congruent pixels are sorted in descending order of selection, which identifies regions better than the distribution alone. The learning paradigm is then trained using intensity- and region-based similarity to maximize the chances of selection. As a result, the probability of feature selection improves regardless of the textures and medical image patterns, which enhances the performance of ML applications across different medical image processing tasks. The proposed method improves accuracy, precision, and training rate by 13.19%, 10.69%, and 11.06%, respectively, compared with other models on the selected dataset. The mean error and selection time are also reduced by 12.56% and 13.56%, respectively, compared with the same models and dataset.
The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising platform for catalyst design. High-throughput screening of catalytic performance is feasible because large MOF structure databases are available. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO₂ cycloaddition reaction. The descriptors for model training were judiciously chosen according to the reaction mechanism, which leads to an accuracy of up to 97% when the 75% quantile of the training set is used as the classification criterion. Feature contributions were further evaluated with SHAP and PDP (partial dependence plot) analysis to provide some physical understanding. Using the model, 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated at 100 °C and 1 bar within one day, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) achieved the best experimental performance of the reported MOFs, in good agreement with the prediction.
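The abstract uses the 75% quantile of the training set as the classification threshold. A minimal sketch of that labelling step, assuming a sample is labelled efficient (1) when its performance score strictly exceeds the upper-quartile cut point; the score itself and the tie-handling rule are assumptions, and `statistics.quantiles` with `method="inclusive"` is only one of several quantile conventions:

```python
import statistics

def quartile_threshold(train_scores):
    """Upper-quartile (75% quantile) cut point of the training scores."""
    return statistics.quantiles(train_scores, n=4, method="inclusive")[2]

def label_by_quartile(scores, threshold=None):
    """Binary labels: 1 if the score exceeds the threshold, else 0."""
    if threshold is None:
        threshold = quartile_threshold(scores)
    return [1 if s > threshold else 0 for s in scores]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(label_by_quartile([1, 2, 3, 4, 5]))
```

The threshold should be computed on the training scores only and then reused for any new candidates, so that screening labels are comparable across batches.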
Diabetic retinopathy (DR) remains a leading cause of vision impairment and blindness among individuals with diabetes, necessitating innovative approaches to screening and management. This editorial explores the transformative potential of artificial intelligence (AI) and machine learning (ML) in revolutionizing DR care. AI and ML technologies have shown remarkable advances in enhancing the accuracy, efficiency, and accessibility of DR screening, helping to overcome barriers to early detection. These technologies leverage vast datasets to identify patterns and predict disease progression with unprecedented precision, enabling clinicians to make more informed decisions. Furthermore, AI-driven solutions hold promise for personalizing DR management strategies, incorporating predictive analytics to tailor interventions and optimize treatment pathways. By automating routine tasks, AI can reduce the burden on healthcare providers, allowing resources to be focused on complex patient care. This review aims to evaluate the current advancements and applications of AI and ML in DR screening and to discuss the potential of these technologies for developing personalized management strategies, ultimately aiming to improve patient outcomes and reduce the global burden of DR. The integration of AI and ML in DR care represents a paradigm shift, offering a glimpse into the future of ophthalmic healthcare.
Machine learning (ML) is a type of artificial intelligence that enables computers to acquire knowledge through data analysis, creating machines that can complete tasks that would otherwise require human intelligence. Among its various applications, it has proven groundbreaking in healthcare, in both clinical practice and research. In this editorial, we briefly introduce ML applications and present a study featured in the latest issue of the World Journal of Clinical Cases. The authors of this study used both multiple linear regression (MLR) and ML methods to investigate the significant factors that may affect the estimated glomerular filtration rate in healthy women with and without non-alcoholic fatty liver disease (NAFLD). Their results identified age as the most important determining factor in both groups, followed by lactate dehydrogenase, uric acid, forced expiratory volume in one second, and albumin. In addition, in the NAFLD− group the 5th and 6th most important factors were thyroid-stimulating hormone and systolic blood pressure, compared with plasma calcium and body fat in the NAFLD+ group. The study's distinctive contribution, however, lies in its adoption of ML methodologies, which outperformed the traditional statistical approach (here MLR), highlighting the potential of ML as an invaluable advanced adjunct tool in clinical practice and research.
BACKGROUND Machine learning (ML), a major branch of artificial intelligence, has not only demonstrated the potential to significantly improve numerous sectors of healthcare but has also made significant contributions to the field of solid organ transplantation. ML provides revolutionary opportunities in areas such as donor-recipient matching, post-transplant monitoring, and patient care by automatically analyzing large amounts of data, identifying patterns, and forecasting outcomes. AIM To conduct a comprehensive bibliometric analysis of publications on the use of ML in transplantation to understand current research trends and their implications. METHODS On July 18, a thorough search of the Web of Science database was conducted using ML- and transplantation-related keywords. With the aid of the VOSviewer application, the identified articles were subjected to bibliometric analysis to determine publication counts, citation counts, contributing countries, and institutions, among other factors. RESULTS Of the 529 articles initially identified, 427 were deemed relevant for bibliometric analysis. A surge in publications was observed over the last four years, especially after 2018, signifying growing interest in this area. With 209 publications, the United States emerged as the top contributor. The "Journal of Heart and Lung Transplantation" and the "American Journal of Transplantation" were the leading journals, publishing the highest number of relevant articles. Keyword frequency analysis revealed that patient survival, mortality, outcomes, allocation, and risk assessment were significant themes. CONCLUSION The growing body of pertinent publications highlights ML's growing presence in the field of solid organ transplantation. This bibliometric analysis underscores the growing importance of ML in transplant research and its potential to change medical practice and enhance patient outcomes. Encouraging collaboration among the major contributors could fast-track advances in this interdisciplinary domain.
Critical to the safe, efficient, and reliable operation of an autonomous maritime vessel is its ability to perceive the external environment through onboard sensors. For this research, data were collected from a LiDAR sensor installed on a 16-foot catamaran unmanned vessel. This sensor generated point clouds of the surrounding maritime environment, which were then labeled by hand to train a machine learning (ML) model to perform semantic segmentation on LiDAR scans. In particular, the developed semantic segmentation classifies each point in the point cloud as belonging to a certain buoy type. This paper describes the Unity Game Engine (Unity) simulation developed to emulate the maritime environment perceived by LiDAR, with the goal of generating large, automatically labeled simulation datasets and improving ML model performance, since hand-labeled real-life LiDAR scan data may be scarce. The Unity simulation data, combined with labeled real-life point cloud data, was used to train a PointNet-based neural network model, whose architecture is presented in this paper. Fitting the PointNet-based model on the simulation data and then fine-tuning it on the combined dataset allowed accurate semantic segmentation of real-world point clouds. The ML model performance on several combinations of simulation and real-life data is explored. The resulting Intersection over Union (IoU) scores are high, ranging between 0.78 and 0.89 when validated on simulation and real-life data. The confusion-matrix entries indicate accurate semantic segmentation of the buoy types.
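Per-class IoU for point-wise segmentation is the number of points correctly assigned to a class divided by the size of the union of predicted and true points for that class, TP / (TP + FP + FN). A minimal sketch with hypothetical class names (not the paper's buoy taxonomy):

```python
import math

def per_class_iou(y_true, y_pred, classes):
    """IoU = TP / (TP + FP + FN) per class over point-wise labels."""
    ious = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = tp + fp + fn
        # A class absent from both truth and prediction has undefined IoU.
        ious[c] = tp / denom if denom else float("nan")
    return ious

def mean_iou(ious):
    """Average over classes whose IoU is defined."""
    defined = [v for v in ious.values() if not math.isnan(v)]
    return sum(defined) / len(defined)

truth = ["water", "buoy_a", "buoy_a", "buoy_b"]
pred = ["water", "buoy_a", "buoy_b", "buoy_b"]
ious = per_class_iou(truth, pred, ["water", "buoy_a", "buoy_b"])
print(ious, mean_iou(ious))
```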
Every second, a large volume of useful data is created in social media about various kinds of online purchases and in other forms of reviews. In particular, review data for purchased products is growing enormously in different database repositories every day. Most of this review data is useful both to new customers for their future purchases and to existing companies for viewing customer feedback about various products. Data Mining and Machine Learning techniques are commonly used to analyse and visualise such data and to understand the potential use of items purchased online. Customers express their judgment of product quality through their sentiments about items purchased from different online companies. This research work analyses the sentiments of headphone review data collected from online repositories. To analyse the headphone review data, Machine Learning techniques including Support Vector Machines, Naive Bayes, Decision Trees, Random Forest and a hybrid method are applied to assess quality via the customers' sentiments. The accuracy and performance of these algorithms are also analysed for three types of sentiment: positive, negative and neutral.
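As an illustration of one of the listed classifiers, here is a minimal multinomial Naive Bayes sketch for three sentiment classes, using whitespace tokenisation and add-one (Laplace) smoothing. The reviews and labels are toy examples, not the headphone dataset, and the study's actual preprocessing is not described in the abstract.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSentiment:
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class (priors)
        self.word_counts = defaultdict(Counter)      # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                # Add-one smoothing keeps unseen words from zeroing the likelihood.
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

docs = ["great sound quality", "love the bass", "terrible battery life",
        "broke after a week", "it is okay", "average sound nothing special"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = NaiveBayesSentiment().fit(docs, labels)
print(model.predict("great bass"))
```

Log-probabilities are summed rather than multiplying raw probabilities to avoid floating-point underflow on longer reviews.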
The purpose of this research paper is to explore how early Machine Learning models have shown bias in their results where no bias should be seen. A prime example is an ML model that favors male applicants over female applicants. Although the model is supposed to take other aspects of the data into consideration, it tends to be biased and skew the results one way or another. In this paper, we therefore explore how this bias comes about and how it can be fixed. In this research, I have taken case studies of real-world examples in which these biases appeared. For example, an Amazon hiring application that favored male applicants and a loan application that favored western applicants are both studies that I reference in this paper, exploring each situation itself. To find out where the bias comes from, I constructed a machine learning model using a dataset found on Kaggle and analyzed its results. The results clarify the reason for the bias in artificial intelligence models: the way a model is trained influences the results it produces. If the model is trained with much more data from male applicants than from female applicants, it will favor male applicants. Therefore, when it is presented with new data, it is likely to accept male applications over female ones despite equivalent qualifications. Later in the paper, I dive deeper into how AI applications work and how they find biases and trends in order to classify things correctly. However, there is a fine line between classification and bias, and making sure that bias is properly corrected and tested for is important in machine learning today.
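One simple way to quantify the kind of bias described above is to compare selection rates across groups. The sketch below computes the disparate impact ratio (the protected group's acceptance rate divided by the reference group's); the 0.8 cut-off often quoted for it comes from the US "four-fifths rule" and is a heuristic, not something the paper prescribes. The decisions are toy data.

```python
def selection_rates(decisions, groups):
    """Fraction of positive decisions within each group."""
    totals, accepted = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        accepted[group] = accepted.get(group, 0) + (1 if decision else 0)
    return {g: accepted[g] / totals[g] for g in totals}

def disparate_impact(decisions, groups, protected, reference):
    rates = selection_rates(decisions, groups)
    return rates[protected] / rates[reference]

decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["male"] * 4 + ["female"] * 4
ratio = disparate_impact(decisions, groups, protected="female", reference="male")
print(round(ratio, 3), "below 0.8" if ratio < 0.8 else "at or above 0.8")
```

A ratio well below 1 flags a disparity worth investigating; it does not by itself show which part of the training data caused it.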
The rapid growth of machine learning (ML) across fields has intensified the challenge of selecting the right algorithm for a specific task, known as the Algorithm Selection Problem (ASP). Traditional trial-and-error methods have become impractical due to their resource demands. Automated Machine Learning (AutoML) systems automate this process but often neglect the group structure and sparsity of meta-features, leading to inefficient algorithm recommendations for classification tasks. This paper proposes a meta-learning approach using Multivariate Sparse Group Lasso (MSGL) to address these limitations. Our method models both within-group and across-group sparsity among meta-features to manage high-dimensional data and reduce multicollinearity across eight meta-feature groups. The Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) with adaptive restart efficiently solves the resulting non-smooth optimization problem. Empirical validation on 145 classification datasets with 17 classification algorithms shows that our meta-learning method outperforms four state-of-the-art approaches, achieving 77.18% classification accuracy, 86.07% recommendation accuracy and 88.83% normalized discounted cumulative gain.
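For readers unfamiliar with the optimizer, the following is a simplified single-output sketch of sparse group lasso solved by FISTA with gradient-based adaptive restart (the restart scheme of O'Donoghue and Candès). It is not the paper's multivariate MSGL formulation: it assumes a least-squares loss, non-overlapping groups, group weights of sqrt(group size), and a step size of 1/‖X‖_F², which upper-bounds the inverse Lipschitz constant of the gradient safely.

```python
import math

def soft_threshold(v, t):
    """Element-wise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def prox_sparse_group_lasso(v, groups, lam1, lam2, step):
    """Prox of step * (lam1*||w||_1 + lam2*sum_g sqrt(|g|)*||w_g||_2) for
    non-overlapping groups: soft-threshold, then shrink each group."""
    u = soft_threshold(v, step * lam1)
    out = [0.0] * len(v)
    for g in groups:
        norm = math.sqrt(sum(u[i] ** 2 for i in g))
        thresh = step * lam2 * math.sqrt(len(g))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * u[i]
    return out

def fista_sgl(X, y, groups, lam1, lam2, iters=500):
    """Minimise 0.5*||Xw - y||^2 + sparse-group-lasso penalty."""
    n_samples, n_features = len(X), len(X[0])
    # Squared Frobenius norm >= largest eigenvalue of X^T X, so 1/L is a safe step.
    L = sum(x * x for row in X for x in row)
    step = 1.0 / L
    w = [0.0] * n_features      # current iterate
    z = w[:]                    # extrapolated point
    t = 1.0                     # momentum parameter
    for _ in range(iters):
        resid = [sum(X[i][j] * z[j] for j in range(n_features)) - y[i]
                 for i in range(n_samples)]
        grad = [sum(X[i][j] * resid[i] for i in range(n_samples))
                for j in range(n_features)]
        w_new = prox_sparse_group_lasso(
            [z[j] - step * grad[j] for j in range(n_features)], groups, lam1, lam2, step)
        if sum((z[j] - w_new[j]) * (w_new[j] - w[j]) for j in range(n_features)) > 0:
            # Momentum points against the latest step: restart it.
            t = 1.0
            z = w_new[:]
        else:
            t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = [w_new[j] + ((t - 1.0) / t_next) * (w_new[j] - w[j])
                 for j in range(n_features)]
            t = t_next
        w = w_new
    return w

# Toy problem: with X = I the minimiser is exactly one prox step applied to y.
X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
y = [3.0, -0.5, 0.2, 0.1]
groups = [[0, 1], [2, 3]]
w = fista_sgl(X, y, groups, lam1=0.1, lam2=0.2)
print([round(v, 4) for v in w])
```

The second group is driven exactly to zero (group sparsity), while inside the first group the L1 term and group shrinkage both act on the surviving coefficients.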
Perovskite solar cells, a type of third-generation solar cell, have developed rapidly in recent years. The traditional trial-and-error method is inefficient, and the search space is incredibly large. This makes developing advanced perovskite materials, and achieving high conversion efficiency and stability in perovskite solar cells (PSCs), a challenging task. A growing number of data-driven machine learning (ML) applications are being developed in materials science, owing to the availability of large databases and increased computing power. Using machine learning to predict the properties of potential perovskite materials has many advantages, and it can also provide additional knowledge about how these materials work, fast-tracking their progress. The purpose of this paper is therefore to develop a conceptual model for using machine learning techniques to improve the efficiency and performance of perovskite solar cells. The study uses design science as its research method. The developed model consists of six phases: data collection and preprocessing, feature selection and engineering, model training and evaluation, performance assessment, optimization and fine-tuning, and deployment and application. The model shows considerable promise for advancing the field of perovskite solar cells and provides a basis for developing more efficient and cost-effective solar energy technologies in the future.
NJmat is a user-friendly, data-driven machine learning interface designed for materials design and analysis. The platform integrates advanced computational techniques, including natural language processing (NLP), large language models (LLM), machine learning potentials (MLP), and graph neural networks (GNN), to facilitate materials discovery. It has been applied in diverse materials research areas, including perovskite surface design, catalyst discovery, battery materials screening, structural alloy design, and molecular informatics. By automating feature selection, predictive modeling, and result interpretation, NJmat accelerates the development of high-performance materials for energy storage, conversion, and structural applications. NJmat also serves as an educational tool, allowing students and researchers to apply machine learning techniques in materials science with minimal coding expertise. Through automated feature extraction, genetic algorithms, and interpretable machine learning models, NJmat simplifies the materials informatics workflow, bridging the gap between AI and experimental materials research. The latest version (available at https://figshare.com/articles/software/NJmatML/24607893 (accessed on 01 January 2025)) enhances its functionality by incorporating NJmatNLP, a module that leverages language models such as MatBERT and Word2Vec-based models to support materials prediction tasks. Using clustering and cosine similarity analysis with UMAP visualization, NJmat enables intuitive exploration of materials datasets. While NJmat primarily focuses on structure-property relationships and the discovery of novel chemistries, it can also assist in optimizing processing conditions when relevant parameters are included in the training data. By providing an accessible, integrated environment for machine learning-driven materials discovery, NJmat aligns with the objectives of the Materials Genome Initiative and promotes broader adoption of AI techniques in materials science.
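NJmatNLP's cosine-similarity analysis can be illustrated with a minimal sketch over precomputed embedding vectors. The vectors and material names here are hypothetical placeholders, not MatBERT or Word2Vec outputs, and the helper names are not part of NJmat's API.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = a.b / (|a| |b|); 1 means same direction, 0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, embeddings, top_k=2):
    """Rank stored embeddings by cosine similarity to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

embeddings = {
    "material_A": [1.0, 0.1, 0.0],
    "material_B": [0.0, 1.0, 0.0],
    "material_C": [0.8, 0.0, 0.6],
}
print(most_similar([1.0, 0.0, 0.0], embeddings))
```

Cosine similarity ignores vector length, which suits word embeddings whose norms partly reflect token frequency rather than meaning.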
Accurate prediction of the remaining useful life (RUL) is crucial for the design and management of lithium-ion batteries. Although various machine learning models offer promising predictions, one critical but often overlooked challenge is their demand for considerable run-to-failure data for training. Collecting such training data requires prohibitive testing effort, as run-to-failure tests can last for years. Here, we propose a semi-supervised representation learning method that enhances prediction accuracy by learning from data without RUL labels. Our approach builds on a deep neural network comprising an encoder and three decoder heads, which extracts time-dependent representation features from short-term battery operating data regardless of whether RUL labels exist. The approach is validated using three datasets collected from 34 batteries operating under various conditions, encompassing over 19,900 charge and discharge cycles. Our method achieves a root mean squared error (RMSE) within 25 cycles even when only 1/50 of the training dataset is labelled, a 48% reduction compared with the conventional approach. We also demonstrate the method's robustness to varying numbers of labelled samples and to different weights assigned to the three decoder heads. Projecting the extracted features into a low-dimensional space reveals that our method effectively learns degradation features from unlabelled data. Our approach highlights the promise of semi-supervised learning for reducing the data demand of reliability monitoring for energy devices.
Monitoring the mechanical behavior of underwater shield tunnels is vital for ensuring their long-term structural stability. Monitoring schemes are typically determined by empirical or semi-empirical methods, and their limited number of monitoring points and coarse layout make it very difficult to capture the complete mechanical state of the entire structure. Therefore, with the aim of optimizing the monitoring scheme, this study introduces a spatial deduction model for the stress distribution of the overall structure using a machine learning algorithm. First, clustering experiments were performed on a numerical data set to determine the typical positions of structural mechanical responses. Supervised learning methods were then applied to infer the data across the entire surface from the data at these typical positions, allowing flexibility in the number and combination of these points. Based on the evaluation of the model under various conditions, the optimal number of monitoring points and their locations are determined. The experimental findings suggest that an excessive number of monitoring points causes information redundancy, which diminishes the deduction capability. The primary monitoring positions are determined to be the spandrel and hance of the tunnel structure, with the arch crown and inch arch serving as additional positions to strengthen the monitoring network. Compared with common methods, the proposed model shows significantly improved characterization ability, establishing its reliability for optimizing the monitoring scheme.
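The abstract does not name the clustering algorithm. The sketch below assumes k-means over per-position response vectors and picks, as the representative monitoring position for each cluster, the position closest to the cluster centroid. Initialising with the first k points keeps the example deterministic; k-means++ or multiple restarts would be more robust in practice. The response vectors are toy values.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            [sum(coord) / len(cl) for coord in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def representative_points(points, k):
    """Index of the candidate position nearest to each cluster centroid."""
    centroids = kmeans(points, k)
    return [min(range(len(points)), key=lambda i: sq_dist(points[i], c))
            for c in centroids]

# Each row: a hypothetical stress-response vector at one candidate position.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(representative_points(points, 2))
```

Positions in the same cluster respond similarly, so monitoring one of them carries most of the information the others would add, which is consistent with the redundancy finding reported above.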
Machine learning (ML) has recently enabled many modeling tasks in design, manufacturing, and condition monitoring owing to its ability to learn from existing data. Data have become the limiting factor when implementing ML in industry. However, there has been no systematic investigation of how data quality can be assessed and improved for ML-based design and manufacturing. This survey aims to uncover the data challenges in this domain and review the techniques used to resolve them. To establish the background for the subsequent analysis, crucial data terminology in ML-based modeling is reviewed and categorized into data acquisition, management, analysis, and utilization. Thereafter, the concepts and frameworks established to evaluate data quality and imbalance, including data quality assessment, data readiness, information quality, data biases, fairness, and diversity, are investigated. The root causes and types of data challenges, including human factors, complex systems, complicated relationships, lack of data quality, data heterogeneity, data imbalance, and data scarcity, are identified and summarized. Methods to improve data quality and mitigate data imbalance, and their applications in this domain, are reviewed. This literature review focuses on two promising methods: data augmentation and active learning. The strengths, limitations, and applicability of the surveyed techniques are illustrated. Trends in data augmentation and active learning are discussed with respect to their applications, data types, and approaches. Based on this discussion, future directions for data quality improvement and data imbalance mitigation in this domain are identified.
Objective: This study investigates the auxiliary role of resting-state electroencephalography (EEG) in the clinical diagnosis of attention-deficit hyperactivity disorder (ADHD) using machine learning techniques. Methods: Resting-state EEG recordings were obtained from 57 children: 28 typically developing children and 29 children diagnosed with ADHD. The EEG signal data from both groups were analyzed. To ensure analytical accuracy, artifacts and noise in the EEG signals were removed using the EEGLAB toolbox in MATLAB. After preprocessing, a comparative analysis was conducted using various ensemble learning algorithms, including AdaBoost, GBM, LightGBM, RF, XGB, and CatBoost. Model performance was systematically evaluated and optimized, validating the superior efficacy of ensemble learning approaches in identifying ADHD. Conclusion: Applying machine learning techniques to extract features from resting-state EEG signals enabled the development of effective ensemble learning models. Differential entropy and energy features across multiple frequency bands proved particularly valuable for these models. This approach significantly enhances the detection rate of ADHD in children, demonstrating high diagnostic efficacy and sensitivity and providing a promising tool for clinical application.
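Differential entropy (DE) and energy are computed per frequency band. A minimal sketch, assuming the segment has already been band-pass filtered (for example into the alpha band) upstream and, as is common for EEG DE features, that the band-limited signal is approximately Gaussian, so DE = ½ ln(2πeσ²). The function names are hypothetical.

```python
import math

def band_energy(signal):
    """Energy of a band-filtered segment: sum of squared samples."""
    return sum(x * x for x in signal)

def differential_entropy(signal):
    """DE under a Gaussian assumption: 0.5 * ln(2 * pi * e * variance)."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    return 0.5 * math.log(2 * math.pi * math.e * variance)

segment = [1.0, -1.0, 1.0, -1.0]   # toy band-limited samples, unit variance
print(band_energy(segment), differential_entropy(segment))
```

Computing both features for each band and channel yields the feature vector that an ensemble classifier would then consume.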
Advances in gene editing and natural genetic variability present significant opportunities to generate novel alleles and select natural sources of genetic variation for horticulture crop improvement.The genetic improv...Advances in gene editing and natural genetic variability present significant opportunities to generate novel alleles and select natural sources of genetic variation for horticulture crop improvement.The genetic improvement of crops to enhance their resilience to abiotic stresses and new pests due to climate change is essential for future food security.The field of genomics has made significant strides over the past few decades,enabling us to sequence and analyze entire genomes.However,understanding the complex relationship between genes and their expression in phenotypes-the observable characteristics of an organism-requires a deeper understanding of phenomics.Phenomics seeks to link genetic information with biological processes and environmental factors to better understand complex traits and diseases.Recent breakthroughs in this field include the development of advanced imaging technologies,artificial intelligence algorithms,and large-scale data analysis techniques.These tools have enabled us to explore the relationships between genotype,phenotype,and environment in unprecedented detail.This review explores the importance of understanding the complex relationship between genes and their expression in phenotypes.Integration of genomics with efficient high throughput plant phenotyping as well as the potential of machine learning approaches for genomic and phenomics trait discovery.展开更多
Funding: the National Key R&D Program of China (2023YFB3812901); the Postdoctoral Fellowship Program of CPSF (No. GZC20230239); the China Postdoctoral Science Foundation (No. 2023M740219); and the National Natural Science Foundation of China (No. 22209094).
Abstract: To study the atmospheric aging of acrylic coatings, a two-year aging exposure experiment was conducted in 13 representative climatic environments in China. An atmospheric aging evaluation model for acrylic coatings was developed from aging data covering 11 environmental factors from 567 cities. A hybrid method combining random forest and Spearman correlation analysis was used to reduce the dimensionality of the data set, removing redundancy and multicollinearity. A semi-supervised co-training regression model was then developed, with the environmental factors as input and, as output, the low-frequency impedance modulus from the electrochemical impedance spectra of the acrylic coatings in 3.5 wt% NaCl solution. The model is more accurate than a supervised learning model (a support vector machine). It provides a new method for rapidly evaluating the aging performance of acrylic coatings and may also serve as a reference for evaluating the aging performance of other organic coatings.
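The redundancy-reduction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random-forest importances are assumed to be computed beforehand (for example from scikit-learn's `RandomForestRegressor.feature_importances_`) and passed in as a dictionary, while the function names, the 0.9 threshold, and the greedy "keep the more important feature of a correlated pair" rule are hypothetical choices.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature has no defined rank correlation
    return cov / (sx * sy)


def prune_redundant(features, importance, threshold=0.9):
    """Visit features from most to least important (hypothetical RF scores) and
    drop any whose |rho| with an already-kept feature exceeds the threshold."""
    kept = []
    for name in sorted(features, key=lambda n: importance[n], reverse=True):
        if all(abs(spearman(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because features are visited in order of importance, whenever two factors are strongly rank-correlated the one the random forest rates higher survives.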
Funding: supported by Project SP2024/089 of the Faculty of Materials Science and Technology, VŠB-Technical University of Ostrava.
Abstract: In engineering practice, it is often necessary to determine functional relationships between dependent and independent variables. These relationships can be highly nonlinear, and classical regression approaches cannot always provide sufficiently reliable solutions. Machine learning (ML) techniques, which offer advanced regression tools for complicated engineering problems, have been developed and widely explored. This study evaluates selected ML techniques for their suitability in modeling the hot deformation behavior of metallic materials. The ML-based regression methods of Artificial Neural Networks (ANNs), Support Vector Machine (SVM), Decision Tree Regression (DTR), and Gaussian Process Regression (GPR) are applied to mathematically describe hot flow stress curve datasets acquired experimentally for a medium-carbon steel. Although GPR had not previously been used for such a regression task, it performed best and was practically unrivaled: neither the ANN method nor the other ML techniques studied matched the precision of its regression results.
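As a rough illustration of how GPR describes a smooth flow-stress curve, the sketch below computes the GPR posterior mean and variance with a squared-exponential (RBF) kernel in plain Python. It is a simplification under stated assumptions rather than the study's model: it uses a single input (strain) instead of the full set of deformation conditions, fixes the kernel hyperparameters by hand instead of optimizing them, and all function names are hypothetical.

```python
import math


def rbf(a, b, length_scale, variance):
    """Squared-exponential kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(m):
    """Lower-triangular L with L L^T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(x_train, y_train, x_test, length_scale=0.1, variance=1.0, noise=1e-6):
    """Posterior mean and latent variance of a zero-mean GP fitted to centred targets."""
    y_mean = sum(y_train) / len(y_train)
    centred = [y - y_mean for y in y_train]
    k = [[rbf(a, b, length_scale, variance) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    low = cholesky(k)
    alpha = cho_solve(low, centred)  # (K + noise*I)^-1 y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf(xs, a, length_scale, variance) for a in x_train]
        means.append(y_mean + sum(ks * al for ks, al in zip(k_star, alpha)))
        v = cho_solve(low, k_star)
        var = variance - sum(ks * vi for ks, vi in zip(k_star, v))
        variances.append(max(var, 0.0))  # clip tiny negative round-off
    return means, variances
```

Near the measured strains the mean passes almost exactly through the data; far from them it reverts to the training mean and the variance grows back to the prior, which is the behavior that makes GPR attractive for interpolating dense experimental curves.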
Funding: the University of Transport Technology, under grant number DTTD2022-12.
Abstract: Determining the shear bond strength (SBS) at the interlayer of double-layer asphalt concrete is crucial for flexible pavement structures. This study used three machine learning (ML) models, namely K-Nearest Neighbors (KNN), Extra Trees (ET), and Light Gradient Boosting Machine (LGBM), to predict SBS from easily determined input parameters. Grid search was employed to tune the hyperparameters of the ML models, and cross-validation and learning-curve analysis were used in training them. The models were built on a database of 240 experimental results with three input variables: temperature, normal pressure, and tack coat rate. The models were validated using three statistical criteria: the coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute error (MAE). SHAP (SHapley Additive exPlanations) analysis was also used to assess the importance of the input variables in predicting SBS. The results show that the models predict SBS accurately, with LGBM delivering outstanding performance. SHAP analysis of the LGBM model indicates that temperature is the most influential factor on SBS. The proposed ML models can therefore predict the SBS between two asphalt concrete layers quickly and accurately, supporting practical flexible pavement structure design.
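The grid-search and cross-validation procedure can be illustrated for the KNN model, the simplest of the three. The sketch below is a dependency-free stand-in for what a library routine such as scikit-learn's `GridSearchCV` automates: it scores each candidate number of neighbours by k-fold RMSE and keeps the best. The interleaved fold assignment, the candidate grid, and the function names are illustrative assumptions; the study's actual search space is not given in the abstract.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points nearest to query (Euclidean)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def kfold_rmse(xs, ys, k, folds=5):
    """Cross-validated RMSE; fold f holds out samples f, f+folds, f+2*folds, ..."""
    sq_err = 0.0
    for f in range(folds):
        held_out = set(range(f, len(xs), folds))
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in held_out:
            sq_err += (knn_predict(tr_x, tr_y, xs[i], k) - ys[i]) ** 2
    return math.sqrt(sq_err / len(xs))


def grid_search_k(xs, ys, candidates, folds=5):
    """Return the neighbour count with the lowest CV RMSE, plus all scores."""
    scores = {k: kfold_rmse(xs, ys, k, folds) for k in candidates}
    return min(scores, key=scores.get), scores
```

With real data each sample would be a tuple such as (temperature, normal pressure, tack coat rate); since KNN distances are scale-sensitive, those inputs would normally be standardized first.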
Funding: the Natural Science Foundation of China (No. 52109168).
Abstract: To study the properties of pure fly ash-based geopolymer concrete (PFGC) conveniently, we used machine learning methods, which can quantify how strongly each input characteristic influences the output, to predict its compressive strength. In this study, 505 groups of data were collected to construct a new database of PFGC compressive strength. To establish an accurate compressive strength prediction model, five different types of machine learning models were compared. All five showed good compressive strength prediction performance on PFGC. Among them, the decision tree (DT) model achieved R², MSE, RMSE, and MAE values of 0.99, 1.58, 1.25, and 0.25, respectively, while the random forest (RF) model achieved 0.97, 5.17, 2.27, and 1.38, respectively. These two models have high prediction accuracy and outstanding generalization ability. To make the models' decisions more interpretable, we used importance ranking to obtain each model's sensitivity to 13 variables. These variables include the chemical composition of the fly ash (SiO₂/Al₂O₃, Si/Al), the ratio of alkaline liquid to binder, curing temperature, curing duration in the oven, and the dosages of fly ash, fine aggregate, coarse aggregate, extra water, and sodium hydroxide. Curing temperature, specimen age, and curing duration in the oven have the greatest influence on the predictions, indicating that curing conditions influence the compressive strength of PFGC more strongly than that of ordinary Portland cement concrete. Because pure fly ash has low reactivity, the importance of curing conditions for PFGC even exceeds that of the concrete mix proportions.
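The four evaluation criteria quoted above have standard definitions, sketched here in plain Python for reference (the function name is hypothetical). RMSE is simply the square root of MSE, and R² compares the squared prediction error against the spread of the observed strengths around their mean.

```python
import math


def regression_metrics(y_true, y_pred):
    """R², MSE, RMSE, and MAE of predictions against observations."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = sse / n
    return {
        "R2": 1.0 - sse / sst,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
    }
```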
Funding: the Deanship of Scientific Research at King Khalid University, through the Large Group Research Project under grant number RGP2/421/45; Prince Sattam bin Abdulaziz University, project number PSAU/2024/R/1446; the Researchers Supporting Project Number (UM-DSR-IG-2023-07), Almaarefa University, Riyadh, Saudi Arabia; and the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2021R1F1A1055408).
Abstract: Machine learning (ML) is increasingly applied to medical image processing with appropriate learning paradigms. These applications include analyzing images of various organs, such as the brain, lungs, and eyes, to identify specific flaws or diseases for diagnosis. The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification. Most extracted image features are irrelevant and increase computation time. This article therefore uses an analytical learning paradigm to design a Congruent Feature Selection Method that selects the most relevant image features. The learning paradigm is trained using similarity- and correlation-based features over different textural intensities and pixel distributions. Pixels with high similarity indexes across the various distribution patterns are recommended for disease diagnosis. The correlation based on intensity and distribution is then analyzed to improve the congruency of feature selection. The more congruent pixels are sorted in descending order of selection, which identifies regions better than the distribution alone. Next, the learning paradigm is trained using intensity- and region-based similarity to maximize the chances of selection. As a result, the probability of feature selection improves regardless of texture and medical image pattern, which enhances the performance of ML applications across different medical image processing tasks. Compared with other models on the selected dataset, the proposed method improves accuracy, precision, and training rate by 13.19%, 10.69%, and 11.06%, respectively, and reduces mean error and selection time by 12.56% and 13.56%, respectively.
Funding: the National Key Research and Development Program of China (2021YFB3501501); the National Natural Science Foundation of China (Nos. 22225803, 22038001, 22108007, and 22278011); the Beijing Natural Science Foundation (No. Z230023); and the Beijing Science and Technology Commission (No. Z211100004321001).
Abstract: The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising platform for catalyst design. Because large MOF structure databases are available, high-throughput screening of catalytic performance is feasible. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO₂ cycloaddition reaction. The descriptors for model training were chosen judiciously according to the reaction mechanism, which leads to an accuracy of up to 97% when the 75% quantile of the training set is used as the classification criterion. Feature contributions were further evaluated with SHAP and partial dependence plot (PDP) analysis to provide some physical understanding. Using the model, 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated at 100 °C and 1 bar within one day, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) showed the best experimental performance of the reported MOFs, in good agreement with the prediction.
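Turning a continuous performance target into classification labels with a quantile threshold, as described above, can be sketched as follows. This assumes the target is one scalar catalytic performance value per MOF (the abstract does not name it), that samples strictly above the training-set 75% quantile form the positive class, and that linear interpolation between order statistics is acceptable; it uses the standard-library `statistics.quantiles`, and the function name is hypothetical.

```python
import statistics


def quantile_labels(train_targets, q_cut=3, n=4):
    """Label targets above the q_cut-th of n quantile cut points as positive (1).

    With n=4 and q_cut=3 the threshold is the 75% quantile, so roughly the
    top quarter of the training set becomes the "efficient catalyst" class.
    """
    cuts = statistics.quantiles(train_targets, n=n, method="inclusive")
    threshold = cuts[q_cut - 1]
    return threshold, [1 if y > threshold else 0 for y in train_targets]
```

Raising `q_cut` (or `n`) makes the positive class more selective, which trades recall of good catalysts against a cleaner shortlist for experimental follow-up.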
Abstract: Diabetic retinopathy (DR) remains a leading cause of vision impairment and blindness among individuals with diabetes, necessitating innovative approaches to screening and management. This editorial explores the transformative potential of artificial intelligence (AI) and machine learning (ML) in revolutionizing DR care. AI and ML technologies have demonstrated remarkable advancements in enhancing the accuracy, efficiency, and accessibility of DR screening, helping to overcome barriers to early detection. These technologies leverage vast datasets to identify patterns and predict disease progression with unprecedented precision, enabling clinicians to make more informed decisions. Furthermore, AI-driven solutions hold promise in personalizing management strategies for DR, incorporating predictive analytics to tailor interventions and optimize treatment pathways. By automating routine tasks, AI can reduce the burden on healthcare providers, allowing for a more focused allocation of resources towards complex patient care. This review aims to evaluate the current advancements and applications of AI and ML in DR screening, and to discuss the potential of these technologies in developing personalized management strategies, ultimately aiming to improve patient outcomes and reduce the global burden of DR. The integration of AI and ML in DR care represents a paradigm shift, offering a glimpse into the future of ophthalmic healthcare.
Abstract: Machine learning (ML) is a type of artificial intelligence that helps computers acquire knowledge through data analysis, creating machines that can complete tasks that would otherwise require human intelligence. Among its various applications, it has proven groundbreaking in healthcare, in both clinical practice and research. In this editorial, we briefly introduce ML applications and present a study featured in the latest issue of the World Journal of Clinical Cases. The authors of that study used both multiple linear regression (MLR) and ML methods to investigate the significant factors that may affect the estimated glomerular filtration rate in healthy women with and without non-alcoholic fatty liver disease (NAFLD). Their results identified age as the most important determining factor in both groups, followed by lactic dehydrogenase, uric acid, forced expiratory volume in one second, and albumin. The 5th and 6th most important factors were thyroid-stimulating hormone and systolic blood pressure in the NAFLD− group, compared with plasma calcium and body fat in the NAFLD+ group. The study's distinctive contribution, however, lies in its adoption of ML methodologies and its demonstration of their superiority over a traditional statistical approach (here, MLR), highlighting the potential of ML as an invaluable advanced adjunct tool in clinical practice and research.
Abstract: BACKGROUND Machine learning (ML), a major branch of artificial intelligence, has not only demonstrated the potential to significantly improve numerous sectors of healthcare but has also made significant contributions to solid organ transplantation. By automatically analyzing large amounts of data, identifying patterns, and forecasting outcomes, ML offers revolutionary opportunities in areas such as donor-recipient matching, post-transplant monitoring, and patient care. AIM To conduct a comprehensive bibliometric analysis of publications on the use of ML in transplantation to understand current research trends and their implications. METHODS On July 18, a thorough search of the Web of Science database was performed using ML- and transplantation-related keywords. The identified articles were subjected to bibliometric analysis with the VOSviewer application to determine publication counts, citation counts, contributing countries, and institutions, among other variables. RESULTS Of the 529 articles initially identified, 427 were deemed relevant for bibliometric analysis. Publications surged over the last four years, especially after 2018, signifying growing interest in this area. With 209 publications, the United States was the top contributor. Notably, the "Journal of Heart and Lung Transplantation" and the "American Journal of Transplantation" were the leading journals, publishing the highest number of relevant articles. Analysis of frequent keywords revealed that patient survival, mortality, outcomes, allocation, and risk assessment were significant themes. CONCLUSION The growing body of pertinent publications reflects ML's increasing presence in solid organ transplantation. This bibliometric analysis underscores the growing importance of ML in transplant research and its exciting potential to change medical practice and enhance patient outcomes. Encouraging collaboration between significant contributors can potentially fast-track advancements in this interdisciplinary domain.
Abstract: Critical to the safe, efficient, and reliable operation of an autonomous maritime vessel is its ability to perceive the external environment through onboard sensors. For this research, data were collected from a LiDAR sensor installed on a 16-foot unmanned catamaran. The sensor generated point clouds of the surrounding maritime environment, which were then labeled by hand to train a machine learning (ML) model to perform semantic segmentation on LiDAR scans. Specifically, the segmentation model classifies each point in the point cloud as belonging to a particular buoy type. Because hand-labeled real-life LiDAR scan data may be scarce, this paper describes a Unity Game Engine (Unity) simulation developed to emulate the maritime environment as perceived by LiDAR, with the goal of generating large, automatically labeled simulation datasets and improving ML model performance. The Unity simulation data, combined with labeled real-life point cloud data, was used to train a PointNet-based neural network model, whose architecture is presented in this paper. Fitting the PointNet-based model on the simulation data and then fine-tuning it on the combined dataset allowed accurate semantic segmentation of real-world point clouds. The ML model's performance on several combinations of simulation and real-life data is explored. The resulting Intersection over Union (IoU) scores are high, ranging from 0.78 to 0.89 when validated on simulation and real-life data, and the confusion-matrix entries indicate accurate semantic segmentation of the buoy types.
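Because the results are reported both as IoU scores and as a confusion matrix, it may help to see how per-class IoU follows from that matrix: for class c, IoU = TP / (TP + FP + FN). The sketch below assumes a square matrix indexed as `confusion[true][predicted]` holding point counts; the function names are hypothetical and the two-class example in the test is invented, not taken from the paper.

```python
import math


def per_class_iou(confusion):
    """IoU per class from confusion[true_class][predicted_class] point counts."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # class-c points labelled as something else
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other points labelled c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else math.nan)  # NaN: class never appears
    return ious


def mean_iou(confusion):
    """Average IoU over classes that occur in either labels or predictions."""
    valid = [v for v in per_class_iou(confusion) if not math.isnan(v)]
    return sum(valid) / len(valid)
```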
Abstract: Every second, a large volume of useful data is created on social media about various kinds of online purchases, including reviews. In particular, review data for purchased products is growing enormously in database repositories every day. Most of this review data is useful both to new customers considering further purchases and to existing companies seeking customer feedback about their products. Data mining and machine learning techniques are commonly used to analyze and visualize such data and to understand the potential uses of items purchased online. Customers express their judgment of product quality through the sentiments they share about items purchased from different online companies. This research work analyzes the sentiments in headphone review data collected from online repositories. To assess product quality through customers' sentiments, several machine learning techniques, namely Support Vector Machines, Naive Bayes, Decision Trees, and Random Forest, along with a hybrid method, are applied to the headphone review data. The accuracy and performance of these algorithms are also analyzed for three types of sentiment: positive, negative, and neutral.
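Of the classifiers listed, Naive Bayes is the easiest to show end to end. The sketch below is a minimal multinomial Naive Bayes with Laplace smoothing over lower-cased, whitespace-separated tokens; it is an illustration under those assumptions, not the paper's pipeline, which presumably used richer text preprocessing. The class name and the toy headphone reviews in the test are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over lower-cased whitespace tokens."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the class
            for tok in doc.lower().split():
                if tok in self.vocab:  # words never seen in training are ignored
                    p = (self.word_counts[c][tok] + self.alpha) / (
                        self.total_words[c] + self.alpha * v)
                    score += math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best
```

Working in log space avoids underflow on long reviews, and the smoothing term keeps a single unseen class/word pairing from zeroing out a class.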
Abstract: The purpose of this research paper is to explore how early machine learning models have shown bias in their results where no bias should appear. A prime example is an ML model that favors male applicants over female applicants: although the model is supposed to take other aspects of the data into consideration, it tends to be biased and skews the results one way or another. In this paper, I explore how this bias comes about and how it can be fixed. I examine several case studies of real-world examples of such bias; for example, an Amazon hiring application that favored male applicants and a loan application that favored Western applicants are both referenced and explored in this paper. To find out where the bias comes from, I constructed a machine learning model using a dataset found on Kaggle and analyzed its results. The results clarify the reason for the bias in artificial intelligence models: the way a model is trained determines how its results play out. If a model is trained on far more data from male applicants than from female applicants, it will favor male applicants, so when it is given new data it is likely to accept male applicants over female applicants with equivalent qualifications. Later in the paper, I examine in more depth how AI applications work and how they find biases and trends in order to classify things correctly. There is a fine line between classification and bias, and making sure that bias is properly corrected and tested for is important in machine learning today.
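One simple way to test a trained model for the kind of bias described above is to compare its acceptance (selection) rates across groups. The sketch below computes per-group selection rates and their ratio; the 0.8 figure mentioned in the docstring is the "four-fifths rule" of thumb from US employment-selection guidance, used here only as an illustrative threshold, and the function names and data are hypothetical.

```python
from collections import defaultdict


def selection_rates(groups, decisions):
    """Fraction of positive decisions (1 = accepted) within each group."""
    totals, accepted = defaultdict(int), defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        accepted[group] += int(decision)
    return {g: accepted[g] / totals[g] for g in totals}


def disparate_impact(groups, decisions, protected, reference):
    """Ratio of the protected group's selection rate to the reference group's.

    Values well below 1.0 (for example under the four-fifths heuristic of 0.8)
    suggest the model favours the reference group.
    """
    rates = selection_rates(groups, decisions)
    return rates[protected] / rates[reference]
```

Running this on a model's predictions for a held-out set with equally qualified applicants in each group gives a quick, model-agnostic signal before digging into the training data itself.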
Abstract: The rapid growth of machine learning (ML) across fields has intensified the challenge of selecting the right algorithm for specific tasks, known as the Algorithm Selection Problem (ASP). Traditional trial-and-error methods have become impractical due to their resource demands. Automated Machine Learning (AutoML) systems automate this process, but often neglect the group structures and sparsity in meta-features, leading to inefficiencies in algorithm recommendations for classification tasks. This paper proposes a meta-learning approach using Multivariate Sparse Group Lasso (MSGL) to address these limitations. Our method models both within-group and across-group sparsity among meta-features to manage high-dimensional data and reduce multicollinearity across eight meta-feature groups. The Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) with adaptive restart efficiently solves the non-smooth optimization problem. Empirical validation on 145 classification datasets with 17 classification algorithms shows that our meta-learning method outperforms four state-of-the-art approaches, achieving 77.18% classification accuracy, 86.07% recommendation accuracy, and 88.83% normalized discounted cumulative gain.
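The core computational step, solving a sparse-group-lasso problem with FISTA and adaptive restart, can be sketched in plain Python. This is a simplified illustration, not the paper's solver: it fits a single-response least-squares model (the paper's MSGL is multivariate), assumes non-overlapping groups with the common sqrt(group size) weighting, takes the step size from the Frobenius-norm upper bound on the Lipschitz constant, and uses a gradient-based restart test. Function names and penalty values are hypothetical.

```python
import math


def matvec(a, x):
    """Compute A x for a list-of-rows matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, r):
    """Compute A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]


def prox_sgl(v, groups, l1, l2, step):
    """Prox of step*(l1*||x||_1 + l2*sum_g sqrt(|g|)*||x_g||_2) for disjoint groups:
    soft-threshold each entry, then shrink each group's Euclidean norm."""
    u = [math.copysign(max(abs(x) - step * l1, 0.0), x) for x in v]
    out = list(u)
    for g in groups:
        norm = math.sqrt(sum(u[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - step * l2 * math.sqrt(len(g)) / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * u[i]
    return out


def fista_sgl(a, b, groups, l1, l2, iters=500):
    """Minimise 0.5*||A x - b||^2 + sparse-group-lasso penalty with restarted FISTA."""
    n = len(a[0])
    step = 1.0 / sum(aij * aij for row in a for aij in row)  # 1/||A||_F^2 <= 1/L
    x, y, t = [0.0] * n, [0.0] * n, 1.0
    for _ in range(iters):
        resid = [p - bi for p, bi in zip(matvec(a, y), b)]
        grad = rmatvec(a, resid)
        x_new = prox_sgl([yi - step * gi for yi, gi in zip(y, grad)], groups, l1, l2, step)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        if sum((yi - xn) * (xn - xo) for yi, xn, xo in zip(y, x_new, x)) > 0:
            t_new, y = 1.0, list(x_new)  # momentum points uphill: restart it
        else:
            y = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x
```

In the meta-learning setting, the rows of `a` would hold dataset meta-features and `b` an algorithm's performance; the group structure lets whole meta-feature groups drop out while the L1 term still prunes individual features inside surviving groups.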
Abstract: Perovskite solar cells, an example of third-generation solar cells, have developed rapidly in recent years. The traditional trial-and-error method is inefficient, and the search space is extremely large, which makes developing advanced perovskite materials, and achieving high conversion efficiency and stability in perovskite solar cells (PSCs), a challenging task. Thanks to large databases and increased computing power, a growing number of data-driven machine learning (ML) applications are being developed in materials science. Machine learning offers many advantages for predicting the properties of potential perovskite materials and for providing additional knowledge about how these materials work, which can fast-track their progress. The purpose of this paper is therefore to develop a conceptual model for using machine learning techniques to improve the efficiency and performance of perovskite solar cells. The study uses design science as its research method. The developed model consists of six phases: data collection and preprocessing, feature selection and engineering, model training and evaluation, performance assessment, optimization and fine-tuning, and deployment and application. The model shows considerable promise for advancing the field of perovskite solar cells and provides a basis for developing more efficient and cost-effective solar energy technologies in the future.
Funding: supported by the Jiangsu Provincial Science and Technology Project Basic Research Program (Natural Science Foundation of Jiangsu Province) (No. BK20211283).
Abstract: NJmat is a user-friendly, data-driven machine learning interface designed for materials design and analysis. The platform integrates advanced computational techniques, including natural language processing (NLP), large language models (LLM), machine learning potentials (MLP), and graph neural networks (GNN), to facilitate materials discovery. The platform has been applied in diverse materials research areas, including perovskite surface design, catalyst discovery, battery materials screening, structural alloy design, and molecular informatics. By automating feature selection, predictive modeling, and result interpretation, NJmat accelerates the development of high-performance materials across energy storage, conversion, and structural applications. Additionally, NJmat serves as an educational tool, allowing students and researchers to apply machine learning techniques in materials science with minimal coding expertise. Through automated feature extraction, genetic algorithms, and interpretable machine learning models, NJmat simplifies the workflow for materials informatics, bridging the gap between AI and experimental materials research. The latest version (available at https://figshare.com/articles/software/NJmatML/24607893 (accessed on 01 January 2025)) enhances its functionality by incorporating NJmatNLP, a module leveraging language models like MatBERT and those based on Word2Vec to support materials prediction tasks. By utilizing clustering and cosine similarity analysis with UMAP visualization, NJmat enables intuitive exploration of materials datasets. While NJmat primarily focuses on structure-property relationships and the discovery of novel chemistries, it can also assist in optimizing processing conditions when relevant parameters are included in the training data. By providing an accessible, integrated environment for machine learning-driven materials discovery, NJmat aligns with the objectives of the Materials Genome Initiative and promotes broader adoption of AI techniques in materials science.
Funding: supported by the National Natural Science Foundation of China (No. 52207229) and the Key Research and Development Program of Ningxia Hui Autonomous Region of China (No. 2024BEE02003), with financial support from the AEGiS Research Grant 2024, University of Wollongong (No. R6254), and from the China Scholarship Council (No. 202207550010).
Abstract: Accurate prediction of the remaining useful life (RUL) is crucial for the design and management of lithium-ion batteries. Although various machine learning models offer promising predictions, one critical but often overlooked challenge is their need for considerable run-to-failure data for training. Collecting such training data requires prohibitive testing effort, as run-to-failure tests can last for years. Here, we propose a semi-supervised representation learning method that enhances prediction accuracy by also learning from data without RUL labels. Our approach builds on a deep neural network comprising an encoder and three decoder heads, which extracts time-dependent representation features from short-term battery operating data regardless of whether RUL labels exist. The approach is validated using three datasets collected from 34 batteries operating under various conditions, encompassing over 19,900 charge and discharge cycles. Our method achieves a root mean squared error (RMSE) within 25 cycles even when only 1/50 of the training dataset is labelled, a 48% reduction compared with the conventional approach. We also demonstrate the method's robustness to varying numbers of labelled samples and to different weights assigned to the three decoder heads. Projecting the extracted features into a low-dimensional space reveals that our method effectively learns degradation features from unlabelled data. Our approach highlights the promise of semi-supervised learning for reducing the data demand of reliability monitoring for energy devices.
Funding: Key Project in Hubei Province (Grant/Award Number: 2023BCB048); National Key R&D Program of China (Grant/Award Number: 2021YFC3100805); National Natural Science Foundation of China (Grant/Award Numbers: 42293355, 51991392); and the Project for Research Assistant of Chinese Academy of Sciences.
Abstract: Monitoring the mechanical behavior of underwater shield tunnels is vital for ensuring their long-term structural stability. Monitoring schemes are typically designed by empirical or semi-empirical methods, and their limited number of monitoring points and coarse layouts make it very difficult to capture the complete mechanical state of the entire structure. Therefore, with the aim of optimizing the monitoring scheme, this study introduces a machine learning-based spatial deduction model for the stress distribution of the overall structure. First, clustering experiments were performed on a numerical data set to determine the typical positions of the structural mechanical responses. Supervised learning methods were then applied to infer the response data across the entire surface from the data at these typical positions, allowing flexibility in the number and combination of these points. Based on the model's evaluation results under various conditions, the optimal number of monitoring points and their locations were determined. The experimental findings suggest that an excessive number of monitoring points results in information redundancy, which diminishes the deduction capability. The primary monitoring positions were determined to be the spandrel and hance of the tunnel structure, with the arch crown and inch arch serving as additional positions to enhance the monitoring network. Compared with common methods, the proposed model shows significantly improved characterization ability, establishing its reliability for optimizing the monitoring scheme.
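The clustering step that picks typical monitoring positions can be sketched with plain k-means: each candidate position is described by a vector of its simulated responses, positions are clustered, and the position nearest each cluster centre is kept as a representative monitoring point. The abstract does not name the clustering algorithm, so k-means, the nearest-to-centroid selection rule, and the function names are assumptions; the coordinates in the test are toy values.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means; returns centroids and the member indices of each cluster."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(idx)
        updated = [
            [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
            if members else centroids[c]  # keep an empty cluster's old centroid
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    return centroids, clusters


def representative_points(points, k, seed=0):
    """Index of the point closest to each non-empty cluster centre, sorted."""
    centroids, clusters = kmeans(points, k, seed=seed)
    return sorted(
        min(members, key=lambda i: math.dist(points[i], centre))
        for centre, members in zip(centroids, clusters) if members
    )
```

The returned indices would serve as the inputs of the subsequent supervised model, which infers responses at all other positions; varying `k` is one way to explore how many monitoring points are worth installing.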
Funding: the McGill University Graduate Excellence Fellowship Award (00157), the Mitacs Accelerate Program (IT13369), and the McGill Engineering Doctoral Award (MEDA).
Abstract: Machine learning (ML) has recently enabled many modeling tasks in design, manufacturing, and condition monitoring due to its unparalleled ability to learn from existing data. Data have become the limiting factor when implementing ML in industry. However, there has been no systematic investigation of how data quality can be assessed and improved for ML-based design and manufacturing. The aim of this survey is to uncover the data challenges in this domain and review the techniques used to resolve them. To establish the background for the subsequent analysis, key data terminology in ML-based modeling is reviewed and categorized into data acquisition, management, analysis, and utilization. Thereafter, the concepts and frameworks established to evaluate data quality and imbalance, including data quality assessment, data readiness, information quality, data biases, fairness, and diversity, are investigated. The root causes and types of data challenges, including human factors, complex systems, complicated relationships, lack of data quality, data heterogeneity, data imbalance, and data scarcity, are identified and summarized. Methods to improve data quality and mitigate data imbalance, and their applications in this domain, are reviewed, with a focus on two promising methods: data augmentation and active learning. The strengths, limitations, and applicability of the surveyed techniques are illustrated, and the trends in data augmentation and active learning are discussed with respect to their applications, data types, and approaches. Based on this discussion, future directions for data quality improvement and data imbalance mitigation in this domain are identified.
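As a concrete example of one of the two highlighted methods, the sketch below shows pool-based active learning with query-by-committee for regression: the unlabelled samples on which a committee of models disagrees most (highest prediction variance) are chosen for labelling next. The committee members are stand-in callables, and the function name and toy data are hypothetical; the survey covers many other query strategies.

```python
import statistics


def query_by_committee(committee, pool, n_queries):
    """Indices of the n_queries pool samples with the largest committee disagreement.

    committee: callables mapping a sample to a prediction (for example, models
    trained on bootstrap resamples of the labelled data); pool: unlabelled samples.
    """
    disagreement = [
        (statistics.pvariance([model(x) for model in committee]), idx)
        for idx, x in enumerate(pool)
    ]
    disagreement.sort(reverse=True)
    return [idx for _, idx in disagreement[:n_queries]]
```

In a full loop, the queried samples would be labelled (for example, by running the corresponding experiment), added to the training set, the committee retrained, and the query repeated until the labelling budget runs out.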
Funding: This study received financial support from the Jilin Province Health and Technology Capacity Enhancement Project (Project Number: 222Lc132).
Abstract: Objective: This study investigates the auxiliary role of resting-state electroencephalography (EEG) in the clinical diagnosis of attention-deficit hyperactivity disorder (ADHD) using machine learning techniques. Methods: Resting-state EEG recordings were obtained from 57 children: 28 typically developing children and 29 children diagnosed with ADHD. The EEG signal data from both groups were analyzed. To ensure analytical accuracy, artifacts and noise were removed from the EEG signals using the EEGLAB toolbox in MATLAB. After preprocessing, several ensemble learning algorithms, including AdaBoost, GBM, LightGBM, RF, XGB, and CatBoost, were compared. Model performance was systematically evaluated and optimized, confirming the superior efficacy of ensemble learning approaches in identifying ADHD. Conclusion: Applying machine learning techniques to extract features from resting-state EEG signals enabled the development of effective ensemble learning models, for which differential entropy and energy features across multiple frequency bands proved particularly valuable. This approach significantly enhances the detection rate of ADHD in children, demonstrating high diagnostic efficacy and sensitivity and providing a promising tool for clinical application.
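The two feature types named above are straightforward to compute once a channel has been band-pass filtered into each frequency band. The sketch below assumes that filtering has already been done (for example in EEGLAB) and uses the closed-form differential entropy of a Gaussian signal, DE = 0.5 ln(2πeσ²), a common convention in EEG work; whether the study used exactly this estimator is not stated, and the function names and band labels are hypothetical.

```python
import math
import statistics


def differential_entropy(segment):
    """DE of a band-limited segment, assuming its samples are Gaussian."""
    variance = statistics.pvariance(segment)
    return 0.5 * math.log(2 * math.pi * math.e * variance)


def energy(segment):
    """Signal energy: sum of squared samples."""
    return sum(x * x for x in segment)


def band_features(band_segments):
    """Map {band name: filtered samples} to DE and energy features per band."""
    features = {}
    for band, segment in band_segments.items():
        features[f"{band}_de"] = differential_entropy(segment)
        features[f"{band}_energy"] = energy(segment)
    return features
```

Concatenating these per-band, per-channel values yields the feature vector that ensemble learners such as LightGBM or CatBoost would consume.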
Funding: This research was supported by the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2019R1A6A1A11052070).
Abstract: Advances in gene editing and natural genetic variability present significant opportunities to generate novel alleles and select natural sources of genetic variation for horticultural crop improvement. Genetically improving crops to enhance their resilience to abiotic stresses and to new pests arising from climate change is essential for future food security. Genomics has made significant strides over the past few decades, enabling us to sequence and analyze entire genomes. However, understanding the complex relationship between genes and their expression in phenotypes (the observable characteristics of an organism) requires a deeper understanding of phenomics. Phenomics seeks to link genetic information with biological processes and environmental factors to better understand complex traits and diseases. Recent breakthroughs in this field include advanced imaging technologies, artificial intelligence algorithms, and large-scale data analysis techniques. These tools have enabled us to explore the relationships between genotype, phenotype, and environment in unprecedented detail. This review explores the importance of understanding the complex relationship between genes and their expression in phenotypes, the integration of genomics with efficient high-throughput plant phenotyping, and the potential of machine learning approaches for genomic and phenomic trait discovery.