Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea. Traditional tea-picking machines may compromise the quality of the tea leaves. High-quality teas are often handpicked and need more delicate operations in intelligent picking machines. Compared with traditional image processing techniques, deep learning models have stronger feature extraction capabilities and better generalization, and are more suitable for practical tea shoot harvesting. However, current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation tasks. We propose a tea shoot instance segmentation model based on multi-scale mixed attention (Mask2FusionNet) using a dataset from a tea garden in Hangzhou. We further analyzed the characteristics of the tea shoot dataset, where the proportion of small to medium-sized targets is 89.9%. Our algorithm is compared with several mainstream object segmentation algorithms, and the results demonstrate that our model achieves an accuracy of 82% in recognizing the tea shoots, outperforming the other models. Through ablation experiments, we found that ResNet50, the PointRend strategy, and the Feature Pyramid Network (FPN) architecture can improve performance by 1.6%, 1.4%, and 2.4%, respectively. These experiments demonstrated that our proposed multi-scale and point selection strategy optimizes the feature extraction capability for overlapping small targets. The results indicate that the proposed Mask2FusionNet model can perform shoot segmentation in unstructured environments, distinguishing individual tea shoots and completely extracting the shoot edge contours with a segmentation accuracy of 82.0%. The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.
Nuclear magnetic resonance imaging of breasts often presents complex backgrounds. Breast tumors exhibit varying sizes, uneven intensity, and indistinct boundaries. These characteristics can lead to challenges such as low accuracy and incorrect segmentation during tumor segmentation. Thus, we propose a two-stage breast tumor segmentation method leveraging multi-scale features and boundary attention mechanisms. Initially, the breast region of interest is extracted to isolate the breast area from surrounding tissues and organs. Subsequently, we devise a fusion network incorporating multi-scale features and boundary attention mechanisms for breast tumor segmentation. We incorporate multi-scale parallel dilated convolution modules into the network, enhancing its capability to segment tumors of various sizes through multi-scale convolution and novel fusion techniques. Additionally, attention and boundary detection modules are included to augment the network’s capacity to locate tumors by capturing nonlocal dependencies in both spatial and channel domains. Furthermore, a hybrid loss function with boundary weight is employed to address sample class imbalance and to enhance the network’s boundary maintenance capability through an additional loss term. The method was evaluated using breast data from 207 patients at Ruijin Hospital, resulting in a 6.64% increase in Dice similarity coefficient compared to the benchmark U-Net. Experimental results demonstrate the superiority of the method over other segmentation techniques, with fewer model parameters.
Existing semi-supervised medical image segmentation algorithms use copy-paste data augmentation to correct the labeled-unlabeled data distribution mismatch. However, current copy-paste methods have three limitations: (1) training the model solely with copy-paste mixed images from labeled and unlabeled input loses a lot of labeled information; (2) low-quality pseudo-labels can cause confirmation bias in pseudo-supervised learning on unlabeled data; (3) the segmentation performance in low-contrast and local regions is less than optimal. We design a Stochastic Augmentation-Based Dual-Teaching Auxiliary Training Strategy (SADT), which enhances feature diversity and learns high-quality features to overcome these problems. More precisely, SADT trains the Student Network using pseudo-label-based training from Teacher Network 1 together with supervised learning on labeled data, which prevents the loss of rare labeled data. We introduce a bi-directional copy-paste mask with progressive high-entropy filtering to reduce data distribution disparities and mitigate confirmation bias in pseudo-supervision. For the mixed images, Deep-Shallow Spatial Contrastive Learning (DSSCL) is proposed in the feature spaces of Teacher Network 2 and the Student Network to improve the segmentation capabilities in low-contrast and local areas. In this procedure, the features extracted by the Student Network are subjected to a random feature perturbation technique. Extensive experiments on two openly available datasets show that the proposed SADT performs much better than state-of-the-art semi-supervised medical segmentation techniques. Using only 10% of the labeled data for training, SADT achieved a Dice score of 90.10% on the ACDC (Automatic Cardiac Diagnosis Challenge) dataset.
In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy significantly declines when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: the Joint Interpolation Module (JI Module), the Multi-scale Temporal Convolution Network (MS-TCN), and the Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as “Happy,” “Sad,” “Angry,” and “Neutral.” On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, achieving significantly higher accuracy than other methods.
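The KNN interpolation step in the abstract above can be illustrated with a small sketch. The abstract does not say how neighbours are defined or how their positions are combined, so everything here is an assumption: the function name `knn_fill_joints`, the use of a reference `template` pose to decide which joints are nearest, and the rule of shifting the template position by the neighbours' mean displacement are all hypothetical choices, not the paper's JI Module.

```python
import math

def knn_fill_joints(observed, template, k=2):
    """Estimate occluded joints (marked None) from their k nearest visible joints.

    Assumptions (not from the paper): neighbours are ranked by distance in a
    reference `template` pose, and a missing joint is placed at its template
    position shifted by the mean displacement of those neighbours.
    """
    visible = [i for i, p in enumerate(observed) if p is not None]
    if not visible:
        raise ValueError("at least one joint must be visible")
    filled = list(observed)
    for j, point in enumerate(observed):
        if point is not None:
            continue  # joint was observed, nothing to interpolate
        # k visible joints closest to joint j in the template pose
        nearest = sorted(
            visible, key=lambda i: math.dist(template[i], template[j])
        )[:k]
        dims = range(len(template[j]))
        # mean displacement (observed minus template) of the chosen neighbours
        shift = [
            sum(observed[i][d] - template[i][d] for i in nearest) / len(nearest)
            for d in dims
        ]
        filled[j] = tuple(template[j][d] + shift[d] for d in dims)
    return filled

# Hypothetical 2D example: joint 1 is occluded.
template = [(0, 0), (1, 0), (2, 0), (10, 0)]
observed = [(0, 1), None, (2, 1), (10, 5)]
filled = knn_fill_joints(observed, template, k=2)
```

In this toy case the two nearest visible joints both moved up by one unit, so the occluded joint is placed one unit above its template position; the distant joint 3 does not influence the estimate.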
Lower back pain is one of the most common medical problems in the world and affects a large proportion of people everywhere. Because it produces a detailed view of the soft tissues, including the spinal cord, nerves, intervertebral discs, and vertebrae, Magnetic Resonance Imaging is considered the most effective method for imaging the spine. The semantic segmentation of vertebrae plays a major role in the diagnostic process of lumbar diseases. It is difficult to semantically separate the vertebrae in Magnetic Resonance Images from the variety of surrounding tissues, including muscles, ligaments, and intervertebral discs. U-Net is a powerful deep-learning architecture that handles the challenges of medical image analysis tasks and achieves high segmentation accuracy. This work proposes a modified U-Net architecture, namely MU-Net, consisting of a Meijering convolutional layer that incorporates the Meijering filter to perform the semantic segmentation of lumbar vertebrae L1 to L5 and sacral vertebra S1. Pseudo-colour mask images were generated and used as ground truth for training the model. The work was carried out on 1312 images expanded from T1-weighted mid-sagittal MRI images of 515 patients in the Lumbar Spine MRI Dataset, publicly available from Mendeley Data. On this dataset, the proposed MU-Net model achieves 98.79% pixel accuracy (PA), 98.66% Dice similarity coefficient (DSC), 97.36% Jaccard coefficient, and 92.55% mean Intersection over Union (mean IoU).
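For readers unfamiliar with the overlap metrics quoted in these abstracts, the following is a minimal sketch of pixel accuracy, Dice similarity coefficient, and Jaccard coefficient for a single pair of binary masks. The function name `segmentation_metrics` and the flat 0/1 list representation are illustrative assumptions; the papers listed here do not give their evaluation code, and multi-class figures such as mean IoU additionally average over classes.

```python
def segmentation_metrics(pred, truth):
    """Return (pixel accuracy, Dice, Jaccard) for two flat binary masks.

    `pred` and `truth` are equal-length sequences of 0/1 labels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    if not pred:
        raise ValueError("masks must not be empty")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # false negatives
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)  # true negatives
    accuracy = (tp + tn) / len(pred)
    union = tp + fp + fn
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    jaccard = tp / union if union else 1.0
    return accuracy, dice, jaccard

accuracy, dice, jaccard = segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

For any single pair of masks the two overlap scores are tied by J = D / (2 - D), so a Dice score of 0.9866 corresponds to a Jaccard score of about 0.9736. Scores averaged over many images need not satisfy this relation exactly.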
Retinal blood vessel segmentation is crucial for diagnosing ocular and cardiovascular diseases. Although the introduction of U-Net in 2015 by Olaf Ronneberger significantly advanced this field, issues such as limited training data, imbalanced data distribution, and inadequate feature extraction persist, hindering both segmentation performance and model generalization. To address these critical issues, DEFFA-Unet is proposed, featuring an additional encoder that processes domain-invariant pre-processed inputs, thereby providing richer feature encoding and better model generalization. A feature filtering fusion module is developed to ensure precise feature filtering and robust hybrid feature fusion. In response to the task-specific need for higher precision, where false positives are very costly, traditional skip connections are replaced with an attention-guided feature reconstructing fusion module. Additionally, innovative data augmentation and balancing methods are proposed to counter data scarcity and distribution imbalance, further boosting the robustness and generalization of the model. Using a comprehensive suite of evaluation metrics, extensive validations on four benchmark datasets (DRIVE, CHASEDB1, STARE, and HRF) and an SLO dataset (IOSTAR) demonstrate the proposed method’s superiority over both baseline and state-of-the-art models. In particular, the proposed method significantly outperforms the compared methods in cross-validation model generalization.
Medical image segmentation has become a cornerstone for many healthcare applications, allowing for the automated extraction of critical information from images such as Computed Tomography (CT) scans, Magnetic Resonance Imaging (MRI) scans, and X-rays. The introduction of U-Net in 2015 significantly advanced segmentation capabilities, especially for the small datasets commonly found in medical imaging. Since then, various modifications to the original U-Net architecture have been proposed to enhance segmentation accuracy and tackle challenges like class imbalance, data scarcity, and multi-modal image processing. This paper provides a detailed review and comparison of several U-Net-based architectures, focusing on their effectiveness in medical image segmentation tasks. We evaluate performance metrics such as Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) across different U-Net variants including HmsU-Net, CrossU-Net, mResU-Net, and others. Our results indicate that architectural enhancements such as transformers, attention mechanisms, and residual connections improve segmentation performance across diverse medical imaging applications, including tumor detection, organ segmentation, and lesion identification. The study also identifies current challenges in the field, including data variability, limited dataset sizes, and issues with class imbalance. Based on these findings, the paper suggests potential future directions for improving the robustness and clinical applicability of U-Net-based models in medical image segmentation.
Brain tumor segmentation is critical in clinical diagnosis and treatment planning. Existing methods for brain tumor segmentation with missing modalities often struggle when dealing with multiple missing modalities, a common scenario in real-world clinical settings. These methods primarily focus on handling a single missing modality at a time, making them insufficiently robust for the additional complexity encountered with incomplete data containing various missing modality combinations. Additionally, most existing methods rely on single models, which may limit their performance and increase the risk of overfitting the training data. This work proposes a novel method called the ensemble adversarial co-training neural network (EACNet) for accurate brain tumor segmentation from multi-modal magnetic resonance imaging (MRI) scans with multiple missing modalities. The proposed method consists of three key modules. The ensemble of pre-trained models captures diverse feature representations from the MRI data. Adversarial learning leverages a competitive training approach involving two models: a generator model creates realistic missing data, while sub-networks acting as discriminators learn to distinguish real data from the generated “fake” data. The co-training framework utilizes the information extracted by the multimodal path (trained on complete scans) to guide the learning process in the path handling missing modalities. The model potentially compensates for missing information through co-training interactions by exploiting the relationships between available modalities and the tumor segmentation task. EACNet was evaluated on the BraTS2018 and BraTS2020 challenge datasets and achieved state-of-the-art and competitive performance, respectively. Notably, the Dice similarity coefficient (DSC) for the whole tumor (WT) reached 89.27%, surpassing the performance of existing methods. The analysis suggests that the ensemble approach offers potential benefits, and that adversarial co-training contributes to the increased robustness and accuracy of EACNet for brain tumor segmentation of MRI scans with missing modalities. The experimental results show that EACNet is promising for this task and is a better candidate for real-world clinical applications.
This paper aims to develop a nonrigid registration method for preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries, for accurate tumor localization and enhanced tissue visualization. However, fine structure registration of complex thoracoabdominal organs and large deformation registration caused by respiratory motion are challenging. To deal with this problem, we propose a 3D multi-scale attention VoxelMorph (MA-VoxelMorph) registration network. To alleviate the large deformation problem, a multi-scale axial attention mechanism is used, which combines residual dilated pyramid pooling for multi-scale feature extraction with position-aware axial attention for capturing long-distance dependencies between pixels. To further improve the large deformation and fine structure registration results, a multi-scale context channel attention mechanism is employed, utilizing content information from adjacent encoding layers. Our method was evaluated on four public lung datasets (DIR-Lab dataset, Creatis dataset, Learn2Reg dataset, OASIS dataset) and a local dataset. Results showed that the proposed method achieved better registration performance than current state-of-the-art methods, especially in handling the registration of large deformations and fine structures. It was also fast in 3D image registration, taking about 1.5 s, which is faster than most methods. Qualitative and quantitative assessments showed that the proposed MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgeries.
To solve the problems of redundant feature information, insignificant differences in feature representation, and low recognition accuracy for fine-grained images, an MSFResNet network model is proposed that builds on the ResNeXt50 model by fusing multi-scale feature information. Firstly, a multi-scale feature extraction module is designed to obtain multi-scale information on feature images by using convolution kernels of different scales. Meanwhile, the channel attention mechanism is used to increase the global information acquisition of the network. Secondly, the feature images processed by the multi-scale feature extraction module are fused with the deep feature images through short links to guide the full learning of the network, thus reducing the loss of texture details in the deep network feature images and improving network generalization ability and recognition accuracy. Finally, the validity of the MSFResNet model is verified using public datasets, and the model is applied to wild mushroom identification. Experimental results show that, compared with the ResNeXt50 network model, the accuracy of the MSFResNet model is improved by 6.01% on the public FGVC-Aircraft dataset. It achieves 99.13% classification accuracy on the wild mushroom dataset, which is 0.47% higher than ResNeXt50. Furthermore, heat map results show that the MSFResNet model significantly reduces the interference of background information, making the network focus on the location of the main body of the wild mushroom, which can effectively improve the accuracy of wild mushroom identification.
Multi-scale modeling combining the cohesive zone model (CZM) and the molecular dynamics (MD) method was performed to simulate crack propagation in NiTi shape memory alloys (SMAs). A metallographic microscope and image processing technology were employed to obtain a quantitative grain size distribution of NiTi alloys, so as to provide experimental data for molecular dynamics modeling at the atomic scale. Considering the size effect of the molecular dynamics model on material properties, a reasonable modeling size was determined by taking into account three characteristic dimensions at the macro, meso, and micro scales according to the Buckingham π theorem. Then, MD simulations of deformation and fracture behavior were carried out to derive a parameterized traction-separation (T-S) law, which was embedded into the cohesive elements of finite element software. Thus, the crack propagation behavior in NiTi alloys was reproduced by the finite element method (FEM). The results show that the predicted initiation fracture toughness is in good agreement with experimental data. In addition, it is found that the dynamic initiation fracture toughness increases with decreasing grain size and increasing loading velocity.
A new algorithm for segmentation of suspected lung ROIs (regions of interest) by mean-shift clustering and multi-scale Hessian matrix dot filtering was proposed. The original image was first filtered by multi-scale Hessian matrix dot filters, so that round suspected nodular lesions in the image were enhanced and linear regions of the trachea and blood vessels were suppressed. Then, three types of information, namely the shape filtering value of the Hessian matrix, the gray value, and the spatial location, were introduced into the feature space. The kernel function of mean-shift clustering was expressed as the product of three kernel functions corresponding to the three kinds of feature information. Finally, bandwidths were calculated adaptively to determine the bandwidth of each suspected area, and they were used in mean-shift clustering segmentation. Experimental results show that, by introducing Hessian matrix dot filtering information into mean-shift clustering, nodular regions can be segmented from blood vessels, trachea, or cross regions connected to the nodule, non-nodular areas can be removed from ROIs properly, and ground glass object (GGO) nodular areas can also be segmented. For the experimental data set of 127 nodules of different forms, the average accuracy of the proposed algorithm is more than 90%.
Watershed segmentation is sensitive to noise and irregular details within the image, which frequently leads to serious over-segmentation. Linear filtering before watershed segmentation can reduce over-segmentation to some extent; however, it often causes a position offset of object contours. To reduce over-segmentation while preserving the location of object contours, a watershed segmentation method based on hierarchical multi-scale modification of the morphological gradient is proposed. Firstly, multi-scale morphological filtering was employed to smooth the original image. Then, the gradient image was divided into multiple levels by the volume of the three-dimensional topographic relief, where the lower gradient layers were further modified by morphological closing with larger structuring elements, and the higher layers with smaller ones. In this way, most local minima caused by irregular details and noise can be removed, while region contour positions corresponding to the target area are largely preserved. Finally, the morphological watershed algorithm was employed to segment the modified gradient image. The experimental results show that the proposed method can greatly reduce the over-segmentation of the watershed and avoid the position offset of the object contours.
Cardiomyopathy is one of the most serious public health threats. Precise structural and functional cardiac measurement is an essential step for clinical diagnosis and follow-up treatment planning. Cardiologists are often required to draw endocardial and epicardial contours of the left ventricle (LV) manually during routine clinical diagnosis or treatment planning. This task is time-consuming and error-prone. Therefore, it is necessary to develop a fully automated end-to-end semantic segmentation method for cardiac magnetic resonance (CMR) imaging datasets. However, due to the low image quality and the deformation caused by the heartbeat, there is no effective tool for the fully automated end-to-end cardiac segmentation task. In this work, we propose a multi-scale segmentation network (MSSN) for left ventricle segmentation. It can effectively learn myocardium and blood pool structure representations from 2D short-axis CMR image slices in a multi-scale way. Specifically, our method employs both parallel and serial dilated convolution layers with different dilation rates to capture multi-scale semantic features. Moreover, we design graduated up-sampling layers with subpixel layers as the decoder to reconstruct lost spatial information and produce accurate segmentation masks. We validated our method using 164 T1 Mapping CMR images and showed that it outperforms advanced convolutional neural network (CNN) models. On the validation metrics, we achieved a Dice Similarity Coefficient (DSC) of 78.96%.
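Several abstracts in this list rely on dilated convolutions with different dilation rates to capture multi-scale context. The sketch below shows the idea in one dimension: a dilated kernel samples the input with gaps, and stacking layers in series widens the receptive field. The function names `dilated_conv1d` and `receptive_field` are illustrative assumptions, and the code is a toy illustration rather than any of the listed networks.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D correlation whose kernel taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]

def receptive_field(kernel_size, dilations):
    """Receptive field of serially stacked dilated convolutions (stride 1)."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field

signal = [1, 2, 3, 4, 5]
dense = dilated_conv1d(signal, [1, 1, 1], dilation=1)   # adjacent samples
sparse = dilated_conv1d(signal, [1, 1, 1], dilation=2)  # every second sample
stacked_field = receptive_field(3, [1, 2, 4])
```

With the same three-tap kernel, a dilation of 2 covers five input samples instead of three without adding parameters. Parallel branches with different rates see several scales at once, while serial stacking grows the receptive field layer by layer.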
As an important part of the new generation of information technology, the Internet of Things (IoT) has attracted wide attention and is regarded as an enabling technology for the next generation of health care systems. Fundus photography equipment is connected to a cloud platform through the IoT, so as to realize the real-time uploading of fundus images and the rapid issuance of diagnostic suggestions by artificial intelligence. At the same time, important security and privacy issues have emerged. The data uploaded to the cloud platform involves patients’ personal attributes, health status, and medical application data. Once leaked, abused, or improperly disclosed, personal information security will be violated. Therefore, it is important to address the security and privacy issues of massive medical and healthcare equipment connecting to the infrastructure of IoT healthcare and health systems. To meet this challenge, we propose MIA-UNet, a multi-scale iterative aggregation U-network, which aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while ensuring that the network has low computational complexity so that it can run on mobile terminals. In this way, users do not need to upload the data to the cloud platform and can analyze and process the fundus images on their own mobile terminals, thus eliminating the leakage of personal information. Specifically, the interconnection between encoder and decoder, as well as the internal connection between decoder subnetworks in the classic U-Net, are redefined and redesigned. Furthermore, we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background. Compared with U-Net, the segmentation performance of the proposed network is significantly improved while the number of parameters increases by only 2%. When applied to three publicly available datasets, DRIVE, STARE, and CHASE DB1, the proposed network achieves accuracy/F1-score of 96.33%/84.34%, 97.12%/83.17%, and 97.06%/84.10%, respectively. The experimental results show that MIA-UNet is superior to the state-of-the-art methods.
Liver cancer has the second highest incidence rate among all types of malignant tumors, and currently its diagnosis heavily depends on doctors’ manual labeling of CT scan images, a process that is time-consuming and susceptible to subjective errors. To address these issues, we propose an automatic segmentation model for liver and tumors called Res2Swin Unet, which is based on the Unet architecture. The model combines Attention-Res2 and Swin Transformer modules for liver and tumor segmentation, respectively. Attention-Res2 merges multiple feature map parts with an Attention gate via skip connections, while Swin Transformer captures long-range dependencies and models the input globally. The model also uses deep supervision and a hybrid loss function for faster convergence. On the LiTS2017 dataset, it achieves better segmentation performance than other models, with an average Dice coefficient of 97.0% for liver segmentation and 81.2% for tumor segmentation.
This paper proposes an image segmentation method based on the combination of wavelet multi-scale edge detection and entropy iterative threshold selection. The image to be segmented is divided into a high-frequency part and a low-frequency part. In the high-frequency part, wavelet multi-scale analysis is used for edge detection, while the low-frequency part is segmented using the entropy iterative threshold selection method. Taking both image edges and regions into account, a CT image of the thorax was chosen to test the proposed method on the segmentation of the lungs. Experimental results show that the method segments the region of interest of an image efficiently compared with conventional methods.
Accurate and reliable crack segmentation is a challenging and meaningful task. In this article, targeting the characteristics of cracks in concrete images, the intensity frequency information of source images obtained by the Discrete Wavelet Transform (DWT) is fed into deep learning-based networks to enhance their crack segmentation ability. To integrate frequency information into the network effectively, a novel DWTA module based on the DWT and the scSE attention mechanism is proposed. The DWTA module enhances the semantic information of cracks and suppresses irrelevant information. It also balances the gap between frequency information and the convolution information from the network, so that wavelet information can be fused well into the image segmentation network. The Unet-DWTA is proposed to preserve the information of crack boundaries and thin cracks in intermediate feature maps by adding the DWTA module to the encoder-decoder structure. In the decoder, feature maps from diverse levels are fused to capture crack boundary information and abstract semantic information, which benefits crack pixel classification. The proposed method is verified on three classic datasets: CrackDataset, CrackForest, and DeepCrack. Compared with other crack segmentation methods, the proposed Unet-DWTA shows better performance in both subjective analysis and objective image semantic segmentation metrics.
Visual semantic segmentation aims at separating a visual sample into diverse blocks with specific semantic attributes and identifying the category of each block, and it plays a crucial role in environmental perception. Conventional learning-based visual semantic segmentation approaches rely heavily on large-scale training data with dense annotations and consistently fail to estimate accurate semantic labels for unseen categories. This obstacle has spurred intense interest in studying visual semantic segmentation with the assistance of few/zero-shot learning. The emergence and rapid progress of few/zero-shot visual semantic segmentation make it possible to learn unseen categories from a few labeled or even zero-labeled samples, which advances the extension to practical applications. Therefore, this paper focuses on recently published few/zero-shot visual semantic segmentation methods ranging from 2D to 3D space and explores the commonalities and discrepancies of technical solutions under different segmentation circumstances. Specifically, the preliminaries on few/zero-shot visual semantic segmentation, including the problem definitions, typical datasets, and technical remedies, are briefly reviewed and discussed. Moreover, three typical instantiations are covered to uncover the interactions of few/zero-shot learning with visual semantic segmentation: image semantic segmentation, video object segmentation, and 3D segmentation. Finally, the future challenges of few/zero-shot visual semantic segmentation are discussed.
A large number of nanopores and complex fracture structures in shale reservoirs result in multi-scale flow of oil. As shale oil reservoirs are developed, the permeability of the multi-scale media changes due to stress sensitivity, which plays a crucial role in controlling pressure propagation and oil flow. This paper proposes a multi-scale coupled flow mathematical model of matrix nanopores, induced fractures, and hydraulic fractures. The model considers the micro-scale effects of shale oil flow in fractal nanopores, the fractal induced fracture network, and the stress sensitivity of the multi-scale media. We solved the model iteratively using the Pedrosa transform, a semi-analytic segmented Bessel function, and the Laplace transform. The results of the model agree well with the numerical solution and field production data, confirming the high accuracy of the model. In addition, the influence of stress sensitivity on permeability, pressure, and production is analyzed. Permeability and production decrease significantly when induced fractures are weakly supported. Closed induced fractures can inhibit interporosity flow in the stimulated reservoir volume (SRV). Sensitivity analysis shows that hydraulic fractures benefit early production, and induced fractures in the SRV benefit middle production. The model can characterize the multi-scale flow of shale oil, providing theoretical guidance for rapid productivity evaluation.
Funding: This research was supported by the National Natural Science Foundation of China (No. 62276086), the National Key R&D Program of China (No. 2022YFD2000100), and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LTGN23D010002.
Abstract: Tea leaf picking is a crucial stage in tea production that directly influences the quality and value of the tea. Traditional tea-picking machines may compromise the quality of the tea leaves. High-quality teas are often handpicked and need more delicate operations in intelligent picking machines. Compared with traditional image processing techniques, deep learning models have stronger feature extraction capabilities and better generalization, and are more suitable for practical tea shoot harvesting. However, current research mostly focuses on shoot detection and cannot directly accomplish end-to-end shoot segmentation tasks. We propose a tea shoot instance segmentation model based on multi-scale mixed attention (Mask2FusionNet) using a dataset from a tea garden in Hangzhou. We further analyzed the characteristics of the tea shoot dataset, in which small to medium-sized targets account for 89.9%. Our algorithm is compared with several mainstream object segmentation algorithms, and the results demonstrate that our model achieves an accuracy of 82% in recognizing the tea shoots, performing better than the other models. Through ablation experiments, we found that ResNet50, the PointRend strategy, and the Feature Pyramid Network (FPN) architecture improve performance by 1.6%, 1.4%, and 2.4%, respectively. These experiments demonstrated that our proposed multi-scale and point selection strategy optimizes the feature extraction capability for overlapping small targets. The results indicate that the proposed Mask2FusionNet model can perform shoot segmentation in unstructured environments, distinguishing individual tea shoots and completely extracting the shoot edge contours with a segmentation accuracy of 82.0%. The research results can provide algorithmic support for the segmentation and intelligent harvesting of premium tea shoots at different scales.
Funding: Funded by the National Natural Science Foundation of China under Grant No. 61172167 and the Science Fund Project of Heilongjiang Province (LH2020F035).
Abstract: Nuclear magnetic resonance imaging of breasts often presents complex backgrounds. Breast tumors exhibit varying sizes, uneven intensity, and indistinct boundaries. These characteristics can lead to challenges such as low accuracy and incorrect segmentation during tumor segmentation. Thus, we propose a two-stage breast tumor segmentation method leveraging multi-scale features and boundary attention mechanisms. Initially, the breast region of interest is extracted to isolate the breast area from surrounding tissues and organs. Subsequently, we devise a fusion network incorporating multi-scale features and boundary attention mechanisms for breast tumor segmentation. We incorporate multi-scale parallel dilated convolution modules into the network, enhancing its capability to segment tumors of various sizes through multi-scale convolution and novel fusion techniques. Additionally, attention and boundary detection modules are included to augment the network's capacity to locate tumors by capturing nonlocal dependencies in both spatial and channel domains. Furthermore, a hybrid loss function with boundary weight is employed to address sample class imbalance and to enhance the network's boundary maintenance capability through an additional loss term. The method was evaluated using breast data from 207 patients at Ruijin Hospital, resulting in a 6.64% increase in Dice similarity coefficient compared to the benchmark U-Net. Experimental results demonstrate the superiority of the method over other segmentation techniques, with fewer model parameters.
Funding: Supported by the Natural Science Foundation of China (No. 41804112, author: Chengyun Song).
Abstract: Existing semi-supervised medical image segmentation algorithms use copy-paste data augmentation to correct the labeled-unlabeled data distribution mismatch. However, current copy-paste methods have three limitations: (1) training the model solely with copy-paste mixed pictures from labeled and unlabeled input loses a lot of labeled information; (2) low-quality pseudo-labels can cause confirmation bias in pseudo-supervised learning on unlabeled data; (3) the segmentation performance in low-contrast and local regions is less than optimal. We design a Stochastic Augmentation-Based Dual-Teaching Auxiliary Training Strategy (SADT), which enhances feature diversity and learns high-quality features to overcome these problems. More precisely, SADT trains the Student Network by using pseudo-label-based training from Teacher Network 1 and supervised learning with labeled data, which prevents the loss of rare labeled data. We introduce a bi-directional copy-paste mask with progressive high-entropy filtering to reduce data distribution disparities and mitigate confirmation bias in pseudo-supervision. For the mixed images, Deep-Shallow Spatial Contrastive Learning (DSSCL) is proposed in the feature spaces of Teacher Network 2 and the Student Network to improve the segmentation capabilities in low-contrast and local areas. In this procedure, the features retrieved by the Student Network are subjected to a random feature perturbation technique. On two openly available datasets, extensive trials show that our proposed SADT performs much better than the state-of-the-art semi-supervised medical segmentation techniques. Using only 10% of the labeled data for training, SADT achieved a Dice score of 90.10% on the ACDC (Automatic Cardiac Diagnosis Challenge) dataset.
Funding: Supported by the National Natural Science Foundation of China (62272049, 62236006, 62172045) and the Key Projects of Beijing Union University (ZKZD202301).
Abstract: In recent years, gait-based emotion recognition has been widely applied in the field of computer vision. However, existing gait emotion recognition methods typically rely on complete human skeleton data, and their accuracy declines significantly when the data is occluded. To enhance the accuracy of gait emotion recognition under occlusion, this paper proposes a Multi-scale Suppression Graph Convolutional Network (MS-GCN). The MS-GCN consists of three main components: the Joint Interpolation Module (JI Module), the Multi-scale Temporal Convolution Network (MS-TCN), and the Suppression Graph Convolutional Network (SGCN). The JI Module completes the spatially occluded skeletal joints using the K-Nearest Neighbors (KNN) interpolation method. The MS-TCN employs convolutional kernels of various sizes to comprehensively capture the emotional information embedded in the gait, compensating for the temporal occlusion of gait information. The SGCN extracts more non-prominent human gait features by suppressing the extraction of key body part features, thereby reducing the negative impact of occlusion on emotion recognition results. The proposed method is evaluated on two comprehensive datasets: Emotion-Gait, containing 4227 real gaits from sources like BML, ICT-Pollick, and ELMD, and 1000 synthetic gaits generated using STEP-Gen technology; and ELMB, consisting of 3924 gaits, with 1835 labeled with emotions such as “Happy,” “Sad,” “Angry,” and “Neutral.” On the standard datasets Emotion-Gait and ELMB, the proposed method achieved accuracies of 0.900 and 0.896, respectively, attaining performance comparable to other state-of-the-art methods. Furthermore, on occlusion datasets, the proposed method significantly mitigates the performance degradation caused by occlusion, and its accuracy is significantly higher than that of other methods.
Abstract: Lower back pain is one of the most common medical problems in the world and is experienced by a large share of people everywhere. Because it produces a detailed view of the soft tissues, including the spinal cord, nerves, intervertebral discs, and vertebrae, Magnetic Resonance Imaging is thought to be the most effective method for imaging the spine. The semantic segmentation of vertebrae plays a major role in the diagnostic process of lumbar diseases. It is difficult to semantically partition the vertebrae in Magnetic Resonance Images from the surrounding variety of tissues, including muscles, ligaments, and intervertebral discs. U-Net is a powerful deep-learning architecture for handling the challenges of medical image analysis tasks and achieves high segmentation accuracy. This work proposes a modified U-Net architecture, namely MU-Net, consisting of the Meijering convolutional layer that incorporates the Meijering filter to perform the semantic segmentation of lumbar vertebrae L1 to L5 and sacral vertebra S1. Pseudo-colour mask images were generated and used as ground truth for training the model. The work has been carried out on 1312 images expanded from T1-weighted mid-sagittal MRI images of 515 patients in the Lumbar Spine MRI Dataset publicly available from Mendeley Data. On this dataset, the proposed MU-Net model for the semantic segmentation of the lumbar vertebrae achieves 98.79% pixel accuracy (PA), 98.66% dice similarity coefficient (DSC), 97.36% Jaccard coefficient, and 92.55% mean Intersection over Union (mean IoU).
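The overlap metrics named in this abstract (pixel accuracy, Dice similarity coefficient, Jaccard coefficient) can be illustrated with a minimal sketch for a single binary mask. This is not the authors' evaluation code; the function name `segmentation_metrics`, the flat-list mask representation, and the toy masks are assumptions made for illustration only.

```python
def segmentation_metrics(pred, truth):
    """Return (pixel accuracy, Dice, Jaccard) for two flat binary masks.

    Hypothetical helper for illustration; masks are lists of 0/1 values.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("masks must be non-empty and of equal length")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = len(pred) - tp - fp - fn
    pixel_accuracy = (tp + tn) / len(pred)
    # Assumed convention: two empty masks count as a perfect match.
    overlap_denominator = tp + fp + fn
    dice = 2 * tp / (2 * tp + fp + fn) if overlap_denominator else 1.0
    jaccard = tp / overlap_denominator if overlap_denominator else 1.0
    return pixel_accuracy, dice, jaccard


# Toy 8-pixel masks (made-up values, not data from the paper).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
pa, dice, jaccard = segmentation_metrics(pred, truth)
```

For a single class, Dice and Jaccard are related by Dice = 2J / (1 + J), so Dice is never lower than Jaccard. Mean IoU is the Jaccard value averaged over classes.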
Abstract: Retinal blood vessel segmentation is crucial for diagnosing ocular and cardiovascular diseases. Although the introduction of U-Net in 2015 by Olaf Ronneberger significantly advanced this field, issues like limited training data, imbalanced data distribution, and inadequate feature extraction persist, hindering both segmentation performance and model generalization. To address these critical issues, the DEFFA-Unet is proposed, featuring an additional encoder to process domain-invariant pre-processed inputs, thereby providing richer feature encoding and enhanced model generalization. A feature filtering fusion module is developed to ensure precise feature filtering and robust hybrid feature fusion. In response to the task-specific need for higher precision where false positives are very costly, traditional skip connections are replaced with the attention-guided feature reconstructing fusion module. Additionally, innovative data augmentation and balancing methods are proposed to counter data scarcity and distribution imbalance, further boosting the robustness and generalization of the model. With a comprehensive suite of evaluation metrics, extensive validations on four benchmark datasets (DRIVE, CHASEDB1, STARE, and HRF) and an SLO dataset (IOSTAR) demonstrate the proposed method's superiority over both baseline and state-of-the-art models. In particular, the proposed method significantly outperforms the compared methods in cross-validation model generalization.
Abstract: Medical image segmentation has become a cornerstone for many healthcare applications, allowing for the automated extraction of critical information from images such as Computed Tomography (CT) scans, Magnetic Resonance Imaging (MRI) scans, and X-rays. The introduction of U-Net in 2015 has significantly advanced segmentation capabilities, especially for small datasets commonly found in medical imaging. Since then, various modifications to the original U-Net architecture have been proposed to enhance segmentation accuracy and tackle challenges like class imbalance, data scarcity, and multi-modal image processing. This paper provides a detailed review and comparison of several U-Net-based architectures, focusing on their effectiveness in medical image segmentation tasks. We evaluate performance metrics such as Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) across different U-Net variants including HmsU-Net, CrossU-Net, mResU-Net, and others. Our results indicate that architectural enhancements such as transformers, attention mechanisms, and residual connections improve segmentation performance across diverse medical imaging applications, including tumor detection, organ segmentation, and lesion identification. The study also identifies current challenges in the field, including data variability, limited dataset sizes, and issues with class imbalance. Based on these findings, the paper suggests potential future directions for improving the robustness and clinical applicability of U-Net-based models in medical image segmentation.
Funding: Supported by the Gansu Natural Science Foundation Programme (No. 24JRRA231), the National Natural Science Foundation of China (No. 62061023), and Gansu Provincial Education, Science and Technology Innovation and Industry (No. 2021CYZC-04).
Abstract: Brain tumor segmentation is critical in clinical diagnosis and treatment planning. Existing methods for brain tumor segmentation with missing modalities often struggle when dealing with multiple missing modalities, a common scenario in real-world clinical settings. These methods primarily focus on handling a single missing modality at a time, making them insufficiently robust for the additional complexity encountered with incomplete data containing various missing modality combinations. Additionally, most existing methods rely on single models, which may limit their performance and increase the risk of overfitting the training data. This work proposes a novel method called the ensemble adversarial co-training neural network (EACNet) for accurate brain tumor segmentation from multi-modal magnetic resonance imaging (MRI) scans with multiple missing modalities. The proposed method consists of three key modules. The ensemble of pre-trained models captures diverse feature representations from the MRI data. Adversarial learning leverages a competitive training approach involving two models: a generator model creates realistic missing data, while sub-networks acting as discriminators learn to distinguish real data from the generated “fake” data. The co-training framework utilizes the information extracted by the multimodal path (trained on complete scans) to guide the learning process in the path handling missing modalities. The model potentially compensates for missing information through co-training interactions by exploiting the relationships between available modalities and the tumor segmentation task. EACNet was evaluated on the BraTS2018 and BraTS2020 challenge datasets and achieved state-of-the-art and competitive performance, respectively. Notably, the dice similarity coefficient (DSC) for the whole tumor (WT) reached 89.27%, surpassing the performance of existing methods. The analysis suggests that the ensemble approach offers potential benefits, and that the adversarial co-training contributes to the increased robustness and accuracy of EACNet for brain tumor segmentation of MRI scans with missing modalities. The experimental results show that EACNet is promising for this task and is a better candidate for real-world clinical applications.
Funding: Supported in part by the National Natural Science Foundation of China [62301374], the Hubei Provincial Natural Science Foundation of China [2022CFB804], the Hubei Provincial Education Research Project [B2022057], the Youths Science Foundation of Wuhan Institute of Technology [K202240], and the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology [CX2023295].
Abstract: This paper aims to develop a nonrigid registration method for preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries, for accurate tumor localization and enhanced tissue visualization. However, fine structure registration of complex thoracoabdominal organs and large deformation registration caused by respiratory motion are challenging. To deal with this problem, we propose a 3D multi-scale attention VoxelMorph (MA-VoxelMorph) registration network. To alleviate the large deformation problem, a multi-scale axial attention mechanism is utilized, using residual dilated pyramid pooling for multi-scale feature extraction and position-aware axial attention for capturing long-distance dependencies between pixels. To further improve the large deformation and fine structure registration results, a multi-scale context channel attention mechanism is employed, utilizing content information via adjacent encoding layers. Our method was evaluated on four public lung datasets (DIR-Lab dataset, Creatis dataset, Learn2Reg dataset, OASIS dataset) and a local dataset. Results showed that the proposed method achieved better registration performance than current state-of-the-art methods, especially in handling the registration of large deformations and fine structures. It was also fast in 3D image registration, taking about 1.5 s, which is faster than most methods. Qualitative and quantitative assessments showed that the proposed MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgeries.
Funding: Supported by the National Natural Science Foundation of China (No. 61862037) and the Lanzhou Jiaotong University Tianyou Innovation Team Project (No. TY202002).
Abstract: To solve the problems of redundant feature information, insignificant differences in feature representation, and low recognition accuracy for fine-grained images, an MSFResNet network model based on the ResNeXt50 model is proposed by fusing multi-scale feature information. Firstly, a multi-scale feature extraction module is designed to obtain multi-scale information on feature images by using convolution kernels of different scales. Meanwhile, the channel attention mechanism is used to increase the global information acquisition of the network. Secondly, the feature images processed by the multi-scale feature extraction module are fused with the deep feature images through short links to guide the full learning of the network, thus reducing the loss of texture details in the deep network feature images and improving network generalization ability and recognition accuracy. Finally, the validity of the MSFResNet model is verified using public datasets, and the model is applied to wild mushroom identification. Experimental results show that, compared with the ResNeXt50 network model, the accuracy of the MSFResNet model is improved by 6.01% on the FGVC-Aircraft common dataset. It achieves 99.13% classification accuracy on the wild mushroom dataset, which is 0.47% higher than ResNeXt50. Furthermore, the thermal map results show that the MSFResNet model significantly reduces the interference of background information, making the network focus on the location of the main body of the wild mushroom, which can effectively improve the accuracy of wild mushroom identification.
Funding: Funded by the "NSAF" Joint Fund jointly set up by the National Natural Science Foundation of China and the China Academy of Engineering Physics (No. U1430119).
Abstract: Multi-scale modeling combining the cohesive zone model (CZM) and the molecular dynamics (MD) method was performed to simulate crack propagation in NiTi shape memory alloys (SMAs). A metallographic microscope and image processing technology were employed to obtain a quantitative grain size distribution of NiTi alloys so as to provide experimental data for molecular dynamics modeling at the atomic scale. Considering the size effect of the molecular dynamics model on material properties, a reasonable modeling size was provided by taking into account three characteristic dimensions from the perspective of macro, meso, and micro scales according to the Buckingham π theorem. Then, the corresponding MD simulation of deformation and fracture behavior was used to derive a parameterized traction-separation (T-S) law, which was embedded into the cohesive elements of finite element software. Thus, the crack propagation behavior in NiTi alloys was reproduced by the finite element method (FEM). The results show that the predicted initiation fracture toughness is in good agreement with experimental data. In addition, it is found that the dynamic initiation fracture toughness increases with decreasing grain size and increasing loading velocity.
Funding: Projects (61172002, 61001047, 60671050) supported by the National Natural Science Foundation of China; Project (N100404010) supported by the Fundamental Research Grant Scheme for the Central Universities, China.
Abstract: A new algorithm for segmentation of suspected lung ROI (regions of interest) by mean-shift clustering and multi-scale Hessian matrix dot filtering was proposed. The original image was first filtered by multi-scale Hessian matrix dot filters, so that round suspected nodular lesions in the image were enhanced and linear regions of the trachea and blood vessels were suppressed. Then, three types of information, namely the shape filtering value of the Hessian matrix, the gray value, and the spatial location, were introduced into the feature space. The kernel function of mean-shift clustering was written as the product of three kernel functions corresponding to the three types of feature information. Finally, bandwidths were calculated adaptively for each suspected area and used in mean-shift clustering segmentation. Experimental results show that by introducing Hessian matrix dot filtering information into mean-shift clustering, nodular regions can be segmented from blood vessels, trachea, or cross regions connected to the nodule, non-nodular areas can be removed from ROIs properly, and ground glass object (GGO) nodular areas can also be segmented. For the experimental data set of 127 different forms of nodules, the average accuracy of the proposed algorithm is more than 90%.
Funding: National Natural Science Foundation of China (No. 61261029).
Abstract: Watershed segmentation is sensitive to noise and irregular details within the image, which frequently leads to serious over-segmentation. Linear filtering before watershed segmentation can reduce over-segmentation to some extent; however, it often causes a position offset of object contours. To reduce over-segmentation while preserving the location of object contours, a watershed segmentation based on the hierarchical multi-scale modification of the morphological gradient is proposed. Firstly, multi-scale morphological filtering was employed to smooth the original image. Then, the gradient image was divided into multiple levels by the volume of the three-dimensional topographic relief, where the lower gradient layers were further modified by morphological closing with larger structuring elements, and the higher layers with smaller ones. In this way, most local minima caused by irregular details and noise can be removed, while region contour positions corresponding to the target area are largely preserved. Finally, the morphological watershed algorithm was employed to segment the modified gradient image. The experimental results show that the proposed method can greatly reduce the over-segmentation of the watershed and avoid the position offset of the object contours.
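The role of morphological closing in removing spurious minima can be sketched on a one-dimensional gradient profile. This is a simplified illustration, not the paper's hierarchical algorithm: the flat structuring element, the truncated windows at the borders, the helper names, and the toy profile are all assumptions.

```python
def dilate(signal, size):
    """Flat grayscale dilation: maximum over a centred window of odd width."""
    radius = size // 2
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def erode(signal, size):
    """Flat grayscale erosion: minimum over a centred window of odd width."""
    radius = size // 2
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def closing(signal, size):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(signal, size), size)


# Toy gradient profile: a one-sample pit at index 1 (noise) and a
# three-sample basin at indices 4..6 (a genuine region).
gradient = [5, 4, 5, 5, 1, 1, 1, 5, 5]
closed_small = closing(gradient, 3)  # fills the pit, keeps the basin
closed_large = closing(gradient, 5)  # fills the basin as well
```

In this toy case the width-3 element removes only the noise minimum, while the width-5 element also erases the genuine basin. That trade-off is why the size of the structuring element has to be matched to the gradient layer being modified.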
Funding: This work was supported by the Project of Sichuan Outstanding Young Scientific and Technological Talents (19JCQN0003), the major Project of Education Department in Sichuan (17ZA0063 and 2017JQ0030), in part by the Natural Science Foundation for Young Scientists of CUIT (J201704), and the Sichuan Science and Technology Program (2019JDRC0077).
Abstract: Cardiomyopathy is one of the most serious public health threats. Precise structural and functional cardiac measurement is an essential step for clinical diagnosis and follow-up treatment planning. Cardiologists are often required to draw endocardial and epicardial contours of the left ventricle (LV) manually during routine clinical diagnosis or treatment planning. This task is time-consuming and error-prone. Therefore, it is necessary to develop a fully automated end-to-end semantic segmentation method for cardiac magnetic resonance (CMR) imaging datasets. However, due to low image quality and the deformation caused by the heartbeat, there is no effective tool for the fully automated end-to-end cardiac segmentation task. In this work, we propose a multi-scale segmentation network (MSSN) for left ventricle segmentation. It can effectively learn myocardium and blood pool structure representations from 2D short-axis CMR image slices in a multi-scale way. Specifically, our method employs both parallel and serial dilated convolution layers with different dilation rates to capture multi-scale semantic features. Moreover, we design graduated up-sampling layers with subpixel layers as the decoder to reconstruct lost spatial information and produce accurate segmentation masks. We validated our method using 164 T1 Mapping CMR images and showed that it outperforms advanced convolutional neural network (CNN) models. In validation, we achieved a Dice Similarity Coefficient (DSC) of 78.96%.
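How dilation rates enlarge the receptive field, and how parallel and serial arrangements differ, can be seen in a one-dimensional sketch. This is not the MSSN architecture; the function `dilated_conv1d`, the all-ones kernel, and the dilation rates 1, 2, and 4 are assumptions chosen for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D correlation with `dilation` samples between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    return [
        sum(weight * signal[start + tap * dilation] for tap, weight in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = list(range(10))
kernel = [1, 1, 1]

# Parallel branches: the same input seen at three dilation rates
# (receptive fields of 3, 5, and 9 samples).
parallel = {rate: dilated_conv1d(signal, kernel, rate) for rate in (1, 2, 4)}

# Serial stack: rate 1 followed by rate 2 gives a receptive field of 7 samples.
serial = dilated_conv1d(dilated_conv1d(signal, kernel, 1), kernel, 2)
```

A kernel of size k with dilation d covers (k - 1) * d + 1 input samples without adding weights, so each parallel branch sees a different scale. In the serial stack the receptive fields accumulate, so the output shrinks from 10 samples to 4.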
Funding: This work was supported in part by the National Natural Science Foundation of China (Nos. 62072074, 62076054, 62027827, 61902054), the Frontier Science and Technology Innovation Projects of National Key R&D Program (No. 2019QY1405), the Sichuan Science and Technology Innovation Platform and Talent Plan (No. 2020JDJQ0020), the Sichuan Science and Technology Support Plan (No. 2020YFSY0010), and the Natural Science Foundation of Guangdong Province (No. 2018A030313354).
Abstract: As an important part of the new generation of information technology, the Internet of Things (IoT) has attracted wide attention and is regarded as an enabling technology for the next generation of health care systems. Fundus photography equipment is connected to the cloud platform through the IoT, so as to realize the real-time uploading of fundus images and the rapid issuance of diagnostic suggestions by artificial intelligence. At the same time, important security and privacy issues have emerged. The data uploaded to the cloud platform involves personal attributes, health status, and medical application data of patients. Once leaked, abused, or improperly disclosed, personal information security will be violated. Therefore, it is important to address the security and privacy issues of massive medical and healthcare equipment connecting to the infrastructure of IoT healthcare and health systems. To meet this challenge, we propose MIA-UNet, a multi-scale iterative aggregation U-network, which aims to achieve accurate and efficient retinal vessel segmentation for ophthalmic auxiliary diagnosis while ensuring that the network has low computational complexity to adapt to mobile terminals. In this way, users do not need to upload the data to the cloud platform and can analyze and process the fundus images on their own mobile terminals, thus eliminating the leakage of personal information. Specifically, the interconnection between encoder and decoder, as well as the internal connection between decoder subnetworks in the classic U-Net, are redefined and redesigned. Furthermore, we propose a hybrid loss function to smooth the gradient and deal with the imbalance between foreground and background. Compared with U-Net, the segmentation performance of the proposed network is significantly improved while the number of parameters increases by only 2%. When applied to three publicly available datasets, DRIVE, STARE, and CHASE DB1, the proposed network achieves accuracy/F1-score of 96.33%/84.34%, 97.12%/83.17%, and 97.06%/84.10%, respectively. The experimental results show that MIA-UNet is superior to the state-of-the-art methods.
Abstract: Liver cancer has the second highest incidence rate among all types of malignant tumors, and currently its diagnosis depends heavily on doctors' manual labeling of CT scan images, a process that is time-consuming and susceptible to subjective errors. To address these issues, we propose an automatic segmentation model for liver and tumors called Res2Swin Unet, which is based on the Unet architecture. The model combines Attention-Res2 and Swin Transformer modules for liver and tumor segmentation, respectively. Attention-Res2 merges multiple feature map parts with an Attention gate via skip connections, while the Swin Transformer captures long-range dependencies and models the input globally. The model also uses deep supervision and a hybrid loss function for faster convergence. On the LiTS2017 dataset, it achieves better segmentation performance than other models, with an average Dice coefficient of 97.0% for liver segmentation and 81.2% for tumor segmentation.
Funding: Science Research Foundation of Yunnan Fundamental Research Foundation of Application (grant number: 2009ZC049M); Science Research Foundation for the Overseas Chinese Scholars, State Education Ministry (grant number: 2010-1561).
Abstract: This paper proposes an image segmentation method based on the combination of wavelet multi-scale edge detection and entropy iterative threshold selection. The image to be segmented is divided into a high-frequency part and a low-frequency part. In the high-frequency part, wavelet multi-scale analysis is used for edge detection, and the low-frequency part is segmented using the entropy iterative threshold selection method. To consider both image edges and regions, a CT image of the thorax was chosen to test the proposed method on segmentation of the lungs. Experimental results show that the method segments the region of interest of an image efficiently compared with conventional methods.
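The split into low- and high-frequency parts can be sketched with one level of a Haar transform on a single image row. The abstract does not name the wavelet used, so the unnormalized Haar filter here is an assumption, as are the helper names and the toy row.

```python
def haar_step(signal):
    """One level of an unnormalized Haar transform on an even-length list.

    Returns (low, high): pairwise averages and pairwise half-differences.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def haar_inverse(low, high):
    """Rebuild the signal from the two sub-bands of `haar_step`."""
    restored = []
    for average, difference in zip(low, high):
        restored.extend([average + difference, average - difference])
    return restored


# Toy image row with a single intensity step between index 2 and index 3.
row = [1, 1, 1, 5, 5, 5, 5, 5]
low, high = haar_step(row)
restored = haar_inverse(low, high)
```

The high-frequency band is nonzero only at the pair that straddles the step, which is what makes it useful for edge detection. The low-frequency band is a smoothed, half-length copy of the row that a threshold-based region segmentation could operate on.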
Funding: National Natural Science Foundation of China under Grant 61972267; National Natural Science Foundation of Hebei Province under Grant F2018210148; University Science Research Project of Hebei Province under Grant ZD2021334.
Abstract: Accurate and reliable crack segmentation is a challenging and meaningful task. In this article, targeting the characteristics of cracks in concrete images, the intensity frequency information of source images obtained by the Discrete Wavelet Transform (DWT) is fed into deep learning-based networks to enhance their crack segmentation ability. To integrate frequency information into the network, an effective and novel DWTA module based on the DWT and the scSE attention mechanism is proposed. The DWTA module enhances the semantic information of cracks and suppresses irrelevant information. It also balances the gap between frequency information and the convolution information from the network, so that wavelet information is fused well into the image segmentation network. The Unet-DWTA is proposed to preserve the information of crack boundaries and thin cracks in intermediate feature maps by adding the DWTA module to the encoder-decoder structure. In the decoder, feature maps from different levels are fused to capture both crack boundary information and abstract semantic information, which benefits crack pixel classification. The proposed method is verified on three classic datasets: CrackDataset, CrackForest, and DeepCrack. Compared with other crack segmentation methods, the proposed Unet-DWTA shows better performance in both subjective analysis and objective metrics for image semantic segmentation.
Funding: Supported by the National Key Research and Development Program of China (2021YFB1714300) and the National Natural Science Foundation of China (62233005); in part by the CNPC Innovation Fund (2021D002-0902), the Fundamental Research Funds for the Central Universities, and Shanghai AI Lab; sponsored by the Shanghai Gaofeng and Gaoyuan Project for University Academic Program Development.
Abstract: Visual semantic segmentation aims to separate a visual sample into diverse blocks with specific semantic attributes and to identify the category of each block; it plays a crucial role in environmental perception. Conventional learning-based visual semantic segmentation approaches rely heavily on large-scale training data with dense annotations and consistently fail to estimate accurate semantic labels for unseen categories. This obstacle has spurred strong interest in visual semantic segmentation assisted by few/zero-shot learning. The emergence and rapid progress of few/zero-shot visual semantic segmentation make it possible to learn unseen categories from a few labeled or even zero labeled samples, which advances its extension to practical applications. This paper therefore focuses on recently published few/zero-shot visual semantic segmentation methods, ranging from 2D to 3D space, and explores the commonalities and discrepancies of the technical solutions under different segmentation circumstances. Specifically, the preliminaries of few/zero-shot visual semantic segmentation, including the problem definitions, typical datasets, and technical remedies, are briefly reviewed and discussed. Moreover, three typical instantiations are examined to uncover the interactions of few/zero-shot learning with visual semantic segmentation: image semantic segmentation, video object segmentation, and 3D segmentation. Finally, the future challenges of few/zero-shot visual semantic segmentation are discussed.
Funding: This study was supported by the National Natural Science Foundation of China (U22B2075, 52274056, 51974356).
Abstract: The large number of nanopores and the complex fracture structures in shale reservoirs result in multi-scale flow of oil. As shale oil reservoirs are developed, the permeability of the multi-scale media changes because of stress sensitivity, which plays a crucial role in controlling pressure propagation and oil flow. This paper proposes a multi-scale coupled flow mathematical model of matrix nanopores, induced fractures, and hydraulic fractures. The model considers the micro-scale effects of shale oil flow in fractal nanopores, the fractal induced fracture network, and the stress sensitivity of the multi-scale media. We solved the model iteratively using the Pedrosa transform, a semi-analytic segmented Bessel function, and the Laplace transform. The results of the model show good agreement with the numerical solution and field production data, confirming the high accuracy of the model. In addition, the influence of stress sensitivity on permeability, pressure, and production is analyzed. The permeability and production decrease significantly when the induced fractures are weakly supported. Closed induced fractures can inhibit interporosity flow in the stimulated reservoir volume (SRV). Sensitivity analysis shows that hydraulic fractures are beneficial to early production, while induced fractures in the SRV are beneficial to middle production. The model can characterize the multi-scale flow characteristics of shale oil, providing theoretical guidance for rapid productivity evaluation.
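The abstract does not state how the model relates permeability to pressure. The sketch below assumes the exponential stress-sensitivity law commonly paired with the Pedrosa transform, k = k_i · exp(−γ (p_i − p)), where γ is a permeability modulus. The function name and all numerical values are hypothetical and chosen only to show the trend.

```python
import math

def stress_sensitive_permeability(k_initial, gamma, p_initial, p):
    """Permeability under pressure depletion, assuming the exponential law
    k = k_initial * exp(-gamma * (p_initial - p)).

    k_initial : permeability at initial reservoir pressure (any unit)
    gamma     : permeability modulus, 1/pressure-unit (0 means no sensitivity)
    p_initial : initial reservoir pressure
    p         : current pressure (p <= p_initial during depletion)
    """
    return k_initial * math.exp(-gamma * (p_initial - p))

# Hypothetical moduli: a weakly supported induced fracture is given a larger
# gamma than a well-propped hydraulic fracture, so it loses permeability faster.
p_i, p_now = 30.0, 10.0  # MPa, illustrative only
ratio_hydraulic = stress_sensitive_permeability(1.0, 0.01, p_i, p_now)
ratio_induced = stress_sensitive_permeability(1.0, 0.10, p_i, p_now)
```

With these illustrative inputs, the medium with the larger modulus retains a much smaller fraction of its initial permeability. This is consistent with the abstract's observation that weakly supported induced fractures reduce permeability and production significantly.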