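Abstract: Twitter and other overseas social platforms have become indispensable tools for black and gray market cybercrime, so discovering, detecting, and classifying black and gray market accounts on Twitter is of great significance for combating cybercrime and maintaining social stability. Among existing tweet classification models, the bi-directional long short-term memory (BiLSTM) network can learn the contextual information of a tweet but not its key local information, whereas the convolutional neural network (CNN) can learn key local information but not the context. Combining the strengths of both, a BiLSTM-CNN tweet classification model is proposed. The model vectorizes each tweet, feeds it to a BiLSTM to learn contextual information, adds a CNN layer after the BiLSTM to extract local features, and finally uses a fully connected layer to join the pooled features and applies the softmax function for four-class classification. Experiments were conducted on a self-built dataset of Chinese-language black and gray market tweets from Twitter, with TextCNN, TextRNN, and TextRCNN as comparison models. The proposed BiLSTM-CNN model achieved a macro accuracy of 98.32% on the four tweet classes, clearly higher than the accuracy of the TextCNN, TextRNN, and TextRCNN models.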
Abstract: Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English and divided the task into three main categories: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using mel-frequency cepstral coefficients, and classification was performed using a one-dimensional convolutional neural network. Visual features were extracted using Dlib, and visual speech was then classified using a long short-term memory recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was recognized with training and testing accuracies of 93.67% and 91.53%, respectively, after 200 epochs. For visual speech recognition on the Indian English dataset, the training accuracy was 77.48% and the test accuracy was 76.19% after 60 epochs. After integration, the training and testing accuracies of audiovisual speech recognition on the Indian English dataset were 94.67% and 91.75%, respectively.
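The mel-frequency front end mentioned above rests on a warped frequency axis. The sketch below shows only that warping, using the common HTK-style formula; the abstract does not say which mel variant, filter count, or frequency range the authors used, so `num_filters`, `f_min`, and `f_max` are illustrative assumptions.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters: int, f_min: float, f_max: float) -> list:
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_filters)]


if __name__ == "__main__":
    # Hypothetical settings: 10 filters between 0 Hz and 8 kHz.
    for centre in mel_center_frequencies(10, 0.0, 8000.0):
        print(round(centre, 1))
```

Even spacing in mels gives narrow filters at low frequencies and wide ones at high frequencies, which is the property that MFCC features inherit.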
Funding: The researchers would like to thank the Deanship of Scientific Research, Qassim University, for funding publication of this project.
Abstract: A tremendous number of vendor invoices are generated in the corporate sector. To automate manual data entry for payable documents, highly accurate Optical Character Recognition (OCR) is required. This paper proposes an end-to-end OCR system that performs both localization and recognition and serves as a single unit to automate the processing of payable documents such as cheques and cash disbursements. For text localization, the maximally stable extremal region method is used, which extracts a word or digit chunk from an invoice. This chunk is then passed to the deep learning model, which performs text recognition. The deep learning model uses both convolutional neural networks and long short-term memory (LSTM). The convolutional layers extract features, which are fed to the LSTM. The model integrates feature extraction, sequence modeling, and transcription into a unified network. It handles sequences of unconstrained length, independent of character segmentation or horizontal scale normalization. Furthermore, it applies to both lexicon-free and lexicon-based text recognition, and it produces a comparatively small model that can be deployed in practical applications. The overall superior performance in the experimental evaluation demonstrates the usefulness of the proposed model. The model is thus generic and can be used for other similar recognition scenarios.
Funding: Fundamental Research Funds for the Central Universities (Grant No. FRF-TP-19-006A3).
Abstract: As a common and high-risk type of disease, heart disease seriously threatens people's health. In the era of the Internet of Things (IoT), smart medical devices are of strong practical significance for medical workers and patients because they can assist in diagnosing diseases. Research on real-time diagnosis and classification algorithms for arrhythmia can therefore help to improve diagnostic efficiency. In this paper, we design an automatic arrhythmia classification model based on a Convolutional Neural Network (CNN) and an Encoder-Decoder model. The model uses Long Short-Term Memory (LSTM) to account for the influence of time-series features on the classification results. It is trained and tested on the MIT-BIH arrhythmia database. In addition, a Generative Adversarial Network (GAN) is adopted as a data equalization method to address the data imbalance problem. The simulation results show that, for inter-patient arrhythmia classification, the hybrid model combining the CNN and the Encoder-Decoder model achieves the best classification accuracy, reaching 94.05%. Its advantage is especially clear for supraventricular ectopic beats (class S) and fusion beats (class F).
Abstract: Load forecasting is of great significance to the development of new power systems. With the advancement of smart grids, the integration and distribution of distributed renewable energy sources and power electronics devices have made power load data increasingly complex and volatile. This places higher demands on the prediction and analysis of power loads. To improve the prediction accuracy of short-term power load, a CNN-BiLSTM-TPA short-term power prediction model based on the Improved Whale Optimization Algorithm (IWOA) with mixed strategies was proposed. First, the model combined the Convolutional Neural Network (CNN) with the Bidirectional Long Short-Term Memory network (BiLSTM) to fully extract the spatio-temporal characteristics of the load data. Then, the Temporal Pattern Attention (TPA) mechanism was introduced into the CNN-BiLSTM model to automatically assign weights to the hidden states of the BiLSTM, allowing the model to differentiate the importance of load sequences at different time intervals. Because the parameters of the temporal model are difficult to select, and because the whale optimization algorithm has poor global search ability and easily falls into local optima, the whale algorithm was improved (IWOA) with a hybrid strategy of Tent chaos mapping and Levy flight so that it searches the model parameters more effectively. In the experiment, real load data from a region in Zhejiang were analyzed, and the prediction accuracy (R²) of the proposed method reached 98.83%. Compared with prediction models such as BP, WOA-CNN-BiLSTM, SSA-CNN-BiLSTM, and CNN-BiGRU-Attention, the proposed model achieved higher prediction accuracy.
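The two ingredients named for the improved whale algorithm, Tent chaos mapping and Levy flight, can each be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' IWOA: the skew parameter `a = 0.7`, the seed value `x0`, the exponent `beta = 1.5`, and the use of Mantegna's method for Levy steps are all choices made here, and the abstract does not describe how the pieces are wired into the whale position updates.

```python
import math
import random


def tent_sequence(x0: float, length: int, a: float = 0.7) -> list:
    """Iterate a skew tent map; every value stays inside [0, 1]."""
    values, x = [], x0
    for _ in range(length):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        values.append(x)
    return values


def tent_population(num_agents: int, lower: list, upper: list, x0: float = 0.37) -> list:
    """Spread one chaotic sequence over the search box to initialize agents."""
    dim = len(lower)
    chaos = tent_sequence(x0, num_agents * dim)
    return [
        [lower[j] + chaos[i * dim + j] * (upper[j] - lower[j]) for j in range(dim)]
        for i in range(num_agents)
    ]


def mantegna_sigma(beta: float = 1.5) -> float:
    """Scale of the numerator Gaussian in Mantegna's Levy-step method."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    """Draw one heavy-tailed step: mostly small moves, occasionally a long jump."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against a division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)


if __name__ == "__main__":
    # Hypothetical 2-D search box, e.g. a learning rate and a layer width.
    population = tent_population(5, [0.0001, 16.0], [0.1, 256.0])
    print(population[0], levy_step(random.Random(0)))
```

The chaotic initialization aims at even coverage of the search box, while the occasional long Levy jump is the usual remedy for getting stuck in a local optimum.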
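Abstract: Rolling bearings are key components of mechanical equipment, and predicting their remaining useful life (RUL) is becoming increasingly important in industrial production. Although the mainstream convolutional neural network (CNN) can automatically extract features from bearing vibration signals, it cannot assign different weights to features to raise the model's attention to the important ones, and it tends to lose important information on long time series. In addition, hyperparameters such as the number of hidden-layer neurons, the learning rate, and the regularization parameter still have to be set from manual experience. To address these problems, a bearing RUL prediction method is proposed in which the grey wolf optimizer (GWO) algorithm optimizes an ensemble of a CNN, a bidirectional long short-term memory (BiLSTM) network, and an attention mechanism (Attention). First, time-domain, frequency-domain, and time-frequency-domain feature indicators are extracted from the raw vibration signals to build a candidate feature set. Then, a comprehensive evaluation index that accounts for feature correlation, robustness, and monotonicity is constructed to select the degradation-sensitive features that exceed a set threshold, and these serve as the input of the prediction model. Finally, the mean squared error between the predicted and true values is used as the fitness function of the GWO algorithm to optimize the prediction model and obtain the optimal number of hidden-layer neurons, learning rate, and regularization parameter; the optimized model is then used for RUL prediction and validated on a public dataset. The results show that the proposed method obtains the optimal hyperparameter combination without empirical guidance, and that the optimized model reduces the mean absolute error and the root mean square error by 28.8% and 24.3%, respectively, compared with the unoptimized model.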
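The optimization loop described above, GWO with mean squared error as the fitness function, can be sketched on a toy problem. Everything concrete here is an assumption for illustration: the two searched values are the slope and intercept of a line rather than real hyperparameters, the data are synthetic, and the update rule is the textbook GWO form rather than whatever variant the authors implemented.

```python
import random


def mse(params, xs, ys):
    """Mean squared error of the toy model y = w * x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grey_wolf_optimize(fitness, lower, upper, num_wolves=20, num_iters=200, seed=0):
    """Minimize `fitness` inside a box with a textbook grey wolf optimizer."""
    rng = random.Random(seed)
    dim = len(lower)
    wolves = [
        [rng.uniform(lower[j], upper[j]) for j in range(dim)]
        for _ in range(num_wolves)
    ]
    leaders = []  # the three best (fitness, position) pairs seen so far
    for it in range(num_iters):
        scored = [(fitness(w), list(w)) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda pair: pair[0])[:3]
        a = 2.0 * (1.0 - it / num_iters)  # decreases linearly from 2 towards 0
        for wolf in wolves:
            for j in range(dim):
                estimate = 0.0
                for _, leader in leaders:
                    big_a = 2.0 * a * rng.random() - a
                    big_c = 2.0 * rng.random()
                    distance = abs(big_c * leader[j] - wolf[j])
                    estimate += leader[j] - big_a * distance
                # Average the leader-guided estimates, then clamp to the box.
                wolf[j] = min(max(estimate / len(leaders), lower[j]), upper[j])
    final = [(fitness(w), list(w)) for w in wolves]
    best_fit, best_pos = min(leaders + final, key=lambda pair: pair[0])
    return best_pos, best_fit


if __name__ == "__main__":
    xs = list(range(10))
    ys = [2.0 * x + 1.0 for x in xs]  # synthetic targets
    position, error = grey_wolf_optimize(lambda p: mse(p, xs, ys), [-5.0, -5.0], [5.0, 5.0])
    print(position, error)
```

In the setting of the abstract, each fitness evaluation would instead train or validate the prediction model with the candidate hyperparameters, which makes every call far more expensive than this toy.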
Abstract: Hand gestures are a natural means of human-robot interaction. Vision-based dynamic hand gesture recognition has become a hot research topic because of its many applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video inputs while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. A frame is randomly selected from each group and represented as an RGB image as well as an optical flow snapshot. These two entities are fused and fed into a convolutional neural network (ConvNet) for feature extraction. The ConvNets for all groups share parameters. To learn long-term features, the outputs of all ConvNets are fed into a long short-term memory (LSTM) network, which predicts the final classification result. The new model has been tested on two popular hand gesture datasets, the Jester dataset and the Nvidia dataset. Compared with other models, it produced very competitive results. The robustness of the new model has also been demonstrated on an augmented dataset with greater diversity of hand gestures.
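The sampling step described above (split the video into a fixed number of frame groups and draw one random frame per group) can be sketched as follows. The function name, the equal-width grouping, and the requirement that the video has at least as many frames as groups are assumptions of this sketch; the abstract does not say how group boundaries are chosen.

```python
import random


def sample_frame_indices(num_frames: int, num_groups: int, rng: random.Random) -> list:
    """Pick one random frame index from each of `num_groups` consecutive groups."""
    if num_groups <= 0 or num_frames < num_groups:
        raise ValueError("need at least one frame per group")
    indices = []
    for g in range(num_groups):
        # Integer arithmetic keeps groups contiguous and nearly equal in size.
        start = g * num_frames // num_groups
        end = (g + 1) * num_frames // num_groups
        indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    # Hypothetical clip of 37 frames reduced to 8 sampled frames.
    print(sample_frame_indices(37, 8, random.Random(1)))
```

Because the number of sampled frames is fixed, the cost of the downstream ConvNet and LSTM does not grow with the length of the clip, and re-sampling on each epoch acts as a mild form of augmentation.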
Funding: Supported by the National Natural Science Foundation of China (Nos. 41972303 and 42102332) and the Natural Science Foundation of Hubei Province (China) (Nos. 2023AFA001 and 2023AFD232).
Abstract: Geochemical survey data analysis is recognized as a practical and feasible approach to lithological mapping in support of mineral exploration. Among the available approaches, recent methodological advances have focused on deep learning algorithms, which can learn and extract information directly from geochemical survey data through multi-level networks and output end-to-end classifications. Accordingly, this study developed a lithological mapping framework that jointly applies a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The CNN-LSTM model excels at correlation extraction in its CNN layers and at learning coupled interactions in its LSTM layers. This hybrid approach was demonstrated by mapping leucogranites in the Himalayan orogen based on stream sediment geochemical survey data, where the targeted leucogranites are expected to be potential resources of rare metals such as Li, Be, and W. Three comparative case studies were carried out from both visual and quantitative perspectives to illustrate the superiority of the proposed model. A guided spatial distribution map of leucogranites in the Himalayan orogen, divided into high-, moderate-, and low-potential areas, was delineated using the success rate curve, which further improves the efficiency of identifying unmapped leucogranites through geological mapping. In light of these results, this study provides an alternative solution for lithological mapping using geochemical survey data at a regional scale and reduces the risk in decision making associated with mineral exploration.
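A success rate curve of the kind mentioned above ranks map cells by predicted score and tracks how quickly known occurrences are captured as more area is flagged. The sketch below uses one common construction; the cell scores and known-occurrence flags are made-up illustrative values, and the abstract does not give the thresholds used to separate high-, moderate-, and low-potential areas.

```python
def success_rate_curve(scores, is_known):
    """Return (area_fraction, captured_fraction) points, best-scored cells first."""
    total_known = sum(1 for flag in is_known if flag)
    if total_known == 0:
        raise ValueError("at least one known occurrence is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, i in enumerate(order, start=1):
        if is_known[i]:
            captured += 1
        points.append((rank / len(scores), captured / total_known))
    return points


def area_for_capture(points, target):
    """Smallest flagged-area fraction that captures at least `target` of the occurrences."""
    for area, captured in points:
        if captured >= target:
            return area
    return 1.0


if __name__ == "__main__":
    # Hypothetical model scores for 8 cells; True marks a mapped leucogranite cell.
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
    known = [True, True, False, True, False, False, False, False]
    curve = success_rate_curve(scores, known)
    print(area_for_capture(curve, 1.0))
```

A curve that rises steeply at small area fractions means the model concentrates known occurrences in a small part of the map, which is what makes the highest-ranked cells useful targets for follow-up mapping.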
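Abstract: This article proposes a power cable fault location algorithm based on the wavelet transform and a convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) network, combining the time-frequency localization property of the wavelet transform with the deep learning capability of the CNN and BiLSTM to improve the precision of fault location. To verify the effectiveness of the proposed algorithm, it is compared experimentally with the True, BiLSTM, and extremum field mean mode decomposition (EMMD) + wavelet transform algorithms. The experimental results show that the proposed algorithm keeps the location error within 0.02 km, significantly improving the accuracy of fault location.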
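The time-frequency localization property that the algorithm relies on can be illustrated with a single-level Haar wavelet transform, whose detail coefficients are large only where the signal changes abruptly. The Haar wavelet, the single decomposition level, and the toy step signal are assumptions of this sketch; the abstract does not name the wavelet used or explain how coefficients are converted to a distance along the cable.

```python
import math


def haar_dwt(signal):
    """Single-level Haar transform: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def locate_abrupt_change(signal):
    """Return the pair of sample indices whose detail coefficient is largest.

    A jump that falls exactly between two pairs is invisible at this level and
    would need a shifted or multi-level transform to detect.
    """
    _, detail = haar_dwt(signal)
    k = max(range(len(detail)), key=lambda i: abs(detail[i]))
    return 2 * k, 2 * k + 1


if __name__ == "__main__":
    # Toy waveform with a step between samples 4 and 5.
    print(locate_abrupt_change([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]))
```

In a pipeline like the one described, coefficients of this kind would serve as inputs to the CNN-BiLSTM rather than as the final location estimate.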