Abstract
Words and expressions with highly culture-specific meanings are complex, which raises the probability of interpretation errors in corpus texts. To address this, a similarity-based method for automatically correcting corpus text interpretation errors is proposed. A sample corpus is built from selected topics and keywords are extracted; text features and a similarity threshold are then used to compute feature-word weights and obtain semantic feature similarity. The K-nearest neighbor algorithm labels the text features, the interpretation error probability is computed and optimized, and text interpretation errors are identified. The attention function is transformed into an output vector, sequential text interpretation information is mined, and the maximum likelihood of the data is obtained; automatic comparison and correction then corrects the interpretation errors. Simulation results show that, after automatic correction, interpretation accuracy is high, interpretation quality is improved, and cross-language communication is facilitated.
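The similarity and K-nearest neighbor steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weighting formula, so TF-IDF with smoothed IDF is assumed here as the feature-word weight, cosine similarity is assumed as the semantic similarity measure, and the function names (`tfidf_vectors`, `cosine`, `knn_flag`), the toy token lists, and the values of `k` and `threshold` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: term frequency times smoothed IDF,
    ln((1 + n) / (1 + df)) + 1. The paper's own formula is not given.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_flag(query_vec, labeled_vecs, k=3, threshold=0.1):
    """Majority vote over the k most similar labeled samples.

    labeled_vecs is a list of (vector, is_error) pairs. Neighbors whose
    similarity falls below `threshold` are ignored; None is returned
    when no neighbor passes the threshold.
    """
    scored = sorted(
        ((cosine(query_vec, vec), is_error) for vec, is_error in labeled_vecs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = [is_error for sim, is_error in scored[:k] if sim >= threshold]
    if not votes:
        return None
    return sum(votes) * 2 > len(votes)


# Toy data (hypothetical): four labeled samples followed by one query.
docs = [
    ["bank", "river", "water"],     # labeled as correctly interpreted
    ["bank", "money", "loan"],      # labeled as an interpretation error
    ["loan", "money", "interest"],  # labeled as an interpretation error
    ["river", "water", "fish"],     # labeled as correctly interpreted
    ["money", "loan", "bank"],      # query
]
labels = [False, True, True, False]

vecs = tfidf_vectors(docs)
result = knn_flag(vecs[4], list(zip(vecs[:4], labels)), k=3, threshold=0.1)
print("query flagged as interpretation error:", result)
```

The threshold filters out weakly related samples before voting, which mirrors the abstract's use of a similarity threshold; the attention-based correction step is not sketched here.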
Authors
周永英
薛阿亮
ZHOU Yong-ying; XUE A-liang (Xingzhi College of Xi'an University of Finance and Economics, Xi'an, Shaanxi 710038, China; Aerial Photogrammetry and Remote Sensing Group Co., Ltd., Xi'an, Shaanxi 710199, China)
Source
Computer Simulation (《计算机仿真》)
2025, Issue 2, pp. 553-557 (5 pages)
Funding
Special Project of Philosophy and Social Science Research of Shaanxi Province (2023HZ0993).
Keywords
Corpus
Semantic similarity
Text decoding
Self-attention mechanism
Error correction