Chinese sentence segmentation based on SVM method (基于SVM的汉语句子片段划分)

Abstract: To address the drop in syntactic parsing performance caused by long sentences, this paper presents an SVM-based method for identifying sentence segments. A sentence is first divided into segments according to its grammatical structure, and each segment is assigned a label indicating its syntactic type. The sentence is then split into several parts according to the segment labels, and each part serves as a basic unit for parsing. Finally, the parsed parts are linked through dependency relations to complete the dependency tree of the whole sentence. Experiments show that segment identification reduces parsing complexity and improves the accuracy of Chinese dependency parsing.
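The pipeline described in the abstract (segment, label, parse each part, merge) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The punctuation-based splitting rule, the label names `"independent"` and `"dependent"`, the grouping rule, the root-linking rule, and the helper names `split_segments`, `group_parts`, `star_parser`, and `parse_by_segments` are all assumptions made for the example. The `classify` callable stands in for the trained SVM classifier, and `parse_part` stands in for a real dependency parser.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed boundary marks; the paper's actual splitting criteria are not given here.
BOUNDARY_PUNCT = {",", "，", ";", "；", ":", "："}

Span = Tuple[int, int]  # half-open token range [start, end)


def split_segments(tokens: Sequence[str]) -> List[Span]:
    """Cut the token list into segments, each ending at a boundary mark."""
    segments, start = [], 0
    for i, tok in enumerate(tokens):
        if tok in BOUNDARY_PUNCT:
            segments.append((start, i + 1))
            start = i + 1
    if start < len(tokens):
        segments.append((start, len(tokens)))
    return segments


def group_parts(segments: List[Span], labels: List[str]) -> List[Span]:
    """Hypothetical rule: a 'dependent' segment is glued to what follows;
    an 'independent' segment closes the current part."""
    parts, start = [], None
    for (s, e), label in zip(segments, labels):
        if start is None:
            start = s
        if label == "independent":
            parts.append((start, e))
            start = None
    if start is not None:  # trailing dependent segments form a final part
        parts.append((start, segments[-1][1]))
    return parts


def star_parser(tokens: Sequence[str]) -> List[int]:
    """Placeholder parser: first token is the local root (-1),
    every other token attaches to it."""
    return [-1] + [0] * (len(tokens) - 1)


def parse_by_segments(
    tokens: Sequence[str],
    classify: Callable[[Sequence[str]], str],
    parse_part: Callable[[Sequence[str]], List[int]],
) -> List[int]:
    """Return one head index per token (-1 marks the sentence root)."""
    segments = split_segments(tokens)
    labels = [classify(tokens[s:e]) for s, e in segments]
    parts = group_parts(segments, labels)

    heads = [-1] * len(tokens)
    roots = []
    for s, e in parts:
        local_heads = parse_part(tokens[s:e])
        for i, h in enumerate(local_heads):
            if h == -1:
                roots.append(s + i)   # remember each part's local root
            else:
                heads[s + i] = s + h  # shift local index to sentence index
    # Assumed merge rule: roots of earlier parts attach to the last part's root.
    for r in roots[:-1]:
        heads[r] = roots[-1]
    return heads


tokens = ["我", "去", "，", "他", "来"]
heads = parse_by_segments(tokens, lambda seg: "independent", star_parser)
```

Because each part is parsed on its own, the parser only ever sees short token sequences, which is where the reduction in parsing complexity comes from. In a real system the merge step would choose the attachment between part roots from the segment labels rather than from a fixed rule.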

Authors: MA Jinshan, LIU Ting, LI Sheng

Affiliation: Information Retrieval Laboratory, School of Computer Science, Harbin Institute of Technology

Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2009, No. 5, pp. 52-55 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (grants 60575042, 60675034)

Keywords: dependency parsing; segment; dependency relation; support vector machine (SVM)

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
