
Automatic story segmentation of Chinese broadcast news based on subword chains (cited by: 2)

Subword-based lexical chaining for automatic story segmentation in Chinese broadcast news
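Abstract: This paper proposes a subword-chain-based method for automatic story segmentation of Chinese broadcast news. The method exploits several properties of Chinese: its many homophones written with different characters, its open vocabulary, the multiple ways a sentence can be segmented into words, and its flexible word formation. Subword chains are built from Chinese subword units (characters and syllables) on speech recognition transcripts of broadcast news and are then used to segment the news into stories. This effectively addresses a weakness of conventional word-chain methods, in which related words fail to match because of speech recognition errors, especially out-of-vocabulary words. The method also exploits the complementarity of lexical units at different levels, such as the semantic definiteness of words and the robustness of subwords to speech recognition errors, by fusing these levels so that the strengths of each further improve segmentation performance. Experiments on the TDT2 Mandarin standard broadcast news corpus show that segmentation based on character unigram subword chains achieves an F-measure 6.06% higher than the conventional word-chain method. Fusing the boundary strengths of character unigram and bigram subword chains raises the F-measure by a further 2.55%. Voting-based fusion raises the F-measure by 9.04% over the conventional word-chain method.

This paper applies Chinese subword representations (character and syllable n-grams) to chaining-based automatic story segmentation of Chinese broadcast news. It shows the robustness of Chinese subwords against speech recognition errors, especially OOV (out-of-vocabulary) words, in lexical term matching on erroneous speech recognition transcripts. It proposes a subword chaining approach that links repetitions of Chinese character/syllable n-gram units. It also proposes to integrate different lexical scales in chainin...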
Authors: Yang Yulian (杨玉莲), Xie Lei (谢磊)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2009, No. 2, pp. 583-586, 594 (5 pages)
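Funding: Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20070699015); Natural Science Basic Research Plan of Shaanxi Province (2007F15); Northwestern Polytechnical University Basic Research Fund; Northwestern Polytechnical University "Aoxiang Star" Program (07XE0150)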
Keywords: subword; lexical chaining; topic segmentation; story segmentation; information retrieval; spoken document retrieval (SDR)
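
The chaining idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_gap` parameter, and the scoring rule (chains ending just before a gap plus chains starting just after it) are assumptions chosen for illustration, and plain text strings stand in for speech recognition transcripts.

```python
from collections import defaultdict


def char_ngrams(text, n):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_chains(sentences, n=1, max_gap=2):
    """Link repetitions of each character n-gram into chains.

    Each chain is a (start, end) pair of sentence indices. A chain is
    closed when its n-gram does not recur within max_gap sentences
    (max_gap is a hypothetical parameter, not taken from the paper).
    N-grams that never repeat form no chain.
    """
    positions = defaultdict(list)
    for idx, sent in enumerate(sentences):
        for gram in set(char_ngrams(sent, n)):
            positions[gram].append(idx)

    chains = []
    for idxs in positions.values():
        start = prev = idxs[0]
        for idx in idxs[1:]:
            if idx - prev > max_gap:
                if prev > start:
                    chains.append((start, prev))
                start = idx
            prev = idx
        if prev > start:
            chains.append((start, prev))
    return chains


def boundary_strength(sentences, n=1, max_gap=2):
    """Score each gap g (between sentence g and sentence g + 1).

    The score is the number of chains ending at sentence g plus the
    number of chains starting at sentence g + 1 (an assumed scoring rule).
    """
    strength = [0] * max(len(sentences) - 1, 0)
    for start, end in build_chains(sentences, n, max_gap):
        if end < len(sentences) - 1:
            strength[end] += 1
        if start > 0:
            strength[start - 1] += 1
    return strength


sentences = ["股市今天上涨", "股市收盘上涨", "台风登陆沿海", "台风影响沿海"]
print(boundary_strength(sentences, n=1))  # [0, 8, 0]
print(boundary_strength(sentences, n=2))  # [0, 4, 0]
```

In this toy input the gap between the second and third sentences receives the highest score for both unigram and bigram chains, so it would be the candidate story boundary. Scores from different n-gram levels could then be combined, for example by a weighted sum or by voting, in the spirit of the fusion the abstract describes.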
