Abstract
Research on Modern Chinese function words has a long history and a rich body of results. To date, however, most of this research has been written for human readers. Its descriptions of the idiosyncratic behavior of individual function words are inevitably subjective and vague, so they are difficult to apply directly in natural language processing (NLP). Taking a computational-linguistics perspective, this paper draws on existing research on function words and on an examination of how function words are actually used in the word-segmented, POS-tagged corpus of the People's Daily. On this basis it constructs an NLP-oriented knowledge base of generalized function words in Modern Chinese, with the aim of providing a data foundation for the automatic identification of function-word usages.
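As a rough illustration of what examining function-word usage in a segmented, POS-tagged corpus can involve, the sketch below parses lines in the word/tag layout of the PKU People's Daily corpus and counts, for each function word, the POS tags that immediately surround it. It is not the authors' procedure. The tag set `FUNCTION_TAGS` (preposition `p`, conjunction `c`, auxiliary `u`, modal particle `y`, adverb `d`) is an assumption standing in for the paper's own inventory of generalized function words. The helper names `parse_line` and `function_word_contexts` are hypothetical.

```python
from collections import Counter

# Assumption: a subset of PKU People's Daily tags treated as function-word
# classes for this sketch; the paper's actual inventory is not given here.
FUNCTION_TAGS = {"p", "c", "u", "y", "d"}


def parse_line(line):
    """Split one corpus line into (word, tag) pairs.

    Assumes the PKU layout: whitespace-separated tokens written as word/tag,
    with compound names bracketed as [w1/t1  w2/t2]nt.
    """
    pairs = []
    for token in line.split():
        token = token.lstrip("[")
        if "]" in token:
            # Drop the compound-level tag after the closing bracket, e.g. "]nt".
            token = token[: token.index("]")]
        word, sep, tag = token.rpartition("/")
        if sep and word:
            pairs.append((word, tag))
    return pairs


def function_word_contexts(lines, window=1):
    """Count function words and the tag contexts they occur in.

    Returns (counts, contexts): counts maps (word, tag) to its frequency;
    contexts maps (word, tag) to a Counter of (left_tags, right_tags).
    """
    counts = Counter()
    contexts = {}
    for line in lines:
        pairs = parse_line(line)
        for i, (word, tag) in enumerate(pairs):
            if tag not in FUNCTION_TAGS:
                continue
            counts[(word, tag)] += 1
            left = tuple(t for _, t in pairs[max(0, i - window):i])
            right = tuple(t for _, t in pairs[i + 1:i + 1 + window])
            contexts.setdefault((word, tag), Counter())[(left, right)] += 1
    return counts, contexts
```

The tag windows are only a crude proxy for usage. A knowledge base meant to support the machine identification of function-word usages would need finer, usage-level distinctions than POS context alone provides.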
Source
《当代语言学》 (Contemporary Linguistics)
CSSCI
Peking University Core Journal (北大核心)
2009, No. 2, pp. 124-135 (12 pages)
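Funding
National Basic Research Program of China (973 Program), project 2004CB318102
Natural Science Foundation of the Henan Provincial Department of Education, project 2007520050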