Abstract
At present, most crowd-generated data resources on the Internet are concentrated in social networks such as microblogs and forums. Cross-language social public opinion analysis is a research hotspot in intelligent information processing in China. Uyghur is one of China's major minority languages, so acquiring data from Uyghur microblogs is particularly important for building a good cross-language public opinion analysis system. The greatest difficulty in acquiring Uyghur microblog data is that the microblog developers do not provide an API. On technical and economic grounds, this paper takes the "Guduk" microblog as its research object and proposes a user-relationship-based crawler system for acquiring Uyghur microblog data. The scheme overcomes the difficulty of acquiring data when no API is provided. This study provides a large volume of Uyghur social network data, together with data acquisition methods and techniques, for cross-language public opinion analysis systems.
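The abstract names the approach but does not spell it out. As an illustration only, the core idea of a user-relationship-based crawler (discovering new accounts by following the links between users instead of calling a platform API) can be sketched as a breadth-first traversal of the user graph. This is a minimal sketch under stated assumptions: `crawl_by_user_relationship` and `fetch_related_users` are hypothetical names, the actual downloading and parsing of Guduk pages is not reproduced, and the breadth-first visiting order is an assumption rather than the authors' confirmed design.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl_by_user_relationship(
    seed_users: Iterable[str],
    fetch_related_users: Callable[[str], List[str]],
    max_users: int = 1000,
) -> List[str]:
    """Breadth-first traversal of a user-relationship graph (sketch).

    Starting from the seed users, each visited user's related users
    (e.g. followees or followers) are appended to the frontier, so new
    accounts are discovered without any platform API.
    Returns user IDs in the order they were visited.
    """
    frontier: deque = deque()
    seen: Set[str] = set()  # users already queued, to avoid duplicates
    for user in seed_users:
        if user not in seen:
            seen.add(user)
            frontier.append(user)

    visited: List[str] = []
    while frontier and len(visited) < max_users:
        user = frontier.popleft()
        visited.append(user)
        # Hypothetical hook: a real crawler would download and parse the
        # user's profile/relationship pages here.
        for neighbour in fetch_related_users(user):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return visited


if __name__ == "__main__":
    # Toy relationship graph standing in for parsed follow lists.
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["a", "d"], "d": []}
    print(crawl_by_user_relationship(["a"], lambda u: graph.get(u, [])))
    # prints ['a', 'b', 'c', 'd']
```

Marking users as seen when they are queued, not when they are visited, keeps each account in the frontier at most once even when several users point to it, and `max_users` bounds the size of the crawl.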
Source
Journal of Xinjiang University (Natural Science Edition)
CAS
Peking University Core Journal
2015, No. 1, pp. 74-79 (6 pages)
Funding
National Basic Research Program of China (973 Program) (2014cb340506)
National Natural Science Foundation of China (61331011)
Keywords
Cross-language
Public Opinion
Data Acquisition
User Relationship
Web Crawler
Microblog API