Data crawler for Sina Weibo based on Python (cited by: 62)
Abstract: Much current social network research relies on data from foreign platforms, while Sina Weibo, the largest social network platform in China, offers no convenient interface for researchers to collect data for analysis. To obtain Weibo data quickly, a Weibo data crawler with parallel support was developed. The tool fetches, in real time, the follower information and post content of specified users. It uses keyword matching to select posts that meet given conditions and fetches the related content. It also supports parallel crawling, so the information of several users can be fetched at the same time. The serial crawler was compared with its parallel version, and the tool was used to analyze a sample of Weibo data on the topic of influenza. The results show that the parallel crawler achieves a good, linear speedup and obtains data quickly, and that the fetched data are timely and accurate.
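Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2014, Issue 11, pp. 3131-3134 (4 pages)
Funding: National Natural Science Foundation of China (91330116); Specialized Research Fund for the Doctoral Program of Higher Education (20113108120022); Key Project of the Science and Technology Commission of Shanghai Municipality (11510500300)
Keywords: Sina Weibo; crawler; Python; parallel; big data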
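
The abstract describes three ideas: fetching posts per user, filtering them by keyword, and crawling several users in parallel. The following is a minimal sketch of how those pieces might fit together, not the authors' implementation. All names here (`FAKE_TIMELINES`, `fetch_posts`, `match_keywords`, `crawl_parallel`, `speedup`) are hypothetical, and the network fetch is replaced by an in-memory dictionary so the example runs offline. Thread-based parallelism via `concurrent.futures` is an assumption; the record does not say which mechanism the paper uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for Weibo pages. A real crawler would
# issue authenticated HTTP requests here instead.
FAKE_TIMELINES = {
    "user_a": ["Caught the flu this week", "Nice weather today"],
    "user_b": ["Flu shots are available at the clinic"],
    "user_c": ["Reading a book"],
}


def fetch_posts(user_id):
    """Return the posts of one user (placeholder for a network fetch)."""
    return FAKE_TIMELINES.get(user_id, [])


def match_keywords(posts, keywords):
    """Keep posts that contain at least one keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [p for p in posts if any(k in p.lower() for k in lowered)]


def crawl_user(user_id, keywords):
    """Fetch one user's posts and filter them by keyword."""
    return user_id, match_keywords(fetch_posts(user_id), keywords)


def crawl_parallel(user_ids, keywords, workers=4):
    """Crawl several users concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda u: crawl_user(u, keywords), user_ids)
        return dict(results)


def speedup(serial_seconds, parallel_seconds):
    """Speedup as the ratio of serial to parallel wall-clock time."""
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    print(crawl_parallel(["user_a", "user_b", "user_c"], ["flu"]))
```

Speedup is taken here in its usual sense, the serial running time divided by the parallel running time. A linear speedup means this ratio grows in proportion to the number of workers.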