
Scrapy

Developer: Scrapinghub, Ltd.
Initial release: 26 June 2008
Stable release: 2.11.2 (14 May 2024)[1]
Written in: Python
Operating system: Windows, macOS, Linux
Type: Web crawler
License: BSD license
Website: scrapy.org

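Scrapy (/ˈskreɪpi/ SKRAY-pee)[3] is a free and open-source web-crawling framework written in Python. It was originally designed for web scraping, but it can also be used to extract data through APIs or as a general-purpose web crawler.[4] The framework is currently maintained by Scrapinghub Ltd., a web-scraping development and services company.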

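A Scrapy project is built around "spiders": self-contained crawlers that are each given a set of instructions. Following the don't-repeat-yourself spirit of frameworks such as Django,[5] Scrapy lets developers reuse their code, which makes it easier to build and scale large crawling projects. Scrapy also provides a web-crawling shell that developers can use to test their assumptions about how a site behaves.[6]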

Notable companies and products that use Scrapy include Lyst,[7][8] Parse.ly,[9] Sayone Technologies,[10] Sciences Po Medialab,[11] and Data.gov.uk's World Government Data site.[12]

History

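Scrapy originated at Mydeco, a web-aggregation and e-commerce company, where it was developed and maintained by employees of Mydeco and Insophia. It was first released publicly under the BSD license in August 2008, and the milestone 1.0 version followed in June 2015.[13] In 2011, Scrapinghub became the new official maintainer.[14][15]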

References

  1. Release 2.11.2. 2024-05-14 [2024-05-17].
  2. Release notes — Scrapy documentation. doc.scrapy.org. [2020-11-18]. (Archived from the original on 2020-01-28).
  3. How do you pronounce "Scrapy"? (Archived at the Internet Archive).
  4. Scrapy at a glance. (Archived at the Internet Archive).
  5. Frequently Asked Questions. [2015-07-28]. (Archived from the original on 2020-11-11).
  6. Scrapy shell. [2015-07-28]. (Archived from the original on 2020-10-31).
  7. Bell, Eddie; Heusser, Jonathan. Scalable Scraping Using Machine Learning. [2015-07-28]. (Archived from the original on 2016-10-09).
  8. Scrapy | Companies using Scrapy. [2020-12-08]. (Archived from the original on 2020-11-12).
  9. Montalenti, Andrew. Web Crawling & Metadata Extraction in Python. [2020-12-08]. (Archived from the original on 2020-09-19).
  10. Scrapy Companies. Scrapy website. [2020-12-08]. (Archived from the original on 2020-11-12).
  11. Hyphe v0.0.0: the first release of our new webcrawler is out!. [2020-12-08]. (Archived from the original on 2016-06-13).
  12. Ben Firshman [@bfirsh]. World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords bit.ly/5jU3La #opendata #datastore (Tweet). 2010-01-21, via Twitter.
  13. Medina, Julia. Scrapy 1.0 official release out!. scrapy-users (mailing list). 2015-06-19 [2018-09-13]. (Archived from the original on 2011-01-22).
  14. Pablo Hoffman. List of the primary authors & contributors. 2013 [2013-11-18]. (Archived from the original on 2017-05-29).
  15. Interview Scraping Hub. (Archived at the Internet Archive).
