How to Run Our Spider (3): Python Scrapy Tutorial for Version 1.51 and Above

Posted on: August 26, 2020; December 7, 2022
Categories: Python, scrapy

To put our spider to work, go to the project's top-level directory and run:

scrapy crawl quotes

This command runs the spider named quotes that we just added, which sends some requests to the quotes.toscrape.com domain. You will get output similar to this:

... (omitted for brevity)
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Spider opened
2016-12-16 21:24:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-16 21:24:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-1.html
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-2.html
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Closing spider (finished)
...
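To make the engine lines in this log easier to read, here is a minimal sketch that pulls the status code, HTTP method, URL, and referer out of each "Crawled" line. It uses only the standard library and is not part of Scrapy; the names CRAWLED and parse_crawled are hypothetical, and the pattern assumes the log layout shown above.

```python
import re

# Matches engine lines of the form shown in the log above, e.g.
#   ... DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
# Assumption: the layout "Crawled (STATUS) <METHOD URL> (referer: X)" is stable.
CRAWLED = re.compile(r"Crawled \((\d{3})\) <(\w+) (\S+)> \(referer: (.+)\)$")


def parse_crawled(line):
    """Return a dict for a 'Crawled (...)' engine line, or None for any other line."""
    match = CRAWLED.search(line.strip())
    if match is None:
        return None
    status, method, url, referer = match.groups()
    return {"status": int(status), "method": method, "url": url, "referer": referer}


# Three lines copied from the log output above.
sample = [
    "2016-12-16 21:24:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)",
    "2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)",
    "2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)",
]

for line in sample:
    record = parse_crawled(line)
    if record is not None:
        print(record["status"], record["url"])
```

In this log, the 404 for robots.txt only means the site does not publish that file; the two 200 responses are the pages the spider requested.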

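Now, check the files in the current directory. You should notice that two new files have been created: quotes-1.html and quotes-2.html, each holding the content of its respective URL, as our parse method instructs.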
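The link between the URLs and the file names can be sketched as follows. This is an illustration under assumptions, not the spider's actual code: it assumes the parse method takes the page number from the second-to-last segment of the URL, and the helper names filename_for and missing_files are hypothetical. It uses only the standard library, so it runs without Scrapy installed.

```python
from pathlib import Path

# The two page URLs that appear in the log output above.
URLS = [
    "http://quotes.toscrape.com/page/1/",
    "http://quotes.toscrape.com/page/2/",
]


def filename_for(url):
    """Derive the output file name from a page URL.

    Assumption: the page number is the second-to-last '/'-separated segment,
    so '.../page/1/' maps to 'quotes-1.html'.
    """
    page = url.split("/")[-2]
    return f"quotes-{page}.html"


def missing_files(urls, directory="."):
    """Return the expected file names that are not present in `directory`."""
    base = Path(directory)
    return [filename_for(u) for u in urls if not (base / filename_for(u)).is_file()]


# An empty list means both files were written to the current directory.
print(missing_files(URLS))
```

Running this from the project's top-level directory after the crawl gives a quick way to confirm that both files were saved.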

Note

If you are wondering why we haven't parsed the HTML yet, hold on; we will cover that soon.
