Scrapy Item Loaders Explained: Nested Loaders (22), Python Scrapy Tutorial for Version 1.51 and Later

Posted on: September 4, 2020 / December 8, 2022
Categories: Python, scrapy
Tags: add, add_xpath, class, email, footer, href, item, ItemLoader, load, Loader, python, Scrapy, Scrapy tutorial, stuff, loading, nesting, crawler, spider, selector, page footer

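Nested loaders are useful when the values you want to parse all live in one subsection of a document. Imagine you are extracting details from the footer of a page that looks like this:

Example: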

<footer>
    <a class="social" href="https://facebook.com/whatever">Like Us</a>
    <a class="social" href="https://twitter.com/whatever">Follow Us</a>
    <a class="email" href="mailto:[email protected]">Email Us</a>
</footer>

Without nested loaders, you have to spell out the full XPath (or CSS) expression for every value you want to extract.

Example:

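from scrapy.loader import ItemLoader

# Item is assumed to declare `social` and `email` fields; `response` comes from the spider callback
loader = ItemLoader(item=Item(), response=response)
# load values that are not in the footer here
loader.add_xpath('social', '//footer/a[@class = "social"]/@href')
loader.add_xpath('email', '//footer/a[@class = "email"]/@href')
loader.load_item()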

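Instead, you can create a nested loader from the footer selector and add values relative to the footer. The result is the same, but you avoid repeating the footer selector.

Example: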

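from scrapy.loader import ItemLoader

# Item is assumed to declare `social` and `email` fields; `response` comes from the spider callback
loader = ItemLoader(item=Item(), response=response)
# load values that are not in the footer here
# XPaths added through the nested loader are evaluated relative to <footer>
footer_loader = loader.nested_xpath('//footer')
footer_loader.add_xpath('social', 'a[@class = "social"]/@href')
footer_loader.add_xpath('email', 'a[@class = "email"]/@href')
# no need to call footer_loader.load_item()
loader.load_item()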

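You can nest loaders arbitrarily, and they work with either XPath or CSS selectors. As a general guideline, use nested loaders when they make your code simpler, but do not overdo the nesting or your parser can become hard to read.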
