Using Item to Encapsulate Data in Scrapy

scrapy | it书童 | 2019-10-04 15:21:53

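Scrapy provides the following two classes, which you can use to define your own data classes (for book information, say) that encapsulate scraped data:

● Item base class

The base class for custom data classes such as BookItem.

● Field class

Describes which fields a custom data class contains (such as name and price).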

In [1]: from scrapy import Item, Field

In [2]: class BookItem(Item):
   ...:     name = Field()
   ...:     price = Field()
   ...:

In [3]: book1 = BookItem(name="Needful Things", price=45.0)

In [4]: book1
Out[4]: {'name': 'Needful Things', 'price': 45.0}

In [5]: book2 = BookItem()

In [6]: book2
Out[6]: {}

In [7]: book2['name'] = 'Life of Pi'

In [8]: book2['price'] = 32.5

In [9]: book2
Out[9]: {'name': 'Life of Pi', 'price': 32.5}

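BookItem validates field names: only fields declared on the class can be assigned, and assigning to an undeclared field raises a KeyError.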

In [10]: book = BookItem()

In [11]: book['name'] = 'Memoirs of a Geisha'

In [12]: book['prize'] = 43.0 # price misspelled as prize
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-a02f65b05a66> in <module>()
----> 1 book['prize'] = 43.0

~/.pyenv/versions/3.6.6/lib/python3.6/site-packages/scrapy/item.py in __setitem__(self, key, value)
     64         else:
     65             raise KeyError("%s does not support field: %s" %
---> 66                 (self.__class__.__name__, key))
     67
     68     def __delitem__(self, key):

KeyError: 'BookItem does not support field: prize'
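The traceback shows that the check lives in the item's `__setitem__` method. To illustrate the idea without depending on Scrapy, here is a simplified stand-in written with the standard library only. It is not Scrapy's actual implementation: the names `SimpleItem`, `BookSketch`, and `try_set` are hypothetical, and the `fields` tuple is an assumed convention that replaces Scrapy's `Field()` declarations.

```python
from collections.abc import MutableMapping


class SimpleItem(MutableMapping):
    """Hypothetical stand-in for scrapy.Item (not Scrapy's real code)."""

    # Subclasses list their allowed field names here (assumed convention).
    fields = ()

    def __init__(self, **kwargs):
        self._values = {}
        for key, value in kwargs.items():
            self[key] = value  # constructor arguments pass the same check

    def __setitem__(self, key, value):
        # Reject any key that was not declared as a field.
        if key not in self.fields:
            raise KeyError("%s does not support field: %s"
                           % (self.__class__.__name__, key))
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


class BookSketch(SimpleItem):
    fields = ("name", "price")


def try_set(item, key, value):
    """Return None on success, or the error message if the field is rejected."""
    try:
        item[key] = value
    except KeyError as exc:
        return exc.args[0]
    return None


book = BookSketch(name="Memoirs of a Geisha")
error = try_set(book, "prize", 43.0)  # misspelled field is rejected
ok = try_set(book, "price", 43.0)     # declared field is accepted
print(error)
print(dict(book))
```

Catching the KeyError, as `try_set` does, is one way to surface a misspelled field name early instead of silently storing data under the wrong key.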

Accessing the fields of a BookItem object works much like accessing a dictionary:

In [14]: book1['name']
Out[14]: 'Needful Things'

In [15]: book1.get('price', 60.0)
Out[15]: 45.0

In [17]: list(book1.items())
Out[17]: [('name', 'Needful Things'), ('price', 45.0)]

Reprints must credit the source: https://www.itshutong.com/articles/136/using-item-to-encapsulate-data