
Scrapy: Populate Nested Items With Itemloader

I have this object I'm trying to populate with an itemLoader: { 'domains': 'string', 'date_insert': '2016-12-23T11:25:00.213Z', 'title': 'string', 'url': 'string', 'body

Solution 1:

Thanks to @eLRuLL, I managed to find a decent solution:

items.py:

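import scrapy
from scrapy.loader import ItemLoader
# Older Scrapy releases expose these under scrapy.loader.processors instead.
from itemloaders.processors import Identity, MapCompose, TakeFirst
from w3lib.html import remove_tags


class StatsItem(scrapy.Item):
    views_count = scrapy.Field()
    comments_count = scrapy.Field()


class ArticleItem(scrapy.Item):
    [...]
    # Identity() passes the nested stats dict through untouched.
    stats = scrapy.Field(input_processor=Identity())


class StatsItemLoader(ItemLoader):
    default_input_processor = MapCompose(remove_tags)
    default_output_processor = TakeFirst()
    default_item_class = StatsItem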

spider.py:

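def parse(self, response):
    [...]
    # getStats() returns a plain dict; the Identity() input processor stores it as-is.
    loader.add_value('stats', self.getStats(response))
    [...]

def getStats(self, response):
    # Build the nested stats object with its own loader.
    statsLoader = StatsItemLoader(response=response)
    statsLoader.add_xpath('comments_count', "//div[@class='btn-count']//a/text()")
    statsLoader.add_value('views_count', '42')
    # Convert the loaded item to a plain dict so it can be serialized.
    return dict(statsLoader.load_item())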

Originally it did not work because the input processor for the stats field was MapCompose(remove_tags), which expects strings rather than a nested object; switching that field to Identity() fixed it. Also, in order to serialize the object, getStats has to return dict(statsLoader.load_item()) and not just statsLoader.load_item().
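To illustrate the serialization point, here is a minimal sketch that uses only the standard library. StatsLike is a hypothetical stand-in for a scrapy.Item (which is likewise a mutable mapping rather than a dict subclass), and the plain json module stands in for whatever serializer the pipeline uses; Scrapy's own exporters may treat nested items differently.

```python
import json
from collections.abc import MutableMapping


class StatsLike(MutableMapping):
    """Hypothetical stand-in for a scrapy.Item: a mapping that is not a dict."""

    def __init__(self, **fields):
        self._values = dict(fields)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


stats = StatsLike(views_count="42", comments_count="7")

# Nesting the mapping object directly: the stock JSON encoder rejects it.
try:
    json.dumps({"title": "string", "stats": stats})
    nested_item_serialized = True
except TypeError:
    nested_item_serialized = False

# Nesting a plain dict copy instead, which is what
# dict(statsLoader.load_item()) produces in getStats.
payload = json.dumps({"title": "string", "stats": dict(stats)}, sort_keys=True)
print(nested_item_serialized)
print(payload)
```

The dict() call only makes a shallow copy, which is enough here because the stats fields are plain strings.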

Thanks!
