Scrapy Xpath Not Extracting Div Containing Special Characters <%=

June 24, 2022

I am new to Scrapy. I am trying to extract the h2 text from the following URL: 'https://www.tysonprop.co.za/agents/'. I have 2 problems: My XPath can get to the script element, but

Solution 1:

Does branch.branch_name look like a key path in JSON data? Is there a request that loads the data you are looking for? Maybe; let's see.

In your browser's developer tools, open the Network tab and search through the requests. You will find an AJAX call that loads exactly the data you are looking for. So:

import json
import scrapy

class TysonSpider(scrapy.Spider):
    name = 'tyson_spider'

    def start_requests(self):
        # Request the AJAX endpoint directly instead of the HTML page.
        url = 'https://www.tysonprop.co.za/ajax/agents/?branch_id=25'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # The response body is JSON, so decode it rather than using XPath.
        json_data = json.loads(response.text)
        branch_name = json_data['branch']['branch_name']
        yield {'branchName': branch_name}
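
The lookup in parse() above raises an exception if the endpoint returns something other than the expected JSON. A minimal sketch of a more defensive lookup follows, using only the standard library. The helper name extract_branch_name and the sample payload are hypothetical; only the 'branch' and 'branch_name' keys come from the spider above.

```python
import json


def extract_branch_name(payload_text):
    """Return branch.branch_name from a JSON string, or None if it is absent."""
    try:
        data = json.loads(payload_text)
    except json.JSONDecodeError:
        # The body was not JSON at all (for example, an HTML error page).
        return None
    if not isinstance(data, dict):
        return None
    branch = data.get("branch")
    if not isinstance(branch, dict):
        return None
    return branch.get("branch_name")


# Hypothetical payload, shaped like the keys used in parse() above.
sample = '{"branch": {"branch_name": "Example Branch"}}'
print(extract_branch_name(sample))
print(extract_branch_name("<html>not json</html>"))
```

Inside the spider, the same helper could be called with response.text, yielding an item only when the result is not None.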

Solution 2:

The div inside the script tag is plain text, not parsed HTML. To query it as HTML, you can parse that text with a new Selector:

from scrapy.selector import Selector

....
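def parse(self, response):
    # The template markup is the script element's text, so re-parse it as HTML.
    template = response.xpath('//script[@id="id_branch_template"]/text()').get()
    script = Selector(text=template)
    # Selector(text=...) wraps the fragment in html/body, so search descendants.
    div = script.xpath('.//div[contains(@class,"branch-container")]')
    h2 = div.xpath('.//h2[contains(@class,"branch-name")]/text()').extract()
    yield {'branchName': h2}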

But please note that the h2 doesn't contain any text, so your result will be an empty list.
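
If the script template uses <%= ... %> placeholders, as the question title suggests, the real values are only filled in by JavaScript in the browser. The template text can still show which JSON fields to look for in the AJAX response from Solution 1. The following is a rough sketch under that assumption: the sample template is hypothetical (its class names are taken from the XPath expressions above), and find_placeholders is an illustrative helper, not part of Scrapy.

```python
import re

# Matches <%= expression %> and captures the expression without padding.
PLACEHOLDER = re.compile(r"<%=\s*(.*?)\s*%>", re.DOTALL)


def find_placeholders(template_text):
    """Return the expressions found inside <%= ... %> placeholders."""
    return PLACEHOLDER.findall(template_text)


# Hypothetical template text, as it might appear inside the script element.
sample_template = """
<div class="branch-container">
  <h2 class="branch-name"><%= branch.branch_name %></h2>
</div>
"""

print(find_placeholders(sample_template))  # ['branch.branch_name']
```

In a spider, the text returned by the script XPath could be passed to the helper in place of sample_template.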
