Use Scrapy Parse Function To Parse A Specific Url
I have a Scrapy crawler which works fine. I now want to use its 'parse' function to parse a given URL. While there exists a command-line utility to do so for a single URL (scrapy parse <url>), I want to do it inside my own code, without starting a new process for every URL.
Solution 1:
Managed to figure it out.
Essentially, I just needed to pass the response body, the URL, and the Scrapy request to create the response object.
import httplib
from scrapy.spider import BaseSpider
from scrapy.http import TextResponse

bs = BaseSpider('some')
head = 'www.mywebsite.com'
tail = '/mypage.html'
link = 'http://' + head + tail
# Fetch the raw page body over a plain HTTP connection
httpcon = httplib.HTTPConnection(head)
httpcon.request('GET', tail)
body = httpcon.getresponse().read()
sreq = bs.make_requests_from_url(link)
sresp = TextResponse(url=link, status=200, body=body, encoding='utf-8', request=sreq)
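With the response object constructed, the spider's parse callback can be driven directly. A minimal sketch, assuming the spider defines the usual parse(self, response) callback and yields scraped items:

# Feed the hand-built response to the spider's own parse callback
for item in bs.parse(sresp):
    print item  # each item, just as a real crawl would produce it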
Solution 2:
A quick kludge (pieced together from other answers) in case, unlike for the OP, subprocess is an option.
import subprocess

# Shell out to Scrapy's fetch command to retrieve the raw page
bashCommand = "scrapy fetch http://www.testsite.com/testpage.html"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
page, scrapy_meta_info = process.communicate()  # stdout: page body, stderr: Scrapy log
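Since scrapy fetch writes the page body to stdout and its log to stderr, page ends up holding the raw HTML, which can then be wrapped in a TextResponse and handed to the spider's parse method exactly as in Solution 1. A minimal sketch, reusing the bs spider from above (the url variable is an assumption matching the fetched address):

from scrapy.http import TextResponse

url = 'http://www.testsite.com/testpage.html'
if process.returncode == 0:
    # Wrap the fetched HTML so the spider can parse it in-process
    resp = TextResponse(url=url, status=200, body=page, encoding='utf-8')
    items = list(bs.parse(resp))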