Python Web-Scraping CSRF Token Issue

March 26, 2024

I am using MechanicalSoup to log in to a website with Python 3.6, and I'm having issues with the CSRF token. Every time I request the HTML back, the response reads 'Invalid CSRF token: Forbidden'.

Solution 1:

I believe the issue here is that <input> elements must have a name attribute to be submitted via POST or GET. Since your token is in a name-less <input> element, MechanicalSoup leaves it out of the submitted data, which is exactly what a browser would do.

From the W3C HTML 4.01 specification:

Every successful control has its control name paired with its current value as part of the submitted form data set. A successful control must be defined within a FORM element and must have a control name.
...
A control's "control name" is given by its name attribute.
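The rule quoted above can be illustrated with a short sketch. It uses only Python's standard html.parser module so that it runs without extra packages. The sample form, the id "csrf", and the helper name split_controls are hypothetical and are not taken from the site in the question.

```python
from html.parser import HTMLParser

# Hypothetical login form: the token <input> has an id but no name.
SAMPLE_FORM = """
<form id="loginForm" method="post">
  <input type="text" name="username" value="alice">
  <input type="password" name="password" value="secret">
  <input type="hidden" id="csrf" value="abc123">
</form>
"""


class InputCollector(HTMLParser):
    """Collect the attributes of every <input> element as a dict."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.inputs.append(dict(attrs))


def split_controls(html):
    """Return (submitted, skipped).

    submitted maps name -> value for inputs that have a name;
    skipped lists the attribute dicts of inputs without one.
    """
    parser = InputCollector()
    parser.feed(html)
    submitted = {}
    skipped = []
    for attrs in parser.inputs:
        if attrs.get("name"):
            submitted[attrs["name"]] = attrs.get("value", "")
        else:
            skipped.append(attrs)
    return submitted, skipped


submitted, skipped = split_controls(SAMPLE_FORM)
print(submitted)  # the named username and password fields
print(skipped)    # the name-less token input, which is never submitted
```

Running a check like this against the real login page is a quick way to confirm whether the token field is in fact name-less.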

Perhaps there is some JavaScript that is handling the CSRF token.

For a similar discussion, see "Does form data still transfer if the input tag has no name?"
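If a script on the page is indeed sending the token, the scraper has to read the value itself and attach it the way that script would. The sketch below shows the idea using only the standard library. The element id "csrf", the field name "csrf_token", and the header name "X-CSRF-Token" are all assumptions; the browser's network inspector shows what the real site expects.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class TokenFinder(HTMLParser):
    """Remember the value of the first <input> whose id matches."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "input" and self.token is None
                and attrs.get("id") == self.element_id):
            self.token = attrs.get("value")


def extract_token(html, element_id):
    """Return the token value, or None if no matching input exists."""
    finder = TokenFinder(element_id)
    finder.feed(html)
    return finder.token


def build_login_request(html, username, password):
    """Return (body, headers) for a login POST that carries the token."""
    token = extract_token(html, "csrf")  # hypothetical element id
    if token is None:
        raise ValueError("no CSRF token found in the page")
    fields = {
        "username": username,
        "password": password,
        "csrf_token": token,  # assumed field name
    }
    headers = {"X-CSRF-Token": token}  # assumed header name
    return urlencode(fields), headers
```

Sites usually expect the token in only one of the two places, so the unused one can be dropped once the real request has been inspected. With MechanicalSoup, the same approach should be possible by reading the value from the parsed page and then either adding a hidden field with Form.new_control or setting a header on browser.session.headers.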


Regarding your usage of MechanicalSoup, the classes StatefulBrowser and Form would simplify your script. For example, if you just had to open the page and input a username and password:

import mechanicalsoup

# These values are filled by the user
url = ""
username = ""
password = ""

# Open the page
browser = mechanicalsoup.StatefulBrowser(raise_on_404=True)
browser.open(url)

# Fill in the form values
form = browser.select_form('form[id=loginForm]')
form['username'] = username
form['password'] = password

# Submit the form and print the resulting page text
response = browser.submit_selected()
print(response.text)
