
Python Webkit With Proxy Support

I am writing a Python script to scrape a web page. I have created a WebKit WebView object and used the open method to load the URL, but I want to load the URL through a proxy instead. How can I do that?
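For context, a minimal sketch of the setup described above (assuming PyGTK and the old pywebkitgtk bindings; the URL is a placeholder), before any proxy is configured:

import gtk, webkit

w = gtk.Window()
s = gtk.ScrolledWindow()
v = webkit.WebView()   # the WebKit WebView object
s.add(v)
w.add(s)
w.show_all()

v.open('http://www.example.com')  # loads directly, with no proxy
gtk.main()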

Solution 1:

Try the code snippet below (reference from url):

import gtk, webkit
import ctypes

# Load the underlying C libraries so the SoupSession proxy can be set,
# since pywebkitgtk does not expose it directly.
libgobject = ctypes.CDLL('/usr/lib/libgobject-2.0.so.0')
libsoup = ctypes.CDLL('/usr/lib/libsoup-2.4.so.1')
libwebkit = ctypes.CDLL('/usr/lib/libwebkit-1.0.so')

# Treat the returned pointers as void* so they are not truncated on 64-bit.
libsoup.soup_uri_new.restype = ctypes.c_void_p
libwebkit.webkit_get_default_session.restype = ctypes.c_void_p

proxy_uri = libsoup.soup_uri_new('http://127.0.0.1:8000')  # set your proxy url

# Apply the proxy to WebKit's default SoupSession.
session = libwebkit.webkit_get_default_session()
libgobject.g_object_set(ctypes.c_void_p(session), "proxy-uri", ctypes.c_void_p(proxy_uri), None)

w = gtk.Window()
s = gtk.ScrolledWindow()
v = webkit.WebView()
s.add(v)
w.add(s)
w.show_all()

v.open('http://www.google.com')
gtk.main()

Hope it helps.


Solution 2:

You can set an application-wide proxy with QNetworkProxy.setApplicationProxy if you're on PyQt (see the sketch after the snippet below), or use this snippet if you're using PyGI:

from gi.repository import WebKit
from gi.repository import Soup

proxy_uri = Soup.URI.new("http://127.0.0.1:8080")  # set your proxy url

# Apply the proxy to WebKit's default SoupSession.
session = WebKit.get_default_session()
session.set_property("proxy-uri", proxy_uri)
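
For the PyQt route, a minimal sketch (assuming PyQt4 with QtWebKit; the proxy host, port, and target URL are placeholders):

import sys
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtNetwork import QNetworkProxy
from PyQt4.QtWebKit import QWebView

app = QApplication(sys.argv)

# Register an application-wide HTTP proxy before any requests are made.
proxy = QNetworkProxy(QNetworkProxy.HttpProxy, "127.0.0.1", 8080)
QNetworkProxy.setApplicationProxy(proxy)

view = QWebView()
view.load(QUrl("http://www.google.com"))
view.show()

sys.exit(app.exec_())

Every QNetworkAccessManager in the process (including the one QWebView uses) then routes its requests through that proxy.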

References:
PyGI
PyQt


Solution 3:

How about a solution that's already made?

PyPhantomJS is a minimalistic, headless, WebKit-based, JavaScript-driven tool. It is written in PyQt4 and Python. It runs on Linux, Windows, and Mac OS X.

It gives you access to a full headless WebKit browser, controllable via scripts written in JavaScript, with the ability to do various things, among which are screen scraping* and proxy support. It is driven from the command line.

You can see the API here.

* When I say screen scraping, I mean you can either scrape page content or save page renders to a file. There's even a screen-scraping JS library already written here.

