ghost.py

ghost.py is a WebKit web client written in Python:

from ghost import Ghost
ghost = Ghost()

with ghost.start() as session:
    page, extra_resources = session.open("http://jeanphix.me")
    assert page.http_status == 200 and 'jeanphix' in page.content

Installation

ghost.py requires either PySide (preferred) or PyQt Qt bindings:

pip install pyside
pip install ghost.py --pre

OSX:

brew install qt
mkvirtualenv foo
pip install -U pip  # make sure pip is current
pip install PySide
pyside_postinstall.py -install
pip install Ghost.py

API

Ghost

class ghost.Ghost(log_level=30, log_handler=<logging.StreamHandler object>, plugin_path=['/usr/lib/mozilla/plugins'], defaults=None)

Ghost manages a Qt application.

Parameters:
  • log_level – The optional logging level.
  • log_handler – The optional logging handler.
  • plugin_path – A list of paths to plugin directories (default ['/usr/lib/mozilla/plugins']).
  • defaults – The default arguments to pass to new child sessions.
start(**kwargs)

Starts a new Session.

Session

class ghost.Session(ghost, user_agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2', wait_timeout=8, wait_callback=None, display=False, viewport_size=(800, 600), ignore_ssl_errors=True, plugins_enabled=False, java_enabled=False, javascript_enabled=True, download_images=True, show_scrollbars=True, exclude=None, network_access_manager_class=<class 'ghost.ghost.NetworkAccessManager'>, web_page_class=<class 'ghost.ghost.GhostWebPage'>)

Session manages a QWebPage.

Parameters:
  • ghost – The parent Ghost instance.
  • user_agent – The default User-Agent header.
  • wait_timeout – Maximum step duration in seconds.
  • wait_callback – An optional callable that is periodically executed until Ghost stops waiting.
  • log_level – The optional logging level.
  • log_handler – The optional logging handler.
  • display – A boolean that tells Ghost to display the UI.
  • viewport_size – A tuple that sets the initial viewport size.
  • ignore_ssl_errors – A boolean that forces SSL errors to be ignored.
  • plugins_enabled – Enable plugins (like Flash).
  • java_enabled – Enable Java JRE.
  • download_images – Indicates whether the browser should download images.
  • exclude – A regex used to determine which URLs to exclude when sending requests.
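The exclude option takes a regular expression that is checked against outgoing request URLs. The sketch below is illustrative only: the pattern is a hypothetical example, and the sketch assumes the regex is searched anywhere in the URL (as re.search does) rather than anchored at its start. Because it uses only the standard library, it can be used to try out a pattern before passing it to Session.

```python
import re

# Hypothetical pattern: skip Google Analytics and common image formats.
EXCLUDE = r"google-analytics\.com|\.(png|jpe?g|gif)(\?|$)"
pattern = re.compile(EXCLUDE)


def is_excluded(url):
    """Return True if the pattern matches anywhere in the URL (re.search semantics)."""
    return pattern.search(url) is not None


urls = [
    "http://example.com/index.html",
    "http://example.com/logo.png",
    "http://example.com/photo.jpeg?v=2",
    "https://www.google-analytics.com/analytics.js",
]
blocked = [url for url in urls if is_excluded(url)]
print(blocked)

# With ghost.py installed, the same string would be passed to the session:
# from ghost import Ghost, Session
# session = Session(Ghost(), exclude=EXCLUDE)
```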
call(*args, **kwargs)

Calls a method on the element matching the given selector.

Parameters:
  • selector – A CSS selector to the target element.
  • method – The name of the method to call.
  • expect_loading – Specifies whether a page load is expected.
capture(region=None, selector=None, format=None)

Returns a snapshot as a QImage.

Parameters:
  • region – An optional tuple containing the region as pixel coordinates.
  • selector – A selector targeting the element to crop to.
  • format – The output image format.
capture_to(path, region=None, selector=None, format=None)

Saves a snapshot as an image.

Parameters:
  • path – The destination path.
  • region – An optional tuple containing the region as pixel coordinates.
  • selector – A selector targeting the element to crop to.
  • format – The output image format.
clear_alert_message()

Clears the alert message.

click(*args, **kwargs)

Clicks the targeted element.

Parameters:
  • selector – A CSS3 selector targeting the element.
  • btn – The mouse button number: 0 for the left button, 1 for the middle button, 2 for the right button.
confirm(*args, **kwds)

Context manager that tells Ghost how to deal with JavaScript confirm().

Parameters:
  • confirm – A boolean or a callable to set the confirmation.
content

Returns the current frame's HTML as a string.

Parameters:
  • to_unicode – Whether to convert the HTML to unicode.
cookies

Returns all cookies.

delete_cookies()

Deletes all cookies.

evaluate(*args, **kwargs)

Evaluates a script in the page frame.

Parameters:
  • script – The script to evaluate.
evaluate_js_file(path, encoding='utf-8', **kwargs)

Evaluates the JavaScript file at the given path in the current frame. Raises the native IOError (OSError in Python 3) if the file cannot be read.

Parameters:
  • path – The path of the file.
  • encoding – The file’s encoding.
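evaluate_js_file() reads the file with the given encoding before evaluating its contents. The sketch below isolates that reading step with a hypothetical read_script() helper and does not call ghost.py: a Latin-1 encoded file decodes correctly only when the matching encoding is supplied, and a missing file raises the native OSError (IOError is an alias of OSError in Python 3).

```python
import os
import tempfile


def read_script(path, encoding="utf-8"):
    """Return the decoded text of a script file (hypothetical helper).

    Raises OSError (e.g. FileNotFoundError) if the file cannot be opened.
    """
    with open(path, encoding=encoding) as f:
        return f.read()


tmp_dir = tempfile.mkdtemp()
script_path = os.path.join(tmp_dir, "title.js")
with open(script_path, "w", encoding="latin-1") as f:
    f.write("document.title = 'café';")

# Decoding with the matching encoding recovers the original text.
source = read_script(script_path, encoding="latin-1")
print(source)

# The default UTF-8 cannot decode the Latin-1 byte for "é".
wrong_encoding_failed = False
try:
    read_script(script_path)
except UnicodeDecodeError:
    wrong_encoding_failed = True

# A missing file surfaces as the native OSError subclass.
error = None
try:
    read_script(os.path.join(tmp_dir, "missing.js"))
except OSError as exc:
    error = exc

# With ghost.py, the equivalent call would be:
# session.evaluate_js_file(script_path, encoding="latin-1")
```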
exists(selector)

Checks whether an element exists for the given selector.

Parameters:
  • selector – The element selector.
exit()

Exits all Qt widgets.

fill(*args, **kwargs)

Fills a form with provided values.

Parameters:
  • selector – A CSS selector to the target form to fill.
  • values – A dict containing the values.
fire(*args, **kwargs)

Fires an event on the element matching the selector.

Parameters:
  • selector – A selector to target the element.
  • event – The name of the event to trigger.
frame(selector=None)

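Changes the current frame. Called without an argument, it sets the current frame to its parent frame; given the name or index of a child frame, it descends into that child.

Parameters:
  • selector – An optional name or index of the child frame to descend to.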
global_exists(global_name)

Checks whether a JavaScript global exists.

Parameters:
  • global_name – The name of the global.
hide()

Closes the webview.

load_cookies(cookie_storage, keep_old=False)

Loads cookies from a cookielib CookieJar or a Set-Cookie3 format text file.

Parameters:
  • cookie_storage – A file path on disk or a CookieJar instance.
  • keep_old – If True, don't reset the cookie jar first; existing cookies that are not overridden are kept.
open(address, method='get', headers={}, auth=None, body=None, default_popup_response=None, wait=True, timeout=None, client_certificate=None, encode_url=True, user_agent=None)

Opens a web page.

Parameters:
  • address – The resource URL.
  • method – The HTTP method.
  • headers – An optional dict of extra request headers.
  • auth – An optional tuple of HTTP auth (username, password).
  • body – An optional string containing a payload.
  • default_popup_response – The default response for any confirm/alert/prompt popup from the JavaScript (replaces the need for the with blocks).
  • wait – If set to True (the default), this method waits for the page load to complete before returning. Otherwise, it only starts the page load task, and it is the caller's responsibility to wait for the load to finish by other means (e.g. by calling wait_for_page_loaded()).
  • timeout – An optional timeout.
  • client_certificate – An optional dict with "certificate_path" and "key_path", the paths to the certificate and key files.
  • encode_url – Set to True if the URL has to be encoded.
  • user_agent – An optional user agent string.

Returns: The page resource and all loaded resources, unless wait is False, in which case it returns None.

print_to_pdf(path, paper_size=(8.5, 11.0), paper_margins=(0, 0, 0, 0), paper_units=None, zoom_factor=1.0)

Saves page as a pdf file.

See the Qt 4 QPrinter documentation for more detailed explanations of the options.

Parameters:
  • path – The destination path.
  • paper_size – A 2-tuple indicating the size of the page to print to.
  • paper_margins – A 4-tuple indicating the size of each margin.
  • paper_units – Units for paper_size and paper_margins.
  • zoom_factor – Scale the output content.
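The default paper_size of (8.5, 11.0) is US Letter expressed in inches, which suggests that sizes are read as inches when paper_units is left unset. Assuming that, the sketch below converts metric dimensions (A4 with 10 mm margins) into the 2-tuple and 4-tuple that print_to_pdf() expects. The mm_to_inches() helper is hypothetical and independent of ghost.py; passing a QPrinter unit such as QPrinter.Millimeter through paper_units is the other route the QPrinter documentation suggests.

```python
MM_PER_INCH = 25.4


def mm_to_inches(*values_mm):
    """Convert millimetre values to a tuple of inches."""
    return tuple(value / MM_PER_INCH for value in values_mm)


a4_size = mm_to_inches(210, 297)           # A4 paper: 210 mm x 297 mm
a4_margins = mm_to_inches(10, 10, 10, 10)  # 10 mm on every side

# print_to_pdf() documents a 2-tuple size and a 4-tuple of margins.
assert len(a4_size) == 2 and len(a4_margins) == 4
print([round(v, 3) for v in a4_size])

# With ghost.py installed (assuming inch units when paper_units is None):
# session.print_to_pdf("page.pdf", paper_size=a4_size, paper_margins=a4_margins)
```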
prompt(*args, **kwds)

Context manager that tells Ghost how to deal with JavaScript prompt().

Parameters:
  • value – A string or a callable value to fill in the prompt.
region_for_selector(selector)

Returns the frame region for the given selector as a tuple.

Parameters:
  • selector – A selector for the targeted element.
save_cookies(cookie_storage)

Saves cookies to a cookielib CookieJar or a Set-Cookie3 format text file.

Parameters:
  • cookie_storage – A file path or a CookieJar instance.
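The Set-Cookie3 format used by load_cookies() and save_cookies() is the libwww-perl format that the standard library's LWPCookieJar reads and writes (cookielib was renamed http.cookiejar in Python 3). The sketch below uses only the standard library to write such a file and read it back. The cookie values and file location are illustrative, and this is not ghost.py's own implementation; a file produced this way is the kind of path that should be accepted as cookie_storage.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_cookie(name, value, domain):
    """Build a persistent cookie valid for every path on the given domain."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False,
        expires=4102444800,  # 2100-01-01 UTC, so save() does not drop it as expired
        discard=False,       # persistent cookie, so save() keeps it by default
        comment=None, comment_url=None, rest={},
    )


cookie_path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

jar = LWPCookieJar(cookie_path)
jar.set_cookie(make_cookie("session_id", "abc123", "example.com"))
jar.save()  # writes a "#LWP-Cookies-2.0" header and one "Set-Cookie3:" line per cookie

with open(cookie_path) as f:
    saved_text = f.read()

restored = LWPCookieJar(cookie_path)
restored.load()
restored_names = [cookie.name for cookie in restored]
print(restored_names)

# With ghost.py, the same file path could be passed as cookie_storage:
# session.save_cookies(cookie_path)
# session.load_cookies(cookie_path, keep_old=True)
```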
set_field_value(*args, **kwargs)

Sets the value of the field matched by given selector.

Parameters:
  • selector – A CSS selector that targets the field.
  • value – The value to fill in.
  • blur – An optional boolean that forces a blur after the field is filled in.
set_proxy(type_, host='localhost', port=8888, user='', password='')

Sets up a proxy for subsequent connections.

Parameters:
  • type_ – The proxy type to use: none, default, socks5, https or http.
  • host – The proxy server IP address or host name.
  • port – The proxy port.
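When proxy settings arrive as a single URL (for instance from an environment variable), they can be split into the arguments set_proxy() accepts. The proxy_kwargs() helper below is a hypothetical sketch built on urllib.parse: it accepts only the proxy types listed above and falls back to the signature's defaults (host 'localhost', port 8888, empty credentials) when parts of the URL are missing.

```python
from urllib.parse import urlsplit

# Proxy types listed in the set_proxy() documentation.
PROXY_TYPES = ("none", "default", "socks5", "https", "http")


def proxy_kwargs(proxy_url):
    """Split a proxy URL into keyword arguments shaped like set_proxy()'s signature."""
    parts = urlsplit(proxy_url)
    if parts.scheme not in PROXY_TYPES:
        raise ValueError("unsupported proxy type: %r" % parts.scheme)
    return {
        "type_": parts.scheme,
        "host": parts.hostname or "localhost",  # signature default
        "port": parts.port or 8888,             # signature default
        "user": parts.username or "",
        "password": parts.password or "",
    }


socks = proxy_kwargs("socks5://bob:secret@proxy.internal:1080")
plain = proxy_kwargs("http://10.0.0.5")
print(socks)

# With ghost.py installed:
# session.set_proxy(**socks)
```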
set_viewport_size(width, height)

Sets the page viewport size.

Parameters:
  • width – An integer that sets width pixel count.
  • height – An integer that sets height pixel count.
show()

Shows the current page inside a QWebView.

wait_for(condition, timeout_message, timeout=None)

Waits until condition is True.

Parameters:
  • condition – A callable that returns the condition.
  • timeout_message – The exception message on timeout.
  • timeout – An optional timeout.
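The wait_for() contract (poll a callable until it returns a true value, and raise with timeout_message once the timeout expires) can be sketched in plain Python. This is an illustration under stated assumptions, not ghost.py's implementation: it leaves out everything Qt-specific, the poll interval and the use of the built-in TimeoutError are assumptions, and the optional wait_callback mirrors the Session option of the same name. The default of 8 seconds matches the Session's wait_timeout. Helpers such as wait_for_text() and wait_for_selector() can be thought of as wait_for() with a specific condition.

```python
import time


def wait_for(condition, timeout_message, timeout=8, poll_interval=0.01, wait_callback=None):
    """Poll condition() until it returns a true value.

    Raises TimeoutError(timeout_message) if timeout seconds pass first.
    wait_callback, if given, is called between polls while waiting.
    """
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() > deadline:
            raise TimeoutError(timeout_message)
        if wait_callback is not None:
            wait_callback()
        time.sleep(poll_interval)


# A condition that becomes true on its third check.
checks = []
ticks = []


def page_ready():
    checks.append(time.monotonic())
    return len(checks) >= 3


wait_for(page_ready, "page never became ready", timeout=1,
         wait_callback=lambda: ticks.append(1))
print(len(checks), len(ticks))

# A condition that never becomes true ends in a timeout.
timeout_error = None
try:
    wait_for(lambda: False, "element did not appear", timeout=0.05)
except TimeoutError as exc:
    timeout_error = str(exc)
```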
wait_for_alert(timeout=None)

Waits for an alert() in the main frame.

Parameters:
  • timeout – An optional timeout.
wait_for_page_loaded(timeout=None)

Waits until the page is loaded, assuming that a page has been requested.

Parameters:
  • timeout – An optional timeout.
wait_for_selector(selector, timeout=None)

Waits until the selector matches an element in the frame.

Parameters:
  • selector – The selector to wait for.
  • timeout – An optional timeout.
wait_for_text(text, timeout=None)

Waits until the given text appears in the main frame.

Parameters:
  • text – The text to wait for.
  • timeout – An optional timeout.
wait_while_selector(selector, timeout=None)

Waits until the selector no longer matches an element on the frame.

Parameters:
  • selector – The selector to wait for.
  • timeout – An optional timeout.